The ML Engineer 09-12-2025
ML practices, fairness debates, future AI architectures, perception science
🔧 Company Engineering Blogs
Getting from tested to battle-tested (blog.janestreet.com). Jane Street details Aria testing, Antithesis end-to-end chaos testing, and lessons from battle-testing distributed systems using OCaml-centric tooling
The Hidden Cost of Convenience: Rethinking Old ORM Patterns for Scale (eng.wealthfront.com). Wealthfront rewrites an old ORM-heavy balance system using modern architecture to cut runtime and scale efficiently
How Agentforce Achieved 3–5x Faster Response Times While Solving Enterprise-Scale Architectural Complexity (engineering.salesforce.com). Salesforce refactors deterministic and LLM tasks in Apex, reduces latency by 75%, and deploys multi-brand Agentforce agents with a tailored brand voice
We Got Claude to Fine-Tune an Open Source LLM (huggingface.co). Demonstrates Claude fine-tuning an open-source LLM, Qwen3-0.6B, through Hugging Face Skills using SFT, DPO, and GRPO
Mob Programming: Smells Like Team Spirit (tech.trivago.com). Mob programming at trivago Intelligence: structured collaboration, quick onboarding, and improved PR flow using diverse team roles
📚 Applied ML Practice
Improving Water Quality Predictions with Machine Learning (eesa.lbl.gov). Ensemble machine learning improves river water temperature predictions in data-sparse regions using multiple model types and high-performance computing
Yes, You Need To Work Through Concrete Examples (justinmath.com). Concrete calculations and bottom-up intuition build mastery in ML; avoid cargo-cult abstractions and push beyond gradient descent
How Thousands of Citizen Readers Helped Build the Largest Open-Vocabulary Dataset of Narrative Emotions (txtlab.org). Open-vocabulary narrative emotion dataset built on Zooniverse by 3,738 citizen readers, who contributed 200k annotations across 43k passages, modeled with VAD and NRC emotions
Building a Bayesian Spam Classifier from First Principles (journal.hexmos.com). Bayesian spam filtering with Naive Bayes, Enron data, Python code, and CPT/Laplace smoothing explained
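For readers who want a feel for the Naive Bayes item above before clicking through, here is a minimal sketch of a word-count classifier with Laplace (add-alpha) smoothing. It is not the code from the linked post: the function names, the four-message toy corpus, and the choice to skip out-of-vocabulary words are all assumptions made for illustration, and it uses only the Python standard library rather than the Enron data.

```python
import math
from collections import Counter

LABELS = ("spam", "ham")

def train(docs):
    """Count words per label. docs is a list of (label, text) pairs."""
    word_counts = {label: Counter() for label in LABELS}
    doc_counts = Counter()
    for label, text in docs:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set().union(*word_counts.values())
    return word_counts, doc_counts, vocab

def log_posterior(text, label, word_counts, doc_counts, vocab, alpha=1.0):
    """Unnormalized log P(label | text) under the naive independence assumption."""
    total_docs = sum(doc_counts.values())
    score = math.log(doc_counts[label] / total_docs)  # log prior
    # Laplace smoothing: add alpha to every word count so unseen words
    # never produce a zero probability.
    denom = sum(word_counts[label].values()) + alpha * len(vocab)
    for word in text.lower().split():
        if word in vocab:  # assumption: ignore words never seen in training
            score += math.log((word_counts[label][word] + alpha) / denom)
    return score

def classify(text, model, alpha=1.0):
    """Return the label with the highest log posterior."""
    return max(LABELS, key=lambda label: log_posterior(text, label, *model, alpha=alpha))

# Hypothetical toy corpus, made up for this sketch.
TOY_DOCS = [
    ("spam", "win cash prize now"),
    ("spam", "cheap cash offer"),
    ("ham", "meeting agenda for monday"),
    ("ham", "lunch on monday"),
]

model = train(TOY_DOCS)
print(classify("cash prize", model))
print(classify("monday meeting", model))
```

Working in log space avoids underflow when many small probabilities are multiplied, which matters once messages get longer than a few words.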
⚖️ Fairness & Causality
Willful Incompetence: Questionable Modeling Practices in Implicit Bias Research (replicationindex.com). Schimmack critiques IAT practices, argues shared method variance inflates validity, promotes multimethod models and careful interpretation
Talk at the JFLI, at the NII (国立情報学研究所) in Tokyo (東京) (freakonometrics.hypotheses.org). Talk on counterfactual and transport-based methods for understanding indirect discrimination in algorithmic systems using causal reasoning and optimal transport
Is the Implicit Association Test Too Big To Fail? (replicationindex.com). IAT validity challenged; latent models, method variance, and adversarial collaboration analyzed using psychometric critiques
Actuarial Pricing Discrimination and Fairness (freakonometrics.hypotheses.org). Actuarial pricing, fairness concepts, proxies, and regulatory tensions in discrimination-aware insurance modeling
🤖 Agent Architectures
Titans + MIRAS: Helping AI have long-term memory (research.google). Titans and MIRAS enable long-term memory in AI with on-the-fly learning, surprise metrics, and deep memory modules
Glia: A Human-Inspired AI for Systems Design and Optimization (sigops.org). Glia uses a human-inspired multi-agent AI to autonomously design and optimize AI infrastructure, including LLM serving routers, batch schedulers, and autoscalers
Scaling Agentic Inference Across Heterogeneous Compute with Zain Asgar - #757 (twimlai.com). Gimlet Labs’ Zain Asgar discusses heterogeneous AI inference across diverse hardware including H100s, CPUs, and older GPUs, with a three-layer architecture
The Inverted Agent (jlowin.dev). SEP-1577 enables MCP sampling with tools, flipping agent architecture to server-driven control using FastMCP and structured outputs
🎯 Perception & Sensing
Traffic Modeling Using Machine Learning (calendar.perfplanet.com). Predicts lab-to-field LCP relationships using Python, pandas, XGBoost with synthetic CrUX data and engineered features
Measuring What Actually Matters in Real-Time ADAS Perception (blog.us.fixstars.com). Memory bandwidth and latency dominate real-time ADAS perception; highlights include edge compute constraints, dataflow optimization, and federated learning approaches
Grounding DINO: Open Vocabulary Object Detection on Videos (pyimagesearch.com). Open vocabulary object detection on videos using Grounding DINO with Hugging Face and Gradio
Augmented reality meets neuroendoscopy (cs.jhu.edu). Hopkins researchers enable real-time 3D neuroendoscopy navigation with R2D2-E, AI-based feature tracking, and augmented visualization
From Waveforms to Wisdom: The New Benchmark for Auditory Intelligence (research.google). Google Research introduces Massive Sound Embedding Benchmark (MSEB) to unify eight sound tasks, datasets like SVQ, and a robust evaluation framework
📡 Observability & Benchmarks
Five Things to Check Before You Trust AI-Generated Machine Learning Code (statisticalhorizons.com). Practical five-check framework for evaluating AI-generated ML code across parameters, tuning, splits, metrics, and interpretation
How I Simplified LLM Telemetry Using Dual-Destination Observability Without Performance Degradation (blog.mphomphego.co.za). Python-based dual-destination telemetry bridge using OpenTelemetry and Traceloop SDK for Instana and Langfuse with circuit breakers
How to Benchmark C++ Code? (codspeed.io). Guided Google Benchmark-based C++ performance benchmarking with fixtures, parameters, and CI integration tips
A Guide to Web Application Monitoring (blog.saeloun.com). A practical guide to monitoring Rails API, React frontend, and PostgreSQL using metrics, logs, traces, and RUM
🛠️ Data Science Tools
Rogue Scholar is improving subject classification (Version 2) (blog.front-matter.de). OpenAlex/CWTS subject classification of Rogue Scholar posts using OpenAlex subfields and a machine learning classifier
Haskell IS a Great Language for Data Science (jcarroll.com.au). Haskell, dataHaskell, dataframe, and knitr integration showcase strong typing for data science workflows
QGIS to (Geo)Pandas – part 3 (anitagraser.com). QGIS to GeoPandas uses QgsArrowIterator to stream features as Arrow batches with Python GeoPandas integration
🗃️ Data Systems & Vectors
Product Quantization (arpitbhayani.me). Explains Product Quantization for compressing high-dimensional vectors, subspace coding, PQ codebooks, and distance computations with Python snippets
A complete guide to vector search (redis.io). Vector search explained with encoding models, KNN to ANN, hybrid filtering, and hybrid search in a unified data platform context
Adaptive Query Optimizer for MariaDB Vector – Innovation Winner of MariaDB Python Hackathon 2025 (mariadb.org). Innovation winner: adaptive query optimizer for MariaDB Vector using Python in Hackathon 2025 with Aakanksha Singh and Mihir Phalke
Polars in Aggregate: Polars Cloud, Streaming engine, and New Data Types (pola.rs). Polars Cloud, streaming engine, and new Decimal and Int128 types empower scalable dataframes in Python and Rust
Apache Flink 2.2.0: Advancing Real-Time Data + AI and Empowering Stream Processing for the AI Era (flink.apache.org). Flink 2.2.0 advances real-time data processing with AI features, materialized tables, Delta Joins, improved connectors, and PyFlink support, enabling LLM inference and vector search
Branch, Test, Deploy: A Git-Inspired Approach for Data (ssp.sh). Git-like workflows for data: branching, zero-copy cloning, Prolly Trees, LakeFS, and Nessie to enable testing and deploying data pipelines
🖥️ Serving & Queues
Optimizing PyTorch Model Inference on CPU (towardsdatascience.com). CPU inference optimization on Intel Xeon with PyTorch 2.x, AMP, channels-last, IPEX, OpenVINO, and ONNX in a toy ResNet50 workflow
How to run Ollama with docker compose and GPU support (sleeplessbeastie.eu). GPU-enabled Ollama setup using Docker Compose for accelerated model inference with Nvidia devices
Trying out the Absurd queue for AI Workloads (leblancfg.com). Explores Absurd, a Postgres-based durable queue for AI workloads, with Python/TypeScript SDKs and a FastAPI demo by François Leblanc
🚀 GPU Training Pipelines
Accelerating Autonomous Driving Model Training on AMD ROCm™ Software (rocm.blogs.amd.com). ROCm-accelerated autonomous driving model training with awesome-rocm-autodrive, MMCV optimizations, NHWC, bmm reshaping, and MIOpen tuning on AMD GPUs
Decoding high-bandwidth memory: A practical guide to GPU memory for fine-tuning AI models (cloud.google.com). Guides memory‑efficient fine‑tuning on GPUs with PEFT, LoRA, quantization, FlashAttention, and multi‑GPU strategies
DGL in Depth: SE(3)-Transformer on ROCm 7 (rocm.blogs.amd.com). SE(3)-Transformer runs efficiently with DGL on AMD ROCm 7/MI300X, exploring 3D graphs, equivariant attention, and cross‑platform benchmarks
Introducing checkpointless and elastic training on Amazon SageMaker HyperPod (aws.amazon.com). SageMaker HyperPod introduces checkpointless and elastic training to accelerate AI model training
New serverless customization in Amazon SageMaker AI accelerates model fine-tuning (aws.amazon.com). Serverless customization in SageMaker AI enables fine-tuning of models like Llama, Qwen, and GPT-OSS with UI or code, automating deployment
🧵 Distributed Training Internals
Machine Learning (danieldk.eu). Explores Dish Activation, attention mechanisms, logits, quantization, and multi-GPU model parallelism with Tensor Parallelism on machine learning models
Support FSDP2 as A Training Backend for Miles (lmsys.org). Miles adds FSDP2 as a flexible training backend, enabling DTensor-based sharding, true on-policy training, data packing, and CP/DP optimization for Qwen3-Next and VLM RL
Writing an LLM from scratch, part 28 -- training a base model from scratch on an RTX 3090 (gilesthomas.com). Trains a GPT-2 style base model on FineWeb 10B samples with Hugging Face datasets and an RTX 3090, covering tokenization, token counts, and data scaling
Learning to love mesh-oriented sharding (blog.ezyang.com). DTensor and mesh-oriented sharding clash: open vs closed designs, extensibility with Placement, and implications for PyTorch and JAX in distributed ML
🧠 LLM Representations
Cross Layer Transcoders for the Qwen3 LLM Family (lesswrong.com). Explores sparse autoencoders and cross layer transcoders (CLTs) for Qwen3 LLMs with BLUELightAI’s CLT features and TDA tools
Concept Subspaces for Targeted Model Editing (gojiberries.io). Counterfactual concept subspaces for targeted model editing using activation directions, PCA, orthogonalization, and subspace-constrained adapters
Picking Optimal Token IDs (notes.hella.cheap). How to arrange token IDs with PCA-like ordering to maximize zero runs in sparse bit vectors
Interactively Visualizing the Qwen3 MoE Architecture (vkethana.com). Interactive Qwen3 MoE architecture visualizations explore grouped query attention, RMS normalization, and RoPE rotations
DeepSeek V3.2 (aarnphm.xyz). DeepSeek V3.2 introduces Sparse Attention (DSA) and FP8 indexing for efficient memory and FLOPs, detailing MHA/MQA training and inference, compressed caches, and Hadamard transforms
📐 Statistical Diagnostics
New Preprint: Model Checking for Vector Autoregressive Models (jmbh.github.io). Tutorial on VAR model checking with diagnostics, plots, simulations, and R-code for multilevel VAR in psychological time series
Notes - NLA MT25, Marchenko-Pastur theorem (ollybritton.com). Notes on Marchenko-Pastur theorem for random matrices X, singular values, distribution, and conditioning estimates
Data Science Notes: 1. Bland-Altman plots (hoyleanalytics.org). Rotating data to create Bland-Altman plots reveals reproducibility and bias patterns using Python (statsmodels) in a data science context
Notes - NLA MT25, Gaussian random matrices (ollybritton.com). Gaussian random matrices, orthogonal invariance, and basics of G ~ N(0,1) entries for m x n matrices
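The Bland-Altman item above comes down to a small calculation: for paired measurements, plot the pairwise difference against the pairwise mean (a 45-degree rotation of the original scatter, up to scaling), then mark the mean difference and the limits of agreement. The sketch below assumes the conventional 1.96 standard deviation limits and uses only the standard library. The linked post uses statsmodels, so treat the function name and return shape here as illustrative rather than as that post's code.

```python
import statistics

def bland_altman(a, b, z=1.96):
    """Return the per-pair means and differences, the bias, and the limits of agreement.

    a and b are equal-length sequences of paired measurements from two methods.
    z=1.96 gives the conventional approximate 95% limits of agreement.
    """
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")
    means = [(x + y) / 2 for x, y in zip(a, b)]  # x-axis of the plot
    diffs = [x - y for x, y in zip(a, b)]        # y-axis of the plot
    bias = statistics.mean(diffs)                # systematic offset between methods
    sd = statistics.stdev(diffs)                 # sample standard deviation
    return means, diffs, bias, (bias - z * sd, bias + z * sd)

# Hypothetical paired readings from two instruments.
method_a = [10.0, 12.0, 14.0]
method_b = [9.0, 12.0, 13.0]

means, diffs, bias, (lower, upper) = bland_altman(method_a, method_b)
print(means, diffs, round(bias, 3), round(lower, 3), round(upper, 3))
```

A bias far from zero suggests one method reads systematically higher, while wide limits of agreement suggest poor reproducibility even when the bias is small.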
📚 Academic Research
PystachIO: Efficient Distributed GPU Query Processing with PyTorch over Fast Networks & Fast Storage (arxiv:cs). Introduces PystachIO, a PyTorch-based distributed OLAP engine optimizing GPU, network, and NVMe utilization, offering up to 3x faster analytical queries on modern GPU clusters
Interaction Tensor Shap (arxiv:cs). IT-SHAP reformulates high-order Shapley interaction indices as tensor-network contractions, enabling polynomial-time computation of exact interaction attributions, scaling explainability to deep, high-dimensional models
Robust Tabular Foundation Models (arxiv:cs). Proposes Robust Tabular Foundation Models, adversarially adapting synthetic-data generators using an optimality-gap objective, improving TabPFN performance and robustness on diverse tabular benchmarks with limited pretraining
Hyperparameter Transfer Enables Consistent Gains of Matrix-Preconditioned Optimizers Across Scales (arxiv:cs). Analyzes how learning rate and weight decay should scale for matrix-preconditioned optimizers like Shampoo and Muon, enabling consistent speedups over AdamW across language-model sizes
Gradient Descent with Provably Tuned Learning-rate Schedules (arxiv:cs). Develops theory and algorithms for tuning learning-rate schedules and momentum in gradient descent on nonconvex, nonsmooth objectives, giving complexity guarantees applicable to neural network training
👋 Before you go...
I've got a big favor to ask - keeping Blaze running isn't expensive, but it does all add up, so I'm asking readers like you to help, if you can, by joining the Patreon page. Nothing flashy, just a way for folks who find value in these newsletters to chip in a little each month.
If you are getting value from Blaze, checking this out would mean the absolute world. But if you can't contribute, no worries - the newsletters keep coming either way. Thanks for reading and being part of this nerdy corner of the internet. All the best for the coming week - Alastair.