The ML Engineer 09-12-2025

December 9, 2025

ML practices, fairness debates, future AI architectures, perception science

🔧 Company Engineering Blogs
Getting from tested to battle-tested (blog.janestreet.com). Jane Street details Aria testing, Antithesis end-to-end chaos testing, and lessons from battle-testing distributed systems using OCaml-centric tooling

The Hidden Cost of Convenience: Rethinking Old ORM Patterns for Scale (eng.wealthfront.com). Wealthfront rewrites an old ORM-heavy balance system using modern architecture to cut runtime and scale efficiently

How Agentforce Achieved 3–5x Faster Response Times While Solving Enterprise-Scale Architectural Complexity (engineering.salesforce.com). How Salesforce refactors deterministic and LLM tasks in Apex, reduces latency 75%, and deploys multi-brand Agentforce agents for tailored brand voice

We Got Claude to Fine-Tune an Open Source LLM (huggingface.co). Demonstrates Claude using Hugging Face Skills to fine-tune open-source LLMs such as Qwen3-0.6B with SFT, DPO, and GRPO

Mob Programming: Smells Like Team Spirit (tech.trivago.com). Mob programming at trivago Intelligence: structured collaboration, quick onboarding, and improved PR flow using diverse team roles

📚 Applied ML Practice

Improving Water Quality Predictions with Machine Learning (eesa.lbl.gov). Ensemble machine learning improves river water temperature predictions in data-sparse regions using multiple model types and high-performance computing

Yes, You Need To Work Through Concrete Examples (justinmath.com). Concrete calculations and bottom-up intuition build mastery in ML; avoid cargo-cult abstractions and push beyond gradient descent

How Thousands of Citizen Readers Helped Build the Largest Open-Vocabulary Dataset of Narrative Emotions (txtlab.org). Open-vocabulary narrative emotion dataset built by 3,738 citizen readers on Zooniverse, who contributed 200k annotations across 43k passages, modeled with VAD and NRC emotions

Building a Bayesian Spam Classifier from First Principles (journal.hexmos.com). Bayesian spam filtering with Naive Bayes, Enron data, Python code, and CPT/Laplace smoothing explained
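
For readers who want the core idea before clicking through, here is a minimal sketch of a multinomial Naive Bayes spam classifier with Laplace (add-one) smoothing. It is not the post's code: it uses a tiny made-up corpus instead of the Enron data, and the helper names (`train`, `log_posterior`, `classify`) are hypothetical.

```python
import math
from collections import Counter

def train(docs):
    """docs: list of (text, label) pairs with labels 'spam' or 'ham'."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    doc_counts = Counter()
    for text, label in docs:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, doc_counts, vocab

def log_posterior(text, label, model, alpha=1.0):
    """Unnormalized log P(label | text) under the Naive Bayes independence assumption."""
    word_counts, doc_counts, vocab = model
    score = math.log(doc_counts[label] / sum(doc_counts.values()))  # log prior
    denom = sum(word_counts[label].values()) + alpha * len(vocab)   # Laplace-smoothed denominator
    for word in text.lower().split():
        if word not in vocab:
            continue  # words never seen in training carry no evidence
        score += math.log((word_counts[label][word] + alpha) / denom)
    return score

def classify(text, model):
    return max(("spam", "ham"), key=lambda label: log_posterior(text, label, model))

training = [
    ("win cash now", "spam"),
    ("cheap pills win big", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
]
model = train(training)
print(classify("win cheap cash", model))        # spam
print(classify("monday meeting notes", model))  # ham
```

Working in log space avoids underflow when many small word probabilities are multiplied; the smoothing term `alpha` keeps a single unseen-in-class word from zeroing out a class.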

⚖️ Fairness & Causality

Willful Incompetence: Questionable Modeling Practices in Implicit Bias Research (replicationindex.com). Schimmack critiques IAT practices, argues shared method variance inflates validity, promotes multimethod models and careful interpretation

Talk at the JFLI, at the NII (国立情報学研究所) in Tokyo (東京) (freakonometrics.hypotheses.org). Talk on counterfactual and transport-based methods for understanding indirect discrimination in algorithmic systems using causal reasoning and optimal transport

Is the Implicit Association Test Too Big To Fail? (replicationindex.com). IAT validity challenged; latent models, method variance, and adversarial collaboration analyzed using psychometric critiques

Actuarial Pricing Discrimination and Fairness (freakonometrics.hypotheses.org). Actuarial pricing, fairness concepts, proxies, and regulatory tensions in discrimination-aware insurance modeling

🤖 Agent Architectures

Titans + MIRAS: Helping AI have long-term memory (research.google). Titans and MIRAS enable long-term memory in AI with on-the-fly learning, surprise metrics, and deep memory modules

Glia: A Human-Inspired AI for Systems Design and Optimization (sigops.org). Glia uses a human-inspired multi-agent AI to autonomously design and optimize AI infrastructure, including LLM serving routers, batch schedulers, and autoscalers

Scaling Agentic Inference Across Heterogeneous Compute with Zain Asgar - #757 (twimlai.com). Gimlet Labs’ Zain Asgar discusses heterogeneous AI inference across diverse hardware including H100s, CPUs, and older GPUs, with a three-layer architecture

The Inverted Agent (jlowin.dev). SEP-1577 enables MCP sampling with tools, flipping agent architecture to server-driven control using FastMCP and structured outputs

🎯 Perception & Sensing

Traffic Modeling Using Machine Learning (calendar.perfplanet.com). Predicts lab-to-field LCP relationships using Python, pandas, XGBoost with synthetic CrUX data and engineered features

Measuring What Actually Matters in Real-Time ADAS Perception (blog.us.fixstars.com). Memory bandwidth and latency dominate real-time ADAS perception; highlights include edge compute constraints, dataflow optimization, and federated learning approaches

Grounding DINO: Open Vocabulary Object Detection on Videos (pyimagesearch.com). Open vocabulary object detection on videos using Grounding DINO with Hugging Face and Gradio

Augmented reality meets neuroendoscopy (cs.jhu.edu). Hopkins researchers enable real-time 3D neuroendoscopy navigation with R2D2-E, AI-based feature tracking, and augmented visualization

From Waveforms to Wisdom: The New Benchmark for Auditory Intelligence (research.google). Google Research introduces Massive Sound Embedding Benchmark (MSEB) to unify eight sound tasks, datasets like SVQ, and a robust evaluation framework

📡 Observability & Benchmarks

Five Things to Check Before You Trust AI-Generated Machine Learning Code (statisticalhorizons.com). Practical five-check framework for evaluating AI-generated ML code across parameters, tuning, splits, metrics, and interpretation

How I Simplified LLM Telemetry Using Dual-Destination Observability Without Performance Degradation (blog.mphomphego.co.za). Python-based dual-destination telemetry bridge using OpenTelemetry and Traceloop SDK for Instana and Langfuse with circuit breakers
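
The circuit-breaker part of that design can be illustrated with a small sketch. This is not the post's implementation and not part of the OpenTelemetry or Traceloop APIs; `CircuitBreaker`, `export_to_both`, and the sink callables are hypothetical stand-ins for exporters. Each destination gets its own breaker, so an outage at one backend stops costing time on every span while the other keeps receiving data.

```python
import time

class CircuitBreaker:
    """Stop calling a failing sink after max_failures consecutive errors; retry after cooldown seconds."""

    def __init__(self, max_failures=3, cooldown=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.cooldown = cooldown
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn, *args):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.cooldown:
                return None  # open: drop the call instead of blocking the caller
            self.opened_at = None                  # half-open: allow one trial call
            self.failures = self.max_failures - 1  # a single failure re-opens the circuit
        try:
            result = fn(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return None
        self.failures = 0  # any success closes the circuit fully
        return result

def export_to_both(span, sinks):
    """Send one span to every (send_fn, breaker) pair independently."""
    for send, breaker in sinks:
        breaker.call(send, span)
```

Injecting `clock` makes the breaker testable with a fake time source; in production the default `time.monotonic` avoids surprises from wall-clock adjustments.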

How to Benchmark C++ Code? (codspeed.io). A guide to benchmarking C++ performance with Google Benchmark, covering fixtures, parameterized benchmarks, and CI integration tips

A Guide to Web Application Monitoring (blog.saeloun.com). A practical guide to monitoring Rails API, React frontend, and PostgreSQL using metrics, logs, traces, and RUM

🛠️ Data Science Tools

Rogue Scholar is improving subject classification (Version 2) (blog.front-matter.de). OpenAlex/CWTS subject classification of Rogue Scholar posts using OpenAlex subfields and a machine learning classifier

Haskell IS a Great Language for Data Science (jcarroll.com.au). Haskell, dataHaskell, dataframe, and knitr integration showcase strong typing for data science workflows

QGIS to (Geo)Pandas – part 3 (anitagraser.com). QGIS to GeoPandas uses QgsArrowIterator to stream features as Arrow batches with Python GeoPandas integration

🗃️ Data Systems & Vectors

Product Quantization (arpitbhayani.me). Explains Product Quantization for compressing high-dimensional vectors, subspace coding, PQ codebooks, and distance computations with Python snippets
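
As a rough companion to the post, here is a pure-Python sketch of the general PQ recipe under arbitrary assumptions (8-dimensional random vectors, 4 subspaces, 16 centroids each); it is not the post's own snippet. Each subspace gets its own k-means codebook, each vector is stored as one centroid index per subspace, and query distances are computed asymmetrically (ADC) from per-query lookup tables.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means over a list of tuples; returns k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids

def split(vec, m):
    """Cut a D-dimensional vector into m contiguous subvectors of length D // m."""
    d = len(vec) // m
    return [tuple(vec[i * d:(i + 1) * d]) for i in range(m)]

def train_pq(data, m, k):
    """Learn one codebook of k centroids per subspace."""
    per_subspace = zip(*(split(v, m) for v in data))
    return [kmeans(list(sub), k) for sub in per_subspace]

def encode(vec, codebooks):
    """Replace each subvector with the index of its nearest centroid."""
    return [min(range(len(cb)), key=lambda c: sq_dist(sub, cb[c]))
            for sub, cb in zip(split(vec, len(codebooks)), codebooks)]

def distance_tables(query, codebooks):
    """Once per query: squared distance from each query subvector to every centroid."""
    return [[sq_dist(sub, c) for c in cb]
            for sub, cb in zip(split(query, len(codebooks)), codebooks)]

def adc_distance(tables, code):
    """Asymmetric distance: M table lookups instead of a full D-dimensional computation."""
    return sum(table[idx] for table, idx in zip(tables, code))

rng = random.Random(42)
data = [tuple(rng.gauss(0, 1) for _ in range(8)) for _ in range(200)]
codebooks = train_pq(data, m=4, k=16)  # 16 centroids per subspace, so each index fits in 4 bits
codes = [encode(v, codebooks) for v in data]
tables = distance_tables(data[0], codebooks)
best = min(range(len(codes)), key=lambda i: adc_distance(tables, codes[i]))
print(best)
```

Querying with a database vector returns that vector's own index, because its code picks the minimum entry of every lookup table. Real systems typically use 256 centroids per subspace (one byte per index) and vectorized k-means.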

A complete guide to vector search (redis.io). Vector search explained with encoding models, KNN to ANN, hybrid filtering, and hybrid search in a unified data platform context
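
The exact KNN baseline that ANN indexes approximate, plus a metadata pre-filter of the kind hybrid filtering builds on, fits in a few lines. This is a generic sketch, not Redis's API; the document layout (`id`, `vec`, `meta`) and the `where` predicate are hypothetical.

```python
import heapq
import math

def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def knn(query, docs, k, where=None):
    """Exact k-nearest neighbours by cosine similarity, with an optional metadata pre-filter."""
    candidates = (d for d in docs if where is None or where(d["meta"]))
    return heapq.nlargest(k, candidates, key=lambda d: cosine(query, d["vec"]))

docs = [
    {"id": "a", "vec": [1.0, 0.0], "meta": {"lang": "en"}},
    {"id": "b", "vec": [0.9, 0.1], "meta": {"lang": "fr"}},
    {"id": "c", "vec": [0.0, 1.0], "meta": {"lang": "en"}},
]
print([d["id"] for d in knn([1.0, 0.05], docs, k=2)])                                   # ['a', 'b']
print([d["id"] for d in knn([1.0, 0.05], docs, k=2, where=lambda m: m["lang"] == "en")])  # ['a', 'c']
```

This scan is O(N) per query; ANN indexes such as HNSW trade a little recall for sublinear search, which is why filtering strategy (before versus after the index lookup) becomes a real design question.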

Adaptive Query Optimizer for MariaDB Vector – Innovation Winner of MariaDB Python Hackathon 2025 (mariadb.org). Innovation winner: adaptive query optimizer for MariaDB Vector using Python in Hackathon 2025 with Aakanksha Singh and Mihir Phalke

Polars in Aggregate: Polars Cloud, Streaming engine, and New Data Types (pola.rs). Polars Cloud, streaming engine, and new Decimal and Int128 types empower scalable dataframes in Python and Rust

Apache Flink 2.2.0: Advancing Real-Time Data + AI and Empowering Stream Processing for the AI Era (flink.apache.org). Flink 2.2.0 advances real-time data processing with AI features, materialized tables, Delta Joins, improved connectors, and PyFlink support, enabling LLM inference and vector search

Branch, Test, Deploy: A Git-Inspired Approach for Data (ssp.sh). Git-like workflows for data: branching, zero-copy cloning, Prolly Trees, LakeFS, and Nessie to enable testing and deploying data pipelines

🖥️ Serving & Queues

Optimizing PyTorch Model Inference on CPU (towardsdatascience.com). CPU inference optimization on Intel Xeon with PyTorch 2.x, AMP, channels-last, IPEX, OpenVINO, and ONNX in a toy ResNet50 workflow

How to run Ollama with docker compose and GPU support (sleeplessbeastie.eu). GPU-enabled Ollama setup using Docker Compose for accelerated model inference with Nvidia devices

Trying out the Absurd queue for AI Workloads (leblancfg.com). Explores Absurd, a Postgres-based durable queue for AI workloads, with Python/TypeScript SDKs and a FastAPI demo by François Leblanc

🚀 GPU Training Pipelines

Accelerating Autonomous Driving Model Training on AMD ROCm™ Software (rocm.blogs.amd.com). ROCm-accelerated autonomous driving model training with awesome-rocm-autodrive, MMCV optimizations, NHWC, bmm reshaping, and MIOpen tuning on AMD GPUs

Decoding high-bandwidth memory: A practical guide to GPU memory for fine-tuning AI models (cloud.google.com). Guides memory-efficient fine-tuning on GPUs with PEFT, LoRA, quantization, FlashAttention, and multi-GPU strategies
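
A back-of-the-envelope calculator makes the full versus LoRA trade-off concrete. The byte counts are a commonly cited rule of thumb for mixed-precision Adam, not figures from the Google Cloud post, and the 7B model size and 1% trainable fraction are hypothetical. Activations, KV caches, framework overhead, and quantization constants are all ignored, so real usage will be higher.

```python
BYTES_PER_GB = 1e9

# Rule-of-thumb bytes per *trainable* parameter for mixed-precision Adam:
# bf16 weights (2) + bf16 gradients (2) + fp32 master weights (4) + Adam m and v (4 + 4).
TRAIN_BYTES_PER_PARAM = 16

def full_finetune_gb(n_params):
    """Every parameter is trainable, so every parameter pays the full training cost."""
    return TRAIN_BYTES_PER_PARAM * n_params / BYTES_PER_GB

def lora_gb(n_params, trainable_fraction, base_bytes_per_param=2.0):
    """Frozen base weights cost only their storage; adapter weights pay the training cost."""
    frozen = base_bytes_per_param * n_params
    adapters = TRAIN_BYTES_PER_PARAM * trainable_fraction * n_params
    return (frozen + adapters) / BYTES_PER_GB

n = 7e9  # hypothetical 7B-parameter model
print(f"full fine-tune:   {full_finetune_gb(n):.1f} GB")
print(f"LoRA, bf16 base:  {lora_gb(n, 0.01):.1f} GB")
print(f"LoRA, 4-bit base: {lora_gb(n, 0.01, base_bytes_per_param=0.5):.1f} GB")
```

The point of the comparison is that optimizer state and gradients, not the weights themselves, dominate full fine-tuning; PEFT methods shrink exactly those terms, and quantizing the frozen base shrinks what remains.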

DGL in Depth: SE(3)-Transformer on ROCm 7 (rocm.blogs.amd.com). SE(3)-Transformer runs efficiently with DGL on AMD ROCm 7/MI300X, exploring 3D graphs, equivariant attention, and cross-platform benchmarks

Introducing checkpointless and elastic training on Amazon SageMaker HyperPod (aws.amazon.com). SageMaker HyperPod introduces checkpointless and elastic training to accelerate AI model training

New serverless customization in Amazon SageMaker AI accelerates model fine-tuning (aws.amazon.com). Serverless customization in SageMaker AI enables fine-tuning of models like Llama, Qwen, and GPT-OSS with UI or code, automating deployment

🧵 Distributed Training Internals

Machine Learning (danieldk.eu). Explores Dish Activation, attention mechanisms, logits, quantization, and multi-GPU model parallelism with Tensor Parallelism on machine learning models

Support FSDP2 as A Training Backend for Miles (lmsys.org). Miles adds FSDP2 as a flexible training backend, enabling DTensor-based sharding, true on-policy training, data packing, and CP/DP optimization for Qwen3-Next and VLM RL

Writing an LLM from scratch, part 28 -- training a base model from scratch on an RTX 3090 (gilesthomas.com). Trains a GPT-2-style base model from scratch on an RTX 3090 using the FineWeb 10B sample via Hugging Face datasets; covers tokenization, token counts, and data scaling

Learning to love mesh-oriented sharding (blog.ezyang.com). DTensor and mesh-oriented sharding clash: open vs closed designs, extensibility with Placement, and implications for PyTorch and JAX in distributed ML

🧠 LLM Representations

Cross Layer Transcoders for the Qwen3 LLM Family (lesswrong.com). Explores sparse autoencoders and cross layer transcoders (CLTs) for Qwen3 LLMs with BLUELightAI’s CLT features and TDA tools

Concept Subspaces for Targeted Model Editing (gojiberries.io). Counterfactual concept subspaces for targeted model editing using activation directions, PCA, orthogonalization, and subspace-constrained adapters

Picking Optimal Token IDs (notes.hella.cheap). How to arrange token IDs with PCA-like ordering to maximize zero runs in sparse bit vectors

Interactively Visualizing the Qwen3 MoE Architecture (vkethana.com). Interactive Qwen3 MoE architecture visualizations explore grouped query attention, RMS normalization, and RoPE rotations
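
Two of the components the visualizations cover, RMS normalization and RoPE, are small enough to sketch directly. This is a generic pure-Python illustration, not Qwen3's code: the `base=10000.0` default comes from the original RoPE formulation (Qwen3 configures its own value), and this version rotates adjacent pairs, whereas some implementations pair dimension i with i + d/2, which is equivalent up to a permutation of dimensions.

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """RMSNorm: divide by the root-mean-square of x, then apply a learned per-dimension gain.
    Unlike LayerNorm there is no mean subtraction and no bias."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for v, w in zip(x, weight)]

def rope(x, pos, base=10000.0):
    """Rotate each adjacent pair (x[2i], x[2i+1]) by the angle pos * base**(-2i/d)."""
    d = len(x)
    out = list(x)
    for i in range(d // 2):
        theta = pos * base ** (-2 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i] = a * c - b * s
        out[2 * i + 1] = a * s + b * c
    return out

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

q = [0.3, -1.2, 0.8, 0.5]
k = [1.0, 0.4, -0.7, 0.2]
# The query-key score depends only on the relative offset (2 positions in both cases).
print(dot(rope(q, 5), rope(k, 3)), dot(rope(q, 12), rope(k, 10)))
```

The two printed scores agree up to floating-point error, which is the property that lets RoPE encode relative position inside ordinary dot-product attention.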

DeepSeek V3.2 (aarnphm.xyz). DeepSeek V3.2 introduces DeepSeek Sparse Attention (DSA) and FP8 indexing for efficient memory and FLOPs, detailing MHA/MQA training and inference, compressed caches, and Hadamard transforms

📐 Statistical Diagnostics

New Preprint: Model Checking for Vector Autoregressive Models (jmbh.github.io). Tutorial on VAR model checking with diagnostics, plots, simulations, and R-code for multilevel VAR in psychological time series

Notes - NLA MT25, Marchenko-Pastur theorem (ollybritton.com). Notes on the Marchenko-Pastur theorem for random matrices X: the limiting distribution of singular values and the resulting conditioning estimates
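
For reference, one common statement of the theorem is below; the notes may use a different scaling (for example, normalizing by m instead of n, or working with singular values directly).

```latex
% X is m x n with i.i.d. entries of mean 0 and variance \sigma^2,
% with m, n \to \infty and m/n \to \gamma \in (0, 1].
% The eigenvalues of W = \tfrac{1}{n} X X^\top have limiting density
p(x) = \frac{1}{2\pi \sigma^2 \gamma x} \sqrt{(\lambda_+ - x)(x - \lambda_-)},
\qquad x \in [\lambda_-, \lambda_+],
\qquad \lambda_\pm = \sigma^2 \left(1 \pm \sqrt{\gamma}\right)^2 .
% Hence the singular values of X / \sqrt{n} concentrate in
% [\sigma(1 - \sqrt{\gamma}),\ \sigma(1 + \sqrt{\gamma})], and for \gamma < 1
\kappa(X) \approx \frac{1 + \sqrt{\gamma}}{1 - \sqrt{\gamma}} .
```

The condition-number estimate is what makes the theorem useful in numerical linear algebra: a tall Gaussian matrix (small γ) is well conditioned with high probability, while a nearly square one is not.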

Data Science Notes: 1. Bland-Altman plots (hoyleanalytics.org). Rotating data to create Bland-Altman plots reveals reproducibility and bias patterns using Python (statsmodels) in a data science context
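
The numbers behind a Bland-Altman plot are simple to compute by hand. statsmodels provides `mean_diff_plot` for drawing it; the sketch below uses only the standard library, with made-up paired measurements, to show the quantities on each axis. Plotting (mean, difference) instead of (a, b) amounts to rotating the usual scatter by 45 degrees and rescaling the axes, which is the "rotation" the post refers to.

```python
import statistics

def bland_altman(a, b, z=1.96):
    """Bland-Altman statistics for paired measurements a[i], b[i] of the same quantity."""
    means = [(x + y) / 2 for x, y in zip(a, b)]  # x-axis of the plot
    diffs = [x - y for x, y in zip(a, b)]        # y-axis of the plot
    bias = statistics.mean(diffs)                # systematic offset between the two methods
    sd = statistics.stdev(diffs)                 # sample SD of the differences
    limits = (bias - z * sd, bias + z * sd)      # approximate 95% limits of agreement
    return means, diffs, bias, limits

method_a = [10.0, 12.0, 14.0, 16.0]
method_b = [9.0, 12.5, 13.0, 15.5]
means, diffs, bias, limits = bland_altman(method_a, method_b)
print(f"bias={bias:.3f}, limits of agreement=({limits[0]:.3f}, {limits[1]:.3f})")
```

A bias far from zero signals a systematic offset between methods; differences that widen as the mean grows suggest the disagreement is proportional rather than constant.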

Notes - NLA MT25, Gaussian random matrices (ollybritton.com). Gaussian random matrices: orthogonal invariance and basic properties of m x n matrices G with i.i.d. N(0,1) entries

📚 Academic Research

PystachIO: Efficient Distributed GPU Query Processing with PyTorch over Fast Networks & Fast Storage (arxiv:cs). Introduces PystachIO, a PyTorch-based distributed OLAP engine optimizing GPU, network, and NVMe utilization, offering up to 3x faster analytical queries on modern GPU clusters

Interaction Tensor Shap (arxiv:cs). IT-SHAP reformulates high-order Shapley interaction indices as tensor-network contractions, enabling polynomial-time computation of exact interaction attributions, scaling explainability to deep, high-dimensional models

Robust Tabular Foundation Models (arxiv:cs). Proposes Robust Tabular Foundation Models, adversarially adapting synthetic-data generators using an optimality-gap objective, improving TabPFN performance and robustness on diverse tabular benchmarks with limited pretraining

Hyperparameter Transfer Enables Consistent Gains of Matrix-Preconditioned Optimizers Across Scales (arxiv:cs). Analyzes how learning rate and weight decay should scale for matrix-preconditioned optimizers like Shampoo and Muon, enabling consistent speedups over AdamW across language-model sizes

Gradient Descent with Provably Tuned Learning-rate Schedules (arxiv:cs). Develops theory and algorithms for tuning learning-rate schedules and momentum in gradient descent on nonconvex, nonsmooth objectives, giving complexity guarantees applicable to neural network training

👋 Before you go...
I've got a big favor to ask - keeping Blaze running isn't expensive, but it does all add up, so I'm asking readers like you to help, if you can, by joining the Patreon page. Nothing flashy, just a way for folks who find value in these newsletters to chip in a little each month.
If you are getting value from Blaze, checking this out would mean the absolute world. But if you can't contribute, no worries - the newsletters keep coming either way. Thanks for reading and being part of this nerdy corner of the internet. All the best for the coming week - Alastair.
