The ML Engineer 23-12-2025

December 23, 2025

🧰 Production Engineering

The Machine Learning Engineer’s Checklist: Best Practices for Reliable Models (machinelearningmastery.com). Versioning, CI/CD pipelines, data quality, rigorous testing, robust deployment, monitoring, fairness, cost optimization, feedback loops, and strong engineering culture with tools like DVC, MLflow, Airflow, Kubeflow, Prometheus, SHAP, and Feast

2025 in review (veekaybee.github.io). Production-ready ML systems: tokenization, error handling, Dockerization, logging, and the human craft behind building in public

Scott Haines on the Future of Data Engineering (confessionsofadataguy.com). Scott Haines discusses the future of data engineering, cloud-native tooling, and practical approaches with Python, SQL, SQLAlchemy, and modern data stacks

DrP: Meta’s Root Cause Analysis Platform at Scale (engineering.fb.com). DrP automates root-cause investigations at Meta with an SDK-driven analyzer workflow, scalable backend, and alert-integrated post-processing

Making Claude Code Usage Observable (emsi.me). Observability stack for Claude Code using OpenTelemetry, Prometheus, Loki, and Grafana to track sessions, tokens, tool usage, and code changes

A Short History of Performance Engineering (calendar.perfplanet.com). Tracing performance engineering from mainframes to cloud, with tools like SMF, RMF, OMEGAMON, LoadRunner, JMeter, APM, and insights from SRE, WPO, and FinOps

🌍 ML in the Wild

Don’t mess with physics (the3dlab.org). Global soil temperature maps conflict with physics: depth-specific ML models show physically impossible patterns; a follow-up 2.0 aims for physics-consistent, higher-resolution maps

Hardware Store Marauder’s Map is Clarkian Magic (hackaday.com). A Marauder’s Map built from CCTV, 3D store mapping, a Jetson AGX, and NVIDIA DeepStream machine vision for tracking

A “scientific sandbox” lets researchers explore the evolution of vision systems (media.mit.edu). MIT researchers use an AI-driven sandbox to evolve vision systems, exploring sensor, neural, and environmental constraints with reinforcement learning

Eine [Studie von Nature Communications](https://www.nature.com/articles/s41467-025-67728-y) besch... (mikka.is). Portable device for frailty assessment using AI inference on high-rate gait biosignals; continuous 10-day operation; automated ward vital-sign alerts for early deterioration; passive long-COVID sensing with ML

SuperWing Dataset: Large-scale, Realistic Aerodynamics for AI & Machine Learning (ge.in.tum.de). Large-scale, realistic transonic wing aerodynamics dataset with 4,239 geometries and 28,856 RANS solutions for AI/ML modeling using Transformers

🗄️ Data Platforms & Query

Ostatni raz w tym roku (blog.prokulski.science). Practical data engineering and AI model practices using DuckDB, dbt, OCR, graph entities, and MLOps on GCP for data handling, security, and deployment

Inside the feature store powering real-time AI in Dropbox Dash (dropbox.tech). Dropbox Dash feature store combines Feast, Spark, and Dynovault with Go-based serving for sub-100ms real-time relevance signals

Accelerating Tree-Based Models in SQL with Orbital 0.3.0 (posit.co). Orbital 0.3.0 accelerates SQL-based tree models using Posit tools for R, Python, and SQL environments with centralized management and sharing capabilities

Deep dive on pruning (spatialists.ch). Deep-dive into pruning GeoParquet with SedonaDB, DuckDB, GeoPandas, GDAL, and Sedona Spark using Python and SQL

How much data do you really have? (natemeyvis.com). How data sizing misestimates affect system design, querying, and cost, with tips to reduce data size for better performance

🧩 Polyglot Workflows

Finally figured out a way to port python packages to R using uv and reticulate: example with nnetsauce (thierrymoudiki.github.io). Using uv and reticulate to port Python nnetsauce into R with examples and benchmarks

Learning to Rank with Clojure – how we run our ML pipelines reliably (otto.de). Clojure-based ML pipelines, core.async streaming, Protobuf schemas, and Polylith monorepo for stable, efficient learning-to-rank in OTTO's e-commerce search

Annotating the Literature with Named Entity Recognition (cthoyt.com). Demonstrates NER annotation of PubMed abstracts with MeSH using Python tools, pyobo grounders, and a simple uv run script

Machine Learning in Clojure with libpython‑clj: Unlocking Causal Insights Using Microsoft’s EconML [Series 3] (flexiana.com). Clojure + libpython-clj enable EconML causal inference, showing heterogeneous treatment effects beyond A/B testing

R Code Optimization IV: Practical Tools and Workflow (blasbenito.com). Practical profiling with profvis, benchmarking with microbenchmark and bench, and a structured optimization workflow in R

🧼 Data Quality Practice

Third Week in Data Science: The 94% Accurate Model That Was Actually Terrible (igorstechnoclub.com). A journey through model evaluation traps, data preprocessing, KNN insights, and threshold tuning using Python

Peter Piper on the Four Ps of AI Data Quality: Purge, Patch, Push Back, or Pass (datakitchen.io). Four practical options—Pass, Purge, Patch, Push Back—for handling raw data in AI pipelines, with TestGen-powered quality profiling

Do a sanity check on your experiments (ehudreiter.com). Sanity checks on data, model outputs, and evaluation to detect bugs in NLP/AI experiments

DataSummarizer: Fast Local Data Profiling (mostlylucid.net). Fast local data profiling with DuckDB, deterministic profiles, drift and synthetic data cloner, plus optional LLM narration in a CLI

The Ostrich Problem: Your Data Team Thinks Their Job Ends at Deployment. (datakitchen.io). How to fix the deploy-and-forget mindset with surveys, error counts, dashboards, and TestGen data quality tooling

⚙️ Scaling & Performance

Using Horovod for Distributed Training (nas.nasa.gov). Horovod enables distributed training with TensorFlow and PyTorch on NAS using MPI, set_vars.sh, and PBS scripts

Why Stochastic Rounding is Essential for Modern Generative AI (cloud.google.com). Stochastic rounding enables low-precision training on Google Cloud TPUs and A4X GPUs with JAX and Qwix for INT4/INT8/FP8 formats

Mini-SGLang: Efficient Inference Engine in a Nutshell (lmsys.org). Mini-SGLang offers a lightweight, OpenAI-compatible LLM inference engine with Radix Attention and Tensor Parallelism implemented in a ~5k-line Python codebase

A Step-by-Step Walkthrough of Decentralized LLM Training on AMD GPUs (rocm.blogs.amd.com). Decentralized LLM training on AMD MI300 GPUs using DiLoCo and Prime to scale across geographically distributed clusters

Overview of Intel VTune Analysis (nas.nasa.gov). Overview of VTune Profiler analysis types for NAS systems, including sampling, event-based sampling, driverless collection, and command-line usage with vtune

VRAM vs System RAM: What Actually Limits Running LLMs Locally? (dewanahmed.com). VRAM vs system RAM in local LLMs: how GPU memory and host memory shape feasibility and performance with Qwen3-Next-style models

📊 Statistical Learning

Experiments to understand Singular Learning Theory's Free Energy & Local Learning Coefficient (LLC) (lesswrong.com). Explores Singular Learning Theory's free energy and LLC through SGLD-MCMC experiments, grokking, polynomials, and low-rank nets, with Python/Numpy-style analysis

Construcción de intervalos de confianza para gráficos de calibración vía "bootstrap" y algunos asuntos más (datanalytics.com). Bootstrap confidence intervals for calibration plots, plus related topics in statistics and ML, using R and Python

Trying to fit a logistic curve (johndcook.com). Fitting a logistic curve from left-tail data with Python's SciPy may fail or be imprecise

Elo rating systems via Markov Chains (xianblog.wordpress.com). Explores Elo ratings via Markov Chains, Bradley–Terry–Luce models, spectral gap optimization, SGD updates, and Bayesian ranking discussions

a (sunny, crisp) day at ICSDS 2025 (xianblog.wordpress.com). Bayesian learning sessions at ICSDS 2025 in Xi’an; proper prior minimaxity, variational inference, DIC, AI priors, martingale prediction, and urn-based math discussed by George, Margossian, Christensen, Rockova, Ng, Cappello, Ghiglietti

Predicting survival using a super learner and right-censored data (aliceinstatisticsland.wordpress.com). Survival analysis with a super learner using right-censored data in R (survivalSL, flexsurv, glmnet) and methods like randomForestSRC and survival neural networks

Tracking animals in an underwater maze (methodsblog.com). Movement ecology of flapper skate using state-space models, particle filtering, and Wahoo.jl GPU convolution in an R&D pipeline

📚 Academic Research

Provably Extracting the Features from a General Superposition (arxiv:cs). Berkeley proves an efficient query algorithm to extract nonlinear features from ridge-function superpositions via Fourier searches. Gives identifiability guarantees that sharpen interpretability and auditing practice

TorchTraceAP: A New Benchmark Dataset for Detecting Performance Anti-Patterns in Computer Vision Models (arxiv:cs). UCI and Meta release TorchTraceAP, 600+ PyTorch traces labeled with performance anti-pattern regions across vision models and hardware. Enables training tools that automate profiling workflows

gridfm-datakit-v1: A Python Library for Scalable and Realistic Power Flow and Optimal Power Flow Data Generation (arxiv:cs). IBM Research and partners open-source gridfm-datakit, a Python toolkit generating realistic power-flow and optimal-power-flow datasets up to 30k buses. Accelerates graph ML for energy systems

Controllable Probabilistic Forecasting with Stochastic Decomposition Layers (arxiv:cs). NCAR introduces Stochastic Decomposition Layers, injecting learned multi-scale noise into weather models to produce calibrated ensembles with retraining cost. Latent control improves interpretability and deployment

Quantitative Verification of Fairness in Tree Ensembles (arxiv:cs). UEC Tokyo and AIST propose quantitative fairness verification for tree ensembles, estimating counterexample proportions and regions with efficient anytime upper/lower bounds. Improves bias diagnosis significantly
