The ML Engineer 23-12-2025
🧰 Production Engineering
The Machine Learning Engineer’s Checklist: Best Practices for Reliable Models (machinelearningmastery​.com). Versioning, CI/CD pipelines, data quality, rigorous testing, robust deployment, monitoring, fairness, cost optimization, feedback loops, and strong engineering culture with tools like DVC, MLflow, Airflow, Kubeflow, Prometheus, SHAP, and Feast
2025 in review (veekaybee​.github​.io). Production-ready ML systems: tokenization, error handling, Dockerization, logging, and the human craft behind building in public
Scott Haines on the Future of Data Engineering (confessionsofadataguy​.com). Scott Haines discusses the future of data engineering, cloud-native tooling, and practical approaches with Python, SQL, SQLAlchemy, and modern data stacks
DrP: Meta’s Root Cause Analysis Platform at Scale (engineering​.fb​.com). DrP automates root-cause investigations at Meta with an SDK-driven analyzer workflow, scalable backend, and alert-integrated post-processing
Making Claude Code Usage Observable (emsi​.me). Observability stack for Claude Code using OpenTelemetry, Prometheus, Loki, and Grafana to track sessions, tokens, tool usage, and code changes
A Short History of Performance Engineering (calendar​.perfplanet​.com). Tracing performance engineering from mainframes to cloud, with tools like SMF, RMF, OMEGAMON, LoadRunner, JMeter, APM, and insights from SRE, WPO, and FinOps
🌍 ML in the Wild
Don’t mess with physics (the3dlab​.org). Global soil-temperature maps conflict with physics: depth-specific ML models produce physically impossible patterns. A follow-up (version 2.0) aims for physics-consistent, higher-resolution maps
Hardware Store Marauder’s Map is Clarkian Magic (hackaday​.com). Marauder’s Map via CCTV, 3D store mapping, Jetson AGX, NVIDIA DeepStream, machine vision for tracking
A “scientific sandbox” lets researchers explore the evolution of vision systems (media​.mit​.edu). MIT researchers use an AI-driven sandbox to evolve vision systems, exploring sensor, neural, and environmental constraints with reinforcement learning
A [study in Nature Communications](https://www.nature.com/articles/s41467-025-67728-y) ... (mikka​.is). Portable device for frailty assessment using high-rate gait biosignals and AI inference; continuous 10-day operation; automated ward vital-sign alerts for early deterioration; passive long-COVID sensing with ML
SuperWing Dataset: Large-scale, Realistic Aerodynamics for AI & Machine Learning (ge​.in​.tum​.de). Large-scale, realistic transonic wing aerodynamics dataset with 4,239 geometries and 28,856 RANS solutions for AI/ML modeling using Transformers
🗄️ Data Platforms & Query
One last time this year (blog​.prokulski​.science). Practical data engineering and AI model practices using DuckDB, dbt, OCR, graph entities, and MLOps on GCP for data handling, security, and deployment
Inside the feature store powering real-time AI in Dropbox Dash (dropbox​.tech). Dropbox Dash feature store combines Feast, Spark, and Dynovault with Go-based serving for sub-100ms real-time relevance signals
Accelerating Tree-Based Models in SQL with Orbital 0.3.0 (posit​.co). Orbital 0.3.0 accelerates SQL-based tree models using Posit tools for R, Python, and SQL environments with centralized management and sharing capabilities
Deep dive on pruning (spatialists​.ch). Deep-dive into pruning GeoParquet with SedonaDB, DuckDB, GeoPandas, GDAL, and Sedona Spark using Python and SQL
How much data do you really have? (natemeyvis​.com). How data sizing misestimates affect system design, querying, and cost, with tips to reduce data size for better performance
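The pruning deep-dive above rests on a simple idea: each Parquet row group carries summary statistics, so a reader can skip any row group whose bounding box cannot intersect the query window. Below is a minimal sketch in plain Python. It assumes hypothetical per-row-group bounding boxes. The `BBox` class and the `prune_row_groups` helper are illustrative only and are not the API of SedonaDB, DuckDB, GeoPandas, or GDAL.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    """Axis-aligned bounding box (hypothetical stand-in for row-group bbox stats)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def intersects(self, other: "BBox") -> bool:
        # Two boxes overlap unless one lies entirely to one side of the other.
        return not (
            self.xmax < other.xmin or other.xmax < self.xmin
            or self.ymax < other.ymin or other.ymax < self.ymin
        )


def prune_row_groups(stats: list[BBox], query: BBox) -> list[int]:
    """Return indices of row groups that *may* contain rows inside the query box."""
    return [i for i, bbox in enumerate(stats) if bbox.intersects(query)]


# Hypothetical statistics for three row groups.
row_groups = [
    BBox(0, 0, 10, 10),
    BBox(10, 0, 20, 10),
    BBox(50, 50, 60, 60),
]
query = BBox(5, 5, 12, 8)
print(prune_row_groups(row_groups, query))  # [0, 1]
```

The filter is conservative. Row groups that survive still need an exact geometry test on each row, but the third row group is never read from disk.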
🧩 Polyglot Workflows
Finally figured out a way to port python packages to R using uv and reticulate: example with nnetsauce (thierrymoudiki​.github​.io). Using uv and reticulate to port Python nnetsauce into R with examples and benchmarks
Learning to Rank with Clojure – how we run our ML pipelines reliably (otto​.de). Clojure-based ML pipelines, core.async streaming, Protobuf schemas, and Polylith monorepo for stable, efficient learning-to-rank in OTTO's e-commerce search
Annotating the Literature with Named Entity Recognition (cthoyt​.com). Demonstrates NER annotation of PubMed abstracts with MeSH terms using Python tools, pyobo grounders, and a simple uv run script
Machine Learning in Clojure with libpython‑clj: Unlocking Causal Insights Using Microsoft’s EconML [Series 3] (flexiana​.com). Clojure + libpython-clj enable EconML causal inference, showing heterogeneous treatment effects beyond A/B testing
R Code Optimization IV: Practical Tools and Workflow (blasbenito​.com). Practical profiling with profvis, benchmarking with microbenchmark and bench, and a structured optimization workflow in R
🧼 Data Quality Practice
Third Week in Data Science: The 94% Accurate Model That Was Actually Terrible (igorstechnoclub​.com). A journey through model evaluation traps, data preprocessing, KNN insights, and threshold tuning using Python
Peter Piper on the Four Ps of AI Data Quality: Purge, Patch, Push Back, or Pass (datakitchen​.io). Four practical options—Pass, Purge, Patch, Push Back—for handling raw data in AI pipelines, with TestGen-powered quality profiling
Do a sanity check on your experiments (ehudreiter​.com). Sanity checks on data, model outputs, and evaluation to detect bugs in NLP/AI experiments
DataSummarizer: Fast Local Data Profiling (mostlylucid​.net). Fast local data profiling with DuckDB, deterministic profiles, drift and synthetic data cloner, plus optional LLM narration in a CLI
The Ostrich Problem: Your Data Team Thinks Their Job Ends at Deployment. (datakitchen​.io). How to fix the deploy-and-forget mindset with surveys, error counts, dashboards, and TestGen data quality tooling
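The "94% accurate model" trap is easy to reproduce. On imbalanced labels, a model that always predicts the majority class scores high accuracy while catching none of the positives. Here is a minimal sketch using hypothetical labels (6 positives out of 100). These are illustrative numbers, not the data from the article.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels encoded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def recall(y_true, y_pred):
    """Fraction of actual positives the model catches (0.0 if there are none)."""
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if tp + fn else 0.0


# Hypothetical imbalanced labels: 6 positives out of 100 examples.
y_true = [1] * 6 + [0] * 94
# A "model" that always predicts the majority class.
y_pred = [0] * 100

print(accuracy(y_true, y_pred))  # 0.94
print(recall(y_true, y_pred))    # 0.0
```

Checking recall (or precision, or a confusion matrix) next to accuracy exposes the problem immediately. It is also the natural starting point for the threshold tuning the article discusses.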
⚙️ Scaling & Performance
Using Horovod for Distributed Training (nas​.nasa​.gov). Horovod enables distributed training with TensorFlow and PyTorch on NAS using MPI, set_vars.sh, and PBS scripts
Why Stochastic Rounding is Essential for Modern Generative AI (cloud​.google​.com). Stochastic rounding enables low-precision training on Google Cloud TPUs and A4X GPUs with JAX and Qwix for INT4/INT8/FP8 formats
Mini-SGLang: Efficient Inference Engine in a Nutshell (lmsys​.org). Mini-SGLang offers a lightweight, OpenAI-compatible LLM inference engine with Radix Attention and Tensor Parallelism implemented in a ~5k-line Python codebase
A Step-by-Step Walkthrough of Decentralized LLM Training on AMD GPUs (rocm​.blogs​.amd​.com). Decentralized LLM training on AMD MI300 GPUs using DiLoCo and Prime to scale across geographically distributed clusters
Overview of Intel VTune Analysis (nas​.nasa​.gov). Overview of VTune Profiler analysis types for NAS systems, including sampling, event-based sampling, driverless collection, and command-line usage with vtune
VRAM vs System RAM: What Actually Limits Running LLMs Locally? (dewanahmed​.com). VRAM vs system RAM in local LLMs: how GPU memory and host memory shape feasibility and performance with Qwen3-Next-style models
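Why stochastic rounding matters for low-precision training, in one toy example: with round-to-nearest, an update smaller than half the grid spacing is rounded away every time, so the accumulator never moves. Stochastic rounding rounds up with probability equal to the fractional remainder, which makes the rounded value unbiased in expectation. The sketch below is scalar and stdlib-only. A grid spacing of 1.0 stands in for the coarse spacing of a low-precision format. It is not the TPU/GPU, JAX, or Qwix implementation from the post.

```python
import math
import random


def round_nearest(x: float, step: float) -> float:
    """Deterministic round-to-nearest onto a grid with spacing `step`."""
    return round(x / step) * step


def round_stochastic(x: float, step: float, rng: random.Random) -> float:
    """Round down or up at random so that the expected result equals x."""
    scaled = x / step
    lower = math.floor(scaled)
    frac = scaled - lower  # in [0, 1): probability of rounding up
    return (lower + (1 if rng.random() < frac else 0)) * step


def accumulate(update: float, steps: int, step: float, rounder) -> float:
    """Repeatedly add a small update to an accumulator stored on the coarse grid."""
    acc = 0.0
    for _ in range(steps):
        acc = rounder(acc + update, step)
    return acc


rng = random.Random(0)
update, grid, n = 0.1, 1.0, 10_000
exact = update * n
nearest = accumulate(update, n, grid, round_nearest)
stochastic = accumulate(update, n, grid, lambda x, s: round_stochastic(x, s, rng))
print(exact, nearest, stochastic)
```

Round-to-nearest ends at 0.0 because every 0.1 update is swamped. The stochastic accumulator lands near the exact total of 1000, with random error on the order of a few tens.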
📊 Statistical Learning
Experiments to understand Singular Learning Theory's Free Energy & Local Learning Coefficient (LLC) (lesswrong​.com). Explores Singular Learning Theory's free energy and LLC through SGLD-MCMC experiments, grokking, polynomials, and low-rank nets, with Python/Numpy-style analysis
Constructing confidence intervals for calibration plots via bootstrap, and a few other matters (datanalytics​.com). Bootstrap confidence intervals for calibration plots, plus related topics in statistics and ML, using R and Python
Trying to fit a logistic curve (johndcook​.com). Fitting a logistic curve from left-tail data with Python's SciPy may fail or be imprecise
Elo rating systems via Markov Chains (xianblog​.wordpress​.com). Explores Elo ratings via Markov Chains, Bradley–Terry–Luce models, spectral gap optimization, SGD updates, and Bayesian ranking discussions
a (sunny, crisp) day at ICSDS 2025 (xianblog​.wordpress​.com). Bayesian learning sessions at ICSDS 2025 in Xi’an; proper prior minimaxity, variational inference, DIC, AI priors, martingale prediction, and urn-based math discussed by George, Margossian, Christensen, Rockova, Ng, Cappello, Ghiglietti
Predicting survival using a super learner and right-censored data (aliceinstatisticsland​.wordpress​.com). Survival analysis with a super learner using right-censored data in R (survivalSL, flexsurv, glmnet) and methods like randomSRC and survival neural networks
Tracking animals in an underwater maze (methodsblog​.com). Movement ecology of flapper skate using state-space models, particle filtering, Wahoo.jl GPU convolution in R&D pipeline
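For the Elo item above, the classic update is easy to connect to Bradley–Terry models. Elo's expected score is a base-10 logistic function of the rating gap. The update `K * (score - expected)` is therefore a stochastic gradient step on the Bradley–Terry log-likelihood of a single game, with learning rate `K * scale / ln 10`. Below is a minimal sketch using the common defaults `K = 32` and `scale = 400`. These are conventional choices, not values taken from the post.

```python
import math


def expected_score(r_a: float, r_b: float, scale: float = 400.0) -> float:
    """Elo / Bradley-Terry probability that player A beats player B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / scale))


def elo_update(r_a: float, r_b: float, score_a: float, k: float = 32.0):
    """Return new (r_a, r_b) after one game; score_a is 1 (win), 0.5 (draw), or 0 (loss)."""
    delta = k * (score_a - expected_score(r_a, r_b))
    return r_a + delta, r_b - delta


def bt_log_likelihood(r_a: float, r_b: float, score_a: float, scale: float = 400.0) -> float:
    """Log-likelihood of one game outcome under the base-10 Bradley-Terry model."""
    p = expected_score(r_a, r_b, scale)
    return score_a * math.log(p) + (1 - score_a) * math.log(1 - p)


print(elo_update(1500.0, 1500.0, 1.0))  # (1516.0, 1484.0)
```

Differentiating `bt_log_likelihood` with respect to `r_a` gives `(ln 10 / scale) * (score_a - p)`. Elo's step is that gradient scaled by `K * scale / ln 10`, which is the SGD view the post refers to.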
📚 Academic Research
Provably Extracting the Features from a General Superposition (arxiv:cs). Berkeley proves an efficient query algorithm to extract nonlinear features from ridge-function superpositions via Fourier searches. Gives identifiability guarantees that sharpen interpretability and auditing practice
TorchTraceAP: A New Benchmark Dataset for Detecting Performance Anti-Patterns in Computer Vision Models (arxiv:cs). UCI and Meta release TorchTraceAP, 600+ PyTorch traces labeled with performance anti-pattern regions across vision models and hardware. Enables training tools that automate profiling workflows
gridfm-datakit-v1: A Python Library for Scalable and Realistic Power Flow and Optimal Power Flow Data Generation (arxiv:cs). IBM Research and partners open-source gridfm-datakit, a Python toolkit generating realistic power-flow and optimal-power-flow datasets up to 30k buses. Accelerates graph ML for energy systems
Controllable Probabilistic Forecasting with Stochastic Decomposition Layers (arxiv:cs). NCAR introduces Stochastic Decomposition Layers, injecting learned multi-scale noise into weather models to produce calibrated ensembles at low retraining cost. Latent control improves interpretability and deployment
Quantitative Verification of Fairness in Tree Ensembles (arxiv:cs). UEC Tokyo and AIST propose quantitative fairness verification for tree ensembles, estimating counterexample proportions and regions with efficient anytime upper/lower bounds. Improves bias diagnosis significantly