The ML Engineer 21-01-2026

January 21, 2026

🔧 Company Engineering Blogs
Adapting the Facebook Reels RecSys AI Model Based on User Feedback (engineering.fb.com). Facebook Reels uses UTIS to align recommendations with true user interests via surveys, boosting niche content and engagement

Introducing OptiMind, a research model designed for optimization (huggingface.co). OptiMind, from Microsoft Research, transforms natural language optimization problems into solver-ready mathematical formulations and is available on Hugging Face for open-source exploration

Next generation medical image interpretation with MedGemma 1.5 and medical speech to text with MedASR (research.google). Google Research releases MedGemma 1.5 4B for high-dimensional medical imaging and MedASR for medical speech-to-text, plus a MedGemma Impact Challenge

Customizing multiturn AI agents with reinforcement learning (amazon.science). Reinforcement learning improves multiturn agent customization using environment simulators, ground-truth rewards, and small models with AppWorld and DeepSearch Agent

Paper Announcement: A Practical Approach to Replenishment Optimization with Extended (R, s, Q) Policy and Probabilistic Models (engineering.zalando.com). Zalando's ZEOS replenishment engine uses probabilistic forecasting, extended (R, s, Q) policy, and discrete event simulation to boost GMV

🧭 Careers & Open Source

Projections for Data-Related Roles in The AI Era (cesarsotovalero.net). Data systems ownership, production monitoring, and governance reshape data roles, shifting the focus from models to end-to-end systems, with Python and PSI drift monitoring

Stepping up as probabl’s CSO to supercharge scikit-learn and its ecosystem (gael-varoquaux.info). Gaël Varoquaux steps up as Probabl’s CSO to accelerate scikit-learn and its ecosystem using Python, open source, and enterprise tooling

AI model choices 2026-01 (kau.sh). Stable AI model choices for 2026, detailing tools, languages, and strategies Kaushik Gopal uses to achieve reliable results

🧪 Production ML Reality

Why Your ML Model Works in Training But Fails in Production (towardsdatascience.com). Explores production ML failures from time leaks, default signals, and population shifts with real-world fraud and payments examples

SE Radio 703: Sahaj Garg on Low Latency AI (se-radio.net). Low latency AI approaches, systems, and tools discussed by Sahaj Garg with insights on optimization and deployment

GPUs: Enterprise AI’s New Architectural Control Point (oreilly.com). GPU-bound enterprise AI shifts from elastic compute to constrained architecture, driving cost, latency, and governance considerations

From deployment slop to production reality: How BriX bridges the gap with enterprise-grade AI infrastructure (engineering.grab.com). BriX turns prototypes into production-grade AI tools with model switches, MCP data access, and enterprise-grade security

The race toward Confidential AI inference (gjolly.fr). Surveys confidential AI inference, covering AMD SEV-SNP/Intel TDX, Apple, Google Gemini, startups Confer and Tinfoil, Canonical, Kubernetes, TLS, and attestation

🏗️ Data Engineering Practice

AI Agents, Scaling Pipelines, and the Evolution of Python (blog.prokulski.science). AI agents, pipeline scaling, and Python evolution in 2026, covering generative AI, agentic AI, API security, the Model Context Protocol (MCP), SVM, anomaly detection, DuckDB, dbt-checkpoint, and Python/Geospatial tooling

Setting Up A Cluster of Tiny PCs For Parallel Computing - A Note To Myself (kenkoonwong.com). Setting up an Ubuntu cluster of tiny PCs with passwordless SSH, automated R package installs, and parallel simulations using multicore futures and TMLE in R

Apache Hudi™ at Uber: Engineering for Trillion-Record-Scale Data Lake Operations (hudi.apache.org). Uber shares engineering insights on building a trillion-record data lake using Apache Hudi, Spark, and Java, focusing on scalability and reliability

A Diary of a Data Engineer (ssp.sh). A veteran data engineer traces 50 years of evolution from BI to modern pipelines, emphasizing fundamentals, data modeling, and human impact

🧬 Semantics & Data Quality

Explainable unsupervised query tagging (emiruz.com). Explainable unsupervised query tagging using Python, pyEvidence, and OpenStreetMap gazetteers for England

Semantic Mappings Enable Automated Assembly (cthoyt.com). Semantic mappings unify heterogeneous vocabularies for knowledge graphs and lexical resources using SSSOM, JSKOS, SeMRA, and Biomappings

Spot check random samples of your data (anderspoirel.net). Spot check random samples of data using tablesample reservoir and DuckDB syntax to uncover inconsistencies across sources

🔎 Retrieval & Vector Search

VideoSummarizer: Reduced RAG for Video (Shots → Scenes → Evidence) (mostlylucid.net). VideoSummarizer uses reduced RAG for video with CLIP-based embeddings, deduplication, batching, and multi-signal scene assembly (shots → scenes → evidence) in lucidRAG

An introduction to XET, Hugging Face's storage system (part 2) (00f.net). Content-defined chunking with GearHash in XET, detailing rolling hashes, chunk boundaries, LZ4F compression, and bit/byte grouping for AI model weights

ClickHouse as a Vector Database (lorbic.com). ClickHouse can store embeddings and run vector search with HNSW indexes using SQL for semantic retrieval

📏 Evals & Benchmarks

What LLM benchmarks get wrong about measuring model performance (t-redactyl.io). Critical examination of LLM benchmarks, validity, and measurement issues in psychometrics for evaluating model performance

Evaluation (inkdroid.org). A practical guide to evaluating genAI against non-genAI systems, benchmarks, and different models, with benchmarking insights and recommendations

From AI Agent Prototype to Product: Lessons from Building AWS DevOps Agent (efekarakus.com). Four mechanisms for transforming AI agent prototypes into reliable products: evals, trajectory visualization, fast feedback loops, and production sampling using AWS DevOps Agent and OpenTelemetry tooling

🚀 GPU Training Systems

Technical Deep Dive: How DigitalOcean and AMD Delivered a 2x Production Inference Performance Increase for Character.ai (digitalocean.com). How DigitalOcean and AMD boosted Character.ai’s production inference throughput using MI300X/MI325X GPUs, DP1/TP8/EP8 and DP2/TP4/EP4 optimizations

Deep Dive into Primus: High-Performance Training for Large Language Models (rocm.blogs.amd.com). Unified Primus training framework with optimized backends (Megatron-LM, TorchTitan) for dense LLMs on AMD Instinct GPUs, featuring GEMM/FlashAttention tuning and AITER-based kernels

Pipeline Parallelism in SGLang: Scaling to Million-Token Contexts and Beyond (lmsys.org). SGLang's Chunked Pipeline Parallelism, Async P2P, and Dynamic Prefill boost ultra-long context inference across multi-node GPU clusters

Applying Compute Partitioning for Workloads on MI300X GPUs (rocm.blogs.amd.com). GPU compute partitioning on MI300X boosts throughput for GROMACS MD ensembles and REINVENT4 AI workflows using CPX mode with multiple partitions

🔧 Fine-Tuning Workflows

How to Fine-Tune Vision Models for Expert AI Results (cognitivetoday.com). Fine-tuning vision models with a 7-step blueprint, addressing domain shift, data prep, hyperparameters, and advanced strategies

Transform AI development with new Amazon SageMaker AI model customization and large-scale training capabilities (aws.amazon.com). SageMaker AI enables serverless model customization, elastic and checkpointless training, and MLflow observability for frontier models

Advanced fine-tuning techniques for multi-agent orchestration: Patterns from Amazon at scale (aws.amazon.com). Advanced fine-tuning techniques for agentic AI using SFT, PPO, DPO, GRPO, DAPO, GSPO with Amazon Bedrock and SageMaker across real-world Amazon cases

🧠 Model Ideas & Recsys

Parameters Are Like Pixels (lesswrong.com). Explores non-linear scaling, data quality, and mixture concepts in ML, using metaphors like pixels and parameters to discuss model performance

Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models – Deepseek strikes again (blog.quintarelli.it). Conditional memory via scalable lookup adds a new axis of sparsity to LLMs, using Engram and MoE tradeoffs for improved reasoning and long-context retrieval

Starting from scratch: Training a 30M Topological Transformer (tuned.org.uk). Training a 30M Tauformer: topological attention, Laplacian energy, domain memory, and 60k tokens/s on H100 with 5k steps

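[Notes] Generative Recommendation: OpenOneRec Technical Report (Kuaishou, 2026) (arthurchiao.art). A walkthrough of the OpenOneRec technical report: key implementation points of two-stage alignment, Itemic Tokens, and RecIF-Bench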

🔮 Forecasting Practice

Forecasting the Future: Time Series, Prophets, and Cross-Validation (jrogel.com). Practical forecasting with Prophet (Meta), time series cross-validation, and defensible evaluation for real-world deployments

ACX 2025 Prediction Contest Retrospective (entropicthoughts.com). Forecasting insights from ACX 2025 contest; Brier scores, questions, Vox comparison, and predictions across economics and tech events

Predicting Best Picture at the 2026 Academy Awards (markhw.com). Mark H. White II uses a probabilistic Best Picture model to forecast Oscars 2026 favorites and nomination risks

🛰️ Vision & Segmentation

Earth Observation on a Budget: Finding Solar Farms with a 42k-Parameter Model (toao.com). Finding solar farms with a 42k-parameter UNet using Tessera embeddings and OpenStreetMap/REPD data

Rolling your own serverless OCR in 40 lines of code (ckrapu.github.io). Serverless OCR with Modal in 40 lines using DeepSeek OCR, PyTorch, and FastAPI for batch PDF processing

Grounded SAM 2: From Open-Set Detection to Segmentation and Tracking (pyimagesearch.com). Grounded SAM 2 enables open-set detection, segmentation, and video tracking using Grounding DINO and SAM 2 with Python, Gradio, and OpenCV

Watershed Segmentation Using OpenCV (opencv.org). Learn watershed segmentation in OpenCV to separate touching objects using marker-based methods with preprocessing and Python code

🌲 Classical ML Math

Learning better decision tree splits - LLMs as Heuristics for Program Synthesis (mchav.github.io). Explores decision tree splits, program synthesis, and Haskell tooling for data science and feature generation

Science Discovery: The Advanced Matrix Factorization and Decomposition Jungle Page (nuit-blanche.blogspot.com). Explores advanced matrix factorization and decomposition topics with notes on CS, ML, MF, and references

Randomized SVD (leimao.github.io). Efficient approximation of SVD for large matrices using random projections, QR, and compact SVD, with Python/Matlab-like math exposition

Spectra smoothing based on information entropy (nirpyresearch.com). Smoothing spectra with information entropy using Savitzky-Golay parameters and delentropy criteria in Python

📚 Academic Research

Improved Algorithms for Fair Matroid Submodular Maximization (arxiv:cs). Microsoft Research and CMU propose better algorithms for fair submodular maximization under matroid constraints. Near-full fairness (1−ε) with constant approximation helps clustering, recommendation, and coverage tasks

Optimising for Energy Efficiency and Performance in Machine Learning (arxiv:cs). Cambridge introduces ECOpt, a hyperparameter tuner optimizing accuracy and energy, including inference cost. It exposes Pareto frontiers and finds greener CIFAR‑10 models across common hardware

A pipeline for enabling path-specific causal fairness in observational health data (arxiv:cs). Columbia and Oxford present a pipeline for path-specific causal fairness on EHR data. It robustly estimates direct/indirect effects and tests mitigation tradeoffs across clinical tasks

Evaluating the Ability of Explanations to Disambiguate Models in a Rashomon Set (arxiv:cs). Oxford researchers propose AXE to evaluate local feature-importance explanations without ground-truth labels. On-manifold kNN surrogates reveal misleading metrics and detect fairwashing attacks in Rashomon sets

Combinatorial Optimization Augmented Machine Learning (arxiv:cs). TUM, Inria and Institut Polytechnique Paris survey combinatorial-optimization-augmented machine learning. A unifying framework links prediction to decisions, highlighting algorithms, applications, and open research frontiers today
