The ML Engineer 13-01-2026

January 13, 2026

🔧 Company Engineering Blogs

Why We Use Separate Tech Stacks for Personalization and Experimentation (engineering.atspotify.com). Spotify separates ML personalization stacks from experimentation, using contextual bandits within ML and a separate Confidence experimentation toolchain

Scaling Sales Agents: Engineering Next-Gen AI for the Enterprise Era (engineering.salesforce.com). How Salesforce re-architected a single-agent Engagement Agent into a dispatcher-driven multi-agent system using queues, prioritization, and quotas

Small Yet Mighty: Improve Accuracy In Multimodal Search and Visual Document Retrieval with Llama Nemotron RAG Models (huggingface.co). NVIDIA Nemotron VL models enable accurate, low-latency multimodal retrieval for PDFs and images using vector databases

NeuralGCM harnesses AI to better simulate long-range global precipitation (research.google). NeuralGCM blends physics with AI, trained on NASA satellite data, to improve global precipitation forecasts and diurnal cycles

Building a Global, Event-Driven Platform: Our Ongoing Journey, Part 1 (vinted.engineering). Global, event-driven platform shift: from monoliths to DDD-guided domain boundaries, events, and Saga-like orchestration at Vinted

🧭 Engineering Careers Craft

What a Platform Engineer Is (and why it’s basically DevOps with a sharper purpose) (chris.funderburg.me). Platform Engineering blends DevOps with a product mindset, expanding to data, ML, and AI infrastructure using IaC, CI/CD, Kubernetes, and observability

Startups I Didn't Start: Pipeline Optimizer (faingezicht.com). Explores pipeline optimization in enterprise sales using queuing theory, ROI, and early AI decisioning within RevOps and data engineering

The 1000 commits problem (davekiss.com). AI-assisted velocity outpaces changelog drift, parser assumptions, and release processes across Claude Code, with tools like Deploycast and Driftless tackling automation

Misc engineering truisms (macwright.com). Truisms across JavaScript, math, maps, and engineering, via Tom MacWright’s reflections and examples

🏗️ Serving & ML Platforms

Speed meets scale: Load testing SageMakerAI endpoints with Observe.AI’s testing tool (aws.amazon.com). Load testing SageMaker endpoints with Observe.AI OLAF, using Locust, Docker, and AWS STS to optimize performance and costs

How to choose the right open model for production (together.ai). A practical guide to selecting open-source AI models for production, covering evaluation, benchmarking, licensing, and deployment options

Writing an LLM from scratch, part 29 -- using DistributedDataParallel to train a base model from scratch in the cloud (gilesthomas.com). DistributedDataParallel training of a base model in the cloud using PyTorch, DDP setup, and checkpointing on AWS-like GPUs

AI workflow orchestration: why separate platforms fail (ciq.com). Unified AI workflows integrate training and inference, reducing manual handoffs and accelerating iteration across models and deployments

vLLM Quickstart: High-Performance LLM Serving (glukhov.org). High-throughput LLM serving with vLLM: OpenAI-compatible API, PagedAttention, CUDA optimization, and multi-GPU setups

Why Going Beyond Your Own Hardware to Host AI Might Make Sense (matthiasroder.com). Hosting AI on a server turns models into infrastructure, using Hugging Face for models, tooling, and scaling

🧼 Data Quality Governance

La qualité des données, un problème systémique en apprentissage automatique (ohmybox.info). Data quality in machine learning shapes model performance and governance, with bias, data sparsity, and LLM-scale data challenges

Data quality, a systemic problem in machine learning (ohmybox.info). Data quality challenges in machine learning, from acquisition to validation, biases, sparsity, and the impact on LLMs and SLMs

The “All You Need” Fallacy (zwillgen.com). Five intertwined fallacies in AI—benchmarking, data fairness, Western safety norms, testing, and overreliance on LLMs

Don’t train on this data or what’s a canary string? (stuker.com). Discusses training data leakage, canary strings, and excluding content from LLM pretraining to protect benchmarks and studies

🧱 Data Pipelines Storage

Optimizing data throughput for Postgres snapshots with batch size auto-tuning (xata.io). Automatic batch size tuning boosts Postgres snapshots with pgstream for networked environments and real-world throughput

Polars Pipe Operator in Action. (confessionsofadataguy.com). Explores clean data pipelines in Polars using pipe operator for readable, testable transformations in Python

Preview of dynamical.org Icechunk Zarrs are now listed on the Registry of Open Data on AWS! (dynamical.org). Icechunk Zarrs listed on AWS Registry of Open Data; overview of IFS ENS, GFS, HRRR usage with Icechunk

From Chaos to Scale: Templatizing Spark Declarative Pipelines with DLT-META (databricks.com). Metadata-driven metaprogramming automates Spark Declarative Pipelines with DLT-META for scalable, governance-backed data workflows

A World Without Kafka (ververica.com). Kafka limitations for real-time analytics are explored, with Apache Fluss and Flink as modern alternatives for unified streaming storage

🔁 Python-R Interop

rtopy: an R to Python bridge – novelties (thierrymoudiki.github.io). rtopy enables translating R to Python via an enhanced RBridge and call_r, showcasing SVM with e1071, forecast ARIMA, randomForest, dplyr, and clustering examples in Python

Using neural networks in R is still not obsolete in 2026 (joshuamarie.com). Explores R's neural networks in 2026, focusing on torch, tidymodels, and the kindling package for streamlined deep learning in R

From data to reports missing the potholes (colinpaice.blog). Python processing of z/OS datasets with Pandas, dicts vs. lists, ASCII conversion, and handling mixed record types

⚙️ GPU Performance Engineering

Optimizing Data Transfer in Batched AI/ML Inference Workloads (towardsdatascience.com). In-depth profiling with NVIDIA Nsight Systems on GPU-to-CPU data transfer in PyTorch, using Deeplabv3/ResNet-50 on AWS EC2 g6e with nvtx annotations

Using Gradient Boosting Libraries on MI300X for Financial Risk Prediction (rocm.blogs.amd.com). GPU-accelerated LightGBM and ThunderGBM on MI300X with ROCm for fraud detection and loan-default risk prediction

PyTorch CUDA Graph Capture (leimao.github.io). Overview of PyTorch CUDA Graphs using torch.cuda.graph and torch.cuda.make_graphed_callables to train an MLP model, with profiling and replay

High-Resolution Weather Forecasting with StormCast on AMD Instinct GPU Accelerators (rocm.blogs.amd.com). High-Resolution weather forecasting with StormCast on AMD Instinct GPUs using Earth2Studio, Python, and ROCm, exploring convection, HRRR data, and visualization

20,000 healthy GPUs (modal.com). Modal describes monitoring 20,000+ GPUs across AWS, GCP, Azure, and OCI with passive/active checks, DCGM, and image automation

📱 Edge Model Deployment

Fine-tuning vision-language models on memory-constrained devices (amazon.science). Hybrid sharpness-aware zeroth-order optimization (SharpZO) enables edge devices to fine-tune vision-language models using forward passes, improving accuracy and convergence

ONNX Multi-platform benchmark (karnwong.me). ONNX multi-platform benchmark compares inference across PC, Android, and Raspberry Pi using forest regression models in Python, Rust, Web, Kotlin, and Flutter

A 30B Qwen model walks into a Raspberry Pi and runs in real time (byteshape.com). ByteShape optimizes Qwen3-30B-A3B-Instruct-2507 for Raspberry Pi, CPUs, and GPUs using ShapeLearn to maximize TPS and quality

Accelerating LLM and VLM Inference for Automotive and Robotics with NVIDIA TensorRT Edge-LLM (developer.nvidia.com). NVIDIA TensorRT Edge-LLM enables real-time LLM/VLM inference on automotive/robotics platforms using Jetson Thor and DRIVE AGX Thor with EAGLE-3 decoding

State of Edge AI on Microcontrollers in 2026 (shawnhymel.com). Edge AI on microcontrollers (TinyML) matures with vendor pipelines, open runtimes, and end-to-end platforms guiding practical deployments

🔎 Vector Retrieval Scaling

HNSW at Scale: Why Your RAG System Gets Worse as the Vector Database Grows (towardsdatascience.com). HNSW-based ANN retrieval, recall, and RAG scaling using Python Faiss, LAION data, and CLIP embeddings

Semantic Search Without Embeddings (softwaredoug.com). Explores semantic search beyond embeddings, comparing simple feature-based similarity, taxonomy-driven retrieval, and LLM-augmented categorization

Using MLflow-RAGAS Integration Without Tracing (safjan.com). Guides static-data and simple predict_fn approaches to evaluate RAG pipelines with MLflow RAGAS using Python

Filtered ANN Search With Composite Vector Indexes (couchbase.com). Filtered ANN search using composite vector indexes with Couchbase: combining filters, vectors, and performance techniques in practical examples

🖼️ Multimodal Production Pipelines

Image to 3D Mesh Generation with Detection Grounding (debuggercafe.com). Image to 3D mesh generation with detection grounding using Qwen3-VL, BiRefNet, and Hunyuan3D demonstrated in a local, VRAM-aware pipeline

Image Summarizer : A Constrained Fuzzy Image RAG Engine (mostlylucid.net). Constrained Fuzziness for image analysis: a deterministic RAG pipeline in C#, Python-like pseudocode, with wave architecture and Vision LLM gating

Part 4.1: The Three-Tier OCR Pipeline (mostlylucid.net). Constrained Fuzzy OCR uses a three-tier pipeline (Tesseract, Florence-2 ONNX, Vision LLM) with ONNX locally, multi-frame voting, text-only filmstrip optimization, and cost-aware routing

Annotating Edge Cases in Visual Category Datasets (learningspiral.ai). Annotating edge cases in visual datasets improves model robustness and real-world AI performance across domains

🎲 Bayesian Bandits Decisions

Taming P99s in OpenFGA: How We Built a Self-Tuning Strategy Planner (auth0.com). Self-tuning strategy planner using Thompson Sampling reduces P99 latency in OpenFGA for multi-tenant environments with Bayesian priors and adaptive strategy selection

Bayesian Decision Analysis (allendowney.com). Bayesian decision analysis with PyMC: A hands-on workshop on A/B testing, uncertainty, and adaptive methods

Decaying Bayesian Updating for Non-stationary Time Series Models (austinrochford.com). Decaying Bayesian updating for non-stationary time series using beta-binomial models in Python to adapt to changing p

📏 Measurement Under Uncertainty

The Hidden Math That Shapes Our Health: 5 Surprising Truths from Data Science (federicagazzelloni.com). Five health-data truths, DALYs, TMREL, AI in epidemics, risk perception, and spatial Kriging mapping in public health

Why Benchmarks Fail in Analog Systems (blog.mycal.net). Analog-style AI: benchmarks fail for context-sensitive LLMs; evaluate under constraints rather than single scores, and embrace variability inspired by weather forecasting

Improve quality by learning from your process data – a review of “Twenty Things You Need to Know” by Donald J. Wheeler (testandanalysis.home.blog). Review of Twenty Things You Need to Know by Wheeler, using process behaviour charts and Assignable Causes to analyze variation in quality data

🧮 Practical Algorithms Structures

Les filtres de Bloom dans Parquet (icem7.fr). Bloom filters in Parquet, data skipping, DuckDB, Sirene dataset, performance trade-offs in columnar storage

Reversing YouTube's Most Replayed (priyavr.at). Interactive exploration of YouTube's Most Replayed graph, using JS, SVG, Bezier curves, and a difference array approach

sorted string tables (SST) from first principles (williballenthin.com). Builds up SST data structures from first principles, covering algorithmic design and implementation considerations for readers

🧠 Theory Representations

Theoretical predictions on the sample efficiency of training policies and activation monitors (lesswrong.com). Theoretical sample-efficiency predictions for training policies and activation monitors in AI risk, with SGD, VC theory, and activation monitors

What Does the Linear Representation Hypothesis Even Mean? (alok.github.io). Explores what a 'linear representation' means in AI, comparing word analogies, logistic probes, and steering vectors

Five Research Papers Accepted to FOCS 2025 (cs.columbia.edu). Columbia researchers present FOCS 2025 papers on learning k-term DNFs, NN search embeddings, generalized flow, cell-probe lower bounds, and Kronecker circuit theory

📚 Academic Research

Implicit bias as a Gauge correction: Theory and Inverse Design (arxiv:stat). Reframes SGD implicit bias as a geometric gauge correction arising from parameter symmetries, yielding a closed-form stationary preference. Helps engineers systematically predict or inverse-design biases such as sparsity

How to Set the Learning Rate for Large-Scale Pre-training? (arxiv:cs). Derives a scaling law to extrapolate optimal learning rates from cheap runs and tests μTransfer for very large MoE pretraining. Offers practical LR guidelines for large-scale training

XBTorch: A Unified Framework for Modeling and Co-Design of Crossbar-Based Deep Learning Accelerators (arxiv:cs). XBTorch adds PyTorch-integrated simulation for analog crossbar accelerators, enabling device-to-model co-design, hardware-aware training, and fault studies. Useful for energy/latency engineers, reproducible research, and prototyping pipelines

Rapid Augmentations for Time Series (RATS): A High-Performance Library for Time Series Augmentation (arxiv:cs). RATS is a Rust-backed time-series augmentation library with Python bindings, delivering major speed and memory wins over tsaug. Well suited to production augmentation pipelines and to benchmarking

Scalable Ultra-High-Dimensional Quantile Regression with Genomic Applications (arxiv:stat). FS-QRPPA makes penalized quantile regression scalable when p≫n via feature-splitting proximal updates with proven Q-linear convergence. Enables genomic prediction intervals using the parallel fsQRPPA R package
