The ML Engineer 13-01-2026
🔧 Company Engineering Blogs
Why We Use Separate Tech Stacks for Personalization and Experimentation (engineering.atspotify.com). Spotify separates ML personalization stacks from experimentation, using contextual bandits within ML and a separate Confidence experimentation toolchain
Scaling Sales Agents: Engineering Next-Gen AI for the Enterprise Era (engineering.salesforce.com). How Salesforce re-architected a single-agent Engagement Agent into a dispatcher-driven multi-agent system using queues, prioritization, and quotas
Small Yet Mighty: Improve Accuracy In Multimodal Search and Visual Document Retrieval with Llama Nemotron RAG Models (huggingface.co). NVIDIA Nemotron VL models enable accurate, low-latency multimodal retrieval for PDFs and images using vector databases
NeuralGCM harnesses AI to better simulate long-range global precipitation (research.google). NeuralGCM blends physics with AI, trained on NASA satellite data, to improve global precipitation forecasts and diurnal cycles
Building a Global, Event-Driven Platform: Our Ongoing Journey, Part 1 (vinted.engineering). Global, event-driven platform shift: from monoliths to DDD-guided domain boundaries, events, and Saga-like orchestration at Vinted
🧭 Engineering Careers Craft
What a Platform Engineer Is (and why it’s basically DevOps with a sharper purpose) (chris.funderburg.me). Platform Engineering blends DevOps with a product mindset, expanding to data, ML, and AI infrastructure using IaC, CI/CD, Kubernetes, and observability
Startups I Didn't Start: Pipeline Optimizer (faingezicht.com). Explores pipeline optimization in enterprise sales using queuing theory, ROI, and early AI decisioning within RevOps and data engineering
The 1000 commits problem (davekiss.com). AI-assisted velocity outpaces changelog drift, parser assumptions, and release processes across Claude Code, with tools like Deploycast and Driftless tackling automation
Misc engineering truisms (macwright.com). Truisms across JavaScript, math, maps, and engineering, via Tom MacWright’s reflections and examples
🏗️ Serving & ML Platforms
Speed meets scale: Load testing SageMaker AI endpoints with Observe.AI’s testing tool (aws.amazon.com). Load testing SageMaker endpoints with Observe.AI OLAF, using Locust, Docker, and AWS STS to optimize performance and costs
How to choose the right open model for production (together.ai). A practical guide to selecting open-source AI models for production, covering evaluation, benchmarking, licensing, and deployment options
Writing an LLM from scratch, part 29 -- using DistributedDataParallel to train a base model from scratch in the cloud (gilesthomas.com). DistributedDataParallel training of a base model in the cloud using PyTorch, DDP setup, and checkpointing on AWS-like GPUs
AI workflow orchestration: why separate platforms fail (ciq.com). Unified AI workflows integrate training and inference, reducing manual handoffs and accelerating iteration across models and deployments
vLLM Quickstart: High-Performance LLM Serving (glukhov.org). High-throughput LLM serving with vLLM: OpenAI-compatible API, PagedAttention, CUDA optimization, and multi-GPU setups
Why Going Beyond Your Own Hardware to Host AI Might Make Sense (matthiasroder.com). Hosting AI on a server turns models into infrastructure, using Hugging Face for models, tooling, and scaling
🧼 Data Quality Governance
La qualité des données, un problème systémique en apprentissage automatique (ohmybox.info). Data quality in machine learning shapes model performance and governance, with bias, data sparsity, and LLM-scale data challenges
Data quality, a systemic problem in machine learning (ohmybox.info). Data quality challenges in machine learning, from acquisition to validation, biases, sparsity, and the impact on LLMs and SLMs
The “All You Need” Fallacy (zwillgen.com). Five intertwined fallacies in AI—benchmarking, data fairness, Western safety norms, testing, and overreliance on LLMs
Don’t train on this data or what’s a canary string? (stuker.com). Discusses training data leakage, canary strings, and excluding content from LLM pretraining to protect benchmarks and studies
🧱 Data Pipelines Storage
Optimizing data throughput for Postgres snapshots with batch size auto-tuning (xata.io). Automatic batch size tuning boosts Postgres snapshots with pgstream for networked environments and real-world throughput
Polars Pipe Operator in Action. (confessionsofadataguy.com). Explores clean data pipelines in Polars using pipe operator for readable, testable transformations in Python
Preview of dynamical.org Icechunk Zarrs are now listed on the Registry of Open Data on AWS! (dynamical.org). Icechunk Zarrs listed on AWS Registry of Open Data; overview of IFS ENS, GFS, HRRR usage with Icechunk
From Chaos to Scale: Templatizing Spark Declarative Pipelines with DLT-META (databricks.com). Metadata-driven metaprogramming automates Spark Declarative Pipelines with DLT-META for scalable, governance-backed data workflows
A World Without Kafka (ververica.com). Kafka limitations for real-time analytics are explored, with Apache Fluss and Flink as modern alternatives for unified streaming storage
🔁 Python-R Interop
rtopy: an R to Python bridge – novelties (thierrymoudiki.github.io). rtopy enables translating R to Python via an enhanced RBridge and call_r, showcasing SVM with e1071, forecast ARIMA, randomForest, dplyr, and clustering examples in Python
Using neural networks in R is still not obsolete in 2026 (joshuamarie.com). Explores R's neural networks in 2026, focusing on torch, tidymodels, and the kindling package for streamlined deep learning in R
From data to reports missing the potholes (colinpaice.blog). Python processing of z/OS datasets with Pandas, dicts vs. lists, ASCII conversion, and handling mixed record types
⚙️ GPU Performance Engineering
Optimizing Data Transfer in Batched AI/ML Inference Workloads (towardsdatascience.com). In-depth profiling with NVIDIA Nsight Systems on GPU-to-CPU data transfer in PyTorch, using Deeplabv3/ResNet-50 on AWS EC2 g6e with nvtx annotations
Using Gradient Boosting Libraries on MI300X for Financial Risk Prediction (rocm.blogs.amd.com). GPU-accelerated LightGBM and ThunderGBM on MI300X with ROCm for fraud detection and loan-default risk prediction
PyTorch CUDA Graph Capture (leimao.github.io). Overview of PyTorch CUDA Graphs using torch.cuda.graph and torch.cuda.make_graphed_callables to train an MLP model, with profiling and replay
High-Resolution Weather Forecasting with StormCast on AMD Instinct GPU Accelerators (rocm.blogs.amd.com). High-Resolution weather forecasting with StormCast on AMD Instinct GPUs using Earth2Studio, Python, and ROCm, exploring convection, HRRR data, and visualization
20,000 healthy GPUs (modal.com). Modal describes monitoring 20,000+ GPUs across AWS, GCP, Azure, and OCI with passive/active checks, DCGM, and image automation
📱 Edge Model Deployment
Fine-tuning vision-language models on memory-constrained devices (amazon.science). Hybrid sharpness-aware zeroth-order optimization (SharpZO) enables edge devices to fine-tune vision-language models using forward passes, improving accuracy and convergence
ONNX Multi-platform benchmark (karnwong.me). ONNX multi-platform benchmark compares inference across PC, Android, and Raspberry Pi using forest regression models in Python, Rust, Web, Kotlin, and Flutter
A 30B Qwen model walks into a Raspberry Pi and runs in real time (byteshape.com). ByteShape optimizes Qwen3-30B-A3B-Instruct-2507 for Raspberry Pi, CPUs, and GPUs using ShapeLearn to maximize TPS and quality
Accelerating LLM and VLM Inference for Automotive and Robotics with NVIDIA TensorRT Edge-LLM (developer.nvidia.com). NVIDIA TensorRT Edge-LLM enables real-time LLM/VLM inference on automotive/robotics platforms using Jetson Thor and DRIVE AGX Thor with EAGLE-3 decoding
State of Edge AI on Microcontrollers in 2026 (shawnhymel.com). Edge AI on microcontrollers (TinyML) matures with vendor pipelines, open runtimes, and end-to-end platforms guiding practical deployments
🔎 Vector Retrieval Scaling
HNSW at Scale: Why Your RAG System Gets Worse as the Vector Database Grows (towardsdatascience.com). HNSW-based ANN retrieval, recall, and RAG scaling using Python Faiss, LAION data, and CLIP embeddings
Semantic Search Without Embeddings (softwaredoug.com). Explores semantic search beyond embeddings, comparing simple feature-based similarity, taxonomy-driven retrieval, and LLM-augmented categorization
Using MLflow-RAGAS Integration Without Tracing (safjan.com). Guides static-data and simple predict_fn approaches to evaluate RAG pipelines with MLflow RAGAS using Python
Filtered ANN Search With Composite Vector Indexes (couchbase.com). Filtered ANN search using composite vector indexes with Couchbase: combining filters, vectors, and performance techniques in practical examples
🖼️ Multimodal Production Pipelines
Image to 3D Mesh Generation with Detection Grounding (debuggercafe.com). Image to 3D mesh generation with detection grounding using Qwen3-VL, BiRefNet, and Hunyuan3D demonstrated in a local, VRAM-aware pipeline
Image Summarizer : A Constrained Fuzzy Image RAG Engine (mostlylucid.net). Constrained Fuzziness for image analysis: a deterministic RAG pipeline in C#, Python-like pseudocode, with wave architecture and Vision LLM gating
Part 4.1: The Three-Tier OCR Pipeline (mostlylucid.net). Constrained Fuzzy OCR uses a three-tier pipeline (Tesseract, Florence-2 ONNX, Vision LLM) with ONNX locally, multi-frame voting, text-only filmstrip optimization, and cost-aware routing
Annotating Edge Cases in Visual Category Datasets (learningspiral.ai). Annotating edge cases in visual datasets improves model robustness and real-world AI performance across domains
🎲 Bayesian Bandits Decisions
Taming P99s in OpenFGA: How We Built a Self-Tuning Strategy Planner (auth0.com). Self-tuning strategy planner using Thompson Sampling reduces P99 latency in OpenFGA for multi-tenant environments with Bayesian priors and adaptive strategy selection
Bayesian Decision Analysis (allendowney.com). Bayesian decision analysis with PyMC: A hands-on workshop on A/B testing, uncertainty, and adaptive methods
Decaying Bayesian Updating for Non-stationary Time Series Models (austinrochford.com). Decaying Bayesian updating for non-stationary time series using beta-binomial models in Python to adapt to changing p
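As a rough illustration of the decaying beta-binomial idea mentioned above, here is a minimal Python sketch. It assumes one common formulation (exponentially discounting the Beta pseudo-counts before each update), which is not necessarily the one used in the linked post; the function names, the decay value of 0.9, and the batch data are all hypothetical.

```python
def decayed_beta_update(alpha, beta, successes, failures, decay=0.9):
    """Discount the old pseudo-counts by `decay`, then add the new batch.

    With decay=1.0 this is the standard conjugate Beta-Binomial update.
    """
    alpha = decay * alpha + successes
    beta = decay * beta + failures
    return alpha, beta


def posterior_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Hypothetical data: 20 batches with p near 0.8, then 20 batches with p near 0.2.
batches = [(8, 2)] * 20 + [(2, 8)] * 20

decayed = (1.0, 1.0)  # uniform Beta(1, 1) prior
static = (1.0, 1.0)
for s, f in batches:
    decayed = decayed_beta_update(*decayed, s, f, decay=0.9)
    static = decayed_beta_update(*static, s, f, decay=1.0)

decayed_mean = posterior_mean(*decayed)  # tracks the recent regime
static_mean = posterior_mean(*static)    # averages over both regimes
```

In this toy run the decayed estimate moves toward the recent success rate, while the undecayed estimate stays at the average of both regimes. The decay factor trades responsiveness against variance and would need tuning on real data.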
📏 Measurement Under Uncertainty
The Hidden Math That Shapes Our Health: 5 Surprising Truths from Data Science (federicagazzelloni.com). Five health-data truths, DALYs, TMREL, AI in epidemics, risk perception, and spatial Kriging mapping in public health
Why Benchmarks Fail in Analog Systems (blog.mycal.net). Analog-style AI: benchmarks fail for context-sensitive LLMs; evaluate under constraints rather than single scores, and embrace variability inspired by weather forecasting
Improve quality by learning from your process data – a review of “Twenty Things You Need to Know” by Donald J. Wheeler (testandanalysis.home.blog). Review of Twenty Things You Need to Know by Wheeler, using process behaviour charts and Assignable Causes to analyze variation in quality data
🧮 Practical Algorithms Structures
Les filtres de Bloom dans Parquet (icem7.fr). Bloom filters in Parquet, data skipping, DuckDB, Sirene dataset, performance trade-offs in columnar storage
Reversing YouTube's Most Replayed (priyavr.at). Interactive exploration of YouTube's Most Replayed graph, using JS, SVG, Bezier curves, and a difference array approach
sorted string tables (SST) from first principles (williballenthin.com). SST data structures, first principles, algorithmic design, and implementation considerations explored with readers for efficient string table management
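For readers unfamiliar with the difference array technique mentioned in this section, here is a minimal Python sketch. It is a generic illustration, not code from the linked post; the function name and the sample intervals (imagined as replayed segments of a timeline) are hypothetical.

```python
from itertools import accumulate


def coverage_counts(length, intervals):
    """Count how many half-open intervals [start, end) cover each position.

    Each interval costs O(1): +1 at its start, -1 at its end.
    A single prefix sum then recovers the per-position counts.
    """
    diff = [0] * (length + 1)
    for start, end in intervals:
        diff[start] += 1
        diff[end] -= 1
    # Drop the sentinel slot at index `length` before summing.
    return list(accumulate(diff[:-1]))


# Hypothetical replay segments over a 6-unit timeline.
counts = coverage_counts(6, [(0, 3), (2, 5), (2, 4)])
print(counts)  # [1, 1, 3, 2, 1, 0]
```

The total cost is O(n + k) for n positions and k intervals, compared with O(n * k) for incrementing every covered position directly.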
🧠 Theory Representations
Theoretical predictions on the sample efficiency of training policies and activation monitors (lesswrong.com). Theoretical sample-efficiency predictions for training policies and activation monitors in AI risk, with SGD, VC theory, and activation monitors
What Does the Linear Representation Hypothesis Even Mean? (alok.github.io). Explores what a 'linear representation' means in AI, comparing word analogies, logistic probes, and steering vectors
Five Research Papers Accepted to FOCS 2025 (cs.columbia.edu). Columbia researchers present FOCS 2025 papers on learning k-term DNFs, NN search embeddings, generalized flow, cell-probe lower bounds, and Kronecker circuit theory
📚 Academic Research
Implicit bias as a Gauge correction: Theory and Inverse Design (arxiv:stat). Reframes SGD implicit bias as a geometric gauge correction arising from parameter symmetries, yielding a closed-form stationary preference. Helps engineers systematically predict or inverse-design biases such as sparsity
How to Set the Learning Rate for Large-Scale Pre-training? (arxiv:cs). Derives scaling law to extrapolate optimal learning rates from cheap runs and tests μTransfer for huge MoE pretraining. Offers practical LR guidelines for large training
XBTorch: A Unified Framework for Modeling and Co-Design of Crossbar-Based Deep Learning Accelerators (arxiv:cs). XBTorch adds PyTorch-integrated simulation for analog crossbar accelerators, enabling device-to-model co-design, hardware-aware training, and fault studies. Useful for energy/latency engineers, reproducible research and prototyping pipelines
Rapid Augmentations for Time Series (RATS): A High-Performance Library for Time Series Augmentation (arxiv:cs). RATS is a Rust-backed time-series augmentation library with Python bindings, delivering major speed and memory wins over tsaug. Useful for production augmentation pipelines and for benchmarking
Scalable Ultra-High-Dimensional Quantile Regression with Genomic Applications (arxiv:stat). FS-QRPPA makes penalized quantile regression scalable when p≫n via feature-splitting proximal updates with proven Q-linear convergence. Enables genomic prediction intervals using parallel fsQRPPA R package