The ML Engineer 25-11-2025

November 25, 2025

ML infrastructure, geospatial projects, MLOps & production

Conferences
Announcing AI in Production 2026: A New Conference for AI and ML Practitioners (jumpingrivers.com). New AI in Production 2026 conference focuses on practical AI/ML deployment, with streams on engineering and model development, workshops, and real-world lessons  

🔧 Company Engineering Blogs
Efficient Optimization With Ax, an Open Platform for Adaptive Experimentation (engineering.fb.com). Ax 1.0 open-source platform uses Bayesian optimization (BoTorch) for adaptive experimentation across AI models and systems, with Meta-driven real-world deployments and multi-objective, constrained optimization  
LyftLearn Evolution: Rethinking ML Platform Architecture (eng.lyft.com). LyftLearn evolves to a hybrid architecture, using SageMaker for offline training and Kubernetes for online serving, with compatibility layers and AWS integration  
Background Coding Agents: Context Engineering (Part 2) (engineering.atspotify.com). Context engineering for autonomous coding agents using Claude Code, prompts, and limited tools to migrate large codebases at Spotify  
Zoomer: Powering AI Performance at Meta’s Scale Through Intelligent Debugging and Optimization (engineering.fb.com). Meta's Zoomer automates AI performance profiling and optimization across training and inference with Kineto, DCGM, Strobelight, and dyno telemetry  
Evolving GitHub Copilot’s next edit suggestions through custom model training (github.blog). NES next-edit suggestions in GitHub Copilot trained with custom data, RL, and in-editor UX for VS Code  

🛠️ ML Platform Infrastructure
How Upgrading to Databricks Runtime 16.4 sped up our Python script by 10x (blog.devgenius.io). Databricks Runtime 16.4 upgrade speeds up a large Python script by 10x, revealing runtime bottlenecks and logging issues  
TPU Monitoring in MLFlow Part 2 (nmilosev.svbtle.com). A cleaner approach to monitoring Google TPU performance in MLflow using a background multiprocessing process  
Announcing the Public Preview of Distributed ML on Serverless and Standard Clusters (databricks.com). Distributed ML on serverless and standard Databricks clusters using Spark MLlib, Optuna, MLflow Spark, and Joblib Spark with FGAC and Lakeguard  
HyperPod enhances ML infrastructure with security and storage (aws.amazon.com). SageMaker HyperPod adds customer managed key (CMK) encryption and the EBS CSI driver for scalable, secure foundation model (FM) training on EKS, with AMI and storage options  

🛰️ Geospatial ML Projects
Quick change detection with geospatial embeddings (fnands.com). Quick change detection with geospatial embeddings using Google Earth Engine, Python-like code, and Elastic Embeddings for Berlin 2017–2024  
Antenna: AI-Powered Entomology (escience.washington.edu). Antenna uses AI-powered camera traps and scalable ML pipelines to study entomology with a pull-based backend across multi-GPU/CPU systems  
Deploy geospatial agents with Foursquare Spatial H3 Hub and Amazon SageMaker AI (aws.amazon.com). Deploy geospatial agents using Foursquare Spatial H3 Hub and SageMaker AI to enable nontechnical experts to answer complex spatial questions  

🧰 Projects & Tutorials
Machines Reading Latin Epigraphy (electricarchaeology.ca). Retraining a spaCy model on Latin inscriptions using EDH data, with synthetic-augmented training and LLM-assisted annotation  
Grace Blackwell Desktop Supercomputer: First Impressions (jasoneckert.github.io). Grace Blackwell desktop AI supercomputer with GB10, GNOME on Linux, SSH headless setup, JupyterLab, and DGX Dashboard demonstrations  
Two New tidymodels Packages (tidyverse.org). Two new tidymodels packages, filtro and important, introduce feature selection tools for tidymodels workflows in R  
Cross-Modal Embeddings: Bridging AI Modalities (glukhov.org). Cross-modal embeddings, CLIP, ImageBind, and contrastive learning for unified text, image, and audio representations in Python  
TIL #137 – Inline SVGs in Jupyter notebooks (mathspp.com). Inline SVG rendering in Jupyter notebooks using Markdown data URLs and SVG markup techniques  

🔧 MLOps & Production
DySec: Is a Python package Actually a Hacker Trap? (nocomplexity.com). Explores DySec, an ML-based dynamic analysis approach for detecting malware in PyPI packages, along with ML/AI tooling and security reviews  
Bridging Vibe Coding to Production with MCP (anna.kiwi). Explores MCP-driven tooling to bridge vibe coding to production using Chrome DevTools MCP, Context7, and Playwright with Anna McPhee  
Go Microservices for AI/ML Orchestration (glukhov.org). Go microservices for AI/ML orchestration using patterns like event-driven choreography, centralized orchestration, CQRS, and SAGAs with Go, Temporal, gRPC, and Dockerized Python models  
Cooking with DiSE (Part 2): Graduated Apprenticeships - Training Workflows to Run Without a Safety Net (mostlylucid.net). Graduated Apprenticeships for AI workflows: from supervised to independent, with tiered monitoring, drift detection, and cost-saving evolution  

🧪 ML Science & Research
Explainable Transfer Learning with Residual Attention BiLSTM for Prognosis of Ischemic Heart Disease. [version 3; peer review: 1 approved, 2 approved with reservations] (f1000research.com). Explainable transfer learning with a residual attention BiLSTM for ischemic heart disease (IHD) prognosis, using SHAP and fairness reweighting on the UCI Heart Disease dataset  
🚀 Additional Funding for High-Dimensional Time Series Research (johanneslederer.com). High-dimensional time series research funding, stability, and stationarity, featuring DFG support and Hamburg collaboration  
AI tool can analyse complex cancer images rapidly – offering potential to personalise treatment (cam.ac.uk). AI tool SMMILe rapidly analyzes cancer tissue slides, maps tumor subtypes, and estimates spatial distribution to aid personalised treatment  
How SoraChain AI Uses EigenCompute to Enable Global Federated Learning Infra That Actually Scales (blog.eigencloud.xyz). SoraChain AI uses EigenCompute's verifiable offchain TEEs for global federated learning with privacy-preserving, multi-client orchestration  

⚡ Training & Optimization
Pretraining at home: 20B tokens from 222 hours to 12 (hackbot.dad). How to pretrain a 1B Llama-3.2 model on 20B tokens in under 12 hours using BF16, Flash Attention, and distributed data parallel training  
Introducing Miles — RL Framework To Fire Up Large-Scale MoE Training (lmsys.org). Miles enables large-scale MoE reinforcement learning for enterprise workloads, building on slime with improved on-policy, memory, and production-grade features  
Unified FP8: Moving Beyond Mixed Precision for Stable and Accelerated MoE RL (lmsys.org). Unified FP8 enables end-to-end FP8 training and sampling in RL for MoE models, boosting stability and speed with TE integration  

📐 Math & Linear Algebra
Ten Recent Questions for ChatGPT (gilkalai.wordpress.com). Ten questions for ChatGPT span combinatorics, topology, representation theory, and travel ideas, with AI-informed discussion and personal anecdotes  
Manifold Learning: A Friendly Guide to Understanding High-Dimensional Magic (howtolearnmachinelearning.com). Explains manifold learning, t-SNE, UMAP, Isomap, LLE, PCA, with applications in visualization and embeddings  
A New Bridge Links the Strange Math of Infinity to Computer Science (quantamagazine.org). Descriptive set theory meets computer science as Bernshteyn links infinite graph coloring to Lebesgue-measurable algorithms  
Notes - NLA MT25, Power method (ollybritton.com). Power method notes cover shifted inverse method, Rayleigh quotient iteration, eigenvalue convergence rates, and symmetry advantages  
Notes - NLA MT25, Eigenvalue problems (ollybritton.com). Overview of generalized eigenvalue problems, power method, QR algorithm, and related NLA MT25 topics in numerical linear algebra  
Linklog | Understanding Isomap (alechelbling.com). Visual intro to Isomap: graph-based geodesic distances, epsilon and kNN graphs, MDS embedding, PCA equivalence, and a discussion of limitations  
Linear Regression with Pseudo-Inverse Training Via QR-Householder Using C# (jamesmccaffreyblog.com). Linear regression with pseudo-inverse training via QR-Householder in C# demonstrated on synthetic data  
Matrix Pseudo-Inverse Using QR (Householder) With C# (jamesmccaffreyblog.com). Moore-Penrose pseudo-inverse via QR (Householder) in C# with a demo and NumPy validation  

📚 Academic Research
Training Foundation Models on a Full-Stack AMD Platform: Compute, Networking, and System Design (arxiv:cs). First large-scale MoE pretraining on AMD MI300X/Pollara, with microbenchmarks, sizing rules, and a ZAYA1 case study. Engineers get actionable system and model design guidance for non‑NVIDIA stacks  
SilverTorch: A Unified Model-based System to Democratize Large-Scale Recommendation on GPUs (arxiv:cs). SilverTorch replaces CPU indexing with GPU-native Bloom indexing and fused Int8 ANN kernels to serve large recommender fleets, cutting latency and cost dramatically. Practical blueprint for scalable real‑world serving  
iLTM: Integrated Large Tabular Model (arxiv:cs). iLTM is a pretrained tabular foundation model unifying tree embeddings, hypernetworks, MLPs, and retrieval, beating GBDTs across heterogeneous datasets. Important for engineers tackling tabular transfer and production ML  
MicroMoE: Fine-Grained Load Balancing for Mixture-of-Experts with Token Scheduling (arxiv:cs). Introduces MicroEP token scheduling and MicroMoE system to achieve near-optimal per-microbatch load balance in MoE training, improving throughput up to ~47.6%. Essential for engineers optimizing large MoE pipelines  
Learning with Statistical Equality Constraints (arxiv:cs). Provides generalization theory and a practical algorithm for equality‑constrained learning, avoiding brittle penalty tuning. Valuable to mathematically minded ML engineers working on constrained optimization and fairness

👋 Before you go...
I've got a big favor to ask - keeping Blaze running isn't expensive, but it does all add up, so I'm asking readers like you to help, if you can, by joining the Patreon page. Nothing flashy, just a way for folks who find value in these newsletters to chip in a little each month. 
If you are getting value from Blaze, checking this out would mean the absolute world. But if you can't contribute, no worries - the newsletters keep coming either way. Thanks for reading and being part of this nerdy corner of the internet. All the best for the coming week - Alastair.
