
The ML Engineer

November 18, 2025

The ML Engineer 18-11-2025

improved shuffling at Spotify, Google's deforestation-detecting AI, advice on working with vector search & graphs

🔧 Company Engineering Blogs

Shuffle: Making Random Feel More Human (engineering.atspotify.com). Spotify explains its Fewer Repeats shuffle: it generates multiple random sequences and scores them for freshness against recent plays, biasing results away from recently heard tracks
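A minimal sketch of the idea; the candidate count and the scoring rule here are illustrative assumptions, not Spotify's actual implementation:

```python
import random

def freshness_score(order, recent_plays):
    """Hypothetical scorer: penalize recently played tracks near the front."""
    n = len(order)
    penalty = sum((n - pos) / n for pos, t in enumerate(order) if t in recent_plays)
    return -penalty  # higher is fresher

def fewer_repeats_shuffle(tracks, recent_plays, candidates=8, seed=None):
    """Draw several plain Fisher-Yates shuffles, keep the freshest one."""
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(candidates):
        order = tracks[:]
        rng.shuffle(order)
        score = freshness_score(order, recent_plays)
        if score > best_score:
            best, best_score = order, score
    return best

print(fewer_repeats_shuffle(list("abcdef"), recent_plays={"a", "b"}, seed=0))
```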

Normalized Entropy or Apply Rate? Evaluation Metrics for Online Modeling Experiments (engineering​.indeedblog​.com). Normalized Entropy and Apply Rate metrics for online modeling experiments in ranking, bid-scaling, and thresholding at Indeed with calibration and business-aligned evaluation
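For reference, Normalized Entropy is average log loss divided by the entropy of the background positive rate, so values below 1.0 beat a constant predictor; a small sketch:

```python
import numpy as np

def normalized_entropy(y_true, p_pred, eps=1e-12):
    y = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1 - eps)
    log_loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    base = np.clip(y.mean(), eps, 1 - eps)  # background positive rate
    base_entropy = -(base * np.log(base) + (1 - base) * np.log(1 - base))
    return log_loss / base_entropy  # < 1.0 beats predicting the base rate

y = np.array([0, 0, 1, 0, 1])
p = np.array([0.1, 0.2, 0.7, 0.3, 0.6])
print(normalized_entropy(y, p))
```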

Separating natural forests from other tree cover with AI for deforestation-free supply chains (research​.google). AI model (MTSViT) analyzes multi-temporal Sentinel-2 data to separate natural forests from planted forests, releasing Natural Forests of the World 2020 with 10m resolution

Building The Intent Engine: How Instacart is Revamping Query Understanding with LLMs (tech​.instacart​.com). Instacart uses LLMs for query understanding: taxonomy classification, query rewrites, and semantic role labeling with offline/real-time hybrid systems

Explaining Financial Models by Deleting Transactions (building​.nubank​.com). LOTO method explains transaction-level impact on transformer-based financial models, with SHAP context and global/local analysis
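A toy sketch of the leave-one-transaction-out (LOTO) pattern with a stand-in scoring function; the real method deletes transactions from a transformer's input sequence:

```python
import numpy as np

def loto_attributions(transactions, score_fn):
    """LOTO: a transaction's attribution = score drop when it is removed."""
    full = score_fn(transactions)
    deltas = []
    for i in range(len(transactions)):
        reduced = transactions[:i] + transactions[i + 1:]
        deltas.append(full - score_fn(reduced))
    return full, deltas

# Toy stand-in for a sequence model's risk score: recency-weighted mean spend.
def toy_score(txns):
    if not txns:
        return 0.0
    w = np.linspace(0.5, 1.0, num=len(txns))
    return float(np.average([t["amount"] for t in txns], weights=w))

txns = [{"amount": 20.0}, {"amount": 500.0}, {"amount": 15.0}]
print(loto_attributions(txns, toy_score))
```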

📝 Essays & Perspectives

A Decade of Writing Stuff That People, To My Surprise, Actually Read (rachel​.fast​.ai). Diverse blog posts on AI ethics, education, and tech culture, plus practical ML guidance and fast.ai community highlights

Verifiability (karpathy​.bearblog​.dev). Verifiability guides AI automation: from Software 1.0 to 2.0, highlighting verifiable tasks and the jagged frontier

Gradient Boosting Machines (GBMs) in the Age of LLMs and ChatGPT (r-consortium​.org). GBMs remain strong for structured/tabular data; XGBoost, LightGBM, and h2o prevail amid LLM-era shifts in AI and data analysis

What Are World Models? AI's Path to Understanding Reality (rewire​.it). Explores world models in AI, from Dreamer and PlaNet to Genie and Sora, with DeepMind, OpenAI, and NVIDIA's Cosmos as context

One Weird Hashing Trick (notes​.hella​.cheap). Discusses bag-of-ngrams hashing, randomized projections, and fast single-pass feature hashing in C/C++ style code
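The post works in C/C++-style code; here is the same single-pass signed feature-hashing idea as a NumPy sketch (the hash function and dimensionality are illustrative choices):

```python
import hashlib
import numpy as np

def hashed_ngram_features(text, dim=1024, n=3):
    """Single-pass bag-of-character-ngrams via the hashing trick."""
    vec = np.zeros(dim, dtype=np.float32)
    for i in range(len(text) - n + 1):
        gram = text[i:i + n].encode()
        h = int.from_bytes(hashlib.blake2b(gram, digest_size=8).digest(), "little")
        idx = h % dim                              # bucket index
        sign = 1.0 if (h >> 63) & 1 else -1.0      # signed hashing reduces collision bias
        vec[idx] += sign
    return vec

print(np.nonzero(hashed_ngram_features("one weird hashing trick"))[0][:8])
```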

Paper Review: HunyuanImage 3.0 Technical Report (andlukyane​.com). Review of HunyuanImage 3.0: Mixture-of-Experts architecture, multimodal training, and SSAE evaluation for text–image generation

Why I like working at Gainwell (andrewpwheeler​.com). Gainwell data science team lead discusses roles, tech mix (Python, SQL, ML), on-prem vs cloud, and cross-team impact on Medicaid fraud/waste/abuse

🛠️ Data Engineering & DevOps

Python Rgonomics: User-defined functions in polars (emilyriederer​.com). Python Polars UDFs and map/map_batches patterns for expressions, aggregation, and complex objects with Python integration
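A small example of the two UDF entry points the post covers, using the current Polars expression API:

```python
import polars as pl

df = pl.DataFrame({"g": ["a", "a", "b"], "x": [1.0, 2.0, 3.0]})

# map_elements: a scalar Python UDF (flexible, but leaves the fast native engine)
per_row = df.with_columns(
    pl.col("x").map_elements(lambda v: v ** 2, return_dtype=pl.Float64).alias("x_sq")
)

# map_batches: the UDF receives the whole Series at once, so it can stay vectorized
batched = df.with_columns(
    pl.col("x").map_batches(lambda s: s * 10).alias("x10")
)

print(per_row)
print(batched)
```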

Transforming Data Engineering with DevOps on the Databricks Platform (cevo​.com​.au). DevOps for Data Engineering on Databricks using modular code, automated testing, CI/CD, Git Folders, and Databricks Asset Bundles

Making Data Engineering Safe for Automation and Agents (thedataexchange​.media). Ciro Greco discusses a code-first, transactional lakehouse for AI agents, Git-for-data principles, and safe automation with Bauplan

Lazy Execution with Polars and DuckDB (confessionsofadataguy​.com). Explores lazy execution in Polars and DuckDB for large datasets, contrasts with Spark, and advocates Python-based streaming data processing
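A minimal sketch of the same lazy pattern in both engines (the file name and query are illustrative):

```python
import polars as pl
import duckdb

pl.DataFrame({"g": ["a", "b", "a"], "x": [1, 2, 3]}).write_csv("events.csv")

# Polars: nothing runs until .collect(); the optimizer prunes and pushes down first
lazy = (
    pl.scan_csv("events.csv")
      .filter(pl.col("x") > 1)
      .group_by("g")
      .agg(pl.col("x").sum().alias("total"))
)
print(lazy.collect())

# DuckDB: the SQL plan streams over the file rather than loading it all into memory
print(duckdb.sql("SELECT g, SUM(x) AS total FROM 'events.csv' WHERE x > 1 GROUP BY g").df())
```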

Unlocking the future of the Automotive Industry (Part 2): Implementing Scalable Geospatial Analytics & AI (databricks​.com). Scalable geospatial analytics with Databricks using H3, Liquid Clustering, Unity Catalog, AutoML, and synthetic data for automotive mobility solutions

🚀 Hardware & Performance

NVFP4 GEMV (veitner​.bearblog​.dev). Introduction to CuTeDSL for Blackwell NVFP4, FP8 scaling, and a GEMV kernel in CUDA-style DSL with tiling and FP32 accumulation

TPU Monitoring in MLFlow (nmilosev​.svbtle​.com). Using TPU system metrics in MLFlow with tpu-info to monitor chips and usage during training
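A hedged sketch of the pattern: poll a TPU utilization reading and log it as an MLflow metric. Here `read_tpu_duty_cycle` is a hypothetical placeholder for however you parse `tpu-info` output, not a real API:

```python
import time
import mlflow

def read_tpu_duty_cycle():
    """Placeholder: in practice, pull chip utilization from `tpu-info`
    (e.g. by shelling out and parsing its output); a constant stands in here."""
    return 0.85  # hypothetical duty-cycle fraction

with mlflow.start_run():
    for step in range(3):
        # ... one training step would go here ...
        mlflow.log_metric("tpu_duty_cycle", read_tpu_duty_cycle(), step=step)
        time.sleep(1)
```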

Accelerating Vector Search: hipVS and hipRAFT on AMD (rocm​.blogs​.amd​.com). AMD hipVS and hipRAFT on ROCm enable GPU-accelerated vector search with IVF-Flat, IVF-PQ, and CAGRA using Python, C++, and Rust in Jupyter notebooks

Reproducing AMD MLPerf Training v5.1 Submission Result (rocm​.blogs​.amd​.com). AMD MLPerf Training v5.1 results reproduce Llama 2 70B LoRA finetuning and Llama 3.1 8B pretraining on MI300X, MI325X, MI350X, MI355X using ROCm 7.0+, with Docker workflows and RCP-based validation

NVIDIA Blackwell Architecture Sweeps MLPerf Training v5.1 Benchmarks (developer.nvidia.com). NVIDIA Blackwell sweeps the MLPerf Training v5.1 benchmarks using NVFP4 and FP8 precision, full-stack optimizations, and submissions spanning 512–5,120 GPUs

Scaling HNSWs (antirez​.com). Redis Vector Sets, HNSWs, 8-bit quantization, threading, memory scaling, and JSON filtering insights for fast vector search
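For intuition, 8-bit quantization keeps int8 codes plus one scale per vector, trading a little dot-product accuracy for roughly 4x less memory; a sketch of the general idea, not Redis's actual code:

```python
import numpy as np

def quantize_int8(v):
    """Symmetric per-vector int8 quantization: int8 codes plus one float scale."""
    scale = max(np.abs(v).max() / 127.0, 1e-12)
    return np.round(v / scale).astype(np.int8), np.float32(scale)

def dot_int8(q_a, s_a, q_b, s_b):
    # Accumulate in int32, rescale once at the end
    return s_a * s_b * np.dot(q_a.astype(np.int32), q_b.astype(np.int32))

rng = np.random.default_rng(0)
a = rng.normal(size=128).astype(np.float32)
b = rng.normal(size=128).astype(np.float32)
qa, sa = quantize_int8(a)
qb, sb = quantize_int8(b)
print(float(np.dot(a, b)), float(dot_int8(qa, sa, qb, sb)))  # close, not identical
```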

🌐 Graphs & Spatial

You’ve made the wrong connection (richardbrath​.wordpress​.com). Force-directed graphs: edge routing, bundling, and gradient tips explored with LLMs to reduce long-edge salience and ambiguity

Graph Neural Nets for Spatial Data Science (josiahparry​.com). Graph Neural Nets for Spatial Data Science uses R, igraph, sfdep, spdep, dplyr, ggplot2 and torchgnn to connect spatial lags, GCNs, and SLX modeling
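A sketch of the core construction the post connects: a row-standardized spatial lag WX, which is both the SLX regressor and what a one-layer GCN computes before its learned projection (graph and data here are toy):

```python
import numpy as np

# Row-standardized adjacency W turns a neighbor sum into a neighbor average
W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
W /= W.sum(axis=1, keepdims=True)

X = np.array([[1.0], [2.0], [3.0], [4.0]])  # one covariate per region
WX = W @ X                                  # spatial lag: each row's neighbor mean

# SLX regression design: y ~ [X, WX]; a GCN layer similarly mixes W @ X @ Theta
design = np.hstack([X, WX])
print(design)
```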

Putting 50 years of neuroscience on the map (thetransmitter​.org). Explore five decades of neuroscience topics via a semantic map built from ~350,000 abstracts, revealing trends in Alzheimer’s, Parkinson’s, depression, neuroinflammation, and deep learning

Experiments with THRML (madebynathan​.com). Experiments with THRML: exploring THRML library, AI-assisted maze solving, DFS variants, and playful simulations using GPU prototypes

🧠 ML Methods & Training

Implementing NeRFs (Neural Radiance Fields) from scratch (vkethana​.com). Explains implementing NeRFs from scratch, camera calibration, multi-view neural radiance fields, PyTorch pipelines, and training insights
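One piece that is easy to show compactly is NeRF's positional encoding, which lifts coordinates into sin/cos features at octave frequencies so the MLP can fit high-frequency detail:

```python
import torch

def positional_encoding(x, n_freqs=10):
    """NeRF-style encoding: concatenate sin/cos of each coordinate at 2^k scales."""
    out = [x]
    for k in range(n_freqs):
        out.append(torch.sin((2.0 ** k) * torch.pi * x))
        out.append(torch.cos((2.0 ** k) * torch.pi * x))
    return torch.cat(out, dim=-1)

pts = torch.rand(4, 3)                 # 3D sample points along camera rays
print(positional_encoding(pts).shape)  # (4, 3 + 3 * 2 * 10) = (4, 63)
```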

Differentially private machine learning at scale with JAX-Privacy (research​.google). JAX-Privacy 1.0 release enables scalable, DP-trained models on JAX with clipping, noise, auditing, and distributed training
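The underlying DP-SGD mechanism (per-example clipping, averaging, Gaussian noise) in a NumPy sketch; this illustrates the recipe only, not the JAX-Privacy API:

```python
import numpy as np

def dp_sgd_step(per_example_grads, clip_norm=1.0, noise_mult=1.1, rng=None):
    """Clip each example's gradient to clip_norm, average, add calibrated noise."""
    rng = rng or np.random.default_rng()
    g = np.asarray(per_example_grads)                 # shape: (batch, dim)
    norms = np.linalg.norm(g, axis=1, keepdims=True)
    g = g * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    mean = g.mean(axis=0)
    noise = rng.normal(0.0, noise_mult * clip_norm / len(g), size=mean.shape)
    return mean + noise

grads = np.random.default_rng(0).normal(size=(8, 4))
print(dp_sgd_step(grads))
```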

Expert-Level Feature Engineering: Advanced Techniques for High-Stakes Models (machinelearningmastery​.com). Three expert-level feature engineering techniques—counterfactual features, domain-constrained representations, and causal-invariant features—for robust, explainable high-stakes models using Python (NumPy, pandas, scikit-learn) and PyTorch

Disentangled Deep Smoothed Bootstrap for Fair Imbalanced Regression (freakonometrics​.hypotheses​.org). Disentangled deep smoothed bootstrap using VAEs and latent-space bootstrap for fair imbalanced regression on tabular data

I Measured Neural Network Training Every 5 Steps for 10,000 Iterations (towardsdatascience​.com). High-resolution training dynamics reveal expanding representational space in DNNs using ndtracker tooling and 5-step checkpoints

Notes - Computer Vision MT25, Video (ollybritton​.com). Short notes on video processing, training tricks, early/late fusion, and optical flow in computer vision MT25 tutorials

🧮 Math & Numerical

New Nikodym set constructions over finite fields (terrytao.wordpress.com). Terence Tao uploads an arXiv paper on Nikodym sets over finite fields, with constructions found using AlphaEvolve and DeepThink; discusses probabilistic methods and surface removals
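For context, the standard finite-field definition (my paraphrase, not the paper's notation):

```latex
% N \subseteq \mathbb{F}_q^n is a Nikodym set if every point has a line
% through it whose remaining q-1 points all lie in N:
\forall x \in \mathbb{F}_q^n \;\; \exists\, d \in \mathbb{F}_q^n \setminus \{0\} :
\quad \{\, x + t d : t \in \mathbb{F}_q \setminus \{0\} \,\} \subseteq N .
```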

(2025-11) All Polynomial Generators Preserve Distance with Mutual Correlated Agreement (cronokirby​.com). Polynomial generators preserve distance with mutual correlated agreement across linear codes and Reed–Solomon codes

Paper - ADADELTA, An Adaptive Learning Rate Method (ollybritton​.com). Adaptive learning rate method Adadelta implemented and analyzed by Zeiler (2012) for neural networks, with RMS-like updates and no manual tuning
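Zeiler's update rule is compact enough to state in full: keep decayed RMS statistics of gradients and of past updates, and use their ratio as the step size, so no global learning rate is needed; a sketch:

```python
import numpy as np

def adadelta(grad_fn, x, steps=200, rho=0.95, eps=1e-6):
    """Adadelta (Zeiler, 2012): step sizes from RMS ratios, no manual tuning."""
    acc_g = np.zeros_like(x)   # running E[g^2]
    acc_dx = np.zeros_like(x)  # running E[dx^2]
    for _ in range(steps):
        g = grad_fn(x)
        acc_g = rho * acc_g + (1 - rho) * g * g
        dx = -np.sqrt(acc_dx + eps) / np.sqrt(acc_g + eps) * g
        acc_dx = rho * acc_dx + (1 - rho) * dx * dx
        x = x + dx
    return x

# Minimize f(x) = ||x||^2 from a fixed start; the gradient is 2x
print(adadelta(lambda x: 2 * x, np.array([3.0, -2.0])))
```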

Approximating arbitrary complex-valued continuous functions (lesswrong​.com). Complex-valued function approximation via uniform algebras, density operators, and tensor construction in functional analysis

QR Decomposition (Householder Algorithm) For Non-Square Matrices Using C# (jamesmccaffreyblog​.com). QR decomposition using Householder on non-square matrices in C# for pseudo-inverse computations
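A NumPy sketch of the same algorithm (the post itself uses C#): Householder QR on a tall matrix, then the pseudo-inverse as R1^-1 Q1^T for the full-column-rank case:

```python
import numpy as np

def householder_qr(A):
    """Full QR via Householder reflections; handles tall (m >= n) matrices."""
    m, n = A.shape
    R, Q = A.astype(float).copy(), np.eye(m)
    for k in range(min(m - 1, n)):
        x = R[k:, k]
        v = x.copy()
        v[0] += np.copysign(np.linalg.norm(x), x[0])  # stable sign choice
        if np.linalg.norm(v) < 1e-12:
            continue
        v /= np.linalg.norm(v)
        R[k:, :] -= 2.0 * np.outer(v, v @ R[k:, :])   # apply reflector to R
        Q[:, k:] -= 2.0 * np.outer(Q[:, k:] @ v, v)   # accumulate Q
    return Q, R

A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])   # 3x2, full column rank
Q, R = householder_qr(A)
pinv = np.linalg.solve(R[:2, :2], Q[:, :2].T)        # pinv(A) = R1^-1 Q1^T
print(np.allclose(pinv, np.linalg.pinv(A)))
```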

Matrix Inverse From Scratch Using C# – Five Different Techniques (jamesmccaffreyblog​.com). C# matrix inversion techniques explored: LUP, QR, SVD, Newton-Pan, and Cholesky on a sample matrix and a positive-definite case
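One of the five techniques, Newton's iteration (the Newton-Pan/Schulz scheme), is short enough to sketch; the classic initialization X0 = A^T / (||A||_1 ||A||_inf) guarantees convergence:

```python
import numpy as np

def newton_schulz_inverse(A, iters=40):
    """Newton's iteration for A^-1: X <- X (2I - A X), quadratically convergent."""
    n = A.shape[0]
    X = A.T / (np.linalg.norm(A, 1) * np.linalg.norm(A, np.inf))
    I2 = 2.0 * np.eye(n)
    for _ in range(iters):
        X = X @ (I2 - A @ X)
    return X

A = np.array([[4.0, 1.0], [2.0, 3.0]])
print(np.allclose(newton_schulz_inverse(A), np.linalg.inv(A)))
```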

📚 Academic Research

LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics (arxiv:stat). Presents LeJEPA: a theoretically grounded JEPA objective using Sketched Isotropic Gaussian Regularization to enforce isotropic Gaussian embeddings, eliminating many SSL heuristics. Engineers gain a principled, scalable SSL method with provable downstream guarantees and simple implementation

FairReweighing: Density Estimation-Based Reweighing Framework for Improving Separation in Fair Regression (arxiv:cs). Proposes FairReweighing, a density estimation-based reweighing framework aimed at improving the separation criterion in fair regression. Relevant for engineers reweighting training data to meet fairness constraints in regression models
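As a generic sketch of density-estimation-based reweighing (not necessarily the paper's exact recipe): fit a density to the targets and upweight rare values:

```python
import numpy as np
from sklearn.neighbors import KernelDensity

def density_weights(y, bandwidth=0.5):
    """Inverse-density sample weights: rare target values count more."""
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    kde = KernelDensity(bandwidth=bandwidth).fit(y)
    dens = np.exp(kde.score_samples(y))
    w = 1.0 / np.maximum(dens, 1e-12)
    return w / w.mean()  # normalize so weights average to 1

y = np.concatenate([np.random.default_rng(0).normal(0, 1, 500), [5.0, 5.2, 4.8]])
print(density_weights(y)[-3:])  # the rare high targets get the largest weights
```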

MoFa: A Unified Performance Modeling Framework for LLM Pretraining (arxiv:cs). MoFa unifies multi-dimensional optimization features and fault-tolerance into a cost+reliability model to predict and tune large-scale pretraining performance. Valuable for engineers planning cluster-scale pretraining and designing parallelization/checkpointing strategies

LLM Inference Beyond a Single Node: From Bottlenecks to Mitigations with Fast All-Reduce Communication (arxiv:cs). Analyzes multi-node LLM inference bottlenecks and introduces NVRAR, an NVSHMEM-based hierarchical all-reduce that cuts small-message latency and end-to-end decode-heavy inference time. Engineers running multi-node inference get tested mitigations and performance engineering guidance

MOON Embedding: Multimodal Representation Learning for E-commerce Search Advertising (arxiv:cs). MOON is a three-stage multimodal representation pipeline (pretrain/post-train/application) yielding large online CTR gains and a production-ready infrastructure plus distillation. Important for engineers building multimodal embeddings tied to online business metrics and deployment
