The ML Engineer 18-11-2025
improved shuffling at Spotify, Google's deforestation-detecting AI, advice on working with vector search & graphs
🔧 Company Engineering Blogs
Shuffle: Making Random Feel More Human (engineering.atspotify.com). Spotify explains its Fewer Repeats shuffle: candidate random sequences are scored for freshness against a listener's recent plays, biasing playback away from recently heard tracks
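A minimal sketch of the idea, with an invented scoring function rather than Spotify's actual formula: generate several candidate shuffles and keep the one that pushes recently played tracks furthest from the front.

```python
import random

def fewer_repeats_shuffle(tracks, recent_plays, n_candidates=8):
    """Pick the candidate shuffle that pushes recently played tracks
    furthest from the front (toy scoring, not Spotify's formula)."""
    recent = set(recent_plays)

    def freshness(order):
        # Reward orders whose recently played tracks appear late.
        return sum(pos for pos, t in enumerate(order) if t in recent)

    candidates = []
    for _ in range(n_candidates):
        order = tracks[:]
        random.shuffle(order)
        candidates.append(order)
    return max(candidates, key=freshness)

playlist = ["a", "b", "c", "d", "e", "f"]
print(fewer_repeats_shuffle(playlist, recent_plays=["a", "b"]))
```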
Normalized Entropy or Apply Rate? Evaluation Metrics for Online Modeling Experiments (engineering.indeedblog.com). Indeed weighs Normalized Entropy against Apply Rate as evaluation metrics for online modeling experiments in ranking, bid scaling, and thresholding, balancing calibration with business-aligned evaluation
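For reference, Normalized Entropy is the model's log loss divided by the entropy of the base rate, so values below 1 mean the model beats always predicting the average label rate:

```python
import numpy as np

def normalized_entropy(y_true, p_pred, eps=1e-12):
    """Log loss normalized by the entropy of the base rate; NE < 1
    means the model beats always predicting the average apply rate."""
    y = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1 - eps)
    ll = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    base = np.clip(y.mean(), eps, 1 - eps)
    h = -(base * np.log(base) + (1 - base) * np.log(1 - base))
    return ll / h

y = np.array([0, 0, 1, 0, 1])
p = np.array([0.1, 0.2, 0.8, 0.3, 0.6])
print(normalized_entropy(y, p))
```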
Separating natural forests from other tree cover with AI for deforestation-free supply chains (research.google). AI model (MTSViT) analyzes multi-temporal Sentinel-2 data to separate natural forests from planted forests, releasing Natural Forests of the World 2020 with 10m resolution
Building The Intent Engine: How Instacart is Revamping Query Understanding with LLMs (tech.instacart.com). Instacart uses LLMs for query understanding: taxonomy classification, query rewrites, and semantic role labeling with offline/real-time hybrid systems
Explaining Financial Models by Deleting Transactions (building.nubank.com). LOTO method explains transaction-level impact on transformer-based financial models, with SHAP context and global/local analysis
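The core of leave-one-transaction-out (LOTO) is just a score delta per deleted transaction; here is a self-contained sketch with a stand-in model, not Nubank's transformer:

```python
def loto_attributions(model, transactions):
    """Leave-one-transaction-out: attribute to each transaction the
    change in model score when it is deleted from the input sequence.
    `model` maps a list of transactions to a scalar score (assumed)."""
    full_score = model(transactions)
    attributions = []
    for i in range(len(transactions)):
        reduced = transactions[:i] + transactions[i + 1:]
        attributions.append(full_score - model(reduced))
    return attributions

# Toy stand-in model: score is the sum of transaction amounts.
score = lambda txs: sum(t["amount"] for t in txs)
txs = [{"amount": 10.0}, {"amount": -3.0}, {"amount": 7.5}]
print(loto_attributions(score, txs))  # each amount is its own impact here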
📝 Essays & Perspectives
A Decade of Writing Stuff That People, To My Surprise, Actually Read (rachel.fast.ai). Diverse blog posts on AI ethics, education, and tech culture, plus practical ML guidance and fast.ai community highlights
Verifiability (karpathy.bearblog.dev). Verifiability guides AI automation: from Software 1.0 to 2.0, highlighting verifiable tasks and the jagged frontier
Gradient Boosting Machines (GBMs) in the Age of LLMs and ChatGPT (r-consortium.org). GBMs remain strong for structured/tabular data; XGBoost, LightGBM, and H2O prevail amid LLM-era shifts in AI and data analysis
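If you want a quick tabular baseline in this family without extra dependencies, scikit-learn's histogram GBM (LightGBM-style) is a reasonable stand-in for the libraries the talk covers:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Histogram-based GBM as a strong default baseline on tabular data.
X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

gbm = HistGradientBoostingClassifier(max_iter=300, learning_rate=0.1)
gbm.fit(X_tr, y_tr)
print("AUC:", roc_auc_score(y_te, gbm.predict_proba(X_te)[:, 1]))
```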
What Are World Models? AI's Path to Understanding Reality (rewire.it). Explores world models in AI, from Dreamer and PlaNet to Genie and Sora, with DeepMind, OpenAI, and NVIDIA's Cosmos as context
One Weird Hashing Trick (notes.hella.cheap). Discusses bag-of-ngrams hashing, randomized projections, and fast single-pass feature hashing in C/C++ style code
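The trick translates directly to Python; a single-pass sketch with crc32 standing in for the post's C-style hash:

```python
import zlib
import numpy as np

def hashed_ngram_features(text, n=3, dim=2**12):
    """Single-pass bag-of-character-ngrams hashed into a fixed-size
    vector; crc32 stands in for the post's C-style hash (assumption)."""
    vec = np.zeros(dim, dtype=np.float32)
    for i in range(len(text) - n + 1):
        bucket = zlib.crc32(text[i:i + n].encode()) % dim
        vec[bucket] += 1.0
    return vec

a = hashed_ngram_features("one weird hashing trick")
b = hashed_ngram_features("one weird hashing tricks")
cos = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
print(round(cos, 3))  # near-duplicate strings land close together
```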
Paper Review: HunyuanImage 3.0 Technical Report (andlukyane.com). Review of HunyuanImage 3.0: Mixture-of-Experts architecture, multimodal training, and SSAE evaluation for text–image generation
Why I like working at Gainwell (andrewpwheeler.com). Gainwell data science team lead discusses roles, tech mix (Python, SQL, ML), on-prem vs cloud, and cross-team impact on Medicaid fraud/waste/abuse
🛠️ Data Engineering & DevOps
Python Rgonomics: User-defined functions in polars (emilyriederer.com). Python Polars UDFs and map/map_batches patterns for expressions, aggregation, and complex objects with Python integration
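The two UDF entry points in a nutshell (current Polars expression API):

```python
import polars as pl

df = pl.DataFrame({"x": [1.0, 2.0, 3.0], "group": ["a", "a", "b"]})

# Element-wise UDF: runs Python per value, so reserve it for logic
# with no native expression equivalent.
per_elem = df.with_columns(
    pl.col("x").map_elements(lambda v: v ** 0.5, return_dtype=pl.Float64)
      .alias("sqrt_x")
)

# Batch UDF: the function sees a whole Series, so vectorized code applies.
per_batch = df.with_columns(
    pl.col("x").map_batches(lambda s: s - s.mean()).alias("centered")
)
print(per_elem, per_batch)
```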
Transforming Data Engineering with DevOps on the Databricks Platform (cevo.com.au). DevOps for Data Engineering on Databricks using modular code, automated testing, CI/CD, Git Folders, and Databricks Asset Bundles
Making Data Engineering Safe for Automation and Agents (thedataexchange.media). Ciro Greco discusses a code-first, transactional lakehouse for AI agents, Git-for-data principles, and safe automation with Bauplan
Lazy Execution with Polars and DuckDB (confessionsofadataguy.com). Explores lazy execution in Polars and DuckDB for large datasets, contrasts with Spark, and advocates Python-based streaming data processing
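The contrast in miniature: Polars builds a lazy plan that only materializes at `.collect()`, while DuckDB runs SQL straight over the file; both push filters down rather than loading everything first.

```python
import duckdb
import polars as pl

pl.DataFrame({"id": [1, 2, 3, 4], "amount": [10.0, 20.0, 5.0, 8.0]}) \
  .write_parquet("orders.parquet")

# Lazy Polars: nothing executes until .collect().
lazy = (
    pl.scan_parquet("orders.parquet")
      .filter(pl.col("amount") > 7)
      .group_by("id")
      .agg(pl.col("amount").sum())
)
print(lazy.collect())

# DuckDB: the same query as SQL over the file, no load step required.
print(duckdb.sql(
    "SELECT id, SUM(amount) FROM 'orders.parquet' "
    "WHERE amount > 7 GROUP BY id"
).df())
```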
Unlocking the future of the Automotive Industry (Part 2): Implementing Scalable Geospatial Analytics & AI (databricks.com). Scalable geospatial analytics with Databricks using H3, Liquid Clustering, Unity Catalog, AutoML, and synthetic data for automotive mobility solutions
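Illustrative H3 indexing with the h3-py v4 API (not Databricks' specific pipeline): index telemetry points to hex cells, roll up to coarser parents, and grab neighborhoods for spatial joins.

```python
import h3  # h3-py v4 names; v3 used geo_to_h3 / h3_to_parent

lat, lng = 52.5200, 13.4050  # Berlin
cell = h3.latlng_to_cell(lat, lng, 9)   # index a point at resolution 9
parent = h3.cell_to_parent(cell, 6)     # coarser cell for aggregation
ring = h3.grid_disk(cell, 1)            # the cell plus its 6 neighbors

print(cell, parent, len(ring))
```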
🚀 Hardware & Performance
NVFP4 GEMV (veitner.bearblog.dev). Introduction to CuTeDSL for Blackwell NVFP4, FP8 scaling, and a GEMV kernel in CUDA-style DSL with tiling and FP32 accumulation
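Not CuTeDSL, but the numerics in NumPy: per-block quantization with a scale per block, then a GEMV that accumulates partial dot products in FP32, a toy emulation of the kernel's structure.

```python
import numpy as np

def quantize_blocks(A, block=16, levels=7):
    """Per-block symmetric quantization to small integer steps,
    a stand-in for NVFP4's per-block format (toy emulation)."""
    m, n = A.shape
    q = np.zeros_like(A, dtype=np.int8)
    scales = np.zeros((m, n // block), dtype=np.float32)
    for j in range(0, n, block):
        blk = A[:, j:j + block]
        s = np.abs(blk).max(axis=1, keepdims=True) / levels
        s[s == 0] = 1.0
        q[:, j:j + block] = np.round(blk / s).astype(np.int8)
        scales[:, j // block] = s[:, 0]
    return q, scales

def gemv(q, scales, x, block=16):
    # Accumulate per-block partial dot products in FP32, as the kernel does.
    y = np.zeros(q.shape[0], dtype=np.float32)
    for j in range(0, q.shape[1], block):
        part = q[:, j:j + block].astype(np.float32) @ x[j:j + block]
        y += scales[:, j // block] * part
    return y

A = np.random.randn(8, 64).astype(np.float32)
x = np.random.randn(64).astype(np.float32)
q, s = quantize_blocks(A)
print(np.max(np.abs(gemv(q, s, x) - A @ x)))  # small quantization error
```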
TPU Monitoring in MLFlow (nmilosev.svbtle.com). Using TPU system metrics in MLFlow with tpu-info to monitor chips and usage during training
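The logging side is plain MLflow; here `read_tpu_duty_cycle` is a placeholder for whatever you extract from tpu-info, so the sketch runs anywhere:

```python
import time
import mlflow

def read_tpu_duty_cycle():
    """Placeholder for querying per-chip utilization (the post uses
    tpu-info); returns a fixed value so this sketch runs anywhere."""
    return 0.42

with mlflow.start_run():
    for step in range(3):
        mlflow.log_metric("tpu_duty_cycle", read_tpu_duty_cycle(), step=step)
        time.sleep(1)  # poll alongside training at a fixed interval
```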
Accelerating Vector Search: hipVS and hipRAFT on AMD (rocm.blogs.amd.com). AMD hipVS and hipRAFT on ROCm enable GPU-accelerated vector search with IVF-Flat, IVF-PQ, and CAGRA using Python, C++, and Rust in Jupyter notebooks
Reproducing AMD MLPerf Training v5.1 Submission Result (rocm.blogs.amd.com). AMD MLPerf Training v5.1 results reproduce Llama 2 70B LoRA finetuning and Llama 3.1 8B pretraining on MI300X, MI325X, MI350X, MI355X using ROCm 7.0+, with Docker workflows and RCP-based validation
NVIDIA Blackwell Architecture Sweeps MLPerf Training v5.1 Benchmarks (developer.nvidia.com). NVIDIA Blackwell sweeps the MLPerf Training v5.1 benchmarks with NVFP4 and FP8 precision, full-stack optimizations, and submissions scaling from 512 to 5,120 GPUs
Scaling HNSWs (antirez.com). Redis Vector Sets, HNSWs, 8-bit quantization, threading, memory scaling, and JSON filtering insights for fast vector search
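The 8-bit quantization idea in isolation: keep one scale per vector, store int8 components, and rebuild approximate dot products at query time for a 4x memory saving.

```python
import numpy as np

def quantize_int8(vecs):
    """Per-vector symmetric int8 quantization: 4x less memory than
    FP32, with a per-vector scale to rebuild approximate dot products."""
    scales = np.abs(vecs).max(axis=1, keepdims=True) / 127.0
    q = np.round(vecs / scales).astype(np.int8)
    return q, scales.astype(np.float32)

rng = np.random.default_rng(0)
db = rng.standard_normal((10_000, 128)).astype(np.float32)
q, s = quantize_int8(db)
query = rng.standard_normal(128).astype(np.float32)

# Approximate scores: integer dot products rescaled per vector.
scores = (q.astype(np.float32) @ query) * s[:, 0]
print(np.argsort(-scores)[:5])  # top-5 approximate neighbors
```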
🌐 Graphs & Spatial
You’ve made the wrong connection (richardbrath.wordpress.com). Force-directed graphs: edge routing, bundling, and gradient tips explored with LLMs to reduce long-edge salience and ambiguity
Graph Neural Nets for Spatial Data Science (josiahparry.com). Graph Neural Nets for Spatial Data Science uses R, igraph, sfdep, spdep, dplyr, ggplot2 and torchgnn to connect spatial lags, GCNs, and SLX modeling
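The post's bridge between spatial econometrics and GNNs, compressed into NumPy (my translation from its R stack): the SLX spatial lag Wx and one GCN-style propagation are both neighborhood averages.

```python
import numpy as np

# Toy 4-node neighborhood graph (symmetric adjacency).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
x = np.array([1.0, 2.0, 3.0, 4.0])

# Spatial lag Wx with row-standardized weights: each node gets the
# mean of its neighbors, exactly the SLX covariate.
W = A / A.sum(axis=1, keepdims=True)
print(W @ x)

# One GCN-style propagation: D^-1/2 (A + I) D^-1/2, the same
# "average over the neighborhood" idea with self-loops added.
A_hat = A + np.eye(4)
d = A_hat.sum(axis=1)
print((A_hat / np.sqrt(np.outer(d, d))) @ x)
```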
Putting 50 years of neuroscience on the map (thetransmitter.org). Explore five decades of neuroscience topics via a semantic map built from ~350,000 abstracts, revealing trends in Alzheimer’s, Parkinson’s, depression, neuroinflammation, and deep learning
Experiments with THRML (madebynathan.com). Explores the THRML library through AI-assisted maze solving, DFS variants, and playful simulations on GPU prototypes
🧠 ML Methods & Training
Implementing NeRFs (Neural Radiance Fields) from scratch (vkethana.com). Explains implementing NeRFs from scratch, camera calibration, multi-view neural radiance fields, PyTorch pipelines, and training insights
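One self-contained piece of any from-scratch NeRF is the positional encoding gamma(x), which lets the MLP represent high-frequency detail:

```python
import numpy as np

def positional_encoding(x, n_freqs=6):
    """NeRF's gamma(x): map each coordinate to sin/cos at octave
    frequencies so the MLP can fit high-frequency detail."""
    out = [x]
    for k in range(n_freqs):
        out.append(np.sin(2.0 ** k * np.pi * x))
        out.append(np.cos(2.0 ** k * np.pi * x))
    return np.concatenate(out, axis=-1)

pts = np.random.rand(4, 3)             # 4 sample points along rays
print(positional_encoding(pts).shape)  # (4, 3 + 3*2*6) = (4, 39)
```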
Differentially private machine learning at scale with JAX-Privacy (research.google). JAX-Privacy 1.0 release enables scalable, DP-trained models on JAX with clipping, noise, auditing, and distributed training
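For orientation, the DP-SGD recipe JAX-Privacy scales up looks like this in raw JAX (a generic sketch, not the JAX-Privacy API): per-example gradients via vmap, L2 clipping, Gaussian noise calibrated to the clip norm.

```python
import jax
import jax.numpy as jnp

def loss_fn(w, x, y):
    return (x @ w - y) ** 2  # per-example squared error

def dp_sgd_step(w, X, Y, key, clip=1.0, sigma=1.0, lr=0.1):
    """Generic DP-SGD: per-example grads (vmap), L2 clipping, Gaussian
    noise scaled to the clip norm. Not the JAX-Privacy API itself."""
    grads = jax.vmap(jax.grad(loss_fn), in_axes=(None, 0, 0))(w, X, Y)
    norms = jnp.linalg.norm(grads, axis=1, keepdims=True)
    clipped = grads * jnp.minimum(1.0, clip / (norms + 1e-12))
    noise = sigma * clip * jax.random.normal(key, w.shape)
    g = (clipped.sum(axis=0) + noise) / X.shape[0]
    return w - lr * g

key = jax.random.PRNGKey(0)
X = jax.random.normal(key, (32, 4))
w_true = jnp.array([1.0, -2.0, 0.5, 0.0])
Y = X @ w_true
w = dp_sgd_step(jnp.zeros(4), X, Y, jax.random.PRNGKey(1))
print(w)
```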
Expert-Level Feature Engineering: Advanced Techniques for High-Stakes Models (machinelearningmastery.com). Three expert-level feature engineering techniques—counterfactual features, domain-constrained representations, and causal-invariant features—for robust, explainable high-stakes models using Python (NumPy, pandas, scikit-learn) and PyTorch
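One reading of a counterfactual feature, sketched with an illustrative feature index and reference value rather than the article's exact recipe: the shift in a base model's score when an input is replaced by a reference.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Counterfactual sensitivity of a base model's score to feature j.
X, y = make_classification(n_samples=2000, n_features=8, random_state=0)
base = LogisticRegression().fit(X, y)

j, reference = 3, 0.0  # illustrative choices, not the article's
X_cf = X.copy()
X_cf[:, j] = reference
cf_feature = base.predict_proba(X)[:, 1] - base.predict_proba(X_cf)[:, 1]

# Append the counterfactual sensitivity as a new engineered column.
X_aug = np.column_stack([X, cf_feature])
print(X_aug.shape)  # (2000, 9)
```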
Disentangled Deep Smoothed Bootstrap for Fair Imbalanced Regression (freakonometrics.hypotheses.org). Disentangled deep smoothed bootstrap using VAEs and latent-space bootstrap for fair imbalanced regression on tabular data
I Measured Neural Network Training Every 5 Steps for 10,000 Iterations (towardsdatascience.com). High-resolution training dynamics reveal expanding representational space in DNNs using ndtracker tooling and 5-step checkpoints
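One common proxy for "representational space" is the entropy-based effective rank of a layer's activations (my choice of metric; the article tracks its own set via ndtracker):

```python
import numpy as np

def effective_rank(H, eps=1e-12):
    """Entropy-based effective rank of an activation matrix: a proxy
    for how much representational space a layer actually uses."""
    s = np.linalg.svd(H, compute_uv=False)
    p = s / (s.sum() + eps)
    return float(np.exp(-(p * np.log(p + eps)).sum()))

rng = np.random.default_rng(0)
narrow = rng.standard_normal((256, 2)) @ rng.standard_normal((2, 64))
wide = rng.standard_normal((256, 64))
print(effective_rank(narrow), effective_rank(wide))  # ~2 vs much larger
```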
Notes - Computer Vision MT25, Video (ollybritton.com). Short notes on video processing, training tricks, early/late fusion, and optical flow in computer vision MT25 tutorials
🧮 Math & Numerical
New Nikodym set constructions over finite fields (terrytao.wordpress.com). Terence Tao uploads arXiv paper on Nikodym sets over finite fields using AlphaEvolve, DeepThink; discusses constructions, probabilistic methods, and surface removals
(2025-11) All Polynomial Generators Preserve Distance with Mutual Correlated Agreement (cronokirby.com). Polynomial generators preserve distance with mutual correlated agreement across linear codes and Reed–Solomon codes
Paper - ADADELTA, An Adaptive Learning Rate Method (ollybritton.com). Adaptive learning rate method Adadelta implemented and analyzed by Zeiler (2012) for neural networks, with RMS-like updates and no manual tuning
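The whole method fits in a few lines: running averages of g^2 and dx^2 give an RMS-normalized step with no learning rate to tune.

```python
import numpy as np

def adadelta_update(grad, state, rho=0.95, eps=1e-6):
    """One Adadelta step (Zeiler 2012): RMS-normalized update with no
    learning rate, using running averages of g^2 and dx^2."""
    state["Eg2"] = rho * state["Eg2"] + (1 - rho) * grad ** 2
    dx = -np.sqrt(state["Edx2"] + eps) / np.sqrt(state["Eg2"] + eps) * grad
    state["Edx2"] = rho * state["Edx2"] + (1 - rho) * dx ** 2
    return dx

# Minimize f(x) = x^2 from x = 5 with no manually tuned step size.
x = np.array(5.0)
state = {"Eg2": np.zeros_like(x), "Edx2": np.zeros_like(x)}
for _ in range(10_000):
    x = x + adadelta_update(2 * x, state)
print(x)  # steadily decreases toward 0
```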
Approximating arbitrary complex-valued continuous functions (lesswrong.com). Complex-valued function approximation via uniform algebras, density operators, and tensor construction in functional analysis
QR Decomposition (Householder Algorithm) For Non-Square Matrices Using C# (jamesmccaffreyblog.com). QR decomposition using Householder on non-square matrices in C# for pseudo-inverse computations
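The same algorithm in NumPy rather than C#, for the tall (m >= n) case the post targets:

```python
import numpy as np

def householder_qr(A):
    """Householder QR for an m x n matrix (m >= n): reflect each
    column below the diagonal to zero, accumulating Q."""
    A = A.astype(float).copy()
    m, n = A.shape
    Q = np.eye(m)
    for k in range(n):
        x = A[k:, k]
        v = x.copy()
        v[0] += np.copysign(np.linalg.norm(x), x[0])
        v /= np.linalg.norm(v)
        H = np.eye(m)
        H[k:, k:] -= 2.0 * np.outer(v, v)
        A = H @ A
        Q = Q @ H
    return Q, A  # Q orthogonal, A now upper triangular R

M = np.array([[4.0, 1.0], [2.0, 3.0], [0.0, 1.0]])  # non-square, 3x2
Q, R = householder_qr(M)
print(np.allclose(Q @ R, M), np.allclose(Q.T @ Q, np.eye(3)))
```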
Matrix Inverse From Scratch Using C# – Five Different Techniques (jamesmccaffreyblog.com). C# matrix inversion techniques explored: LUP, QR, SVD, Newton-Pan, and Cholesky on a sample matrix and a positive-definite case
📚 Academic Research
LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics (arxiv:stat). Presents LeJEPA: a theoretically grounded JEPA objective using Sketched Isotropic Gaussian Regularization to enforce isotropic Gaussian embeddings, eliminating many SSL heuristics. Engineers gain a principled, scalable SSL method with provable downstream guarantees and simple implementation
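To see what "isotropic Gaussian embeddings" buys you, here is a toy penalty on the embedding covariance's distance from identity; LeJEPA's SIGReg uses a sketched test statistic instead, so this only illustrates the target distribution:

```python
import numpy as np

def isotropy_penalty(Z):
    """Toy isotropy regularizer: distance of the embedding covariance
    from the identity (illustration only, not the paper's SIGReg)."""
    Zc = Z - Z.mean(axis=0, keepdims=True)
    cov = Zc.T @ Zc / Z.shape[0]
    d = Z.shape[1]
    return float(np.sum((cov - np.eye(d)) ** 2) / d)

rng = np.random.default_rng(0)
iso = rng.standard_normal((1024, 16))
collapsed = np.outer(rng.standard_normal(1024), rng.standard_normal(16))
print(isotropy_penalty(iso), isotropy_penalty(collapsed))  # low vs high
```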
FairReweighing: Density Estimation-Based Reweighing Framework for Improving Separation in Fair Regression (arxiv:cs). Proposes FairReweighing, a density estimation-based reweighing framework that improves the separation criterion in fair regression. Relevant for engineers adding fairness constraints to regression models in production
MoFa: A Unified Performance Modeling Framework for LLM Pretraining (arxiv:cs). MoFa unifies multi-dimensional optimization features and fault-tolerance into a cost+reliability model to predict and tune large-scale pretraining performance. Valuable for engineers planning cluster-scale pretraining and designing parallelization/checkpointing strategies
LLM Inference Beyond a Single Node: From Bottlenecks to Mitigations with Fast All-Reduce Communication (arxiv:cs). Analyzes multi-node LLM inference bottlenecks and introduces NVRAR, an NVSHMEM-based hierarchical all-reduce that cuts small-message latency and end-to-end decode-heavy inference time. Engineers running multi-node inference get tested mitigations and performance engineering guidance
MOON Embedding: Multimodal Representation Learning for E-commerce Search Advertising (arxiv:cs). MOON is a three-stage multimodal representation pipeline (pretrain/post-train/application) yielding large online CTR gains and a production-ready infrastructure plus distillation. Important for engineers building multimodal embeddings tied to online business metrics and deployment