🔧 Company Engineering Blogs
           
Intern Experience at Lyft (eng.lyft.com). Lyft data scientists Morteza Taiebat and Han Gong recount internships on the Sustainability and Driver Loyalty teams, using difference-in-differences, hierarchical linear models, CPIDH, causal prediction, and budget optimization for EV adoption, driver productivity, and referral incentives.
            
LLM Embeddings Explained: A Visual and Intuitive Guide (huggingface.co). Explore how LLMs convert text to meaning, covering techniques, embeddings, tools, and visualization in natural language processing.
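
If you want to poke at this yourself, here's a minimal sketch of producing one embedding vector via mean pooling; the model choice and pooling strategy are illustrative assumptions, not taken from the guide:

```python
# Hypothetical sketch: text -> single embedding vector with a small HF model.
import torch
from transformers import AutoTokenizer, AutoModel

name = "sentence-transformers/all-MiniLM-L6-v2"   # assumed model for illustration
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)

inputs = tokenizer(["LLMs map text to vectors"], return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state    # (batch, tokens, dim)
mask = inputs["attention_mask"].unsqueeze(-1)     # ignore padding positions
embedding = (hidden * mask).sum(1) / mask.sum(1)  # mean-pool tokens -> (batch, dim)
print(embedding.shape)                            # e.g. torch.Size([1, 384])
```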
            
MLE-STAR: A state-of-the-art machine learning engineering agent (research.google). MLE-STAR automates machine learning engineering tasks by leveraging web search, code refinement, and ensemble strategies, achieving high performance in Kaggle competitions.
            
            🎯 Applications & Use Cases
           
3D Reconstruction From Public Photos with Machine Learning (blog.skz.dev). 3D reconstruction from public photos using DepthPro, camera intrinsics, focal length, depth masks, Open3D visualization, and linear algebra for projecting pixels into 3D space.
            
Word Embeddings: Theory and Analysis (blog.sparsh.dev). Overview of word embeddings, vocabulary discretization, and dense representations; highlights Word2Vec and GloVe, semantic similarity via cosine similarity, analogy examples, subword n-grams, and embedding dimensionality.
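
A toy illustration of the cosine-similarity idea (the 3-d vectors below are made up; real embeddings have hundreds of dimensions):

```python
# Cosine similarity: angle between vectors, ignoring their lengths.
import numpy as np

def cosine(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

king = np.array([0.9, 0.8, 0.1])    # fabricated vectors for illustration
queen = np.array([0.85, 0.75, 0.2])
apple = np.array([0.1, 0.2, 0.9])
print(cosine(king, queen))  # high: related words point the same way
print(cosine(king, apple))  # low: unrelated words point elsewhere
```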
            
Parallelization Strategies in Neural Networks (nwktimes.blogspot.com). Data, model, and tensor parallelism explained; 3D parallelism enables scalable AI training across GPUs and nodes, covering FNNs, forward/backward passes, activations, DMA, RDMA, and memory considerations.
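
To see what data parallelism buys you, here's a single-machine simulation; the linear model and shard count are illustrative assumptions, not from the post:

```python
# Data parallelism in miniature: shard the batch across "workers", compute
# per-shard gradients, then average (the job an all-reduce does across GPUs).
import numpy as np

w = np.zeros(3)                                   # shared model weights
X, y = np.random.randn(8, 3), np.random.randn(8)  # one global batch
shards = np.array_split(np.arange(8), 4)          # 4 simulated workers

grads = []
for idx in shards:                                # each worker sees only its shard
    err = X[idx] @ w - y[idx]
    grads.append(X[idx].T @ err / len(idx))       # local squared-error gradient
w -= 0.1 * np.mean(grads, axis=0)                 # averaged update == full-batch step
```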
            
Spam Classification with a Fine-Tuned LLM, Part IV: Model Training and Inference (rnowling.github.io). Fine-tuning Llama-3.2-1B for spam classification with Hugging Face datasets, transformers, TrainingArguments, Trainer, DataCollatorWithPadding; training, inference, metrics, and performance scaling versus SGD logistic regression.
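
A hedged skeleton of this kind of fine-tuning loop; the toy dataset, labels, and hyperparameters below are placeholders rather than the author's actual values:

```python
# Sketch of sequence-classification fine-tuning with the HF Trainer API.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          DataCollatorWithPadding, Trainer, TrainingArguments)

name = "meta-llama/Llama-3.2-1B"                 # gated model; requires HF access
tokenizer = AutoTokenizer.from_pretrained(name)
tokenizer.pad_token = tokenizer.eos_token        # Llama ships without a pad token
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=2)
model.config.pad_token_id = tokenizer.pad_token_id

# Placeholder data; the post uses a real spam corpus.
train_ds = Dataset.from_dict({"text": ["win a free prize now", "meeting at 3pm"],
                              "label": [1, 0]})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="spam-clf", num_train_epochs=1,
                           per_device_train_batch_size=8),
    train_dataset=train_ds.map(tokenize, batched=True),
    data_collator=DataCollatorWithPadding(tokenizer),  # pads per batch
)
trainer.train()
```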
            
CrowdStrike’s Approach to Better Machine Learning Evaluation Using Strategic Data Splitting (crowdstrike.com). CrowdStrike employs strategic data splitting, like blocking by machine, to prevent data leakage in cybersecurity ML models, ensuring reliable performance against novel threats.
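
Blocking by machine can be approximated with scikit-learn's GroupKFold; this is one standard implementation of the idea, not necessarily CrowdStrike's exact pipeline:

```python
# GroupKFold keeps all samples from one machine in the same fold,
# so no host's data leaks across the train/test boundary.
import numpy as np
from sklearn.model_selection import GroupKFold

X = np.random.randn(10, 4)
y = np.random.randint(0, 2, size=10)
machine_id = np.array([0, 0, 1, 1, 2, 2, 3, 3, 4, 4])  # hypothetical host labels

for train_idx, test_idx in GroupKFold(n_splits=5).split(X, y, groups=machine_id):
    # no machine appears on both sides of the split
    assert set(machine_id[train_idx]).isdisjoint(machine_id[test_idx])
```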
            
3D Line Drawings (amritkwatra.com). Amritansh Kwatra explores 3D line drawings using GANs, 3D Gaussian Splatting, image transformation, and techniques for interactive artistic rendering.
            
            🔧 ML Engineering & Infrastructure
           
Translating Cython to Mojo, a first attempt (fnands.com). Exploring the translation of Cython code from scikit-learn to Mojo, focusing on DBSCAN's inner loop for improved performance.
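
For context, the hot path looks roughly like this in plain Python/NumPy; a sketch of the kind of region query DBSCAN repeats per point, not scikit-learn's actual Cython or the Mojo port:

```python
# DBSCAN's inner loop, naively: find all neighbors of point i within eps.
import numpy as np

def region_query(X, i, eps):
    d2 = np.sum((X - X[i]) ** 2, axis=1)  # squared distances to every point
    return np.where(d2 <= eps * eps)[0]   # indices inside the eps-ball

X = np.random.rand(100, 2)
neighbors = region_query(X, 0, eps=0.1)   # called once per point: worth compiling
```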
            
The evolution of Grab's machine learning feature store (engineering.grab.com). Grab evolves its ML feature store by adopting a feature table architecture with AWS Aurora, enhancing performance and addressing complex data management challenges.
            
MLOrbs?: MLOps in the database with orbital and dbt (emilyriederer.com). MLOps in the analytical database using orbital's sklearn-to-sql and tidymodels, sqlglot, and dbt; churn modeling with IBM telecom data; zero-infrastructure deployment inside dbt pipelines.
            
An efficient path to production AI: Kakao’s journey with JAX and Cloud TPUs (cloud.google.com). Kakao shifts to Google Cloud TPUs and JAX, detailing data pipelines with Grain, XPK on Kubernetes, MaxText customization, multi-source data blends, and MoE training on Kanana models.
            
Avalanche stack and real-time streaming applications at Nu (building.nubank.com). Nubank's Avalanche stack enables real-time analytics with Kubernetes, Kafka, Flink, and Pinot for fraud detection, Autopilot risk calibration, On-Demand Features Handler, and case studies in opportunistic loans.
            
I built a toy TPU that can do inference and training on the XOR problem (tinytpu.com). The Tiny-TPU project explores a toy TPU for inference and training on XOR, detailing MLP architecture, XOR data, matrix multiplications, and Verilog-inspired hardware concepts.
            
            🧮 Mathematical Foundations & Theory
           
Paying attention to feature distribution alignment (alexshtf.github.io). Weighted-orthogonality, Legendre bases, and CDF-based mappings for feature decorrelation; quantile transformers, LegendrePolynomialFeatures, and pipelines in SciPy/Scikit-Learn with simulated distributions.
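
The CDF-then-orthogonal-basis recipe, sketched with stock tools; the post's LegendrePolynomialFeatures transformer is approximated here with NumPy's legvander:

```python
# Map a skewed feature to uniform [0, 1] via its empirical CDF, rescale to
# [-1, 1] (Legendre's natural domain), then expand in a Legendre basis.
import numpy as np
from sklearn.preprocessing import QuantileTransformer

x = np.random.lognormal(size=(1000, 1))                   # simulated skewed feature
u = QuantileTransformer(output_distribution="uniform").fit_transform(x)
z = 2.0 * u - 1.0                                         # shift to [-1, 1]
phi = np.polynomial.legendre.legvander(z.ravel(), deg=4)  # columns P_0..P_4 of z
```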
            
Vandermonde Matrices are Merely Exponentially Ill-Conditioned (ethanepperly.com). Gautschi’s bound on Vandermonde conditioning, exponential ill-conditioning, block Krylov iterations, RBKI, elementary symmetric polynomials, Lagrange vs Vandermonde, robust FTA, inverse Vandermonde entries.
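
You can watch the exponential blow-up numerically in a couple of lines (equispaced nodes chosen for illustration):

```python
# Condition number of the Vandermonde matrix on n equispaced points in [-1, 1].
import numpy as np

for n in (5, 10, 15, 20):
    V = np.vander(np.linspace(-1, 1, n))
    print(n, f"{np.linalg.cond(V):.2e}")  # grows roughly exponentially in n
```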
            
Derivatives, Gradients, Jacobians and Hessians – Oh My! (blog.demofox.org). Derivatives, gradients, Jacobians and Hessians explained: optimize with gradient descent, compute partial derivatives, build Jacobians, explore determinants, and apply in rendering and ML.
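
A tiny worked example: finite-difference partial derivatives assembled into a gradient, then used for descent (the function and step size are my choices for illustration):

```python
# Gradient of f(x, y) = x^2 + 3y^2 by central differences, then gradient descent.
import numpy as np

def f(p):
    return p[0] ** 2 + 3 * p[1] ** 2

def grad(f, p, h=1e-6):
    # one partial derivative per coordinate direction
    return np.array([(f(p + h * e) - f(p - h * e)) / (2 * h)
                     for e in np.eye(len(p))])

p = np.array([2.0, 1.0])
for _ in range(50):
    p -= 0.1 * grad(f, p)   # step downhill
print(p)                    # close to the minimum at [0, 0]
```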
            
Gaussian Integration by Parts (ethanepperly.com). Gaussian integration by parts simplifies computations of moments and eigenvalue estimates in random processes, with insights from Tropp and an application to power iteration.
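
The core identity is Stein's lemma: for X ~ N(0, 1), E[X f(X)] = E[f'(X)]. A quick Monte Carlo sanity check with f(x) = x^3:

```python
# Both sides should approach E[X^4] = 3 for a standard normal X.
import numpy as np

x = np.random.default_rng(0).standard_normal(1_000_000)
print(np.mean(x * x**3))   # E[X f(X)] with f(x) = x^3
print(np.mean(3 * x**2))   # E[f'(X)] with f'(x) = 3x^2
```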
            
New Physics-Inspired Proof Probes the Borders of Disorder (quantamagazine.org). Band matrix thresholds for localization-delocalization transitions; Yau, Yin, Erdős, Knowles, and collaborators prove delocalization just above predicted band widths in 1D, 2D, 3D.
            
            🧠 Neural Networks & Deep Learning
           
modded-nanogpt: Analyzing value-embedding-, UNet-, and x0-lambdas (snimu.github.io). modded-nanogpt analyzes value-embedding-, UNet-, and x0-lambdas, detailing three residual-mixing tricks, learned lambda dynamics, layer skipping, training effects, and links to learning-rate and sequence-length schedules.
            
Output Latent Spaces in Multihead Attention (mccormickml.com). Exploration of shared output latent spaces in Multihead Latent Attention models, enhancing efficiency in deep learning with techniques like SVD and model compression.
            
Dot Product in the Attention Mechanism (eranraviv.com). Explores the dot product in attention mechanisms, vector similarity, cosine calculations, and implications for self-attention in deep learning models.
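
The mechanism in a few lines of NumPy: similarity scores are dot products, softmaxed into mixing weights (a minimal sketch, not any particular model's implementation):

```python
# Scaled dot-product attention: scores -> softmax weights -> weighted values.
import numpy as np

def attention(Q, K, V):
    scores = Q @ K.T / np.sqrt(K.shape[-1])               # dot-product similarity
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)                    # softmax: rows sum to 1
    return w @ V                                          # weighted mix of values

Q = K = V = np.random.randn(4, 8)                         # self-attention: one source
print(attention(Q, K, V).shape)                           # (4, 8)
```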
            
Exploring fun parts of Neural Network (shivasurya.me). Explores neural networks from XOR basics in NumPy to sigmoid versus ReLU, training dynamics, 3Blue1Brown insights, MNIST hints, and implications for security reviews and LLMs.
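
The XOR starting point, as a self-contained NumPy sketch (architecture and hyperparameters are my choices, not necessarily the post's):

```python
# A 2-4-1 sigmoid network learning XOR by hand-rolled backprop.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(size=(2, 4)), np.zeros((1, 4))
W2, b2 = rng.normal(size=(4, 1)), np.zeros((1, 1))
sig = lambda z: 1.0 / (1.0 + np.exp(-z))

for _ in range(5000):
    h = sig(X @ W1 + b1)              # forward: hidden activations
    out = sig(h @ W2 + b2)            # forward: prediction
    d2 = (out - y) * out * (1 - out)  # backward: output delta (MSE loss)
    d1 = (d2 @ W2.T) * h * (1 - h)    # backward: hidden delta
    W2 -= h.T @ d2; b2 -= d2.sum(0)   # full-batch gradient step, lr = 1
    W1 -= X.T @ d1; b1 -= d1.sum(0)

print(out.round(2).ravel())           # approx [0, 1, 1, 0] once trained
```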
            
Deep linear networks (danmackinlay.name). Exploration of deep linear networks, gradient flow, singular value dynamics, and gated models with a focus on feature learning and hierarchical structures.
            
2025-08-19: Paper Summary: Reproducibility Study on Network Deconvolution (ws-dl.blogspot.com). Reproducing Ye et al.'s Network Deconvolution: BN replacement with deconvolution layers, 134 tests, 116 reproducible within 10%, soft-reproducibility, ReScience C, GitHub workflow.
            
Soft Inductive Biases: How Reformulating Constraint Architecture Dissolves Deep Learning’s… (medium.com/intuitionmachine). Soft inductive biases reframe constraint architectures in deep learning; PAC-Bayes, compressibility, benign overfitting, double descent, overparametrization, Grothendieck analogy, Andrew Gordon Wilson framework.
            
Using geometry and physics to explain feature learning in deep neural networks (phys.org). Spring-block phenomenology models feature learning in deep neural networks, linking data separation across layers to friction, noise, and training dynamics, revealing relations akin to thermodynamics.
            
            ⚙️ Optimization & Statistical Methods
           
Another interesting decision, now for ‘Beyond Nelson-Siegel and splines: A model-agnostic Machine Learning framework for discount curve calibration, interpolation and extrapolation’ (thierrymoudiki.github.io). Beyond Nelson-Siegel and splines: model-agnostic ML for discount curve calibration, interpolation, extrapolation; linearized bond pricing; regression with Laguerre polynomials, kernels, regularized linear models; critique of reviewer concerns.
            
Using Imperfect Synthetic Data in Downstream Inference Tasks (donskerclass.github.io). Introduces a hyperparameter-free estimator based on the generalized method of moments to combine imperfect synthetic data from large language models with real data for valid inference.
            
Boosting any randomized-based learner for regression, classification and univariate/multivariate time series forecasting (thierrymoudiki.github.io). Explore boosting randomized learners for regression, classification, and time series forecasting using Python's cybooster library.
            
Smarter training for smarter AI (cs.jhu.edu). Johns Hopkins researchers introduce MomSPS and USAM optimization methods to speed up deep learning training, reduce hyperparameter tuning, and improve model robustness and real-world generalization.
            
Dion: the distributed orthonormal update revolution is here (microsoft.com). Microsoft Research introduces Dion, a scalable distributed orthonormal update optimizer leveraging low-rank approximation, amortized power iteration, and error feedback to improve large-scale AI model training efficiency.
            
Disentangled Deep Smoothed Bootstrap for Fair Imbalanced Regression (freakonometrics.hypotheses.org). Imbalanced regression with variational autoencoders: a disentangled latent space plus a smoothed bootstrap for data generation in fair learning, evaluated on IR benchmarks with tabular data (Charpentier et al., arXiv).
            
            📚 Academic Research
           
xRFM: Accurate, scalable, and interpretable feature learning models for tabular data (arxiv:stat). Combines feature learning kernel machines with tree structures, outperforming GBDTs across 300 datasets with native interpretability. Offers a scikit-learn API, making it immediately practical for tabular data workflows.
            
SIFOTL: A Principled, Statistically-Informed Fidelity-Optimization Method for Tabular Learning (arxiv:cs). Cornell-developed method identifies data shift factors using only privacy-safe summary statistics, achieving F1 scores of 0.86-0.96. Essential tool for monitoring production ML systems while maintaining privacy compliance.
            
BOASF: A Unified Framework for Speeding up Automatic Machine Learning via Adaptive Successive Filtering (arxiv:cs). Combines Bayesian optimization with multi-armed bandits to accelerate AutoML model selection and hyperparameter optimization. Provides better anytime performance than existing AutoML methods across various time budgets.
            
FuXi-β: Towards a Lightweight and Fast Large-Scale Generative Recommendation Model (arxiv:cs). Huawei-developed framework eliminates attention computation bottlenecks in Transformer recommendation models, achieving 27-47% NDCG@10 improvements. Demonstrates practical scaling optimizations for production recommendation systems.
            
Controllable Feature Whitening for Hyperparameter-Free Bias Mitigation (arxiv:cs). KAIST method eliminates linear correlations between target and bias features without adversarial training complexity. Provides stable bias mitigation across four benchmark datasets with controllable fairness-utility trade-offs.
            
Kernel-based Equalized Odds: A Quantification of Accuracy-Fairness Trade-off in Fair Representation Learning (arxiv:stat). Georgia Tech develops rigorous kernel formulation for fairness criteria with formal concentration inequalities and performance guarantees. Provides mathematical foundation for principled fairness compliance in supervised learning.
            
            👋 Before you go
           
I've got a big favor to ask: keeping Blaze running isn't expensive, but it does all add up, so I'm asking readers like you to help if you can. That's why I'm launching a Patreon page! Nothing flashy, just a way for folks who find value in these newsletters to chip in a little each month. In return, you'll get:

- Real say in how Blaze evolves: vote on new topics, features, and curation ideas
- First dibs on merch (details still cooking)
- That warm fuzzy feeling of knowing you're supporting something that saves you time and keeps you plugged into great tech writing

If you're getting value from Blaze, checking this out would mean the world. And if you can't contribute, no worries: the newsletters keep coming either way, and you can follow along on Patreon for free.

Thanks for reading and being part of this nerdy corner of the internet. All the best - Alastair.
            