ML Engineering newsletter


            
        August 26, 2025
    
    
   Blaze Email
  


            🔧 Company Engineering Blogs
           

             How I Wrote Code That Allocates Cash Account Interest Daily as a Wealthfront Intern
            

             (eng.wealthfront.com)
            
            . Wealthfront intern details daily cash account interest allocation by category, design docs, test-driven development, Modern Java training, mentorship, and beta release to employees
           

             From massive models to mobile magic: The tech behind YouTube real-time generative AI effects
            

             (research.google)
            
            . YouTube real-time AI effects on mobile: distilling large generative models with pivotal tuning inversion (PTI), a UNet-MobileNet student, on-device MediaPipe pipelines, 30 fps playback with 6–10 ms inference on mobile GPUs, datasets built with the Monk Skin Tone Scale, and effects like Never Blink, Toon 2, Risen zombie
           

            🔬 Research Applications & Innovation
           

             3D Reconstruction From Public Photos with Machine Learning
            

             (blog.skz.dev)
            
            . 3D reconstruction from public photos using DepthPro, camera intrinsics, focal length, depth masks, Open3D visualization, and linear algebra for projecting pixels into 3D space
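The "projecting pixels into 3D space" step can be sketched with the standard pinhole camera model. This is a minimal illustration, not the article's code: the function names are hypothetical, the principal point is assumed to sit at the image centre, and the depth map and focal length (in pixels) are assumed to come from whatever model supplies them.

```python
def backproject_pixel(u, v, depth, focal_px, cx, cy):
    """Map pixel (u, v) with a known depth to a 3D point in the camera frame.

    Pinhole model: X = (u - cx) * Z / f, Y = (v - cy) * Z / f, Z = depth.
    """
    x = (u - cx) * depth / focal_px
    y = (v - cy) * depth / focal_px
    return (x, y, depth)


def backproject_depth_map(depth_map, focal_px, mask=None):
    """Turn a depth map (list of rows) into a list of 3D points.

    Assumption: the principal point is the image centre. Pixels whose
    mask entry is falsy are skipped.
    """
    height = len(depth_map)
    width = len(depth_map[0])
    cx, cy = width / 2, height / 2
    points = []
    for v, row in enumerate(depth_map):
        for u, depth in enumerate(row):
            if mask is not None and not mask[v][u]:
                continue
            points.append(backproject_pixel(u, v, depth, focal_px, cx, cy))
    return points
```

The resulting point list could then be handed to a viewer such as Open3D; that step is left out here to keep the sketch dependency-free.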
           

             Showcasing Research And Insights In London
            

             (cs.columbia.edu)
            
            . Columbia Data Management Group presents Suna for scalable confounder discovery, FaDE for fast provenance-based what-if analytics, HoliPaxos for predictable state machine replication, DocETL for document processing, plus VLDB panels
           

             2025-08-19: Paper Summary: Reproducibility Study on Network Deconvolution
            

             (ws-dl.blogspot.com)
            
            . Reproducing Ye et al.'s Network Deconvolution: replacing batch normalization (BN) with deconvolution layers, 116 of 134 tests reproducible within 10%, soft-reproducibility, ReScience C, GitHub workflow
           

             DGL in the Real World: Running GNNs on Real Use Cases
            

             (rocm.blogs.amd.com)
            
            . Four real-world GNN workloads on ROCm: GNN-FiLM, ARGO CPU runtime, GATNE multiplex heterogeneous networks, EEG-GCNN for neurological diagnostics, via DGL on AMD GPUs
           

             GPT-5: Have We Finally Hit The AI Scaling Wall?
            

             (backreaction.blogspot.com)
            
            . GPT-5 discussion on scaling laws, hardware limits, and potential scaling wall; analysis of model size, data, compute, training efficiency, bottlenecks, and empirical trends
           

             New AI model advances fusion power research by predicting the success of experiments
            

             (phys.org)
            
            . Lawrence Livermore Lab AI model predicts 74% ignition likelihood for 2022 NIF inertial confinement fusion shot, trained on 150,000 simulations plus real data, using Bayesian inference to fuse physics and data
           

            ⚖️ AI Fairness & Trustworthy ML
           

             SCS Tool Catches AI Bias Early
            

             (cs.cmu.edu)
            
            . FairSense simulates ML systems over long periods to measure bias, exploring feedback loops in banking-like environments with credit scores, loan decisions, and fairness metrics
           

             Disentangled Deep Smoothed Bootstrap for Fair Imbalanced Regression
            

             (freakonometrics.hypotheses.org)
            
            . Fair learning for imbalanced regression (IR) on tabular data: variational autoencoders (VAEs) with a disentangled latent space, Smoothed Bootstrap for data generation, IR benchmarks, Charpentier, arXiv
           

             Designing Trustworthy ML Models: Alan & Aida Discover Monotonicity in Machine Learning
            

             (towardsdatascience.com)
            
            . Monotonicity constraints in ML with XGBoost: ensure predictions rise with square footage and fall with age; illustrate violations, use Pandas toy data, plotting dips, and domain knowledge for trustworthy models
           

             Global Mathematics Lecture IV, Kyoto University
            

             (freakonometrics.hypotheses.org)
            
            . Global Mathematics Lecture IV at Kyoto University by Arthur Charpentier discusses algorithmic discrimination in predictive models, with emphasis on insurance, biases, fairness, and actuarial applications within graduate studies
           

            🔧 ML Infrastructure & Performance
           

             You could have invented CuTe hierarchical layout (but maybe not the rest of it?)
            

             (blog.ezyang.com)
            
            . CuTe layouts generalized from PyTorch strides; hierarchical sizes/strides as trees; co-lexicographic indexing; flatten/unflatten of multi-dimensional structures; practical examples with (2,2,2) and transposed strides
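As a rough illustration of the flat (non-hierarchical) case mentioned above, the sketch below maps a linear index to a memory offset by unflattening it co-lexicographically (leftmost mode varies fastest) and taking the dot product with the strides. The function names are hypothetical, and the nested size/stride trees that make CuTe layouts hierarchical are deliberately left out.

```python
def colex_coords(index, shape):
    """Unflatten a linear index into coordinates, leftmost mode fastest."""
    coords = []
    for size in shape:
        coords.append(index % size)
        index //= size
    return tuple(coords)


def offset(coords, strides):
    """Memory offset of a coordinate: dot product with the strides."""
    return sum(c * s for c, s in zip(coords, strides))


def layout(shape, strides):
    """List the offset visited by each linear index 0..size-1."""
    total = 1
    for size in shape:
        total *= size
    return [offset(colex_coords(i, shape), strides) for i in range(total)]


# Shape (2, 2, 2) with column-major strides visits memory in order.
print(layout((2, 2, 2), (1, 2, 4)))  # [0, 1, 2, 3, 4, 5, 6, 7]
# The same shape with the strides reversed (a transposed view).
print(layout((2, 2, 2), (4, 2, 1)))  # [0, 4, 2, 6, 1, 5, 3, 7]
```

Under this convention a layout is just the pair (shape, strides); changing only the strides reinterprets the same buffer without moving data.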
           

             vmanomaly Deep Dive: Smarter Alerting with AI (Tech Talk Companion)
            

             (victoriametrics.com)
            
            . vmanomaly uses ML-driven anomaly scoring to replace static thresholds, enabling smarter alerting for VictoriaMetrics, with an ML-powered recording rule, fine-tuning, handling missing data, and Grafana dashboards
           

             From Logic to Linear Algebra: How AI is Rewiring the Computer
            

             (journal.hexmos.com)
            
            . AI workloads shifted from branching logic to matrix multiplications; CPUs lose edge as GPUs, TPUs, and Groq optimize for linear algebra, FLOPS benchmarks, and economic efficiency
           

             Feature Platform — On Lakehouse
            

             (blog.devgenius.io)
            
            . Feature Platform on Real-Time Lakehouse uses Apache Flink, Apache Paimon, Apache Iceberg with Doris/StarRocks/Presto for online/offline feature stores, registration, and low-latency serving
           

             H100 vs GB200 NVL72 Training Benchmarks – Power, TCO, and Reliability Analysis, Software Improvement Over Time
            

             (semianalysis.com)
            
            . Benchmarks compare H100 vs GB200 NVL72 across MFU, TCO, and reliability; includes 128–2048 H100 scales, NeMo Megatron-LM, DGXC/NVIDIA benchmarks, and energy per token analysis
           

            🏦 Risk Modeling & Financial ML
           

             How Nubank models risk for smarter, scalable credit limit increases
            

             (building.nubank.com)
            
            . Nubank uses survival analysis, risk ranking, feature stores, CI/CD, and scalable engineering to optimize credit limit increases across Brazil, Mexico, and Colombia
           

             Another interesting decision, now for ‘Beyond Nelson-Siegel and splines: A model-agnostic Machine Learning framework for discount curve calibration, interpolation and extrapolation’
            

             (thierrymoudiki.github.io)
            
            . Beyond Nelson-Siegel and splines: model-agnostic ML for discount curve calibration, interpolation, extrapolation; linearized bond pricing; regression with Laguerre polynomials, kernels, regularized linear models; critique of reviewer concerns
           

             scikit-survival 0.25.0 with improved documentation released
            

             (k-d-w.org)
            
            . scikit-survival 0.25.0 adds scikit-learn 1.7 support; overhauled API docs; guidance on performance metrics; tutorials on C-index, IPCW, AUC, Brier score; examples with Veterans Cancer data
           

             A calmer, karma, CARMA algorithmic chameleon
            

             (sciencespot.co.uk)
            
            . CARMA models with missing data: INAG interpolation + auxiliary model estimates, nuisance noise handling, and Nesterov Accelerated Gradient optimisation for faster, robust parameter estimation in coloured noise scenarios
           

            📊 Statistical Methods & Probabilistic Models
           

             Simplest implementation of LDA topic modeling with collapsed Gibbs sampling
            

             (e-dorigatti.github.io)
            
            . Simplest LDA with collapsed Gibbs sampling: topic modeling, Dirichlet priors alpha/beta, phi and theta distributions, Z and W latent variables, synthetic data generation, NUM_TOPICS, VOCAB_SIZE, DOC_LENGTH, and Gibbs updates
           

             PCA on PCA
            

             (runningonnumbers.com)
            
            . PCA on Pete Crow-Armstrong 2025 stats; explains curse of dimensionality, principal components, covariance, eigenvalues/eigenvectors, and visual PCA plotting with NumPy, matplotlib, and quiver arrows
           

             Analytic Bandwidth Selection Without the Grid Search
            

             (gojiberries.io)
            
            . Analytic bandwidth selection via LOO cross-validation derivatives for NW regression and KDE; Newton methods in h or g; closed-form S'(h), R'(h), and LSCV derivatives; Gaussian/Epanechnikov kernels; Python hessband package with analytic, grid, golden, plug-in, fdnewton, bayes strategies
           

             Soft Inductive Biases: How Reformulating Constraint Architecture Dissolves Deep Learning’s…
            

             (medium.com/intuitionmachine)
            
            . Soft inductive biases reframing constraint architectures in deep learning; PAC-Bayes, compressibility, benign overfitting, double descent, overparametrization, Grothendieck analogy, Andrew Gordon Wilson framework
           

             Cracking the Density Code: Why MAF Flows Where KDE Stalls
            

             (towardsdatascience.com)
            
            . High-dimensional density estimation challenges with KDE vs autoregressive normalizing flows; KDE bandwidth, Scott’s rule, mean relative error; MAF foundations; block illustrations; 2D target distributions; animation of flow transformations; references to Papamakarios, Lakshminarayanan, Murphy; code snippets in Python; Gaussian samples; KDE error computation; Barnes methods; density estimation critique
           

            📚 Academic Research
           

             The C-index Multiverse
            

             (arxiv:cs)
            
            . Reveals reproducibility crisis where identical survival analysis metrics yield different results across software implementations. Critical for ML engineers ensuring consistent model evaluation and fair comparisons
           

             Interpretable Kernels
            

             (arxiv:stat)
            
            . Shows how kernel methods can be re-expressed as interpretable linear combinations of original features. Bridges powerful non-linear methods with explainable AI requirements
           

             Chunked Data Shapley: A Scalable Dataset Quality Assessment for Machine Learning
            

             (arxiv:cs)
            
            . Introduces scalable data valuation method achieving 80-2300x speedups over existing approaches while maintaining accuracy. Essential tool for large-scale data quality assessment
           

             Multiply Robust Conformal Risk Control with Coarsened Data
            

             (arxiv:stat)
            
            . Extends conformal prediction to handle missing data and censoring using semiparametric theory. Provides robust uncertainty quantification for real-world incomplete datasets
           

             Conformalized Exceptional Model Mining: Telling Where Your Model Performs (Not) Well
            

             (arxiv:cs)
            
            . Combines conformal prediction with subgroup discovery to identify data regions where models fail. Critical framework for understanding and improving model reliability
           

             Machine Learning for Medicine Must Be Interpretable, Shareable, Reproducible and Accountable by Design
            

             (arxiv:stat)
            
            . Establishes foundational principles for trustworthy ML in high-stakes domains like healthcare. Provides practical guidance for building interpretable and accountable systems
           

            👋 Before you go
           

            I've got a big favor to ask - keeping Blaze running isn't expensive, but it does all add up, so I'm asking readers like you to help, if you can. That's why I'm launching a Patreon page! Nothing flashy, just a way for folks who find value in these newsletters to chip in a little each month. In return, you'll get:
           

             Real say in how Blaze evolves — vote on new topics, features, topic curation ideas
            

             First dibs on merch (details still cooking)
            

             That warm fuzzy feeling knowing you're supporting something that saves you time and keeps you plugged into great tech writing
            

            If you are getting value from Blaze, checking this out would mean the world. And if you can't contribute, no worries; the newsletters keep coming either way, and you can follow along on Patreon for free.
Thanks for reading and being part of this nerdy corner of the internet. All the best - Alastair.
           

              Have an idea for how Blaze could be better? Please visit the
              
               feedback form
              
              to let us know. To update your preferences, or to unsubscribe, please go to
              
               blaze.email/unsubscribe
              
              .
             
