🔧 Company Engineering Blogs
           
Intern Experience at Lyft (eng.lyft.com). Lyft data scientists Morteza Taiebat and Han Gong recount internships on the Sustainability and Driver Loyalty teams, using difference-in-differences, hierarchical linear models, CPIDH, causal prediction, and budget optimization for EV adoption, driver productivity, and referral incentives.
            
LLM Embeddings Explained: A Visual and Intuitive Guide (huggingface.co). Explore how LLMs convert text to meaning, covering techniques, embeddings, tools, and visualization in natural language processing.
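
If you want to poke at this yourself, here's a minimal sketch of producing one embedding vector via mean pooling; the model choice and pooling strategy are illustrative assumptions, not taken from the guide:

```python
# Hypothetical sketch: text -> single embedding vector with a small HF model.
import torch
from transformers import AutoTokenizer, AutoModel

name = "sentence-transformers/all-MiniLM-L6-v2"   # assumed model for illustration
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)

inputs = tokenizer(["LLMs map text to vectors"], return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state    # (batch, tokens, dim)
mask = inputs["attention_mask"].unsqueeze(-1)     # ignore padding positions
embedding = (hidden * mask).sum(1) / mask.sum(1)  # mean-pool tokens -> (batch, dim)
print(embedding.shape)                            # e.g. torch.Size([1, 384])
```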
            
MLE-STAR: A state-of-the-art machine learning engineering agent (research.google). MLE-STAR automates machine learning engineering tasks by leveraging web search, code refinement, and ensemble strategies, achieving high performance in Kaggle competitions.
            
            🎯 Applications & Use Cases
           
3D Reconstruction From Public Photos with Machine Learning (blog.skz.dev). 3D reconstruction from public photos using DepthPro, camera intrinsics, focal length, depth masks, Open3D visualization, and linear algebra for projecting pixels into 3D space.
            
Word Embeddings: Theory and Analysis (blog.sparsh.dev). Overview of word embeddings, vocabulary discretization, and dense representations; highlights Word2Vec and GloVe, semantic similarity via cosine similarity, analogy examples, subword n-grams, and embedding dimensionality.
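
A toy illustration of the cosine-similarity idea (the 3-d vectors below are made up; real embeddings have hundreds of dimensions):

```python
# Cosine similarity: angle between vectors, ignoring their lengths.
import numpy as np

def cosine(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

king = np.array([0.9, 0.8, 0.1])    # fabricated vectors for illustration
queen = np.array([0.85, 0.75, 0.2])
apple = np.array([0.1, 0.2, 0.9])
print(cosine(king, queen))  # high: related words point the same way
print(cosine(king, apple))  # low: unrelated words point elsewhere
```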
            
Parallelization Strategies in Neural Networks (nwktimes.blogspot.com). Data, model, and tensor parallelism explained; 3D parallelism enables scalable AI training across GPUs and nodes, covering FNNs, forward/backward passes, activations, DMA, RDMA, and memory considerations.
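
To see what data parallelism buys you, here's a single-machine simulation; the linear model and shard count are illustrative assumptions, not from the post:

```python
# Data parallelism in miniature: shard the batch across "workers", compute
# per-shard gradients, then average (the job an all-reduce does across GPUs).
import numpy as np

w = np.zeros(3)                                   # shared model weights
X, y = np.random.randn(8, 3), np.random.randn(8)  # one global batch
shards = np.array_split(np.arange(8), 4)          # 4 simulated workers

grads = []
for idx in shards:                                # each worker sees only its shard
    err = X[idx] @ w - y[idx]
    grads.append(X[idx].T @ err / len(idx))       # local squared-error gradient
w -= 0.1 * np.mean(grads, axis=0)                 # averaged update == full-batch step
```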
            
Spam Classification with a Fine-Tuned LLM, Part IV: Model Training and Inference (rnowling.github.io). Fine-tuning Llama-3.2-1B for spam classification with Hugging Face datasets, transformers, TrainingArguments, Trainer, DataCollatorWithPadding; training, inference, metrics, and performance scaling versus SGD logistic regression.
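
A hedged skeleton of this kind of fine-tuning loop; the toy dataset, labels, and hyperparameters below are placeholders rather than the author's actual values:

```python
# Sketch of sequence-classification fine-tuning with the HF Trainer API.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          DataCollatorWithPadding, Trainer, TrainingArguments)

name = "meta-llama/Llama-3.2-1B"                 # gated model; requires HF access
tokenizer = AutoTokenizer.from_pretrained(name)
tokenizer.pad_token = tokenizer.eos_token        # Llama ships without a pad token
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=2)
model.config.pad_token_id = tokenizer.pad_token_id

# Placeholder data; the post uses a real spam corpus.
train_ds = Dataset.from_dict({"text": ["win a free prize now", "meeting at 3pm"],
                              "label": [1, 0]})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="spam-clf", num_train_epochs=1,
                           per_device_train_batch_size=8),
    train_dataset=train_ds.map(tokenize, batched=True),
    data_collator=DataCollatorWithPadding(tokenizer),  # pads per batch
)
trainer.train()
```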
            
CrowdStrike’s Approach to Better Machine Learning Evaluation Using Strategic Data Splitting (crowdstrike.com). CrowdStrike employs strategic data splitting, like blocking by machine, to prevent data leakage in cybersecurity ML models, ensuring reliable performance against novel threats.
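
Blocking by machine can be approximated with scikit-learn's GroupKFold; this is one standard implementation of the idea, not necessarily CrowdStrike's exact pipeline:

```python
# GroupKFold keeps all samples from one machine in the same fold,
# so no host's data leaks across the train/test boundary.
import numpy as np
from sklearn.model_selection import GroupKFold

X = np.random.randn(10, 4)
y = np.random.randint(0, 2, size=10)
machine_id = np.array([0, 0, 1, 1, 2, 2, 3, 3, 4, 4])  # hypothetical host labels

for train_idx, test_idx in GroupKFold(n_splits=5).split(X, y, groups=machine_id):
    # no machine appears on both sides of the split
    assert set(machine_id[train_idx]).isdisjoint(machine_id[test_idx])
```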
            
3D Line Drawings (amritkwatra.com). Amritansh Kwatra explores 3D line drawings using GANs, 3D Gaussian Splatting, image transformation, and techniques for interactive artistic rendering.
            
            🔧 ML Engineering & Infrastructure
           
Translating Cython to Mojo, a first attempt (fnands.com). Exploring the translation of Cython code from scikit-learn to Mojo, focusing on DBSCAN's inner loop for improved performance.
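
For context, the hot path looks roughly like this in plain Python/NumPy; a sketch of the kind of region query DBSCAN repeats per point, not scikit-learn's actual Cython or the Mojo port:

```python
# DBSCAN's inner loop, naively: find all neighbors of point i within eps.
import numpy as np

def region_query(X, i, eps):
    d2 = np.sum((X - X[i]) ** 2, axis=1)  # squared distances to every point
    return np.where(d2 <= eps * eps)[0]   # indices inside the eps-ball

X = np.random.rand(100, 2)
neighbors = region_query(X, 0, eps=0.1)   # called once per point: worth compiling
```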
            
The evolution of Grab's machine learning feature store (engineering.grab.com). Grab evolves its ML feature store by adopting a feature table architecture with AWS Aurora, enhancing performance and addressing complex data management challenges.
            
MLOrbs?: MLOps in the database with orbital and dbt (emilyriederer.com). MLOps in the analytical database using orbital's sklearn-to-sql and tidymodels, sqlglot, and dbt; churn modeling with IBM telecom data; zero-infrastructure deployment inside dbt pipelines.
            
An efficient path to production AI: Kakao’s journey with JAX and Cloud TPUs (cloud.google.com). Kakao shifts to Google Cloud TPUs and JAX, detailing data pipelines with Grain, XPK on Kubernetes, MaxText customization, multi-source data blends, and MoE training on Kanana models.
            
Avalanche stack and real-time streaming applications at Nu (building.nubank.com). Nubank's Avalanche stack enables real-time analytics with Kubernetes, Kafka, Flink, and Pinot for fraud detection, Autopilot risk calibration, On-Demand Features Handler, and case studies in opportunistic loans.
            
I built a toy TPU that can do inference and training on the XOR problem (tinytpu.com). The Tiny-TPU project explores a toy TPU for inference and training on XOR, detailing MLP architecture, XOR data, matrix multiplications, and Verilog-inspired hardware concepts.
            
            🧮 Mathematical Foundations & Theory
           
Paying attention to feature distribution alignment (alexshtf.github.io). Weighted-orthogonality, Legendre bases, and CDF-based mappings for feature decorrelation; quantile transformers, LegendrePolynomialFeatures, and pipelines in SciPy/Scikit-Learn with simulated distributions.
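
The CDF-then-orthogonal-basis recipe, sketched with stock tools; the post's LegendrePolynomialFeatures transformer is approximated here with NumPy's legvander:

```python
# Map a skewed feature to uniform [0, 1] via its empirical CDF, rescale to
# [-1, 1] (Legendre's natural domain), then expand in a Legendre basis.
import numpy as np
from sklearn.preprocessing import QuantileTransformer

x = np.random.lognormal(size=(1000, 1))                   # simulated skewed feature
u = QuantileTransformer(output_distribution="uniform").fit_transform(x)
z = 2.0 * u - 1.0                                         # shift to [-1, 1]
phi = np.polynomial.legendre.legvander(z.ravel(), deg=4)  # columns P_0..P_4 of z
```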
            
Vandermonde Matrices are Merely Exponentially Ill-Conditioned (ethanepperly.com). Gautschi’s bound on Vandermonde conditioning, exponential ill-conditioning, block Krylov iterations, RBKI, elementary symmetric polynomials, Lagrange vs Vandermonde, robust FTA, inverse Vandermonde entries.
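
You can watch the exponential blow-up numerically in a couple of lines (equispaced nodes chosen for illustration):

```python
# Condition number of the Vandermonde matrix on n equispaced points in [-1, 1].
import numpy as np

for n in (5, 10, 15, 20):
    V = np.vander(np.linspace(-1, 1, n))
    print(n, f"{np.linalg.cond(V):.2e}")  # grows roughly exponentially in n
```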
            
Derivatives, Gradients, Jacobians and Hessians – Oh My! (blog.demofox.org). Derivatives, gradients, Jacobians and Hessians explained: optimize with gradient descent, compute partial derivatives, build Jacobians, explore determinants, and apply in rendering and ML.
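
A tiny worked example: finite-difference partial derivatives assembled into a gradient, then used for descent (the function and step size are my choices for illustration):

```python
# Gradient of f(x, y) = x^2 + 3y^2 by central differences, then gradient descent.
import numpy as np

def f(p):
    return p[0] ** 2 + 3 * p[1] ** 2

def grad(f, p, h=1e-6):
    # one partial derivative per coordinate direction
    return np.array([(f(p + h * e) - f(p - h * e)) / (2 * h)
                     for e in np.eye(len(p))])

p = np.array([2.0, 1.0])
for _ in range(50):
    p -= 0.1 * grad(f, p)   # step downhill
print(p)                    # close to the minimum at [0, 0]
```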
            
Gaussian Integration by Parts (ethanepperly.com). Gaussian integration by parts simplifies computations of moments and eigenvalue estimates in random processes, with insights from Tropp and an application to power iteration.
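
The core identity is Stein's lemma: for X ~ N(0, 1), E[X f(X)] = E[f'(X)]. A quick Monte Carlo sanity check with f(x) = x^3:

```python
# Both sides should approach E[X^4] = 3 for a standard normal X.
import numpy as np

x = np.random.default_rng(0).standard_normal(1_000_000)
print(np.mean(x * x**3))   # E[X f(X)] with f(x) = x^3
print(np.mean(3 * x**2))   # E[f'(X)] with f'(x) = 3x^2
```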
            
New Physics-Inspired Proof Probes the Borders of Disorder (quantamagazine.org). Band matrix thresholds for localization-delocalization transitions; Yau, Yin, Erdős, Knowles, and collaborators prove delocalization just above predicted band widths in 1D, 2D, 3D.
            
            🧠 Neural Networks & Deep Learning
           
modded-nanogpt: Analyzing value-embedding-, UNet-, and x0-lambdas (snimu.github.io). modded-nanogpt analyzes value-embedding-, UNet-, and x0-lambdas, detailing three residual-mixing tricks, learned lambda dynamics, layer skipping, training effects, and links to learning-rate and sequence-length schedules.
            
Output Latent Spaces in Multihead Attention (mccormickml.com). Exploration of shared output latent spaces in Multihead Latent Attention models, enhancing efficiency in deep learning with techniques like SVD and model compression.
            
Dot Product in the Attention Mechanism (eranraviv.com). Explores the dot product in attention mechanisms, vector similarity, cosine calculations, and implications for self-attention in deep learning models.
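
The mechanism in a few lines of NumPy: similarity scores are dot products, softmaxed into mixing weights (a minimal sketch, not any particular model's implementation):

```python
# Scaled dot-product attention: scores -> softmax weights -> weighted values.
import numpy as np

def attention(Q, K, V):
    scores = Q @ K.T / np.sqrt(K.shape[-1])               # dot-product similarity
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)                    # softmax: rows sum to 1
    return w @ V                                          # weighted mix of values

Q = K = V = np.random.randn(4, 8)                         # self-attention: one source
print(attention(Q, K, V).shape)                           # (4, 8)
```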
            
Exploring fun parts of Neural Network (shivasurya.me). Explores neural networks from XOR basics in NumPy to sigmoid versus ReLU, training dynamics, 3Blue1Brown insights, MNIST hints, and implications for security reviews and LLMs.
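
The XOR starting point, as a self-contained NumPy sketch (architecture and hyperparameters are my choices, not necessarily the post's):

```python
# A 2-4-1 sigmoid network learning XOR by hand-rolled backprop.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(size=(2, 4)), np.zeros((1, 4))
W2, b2 = rng.normal(size=(4, 1)), np.zeros((1, 1))
sig = lambda z: 1.0 / (1.0 + np.exp(-z))

for _ in range(5000):
    h = sig(X @ W1 + b1)              # forward: hidden activations
    out = sig(h @ W2 + b2)            # forward: prediction
    d2 = (out - y) * out * (1 - out)  # backward: output delta (MSE loss)
    d1 = (d2 @ W2.T) * h * (1 - h)    # backward: hidden delta
    W2 -= h.T @ d2; b2 -= d2.sum(0)   # full-batch gradient step, lr = 1
    W1 -= X.T @ d1; b1 -= d1.sum(0)

print(out.round(2).ravel())           # approx [0, 1, 1, 0] once trained
```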
            
Deep linear networks (danmackinlay.name). Exploration of deep linear networks, gradient flow, singular value dynamics, and gated models with a focus on feature learning and hierarchical structures.
            
2025-08-19: Paper Summary: Reproducibility Study on Network Deconvolution (ws-dl.blogspot.com). Reproducing Ye et al.'s Network Deconvolution: BN replacement with deconvolution layers, 134 tests, 116 reproducible within 10%, soft-reproducibility, ReScience C, GitHub workflow.
            
Soft Inductive Biases: How Reformulating Constraint Architecture Dissolves Deep Learning’s… (medium.com/intuitionmachine). Soft inductive biases reframe constraint architectures in deep learning; PAC-Bayes, compressibility, benign overfitting, double descent, overparametrization, Grothendieck analogy, Andrew Gordon Wilson framework.
            
Using geometry and physics to explain feature learning in deep neural networks (phys.org). Spring-block phenomenology models feature learning in deep neural networks, linking data separation across layers to friction, noise, and training dynamics, revealing relations akin to thermodynamics.
            
            ⚙️ Optimization & Statistical Methods
           
Another interesting decision, now for ‘Beyond Nelson-Siegel and splines: A model-agnostic Machine Learning framework for discount curve calibration, interpolation and extrapolation’ (thierrymoudiki.github.io). Beyond Nelson-Siegel and splines: model-agnostic ML for discount curve calibration, interpolation, extrapolation; linearized bond pricing; regression with Laguerre polynomials, kernels, regularized linear models; critique of reviewer concerns.
            
Using Imperfect Synthetic Data in Downstream Inference Tasks (donskerclass.github.io). Introduces a hyperparameter-free estimator based on the generalized method of moments to combine imperfect synthetic data from large language models with real data for valid inference.
            
Boosting any randomized-based learner for regression, classification and univariate/multivariate time series forecasting (thierrymoudiki.github.io). Explore boosting randomized learners for regression, classification, and time series forecasting using Python's cybooster library.
            
Smarter training for smarter AI (cs.jhu.edu). Johns Hopkins researchers introduce MomSPS and USAM optimization methods to speed up deep learning training, reduce hyperparameter tuning, and improve model robustness and real-world generalization.
            
Dion: the distributed orthonormal update revolution is here (microsoft.com). Microsoft Research introduces Dion, a scalable distributed orthonormal update optimizer leveraging low-rank approximation, amortized power iteration, and error feedback to improve large-scale AI model training efficiency.
            
Disentangled Deep Smoothed Bootstrap for Fair Imbalanced Regression (freakonometrics.hypotheses.org). Imbalanced regression with variational autoencoders: a disentangled latent space plus a smoothed bootstrap for data generation in fair learning, evaluated on IR benchmarks with tabular data (Charpentier et al., arXiv).
            
            📚 Academic Research
           
xRFM: Accurate, scalable, and interpretable feature learning models for tabular data (arxiv:stat). Combines feature learning kernel machines with tree structures, outperforming GBDTs across 300 datasets with native interpretability. Offers a scikit-learn API, making it immediately practical for tabular data workflows.
            
SIFOTL: A Principled, Statistically-Informed Fidelity-Optimization Method for Tabular Learning (arxiv:cs). Cornell-developed method identifies data shift factors using only privacy-safe summary statistics, achieving F1 scores of 0.86-0.96. Essential tool for monitoring production ML systems while maintaining privacy compliance.
            
BOASF: A Unified Framework for Speeding up Automatic Machine Learning via Adaptive Successive Filtering (arxiv:cs). Combines Bayesian optimization with multi-armed bandits to accelerate AutoML model selection and hyperparameter optimization. Provides better anytime performance than existing AutoML methods across various time budgets.
            
FuXi-β: Towards a Lightweight and Fast Large-Scale Generative Recommendation Model (arxiv:cs). Huawei-developed framework eliminates attention computation bottlenecks in Transformer recommendation models, achieving 27-47% NDCG@10 improvements. Demonstrates practical scaling optimizations for production recommendation systems.
            
Controllable Feature Whitening for Hyperparameter-Free Bias Mitigation (arxiv:cs). KAIST method eliminates linear correlations between target and bias features without adversarial training complexity. Provides stable bias mitigation across four benchmark datasets with controllable fairness-utility trade-offs.
            
Kernel-based Equalized Odds: A Quantification of Accuracy-Fairness Trade-off in Fair Representation Learning (arxiv:stat). Georgia Tech develops rigorous kernel formulation for fairness criteria with formal concentration inequalities and performance guarantees. Provides mathematical foundation for principled fairness compliance in supervised learning.
            
            👋 Before you go
           
I've got a big favor to ask: keeping Blaze running isn't expensive, but it does all add up, so I'm asking readers like you to help if you can. That's why I'm launching a Patreon page! Nothing flashy, just a way for folks who find value in these newsletters to chip in a little each month. In return, you'll get:

- Real say in how Blaze evolves: vote on new topics, features, and curation ideas
- First dibs on merch (details still cooking)
- That warm fuzzy feeling of knowing you're supporting something that saves you time and keeps you plugged into great tech writing

If you're getting value from Blaze, checking this out would mean the world. And if you can't contribute, no worries: the newsletters keep coming either way, and you can follow along on Patreon for free.

Thanks for reading and being part of this nerdy corner of the internet. All the best - Alastair.
            