🔧 Company Engineering Blogs
           
             Revolutionizing Data Cloud: Unleashing the Power of the New ML Recommendations System
            
             (engineering.salesforce.com)
            
            . Data Cloud-native ML recommendations system; flexible abstract schemas; multi-cluster architecture; CI/CD NDCG evaluation; Cursor AI-assisted development
            
             SOTA OCR on-device with Core ML and dots.ocr
            
             (huggingface.co)
            
            . On-device OCR with Core ML and dots.ocr: converting a 3B parameter model via CoreML/MLX, debugging, and benchmarking on Apple Neural Engine
            
             Compute-Optimal Quantization-Aware Training
            
             (machinelearning.apple.com)
            
            . Compute-Optimal Quantization-Aware Training improves QAT efficiency by modeling FP and QAT compute trade-offs and deriving a scaling law
            
             AI as a research partner: Advancing theoretical computer science with AlphaEvolve
            
             (research.google)
            
            . AI-assisted theory via AlphaEvolve evolves finite gadgets to improve MAX-4-CUT inapproximability and Ramanujan graphs for average-case hardness with rigorous verification
            
            📈 Applied modeling and anomalies
           
             Real-time pricing with a pretrained probabilistic stock return model
            
             (thierrymoudiki.github.io)
            
            . Real-time pricing with a pretrained probabilistic stock return model using Python FastAPI and R Plumber
            
             NGBoost (Natural Gradient Boosting) for Regression, Classification, Time Series forecasting and Reserving
            
             (thierrymoudiki.github.io)
            
            . NGBoost-based regression, classification, time series forecasting, and reserving using cybooster with multiple base learners and sklearn integrations
            
             Geological Modeling based on Machine Learning with Python and hatariTools - Tutorial
            
             (hatarilabs.com)
            
            . Geological unit modeling with Python, hatariTools, 3D visualization, and ML classification on Queen Mary Reservoir data
            
             Advancing Anomaly Detection for Industry Applications with NVIDIA NV-Tesseract-AD
            
             (developer.nvidia.com)
            
            . NV-Tesseract-AD uses diffusion modeling, curriculum learning, and adaptive thresholds to enhance multivariate time-series anomaly detection
            
             Smarter Anomaly Detection in Semiconductor Manufacturing with NVIDIA NV-Tesseract and NVIDIA NIM
            
             (developer.nvidia.com)
            
            . NV-Tesseract anomaly detection for semiconductor fabs; multivariate time-series, anomaly localization, and NIM deployment
            
             Order from disordered proteins: Physics-based algorithm designs biomolecules with custom properties
            
             (phys.org)
            
            . Physics-based gradient optimization designs intrinsically disordered proteins with tailored properties using molecular dynamics and automatic differentiation
            
            🏗️ MLOps and data platforms
           
             Restoring Reliability in the AI-Aided Software Development Life Cycle
            
             (cacm.acm.org)
            
            . AI-generated code boosts velocity; SRE-led risk models, testing, and observability drive reliability and resilience
            
             Fine-Tuning Local Models with Docker Offload and Unsloth
            
             (docker.com)
            
            . Fine-tuning local models with Docker Offload and Unsloth to create LoRA-based GGUF artifacts for PII masking
            
             Revolutionizing Car Measurement Data Storage and Analysis: Mercedes-Benz's Petabyte-Scale Solution on the Databricks Intelligence Platform
            
             (databricks.com)
            
            . Mercedes-Benz and Databricks benchmark petabyte-scale automotive time series using RLE, Liquid Clustering, and hierarchical metadata for optimized storage and analytics
            
             Data & AI Infrastructure Are Fusing
            
             (tomtunguz.com)
            
            . Unified AI-ready data stacks with vector databases, context layers, and real-time observability from Netflix and Stripe
            
             Modernize fraud prevention: GraphStorm v0.5 for real-time inference
            
             (aws.amazon.com)
            
            . GraphStorm v0.5 enables sub-second real-time inference and streamlined SageMaker deployment for enterprise-scale GNN fraud prevention
            
             How Hapag-Lloyd improved schedule reliability with ML-powered vessel schedule predictions using Amazon SageMaker
            
             (aws.amazon.com)
            
            . ML-powered vessel ETA predictions with hierarchical XGBoost models on SageMaker, orchestrated by SageMaker Pipelines and Step Functions at Hapag-Lloyd
            
            🚀 ML systems and acceleration
           
             Machine Learning in Trading : the CPU-GPU latency problem
            
             (quantblog.wordpress.com)
            
            . Latency-driven ML trading: GPU inference, CPU simulation, and CPU-GPU colocated architectures like AMD Strix Halo for low-latency decision making
            
             DiLoCo: Data Parallelism for the Datacenter Poor
            
             (hackbot.dad)
            
            . Data parallelism basics, gradient accumulation, and DiLoCo for training large LLMs across heterogeneous, non-densely connected compute
            
             Optimizing Drug Discovery Tools on AMD MI300s Part 2: 3D Molecular Generation with SemlaFlow
            
             (rocm.blogs.amd.com)
            
            . SemlaFlow on AMD MI300X enables speedups of two orders of magnitude in 3D molecular generation, plus training optimizations with ROCm/PyTorch
            
             CAP4D: 4D Avatars with Morphable Multi-View Diffusion Models
            
             (opencv.org)
            
            . CAP4D merges Morphable Multi-View Diffusion Models with 3D Gaussian Splatting to render 4D avatars from few inputs in real time
            
             Elevating 3D Scene Rendering with GSplat
            
             (rocm.blogs.amd.com)
            
            . GPU-accelerated GSplat port for AMD ROCm; train and render 3D Gaussian splatting scenes on MI300X with multi-GPU support
            
            🧠 LLM attention and evaluation
           
             A practical blueprint for evaluating conversational AI at scale
            
             (dropbox.tech)
            
            . Structured evaluation blueprint for conversational AI at scale: datasets, LLM judges, Braintrust, gated QA pipelines, and production-grade metrics
            
             Attention in LLMs and Extrapolation
            
             (data-processing.club)
            
            . Attention heads in LLMs: syntactic, streaming, retrieval, induction, function vectors, and iteration heads underpin in-context learning and extrapolation
            
             Why do LLMs freak out over the seahorse emoji?
            
             (vgel.me)
            
            . Investigation of seahorse emoji belief in LLMs using logit lens, lm_head mechanics, and cross-model behaviors
            
             Iterating some sample data
            
             (kieranhealy.org)
            
            . Iterates sample data to illustrate LLM evaluation via confusion matrices, R code, and tibble-based data frames
            
             Evidence that Recent AI Gains are Mostly from Inference-Scaling
            
             (tobyord.com)
            
            . Inference-scaling dominates gains over RL post-training on the MATH Level 5, GPQA Diamond, and OTIS Mock AIME benchmarks, per Sonnet 3.7 vs Sonnet 3.6 data
            
             About DeepSeek Sparse Attention
            
             (sibellavia.lol)
            
            . Dynamic sparse attention (DSA) with a lightning indexer and top-k token selection for query-specific adaptive context
            
            📚 Indie math and NLP
           
             CAUSAL CONCEPT-BASED EXPLANATIONS
            
             (medium.com/feedzaitech)
            
            . Post-hoc, concept-based explanations using DiConStruct: a causal, SCM-enabled explainer for CBMs with counterfactual reasoning
            
             APLearn: The Winning APL Forge 2025 Project
            
             (dyalog.com)
            
            . Borna Ahmadzadeh presents APLearn, an open-source ML toolkit in APL inspired by scikit-learn, detailing design, features, and future improvements
            
             Math Academy, update 3: I completed Linear Algebra
            
             (frankhecker.com)
            
            . Progress update: completes Linear Algebra course, learns eigenvectors; reflects on Math Academy system and future courses
            
             Latent Semantic Scale based on Word2vec
            
             (blog.koheiw.net)
            
            . Latent Semantic Scaling with Word2vec: probabilistic LSS using seed words and quanteda tokens
            
             Linkage with feijoas
            
             (11011110.github.io)
            
            . Explores feijoas, Wikipedia entries, CSS features, geometric polyhedra, mesher concepts, and mean curvature flow in a playful blog post
            
             Let the LLM Write the Prompts: An Intro to DSPy in Compound AI Pipelines
            
             (simonwillison.net)
            
            . DSPy optimizes prompts for smaller models like Qwen3-0.6B to improve conflation in GIS with MIPROv2, enabling easy model switching
            
            📐 Learning theory and scaling
           
             Hyperparameter Optimization in Machine Learning
            
             (nowpublishers.com)
            
            . Hyperparameter optimization techniques including random, bandit-, model-, population-, and gradient-based methods for ML, with online, constrained, and multi-objective extensions
            
             Learning to act in generative settings
            
             (danmackinlay.name)
            
            . Survey of optimizing agents vs. replicating persisters, with curiosity, empowerment, and POET as open-ended generators
            
             the Harvard and Brown school of computer science
            
             (xianblog.wordpress.com)
            
            . Harvard-Brown school vs LeCun's neural networks; Bayesian inference and Markov random fields in pattern learning
            
             connectionist networks
            
             (aarnphm.xyz)
            
            . Explores connectionist networks, representations, backpropagation, universal approximation, inductive bias, tensor product representations, SMT/NTK, attention, and emergent cognition
            
             lecture five
            
             (aarnphm.xyz)
            
            . Lecture five covers scaling laws, power-law relations, and MuP parameterization for Transformers with practical takeaways
            
            📚 Academic Research
           
             FairContrast: Enhancing Fairness through Contrastive learning and Customized Augmenting Methods on Tabular Data
            
             (arxiv:cs)
            
            . FairContrast: contrastive learning with customized augmentations to mitigate bias in tabular data while preserving accuracy
            
             CardioForest: An Explainable Ensemble Learning Model for Automatic Wide QRS Complex Tachycardia Diagnosis from ECG
            
             (arxiv:cs)
            
            . CardioForest: an optimized Random Forest ensemble with XGBoost/LightGBM for explainable WCT detection from MIMIC-IV ECG using SHAP explanations
            
             C2AL: Cohort-Contrastive Auxiliary Learning for Large-scale Recommendation Systems
            
             (arxiv:cs)
            
            . Cohort-Contrastive Auxiliary Learning (C2AL) enhances attention in factorization machines to preserve minority cohorts in large-scale recommendations
            
             fev-bench: A Realistic Benchmark for Time Series Forecasting
            
             (arxiv:cs)
            
            . fev-bench: a 100-task time series forecasting benchmark with covariates and bootstrapped evaluation via fev library
            
             Private and Fair Machine Learning: Revisiting the Disparate Impact of Differentially Private SGD
            
             (arxiv:cs)
            
            . Analyzes how DPSGD affects fairness across metrics and hyperparameter choices, examining DP leakage, utility-fairness trade-offs, and DPSGD-Global-Adapt
            
            👋 Before you go
           
            I've got a big favor to ask - keeping Blaze running isn't expensive, but it does all add up, so I'm asking readers like you to help, if you can. That's why I'm launching a Patreon page! Nothing flashy, just a way for folks who find value in these newsletters to chip in a little each month. In return, you'll get:
            
- Real say in how Blaze evolves: vote on new topics, features, topic curation ideas
- First dibs on merch (details still cooking)
- That warm fuzzy feeling knowing you're supporting something that saves you time and keeps you plugged into great tech writing

            If you are getting value from Blaze, checking this out would mean the world. And if you can't contribute, no worries: the newsletters keep coming either way, and you can follow along on Patreon for free.
Thanks for reading and being part of this nerdy corner of the internet. All the best - Alastair.
            