The ML Engineer 07-01-2026
📝 Engineering Notes
DGX Spark: Hello World (svnscha.de). DGX Spark brings a personal AI supercomputer with 128GB memory; exploring Ollama, LibreChat, ComfyUI, and Stable Diffusion courses on Blackwell hardware
2025-12-31: From Tables to Triumph: A PhD Journey in Uncertainty-Aware Scientific Data Extraction (ws-dl.blogspot.com). PhD journey in uncertainty-aware scientific table data extraction using TTA-m, TSR-OCR-UQ, SciTableQA, and SCITEUQ at ODU with Dr. Jian Wu
2025: Career in Review (sajalsharma.com). Year in AI product building, teaching courses, and viral blogging with LangGraph, Liminal, Claude Code, and multi-agent systems
Looking back at 2025 (blog.lawrencejones.dev). Lawrence Jones chronicles incident.io's AI SRE rise: Series B, telemetry push, 18 engineers, Sev0 demo, and hands-on tooling and backtests
🚦 ML/LLMOps
Drift Detection in Robust Machine Learning Systems (towardsdatascience.com). Drift detection in ML systems using data and concept drift, KS test, PSI, chi-square, autoencoder-based multivariate checks, with practical guidance
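A minimal sketch of the two univariate checks the article covers, the KS test and PSI, on synthetic data (the bin count and the 0.2 PSI rule of thumb are illustrative assumptions, not the article's code):

```python
import numpy as np
from scipy.stats import ks_2samp

def psi(reference, current, bins=10):
    """Population Stability Index between a reference and a current sample."""
    edges = np.histogram_bin_edges(reference, bins=bins)
    ref_frac = np.histogram(reference, bins=edges)[0] / len(reference)
    cur_frac = np.histogram(current, bins=edges)[0] / len(current)
    # Floor empty bins to avoid log(0).
    ref_frac = np.clip(ref_frac, 1e-6, None)
    cur_frac = np.clip(cur_frac, 1e-6, None)
    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))

rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, 5_000)   # training-time feature distribution
live = rng.normal(0.3, 1.2, 5_000)    # production sample with a shift

stat, p_value = ks_2samp(train, live)  # Kolmogorov-Smirnov two-sample test
print(f"KS={stat:.3f}, p={p_value:.3g}, PSI={psi(train, live):.3f}")
# Common rule of thumb: alert when PSI > 0.2 or the KS p-value is very small.
```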
Why 90% of AI Agents Never Leave the Demo (mikulskibartosz.name). Practical AI engineering guide exploring three production pitfalls and how to build reliable AI pilots with abstractions, metrics, and real-data testing
Actor Mesh: Enterprise Architecture for Scalable AI Engineering (blog.kodigy.com). A practical guide to scalable AI engineering using Actor Model, distributed sagas, Kubernetes, and Monotonic content enrichment
The Control Layers of AI (dri.es). Deterministic workflows, AI decisioning, and open-source orchestration tools like n8n and Activepieces for reliable enterprise AI
Kubernetes v1.35: New level of efficiency with in-place Pod restart (kubernetes.io). Kubernetes 1.35 introduces in-place Pod restart (alpha), enabling full Pod restarts to reset state for AI/ML workloads
Getting metrics by logging (natemeyvis.com). Using logs to emit metrics in AWS CloudWatch with Python, refactoring for testability and performance benefits
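The post's exact refactoring isn't reproduced here; as a sketch of the log-to-metric idea, CloudWatch's Embedded Metric Format turns a plain printed JSON line into a metric (the namespace, dimension, and metric names below are made up):

```python
import json
import time

def emit_latency_metric(latency_ms: float, service: str = "demo-service") -> None:
    """Print one EMF-formatted log line; CloudWatch extracts it as a metric."""
    record = {
        "_aws": {
            "Timestamp": int(time.time() * 1000),
            "CloudWatchMetrics": [{
                "Namespace": "Demo/App",   # assumed namespace
                "Dimensions": [["Service"]],
                "Metrics": [{"Name": "LatencyMs", "Unit": "Milliseconds"}],
            }],
        },
        "Service": service,
        "LatencyMs": latency_ms,
    }
    print(json.dumps(record))  # stdout -> CloudWatch Logs -> metric extraction

emit_latency_metric(42.7)
```

Keeping the emitter a pure function of its inputs is what makes the "metrics by logging" approach easy to unit test.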
⚙️ PyTorch, LLMs & GPUs
Accelerating Multimodal Inference in vLLM: The One-Line Optimization for Large Multimodal Models (rocm.blogs.amd.com). Batch-level data parallelism for vision encoders in vLLM boosts throughput on AMD MI300X using a one-line --mm-encoder-tp-mode data switch
Optimizing Data Transfer in AI/ML Workloads (towardsdatascience.com). Explores data-transfer bottlenecks in AI/ML workloads using NVIDIA Nsight Systems and PyTorch, with CUDA streams and prefetching
Inspecting and Visualizing Torch FX Graph (leimao.github.io). Inspect and visualize Torch FX graphs and ATen IR with PyTorch 2.x, using FxGraphDrawer, TorchFunctionMode, and TorchDispatchMode
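A small sketch of the basic FX inspection flow (the post also covers ATen IR and TorchFunctionMode/TorchDispatchMode tracing, not shown here); rendering to SVG assumes pydot and graphviz are installed:

```python
import torch
import torch.nn as nn
from torch.fx import symbolic_trace

class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(8, 16)
        self.fc2 = nn.Linear(16, 4)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))

gm = symbolic_trace(TinyNet())  # capture the module as an FX GraphModule
print(gm.graph)                 # node-level view: placeholder, call_module, ...
print(gm.code)                  # regenerated Python for the traced forward

# Optional rendering of the graph (skipped if pydot/graphviz are missing).
try:
    from torch.fx.passes.graph_drawer import FxGraphDrawer
    FxGraphDrawer(gm, "tiny_net").get_dot_graph().write_svg("tiny_net.svg")
except Exception:
    pass
```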
Train Your Large Model on Multiple GPUs with Tensor Parallelism (machinelearningmastery.com). Tensor parallelism for large transformers on multi-GPU systems using PyTorch, with TP plans, DTensor, and 2D parallelism concepts
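As a rough sketch of the PyTorch tensor-parallel API the tutorial builds on (assumes a multi-GPU machine and a torchrun launch; the TP plan names refer to this toy MLP, not the article's model):

```python
# Launch with: torchrun --nproc-per-node=2 tp_demo.py
import os
import torch
import torch.nn as nn
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor.parallel import (
    parallelize_module, ColwiseParallel, RowwiseParallel,
)

class MLP(nn.Module):
    def __init__(self, dim=1024):
        super().__init__()
        self.up = nn.Linear(dim, 4 * dim)
        self.down = nn.Linear(4 * dim, dim)

    def forward(self, x):
        return self.down(torch.relu(self.up(x)))

torch.cuda.set_device(int(os.environ.get("LOCAL_RANK", 0)))
world_size = int(os.environ.get("WORLD_SIZE", 1))
mesh = init_device_mesh("cuda", (world_size,))  # 1-D tensor-parallel mesh

model = MLP().cuda()
# Shard the up-projection column-wise and the down-projection row-wise, so the
# intermediate activation stays sharded and only one all-reduce is needed.
model = parallelize_module(
    model, mesh, {"up": ColwiseParallel(), "down": RowwiseParallel()}
)
out = model(torch.randn(8, 1024, device="cuda"))
print(out.shape)
```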
Parameter-efficient fine-tuning in tinygrad (dxuuu.xyz). Parameter-efficient fine-tuning with LoRA in tinygrad on Llama 3.2 1B, exploring implementation and inference
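The post works in tinygrad; the sketch below shows the same LoRA idea in generic PyTorch terms, i.e. a frozen base weight plus a trainable low-rank update (rank and scaling values are illustrative):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update (LoRA)."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # freeze the pretrained weight
        self.lora_a = nn.Parameter(torch.randn(base.in_features, r) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(r, base.out_features))
        self.scale = alpha / r

    def forward(self, x):
        # y = base(x) + scale * (x A) B
        return self.base(x) + (x @ self.lora_a @ self.lora_b) * self.scale

layer = LoRALinear(nn.Linear(512, 512))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(f"trainable params: {trainable}")  # 2 * 512 * 8 = 8192 vs ~263k frozen
```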
SparK: Query-Aware Unstructured Sparsity with Recoverable KV Cache Channel Pruning (rocm.blogs.amd.com). SparK enables query-aware unstructured KV cache pruning and recoverable channels to compress LLM KV caches on AMD Instinct GPUs
Finding Hotspots in Your Code with the Intel VTune Command-Line Interface (nas.nasa.gov). Profiling code hotspots with the Intel VTune command-line interface for serial, Python, OpenMP, and MPI (MPT) applications on NASA HECC clusters
Zhang et al. (2024) TinyLlama (adrian.idv.hk). TinyLlama 1.1B trains on SlimPajama-derived data to outperform larger models; uses Llama 2 architecture, FSDP, FlashAttention, xFormers; reports 24K tokens/s per A100-40G
🎯 Applications
Use Claude Code to Query 600 GB Indexes over Hacker News, ArXiv, etc. (exopriors.com). Semantic search over alignment documents with Claude Code, vector mixing, debiasing, and public/private embeddings for arXiv, Hacker News, and More
Not just numbers: Understanding cities through their words (gisagents.org). Explores how housing, tourism, digital economy, and government reports reveal city dynamics using online reviews and social data
Machine Learning for Optimization: Toy Example (juanitorduz.github.io). Brute-force ML optimization in Python using HistGradientBoostingRegressor to tune bids, with Nelder-Mead/Powell, and visualization of x1/x2 synthetic data
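A condensed sketch of the pattern: fit a surrogate on synthetic (x1, x2) data, then hand its prediction to a derivative-free optimizer (the objective below is invented, not the post's data):

```python
import numpy as np
from scipy.optimize import minimize
from sklearn.ensemble import HistGradientBoostingRegressor

rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(2_000, 2))  # x1, x2: e.g. two bid levels
y = -(X[:, 0] - 3) ** 2 - (X[:, 1] - 7) ** 2 + rng.normal(0, 0.5, 2_000)

model = HistGradientBoostingRegressor().fit(X, y)  # learn the response surface

def neg_predicted_outcome(x):
    return -model.predict(x.reshape(1, -1))[0]  # maximize the prediction

res = minimize(neg_predicted_outcome, x0=np.array([5.0, 5.0]), method="Nelder-Mead")
print("best (x1, x2):", res.x)  # true optimum is (3, 7); tree plateaus can stall the simplex a bit short
```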
When Kevin Malone meets Claude (koaning.io). Concise guide to HuberRegressor in scikit-learn, its robust loss, epsilon tuning, and practical usage
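For a quick feel of the robustness argument, a small scikit-learn comparison on synthetic data with injected outliers (the epsilon shown is the library default, used here purely for illustration):

```python
import numpy as np
from sklearn.linear_model import HuberRegressor, LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 2.0 * X.ravel() + 1.0 + rng.normal(0, 0.5, 200)
y[:10] += 30  # inject a few gross outliers

ols = LinearRegression().fit(X, y)
# epsilon controls where squared loss switches to linear loss; smaller = more robust
huber = HuberRegressor(epsilon=1.35).fit(X, y)
print("OLS slope:  ", ols.coef_[0])    # pulled toward the outliers
print("Huber slope:", huber.coef_[0])  # stays near the true value 2.0
```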
How Netflix Uses Matrix Factorization to Predict Your Next Favorite Movie (journal.hexmos.com). Explains Matrix Factorization with latent factors and SGD for sparse Netflix-like data using Python vectors
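A toy version of the factorization the article explains: learn user and item factor vectors by SGD on the observed ratings only (the matrix, rank, and hyperparameters here are illustrative):

```python
import numpy as np

# Tiny ratings matrix (0 = unobserved), users x movies
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)

k, lr, reg, epochs = 2, 0.01, 0.02, 2000        # latent factors and SGD settings
rng = np.random.default_rng(0)
P = rng.normal(scale=0.1, size=(R.shape[0], k))  # user factors
Q = rng.normal(scale=0.1, size=(R.shape[1], k))  # item factors

observed = [(u, i) for u in range(R.shape[0]) for i in range(R.shape[1]) if R[u, i] > 0]
for _ in range(epochs):
    for u, i in observed:
        err = R[u, i] - P[u] @ Q[i]             # prediction error on one rating
        P[u] += lr * (err * Q[i] - reg * P[u])  # SGD step with L2 regularization
        Q[i] += lr * (err * P[u] - reg * Q[i])

print(np.round(P @ Q.T, 1))  # filled-in rating estimates
```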
Generating Human Faces with Variational Autoencoders (mayberay.bearblog.dev). Variational autoencoders (VAEs) explored: KL-divergence, reparameterization, MNIST digits, and convolutional faces using PyTorch
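A minimal sketch of the two pieces the post centers on, the reparameterization trick and the KL term (dimensions are arbitrary; the encoder and decoder around this bottleneck are omitted):

```python
import torch
import torch.nn as nn

class VAEBottleneck(nn.Module):
    """Encoder head producing mu/logvar plus the reparameterization trick."""
    def __init__(self, hidden_dim=256, latent_dim=32):
        super().__init__()
        self.mu = nn.Linear(hidden_dim, latent_dim)
        self.logvar = nn.Linear(hidden_dim, latent_dim)

    def forward(self, h):
        mu, logvar = self.mu(h), self.logvar(h)
        std = torch.exp(0.5 * logvar)
        z = mu + std * torch.randn_like(std)  # sample z ~ N(mu, sigma^2) differentiably
        # KL(q(z|x) || N(0, I)), averaged over the batch
        kl = -0.5 * torch.mean(torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1))
        return z, kl

z, kl = VAEBottleneck()(torch.randn(16, 256))
print(z.shape, kl.item())
```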
⏱️ Forecasting & State Space
Forecasting Hierarchical Models - Part III (juanitorduz.github.io). Hybrid deep state-space forecasting in NumPyro with Flax NNX, station embeddings, and SVI in Python
Python examples for ‘Beyond Nelson-Siegel and splines: A model-agnostic Machine Learning framework for discount curve calibration, interpolation and extrapolation’ (thierrymoudiki.github.io). Python examples using yieldcurveml to interpolate and extrapolate discount curves with Laguerre and Cubic bases in ML models
Forecasting benchmark: Dynrmf (a new serious competitor in town) vs Theta Method on M-Competitions and Tourism competition (thierrymoudiki.github.io). A benchmarking study comparing Dynrmf and Theta Method on M3, M1, and Tourism datasets using R, parallel processing, and standard accuracy metrics
Modelling time to next reported fault (shape-of-code.com). Explores modelling time to next fault reports using Poisson processes, exponential interarrival times, and user activity effects in software fault prediction
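A tiny simulation of the baseline model discussed, a homogeneous Poisson process with exponential interarrival times (the rate is invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
rate = 0.4                                      # faults per day (assumed)
interarrival = rng.exponential(1 / rate, 500)   # exponential gaps => Poisson process

rate_hat = 1 / interarrival.mean()              # maximum-likelihood rate estimate
# Memorylessness: the expected wait to the next fault doesn't depend on time already waited.
print(f"estimated rate: {rate_hat:.2f}/day, mean gap: {interarrival.mean():.1f} days")
print(f"P(no fault in the next 7 days): {np.exp(-rate_hat * 7):.2f}")
```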
🧪 Causal & Statistical Evidence
Testing Super Learner's Coverage - A Note To Myself (kenkoonwong.com). Explores SuperLearner with TMLE in R, comparing XGBoost, Random Forest, GLM, NNLS, and parallel computation
Effects of Algorithmic Flagging on Fairness: Quasi-experimental Evidence from Wikipedia (mako.cc). Quasi-experimental analysis using regression discontinuity to study algorithmic flagging effects on Wikipedia fairness across editor groups
Willful Incompetence: Repeating False Claims Does Not Make them True (replicationindex.com). Schimmack defends z-curve, critiques Pek et al., and discusses EDR, the law of large numbers (LLN), bias, and meta-analysis in emotion psychology with detailed statistical argument
🧠 Representation & Vision
Neural Networks: Zero to Hero (karpathy.ai). A practical guide to neural networks with PyTorch and Python, detailing concepts from linear models to deep learning, including backpropagation and optimization
Introducing Brand New Face Recognition in DeepFace (sefiks.com). DeepFace adds register, build_index, and search to enable scalable, stateless face recognition with databases and ANN
Human embeddings (trfetzer.com). Explores human embeddings for team design using 1024D vector space, open-source embeddings, GPT-OSS, and clustering to form diverse groups
Beating BERT? Small LLMs vs Fine-Tuned Encoders for Classification (alex-jacobs.com). 32 experiments compare small LLMs to fine-tuned encoders like BERT/DeBERTa for classification, revealing nuanced performance and throughput insights
The 1,000 neuron challenge (thetransmitter.org). Small neural models in a 1,000-neuron Braincraft competition explore energy-efficient AI with Python/GitHub, featuring Rougier and Churchland
🧲 Physics & Spectra
Robustness, interpretability, and scaling of eigenvalue models (alexshtf.github.io). Robustness, interpretability, and scaling of eigenvalue models using PyTorch on real data (California Housing), examining spectral bounds and feature importance
DeepSeek’s mHC Explained: Manifold-Constrained Hyper-Connections (aipapersacademy.com). Explains DeepSeek’s mHC: manifold-constrained hyper-connections that stabilize and enhance residual streams in LLMs using doubly stochastic mixing via Sinkhorn–Knopp
The Physics of mHC: Why Deep Learning Needs Energy Conservation (toooold.com). Conservation-inspired mHC uses doubly stochastic matrices and Sinkhorn iterations to build energy-conservative neural layers in Python-like reasoning
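A back-of-the-envelope sketch of the Sinkhorn normalization both mHC posts lean on: alternating row/column normalization drives a positive matrix toward doubly stochastic, so the mixing neither amplifies nor attenuates total mass (the iteration count is arbitrary):

```python
import numpy as np

def sinkhorn(M, n_iters=50):
    """Alternately normalize rows and columns so M approaches doubly stochastic."""
    M = np.exp(M)  # make entries positive (e.g. starting from logits)
    for _ in range(n_iters):
        M = M / M.sum(axis=1, keepdims=True)  # rows sum to 1
        M = M / M.sum(axis=0, keepdims=True)  # columns sum to 1
    return M

A = np.random.default_rng(0).normal(size=(4, 4))
S = sinkhorn(A)
print(S.sum(axis=0), S.sum(axis=1))  # both ≈ 1: mass is conserved by the mixing
```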
The Transformer as Renormalization Group Flow (symmetrybroken.com). Transformer attention as a Bayesian RG flow: coarse-graining tokens to semantic attractors using neural networks, physics-inspired interpretation
Information entropy for spectra (nirpyresearch.com). Shannon entropy applied to NIR spectra with Python (SciPy, NumPy); derivative entropy for smoothing optimization, inspired by K. G. Larkin
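A rough sketch of the idea, Shannon entropy of the derivative of a normalized spectrum as a smoothing diagnostic; the Gaussian band, noise level, and Savitzky-Golay settings below are invented:

```python
import numpy as np
from scipy.stats import entropy
from scipy.signal import savgol_filter

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 500)
spectrum = np.exp(-((x - 0.5) ** 2) / 0.01) + 0.05 * rng.normal(size=x.size)  # noisy band

def spectral_entropy(s):
    p = np.abs(s) / np.abs(s).sum()  # normalize to a probability-like distribution
    return entropy(p)                # Shannon entropy (nats)

for window in (5, 21, 101):
    smoothed = savgol_filter(spectrum, window_length=window, polyorder=3)
    d1 = np.gradient(smoothed, x)    # first derivative of the smoothed spectrum
    print(window, round(spectral_entropy(d1), 3))  # entropy vs. smoothing window
```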
Limits of the Transformer Architecture and a QCD-like Alternative (symmetrybroken.com). A physics-inspired critique of transformers, exploring UV/IR limits, QCD analogies, and multi-scale architectures for cognition using Python-like pseudocode
📚 Research
Can Small Training Runs Reliably Guide Data Curation? Rethinking Proxy-Model Practice (arxiv:cs). Shows small proxy runs can mis-rank data recipes because optimal hyperparameters depend on the data. Lower learning rates make proxy rankings correlate better with tuned large-scale pretraining
Analyzing Communication Predictability in LLM Training (arxiv:cs). Characterizes predictable communication patterns in distributed LLM training and models overhead analytically. ConfigTuner uses the model to pick parallelism settings, boosting throughput up to 1.36×
Disordered Dynamics in High Dimensions: Connections to Random Matrices and Machine Learning (arxiv:stat). Pedagogical dynamical mean-field theory for high-dimensional random-matrix learning dynamics: gradient flow, SGD, random features, deep linear nets. Explains loss nonmonotonicity and bias–variance scaling for practitioners
Conformal Prediction Under Distribution Shift: A COVID-19 Natural Experiment (arxiv:stat). Uses COVID-19 supply-chain shifts to stress-test conformal prediction; coverage collapses unpredictably. Finds SHAP importance concentration predicts failures and suggests monitoring plus quarterly retraining triggers operationally