The ML Engineer 07-01-2026
📝 Engineering Notes
DGX Spark: Hello World (svnscha.de). DGX Spark brings a personal AI supercomputer with 128GB memory; exploring Ollama, LibreChat, ComfyUI, and Stable Diffusion courses on Blackwell hardware
2025-12-31: From Tables to Triumph: A PhD Journey in Uncertainty-Aware Scientific Data Extraction (ws-dl.blogspot.com). PhD journey in uncertainty-aware scientific table data extraction using TTA-m, TSR-OCR-UQ, SciTableQA, and SCITEUQ at ODU with Dr. Jian Wu
2025: Career in Review (sajalsharma.com). Year in AI product building, teaching courses, and viral blogging with LangGraph, Liminal, Claude Code, and multi-agent systems
Looking back at 2025 (blog.lawrencejones.dev). Lawrence Jones chronicles incident.io's AI SRE rise: Series B, telemetry push, 18 engineers, Sev0 demo, and hands-on tooling and backtests
🚦 ML/LLMOps
Drift Detection in Robust Machine Learning Systems (towardsdatascience.com). Drift detection in ML systems using data and concept drift, KS test, PSI, chi-square, autoencoder-based multivariate checks, with practical guidance
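A minimal sketch of the two univariate checks the article covers, the KS test and PSI, on synthetic data (the bin count and the 0.2 PSI rule of thumb are illustrative assumptions, not the article's code):

```python
import numpy as np
from scipy.stats import ks_2samp

def psi(reference, current, bins=10):
    """Population Stability Index between a reference and a current sample."""
    edges = np.histogram_bin_edges(reference, bins=bins)
    ref_frac = np.histogram(reference, bins=edges)[0] / len(reference)
    cur_frac = np.histogram(current, bins=edges)[0] / len(current)
    # Floor empty bins to avoid log(0).
    ref_frac = np.clip(ref_frac, 1e-6, None)
    cur_frac = np.clip(cur_frac, 1e-6, None)
    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))

rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, 5_000)   # training-time feature distribution
live = rng.normal(0.3, 1.2, 5_000)    # production sample with a shift

stat, p_value = ks_2samp(train, live)  # Kolmogorov-Smirnov two-sample test
print(f"KS={stat:.3f}, p={p_value:.3g}, PSI={psi(train, live):.3f}")
# Common rule of thumb: alert when PSI > 0.2 or the KS p-value is very small.
```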
Why 90% of AI Agents Never Leave the Demo (mikulskibartosz.name). Practical AI engineering guide exploring three production pitfalls and how to build reliable AI pilots with abstractions, metrics, and real-data testing
Actor Mesh: Enterprise Architecture for Scalable AI Engineering (blog.kodigy.com). A practical guide to scalable AI engineering using Actor Model, distributed sagas, Kubernetes, and Monotonic content enrichment
The Control Layers of AI (dri.es). Deterministic workflows, AI decisioning, and open-source orchestration tools like n8n and Activepieces for reliable enterprise AI
Kubernetes v1.35: New level of efficiency with in-place Pod restart (kubernetes.io). Kubernetes 1.35 introduces in-place Pod restart (alpha), enabling full Pod restarts to reset state for AI/ML workloads
Getting metrics by logging (natemeyvis.com). Using logs to emit metrics in AWS CloudWatch with Python, refactoring for testability and performance benefits
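The post's exact refactoring isn't reproduced here; as a sketch of the log-to-metric idea, CloudWatch's Embedded Metric Format turns a plain printed JSON line into a metric (the namespace, dimension, and metric names below are made up):

```python
import json
import time

def emit_latency_metric(latency_ms: float, service: str = "demo-service") -> None:
    """Print one EMF-formatted log line; CloudWatch extracts it as a metric."""
    record = {
        "_aws": {
            "Timestamp": int(time.time() * 1000),
            "CloudWatchMetrics": [{
                "Namespace": "Demo/App",   # assumed namespace
                "Dimensions": [["Service"]],
                "Metrics": [{"Name": "LatencyMs", "Unit": "Milliseconds"}],
            }],
        },
        "Service": service,
        "LatencyMs": latency_ms,
    }
    print(json.dumps(record))  # stdout -> CloudWatch Logs -> metric extraction

emit_latency_metric(42.7)
```

Keeping the emitter a pure function of its inputs is what makes the "metrics by logging" approach easy to unit test.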
⚙️ PyTorch, LLMs & GPUs
Accelerating Multimodal Inference in vLLM: The One-Line Optimization for Large Multimodal Models (rocm.blogs.amd.com). Batch-level data parallelism for vision encoders in vLLM boosts throughput on AMD MI300X using a one-line --mm-encoder-tp-mode data switch
Optimizing Data Transfer in AI/ML Workloads (towardsdatascience.com). Explores data-transfer bottlenecks in AI/ML workloads using NVIDIA Nsight Systems and PyTorch, with CUDA streams and prefetching
Inspecting and Visualizing Torch FX Graph (leimao.github.io). Inspect and visualize Torch FX graphs and ATen IR with PyTorch 2.x, using FxGraphDrawer, TorchFunctionMode, and TorchDispatchMode
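A small sketch of the basic FX inspection flow (the post also covers ATen IR and TorchFunctionMode/TorchDispatchMode tracing, not shown here); rendering to SVG assumes pydot and graphviz are installed:

```python
import torch
import torch.nn as nn
from torch.fx import symbolic_trace

class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(8, 16)
        self.fc2 = nn.Linear(16, 4)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))

gm = symbolic_trace(TinyNet())  # capture the module as an FX GraphModule
print(gm.graph)                 # node-level view: placeholder, call_module, ...
print(gm.code)                  # regenerated Python for the traced forward

# Optional rendering of the graph (skipped if pydot/graphviz are missing).
try:
    from torch.fx.passes.graph_drawer import FxGraphDrawer
    FxGraphDrawer(gm, "tiny_net").get_dot_graph().write_svg("tiny_net.svg")
except Exception:
    pass
```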
Train Your Large Model on Multiple GPUs with Tensor Parallelism (machinelearningmastery.com). Tensor parallelism for large transformers on multi-GPU systems using PyTorch, with TP plans, DTensor, and 2D parallelism concepts
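As a rough sketch of the PyTorch tensor-parallel API the tutorial builds on (assumes a multi-GPU machine and a torchrun launch; the TP plan names refer to this toy MLP, not the article's model):

```python
# Launch with: torchrun --nproc-per-node=2 tp_demo.py
import os
import torch
import torch.nn as nn
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor.parallel import (
    parallelize_module, ColwiseParallel, RowwiseParallel,
)

class MLP(nn.Module):
    def __init__(self, dim=1024):
        super().__init__()
        self.up = nn.Linear(dim, 4 * dim)
        self.down = nn.Linear(4 * dim, dim)

    def forward(self, x):
        return self.down(torch.relu(self.up(x)))

torch.cuda.set_device(int(os.environ.get("LOCAL_RANK", 0)))
world_size = int(os.environ.get("WORLD_SIZE", 1))
mesh = init_device_mesh("cuda", (world_size,))  # 1-D tensor-parallel mesh

model = MLP().cuda()
# Shard the up-projection column-wise and the down-projection row-wise, so the
# intermediate activation stays sharded and only one all-reduce is needed.
model = parallelize_module(
    model, mesh, {"up": ColwiseParallel(), "down": RowwiseParallel()}
)
out = model(torch.randn(8, 1024, device="cuda"))
print(out.shape)
```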
Parameter-efficient fine-tuning in tinygrad (dxuuu.xyz). Parameter-efficient fine-tuning with LoRA in tinygrad on Llama 3.2 1B, exploring implementation and inference
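The post works in tinygrad; the sketch below shows the same LoRA idea in generic PyTorch terms, i.e. a frozen base weight plus a trainable low-rank update (rank and scaling values are illustrative):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update (LoRA)."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # freeze the pretrained weight
        self.lora_a = nn.Parameter(torch.randn(base.in_features, r) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(r, base.out_features))
        self.scale = alpha / r

    def forward(self, x):
        # y = base(x) + scale * (x A) B
        return self.base(x) + (x @ self.lora_a @ self.lora_b) * self.scale

layer = LoRALinear(nn.Linear(512, 512))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(f"trainable params: {trainable}")  # 2 * 512 * 8 = 8192 vs ~263k frozen
```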
SparK: Query-Aware Unstructured Sparsity with Recoverable KV Cache Channel Pruning (rocm.blogs.amd.com). SparK enables query-aware unstructured KV cache pruning and recoverable channels to compress LLM KV caches on AMD Instinct GPUs
Finding Hotspots in Your Code with the Intel VTune Command-Line Interface (nas.nasa.gov). Profiling code hotspots with the Intel VTune command-line interface for serial, Python, OpenMP, and MPI (MPT) applications on NASA HECC clusters
Zhang et al. (2024) TinyLlama (adrian.idv.hk). TinyLlama 1.1B trains on SlimPajama-derived data to outperform larger models; uses Llama 2 architecture, FSDP, FlashAttention, xFormers; reports 24K tokens/s per A100-40G
🎯 Applications
Use Claude Code to Query 600 GB Indexes over Hacker News, ArXiv, etc. (exopriors.com). Semantic search over alignment documents with Claude Code, vector mixing, debiasing, and public/private embeddings for arXiv, Hacker News, and More
Not just numbers: Understanding cities through their words (gisagents.org). Explores how housing, tourism, digital economy, and government reports reveal city dynamics using online reviews and social data
Machine Learning for Optimization: Toy Example (juanitorduz.github.io). Brute-force ML optimization in Python using HistGradientBoostingRegressor to tune bids, with Nelder-Mead/Powell, and visualization of x1/x2 synthetic data
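A condensed sketch of the pattern: fit a surrogate on synthetic (x1, x2) data, then hand its prediction to a derivative-free optimizer (the objective below is invented, not the post's data):

```python
import numpy as np
from scipy.optimize import minimize
from sklearn.ensemble import HistGradientBoostingRegressor

rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(2_000, 2))  # x1, x2: e.g. two bid levels
y = -(X[:, 0] - 3) ** 2 - (X[:, 1] - 7) ** 2 + rng.normal(0, 0.5, 2_000)

model = HistGradientBoostingRegressor().fit(X, y)  # learn the response surface

def neg_predicted_outcome(x):
    return -model.predict(x.reshape(1, -1))[0]  # maximize the prediction

res = minimize(neg_predicted_outcome, x0=np.array([5.0, 5.0]), method="Nelder-Mead")
print("best (x1, x2):", res.x)  # true optimum is (3, 7); tree plateaus can stall the simplex a bit short
```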
When Kevin Malone meets Claude (koaning.io). Concise guide to HuberRegressor in scikit-learn, its robust loss, epsilon tuning, and practical usage
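For a quick feel of the robustness argument, a small scikit-learn comparison on synthetic data with injected outliers (the epsilon shown is the library default, used here purely for illustration):

```python
import numpy as np
from sklearn.linear_model import HuberRegressor, LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 2.0 * X.ravel() + 1.0 + rng.normal(0, 0.5, 200)
y[:10] += 30  # inject a few gross outliers

ols = LinearRegression().fit(X, y)
# epsilon controls where squared loss switches to linear loss; smaller = more robust
huber = HuberRegressor(epsilon=1.35).fit(X, y)
print("OLS slope:  ", ols.coef_[0])    # pulled toward the outliers
print("Huber slope:", huber.coef_[0])  # stays near the true value 2.0
```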
How Netflix Uses Matrix Factorization to Predict Your Next Favorite Movie (journal.hexmos.com). Explains Matrix Factorization with latent factors and SGD for sparse Netflix-like data using Python vectors
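A toy version of the factorization the article explains: learn user and item factor vectors by SGD on the observed ratings only (the matrix, rank, and hyperparameters here are illustrative):

```python
import numpy as np

# Tiny ratings matrix (0 = unobserved), users x movies
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)

k, lr, reg, epochs = 2, 0.01, 0.02, 2000        # latent factors and SGD settings
rng = np.random.default_rng(0)
P = rng.normal(scale=0.1, size=(R.shape[0], k))  # user factors
Q = rng.normal(scale=0.1, size=(R.shape[1], k))  # item factors

observed = [(u, i) for u in range(R.shape[0]) for i in range(R.shape[1]) if R[u, i] > 0]
for _ in range(epochs):
    for u, i in observed:
        err = R[u, i] - P[u] @ Q[i]             # prediction error on one rating
        P[u] += lr * (err * Q[i] - reg * P[u])  # SGD step with L2 regularization
        Q[i] += lr * (err * P[u] - reg * Q[i])

print(np.round(P @ Q.T, 1))  # filled-in rating estimates
```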
Generating Human Faces with Variational Autoencoders (mayberay.bearblog.dev). Variational autoencoders (VAEs) explored: KL-divergence, reparameterization, MNIST digits, and convolutional faces using PyTorch
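A minimal sketch of the two pieces the post centers on, the reparameterization trick and the KL term (dimensions are arbitrary; the encoder and decoder around this bottleneck are omitted):

```python
import torch
import torch.nn as nn

class VAEBottleneck(nn.Module):
    """Encoder head producing mu/logvar plus the reparameterization trick."""
    def __init__(self, hidden_dim=256, latent_dim=32):
        super().__init__()
        self.mu = nn.Linear(hidden_dim, latent_dim)
        self.logvar = nn.Linear(hidden_dim, latent_dim)

    def forward(self, h):
        mu, logvar = self.mu(h), self.logvar(h)
        std = torch.exp(0.5 * logvar)
        z = mu + std * torch.randn_like(std)  # sample z ~ N(mu, sigma^2) differentiably
        # KL(q(z|x) || N(0, I)), averaged over the batch
        kl = -0.5 * torch.mean(torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1))
        return z, kl

z, kl = VAEBottleneck()(torch.randn(16, 256))
print(z.shape, kl.item())
```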
⏱️ Forecasting & State Space
Forecasting Hierarchical Models - Part III (juanitorduz.github.io). Hybrid deep state-space forecasting in NumPyro with Flax NNX, station embeddings, and SVI in Python
Python examples for ‘Beyond Nelson-Siegel and splines: A model-agnostic Machine Learning framework for discount curve calibration, interpolation and extrapolation’ (thierrymoudiki.github.io). Python examples using yieldcurveml to interpolate and extrapolate discount curves with Laguerre and Cubic bases in ML models
Forecasting benchmark: Dynrmf (a new serious competitor in town) vs Theta Method on M-Competitions and Tourism competition (thierrymoudiki.github.io). A benchmarking study comparing Dynrmf and Theta Method on M3, M1, and Tourism datasets using R, parallel processing, and standard accuracy metrics
Modelling time to next reported fault (shape-of-code.com). Explores modelling time to next fault reports using Poisson processes, exponential interarrival times, and user activity effects in software fault prediction
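A tiny simulation of the baseline model discussed, a homogeneous Poisson process with exponential interarrival times (the rate is invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
rate = 0.4                                      # faults per day (assumed)
interarrival = rng.exponential(1 / rate, 500)   # exponential gaps => Poisson process

rate_hat = 1 / interarrival.mean()              # maximum-likelihood rate estimate
# Memorylessness: the expected wait to the next fault doesn't depend on time already waited.
print(f"estimated rate: {rate_hat:.2f}/day, mean gap: {interarrival.mean():.1f} days")
print(f"P(no fault in the next 7 days): {np.exp(-rate_hat * 7):.2f}")
```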
🧪 Causal & Statistical Evidence
Testing Super Learner's Coverage - A Note To Myself (kenkoonwong.com). Explores SuperLearner with TMLE in R, comparing XGBoost, Random Forest, GLM, NNLS, and parallel computation
Effects of Algorithmic Flagging on Fairness: Quasi-experimental Evidence from Wikipedia (mako.cc). Quasi-experimental analysis using regression discontinuity to study algorithmic flagging effects on Wikipedia fairness across editor groups
Willful Incompetence: Repeating False Claims Does Not Make them True (replicationindex.com). Schimmack defends z-curve, critiques Pek et al., and discusses EDR, the law of large numbers (LLN), bias, and meta-analysis in emotion psychology with detailed statistical argument
🧠 Representation & Vision
Neural Networks: Zero to Hero (karpathy.ai). A practical guide to neural networks with PyTorch and Python, detailing concepts from linear models to deep learning, including backpropagation and optimization
Introducing Brand New Face Recognition in DeepFace (sefiks.com). DeepFace adds register, build_index, and search to enable scalable, stateless face recognition with databases and ANN
Human embeddings (trfetzer.com). Explores human embeddings for team design using 1024D vector space, open-source embeddings, GPT-OSS, and clustering to form diverse groups
Beating BERT? Small LLMs vs Fine-Tuned Encoders for Classification (alex-jacobs.com). 32 experiments compare small LLMs to fine-tuned encoders like BERT/DeBERTa for classification, revealing nuanced performance and throughput insights
The 1,000 neuron challenge (thetransmitter.org). Small neural models in a 1,000-neuron Braincraft competition explore energy-efficient AI with Python/GitHub, featuring Rougier and Churchland
🧲 Physics & Spectra
Robustness, interpretability, and scaling of eigenvalue models (alexshtf.github.io). Robustness, interpretability, and scaling of eigenvalue models using PyTorch on real data (California Housing), examining spectral bounds and feature importance
DeepSeek’s mHC Explained: Manifold-Constrained Hyper-Connections (aipapersacademy.com). Explains DeepSeek’s mHC: manifold-constrained hyper-connections that stabilize and enhance residual streams in LLMs using doubly stochastic mixing via Sinkhorn–Knopp
The Physics of mHC: Why Deep Learning Needs Energy Conservation (toooold.com). Conservation-inspired mHC uses doubly stochastic matrices and Sinkhorn iterations to build energy-conservative neural layers in Python-like reasoning
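A back-of-the-envelope sketch of the Sinkhorn normalization both mHC posts lean on: alternating row/column normalization drives a positive matrix toward doubly stochastic, so the mixing neither amplifies nor attenuates total mass (the iteration count is arbitrary):

```python
import numpy as np

def sinkhorn(M, n_iters=50):
    """Alternately normalize rows and columns so M approaches doubly stochastic."""
    M = np.exp(M)  # make entries positive (e.g. starting from logits)
    for _ in range(n_iters):
        M = M / M.sum(axis=1, keepdims=True)  # rows sum to 1
        M = M / M.sum(axis=0, keepdims=True)  # columns sum to 1
    return M

A = np.random.default_rng(0).normal(size=(4, 4))
S = sinkhorn(A)
print(S.sum(axis=0), S.sum(axis=1))  # both ≈ 1: mass is conserved by the mixing
```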
The Transformer as Renormalization Group Flow (symmetrybroken.com). Transformer attention as a Bayesian RG flow: coarse-graining tokens to semantic attractors using neural networks, physics-inspired interpretation
Information entropy for spectra (nirpyresearch.com). Shannon entropy applied to NIR spectra with Python (SciPy, NumPy); derivative entropy for smoothing optimization, inspired by K. G. Larkin
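A rough sketch of the idea, Shannon entropy of the derivative of a normalized spectrum as a smoothing diagnostic; the Gaussian band, noise level, and Savitzky-Golay settings below are invented:

```python
import numpy as np
from scipy.stats import entropy
from scipy.signal import savgol_filter

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 500)
spectrum = np.exp(-((x - 0.5) ** 2) / 0.01) + 0.05 * rng.normal(size=x.size)  # noisy band

def spectral_entropy(s):
    p = np.abs(s) / np.abs(s).sum()  # normalize to a probability-like distribution
    return entropy(p)                # Shannon entropy (nats)

for window in (5, 21, 101):
    smoothed = savgol_filter(spectrum, window_length=window, polyorder=3)
    d1 = np.gradient(smoothed, x)    # first derivative of the smoothed spectrum
    print(window, round(spectral_entropy(d1), 3))  # entropy vs. smoothing window
```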
Limits of the Transformer Architecture and a QCD-like Alternative (symmetrybroken.com). A physics-inspired critique of transformers, exploring UV/IR limits, QCD analogies, and multi-scale architectures for cognition using Python-like pseudocode
📚 Research
Can Small Training Runs Reliably Guide Data Curation? Rethinking Proxy-Model Practice (arxiv:cs). Shows small proxy runs can mis-rank data recipes because optimal hyperparameters depend on the data. Lower learning rates make proxy rankings correlate better with tuned large-scale pretraining
Analyzing Communication Predictability in LLM Training (arxiv:cs). Characterizes predictable communication patterns in distributed LLM training and models overhead analytically. ConfigTuner uses the model to pick parallelism settings, boosting throughput up to 1.36×
Disordered Dynamics in High Dimensions: Connections to Random Matrices and Machine Learning (arxiv:stat). Pedagogical dynamical mean-field theory for high-dimensional random-matrix learning dynamics: gradient flow, SGD, random features, deep linear nets. Explains loss nonmonotonicity and bias–variance scaling for practitioners
Conformal Prediction Under Distribution Shift: A COVID-19 Natural Experiment (arxiv:stat). Uses COVID-19 supply-chain shifts to stress-test conformal prediction; coverage collapses unpredictably. Finds SHAP importance concentration predicts failures and suggests monitoring plus quarterly retraining triggers operationally