The ML Engineer 30-12-2025
AI guardrails at Hugging Face, pretraining optimisations at Character.ai
🪏 Tech Notes
AprielGuard: A Guardrail for Safety and Adversarial Robustness in Modern LLM Systems (huggingface.co). An 8B guardrail model for LLMs that tackles safety and adversarial robustness in agentic, long-context workflows, built with synthetic data and multilingual evaluation
Optimizing Large-Scale Pretraining at Character.ai (blog.character.ai). Character.ai outlines Squinch (6-bit gradient compression), Attention Z-Reg, Dynamic Clamping, visibility masks, and Gumbel-based distillation
🧠 Learning Theory & Optimization
From PCA to Barlow Twins: A Statistical View of Redundancy Reduction in Self-Supervised Learning (egpivo.github.io). Barlow Twins decorrelates features during training; the post compares it to PCA and discusses information geometry, redundancy, and isotropic Fisher analysis using synthetic toy examples
Initial Quick Thoughts on Singular Learning Theory (beren​.io). Explores singular learning theory (SLT) concepts, posterior geometry, WBIC, SGD noise, and practical implications for generalization and optimization
AIXI with general utility functions: “Value under ignorance in UAI” (uaiasi.com). Extends AIXI to general utility functions, exploring value under ignorance and semimeasures in an AI safety context
🧱 Pipelines & Retrieval
Week 4 in Data Science: Building ML Systems From the Ground Up (igorstechnoclub.com). Gradient descent from scratch, loss functions, and SGDRegressor/SGDClassifier demos in Python (NumPy, scikit-learn), with a production-minded view of ML systems
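A minimal sketch of the from-scratch idea described above, shown next to scikit-learn's SGDRegressor; the data and learning rate are illustrative assumptions, not taken from the post:

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=200)

# Batch gradient descent on mean squared error: w <- w - lr * dL/dw
w, b, lr = np.zeros(3), 0.0, 0.1
for _ in range(500):
    err = X @ w + b - y              # residuals
    w -= lr * (X.T @ err) / len(y)   # gradient w.r.t. weights
    b -= lr * err.mean()             # gradient w.r.t. bias

# The same problem via scikit-learn's stochastic variant
sgd = SGDRegressor(loss="squared_error", max_iter=1000, tol=1e-4).fit(X, y)
print(w, sgd.coef_)                  # both should land near [2, -1, 0.5]
```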
OpenCV G-API: From Imperative to Declarative Pipelines (opencv​.org). OpenCV G-API shifts image processing from imperative code to a graph-based declarative model with CPU, GPU, and OpenCL backends
GraphRAG: Why Vector Search Breaks Down at the Corpus Level (mostlylucid​.net). GraphRAG: using a knowledge graph and community summaries with vector search to enable corpus-level reasoning in RAG workflows
🧰 Performance & Data Structures
The 25x Speedup: Why Python Performance Rules Matter (vinitkumar​.me). Practical Python performance tips using NumPy for vectorization, preallocation, and batch processing to achieve a 25x speedup
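A hedged illustration of the vectorization pattern the post describes; the 25x figure is the article's, and actual speedups depend on the workload:

```python
import time
import numpy as np

x = np.random.rand(1_000_000)

t0 = time.perf_counter()
out_loop = [v * 2.0 + 1.0 for v in x]   # pure-Python loop over a million elements
t1 = time.perf_counter()
out_vec = x * 2.0 + 1.0                 # one vectorized NumPy expression
t2 = time.perf_counter()

print(f"loop: {t1 - t0:.3f}s  vectorized: {t2 - t1:.3f}s")
```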
Overengineering float serialization for no good reason (wejn​.org). A Python-centered exploration of float serialization, mantissa trimming, and polynomial coefficient encoding in Ocean Optics calibration
Bloom Filters (arpitbhayani​.me). Overview of Bloom filters: probabilistic membership, false positives, hashing, double hashing, counting and deletable variants, and database/system benchmarks
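A minimal Bloom filter sketch using the double-hashing trick (h_i = h1 + i·h2) mentioned in the article; the filter size and hash construction here are illustrative:

```python
import hashlib

class BloomFilter:
    def __init__(self, m_bits=1024, k_hashes=4):
        self.m, self.k = m_bits, k_hashes
        self.bits = bytearray(m_bits // 8 + 1)

    def _indexes(self, item):
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1   # odd stride avoids degenerate cycling
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for idx in self._indexes(item):
            self.bits[idx // 8] |= 1 << (idx % 8)

    def might_contain(self, item):
        # False positives are possible; false negatives are not.
        return all(self.bits[i // 8] & (1 << (i % 8)) for i in self._indexes(item))

bf = BloomFilter()
bf.add("alice")
print(bf.might_contain("alice"), bf.might_contain("bob"))   # True, (almost certainly) False
```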
Picomon 0.2.0: From AMD Crash Fix to GPU Monitoring That Doesn’t Suck (omarkama​.li). A Python-based, multi-vendor GPU monitoring tool using LLM parsing, Textual UI, and async metrics across AMD, Nvidia, and Apple Silicon
📏 Monitoring & Interpretability
Stop Retraining Blindly: Use PSI to Build a Smarter Monitoring Pipeline (towardsdatascience.com). PSI (Population Stability Index) monitoring for data drift using Python; demonstrates detection of feature drift and its impact on predictions
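The core computation behind the approach, PSI = sum over bins of (actual% - expected%) * ln(actual% / expected%), sketched with illustrative synthetic data:

```python
import numpy as np

def psi(expected, actual, n_bins=10, eps=1e-6):
    # Bin edges come from the reference (training-time) distribution.
    edges = np.quantile(expected, np.linspace(0, 1, n_bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf
    e_frac = np.histogram(expected, bins=edges)[0] / len(expected) + eps
    a_frac = np.histogram(actual, bins=edges)[0] / len(actual) + eps
    return float(np.sum((a_frac - e_frac) * np.log(a_frac / e_frac)))

ref = np.random.normal(0.0, 1.0, 10_000)    # feature at training time
live = np.random.normal(0.3, 1.0, 10_000)   # drifted production feature
print(psi(ref, live))   # common rule of thumb: PSI > 0.2 signals significant drift
```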
Training Matching Pursuit SAEs on LLMs (lesswrong​.com). Training MP-SAEs on LLMs with SAELens, comparing to BatchTopK and Matryoshka SAEs, and exploring reconstruction vs. interpretability performance
Call for Science of Eval Awareness (+ Research Directions) (lesswrong.com). Explores evaluation awareness in AI models, proposing research directions on pre-training, post-training, mechanistic interpretability, and measurement
🏋️ Model Training Engineering
Rebuilding AlexNet (y.tsutsumi.io). Rebuilding AlexNet with PyTorch, vanilla training, and the small Imagenette dataset, exploring architecture, data, and dropout effects
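For contrast with the post's from-scratch rebuild, a sketch of a single training step using torchvision's stock AlexNet; the dataset path and 10-class Imagenette-style folder layout are assumptions:

```python
import torch
from torch import nn
from torchvision import models, datasets, transforms

tfm = transforms.Compose([
    transforms.Resize(256), transforms.CenterCrop(224), transforms.ToTensor(),
])
train = datasets.ImageFolder("imagenette2/train", transform=tfm)   # assumed path
loader = torch.utils.data.DataLoader(train, batch_size=64, shuffle=True)

model = models.alexnet(weights=None, num_classes=10)   # dropout sits in the classifier head
opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

for x, y in loader:
    opt.zero_grad()
    loss = nn.functional.cross_entropy(model(x), y)
    loss.backward()
    opt.step()
    break   # one illustrative step
```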
Training a Model with Limited Memory using Mixed Precision and Gradient Checkpointing (machinelearningmastery​.com). Techniques for training memory-limited models with PyTorch: mixed precision, AMP autocast, GradScaler, and gradient checkpointing
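A minimal sketch combining the two techniques the tutorial covers, autocast/GradScaler mixed precision and activation checkpointing; the toy model and batch are assumptions, and a CUDA device is required:

```python
import torch
from torch import nn
from torch.utils.checkpoint import checkpoint

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10)).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()

x = torch.randn(64, 512, device="cuda")
y = torch.randint(0, 10, (64,), device="cuda")

optimizer.zero_grad()
with torch.cuda.amp.autocast():
    # checkpoint() recomputes this forward segment during backward,
    # so its activations are not kept in memory.
    logits = checkpoint(model, x, use_reentrant=False)
    loss = nn.functional.cross_entropy(logits, y)
scaler.scale(loss).backward()   # scale the loss to avoid fp16 gradient underflow
scaler.step(optimizer)
scaler.update()
```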
Fine-Tuning Qwen3-VL (debuggercafe​.com). Fine-tuning Qwen3-VL 2B on a sketch-to-HTML dataset using LoRA for improved HTML generation
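A sketch of attaching LoRA adapters with Hugging Face PEFT; the model id, loader class, and target module names below are assumptions rather than details from the article:

```python
from transformers import AutoModelForVision2Seq
from peft import LoraConfig, get_peft_model

model = AutoModelForVision2Seq.from_pretrained("Qwen/Qwen3-VL-2B-Instruct")   # assumed id
lora_cfg = LoraConfig(
    r=16,                                   # low-rank adapter dimension
    lora_alpha=32,                          # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],    # assumed attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()          # only the adapter weights are trainable
```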
Solve Hi-Q with AlphaZero and Curriculum Learning (robw.fyi). Robert W builds a Hi-Q (peg solitaire) solver with AlphaZero-style training and curriculum learning in Python/PyTorch, using MCTS and Claude Code
🎯 Recommenders & Personalization
A Modern Recommender Model Architecture (cprimozic​.net). Detailed dive into a modern recommender using a Denoising Autoencoder, JAX (AMD GPU), multi-head decoder, custom rating features, and advanced loss balancing tricks
Break the Lock-In: Carry Your Preferences Anywhere (data-processing.club). Explains Pretender: end-user preference transfer to counter cold start using MMD or 1-Wasserstein distance, with a K-item rating strategy
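For reference, a biased RBF-kernel MMD² estimator in NumPy, illustrating the kind of distribution distance mentioned above; the data and kernel bandwidth are illustrative assumptions:

```python
import numpy as np

def mmd2_rbf(X, Y, gamma=1.0):
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
        return np.exp(-gamma * d2)
    m, n = len(X), len(Y)
    # Biased V-statistic estimator of MMD^2
    return k(X, X).sum() / m**2 + k(Y, Y).sum() / n**2 - 2 * k(X, Y).sum() / (m * n)

src = np.random.normal(0.0, 1.0, (100, 5))    # preference vectors on the source platform
tgt = np.random.normal(0.5, 1.0, (100, 5))    # preference vectors on the target platform
print(mmd2_rbf(src, tgt))
```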
Zero PII Customer Intelligence - Part 1: The Segmentation Model (mostlylucid​.net). Zero-PII segmentation using session signals, decay-based interests, vector embeddings (Qdrant), DuckDB analytics, and transparent user-facing controls in a .NET/C# ecommerce proof-of-concept
📚 Academic Research
KernelEvolve: Scaling Agentic Kernel Coding for Heterogeneous AI Accelerators at Meta (arxiv:cs). Meta’s KernelEvolve auto-generates and optimizes DLRM kernels across NVIDIA, AMD, and MTIA using search plus retrieval-augmented prompts, cutting kernel development from weeks to hours and boosting throughput
FUSCO: High-Performance Distributed Data Shuffling via Transformation-Communication Fusion (arxiv:cs). FUSCO fuses layout transformation with communication to accelerate Mixture-of-Experts token shuffling, outperforming NCCL/DeepEP and reducing MoE training and inference latency; useful for anyone scaling expert-parallel models today
Completed Hyperparameter Transfer across Modules, Width, Depth, Batch and Duration (arxiv:stat). Apple and Cambridge propose a Complete parameterization enabling hyperparameter transfer across width, depth, batch size, and training duration, even per module: tune once at small scale, then train large models faster and more reliably
Measuring Variable Importance via Accumulated Local Effects (arxiv:stat). ALE-based variable importance avoids misleading extrapolation and deflation under correlated features, while being cheaper than Shapley or permutation methods. Better interpretability for practitioners working with tabular models
A Profit-Based Measure of Lending Discrimination (arxiv:stat). Harvard–Stanford introduce a profit-based discrimination metric for loan pricing audits. Real fintech data shows miscalibration favors some groups; including protected attributes can improve fairness measurably