The ML Engineer 30-12-2025
AI guardrails at Hugging Face, pretraining optimisations at Character.ai
🪏 Tech Notes
AprielGuard: A Guardrail for Safety and Adversarial Robustness in Modern LLM Systems (huggingface.co). An 8B guardrail model for LLMs that tackles safety and adversarial robustness in agentic, long-context workflows, built with synthetic data and multilingual evaluation
Optimizing Large-Scale Pretraining at Character.ai (blog.character.ai). Character.ai outlines Squinch (6-bit gradient compression), Attention Z-Reg, Dynamic Clamping, visibility masks, and Gumbel-based distillation
🧠 Learning Theory & Optimization
From PCA to Barlow Twins: A Statistical View of Redundancy Reduction in Self-Supervised Learning (egpivo.github.io). Barlow Twins decorrelates features during training; the post compares it to PCA and discusses information geometry, redundancy, and isotropic Fisher analysis using synthetic toy examples
Initial Quick Thoughts on Singular Learning Theory (beren​.io). Explores singular learning theory (SLT) concepts, posterior geometry, WBIC, SGD noise, and practical implications for generalization and optimization
AIXI with general utility functions: “Value under ignorance in UAI” (uaiasi.com). Extends AIXI to general utility functions, exploring value under ignorance and semimeasures in an AI safety context
🧱 Pipelines & Retrieval
Week 4 in Data Science: Building ML Systems From the Ground Up (igorstechnoclub.com). Gradient descent from scratch, loss functions, and SGDRegressor/SGDClassifier demos in Python (NumPy, scikit-learn), with a production-minded view of ML systems
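A minimal sketch of the from-scratch idea described above, shown next to scikit-learn's SGDRegressor; the data and learning rate are illustrative assumptions, not taken from the post:

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=200)

# Batch gradient descent on mean squared error: w <- w - lr * dL/dw
w, b, lr = np.zeros(3), 0.0, 0.1
for _ in range(500):
    err = X @ w + b - y              # residuals
    w -= lr * (X.T @ err) / len(y)   # gradient w.r.t. weights
    b -= lr * err.mean()             # gradient w.r.t. bias

# The same problem via scikit-learn's stochastic variant
sgd = SGDRegressor(loss="squared_error", max_iter=1000, tol=1e-4).fit(X, y)
print(w, sgd.coef_)                  # both should land near [2, -1, 0.5]
```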
OpenCV G-API: From Imperative to Declarative Pipelines (opencv​.org). OpenCV G-API shifts image processing from imperative code to a graph-based declarative model with CPU, GPU, and OpenCL backends
GraphRAG: Why Vector Search Breaks Down at the Corpus Level (mostlylucid​.net). GraphRAG: using a knowledge graph and community summaries with vector search to enable corpus-level reasoning in RAG workflows
🧰 Performance & Data Structures
The 25x Speedup: Why Python Performance Rules Matter (vinitkumar​.me). Practical Python performance tips using NumPy for vectorization, preallocation, and batch processing to achieve a 25x speedup
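A hedged illustration of the vectorization pattern the post describes; the 25x figure is the article's, and actual speedups depend on the workload:

```python
import time
import numpy as np

x = np.random.rand(1_000_000)

t0 = time.perf_counter()
out_loop = [v * 2.0 + 1.0 for v in x]   # pure-Python loop over a million elements
t1 = time.perf_counter()
out_vec = x * 2.0 + 1.0                 # one vectorized NumPy expression
t2 = time.perf_counter()

print(f"loop: {t1 - t0:.3f}s  vectorized: {t2 - t1:.3f}s")
```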
Overengineering float serialization for no good reason (wejn​.org). A Python-centered exploration of float serialization, mantissa trimming, and polynomial coefficient encoding in Ocean Optics calibration
Bloom Filters (arpitbhayani​.me). Overview of Bloom filters: probabilistic membership, false positives, hashing, double hashing, counting and deletable variants, and database/system benchmarks
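A minimal Bloom filter sketch using the double-hashing trick (h_i = h1 + i·h2) mentioned in the article; the filter size and hash construction here are illustrative:

```python
import hashlib

class BloomFilter:
    def __init__(self, m_bits=1024, k_hashes=4):
        self.m, self.k = m_bits, k_hashes
        self.bits = bytearray(m_bits // 8 + 1)

    def _indexes(self, item):
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1   # odd stride avoids degenerate cycling
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for idx in self._indexes(item):
            self.bits[idx // 8] |= 1 << (idx % 8)

    def might_contain(self, item):
        # False positives are possible; false negatives are not.
        return all(self.bits[i // 8] & (1 << (i % 8)) for i in self._indexes(item))

bf = BloomFilter()
bf.add("alice")
print(bf.might_contain("alice"), bf.might_contain("bob"))   # True, (almost certainly) False
```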
Picomon 0.2.0: From AMD Crash Fix to GPU Monitoring That Doesn’t Suck (omarkama​.li). A Python-based, multi-vendor GPU monitoring tool using LLM parsing, Textual UI, and async metrics across AMD, Nvidia, and Apple Silicon
📏 Monitoring & Interpretability
Stop Retraining Blindly: Use PSI to Build a Smarter Monitoring Pipeline (towardsdatascience.com). PSI (Population Stability Index) monitoring for data drift using Python; demonstrates detection of feature drift and its impact on predictions
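The core computation behind the approach, PSI = sum over bins of (actual% - expected%) * ln(actual% / expected%), sketched with illustrative synthetic data:

```python
import numpy as np

def psi(expected, actual, n_bins=10, eps=1e-6):
    # Bin edges come from the reference (training-time) distribution.
    edges = np.quantile(expected, np.linspace(0, 1, n_bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf
    e_frac = np.histogram(expected, bins=edges)[0] / len(expected) + eps
    a_frac = np.histogram(actual, bins=edges)[0] / len(actual) + eps
    return float(np.sum((a_frac - e_frac) * np.log(a_frac / e_frac)))

ref = np.random.normal(0.0, 1.0, 10_000)    # feature at training time
live = np.random.normal(0.3, 1.0, 10_000)   # drifted production feature
print(psi(ref, live))   # common rule of thumb: PSI > 0.2 signals significant drift
```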
Training Matching Pursuit SAEs on LLMs (lesswrong​.com). Training MP-SAEs on LLMs with SAELens, comparing to BatchTopK and Matryoshka SAEs, and exploring reconstruction vs. interpretability performance
Call for Science of Eval Awareness (+ Research Directions) (lesswrong.com). Explores evaluation awareness in AI models, proposing research directions on pre-training, post-training, mechanistic interpretability, and measurement
🏋️ Model Training Engineering
Rebuilding AlexNet (y.tsutsumi.io). Rebuilding AlexNet with PyTorch, vanilla training, and the small Imagenette dataset, exploring architecture, data, and dropout effects
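For contrast with the post's from-scratch rebuild, a sketch of a single training step using torchvision's stock AlexNet; the dataset path and 10-class Imagenette-style folder layout are assumptions:

```python
import torch
from torch import nn
from torchvision import models, datasets, transforms

tfm = transforms.Compose([
    transforms.Resize(256), transforms.CenterCrop(224), transforms.ToTensor(),
])
train = datasets.ImageFolder("imagenette2/train", transform=tfm)   # assumed path
loader = torch.utils.data.DataLoader(train, batch_size=64, shuffle=True)

model = models.alexnet(weights=None, num_classes=10)   # dropout sits in the classifier head
opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

for x, y in loader:
    opt.zero_grad()
    loss = nn.functional.cross_entropy(model(x), y)
    loss.backward()
    opt.step()
    break   # one illustrative step
```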
Training a Model with Limited Memory using Mixed Precision and Gradient Checkpointing (machinelearningmastery​.com). Techniques for training memory-limited models with PyTorch: mixed precision, AMP autocast, GradScaler, and gradient checkpointing
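A minimal sketch combining the two techniques the tutorial covers, autocast/GradScaler mixed precision and activation checkpointing; the toy model and batch are assumptions, and a CUDA device is required:

```python
import torch
from torch import nn
from torch.utils.checkpoint import checkpoint

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10)).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()

x = torch.randn(64, 512, device="cuda")
y = torch.randint(0, 10, (64,), device="cuda")

optimizer.zero_grad()
with torch.cuda.amp.autocast():
    # checkpoint() recomputes this forward segment during backward,
    # so its activations are not kept in memory.
    logits = checkpoint(model, x, use_reentrant=False)
    loss = nn.functional.cross_entropy(logits, y)
scaler.scale(loss).backward()   # scale the loss to avoid fp16 gradient underflow
scaler.step(optimizer)
scaler.update()
```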
Fine-Tuning Qwen3-VL (debuggercafe​.com). Fine-tuning Qwen3-VL 2B on a sketch-to-HTML dataset using LoRA for improved HTML generation
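A sketch of attaching LoRA adapters with Hugging Face PEFT; the model id, loader class, and target module names below are assumptions rather than details from the article:

```python
from transformers import AutoModelForVision2Seq
from peft import LoraConfig, get_peft_model

model = AutoModelForVision2Seq.from_pretrained("Qwen/Qwen3-VL-2B-Instruct")   # assumed id
lora_cfg = LoraConfig(
    r=16,                                   # low-rank adapter dimension
    lora_alpha=32,                          # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],    # assumed attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()          # only the adapter weights are trainable
```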
Solve Hi-Q with AlphaZero and Curriculum Learning (robw.fyi). Robert W builds a Hi-Q (peg solitaire) solver with AlphaZero-style training and curriculum learning in Python/PyTorch, using MCTS and Claude Code
🎯 Recommenders & Personalization
A Modern Recommender Model Architecture (cprimozic​.net). Detailed dive into a modern recommender using a Denoising Autoencoder, JAX (AMD GPU), multi-head decoder, custom rating features, and advanced loss balancing tricks
Break the Lock-In: Carry Your Preferences Anywhere (data-processing.club). Explains Pretender: end-user preference transfer to counter cold start using MMD or 1-Wasserstein distance, with a K-item rating strategy
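For reference, a biased RBF-kernel MMD² estimator in NumPy, illustrating the kind of distribution distance mentioned above; the data and kernel bandwidth are illustrative assumptions:

```python
import numpy as np

def mmd2_rbf(X, Y, gamma=1.0):
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
        return np.exp(-gamma * d2)
    m, n = len(X), len(Y)
    # Biased V-statistic estimator of MMD^2
    return k(X, X).sum() / m**2 + k(Y, Y).sum() / n**2 - 2 * k(X, Y).sum() / (m * n)

src = np.random.normal(0.0, 1.0, (100, 5))    # preference vectors on the source platform
tgt = np.random.normal(0.5, 1.0, (100, 5))    # preference vectors on the target platform
print(mmd2_rbf(src, tgt))
```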
Zero PII Customer Intelligence - Part 1: The Segmentation Model (mostlylucid​.net). Zero-PII segmentation using session signals, decay-based interests, vector embeddings (Qdrant), DuckDB analytics, and transparent user-facing controls in a .NET/C# ecommerce proof-of-concept
📚 Academic Research
KernelEvolve: Scaling Agentic Kernel Coding for Heterogeneous AI Accelerators at Meta (arxiv:cs). Meta’s KernelEvolve auto-generates and optimizes DLRM kernels across NVIDIA, AMD, and MTIA using search plus retrieval-augmented prompts, cutting kernel development from weeks to hours and boosting throughput
FUSCO: High-Performance Distributed Data Shuffling via Transformation-Communication Fusion (arxiv:cs). FUSCO fuses layout transformation with communication to accelerate Mixture-of-Experts token shuffling, outperforming NCCL/DeepEP and reducing MoE training and inference latency; useful for anyone scaling expert-parallel models today
Completed Hyperparameter Transfer across Modules, Width, Depth, Batch and Duration (arxiv:stat). Apple and Cambridge propose a Complete parameterization enabling hyperparameter transfer across width, depth, batch size, and training duration, even per module: tune once at small scale, then train large models faster and more reliably
Measuring Variable Importance via Accumulated Local Effects (arxiv:stat). ALE-based variable importance avoids misleading extrapolation and deflation under correlated features, while being cheaper than Shapley or permutation methods. Better interpretability for practitioners working with tabular models
A Profit-Based Measure of Lending Discrimination (arxiv:stat). Harvard–Stanford introduce a profit-based discrimination metric for loan pricing audits. Real fintech data shows miscalibration favors some groups; including protected attributes can improve fairness measurably