Build every layer of an LLM from scratch.
Build it. Break it. Measure it.
35 hands-on projects, 934 pages, 256,587 words. From a scalar autograd engine to mixture of experts, RLHF, quantization, and the fusion of independently trained specialists. Every project ends by deliberately sabotaging the thing you just built — that is how you find out what it was actually doing.
Eight parts, ordered the way you would actually learn the stack: Foundations, Building a GPT, Training at Scale, Inference, Architecture and Scaling, Alignment, System Scope, Modular Composition. Every project follows the same rhythm — Hook, The Concept, Why It Matters, The Build, BREAK IT, Optional Homework, Questions to Answer, Go Further, What You Now Know. The code companion at github.com/mechramc/Under-the-hood mirrors the book project-by-project.
What you can read elsewhere, and what is only here.
The from-scratch shelf is small. Here is an honest map of what overlaps and what does not.
| Book | Coverage | What is missing or different |
|---|---|---|
| Under The Hood, 35 projects · 934 pages | Autograd → tokenizer → attention → GPT → training at scale → KV cache, Flash Attention, speculative decoding, paged attention → MoE → RLHF, DPO, PPO → quantization → multimodal → non-transformer → fusing specialists. | The wedge is the BREAK IT methodology. Every project deliberately sabotages a component so you can see what it was actually doing. |
| Build a Large Language Model (From Scratch), Sebastian Raschka · 7 chapters | Excellent, tightly focused walkthrough that gets you to a working GPT and basic fine-tuning. | Does not cover RLHF in depth, KV cache, speculative decoding, paged attention, MoE, quantization tradeoffs, multimodal, or specialist fusion. Closest direct competitor on the shelf. |
| Zero to Hero · makemore · nanoGPT, Andrej Karpathy · free videos | The spiritual ancestor of the from-scratch genre. Builds intuition for autograd, transformers, and small GPTs. | Video, not a reference. No persistent searchable text. Stops well before post-training, efficient inference, MoE, or specialist fusion. |
| AI Engineering, Chip Huyen | Systems and product framing for shipping AI: data pipelines, evaluation, monitoring, deployment. | Different reader. Complementary, not competitive. Does not implement the model layer. |
| Hands-On Large Language Models, Jay Alammar and Maarten Grootendorst | Recipe-style applied LLM work using existing libraries. | Library-driven rather than from-scratch. Different reader profile. |
Five entry points into the build.
Each entry point maps to a stretch of the book where the projects cluster. Start where your question lives.
Build an LLM from Scratch
Scalar autograd, BPE, embeddings, attention, the GPT block, the residual stream. The whole foundation in one stretch.
Attention from Scratch
Scaled dot-product attention, causal masking, multi-head, Flash Attention with tiled kernels and online softmax.
KV Cache and Fast Inference
KV cache mechanics, speculative decoding, paged attention, continuous batching, long-context extension.
Mixture of Experts from Scratch
Sparse routing, top-k gating, expert imbalance and routing collapse, the math of conditional compute.
RLHF and Preference Optimization
Reward models, PPO, DPO and its variants, test-time reasoning, tool use, the alignment stack end to end.
Read full chapters from the book.
Eight full-length chapter excerpts, free to read. Each excerpt is the chapter as it appears in the book — including the BREAK IT experiments. If a topic looks like the one you came here for, start there.
Attention From Scratch
Scaled dot-product attention from a blank file. Disable scaling, drop the mask, kill the softmax — see why each piece is there.
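To give a feel for that experiment, here is a minimal sketch of single-head attention in plain Python with switches for the scaling and the causal mask. It is not taken from the book: the function names and the `scale` and `causal` flags are assumptions made for illustration, and the "kill the softmax" variant is left out.

```python
import math

def softmax(xs):
    # Subtract the max for numerical stability; exp(-inf) evaluates to 0.0.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(q, k, v, scale=True, causal=True):
    """Single-head attention over lists of vectors (hypothetical helper)."""
    d = len(q[0])
    out, weights = [], []
    for i, qi in enumerate(q):
        scores = []
        for j, kj in enumerate(k):
            s = sum(a * b for a, b in zip(qi, kj))
            if scale:
                s /= math.sqrt(d)          # keep scores in a softmax-friendly range
            if causal and j > i:
                s = float("-inf")          # position i may not look at the future
            scores.append(s)
        w = softmax(scores)
        weights.append(w)
        out.append([sum(w[j] * v[j][c] for j in range(len(v)))
                    for c in range(len(v[0]))])
    return out, weights

# Toy inputs: three positions, key/query dimension 4, value dimension 2.
q = k = [[2.0, 0.0, 0.0, 0.0], [0.0, 2.0, 0.0, 0.0], [0.0, 0.0, 2.0, 0.0]]
v = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
_, w = attention(q, k, v)
_, w_unscaled = attention(q, k, v, scale=False)
_, w_unmasked = attention(q, k, v, causal=False)
```

With the mask on, the first position can only attend to itself. Drop the mask and it leaks weight onto later positions; drop the scaling and the last row's distribution becomes visibly sharper.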
Chapter 5
Your GPT From a Blank File
Embeddings, blocks, residual stream, language-model head. Build a working decoder and then break each piece to see how it fails.
Chapter 8
Flash Attention and Tiled Kernels
Tile-based attention with online softmax. Profile memory savings, then break tiling to see why the standard kernel hurts long sequences.
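The online softmax idea can be sketched without a GPU. The example below, an illustration rather than the book's code, computes attention for one query vector twice: once over all keys at once, and once tile by tile while carrying a running max, a running denominator, and a running weighted sum. The function names and the `tile` parameter are assumptions, and the 1/sqrt(d) scaling is omitted to keep the sketch short.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend_full(q, keys, values):
    """Reference: softmax over every score at once."""
    scores = [dot(q, k) for k in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    dim = len(values[0])
    return [sum(e * v[c] for e, v in zip(exps, values)) / total
            for c in range(dim)]

def attend_tiled(q, keys, values, tile=2):
    """Same output, visiting keys and values one tile at a time."""
    dim = len(values[0])
    m = float("-inf")     # running max of the scores seen so far
    l = 0.0               # running softmax denominator
    acc = [0.0] * dim     # running unnormalised weighted sum of values
    for start in range(0, len(keys), tile):
        scores = [dot(q, k) for k in keys[start:start + tile]]
        m_new = max(m, max(scores))
        rescale = math.exp(m - m_new)   # re-express old totals under the new max
        l *= rescale
        acc = [a * rescale for a in acc]
        for s, v in zip(scores, values[start:start + tile]):
            p = math.exp(s - m_new)
            l += p
            acc = [a + p * x for a, x in zip(acc, v)]
        m = m_new
    return [a / l for a in acc]
```

The tiled version never holds the full score vector in memory, which is the property a real tiled kernel exploits for long sequences.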
Chapter 13
Fast Inference: The KV Cache
Build the cache, measure the speedup on long generations, invalidate keys to see exactly what it caches and why.
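As a rough picture of what a KV cache stores, the sketch below decodes a toy sequence two ways: recomputing every key and value for the whole prefix at each step, and appending one key and one value per step to a cache. The `KVCache` class, the `project` helper, and the projection counters are hypothetical names used for illustration, not the book's implementation.

```python
import math

def project(x, w):
    """Toy linear projection: matrix w (one row per output) times vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def attend(q, keys, values):
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(len(q)) for k in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [sum(e * v[c] for e, v in zip(exps, values)) / total
            for c in range(len(values[0]))]

class KVCache:
    def __init__(self):
        self.keys, self.values = [], []
        self.projections = 0   # how many key/value projections were computed

    def step(self, x, wq, wk, wv):
        # Only the newest token is projected; older keys and values are reused.
        self.keys.append(project(x, wk))
        self.values.append(project(x, wv))
        self.projections += 2
        return attend(project(x, wq), self.keys, self.values)

def step_no_cache(prefix, wq, wk, wv):
    # Every key and value in the prefix is recomputed on every step.
    keys = [project(x, wk) for x in prefix]
    values = [project(x, wv) for x in prefix]
    return attend(project(prefix[-1], wq), keys, values), 2 * len(prefix)
```

Over T decoding steps the cached path computes 2T key and value projections while the uncached path computes T(T+1), and both produce the same attention output at every step.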
Chapter 18
Mixture of Experts
Sparse MoE with top-2 routing. Train it, then break the router to see routing collapse, expert imbalance, and dead experts.
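Top-2 routing fits in a few lines. The sketch below is an assumption-laden illustration, not the book's code: it renormalises the softmax over the two selected logits (one common convention; others apply the softmax over all experts first), it assumes each expert maps a vector to a vector of the same length, and `top2_route`, `moe_forward`, and `expert_load` are hypothetical names.

```python
import math

def router_logits(x, router):
    # One router row per expert; each logit is a dot product with the token.
    return [sum(w * xi for w, xi in zip(row, x)) for row in router]

def top2_route(logits):
    """Return [(expert_index, weight), ...] for the two largest logits."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:2]
    m = max(logits[i] for i in top)
    exps = [math.exp(logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

def moe_forward(x, router, experts):
    # Only the two chosen experts run; the rest are skipped for this token.
    out = [0.0] * len(x)
    for idx, weight in top2_route(router_logits(x, router)):
        y = experts[idx](x)
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out

def expert_load(tokens, router):
    """Count how many tokens each expert receives."""
    load = [0] * len(router)
    for x in tokens:
        for idx, _ in top2_route(router_logits(x, router)):
            load[idx] += 1
    return load
```

Calling `expert_load` on a batch is a quick way to spot imbalance: a router whose rows favour the same two experts for every token leaves the others with a load of zero.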
Chapter 23
Reward Models and RLHF
Train a reward model, run PPO against it, deliberately break the reward to see what reward hacking actually looks like in code.
Chapter 27
Quantization and Deployment
int8 and int4 quantization, accuracy versus latency, what happens when calibration goes wrong.
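One way calibration can go wrong is easy to show with symmetric int8 quantization. In the sketch below, which is an illustration under stated assumptions rather than the book's method, the scale is chosen by absolute-max over a calibration sample; if that sample misses the largest weights, they are clipped. The helper names and the toy weights are made up for the example.

```python
def absmax_scale(calibration):
    # Map the largest magnitude in the calibration sample to 127.
    return max(abs(v) for v in calibration) / 127

def quantize_int8(values, scale):
    """Symmetric quantization: round(x / scale), clamped to [-127, 127]."""
    return [max(-127, min(127, round(v / scale))) for v in values]

def dequantize(q, scale):
    return [x * scale for x in q]

def max_error(values, scale):
    restored = dequantize(quantize_int8(values, scale), scale)
    return max(abs(a - b) for a, b in zip(values, restored))

weights = [-1.2, -0.3, 0.0, 0.45, 0.9, 1.27]

good_scale = absmax_scale(weights)            # calibrated on the full range
bad_scale = absmax_scale([-0.3, 0.0, 0.45])   # calibration sample misses the tails

good_err = max_error(weights, good_scale)
bad_err = max_error(weights, bad_scale)
```

With the full-range scale the worst-case error stays within half a quantization step. With the narrow calibration sample, every weight beyond 0.45 in magnitude is clamped and the error is dominated by clipping.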
Chapter 32
Fusing Independently Trained Specialists
Take two specialists trained on different data and fuse them into one routed system. The BREAK IT pass shows when fusion silently degrades.
For engineers who learn by writing the thing.
If your instinct is to open a blank file and start typing when you want to understand something, this is the book. If you want a survey of the LLM landscape without writing code, there are better books — Chip Huyen's AI Engineering is a good starting point. If you want to ship an LLM-powered product without understanding the layers underneath, library-driven books will get you there faster.
Under The Hood is the long path. The payoff is that nothing about the modern LLM stack feels mysterious afterward.
Written by a working researcher.
Ramchand Muralidharan founded Murai Labs after a decade of building and shipping AI systems. Author of KALAVAI (predicting when specialist fusion works), Orion (programming the Apple Neural Engine for LLMs), and other arXiv work on adaptive routing and evolutionary adapter architectures. Senior PM at Procore before going full time on research. The book is what he wished existed when he started.
Honest answers.
Is this beginner-friendly?
Yes, if you can write Python and remember a little linear algebra. The book opens with a scalar autograd engine in a single file and builds every layer above it from there. The Preflight chapter sets up the math, the environment, and the workflow before any project starts.
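For a sense of scale, a scalar autograd engine can start as small as the sketch below. This is an illustrative minimum in the spirit of that first project, not the book's actual file: the `Value` class and its attribute names are assumptions, and only addition and multiplication are supported.

```python
class Value:
    """A scalar that remembers how it was computed (hypothetical sketch)."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward_fn = None   # pushes this node's grad to its parents

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def back():
            self.grad += out.grad
            other.grad += out.grad
        out._backward_fn = back
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def back():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward_fn = back
        return out

    def backward(self):
        # Topologically sort the graph, then apply the chain rule in reverse.
        order, seen = [], set()
        def visit(node):
            if node not in seen:
                seen.add(node)
                for parent in node._parents:
                    visit(parent)
                order.append(node)
        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            if node._backward_fn is not None:
                node._backward_fn()

a, b = Value(2.0), Value(3.0)
c = a * b + a      # dc/da = b + 1, dc/db = a
c.backward()
```

After `c.backward()`, `a.grad` holds 4.0 and `b.grad` holds 2.0, matching the derivatives in the comment.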
What background do I need?
Comfortable Python, basic NumPy or PyTorch tensors, and enough linear algebra to recognise a dot product. No prior deep-learning experience is assumed. If you have worked through Karpathy's makemore videos you will be ahead, but it is not required.
Is the code on GitHub?
github.com/mechramc/Under-the-hood is the public code companion, organised project-by-project so each chapter maps to a directory you can clone and run.
How is this different from Sebastian Raschka's book?
Raschka's Build a Large Language Model (From Scratch) is excellent and tightly focused on getting to a working GPT in roughly seven chapters. Under The Hood covers 35 projects and goes meaningfully further: RLHF, DPO, KV cache, Flash Attention, speculative decoding, paged attention, MoE, quantization, multimodal, non-transformer architectures, and specialist fusion. Every project also includes a BREAK IT section that deliberately sabotages a piece of the system to show what it was actually doing.
How is this different from Karpathy's Zero-to-Hero?
Zero-to-Hero is a free, inspirational video series and the spiritual ancestor of the from-scratch genre. Under The Hood is the persistent reference those videos do not try to be: searchable, structured, with BREAK IT experiments that go past the territory the videos cover.
PDF, EPUB, or web?
PDF and EPUB, delivered through Leanpub with free updates for life. The chapter excerpts on this site are the same chapters rendered for web reading.
Why Leanpub and not Amazon?
Leanpub supports continuous publishing with free lifetime updates, which fits a technical book whose subject matter keeps moving. Readers get every revision, not just the snapshot that shipped on launch day.
What does the coupon do?
The MURAI200 coupon ties the purchase to this site so the author can see which pages actually drove the sale. Same book, same price floor — the link just tags the source.
Buy Under The Hood on Leanpub.
934 pages. 35 projects. Lifetime updates. Read a sample first if you want — eight full chapters are linked above.