The book

Build every layer of an LLM from scratch.

Name: Under The Hood — Build Every Layer of a Large Language Model from Scratch
Price: 19.99 USD
Availability: In stock
Author: Ramchand Muralidharan

Build it. Break it. Measure it.

35 hands-on projects, 934 pages, 256,587 words. From a scalar autograd engine to mixture of experts, RLHF, quantization, and the fusion of independently trained specialists. Every project ends by deliberately sabotaging the thing you just built — that is how you find out what it was actually doing.

Buy on Leanpub: $19.99

See the 35 projects

What you actually get

934 pages

256,587 words

35 hands-on projects

100% complete · free updates

Eight parts, ordered the way you would actually learn the stack: Foundations, Building a GPT, Training at Scale, Inference, Architecture and Scaling, Alignment, System Scope, Modular Composition. Every project follows the same rhythm — Hook, The Concept, Why It Matters, The Build, BREAK IT, Optional Homework, Questions to Answer, Go Further, What You Now Know. The code companion at github.com/mechramc/Under-the-hood mirrors the book project-by-project.

How it compares

What you can read elsewhere, and what is only here.

The from-scratch shelf is small. Here is an honest map of what overlaps and what does not.

Book	Coverage	What is missing or different
Under The Hood, 35 projects · 934 pages	Autograd → tokenizer → attention → GPT → training at scale → KV cache, Flash Attention, speculative decoding, paged attention → MoE → RLHF, DPO, PPO → quantization → multimodal → non-transformer → fusing specialists.	The wedge is the BREAK IT methodology. Every project deliberately sabotages a component so you can see what it was actually doing.
Build a Large Language Model (From Scratch), Sebastian Raschka · 7 chapters	Excellent, tightly focused walkthrough that gets you to a working GPT and basic fine-tuning.	Does not cover RLHF in depth, KV cache, speculative decoding, paged attention, MoE, quantization tradeoffs, multimodal, or specialist fusion. Closest direct competitor on the shelf.
Zero to Hero · makemore · nanoGPT, Andrej Karpathy · free videos	The spiritual ancestor of the from-scratch genre. Builds intuition for autograd, transformers, and small GPTs.	Video, not a reference. No persistent searchable text. Stops well before post-training, efficient inference, MoE, or specialist fusion.
AI Engineering, Chip Huyen	Systems and product framing for shipping AI: data pipelines, evaluation, monitoring, deployment.	Different reader. Complementary, not competitive. Does not implement the model layer.
Hands-On Large Language Models, Iusztin and Labonne	Recipe-style applied LLM work using existing libraries.	Library-driven rather than from-scratch. Different reader profile.

The cluster

Five entry points into the build.

Each entry point maps to a stretch of the book where the projects cluster. Start where your question lives.

Projects 1–6

Build an LLM from Scratch

Scalar autograd, BPE, embeddings, attention, the GPT block, the residual stream. The whole foundation in one stretch.

Projects 4, 8

Attention from Scratch

Scaled dot-product attention, causal masking, multi-head, Flash Attention with tiled kernels and online softmax.

Projects 13–17

KV Cache and Fast Inference

KV cache mechanics, speculative decoding, paged attention, continuous batching, long-context extension.

Projects 18–20

Mixture of Experts from Scratch

Sparse routing, top-k gating, expert imbalance and routing collapse, the math of conditional compute.

Projects 21–26

RLHF and Preference Optimization

Reward models, PPO, DPO and its variants, test-time reasoning, tool use, the alignment stack end to end.

Chapter excerpts

Read full chapters from the book.

Eight full-length chapter excerpts, free to read. Each excerpt is the chapter as it appears in the book — including the BREAK IT experiments. If a topic looks like the one you came here for, start there.

Chapter 4

Attention From Scratch

Scaled dot-product attention from a blank file. Disable scaling, drop the mask, kill the softmax — see why each piece is there.
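
To give a flavour of that kind of experiment, here is a minimal sketch of scaled dot-product attention in plain Python, with switches for the scaling and the causal mask. It is an illustration written for this page, not the chapter's code: the function names, the `scale` and `causal` flags, and the toy vectors are all assumptions.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(q, k, v, scale=True, causal=True):
    """Scaled dot-product attention over lists of vectors, one per position."""
    d = len(q[0])
    out = []
    for i, qi in enumerate(q):
        # With the causal mask, position i may only look at positions 0..i.
        visible = range(i + 1) if causal else range(len(k))
        scores = [sum(a * b for a, b in zip(qi, k[j])) for j in visible]
        if scale:
            scores = [s / math.sqrt(d) for s in scores]
        w = softmax(scores)
        out.append([sum(wj * v[j][c] for wj, j in zip(w, visible))
                    for c in range(len(v[0]))])
    return out

# Toy inputs (hypothetical): three positions, two dimensions.
q = k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
v = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

masked = attention(q, k, v)                  # the intact version
unmasked = attention(q, k, v, causal=False)  # mask dropped
unscaled = attention(q, k, v, scale=False)   # scaling disabled
```

With the mask in place, position 0 can only see itself, so `masked[0]` is exactly `v[0]`. Drop the mask and position 0 starts mixing in later positions. Disable the scaling and the softmax at position 1 becomes sharper, which is the effect that grows with the head dimension.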

Chapter 5

Your GPT From a Blank File

Embeddings, blocks, residual stream, language-model head. Build a working decoder and then break each piece to see how it fails.

Chapter 8

Flash Attention and Tiled Kernels

Tile-based attention with online softmax. Profile memory savings, then break tiling to see why the standard kernel hurts long sequences.
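
The online softmax idea can be sketched for a single query row: visit the scores one tile at a time, keep a running maximum, and rescale what has been accumulated whenever the maximum changes. This is a simplified illustration with scalar values, not a tiled GPU kernel and not the chapter's code; `attend_full`, `attend_tiled`, and the `tile` parameter are names made up for this sketch.

```python
import math

def attend_full(scores, values):
    """Reference: softmax over all scores at once, then a weighted sum."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return sum(e * v for e, v in zip(exps, values)) / total

def attend_tiled(scores, values, tile=2):
    """Same result, computed while seeing only one tile of scores at a time."""
    running_max = float("-inf")
    denom = 0.0  # running sum of exp(score - running_max)
    acc = 0.0    # running sum of exp(score - running_max) * value
    for start in range(0, len(scores), tile):
        s_tile = scores[start:start + tile]
        v_tile = values[start:start + tile]
        new_max = max(running_max, max(s_tile))
        # Rescale everything accumulated under the old maximum.
        correction = math.exp(running_max - new_max)
        denom *= correction
        acc *= correction
        for s, v in zip(s_tile, v_tile):
            e = math.exp(s - new_max)
            denom += e
            acc += e * v
        running_max = new_max
    return acc / denom

# Toy inputs (hypothetical).
scores = [0.5, 3.0, -1.0, 2.0, 4.5]
values = [1.0, 2.0, 3.0, 4.0, 5.0]
```

The two functions agree to floating-point precision for any tile size, yet the tiled version never holds the full row of exponentials in memory. That is the property a tiled kernel relies on for long sequences.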

Chapter 13

Fast Inference: The KV Cache

Build the cache, measure the speedup on long generations, invalidate keys to see exactly what it caches and why.
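
As a rough sketch of what gets cached, the toy decoder below counts how many key/value projections run with and without a cache. Everything here is a stand-in chosen for illustration: the `Projector` class replaces the real learned projections, each token doubles as its own query, and none of the names come from the book.

```python
import math

class Projector:
    """Toy stand-in for the key/value projections; counts how often it runs."""
    def __init__(self):
        self.calls = 0

    def __call__(self, x):
        self.calls += 1
        key = [0.5 * xi for xi in x]
        value = [xi + 1.0 for xi in x]
        return key, value

def attend(q, keys, values):
    """Softmax attention of one query vector over a list of keys and values."""
    d = len(q)
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [sum(e * v[c] for e, v in zip(exps, values)) / total
            for c in range(len(values[0]))]

def generate_no_cache(tokens, proj):
    outputs = []
    for t in range(len(tokens)):
        # Re-project every token in the prefix at every step.
        kv = [proj(x) for x in tokens[:t + 1]]
        keys = [k for k, _ in kv]
        values = [v for _, v in kv]
        outputs.append(attend(tokens[t], keys, values))
    return outputs

def generate_with_cache(tokens, proj):
    keys, values, outputs = [], [], []
    for x in tokens:
        k, v = proj(x)  # project only the newest token
        keys.append(k)
        values.append(v)
        outputs.append(attend(x, keys, values))
    return outputs

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
slow, fast = Projector(), Projector()
out_slow = generate_no_cache(tokens, slow)
out_fast = generate_with_cache(tokens, fast)
```

For four tokens the uncached loop runs the projection 10 times and the cached loop runs it 4 times, with identical outputs. The gap grows quadratically with sequence length, which is why the speedup shows up on long generations.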

Chapter 18

Mixture of Experts

Sparse MoE with top-2 routing. Train it, then break the router to see routing collapse, expert imbalance, and dead experts.
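
A minimal sketch of top-2 gating, assuming one common formulation: keep the two largest router logits per token and renormalise them with a softmax. The book may order these steps differently. The function names, the toy logits, and the bias used to break the router are all hypothetical.

```python
import math

def top_k_gate(logits, k=2):
    """Keep the k largest router logits and renormalise them with a softmax."""
    chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in chosen)
    exps = {i: math.exp(logits[i] - m) for i in chosen}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}

def expert_load(batch_logits, num_experts, k=2):
    """Count how many tokens each expert receives."""
    load = [0] * num_experts
    for logits in batch_logits:
        for i in top_k_gate(logits, k):
            load[i] += 1
    return load

# Router logits for three tokens over four experts (toy numbers).
batch = [
    [2.0, 0.1, 1.0, -3.0],
    [0.0, 3.0, -1.0, 2.5],
    [0.2, 0.1, 4.0, 3.9],
]
gates = top_k_gate(batch[0])
healthy = expert_load(batch, num_experts=4)

# Break the router: a large bias on experts 0 and 1 swamps the learned logits.
biased = [[x + 10.0 if i < 2 else x for i, x in enumerate(row)] for row in batch]
collapsed = expert_load(biased, num_experts=4)
```

In the healthy case every expert receives at least one token (`[1, 1, 2, 2]`). With the bias applied, every token goes to experts 0 and 1 (`[3, 3, 0, 0]`), so experts 2 and 3 receive nothing. That is a small-scale picture of routing collapse and dead experts.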

Chapter 23

Reward Models and RLHF

Train a reward model, run PPO against it, deliberately break the reward to see what reward hacking actually looks like in code.

Chapter 27

Quantization and Deployment

int8 and int4 quantization, accuracy versus latency, what happens when calibration goes wrong.
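
To make "calibration goes wrong" concrete, here is a small sketch of symmetric per-tensor int8 quantization with an absmax scale. The scheme, the function names, and the toy weights are assumptions made for illustration; the chapter may use a different recipe.

```python
def quantize_int8(xs, scale):
    """Symmetric int8: round to the nearest step and clamp to [-127, 127]."""
    return [max(-127, min(127, round(x / scale))) for x in xs]

def dequantize(qs, scale):
    """Map the integers back to floats."""
    return [q * scale for q in qs]

def absmax_scale(xs):
    """Calibrate the step size so the largest magnitude maps to 127."""
    return max(abs(x) for x in xs) / 127

def max_error(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

weights = [0.023, -0.5, 0.314, 1.27, -0.8]  # toy values

# Calibrated on the real weights: the range covers every value.
good_scale = absmax_scale(weights)
good = dequantize(quantize_int8(weights, good_scale), good_scale)

# Calibration gone wrong: a scale fitted to a range of only [-0.5, 0.5]
# clips the larger weights at the int8 limits.
bad_scale = 0.5 / 127
bad = dequantize(quantize_int8(weights, bad_scale), bad_scale)
```

With the calibrated scale, the round-trip error stays within half a quantization step. With the mis-calibrated scale, the weight 1.27 is clamped to 0.5, an error of about 0.77 on a single value.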

Chapter 32

Fusing Independently Trained Specialists

Take two specialists trained on different data and fuse them into one routed system. The BREAK IT pass shows when fusion silently degrades.

Who this book is for

For engineers who learn by writing the thing.

If your instinct is to open a blank file and start typing when you want to understand something, this is the book. If you want a survey of the LLM landscape without writing code, there are better books — Chip Huyen's AI Engineering is a good starting point. If you want to ship an LLM-powered product without understanding the layers underneath, library-driven books will get you there faster.

Under The Hood is the long path. The payoff is that nothing about the modern LLM stack feels mysterious afterward.

About the author

Written by a working researcher.

Ramchand Muralidharan founded Murai Labs after a decade of building and shipping AI systems. He is the author of KALAVAI (predicting when specialist fusion works), Orion (programming the Apple Neural Engine for LLMs), and other arXiv work on adaptive routing and evolutionary adapter architectures. He was a Senior PM at Procore before going full time on research. The book is what he wished existed when he started.

FAQ

Honest answers.

Is this beginner-friendly?

Yes, if you can write Python and remember a little linear algebra. The book opens with a scalar autograd engine in a single file and builds every layer above it from there. The Preflight chapter sets up the math, the environment, and the workflow before any project starts.

What background do I need?

Comfortable Python, basic NumPy or PyTorch tensors, and enough linear algebra to recognise a dot product. No prior deep-learning experience is assumed. If you have worked through Karpathy's makemore videos you will be ahead, but it is not required.

Is the code on GitHub?

Yes. github.com/mechramc/Under-the-hood is the public code companion, organised project-by-project so each chapter maps to a directory you can clone and run.

How is this different from Sebastian Raschka's book?

Raschka's Build a Large Language Model (From Scratch) is excellent and tightly focused on getting to a working GPT in roughly seven chapters. Under The Hood covers 35 projects and goes meaningfully further: RLHF, DPO, KV cache, Flash Attention, speculative decoding, paged attention, MoE, quantization, multimodal, non-transformer architectures, and specialist fusion. Every project also includes a BREAK IT section that deliberately sabotages a piece of the system to show what it was actually doing.

How is this different from Karpathy's Zero to Hero?

Zero to Hero is a free, inspirational video series and the spiritual ancestor of the from-scratch genre. Under The Hood is the persistent reference those videos do not try to be: searchable, structured, with BREAK IT experiments that go past the territory the videos cover.

PDF, EPUB, or web?

PDF and EPUB, delivered through Leanpub with free updates for life. The chapter excerpts on this site are the same chapters rendered for web reading.

Why Leanpub and not Amazon?

Leanpub supports continuous publishing with free lifetime updates, which fits a technical book whose subject matter keeps moving. Readers get every revision, not just the snapshot that shipped on launch day.

What does the coupon do?

The MURAI200 coupon ties the purchase to this site so the author can see which pages actually drove the sale. Same book, same price floor — the link just tags the source.

Start reading

Buy Under The Hood on Leanpub.

934 pages. 35 projects. Lifetime updates. Read a sample first if you want — eight full chapters are linked above.

Buy on Leanpub: $19.99

Code companion on GitHub