When I started building with LLMs, I kept running into terms I didn't fully understand. Quantization, KV cache, top-k sampling, temperature. Every time I looked one up, I got either a textbook definition or a link to a paper.
That told me what the term is. It didn't tell me what to do with it. What decision does it affect? What breaks if I ignore it? What tradeoff am I making?
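Most of those terms turn out to be concrete knobs. Temperature and top-k, for example, are just two transformations applied to the model's logits before a token is sampled. A minimal sketch (toy code with invented function names, not any real inference path):

```python
import math
import random

def sample_next_token(logits, temperature=1.0, top_k=None):
    """Pick the next token id from raw logits.

    temperature rescales logits before softmax: <1 sharpens the
    distribution, >1 flattens it. top_k keeps only the k highest-scoring
    tokens before sampling. Illustrative only, not production code.
    """
    scaled = [l / temperature for l in logits]
    # Optionally restrict to the k highest-scoring tokens.
    if top_k is not None:
        cutoff = sorted(scaled, reverse=True)[top_k - 1]
        scaled = [s if s >= cutoff else float("-inf") for s in scaled]
    # Softmax with max-subtraction for numerical stability.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Sample one token id from the resulting distribution.
    return random.choices(range(len(probs)), weights=probs, k=1)[0]
```

The production angle in one line: temperature and top-k don't change what the model "knows", only how you draw from the distribution it already produced.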
So I started keeping notes. For each term, I wrote down the production angle: why it matters when you're actually shipping something. Over time it grew into 30+ entries organized across 8 pillars, from Core Architecture to Agentic AI, with linked related concepts so you can follow threads naturally.
I cleaned it up, built a browsable UI with search and filtering, and open sourced it.
tomerjann/llm-field-notes
LLM terms explained from an engineering perspective, with the production implications, not just the definition.
I've been learning how LLMs work at the systems level and kept a running list of every term I had to look up. Writing down what each one actually means when you're building something helped me understand them better than just reading about them.
I thought it might help others too, so I cleaned it up and open sourced it.
What's here
30+ terms across 8 areas, each with a plain-English definition and links to related concepts so you can follow threads rather than look things up in isolation.
| Area | Examples |
|---|---|
| Core Architecture | Transformer, Attention, FFN Layer, MoE, Dense Model |
| Memory & Compute | KV Cache, Quantization, Inference |
| Vectors & Retrieval | Embeddings, RAG, Vector DB, Latent Space |
| Generation & Sampling | Temperature, Top-p, Logits |
| Training & Alignment | Fine-tuning, LoRA, RLHF, Distillation |
| Evaluation | Evals, Harness Engineering |
| Prompting | |
| Agentic AI | |
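The Generation & Sampling entries are the easiest to make concrete. Top-p (nucleus) sampling, for instance, keeps the smallest set of tokens whose cumulative probability reaches p, then renormalizes. A toy sketch (my own illustration, not code from the repo):

```python
import math

def top_p_filter(logits, p=0.9):
    """Nucleus sampling filter: keep the smallest set of tokens whose
    cumulative probability reaches p, and return renormalized
    (token_id, prob) pairs. Toy sketch, not a production kernel."""
    # Softmax with max-subtraction for numerical stability.
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    # Sort tokens by probability, highest first.
    ranked = sorted(
        ((i, e / total) for i, e in enumerate(exps)),
        key=lambda pair: pair[1],
        reverse=True,
    )
    kept, cum = [], 0.0
    for i, prob in ranked:
        kept.append((i, prob))
        cum += prob
        if cum >= p:
            break
    # Renormalize the surviving tokens to a proper distribution.
    norm = sum(prob for _, prob in kept)
    return [(i, prob / norm) for i, prob in kept]
```

Unlike top-k, the number of surviving tokens adapts to the shape of the distribution: a confident model keeps one or two tokens, an uncertain one keeps many.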
There's also a companion project that walks through everything that happens from the moment you hit send to the moment a response streams back:
tomerjann/what-happens-when-you-prompt
A deep-dive reference tracing every layer of the stack when you send a prompt to an LLM chat, from keystroke to streamed token. Covers tokenization, KV cache, prefill/decode, sampling, SSE streaming, and more.
What happens when you send a prompt to an LLM chat?
This repository answers a deceptively deep question:
"What happens - at every layer of the stack - when you type a message into Claude or ChatGPT and press Send?"
Inspired by the classic what-happens-when repository for browser navigation, this traces the full journey of a prompt: from keystroke to rendered response, skipping nothing.
The target reader is an engineer who already understands transformers, attention, and RAG - and wants production intuition, not another introductory walkthrough.
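The prefill/decode split at the heart of that journey can be caricatured in a few lines: the prompt is processed in one pass that fills the KV cache, then decode produces one token at a time, computing only the new token's entry and reading everything earlier from the cache. Everything below is a toy stand-in; `next_token_fn` and the cache entries are invented for illustration:

```python
def decode_with_cache(prompt_ids, steps, next_token_fn):
    """Toy prefill/decode loop. next_token_fn(kv_cache, token) stands in
    for one transformer forward pass over a single token: it may read
    the cache and returns the next token id. Illustrative names only."""
    # Prefill: process the whole prompt once, populating the KV cache
    # with one entry per prompt token.
    kv_cache = [("kv", t) for t in prompt_ids]
    last = prompt_ids[-1]
    out = []
    # Decode: one token per step. Only the newest token's K/V entry is
    # computed; all earlier entries are reused from the cache instead of
    # being recomputed, which is the whole point of the KV cache.
    for _ in range(steps):
        last = next_token_fn(kv_cache, last)
        kv_cache.append(("kv", last))
        out.append(last)
    return out
```

The production implication: prefill cost scales with prompt length, decode cost scales with output length, and the cache trades memory for not redoing the prompt's attention work on every step.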
Contributions welcome. If you see a missing layer, open a PR.
Disclaimer: Neither Anthropic nor OpenAI publishes their infrastructure internals. This document describes general patterns that are well-established across the industry - grounded in public research, open-source inference frameworks, and published API documentation. Where specific examples are needed (model architecture, pricing, safety classifiers), they draw from open-source models or a single provider's public…
If you've ever felt lost in LLM jargon while building something real, this might save you some time.