When I started building with LLMs, I kept running into terms I didn't fully understand. Quantization, KV cache, top-k sampling, temperature. Every time I looked one up, I got either a textbook definition or a link to a paper.
That told me what the term is. It didn't tell me what to do with it. What decision does it affect? What breaks if I ignore it? What tradeoff am I making?
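Most of those terms turn out to be concrete knobs. Temperature and top-k, for example, are just two transformations applied to the model's logits before a token is sampled. A minimal sketch (toy code with invented function names, not any real inference path):

```python
import math
import random

def sample_next_token(logits, temperature=1.0, top_k=None):
    """Pick the next token id from raw logits.

    temperature rescales logits before softmax: <1 sharpens the
    distribution, >1 flattens it. top_k keeps only the k highest-scoring
    tokens before sampling. Illustrative only, not production code.
    """
    scaled = [l / temperature for l in logits]
    # Optionally restrict to the k highest-scoring tokens.
    if top_k is not None:
        cutoff = sorted(scaled, reverse=True)[top_k - 1]
        scaled = [s if s >= cutoff else float("-inf") for s in scaled]
    # Softmax with max-subtraction for numerical stability.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Sample one token id from the resulting distribution.
    return random.choices(range(len(probs)), weights=probs, k=1)[0]
```

The production angle in one line: temperature and top-k don't change what the model "knows", only how you draw from the distribution it already produced.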
So I started keeping notes. For each term, I wrote down the production angle: why it matters when you're actually shipping something. Over time it grew into 30+ entries organized across 8 pillars, from Core Architecture to Agentic AI, with linked related concepts so you can follow threads naturally.
I cleaned it up, built a browsable UI with search and filtering, and open sourced it.
tomerjann/llm-field-notes
LLM terms explained from an engineering perspective, with the production implications, not just the definition.
I've been learning how LLMs work at the systems level and kept a running list of every term I had to look up. Writing down what each one actually means when you're building something helped me understand them better than just reading about them.
I thought it might help others too, so I cleaned it up and open sourced it.
What's here
30+ terms across 8 areas, each with a plain-English definition and links to related concepts so you can follow threads rather than look things up in isolation.
| Area | Examples |
|---|---|
| Core Architecture | Transformer, Attention, FFN Layer, MoE, Dense Model |
| Memory & Compute | KV Cache, Quantization, Inference |
| Vectors & Retrieval | Embeddings, RAG, Vector DB, Latent Space |
| Generation & Sampling | Temperature, Top-p, Logits |
| Training & Alignment | Fine-tuning, LoRA, RLHF, Distillation |
| Evaluation | Evals, Harness Engineering |
| Prompting | |
| Agentic AI | |
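The Generation & Sampling entries are the easiest to make concrete. Top-p (nucleus) sampling, for instance, keeps the smallest set of tokens whose cumulative probability reaches p, then renormalizes. A toy sketch (my own illustration, not code from the repo):

```python
import math

def top_p_filter(logits, p=0.9):
    """Nucleus sampling filter: keep the smallest set of tokens whose
    cumulative probability reaches p, and return renormalized
    (token_id, prob) pairs. Toy sketch, not a production kernel."""
    # Softmax with max-subtraction for numerical stability.
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    # Sort tokens by probability, highest first.
    ranked = sorted(
        ((i, e / total) for i, e in enumerate(exps)),
        key=lambda pair: pair[1],
        reverse=True,
    )
    kept, cum = [], 0.0
    for i, prob in ranked:
        kept.append((i, prob))
        cum += prob
        if cum >= p:
            break
    # Renormalize the surviving tokens to a proper distribution.
    norm = sum(prob for _, prob in kept)
    return [(i, prob / norm) for i, prob in kept]
```

Unlike top-k, the number of surviving tokens adapts to the shape of the distribution: a confident model keeps one or two tokens, an uncertain one keeps many.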
There's also a companion project that walks through everything that happens from the moment you hit send to the moment a response streams back:
tomerjann/what-happens-when-you-prompt
A deep-dive reference tracing every layer of the stack when you send a prompt to an LLM chat, from keystroke to streamed token. Covers tokenization, KV cache, prefill/decode, sampling, SSE streaming, and more.
What happens when you send a prompt to an LLM chat?
This repository answers a deceptively deep question:
"What happens - at every layer of the stack - when you type a message into Claude or ChatGPT and press Send?"
Inspired by the classic what-happens-when repository for browser navigation, this traces the full journey of a prompt: from keystroke to rendered response, skipping nothing.
The target reader is an engineer who already understands transformers, attention, and RAG - and wants production intuition, not another introductory walkthrough.
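The prefill/decode split at the heart of that journey can be caricatured in a few lines: the prompt is processed in one pass that fills the KV cache, then decode produces one token at a time, computing only the new token's entry and reading everything earlier from the cache. Everything below is a toy stand-in; `next_token_fn` and the cache entries are invented for illustration:

```python
def decode_with_cache(prompt_ids, steps, next_token_fn):
    """Toy prefill/decode loop. next_token_fn(kv_cache, token) stands in
    for one transformer forward pass over a single token: it may read
    the cache and returns the next token id. Illustrative names only."""
    # Prefill: process the whole prompt once, populating the KV cache
    # with one entry per prompt token.
    kv_cache = [("kv", t) for t in prompt_ids]
    last = prompt_ids[-1]
    out = []
    # Decode: one token per step. Only the newest token's K/V entry is
    # computed; all earlier entries are reused from the cache instead of
    # being recomputed, which is the whole point of the KV cache.
    for _ in range(steps):
        last = next_token_fn(kv_cache, last)
        kv_cache.append(("kv", last))
        out.append(last)
    return out
```

The production implication: prefill cost scales with prompt length, decode cost scales with output length, and the cache trades memory for not redoing the prompt's attention work on every step.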
Contributions welcome. If you see a missing layer, open a PR.
Disclaimer: Neither Anthropic nor OpenAI publishes their infrastructure internals. This document describes general patterns that are well-established across the industry - grounded in public research, open-source inference frameworks, and published API documentation. Where specific examples are needed (model architecture, pricing, safety classifiers), they draw from open-source models or a single provider's public…
If you've ever felt lost in LLM jargon while building something real, this might save you some time.