How LLMs Actually Work — From Tokens to Text (with Python)

#python #ai #machinelearning #beginners

Ever wondered how ChatGPT and other large language models actually work? Under the hood, an LLM does one thing: given the tokens so far, it predicts the next token. Everything else (reasoning, code, conversation) emerges from doing that extremely well after training on trillions of tokens of text.

Here's the whole pipeline, end to end.

1. Tokenization

The model never sees raw letters. A tokenizer splits text into subword tokens and maps each to an integer id.
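
To make that concrete, here is a minimal sketch of a tokenizer. It is not how production tokenizers work: they typically learn their vocabulary with an algorithm such as byte-pair encoding, while this one uses a greedy longest-match over a tiny hand-written vocabulary. The names `VOCAB` and `tokenize` are made up for illustration.

```python
# Hypothetical toy vocabulary. Real tokenizers learn tens of thousands of pieces.
VOCAB = {"<unk>": 0, "token": 1, "ize": 2, "r": 3, "s": 4, " ": 5, "un": 6, "the": 7}


def tokenize(text, vocab=VOCAB):
    """Greedy longest-match split of text into pieces found in vocab."""
    max_len = max(len(piece) for piece in vocab)
    ids = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink it one character at a time.
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in vocab:
                ids.append(vocab[piece])
                i += length
                break
        else:
            # No piece matched: emit the unknown id and skip one character.
            ids.append(vocab["<unk>"])
            i += 1
    return ids


print(tokenize("the tokenizers"))  # [7, 5, 1, 2, 3, 4]
```

Notice that "tokenizers" is not in the vocabulary, yet it still gets encoded as "token" + "ize" + "r" + "s". That is the point of subword tokens: a fixed vocabulary can cover words it has never seen whole.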

2. Embeddings

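Each token id indexes a row in a learned embedding table, which gives the model a vector for that token. Training nudges these vectors so that tokens used in similar contexts end up close together in the vector space.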
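
A rough sketch of the lookup, in plain Python. The table `EMBEDDINGS`, its three dimensions, and the words assigned to each id are all invented for this example. A real model learns its values during training and uses hundreds or thousands of dimensions.

```python
import math

# Hypothetical 3-dimensional embedding table, one row per token id.
# The numbers are hand-picked for illustration, not learned.
EMBEDDINGS = [
    [0.9, 0.1, 0.0],  # id 0: "cat"
    [0.8, 0.2, 0.1],  # id 1: "dog"
    [0.0, 0.1, 0.9],  # id 2: "car"
]


def embed(token_ids):
    """Look up one vector per token id."""
    return [EMBEDDINGS[i] for i in token_ids]


def cosine(a, b):
    """Cosine similarity: close to 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


cat, dog, car = embed([0, 1, 2])
print(cosine(cat, dog) > cosine(cat, car))  # True for this hand-picked table
```

Cosine similarity is one common way to measure "close together". In this made-up table, "cat" and "dog" point in nearly the same direction while "car" points somewhere else.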

3. Self-attention

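Each token scores how relevant the other tokens in the context are to it, then uses those scores as weights to blend their information into its own representation. In GPT-style models a causal mask restricts each token to itself and the tokens before it. This is how the model builds context.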
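
Here is a minimal single-head sketch of that idea, assuming a GPT-style decoder where each position may only look at itself and earlier positions. One big simplification: the input vectors are used directly as queries, keys, and values, whereas a real model first passes them through three learned projection matrices. The function names are my own.

```python
import math


def softmax(scores):
    """Turn a list of scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_self_attention(x):
    """Single-head scaled dot-product attention over a list of vectors.

    Simplification: x serves as queries, keys, and values at once.
    """
    dim = len(x[0])
    outputs, all_weights = [], []
    for t, query in enumerate(x):
        # Causal mask: position t only sees positions 0..t.
        visible = x[: t + 1]
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in visible
        ]
        weights = softmax(scores)
        # The output is a weighted average of the visible vectors.
        mixed = [
            sum(w * v[d] for w, v in zip(weights, visible))
            for d in range(dim)
        ]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights


x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = causal_self_attention(x)
print(weights[0])  # [1.0], the first token can only attend to itself
```

Each row of `weights` is a probability distribution over the positions that token is allowed to see, so the rows get longer as you move through the sequence.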

4. The transformer stack

A transformer block pairs self-attention with a feed-forward network. Models stack dozens of these blocks (the largest GPT-3 model used 96), and each one refines the representation produced by the one before.
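
The stacking can be sketched roughly like this. Treat it as an outline under assumptions, not a faithful model: the weights are tiny hand-written numbers, every block reuses the same weights for brevity, attention skips the learned projections, and layer normalization (which real transformers include) is left out. What it does keep is the residual pattern, where each sublayer's output is added back onto its input.

```python
import math


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(x):
    """Causal self-attention without learned projections (simplified)."""
    dim = len(x[0])
    out = []
    for t, q in enumerate(x):
        seen = x[: t + 1]
        w = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in seen])
        out.append([sum(wi * v[d] for wi, v in zip(w, seen)) for d in range(dim)])
    return out


def feed_forward(vec, w1, w2):
    """Two-layer network with a ReLU in between, applied to one position."""
    hidden = [max(0.0, sum(v * w for v, w in zip(vec, row))) for row in w1]
    return [sum(h * w for h, w in zip(hidden, row)) for row in w2]


def block(x, w1, w2):
    """One transformer block: attention, then feed-forward, each with a residual add."""
    attended = attention(x)
    x = [[a + b for a, b in zip(xi, ai)] for xi, ai in zip(x, attended)]
    x = [[a + b for a, b in zip(xi, feed_forward(xi, w1, w2))] for xi in x]
    return x


def transformer_stack(x, layers):
    """Run the input through every block in order."""
    for w1, w2 in layers:
        x = block(x, w1, w2)
    return x


# Hypothetical weights: w1 maps 2 dims to 3 hidden units, w2 maps back to 2.
w1 = [[0.1, -0.2], [0.3, 0.1], [-0.1, 0.2]]
w2 = [[0.2, 0.1, -0.1], [-0.3, 0.2, 0.1]]
layers = [(w1, w2)] * 4  # same weights reused; a real model learns separate ones per block

x = [[1.0, 0.0], [0.0, 1.0]]
out = transformer_stack(x, layers)
print(len(out), len(out[0]))  # 2 2
```

The shape going in matches the shape coming out, which is what makes it possible to stack as many blocks as you like.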

5. Predict and sample

The final layer scores every token in the vocabulary, softmax turns those scores into probabilities, and the model samples the next token from that distribution. The sampled token is appended to the input and the whole process runs again. That loop is how text is generated.
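
Here is a small sketch of that loop. The `toy_model` function is a made-up stand-in for a real network: it takes the token ids so far and returns one score per vocabulary entry, strongly favoring "last id plus one". Everything else (softmax, sample, append, repeat) follows the steps described above.

```python
import math
import random


def softmax(scores):
    """Turn raw scores (logits) into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next(logits, rng):
    """Draw one token id at random, weighted by its softmax probability."""
    probs = softmax(logits)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate(model, prompt_ids, max_new_tokens, rng):
    """Autoregressive loop: score, sample, append, repeat."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        logits = model(ids)
        ids.append(sample_next(logits, rng))
    return ids


def toy_model(ids):
    """Hypothetical stand-in for a trained network with a 5-token vocabulary."""
    vocab_size = 5
    favourite = (ids[-1] + 1) % vocab_size
    return [5.0 if i == favourite else 0.0 for i in range(vocab_size)]


rng = random.Random(0)  # seeded so repeated runs give the same sequence
result = generate(toy_model, [0], 8, rng)
print(result)
```

Because the next token is sampled and not simply picked as the top score, the output mostly follows the favored pattern but can occasionally wander off it. That randomness is why the same prompt can produce different completions.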