What actually happens when we chat with AI (GPT/Claude)

Hey,

We use ChatGPT / Claude every day.

If someone asked us to explain how it actually works, could we?

A few weeks ago, I couldn't. And honestly, that bothered me. I'm a developer. I don't like using tools I can't explain.

So I dug into it properly. Then I put together a breakdown in 3 layers. Simple, nothing heavy!

Here's the short version.

Layer 1: Deep Learning

For decades, computers followed rules.

A programmer wrote: if user types X, return Y. Clean, predictable — but completely rigid.

Language doesn't work that way, though. "I'm fine" can mean happy. Or the complete opposite. Context changes everything. And you can't write rules for every possible sentence.
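To make the rigidity concrete, here is a tiny sketch of the "if user types X, return Y" approach. The rule table and the function name are made up for illustration; no real system is quite this small.

```python
# Hypothetical rule table: exact input -> canned reply.
RULES = {
    "hello": "Hi there!",
    "i'm fine": "Glad to hear it!",
}


def rule_based_reply(message: str) -> str:
    """Return the canned reply for an exact match, else a fallback."""
    key = message.strip().lower()
    return RULES.get(key, "Sorry, I don't understand.")


print(rule_based_reply("I'm fine"))              # Glad to hear it!
print(rule_based_reply("I'm fine, I guess..."))  # Sorry, I don't understand.
```

The second message is almost the same sentence, but the rule misses it completely. And even when the first one matches, the rule has no idea whether "I'm fine" was meant sincerely.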

So AI takes a different approach entirely. Instead of following rules, it learns from examples. Millions of them. Show it enough sentences, and patterns emerge on their own, the same way a child learns to speak. Nobody teaches a child grammar rules first. They just hear thousands of sentences, and understanding develops naturally.

That idea came from studying the human brain. The brain processes information through layers of neurons, each layer picking up something slightly deeper than the last. Deep learning loosely borrows that structure: layers of artificial neurons, each one building on what the layer before it picked up.

That's deep learning. Not rules. Not programming. Just learning from data.
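If "layers of artificial neurons" sounds abstract, this is roughly what one looks like in code. It is a minimal sketch in plain Python: the weights are numbers I made up, whereas in a real network they are learned from data and there are billions of them.

```python
import math


def relu(x: float) -> float:
    return max(0.0, x)


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs, weights, biases, activation):
    """One layer: each neuron takes a weighted sum of the inputs, adds a bias,
    and passes the result through an activation function."""
    return [
        activation(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]


# Made-up weights for illustration only; training would set these.
W1 = [[0.5, -0.2], [0.3, 0.8]]  # hidden layer: 2 neurons, 2 inputs each
B1 = [0.0, -0.1]
W2 = [[1.0, -1.0]]              # output layer: 1 neuron, 2 inputs
B2 = [0.0]


def forward(x):
    """Run the input through the hidden layer, then the output layer."""
    hidden = layer(x, W1, B1, relu)
    return layer(hidden, W2, B2, sigmoid)[0]


print(forward([1.0, 2.0]))  # a single number between 0 and 1
```

"Learning" means nudging those weights, example after example, until the outputs stop being wrong. Nobody writes the rules by hand.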

Layer 2: The Transformer

Even with deep learning, early models still had a problem.

They read sentences word by word, left to right. Like reading a book one letter at a time. By the end of a long sentence, they had almost forgotten how it started.

But that's not how humans read. When you see the sentence, "The trophy didn't fit in the suitcase because it was too big", you instantly know "it" means the trophy, not the suitcase. Your brain reads the whole thing at once and connects the dots.

Early AI couldn't do that.

So in 2017, a team of researchers at Google asked a simple question: what if the model could read everything at once? What if every word could look at every other word and ask — how much do you matter to me right now?

That question became a paper called Attention Is All You Need. It has been cited over 170,000 times. One of the ten most cited academic papers of the 21st century. And most people using ChatGPT today have never heard of it.

That paper gave birth to the Transformer.
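Here is a stripped-down sketch of that "every word looks at every other word" idea, using the scaled dot-product scoring from the paper. Big assumptions: the word vectors are invented by me, and I skip the learned query/key/value projections that a real Transformer applies first.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Every vector scores every vector (itself included), then becomes a
    weighted mix of all of them. Simplified: the vectors serve directly as
    queries, keys, and values, with no learned projections."""
    dim = len(vectors[0])
    outputs, all_weights = [], []
    for query in vectors:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)  # "how much do you matter to me right now?"
        mixed = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights


# Invented 2-number vectors; real models learn vectors with thousands of numbers.
words = ["trophy", "suitcase", "it"]
vectors = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]

_, weights = self_attention(vectors)
for word, weight in zip(words, weights[2]):
    print(f"'it' -> {word}: {weight:.2f}")
```

Because I placed the vector for "it" close to "trophy", "it" ends up paying more attention to "trophy" than to "suitcase". In a real model nobody places the vectors by hand; training does.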

GPT. Generative. Pre-trained. Transformer. The name was there the whole time — nobody pointed it out to me for years.

Layer 3: ChatGPT

Understanding language is one thing. Being helpful is another.

After training GPT on massive amounts of text, OpenAI brought in human trainers to rate and rank its responses. Helpful. Harmful. Makes no sense. Those ratings were used to train a reward model, and GPT was then fine-tuned to produce the kind of responses that reward model scored highly.

That process, RLHF (Reinforcement Learning from Human Feedback), is what turned a language model into a useful assistant.

One more thing worth knowing: on its own, the model doesn't look up facts. It predicts what token (roughly, a word or a piece of a word) comes next, based on everything it learned. When it's right, it seems intelligent. When it's wrong, it sounds completely confident anyway. That's why it hallucinates.

Not because it's broken. Because it's always predicting, never looking up.
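A toy version makes the "always predicting" point easy to see. This sketch just counts which word follows which in a tiny corpus I made up. Real models are neural networks working on tokens, not count tables, so treat it as an analogy only.

```python
from collections import Counter, defaultdict

# Made-up training text for illustration.
corpus = "the cat sat on the mat . the cat ate the fish .".split()

# Count which word follows which.
follower_counts = defaultdict(Counter)
for prev_word, next_word in zip(corpus, corpus[1:]):
    follower_counts[prev_word][next_word] += 1


def predict_next(word: str) -> str:
    """Return the most frequent follower of `word`. For a word it has never
    seen, it does not say "I don't know"; it falls back to the most common
    word overall and answers anyway."""
    followers = follower_counts.get(word)
    if not followers:
        return Counter(corpus).most_common(1)[0][0]
    return followers.most_common(1)[0][0]


print(predict_next("the"))  # cat
print(predict_next("dog"))  # the
```

It never saw "dog", but it answers with the same confidence as before. That is the flavor of a hallucination: a prediction with nothing behind it.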

So, the moment you hit send

Your message → broken into tokens → each token becomes a number → transformer runs attention across all tokens → model predicts the next token → then the next → until the response is complete.

No magic. No consciousness. No internet lookup.

Just patterns. Attention. Prediction.
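The whole pipeline fits in a few lines if you swap the Transformer for a stand-in. Everything here is hypothetical: the six-word vocabulary, the whitespace tokenizer, and especially toy_model, which only looks at the last token, where a real model attends over all of them.

```python
# Made-up vocabulary; real ones hold tens of thousands of subword tokens.
VOCAB = ["<end>", "hello", "how", "are", "you", "today"]
TOKEN_TO_ID = {token: i for i, token in enumerate(VOCAB)}
END_ID = TOKEN_TO_ID["<end>"]

# Stand-in for the trained model: last token id -> next token id.
NEXT_ID = {1: 2, 2: 3, 3: 4, 4: 5, 5: END_ID}


def tokenize(text: str) -> list[str]:
    """Split on whitespace. Real tokenizers split into subword pieces."""
    return text.lower().split()


def encode(tokens: list[str]) -> list[int]:
    """Each token becomes a number. Unknown words raise KeyError in this toy."""
    return [TOKEN_TO_ID[token] for token in tokens]


def decode(ids: list[int]) -> str:
    return " ".join(VOCAB[i] for i in ids)


def toy_model(ids: list[int]) -> int:
    """Predict the next token id from the sequence so far."""
    return NEXT_ID.get(ids[-1], END_ID)


def generate(prompt: str, max_new_tokens: int = 10) -> str:
    """Predict one token, append it, repeat until <end> or the limit."""
    ids = encode(tokenize(prompt))
    new_ids: list[int] = []
    for _ in range(max_new_tokens):
        next_id = toy_model(ids + new_ids)
        if next_id == END_ID:
            break
        new_ids.append(next_id)
    return decode(new_ids)


print(generate("hello how"))  # are you today
```

The loop is the part worth remembering: the response is not written in one go. Each new token is fed back in as input for predicting the one after it.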

I made a full video walking through all of this with visuals. If you want to actually see the pipeline, and be able to explain it to someone else after, it's worth 9 minutes.

👉 https://youtu.be/aLVUzbF1sws?si=T2cbN5SLWotQPE42

And if you're not subscribed to this newsletter yet — this is what every issue looks like. One concept, simply explained, from a developer actually learning it.

👉 Subscribe at: developer-data.beehiiv.com

See you in the next one.

— Rajon