After using ChatGPT for months, I've realised what it actually does to my words and sentences (the input, or prompt) before it even starts thinking...
It comes down to four steps:
Step 1: Tokenization (Our text gets chopped into tokens)
ChatGPT, like any LLM, doesn't understand sentences or words directly. It only works with tokens (which later get converted into numbers, as we'll see in the next steps).
So every sentence we type as a prompt gets split into tokens first. A token is usually a short word or a piece of a longer word, often about the size of a syllable. (This splitting is sometimes loosely called chunking.)
Let's have a look at an example:
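Here is a minimal sketch of byte-pair encoding (BPE), the family of algorithm GPT-style tokenizers use, simplified to work on characters instead of bytes. The merge list and the `bpe_word`/`tokenize` helpers are hypothetical: a real tokenizer learns tens of thousands of merges from training data. The idea carries over, though: start from single characters, keep gluing together the highest-priority adjacent pair, and stop when no rule applies.

```python
import re

# Hypothetical merge rules, highest priority first. A real BPE tokenizer learns
# tens of thousands of these from training data; these ten are invented.
MERGES = [
    ("i", "n"), ("in", "g"), ("o", "k"), ("e", "n"), ("ok", "en"),
    ("T", "oken"), (" ", "t"), (" t", "h"), (" th", "e"), ("i", "z"),
]
RANKS = {pair: rank for rank, pair in enumerate(MERGES)}


def bpe_word(word: str) -> list[str]:
    """Split one word into subword tokens by repeatedly applying the best-ranked merge."""
    symbols = list(word)  # start from single characters
    while len(symbols) > 1:
        # Rank every adjacent pair that has a merge rule.
        candidates = [(RANKS[p], p) for p in zip(symbols, symbols[1:]) if p in RANKS]
        if not candidates:
            break  # no rule applies any more
        _, (left, right) = min(candidates)
        merged, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and symbols[i] == left and symbols[i + 1] == right:
                merged.append(left + right)
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols


def tokenize(text: str) -> list[str]:
    # Pre-split into words, keeping each word's leading space attached to it.
    words = re.findall(r" ?\S+", text)
    return [token for word in words for token in bpe_word(word)]


print(tokenize("Tokenizing the text"))
# ['Token', 'iz', 'ing', ' the', ' t', 'e', 'x', 't']
print(tokenize("the"))  # ['t', 'h', 'e']: without the space, the ' the' merges never fire
```

Notice that " text" falls apart into small pieces because no rule covers it, and that "the" with and without a leading space comes out differently.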
Weird fact: " ChatGPT" (with a leading space) is not tokenized the same way as "ChatGPT", so the model treats them differently. Whitespace is part of the vocabulary: many tokens carry their own leading space.
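To see why that matters, here is a minimal sketch of the lookup that turns tokens into integer IDs, assuming a tiny hypothetical vocabulary in which "ChatGPT" and " ChatGPT" each happen to be a single entry. The IDs and the `encode`/`decode` helpers are invented for illustration. Because the space is part of the key, the two strings get different IDs.

```python
# Hypothetical vocabulary mapping token strings to integer IDs. Real
# vocabularies hold tens of thousands of entries or more; these IDs are invented.
VOCAB = {"Hello": 0, ",": 1, "ChatGPT": 2, " ChatGPT": 3, "!": 4}
ID_TO_TOKEN = {token_id: token for token, token_id in VOCAB.items()}


def encode(tokens: list[str]) -> list[int]:
    """Look up each token's ID (raises KeyError for a token not in VOCAB)."""
    return [VOCAB[token] for token in tokens]


def decode(ids: list[int]) -> str:
    """Turn IDs back into text; the spaces come back because they live inside tokens."""
    return "".join(ID_TO_TOKEN[token_id] for token_id in ids)


ids = encode(["Hello", ",", " ChatGPT", "!"])
print(ids)          # [0, 1, 3, 4]
print(decode(ids))  # Hello, ChatGPT!
print(encode(["ChatGPT"]), encode([" ChatGPT"]))  # [2] [3]
```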
Step 2: Vectorization (Each token becomes a vector)
Since LLMs can only work with numbers, each token is turned into a vector: a list of numbers that encodes its meaning. In practice each token is first mapped to an integer ID, and that ID selects a learned row of an embedding table. The length of the list depends on the model: in the largest GPT-3 model it is 12,288 numbers, so roughly 12,000. Similar words land near each other in this space. "king" and "queen" are close. "king" and "Apple" are far apart.
This is the moment where language becomes math!!
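Here is a minimal sketch of that step, assuming a made-up three-word vocabulary with 4-dimensional vectors chosen by hand (the numbers are not from any real model). It shows that the lookup itself is just indexing into a table, and it uses cosine similarity, one common way to measure "nearby", to check that "king" sits closer to "queen" than to "Apple".

```python
import math

# Hypothetical token IDs and a tiny embedding table: row i is the vector for
# token ID i. A real model learns these numbers during training and uses
# thousands of dimensions (12,288 in the largest GPT-3 model), not 4.
VOCAB = {"king": 0, "queen": 1, "Apple": 2}
EMBEDDINGS = [
    [0.8, 0.6, 0.1, 0.0],  # king
    [0.7, 0.7, 0.1, 0.1],  # queen
    [0.1, 0.0, 0.9, 0.4],  # Apple
]


def embed(token: str) -> list[float]:
    """Vectorization is a table lookup: token -> ID -> row of the embedding table."""
    return EMBEDDINGS[VOCAB[token]]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """1.0 means the vectors point the same way; near 0 means roughly perpendicular."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


print(round(cosine_similarity(embed("king"), embed("queen")), 3))  # 0.985
print(round(cosine_similarity(embed("king"), embed("Apple")), 3))  # 0.171
```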
Step 3: The Transformer layers
In a model the size of GPT-3, our token vectors flow through 96 stacked transformer layers, each built around attention. At each layer, every token "looks at" every token before it (and itself) and decides how much weight to give it. This is how the model figures out that "it" in "the animal didn't cross the street because it was tired" refers to the animal, not the street.
No rules. No grammar book. Just learnt weights doing matrix multiplication billions of times.
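Here is a minimal sketch of one attention head, assuming GPT-style causal masking (each token sees only itself and earlier tokens). It uses a three-token stand-in for the sentence, and the query/key/value vectors are hypothetical and hand-picked; a real model has many heads per layer and learns the matrices that produce these vectors. With these numbers, the "it" token ends up putting most of its weight on "animal".

```python
import math


def softmax(scores: list[float]) -> list[float]:
    m = max(scores)  # subtract the max so exp() cannot overflow
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention(queries, keys, values):
    """Single-head scaled dot-product attention with a causal mask.

    Token i may only look at tokens 0..i. Returns the new vectors and,
    for inspection, the attention weights each token used.
    """
    d = len(keys[0])
    outputs, weights_per_token = [], []
    for i, q in enumerate(queries):
        # How well does token i's query match each visible key?
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys[: i + 1]]
        weights = softmax(scores)  # non-negative, sums to 1
        # New vector = weighted average of the visible value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values[: i + 1])) for j in range(len(values[0]))]
        outputs.append(out)
        weights_per_token.append(weights)
    return outputs, weights_per_token


tokens = ["animal", "street", "it"]
# Hypothetical vectors, hand-picked so that the query for "it" matches the key
# for "animal". In a real model they come from multiplying each token vector
# by learned matrices (W_Q, W_K, W_V) inside every layer.
queries = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]]
keys = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

outputs, weights = causal_attention(queries, keys, values)
for token, w in zip(tokens, weights):
    print(token, [round(x, 2) for x in w])
# animal [1.0]
# street [0.33, 0.67]
# it [0.67, 0.16, 0.16]
```

The whole thing is dot products, a softmax and a weighted sum, which is why it reduces to the matrix multiplications mentioned above.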
Step 4: Sampling (One token gets sampled)
The model does not output a whole sentence at once. At each step it computes a probability distribution over its entire vocabulary, samples one token from that distribution, appends it to the text so far, and repeats until it reaches a stop condition (such as a special end-of-text token or a length limit).
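The loop below is a minimal sketch of that process. The "model" is a hypothetical lookup table (a real LLM is a neural network scoring its whole vocabulary given the whole context), and `END`, `TOY_MODEL` and `generate` are invented for illustration. The shape of the loop is the part that carries over: predict a distribution, sample one token, append it, repeat until a stop condition.

```python
import random

END = "<end>"  # hypothetical stop token

# Hypothetical stand-in for the model: maps the last token to a probability
# for each candidate next token. A real LLM conditions on the whole context
# and scores every token in its vocabulary at every step.
TOY_MODEL = {
    " is": {" Paris": 0.95, " Berlin": 0.02, " Italy": 0.02, " Texas": 0.01},
    " Paris": {".": 0.9, "!": 0.1},
    " Berlin": {".": 1.0},
    " Italy": {".": 1.0},
    " Texas": {".": 1.0},
    ".": {END: 1.0},
    "!": {END: 1.0},
}


def next_token_distribution(context: list[str]) -> dict[str, float]:
    return TOY_MODEL[context[-1]]


def generate(prompt: list[str], max_new_tokens: int = 20, seed: int = 0) -> list[str]:
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    tokens = list(prompt)
    for _ in range(max_new_tokens):  # stop condition 1: length limit
        dist = next_token_distribution(tokens)
        # Sample one token in proportion to its probability.
        token = rng.choices(list(dist), weights=list(dist.values()), k=1)[0]
        if token == END:  # stop condition 2: the model emits the stop token
            break
        tokens.append(token)  # feed the choice back in and go again
    return tokens


prompt = ["The", " capital", " of", " France", " is"]
print("".join(generate(prompt)))
```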
For example, take the prompt:
The capital of France is ___
The model might assign probabilities like these (illustrative numbers; for simplicity, pretend these are the only candidates, so they add up to 100%):
Paris: 95%
Berlin: 2%
Italy: 2%
Texas: 1%
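So "Paris" is by far the most likely token to complete the sentence, and the model will pick it most of the time. Because it samples rather than always taking the top choice, a less likely token can occasionally win.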
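A minimal sketch of the difference between always taking the top token (greedy decoding) and sampling, assuming Paris at 95% and, for simplicity, the remaining 5% split across Berlin, Italy and Texas. The helper names are hypothetical. Greedy picks Paris every time; sampling picks it about 95% of the time.

```python
import random
from collections import Counter

# Illustrative next-token probabilities for "The capital of France is ___",
# pretending these four are the only candidates.
probs = {"Paris": 0.95, "Berlin": 0.02, "Italy": 0.02, "Texas": 0.01}


def greedy_pick(probs: dict[str, float]) -> str:
    """Always take the single most likely token."""
    return max(probs, key=probs.get)


def sample_pick(probs: dict[str, float], rng: random.Random) -> str:
    """Pick a token at random, weighted by its probability."""
    return rng.choices(list(probs), weights=list(probs.values()), k=1)[0]


rng = random.Random(42)  # fixed seed so the run is repeatable
draws = Counter(sample_pick(probs, rng) for _ in range(10_000))
print(greedy_pick(probs))       # Paris
print(draws["Paris"] / 10_000)  # close to 0.95
print(draws.most_common())
```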
That's it. No reasoning. No understanding. Just: What token is most likely to come next, given everything before it?
The unsettling part: that simple process, scaled up, produces responses that feel like thinking.
