Many of us use AI tools like ChatGPT or GitHub Copilot in our daily lives, but what actually powers them? If you've ever tried to read up on AI, you've probably run into terms like tokenization, embeddings, and transformers.
Sounds complicated? It doesn’t have to be.
Welcome to the world of AI, where tech buzzwords pop up faster than autocomplete suggestions. In this article, I'll break down common pieces of AI jargon into simple concepts.
LLMs
LLM stands for Large Language Model, a type of AI trained to understand and generate human language. It takes in human text as input, processes it, and then produces a response that makes sense to us.
ChatGPT is one of the most popular LLMs, created by OpenAI and made accessible to the public.
GPT
GPT stands for Generative Pre-trained Transformer. Let's break that down:
- Generative: It generates content like text, code, or even images.
- Pre-trained: It has already learned from huge amounts of data before being used.
- Transformer: A model architecture that understands the relationships between words.
GPT predicts what comes next in a sentence by analyzing patterns in the data it was trained on.
How Does an LLM Work?
Let's walk through the three main phases of how a language model processes and generates language.
Phase 1: Understanding the Input
This is where the model begins to process your prompt and understand what you're saying.
Tokenization
The first step is breaking down your sentence into smaller units called tokens, which may be words or sub-words.
"Today is a sunny day"
→ [Today, is, a, sunny, day]
These tokens are then converted into numbers so the model can work with them mathematically. Each model has its own tokenization method and a fixed vocabulary size (the total number of unique tokens it understands).
Try out this code to see the tokens different models produce.
```python
import tiktoken

enc = tiktoken.encoding_for_model("gpt-4o")

text = "Hello, I am Shane"  # input sequence
tokens = enc.encode(text)
print("Tokens:", tokens)  # [13225, 11, 357, 939, 99388]

# decoding turns token IDs back into text
encoded_tokens = [13225, 11, 357, 939, 99388]
decoded_text = enc.decode(encoded_tokens)
print("Decoded Text:", decoded_text)  # Hello, I am Shane
```
Vector Embeddings
After tokenization, each token is mapped to a vector, a long list of numbers that represents its meaning.
Suppose I mention the words "Cat", "Dog", "Pedigree", "Dog Food", "Milk", and "Cat Food". Notice how you pictured these words in your mind while reading them.
Embeddings give a word more context, capturing its meaning rather than just its spelling.
Try this code out to see what vector embeddings look like:
```python
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()  # loads OPENAI_API_KEY from a .env file
client = OpenAI()

text = "Today is a sunny day."
response = client.embeddings.create(
    model="text-embedding-3-small",
    input=text,
)

embedding = response.data[0].embedding
print("Vector Embedding:", embedding[:5], "...")  # first few numbers of the vector
print("Dimensions:", len(embedding))  # 1536 dimensions for this model
```
Positional Encoding
Since transformers look at the whole sentence at once (instead of word by word), they need a way to know the position of each word. Positional encodings give this structure: they tell the model the order in which words appear.
"He went to the bank to get some money."
"They had a picnic by the river bank."
Without positional encoding, the model might confuse the two meanings of the word "bank."
But with it, the model understands that "bank" means a financial institution in one case and a river's edge in the other, based on surrounding words and their positions.
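To make this concrete, here's a minimal sketch of the sinusoidal positional encoding from the original transformer paper (just one common approach; many modern models learn positions instead):

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encoding (from 'Attention Is All You Need')."""
    positions = np.arange(seq_len)[:, np.newaxis]  # shape: (seq_len, 1)
    dims = np.arange(d_model)[np.newaxis, :]       # shape: (1, d_model)
    angle_rates = 1 / np.power(10000, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates
    encoding = np.zeros((seq_len, d_model))
    encoding[:, 0::2] = np.sin(angles[:, 0::2])  # even dimensions use sine
    encoding[:, 1::2] = np.cos(angles[:, 1::2])  # odd dimensions use cosine
    return encoding

# each row is a unique "position fingerprint" for one token
print(positional_encoding(seq_len=5, d_model=8).round(2))
```

Each row gets added to the corresponding token's embedding, so the same word at different positions looks slightly different to the model.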
Semantic Meaning
Embeddings don't just turn words into numbers, they also capture the relationships and meanings behind those words.
This is what allows the model to understand that some words are closely related, while others are not.
Examples:
- The vectors for "king" and "queen" are close to each other, just like "man" and "woman", because they share similar meanings.
- But the vectors for "pizza" and "shoes" are far apart; those words aren't related in meaning, so their embeddings reflect that distance.
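You can actually measure this closeness with cosine similarity. Here's a rough sketch reusing the embeddings API from earlier; the exact numbers will vary by model, but related words should consistently score higher:

```python
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(word):
    """Fetch the embedding vector for a single word."""
    response = client.embeddings.create(model="text-embedding-3-small", input=word)
    return np.array(response.data[0].embedding)

def cosine_similarity(a, b):
    """Close to 1.0 = very related, close to 0 = unrelated."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print("king vs queen:", cosine_similarity(embed("king"), embed("queen")))
print("pizza vs shoes:", cosine_similarity(embed("pizza"), embed("shoes")))
```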
Transformers: The Core of Modern AI
Transformers are at the heart of all modern LLMs.
Originally introduced by Google researchers in 2017 to improve machine translation, transformers have since become the foundation of nearly every major AI model. Their research was published in the paper “Attention Is All You Need”.
Here's why they matter:
"He swung the bat at the ball.
"A bat flew out of the cave."
Even though both sentences contain the word “bat,” you understood its meaning differently in each. That's context, and transformers are built to capture it.
Unlike older models that processed words one at a time, transformers look at the entire sentence all at once. This allows the model to understand how words relate to each other, regardless of their position.
That's how transformers help LLMs understand not just the words you use, but what you actually mean.
Phase 2: Processing the Input
This is where the model makes sense of the input and begins reasoning about it.
Encoder
The encoder helps the model understand what you've written.
It takes the input tokens and their embeddings (which include meaning and position) and transforms them into deeper representations that capture the sentence's full meaning and context. These internal representations are used by the model to figure out what you're trying to say.
"She unlocked the door with a key."
The encoder understands how each word is connected, that "she" did something, "unlocked" is the action, and "key" was the tool. It doesn't just read words, it also understands the structure and relationships behind them.
Self-Attention
Self-attention is a key feature of transformers. It allows the model to look at all the words in a sentence at once and figure out how much attention each word should pay to the others.
"She poured coffee into the cup."
The model learns that the word "poured" is closely related to both "coffee" and "cup", because they all contribute to the action being described.
This helps the model understand which words matter most in context, rather than just reading one word at a time.
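Here's a toy numpy sketch of the underlying mechanism, scaled dot-product attention, with random weights just to show the moving parts:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence of embeddings X."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # how strongly each token attends to the others
    weights = softmax(scores)                # each row sums to 1
    return weights @ V                       # context-aware token representations

# toy example: 4 tokens, 8-dimensional embeddings
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)  # (4, 8)
```

In a trained model, the weight matrices Wq, Wk, and Wv are learned, which is how "poured" genuinely ends up attending to "coffee" and "cup".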
Multi-Head Attention
Multi-head attention is like a team of specialists working in parallel. Each "head" looks at the sentence from a different perspective:
- One might focus on grammar
- Another might look at context
- Another might track relationships between specific words
Each head captures something unique, and when combined, they give the model a richer and more complete understanding of the input.
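Continuing the self_attention sketch from above, multi-head attention is just several of those heads running side by side, with their outputs concatenated (real implementations also apply a final learned projection, omitted here):

```python
def multi_head_attention(X, heads):
    """Run several attention heads in parallel and concatenate their outputs."""
    outputs = [self_attention(X, Wq, Wk, Wv) for (Wq, Wk, Wv) in heads]
    return np.concatenate(outputs, axis=-1)

# two heads, each projecting the 8-dim embeddings down to 4 dims
heads = [tuple(rng.normal(size=(8, 4)) for _ in range(3)) for _ in range(2)]
print(multi_head_attention(X, heads).shape)  # (4, 8): two 4-dim heads combined
```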
Softmax
Once the model processes the input, it generates a score for every possible next word.
Softmax turns these raw scores into probabilities, like a voting system. The higher the score, the more likely that word is to be chosen.
The word with the highest probability (most "votes") is selected as the next word in the response.
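Here's what that voting looks like with some made-up scores for candidate next words:

```python
import numpy as np

def softmax(scores):
    exps = np.exp(scores - np.max(scores))  # subtract the max for numerical stability
    return exps / exps.sum()

# hypothetical raw scores (logits) for completing "The sky is ..."
logits = {"blue": 5.1, "cloudy": 3.2, "falling": 1.0}
probs = softmax(np.array(list(logits.values())))
for word, p in zip(logits, probs):
    print(f"{word}: {p:.3f}")  # blue ≈ 0.857, cloudy ≈ 0.128, falling ≈ 0.014
```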
Temperature
This controls how creative or predictable the model's responses are:
- Low temperature → safe, reliable responses
- High temperature → more random, creative answers
It’s like adjusting a thermostat for imagination.
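In practice, temperature usually works by dividing the raw scores before softmax. Reusing the made-up scores from above:

```python
import numpy as np

def softmax_with_temperature(scores, temperature=1.0):
    scaled = np.array(scores) / temperature  # temperature rescales the scores
    exps = np.exp(scaled - np.max(scaled))
    return exps / exps.sum()

scores = [5.1, 3.2, 1.0]
print(softmax_with_temperature(scores, 0.5))  # sharper: the top word dominates
print(softmax_with_temperature(scores, 2.0))  # flatter: more randomness is possible
```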
Phase 3: Generating a Response
Now that the model understands your input, it's time to respond.
Decoder
The decoder takes what the model has understood and starts generating a response, one word at a time, like a builder laying bricks to form a sentence.
Input: "The sky is..."
Output: "blue."
Each word is chosen based on the context of the previous ones to keep the response natural and meaningful.
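In pseudocode, that loop looks something like this; note that next_token_probabilities is a made-up placeholder for whatever the real model exposes, not an actual API:

```python
def generate(model, prompt_tokens, max_new_tokens=10):
    """Hypothetical autoregressive loop: predict, pick, append, repeat."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        probs = model.next_token_probabilities(tokens)  # placeholder model call
        next_token = max(probs, key=probs.get)          # greedy: take the most likely token
        tokens.append(next_token)
    return tokens
```

Real systems usually sample from the probabilities (influenced by temperature) rather than always taking the single most likely token.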
Inference
Inference is the moment the model replies.
No new learning happens here, the model isn't updating or training anymore. It's simply using what it already knows to generate a response based on your input.
Think of it like a chef following a recipe they've already mastered, just applying what they've learned.
Knowledge Cutoff
AI models aren't connected to the internet in real time. They can't look up new facts or news. They only know what they were trained on, up to a specific point in time, called the knowledge cutoff.
It's like a photo album frozen in time. Anything that happened after that date is missing, unless the model is retrained or updated.
Final Thoughts
AI isn't magic. It’s just a very smart system trained on patterns from massive amounts of data.
Once you understand a few key ideas, like tokens, vectors, attention, and decoding, you'll start to see how LLMs like ChatGPT actually work.
Enjoyed this post?
Found it helpful? Feel free to leave a comment, share it with your team, or follow along for more.