Many of us use AI tools like ChatGPT or GitHub Copilot in our daily lives, but what actually powers them? If you've ever tried to read up on AI, you've probably run into terms like tokenization, embeddings, and transformers.
Sounds complicated? It doesn’t have to be.
Welcome to the world of AI, where tech buzzwords pop up faster than autocomplete suggestions. In this article, I'll break down common pieces of AI jargon into simple concepts.
LLMs
LLM stands for Large Language Model, a type of AI trained to understand and generate human language. It takes in human text as input, processes it, and then produces a response that makes sense to us.
ChatGPT is one of the most popular LLMs, created by OpenAI and made accessible to the public.
GPT
GPT stands for Generative Pre-trained Transformer. Let's break that down:
- Generative: It generates content like text, code, or even images.
- Pre-trained: It has already learned from huge amounts of data before being used.
- Transformer: A model architecture that understands the relationships between words.
GPT predicts what comes next in a sentence by analyzing patterns in the data it was trained on.
How Does an LLM Work?
Let's walk through the three main phases of how a language model processes and generates language.
Phase 1: Understanding the Input
This is where the model begins to process your prompt and understand what you're saying.
Tokenization
The first step is breaking down your sentence into smaller units called tokens, which may be words or sub-words.
"Today is a sunny day"
→ [Today, is, a, sunny, day]
These tokens are then converted into numbers so the model can work with them mathematically. Each model has its own tokenization method and a fixed vocabulary size (the total number of unique tokens it understands).
Try out this code to see the tokens different models produce.
```python
import tiktoken

enc = tiktoken.encoding_for_model("gpt-4o")

text = "Hello, I am Shane"  # input sequence
tokens = enc.encode(text)
print("Tokens:", tokens)  # [13225, 11, 357, 939, 99388]

# decoding turns token IDs back into text
encoded_tokens = [13225, 11, 357, 939, 99388]
decoded_text = enc.decode(encoded_tokens)
print("Decoded Text:", decoded_text)  # Hello, I am Shane
```
Vector Embeddings
After tokenization, each token is mapped to a vector, a long list of numbers that represents its meaning.
Suppose I mention the words "Cat", "Dog", "Pedigree", "Dog Food", "Milk", and "Cat Food". Notice how you pictured these words in your mind while reading them.
Embeddings give a word more context, capturing its meaning rather than just its spelling.
Try this code out to see what vector embeddings look like:
```python
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()  # loads OPENAI_API_KEY from a .env file
client = OpenAI()

text = "Today is a sunny day."
response = client.embeddings.create(
    model="text-embedding-3-small",
    input=text,
)

embedding = response.data[0].embedding
print("Vector Embedding:", embedding[:5], "...")  # first few numbers of the vector
print("Dimensions:", len(embedding))  # 1536 dimensions for this model
```
Positional Encoding
Since transformers look at the whole sentence at once (instead of word by word), they need a way to know the position of each word. Positional encodings give this structure: they tell the model the order in which words appear.
"He went to the bank to get some money."
"They had a picnic by the river bank."
Without positional encoding, the model might confuse the two meanings of the word "bank."
But with it, the model understands that "bank" means a financial institution in one case and a river's edge in the other, based on surrounding words and their positions.
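To make this concrete, here's a minimal sketch of the sinusoidal positional encoding from the original transformer paper (just one common approach; many modern models learn positions instead):

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encoding (from 'Attention Is All You Need')."""
    positions = np.arange(seq_len)[:, np.newaxis]  # shape: (seq_len, 1)
    dims = np.arange(d_model)[np.newaxis, :]       # shape: (1, d_model)
    angle_rates = 1 / np.power(10000, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates
    encoding = np.zeros((seq_len, d_model))
    encoding[:, 0::2] = np.sin(angles[:, 0::2])  # even dimensions use sine
    encoding[:, 1::2] = np.cos(angles[:, 1::2])  # odd dimensions use cosine
    return encoding

# each row is a unique "position fingerprint" for one token
print(positional_encoding(seq_len=5, d_model=8).round(2))
```

Each row gets added to the corresponding token's embedding, so the same word at different positions looks slightly different to the model.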
Semantic Meaning
Embeddings don't just turn words into numbers, they also capture the relationships and meanings behind those words.
This is what allows the model to understand that some words are closely related, while others are not.
Examples:
- The vectors for "king" and "queen" are close to each other, just like "man" and "woman", because they share similar meanings.
- But the vectors for "pizza" and "shoes" are far apart; those words aren't related in meaning, so their embeddings reflect that distance.
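You can actually measure this closeness with cosine similarity. Here's a rough sketch reusing the embeddings API from earlier; the exact numbers will vary by model, but related words should consistently score higher:

```python
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(word):
    """Fetch the embedding vector for a single word."""
    response = client.embeddings.create(model="text-embedding-3-small", input=word)
    return np.array(response.data[0].embedding)

def cosine_similarity(a, b):
    """Close to 1.0 = very related, close to 0 = unrelated."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print("king vs queen:", cosine_similarity(embed("king"), embed("queen")))
print("pizza vs shoes:", cosine_similarity(embed("pizza"), embed("shoes")))
```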
Transformers: The Core of Modern AI
Transformers are at the heart of all modern LLMs.
Originally introduced by Google researchers in 2017 to improve machine translation, transformers have since become the foundation of nearly every major AI model. Their research was published in the paper “Attention Is All You Need”.
Here's why they matter:
"He swung the bat at the ball.
"A bat flew out of the cave."
Even though both sentences contain the word “bat,” you understood its meaning differently in each. That's context, and transformers are built to capture it.
Unlike older models that processed words one at a time, transformers look at the entire sentence all at once. This allows the model to understand how words relate to each other, regardless of their position.
That's how transformers help LLMs understand not just the words you use, but what you actually mean.
Phase 2: Processing the Input
This is where the model makes sense of the input and begins reasoning about it.
Encoder
The encoder helps the model understand what you've written.
It takes the input tokens and their embeddings (which include meaning and position) and transforms them into deeper representations that capture the sentence's full meaning and context. These internal representations are used by the model to figure out what you're trying to say.
"She unlocked the door with a key."
The encoder understands how each word is connected, that "she" did something, "unlocked" is the action, and "key" was the tool. It doesn't just read words, it also understands the structure and relationships behind them.
Self-Attention
Self-attention is a key feature of transformers. It allows the model to look at all the words in a sentence at once and figure out how much attention each word should pay to the others.
"She poured coffee into the cup."
The model learns that the word "poured" is closely related to both "coffee" and "cup", because they all contribute to the action being described.
This helps the model understand which words matter most in context, rather than just reading one word at a time.
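Here's a toy numpy sketch of the underlying mechanism, scaled dot-product attention, with random weights just to show the moving parts:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence of embeddings X."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # how strongly each token attends to the others
    weights = softmax(scores)                # each row sums to 1
    return weights @ V                       # context-aware token representations

# toy example: 4 tokens, 8-dimensional embeddings
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)  # (4, 8)
```

In a trained model, the weight matrices Wq, Wk, and Wv are learned, which is how "poured" genuinely ends up attending to "coffee" and "cup".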
Multi-Head Attention
Multi-head attention is like a team of specialists working in parallel. Each "head" looks at the sentence from a different perspective:
- One might focus on grammar
- Another might look at context
- Another might track relationships between specific words
Each head captures something unique, and when combined, they give the model a richer and more complete understanding of the input.
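Continuing the self_attention sketch from above, multi-head attention is just several of those heads running side by side, with their outputs concatenated (real implementations also apply a final learned projection, omitted here):

```python
def multi_head_attention(X, heads):
    """Run several attention heads in parallel and concatenate their outputs."""
    outputs = [self_attention(X, Wq, Wk, Wv) for (Wq, Wk, Wv) in heads]
    return np.concatenate(outputs, axis=-1)

# two heads, each projecting the 8-dim embeddings down to 4 dims
heads = [tuple(rng.normal(size=(8, 4)) for _ in range(3)) for _ in range(2)]
print(multi_head_attention(X, heads).shape)  # (4, 8): two 4-dim heads combined
```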
Softmax
Once the model processes the input, it generates a score for every possible next word.
Softmax turns these raw scores into probabilities, like a voting system. The higher the score, the more likely that word is to be chosen.
The word with the highest probability (most "votes") is selected as the next word in the response.
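Here's what that voting looks like with some made-up scores for candidate next words:

```python
import numpy as np

def softmax(scores):
    exps = np.exp(scores - np.max(scores))  # subtract the max for numerical stability
    return exps / exps.sum()

# hypothetical raw scores (logits) for completing "The sky is ..."
logits = {"blue": 5.1, "cloudy": 3.2, "falling": 1.0}
probs = softmax(np.array(list(logits.values())))
for word, p in zip(logits, probs):
    print(f"{word}: {p:.3f}")  # blue ≈ 0.857, cloudy ≈ 0.128, falling ≈ 0.014
```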
Temperature
This controls how creative or predictable the model's responses are:
- Low temperature → safe, reliable responses
- High temperature → more random, creative answers
It’s like adjusting a thermostat for imagination.
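In practice, temperature usually works by dividing the raw scores before softmax. Reusing the made-up scores from above:

```python
import numpy as np

def softmax_with_temperature(scores, temperature=1.0):
    scaled = np.array(scores) / temperature  # temperature rescales the scores
    exps = np.exp(scaled - np.max(scaled))
    return exps / exps.sum()

scores = [5.1, 3.2, 1.0]
print(softmax_with_temperature(scores, 0.5))  # sharper: the top word dominates
print(softmax_with_temperature(scores, 2.0))  # flatter: more randomness is possible
```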
Phase 3: Generating a Response
Now that the model understands your input, it's time to respond.
Decoder
The decoder takes what the model has understood and starts generating a response, one word at a time, like a builder laying bricks to form a sentence.
Input: "The sky is..."
Output: "blue."
Each word is chosen based on the context of the previous ones to keep the response natural and meaningful.
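In pseudocode, that loop looks something like this; note that next_token_probabilities is a made-up placeholder for whatever the real model exposes, not an actual API:

```python
def generate(model, prompt_tokens, max_new_tokens=10):
    """Hypothetical autoregressive loop: predict, pick, append, repeat."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        probs = model.next_token_probabilities(tokens)  # placeholder model call
        next_token = max(probs, key=probs.get)          # greedy: take the most likely token
        tokens.append(next_token)
    return tokens
```

Real systems usually sample from the probabilities (influenced by temperature) rather than always taking the single most likely token.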
Inference
Inference is the moment the model replies.
No new learning happens here, the model isn't updating or training anymore. It's simply using what it already knows to generate a response based on your input.
Think of it like a chef following a recipe they've already mastered, just applying what they've learned.
Knowledge Cutoff
AI models aren't connected to the internet in real time. They can't look up new facts or news. They only know what they were trained on, up to a specific point in time, called the knowledge cutoff.
It's like a photo album frozen in time. Anything that happened after that date is missing, unless the model is retrained or updated.
Final Thoughts
AI isn't magic. It’s just a very smart system trained on patterns from massive amounts of data.
Once you understand a few key ideas, like tokens, vectors, attention, and decoding, you'll start to see how LLMs like ChatGPT actually work.
Enjoyed this post?
Found it helpful? Feel free to leave a comment, share it with your team, or follow along for more.