<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Yashwanth</title>
    <description>The latest articles on DEV Community by Yashwanth (@yash_softeng).</description>
    <link>https://dev.to/yash_softeng</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1329170%2Fd63e4c08-ce10-4632-b3c1-b29726c5f99f.png</url>
      <title>DEV Community: Yashwanth</title>
      <link>https://dev.to/yash_softeng</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/yash_softeng"/>
    <language>en</language>
    <item>
      <title>Generative AI: Under the hood</title>
      <dc:creator>Yashwanth</dc:creator>
      <pubDate>Sun, 17 Aug 2025 11:31:24 +0000</pubDate>
      <link>https://dev.to/yash_softeng/generative-ai-under-the-hood-208</link>
      <guid>https://dev.to/yash_softeng/generative-ai-under-the-hood-208</guid>
      <description>&lt;h2&gt;TL;DR&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Generative AI doesn’t just search—it creates (text, images, music, etc.).&lt;/li&gt;
&lt;li&gt;It’s powered by Transformers (introduced by Google in 2017).&lt;/li&gt;
&lt;li&gt;Language is broken into tokens (small pieces of text) that the model predicts step by step.&lt;/li&gt;
&lt;li&gt;Each model has its own vocabulary &amp;amp; token rules, so token splits can differ.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;👉 Think of Generative AI as a creative engine, tokens as its alphabet, and transformers as the brain that puts it all together.&lt;/p&gt;

&lt;h2&gt;Introduction&lt;/h2&gt;

&lt;p&gt;Imagine typing just a few words, and an AI writes an entire story for you, paints a picture, or even composes music. That’s the magic of Generative AI—it doesn’t just find information, it creates something new.&lt;/p&gt;

&lt;p&gt;Take Google Search as an example: when you enter a query, it’s like asking a librarian for a book. The librarian fetches the best book already on the shelf.&lt;/p&gt;

&lt;p&gt;Generative AI, on the other hand, is like an author. You give it an idea, and it writes you a brand-new book on the spot.&lt;/p&gt;

&lt;h3&gt;GPT (Generative Pre-Trained Transformer)&lt;/h3&gt;

&lt;p&gt;One of the most famous Generative AI models is GPT.&lt;/p&gt;

&lt;p&gt;In simple words, GPT is a Transformer that has been trained on a huge amount of data, and now it can generate new text based on that training.&lt;/p&gt;

&lt;h4&gt;What is a Transformer?&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg5asclr0qirs25g2puq0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg5asclr0qirs25g2puq0.png" alt="Simple Transformer Image" width="778" height="236"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Think of a Transformer as a very smart system that can look at words (or images, or sounds), understand how they relate to each other, and then predict what comes next.&lt;/p&gt;

&lt;p&gt;Originally introduced by Google in 2017 in the paper &lt;a href="https://research.google/pubs/attention-is-all-you-need/" rel="noopener noreferrer"&gt;“Attention is All You Need”&lt;/a&gt;, Transformers were first used in Google Translate to make translations smoother and more accurate.&lt;/p&gt;

&lt;p&gt;Today, GPT uses the same idea — but instead of just translating, it generates brand-new text.&lt;/p&gt;

&lt;p&gt;In practice, it works by predicting the next token. A token can be as small as a single character or as large as a whole word, depending on the model; most tokens are sub-word pieces. How text is split into tokens differs from LLM to LLM.&lt;/p&gt;

&lt;p&gt;A computer only understands numbers. When input text is provided, it is split into tokens and each token is converted into a number; the resulting collection of tokens is a sequence, and the process itself is called &lt;strong&gt;Tokenization&lt;/strong&gt;.&lt;/p&gt;
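
&lt;p&gt;As a minimal sketch, here is how you can watch tokenization happen with OpenAI’s open-source &lt;code&gt;tiktoken&lt;/code&gt; Python library (&lt;code&gt;pip install tiktoken&lt;/code&gt;). The exact token IDs depend on which model’s encoding you load:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import tiktoken

# Load the encoding tiktoken associates with gpt-4o.
enc = tiktoken.encoding_for_model("gpt-4o")

text = "Generative AI under the hood"
token_ids = enc.encode(text)  # text becomes a list of integer token IDs
pieces = [enc.decode([t]) for t in token_ids]  # map each ID back to its text piece

print(token_ids)              # the numeric sequence the model actually sees
print(pieces)                 # the sub-strings the text was split into
print(enc.decode(token_ids))  # decoding round-trips to the original text
&lt;/code&gt;&lt;/pre&gt;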

&lt;h4&gt;Vocabulary in LLMs&lt;/h4&gt;

&lt;p&gt;OpenAI’s models (and other LLMs) don’t read text the way humans do.&lt;br&gt;
Instead, they convert text into numbers, because computers understand numbers, not letters.&lt;/p&gt;

&lt;p&gt;To do this, the model uses a vocabulary (a special “dictionary”) where:&lt;/p&gt;

&lt;p&gt;A character, word, or even part of a word is assigned a unique number (called a token ID).&lt;br&gt;
For example (with illustrative IDs):&lt;br&gt;
"cat" → 1234&lt;br&gt;
"dog" → 5678&lt;br&gt;
"ing" → 91011&lt;/p&gt;

&lt;p&gt;When you type something like “The cat is running”, the model breaks it into tokens:&lt;/p&gt;

&lt;p&gt;"The" → 101&lt;br&gt;
" cat" → 1234&lt;br&gt;
" is" → 202&lt;br&gt;
" run" → 3300&lt;br&gt;
"ing" → 91011&lt;/p&gt;

&lt;p&gt;So your text becomes a sequence of numbers:&lt;br&gt;
[101, 1234, 202, 3300, 91011]&lt;/p&gt;
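
&lt;p&gt;To see how such a mapping works end to end, here is a toy encoder built around this exact example. The vocabulary and IDs are the made-up ones from above, and the greedy longest-match loop is a simplified stand-in for what real sub-word tokenizers do:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# A toy vocabulary: text pieces mapped to (made-up) token IDs.
vocab = {"The": 101, " cat": 1234, " is": 202, " run": 3300, "ing": 91011}

def encode(text):
    # Repeatedly match the longest vocabulary piece at the front of
    # the text, a simplified version of real sub-word tokenization.
    ids = []
    while text:
        for piece in sorted(vocab, key=len, reverse=True):
            if text.startswith(piece):
                ids.append(vocab[piece])
                text = text[len(piece):]
                break
        else:
            raise ValueError("no vocabulary piece matches: " + text)
    return ids

print(encode("The cat is running"))  # [101, 1234, 202, 3300, 91011]
&lt;/code&gt;&lt;/pre&gt;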

&lt;p&gt;This numeric form is what the Transformer processes to predict the next token.&lt;/p&gt;
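
&lt;p&gt;To make “predict the next token” concrete, here is an illustrative generation loop. The &lt;code&gt;predict_next_token&lt;/code&gt; function is a hypothetical stand-in for a real Transformer, which would score every token in its vocabulary and pick a likely one; the loop around it mirrors how autoregressive generation actually works:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import random

VOCAB_SIZE = 50000  # assumed size; every model defines its own

def predict_next_token(token_ids):
    # Hypothetical stand-in: a real Transformer would look at the whole
    # sequence so far and return the most likely next token ID.
    return random.randrange(VOCAB_SIZE)

def generate(prompt_ids, max_new_tokens=5):
    token_ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        next_id = predict_next_token(token_ids)  # predict one token...
        token_ids.append(next_id)                # ...append it, and repeat
    return token_ids

# The numeric sequence from above, extended token by token.
print(generate([101, 1234, 202, 3300, 91011]))
&lt;/code&gt;&lt;/pre&gt;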

&lt;p&gt;👉 Key Point for Beginners:&lt;br&gt;
You can say, “Vocabulary = the mapping of text to numbers that the AI understands.”&lt;/p&gt;

&lt;p&gt;Try generating the tokens for your message here: &lt;a href="https://tiktokenizer.vercel.app/" rel="noopener noreferrer"&gt;TikTokenizer&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: Each LLM has its own vocabulary and vocabulary size, so the way a sentence is split into tokens and assigned numbers may differ depending on the model.&lt;/p&gt;

&lt;h4&gt;Comparison between two different models&lt;/h4&gt;

&lt;p&gt;Input: "Generative Pre-Trained Transformer".&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;gpt-4o&lt;/strong&gt;: &lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcyhw3edztl7a2pqppsgm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcyhw3edztl7a2pqppsgm.png" alt="Tokenization by gpt-4o" width="391" height="37"&gt;&lt;/a&gt;&lt;br&gt;
      Token count: 6&lt;br&gt;
      Tokens: 5926, 1799, 4659, 61607, 3883, 113133&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;davinci&lt;/strong&gt;:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fikr9kouaz5ktmd061xqx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fikr9kouaz5ktmd061xqx.png" alt="Tokenization by davinci" width="384" height="33"&gt;&lt;/a&gt;&lt;br&gt;
      Token count: 8&lt;br&gt;
      Tokens: 8645, 876, 3771, 12, 2898, 1328, 3602, 16354&lt;/p&gt;
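
&lt;p&gt;You can reproduce a comparison like this with &lt;code&gt;tiktoken&lt;/code&gt;. The sketch below assumes gpt-4o uses the “o200k_base” encoding and the original davinci model uses “r50k_base”; swap in other encodings to see how the splits change:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import tiktoken

text = "Generative Pre-Trained Transformer"

# (model, encoding) pairings assumed for this comparison
for model, enc_name in [("gpt-4o", "o200k_base"), ("davinci", "r50k_base")]:
    enc = tiktoken.get_encoding(enc_name)
    ids = enc.encode(text)
    print(model)
    print("  vocabulary size:", enc.n_vocab)
    print("  token count:", len(ids))
    print("  tokens:", ids)
&lt;/code&gt;&lt;/pre&gt;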

&lt;blockquote&gt;
&lt;p&gt;In Progress....🚧🚧🚧&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>programming</category>
      <category>ai</category>
      <category>python</category>
      <category>learning</category>
    </item>
  </channel>
</rss>
