ASHDEEP SINGH

Intro to Gen AI

What is Generative AI (GenAI)?

Generative AI is a type of artificial intelligence that can create new content — like text, images, music, or even code — instead of just analyzing existing data.
Unlike traditional AI that mostly classifies or predicts, GenAI learns patterns from massive datasets and then produces something new based on that knowledge.

GenAI models are usually trained on huge amounts of data, and to understand that data, they break it into smaller, meaningful pieces — and that’s where tokenization comes in.

Put even more simply, you can think of GenAI as an algorithmic program that is capable of creating something new based on the dataset it has been trained on.

What is Tokenization?

Tokenization is the process of breaking down text into smaller units, called tokens, so that machines can understand and process it.
A token can be:

A word (Hello)
A subword (ing in running)
A single character (a, b, c)
Or even punctuation marks and spaces

Example:
"Hello world!"

→ ["Hello", "world", "!"]

Computers don’t directly understand text. They need numbers, so each token is mapped to a numeric ID from the model's vocabulary.
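To see this in action, here is a minimal sketch using the tiktoken library (one tokenizer among many; the exact token boundaries and IDs depend on the encoding you pick):

# Tokenization sketch using tiktoken (assumes: pip install tiktoken)
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # an encoding used by several OpenAI models

token_ids = enc.encode("Hello world!")               # text -> numeric token IDs
tokens = [enc.decode([tid]) for tid in token_ids]    # IDs -> readable token strings

print(token_ids)  # e.g. [9906, 1917, 0] (exact IDs depend on the encoding)
print(tokens)     # e.g. ['Hello', ' world', '!']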

Why it matters in GenAI:
Tokenization ensures the AI model processes input consistently, no matter how big or small the text. This is the first step before the model learns relationships between words.
Note: GenAI generates tokens one by one; in each iteration, the tokens generated in previous iterations are fed back in as input, and the whole sequence is combined at the end to form the final answer.
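To make that loop concrete, here is a minimal sketch of token-by-token generation. Here model, end_token, and max_length are hypothetical placeholders, not a real API:

# Autoregressive generation sketch (model and end_token are hypothetical placeholders)
def generate(prompt_tokens, model, end_token, max_length=50):
    tokens = list(prompt_tokens)
    for _ in range(max_length):
        next_token = model(tokens)   # predict the next token from all tokens so far
        if next_token == end_token:  # the model signals the answer is complete
            break
        tokens.append(next_token)    # feed it back in on the next iteration
    return tokens                    # the combined sequence is the final answer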

What are Vector Embeddings?

Once text is tokenized, each token is converted into a vector — a list of numbers that represent its meaning in a mathematical space.
This is called an embedding.

Words with similar meanings have vectors that are closer together in this space.

This lets AI “understand” context, similarity, and relationships between words.

Example:
If you plot word vectors in 3D space:

"king" and "queen" will be close to each other

"cat" and "dog" will be closer compared to "cat" and "car"

Why embeddings matter:
They allow GenAI to:

Find relevant documents for a search query

Understand synonyms and related concepts

Make conversational answers more context-aware
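As a sketch of the first point, here is how embedding-based search could rank documents against a query. The vectors below are invented for illustration; in practice they would come from an embedding model:

# Embedding-based search sketch (all vectors invented for illustration)
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Pretend each document has already been embedded into a 3D vector
documents = {
    "How to train a dog":         np.array([0.9, 0.1, 0.1]),
    "Weather forecast for today": np.array([0.1, 0.9, 0.2]),
    "Buying your first car":      np.array([0.1, 0.2, 0.9]),
}
query_vector = np.array([0.2, 0.8, 0.1])  # made-up embedding of "will it rain today?"

# The document whose embedding is closest to the query embedding wins
best = max(documents, key=lambda doc: cosine_similarity(query_vector, documents[doc]))
print(best)  # "Weather forecast for today"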

Let's understand it using an example.
Ever thought of going for a walk, looked up at the sky, and thought it might rain? Assume you are desperate for that walk even if it rains, so you carry an umbrella and head out. Now you have your umbrella and are walking down the street, ready to open it the moment it rains.

So here you are actually predicting every next moment of what's going to happen (this is the essence of generative AI: predicting the next token).

But here's a detail you might have missed: why did you carry the umbrella in the first place? Because your brain had seen a few days with the same weather and knew it can rain in such conditions. It's the same data fed into your brain that makes you "PREDICT" the outcome and bring the umbrella. This process of prediction continues for the entire duration of your journey.

Now think about how your brain recalled those past days.

You didn’t literally remember every single weather detail from years ago.

Instead, your brain keeps a compressed mental representation of each memory — not the exact video, but the essence of the scene (e.g., “cloudy sky + humid air + cool wind” = likely rain).

This compressed representation is like a vector embedding:

Each weather day you’ve experienced is converted into a list of numbers that capture key features (color of sky, humidity level, wind speed).

When you see today’s weather, your brain turns it into another list of numbers (an embedding).

You compare it with your “memory embeddings” to find the most similar ones.

If similar ones often had rain, you predict rain today.
But now you might ask: where do vector embeddings come into play here? Look at it this way: if you sense it'll rain heavily, you don't go; if it'll rain lightly, you go with an umbrella. Your brain already has a mapping from what will happen to what you'll do. If this is to be represented mathematically, we can do it using vector embeddings.
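Here is the umbrella analogy as a toy sketch; every number below is invented for illustration:

# Weather-memory analogy as toy embeddings (all numbers invented)
import numpy as np

# Past days encoded as [cloud cover, humidity, wind speed], plus what happened
memories = [
    (np.array([0.9, 0.8, 0.6]), "rain"),
    (np.array([0.2, 0.3, 0.1]), "no rain"),
    (np.array([0.8, 0.9, 0.5]), "rain"),
]

today = np.array([0.85, 0.8, 0.55])  # today's weather as an "embedding"

# Recall the most similar past day and predict the same outcome
closest = min(memories, key=lambda m: np.linalg.norm(today - m[0]))
print("Prediction:", closest[1])  # "rain" -> carry the umbrella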

The working of GenAI can be considered analogous to this example.

TL;DR: Putting It All Together

Tokenization breaks text into smaller parts and converts them into numbers.
Embeddings turn these tokens into vectors that capture meaning and relationships.
Generative AI uses these vectors in deep learning models to create new content.
