40 Days Training on RAG – Day 1
Session 1: Hello World of RAG + Introduction & Need of RAG
When I first started learning about RAG (Retrieval-Augmented Generation), I thought it was just another complex AI buzzword. But once I broke it down, I realized it is actually a very practical and powerful idea.
At its core, RAG is simply about helping a language model answer better by allowing it to look up information first before responding.
To understand RAG properly, we first need to understand how LLMs (Large Language Models) work.
What is a Model?
A model is nothing but an equation.
For example:
y = mx + c
This is a simple straight-line equation.
Given pairs of x and y values, the system adjusts m and c so that the line best fits the data points.
This process is called learning.
In AI, the same idea becomes much larger:
y = m₁x₁ + m₂x₂ + m₃x₃ + ⋯ + (billions of terms) + c
The more complex the equation, the more patterns the model can learn.
That is why bigger models often perform better.
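To make "learning" concrete, here is a minimal sketch of fitting m and c for y = mx + c. The data points are made up so that they lie exactly on y = 2x + 1, and the closed-form least-squares formula recovers those values:

```python
# Toy example: "learning" m and c in y = m*x + c by least squares.
# The points below lie exactly on y = 2x + 1, so the fit recovers m=2, c=1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Closed-form least-squares solution for a straight line.
m = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
    sum((x - mean_x) ** 2 for x in xs)
c = mean_y - m * mean_x

print(m, c)  # → 2.0 1.0
```

A neural network does the same kind of adjustment, just over billions of parameters and by iterative gradient descent instead of a closed-form formula.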
What are Parameters and Weights?
The values like:
m
c
m₁
m₂
m₃
are called parameters or weights.
These are values learned during training.
They decide how important each input is.
For example:
If a model is learning about animals:
“cat” may get one weight
“dog” may get another
“lion” may get another
The stronger the relevance, the stronger the weight.
This is how models understand importance.
That is why the companies behind models like Gemini, GPT, and Claude proudly mention that their models contain billions of parameters.
More parameters → better ability to learn complex relationships.
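The idea that weights encode importance can be sketched with a tiny weighted sum. The feature names and weight values below are entirely made up for illustration; a real model learns them from data rather than having them hand-written:

```python
# Hypothetical weights a tiny model might learn for an "is this about animals?" score.
# These numbers are invented for illustration, not learned from real data.
weights = {"cat": 0.9, "dog": 0.8, "lion": 0.85, "table": 0.05}
bias = -0.5  # plays the role of c in y = m1*x1 + m2*x2 + ... + c

def animal_score(words):
    # Weighted sum of input features plus a bias term.
    return sum(weights.get(w, 0.0) for w in words) + bias

print(animal_score(["cat", "dog"]))   # strongly animal-related → high score
print(animal_score(["table"]))        # weak relevance → low score
```

Words with stronger relevance contribute larger weights, so they pull the output higher, which is exactly the "importance" idea above.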
What Does an LLM Actually Do?
This was the biggest surprise for me.
An LLM mainly does only one thing:
Predict the next word
That’s it.
If you ask:
Tell me about Artificial Intelligence
the model does not “understand” like humans do.
Instead, it predicts:
“What should be the next word?”
Then that predicted word becomes the next input.
Again it predicts the next word.
This repeats again and again until a full paragraph is generated.
This is called generation.
That is why responses appear like magic—but underneath, it is simply next-word prediction happening very fast.
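The predict-then-feed-back loop can be sketched with a toy lookup table. A real LLM computes these predictions from billions of parameters; here the "most likely next word" table is hand-made just to show the loop:

```python
# A toy "next-word predictor": for each word, the most likely following word.
# Real LLMs learn these probabilities; this table is hand-made for illustration.
next_word = {
    "artificial": "intelligence",
    "intelligence": "is",
    "is": "powerful",
}

def generate(prompt, max_words=5):
    words = prompt.split()
    for _ in range(max_words):
        prediction = next_word.get(words[-1])
        if prediction is None:  # no known continuation → stop generating
            break
        words.append(prediction)  # the predicted word becomes the next input
    return " ".join(words)

print(generate("artificial"))  # → artificial intelligence is powerful
```

Each predicted word is appended and becomes the input for the next prediction, exactly the repeat-until-done loop described above.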
What is Hallucination?
One important limitation of LLMs is called hallucination.
Suppose the model is trained only on:
cats
dogs
and suddenly someone asks about:
lions
The question is valid.
But the model was never exposed to enough lion-related data.
Instead of saying:
“I don’t know”
the model often tries to answer confidently.
Even if the answer is wrong.
This is called hallucination.
Simple definition:
Confidently giving wrong information = Hallucination
This is one of the biggest reasons why RAG becomes necessary.
What is Temperature?
Temperature controls the creativity of the model.
It usually ranges from:
0 → 1
Low Temperature (0.1)
More factual
More stable
Less creative
Medium Temperature (0.5)
Balanced output
High Temperature (0.9)
More creative
More imaginative
Higher chance of hallucination
Temperature does not directly control truth.
It controls randomness.
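Under the hood, temperature divides the model's raw scores (logits) before they are turned into probabilities. Here is a minimal sketch of that scaling; the logit values are made up:

```python
import math

def temperature_softmax(logits, temperature):
    # Divide logits by temperature, then apply softmax.
    # Low temperature sharpens the distribution (more predictable);
    # high temperature flattens it (more random).
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical next-token scores for three candidate words.
logits = [2.0, 1.0, 0.5]
print(temperature_softmax(logits, 0.1))  # nearly all probability on the top token
print(temperature_softmax(logits, 0.9))  # probability spread more evenly
```

This is why high temperature raises the chance of hallucination: low-probability (often wrong) tokens get sampled more frequently, while the underlying scores never change.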
SLM vs LLM
Not every problem needs a huge model.
Sometimes we only need a smaller specialized model.
SLM – Small Language Model
SLM stands for Small Language Model
It is trained for:
speech-to-text
customer support bots
voice assistants
domain-specific tasks
It may have millions of parameters instead of billions.
It is smaller, faster, and cheaper.
LLM – Large Language Model
LLM stands for Large Language Model
It has:
billions of parameters
knowledge from many domains
It is a generalized model.
Examples:
GPT
Gemini
Claude
Why Do We Need RAG?
This is where everything becomes interesting.
Even powerful LLMs have major limitations:
- Hallucination: they make up answers.
- Outdated Knowledge: training data has a cutoff date, so they do not automatically know about new events.
- No Private Knowledge: they cannot directly access:
Company policies
HR documents
Internal reports
Confluence
Jira boards
Business data
This is where RAG solves the problem.
What is RAG?
RAG stands for:
Retrieval-Augmented Generation
It combines two steps:
Retrieval
First, the system searches and retrieves relevant information from external sources.
Examples:
PDFs
Documents
Databases
Internal company files
Knowledge bases
Generation
Then the LLM uses that retrieved information to generate the final answer.
Simple Understanding
Instead of:
Answering only from memory
RAG works like:
Look up first → Then answer
This is the real power of RAG.
Where is Private Data Stored?
Private data is usually stored inside a:
Vector Database
Examples:
Confluence text
Jira content
HR policy documents
Internal business documents
These are not directly fed into the LLM.
Instead, they are converted and stored intelligently.
How Documents are Stored
Documents are broken into smaller parts called:
Chunks
Usually:
sentence groups
paragraph chunks
Not individual words.
Because meaning comes from context.
Not isolated words.
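A minimal chunker might split on paragraphs first, then group sentences so each chunk stays under a rough size limit. The 200-character limit below is an arbitrary choice for this sketch; real pipelines tune chunk size carefully:

```python
# Minimal sketch of paragraph/sentence chunking.
# max_chars=200 is an arbitrary limit chosen for illustration.
def chunk_text(text, max_chars=200):
    chunks = []
    for paragraph in text.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        current = ""
        for sentence in paragraph.split(". "):
            # Start a new chunk when adding this sentence would overflow.
            if current and len(current) + len(sentence) + 2 > max_chars:
                chunks.append(current)
                current = sentence
            else:
                current = current + ". " + sentence if current else sentence
        if current:
            chunks.append(current)
    return chunks

doc = "RAG retrieves first. Then it generates.\n\nChunks keep context together."
print(chunk_text(doc))
```

Notice that the splitting unit is the sentence and the paragraph, never the individual word, which preserves the context that gives each chunk its meaning.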
My IELTS Example
I personally relate this to IELTS preparation.
Even if I memorize many English words,
during speaking they may not fit properly into context.
But if I memorize complete sentences,
I can easily adjust the context while speaking.
RAG works in the same way.
It retrieves meaningful sentence chunks—not random words.
This makes the answer much more natural and relevant.
What is a Vector?
A vector has:
magnitude
direction
Each chunk is converted into a numerical vector.
For example:
A paragraph about:
Apple
becomes:
P1 = [ .... 700 dimensions .... ]
A paragraph about:
Doctor
becomes:
P2 = [ .... 700 dimensions .... ]
Now the system measures distance between vectors.
Closer vectors = more related
Farther vectors = less related
This helps the system find relevant information quickly.
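Distance between vectors is usually measured with cosine similarity. Here is a sketch using tiny made-up 3-dimensional embeddings (real embeddings have hundreds of dimensions):

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors:
    # close to 1.0 = same direction (related), close to 0.0 = unrelated.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Tiny made-up embeddings; real ones come from an embedding model.
apple  = [0.9, 0.8, 0.1]
lemon  = [0.8, 0.9, 0.2]
doctor = [0.1, 0.2, 0.9]

print(cosine_similarity(apple, lemon))   # high: both fruits
print(cosine_similarity(apple, doctor))  # low: unrelated topics
```

The fruit vectors point in nearly the same direction, so their similarity is high; the doctor vector points elsewhere, so its similarity to either fruit is low.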
Real-Life Example
Take these words:
Lemon
Apple
Orange
Pear
Doctor
Fruit-related words stay close together.
Doctor stays farther away.
This is how relevance is understood.
How Relevant Chunks are Found
Algorithms used:
ANN (Approximate Nearest Neighbors)
KNN (K-Nearest Neighbors)
These help quickly find the most relevant chunks.
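Exact KNN retrieval is just "score every chunk, keep the top k". Here is a brute-force sketch over hypothetical 2-dimensional embeddings; ANN libraries approximate this same ranking to stay fast over millions of chunks:

```python
import math

def top_k_chunks(query_vec, chunk_vecs, k=2):
    # Brute-force k-nearest-neighbours by cosine similarity.
    # ANN indexes approximate this ranking to stay fast at scale.
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb)
    ranked = sorted(chunk_vecs.items(),
                    key=lambda item: cos(query_vec, item[1]),
                    reverse=True)
    return [name for name, _ in ranked[:k]]

# Hypothetical 2-D embeddings for stored chunks.
chunks = {"apple": [0.9, 0.1], "lemon": [0.8, 0.2], "doctor": [0.1, 0.9]}
print(top_k_chunks([1.0, 0.0], chunks))  # → ['apple', 'lemon']
```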
The same idea is used in:
Spotify recommendations
Amazon suggestions
Netflix recommendations
YouTube feed
Social media recommendations
Final RAG Flow
User asks question
↓
System retrieves relevant chunks
↓
Retrieved context goes to LLM
↓
LLM generates grounded answer
↓
Better output with less hallucination
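The flow above can be sketched end to end. Here `call_llm` is a placeholder for any real model API (it just echoes the prompt so the example runs offline), and retrieval is simplified to word overlap instead of vector search:

```python
# Minimal RAG pipeline sketch. `call_llm` is a placeholder for a real
# model API; retrieval here uses simple word overlap for illustration.
def call_llm(prompt):
    return f"[answer grounded in provided context]\n{prompt}"

def retrieve(question, knowledge_base, k=1):
    # Toy retrieval: rank stored chunks by word overlap with the question.
    q_words = set(question.lower().split())
    ranked = sorted(knowledge_base,
                    key=lambda chunk: len(q_words & set(chunk.lower().split())),
                    reverse=True)
    return ranked[:k]

def rag_answer(question, knowledge_base):
    # Look up first → then answer.
    context = "\n".join(retrieve(question, knowledge_base))
    prompt = (f"Context:\n{context}\n\n"
              f"Question: {question}\nAnswer using only the context.")
    return call_llm(prompt)

kb = ["Refunds are processed within 5 business days.",
      "Our office is closed on public holidays."]
print(rag_answer("How long do refunds take?", kb))
```

Because the retrieved chunk is placed in the prompt, the model answers from provided facts rather than from memory alone, which is what reduces hallucination.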
Final One-Line Summary
LLM predicts
Vector DB retrieves
RAG provides context
Better answers are generated
This was my Day 1 understanding of RAG.
And honestly, the best definition I found is this:
RAG is simply giving the model the right information before asking it to answer.
That single sentence changed everything for me.