<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: sivakami</title>
    <description>The latest articles on DEV Community by sivakami (@sivakami_thangaraj).</description>
    <link>https://dev.to/sivakami_thangaraj</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3903108%2F95db7b11-f911-4a85-a9b8-31e515bc3109.png</url>
      <title>DEV Community: sivakami</title>
      <link>https://dev.to/sivakami_thangaraj</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/sivakami_thangaraj"/>
    <language>en</language>
    <item>
      <title>RAG (Day 1)</title>
      <dc:creator>sivakami</dc:creator>
      <pubDate>Wed, 29 Apr 2026 13:24:03 +0000</pubDate>
      <link>https://dev.to/sivakami_thangaraj/rag-day-1-54e4</link>
      <guid>https://dev.to/sivakami_thangaraj/rag-day-1-54e4</guid>
      <description>&lt;p&gt;40 Days Training on RAG – Day 1&lt;/p&gt;

&lt;p&gt;Session 1: Hello World of RAG + Introduction &amp;amp; Need of RAG&lt;/p&gt;

&lt;p&gt;When I first started learning about RAG (Retrieval-Augmented Generation), I thought it was just another complex AI buzzword. But once I broke it down, I realized it is actually a very practical and powerful idea.&lt;br&gt;
At its core, RAG is simply about helping a language model answer better by allowing it to look up information first before responding.&lt;br&gt;
To understand RAG properly, we first need to understand how LLMs (Large Language Models) work.&lt;/p&gt;

&lt;p&gt;What is a Model?&lt;br&gt;
A model is nothing but an equation.&lt;br&gt;
For example:&lt;br&gt;
y = mx + c&lt;br&gt;
This is a simple straight-line equation.&lt;br&gt;
If we are given pairs of x and y values, the system adjusts m and c so that the line best fits those points.&lt;br&gt;
This process is called learning.&lt;br&gt;
In AI, the same idea becomes much larger:&lt;br&gt;
y = m₁x₁ + m₂x₂ + m₃x₃ + ⋯ + c (with billions of terms)&lt;br&gt;
The more complex the equation, the more patterns the model can learn.&lt;br&gt;
That is why bigger models often perform better.&lt;/p&gt;
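&lt;p&gt;To make this "learning" concrete, here is a tiny sketch of fitting y = mx + c by gradient descent. The data points, learning rate, and iteration count are made up for illustration:&lt;/p&gt;

```python
# Toy illustration: "learning" m and c in y = mx + c by gradient descent.
# The data points and learning rate here are invented for the example.
data = [(1, 5), (2, 7), (3, 9), (4, 11)]  # generated from y = 2x + 3

m, c = 0.0, 0.0
lr = 0.02
for _ in range(5000):
    # Gradients of the mean squared error with respect to m and c.
    grad_m = sum(2 * ((m * x + c) - y) * x for x, y in data) / len(data)
    grad_c = sum(2 * ((m * x + c) - y) for x, y in data) / len(data)
    m = m - lr * grad_m
    c = c - lr * grad_c

print(round(m, 2), round(c, 2))  # approaches m = 2, c = 3
```

&lt;p&gt;An LLM does the same thing at enormous scale: billions of weights instead of two, adjusted so the "equation" fits the training data.&lt;/p&gt;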

&lt;p&gt;What are Parameters and Weights?&lt;br&gt;
The values like:&lt;br&gt;
m&lt;br&gt;
c&lt;br&gt;
m₁&lt;br&gt;
m₂&lt;br&gt;
m₃&lt;br&gt;
are called parameters or weights.&lt;br&gt;
These are values learned during training.&lt;br&gt;
They decide how important each input is.&lt;br&gt;
For example:&lt;br&gt;
If a model is learning about animals:&lt;/p&gt;

&lt;p&gt;“cat” may get one weight&lt;/p&gt;

&lt;p&gt;“dog” may get another&lt;/p&gt;

&lt;p&gt;“lion” may get another&lt;/p&gt;

&lt;p&gt;The stronger the relevance, the stronger the weight.&lt;br&gt;
This is how models understand importance.&lt;br&gt;
That is why the companies behind Gemini, ChatGPT, and Claude proudly mention that their models contain billions of parameters.&lt;br&gt;
More parameters → better ability to learn complex relationships.&lt;/p&gt;
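&lt;p&gt;The idea that weights decide importance can be sketched as a weighted sum. The feature names and weight values below are invented for the example:&lt;/p&gt;

```python
# Toy illustration: weights decide how much each input matters.
# Feature names and weight values are invented for this example.
weights = {"has_whiskers": 0.9, "barks": 0.1, "has_mane": 0.05}
features = {"has_whiskers": 1, "barks": 0, "has_mane": 0}  # a cat-like input

# The output is dominated by whichever active feature carries the
# largest learned weight.
score = sum(weights[k] * features[k] for k in weights)
print(score)
```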

&lt;p&gt;What Does an LLM Actually Do?&lt;br&gt;
This was the biggest surprise for me.&lt;br&gt;
An LLM mainly does only one thing:&lt;br&gt;
Predict the next word&lt;br&gt;
That’s it.&lt;br&gt;
If you ask:&lt;/p&gt;

&lt;p&gt;Tell me about Artificial Intelligence&lt;/p&gt;

&lt;p&gt;the model does not “understand” like humans do.&lt;br&gt;
Instead, it predicts:&lt;br&gt;
“What should be the next word?”&lt;br&gt;
Then that predicted word becomes the next input.&lt;br&gt;
Again it predicts the next word.&lt;br&gt;
This repeats again and again until a full paragraph is generated.&lt;br&gt;
This is called generation.&lt;br&gt;
That is why responses appear like magic—but underneath, it is simply next-word prediction happening very fast.&lt;/p&gt;
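&lt;p&gt;That predict-then-feed-back loop can be sketched with a made-up bigram table (a real LLM scores every token in its vocabulary, and usually samples rather than always taking the top word):&lt;/p&gt;

```python
# Toy illustration of next-word prediction: an invented bigram table
# maps each word to candidate next words with scores.
bigrams = {
    "tell":  {"me": 1.0},
    "me":    {"about": 1.0},
    "about": {"artificial": 0.8, "cats": 0.2},
    "artificial": {"intelligence": 1.0},
}

word = "tell"
output = [word]
while word in bigrams:
    # Predict the next word, then feed it back in as the new input.
    candidates = bigrams[word]
    word = max(candidates, key=candidates.get)
    output.append(word)

print(" ".join(output))  # tell me about artificial intelligence
```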

&lt;p&gt;What is Hallucination?&lt;br&gt;
One important limitation of LLMs is called hallucination.&lt;br&gt;
Suppose the model is trained only on:&lt;/p&gt;

&lt;p&gt;cats&lt;/p&gt;

&lt;p&gt;dogs&lt;/p&gt;

&lt;p&gt;and suddenly someone asks about:&lt;/p&gt;

&lt;p&gt;lions&lt;/p&gt;

&lt;p&gt;The question is valid.&lt;br&gt;
But the model was never exposed to enough lion-related data.&lt;br&gt;
Instead of saying:&lt;br&gt;
“I don’t know”&lt;br&gt;
the model often answers confidently, even when the answer is wrong.&lt;br&gt;
This is called hallucination.&lt;br&gt;
Simple definition:&lt;br&gt;
Confidently giving wrong information = Hallucination&lt;br&gt;
This is one of the biggest reasons why RAG becomes necessary.&lt;/p&gt;

&lt;p&gt;What is Temperature?&lt;br&gt;
Temperature controls the creativity of the model.&lt;br&gt;
It usually ranges from:&lt;br&gt;
0 → 1&lt;br&gt;
Low Temperature (0.1)&lt;/p&gt;

&lt;p&gt;More factual&lt;/p&gt;

&lt;p&gt;More stable&lt;/p&gt;

&lt;p&gt;Less creative&lt;/p&gt;

&lt;p&gt;Medium Temperature (0.5)&lt;/p&gt;

&lt;p&gt;Balanced output&lt;/p&gt;

&lt;p&gt;High Temperature (0.9)&lt;/p&gt;

&lt;p&gt;More creative&lt;/p&gt;

&lt;p&gt;More imaginative&lt;/p&gt;

&lt;p&gt;Higher chance of hallucination&lt;/p&gt;

&lt;p&gt;Temperature does not directly control truth.&lt;br&gt;
It controls randomness.&lt;/p&gt;
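&lt;p&gt;One common way temperature is applied is by rescaling the model's raw scores before turning them into probabilities. A minimal sketch, with invented candidate words and scores:&lt;/p&gt;

```python
import math

# Toy illustration: temperature rescales raw scores (logits) before
# the softmax. The candidate words and scores are invented.
logits = {"cat": 2.0, "dog": 1.0, "lion": 0.1}

def softmax_with_temperature(scores, temperature):
    scaled = {w: math.exp(s / temperature) for w, s in scores.items()}
    total = sum(scaled.values())
    return {w: v / total for w, v in scaled.items()}

low = softmax_with_temperature(logits, 0.1)   # nearly all mass on "cat"
high = softmax_with_temperature(logits, 0.9)  # probabilities spread out
print(low["cat"], high["cat"])
```

&lt;p&gt;At low temperature the top word dominates, so output is stable; at high temperature the probabilities flatten, so sampling becomes more varied, and more random.&lt;/p&gt;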

&lt;p&gt;SLM vs LLM&lt;br&gt;
Not every problem needs a huge model.&lt;br&gt;
Sometimes we only need a smaller specialized model.&lt;/p&gt;

&lt;p&gt;SLM – Small Language Model&lt;br&gt;
SLM stands for Small Language Model&lt;br&gt;
It is trained for:&lt;/p&gt;

&lt;p&gt;speech-to-text&lt;/p&gt;

&lt;p&gt;customer support bots&lt;/p&gt;

&lt;p&gt;voice assistants&lt;/p&gt;

&lt;p&gt;domain-specific tasks&lt;/p&gt;

&lt;p&gt;It may have millions of parameters instead of billions.&lt;br&gt;
It is smaller, faster, and cheaper.&lt;/p&gt;

&lt;p&gt;LLM – Large Language Model&lt;br&gt;
LLM stands for Large Language Model&lt;br&gt;
It has:&lt;/p&gt;

&lt;p&gt;billions of parameters&lt;/p&gt;

&lt;p&gt;knowledge from many domains&lt;/p&gt;

&lt;p&gt;It is a generalized model.&lt;br&gt;
Examples:&lt;/p&gt;

&lt;p&gt;GPT&lt;/p&gt;

&lt;p&gt;Gemini&lt;/p&gt;

&lt;p&gt;Claude&lt;/p&gt;

&lt;p&gt;Why Do We Need RAG?&lt;br&gt;
This is where everything becomes interesting.&lt;br&gt;
Even powerful LLMs have major limitations:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Hallucination
They make up answers.&lt;/li&gt;
&lt;li&gt;Outdated Knowledge
Training data has a cutoff date.
They do not know new events automatically.&lt;/li&gt;
&lt;li&gt;No Private Knowledge
They cannot directly access:&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Company policies&lt;/p&gt;

&lt;p&gt;HR documents&lt;/p&gt;

&lt;p&gt;Internal reports&lt;/p&gt;

&lt;p&gt;Confluence&lt;/p&gt;

&lt;p&gt;Jira boards&lt;/p&gt;

&lt;p&gt;Business data&lt;/p&gt;

&lt;p&gt;This is where RAG solves the problem.&lt;/p&gt;

&lt;p&gt;What is RAG?&lt;br&gt;
RAG stands for:&lt;br&gt;
Retrieval-Augmented Generation&lt;br&gt;
It combines two steps:&lt;br&gt;
Retrieval&lt;br&gt;
First, the system searches and retrieves relevant information from external sources.&lt;br&gt;
Examples:&lt;/p&gt;

&lt;p&gt;PDFs&lt;/p&gt;

&lt;p&gt;Documents&lt;/p&gt;

&lt;p&gt;Databases&lt;/p&gt;

&lt;p&gt;Internal company files&lt;/p&gt;

&lt;p&gt;Knowledge bases&lt;/p&gt;

&lt;p&gt;Generation&lt;br&gt;
Then the LLM uses that retrieved information to generate the final answer.&lt;/p&gt;

&lt;p&gt;Simple Understanding&lt;br&gt;
Instead of:&lt;br&gt;
Answering only from memory&lt;br&gt;
RAG works like:&lt;br&gt;
Look up first → Then answer&lt;br&gt;
This is the real power of RAG.&lt;/p&gt;

&lt;p&gt;Where is Private Data Stored?&lt;br&gt;
Private data is usually stored inside a:&lt;br&gt;
Vector Database&lt;br&gt;
Examples:&lt;/p&gt;

&lt;p&gt;Confluence text&lt;/p&gt;

&lt;p&gt;Jira content&lt;/p&gt;

&lt;p&gt;HR policy documents&lt;/p&gt;

&lt;p&gt;Internal business documents&lt;/p&gt;

&lt;p&gt;These are not directly fed into the LLM.&lt;br&gt;
Instead, they are converted and stored intelligently.&lt;/p&gt;

&lt;p&gt;How Documents are Stored&lt;br&gt;
Documents are broken into smaller parts called:&lt;br&gt;
Chunks&lt;br&gt;
Usually:&lt;/p&gt;

&lt;p&gt;sentence groups&lt;/p&gt;

&lt;p&gt;paragraph chunks&lt;/p&gt;

&lt;p&gt;Not individual words.&lt;br&gt;
Because meaning comes from context.&lt;br&gt;
Not isolated words.&lt;/p&gt;

&lt;p&gt;My IELTS Example&lt;br&gt;
I personally relate this to IELTS preparation.&lt;br&gt;
Even if I memorize many English words,&lt;br&gt;
during speaking they may not fit properly into context.&lt;br&gt;
But if I memorize complete sentences,&lt;br&gt;
I can easily adjust the context while speaking.&lt;br&gt;
RAG works in the same way.&lt;br&gt;
It retrieves meaningful sentence chunks—not random words.&lt;br&gt;
This makes the answer much more natural and relevant.&lt;/p&gt;

&lt;p&gt;What is a Vector?&lt;br&gt;
A vector has:&lt;/p&gt;

&lt;p&gt;magnitude&lt;/p&gt;

&lt;p&gt;direction&lt;/p&gt;

&lt;p&gt;Each chunk is converted into a numerical vector.&lt;br&gt;
For example:&lt;br&gt;
A paragraph about:&lt;br&gt;
Apple&lt;br&gt;
becomes:&lt;br&gt;
P1 = [....700 dimensions....]&lt;br&gt;
A paragraph about:&lt;br&gt;
Doctor&lt;br&gt;
becomes:&lt;br&gt;
P2 = [....700 dimensions....]&lt;br&gt;
Now the system measures distance between vectors.&lt;br&gt;
Closer vectors = more related&lt;br&gt;
Farther vectors = less related&lt;br&gt;
This helps the system find relevant information quickly.&lt;/p&gt;
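&lt;p&gt;Measuring closeness between vectors is often done with cosine similarity. A minimal sketch, using invented 3-dimensional vectors instead of hundreds of dimensions:&lt;/p&gt;

```python
import math

# Toy illustration: closeness between chunk vectors via cosine similarity.
# Real embeddings have hundreds of dimensions; these 3-D values are invented.
p_apple  = [0.9, 0.8, 0.1]   # a paragraph about Apple
p_orange = [0.8, 0.9, 0.2]   # a paragraph about Orange
p_doctor = [0.1, 0.2, 0.9]   # a paragraph about Doctor

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine(p_apple, p_orange))  # high: related topics
print(cosine(p_apple, p_doctor))  # low: unrelated topics
```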

&lt;p&gt;Real-Life Example&lt;br&gt;
Take these words:&lt;/p&gt;

&lt;p&gt;Lemon&lt;/p&gt;

&lt;p&gt;Apple&lt;/p&gt;

&lt;p&gt;Orange&lt;/p&gt;

&lt;p&gt;Pear&lt;/p&gt;

&lt;p&gt;Doctor&lt;/p&gt;

&lt;p&gt;Fruit-related words stay close together.&lt;br&gt;
Doctor stays farther away.&lt;br&gt;
This is how relevance is understood.&lt;/p&gt;

&lt;p&gt;How Relevant Chunks are Found&lt;br&gt;
Algorithms used:&lt;br&gt;
ANN&lt;br&gt;
Approximate Nearest Neighbors&lt;br&gt;
KNN&lt;br&gt;
K-Nearest Neighbors&lt;br&gt;
These help quickly find the most relevant chunks.&lt;br&gt;
The same idea is used in:&lt;/p&gt;

&lt;p&gt;Spotify recommendations&lt;/p&gt;

&lt;p&gt;Amazon suggestions&lt;/p&gt;

&lt;p&gt;Netflix recommendations&lt;/p&gt;

&lt;p&gt;YouTube feed&lt;/p&gt;

&lt;p&gt;Social media recommendations&lt;/p&gt;

&lt;p&gt;Final RAG Flow&lt;br&gt;
User asks question&lt;br&gt;
↓&lt;br&gt;
System retrieves relevant chunks&lt;br&gt;
↓&lt;br&gt;
Retrieved context goes to LLM&lt;br&gt;
↓&lt;br&gt;
LLM generates grounded answer&lt;br&gt;
↓&lt;br&gt;
Better output with less hallucination&lt;/p&gt;
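&lt;p&gt;The flow above can be sketched end to end. Everything here is invented for illustration: the "knowledge base" is a plain list, a simple word-overlap score stands in for vector similarity, and the final step just prints the grounded prompt that would be sent to the LLM:&lt;/p&gt;

```python
# Toy end-to-end sketch of the RAG flow. The knowledge base, the
# word-overlap retriever (a stand-in for a vector DB), and the prompt
# template are all invented for illustration.
knowledge_base = [
    "Leave policy: employees get 20 paid leave days per year.",
    "Lions are large cats that live in groups called prides.",
    "The office cafeteria opens at 8 am on weekdays.",
]

def retrieve(question, docs, k=1):
    # Score each document by shared words with the question, keep top k.
    q_words = set(question.lower().split())
    def overlap(doc):
        return len(q_words.intersection(doc.lower().split()))
    return sorted(docs, key=overlap, reverse=True)[:k]

question = "How many paid leave days do employees get?"
context = retrieve(question, knowledge_base)

prompt = (
    "Answer using only this context:\n"
    f"{context[0]}\n"
    f"Question: {question}"
)
print(prompt)  # this grounded prompt is what the LLM would receive
```

&lt;p&gt;The model now answers from the retrieved policy text instead of from memory, which is exactly the "look up first, then answer" behaviour.&lt;/p&gt;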

&lt;p&gt;Final One-Line Summary&lt;br&gt;
LLM predicts&lt;br&gt;
Vector DB retrieves&lt;br&gt;
RAG provides context&lt;br&gt;
Better answers are generated&lt;/p&gt;

&lt;p&gt;This was my Day 1 understanding of RAG.&lt;br&gt;
And honestly, the best definition I found is this:&lt;br&gt;
RAG is simply giving the model the right information before asking it to answer.&lt;br&gt;
That single sentence changed everything for me.&lt;/p&gt;

</description>
      <category>beginners</category>
      <category>llm</category>
      <category>rag</category>
      <category>tutorial</category>
    </item>
  </channel>
</rss>
