<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: sivakami</title>
    <description>The latest articles on DEV Community by sivakami (@sivakami_thangaraj).</description>
    <link>https://dev.to/sivakami_thangaraj</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3903108%2F95db7b11-f911-4a85-a9b8-31e515bc3109.png</url>
      <title>DEV Community: sivakami</title>
      <link>https://dev.to/sivakami_thangaraj</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/sivakami_thangaraj"/>
    <language>en</language>
    <item>
      <title>RAG (Day 1)</title>
      <dc:creator>sivakami</dc:creator>
      <pubDate>Wed, 29 Apr 2026 13:24:03 +0000</pubDate>
      <link>https://dev.to/sivakami_thangaraj/rag-day-1-54e4</link>
      <guid>https://dev.to/sivakami_thangaraj/rag-day-1-54e4</guid>
      <description>&lt;p&gt;40 Days Training on RAG – Day 1&lt;/p&gt;

&lt;p&gt;Session 1: Hello World of RAG + Introduction &amp;amp; Need of RAG&lt;/p&gt;

&lt;p&gt;When I first started learning about RAG (Retrieval-Augmented Generation), I thought it was just another complex AI buzzword. But once I broke it down, I realized it is actually a very practical and powerful idea.&lt;br&gt;
At its core, RAG is simply about helping a language model answer better by allowing it to look up information first before responding.&lt;br&gt;
To understand RAG properly, we first need to understand how LLMs (Large Language Models) work.&lt;/p&gt;

&lt;p&gt;What is a Model?&lt;br&gt;
A model is nothing but an equation.&lt;br&gt;
For example:&lt;br&gt;
y = mx + c&lt;br&gt;
This is a simple straight-line equation.&lt;br&gt;
If we are given pairs of x and y values, the system adjusts m and c so that the line best fits those points.&lt;br&gt;
This process is called learning.&lt;br&gt;
In AI, the same idea becomes much larger:&lt;br&gt;
y = m₁x₁ + m₂x₂ + m₃x₃ + ⋯ + c (with billions of terms)&lt;br&gt;
The more complex the equation, the more patterns the model can learn.&lt;br&gt;
That is why bigger models often perform better.&lt;/p&gt;
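&lt;p&gt;To make this "learning" concrete, here is a tiny sketch of fitting y = mx + c by gradient descent. The data points, learning rate, and iteration count are made up for illustration:&lt;/p&gt;

```python
# Toy illustration: "learning" m and c in y = mx + c by gradient descent.
# The data points and learning rate here are invented for the example.
data = [(1, 5), (2, 7), (3, 9), (4, 11)]  # generated from y = 2x + 3

m, c = 0.0, 0.0
lr = 0.02
for _ in range(5000):
    # Gradients of the mean squared error with respect to m and c.
    grad_m = sum(2 * ((m * x + c) - y) * x for x, y in data) / len(data)
    grad_c = sum(2 * ((m * x + c) - y) for x, y in data) / len(data)
    m = m - lr * grad_m
    c = c - lr * grad_c

print(round(m, 2), round(c, 2))  # approaches m = 2, c = 3
```

&lt;p&gt;An LLM does the same thing at enormous scale: billions of weights instead of two, adjusted so the "equation" fits the training data.&lt;/p&gt;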

&lt;p&gt;What are Parameters and Weights?&lt;br&gt;
The values like:&lt;br&gt;
m&lt;br&gt;
c&lt;br&gt;
m₁&lt;br&gt;
m₂&lt;br&gt;
m₃&lt;br&gt;
are called parameters or weights.&lt;br&gt;
These are values learned during training.&lt;br&gt;
They decide how important each input is.&lt;br&gt;
For example:&lt;br&gt;
If a model is learning about animals:&lt;/p&gt;

&lt;p&gt;“cat” may get one weight&lt;/p&gt;

&lt;p&gt;“dog” may get another&lt;/p&gt;

&lt;p&gt;“lion” may get another&lt;/p&gt;

&lt;p&gt;The stronger the relevance, the stronger the weight.&lt;br&gt;
This is how models understand importance.&lt;br&gt;
That is why the companies behind Gemini, ChatGPT, and Claude proudly mention that their models contain billions of parameters.&lt;br&gt;
More parameters → better ability to learn complex relationships.&lt;/p&gt;
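&lt;p&gt;The idea that weights decide importance can be sketched as a weighted sum. The feature names and weight values below are invented for the example:&lt;/p&gt;

```python
# Toy illustration: weights decide how much each input matters.
# Feature names and weight values are invented for this example.
weights = {"has_whiskers": 0.9, "barks": 0.1, "has_mane": 0.05}
features = {"has_whiskers": 1, "barks": 0, "has_mane": 0}  # a cat-like input

# The output is dominated by whichever active feature carries the
# largest learned weight.
score = sum(weights[k] * features[k] for k in weights)
print(score)
```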

&lt;p&gt;What Does an LLM Actually Do?&lt;br&gt;
This was the biggest surprise for me.&lt;br&gt;
An LLM mainly does only one thing:&lt;br&gt;
Predict the next word&lt;br&gt;
That’s it.&lt;br&gt;
If you ask:&lt;/p&gt;

&lt;p&gt;Tell me about Artificial Intelligence&lt;/p&gt;

&lt;p&gt;the model does not “understand” like humans do.&lt;br&gt;
Instead, it predicts:&lt;br&gt;
“What should be the next word?”&lt;br&gt;
Then that predicted word becomes the next input.&lt;br&gt;
Again it predicts the next word.&lt;br&gt;
This repeats again and again until a full paragraph is generated.&lt;br&gt;
This is called generation.&lt;br&gt;
That is why responses appear like magic—but underneath, it is simply next-word prediction happening very fast.&lt;/p&gt;
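&lt;p&gt;That predict-then-feed-back loop can be sketched with a made-up bigram table (a real LLM scores every token in its vocabulary, and usually samples rather than always taking the top word):&lt;/p&gt;

```python
# Toy illustration of next-word prediction: an invented bigram table
# maps each word to candidate next words with scores.
bigrams = {
    "tell":  {"me": 1.0},
    "me":    {"about": 1.0},
    "about": {"artificial": 0.8, "cats": 0.2},
    "artificial": {"intelligence": 1.0},
}

word = "tell"
output = [word]
while word in bigrams:
    # Predict the next word, then feed it back in as the new input.
    candidates = bigrams[word]
    word = max(candidates, key=candidates.get)
    output.append(word)

print(" ".join(output))  # tell me about artificial intelligence
```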

&lt;p&gt;What is Hallucination?&lt;br&gt;
One important limitation of LLMs is called hallucination.&lt;br&gt;
Suppose the model is trained only on:&lt;/p&gt;

&lt;p&gt;cats&lt;/p&gt;

&lt;p&gt;dogs&lt;/p&gt;

&lt;p&gt;and suddenly someone asks about:&lt;/p&gt;

&lt;p&gt;lions&lt;/p&gt;

&lt;p&gt;The question is valid.&lt;br&gt;
But the model was never exposed to enough lion-related data.&lt;br&gt;
Instead of saying:&lt;br&gt;
“I don’t know”&lt;br&gt;
the model often answers confidently, even when the answer is wrong.&lt;br&gt;
This is called hallucination.&lt;br&gt;
Simple definition:&lt;br&gt;
Confidently giving wrong information = Hallucination&lt;br&gt;
This is one of the biggest reasons why RAG becomes necessary.&lt;/p&gt;

&lt;p&gt;What is Temperature?&lt;br&gt;
Temperature controls the creativity of the model.&lt;br&gt;
It usually ranges from:&lt;br&gt;
0 → 1&lt;br&gt;
Low Temperature (0.1)&lt;/p&gt;

&lt;p&gt;More factual&lt;/p&gt;

&lt;p&gt;More stable&lt;/p&gt;

&lt;p&gt;Less creative&lt;/p&gt;

&lt;p&gt;Medium Temperature (0.5)&lt;/p&gt;

&lt;p&gt;Balanced output&lt;/p&gt;

&lt;p&gt;High Temperature (0.9)&lt;/p&gt;

&lt;p&gt;More creative&lt;/p&gt;

&lt;p&gt;More imaginative&lt;/p&gt;

&lt;p&gt;Higher chance of hallucination&lt;/p&gt;

&lt;p&gt;Temperature does not directly control truth.&lt;br&gt;
It controls randomness.&lt;/p&gt;
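&lt;p&gt;One common way temperature is applied is by rescaling the model's raw scores before turning them into probabilities. A minimal sketch, with invented candidate words and scores:&lt;/p&gt;

```python
import math

# Toy illustration: temperature rescales raw scores (logits) before
# the softmax. The candidate words and scores are invented.
logits = {"cat": 2.0, "dog": 1.0, "lion": 0.1}

def softmax_with_temperature(scores, temperature):
    scaled = {w: math.exp(s / temperature) for w, s in scores.items()}
    total = sum(scaled.values())
    return {w: v / total for w, v in scaled.items()}

low = softmax_with_temperature(logits, 0.1)   # nearly all mass on "cat"
high = softmax_with_temperature(logits, 0.9)  # probabilities spread out
print(low["cat"], high["cat"])
```

&lt;p&gt;At low temperature the top word dominates, so output is stable; at high temperature the probabilities flatten, so sampling becomes more varied, and more random.&lt;/p&gt;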

&lt;p&gt;SLM vs LLM&lt;br&gt;
Not every problem needs a huge model.&lt;br&gt;
Sometimes we only need a smaller specialized model.&lt;/p&gt;

&lt;p&gt;SLM – Small Language Model&lt;br&gt;
SLM stands for Small Language Model&lt;br&gt;
It is trained for:&lt;/p&gt;

&lt;p&gt;speech-to-text&lt;/p&gt;

&lt;p&gt;customer support bots&lt;/p&gt;

&lt;p&gt;voice assistants&lt;/p&gt;

&lt;p&gt;domain-specific tasks&lt;/p&gt;

&lt;p&gt;It may have millions of parameters instead of billions.&lt;br&gt;
It is smaller, faster, and cheaper.&lt;/p&gt;

&lt;p&gt;LLM – Large Language Model&lt;br&gt;
LLM stands for Large Language Model&lt;br&gt;
It has:&lt;/p&gt;

&lt;p&gt;billions of parameters&lt;/p&gt;

&lt;p&gt;knowledge from many domains&lt;/p&gt;

&lt;p&gt;It is a generalized model.&lt;br&gt;
Examples:&lt;/p&gt;

&lt;p&gt;GPT&lt;/p&gt;

&lt;p&gt;Gemini&lt;/p&gt;

&lt;p&gt;Claude&lt;/p&gt;

&lt;p&gt;Why Do We Need RAG?&lt;br&gt;
This is where everything becomes interesting.&lt;br&gt;
Even powerful LLMs have major limitations:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Hallucination
They make up answers.&lt;/li&gt;
&lt;li&gt;Outdated Knowledge
Training data has a cutoff date.
They do not know new events automatically.&lt;/li&gt;
&lt;li&gt;No Private Knowledge
They cannot directly access:&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Company policies&lt;/p&gt;

&lt;p&gt;HR documents&lt;/p&gt;

&lt;p&gt;Internal reports&lt;/p&gt;

&lt;p&gt;Confluence&lt;/p&gt;

&lt;p&gt;Jira boards&lt;/p&gt;

&lt;p&gt;Business data&lt;/p&gt;

&lt;p&gt;This is where RAG solves the problem.&lt;/p&gt;

&lt;p&gt;What is RAG?&lt;br&gt;
RAG stands for:&lt;br&gt;
Retrieval-Augmented Generation&lt;br&gt;
It combines two steps:&lt;br&gt;
Retrieval&lt;br&gt;
First, the system searches and retrieves relevant information from external sources.&lt;br&gt;
Examples:&lt;/p&gt;

&lt;p&gt;PDFs&lt;/p&gt;

&lt;p&gt;Documents&lt;/p&gt;

&lt;p&gt;Databases&lt;/p&gt;

&lt;p&gt;Internal company files&lt;/p&gt;

&lt;p&gt;Knowledge bases&lt;/p&gt;

&lt;p&gt;Generation&lt;br&gt;
Then the LLM uses that retrieved information to generate the final answer.&lt;/p&gt;

&lt;p&gt;Simple Understanding&lt;br&gt;
Instead of:&lt;br&gt;
Answering only from memory&lt;br&gt;
RAG works like:&lt;br&gt;
Look up first → Then answer&lt;br&gt;
This is the real power of RAG.&lt;/p&gt;

&lt;p&gt;Where is Private Data Stored?&lt;br&gt;
Private data is usually stored inside a:&lt;br&gt;
Vector Database&lt;br&gt;
Examples:&lt;/p&gt;

&lt;p&gt;Confluence text&lt;/p&gt;

&lt;p&gt;Jira content&lt;/p&gt;

&lt;p&gt;HR policy documents&lt;/p&gt;

&lt;p&gt;Internal business documents&lt;/p&gt;

&lt;p&gt;These are not directly fed into the LLM.&lt;br&gt;
Instead, they are converted and stored intelligently.&lt;/p&gt;

&lt;p&gt;How Documents are Stored&lt;br&gt;
Documents are broken into smaller parts called:&lt;br&gt;
Chunks&lt;br&gt;
Usually:&lt;/p&gt;

&lt;p&gt;sentence groups&lt;/p&gt;

&lt;p&gt;paragraph chunks&lt;/p&gt;

&lt;p&gt;Not individual words.&lt;br&gt;
Because meaning comes from context.&lt;br&gt;
Not isolated words.&lt;/p&gt;

&lt;p&gt;My IELTS Example&lt;br&gt;
I personally relate this to IELTS preparation.&lt;br&gt;
Even if I memorize many English words,&lt;br&gt;
during speaking they may not fit properly into context.&lt;br&gt;
But if I memorize complete sentences,&lt;br&gt;
I can easily adjust the context while speaking.&lt;br&gt;
RAG works in the same way.&lt;br&gt;
It retrieves meaningful sentence chunks—not random words.&lt;br&gt;
This makes the answer much more natural and relevant.&lt;/p&gt;

&lt;p&gt;What is a Vector?&lt;br&gt;
A vector has:&lt;/p&gt;

&lt;p&gt;magnitude&lt;/p&gt;

&lt;p&gt;direction&lt;/p&gt;

&lt;p&gt;Each chunk is converted into a numerical vector.&lt;br&gt;
For example:&lt;br&gt;
A paragraph about:&lt;br&gt;
Apple&lt;br&gt;
becomes:&lt;br&gt;
P1 = [....700 dimensions....]&lt;br&gt;
A paragraph about:&lt;br&gt;
Doctor&lt;br&gt;
becomes:&lt;br&gt;
P2 = [....700 dimensions....]&lt;br&gt;
Now the system measures distance between vectors.&lt;br&gt;
Closer vectors = more related&lt;br&gt;
Farther vectors = less related&lt;br&gt;
This helps the system find relevant information quickly.&lt;/p&gt;
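&lt;p&gt;Measuring closeness between vectors is often done with cosine similarity. A minimal sketch, using invented 3-dimensional vectors instead of hundreds of dimensions:&lt;/p&gt;

```python
import math

# Toy illustration: closeness between chunk vectors via cosine similarity.
# Real embeddings have hundreds of dimensions; these 3-D values are invented.
p_apple  = [0.9, 0.8, 0.1]   # a paragraph about Apple
p_orange = [0.8, 0.9, 0.2]   # a paragraph about Orange
p_doctor = [0.1, 0.2, 0.9]   # a paragraph about Doctor

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine(p_apple, p_orange))  # high: related topics
print(cosine(p_apple, p_doctor))  # low: unrelated topics
```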

&lt;p&gt;Real-Life Example&lt;br&gt;
Take these words:&lt;/p&gt;

&lt;p&gt;Lemon&lt;/p&gt;

&lt;p&gt;Apple&lt;/p&gt;

&lt;p&gt;Orange&lt;/p&gt;

&lt;p&gt;Pear&lt;/p&gt;

&lt;p&gt;Doctor&lt;/p&gt;

&lt;p&gt;Fruit-related words stay close together.&lt;br&gt;
Doctor stays farther away.&lt;br&gt;
This is how relevance is understood.&lt;/p&gt;

&lt;p&gt;How Relevant Chunks are Found&lt;br&gt;
Algorithms used:&lt;br&gt;
ANN&lt;br&gt;
Approximate Nearest Neighbors&lt;br&gt;
KNN&lt;br&gt;
K-Nearest Neighbors&lt;br&gt;
These help quickly find the most relevant chunks.&lt;br&gt;
The same idea is used in:&lt;/p&gt;

&lt;p&gt;Spotify recommendations&lt;/p&gt;

&lt;p&gt;Amazon suggestions&lt;/p&gt;

&lt;p&gt;Netflix recommendations&lt;/p&gt;

&lt;p&gt;YouTube feed&lt;/p&gt;

&lt;p&gt;Social media recommendations&lt;/p&gt;

&lt;p&gt;Final RAG Flow&lt;br&gt;
User asks question&lt;br&gt;
↓&lt;br&gt;
System retrieves relevant chunks&lt;br&gt;
↓&lt;br&gt;
Retrieved context goes to LLM&lt;br&gt;
↓&lt;br&gt;
LLM generates grounded answer&lt;br&gt;
↓&lt;br&gt;
Better output with less hallucination&lt;/p&gt;
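&lt;p&gt;The flow above can be sketched end to end. Everything here is invented for illustration: the "knowledge base" is a plain list, a simple word-overlap score stands in for vector similarity, and the final step just prints the grounded prompt that would be sent to the LLM:&lt;/p&gt;

```python
# Toy end-to-end sketch of the RAG flow. The knowledge base, the
# word-overlap retriever (a stand-in for a vector DB), and the prompt
# template are all invented for illustration.
knowledge_base = [
    "Leave policy: employees get 20 paid leave days per year.",
    "Lions are large cats that live in groups called prides.",
    "The office cafeteria opens at 8 am on weekdays.",
]

def retrieve(question, docs, k=1):
    # Score each document by shared words with the question, keep top k.
    q_words = set(question.lower().split())
    def overlap(doc):
        return len(q_words.intersection(doc.lower().split()))
    return sorted(docs, key=overlap, reverse=True)[:k]

question = "How many paid leave days do employees get?"
context = retrieve(question, knowledge_base)

prompt = (
    "Answer using only this context:\n"
    f"{context[0]}\n"
    f"Question: {question}"
)
print(prompt)  # this grounded prompt is what the LLM would receive
```

&lt;p&gt;The model now answers from the retrieved policy text instead of from memory, which is exactly the "look up first, then answer" behaviour.&lt;/p&gt;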

&lt;p&gt;Final One-Line Summary&lt;br&gt;
LLM predicts&lt;br&gt;
Vector DB retrieves&lt;br&gt;
RAG provides context&lt;br&gt;
Better answers are generated&lt;/p&gt;

&lt;p&gt;This was my Day 1 understanding of RAG.&lt;br&gt;
And honestly, the best definition I found is this:&lt;br&gt;
RAG is simply giving the model the right information before asking it to answer.&lt;br&gt;
That single sentence changed everything for me.&lt;/p&gt;

</description>
      <category>beginners</category>
      <category>llm</category>
      <category>rag</category>
      <category>tutorial</category>
    </item>
  </channel>
</rss>
