<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Indumathi R</title>
    <description>The latest articles on DEV Community by Indumathi R (@indumathi__r).</description>
    <link>https://dev.to/indumathi__r</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3911149%2Fc5d7ba2c-0411-4699-8861-c8b3af1e2ded.png</url>
      <title>DEV Community: Indumathi R</title>
      <link>https://dev.to/indumathi__r</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/indumathi__r"/>
    <language>en</language>
    <item>
      <title>Day 6 - Embedding - RAG</title>
      <dc:creator>Indumathi R</dc:creator>
      <pubDate>Tue, 19 May 2026 17:43:37 +0000</pubDate>
      <link>https://dev.to/indumathi__r/day-6-embedding-rag-4enc</link>
      <guid>https://dev.to/indumathi__r/day-6-embedding-rag-4enc</guid>
      <description>&lt;p&gt;In the previous post, we saw what chunking is and the various methodologies of chunking. In this post, we are going to look at the next stage of the RAG pipeline: embedding. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What is Embedding?&lt;/strong&gt;&lt;br&gt;
For each chunk, an embedding model generates a vector. A vector is simply a list of numbers, and it denotes a point in a multi-dimensional space (not just three dimensions; usually hundreds or thousands). This process of turning text into vectors is called embedding. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why do we need to generate a list of numbers in the first place?&lt;/strong&gt;&lt;br&gt;
The whole idea of RAG is to enable &lt;strong&gt;semantic search&lt;/strong&gt;, i.e. search based on meaning.&lt;br&gt;
Let's consider the following word pairs:&lt;br&gt;
 1. Feline &amp;amp; cat&lt;br&gt;
 2. King &amp;amp; Queen &lt;br&gt;
Although the words in each pair are different, they are related in meaning. &lt;br&gt;
Now let's consider another term, &lt;strong&gt;similarity&lt;/strong&gt;: how close two items are in nature. Combining the two, we get &lt;strong&gt;semantic similarity&lt;/strong&gt;, which refers to how closely two items are related in terms of intent, meaning and context. So in RAG, text with similar meaning ends up as vectors that lie close to each other in the multi-dimensional space.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4e7w6tfzon5hy3sp67kx.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4e7w6tfzon5hy3sp67kx.webp" alt=" " width="735" height="751"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Vectors are generated for each chunk and stored in a vector DB. The user query is also converted to a vector. To return a relevant answer, the stored vectors that lie closest to the query vector are chosen, and the top n closest points are returned. Vectorisation is what lets us find and return relevant information by meaning. This answers our earlier question: why vectors? &lt;/p&gt;
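&lt;p&gt;Here is a minimal sketch of the retrieval step just described. It assumes the chunk vectors have already been computed. The 3-dimensional vectors, the chunk texts and the helper names (top_n, cosine_similarity) are hypothetical. Real embedding models produce hundreds or thousands of dimensions, and a vector DB would normally do this ranking for us.&lt;/p&gt;

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_n(query_vec, store, n=2):
    """Rank every stored chunk by similarity to the query and keep the best n."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in store.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [text for _, text in scored[:n]]


# Hypothetical chunk vectors (pretend an embedding model produced them)
store = {
    "Cats are small felines.": [0.9, 0.1, 0.0],
    "A feline hunts mice.": [0.8, 0.2, 0.1],
    "The king addressed the court.": [0.0, 0.1, 0.9],
}
query = [0.85, 0.15, 0.05]  # pretend embedding of "Tell me about cats"

print(top_n(query, store, n=2))  # the two cat-related chunks
```

&lt;p&gt;The "king" chunk points in a very different direction from the query, so it is left out of the top 2.&lt;/p&gt;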

&lt;p&gt;&lt;strong&gt;How are the vector points closest to the user query vector determined?&lt;/strong&gt;&lt;br&gt;
There are several metrics to determine this:&lt;br&gt;
    1. Cosine similarity&lt;br&gt;
    2. Euclidean distance&lt;br&gt;
The most commonly used is cosine similarity. Now you may have another question: why cosine? Why not sine or tangent?&lt;/p&gt;

&lt;p&gt;We basically need to find points that are close to each other, i.e. the distance between them should be small. If the angle between two vectors is small (and their lengths are similar), the distance between them will also be small. Cosine captures this notion of closeness through the angle. &lt;/p&gt;

&lt;p&gt;If the angle is close to 0 degrees, cos(0) is 1: the vectors point in almost the same direction and are highly related. If the angle is 90 degrees, cos(90) is 0: the vectors are not close and are unrelated. If the angle is 180 degrees, cos(180) is -1: the vectors point in opposite directions and are not related at all, so they should not be considered a match. &lt;/p&gt;
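&lt;p&gt;To see these three cases in numbers, here is a small sketch using made-up 2-D vectors. It applies the standard formula cos(theta) = (a . b) / (|a| * |b|), i.e. the dot product divided by the product of the vector lengths.&lt;/p&gt;

```python
import math


def cosine_similarity(a, b):
    """cos(theta) = (a . b) / (|a| * |b|)"""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


base = [1.0, 0.0]
print(cosine_similarity(base, [2.0, 0.0]))   # 0 degrees apart   -> 1.0
print(cosine_similarity(base, [0.0, 3.0]))   # 90 degrees apart  -> 0.0
print(cosine_similarity(base, [-1.0, 0.0]))  # 180 degrees apart -> -1.0
```

&lt;p&gt;The vector [2.0, 0.0] is twice as long as the base vector but still scores 1.0. Cosine similarity depends only on direction, not on length.&lt;/p&gt;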

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd5lf65a67wonotne5s7x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd5lf65a67wonotne5s7x.png" alt=" " width="800" height="273"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Sine does not provide a clear distinction: sin(0) and sin(180) are both 0, so it cannot tell us whether two vectors point the same way or in opposite directions. Tangent is worse: tan(90) is undefined (its values shoot towards infinity near 90 degrees), so its values are unbounded and hard to compare. Cosine goes smoothly from 1 (same direction) to -1 (opposite direction), which is why it is preferred. &lt;/p&gt;
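&lt;p&gt;You can check this comparison quickly with Python's math module. The sketch below prints sine and cosine at 0, 90 and 180 degrees. The values are rounded to hide tiny floating-point leftovers such as sin(pi) coming out as about 1e-16. It then shows tangent growing as the angle approaches 90 degrees.&lt;/p&gt;

```python
import math


def sin_cos(degrees):
    """Sine and cosine of an angle given in degrees, rounded to 3 places."""
    rad = math.radians(degrees)
    return round(math.sin(rad), 3), round(math.cos(rad), 3)


for angle in (0, 90, 180):
    s, c = sin_cos(angle)
    print(f"{angle:>3} deg  sin={s:5}  cos={c:5}")

# Tangent grows without bound near 90 degrees (it is undefined at exactly 90)
for angle in (45, 89, 89.9, 89.99):
    print(angle, round(math.tan(math.radians(angle)), 1))
```

&lt;p&gt;Sine is 0 at both 0 and 180 degrees. Cosine gives 1, 0 and -1, a distinct value for each case.&lt;/p&gt;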

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa2o48yy27bvelpe8qtl6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa2o48yy27bvelpe8qtl6.png" alt=" " width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So in essence, a vector is a list of numbers that denotes a point in an n-dimensional space. The dimension can be anywhere from 256 to 3000+, i.e. a single point is a list of 256 or more values. &lt;/p&gt;

&lt;p&gt;For the query vector, we can compute the distance between the query and every stored vector and keep the k closest. This exhaustive approach is the KNN (k-nearest neighbours) algorithm. If the data is really huge and we can't afford to compute the distance to every point, we can search only a subset of candidate points and accept approximately nearest results. This is called ANN (approximate nearest neighbours). This is all about the need for vectorisation.&lt;/p&gt;
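&lt;p&gt;Below is a minimal sketch contrasting the two approaches. It assumes unit-length vectors, so the dot product equals cosine similarity. For the approximate search it uses random-hyperplane hashing (a form of locality-sensitive hashing), which is one simple ANN technique among several. Real vector DBs often use other index structures, such as HNSW graphs. The data is random, and all names (knn_exact, ann_lsh, signature, build_index) are hypothetical.&lt;/p&gt;

```python
import random

rng = random.Random(42)  # fixed seed so the run is reproducible
DIM, NUM_PLANES = 16, 4


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def random_unit_vector():
    v = [rng.gauss(0, 1) for _ in range(DIM)]
    norm = dot(v, v) ** 0.5
    return [x / norm for x in v]


def signature(vec, planes):
    """One bit per random hyperplane: which side of the plane the vector falls on."""
    return tuple(1 if dot(vec, p) >= 0 else 0 for p in planes)


def build_index(vectors, planes):
    """Group vector ids into buckets that share the same signature."""
    buckets = {}
    for idx, vec in enumerate(vectors):
        buckets.setdefault(signature(vec, planes), []).append(idx)
    return buckets


def knn_exact(query, vectors, k):
    """KNN: score every stored vector against the query."""
    ids = sorted(range(len(vectors)), key=lambda i: dot(query, vectors[i]), reverse=True)
    return ids[:k]


def ann_lsh(query, vectors, buckets, planes, k):
    """ANN: score only the vectors in the query's bucket (may miss true neighbours)."""
    candidates = buckets.get(signature(query, planes), [])
    ids = sorted(candidates, key=lambda i: dot(query, vectors[i]), reverse=True)
    return ids[:k]


vectors = [random_unit_vector() for _ in range(1000)]
planes = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_PLANES)]
buckets = build_index(vectors, planes)

query = vectors[0]  # search with a stored vector so we know its best match
print("exact :", knn_exact(query, vectors, 3))
print("approx:", ann_lsh(query, vectors, buckets, planes, 3))
print("scored by ANN:", len(buckets[signature(query, planes)]), "of", len(vectors))
```

&lt;p&gt;With 4 hyperplanes there are at most 16 buckets, so the approximate search scores only a fraction of the 1000 vectors. The trade-off is that a true neighbour sitting just across a hyperplane lands in a different bucket and is missed. That is why ANN results are only approximate.&lt;/p&gt;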

&lt;p&gt;&lt;strong&gt;Now, let's see how we can choose an embedding model&lt;/strong&gt;&lt;br&gt;
 Some common categories for choosing an embedding model are:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. By query type&lt;/strong&gt;&lt;br&gt;
   &lt;em&gt;a. Symmetric model&lt;/em&gt;&lt;br&gt;
      The search query is similar in form and length to the stored documents. &lt;br&gt;
Example: if I ask for other news articles similar to one that I provide, we can use this kind of model, e.g. "Return news articles similar to the one where the PM asks people not to buy gold." &lt;br&gt;
      Ex: nomic-embed-text, Qwen3&lt;/p&gt;

&lt;p&gt;&lt;em&gt;b. Asymmetric model&lt;/em&gt;&lt;br&gt;
       A short query is matched against longer documents. &lt;br&gt;
Ex: HR documents are stored, and we ask a query like "How many leaves are allowed?". For this, we can go with this model type.&lt;br&gt;
       Ex: Gemini&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. By Retrieval type&lt;/strong&gt;&lt;br&gt;
  &lt;em&gt;a. Dense embedding&lt;/em&gt;&lt;br&gt;
      For better semantic understanding, we can go with this type of model. &lt;br&gt;
   Ex: Cohere Embed models (gpt-oss-120b, by contrast, is a text-generation LLM rather than an embedding model)&lt;/p&gt;

&lt;p&gt;&lt;em&gt;b. Sparse embedding&lt;/em&gt;&lt;br&gt;
        Does an exact keyword search. It has no semantic understanding at all. &lt;br&gt;
   Ex: BM25. This is based on term frequency (TF) and inverse document frequency (IDF).&lt;br&gt;
     Term frequency: how often a word occurs in a text. Raw counts can be gamed by someone spamming the same word over and over; BM25 dampens this by making each extra occurrence count less. &lt;br&gt;
    Inverse document frequency: how rare, and therefore how informative, a word is across all the documents. Words that appear in almost every document get a low weight. &lt;br&gt;
    Ex: "is" and "and" are repeated everywhere but are not very important. &lt;/p&gt;
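&lt;p&gt;Here is a minimal sketch of BM25 scoring in plain Python. It assumes a toy whitespace tokenizer and made-up HR documents. It uses the usual parameters k1 (how quickly repeated terms saturate) and b (document-length normalisation) with commonly chosen values. It also uses the non-negative IDF variant log(1 + (N - n + 0.5) / (n + 0.5)); other BM25 variants differ slightly in the IDF formula.&lt;/p&gt;

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query using Okapi BM25."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # IDF: rare terms weigh more; terms found in most documents weigh less
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # TF part saturates (k1), so repeating a word has diminishing returns;
            # b scales the penalty for documents longer than average (avgdl)
            freq = tf[term]
            length_norm = 1 - b + b * len(tokens) / avgdl
            score += idf * freq * (k1 + 1) / (freq + k1 * length_norm)
        scores.append(score)
    return scores


docs = [
    "employees get 24 leaves per year",
    "the office is open on weekdays and the canteen is open",
    "leaves must be approved by the manager",
]
print(bm25_scores("how many leaves", docs))
```

&lt;p&gt;The middle document scores 0 because it shares no words with the query. "how" and "many" contribute nothing because no document contains them. The shorter first document edges out the third because of the length normalisation controlled by b.&lt;/p&gt;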

&lt;p&gt;We can also use transformer models to generate embeddings. &lt;br&gt;
&lt;code&gt;Transformers are made up of encoders and decoders. LLMs are built on top of transformers.&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Sometimes, if the document data is large, many vectors may lie close to the query point. As a result, many vector points are returned, and the accuracy of the generated answer may drop. We need to keep this in mind while preparing documents. &lt;/p&gt;

</description>
      <category>ai</category>
      <category>nlp</category>
      <category>rag</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Day 5 - Chunking continued - RAG</title>
      <dc:creator>Indumathi R</dc:creator>
      <pubDate>Fri, 15 May 2026 17:13:11 +0000</pubDate>
      <link>https://dev.to/indumathi__r/day-5-chunking-continued-rag-78n</link>
      <guid>https://dev.to/indumathi__r/day-5-chunking-continued-rag-78n</guid>
      <description>&lt;p&gt;&lt;strong&gt;Sliding window chunking&lt;/strong&gt;&lt;br&gt;
    To understand this method, we need to know about two parameters, &lt;strong&gt;window size&lt;/strong&gt; and &lt;strong&gt;step size&lt;/strong&gt;. Let's now see how with the help of these two parameters, sliding window chunking works. &lt;/p&gt;

&lt;p&gt;Consider the following :&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Sample text: &lt;br&gt;
&lt;code&gt;Redis is an open-source, in-memory data store that is primarily used as a cache, database, and message broker. Unlike traditional databases that store data on disk, Redis keeps data in memory (RAM), which makes data access extremely fast. It is commonly used in applications where high performance and low latency are critical, such as caching frequently accessed data, managing user sessions, real-time analytics, task queues, and messaging systems.&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Window size =15&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Step size =5&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The window starts at the first character. It takes the first 15 characters and stores them in chunk 1: &lt;br&gt;
&lt;code&gt;Redis is an ope&lt;/code&gt;. &lt;br&gt;
Now the window moves. How far it moves is decided by the step size. Since the step size is 5, the window moves 5 characters forward. From that new position, it takes the next 15 characters and stores them in chunk 2 (starting with the space after "Redis"): &lt;br&gt;
&lt;code&gt; is an open-sou&lt;/code&gt;&lt;/p&gt;
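&lt;p&gt;The window and step mechanics take only a few lines of Python. This is a character-based sketch matching the example (window size 15, step size 5). The function name is hypothetical, and real splitters usually count tokens or words instead of characters.&lt;/p&gt;

```python
def sliding_window_chunks(text, window_size=15, step_size=5):
    """Take window_size characters, then move the start forward by step_size."""
    chunks = []
    for start in range(0, len(text), step_size):
        chunks.append(text[start:start + window_size])
        if start + window_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


text = ("Redis is an open-source, in-memory data store that is primarily "
        "used as a cache, database, and message broker.")
chunks = sliding_window_chunks(text)
print(repr(chunks[0]))  # 'Redis is an ope'
print(repr(chunks[1]))  # ' is an open-sou'
print(len(chunks), "chunks")
```

&lt;p&gt;Because the step (5) is smaller than the window (15), each pair of consecutive chunks shares 10 characters. That is why sliding windows overlap so heavily.&lt;/p&gt;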

&lt;p&gt;Roughly, sliding window chunking looks like this. &lt;br&gt;
[Redis [is an [open-source], in-memory d[ata store] that is primarily used as a cache, database], and message broker. Unlike traditional]. &lt;/p&gt;

&lt;p&gt;Sliding window chunking is a form of overlapping chunking. In normal overlapping chunking, we carry over only about a quarter of the previous sentence. Sliding window chunking overlaps much more heavily. &lt;/p&gt;

&lt;p&gt;Overlapping chunking has a limitation: if the text contains two unrelated ideas, overlapping brings them close together, forcing a relationship that isn't there. This can produce absurd results, and sliding window chunking carries the same limitation. Token consumption is also higher. More chunks are generated, and every chunk has to be tokenised and processed by the embedding model.&lt;/p&gt;

&lt;p&gt;Another disadvantage of this approach is that, for a given query point, redundant results may be returned (as the same text is repeated across several chunks). &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where can sliding window chunking be used?&lt;/strong&gt;&lt;br&gt;
Sliding window chunking can be used when the pieces of data in a text are not closely related and we need to explicitly establish a relationship between them. In essence, it links less related items together. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Token based chunking&lt;/strong&gt;&lt;br&gt;
The input text is converted to tokens.&lt;br&gt;
&lt;code&gt;A single word, part of a word, or a character can be considered a token&lt;/code&gt; Each token is assigned a number (its ID in the tokenizer's vocabulary). These numbers are sent to the embedding model to generate vector points. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When can token based chunking be used?&lt;/strong&gt;&lt;br&gt;
We can choose this method when the embedding model is rate-limited or accepts only a limited number of tokens per request. It lets us send a fixed number of tokens at a time (say 100 or 200). It is not used much. &lt;/p&gt;
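&lt;p&gt;Here is a minimal sketch of token based chunking. It assumes a toy whitespace tokenizer and builds a vocabulary on the fly. Real embedding models use their own subword tokenizers with fixed vocabularies, so actual token counts and IDs will differ. The function names are hypothetical.&lt;/p&gt;

```python
def tokenize(text):
    # Toy tokenizer: one token per whitespace-separated word (an assumption)
    return text.split()


def build_vocab(tokens):
    """Assign each distinct token an integer ID in order of first appearance."""
    vocab = {}
    for tok in tokens:
        vocab.setdefault(tok, len(vocab))
    return vocab


def token_chunks(text, max_tokens=100):
    """Split text into chunks of at most max_tokens tokens each."""
    tokens = tokenize(text)
    return [" ".join(tokens[i:i + max_tokens]) for i in range(0, len(tokens), max_tokens)]


text = "Redis keeps data in memory so Redis is fast"
vocab = build_vocab(tokenize(text))
print([vocab[t] for t in tokenize(text)])  # [0, 1, 2, 3, 4, 5, 0, 6, 7]
print(token_chunks(text, max_tokens=4))    # ['Redis keeps data in', 'memory so Redis is', 'fast']
```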

&lt;p&gt;&lt;strong&gt;TOON (Token-Oriented Object Notation)&lt;/strong&gt;&lt;br&gt;
  This notation was introduced to send JSON to an LLM in a more compact form. But it is not very effective. &lt;/p&gt;

&lt;p&gt;Some of the commonly used chunking methodologies are covered in this and the previous post. There isn't a one-size-fits-all chunking method; it varies based on our use case and dataset. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Converting Documents to chunks&lt;/strong&gt;&lt;br&gt;
Some tools convert documents to text so that they can be split into proper chunks. PDFs cannot be processed as-is.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. PyPDFLoader from LangChain&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;2. pypdf&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;3. MuPDF&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;4. Tesseract (OCR, for documents containing scanned pages)&lt;/strong&gt; etc...&lt;/p&gt;

&lt;p&gt;Here too, there is no single best tool or package for processing PDFs; it depends on the document data. A few tools can handle special elements like tables. First we detect the table (for example with regular expressions based on the spacing before and after it), and then the entire table is converted into one chunk. We can also use tools like &lt;strong&gt;Camelot&lt;/strong&gt; to process tabular data. Sometimes a document also contains images, but in a vector DB it is quite difficult to link images and textual data together. This is all about chunking methodologies. &lt;/p&gt;

</description>
      <category>ai</category>
      <category>nlp</category>
      <category>rag</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Day 4 - Chunking continued - RAG</title>
      <dc:creator>Indumathi R</dc:creator>
      <pubDate>Tue, 12 May 2026 02:39:09 +0000</pubDate>
      <link>https://dev.to/indumathi__r/day-4-chunking-continued-rag-29kf</link>
      <guid>https://dev.to/indumathi__r/day-4-chunking-continued-rag-29kf</guid>
      <description>&lt;p&gt;&lt;strong&gt;Semantic Chunking&lt;/strong&gt;&lt;br&gt;
Let's consider two paragraphs, A and B, about strings in Python. Para A focuses on typecasting and para B focuses on accessing characters. These two paragraphs are not closely related, but if I use overlapping chunking, their points will be pulled closer to each other. We do not want to forcefully bring the two paragraphs together. Semantic chunking can solve this problem. &lt;/p&gt;

&lt;p&gt;It keeps adding sentences to a chunk as long as they are relevant to each other. It takes the first sentence and, since there is nothing to compare it with, adds it to a chunk. Next, it takes the second sentence and compares it with the previous sentence. If the relevancy score is &amp;gt; 0.75, the second sentence is added to the chunk. Then the next sentence is taken and compared with the sentence before it. If the score is &amp;lt; 0.75, a new chunk is started with it; otherwise, it is added to the current chunk. Packages such as nltk can help with semantic chunking, for example by splitting the text into sentences. &lt;/p&gt;
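&lt;p&gt;Here is a sketch of the steps above. The embed function is a hypothetical stand-in for a real embedding model (it just counts words). The sentences are assumed to be already split (for example with nltk's sent_tokenize). With a real model, the similarity score would reflect meaning rather than shared words.&lt;/p&gt;

```python
import math
import re
from collections import Counter


def embed(sentence):
    # Hypothetical stand-in for an embedding model: a bag-of-words count vector
    return Counter(re.findall(r"[a-z]+", sentence.lower()))


def cosine(a, b):
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def semantic_chunks(sentences, threshold=0.75):
    chunks = [[sentences[0]]]  # first sentence: nothing to compare it with
    for prev, curr in zip(sentences, sentences[1:]):
        if cosine(embed(prev), embed(curr)) > threshold:
            chunks[-1].append(curr)  # related enough: keep in the current chunk
        else:
            chunks.append([curr])    # relevancy dropped: start a new chunk
    return [" ".join(chunk) for chunk in chunks]


sentences = [
    "Python strings can be cast to int.",
    "Python strings can be cast to float.",
    "Characters are accessed by index.",
]
for chunk in semantic_chunks(sentences):
    print(chunk)
```

&lt;p&gt;The first two sentences share 6 of their 7 words (similarity about 0.86, above 0.75), so they stay together. The third shares none, so it starts a new chunk.&lt;/p&gt;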

&lt;p&gt;&lt;strong&gt;Embedding Chunking&lt;/strong&gt;&lt;br&gt;
To find the relationship between the previous and current sentence, an LLM is used. The LLM calculates a number that indicates how closely the two sentences are related. &lt;/p&gt;

&lt;p&gt;There is no single best chunking methodology; the right choice depends on the dataset. We can use trial and error to find the methodology that suits us. &lt;/p&gt;

</description>
      <category>ai</category>
      <category>nlp</category>
      <category>python</category>
      <category>rag</category>
    </item>
    <item>
      <title>Day 3 - Chunking - RAG</title>
      <dc:creator>Indumathi R</dc:creator>
      <pubDate>Sun, 10 May 2026 09:24:48 +0000</pubDate>
      <link>https://dev.to/indumathi__r/day-3-chunking-rag-2a19</link>
      <guid>https://dev.to/indumathi__r/day-3-chunking-rag-2a19</guid>
      <description>&lt;p&gt;&lt;strong&gt;What is chunking?&lt;/strong&gt;&lt;br&gt;
It is one of the steps in the RAG pipeline: dividing a large document into several smaller parts. Each small part is called a chunk; chunking simply means dividing. Let's consider the following passage:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Redis  is a high-speed, in-memory data structure store that functions as a database, cache, message broker, and streaming engine. It is widely used for real-time applications because it keeps data in RAM rather than on disk, enabling sub-millisecond response times. Unlike traditional databases (like MySQL or PostgreSQL) that read from a hard drive, Redis operates in the computer's main memory, which is significantly faster.&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Suppose we give the whole passage to the embedding model. It will generate a single point (let's call it P1), which is stored in the vector DB. There is a problem with this approach. If I ask a query like "How does Redis function?", the intended answer is "database, cache, message broker, and streaming engine". However, since the entire passage is stored as a single point, the specific part can't be retrieved on its own, and the entire passage is returned. Chunking is what lets us retrieve only the relevant part and leave out the irrelevant parts. &lt;/p&gt;

&lt;p&gt;Chunking can be performed in two ways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Discrete chunking&lt;/li&gt;
&lt;li&gt;Semantic chunking&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;How small should a chunk be, i.e. what should the size of a chunk be?&lt;/strong&gt; &lt;br&gt;
Suppose I ask an LLM "How are you?" and it answers "The sun rises in the east". The statement is not wrong; it is just irrelevant to the question. An LLM won't simply say "I don't know"; it tries to make up some answer. Through chunking, we influence the context the LLM receives and therefore how it answers. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Discrete chunking&lt;/strong&gt;&lt;br&gt;
    Fixed logic is used to generate chunks. Let's see some types of discrete chunking:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fixed Chunking&lt;/strong&gt;&lt;br&gt;
If I set the size to 25 characters, each chunk will contain 25 characters (the last one may be shorter). The first 25 characters of a paragraph go into chunk 1, the next 25 into chunk 2, and so on. If we split the Redis passage into 25-character pieces, the first chunk would be &lt;code&gt;Redis  is a high-speed, i&lt;/code&gt; and the second chunk would be &lt;code&gt;n-memory data structure s&lt;/code&gt;. Looking at these chunks, we can see that the meaning of the words is lost due to splitting. What can we infer from the chunk &lt;code&gt;Redis  is a high-speed, i&lt;/code&gt;? The meaning is lost, right? &lt;/p&gt;
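&lt;p&gt;Fixed chunking is just slicing the text every 25 characters. Here is a minimal sketch using the Redis passage; the function name is hypothetical.&lt;/p&gt;

```python
PASSAGE = (
    "Redis  is a high-speed, in-memory data structure store that functions as a "
    "database, cache, message broker, and streaming engine. It is widely used for "
    "real-time applications because it keeps data in RAM rather than on disk, "
    "enabling sub-millisecond response times. Unlike traditional databases (like "
    "MySQL or PostgreSQL) that read from a hard drive, Redis operates in the "
    "computer's main memory, which is significantly faster."
)


def fixed_chunks(text, size=25):
    """Cut the text every `size` characters, ignoring word and sentence boundaries."""
    return [text[i:i + size] for i in range(0, len(text), size)]


for chunk in fixed_chunks(PASSAGE)[:3]:
    print(repr(chunk))
# 'Redis  is a high-speed, i'
# 'n-memory data structure s'
# 'tore that functions as a '
```

&lt;p&gt;Both "in-memory" and "store" get cut in half, which is exactly the loss of meaning described above.&lt;/p&gt;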

&lt;p&gt;&lt;strong&gt;How can we do better chunking here?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Instead of stopping at exactly 25 characters, we can take at least 25 characters and then continue until the sentence is complete (the next full stop). In this case, chunk 1 would be &lt;code&gt;Redis  is a high-speed, in-memory data structure store that functions as a database, cache, message broker, and streaming engine&lt;/code&gt;&lt;/p&gt;
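&lt;p&gt;Here is a minimal sketch of the "at least 25 characters, then up to the full stop" rule. It naively treats every "." as a sentence end, so abbreviations or decimal numbers would break it. The function name is hypothetical.&lt;/p&gt;

```python
TEXT = (
    "Redis  is a high-speed, in-memory data structure store that functions as a "
    "database, cache, message broker, and streaming engine. It is widely used for "
    "real-time applications because it keeps data in RAM rather than on disk, "
    "enabling sub-millisecond response times."
)


def sentence_extended_chunks(text, min_size=25):
    """Take at least min_size characters, then keep going until the next full stop."""
    chunks = []
    start = 0
    while len(text) > start:
        # Look for a full stop at or after the min_size-th character of this chunk
        stop = text.find(".", start + min_size - 1)
        end = len(text) if stop == -1 else stop + 1
        chunk = text[start:end].strip()
        if chunk:
            chunks.append(chunk)
        start = end
    return chunks


for chunk in sentence_extended_chunks(TEXT):
    print(chunk)
```

&lt;p&gt;Each chunk now ends at a sentence boundary, so no word is cut in half.&lt;/p&gt;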

&lt;p&gt;&lt;strong&gt;Overlapping chunks&lt;/strong&gt;&lt;br&gt;
As the name suggests, words are overlapped between chunks. Consider the first sentence, &lt;code&gt;Redis  is a high-speed, in-memory data structure store that functions as a database, cache, message broker, and streaming engine&lt;/code&gt;, and the second sentence, &lt;code&gt;It is widely used for real-time applications because it keeps data in RAM rather than on disk, enabling sub-millisecond response times&lt;/code&gt;. If overlapping chunking is applied, a few words from the end of the previous sentence are added to the start of the next chunk, i.e.&lt;br&gt;
Chunk 1 would be &lt;code&gt;Redis  is a high-speed, in-memory data structure store that functions as a database, cache, message broker, and streaming engine&lt;/code&gt; and Chunk 2 would be &lt;code&gt;database, cache, message broker, and streaming engine. It is widely used for real-time applications because it keeps data in RAM rather than on disk, enabling sub-millisecond response times&lt;/code&gt;. &lt;/p&gt;
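&lt;p&gt;Here is a minimal sketch of sentence-level overlapping. It assumes sentences end with "." and carries over the last 7 words of the previous sentence, which reproduces the example above. The number of words to overlap is a tunable choice, and the function name is hypothetical.&lt;/p&gt;

```python
TEXT = (
    "Redis  is a high-speed, in-memory data structure store that functions as a "
    "database, cache, message broker, and streaming engine. It is widely used for "
    "real-time applications because it keeps data in RAM rather than on disk, "
    "enabling sub-millisecond response times."
)


def overlapping_chunks(text, overlap_words=7):
    """One chunk per sentence, prefixed with the last few words of the previous one."""
    sentences = [s.strip() + "." for s in text.split(".") if s.strip()]
    chunks = [sentences[0]]
    for prev, curr in zip(sentences, sentences[1:]):
        tail = " ".join(prev.split()[-overlap_words:])  # words carried over
        chunks.append(tail + " " + curr)
    return chunks


for chunk in overlapping_chunks(TEXT):
    print(chunk)
```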

&lt;p&gt;Sometimes points can be plotted far from each other even though their texts are closely related. Overlapping chunking reduces this to some extent. &lt;/p&gt;

</description>
      <category>ai</category>
      <category>beginners</category>
      <category>rag</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Day 2 - RAG - What is Vector DB ?</title>
      <dc:creator>Indumathi R</dc:creator>
      <pubDate>Fri, 08 May 2026 02:13:27 +0000</pubDate>
      <link>https://dev.to/indumathi__r/day-2-rag-what-is-vector-db--527m</link>
      <guid>https://dev.to/indumathi__r/day-2-rag-what-is-vector-db--527m</guid>
      <description>&lt;p&gt;To recall, integrating our private documents with an LLM is called RAG. &lt;/p&gt;

&lt;p&gt;Let's assume that we have some PDFs containing our data. The data in the PDFs is broken down into chunks based on some criteria. Each chunk is fed as input to a model, more specifically an embedding model, which generates a point. How is the point generated?&lt;/p&gt;

&lt;p&gt;Let's take a simple example:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Today is Wednesday&lt;/li&gt;
&lt;li&gt;Tomorrow is Thursday&lt;/li&gt;
&lt;li&gt;I am travelling today&lt;/li&gt;
&lt;li&gt;Wednesday is a nice series&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Let's now build a list of only the unique words from the above sentences (treating "Today" and "today" as the same word): &lt;br&gt;
&lt;code&gt;Today, is, Wednesday, Tomorrow, Thursday, I, am, travelling, a, nice, series&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;We now convert each of the 4 sentences into numbers by comparing it with the unique-word list. For each word in the list, we write 1 if the sentence contains that word, otherwise 0. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0&lt;br&gt;
0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0&lt;br&gt;
1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0&lt;br&gt;
0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This method of conversion is called &lt;strong&gt;one-hot encoding&lt;/strong&gt;. When it is applied to a whole sentence like this, the result is also known as a bag-of-words vector. &lt;/p&gt;
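&lt;p&gt;The encoding above can be reproduced with a few lines of Python. This sketch lowercases words so that "Today" and "today" count as the same word, which is the same assumption the rows above rely on.&lt;/p&gt;

```python
sentences = [
    "Today is Wednesday",
    "Tomorrow is Thursday",
    "I am travelling today",
    "Wednesday is a nice series",
]

# Vocabulary of unique words, in order of first appearance (case-insensitive)
vocab = []
for sentence in sentences:
    for word in sentence.lower().split():
        if word not in vocab:
            vocab.append(word)


def encode(sentence):
    """1 if the vocabulary word occurs in the sentence, otherwise 0."""
    words = set(sentence.lower().split())
    return [1 if w in words else 0 for w in vocab]


print(vocab)
for sentence in sentences:
    print(encode(sentence))
```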

&lt;p&gt;Now, coming to RAG: based on the context the model learned during training, it generates a point for each chunk. The generated point is multi-dimensional (x, y, z, a, ...). These points enable semantic search. What is semantic search? It is search based on meaning, and it tells us how closely two points are related. The model places each chunk's point according to its context, so related points appear together. &lt;/p&gt;

&lt;p&gt;A vector DB provides a place to store these points. When we query the data, it returns the related data. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How do we say that two points are close to each other?&lt;/strong&gt;&lt;br&gt;
When the distance between two points is small, we say that they are close to each other. With only two points, though, "close" has no reference, so we need another point to compare against. There are several metrics for the distance between points: Euclidean distance, cosine similarity and Manhattan distance. &lt;/p&gt;

&lt;p&gt;Let's take cosine similarity and see how it works:&lt;br&gt;
There are three points (p1, p2, p3) plotted in a graph. From the origin, a straight line is drawn to each point. To find which of p1 and p2 is closer to p3, we note the angle between p3's line and each of the other lines, and take the cosine of each angle. The smallest angle, i.e. the largest cosine, identifies the closest point. &lt;/p&gt;
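&lt;p&gt;Here is a small sketch of the three metrics on made-up 2-D points. It picks the point closest to p3 by cosine similarity, as described above. In this example all three metrics agree that p1 is closer. With real embeddings they can disagree, because cosine ignores vector length.&lt;/p&gt;

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Hypothetical 2-D points
points = {"p1": (4.0, 1.0), "p2": (1.0, 4.0)}
p3 = (3.0, 1.0)

for name, p in points.items():
    print(name,
          "euclidean:", round(euclidean(p, p3), 3),
          "manhattan:", manhattan(p, p3),
          "cosine:", round(cosine_similarity(p, p3), 3))

# Largest cosine = smallest angle = closest in direction
closest = max(points, key=lambda name: cosine_similarity(points[name], p3))
print("closest to p3:", closest)  # p1
```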

&lt;p&gt;Suppose there are 100 points. If I want to find the points nearest to a point named x, I need to calculate the distance between x and every other point; only then can I find the nearest ones. But this approach is time-consuming. &lt;/p&gt;
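&lt;p&gt;Here is a minimal sketch of that exhaustive search over 100 random 2-D points. The points and function name are hypothetical, and math.dist (Python 3.8+) computes the Euclidean distance. The cost grows with the number of stored points, which is what approximate nearest neighbour methods try to avoid.&lt;/p&gt;

```python
import math
import random

rng = random.Random(0)
points = [(rng.random(), rng.random()) for _ in range(100)]  # 100 random 2-D points
x = (0.5, 0.5)


def nearest_brute_force(x, points, k=5):
    """Compute the distance from x to every point, then keep the k closest."""
    distances = sorted((math.dist(x, p), i) for i, p in enumerate(points))
    return [i for _, i in distances[:k]], len(distances)


nearest, computed = nearest_brute_force(x, points)
print("nearest indices:", nearest)
print("distances computed:", computed)  # 100: one per stored point
```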

&lt;p&gt;So a pipeline for RAG is: data is given to an embedding model (e.g. nomic-embed-text), which generates a point (a mathematical representation of the data). This point is stored in a vector DB. Some examples of vector DBs and libraries are ChromaDB (general purpose), Pinecone, FAISS (a library for fast similarity search), Qdrant etc. &lt;/p&gt;

&lt;p&gt;When I ask a query, it is sent to the embedding model to generate a query point. The vector DB then returns the points (say 5) that are nearest to the query point. This is all about Vector DB.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>database</category>
      <category>llm</category>
      <category>rag</category>
    </item>
    <item>
      <title>Day 1 - RAG</title>
      <dc:creator>Indumathi R</dc:creator>
      <pubDate>Mon, 04 May 2026 04:11:00 +0000</pubDate>
      <link>https://dev.to/indumathi__r/day-1-rag-416l</link>
      <guid>https://dev.to/indumathi__r/day-1-rag-416l</guid>
      <description>&lt;p&gt;RAG stands for Retrieval Augmented Generation. Why do we even need RAG? To answer this, let's take a look at what LLMs and SLMs are. &lt;/p&gt;

&lt;p&gt;LLM (Large Language Model): data from many (generalized) categories is given as input, and a model is created from that data. What is a model? To understand this, let's take the mathematical equation of a straight line: &lt;/p&gt;

&lt;p&gt;&lt;code&gt;y = mx +c&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Let's take x values to be 1, 2, 3, 4, 5 and y values to be 2, 4, 6, 8, 10. We need to find the values of m and c that produce the desired y values (here m = 2 and c = 0). Instead of a simple linear equation, we can also consider quadratic, cubic or higher-order equations (with terms like x^2, x^3, etc.). When we say a model has 4B or 120B parameters, it refers to a huge equation with that many adjustable numbers. This mathematical equation is learned from the input data. Generally, the larger the model and the more data it is trained on, the more relevant and better its results will be. &lt;/p&gt;
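&lt;p&gt;To make the idea of a model "learning" m and c concrete, here is a least-squares fit of y = mx + c to the x and y values above. Training an LLM is far more involved, but it also adjusts parameters until the equation fits the data. The function name is hypothetical.&lt;/p&gt;

```python
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]


def fit_line(xs, ys):
    """Least-squares estimates of m (slope) and c (intercept) for y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    m = cov / var
    c = mean_y - m * mean_x
    return m, c


m, c = fit_line(xs, ys)
print(m, c)       # 2.0 0.0
print(m * 6 + c)  # prediction for x = 6: 12.0
```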

&lt;p&gt;LLMs predict the next word. If we give "hello", it may continue with "hello world". We can control whether the output is more factual or more imaginative using a setting called &lt;strong&gt;Temperature&lt;/strong&gt;. The lower the temperature, the more focused and predictable (factual) the output. The higher the temperature, the more random and imaginative it becomes. &lt;/p&gt;
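&lt;p&gt;Here is a sketch of how temperature is commonly applied. The model's raw scores (logits) for candidate next words are divided by the temperature, and a softmax then turns them into probabilities. The candidate words and scores below are made up for illustration.&lt;/p&gt;

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw next-word scores into probabilities, scaled by temperature."""
    scaled = [score / temperature for score in logits]
    biggest = max(scaled)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(s - biggest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for candidate words after "hello"
candidates = ["world", "there", "friend"]
logits = [3.0, 2.0, 1.0]

for temperature in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, temperature)
    print(temperature, dict(zip(candidates, (round(p, 3) for p in probs))))
```

&lt;p&gt;At 0.2, almost all the probability goes to "world", so the output is nearly deterministic. At 2.0 the probabilities are more spread out, so less likely words get picked more often.&lt;/p&gt;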

&lt;p&gt;&lt;code&gt;Temperature is meant for a single query&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;SLM (Small Language Model)&lt;br&gt;
  A small language model is not trained on vast amounts of data across all categories. Instead, it is trained on data from a specific domain to solve a set of tasks from that domain (like speech-to-text). &lt;/p&gt;

&lt;p&gt;&lt;code&gt;Think of it like this, LLMs are generic and SLMs are specific&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;If we ask an LLM a question covered by its training data, we get a good result. But if we ask a question outside the scope of its training data, it still tries to answer, i.e. it makes up an answer on its own. This is called &lt;strong&gt;hallucination&lt;/strong&gt;. (It won't say "I don't know" unless we explicitly prompt it to.) &lt;/p&gt;

&lt;p&gt;Analogy: let's take the GPT-OSS model (released around 2025). If we now ask the model about the Iran-Israel war, it won't know about it. The war happened after the data the model was trained on was collected (its knowledge cutoff).&lt;/p&gt;

&lt;p&gt;In the same way, think about this: in our company, we have data stored in docs, wikis, etc. Models out there (Gemini, Claude) don't know about it. If we could somehow link an LLM with our private data, we could use that LLM internally in our company or for personal use. This is called RAG, i.e. linking an LLM with our data and asking it questions about that data.&lt;/p&gt;

&lt;p&gt;One approach to make an LLM answer queries on private data is to train the LLM with that private data. This is one way, but not the only way. &lt;/p&gt;

&lt;p&gt;Another way is to upload the documents into a vector DB. Before going deeper into this, let's first see what a vector is: a quantity that has both direction and magnitude. For our purposes, we can think of a vector simply as a list of numbers that marks a point in space. &lt;/p&gt;

&lt;p&gt;We break the document into several chunks, convert them into points and plot them in a graph. Let's plot apple, orange, pear and doctor as points in a graph. Which two points are related here? Apple and doctor ("an apple a day keeps the doctor away"). But how closely are they related, and how do we find this? Two points are said to be close if the distance between them is small. This example is in 2D; real embeddings can have hundreds or thousands of dimensions. &lt;/p&gt;

&lt;p&gt;Why did we put doctor close to apple? Normally, a document is broken into chunks. These chunks are given to a model, which generates points based on the context it was trained on. Closer points are more related to each other. &lt;/p&gt;

&lt;p&gt;In essence, our private document is broken down into several chunks. For each chunk, a point is generated and stored in the vector DB. &lt;/p&gt;

&lt;p&gt;Analogy: ANN (Approximate Nearest Neighbour) is one of the algorithms used in platforms like Spotify to find relevance between items and suggest related items.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>rag</category>
    </item>
  </channel>
</rss>
