<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Ramya Perumal</title>
    <description>The latest articles on DEV Community by Ramya Perumal (@ramya_perumal_e93721ef2fa).</description>
    <link>https://dev.to/ramya_perumal_e93721ef2fa</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3900955%2F3e2feb4c-f889-4df6-b8ef-5b1a1ac619ca.png</url>
      <title>DEV Community: Ramya Perumal</title>
      <link>https://dev.to/ramya_perumal_e93721ef2fa</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ramya_perumal_e93721ef2fa"/>
    <language>en</language>
    <item>
      <title>RAG- Understanding of Embedding</title>
      <dc:creator>Ramya Perumal</dc:creator>
      <pubDate>Sun, 17 May 2026 22:43:09 +0000</pubDate>
      <link>https://dev.to/ramya_perumal_e93721ef2fa/rag-understanding-of-embedding-nlk</link>
      <guid>https://dev.to/ramya_perumal_e93721ef2fa/rag-understanding-of-embedding-nlk</guid>
      <description>&lt;h2&gt;
  
  
  What is Embedding?
&lt;/h2&gt;

&lt;p&gt;After text is split into chunks, the next step is embedding. In this step, each chunk is converted into a vector (a point in vector space) so that semantic search can be performed efficiently.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Do We Need to Convert Chunks into Vectors?
&lt;/h2&gt;

&lt;p&gt;The main goal of a RAG application is to achieve semantic search.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Semantic&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;Example&lt;/strong&gt;&lt;br&gt;
The word “feline” refers to the cat family, so it is closely related to “cat” even though the two words look different. Recognizing that relationship is called semantic understanding.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Similarity&lt;/strong&gt;&lt;br&gt;
When a user asks a query, semantically related chunks are returned even though the exact words in the chunks may be different.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Semantic Similarity&lt;/strong&gt;&lt;br&gt;
Semantic similarity combines:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Intent&lt;/li&gt;
&lt;li&gt;Context&lt;/li&gt;
&lt;li&gt;Meaning&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The purpose is to establish relationships between the user query and the documents stored in the RAG system. This allows the system to retrieve relevant information from the database and provide it to the LLM for further processing.&lt;/p&gt;

&lt;p&gt;Texts that are semantically related are usually placed closer together in the multi-dimensional vector space.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cosine Similarity&lt;/strong&gt;&lt;br&gt;
To determine how close vectors are to each other, cosine similarity is commonly used.&lt;/p&gt;

&lt;p&gt;When a user query arrives:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The query is converted into a vector&lt;/li&gt;
&lt;li&gt;Cosine similarity is calculated between the query vector and stored vectors&lt;/li&gt;
&lt;li&gt;The closest vectors are retrieved&lt;/li&gt;
&lt;/ol&gt;
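
&lt;p&gt;The three steps above can be sketched in plain Python. This is only a minimal illustration, not how a vector database implements search: the tiny three-dimensional vectors and the names cosine_similarity, top_k and stored are made up for the example, and the query is assumed to be embedded already.&lt;/p&gt;

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, stored, k=2):
    """Score every stored vector against the query and keep the k best."""
    scored = [(cosine_similarity(query_vec, vec), chunk_id)
              for chunk_id, vec in stored.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [chunk_id for _, chunk_id in scored[:k]]

# Hypothetical embeddings; real models produce hundreds of dimensions.
stored = {
    "cat": [0.9, 0.1, 0.0],
    "feline": [0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9],
}
query = [1.0, 0.0, 0.0]  # stands in for the embedded user query
print(top_k(query, stored))
```

&lt;p&gt;With these made-up vectors, the two animal-related entries rank above the unrelated one.&lt;/p&gt;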

&lt;h2&gt;
  
  
  Retrieval Methodologies
&lt;/h2&gt;

&lt;p&gt;Two major retrieval methodologies are used:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. KNN (K-Nearest Neighbors)&lt;/strong&gt;&lt;br&gt;
KNN compares the query vector with all stored vectors one by one to find the nearest neighbors.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Advantage&lt;/strong&gt;&lt;br&gt;
More accurate retrieval&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Disadvantage&lt;/strong&gt;&lt;br&gt;
Slow for very large datasets&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. ANN (Approximate Nearest Neighbors)&lt;/strong&gt;&lt;br&gt;
ANN approximately finds the nearest vectors instead of comparing every single point.&lt;/p&gt;

&lt;p&gt;This method is mainly used when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The document volume is huge&lt;/li&gt;
&lt;li&gt;Faster retrieval is required&lt;/li&gt;
&lt;li&gt;Time constraints exist&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;ANN improves retrieval speed while sacrificing a small amount of accuracy.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why Cosine Similarity Instead of Sine or Tangent?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Cosine similarity works effectively because:&lt;br&gt;
If two vectors point in nearly the same direction and are highly related, the cosine similarity value approaches 1. As the angle between the vectors increases, the value decreases (0 at 90°, -1 at 180°), meaning the vectors are less related.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why Not Sine or Tangent?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sine is close to 0 for small angles, so the most similar vectors would get the lowest score, and it is also 0 at 180°, so it cannot tell identical vectors from opposite ones&lt;/li&gt;
&lt;li&gt;Tangent is unbounded and jumps between very large positive and negative values near 90°&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These measurements are not stable for semantic comparison. Cosine is bounded between -1 and 1 and can be computed directly from the dot product of the two vectors, so it provides a more reliable way to measure semantic closeness.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Embedding Dimensions&lt;/strong&gt;&lt;br&gt;
Embedding models can generate vectors with dimensions ranging from 256 to 3000 or more.&lt;/p&gt;

&lt;p&gt;The dimension size depends on the embedding model and the amount of contextual information it captures.&lt;/p&gt;

&lt;p&gt;Generally:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Higher dimensions capture richer semantic information&lt;/li&gt;
&lt;li&gt;Lower dimensions are faster and cheaper but may lose context&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Types of Embedding Models&lt;/strong&gt;&lt;br&gt;
Choosing an embedding model completely depends on the application scenario.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Based on Query Type&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Symmetric Models&lt;/strong&gt;&lt;br&gt;
Symmetric embedding models are used when the query and the documents are similar in structure and length.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Examples&lt;/strong&gt;&lt;br&gt;
nomic-embed-text&lt;br&gt;
Qwen embeddings&lt;/p&gt;

&lt;p&gt;These are commonly used in semantic search systems.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Asymmetric Models&lt;/strong&gt;&lt;br&gt;
Asymmetric embedding models are used when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Queries are short&lt;/li&gt;
&lt;li&gt;Documents are long&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example&lt;/strong&gt;&lt;br&gt;
Google Gemini embedding models&lt;/p&gt;

&lt;p&gt;These models are optimized for retrieving long documents from short queries.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Based on Retrieval Type&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dense Embeddings&lt;/strong&gt;&lt;br&gt;
Dense embeddings mainly focus on semantic meaning.&lt;/p&gt;

&lt;p&gt;These embeddings generate dense vectors where most values contain meaningful information.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Examples&lt;/strong&gt;&lt;br&gt;
Cohere embedding models&lt;br&gt;
OpenAI embedding models (for example, text-embedding-3-small)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Advantage&lt;/strong&gt;&lt;br&gt;
Better semantic understanding&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sparse Embeddings&lt;/strong&gt;&lt;br&gt;
Sparse embeddings mainly focus on exact keyword matching.&lt;/p&gt;

&lt;p&gt;They commonly use the BM25 (Best Match 25) algorithm, which is based on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;TF (Term Frequency)&lt;/li&gt;
&lt;li&gt;IDF (Inverse Document Frequency)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  TF-IDF Concepts
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;TF (Term Frequency)&lt;/strong&gt;&lt;br&gt;
Measures how many times a word appears in a document.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;IDF (Inverse Document Frequency)&lt;/strong&gt;&lt;br&gt;
Measures how rare a word is across the entire document collection. Words that appear in almost every document are considered less important.&lt;/p&gt;
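
&lt;p&gt;As a rough illustration of the two ideas, the sketch below computes plain TF and IDF on a made-up three-document corpus. It is not BM25 itself, which adds smoothing and document-length normalization on top of these quantities; the function names and the corpus are assumptions for the example.&lt;/p&gt;

```python
import math

def term_frequency(term, doc_tokens):
    """Number of times the term appears in one document."""
    return doc_tokens.count(term)

def inverse_document_frequency(term, corpus):
    """log(N / df): terms found in fewer documents score higher."""
    containing = sum(1 for doc in corpus if term in doc)
    if containing == 0:
        return 0.0
    return math.log(len(corpus) / containing)

# Hypothetical corpus, already split into lowercase words.
corpus = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "the bird sang".split(),
]
for term in ("the", "cat", "bird"):
    tf = term_frequency(term, corpus[0])
    idf = inverse_document_frequency(term, corpus)
    print(term, tf, round(idf, 3), round(tf * idf, 3))
```

&lt;p&gt;The word "the" appears in every document, so its IDF is 0 and it contributes nothing to the score, no matter how often it occurs.&lt;/p&gt;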

&lt;h2&gt;
  
  
  Transformer Architecture
&lt;/h2&gt;

&lt;p&gt;The transformer architecture was a major breakthrough for LLMs.&lt;br&gt;
The original transformer mainly contains:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Encoder&lt;/li&gt;
&lt;li&gt;Decoder&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Encoder&lt;/strong&gt;&lt;br&gt;
The encoder converts input text into vector representations (embeddings) that capture its context.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Decoder&lt;/strong&gt;&lt;br&gt;
The decoder turns those representations back into human-readable text, generating it one token at a time.&lt;/p&gt;

&lt;p&gt;This architecture enables modern LLMs to understand and generate natural language effectively.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Choosing a Vector Database&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Chroma&lt;/strong&gt;&lt;br&gt;
Open source&lt;br&gt;
Easy to set up&lt;br&gt;
Suitable for basic and small-scale applications&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;FAISS&lt;/strong&gt;&lt;br&gt;
A similarity-search library rather than a full database&lt;br&gt;
Better for large document collections&lt;br&gt;
Optimized for high-performance similarity search&lt;br&gt;
Commonly used in production-scale retrieval systems&lt;/p&gt;

</description>
      <category>ai</category>
      <category>rag</category>
      <category>python</category>
      <category>beginners</category>
    </item>
    <item>
      <title>RAG - Sliding Window, Token Based Chunking and PDF Chunking Packages</title>
      <dc:creator>Ramya Perumal</dc:creator>
      <pubDate>Thu, 14 May 2026 23:25:36 +0000</pubDate>
      <link>https://dev.to/ramya_perumal_e93721ef2fa/rag-sliding-window-token-based-chunking-and-pdf-chunking-packages-18nd</link>
      <guid>https://dev.to/ramya_perumal_e93721ef2fa/rag-sliding-window-token-based-chunking-and-pdf-chunking-packages-18nd</guid>
      <description>&lt;h2&gt;
  
  
  &lt;strong&gt;Sliding Window Chunking&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Sliding window chunking is a more intensive chunking mechanism because consecutive chunks overlap heavily.&lt;/p&gt;

&lt;p&gt;In this method, a window size is defined based on a character or token limit. Instead of creating completely separate chunks, the window moves forward gradually while keeping part of the previous content.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff3gx1hccb28qu5t5twi4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff3gx1hccb28qu5t5twi4.png" alt=" " width="626" height="251"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The character or token limit is called the &lt;strong&gt;window size&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;The amount the window moves forward each time is called the &lt;strong&gt;step size&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is a stricter form of overlapping chunking.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How it Works&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Suppose:&lt;/p&gt;

&lt;p&gt;Window size = 500 characters&lt;br&gt;
Step size = 100 characters&lt;/p&gt;

&lt;p&gt;The first chunk may contain characters 1–500.&lt;br&gt;
The second chunk starts after moving 100 characters and may contain characters 101–600.&lt;/p&gt;

&lt;p&gt;Because of this overlap, related information is repeatedly included across chunks.&lt;/p&gt;
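
&lt;p&gt;The window and step sizes above can be sketched as a small character-based function. The function name and the handling of the final window are assumptions for illustration; Python indexes from 0, so "characters 1–500" corresponds to the slice 0:500.&lt;/p&gt;

```python
def sliding_window_chunks(text, window_size=500, step_size=100):
    """Return overlapping chunks; each window starts step_size after the last."""
    # Last position where a full window still fits (0 for short text).
    last_start = max(len(text) - window_size, 0)
    starts = list(range(0, last_start + 1, step_size))
    if starts[-1] != last_start:
        starts.append(last_start)  # extra window aligned to the end, so no tail is lost
    return [text[start:start + window_size] for start in starts]

# Hypothetical 1000-character document made of repeating digits.
text = "".join(str(i % 10) for i in range(1000))
chunks = sliding_window_chunks(text)
print(len(chunks), len(chunks[0]))
```

&lt;p&gt;Here a 1000-character text produces six chunks of 500 characters, so most characters are embedded several times. That is the source of the extra token cost discussed in the disadvantages.&lt;/p&gt;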

&lt;p&gt;&lt;strong&gt;Benefits&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The major benefit of this method is that semantically related points are stored closer together in the vector database, almost like clusters. This improves retrieval in scenarios where context changes frequently.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Disadvantages&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;Problem 1: Higher Token Consumption&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Since overlapping data is repeatedly embedded, the embedding model consumes more tokens. This increases API cost with hosted embedding models and compute time with local ones.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Problem 2: Duplicate Retrieval&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Because related chunks are stored very close together, the retriever may return multiple duplicate or nearly identical chunks instead of different contexts.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzgegxk8g0uzqkaoyoyl0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzgegxk8g0uzqkaoyoyl0.png" alt=" " width="626" height="397"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As a result:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Context diversity decreases&lt;/li&gt;
&lt;li&gt;Token usage increases&lt;/li&gt;
&lt;li&gt;Retrieval efficiency may drop&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Where Sliding Window Chunking is Useful&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Sliding window chunking is useful in scenarios where context switching happens frequently.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example: Source Code&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In coding-related datasets:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Different parts of the code may not be directly related&lt;/li&gt;
&lt;li&gt;One service or module may trigger another service indirectly&lt;/li&gt;
&lt;li&gt;Important context may exist across multiple sections&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For example, in microservices architecture:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;One service event may trigger another service&lt;/li&gt;
&lt;li&gt;Related logic may exist in different files or services&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Sliding window chunking helps preserve such relationships, even though it comes with higher token consumption.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Token Based Chunking&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Token-based chunking mainly focuses on cost optimization and model limitations.&lt;/p&gt;

&lt;p&gt;LLMs process text as tokens rather than words.&lt;/p&gt;

&lt;p&gt;Depending on the tokenizer and model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;One word may become a single token&lt;/li&gt;
&lt;li&gt;One word may become multiple tokens&lt;/li&gt;
&lt;li&gt;Sometimes even a single character can become a token&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Since models have token input limits, token-based chunking is used to ensure the content stays within the allowed token size.&lt;/p&gt;

&lt;p&gt;In this method:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Text is split based on token count&lt;/li&gt;
&lt;li&gt;Chunks are converted into embedding vectors&lt;/li&gt;
&lt;li&gt;Vectors are stored in the vector database&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This method is mainly used when working with strict token constraints.&lt;/p&gt;
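
&lt;p&gt;The splitting step can be sketched as follows. One loud assumption: whitespace splitting stands in for a real tokenizer here, so the counts will differ from what an actual model tokenizer reports. The function name and the sample text are made up for the example.&lt;/p&gt;

```python
def chunk_by_tokens(text, max_tokens=8):
    """Split text into chunks of at most max_tokens tokens."""
    # Assumption: whitespace splitting stands in for the model's tokenizer.
    tokens = text.split()
    return [" ".join(tokens[i:i + max_tokens])
            for i in range(0, len(tokens), max_tokens)]

text = "LLMs process text as tokens rather than words so limits are counted in tokens"
for chunk in chunk_by_tokens(text, max_tokens=5):
    print(chunk)
```

&lt;p&gt;In a real pipeline the same loop would run over the token ids produced by the tokenizer of the target model, and each group of ids would be decoded back to text before embedding.&lt;/p&gt;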

&lt;p&gt;&lt;strong&gt;TOON (Token-Oriented Object Notation)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;TOON stands for Token-Oriented Object Notation.&lt;/p&gt;

&lt;p&gt;It is an alternative representation format designed to reduce token usage compared to JSON.&lt;/p&gt;

&lt;p&gt;JSON is human-readable, but its repeated structures and keys increase token consumption.&lt;/p&gt;

&lt;p&gt;TOON reduces repeated keys and represents the same information in a more token-efficient format.&lt;/p&gt;

&lt;p&gt;The purpose is to reduce embedding and inference cost while preserving context.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;LLMLingua&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;LLMLingua is a framework used for prompt compression.&lt;/p&gt;

&lt;p&gt;It converts user queries or prompts into simplified versions while preserving the original meaning and context.&lt;/p&gt;

&lt;p&gt;The main goal is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reduce token consumption&lt;/li&gt;
&lt;li&gt;Lower inference cost&lt;/li&gt;
&lt;li&gt;Improve efficiency&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;However, aggressive compression may sometimes reduce retrieval or answer quality compared to using the original, uncompressed JSON or text.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Summary of Chunking Methods&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The commonly used chunking methods are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fixed Chunking&lt;/li&gt;
&lt;li&gt;Overlapping Chunking&lt;/li&gt;
&lt;li&gt;Semantic Chunking&lt;/li&gt;
&lt;li&gt;Embedding-Based Chunking&lt;/li&gt;
&lt;li&gt;Sliding Window Chunking&lt;/li&gt;
&lt;li&gt;Token-Based Chunking&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These methods represent different approaches and trade-offs.&lt;/p&gt;

&lt;p&gt;In real-world applications, multiple chunking methods are often combined depending on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Dataset type&lt;/li&gt;
&lt;li&gt;Retrieval quality&lt;/li&gt;
&lt;li&gt;Cost constraints&lt;/li&gt;
&lt;li&gt;Token limitations&lt;/li&gt;
&lt;li&gt;Application requirements&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There is no single chunking strategy that works best for every dataset.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;PDF Reading in RAG Systems&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;To process documents such as company internal communication files, PDFs must first be converted into readable text.&lt;/p&gt;

&lt;p&gt;Several libraries are commonly used for this purpose, either directly or through LangChain document loaders:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;PyPDFLoader&lt;/li&gt;
&lt;li&gt;PyPDF&lt;/li&gt;
&lt;li&gt;PyMuPDF&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Different document types require different processing approaches. A single package may not work effectively for all document formats.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Challenges in PDF Processing&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Documents may contain:&lt;/p&gt;

&lt;p&gt;Scanned images&lt;br&gt;
Multi-column layouts&lt;br&gt;
Tables&lt;br&gt;
Handwritten text&lt;br&gt;
Two-sided scanned pages&lt;/p&gt;

&lt;p&gt;Because of this, preprocessing becomes an important step.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tools Used in PDF Processing&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;Camelot&lt;/strong&gt;&lt;br&gt;
Camelot is commonly used to extract table content from PDFs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tesseract&lt;/strong&gt;&lt;br&gt;
Tesseract (an OCR engine) or computer vision models are used to convert scanned images into readable text.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Final RAG Flow for Documents&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Raw documents are collected&lt;/li&gt;
&lt;li&gt;Images, tables, and scanned content are converted into text&lt;/li&gt;
&lt;li&gt;Data is cleaned &lt;/li&gt;
&lt;li&gt;Documents are split into chunks using chunking methods&lt;/li&gt;
&lt;li&gt;Chunks are converted into embedding vectors&lt;/li&gt;
&lt;li&gt;Vectors are stored in the vector database for retrieval&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>ai</category>
      <category>beginners</category>
      <category>rag</category>
    </item>
    <item>
      <title>RAG - Chunking</title>
      <dc:creator>Ramya Perumal</dc:creator>
      <pubDate>Mon, 11 May 2026 03:16:49 +0000</pubDate>
      <link>https://dev.to/ramya_perumal_e93721ef2fa/rag-chunking-1db</link>
      <guid>https://dev.to/ramya_perumal_e93721ef2fa/rag-chunking-1db</guid>
      <description>&lt;h2&gt;
  
  
  &lt;strong&gt;What is Chunking&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Chunking is the process of breaking data into smaller pieces called chunks. Chunking happens before the data is fed into an embedding model, which converts each chunk into a vector (point). The resulting vectors are then stored in a vector database.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Why Chunking Matters in RAG&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Data can contain different types of context while still relating to the same topic.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhacimk51vrozqg07ypuo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhacimk51vrozqg07ypuo.png" alt=" " width="800" height="396"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the example above, a paragraph about the Redis database contains multiple contexts. Without chunking, an embedding model like nomic-embed-text converts the entire paragraph into a single vector, which is then stored in the database, so the separate contexts can no longer be retrieved individually.&lt;/p&gt;

&lt;p&gt;This is where chunking plays a major role. Proper chunking helps retrieve only the most relevant information and avoids unrelated content.&lt;/p&gt;

&lt;p&gt;For example, if a chunk contains information about both Python and Java, a query about Python may also retrieve Java-related information because both topics exist in the same chunk. Effective chunking helps prevent unrelated data from being retrieved.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsm1ysbsjj0i6rr762o5c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsm1ysbsjj0i6rr762o5c.png" alt=" " width="731" height="316"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Even an entire document can be stored as a single chunk. However, the purpose of chunking is to split the data into smaller meaningful sections so that only relevant data is retrieved for the user query while avoiding irrelevant information.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Chunking Methods (Discrete Way - Formula Methodology)&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Fixed Chunking&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Fixed chunking is the most common chunking method. In this approach, a fixed character or token limit is assigned to every chunk.&lt;/p&gt;

&lt;p&gt;There is no single best chunking strategy for all datasets. Choosing the right chunk size usually requires a trial-and-error approach.&lt;/p&gt;

&lt;p&gt;Disadvantage&lt;br&gt;
A chunk may break in the middle of a sentence, resulting in incomplete context. This can reduce retrieval quality and may lead to irrelevant results.&lt;/p&gt;

&lt;p&gt;Solution&lt;br&gt;
One way to overcome this issue is to let the chunk continue until the sentence ends by checking for sentence-ending punctuation such as ".", or at least until the next space so that a word is not cut in half.&lt;/p&gt;
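
&lt;p&gt;A minimal sketch of that fix is shown below: each chunk starts with a fixed character limit and is then extended to the next period. The function name, the limit of 40 and the sample text are assumptions for illustration, and only "." is treated as a sentence end.&lt;/p&gt;

```python
def fixed_chunks(text, limit=40):
    """Fixed-size chunks, extended to the next '.' so sentences stay whole."""
    # limit must be at least 1.
    chunks = []
    start = 0
    while start != len(text):
        end = min(start + limit, len(text))
        # Look for the sentence end, starting at the last character of the fixed chunk.
        next_period = text.find(".", end - 1)
        end = len(text) if next_period == -1 else next_period + 1
        piece = text[start:end].strip()
        if piece:
            chunks.append(piece)
        start = end
    return chunks

text = ("Redis is an in-memory data store. It supports strings, hashes and lists. "
        "It is often used as a cache.")
for chunk in fixed_chunks(text):
    print(chunk)
```

&lt;p&gt;The trade-off is that chunks are no longer exactly the same size: a chunk can grow well past the limit when a sentence is long.&lt;/p&gt;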

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Overlapping Chunking&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In some cases, related information may be placed far apart in vector space because of how the embedding model represents it. As a result, the retriever may miss relevant information.&lt;/p&gt;

&lt;p&gt;To overcome this issue, overlapping chunking is used.&lt;/p&gt;

&lt;p&gt;In overlapping chunking, each chunk includes a portion of the previous chunk’s ending content. This helps the embedding model place related chunks closer together in the vector database.&lt;/p&gt;

&lt;p&gt;The purpose of overlapping is to improve retrieval by making semantically related chunks easier to find.&lt;/p&gt;

&lt;p&gt;Disadvantage&lt;/p&gt;

&lt;p&gt;There is a possibility that irrelevant information may also be retrieved because of the overlap.&lt;/p&gt;

&lt;p&gt;Example&lt;/p&gt;

&lt;p&gt;Suppose:&lt;/p&gt;

&lt;p&gt;Paragraph 1 is related to Topic A&lt;br&gt;
Paragraph 2 is related to Topic B&lt;/p&gt;

&lt;p&gt;If overlapping is applied, a query about Topic B may also retrieve some information from Topic A because part of Paragraph 1 overlaps with Paragraph 2.&lt;/p&gt;

&lt;p&gt;In such scenarios, storing these chunks closer together may not be necessary. This is where semantic chunking becomes useful.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Semantic Chunking&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Another scenario is when two paragraphs discuss the same topic but are not strongly related to each other. Normally, these paragraphs may still be stored nearby in the vector database. In such cases, overlapping chunking may not be necessary.&lt;/p&gt;

&lt;p&gt;Semantic chunking solves this problem by grouping content based on meaning rather than fixed size.&lt;/p&gt;

&lt;p&gt;In this method, each sentence is compared with the previous chunk using a similarity threshold value.&lt;/p&gt;

&lt;p&gt;If the similarity score is below the threshold value, the sentence becomes a separate chunk.&lt;br&gt;
If the similarity score is above the threshold value, it is added to the current chunk.&lt;/p&gt;

&lt;p&gt;Libraries such as NLTK can be used to implement semantic chunking, for example to split the text into sentences before they are compared. The threshold value is configurable based on the use case.&lt;/p&gt;
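
&lt;p&gt;The threshold rule can be sketched without any library. Note the assumptions: word overlap (Jaccard similarity) is used as a crude stand-in for a real similarity measure, sentences are split on "." only, and the names and the threshold of 0.2 are made up for the example.&lt;/p&gt;

```python
def words(text):
    """Lowercased words with surrounding punctuation removed."""
    return {w.strip(".,!?").lower() for w in text.split()}

def jaccard(a, b):
    """Word-overlap score between 0 and 1 (stand-in for semantic similarity)."""
    wa, wb = words(a), words(b)
    return len(wa.intersection(wb)) / len(wa.union(wb))

def semantic_chunks(sentences, threshold=0.2):
    """Add a sentence to the current chunk if similar enough, else start a new one."""
    chunks = []
    for sentence in sentences:
        if chunks and jaccard(chunks[-1], sentence) >= threshold:
            chunks[-1] = chunks[-1] + " " + sentence
        else:
            chunks.append(sentence)
    return chunks

text = ("Python is a programming language. Python is popular for data science. "
        "Java runs on the JVM. The JVM executes Java bytecode.")
sentences = [s.strip() + "." for s in text.split(".") if s.strip()]
for chunk in semantic_chunks(sentences):
    print(chunk)
```

&lt;p&gt;With this sample text the Python sentences end up in one chunk and the Java sentences in another. Raising the threshold produces more, smaller chunks.&lt;/p&gt;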

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Embedding-Based Chunking&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In embedding-based chunking, embedding models are used instead of libraries like NLTK.&lt;/p&gt;

&lt;p&gt;This method works by calculating cosine similarity between sentences and grouping semantically similar sentences into chunks.&lt;/p&gt;

&lt;p&gt;Advantage&lt;br&gt;
Better semantic understanding&lt;br&gt;
More accurate chunk boundaries&lt;/p&gt;

&lt;p&gt;Disadvantage&lt;br&gt;
Higher computational cost&lt;br&gt;
Additional embedding model usage cost &lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Choosing the Right Chunking Method&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Choosing a chunking method always involves trade-offs. There is no single chunking strategy that works for all datasets.&lt;/p&gt;

&lt;p&gt;The best chunking method depends on:&lt;/p&gt;

&lt;p&gt;Dataset type&lt;br&gt;
Cost&lt;br&gt;
Time&lt;br&gt;
Retrieval accuracy requirements&lt;br&gt;
Embedding model behavior&lt;/p&gt;

&lt;p&gt;Different applications may require different chunking strategies to achieve the best RAG performance.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>machinelearning</category>
      <category>rag</category>
    </item>
    <item>
      <title>RAG - Vector DB</title>
      <dc:creator>Ramya Perumal</dc:creator>
      <pubDate>Thu, 30 Apr 2026 17:56:45 +0000</pubDate>
      <link>https://dev.to/ramya_perumal_e93721ef2fa/rag-vector-db-ah9</link>
      <guid>https://dev.to/ramya_perumal_e93721ef2fa/rag-vector-db-ah9</guid>
      <description>&lt;h2&gt;
  
  
  What is a Vector Database?
&lt;/h2&gt;

&lt;p&gt;A vector database stores vectors (points in space), with data of similar meaning positioned close together. These vectors are generated by embedding models. One such model is &lt;strong&gt;nomic-embed-text&lt;/strong&gt;, which can be downloaded using Ollama.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Vector DB in RAG?
&lt;/h2&gt;

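&lt;p&gt;To see why embeddings and vector databases are needed, it helps to look at an older approach first. One-hot encoding is a technique used to convert categorical data (like words) into binary vectors.&lt;/p&gt;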

&lt;p&gt;&lt;strong&gt;How it works:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Each unique word in a vocabulary is mapped to a vector that is mostly zeros except for a single 1 at a specific index.&lt;/p&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;p&gt;Today is Wednesday&lt;br&gt;
Tomorrow is Thursday&lt;br&gt;
I am travelling Today&lt;br&gt;
Wednesday is a nice series&lt;/p&gt;

&lt;p&gt;Vocabulary values:&lt;/p&gt;

&lt;p&gt;[Today, is, Wednesday, Tomorrow, Thursday, I, am, Travelling, a, nice, series]&lt;/p&gt;

&lt;p&gt;Vector representation (each line's vector has a 1 at the position of every word it contains):&lt;/p&gt;

&lt;p&gt;Line 1 = [1,1,1,0,0,0,0,0,0,0,0]&lt;br&gt;
Line 2 = [0,1,0,1,1,0,0,0,0,0,0]&lt;br&gt;
Line 3 = [1,0,0,0,0,1,1,1,0,0,0]&lt;br&gt;
Line 4 = [0,1,1,0,0,0,0,0,1,1,1]&lt;/p&gt;
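
&lt;p&gt;The vectors above can be reproduced with a short script. The function names are assumptions for illustration, and words are lowercased so that "Travelling" and "travelling" count as the same vocabulary entry.&lt;/p&gt;

```python
sentences = [
    "Today is Wednesday",
    "Tomorrow is Thursday",
    "I am travelling Today",
    "Wednesday is a nice series",
]

# Build the vocabulary in order of first appearance (case-insensitive).
vocab = []
for sentence in sentences:
    for word in sentence.lower().split():
        if word not in vocab:
            vocab.append(word)

def one_hot(word):
    """Vector of zeros with a single 1 at the word's vocabulary index."""
    return [1 if word == v else 0 for v in vocab]

def sentence_vector(sentence):
    """Mark every vocabulary word that occurs in the sentence."""
    present = set(sentence.lower().split())
    return [1 if v in present else 0 for v in vocab]

for sentence in sentences:
    print(sentence_vector(sentence))
```

&lt;p&gt;Every new vocabulary word adds one more dimension, and nothing in these vectors says that "Wednesday" and "Thursday" are both days. Those are the high-dimensionality and no-semantic-meaning problems.&lt;/p&gt;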

&lt;p&gt;Disadvantages:&lt;br&gt;
No semantic meaning&lt;br&gt;
High dimensionality&lt;br&gt;
Not scalable&lt;/p&gt;

&lt;p&gt;Because of these limitations, modern RAG systems use embedding models to convert chunks into dense vectors in a high-dimensional space where similar meanings are positioned close together, and store those vectors in a vector database.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Data Is Stored in a Vector DB
&lt;/h2&gt;

&lt;p&gt;Documents will be split into chunks. Each chunk will be converted into a vector using an embedding model. The resulting vector will be stored in the vector DB. Chunks with similar semantic meaning are stored closer together in vector space.&lt;/p&gt;

&lt;h2&gt;
  
  
  Similarity Search
&lt;/h2&gt;

&lt;p&gt;When a user query arrives, it is converted into a vector, and the vector database searches for the stored vectors that are closest to it.&lt;/p&gt;

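&lt;p&gt;To calculate the distance, we can use:&lt;br&gt;
Euclidean distance (straight-line distance, based on the Pythagorean theorem)&lt;br&gt;
Manhattan distance (sum of the absolute differences along each dimension)&lt;br&gt;
Cosine similarity (based on the angle between the query vector and a stored vector; a smaller angle means more similar)&lt;/p&gt;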
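
&lt;p&gt;For reference, the three measures can be written in a few lines of plain Python. The two-dimensional vectors and the function names are assumptions for illustration; real embeddings have far more dimensions.&lt;/p&gt;

```python
import math

def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of absolute differences along each dimension."""
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

query = [1.0, 2.0]  # hypothetical query vector
doc = [4.0, 6.0]    # hypothetical stored vector
print(euclidean(query, doc))
print(manhattan(query, doc))
print(round(cosine_similarity(query, doc), 4))
```

&lt;p&gt;Note that for the two distances a smaller value means a closer match, while for cosine similarity a larger value means a closer match.&lt;/p&gt;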

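&lt;p&gt;Comparing the query against every stored vector (exact KNN search) becomes computationally expensive as the collection grows. For large collections, ANN algorithms are used to find close matches without checking every vector.&lt;/p&gt;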

&lt;h2&gt;
  
  
  Popular Vector DBs
&lt;/h2&gt;

&lt;p&gt;Some of the popular vector databases and libraries are:&lt;br&gt;
Chroma&lt;br&gt;
FAISS (a similarity-search library)&lt;br&gt;
Pinecone&lt;br&gt;
Qdrant – commonly used for embeddings, semantic search, and image similarity search&lt;br&gt;
MongoDB – also supports vector search&lt;/p&gt;

&lt;h2&gt;
  
  
  End-to-End Flow
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Data Ingestion&lt;/strong&gt;&lt;br&gt;
Data or documents are split into chunks.&lt;br&gt;
Each chunk is converted into a vector using an embedding model.&lt;br&gt;
The vectors are stored in the vector DB.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data Retrieval&lt;/strong&gt;&lt;br&gt;
The user query is converted into a vector using the same embedding model. Semantically related vectors are found using search algorithms in the vector DB. The retrieved chunks are then provided to the LLM as context, along with the user query, to produce an answer in human-readable form.&lt;/p&gt;

</description>
      <category>beginners</category>
      <category>ai</category>
    </item>
    <item>
      <title>Introduction to RAG</title>
      <dc:creator>Ramya Perumal</dc:creator>
      <pubDate>Mon, 27 Apr 2026 19:45:04 +0000</pubDate>
      <link>https://dev.to/ramya_perumal_e93721ef2fa/introduction-to-rag-4d0a</link>
      <guid>https://dev.to/ramya_perumal_e93721ef2fa/introduction-to-rag-4d0a</guid>
      <description>&lt;p&gt;Title: 40-days training on RAG(Day 1)&lt;/p&gt;

&lt;p&gt;RAG is Retrieval-Augmented Generation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What is a Model?&lt;/strong&gt;&lt;br&gt;
A model is nothing but an equation.&lt;br&gt;
Example:&lt;br&gt;
y=mx+c&lt;br&gt;
During training, values of x and y will be provided. The model has to find the appropriate values of m and c and try to make a line that best fits the graph. The values of m and c may vary depending on the use case.&lt;/p&gt;
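
&lt;p&gt;As a small illustration of "finding m and c", the sketch below fits a line with the least-squares formula. This is one common way to do it and is an assumption here, not necessarily the method covered in the course; the data points are made up and were generated from y = 2x + 1.&lt;/p&gt;

```python
def fit_line(xs, ys):
    """Least-squares estimate of the slope m and intercept c in y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    m = covariance / variance
    c = mean_y - m * mean_x
    return m, c

xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]  # hypothetical training data generated from y = 2x + 1
m, c = fit_line(xs, ys)
print(m, c)
```

&lt;p&gt;The fitted values recover the slope 2 and intercept 1 used to generate the data. An LLM is trained on the same principle, only with billions of parameters instead of two.&lt;/p&gt;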

&lt;p&gt;&lt;strong&gt;What is a Parameter?&lt;/strong&gt;&lt;br&gt;
A parameter is nothing but a variable that is learned during training.&lt;br&gt;
In the above equation:&lt;br&gt;
m is a parameter&lt;br&gt;
c is a parameter&lt;/p&gt;

&lt;p&gt;The more parameters a model has, the more complex the patterns it can learn.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What is Temperature?&lt;/strong&gt;&lt;br&gt;
Temperature controls how random (creative) the model's output is. It usually ranges from 0 to 1, although some APIs accept higher values.&lt;br&gt;
Lower temperature gives more focused and predictable answers.&lt;br&gt;
Higher temperature gives more varied and imaginative answers.&lt;/p&gt;

&lt;p&gt;Temperature is passed along with the prompt input.&lt;br&gt;
It is often kept around 0.5 for balanced output.&lt;/p&gt;
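
&lt;p&gt;One common way temperature is applied is to divide the model's raw scores by the temperature before turning them into probabilities. The sketch below shows that effect on three made-up scores; the function name and numbers are assumptions, and a temperature of exactly 0 is not handled because it would divide by zero.&lt;/p&gt;

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities after scaling by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate next words
for t in (0.2, 0.5, 1.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```

&lt;p&gt;At a low temperature almost all probability goes to the top-scoring word, so the output is predictable. At a higher temperature the probabilities are more even, so less likely words are chosen more often.&lt;/p&gt;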

&lt;p&gt;&lt;strong&gt;SLM&lt;/strong&gt;&lt;br&gt;
SLM stands for Small Language Model.&lt;br&gt;
It usually has far fewer parameters than an LLM (often a few billion or less) and is trained for a particular domain or specific tasks.&lt;br&gt;
Training cost can still be high, similar to LLMs, depending on the use case.&lt;br&gt;
Example: smallest ai - provides voice-based smaller AI models.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;LLM&lt;/strong&gt;&lt;br&gt;
LLM stands for Large Language Model.&lt;br&gt;
It usually has billions of parameters and contains knowledge from many domains. It is called a generalized model.&lt;br&gt;
Example: gpt-oss-120b.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;How LLM Works&lt;/em&gt;&lt;br&gt;
The primary functionality of an LLM is to predict the next word correctly.&lt;br&gt;
It generates text by predicting one word after another based on previous words.&lt;br&gt;
Sometimes LLMs generate incorrect information confidently. This is called hallucination.&lt;br&gt;
Example:&lt;br&gt;
If the model knows about cats and dogs but has limited knowledge about lions, it may generate irrelevant or incorrect content.&lt;br&gt;
Hallucination can be reduced by writing proper prompts and providing correct context.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What is RAG?&lt;/strong&gt;&lt;br&gt;
RAG stands for Retrieval-Augmented Generation.&lt;/p&gt;

&lt;p&gt;It is a method used to provide private or external knowledge such as:&lt;br&gt;
Company policies&lt;br&gt;
HR policy documents&lt;br&gt;
Internal business documents&lt;/p&gt;

&lt;p&gt;This information is given to the LLM so it can generate human-readable answers based on that content.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Where is Private Data Stored?&lt;/em&gt;&lt;br&gt;
Private data is usually stored in a database called a Vector Database.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;How Documents are Stored&lt;/em&gt;&lt;br&gt;
Documents are split into smaller parts called chunks.&lt;br&gt;
These chunks are converted into numerical vectors and stored in the vector database.&lt;/p&gt;

&lt;p&gt;To search relevant chunks quickly, algorithms like:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ANN&lt;/strong&gt; (Approximate Nearest Neighbors)&lt;br&gt;
&lt;strong&gt;KNN&lt;/strong&gt; (K-Nearest Neighbors) &lt;br&gt;
are commonly used.&lt;/p&gt;

&lt;p&gt;These kinds of algorithms are also used to generate recommendations in apps such as Spotify and Amazon.&lt;/p&gt;

&lt;p&gt;Thank you Syed Jafer for conducting this wonderful course.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>beginners</category>
      <category>machinelearning</category>
      <category>rag</category>
    </item>
  </channel>
</rss>
