<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Mahek Sayyad</title>
    <description>The latest articles on DEV Community by Mahek Sayyad (@maheksayyad).</description>
    <link>https://dev.to/maheksayyad</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3881064%2Fbc82e717-3ab4-47b4-9d68-ba28df14a1e5.jpeg</url>
      <title>DEV Community: Mahek Sayyad</title>
      <link>https://dev.to/maheksayyad</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/maheksayyad"/>
    <language>en</language>
    <item>
      <title>Building Production RAG Pipelines: 3 Lessons from My First Year</title>
      <dc:creator>Mahek Sayyad</dc:creator>
      <pubDate>Wed, 15 Apr 2026 18:42:09 +0000</pubDate>
      <link>https://dev.to/maheksayyad/building-production-rag-pipelines-3-lessons-from-my-first-year-4mdk</link>
      <guid>https://dev.to/maheksayyad/building-production-rag-pipelines-3-lessons-from-my-first-year-4mdk</guid>
      <description>&lt;p&gt;&lt;em&gt;How I went from RAG beginner to production-ready, handling 50K+ queries per month&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;I spent a year building RAG (Retrieval-Augmented Generation) systems at TCS, going from experiments to production systems handling 50,000–80,000 HR queries monthly. Here are 3 hard-earned lessons about chunking, embedding models, and retrieval strategies that I wish I'd known on day one.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem: HR Was Drowning in Queries
&lt;/h2&gt;

&lt;p&gt;Picture this: HR teams fielding &lt;strong&gt;50,000 to 80,000 queries every month&lt;/strong&gt;. Most were repetitive: &lt;em&gt;"How do I apply for leave?"&lt;/em&gt; &lt;em&gt;"What's the laptop policy?"&lt;/em&gt; &lt;em&gt;"When is the next pay cycle?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The knowledge existed in documents, but finding it was painfully slow.&lt;/p&gt;

&lt;p&gt;Our team decided to build a RAG-powered system that could:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Understand&lt;/strong&gt; natural language questions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Retrieve&lt;/strong&gt; relevant policy documents&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Generate&lt;/strong&gt; accurate, contextual answers&lt;/li&gt;
&lt;/ol&gt;
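
&lt;p&gt;Wired together, those three steps are just a retrieve-then-generate loop. Here's a minimal sketch of the shape; &lt;code&gt;retriever&lt;/code&gt; and &lt;code&gt;llm&lt;/code&gt; are placeholder callables for whatever vector store and model client you use, not names from our actual system:&lt;/p&gt;

```python
def answer(query, retriever, llm, top_k=3):
    """Minimal RAG loop: retrieve supporting chunks, then generate."""
    # Steps 1-2: take the natural-language question, pull the top-k chunks
    chunks = retriever(query, top_k)
    context = "\n\n".join(chunks)
    # Step 3: generate an answer grounded in the retrieved context
    prompt = (
        "Answer the HR question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
    return llm(prompt)
```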

&lt;p&gt;Here's what I learned.&lt;/p&gt;




&lt;h2&gt;
  
  
  Lesson 1: Chunking Strategy Makes or Breaks Your RAG
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Mistake
&lt;/h3&gt;

&lt;p&gt;My first attempt? Split documents every 500 characters with 50-character overlap. Simple, right?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt; Terrible retrieval. We were cutting sentences mid-way and losing context.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Fix
&lt;/h3&gt;

&lt;p&gt;I learned to respect &lt;strong&gt;semantic boundaries&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Preserve sentences:&lt;/strong&gt; Don't split mid-sentence unless absolutely necessary&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Paragraph awareness:&lt;/strong&gt; Keep related sentences together&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context windows:&lt;/strong&gt; Ensure each chunk has enough context to be understood independently
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# What I implemented
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;smart_chunk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chunk_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;512&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;overlap&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Chunk text while preserving sentence boundaries&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;chunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="n"&gt;start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;

    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;end&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;start&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;chunk_size&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

        &lt;span class="c1"&gt;# Look for sentence boundary in overlap region
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;search_start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;end&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;sentence_end&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;rfind&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;. &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;search_start&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;sentence_end&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;end&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sentence_end&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;

        &lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
        &lt;span class="n"&gt;start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;overlap&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;chunks&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The Impact
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;40% reduction&lt;/strong&gt; in manual HR queries after implementing semantic chunking.&lt;/p&gt;




&lt;h2&gt;
  
  
  Lesson 2: Not All Embedding Models Are Equal
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Mistake
&lt;/h3&gt;

&lt;p&gt;I started with the first embedding model I found. It seemed to work in testing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt; In production, similar queries were retrieving completely wrong documents. "Leave policy" was returning "Laptop policy" because both contained words like "application" and "request."&lt;/p&gt;

&lt;h3&gt;
  
  
  The Fix
&lt;/h3&gt;

&lt;p&gt;I learned to &lt;strong&gt;evaluate embeddings for my specific domain&lt;/strong&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Size&lt;/th&gt;
&lt;th&gt;Speed&lt;/th&gt;
&lt;th&gt;Domain Fit&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;all-MiniLM-L6-v2&lt;/td&gt;
&lt;td&gt;80MB&lt;/td&gt;
&lt;td&gt;Fast&lt;/td&gt;
&lt;td&gt;General purpose&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;all-mpnet-base-v2&lt;/td&gt;
&lt;td&gt;400MB&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Better semantic understanding&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;domain-specific&lt;/td&gt;
&lt;td&gt;Varies&lt;/td&gt;
&lt;td&gt;Varies&lt;/td&gt;
&lt;td&gt;Requires fine-tuning&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;What worked for us:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Start with &lt;code&gt;all-MiniLM-L6-v2&lt;/code&gt; for prototyping&lt;/li&gt;
&lt;li&gt;Upgrade to &lt;code&gt;all-mpnet-base-v2&lt;/code&gt; for production&lt;/li&gt;
&lt;li&gt;Consider domain fine-tuning if you have 10K+ documents&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Key Insight
&lt;/h3&gt;

&lt;p&gt;Test your embeddings with &lt;strong&gt;real queries&lt;/strong&gt; from actual users. Synthetic test data lies.&lt;/p&gt;
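
&lt;p&gt;A cheap way to run that test is a harness over real query pairs: pairs that should stay apart ("leave policy" vs. "laptop policy") must not embed too closely. A sketch, with &lt;code&gt;embed&lt;/code&gt; standing in for your model's encode function:&lt;/p&gt;

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def flag_confusable_pairs(embed, distinct_pairs, threshold=0.8):
    """Return query pairs that SHOULD be distinct but embed too closely.

    embed: any callable mapping text to a vector, e.g. a
    sentence-transformers model's encode method.
    """
    return [
        (q1, q2)
        for q1, q2 in distinct_pairs
        if cosine(embed(q1), embed(q2)) >= threshold
    ]
```

Any pair this flags is a retrieval confusion waiting to happen in production.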




&lt;h2&gt;
  
  
  Lesson 3: Hybrid Search Beats Pure Semantic Search
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Mistake
&lt;/h3&gt;

&lt;p&gt;I relied solely on vector similarity search. It worked great for conceptual queries like &lt;em&gt;"What's our remote work policy?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt; Failed miserably on specific queries like &lt;em&gt;"Leave policy version 3.2"&lt;/em&gt; or acronyms like &lt;em&gt;"PTO"&lt;/em&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Fix
&lt;/h3&gt;

&lt;p&gt;I implemented &lt;strong&gt;hybrid search&lt;/strong&gt;: combine semantic search with keyword matching.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Simplified hybrid search approach
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;hybrid_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Semantic search
&lt;/span&gt;    &lt;span class="n"&gt;semantic_results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;vector_store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;similarity_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Keyword search (BM25 or simple TF-IDF)
&lt;/span&gt;    &lt;span class="n"&gt;keyword_results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;keyword_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Combine and re-rank
&lt;/span&gt;    &lt;span class="n"&gt;combined&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;merge_results&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;semantic_results&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;keyword_results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;combined&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
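
&lt;p&gt;The &lt;code&gt;merge_results&lt;/code&gt; step above is where hybrid search lives or dies. One standard, tuning-free way to combine the two ranked lists is Reciprocal Rank Fusion (RRF); this sketch works on lists of document ids, best match first:&lt;/p&gt;

```python
def rrf_merge(semantic_results, keyword_results, k=60):
    """Merge two ranked lists with Reciprocal Rank Fusion.

    A document's fused score is the sum of 1 / (k + rank) over the
    lists it appears in; k=60 is the constant from the original RRF
    paper and rarely needs tuning.
    """
    scores = {}
    for results in (semantic_results, keyword_results):
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)
```

Documents that rank well in both lists float to the top, which is exactly the behavior you want for queries that are part conceptual, part keyword.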



&lt;h3&gt;
  
  
  The Results
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Conceptual queries&lt;/strong&gt; → Semantic search wins&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Specific queries&lt;/strong&gt; → Keyword search wins
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hybrid&lt;/strong&gt; → Best of both worlds&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Bonus: The Forgotten Piece - Evaluation
&lt;/h2&gt;

&lt;p&gt;I almost forgot to mention: &lt;strong&gt;how do you know if your RAG is actually good?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I learned to track:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Retrieval accuracy:&lt;/strong&gt; Did we get the right documents?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Answer relevance:&lt;/strong&gt; Does the generated answer actually help?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;User feedback:&lt;/strong&gt; Simple 👍/👎 buttons teach you more than metrics&lt;/li&gt;
&lt;/ul&gt;
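
&lt;p&gt;For retrieval accuracy, the metric we leaned on was the simplest one: how often did the relevant document show up in the top k? A sketch over logged &lt;code&gt;(retrieved_ids, relevant_id)&lt;/code&gt; pairs (the log shape here is illustrative, not our production schema):&lt;/p&gt;

```python
def hit_rate_at_k(logged, k=5):
    """Fraction of logged queries whose relevant doc appeared in the top k.

    logged: iterable of (retrieved_ids, relevant_id) pairs, e.g.
    reconstructed from production query logs plus human labels.
    """
    logged = list(logged)
    if not logged:
        return 0.0
    hits = sum(1 for retrieved, relevant in logged if relevant in retrieved[:k])
    return hits / len(logged)
```

Track this per week and per document category; a sudden drop usually means someone re-uploaded a policy document and your index went stale.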




&lt;h2&gt;
  
  
  Where I Am Now
&lt;/h2&gt;

&lt;p&gt;That first RAG system? It's now handling queries across &lt;strong&gt;3 different business workflows&lt;/strong&gt; at TCS. The HR team? They went from drowning to actually having time for strategic work.&lt;/p&gt;

&lt;p&gt;I'm now working on LangGraph-powered agentic workflows and still learning every day.&lt;/p&gt;




&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Chunk smart, not just small&lt;/strong&gt; - Respect semantic boundaries&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test embeddings with real data&lt;/strong&gt; - Your domain matters&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hybrid search &amp;gt; pure semantic&lt;/strong&gt; - Combine approaches&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Measure everything&lt;/strong&gt; - You can't improve what you don't track&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Your Turn
&lt;/h2&gt;

&lt;p&gt;Building RAG systems? I'd love to hear about your challenges.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What's your biggest pain point with RAG right now?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Chunking strategies?&lt;/li&gt;
&lt;li&gt;Embedding model selection?&lt;/li&gt;
&lt;li&gt;Retrieval accuracy?&lt;/li&gt;
&lt;li&gt;Something else?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Drop a comment—let's learn together!&lt;/p&gt;




&lt;h2&gt;
  
  
  About Me
&lt;/h2&gt;

&lt;p&gt;I'm Mahek, an AI Engineer at TCS with a passion for making complex AI accessible. When I'm not building RAG pipelines, I'm conducting workshops (upskilled 60+ engineers so far!) or writing about lessons from production systems.&lt;/p&gt;

&lt;p&gt;🔗 &lt;a href="https://linkedin.com/in/maheksayyad" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; | 💻 &lt;a href="https://github.com/MahekSayyad" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Thanks for reading! If you found this helpful, consider following for more production AI insights.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>tutorial</category>
      <category>rag</category>
    </item>
  </channel>
</rss>
