<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Pinaki Batabyal</title>
    <description>The latest articles on DEV Community by Pinaki Batabyal (@logout007).</description>
    <link>https://dev.to/logout007</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3935062%2F018b6eb5-8e42-435e-8886-4a4937249961.jpeg</url>
      <title>DEV Community: Pinaki Batabyal</title>
      <link>https://dev.to/logout007</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/logout007"/>
    <language>en</language>
    <item>
      <title>I Built a RAG Pipeline From Scratch and It Completely Changed How I Think About AI</title>
      <dc:creator>Pinaki Batabyal</dc:creator>
      <pubDate>Sat, 16 May 2026 15:34:24 +0000</pubDate>
      <link>https://dev.to/logout007/i-built-a-rag-pipeline-from-scratch-and-it-completely-changed-how-i-think-about-ai-3hb8</link>
      <guid>https://dev.to/logout007/i-built-a-rag-pipeline-from-scratch-and-it-completely-changed-how-i-think-about-ai-3hb8</guid>
      <description>&lt;h1&gt;
  
  
  I Built a RAG Pipeline From Scratch and It Completely Changed How I Think About AI
&lt;/h1&gt;

&lt;p&gt;I've been writing code for 3+ years. I thought I understood how AI worked.&lt;/p&gt;

&lt;p&gt;I didn't.&lt;/p&gt;

&lt;p&gt;Not until I sat down one weekend, opened a blank Node.js project, and decided to build something I'd been curious about for months — a system that could read a stack of PDFs and actually &lt;em&gt;answer questions&lt;/em&gt; about them. In plain English. With sources.&lt;/p&gt;

&lt;p&gt;What followed was honestly one of the most satisfying weeks of building I've ever had.&lt;/p&gt;




&lt;h2&gt;
  
  
  How It Started
&lt;/h2&gt;

&lt;p&gt;I'd been using ChatGPT like everyone else — pasting text, asking questions, getting answers. But I kept hitting the same wall: &lt;em&gt;it didn't know my documents&lt;/em&gt;. It couldn't read a specific PDF I had. It couldn't search across multiple files. It couldn't say "this answer is on page 12."&lt;/p&gt;

&lt;p&gt;I knew RAG (Retrieval-Augmented Generation) was the solution. I'd read about it. I understood it conceptually.&lt;/p&gt;

&lt;p&gt;Actually building it is a completely different thing.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Moment It Clicked
&lt;/h2&gt;

&lt;p&gt;The first time I uploaded a PDF, typed a question, and got back a precise answer — with the exact page number — I genuinely sat back and stared at the screen for a few seconds.&lt;/p&gt;

&lt;p&gt;Not because it was magic. But because I &lt;em&gt;understood&lt;/em&gt; every single step that produced that answer. I wrote every line. I knew why it worked.&lt;/p&gt;

&lt;p&gt;That feeling is hard to describe. It's different from using a library or calling an API. This was mine, end to end.&lt;/p&gt;

&lt;p&gt;Here's the architecture I landed on:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User types a question
    ↓
Embed the question (OpenAI text-embedding-3-small)
    ↓
Find the most similar chunks in the database (pgvector)
    ↓
Feed those chunks into GPT-4o-mini
    ↓
Get a precise, grounded answer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Four steps. Deceptively simple on paper. Deeply interesting to build.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Learned Building Each Layer
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Chunking is harder than it sounds
&lt;/h3&gt;

&lt;p&gt;My first attempt: split text every 500 characters. Done.&lt;/p&gt;

&lt;p&gt;The results were awful. Sentences got cut in half. Context got destroyed. The model would retrieve a chunk that started mid-sentence and couldn't make sense of it.&lt;/p&gt;

&lt;p&gt;The fix was breaking on sentence boundaries with overlap:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;chunkText&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;chunkSize&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;overlap&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;chunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="k"&gt;while &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;start&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;end&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;start&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;chunkSize&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="c1"&gt;// Don't cut mid-sentence — find the nearest period&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;end&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;breakPoint&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lastIndexOf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;.&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;end&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;breakPoint&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;start&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;chunkSize&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;end&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;breakPoint&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;chunk&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;start&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;end&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;trim&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="nx"&gt;start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;end&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;overlap&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// overlap = no lost context at boundaries&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The 50-character overlap sounds tiny. It makes a huge difference.&lt;/p&gt;




&lt;h3&gt;
  
  
  Embeddings feel like magic until you understand them
&lt;/h3&gt;

&lt;p&gt;An embedding is just a list of 1536 numbers that represents the &lt;em&gt;meaning&lt;/em&gt; of a piece of text. Two sentences that mean similar things will produce similar number lists — even if they use completely different words.&lt;/p&gt;

&lt;p&gt;So "What are the safety requirements?" and "List the security rules" will have similar embeddings, even though they share no words. That's semantic search. That's what makes this better than ctrl+F.&lt;/p&gt;

&lt;p&gt;I chose &lt;code&gt;text-embedding-3-small&lt;/code&gt; over the older &lt;code&gt;ada-002&lt;/code&gt;. 80% cheaper, equal or better quality. Easy choice.&lt;/p&gt;

&lt;p&gt;Batch embedding 400 chunks and watching them all get stored in the database in about 8 seconds was one of those quiet "oh wow" moments.&lt;/p&gt;
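&lt;p&gt;The batching itself is simple, because the embeddings endpoint accepts an array of inputs per request — you mostly just slice. A sketch (the batch size of 100 is my choice, not an API requirement):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Split chunks into batches; each batch becomes one embeddings request
function toBatches(items, size = 100) {
  const batches = [];
  for (let i = 0; i &lt; items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Each batch then goes to OpenAI in a single call, roughly:
// const res = await openai.embeddings.create({
//   model: 'text-embedding-3-small',
//   input: batch,                      // array of strings
// });
// const vectors = res.data.map(d =&gt; d.embedding);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;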




&lt;h3&gt;
  
  
  pgvector is genuinely impressive
&lt;/h3&gt;

&lt;p&gt;I expected to need a dedicated vector database — Pinecone, Weaviate, Qdrant. I'd heard of all of them.&lt;/p&gt;

&lt;p&gt;Then I discovered &lt;code&gt;pgvector&lt;/code&gt; — a Postgres extension that adds a vector column type and similarity-search operators. I already know SQL. I already use Supabase. Setup was a few lines of SQL:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="n"&gt;extension&lt;/span&gt; &lt;span class="n"&gt;if&lt;/span&gt; &lt;span class="k"&gt;not&lt;/span&gt; &lt;span class="k"&gt;exists&lt;/span&gt; &lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;document_chunks&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;id&lt;/span&gt;         &lt;span class="n"&gt;uuid&lt;/span&gt; &lt;span class="k"&gt;primary&lt;/span&gt; &lt;span class="k"&gt;key&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="n"&gt;gen_random_uuid&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
  &lt;span class="n"&gt;content&lt;/span&gt;    &lt;span class="nb"&gt;text&lt;/span&gt; &lt;span class="k"&gt;not&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;page_number&lt;/span&gt; &lt;span class="nb"&gt;integer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;embedding&lt;/span&gt;  &lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1536&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;index&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;document_chunks&lt;/span&gt;
  &lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="n"&gt;ivfflat&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="n"&gt;vector_cosine_ops&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lists&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And querying it is just SQL:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;page_number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&amp;gt;&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;similarity&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;document_chunks&lt;/span&gt;
&lt;span class="k"&gt;order&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&amp;gt;&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="k"&gt;limit&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;&amp;lt;=&amp;gt;&lt;/code&gt; is cosine distance. &lt;code&gt;1 - distance = similarity&lt;/code&gt;. I love how clean this is.&lt;/p&gt;
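&lt;p&gt;Calling this from Node has one wrinkle: pgvector wants the embedding parameter serialised as a vector literal string. A sketch, assuming node-postgres (the helper name is mine, not a pgvector API):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// pgvector parses a string like '[0.12,-0.05,...]' as a vector
function toVectorLiteral(embedding) {
  return `[${embedding.join(',')}]`;
}

// With node-postgres, the query above becomes roughly:
// const { rows } = await pool.query(
//   `select content, page_number,
//           1 - (embedding &lt;=&gt; $1) as similarity
//    from document_chunks
//    order by embedding &lt;=&gt; $1
//    limit 5`,
//   [toVectorLiteral(queryEmbedding)]
// );
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;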




&lt;h3&gt;
  
  
  The system prompt matters more than the model
&lt;/h3&gt;

&lt;p&gt;I spent more time on the system prompt than on any other single piece of code. The difference between a well-prompted model and a poorly-prompted one is dramatic.&lt;/p&gt;

&lt;p&gt;My first prompt: "Answer questions using the provided context."&lt;/p&gt;

&lt;p&gt;Results: confidently wrong answers, hallucinated details, vague summaries.&lt;/p&gt;

&lt;p&gt;My final prompt, after many iterations:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You are a precise document assistant. Answer questions using
ONLY the provided context chunks.

- If the answer is in the context, answer clearly.
- If it isn't, say exactly: "I could not find a clear answer
  in the uploaded documents."
- Never make up information not present in the context.
- Be concise. Prefer bullet points for multi-part answers.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That single word "ONLY" and the explicit fallback phrase cut hallucinations significantly. The model still reasons and synthesises — it just stays grounded.&lt;/p&gt;

&lt;p&gt;Temperature = 0.1, by the way. This isn't a creative writing task.&lt;/p&gt;
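&lt;p&gt;Putting prompt and retrieval together: the retrieved chunks get labelled with their page numbers and packed into a single user message. A sketch — &lt;code&gt;buildMessages&lt;/code&gt; and &lt;code&gt;SYSTEM_PROMPT&lt;/code&gt; are my names, and the chat call follows OpenAI's Node SDK shape:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Pack retrieved chunks (with page numbers) into one grounded prompt
function buildMessages(systemPrompt, chunks, question) {
  const context = chunks
    .map(c =&gt; `[page ${c.page_number}]\n${c.content}`)
    .join('\n\n---\n\n');
  return [
    { role: 'system', content: systemPrompt },
    { role: 'user', content: `Context:\n${context}\n\nQuestion: ${question}` },
  ];
}

// const res = await openai.chat.completions.create({
//   model: 'gpt-4o-mini',
//   temperature: 0.1,   // precision, not creativity
//   messages: buildMessages(SYSTEM_PROMPT, topChunks, question),
// });
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;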




&lt;h3&gt;
  
  
  MCP was the rabbit hole I didn't expect
&lt;/h3&gt;

&lt;p&gt;Halfway through the project I read about the Model Context Protocol (MCP) — a way to give LLMs structured tools they can invoke as function calls: search a database, query an API, fetch live data.&lt;/p&gt;

&lt;p&gt;I added two tools to my pipeline. Here's the document search tool, declared in OpenAI's function-calling schema:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;function&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;function&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;search_documents&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Search uploaded PDFs for relevant information&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;parameters&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;object&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;properties&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="na"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;string&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="na"&gt;required&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;query&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;];&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now the model decides &lt;em&gt;when&lt;/em&gt; to search. You can ask a multi-part question and it'll call the search tool, synthesise the results, and respond — all in one turn. No hardcoded routing logic. The model figures it out.&lt;/p&gt;
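&lt;p&gt;The plumbing for that turn is small: when a response contains &lt;code&gt;tool_calls&lt;/code&gt;, each one is executed and its result is sent back as a &lt;code&gt;tool&lt;/code&gt; message before asking the model again. A sketch — the handler wiring is mine, while the message shapes follow OpenAI's chat completions API:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Execute the model's tool calls and build the `tool` replies it expects
async function runToolCalls(toolCalls, handlers) {
  const replies = [];
  for (const call of toolCalls) {
    const args = JSON.parse(call.function.arguments);
    const result = await handlers[call.function.name](args);
    replies.push({
      role: 'tool',
      tool_call_id: call.id,
      content: JSON.stringify(result),
    });
  }
  return replies;
}

// In the loop: push the assistant message, push these replies,
// then call chat.completions.create again for the final answer.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;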

&lt;p&gt;This is when I started to understand why everyone in AI engineering talks about agents.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Numbers (Real, Measured)
&lt;/h2&gt;

&lt;p&gt;I ran 50 test queries across a set of PDFs I had lying around.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Result&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;P95 response time&lt;/td&gt;
&lt;td&gt;2.8 seconds&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Average response time&lt;/td&gt;
&lt;td&gt;1.9 seconds&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Embedding cost for ~200 PDFs&lt;/td&gt;
&lt;td&gt;~$0.80&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Queries that returned correct page citation&lt;/td&gt;
&lt;td&gt;84%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Queries where I'd say the answer was "good"&lt;/td&gt;
&lt;td&gt;~82%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The 18% miss rate is real and honest. It's mostly on questions that require synthesising information across many pages — a known weakness of basic RAG. Hybrid search (combining vector search with BM25 keyword search) would improve this. That's my next experiment.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I'd Tell Myself Before Starting
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Start with fewer PDFs than you think.&lt;/strong&gt; I tried to test with 50 documents at once, which was a mistake. Debug with 3 instead. You'll thank yourself.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The similarity threshold matters.&lt;/strong&gt; I filter out chunks below 30% cosine similarity before passing them to the LLM. Without this filter, irrelevant chunks confuse the model and produce vague, wishy-washy answers.&lt;/p&gt;
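&lt;p&gt;The filter itself is one line, but it earns its keep. A sketch (0.3 is the threshold I settled on empirically, not a universal constant):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Drop weak matches before they ever reach the prompt
function filterRelevant(rows, minSimilarity = 0.3) {
  return rows.filter(r =&gt; r.similarity &gt;= minSimilarity);
}

// If nothing survives, skip the LLM call entirely and return the
// fallback answer directly — cheaper, faster, and more honest.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;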

&lt;p&gt;&lt;strong&gt;pdf-parse is good but imperfect.&lt;/strong&gt; Scanned PDFs (images of text) return nothing — you need OCR for those. Text PDFs work great. Know your document types before you commit to a parsing strategy.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Do per-page extraction from the start.&lt;/strong&gt; I approximated page numbers instead. It works, but it isn't accurate. Use pdf.js if exact page attribution matters for your use case.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why This Project Was Different
&lt;/h2&gt;

&lt;p&gt;I build things constantly. APIs, dashboards, mobile apps. Most of it is satisfying in a normal way.&lt;/p&gt;

&lt;p&gt;This one was different because every piece builds on the previous one in a way that feels like a proper system — not just features bolted together. The chunker feeds the embedder. The embedder feeds the vector store. The vector store feeds the retriever. The retriever feeds the synthesiser. Change one and it ripples through everything.&lt;/p&gt;

&lt;p&gt;And the output is &lt;em&gt;intelligent&lt;/em&gt;. It reads documents and &lt;em&gt;understands&lt;/em&gt; them. I wrote the code that makes that happen, and I still find it a little bit remarkable every time I run it.&lt;/p&gt;

&lt;p&gt;If you've been curious about RAG but haven't started — start. The gap between "I understand the concept" and "I built it" is where all the real learning happens.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Stack
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Backend:&lt;/strong&gt; Node.js + Express&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PDF parsing:&lt;/strong&gt; pdf-parse&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Embeddings:&lt;/strong&gt; OpenAI text-embedding-3-small&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vector store:&lt;/strong&gt; pgvector (Supabase free tier)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LLM:&lt;/strong&gt; GPT-4o-mini&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool calling:&lt;/strong&gt; Model Context Protocol (MCP)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Frontend:&lt;/strong&gt; React + Vite + TypeScript + Tailwind CSS&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Full code is on my GitHub: &lt;strong&gt;&lt;a href="https://github.com/logout007" rel="noopener noreferrer"&gt;github.com/logout007&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I'm Pinaki Batabyal — Full Stack Developer and Technical Lead. I write about things I build,&lt;br&gt;
break, and figure out. Connect with me on &lt;a href="https://linkedin.com/in/pinaki-batabyal" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;&lt;br&gt;
if you're into this kind of thing.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Currently exploring senior fullstack and AI engineering roles — remote or Kolkata/Bangalore.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>javascript</category>
      <category>rag</category>
      <category>webdev</category>
    </item>
  </channel>
</rss>
