<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Sharath Kurup</title>
    <description>The latest articles on DEV Community by Sharath Kurup (@sharathkurup).</description>
    <link>https://dev.to/sharathkurup</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3855663%2F841065e1-5964-4cef-8344-4800620c456f.jpg</url>
      <title>DEV Community: Sharath Kurup</title>
      <link>https://dev.to/sharathkurup</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/sharathkurup"/>
    <language>en</language>
    <item>
      <title>Understanding RAG by Building a ChatPDF App with NumPy (Part 1)</title>
      <dc:creator>Sharath Kurup</dc:creator>
      <pubDate>Wed, 08 Apr 2026 02:21:07 +0000</pubDate>
      <link>https://dev.to/sharathkurup/understanding-rag-by-building-a-chatpdf-app-with-numpy-part-1-14p7</link>
      <guid>https://dev.to/sharathkurup/understanding-rag-by-building-a-chatpdf-app-with-numpy-part-1-14p7</guid>
      <description>&lt;h2&gt;
  
  
  🧠 Building a Chat with PDF App (From Scratch using NumPy) – Part 1
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;Turning a simple PDF into a conversational AI system using local LLMs 🚀&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  📌 Introduction
&lt;/h2&gt;

&lt;p&gt;Have you ever wanted to &lt;em&gt;chat with your PDF documents&lt;/em&gt; like you chat with ChatGPT?&lt;/p&gt;

&lt;p&gt;In this series, I’ll walk you through building a &lt;strong&gt;ChatPDF application from scratch&lt;/strong&gt;, starting from the absolute basics and gradually improving it into a production-ready system.&lt;/p&gt;

&lt;p&gt;👉 In this first part, we’ll build a &lt;strong&gt;naive RAG (Retrieval-Augmented Generation) system using only NumPy&lt;/strong&gt; — no FAISS, no vector databases, just pure fundamentals.&lt;/p&gt;




&lt;h2&gt;
  
  
  🎯 What We’ll Build
&lt;/h2&gt;

&lt;p&gt;By the end of this article, you’ll have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;📄 A system that reads a PDF&lt;/li&gt;
&lt;li&gt;✂️ Splits it into meaningful chunks&lt;/li&gt;
&lt;li&gt;🔢 Converts text into embeddings using a local model&lt;/li&gt;
&lt;li&gt;🔍 Searches relevant content using vector similarity&lt;/li&gt;
&lt;li&gt;💬 Generates answers using an LLM&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  ⚙️ Tech Stack
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;pdfplumber&lt;/code&gt; → Extract text from PDFs&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;numpy&lt;/code&gt; → Perform vector similarity search&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ollama&lt;/code&gt; → Run local embedding + LLM models&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🧩 How It Works (High Level)
&lt;/h2&gt;

&lt;p&gt;Our pipeline looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;PDF → Text → Chunks → Embeddings → Similarity Search → LLM → Answer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  📥 Step 1: Reading the PDF
&lt;/h2&gt;

&lt;p&gt;We start by extracting text page by page:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;readpdf&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;all_texts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;pdfplumber&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;PDF_PATH&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pdf&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pdf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pages&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;extract_text&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
                &lt;span class="k"&gt;continue&lt;/span&gt;
            &lt;span class="n"&gt;all_texts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;all_texts&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  🧠 What’s happening?
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Reads each page&lt;/li&gt;
&lt;li&gt;Skips empty pages&lt;/li&gt;
&lt;li&gt;Stores &lt;code&gt;(page_number, text)&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  ✂️ Step 2: Chunking the Text
&lt;/h2&gt;

&lt;p&gt;Large text doesn’t work well for embeddings or LLMs, so we split it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_chunks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;page_num&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;chunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;end&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;CHUNK_SIZE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;last_space&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;rfind&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;last_space&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;end&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;last_space&lt;/span&gt;
                &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

        &lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;page&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;page_num&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

        &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;OVERLAP_SIZE&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  🧠 Why overlap?
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Prevents context loss between chunks&lt;/li&gt;
&lt;li&gt;Helps LLM understand continuity&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🔢 Step 3: Generating Embeddings
&lt;/h2&gt;

&lt;p&gt;We convert text into vectors using Ollama:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_embeddings_batch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;texts&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;all_embeddings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;texts&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;BATCH_SIZE&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;batch_texts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;texts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;BATCH_SIZE&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ollama&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;embed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;EMBED_MODEL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;batch_texts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;all_embeddings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;extend&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;embeddings&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;all_embeddings&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  🧠 Why batching?
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Faster processing&lt;/li&gt;
&lt;li&gt;Efficient use of resources&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  📏 Step 4: Similarity Search (Core Logic)
&lt;/h2&gt;

&lt;p&gt;Here’s where NumPy shines:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;similarities&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vector_db&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query_vector&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;top_indices&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;argsort&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;similarities&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;TOP_K&lt;/span&gt;&lt;span class="p"&gt;:][::&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  🧠 What’s happening?
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;We compute &lt;strong&gt;dot product similarity&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Higher score = more relevant chunk&lt;/li&gt;
&lt;li&gt;Select top K results&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;👉 This is essentially a &lt;strong&gt;manual vector database using NumPy&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  💬 Step 5: Generate Answer using LLM
&lt;/h2&gt;

&lt;p&gt;We pass retrieved chunks as context:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_answer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;context_chunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
Context:
&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;context_chunks&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

Question:
&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

Answer:
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ollama&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;THINKING_MODEL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;response&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  🧠 Key Idea
&lt;/h3&gt;

&lt;p&gt;We’re doing &lt;strong&gt;RAG (Retrieval-Augmented Generation)&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Retrieval → relevant chunks&lt;/li&gt;
&lt;li&gt;Generation → LLM response&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🔁 Step 6: Interactive Chat Loop
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;chat_pdf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vector_db&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text_metadata&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;user_query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;input&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You - &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;vector_db&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text_metadata&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;context_llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;res&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;generate_answer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context_llm&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now you can literally:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You - What is the main topic?
AI  - ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  🔍 Bonus: Embedding Normalization Check
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;norms&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;linalg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;norm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embeddings_array&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  🧠 Why this matters?
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;If vectors are normalized → dot product ≈ cosine similarity&lt;/li&gt;
&lt;li&gt;Improves consistency in search results&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🚨 Limitations of This Approach
&lt;/h2&gt;

&lt;p&gt;This implementation is intentionally simple — and that comes with trade-offs:&lt;/p&gt;

&lt;h3&gt;
  
  
  ⚠️ 1. Slow Search for Large PDFs
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;NumPy scans &lt;strong&gt;every vector&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;No indexing → O(n) search&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  ⚠️ 2. Not Scalable
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Works fine for small docs&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Breaks down with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Large PDFs&lt;/li&gt;
&lt;li&gt;Multiple documents&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;




&lt;h3&gt;
  
  
  ⚠️ 3. No Persistent Storage
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Embeddings are generated every run&lt;/li&gt;
&lt;li&gt;No caching or database&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  ⚠️ 4. Limited Retrieval Quality
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Pure similarity search&lt;/li&gt;
&lt;li&gt;No reranking, filtering, or hybrid search&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  ⚠️ 5. Context Limitation
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Only &lt;code&gt;TOP_K&lt;/code&gt; chunks used&lt;/li&gt;
&lt;li&gt;May miss important information&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🧠 What You Learned
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;How RAG works under the hood&lt;/li&gt;
&lt;li&gt;How embeddings enable semantic search&lt;/li&gt;
&lt;li&gt;How to build a vector search using NumPy&lt;/li&gt;
&lt;li&gt;How LLMs use context to answer questions&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🔜 What’s Next?
&lt;/h2&gt;

&lt;p&gt;In &lt;strong&gt;Part 2&lt;/strong&gt;, we’ll upgrade this system by replacing NumPy search with:&lt;/p&gt;

&lt;p&gt;➡️ &lt;strong&gt;FAISS (Facebook AI Similarity Search)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This will give us:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;⚡ Faster retrieval&lt;/li&gt;
&lt;li&gt;📈 Better scalability&lt;/li&gt;
&lt;li&gt;🧠 Efficient indexing&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  📂 Project Repo
&lt;/h2&gt;

&lt;p&gt;👉 GitHub: &lt;a href="https://github.com/SharathKurup/chatPDF/tree/numpy_vector" rel="noopener noreferrer"&gt;https://github.com/SharathKurup/chatPDF/tree/numpy_vector&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  💬 Final Thoughts
&lt;/h2&gt;

&lt;p&gt;This is the &lt;strong&gt;most important step&lt;/strong&gt; in understanding RAG systems:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Before using fancy tools like FAISS or vector DBs,&lt;br&gt;
you should understand what’s happening underneath.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Once you get this, everything else becomes easy.&lt;/p&gt;




&lt;p&gt;If you're building something similar or experimenting with local LLMs, I’d love to hear your thoughts 👇&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Stay tuned for Part 2 🚀&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>rag</category>
      <category>tutorial</category>
    </item>
  </channel>
</rss>
