<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Sanjana medidi</title>
    <description>The latest articles on DEV Community by Sanjana medidi (@sanjana_medidi_7cd09b177b).</description>
    <link>https://dev.to/sanjana_medidi_7cd09b177b</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F4002023%2F713829b4-1b79-4855-8b18-167bf849e723.jpg</url>
      <title>DEV Community: Sanjana medidi</title>
      <link>https://dev.to/sanjana_medidi_7cd09b177b</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/sanjana_medidi_7cd09b177b"/>
    <language>en</language>
    <item>
      <title>Building a RAG-Based PDF Question Answering System: Engineering Decisions, Failures, and Lessons</title>
      <dc:creator>Sanjana medidi</dc:creator>
      <pubDate>Thu, 25 Jun 2026 09:23:41 +0000</pubDate>
      <link>https://dev.to/sanjana_medidi_7cd09b177b/building-a-rag-based-pdf-question-answering-system-engineering-decisions-failures-and-lessons-ae1</link>
      <guid>https://dev.to/sanjana_medidi_7cd09b177b/building-a-rag-based-pdf-question-answering-system-engineering-decisions-failures-and-lessons-ae1</guid>
      <description>&lt;p&gt;&lt;em&gt;A technical deep-dive into StudyMate AI, a Retrieval-Augmented Generation system built with LangChain, FAISS, HuggingFace, and Groq.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;As an AI/ML student preparing applications for research internships at companies like Google, I wanted to build something that went beyond the typical classifier or fine-tuning demo. I wanted a project that demonstrated systems thinking — not just model calling. The result was &lt;strong&gt;StudyMate AI&lt;/strong&gt;: a RAG pipeline that lets you upload any PDF and ask questions about it, grounded strictly in the document's content.&lt;/p&gt;

&lt;p&gt;This post documents the real engineering decisions I made, the problems I ran into, and what I learned — including the parts that didn't work the first time.&lt;/p&gt;




&lt;h2&gt;
  
  
  What is RAG and Why Use It?
&lt;/h2&gt;

&lt;p&gt;Retrieval-Augmented Generation (RAG) is a pattern where instead of asking an LLM to answer from memory, you first retrieve relevant context from a knowledge source and inject it into the prompt. This means:&lt;/p&gt;

&lt;ul&gt;
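&lt;li&gt;The model is steered to answer from what's in your document rather than from memory&lt;/li&gt;
&lt;li&gt;You get source attribution, because each answer can point to the retrieved chunks&lt;/li&gt;
&lt;li&gt;Hallucinations are reduced, since the prompt carries the supporting text&lt;/li&gt;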
&lt;/ul&gt;

&lt;p&gt;The alternative — fine-tuning an LLM on your documents — is expensive, slow, and overkill for a single-document use case. RAG was the right architectural choice here.&lt;/p&gt;




&lt;h2&gt;
  
  
  System Architecture
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;PDF
 ↓
PyPDFLoader
 ↓
RecursiveCharacterTextSplitter  (chunk_size=800, overlap=100)
 ↓
HuggingFace Embeddings (all-MiniLM-L6-v2)  — runs locally
 ↓
FAISS Vector Store  (in-memory)
 ↓
Custom Two-Stage Retriever
(first_page_chunks + summary chunks + content chunks + broad search)
 ↓
Groq LLM  (llama-3.1-8b-instant)
 ↓
Answer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
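
&lt;p&gt;To make the chunking stage concrete, here is a minimal sketch of fixed-size splitting with overlap, using the same 800/100 settings. It is a plain sliding window, which is simpler than what &lt;code&gt;RecursiveCharacterTextSplitter&lt;/code&gt; does (that splitter prefers to break on separators such as paragraphs and sentences). The function name &lt;code&gt;split_fixed&lt;/code&gt; is hypothetical and not part of the project.&lt;/p&gt;

```python
def split_fixed(text: str, chunk_size: int = 800, overlap: int = 100) -> list[str]:
    """Split text into windows of chunk_size characters that share `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    start = 0
    while True:
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window has reached the end of the text
        start += step
    return chunks


# Illustrative input only: 2,000 characters of repeating letters.
sample = "".join(chr(97 + i % 26) for i in range(2000))
pieces = split_fixed(sample)
print([len(p) for p in pieces])
```

&lt;p&gt;The overlap means the last 100 characters of one chunk reappear at the start of the next, so a sentence cut at a boundary still appears whole in at least one chunk most of the time.&lt;/p&gt;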






&lt;h2&gt;
  
  
  Engineering Decision 1: HuggingFace Embeddings over OpenAI
&lt;/h2&gt;

&lt;p&gt;The first decision was how to embed the document chunks. OpenAI's embedding API is the popular choice, but it's pay-per-token — during development and testing, that cost accumulates quickly.&lt;/p&gt;

&lt;p&gt;HuggingFace's &lt;code&gt;all-MiniLM-L6-v2&lt;/code&gt; runs &lt;strong&gt;locally on your machine&lt;/strong&gt;, costs nothing, and requires no API key. For a single-user, single-PDF system, the quality tradeoff against a larger hosted embedding model was acceptable. The point is to choose the tool that fits the actual constraints, not the most popular one.&lt;/p&gt;




&lt;h2&gt;
  
  
  Engineering Decision 2: FAISS over Hosted Vector Databases
&lt;/h2&gt;

&lt;p&gt;Pinecone and Weaviate are common production choices for vector storage. They offer persistence across sessions, horizontal scaling, and multi-user support. None of that was needed here.&lt;/p&gt;

&lt;p&gt;FAISS runs &lt;strong&gt;in-memory with zero setup cost&lt;/strong&gt;. For one user processing one PDF at a time, it's the correct tradeoff. The rule I applied: use the simplest thing that satisfies your actual constraints. Reach for hosted infrastructure when you need persistence, concurrency, or datasets too large for memory — not before.&lt;/p&gt;
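
&lt;p&gt;For intuition about what an in-memory vector search does at this scale, the sketch below runs an exhaustive nearest-neighbour scan in plain Python. It is not FAISS and not the project's code: the three-dimensional vectors are made up to stand in for real embeddings, and &lt;code&gt;top_k&lt;/code&gt; is a hypothetical helper.&lt;/p&gt;

```python
import math


def l2_distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query: list[float], store: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the ids of the k stored vectors closest to the query."""
    ranked = sorted(store, key=lambda doc_id: l2_distance(query, store[doc_id]))
    return ranked[:k]


# Made-up 3-dimensional vectors standing in for real chunk embeddings.
store = {
    "methods": [0.9, 0.1, 0.0],
    "results": [0.1, 0.9, 0.0],
    "references": [0.0, 0.1, 0.9],
}
print(top_k([0.8, 0.2, 0.0], store))
```

&lt;p&gt;With one PDF's worth of chunks, scanning every vector per query is cheap, which is why a hosted index adds little here.&lt;/p&gt;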




&lt;h2&gt;
  
  
  Engineering Decision 3: Migrating from &lt;code&gt;RetrievalQA&lt;/code&gt; to LCEL
&lt;/h2&gt;

&lt;p&gt;During development I noticed LangChain had deprecated &lt;code&gt;RetrievalQA&lt;/code&gt;. Rather than ignore the warning and ship deprecated code, I migrated to the current &lt;strong&gt;LCEL (LangChain Expression Language)&lt;/strong&gt; chain composition pattern.&lt;/p&gt;

&lt;p&gt;The old approach was a black box. The new approach is explicit:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;retrieval_chain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;RunnablePassthrough&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;assign&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;RunnableLambda&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;retrieve_with_summary&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;format_docs&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;
    &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every step is visible — retrieval, formatting, prompting, generation. This matters for debugging and for understanding what the system is actually doing.&lt;/p&gt;
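
&lt;p&gt;To see why pipe-style composition is easy to inspect, here is a toy version of the idea in plain Python. It is a stand-in written for illustration and not LangChain's implementation: the &lt;code&gt;Step&lt;/code&gt; class and the three lambdas are hypothetical.&lt;/p&gt;

```python
class Step:
    """Toy stand-in for a runnable: wraps a function and supports `|` composition."""

    def __init__(self, fn):
        self.fn = fn

    def __or__(self, other: "Step") -> "Step":
        # a | b produces a new step that runs a, then feeds its output to b.
        return Step(lambda value: other.fn(self.fn(value)))

    def invoke(self, value):
        return self.fn(value)


# Hypothetical stages mirroring retrieval, formatting, and prompting.
retrieve = Step(lambda question: [f"chunk about {question}"])
join_docs = Step(lambda docs: "\n\n".join(docs))
build_prompt = Step(lambda context: f"Context:\n{context}\n\nAnswer from the context only.")

chain = retrieve | join_docs | build_prompt
print(chain.invoke("methodology"))
```

&lt;p&gt;Because each stage is an ordinary value, any prefix of the chain (for example &lt;code&gt;retrieve | join_docs&lt;/code&gt;) can be invoked on its own while debugging.&lt;/p&gt;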




&lt;h2&gt;
  
  
  The Problem Vanilla RAG Can't Solve
&lt;/h2&gt;

&lt;p&gt;This is where it got interesting.&lt;/p&gt;

&lt;p&gt;After building the basic pipeline, I tested it with: &lt;em&gt;"What is the main purpose of this document?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The response: &lt;strong&gt;"I cannot find the answer in the provided documents."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;But the document's purpose was clearly stated in the abstract. What went wrong?&lt;/p&gt;

&lt;p&gt;Similarity search surfaces &lt;strong&gt;locally similar chunks&lt;/strong&gt; — chunks whose text is semantically close to the query. A query about "purpose" doesn't semantically match individual chunks about methodology or findings, even though the answer exists in the document.&lt;/p&gt;

&lt;p&gt;Vanilla RAG is optimized for specific factual questions. Document-level questions — purpose, thesis, overview — require a &lt;strong&gt;global view&lt;/strong&gt; of the document that chunk-level retrieval can't provide.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Fix: Pre-Generated Summaries + First-Page Pinning
&lt;/h2&gt;

&lt;p&gt;I solved this with two additions:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Pre-generated summary chunks&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;At build time, before any user query, I generate 5 targeted summaries from the first 2,500 characters of the document and store them as special chunks in the vector store:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;summaries_to_create&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;research_question&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What is the exact research question?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;methodology&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;       &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Describe the methodology in one sentence.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;findings&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;          &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What are the main findings in one sentence?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;conclusions&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;       &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What are the conclusions in one sentence?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;limitations&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;       &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What are the limitations in one sentence?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These give the retriever a global view of the document that similarity search alone can't provide.&lt;/p&gt;
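
&lt;p&gt;A rough sketch of how such summary chunks could be built and tagged follows. It assumes a callable that sends a prompt to the LLM and returns text; here that is replaced by a stub, &lt;code&gt;fake_llm&lt;/code&gt;, so the example runs offline. The &lt;code&gt;Chunk&lt;/code&gt; class, &lt;code&gt;build_summary_chunks&lt;/code&gt;, and the &lt;code&gt;summary_name&lt;/code&gt; metadata key are hypothetical. Only the &lt;code&gt;chunk_type&lt;/code&gt; tag and the 2,500-character cap come from the project.&lt;/p&gt;

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """Minimal stand-in for a document chunk with text and metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def build_summary_chunks(document_text, questions, ask_llm, max_chars=2500):
    """Ask one question per summary over the head of the document and tag each answer."""
    head = document_text[:max_chars]  # cap the input, as in the post
    chunks = []
    for name, question in questions.items():
        answer = ask_llm(f"{question}\n\nText:\n{head}")
        chunks.append(
            Chunk(
                page_content=f"{name}: {answer}",
                metadata={"chunk_type": "summary", "summary_name": name},
            )
        )
    return chunks


def fake_llm(prompt: str) -> str:
    """Stub used instead of a real API call."""
    return "stub answer"


questions = {
    "research_question": "What is the exact research question?",
    "methodology": "Describe the methodology in one sentence.",
}
summary_chunks = build_summary_chunks("Abstract. " * 500, questions, fake_llm)
print([c.metadata["summary_name"] for c in summary_chunks])
```

&lt;p&gt;Tagging each summary with &lt;code&gt;chunk_type&lt;/code&gt; is what lets the retriever later filter for summaries separately from ordinary content chunks.&lt;/p&gt;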

&lt;p&gt;&lt;strong&gt;2. First-page pinning&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Pages 0 and 1 of an academic paper (PyPDFLoader numbers pages from zero) almost always contain the abstract and introduction, where purpose and topic live. I pin these as always-included context regardless of the query:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;retrieve_with_summary&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="n"&gt;inputs&lt;/span&gt;

    &lt;span class="n"&gt;summary_results&lt;/span&gt;    &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;vector_store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;similarity_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;filter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chunk_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;summary&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="n"&gt;content_results&lt;/span&gt;    &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;vector_store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;similarity_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;filter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chunk_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="n"&gt;broad_results&lt;/span&gt;      &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;vector_store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;similarity_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;first_page_chunks&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;chunks&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;page&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;99&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;

    &lt;span class="n"&gt;seen&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;all_docs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;first_page_chunks&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;summary_results&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;content_results&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;broad_results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;page_content&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;seen&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;seen&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;page_content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;all_docs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;all_docs&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After this fix, document-level questions worked correctly.&lt;/p&gt;




&lt;h2&gt;
  
  
  Hitting Groq's Rate Limit — and Designing Around It
&lt;/h2&gt;

&lt;p&gt;Groq's free tier allows 6,000 tokens per minute (TPM). My initial implementation used &lt;code&gt;ThreadPoolExecutor&lt;/code&gt; with multiple workers to generate summaries in parallel. The result: all 5 API calls fired within milliseconds of each other, consuming ~5,000 tokens in one second and triggering a 429 error immediately.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;Rate limit reached: Limit 6000, Used 5881, Requested 3378.
Please try again in 32.59s.
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is a real systems constraint: a shared token budget measured over a time window. Solving it required thinking about the problem like a systems engineer, not just a model user.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;max_workers=1&lt;/code&gt; — sequential generation eliminates the burst&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;time.sleep(35)&lt;/code&gt; between calls: 35s is a small buffer above the 32.59s wait that the error message asked for&lt;/li&gt;
&lt;li&gt;Input capped at &lt;code&gt;[:2500]&lt;/code&gt; characters, which is roughly 600 to 700 tokens of document text per prompt at about 4 characters per token, so 5 spaced-out summaries stay within the TPM limit&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The tradeoff is ~3 minutes of startup time on the free tier. On Groq's Dev tier (30,000 TPM), the sleep can be removed entirely and workers restored — startup drops to under 10 seconds.&lt;/p&gt;
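
&lt;p&gt;The budget arithmetic behind these numbers can be sketched as below. This is a simplified model that treats the limit as an even per-minute budget, and Groq's actual accounting may differ. The 4-characters-per-token ratio is a rule of thumb, the 200-token allowance for instructions and the response is an assumption, and both function names are hypothetical.&lt;/p&gt;

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate from character count (rule of thumb, not a tokenizer)."""
    return math.ceil(len(text) / chars_per_token)


def min_gap_seconds(tokens_per_call: int, tpm_limit: int = 6000) -> float:
    """Smallest even spacing between calls that keeps one minute within the budget."""
    calls_per_minute = tpm_limit // tokens_per_call
    if calls_per_minute == 0:
        raise ValueError("a single call exceeds the per-minute token budget")
    return 60 / calls_per_minute


prompt_tokens = estimate_tokens("x" * 2500)  # document text capped at 2,500 characters
budget_per_call = prompt_tokens + 200        # assumed allowance for instructions and output
print(prompt_tokens, round(min_gap_seconds(budget_per_call), 2))

# The request size reported in the 429 message, for comparison.
print(min_gap_seconds(3378))
```

&lt;p&gt;Under these assumptions the capped prompts leave plenty of headroom at 35 seconds apart, while a request as large as the one in the error message would fit only once per minute.&lt;/p&gt;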




&lt;h2&gt;
  
  
  Hallucination Guardrail
&lt;/h2&gt;

&lt;p&gt;The system prompt strictly instructs the model to refuse answering if the context doesn't support it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Strict Rules:
1. Rely ONLY on the clear facts directly mentioned in the context.
2. Do NOT assume, extrapolate, or bring in outside knowledge.
3. If the context does not contain the answer, reply exactly:
   "I cannot find the answer in the provided documents."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
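
&lt;p&gt;One way to wire these rules into a prompt, and to detect the refusal afterwards, is sketched below in plain Python. The rule text is taken from the block above. The &lt;code&gt;Context:&lt;/code&gt; and &lt;code&gt;Question:&lt;/code&gt; layout and the helper names &lt;code&gt;build_prompt&lt;/code&gt; and &lt;code&gt;is_refusal&lt;/code&gt; are assumptions for illustration, not the project's exact template.&lt;/p&gt;

```python
REFUSAL = "I cannot find the answer in the provided documents."


def build_prompt(context_chunks: list[str], question: str) -> str:
    """Assemble the guardrail rules, the retrieved context, and the question."""
    context = "\n\n".join(context_chunks) if context_chunks else "(no context retrieved)"
    return (
        "Strict Rules:\n"
        "1. Rely ONLY on the clear facts directly mentioned in the context.\n"
        "2. Do NOT assume, extrapolate, or bring in outside knowledge.\n"
        "3. If the context does not contain the answer, reply exactly:\n"
        f'   "{REFUSAL}"\n\n'
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


def is_refusal(answer: str) -> bool:
    """True when the model replied with the exact refusal sentence."""
    return answer.strip() == REFUSAL


prompt_text = build_prompt(["The study surveyed language teachers."], "What is the capital of France?")
print(prompt_text)
```

&lt;p&gt;Asking for an exact refusal sentence makes the out-of-scope case easy to check in code, for example to hide source links when nothing relevant was found.&lt;/p&gt;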



&lt;p&gt;Test result with an out-of-scope question:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Q: What is the capital of France?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I cannot find the answer in the provided documents.&lt;/p&gt;

&lt;p&gt;Source 1 (page 3): The EUROCALL Review, Volume 25, No. 2, September 2017...&lt;br&gt;
Source 2 (page 2): ...research question, description of participants...&lt;br&gt;
Source 3 (page 8): ...referred their students to electronic or online resources...&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The system refuses rather than hallucinating, and it still returns the chunks it retrieved, which makes the retrieval step transparent.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Vanilla RAG is not enough for document-level questions.&lt;/strong&gt; Chunk similarity search is optimized for factual, specific queries. Any question requiring a global view of the document — purpose, thesis, summary — needs a separate strategy: pre-summarization, large-k retrieval, or a dedicated summary index.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rate limits are a systems design problem.&lt;/strong&gt; The solution isn't just adding a sleep — it's understanding the constraint (TPM budget), calculating the safe parameters (tokens per call × calls per minute), and designing the pipeline around them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Read the deprecation warnings.&lt;/strong&gt; &lt;code&gt;RetrievalQA&lt;/code&gt; and &lt;code&gt;langchain-community&lt;/code&gt; both flagged deprecation during development. Ignoring them is technical debt. Evaluating them — deciding when to migrate and when to defer — is engineering judgment.&lt;/p&gt;




&lt;h2&gt;
  
  
  Stack
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Choice&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Frontend&lt;/td&gt;
&lt;td&gt;Streamlit&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LLM&lt;/td&gt;
&lt;td&gt;Groq (llama-3.1-8b-instant)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Embeddings&lt;/td&gt;
&lt;td&gt;HuggingFace all-MiniLM-L6-v2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Vector Store&lt;/td&gt;
&lt;td&gt;FAISS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Chain&lt;/td&gt;
&lt;td&gt;LangChain LCEL&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/Sanjana-Medidi/studymate-ai" rel="noopener noreferrer"&gt;github.com/Sanjana-Medidi/studymate-ai&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Built as part of my AI/ML portfolio while preparing research internship applications. Feedback welcome.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>langchain</category>
      <category>nlp</category>
      <category>rag</category>
    </item>
  </channel>
</rss>
