<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Sourav Dwivedi</title>
    <description>The latest articles on DEV Community by Sourav Dwivedi (@srvdwivedi).</description>
    <link>https://dev.to/srvdwivedi</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F539041%2F888f31e9-c9a6-4a58-8b44-438f5ef1f79f.jpeg</url>
      <title>DEV Community: Sourav Dwivedi</title>
      <link>https://dev.to/srvdwivedi</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/srvdwivedi"/>
    <language>en</language>
    <item>
      <title>Building a Simple RAG Document Assistant with LangChain and GPT</title>
      <dc:creator>Sourav Dwivedi</dc:creator>
      <pubDate>Sun, 15 Mar 2026 10:52:19 +0000</pubDate>
      <link>https://dev.to/srvdwivedi/building-a-simple-rag-document-assistant-with-langchain-and-gpt-2a7l</link>
      <guid>https://dev.to/srvdwivedi/building-a-simple-rag-document-assistant-with-langchain-and-gpt-2a7l</guid>
      <description>&lt;p&gt;Large Language Models are great at generating text and answering general&lt;br&gt;
questions. However, they struggle when we ask questions about &lt;strong&gt;specific&lt;br&gt;
documents they have never seen before&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  What are the key insights in this PDF report?&lt;/li&gt;
&lt;li&gt;  Can you summarize section 3 of this document?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;LLMs alone cannot reliably answer these questions because they &lt;strong&gt;do not&lt;br&gt;
have access to your private or custom data&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This is where &lt;strong&gt;Retrieval Augmented Generation (RAG)&lt;/strong&gt; comes in.&lt;/p&gt;

&lt;p&gt;In this article, I will walk through how I built a &lt;strong&gt;RAG-based document&lt;br&gt;
assistant&lt;/strong&gt; using:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Python&lt;/li&gt;
&lt;li&gt;  LangChain&lt;/li&gt;
&lt;li&gt;  OpenAI GPT&lt;/li&gt;
&lt;li&gt;  Chroma Vector Database&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The result is a system that allows users to &lt;strong&gt;chat with their&lt;br&gt;
documents&lt;/strong&gt;.&lt;/p&gt;


&lt;h2&gt;
  
  
  What We Are Building
&lt;/h2&gt;

&lt;p&gt;We are creating a document assistant that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Loads a PDF document&lt;/li&gt;
&lt;li&gt;  Breaks it into smaller chunks&lt;/li&gt;
&lt;li&gt;  Converts the chunks into embeddings&lt;/li&gt;
&lt;li&gt;  Stores them in a vector database&lt;/li&gt;
&lt;li&gt;  Retrieves relevant chunks when a user asks a question&lt;/li&gt;
&lt;li&gt;  Uses an LLM to generate an answer based on the retrieved content&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Instead of manually searching through documents, you can simply ask:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;What does this document say about AI in healthcare?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;And receive an answer instantly.&lt;/p&gt;


&lt;h2&gt;
  
  
  Understanding RAG
&lt;/h2&gt;

&lt;p&gt;RAG stands for &lt;strong&gt;Retrieval Augmented Generation&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Instead of sending the entire document to an LLM, the system:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Retrieves relevant information&lt;/li&gt;
&lt;li&gt; Sends that context to the model&lt;/li&gt;
&lt;li&gt; Generates an answer based on the retrieved content&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This approach improves:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  accuracy&lt;/li&gt;
&lt;li&gt;  relevance&lt;/li&gt;
&lt;li&gt;  scalability&lt;/li&gt;
&lt;/ul&gt;
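To make the three steps concrete, here is a minimal pure-Python sketch of the retrieve-and-augment loop. The vectors here are made up for illustration; in the real pipeline they come from an embedding model, and step 3 (generation) is performed by GPT rather than stubbed out.

```python
import math

def cosine(a, b):
    # Cosine similarity: how aligned two vectors are, regardless of length.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Pretend chunk embeddings (real ones come from an embedding model).
chunks = {
    "AI helps doctors predict patient outcomes.": [0.9, 0.1, 0.0],
    "The report was published in 2024.": [0.1, 0.8, 0.3],
}
query = "How is AI used in healthcare?"
query_vec = [0.85, 0.15, 0.05]  # pretend embedding of the query

# Step 1 - Retrieve: pick the chunk whose vector is closest to the query.
best_chunk = max(chunks, key=lambda c: cosine(chunks[c], query_vec))

# Step 2 - Augment: place the retrieved text into the prompt as context.
prompt = f"Answer using only this context:\n{best_chunk}\n\nQuestion: {query}"

# Step 3 - Generate: in the real system, the prompt is sent to the LLM here.
print(best_chunk)
```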


&lt;h2&gt;
  
  
  System Architecture
&lt;/h2&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;PDF Document
      ↓
Document Loader
      ↓
Text Splitter
      ↓
Embeddings
      ↓
Vector Database (Chroma)
      ↓
Retriever
      ↓
LLM (GPT)
      ↓
Generated Answer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;p&gt;Each stage feeds its output into the next, forming the complete RAG pipeline.&lt;/p&gt;


&lt;h2&gt;
  
  
  Step 1: Loading the Document
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_community.document_loaders&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;PyPDFLoader&lt;/span&gt;

&lt;span class="n"&gt;loader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;PyPDFLoader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data/sample.pdf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;documents&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;loader&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;This loads the PDF as a list of Document objects, one per page, each holding the page text plus metadata such as the source file and page number.&lt;/p&gt;
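As a rough illustration of the shape of what the loader returns, here is a minimal stand-in dataclass (not LangChain's actual Document class, and the page text is invented):

```python
from dataclasses import dataclass, field

@dataclass
class Document:
    # One Document per PDF page: the extracted text plus metadata.
    page_content: str
    metadata: dict = field(default_factory=dict)

documents = [
    Document("AI is transforming healthcare...", {"source": "data/sample.pdf", "page": 0}),
    Document("Predictive analytics can forecast...", {"source": "data/sample.pdf", "page": 1}),
]

print(len(documents), documents[0].metadata["page"])
```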


&lt;h2&gt;
  
  
  Step 2: Splitting the Document
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.text_splitter&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;RecursiveCharacterTextSplitter&lt;/span&gt;

&lt;span class="n"&gt;text_splitter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;RecursiveCharacterTextSplitter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;chunk_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;chunk_overlap&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;docs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;text_splitter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split_documents&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Chunking improves retrieval: smaller chunks let the system return only the passages relevant to a query, and the 100-character overlap preserves context that would otherwise be cut off at chunk boundaries.&lt;/p&gt;
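To see what chunk_size and chunk_overlap mean, here is a naive character-level splitter. The real RecursiveCharacterTextSplitter additionally prefers to break on separators such as paragraph and sentence boundaries rather than at fixed positions:

```python
def naive_split(text, chunk_size, chunk_overlap):
    # Each chunk starts chunk_size - chunk_overlap characters after
    # the previous one, so consecutive chunks share an overlap region.
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

text = "abcdefghij" * 5  # 50 characters
chunks = naive_split(text, chunk_size=20, chunk_overlap=5)

print(len(chunks))  # 4
print(chunks[0][-5:] == chunks[1][:5])  # True: chunks share 5 characters
```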


&lt;h2&gt;
  
  
  Step 3: Creating Embeddings
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAIEmbeddings&lt;/span&gt;

&lt;span class="n"&gt;embeddings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAIEmbeddings&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text-embedding-3-small&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Each document chunk becomes a numeric vector, so chunks with similar meaning end up close together in vector space.&lt;/p&gt;


&lt;h2&gt;
  
  
  Step 4: Storing in a Vector Database
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_community.vectorstores&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Chroma&lt;/span&gt;

&lt;span class="n"&gt;vectorstore&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Chroma&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_documents&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;persist_directory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;vectordb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Chroma persists the vectors to the given directory and supports efficient similarity search over them.&lt;/p&gt;


&lt;h2&gt;
  
  
  Step 5: Retrieving Relevant Context
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;retriever&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;vectorstore&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;as_retriever&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;search_kwargs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;k&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;With k set to 4, each query returns the four most similar chunks, which become the context for the LLM.&lt;/p&gt;
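Under the hood, the retrieved chunks are "stuffed" into a single prompt before generation. A hedged sketch of that assembly step (the strings are illustrative, not the exact prompt LangChain builds):

```python
# Pretend output of the retriever: the chunks most similar to the query.
retrieved_chunks = [
    "AI assists radiologists in detecting tumors.",
    "Predictive models flag patients at risk of readmission.",
]
question = "How is AI used in hospitals?"

# Concatenate all retrieved chunks into one context block,
# then frame the question so the model answers from that context only.
context = "\n\n".join(retrieved_chunks)
prompt = (
    "Answer the question using only the context below.\n\n"
    f"Context:\n{context}\n\n"
    f"Question: {question}"
)

print(prompt)
```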


&lt;h2&gt;
  
  
  Step 6: Generating Answers with GPT
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatOpenAI&lt;/span&gt;

&lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ChatOpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o-mini&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Setting the temperature to 0 keeps answers deterministic and grounded in the retrieved context, and gpt-4o-mini is fast and cost‑effective, making it well suited to RAG systems.&lt;/p&gt;


&lt;h2&gt;
  
  
  Adding Conversational Memory
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.memory&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ConversationBufferMemory&lt;/span&gt;

&lt;span class="n"&gt;memory&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ConversationBufferMemory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;memory_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chat_history&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;return_messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;This allows the assistant to remember previous questions and responses.&lt;/p&gt;
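Conceptually, the buffer just appends every exchange and replays it as chat history on the next turn. A toy version of that idea (not the real ConversationBufferMemory API):

```python
# Each turn is stored as (role, message) and replayed on the next turn.
history = []

def record_turn(question, answer):
    history.append(("Human", question))
    history.append(("AI", answer))

record_turn("What is this document about?", "It discusses AI in healthcare.")
record_turn("What challenges are mentioned?", "Data privacy and model bias.")

# The rendered history is what gives follow-up questions their context.
chat_history = "\n".join(f"{role}: {text}" for role, text in history)
print(chat_history)
```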

&lt;p&gt;Example conversation:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User: What is this document about?

Assistant: It discusses AI applications in healthcare.

User: What challenges are mentioned?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h2&gt;
  
  
  Example Interaction
&lt;/h2&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User: What does the document say about predictive analytics?

Assistant:
The document explains that predictive analytics uses machine learning
to forecast patient outcomes and identify individuals at risk of
hospital readmission.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h2&gt;
  
  
  Technologies Used
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;  Python&lt;/li&gt;
&lt;li&gt;  LangChain&lt;/li&gt;
&lt;li&gt;  OpenAI GPT&lt;/li&gt;
&lt;li&gt;  Chroma Vector Database&lt;/li&gt;
&lt;li&gt;  Sentence Transformers&lt;/li&gt;
&lt;li&gt;  PyPDF&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;p&gt;Building this project helped demonstrate several important concepts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Retrieval Augmented Generation improves LLM accuracy&lt;/li&gt;
&lt;li&gt;  Embeddings enable semantic search&lt;/li&gt;
&lt;li&gt;  Vector databases store document knowledge efficiently&lt;/li&gt;
&lt;li&gt;  Conversational memory enhances user interaction&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Combining these technologies allows developers to build powerful AI&lt;br&gt;
applications that can interact with real‑world data.&lt;/p&gt;


&lt;h2&gt;
  
  
  Future Improvements
&lt;/h2&gt;

&lt;p&gt;Potential improvements include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Adding a web interface using Streamlit&lt;/li&gt;
&lt;li&gt;  Supporting multiple documents&lt;/li&gt;
&lt;li&gt;  Including source citations in responses&lt;/li&gt;
&lt;li&gt;  Implementing hybrid search&lt;/li&gt;
&lt;li&gt;  Deploying the assistant as an API&lt;/li&gt;
&lt;/ul&gt;
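For example, source citations could be produced by carrying each retrieved chunk's metadata through to the final answer. A hypothetical sketch (the field names and data are assumptions, not the real pipeline):

```python
# Pretend retrieved chunks, each carrying its source metadata.
retrieved = [
    {"text": "AI helps forecast outcomes.", "source": "sample.pdf", "page": 3},
    {"text": "Readmission risk can be predicted.", "source": "sample.pdf", "page": 7},
]
answer = "The document covers predictive analytics in healthcare."

# Format each chunk's provenance as a citation appended to the answer.
citations = ", ".join(f"[{c['source']} p.{c['page']}]" for c in retrieved)
print(f"{answer} Sources: {citations}")
```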


&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Retrieval Augmented Generation is an important architecture for building&lt;br&gt;
AI systems that work with external knowledge.&lt;/p&gt;

&lt;p&gt;By combining document retrieval with language models, we can create&lt;br&gt;
systems that transform static documents into &lt;strong&gt;interactive knowledge&lt;br&gt;
assistants&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This project demonstrates how a relatively simple pipeline can unlock&lt;br&gt;
powerful capabilities for &lt;strong&gt;document understanding and conversational&lt;br&gt;
AI&lt;/strong&gt;.&lt;/p&gt;

&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://assets.dev.to/assets/github-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/srvdwivedi" rel="noopener noreferrer"&gt;
        srvdwivedi
      &lt;/a&gt; / &lt;a href="https://github.com/srvdwivedi/rag_poc" rel="noopener noreferrer"&gt;
        rag_poc
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      This project is a Retrieval Augmented Generation (RAG) based document assistant that allows users to interact with a PDF document using natural language.  Instead of manually searching through documents, users can ask questions and receive context-aware answers generated by an LLM based on the document content.
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;p&gt;📄 RAG Document Assistant (LangChain + OpenAI)&lt;/p&gt;
&lt;p&gt;This project is a Retrieval Augmented Generation (RAG) based document assistant that allows users to interact with a PDF document using natural language.&lt;/p&gt;
&lt;p&gt;Instead of manually searching through documents, users can ask questions and receive context-aware answers generated by an LLM based on the document content.&lt;/p&gt;
&lt;p&gt;The system combines LangChain, OpenAI GPT, embeddings, and a vector database to retrieve relevant information and generate accurate responses.&lt;/p&gt;
&lt;p&gt;🚀 Features&lt;/p&gt;
&lt;p&gt;📑 Load and process PDF documents&lt;/p&gt;
&lt;p&gt;✂️ Split documents into manageable chunks&lt;/p&gt;
&lt;p&gt;🧠 Convert text into embeddings for semantic search&lt;/p&gt;
&lt;p&gt;📦 Store embeddings in a Chroma vector database&lt;/p&gt;
&lt;p&gt;🔎 Retrieve relevant document chunks for queries&lt;/p&gt;
&lt;p&gt;💬 Conversational question answering with memory&lt;/p&gt;
&lt;p&gt;⚡ Fast responses using GPT-4o-mini&lt;/p&gt;
&lt;p&gt;🧠 How RAG Works&lt;/p&gt;
&lt;p&gt;The system follows this pipeline:&lt;/p&gt;
&lt;p&gt;PDF Document
↓
Document Loader
↓
Text Splitter
↓
Embeddings
↓
Vector Database (Chroma)
↓
Retriever
↓
LLM (GPT)
↓
Answer…&lt;/p&gt;
&lt;/div&gt;
  &lt;/div&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/srvdwivedi/rag_poc" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;





</description>
      <category>llm</category>
      <category>python</category>
      <category>rag</category>
      <category>tutorial</category>
    </item>
  </channel>
</rss>
