DEV Community

Sourav Dwivedi

Building a Simple RAG Document Assistant with LangChain and GPT

Large Language Models are great at generating text and answering general
questions. However, they struggle when we ask questions about specific
documents they have never seen before.

For example:

  • What are the key insights in this PDF report?
  • Can you summarize section 3 of this document?

LLMs alone cannot reliably answer these questions because they do not
have access to your private or custom data.

This is where Retrieval Augmented Generation (RAG) comes in.

In this article, I will walk through how I built a RAG-based document
assistant using:

  • Python
  • LangChain
  • OpenAI GPT
  • Chroma Vector Database

The result is a system that allows users to chat with their documents.


What We Are Building

We are creating a document assistant that:

  • Loads a PDF document
  • Breaks it into smaller chunks
  • Converts the chunks into embeddings
  • Stores them in a vector database
  • Retrieves relevant chunks when a user asks a question
  • Uses an LLM to generate an answer based on the retrieved content

Instead of manually searching through documents, you can simply ask:

```
What does this document say about AI in healthcare?
```

And get back an answer grounded in the document's own content.


Understanding RAG

RAG stands for Retrieval Augmented Generation.

Instead of sending the entire document to an LLM, the system:

  1. Retrieves relevant information
  2. Sends that context to the model
  3. Generates an answer based on the retrieved content

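This approach improves:

  • accuracy, because answers are grounded in the document's own text
  • relevance, because only passages related to the question reach the model
  • scalability, because documents larger than the model's context window can still be queried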
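The three steps above can be sketched in plain Python. This is a toy, not the article's pipeline: word overlap stands in for embedding similarity, and `generate` is a hypothetical stub marking where a real LLM call would go.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase word set; a crude stand-in for embeddings."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Step 1: rank chunks by how many words they share with the question."""
    q = tokenize(question)
    ranked = sorted(chunks, key=lambda c: len(q & tokenize(c)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Step 2: send only the retrieved chunks to the model, not the whole document."""
    joined = "\n---\n".join(context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"


def generate(prompt: str) -> str:
    """Step 3: hypothetical stub; a real system would call an LLM here."""
    return f"[LLM answer based on a {len(prompt)}-character prompt]"


chunks = [
    "Predictive analytics forecasts patient outcomes.",
    "The cafeteria opens at 8am.",
    "Machine learning flags patients at risk of readmission.",
]
question = "How does machine learning help predict patient outcomes?"
context = retrieve(question, chunks)
print(context)
print(generate(build_prompt(question, context)))
```

The irrelevant cafeteria chunk never reaches the model, which is the whole point of the retrieval step.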

System Architecture

```
PDF Document
      ↓
Document Loader
      ↓
Text Splitter
      ↓
Embeddings
      ↓
Vector Database (Chroma)
      ↓
Retriever
      ↓
LLM (GPT)
      ↓
Generated Answer
```

Each stage passes its output to the next one. The steps below build the
pipeline one stage at a time.


Step 1: Loading the Document

```python
from langchain_community.document_loaders import PyPDFLoader

# Load the PDF; PyPDFLoader produces one Document per page
loader = PyPDFLoader("data/sample.pdf")
documents = loader.load()
```

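loader.load() returns a list of Document objects, one per page. Each holds
the extracted page text in page_content and metadata such as the source
file path and page number.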


Step 2: Splitting the Document

```python
# Text splitters live in the langchain-text-splitters package
from langchain_text_splitters import RecursiveCharacterTextSplitter

# ~500-character chunks; neighbouring chunks share up to 100 characters
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=100
)

docs = text_splitter.split_documents(documents)
```

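chunk_size=500 caps each chunk at about 500 characters (the splitter
measures length in characters by default), and chunk_overlap=100 lets
consecutive chunks share up to 100 characters, so a sentence cut at a
chunk boundary still appears whole in one of them. Small chunks make
retrieval more precise and keep the prompt sent to the LLM short.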
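To see what the two numbers do, here is a simplified fixed-window chunker. It is not how RecursiveCharacterTextSplitter works internally (that splitter first tries to break on paragraphs, then lines, then words); it only illustrates chunk size and overlap. The function name `chunk_text` is hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 100) -> list[str]:
    """Fixed-size character windows. Each window starts chunk_size - chunk_overlap
    characters after the previous one, so neighbours share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


# 1,200 characters of digits so the overlap is easy to inspect
text = "".join(str(i % 10) for i in range(1200))
pieces = chunk_text(text)
print([len(p) for p in pieces])
print(pieces[0][-100:] == pieces[1][:100])  # the shared 100-character overlap
```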


Step 3: Creating Embeddings

```python
from langchain_openai import OpenAIEmbeddings

# Embedding model used to vectorize chunks (Step 4) and questions (Step 5)
embeddings = OpenAIEmbeddings(
    model="text-embedding-3-small"
)
```

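The embedding model turns each chunk (and, later, each question) into a
numeric vector; text-embedding-3-small produces 1,536-dimensional vectors
by default. Texts with similar meaning map to nearby vectors, which is
what makes semantic search possible.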
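"Nearby" is usually measured with cosine similarity: the dot product of two vectors divided by the product of their lengths. The sketch below uses made-up 3-dimensional vectors purely for illustration; real embeddings are far longer, and OpenAI documents its embeddings as normalized to length 1, in which case cosine similarity reduces to a plain dot product.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """cos(theta) = (a . b) / (|a| * |b|); 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy "embeddings", not real model output
question = [0.9, 0.1, 0.0]
chunk_about_ai = [0.8, 0.2, 0.1]
chunk_about_food = [0.0, 0.1, 0.9]

sim_ai = cosine_similarity(question, chunk_about_ai)
sim_food = cosine_similarity(question, chunk_about_food)
print(round(sim_ai, 3), round(sim_food, 3))
```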


Step 4: Storing in a Vector Database

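```python
# pip install langchain-chroma (the langchain_community Chroma class is deprecated)
from langchain_chroma import Chroma

# Embed every chunk and store it; persist_directory saves the collection to disk
vectorstore = Chroma.from_documents(
    docs,
    embeddings,
    persist_directory="vectordb"
)
```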

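Chroma stores each chunk alongside its vector and supports fast
similarity search over them. Because persist_directory is set, the
collection is also saved to the vectordb folder, so the document does not
need to be re-embedded on every run.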
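Conceptually, a vector store keeps (vector, text) pairs and returns the texts whose vectors score highest against a query vector. The class below is a hypothetical brute-force toy that scores every stored chunk; real vector databases such as Chroma add indexing so they avoid that full scan, plus persistence and metadata filtering.

```python
import heapq
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


class TinyVectorStore:
    """Hypothetical toy store: keeps (vector, text) pairs, searches by brute force."""

    def __init__(self) -> None:
        self._rows: list[tuple[list[float], str]] = []

    def add(self, vector: list[float], text: str) -> None:
        self._rows.append((vector, text))

    def search(self, query: list[float], k: int = 4) -> list[str]:
        # Score every stored chunk, then keep the k best (highest similarity first)
        scored = ((cosine(query, vec), text) for vec, text in self._rows)
        return [text for _, text in heapq.nlargest(k, scored)]


store = TinyVectorStore()
store.add([1.0, 0.0], "chunk about AI in healthcare")
store.add([0.0, 1.0], "chunk about hospital parking")
store.add([0.9, 0.1], "chunk about predictive analytics")
results = store.search([1.0, 0.05], k=2)
print(results)
```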


Step 5: Retrieving Relevant Context

```python
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})
```

The retriever embeds the user's question and returns the k = 4 chunks
whose vectors are closest to it. These chunks become the context passed
to the LLM.
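Before the retrieved chunks reach the LLM they have to be joined into a single context string. A minimal sketch, assuming chunks carry `page_content` and a `metadata` dict with a `page` key (the fields PyPDFLoader documents have); `Doc` is a hypothetical stand-in so the example runs without LangChain, and `format_docs` is likewise a hypothetical helper.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved chunk: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs(docs: list[Doc]) -> str:
    """Join retrieved chunks into one context string, tagging each with its page."""
    parts = []
    for i, doc in enumerate(docs, start=1):
        page = doc.metadata.get("page", "?")
        parts.append(f"[{i}] (page {page}) {doc.page_content}")
    return "\n\n".join(parts)


retrieved = [
    Doc("Predictive analytics forecasts patient outcomes.", {"page": 3}),
    Doc("Models flag patients at risk of readmission.", {"page": 4}),
]
print(format_docs(retrieved))
```

Keeping the page tag in the context lets the model mention where an answer came from, which is a first step toward source citations.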


Step 6: Generating Answers with GPT

```python
from langchain_openai import ChatOpenAI

# temperature=0 keeps answers as consistent as possible between runs
llm = ChatOpenAI(
    model="gpt-4o-mini",
    temperature=0
)
```

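This model is fast and cost-effective, which suits RAG systems that make
one model call per question. Setting temperature=0 keeps the output as
consistent as possible, which fits factual question answering.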
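Chat models like this one take a list of messages. A common RAG pattern is a system message telling the model to answer only from the supplied context, followed by a user message holding the context and the question. The sketch below builds that list in the OpenAI-style role/content format; `build_messages` and the prompt wording are hypothetical, not taken from the project.

```python
def build_messages(context: str, question: str) -> list[dict[str, str]]:
    """Hypothetical helper: package retrieved context and a question as chat messages."""
    system = (
        "You answer questions about a document. Use only the context provided. "
        "If the context does not contain the answer, say you don't know."
    )
    user = f"Context:\n{context}\n\nQuestion: {question}"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]


messages = build_messages(
    "Predictive analytics uses machine learning to forecast patient outcomes.",
    "What does the document say about predictive analytics?",
)
for m in messages:
    print(m["role"], "->", m["content"][:40])
```

The "say you don't know" instruction matters: without it, the model may fall back on general knowledge when retrieval misses, which defeats the purpose of grounding answers in the document.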


Adding Conversational Memory

```python
from langchain.memory import ConversationBufferMemory

# Keep the full chat history, exposed as "chat_history" in message form
memory = ConversationBufferMemory(
    memory_key="chat_history",
    return_messages=True
)
```

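The buffer stores every previous question and answer verbatim, so the
assistant can resolve follow-up questions that refer back to earlier
turns. memory_key sets the variable name under which that history is
passed to the chain or prompt.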
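Buffer memory is simple at heart: keep every turn and prepend the history to each new question. The class below is a hypothetical plain-Python sketch of that idea, not LangChain's implementation; it replays the example conversation that follows.

```python
class ChatBuffer:
    """Hypothetical buffer memory: keeps every turn verbatim, like a transcript."""

    def __init__(self) -> None:
        self.messages: list[dict[str, str]] = []

    def save_turn(self, question: str, answer: str) -> None:
        self.messages.append({"role": "user", "content": question})
        self.messages.append({"role": "assistant", "content": answer})

    def with_history(self, new_question: str) -> list[dict[str, str]]:
        # Earlier turns come first so the model can resolve references like "it"
        return self.messages + [{"role": "user", "content": new_question}]


memory = ChatBuffer()
memory.save_turn(
    "What is this document about?",
    "It discusses AI applications in healthcare.",
)
msgs = memory.with_history("What challenges are mentioned?")
for m in msgs:
    print(f'{m["role"]}: {m["content"]}')
```

Because a buffer grows with every turn, long conversations eventually outgrow the model's context window; windowed or summarizing memories trade some detail for a bounded size.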

Example conversation:

```
User: What is this document about?

A: It discusses AI applications in healthcare.

User: What challenges are mentioned?
```

Example Interaction

```
User: What does the document say about predictive analytics?

A:
The document explains that predictive analytics uses machine learning
to forecast patient outcomes and identify individuals at risk of
hospital readmission.
```

Technologies Used

  • Python
  • LangChain
  • OpenAI GPT
  • Chroma Vector Database
  • Sentence Transformers
  • PyPDF

Key Takeaways

Building this project helped demonstrate several important concepts:

  • Retrieval Augmented Generation improves LLM accuracy
  • Embeddings enable semantic search
  • Vector databases store document knowledge efficiently
  • Conversational memory enhances user interaction

Combining these technologies allows developers to build powerful AI
applications that can interact with real‑world data.


Future Improvements

Potential improvements include:

  • Adding a web interface using Streamlit
  • Supporting multiple documents
  • Including source citations in responses
  • Implementing hybrid search
  • Deploying the assistant as an API

Conclusion

Retrieval Augmented Generation is an important architecture for building
AI systems that work with external knowledge.

By combining document retrieval with language models, we can create
systems that transform static documents into interactive knowledge
assistants.

This project demonstrates how a relatively simple pipeline can unlock
powerful capabilities for document understanding and conversational AI.

GitHub repository: srvdwivedi / rag_poc

📄 RAG Document Assistant (LangChain + OpenAI)

This project is a Retrieval Augmented Generation (RAG) based document assistant that allows users to interact with a PDF document using natural language.

Instead of manually searching through documents, users can ask questions and receive context-aware answers generated by an LLM based on the document content.

The system combines LangChain, OpenAI GPT, embeddings, and a vector database to retrieve relevant information and generate accurate responses.

🚀 Features

📑 Load and process PDF documents

✂️ Split documents into manageable chunks

🧠 Convert text into embeddings for semantic search

📦 Store embeddings in a Chroma vector database

🔎 Retrieve relevant document chunks for queries

💬 Conversational question answering with memory

⚡ Fast responses using GPT-4o-mini

🧠 How RAG Works

The system follows this pipeline:

PDF Document ↓ Document Loader ↓ Text Splitter ↓ Embeddings ↓ Vector Database (Chroma) ↓ Retriever ↓ LLM (GPT) ↓ Answer…
