Large Language Models are great at generating text and answering general
questions. However, they struggle when we ask questions about specific
documents they have never seen before.
For example:
- What are the key insights in this PDF report?
- Can you summarize section 3 of this document?
LLMs alone cannot reliably answer these questions because they do not
have access to your private or custom data.
This is where Retrieval Augmented Generation (RAG) comes in.
In this article, I will walk through how I built a RAG-based document
assistant using:
- Python
- LangChain
- OpenAI GPT
- Chroma Vector Database
The result is a system that allows users to chat with their
documents.
What We Are Building
We are creating a document assistant that:
- Loads a PDF document
- Breaks it into smaller chunks
- Converts the chunks into embeddings
- Stores them in a vector database
- Retrieves relevant chunks when a user asks a question
- Uses an LLM to generate an answer based on the retrieved content
Instead of manually searching through documents, you can simply ask:
What does this document say about AI in healthcare?
And receive an answer instantly.
Understanding RAG
RAG stands for Retrieval Augmented Generation.
Instead of sending the entire document to an LLM, the system:
- Retrieves relevant information
- Sends that context to the model
- Generates an answer based on the retrieved content
This approach improves:
- accuracy
- relevance
- scalability
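To make the retrieve-then-generate loop concrete, here is a deliberately tiny sketch in plain Python. The chunk texts are invented, and simple keyword overlap stands in for real embedding-based retrieval; the resulting prompt is what would be sent to the LLM.

```python
# Toy illustration only: keyword overlap stands in for embedding-based
# retrieval, and the chunks are invented example sentences.
chunks = [
    "AI helps doctors detect diseases earlier.",
    "The report was published in 2023.",
    "Predictive analytics forecasts patient outcomes.",
]

def retrieve(question, chunks, k=2):
    """Rank chunks by shared words with the question; keep the top k."""
    q_words = set(question.lower().split())
    ranked = sorted(chunks, key=lambda c: -len(q_words & set(c.lower().split())))
    return ranked[:k]

def build_prompt(question, context_chunks):
    """Assemble the retrieved context and the question into one prompt."""
    context = "\n".join(context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

question = "What forecasts patient outcomes?"
prompt = build_prompt(question, retrieve(question, chunks))
print(prompt)
```

The key point is that the model only ever sees the retrieved context plus the question, not the whole document.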
System Architecture
PDF Document
↓
Document Loader
↓
Text Splitter
↓
Embeddings
↓
Vector Database (Chroma)
↓
Retriever
↓
LLM (GPT)
↓
Generated Answer
Each stage transforms the data and hands it to the next, forming the RAG pipeline.
Step 1: Loading the Document
from langchain_community.document_loaders import PyPDFLoader
loader = PyPDFLoader("data/sample.pdf")
documents = loader.load()
PyPDFLoader turns each page of the PDF into a Document object containing the page text and metadata, which the rest of the pipeline can process.
Step 2: Splitting the Document
from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=100
)
docs = text_splitter.split_documents(documents)
Chunking improves retrieval accuracy and efficiency.
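To see how chunk_size and chunk_overlap interact, here is a simplified character-window splitter in plain Python. The numbers are shrunk for readability, and the real RecursiveCharacterTextSplitter is smarter: it prefers to break on paragraph, sentence, and word boundaries before falling back to raw characters.

```python
def split_text(text, chunk_size=10, chunk_overlap=4):
    """Fixed-size sliding window: each chunk repeats the last
    `chunk_overlap` characters of the previous one."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = split_text("abcdefghijklmnopqrst")
print(chunks)  # overlapping windows over the text
```

The overlap means a sentence cut in half at a chunk boundary still appears whole in at least one chunk, which is why it helps retrieval.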
Step 3: Creating Embeddings
from langchain_openai import OpenAIEmbeddings
embeddings = OpenAIEmbeddings(
    model="text-embedding-3-small"
)
Each document chunk becomes a numeric vector representation.
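Conceptually, "similar meaning" becomes "nearby vectors". A minimal sketch with made-up 3-dimensional vectors (text-embedding-3-small actually returns 1536-dimensional ones); vector stores typically compare them with cosine similarity:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up 3-dimensional vectors for illustration only.
query_vec = [0.8, 0.2, 0.1]
related_doc = [0.9, 0.1, 0.0]
unrelated_doc = [0.0, 0.1, 0.9]

print(cosine_similarity(query_vec, related_doc))    # close to 1
print(cosine_similarity(query_vec, unrelated_doc))  # much smaller
```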
Step 4: Storing in a Vector Database
from langchain_community.vectorstores import Chroma
vectorstore = Chroma.from_documents(
    docs,
    embeddings,
    persist_directory="vectordb"
)
The database allows efficient similarity search.
Step 5: Retrieving Relevant Context
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})
The retriever returns the four most similar chunks (k=4), which serve as the context for the LLM.
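Under the hood, the retriever embeds the question and returns the k closest stored chunks. A brute-force sketch of that search (Chroma uses an approximate HNSW index rather than scanning everything; the chunk names and 2-dimensional vectors below are invented):

```python
import math

def top_k(query_vec, store, k=4):
    """Score every stored vector against the query and return the k best
    documents -- a brute-force version of what the retriever does."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.sqrt(sum(x * x for x in a)) *
                      math.sqrt(sum(y * y for y in b)))
    ranked = sorted(store.items(), key=lambda item: -cos(query_vec, item[1]))
    return [doc for doc, _ in ranked[:k]]

# Invented chunk names and 2-dimensional vectors, purely for illustration.
store = {
    "chunk about readmission risk": [0.9, 0.1],
    "chunk about hospital staffing": [0.5, 0.5],
    "chunk about billing codes": [0.1, 0.9],
}
print(top_k([0.8, 0.2], store, k=2))
```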
Step 6: Generating Answers with GPT
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(
    model="gpt-4o-mini",
    temperature=0
)
This model is fast and cost‑effective, making it suitable for RAG
systems.
Adding Conversational Memory
from langchain.memory import ConversationBufferMemory
memory = ConversationBufferMemory(
    memory_key="chat_history",
    return_messages=True
)
This allows the assistant to remember previous questions and responses.
Example conversation:
User: What is this document about?
Assistant: It discusses AI applications in healthcare.
User: What challenges are mentioned?
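The steps above define the components but never connect them. One way to wire the retriever, LLM, and memory together is LangChain's ConversationalRetrievalChain; a sketch, assuming the llm, retriever, and memory objects from the previous steps and an OPENAI_API_KEY in the environment:

```python
from langchain.chains import ConversationalRetrievalChain

qa_chain = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=retriever,
    memory=memory,
)

# Each call retrieves context, generates an answer, and appends the
# exchange to memory, so follow-up questions can refer back to it.
result = qa_chain.invoke({"question": "What is this document about?"})
print(result["answer"])

result = qa_chain.invoke({"question": "What challenges are mentioned?"})
print(result["answer"])
```

With memory attached, the chain rewrites follow-up questions using the chat history before retrieval, which is what makes the conversation above work.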
Example Interaction
User: What does the document say about predictive analytics?
Assistant:
The document explains that predictive analytics uses machine learning
to forecast patient outcomes and identify individuals at risk of
hospital readmission.
Technologies Used
- Python
- LangChain
- OpenAI GPT
- Chroma Vector Database
- Sentence Transformers
- PyPDF
Key Takeaways
Building this project helped demonstrate several important concepts:
- Retrieval Augmented Generation improves LLM accuracy
- Embeddings enable semantic search
- Vector databases store document knowledge efficiently
- Conversational memory enhances user interaction
Combining these technologies allows developers to build powerful AI
applications that can interact with real‑world data.
Future Improvements
Potential improvements include:
- Adding a web interface using Streamlit
- Supporting multiple documents
- Including source citations in responses
- Implementing hybrid search
- Deploying the assistant as an API
Conclusion
Retrieval Augmented Generation is an important architecture for building
AI systems that work with external knowledge.
By combining document retrieval with language models, we can create
systems that transform static documents into interactive knowledge
assistants.
This project demonstrates how a relatively simple pipeline can unlock
powerful capabilities for document understanding and conversational
AI.
The full source code is available on GitHub: srvdwivedi/rag_poc.