Large Language Models are great at generating text and answering general
questions. However, they struggle when we ask questions about specific
documents they have never seen before.
For example:
- What are the key insights in this PDF report?
- Can you summarize section 3 of this document?
LLMs alone cannot reliably answer these questions because they do not
have access to your private or custom data.
This is where Retrieval Augmented Generation (RAG) comes in.
In this article, I will walk through how I built a RAG-based document
assistant using:
- Python
- LangChain
- OpenAI GPT
- Chroma Vector Database
The result is a system that allows users to chat with their
documents.
What We Are Building
We are creating a document assistant that:
- Loads a PDF document
- Breaks it into smaller chunks
- Converts the chunks into embeddings
- Stores them in a vector database
- Retrieves relevant chunks when a user asks a question
- Uses an LLM to generate an answer based on the retrieved content
Instead of manually searching through documents, you can simply ask:
What does this document say about AI in healthcare?
And receive an answer instantly.
Understanding RAG
RAG stands for Retrieval Augmented Generation.
Instead of sending the entire document to an LLM, the system:
- Retrieves relevant information
- Sends that context to the model
- Generates an answer based on the retrieved content
This approach improves:
- accuracy
- relevance
- scalability
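To make the retrieve-then-generate loop concrete, here is a deliberately tiny sketch in plain Python. The chunk texts are invented, and simple keyword overlap stands in for real embedding-based retrieval; the resulting prompt is what would be sent to the LLM.

```python
# Toy illustration only: keyword overlap stands in for embedding-based
# retrieval, and the chunks are invented example sentences.
chunks = [
    "AI helps doctors detect diseases earlier.",
    "The report was published in 2023.",
    "Predictive analytics forecasts patient outcomes.",
]

def retrieve(question, chunks, k=2):
    """Rank chunks by shared words with the question; keep the top k."""
    q_words = set(question.lower().split())
    ranked = sorted(chunks, key=lambda c: -len(q_words & set(c.lower().split())))
    return ranked[:k]

def build_prompt(question, context_chunks):
    """Assemble the retrieved context and the question into one prompt."""
    context = "\n".join(context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

question = "What forecasts patient outcomes?"
prompt = build_prompt(question, retrieve(question, chunks))
print(prompt)
```

The key point is that the model only ever sees the retrieved context plus the question, not the whole document.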
System Architecture
PDF Document
↓
Document Loader
↓
Text Splitter
↓
Embeddings
↓
Vector Database (Chroma)
↓
Retriever
↓
LLM (GPT)
↓
Generated Answer
Each stage transforms the data and hands it to the next, forming the RAG pipeline.
Step 1: Loading the Document
from langchain_community.document_loaders import PyPDFLoader
loader = PyPDFLoader("data/sample.pdf")
documents = loader.load()
PyPDFLoader turns each page of the PDF into a Document object containing the page text and metadata, which the rest of the pipeline can process.
Step 2: Splitting the Document
from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=100
)
docs = text_splitter.split_documents(documents)
Chunking improves retrieval accuracy and efficiency.
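To see how chunk_size and chunk_overlap interact, here is a simplified character-window splitter in plain Python. The numbers are shrunk for readability, and the real RecursiveCharacterTextSplitter is smarter: it prefers to break on paragraph, sentence, and word boundaries before falling back to raw characters.

```python
def split_text(text, chunk_size=10, chunk_overlap=4):
    """Fixed-size sliding window: each chunk repeats the last
    `chunk_overlap` characters of the previous one."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = split_text("abcdefghijklmnopqrst")
print(chunks)  # overlapping windows over the text
```

The overlap means a sentence cut in half at a chunk boundary still appears whole in at least one chunk, which is why it helps retrieval.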
Step 3: Creating Embeddings
from langchain_openai import OpenAIEmbeddings
embeddings = OpenAIEmbeddings(
    model="text-embedding-3-small"
)
Each document chunk becomes a numeric vector representation.
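Conceptually, "similar meaning" becomes "nearby vectors". A minimal sketch with made-up 3-dimensional vectors (text-embedding-3-small actually returns 1536-dimensional ones); vector stores typically compare them with cosine similarity:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up 3-dimensional vectors for illustration only.
query_vec = [0.8, 0.2, 0.1]
related_doc = [0.9, 0.1, 0.0]
unrelated_doc = [0.0, 0.1, 0.9]

print(cosine_similarity(query_vec, related_doc))    # close to 1
print(cosine_similarity(query_vec, unrelated_doc))  # much smaller
```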
Step 4: Storing in a Vector Database
from langchain_community.vectorstores import Chroma
vectorstore = Chroma.from_documents(
    docs,
    embeddings,
    persist_directory="vectordb"
)
The database allows efficient similarity search.
Step 5: Retrieving Relevant Context
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})
The retriever returns the four most similar chunks (k=4), which serve as the context for the LLM.
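Under the hood, the retriever embeds the question and returns the k closest stored chunks. A brute-force sketch of that search (Chroma uses an approximate HNSW index rather than scanning everything; the chunk names and 2-dimensional vectors below are invented):

```python
import math

def top_k(query_vec, store, k=4):
    """Score every stored vector against the query and return the k best
    documents -- a brute-force version of what the retriever does."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.sqrt(sum(x * x for x in a)) *
                      math.sqrt(sum(y * y for y in b)))
    ranked = sorted(store.items(), key=lambda item: -cos(query_vec, item[1]))
    return [doc for doc, _ in ranked[:k]]

# Invented chunk names and 2-dimensional vectors, purely for illustration.
store = {
    "chunk about readmission risk": [0.9, 0.1],
    "chunk about hospital staffing": [0.5, 0.5],
    "chunk about billing codes": [0.1, 0.9],
}
print(top_k([0.8, 0.2], store, k=2))
```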
Step 6: Generating Answers with GPT
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(
    model="gpt-4o-mini",
    temperature=0
)
This model is fast and cost‑effective, making it suitable for RAG
systems.
Adding Conversational Memory
from langchain.memory import ConversationBufferMemory
memory = ConversationBufferMemory(
    memory_key="chat_history",
    return_messages=True
)
This allows the assistant to remember previous questions and responses.
Example conversation:
User: What is this document about?
Assistant: It discusses AI applications in healthcare.
User: What challenges are mentioned?
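The steps above define the components but never connect them. One way to wire the retriever, LLM, and memory together is LangChain's ConversationalRetrievalChain; a sketch, assuming the llm, retriever, and memory objects from the previous steps and an OPENAI_API_KEY in the environment:

```python
from langchain.chains import ConversationalRetrievalChain

qa_chain = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=retriever,
    memory=memory,
)

# Each call retrieves context, generates an answer, and appends the
# exchange to memory, so follow-up questions can refer back to it.
result = qa_chain.invoke({"question": "What is this document about?"})
print(result["answer"])

result = qa_chain.invoke({"question": "What challenges are mentioned?"})
print(result["answer"])
```

With memory attached, the chain rewrites follow-up questions using the chat history before retrieval, which is what makes the conversation above work.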
Example Interaction
User: What does the document say about predictive analytics?
Assistant:
The document explains that predictive analytics uses machine learning
to forecast patient outcomes and identify individuals at risk of
hospital readmission.
Technologies Used
- Python
- LangChain
- OpenAI GPT
- Chroma Vector Database
- Sentence Transformers
- PyPDF
Key Takeaways
Building this project helped demonstrate several important concepts:
- Retrieval Augmented Generation improves LLM accuracy
- Embeddings enable semantic search
- Vector databases store document knowledge efficiently
- Conversational memory enhances user interaction
Combining these technologies allows developers to build powerful AI
applications that can interact with real‑world data.
Future Improvements
Potential improvements include:
- Adding a web interface using Streamlit
- Supporting multiple documents
- Including source citations in responses
- Implementing hybrid search
- Deploying the assistant as an API
Conclusion
Retrieval Augmented Generation is an important architecture for building
AI systems that work with external knowledge.
By combining document retrieval with language models, we can create
systems that transform static documents into interactive knowledge
assistants.
This project demonstrates how a relatively simple pipeline can unlock
powerful capabilities for document understanding and conversational
AI.
The full source code is available on GitHub: srvdwivedi/rag_poc.