A step-by-step journey into building a flexible, multi-provider RAG application that lets you talk to your documents.
We’ve all been there: staring at a 100-page PDF, knowing the answer to our question is buried somewhere inside. Skimming through dense academic papers, legal documents, or technical manuals is a tedious process. What if you could just… ask the document a question and get a straight answer instead of doing Ctrl+F multiple times?
That’s exactly what I set out to build: an interactive chatbot that ingests any PDF (not just one, but many hehe) and lets you have a natural conversation with it.
In this post, I’ll walk you through how I built this PDFs-ChatBot using the power of Streamlit for the UI and LangChain for the AI orchestration. More importantly, I’ll show you how I designed it to be incredibly flexible, allowing users to switch between different AI providers like Google Gemini, Cohere, and NVIDIA on the fly.
You can find the full source code on GitHub.
You can also try it out here!
The Core Idea: Retrieval-Augmented Generation (RAG)
The magic behind this application is a technique called Retrieval-Augmented Generation (RAG). Instead of just relying on a Large Language Model’s (LLM) general knowledge, we give it access to specific information from our documents.
Here’s the workflow in a nutshell:
Indexing (The “Library”):
First, we process the PDFs and create a searchable knowledge library.
- Load & Split: The text from the PDFs is extracted and broken down into smaller, manageable chunks.
- Embed: Each chunk is converted into a numerical representation called an “embedding” using a model (like Google’s embedding-001 or NVIDIA’s nv-embedqa-e5-v5; the project supports adding models dynamically). These embeddings capture the semantic meaning of the text.
- Store: These embeddings are stored in a specialized database called a vector store (I’m using FAISS), which is incredibly fast at finding similar vectors.
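Here’s a minimal sketch of that indexing step, assuming PyPDF2 for text extraction and Google’s embedding model; the actual loaders, chunk sizes, and model names in the project may differ:
# A rough sketch of the indexing pipeline (illustrative, not the exact project code)
from PyPDF2 import PdfReader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS
from langchain_google_genai import GoogleGenerativeAIEmbeddings

def build_vector_store(pdf_files):
    # Load: pull the raw text out of every uploaded PDF
    raw_text = ""
    for pdf in pdf_files:
        for page in PdfReader(pdf).pages:
            raw_text += page.extract_text() or ""
    # Split: break the text into overlapping, manageable chunks
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
    chunks = splitter.split_text(raw_text)
    # Embed + Store: convert each chunk into a vector and index it with FAISS
    embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
    return FAISS.from_texts(chunks, embedding=embeddings)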
Retrieval & Generation (The “Conversation”):
When you ask a question:
- Your question is also converted into an embedding (make sure to use the same embedding model that was used to process the PDFs for the best results).
- The vector store finds the text chunks from the PDF that are most semantically similar to your question.
- Finally, your question and these relevant chunks are passed to an LLM (like Gemini or Cohere) with a prompt like: “Based on the following context, answer this question: [Your Question] [Relevant Chunks]”.
- The LLM generates a coherent, context-aware answer.
This approach ensures the answers are grounded in the document’s content, dramatically reducing hallucinations and providing accurate, relevant information.
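In LangChain, this whole loop can be wired up with a conversational retrieval chain. Here’s a minimal sketch; the specific chain type and memory setup are assumptions, and the project’s actual wiring may differ:
# A minimal sketch of the retrieval + generation step
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory

def build_conversation_chain(vector_store, llm):
    # Memory keeps the chat history so follow-up questions stay in context
    memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
    # The chain embeds the question, retrieves the most similar chunks,
    # and passes both to the LLM to generate a grounded answer
    return ConversationalRetrievalChain.from_llm(
        llm=llm,
        retriever=vector_store.as_retriever(),
        memory=memory,
    )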
Architecting for Flexibility: The Factory Pattern
One of my main goals was to avoid being locked into a single AI provider. I wanted the freedom to experiment with different models. To achieve this, I used a simple but powerful software design pattern: the Factory Pattern.
Instead of cluttering my main app.py with model initialisation logic, I created separate files: llm_providers.py and embedding_providers.py.
# llm_providers.py
import os
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain.llms import Cohere, HuggingFaceHub

# Helper function for initializing and returning the Google Gemini LLM
# (the model names and arguments shown here are a simplified sketch).
def _get_gemini_llm():
    return ChatGoogleGenerativeAI(model="gemini-pro", google_api_key=os.environ["GOOGLE_API_KEY"])

# Helper function for initializing and returning the Cohere LLM.
def _get_cohere_llm():
    return Cohere(cohere_api_key=os.environ["COHERE_API_KEY"])

# The "Factory" dictionary: maps a provider name to its initializer
llm_providers = {
    'gemini': _get_gemini_llm,
    'cohere': _get_cohere_llm,
}

# The main factory function
def get_llm(model_choice: str):
    """
    Factory function to select and initialize the chosen LLM.
    """
    return llm_providers[model_choice]()
This design keeps my code clean and modular. Adding a new LLM provider is as simple as adding a new helper function and one line to the dictionary.
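For example, plugging in an NVIDIA-hosted model would only take a new helper plus one dictionary entry (the package and model names below are illustrative, not necessarily what the project ships with):
# Hypothetical example of registering a new provider
from langchain_nvidia_ai_endpoints import ChatNVIDIA

def _get_nvidia_llm():
    # The model id is illustrative; any NVIDIA NIM chat model would work here
    return ChatNVIDIA(model="meta/llama3-70b-instruct")

llm_providers['nvidia'] = _get_nvidia_llm
The embedding_providers.py file mirrors this same pattern for the embedding models.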
Building the UI with Streamlit
Streamlit is a game-changer for building interactive data and AI applications in Python. It allowed me to create a polished user interface with minimal code.
The sidebar is the control center of the application.
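Roughly, the sidebar boils down to a handful of Streamlit widgets. The labels, options, and helper names below are illustrative, not the exact project code:
import streamlit as st

with st.sidebar:
    # Model selection and document upload live in the sidebar
    llm_choice = st.selectbox("Choose an LLM provider", ["gemini", "cohere", "nvidia"])
    pdf_docs = st.file_uploader("Upload your PDFs", accept_multiple_files=True)
    if st.button("Process"):
        # build_vector_store is the indexing sketch from earlier in this post
        st.session_state.vector_store = build_vector_store(pdf_docs)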
A key part of making a Streamlit app feel seamless is managing its state. I used st.session_state extensively to store the conversation history, the user’s model choices, and the main conversation chain object, ensuring they persist between user interactions.
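The setup boils down to a simple guard at the top of the app, so reruns don’t wipe the conversation (the key names here are illustrative):
# Create the session keys once; Streamlit reruns the script on every interaction
if "conversation" not in st.session_state:
    st.session_state.conversation = None
if "chat_history" not in st.session_state:
    st.session_state.chat_history = []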
For the chat bubbles, I used a little custom CSS and HTML templating to create a familiar and visually appealing chat experience.
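A stripped-down version of that templating looks something like this (the styling and the structure of chat_history are illustrative, not the project’s exact CSS):
# Assumes chat_history holds alternating user/bot LangChain messages with a .content field
USER_TEMPLATE = '<div style="background:#2b313e;padding:0.75rem;border-radius:8px">{{MSG}}</div>'
BOT_TEMPLATE = '<div style="background:#475063;padding:0.75rem;border-radius:8px">{{MSG}}</div>'

for i, message in enumerate(st.session_state.chat_history):
    # Even indices are user turns, odd indices are bot replies
    template = USER_TEMPLATE if i % 2 == 0 else BOT_TEMPLATE
    st.markdown(template.replace("{{MSG}}", message.content), unsafe_allow_html=True)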
Final Thoughts and Next Steps
Building this PDF-ChatBot was an incredibly rewarding experience. It clarified the RAG pipeline for me and highlighted the importance of clean, modular code when working with rapidly evolving AI technologies. Of course, it also meant wading through plenty of documentation along the way; I’m not an expert either!
The final application is a powerful tool that not only works well but is also easy to extend. Here are a few ideas for future improvements (maybe one of you could open a PR, I would be very happy to accept it :D):
- Displaying Sources: Highlighting which parts of the PDF were used to generate an answer.
- More Vector Stores: Adding support for other vector databases like Chroma or Pinecone.
- Deployment: Packaging the application in Docker and deploying it to the cloud.
If you’re looking to get your hands dirty with building practical LLM applications, I highly encourage you to try building a project like this. It’s a fantastic way to learn the fundamentals of the modern AI stack, along with all the scouring of the internet, docs, and Stack Overflow that comes with it!
Thanks for reading! Feel free to check out the project on GitHub and build your own.