Zarrar Shaikh

Building a RAG-powered PDF Chatbot with LangChain, Streamlit and FAISS

In this guide, we’re going to build a working AI chatbot that can read our PDFs and answer questions from them. We’ll use a method called Retrieval-Augmented Generation (RAG) to help our chatbot connect the dots between static AI models and our own documents.

The thing with large language models (LLMs) is - they’re great with language, but they don’t know anything about our specific data. Their knowledge is frozen at the point they were trained. They can’t read our lease agreements, product manuals, or meeting notes. That’s where RAG comes in.

RAG gives us a way to feed our data into the model. So instead of asking, “When does this lease expire?” and hoping the AI knows what we’re talking about, we give it the lease PDF - and it finds the answer based on what’s actually written in there.

Some useful ways we can apply this:

  • Legal documents - Ask the AI to summarize a case or find key terms.
  • Research papers - Get summaries or compare studies without reading them all ourselves.
  • Customer support - Use our product docs and chat history to create a smart support assistant.
  • HR/Policy docs - Help our team quickly find rules, policies, and procedures.

We’ll walk through this step by step, using:

  • Streamlit for a simple, clean frontend
  • LangChain to handle the LLM logic
  • FAISS for fast vector search over our document content

Our goal is to build something useful, easy to understand, and flexible enough to plug in any LLM we want later.

Let’s get started.


🏗️ What We're Building

Working Demo

[Demo GIF showing the chatbot in action]

📐 System Design

[System design diagram of the RAG pipeline]

We’re building RAG PDFBot - a chatbot that lets us upload PDFs, choose a model provider (like Groq or Gemini), and ask natural-language questions based on the content of those PDFs.

All the core pieces - PDF parsing, embedding, vector search, and LLM interaction - will be stitched together using LangChain, FAISS, and Streamlit.

And we’re keeping the architecture modular, so we can plug in new models or features anytime.


📚 Core Concepts (Explained with Real Examples)

Before we dive into the code, let’s quickly go over the key ideas behind what we’re building - explained in a way that makes sense without needing a PhD.

Large Language Models (LLMs)

LLMs are programs trained to guess the next word in a sentence - but they’ve seen billions of examples, so their guesses are usually spot on.

For example:

What is the capital of France?
Paris.

We’ll use an LLM later in our chatbot to generate answers based on the content we retrieve from our PDFs.
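Just to make that concrete, here's a minimal sketch of calling such a model through LangChain (assuming you have a Groq API key; the key string below is a placeholder):

from langchain_groq import ChatGroq

# Minimal sketch - the API key below is a placeholder, not a real credential
llm = ChatGroq(model="llama-3.1-8b-instant", api_key="your-groq-api-key")
response = llm.invoke("What is the capital of France?")
print(response.content)  # -> "Paris."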

Embeddings

Embeddings turn our text into numbers - or more accurately, into vectors. These numbers capture meaning. For example:

“Cats drink milk.” → [0.12, -0.34, 0.89, …]
“Kittens consume dairy.” → [0.10, -0.31, 0.87, …]

The two sentences mean nearly the same thing, and their embeddings are close too. That’s how we compare meanings using math, not just words.
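Here's a small sketch of that idea using sentence-transformers (the same library our chatbot will rely on later for Groq); the exact numbers will vary, but the related pair should score clearly higher:

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode([
  "Cats drink milk.",                        # 0
  "Kittens consume dairy.",                  # 1
  "The stock market closed higher today.",   # 2
])

print(util.cos_sim(embeddings[0], embeddings[1]))  # high similarity - related meaning
print(util.cos_sim(embeddings[0], embeddings[2]))  # low similarity - unrelated topics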

Vectors and Vector Databases

A vector is just a list of numbers. A vector database stores lots of these vectors and helps us search by meaning.

Let’s say we ask:

“Who signed this document?”

Instead of keyword matching, our app finds chunks like:

“Authorized representative: John Doe”

We’re using FAISS - a super fast, lightweight vector store that runs locally and keeps things snappy.
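To show what "search by meaning" looks like at the lowest level, here's a rough sketch using FAISS directly (in the app we'll go through LangChain instead, so treat this purely as an illustration):

import faiss
from sentence_transformers import SentenceTransformer

chunks = [
  "Authorized representative: John Doe",
  "The lease term ends on June 30, 2025.",
  "Refunds will not be issued after 30 days.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
vectors = model.encode(chunks)               # one 384-dim vector per chunk
index = faiss.IndexFlatL2(vectors.shape[1])  # exact L2-distance index
index.add(vectors)

query = model.encode(["Who signed this document?"])
distances, ids = index.search(query, 1)      # nearest chunk to the question
print(chunks[ids[0][0]])                     # expected: the "Authorized representative" chunk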

Retrieval-Augmented Generation (RAG)

This is the secret sauce. Instead of dumping the entire PDF into the LLM, we do something smarter:

  • Retrieval - We find the chunks that best match our question
  • Augmented - We add those chunks to the prompt
  • Generation - The model uses them to craft a relevant answer

So when we ask a question, it doesn’t guess blindly - it answers based on the actual content from our documents.
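In code, the whole loop boils down to those three steps. Here's a hand-wavy sketch (illustrative names only - the real app below wires this up with a LangChain QA chain):

def answer_with_rag(question, vector_store, llm):
  # 1. Retrieval: pull the chunks most similar to the question
  docs = vector_store.similarity_search(question, k=4)

  # 2. Augmentation: stuff those chunks into the prompt as context
  context = "\n\n".join(doc.page_content for doc in docs)
  prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"

  # 3. Generation: the LLM answers from the supplied context
  return llm.invoke(prompt).content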


🧬 Stack Overview

Here’s what we’re using to build our chatbot - and why each piece matters:

  • LangChain
    The glue holding everything together. LangChain helps us connect to LLMs, manage prompts, load chains, and run contextual queries. It saves us from writing a ton of boilerplate.

  • Streamlit
    A Python-based web framework that turns our scripts into a working UI - no need to mess with HTML or JavaScript. Just write Python functions, and Streamlit builds the interface.

  • Groq
    A service that runs open-source models like LLaMA 3 at lightning speed using custom chips. It’s perfect for low-latency responses. And with their generous free tier, it’s great for getting started.

  • Google Gemini
    Google’s LLM platform and a strong alternative to ChatGPT. We can access advanced models like Gemini Flash for reasoning and dialogue. The free tier gives us more than enough for prototyping.

  • FAISS
    A fast, local vector store from Facebook AI. It lets us search our document chunks by meaning, not just keywords. Lightweight, efficient, and easy to use with embeddings.

💡 We’re skipping OpenAI for now because of cost - both Groq and Gemini have fast and generous free tiers.


🧰 Setting Up the Project

Let’s start by creating a fresh project from scratch - and version-controlling it right from the beginning.

1. Create a New GitHub Repository

Head over to your GitHub profile and create a new repository.
You can name it something like rag-pdf-chatbot.

  • Keep it public or private - up to you.
  • Add a .gitignore and choose the Python template.

Once created, copy the repo’s URL (use HTTPS).

2. Clone the Repo to Your Local Machine

Open your terminal and run:

git clone https://github.com/your-username/rag-pdf-chatbot.git
cd rag-pdf-chatbot

Replace your-username with your actual GitHub username.

3. Set Up a Virtual Environment

Let’s keep our project dependencies clean and isolated from the rest of our system:

python3 -m venv .venv
source .venv/bin/activate  # For MacOS / Linux
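.venv\Scripts\activate     # For Windows (use .venv\Scripts\Activate.ps1 in PowerShell)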

🧪 Why use a virtual environment?
It keeps our project’s packages separate from everything else on our machine. This avoids version conflicts and makes our setup easier to manage and share.


🧩 Code Overview

Directory Structure

rag-pdf-chatbot/
│
├── app.py                 # Streamlit frontend
├── requirements.txt       # Dependencies
├── data/                  # FAISS vector store
├── README.md              # Description and instructions
├── .env                   # API keys (not committed to Git)

requirements.txt

pandas
PyPDF2
streamlit
faiss-cpu
langchain
langchain-community
langchain-groq
langchain-google-genai
sentence-transformers

Install with:

pip install -r requirements.txt

app.py

from datetime import datetime

import pandas as pd
import streamlit as st
from PyPDF2 import PdfReader

from langchain.prompts import PromptTemplate
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains.question_answering import load_qa_chain

from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

from langchain_google_genai import (
  GoogleGenerativeAIEmbeddings,
  ChatGoogleGenerativeAI
)
from langchain_groq import ChatGroq

These cover everything from PDF parsing and embedding to using Groq and Gemini with LangChain.

💡 If you plan to use a different LLM provider later (like OpenAI or Mistral), you can add their package when needed.


📂 Full Code Reference

You can find the complete code for this project here:

👉 GitHub Repo - Zlash65/rag-bot-basic

Feel free to give it a ⭐ if you like it!


🔍 Full Walkthrough of main()

This walkthrough follows the top-down order of the main() function in our app and explains each utility function in depth as it's introduced.


1. 🚀 App Initialization and State Setup

def main():
  st.set_page_config(page_title="RAG PDFBot", layout="centered")
  st.title("👽 RAG PDFBot")
  st.caption("Chat with multiple PDFs :books:")

The app starts by setting a page title and layout, followed by a heading and caption. This makes the UI friendly and helps users immediately understand the purpose of the app.

  for key, default in {
    "chat_history": [],
    "pdfs_submitted": False,
    "vector_store": None,
    "pdf_files": [],
    "last_provider": None,
    "unsubmitted_files": False,
  }.items():
    if key not in st.session_state:
      st.session_state[key] = default

This loop initializes st.session_state with default values. Streamlit reruns the script on every interaction, so we use the session state to persist key pieces of information across reruns, such as uploaded files, model selections, chat history, and whether reprocessing is needed.


2. 🎛️ Sidebar: Model Configuration UI

  with st.sidebar:
    with st.expander("⚙️ Configuration", expanded=True):

The sidebar groups all model-related configurations inside an expandable section to keep the interface organized and uncluttered.

🧾 Below the imports, define model options.

MODEL_OPTIONS = {
  "Groq": {
    "playground": "https://console.groq.com/",
    "models": ["llama-3.1-8b-instant", "llama3-70b-8192"]
  },
  "Gemini": {
    "playground": "https://ai.google.dev",
    "models": ["gemini-2.0-flash", "gemini-2.5-flash"]
  }
}

🧾 Inside the sidebar expander, add this.

      model_provider = st.selectbox(
        "🔌 Model Provider",
        ["Select a model provider", "Groq", "Gemini"],
        index=0,
        key="model_provider"
      )

      if model_provider == "Select a model provider":
        return

The user is required to pick either Groq or Gemini as their LLM provider. If nothing is selected, the function returns early. This prevents the rest of the interface from loading and avoids initializing model-specific logic prematurely.

🧾 Still inside the same sidebar expander, add this.

      api_key = st.text_input(
        "🔑 Enter your API Key",
        help=f"Get API key from [here]({MODEL_OPTIONS[model_provider]['playground']})"
      )
      if not api_key:
        return

This input prompts the user for their API key. It dynamically links to the relevant model provider's API dashboard. Again, if no key is entered, we return early, which ensures no model or embedding operations run without credentials.

🧾 Still inside the same sidebar expander, add this.

      models = MODEL_OPTIONS[model_provider]["models"]
      model = st.selectbox("🧠 Select a model", models, key="model")

The available model options are loaded based on the selected provider. For Groq, it includes LLaMA variants, and for Gemini, it includes Gemini 2.0 and 2.5.


3. 📥 PDF Upload and Submission

🧾 Still inside the same sidebar expander, add this.

      uploaded_files = st.file_uploader(
        "📚 Upload PDFs",
        type=["pdf"],
        accept_multiple_files=True,
        key="pdf_uploader"
      )

      if uploaded_files and uploaded_files != st.session_state.pdf_files:
        st.session_state.unsubmitted_files = True

Users can upload one or more PDFs. If the uploaded files differ from the current state, we flag them as “unsubmitted.” This allows us to prompt users later to submit their files explicitly, avoiding silent reprocessing.

🧾 Still inside the same sidebar expander, add this.

      if st.button("➡️ Submit"):
        if uploaded_files:
          with st.spinner("Processing PDFs..."):
            process_and_store_pdfs(uploaded_files, model_provider, api_key)
            st.session_state.pdf_files = uploaded_files
            st.session_state.unsubmitted_files = False
            st.toast("PDFs processed successfully!", icon="")
        else:
          st.warning("No files uploaded.")

When the user clicks Submit, we process the uploaded files and generate their vector representation. This is done only after confirmation to prevent accidental reprocessing or loading large documents unintentionally.

3.1 ⚙️ Utility: process_and_store_pdfs()

🧾 Add this function above your main() function.

def process_and_store_pdfs(pdfs, provider, api_key):
  raw_text = get_pdf_text(pdfs)
  chunks = get_text_chunks(raw_text)
  store = get_vectorstore(chunks, provider, api_key)
  st.session_state.vector_store = store
  st.session_state.pdfs_submitted = True

This function is the core of the ingestion pipeline. It extracts all text from the uploaded PDFs, chunks that text into manageable overlapping segments, embeds them, stores them in a FAISS vectorstore, and then keeps the store in session memory.

3.2 📄 get_pdf_text()

🧾 Add this function above your process_and_store_pdfs() function.

def get_pdf_text(pdf_files):
  text = ""
  for file in pdf_files:
    reader = PdfReader(file)
    for page in reader.pages:
      text += page.extract_text() or ""
  return text

This function loops through each uploaded PDF and extracts raw text from every page using PyPDF2. If a page doesn't contain extractable text (like a scanned image), extract_text() may return None, so we use or "" to ensure the process doesn’t fail. The function returns a long string containing all text concatenated together.

Example:
If you upload a PDF with 2 pages:

  • Page 1: "Terms and Conditions"
  • Page 2: "Refunds will not be issued after 30 days."

Then this function will return:
"Terms and ConditionsRefunds will not be issued after 30 days."

3.3 ✂️ get_text_chunks()

🧾 Add this function below your get_pdf_text() function.

def get_text_chunks(text):
  splitter = RecursiveCharacterTextSplitter(chunk_size=5000, chunk_overlap=500)
  return splitter.split_text(text)

This function breaks the extracted text into overlapping chunks using LangChain’s RecursiveCharacterTextSplitter.

The chunk_size is set to 5000 characters, and each chunk overlaps the previous one by 500 characters. This ensures that if important context spans across two chunks, the LLM doesn’t lose meaning due to hard boundaries.

Example:

Let’s say you have the following text (roughly 140 characters):

"In case of early termination, the lessee shall forfeit the security deposit. A 60-day written notice is mandatory for all cancellations."

With a chunk size of 80 and an overlap of 20, the chunks would look roughly like this:

  • Chunk 1: "In case of early termination, the lessee shall forfeit the security deposit."
  • Chunk 2: "shall forfeit the security deposit. A 60-day written notice is mandatory"

This overlap ensures LLMs don’t miss context when answering questions later.
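To see the splitter in action yourself, here's a quick standalone snippet; the exact boundaries may differ slightly from the illustration above, since the splitter prefers to break on paragraphs, sentences, and spaces rather than mid-word:

from langchain.text_splitter import RecursiveCharacterTextSplitter

text = (
  "In case of early termination, the lessee shall forfeit the security deposit. "
  "A 60-day written notice is mandatory for all cancellations."
)

splitter = RecursiveCharacterTextSplitter(chunk_size=80, chunk_overlap=20)
for i, chunk in enumerate(splitter.split_text(text), start=1):
  print(f"Chunk {i}: {chunk}")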

3.4 🧠 get_vectorstore() and get_embeddings()

🧾 Add these functions below your get_text_chunks() function.

def get_vectorstore(chunks, provider, api_key):
  embedding = get_embeddings(provider, api_key)
  store = FAISS.from_texts(chunks, embedding)
  store.save_local(f"./data/{provider.lower()}_vector_store.faiss")
  return store

This function creates the actual FAISS vectorstore. It first retrieves the appropriate embedding function using get_embeddings(), applies it to each chunk, and stores the resulting vectors in a FAISS index saved locally.

def get_embeddings(provider, api_key=None):
  if provider.lower() == "groq":
    return HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
  elif provider.lower() == "gemini":
    return GoogleGenerativeAIEmbeddings(
      model="models/embedding-001",
      google_api_key=api_key
    )
  else:
    raise ValueError("Unsupported provider")

Since LangChain does not yet offer an official embedding model for Groq, we use a general-purpose HuggingFace embedding model called all-MiniLM-L6-v2, which works well for a wide range of semantic search tasks. For Gemini, LangChain provides official support for Google’s embedding API (models/embedding-001), which integrates seamlessly and is optimized for use with Gemini models.
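One practical consequence of this split: the two embedding models produce vectors of different sizes (all-MiniLM-L6-v2 outputs 384 dimensions, while Gemini's embedding-001 outputs 768), so an index built with one can't be queried with the other. A quick illustrative check, reusing the get_embeddings() helper above:

# Illustrative check - embedding dimensionality differs between providers
groq_dim = len(get_embeddings("groq").embed_query("Who signed this document?"))
print(groq_dim)  # 384 for all-MiniLM-L6-v2

That mismatch is exactly why the next section reprocesses the PDFs whenever the provider changes.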


4. 🔁 Auto-Reprocess on Provider Change

🧾 Place this right below the Submit button inside the same sidebar block.

      if model_provider != st.session_state.last_provider:
        st.session_state.last_provider = model_provider
        if st.session_state.pdf_files:
          with st.spinner("Reprocessing PDFs..."):
            process_and_store_pdfs(st.session_state.pdf_files, model_provider, api_key)
            st.toast("PDFs reprocessed successfully!", icon="🔁")

If the user changes the provider (e.g., from Groq to Gemini), the system automatically reprocesses existing PDFs using the new embedding model. This prevents mismatched embeddings and ensures consistency.


5. 🛠 Sidebar Tools

🧾 Still inside the sidebar but outside the config expander, add this.

    with st.expander("🛠️ Tools", expanded=False):
      col1, col2, col3 = st.columns(3)

      if col1.button("🔄 Reset"):
        st.session_state.clear()
        st.session_state.model_provider = "Select a model provider"
        st.rerun()

      if col2.button("🧹 Clear Chat"):
        st.session_state.chat_history = []
        st.session_state.pdf_files = None
        st.session_state.vector_store = None
        st.session_state.pdfs_submitted = False
        st.toast("Chat and PDF cleared.", icon="🧼")

      if col3.button("↩️ Undo") and st.session_state.chat_history:
        st.session_state.chat_history.pop()
        st.rerun()

The following tools allow users to reset, clear, or undo chat state, making the UI much more usable and fault-tolerant:

  • Reset: Clears everything and resets the dropdown
  • Clear Chat: Removes chat and uploaded data
  • Undo: Removes the last chat interaction only

6. 📎 Show Uploaded File List

🧾 Add this function below your process_and_store_pdfs() function.

def render_uploaded_files():
  pdf_files = st.session_state.get("pdf_files", [])
  if pdf_files:
    with st.expander("**📎 Uploaded Files:**"):
      for f in pdf_files:
        st.markdown(f"- {f.name}")

The render_uploaded_files() function shows the names of all submitted PDF files in a collapsible section. It gives users quick visual confirmation of which files are currently active in the chatbot.

We only call this function after the PDFs have been submitted and processed, using:

🧾 Place this outside the sidebar, right below the entire sidebar block in main().

if st.session_state.pdfs_submitted and st.session_state.pdf_files:
  render_uploaded_files()

This avoids showing the file list prematurely or when the files are not yet embedded, keeping the UI clean and relevant.


7. 📖 Show Chat History

🧾 Add this inside main() just after uploaded files list.

  for q, a, *_ in st.session_state.chat_history:
    with st.chat_message("user"):
      st.markdown(q)
    with st.chat_message("ai"):
      st.markdown(a)

This loop recreates all past questions and responses in chat-style bubbles.


8. ⚠️ Warn About Unsubmitted Files

🧾 Add this inside main() after chat history.

  if st.session_state.unsubmitted_files:
    st.warning("📄 New PDFs uploaded. Please submit before chatting.")
    return

This prevents the user from asking questions before submitting the newly uploaded PDFs, ensuring only processed documents are used for answering.


9. 💬 Chat Input and Answer Generation

🧾 Add this next inside main() after the unsubmitted check.

  if st.session_state.pdfs_submitted:
    question = st.chat_input("💬 Ask a Question from the PDF Files")
    if question:
      with st.chat_message("user"):
        st.markdown(question)
      with st.chat_message("ai"):
        with st.spinner("Thinking..."):
          try:
            docs = st.session_state.vector_store.similarity_search(question)
            chain = get_qa_chain(model_provider, model, api_key)
            output = chain(
              {"input_documents": docs, "question": question},
              return_only_outputs=True
            )["output_text"]
            st.markdown(output)
            pdf_names = [f.name for f in st.session_state.pdf_files]
            st.session_state.chat_history.append(
              (question, output, model_provider, model, pdf_names, datetime.now())
            )
          except Exception as e:
            st.error(f"Error: {str(e)}")
  else:
    st.info("📄 Please upload and submit PDFs to start chatting.")

This section renders the chat input only after PDFs have been successfully submitted. When a user asks a question, it performs a similarity search over the FAISS vector store to find relevant document chunks. It then sends those chunks and the user’s question to the selected LLM (Groq or Gemini) using a prompt chain, and returns a detailed answer. The result, along with metadata, is saved into session state for chat history and optional download.
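If an answer ever looks off, a handy debugging step is to print what the similarity search actually retrieved before it reaches the LLM (a temporary snippet, not part of the final app):

# Temporary debugging aid: inspect the chunks retrieved for a question
docs = st.session_state.vector_store.similarity_search(question, k=4)
for i, doc in enumerate(docs, start=1):
  print(f"--- Chunk {i} ---")
  print(doc.page_content[:200])  # first 200 characters of each retrieved chunk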

🧠 9.1 get_qa_chain()

🧾 Add this function below your get_vectorstore() function.

def get_qa_chain(provider, model, api_key):
  prompt = PromptTemplate(
    template="""
    Answer the question as detailed as possible.
    If the question cannot be answered using the provided context, please say "I don't know."

    Context:
    {context}

    Question:
    {question}?

    Answer:
    """,
    input_variables=["context", "question"]
  )
  llm = ChatGroq(model=model, api_key=api_key) if provider.lower() == "groq" else ChatGoogleGenerativeAI(model=model, api_key=api_key)
  return load_qa_chain(llm, chain_type="stuff", prompt=prompt)

This utility creates a custom QA chain using LangChain’s load_qa_chain method. The chain is configured to respond strictly based on context, and not to hallucinate when the answer isn’t found. It supports both Groq and Gemini as LLM backends, selecting the appropriate one based on the provider.


10. 💾 Download Chat History

🧾 Add this function below your render_uploaded_files() function.

def render_download_chat_history():
  df = pd.DataFrame(
    st.session_state.chat_history,
    columns=["Question", "Answer", "Model", "Model Name", "PDF File", "Timestamp"]
  )
  with st.expander("**📎 Download Chat History:**"):
    st.sidebar.download_button(
      "📥 Download Chat History",
      data=df.to_csv(index=False),
      file_name="chat_history.csv",
      mime="text/csv"
    )

This function creates a downloadable CSV file containing the full chat history. It uses Pandas to format the data and adds a download button in the sidebar for users to save their conversations along with model and file metadata.

🧾 At the bottom of main(), add this.

  if st.session_state.chat_history:
    render_download_chat_history()

This checks if there’s any chat history and, if so, calls the utility function to render the download option.

💡 What This Does

  • If any chat has occurred, a new expander shows up in the sidebar.
  • Users can download a .csv file with all conversation metadata:

    • Questions, Answers
    • Model Provider and Model Name
    • PDF files used
    • Timestamp of each entry

✅ Final Behavior Example

A downloaded CSV might look like:

Question | Answer | Model | Model Name | PDF File | Timestamp
What’s the lease end date? | June 30, 2025 | Groq | llama-3.1-8b-instant | lease.pdf | 2025-07-04 14:02:00
Who signed the agreement? | John Doe | Groq | llama-3.1-8b-instant | lease.pdf | 2025-07-04 14:03:15

11. 🎬 Launch the Script

🧾 Finally, call main() at the bottom of your file.

if __name__ == "__main__":
  main()

This standard Python entry point ensures the app runs when app.py is executed directly.


12. 🏁 How to Run the Code

Once everything is set up, running the app is simple. Just use:

streamlit run app.py

This command will start a local web server and open the chatbot in your browser - usually at http://localhost:8501.

[Screenshot of RAG PDFBot running in the browser]

From there, you’ll be able to:

  • Upload one or more PDFs
  • Choose a model provider (Groq or Gemini)
  • Enter your API key
  • Start asking questions based on your documents

🧠 If the browser doesn’t open automatically, just copy the localhost URL from the terminal and paste it into your browser.


💭 Final Thoughts

That’s it - our chatbot is ready to rock.

It reads PDFs, finds context using RAG, and gives relevant answers using your choice of LLM - all from a clean, modular setup that’s easy to extend.

Here are some ideas you can explore next:

  • 🧠 Add memory so it remembers past messages
  • 📁 Support more document formats beyond PDFs (Word, text, Markdown)
  • 🌐 Deploy it on the web (Streamlit Cloud or Hugging Face Spaces)
  • 🔌 Try swapping in different LLMs like Claude or Mistral
  • 🧪 Add advanced features like source highlighting or confidence scores

This isn’t just a chatbot - it’s a real-world template we can build upon to create context-aware AI tools. No retraining, no black-box magic. Just good engineering and the right tools.

So let’s fork it, build on it, break it, fix it - and make it ours.

Happy building. 🛠️
