
INDRANIL MAITI


Step-by-Step: Build a RAG Chatbot That Understands Your PDFs

Today, we will build a simple chatbot that answers questions based on your PDF. We will use LangChain and OpenAI. The goal is to learn the basics of a simple RAG application. Before beginning, let's understand the steps. To build a basic RAG application you need to understand five steps (a one-glance outline of the whole flow follows the list).

  1. Loading Documents: First you need to load your data, be it a PDF, CSV, Excel file, or website.
  2. Split: You then need to split your data into chunks. This is needed because every LLM has its own context window, so if your data is big you cannot give it all to the LLM at once and still get the preferred output.
  3. Embeddings: To store the data in the database we need to create embeddings of the chunks.
  4. Store: Once the embeddings are created, the final step of indexing is to store them in the database.


  5. Retrieval: Now, when the user asks the LLM a question, the relevant chunks are retrieved from the database and added to the LLM's context. The LLM produces its output based on the chunks it has received.

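Here is a one-glance outline of the pipeline we will build; each commented step corresponds to a section below.

# High-level outline of the RAG pipeline built in this post
# 1. Load     -> PyPDFLoader reads the PDF into per-page documents
# 2. Split    -> RecursiveCharacterTextSplitter breaks pages into overlapping chunks
# 3. Embed    -> OpenAIEmbeddings turns each chunk into a vector
# 4. Store    -> QdrantVectorStore saves the vectors in a Qdrant collection
# 5. Retrieve -> a similarity search pulls the relevant chunks into the LLM prompt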

Without wasting any time let's get started.

Installation

We need to install the following packages. I will explain what each package is needed for in the subsequent sections.

%pip install -qU pypdf
%pip install -qU langchain-openai
%pip install -qU langchain-qdrant
%pip install -qU langchain-text-splitters
%pip install -qU langchain-community
%pip install -qU python-dotenv

We need an OPENAI_API_KEY as well, so if you do not have one, please create it. After creating it, put it in a .env file and load the key from there.

import os
from dotenv import load_dotenv
load_dotenv()

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

Load your PDF

To make a chatbot based on your PDF, we have to load the PDF first, and for this we will use PyPDFLoader from langchain_community.

from langchain_community.document_loaders import PyPDFLoader

file_path = "YOUR_FILE_PATH"

loader = PyPDFLoader(file_path)
pages = []
async for page in loader.alazy_load():
    pages.append(page)
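Before splitting, it can help to sanity-check what was loaded. A quick peek at the page count, metadata, and the first page's text (this is just an illustrative check, not part of the pipeline itself):

# Quick sanity check on the loaded pages (illustrative only)
print(f"Loaded {len(pages)} pages")
print(pages[0].metadata)            # source file and page number
print(pages[0].page_content[:300])  # first 300 characters of page 1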

Split the text

As we have loaded the PDF successfully, we now have to split the text into chunks. Later, the relevant chunks will be selected based on the user's query. We use RecursiveCharacterTextSplitter here, but there are other options as well. Read here. You can choose different splitters and compare their performance.

from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
)

split_docs = text_splitter.split_documents(documents=pages)

Chunk size defines how much text each chunk should contain. You can change it and experiment to find the number that gives correct responses. Chunk overlap means a chunk does not start exactly where the previous chunk ended; it repeats some text from the end of the previous chunk. This helps maintain the context of the PDF across chunk boundaries.
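To see what chunk_size and chunk_overlap actually do, you can run the splitter on a small string with tiny values. The numbers and sample text below are just for illustration:

# Tiny demo of chunk_size / chunk_overlap (values chosen only for illustration)
demo_splitter = RecursiveCharacterTextSplitter(chunk_size=40, chunk_overlap=10)
demo_chunks = demo_splitter.split_text(
    "RAG retrieves relevant chunks from a vector store and feeds them to the LLM."
)
for chunk in demo_chunks:
    print(repr(chunk))  # with chunk_overlap > 0, neighbouring chunks can repeat some text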

Embeddings

The next step is to make embeddings from the chunks. We will use OpenAI embeddings from LangChain here, but you can choose any other embeddings as well. Check here for other embedding models.

from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(
    model="text-embedding-3-large",
    api_key=os.environ["OPENAI_API_KEY"],
)

We have created our embedder and will use it while uploading the chunks into the database.
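Before creating the Qdrant collection in the next step, it is worth confirming the embedding dimension, because the collection must be created with the same vector size. text-embedding-3-large produces 3072-dimensional vectors; a quick check (note that this call uses your OpenAI quota):

# Confirm the embedding dimension; the Qdrant collection size must match it
sample_vector = embeddings.embed_query("hello world")
print(len(sample_vector))  # 3072 for text-embedding-3-large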

Store chunks in the Database

The next step is to store the chunks in the database. For this we will use the Qdrant vector database through LangChain. It is open source and easy to use. You need Docker installed on your local computer, and then you can run it very easily.
After installing Docker, simply run a docker-compose.yml file to start Qdrant (for example, with docker compose up -d in the folder containing the file). Here is the .yml file.

services:
  qdrant:
    image: qdrant/qdrant
    ports:
      - 6333:6333

Once it is running, go to http://localhost:6333/dashboard to check whether Qdrant started successfully. If the dashboard loads, it is working; in your case there should not be any collections stored yet.
Keep Docker running as it is and let's move on to the next step. So far we have created the chunks.
First we have to create a client to connect to the database.

from langchain_qdrant import QdrantVectorStore
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams
# Create the client
client = QdrantClient(url="http://localhost:6333")

# Create the collection with the correct embedding dimension
client.create_collection(
    collection_name="visual_instruction_final",
    vectors_config=VectorParams(
        size=3072,  # Your embedding dimension
        distance=Distance.COSINE
    )
)
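If you re-run the notebook, create_collection will fail because the collection already exists. A small guard you can use instead, assuming a reasonably recent qdrant-client that provides collection_exists:

# Only create the collection if it is not already there
# (collection_exists is available in recent qdrant-client versions)
if not client.collection_exists("visual_instruction_final"):
    client.create_collection(
        collection_name="visual_instruction_final",
        vectors_config=VectorParams(size=3072, distance=Distance.COSINE),
    )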

Next we will use this client to upload the chunks into the database.

# Now create the vector store using the existing collection
vector_store = QdrantVectorStore(
    client=client,  # Use the client we created above
    collection_name="visual_instruction_final",
    embedding=embeddings
)

# Add your documents
vector_store.add_documents(split_docs)

Once this is successful, you can refresh the dashboard and you will see the data stored there.
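You can also verify the upload from code instead of the dashboard, for example by counting the stored points with the Qdrant client:

# Count how many chunks were stored in the collection
count_result = client.count(collection_name="visual_instruction_final", exact=True)
print(count_result.count)  # should equal len(split_docs)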

Information Retrieval

Now for the most fun part: retrieving data based on the user's query. As you can imagine, the chunks most relevant to the user's query are retrieved from the database and provided to the LLM, which then prepares the response.
Here is how you can write the retriever.

from langchain_qdrant import QdrantVectorStore

# Create retriever without passing the client object
retriever = QdrantVectorStore.from_existing_collection(
    collection_name="visual_instruction_final",
    embedding=embeddings,
    url="http://localhost:6333"  # Use URL instead of client
)

# Perform similarity search
search_result = retriever.similarity_search("What is written in introduction?", k=3)

We have built the retriever using similarity search, but there are other search options as well that you can swap in and compare.
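For example, you can get the similarity scores back along with the documents, or switch to maximal marginal relevance (MMR) search for more diverse chunks. Here is a small sketch, assuming the usual LangChain vector store methods:

# Similarity search that also returns scores
results_with_scores = retriever.similarity_search_with_score("What is written in introduction?", k=3)
for doc, score in results_with_scores:
    print(round(score, 3), doc.page_content[:80])

# MMR search trades a bit of raw relevance for more diverse chunks
mmr_results = retriever.max_marginal_relevance_search("What is written in introduction?", k=3)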
So we are all set. Now on to the last and final step, i.e. preparing the chatbot.
We will use ChatOpenAI from LangChain.

from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, SystemMessage

# Create the chat model
chat_model = ChatOpenAI(
    model="gpt-3.5-turbo",  # or "gpt-4" if you have access
    temperature=0.7,        # Controls randomness (0.0 = deterministic, 1.0 = very random)
    max_tokens=1000,        # Maximum tokens in response
    streaming=False         # Set to True for streaming responses
)

def rag_chat(question, k=3):
    # Step 1: Retrieve relevant documents
    relevant_docs = retriever.similarity_search(question, k=k)

    # Step 2: Combine retrieved content
    context = "\n\n".join([doc.page_content for doc in relevant_docs])

    # Step 3: Create messages directly (simpler approach)
    system_message = f"""You are a helpful AI assistant. Use the following context to answer the user's question. 
If you can't find the answer in the context, say so clearly.

Context:
{context}"""

    messages = [
        SystemMessage(content=system_message),
        HumanMessage(content=question)
    ]

    # Step 4: Get response from chat model
    response = chat_model.invoke(messages)

    return {
        "answer": response.content,
        "sources": relevant_docs,
        "context_length": len(context)
    }

# Test the corrected RAG chatbot
question = "What is written in introduction?"
result = rag_chat(question)

print("Question:", question)
print("\nAnswer:", result["answer"])
print(f"\nContext length: {result['context_length']} characters")
print(f"Used {len(result['sources'])} source documents")

The last few lines have been added here to check the retrieved chunks and the context length.
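If you want to chat with your PDF interactively, a minimal loop on top of rag_chat could look like this (just a sketch; it keeps no conversation history):

# Minimal interactive loop around rag_chat (no conversation history)
while True:
    user_question = input("Ask about your PDF (or type 'quit'): ")
    if user_question.strip().lower() == "quit":
        break
    reply = rag_chat(user_question)
    print("\nAnswer:", reply["answer"], "\n")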

Once you run this, you will get the response from the LLM. Now you can play with the different options and try to get a better response. The main goal here was to introduce the concepts step by step and show how to use them. Hope this clears things up. Happy building!
Share what you are building.
Connect with me: Twitter
