Specialized Chatbot using RAG (Retrieval-Augmented Generation) — Part II
In the previous part, we discussed the concept of Retrieval-Augmented Generation (RAG) and prepared our project structure, requirements, and source data.
Now we are moving to the most important step of the RAG pipeline, which is ingesting our knowledge source into the vector database.
This process includes several stages:
- Reading the PDF
- Splitting the document into chunks
- Creating embeddings
- Storing them inside ChromaDB
Once this process is completed, our chatbot will have a searchable knowledge base.
Later, when a user asks a question, our system will retrieve the most relevant parts of the document and use them as context for the model.
Understanding the Ingestion Pipeline
Before jumping into the code, let's understand the overall workflow.
Our program will perform the following steps:
- Load the PDF document
- Extract text from the PDF
- Split the text into smaller chunks
- Convert each chunk into embeddings
- Store the embeddings and text into ChromaDB
The result will be a vector database that represents the entire BCA Annual Report in semantic form.
Importing Required Libraries
First, we import all the libraries used in our program.
```python
import os
import chromadb
from pypdf import PdfReader
from openai import OpenAI
from dotenv import load_dotenv
```
Explanation of Each Library
- **os**: used to interact with the system environment (for example, reading environment variables).
- **chromadb**: the vector database that will store our embeddings.
- **pypdf**: used to read and extract text from PDF documents.
- **OpenAI client (Nebula API)**: used to generate embeddings.
- **dotenv**: used to load our API key from the `.env` file instead of hardcoding it.
Loading Environment Variables
Next, we load the environment variables.
```python
load_dotenv()
```
This allows our program to read the NEBULA_API_KEY stored inside the .env file.
Example .env file:
```
NEBULA_API_KEY=your_api_key_here
```
This method is important because API keys should never be hardcoded directly inside the program.
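If the key is missing, the failure otherwise only shows up later as a confusing authentication error from the API. A small fail-fast check can make this easier to debug (the `require_env` helper below is an illustrative addition, not part of the original code):

```python
import os

def require_env(name):
    # Fail fast with a clear message instead of a late authentication error
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Missing environment variable: {name}")
    return value

# After load_dotenv(), this would validate the key before any API call:
# api_key = require_env("NEBULA_API_KEY")
```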
Initializing Nebula API Client
Now we initialize the API client that will communicate with Nebula.
```python
client = OpenAI(
    api_key=os.getenv("NEBULA_API_KEY"),
    base_url="https://llm.ai-nebula.com/v1"
)
```
Here we use an OpenAI-compatible client, but the request is actually sent to the Nebula API endpoint.
This allows us to use Nebula infrastructure while keeping the OpenAI SDK interface.
Initializing ChromaDB
Now we prepare our vector database.
```python
db_path = "./chroma_db"
chroma_client = chromadb.PersistentClient(path=db_path)
collection = chroma_client.get_or_create_collection(
    name="bca_annual_report_2025"
)
```
Explanation
- **PersistentClient**: creates a persistent database stored locally on disk.
- **db_path**: the folder where our vector database will be stored.
- **Collection**: similar to a table in a traditional database.

In this case we create a collection called `bca_annual_report_2025`.
This collection will contain:
- document chunks
- embeddings
- document IDs
Step 1 — Reading the PDF
Now we create a function to read the entire PDF document.
```python
def read_pdf(path):
    reader = PdfReader(path)
    text = ""
    for page in reader.pages:
        # extract_text() can return None or "" for image-only pages
        text += (page.extract_text() or "") + "\n"
    return text
```
What This Function Does
- Opens the PDF file
- Iterates through every page
- Extracts the text
- Combines all text into a single string
Since the BCA Annual Report contains around 600 pages, this step may take a few seconds depending on your machine.
Step 2 — Splitting the Document into Chunks
Next we split the document into smaller pieces.
```python
def chunk_text(text, chunk_size=500, overlap=50):
    words = text.split()
    chunks = []
    for i in range(0, len(words), chunk_size - overlap):
        chunk = " ".join(words[i:i + chunk_size])
        chunks.append(chunk)
    return chunks
```
Why Do We Need Chunking?
Because:
- Language models have context limits
- Sending the entire document into the prompt would be extremely inefficient
Instead, we break the document into smaller segments.
In this code:
chunk_size = 500 words
overlap = 50 words
The overlap helps preserve context continuity between chunks, which improves retrieval quality.
Example Structure
Chunk 1 : word 1 → word 500
Chunk 2 : word 451 → word 950
Chunk 3 : word 901 → word 1400
This technique helps prevent information loss between chunks.
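We can verify these numbers by running `chunk_text` on a synthetic text of exactly 1,000 words:

```python
def chunk_text(text, chunk_size=500, overlap=50):
    words = text.split()
    chunks = []
    for i in range(0, len(words), chunk_size - overlap):
        chunks.append(" ".join(words[i:i + chunk_size]))
    return chunks

# Synthetic text: the words "w1" through "w1000"
sample = " ".join(f"w{n}" for n in range(1, 1001))
chunks = chunk_text(sample)

print(len(chunks))           # 3
print(chunks[1].split()[0])  # w451 — second chunk starts 50 words early
print(chunks[2].split()[-1]) # w1000
```

The step between chunk starts is `chunk_size - overlap = 450` words, which is exactly where the 50-word overlap comes from.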
Step 3 — Creating Embeddings
Now we convert each chunk into embeddings.
```python
def embed(texts):
    response = client.embeddings.create(
        model="text-embedding-3-large",
        input=texts
    )
    return [e.embedding for e in response.data]
```
Embeddings are numerical vector representations of text.
Instead of storing raw text only, we convert each chunk into vectors so the database can perform semantic similarity search.
Example Concept
"bank revenue growth" → [0.021, -0.771, 0.144, ...]
Texts with similar meaning will have vectors close to each other in vector space.
This is what allows RAG systems to find relevant knowledge quickly.
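This closeness is usually measured with cosine similarity. A toy illustration with hand-made 3-dimensional vectors (real embeddings have thousands of dimensions, and the values below are invented for the example):

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: 1.0 means identical direction
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hand-made toy vectors (not real embeddings)
v_revenue = [0.9, 0.1, 0.2]     # "bank revenue growth"
v_income  = [0.85, 0.15, 0.25]  # "bank income increase"
v_weather = [0.1, 0.9, 0.3]     # "weather forecast"

# Similar meanings score higher than unrelated ones
print(cosine_similarity(v_revenue, v_income) >
      cosine_similarity(v_revenue, v_weather))  # True
```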
Step 4 — Running the Ingestion Process
Now we run the entire ingestion pipeline.
```python
pdf_path = "source/20260212-BCA-AR-2025-ID.pdf"

if os.path.exists(pdf_path):
    print("⏳ Reading PDF...")
    pdf_text = read_pdf(pdf_path)

    print("⏳ Creating chunks...")
    chunks = chunk_text(pdf_text)

    print(f"⏳ Creating embeddings for {len(chunks)} chunks...")
    embeddings = embed(chunks)

    collection.add(
        documents=chunks,
        embeddings=embeddings,
        ids=[str(i) for i in range(len(chunks))]
    )
    print(f"✅ Success! Database saved to: {db_path}")
else:
    print(f"❌ File not found: {pdf_path}")
```
Step-by-Step Explanation
Step 1 — Check if the PDF exists
```python
os.path.exists(pdf_path)
```
This prevents errors if the file is missing.
Step 2 — Extract the text
```python
pdf_text = read_pdf(pdf_path)
```
The entire PDF is converted into raw text.
Step 3 — Create chunks
```python
chunks = chunk_text(pdf_text)
```
The document is split into smaller pieces.
For a 600-page report, this may generate hundreds or even thousands of chunks.
Step 4 — Generate embeddings
```python
embeddings = embed(chunks)
```
Each chunk is converted into a vector representation using the embedding model.
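One practical caveat: embedding APIs typically cap how many inputs a single request may contain, so passing thousands of chunks to one `embeddings.create` call can fail. A batched variant is a safer sketch (the `batch_size` of 100 is an assumption, not a documented Nebula limit):

```python
def embed_batched(client, texts, model="text-embedding-3-large", batch_size=100):
    # Send the chunks in batches to stay under per-request input limits
    vectors = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        response = client.embeddings.create(model=model, input=batch)
        vectors.extend(item.embedding for item in response.data)
    return vectors
```

This is a drop-in replacement for `embed` above, keeping the returned vectors in the same order as the input chunks.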
Step 5 — Store everything in ChromaDB
```python
collection.add(
    documents=chunks,
    embeddings=embeddings,
    ids=[str(i) for i in range(len(chunks))]
)
```
We store three components:
- documents → the text chunks
- embeddings → vector representations
- ids → unique identifiers
Now our vector database is ready.
After the Ingestion

Once the ingestion process is finished, our database will contain semantic vectors for every chunk of the BCA Annual Report.

This means our system can now perform:
Semantic Search instead of traditional Keyword Search.
What Happens Next?
Now that our knowledge base has been stored inside ChromaDB, the next step is building the retrieval pipeline.
In the next part we will implement:
- Convert user question into embedding
- Search the vector database
- Retrieve the most relevant chunks
- Send them as context to Nebula API
- Generate a grounded response
This is where the actual RAG magic happens.
Nebula Lab
For those who want to build chatbots or other AI applications, you can check Nebula Lab here:
Beyond API access to multiple models, they also provide various tools and features for AI development.
See you in the next part.
