I'm starting a "Crawl, walk, run" series of posts on various topics and decided to start with Retrieval-Augmented Generation (RAG). Learn the basics and progress to a production-ready system!
Crawl
In this phase of your journey, we're going to learn about the core concepts of a Retrieval-Augmented Generation (RAG) system and then apply them in a simple example.
We're going to build a Human Resources (HR) agent that can help answer and navigate HR-related questions. Using the Government of British Columbia's HR Policy PDFs as our knowledge base, we will process, chunk, and embed the documents into a local vector database. This allows the agent to provide grounded answers and ensures that every response is rooted directly in the ingested BC government policies.
Why RAG?
RAG is a very common design pattern that turns a standard LLM into an informed AI agent. Standard models can be a "black box", but RAG gives your agent an "open-book test". It bypasses knowledge cutoffs by linking directly to your documents, providing factual grounding and citations. No fine-tuning is required, and data can be updated quickly. RAG provides a real-time bridge between your LLM and your data.
This post will focus primarily on indexing and retrieval of your data. Let's get started!
How do you eat an elephant? One bite at a time
It's not feasible to feed all of your information into the AI every time you want to ask it a question. Instead, the data is broken down into smaller, more manageable pieces called chunks, which the AI can process and retrieve efficiently.
We will use a "recursive character chunking" strategy, which is fast and smart: it tries to split at natural boundaries like paragraphs, but can still cut off mid-sentence if a chunk gets too big. An overlap is used to ensure that context isn't lost at the edges of a cut when a split does occur.
Snippet of code used for splitting & chunking using LangChain:
from langchain_community.document_loaders import PyPDFLoader, DirectoryLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

DATA_DIR = "./data"  # adjust to wherever your policy PDFs live

# Load every PDF under DATA_DIR (PyPDFLoader yields one Document per page)
loader = DirectoryLoader(
    DATA_DIR,
    glob="**/*.pdf",
    loader_cls=PyPDFLoader,
)
docs = loader.load()

# Split into ~1,000-character chunks with 100 characters of overlap
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=100,
)
chunks = text_splitter.split_documents(docs)
Recursive character chunking is the successor to "fixed-size chunking", which is just a fixed sliding window. There, you always need the overlap because you never know where in a sentence the window will cut.
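To make the contrast concrete, here is a minimal sketch of fixed-size chunking with overlap in pure Python (no libraries); the text and sizes are made up for illustration:

```python
def fixed_size_chunks(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    """Slide a fixed window over the text, stepping forward by chunk_size - overlap."""
    chunks = []
    step = chunk_size - overlap  # each chunk re-reads the last `overlap` characters
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the final chunk reached the end of the text
    return chunks

text = "0123456789" * 250  # 2,500 characters of sample text
chunks = fixed_size_chunks(text, chunk_size=1000, overlap=100)
# The last 100 characters of one chunk repeat as the first 100 of the next,
# so a sentence severed by the window boundary still appears whole somewhere.
```

Notice that the window has no idea where paragraphs or sentences are, which is exactly why the overlap is mandatory here but only a safety net for the recursive splitter.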
What's the vector representation of 'Life'?
The embedding process transforms text chunks into vectors, which are mathematical arrays of floating-point numbers that capture semantic meaning. However, higher dimensionality won't necessarily generate better results. For simple or straightforward documents, expanding the vector size often introduces latency and computational overhead without providing better search accuracy.
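To show the data shape an embedding model produces, here is a toy stand-in (NOT a real embedding, just normalized letter frequencies): real models learn semantics, but the output is the same kind of object, a fixed-length array of floats whose size never depends on the input length:

```python
def toy_embed(text: str, dims: int = 26) -> list[float]:
    """Toy 26-dimensional 'embedding': normalized letter frequencies.
    A real model (e.g. 384 or 1536 dims) captures meaning; this only
    illustrates the shape of the output."""
    counts = [0.0] * dims
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    total = sum(counts) or 1.0
    return [c / total for c in counts]

v = toy_embed("What is the vacation policy?")
# len(v) is always 26, whether you embed a question or a whole chapter
```

Swapping in a higher-dimensional model just makes this array longer; as noted above, that buys you nothing for simple documents while every similarity computation gets more expensive.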
Two sides of the same coin: Indexing & retrieval
Indexing and retrieval are two parts of the same conversation, and you must use the same embedding model for both. This is important because every embedding model puts emphasis on words in a sentence differently. One might prioritize the subject, while another might prioritize the action, which would yield different results.
Selecting the appropriate embedding type is also very important. The documents that you embed and index are usually long, structured documents where the focus is on the information they provide. This is in contrast to user queries, which are usually short, messy text, so the retrieval process focuses on the information being sought.
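Some embedding families make this document-vs-query distinction explicit. The E5 models, for example, expect a "passage: " prefix on indexed text and a "query: " prefix on search text (that convention comes from E5's model card, not from this post's stack); a sketch of the pattern:

```python
def format_for_indexing(doc: str) -> str:
    """Prefix used when embedding long, structured document chunks."""
    return f"passage: {doc}"

def format_for_retrieval(query: str) -> str:
    """Prefix used when embedding short, messy user questions."""
    return f"query: {query}"

indexed = format_for_indexing("Employees accrue vacation leave at a rate of...")
searched = format_for_retrieval("how much vacation do I get?")
```

Whatever model you pick, the rule from above still holds: the same model (and the same prefixing convention) must be used on both sides, or the two vector spaces simply won't line up.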
Finding a match
Once your user query is embedded, the RAG system performs a similarity search against the vector database to identify the most relevant answers. In most vector databases, this is calculated using cosine similarity. This metric focuses exclusively on the angle between vectors rather than their magnitude; it measures how closely the semantic "intent" (angle) of the query aligns with the document, regardless of the text's length or word frequency. This is important because it means the AI can recognize that a short question and a long technical manual can share the same intent even if their scale (magnitude) is completely different.
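Cosine similarity is only a few lines of NumPy, and the scale-invariance described above is easy to verify: a vector and a doubled copy of it point in the same direction, so their similarity is exactly 1.0 (the vectors here are toy values, not real embeddings):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between a and b: 1.0 = same direction, 0.0 = unrelated."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

q = np.array([1.0, 2.0, 3.0])   # a short query's vector
d = np.array([2.0, 4.0, 6.0])   # same direction, twice the magnitude
cosine_similarity(q, d)          # 1.0: identical "intent", different scale
```

Dividing by the norms is what discards magnitude, which is why a two-line question can still be the best match for a chunk pulled from a fifty-page policy manual.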
Putting it all together
Link to my GitHub repository → here
To handle HR questions, I'm building an agent using Google's Agent Development Kit (ADK) that connects directly to this RAG system:
from google.adk.agents import LlmAgent
from google.adk.tools import FunctionTool

from .tools import query_hr

# Wrap the retrieval function so the agent can call it as a tool
hr_rag_tool = FunctionTool(func=query_hr)

hr_agent = LlmAgent(
    name="hr_agent",
    model="gemini-3.1-pro-preview",
    description="Specialist in company HR policies and procedures.",
    instruction=(
        "You are a professional HR assistant. Your goal is to answer questions "
        "using ONLY the information retrieved from the 'query_hr' tool. "
        "When calling the 'query_hr' tool, ensure all string arguments are properly formatted as standard JSON strings with double quotes.\n\n"
        "RULES:\n"
        "1. If the tool returns relevant information, summarize it clearly.\n"
        "2. You MUST cite your sources using the format: (Source: [Source Name], Page: [Page Number]).\n"
        "3. If the tool results do not contain the answer, state: 'I'm sorry, I couldn't find that in our HR documents.'\n"
        "4. Do not use outside knowledge or make up facts about company policy."
    ),
    tools=[hr_rag_tool],  # pass the wrapped tool, not the bare function
)
By giving the agent clear instructions and the right tools to search our vector database, it should be able to pull precise answers for users in seconds:
It answers questions reliably, but if I'm being honest, I can't help but feel we're only scratching the surface of the "full" answers that we're looking for.
Next steps
We have a working prototype, but there's still plenty of room to grow. To transform this from a simple RAG system into a high-performance engine, our next steps will focus on precision. We'll refine how we process and chunk documents and introduce a reranking layer to our search results to significantly boost the quality of the agent's responses.
Additional learning
If you haven't used Agent Development Kit yet, but would like to learn more, check out this Codelab: "ADK Crash Course - From Beginner to Expert" (it comes with a link to claim some free GCP credits to get you through the course).
Top comments (2)
Great intro to the crawl-walk-run approach for RAG. The emphasis on using the same embedding model for both indexing and retrieval is something I see people trip over constantly -- mixing models silently degrades retrieval quality and it's really hard to debug.
One thing from my experience building RAG over financial documents (SEC 13F filings, earnings reports): chunk_size=1000 works well for policy docs, but for structured financial tables you often need a different chunking strategy. Tables get mangled by recursive character splitting. We ended up writing a custom chunker that detects tabular data and keeps rows together as atomic units before embedding.
Also curious about the "walk" phase -- are you planning to cover hybrid search (dense vector + sparse BM25) or reranking with cross-encoders? Those two were the biggest quality jumps in our production pipeline.
Looking forward to the rest of the series!
Yes! Pretty much spot on. Though I will be using another open source tool which I love for document processing. "Walk" comes out Tuesday next week followed by "Run" the Tuesday after!