Everyone's talking about RAG. Before I built one myself, it sounded really fancy. I knew it had something to do with providing documents to AI, and honestly, I thought it was badly named (I still do).
But now that I've built one from scratch, I get it. And it's actually simpler than I expected.
This article is for anyone new to RAG who wants to understand what it actually does, not in theory but by building a working agent.
So What is RAG Actually?
RAG stands for Retrieval Augmented Generation. Fancy name, simple idea.
Instead of just asking an LLM a question and hoping it knows the answer, you set it up with context beforehand. You provide your documents, store them in a special database, and then when you ask questions, the agent searches YOUR documents first and answers based on what it finds.
In plain words: RAG lets you chat with your own documents. That's it.
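The whole flow fits in a few lines. Here's a minimal sketch of retrieve-then-generate; the names (`search`, `llm`) are placeholders standing in for a real vector store and model call, not any particular library's API:

```python
# Sketch of the RAG flow: retrieve, augment, generate.
def rag_answer(question, search, llm, top_k=5):
    chunks = search(question, top_k)       # 1. Retrieve relevant chunks from YOUR docs
    context = "\n\n".join(chunks)          # 2. Augment: paste them into the prompt
    prompt = (f"Answer using only this context:\n{context}"
              f"\n\nQuestion: {question}")
    return llm(prompt)                     # 3. Generate: the LLM answers from context

# Toy stand-ins so the sketch runs end to end:
def toy_search(question, top_k):
    return ["RAG stands for Retrieval Augmented Generation."][:top_k]

def toy_llm(prompt):
    return "Answered from: " + prompt.splitlines()[1]

print(rag_answer("What is RAG?", toy_search, toy_llm))
```

Swap the toy functions for a real vector database query and a real model call, and that's the core of the whole system.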
The Architecture
If you've built any AI agent before, you already know the pattern:
- Config → settings in one place
- Tools → what the agent can do
- Brain → the decision loop
My RAG agent has the same structure:
- config.py → API keys, chunk size, model settings
- loader.py → reads PDFs and splits them into chunks
- embeddings.py → stores chunks in a vector database
- retriever.py → the search tool
- agent.py → the ReAct loop that decides when to search and when to answer
The only new thing compared to my previous agents was the vector database. Everything else was the same pattern.
The Part That Blew My Mind: Vector Databases
This is where it gets cool.
A normal database stores and retrieves data as-is. If you search for "dog", you get rows that contain the word "dog." Nothing else.
A vector database is different. It stores data by meaning. Related information gets stored as similar numbers. So if you search for "dog", you might also get results about pets, puppies, and furry animals — because they mean similar things.
Think of it like this:
- Normal DB: SELECT * FROM animals WHERE name = 'dog' → only dogs
- Vector DB: search("dog") → dogs, puppies, pets, furry animals
This is what makes RAG powerful. You don't need to use the exact words from the document. You search by meaning.
How does it do this? The vector database uses an embedding model that was trained on billions of sentences. It learned that words appearing in similar contexts have similar meanings. So "dog" and "puppy" get similar numbers because they show up in the same kinds of sentences.
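You can see the idea with a toy example. The three-number vectors below are made up for illustration; a real embedding model produces hundreds of dimensions, but the comparison (cosine similarity) works the same way:

```python
# Toy "search by meaning": made-up embeddings, real cosine similarity.
import math

embeddings = {
    "dog":     [0.90, 0.80, 0.10],
    "puppy":   [0.85, 0.75, 0.15],  # close to "dog": similar meaning
    "invoice": [0.10, 0.05, 0.90],  # far from "dog": unrelated meaning
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

query = embeddings["dog"]
ranked = sorted(embeddings, key=lambda w: cosine(query, embeddings[w]),
                reverse=True)
print(ranked)  # ['dog', 'puppy', 'invoice']
```

"puppy" ranks right behind "dog" and "invoice" lands last, even though the strings share no letters. That ranking by nearness is all a vector database search really is.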
I want to go deeper into how embeddings work at some point — it's fascinating.
Why Chunking Matters
You can't just throw a whole PDF into the database. A 60-page paper covers many topics. If you store it as one big blob, the vector becomes an average of everything — good at nothing specific.
So you split the document into small chunks (I used 500 words each). Each chunk gets its own vector. When someone asks a question, only the most relevant chunks get retrieved — maybe 5 out of 58. Less noise, better answers, lower cost.
One important detail: chunks need to overlap. I used 50 words of overlap. Without it, a sentence like "the treatment showed 95% efficacy" might get split across two chunks, and neither chunk alone makes sense.
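The chunking described above (500-word chunks, 50 words of overlap) can be sketched in a few lines. This is a simplified word-based version; the real loader.py may differ in details:

```python
# Split text into overlapping word chunks: each chunk is chunk_size words,
# and consecutive chunks share `overlap` words so no sentence is orphaned.
def chunk_words(text, chunk_size=500, overlap=50):
    words = text.split()
    step = chunk_size - overlap            # advance 450 words per chunk
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break                          # this chunk reached the end
    return chunks

doc = " ".join(f"w{i}" for i in range(1000))  # a fake 1000-word document
chunks = chunk_words(doc)
print(len(chunks))  # 3
```

With a 1000-word document you get three chunks (words 0-499, 450-949, 900-999), and the last 50 words of each chunk reappear at the start of the next, so a sentence split at a boundary still survives intact in one of them.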
What I'd Build Next
I want to add web search alongside RAG. Imagine asking a question and the agent first checks your documents, then compares that with what the web says. You get both what your document says and what the world says.
Even simpler — if the answer isn't in your documents, the agent could ask "Want me to check the web?" That's just adding more tools to the same architecture. Same brain, more hands.
Try It Yourself
The full code is on my GitHub: github.com/familyguyfg/rag-document-chat
To run it:
- Clone the repo
- Drop your PDFs in the documents folder
- Run python3 agent.py
- Start asking questions
If you're new to building agents, RAG is a great project to learn from. The concept sounds complex, but once you understand that it's just "search your documents before asking the LLM," everything clicks.