DEV Community

Frank Oge

Your AI is lying to you: How to fix hallucinations with a simple RAG system


We have all been there. You ask an LLM (Large Language Model) a specific question about your company's policy or a document you wrote last week, and it confidently gives you a completely wrong answer.

It’s not broken. It just doesn't know you.

ChatGPT, Claude, and Llama are trained on the public internet. They don't have access to your private Google Drive, your Notion docs, or your customer support logs.

This is where RAG (Retrieval-Augmented Generation) comes in.

It sounds fancy, but it’s actually a very simple concept: it's the difference between a closed-book test and an open-book test. RAG is simply handing the AI the textbook before asking it the question.

Here is how you build one from scratch, without getting lost in complex jargon.

The Architecture: Three Simple Steps

Forget the complex diagrams for a second. A RAG system does three things:

Index: It reads your data and organizes it.

Retrieve: It finds the relevant page when you ask a question.

Generate: It sends that page + your question to the LLM to write the answer.

Step 1: The "Embedding" (Turning words into numbers)

Computers don't understand English; they understand math. To search through your documents, we need to convert your text into a list of numbers called a vector (the "embedding").

Imagine a map.

"Dog" and "Puppy" are close together on the map.

"Dog" and "Sandwich" are far apart.

We use an Embedding Model (like OpenAI’s text-embedding-3-small or open-source alternatives) to turn your PDFs into these coordinates. We store these numbers in a Vector Database (like ChromaDB, Pinecone, or even Postgres with pgvector).
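The "closeness" on that map is usually measured with cosine similarity. Here is a minimal sketch using hand-made toy vectors (a real model like text-embedding-3-small produces vectors with 1,536 dimensions, but the math is the same):

```python
import math

# Toy 3-dimensional "embeddings", hand-made for illustration only.
embeddings = {
    "dog":      [0.90, 0.80, 0.10],
    "puppy":    [0.85, 0.90, 0.15],
    "sandwich": [0.10, 0.05, 0.95],
}

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """1.0 means the vectors point the same way; near 0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity(embeddings["dog"], embeddings["puppy"]))     # high
print(cosine_similarity(embeddings["dog"], embeddings["sandwich"]))  # low
```

"Dog" and "Puppy" score close to 1.0; "Dog" and "Sandwich" score much lower. That single number is what the vector database uses to decide which documents are "near" your question.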

Step 2: The Retrieval (The Librarian)

Now, when a user asks: "What is our refund policy?"

We don't send that straight to the LLM. First, we convert that question into numbers (vectors) too.

Then, we search our database: Which document is mathematically closest to this question?

The database replies: "Hey, 'Refund_Policy_2025.pdf' is a 95% match."
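Under the hood, the librarian is just a nearest-neighbor search. A sketch over a tiny in-memory store (the vectors and filenames here are made up; in practice the store is ChromaDB or Pinecone, and the query vector comes from the same embedding model used at index time):

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

# Hypothetical pre-computed document vectors.
documents = {
    "Refund_Policy_2025.pdf": [0.90, 0.10, 0.20],
    "Onboarding_Guide.pdf":   [0.10, 0.90, 0.30],
}

def retrieve(query_vector, top_k=1):
    """Return the top_k documents mathematically closest to the query."""
    scored = sorted(
        ((cosine_similarity(query_vector, vec), name)
         for name, vec in documents.items()),
        reverse=True,
    )
    return scored[:top_k]

# Stand-in for embed("What is our refund policy?").
query = [0.88, 0.15, 0.25]
print(retrieve(query))
```

The refund policy document scores highest, so that is the "page" we hand to the LLM in the next step.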

Step 3: The Generation (The Magic Trick)

This is the part that feels like magic, but it’s just prompt engineering.

We take the document we found in Step 2, and we paste it into a prompt that looks like this:

"You are a helpful assistant. Answer the user's question using ONLY the context provided below.

Context: [Insert text from Refund_Policy_2025.pdf]

User Question: What is our refund policy?"

Now, the AI isn't hallucinating. It’s summarizing the text you just gave it.
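Assembling that prompt is a one-liner's worth of string formatting. A sketch (the policy sentence is a hypothetical excerpt, not a real document):

```python
def build_prompt(context: str, question: str) -> str:
    """Stuff the retrieved text into the prompt so the model
    summarizes it instead of inventing an answer."""
    return (
        "You are a helpful assistant. Answer the user's question using "
        "ONLY the context provided below.\n\n"
        f"Context: {context}\n\n"
        f"User Question: {question}"
    )

prompt = build_prompt(
    context="Customers may request a full refund within 30 days of purchase.",
    question="What is our refund policy?",
)
print(prompt)  # this string is what actually gets sent to the LLM
```

The only thing the LLM ever sees is this string, which is why the "ONLY the context provided" instruction does so much heavy lifting.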

Why build from scratch?

You might ask, "Frank, why not just use a tool that does this for me?"

Because when it breaks (and it will), you need to know where it broke.

Did it fail to find the document? (Bad Embeddings)

Did it find the document but fail to answer? (Bad LLM)

Is the data messy? (Bad Ingestion)

Building the basic pipeline yourself, even just a simple Python script, gives you the intuition to debug the complex systems later.
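A toy version of that script fits in one file. The sketch below fakes the embedding step with word counts over a tiny vocabulary (a real pipeline would swap embed() for calls to a model like text-embedding-3-small, and the dict for a vector database), but the Index → Retrieve → Generate flow is exactly the one described above:

```python
import math
import re

# --- Index ---------------------------------------------------------------
# embed() is a stand-in: word counts over a toy vocabulary instead of a
# real embedding model. The dict plays the role of the vector database.
VOCAB = ["refund", "policy", "vacation", "days", "purchase"]

def embed(text: str) -> list[float]:
    tokens = re.findall(r"[a-z]+", text.lower())
    return [float(tokens.count(word)) for word in VOCAB]

store: dict[str, tuple[list[float], str]] = {}

def index(name: str, text: str) -> None:
    store[name] = (embed(text), text)

# --- Retrieve ------------------------------------------------------------
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(x * x for x in b)) or 1.0
    return dot / (na * nb)

def retrieve(question: str) -> str:
    query_vec = embed(question)
    best = max(store.values(), key=lambda doc: cosine(query_vec, doc[0]))
    return best[1]  # the raw text of the closest document

# --- Generate ------------------------------------------------------------
def build_prompt(question: str) -> str:
    return (
        "Answer the user's question using ONLY the context below.\n\n"
        f"Context: {retrieve(question)}\n\n"
        f"User Question: {question}"
    )

index("refund_policy", "Refund requests are accepted within 30 days of purchase.")
index("vacation_policy", "Employees get 20 vacation days per year.")
print(build_prompt("What is our refund policy?"))
```

When each stage is a separate function like this, the three failure modes above map directly onto code: bad embeddings live in embed(), bad retrieval in retrieve(), and bad prompting in build_prompt().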

Final Thoughts

RAG isn't just a trend; it's the standard for how businesses will interact with AI. We are moving away from "Chat with a bot" to "Chat with your data."

If you are a developer in 2026, understanding this flow is as important as understanding how a database works.
