What is RAG and Why Should You Care?
RAG stands for Retrieval-Augmented Generation. Think of it like having a super-smart assistant who can quickly search through your documents and then give you intelligent answers based on what they found.
Imagine you have thousands of company documents, and instead of spending hours searching through them manually, you could just ask questions like "What's our vacation policy?" and get instant, accurate answers. That's exactly what RAG does.
The "Retrieval" part finds relevant information, and the "Generation" part creates human-like responses using that information. It's like combining Google search with ChatGPT, but for your own data.
How RAG Works: The Simple Explanation
RAG works in three simple steps:
- Store: Break your documents into small chunks and convert them into numbers (called embeddings) that computers understand
- Search: When you ask a question, find the most relevant chunks from your stored data
- Generate: Feed those relevant chunks to an AI model (like GPT) to create a natural answer
Think of it like organizing a massive library. Instead of browsing every book, you have a librarian who knows exactly where to find information and can summarize it for you.
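The three steps above can be sketched in a few lines of plain Python. This is a toy: it uses word-count vectors instead of a real embedding model, so it only matches exact words, but the store → search → generate flow is the same one the real system follows:

```python
import math
from collections import Counter

# Toy "embedding": a bag-of-words vector (real systems use neural models)
def embed(text):
    return Counter(text.lower().split())

# Cosine similarity between two word-count vectors
def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Store: embed each chunk
chunks = [
    "Employees get 20 days of paid vacation per year.",
    "Office hours are 9 AM to 6 PM, Monday through Friday.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

# Search: find the chunk most similar to the question
question = "How many vacation days do I get?"
best_chunk = max(index, key=lambda pair: cosine(embed(question), pair[1]))[0]

# Generate: in a real system, best_chunk + question would go to an LLM
print(best_chunk)
```

The retrieval step correctly picks the vacation chunk because it shares words with the question; a neural embedding model does the same thing, but also matches synonyms and paraphrases.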
Building Your First RAG System
Let's build a simple RAG system that can answer questions about your documents. We'll use Python and some helpful libraries.
Step 1: Install Required Libraries
```shell
pip install langchain openai chromadb sentence-transformers
```
Step 2: Prepare Your Documents
```python
# documents.py
documents = [
    "Our company offers 20 days of paid vacation per year. Employees can carry over up to 5 unused days to the next year.",
    "The office hours are 9 AM to 6 PM, Monday through Friday. Remote work is allowed up to 3 days per week.",
    "Health insurance covers 100% of employee premiums and 80% of family member premiums.",
    "The annual performance review happens in December. Salary increases are effective from January."
]
```
Step 3: Create the RAG System
```python
# rag_system.py
import os

from langchain.chains import RetrievalQA
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.llms import OpenAI
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma

# Set your OpenAI API key
os.environ["OPENAI_API_KEY"] = "your-api-key-here"


class SimpleRAG:
    def __init__(self, documents):
        # Split documents into smaller chunks
        text_splitter = CharacterTextSplitter(chunk_size=500, chunk_overlap=50)
        texts = text_splitter.create_documents(documents)

        # Create embeddings (convert text to numbers)
        embeddings = HuggingFaceEmbeddings()

        # Store in a vector database
        self.vectorstore = Chroma.from_documents(texts, embeddings)

        # Set up the question-answering chain
        llm = OpenAI(temperature=0)
        self.qa_chain = RetrievalQA.from_chain_type(
            llm=llm,
            chain_type="stuff",
            retriever=self.vectorstore.as_retriever()
        )

    def ask_question(self, question):
        return self.qa_chain.run(question)


# Usage (guarded so other modules can import SimpleRAG without running the demo)
if __name__ == "__main__":
    from documents import documents

    rag = SimpleRAG(documents)
    answer = rag.ask_question("How many vacation days do I get?")
    print(answer)
```
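A lot of what happens here is just prompt assembly. The `chain_type="stuff"` option means LangChain "stuffs" every retrieved chunk into a single prompt alongside your question. As a rough sketch of that idea (LangChain's actual prompt template differs, and `build_stuff_prompt` is a name I'm inventing for illustration):

```python
def build_stuff_prompt(question, retrieved_chunks):
    """Rough sketch of a 'stuff'-style prompt; the real template differs."""
    # Concatenate every retrieved chunk into one context block
    context = "\n\n".join(retrieved_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

prompt = build_stuff_prompt(
    "How many vacation days do I get?",
    ["Our company offers 20 days of paid vacation per year."],
)
print(prompt)
```

This is also why chunk size matters: every retrieved chunk competes for space in a single prompt, so oversized chunks crowd out other relevant context.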
Step 4: Test Your RAG System
```python
# test_rag.py
from documents import documents
from rag_system import SimpleRAG

rag = SimpleRAG(documents)

questions = [
    "How many vacation days do I get?",
    "Can I work from home?",
    "When do performance reviews happen?",
    "What does health insurance cover?"
]

for question in questions:
    print(f"Q: {question}")
    print(f"A: {rag.ask_question(question)}")
    print("-" * 50)
```
Making It Better: Pro Tips
1. Chunk Your Data Smartly
Don't just split text randomly. Split by paragraphs, sentences, or logical sections:
```python
# Better text splitting
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
    separators=["\n\n", "\n", " ", ""]
)
```
2. Use Better Embeddings
For better search results, use a more powerful embedding model:

```python
from langchain.embeddings import OpenAIEmbeddings

# More accurate, but each call costs money
embeddings = OpenAIEmbeddings()

# Free alternative that works well
from langchain.embeddings import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)
```
3. Add Memory for Conversations
`RetrievalQA` answers every question in isolation. For multi-turn conversations, use LangChain's `ConversationalRetrievalChain`, which is designed to work with a memory object:

```python
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory(
    memory_key="chat_history", return_messages=True
)
qa_chain = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=vectorstore.as_retriever(),
    memory=memory
)
```
Common Mistakes to Avoid
Mistake 1: Using chunks that are too big or too small
- Too big: retrieval gets less precise, and the model has to sift through irrelevant text
- Too small: chunks lose the surrounding context needed to answer
- Sweet spot: 500-1500 characters per chunk
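To build intuition for this trade-off, here is a deliberately naive character-based splitter. Real splitters like LangChain's respect sentence and paragraph boundaries, but this shows how chunk size and overlap interact:

```python
def split_chars(text, chunk_size=20, chunk_overlap=5):
    """Naive fixed-size splitter with overlap (illustration only)."""
    step = chunk_size - chunk_overlap
    # Each chunk starts `overlap` characters before the previous one ended,
    # so context at chunk boundaries appears in both neighbors
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

text = "Our company offers 20 days of paid vacation per year."
for size in (15, 40):
    print(size, split_chars(text, chunk_size=size, chunk_overlap=5))
```

Run it with different sizes and you can see the failure modes directly: tiny chunks cut "20 days of paid vacation" in half, while one giant chunk would swallow the whole document.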
Mistake 2: Not handling edge cases
```python
def safe_ask_question(self, question):
    if not question.strip():
        return "Please ask a valid question."
    try:
        return self.qa_chain.run(question)
    except Exception as e:
        return f"Sorry, I couldn't process your question: {str(e)}"
```
Mistake 3: Ignoring document quality
- Clean your documents first
- Remove unnecessary formatting
- Fix typos and inconsistencies
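A minimal cleaning pass might look like the sketch below; the exact rules (which characters to strip, what counts as junk) depend on your source documents:

```python
import re

def clean_document(text):
    # Collapse runs of whitespace (newlines, tabs, doubles spaces) into one space
    text = re.sub(r"\s+", " ", text)
    # Drop invisible characters that often survive copy-paste from PDFs/web
    text = re.sub(r"[\u200b\ufeff]", "", text)
    return text.strip()

print(clean_document("Our  company\n\noffers\t20 days..."))
```

Cleaning pays off twice: the splitter produces more coherent chunks, and the embeddings aren't polluted by formatting noise.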
Scaling Up: Next Steps
Once you have a basic RAG system working, you can:
- Add more document types: PDFs, Word docs, web pages
- Improve the UI: Build a web interface with Streamlit or Flask
- Use better databases: PostgreSQL with pgvector for production
- Add authentication: Secure your system for multiple users
- Monitor performance: Track which questions work well and which don't
Key Takeaways
- RAG combines search and AI generation to answer questions about your documents
- You need three main components: document storage, similarity search, and text generation
- Start simple with basic libraries, then improve gradually
- Good document chunking is crucial for accurate results
- Test with real questions your users would ask
- Always handle errors gracefully in production systems
RAG isn't magic, but it's incredibly powerful when done right. Start with this simple example, experiment with your own documents, and gradually add more features. Before you know it, you'll have built an AI assistant that actually knows about your specific domain.
The best part? This is just the beginning. RAG technology is evolving rapidly, and mastering the basics now will set you up for even more exciting developments ahead.
About the Author
Hi, I'm Qudrat Ullah, an Engineering Lead with 10+ years building scalable systems across fintech, media, and enterprise. I write about Node.js, cloud infrastructure, AI, and engineering leadership.
Find me online: LinkedIn · qudratullah.net
If you found this useful, share it with a fellow engineer or drop your thoughts in the comments.
Originally published at qudratullah.net.


