DEV Community

qudrat ullah

Posted on • Originally published at qudratullah.net

Build Your First AI Search in 30 Minutes: A Complete RAG Tutorial

What is RAG and Why Should You Care?

RAG stands for Retrieval-Augmented Generation. Think of it like having a super-smart assistant who can quickly search through your documents and then give you intelligent answers based on what they found.

Imagine you have thousands of company documents, and instead of spending hours searching through them manually, you could just ask questions like "What's our vacation policy?" and get instant, accurate answers. That's exactly what RAG does.

The "Retrieval" part finds relevant information, and the "Generation" part creates human-like responses using that information. It's like combining Google search with ChatGPT, but for your own data.

How RAG Works: The Simple Explanation

[Figure: RAG workflow diagram showing the three steps]

RAG works in three simple steps:

  1. Store: Break your documents into small chunks and convert them into numbers (called embeddings) that computers understand
  2. Search: When you ask a question, find the most relevant chunks from your stored data
  3. Generate: Feed those relevant chunks to an AI model (like GPT) to create a natural answer
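The three steps above can be sketched end-to-end in plain Python. This toy version scores chunks by word overlap instead of real embeddings, and "generates" by templating the best chunk into a reply; it only illustrates the flow, not how production RAG scores relevance:

```python
# Toy RAG pipeline: store chunks, retrieve the best one, build a reply.
def store(documents):
    # Step 1: a real system would embed each chunk; here we just tokenize it.
    return [set(doc.lower().split()) for doc in documents]

def search(question, index, documents):
    # Step 2: score each chunk by how many question words it shares.
    q_words = set(question.lower().split())
    scores = [len(q_words & words) for words in index]
    return documents[scores.index(max(scores))]

def generate(question, context):
    # Step 3: a real system sends question + context to an LLM.
    return f"Based on our docs: {context}"

docs = [
    "Vacation policy: 20 paid days per year.",
    "Office hours are 9 AM to 6 PM.",
]
index = store(docs)
best = search("How many vacation days do I get?", index, docs)
print(generate("How many vacation days do I get?", best))
```

Swapping the word-overlap scorer for embedding similarity is exactly what the LangChain version later in this post does for you.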


Think of it like organizing a massive library. Instead of browsing every book, you have a librarian who knows exactly where to find information and can summarize it for you.
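To make "converting text into numbers" concrete, here is cosine similarity over hand-made 3-dimensional vectors. The vectors and their values are invented for illustration; a real model such as all-MiniLM-L6-v2 outputs 384 dimensions, but the comparison works the same way:

```python
import math

def cosine_similarity(a, b):
    # Angle between two vectors: close to 1.0 = similar, close to 0.0 = unrelated.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hand-made toy "embeddings" for illustration only.
vacation_doc = [0.9, 0.1, 0.0]   # a chunk about time off
payroll_doc  = [0.1, 0.9, 0.2]   # a chunk about salaries
question     = [0.8, 0.2, 0.1]   # "How many vacation days do I get?"

print(cosine_similarity(question, vacation_doc))  # high: relevant chunk
print(cosine_similarity(question, payroll_doc))   # low: irrelevant chunk
```

A vector database does this comparison against millions of stored vectors, very fast.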

Building Your First RAG System

Let's build a simple RAG system that can answer questions about your documents. We'll use Python and some helpful libraries.

Step 1: Install Required Libraries

pip install langchain openai chromadb sentence-transformers

Step 2: Prepare Your Documents

# documents.py
documents = [
    "Our company offers 20 days of paid vacation per year. Employees can carry over up to 5 unused days to the next year.",
    "The office hours are 9 AM to 6 PM, Monday through Friday. Remote work is allowed up to 3 days per week.",
    "Health insurance covers 100% of employee premiums and 80% of family member premiums.",
    "The annual performance review happens in December. Salary increases are effective from January."
]

Step 3: Create the RAG System


# rag_system.py
from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Chroma
from langchain.llms import OpenAI
from langchain.chains import RetrievalQA
import os

# Set your OpenAI API key
os.environ["OPENAI_API_KEY"] = "your-api-key-here"

class SimpleRAG:
    def __init__(self, documents):
        # Split documents into smaller chunks
        text_splitter = CharacterTextSplitter(chunk_size=500, chunk_overlap=50)
        texts = text_splitter.create_documents(documents)

        # Create embeddings (convert text to numbers)
        embeddings = HuggingFaceEmbeddings()

        # Store in vector database
        self.vectorstore = Chroma.from_documents(texts, embeddings)

        # Set up the question-answering chain
        llm = OpenAI(temperature=0)
        self.qa_chain = RetrievalQA.from_chain_type(
            llm=llm,
            chain_type="stuff",
            retriever=self.vectorstore.as_retriever()
        )

    def ask_question(self, question):
        return self.qa_chain.run(question)

# Usage
from documents import documents

rag = SimpleRAG(documents)
answer = rag.ask_question("How many vacation days do I get?")
print(answer)

Step 4: Test Your RAG System

# test_rag.py
from documents import documents
from rag_system import SimpleRAG

rag = SimpleRAG(documents)

questions = [
    "How many vacation days do I get?",
    "Can I work from home?",
    "When do performance reviews happen?",
    "What does health insurance cover?"
]

for question in questions:
    print(f"Q: {question}")
    print(f"A: {rag.ask_question(question)}")
    print("-" * 50)

Making It Better: Pro Tips

1. Chunk Your Data Smartly

Don't just split text randomly. Split by paragraphs, sentences, or logical sections:

# Better text splitting
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
    separators=["\n\n", "\n", " ", ""]
)

2. Use Better Embeddings

For better search results, use more powerful embedding models:

from langchain.embeddings import OpenAIEmbeddings, HuggingFaceEmbeddings

# More accurate but costs money
embeddings = OpenAIEmbeddings()

# Free alternative that works well
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

3. Add Memory for Conversations

from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory

# RetrievalQA is single-turn; for follow-up questions use
# ConversationalRetrievalChain, which accepts a memory object.
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
qa_chain = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=vectorstore.as_retriever(),
    memory=memory
)

Common Mistakes to Avoid

Mistake 1: Using chunks that are too big or too small

  • Too big: irrelevant text gets mixed into the context and dilutes the answer
  • Too small: important context gets split away from the facts it supports
  • Sweet spot: 500-1500 characters per chunk for most use cases
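The effect of chunk size and overlap is easy to see with a bare-bones character splitter. This is a simplified sketch of the sliding-window idea; real splitters like CharacterTextSplitter also respect separators so they don't cut mid-word:

```python
def chunk_text(text, chunk_size=500, overlap=50):
    # Slide a window across the text, stepping by chunk_size - overlap,
    # so neighbouring chunks share `overlap` characters of context.
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

text = "a" * 1200
chunks = chunk_text(text, chunk_size=500, overlap=50)
print(len(chunks), [len(c) for c in chunks])  # 3 [500, 500, 300]
```

The overlap is what keeps a sentence that straddles a chunk boundary retrievable from both sides.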

Mistake 2: Not handling edge cases

# Add this method to the SimpleRAG class
def safe_ask_question(self, question):
    if not question.strip():
        return "Please ask a valid question."

    try:
        return self.qa_chain.run(question)
    except Exception as e:
        return f"Sorry, I couldn't process your question: {str(e)}"

Mistake 3: Ignoring document quality

  • Clean your documents first
  • Remove unnecessary formatting
  • Fix typos and inconsistencies
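A small cleaning pass before chunking goes a long way. This minimal sketch strips leftover HTML tags and collapses whitespace; real pipelines often add Unicode normalisation and deduplication on top:

```python
import re

def clean_document(text):
    # Strip HTML tags left over from scraping or exports.
    text = re.sub(r"<[^>]+>", " ", text)
    # Collapse runs of whitespace (tabs, newlines, double spaces) into one space.
    text = re.sub(r"\s+", " ", text)
    return text.strip()

raw = "<p>Our  company offers\n\n20 days of paid   vacation.</p>"
print(clean_document(raw))  # "Our company offers 20 days of paid vacation."
```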

Scaling Up: Next Steps

Once you have a basic RAG system working, you can:

  1. Add more document types: PDFs, Word docs, web pages
  2. Improve the UI: Build a web interface with Streamlit or Flask
  3. Use better databases: PostgreSQL with pgvector for production
  4. Add authentication: Secure your system for multiple users
  5. Monitor performance: Track which questions work well and which don't

Key Takeaways

  • RAG combines search and AI generation to answer questions about your documents
  • You need three main components: document storage, similarity search, and text generation
  • Start simple with basic libraries, then improve gradually
  • Good document chunking is crucial for accurate results
  • Test with real questions your users would ask
  • Always handle errors gracefully in production systems

RAG isn't magic, but it's incredibly powerful when done right. Start with this simple example, experiment with your own documents, and gradually add more features. Before you know it, you'll have built an AI assistant that actually knows about your specific domain.

The best part? This is just the beginning. RAG technology is evolving rapidly, and mastering the basics now will set you up for even more exciting developments ahead.


About the Author

Hi, I'm Qudrat Ullah, an Engineering Lead with 10+ years building scalable systems across fintech, media, and enterprise. I write about Node.js, cloud infrastructure, AI, and engineering leadership.

Find me online: LinkedIn · qudratullah.net

If you found this useful, share it with a fellow engineer or drop your thoughts in the comments.

