DEV Community

郑沛沛

Build a RAG System in 50 Lines of Python

Retrieval-Augmented Generation (RAG) sounds complex, but the core idea is simple: give your LLM access to your own documents. Here's how to build one in 50 lines.

What is RAG?

Instead of relying solely on the LLM's training data, RAG retrieves relevant documents first, then feeds them as context to the LLM. This means your AI can answer questions about YOUR data.
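Stripped of libraries, retrieve-then-generate is just two steps: pick the most relevant documents, then prepend them to the prompt. Here's a toy sketch that scores relevance by word overlap instead of real embeddings (the function names are illustrative, not from any library):

```python
def retrieve(question, docs, k=1):
    # Toy relevance score: number of shared words (real RAG uses embeddings)
    q_words = set(question.lower().split())
    scored = sorted(docs, key=lambda d: len(q_words & set(d.lower().split())), reverse=True)
    return scored[:k]

def build_prompt(question, docs):
    # The retrieved documents become the context the LLM answers from
    context = "\n".join(retrieve(question, docs))
    return f"Answer based on this context:\n{context}\n\nQuestion: {question}"

docs = ["FastAPI is built on Starlette.", "Redis supports Pub/Sub."]
print(build_prompt("What is FastAPI built on?", docs))
```

The real version below swaps word overlap for embeddings and the f-string for an API call, but the shape is the same.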

The Setup

```shell
pip install openai chromadb sentence-transformers
```

The Code

```python
import chromadb
from openai import OpenAI
from sentence_transformers import SentenceTransformer

# 1. Initialize embedding model, vector DB, and LLM client
embedder = SentenceTransformer("all-MiniLM-L6-v2")
chroma = chromadb.Client()
collection = chroma.create_collection("docs")
llm = OpenAI()  # reads OPENAI_API_KEY from the environment

# 2. Add your documents
docs = [
    "Python 3.12 introduced type parameter syntax.",
    "FastAPI is built on Starlette and Pydantic.",
    "Docker containers share the host OS kernel.",
    "PostgreSQL supports JSONB for document storage.",
    "Redis can be used as a message broker with Pub/Sub.",
]

embeddings = embedder.encode(docs).tolist()
collection.add(
    documents=docs,
    embeddings=embeddings,
    ids=[f"doc_{i}" for i in range(len(docs))],
)

# 3. Query function: retrieve the closest documents, then ask the LLM
def ask(question, n_results=2):
    q_embedding = embedder.encode([question]).tolist()
    results = collection.query(query_embeddings=q_embedding, n_results=n_results)
    context = "\n".join(results["documents"][0])

    response = llm.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": f"Answer based on this context:\n{context}"},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

# 4. Use it
print(ask("What is FastAPI built on?"))
```

How It Works

  1. Documents are converted to vectors (embeddings)
  2. When you ask a question, it's also converted to a vector
  3. ChromaDB finds the most similar documents
  4. Those documents are passed as context to the LLM
  5. The LLM answers based on YOUR data, not just its training
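The "most similar" in step 3 usually means cosine similarity between vectors. Here's a minimal sketch of that comparison, using tiny hand-made 3-dimensional vectors for readability (all-MiniLM-L6-v2 actually produces 384 dimensions):

```python
import math

def cosine_similarity(a, b):
    # Dot product divided by the product of vector magnitudes
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors: the question should land near the "fastapi" doc
question_vec = [0.9, 0.1, 0.0]
doc_vecs = {
    "fastapi": [0.8, 0.2, 0.1],
    "docker":  [0.1, 0.9, 0.3],
}

scores = {name: cosine_similarity(question_vec, v) for name, v in doc_vecs.items()}
best = max(scores, key=scores.get)  # → "fastapi"
```

ChromaDB does this comparison internally across all stored embeddings and returns the top `n_results`.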

Scaling Up

For production, swap in:

  • A persistent vector store (Chroma's PersistentClient, Pinecone, or Weaviate) instead of the in-memory client
  • Chunking for large documents (split into 500-token chunks)
  • Reranking to improve retrieval quality

But this 50-line version is enough to understand the concept and prototype quickly.
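Chunking is the easiest upgrade to sketch. A minimal version splits on words with some overlap between chunks so sentences spanning a boundary aren't lost (this approximates tokens with words; a production pipeline would use a real tokenizer such as tiktoken):

```python
def chunk_text(text, max_words=100, overlap=20):
    # Split into overlapping windows of words; overlap preserves boundary context
    words = text.split()
    chunks = []
    step = max_words - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break
    return chunks
```

Each chunk then gets its own embedding and ID in `collection.add`, exactly like the whole documents above.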

🚀 Level up your AI workflow! Check out my AI Developer Mega Prompt Pack — 80 battle-tested prompts for developers. $9.99
