DEV Community

郑沛沛

Build a RAG System in 50 Lines of Python

Retrieval-Augmented Generation (RAG) sounds complex, but the core idea is simple: give your LLM access to your own documents. Here's how to build one in 50 lines.

What is RAG?

Instead of relying solely on the LLM's training data, RAG retrieves relevant documents first, then feeds them as context to the LLM. This means your AI can answer questions about YOUR data.
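Stripped of libraries, retrieve-then-generate is just two steps: pick the most relevant documents, then prepend them to the prompt. Here's a toy sketch that scores relevance by word overlap instead of real embeddings (the function names are illustrative, not from any library):

```python
def retrieve(question, docs, k=1):
    # Toy relevance score: number of shared words (real RAG uses embeddings)
    q_words = set(question.lower().split())
    scored = sorted(docs, key=lambda d: len(q_words & set(d.lower().split())), reverse=True)
    return scored[:k]

def build_prompt(question, docs):
    # The retrieved documents become the context the LLM answers from
    context = "\n".join(retrieve(question, docs))
    return f"Answer based on this context:\n{context}\n\nQuestion: {question}"

docs = ["FastAPI is built on Starlette.", "Redis supports Pub/Sub."]
print(build_prompt("What is FastAPI built on?", docs))
```

The real version below swaps word overlap for embeddings and the f-string for an API call, but the shape is the same.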

The Setup

```shell
pip install openai chromadb sentence-transformers
```

The Code

```python
import chromadb
from openai import OpenAI
from sentence_transformers import SentenceTransformer

# 1. Initialize embedding model, vector DB, and LLM client
embedder = SentenceTransformer("all-MiniLM-L6-v2")
chroma = chromadb.Client()
collection = chroma.create_collection("docs")
llm = OpenAI()  # reads OPENAI_API_KEY from the environment

# 2. Add your documents
docs = [
    "Python 3.12 introduced type parameter syntax.",
    "FastAPI is built on Starlette and Pydantic.",
    "Docker containers share the host OS kernel.",
    "PostgreSQL supports JSONB for document storage.",
    "Redis can be used as a message broker with Pub/Sub.",
]

embeddings = embedder.encode(docs).tolist()
collection.add(
    documents=docs,
    embeddings=embeddings,
    ids=[f"doc_{i}" for i in range(len(docs))],
)

# 3. Query function: retrieve the closest documents, then ask the LLM
def ask(question, n_results=2):
    q_embedding = embedder.encode([question]).tolist()
    results = collection.query(query_embeddings=q_embedding, n_results=n_results)
    context = "\n".join(results["documents"][0])

    response = llm.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": f"Answer based on this context:\n{context}"},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

# 4. Use it
print(ask("What is FastAPI built on?"))
```

How It Works

  1. Documents are converted to vectors (embeddings)
  2. When you ask a question, it's also converted to a vector
  3. ChromaDB finds the most similar documents
  4. Those documents are passed as context to the LLM
  5. The LLM answers based on YOUR data, not just its training
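The "most similar" in step 3 usually means cosine similarity between vectors. Here's a minimal sketch of that comparison, using tiny hand-made 3-dimensional vectors for readability (all-MiniLM-L6-v2 actually produces 384 dimensions):

```python
import math

def cosine_similarity(a, b):
    # Dot product divided by the product of vector magnitudes
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors: the question should land near the "fastapi" doc
question_vec = [0.9, 0.1, 0.0]
doc_vecs = {
    "fastapi": [0.8, 0.2, 0.1],
    "docker":  [0.1, 0.9, 0.3],
}

scores = {name: cosine_similarity(question_vec, v) for name, v in doc_vecs.items()}
best = max(scores, key=scores.get)  # → "fastapi"
```

ChromaDB does this comparison internally across all stored embeddings and returns the top `n_results`.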

Scaling Up

For production, swap in:

  • A persistent vector store (Chroma's PersistentClient, Pinecone, or Weaviate) instead of the in-memory client
  • Chunking for large documents (split into 500-token chunks)
  • Reranking to improve retrieval quality

But this 50-line version is enough to understand the concept and prototype quickly.
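Chunking is the easiest upgrade to sketch. A minimal version splits on words with some overlap between chunks so sentences spanning a boundary aren't lost (this approximates tokens with words; a production pipeline would use a real tokenizer such as tiktoken):

```python
def chunk_text(text, max_words=100, overlap=20):
    # Split into overlapping windows of words; overlap preserves boundary context
    words = text.split()
    chunks = []
    step = max_words - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break
    return chunks
```

Each chunk then gets its own embedding and ID in `collection.add`, exactly like the whole documents above.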

🚀 Level up your AI workflow! Check out my AI Developer Mega Prompt Pack — 80 battle-tested prompts for developers. $9.99
