DEV Community

Zafer Dace

I Built a 50-Line RAG System That Saves Me 10x Tokens in Claude Code

Every Claude Code user hits the same wall: you ask a question about your codebase, Claude reads 5 files, burns 30K tokens, and your context window is half gone before you've written a single line of code.

I fixed this with a local RAG system: 50 lines of Python, zero API costs, and 3-9x token savings per semantic search (6.5x on average in my tests). Here's exactly how I built it, with real numbers from a 22,000-file Unity project.


The Problem: Claude Code Eats Context for Breakfast

I work on a large Unity mobile game with 22,000+ C# files. When I ask Claude Code something like "how does the energy system handle timer refills?", here's what happens:

  1. Claude runs grep for "energy" and "timer" — finds 47 matches across 12 files
  2. Reads EnergyManager.cs (187 lines) — that's relevant
  3. Reads EnergyCountDownTimer.cs (32 lines) — also relevant
  4. Reads NotificationManager.cs (1,278 lines) — only 12 lines are about energy
  5. Maybe reads another file or two just to be sure

Total: ~6,000 tokens consumed. And Claude only needed about 30 lines of code to answer the question.

Now multiply this by every question in a session. By the time you're actually implementing something, you've burned half your context on research.

The Solution: Method-Level RAG in 50 Lines

RAG (Retrieval-Augmented Generation) lets you search code by meaning, not keywords. Instead of reading entire files, you get back just the specific methods that answer your question.


Your source files → chunk by method → embed with all-MiniLM-L6-v2 → store in ChromaDB
                                                                           ↓
Your question → embed → similarity search → top 5 methods (with file:line metadata)

The entire system is two Python scripts, no server needed, runs 100% locally.

Step 1: index.py — Chunk and Embed Your Code

import os, re, sys
import chromadb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection("codebase", metadata={"hnsw:space": "cosine"})

# ⚠️ Change this to your project's source directory
SOURCE_DIR = os.path.expanduser("~/your-project/Assets")

# ⚠️ Change this to match your file extension (.cs, .ts, .py, etc.)
FILE_EXT = ".cs"

def extract_chunks(filepath):
    """Split a C# file into method-level chunks using brace counting."""
    with open(filepath, "r", encoding="utf-8", errors="ignore") as f:
        lines = f.readlines()

    chunks = []
    current_class = ""
    i = 0

    while i < len(lines):
        line = lines[i]

        # Track current class
        class_match = re.match(r'\s*(?:public|private|internal|protected)?\s*(?:abstract|static|sealed|partial)?\s*class\s+(\w+)', line)
        if class_match:
            current_class = class_match.group(1)

        # Detect method signatures (heuristic, C#/Java-style; can occasionally
        # match statements like "return Foo(...)" that happen to precede a brace)
        method_match = re.match(
            r'\s*(?:public|private|protected|internal|static|virtual|override|abstract|async|sealed|\[.*?\]|\s)*'
            r'[\w<>\[\],\s\?]+\s+(\w+)\s*\(.*?\)',
            line
        )

        if method_match and '{' in ''.join(lines[i:i+3]):
            method_name = method_match.group(1)
            start_line = i + 1
            brace_count = 0
            seen_open = False   # becomes True once the opening brace appears
            j = i

            # Count braces to find the method end (without re-joining lines each pass)
            while j < len(lines):
                brace_count += lines[j].count('{') - lines[j].count('}')
                seen_open = seen_open or '{' in lines[j]
                if seen_open and brace_count <= 0:
                    break
                j += 1

            chunk_text = ''.join(lines[i:j+1])
            end_line = j + 1

            rel_path = os.path.relpath(filepath, SOURCE_DIR)
            chunk_id = f"{rel_path}:{start_line}-{end_line}:{current_class}.{method_name}"

            chunks.append({
                "id": chunk_id,
                "text": chunk_text.strip(),
                "metadata": {
                    "file": rel_path,
                    "class": current_class,
                    "method": method_name,
                    "start_line": start_line,
                    "end_line": end_line
                }
            })
            i = j + 1
        else:
            i += 1

    # If no methods found, index the whole file as one chunk
    if not chunks and lines:
        rel_path = os.path.relpath(filepath, SOURCE_DIR)
        chunks.append({
            "id": f"{rel_path}:1-{len(lines)}:{current_class}.file",
            "text": ''.join(lines).strip(),
            "metadata": {
                "file": rel_path,
                "class": current_class,
                "method": "file",
                "start_line": 1,
                "end_line": len(lines)
            }
        })

    return chunks


def index_file(filepath):
    """Index a single file (used for incremental updates)."""
    rel_path = os.path.relpath(filepath, SOURCE_DIR)
    try:
        existing = collection.get(where={"file": rel_path})
        if existing["ids"]:
            collection.delete(ids=existing["ids"])
    except Exception:
        pass

    chunks = extract_chunks(filepath)
    if not chunks:
        return 0

    collection.add(
        ids=[c["id"] for c in chunks],
        documents=[c["text"] for c in chunks],
        metadatas=[c["metadata"] for c in chunks],
        # Embed explicitly with the loaded model so indexing and querying
        # are guaranteed to use the same encoder
        embeddings=model.encode([c["text"] for c in chunks]).tolist()
    )
    return len(chunks)


def index_all():
    """Full re-index of the entire source directory."""
    all_chunks = []
    for root, _, files in os.walk(SOURCE_DIR):
        for fname in files:
            if fname.endswith(FILE_EXT):
                filepath = os.path.join(root, fname)
                all_chunks.extend(extract_chunks(filepath))

    BATCH = 500
    for i in range(0, len(all_chunks), BATCH):
        batch = all_chunks[i:i+BATCH]
        collection.upsert(
            ids=[c["id"] for c in batch],
            documents=[c["text"] for c in batch],
            metadatas=[c["metadata"] for c in batch],
            embeddings=model.encode([c["text"] for c in batch]).tolist()
        )

    print(f"Indexed {len(all_chunks)} chunks from {SOURCE_DIR}")


if __name__ == "__main__":
    if "--single" in sys.argv:
        filepath = sys.argv[sys.argv.index("--single") + 1]
        count = index_file(filepath)
        print(f"Re-indexed {filepath}: {count} chunks")
    else:
        index_all()

Customization points:

  • SOURCE_DIR — set this to the root of your source code (e.g., ~/my-project/src for TypeScript, ~/my-project/Assets for Unity)
  • FILE_EXT — change to .ts, .py, .go, etc. for non-C# projects
  • The method detection regex is C#/Java-style. For Python or Go, you'd swap the regex for def or func patterns.
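As a taste of that swap, a Python-codebase variant of the chunker can split on top-level `def`/`class` lines instead of counting braces. This is a sketch, not part of the scripts above, and `extract_py_chunks` is a hypothetical name:

```python
import re

def extract_py_chunks(source: str):
    """Split Python source into def/class-level chunks by indentation.

    A rough equivalent of the brace-counting chunker: a chunk starts at a
    top-level `def`, `async def`, or `class` line and runs until the next one.
    """
    lines = source.splitlines(keepends=True)
    # Indices of lines that open a top-level def or class (no leading indent)
    starts = [i for i, ln in enumerate(lines)
              if re.match(r'(?:async\s+)?(?:def|class)\s+\w+', ln)]
    chunks = []
    for idx, start in enumerate(starts):
        end = starts[idx + 1] if idx + 1 < len(starts) else len(lines)
        name = re.match(r'(?:async\s+)?(?:def|class)\s+(\w+)', lines[start]).group(1)
        chunks.append({
            "text": ''.join(lines[start:end]).strip(),
            "metadata": {"name": name,
                         "start_line": start + 1,
                         "end_line": end}
        })
    return chunks
```

Nested methods stay inside their class chunk, which is usually what you want for retrieval; a tree-sitter parser (mentioned at the end of this post) would handle decorators and odd formatting more robustly.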

Step 2: query.py — Search by Meaning

import sys
import chromadb
from sentence_transformers import SentenceTransformer

# Use the same embedding model as index.py so query and index vectors match
model = SentenceTransformer("all-MiniLM-L6-v2")
client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_collection("codebase")

# An optional trailing integer sets the result count; strip it from the query text
args = sys.argv[1:]
n_results = int(args.pop()) if args and args[-1].isdigit() else 5
query = " ".join(args) if args else "how does gameplay work"

results = collection.query(query_embeddings=model.encode([query]).tolist(), n_results=n_results)

for i, (doc, meta, dist) in enumerate(zip(
    results["documents"][0],
    results["metadatas"][0],
    results["distances"][0]
)):
    print("=" * 60)
    print(f"#{i+1} | {meta['file']}:{meta['start_line']}-{meta['end_line']} | {meta['class']}.{meta['method']} | dist: {dist:.4f}")
    print("=" * 60)
    lines = doc.split('\n')
    print('\n'.join(lines[:20]))
    if len(lines) > 20:
        print(f"... ({len(lines) - 20} more lines)")
    print()

Step 3: Index Your Codebase

# Setup (one time)
mkdir codebase-rag && cd codebase-rag
python3 -m venv venv
source venv/bin/activate
pip install chromadb sentence-transformers

# Copy index.py and query.py into this directory
# Edit SOURCE_DIR in index.py to point to your codebase

# Full index (takes 2-3 minutes for ~20K files)
python3 index.py

# Single file re-index (< 1 second)
python3 index.py --single /path/to/YourScript.cs

My project produces 22,373 method-level chunks. The ChromaDB database is about 150MB on disk.
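A quick back-of-the-envelope check shows where that disk space goes (all-MiniLM-L6-v2 emits 384-dimensional float32 vectors; the 150MB figure is my measurement, the breakdown below is an estimate):

```python
chunks = 22_373
dims = 384               # all-MiniLM-L6-v2 embedding width
bytes_per_float = 4      # float32
embedding_mb = chunks * dims * bytes_per_float / 1e6
print(f"raw vectors: ~{embedding_mb:.0f} MB")  # prints: raw vectors: ~34 MB
# The HNSW index graph, the chunk text itself, and the metadata
# account for the rest of the ~150MB on disk.
```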


Real Numbers: RAG vs. Grep+Read

I ran three real queries against my production codebase and measured both approaches. These aren't cherry-picked — they're the kind of questions I ask Claude Code daily.

Query 1: "How does the energy system work with timers and refills?"

| Approach | What Claude reads | Tokens consumed |
| --- | --- | --- |
| Grep+Read | EnergyManager.cs (187 ln) + EnergyTimer.cs (32 ln) + NotificationManager.cs (1,278 ln) | ~6,000 |
| RAG | 3 directly relevant method chunks (SetRemainingTimeOnLoad, ResetRemainingTime, CalculateRemainingTime) | ~800 |
| Savings | | 7.5x |

RAG returned the exact 3 methods that answer the question. Grep+Read had to load the entire 1,278-line NotificationManager just because it mentions "energy" in 12 lines.

Query 2: "How does remote config apply settings to scriptable objects?"

| Approach | What Claude reads | Tokens consumed |
| --- | --- | --- |
| Grep+Read | ConfigController.cs (192 ln) + RemoteSettings.cs (115 ln) + grep results | ~3,500 |
| RAG | Top result: the ConfigController.ApplyRemoteValues method (104 lines, the exact answer) | ~1,200 |
| Savings | | 3x |

Query 3: "How does the purchase flow handle rewards after buying a product?"

| Approach | What Claude reads | Tokens consumed |
| --- | --- | --- |
| Grep+Read | IAPManager.cs (395 ln) + RewardController.cs (381 ln) + StoreItemView + DailyRewards | ~8,000 |
| RAG | 3 relevant chunks: RewardManager.GiveRewards, DailyRewardController.ClaimReward, StoreItemView.OnPurchase | ~860 |
| Savings | | 9x |

Average savings across all queries: 6.5x

The savings are highest when the answer lives in a small method inside a large file. RAG pulls out the needle; Grep+Read gives you the whole haystack.
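If you want to reproduce these measurements on your own codebase, a rough characters-to-tokens heuristic (~4 characters per token; an approximation, not a real tokenizer) is enough for before/after comparisons:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate via the ~4 characters/token rule of thumb."""
    return max(1, len(text) // 4)

# Whole-file read vs. three method-level chunks, using sizes in the same
# ballpark as Query 1 above (~24K chars of files vs. ~3.2K chars of methods):
whole_file_tokens = estimate_tokens("x" * 24_000)   # ~6,000
chunk_tokens = estimate_tokens("y" * 3_200)         # ~800
print(whole_file_tokens / chunk_tokens)             # 7.5
```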


Integrating with Claude Code

CLAUDE.md Rule

Add this to your project's CLAUDE.md so Claude knows to use RAG first:

### RAG-First Codebase Search

For semantic questions about the codebase ("how does X work", "where is Y implemented"):

1. **Try RAG first**: `source /path/to/codebase-rag/venv/bin/activate && cd /path/to/codebase-rag && python3 query.py "your question"`
2. **If RAG returns good results** (distance < 1.0): use those file paths and line ranges
3. **If RAG misses** (distance > 1.2): fall back to Grep/Glob

RAG typically saves 3-9x tokens vs reading entire files. Use Grep for exact symbol searches.

Replace /path/to/codebase-rag with the absolute path where you created the RAG project in Step 3.

Auto-Reindex Hook

Claude Code hooks let you automatically re-index files as they get edited. Add this to your project settings at ~/.claude/projects/<your-project-hash>/settings.json:

{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          {
            "type": "command",
            "command": "jq -r '.tool_input.file_path // .tool_response.filePath' | { read -r f; if [[ \"$f\" == *.cs ]]; then source /path/to/codebase-rag/venv/bin/activate && cd /path/to/codebase-rag && python3 index.py --single \"$f\" 2>/dev/null || true; fi; }",
            "timeout": 30
          }
        ]
      }
    ]
  }
}

Two things to customize:

  • Replace /path/to/codebase-rag (appears twice) with your RAG project path
  • Change *.cs to match your file extension (*.ts, *.py, etc.)

Finding your project settings path: Run claude in your project directory, then use /hooks to see where settings are loaded from. Or create the file at ~/.claude/projects/-<sanitized-cwd>/settings.json where <sanitized-cwd> is your project path with / replaced by -.

Now every time Claude edits a source file, the RAG index updates in under a second. Your search results are always fresh.
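If `jq` isn't available on your machine, the same filter can be written in Python. This assumes the payload shape the jq expression above targets (`.tool_input.file_path // .tool_response.filePath`); `extract_edited_path` is an illustrative helper, not part of the two scripts:

```python
import json

def extract_edited_path(payload_json: str, file_ext: str = ".cs"):
    """Return the edited file's path if it matches the indexed extension, else None.

    Mirrors the jq filter: .tool_input.file_path // .tool_response.filePath
    """
    payload = json.loads(payload_json)
    path = (payload.get("tool_input", {}).get("file_path")
            or payload.get("tool_response", {}).get("filePath"))
    return path if path and path.endswith(file_ext) else None
```

A hook script would read the payload from stdin, call this, and invoke `python3 index.py --single <path>` when it returns a path.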


When RAG Doesn't Help

RAG isn't a silver bullet. Here's when to skip it:

| Use case | Best tool |
| --- | --- |
| "How does the energy system work?" | RAG (semantic understanding) |
| "Find all files that import EnergyManager" | Grep (exact string match) |
| "What's on line 142 of IAPManager.cs?" | Read (direct file access) |
| "Trace the full SDK init chain across 15 files" | Agent subagent (deep cross-file analysis) |

The sweet spot is semantic questions about behavior, where the answer is a specific method buried in a large file.

Distance Score Guide

The distance score tells you how relevant each result is:

  • < 0.8 — Excellent match, almost certainly the right code
  • 0.8 - 1.0 — Good match, likely relevant
  • 1.0 - 1.2 — Moderate match, worth checking
  • > 1.2 — Probably noise, fall back to Grep
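Those cut-offs are easy to encode if you script around query.py. A small stdlib-only sketch (`choose_tool` is a hypothetical helper using the thresholds from the guide):

```python
def choose_tool(distances, good=1.0, noise=1.2):
    """Map the best (lowest) distance to a search strategy.

    Under `good`, trust the RAG hits; over `noise`, fall back to grep;
    in between, use the RAG results as hints and verify them.
    """
    if not distances:
        return "grep"
    best = min(distances)
    if best < good:
        return "rag"
    if best > noise:
        return "grep"
    return "rag-then-verify"

print(choose_tool([0.74, 0.91, 1.30]))  # rag
print(choose_tool([1.25, 1.40]))        # grep
```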

Why This Works So Well

The key insight is method-level chunking. Most RAG tutorials chunk by fixed character count (500 chars, 1000 chars). That breaks code in the middle of functions and loses context.

By chunking at method boundaries with brace counting, every chunk is a complete, self-contained unit of logic. The metadata (class name, method name, line numbers) lets Claude jump straight to the right location without reading the whole file.

The embedding model (all-MiniLM-L6-v2) is small (80MB) and fast — it runs locally on CPU in under 2 seconds for a query. No API calls, no costs, no latency.
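For intuition about what the distance number means: index.py configures ChromaDB with cosine space, where distance is 1 minus cosine similarity, so 0 means identical direction and the score grows as vectors diverge. In pure Python:

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity, i.e. the hnsw:space="cosine" metric."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

print(cosine_distance([1.0, 0.0], [1.0, 0.0]))  # 0.0  (same direction)
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # 1.0  (orthogonal)
```

That's why the noise threshold sits above 1.0: anything past orthogonal is pointing away from your query.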


Quick Start Checklist

# 1. Create project
mkdir codebase-rag && cd codebase-rag
python3 -m venv venv && source venv/bin/activate
pip install chromadb sentence-transformers

# 2. Copy index.py and query.py from this post

# 3. ⚠️ Edit SOURCE_DIR in index.py → your source root
# 4. ⚠️ Edit FILE_EXT in index.py → your file extension

# 5. Index everything
python3 index.py

# 6. Test a query
python3 query.py "how does authentication work"

# 7. ⚠️ Add RAG-first rule to your CLAUDE.md (update the path)
# 8. ⚠️ Add auto-reindex hook to project settings (update path + extension)

Total setup time: about 10 minutes. After that, every semantic search saves you thousands of tokens.


Bonus: Let Claude Code Set It Up For You

If you'd rather not do the manual setup, just paste this prompt into Claude Code and let it build the whole system for you:

Set up a local RAG system for this codebase so you can search code by meaning instead of keywords. Here's what I need:

1. Create a directory at ~/codebase-rag with a Python venv
2. Install chromadb and sentence-transformers
3. Create index.py that:
   - Walks my source directory and finds all [.cs/.ts/.py] files (pick the right extension for this project)
   - Splits each file into method-level chunks using brace counting (or def/func detection for Python/Go)
   - Embeds chunks with all-MiniLM-L6-v2 and stores them in a local ChromaDB at ./chroma_db
   - Supports --single <filepath> for incremental re-indexing of a single file
   - Stores metadata: file path, class name, method name, start/end line numbers
4. Create query.py that:
   - Takes a natural language query as CLI args
   - Returns top 5 matching code chunks with file:line, class.method, and distance score
5. Run the full index on this project's source directory
6. Add a RAG-first search rule to my CLAUDE.md:
   - For semantic questions, try RAG first via query.py
   - If distance < 1.0, use those results; if > 1.2, fall back to Grep
7. Add a PostToolUse hook to my project settings that auto re-indexes any source file after Edit/Write
8. Test it with a sample query about this codebase

Use the absolute path of this project for SOURCE_DIR. The hook should filter by the correct file extension.

Claude Code will create both scripts, index your codebase, wire up the CLAUDE.md rule and the auto-reindex hook — all in one shot.


What's Next

I'm exploring a few improvements:

  • Hybrid search: combine vector similarity with BM25 keyword matching for better precision
  • Multi-language support: extending the chunker for TypeScript (function/arrow), Python (def), Go (func)
  • Smarter chunking: using tree-sitter for AST-based parsing instead of regex
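The hybrid idea can be prototyped in a few lines before reaching for a real BM25 library: blend the vector similarity with a keyword-overlap score. A stdlib-only sketch, where `hybrid_score` and the 0.7 weight are illustrative choices and a production version would use something like rank_bm25:

```python
def hybrid_score(distance: float, query: str, chunk: str, alpha: float = 0.7) -> float:
    """Blend vector similarity with keyword overlap (higher is better).

    `alpha` weights the vector signal; the keyword-overlap term is a
    crude stand-in for a real BM25 score.
    """
    vec_sim = 1.0 - min(distance, 2.0) / 2.0   # cosine distance [0, 2] -> similarity [0, 1]
    q_tokens = set(query.lower().split())
    c_tokens = set(chunk.lower().split())
    overlap = len(q_tokens & c_tokens) / len(q_tokens) if q_tokens else 0.0
    return alpha * vec_sim + (1 - alpha) * overlap
```

Re-ranking the top 20 vector hits by this score and returning the top 5 is often enough to rescue queries where the embedding alone misses an exact identifier.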

But honestly, the simple regex + ChromaDB approach handles 90% of cases. Don't over-engineer it — the value is in the integration with your workflow, not the sophistication of the retrieval.


I write about AI-assisted development, multi-model orchestration, and developer productivity. If you found this useful, check out my other posts on local LLM setup and multi-model AI orchestration.
