Every Claude Code user hits the same wall: you ask a question about your codebase, Claude reads 5 files, burns 30K tokens, and your context window is half gone before you've written a single line of code.
I fixed this with a local RAG system. 50 lines of Python, zero API costs, 6-10x token savings on every semantic search. Here's exactly how I built it and the real numbers from a 22,000-file Unity project.
The Problem: Claude Code Eats Context for Breakfast
I work on a large Unity mobile game with 22,000+ C# files. When I ask Claude Code something like "how does the energy system handle timer refills?", here's what happens:
- Claude runs `grep` for "energy" and "timer" — finds 47 matches across 12 files
- Reads `EnergyManager.cs` (187 lines) — that's relevant
- Reads `EnergyCountDownTimer.cs` (32 lines) — also relevant
- Reads `NotificationManager.cs` (1,278 lines) — only 12 lines are about energy
- Maybe reads another file or two just to be sure
Total: ~6,000 tokens consumed. And Claude only needed about 30 lines of code to answer the question.
Now multiply this by every question in a session. By the time you're actually implementing something, you've burned half your context on research.
The Solution: Method-Level RAG in 50 Lines
RAG (Retrieval-Augmented Generation) lets you search code by meaning, not keywords. Instead of reading entire files, you get back just the specific methods that answer your question.
```
Your source files → chunk by method → embed with all-MiniLM-L6-v2 → store in ChromaDB
                                                                          ↓
Your question → embed → similarity search → top 5 methods (with file:line metadata)
```
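Under the hood, the "similarity search" step is just cosine distance between embedding vectors. A minimal illustration of the score ChromaDB reports (this helper is for intuition only, not part of the scripts below):

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity: 0.0 for identical directions, larger = less similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

print(cosine_distance([1.0, 0.0], [1.0, 0.0]))  # 0.0 — identical direction
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # 1.0 — orthogonal
```

The distance thresholds later in this post (the < 0.8 / > 1.2 guide) are thresholds on exactly this value.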
The entire system is two Python scripts, no server needed, runs 100% locally.
Step 1: index.py — Chunk and Embed Your Code
```python
import os, re, sys
import chromadb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection("codebase", metadata={"hnsw:space": "cosine"})

# ⚠️ Change this to your project's source directory
SOURCE_DIR = os.path.expanduser("~/your-project/Assets")
# ⚠️ Change this to match your file extension (.cs, .ts, .py, etc.)
FILE_EXT = ".cs"


def extract_chunks(filepath):
    """Split a C# file into method-level chunks using brace counting."""
    with open(filepath, "r", errors="ignore") as f:
        lines = f.readlines()
    chunks = []
    current_class = ""
    i = 0
    while i < len(lines):
        line = lines[i]
        # Track the enclosing class name
        class_match = re.match(
            r'\s*(?:public|private|internal|protected)?\s*'
            r'(?:abstract|static|sealed|partial)?\s*class\s+(\w+)',
            line
        )
        if class_match:
            current_class = class_match.group(1)
        # Detect method signatures
        method_match = re.match(
            r'\s*(?:public|private|protected|internal|static|virtual|override|abstract|async|sealed|\[.*?\]|\s)*'
            r'[\w<>\[\],\s\?]+\s+(\w+)\s*\(.*?\)',
            line
        )
        if method_match and '{' in ''.join(lines[i:i + 3]):
            method_name = method_match.group(1)
            start_line = i + 1
            brace_count = 0
            j = i
            # Count braces to find the end of the method body
            while j < len(lines):
                brace_count += lines[j].count('{') - lines[j].count('}')
                if brace_count <= 0 and '{' in ''.join(lines[i:j + 1]):
                    break
                j += 1
            chunk_text = ''.join(lines[i:j + 1])
            end_line = j + 1
            rel_path = os.path.relpath(filepath, SOURCE_DIR)
            chunk_id = f"{rel_path}:{start_line}-{end_line}:{current_class}.{method_name}"
            chunks.append({
                "id": chunk_id,
                "text": chunk_text.strip(),
                "metadata": {
                    "file": rel_path,
                    "class": current_class,
                    "method": method_name,
                    "start_line": start_line,
                    "end_line": end_line,
                },
            })
            i = j + 1
        else:
            i += 1
    # If no methods were found, index the whole file as one chunk
    if not chunks and lines:
        rel_path = os.path.relpath(filepath, SOURCE_DIR)
        chunks.append({
            "id": f"{rel_path}:1-{len(lines)}:{current_class}.file",
            "text": ''.join(lines).strip(),
            "metadata": {
                "file": rel_path,
                "class": current_class,
                "method": "file",
                "start_line": 1,
                "end_line": len(lines),
            },
        })
    return chunks


def add_chunks(chunks):
    """Embed a batch of chunks with all-MiniLM-L6-v2 and upsert into ChromaDB."""
    collection.upsert(
        ids=[c["id"] for c in chunks],
        documents=[c["text"] for c in chunks],
        metadatas=[c["metadata"] for c in chunks],
        embeddings=model.encode([c["text"] for c in chunks]).tolist(),
    )


def index_file(filepath):
    """Index a single file (used for incremental updates)."""
    rel_path = os.path.relpath(filepath, SOURCE_DIR)
    try:
        # Drop any stale chunks for this file before re-adding
        existing = collection.get(where={"file": rel_path})
        if existing["ids"]:
            collection.delete(ids=existing["ids"])
    except Exception:
        pass
    chunks = extract_chunks(filepath)
    if not chunks:
        return 0
    add_chunks(chunks)
    return len(chunks)


def index_all():
    """Full re-index of the entire source directory."""
    all_chunks = []
    for root, _, files in os.walk(SOURCE_DIR):
        for fname in files:
            if fname.endswith(FILE_EXT):
                all_chunks.extend(extract_chunks(os.path.join(root, fname)))
    BATCH = 500
    for i in range(0, len(all_chunks), BATCH):
        add_chunks(all_chunks[i:i + BATCH])
    print(f"Indexed {len(all_chunks)} chunks from {SOURCE_DIR}")


if __name__ == "__main__":
    if "--single" in sys.argv:
        filepath = sys.argv[sys.argv.index("--single") + 1]
        print(f"Re-indexed {filepath}: {index_file(filepath)} chunks")
    else:
        index_all()
```
Customization points:

- `SOURCE_DIR` — set this to the root of your source code (e.g., `~/my-project/src` for TypeScript, `~/my-project/Assets` for Unity)
- `FILE_EXT` — change to `.ts`, `.py`, `.go`, etc. for non-C# projects
- The method-detection regex is C#/Java-style. For Python or Go, you'd swap it for `def` or `func` patterns.
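For Python, that swap also means trading brace counting for indentation tracking: a `def` runs until the next non-blank line at the same or lower indent. A minimal sketch of what that could look like (this `extract_python_chunks` helper is illustrative, not part of `index.py`):

```python
import re

# Matches a def at any indentation; group 1 = leading whitespace, group 2 = name
DEF_RE = re.compile(r'^(\s*)def\s+(\w+)\s*\(')

def extract_python_chunks(lines):
    """Chunk a Python file by function: each def runs until the next
    non-blank line at the same or lower indentation level."""
    chunks = []
    i = 0
    while i < len(lines):
        m = DEF_RE.match(lines[i])
        if m:
            indent = len(m.group(1))
            j = i + 1
            while j < len(lines):
                stripped = lines[j].strip()
                if stripped and (len(lines[j]) - len(lines[j].lstrip())) <= indent:
                    break  # dedented back out of the function body
                j += 1
            chunks.append((m.group(2), i + 1, j))  # (name, start_line, end_line), 1-based
            i = j
        else:
            i += 1
    return chunks
```

The same idea with a `func\s+(\w+)` regex plus the existing brace counter would cover Go.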
Step 2: query.py — Search by Meaning
```python
import sys
import chromadb
from sentence_transformers import SentenceTransformer

# Use the same embedding model as index.py so distances are comparable
model = SentenceTransformer("all-MiniLM-L6-v2")
client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_collection("codebase")

# If the last argument is a number, treat it as the result count, not part of the query
args = sys.argv[1:]
n_results = int(args.pop()) if args and args[-1].isdigit() else 5
query = " ".join(args) or "how does gameplay work"

results = collection.query(
    query_embeddings=model.encode([query]).tolist(),
    n_results=n_results,
)

for i, (doc, meta, dist) in enumerate(zip(
    results["documents"][0],
    results["metadatas"][0],
    results["distances"][0],
)):
    print("=" * 60)
    print(f"#{i+1} | {meta['file']}:{meta['start_line']}-{meta['end_line']} "
          f"| {meta['class']}.{meta['method']} | dist: {dist:.4f}")
    print("=" * 60)
    lines = doc.split('\n')
    print('\n'.join(lines[:20]))  # preview the first 20 lines of the chunk
    if len(lines) > 20:
        print(f"... ({len(lines) - 20} more lines)")
    print()
```
Step 3: Index Your Codebase
```shell
# Setup (one time)
mkdir codebase-rag && cd codebase-rag
python3 -m venv venv
source venv/bin/activate
pip install chromadb sentence-transformers

# Copy index.py and query.py into this directory
# Edit SOURCE_DIR in index.py to point to your codebase

# Full index (takes 2-3 minutes for ~20K files)
python3 index.py

# Single file re-index (< 1 second)
python3 index.py --single /path/to/YourScript.cs
```
My project produces 22,373 method-level chunks. The ChromaDB database is about 150MB on disk.
Real Numbers: RAG vs. Grep+Read
I ran three real queries against my production codebase and measured both approaches. These aren't cherry-picked — they're the kind of questions I ask Claude Code daily.
Query 1: "How does the energy system work with timers and refills?"
| Approach | What Claude reads | Tokens consumed |
|---|---|---|
| Grep+Read | EnergyManager.cs (187 ln) + EnergyTimer.cs (32 ln) + NotificationManager.cs (1,278 ln) | ~6,000 |
| RAG | 3 method chunks directly relevant (SetRemainingTimeOnLoad, ResetRemainingTime, CalculateRemainingTime) | ~800 |
| **Savings** | | **7.5x** |
RAG returned the exact 3 methods that answer the question. Grep+Read had to load the entire 1,278-line NotificationManager just because it mentions "energy" in 12 lines.
Query 2: "How does remote config apply settings to scriptable objects?"
| Approach | What Claude reads | Tokens consumed |
|---|---|---|
| Grep+Read | ConfigController.cs (192 ln) + RemoteSettings.cs (115 ln) + grep results | ~3,500 |
| RAG | Top result: ConfigController.ApplyRemoteValues method (104 lines — the exact answer) | ~1,200 |
| **Savings** | | **3x** |
Query 3: "How does the purchase flow handle rewards after buying a product?"
| Approach | What Claude reads | Tokens consumed |
|---|---|---|
| Grep+Read | IAPManager.cs (395 ln) + RewardController.cs (381 ln) + StoreItemView + DailyRewards | ~8,000 |
| RAG | 3 relevant chunks: RewardManager.GiveRewards, DailyRewardController.ClaimReward, StoreItemView.OnPurchase | ~860 |
| **Savings** | | **9x** |
Average savings across all queries: 6.5x
The savings are highest when the answer lives in a small method inside a large file. RAG pulls out the needle; Grep+Read gives you the whole haystack.
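A quick back-of-envelope check on these numbers: in Query 1, ~1,500 lines of C# cost ~6,000 tokens, which works out to roughly 4 tokens per line. That 4-tokens-per-line figure is an approximation from my measurements, not an exact tokenizer count, but it makes the savings easy to estimate:

```python
def estimate_tokens(line_count, tokens_per_line=4):
    """Rough token cost of reading source code into context (~4 tokens/line)."""
    return line_count * tokens_per_line

# Query 1: Grep+Read loads three whole files; RAG returns ~3 method-sized chunks
grep_read = estimate_tokens(187 + 32 + 1278)  # the three files from the table
rag = estimate_tokens(3 * 65)                 # assuming ~65 lines per chunk
print(f"~{grep_read} vs ~{rag} tokens — {grep_read / rag:.1f}x savings")
```

The 65-lines-per-chunk figure is an assumption for illustration; the actual chunks were smaller or larger depending on the method.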
Integrating with Claude Code
CLAUDE.md Rule
Add this to your project's CLAUDE.md so Claude knows to use RAG first:
```markdown
### RAG-First Codebase Search

For semantic questions about the codebase ("how does X work", "where is Y implemented"):

1. **Try RAG first**: `source /path/to/codebase-rag/venv/bin/activate && cd /path/to/codebase-rag && python3 query.py "your question"`
2. **If RAG returns good results** (distance < 1.0): use those file paths and line ranges
3. **If RAG misses** (distance > 1.2): fall back to Grep/Glob

RAG saves 7-10x tokens vs reading entire files. Use Grep for exact symbol searches.
```
Replace `/path/to/codebase-rag` with the absolute path where you created the RAG project in Step 3.
Auto-Reindex Hook
Claude Code hooks let you automatically re-index files as they get edited. Add this to your project settings at `~/.claude/projects/<your-project-hash>/settings.json`:
```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          {
            "type": "command",
            "command": "jq -r '.tool_input.file_path // .tool_response.filePath' | { read -r f; if [[ \"$f\" == *.cs ]]; then source /path/to/codebase-rag/venv/bin/activate && cd /path/to/codebase-rag && python3 index.py --single \"$f\" 2>/dev/null || true; fi; }",
            "timeout": 30
          }
        ]
      }
    ]
  }
}
```
Two things to customize:
- Replace `/path/to/codebase-rag` (appears twice) with your RAG project path
- Change `*.cs` to match your file extension (`*.ts`, `*.py`, etc.)

Finding your project settings path: run `claude` in your project directory, then use `/hooks` to see where settings are loaded from. Or create the file at `~/.claude/projects/-<sanitized-cwd>/settings.json`, where `<sanitized-cwd>` is your project path with `/` replaced by `-`.
Now every time Claude edits a source file, the RAG index updates in under a second. Your search results are always fresh.
When RAG Doesn't Help
RAG isn't a silver bullet. Here's when to skip it:
| Use Case | Best Tool |
|---|---|
| "How does the energy system work?" | RAG — semantic understanding |
| "Find all files that import EnergyManager" | Grep — exact string match |
| "What's on line 142 of IAPManager.cs?" | Read — direct file access |
| "Trace the full SDK init chain across 15 files" | Agent subagent — deep cross-file analysis |
The sweet spot is semantic questions about behavior, where the answer is a specific method buried in a large file.
Distance Score Guide
The distance score tells you how relevant each result is:
- < 0.8 — Excellent match, almost certainly the right code
- 0.8 - 1.0 — Good match, likely relevant
- 1.0 - 1.2 — Moderate match, worth checking
- > 1.2 — Probably noise, fall back to Grep
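If you script against `query.py` output, these thresholds are easy to encode. A small helper (hypothetical, not part of the scripts above) that maps a distance to the buckets in the guide:

```python
def triage(distance):
    """Map a ChromaDB cosine-distance score to a relevance bucket."""
    if distance < 0.8:
        return "excellent"  # almost certainly the right code
    if distance < 1.0:
        return "good"       # likely relevant
    if distance < 1.2:
        return "moderate"   # worth checking
    return "noise"          # fall back to Grep

print(triage(0.62))  # excellent
print(triage(1.35))  # noise
```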
Why This Works So Well
The key insight is method-level chunking. Most RAG tutorials chunk by fixed character count (500 chars, 1000 chars). That breaks code in the middle of functions and loses context.
By chunking at method boundaries with brace counting, every chunk is a complete, self-contained unit of logic. The metadata (class name, method name, line numbers) lets Claude jump straight to the right location without reading the whole file.
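To see why brace counting yields complete units, here's the core of the technique in isolation — a simplified standalone demo, not the exact code from `index.py`:

```python
def method_end(lines, start):
    """Given the index of a method signature line, return the index of the
    line holding its closing brace, by tracking brace depth."""
    depth = 0
    seen_open = False
    for j in range(start, len(lines)):
        depth += lines[j].count('{') - lines[j].count('}')
        seen_open = seen_open or '{' in lines[j]
        if seen_open and depth <= 0:
            return j  # depth back to zero: the method body is closed
    return len(lines) - 1  # unbalanced braces: take the rest of the file

csharp = [
    "public int Refill(int amount)",
    "{",
    "    if (amount > 0) { energy += amount; }",  # inline braces cancel out
    "    return energy;",
    "}",
]
print(method_end(csharp, 0))  # 4 — the line with the closing brace
```

Because depth only returns to zero at the method's own closing brace, nested blocks and inline `{ ... }` pairs never cut the chunk short.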
The embedding model (all-MiniLM-L6-v2) is small (80MB) and fast — it runs locally on CPU in under 2 seconds for a query. No API calls, no costs, no latency.
Quick Start Checklist
```shell
# 1. Create project
mkdir codebase-rag && cd codebase-rag
python3 -m venv venv && source venv/bin/activate
pip install chromadb sentence-transformers

# 2. Copy index.py and query.py from this post
# 3. ⚠️ Edit SOURCE_DIR in index.py → your source root
# 4. ⚠️ Edit FILE_EXT in index.py → your file extension

# 5. Index everything
python3 index.py

# 6. Test a query
python3 query.py "how does authentication work"

# 7. ⚠️ Add RAG-first rule to your CLAUDE.md (update the path)
# 8. ⚠️ Add auto-reindex hook to project settings (update path + extension)
```
Total setup time: about 10 minutes. After that, every semantic search saves you thousands of tokens.
Bonus: Let Claude Code Set It Up For You
If you'd rather not do the manual setup, just paste this prompt into Claude Code and let it build the whole system for you:
```
Set up a local RAG system for this codebase so you can search code by meaning instead of keywords. Here's what I need:

1. Create a directory at ~/codebase-rag with a Python venv
2. Install chromadb and sentence-transformers
3. Create index.py that:
   - Walks my source directory and finds all [.cs/.ts/.py] files (pick the right extension for this project)
   - Splits each file into method-level chunks using brace counting (or def/func detection for Python/Go)
   - Embeds chunks with all-MiniLM-L6-v2 and stores them in a local ChromaDB at ./chroma_db
   - Supports --single <filepath> for incremental re-indexing of a single file
   - Stores metadata: file path, class name, method name, start/end line numbers
4. Create query.py that:
   - Takes a natural language query as CLI args
   - Returns top 5 matching code chunks with file:line, class.method, and distance score
5. Run the full index on this project's source directory
6. Add a RAG-first search rule to my CLAUDE.md:
   - For semantic questions, try RAG first via query.py
   - If distance < 1.0, use those results; if > 1.2, fall back to Grep
7. Add a PostToolUse hook to my project settings that auto re-indexes any source file after Edit/Write
8. Test it with a sample query about this codebase

Use the absolute path of this project for SOURCE_DIR. The hook should filter by the correct file extension.
```
Claude Code will create both scripts, index your codebase, wire up the CLAUDE.md rule and the auto-reindex hook — all in one shot.
What's Next
I'm exploring a few improvements:
- Hybrid search: combine vector similarity with BM25 keyword matching for better precision
- Multi-language support: extending the chunker for TypeScript (`function`/arrow), Python (`def`), Go (`func`)
- Smarter chunking: using tree-sitter for AST-based parsing instead of regex
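For the hybrid-search idea, one common way to merge the two ranked lists is reciprocal rank fusion. A sketch, assuming you already have chunk IDs ranked by ChromaDB and by a BM25 implementation (e.g. the rank-bm25 package) — the function below is illustrative, not something I've shipped:

```python
def rrf_fuse(vector_ranked, keyword_ranked, k=60):
    """Reciprocal rank fusion: each ID scores 1/(k + rank) in every list
    it appears in; IDs ranked well by both signals rise to the top."""
    scores = {}
    for ranking in (vector_ranked, keyword_ranked):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# A chunk that both the embedding and BM25 like wins overall
print(rrf_fuse(["A", "B", "C"], ["B", "D"]))  # B first
```

The constant `k=60` is the value commonly used in the RRF literature; it damps the influence of any single list's top ranks.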
But honestly, the simple regex + ChromaDB approach handles 90% of cases. Don't over-engineer it — the value is in the integration with your workflow, not the sophistication of the retrieval.
I write about AI-assisted development, multi-model orchestration, and developer productivity. If you found this useful, check out my other posts on local LLM setup and multi-model AI orchestration.


