Originally published at chudi.dev
I was debugging the same authentication error for the third time this month.
Same error. Same root cause. Same fix.
Claude Code had solved this exact problem two weeks ago—but it didn't remember. Each session starts fresh. No memory of what worked, what failed, or what patterns emerged.
That's a massive waste of debugging time.
So I built a system to fix it.
The Problem With Stateless AI
Claude Code is powerful, but it has a fundamental limitation: every session starts from zero.
This means:
- Same mistakes repeated across sessions
- No accumulation of project-specific knowledge
- Debugging loops that should take minutes take hours
- Learnings trapped in conversation history, never extracted
The irony? Claude Code can solve complex problems. It just can't remember that it already solved them. This is where building a self-improving RAG system becomes transformative.
Introducing the Self-Improving RAG System
I built a configuration that makes Claude Code learn from every debugging session.
The core idea: automatic capture, structured storage, intelligent retrieval.
When something breaks, the system captures it. When something works, the system remembers it. When a session ends, the system reflects on what happened.
No manual logging. No /learn commands for every insight. The system watches, learns, and improves. This is built on the principles of Retrieval-Augmented Generation (RAG)—using external knowledge to enhance AI responses—combined with Claude Code's development capabilities.
Architecture: Three Memory Layers
The system uses three complementary memory approaches:
┌─────────────────────────────────────────────────────────────────┐
│                         Knowledge Layer                         │
│                                                                 │
│  ┌──────────────┐   ┌──────────────┐   ┌──────────────┐         │
│  │   ChromaDB   │   │ Graph Memory │   │  CLAUDE.md   │         │
│  │  (Vectors)   │   │ (Relations)  │   │    (File)    │         │
│  └──────────────┘   └──────────────┘   └──────────────┘         │
│                                                                 │
│  Collections:            Relationships:    Sections:            │
│  • error_patterns        • error→file      • Known Pitfalls     │
│  • successful_patterns   • error→fix       • Successful Patterns│
│  • project_learnings     • fix→file        • Error History      │
│  • meta_learnings        • decision→outcome                     │
└─────────────────────────────────────────────────────────────────┘
Layer 1: ChromaDB (Semantic Search)
Vector embeddings enable semantic search across all captured knowledge.
Collections:
- error_patterns: Build failures, type errors, runtime exceptions
- successful_patterns: What worked—deployment patterns, architecture decisions
- project_learnings: Project-specific insights
- meta_learnings: Process improvements from self-reflection
When I search "authentication errors," I get relevant results even if the exact phrase wasn't used.
Layer 2: Graph Memory (Relationships)
ChromaDB stores flat documents. But knowledge has structure.
Graph memory tracks relationships:
Error ──occurred_in──→ File
  │
  └──fixed_by──→ Fix ──applied_to──→ File

Decision ──led_to──→ Outcome
                         │
Learning ←──learned_from─┘
This enables queries like:
- "What errors have occurred in auth.ts?"
- "What fixes have been applied to the API module?"
- "What decisions led to successful deployments?"
Relationships reveal patterns that flat search misses.
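This kind of relationship store can be sketched in a few lines. The class below is a hypothetical stand-in for whichever graph backend you use, with made-up example data, but it shows how the queries above reduce to following typed edges:

```python
from collections import defaultdict

class GraphMemory:
    """Minimal relation store: (source, relation, target) triples."""

    def __init__(self):
        self.edges = defaultdict(list)    # source -> [(relation, target)]
        self.reverse = defaultdict(list)  # target -> [(relation, source)]

    def add(self, source, relation, target):
        self.edges[source].append((relation, target))
        self.reverse[target].append((relation, source))

    def query(self, target, relation):
        """Who points at `target` via `relation`?"""
        return [s for r, s in self.reverse[target] if r == relation]

# "What errors have occurred in auth.ts?"
graph = GraphMemory()
graph.add("TypeError: jwt undefined", "occurred_in", "auth.ts")
graph.add("401 on refresh", "occurred_in", "auth.ts")
graph.add("TypeError: jwt undefined", "fixed_by", "pin jsonwebtoken@9")
print(graph.query("auth.ts", "occurred_in"))
# → ['TypeError: jwt undefined', '401 on refresh']
```

A real backend adds persistence and multi-hop traversal, but the data model is the same: typed edges in both directions.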
Layer 3: CLAUDE.md (Project Memory)
Each project maintains a CLAUDE.md file—a living document that Claude reads at session start.
Sections:
- Known Pitfalls: Project-specific gotchas (auto-populated by hooks)
- Successful Patterns: Code patterns that have worked
- Error History: Recent errors with resolutions
This provides immediate context without database queries.
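To make the three sections concrete, here's a hypothetical CLAUDE.md skeleton (the bullet contents are invented examples, not output from the actual system):

```markdown
# Project Memory

## Known Pitfalls
<!-- auto-populated by hooks -->
- Prisma client must be regenerated after schema changes

## Successful Patterns
- Wrap route handlers in the shared error middleware

## Error History
- 2025-01-12: "TypeError: jwt undefined" in auth.ts (fixed by pinning jsonwebtoken@9)
```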
Automatic Learning Capture
The magic is in the hooks—scripts that intercept Claude Code events.
Hook: Capture Failures
When a build or test fails:
# capture_failure.py (PostToolUse hook)
def capture_failure(tool_result):
    if tool_result.exit_code != 0:
        error = extract_error(tool_result.output)
        store_to_chromadb({
            "type": "error_pattern",
            "error": error,
            "file": tool_result.file,
            "timestamp": now(),
        })
        update_graph("error", error, "occurred_in", tool_result.file)
No manual intervention. Failures get captured automatically.
Hook: Track File Edits
When files are modified:
# log_edit.py (PostToolUse hook)
def log_edit(tool_result):
    if tool_result.tool == "Edit":
        update_graph("fix", tool_result.diff, "applied_to", tool_result.file)
This builds the error→fix→file relationship over time.
Hook: Session Summary
When a session ends:
# session_summary.py (Stop hook)
def session_summary():
    learnings = extract_learnings(session_history)
    update_claude_md(learnings)
    store_to_chromadb(learnings)
The system extracts what was learned and persists it.
Self-Reflection: Meta-Learning
Beyond capturing individual learnings, the system reflects on patterns.
At session end, a self-reflection agent analyzes:
- What approaches were effective
- What inefficiencies occurred
- What patterns emerged
These meta-learnings go into a separate collection—insights about the development process itself, not just specific bugs.
Example meta-learning:
"When debugging TypeScript type errors, checking the imported types first resolves 70% of issues faster than tracing through the codebase."
This is knowledge about how to debug, not just what broke.
Memory Decay: Keeping Knowledge Fresh
Old knowledge becomes stale. A fix that worked six months ago might not apply to the current architecture.
Memory decay solves this:
- Half-life: 90 days (configurable)
- Minimum weight: 0.1 (never fully forgotten)
- Access boost: Recently used memories get +20% weight
The result: Claude prioritizes recent, actively-used knowledge while maintaining a long-term memory of rare but important patterns. This is similar to the token optimization strategies I've outlined for reducing AI token usage, where selective information display keeps context efficient.
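Those three parameters map onto a simple exponential half-life with a floor. Here's a sketch of the weighting function (the function name and exact curve are my own illustration, not the system's internals):

```python
from datetime import datetime, timedelta

HALF_LIFE_DAYS = 90   # weight halves every 90 days (configurable)
MIN_WEIGHT = 0.1      # floor: never fully forgotten
ACCESS_BOOST = 1.2    # recently used memories get +20%

def memory_weight(created, now, recently_accessed=False):
    """Exponential decay with a floor, plus a boost for recent access."""
    age_days = (now - created).total_seconds() / 86400
    weight = 0.5 ** (age_days / HALF_LIFE_DAYS)
    if recently_accessed:
        weight *= ACCESS_BOOST
    return max(weight, MIN_WEIGHT)

now = datetime(2025, 1, 1)
print(memory_weight(now - timedelta(days=90), now))   # one half-life → 0.5
print(memory_weight(now - timedelta(days=900), now))  # hits the floor → 0.1
```

At retrieval time, this weight would multiply the similarity score, so a slightly less similar but recent memory can outrank a stale exact match.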
Custom Commands
The system adds slash commands for manual interaction:
| Command | What It Does |
|---|---|
| /learn | Manually capture a learning from the current session |
| /search-knowledge "query" | Search all memory layers |
| /review-plan | Validate a plan against past learnings |
| /reflect | Trigger self-reflection analysis |
| /memory-stats | Show knowledge base statistics |
| /memory-maintain | Run decay, merge duplicates, archive old memories |
Most learning happens automatically. These commands are for when you want manual control.
Effort-Based Routing
Not every task needs maximum AI capability. The system classifies prompts:
| Level | Example | Token Impact |
|---|---|---|
| low | "What's the syntax for..." | Fastest, cheapest |
| medium | "Implement this feature" | Balanced |
| high | "Architect the auth system" | Maximum capability |
Simple questions get fast answers. Complex problems get deep thinking.
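A classifier like this can start as a keyword heuristic. The patterns below are hypothetical examples of my own, not the system's actual rules; a real router might use a small model instead:

```python
import re

# Hypothetical signal words; tune these for your own prompts.
HIGH_SIGNALS = re.compile(r"\b(architect|design|refactor|migrate|security)\b", re.I)
LOW_SIGNALS = re.compile(r"\b(syntax|what's|what is|how do i)\b", re.I)

def classify_effort(prompt: str) -> str:
    """Route a prompt to low/medium/high effort."""
    if HIGH_SIGNALS.search(prompt):
        return "high"
    if LOW_SIGNALS.search(prompt):
        return "low"
    return "medium"  # default: balanced

print(classify_effort("What's the syntax for a Python lambda?"))  # low
print(classify_effort("Implement this feature"))                  # medium
print(classify_effort("Architect the auth system"))               # high
```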
Real Results
After two months of use:
Before (vanilla Claude Code):
- Same auth bug: 45 minutes to debug (third time)
- Build failures: Manual pattern recognition
- Session knowledge: Lost after conversation ends
After (self-improving RAG):
- Same auth bug: /search-knowledge "auth middleware" → fix in 2 minutes
- Build failures: Automatic capture, searchable history
- Session knowledge: Persisted, searchable, improving
The system has captured 150+ error patterns, 45 successful patterns, and 80 meta-learnings across my projects. For more on Claude Code workflows and best practices, see my comprehensive Claude Code guide.
Getting Started
The system is available as a configuration you can install:
cd ~/Projects/active/claude-rag-config
./setup.sh
# Then in any project:
claude
Setup installs:
- Hooks for automatic capture
- Custom commands for manual control
- ChromaDB for vector storage
- Graph memory database
- CLAUDE.md template
Requirements: Python 3.9+, Node 18+, ChromaDB (pip install chromadb)
Getting Started: The Minimum Viable RAG Setup
You don't need the full system on day one. Here's the smallest version that actually makes a difference.
Step 1: Install ChromaDB
pip install chromadb
That's your vector store. One command.
Step 2: Create a capture hook
Create a file at ~/.claude/hooks/post_tool_use.py:
import hashlib
import json
import os
import sys
from datetime import datetime

import chromadb

data = json.loads(sys.stdin.read())

# Only capture failed Bash commands
if data.get("tool") == "Bash" and data.get("exit_code", 0) != 0:
    # ChromaDB treats the path literally, so "~" must be expanded
    client = chromadb.PersistentClient(path=os.path.expanduser("~/.claude/memory"))
    collection = client.get_or_create_collection("error_patterns")
    error_text = data.get("output", "")[:500]
    # Hashing the error text dedupes identical errors via upsert
    doc_id = hashlib.md5(error_text.encode()).hexdigest()
    collection.upsert(
        documents=[error_text],
        ids=[doc_id],
        metadatas=[{"timestamp": datetime.now().isoformat()}],
    )

# Tell Claude Code to continue normally
print(json.dumps({"continue": True}))
This captures every failed Bash command into ChromaDB. No manual intervention.
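For the script to fire, it also needs to be registered as a hook. Something like this in ~/.claude/settings.json should work, though you should check the hooks documentation for the exact schema your Claude Code version expects:

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          { "type": "command", "command": "python3 ~/.claude/hooks/post_tool_use.py" }
        ]
      }
    ]
  }
}
```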
Step 3: Add a search command
Add this to your CLAUDE.md:
## /search-errors command
When user types /search-errors "query":
1. Query ChromaDB error_patterns collection
2. Return top 3 similar past errors and their context
3. Suggest fixes based on patterns
Step 4: Add a project-specific CLAUDE.md section
## Known Pitfalls (auto-updated)
<!-- This section gets updated by the session summary hook -->
That's the minimum viable setup. Four steps, maybe 20 minutes. You won't have graph memory or self-reflection yet, but you'll have semantic search over your past errors, which is where most of the day-to-day value comes from.
The auth bug I mentioned at the top of this post? The minimum viable version would have caught it. The error was in the database. The fix was two queries away.
Build the full system when the minimum version proves itself. For me, that took about 3 weeks.
The minimum version will change how you think about debugging. Instead of starting from scratch each session, you'll start with a search. That shift alone is worth the 20-minute setup cost.
What's Next
The system keeps improving. Planned additions:
- Cross-project learning: Patterns that work in one project suggested in others
- Confidence scoring: How reliable is this learning based on how often it's worked?
- Team memory: Shared knowledge base across collaborators
The Bigger Picture
This isn't just about remembering bugs.
It's about accumulating developer judgment in a searchable, queryable format.
Every debugging session teaches something. Without capture, those lessons evaporate. With this system, they compound.
Claude Code doesn't just help you code. It becomes a repository of everything you've learned about your codebase—and it gets smarter every session.
Questions about implementing this in your workflow? Reach out on LinkedIn.
Sources
- Claude Code Best Practices (Anthropic)
- Claude Code Hooks Documentation (Anthropic)
- ChromaDB Documentation (Chroma)