Originally published at chudi.dev
I was debugging the same authentication error for the third time this month.
Same error. Same root cause. Same fix.
Claude Code had solved this exact problem two weeks ago—but it didn't remember. Each session starts fresh. No memory of what worked, what failed, or what patterns emerged.
That's a massive waste of debugging time.
So I built a system to fix it.
The Problem With Stateless AI
Claude Code is powerful, but it has a fundamental limitation: every session starts from zero.
This means:
- Same mistakes repeated across sessions
- No accumulation of project-specific knowledge
- Debugging loops that should take minutes take hours
- Learnings trapped in conversation history, never extracted
The irony? Claude Code can solve complex problems. It just can't remember that it already solved them. This is where building a self-improving RAG system becomes transformative.
Introducing the Self-Improving RAG System
I built a configuration that makes Claude Code learn from every debugging session.
The core idea: automatic capture, structured storage, intelligent retrieval.
When something breaks, the system captures it. When something works, the system remembers it. When a session ends, the system reflects on what happened.
No manual logging. No /learn commands for every insight. The system watches, learns, and improves. This is built on the principles of Retrieval-Augmented Generation (RAG)—using external knowledge to enhance AI responses—combined with Claude Code's development capabilities.
Architecture: Three Memory Layers
The system uses three complementary memory approaches:
┌─────────────────────────────────────────────────────────────────┐
│                         Knowledge Layer                         │
│                                                                 │
│  ┌──────────────┐   ┌──────────────┐   ┌──────────────┐         │
│  │   ChromaDB   │   │ Graph Memory │   │  CLAUDE.md   │         │
│  │  (Vectors)   │   │ (Relations)  │   │    (File)    │         │
│  └──────────────┘   └──────────────┘   └──────────────┘         │
│                                                                 │
│  Collections:            Relationships:    Sections:            │
│  • error_patterns        • error→file      • Known Pitfalls     │
│  • successful_patterns   • error→fix       • Successful Patterns│
│  • project_learnings     • fix→file        • Error History      │
│  • meta_learnings        • decision→outcome                     │
└─────────────────────────────────────────────────────────────────┘
Layer 1: ChromaDB (Semantic Search)
Vector embeddings enable semantic search across all captured knowledge.
Collections:
- error_patterns: Build failures, type errors, runtime exceptions
- successful_patterns: What worked—deployment patterns, architecture decisions
- project_learnings: Project-specific insights
- meta_learnings: Process improvements from self-reflection
When I search "authentication errors," I get relevant results even if the exact phrase wasn't used.
Layer 2: Graph Memory (Relationships)
ChromaDB stores flat documents. But knowledge has structure.
Graph memory tracks relationships:
Error ──occurred_in──→ File
  │
  └──fixed_by──→ Fix ──applied_to──→ File

Decision ──led_to──→ Outcome
                         │
Learning ←──learned_from─┘
This enables queries like:
- "What errors have occurred in auth.ts?"
- "What fixes have been applied to the API module?"
- "What decisions led to successful deployments?"
Relationships reveal patterns that flat search misses.
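This kind of relationship store can be sketched in a few lines. The class below is a hypothetical stand-in for whichever graph backend you use, with made-up example data, but it shows how the queries above reduce to following typed edges:

```python
from collections import defaultdict

class GraphMemory:
    """Minimal relation store: (source, relation, target) triples."""

    def __init__(self):
        self.edges = defaultdict(list)    # source -> [(relation, target)]
        self.reverse = defaultdict(list)  # target -> [(relation, source)]

    def add(self, source, relation, target):
        self.edges[source].append((relation, target))
        self.reverse[target].append((relation, source))

    def query(self, target, relation):
        """Who points at `target` via `relation`?"""
        return [s for r, s in self.reverse[target] if r == relation]

# "What errors have occurred in auth.ts?"
graph = GraphMemory()
graph.add("TypeError: jwt undefined", "occurred_in", "auth.ts")
graph.add("401 on refresh", "occurred_in", "auth.ts")
graph.add("TypeError: jwt undefined", "fixed_by", "pin jsonwebtoken@9")
print(graph.query("auth.ts", "occurred_in"))
# → ['TypeError: jwt undefined', '401 on refresh']
```

A real backend adds persistence and multi-hop traversal, but the data model is the same: typed edges in both directions.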
Layer 3: CLAUDE.md (Project Memory)
Each project maintains a CLAUDE.md file—a living document that Claude reads at session start.
Sections:
- Known Pitfalls: Project-specific gotchas (auto-populated by hooks)
- Successful Patterns: Code patterns that have worked
- Error History: Recent errors with resolutions
This provides immediate context without database queries.
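To make the three sections concrete, here's a hypothetical CLAUDE.md skeleton (the bullet contents are invented examples, not output from the actual system):

```markdown
# Project Memory

## Known Pitfalls
<!-- auto-populated by hooks -->
- Prisma client must be regenerated after schema changes

## Successful Patterns
- Wrap route handlers in the shared error middleware

## Error History
- 2025-01-12: "TypeError: jwt undefined" in auth.ts (fixed by pinning jsonwebtoken@9)
```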
Automatic Learning Capture
The magic is in the hooks—scripts that intercept Claude Code events.
Hook: Capture Failures
When a build or test fails:
# capture_failure.py (PostToolUse hook)
def capture_failure(tool_result):
    if tool_result.exit_code != 0:
        error = extract_error(tool_result.output)
        store_to_chromadb({
            "type": "error_pattern",
            "error": error,
            "file": tool_result.file,
            "timestamp": now(),
        })
        update_graph("error", error, "occurred_in", tool_result.file)
No manual intervention. Failures get captured automatically.
Hook: Track File Edits
When files are modified:
# log_edit.py (PostToolUse hook)
def log_edit(tool_result):
    if tool_result.tool == "Edit":
        update_graph("fix", tool_result.diff, "applied_to", tool_result.file)
This builds the error→fix→file relationship over time.
Hook: Session Summary
When a session ends:
# session_summary.py (Stop hook)
def session_summary():
    learnings = extract_learnings(session_history)
    update_claude_md(learnings)
    store_to_chromadb(learnings)
The system extracts what was learned and persists it.
Self-Reflection: Meta-Learning
Beyond capturing individual learnings, the system reflects on patterns.
At session end, a self-reflection agent analyzes:
- What approaches were effective
- What inefficiencies occurred
- What patterns emerged
These meta-learnings go into a separate collection—insights about the development process itself, not just specific bugs.
Example meta-learning:
"When debugging TypeScript type errors, checking the imported types first resolves 70% of issues faster than tracing through the codebase."
This is knowledge about how to debug, not just what broke.
Memory Decay: Keeping Knowledge Fresh
Old knowledge becomes stale. A fix that worked six months ago might not apply to the current architecture.
Memory decay solves this:
- Half-life: 90 days (configurable)
- Minimum weight: 0.1 (never fully forgotten)
- Access boost: Recently used memories get +20% weight
The result: Claude prioritizes recent, actively-used knowledge while maintaining a long-term memory of rare but important patterns. This is similar to the token optimization strategies I've outlined for reducing AI token usage, where selective information display keeps context efficient.
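Those three parameters map onto a simple exponential half-life with a floor. Here's a sketch of the weighting function (the function name and exact curve are my own illustration, not the system's internals):

```python
from datetime import datetime, timedelta

HALF_LIFE_DAYS = 90   # weight halves every 90 days (configurable)
MIN_WEIGHT = 0.1      # floor: never fully forgotten
ACCESS_BOOST = 1.2    # recently used memories get +20%

def memory_weight(created, now, recently_accessed=False):
    """Exponential decay with a floor, plus a boost for recent access."""
    age_days = (now - created).total_seconds() / 86400
    weight = 0.5 ** (age_days / HALF_LIFE_DAYS)
    if recently_accessed:
        weight *= ACCESS_BOOST
    return max(weight, MIN_WEIGHT)

now = datetime(2025, 1, 1)
print(memory_weight(now - timedelta(days=90), now))   # one half-life → 0.5
print(memory_weight(now - timedelta(days=900), now))  # hits the floor → 0.1
```

At retrieval time, this weight would multiply the similarity score, so a slightly less similar but recent memory can outrank a stale exact match.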
Custom Commands
The system adds slash commands for manual interaction:
| Command | What It Does |
|---|---|
| /learn | Manually capture a learning from the current session |
| /search-knowledge "query" | Search all memory layers |
| /review-plan | Validate a plan against past learnings |
| /reflect | Trigger self-reflection analysis |
| /memory-stats | Show knowledge base statistics |
| /memory-maintain | Run decay, merge duplicates, archive old memories |
Most learning happens automatically. These commands are for when you want manual control.
Effort-Based Routing
Not every task needs maximum AI capability. The system classifies prompts:
| Level | Example | Token Impact |
|---|---|---|
| low | "What's the syntax for..." | Fastest, cheapest |
| medium | "Implement this feature" | Balanced |
| high | "Architect the auth system" | Maximum capability |
Simple questions get fast answers. Complex problems get deep thinking.
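A classifier like this can start as a keyword heuristic. The patterns below are hypothetical examples of my own, not the system's actual rules; a real router might use a small model instead:

```python
import re

# Hypothetical signal words; tune these for your own prompts.
HIGH_SIGNALS = re.compile(r"\b(architect|design|refactor|migrate|security)\b", re.I)
LOW_SIGNALS = re.compile(r"\b(syntax|what's|what is|how do i)\b", re.I)

def classify_effort(prompt: str) -> str:
    """Route a prompt to low/medium/high effort."""
    if HIGH_SIGNALS.search(prompt):
        return "high"
    if LOW_SIGNALS.search(prompt):
        return "low"
    return "medium"  # default: balanced

print(classify_effort("What's the syntax for a Python lambda?"))  # low
print(classify_effort("Implement this feature"))                  # medium
print(classify_effort("Architect the auth system"))               # high
```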
Real Results
After two months of use:
Before (vanilla Claude Code):
- Same auth bug: 45 minutes to debug (third time)
- Build failures: Manual pattern recognition
- Session knowledge: Lost after conversation ends
After (self-improving RAG):
- Same auth bug: /search-knowledge "auth middleware" → fix in 2 minutes
- Build failures: Automatic capture, searchable history
- Session knowledge: Persisted, searchable, improving
The system has captured 150+ error patterns, 45 successful patterns, and 80 meta-learnings across my projects. For more on Claude Code workflows and best practices, see my comprehensive Claude Code guide.
Getting Started
The system is available as a configuration you can install:
cd ~/Projects/active/claude-rag-config
./setup.sh
# Then in any project:
claude
Setup installs:
- Hooks for automatic capture
- Custom commands for manual control
- ChromaDB for vector storage
- Graph memory database
- CLAUDE.md template
Requirements: Python 3.9+, Node 18+, ChromaDB (pip install chromadb)
Getting Started: The Minimum Viable RAG Setup
You don't need the full system on day one. Here's the smallest version that actually makes a difference.
Step 1: Install ChromaDB
pip install chromadb
That's your vector store. One command.
Step 2: Create a capture hook
Create a file at ~/.claude/hooks/post_tool_use.py:
import hashlib
import json
import os
import sys
from datetime import datetime

import chromadb

data = json.loads(sys.stdin.read())

# Only capture failed Bash commands
if data.get("tool") == "Bash" and data.get("exit_code", 0) != 0:
    # ChromaDB treats the path literally, so "~" must be expanded
    client = chromadb.PersistentClient(path=os.path.expanduser("~/.claude/memory"))
    collection = client.get_or_create_collection("error_patterns")
    error_text = data.get("output", "")[:500]
    # Hashing the error text dedupes identical errors via upsert
    doc_id = hashlib.md5(error_text.encode()).hexdigest()
    collection.upsert(
        documents=[error_text],
        ids=[doc_id],
        metadatas=[{"timestamp": datetime.now().isoformat()}],
    )

# Tell Claude Code to continue normally
print(json.dumps({"continue": True}))
This captures every failed Bash command into ChromaDB. No manual intervention.
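For the script to fire, it also needs to be registered as a hook. Something like this in ~/.claude/settings.json should work, though you should check the hooks documentation for the exact schema your Claude Code version expects:

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          { "type": "command", "command": "python3 ~/.claude/hooks/post_tool_use.py" }
        ]
      }
    ]
  }
}
```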
Step 3: Add a search command
Add this to your CLAUDE.md:
## /search-errors command
When user types /search-errors "query":
1. Query ChromaDB error_patterns collection
2. Return top 3 similar past errors and their context
3. Suggest fixes based on patterns
Step 4: Add a project-specific CLAUDE.md section
## Known Pitfalls (auto-updated)
<!-- This section gets updated by the session summary hook -->
That's the minimum viable setup. Four steps, maybe 20 minutes. You won't have graph memory or self-reflection yet, but you'll have semantic search over your past errors, which is where most of the day-to-day value comes from.
The auth bug I mentioned at the top of this post? The minimum viable version would have caught it. The error was in the database. The fix was two queries away.
Build the full system when the minimum version proves itself. For me, that took about 3 weeks.
The minimum version will change how you think about debugging. Instead of starting from scratch each session, you'll start with a search. That shift alone is worth the 20-minute setup cost.
What's Next
The system keeps improving. Planned additions:
- Cross-project learning: Patterns that work in one project suggested in others
- Confidence scoring: How reliable is this learning based on how often it's worked?
- Team memory: Shared knowledge base across collaborators
The Bigger Picture
This isn't just about remembering bugs.
It's about accumulating developer judgment in a searchable, queryable format.
Every debugging session teaches something. Without capture, those lessons evaporate. With this system, they compound.
Claude Code doesn't just help you code. It becomes a repository of everything you've learned about your codebase—and it gets smarter every session.
Questions about implementing this in your workflow? Reach out on LinkedIn.
Sources
- Claude Code Best Practices (Anthropic)
- Claude Code Hooks Documentation (Anthropic)
- ChromaDB Documentation (Chroma)