I've been using AI coding agents — Claude Code, GitHub Copilot, Cursor — daily across 11 production microservices for the past year. They've transformed how I work. But they keep failing at the same three things.
This isn't a rant. This is a problem statement with a solution I built and open-sourced.
Problem 1: Context Rot
AI agents lose track of what matters during longer sessions. Even with 1M token context windows, having a massive database without proper indexing doesn't help — the data exists, but retrieval is inefficient.
Real example: I asked Claude to add rate limiting to a payment endpoint. It read the file, saw no existing rate limiter, and built one from scratch. There was already a rate limiting module in src/middleware/rate_limiter.py with 200 lines of battle-tested code. Claude just didn't see it because it was two directories away.
Problem 2: No Decision Awareness
AI doesn't know WHY your code is the way it is.
Real example: Claude suggested switching our auth tokens from EdDSA to RS256 "for broader compatibility." Our team specifically chose EdDSA six months ago because our compliance team required it for a security audit. That decision existed in a PR description that Claude never saw.
The agent was technically correct. It was contextually wrong.
Problem 3: Quality Drift
AI-generated code prioritizes speed over project conventions.
Real example: Our project uses async handlers everywhere — 100% consistency across 46 handler files. Claude generated a sync handler. It worked. Tests passed. But it broke the convention, and the next developer who saw it started writing sync handlers too. One exception became a pattern.
The Root Cause
Every AI context tool solves what code exists — code graphs, RAG, embeddings. Nobody solves why code exists, what rules apply, and whether the context is still valid.
What I Built
codebase-intel — an open-source platform that provides AI agents with three things no other tool offers:
1. Decision Journal
Structured records of WHY decisions were made, linked to specific code locations:
id: DEC-042
title: "Use token bucket for rate limiting"
context: "Payment endpoint hammered during flash sales"
decision: "Token bucket, per-user buckets, 100 req/min"
alternatives:
- name: sliding_window
rejection_reason: "Memory overhead too high at scale"
constraints:
- description: "Must not add >2ms p99 latency"
source: sla
is_hard: true
code_anchors:
- "src/middleware/rate_limiter.py:15-82"
These are auto-mined from git history. Run codebase-intel mine --save and it extracts decision candidates from commit messages and PR descriptions using keyword analysis. On our IDP backend, it found 16 decision candidates automatically.
2. Quality Contracts
Not linting. Project-specific architectural rules that AI reads BEFORE generating code:
rules:
- id: async-handlers
name: All handlers must be async
severity: error
pattern: "(?<!async\\s)def\\s+(get_|post_|create_)"
fix_suggestion: "Use async def for handlers"
The killer feature: auto-detection. Run codebase-intel detect-patterns --save and it scans your codebase to find patterns you already follow:
┏━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━┓
┃ # ┃ Pattern ┃ Confidence ┃ Follows ┃ Violates ┃
┡━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━┩
│ 1 │ Async handlers │ 100% │ 46 │ 0 │
│ 2 │ Layer separation │ 96% │ 132 │ 5 │
│ 3 │ Service class naming │ 90% │ 35 │ 0 │
│ 4 │ Custom exceptions │ 100% │ 47 │ 0 │
│ 5 │ Docstring convention │ 88% │ 793 │ 108 │
└───┴────────────────────────┴────────────┴─────────┴──────────┘
Zero manual contract writing. The tool discovers what your team already does and generates rules from it.
3. Drift Detection
Context rots. Decisions go stale. Code anchors point to deleted files. codebase-intel detects this:
$ codebase-intel drift
Overall: MEDIUM
- Decision DEC-012 anchored to deleted file
- Decision DEC-008 is past its review date
- 2 files changed since last graph index
The Benchmarks
Tested on 4 production codebases:
| Project | Without Tool | With codebase-intel | Reduction | Decisions Surfaced |
|---|---|---|---|---|
| FastAPI monolith (359 files) | 16,063 tokens | 5,955 tokens | 63% | 13 |
| Microservice (358 files) | 14,611 | 5,955 | 59% | 0 |
| Microservice (153 files) | 5,904 | 1,476 | 75% | 0 |
The token reduction is nice. But the real value is the 13 decisions surfaced — those are constraints, trade-offs, and rejected alternatives that would have been invisible to the agent.
How It Works
AI Agent → MCP Server → Context Orchestrator
↓
┌─────────┼─────────┐
↓ ↓ ↓
Code Graph Decisions Contracts
(19 langs) (git-mined) (auto-detected)
↓ ↓ ↓
└─────────┼─────────┘
↓
Drift Detector
The agent asks "what do I need to know to work on this file?" and gets:
- Relevant files (from the code graph)
- Applicable decisions (linked to this code area)
- Quality rules to follow
- Warnings about stale context
All fitted within a token budget, prioritized by relevance.
Getting Started
pip install codebase-intel
cd your-project
codebase-intel init # build graph, mine decisions
codebase-intel detect-patterns --save # auto-generate contracts
codebase-intel benchmark # see before/after numbers
For Claude Code, add to ~/.claude/settings.json:
{
"mcpServers": {
"codebase-intel": {
"command": "codebase-intel",
"args": ["serve", "/path/to/project"]
}
}
}
It's Not a Competition
Tools like code-review-graph (6.3K stars) are excellent for code graphs. Cursor and Copilot are great for autocomplete. We don't replace any of them.
We fill the gap none of them address: why does this code exist, what rules must AI follow, and is the context still valid?
What I'd Love Feedback On
- What patterns should the auto-detector look for in your stack?
- What quality rules would you want enforced in your project?
- Are there decision types we're missing?
GitHub: MutharasuArchunan13/codebase-intel
PyPI: pip install codebase-intel
License: MIT
Top comments (0)