Every AI coding tool needs context about your project. But each wants it in a different format:
| Tool | File |
|---|---|
| Claude Code | CLAUDE.md |
| Cursor | .cursorrules |
| Codex CLI | codex.md |
| Windsurf | .windsurfrules |
| Multi-agent | AGENTS.md |
I was maintaining 4 different context files across 3 projects. When I refactored a module,
I had to update all of them. So I built codebase-md.
One command, all formats
pip install codebase-md
cd your-project/
codebase scan .
codebase generate .
That's it. You now have all 6 context files in your project root — auto-generated from a
single scan of your codebase.
What does it actually detect?
This isn't a template generator. It analyzes your code:
Language & Framework Detection
Recognizes 50+ file extensions. Detects frameworks like FastAPI, Django, Flask, React,
Next.js, Express, Vue.
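
To make the idea concrete, here is a minimal sketch of extension-based language detection. It is not the tool's actual code: the mapping is a tiny hypothetical subset of the 50+ extensions, and `EXTENSION_TO_LANGUAGE` and `count_languages` are names made up for illustration.

```python
from collections import Counter
from pathlib import PurePosixPath

# Hypothetical, abbreviated mapping (the real tool recognizes 50+ extensions).
EXTENSION_TO_LANGUAGE = {
    ".py": "Python",
    ".js": "JavaScript",
    ".ts": "TypeScript",
    ".tsx": "TypeScript",
    ".go": "Go",
    ".rs": "Rust",
    ".rb": "Ruby",
}


def count_languages(paths):
    """Count files per language, skipping extensions that are not in the map."""
    counts = Counter()
    for path in paths:
        suffix = PurePosixPath(path).suffix.lower()
        language = EXTENSION_TO_LANGUAGE.get(suffix)
        if language is not None:
            counts[language] += 1
    return counts


files = ["app/main.py", "app/models.py", "web/index.tsx", "README.md"]
language_counts = count_languages(files)
```

Framework detection would need more than file extensions (for example, looking at declared dependencies), so it is left out of this sketch.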
Architecture Pattern Recognition
Looks at your folder structure, entry points, and package layout to classify:
monolith, monorepo, microservice, library, or CLI tool.
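
A rough sketch of what layout-based classification might look like. The rules below are illustrative guesses, not the heuristics codebase-md actually uses, and `classify_architecture`, `MANIFESTS`, and `CLI_ENTRY_POINTS` are hypothetical names.

```python
from pathlib import PurePosixPath

# Assumed signals, chosen only for illustration.
MANIFESTS = {"package.json", "pyproject.toml", "go.mod", "Cargo.toml"}
CLI_ENTRY_POINTS = {"cli.py", "__main__.py"}


def classify_architecture(paths):
    """Guess an architecture label from a list of relative file paths."""
    parsed = [PurePosixPath(p) for p in paths]
    nested_manifests = [p for p in parsed if p.name in MANIFESTS and len(p.parts) > 1]
    nested_dockerfiles = [p for p in parsed if p.name == "Dockerfile" and len(p.parts) > 1]
    has_root_manifest = any(p.name in MANIFESTS and len(p.parts) == 1 for p in parsed)

    # Several independently containerized subdirectories hint at microservices.
    if len(nested_dockerfiles) >= 2:
        return "microservice"
    # Several package manifests below the root hint at a monorepo.
    if len(nested_manifests) >= 2:
        return "monorepo"
    # A conventional command-line entry point hints at a CLI tool.
    if any(p.name in CLI_ENTRY_POINTS for p in parsed):
        return "cli"
    # A packaged project without an entry point is treated as a library.
    if has_root_manifest:
        return "library"
    return "monolith"
```

Rule order matters here: the more specific signals are checked first, and "monolith" is only the fallback when nothing else matches.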
Convention Inference (via tree-sitter AST)
Parses your actual code to determine:
- Naming conventions (snake_case, camelCase, PascalCase)
- Import style (absolute vs relative)
- File organization (modular, layer-based, feature-based)
- Design patterns (MVC, service/repository, etc.)
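
Here is a small sketch of naming-convention inference. To stay dependency-free it uses Python's built-in `ast` module as a stand-in for tree-sitter, so it only handles Python source. The regexes and the names `classify_name` and `function_naming_style` are assumptions for illustration.

```python
import ast
import re
from collections import Counter

# Single-word lowercase names (e.g. "run") are ambiguous, so they match nothing.
PATTERNS = {
    "snake_case": re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)+$"),
    "camelCase": re.compile(r"^[a-z][a-z0-9]*([A-Z][a-z0-9]*)+$"),
    "PascalCase": re.compile(r"^([A-Z][a-z0-9]+)+$"),
}


def classify_name(name):
    """Return the first naming style that matches, or None if ambiguous."""
    for style, pattern in PATTERNS.items():
        if pattern.match(name):
            return style
    return None


def function_naming_style(source):
    """Return the most common naming style among function definitions."""
    counts = Counter()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            style = classify_name(node.name)
            if style is not None:
                counts[style] += 1
    return counts.most_common(1)[0][0] if counts else None


sample = """
def load_config(): pass
def parse_args(): pass
def runTask(): pass
"""
dominant_style = function_naming_style(sample)
```

Voting over every definition, rather than trusting the first one found, keeps a single odd name from flipping the result.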
Dependency Intelligence
Parses package.json, requirements.txt, pyproject.toml, go.mod, Cargo.toml, and Gemfile.
It then queries the PyPI and npm registries to score dependency health, check version
freshness, and flag breaking changes.
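
As a simplified sketch of the freshness check, assume pinned versions come from a requirements.txt and the latest versions have already been fetched from a registry. The package names and versions below are invented, the registry lookup is replaced by a plain dict, and `parse_requirements` and `freshness` are hypothetical helpers.

```python
import re

# Matches only exact pins such as "name==1.2.3"; ranges are ignored in this sketch.
PINNED = re.compile(r"^([A-Za-z0-9._-]+)\s*==\s*([0-9][0-9A-Za-z.]*)")


def parse_requirements(text):
    """Return {package: pinned_version} for exactly pinned requirements."""
    deps = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        match = PINNED.match(line)
        if match:
            deps[match.group(1).lower()] = match.group(2)
    return deps


def freshness(pinned, latest):
    """Label a pin by comparing its major version with the latest release."""
    if pinned == latest:
        return "current"
    if int(pinned.split(".")[0]) < int(latest.split(".")[0]):
        return "major-behind"  # a major bump often signals breaking changes
    return "outdated"


requirements = """
alpha-lib==1.4.0   # invented package
beta-lib==2.0.1
gamma-lib>=3.0
"""
latest_versions = {"alpha-lib": "2.1.0", "beta-lib": "2.0.1"}  # stand-in for a registry query

pinned = parse_requirements(requirements)
report = {name: freshness(version, latest_versions[name])
          for name, version in pinned.items() if name in latest_versions}
```

A production version would need a real version parser and would have to handle ranges, extras, and pre-releases. This one only compares the leading major number.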
Git Analysis
Contributor analysis, commit frequency, file change hotspots — all included in the
generated context.
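
One way to find file change hotspots is to count how often each path appears in `git log --name-only --pretty=format:` output. The sketch below takes that output as a string so it runs without a repository. `change_hotspots` is a hypothetical helper, not part of codebase-md.

```python
from collections import Counter


def change_hotspots(name_only_log, top=3):
    """Count path occurrences in `git log --name-only --pretty=format:` output."""
    counts = Counter(
        line.strip() for line in name_only_log.splitlines() if line.strip()
    )
    return counts.most_common(top)


# Invented sample: three commits, separated by blank lines.
sample_log = """src/auth.py
src/models.py

src/auth.py

src/auth.py
README.md
"""
hotspots = change_hotspots(sample_log, top=2)
```

Files that change in many commits tend to be the ones an assistant most needs context about, which is presumably why hotspots end up in the generated files.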
Smart Context Routing
My favorite feature: query your project context like a search engine.
codebase context "how does authentication work"
codebase context "database models" --max 5
It chunks your project into 12 topic areas and uses TF-IDF with 6 scoring signals
to return the most relevant context. Feed this directly to any LLM.
The Architecture
Scanner Engine
→ Language Detector (50+ extensions)
→ Structure Analyzer (architecture patterns)
→ Dependency Parser (6 package formats)
→ Convention Inferrer (tree-sitter AST)
→ Git Analyzer (history, contributors)
↓
ProjectModel (Pydantic v2, frozen, validated)
↓
Generators (plugin-style, one per format)
↓
CLAUDE.md, .cursorrules, AGENTS.md, codex.md, .windsurfrules
Each generator transforms the same ProjectModel into its output format.
Adding a new format means writing one class — the architecture is designed for extensibility.
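
A minimal sketch of the one-class-per-format idea. A frozen dataclass stands in for the Pydantic v2 model to avoid a third-party dependency, and the field names, the `Generator` base class, and the rendered text are assumptions, not the project's real interfaces or output.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)  # stand-in for a frozen Pydantic v2 model
class ProjectModel:
    name: str
    languages: tuple
    architecture: str


class Generator(ABC):
    """Hypothetical plugin interface: one subclass per output format."""

    filename: str

    @abstractmethod
    def render(self, model: ProjectModel) -> str:
        """Return the file contents for this format."""


class ClaudeGenerator(Generator):
    filename = "CLAUDE.md"

    def render(self, model):
        languages = ", ".join(model.languages)
        return f"# {model.name}\n\nArchitecture: {model.architecture}\nLanguages: {languages}\n"


class CursorRulesGenerator(Generator):
    filename = ".cursorrules"

    def render(self, model):
        languages = ", ".join(model.languages)
        return f"Project {model.name} is a {model.architecture} written in {languages}.\n"


def generate_all(model, generators):
    """Render every format from the same immutable model."""
    return {generator.filename: generator.render(model) for generator in generators}


model = ProjectModel(name="demo", languages=("Python",), architecture="cli")
outputs = generate_all(model, [ClaudeGenerator(), CursorRulesGenerator()])
```

Because the model is immutable, no generator can change what the next one sees, so the order in which formats are rendered does not matter.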
What's tested
354 tests covering 8 project archetypes:
- Python CLI, FastAPI app, Next.js app, Go CLI, Rust CLI
- Mixed-language, monorepo, empty repo edge case
Plus integration tests against real-world repositories.
Try it
pip install codebase-md
Or with tree-sitter AST support (recommended):
pip install "codebase-md[ast]"
Contribute
This is v0.1.0 — the first public release. I'd love contributions:
- Good first issues: Go/Rust tree-sitter support, PHP dependency parser, README demo GIF
- Bigger features: Watch mode, Java/Kotlin support
GitHub: sauravanand542/codebase-md
PyPI: codebase-md
Star ⭐ if this is useful to you — it helps others find it!