Saurav Anand

Posted on Mar 6

I Built a Tool That Generates AI Coding Context for Every Tool — From One Scan

#ai #python #opensource #productivity

Every AI coding tool needs context about your project. But each wants it in a different format:

Tool	File
Claude Code	CLAUDE.md
Cursor	.cursorrules
Codex CLI	codex.md
Windsurf	.windsurfrules
Multi-agent	AGENTS.md

I was maintaining 4 different context files across 3 projects. When I refactored a module,
I had to update all of them. So I built codebase-md.

One command, all formats

pip install codebase-md
cd your-project/
codebase scan .
codebase generate .

That's it. You now have a context file for each tool above in your project root, all
auto-generated from a single scan of your codebase.

What does it actually detect?

This isn't a template generator. It analyzes your code:

Language & Framework Detection
Recognizes 50+ file extensions. Detects frameworks like FastAPI, Django, Flask, React,
Next.js, Express, Vue.
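
To make the extension-based part concrete, here is a minimal sketch. It is not codebase-md's actual code: the `EXTENSION_TO_LANGUAGE` table and the `detect_languages` name are assumptions for illustration, and the table covers only a handful of extensions.

```python
from collections import Counter
from pathlib import PurePosixPath

# Hypothetical, deliberately tiny mapping; the real tool covers 50+ extensions.
EXTENSION_TO_LANGUAGE = {
    ".py": "Python",
    ".ts": "TypeScript",
    ".tsx": "TypeScript",
    ".js": "JavaScript",
    ".go": "Go",
    ".rs": "Rust",
}


def detect_languages(paths):
    """Count files per language, skipping extensions we don't know."""
    counts = Counter()
    for path in paths:
        suffix = PurePosixPath(path).suffix.lower()
        language = EXTENSION_TO_LANGUAGE.get(suffix)
        if language is not None:
            counts[language] += 1
    return counts


files = ["app/main.py", "app/models.py", "web/index.tsx", "README.md"]
print(detect_languages(files).most_common())
```

Framework detection would likely sit on top of this, for example by looking for known package names in the dependency files.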

Architecture Pattern Recognition
Looks at your folder structure, entry points, and package layout to classify:
monolith, monorepo, microservice, library, or CLI tool.
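
A rough sketch of what layout-based classification could look like. Every rule below is an assumption made up for illustration (the post doesn't spell out the real heuristics), as is the `classify_architecture` name.

```python
MANIFESTS = {"package.json", "pyproject.toml"}


def classify_architecture(paths):
    """Guess an architecture label from relative file paths (hypothetical rules)."""
    names = {p.split("/")[-1] for p in paths}
    top_dirs = {p.split("/")[0] for p in paths if "/" in p}

    # Several package manifests nested below the root suggest a monorepo.
    nested_manifests = [
        p for p in paths
        if p.count("/") >= 2 and p.split("/")[-1] in MANIFESTS
    ]
    if len(nested_manifests) >= 2 or "packages" in top_dirs:
        return "monorepo"
    if "Dockerfile" in names and "services" in top_dirs:
        return "microservice"
    # An explicit entry point hints at a command-line tool.
    if "cli.py" in names or "__main__.py" in names:
        return "cli"
    if "pyproject.toml" in names or "setup.py" in names:
        return "library"
    return "monolith"


print(classify_architecture(["pyproject.toml", "src/tool/cli.py"]))
```

The order of the checks matters here: a CLI package also ships a pyproject.toml, so the entry-point rule has to run before the library rule.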

Convention Inference (via tree-sitter AST)
Parses your actual code to determine:

Naming conventions (snake_case, camelCase, PascalCase)
Import style (absolute vs relative)
File organization (modular, layer-based, feature-based)
Design patterns (MVC, service/repository, etc.)
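
Here is a small sketch of how naming and import conventions can be inferred from a syntax tree. Note the swap: it uses Python's built-in `ast` module rather than tree-sitter, so it only handles Python source. The function names and regexes are assumptions, not codebase-md's API.

```python
import ast
import re
from collections import Counter

# Checked in order; the first matching pattern wins.
PATTERNS = {
    "snake_case": re.compile(r"^[a-z_][a-z0-9_]*$"),
    "camelCase": re.compile(r"^[a-z]+(?:[A-Z][a-z0-9]*)+$"),
    "PascalCase": re.compile(r"^(?:[A-Z][a-z0-9]+)+$"),
}


def classify_name(name):
    for label, pattern in PATTERNS.items():
        if pattern.match(name):
            return label
    return "other"


def infer_function_naming(source):
    """Return the most common naming style among function definitions."""
    counts = Counter(
        classify_name(node.name)
        for node in ast.walk(ast.parse(source))
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    )
    return counts.most_common(1)[0][0] if counts else None


def infer_import_style(source):
    """Return 'relative' if relative imports outnumber absolute ones."""
    relative = absolute = 0
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom) and node.level > 0:
            relative += 1
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            absolute += 1
    return "relative" if relative > absolute else "absolute"


sample = "import os\nfrom . import utils\n\ndef load_user(): pass\ndef save_user(): pass\n"
print(infer_function_naming(sample), infer_import_style(sample))
```

Counting and taking the majority keeps one oddly named function from flipping the result for the whole project.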

Dependency Intelligence
Parses package.json, requirements.txt, pyproject.toml, go.mod, Cargo.toml, Gemfile.
It then queries the PyPI and npm registries for health scoring, version freshness, and
breaking-change detection.
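
As an illustration of the parsing half, here is a sketch for simple requirements.txt lines plus a crude freshness check. It stays offline: the latest version is passed in by the caller, where a real tool would get it from a registry lookup. `parse_requirements` and `major_versions_behind` are hypothetical names, and the regex ignores extras, environment markers, and URLs.

```python
import re

# Package name, optionally followed by an exact pin such as ==1.2.3.
REQUIREMENT = re.compile(
    r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*(?:==\s*([0-9][0-9.]*))?"
)


def parse_requirements(text):
    """Return {package: pinned_version_or_None} for simple requirement lines."""
    deps = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):  # skip blanks and options like -r
            continue
        match = REQUIREMENT.match(line)
        if match:
            deps[match.group(1).lower()] = match.group(2)
    return deps


def major_versions_behind(pinned, latest):
    """Crude freshness signal: difference between major version numbers."""
    return int(latest.split(".")[0]) - int(pinned.split(".")[0])


text = "fastapi==0.110.0\npydantic>=2.0  # core\n-r dev.txt\n\nrequests\n"
print(parse_requirements(text))
print(major_versions_behind("1.4.2", "2.0.1"))
```

A jump in the major version is a reasonable first hint that an upgrade may contain breaking changes, though it says nothing about what actually changed.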

Git Analysis
Contributor analysis, commit frequency, file change hotspots — all included in the
generated context.
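
File change hotspots can be approximated by counting how often each path shows up in the commit history. The sketch below assumes you already captured the output of `git log --name-only --pretty=format:` as a string; `change_hotspots` is a hypothetical helper, not part of the tool.

```python
from collections import Counter


def change_hotspots(log_output, top=3):
    """Count path occurrences in `git log --name-only --pretty=format:` output."""
    counts = Counter(
        line.strip() for line in log_output.splitlines() if line.strip()
    )
    return counts.most_common(top)


# Three commits, separated by blank lines, each listing the files it touched.
sample_log = (
    "src/auth.py\nsrc/models.py\n"
    "\n"
    "src/auth.py\n"
    "\n"
    "src/auth.py\nREADME.md\nsrc/models.py\n"
)
print(change_hotspots(sample_log, top=2))
```

Files that change in most commits are usually the ones an AI assistant needs the most context about.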

Smart Context Routing

My favorite feature: query your project context like a search engine.

codebase context "how does authentication work"
codebase context "database models" --max 5

It chunks your project into 12 topic areas and uses TF-IDF with 6 scoring signals
to return the most relevant context. Feed this directly to any LLM.
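
For a feel of the TF-IDF part, here is a bare-bones ranking sketch in plain Python. It covers only term frequency and inverse document frequency, not the six scoring signals mentioned above, and the topic names, chunk texts, and `rank_chunks` function are all made up for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_chunks(query, chunks, limit=5):
    """Rank {topic: text} chunks by the summed TF-IDF of the query terms."""
    tokens = {topic: Counter(tokenize(text)) for topic, text in chunks.items()}
    n = len(chunks)
    query_terms = set(tokenize(query))
    scores = {}
    for topic, counts in tokens.items():
        total = sum(counts.values()) or 1
        score = 0.0
        for term in query_terms:
            df = sum(1 for c in tokens.values() if term in c)
            if df == 0:
                continue  # term appears nowhere, so it carries no signal
            idf = math.log((1 + n) / (1 + df)) + 1.0  # smoothed IDF
            score += (counts[term] / total) * idf
        scores[topic] = score
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [topic for topic, score in ranked[:limit] if score > 0]


chunks = {
    "auth": "login tokens and authentication middleware",
    "models": "database models and tables",
    "deps": "package dependencies",
}
print(rank_chunks("how does authentication work", chunks))
print(rank_chunks("database models", chunks, limit=1))
```

There is no stemming here, so "authenticate" would not match "authentication". A real implementation would want to handle that.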

The Architecture

Scanner Engine
  → Language Detector (50+ extensions)
  → Structure Analyzer (architecture patterns)
  → Dependency Parser (6 package formats)
  → Convention Inferrer (tree-sitter AST)
  → Git Analyzer (history, contributors)
       ↓
  ProjectModel (Pydantic v2, frozen, validated)
       ↓
  Generators (plugin-style, one per format)
       ↓
  CLAUDE.md, .cursorrules, AGENTS.md, codex.md, .windsurfrules

Each generator transforms the same ProjectModel into its output format.
Adding a new format means writing one class — the architecture is designed for extensibility.
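
The one-class-per-format idea can be sketched like this. To stay dependency-free, the example uses a frozen dataclass as a stand-in for the Pydantic v2 model, and the fields, class names, and `render` method are assumptions rather than the project's real interface.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)  # stand-in for a frozen Pydantic model
class ProjectModel:
    name: str
    languages: tuple
    architecture: str


class Generator(ABC):
    filename: str  # each subclass names the file it produces

    @abstractmethod
    def render(self, model):
        """Turn the shared model into this format's file contents."""


class ClaudeGenerator(Generator):
    filename = "CLAUDE.md"

    def render(self, model):
        languages = ", ".join(model.languages)
        return f"# {model.name}\n\nArchitecture: {model.architecture}\nLanguages: {languages}\n"


class CursorGenerator(Generator):
    filename = ".cursorrules"

    def render(self, model):
        return f"Project {model.name} is a {model.architecture} written in {', '.join(model.languages)}.\n"


def generate_all(model, generators):
    """Render every format from the same immutable model."""
    return {g.filename: g.render(model) for g in generators}


model = ProjectModel(name="demo", languages=("Python",), architecture="cli")
for filename, content in generate_all(model, [ClaudeGenerator(), CursorGenerator()]).items():
    print(filename, "->", content.splitlines()[0])
```

Because the model is frozen, no generator can mutate it, so the outputs can't drift apart depending on the order the generators run in.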

What's tested

354 tests covering 8 project archetypes:

Python CLI, FastAPI app, Next.js app, Go CLI, Rust CLI
Mixed-language, monorepo, empty repo edge case

Plus integration tests against real-world repositories.

Try it

pip install codebase-md

Or with tree-sitter AST support (recommended):

pip install "codebase-md[ast]"

Contribute

This is v0.1.0 — the first public release. I'd love contributions:

Good first issues: Go/Rust tree-sitter support, PHP dependency parser, README demo GIF
Bigger features: Watch mode, Java/Kotlin support

GitHub: sauravanand542/codebase-md
PyPI: codebase-md

Star ⭐ if this is useful to you — it helps others find it!
