My codebase was 83% of Cursor's context window and I didn't know it

#ai #webdev #showdev #programming

My codebase was 83% of Cursor's context window and I didn't know it.

I found out because I wrote a script to check. Took about an hour. Here's what it found.

The problem

When Claude Code or Cursor loses track of your conventions mid-task, the usual advice is "your codebase is too big for the context window." But that's vague. How big is too big? Which files are the problem?

I wanted actual numbers, not vibes.

The script

context-scanner.py walks your project directory, estimates tokens per file (1 token per 4 characters, rough but useful), and shows where you land against each model's context limit.
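Here is a minimal sketch of that idea, not the actual context-scanner.py. The function names and the skip list are my own assumptions; only the 4-characters-per-token heuristic comes from the description above.

```python
import os

CHARS_PER_TOKEN = 4  # the rough heuristic described above
SKIP_DIRS = {".git", "node_modules", "__pycache__"}  # assumed skip list


def estimate_tokens(text):
    """Estimate tokens as character count divided by four, rounded down."""
    return len(text) // CHARS_PER_TOKEN


def scan(root):
    """Return {relative_path: estimated_tokens} for readable text files under root."""
    counts = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune skipped directories in place so os.walk never descends into them.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8") as fh:
                    text = fh.read()
            except (UnicodeDecodeError, OSError):
                continue  # skip binary or unreadable files
            counts[os.path.relpath(path, root)] = estimate_tokens(text)
    return counts


def top_files(counts, n=3):
    """Return the n largest files as (path, tokens) pairs, biggest first."""
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

Calling `top_files(scan("."))` gives the kind of per-file ranking shown in the output below. A real tokenizer would give different counts, so treat the numbers as estimates.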

python3 context-scanner.py .

Output on my project:

=== Context Scanner ===
Total tokens (est): 52.9k
Files scanned: 34

--- Context window usage ---
Claude 3.5 Sonnet      █████                 26.4%  ✓ fits
GPT-4o                 ████████              41.3%  ✓ fits
Cursor default         ████████████████      82.6%  ⚠ tight
Gemini 1.5 Pro         █                      5.3%  ✓ fits

--- Top files (token hogs) ---
10.4k   package-lock.json
 6.6k   multi-agent-templates.md
 6.3k   agent-prompt-playbook.md

Fine for Claude and GPT-4o, tight for Cursor. And package-lock.json is chewing through 10k tokens, which is pure noise.
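The usage table is just division against each window size. A rough sketch of how it could be computed follows. The 200k, 128k, and 1M windows are the published sizes for those models. The 64k Cursor figure is my back-calculation from the 82.6% in the output, and the 20-character bar, the 75% "tight" threshold, and the "too big" label are all assumptions.

```python
WINDOWS = {
    "Claude 3.5 Sonnet": 200_000,
    "GPT-4o": 128_000,
    "Cursor default": 64_000,  # assumption: inferred from the 82.6% in the output
    "Gemini 1.5 Pro": 1_000_000,
}
BAR_WIDTH = 20  # assumed; consistent with the bar lengths in the output
TIGHT = 0.75    # assumed cutoff between "fits" and "tight"


def usage_line(name, total_tokens, window):
    """Format one row: model name, bar, percentage of window used, status."""
    frac = total_tokens / window
    # Cap the bar so an oversized codebase does not overflow the column.
    bar = "█" * min(BAR_WIDTH, int(frac * BAR_WIDTH))
    if frac > 1:
        status = "✗ too big"  # hypothetical label, not shown in the output
    elif frac >= TIGHT:
        status = "⚠ tight"
    else:
        status = "✓ fits"
    return f"{name:<22} {bar:<{BAR_WIDTH}} {frac * 100:5.1f}%  {status}"


if __name__ == "__main__":
    total = 52_900  # the rounded total from the output above
    for model, window in WINDOWS.items():
        print(usage_line(model, total, window))
```

Because 52.9k is itself a rounded figure, the percentages this prints can differ from the original output in the last digit.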

I added it to .claudeignore. Context usage dropped about 20% (10.4k of 52.9k tokens) in maybe 30 seconds.
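To check what an ignore entry buys you before committing to it, you can filter the per-file counts and compare totals. This is a sketch under assumptions: `apply_ignore` and `savings` are hypothetical helpers, and `fnmatch` globbing is a simplification that does not implement full gitignore-style syntax. The three named files use the numbers from the output above, and the "(everything else)" entry is just the remainder of the 52.9k total.

```python
import os
from fnmatch import fnmatch


def apply_ignore(counts, patterns):
    """Drop files whose path or basename matches any glob pattern."""
    kept = {}
    for path, tokens in counts.items():
        base = os.path.basename(path)
        if any(fnmatch(path, p) or fnmatch(base, p) for p in patterns):
            continue  # ignored: does not count toward the total
        kept[path] = tokens
    return kept


def savings(counts, patterns):
    """Fraction of total estimated tokens removed by the ignore patterns."""
    before = sum(counts.values())
    after = sum(apply_ignore(counts, patterns).values())
    return (before - after) / before if before else 0.0


if __name__ == "__main__":
    counts = {
        "package-lock.json": 10_400,
        "multi-agent-templates.md": 6_600,
        "agent-prompt-playbook.md": 6_300,
        "(everything else)": 29_600,  # remainder of the 52.9k total
    }
    print(f"{savings(counts, ['package-lock.json']):.1%}")
```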

The conventions generator

While I was at it, I wrote a second script. It reads your codebase and spits out a CONVENTIONS.md documenting your actual patterns: import style, error handling, naming, framework, package manager.

python3 conventions-gen.py . > CONVENTIONS.md

Feed that to Claude Code at session start and it stops guessing your conventions from scratch every time.
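For a sense of how this kind of detection can work, here is a small sketch covering two of those fields. It is not the actual conventions-gen.py. The lockfile table, the regexes, and every function name are my assumptions, and it only looks at JavaScript-style imports.

```python
import os
import re

# Assumed mapping from lockfile name to package manager.
LOCKFILES = {
    "package-lock.json": "npm",
    "yarn.lock": "yarn",
    "pnpm-lock.yaml": "pnpm",
}

IMPORT_RE = re.compile(r"^\s*import\s.+\sfrom\s", re.MULTILINE)
REQUIRE_RE = re.compile(r"\brequire\(")


def detect_package_manager(root):
    """Guess the package manager from whichever lockfile exists in root."""
    for lockfile, manager in LOCKFILES.items():
        if os.path.exists(os.path.join(root, lockfile)):
            return manager
    return "unknown"


def detect_import_style(sources):
    """Count ES-module imports vs require() calls across source strings."""
    esm = sum(len(IMPORT_RE.findall(s)) for s in sources)
    cjs = sum(len(REQUIRE_RE.findall(s)) for s in sources)
    if esm == 0 and cjs == 0:
        return "unknown"
    return "ES modules (import)" if esm >= cjs else "CommonJS (require)"


def render(manager, import_style):
    """Render the detected conventions as a short Markdown document."""
    lines = [
        "# CONVENTIONS",
        "",
        f"- Package manager: {manager}",
        f"- Import style: {import_style}",
    ]
    return "\n".join(lines) + "\n"
```

Counting occurrences and picking the majority is crude, but it documents what the code actually does instead of what you think it does.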

Get them

Both are free at builtbyzac.com/tools.html. No install, no signup. Just Python files you drop into your project.

Context scanner is ~60 lines, conventions generator is ~90. Easy to adapt.

The thing that surprised me: the total count alone isn't that useful. The per-file breakdown is. In my case package-lock.json was the single biggest offender, at about a fifth of the total. Usually it's a lockfile or a build artifact, something the model never needs to read and that belongs in the tool's ignore file.

If this is useful, I also documented what I have learned about managing agent context, memory patterns, and multi-agent coordination: payhip.com/b/6rRkT. $29, about 40 pages.
