DEV Community

Zac

My codebase was 83% of Cursor's context window and I didn't know it

My codebase was 83% of Cursor's context window and I didn't know it.

I found out because I wrote a script to check. Took about an hour. Here's what it found.

The problem

When Claude Code or Cursor loses track of your conventions mid-task, the usual advice is "your codebase is too big for the context window." But that's vague. How big is too big? Which files are the problem?

I wanted actual numbers, not vibes.

The script

context-scanner.py walks your project directory, estimates tokens per file (1 token per 4 characters, rough but useful), and shows where you land against each model's context limit.
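
For a feel of how little is involved, here is a minimal sketch of that kind of scan. It is an illustration, not the actual context-scanner.py: the function names and the `SKIP_DIRS` set are assumptions, and the only thing taken from the description above is the 4-characters-per-token estimate.

```python
import os

CHARS_PER_TOKEN = 4  # the rough heuristic described above
# Assumption: directories skipped outright; the real script may choose differently.
SKIP_DIRS = {".git", "node_modules", "__pycache__"}


def estimate_tokens(text):
    """Estimate tokens as one per four characters, rounded down."""
    return len(text) // CHARS_PER_TOKEN


def scan(root):
    """Return {relative_path: estimated_tokens} for readable text files under root."""
    counts = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune skipped directories in place so os.walk never descends into them.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8") as fh:
                    text = fh.read()
            except (UnicodeDecodeError, OSError):
                continue  # binary or unreadable file: leave it out of the count
            counts[os.path.relpath(path, root)] = estimate_tokens(text)
    return counts


def top_files(counts, n=3):
    """Return the n largest files as (path, tokens) pairs, biggest first."""
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

Calling `top_files(scan("."))` gives the per-file ranking; `sum(scan(".").values())` gives the total.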

python3 context-scanner.py .

Output on my project:

=== Context Scanner ===
Total tokens (est): 52.9k
Files scanned: 34

--- Context window usage ---
Claude 3.5 Sonnet      █████                 26.4%  ✓ fits
GPT-4o                 ████████              41.3%  ✓ fits
Cursor default         ████████████████      82.6%  ⚠ tight
Gemini 1.5 Pro         █                      5.3%  ✓ fits

--- Top files (token hogs) ---
10.4k   package-lock.json
 6.6k   multi-agent-templates.md
 6.3k   agent-prompt-playbook.md

Fine for Claude, GPT-4o, and Gemini; tight for Cursor. And package-lock.json is chewing through 10.4k tokens, which is pure noise.
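
The usage table is just division. Here is a sketch of how rows like those could be produced. The window sizes are assumptions back-calculated from the percentages in the output (the 64k figure for "Cursor default" in particular is inferred, not documented here), and the 75% "tight" threshold, the 20-character bar, and the "over" label are guesses too.

```python
# Assumed context limits in tokens, inferred from the percentages in the output above.
CONTEXT_LIMITS = {
    "Claude 3.5 Sonnet": 200_000,
    "GPT-4o": 128_000,
    "Cursor default": 64_000,  # assumption: implied by 52.9k being about 82.6%
    "Gemini 1.5 Pro": 1_000_000,
}
BAR_WIDTH = 20          # assumption: one block per 5% of the window
TIGHT_THRESHOLD = 75.0  # assumption: where "fits" turns into "tight"


def usage_percent(total_tokens, limit):
    """Share of a context window that the codebase would occupy, in percent."""
    return 100.0 * total_tokens / limit


def render_row(name, total_tokens, limit):
    """Format one table row: name, bar, percentage, status."""
    pct = usage_percent(total_tokens, limit)
    filled = min(BAR_WIDTH, int(pct / 100 * BAR_WIDTH))
    if pct >= 100:
        status = "✗ over"
    elif pct >= TIGHT_THRESHOLD:
        status = "⚠ tight"
    else:
        status = "✓ fits"
    bar = "█" * filled
    return f"{name:<22} {bar:<{BAR_WIDTH}} {pct:5.1f}%  {status}"


for name, limit in CONTEXT_LIMITS.items():
    print(render_row(name, 52_900, limit))
```

With the rounded 52.9k total this gives the same bar lengths as the output above; the last decimal can differ slightly because the real script works from the unrounded count.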

I added it to .claudeignore. Context usage dropped by about 20% (10.4k of 52.9k tokens), and the fix took maybe 30 seconds.
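
To check what an ignore entry buys you before committing to it, something like the following works. It is a sketch under loud assumptions: it treats the ignore file as one glob per line and matches with Python's `fnmatch`, which is much simpler than real gitignore-style rules, and the "(everything else)" bucket is just the remainder of the 52.9k total, not a real file.

```python
from fnmatch import fnmatch


def load_patterns(text):
    """Parse ignore-file text: one glob per line, blank lines and '#' comments skipped."""
    lines = (ln.strip() for ln in text.splitlines())
    return [ln for ln in lines if ln and not ln.startswith("#")]


def is_ignored(path, patterns):
    """True if the full path or its file name matches any pattern (simple globbing only)."""
    name = path.replace("\\", "/").split("/")[-1]
    return any(fnmatch(path, p) or fnmatch(name, p) for p in patterns)


def savings(counts, patterns):
    """Return (total, kept, percent_saved) for a {path: tokens} mapping."""
    total = sum(counts.values())
    kept = sum(t for p, t in counts.items() if not is_ignored(p, patterns))
    saved = 100.0 * (total - kept) / total if total else 0.0
    return total, kept, saved


# Figures from the scanner output above; the last entry is a placeholder remainder.
counts = {
    "package-lock.json": 10_400,
    "multi-agent-templates.md": 6_600,
    "agent-prompt-playbook.md": 6_300,
    "(everything else)": 29_600,
}
total, kept, saved = savings(counts, load_patterns("# lockfiles\npackage-lock.json\n"))
print(f"{total} -> {kept} tokens ({saved:.1f}% saved)")  # 52900 -> 42500 tokens (19.7% saved)
```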

The conventions generator

While I was at it, I wrote a second script. It reads your codebase and spits out a CONVENTIONS.md documenting your actual patterns: import style, error handling, naming, framework, package manager.

python3 conventions-gen.py . > CONVENTIONS.md

Feed that to Claude Code at session start and it stops guessing your conventions from scratch every time.
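
To make "reads your codebase" concrete, here is a toy version of two of those checks: package manager and import style. It is not conventions-gen.py. The lockfile table, the regexes, and every function name are assumptions, and it only looks at JavaScript-style imports.

```python
import re
from collections import Counter

# Assumption: the package manager is inferred from whichever lockfile is present.
LOCKFILES = {"package-lock.json": "npm", "yarn.lock": "yarn", "pnpm-lock.yaml": "pnpm"}

# Rough patterns; they will miss unusual formatting and can match inside comments.
ESM_IMPORT = re.compile(r"^\s*import\s.+\sfrom\s+['\"]", re.MULTILINE)
CJS_REQUIRE = re.compile(r"\brequire\(\s*['\"]")


def detect_package_manager(filenames):
    """Return the manager for the first known lockfile found, else 'unknown'."""
    for lockfile, manager in LOCKFILES.items():
        if lockfile in filenames:
            return manager
    return "unknown"


def detect_import_style(sources):
    """Count import statements across source texts and return the majority style."""
    votes = Counter()
    for text in sources:
        votes["ES modules (import ... from)"] += len(ESM_IMPORT.findall(text))
        votes["CommonJS (require)"] += len(CJS_REQUIRE.findall(text))
    if not votes or max(votes.values()) == 0:
        return "unknown"
    return votes.most_common(1)[0][0]


def render_conventions(filenames, sources):
    """Build the text of a small CONVENTIONS.md from the two detections."""
    lines = [
        "# CONVENTIONS",
        "",
        f"- Package manager: {detect_package_manager(filenames)}",
        f"- Import style: {detect_import_style(sources)}",
    ]
    return "\n".join(lines) + "\n"
```

The useful part is that the file describes what the code already does, so the agent is told rather than left to infer it.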

Get them

Both are free at builtbyzac.com/tools.html. No install, no signup. Just Python files you drop into your project.

Context scanner is ~60 lines, conventions generator is ~90. Easy to adapt.


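The thing that surprised me: the total count alone isn't that useful. The per-file breakdown is. In my case package-lock.json was the single biggest offender, about a fifth of the total on its own. Usually it's a lockfile or a build artifact, something the agent never needs to read. Build artifacts should've been in .gitignore in the first place; lockfiles stay in version control but belong in the agent's ignore file.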

If this is useful, I also documented what I have learned about managing agent context, memory patterns, and multi-agent coordination: payhip.com/b/6rRkT. $29, about 40 pages.
