
massiron

Posted on • Originally published at deepstrain.dev

Context Window Too Small? Compress Your Codebase to 1500 Tokens Without Losing Signal

If you've ever hit a token limit trying to feed a codebase to an LLM, you know the pain: truncate the files and lose critical context, or pay for more tokens than makes sense.

Repofuse solves this by compressing your entire codebase into a structured ~1500 token context pack: module tree, dependency graph, and risk-ranked function list in one portable JSON block. Against roughly 30,000 tokens of raw source, that's a 95% token savings; the exact figure depends on the size of your codebase.

Let's walk through it.

Install

pip install repofuse

Zero dependencies — pure Python, stdlib only. Works with any Python project on Linux, macOS, or Windows.

One-shot run

repofuse .

The JSON output goes to stdout. You can redirect it to a file, pipe it to a clipboard tool, or feed it directly to an LLM:

repofuse . > context-pack.json
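If you'd rather feed the pack to an LLM from a script, here's a minimal sketch. It assumes only what's shown above: `repofuse .` prints JSON to stdout. The `build_prompt` helper and its prompt wording are hypothetical, not part of repofuse, and you'd pass the resulting string to whatever LLM client you use.

```python
import json
import subprocess


def load_context_pack(repo_path: str = ".") -> dict:
    """Run `repofuse <repo_path>` and parse the JSON it prints to stdout."""
    result = subprocess.run(
        ["repofuse", repo_path],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(result.stdout)


def build_prompt(pack: dict, question: str) -> str:
    """Hypothetical helper: embed the context pack in a plain-text prompt."""
    # Compact separators drop the whitespace that pretty-printed JSON carries.
    pack_json = json.dumps(pack, separators=(",", ":"))
    return (
        "You are helping with a Python codebase.\n"
        "Context pack (module tree, dependency graph, risk-ranked functions):\n"
        f"{pack_json}\n\n"
        f"Question: {question}"
    )
```

Usage would look something like `build_prompt(load_context_pack("."), "Which modules are most coupled?")`.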

The output

A context pack contains three sections:

  • module_tree — All source files arranged as a tree, with line counts per file. Your LLM sees the project skeleton immediately.
  • dependency_graph — Edges between modules (who imports whom). Enables reasoning about coupling and change impact.
  • risk_ranked_functions — Functions sorted by risk indicators (cyclomatic complexity, conditional branches, number of imports). Lets the LLM focus attention on the most critical code.
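To make "risk indicators" a bit more concrete, here's a rough sketch of how a structural metric can be pulled from Python's AST with the stdlib `ast` module. This is not repofuse's actual scoring: the choice of branch nodes and the `approx_complexity` and `rank_functions` names are assumptions for illustration only.

```python
import ast

# Node types counted as opening an extra execution path (an assumption of this sketch).
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.BoolOp, ast.IfExp)


def approx_complexity(func: ast.AST) -> int:
    """Return 1 plus the number of branch nodes found inside the function."""
    return 1 + sum(isinstance(node, BRANCH_NODES) for node in ast.walk(func))


def rank_functions(source: str) -> list[tuple[str, int]]:
    """Return (function name, complexity) pairs, most complex first."""
    tree = ast.parse(source)
    scores = [
        (node.name, approx_complexity(node))
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


SAMPLE = '''
def simple(x):
    return x + 1

def branchy(x):
    if x > 0 and x < 10:
        for i in range(x):
            if i % 2:
                return i
    return None
'''

print(rank_functions(SAMPLE))  # [('branchy', 5), ('simple', 1)]
```

`branchy` scores 5 because it contains two `if` statements, one `for` loop, and one boolean operator on top of the base path.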

Example snippet:

{
  "module_tree": {
    "src/app.py": 120,
    "src/models.py": 85
  },
  "dependency_graph": [
    {"from": "src/app.py", "to": "src/models.py"}
  ],
  "risk_ranked_functions": [
    {"name": "process_payment", "file": "src/payments.py", "risk_score": 0.87, "reason": "High cyclomatic complexity, 5 conditional branches, 3 external imports"}
  ]
}
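Because the pack is plain JSON, you can also query it yourself before handing it to a model. The sketch below uses a trimmed copy of the example above and relies only on the keys shown there; the `importers_of` and `top_risks` helpers and the 0.8 threshold are hypothetical choices, not part of repofuse.

```python
import json
from collections import defaultdict

# Trimmed version of the example pack above.
PACK = json.loads("""
{
  "module_tree": {"src/app.py": 120, "src/models.py": 85},
  "dependency_graph": [{"from": "src/app.py", "to": "src/models.py"}],
  "risk_ranked_functions": [{"name": "process_payment", "risk_score": 0.87}]
}
""")


def importers_of(pack: dict, module: str) -> list[str]:
    """Modules that import `module`, i.e. the ones a change to it may affect."""
    reverse = defaultdict(list)
    for edge in pack["dependency_graph"]:
        reverse[edge["to"]].append(edge["from"])
    return sorted(reverse[module])


def top_risks(pack: dict, threshold: float = 0.8) -> list[str]:
    """Names of functions whose risk_score is at or above `threshold`."""
    return [
        fn["name"]
        for fn in pack["risk_ranked_functions"]
        if fn["risk_score"] >= threshold
    ]


print(importers_of(PACK, "src/models.py"))  # ['src/app.py']
print(top_risks(PACK))                      # ['process_payment']
```

Reversing the edges is what turns "who imports whom" into a change-impact lookup: touch `src/models.py` and you know `src/app.py` is worth rechecking.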

CI integration

Add it to your CI pipeline so every commit ships an up-to-date context pack:

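# .github/workflows/context-pack.yml
name: Update context pack
on: [push]
# The auto-commit step pushes to the repo, so the job token needs write access.
permissions:
  contents: write
jobs:
  pack:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Pin a Python interpreter instead of relying on the runner's system Python.
      - uses: actions/setup-python@v5
        with:
          python-version: "3.x"
      - run: pip install repofuse
      - run: repofuse . > context-pack.json
      - uses: stefanzweifel/git-auto-commit-action@v5
        with:
          commit_message: "auto: update context pack"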
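If you want the pipeline to fail loudly when the generated file looks wrong, a small check could run before the commit step. This is a sketch under the assumption that a valid pack always has the three top-level sections described earlier; the script and its function names are hypothetical and not shipped with repofuse.

```python
import json
import sys

# The three sections described in "The output".
REQUIRED_SECTIONS = ("module_tree", "dependency_graph", "risk_ranked_functions")


def missing_sections(pack: dict) -> list[str]:
    """Return the required section names that are absent from the pack."""
    return [name for name in REQUIRED_SECTIONS if name not in pack]


def check_pack_file(path: str = "context-pack.json") -> int:
    """Return 0 if the file has every required section, 1 otherwise."""
    with open(path, encoding="utf-8") as handle:
        pack = json.load(handle)
    missing = missing_sections(pack)
    if missing:
        print(f"{path} is missing sections: {', '.join(missing)}", file=sys.stderr)
        return 1
    return 0
```

In a CI step you'd end the script with `sys.exit(check_pack_file("context-pack.json"))` so that a non-zero return stops the job before anything gets committed.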

When would you use this?

  • Your team's monorepo has hundreds of files, and Claude or GPT-4 keeps losing track of the module structure after two files.
  • You're building an AI agent that needs to understand a codebase before writing code in it. A context pack is far more reliable than a few random source files.
  • You're onboarding to a new repo and want to dump the whole thing into an AI chat in one shot.

Limitations (honest ones)

  • Python only — repofuse parses Python AST. It won't read TypeScript, Go, or Rust (yet).
  • Static analysis only — risk scores are based on structural metrics, not runtime data. A function with a high risk score might be perfectly safe if it's well-tested.
  • Tree + deps + risk, not code: the output stands in for raw source files in your prompt, but it contains no code itself. You still need the actual code for line-level details. The context pack is a map, not the territory.

Try it

pip install repofuse && repofuse .

Repo: github.com/massiron/repofuse

Docs: deepstrain.dev

Free and open-source.
