Posted on May 30

How I Cut AI Coding Token Costs by 96% with AST Compression

#claude #opensource #python #productivity

If you use Claude Code, Cursor, or Aider, you know the pain.

You ask it to fix a bug and it dumps 15 files into context. That's 45K tokens, mostly wasted.

I built PMC Engine: AST-level compression that scores every symbol and sends only what the task needs.
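To make "scores every symbol" concrete, here is a minimal sketch using Python's standard ast module. It is not PMC Engine's actual scoring: the function name symbol_scores, the word-overlap heuristic, and the sample source are all assumptions for illustration.

```python
import ast
import re


def symbol_scores(source: str, task: str) -> dict[str, float]:
    """Score each top-level function/class by word overlap with the task text.

    Hypothetical heuristic for illustration only, not PMC Engine's algorithm.
    """
    task_words = set(re.findall(r"[a-z]+", task.lower()))
    scores: dict[str, float] = {}
    for node in ast.parse(source).body:
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            continue
        # Collect words from the symbol's own name and the identifiers it uses.
        words: set[str] = set()
        for sub in ast.walk(node):
            name = getattr(sub, "name", None) or getattr(sub, "id", None)
            if isinstance(name, str):
                words.update(re.findall(r"[a-z]+", name.lower()))
        scores[node.name] = len(words & task_words) / max(1, len(task_words))
    return scores


SOURCE = '''
def parse_token(header):
    return header.split()[1]

def render_page(template):
    return template.upper()
'''

scores = symbol_scores(SOURCE, "fix token parsing bug in auth header")
# Keep only symbols with a non-zero score, highest first.
relevant = [name for name, s in sorted(scores.items(), key=lambda kv: -kv[1]) if s > 0]
print(relevant)
```

A real scorer would likely also weigh call-graph distance and recent edits. Word overlap is just the simplest signal that fits in a few lines.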

The Numbers

FastAPI (48 files, 33K LOC, DeepSeek V4 Flash):

Simple fix: 45K -> 7.1K tokens (84%)
Refactor: 85K -> 7.8K tokens (91%)
Complex: 148K -> 7.7K tokens (95%)
Avg: 91.8% reduction, 100% quality, <5ms
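The per-task percentages follow directly from the before/after token counts. A quick check, using only the three rows listed above (the variable names are mine):

```python
# Token counts (before, after) taken from the three rows above.
runs = {
    "simple fix": (45_000, 7_100),
    "refactor": (85_000, 7_800),
    "complex": (148_000, 7_700),
}


def reduction_pct(before: int, after: int) -> float:
    """Percentage of tokens removed: 100 * (1 - after / before)."""
    return 100 * (1 - after / before)


reductions = {task: reduction_pct(*counts) for task, counts in runs.items()}
for task, pct in reductions.items():
    print(f"{task}: {pct:.1f}%")
```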

Quick Start

pip install pmc-engine
pmc index ./my-project
pmc serve --port 8080

MIT: https://github.com/mdayan8/pmc-engine

Top comments (1)

Harjot Singh • May 31

AST compression is a clever, underused lever: instead of dumping whole files into context, you send the structural skeleton (signatures, types, call graph) and expand only the bodies the task actually needs. 96% is plausible because most of what people paste into a model is irrelevant implementation detail; the model usually needs the shape, not every line. Treating code as a tree you can prune beats treating it as a flat blob of tokens.
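A rough sketch of that skeleton idea, assuming Python 3.9+ for ast.unparse. The skeleton function and its expand parameter are hypothetical names, not part of PMC Engine. Every function body is replaced with `...` unless its name is in the expand set, so signatures and type annotations survive while implementation detail is dropped.

```python
import ast


def skeleton(source: str, expand: frozenset[str] = frozenset()) -> str:
    """Return source with function bodies stubbed out, except names in `expand`."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        is_func = isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        if is_func and node.name not in expand:
            # Replace the whole body with a single `...` expression statement.
            node.body = [ast.Expr(value=ast.Constant(value=...))]
    return ast.unparse(tree)


SOURCE = '''
def load(path: str) -> dict:
    with open(path) as f:
        return json.load(f)

def total(items: list[int]) -> int:
    return sum(items)
'''

# Pretend the task only touches `total`, so only its body is kept.
out = skeleton(SOURCE, expand=frozenset({"total"}))
print(out)
```

Choosing the expand set per task rather than once per repository is the "dynamic" variant. A static summary would call skeleton with an empty set and reuse the result.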

The thing I'd flag: AST-aware context selection pairs beautifully with model routing. Once you've cut the context by 96%, the remaining work is small enough that cheap models handle most of it, compounding the savings. That combination (smart context + per-step routing) is exactly how Moonshift (a multi-agent pipeline shipping a prompt to a real SaaS) keeps a full build at a flat ~$3. Context compression and routing are the two biggest levers, and they multiply. Genuinely sharp technique. Are you doing the AST pruning per task (expanding only the touched nodes) or as a static summary? The dynamic per-task version is where the 96% really lives.
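For what the routing half might look like, here is a toy sketch. The model names, the 8,000-token threshold, and the 4-characters-per-token estimate are all placeholder assumptions, not how Moonshift or PMC Engine actually route.

```python
def estimate_tokens(text: str) -> int:
    """Crude estimate: roughly 4 characters per token (real tokenizers differ)."""
    return max(1, len(text) // 4)


def route(context: str, small_limit: int = 8_000) -> str:
    """Pick a hypothetical model tier based on the compressed context size."""
    if estimate_tokens(context) <= small_limit:
        return "cheap-model"   # placeholder name for a low-cost model
    return "strong-model"      # placeholder name for a larger model


compressed = "x" * 4_000      # about 1K estimated tokens after compression
uncompressed = "x" * 400_000  # about 100K estimated tokens without it

print(route(compressed), route(uncompressed))
```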