leo Yan

Posted on May 23

I built TokenPatch to measure AI coding cost per applied patch

#ai #programming #devtools #opensource

AI coding tools are getting very useful, but I kept running into one problem:

Expensive frontier models are often used for everything, including small file-scoped implementation patches.

That feels wasteful.

For many coding tasks, I want the strong model to stay in charge of planning and judgment, but I do not necessarily need it to write every narrow diff.

So I built TokenPatch.

GitHub: https://github.com/Leoyen1/tokenpatch

Website: https://tokenpatch.com

What it does

TokenPatch lets you keep using your current AI coding tool, such as Codex, Claude Code, Cursor, or an MCP-capable coding agent.

The strong model still decides what should change.

TokenPatch then routes bounded implementation work to a cheaper executor, checks the patch locally, and reports what the useful change actually cost.

The core metric is:

cost per applied patch

Not just request cost.
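
One way to read this metric, as a minimal sketch: add up what every attempt cost, including attempts whose patch was rejected, and divide by the number of patches that were actually applied. This post does not spell out the exact formula, so treat the definition below as an assumption. The `Attempt` record and `cost_per_applied_patch` function are hypothetical names for illustration, not TokenPatch's API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Attempt:
    """One executor attempt at a task (hypothetical record, not TokenPatch's schema)."""
    cost_usd: float
    applied: bool


def cost_per_applied_patch(attempts: list[Attempt]) -> Optional[float]:
    """Total spend divided by the number of applied patches.

    Assumption: rejected attempts still count toward spend.
    Returns None when no patch was applied, since the ratio is undefined.
    """
    applied = sum(1 for a in attempts if a.applied)
    if applied == 0:
        return None
    total = sum(a.cost_usd for a in attempts)
    return total / applied
```

Under this reading, a rejected $0.03 attempt followed by an applied $0.05 attempt costs $0.08 per applied patch, even though the average request cost is only $0.04.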

Example

A task might look like this:

tp: change the page title. Only modify index.html.

A report can show:

Task: change page title, only modify index.html

All-strong estimate: $0.42

TokenPatch actual: $0.08

Saved: 81%

Patch applied: yes

Tests: passed
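
The saved percentage in a report like this follows from the two dollar figures. Here is a minimal sketch of that arithmetic, assuming savings are measured relative to the all-strong estimate. The function name `saved_percent` is illustrative and not part of TokenPatch.

```python
def saved_percent(all_strong_usd: float, actual_usd: float) -> int:
    """Savings relative to the all-strong estimate, rounded to a whole percent."""
    if all_strong_usd <= 0:
        raise ValueError("all-strong estimate must be positive")
    return round((all_strong_usd - actual_usd) / all_strong_usd * 100)


# (0.42 - 0.08) / 0.42 is roughly 0.8095, which rounds to 81
print(saved_percent(0.42, 0.08))  # 81
```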

Why I built it

Most LLM cost tools focus on API requests.

But when coding with agents, I care more about task-level economics:

Did the patch actually apply?
Did it stay inside allowed files?
Did it pass validation?
How much did the accepted change cost?
Would this have been more expensive if everything used the strong model?
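
The second question, whether a patch stayed inside its allowed files, can be checked mechanically before anything is applied. The sketch below is an assumption about how such a check might work, not TokenPatch's actual implementation: it reads the file headers of a unified diff and reports any path that is not in an allowlist. The names `touched_files` and `out_of_bounds` are hypothetical.

```python
import re

# Matches a hunk header such as "@@ -1,3 +1,3 @@"; a missing count means 1.
HUNK_RE = re.compile(r"^@@ -\d+(?:,(\d+))? \+\d+(?:,(\d+))? @@")


def touched_files(diff_text: str) -> set[str]:
    """Collect the paths named in the ---/+++ headers of a unified diff."""
    paths: set[str] = set()
    old_left = new_left = 0  # body lines still expected in the current hunk
    for line in diff_text.splitlines():
        if old_left > 0 or new_left > 0:
            # Inside a hunk: consume body lines so that content such as
            # "--- x" (a removed line starting with "-- ") is not read as a header.
            if line.startswith("\\"):  # "\ No newline at end of file"
                continue
            if line.startswith("-"):
                old_left -= 1
            elif line.startswith("+"):
                new_left -= 1
            else:  # context line counts on both sides
                old_left -= 1
                new_left -= 1
            continue
        match = HUNK_RE.match(line)
        if match:
            old_left = int(match.group(1) or 1)
            new_left = int(match.group(2) or 1)
            continue
        if line.startswith(("--- ", "+++ ")):
            path = line[4:].split("\t")[0].strip()
            if path == "/dev/null":  # file added or deleted: no path on this side
                continue
            if path.startswith(("a/", "b/")):  # default git prefixes
                path = path[2:]
            paths.add(path)
    return paths


def out_of_bounds(diff_text: str, allowed: set[str]) -> set[str]:
    """Return touched paths outside the allowlist; an empty set means the patch is in bounds."""
    return touched_files(diff_text) - allowed


diff = """\
--- a/index.html
+++ b/index.html
@@ -1,3 +1,3 @@
 <head>
-<title>Old</title>
+<title>New</title>
 </head>
"""
print(out_of_bounds(diff, {"index.html"}))  # set()
```

This sketch ignores renames, binary patches, and path normalization (for example `../` segments), all of which a real boundary check would need to handle.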

That is the layer I wanted to explore.

Current status

TokenPatch is open source and BYOK-first (bring your own key).

You bring your own executor API key (currently for a DeepSeek-compatible API), and TokenPatch runs locally.

Install from GitHub:

pip install git+https://github.com/Leoyen1/tokenpatch.git

Then run:

tokenpatch bootstrap

Then use it from your coding app:

tp: implement a small change. Only modify <file>.

What I am looking for

This is still early.

I am looking for feedback from developers who use AI coding tools regularly:

Is “cost per applied patch” a useful metric?
Is the setup too hard?
Would you trust a cheaper executor if file boundaries are enforced?
What coding-agent workflows should this support next?

If you try it, I would really appreciate feedback or issues on GitHub.

I am especially interested in feedback from Codex, Claude Code, and Cursor users.