DEV Community

leo Yan

I built TokenPatch to measure AI coding cost per applied patch

AI coding tools have become genuinely useful, but I keep running into one problem:

Expensive frontier models are often used for everything, including small, file-scoped implementation patches.

That feels wasteful.

For many coding tasks, I want the strong model to stay in charge of planning and judgment, but I do not necessarily need it to write every narrow diff.

So I built TokenPatch.

GitHub: https://github.com/Leoyen1/tokenpatch

Website: https://tokenpatch.com

What it does

TokenPatch lets you keep using your current AI coding tool, such as Codex, Claude Code, Cursor, or another MCP-capable coding agent.

The strong model still decides what should change.

TokenPatch then routes bounded implementation work to a cheaper executor, checks the patch locally, and reports what the useful change actually cost.

The core metric is:

cost per applied patch

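Not just cost per request. A request whose patch never gets applied still costs money, so the useful number is what you paid for each change that actually landed.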
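To make the metric concrete, here is a minimal sketch of one way it could be computed: total spend across every attempt, divided by the number of patches that were actually applied. This is an illustration under assumptions, not TokenPatch's real code; the `Attempt` record and the `cost_per_applied_patch` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Attempt:
    """One executor request for a task (hypothetical record, not TokenPatch's schema)."""
    cost_usd: float
    applied: bool  # True only if the resulting patch was applied


def cost_per_applied_patch(attempts: Sequence[Attempt]) -> Optional[float]:
    """Total spend over all attempts divided by the number of applied patches.

    Returns None when nothing was applied, since the ratio is undefined.
    """
    total = sum(a.cost_usd for a in attempts)
    applied = sum(1 for a in attempts if a.applied)
    return total / applied if applied else None


attempts = [
    Attempt(cost_usd=0.03, applied=False),  # a rejected patch still costs money
    Attempt(cost_usd=0.05, applied=True),
]
print(f"{cost_per_applied_patch(attempts):.2f}")  # prints 0.08
```

The failed attempt is included in the numerator on purpose: that is the difference between this number and a plain per-request cost.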

Example

A task might look like this:

tp: change the page title. Only modify index.html.

A report can show:

Task: change page title, only modify index.html

All-strong estimate: $0.42

TokenPatch actual: $0.08

Saved: 81%

Patch applied: yes

Tests: passed
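The "Saved" figure follows from the two dollar amounts. A small sketch of the arithmetic, assuming savings are measured relative to the all-strong estimate (`saved_percent` is a hypothetical helper name, not part of TokenPatch):

```python
def saved_percent(all_strong_usd: float, actual_usd: float) -> float:
    """Savings relative to the all-strong estimate, as a percentage."""
    if all_strong_usd <= 0:
        raise ValueError("all-strong estimate must be positive")
    return (all_strong_usd - actual_usd) / all_strong_usd * 100


print(f"Saved: {saved_percent(0.42, 0.08):.0f}%")  # prints Saved: 81%
```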

Why I built it

Most LLM cost tools focus on API requests.

But when coding with agents, I care more about task-level economics:

  • Did the patch actually apply?
  • Did it stay inside allowed files?
  • Did it pass validation?
  • How much did the accepted change cost?
  • Would the same change have cost more if the strong model had done everything?
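The "allowed files" question is the easiest of these to check mechanically. Below is a minimal sketch, assuming the executor returns a unified diff and the task names the files it may touch. The function names are hypothetical, and this is not how TokenPatch necessarily implements the check.

```python
from typing import Iterable, Set


def touched_files(diff_text: str) -> Set[str]:
    """Collect file paths from the ---/+++ header lines of a unified diff.

    Simplification: any line starting with "--- " or "+++ " is treated as a
    header, so a hunk line with that exact prefix would be misread as a path.
    """
    paths: Set[str] = set()
    for line in diff_text.splitlines():
        if line.startswith(("--- ", "+++ ")):
            path = line[4:].split("\t")[0].strip()
            if path == "/dev/null":  # marks a created or deleted file
                continue
            if path.startswith(("a/", "b/")):  # git-style prefixes
                path = path[2:]
            paths.add(path)
    return paths


def stays_inside(diff_text: str, allowed: Iterable[str]) -> bool:
    """True if every file the diff touches is in the allowed set."""
    return touched_files(diff_text) <= set(allowed)


diff = """\
--- a/index.html
+++ b/index.html
@@ -1,3 +1,3 @@
 <head>
-<title>Old</title>
+<title>New</title>
 </head>
"""
print(stays_inside(diff, {"index.html"}))  # prints True
print(stays_inside(diff, {"README.md"}))   # prints False
```

A patch that fails this check can be rejected before it is applied, though its request cost still counts toward the task.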

That is the layer I wanted to explore.

Current status

TokenPatch is open source and BYOK-first.

You bring your own API key for the executor, which currently must be DeepSeek-compatible, and TokenPatch runs locally.

Install from GitHub:

pip install git+https://github.com/Leoyen1/tokenpatch.git

Then run:

tokenpatch bootstrap

Then use it from your coding app:

tp: implement a small change. Only modify <file>.

What I am looking for

This is still early.

I am looking for feedback from developers who use AI coding tools regularly:

  • Is “cost per applied patch” a useful metric?
  • Is the setup too hard?
  • Would you trust a cheaper executor if file boundaries are enforced?
  • What coding-agent workflows should this support next?

If you try it, I would really appreciate feedback or issues on GitHub.

Top comments (1)

leo Yan

I am especially interested in feedback from Codex, Claude Code, and Cursor users.