DEV Community

Patrick Clawson

How we reduced coding-agent token usage by 17.9% with an MCP server

Coding agents are powerful, but in day-to-day development they waste a lot of tokens on noisy tool output.

Running a typical cargo test or git status through generic shell tooling returns far more text than an agent needs to reason about. The model still has to read it, pay for it, and carry it in context.

I built Daimonos to reduce that waste.

What Daimonos is

Daimonos is an MCP server focused on the core coding loop:

  • read / write / edit files
  • search code
  • execute commands
  • structured git, cargo, gh, and docker operations
  • batching and script execution for fewer round trips

The key idea is simple: return compact, structured output instead of terminal spam whenever possible.
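To make the idea concrete, here is a minimal sketch of what "compact, structured output" could look like for git status. It is an illustration only, not Daimonos's actual implementation: the function name summarize_git_status and the shape of the returned summary are assumptions made for this example. It parses the standard `git status --porcelain` (v1) format, where each line starts with a two-character status code followed by a space and the path.

```python
import json
from collections import Counter


def summarize_git_status(porcelain: str) -> dict:
    """Condense `git status --porcelain` (v1) output into a small summary.

    Hypothetical helper for illustration; the output shape is an assumption.
    """
    counts = Counter()
    paths = []
    for line in porcelain.splitlines():
        if len(line) < 4:
            continue  # skip blank or malformed lines
        code, path = line[:2], line[3:]
        if code == "??":
            counts["untracked"] += 1
        elif code == "!!":
            counts["ignored"] += 1
        else:
            # First character: index (staged) state; second: worktree state.
            if code[0] != " ":
                counts["staged"] += 1
            if code[1] != " ":
                counts["unstaged"] += 1
        paths.append(path)
    return {"counts": dict(counts), "paths": paths}


# Sample porcelain output covering the common cases.
sample = " M src/main.rs\nA  src/lib.rs\n?? notes.txt\nMM Cargo.toml\n"
summary = summarize_git_status(sample)

# Serialize without extra whitespace so the agent receives as few tokens as possible.
compact_json = json.dumps(summary, separators=(",", ":"))
print(compact_json)
```

The agent gets counts and paths in one short JSON object instead of the multi-line, hint-laden text that the default git status prints.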

Why this matters

For coding agents, token usage compounds quickly across many tool calls.

Even when the final answer is simple, the path to get there can be expensive.
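A rough way to see the compounding is a toy model in which every tool output is appended to the context and the whole context is re-read on each subsequent turn. The numbers below are made up for illustration and are not taken from the benchmark; the model also ignores prompt caching and context truncation, which real agents may use.

```python
def cumulative_input_tokens(base_context: int, tokens_per_tool_output: int, calls: int) -> int:
    """Total input tokens read across `calls` turns in a simplified agent loop.

    Assumption: each tool output stays in context and the full context is
    re-sent on every turn (no caching, no truncation).
    """
    total = 0
    context = base_context
    for _ in range(calls):
        context += tokens_per_tool_output  # tool output is appended to the context
        total += context                   # the next model turn reads everything so far
    return total


# Hypothetical figures: a 2,000-token starting context and 10 tool calls.
noisy = cumulative_input_tokens(2000, 800, 10)    # verbose tool output
compact = cumulative_input_tokens(2000, 200, 10)  # trimmed tool output

print(noisy, compact)
```

In this toy model, the cost of each tool output is paid again on every later turn, so trimming output early in a task saves more than the raw size difference suggests.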

We wanted to reduce:

  • total tokens consumed
  • output-token noise
  • wall-clock time per task

Benchmark snapshot

From our benchmark runs:

  • Total tokens: 41,239 -> 33,847 (7,392 saved, -17.9%)
  • Output tokens: 5,842 -> 3,198 (-45.3%)
  • Wall time: -16.4% locally
  • Remote AWS runs: -20.3% cost, -14.0% completion time
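The token percentages follow directly from the before/after counts above. A short check, using only the figures reported in this post (the helper name percent_change is just for this example):

```python
def percent_change(before: int, after: int) -> float:
    """Relative change from `before` to `after`, in percent, rounded to one decimal."""
    return round((after - before) / before * 100, 1)


total_change = percent_change(41_239, 33_847)   # total tokens
output_change = percent_change(5_842, 3_198)    # output tokens
tokens_saved = 41_239 - 33_847

print(total_change, output_change, tokens_saved)
```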

Positioning

There are lots of great MCP servers for external APIs and workflow orchestration.

Daimonos is different: it optimizes the core coding tool path itself so agents spend less context on operational noise and more on actual reasoning.

Try it / feedback welcome

Repo: https://github.com/beardfaceguy/daimonos

If you’re running MCP in production, I’d especially love feedback on:

  • where tool-output bloat still hurts most
  • which workflows are still too noisy
  • what would block adoption in your environment
