How we reduced coding-agent token usage by 17.9% with an MCP server

#mcp #opensource #ai #developers

Coding agents are powerful, but in day-to-day development they waste a lot of tokens on noisy tool output.

A typical cargo test or git status run through a generic shell tool returns far more text than the agent needs to reason about. The model still has to read it, pay for it, and carry it in context.

I built Daimonos to reduce that waste.

What Daimonos is

Daimonos is an MCP server focused on the core coding loop:

read / write / edit files
search code
execute commands
structured git, cargo, gh, and docker operations
batching and script execution for fewer round trips
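
To make the batching idea concrete, here is a minimal sketch of running several commands in one call and returning one compact result per command. It is not Daimonos's actual API: the function name run_batch, the max_chars parameter, and the result fields are all hypothetical, chosen only for illustration.

```python
import subprocess
import sys


def run_batch(commands: list[list[str]], max_chars: int = 200) -> list[dict]:
    """Run each command and return one small record per command.

    Hypothetical sketch: only the exit status and the tail of stdout are
    kept, since the end of the output is usually where the verdict is.
    """
    results = []
    for argv in commands:
        proc = subprocess.run(argv, capture_output=True, text=True)
        out = proc.stdout.strip()
        results.append({
            "ok": proc.returncode == 0,
            "code": proc.returncode,
            "out": out[-max_chars:],  # keep only the last max_chars characters
        })
    return results


# Two commands, one round trip from the agent's point of view.
batch = run_batch([
    [sys.executable, "-c", "print('hello')"],
    [sys.executable, "-c", "import sys; sys.exit(3)"],
])
```

With a generic shell tool the agent would make one tool call per command and receive the full output each time. Here it makes a single call and gets back a short list it can scan quickly.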

The key idea is simple: return compact, structured output instead of terminal spam whenever possible.
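
As an illustration of that idea, the sketch below condenses the text of git status --porcelain (the stable, script-oriented v1 format) into a small dictionary. The function name summarize_git_status and the output shape are assumptions made for this example, not the format Daimonos returns.

```python
import json


def summarize_git_status(porcelain: str) -> dict:
    """Group paths from `git status --porcelain` (v1) output by state.

    Each line is "XY path": X is the index (staged) status and Y is the
    working-tree status. A file changed in both places appears in both lists.
    Renamed entries keep their "old -> new" text as the path.
    """
    summary = {"staged": [], "unstaged": [], "untracked": []}
    for line in porcelain.splitlines():
        if len(line) < 4:
            continue  # not a status entry
        x, y, path = line[0], line[1], line[3:]
        if x == "?" and y == "?":
            summary["untracked"].append(path)
            continue
        if x == "!":
            continue  # ignored files, only listed when --ignored is passed
        if x != " ":
            summary["staged"].append(path)
        if y != " ":
            summary["unstaged"].append(path)
    return summary


sample = "M  src/lib.rs\n M README.md\nMM src/main.rs\n?? notes.txt\n"
compact = json.dumps(summarize_git_status(sample), separators=(",", ":"))
```

The agent gets one short JSON object instead of the human-oriented status text with its hints and section headers.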

Why this matters

For coding agents, token usage compounds quickly across many tool calls.

Even when the final answer is simple, the path to get there can be expensive.
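
One way to see the compounding is a toy model. It assumes every tool output stays in context and is re-read on each later model turn, and it ignores prompt caching, system prompts, and the model's own output. The call counts and sizes below are made-up illustration values, not measurements.

```python
def cumulative_context_tokens(output_sizes: list[int]) -> int:
    """Total tool-output tokens read across all turns in the toy model.

    After each tool call the model takes one turn and reads every tool
    output produced so far, so earlier outputs are counted repeatedly.
    """
    total = 0
    context = 0
    for size in output_sizes:
        context += size   # the new output joins the context
        total += context  # the next turn reads the whole context
    return total


# Hypothetical task: 20 tool calls, noisy vs compact output per call.
noisy = cumulative_context_tokens([800] * 20)
compact_total = cumulative_context_tokens([200] * 20)
```

In this model the cost grows with the square of the number of calls, so trimming each output shrinks the total by the same factor across a much larger base.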

We wanted to reduce:

total tokens consumed
output-token noise
wall-clock time per task

Benchmark snapshot

From our benchmark runs:

Total tokens: 41,239 -> 33,847 (7,392 saved, -17.9%)
Output tokens: 5,842 -> 3,198 (-45.3%)
Wall time: -16.4% locally
Remote AWS runs: -20.3% cost, -14.0% completion time
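
The token percentages follow directly from the before and after counts above. A short check (pct_change is just a helper name for this example):

```python
def pct_change(before: float, after: float) -> float:
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100


total_saved = 41_239 - 33_847              # 7,392 tokens
total_pct = pct_change(41_239, 33_847)     # about -17.9
output_pct = pct_change(5_842, 3_198)      # about -45.3

print(f"total: {total_saved} saved, {total_pct:.1f}%")
print(f"output: {output_pct:.1f}%")
```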

Positioning

There are lots of great MCP servers for external APIs and workflow orchestration.

Daimonos is different: it optimizes the core coding tool path itself so agents spend less context on operational noise and more on actual reasoning.