DEV Community

Benjamin Eckstein

Posted on • Originally published at codewithagents.de

The 22,000 Token Tax: Why I Killed My MCP Server

I was at a company workshop, arguing with beginners about token costs.

They wanted to save money. Reasonable instinct. They were spending maybe €25 a week on API calls and wanted to cut it to €20. I pushed back hard: "You're at the learning stage. Spend more, not less. Explore. Break things. Create costs. Because while you're saving €5, I'm spending €600 a week — and I'll gladly spend €20 more if it means finishing a ticket in one session instead of two."

Then I told them the one scenario where token consumption actually matters: when you need to prolong a session. Not to save money — to preserve context. Because when your session compacts or resets, you lose everything the model was holding in its head. And in the early days of Claude Code, there was no auto-compact. Your session just died with an error when you hit the limit. Auto-compact made this better, but you never know what survives the squeeze. Research confirms what I've felt in practice: context length alone hurts LLM performance, even when the relevant information is right there. The longer your context, the worse the output — a phenomenon sometimes called context rot. So every unnecessary token you load at startup is a tax on the quality of everything that follows.

I came home that evening and opened a new session. Ran /context. Stared at the breakdown.

22,000 tokens in MCP tools alone. Before I typed a single prompt.

The Receipt

I had three MCP servers running: mcp-atlassian for Jira and Confluence, chrome-devtools for browser automation, and context7 for documentation lookups. Together they cost 22K tokens. But the Atlassian server was the one I could kill — it was registering 33 tools for a service where I used six.

I'd gone through the settings and disabled as many as I could — but the server kept loading all of them. Confluence tools I never used. Batch operations. Sprint management. Worklog tracking. None of it mattered.

All 33 tools. About 10,000 tokens. Every single session.

I compared the numbers. One skill — 40 tokens of metadata. One MCP tool — 300 tokens of schema. The Atlassian MCP was loading tools I had explicitly told it not to load.

The Setting That Doesn't

Here's what disabledTools actually does in Claude Code: it prevents the AI from calling a tool. That's it.

It does not prevent the MCP server from starting. It does not prevent the server from registering its tools. It does not prevent those tool schemas from being injected into the context window. The Docker container still spins up. The tool definitions still flow in. The tokens still burn. disabledTools is a runtime filter, not a context optimization. I was disappointed — if the setting exists in the configuration, you'd expect the platform to be smart enough to not load what you've explicitly disabled. But that's not how it works.

The only way to actually save the tokens is to remove the MCP server entirely.

The Replacement: 7 Scripts

I looked at what I actually use. Six Jira operations. Zero Confluence operations. Out of 33 registered tools, I needed six.

So I wrote shell scripts. The same pattern I already use for Jenkins and Slack — credentials in a JSON file under ~/.config/, curl calls with Bearer token auth, jq for parsing responses.

The first script took five minutes. Authentication worked on the first try — just Authorization: Bearer <token> with the same personal access token the MCP had been using. No Docker container. No protocol negotiation. No tool registration. Just curl.

  TOKEN=$(jq -r '.personal_token' ~/.config/jira/credentials.json)
  BASE_URL=$(jq -r '.base_url' ~/.config/jira/credentials.json)

  curl -s -k -H "Authorization: Bearer $TOKEN" \
    "$BASE_URL/rest/api/2/issue/PROJ-123"

The credentials file should be chmod 600 (owner-only read/write).

The -k flag skips SSL certificate verification because our internal Jira uses a self-signed cert — don't copy that for public endpoints. And yes, the token ends up in the process list briefly via shell variable expansion. For a local developer workstation running personal scripts, that's an acceptable trade-off. For a shared server or CI pipeline, you'd want to pipe credentials through stdin instead.
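For the shared-server case, one way to keep the token out of argv (a sketch, not one of the original scripts, assuming the same `~/.config/jira/credentials.json` layout) is to hand curl its `Authorization` header as a config file on stdin via `-K -`:

```shell
# Sketch: build a curl config line from the credentials file and pipe it
# to curl on stdin, so the token never appears in the process list.
CREDS="$HOME/.config/jira/credentials.json"
jq -r '"header = \"Authorization: Bearer \(.personal_token)\""' "$CREDS" \
  | curl -s -k -K - "$(jq -r '.base_url' "$CREDS")/rest/api/2/myself"
```

The base URL still shows up in argv, but it isn't a secret; only the bearer token needed hiding.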

Cairn built all six scripts in under an hour. I fed the Jira REST API documentation into the session for context, described the pattern I wanted, and Cairn wrote the scripts, tested them against our live Jira, and verified each one worked. I gave it a real ticket number to go wild on — fetch, update, transition, comment, the full lifecycle. Then we fine-tuned the scripts to bake in our project defaults: the right component, the right team label, the custom fields our board requires.

The six operations: get issue, search with JQL, update fields, add comment, get transitions, transition status. Each script reads credentials, makes a curl call, formats the output. No abstraction layer. No protocol. No 300-token tool schema.

Then I added a seventh: create issue.

The Thing MCP Could Never Do

Creating Jira tickets through MCP never worked reliably. I'd hit the MCP permission wall before — specialized agents couldn't even access the tools. But even when access worked, the actual creation flow — with custom fields, project-specific components, team assignments — always hit edge cases that the MCP abstraction couldn't handle cleanly.

The curl script created a ticket on the first try.

  curl -s -k -X POST \
    -H "Authorization: Bearer $TOKEN" \
    -H "Content-Type: application/json" \
    -d '{"fields": {
      "project": {"key": "PROJ"},
      "issuetype": {"name": "Task"},
      "summary": "Test ticket",
      "components": [{"name": "Frontend"}],
      "customfield_12345": [{"value": "Team-A"}]
    }}' \
    "$BASE_URL/rest/api/2/issue"

HTTP 201. The ticket existed. With the right component, the right team, the right assignee. First try.

The MCP had been sitting between me and a REST API that was perfectly willing to cooperate. It was abstracting away complexity that didn't exist.

The Abstraction Tax

MCP is a good idea for getting started. You install a server, you get tools, you're productive in minutes. For someone spending €25 a week who's still learning, that's the right trade-off. The setup cost is zero and the token cost doesn't matter because you're not pushing session limits.

When you're 5,428 prompts deep into a persistent agent system, running multi-agent workflows that eat 100K+ tokens per ticket, every unnecessary token at startup compresses the useful work you can do before quality starts degrading. I've learned this lesson before — 23K tokens burned loading a bloated memory file. Now it was 10K tokens burned loading Jira tools I'd explicitly disabled. Same tax, different landlord.

And here's the part that bothered me most: I couldn't partially load the MCP server. It's all or nothing. Want 6 tools? You get 33. Want to disable the other 27? You can — but you still pay for all 33 in your context. The protocol has no mechanism for selective tool registration based on client preferences.

So I replaced it:

  1. 33 MCP tools → 7 shell scripts
  2. ~10,000 tokens per session → 0 tokens at startup
  3. A Docker container on every launch → no container
  4. Issue creation broken → issue creation works
  5. Tool schemas you can't customize → scripts you own completely

The seven scripts total about 700 lines of bash. They live in my skill directory, version-controlled, testable. I can read them. I can debug them. I can add project-specific defaults — like auto-applying the default component and team for every ticket in our project. Try doing that in an MCP tool schema.

And I know exactly what they do. That MCP server was a Docker image pulled from a third-party registry, running with my Jira credentials baked into environment variables. I never audited that image. I never read its source. Every docker pull could have shipped a different binary. When your integration is 700 lines of bash that you wrote and can read end to end, supply chain risk isn't a concern — it's just curl.

When to Graduate

MCP stops making sense the moment you're paying for tools you don't use and can't shed. When you need 6 tools but get 33. When 10K tokens burn before your first prompt. When you need capabilities the server doesn't expose. When you need project-specific behavior that the protocol can't express. That's when you graduate.

The graduation path is simple: credentials file, curl, jq. The same tools that powered the internet before every API got wrapped in an abstraction layer. They still work. They're still faster. And you own them completely.

They don't cost you a single token to say hello.

What I Actually Learned

This isn't new. It's what every software engineer has done since the beginning: make it work first, then optimize. The MCP got me running. It was the right choice when I was figuring out how to wire an AI agent to Jira at all. But once it worked, the job was to look at the bill and cut the waste. That's not AI-specific wisdom — that's just engineering.

Integrations have carrying costs. An MCP server isn't free just because it's open-source. A tool registry isn't free just because the tools are disabled. Every abstraction layer between your code and the API it talks to has a price — in tokens, in debuggability, in flexibility, in the things you can't do because the abstraction didn't anticipate your use case.

Sometimes the best integration is the one with no integration layer at all.

