Scott Lepper

Build MCP Servers that don't suck...tokens.

First-generation MCP servers were great. They gave AI agents access to a ton of external apps and data — Jira, Confluence, GitHub, Linear, you name it. But most of them just wrapped REST APIs. And that causes a ton of context bloat, hallucinations, and token burning.

Combining a few strategies from the ultra-mcp-toolkit, you can reduce that bloat dramatically — and save money.

Generating a cost-efficient MCP server is easy. Just install the skill and off you go.

Here's what "dramatically" looks like

Real benchmark, live Jira instance, reproducible:

Per-call response size

| scenario | naive | with toolkit | savings |
|---|---|---|---|
| fetch 1 simple ticket | 20.3KB | 1.2KB | 17.5× |
| investigate rich ticket | 270.7KB | 15.5KB | 17.5× |
| JQL search ~10 tickets | 20.5KB | 3.5KB | 5.8× |

That rich-ticket row is the one that hurts. 270 KB → 15.5 KB. ~67k tokens down to ~3.9k tokens. Same content; the full payload still lands on disk and the agent can fetch it via a ref: path only if it actually needs the detail.
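The post does not say how bytes were converted to tokens, but the figures are roughly consistent with the common rule of thumb of about 4 bytes per token. A minimal sketch under that assumption (the `estimateTokens` helper and the divisor are illustrative, not part of the toolkit):

```typescript
// Rough token estimate, assuming ~4 bytes per token. This is a rule of
// thumb, not a measurement; real tokenizers vary with the content.
const BYTES_PER_TOKEN = 4;

function estimateTokens(bytes: number): number {
  return Math.round(bytes / BYTES_PER_TOKEN);
}

// Sizes from the rich-ticket row, taking 1 KB as 1,000 bytes.
const naiveTokens = estimateTokens(270_700);  // 67675
const trimmedTokens = estimateTokens(15_500); // 3875

console.log(naiveTokens, trimmedTokens);
```

For a real budget, count with the tokenizer of the model you actually target; JSON with many URLs and IDs often tokenizes worse than plain prose.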

Tool-list cost (paid every conversation)

| approach | bytes | ~tokens | savings |
|---|---|---|---|
| naive (one tool per op) | 38.9KB | 9,947 | |
| consolidated tools | 25.1KB | 6,427 | 1.5× |
| consolidated + filtered | ~6KB | ~1,600 | |
| code-api mode | 401B | 100 | 99× |

You read that right. Tool listings drop from ~10k tokens to ~100 tokens. On every. single. conversation.

Why MCP servers leak tokens

Four anti-patterns show up almost everywhere:

  1. Returning raw API JSON. A Jira issue carries iconUrls, nested self URLs, schema metadata, expand hints, three different shapes of the same status field. The agent needs none of it.
  2. One MCP tool per endpoint. A typical CRM has ~80 endpoints → 80 tool descriptions in the listing → ~10k tokens before the user types anything.
  3. Asking the LLM to filter or paginate. The model can't reliably page through huge structures, and the chunking logic itself costs tokens. Filtering belongs server-side.
  4. No discipline on what gets kept. Denylist trimming (delete result.iconUrl) silently breaks the day the API adds a new noisy field. Allowlists keep the contract stable.
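To make the fourth point concrete, here is a small sketch of what happens to each approach when the upstream API grows a new field. The two helpers and the sample payload are made up for illustration and are not the toolkit's API:

```typescript
type Raw = Record<string, unknown>;

// Denylist: copy everything, then delete the fields known to be noise.
function trimByDenylist(raw: Raw, deny: string[]): Raw {
  const out: Raw = { ...raw };
  for (const key of deny) delete out[key];
  return out;
}

// Allowlist: copy only the named keys that are present.
function trimByAllowlist(raw: Raw, allow: string[]): Raw {
  const out: Raw = {};
  for (const key of allow) {
    if (key in raw) out[key] = raw[key];
  }
  return out;
}

// A made-up response after the upstream API adds a new noisy field.
const response: Raw = {
  summary: "Login fails",
  status: "Open",
  iconUrl: "https://example.invalid/icon.png",
  avatarUrls: { small: "...", large: "..." }, // newly added upstream
};

const denied = trimByDenylist(response, ["iconUrl"]);
const allowed = trimByAllowlist(response, ["summary", "status"]);

console.log(Object.keys(denied));  // keys: summary, status, avatarUrls
console.log(Object.keys(allowed)); // keys: summary, status
```

The denylist version leaks `avatarUrls` to the model until someone notices and updates the list; the allowlist version never sees it.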

The fix, in three strategies

1. Allowlist-style trim projections

import { pick } from "ultra-mcp-toolkit/trim";

// Allowlist projection: keep the issue key plus four named fields; drop everything else.
const issueSummary = (raw: unknown) => {
  const r = raw as { key: string; fields: Record<string, unknown> };
  return {
    key: r.key,
    ...pick(r.fields, ["summary", "status", "priority", "assignee"]),
  };
};

Register the trim once. Every response routes through it. New API fields default to dropped. The model sees what it needs; the full response lives on disk behind a ref: path that the agent can dereference on demand.
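The register-once, route-everything idea can be sketched in a few lines. This is not the toolkit's registry: `registerTrim`, `respond`, and `dereference` are hypothetical names, the ref format is invented, and an in-memory `Map` stands in for the on-disk store.

```typescript
type Trim = (raw: unknown) => unknown;

// Hypothetical registry: operation name -> allowlist projection.
const trims = new Map<string, Trim>();

// In-memory stand-in for the on-disk store of full responses.
const fullResponses = new Map<string, unknown>();
let nextId = 0;

function registerTrim(operation: string, trim: Trim): void {
  trims.set(operation, trim);
}

// Every response goes through here: set the full payload aside and
// return only the trimmed view plus a ref to the rest.
function respond(operation: string, raw: unknown): { data: unknown; ref: string } {
  const trim = trims.get(operation);
  if (!trim) throw new Error(`no trim registered for ${operation}`);
  const ref = `ref:${operation}/${nextId++}`; // invented ref format
  fullResponses.set(ref, raw);
  return { data: trim(raw), ref };
}

function dereference(ref: string): unknown {
  return fullResponses.get(ref);
}

registerTrim("issue.get", (raw) => {
  const r = raw as { key: string; fields: Record<string, unknown> };
  return { key: r.key, summary: r.fields["summary"], status: r.fields["status"] };
});

const result = respond("issue.get", {
  key: "PROJ-1",
  self: "https://example.invalid/rest/api/issue/1",
  fields: { summary: "Login fails", status: "Open", iconUrl: "..." },
});

console.log(JSON.stringify(result.data));
```

Refusing to answer when no trim is registered is a deliberate choice in this sketch: an operation cannot accidentally ship its raw payload to the model.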

2. Consolidated tools (action-discriminated)

Instead of 80 tools, expose ~15 — each taking an action arg:

{ action: "get", issueIdOrKey: "PROJ-1" }
{ action: "create", projectKey: "PROJ", summary: "..." }
{ action: "transition", issueIdOrKey: "PROJ-1", transition: "Done" }

Same operations, roughly a fifth as many tool entries (80 down to ~15). The toolkit's dispatcher handles per-action Zod validation, manifest routing, and a full: true escape hatch for when the model genuinely needs the raw response.
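The action-discriminated shape maps naturally onto a TypeScript discriminated union. The sketch below hand-rolls the validation to stay dependency-free, whereas the toolkit uses Zod; `parseIssueAction` and `routeIssueAction` are hypothetical names, and the routing just returns a description instead of calling an API.

```typescript
// One tool, three actions. Each variant carries only its own arguments.
type IssueAction =
  | { action: "get"; issueIdOrKey: string }
  | { action: "create"; projectKey: string; summary: string }
  | { action: "transition"; issueIdOrKey: string; transition: string };

function isNonEmptyString(v: unknown): v is string {
  return typeof v === "string" && v.length > 0;
}

// Validate untrusted tool arguments per action before dispatching.
function parseIssueAction(input: unknown): IssueAction {
  if (typeof input !== "object" || input === null) {
    throw new Error("arguments must be an object");
  }
  const { action, issueIdOrKey, projectKey, summary, transition } =
    input as Record<string, unknown>;
  switch (action) {
    case "get":
      if (!isNonEmptyString(issueIdOrKey)) throw new Error("get: issueIdOrKey is required");
      return { action: "get", issueIdOrKey };
    case "create":
      if (!isNonEmptyString(projectKey) || !isNonEmptyString(summary)) {
        throw new Error("create: projectKey and summary are required");
      }
      return { action: "create", projectKey, summary };
    case "transition":
      if (!isNonEmptyString(issueIdOrKey) || !isNonEmptyString(transition)) {
        throw new Error("transition: issueIdOrKey and transition are required");
      }
      return { action: "transition", issueIdOrKey, transition };
    default:
      throw new Error(`unknown action: ${String(action)}`);
  }
}

// Stand-in for the real handlers: describe the call that would be made.
function routeIssueAction(a: IssueAction): string {
  switch (a.action) {
    case "get":
      return `GET issue ${a.issueIdOrKey}`;
    case "create":
      return `CREATE in ${a.projectKey}: ${a.summary}`;
    case "transition":
      return `TRANSITION ${a.issueIdOrKey} -> ${a.transition}`;
  }
}

console.log(routeIssueAction(parseIssueAction({ action: "get", issueIdOrKey: "PROJ-1" })));
```

Because the union is exhaustive, adding a fourth action without handling it in `routeIssueAction` becomes a compile-time error rather than a runtime surprise.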

3. Code-api mode (the 99× lever)

Expose a single MCP tool that hands the agent a path to a bundled CLI plus a socket address:

node <cli-path> issue.get --issueIdOrKey=PROJ-1
# stdout: trimmed summary as JSON
# final line: `ref: /path/to/full-response.json`

The agent drives the whole API from its shell. The tool list stays at one tool forever, no matter how many operations exist. For shell-capable agents (Claude Code, Cursor, anything with bash), it's a pure win.
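If a script needs to consume that CLI output rather than an agent, the shape shown above (JSON followed by a final `ref:` line) is easy to split. A minimal sketch, assuming exactly that layout; `parseCliOutput` is a hypothetical helper, not something the toolkit is known to ship:

```typescript
interface CliResult {
  summary: unknown; // trimmed JSON printed by the CLI
  ref: string;      // path to the full response on disk
}

// Assumes stdout is JSON followed by a final `ref: <path>` line.
// Any other output shape is rejected rather than guessed at.
function parseCliOutput(stdout: string): CliResult {
  const lines = stdout.trim().split("\n");
  const last = lines.pop() ?? "";
  const prefix = "ref: ";
  if (!last.startsWith(prefix)) {
    throw new Error("expected a final `ref: <path>` line");
  }
  const summary: unknown = JSON.parse(lines.join("\n"));
  return { summary, ref: last.slice(prefix.length).trim() };
}

const sample = [
  '{"key":"PROJ-1","summary":"Login fails","status":"Open"}',
  "ref: /path/to/full-response.json",
].join("\n");

const parsed = parseCliOutput(sample);
console.log(parsed.ref);
```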

Quick start

npm install ultra-mcp-toolkit

The toolkit ships a Claude Code skill that auto-loads when you work on an MCP server. Install it:

npm run install-skill

That's it. The skill walks the agent through manifest design, trim projections, dispatcher wiring, and server boot — the patterns that produce the numbers above.

Working from a non-Claude agent (Codex CLI, Cursor, Aider, Continue, Zed)? Point it at the skill markdown directly — AGENTS.md shows you how.

What's in the box

  • Operation manifest — declare endpoints as pure data; powers MCP tools, CLI, and code-api bridge from one source of truth.
  • Trim registry — type-safe allowlist projections.
  • Content-addressed sandbox — full responses land on disk; the model sees a ref: only.
  • Page cache — versioned-id disk cache for stable keys (PR diffs by SHA, Confluence pages by version).
  • Pooled retry-aware HTTP transport — undici + 429-aware retry honoring Retry-After.
  • Atomic streaming downloads — sha256-verified, path-traversal-safe.
  • Consolidated tool dispatcher — Zod-validated, action-discriminated.
  • CLI scaffolding — bridge mode + direct mode, free with createCli.
  • Bundled Claude Code skill — installs in one command.
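As one illustration of the "429-aware retry honoring Retry-After" item, the core of such a transport is deciding how long to wait. The sketch below is not the toolkit's implementation; `retryDelayMs` and its 1-second fallback are assumptions. It handles both forms the header can take: a number of seconds or an HTTP date.

```typescript
// Compute how long to wait before retrying a 429 response.
// Retry-After is either a number of seconds or an HTTP date.
function retryDelayMs(
  retryAfter: string | null,
  nowMs: number,
  fallbackMs = 1000, // assumed default when the header is absent or unparseable
): number {
  if (retryAfter === null) return fallbackMs;
  const value = retryAfter.trim();
  if (/^\d+$/.test(value)) return Number(value) * 1000;
  const dateMs = Date.parse(value);
  if (isNaN(dateMs)) return fallbackMs;
  // A date in the past means "retry now", never a negative wait.
  return Math.max(0, dateMs - nowMs);
}

console.log(retryDelayMs("120", Date.now())); // 120000
```

Passing `nowMs` in rather than calling `Date.now()` inside keeps the function pure, which makes the date branch testable.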

Production proof

Used in ultra-jira-mcp and ultra-bitbucket-mcp. The benchmark numbers above come from the Jira server running against a real Jira Cloud instance — every byte measured is one a production agent would actually receive.


If you're building an MCP server for any enterprise API — Jira, Confluence, GitHub, Linear, Notion, ServiceNow, Salesforce, whatever — and your token bill or context window is starting to bite, give it a try.

github.com/scottlepp/ultra-mcp-toolkit — issues, PRs, and benchmark contributions welcome.

What's the most token-bloated MCP server you've shipped or seen? Drop it in the comments — I'm collecting horror stories.

Top comments (1)

Gilder Miller

A Salesforce MCP wrapper I saw returned the full object schema on every tool call. We're talking 80KB responses for a simple get contact by ID query. The agent spent half its context window just parsing metadata it never used.
The allowlist projection approach is the right call. Most APIs return way more than any agent actually needs. 🫠