
Alexandros Gounis
MCP Is Costing You 37% More Tokens Than Necessary

The Problem

MCP is powerful, but there's a cost issue that is often neglected.

MCP servers contain tools, and each tool comes with a JSON schema. As more tools are added, input token usage increases significantly.

Every MCP request sends the full JSON schema for every registered tool, including the ones the model won't actually use. There has to be a more efficient way to do this.

I built a tool called clifast that converts functions into CLI commands and I benchmarked exactly how much MCP costs vs. a CLI approach. Here's the repository, so you can test it yourself.

With just 5 tools:

  • 37% more input tokens
  • 74% larger tool definitions
  • overhead that scales linearly with tool count

The Analysis

MCP tool discovery works like this:

  1. Your client calls listTools().
  2. It gets back full JSON schemas for every tool.
  3. It converts them to the Anthropic tool format.
  4. It sends all of them on every API call. No caching, no partial loading.
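The conversion in step 3 can be sketched roughly like this. The field names follow the MCP tool listing shape (`name`, `description`, `inputSchema`) and Anthropic's tool format (`input_schema`); the sample tool and data are illustrative, not from the benchmark repo:

```typescript
// Minimal sketch: mapping MCP tool listings to Anthropic's tool format.
type McpTool = {
  name: string;
  description?: string;
  inputSchema: Record<string, unknown>;
};

type AnthropicTool = {
  name: string;
  description?: string;
  input_schema: Record<string, unknown>;
};

function toAnthropicTools(mcpTools: McpTool[]): AnthropicTool[] {
  // Every schema is forwarded wholesale -- this is the payload that
  // then rides along on every subsequent API call.
  return mcpTools.map((t) => ({
    name: t.name,
    description: t.description,
    input_schema: t.inputSchema,
  }));
}

// Illustrative sample listing (what listTools() might return for one tool).
const sample: McpTool[] = [
  {
    name: "get_forecast",
    description: "Get daily weather forecast",
    inputSchema: {
      type: "object",
      properties: { location: { type: "string" } },
    },
  },
];

console.log(JSON.stringify(toAnthropicTools(sample)[0].input_schema));
```

Note that nothing in this mapping trims or caches anything: the schemas are passed through verbatim, per tool, per request.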

The CLI alternative: one generic run_command tool with a text description of available commands. Fixed size regardless of how many commands you have.

Two approaches, same result

Both approaches use the same weather API, same model, same prompt. The only difference is how tools are presented to Claude.

MCP — 5 individual tools, each with a full Zod schema:

server.registerTool("get_forecast", {
  description: "Get daily weather forecast for a location for 1 to 3 days ahead",
  inputSchema: {
    location: z.string().describe("City name or location"),
    days: z.number().min(1).max(3).default(3).describe("Number of days to forecast (1-3)"),
  },
}, async ({ location, days }) => {
  const data = await fetchWeatherData(location);
  if (!data) return errorResponse("Weather API error");
  return successResponse({ /* ... */ });
});

// ... repeat for all 5 tools

CLI — one tool, plain text description:

const weatherCliToolDefinition: Anthropic.Messages.Tool = {
  name: "run_command",
  description: `Run a weather CLI command. Available commands:
  getCurrentWeather <location>
  getForecast <location> [days]
  getHourlyForecast <location> [day]
  getAstronomy <location>
  getWindAndPressure <location>`,
  input_schema: {
    type: "object",
    properties: {
      command: { type: "string", description: "Command name" },
      arguments: { type: "array", items: { type: "string" }, description: "Command arguments" },
    },
    required: ["command", "arguments"],
  },
};

The CLI tools are generated from plain TypeScript exports using clifast:

export async function getCurrentWeather(location: string) { /* ... */ }
export async function getForecast(location: string, days: string = "3") { /* ... */ }
export async function getHourlyForecast(location: string, day: string = "0") { /* ... */ }
export async function getAstronomy(location: string) { /* ... */ }
export async function getWindAndPressure(location: string) { /* ... */ }

One command generates the CLI:

npx clifast generate cli/get-weather.ts -y

Results

| Metric | CLI | MCP | Difference |
| --- | --- | --- | --- |
| Input tokens | 1,498 | 2,368 | CLI uses 37% fewer |
| Output tokens | 228 | 230 | Roughly equal |
| Total tokens | 1,726 | 2,598 | CLI uses 34% fewer |
| Latency | 7.3 s | 7.1 s | Comparable |
| Tool definition size | 479 B | 1,851 B | CLI is 74% smaller |

Benchmarked on claude-sonnet-4-6. Both produced equivalent, correct answers. The token gap comes entirely from tool definitions sent on each request.

The scaling math:

  • MCP grows linearly: each new tool adds its full schema to every request.
  • CLI stays constant: adding a command is one more line of text.
  • At 20 tools, you're looking at ~4x the schema overhead on every single API call, across every turn of the conversation.
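The scaling claim can be made concrete with a back-of-the-envelope model using the benchmark numbers above (1,851 B of MCP tool definitions for 5 tools, a fixed 479 B run_command definition for the CLI). The per-tool average is an assumption; real schemas vary in size:

```typescript
// Rough per-request overhead model, using the 5-tool benchmark as a baseline.
const BYTES_PER_MCP_TOOL = 1851 / 5; // ~370 B of schema per tool (average)
const CLI_TOOL_BYTES = 479;          // one fixed run_command definition

function mcpOverhead(toolCount: number): number {
  // MCP: every registered tool's schema is sent on every request.
  return Math.round(BYTES_PER_MCP_TOOL * toolCount);
}

function cliOverhead(_toolCount: number): number {
  // CLI: the run_command schema is fixed; each extra command adds only
  // one short line of description text, which we ignore here.
  return CLI_TOOL_BYTES;
}

console.log(mcpOverhead(20));                  // ~7.4 KB of schema per request
console.log(mcpOverhead(20) / mcpOverhead(5)); // 4x the 5-tool overhead
```

At 20 tools the MCP side sends roughly 7.4 KB of schema on every request, four times the 5-tool baseline, while the CLI definition has not grown at all.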

The hallucination risk (and how to mitigate it)

Let's be honest: the CLI approach trades schema precision for token efficiency. With MCP, Claude gets structured type information for every parameter. With CLI, it reads a text description and could hallucinate a command that doesn't exist or pass wrong argument types.

In practice, models like Claude are really good at understanding prompt intent. But if you need stronger guarantees:

  • Group by domain: instead of one mega-tool with 50 commands, create a few CLI tools per domain (run_weather_command, run_billing_command). You still get massive token savings over MCP while reducing the hallucination surface.
  • Validate before execution: check the command name against a known list before running it. Return a clear error if it doesn't match.
  • Be specific in descriptions: include argument types and constraints in the text (Claude respects these).
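The validate-before-execution step can be sketched as a small dispatch layer. The handler names below are the weather commands from this post, but the bodies are hypothetical stubs; plug in your real implementations:

```typescript
// Sketch: validate a model-supplied command against a known registry
// before executing it, and return a readable error otherwise.
type Handler = (args: string[]) => Promise<string> | string;

const registry: Record<string, Handler> = {
  // Hypothetical stub implementations standing in for the real functions.
  getCurrentWeather: ([location]) => `current weather for ${location}`,
  getForecast: ([location, days = "3"]) => `${days}-day forecast for ${location}`,
};

async function runCommand(command: string, args: string[]): Promise<string> {
  const handler = registry[command];
  if (!handler) {
    // A clear error instead of executing a hallucinated command;
    // the model can read this and retry with a valid name.
    const available = Object.keys(registry).join(", ");
    return `Error: unknown command "${command}". Available: ${available}`;
  }
  return handler(args);
}
```

If Claude hallucinates `getForcast`, it gets back the error string with the list of valid commands and can self-correct on the next turn instead of failing silently.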

The sweet spot is usually 5-15 commands per CLI tool. Enough to save tokens, few enough that the model doesn't get confused.

When to use which

Use MCP when interoperability matters: shared tool servers consumed by multiple clients, strict parameter validation, or when you need the ecosystem.

Use CLI when cost and speed matter: high-volume production, many tools per agent, token-constrained budgets, or when you control the full stack.

They're not mutually exclusive: you can use MCP for tool discovery and route execution through CLI tools, mixing and matching as needed.

Try it yourself

Benchmark repository: https://github.com/AlexandrosGounis/mcp-vs-cli.git

The benchmark outputs a timestamped JSON report and prints the full comparison to stdout. MCP is a great protocol. But abstractions have costs, and in AI tooling, those costs show up on every request. Measure yours.
