CLI vs MCP: Which Tool Interface Actually Works for AI Coding Agents?

Your AI coding agent needs to call a tool — maybe list open pull requests, query a database, or trigger a deployment. You have two broad options: point the agent at a CLI binary it can shell out to, or wire up a Model Context Protocol (MCP) server. Both get the job done. The difference in token cost, reliability, and long-term maintainability is significant enough to matter for real production workloads.

This article breaks down how each approach works mechanically, where each falls short, and how leading tools like Claude Code and Cursor are handling the tradeoff in practice.

How Each Interface Works

CLI tools are the default assumption. An agent receives a system prompt that says "you can run bash commands," then uses git, gh, curl, jq, or any other binary the host has installed. The model treats shell output as plain text and parses it inline. No special protocol — just stdin/stdout. Because these tools appear heavily in public training data, models can use them without being handed an explicit schema first.
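
As a rough illustration of the CLI path, the sketch below shows the kind of wrapper an agent host might put around shell commands. The function name run_cli and the MAX_CHARS cap are hypothetical choices for this example, not part of any particular agent framework.

```python
import subprocess
import sys

MAX_CHARS = 4000  # assumed cap on returned text; tune to your context budget


def run_cli(argv: list[str], timeout: float = 30.0) -> str:
    """Run a command and return plain text for the model to read inline."""
    proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    if proc.returncode == 0:
        output = proc.stdout
    else:
        # Surface the exit code and stderr so the model can see what failed.
        output = f"exit {proc.returncode}\n{proc.stderr}"
    if len(output) > MAX_CHARS:
        # Cap long output so one noisy command cannot flood the context.
        output = output[:MAX_CHARS] + "\n[output truncated]"
    return output


if __name__ == "__main__":
    # Stand-in command; an agent would pass something like ["gh", "pr", "list"].
    print(run_cli([sys.executable, "-c", "print('3 open pull requests')"]))
```

There is no schema anywhere in this path: the model chooses the arguments from what it already knows about the tool, and everything that comes back is text.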

MCP is a JSON-RPC 2.0 protocol, released by Anthropic in November 2024 and since adopted by OpenAI, Google, and others. An MCP server runs as a separate process (or remote service) and exposes a list of typed tools via a tools/list response. The client typically injects the full tool catalogue, including parameter schemas, descriptions, and annotations, into the model's context window on each conversation turn. When the model decides to use a tool, the client sends a tools/call request with structured arguments and hands the structured response back to the model; the MCP server handles auth, validation, and side effects.
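
To make the message flow concrete, here is a minimal sketch of the JSON-RPC messages involved, built as plain Python dictionaries. The tool name list_pull_requests, its fields, and the repository value are hypothetical; only the envelope (jsonrpc, id, method, params, result) and the tools/list and tools/call method names come from the protocol.

```python
import json

# Client -> server: ask for the tool catalogue.
list_request = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}

# Server -> client: one entry per tool, each with a JSON Schema for its input.
# The tool shown here is a made-up example.
list_response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "tools": [
            {
                "name": "list_pull_requests",
                "description": "List open pull requests for a repository.",
                "inputSchema": {
                    "type": "object",
                    "properties": {"repo": {"type": "string"}},
                    "required": ["repo"],
                },
            }
        ]
    },
}

# Client -> server: invoke one tool with structured arguments.
call_request = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {"name": "list_pull_requests", "arguments": {"repo": "octo/example"}},
}

for message in (list_request, list_response, call_request):
    print(json.dumps(message, indent=2))
```

Every tool in the catalogue carries a description and schema like the one above, which is where the context overhead discussed later comes from.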

The protocol has shipped three transports: stdio (same machine, subprocess), SSE (server-sent events over HTTP), and the newer streamable HTTP transport, added in the March 2025 spec revision as the replacement for SSE. Each adds different operational complexity.

MCP is modeled on the Language Server Protocol. Just as LSP standardized "how editors talk to language tools," MCP standardizes "how AI agents talk to external services." That analogy explains both its strength (write once, works in any MCP-compatible agent) and its weakness (LSP servers also have notorious cold-start and schema verbosity problems).

The Token Cost Problem Is Real — But Uneven

The most widely cited difference is context overhead. When a GitHub MCP server with 93 tools loads into a conversation, the full schema injection runs to roughly 55,000 tokens before you've sent a single request. For larger enterprise setups, multi-server configurations can consume 100,000–150,000 tokens in tool definitions alone. On a 200k context window, the single GitHub server takes about 27% of the budget and the multi-server case takes 50–75%, all before the agent does anything useful.
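
The share of the window taken by tool definitions is simple division, sketched here for the token counts quoted above. The helper name context_share is made up for this example.

```python
def context_share(schema_tokens: int, window_tokens: int = 200_000) -> float:
    """Fraction of the context window used by tool definitions alone."""
    return schema_tokens / window_tokens


for schema_tokens in (55_000, 100_000, 150_000):
    share = context_share(schema_tokens)
    print(f"{schema_tokens:>7,} tokens -> {share:.1%} of a 200k window")
```

For the figures quoted above this works out to 27.5% for the single 55,000-token server and 50% to 75% for the 100,000–150,000-token multi-server range.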

CLI doesn't carry this fixed overhead. A single gh pr list call with its output costs somewhere in the 900–3,000 token range total.

One developer's benchmark comparing the two approaches on the same tasks showed CLI at roughly 4,150 tokens versus MCP at roughly 145,000 tokens for a device-listing query — a 35x difference. A separate cost-projection from Vensas GmbH estimated $3.20 vs. $55.20 per month at 10,000 operations.

These numbers come with important caveats. The CLI-side cost grows with output length — if your git log pipes gigabytes of history, you pay for that too. And MCP's upfront cost is amortized across a long session: if you're making 200 tool calls in one conversation, you only pay the schema overhead once. The cost gap narrows for long agentic sessions with many tool calls.
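
A back-of-the-envelope sketch of that amortization follows. It assumes the schema overhead really is paid once per session and that each call costs the same 2,000 tokens under either interface; both are simplifying assumptions, and whether the schema is billed again on later turns depends on the client and on prompt caching.

```python
def per_call_tokens(fixed_overhead: int, variable_per_call: int, calls: int) -> float:
    """Average tokens per tool call once a one-time overhead is spread out."""
    return fixed_overhead / calls + variable_per_call


SCHEMA_OVERHEAD = 55_000  # one-time MCP schema injection (GitHub figure above)
VARIABLE = 2_000          # assumed per-call cost, identical for both interfaces

for calls in (1, 10, 200):
    mcp = per_call_tokens(SCHEMA_OVERHEAD, VARIABLE, calls)
    cli = per_call_tokens(0, VARIABLE, calls)
    print(f"{calls:>3} calls: MCP {mcp:>8,.0f} vs CLI {cli:>6,.0f} tokens/call")
```

Under these assumptions, a 200-call session spreads the 55,000-token overhead down to 275 tokens per call, while a single-call session carries all of it.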

Reliability data is also worth watching carefully. A 75-run benchmark showed CLI at roughly 100% success versus MCP's 72%, with TCP timeout failures driving most of MCP's misses. But a separate reliability study across 100 production MCP servers found the top decile passing 95%+ of trials — suggesting MCP server quality varies dramatically. Of 3,000+ MCP servers catalogued across public registries as of April 2026, only 12.9% met documented quality thresholds.

Where Each One Actually Belongs

The practical answer isn't CLI or MCP — it's that the choice depends on what you're connecting to and who's running the agent.

Use CLI when:

The tool has a first-class CLI (git, gh, kubectl, aws, stripe). These are heavily represented in model training data, meaning the agent can use them without a schema. Reliability is local. Composability with pipes is free.
You're doing local filesystem operations, builds, or test runs. make test and grep -r don't need MCP.
You're on a tight context budget or running many short sessions. The per-call cost stays predictable.

Use MCP when:

The service you're integrating has no CLI — Figma, Notion, Linear, custom internal APIs. Writing a CLI wrapper just to avoid MCP adds more maintenance than the MCP server would.
You need per-user OAuth. Shared CLI credentials mean one compromised token affects every user; MCP's structured auth model lets you revoke individual access.
You're building a product where other agents or tools need to discover capabilities at runtime. MCP's self-description is a genuine advantage when the tool surface isn't known ahead of time.
You need structured input validation. MCP's JSON Schema validation catches bad arguments before they hit your API. CLI tools typically fail on bad input with cryptic stderr output.
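
To illustrate the validation point, the sketch below checks tool arguments against a schema before anything reaches the backing API. It is a deliberately tiny stand-in that handles only required fields and a few primitive types; a real server would use a full JSON Schema validator. The deploy schema and its fields are hypothetical.

```python
from typing import Any

# Maps a few JSON Schema type names to Python types (illustrative subset).
_TYPES = {"string": str, "integer": int, "boolean": bool, "object": dict, "array": list}


def validate_args(schema: dict[str, Any], args: dict[str, Any]) -> list[str]:
    """Return human-readable problems; an empty list means the call may proceed."""
    errors = []
    for name in schema.get("required", []):
        if name not in args:
            errors.append(f"missing required argument: {name}")
    for name, value in args.items():
        prop = schema.get("properties", {}).get(name)
        if prop is None:
            continue  # undeclared arguments are ignored in this sketch
        expected = _TYPES.get(prop.get("type"))
        # bool is a subclass of int in Python, so reject it explicitly for integers.
        wrong_bool = expected is int and isinstance(value, bool)
        if expected and (wrong_bool or not isinstance(value, expected)):
            errors.append(f"{name}: expected {prop['type']}, got {type(value).__name__}")
    return errors


# Hypothetical input schema for a deployment tool.
DEPLOY_SCHEMA = {
    "type": "object",
    "properties": {"environment": {"type": "string"}, "replicas": {"type": "integer"}},
    "required": ["environment"],
}

if __name__ == "__main__":
    print(validate_args(DEPLOY_SCHEMA, {"replicas": "two"}))
```

The agent gets back a specific, structured list of problems it can correct on the next attempt, rather than whatever a CLI happens to print to stderr.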

The emerging pattern among serious agentic systems reflects this split. Claude Code uses built-in local tools (Bash, Read, Write, Edit) for filesystem and process operations, with CLI binaries invoked through Bash, while MCP handles integrations with SaaS services where CLI alternatives don't exist or require complex auth. Cursor follows a similar hybrid model. The tools aren't competing; they occupy different parts of the same pipeline.

Closing the Gap: Techniques Worth Knowing

The raw token cost of MCP is a known problem, and several approaches have emerged to address it without abandoning the protocol.

Lazy schema loading. Anthropic's code-execution approach to MCP reported a 98.7% token reduction (from roughly 150,000 to 2,000 tokens) by exposing MCP tools as TypeScript modules that the agent discovers on demand rather than injecting the full schema upfront.

Fewer, better tools. Cloudflare's Code Mode reduced 2,500 API endpoints from 1.17 million tokens to roughly 1,000 by providing two discovery tools — one that lists capability categories, one that retrieves the schema for a specific capability — rather than dumping everything at once.
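
The two ideas above share one shape: expose a cheap index first and fetch full definitions only when needed. The sketch below shows that shape in its simplest form. It is not Anthropic's or Cloudflare's implementation; the catalogue contents and the function names list_categories and get_schema are hypothetical.

```python
from typing import Any

# Hypothetical full catalogue; in practice this could hold thousands of entries.
CATALOGUE: dict[str, dict[str, Any]] = {
    "dns.list_records": {
        "description": "List DNS records in a zone.",
        "inputSchema": {
            "type": "object",
            "properties": {"zone": {"type": "string"}},
            "required": ["zone"],
        },
    },
    "dns.create_record": {
        "description": "Create a DNS record.",
        "inputSchema": {
            "type": "object",
            "properties": {"zone": {"type": "string"}, "name": {"type": "string"}},
            "required": ["zone", "name"],
        },
    },
    "storage.list_buckets": {
        "description": "List storage buckets.",
        "inputSchema": {"type": "object", "properties": {}},
    },
}


def list_categories() -> dict[str, list[str]]:
    """Discovery tool 1: capability names grouped by category, without schemas."""
    grouped: dict[str, list[str]] = {}
    for name in sorted(CATALOGUE):
        category = name.split(".", 1)[0]
        grouped.setdefault(category, []).append(name)
    return grouped


def get_schema(name: str) -> dict[str, Any]:
    """Discovery tool 2: the full definition of one capability, fetched on demand."""
    if name not in CATALOGUE:
        raise KeyError(f"unknown capability: {name}")
    return CATALOGUE[name]


if __name__ == "__main__":
    print(list_categories())
    print(get_schema("dns.list_records"))
```

Only the definitions of these two functions need to sit in the context window at the start; each schema enters the conversation only when the agent asks for it.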

MCP gateways with connection pooling. Gateway patterns can push MCP reliability from the unmanaged median (~71%) toward 99% by handling reconnection and caching. Tools like Cloudflare's Workers AI gateway and open-source alternatives address this.
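
A minimal sketch of the retry and caching half of that pattern is shown below; it does not cover connection pooling. The ToolGateway class, its parameters, and the backoff schedule are assumptions for illustration, and the wrapped call stands in for whatever function actually talks to the MCP server.

```python
import json
import time
from typing import Any, Callable


class ToolGateway:
    """Wraps a flaky tool call with retries, exponential backoff, and a small cache."""

    def __init__(
        self,
        call: Callable[[str, dict[str, Any]], Any],
        retries: int = 3,
        base_delay: float = 0.5,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._call = call
        self._retries = retries
        self._base_delay = base_delay
        self._sleep = sleep  # injectable so tests do not have to wait
        self._cache: dict[str, Any] = {}

    def invoke(self, name: str, args: dict[str, Any], cacheable: bool = False) -> Any:
        # Arguments must be JSON-serializable to build the cache key.
        key = json.dumps([name, args], sort_keys=True)
        if cacheable and key in self._cache:
            return self._cache[key]
        for attempt in range(self._retries + 1):
            try:
                result = self._call(name, args)
                break
            except (ConnectionError, TimeoutError):
                if attempt == self._retries:
                    raise  # out of retries: let the caller see the failure
                self._sleep(self._base_delay * 2 ** attempt)
        if cacheable:
            self._cache[key] = result
        return result
```

Blind retries are only safe for calls that are idempotent, so a gateway like this should be limited to read-only tools or to servers that document idempotent behavior.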

If you're building an agent that needs to scale token efficiency with MCP, lazy schema loading is the approach most likely to matter. Upfront tool enumeration is a default behavior, not a protocol requirement.

The MCP ecosystem is still young. As of early 2026, the spec has gone through two major revisions in 18 months, and server-to-server compatibility issues are common. Avoid taking a dependency on undocumented server behaviors or assuming a given MCP server is production-hardened without testing it. Check whether it handles typed schemas, idempotency, and explicit cancellation — the traits that separate the top-decile servers from the median.

Making the Call

For most developer workflows running on a single machine, start with CLI. The tooling is already there, the model already knows how to use it, and the operational overhead is near zero. Reach for MCP when you hit the cases it genuinely solves: SaaS integrations without a CLI, per-user auth requirements, or toolsets that need to be discoverable at runtime by agents you don't control.

The question isn't which interface is better in the abstract — it's which one matches the shape of the problem. CLI tools are better for local, fast, composable operations with predictable token costs. MCP is better for structured, auth-aware, self-describing integrations to external services. Both will likely coexist in any non-trivial agentic system.

Originally published at pickuma.com.