ARY RABELO

Posted on • Originally published at github.com

I measured MCP vs a CLI for agent search. The MCP used 17x more tokens per call.

I ran the same Google search through SerpApi's official serpapi-mcp server and through serp, the small open-source (MIT) CLI I built for the same job. Before I had searched anything, the MCP had already put 771 tokens into the model's context. The CLI put zero. When I did search, the MCP returned 6,047 tokens and the CLI returned 351. Same query, same serpapi library underneath, same machine.

That standing cost, paid on every turn whether you search or not, is the number nobody puts in the demo. So I wrote it all down.

TL;DR: for stateless search inside an agent loop, a CLI costs roughly 0 standing tokens against ~771 per turn for an MCP tool, and ~351 per call against ~6,047. The compaction logic on both sides is identical; the CLI just trims to the fields you ask for and stays out of context when idle. Pick the transport that fits the call.

Standing cost, paid every turn

| | SerpApi MCP | serp CLI |
| --- | --- | --- |
| Tool schema in context, per turn | 771 tokens | ~0 (binary on PATH) |
| Skill metadata | n/a | ~110 tokens, and only until it triggers |

The MCP injects its search tool schema, 771 tokens, on every request. The CLI injects nothing. It's a binary on PATH; the agent learns it exists once and forgets about it until it calls it.
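To see how that per-turn cost adds up over a session, here is a minimal arithmetic sketch. It uses the measured 771-token schema and assumes no prompt caching at all, which overstates the cost on warm sessions; the function name and the turn counts are illustrative, not taken from the repo.

```python
MCP_SCHEMA_TOKENS = 771  # measured tools/list payload for the one search tool
CLI_SCHEMA_TOKENS = 0    # nothing is injected for a binary on PATH


def standing_tokens(schema_tokens: int, turns: int, servers: int = 1) -> int:
    """Schema tokens sent over a session, assuming no prompt caching."""
    return schema_tokens * servers * turns


# Turn counts are arbitrary examples, not measurements.
for turns in (1, 10, 50):
    mcp = standing_tokens(MCP_SCHEMA_TOKENS, turns)
    cli = standing_tokens(CLI_SCHEMA_TOKENS, turns)
    print(f"{turns:>3} turns: MCP {mcp:>6,} tokens, CLI {cli} tokens")
```

The `servers` parameter is there so you can plug in your own setup; it assumes every server carries a schema of the same size, which real stacks will not.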

Discovery cost, paid once

| | SerpApi MCP | serp CLI |
| --- | --- | --- |
| Learn the interface | engine resource (google.json) = 5,816 tokens | --help = ~290 tokens |

Both are on demand. To learn one engine's parameters through the MCP you read its resource, and google.json is 5,816 tokens. To learn the CLI you read --help, about 290.

Per call, the same google query, byte for byte

| Response | Tokens |
| --- | --- |
| MCP complete (the default) | 6,047 |
| MCP compact | 4,577 |
| CLI --format complete | 5,321 |
| CLI compact, no --fields | 3,940 |
| CLI --fields title,link | 351 |

The MCP's default mode is complete, so out of the box a search lands about 6,000 tokens in your context. The CLI defaults to compact and lets you ask for only the fields you want, so the same ten results come back at 351. That's roughly 17x smaller than the MCP default, and 13x smaller than the MCP's own compact mode.
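The ratios come straight from the measured token counts. This short check divides each count by the 351-token projected response; the dictionary labels are just shorthand for the modes listed above.

```python
# Measured token counts for the same google query.
tokens = {
    "MCP complete": 6047,
    "MCP compact": 4577,
    "CLI --format complete": 5321,
    "CLI compact": 3940,
    "CLI --fields title,link": 351,
}

baseline = tokens["CLI --fields title,link"]

# How many times larger each response is than the projected CLI output.
ratios = {name: round(count / baseline, 1) for name, count in tokens.items()}

for name, ratio in ratios.items():
    print(f"{name:<26}{ratio:>5}x")
```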

Why the CLI costs 13x fewer tokens

The honest part first: the compaction logic on both sides is identical. Both drop the same five metadata blocks (search_metadata, search_parameters, search_information, pagination, serpapi_pagination). SerpApi's MCP is a good piece of software, and 771 tokens for one universal tool that covers every engine is a reasonable schema, not bloat. I'm not dunking on it.

The gap comes from three things the CLI does on purpose. First, it projects fields: --fields title,link trims every result down to the keys you named, where the MCP's compact mode strips metadata but still hands back every field of every result. That one feature is most of the 13x. Second, it minifies, while the MCP pretty-prints with indent=2, which by itself is about 15% more characters. Third, it costs nothing when idle. One MCP server's standing cost is cheap on its own. The catch is that it compounds: wire up ten of them and, at this size each, you're carrying roughly 7,700 tokens of always-loaded schema before the agent has done any work.
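A rough sketch of the first two steps, compaction and field projection, in Python. This is an illustration of the idea and not the code from serp: the function names and the sample payload are made up, and it assumes the results live under SerpApi's organic_results key.

```python
import json

# The five metadata blocks both tools drop in compact mode.
METADATA_KEYS = (
    "search_metadata",
    "search_parameters",
    "search_information",
    "pagination",
    "serpapi_pagination",
)


def compact(response: dict) -> dict:
    """Drop the metadata blocks and keep everything else."""
    return {k: v for k, v in response.items() if k not in METADATA_KEYS}


def project(results: list, fields: list) -> list:
    """Keep only the named keys of each result, skipping keys a result lacks."""
    return [{f: r[f] for f in fields if f in r} for r in results]


# Made-up payload shaped like a SerpApi response, for illustration only.
response = {
    "search_metadata": {"id": "abc", "status": "Success"},
    "search_parameters": {"engine": "google", "q": "example"},
    "organic_results": [
        {
            "position": 1,
            "title": "Example Domain",
            "link": "https://example.com",
            "snippet": "Illustrative snippet text.",
        },
    ],
}

shaped = project(compact(response)["organic_results"], ["title", "link"])
print(json.dumps(shaped, separators=(",", ":")))
# prints [{"title":"Example Domain","link":"https://example.com"}]
```

Compaction alone removes whole blocks but leaves every result untouched, which is why projection accounts for most of the difference.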

Other people measured the same effect, harder

The principle isn't mine. Anthropic's framing is that the context window is a public good, and two published benchmarks point the same way. Anthropic's code-execution-with-MCP writeup took a Drive-to-Salesforce workflow from about 150,000 tokens to about 2,000 by calling tools as code instead of loading their definitions, a 98.7% cut. The OnlyCLI benchmark clocked a GitHub task at 44,026 tokens through MCP versus 1,365 through a CLI, about 32x. Those are big end-to-end scenarios with a lot of tools and intermediate results. My 13x to 17x on a single search is the small, conservative version of the same mechanism.

When you actually want the MCP

This isn't "CLI beats MCP." It's "pick the transport that fits the call."

Reach for the MCP when the connection is the hard part: OAuth or multi-user auth, server-side quota and rate-limit governance, one hosted endpoint shared by many clients, a session that holds state across steps. SerpApi runs serpapi-mcp and a hosted version at mcp.serpapi.com, and that's where it earns its keep.

Reach for the CLI when the call is stateless. Query in, results out, one step, one key in the environment, and a fat payload you want to trim before it reaches the model. A search is the textbook case. You read --help once, and every call after that returns only what you asked for.

What serp is

serp wraps SerpApi's REST endpoint and compiles to a single binary with bun build --compile, no runtime dependencies. The compact format drops the metadata blocks, --fields projects each result to the keys you name, and the geo flags (--location, --gl, --hl) only go on the wire when you set them. Output is minified JSON on stdout, because the thing reading it is a machine. The key reads from SERPAPI_API_KEY and falls back to SERP_API_KEY.
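To make the "only on the wire when you set them" behavior concrete, here is a small Python sketch of a URL builder and the key fallback. serp itself is built with bun, so this is a translation of the idea and not its source; the function names are hypothetical and the https://serpapi.com/search base URL is my assumption about the endpoint being wrapped.

```python
from typing import Mapping, Optional
from urllib.parse import urlencode

BASE_URL = "https://serpapi.com/search"  # assumed endpoint, not confirmed by the repo


def resolve_key(env: Mapping[str, str]) -> Optional[str]:
    """Prefer SERPAPI_API_KEY, then fall back to SERP_API_KEY."""
    return env.get("SERPAPI_API_KEY") or env.get("SERP_API_KEY")


def build_url(
    query: str,
    api_key: str,
    engine: str = "google",
    location: Optional[str] = None,
    gl: Optional[str] = None,
    hl: Optional[str] = None,
) -> str:
    """Build the request URL, sending geo parameters only when they are set."""
    params = {"engine": engine, "q": query, "api_key": api_key}
    optional = {"location": location, "gl": gl, "hl": hl}
    params.update({k: v for k, v in optional.items() if v is not None})
    return BASE_URL + "?" + urlencode(params)


print(build_url("coffee", "KEY"))
print(build_url("coffee", "KEY", gl="us", hl="en"))
```

Passing the environment in as a mapping, for example os.environ, keeps both functions pure and easy to test.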

The parts that matter for testing are pure functions: the URL builder, the arg parser, the result shaping. The network call and the run() entry take an injected fetch and injected streams, so the whole suite, 37 tests, runs offline with no key and no requests. That's the part I'm actually happy with.
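The injection pattern looks roughly like this in Python. It is a sketch of the testing approach only: the run signature, the error message, and the fake fetch are all hypothetical, and the real suite belongs to a bun project rather than to this code.

```python
import io
import json


def run(argv, fetch, stdout, stderr) -> int:
    """Entry point with every side effect passed in as an argument."""
    if not argv:
        stderr.write("error: missing query\n")
        return 2
    payload = fetch(" ".join(argv))  # the only place a network call would occur
    results = [
        {"title": r["title"], "link": r["link"]}
        for r in payload.get("organic_results", [])
    ]
    stdout.write(json.dumps(results, separators=(",", ":")) + "\n")
    return 0


def fake_fetch(query: str) -> dict:
    """Canned response standing in for the network."""
    return {
        "organic_results": [
            {"title": "Example", "link": "https://example.com", "snippet": "unused"}
        ]
    }


# A test swaps in the fake fetch and in-memory streams: no key, no requests.
out, err = io.StringIO(), io.StringIO()
exit_code = run(["example", "query"], fake_fetch, out, err)
print(exit_code, out.getvalue(), end="")
```

Because run never reaches for a global fetch or the real stdout, a test can assert on the exit code and the captured output directly.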

There's a Claude Code skill next to it, searching-with-serpapi, that holds the procedure: which engine fits which intent, compact vs complete, operators, when to dedup and cite, when not to search at all. It costs about 110 tokens until it triggers. Capability comes from the CLI (or the MCP), the how-to from the skill.

Two caveats I'd want if I were reading this

Prompt caching narrows the standing gap on warm sessions where the toolset doesn't change, since the static schema block gets amortized. The 771-per-turn number bites hardest on cold starts and whenever you add or swap a tool. The per-call gap doesn't care about caching; you pay it fresh on every search.

And code execution is a bigger lever than any of this. It's where the 98.7% comes from. But it needs a real sandbox with resource limits and monitoring, which a plain CLI call skips. Different tradeoff, worth naming out loud.

Bottom line

Match the transport to the call. For stateless search in a coding loop, a small CLI plus a skill is cheaper on context (about 0 standing against 771 a turn, about 350 a result against 6,000) and the standing savings stack as you add tools. For a hosted, governed, multi-client connection, the MCP is the right call.

If you are running agents with a stack of MCP servers, the standing cost is worth measuring on your own setup. The method is in the appendix, so it is easy to reproduce. I would genuinely like to know what numbers you get.

The repo is open source and MIT: github.com/aryrabelo/serpapi-agent-toolkit. The CLI and the skill ship together, both complements to SerpApi's serpapi-mcp.

Appendix: how I measured

Tokens are characters / 4, the same proxy on both sides, so trust the ratios more than the absolute numbers.

MCP standing is the real tools/list payload from serpapi-mcp running on FastMCP, counting the fields a client actually receives (name, description, inputSchema): 771 tokens for the one search tool. The server also exposes 107 engine resources; listing all of them is about 4,300 tokens, and reading google.json is 5,816.

Per call is one live google search for the same query, pulled through the same serpapi Python library the MCP uses, then serialized two ways: json.dumps(indent=2) to match the MCP, and minified with field projection to match the CLI. Exact tokens: MCP complete 6,047, MCP compact 4,577, CLI complete 5,321, CLI compact 3,940, CLI --fields title,link 351. CLI standing and --help come from the shipped v0.1.0 text, about 110 and 290.
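If you want to reproduce the method without an API key, this sketch applies the same characters / 4 proxy to a synthetic payload serialized both ways. The payload is made up, so the counts it prints will not match the table; rounding down with integer division is my assumption about how the proxy was applied.

```python
import json


def estimate_tokens(text: str) -> int:
    """Characters / 4 proxy; integer division is an assumption."""
    return len(text) // 4


# Synthetic stand-in for a search response with ten results.
payload = {
    "organic_results": [
        {
            "position": i,
            "title": f"Result {i}",
            "link": f"https://example.com/{i}",
            "snippet": "Illustrative snippet text.",
        }
        for i in range(1, 11)
    ]
}

# Pretty-printed, matching the MCP's serialization.
pretty = json.dumps(payload, indent=2)

# Minified, matching the CLI's output with no --fields.
minified = json.dumps(payload, separators=(",", ":"))

# Minified after projecting each result to title and link.
projected = json.dumps(
    [{"title": r["title"], "link": r["link"]} for r in payload["organic_results"]],
    separators=(",", ":"),
)

for label, text in (("indent=2", pretty), ("minified", minified), ("projected", projected)):
    print(f"{label:<10}{estimate_tokens(text):>6} tokens")
```

Swap the synthetic payload for a real response from your own key to get numbers comparable to the ones above.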

Sources: Anthropic's "Code execution with MCP", "Writing tools for agents", and "Effective context engineering"; the OnlyCLI token-cost benchmark; SerpApi's serpapi-mcp and serpapi-javascript repos.

Top comments (2)

Echo

17x is the kind of delta that makes you stop and rethink. The token cost per call is fine when MCP is the right shape, but most agent search workloads are not. CLI + cache was probably the right call for this benchmark.

ARY RABELO

This is exactly the read I was hoping someone would land on, and honestly you put it more cleanly than I did in the post. The one thing I'd add: the per-call gap is the part caching doesn't rescue. Prompt caching amortizes the MCP's standing schema on a warm session, but the 6,047 vs 351 is paid fresh on every single call, so it bites hardest on exactly the high-frequency search workloads you're describing. MCP genuinely earns its keep when the connection itself is the hard part: hosted, stateful, multi-client auth. Search usually isn't that. If you've run numbers like this on your own stack, I'd love to compare notes; the more real data points out there, the better.