<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: ARY RABELO</title>
    <description>The latest articles on DEV Community by ARY RABELO (@ary_rabelo_7fce97b75d6dbd).</description>
    <link>https://dev.to/ary_rabelo_7fce97b75d6dbd</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1541595%2F74d5952e-5b34-4aa9-8492-5c7e7a276e26.jpg</url>
      <title>DEV Community: ARY RABELO</title>
      <link>https://dev.to/ary_rabelo_7fce97b75d6dbd</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ary_rabelo_7fce97b75d6dbd"/>
    <language>en</language>
    <item>
      <title>I measured MCP vs a CLI for agent search. The MCP used 17x more tokens per call.</title>
      <dc:creator>ARY RABELO</dc:creator>
      <pubDate>Tue, 02 Jun 2026 20:49:16 +0000</pubDate>
      <link>https://dev.to/ary_rabelo_7fce97b75d6dbd/i-measured-mcp-vs-a-cli-for-agent-search-the-mcp-used-17x-more-tokens-per-call-43p6</link>
      <guid>https://dev.to/ary_rabelo_7fce97b75d6dbd/i-measured-mcp-vs-a-cli-for-agent-search-the-mcp-used-17x-more-tokens-per-call-43p6</guid>
      <description>&lt;p&gt;I ran the same Google search through SerpApi's official &lt;a href="https://github.com/serpapi/serpapi-mcp" rel="noopener noreferrer"&gt;serpapi-mcp&lt;/a&gt; server and through &lt;a href="https://github.com/aryrabelo/serpapi-agent-toolkit" rel="noopener noreferrer"&gt;&lt;code&gt;serp&lt;/code&gt;&lt;/a&gt;, the small open-source (MIT) CLI I built for the same job. Before I had searched anything, the MCP had already put 771 tokens into the model's context. The CLI put zero. When I did search, the MCP returned 6,047 tokens and the CLI returned 351. Same query, same &lt;code&gt;serpapi&lt;/code&gt; library underneath, same machine.&lt;/p&gt;

&lt;p&gt;That standing cost, paid on every turn whether you search or not, is the number nobody puts in the demo. So I wrote it all down.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; for stateless search inside an agent loop, a CLI costs roughly 0 standing tokens against ~771 per turn for an MCP tool, and ~351 per call against ~6,047. The compaction logic on both sides is identical; the CLI just trims to the fields you ask for and stays out of context when idle. Pick the transport that fits the call.&lt;/p&gt;

&lt;h2&gt;Standing cost, paid every turn&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;SerpApi MCP&lt;/th&gt;
&lt;th&gt;
&lt;code&gt;serp&lt;/code&gt; CLI&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Tool schema in context, per turn&lt;/td&gt;
&lt;td&gt;771 tokens&lt;/td&gt;
&lt;td&gt;~0 (binary on &lt;code&gt;PATH&lt;/code&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Skill metadata&lt;/td&gt;
&lt;td&gt;n/a&lt;/td&gt;
&lt;td&gt;~110 tokens, and only until it triggers&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The MCP injects its &lt;code&gt;search&lt;/code&gt; tool schema, 771 tokens, on every request. The CLI injects no tool schema at all. It's a binary on &lt;code&gt;PATH&lt;/code&gt;; the agent learns once that it exists and spends no context on it again until it calls it.&lt;/p&gt;
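&lt;p&gt;A back-of-the-envelope sketch of how that per-turn cost accumulates. It assumes every connected server resends its full schema unchanged on every turn and ignores prompt caching. &lt;code&gt;sessionStandingTokens&lt;/code&gt; is a hypothetical helper, not part of &lt;code&gt;serp&lt;/code&gt; or &lt;code&gt;serpapi-mcp&lt;/code&gt;, and the ten-server case simply assumes every server is the size of SerpApi's.&lt;/p&gt;

```typescript
// Hypothetical helper: tool-schema tokens carried across a session.
// Assumes each server resends its full schema on every turn; no caching.
function sessionStandingTokens(schemaTokensPerServer: number[], turns: number): number {
  const perTurn = schemaTokensPerServer.reduce((sum, tokens) => sum + tokens, 0);
  return perTurn * turns;
}

// One SerpApi MCP server (771 tokens per turn) over a 20-turn session.
const oneServer = sessionStandingTokens([771], 20);

// Ten servers, each assumed to be as large as the SerpApi schema, for a single turn.
const tenServers = sessionStandingTokens(new Array(10).fill(771), 1);

// A CLI on PATH contributes no tool schema.
const cliOnly = sessionStandingTokens([], 20);

console.log({ oneServer, tenServers, cliOnly });
```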

&lt;h2&gt;Discovery cost, paid once&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;SerpApi MCP&lt;/th&gt;
&lt;th&gt;
&lt;code&gt;serp&lt;/code&gt; CLI&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Learn the interface&lt;/td&gt;
&lt;td&gt;engine resource (&lt;code&gt;google.json&lt;/code&gt;) = 5,816 tokens&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;--help&lt;/code&gt; = ~290 tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Both are on demand. To learn one engine's parameters through the MCP you read its resource, and &lt;code&gt;google.json&lt;/code&gt; is 5,816 tokens. To learn the CLI you read &lt;code&gt;--help&lt;/code&gt;, about 290.&lt;/p&gt;

&lt;h2&gt;Per call, the same google query, byte for byte&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Response&lt;/th&gt;
&lt;th&gt;Tokens&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;MCP &lt;code&gt;complete&lt;/code&gt; (the default)&lt;/td&gt;
&lt;td&gt;6,047&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MCP &lt;code&gt;compact&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;4,577&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CLI &lt;code&gt;--format complete&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;5,321&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CLI compact, no &lt;code&gt;--fields&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;3,940&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CLI &lt;code&gt;--fields title,link&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;351&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The MCP's default mode is &lt;code&gt;complete&lt;/code&gt;, so out of the box a search lands about 6,000 tokens in your context. The CLI defaults to compact and lets you ask for only the fields you want, so the same ten results come back at 351. That's roughly 17x smaller than the MCP default, and 13x smaller than the MCP's own compact mode.&lt;/p&gt;
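&lt;p&gt;To see how the three kinds of cost combine, here is a rough session model in TypeScript. It plugs in the figures from the tables above, treats the CLI's standing cost as zero (the ~110-token skill metadata is left out), assumes the MCP agent never reads an engine resource, and ignores caching. The session shape, 20 turns with 5 searches, is invented for illustration.&lt;/p&gt;

```typescript
// Rough context-cost model for one agent session, built from the figures in
// this post. Ignores caching; leaves out the CLI's ~110-token skill metadata;
// assumes the MCP agent never reads an engine resource.
interface TransportCost {
  standingPerTurn: number; // schema tokens resent on every turn
  discoveryOnce: number; // one-time cost to learn the interface
  perCall: number; // tokens returned by one search
}

const mcpDefault: TransportCost = { standingPerTurn: 771, discoveryOnce: 0, perCall: 6047 };
const cliWithFields: TransportCost = { standingPerTurn: 0, discoveryOnce: 290, perCall: 351 };

function sessionTokens(cost: TransportCost, turns: number, searches: number): number {
  return cost.standingPerTurn * turns + cost.discoveryOnce + cost.perCall * searches;
}

// An invented session shape: 20 turns, 5 searches.
const mcpTotal = sessionTokens(mcpDefault, 20, 5);
const cliTotal = sessionTokens(cliWithFields, 20, 5);

// The per-call ratios behind "roughly 17x" and "13x".
const vsMcpDefault = 6047 / 351;
const vsMcpCompact = 4577 / 351;

console.log(mcpTotal, cliTotal, vsMcpDefault.toFixed(1), vsMcpCompact.toFixed(1));
```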

&lt;h2&gt;Why the CLI costs 13x fewer tokens&lt;/h2&gt;

&lt;p&gt;The honest part first: the compaction logic on both sides is identical. Both drop the same five metadata blocks (&lt;code&gt;search_metadata&lt;/code&gt;, &lt;code&gt;search_parameters&lt;/code&gt;, &lt;code&gt;search_information&lt;/code&gt;, &lt;code&gt;pagination&lt;/code&gt;, &lt;code&gt;serpapi_pagination&lt;/code&gt;). SerpApi's MCP is a good piece of software, and 771 tokens for one universal tool that covers every engine is a reasonable schema, not bloat. I'm not dunking on it.&lt;/p&gt;

&lt;p&gt;The gap comes from three things the CLI does on purpose. First, it projects fields: &lt;code&gt;--fields title,link&lt;/code&gt; trims every result down to the keys you named, whereas the MCP's compact mode strips metadata but still hands back every field of every result. That one feature accounts for most of the 13x. Second, it minifies, while the MCP pretty-prints with &lt;code&gt;indent=2&lt;/code&gt;, which by itself adds about 15% more characters. Third, and separate from the per-call gap, it costs nothing when idle. One MCP server's standing cost is cheap on its own. The catch is that it compounds: wire up ten of them and you're carrying several thousand tokens of always-loaded schema (ten schemas the size of this one would be about 7,700) before the agent has done any work.&lt;/p&gt;
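&lt;p&gt;A minimal sketch of the first two steps in TypeScript. It assumes the results live under &lt;code&gt;organic_results&lt;/code&gt;, SerpApi's key for Google web results, and uses the same characters / 4 proxy as the appendix. &lt;code&gt;compact&lt;/code&gt;, &lt;code&gt;projectFields&lt;/code&gt;, and &lt;code&gt;estimateTokens&lt;/code&gt; are hypothetical names; this is an illustration, not the source of &lt;code&gt;serp&lt;/code&gt; or &lt;code&gt;serpapi-mcp&lt;/code&gt;.&lt;/p&gt;

```typescript
// Hypothetical sketch of compact mode plus field projection; not the actual
// serp or serpapi-mcp source.
type Json = { [key: string]: unknown };

// The five metadata blocks both tools drop in compact mode.
const METADATA_KEYS = [
  "search_metadata",
  "search_parameters",
  "search_information",
  "pagination",
  "serpapi_pagination",
];

// Drop the metadata blocks, keep everything else.
function compact(response: Json): Json {
  const out: Json = {};
  for (const [key, value] of Object.entries(response)) {
    if (!METADATA_KEYS.includes(key)) out[key] = value;
  }
  return out;
}

// Keep only the named keys on each result (assumes results are in organic_results).
function projectFields(response: Json, fields: string[]): Json {
  const results = (response.organic_results ?? []) as Json[];
  const projected = results.map((result) =>
    Object.fromEntries(fields.filter((f) => f in result).map((f) => [f, result[f]])),
  );
  return { ...response, organic_results: projected };
}

// Same proxy as the appendix: tokens are roughly characters / 4 (rounding is an assumption).
const estimateTokens = (text: string): number => Math.round(text.length / 4);

// A tiny made-up response; real responses carry ten results and many more fields.
const sample: Json = {
  search_metadata: { id: "abc", status: "Success" },
  organic_results: [
    { position: 1, title: "Example", link: "https://example.com", snippet: "An example page." },
  ],
};

const pretty = JSON.stringify(sample, null, 2); // roughly like Python's json.dumps(indent=2)
const minified = JSON.stringify(projectFields(compact(sample), ["title", "link"]));

console.log(estimateTokens(pretty), estimateTokens(minified), minified);
```

&lt;p&gt;Minifying is just &lt;code&gt;JSON.stringify&lt;/code&gt; without its third (indent) argument.&lt;/p&gt;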

&lt;h2&gt;Other people measured the same effect, harder&lt;/h2&gt;

&lt;p&gt;The principle isn't mine. Anthropic's framing is that the context window is a public good, and two published benchmarks point the same way. Their code-execution-with-MCP writeup took a Drive-to-Salesforce workflow from about 150,000 tokens to about 2,000 by calling tools as code instead of loading their definitions, a 98.7% cut. The OnlyCLI benchmark clocked a GitHub task at 44,026 tokens through MCP versus 1,365 through a CLI, about 32x. Those are big end-to-end scenarios with a lot of tools and intermediate results. My 13-17x on a single search is the small, conservative version of the same mechanism.&lt;/p&gt;

&lt;h2&gt;When you actually want the MCP&lt;/h2&gt;

&lt;p&gt;This isn't "CLI beats MCP." It's "pick the transport that fits the call."&lt;/p&gt;

&lt;p&gt;Reach for the MCP when the connection is the hard part: OAuth or multi-user auth, server-side quota and rate-limit governance, one hosted endpoint shared by many clients, a session that holds state across steps. SerpApi maintains &lt;code&gt;serpapi-mcp&lt;/code&gt; and runs a hosted version at &lt;code&gt;mcp.serpapi.com&lt;/code&gt;, and that's where it earns its keep.&lt;/p&gt;

&lt;p&gt;Reach for the CLI when the call is stateless. Query in, results out, one step, one key in the environment, and a fat payload you want to trim before it reaches the model. A search is the textbook case. You read &lt;code&gt;--help&lt;/code&gt; once, and every call after that returns only what you asked for.&lt;/p&gt;

&lt;h2&gt;What serp is&lt;/h2&gt;

&lt;p&gt;It wraps SerpApi's REST endpoint and compiles to a single binary with &lt;code&gt;bun build --compile&lt;/code&gt;, so there are no runtime dependencies. &lt;code&gt;compact&lt;/code&gt; mode drops the metadata blocks, &lt;code&gt;--fields&lt;/code&gt; projects each result to the keys you name, and the geo flags (&lt;code&gt;--location&lt;/code&gt;, &lt;code&gt;--gl&lt;/code&gt;, &lt;code&gt;--hl&lt;/code&gt;) only go on the wire when you set them. Output is minified JSON on stdout, because the thing reading it is a machine. The API key is read from &lt;code&gt;SERPAPI_API_KEY&lt;/code&gt;, falling back to &lt;code&gt;SERP_API_KEY&lt;/code&gt;.&lt;/p&gt;
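&lt;p&gt;A minimal sketch of a URL builder that only puts the geo flags on the wire when they are set, plus the key fallback. It assumes SerpApi's &lt;code&gt;https://serpapi.com/search.json&lt;/code&gt; endpoint and its &lt;code&gt;engine&lt;/code&gt;, &lt;code&gt;q&lt;/code&gt;, &lt;code&gt;location&lt;/code&gt;, &lt;code&gt;gl&lt;/code&gt;, &lt;code&gt;hl&lt;/code&gt;, and &lt;code&gt;api_key&lt;/code&gt; query parameters. &lt;code&gt;buildSearchUrl&lt;/code&gt; and &lt;code&gt;resolveApiKey&lt;/code&gt; are hypothetical names, not the repo's actual functions.&lt;/p&gt;

```typescript
// Hypothetical sketch, not the serp source. Assumes SerpApi's
// https://serpapi.com/search.json endpoint and its query parameter names.
interface SearchOptions {
  engine: string;
  q: string;
  location?: string;
  gl?: string; // country code, e.g. "us"
  hl?: string; // interface language, e.g. "en"
}

type Env = { [name: string]: string | undefined };

// Prefer SERPAPI_API_KEY and fall back to SERP_API_KEY; empty strings count as unset.
function resolveApiKey(env: Env): string | undefined {
  return env.SERPAPI_API_KEY || env.SERP_API_KEY || undefined;
}

function buildSearchUrl(opts: SearchOptions, apiKey: string): string {
  const url = new URL("https://serpapi.com/search.json");
  url.searchParams.set("engine", opts.engine);
  url.searchParams.set("q", opts.q);
  // Geo parameters only go on the wire when the caller actually set them.
  for (const key of ["location", "gl", "hl"] as const) {
    const value = opts[key];
    if (value) url.searchParams.set(key, value);
  }
  url.searchParams.set("api_key", apiKey);
  return url.toString();
}

const apiKey = resolveApiKey({ SERP_API_KEY: "test-key" }) ?? "";
console.log(buildSearchUrl({ engine: "google", q: "model context protocol", gl: "us" }, apiKey));
```

&lt;p&gt;Using &lt;code&gt;||&lt;/code&gt; rather than &lt;code&gt;??&lt;/code&gt; means an empty &lt;code&gt;SERPAPI_API_KEY&lt;/code&gt; also falls through to &lt;code&gt;SERP_API_KEY&lt;/code&gt;; that choice belongs to this sketch, not to documented &lt;code&gt;serp&lt;/code&gt; behavior.&lt;/p&gt;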

&lt;p&gt;The parts that matter for testing are pure functions: the URL builder, the arg parser, the result shaping. The network call and the &lt;code&gt;run()&lt;/code&gt; entry point take an injected &lt;code&gt;fetch&lt;/code&gt; and injected streams, so the whole suite (37 tests) runs offline, with no key and no network requests. That's the part I'm actually happy with.&lt;/p&gt;
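&lt;p&gt;A minimal sketch of that injection pattern: &lt;code&gt;run&lt;/code&gt; receives its HTTP function and its output stream as arguments, so a test can hand it a canned response and capture stdout with no key and no network. &lt;code&gt;run&lt;/code&gt;, &lt;code&gt;fetchText&lt;/code&gt;, &lt;code&gt;Deps&lt;/code&gt;, and &lt;code&gt;demo&lt;/code&gt; are hypothetical names, not the repo's real signatures, and the endpoint and &lt;code&gt;organic_results&lt;/code&gt; key are assumptions about SerpApi's Google JSON.&lt;/p&gt;

```typescript
// Hypothetical sketch of an offline-testable CLI entry point: the HTTP call
// and the output stream are injected instead of imported.

// Production implementation: a real GET that returns the body as text.
async function fetchText(url: string) {
  const res = await fetch(url);
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  return res.text();
}

// The injected type is derived from the real function: (url) => Promise of string.
type FetchText = typeof fetchText;

interface Writable {
  write(chunk: string): unknown;
}

interface Deps {
  fetchText: FetchText;
  stdout: Writable;
}

type Result = { title?: string; link?: string };

async function run(query: string, apiKey: string, deps: Deps) {
  const params = new URLSearchParams({ engine: "google", q: query, api_key: apiKey });
  const body = JSON.parse(await deps.fetchText(`https://serpapi.com/search.json?${params}`));
  const results: Result[] = body.organic_results ?? [];
  // Project each result to title and link, then write minified JSON.
  const projected = results.map((r) => ({ title: r.title, link: r.link }));
  deps.stdout.write(JSON.stringify(projected) + "\n");
  return 0; // exit code
}

// Offline demo: a canned response and an in-memory stdout. No key, no network.
async function demo() {
  const requested: string[] = [];
  const written: string[] = [];
  const exitCode = await run("mcp vs cli", "test-key", {
    fetchText: async (url) => {
      requested.push(url);
      return JSON.stringify({ organic_results: [{ title: "A", link: "https://a.example", snippet: "x" }] });
    },
    stdout: { write: (chunk) => written.push(chunk) },
  });
  return { exitCode, requested, output: written.join("") };
}

demo().then((r) => console.log(r.exitCode, r.output.trim()));
```

&lt;p&gt;Deriving &lt;code&gt;FetchText&lt;/code&gt; from the real function with &lt;code&gt;typeof&lt;/code&gt; keeps the fake and the production implementation on the same type without writing it out twice.&lt;/p&gt;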

&lt;p&gt;There's a Claude Code skill next to it, &lt;code&gt;searching-with-serpapi&lt;/code&gt;, that holds the procedure: which engine fits which intent, compact vs complete, operators, when to dedup and cite, when not to search at all. It costs about 110 tokens until it triggers. Capability comes from the CLI (or the MCP), the how-to from the skill.&lt;/p&gt;

&lt;h2&gt;Two caveats I'd want if I were reading this&lt;/h2&gt;

&lt;p&gt;Prompt caching narrows the standing gap on warm sessions where the toolset doesn't change, since the static schema block gets amortized. The 771-per-turn number bites hardest on cold starts and whenever you add or swap a tool. The per-call gap doesn't care about caching; you pay it fresh on every search.&lt;/p&gt;

&lt;p&gt;And code execution is a bigger lever than any of this. It's where the 98.7% comes from. But it needs a real sandbox with resource limits and monitoring, which a plain CLI call skips. Different tradeoff, worth naming out loud.&lt;/p&gt;

&lt;h2&gt;Bottom line&lt;/h2&gt;

&lt;p&gt;Match the transport to the call. For stateless search in a coding loop, a small CLI plus a skill is cheaper on context (about 0 standing tokens against 771 a turn, and about 350 per search against 6,000), and the standing savings stack as you add tools. For a hosted, governed, multi-client connection, the MCP is the right call.&lt;/p&gt;

&lt;p&gt;If you are running agents with a stack of MCP servers, the standing cost is worth measuring on your own setup. The method is in the appendix, so it is easy to reproduce. I would genuinely like to know what numbers you get.&lt;/p&gt;

&lt;p&gt;The repo is open source and MIT: &lt;a href="https://github.com/aryrabelo/serpapi-agent-toolkit" rel="noopener noreferrer"&gt;github.com/aryrabelo/serpapi-agent-toolkit&lt;/a&gt;. The CLI and the skill ship together, both complements to SerpApi's &lt;code&gt;serpapi-mcp&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;Appendix: how I measured&lt;/h2&gt;

&lt;p&gt;Tokens are &lt;code&gt;characters / 4&lt;/code&gt;, the same proxy on both sides, so trust the ratios more than the absolute numbers.&lt;/p&gt;

&lt;p&gt;MCP standing is the real &lt;code&gt;tools/list&lt;/code&gt; payload from &lt;code&gt;serpapi-mcp&lt;/code&gt; running on FastMCP, counting the fields a client actually receives (&lt;code&gt;name&lt;/code&gt;, &lt;code&gt;description&lt;/code&gt;, &lt;code&gt;inputSchema&lt;/code&gt;): 771 tokens for the one &lt;code&gt;search&lt;/code&gt; tool. The server also exposes 107 engine resources; listing all of them is about 4,300 tokens, and reading &lt;code&gt;google.json&lt;/code&gt; is 5,816.&lt;/p&gt;
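&lt;p&gt;The standing measurement can be sketched as follows: keep each tool's &lt;code&gt;name&lt;/code&gt;, &lt;code&gt;description&lt;/code&gt;, and &lt;code&gt;inputSchema&lt;/code&gt; from the &lt;code&gt;tools/list&lt;/code&gt; result, serialize them, and divide the character count by four. This is a hypothetical TypeScript illustration, not the script behind the numbers. Minified serialization and rounding are assumptions, and the sample tool is made up.&lt;/p&gt;

```typescript
// Hypothetical sketch of the standing-cost measurement with the characters / 4 proxy.
// Assumes the MCP tools/list result (shape: { tools: [...] }) was already fetched,
// and that the payload is serialized as minified JSON.
interface ToolDefinition {
  name: string;
  description?: string;
  inputSchema: object;
  [extra: string]: unknown; // other fields the server sends are ignored below
}

function schemaTokens(tools: ToolDefinition[]): number {
  // Keep only name, description, and inputSchema, as in the measurement above.
  const visible = tools.map(({ name, description, inputSchema }) => ({ name, description, inputSchema }));
  return Math.round(JSON.stringify(visible).length / 4);
}

// A made-up, tiny tool; the real serpapi-mcp search schema is far larger.
const listResult = {
  tools: [
    {
      name: "search",
      description: "Search the web.",
      inputSchema: { type: "object", properties: { q: { type: "string" } }, required: ["q"] },
    },
  ],
};

console.log(schemaTokens(listResult.tools));
```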

&lt;p&gt;Per call is one live google search for the same query, pulled through the same &lt;code&gt;serpapi&lt;/code&gt; Python library the MCP uses, then serialized two ways: &lt;code&gt;json.dumps(indent=2)&lt;/code&gt; to match the MCP, and minified (with field projection for the &lt;code&gt;--fields&lt;/code&gt; row) to match the CLI. Counts under that proxy: MCP complete 6,047, MCP compact 4,577, CLI complete 5,321, CLI compact 3,940, CLI &lt;code&gt;--fields title,link&lt;/code&gt; 351. The CLI's standing figure (the skill metadata, about 110) and the &lt;code&gt;--help&lt;/code&gt; figure (about 290) come from the shipped v0.1.0 text.&lt;/p&gt;

&lt;p&gt;Sources: Anthropic's "Code execution with MCP", "Writing tools for agents", and "Effective context engineering"; the OnlyCLI token-cost benchmark; SerpApi's &lt;code&gt;serpapi-mcp&lt;/code&gt; and &lt;code&gt;serpapi-javascript&lt;/code&gt; repos.&lt;/p&gt;

</description>
      <category>mcp</category>
      <category>ai</category>
      <category>typescript</category>
      <category>llm</category>
    </item>
  </channel>
</rss>
