<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: tim zhang</title>
    <description>The latest articles on DEV Community by tim zhang (@tim_zhang11).</description>
    <link>https://dev.to/tim_zhang11</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3956122%2Ffd4451c8-763a-4675-8a63-be1bfe8476b7.png</url>
      <title>DEV Community: tim zhang</title>
      <link>https://dev.to/tim_zhang11</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/tim_zhang11"/>
    <language>en</language>
    <item>
      <title>I Measured MCP vs CLI for Agent Tool Use — MCP Used 17x More Tokens Per Call</title>
      <dc:creator>tim zhang</dc:creator>
      <pubDate>Wed, 03 Jun 2026 03:49:21 +0000</pubDate>
      <link>https://dev.to/tim_zhang11/i-measured-mcp-vs-cli-for-agent-tool-use-mcp-used-17x-more-tokens-per-call-3egc</link>
      <guid>https://dev.to/tim_zhang11/i-measured-mcp-vs-cli-for-agent-tool-use-mcp-used-17x-more-tokens-per-call-3egc</guid>
      <description>&lt;h2&gt;
  
  
  The Setup
&lt;/h2&gt;

&lt;p&gt;I've been building AI agents that use tools — reading files, running commands, calling APIs. There are two main ways to give agents these tools:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;MCP (Model Context Protocol)&lt;/strong&gt; — the new standard everyone's adopting&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Direct CLI calls&lt;/strong&gt; — good old command-line execution&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Everyone says MCP is the future. But nobody talks about the &lt;strong&gt;token cost&lt;/strong&gt;. So I measured it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Test
&lt;/h2&gt;

&lt;p&gt;I built a simple file-reading tool and measured the approximate token consumption and average latency per call for each approach:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Method&lt;/th&gt;
&lt;th&gt;Tokens per Call&lt;/th&gt;
&lt;th&gt;Latency (avg)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;MCP (structured)&lt;/td&gt;
&lt;td&gt;~3,400 tokens&lt;/td&gt;
&lt;td&gt;280ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CLI + raw output&lt;/td&gt;
&lt;td&gt;~200 tokens&lt;/td&gt;
&lt;td&gt;45ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Ratio&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;17x&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;6x&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Why MCP Uses So Many Tokens
&lt;/h2&gt;

&lt;p&gt;The overhead comes from three places:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Tool Schema in Every Request
&lt;/h3&gt;

&lt;p&gt;An MCP client includes the &lt;strong&gt;full JSON Schema&lt;/strong&gt; of every available tool in each request to the LLM. My simple file-reader tool definition alone is ~800 tokens. With 10+ tools of similar size, that's 8,000+ tokens of schema on every single call.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"read_file"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Read contents of a file at given path"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"inputSchema"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"object"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"properties"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"path"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"File path to read"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"required"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"path"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
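
&lt;p&gt;To make the scaling concrete, here is a rough sketch of the arithmetic. It assumes every tool definition costs about the same as the ~800-token file reader above, which will not hold exactly in practice. The function name is made up for illustration.&lt;/p&gt;

```python
# Rough estimate of schema tokens added to each LLM request.
# Assumption: every tool definition costs about the same as the ~800-token
# file-reader figure quoted above; real definitions vary in size.
TOKENS_PER_SCHEMA = 800


def schema_overhead(tool_count, tokens_per_schema=TOKENS_PER_SCHEMA):
    """Return the estimated schema tokens sent with every request."""
    return tool_count * tokens_per_schema


for count in (1, 5, 10):
    print(f"{count} tools: about {schema_overhead(count)} schema tokens per request")
```

&lt;p&gt;The point of the sketch is only that this cost is paid per request, not per tool actually used, so it grows with the number of registered tools.&lt;/p&gt;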



&lt;h3&gt;
  
  
  2. Structured Response Wrapping
&lt;/h3&gt;

&lt;p&gt;MCP wraps every response in a structured envelope with metadata, status codes, and typed content blocks. A simple "file not found" error becomes a 200-token JSON object.&lt;/p&gt;
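
&lt;p&gt;As an illustration, the sketch below wraps a plain error string in a minimal MCP-style tool result inside a JSON-RPC response and compares sizes. The field layout follows my reading of the MCP spec; the request id and the error wording are invented for the example. It counts characters, not tokens, and a real server may attach more fields, so treat the difference as a lower bound.&lt;/p&gt;

```python
import json

# Raw CLI-style error: just the message text (example wording).
raw_error = "cat: /tmp/missing.txt: No such file or directory"

# Sketch of an MCP-style tool result wrapped in a JSON-RPC response.
# The id value and message are illustrative; real servers may add more fields.
mcp_error = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "content": [{"type": "text", "text": raw_error}],
        "isError": True,
    },
}

wrapped = json.dumps(mcp_error)

# Character counts are only a rough proxy for tokens.
print(len(raw_error), "characters raw")
print(len(wrapped), "characters wrapped")
print(len(wrapped) - len(raw_error), "characters of envelope")
```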

&lt;h3&gt;
  
  
  3. Round-Trip Protocol Overhead
&lt;/h3&gt;

&lt;p&gt;Each MCP call involves: request → server parse → execute → format response → return → client parse → extract. Every hop adds latency, and any protocol framing that ends up in the model's context adds tokens.&lt;/p&gt;

&lt;h2&gt;
  
  
  The CLI Alternative
&lt;/h2&gt;

&lt;p&gt;With direct CLI execution:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; /path/to/file.txt
&lt;span class="o"&gt;[&lt;/span&gt;raw file content]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. Raw input, raw output. No schemas, no envelopes, no metadata.&lt;/p&gt;
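
&lt;p&gt;Inside an agent, the same idea could look like the sketch below. The helper name is hypothetical, and it assumes a Unix-like system where cat is on the PATH.&lt;/p&gt;

```python
import subprocess


def read_file_via_cli(path):
    """Run cat on the path and return its raw output, or the raw error text."""
    # Passing a list avoids shell interpolation of the path.
    result = subprocess.run(["cat", path], capture_output=True, text=True)
    if result.returncode != 0:
        # Hand the plain stderr text back to the agent, with no wrapping.
        return result.stderr.strip()
    return result.stdout
```

&lt;p&gt;Whatever the function returns goes to the model as plain text, so the only tokens spent are the command and its output.&lt;/p&gt;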

&lt;h2&gt;
  
  
  When MCP Is Worth It
&lt;/h2&gt;

&lt;p&gt;Despite the token cost, MCP shines when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;You need standardized discovery&lt;/strong&gt; — agents dynamically finding available tools&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You're building reusable tool servers&lt;/strong&gt; — one MCP server serves many agents&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security sandboxing matters&lt;/strong&gt; — MCP's permission model is more granular&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Team collaboration&lt;/strong&gt; — shared tool definitions across projects&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Hybrid Approach (What I Use Now)
&lt;/h2&gt;

&lt;p&gt;Here's my practical setup:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Simple, frequent operations&lt;/strong&gt; → CLI (file reads, basic shell commands)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Complex, structured operations&lt;/strong&gt; → MCP (database queries, API calls with schemas)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cache aggressively&lt;/strong&gt; — regardless of method, never call twice when once suffices&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This hybrid setup cut my tool-related token usage by &lt;strong&gt;60%&lt;/strong&gt; while keeping MCP's benefits where they matter.&lt;/p&gt;
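
&lt;p&gt;The routing rule and the cache can be sketched in a few lines. This is not my production code: the operation names and the placeholder result string are invented for illustration, and a real agent would execute the call where the placeholder is built.&lt;/p&gt;

```python
from functools import lru_cache

# Hypothetical operation names, chosen for illustration only.
CLI_OPERATIONS = {"read_file", "list_dir"}


def choose_transport(operation):
    """Route simple, frequent operations to the CLI and the rest to MCP."""
    return "cli" if operation in CLI_OPERATIONS else "mcp"


@lru_cache(maxsize=256)
def call_tool(operation, argument):
    """Cached dispatch: repeated identical calls skip the backend entirely."""
    transport = choose_transport(operation)
    # Placeholder result; a real agent would execute the call here.
    return f"{transport}:{operation}({argument})"


print(call_tool("read_file", "notes.txt"))
print(call_tool("query_db", "SELECT 1"))
print(call_tool("read_file", "notes.txt"))  # served from the cache
print(call_tool.cache_info().hits)
```

&lt;p&gt;One caveat with caching: results for things that change, such as files being edited, can go stale, so a real setup needs some invalidation rule.&lt;/p&gt;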

&lt;h2&gt;
  
  
  The Numbers Over a Day of Agent Work
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;MCP-only&lt;/th&gt;
&lt;th&gt;Hybrid&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Total tool calls&lt;/td&gt;
&lt;td&gt;847&lt;/td&gt;
&lt;td&gt;847&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Token cost (tools)&lt;/td&gt;
&lt;td&gt;2.88M&lt;/td&gt;
&lt;td&gt;1.15M&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost (@ $3/1M tokens)&lt;/td&gt;
&lt;td&gt;$8.64&lt;/td&gt;
&lt;td&gt;$3.45&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Savings&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$5.19/day (60%)&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
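
&lt;p&gt;The table can be rechecked with a few lines of arithmetic. The sketch takes the call count, the per-call MCP figure, the price, and the hybrid total straight from the numbers above; the variable names are mine.&lt;/p&gt;

```python
CALLS_PER_DAY = 847
MCP_TOKENS_PER_CALL = 3_400
PRICE_PER_MILLION = 3.00  # dollars per 1M tokens
HYBRID_TOKENS = 1_150_000  # hybrid total as reported in the table

mcp_tokens = CALLS_PER_DAY * MCP_TOKENS_PER_CALL
mcp_cost = mcp_tokens / 1_000_000 * PRICE_PER_MILLION
hybrid_cost = HYBRID_TOKENS / 1_000_000 * PRICE_PER_MILLION
savings = mcp_cost - hybrid_cost

print(f"MCP-only tokens: {mcp_tokens:,}")    # 2,879,800
print(f"MCP-only cost: ${mcp_cost:.2f}")     # $8.64
print(f"Hybrid cost: ${hybrid_cost:.2f}")    # $3.45
print(f"Savings: ${savings:.2f} ({savings / mcp_cost:.0%})")  # $5.19 (60%)
```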

&lt;h2&gt;
  
  
  Takeaways
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Measure your own costs&lt;/strong&gt; — token usage varies wildly by tool complexity&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Not all tools need MCP&lt;/strong&gt; — simple operations are cheaper as direct calls&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Schema size matters&lt;/strong&gt; — minimize your MCP tool parameter definitions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hybrid is pragmatic&lt;/strong&gt; — use MCP where it adds value, CLI where it doesn't&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The 17x ratio isn't fixed&lt;/strong&gt; — simpler tools = smaller gap, complex tools = larger gap&lt;/li&gt;
&lt;/ol&gt;




&lt;p&gt;&lt;em&gt;Have you measured your agent's token efficiency? What did you find? Let me know in the comments.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;#ai #llm #agents #mcp #productivity&lt;/p&gt;

</description>
      <category>ai</category>
      <category>mcpllm</category>
    </item>
  </channel>
</rss>
