
Your AI agent wastes 13,000 tokens before saying "hello"

And you probably have no idea.


If you have an agent with 50 MCP tools installed, here's what happens before any user message is processed:

{
  "name": "gmail_send_email",
  "description": "Sends an email message via the Gmail API to one or more 
    recipients. Use this tool when the user explicitly requests to send, 
    compose and send, or deliver an email message to someone.",
  "input_schema": {
    "type": "object",
    "required": ["to", "subject", "body"],
    "properties": {
      "to": {
        "type": "string",
        "description": "The recipient email address or comma-separated list"
      },
      "subject": {
        "type": "string",
        "description": "The subject line of the email"
      },
      "body": {
        "type": "string",
        "description": "The body content of the email in plain text or HTML"
      }
    }
  }
}

That's ~195 tokens. Per tool. Before anything else.

50 tools × 195 tokens = 9,750 tokens of pure overhead.

And that's just the catalog. You haven't touched user context, conversation history, documents, or anything useful yet.


"But there's prompt caching, right?"

Yes. It reduces the financial cost to ~10% of the base rate.

But caching does not reduce attention cost.

Those tokens still occupy the context window. The model still attends to all of them on every request. And if you use dynamic tool retrieval — selecting different tools per request based on user intent — the cache breaks on every different selection.
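Why does dynamic selection break the cache? Prompt caches are prefix-based: the provider reuses cached work only when a request starts with byte-identical content. A toy sketch of the mechanism (hypothetical cache keyed on a prefix hash):

```python
import hashlib

def cache_key(system_prompt: str, tool_catalog: str) -> str:
    # Prefix caches match on exact bytes: any change in the serialized
    # tool catalog produces a different key, i.e. a cache miss.
    return hashlib.sha256((system_prompt + tool_catalog).encode()).hexdigest()

catalog_a = "gmail_send_email, calendar_create_event"  # tools picked for request 1
catalog_b = "gmail_send_email, github_create_issue"    # tools picked for request 2

print(cache_key("You are an agent.", catalog_a) ==
      cache_key("You are an agent.", catalog_b))       # False -> cold prefix
```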

The bill doesn't disappear. It just gets cheaper.


The real problem nobody talks about

MCP JSON Schema was designed as a tool execution contract. Not as a semantic tool selection contract.

The result: information critical for LLM reasoning is either absent or buried in free-form text:

  • No error contract — the LLM doesn't know what to do when it hits auth_failed
  • No explicit trigger — it has to infer "when to use this tool" from a paragraph of description
  • No retrieval taxonomy — no standard way to group or filter tools by domain

Verbose AND semantically incomplete. The worst of both worlds.


TTC — TERSE Tool Catalog

I spent the last few weeks solving this problem. The result is an extension of the TERSE Format called TTC — TERSE Tool Catalog.

The same tool above in TTC:

TOOL gmail_send_email
  PURPOSE: send email via Gmail
  IN: to:string, subject:string, body:string, cc:string?
  OUT: message_id:string
  ERR: auth_failed | quota_exceeded | invalid_recipient
  WHEN: user wants to send or compose an email
  TAGS: gmail, email, communication

~55 tokens. 73.6% reduction.

And notice what was added, not just removed:

| Field | MCP JSON | TTC |
|---|---|---|
| ERR — failure contract | ❌ absent | ✅ explicit |
| WHEN — selection trigger | ❌ buried | ✅ explicit |
| TAGS — retrieval taxonomy | ❌ absent | ✅ explicit |

It's not compression. It's reallocation.

This is the most important point in the spec:

TTC does not reduce tokens by removing semantic content. It reduces syntactic and documentary overhead from JSON Schema — which serves human readability, not LLM reasoning — and reinvests part of those savings into explicit tool-selection semantics.

The actual math:

MCP JSON Schema:         ~195 tokens per tool
TTC without new fields:   ~35 tokens
TTC with all fields:      ~65 tokens

The 30-token "reinvestment" buys:
  ERR  → failure contract (absent from MCP)
  WHEN → selection trigger (absent from MCP)
  TAGS → retrieval taxonomy (absent from MCP)

Result: 195 → 65 tokens. -66.6%.
But those 65 tokens carry higher reasoning signal
than the original 195.

This is net reasoning-signal gain — not information gain in the classical sense. A critic might say you removed content (parameter descriptions, JSON Schema constraints). Correct. Content that serves human documentation, not LLM inference.
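The format is also trivial to consume programmatically. Here is a minimal parser sketch (hypothetical; the actual spec may define a stricter grammar, and the field handling below only covers the example above):

```python
import re

def parse_ttc(text: str) -> list[dict]:
    """Parse TOOL blocks like the gmail_send_email example into dicts."""
    tools = []
    for block in re.split(r"(?m)^TOOL\s+", text)[1:]:
        lines = block.strip().splitlines()
        tool = {"name": lines[0].strip()}
        for line in lines[1:]:
            key, _, value = line.strip().partition(":")
            tool[key.strip().lower()] = value.strip()
        # Split multi-valued fields on their separators
        if "err" in tool:
            tool["err"] = [e.strip() for e in tool["err"].split("|")]
        if "tags" in tool:
            tool["tags"] = [t.strip() for t in tool["tags"].split(",")]
        tools.append(tool)
    return tools

catalog = """TOOL gmail_send_email
  PURPOSE: send email via Gmail
  IN: to:string, subject:string, body:string, cc:string?
  OUT: message_id:string
  ERR: auth_failed | quota_exceeded | invalid_recipient
  WHEN: user wants to send or compose an email
  TAGS: gmail, email, communication
"""
print(parse_ttc(catalog)[0]["when"])  # user wants to send or compose an email
```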


Real benchmark — 10 measured tools

Measured with a BPE tokenizer (cl100k_base) on 10 real MCP tool definitions (five shown below, totals over all ten):

| Tool | JSON Schema | TTC | Reduction |
|---|---|---|---|
| gmail_send_email | 208 | 55 | 73.6% |
| calendar_create_event | 262 | 78 | 70.2% |
| github_create_issue | 269 | 84 | 68.8% |
| jira_create_ticket | 254 | 77 | 69.7% |
| slack_send_message | 206 | 69 | 66.5% |
| Total (10 tools) | 1,948 | 650 | 66.6% |

Projections for larger catalogs:

| Catalog size | JSON Schema | TTC | Absolute saving |
|---|---|---|---|
| 20 tools | ~3,896 | ~1,300 | ~2,596 tokens |
| 50 tools | ~9,740 | ~3,250 | ~6,490 tokens |
| 100 tools | ~19,480 | ~6,500 | ~12,980 tokens |

The absolute saving grows linearly. The larger the catalog, the higher the ROI.
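If you want to verify the numbers on your own catalog, the measurement is a few lines with the tiktoken package (a sketch; the two definition strings are placeholders for your own serializations):

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def count_tokens(text: str) -> int:
    return len(enc.encode(text))

# Placeholders: drop in the two serializations of the same tool,
# e.g. the gmail_send_email definitions shown earlier.
json_def = "{ ...MCP JSON Schema definition... }"
ttc_def = "TOOL gmail_send_email ..."

saving = 1 - count_tokens(ttc_def) / count_tokens(json_def)
print(f"reduction: {saving:.1%}")
```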


Normative WHEN vocabulary

A natural language field without a standard creates another problem: two independent MCP server authors write incompatible WHEN conditions, degrading selection accuracy in large catalogs.

TTC v1.0 solves this with a normative vocabulary:

WHEN: user [wants|requests|asks|needs|intends] to [action] [object]

Conformant examples:
  WHEN: user wants to send an email message
  WHEN: user requests to list files in Google Drive
  WHEN: user needs to create a calendar event

Non-conformant:
  WHEN: send email          ← missing intent verb
  WHEN: user email          ← missing action verb
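Conformance is easy to check in CI. A hypothetical validator sketch (the spec's actual grammar may be stricter):

```python
import re

# Matches: user [wants|requests|asks|needs|intends] to [action] [object]
WHEN_PATTERN = re.compile(
    r"^user (wants|requests|asks|needs|intends) to \w+( .+)?$"
)

def is_conformant(when: str) -> bool:
    return bool(WHEN_PATTERN.match(when.strip()))

print(is_conformant("user wants to send an email message"))  # True
print(is_conformant("send email"))                           # False: no intent verb
print(is_conformant("user email"))                           # False: no action verb
```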

Accuracy simulation (TF-IDF cosine similarity, 12 tools, 36 queries):

| Condition | Accuracy |
|---|---|
| MCP free-form description | 63.9% |
| TTC WHEN controlled vocabulary | 72.2% |
| Delta | +8.3 pp |

Caveat: TF-IDF simulation, not a real LLM benchmark. Directional evidence.
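For context, the simulation described is roughly this shape (a sketch assuming scikit-learn; the real 12-tool, 36-query corpus isn't reproduced here):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# One entry per tool: either its free-form description or its WHEN clause.
tool_texts = [
    "user wants to send or compose an email",       # gmail_send_email
    "user needs to create a calendar event",        # calendar_create_event
    "user requests to create an issue on GitHub",   # github_create_issue
]
queries = ["please email this report to Anna"]

vec = TfidfVectorizer()
matrix = vec.fit_transform(tool_texts + queries)
tool_vecs, query_vecs = matrix[: len(tool_texts)], matrix[len(tool_texts):]

# Selection = argmax cosine similarity; accuracy = fraction of queries
# whose top-ranked tool matches the expected one.
sims = cosine_similarity(query_vecs, tool_vecs)
print(sims.argmax(axis=1))  # index of the selected tool per query
```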


Where it works best

✅ Large catalogs (20+ tools) — where absolute savings justify migration

✅ Local and smaller models — Qwen 7B, Llama 3, Mistral — no cache, narrow windows

✅ Multi-agent pipelines — overhead compounds with every context handoff

✅ RAG over tools — compact TTC entries are ideal for vector DB indexing and subset injection (see the sketch after this list)

❌ Small catalogs with a large LLM and wide context — marginal gain

❌ Replacing JSON Schema in API execution contracts — not the use case
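A minimal sketch of that RAG-over-tools pattern, assuming sentence-transformers as the embedding model (all names and entries here are illustrative, not part of the spec):

```python
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")

# Compact TTC entries double as embedding documents.
ttc_entries = {
    "gmail_send_email": "PURPOSE: send email via Gmail\nWHEN: user wants to send or compose an email",
    "calendar_create_event": "PURPOSE: create calendar event\nWHEN: user needs to create a calendar event",
}

names = list(ttc_entries)
embeddings = model.encode([ttc_entries[n] for n in names])

def top_k_tools(query: str, k: int = 1) -> list[str]:
    # Cosine similarity between the query and every TTC entry;
    # only the top-k entries get injected into the prompt.
    q = model.encode([query])[0]
    scores = embeddings @ q / (np.linalg.norm(embeddings, axis=1) * np.linalg.norm(q))
    return [names[i] for i in np.argsort(scores)[::-1][:k]]

print(top_k_tools("email the Q3 report to finance"))  # ['gmail_send_email']
```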


If your agent has 50 tools installed and you haven't thought about catalog attention cost yet — now is a good time.


