
Your AI agent wastes 13,000 tokens before saying "hello"

And you probably have no idea.


If you have an agent with 50 MCP tools installed, here's what happens before any user message is processed:

{
  "name": "gmail_send_email",
  "description": "Sends an email message via the Gmail API to one or more 
    recipients. Use this tool when the user explicitly requests to send, 
    compose and send, or deliver an email message to someone.",
  "input_schema": {
    "type": "object",
    "required": ["to", "subject", "body"],
    "properties": {
      "to": {
        "type": "string",
        "description": "The recipient email address or comma-separated list"
      },
      "subject": {
        "type": "string",
        "description": "The subject line of the email"
      },
      "body": {
        "type": "string",
        "description": "The body content of the email in plain text or HTML"
      }
    }
  }
}

That's ~195 tokens. Per tool. Before anything else.

50 tools × 195 tokens = 9,750 tokens of pure overhead.

And that's just the catalog. You haven't touched user context, conversation history, documents, or anything useful yet.


"But there's prompt caching, right?"

Yes. It reduces the financial cost to ~10% of the base rate.

But caching does not reduce attention cost.

Those tokens still occupy the context window. The model still attends to all of them on every request. And if you use dynamic tool retrieval — selecting different tools per request based on user intent — the cache breaks on every different selection.
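Why does dynamic selection break the cache? Prompt caches are prefix-based: the provider reuses cached work only when a request starts with byte-identical content. A toy sketch of the mechanism (hypothetical cache keyed on a prefix hash):

```python
import hashlib

def cache_key(system_prompt: str, tool_catalog: str) -> str:
    # Prefix caches match on exact bytes: any change in the serialized
    # tool catalog produces a different key, i.e. a cache miss.
    return hashlib.sha256((system_prompt + tool_catalog).encode()).hexdigest()

catalog_a = "gmail_send_email, calendar_create_event"  # tools picked for request 1
catalog_b = "gmail_send_email, github_create_issue"    # tools picked for request 2

print(cache_key("You are an agent.", catalog_a) ==
      cache_key("You are an agent.", catalog_b))       # False -> cold prefix
```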

The bill doesn't disappear. It just gets cheaper.


The real problem nobody talks about

MCP JSON Schema was designed as a tool execution contract. Not as a semantic tool selection contract.

The result: information critical for LLM reasoning is either absent or buried in free-form text:

  • No error contract — the LLM doesn't know what to do when it hits auth_failed
  • No explicit trigger — it has to infer "when to use this tool" from a paragraph of description
  • No retrieval taxonomy — no standard way to group or filter tools by domain

Verbose AND semantically incomplete. The worst of both worlds.


TTC — TERSE Tool Catalog

I spent the last few weeks solving this problem. The result is an extension of the TERSE Format called TTC — TERSE Tool Catalog.

The same tool above in TTC:

TOOL gmail_send_email
  PURPOSE: send email via Gmail
  IN: to:string, subject:string, body:string, cc:string?
  OUT: message_id:string
  ERR: auth_failed | quota_exceeded | invalid_recipient
  WHEN: user wants to send or compose an email
  TAGS: gmail, email, communication

~55 tokens. 73.6% reduction.

And notice what was added, not just removed:

| Field | MCP JSON | TTC |
|---|---|---|
| ERR — failure contract | ❌ absent | ✅ explicit |
| WHEN — selection trigger | ❌ buried | ✅ explicit |
| TAGS — retrieval taxonomy | ❌ absent | ✅ explicit |

It's not compression. It's reallocation.

This is the most important point in the spec:

TTC does not reduce tokens by removing semantic content. It reduces syntactic and documentary overhead from JSON Schema — which serves human readability, not LLM reasoning — and reinvests part of those savings into explicit tool-selection semantics.

The actual math:

MCP JSON Schema:         ~195 tokens per tool
TTC without new fields:   ~35 tokens
TTC with all fields:      ~65 tokens

The 30-token "reinvestment" buys:
  ERR  → failure contract (absent from MCP)
  WHEN → selection trigger (absent from MCP)
  TAGS → retrieval taxonomy (absent from MCP)

Result: 195 → 65 tokens. -66.6%.
But those 65 tokens carry higher reasoning signal
than the original 195.

This is net reasoning-signal gain — not information gain in the classical sense. A critic might say you removed content (parameter descriptions, JSON Schema constraints). Correct. Content that serves human documentation, not LLM inference.
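The format is also trivial to consume programmatically. Here is a minimal parser sketch (hypothetical; the actual spec may define a stricter grammar, and the field handling below only covers the example above):

```python
import re

def parse_ttc(text: str) -> list[dict]:
    """Parse TOOL blocks like the gmail_send_email example into dicts."""
    tools = []
    for block in re.split(r"(?m)^TOOL\s+", text)[1:]:
        lines = block.strip().splitlines()
        tool = {"name": lines[0].strip()}
        for line in lines[1:]:
            key, _, value = line.strip().partition(":")
            tool[key.strip().lower()] = value.strip()
        # Split multi-valued fields on their separators
        if "err" in tool:
            tool["err"] = [e.strip() for e in tool["err"].split("|")]
        if "tags" in tool:
            tool["tags"] = [t.strip() for t in tool["tags"].split(",")]
        tools.append(tool)
    return tools

catalog = """TOOL gmail_send_email
  PURPOSE: send email via Gmail
  IN: to:string, subject:string, body:string, cc:string?
  OUT: message_id:string
  ERR: auth_failed | quota_exceeded | invalid_recipient
  WHEN: user wants to send or compose an email
  TAGS: gmail, email, communication
"""
print(parse_ttc(catalog)[0]["when"])  # user wants to send or compose an email
```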


Real benchmark — 10 measured tools

Measured with a BPE tokenizer (cl100k_base) on 10 real MCP tool definitions (five shown below, totals over all ten):

| Tool | JSON Schema | TTC | Reduction |
|---|---|---|---|
| gmail_send_email | 208 | 55 | 73.6% |
| calendar_create_event | 262 | 78 | 70.2% |
| github_create_issue | 269 | 84 | 68.8% |
| jira_create_ticket | 254 | 77 | 69.7% |
| slack_send_message | 206 | 69 | 66.5% |
| Total (10 tools) | 1,948 | 650 | 66.6% |

Projections for larger catalogs:

| Catalog size | JSON Schema | TTC | Absolute saving |
|---|---|---|---|
| 20 tools | ~3,896 | ~1,300 | ~2,596 tokens |
| 50 tools | ~9,740 | ~3,250 | ~6,490 tokens |
| 100 tools | ~19,480 | ~6,500 | ~12,980 tokens |

The absolute saving grows linearly. The larger the catalog, the higher the ROI.
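If you want to verify the numbers on your own catalog, the measurement is a few lines with the tiktoken package (a sketch; the two definition strings are placeholders for your own serializations):

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def count_tokens(text: str) -> int:
    return len(enc.encode(text))

# Placeholders: drop in the two serializations of the same tool,
# e.g. the gmail_send_email definitions shown earlier.
json_def = "{ ...MCP JSON Schema definition... }"
ttc_def = "TOOL gmail_send_email ..."

saving = 1 - count_tokens(ttc_def) / count_tokens(json_def)
print(f"reduction: {saving:.1%}")
```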


Normative WHEN vocabulary

A natural language field without a standard creates another problem: two independent MCP server authors write incompatible WHEN conditions, degrading selection accuracy in large catalogs.

TTC v1.0 solves this with a normative vocabulary:

WHEN: user [wants|requests|asks|needs|intends] to [action] [object]

Conformant examples:
  WHEN: user wants to send an email message
  WHEN: user requests to list files in Google Drive
  WHEN: user needs to create a calendar event

Non-conformant:
  WHEN: send email          ← missing intent verb
  WHEN: user email          ← missing action verb
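Conformance is easy to check in CI. A hypothetical validator sketch (the spec's actual grammar may be stricter):

```python
import re

# Matches: user [wants|requests|asks|needs|intends] to [action] [object]
WHEN_PATTERN = re.compile(
    r"^user (wants|requests|asks|needs|intends) to \w+( .+)?$"
)

def is_conformant(when: str) -> bool:
    return bool(WHEN_PATTERN.match(when.strip()))

print(is_conformant("user wants to send an email message"))  # True
print(is_conformant("send email"))                           # False: no intent verb
print(is_conformant("user email"))                           # False: no action verb
```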

Accuracy simulation (TF-IDF cosine similarity, 12 tools, 36 queries):

| Condition | Accuracy |
|---|---|
| MCP free-form description | 63.9% |
| TTC WHEN controlled vocabulary | 72.2% |
| Delta | +8.3 pp |

Caveat: TF-IDF simulation, not a real LLM benchmark. Directional evidence.
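For context, the simulation described is roughly this shape (a sketch assuming scikit-learn; the real 12-tool, 36-query corpus isn't reproduced here):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# One entry per tool: either its free-form description or its WHEN clause.
tool_texts = [
    "user wants to send or compose an email",       # gmail_send_email
    "user needs to create a calendar event",        # calendar_create_event
    "user requests to create an issue on GitHub",   # github_create_issue
]
queries = ["please email this report to Anna"]

vec = TfidfVectorizer()
matrix = vec.fit_transform(tool_texts + queries)
tool_vecs, query_vecs = matrix[: len(tool_texts)], matrix[len(tool_texts):]

# Selection = argmax cosine similarity; accuracy = fraction of queries
# whose top-ranked tool matches the expected one.
sims = cosine_similarity(query_vecs, tool_vecs)
print(sims.argmax(axis=1))  # index of the selected tool per query
```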


Where it works best

✅ Large catalogs (20+ tools) — where absolute savings justify migration

✅ Local and smaller models — Qwen 7B, Llama 3, Mistral — no cache, narrow windows

✅ Multi-agent pipelines — overhead compounds with every context handoff

✅ RAG over tools — compact TTC entries are ideal for vector DB indexing and subset injection (see the sketch after this list)

❌ Small catalogs with a large LLM and wide context — marginal gain

❌ Replacing JSON Schema in API execution contracts — not the use case
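A minimal sketch of that RAG-over-tools pattern, assuming sentence-transformers as the embedding model (all names and entries here are illustrative, not part of the spec):

```python
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")

# Compact TTC entries double as embedding documents.
ttc_entries = {
    "gmail_send_email": "PURPOSE: send email via Gmail\nWHEN: user wants to send or compose an email",
    "calendar_create_event": "PURPOSE: create calendar event\nWHEN: user needs to create a calendar event",
}

names = list(ttc_entries)
embeddings = model.encode([ttc_entries[n] for n in names])

def top_k_tools(query: str, k: int = 1) -> list[str]:
    # Cosine similarity between the query and every TTC entry;
    # only the top-k entries get injected into the prompt.
    q = model.encode([query])[0]
    scores = embeddings @ q / (np.linalg.norm(embeddings, axis=1) * np.linalg.norm(q))
    return [names[i] for i in np.argsort(scores)[::-1][:k]]

print(top_k_tools("email the Q3 report to finance"))  # ['gmail_send_email']
```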


If your agent has 50 tools installed and you haven't thought about catalog attention cost yet — now is a good time.


