And you probably have no idea.
If you have an agent with 50 MCP tools installed, here's what happens before any user message is processed:
```json
{
  "name": "gmail_send_email",
  "description": "Sends an email message via the Gmail API to one or more recipients. Use this tool when the user explicitly requests to send, compose and send, or deliver an email message to someone.",
  "input_schema": {
    "type": "object",
    "required": ["to", "subject", "body"],
    "properties": {
      "to": {
        "type": "string",
        "description": "The recipient email address or comma-separated list"
      },
      "subject": {
        "type": "string",
        "description": "The subject line of the email"
      },
      "body": {
        "type": "string",
        "description": "The body content of the email in plain text or HTML"
      }
    }
  }
}
```
That's ~195 tokens. Per tool. Before anything else.
50 tools × 195 tokens = 9,750 tokens of pure overhead.
And that's just the catalog. You haven't touched user context, conversation history, documents, or anything useful yet.
"But there's prompt caching, right?"
Yes. It reduces the financial cost to ~10% of the base rate.
But caching does not reduce attention cost.
Those tokens still occupy the context window. The model still attends to all of them on every request. And if you use dynamic tool retrieval — selecting different tools per request based on user intent — the cache breaks on every different selection.
The bill doesn't disappear. It just gets cheaper.
## The real problem nobody talks about
MCP JSON Schema was designed as a tool execution contract. Not as a semantic tool selection contract.
The result: information critical for LLM reasoning is either absent or buried in free-form text:
- No error contract — the LLM doesn't know what to do when `auth_failed` happens
- No explicit trigger — it has to infer "when to use this tool" from a paragraph of description
- No retrieval taxonomy — no standard way to group or filter tools by domain
Verbose AND semantically incomplete. The worst of both worlds.
## TTC — TERSE Tool Catalog
I spent the last few weeks solving this problem. The result is an extension of the TERSE Format called TTC — TERSE Tool Catalog.
The same tool above in TTC:
```
TOOL gmail_send_email
PURPOSE: send email via Gmail
IN: to:string, subject:string, body:string, cc:string?
OUT: message_id:string
ERR: auth_failed | quota_exceeded | invalid_recipient
WHEN: user wants to send or compose an email
TAGS: gmail, email, communication
```
~55 tokens. 73.6% reduction.
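The mechanical part of this conversion can be scripted. Below is a minimal sketch (not the official tooling) of rendering an MCP-style definition as a TTC entry. Note that `OUT`, `ERR`, `WHEN`, and `TAGS` have no MCP counterpart, so the caller must supply them by hand; likewise, `PURPOSE` here just truncates the description to its first sentence, where a human author would rewrite it more tersely.

```python
import json

def mcp_to_ttc(tool, out=None, err=None, when=None, tags=None):
    """Render an MCP-style tool definition as a TTC entry (sketch).

    out/err/when/tags are hand-supplied: MCP JSON Schema has no
    equivalent fields, which is exactly the gap TTC fills.
    """
    props = tool["input_schema"]["properties"]
    required = set(tool["input_schema"].get("required", []))
    params = ", ".join(
        f"{name}:{spec['type']}" + ("" if name in required else "?")
        for name, spec in props.items()
    )
    lines = [
        f"TOOL {tool['name']}",
        # Crude PURPOSE: first sentence of the description.
        f"PURPOSE: {tool['description'].split('.')[0].strip()}",
        f"IN: {params}",
    ]
    if out:
        lines.append("OUT: " + ", ".join(out))
    if err:
        lines.append("ERR: " + " | ".join(err))
    if when:
        lines.append("WHEN: " + when)
    if tags:
        lines.append("TAGS: " + ", ".join(tags))
    return "\n".join(lines)

gmail_tool = {
    "name": "gmail_send_email",
    "description": "Sends an email message via the Gmail API to one or more recipients. Use this tool when the user explicitly requests to send, compose and send, or deliver an email message to someone.",
    "input_schema": {
        "type": "object",
        "required": ["to", "subject", "body"],
        "properties": {
            "to": {"type": "string"},
            "subject": {"type": "string"},
            "body": {"type": "string"},
        },
    },
}

ttc_entry = mcp_to_ttc(
    gmail_tool,
    out=["message_id:string"],
    err=["auth_failed", "quota_exceeded", "invalid_recipient"],
    when="user wants to send or compose an email",
    tags=["gmail", "email", "communication"],
)
print(ttc_entry)
```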
And notice what was added, not just removed:
| Field | MCP JSON | TTC |
|---|---|---|
| ERR — failure contract | ❌ absent | ✅ explicit |
| WHEN — selection trigger | ❌ buried | ✅ explicit |
| TAGS — retrieval taxonomy | ❌ absent | ✅ explicit |
It's not compression. It's reallocation.
This is the most important point in the spec:
TTC does not reduce tokens by removing semantic content. It reduces syntactic and documentary overhead from JSON Schema — which serves human readability, not LLM reasoning — and reinvests part of those savings into explicit tool-selection semantics.
The actual math:
- MCP JSON Schema: ~195 tokens per tool
- TTC without new fields: ~35 tokens
- TTC with all fields: ~65 tokens
The 30-token "reinvestment" buys:
- ERR → failure contract (absent from MCP)
- WHEN → selection trigger (absent from MCP)
- TAGS → retrieval taxonomy (absent from MCP)
Result: 195 → 65 tokens. -66.6%.
But those 65 tokens carry higher reasoning signal than the original 195.
This is net reasoning-signal gain — not information gain in the classical sense. A critic might say you removed content (parameter descriptions, JSON Schema constraints). Correct. Content that serves human documentation, not LLM inference.
## Real benchmark — 10 measured tools
Measured with BPE tokenizer (cl100k_base) on 10 real MCP tool definitions:
| Tool | JSON Schema | TTC | Reduction |
|---|---|---|---|
| gmail_send_email | 208 | 55 | 73.6% |
| calendar_create_event | 262 | 78 | 70.2% |
| github_create_issue | 269 | 84 | 68.8% |
| jira_create_ticket | 254 | 77 | 69.7% |
| slack_send_message | 206 | 69 | 66.5% |
| Total (10 tools) | 1,948 | 650 | 66.6% |
Projections for larger catalogs:
| Catalog size | JSON Schema | TTC | Absolute saving |
|---|---|---|---|
| 20 tools | ~3,896 | ~1,300 | ~2,596 tokens |
| 50 tools | ~9,740 | ~3,250 | ~6,490 tokens |
| 100 tools | ~19,480 | ~6,500 | ~12,980 tokens |
The absolute saving grows linearly. The larger the catalog, the higher the ROI.
## Normative WHEN vocabulary
A natural language field without a standard creates another problem: two independent MCP server authors write incompatible WHEN conditions, degrading selection accuracy in large catalogs.
TTC v1.0 solves this with a normative vocabulary:
```
WHEN: user [wants|requests|asks|needs|intends] to [action] [object]
```
Conformant examples:
```
WHEN: user wants to send an email message
WHEN: user requests to list files in Google Drive
WHEN: user needs to create a calendar event
```
Non-conformant:
```
WHEN: send email   ← missing intent verb
WHEN: user email   ← missing action verb
```
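A vocabulary this regular can be linted mechanically. Here is a sketch of a conformance check as a single regex (my own reading of the pattern above, not the spec's official validator):

```python
import re

# Normative WHEN pattern from TTC v1.0:
#   WHEN: user [wants|requests|asks|needs|intends] to [action] [object]
WHEN_RE = re.compile(
    r"^WHEN: user (?:wants|requests|asks|needs|intends) to [a-z]+(?: .+)?$"
)

def is_conformant(when_line):
    """True if a WHEN line follows the normative vocabulary."""
    return WHEN_RE.match(when_line) is not None
```

Running it against the examples above accepts all three conformant lines and rejects both non-conformant ones.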
Accuracy simulation (TF-IDF cosine similarity, 12 tools, 36 queries):
| Condition | Accuracy |
|---|---|
| MCP free-form description | 63.9% |
| TTC WHEN controlled vocabulary | 72.2% |
| Delta | +8.3 pp |
Caveat: TF-IDF simulation, not a real LLM benchmark. Directional evidence.
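The shape of that simulation can be sketched in pure Python: TF-IDF-weight each tool's WHEN clause, then pick the tool whose vector is closest (cosine) to the query. This is a simplified stand-in for the actual experiment, with illustrative tools and queries:

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def build_vectors(docs):
    """TF-IDF vectors (dicts term -> weight) for tokenized docs."""
    n = len(docs)
    df = Counter()
    for d in docs:
        df.update(set(d))
    idf = {t: math.log(n / df[t]) for t in df}
    return [{t: c * idf[t] for t, c in Counter(d).items()} for d in docs], idf

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def select_tool(query, tools):
    """Pick the tool whose WHEN clause best matches the query."""
    vecs, idf = build_vectors([tokenize(t["when"]) for t in tools])
    # Weight query terms with the corpus IDF; unknown terms score 0.
    qv = {t: c * idf.get(t, 0.0) for t, c in Counter(tokenize(query)).items()}
    scores = [cosine(qv, v) for v in vecs]
    return tools[scores.index(max(scores))]["name"]

tools = [
    {"name": "gmail_send_email", "when": "user wants to send or compose an email"},
    {"name": "calendar_create_event", "when": "user needs to create a calendar event"},
    {"name": "drive_list_files", "when": "user requests to list files in Google Drive"},
]

print(select_tool("send an email to bob", tools))
```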
## Where it works best
✅ Large catalogs (20+ tools) — where absolute savings justify migration
✅ Local and smaller models — Qwen 7B, Llama 3, Mistral — no cache, narrow windows
✅ Multi-agent pipelines — overhead compounds with every context handoff
✅ RAG over tools — compact TTC is ideal for vector DB indexing and subset injection
❌ Small catalogs with large LLM and wide context — marginal gain
❌ Replacing JSON Schema in API execution contracts — not the use case
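For the RAG-over-tools case, the TAGS field also gives you a cheap prefilter before any vector search: intersect query tags with catalog tags, and inject only the surviving TTC entries. A sketch with an illustrative catalog:

```python
def filter_by_tags(tools, query_tags):
    """Keep only tools whose TAGS overlap the query's inferred tags."""
    wanted = set(query_tags)
    return [t for t in tools if wanted & set(t["tags"])]

catalog = [
    {"name": "gmail_send_email", "tags": ["gmail", "email", "communication"]},
    {"name": "calendar_create_event", "tags": ["calendar", "scheduling"]},
    {"name": "slack_send_message", "tags": ["slack", "communication"]},
]

# Only the matching TTC entries get injected into the prompt.
subset = filter_by_tags(catalog, ["communication"])
print([t["name"] for t in subset])
```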
## Links
- 📄 Full spec (Zenodo): https://doi.org/10.5281/zenodo.19869007
- 💻 GitHub: https://github.com/RudsonCarvalho/terse-format/tree/main/extensions/ttc
- 🌐 Landing page: https://rudsoncarvalho.github.io/terse-format/
- 📦 TERSE Format (parent spec): https://doi.org/10.5281/zenodo.19058364
If your agent has 50 tools installed and you haven't thought about catalog attention cost yet — now is a good time.
Tags: ai agents mcp llm tooling performance opensource