DEV Community: Zied Mnif

The 20% of your AI agent's tool schemas that's pure cruft (and the one-liner to strip it)

Zied Mnif — Sun, 07 Jun 2026 20:08:19 +0000

Your AI agent re-sends every tool's JSON schema on every single turn. A surprising chunk of that — often ~20% — is non-semantic cruft: tokens that carry zero tool-selection signal, billed on every request.

I found this while measuring the per-turn token cost of 13 real agents (full study). Here's where the waste hides, and the one-liner to remove it.

Where the cruft comes from

When you generate tool schemas from code, the converter quietly adds fields the model doesn't need, and the way you serialize them adds more:

Pydantic adds a "title" to every field via model_json_schema().
zod-to-json-schema adds a "$schema" URL at the root and "additionalProperties": false to every object.
Pretty-printing alone swings a tool ~20%: the Fetch MCP tool is 236 tokens compact vs 288 pretty-printed — pure whitespace, re-sent every turn.

None of it helps the model pick the right tool. On a 20-tool agent that's easily hundreds of wasted tokens per turn, paid on every message.
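To see the field cruft and the whitespace side by side, here is a hypothetical two-field schema in roughly the shape Pydantic's model_json_schema() emits, measured three ways. It is a sketch, not the study's method: lengths are characters, a rough proxy, since real token counts depend on the model's tokenizer.

```typescript
// A hypothetical two-field schema in roughly the shape Pydantic's
// model_json_schema() produces: a "title" on the model and on every field.
const withTitles = {
  title: "FetchArgs",
  type: "object",
  properties: {
    url: { title: "Url", type: "string", description: "URL to fetch" },
    max_length: { title: "Max Length", type: "integer", default: 5000 },
  },
  required: ["url"],
};

// The same schema with the auto-added titles removed by hand.
const withoutTitles = {
  type: "object",
  properties: {
    url: { type: "string", description: "URL to fetch" },
    max_length: { type: "integer", default: 5000 },
  },
  required: ["url"],
};

// Character counts as a rough proxy for tokens.
const pretty = JSON.stringify(withTitles, null, 2).length; // indented, titles kept
const compact = JSON.stringify(withTitles).length;         // no whitespace, titles kept
const lean = JSON.stringify(withoutTitles).length;         // no whitespace, no titles

console.log({ pretty, compact, lean });
```

Every character of the gap between pretty and lean is re-sent on every turn, for every tool.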

The fix (Node / TypeScript)

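// Schema keys that carry no tool-selection signal.
const CRUFT = new Set(["$schema", "additionalProperties", "title"]);

// Keys directly under "properties" are parameter names (a tool may have a real
// "title" field), so they are kept; their schemas are still stripped.
const strip = (o, inProps = false) => Array.isArray(o) ? o.map((x) => strip(x))
  : (o && typeof o === "object")
    ? Object.fromEntries(Object.entries(o)
        .filter(([k]) => inProps || !CRUFT.has(k))
        .map(([k, v]) => [k, strip(v, !inProps && k === "properties")]))
    : o;

// cruft-free objects: pass these as the `tools` parameter of the API call
const toolsSlim = strip(tools);
// compact string, for measuring size or embedding schemas in a prompt
const toolsSlimJson = JSON.stringify(toolsSlim);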

The fix (Python)

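import json

# Schema keys that carry no tool-selection signal.
CRUFT = {"$schema", "additionalProperties", "title"}

def strip(o, in_props=False):
    if isinstance(o, list):
        return [strip(x) for x in o]
    if isinstance(o, dict):
        # keys directly under "properties" are parameter names (a tool may have
        # a real "title" field), so they are kept; their schemas are still stripped
        return {k: strip(v, not in_props and k == "properties")
                for k, v in o.items() if in_props or k not in CRUFT}
    return o

tools_slim = strip(tools)                                        # pass as the API's tools parameter
tools_slim_json = json.dumps(tools_slim, separators=(",", ":"))  # compact, for measuring or prompts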

One caveat: additionalProperties:false is occasionally intentional (strict validation; OpenAI's Structured Outputs strict mode, for example, requires it on every object). Drop it from the strip list if you rely on it.

Measure before/after

Want to see exactly how much your schemas carry (and which tool is worst)? The free Agent Token Profiler flags this cruft automatically and shows the per-turn cost across models — paste your tools, no signup, no key.
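If you'd rather measure locally first, a per-tool before/after report takes a few lines. This is a minimal sketch, assuming your baseline is the pretty-printed schema and that you pass in your own cleanup step (for example, the strip() function from the fix above followed by compact JSON). rankBySavings is a hypothetical name, and lengths are characters, not tokens; for exact token counts, Anthropic's API also offers a token-counting endpoint.

```typescript
// Hypothetical tool shape: only `name` is assumed, the rest is whatever your converter emitted.
type Tool = { name: string; [key: string]: unknown };

// Per-tool size before and after cleanup, in characters (a proxy: characters are not tokens).
// `slim` is your cleanup step, e.g. strip() followed by compact JSON.stringify.
function rankBySavings(tools: Tool[], slim: (t: Tool) => string) {
  return tools
    .map((t) => {
      const before = JSON.stringify(t, null, 2).length; // baseline: pretty-printed as generated
      const after = slim(t).length;
      return { name: t.name, before, after, saved: before - after };
    })
    .sort((a, b) => b.saved - a.saved); // worst offender first
}
```

Called as rankBySavings(tools, (t) => JSON.stringify(t)), it measures compaction alone; swap in your strip step to see the field cruft as well.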

From AgentLoop, a readable MIT Claude-agent starter. Production token-metering + multi-provider routing live in AgentLoop Pro.

I measured the token cost of 13 real AI agents (GitHub's MCP server alone is 3,546 tokens/turn)

Zied Mnif — Sun, 07 Jun 2026 18:07:05 +0000

Every AI agent re-sends its entire system prompt and every tool/function schema on every single turn. That fixed payload is billed as input tokens on each request — invisibly — until the bill arrives. I measured exactly how much across 13 real open-source agents and MCP servers (79 tools total).

The headline

The GitHub MCP server carries 3,546 tokens of tool schemas every turn — about $12.89 per 1,000 turns on Claude Sonnet, paid before the model reads a word of the user's question. 26 tools is all it takes to make the plumbing cost more than the work.

What I found

Median per-turn overhead: 547 tokens — against a realistic 57-token user request, that's 9.6× larger than the question itself, ~91% of the input.
The fattest single tool is 827 tokens (sequentialthinking from the official MCP servers repo) — bigger than the entire toolset of 8 of the 12 tool-providers measured.
A 35× spread separates the leanest agent (101 tok) from the most bloated (3,546). Same pricing, same tokenizer — bloat is a design choice, not a tax of nature.
~20% of typical schema bytes is non-semantic cruft — pretty-printing, Pydantic's auto-added title, zod's $schema/additionalProperties. Invisible tokens nobody wrote, billed every turn.
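The ratios above are plain arithmetic, so you can check them against your own numbers. A minimal sketch: overhead() and fixedCost() are hypothetical helpers, and the price is whatever your model currently charges per million input tokens, with no prompt-caching discount applied.

```typescript
// How the fixed per-turn payload compares with the user's actual request.
function overhead(fixedTokens: number, requestTokens: number) {
  return {
    ratio: fixedTokens / requestTokens,                 // times larger than the question
    share: fixedTokens / (fixedTokens + requestTokens), // fraction of the input that is plumbing
  };
}

// Cost of re-sending a fixed payload for `turns` requests at `usdPerMTok`
// dollars per million input tokens (no prompt-caching discount applied).
function fixedCost(fixedTokens: number, turns: number, usdPerMTok: number): number {
  return (fixedTokens * turns * usdPerMTok) / 1_000_000;
}

// The study's median: 547 fixed tokens against a 57-token request.
const { ratio, share } = overhead(547, 57);
console.log(ratio.toFixed(1), `${(share * 100).toFixed(0)}%`); // 9.6 91%
```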

Full dataset — every number reproducible, repos pinned to commits: The Hidden Token Tax.

Measure your own agent

I built a free, in-browser tool that does this for your setup: Agent Token Profiler (no signup, no key). Paste your system prompt, tool schemas, and sample outputs, and it shows the per-turn breakdown and a $ projection across Claude / Haiku / OpenRouter, and flags the tool inflating your context, including the serialization cruft.

How to fix it

Strip the cruft (compact JSON, drop auto-added fields), trim tool descriptions to what actually aids tool selection, load tools on demand (progressive disclosure), and route easy turns to a cheaper model like Haiku. The schema overhead is the half nobody meters — until they do.
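Progressive disclosure can be as simple as sending one small loader tool up front and adding a group's full schemas only after the model asks for them. This is a minimal sketch, assuming tools are grouped by domain; registry, load_tools, activeTools and runLoadTools are hypothetical names. The trade-off: loading a group costs an extra round trip, so it pays off mainly for large toolsets like the 26-tool GitHub server.

```typescript
// Hypothetical tool shape, mirroring Anthropic's { name, description, input_schema }.
type Tool = { name: string; description: string; input_schema: Record<string, unknown> };

// Hypothetical registry: full schemas grouped by domain, sent only once loaded.
const registry: Record<string, Tool[]> = {
  github: [
    { name: "create_issue", description: "Open a GitHub issue",
      input_schema: { type: "object", properties: { title: { type: "string" } }, required: ["title"] } },
    { name: "list_prs", description: "List open pull requests",
      input_schema: { type: "object", properties: {} } },
  ],
  fetch: [
    { name: "fetch_url", description: "Fetch a URL as text",
      input_schema: { type: "object", properties: { url: { type: "string" } }, required: ["url"] } },
  ],
};
const groups = Object.keys(registry);

// The only tool sent up front: a cheap loader that advertises group names.
const loadTools: Tool = {
  name: "load_tools",
  description: `Load a group of tools before using them. Groups: ${groups.join(", ")}`,
  input_schema: { type: "object", properties: { group: { type: "string", enum: groups } }, required: ["group"] },
};

// Tools for the next request: the loader plus every group loaded so far.
function activeTools(loaded: Set<string>): Tool[] {
  const tools = [loadTools];
  loaded.forEach((g) => tools.push(...registry[g]));
  return tools;
}

// Runner for load_tools: remember the group so the next request carries its schemas.
function runLoadTools(loaded: Set<string>, input: { group: string }): string {
  if (!groups.includes(input.group)) return `Unknown group: ${input.group}`;
  loaded.add(input.group);
  return `Loaded: ${registry[input.group].map((t) => t.name).join(", ")}`;
}
```

Keep one Set per conversation and call activeTools() before every request, so schemas appear only from the turn after the model asks for them.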

Measured while building AgentLoop, a readable MIT Claude-agent starter. The production patterns — token metering, retries, multi-provider routing — are drop-in code in AgentLoop Pro.

Claude agent vs Claude Code: which one are you actually building?

Zied Mnif — Fri, 05 Jun 2026 00:02:54 +0000

Search "claude agent boilerplate" and you'll drown in Claude Code results — the agentic CLI, CLAUDE.md files, slash commands, hooks. Great tools. But none of that is what you want if you're trying to build your own agent on the Anthropic SDK.

Here's the disambiguation, and the ~40 lines that are actually the whole thing.

Two different "Claude agents"

Claude Code — Anthropic's agentic coding CLI. You configure it; you don't build it.
An agent you build — your app calls the Anthropic SDK in a loop: the model asks for a tool, you run it, feed the result back, repeat until it's done. That loop is the agent.

Most "agent frameworks" just hide that loop from you. It's small enough that you don't need them.

The whole loop

import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic();

async function runAgent(userText, tools, runners) {
  const messages = [{ role: "user", content: userText }];

  for (let i = 0; i < 10; i++) {
    const res = await client.messages.create({
      model: "claude-sonnet-4-6",
      max_tokens: 1024,
      tools,
      messages,
    });

    messages.push({ role: "assistant", content: res.content });

    // No tool requested -> the model is done.
    if (res.stop_reason !== "tool_use") {
      return res.content.filter((b) => b.type === "text").map((b) => b.text).join("");
    }

    // Run every tool it asked for this turn, collect one result each.
    const results = [];
    for (const block of res.content) {
      if (block.type === "tool_use") {
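        const run = runners[block.name];
        // An unknown tool becomes an error result the model can read, not a crash.
        const out = run ? await run(block.input) : `Unknown tool: ${block.name}`;
        // tool_result content must be a string (or content blocks), so stringify objects.
        const content = typeof out === "string" ? out : JSON.stringify(out);
        results.push({ type: "tool_result", tool_use_id: block.id, content });
      }
    }
    messages.push({ role: "user", content: results });
  }
  // Turn cap reached: fail loudly instead of silently returning undefined.
  throw new Error("Agent did not finish within 10 turns");
}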

That's it. The four things people get wrong:

Append the assistant turn verbatim — so the model sees its own tool request on the next call.
One tool_result per tool_use block, matched by tool_use_id.
Send all results back in a single user message.
Cap the turns so a confused model can't loop forever.
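Rules 2 and 3 are easy to break silently during a refactor, and the API then rejects the next request. A small check you could run in tests over a recorded conversation is sketched below; the message types are a hypothetical minimal subset of the Messages API shapes, and checkToolPairing is an illustrative name.

```typescript
// A hypothetical minimal subset of the Messages API shapes.
type Block =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown }
  | { type: "tool_result"; tool_use_id: string; content: string };
type Message = { role: "user" | "assistant"; content: string | Block[] };

// Checks rules 2 and 3: each tool_use in an assistant turn must be answered by
// exactly one tool_result, and all of them must sit in the very next user message.
function checkToolPairing(messages: Message[]): string[] {
  const problems: string[] = [];
  messages.forEach((msg, i) => {
    if (msg.role !== "assistant" || typeof msg.content === "string") return;
    const next = messages[i + 1];
    const answered: string[] = [];
    if (next && next.role === "user" && typeof next.content !== "string") {
      for (const b of next.content) if (b.type === "tool_result") answered.push(b.tool_use_id);
    }
    for (const b of msg.content) {
      if (b.type !== "tool_use") continue;
      const count = answered.filter((id) => id === b.id).length;
      if (count !== 1) problems.push(`message ${i}: tool_use ${b.id} has ${count} tool_result(s) in the next message`);
    }
  });
  return problems;
}
```

An empty array means every tool request was answered once, in one message.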

A runnable starter

If you'd rather start from a streaming Next.js app you can deploy in one command, I open-sourced exactly this (MIT): AgentLoop — the whole agent in ~150 readable lines, no framework. Clone it, add a key, deploy.

There's a $29 Pro pack for the patterns you hit in production (parallel tools, persistent memory, retries, rate limiting, approval gates, evals, token metering, and a multi-provider seam so it runs on any model) — but the free core stands alone forever.

Build the loop. Own it. Don't import a black box.

Your 150-line AI agent works in the demo. Here's what breaks in production.

Zied Mnif — Thu, 04 Jun 2026 11:54:30 +0000

A minimal agent — call the model, run the tool it asks for, feed the result back, repeat — is genuinely complete for a demo. I wrote one in ~150 readable lines: https://github.com/mnifzied-create/agentloop.

But the moment real users hit it, eight things break. None of them need a framework — each is a small, readable layer on top of the loop.

1. The model asks for three tools at once — and you run them one at a time. Wrap the tool calls in Promise.all. Parallel by default.

2. One flaky API call kills the whole turn. Wrap each tool in a retry with backoff, and return the error as a string to the model instead of throwing — it can recover on the next step.

3. It forgets everything between requests. Persist threads. Node's built-in node:sqlite (Node 22.5+) is enough: no service, no native build.

4. One user (or a runaway loop) runs up your bill. A token-bucket rate limiter, per user / IP.

5. The agent deletes a record / sends an email / charges a card — with no confirmation. Wrap irreversible tools in a human-in-the-loop approval gate.

6. You tweak the prompt and three behaviors silently regress. A tiny eval harness with pass/fail cases you run in CI.

7. One agent juggling twelve tools gets confused. Expose a whole agent as a single tool — a sub-agent — and let a parent delegate.

8. You're regex-parsing the model's prose for data. Force a tool call (tool_choice: { type: "tool", name }) whose input_schema is your output type. Typed JSON, no parsing.
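Patterns 1, 2 and 5 compose into one small layer between the loop and your tools. Below is a minimal sketch of that layer with no SDK involved; runTools, withRetry, the Approve hook and the irreversible set are hypothetical names, not AgentLoop's API, and three attempts with a 100 ms base backoff are arbitrary defaults.

```typescript
// Hypothetical shapes: a tool_use block from the model, a runner that may throw,
// and the tool_result block sent back (is_error is a real tool_result field).
type ToolUse = { id: string; name: string; input: unknown };
type Runner = (input: unknown) => Promise<string>;
type ToolResult = { type: "tool_result"; tool_use_id: string; content: string; is_error: boolean };
// Hypothetical hook for pattern 5: resolve true only when a human says yes.
type Approve = (call: ToolUse) => Promise<boolean>;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Pattern 2: retry with exponential backoff, and turn the final failure into text
// the model can read instead of an exception that kills the turn.
async function withRetry(run: Runner, input: unknown, attempts = 3, baseMs = 100) {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return { content: await run(input), isError: false };
    } catch (err) {
      lastError = err;
      if (i < attempts - 1) await sleep(baseMs * 2 ** i); // baseMs, 2x, 4x, ...
    }
  }
  return { content: `Tool failed after ${attempts} attempts: ${String(lastError)}`, isError: true };
}

// Pattern 1: run every tool_use from one turn concurrently. Promise.all keeps
// the results in request order, one per tool_use_id.
async function runTools(
  calls: ToolUse[],
  runners: Record<string, Runner>,
  irreversible: Set<string>,
  approve: Approve,
): Promise<ToolResult[]> {
  return Promise.all(
    calls.map(async (call): Promise<ToolResult> => {
      const result = (content: string, is_error: boolean): ToolResult =>
        ({ type: "tool_result", tool_use_id: call.id, content, is_error });
      const run = runners[call.name];
      if (!run) return result(`Unknown tool: ${call.name}`, true);
      // Pattern 5: irreversible tools wait for an explicit approval.
      if (irreversible.has(call.name) && !(await approve(call))) {
        return result("The user declined this action.", true);
      }
      const { content, isError } = await withRetry(run, call.input);
      return result(content, isError);
    }),
  );
}
```

In the loop, the returned array replaces the hand-built results and still goes back in a single user message.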

That's the entire gap between "works in the demo" and "works in production" — and every item is a small composable piece you can read top to bottom, not magic hidden in a dependency.

The free core (the loop) and these eight patterns are all in the repo — read every line: https://github.com/mnifzied-create/agentloop

The point isn't the code. It's that you can own an agent instead of importing one.

What breaks for you in production that isn't on this list?

The AI agent loop, in ~150 lines (no framework)

Zied Mnif — Wed, 03 Jun 2026 22:19:56 +0000

"AI agent" sounds like it needs a framework. It doesn't. Strip away the branding and an agent is one loop:

Send the conversation to the model.
If it asks to use a tool, run the tool.
Feed the result back.
Repeat until it answers.

Here's the whole thing with Claude:

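import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic();  // reads ANTHROPIC_API_KEY from the environment
const MAX_STEPS = 10;

// model, system and tools are your config; send(delta) forwards text to the client;
// runTool(name, input) is your dispatcher and should return a string.
async function agentLoop({ model, system, tools, messages, send, runTool }) {
  for (let step = 0; step < MAX_STEPS; step++) {
    const runner = anthropic.messages.stream({ model, max_tokens: 1024, system, tools, messages });
    runner.on("text", (delta) => send(delta));            // stream tokens out
    const final = await runner.finalMessage();
    messages.push({ role: "assistant", content: final.content });  // keep every turn, answers included
    if (final.stop_reason !== "tool_use") return;         // model answered, we're done

    const results = [];
    for (const block of final.content) {
      if (block.type === "tool_use") {
        const result = await runTool(block.name, block.input);   // your code
        results.push({ type: "tool_result", tool_use_id: block.id, content: result });
      }
    }
    messages.push({ role: "user", content: results });    // feed the results back, loop
  }
}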

That's it. Tool use is just: the model emits a tool_use block, you run the matching function, you hand back a tool_result. Streaming is just forwarding token deltas as they arrive.
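Forwarding deltas needs nothing more than a stream the route can return. Here is a minimal sketch, assuming a Next.js-style route handler that returns a web Response with a plain-text (not server-sent events) body; createTextStream is a hypothetical helper, not AgentLoop's code.

```typescript
// Hypothetical helper: a web ReadableStream plus the send() the loop calls.
function createTextStream() {
  const encoder = new TextEncoder();
  let controller!: ReadableStreamDefaultController<Uint8Array>;
  // start() runs synchronously inside the constructor, so controller is set
  // before this function returns.
  const stream = new ReadableStream<Uint8Array>({
    start(c) {
      controller = c;
    },
  });
  return {
    stream,                                                             // e.g. return new Response(stream)
    send: (delta: string) => controller.enqueue(encoder.encode(delta)), // the loop's send(delta)
    close: () => controller.close(),                                    // call once the loop ends
  };
}
```

In a route handler you would start the loop without awaiting it, return new Response(stream) right away, and call close() when the loop finishes.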

I packaged this into a runnable starter — Next.js, a streaming UI, two example tools, one-command Vercel deploy. Clone it, add a key, ship:

https://github.com/mnifzied-create/agentloop

The free core is the loop above, MIT-licensed. If you want the production patterns next — running multiple tools in parallel, persistent memory on SQLite, retries, an eval harness, human-in-the-loop approval, sub-agents — I wrote those up as a small, equally-readable kit too (linked in the repo).

The point isn't the kit, though. It's that you can read every line of an agent in one sitting. Go look.