AlexChen

Posted on Mar 7

7 Principles for AI Agent Tool Design (From Claude Code + Real-World Systems)

#agents #ai #architecture #programming

The Claude Code engineering team recently shared what they learned from a year of building tool interfaces for AI agents. I build and run multi-agent systems daily, and most of it matched my experience, with one point I would push back on. Here's a systematic breakdown.

Principle 1: Match Tools to Your Model's Actual Capabilities

This is the most overlooked rule. Many teams design one set of tool interfaces and apply it to every model, even though a tool that helps a weaker model can hold a stronger one back.

The Claude Code team learned this the hard way: after upgrading to Claude Opus, a "todo reminder tool" that originally helped the model stay focused became a constraint. The model started rigidly following the list instead of thinking flexibly.

Actionable rule: Every time you upgrade your model version, immediately re-audit all existing tools. Last version's scaffolding may be this version's shackle.
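One way to make the re-audit hard to forget is to record, per tool, which model versions it has been checked against. The sketch below is only an illustration: `ToolSpec`, `tools_needing_audit`, and the version labels `model-v1`/`model-v2` are hypothetical names, not anything from Claude Code.

```python
from dataclasses import dataclass, field


@dataclass
class ToolSpec:
    name: str
    description: str
    # Model versions this tool has been re-audited against (hypothetical labels).
    audited_for: set[str] = field(default_factory=set)


def tools_needing_audit(tools: list[ToolSpec], model_version: str) -> list[str]:
    """Return the names of tools not yet re-audited for the given model version."""
    return [t.name for t in tools if model_version not in t.audited_for]


tools = [
    ToolSpec("todo_reminder", "Remind the model of its todo list", {"model-v1"}),
    ToolSpec("grep_search", "Search files by regular expression", {"model-v1", "model-v2"}),
]

# After switching to "model-v2", only the tool that was never re-checked is listed.
pending = tools_needing_audit(tools, "model-v2")
```

A check like this could run in CI whenever the configured model version changes, so an unaudited tool blocks the upgrade instead of silently shipping.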

Principle 2: Use Tools for Structured Output, Not Prompts

Asking a model to "output in a specific format" is the least reliable approach. Models add extra sentences, skip fields, or switch to completely different formats.

The Claude Code team tried three approaches to get Claude to ask users questions with options:

1. Adding parameters to existing tools → Claude got confused trying to plan and ask at the same time
2. Using a special markdown format → Claude frequently went off-script
3. Creating a dedicated AskUserQuestion tool → Success

Actionable rule: Whenever correctness matters, use tool parameter schemas to enforce structure. Don't rely on the model's formatting ability.
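To make the idea concrete, here is a minimal sketch of what a dedicated question tool might look like. The `name`/`description`/`input_schema` layout follows the common JSON Schema style for tool definitions; the field names `question` and `options` and the `validate_call` helper are assumptions for illustration, not the actual AskUserQuestion definition.

```python
# Hypothetical tool definition: the schema, not the prompt, carries the structure.
ASK_USER_QUESTION_TOOL = {
    "name": "AskUserQuestion",
    "description": "Ask the user one multiple-choice question and wait for the answer.",
    "input_schema": {
        "type": "object",
        "properties": {
            "question": {"type": "string"},
            "options": {"type": "array", "items": {"type": "string"}, "minItems": 2},
        },
        "required": ["question", "options"],
    },
}


def validate_call(arguments: dict) -> list[str]:
    """Return a list of problems with the arguments; an empty list means well-formed."""
    errors = []
    question = arguments.get("question")
    options = arguments.get("options")
    if not isinstance(question, str) or not question.strip():
        errors.append("question must be a non-empty string")
    if not isinstance(options, list) or len(options) < 2:
        errors.append("options must be a list with at least 2 entries")
    elif not all(isinstance(o, str) for o in options):
        errors.append("every option must be a string")
    return errors
```

Because the structure lives in the schema, a malformed call can be rejected and returned to the model as a tool error, rather than parsed out of free text after the fact.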

Principle 3: Progressive Disclosure, Not Context Bombing

Many teams stuff all background knowledge into the system prompt. This creates "context rot"—massive amounts of irrelevant information competing for the model's attention, interfering with the core task.

The right approach: give the model an entry point (file path, link, skill name) and let it pull information on demand.

Claude Code's approach: instead of stuffing docs into prompts, they give Claude a documentation link. When a user asks "how to set up MCP," a specialized sub-agent searches the docs and returns the answer.

Actionable rule: Start with minimal context. Use progressive skill file hierarchies instead of system prompt stuffing.
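A rough sketch of the entry-point pattern: the system prompt carries only skill names and one-line summaries, and the full text is returned by a tool when the model asks for it. The skill names, the `SKILLS` dictionary, and the `load_skill` tool are all hypothetical.

```python
# Hypothetical skill store; in practice the bodies would live in files on disk.
SKILLS = {
    "mcp-setup": {
        "summary": "Configure an MCP server for the agent",
        "body": "Full MCP setup guide (hypothetical placeholder text).",
    },
    "release-checklist": {
        "summary": "Steps to cut a release",
        "body": "Full release checklist (hypothetical placeholder text).",
    },
}


def build_system_prompt() -> str:
    """Build a prompt that lists entry points only, never the full skill bodies."""
    lines = [
        "You can load a skill by name with the load_skill tool.",
        "Available skills:",
    ]
    for name, skill in SKILLS.items():
        lines.append(f"- {name}: {skill['summary']}")
    return "\n".join(lines)


def load_skill(name: str) -> str:
    """Tool handler: return the full body of one skill on demand."""
    skill = SKILLS.get(name)
    if skill is None:
        return f"Unknown skill '{name}'. Available: {', '.join(SKILLS)}"
    return skill["body"]
```

The prompt grows by one short line per skill, while the cost of a long skill body is paid only in the conversations that actually need it.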

Principle 4: Let the Agent Build Its Own Context

Early Claude Code used vector databases (RAG) to retrieve code context for Claude. Later the team found that it worked better to give Claude search tools and let it find the answers itself than to feed it pre-retrieved context.

Context-building priority ranking:

Priority	Method	Characteristics
4 (Highest)	Progressive skill file hierarchy	Best for structured knowledge
3	Grep/search tools	Stable, model-driven
2	RAG semantic retrieval	Powerful but fragile
1 (Lowest)	Static injection	Fastest, but goes stale quickly

Actionable rule: As models improve, progressively shift from "information injection" to "tool empowerment."
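As an illustration of "tool empowerment", a search tool can be as small as a regex scan over a directory that returns file, line number, and matching line. This is a simplified stand-in written for this post, not Claude Code's actual search tool; the name `grep_tool` and the `max_hits` cap are assumptions.

```python
import re
from pathlib import Path


def grep_tool(root: str, pattern: str, max_hits: int = 20) -> list[str]:
    """Search every file under root for a regex; return 'path:line: text' strings."""
    regex = re.compile(pattern)
    hits: list[str] = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file: skip it rather than fail the whole search
        for lineno, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                hits.append(f"{path.relative_to(root)}:{lineno}: {line.strip()}")
                if len(hits) >= max_hits:
                    return hits  # cap the output so one query cannot flood the context
    return hits
```

The model decides what to search for and reads the results itself, so the context it ends up with reflects the current state of the files rather than a stale index.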

Principle 5: Design for Multi-Agent Collaboration from Day One

Many teams only consider single-agent scenarios initially. When they need multiple sub-agents to collaborate, they discover all state management needs to be rebuilt.

Claude Code evolved from "todos" to "Tasks"—Tasks support dependency relationships, cross-sub-agent state sharing, and dynamic modification. This wasn't a small change; it was an architecture overhaul.

Actionable rule: If your agent has any possibility of spawning sub-agents, design your data structures for multi-agent state from day one.
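A minimal sketch of what multi-agent-ready state could look like: tasks carry dependencies and an owner, and a shared board decides what is ready to be claimed. `Task`, `TaskBoard`, and the status values are hypothetical and not the actual Claude Code Tasks design.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Task:
    id: str
    title: str
    status: str = "pending"  # pending | in_progress | done
    depends_on: list[str] = field(default_factory=list)
    owner: Optional[str] = None  # which agent has claimed the task


class TaskBoard:
    """Shared task state that several sub-agents can read and update."""

    def __init__(self) -> None:
        self._tasks: dict[str, Task] = {}

    def add(self, task: Task) -> None:
        self._tasks[task.id] = task

    def ready(self) -> list[Task]:
        """Pending tasks whose dependencies are all done."""
        return [
            t for t in self._tasks.values()
            if t.status == "pending"
            and all(self._tasks[d].status == "done" for d in t.depends_on)
        ]

    def claim(self, task_id: str, agent: str) -> Task:
        """Assign a ready task to an agent; refuse tasks that are blocked or taken."""
        if task_id not in [t.id for t in self.ready()]:
            raise ValueError(f"{task_id} is not ready to be claimed")
        task = self._tasks[task_id]
        task.owner = agent
        task.status = "in_progress"
        return task

    def complete(self, task_id: str) -> None:
        self._tasks[task_id].status = "done"
```

A flat todo list has no place for `depends_on` or `owner`, which is why retrofitting them later tends to become a rewrite rather than a patch.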

Principle 6: Measure Both "Correctness" and "Affinity"

A tool that never gets called has zero value, no matter how well it is implemented. Claude's "affinity" for a tool (its natural tendency to invoke that tool) varies dramatically from one tool to the next.

Factors affecting affinity: tool name, parameter naming, description wording, and even position in the tool list.

Testing method: Run the agent on 20 different tasks and track how often each tool is invoked. Any tool invoked at less than 10% of its expected call rate needs its interface or description redesigned.

Actionable rule: When evaluating tools, simultaneously track output quality and invocation frequency. Optimize both metrics.
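The testing method above can be sketched in a few lines: count invocations from a call log and flag any tool that falls below 10% of its expected count. The function name, the log format, and the expected counts are made up for illustration.

```python
from collections import Counter


def low_affinity_tools(
    call_log: list[str],
    expected_calls: dict[str, int],
    threshold: float = 0.10,
) -> list[str]:
    """Return tools whose observed call count is below threshold * expected count."""
    observed = Counter(call_log)
    flagged = []
    for tool, expected in expected_calls.items():
        if expected <= 0:
            continue  # no expectation recorded for this tool, nothing to compare
        if observed[tool] / expected < threshold:
            flagged.append(tool)
    return flagged


# Hypothetical tool names logged across a 20-task evaluation run.
call_log = ["grep"] * 18 + ["read_file"] * 25 + ["ask_user"] * 1 + ["web_fetch"] * 2
expected = {"grep": 20, "read_file": 20, "ask_user": 12, "web_fetch": 10}

flagged = low_affinity_tools(call_log, expected)
```

Here `ask_user` is called once against an expected 12 (about 8%), so it is the one to redesign; `web_fetch` at 20% is low but above the cutoff.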

Principle 7: Fewer Tools, Each One Deep

Claude Code currently uses about 20 tools, which I'd treat as a practical upper limit for production-grade agent systems. Each additional tool is one more option the model has to reason about, so a larger tool set tends to mean worse performance on real tasks.

Before adding a new tool, ask three questions:

1. Can progressive disclosure solve this? (Usually yes)
2. Can an existing tool be extended? (Prefer this)
3. Does this scenario occur more than 10% of the time? (If not, delegate to a sub-agent)

Actionable rule: Set a hard cap on your tool count. Force yourself to find more elegant solutions before adding new tools.
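The three questions and the hard cap can be written down as a small decision helper, which makes the policy explicit in code review. This is only a sketch; the function name, parameters, and return strings are hypothetical.

```python
def new_tool_decision(
    current_tool_count: int,
    tool_cap: int,
    solvable_by_disclosure: bool,
    extendable_existing_tool: bool,
    scenario_share: float,
) -> str:
    """Apply the three questions in order, then the hard cap on tool count."""
    if solvable_by_disclosure:
        return "use progressive disclosure"
    if extendable_existing_tool:
        return "extend an existing tool"
    if scenario_share <= 0.10:
        # Rare scenario (10% of cases or fewer): not worth a permanent tool slot.
        return "delegate to a sub-agent"
    if current_tool_count >= tool_cap:
        return "at cap: retire or merge a tool first"
    return "add the tool"
```

For example, `new_tool_decision(18, 20, False, False, 0.25)` returns `"add the tool"`, while the same request with 20 tools already registered returns the at-cap message.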

One Point Worth Questioning

The Claude Code team's switch from RAG to grep, claiming "let Claude search for itself" works better, deserves closer examination.

Grep is powerful for exact matches but weak on semantically related queries, where the relevant code shares no keywords with the question. They compensate with sub-agents, but that adds latency.

The real answer might be a hybrid approach: grep for exact lookups, vector search for semantic association. Not either/or, but dynamically choosing based on query type.
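A very rough sketch of such routing, assuming a simple heuristic: queries that look like code identifiers or quoted phrases go to grep, everything else goes to vector search. The heuristic and the `choose_retriever` name are my own assumptions; a real system would likely need something more careful.

```python
import re

# A single token made of identifier characters and dots, e.g. load_config or utils.retry.
IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_.]*$")


def choose_retriever(query: str) -> str:
    """Return 'grep' for exact-looking queries and 'vector' for natural-language ones."""
    q = query.strip()
    # A quoted query signals that the user wants this exact phrase.
    if len(q) >= 2 and q[0] == q[-1] and q[0] in "\"'`":
        return "grep"
    # One token with an underscore, a dot, or interior capitals looks like code.
    if IDENTIFIER.match(q) and (
        "_" in q or "." in q or any(c.isupper() for c in q[1:])
    ):
        return "grep"
    return "vector"
```

Under this heuristic `load_config`, `AskUserQuestion`, and `"connection refused"` route to grep, while "where do we handle retries for failed payments" routes to vector search.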

This is an area their article doesn't fully explore—and it's a gap we've observed in real-world systems.

Summary: The Seven Principles

1. Re-audit your tools with every model upgrade
2. Use schemas for structured output, not natural language constraints
3. Progressive disclosure, not context bombing
4. Give tools instead of answers and let the model build its own context
5. Design state management for multi-agent from day one
6. Optimize correctness and invocation affinity together
7. Fewer and deeper: set a hard tool count ceiling

The most important quote (from the original article):

"Experiment often, read your outputs, try new things. See like an agent."

Tool design isn't a one-time engineering decision. It's a continuously evolving process. Build feedback loops, then keep running them.

Analysis based on Claude Code engineer Thariq's original article, combined with hands-on experience building multi-agent systems.

Top comments (1)

René Zander • Mar 25

The point about re-auditing tools after model upgrades is spot on. One thing I would add: tool descriptions matter more than tool implementation. The agent decides whether to call a tool based entirely on the description string. Spending an hour rewriting tool descriptions to be more precise about when to use each tool often fixes more agent behavior issues than changing any actual code.
