The Claude Code engineering team recently shared their year-long journey building tool interfaces for AI agents. As someone who builds and runs multi-agent systems daily, I found a lot to agree with, and one point to question. Here's a systematic breakdown.
Principle 1: Match Tools to Your Model's Actual Capabilities
This is the most overlooked rule. Many teams design one set of tool interfaces and apply them to every model—that's wrong.
The Claude Code team learned this the hard way: after upgrading to Claude Opus, a "todo reminder tool" that originally helped the model stay focused became a constraint. The model started rigidly following the list instead of thinking flexibly.
Actionable rule: Every time you upgrade your model version, immediately re-audit all existing tools. Last version's scaffolding may be this version's shackle.
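One way to make this re-audit hard to forget is to record, per tool, the model version it was last reviewed against. The sketch below is a minimal illustration, not how Claude Code does it; `ToolRecord`, `tools_needing_audit`, and the model names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ToolRecord:
    name: str
    audited_for_model: str  # model version this tool was last reviewed against (hypothetical field)


def tools_needing_audit(tools: list[ToolRecord], current_model: str) -> list[str]:
    """Return the names of tools whose last audit targeted a different model."""
    return [t.name for t in tools if t.audited_for_model != current_model]


# Hypothetical registry: the todo reminder was last reviewed for an older model.
registry = [
    ToolRecord("todo_reminder", audited_for_model="model-v1"),
    ToolRecord("file_search", audited_for_model="model-v2"),
]

stale = tools_needing_audit(registry, current_model="model-v2")
print(stale)
```

Running a check like this in CI whenever the configured model changes turns "remember to re-audit" into a failing build instead of a good intention.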
Principle 2: Use Tools for Structured Output, Not Prompts
Asking a model to "output in a specific format" is the least reliable approach. Models add extra sentences, skip fields, or switch to completely different formats.
The Claude Code team tried three approaches to get Claude to ask users questions with options:
- Adding parameters to existing tools → Claude got confused trying to plan + ask simultaneously
- Using special markdown format → Claude frequently went off-script
- Creating a dedicated AskUserQuestion tool → Success
Actionable rule: Whenever correctness matters, use tool parameter schemas to enforce structure. Don't rely on the model's formatting ability.
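To make the rule concrete, here is a sketch of what a dedicated question tool could look like. The `name` / `description` / `input_schema` layout follows the common JSON Schema style for tool definitions; the fields `question` and `options` are my assumptions, not the actual AskUserQuestion schema. `validate_call` is a deliberately tiny checker, not a full JSON Schema validator.

```python
# Hypothetical tool definition; field names are assumptions for illustration.
ASK_USER_QUESTION_TOOL = {
    "name": "AskUserQuestion",
    "description": "Ask the user one multiple-choice question and wait for the answer.",
    "input_schema": {
        "type": "object",
        "properties": {
            "question": {"type": "string"},
            "options": {"type": "array", "items": {"type": "string"}},
        },
        "required": ["question", "options"],
    },
}

PY_TYPES = {"string": str, "array": list}


def validate_call(tool: dict, args: dict) -> list[str]:
    """Check required fields and top-level types; returns a list of error strings."""
    schema = tool["input_schema"]
    errors = [f"missing field: {f}" for f in schema["required"] if f not in args]
    for field_name, value in args.items():
        spec = schema["properties"].get(field_name)
        if spec is None:
            errors.append(f"unexpected field: {field_name}")
        elif not isinstance(value, PY_TYPES[spec["type"]]):
            errors.append(f"wrong type for {field_name}")
    return errors


good = validate_call(
    ASK_USER_QUESTION_TOOL,
    {"question": "Which database?", "options": ["Postgres", "SQLite"]},
)
bad = validate_call(ASK_USER_QUESTION_TOOL, {"question": "Which database?"})
print(good, bad)
```

The point is that a malformed call becomes a machine-checkable error you can return to the model, rather than a formatting drift you have to parse around.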
Principle 3: Progressive Disclosure, Not Context Bombing
Many teams stuff all background knowledge into the system prompt. This creates "context rot"—massive amounts of irrelevant information competing for the model's attention, interfering with the core task.
The right approach: give the model an entry point (file path, link, skill name) and let it pull information on demand.
Claude Code's approach: instead of stuffing docs into prompts, they give Claude a documentation link. When a user asks "how to set up MCP," a specialized sub-agent searches the docs and returns the answer.
Actionable rule: Start with minimal context. Use progressive skill file hierarchies instead of system prompt stuffing.
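A minimal sketch of the idea, assuming a hypothetical in-memory `SKILLS` store and a `load_skill` tool the model can call: the system prompt lists only skill names, and the bodies enter the context when requested.

```python
# Hypothetical skill store; in practice these might be files on disk.
SKILLS = {
    "mcp-setup": "Step-by-step notes for configuring an MCP server.",
    "deploy": "Checklist for deploying the service.",
}


def build_system_prompt(skills: dict[str, str]) -> str:
    """Expose only the skill names up front; bodies stay out of the prompt."""
    names = ", ".join(sorted(skills))
    return f"Available skills (call load_skill to read one): {names}"


def load_skill(skills: dict[str, str], name: str) -> str:
    """Tool handler: return one skill body on demand."""
    return skills.get(name, f"unknown skill: {name}")


prompt = build_system_prompt(SKILLS)
print(prompt)
print(load_skill(SKILLS, "mcp-setup"))
```

The prompt cost is now proportional to the number of skills, not to their total length.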
Principle 4: Let the Agent Build Its Own Context
Early Claude Code used a vector database (RAG) to retrieve code context for Claude. The team later found it worked better to give Claude search tools and let it find the answers itself than to feed it pre-retrieved context.
Context-building priority ranking:
| Priority | Method | Characteristics |
|---|---|---|
| 4 (Highest) | Progressive skill file hierarchy | Best for structured knowledge |
| 3 | Grep/search tools | Stable, model-driven |
| 2 | RAG semantic retrieval | Powerful but fragile |
| 1 (Lowest) | Static injection | Fastest, but goes stale quickly |
Actionable rule: As models improve, progressively shift from "information injection" to "tool empowerment."
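For a sense of how small a "tool empowerment" primitive can be, here is a sketch of a grep-style search tool. It is my own illustration using only the standard library, not Claude Code's implementation; `grep_tool` and its `max_hits` cap are assumptions.

```python
import re
import tempfile
from pathlib import Path


def grep_tool(root: str, pattern: str, max_hits: int = 20) -> list[str]:
    """Return 'path:line_no: text' for lines matching the regex, capped at max_hits."""
    regex = re.compile(pattern)
    hits: list[str] = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            lines = path.read_text(encoding="utf-8").splitlines()
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for line_no, line in enumerate(lines, start=1):
            if regex.search(line):
                hits.append(f"{path.relative_to(root)}:{line_no}: {line.strip()}")
                if len(hits) >= max_hits:
                    return hits
    return hits


# Demo on a throwaway directory so the example is self-contained.
with tempfile.TemporaryDirectory() as root:
    Path(root, "app.py").write_text("def load_config():\n    return {}\n", encoding="utf-8")
    Path(root, "notes.txt").write_text("nothing to see here\n", encoding="utf-8")
    hits = grep_tool(root, r"def \w+")
print(hits)
```

The cap on results matters: an uncapped search is just context bombing by another route.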
Principle 5: Design for Multi-Agent Collaboration from Day One
Many teams only consider single-agent scenarios initially. When they need multiple sub-agents to collaborate, they discover all state management needs to be rebuilt.
Claude Code evolved from "todos" to "Tasks"—Tasks support dependency relationships, cross-sub-agent state sharing, and dynamic modification. This wasn't a small change; it was an architecture overhaul.
Actionable rule: If your agent has any possibility of spawning sub-agents, design your data structures for multi-agent state from day one.
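As a rough sketch of what "multi-agent state" means in data-structure terms, the example below models tasks with dependencies, an owner, and a shared board. The names `Task`, `TaskBoard`, and the status values are hypothetical; the original article does not describe the internals of Claude Code's Tasks.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Task:
    id: str
    description: str
    depends_on: list[str] = field(default_factory=list)
    status: str = "pending"      # pending | in_progress | done
    owner: Optional[str] = None  # which sub-agent claimed the task


class TaskBoard:
    """Shared state that any sub-agent can read and update."""

    def __init__(self) -> None:
        self.tasks: dict[str, Task] = {}

    def add(self, task: Task) -> None:
        self.tasks[task.id] = task

    def ready(self) -> list[str]:
        """Pending tasks whose dependencies are all done."""
        return [
            t.id
            for t in self.tasks.values()
            if t.status == "pending"
            and all(self.tasks[d].status == "done" for d in t.depends_on)
        ]

    def claim(self, task_id: str, agent: str) -> None:
        task = self.tasks[task_id]
        task.status, task.owner = "in_progress", agent

    def complete(self, task_id: str) -> None:
        self.tasks[task_id].status = "done"


board = TaskBoard()
board.add(Task("t1", "Write the parser"))
board.add(Task("t2", "Test the parser", depends_on=["t1"]))
first = board.ready()
board.claim("t1", agent="sub-agent-a")
board.complete("t1")
second = board.ready()
print(first, second)
```

A flat todo list has none of `depends_on`, `owner`, or a shared store, which is why retrofitting them later amounts to a rewrite.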
Principle 6: Measure Both "Correctness" and "Affinity"
A tool that never gets called has zero value, no matter how well-implemented. Claude's "affinity" (natural tendency to invoke it) varies dramatically across tools.
Factors affecting affinity: tool name, parameter naming, description wording, and even position in the tool list.
Testing method: Run the agent on 20 different tasks and track each tool's invocation frequency. Any tool invoked at less than 10% of its expected call rate needs its interface or description redesigned.
Actionable rule: When evaluating tools, simultaneously track output quality and invocation frequency. Optimize both metrics.
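The testing method above fits in a few lines. This sketch assumes you already have a flat log of tool names invoked across the evaluation tasks and your own estimate of how often each tool should have been called; `low_affinity_tools` and the sample numbers are illustrative.

```python
from collections import Counter


def low_affinity_tools(
    call_log: list[str],
    expected_calls: dict[str, int],
    threshold: float = 0.10,
) -> list[str]:
    """Flag tools invoked at less than `threshold` of their expected call count."""
    counts = Counter(call_log)
    return [
        tool
        for tool, expected in expected_calls.items()
        if expected > 0 and counts[tool] / expected < threshold
    ]


# Hypothetical results from a 20-task run.
call_log = ["grep"] * 18 + ["ask_user"] * 1
expected_calls = {"grep": 20, "ask_user": 20, "web_fetch": 5}

flagged = low_affinity_tools(call_log, expected_calls)
print(flagged)
```

Here `ask_user` (1 of 20 expected calls) and `web_fetch` (0 of 5) fall under the 10% line, so their names and descriptions are the first things to revisit.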
Principle 7: Fewer Tools, Each One Deep
Claude Code currently uses about 20 tools, which is considered the upper limit for production-grade agent systems. Each additional tool adds another option the model has to reason about, so more tools tends to mean worse performance on real tasks.
Before adding a new tool, ask three questions:
- Can progressive disclosure solve this? (Usually yes)
- Can an existing tool be extended? (Prefer this)
- Does this scenario occur more than 10% of the time? (If not, delegate to a sub-agent)
Actionable rule: Set a hard cap on your tool count. Force yourself to find more elegant solutions before adding new tools.
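The three questions can be written down as an explicit gate, which is handy in design reviews. This is only a sketch of the checklist above; the function name and return strings are mine.

```python
def new_tool_decision(
    solvable_by_disclosure: bool,
    extendable_existing: bool,
    scenario_rate: float,
) -> str:
    """Apply the three questions in order; scenario_rate is a fraction of tasks (0.0 to 1.0)."""
    if solvable_by_disclosure:
        return "use progressive disclosure"
    if extendable_existing:
        return "extend an existing tool"
    if scenario_rate <= 0.10:
        return "delegate to a sub-agent"
    return "add a new tool"


decision = new_tool_decision(
    solvable_by_disclosure=False,
    extendable_existing=False,
    scenario_rate=0.04,
)
print(decision)
```

Only a scenario that fails all three checks earns a slot under the tool cap.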
One Point Worth Questioning
The Claude Code team's switch from RAG to grep, claiming "let Claude search for itself" works better, deserves closer examination.
Grep is powerful for exact matches but helpless for semantically related queries, where the wording of the question differs from the wording in the code. They compensate with sub-agents, but this adds latency.
The real answer might be a hybrid approach: grep for exact lookups, vector search for semantic association. Not either/or, but dynamically choosing based on query type.
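A hybrid router does not need to be sophisticated to be useful. The sketch below uses a crude heuristic of my own: quoted strings and identifier-looking queries go to grep, everything else goes to vector search. The retriever names and the heuristic itself are assumptions, and a real system would likely need something better tuned.

```python
import re

# Matches a single token that could be a code identifier or dotted path.
IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_.]*$")


def choose_retriever(query: str) -> str:
    """Return 'grep' for exact-lookup style queries, 'vector' for natural language."""
    q = query.strip()
    # Quoted queries signal that the user wants a literal match.
    if len(q) >= 2 and q[0] == q[-1] and q[0] in "\"'":
        return "grep"
    # snake_case, dotted.path, or camelCase tokens look like code identifiers.
    looks_like_code = "_" in q or "." in q or any(c.isupper() for c in q[1:])
    if IDENTIFIER.match(q) and looks_like_code:
        return "grep"
    return "vector"


routes = [
    choose_retriever("AskUserQuestion"),
    choose_retriever("parse_config"),
    choose_retriever('"connection refused"'),
    choose_retriever("where do we handle login retries"),
]
print(routes)
```

In an agent setting, the same choice can also be left to the model by exposing both retrievers as separate tools and describing when each one fits.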
This is an area their article doesn't fully explore—and it's a gap we've observed in real-world systems.
Summary: The Seven Principles
- Version-manage tools alongside model capability upgrades
- Use schemas for structured output, not natural language constraints
- Progressive disclosure, not context bombing
- Give tools instead of answers—let the model find its own
- Design state management for multi-agent from day one
- Simultaneously optimize correctness and invocation affinity
- Fewer and deeper—set a hard tool count ceiling
The most important quote (from the original article):
"Experiment often, read your outputs, try new things. See like an agent."
Tool design isn't a one-time engineering decision. It's a continuously evolving process. Build feedback loops, then keep running them.
Analysis based on Claude Code engineer Thariq's original article, combined with hands-on experience building multi-agent systems.