The future of AI agents relies on their ability to seamlessly and efficiently integrate with vast libraries of tools—from internal databases and company-specific APIs to public services like GitHub and Slack. As professional developers, we know that scaling an agent from five tools to fifty or five hundred introduces critical bottlenecks: context bloat, slow execution, and unpredictable tool invocation.
To solve these challenges, we are excited to introduce three advanced features on the Claude Developer Platform that fundamentally change how agents discover, orchestrate, and utilize external capabilities. These features move Claude from simple sequential function calling to intelligent, programmatic orchestration.
1. Tool Search Tool: Dynamic Discovery for Infinite Scale
When building agents that connect to many services (e.g., a five-server MCP setup with dozens of tools), token overhead quickly becomes a critical limitation.
The Challenge: Context Bloat and Accuracy
Traditional tool use requires loading all tool definitions into Claude's context window upfront. For a large multi-server environment (e.g., GitHub, Slack, Jira, Sentry, Splunk), tool schemas can easily consume 50,000+ tokens before the agent even begins its work. This limits space for conversation history and reasoning, increasing cost and reducing performance. Furthermore, overwhelming Claude with dozens of similar-sounding tool names leads to frequent selection errors.
The Solution: On-Demand Tool Discovery
The Tool Search Tool solves this by letting Claude discover and load tools only when they are needed for the current task.
Developers mark less critical tools with `defer_loading: true` in the API call. Claude initially sees only the lightweight Tool Search Tool itself and any frequently used tools. When Claude determines it needs a specific capability (e.g., "create a pull request"), it searches the tool definitions, and only the relevant matches (e.g., `github.createPullRequest`) are loaded into the context.
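As a rough sketch, a request using deferred loading might be assembled like this. The tool-search tool type string (`tool_search_tool_20251119`) and the exact shape of the `defer_loading` flag are assumptions about the current beta; check the official docs for the identifiers in force.

```python
# Sketch: building a tool list where most tools are deferred.
# The search-tool type string and "defer_loading" flag follow the feature
# description above; exact field names may differ in the current beta.

def build_tools(frequent, deferred):
    """Return a tool list: a search tool, hot tools, then deferred tools."""
    tools = [{"type": "tool_search_tool_20251119", "name": "tool_search_tool"}]
    tools += frequent  # always loaded into context upfront
    for tool in deferred:
        # Deferred tools are only loaded when a search surfaces them.
        tools.append({**tool, "defer_loading": True})
    return tools

hot = [{"name": "get_user", "description": "Look up a user record",
        "input_schema": {"type": "object", "properties": {}}}]
cold = [{"name": "github_create_pull_request", "description": "Open a PR",
         "input_schema": {"type": "object", "properties": {}}}]

tools = build_tools(hot, cold)
```

Only the first two entries cost context tokens on every request; the deferred definitions stay out of the window until Claude actually searches for them.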
Real-Life Use Case and Benefits
- Massive Token Savings: In internal testing, this feature resulted in an 85% reduction in token usage on average compared to the traditional approach, preserving hundreds of thousands of tokens for reasoning and history.
- Improved Accuracy at Scale: By only presenting Claude with the most relevant tools, selection accuracy significantly improves, especially when dealing with large tool libraries. Internal benchmarks showed accuracy improvements from 79.5% to 88.1% on complex, multi-server evaluations.
- Scale Without Cost: You can now connect your agent to hundreds of tools across dozens of microservices (e.g., a vast internal API ecosystem) without incurring a massive token penalty on every request.
2. Programmatic Tool Calling (PTC): Efficient Code Orchestration
For complex, multi-step workflows involving data processing, the traditional method of making sequential tool calls—with each result returning to Claude's context—is slow and inefficient.
The Challenge: Context Pollution and Inference Overhead
- Context Pollution: When processing large data (e.g., 10MB log file analysis or retrieving 2,000+ expense line items), the raw, intermediate results flood Claude’s context window. This pushes out critical information and forces Claude to use precious context space for raw data processing, rather than high-level reasoning.
- Inference Overhead: Each tool call requires a full API round-trip and a new model inference pass to process the result and decide the next step. A 5-tool workflow involves 5 inference passes, leading to high latency.
The Solution: Code-Driven Execution
Programmatic Tool Calling enables Claude to orchestrate complex operations by writing a Python script that runs in a sandboxed execution environment. Instead of the model seeing all intermediate results, the script handles the logic: loops, conditionals, data transformations, and parallel execution.
The script calls your tools (get_team_members, get_expenses, etc.), and the results are routed directly to the running code environment, bypassing Claude's context window. Only the final, processed output from the script is returned to Claude for the final response generation.
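A minimal sketch of what enabling this might look like at the request level, assuming the beta identifiers: the code-execution tool type string (`code_execution_20250825`) and the `allowed_callers` field are assumptions drawn from the description above, not guaranteed API names.

```python
# Sketch: a request enabling programmatic tool calling. Marking a tool as
# callable from the code-execution sandbox routes its results to the
# running script instead of Claude's context window.

CODE_EXEC = "code_execution_20250825"  # assumed beta tool type string

def make_ptc_request(user_prompt, business_tools):
    tools = [{"type": CODE_EXEC, "name": "code_execution"}]
    for tool in business_tools:
        # "allowed_callers" declares the sandbox may invoke this tool.
        tools.append({**tool, "allowed_callers": [CODE_EXEC]})
    return {
        "model": "claude-sonnet-4-5",
        "max_tokens": 4096,
        "messages": [{"role": "user", "content": user_prompt}],
        "tools": tools,
    }

req = make_ptc_request(
    "Which team members exceeded their Q3 travel budget?",
    [{"name": "get_team_members", "description": "List team members",
      "input_schema": {"type": "object", "properties": {}}},
     {"name": "get_expenses", "description": "List expense line items",
      "input_schema": {"type": "object", "properties": {}}}],
)
```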
Real-Life Use Case and Benefits
- The Budget Compliance Check: Imagine determining which team members exceeded their Q3 travel budget. PTC allows Claude to write a script that:
  - Fetches team members and budgets in parallel.
  - Fetches all expense line items (e.g., 2,000 records).
  - Sums the expenses, compares them to the budget limit, and filters the results—all within the code sandbox.
  - Claude only sees the final list of the 2-3 members who exceeded their budget, reducing context consumption from ~200KB of raw data to ~1KB of results.
- Reduced Latency: By orchestrating multiple tool calls within a single execution block, you eliminate most inference passes (e.g., 19+ passes saved in a 20-tool workflow), leading to significant latency improvements.
- Token & Accuracy Gains: Internal testing showed a 37% reduction in token consumption on complex research tasks and improved accuracy (e.g., GAIA benchmark scores improved from 46.5% to 51.2%) due to explicit, programmatic control flow.
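The budget-check workflow above can be sketched as the kind of script Claude might write inside the sandbox. The `get_team_members` and `get_expenses` stubs and their data are purely illustrative stand-ins for the real tool bindings:

```python
# Illustrative stand-ins for the real tool bindings the sandbox exposes.
def get_team_members(team_id):
    return [
        {"id": "USR-1", "name": "Ana", "travel_budget": 5000},
        {"id": "USR-2", "name": "Ben", "travel_budget": 3000},
    ]

def get_expenses(user_id, quarter):
    ledger = {
        "USR-1": [1200, 900, 3100],  # totals 5200: over budget
        "USR-2": [400, 800],         # totals 1200: within budget
    }
    return ledger.get(user_id, [])

def over_budget(team_id, quarter="Q3"):
    # Loop, sum, and filter entirely in code; only this small final
    # result would be returned to Claude's context.
    flagged = []
    for member in get_team_members(team_id):
        spent = sum(get_expenses(member["id"], quarter))
        if spent > member["travel_budget"]:
            flagged.append({"name": member["name"], "spent": spent,
                            "budget": member["travel_budget"]})
    return flagged

print(over_budget("growth"))
```

The 2,000-record ledger never touches the model's context; Claude sees only the short `flagged` list.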
3. Tool Use Examples: Defining Usage Patterns for Reliability
JSON Schema is necessary for defining the structural requirements of a tool, but it's often insufficient for defining how the tool should be used in practice.
The Challenge: Ambiguity and Parameter Errors
Schema cannot express critical usage conventions:
- Format Ambiguity: What date format should `due_date` use (`YYYY-MM-DD` vs. `MM/DD/YYYY`)?
- Correlated Parameters: How do `priority` level and `escalation.sla_hours` relate?
- Usage Conventions: When should optional, nested structures like `reporter.contact` be included?
These ambiguities are a primary source of malformed tool calls and inconsistent agent behavior.
The Solution: Concrete Usage Demonstration
Tool Use Examples allow you to include one or more concrete, successful tool invocation objects (input_examples) directly in your tool definition.
By providing examples of successful calls—including a minimal call, a partial call, and a full, complex call—you demonstrate the API's expected behavior and conventions.
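A minimal sketch of such a definition, assuming an `input_examples` array alongside the schema as described above; the specific ticket fields and example values are illustrative, not part of any real API:

```python
# Sketch: a create_ticket tool definition carrying input_examples that
# demonstrate conventions the JSON Schema alone cannot express.

create_ticket = {
    "name": "create_ticket",
    "description": "Open a support ticket.",
    "input_schema": {
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "due_date": {"type": "string"},
            "priority": {"type": "string"},
            "escalation": {"type": "object"},
            "reporter": {"type": "object"},
        },
        "required": ["title"],
    },
    "input_examples": [
        # Minimal call: only the required field.
        {"title": "Password reset loop"},
        # Partial call: shows the YYYY-MM-DD date convention.
        {"title": "Slow dashboard",
         "due_date": "2025-07-01",
         "priority": "medium"},
        # Full call: critical priority always populates escalation.
        {"title": "Checkout outage",
         "due_date": "2025-06-15",
         "priority": "critical",
         "escalation": {"sla_hours": 4},
         "reporter": {"id": "USR-12345",
                      "contact": {"email": "ops@example.com"}}},
    ],
}
```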
Real-Life Use Case and Benefits
- API Consistency: For a complex `create_ticket` API, examples can teach Claude:
  - To use `YYYY-MM-DD` for dates.
  - To follow a specific ID convention (e.g., "USR-12345").
  - That `critical` priority always requires populating the full `escalation` object.
- Increased Reliability: Claude learns the "intent" behind the schema, leading to more reliable and consistent parameter population, especially for optional fields and nested objects.
Strategic Layering for Optimal Agent Architecture
These three features are designed to be complementary, allowing you to layer them strategically to solve your specific agent bottlenecks:
| Bottleneck | Feature to Use | Benefit |
|---|---|---|
| Context bloat from tool definitions | Tool Search Tool | Dynamic discovery preserves context and improves accuracy at scale. |
| Large intermediate results polluting context | Programmatic Tool Calling | Code orchestration ensures only actionable final results enter context. |
| Parameter errors and malformed calls | Tool Use Examples | Demonstrates usage patterns for reliable and consistent invocation. |
By adopting dynamic discovery, code-based orchestration, and usage-pattern learning, you can build truly scalable, accurate, and efficient AI agents capable of mastering even the most complex enterprise workflows. These features are available today in beta for professional developers.
For more technical details and implementation guides, you can read the original article: Introducing advanced tool use on the Claude Developer Platform.