Introduction
On November 24, 2025, Anthropic officially released the Programmatic Tool Calling (PTC) feature, which allows Claude to orchestrate tool execution through code rather than through one API round-trip per tool call. This innovation is considered a major breakthrough in Agent development: it significantly reduces token consumption and latency while improving accuracy.
However, as the creator of the minion framework, I'd like to share an interesting fact: minion has embraced this architectural philosophy from the very beginning. Before the PTC concept was formally introduced, minion had already proven the value of this approach in production environments.
What Problems Does PTC Solve?
Anthropic's blog post highlighted two core problems with traditional Tool Calling:
1. Context Pollution
In the traditional approach, the results of every tool call return to the LLM's context. For example, when analyzing a 10MB log file, the entire file content enters the context window, even if the LLM only needs a summary of error frequencies.
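To make the contrast concrete, here is a minimal sketch of the alternative: code scans the log and hands back only an error-frequency summary. The log line format and the `summarize_errors` helper are assumptions for illustration, not part of PTC or minion.

```python
from collections import Counter

def summarize_errors(log_lines):
    """Count ERROR lines by message so only a small summary reaches the LLM."""
    counts = Counter()
    for line in log_lines:
        # Assumed format: "<timestamp> <LEVEL> <message>"
        parts = line.split(" ", 2)
        if len(parts) == 3 and parts[1] == "ERROR":
            counts[parts[2]] += 1
    # Most frequent errors first
    return dict(counts.most_common())

sample_log = [
    "2025-11-24T10:00:00 INFO service started",
    "2025-11-24T10:00:01 ERROR db timeout",
    "2025-11-24T10:00:02 ERROR db timeout",
    "2025-11-24T10:00:03 ERROR cache miss storm",
]
summary = summarize_errors(sample_log)
print(summary)
```

However large the log is, the summary grows only with the number of distinct error messages, which is what keeps the context window small.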
2. Reasoning Overhead and Manual Synthesis
Every tool call requires a complete model inference pass. The LLM must parse the data "by eye", extract the relevant information, reason about how the pieces fit together, and then decide the next step. The process is both slow and error-prone.
Minion's Solution: Native PTC Architecture
The minion framework adopted a fundamentally different architecture from the start: the LLM focuses on planning and decision-making, while actual execution is delegated to the code environment.
Core Design Philosophy
# Minion's typical workflow
1. LLM analyzes user requirements, creates execution plan
2. LLM generates Python code to orchestrate tool calls
3. Code executes in isolated environment, handles all data operations
4. Only final results return to LLM
This is exactly what PTC aims to achieve, but minion has it as foundational architecture rather than an optional feature.
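The four workflow steps can be sketched in a few lines. This is a toy illustration, not minion's actual implementation: `fake_llm_generate_code` stands in for a real model call, the tool is a stub, and plain `exec` is used where a real system would use an isolated sandbox.

```python
import asyncio

async def get_team_members(department):
    """Stub tool; a real tool would query a service."""
    return [{"name": "Ana", "level": "senior"}, {"name": "Bo", "level": "junior"}]

def fake_llm_generate_code(task):
    """Stand-in for the LLM: returns orchestration code as a string."""
    return (
        "async def run(tools):\n"
        "    team = await tools['get_team_members']('engineering')\n"
        "    return {'headcount': len(team)}\n"
    )

async def execute_plan(task):
    code = fake_llm_generate_code(task)      # steps 1-2: the LLM plans and writes code
    namespace = {}
    exec(code, namespace)                    # step 3: code runs outside the LLM (no sandboxing here)
    tools = {"get_team_members": get_team_members}
    return await namespace["run"](tools)     # step 4: only the final result is returned

result = asyncio.run(execute_plan("How many engineers are there?"))
print(result)
```

The intermediate `team` list never leaves `run`; only the small result dictionary would be placed back into the model's context.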
Practical Case Comparison
Let's look at the budget compliance check example from Anthropic's blog:
Task: Find team members who exceeded Q3 travel budget
Traditional Tool Calling approach:
- Get team members → 20 people
- Get Q3 expenses for each → 20 tool calls, each returning 50-100 expense items
- Get budget limits for each level
- All data enters context: 2000+ expense records (50KB+)
- LLM manually sums each person's expenses, looks up budgets, compares overages
With PTC:
- Claude writes a Python script to orchestrate the entire flow
- Script runs in Code Execution environment
- LLM only sees final result: 2-3 over-budget employees
In Minion, this pattern is the default behavior; the agent generates code like the following:
# Implementation in Minion (pseudocode)
import asyncio

async def check_budget_compliance():
    # LLM-generated orchestration code
    team = await get_team_members("engineering")
    # Parallel data fetching: one budget lookup per distinct level
    levels = list(set(m["level"] for m in team))
    budget_list = await asyncio.gather(*(get_budget_by_level(level) for level in levels))
    budgets = dict(zip(levels, budget_list))
    # Data processing happens locally
    exceeded = []
    for member in team:
        expenses = await get_expenses(member["id"], "Q3")
        total = sum(e["amount"] for e in expenses)
        budget = budgets[member["level"]]
        if total > budget["travel_limit"]:
            exceeded.append({
                "name": member["name"],
                "spent": total,
                "limit": budget["travel_limit"],
            })
    return exceeded  # Only return key results
The key difference:
- Minion: This is the framework's core design; all complex tasks are handled this way
- PTC: Requires explicit enablement, marking which tools allow programmatic calling
Minion's Advantages: Going Further
Minion not only implements PTC's core philosophy but provides additional advantages:
1. Complete Python Ecosystem
Minion's code execution environment has full Python ecosystem access:
# Minion can directly use any Python library
import pandas as pd
import numpy as np
from sklearn.cluster import KMeans
# Powerful data processing (expense_data: records with 'category' and 'amount' fields)
df = pd.DataFrame(expense_data)
analysis = df.groupby('category')['amount'].agg(['sum', 'mean', 'std', 'count'])
# Complex data science tasks (spending_patterns: numeric feature matrix)
model = KMeans(n_clusters=3)
clusters = model.fit_predict(spending_patterns)
2. State Management and Persistence
Minion naturally supports complex state management:
class BudgetAnalyzer:
    def __init__(self):
        self.cache = {}
        self.history = []

    async def analyze_department(self, dept):
        # State persists throughout analysis
        if dept in self.cache:
            return self.cache[dept]
        result = await self._deep_analysis(dept)
        self.cache[dept] = result
        self.history.append(result)
        return result
3. Error Handling and Retry Logic
Explicitly handle edge cases in code:
import asyncio

# RateLimitError and DataNotFoundError are assumed to be raised by the tool layer
async def robust_fetch(user_id, max_retries=3):
    for attempt in range(max_retries):
        try:
            return await get_expenses(user_id, "Q3")
        except RateLimitError:
            await asyncio.sleep(2 ** attempt)  # Exponential backoff: 1s, 2s, 4s
        except DataNotFoundError:
            return []  # Reasonable default
    raise RuntimeError(f"Failed after {max_retries} attempts")
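For readers who want to run the retry pattern end to end, here is a self-contained sketch. The two exception classes, the flaky `get_expenses` stub, and the `base_delay` parameter are all assumptions made for the demo; a real tool layer would define its own errors.

```python
import asyncio

class RateLimitError(Exception):
    """Hypothetical error raised when the tool is throttled."""

class DataNotFoundError(Exception):
    """Hypothetical error raised when no records exist."""

calls = {"count": 0}

async def get_expenses(user_id, quarter):
    """Stub tool: rate-limited on the first two calls, then succeeds."""
    calls["count"] += 1
    if user_id == "missing":
        raise DataNotFoundError(user_id)
    if calls["count"] <= 2:
        raise RateLimitError("slow down")
    return [{"amount": 120.0}]

async def robust_fetch(user_id, max_retries=3, base_delay=0.01):
    for attempt in range(max_retries):
        try:
            return await get_expenses(user_id, "Q3")
        except RateLimitError:
            # Exponential backoff: base_delay, 2x, 4x, ...
            await asyncio.sleep(base_delay * 2 ** attempt)
        except DataNotFoundError:
            return []  # Treat "no data" as an empty result
    raise RuntimeError(f"Failed after {max_retries} attempts")

expenses = asyncio.run(robust_fetch("u1"))
missing = asyncio.run(robust_fetch("missing"))
print(expenses, missing, calls["count"])
```

The first fetch succeeds on its third attempt after two simulated rate limits; the second returns the empty default immediately.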
4. Parallel and Async Operations
Fully leverage Python's async capabilities:
# Efficient parallel processing
async def analyze_all_departments():
    departments = ["eng", "sales", "marketing", "ops"]
    # Analyze all departments simultaneously
    results = await asyncio.gather(*[
        analyze_department(dept)
        for dept in departments
    ])
    # Consolidate analysis results
    return consolidate_results(results)
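A runnable variant with a stubbed `analyze_department` shows the effect of `asyncio.gather`. The 0.1-second sleep is an arbitrary stand-in for I/O latency, and the stub's return shape is an assumption.

```python
import asyncio
import time

async def analyze_department(dept):
    """Stub analysis that simulates 0.1 s of I/O wait."""
    await asyncio.sleep(0.1)
    return {"dept": dept, "status": "ok"}

async def analyze_all(departments):
    # gather schedules every coroutine at once and preserves input order
    return await asyncio.gather(*(analyze_department(d) for d in departments))

start = time.perf_counter()
results = asyncio.run(analyze_all(["eng", "sales", "marketing", "ops"]))
elapsed = time.perf_counter() - start
print([r["dept"] for r in results], f"{elapsed:.2f}s")
```

Because the four waits overlap, the total wall time should be close to a single 0.1-second call rather than the 0.4 seconds a sequential loop would take.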
Performance Comparison
According to Anthropic's internal testing, PTC brings significant improvements:
- Token savings: Complex research tasks dropped from 43,588 to 27,297 tokens (37% reduction)
- Latency reduction: Eliminated multiple model inference round-trips
- Accuracy improvement:
  - Internal knowledge retrieval: 25.6% → 28.5%
  - GIA benchmark: 46.5% → 51.2%
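As a quick sanity check, the percentages follow directly from the figures quoted above; nothing here is new data.

```python
tokens_before, tokens_after = 43_588, 27_297
reduction_pct = (tokens_before - tokens_after) / tokens_before * 100

retrieval_gain = round(28.5 - 25.6, 1)  # percentage points
gia_gain = round(51.2 - 46.5, 1)        # percentage points

print(f"Token reduction: {reduction_pct:.1f}%")
print(f"Retrieval: +{retrieval_gain} pts, GIA: +{gia_gain} pts")
```

The token reduction works out to about 37.4%, consistent with the rounded 37% figure, and the accuracy gains are 2.9 and 4.7 percentage points respectively.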
In minion's production use, we observe similar or better metrics because:
- Fewer model calls: LLM only participates in planning and final summarization
- More efficient resource utilization: Local data processing doesn't consume API tokens
- More predictable performance: Clear code execution paths reduce LLM uncertainty
Architectural Philosophy: Who Should Do What?
Minion's design is based on a core belief:
LLMs excel at understanding, planning, and reasoning; Python excels at execution, processing, and transformation.
This separation of responsibilities creates a clear architecture:
User Request
↓
[LLM: Understand intent, create plan]
↓
[Generate Python code]
↓
[Code Execution Environment: Call tools, process data, control flow]
↓
[Return structured results]
↓
[LLM: Interpret results, generate user-friendly response]
This isn't just optimization—it's an architectural-level rethinking.
Tool Search Tool: Minion's Dynamic Tool Discovery
Another new feature from Anthropic is the Tool Search Tool, which addresses context consumption with large tool libraries. Minion also has corresponding mechanisms:
Layered Tool Exposure
# Minion's tool layering strategy
class MinionToolRegistry:
    def __init__(self):
        self.core_tools = []    # Always loaded
        self.domain_tools = {}  # Loaded on demand
        self.rare_tools = {}    # Discovered via search

    def get_tools_for_task(self, task_description):
        # Intelligent tool selection
        tools = self.core_tools.copy()
        # Add relevant tools based on task description
        # (.get avoids a KeyError when a domain has no registered tools)
        if "database" in task_description:
            tools.extend(self.domain_tools.get("database", []))
        if "visualization" in task_description:
            tools.extend(self.domain_tools.get("plotting", []))
        return tools
Vector Search Tool Discovery
# Tool search using embeddings
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

class SemanticToolSearch:
    def __init__(self, tool_descriptions):
        self.model = SentenceTransformer('all-MiniLM-L6-v2')
        self.tool_descriptions = list(tool_descriptions)
        self.tool_embeddings = self.model.encode(self.tool_descriptions)

    def find_tools(self, query, top_k=5):
        query_embedding = self.model.encode([query])
        similarities = cosine_similarity(query_embedding, self.tool_embeddings)[0]
        # Indices of the top_k most similar descriptions, best match first
        ranked = similarities.argsort()[::-1][:top_k]
        return [self.tool_descriptions[i] for i in ranked]
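The embedding-based search above depends on a downloaded model. As a dependency-free sketch of the same ranking idea, the version below swaps the sentence encoder for simple bag-of-words counts; this is a deliberately cruder technique, and the tool names and descriptions are invented for the example.

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a sentence encoder)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def find_tools(query, tools, top_k=2):
    """Return the names of the top_k tools whose descriptions best match the query."""
    q = embed(query)
    ranked = sorted(tools, key=lambda name: cosine(q, embed(tools[name])), reverse=True)
    return ranked[:top_k]

tools = {
    "create_pull_request": "create a github pull request for a branch",
    "query_database": "run a sql query against the database",
    "plot_chart": "plot a chart from tabular data",
}
matches = find_tools("open a pull request on github", tools, top_k=1)
print(matches)
```

Only the matched tool definitions would then be loaded into the prompt, which is the point of search-based discovery: the rest of the library stays out of context.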
Real-World Applications: Minion in Production
The minion framework has proven this architecture's value in multiple real-world scenarios:
Case 1: Large-Scale Data Analysis
A fintech company uses minion to analyze millions of transaction records for anomaly patterns:
import pandas as pd

async def detect_anomalies(start_date, end_date):
    # LLM planning: need to fetch data, clean, feature engineering, anomaly detection
    # Execution code directly handles large datasets
    transactions = await fetch_all_transactions(start_date, end_date)
    # 1M+ records, but doesn't enter LLM context
    df = pd.DataFrame(transactions)
    df = clean_data(df)
    features = engineer_features(df)
    # Use ML for anomaly detection
    anomalies = detect_with_isolation_forest(features)
    # Only return anomaly summary to LLM
    return {
        "total_transactions": len(df),
        "anomalies_found": len(anomalies),
        "top_anomalies": anomalies.head(10).to_dict()
    }
Results:
- Processed 1 million records
- LLM consumed only ~5K tokens (traditional approach needs 500K+)
- End-to-end latency: 30 seconds (vs. 5+ minutes traditional)
Case 2: Multi-Source Data Integration
A SaaS company uses minion to integrate customer data from multiple APIs:
async def comprehensive_customer_analysis(customer_id):
    # Parallel fetch from all data sources
    crm_data, support_tickets, usage_logs, billing_history = await asyncio.gather(
        fetch_crm_data(customer_id),
        fetch_support_tickets(customer_id),
        fetch_usage_logs(customer_id),
        fetch_billing_history(customer_id)
    )
    # Local data fusion and analysis
    customer_profile = {
        "health_score": calculate_health_score(...),
        "churn_risk": predict_churn_risk(...),
        "upsell_opportunities": identify_opportunities(...),
        "support_sentiment": analyze_ticket_sentiment(support_tickets)
    }
    return customer_profile
Case 3: Automated Workflows
A DevOps team uses minion to automate complex deployment flows:
async def deploy_with_validation():
    # Multi-step workflow with conditional logic at each step
    # 1. Run tests
    test_results = await run_test_suite()
    if test_results.failed > 0:
        return {"status": "blocked", "reason": "tests failed"}
    # 2. Build and push image
    image = await build_docker_image()
    await push_to_registry(image)
    # 3. Canary deployment
    canary = await deploy_canary(image, percentage=10)
    await asyncio.sleep(300)  # Monitor for 5 minutes
    metrics = await get_canary_metrics(canary)
    if metrics.error_rate > 0.01:
        await rollback_canary(canary)
        return {"status": "rolled_back", "metrics": metrics}
    # 4. Full deployment
    await deploy_full(image)
    return {"status": "success", "image": image.tag}
Beyond PTC: Minion's Future Directions
While PTC is an important advancement, minion's architectural design allows us to explore more possibilities:
1. Hybrid Reasoning Modes
Intelligently switch within a session:
# Simple tasks: direct tool calls
if task.complexity < THRESHOLD:
    result = await simple_tool_call(task)
# Complex tasks: generate orchestration code
else:
    orchestration_code = await llm.generate_code(task)
    result = await execute_code(orchestration_code)
2. Incremental Computation and Caching
Intelligently reuse intermediate results:
# Memoized data fetching (results cached in a dict; functools.lru_cache does not work on coroutines)
_user_cache = {}

async def cached_get_user_data(user_id):
    if user_id not in _user_cache:
        _user_cache[user_id] = await fetch_user_data(user_id)
    return _user_cache[user_id]

# Incremental updates instead of full recomputation
async def update_analysis(new_data):
    previous_state = load_checkpoint()
    delta = compute_delta(previous_state, new_data)
    updated_state = apply_delta(previous_state, delta)
    return updated_state
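One caveat when memoizing async functions: `functools.lru_cache` applied to an `async def` caches the coroutine object rather than its result, and a coroutine can be awaited only once. A small decorator that caches the awaited value avoids this. The `async_memoize` name and the stub fetch below are illustrative assumptions, not minion APIs.

```python
import asyncio
import functools

def async_memoize(fn):
    """Cache awaited results (not coroutine objects), keyed by positional args."""
    cache = {}

    @functools.wraps(fn)
    async def wrapper(*args):
        if args not in cache:
            cache[args] = await fn(*args)
        return cache[args]

    return wrapper

fetch_calls = {"n": 0}

@async_memoize
async def fetch_user_data(user_id):
    """Stub fetch that counts how often the backend is actually hit."""
    fetch_calls["n"] += 1
    await asyncio.sleep(0)
    return {"id": user_id}

async def main():
    first = await fetch_user_data("u1")
    second = await fetch_user_data("u1")  # Served from the cache
    return first, second

first, second = asyncio.run(main())
print(first, second, fetch_calls["n"])
```

This sketch has no eviction policy and does not deduplicate concurrent in-flight calls for the same key; a production cache would likely need both.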
3. Multi-Model Collaboration
Different models handle different stages:
# Strong model for planning
plan = await claude_opus.create_plan(user_request)
# Specialized model for code generation
code = await codegen_model.generate(plan)
# Execution and monitoring
result = await execute_with_monitoring(code)
# Fast model for user interaction
response = await claude_haiku.format_response(result)
The Power of Open Source: Community-Driven Innovation
As an open-source project (300+ GitHub stars), minion benefits from community contributions and feedback. This openness brings:
- Rapid iteration: Community discovers issues and use cases, driving quick improvements
- Diverse applications: Users employ minion in scenarios we never imagined
In contrast, while PTC is powerful:
- Requires explicit configuration (allowed_callers, defer_loading, etc.)
- Depends on specific API versions and beta features
- Tightly coupled with Claude's ecosystem
Minion's design principle is provider-agnostic—you can use any LLM backend (Claude, GPT-4, open-source models), and the architectural advantages remain.
Technical Details: Implementation Comparison
Let's dive into implementation details:
PTC's Implementation
# Anthropic's PTC requires specific configuration
{
  "tools": [
    {
      "type": "code_execution_20250825",
      "name": "code_execution"
    },
    {
      "name": "get_team_members",
      "allowed_callers": ["code_execution_20250825"],
      ...
    }
  ]
}
# Claude generates tool call
{
  "type": "server_tool_use",
  "id": "srvtoolu_abc",
  "name": "code_execution",
  "input": {
    "code": "team = get_team_members('engineering')\n..."
  }
}
Minion's Implementation
# Minion's tool definitions are standard Python
class MinionTools:
    @tool
    async def get_team_members(self, department: str):
        """Get all members of a department"""
        return await self.db.query(...)

    @tool
    async def get_expenses(self, user_id: str, quarter: str):
        """Get expense records"""
        return await self.expenses_api.fetch(...)

# LLM generates complete Python functions
async def analyze_budget():
    # Direct tool function calls
    team = await tools.get_team_members("engineering")
    # Full Python language capabilities
    expenses_by_user = {
        member.id: await tools.get_expenses(member.id, "Q3")
        for member in team
    }
    # Arbitrary complexity data processing
    analysis = perform_complex_analysis(expenses_by_user)
    return analysis
Key differences:
- PTC: Tool calls go through special API mechanisms with caller/callee relationships
- Minion: Tools are ordinary Python async functions; LLM generates standard code
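To illustrate what "ordinary Python async functions" can mean in practice, here is one way a `@tool` decorator could be written. This is a hypothetical sketch, not minion's actual decorator: `TOOL_REGISTRY` and `describe_tools` are invented names, and the stub returns canned data.

```python
import asyncio
import inspect

TOOL_REGISTRY = {}

def tool(fn):
    """Register an async function as a tool and record its signature and docstring."""
    TOOL_REGISTRY[fn.__name__] = {
        "fn": fn,
        "signature": str(inspect.signature(fn)),
        "doc": inspect.getdoc(fn) or "",
    }
    return fn  # The function itself is left unchanged

@tool
async def get_team_members(department: str):
    """Get all members of a department"""
    return [{"id": "u1", "name": "Ana"}] if department == "engineering" else []

def describe_tools():
    """Render the tool list that could be placed in the LLM prompt."""
    return [f"{name}{meta['signature']}: {meta['doc']}" for name, meta in TOOL_REGISTRY.items()]

descriptions = describe_tools()
team = asyncio.run(TOOL_REGISTRY["get_team_members"]["fn"]("engineering"))
print(descriptions, team)
```

Because the decorator returns the original function, generated code can call the tool directly with `await`, while the registry supplies the signature and docstring the model needs in order to use it.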
Why This Architecture Matters
As AI Agents move toward production, the core challenges we face are:
- Scale: Processing millions of records can't all fit in context
- Reliability: Production systems need deterministic error handling
- Cost: Token consumption directly impacts commercial viability
- Performance: User experience demands sub-second responses
Traditional single tool call patterns hit bottlenecks on all these dimensions. Code orchestration patterns (whether PTC or minion) provide breakthroughs:
Traditional: LLM <-> Tool <-> LLM <-> Tool <-> LLM
(slow) (expensive) (fragile)
Orchestration: LLM -> [Code: Tool+Tool+Tool+Processing] -> LLM
(fast) (economical) (reliable)
1. Validated Architecture
PTC's release supports our architectural choices: this is not a speculative design but a conclusion that an industry leader reached independently.
2. First-Mover Advantage
Before PTC became an official feature, minion had already accumulated experience and best practices in production environments.
3. Broader Applicability
- Supports multiple LLM backends (Claude, GPT-4, open-source models)
- Flexible deployment options (cloud, local, hybrid)
- Rich Python ecosystem integration
4. Community and Ecosystem
300+ stars represent not just recognition but a potential user base and contributor community.
Conclusion: The Inevitable Architectural Convergence
Anthropic releasing PTC wasn't accidental—it's the inevitable direction of agent architecture evolution. When you need to build production-grade agents that handle complex tasks, large-scale data, and multi-step workflows, you naturally arrive at this conclusion:
LLMs should focus on what they're good at (understanding and planning), letting code handle what it's good at (execution and transformation).
Minion embraced this philosophy from the beginning and will continue pushing this direction:
- ✅ Today: Complete PTC-style architecture, production-validated
- 🚀 Tomorrow: Smarter tool discovery, more efficient state management
- 🌟 Future: Hybrid reasoning, incremental computation, multi-model collaboration
If you're building AI agents that need to handle real-world complexity, I invite you to:
- Try minion: GitHub Repository
- Join the discussion: Share your use cases and feedback
- Participate in the community: Contribute code, documentation, ideas
This isn't about who thought of a feature first—it's about collectively pushing AI agent architecture in the right direction. PTC's release is good news for the entire ecosystem—it validates this path and will attract more developers to explore the potential of programmatic orchestration.
Let's build the next generation of AI agents together.
Related Resources
Video Demos
- PTC Example - Expense Tracking: https://youtu.be/hDAIB0sF7-k
- Tool Search Tool Example - Create GitHub PR: https://youtu.be/G7dDvza9PO8
Documentation
- Advanced Tool Use Guide: https://github.com/femto/minion/blob/main/docs/advanced_tool_use.md
Contact:
- GitHub: minion framework
- GitHub: minion-agent framework