More tools. More capabilities. Better agent.
Right?
Wrong.
Your 50-tool agent is confused, slow, and picks the wrong tool half the time.
Here's why less is more.
The tool accumulation problem
It starts innocently:
# Week 1: Basic tools
tools:
- read_file
- write_file
- run_command
Then requirements come in:
# Week 4: "We need more specific tools"
tools:
- read_file
- read_file_lines
- read_file_head
- read_file_tail
- read_json_file
- read_yaml_file
- read_csv_file
- write_file
- write_file_append
- write_json_file
- patch_file
- run_command
- run_command_async
- run_command_with_timeout
# ... 36 more tools
50 tools. Maximum capability.
Minimum usefulness.
Problem 1: Decision paralysis
The LLM sees 50 tools. It has to pick one.
User: "Read the config file"
Agent thinking:
- read_file?
- read_file_lines?
- read_json_file? (it might be JSON)
- read_yaml_file? (it might be YAML)
- read_config_file? (there's a specific one?)
- run_command with cat?
Agent: *picks read_yaml_file*
File is actually JSON.
Error.
With 5 tools:
User: "Read the config file"
Agent thinking:
- read_file
Agent: *uses read_file*
Works.
Fewer choices = faster, better decisions.
Problem 2: Context bloat
Every tool costs tokens:
# Minimal tool definition: ~50 tokens
{
  "name": "read",
  "description": "Read a file",
  "parameters": {"path": {"type": "string"}}
}
# Typical tool definition: ~150 tokens
{
  "name": "read_file_with_line_numbers",
  "description": "Read the contents of a file and return with line numbers prefixed to each line. Useful when you need to reference specific lines.",
  "parameters": {
    "path": {
      "type": "string",
      "description": "The path to the file to read"
    },
    "start_line": {
      "type": "integer",
      "description": "Optional starting line number"
    },
    "end_line": {
      "type": "integer",
      "description": "Optional ending line number"
    }
  }
}
Do the math:
5 tools × 100 tokens = 500 tokens (fixed cost per request)
50 tools × 150 tokens = 7,500 tokens (fixed cost per request)
7,500 tokens before the conversation even starts.
At $0.01/1K tokens, that's $0.075 per request just for tool definitions.
1,000 requests = $75 wasted on tool definitions alone.
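To sanity-check the arithmetic, here's a throwaway helper (the token counts and price are the same assumptions as above):

def tool_overhead(num_tools, tokens_per_tool, price_per_1k=0.01, requests=1000):
    # Fixed token cost paid on every request, before any conversation happens
    tokens = num_tools * tokens_per_tool
    return tokens, tokens / 1000 * price_per_1k * requests

print(tool_overhead(5, 100))   # (500, 5.0)   -> $5 per 1,000 requests
print(tool_overhead(50, 150))  # (7500, 75.0) -> $75 per 1,000 requests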
Problem 3: Similar tools confuse models
tools:
  - name: search_files
    description: Search for files by name pattern
  - name: search_content
    description: Search for content within files
  - name: search_code
    description: Search for code patterns
  - name: find_files
    description: Find files matching criteria
  - name: grep
    description: Search text in files
  - name: ripgrep
    description: Fast search in files
Six search tools. The model will:
- Pick the wrong one 40% of the time
- Waste tokens trying multiple tools
- Sometimes give up and ask the user
One search tool:
tools:
  - name: search
    description: Search for text in files. Returns matching lines.
No confusion. Works every time.
Problem 4: Maintenance nightmare
50 tools means:
# 50 tool definitions to maintain
# 50 implementations to keep working
# 50 potential failure points
# 50 things to test
# 50 descriptions to keep accurate
def execute_tool(name, args):
    if name == "read_file":
        ...
    elif name == "read_file_lines":
        ...
    elif name == "read_file_head":
        ...
    # ... 47 more elif statements
When something breaks, good luck finding it.
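With a small toolkit, dispatch stays a flat lookup instead of a branch forest. A minimal sketch (the tool functions are placeholders for your implementations):

TOOLS = {
    "read": read,
    "write": write,
    "search": search,
    "run": run,
}

def execute_tool(name, args):
    # One dict lookup replaces 50 elif branches
    if name not in TOOLS:
        return f"Unknown tool: {name}"
    return TOOLS[name](**args)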
Problem 5: The illusion of specificity
Developers think:
"A specific tool for JSON files will be more reliable than a generic read tool"
Reality:
# "Specific" JSON tool
def read_json_file(path):
with open(path) as f:
return json.load(f)
# Generic read tool
def read_file(path):
with open(path) as f:
return f.read()
# The LLM can parse JSON from read_file output just fine
# It's literally trained on millions of JSON examples
The specificity doesn't help. It just adds another tool to confuse things.
The evidence
I ran an experiment. Same tasks, different tool counts:
Task: "Find all TODO comments and list them"
50-tool agent:
- Attempts: 3.2 average (tried wrong tools first)
- Success rate: 76%
- Tokens used: 4,200 average
5-tool agent:
- Attempts: 1.1 average
- Success rate: 94%
- Tokens used: 1,800 average
The 5-tool agent was faster, cheaper, and more accurate.
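Your numbers will vary by model and task mix, but the comparison is cheap to reproduce. A rough harness, assuming a hypothetical agent.run_task() that reports attempts, success, and tokens:

from statistics import mean

def benchmark(agent, task, trials=20):
    # run_task() is hypothetical; substitute your framework's runner
    runs = [agent.run_task(task) for _ in range(trials)]
    return {
        "attempts": mean(r.attempts for r in runs),
        "success_rate": mean(1 if r.success else 0 for r in runs),
        "tokens": mean(r.tokens for r in runs),
    }

print(benchmark(lean_agent, "Find all TODO comments and list them"))
print(benchmark(bloated_agent, "Find all TODO comments and list them"))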
The right number of tools
For most agents: 4-6 tools.
The universal toolkit
tools:
  - name: read
    description: Read a file's contents
  - name: write
    description: Write content to a file
  - name: search
    description: Search for text in files
  - name: run
    description: Run a shell command
Four tools. Covers 90% of coding tasks.
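If your runtime takes OpenAI-style function definitions, the whole toolkit fits on one screen. A sketch with parameter schemas trimmed to the essentials:

UNIVERSAL_TOOLS = [
    {"name": "read", "description": "Read a file's contents",
     "parameters": {"type": "object",
                    "properties": {"path": {"type": "string"}},
                    "required": ["path"]}},
    {"name": "write", "description": "Write content to a file",
     "parameters": {"type": "object",
                    "properties": {"path": {"type": "string"},
                                   "content": {"type": "string"}},
                    "required": ["path", "content"]}},
    {"name": "search", "description": "Search for text in files",
     "parameters": {"type": "object",
                    "properties": {"query": {"type": "string"}},
                    "required": ["query"]}},
    {"name": "run", "description": "Run a shell command",
     "parameters": {"type": "object",
                    "properties": {"command": {"type": "string"}},
                    "required": ["command"]}},
]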
When you need more
# Add domain-specific tools only
tools:
- read
- write
- search
- run
- query_database # If your agent works with DBs
- call_api # If your agent needs external APIs
Six tools. Covers 98% of tasks.
How to consolidate
Before: File reading explosion
tools:
- read_file
- read_file_lines
- read_file_head
- read_file_tail
- read_json_file
- read_yaml_file
- read_csv_file
- read_binary_file
After: One tool, smart implementation
tools:
  - name: read
    description: Read a file. Supports text, JSON, YAML, CSV. Use 'lines' param for partial reads.
    parameters:
      - name: path
        type: string
        required: true
      - name: lines
        type: string
        description: "Optional. Format: 'start:end' (e.g., '1:50' for first 50 lines)"
def read(path, lines=None):
    with open(path) as f:
        content = f.read()
    if lines:
        start, end = map(int, lines.split(':'))
        content = '\n'.join(content.split('\n')[start-1:end])
    # Label structured formats so the model knows what it's reading
    if path.endswith('.json'):
        return f"[JSON file]\n{content}"
    elif path.endswith(('.yaml', '.yml')):
        return f"[YAML file]\n{content}"
    return content
One tool. Same capabilities. No confusion.
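Every call the eight old tools handled routes through the one entry point:

read("config.json")              # was read_json_file
read("settings.yaml")            # was read_yaml_file
read("data.csv")                 # was read_csv_file
read("app.log", lines="1:50")    # was read_file_head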
Before: Search tool sprawl
tools:
- search_files
- search_content
- search_code
- find_files
- grep_files
- regex_search
After: One search tool
tools:
  - name: search
    description: Search files. Use query for content, pattern for filenames.
    parameters:
      - name: query
        type: string
        description: Text or regex to search for in file contents
      - name: pattern
        type: string
        description: Glob pattern for filenames (e.g., "*.py")
      - name: path
        type: string
        description: "Directory to search in (default: current)"
import subprocess

def search(query=None, pattern=None, path="."):
    if query:
        # Content search (argument lists avoid shell-quoting bugs)
        result = subprocess.run(
            ["rg", query, path, "--max-count=30"],
            capture_output=True, text=True
        )
        return result.stdout or "No matches"
    if pattern:
        # Filename search, capped at 30 results
        result = subprocess.run(
            ["find", path, "-name", pattern],
            capture_output=True, text=True
        )
        matches = result.stdout.splitlines()[:30]
        return "\n".join(matches) or "No files found"
    return "Provide either query or pattern"
Before: Command execution variants
tools:
- run_command
- run_command_async
- run_command_background
- run_command_with_timeout
- run_shell
- execute_script
After: One run tool
tools:
  - name: run
    description: Run a shell command
    parameters:
      - name: command
        type: string
        required: true
      - name: timeout
        type: integer
        description: "Timeout in seconds (default: 30)"
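The implementation behind it is one subprocess call. A minimal sketch honoring the timeout default from the definition above:

import subprocess

def run(command, timeout=30):
    # Timeouts and scripts fold into parameters; no separate tool variants needed
    try:
        result = subprocess.run(
            command, shell=True, capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        return f"Command timed out after {timeout}s"
    return result.stdout + result.stderr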
The consolidation checklist
When reviewing your tools, ask:
- Can these be merged?
  - read_file + read_json + read_yaml → read
  - search_files + search_content → search
- Is this just a parameter?
  - read_file_head → read with a lines parameter
  - run_command_with_timeout → run with a timeout parameter
- Does the LLM need this distinction?
  - grep vs ripgrep → the LLM doesn't care; pick one
  - write_file vs write_file_append → one tool with a mode parameter
- Is this actually used?
  - Log tool usage for a week (a logging sketch follows this list)
  - Delete tools with <5% usage
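For that last check, a few lines of instrumentation are enough. A sketch building on the dict-based dispatcher from earlier:

from collections import Counter

usage = Counter()

def execute_tool(name, args):
    usage[name] += 1  # count every call; review after a week
    return TOOLS[name](**args)

def underused(min_share=0.05):
    # Candidates for deletion: tools below 5% of total calls
    total = sum(usage.values()) or 1
    return [name for name, count in usage.items() if count / total < min_share]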
Building lean with Gantz
With Gantz Run, start minimal:
# gantz.yaml - The lean toolkit
tools:
  - name: read
    description: Read a file
    parameters:
      - name: path
        type: string
        required: true
    script:
      shell: cat "{{path}}"

  - name: write
    description: Write to a file
    parameters:
      - name: path
        type: string
        required: true
      - name: content
        type: string
        required: true
    script:
      shell: |
        cat > "{{path}}" << 'CONTENT'
        {{content}}
        CONTENT

  - name: search
    description: Search for text in files
    parameters:
      - name: query
        type: string
        required: true
    script:
      shell: rg "{{query}}" . --max-count=30 || echo "No matches"

  - name: run
    description: Run a shell command
    parameters:
      - name: command
        type: string
        required: true
    script:
      shell: "{{command}}"
Four tools. Complete agent. Add more only when you have evidence you need them.
The mindset shift
Old thinking:
"What tools might the agent need?"
Builds 50 tools to cover every case
New thinking:
"What's the minimum toolkit that works?"
Builds 5 tools, adds more based on real failures
Summary
| Aspect | 50 Tools | 5 Tools |
|---|---|---|
| Decision speed | Slow (many choices) | Fast (few choices) |
| Accuracy | ~75% right tool | ~95% right tool |
| Token cost | 7,500+ per request | 500 per request |
| Maintenance | Nightmare | Simple |
| Debugging | Hard | Easy |
The best agent isn't the one with the most tools.
It's the one with the right tools.
Start with 5. Add only what you prove you need.
How many tools does your agent have? Have you tried cutting them down?