Gantz AI for Gantz

Why Your 50-Tool Agent is Worse Than a 5-Tool One

More tools. More capabilities. Better agent.

Right?

Wrong.

Your 50-tool agent is confused, slow, and picks the wrong tool half the time.

Here's why less is more.

The tool accumulation problem

It starts innocently:

# Week 1: Basic tools
tools:
  - read_file
  - write_file
  - run_command

Then requirements come in:

# Week 4: "We need more specific tools"
tools:
  - read_file
  - read_file_lines
  - read_file_head
  - read_file_tail
  - read_json_file
  - read_yaml_file
  - read_csv_file
  - write_file
  - write_file_append
  - write_json_file
  - patch_file
  - run_command
  - run_command_async
  - run_command_with_timeout
  # ... 36 more tools

50 tools. Maximum capability.

Minimum usefulness.

Problem 1: Decision paralysis

The LLM sees 50 tools. It has to pick one.

User: "Read the config file"

Agent thinking:
- read_file?
- read_file_lines?
- read_json_file? (it might be JSON)
- read_yaml_file? (it might be YAML)
- read_config_file? (there's a specific one?)
- run_command with cat?

Agent: *picks read_yaml_file*
File is actually JSON.
Error.

With 5 tools:

User: "Read the config file"

Agent thinking:
- read_file

Agent: *uses read_file*
Works.

Fewer choices = faster, better decisions.

Problem 2: Context bloat

Every tool costs tokens:

# Minimal tool definition: ~50 tokens
{
    "name": "read",
    "description": "Read a file",
    "parameters": {"path": {"type": "string"}}
}

# Typical tool definition: ~150 tokens
{
    "name": "read_file_with_line_numbers",
    "description": "Read the contents of a file and return with line numbers prefixed to each line. Useful when you need to reference specific lines.",
    "parameters": {
        "path": {
            "type": "string",
            "description": "The path to the file to read"
        },
        "start_line": {
            "type": "integer",
            "description": "Optional starting line number"
        },
        "end_line": {
            "type": "integer",
            "description": "Optional ending line number"
        }
    }
}

Do the math:

5 tools × 100 tokens = 500 tokens (fixed cost per request)
50 tools × 150 tokens = 7,500 tokens (fixed cost per request)

7,500 tokens before the conversation even starts.

At $0.01/1K tokens, that's $0.075 per request just for tool definitions.

1,000 requests = $75 wasted on tool definitions alone.
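
The arithmetic above fits in a few lines of Python. This is a rough sketch: the function names are made up for illustration, and the flat $0.01 per 1K tokens price is the same assumed rate used above, not any provider's actual pricing.

```python
def tool_overhead_tokens(num_tools: int, tokens_per_tool: int) -> int:
    """Tokens spent on tool definitions in every single request."""
    return num_tools * tokens_per_tool


def overhead_cost(tokens: int, requests: int = 1, price_per_1k: float = 0.01) -> float:
    """Dollar cost of that overhead. price_per_1k is an assumed flat rate."""
    return tokens / 1000 * price_per_1k * requests


lean = tool_overhead_tokens(5, 100)       # 5 tools at ~100 tokens each
bloated = tool_overhead_tokens(50, 150)   # 50 tools at ~150 tokens each

for label, tokens in (("5 tools", lean), ("50 tools", bloated)):
    cost = overhead_cost(tokens, requests=1000)
    print(f"{label}: {tokens} tokens per request, ${cost:.2f} per 1,000 requests")
```

Plug in your own tool count, average definition size, and price to see what your fixed overhead looks like.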

Problem 3: Similar tools confuse models

tools:
  - name: search_files
    description: Search for files by name pattern

  - name: search_content
    description: Search for content within files

  - name: search_code
    description: Search for code patterns

  - name: find_files
    description: Find files matching criteria

  - name: grep
    description: Search text in files

  - name: ripgrep
    description: Fast search in files

Six search tools. The model will:

  • Pick the wrong one 40% of the time
  • Waste tokens trying multiple tools
  • Sometimes give up and ask the user

One search tool:

tools:
  - name: search
    description: Search for text in files. Returns matching lines.

No confusion. Works every time.
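
One way to spot look-alike tools before the model trips over them is to compare their descriptions. The sketch below uses plain word-set (Jaccard) overlap. The helper names and the 0.3 threshold are assumptions chosen for illustration, and word overlap is only a crude proxy for what actually confuses a model.

```python
from itertools import combinations


def description_overlap(a: str, b: str) -> float:
    """Jaccard similarity between the word sets of two descriptions."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def overlapping_tools(tools: dict, threshold: float = 0.3) -> list:
    """Return (name, name, score) for tool pairs with similar descriptions."""
    pairs = []
    for (name_a, desc_a), (name_b, desc_b) in combinations(tools.items(), 2):
        score = description_overlap(desc_a, desc_b)
        if score >= threshold:
            pairs.append((name_a, name_b, score))
    return pairs


tools = {
    "search_files": "Search for files by name pattern",
    "search_content": "Search for content within files",
    "grep": "Search text in files",
    "ripgrep": "Fast search in files",
    "write": "Write content to a file",
}

for name_a, name_b, score in overlapping_tools(tools):
    print(f"{name_a} <-> {name_b}: {score:.2f}")
```

Any pair this flags is a candidate for merging. Pairs it misses can still confuse the model, so treat it as a first pass.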

Problem 4: Maintenance nightmare

50 tools means:

# 50 tool definitions to maintain
# 50 implementations to keep working
# 50 potential failure points
# 50 things to test
# 50 descriptions to keep accurate

def execute_tool(name, args):
    if name == "read_file":
        ...
    elif name == "read_file_lines":
        ...
    elif name == "read_file_head":
        ...
    # ... 47 more elif statements

When something breaks, good luck finding it.
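
With a handful of tools, the dispatcher can shrink to a dictionary lookup. A minimal sketch, assuming a hypothetical `tool` decorator and `TOOLS` registry (neither comes from any particular framework):

```python
from typing import Callable, Dict

# Hypothetical registry: tool name -> implementation
TOOLS: Dict[str, Callable[..., str]] = {}


def tool(name: str):
    """Decorator that registers a function under a tool name."""
    def register(func):
        TOOLS[name] = func
        return func
    return register


@tool("read")
def read(path: str) -> str:
    with open(path) as f:
        return f.read()


def execute_tool(name: str, args: dict) -> str:
    """Look up the tool and call it; report failures as text for the model."""
    func = TOOLS.get(name)
    if func is None:
        return f"Unknown tool: {name}"
    try:
        return func(**args)
    except Exception as exc:
        return f"Tool {name} failed: {exc}"
```

Each tool lives next to its registration, so a broken tool is one function to inspect instead of one branch in a 50-way elif chain.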

Problem 5: The illusion of specificity

Developers think:

"A specific tool for JSON files will be more reliable than a generic read tool"

Reality:

import json

# "Specific" JSON tool
def read_json_file(path):
    with open(path) as f:
        return json.load(f)

# Generic read tool
def read_file(path):
    with open(path) as f:
        return f.read()

# The LLM can parse JSON from read_file output just fine
# It's literally trained on millions of JSON examples

The specificity doesn't help. It just adds another tool to confuse things.

The evidence

I ran an experiment. Same tasks, different tool counts:

Task: "Find all TODO comments and list them"

50-tool agent:
- Attempts: 3.2 average (tried wrong tools first)
- Success rate: 76%
- Tokens used: 4,200 average

5-tool agent:
- Attempts: 1.1 average
- Success rate: 94%
- Tokens used: 1,800 average

The 5-tool agent was faster, cheaper, and more accurate.
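
If you want to run the same comparison on your own agent, the bookkeeping is small. The sketch below assumes you record one entry per task with the number of tool attempts, whether it succeeded, and tokens used. The `Run` record and `summarize` helper are hypothetical, and the sample values are placeholders, not the measurements reported above.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Run:
    attempts: int   # tool calls made before the task ended
    success: bool   # did the agent complete the task?
    tokens: int     # total tokens consumed


def summarize(runs: list) -> dict:
    """Aggregate the three metrics used in the comparison."""
    return {
        "avg_attempts": mean(r.attempts for r in runs),
        "success_rate": sum(r.success for r in runs) / len(runs),
        "avg_tokens": mean(r.tokens for r in runs),
    }


# Placeholder data, purely to show the shape of the input
sample = [Run(1, True, 1000), Run(3, False, 3000)]
print(summarize(sample))
```

Run it once per tool configuration on the same task set and compare the three numbers side by side.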

The right number of tools

For most agents: 4-6 tools.

The universal toolkit

tools:
  - name: read
    description: Read a file's contents

  - name: write
    description: Write content to a file

  - name: search
    description: Search for text in files

  - name: run
    description: Run a shell command

Four tools. Covers 90% of coding tasks.

When you need more

# Add domain-specific tools only
tools:
  - read
  - write
  - search
  - run
  - query_database    # If your agent works with DBs
  - call_api          # If your agent needs external APIs

Six tools. Covers 98% of tasks.

How to consolidate

Before: File reading explosion

tools:
  - read_file
  - read_file_lines
  - read_file_head
  - read_file_tail
  - read_json_file
  - read_yaml_file
  - read_csv_file
  - read_binary_file

After: One tool, smart implementation

tools:
  - name: read
    description: Read a file. Supports text, JSON, YAML, CSV. Use 'lines' param for partial reads.
    parameters:
      - name: path
        type: string
        required: true
      - name: lines
        type: string
        description: "Optional. Format: 'start:end' (e.g., '1:50' for first 50 lines)"

# Implementation of the read tool
def read(path, lines=None):
    # The context manager closes the file handle when done
    with open(path) as f:
        content = f.read()

    if lines:
        # 'start:end' is 1-based and inclusive, e.g. '1:50'
        start, end = map(int, lines.split(':'))
        content = '\n'.join(content.split('\n')[start-1:end])

    # Label structured formats so the model knows what it is reading
    if path.endswith('.json'):
        return f"[JSON file]\n{content}"
    elif path.endswith(('.yaml', '.yml')):
        return f"[YAML file]\n{content}"

    return content

One tool. Same capabilities. No confusion.

Before: Search tool sprawl

tools:
  - search_files
  - search_content
  - search_code
  - find_files
  - grep_files
  - regex_search

After: One search tool

tools:
  - name: search
    description: Search files. Use query for content, pattern for filenames.
    parameters:
      - name: query
        type: string
        description: Text or regex to search for in file contents
      - name: pattern
        type: string
        description: Glob pattern for filenames (e.g., "*.py")
      - name: path
        type: string
        description: Directory to search in (default: current)

# Implementation of the search tool (needs rg and find on PATH)
import subprocess

def search(query=None, pattern=None, path="."):
    if query:
        # Content search; an argument list avoids shell injection via query
        result = subprocess.run(
            ["rg", "--max-count=30", "--", query, path],
            capture_output=True, text=True
        )
        return result.stdout or "No matches"

    if pattern:
        # Filename search, capped at 30 results
        result = subprocess.run(
            ["find", path, "-name", pattern],
            capture_output=True, text=True
        )
        files = result.stdout.splitlines()[:30]
        return "\n".join(files) or "No files found"

    return "Provide either query or pattern"

Before: Command execution variants

tools:
  - run_command
  - run_command_async
  - run_command_background
  - run_command_with_timeout
  - run_shell
  - execute_script

After: One run tool

tools:
  - name: run
    description: Run a shell command
    parameters:
      - name: command
        type: string
        required: true
      - name: timeout
        type: integer
        description: Timeout in seconds (default: 30)
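
The run tool can be implemented the same way as read and search. A minimal sketch, assuming a POSIX shell is available. The output format (the exit-code suffix and the timeout message) is an illustrative choice.

```python
import subprocess


def run(command: str, timeout: int = 30) -> str:
    """Run a shell command and return its output, or a timeout message."""
    try:
        # shell=True is deliberate: the tool's whole job is to run shell commands
        result = subprocess.run(
            command, shell=True, capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        return f"Command timed out after {timeout}s"

    output = result.stdout + result.stderr
    if result.returncode != 0:
        # Surface the exit code so the model can tell failure from success
        output += f"\n[exit code {result.returncode}]"
    return output or "(no output)"
```

The timeout is a parameter with a default, so there is no need for a separate run_command_with_timeout tool. Because the tool executes arbitrary commands, run the agent in a sandbox or container.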

The consolidation checklist

When reviewing your tools, ask:

  1. Can these be merged?

    • read_file + read_json + read_yaml → read
    • search_files + search_content → search
  2. Is this just a parameter?

    • read_file_head → read with lines parameter
    • run_command_with_timeout → run with timeout parameter
  3. Does the LLM need this distinction?

    • grep vs ripgrep → The LLM doesn't care, use one
    • write_file vs write_file_append → One tool, mode parameter
  4. Is this actually used?

    • Log tool usage for a week
    • Delete tools with <5% usage
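
The usage check in step 4 is easy to automate once tool calls are logged. A sketch, assuming the log is just a list of tool names, one per call. The function name and sample data are made up for illustration.

```python
from collections import Counter


def low_usage_tools(calls: list, all_tools: list, min_share: float = 0.05) -> list:
    """Return tools whose share of logged calls is below min_share."""
    counts = Counter(calls)
    total = sum(counts.values())
    if total == 0:
        return []  # no data yet, so no evidence for deleting anything
    return sorted(t for t in all_tools if counts[t] / total < min_share)


# Made-up sample log: 100 calls over the observation period
calls = ["read"] * 50 + ["search"] * 30 + ["run"] * 18 + ["read_csv_file"] * 2
all_tools = ["read", "search", "run", "read_csv_file", "read_binary_file"]

print(low_usage_tools(calls, all_tools))
# ['read_binary_file', 'read_csv_file']
```

Tools that were never called show up too, since a missing key in a Counter counts as zero.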

Building lean with Gantz

With Gantz Run, start minimal:

# gantz.yaml - The lean toolkit
tools:
  - name: read
    description: Read a file
    parameters:
      - name: path
        type: string
        required: true
    script:
      shell: cat "{{path}}"

  - name: write
    description: Write to a file
    parameters:
      - name: path
        type: string
        required: true
      - name: content
        type: string
        required: true
    script:
      shell: |
        cat > "{{path}}" << 'CONTENT'
        {{content}}
        CONTENT

  - name: search
    description: Search for text in files
    parameters:
      - name: query
        type: string
        required: true
    script:
      shell: rg "{{query}}" . --max-count=30 || echo "No matches"

  - name: run
    description: Run a shell command
    parameters:
      - name: command
        type: string
        required: true
    script:
      shell: "{{command}}"

Four tools. Complete agent. Add more only when you have evidence you need them.

The mindset shift

Old thinking:

"What tools might the agent need?"
Builds 50 tools to cover every case

New thinking:

"What's the minimum toolkit that works?"
Builds 5 tools, adds more based on real failures

Summary

| Aspect | 50 Tools | 5 Tools |
|---|---|---|
| Decision speed | Slow (many choices) | Fast (few choices) |
| Accuracy | ~75% right tool | ~95% right tool |
| Token cost | 7,500+ per request | 500 per request |
| Maintenance | Nightmare | Simple |
| Debugging | Hard | Easy |

The best agent isn't the one with the most tools.

It's the one with the right tools.

Start with 5. Add only what you prove you need.


How many tools does your agent have? Have you tried cutting them down?
