
clearloop for OpenWalrus

Posted on • Originally published at openwalrus.xyz

Built-in tools: what your agent can reach

Every coding agent ships a toolbox. But what's actually in it?

The tools an agent has access to define its ceiling. An agent without a browser can't test web apps. An agent without code intelligence can't jump to definitions. An agent without a terminal can't run tests. What products choose to include — and what they leave out — reveals their theory of what an agent should do.

We cataloged the built-in tools in ten AI coding products, all from official documentation and public repos as of March 2026. This is a companion to our survey of built-in agents — that post covered the agents, this one covers what those agents can reach.

What we surveyed

Ten products, same list as the agents survey:

  • Claude Code (Anthropic) — CLI agent
  • GitHub Copilot — IDE + CLI agent
  • Cursor — IDE agent
  • Windsurf (Codeium) — IDE agent
  • Devin (Cognition) — autonomous cloud agent
  • OpenHands (All Hands AI) — multi-agent framework
  • Aider — terminal pair programmer
  • Amazon Q Developer — AWS-integrated agent
  • Gemini Code Assist (Google) — IDE agent
  • Augment Code — IDE agent + orchestration

Product by product

Claude Code

The most granular tool separation we found. Eleven named tools, each with distinct parameters and individually permissioned:

| Tool | Category | Purpose |
| --- | --- | --- |
| Read | File | Read file contents with optional line range |
| Write | File | Create or overwrite a file |
| Edit | File | Exact string replacement in a file |
| Glob | File | Fast file pattern matching |
| Bash | Terminal | Execute shell commands |
| Grep | Search | Content search built on ripgrep |
| WebSearch | Web | Search the web for information |
| WebFetch | Web | Fetch and process URL content |
| NotebookEdit | File | Edit Jupyter notebook cells |
| TodoWrite | Planning | Structured task tracking |
| Task | Agent | Launch subagent for complex work |

Each tool can be placed in an allow list (auto-approved), a deny list (blocked), or the default ask list (requires user approval). Subagents receive restricted tool sets: the Explore and Plan subagents both get read-only tools (Write and Edit denied). This per-tool, per-agent permission model is the most fine-grained we found.
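Concretely, these lists live in Claude Code's settings files. A minimal sketch of a project-level `.claude/settings.json`, following the shape of Anthropic's documented permission rules (treat the exact matcher syntax, especially the `Bash(...)` prefix rules, as an assumption that may vary by version):

```json
{
  "permissions": {
    "allow": ["Read", "Grep", "Glob"],
    "ask": ["Write", "Edit"],
    "deny": ["WebFetch", "Bash(rm:*)"]
  }
}
```

Anything not matched by a rule falls through to the default ask behavior, which is what makes the allow list useful: it removes approval prompts only for the operations you have explicitly pre-cleared.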

GitHub Copilot

The CLI and IDE agent modes share a common tool set, though the exact tool names aren't published the way Claude Code's are:

| Category | Capabilities |
| --- | --- |
| File | Read files, edit files, create files |
| Terminal | Run commands, view output |
| Search | Semantic code search, file search by name |
| Web | Web search, browser preview |
| Agent | Delegate to Explore/Task/Plan/Code Review agents |

Custom agents (Markdown files with YAML frontmatter) can restrict tool access via a tools list — the same pattern as Claude Code. MCP servers extend the tool set beyond built-in capabilities.
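The custom-agent format is compact. An illustrative sketch; the file location (e.g. a `.github/agents/explorer.md` path) and the exact tool identifiers are assumptions here and vary by Copilot surface:

```markdown
---
name: explorer
description: Read-only exploration agent
tools: ['read', 'search']
---

Explore the codebase and answer questions about its structure.
Do not edit files or run commands.
```

The frontmatter's `tools` list acts as an allow list: tools left off the list are simply unavailable to that agent, which is the same restriction mechanism Claude Code applies to its subagents.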

Copilot's agent mode automatically selects which tools to use and can run multiple tool calls in parallel. Tool selection is not user-visible in the same way as Claude Code's individual permission prompts.

Cursor

Ten documented tools available in Agent mode:

| Tool | Category | Purpose |
| --- | --- | --- |
| Semantic search | Search | Search by meaning across indexed codebase |
| File/folder search | Search | Find by name, directory, keywords |
| Web search | Web | Search the internet |
| Fetch rules | Config | Retrieve project rules |
| Read files | File | Read text and image files |
| Edit files | File | Suggest and auto-apply edits |
| Run shell commands | Terminal | Execute terminal commands |
| Browser control | Web | Navigate, screenshot, interact with pages |
| Image generation | Vision | Generate images |
| Ask clarifying questions | User | Request information from user |

Browser control is notable — Cursor can navigate to URLs, take screenshots, click elements, and type text. Most products don't ship browser interaction at all. Image generation is also rare; Cursor can generate images as part of its workflow.

Custom Modes restrict which tools are available. Ask Mode removes write capabilities. Manual Mode limits to explicit file editing.

Windsurf

Windsurf's Cascade agent has a smaller, less granular tool set:

| Category | Capabilities |
| --- | --- |
| Search | Search and analyze codebase |
| Web | Web search |
| Terminal | Terminal command execution |
| Code quality | Linter integration (auto-fixes lint errors) |
| Package management | Auto-detects and installs missing packages |

A hard limit of 20 tool calls per prompt caps how much work the agent can do in a single turn. This is the only product in our survey with a documented tool-call ceiling.

Extensibility is limited to MCP server configuration. There's no per-tool permission model — tools are either available in a mode or not.

Devin

The broadest tool surface in our survey. Devin runs in a cloud IDE environment with full system access:

| Category | Capabilities |
| --- | --- |
| File | Full filesystem access (editor + file explorer) |
| Terminal | Multiple terminal sessions |
| Browser | Full Chromium browser (real, not headless) |
| Search | Devin Search with "Deep Mode" for complex queries |
| Knowledge | Devin Wiki (auto-indexed repo documentation) |
| Review | Devin Review (code review with commit application) |
| Testing | Desktop Testing via computer use (Linux) |
| Git | Full git operations |

Devin's tools aren't discrete named functions — they're a full operating environment. The agent can open multiple terminals, browse the web, interact with GUIs, and run desktop applications. This is closer to giving the agent a full computer than a set of API tools.

The tradeoff: Devin runs in the cloud, not locally. Everything happens in Cognition's sandboxed VMs.

OpenHands

OpenHands takes the most radical approach to tooling: CodeActAgent unifies all actions into executable code.

| Category | Implementation |
| --- | --- |
| File operations | open(), os.path, shell commands |
| Terminal | Bash execution (arbitrary commands) |
| Python | Interactive Python interpreter |
| Browser | Delegated to BrowsingAgent |
| User interaction | Natural language conversation |

There are no named tools like "Read" or "Edit." The agent writes Python or bash that does what it needs. Want to read a file? `cat file.txt`. Want to search? `grep -r pattern .`. Want to install a package? `pip install package`.

This "code action space" approach means OpenHands has no tool ceiling — anything you can do in a terminal or Python REPL, the agent can do. But it also means there's no tool-level permission control. You can't say "allow file reads but deny file writes" because both happen through the same execution mechanism. We explored the security implications of this in our tool permissions survey.

Aider

Aider doesn't expose tools in the agent framework sense. Instead, capabilities are built into the conversation loop:

| Category | Implementation |
| --- | --- |
| File editing | Built into the LLM response format (diff/whole-file/udiff edit formats) |
| Code search | Repository map via tree-sitter (symbol-level index of entire repo) |
| Code quality | Auto-lint after every LLM edit |
| Testing | /test command runs tests and auto-fixes failures |
| File management | File watching + auto-add when referenced |
| Git | Auto-commit with descriptive messages after each edit |
| Voice | Voice coding support (transcription → code changes) |
| Vision | Image input for vision-capable models |

The repository map is Aider's standout capability. It uses tree-sitter to build a symbol-level index of the entire codebase — function signatures, class definitions, method names — and sends a compressed map to the LLM as context. This gives the model a structural understanding of the codebase without reading every file. No other product in our survey uses tree-sitter this way.
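Aider builds its map with tree-sitter so it works across languages; the core idea, extract signatures rather than bodies, can be sketched for a single Python file with the stdlib `ast` module (a simplified illustration, not Aider's code):

```python
import ast

source = '''
class Cache:
    def get(self, key): ...
    def set(self, key, value): ...

def connect(url, timeout=30): ...
'''

def repo_map(src):
    """Emit a compressed, symbol-level map: names and signatures only,
    no function bodies, so it fits in the LLM's context window."""
    lines = []
    for node in ast.walk(ast.parse(src)):
        if isinstance(node, ast.ClassDef):
            lines.append(f"class {node.name}:")
        elif isinstance(node, ast.FunctionDef):
            args = ", ".join(a.arg for a in node.args.args)
            lines.append(f"def {node.name}({args})")
    return "\n".join(lines)

print(repo_map(source))
```

Run over a whole repository, a map like this gives the model the shape of the codebase (which classes exist, what their methods take) at a tiny fraction of the token cost of the raw source.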

No terminal access is exposed to the model directly — Aider runs commands (lint, test) on the model's behalf but doesn't give the model a shell.

Amazon Q Developer

Amazon Q's agent capabilities are organized as specialized features rather than named tools:

| Category | Capabilities |
| --- | --- |
| Code generation | Real-time inline suggestions (25+ languages) |
| File editing | Multi-file implementation with test validation |
| Security | Vulnerability scanning (exposed credentials, injection, etc.) |
| Testing | Iterative unit test generation |
| Documentation | In-depth doc generation with data flow diagrams |
| Code review | Logical errors, anti-patterns, security issues |
| Transformation | .NET porting (Windows → Linux), Java version upgrades |

The software development agent runs build and test scripts to validate generated code before presenting results. The CLI supports MCP for external tool integration.

Unlike Claude Code or Cursor, Amazon Q doesn't publish a list of discrete, named tools. The agent's capabilities are described as features, not as an API surface.

Gemini Code Assist

The most IDE-integrated tool set. Google's agent mode documentation lists ten named tools for IntelliJ:

| Tool | Category | Purpose |
| --- | --- | --- |
| read_file | File | Retrieve text content |
| write_file | File | Write text to files |
| find_files | File | Locate files by name or path |
| list_files | File | Enumerate directory contents |
| grep | Search | Search for text patterns |
| analyze_current_file | Code Intel | Check for errors and warnings |
| resolve_symbol | Code Intel | Trace symbol declarations |
| find_usages | Code Intel | Identify all references to a symbol |
| git | Git | Execute git CLI commands |
| list_vcs_roots | Git | Return version control repositories |

resolve_symbol and find_usages are the standouts. These are code intelligence operations — go-to-definition and find-all-references — that leverage the IDE's language server. No other product in our survey exposes these as first-class agent tools. When Gemini needs to understand how a function is used across a codebase, it can ask the language server rather than grepping for text patterns.
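The gap between text search and symbol resolution is easy to demonstrate. grep reports every occurrence of a name; an AST (or a language server) separates definitions from calls from string literals. A sketch with Python's stdlib `ast` module:

```python
import ast

source = '''
def render(page):
    return page

result = render("render")
log = "render failed"
'''

# grep would report every textual occurrence of "render" in this file.
# An AST walk classifies each occurrence by its syntactic role instead.
tree = ast.parse(source)
defs, calls, strings = [], [], []
for node in ast.walk(tree):
    if isinstance(node, ast.FunctionDef) and node.name == "render":
        defs.append(node.lineno)          # the definition
    elif (isinstance(node, ast.Call)
          and isinstance(node.func, ast.Name)
          and node.func.id == "render"):
        calls.append(node.lineno)         # a call site
    elif (isinstance(node, ast.Constant)
          and isinstance(node.value, str)
          and "render" in node.value):
        strings.append(node.lineno)       # a string literal, not code

print(defs, calls, strings)
```

A rename refactor should touch the definition and the call but leave both string literals alone; text search alone cannot make that distinction, which is exactly what resolve_symbol and find_usages buy an agent.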

In VS Code, all Gemini CLI built-in tools are available instead. MCP servers extend the set further.

Augment Code

Augment's IDE agent has the broadest integration surface:

| Category | Capabilities |
| --- | --- |
| File | File operations (read, write, edit) |
| Terminal | Terminal execution |
| Search | Web search |
| Vision | Image understanding |
| Multi-repo | Cross-repository coordination |
| Native integrations | GitHub, Linear, Jira, Confluence, Notion, Sentry, Stripe |
| MCP | 100+ configurable tools |
| Multi-model | Multiple AI models (Claude, GPT, etc.) |

Two implementation details stand out. Parallel tool execution — Augment runs independent tool calls concurrently, claiming 2x faster turns. Most products execute tools sequentially. Native integrations — instead of generic MCP connections, Augment ships purpose-built integrations with project management (Linear, Jira), documentation (Confluence, Notion), and monitoring (Sentry, Stripe) tools. This means the agent can read Jira tickets and Sentry errors without configuring MCP servers.
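Why parallel execution roughly halves turn latency is visible in a few lines. A sketch with asyncio and hypothetical tool functions (not Augment's API); each simulated tool takes 0.1 s, but two independent calls complete in about 0.1 s total, not 0.2 s:

```python
import asyncio
import time

# Hypothetical tool calls; asyncio.sleep stands in for I/O latency.
async def read_file(path):
    await asyncio.sleep(0.1)
    return f"contents of {path}"

async def web_search(query):
    await asyncio.sleep(0.1)
    return f"results for {query}"

async def agent_turn():
    # Independent calls run concurrently: total latency is the max
    # of the two, not the sum.
    start = time.monotonic()
    contents, results = await asyncio.gather(
        read_file("README.md"),
        web_search("walrus agent tools"),
    )
    return contents, results, time.monotonic() - start

contents, results, elapsed = asyncio.run(agent_turn())
print(f"{elapsed:.2f}s")
```

The catch is dependency analysis: the runtime must know two calls are independent (an edit that follows a read of the same file is not) before it can safely gather them.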

The inventory at a glance

| Product | File Ops | Terminal | Search | Web/Browser | Code Intel | Git | Vision |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Claude Code | Read/Write/Edit/Glob | Bash | Grep | WebSearch + WebFetch | | via Bash | |
| Copilot | Read/Edit | Terminal | Semantic + file | Web search + preview | | via terminal | |
| Cursor | Read/Edit | Shell | Semantic + file + web | Browser control | | via shell | Image gen + read |
| Windsurf | Search/analyze | Terminal | | Web search | Linter | via terminal | |
| Devin | Editor + filesystem | Terminal | Devin Search | Full browser | | Full git | Desktop use |
| OpenHands | via code | Bash + Python | via code | BrowsingAgent | | via code | |
| Aider | Built-in edit | | Repo map (tree-sitter) | | tree-sitter | Auto-commit | Image input |
| Amazon Q | Suggestions + edit | Build/test | | | Security scan | | |
| Gemini Code Assist | read/write/find/list | | grep + find_files | | resolve_symbol, find_usages | git CLI | |
| Augment | File ops | Terminal | Web search | Native integrations | | GitHub native | Image understanding |

Three design philosophies

The ten products fall into three approaches to tool design:

Granular named tools. Claude Code and Gemini Code Assist give each operation a distinct name, specific parameters, and independent permissions. Read is not Grep is not Glob. The LLM sees a menu of specific operations and picks the right one. This enables fine-grained permission control — you can allow Read but deny Write, or allow Grep but deny Bash. The cost is more tool definitions consuming context window space, and more decision points where the model can pick the wrong tool.
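What "a menu of specific operations" means in practice is a typed schema per tool. An illustrative function-calling-style definition (not any one product's exact format); the model sees the name, description, and parameter types, and emits a structured call rather than free-form code:

```json
{
  "name": "read_file",
  "description": "Read file contents with an optional line range",
  "parameters": {
    "type": "object",
    "properties": {
      "path": {"type": "string"},
      "start_line": {"type": "integer"},
      "end_line": {"type": "integer"}
    },
    "required": ["path"]
  }
}
```

The typed boundary is what makes per-tool permissions enforceable: the runtime can inspect the call (which tool, which path) before executing it, which is impossible when the action is an opaque script.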

Code-as-tools. OpenHands and (to a lesser degree) Aider skip the named-tool abstraction. The agent writes executable code — bash or Python — that performs whatever operation it needs. The "tool set" is infinite: anything you can do in a REPL is available. This is maximally expressive but minimally controllable. As we explored in our sandbox permissions survey, the security boundary shifts from "which tools are allowed" to "what can the sandbox environment access."

IDE-integrated tools. Cursor, Gemini Code Assist, and Augment map tools to IDE capabilities. Semantic search uses the IDE's index. resolve_symbol uses the language server. Browser control uses an embedded browser. The agent inherits whatever the IDE can do. This is powerful — code intelligence operations like find-all-references are genuinely useful for refactoring — but ties the agent to a specific IDE runtime.
[Interactive chart — see original post]

What stands out

Code intelligence is the biggest gap. Only Gemini Code Assist ships resolve_symbol and find_usages as named tools. Every other product relies on text search (grep, ripgrep, semantic search) to understand code structure. Text search can find where a function name appears, but it can't distinguish a definition from a call from a string literal. For large-scale refactoring, this difference matters — and it's the clearest area where IDE-integrated agents have an advantage.

Browser interaction is rare. Only Cursor (browser control: navigate, screenshot, click, type) and Devin (full Chromium in cloud VM) ship browser interaction. The other eight products can't test web UIs, can't follow links in documentation, and can't verify rendered output. As agent tasks get more complex, this gap will grow.

The granularity spectrum is wide. Claude Code has 11+ named tools. OpenHands has effectively 2 (bash + Python interpreter). Both ship, both work, and both have users. The tradeoff is control vs. expressiveness — and the bash bypass problem shows that granular tools don't provide real security if the agent also has a shell.

Vision is emerging but uneven. Cursor generates and reads images. Devin has full desktop computer use. Augment understands images. Aider accepts image input. But Claude Code, Copilot, Windsurf, OpenHands, Amazon Q, and Gemini Code Assist are primarily text-only in their tool interactions.

MCP is the escape hatch. Eight of ten products support MCP for adding tools beyond the built-in set. The built-in tools define the floor — the minimum capability surface. MCP raises the ceiling. But no two products ship the same MCP servers by default, so the "extended" tool set varies widely. We discussed MCP's role as a universal extensibility layer in our skills design post.
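Most MCP-enabled products consume a server list shaped roughly like the block below. The filesystem server shown is one of the Model Context Protocol reference servers; the config file name, location, and key names differ per product, so treat this as a representative sketch rather than any single product's format:

```json
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/workspace"]
    }
  }
}
```

Each entry spawns (or connects to) a server process that advertises its tools over the protocol; the host agent merges those tools into its built-in set at session start.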
[Interactive chart — see original post]

What the research says

Tool selection accuracy remains an active research area. The ToolBench benchmark (May 2023) showed that GPT-4 achieved 56.6% pass rate on complex tool-use tasks involving 16,000+ real-world APIs — demonstrating that more tools doesn't automatically mean better performance. Models make selection errors when the tool set is large and tools have overlapping functionality.

The CodeAct paper (February 2024) that inspired OpenHands' approach found that code actions outperformed JSON-based tool calls on 6 of 7 benchmarks, with an average 20% improvement. The argument: LLMs are better at writing code than selecting from a tool menu, so "code is the tool" produces better results.

However, Gorilla (May 2023) showed that fine-tuning on API documentation significantly improves tool-use accuracy, and that constrained API calls (named tools with typed parameters) reduce hallucinated function calls compared to free-form code generation. The granular-tools camp has evidence too.

The tradeoff may not be universal. For coding tasks with well-known operations (read, write, search, run), named tools reduce errors. For novel tasks requiring creative tool composition, code-as-tools offers more flexibility.

Open questions

Will code intelligence tools become standard? Gemini Code Assist is alone in shipping resolve_symbol and find_usages. If agents become primary refactoring tools, every product will need symbol-level operations — not just text search. Will they build it, or will MCP language server integrations fill the gap?

Does tool granularity help or hurt LLM performance? Claude Code has 11+ tools; OpenHands has 2. ToolBench suggests more tools can reduce accuracy, but CodeAct suggests code beats API calls. The answer may depend on the model — larger models handle more tools better, but tool-call overhead costs tokens regardless of model size.

Will browser interaction become baseline? Cursor and Devin have it. Eight products don't. As agents take on full-stack tasks (frontend + backend + testing), can they remain effective without seeing the rendered page?

Does "code-as-tools" scale? OpenHands' approach is elegant — infinite expressiveness, zero tool ceiling. But it means every operation goes through bash or Python, making audit trails harder to parse and permissions harder to enforce. Does this matter at scale, or is it a theoretical concern?

Should the built-in tool set be standardized? MCP standardizes the protocol for adding tools. But there's no standard for what tools should ship built-in. If you write an MCP server that provides file operations, does it need to match Claude Code's Read/Write/Edit/Glob interface, or can it define its own? Tool portability across products doesn't exist yet.

What's the right tool-call limit? Windsurf caps at 20 tool calls per prompt. Most products have no documented limit. Is a limit a safety feature (prevents runaway agents) or a capability ceiling (prevents complex multi-step work)?

What this means for walrus

Walrus exposes capabilities to agents through WHS hooks — and the design questions in this survey map directly to WHS architecture.

The granularity question applies to hooks. Should a WHS memory hook expose fine-grained operations (store, query, delete, list) or a single broad operation (execute_memory_operation)? The Claude Code/Gemini approach (granular named tools) enables per-operation permissions. The OpenHands approach (code-as-tools) maximizes expressiveness. WHS currently leans toward granularity — each hook has a typed protobuf interface — and this survey suggests that's the right call for permission control.
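A hypothetical sketch of what a granular memory-hook interface could look like as protobuf (all names here are invented for illustration, not WHS's actual schema): one RPC per operation, so each can be permissioned and rate-limited independently.

```protobuf
syntax = "proto3";
package whs.memory.v1;

// One RPC per operation: allow Query while denying Delete, or cap
// Store at N calls per turn, without inspecting payloads.
service MemoryHook {
  rpc Store  (StoreRequest)  returns (StoreResponse);
  rpc Query  (QueryRequest)  returns (QueryResponse);
  rpc Delete (DeleteRequest) returns (DeleteResponse);
}

message StoreRequest  { string key = 1; bytes value = 2; }
message StoreResponse { bool ok = 1; }
message QueryRequest  { string query = 1; uint32 limit = 2; }
message QueryResponse { repeated bytes results = 1; }
message DeleteRequest { string key = 1; }
message DeleteResponse { bool ok = 1; }
```

A single `execute_memory_operation(bytes payload)` RPC would collapse all three into one permission decision, which is the code-as-tools tradeoff in miniature.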

Code intelligence is a differentiation opportunity. Nine of ten products can't do resolve_symbol or find_usages. Only Gemini Code Assist ships it, and only because it integrates with the IDE's language server. A WHS hook that provides language-server-style code intelligence (backed by tree-sitter, LSP, or a custom index) would give walrus-powered agents a capability most competitors lack.

Tool-call limits are worth considering. Windsurf's 20-call cap prevents runaway tool use. WHS hooks could implement per-hook rate limits — a memory hook might allow 50 operations per turn, while an inference hook might allow 1. This is more granular than a global tool-call cap and maps naturally to the hook lifecycle.
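A per-hook cap is a small amount of code. A sketch with hypothetical hook names and a simple per-turn counter (fixed window reset at each turn boundary), not an actual WHS component:

```python
class HookRateLimiter:
    """Caps tool calls per hook per agent turn (hypothetical WHS policy)."""

    def __init__(self, limits):
        self.limits = limits   # hook name -> max calls allowed per turn
        self.counts = {}       # hook name -> calls used this turn

    def start_turn(self):
        self.counts = {}       # reset counters at each turn boundary

    def allow(self, hook):
        """Return True and count the call if the hook is under its cap."""
        cap = self.limits.get(hook, 0)   # unknown hooks get a cap of 0
        used = self.counts.get(hook, 0)
        if used >= cap:
            return False
        self.counts[hook] = used + 1
        return True

limiter = HookRateLimiter({"memory": 50, "inference": 1})
limiter.start_turn()
print(limiter.allow("inference"))  # True: first call this turn
print(limiter.allow("inference"))  # False: cap of 1 reached
```

Defaulting unknown hooks to a cap of zero makes the policy deny-by-default, which matches the permission posture the rest of this survey argues for.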

