DEV Community

NEE

Deep Dive into Open Agent SDK (Part 2): Behind the 34 Built-in Tools

The previous article analyzed how the Agent Loop works, including one crucial step: "execute tools." When the LLM says "I need to call Bash," the SDK actually spawns a process to run the command. But the tool system behind this is far more nuanced than simply "calling a function." How are 34 built-in tools organized? How do you safely convert the LLM's JSON input into Swift types? How do you control which tools are available?

This article starts from the protocol definition and examines the Open Agent SDK tool system layer by layer.

ToolProtocol: What a Tool Looks Like

Every tool in the SDK conforms to ToolProtocol:

public protocol ToolProtocol: Sendable {
    var name: String { get }
    var description: String { get }
    var inputSchema: ToolInputSchema { get }
    var isReadOnly: Bool { get }
    var annotations: ToolAnnotations? { get }

    func call(input: Any, context: ToolContext) async -> ToolResult
}

Five properties and one method. Let's go through each.

name is the tool's unique identifier. The LLM uses this name in tool_use blocks to specify which tool to invoke. All built-in tools use PascalCase naming: Read, Bash, Glob, CronCreate.

description is the tool description shown to the LLM. This text is included as part of the tool definition sent to the API, and its quality directly affects when the LLM chooses to invoke this tool.

inputSchema is a ToolInputSchema: a [String: Any] JSON Schema dictionary describing the input the tool accepts. It's passed as-is in the input_schema field of API calls.

isReadOnly is a boolean flag telling the Agent Loop whether the tool has side effects. As mentioned in the previous article, the Agent Loop uses this field for bucketing: read-only tools execute concurrently, mutation tools execute serially.

annotations are optional behavioral hints containing four boolean fields:

public struct ToolAnnotations: Sendable, Equatable {
    public let readOnlyHint: Bool       // Read-only, no side effects
    public let destructiveHint: Bool    // May perform irreversible operations
    public let idempotentHint: Bool     // Idempotent, multiple calls produce the same result
    public let openWorldHint: Bool      // Interacts with the external world
}

Note that ToolAnnotations' initializer defaults destructiveHint to true: the SDK takes a "default dangerous" stance, so tools must explicitly declare themselves safe. These hints don't change the SDK's own execution logic; they are sent along with the tool definition so the LLM can take them into account when deciding how to use a tool.
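To make the "default dangerous" stance concrete, here is a standalone sketch of an initializer built that way. It is not the SDK's declaration: only destructiveHint = true comes from the text above, while the other defaults (readOnlyHint false, idempotentHint false, openWorldHint true) are assumptions modeled on the tool-annotation defaults in the MCP specification, and ToolAnnotationsSketch is a hypothetical name.

```swift
// Standalone stand-in for ToolAnnotations. Defaults other than destructiveHint
// are assumptions (modeled on MCP's tool annotation defaults).
struct ToolAnnotationsSketch: Sendable, Equatable {
    let readOnlyHint: Bool
    let destructiveHint: Bool
    let idempotentHint: Bool
    let openWorldHint: Bool

    init(
        readOnlyHint: Bool = false,
        destructiveHint: Bool = true,   // "default dangerous": tools must opt out
        idempotentHint: Bool = false,
        openWorldHint: Bool = true
    ) {
        self.readOnlyHint = readOnlyHint
        self.destructiveHint = destructiveHint
        self.idempotentHint = idempotentHint
        self.openWorldHint = openWorldHint
    }
}

// A tool that declares nothing is treated as destructive.
let undeclared = ToolAnnotationsSketch()
// A read-only tool has to say so explicitly.
let safe = ToolAnnotationsSketch(readOnlyHint: true, destructiveHint: false)
print(undeclared.destructiveHint, safe.destructiveHint)  // prints "true false"
```

The practical effect is that forgetting to fill in annotations errs on the side of caution rather than silently advertising a risky tool as harmless.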

ToolResult and ToolExecuteResult

The call() method returns ToolResult, the content fed back to the LLM after tool execution:

public struct ToolResult: Sendable {
    public let toolUseId: String         // Corresponds to the LLM's tool_use ID
    public let content: String           // Text content
    public let typedContent: [ToolContent]?  // Multi-modal content (text, images, resource references)
    public let isError: Bool             // Whether this is an error result
}

content and typedContent are designed for compatibility: when typedContent is set, content is derived from it by concatenating all of its .text items; otherwise content returns the stored string directly. Older code that reads only content keeps working, while new code can use typedContent for non-text content such as images.

ToolContent is an enum supporting three content types:

public enum ToolContent: Sendable {
    case text(String)
    case image(data: Data, mimeType: String)
    case resource(uri: String, name: String?)
}

Inside tool closures, ToolExecuteResult is used — structurally almost identical to ToolResult, just missing toolUseId (this ID is auto-filled by the calling layer).
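The content/typedContent compatibility rule can be sketched with a computed property. Everything here is a hypothetical stand-in (ToolContentSketch, ToolResultSketch, and the private storedContent field are invented names), and joining the text items with a newline is an assumption; the SDK may concatenate them differently.

```swift
import Foundation

// Stand-ins mirroring the shapes shown above; not the SDK's code.
enum ToolContentSketch: Sendable {
    case text(String)
    case image(data: Data, mimeType: String)
    case resource(uri: String, name: String?)
}

struct ToolResultSketch: Sendable {
    let toolUseId: String
    private let storedContent: String
    let typedContent: [ToolContentSketch]?
    let isError: Bool

    init(toolUseId: String, content: String = "",
         typedContent: [ToolContentSketch]? = nil, isError: Bool = false) {
        self.toolUseId = toolUseId
        self.storedContent = content
        self.typedContent = typedContent
        self.isError = isError
    }

    // Legacy text view: derived from typedContent when present, so older
    // callers still get a plain string. The "\n" separator is an assumption.
    var content: String {
        guard let typedContent else { return storedContent }
        return typedContent.compactMap { item -> String? in
            if case .text(let s) = item { return s }
            return nil   // images and resource references have no text form
        }.joined(separator: "\n")
    }
}

let mixed = ToolResultSketch(
    toolUseId: "toolu_01",
    typedContent: [.text("Chart rendered"),
                   .image(data: Data([0x89, 0x50]), mimeType: "image/png"),
                   .text("2 series")]
)
let legacyText = mixed.content  // "Chart rendered\n2 series"
```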

ToolContext: The Tool's Runtime Environment

ToolContext is the context injected into each tool execution. It carries many fields:

Field | Purpose
--- | ---
cwd | Current working directory
toolUseId | tool_use ID for this invocation
agentSpawner | Sub-agent spawner (used by AgentTool)
cronStore | Scheduled task store (used by CronTools)
todoStore | Todo item store (used by TodoWrite)
worktreeStore | Worktree store (used by WorktreeTools)
planStore | Plan mode store (used by PlanTools)
taskStore | Task management store (used by Task*Tools)
mailboxStore | Mailbox store (used by SendMessage)
teamStore | Team store (used by TeamCreate)
hookRegistry | Hook event registry
permissionMode | Permission mode
canUseTool | Custom permission check callback
skillRegistry | Skill registry (used by SkillTool)
restrictionStack | Tool restriction stack
sandbox | Sandbox settings
mcpConnections | MCP connection info
fileCache | File cache
env | Custom environment variables

With this many optional fields, the rule is simple: inject what a tool needs; everything else is nil. The Read tool only looks at cwd, sandbox, fileCache; AgentTool only looks at agentSpawner; CronTools only looks at cronStore. Each tool depends only on its specific Store, unaware of and unconcerned with other Stores.

ToolContext also provides two copy methods: withToolUseId() for updating the call ID (called by ToolExecutor on each tool execution), and withSkillContext() for incrementing skill nesting depth (used when SkillTool calls sub-skills).
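A trimmed-down sketch of the copy-with-modification pattern behind withToolUseId() and withSkillContext(). The real ToolContext has far more fields and its methods may take different parameters; ToolContextSketch and skillNestingDepth are hypothetical names used only for illustration.

```swift
// Hypothetical, heavily trimmed context with only three fields, used to show
// the copy-with-modification pattern; the real ToolContext has many more.
struct ToolContextSketch: Sendable {
    let cwd: String
    let toolUseId: String?
    let skillNestingDepth: Int   // hypothetical name for the nesting counter

    // Copy that differs only in the call ID (one per tool execution).
    func withToolUseId(_ id: String) -> ToolContextSketch {
        ToolContextSketch(cwd: cwd, toolUseId: id, skillNestingDepth: skillNestingDepth)
    }

    // Copy one level deeper in skill nesting (when a skill calls a sub-skill).
    func withSkillContext() -> ToolContextSketch {
        ToolContextSketch(cwd: cwd, toolUseId: toolUseId, skillNestingDepth: skillNestingDepth + 1)
    }
}

let base = ToolContextSketch(cwd: "/repo", toolUseId: nil, skillNestingDepth: 0)
let perCall = base.withToolUseId("toolu_42")
let nested = perCall.withSkillContext()
// base is unchanged: each helper returns a new value instead of mutating.
```

Because every helper returns a fresh value, one base context can be handed to several concurrently running tools without any of them seeing another's call ID.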

Three-Tier Tool Architecture

The SDK divides 34 tools into three tiers: Core (10), Advanced (11), and Specialist (13).

Core Tier (10)     Advanced Tier (11)   Specialist Tier (13)
┌────────────┐     ┌──────────────┐     ┌───────────────┐
│ Read       │     │ Agent        │     │ CronCreate    │
│ Write      │     │ Skill        │     │ CronDelete    │
│ Edit       │     │ TaskCreate   │     │ CronList      │
│ Glob       │     │ TaskGet      │     │ LSP           │
│ Grep       │     │ TaskList     │     │ Config        │
│ Bash       │     │ TaskOutput   │     │ TodoWrite     │
│ AskUser    │     │ TaskStop     │     │ EnterPlanMode │
│ ToolSearch │     │ TaskUpdate   │     │ ExitPlanMode  │
│ WebFetch   │     │ SendMessage  │     │ EnterWorktree │
│ WebSearch  │     │ TeamCreate   │     │ ExitWorktree  │
└────────────┘     │ TeamDelete   │     │ RemoteTrigger │
                   │ NotebookEdit │     │ ListMcpRes    │
                   └──────────────┘     │ ReadMcpRes    │
                                        └───────────────┘

The tier distinction is based not on technical implementation difficulty, but on dependency complexity and use case.

Core Tier: File System and Shell

The 10 Core tools are the Agent's foundational capabilities — reading files, writing files, searching code, running commands. They share a common trait: they only depend on basic ToolContext fields (cwd, sandbox, fileCache), requiring no Store injection.

Take the Read tool. Its input is a file path with optional offset and limit:

private struct FileReadInput: Codable {
    let file_path: String
    let offset: Int?
    let limit: Int?
}

The execution logic is straightforward: resolve path → check sandbox → query cache → read file → paginate → return content with line numbers. A file caching detail: if context.fileCache is available, it checks the cache first, skipping disk I/O on a hit.
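The pagination and line-numbering step can be sketched in a few lines. paginate(_:offset:limit:) is a hypothetical helper; the cat -n-style six-column line number followed by a tab, the 1-based offset, and "no limit means read to the end" are all assumptions, not the SDK's exact output format.

```swift
import Foundation

// Hypothetical pagination helper. Treating `offset` as a 1-based starting line
// and `limit` as a maximum line count are assumptions about the semantics.
func paginate(_ text: String, offset: Int?, limit: Int?) -> String {
    let lines = text.components(separatedBy: "\n")
    let start = max((offset ?? 1) - 1, 0)            // convert to a 0-based index
    guard start < lines.count else { return "" }     // offset past end of file
    let end = limit.map { min(start + max($0, 0), lines.count) } ?? lines.count
    // Prefix each line with its right-aligned 1-based number and a tab.
    return lines[start..<end].enumerated().map { pair -> String in
        let number = String(start + pair.offset + 1)
        let padding = String(repeating: " ", count: max(0, 6 - number.count))
        return padding + number + "\t" + pair.element
    }.joined(separator: "\n")
}

let page = paginate("alpha\nbeta\ngamma\ndelta", offset: 2, limit: 2)
// page == "     2\tbeta\n     3\tgamma"
```

Line numbers matter because the model later refers back to them, for example when choosing a region to edit or a new offset to read from.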

The Bash tool is much more complex, handling timeouts, output truncation, and background processes. Bash's input has 5 fields:

private struct BashInput: Codable {
    let command: String
    let timeout: Int?
    let description: String?
    let runInBackground: Bool?
    let dangerouslyDisableSandbox: Bool?
}

Key implementation details:

  1. Timeout control. Default 120 seconds, maximum 600 seconds. Uses DispatchQueue.global().asyncAfter for timeout, calling process.terminate() when time's up.
  2. Output truncation. Output exceeding 100,000 characters keeps only the first 50,000 + last 50,000, connected with ...(truncated)....
  3. Background execution. When run_in_background = true, the process starts and returns a task ID immediately without waiting for completion.
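  4. Output collection. Process output is gathered by ProcessOutputAccumulator, which is marked @unchecked Sendable: the compiler cannot verify its thread safety, so the implementation relies on Pipe's readability handler and the termination handler both being dispatched on the same queue, which serializes access and prevents data races.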
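The truncation rule in item 2 can be sketched as a pure function. truncateOutput(_:maxLength:keep:) is a hypothetical name; the thresholds come from the text above, while the newlines around the marker are an assumption.

```swift
// Sketch of head + tail truncation with the thresholds quoted above.
// The exact marker text and its surrounding newlines are assumptions.
func truncateOutput(_ output: String,
                    maxLength: Int = 100_000,
                    keep: Int = 50_000) -> String {
    guard output.count > maxLength else { return output }
    let head = String(output.prefix(keep))   // first `keep` characters
    let tail = String(output.suffix(keep))   // last `keep` characters
    return head + "\n...(truncated)...\n" + tail
}
```

Keeping both ends means the model still sees how the command started and how it ended (often where errors and summaries appear), while the bulk of the middle is dropped.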

Bash's annotations set destructiveHint: true, explicitly telling the LLM that this tool can perform destructive operations.

Advanced Tier: Sub-Agents and Task Orchestration

Advanced tier tools start requiring external dependencies — AgentTool needs agentSpawner, Task* tools need taskStore, SendMessage needs mailboxStore and teamStore.

The Agent tool is representative of this tier. Its purpose is letting the LLM "dispatch a sub-agent" for complex tasks:

public func createAgentTool() -> ToolProtocol {
    return defineTool(
        name: "Agent",
        description: "Launch a subagent to handle complex, multi-step tasks autonomously.",
        inputSchema: agentToolSchema,
        isReadOnly: false
    ) { (input: AgentToolInput, context: ToolContext) async throws -> ToolExecuteResult in
        guard let spawner = context.agentSpawner else {
            return ToolExecuteResult(
                content: "Error: Agent spawner not available.",
                isError: true
            )
        }
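        // Resolve the agent definition (built-in Explore/Plan or custom) for
        // input.subagent_type into `agentDef`, parse the permission mode, then
        // spawn the sub-agent. (Resolution code is elided in this excerpt.)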
        let result = await spawner.spawn(
            prompt: input.prompt,
            model: input.model ?? agentDef?.model,
            systemPrompt: agentDef?.systemPrompt,
            allowedTools: agentDef?.tools,
            ...
        )
        return ToolExecuteResult(content: result.text, isError: result.isError)
    }
}

AgentTool's input supports 11 fields: prompt, description, subagent_type, model, name, maxTurns, run_in_background, isolation, team_name, mode, resume. The subagent_type can specify built-in Explore or Plan types, or use a custom name.

Note that agentSpawner is injected through ToolContext as a protocol type — AgentTool doesn't know how sub-agents are created. It just calls spawner.spawn(), with the concrete implementation injected by the Core layer. This dependency inversion means the Tools layer never needs to import the Core module.
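A minimal sketch of that dependency inversion. SubAgentSpawnerSketch, its spawn(prompt:model:) signature, runAgentTool, and EchoSpawner are all hypothetical simplifications; the SDK's real spawner protocol takes many more parameters.

```swift
// Hypothetical, simplified spawner protocol: all the Tools side ever sees.
protocol SubAgentSpawnerSketch: Sendable {
    func spawn(prompt: String, model: String?) async -> (text: String, isError: Bool)
}

// Tool logic depends only on the protocol, never on a concrete Core type.
func runAgentTool(prompt: String,
                  spawner: (any SubAgentSpawnerSketch)?) async -> (text: String, isError: Bool) {
    guard let spawner else {
        return ("Error: Agent spawner not available.", true)
    }
    return await spawner.spawn(prompt: prompt, model: nil)
}

// A test double standing in for the Core layer's real implementation.
struct EchoSpawner: SubAgentSpawnerSketch {
    func spawn(prompt: String, model: String?) async -> (text: String, isError: Bool) {
        ("sub-agent finished: \(prompt)", false)
    }
}
```

The same seam makes the tool easy to test: a double like EchoSpawner can be injected without constructing a real agent or calling a model.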

Specialist Tier: Domain-Specific Tools

Specialist tier tools have heavier dependencies — each needs its own dedicated Store, and their functionality is highly domain-specific.

CronTools is a set of three tools: CronCreate, CronDelete, CronList, accessing scheduled task storage via context.cronStore:

public func createCronCreateTool() -> ToolProtocol {
    return defineTool(
        name: "CronCreate",
        description: "Create a scheduled recurring task (cron job).",
        inputSchema: cronCreateSchema,
        isReadOnly: false
    ) { (input: CronCreateInput, context: ToolContext) async throws -> ToolExecuteResult in
        guard let cronStore = context.cronStore else {
            return ToolExecuteResult(content: "Error: CronStore not available.", isError: true)
        }
        let job = await cronStore.create(
            name: input.name,
            schedule: input.schedule,
            command: input.command
        )
        return ToolExecuteResult(
            content: "Cron job created: \(job.id) \"\(job.name)\"",
            isError: false
        )
    }
}

All three tools use guard let cronStore = context.cronStore for pre-checks — if the Store isn't injected, they return an error rather than crashing.
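The `await cronStore.create(...)` call suggests an asynchronous store API. One natural way to provide that is an actor, sketched below as an in-memory store; InMemoryCronStore, CronJobSketch, and the "cron-N" ID format are assumptions, not the SDK's CronStore.

```swift
// Hypothetical job record; field names mirror the CronCreate input above.
struct CronJobSketch: Sendable {
    let id: String
    let name: String
    let schedule: String
    let command: String
}

// Actor isolation serializes access, so concurrent tool calls can't corrupt state.
actor InMemoryCronStore {
    private var jobs: [CronJobSketch] = []   // kept in creation order
    private var nextId = 1

    func create(name: String, schedule: String, command: String) -> CronJobSketch {
        let job = CronJobSketch(id: "cron-\(nextId)", name: name,
                                schedule: schedule, command: command)
        nextId += 1
        jobs.append(job)
        return job
    }

    // Returns false when no job has the given ID.
    func delete(id: String) -> Bool {
        guard let index = jobs.firstIndex(where: { $0.id == id }) else { return false }
        jobs.remove(at: index)
        return true
    }

    func list() -> [CronJobSketch] { jobs }
}
```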

The LSP tool is another interesting example. It uses grep to simulate common Language Server Protocol operations (go to definition, find references, symbol search) without depending on an actual language server:

case "goToDefinition", "goToImplementation":
    // 1. Extract symbol name at cursor position using regex
    guard let symbol = getSymbolAtPosition(
        filePath: filePath, line: line, character: character
    ) else { ... }

    // 2. Grep search for definition patterns
    let pattern = "(func|class|struct|enum|protocol|typealias|let|var|export)\\s+\(symbol)"
    let results = await runGrep(
        arguments: ["grep", "-rn", "-E", pattern, cwd],
        cwd: cwd
    )

LSP depends only on context.cwd, requiring no Store — the lightest tool in the Specialist tier.
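The first step, finding the symbol under the cursor, can be sketched with plain character scanning. symbolAt(line:character:in:) and definitionPattern(for:) are hypothetical helpers, and 0-based line and character positions are an assumption.

```swift
import Foundation

// Hypothetical helper: 0-based line/character positions are an assumption.
func symbolAt(line: Int, character: Int, in source: String) -> String? {
    // Identifier characters: letters, digits, underscore.
    func isIdent(_ c: Character) -> Bool { c == "_" || c.isLetter || c.isNumber }

    let lines = source.components(separatedBy: "\n")
    guard lines.indices.contains(line) else { return nil }
    let chars = Array(lines[line])
    guard chars.indices.contains(character), isIdent(chars[character]) else { return nil }

    // Expand left and right from the cursor while still inside the identifier.
    var start = character
    var end = character
    while start > 0, isIdent(chars[start - 1]) { start -= 1 }
    while end + 1 < chars.count, isIdent(chars[end + 1]) { end += 1 }
    return String(chars[start...end])
}

// Same definition pattern the LSP tool passes to grep -E.
func definitionPattern(for symbol: String) -> String {
    "(func|class|struct|enum|protocol|typealias|let|var|export)\\s+\(symbol)"
}

let symbol = symbolAt(line: 1, character: 9, in: "import Foundation\nstruct UserStore {}")
// symbol == "UserStore"
```

Being text-based, the approach is approximate: the pattern also matches longer names that share a prefix (struct FooBar when searching for Foo) and declarations inside comments or strings, which a real language server would rule out.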

defineTool: The Factory Function for Custom Tools

The SDK provides the defineTool factory function, letting developers create ToolProtocol-conforming tools with minimal code. It has four overloads covering different use cases.

Basic: Codable Input + String Output

The most commonly used overload accepts a Codable input type and a closure returning String:

let greetTool = defineTool(
    name: "Greet",
    description: "Generate a greeting message.",
    inputSchema: [
        "type": "object",
        "properties": [
            "name": ["type": "string", "description": "Person's name"]
        ],
        "required": ["name"]
    ],
    isReadOnly: true
) { (input: GreetInput, context: ToolContext) async throws -> String in
    return "Hello, \(input.name)!"
}

// Input type only needs to conform to Codable
struct GreetInput: Codable {
    let name: String
}

Internally, defineTool does four things:

  1. Casts the LLM's Any type input to [String: Any]
  2. Serializes to Data using JSONSerialization
  3. Decodes to your defined Input type using JSONDecoder
  4. Calls your closure

If any step fails (input isn't a dictionary, JSON serialization fails, decoding fails, closure throws), it returns an isError: true result instead of crashing the Agent Loop. This means you can safely use try in your closures — errors are gracefully caught.
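The four steps can be sketched as a generic wrapper. makeDecodingHandler and ExecResult are hypothetical names (ExecResult stands in for ToolExecuteResult), and the error strings are invented. One detail worth copying into any real implementation: JSONSerialization.data(withJSONObject:) traps (raising an Objective-C exception on Apple platforms) instead of throwing a Swift error when handed an invalid object, so the sketch checks isValidJSONObject first.

```swift
import Foundation

// Stand-in for ToolExecuteResult.
struct ExecResult: Equatable {
    let content: String
    let isError: Bool
}

// Hypothetical bridge: Any -> [String: Any] -> Data -> Input -> closure.
func makeDecodingHandler<Input: Decodable>(
    _ body: @escaping (Input) async throws -> String
) -> (Any) async -> ExecResult {
    return { rawInput in
        // 1. The model's input must be a JSON object.
        guard let dict = rawInput as? [String: Any] else {
            return ExecResult(content: "Error: input is not an object", isError: true)
        }
        // Check first: data(withJSONObject:) traps rather than throws on bad input.
        guard JSONSerialization.isValidJSONObject(dict) else {
            return ExecResult(content: "Error: input is not valid JSON", isError: true)
        }
        do {
            let data = try JSONSerialization.data(withJSONObject: dict)   // 2. dict -> Data
            let input = try JSONDecoder().decode(Input.self, from: data)  // 3. Data -> Input
            let text = try await body(input)                              // 4. run the tool
            return ExecResult(content: text, isError: false)
        } catch {
            // Decoding failures and thrown tool errors become error results.
            return ExecResult(content: "Error: \(error)", isError: true)
        }
    }
}

struct NameInput: Decodable { let name: String }

let greet: (Any) async -> ExecResult = makeDecodingHandler { (input: NameInput) in
    "Hello, \(input.name)!"
}
```

Because the whole pipeline sits inside one guard/do/catch chain, a malformed tool call from the model turns into an ordinary error result that the model can read and correct.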

Structured Output: ToolExecuteResult

If a tool needs to explicitly mark errors (rather than using try to throw), use the overload returning ToolExecuteResult:

let divideTool = defineTool(
    name: "Divide",
    description: "Divide two numbers.",
    inputSchema: [
        "type": "object",
        "properties": [
            "a": ["type": "number"],
            "b": ["type": "number"]
        ],
        "required": ["a", "b"]
    ]
) { (input: DivideInput, context: ToolContext) async throws -> ToolExecuteResult in
    guard input.b != 0 else {
        return ToolExecuteResult(content: "Error: Division by zero.", isError: true)
    }
    return ToolExecuteResult(content: "\(input.a / input.b)", isError: false)
}

// Input type for the example above
struct DivideInput: Codable {
    let a: Double
    let b: Double
}

Most built-in tools use this overload because many errors are logic-level (file doesn't exist, Store not injected) and aren't well represented by exceptions.

No Input: NoInputTool

Some tools don't need input parameters (e.g., list operations, health checks):

let listTool = defineTool(
    name: "ListItems",
    description: "List all items.",
    inputSchema: ["type": "object", "properties": [:]]
) { (context: ToolContext) async throws -> String in
    return "No items found."
}

The closure only receives ToolContext, completely ignoring input.

Raw Dictionary Input: RawInputTool

The final overload skips Codable decoding, passing the raw [String: Any] dictionary directly to the closure. Useful when input field types are dynamic — e.g., ConfigTool's value field can be a string, number, boolean, array, object, or null:

let configTool = defineTool(
    name: "Config",
    description: "Read or write configuration values.",
    inputSchema: configSchema
) { (input: [String: Any], context: ToolContext) async -> ToolExecuteResult in
    let key = input["key"] as? String ?? ""
    let value = input["value"]  // Any type
    // ...
}

CodingKeys for snake_case

The JSON field names the LLM sends follow each tool's input schema, and the built-in schemas mostly use snake_case (e.g., file_path, run_in_background), whereas Swift convention is camelCase. Input types can map between the two with a CodingKeys enum:

private struct BashInput: Codable {
    let command: String
    let runInBackground: Bool?

    private enum CodingKeys: String, CodingKey {
        case command
        case runInBackground = "run_in_background"
    }
}

This is standard Swift Codable behavior: JSONDecoder (including the one inside defineTool) honors CodingKeys automatically, so no extra configuration is needed.
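A small check of why the mapping matters. BashInputSketch and NaiveBashInput are trimmed stand-ins for the input type above. The pitfall is that a missing mapping fails silently: because runInBackground is optional, the synthesized decoder treats the unmatched key as absent and yields nil instead of throwing.

```swift
import Foundation

// Trimmed stand-in for BashInput with the snake_case mapping.
struct BashInputSketch: Codable {
    let command: String
    let runInBackground: Bool?

    private enum CodingKeys: String, CodingKey {
        case command
        case runInBackground = "run_in_background"
    }
}

// Same fields, but no mapping: the decoder looks for "runInBackground".
struct NaiveBashInput: Codable {
    let command: String
    let runInBackground: Bool?
}

let payload = Data(#"{"command": "sleep 30", "run_in_background": true}"#.utf8)
let mapped = try JSONDecoder().decode(BashInputSketch.self, from: payload)
let naive = try JSONDecoder().decode(NaiveBashInput.self, from: payload)
// mapped.runInBackground == true
// naive.runInBackground == nil: the unmatched key is ignored, no error is thrown
```

Foundation's JSONDecoder also offers keyDecodingStrategy = .convertFromSnakeCase as a decoder-wide alternative, though that setting applies to every type decoded with that decoder rather than being declared per input type.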

Tool Pool Assembly and Filtering

Tools aren't just thrown at the LLM wholesale. The SDK has an assembly and filtering mechanism.

assembleToolPool

assembleToolPool merges three tool sources into a deduplicated tool pool:

public func assembleToolPool(
    baseTools: [ToolProtocol],     // SDK built-in tools
    customTools: [ToolProtocol]?,  // User-defined custom tools
    mcpTools: [ToolProtocol]?,     // MCP server-provided tools
    allowed: [String]?,
    disallowed: [String]?
) -> [ToolProtocol] {
    // 1. Merge all sources: base + custom + MCP
    var combined = baseTools
    if let customTools { combined.append(contentsOf: customTools) }
    if let mcpTools { combined.append(contentsOf: mcpTools) }

    // 2. Deduplicate by name (latter overwrites former)
    var byName = [String: ToolProtocol]()
    for tool in combined {
        byName[tool.name] = tool
    }

    // 3. Apply filtering rules
    return filterTools(
        tools: Array(byName.values),
        allowed: allowed,
        disallowed: disallowed
    )
}

Deduplication uses a Dictionary — same-named tools encountered later overwrite earlier ones. This means the priority is: MCP > custom > built-in — users can replace built-in tools with custom or MCP tools of the same name.
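One property of Array(byName.values) is easy to miss: Swift's Dictionary has no defined iteration order, and string hashing is seeded per process, so the order of tools sent to the model can differ from run to run. If a stable order matters (for instance, to keep a cached prompt prefix containing the tool definitions byte-identical), an order-preserving variant is straightforward. This is an alternative sketch, not the SDK's code; NamedTool is a hypothetical stand-in for ToolProtocol.

```swift
// Minimal stand-in for ToolProtocol: only the name (and origin) matter here.
struct NamedTool: Equatable {
    let name: String
    let source: String
}

// Later entries override earlier ones (same priority as assembleToolPool),
// but each name keeps the position where it first appeared.
func dedupePreservingOrder(_ tools: [NamedTool]) -> [NamedTool] {
    var order: [String] = []
    var byName: [String: NamedTool] = [:]
    for tool in tools {
        if byName[tool.name] == nil { order.append(tool.name) }
        byName[tool.name] = tool
    }
    return order.compactMap { byName[$0] }
}

let pool = dedupePreservingOrder([
    NamedTool(name: "Read", source: "built-in"),
    NamedTool(name: "Bash", source: "built-in"),
    NamedTool(name: "Bash", source: "custom"),
    NamedTool(name: "Weather", source: "mcp"),
])
// pool: Read (built-in), Bash (custom), Weather (mcp), in this order every run
```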

filterTools

filterTools implements allowlist/denylist filtering:

public func filterTools(
    tools: [ToolProtocol],
    allowed: [String]?,       // Allowlist, nil or empty means no filter
    disallowed: [String]?     // Denylist, nil or empty means no filter
) -> [ToolProtocol] {
    var filtered = tools
    // Apply allowlist first
    if let allowed, !allowed.isEmpty {
        let allowedSet = Set(allowed)
        filtered = filtered.filter { allowedSet.contains($0.name) }
    }
    // Then apply denylist (denylist takes priority over allowlist)
    if let disallowed, !disallowed.isEmpty {
        let disallowedSet = Set(disallowed)
        filtered = filtered.filter { !disallowedSet.contains($0.name) }
    }
    return filtered
}

When both rules exist, the denylist takes priority — even if a tool is in the allowlist, it's excluded if it appears in the denylist.
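The rules are easy to check with plain names. filterNames(_:allowed:disallowed:) is a hypothetical name-only copy of filterTools above.

```swift
// Name-only copy of filterTools: allowlist first, then denylist.
func filterNames(_ names: [String], allowed: [String]?, disallowed: [String]?) -> [String] {
    var filtered = names
    if let allowed, !allowed.isEmpty {
        let allowedSet = Set(allowed)
        filtered = filtered.filter { allowedSet.contains($0) }
    }
    if let disallowed, !disallowed.isEmpty {
        let disallowedSet = Set(disallowed)
        filtered = filtered.filter { !disallowedSet.contains($0) }
    }
    return filtered
}

let all = ["Read", "Write", "Bash", "Grep"]
let onlyRead = filterNames(all, allowed: ["Read", "Bash"], disallowed: ["Bash"])
// onlyRead == ["Read"]: Bash is allowlisted, but the denylist wins
let everything = filterNames(all, allowed: [], disallowed: nil)
// everything == all: an empty allowlist means "no filter", not "allow nothing"
```

The second call shows a consequence of the nil-or-empty rule that is easy to trip over: an empty allowlist disables the allowlist entirely.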

ToolRestrictionStack: Skills System Tool Restrictions

ToolRestrictionStack is a stack structure used by the Skills system to control tool visibility. When a Skill configures toolRestrictions, it pushes restrictions before execution and pops them after:

let stack = ToolRestrictionStack()
stack.push([.bash, .read])     // Skill A: only Bash and Read
stack.push([.grep, .glob])     // Skill B (nested): only Grep and Glob
// currentAllowedToolNames now returns only Grep and Glob
stack.pop()                     // Skill B done → back to Bash and Read
stack.pop()                     // Skill A done → restore all tools

The stack's LIFO nature ensures correct behavior for nested Skills — inner Skill restrictions override outer ones, automatically restored on exit. Thread safety is ensured by an internal serial DispatchQueue.

currentAllowedToolNames logic is simple: empty stack returns all tools; non-empty stack returns only tool names in the top restriction list.
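A minimal sketch of the stack semantics described above. ToolName, RestrictionStackSketch, and returning nil to mean "no restriction" are assumptions; the SDK's ToolRestrictionStack may expose a different API.

```swift
import Foundation

// Hypothetical tool-name enum standing in for the `.bash`, `.read` cases above.
enum ToolName: String { case bash = "Bash", read = "Read", grep = "Grep", glob = "Glob" }

// Restriction stack guarded by a private serial queue.
final class RestrictionStackSketch: @unchecked Sendable {
    private var stack: [[ToolName]] = []
    private let queue = DispatchQueue(label: "restriction-stack")  // serial by default

    func push(_ allowed: [ToolName]) { queue.sync { stack.append(allowed) } }

    func pop() { queue.sync { _ = stack.popLast() } }

    // nil means "no restriction": every tool is visible.
    // Otherwise only the top entry counts.
    var currentAllowedToolNames: Set<String>? {
        queue.sync { stack.last.map { Set($0.map(\.rawValue)) } }
    }
}
```

Because only the top entry counts, an inner Skill's list replaces the outer one rather than intersecting with it, which is what lets the nested Skill B see Grep and Glob even though Skill A allowed only Bash and Read.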

toApiTool: Converting Tools to API Format

The final step is converting tools to the format required by the Anthropic API:

public func toApiTool(_ tool: ToolProtocol) -> [String: Any] {
    var result: [String: Any] = [
        "name": tool.name,
        "description": tool.description,
        "input_schema": tool.inputSchema
    ]
    if let annotations = tool.annotations {
        result["annotations"] = [
            "readOnlyHint": annotations.readOnlyHint,
            "destructiveHint": annotations.destructiveHint,
            "idempotentHint": annotations.idempotentHint,
            "openWorldHint": annotations.openWorldHint
        ]
    }
    return result
}

The annotations key is added only when the tool defines annotations, which keeps tool definitions smaller and saves tokens.

A Complete Custom Tool Example

Tying everything together, here's a complete custom tool for fetching weather. The only piece you supply is the fetchWeather helper, which calls a weather service of your choice:

import Foundation
import OpenAgentSDK

// 1. Define input type
struct WeatherInput: Codable {
    let city: String
    let unit: String?  // "celsius" or "fahrenheit"

    private enum CodingKeys: String, CodingKey {
        case city, unit
    }
}

// 2. Create tool with defineTool
let weatherTool = defineTool(
    name: "Weather",
    description: "Get current weather for a city.",
    inputSchema: [
        "type": "object",
        "properties": [
            "city": [
                "type": "string",
                "description": "City name, e.g. 'Beijing'"
            ],
            "unit": [
                "type": "string",
                "enum": ["celsius", "fahrenheit"],
                "description": "Temperature unit, defaults to celsius"
            ]
        ],
        "required": ["city"]
    ],
    isReadOnly: true,
    annotations: ToolAnnotations(
        readOnlyHint: true,
        destructiveHint: false,
        openWorldHint: true  // Needs to access external API
    )
) { (input: WeatherInput, context: ToolContext) async throws -> ToolExecuteResult in
    let unit = input.unit ?? "celsius"
    // fetchWeather(city:unit:) is your own helper, not part of the SDK (implementation omitted)
    let weather = try await fetchWeather(city: input.city, unit: unit)
    return ToolExecuteResult(content: weather, isError: false)
}

// 3. Register with Agent
let agent = createAgent(options: AgentOptions(
    apiKey: "sk-...",
    model: "claude-sonnet-4-6",
    customTools: [weatherTool]  // Custom tools automatically join the tool pool
))

This tool gets merged, deduplicated, and filtered by assembleToolPool along with the built-in tools, then sent to the LLM. Once the LLM sees the tool definition, it can call the tool whenever it decides it needs weather data. defineTool's internal Codable bridge decodes the LLM's JSON into WeatherInput, so you don't need to parse any JSON manually.

Summary

The tool system's design philosophy can be summarized in a few keywords:

Protocol-driven. ToolProtocol specifies only the shape of a tool (name, description, input schema, execution method), not how tools are implemented. This means built-in and custom tools follow the exact same code path.

Dependency injection. ToolContext's 20+ optional fields look like a lot, but each tool only reads the fields it needs. AgentTool doesn't know CronStore exists; CronCreate doesn't know SubAgentSpawner exists.

Tiered organization. The Core/Advanced/Specialist tiers aren't code layers (their code structure is identical), but a division by dependency complexity. Core tools run independently, Advanced tools need Stores, Specialist tools need more specialized domain infrastructure.

Fault tolerance first. defineTool wraps all potential failure points (type casting, serialization, decoding, execution) in do/catch blocks. Any error returns isError: true instead of crashing. Tool errors in the Agent Loop don't propagate — the LLM gets the error message and can adjust strategy.

The next article covers MCP Integration: how the SDK connects to external tool servers, converts MCP tools to ToolProtocol, and coexists with built-in tools in the Agent Loop.


Deep Dive into Open Agent SDK (Swift) Series:

GitHub: terryso/open-agent-sdk-swift
