Deep Dive into Open Agent SDK (Part 1): Agent Loop Internals

Most LLM wrapper libraries do three things: send a request, get a response, done. But a true Agent goes further — it decides whether to call tools, executes them, feeds the results back to the LLM, and loops until it arrives at a final answer. This loop is the Agent Loop.

This article analyzes the Open Agent SDK (Swift) Agent Loop implementation — how it uses native Swift concurrency to run the entire cycle in-process.

What Is the Agent Loop?

In one sentence: user sends a prompt → LLM returns a response → if the LLM wants to call tools, execute them → feed tool results back to the LLM → repeat until the LLM says "I'm done".

Rendered as a flowchart:

flowchart TD
    A["User prompt"] --> B["Build messages + tools"]
    B --> C["Call LLM API"]
    C -->|end_turn / stop_sequence| D["Return result"]
    C -->|max_tokens| C2["Append 'please continue'"]
    C2 --> C
    C -->|tool_use| E["Extract tool_use blocks"]
    E --> F["Partition into read-only / mutation"]
    F --> G["Read-only tools: concurrent execution"]
    F --> H["Mutation tools: serial execution"]
    G --> I["Micro-compact large results"]
    H --> I
    I --> J["tool_result appended to messages"]
    J --> C

Several key decision points in this loop:

  1. When to stop? Normal exit when the LLM returns end_turn or stop_sequence; forced stop at maxTurns; interrupted when exceeding budget (maxBudgetUsd); or user-initiated cancellation.
  2. How to execute tools? Read-only tools run concurrently (up to 10), mutation tools run serially — avoiding concurrent file writes.
  3. What if context gets too long? Auto-compaction — use an LLM call to summarize history, freeing up space to continue.
  4. What if something goes wrong mid-loop? Built-in retry, fallback models, and error isolation (tool errors don't crash the loop).
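Put together, these decision points form a compact skeleton. The following is an illustrative sketch with invented names (`runLoop`, `LoopLimits`), not the SDK's actual internals, showing how the exit conditions interact:

```swift
// Simplified Agent Loop skeleton. Names are illustrative; the real SDK also
// guards max_tokens auto-continuation (3x) and handles compaction/retries.
enum StopReason { case endTurn, stopSequence, toolUse, maxTokens }

struct LoopLimits {
    var maxTurns = 10
    var maxBudgetUsd: Double? = nil
}

func runLoop(limits: LoopLimits,
             sendMessage: (inout [String]) -> (StopReason, Double),
             executeTools: (inout [String]) -> Void) -> String {
    var messages: [String] = []
    var totalCostUsd = 0.0
    var turns = 0
    while turns < limits.maxTurns {
        turns += 1
        let (stopReason, turnCost) = sendMessage(&messages)
        totalCostUsd += turnCost
        if let budget = limits.maxBudgetUsd, totalCostUsd > budget {
            return "errorMaxBudgetUsd"      // budget interrupt
        }
        switch stopReason {
        case .endTurn, .stopSequence:
            return "success"                // normal exit
        case .maxTokens:
            messages.append("please continue")
        case .toolUse:
            executeTools(&messages)         // tool_result appended to messages
        }
    }
    return "errorMaxTurns"                  // forced stop
}
```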

Two Entry Points: prompt() and stream()

The SDK provides two ways to trigger the Agent Loop:

Blocking prompt()

let agent = createAgent(options: AgentOptions(
    apiKey: "sk-...",
    model: "claude-sonnet-4-6",
    maxTurns: 10
))

let result = await agent.prompt("Read Package.swift and summarize it.")
print(result.text)
print("Turns: \(result.numTurns), Cost: $\(String(format: "%.4f", result.totalCostUsd))")

prompt() is the "fire and wait" mode. A single call runs through all turns and returns the final QueryResult. Best for scenarios where you don't need to see intermediate steps — background tasks, CLI tools, etc.

Streaming stream()

for await message in agent.stream("Explain this codebase.") {
    switch message {
    case .partialMessage(let data):
        print(data.text, terminator: "")  // Real-time text output
    case .toolUse(let data):
        print("[Using tool: \(data.toolName)]")
    case .toolResult(let data):
        print("[Tool done, \(data.content.count) chars]")
    case .result(let data):
        print("\nDone: \(data.numTurns) turns, $\(String(format: "%.4f", data.totalCostUsd))")
    default:
        break
    }
}

stream() returns AsyncStream<SDKMessage>, continuously pushing events as the LLM processes. The SDK defines 17 message types — from partialMessage (text fragments) to toolUse (tool invocations) to result (final outcome) — covering every stage of the Agent Loop.

Which one to choose depends on your UI requirements: use stream() for real-time display, prompt() when you don't need it.

Inside the Loop: What Happens in a Turn

Regardless of the entry point, the core logic of each turn is identical. Let's trace through the code.

1. Check if Compaction Is Needed

if shouldAutoCompact(messages: messages, model: model, state: compactState) {
    let (newMessages, _, newState) = await compactConversation(
        client: client, model: model,
        messages: messages, state: compactState,
        fileCache: fileCache,
        sessionMemory: sessionMemory
    )
    messages = newMessages
    compactState = newState
}

Before each turn, check whether the estimated token count of the message history is approaching the context window limit. If so, use an LLM call to compress history into a summary, replacing the original messages.

The compaction threshold is model context window - 10,000 tokens (buffer). After 3 consecutive compaction failures, attempts stop to avoid wasting tokens.
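That threshold check can be sketched in a few lines. The function name, token estimate, and failure counter here are stand-ins for illustration, not the SDK's real signatures:

```swift
// Illustrative auto-compact decision, using the numbers from the text above.
let compactBufferTokens = 10_000   // stay this far below the context window
let maxCompactFailures = 3         // stop trying after 3 consecutive failures

func shouldAutoCompactSketch(estimatedTokens: Int,
                             contextWindowTokens: Int,
                             consecutiveFailures: Int) -> Bool {
    guard consecutiveFailures < maxCompactFailures else { return false }
    return estimatedTokens >= contextWindowTokens - compactBufferTokens
}
```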

2. Send LLM Request (with Retry and Fallback)

response = try await withRetry({
    try await client.sendMessage(
        model: model, messages: messages,
        maxTokens: maxTokens, system: buildSystemPrompt(),
        tools: apiTools, ...
    )
}, retryConfig: retryConfig)

All LLM requests are wrapped with withRetry, handling transient errors (network timeouts, 429 rate limits, etc.) according to the configured retry policy.
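One plausible shape for such a wrapper is a generic retry with exponential backoff. The `RetryPolicy` type and its defaults here are assumptions, not the SDK's actual `retryConfig`:

```swift
// Sketch of a retry wrapper with exponential backoff (1x, 2x, 4x the base delay).
struct RetryPolicy {
    var maxAttempts = 3
    var baseDelaySeconds = 1.0
}

func withRetrySketch<T>(_ operation: () async throws -> T,
                        policy: RetryPolicy = RetryPolicy()) async throws -> T {
    var lastError: Error?
    for attempt in 0..<policy.maxAttempts {
        do {
            return try await operation()
        } catch {
            lastError = error
            if attempt < policy.maxAttempts - 1 {
                // Back off before the next attempt: base, 2x base, 4x base, ...
                let delay = policy.baseDelaySeconds * Double(1 << attempt)
                try? await Task.sleep(nanoseconds: UInt64(delay * 1_000_000_000))
            }
        }
    }
    throw lastError!   // all attempts exhausted: surface the last error
}
```

A real implementation would also inspect the error and retry only transient failures (timeouts, 429s), passing permanent ones straight through.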

If the primary model fails completely, a fallbackModel is configured to retry:

if let fallbackModel = self.options.fallbackModel, fallbackModel != self.model {
    // Retry with fallbackModel...
}

3. Handle stop_reason

The stop_reason in the LLM response determines the loop's direction:

| stop_reason | Meaning | Loop Behavior |
| --- | --- | --- |
| end_turn | LLM is done speaking | Normal loop exit |
| stop_sequence | Hit a stop sequence | Normal loop exit |
| tool_use | LLM wants to call tools | Execute tools, continue loop |
| max_tokens | Output was truncated | Append "please continue", continue loop |

The max_tokens case has a guard: at most 3 auto-continuations, preventing infinite loops.
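That guard can be sketched as a simple counter; the names here are illustrative, the real SDK tracks this state internally:

```swift
// Illustrative guard on max_tokens auto-continuation: at most 3 "please
// continue" injections before giving up and exiting the loop.
var autoContinueCount = 0
let maxAutoContinues = 3

func handleMaxTokensSketch(messages: inout [String]) -> Bool {
    guard autoContinueCount < maxAutoContinues else { return false }  // stop looping
    autoContinueCount += 1
    messages.append("please continue")
    return true   // caller re-enters the loop with the nudge appended
}
```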

4. Tool Execution: Bucketed Concurrency

When the LLM returns tool_use, the SDK doesn't just queue tools sequentially. Instead, it partitions them into buckets:

// ToolExecutor.partitionTools()
for block in blocks {
    let tool = tools.first { $0.name == block.name }
    if let tool = tool, tool.isReadOnly {
        readOnly.append(item)   // Read-only bucket
    } else {
        mutations.append(item)  // Mutation bucket
    }
}

Read-only tools (Read, Glob, Grep, WebSearch, etc.) can safely run concurrently using TaskGroup, up to 10 at a time:

let batchResults = await withTaskGroup(of: ToolResult.self) { group in
    for item in batchSlice {
        group.addTask {
            await executeSingleTool(block: item.block, tool: item.tool, context: ...)
        }
    }
    // Collect results
}

Mutation tools (Write, Edit, Bash, etc.) must execute serially, one after another, to avoid concurrent write conflicts:

for item in items {
    let result = await executeSingleTool(...)
    results.append(result)
}

Execution order: all read-only tools first (concurrent), then all mutation tools (serial). This significantly improves performance when the LLM returns multiple tool calls in one response — for example, when the LLM requests reading 5 files simultaneously, all 5 reads complete in parallel.
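The "up to 10" cap implies batching: the read-only bucket is split into chunks, and each chunk runs in its own TaskGroup while chunks proceed in sequence. A small helper sketch (names are assumptions, not SDK code):

```swift
// Illustrative chunking behind "up to 10 concurrent" read-only execution.
let maxConcurrentReadOnly = 10

func chunked<T>(_ items: [T], size: Int) -> [[T]] {
    stride(from: 0, to: items.count, by: size).map {
        Array(items[$0 ..< min($0 + size, items.count)])
    }
}

// e.g. 25 read-only tool calls -> batches of 10, 10, 5:
// each batch executes concurrently, batches run one after another.
```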

5. Micro-Compaction

After tool execution, results go through micro-compaction before being fed back to the LLM:

for result in toolResults {
    let processedContent = await processToolResult(result.content, isError: result.isError)
    processedResults.append(ToolResult(
        toolUseId: result.toolUseId,
        content: processedContent,
        isError: result.isError
    ))
}

If a tool returns content exceeding 50,000 characters (e.g., reading a large file), the SDK uses an additional LLM call to compress it. Error results are not compacted — full error information is preserved for LLM diagnosis.
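The decision itself is easy to sketch: compress oversized successful results, never errors. The 50,000-character threshold is from the text above; the function name is illustrative:

```swift
// Illustrative micro-compaction decision.
let microCompactThreshold = 50_000   // characters

func needsMicroCompaction(content: String, isError: Bool) -> Bool {
    // Errors are never compacted: the LLM needs the full message to diagnose.
    return !isError && content.count > microCompactThreshold
}
```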

Cost Tracking: Accumulated Per Turn

After each LLM call, the SDK updates token usage and cost:

let turnCost = estimateCost(model: model, usage: turnUsage)
totalCostUsd += turnCost
costByModel[model] = CostBreakdownEntry(
    model: model,
    inputTokens: turnUsage.inputTokens,
    outputTokens: turnUsage.outputTokens,
    costUsd: turnCost
)

costByModel records costs grouped by model. This means if you switch models mid-session (via switchModel()), each model's cost is tracked separately. The final result.costBreakdown tells you exactly how much each model cost.
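An accumulating per-model ledger could look like the sketch below — invented types, not the SDK's `CostBreakdownEntry`; the point is that each turn's usage is added to the model's running entry rather than replacing it:

```swift
// Illustrative per-model cost ledger.
struct CostEntry {
    var inputTokens = 0
    var outputTokens = 0
    var costUsd = 0.0
}

var costByModel: [String: CostEntry] = [:]

func recordTurn(model: String, input: Int, output: Int, cost: Double) {
    var entry = costByModel[model] ?? CostEntry()
    entry.inputTokens += input    // accumulate, don't overwrite
    entry.outputTokens += output
    entry.costUsd += cost
    costByModel[model] = entry
}
```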

Budget checking happens after each turn:

if let budget = options.maxBudgetUsd, totalCostUsd > budget {
    status = .errorMaxBudgetUsd
    break
}

When the budget is exceeded, the loop exits immediately, but any text already generated is preserved in the result — you get a partial result, not a blank one.

Cancellation: Cooperative and Checkpoint-Based

Swift's structured concurrency uses Task.isCancelled for cooperative cancellation. The SDK checks this flag at multiple checkpoints in the loop:

  1. While loop entry
  2. Between read-only and mutation tools
  3. Inside the SSE event loop
  4. Before and after tool execution

// Loop entry
if Task.isCancelled || _interrupted {
    status = .cancelled
    break
}

// Between read-only/mutation
if Task.isCancelled { return results }

stream() additionally supports cancellation via the interrupt() method — internally it cancels the Task holding the stream.

After cancellation, the result is a QueryResult(isCancelled: true) with the partial text and token usage as of the cancellation moment.
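The same cooperative pattern is easy to demonstrate in isolation: a loop that checks `Task.isCancelled` at each checkpoint, exits early, and returns whatever it finished so far — a standalone demo, not SDK code:

```swift
// Self-contained demo of cooperative cancellation with partial results.
func countUntilCancelled(limit: Int) async -> [Int] {
    var results: [Int] = []
    for i in 1...limit {
        if Task.isCancelled { break }   // checkpoint, like the SDK's loop entry
        results.append(i)
        await Task.yield()              // suspension point: lets cancellation land
    }
    return results                      // partial results survive cancellation
}
```

Nothing is forcibly killed: the work stops at the next checkpoint, which is why the SDK can still hand back partial text and usage in its `QueryResult`.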

Error Handling: Don't Crash, Don't Lose Data

The SDK's error handling principle: tool execution errors don't propagate, API errors get retries, final failures preserve partial results.

During tool execution, any error is captured as ToolResult(isError: true):

static func executeSingleTool(...) async -> ToolResult {
    guard let tool = tool else {
        return ToolResult(toolUseId: block.id, content: "Error: Unknown tool", isError: true)
    }
    // ... try executing
    let result = await tool.call(input: block.input, context: context)
    return ToolResult(toolUseId: block.id, content: result.content, isError: result.isError)
}

Tool error results are still fed back to the LLM, which can see the error message and adjust its strategy. The Agent Loop never crashes due to a tool failure.

API-level errors (network issues, 500s, etc.) trigger retries; after retries are exhausted, the fallback model kicks in; if everything fails, an errorDuringExecution status is returned.

Hook Integration: Loop Lifecycle

The Agent Loop fires Hook events at critical points:

| Hook Event | Trigger Timing |
| --- | --- |
| sessionStart | Before the loop starts |
| preToolUse | Before each tool execution |
| postToolUse | After successful tool execution |
| postToolUseFailure | After failed tool execution |
| stop | When the loop ends (normal or abnormal) |
| sessionEnd | Before returning the result |

A typical use of Hooks is to intercept dangerous operations at preToolUse:

await hookRegistry.register(.preToolUse, definition: HookDefinition(
    matcher: "Bash",
    handler: { input in
        return HookOutput(message: "Bash blocked in production", block: true)
    }
))

Tools intercepted by Hooks are not executed — instead, an error result is returned. The LLM sees "Bash blocked in production" and can find an alternative way to complete the task.

A Third Entry Point: streamInput()

Besides prompt() and stream(), the SDK provides a third entry point — streamInput(), which accepts an AsyncStream<String> as input:

let input = AsyncStream<String> { continuation in
    continuation.yield("What's in this project?")
    continuation.yield("Now explain the test structure.")
    continuation.finish()
}

for await message in agent.streamInput(input) {
    // Handle the response for each input
}

Each input element is treated as a new user message, triggering a complete prompt cycle. This is ideal for chat-style interactions: each user message is an element in the input stream, and the Agent processes them one by one with streaming output.

Summary

The Agent Loop is the heart of the entire SDK. Once you understand how it works, everything else is layered on top:

  • Tool System — The "execute tools" step in the loop
  • MCP Integration — Connecting external tool servers when the loop starts
  • Session Persistence — Saving the messages array after the loop ends
  • Permission Control — Interception points before tool execution
  • Hook System — Lifecycle event callbacks in the loop

The next article dives into the Tool System: how the 34 built-in tools are organized, the design philosophy behind the ToolProtocol, and how to create custom tools with defineTool.


Deep Dive into Open Agent SDK (Swift) Series:

GitHub: terryso/open-agent-sdk-swift

Top comments (0)