DEV Community: NEE

BMAD Loop: Handing Control of the Dev Loop Back to Deterministic Code

NEE — Sat, 04 Jul 2026 07:06:34 +0000

If you read my earlier write-up on Story Automator, you might remember my bottom line:

During the day, running things by hand is still faster. But handing a batch of stories to it before bed and checking the results in the morning — that's the use case where it genuinely shines.

I'd left a question dangling in that piece: why was it slower than doing it by hand? At the time I said I hadn't dug into how it worked yet.

BMAD 6.10 rewrites the whole thing, renames it BMAD Loop, and incidentally answers that question. The answer is a single sentence, and it's the key to understanding the entire design:

The control loop should have no LLM in it.

First, let's clear up the most common misconception

When people first meet BMAD Loop, they assume it's a few new skills: bmad-loop-setup, bmad-loop-sweep, bmad-loop-resolve, bmad-dev-auto.

It isn't. On their own, those skills do nothing.

What actually drives the loop is a Python tool installed from Git via uv — the bmad-loop package (repo: bmad-code-org/bmad-loop). Those skills are just the "primitives" (the official word) the orchestrator calls at different stages of the loop — the most basic, individually-dispatchable units of work:

bmad-dev-auto: develop — turn intent into an artifact that survives review
bmad-loop-sweep: triage — clean up the deferred-work ledger
bmad-loop-resolve: interact — disambiguate things together with a human

In other words, the skills are the muscles; the Python orchestrator is the central nervous system. Once that clicks, every design choice below falls into place.

The core tenet: No LLM in the control loop

The README's subtitle states it bluntly:

A deterministic ralph-loop orchestrator for the BMAD-METHOD implementation phase.

"A deterministic loop orchestrator." Deterministic is the single most important word in this entire article.

It splits the development loop into two very different kinds of work:

Work	Done by	Why
Control logic: which story to pick, how many retries, what counts as done, whether to commit	Pure Python	Needs to be deterministic, debuggable, reproducible, and free
Creative work: writing code, writing tests, doing adversarial review	An LLM (in a one-shot session)	This is what LLMs are good at — and the only thing only they can do

Go back and ask why Story Automator was slow: its control loop was stuffed with "ask the LLM by prompt what to do next" steps. Every single ask costs tokens, waits on inference, can drift, and when it drifts you ask again. Handing scheduling to an LLM is like putting a distractible, per-token-billing rookie in charge of dispatching on the assembly line.

BMAD Loop's move: swap the dispatcher for a piece of Python code that doesn't get distracted and doesn't bill by the word. The LLM only does its creative job at each station and leaves.

This buys four things, and they're the foundation for every mechanism that follows:

Deterministic: run the same sprint twice, get the same scheduling path
Debuggable: the flow is code — when something breaks you set a breakpoint or read a log, instead of guessing which prompt was unclear
Reproducible: every run's decisions are recorded as a state machine on disk
Cheap: control logic costs zero tokens

Four mechanisms that make "hands-off" safe

"No LLM in the control loop" sounds clean, but it raises a sharp question: how does the orchestrator know an LLM session finished — and finished correctly? The old answer was to make the orchestrator itself an LLM and have it "watch" the session's output. That was exactly Story Automator's burden.

BMAD Loop sidesteps that burden with four mechanisms.

Mechanism 1: every step is a one-shot session with fresh context

Dev and review are two independent sessions, and the review session never inherits the dev session's context.

This is counter-intuitive but critical. If the review session carried the memory of the dev session writing the code, it would naturally go soft on it — humans get anchored to code they just wrote (the psychologists call it the anchoring effect), and so do LLMs. Put review in a fresh session that has never seen the dev's code, and it will actually nitpick instead of rubber-stamping.

Analogy: you can't let the person who wrote the code and the person who reviews it be the same brain. Context isolation gives review a pair of eyes that has "never seen this code."

Mechanism 2: communicate via hook event files, never scrape panes

How does the orchestrator know a session ended? By registering hooks on the coding CLI (Claude Code / Codex / Gemini): Stop, SessionStart, SessionEnd, PreCompact. These hooks write structured event files to disk at key moments, and the orchestrator just watches those files.

Each skill, when it finishes in automation mode, writes a machine-readable result.json declaring what it produced and its status.

Old way (Story Automator):        BMAD Loop's way:
┌──────────────┐                  ┌──────────────┐
│ Orchestrator │                  │ Orchestrator │
│   (LLM)      │                  │  (Python)    │
│ reads screen │ ←fragile/costly  │  watches     │ ←stable/free/
│              │  /error-prone    │   files      │  structured
└──────────────┘                  └──────────────┘
       ↑                                 ↑
   scrape pane /                    read Stop-hook events
   read the conversation            read the skill's result.json

Pane-scraping was last era's pain: change the terminal output format or have the model say one extra sentence and the orchestrator is lost. Swap to "hooks write files, orchestrator reads files" and the interface drops from natural language down to structured data — robustness jumps a level.

Mechanism 3: Trust nothing, verify everything

This is the most hard-core part. After every LLM session, the orchestrator does not trust the session's own "I'm done" — it independently verifies on disk:

spec frontmatter status: did the story spec's status field actually flip to done?
baseline-commit match: do the files the session claims to have changed line up with git's actual diff? — a cheap "LLM lie detector"
non-empty diff: did it actually change anything?
sprint-status sync: does the status file agree with real progress?
your tests / lint: right before committing, run your own tests and lint

All checks pass, then commit. Any check fails, retry or escalate.

This philosophy is worth remembering on its own: LLMs hallucinate; git doesn't. Move the "is it really done?" judgment from "ask the LLM" to "look at the disk evidence," and the whole system gets solid.

Mechanism 4: the deferred-work ledger + sweep — finally, someone reads it

Loops always hit work that "can't be done right now" — an edge case waiting on another story, a decision a human has to make. You can't force it and you can't drop it, so it goes into a ledger: deferred-work.md.

What's interesting is this ledger's backstory. In earlier BMAD versions it was a famous half-feature: bmad-code-review wrote deferred items into deferred-work.md, but no skill ever read it back (the community even filed an issue about the bug). Debt written down, never repaid.

BMAD Loop's bmad-loop-sweep finally closes that loop. It does read-only triage: take every open entry in the ledger, verify it against the real codebase (grep for the symptom, check git log, read the relevant files), then partition it into five buckets:

Bucket	Meaning	What the orchestrator does
`already_resolved`	Later work incidentally fixed it, but nobody marked it	Closes it automatically, with evidence (file:line / commit)
`bundles`	Buildable now; group entries sharing a file/subsystem into one dev session	Executes the bundle
`blocked`	Needs a future story/epic to land first	Records the blocker and leaves it
`skip`	Stale, irrelevant, or explicitly excluded by the project	Skips it
`decisions`	A human has to decide (changing a frozen spec, changing an API shape, etc.)	Escalates to a human

"Debt gets repaid" — and repaid with evidence, not by guessing from stale ledger state. This is the mechanism that lets the loop run unattended for a long time without debt snowballing.

The whole loop in one diagram

Stack those four mechanisms together, and a story's full life cycle in BMAD Loop looks like this:

The entire control flow is Python. Only steps ② ③ ④ — the "creative stations" — are LLMs working in one-shot sessions. That's the complete meaning of "deterministic orchestrator."

Multi-model orchestration: three CLIs, mixed per role

BMAD Loop drives three coding CLIs through a generic tmux adapter: claude (default), codex, gemini. And you can mix them per stage — configured in the project's .bmad-loop/policy.toml:

[adapter]
name = "claude"          # default for every stage

[adapter.review]
name = "codex"           # but the review stage runs on codex

Why mix? Because different models are good at different things. A practical combo: have one model write the code and a different model do the adversarial review — two models from different families scrutinizing each other is far more ruthless than one model self-reviewing. This stacks on top of "mechanism 1: review in fresh context" for a double bias kill.

This isn't "calling a model" anymore — it's model orchestration.

When it stops and asks you: the CRITICAL escalation

Unattended does not mean no human involvement. There's one case where the orchestrator proactively pauses the whole run to wait for you — a CRITICAL escalation.

The trigger is usually: a dev or review session finds the frozen spec (the <frozen-after-approval> block) self-contradictory, or silent on a key case, and can't safely continue. Instead of guessing or forcing, it parks the run and waits for:

bmad-loop resolve --story <story-key>

That starts an interactive session. A human is present (you), so it asks you questions, gives 2–4 concrete options with a recommendation. Once you decide, it goes and edits the spec itself — not the code — to remove the ambiguity, and then the orchestrator re-drives the story against a corrected, contradiction-free spec.

The design is restrained, with a few hard rules worth applauding:

the resolve session only edits spec content — it writes no feature code, runs no tests, makes no commits
it does not touch sprint-status.yaml and does not set the spec's status field — those are produced deterministically by the orchestrator on resume
if the information isn't there, or the right fix is out of scope for a spec edit (e.g. it needs a PRD/architecture change), it says plainly "I can't resolve this", writes no completion marker, and the run stays parked — the safe default

In one line: when unsure, it stops and waits for you rather than fabricating an answer and charging ahead. That's the most responsible reading of "unattended."

How to use it: three steps

There's one prerequisite: you need a BMAD v6 project where bmad-sprint-planning has already run and produced a sprint-status.yaml. In other words, the PRD / architecture / epics & stories / sprint planning chain has to be done first — otherwise the loop has no stories to grind.

After install (via the bmad-loop-setup skill, which installs the Python tool from Git and runs bmad-loop init to register hooks, lay down skills, and write policy.toml), the core commands are actually few:

bmad-loop init        # install bmad-loop-* skills + hooks + policy.toml + gitignore
bmad-loop validate    # preflight: config / sprint-status / git / tmux / CLI / hooks
bmad-loop run --dry-run   # print the plan without spawning anything
bmad-loop run         # go
bmad-loop tui         # …or drive everything from the dashboard

The full command list covers run / sweep / resume / resolve / decisions / status / attach / stop / clean, but 90% of daily use is the handful above. The bmad-loop tui dashboard is genuinely nice — run picker, sprint tree, deferred-work ledger, a live per-story task table, color-coded journal tail, all on one screen.

One must-know one-time gotcha: if the coding CLI has never run in the target project (e.g. claude has never been started in that directory), start it once manually first and accept the workspace-trust and hooks-approval dialogs. Sessions spawned by the orchestrator can't click those first-run dialogs for you, and a pending dialog gets misread by the orchestrator as a "session timeout."

Lineage: from Story Automator to BMAD Loop

Drop BMAD Loop onto the timeline and its position is obvious:

Story Automator          bmad-automator / bmad-auto          BMAD Loop
(early 2026, the          (intermediate form,                (6.10, rewritten as a
 version I tested)         tool name bmad-auto)               deterministic Python orchestrator)
      │                        │                                 │
      └──── LLM in control loop ─┴─── rewrite ───► no LLM in control loop ──┘
              (slow / costly / drifty)                (deterministic / debuggable / free)

The README is candid about it: "Inspired by the original bmad-automator (a separate, legacy project)." It explicitly treats the previous generation as legacy and itself as a ground-up rewrite.

And it positions itself as "a deterministic ralph-loop orchestrator." If you follow the autonomous-dev scene you've heard of Ralph — the tool that has Claude Code run its own dev loop. BMAD Loop borrows the "ralph-loop" pattern (unattended, repeatedly-iterating small loops) but implements it deterministically on top of BMAD's story system. So it's "Ralph's spirit + BMAD's skeleton + a Python core."

Who it's for, who it's not for

Keeping the honest tone from my Story Automator review, here's a no-hype take.

Good fits:

You've gone through the full BMAD planning chain (PRD → architecture → epics → sprint planning) and have a string of clear, independently-implementable stories
You're fine with async "run it overnight" delivery — check results in the morning, not by staring at it live
Your project has reliable tests and lint (mechanism 3's last gate leans on them), otherwise verify is theater
You want multi-model cross-review without manually switching CLIs

Not for / be careful:

Stories are still vague or dependencies tangled — running those through the loop mostly generates a pile of CRITICAL escalations, which is more tiring than helpful
Projects with no tests — the orchestrator can be as smart as it likes, but the final verify gate spins empty, which is basically running without a net
Expecting "fast + good + fully automatic" — deterministic orchestration makes it more stable and cheaper, but a single story's absolute speed isn't necessarily faster than a practiced human hand-running it. Its value is in batch, async, resumable — not single-point speedup

The biggest difference from the previous generation, and the reason I'm most optimistic about it: because the control loop is deterministic code, it's debuggable, reproducible, and trustworthy. The "why is it so slow" black box from the Story Automator era is finally opened — the flow is Python, and you can see exactly what each step is doing and why it decided that. That alone is enough to graduate it from "experiment" to "something you can seriously use."

Closing

From v6.8's "locking intent" (have AI nail down what you actually want first) to 6.10's BMAD Loop (deterministic code as the dispatcher, LLMs only writing code), BMAD's direction over these years has been remarkably consistent:

Take back, one job at a time, the work that shouldn't be done by an LLM.

Lock down what should be locked (intent → SPEC). Make deterministic what should be deterministic (scheduling → Python). Hand to a human what a human should decide (park the run and wait). LLMs get progressively narrowed down to the creative work they're actually good at.

This isn't distrust of LLMs — it's respect for them: don't make them do what they're bad at, hallucinate at, and bill you per token for.

If you're using BMAD on a project, I'd genuinely recommend trying BMAD Loop on one sprint for real. Even if only so the deferred-work.md ledger — "the debt no one ever repays" — finally has someone managing it.

References:

BMAD Loop repo: bmad-code-org/bmad-loop
BMAD Method docs: docs.bmad-method.org
Previous generation (my review): BMAD Story Automator
The v6.8 context: BMad v6.8 — the "intent-locking" era

BMAD Story Automator in Practice: Handing 5 Stories to AI for Autonomous Execution

NEE — Sat, 16 May 2026 23:52:52 +0000

If you're already used to writing code with BMAD, the real time sink isn't "writing" — it's "coordination."

An Epic has 5, 10, or 20 Stories. Each one goes through spec creation, implementation, test automation, code review, and retrospective. What's exhausting isn't any single step — it's constantly monitoring the process, switching sessions, handling failures, and deciding what to do next.

Story Automator aims to eliminate this "human orchestration" layer.

Last night I ran through the full /bmad-story-automator workflow. Below is a walkthrough of the experience, with screenshots taken along the way.

It doesn't solve "writing code" — it solves "babysitting the process"

One-sentence summary of Story Automator:

You tell it which Stories to process and what execution strategy to use, and it automatically runs the pipeline: spec creation → implementation → test automation → code review → retrospective — only interrupting you when a genuine human decision is needed.

This is different from running a single Skill command. It's more like a build cycle orchestrator:

Initialization
  → Read Epic / Sprint status
  → Select Story scope
  → Assess complexity
  → Choose Agent strategy
  → Execute create / dev / automate / review / retro
  → Escalate to human on failures or conflicts

By design, it automates "coordination work" rather than replacing any specific development step.

Upgrade BMAD to the latest version (v6.6)

To try Story Automator, you first need to upgrade BMAD to 6.6. During installation, make sure to check the BMad Automator (Experimental) module.

Otherwise, the Automator won't be available after installing BMAD.

First run: installing the Stop Hook to prevent mid-workflow interruptions

The first time you run /bmad-story-automator, it doesn't rush to start executing. Instead, it performs initialization checks.

As shown below, it loads the configuration, then attempts to read the current orchestration state. If the state directory doesn't exist yet, that's expected for a first run. It then automatically installs a Stop Hook into .claude/settings.json.

Step 1: Select Stories — don't blindly run all pending items

Before entering the orchestration phase, Story Automator reads the Epic and sprint status, then lets you decide the scope.

The screenshot below shows a typical scenario: across Epics 5, 6, and 7, some Stories are completed while others are still pending. The tool displays these states directly and asks which Stories you want to process.

Step 2: Complexity isn't guesswork — it scores first, then assigns Agents

This is one of the most interesting parts of Story Automator.

Instead of dumping all Stories to the same Agent, it first generates a Story complexity matrix. The screenshot shows:

Story 5.3 "Server Commands & API Authentication" → 4 pts / Medium
Story 6.1 "Expose Axion via SDK AgentMCPServer" → 2 pts / Low
Story 6.2 "axion mcp Command & External Agent Integration Verification" → 2 pts / Low
Story 7.1 "User Takeover Mechanism Based on SDK Pause Protocol" → 2 pts / Low
Story 7.2 "--fast Mode" → 5 pts / Medium

More importantly, it doesn't just give scores — it shows reasoning:

authorization / permissions
real-time communication
high number of acceptance criteria
large Story text

This means complexity assessment isn't a black box. You see not just the "conclusion" but "why it considers this Story harder." This directly influences the subsequent Agent selection strategy.

In other words, what Story Automator is really doing:

First turning Stories into "schedulable objects," then deciding who executes them.

Step 3: You can inject custom instructions, but it doesn't force you to overthink

Next, it asks whether you have custom instructions.

For example:

Run tests after every modification
Prioritize a specific Story
Watch out for database migrations

I really like this design because it strikes a nice balance between "fully automatic" and "controllable":

If you have no special requirements, just select none
If you have iteration-specific preferences, you can inject them temporarily

In other words, it treats "human experience" as an optional input rather than forcing you to configure a pile of parameters every time.

Step 4: Execution settings determine how aggressively it runs

Next comes execution strategy configuration. The screenshot shows two core questions:

Whether to skip the automate step (test automation)
What's the maximum number of parallel sessions

The defaults are:

Don't skip automate
Maximum 1 parallel session

For most real projects, test automation is the last thing you should skip in the delivery loop; and keeping parallelism at 1 avoids multiple sessions stepping on each other when modifying the same codebase. In other words:

It defaults to "controllable completion" rather than "parallel sprint to the limit."

Step 5: Different complexity levels automatically map to different Agents

With the complexity matrix in hand, Story Automator can recommend Agent configurations.

The recommended configuration:

Low: create / dev / auto / review all use Claude
Medium: create / dev / auto / review all use Codex, with Claude as fallback
High: Also primarily Codex
Retro: Claude for the retrospective phase

From this configuration, it's clear that we're no longer at the "call a model" level — it's doing model orchestration.

It also provides strategy options:

Suggested: Use the recommended complexity-tiered configuration
Uniform: Use the same Agent for all Stories

Different teams can use it differently:

Play it safe, follow recommendations
Keep behavior consistent, use uniform Agents

And from the configuration summary screenshot later, you can see that this demo ended up saving as all-claude. This perfectly illustrates: recommendations are recommendations, not mandates. You can let the system intelligently assign by complexity, or manually unify to the same Agent type for stability or consistency.

This step makes the system more like a "dispatcher" than a simple command wrapper.

Step 6: Configuration is explicitly saved for easy recovery and review

Once you confirm, Story Automator saves the run configuration. The screenshot shows a configuration summary named all-claude:

The summary covers several things:

Which Epic
Story scope
Whether there are custom instructions
Which Agent is used for create / dev / auto / review / retro respectively
Whether test automation is skipped

This kind of "summary page" looks ordinary, but it's the foundation for an orchestrator to be recoverable, auditable, and reviewable.

Because once automation spans multiple Stories, sessions, and phases, these questions inevitably arise:

What if it stops midway?
What configuration did I actually choose this time?
Why did this batch of Stories use that Agent combination?

With explicitly saved configuration, subsequent recovery or post-hoc analysis won't be a guessing game.

Real-world experience takeaways

Before going to bed last night, I handed all 5 Stories to Story Automator. When I checked the results in the morning, it had run for 5.5 hours.

Honestly, it was slower than if I had run them manually. At my usual pace, I could have finished these 5 Stories in about 3 hours. As for why it was so much slower, I haven't carefully analyzed the implementation details yet — but it's still an experimental release, so getting it to work at all is what matters most.

For now, the best use case is: fire it off before bed.

Another thing I think it does well is that after each Epic completes, it automatically runs a retrospective and appends useful information to project-context.md.

So my current take is simple:

Running things manually during the day is still faster.
But handing a batch of Stories to it for an overnight run — that's a scenario where it really shines.

If you're using BMAD to develop projects, you should definitely give Story Automator a try. It might be the tool that reduces your repetitive coordination time from hours to minutes.

Deep Dive into SwiftWork (Part 4): Data Layer and Services — SwiftData, State Restore, Markdown Rendering

NEE — Sun, 03 May 2026 16:12:43 +0000

The previous three posts covered how events flow from the SDK to the UI, how the timeline renders, and how tool cards visualize. This final post looks at SwiftWork's infrastructure — how data is stored, how state is restored, how Markdown is rendered, how code is highlighted, and how API keys are managed.

These components are independent, but all essential to making the app usable.

SwiftData Model Layer

SwiftWork uses SwiftData for persistence, registering four models:

// SwiftWorkApp.swift
.modelContainer(for: [
    Session.self,
    Event.self,
    AppConfiguration.self,
    PermissionRule.self
])

Session

@Model
final class Session {
    @Attribute(.unique) var id: UUID
    var title: String
    var createdAt: Date
    var updatedAt: Date
    var workspacePath: String?
    @Relationship(deleteRule: .cascade, inverse: \Event.session)
    var events: [Event]
}

@Relationship(deleteRule: .cascade) means deleting a Session automatically deletes all its Events. workspacePath is optional — users can assign different working directories to each session.

Event

@Model
final class Event {
    @Attribute(.unique) var id: UUID
    var sessionID: UUID
    var eventType: String
    var rawData: Data        // JSON serialized AgentEvent
    var timestamp: Date
    var order: Int
    var session: Session?
}

This design was covered in the first post — rawData is the entire AgentEvent serialized as a JSON blob. The reason for not splitting it into separate fields is that the metadata structure varies by event type. Splitting into fields would result in many empty columns and frequent schema changes.

AppConfiguration

@Model
final class AppConfiguration {
    @Attribute(.unique) var id: UUID
    var key: String
    var value: Data
    var updatedAt: Date
}

A generic key-value store. It uses SwiftData instead of UserDefaults because SwiftData supports async access, data migration, and iCloud sync (which may be needed in the future). The stored values include:

hasCompletedOnboarding — whether the initial onboarding is complete
selectedModel — the user's chosen model
lastActiveSessionID — the last active session ID
windowFrame — window position and size
inspectorVisible — whether the Inspector panel is visible

AppStateManager: App State Restoration

AppStateManager is responsible for restoring the user's working state after the app restarts — the previously open session, window position, and Inspector panel visibility.

@MainActor
@Observable
final class AppStateManager {
    var lastActiveSessionID: UUID?
    var windowFrame: NSRect?
    var isInspectorVisible: Bool = false

    func loadAppState() {
        lastActiveSessionID = loadUUID(key: "lastActiveSessionID")
        windowFrame = loadNSRect(key: "windowFrame")
        isInspectorVisible = loadBool(key: "inspectorVisible")
    }

    func saveLastActiveSessionID(_ id: UUID?) { ... }
    func saveWindowFrame(_ frame: NSRect) { ... }
    func saveInspectorVisibility(_ visible: Bool) { ... }
}

Under the hood, it uses AppConfiguration's key-value storage:

private func saveString(_ string: String, forKey key: String) {
    let descriptor = FetchDescriptor<AppConfiguration>(
        predicate: #Predicate { $0.key == key }
    )
    if let existing = try? modelContext.fetch(descriptor).first {
        existing.value = Data(string.utf8)
    } else {
        let config = AppConfiguration(key: key, value: Data(string.utf8))
        modelContext.insert(config)
    }
    try? modelContext.save()
}

This is an upsert pattern — first check if it exists, update if found, insert if not. loadNSRect converts the string back to NSRect (using NSRectFromString), and loadBool compares against the string "true".

Save Timing

State saving is not done all at once when the app exits, but distributed across various trigger points:

State	Save trigger
`lastActiveSessionID`	When the user switches sessions (`SessionViewModel.selectSession`)
`windowFrame`	On window move/resize (500ms throttle) + on app exit
`inspectorVisible`	When the Inspector panel is toggled

Window position saving is throttled — didMoveNotification and didResizeNotification fire very frequently, and writing to SwiftData on every event is wasteful. A 500ms Task.sleep acts as a debounce, so only the final move/resize is actually saved:

// ContentView.swift
let saveWindowFrameThrottled: (Notification) -> Void = { _ in
    saveTask?.cancel()
    saveTask = Task { @MainActor in
        try? await Task.sleep(for: .milliseconds(500))
        guard !Task.isCancelled else { return }
        if let window = mainWindow {
            appStateManager.saveWindowFrame(window.frame)
        }
    }
}

Restoration Flow

When the app launches, ContentView.task triggers the restoration:

.task {
    settingsViewModel.configure(modelContext: modelContext)
    hasCompletedOnboarding = settingsViewModel.isAPIKeyConfigured
        && !settingsViewModel.isFirstLaunch

    if hasCompletedOnboarding == true {
        configureAndRestoreState()
    }
}

configureAndRestoreState restores state in order:

Initialize AppStateManager, loading saved state
Initialize SessionViewModel, fetching the session list
Select the session matching lastActiveSessionID
Restore isInspectorVisible
Restore window position (if the window reference has arrived)

Window position restoration has a timing issue — WindowAccessor's callback is asynchronous, so the window reference may arrive after task completes. That's why onChange(of: mainWindow) also handles restoration:

.onChange(of: mainWindow) { _, newWindow in
    if let newWindow {
        restoreWindowFrame(in: newWindow)
    }
}

MarkdownRenderer: Visitor Pattern for Markdown Rendering

Agent responses are in Markdown format — headings, lists, code blocks, bold text, links. SwiftWork uses Apple's swift-markdown library to parse Markdown, then traverses the AST with the Visitor pattern to generate SwiftUI views.

Why Not Use an Existing Markdown Rendering Component

There aren't many Markdown rendering components on macOS. AttributedString(markdown:) only supports basic formatting (bold, links) and doesn't support code blocks, tables, or block quotes. WebView-based solutions (rendering Markdown to HTML via Markdown.js) introduce WebKit dependencies and memory overhead. Writing a custom Visitor gives precise control over how each element renders, without adding extra dependencies.

Visitor Implementation

private struct MarkdownToViewsVisitor: @preconcurrency MarkupVisitor {
    private(set) var views: [AnyView] = []

    mutating func visitHeading(_ heading: Heading) -> Result { ... }
    mutating func visitParagraph(_ paragraph: Paragraph) -> Result { ... }
    mutating func visitCodeBlock(_ codeBlock: CodeBlock) -> Result { ... }
    mutating func visitUnorderedList(_ unorderedList: UnorderedList) -> Result { ... }
    mutating func visitOrderedList(_ orderedList: OrderedList) -> Result { ... }
    mutating func visitBlockQuote(_ blockQuote: BlockQuote) -> Result { ... }
    mutating func visitTable(_ table: Table) -> Result { ... }
    mutating func visitThematicBreak(_ thematicBreak: ThematicBreak) -> Result { ... }
}

Each visit method handles one Markdown node type, appending the generated view to the views array. Finally, MarkdownRenderer.render() returns this array, and MarkdownContentView renders it with ForEach.

Inline Format Handling

Inline formatting within paragraphs and list items (bold, italic, inline code, links) is handled by collectAttributedString. It recursively traverses child nodes to build an AttributedString:

private mutating func collectAttributedString(from markup: any Markup) -> AttributedString {
    var result = AttributedString()
    for child in markup.children {
        if let strong = child as? Strong {
            var s = collectAttributedString(from: strong)
            s.font = .body.bold()
            result.append(s)
        } else if let emphasis = child as? Emphasis {
            var e = collectAttributedString(from: emphasis)
            e.font = .body.italic()
            result.append(e)
        } else if let inlineCode = child as? InlineCode {
            var codeAttr = AttributedString(inlineCode.code)
            codeAttr.backgroundColor = Color.primary.opacity(0.06)
            codeAttr.font = .system(.body, design: .monospaced)
            result.append(codeAttr)
        } else if let link = child as? MarkdownLink {
            var linkAttr = AttributedString(collectInlineText(from: link))
            linkAttr.foregroundColor = Color.accentColor
            linkAttr.underlineStyle = .single
            linkAttr.link = URL(string: link.destination)
            result.append(linkAttr)
        }
        // ... SoftBreak, LineBreak, Strikethrough
    }
    return result
}

AttributedString is a rich text type natively supported by SwiftUI. Passing it to SwiftUI.Text(attributed) causes SwiftUI to render with the specified font, color, and backgroundColor. Inline code gets a gray-background monospaced font, and links get blue underlines.

Type Name Conflicts

swift-markdown and SwiftUI have type name conflicts — both define Text, Link, and other types. The solution uses typealiases:

private typealias MarkdownText = Markdown.Text
private typealias MarkdownLink = Markdown.Link

Inside the visitor, MarkdownText and MarkdownLink reference swift-markdown types, while SwiftUI.Text references SwiftUI types.

CodeHighlighter: Splash Code Highlighting

Code block highlighting uses John Sundell's Splash library. Currently only Swift syntax highlighting is supported; other languages fall back to monospaced plain text:

enum CodeHighlighter {
    static func highlight(code: String, language: String?) -> AnyView {
        let trimmedLanguage = language?.lowercased()
        if trimmedLanguage == "swift" {
            return highlightedSwiftView(code: code)
        } else {
            return plainCodeView(code: code)
        }
    }

    private static func highlightedSwiftView(code: String) -> AnyView {
        let theme = Theme.sundellsColors(withFont: Splash.Font(size: 13))
        let format = AttributedStringOutputFormat(theme: theme)
        let highlighter = SyntaxHighlighter(format: format)
        let attributed = try? AttributedString(highlighter.highlight(code), including: \.appKit)
        return AnyView(Text(attributed ?? AttributedString(code)))
    }
}

Splash's pipeline: source code string -> SyntaxHighlighter -> AttributedStringOutputFormat -> NSAttributedString -> AttributedString -> SwiftUI.Text.

Why only Swift? Because Splash only supports Swift. To support Python/JavaScript/Bash, you'd need a multi-language highlighting library (like a Swift wrapper for Highlight.js) or Tree-sitter. Currently, Swift code blocks are highlighted most frequently (SwiftWork itself is a Swift project), so Swift-only support is sufficient for now.

KeychainManager: Secure API Key Storage

API keys should not be stored in plaintext in SwiftData or UserDefaults. SwiftWork uses the macOS Keychain:

struct KeychainManager: KeychainManaging, Sendable {
    func save(key: String, data: Data) throws {
        let query = [
            kSecClass: kSecClassGenericPassword,
            kSecAttrService: service,
            kSecAttrAccount: key
        ]
        let status = SecItemAdd(query.merging([kSecValueData: data]), nil)
        if status == errSecDuplicateItem {
            SecItemUpdate(query, [kSecValueData: data])
        }
    }

    func load(key: String) throws -> Data? {
        let query = [
            kSecClass: kSecClassGenericPassword,
            kSecAttrService: service,
            kSecAttrAccount: key,
            kSecReturnData: true,
            kSecMatchLimit: kSecMatchLimitOne
        ]
        var result: AnyObject?
        let status = SecItemCopyMatching(query, &result)
        if status == errSecItemNotFound { return nil }
        return result as? Data
    }
}

The KeychainManaging protocol abstracts the underlying implementation, making it easy to mock during testing. Protocol extensions provide convenience methods like saveAPIKey/getAPIKey/deleteAPIKey.

Keychain storage has two advantages: data is encrypted at the system level, and it is not subject to App Sandbox file access restrictions.

TitleGenerator: Auto-Generating Session Titles

Newly created sessions have the title "New Session". After the Agent completes its first execution, TitleGenerator uses an LLM to generate a short title based on the conversation content:

enum TitleGenerator {
    static func generate(events: [AgentEvent], apiKey: String, ...) async -> String? {
        guard !apiKey.isEmpty else { return nil }

        let messages = events
            .filter { $0.type == .userMessage || $0.type == .assistant }
            .suffix(10)  // Only take the last 10 messages
            .map { ["role": ..., "content": String($0.content.prefix(500))] }

        let body = [
            "model": model,
            "max_tokens": 50,
            "system": "Generate a short title based on the following conversation (maximum 20 characters). Output only the title.",
            "messages": messages
        ]
        // Call LLM API, return title text
    }
}

The trigger lives in WorkspaceView.setupTitleGeneration — via the AgentBridge.onResult callback, it fires when the Agent finishes execution and the session title is still "New Session":

agentBridge.onResult = { [weak session] _ in
    guard let session, session.title == "New Session" else { return }
    if let title = await TitleGenerator.generate(events: events, ...) {
        sessionViewModel.updateSessionTitle(session, title: title)
    }
}

This is a lightweight LLM call — only a 50-token output limit, a short system prompt, and it takes the last 10 messages with each truncated to 500 characters. In practice, latency is 1-2 seconds and doesn't affect the user experience.

Summary

SwiftWork's data layer and service components each have their own responsibilities:

Component	Responsibility
SwiftData	Session/Event/AppConfiguration persistence
AppStateManager	App state restoration (session, window, panel)
EventStore	Event persistence protocol, SwiftData implementation
MarkdownRenderer	swift-markdown AST -> SwiftUI views
CodeHighlighter	Splash syntax highlighting (Swift)
KeychainManager	Secure API key storage
TitleGenerator	LLM auto-generated session titles

They form the "support layer" beneath the core pipeline (AgentBridge -> EventMapper -> TimelineView) covered in previous posts. The app would still run without them, but the user experience would be significantly worse — no persistence means starting from scratch on every restart, no Markdown rendering means Agent responses are raw text, and no Keychain management means API keys are stored in plaintext.

Deep Dive into SwiftWork Series:

Part 0: Building a macOS Agent Workbench with SwiftUI
Part 1: SDK Integration — Bridging AsyncStream to SwiftUI
Part 2: Event Timeline — Visualizing 18 Event Types
Part 3: Tool Card — An Extensible Tool Visualization System
Part 4: Data Layer and Services — SwiftData, State Restore, Markdown Rendering

GitHub: SwiftWork | Open Agent SDK

Deep Dive into SwiftWork (Part 3): Tool Card — An Extensible Tool Visualization System

NEE — Sun, 03 May 2026 16:12:38 +0000

The previous two posts covered how events flow from the SDK to the UI. This post focuses on visualizing one specific type of event: tool calls.

Tool invocations are the most frequent operations in an Agent application. A typical task might call tools twenty or thirty times—reading files, writing files, executing commands, searching code. If every tool call renders as the same gray block, it's hard for users to quickly tell "what command is Bash running" or "which file is Edit changing."

SwiftWork's solution is an extensible tool rendering system: each tool type registers a renderer, and ToolCardView looks up the matching renderer by tool name. When adding a new tool type, you only need to write a struct that implements the ToolRenderable protocol and register it with ToolRendererRegistry—no changes to TimelineView required.

Starting from the Problem: Why Not Use a Unified Tool View

The simplest approach is to use the same view for all tool calls—showing the tool name, input parameters, and output. The ToolCallView from post 2 serves this role:

struct ToolCallView: View {
    let event: AgentEvent
    var body: some View {
        VStack(alignment: .leading, spacing: 4) {
            HStack(spacing: 4) {
                Image(systemName: "wrench.and.screwdriver")
                Text(event.content)  // 工具名称
            }
            Text(input)  // 原始 JSON
        }
    }
}

This view treats all tools equally—the same wrench icon, the same JSON output. It works fine as a fallback, but has a few problems:

For Bash calls, users want to see the command itself (git status), not {"command": "git status"}
For Read calls, users want to see the file path (src/main.swift), not the full JSON
Search tool results might contain multi-line matches, which need to be distinguished from single-line output

Each tool has different "most useful information." The Tool Card system lets each tool decide how to present itself.

The ToolRenderable Protocol

The protocol defines the contract for tool renderers:

protocol ToolRenderable: Sendable {
    /// 此渲染器处理的工具名称（与 SDK ToolUseData.toolName 匹配）
    static var toolName: String { get }

    /// 工具类型主题色（左边条、图标着色）
    static var accentColor: Color { get }

    /// 工具类型 SF Symbol 图标名
    static var icon: String { get }

    /// 根据工具内容生成 SwiftUI 视图
    @ViewBuilder @MainActor
    func body(content: ToolContent) -> any View

    /// 生成摘要标题（折叠状态显示）
    func summaryTitle(content: ToolContent) -> String

    /// 生成副标题（如文件路径、命令摘要）
    func subtitle(content: ToolContent) -> String?
}

The protocol extension provides default values:

extension ToolRenderable {
    static var accentColor: Color { .gray }
    static var icon: String { "wrench.and.screwdriver" }

    func summaryTitle(content: ToolContent) -> String {
        content.toolName
    }

    func subtitle(content: ToolContent) -> String? {
        nil
    }
}

Six members, three with defaults. Implementers only need to provide toolName (the static routing key) and body (the rendered content). summaryTitle and subtitle can be overridden to provide more meaningful summaries, and accentColor and icon can be overridden for visual distinction.

ToolRendererRegistry

The registry is a [String: ToolRenderable] dictionary, keyed by toolName:

@MainActor
@Observable
final class ToolRendererRegistry {
    private var renderers: [String: any ToolRenderable] = [:]

    init() {
        register(BashToolRenderer())
        register(FileEditToolRenderer())
        register(SearchToolRenderer())
        register(ReadToolRenderer())
        register(WriteToolRenderer())
    }

    func register(_ renderer: any ToolRenderable) {
        renderers[type(of: renderer).toolName] = renderer
    }

    func renderer(for toolName: String) -> (any ToolRenderable)? {
        renderers[toolName]
    }
}

Five built-in renderers are pre-registered during init. Lookup is O(1) dictionary access. The @Observable annotation lets SwiftUI automatically refresh when new renderers are registered—though in current usage all renderers are registered at init time, dynamic registration is reserved for a future plugin system.

5 Built-in Renderers

BashToolRenderer — Terminal Commands

struct BashToolRenderer: ToolRenderable {
    static let toolName = "Bash"
    static let accentColor: Color = .green
    static let icon: String = "terminal"

    func summaryTitle(content: ToolContent) -> String {
        // 从 input JSON 提取 command 字段
        // {"command": "git status"} → "git status"
        guard let json = parseInput(content),
              let command = json["command"] as? String
        else { return content.toolName }
        return command
    }
}

Green theme + terminal icon. summaryTitle extracts the command field from the input JSON—users see the running command directly in the collapsed state.

ReadToolRenderer — File Reading

struct ReadToolRenderer: ToolRenderable {
    static let toolName = "Read"
    static let accentColor: Color = .blue
    static let icon: String = "doc.text"

    func summaryTitle(content: ToolContent) -> String {
        // {"file_path": "src/main.swift"} → "src/main.swift"
        guard let json = parseInput(content),
              let filePath = json["file_path"] as? String
        else { return content.toolName }
        return filePath
    }
}

Blue theme + document icon. summaryTitle extracts the file path.

WriteToolRenderer — File Writing

struct WriteToolRenderer: ToolRenderable {
    static let toolName = "Write"
    static let accentColor: Color = .orange
    static let icon: String = "pencil.and.outline"

    func summaryTitle(content: ToolContent) -> String {
        // 提取 file_path
    }

    func subtitle(content: ToolContent) -> String? {
        // 提取 content 字段，截取前 80 字符
        // {"content": "import Foundation\n..."} → "import Foundation..."
        guard let json = parseInput(content),
              let contentStr = json["content"] as? String, !contentStr.isEmpty
        else { return nil }
        return "\(contentStr.prefix(80))..."
    }
}

Orange theme + pencil icon. It has one more field than Read—a subtitle showing the first 80 characters of the written content. Since written content is typically long, the subtitle gives users a quick preview.

FileEditToolRenderer — File Editing

struct FileEditToolRenderer: ToolRenderable {
    static let toolName = "Edit"
    static let accentColor: Color = .orange
    static let icon: String = "pencil.line"

    func summaryTitle(content: ToolContent) -> String {
        // 提取 file_path
    }

    func subtitle(content: ToolContent) -> String? {
        // 提取 old_string，截取前 50 字符
        // {"old_string": "func hello() {"} → "Editing: func hello() {"
        guard let json = parseInput(content),
              let oldString = json["old_string"] as? String, !oldString.isEmpty
        else { return nil }
        return "Editing: \(oldString.prefix(50))"
    }
}

Orange theme + edit icon. The subtitle displays the old text being replaced—so users know which line Edit is changing.

SearchToolRenderer — Code Search

struct SearchToolRenderer: ToolRenderable {
    static let toolName = "Grep"
    static let accentColor: Color = .purple
    static let icon: String = "text.magnifyingglass"

    func summaryTitle(content: ToolContent) -> String {
        // 提取 pattern
    }

    func subtitle(content: ToolContent) -> String? {
        // 提取 path
    }
}

Purple theme + magnifying glass icon. summaryTitle shows the search pattern, and subtitle shows the search path.

Visual Distinction Summary

Tool	Color	Icon	summaryTitle	subtitle
Bash	Green	terminal	Command	-
Read	Blue	doc.text	File path	-
Write	Orange	pencil.and.outline	File path	First 80 chars of content
Edit	Orange	pencil.line	File path	First 50 chars of replaced text
Grep	Purple	text.magnifyingglass	Search pattern	Search path

All five tool types can be distinguished at a glance in the collapsed state: different colors, different icons, different summary text.

ToolCardView: The Container View

ToolCardView is the container for tool cards. It doesn't handle rendering itself—it delegates to the renderer looked up from the registry:

struct ToolCardView: View {
    let content: ToolContent
    let registry: ToolRendererRegistry
    let isSelected: Bool
    let onSelect: () -> Void

    @State private var isExpanded = false

    var body: some View {
        HStack(spacing: 0) {
            // 左边条（3px，渲染器的主题色）
            RoundedRectangle(cornerRadius: 2)
                .fill(toolAccentColor)
                .frame(width: 3)

            VStack(alignment: .leading, spacing: 0) {
                titleRow       // 始终可见
                    .onTapGesture {
                        onSelect()
                        withAnimation { isExpanded.toggle() }
                    }

                if isExpanded {
                    expandedContent  // 展开后可见
                }
            }
        }
    }
}

The card has two layers: titleRow (always visible) and expandedContent (shown on click to expand).

titleRow

private var titleRow: some View {
    HStack(alignment: .top, spacing: 6) {
        Image(systemName: toolIcon)          // 渲染器的图标
            .foregroundStyle(toolIconColor)

        VStack(alignment: .leading, spacing: 2) {
            HStack(spacing: 4) {
                Text(resolvedSummaryTitle)    // 渲染器的 summaryTitle
                    .fontWeight(.medium)
                Spacer()
                if content.status == .running {
                    ProgressView().controlSize(.mini)  // 运行中转圈
                }
                Text(statusLabel)             // pending / running / completed / failed
                    .font(.system(size: 9))
                    .background(statusColor.opacity(0.15))
            }
            Text(content.toolName)            // 工具名称（小字）
            if let subtitle = resolvedSubtitle {  // 渲染器的 subtitle
                Text(subtitle)
            }
        }
    }
}

The title row gets its icon, color, summary title, and subtitle from the renderer. The status label (pending/running/completed/failed) is determined by ToolContent.status and is outside the renderer's control—it's a generic execution status, independent of tool type.

expandedContent

private var expandedContent: some View {
    VStack(alignment: .leading, spacing: 8) {
        Divider()

        // 工具特定的 body（从渲染器获取）
        if let renderer = registry.renderer(for: content.toolName) {
            AnyView(renderer.body(content: content))
        } else {
            genericToolBody  // fallback
        }

        // 通用 INPUT 区域
        if !content.input.isEmpty {
            HStack {
                Text("INPUT")
                Spacer()
                CopyButton(text: content.input)
            }
            Text(content.input)
                .font(.system(.caption, design: .monospaced))
        }

        // 通用 OUTPUT 区域
        if let output = content.output, !output.isEmpty {
            ToolResultContentView(output: output, isError: content.isError)
        }
    }
}

The expanded content has three sections:

The renderer's body: Tool-specific custom content. The current 5 built-in renderers each display an icon summary block in their body—similar to the information in titleRow but more detailed. In the future, more complex tools could provide richer body content (such as code diff previews).
INPUT section: Generic raw input JSON display with a copy button.
OUTPUT section: ToolResultContentView, covered in the next section.

genericToolBody is the fallback when no renderer is registered—it just shows the tool name and raw input.

ToolResultContentView: Output Rendering + Diff Detection

ToolResultContentView has a smart feature: it automatically detects whether the output content is in diff format, and if so, applies color highlighting.

private var isDiffContent: Bool {
    let lines = output.components(separatedBy: "\n")
    let diffLines = lines.filter { $0.hasPrefix("+") || $0.hasPrefix("-") || $0.hasPrefix("@@") }
    return diffLines.count >= 2
}

Detection logic: if the output has at least two lines starting with +, -, or @@, it's treated as diff content. Simple but effective—the SDK's Edit tool outputs results in diff format.

Diff rendering adds background colors to each line:

private func diffLineView(_ line: String) -> some View {
    Text(line)
        .font(.system(.caption, design: .monospaced))
        .padding(.horizontal, 4)
        .background(diffLineBackground(line))
}

private func diffLineBackground(_ line: String) -> Color {
    if line.hasPrefix("+") { return .green.opacity(0.15) }  // 新增行
    if line.hasPrefix("-") { return .red.opacity(0.15) }    // 删除行
    if line.hasPrefix("@@") { return .blue.opacity(0.1) }   // 位置标记
    return .clear
}

Non-diff content renders as plain text with truncation logic—it collapses when exceeding 5 lines or 200 characters, with an expand button.

How to Add a New Tool Renderer

Say the SDK adds a new WebFetch tool, and you want to give it a dedicated card style in SwiftWork. You only need two steps:

Step 1: Write the renderer

struct WebFetchToolRenderer: ToolRenderable {
    static let toolName = "WebFetch"
    static let accentColor: Color = .cyan
    static let icon: String = "globe"

    @MainActor
    func body(content: ToolContent) -> any View {
        // 自定义视图...
    }

    func summaryTitle(content: ToolContent) -> String {
        // 从 input 提取 URL
        guard let json = parseInput(content),
              let url = json["url"] as? String
        else { return content.toolName }
        return url
    }
}

Step 2: Register it

// ToolRendererRegistry.init()
register(WebFetchToolRenderer())

No need to modify TimelineView, ToolCardView, or any other file. ToolCardView looks up the renderer at render time via registry.renderer(for:)—if found, it uses it; if not, it falls back to the generic view.

Summary

The Tool Card system follows a protocol + registry design pattern:

Component	Responsibility
`ToolRenderable`	Defines the rendering contract—tool name, color, icon, summary, custom view
`ToolRendererRegistry`	Dictionary lookup, `toolName → ToolRenderable`
`ToolCardView`	Container view, delegates to renderer, handles generic logic (expand/collapse, status labels, INPUT/OUTPUT sections)
`ToolResultContentView`	Output rendering with automatic diff detection

The benefit of this pattern is open for extension, closed for modification. TimelineView's dispatch logic (the toolCardView(for:) from post 2) doesn't need to know how many tool types exist—it only queries the registry. When adding a new tool type, the scope of changes is limited to the renderer file and the registry's init method.

The next and final post covers the data layer—SwiftData session/event persistence, app state restoration, Markdown rendering, and syntax highlighting.

Deep Dive into SwiftWork Series:

Part 0: Building a macOS Agent Workbench with SwiftUI
Part 1: SDK Integration — Bridging AsyncStream to SwiftUI
Part 2: Event Timeline — Visualizing 18 Event Types
Part 3: Tool Card — An Extensible Tool Visualization System
Part 4: Data Layer and Services — SwiftData, State Restore, Markdown Rendering

GitHub: SwiftWork | Open Agent SDK

Deep Dive into SwiftWork (Part 2): Event Timeline — Visualizing 18 Event Types

NEE — Sun, 03 May 2026 16:12:33 +0000

Post 1 covered how AgentBridge converts the SDK's AsyncStream<SDKMessage> into [AgentEvent]. This post looks at what [AgentEvent] becomes — how TimelineView renders 18 event types, handles scroll behavior, and stays smooth when the event count gets large.

TimelineView Structure

TimelineView is the main body of the workspace, filling all the space between the sidebar and the input box. Its view hierarchy is shallow:

TimelineView
  ├── ScrollView
  │   ├── topPlaceholder (virtualization spacer)
  │   ├── LazyVStack
  │   │   └── ForEach(virtualizedEvents) → eventView(for:)
  │   ├── bottomPlaceholder (virtualization spacer)
  │   ├── StreamingTextView (streaming text)
  │   └── bottom-anchor (scroll anchor)
  └── returnToBottomButton (return to bottom)

When there are no events, an empty state is shown: "Send a message to start a conversation with the Agent." When events exist, it enters a ScrollViewReader + LazyVStack structure.

Event Dispatch: 18 Types to 8 Views

eventView(for:) is the core of event dispatch. 18 AgentEventType values map to 8 views:

@ViewBuilder
private func eventView(for event: AgentEvent) -> some View {
    switch event.type {
    case .userMessage:       UserMessageView(event: event)
    case .partialMessage:    EmptyView()
    case .assistant:         AssistantMessageView(event: event)
    case .toolUse:           toolCardView(for: event)
    case .toolResult,
         .toolProgress:      pairedToolEventView(for: event)
    case .result:            ResultView(event: event)
    case .system:            systemOrThinking(event: event)
    case .hookStarted, .hookProgress, .hookResponse,
         .taskStarted, .taskProgress, .authStatus,
         .filesPersisted, .localCommandOutput,
         .promptSuggestion, .toolUseSummary:
                             SystemEventView(event: event)
    case .unknown:           UnknownEventView(event: event)
    }
}

A few dispatch logic decisions worth noting:

partialMessage renders as EmptyView. Streaming text does not go through ForEach(events) — it is rendered separately by StreamingTextView below the LazyVStack. The reason was covered in Post 1: partialMessage only accumulates in streamingText and never enters the events array. This avoids the flickering and performance overhead caused by frequent insertions and deletions in ForEach.

toolUse goes through toolCardView, while toolResult/toolProgress go through pairedToolEventView. If the toolContentMap has a matching entry (meaning a paired toolUse has already been received), toolUse renders as ToolCardView, and the paired toolResult/toolProgress renders as EmptyView — because their content is already merged into the card. If there is no match in toolContentMap (e.g., incomplete historical event loading), it falls back to simple ToolCallView/ToolResultView.

The system type needs to distinguish between "thinking" and ordinary system events. The systemOrThinking method checks the subtype in metadata:

private func systemOrThinking(event: AgentEvent) -> some View {
    let subtype = event.metadata["subtype"] as? String ?? ""
    let isLastEvent = agentBridge.events.last?.id == event.id
    if (subtype == "init" || subtype == "status") && isLastEvent {
        ThinkingView()              // spinning gear + "Thinking..."
    } else if subtype == "init" || subtype == "status" {
        ThinkingView(isActive: false) // checkmark + "Agent responded"
    } else if let isError = event.metadata["isError"] as? Bool, isError {
        SystemEventView(event: event, isError: true)  // red error bar
    } else {
        SystemEventView(event: event)  // normal system message
    }
}

Only the last init/status event shows the spinning animation. Historical events display a static "Agent responded" state. This prevents all historical thinking states from spinning endlessly.

Design of Each Event View

UserMessageView — Right-Aligned Blue Bubble

struct UserMessageView: View {
    let event: AgentEvent
    var body: some View {
        HStack {
            Spacer()
            Text(event.content)
                .padding(.horizontal, 12)
                .padding(.vertical, 8)
                .background(.blue.opacity(0.15))
                .clipShape(RoundedRectangle(cornerRadius: 12))
        }
    }
}

User messages are right-aligned with a semi-transparent blue background and rounded rectangle. This matches the ChatGPT message layout.

AssistantMessageView — Left Vertical Line + Markdown

struct AssistantMessageView: View {
    let event: AgentEvent
    var body: some View {
        HStack(alignment: .top, spacing: 0) {
            RoundedRectangle(cornerRadius: 1)
                .fill(Color.secondary.opacity(0.3))
                .frame(width: 2)
                .padding(.trailing, 8)
            MarkdownContentView(markdown: event.content)
            Spacer()
        }
    }
}

A gray vertical line on the left serves as a visual separator, and the content is rendered using MarkdownContentView. This component handles Markdown parsing, code highlighting, and long text folding — Post 4 will cover it in detail.

ThinkingView — Spinning Gear Animation

struct ThinkingView: View {
    var isActive: Bool = true
    @State private var isAnimating = false

    var body: some View {
        HStack(spacing: 8) {
            if isActive {
                Image(systemName: "gearshape")
                    .rotationEffect(.degrees(isAnimating ? 360 : 0))
                    .animation(.linear(duration: 1).repeatForever(autoreverses: false),
                               value: isAnimating)
                Text("Thinking...")
            } else {
                Image(systemName: "checkmark.circle")
                Text("Agent responded")
            }
            Spacer()
        }
        .onAppear { if isActive { isAnimating = true } }
    }
}

isActive controls two states: a spinning gear indicates active thinking, and a green checkmark indicates thinking is complete. onAppear triggers the animation, and it does not re-trigger when the view scrolls off-screen and back.

ResultView — Execution Result + Statistics

struct ResultView: View {
    let event: AgentEvent
    // Extract durationMs, totalCostUsd, numTurns from metadata
    var body: some View {
        HStack(spacing: 4) {
            Image(systemName: statusIcon)  // checkmark.circle / pause.circle / xmark.circle
                .foregroundStyle(statusColor)
            Text(subtype)  // success / cancelled / error
        }
        // Below: duration | turns | cost
        HStack(spacing: 12) {
            Label("\(duration)ms", systemImage: "clock")
            Label("\(turns) turns", systemImage: "arrow.triangle.2.circlepath")
            Label(String(format: "$%.4f", cost), systemImage: "dollarsign.circle")
        }
    }
}

The Result event displays a summary of execution statistics — duration in milliseconds, number of conversation turns, and cost in US dollars. Errors are highlighted with a red background.

SystemEventView — System Messages and Error Alerts

struct SystemEventView: View {
    let event: AgentEvent
    let isError: Bool

    var body: some View {
        HStack(spacing: 4) {
            if isError {
                RoundedRectangle(cornerRadius: 1).fill(Color.red).frame(width: 3)
                Image(systemName: "exclamationmark.triangle.fill").foregroundStyle(.red)
            } else {
                Image(systemName: "info.circle").foregroundStyle(.secondary)
            }
            Text(event.content)
        }
        .background(isError ? Color.red.opacity(0.08) : Color.clear)
    }
}

Normal system messages appear as a single line of gray text with an info icon. Error messages add a red left bar, red background, and warning icon.

Scroll Behavior: Follow Latest vs Manual Browse

An Agent continuously produces events during execution. Users typically want to see the latest events (auto-scroll to bottom), but sometimes want to scroll up and review history. These two needs conflict.

SwiftWork uses ScrollModeManager to manage switching between two modes:

enum ScrollMode {
    case followLatest    // Auto-follow the latest event
    case manualBrowse    // User manually browses history
}

@MainActor
@Observable
final class ScrollModeManager {
    var scrollMode: ScrollMode = .followLatest

    var showReturnToBottomButton: Bool {
        scrollMode == .manualBrowse
    }

    private let nearBottomThreshold: CGFloat = 96
    private let scrollUpThreshold: CGFloat = 16
    private var cumulativeUpwardDelta: CGFloat = 0
}

Auto-follow condition: When the user is within 96pt of the bottom, the mode automatically switches back to followLatest. Each time a new event arrives, TimelineView auto-scrolls to the bottom.

Switch to manual browse condition: When the user scrolls up more than 16pt, the mode switches to manualBrowse. At this point, new events no longer trigger auto-scrolling, and a "return to bottom" button appears in the lower-right corner.

// TimelineView.swift
.onChange(of: agentBridge.events.count) { _, newCount in
    updateVisibleRangeForCount(newCount)
    if scrollModeManager.scrollMode == .followLatest {
        scrollToLast(proxy: proxy)
    }
}
.onChange(of: agentBridge.streamingText) { _, _ in
    if scrollModeManager.scrollMode == .followLatest {
        scrollToLast(proxy: proxy)
    }
}

Two onChange handlers listen for event count changes and streaming text changes. Auto-scrolling only happens in followLatest mode.

Return to bottom button: When tapped, it switches back to followLatest, updates visibleRange to the latest 50 events, and animates the scroll to the bottom:

Button {
    scrollModeManager.returnToBottom()
    let total = agentBridge.events.count
    let lower = max(0, total - 50)
    visibleRange = lower..<total
    withAnimation {
        proxy.scrollTo("bottom-anchor", anchor: .bottom)
    }
}

Virtualization: Render Only the Visible Range

When the event count exceeds a few hundred, rendering everything causes LazyVStack to create a large number of views, leading to dropped frames during scrolling. SwiftWork uses visibleRange + renderBuffer for virtualization — only rendering events within approximately ±20 of the visible area.

@MainActor
final class TimelineVirtualizationManager {
    let renderBuffer = 20

    func eventsToRender(visibleRange: Range<Int>, allEvents: [AgentEvent]) -> [AgentEvent] {
        guard !allEvents.isEmpty else { return [] }
        let lower = max(0, visibleRange.lowerBound - renderBuffer)
        let upper = min(allEvents.count, visibleRange.upperBound + renderBuffer)
        guard lower < upper else { return [] }
        return Array(allEvents[lower..<upper])
    }
}

What gets passed to ForEach is not agentBridge.events, but virtualizedEvents — a subset trimmed by virtualization:

private var virtualizedEvents: [AgentEvent] {
    let allEvents = agentBridge.events
    if allEvents.isEmpty { return [] }
    if visibleRange.isEmpty {
        let upper = allEvents.count
        let lower = max(0, upper - 50)
        return virtualizationManager.eventsToRender(visibleRange: lower..<upper, allEvents: allEvents)
    }
    return virtualizationManager.eventsToRender(visibleRange: visibleRange, allEvents: allEvents)
}

Clipped regions are replaced with spacers to maintain accurate scrollbar positioning:

private var topPlaceholder: some View {
    let upper = max(0, visibleRange.lowerBound - virtualizationManager.renderBuffer)
    return Group {
        if upper > 0 && !visibleRange.isEmpty {
            Spacer().frame(height: CGFloat(upper) * estimatedRowHeight)
        }
    }
}

estimatedRowHeight is set to 80pt — an empirical value around which most event views fall. It does not need to be exact; it just needs to keep the scrollbar position roughly correct.

When visibleRange Gets Updated

visibleRange is updated at several key moments:

Initial load (.task(id: agentBridge.events.first?.id)): set to the last 50 events
New event arrives (.onChange(of: events.count)): if in followLatest mode, the sliding window keeps the latest 50 events
Return to bottom: reset to the latest 50 events

Currently, dynamic visibleRange updates during scrolling are not implemented — when the user scrolls up to browse a large number of historical events, visibleRange does not follow the scroll position. This is a known limitation that could be addressed in the future using onAppear/onDisappear callbacks or ScrollView offset monitoring.

Initial Scroll: Fixing First-Load Flash

When the event list first loads, SwiftUI's ScrollView starts rendering from the top by default. If a session has hundreds of events, the user first sees the top events, then a flash as it jumps to the bottom. This flash appears every time the user switches sessions.

SwiftWork's solution: delay scrolling to the bottom by 150ms, waiting for LazyVStack to complete its first-screen render:

.task(id: agentBridge.events.first?.id) {
    hasCompletedInitialScroll = false
    guard !agentBridge.events.isEmpty else { return }
    scrollModeManager.scrollMode = .followLatest
    visibleRange = 0..<0
    try? await Task.sleep(for: .milliseconds(150))
    guard !Task.isCancelled else { return }
    let total = agentBridge.events.count
    let lower = max(0, total - 50)
    visibleRange = lower..<total
    withAnimation {
        proxy.scrollTo("bottom-anchor", anchor: .bottom)
    }
    hasCompletedInitialScroll = true
}

The hasCompletedInitialScroll flag controls subsequent scroll mode switching — before the initial scroll completes, onChange(of: scrollPositionId) does not trigger mode switching, avoiding interference.

Summary

TimelineView's design can be summarized as three subsystems:

Subsystem	Problem Solved	Implementation
Event dispatch	18 types to 8 views	`eventView(for:)` + ViewBuilder
Scroll control	Auto-follow vs manual browse	`ScrollModeManager` + `scrollPosition`
Virtualization	Render performance with many events	`visibleRange` + `renderBuffer` + placeholders

Event dispatch is pure view logic — selecting the corresponding view component based on event.type. Scroll control and virtualization are performance concerns unique to TimelineView, unrelated to the SDK integration layer.

The next post covers the Tool Card system — how the ToolRenderable protocol gives each tool its own renderer, and how ToolRendererRegistry enables adding new tool types without modifying the timeline code.

Deep Dive into SwiftWork Series:

Part 0: Building a macOS Agent Workbench with SwiftUI
Part 1: SDK Integration — Bridging AsyncStream to SwiftUI
Part 2: Event Timeline — Visualizing 18 Event Types
Part 3: Tool Card — An Extensible Tool Visualization System
Part 4: Data Layer and Services — SwiftData, State Restore, Markdown Rendering

GitHub: SwiftWork | Open Agent SDK

Deep Dive into SwiftWork (Part 1): SDK Integration — Bridging AsyncStream to SwiftUI

NEE — Sun, 03 May 2026 16:12:28 +0000

Post 0 painted the full picture: AsyncStream<SDKMessage> → AgentBridge → EventMapper → SwiftUI. This post breaks open the two middle layers: AgentBridge and EventMapper, to see how they transform the SDK's message stream into an event list that SwiftUI can consume directly.

Let's start with the conclusion: AgentBridge is the single most complex file in the entire app. It does five things at once: consume a Stream, map events, pair tool content, persist data, and manage memory. None of these are difficult on their own, but stacking all five requires handling quite a bit of state. This article walks through each one.

From SDK to AgentBridge: Where's the Interface

Recall the core interface the SDK provides (covered in Post 1):

// SDK 的 Agent.stream() 返回 AsyncStream<SDKMessage>
let agent = createAgent(options: ...)
for await message in agent.stream("hello") {
    switch message {
    case .assistant(let data): ...
    case .toolUse(let data): ...
    case .toolResult(let data): ...
    // 18 种类型
    }
}

The SDK gives you an AsyncStream<SDKMessage> — an asynchronous event stream. SwiftUI needs a [AgentEvent] — an array that can be rendered on the main thread. AgentBridge is the bridge between the two.

Its core state is just a few properties:

@MainActor
@Observable
final class AgentBridge {
    var events: [AgentEvent] = []         // SwiftUI 消费的事件数组
    var isRunning = false                  // Agent 是否在执行
    var streamingText: String = ""         // 流式文本的累积缓冲区
    var toolContentMap: [String: ToolContent] = [:]  // 工具内容配对
    var errorMessage: String?              // 错误信息

    @ObservationIgnored private var agent: Agent?
    @ObservationIgnored private var currentTask: Task<Void, Never>?
    // ...
}

@MainActor ensures all state is accessed on the main thread. @Observable lets SwiftUI automatically track changes. @ObservationIgnored marks agent and currentTask as implementation details that shouldn't trigger UI updates — they're not UI state.

sendMessage: The Complete Lifecycle of a Message

The user types in the input bar and presses Enter. InputBarView calls agentBridge.sendMessage(text). Here's what happens next:

func sendMessage(_ text: String) {
    guard let agent, !text.isEmpty else { return }

    if isRunning { cancelExecution() }  // 如果正在跑，先停掉

    // 1. 用户消息立即追加到事件列表
    let userEvent = AgentEvent(type: .userMessage, content: text, timestamp: .now)
    appendAndPersist(userEvent)

    errorMessage = nil
    isRunning = true

    // 2. 递增 generation 计数器（用于检测过期的 cancel）
    activeTaskGeneration &+= 1
    let myGeneration = activeTaskGeneration

    // 3. 在后台 Task 中消费 stream
    currentTask = Task { [weak self] in
        guard let self else { return }
        var receivedResult = false
        let stream = agent.stream(text)
        for await message in stream {
            guard !Task.isCancelled else { break }
            if case .userMessage = message { continue }

            let event = EventMapper.map(message)

            // 流式文本走单独的缓冲区，不进 events 数组
            if event.type == .partialMessage {
                self.streamingText += event.content
                continue
            }
            if event.type == .assistant {
                self.streamingText = ""
            }
            if event.type == .result {
                receivedResult = true
                self.onResult?(event.content)
            }
            self.appendAndPersist(event)
        }
        // 流结束但没收到 result → 异常终止
        if !Task.isCancelled && !receivedResult {
            self.appendAndPersist(AgentEvent(
                type: .system,
                content: "Agent 流异常结束，未收到完整响应。",
                metadata: ["isError": true],
                timestamp: .now
            ))
        }
        self.finalizeToolContentMap()
        if self.activeTaskGeneration == myGeneration {
            self.currentTask = nil
        }
        self.isRunning = false
    }
}

Several design decisions worth noting:

The user message doesn't wait for the Stream. The user message is appended directly to events without waiting for the SDK's AsyncStream to return .userMessage. This lets the UI display user input immediately, with no network round-trip. The .userMessage received from the stream is skipped with continue.

Streaming text has a separate buffer. partialMessage events don't go into the events array; instead, they accumulate in streamingText. When a complete .assistant event arrives, streamingText is cleared. This way, SwiftUI's TimelineView can use a separate StreamingTextView to render in-progress text, while ForEach(events) doesn't need to constantly insert and delete items.

The generation counter prevents cancel race conditions. activeTaskGeneration is a monotonically incrementing counter. Each sendMessage call increments it and records its own generation. When the stream ends, it checks if self.activeTaskGeneration == myGeneration — only clearing currentTask when the current generation matches. This prevents cancel races when the user rapidly sends messages in succession — a previous stream's cleanup won't wipe the reference to the new Task.

EventMapper: Pure Function Mapping for 18 Message Types

EventMapper does one pure thing: SDKMessage → AgentEvent. No side effects, no state.

struct EventMapper {
    static func map(_ message: SDKMessage) -> AgentEvent {
        switch message {
        case .partialMessage(let data):
            return AgentEvent(type: .partialMessage, content: data.text, timestamp: .now)

        case .assistant(let data):
            return AgentEvent(type: .assistant, content: data.text,
                metadata: ["model": data.model, "stopReason": data.stopReason],
                timestamp: .now)

        case .toolUse(let data):
            return AgentEvent(type: .toolUse, content: data.toolName,
                metadata: ["toolName": data.toolName, "toolUseId": data.toolUseId,
                           "input": data.input],
                timestamp: .now)

        case .toolResult(let data):
            return AgentEvent(type: .toolResult, content: data.content,
                metadata: ["toolUseId": data.toolUseId, "isError": data.isError],
                timestamp: .now)

        case .toolProgress(let data):
            return AgentEvent(type: .toolProgress, content: data.toolName,
                metadata: ["toolUseId": data.toolUseId, "toolName": data.toolName,
                           "elapsedTimeSeconds": data.elapsedTimeSeconds ?? 0],
                timestamp: .now)

        case .result(let data):
            return AgentEvent(type: .result, content: data.text,
                metadata: ["subtype": data.subtype.rawValue, "numTurns": data.numTurns,
                           "durationMs": data.durationMs, "totalCostUsd": data.totalCostUsd],
                timestamp: .now)

        case .system(let data):
            return AgentEvent(type: .system, content: data.message,
                metadata: ["subtype": data.subtype.rawValue], timestamp: .now)

        // hook、task、auth 等消息全部映射为 system 类型
        case .hookStarted, .hookProgress, .hookResponse,
             .taskStarted, .taskProgress,
             .authStatus, .filesPersisted,
             .localCommandOutput, .promptSuggestion, .toolUseSummary:
            return AgentEvent(type: .system, content: extractContent(from: message),
                metadata: extractMetadata(from: message), timestamp: .now)

        case .userMessage(let data):
            return AgentEvent(type: .userMessage, content: data.message, timestamp: .now)
        }
    }
}

Mapping strategy:

One-to-one mapping: assistant, toolUse, toolResult, toolProgress, result, userMessage each map to their own AgentEventType
Merged mapping: hookStarted/hookProgress/hookResponse, taskStarted/taskProgress, authStatus, filesPersisted, and other SDK messages — 10 types total — all map to .system, differentiated by their metadata
Data extraction: Data fields from SDK messages are extracted into the metadata dictionary as needed; UI views read them by key

Why use metadata: [String: any Sendable] instead of defining a separate struct for each event type? Because metadata is a flexible dictionary — when a new event type is added, you only need to add a case in EventMapper, without defining a new model type. The tradeoff is reduced type safety: values require as? casting at read time. For the UI layer, this tradeoff is reasonable — event data is only read during rendering and doesn't need compile-time type checking.

ToolContent Pairing: Merging Three Events into One Card

An SDK tool call goes through three stages: toolUse (start) → toolProgress (progress updates) → toolResult (completion). These are three separate SDKMessage instances, but the UI needs to display them as a single tool card — showing the tool name, input parameters, execution progress, and output result.

That's what toolContentMap is for. It uses toolUseId as the key, merging events from all three stages into a single ToolContent:

// AgentBridge+ToolContentMap.swift
func processToolContentMap(for event: AgentEvent) {
    switch event.type {
    case .toolUse:
        let content = ToolContent.fromToolUseEvent(event)
        toolContentMap[content.toolUseId] = content

    case .toolProgress:
        let toolUseId = event.metadata["toolUseId"] as? String ?? ""
        if let existing = toolContentMap[toolUseId] {
            toolContentMap[toolUseId] = existing.applyingProgress(event)
        }

    case .toolResult:
        let resultContent = ToolContent.fromToolResultEvent(event)
        let toolUseId = resultContent.toolUseId
        if let existing = toolContentMap[toolUseId] {
            toolContentMap[toolUseId] = ToolContent(
                toolName: existing.toolName,
                toolUseId: existing.toolUseId,
                input: existing.input,
                output: resultContent.output,
                isError: resultContent.isError,
                status: resultContent.status,
                elapsedTimeSeconds: existing.elapsedTimeSeconds
            )
        }

    default:
        break
    }
}

The pairing process:

Receive toolUse → create a ToolContent with status .pending
Receive toolProgress → update the existing entry, change status to .running, record elapsed time
Receive toolResult → merge output and error status, change status to .completed or .failed

ToolContent is a struct; each update creates a new copy. AgentBridge's toolContentMap is an @Observable-tracked property, so every assignment triggers a SwiftUI update. This means tool cards can display progress changes in real time.

There's also a finalizeToolContentMap method — called when the stream ends — that marks any tools still in .pending or .running status as .completed. This prevents the UI from showing a permanently spinning progress bar when a stream terminates abnormally.

Event Persistence: The EventStore Protocol

Every event goes through appendAndPersist, which updates both the in-memory array and the database:

private func appendAndPersist(_ event: AgentEvent) {
    events.append(event)
    processToolContentMap(for: event)

    guard event.type != .partialMessage,
          let eventStore, let currentSession else { return }

    totalPersistedEvents += 1
    try eventStore.persist(event, session: currentSession, order: eventOrder)
    eventOrder += 1

    trimOldEvents()
}

Persistence is abstracted through the EventStoring protocol:

@MainActor
protocol EventStoring {
    func persist(_ event: AgentEvent, session: Session, order: Int) throws
    func fetchEvents(for sessionID: UUID) throws -> [AgentEvent]
    func fetchEvents(for sessionID: UUID, offset: Int, limit: Int) throws -> [AgentEvent]
    func totalEventCount(for sessionID: UUID) throws -> Int
}

There's currently one implementation: SwiftDataEventStore, which uses SwiftData's ModelContext for storage. Serialization is hand-written JSON — EventSerializer converts AgentEvent into a [String: Any] dictionary and then compresses it into Data:

// SwiftData 的 Event 模型
@Model
final class Event {
    @Attribute(.unique) var id: UUID
    var sessionID: UUID
    var eventType: String
    var rawData: Data        // JSON 序列化的 AgentEvent
    var timestamp: Date
    var order: Int
    var session: Session?
}

Why stuff metadata into rawData instead of splitting it into separate SwiftData fields? Because metadata content varies by event type — toolUse has toolName/toolUseId/input, while result has numTurns/durationMs/totalCostUsd. Splitting into separate fields would result in many empty columns and require Schema changes every time a new event type is added. Storing it as a JSON blob and deserializing on read is more flexible.

The write timing for persistence is once per event. For a typical Agent execution (which may produce 50–100 events), this means 50–100 SwiftData writes. In practice, there are no performance issues — SwiftData caches in memory and flushes to disk in batches. If event volume grows significantly in the future, this could be changed to batch writes.

Memory Management: Sliding Window + Pagination

A complex Agent execution can produce thousands of events. Keeping them all in memory isn't feasible. AgentBridge uses a two-tier strategy:

In-Memory Sliding Window

private let maxInMemory = 500

func trimOldEvents() {
    guard events.count > maxInMemory else { return }
    let removeCount = events.count - maxInMemory
    let removed = Array(events.prefix(removeCount))
    events.removeFirst(removeCount)
    trimmedEventCount += removeCount

    for event in removed {
        if event.type == .toolUse {
            let toolUseId = event.metadata["toolUseId"] as? String ?? ""
            toolContentMap.removeValue(forKey: toolUseId)
        }
    }
}

The in-memory array keeps at most 500 events. Anything beyond that is removed from the head, and corresponding entries in toolContentMap are cleaned up. trimmedEventCount tracks how many events have been removed, used for offset calculations during paginated queries.

Pagination on Load

When switching sessions, loadEvents determines the loading strategy based on total count:

func loadEvents(for session: Session) {
    clearEvents()
    currentSession = session
    guard let eventStore else { return }

    let total = try eventStore.totalEventCount(for: session.id)
    totalPersistedEvents = total

    if total > 1000 {
        // 大会话：只加载第一页
        let firstPage = try eventStore.fetchEvents(for: session.id, offset: 0, limit: 50)
        events = firstPage
        eventOrder = total
    } else {
        // 小会话：全部加载
        let persisted = try eventStore.fetchEvents(for: session.id)
        events = persisted
        eventOrder = persisted.count
    }
    rebuildToolContentMap()
}

When the user scrolls up, loadMoreEvents appends events by page:

func loadMoreEvents() {
    guard let eventStore, let currentSession else { return }
    let offset = trimmedEventCount + events.count
    guard offset < totalPersistedEvents else { return }

    let remaining = totalPersistedEvents - offset
    let limit = min(pageSize, remaining)
    let nextPage = try eventStore.fetchEvents(for: currentSession.id, offset: offset, limit: limit)
    events.append(contentsOf: nextPage)
    rebuildToolContentMap()
}

hasMoreEvents is a computed property that SwiftUI can use to show a "load more" button:

var hasMoreEvents: Bool {
    totalPersistedEvents > trimmedEventCount + events.count
}

Permission System: User Approval Before Agent Tool Calls

The SDK's permissionMode: .default prompts the user for permission before executing a tool. AgentBridge integrates this mechanism through the setCanUseTool callback:

private func setupPermissionCallback() {
    agent?.setCanUseTool { [weak self] tool, input, _ in
        guard let self else { return .allow() }
        return await self.handlePermission(tool: tool, input: input)
    }
}

PermissionHandler first checks existing permission rules (tools the user previously selected "always allow" for). If a rule matches, it allows the call immediately. If no rule matches, it presents a native SwiftUI sheet for user approval:

var pendingPermissionRequest: PendingPermissionRequest?

PendingPermissionRequest internally uses a CheckedContinuation to suspend async execution, resuming after the user taps "Allow Once" / "Always Allow" / "Deny":

private func presentPermissionDialog(...) async -> CanUseToolResult {
    let request = PendingPermissionRequest(...)
    self.pendingPermissionRequest = request
    let dialogResult = await request.waitForResult()  // 挂起，等 UI 操作
    self.pendingPermissionRequest = nil

    switch dialogResult {
    case .allowOnce:   // 本次允许
    case .alwaysAllow:  // 写入持久规则
    case .deny:         // 拒绝
    }
}

This design bridges the SDK's synchronous permission check (canUseTool callback) with SwiftUI's asynchronous UI interaction (user tapping a button), powered by Swift's async/await + CheckedContinuation.

Configuration and Lifecycle

AgentBridge's configuration entry point is configure:

func configure(apiKey: String, baseURL: String?, model: String, workspacePath: String?) {
    let options = AgentOptions(
        apiKey: apiKey,
        model: model,
        baseURL: baseURL,
        maxTurns: 10,
        permissionMode: .default,
        cwd: workspacePath,
        tools: getAllBaseTools(tier: .core)
    )
    self.agent = createAgent(options: options)
    setupPermissionCallback()
}

Each time the user switches sessions, WorkspaceView calls configure again (because different sessions may have different workspace paths):

// WorkspaceView.swift
.onChange(of: session.id) { _, _ in
    agentBridge.clearEvents()
    configureAgent()        // 重新创建 Agent
    loadPersistedEvents()   // 加载该会话的历史事件
    setupTitleGeneration()  // 设置自动标题
}

clearEvents does a full reset — clears the event array, cancels any running Task, and resets pagination state:

func clearEvents() {
    events = []
    streamingText = ""
    errorMessage = nil
    isRunning = false
    toolContentMap = [:]
    currentTask?.cancel()
    currentTask = nil
    eventOrder = 0
    totalPersistedEvents = 0
    trimmedEventCount = 0
}

Summary

AgentBridge carries five responsibilities:

Responsibility	Implementation
Consume Stream	`for await` loop inside a `Task`, `Task.cancel()` on cancel
Map Events	`EventMapper.map()` pure function
Pair Tool Content	`toolContentMap: [String: ToolContent]`
Persist Data	`EventStoring` protocol + SwiftData implementation
Manage Memory	500-event sliding window + on-demand paginated loading

The entire pipeline runs on @MainActor, and SwiftUI responds to changes automatically through @Observable. The view layer doesn't need to know about the Stream or SDK types — it only deals with AgentEvent and ToolContent.

The next post looks at the event timeline — how TimelineView renders 18 event types, handles virtualization, and manages streaming text and scroll behavior.

Deep Dive into SwiftWork Series:

Part 0: Building a macOS Agent Workbench with SwiftUI
Part 1: SDK Integration — Bridging AsyncStream to SwiftUI
Part 2: Event Timeline — Visualizing 18 Event Types
Part 3: Tool Card — An Extensible Tool Visualization System
Part 4: Data Layer and Services — SwiftData, State Restore, Markdown Rendering

GitHub: SwiftWork | Open Agent SDK

Building a macOS Agent Workbench with SwiftUI

NEE — Sun, 03 May 2026 16:12:24 +0000

Across the previous seven articles plus a bonus chapter, we thoroughly explored the inner workings of Open Agent SDK — Agent Loop, the tool system, MCP integration, multi-Agent collaboration, conversation persistence, and multi-LLM support. The bonus chapter even embedded the SDK into a macOS native app, Motive, and ran it live.

But Motive was just a backend-swap experiment. The real question is: once you have the SDK, how do you build a complete Agent application from scratch? The SDK gives you the Agent's "brain" — but users don't see the Agent Loop; they see an interface. When the Agent is calling tools, reading files, and executing commands, users need to know what it's doing, how things are progressing, and what the results are.

That's the problem SwiftWork set out to solve — a macOS-native Agent visual workbench.

What is SwiftWork

SwiftWork is a macOS-native AI Agent desktop application. Its purpose in one sentence: let users see what the Agent is doing.

Specifically:

Users type a prompt in the input box
The Agent runs the Agent Loop in the background (calling tools, reading files, executing commands)
Every step is displayed in real time on a timeline — text output, tool calls, execution results, error messages, all visualized

This is not a CLI tool in a terminal, nor a web application. It's a native macOS app written in SwiftUI, using @observable for state management, SwiftData for data persistence, and Apple's native rendering pipeline for Markdown and code highlighting.

Why Build SwiftWork

There are two motivations behind SwiftWork.

First, the SDK needs a "showcase app." The SDK's 31 sample projects cover a wide range of use cases — streaming output, custom tools, MCP integration — but they are all command-line tools. The SDK's capabilities need a GUI to be fully demonstrated, especially things that CLI tools do poorly: tool call visualization, real-time event stream rendering, and so on.

Second, Agent application visualization is an underrated problem. Current Agent applications (including Claude Code itself) run in a terminal, where users see a scrolling stream of text. But when an Agent executes a complex task, it might call dozens of tools, read and write multiple files, and execute multiple commands. Linear terminal output makes it hard for users to understand the overall progress. SwiftWork attempts to address this with an event timeline and tool cards.

Architecture Overview

SwiftWork adopts an event-driven architecture. The entire data flow is a one-way pipeline:

SDK Agent Loop
  │
  │  AsyncStream<SDKMessage>
  ▼
AgentBridge (@Observable)
  │
  │  EventMapper.map() → AgentEvent
  ▼
AgentBridge.events: [AgentEvent]
  │
  │  SwiftUI auto-responds to @Observable changes
  ▼
TimelineView → EventViews

Four roles:

Component	Responsibility
Agent Loop	SDK provides, runs Agent inference loop, produces SDKMessage stream
AgentBridge	Consumes AsyncStream, maps to AgentEvent, manages lifecycle
EventMapper	Pure function, SDKMessage → AgentEvent type mapping
TimelineView	SwiftUI view, consumes AgentEvent array, renders timeline

Core design decision: views never directly touch SDK types. AgentEvent is SwiftWork's own UI model, completely decoupled from the SDK's SDKMessage. Views only know about AgentEvent — they neither know nor care which SDK version the events come from.

Core Data Flow

Here's what happens across the entire pipeline when a user sends a message:

1. User Input → Agent Starts

// AgentBridge.swift
func sendMessage(_ text: String) {
    // User message appended directly to event list
    let userEvent = AgentEvent(type: .userMessage, content: text, timestamp: .now)
    appendAndPersist(userEvent)

    isRunning = true

    // Consume stream in background Task
    currentTask = Task { [weak self] in
        let stream = agent.stream(text)
        for await message in stream {
            let event = EventMapper.map(message)
            self.appendAndPersist(event)
        }
        self.isRunning = false
    }
}

The user message is appended to the event list first (displayed immediately), then a Task is launched to consume the SDK's AsyncStream.

2. SDKMessage → AgentEvent

EventMapper is a pure function that maps the SDK's 18 SDKMessage types into SwiftWork's AgentEventType:

// EventMapper.swift
static func map(_ message: SDKMessage) -> AgentEvent {
    switch message {
    case .assistant(let data):
        return AgentEvent(type: .assistant, content: data.text,
            metadata: ["model": data.model, "stopReason": data.stopReason], timestamp: .now)
    case .toolUse(let data):
        return AgentEvent(type: .toolUse, content: data.toolName,
            metadata: ["toolName": data.toolName, "toolUseId": data.toolUseId, "input": data.input],
            timestamp: .now)
    case .toolResult(let data):
        return AgentEvent(type: .toolResult, content: data.content,
            metadata: ["toolUseId": data.toolUseId, "isError": data.isError], timestamp: .now)
    // ... 18 message types
    }
}

Why have this mapping layer? Because the SDK's types are designed for the Agent runtime and contain many details the UI doesn't need. AgentEvent retains only the fields needed for UI rendering: type, content, metadata, and timestamp. Views don't need to know the enum definition of SDKMessage — they only need to handle AgentEventType.

3. Event Append + Persistence

Every event goes through appendAndPersist, which simultaneously updates the in-memory array and the SwiftData database:

private func appendAndPersist(_ event: AgentEvent) {
    events.append(event)
    processToolContentMap(for: event)

    guard event.type != .partialMessage,
          let eventStore, let currentSession else { return }

    try eventStore.persist(event, session: currentSession, order: eventOrder)
    eventOrder += 1

    trimOldEvents()
}

Note that partialMessage is not persisted — it's an intermediate fragment of streaming text. Once accumulated, a complete .assistant event is generated.

4. SwiftUI Auto-Rendering

AgentBridge is marked with @observable. When the events array changes, TimelineView automatically re-renders:

// TimelineView.swift
ForEach(virtualizedEvents) { event in
    eventView(for: event)
}

eventView dispatches to different view components based on event.type — UserMessageView, AssistantMessageView, ToolCardView, SystemEventView, and so on.

Project Structure

SwiftWork/
├── App/
│   ├── SwiftWorkApp.swift            # @main entry, registers SwiftData models
│   └── ContentView.swift             # NavigationSplitView root view
├── Models/
│   ├── UI/                           # UI model layer
│   │   ├── AgentEvent.swift          # Event model (SwiftUI rendering)
│   │   ├── AgentEventType.swift      # 18 event type enum
│   │   ├── ToolContent.swift         # Tool content (pairs toolUse + toolResult)
│   │   ├── PermissionDecision.swift  # Permission decision
│   │   └── AppError.swift            # Error model
│   └── SwiftData/                    # Persistence model layer
│       ├── Session.swift             # Session
│       ├── Event.swift               # Persisted event
│       ├── AppConfiguration.swift    # App configuration
│       └── PermissionRule.swift      # Permission rule
├── ViewModels/
│   ├── SessionViewModel.swift        # Session CRUD
│   └── SettingsViewModel.swift       # Settings management
├── Views/
│   ├── Sidebar/                      # Session list
│   ├── Workspace/
│   │   ├── Timeline/
│   │   │   ├── TimelineView.swift    # Timeline main view + virtualization
│   │   │   ├── EventViews/           # Per-event-type views + ToolCardView
│   │   │   │   ├── ToolRenderers/    # 5 built-in tool renderers
│   │   │   │   ├── StreamingTextView.swift
│   │   │   │   ├── MarkdownContentView.swift
│   │   │   │   └── ...
│   │   │   └── Inspector/            # Event detail panel
│   │   └── InputBar/                 # Message input bar
│   ├── Settings/                     # Settings view
│   ├── Onboarding/                   # First-launch onboarding
│   └── Permission/                   # Permission approval dialog
├── SDKIntegration/
│   ├── AgentBridge.swift             # SDK ↔ ViewModel bridge
│   ├── AgentBridge+ToolContentMap.swift  # Tool content pairing logic
│   ├── EventMapper.swift             # SDKMessage → AgentEvent
│   ├── ToolRenderable.swift          # Tool rendering protocol
│   └── ToolRendererRegistry.swift    # Tool renderer registry
├── Services/
│   ├── CodeHighlighter.swift         # Splash code highlighting
│   ├── MarkdownRenderer.swift        # swift-markdown rendering
│   ├── KeychainManager.swift         # API Key secure storage
│   ├── EventStore.swift              # Event persistence interface
│   ├── AppStateManager.swift         # App state save/restore
│   └── TitleGenerator.swift          # Auto session title generation
└── Utils/
    └── Extensions/                   # Color, date formatting utilities

Key structural layers:

Models/UI/ and Models/SwiftData/ are two independent model layers. UI models (AgentEvent) are for SwiftUI rendering, SwiftData models (Event) are for persistence. There's conversion logic between them.
SDKIntegration/ is the bridge layer between SDK and UI. Views and ViewModels don't directly import OpenAgentSDK.
Views/ is organized by feature, with one view file per event type and a dedicated subdirectory for tool renderers.

Technology Choices

Component	Choice	Reason
Language	Swift 6.1 strict concurrency	Agent SDK requires it, Sendable ensures thread safety
UI	SwiftUI + @observable	macOS 14+ support, works well with Swift concurrency
Persistence	SwiftData	Deep SwiftUI integration, simpler than Core Data
Markdown	swift-markdown (Apple)	Native Apple library, CommonMark compatible
Code Highlighting	Splash (John Sundell)	Lightweight, supports Swift/Python/JS/Bash
Auto-update	Sparkle 2.x	Standard macOS app update solution
Agent SDK	Open Agent SDK	Built it ourselves, of course we use it

Series Preview

This article gave you the big picture. The following articles will unpack each subsystem:

Part 1: SDK Integration — how AgentBridge consumes AsyncStream, maps events, and manages lifecycle
Part 2: Event Timeline — visualizing 18 event types, streaming text, virtualization
Part 3: Tool Card — ToolRenderable protocol and extensible tool renderers
Part 4: Data Layer and Services — SwiftData persistence, state restore, Markdown rendering, code highlighting

Links:

SwiftWork: terryso/SwiftWork
Open Agent SDK: terryso/open-agent-sdk-swift

Deep Dive into SwiftWork Series:

Part 0: Building a macOS Agent Workbench with SwiftUI
Part 1: SDK Integration — Bridging AsyncStream to SwiftUI
Part 2: Event Timeline — Visualizing 18 Event Types
Part 3: Tool Card — An Extensible Tool Visualization System
Part 4: Data Layer and Services — SwiftData, State Restore, Markdown Rendering

GitHub: SwiftWork | Open Agent SDK

Deep Dive into Open Agent SDK (Part 6): Multi-LLM Providers and Runtime Controls

NEE — Mon, 27 Apr 2026 03:55:20 +0000

An Agent shouldn't be locked to a single LLM provider. Different tasks suit different models — simple questions use cheap models, complex reasoning uses expensive ones, and some scenarios even require local models. Runtime needs change too: users might want deeper thinking mid-session, discover the budget is running low and need to downgrade, or switch to a local model to save money.

Open Agent SDK's approach: define a unified LLMClient protocol, with Anthropic and OpenAI-compatible providers each having an implementation. Internally, the Agent uses Anthropic format throughout. Switching providers requires changing only one configuration parameter, and models can be switched dynamically at runtime with adjustable thinking depth and budget control.

This article analyzes the SDK's multi-provider adaptation mechanism and runtime control capabilities.

1. LLMClient Protocol — Unified Interface

First, the protocol definition:

public protocol LLMClient: Sendable {
    nonisolated func sendMessage(
        model: String,
        messages: [[String: Any]],
        maxTokens: Int,
        system: String?,
        tools: [[String: Any]]?,
        toolChoice: [String: Any]?,
        thinking: [String: Any]?,
        temperature: Double?
    ) async throws -> [String: Any]

    nonisolated func streamMessage(
        model: String,
        messages: [[String: Any]],
        maxTokens: Int,
        system: String?,
        tools: [[String: Any]]?,
        toolChoice: [String: Any]?,
        thinking: [String: Any]?,
        temperature: Double?
    ) async throws -> AsyncThrowingStream<SSEEvent, Error>
}

Two core methods: one blocking, one streaming. The parameter list covers all capabilities of mainstream LLM APIs: model selection, message history, token limit, system prompt, tool definitions, tool choice strategy, thinking configuration, and temperature.

Key design decision: return values are always in Anthropic format dictionaries. Whether the underlying API is Anthropic native or OpenAI-compatible, the Agent internally receives the same structure — content arrays with {"type": "text", "text": "..."} or {"type": "tool_use", "name": "...", "input": {...}}, and stop_reason as end_turn/tool_use/max_tokens. This means Agent Loop processing logic doesn't need to care about which API is underneath.

Streaming returns use AsyncThrowingStream<SSEEvent, Error>, where SSEEvent is an enum:

public enum SSEEvent: @unchecked Sendable {
    case messageStart(message: [String: Any])
    case contentBlockStart(index: Int, contentBlock: [String: Any])
    case contentBlockDelta(index: Int, delta: [String: Any])
    case contentBlockStop(index: Int)
    case messageDelta(delta: [String: Any], usage: [String: Any])
    case messageStop
    case ping
    case error(data: [String: Any])
}

7 event types covering all streaming response events from the Anthropic Messages API. OpenAI-compatible layer streaming output is converted to the same SSEEvent sequence.

2. AnthropicClient — Native Claude API

AnthropicClient is the Anthropic native implementation of LLMClient, using actor for concurrency safety:

public actor AnthropicClient: LLMClient {
    private let apiKey: String
    private let baseURL: URL      // Default https://api.anthropic.com
    private let urlSession: URLSession

    public init(apiKey: String, baseURL: String? = nil, urlSession: URLSession? = nil) {
        self.apiKey = apiKey
        self.baseURL = URL(string: baseURL ?? "https://api.anthropic.com")!
        self.urlSession = urlSession ?? URLSession.shared
    }
}

Requests are POST to /v1/messages with x-api-key and anthropic-version headers:

private nonisolated func buildRequest(body: [String: Any]) throws -> URLRequest {
    var request = URLRequest(url: URL(string: baseURL.absoluteString + "/v1/messages")!)
    request.httpMethod = "POST"
    request.timeoutInterval = 300
    request.setValue(apiKey, forHTTPHeaderField: "x-api-key")
    request.setValue("2023-06-01", forHTTPHeaderField: "anthropic-version")
    request.setValue("application/json", forHTTPHeaderField: "content-type")
    request.httpBody = try JSONSerialization.data(withJSONObject: body, options: [])
    return request
}

Since it uses the Anthropic native API, sendMessage request and response bodies don't need format conversion — request parameters are assembled directly as dictionaries, responses are parsed directly. Streaming mode directly parses Anthropic SSE text.

A security detail: all error messages replace the API Key with *** to prevent key leakage into logs:

let safeMessage = errorMessage.replacingOccurrences(of: apiKey, with: "***")

AnthropicClient directly supports Extended Thinking. When the Agent configures ThinkingConfig, the thinking parameter is passed through:

if let thinking {
    body["thinking"] = thinking
}

3. OpenAI-Compatible Layer — Adapting GLM/Ollama/OpenRouter etc.

OpenAIClient is the heavy lifter. It accepts Anthropic-format parameters, converts them to OpenAI Chat Completion API format, sends the request, then converts the OpenAI response back to Anthropic format. The Agent is completely unaware of the underlying OpenAI-compatible API.

public actor OpenAIClient: LLMClient {
    private let apiKey: String
    private let baseURL: URL      // Default https://api.openai.com/v1

    public init(apiKey: String, baseURL: String? = nil, urlSession: URLSession? = nil) {
        self.apiKey = apiKey
        self.baseURL = URL(string: baseURL ?? "https://api.openai.com/v1")!
        self.urlSession = urlSession ?? URLSession.shared
    }
}

Requests go to /chat/completions with Bearer token authentication — standard practice for OpenAI-compatible APIs. Any provider supporting the /v1/chat/completions endpoint works with this client.

Message Format Conversion

Several key differences between Anthropic and OpenAI message formats must be handled during conversion:

1. System Message Position

Anthropic passes the system prompt as a top-level parameter; OpenAI includes it as the first role: "system" message:

if let system {
    result.append(["role": "system", "content": system])
}

2. Tool Result Representation

Anthropic packages multiple tool_results in one role: "user" message's content array; OpenAI requires each tool result as a separate role: "tool" message:

let toolResults = blocks.filter { $0["type"] as? String == "tool_result" }
if !toolResults.isEmpty {
    return toolResults.map { block in
        [
            "role": "tool",
            "tool_call_id": block["tool_use_id"] as? String ?? "",
            "content": block["content"] ?? "",
        ]
    }
}

3. Tool Use Representation

Anthropic uses type: "tool_use" blocks in the content array; OpenAI uses a tool_calls array at the message top level:

result["tool_calls"] = toolUseBlocks.enumerated().map { index, block in
    let inputDict = block["input"] as? [String: Any] ?? [:]
    let arguments = (try? JSONSerialization.data(withJSONObject: inputDict, options: []))
        .flatMap { String(data: $0, encoding: .utf8) } ?? "{}"
    return [
        "id": block["id"] as? String ?? "call_\(index)",
        "type": "function",
        "function": [
            "name": block["name"] as? String ?? "",
            "arguments": arguments,  // OpenAI requires JSON string, not dictionary
        ],
    ]
}

Note that OpenAI's arguments must be a JSON string, not a dictionary object — serialization is done here.

Response Format Conversion

OpenAI's response structure (choices[0].message) needs conversion to Anthropic format:

// stop_reason mapping
private static func mapStopReason(_ finishReason: String) -> String {
    switch finishReason {
    case "stop": return "end_turn"
    case "tool_calls": return "tool_use"
    case "length": return "max_tokens"
    default: return finishReason
    }
}

// usage mapping
usage = [
    "input_tokens": openAIUsage["prompt_tokens"] as? Int ?? 0,
    "output_tokens": openAIUsage["completion_tokens"] as? Int ?? 0,
]

Streaming Conversion

Streaming conversion is more complex. OpenAI's streaming format (data: {"choices":[{"delta":{...}}]}) must be converted chunk by chunk to Anthropic's SSEEvent sequence:

First chunk → messageStart
Text delta → contentBlockDelta(type: "text_delta")
Tool call start → contentBlockStart(type: "tool_use"), parameter delta → contentBlockDelta(type: "input_json_delta")
End → contentBlockStop + messageDelta + messageStop

The conversion function tracks how many content blocks are open, whether text blocks are closed, and which tool call blocks are still open to generate correct index values. A safety check ensures messageStop is always emitted, even if the original stream doesn't end normally.

Usage Examples

Connecting to different OpenAI-compatible providers only requires changing baseURL and model:

// DeepSeek
let agent = createAgent(options: AgentOptions(
    apiKey: "sk-...",
    model: "deepseek-chat",
    baseURL: "https://api.deepseek.com/v1",
    provider: .openai
))

// Ollama local
let localAgent = createAgent(options: AgentOptions(
    apiKey: "ollama",           // Ollama doesn't need a key, any value works
    model: "qwen3:8b",
    baseURL: "http://localhost:11434/v1",
    provider: .openai
))

// GLM
let glmAgent = createAgent(options: AgentOptions(
    apiKey: "xxx.glm-xxx",
    model: "glm-4-plus",
    baseURL: "https://open.bigmodel.cn/api/paas/v4",
    provider: .openai
))

4. Runtime Model Switching

The SDK supports dynamic model switching at runtime without recreating the Agent:

let agent = createAgent(options: AgentOptions(
    apiKey: apiKey,
    model: "claude-sonnet-4-6",
    fallbackModel: "claude-haiku-4-5"  // Used if primary model fails
))

// Use sonnet for a simple question first
let result1 = await agent.prompt("What is 2 + 3?")
print(result1.costBreakdown)
// [CostBreakdownEntry(model: "claude-sonnet-4-6", inputTokens: 45, outputTokens: 3, costUsd: 0.000180)]

// Switch to opus for reasoning-intensive question
try agent.switchModel("claude-opus-4-6")
let result2 = await agent.prompt("Explain the difference between structs and classes in Swift.")
print(result2.costBreakdown)
// [CostBreakdownEntry(model: "claude-opus-4-6", inputTokens: 52, outputTokens: 156, costUsd: 0.011970)]

switchModel() implementation:

public func switchModel(_ model: String) throws {
    let trimmed = model.trimmingCharacters(in: .whitespacesAndNewlines)
    guard !trimmed.isEmpty else {
        throw SDKError.invalidConfiguration("Model name cannot be empty")
    }
    let oldModel = self.model
    self.model = trimmed
    self.options.model = trimmed
    Logger.shared.info("Agent", "model_switch", data: ["from": oldModel, "to": trimmed])
}

No allowlist validation — whatever model name is passed gets used. Unsupported models will error at the API level. This design choice exists because OpenAI-compatible provider model names can't be exhaustively listed.

fallbackModel is configured in AgentOptions. When the primary model fails completely (retries exhausted), the SDK automatically retries with the fallback:

if let fallbackModel = self.options.fallbackModel, fallbackModel != self.model {
    let fallbackResponse = try await retryClient.sendMessage(
        model: fallbackModel,
        messages: retryMessages, ...
    )
    // Temporarily switch to fallback for cost tracking
    let originalModel = self.model
    self.model = fallbackModel
    // ... process response
}

Per-Model Cost Breakdown

CostBreakdownEntry records costs grouped by model name:

public struct CostBreakdownEntry: Sendable, Equatable {
    public let model: String
    public let inputTokens: Int
    public let outputTokens: Int
    public let costUsd: Double
}

If models are switched mid-query (or fallback triggered), QueryResult.costBreakdown contains multiple entries with per-model costs. Costs are calculated from built-in price tables:

public nonisolated(unsafe) var MODEL_PRICING: [String: ModelPricing] = [
    "claude-opus-4-6":   ModelPricing(input: 15.0 / 1_000_000, output: 75.0 / 1_000_000),
    "claude-sonnet-4-6": ModelPricing(input: 3.0 / 1_000_000, output: 15.0 / 1_000_000),
    "claude-haiku-4-5":  ModelPricing(input: 0.8 / 1_000_000, output: 4.0 / 1_000_000),
    // ...
]

Custom models can register pricing via registerModel(_:pricing:):

registerModel("glm-4-plus", pricing: ModelPricing(
    input: 0.1 / 1_000_000, output: 0.1 / 1_000_000
))

5. Thinking and Effort Configuration

ThinkingConfig

The SDK uses the ThinkingConfig enum to control LLM deep thinking:

public enum ThinkingConfig: Sendable, Equatable {
    case adaptive                  // Model decides whether to think
    case enabled(budgetTokens: Int) // Specify thinking token budget
    case disabled                  // Disable deep thinking
}

Three modes for different uses:

adaptive: Let the model judge — no thinking for simple questions, automatic thinking for complex ones. Most convenient for daily use.
enabled(budgetTokens:): Explicitly control thinking budget. For deep analysis, allocate 10,000 thinking tokens.
disabled: Turn off thinking entirely for maximum speed.

EffortLevel

EffortLevel is a higher-level abstraction mapping to specific thinking token budgets:

public enum EffortLevel: String, Sendable, CaseIterable {
    case low    // 1024 tokens
    case medium // 5120 tokens
    case high   // 10240 tokens
    case max    // 32768 tokens

    public var budgetTokens: Int {
        switch self {
        case .low: return 1024
        case .medium: return 5120
        case .high: return 10240
        case .max: return 32768
        }
    }
}

Set in AgentOptions:

let agent = createAgent(options: AgentOptions(
    apiKey: apiKey,
    model: "claude-sonnet-4-6",
    effort: .high  // 10240 thinking tokens
))

Runtime Dynamic Adjustment

setMaxThinkingTokens() adjusts the thinking budget between queries:

// Simple question, fewer thinking tokens
try agent.setMaxThinkingTokens(2048)
let r1 = await agent.prompt("Summarize this file.")

// Complex reasoning, increase budget
try agent.setMaxThinkingTokens(16000)
let r2 = await agent.prompt("Design a concurrent data structure for...")

// Disable thinking
try agent.setMaxThinkingTokens(nil)

Positive integer enables thinking with the specified budget; nil disables it. Zero or negative throws SDKError.invalidConfiguration.

ModelInfo describes each model's capabilities:

public struct ModelInfo: Sendable, Equatable {
    public let value: String
    public let displayName: String
    public let description: String
    public let supportsEffort: Bool
    public let supportedEffortLevels: [EffortLevel]?
    public let supportsAdaptiveThinking: Bool?
    public let supportsFastMode: Bool?
}

This lets UI layers dynamically show available options based on model capabilities.

6. Skills System

Skills are a special extension mechanism in the SDK — essentially "prompt templates with tool restrictions." A Skill defines a set of prompt instructions, an allowed tool subset, and an optional model override.

Skill Structure

public struct Skill: Sendable {
    public let name: String
    public let description: String
    public let aliases: [String]              // Aliases, e.g. ["ci"] for commit
    public let userInvocable: Bool            // Whether users can invoke via /command
    public let toolRestrictions: [ToolRestriction]?  // Restrict available tools, nil = all
    public let modelOverride: String?         // Override model during execution
    public let isAvailable: @Sendable () -> Bool     // Runtime availability check
    public let promptTemplate: String         // Prompt template content
    public let whenToUse: String?             // Tell LLM when to use this skill
    public let argumentHint: String?          // Argument hint, e.g. "[message]"
    public let baseDir: String?               // Absolute path to skill directory
    public let supportingFiles: [String]      // Supporting files (references, scripts, etc.)
}

5 Built-in Skills

The SDK predefines 5 common Skills accessible via the BuiltInSkills namespace:

Skill	Aliases	Allowed Tools	Function
`commit`	`ci`	bash, read, glob, grep	Analyze git diff, generate commit message
`review`	`review-pr`, `cr`	bash, read, glob, grep	Review code changes from 5 dimensions
`simplify`	—	bash, read, grep, glob	Review code for reuse, quality, efficiency
`debug`	`investigate`, `diagnose`	read, grep, glob, bash	Analyze errors, locate root cause
`test`	`run-tests`	bash, read, write, glob, grep	Generate and execute test cases

Each Skill restricts its tool scope. commit only allows bash, read, glob, grep — no file writing needed. debug is also read-only (read, grep, glob, bash), diagnosing without modifying. test is the only built-in Skill allowing write, since it creates test files.

test Skill also has a runtime availability check:

isAvailable: {
    let cwd = FileManager.default.currentDirectoryPath
    let testIndicators = [
        "Package.swift", "pytest.ini", "jest.config",
        "vitest.config", "Cargo.toml", "go.mod",
    ]
    for indicator in testIndicators {
        if FileManager.default.fileExists(atPath: cwd + "/" + indicator) {
            return true
        }
    }
    return false
}

The test Skill is only visible to users when a test framework configuration file is detected.

SkillRegistry

SkillRegistry is a thread-safe skill manager using DispatchQueue for concurrent access protection:

public final class SkillRegistry: @unchecked Sendable {
    private var skills: [String: Skill] = [:]
    private var orderedNames: [String] = []
    private var aliases: [String: String] = [:]
    private let queue = DispatchQueue(label: "com.openagentsdk.skillregistry")

    public func register(_ skill: Skill) { ... }
    public func find(_ name: String) -> Skill? { ... }   // Find by name or alias
    public var allSkills: [Skill] { ... }
    public var userInvocableSkills: [Skill] { ... }
}

Register, find, replace, and delete are all queue.sync-protected operations. Aliases automatically build mappings on registration — after registering BuiltInSkills.commit, registry.find("ci") also finds it.

SkillLoader: Filesystem Discovery

Skills don't all need code registration. SkillLoader can automatically discover skills from the filesystem — any directory containing a SKILL.md file is recognized as a skill package.

Scanning directories by priority from low to high:

~/.config/agents/skills      (lowest priority)
~/.agents/skills
~/.claude/skills
$PWD/.agents/skills
$PWD/.claude/skills           (highest priority)

Same-named skills discovered later override earlier ones (last-wins).

SKILL.md uses YAML frontmatter for metadata:

---
name: polyv-live-cli
description: Manage live streaming services
aliases: live, plv
allowed-tools: Bash, Read, Write, Glob
when-to-use: user asks about live streaming management
argument-hint: [action] [options]
---

# polyv-live-cli Skill

You are a live streaming management assistant...

The allowed-tools in frontmatter is parsed into ToolRestriction arrays, restricting which tools the skill can use during execution.

SkillLoader uses a "progressive loading" strategy: only loading the SKILL.md Markdown body as the prompt template. Supporting files (references, scripts, templates) only have their paths recorded without loading content. The Agent reads them on-demand via Read/Bash tools when needed.

let registry = SkillRegistry()
registry.register(BuiltInSkills.commit)
registry.register(BuiltInSkills.review)
// Discover custom skills from filesystem
let count = registry.registerDiscoveredSkills()
// Or specify directories
registry.registerDiscoveredSkills(from: ["/opt/custom-skills"])
// Or only register whitelisted skills
registry.registerDiscoveredSkills(skillNames: ["polyv-live-cli"])

ToolRestriction

ToolRestriction enum defines restrictable tools:

public enum ToolRestriction: String, Sendable, CaseIterable {
    case bash, read, write, edit, glob, grep
    case webFetch, webSearch, askUser, toolSearch
    case agent, sendMessage
    case taskCreate, taskList, taskUpdate, taskGet, taskStop, taskOutput
    case teamCreate, teamDelete
    case notebookEdit, skill
}

When a Skill sets toolRestrictions: [.bash, .read, .glob], the Agent can only use these three tools during execution. Other tool calls are intercepted.

Using Skills in an Agent

To make Skills available to an Agent, add SkillTool to the tools list:

var tools = getAllBaseTools(tier: .core)
tools.append(createSkillTool(registry: registry))

let agent = createAgent(options: AgentOptions(
    apiKey: apiKey,
    model: "claude-sonnet-4-6",
    permissionMode: .bypassPermissions,
    tools: tools
))

// Agent auto-discovers and invokes based on skill list in system prompt
let result = await agent.prompt("Use the commit skill to analyze current changes")

SkillRegistry.formatSkillsForPrompt() generates a skill list snippet injected into the system prompt, including each skill's name, description, and trigger conditions. The LLM sees this list and knows when to invoke which skill.

7. Other Runtime Controls

Budget Control

maxBudgetUsd sets the cost ceiling per query:

let agent = createAgent(options: AgentOptions(
    apiKey: apiKey,
    model: "claude-sonnet-4-6",
    maxBudgetUsd: 0.05  // Maximum 5 cents
))

Cumulative cost is checked after each turn:

if let budget = options.maxBudgetUsd, totalCostUsd > budget {
    status = .errorMaxBudgetUsd
    break
}

When the budget is exceeded, the loop exits immediately. Any text and token statistics already generated are preserved in QueryResult — you get a partial result, not a blank one.

Query Interruption

Two ways to interrupt an in-progress query:

// Method 1: Call interrupt()
agent.interrupt()

// Method 2: Cancel Task
let task = Task {
    await agent.prompt("Long running query...")
}
// Later
task.cancel()

interrupt() internally sets the _interrupted flag and cancels the stream task. The Agent Loop checks this flag at multiple checkpoints (loop entry, between read-only/mutation tools, inside SSE event loop, before/after tool execution), exiting immediately on detection.

Dynamic Permission Switching

Runtime permission mode and tool authorization callbacks can be switched:

// Switch permission mode
agent.setPermissionMode(.askForPermission)

// Set custom authorization callback (higher priority than permissionMode)
agent.setCanUseTool { toolName, input in
    if toolName == "Bash" {
        return .deny("Bash is disabled")
    }
    return .allow
}

// Revert to permissionMode control
agent.setCanUseTool(nil)

setCanUseTool callback takes priority over permissionMode. Calling setPermissionMode() clears any previously set callback.

Environment Variable Configuration

The SDK supports configuration via environment variables. Priority: code settings > environment variables > defaults.

Environment Variable	Corresponding Field	Default
`CODEANY_API_KEY`	`apiKey`	`nil`
`CODEANY_MODEL`	`model`	`claude-sonnet-4-6`
`CODEANY_BASE_URL`	`baseURL`	`nil` (use provider default)

Merged using SDKConfiguration.resolved():

// Code-set values take priority; unset values read from environment
let config = SDKConfiguration.resolved(overrides: SDKConfiguration(
    apiKey: "sk-...",           // Overrides CODEANY_API_KEY
    model: "claude-sonnet-4-6"  // Overrides CODEANY_MODEL
))

// Environment variables only
let envConfig = SDKConfiguration.fromEnvironment()

Retry Mechanism

All LLM requests are wrapped with withRetry:

public struct RetryConfig: Sendable {
    public let maxRetries: Int          // Max retries, default 3
    public let baseDelayMs: Int         // Base delay, default 2000ms
    public let maxDelayMs: Int          // Max delay, default 30000ms
    public let retryableStatusCodes: Set<Int>  // Default [429, 500, 502, 503, 529]
}

Exponential backoff + 25% random jitter to avoid thundering herd. Only SDKError.apiError with status codes in the retryable set triggers retries; other errors are thrown directly.

let delay = config.baseDelayMs * (1 << attempt)
let jitterMs = Int(Double(delay) * 0.25 * (Double.random(in: -1...1)))
let totalMs = max(0, min(delay + jitterMs, config.maxDelayMs))

Series Recap

Six articles complete, covering the full architecture of Open Agent SDK (Swift):

Part 0: Project overview — what the SDK does, overall architecture, how to use it
Part 1: Agent Loop internals — the complete cycle from prompt to multi-turn conversation
Part 2: 34 built-in tools — ToolProtocol design, three-tier architecture, custom extensions
Part 3: MCP integration — connecting external tool servers, discovery, and communication
Part 4: Multi-agent collaboration — Team/Task models, inter-agent communication
Part 5: Session persistence and security — session storage, permission control, Hook system
Part 6 (this article): Multi-LLM providers and runtime controls — LLMClient protocol, OpenAI adapter, model switching, Thinking/Effort, Skills system

Starting from the Agent Loop core, the tool system is the loop's "execution" stage, MCP is external tool extension, multi-agent is the collaboration pattern, sessions are state persistence, security and Hooks are governance mechanisms, and this article's multi-provider and runtime controls ensure flexibility — letting the same Agent choose the most appropriate model and control strategy for each scenario.

Deep Dive into Open Agent SDK (Swift) Series:

Part 0: Open Agent SDK (Swift): Build AI Agent Applications with Native Swift Concurrency
Part 1: Deep Dive into Open Agent SDK (Part 1): Agent Loop Internals
Part 2: Deep Dive into Open Agent SDK (Part 2): Behind the 34 Built-in Tools
Part 3: Deep Dive into Open Agent SDK (Part 3): MCP Integration in Practice
Part 4: Deep Dive into Open Agent SDK (Part 4): Multi-Agent Collaboration
Part 5: Deep Dive into Open Agent SDK (Part 5): Session Persistence and Security
Part 6: Deep Dive into Open Agent SDK (Part 6): Multi-LLM Providers and Runtime Controls

GitHub: terryso/open-agent-sdk-swift

Deep Dive into Open Agent SDK (Part 5): Session Persistence and Security

NEE — Mon, 27 Apr 2026 03:55:09 +0000

An Agent is more than a one-shot Q&A tool. A truly useful Agent must do three things: remember context (where we left off), control permissions (which operations are allowed), and audit behavior (who did what and when). Open Agent SDK uses four subsystems to cover these needs — SessionStore, PermissionPolicy, SandboxSettings, and HookRegistry.

This article analyzes the implementation details of these four subsystems, how each works individually, and how they combine to build a secure Agent.

1. Session Persistence: SessionStore

Each Agent Loop run produces a messages array. Without persistence, it's lost when the process exits. SessionStore persists this conversation history to disk for restoration on next startup.

What Is SessionStore

SessionStore is an actor — all methods require await. By default, sessions are stored in ~/.open-agent-sdk/sessions/, with one subdirectory per session containing a transcript.json file.

let sessionStore = SessionStore()  // Default path
let sessionStore = SessionStore(sessionsDir: "/custom/path")  // Custom path

Five Core Operations

SessionStore provides five core methods covering the full session lifecycle.

save — Save a session. Serializes the messages array and metadata to JSON and writes to disk:

try await sessionStore.save(
    sessionId: "my-session",
    messages: messages,
    metadata: PartialSessionMetadata(
        cwd: "/project",
        model: "claude-sonnet-4-6",
        summary: "Code analysis session",
        tag: "analysis",
        firstPrompt: "Analyze project structure"
    )
)

Storage structure:

~/.open-agent-sdk/sessions/
  my-session/
    transcript.json    // { "metadata": {...}, "messages": [...] }

File permissions are 0600, directory permissions 0700 — only the current user can read/write. Each save preserves the original createdAt timestamp, updating only updatedAt.

load — Load a session. Reads transcript.json from disk and deserializes into SessionData:

if let data = try await sessionStore.load(sessionId: "my-session") {
    print("Messages: \(data.metadata.messageCount)")
    print("Model: \(data.metadata.model)")
    // data.messages is [[String: Any]] array
}

load supports pagination parameters limit and offset for loading only the tail when full history isn't needed:

// Load only the last 50 messages
let recent = try await sessionStore.load(sessionId: "my-session", limit: 50, offset: nil)

list — List all sessions, sorted by updatedAt descending (most recent first):

let sessions = try await sessionStore.list(limit: 10)
for session in sessions {
    print("\(session.id) — \(session.summary ?? "(untitled)") [\(session.messageCount) messages]")
}

SessionMetadata includes id, cwd, model, createdAt, updatedAt, messageCount, and optional summary, tag, firstPrompt, gitBranch, fileSize.

fork — Fork a session. Copies messages from an existing session to a new one, optionally specifying a truncation point:

// Full copy
let newId = try await sessionStore.fork(sourceSessionId: "my-session")

// Copy only the first 10 messages
let truncatedId = try await sessionStore.fork(
    sourceSessionId: "my-session",
    upToMessageIndex: 10
)

// Specify new session ID
let customId = try await sessionStore.fork(
    sourceSessionId: "my-session",
    newSessionId: "forked-session"
)

delete — Delete an entire session directory:

let deleted = try await sessionStore.delete(sessionId: "my-session")

Additional helper methods include rename (change title) and tag (add tags).

Three Session Recovery Modes

When SessionStore is injected into an Agent, the SDK provides three recovery strategies:

1. Specified sessionId Recovery

The most direct approach: given a session ID, the Agent loads historical messages at startup, prepending them to the messages array:

let agent = createAgent(options: AgentOptions(
    apiKey: apiKey,
    model: "claude-sonnet-4-6",
    sessionStore: sessionStore,
    sessionId: "my-session"      // Specify which session to restore
))

2. continueRecentSession — Auto-Continue Most Recent

When you don't know the session ID, let the SDK automatically find the most recent one:

let agent = createAgent(options: AgentOptions(
    apiKey: apiKey,
    model: "claude-sonnet-4-6",
    sessionStore: sessionStore,
    continueRecentSession: true   // Auto-load most recent session
))

Internally calls sessionStore.list() and takes the first result (already sorted by updatedAt descending).

3. forkSession + resumeSessionAt — Fork and Truncate

Fork a new branch from an existing session, optionally truncating at a specific message:

let agent = createAgent(options: AgentOptions(
    apiKey: apiKey,
    model: "claude-sonnet-4-6",
    sessionStore: sessionStore,
    sessionId: "my-session",
    forkSession: true,                     // Copy to new session
    resumeSessionAt: "msg-uuid-123"        // Truncate at this message
))

SDK parsing order: continueRecentSession determines session ID first, then forkSession creates a fork, then resumeSessionAt truncates history. These three options work independently or in combination.

SessionStore Security Details

SessionStore includes path traversal prevention in session ID validation:

private func validateSessionId(_ sessionId: String) throws {
    guard !sessionId.isEmpty else {
        throw SDKError.sessionError(message: "Session ID must not be empty")
    }
    let forbidden = ["/", "\\", ".."]
    for component in forbidden {
        if sessionId.contains(component) {
            throw SDKError.sessionError(message: "Session ID contains invalid character: '\(component)'")
        }
    }
}

Session IDs cannot contain /, \, or .. — preventing attackers from crafting IDs to read/write unexpected paths.

2. Permission Control: PermissionPolicy

Session persistence solves "remembering." Permission control solves "what's allowed."

Six PermissionModes

The SDK defines 6 permission modes:

Mode	Behavior
`default`	Ask user before each tool execution
`plan`	Read-only tools execute directly; write operations require confirmation
`auto`	Automatically execute all tools except dangerous operations
`acceptEdits`	File edits auto-execute; other operations require confirmation
`dontAsk`	Don't ask user; auto-judge based on context
`bypassPermissions`	Skip all permission checks

let agent = createAgent(options: AgentOptions(
    apiKey: apiKey,
    model: "claude-sonnet-4-6",
    permissionMode: .plan  // Read-only tools run directly; writes need confirmation
))

canUseTool Callback: More Granular Than PermissionMode

permissionMode is a global switch with coarse granularity. For fine-grained control by tool name or properties, use the canUseTool callback:

let agent = createAgent(options: AgentOptions(
    apiKey: apiKey,
    model: "claude-sonnet-4-6",
    permissionMode: .bypassPermissions,
    canUseTool: { tool, input, context in
        if tool.name == "Bash" {
            return CanUseToolResult.deny("Bash is not allowed")
        }
        return nil  // nil means "no opinion, defer to permissionMode"
    }
))

canUseTool returns CanUseToolResult?. Returning nil means the callback has no opinion, passing to the next check. Non-nil results use the callback's decision, ignoring permissionMode.

CanUseToolResult has three factory methods:

CanUseToolResult.allow()                              // Allow
CanUseToolResult.deny("reason")                          // Deny
CanUseToolResult.allowWithInput(modifiedInput)         // Allow but modify input parameters

allowWithInput is rare but practical — you can modify tool input parameters during permission checks, such as redirecting file write paths to a safe directory.

Policy Pattern: Composable Permission Rules

Writing closures is flexible but not reusable. The SDK provides a PermissionPolicy protocol, encapsulating permission judgments as composable policies:

public protocol PermissionPolicy: Sendable {
    func evaluate(
        tool: ToolProtocol,
        input: Any,
        context: ToolContext
    ) async -> CanUseToolResult?
}

Four built-in policies:

ToolNameAllowlistPolicy — Allowlist, only permits specified tools:

let policy = ToolNameAllowlistPolicy(allowedToolNames: ["Read", "Glob", "Grep"])
// Write, Edit, Bash etc. all denied

ToolNameDenylistPolicy — Denylist, rejects specified tools:

let policy = ToolNameDenylistPolicy(deniedToolNames: ["Bash", "Write"])
// Other tools execute normally

ReadOnlyPolicy — Only allows read-only tools (isReadOnly == true):

let policy = ReadOnlyPolicy()
// Read, Glob, Grep, WebSearch etc. allowed
// Write, Edit, Bash etc. denied

CompositePolicy — Combines multiple policies, evaluated in order:

let policy = CompositePolicy(policies: [
    ToolNameDenylistPolicy(deniedToolNames: ["Bash"]),
    ReadOnlyPolicy()
])
// Denylist checked first (Bash denied), then read-only policy

CompositePolicy evaluation rules:

Any sub-policy returning deny causes overall deny (short-circuit)
Sub-policy returning nil (no opinion) is skipped
All sub-policies allow or no opinion results in overall allow

Bridge policies to callbacks with canUseTool(policy:):

let policy = CompositePolicy(policies: [
    ToolNameDenylistPolicy(deniedToolNames: ["Bash"]),
    ReadOnlyPolicy()
])

let agent = createAgent(options: AgentOptions(
    apiKey: apiKey,
    model: "claude-sonnet-4-6",
    permissionMode: .bypassPermissions,
    canUseTool: canUseTool(policy: policy)
))

3. Sandbox Mechanism: SandboxSettings + SandboxChecker

Permission control manages "can this tool execute." Sandbox manages "is this operation within allowed bounds." For example, the Bash tool passes permission checks, but you still need to ensure it won't run rm -rf /.

SandboxSettings Configuration

let sandbox = SandboxSettings(
    // Path control
    allowedReadPaths: ["/project/"],
    allowedWritePaths: ["/project/build/"],
    deniedPaths: ["/etc/", "/var/"],

    // Command control
    deniedCommands: ["rm", "sudo"],           // Denylist
    // allowedCommands: ["git", "swift"],     // Allowlist (choose one with denylist)

    // Behavior control
    allowNestedSandbox: false,
    autoAllowBashIfSandboxed: false,          // Auto-approve Bash when sandbox is active
    allowUnsandboxedCommands: false,
    enableWeakerNestedSandbox: false,

    // Network control
    network: SandboxNetworkConfig(
        allowedDomains: ["api.example.com"],
        allowLocalBinding: false
    )
)

Paths and commands each have two modes:

Paths: allowedReadPaths/allowedWritePaths are allowlists (empty = allow all); deniedPaths is a denylist (higher priority)
Commands: allowedCommands is an allowlist (non-nil restricts to listed commands); deniedCommands is a denylist. allowedCommands takes priority over deniedCommands

SandboxChecker Execution Logic

SandboxChecker is a stateless enum providing isPathAllowed, checkPath, isCommandAllowed, checkCommand static methods. isXxx returns Bool; checkXxx throws SDKError.permissionDenied on failure.

Path checking uses prefix matching with segment boundary guarantees:

// /project/ matches /project/src/file.swift
// /project/ does NOT match /project-backup/file.swift
SandboxChecker.isPathAllowed("/project/src/main.swift", for: .read, settings: sandbox)
// -> true

SandboxChecker.isPathAllowed("/project-backup/old.swift", for: .read, settings: sandbox)
// -> false (segment boundary mismatch)

The key is SandboxPathNormalizer — normalizes paths first (resolving .., ., symlinks), then ensures trailing / for segment boundaries during prefix comparison:

// Path traversal attacks get normalized away
let normalized = SandboxPathNormalizer.normalize("/project/src/../../etc/passwd")
// -> "/etc/passwd", then caught by deniedPaths

Command checking has three phases:

Shell metacharacter detection — identifying bypass patterns like bash -c "cmd", $(cmd), `cmd`
Basename extraction — extracting rm from /usr/bin/rm -rf /tmp
Allowlist/denylist matching

// Denylist has "rm"
SandboxChecker.isCommandAllowed("rm -rf /tmp", settings: blocklist)
// -> false

// Path-form commands are recognized
SandboxChecker.isCommandAllowed("/usr/bin/rm -rf /tmp", settings: blocklist)
// -> false (basename extracted as "rm")

// Backslash bypass
SandboxChecker.isCommandAllowed("\\rm -rf /tmp", settings: blocklist)
// -> false (leading \ removed, gets "rm")

// Quote bypass
SandboxChecker.isCommandAllowed("\"rm\" -rf /tmp", settings: blocklist)
// -> false (quotes removed, gets "rm")

// Subshell bypass
SandboxChecker.isCommandAllowed("bash -c \"rm -rf /tmp\"", settings: blocklist)
// -> false (recursive check of inner command)

Commands that can't be reliably parsed default to denied.

File paths in command arguments are also extracted and checked — if a command references a path in deniedPaths, the command is rejected.

autoAllowBashIfSandboxed

This option bridges sandbox and permission systems. When autoAllowBashIfSandboxed = true, the Bash tool skips canUseTool permission callback checks but still undergoes SandboxChecker.checkCommand() filtering.

The design rationale: if you've configured comprehensive sandbox rules, what Bash can do is already constrained. No need for an additional permission confirmation.

4. Hook System: 20+ Lifecycle Events

The first three systems solve "can it be done." The Hook system solves "know when it's done" and "intervene before it happens."

20+ HookEvents

The SDK defines 24 lifecycle events:

Event	Trigger Timing
`preToolUse`	Before tool execution
`postToolUse`	After successful tool execution
`postToolUseFailure`	After failed tool execution
`sessionStart`	Agent session starts
`sessionEnd`	Agent session ends
`stop`	Agent Loop stops
`subagentStart`	Sub-agent launches
`subagentStop`	Sub-agent completes
`userPromptSubmit`	User submits prompt
`permissionRequest`	Permission check occurs
`permissionDenied`	Permission denied
`taskCreated`	Task created
`taskCompleted`	Task completed
`configChange`	Configuration change
`cwdChanged`	Working directory change
`fileChanged`	File change
`notification`	Notification event
`preCompact`	Before conversation compaction
`postCompact`	After conversation compaction
`teammateIdle`	Team member idle
`setup`	Agent initialization
`worktreeCreate`	Worktree created
`worktreeRemove`	Worktree removed

Function Hooks vs Shell Hooks

Hooks have two implementation approaches: function callbacks and shell commands.

Function Hook — Swift closure, suitable for in-process logic:

await registry.register(.preToolUse, definition: HookDefinition(
    handler: { input in
        // input is HookInput with event, toolName, toolInput, sessionId, etc.
        return HookOutput(message: "Intercepted", block: true)
    }
))

Shell Hook — External command, suitable for integrating non-Swift scripts:

await registry.register(.preToolUse, definition: HookDefinition(
    command: "python3 /path/to/check.py"  // HookInput passed via stdin JSON
))

Shell Hooks execute via ShellHookExecutor: using /bin/bash -c to launch a process, serializing HookInput as JSON to stdin, reading HookOutput JSON from stdout. If stdout isn't valid JSON, it's wrapped as HookOutput(message: stdout).

Shell Hook environment variables include HOOK_EVENT, HOOK_TOOL_NAME, HOOK_SESSION_ID, HOOK_CWD for easy context detection in scripts.

HookRegistry Registration and Execution

HookRegistry is an actor, internally maintaining [HookEvent: [HookDefinition]] mappings:

let registry = HookRegistry()

// Register function Hook
await registry.register(.preToolUse, definition: HookDefinition(
    handler: { input in
        return HookOutput(message: "Bash blocked", block: true)
    },
    matcher: "Bash"  // Only match Bash tool
))

// Register Shell Hook
await registry.register(.postToolUse, definition: HookDefinition(
    command: "/usr/bin/logger 'Tool executed'",
    timeout: 5000  // 5 second timeout
))

// Execute all Hooks registered on an event
let results = await registry.execute(.preToolUse, input: hookInput)
// results: [HookOutput], containing return values from all matching Hooks

matcher filtering: Each HookDefinition can have a matcher (regex). During execution, input.toolName is checked against the matcher; non-matching Hooks are skipped. nil matcher matches all tools.

Timeout handling: Function Hooks use withThrowingTaskGroup for timeout — placing actual execution and Task.sleep in the same TaskGroup, using whichever completes first. Timed-out Hooks don't affect other Hooks. Shell Hooks use DispatchQueue.asyncAfter for timeout, terminating the process when time's up.

Execution order: Hooks on the same event execute serially in registration order.

HookOutput Capabilities

HookOutput can do all of this:

HookOutput(
    message: "Log message",                          // Attached info
    block: true,                                      // Intercept operation
    notification: HookNotification(               // Send notification
        title: "Warning",
        body: "Dangerous operation detected",
        level: .warning
    ),
    permissionUpdate: PermissionUpdate(           // Dynamically modify permissions
        tool: "Bash",
        behavior: .deny
    ),
    systemMessage: "Please operate within sandbox",  // Inject system message
    reason: "Security policy",                         // Interception reason
    updatedInput: ["command": "echo safe"],            // Modify tool input
    decision: .block                                   // Explicit approve/block
)

block: true prevents tool execution, returning an error result to the LLM. permissionUpdate dynamically modifies tool permissions during Hook execution. updatedInput replaces tool input parameters.

5. Practical Combination: Building a Secure Agent

Four subsystems, each with its own role:

SessionStore — Remember conversation history
PermissionPolicy — Control whether tools can execute
SandboxSettings — Limit operational scope
HookRegistry — Audit and intercept

Here's a complete example showing how to combine them:

import Foundation
import OpenAgentSDK

// 1. Create SessionStore
let sessionStore = SessionStore()

// 2. Create HookRegistry, register audit and security interception
let hookRegistry = HookRegistry()

// Log all tool executions
await hookRegistry.register(.postToolUse, definition: HookDefinition(
    handler: { input in
        if let toolName = input.toolName {
            print("[Audit] Tool \(toolName) completed")
        }
        return nil
    }
))

// Intercept dangerous Bash commands
await hookRegistry.register(.preToolUse, definition: HookDefinition(
    handler: { input in
        return HookOutput(
            message: "Bash blocked by security policy",
            block: true,
            decision: .block
        )
    },
    matcher: "Bash"
))

// Log permission denial events
await hookRegistry.register(.permissionDenied, definition: HookDefinition(
    handler: { input in
        print("[Security Alert] Permission denied: \(input.error ?? "unknown")")
        return nil
    }
))

// Session lifecycle tracking
await hookRegistry.register(.sessionStart, definition: HookDefinition(
    handler: { _ in print("[Session] Started"); return nil }
))
await hookRegistry.register(.sessionEnd, definition: HookDefinition(
    handler: { _ in print("[Session] Ended"); return nil }
))

// 3. Configure sandbox: restrict paths and commands
let sandbox = SandboxSettings(
    allowedReadPaths: ["/project/"],
    allowedWritePaths: ["/project/src/", "/project/tests/"],
    deniedPaths: ["/etc/", "/var/", "/tmp/"],
    deniedCommands: ["rm", "sudo", "chmod", "chown"],
    autoAllowBashIfSandboxed: false,
    allowNestedSandbox: false
)

// 4. Configure permission policy: read-only + exclude Bash
let policy = CompositePolicy(policies: [
    ToolNameDenylistPolicy(deniedToolNames: ["Bash"]),
    ReadOnlyPolicy()
])

// 5. Create Agent, inject all components
let agent = createAgent(options: AgentOptions(
    apiKey: "sk-...",
    model: "claude-sonnet-4-6",
    systemPrompt: "You are a code analysis assistant. Read-only, no modifications.",
    maxTurns: 10,
    permissionMode: .bypassPermissions,
    canUseTool: canUseTool(policy: policy),
    sessionStore: sessionStore,
    sessionId: "analysis-session",
    hookRegistry: hookRegistry,
    sandbox: sandbox
))

// 6. Execute query
let result = await agent.prompt("Analyze the Swift source file structure in the project")
print(result.text)

// 7. Resume session later
let resumedAgent = createAgent(options: AgentOptions(
    apiKey: "sk-...",
    model: "claude-sonnet-4-6",
    permissionMode: .bypassPermissions,
    canUseTool: canUseTool(policy: policy),
    sessionStore: sessionStore,
    sessionId: "analysis-session",  // Same session ID, auto-restores history
    hookRegistry: hookRegistry,
    sandbox: sandbox
))

let continued = await resumedAgent.prompt("Continue analyzing test files")
print(continued.text)

This Agent's security features:

Permission layer: CompositePolicy ensures only read-only tools execute, with Bash denied by denylist
Sandbox layer: Even if tools pass permission checks, they're restricted by path — only reading files under /project/, unable to touch /etc/ or /var/
Hook layer: All tool executions are logged (audit), and Bash calls are secondarily intercepted by the preToolUse Hook
Session layer: Conversations auto-saved and restored, continuing previous work after restart

Multi-layer defense benefit: even if one layer has a configuration gap, others provide backup. For example, if you accidentally add Bash to the allowlist, the Hook's matcher still intercepts it. Even if the Hook misses it, the sandbox's command filtering still blocks it.

Summary

SessionStore, PermissionPolicy, SandboxSettings, and HookRegistry — four systems each managing one concern, but combined they form a complete security framework:

SessionStore's actor isolation and session ID validation ensure storage security
PermissionPolicy's composable policies provide flexible permission management
SandboxChecker's path normalization and segment boundary matching prevent directory traversal
HookRegistry's matcher filtering and timeout mechanisms ensure Hook system reliability

The next article covers the SDK's multi-LLM providers: how to simultaneously support Anthropic, OpenAI, and other LLMs, the Provider protocol design, and runtime model switching mechanisms.

Deep Dive into Open Agent SDK (Swift) Series:

Part 0: Open Agent SDK (Swift): Build AI Agent Applications with Native Swift Concurrency
Part 1: Deep Dive into Open Agent SDK (Part 1): Agent Loop Internals
Part 2: Deep Dive into Open Agent SDK (Part 2): Behind the 34 Built-in Tools
Part 3: Deep Dive into Open Agent SDK (Part 3): MCP Integration in Practice
Part 4: Deep Dive into Open Agent SDK (Part 4): Multi-Agent Collaboration
Part 5: Deep Dive into Open Agent SDK (Part 5): Session Persistence and Security
Part 6: Deep Dive into Open Agent SDK (Part 6): Multi-LLM Providers and Runtime Controls

GitHub: terryso/open-agent-sdk-swift

Deep Dive into Open Agent SDK (Part 4): Multi-Agent Collaboration

NEE — Mon, 27 Apr 2026 03:54:59 +0000

A single Agent, no matter how powerful, is just one executor. Real development tasks are often multi-step and multi-role: someone explores the codebase, someone designs a plan, then someone writes code and runs tests. A single Agent working alone easily bloats its context and loses efficiency.

Open Agent SDK addresses this at three levels:

Sub-Agents — The main Agent dynamically spawns sub-agents during execution, delegating specialized tasks
Task System — Tracks progress and results of multi-step work
Team + Messaging — Multiple Agents form a team, communicating via a mailbox system

This article analyzes each level's implementation, then examines how they combine for task orchestration.

1. Sub-Agents: SubAgentSpawner Protocol and AgentTool

SubAgentSpawner Protocol

Sub-agent spawning isn't AgentTool directly creating a new Agent — there's a protocol layer in between. SubAgentSpawner is defined in Types/AgentTypes.swift:

public protocol SubAgentSpawner: Sendable {
    func spawn(
        prompt: String,
        model: String?,
        systemPrompt: String?,
        allowedTools: [String]?,
        maxTurns: Int?
    ) async -> SubAgentResult

    func spawn(
        prompt: String,
        model: String?,
        systemPrompt: String?,
        allowedTools: [String]?,
        maxTurns: Int?,
        disallowedTools: [String]?,
        mcpServers: [AgentMcpServerSpec]?,
        skills: [String]?,
        runInBackground: Bool?,
        isolation: String?,
        name: String?,
        teamName: String?,
        mode: PermissionMode?,
        resume: String?
    ) async -> SubAgentResult
}

Two methods: a basic version (5 parameters) and an enhanced version (13 parameters). The protocol also provides a default implementation where the enhanced version calls the basic one, so existing implementations don't need changes to be compatible.

Why is the spawner defined in Types/ instead of Core/? Because Tools/Advanced/AgentTool.swift needs it, but Tools/ shouldn't import Core/. The protocol is defined in Types/, with concrete implementation in Core/, injected via ToolContext.agentSpawner — a common dependency inversion pattern in the SDK.

DefaultSubAgentSpawner Implementation

DefaultSubAgentSpawner in Core/DefaultSubAgentSpawner.swift does the following:

final class DefaultSubAgentSpawner: SubAgentSpawner, @unchecked Sendable {
    private let apiKey: String
    private let baseURL: String?
    private let parentModel: String
    private let parentTools: [ToolProtocol]
    private let provider: LLMProvider
    private let client: (any LLMClient)?

    func spawn(...) async -> SubAgentResult {
        // 1. Filter out AgentTool to prevent infinite recursion
        var subTools = parentTools.filter { $0.name != "Agent" }

        // 2. If allowedTools specified, filter further
        if let allowed = allowedTools, !allowed.isEmpty {
            let allowedSet = Set(allowed)
            subTools = subTools.filter { allowedSet.contains($0.name) }
        }

        // 3. disallowedTools filters again (higher priority than allowedTools)
        if let disallowed = disallowedTools, !disallowed.isEmpty {
            let disallowedSet = Set(disallowed)
            subTools = subTools.filter { !disallowedSet.contains($0.name) }
        }

        // 4. Create sub-agent and execute
        let options = AgentOptions(
            apiKey: apiKey,
            model: model ?? parentModel,
            systemPrompt: systemPrompt,
            maxTurns: maxTurns ?? 10,
            tools: subTools
        )
        let agent = Agent(options: options)
        let result = await agent.prompt(prompt)

        return SubAgentResult(
            text: result.text.isEmpty
                ? "(Subagent completed with no text output)"
                : result.text,
            toolCalls: [],
            isError: result.status != .success
        )
    }
}

Key points:

Recursion prevention: Sub-agents never receive AgentTool, preventing Agent-in-Agent-in-Agent scenarios
Tool inheritance: Sub-agents inherit all parent tools (except AgentTool) by default, but can be restricted via allowedTools/disallowedTools
Blocking execution: The parent Agent awaits after calling spawn(), waiting for the sub-agent to finish before continuing

AgentTool: The Sub-Agent Tool as Seen by the LLM

AgentTool is the tool exposed to the LLM. When the LLM calls the Agent tool, it passes a prompt and parameters. AgentTool handles calling the spawner to generate a sub-agent.

It has two built-in sub-agent types:

private let BUILTIN_AGENTS: [String: AgentDefinition] = [
    "Explore": AgentDefinition(
        name: "Explore",
        description: "Fast agent specialized for exploring codebases...",
        systemPrompt: "You are a codebase exploration agent...",
        tools: ["Read", "Glob", "Grep", "Bash"],
        maxTurns: 10
    ),
    "Plan": AgentDefinition(
        name: "Plan",
        description: "Software architect agent for designing implementation plans...",
        systemPrompt: "You are a software architect. Design implementation plans...",
        tools: ["Read", "Glob", "Grep", "Bash"],
        maxTurns: 10
    ),
]

Explore: Codebase exploration, using Glob to find files, Grep to search content, Read to read files
Plan: Software architect, understanding the codebase then outputting implementation plans

When the LLM calls AgentTool, it specifies the type via the subagent_type field:

{
  "prompt": "Explore the project structure and find all Swift source files",
  "description": "Explore codebase",
  "subagent_type": "Explore"
}

AgentTool also supports optional parameters: model, maxTurns, run_in_background, isolation, team_name, mode. These are passed through to the spawner.

A Complete Example

The SDK includes a SubagentExample demonstrating the full flow of a coordinator main Agent delegating to an Explore sub-agent:

// Main agent system prompt
let systemPrompt = """
You are a coordinator agent. When given a task, you should delegate it to a sub-agent \
using the Agent tool. The Agent tool will spawn a specialized agent (e.g., "Explore" type) \
that can use Read, Glob, Grep, and Bash tools to investigate the codebase. \
After the sub-agent returns its findings, summarize the results for the user.
"""

// Register tools: core tools + AgentTool
let agent = createAgent(options: AgentOptions(
    apiKey: apiKey,
    model: defaultModel,
    systemPrompt: systemPrompt,
    maxTurns: 10,
    tools: getAllBaseTools(tier: .core) + [createAgentTool()]
))

// Send task — main Agent will call AgentTool to delegate to Explore sub-agent
for await message in agent.stream("""
    Explore the current project directory. Find all Swift source files, \
    examine the project structure, and provide a summary. \
    Use the Agent tool to delegate this task to an Explore sub-agent.
""") {
    switch message {
    case .toolUse(let data):
        if data.toolName == "Agent" {
            print("[Sub-agent Delegation: \(data.toolName)]")
        }
    case .toolResult(let data):
        print("[Result: \(data.content.prefix(200))]")
    case .result(let data):
        print("Turns: \(data.numTurns), Cost: $\(data.totalCostUsd)")
    default:
        break
    }
}

Execution flow: user sends prompt → main Agent decides it needs to explore the codebase → calls AgentTool → AgentTool spawns Explore sub-agent via spawner → sub-agent uses Glob/Grep/Read → results returned to main Agent → main Agent summarizes and responds to user.

2. Task System: Task Tracking and State Machine

Sub-agents solve the "who does the work" problem. The Task system solves "how much work is done, who's doing it, and what are the results."

TaskStore: Thread-Safe Actor

TaskStore is a Swift Actor, ensuring concurrency safety:

public actor TaskStore {
    private var tasks: [String: Task] = [:]
    private var taskCounter: Int = 0

    public func create(
        subject: String,
        description: String? = nil,
        owner: String? = nil,
        status: TaskStatus = .pending
    ) -> Task {
        taskCounter += 1
        let id = "task_\(taskCounter)"
        let now = dateFormatter.string(from: Date())
        let task = Task(
            id: id, subject: subject, description: description,
            status: status, owner: owner,
            createdAt: now, updatedAt: now
        )
        tasks[id] = task
        return task
    }
}

Using an Actor instead of a regular class means all methods are implicitly serialized — no manual locking needed. Multiple Agents creating tasks simultaneously won't cause race conditions.

Task State Machine

Tasks have 5 states with clear transition rules:

public enum TaskStatus: String, Sendable, Equatable, Codable {
    case pending      // Waiting to start
    case inProgress   // In progress
    case completed    // Completed
    case failed       // Failed
    case cancelled    // Cancelled
}

State transitions have constraints: pending and inProgress can transition to any state, but completed, failed, and cancelled are terminal states that cannot change:

private func isValidTransition(from: TaskStatus, to: TaskStatus) -> Bool {
    switch from {
    case .pending, .inProgress:
        return true
    case .completed, .failed, .cancelled:
        return false  // Terminal state, cannot transition
    }
}

As a state diagram:

pending ──→ inProgress ──→ completed
   │            │
   │            ├──→ failed
   │            │
   └──→ cancelled ←──┘

TaskStatus also has a convenient parse() method supporting both camelCase (inProgress) and snake_case (in_progress), since LLM JSON formats aren't always consistent:

public static func parse(_ string: String) -> TaskStatus? {
    if let direct = TaskStatus(rawValue: string) { return direct }
    // snake_case → camelCase
    let camel = string
        .split(separator: "_")
        .enumerated()
        .map { $0.offset == 0 ? String($0.element) : String($0.element).capitalized }
        .joined()
    return TaskStatus(rawValue: camel)
}

Task Struct

A Task instance includes dependency relationships and metadata beyond basic status tracking:

public struct Task: Sendable, Equatable, Codable {
    public let id: String
    public var subject: String
    public var description: String?
    public var status: TaskStatus
    public var owner: String?        // Who's working on it
    public let createdAt: String
    public var updatedAt: String
    public var output: String?       // Result
    public var blockedBy: [String]?  // Blocked by which tasks
    public var blocks: [String]?     // Which tasks this blocks
    public var metadata: [String: String]?
}

The blockedBy and blocks fields show the Task system has built-in task dependency support — Task A can declare "I need Tasks B and C to complete before I can start."

Three Task Tools

The SDK provides three tools for the LLM to operate the Task system:

TaskCreate — Create a task:

public func createTaskCreateTool() -> ToolProtocol {
    return defineTool(
        name: "TaskCreate",
        description: "Create a new task for tracking work progress.",
        inputSchema: taskCreateSchema,
        isReadOnly: false
    ) { (input: TaskCreateInput, context: ToolContext) in
        guard let taskStore = context.taskStore else {
            return ToolExecuteResult(content: "Error: TaskStore not available.", isError: true)
        }
        let initialStatus: TaskStatus = input.status.flatMap { TaskStatus.parse($0) } ?? .pending
        let task = await taskStore.create(
            subject: input.subject,
            description: input.description,
            owner: input.owner,
            status: initialStatus
        )
        return ToolExecuteResult(
            content: "Task created: \(task.id) - \"\(task.subject)\" (\(task.status.rawValue))",
            isError: false
        )
    }
}

TaskList — List tasks (supports filtering by status and owner):

// LLM can query "list all pending tasks" or "list tasks assigned to agent-1"
let tasks = await taskStore.list(status: status, owner: input.owner)

TaskUpdate — Update a task (status, description, owner, output):

do {
    let task = try await taskStore.update(
        id: input.id,
        status: status,
        description: input.description,
        owner: input.owner,
        output: input.output
    )
    return ToolExecuteResult(
        content: "Task updated: \(task.id) - \(task.status.rawValue) - \"\(task.subject)\"",
        isError: false
    )
} catch let error as TaskStoreError {
    return ToolExecuteResult(content: "Error: \(error.localizedDescription)", isError: true)
}

Note that TaskUpdate throws invalidStatusTransition errors — e.g., trying to change a completed task to inProgress. The LLM receives the error message and can adjust its strategy.

3. Team System: Team Formation and Management

The Task system tracks "what to do." The Team system answers "who works with whom."

TeamStore

Like TaskStore, TeamStore is an Actor:

public actor TeamStore {
    private var teams: [String: Team] = [:]
    private var teamCounter: Int = 0

    public func create(
        name: String,
        members: [TeamMember] = [],
        leaderId: String = "self"
    ) -> Team {
        teamCounter += 1
        let id = "team_\(teamCounter)"
        let team = Team(
            id: id, name: name, members: members,
            leaderId: leaderId,
            createdAt: dateFormatter.string(from: Date()),
            status: .active
        )
        teams[id] = team
        return team
    }
}

Teams have two states: active and disbanded. Deleting a Team doesn't actually delete it — the status changes to disbanded. Disbanded Teams cannot have members added or removed.

TeamMember and Roles

public enum TeamRole: String, Sendable, Equatable, Codable {
    case leader   // Team leader
    case member   // Regular member
}

public struct TeamMember: Sendable, Equatable, Codable {
    public let name: String
    public let role: TeamRole
}

When TeamCreateTool creates a Team, all members default to member role, and leaderId defaults to "self" (the creator):

let members: [TeamMember] = input.members?.map { TeamMember(name: $0) } ?? []
let team = await teamStore.create(
    name: input.name,
    members: members,
    leaderId: "self"
)

TeamStore also supports dynamic member management:

// Add member
try teamStore.addMember(teamId: "team_1", member: TeamMember(name: "agent-coder"))

// Remove member
try teamStore.removeMember(teamId: "team_1", agentName: "agent-coder")

// Find which team an Agent belongs to
let team = await teamStore.getTeamForAgent(agentName: "agent-coder")

getTeamForAgent is important for messaging — when sending a message, you need to know which Team the sender belongs to in order to verify the recipient is a teammate.

AgentRegistry: Agent Registration

Besides TeamStore, there's an AgentRegistry tracking all active Agents:

public actor AgentRegistry {
    private var agents: [String: AgentRegistryEntry] = [:]
    private var nameIndex: [String: String] = [:]  // name -> agentId

    public func register(agentId: String, name: String, agentType: String) throws -> AgentRegistryEntry {
        if nameIndex[name] != nil {
            throw AgentRegistryError.duplicateAgentName(name: name)
        }
        let entry = AgentRegistryEntry(...)
        agents[agentId] = entry
        nameIndex[name] = agentId
        return entry
    }

    public func getByName(name: String) -> AgentRegistryEntry? {
        guard let agentId = nameIndex[name] else { return nil }
        return agents[agentId]
    }
}

Name uniqueness constraint — no two Agents with the same name in one AgentRegistry. nameIndex is a reverse lookup index supporting O(1) name lookups.

4. Messaging: MailboxStore and SendMessage

With Teams in place, Agents need to communicate. The SDK uses a Mailbox pattern — messages aren't pushed directly to the recipient but placed in their mailbox for them to pick up.

MailboxStore

public actor MailboxStore {
    private var mailboxes: [String: [AgentMessage]] = [:]

    // Point-to-point send
    public func send(from: String, to: String, content: String, type: AgentMessageType = .text) {
        let message = AgentMessage(from: from, to: to, content: content,
                                   timestamp: dateFormatter.string(from: Date()), type: type)
        if mailboxes[to] == nil { mailboxes[to] = [] }
        mailboxes[to]?.append(message)
    }

    // Broadcast — to all Agents with mailboxes
    public func broadcast(from: String, content: String, type: AgentMessageType = .text) {
        let timestamp = dateFormatter.string(from: Date())
        for (agentName, _) in mailboxes {
            let message = AgentMessage(from: from, to: agentName, content: content,
                                       timestamp: timestamp, type: type)
            mailboxes[agentName]?.append(message)
        }
    }

    // Read and clear mailbox
    public func read(agentName: String) -> [AgentMessage] {
        guard let messages = mailboxes[agentName] else { return [] }
        mailboxes[agentName] = []  // Clear after reading
        return messages
    }
}

Three core operations: send (point-to-point), broadcast (broadcast), read (read). read is destructive — reading clears the mailbox. broadcast only sends to Agents that already have mailboxes, not creating new ones.

Message types beyond plain text (.text) include .shutdownRequest, .shutdownResponse, .planApprovalResponse — special types for team management coordination.

SendMessage Tool

SendMessageTool performs three layers of validation:

// 1. Must have MailboxStore
guard let mailboxStore = context.mailboxStore else { ... }
// 2. Must have TeamStore
guard let teamStore = context.teamStore else { ... }
// 3. Must know who the sender is
guard let senderName = context.senderName else { ... }

// 4. Sender must be in a Team
guard let team = await teamStore.getTeamForAgent(agentName: senderName) else { ... }

// 5. Recipient must be a teammate
let isMember = team.members.contains { $0.name == input.to }
guard isMember else { ... }

Broadcast uses "*" as recipient:

{ "to": "*", "message": "Phase 1 complete, starting Phase 2." }

Point-to-point uses a specific name:

{ "to": "agent-coder", "message": "Here's the spec for module A." }

Failed validations return error messages. The LLM can see which members are available and adjust the target.

5. Orchestration Patterns: Combining These Capabilities

Individual Agent, Task, Team, and Mailbox capabilities are clear. How do they combine in practice?

Pattern 1: Main Agent + Parallel Sub-Agents

The simplest pattern. The main Agent receives a complex task and launches multiple sub-agents simultaneously, each handling a portion:

let agent = createAgent(options: AgentOptions(
    apiKey: apiKey,
    model: "claude-sonnet-4-6",
    systemPrompt: """
    You are a coordinator. Break complex tasks into subtasks, \
    delegate each to an Explore sub-agent, then synthesize results.
    """,
    maxTurns: 20,
    tools: getAllBaseTools(tier: .core) + [
        createAgentTool(),
        createTaskCreateTool(),
        createTaskUpdateTool(),
        createTaskListTool()
    ],
    taskStore: TaskStore()
))

The LLM might orchestrate like this:

TaskCreate("Analyze module A") — Create task
Agent(prompt: "Analyze module A", subagent_type: "Explore") — Delegate to sub-agent
TaskUpdate(id: "task_1", status: "completed", output: result) — Mark complete
Repeat steps 1-3 for other modules
Synthesize all results

Pattern 2: Team Collaboration + Messaging

When multiple Agents need to collaborate long-term, use Team + Mailbox:

let mailboxStore = MailboxStore()
let teamStore = TeamStore()

let agent = createAgent(options: AgentOptions(
    apiKey: apiKey,
    model: "claude-sonnet-4-6",
    agentName: "coordinator",
    mailboxStore: mailboxStore,
    teamStore: teamStore,
    tools: getAllBaseTools(tier: .core) + [
        createAgentTool(),
        createTeamCreateTool(),
        createTeamDeleteTool(),
        createSendMessageTool(),
        createTaskCreateTool(),
        createTaskListTool(),
        createTaskUpdateTool()
    ]
))

The LLM's orchestration might look like:

TeamCreate(name: "refactor-team", members: ["explorer", "planner", "coder"]) — Form team
TaskCreate("Explore codebase", owner: "explorer") — Create task
Agent(prompt: "...", name: "explorer", subagent_type: "Explore") — Launch explore agent
SendMessage(to: "planner", message: "Exploration done, here's the summary...") — Notify planner
TaskCreate("Write implementation plan", owner: "planner") — Next task
Continue progressing...

Pattern 3: Work Queue

Use the Task system as a work queue. The main Agent creates a batch of tasks, and sub-agents claim and execute them one by one:

Main Agent:
  TaskCreate("Fix bug #1")     → task_1 (pending)
  TaskCreate("Fix bug #2")     → task_2 (pending)
  TaskCreate("Add feature X")  → task_3 (pending)

Sub-Agent A:
  TaskList(status: "pending")       → [task_1, task_2, task_3]
  TaskUpdate(task_1, status: "in_progress", owner: "agent-a")
  ... do work ...
  TaskUpdate(task_1, status: "completed", output: "Fixed by ...")

Sub-Agent B:
  TaskList(status: "pending")       → [task_2, task_3]
  TaskUpdate(task_2, status: "in_progress", owner: "agent-b")
  ... do work ...

TaskStore is an Actor, so multiple Agents concurrently updating the same task won't cause issues (first-come-first-served), but there's no automatic assignment — the LLM coordinates who claims which task.

Design Trade-offs

This multi-agent collaboration mechanism involves several deliberate design choices:

Why can't sub-agents spawn their own sub-agents? DefaultSubAgentSpawner filters out AgentTool when creating sub-agents. This is an intentional limit — without it, an Agent spawning an Agent spawning an Agent leads to uncontrollable recursion depth and exponential token consumption.

Why is messaging pull-based instead of push-based? MailboxStore.read() is destructive reading — Agents must actively call to receive messages. This is much simpler than push mode — no callbacks to maintain, no handling for offline Agents. The trade-off is reduced real-time responsiveness, but at the frequency of tool calls in the Agent Loop (tools can be called every turn), pull latency is acceptable.

Why doesn't the Task state machine auto-transition? The blockedBy field declares dependency relationships, but TaskStore.update() doesn't automatically check whether prerequisite tasks are complete. This means "wait for Task A before doing Task B" logic must be implemented by the LLM — calling TaskList to check status, then deciding next steps. This is a pragmatic trade-off: automatic dependency resolution could be added, but for the LLM, explicit checking is more controllable.

Summary

Open Agent SDK's multi-agent collaboration consists of three layers:

Sub-Agents: Via SubAgentSpawner protocol and AgentTool, the main Agent dynamically spawns sub-agents at runtime for task delegation, with built-in Explore and Plan types
Task System: Task tracking based on TaskStore Actor with a clear state machine (pending → inProgress → completed/failed/cancelled), where terminal states are irreversible
Team + Mailbox: TeamStore manages teams and members, MailboxStore implements mailbox-style messaging, supporting point-to-point and broadcast

All three layers can be used independently or combined — use Tasks to track progress, Teams to organize members, Mailbox for coordination, and sub-agents to execute the actual work.

The next article covers the SDK's session persistence: how Agent conversation history is stored, restored, and how to continue previous work after a restart.

Deep Dive into Open Agent SDK (Swift) Series:

Part 0: Open Agent SDK (Swift): Build AI Agent Applications with Native Swift Concurrency
Part 1: Deep Dive into Open Agent SDK (Part 1): Agent Loop Internals
Part 2: Deep Dive into Open Agent SDK (Part 2): Behind the 34 Built-in Tools
Part 3: Deep Dive into Open Agent SDK (Part 3): MCP Integration in Practice
Part 4: Deep Dive into Open Agent SDK (Part 4): Multi-Agent Collaboration
Part 5: Deep Dive into Open Agent SDK (Part 5): Session Persistence and Security
Part 6: Deep Dive into Open Agent SDK (Part 6): Multi-LLM Providers and Runtime Controls

GitHub: terryso/open-agent-sdk-swift

Deep Dive into Open Agent SDK (Part 3): MCP Integration in Practice

NEE — Mon, 27 Apr 2026 03:54:53 +0000

The previous article looked at the SDK's 34 built-in tools — file read/write, Bash execution, code search — covering common development scenarios. But an Agent's capabilities can't rely solely on built-in tools. You need to connect to databases, call enterprise APIs, and operate internal systems. These require a standardized integration approach.

MCP (Model Context Protocol) is designed for exactly this. This article examines how Open Agent SDK uses the MCP protocol to bring external tools into the Agent Loop.

What Is the MCP Protocol

MCP is an open protocol proposed by Anthropic that defines communication standards between LLM applications and external tools/data sources. The concept:

Tool side (MCP Server) exposes a set of tools, each with a name, description, and input schema
Calling side (MCP Client) discovers tools, invokes them, and gets results through a standard protocol
Communication is based on JSON-RPC with swappable transports

Why does an Agent need this? Because it's impossible to build all tools into the SDK. With MCP, anyone can write an MCP Server (e.g., @modelcontextprotocol/server-filesystem), and any Agent can connect — no SDK code changes needed, no adapter to write, just one line of configuration.

Open Agent SDK's MCP integration has two paths:

External MCP servers: Connect to third-party MCP Servers via stdio/HTTP/SSE, running the full MCP protocol
In-process MCP servers: Use InProcessMCPServer to wrap SDK tools as an MCP Server with zero protocol overhead

Let's examine each.

Five Transport Configurations

The SDK uses the McpServerConfig enum to unify all transport methods:

public enum McpServerConfig: Sendable, Equatable {
    case stdio(McpStdioConfig)       // Child process stdin/stdout
    case sse(McpTransportConfig)     // Server-Sent Events
    case http(McpTransportConfig)    // HTTP POST
    case sdk(McpSdkServerConfig)     // In-process, zero overhead
    case claudeAIProxy(McpClaudeAIProxyConfig) // ClaudeAI proxy
}

Stdio: Launching Child Processes

The most common approach. The Agent launches a child process and exchanges JSON-RPC messages via stdin/stdout. Suitable for MCP Servers written in Node.js/Python:

let servers: [String: McpServerConfig] = [
    "filesystem": .stdio(McpStdioConfig(
        command: "npx",
        args: ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"]
    )),
    "git": .stdio(McpStdioConfig(
        command: "uvx",
        args: ["mcp-server-git"],
        env: ["GIT_REPO_PATH": "/my/repo"]
    ))
]

MCPStdioTransport internally uses Foundation's Process to launch child processes and FileDescriptor for low-level I/O. A few details:

Command resolution: If the command isn't an absolute path, it's looked up via which first. Falls back to treating it as a file path if not found
Message delimiting: Each JSON-RPC message is delimited by newlines, with CRLF support
Security filtering: CODEANY_API_KEY is not passed to child processes by default unless explicitly specified in env
Reconnection: MCPClient is configured with up to 2 automatic retries, initial interval 1 second, exponential backoff up to max 10 seconds

SSE and HTTP: Connecting to Remote Services

Remote MCP Servers connect via HTTP, in two modes:

// SSE mode (long connection, server push)
let sseServer: [String: McpServerConfig] = [
    "remote-tools": .sse(McpTransportConfig(
        url: "https://mcp.example.com/sse",
        headers: ["Authorization": "Bearer token123"]
    ))
]

// HTTP mode (request-response)
let httpServer: [String: McpServerConfig] = [
    "api-tools": .http(McpTransportConfig(
        url: "https://mcp.example.com/api"
    ))
]

SSE is for scenarios requiring server push; HTTP for simple request-response. Both use HTTPClientTransport internally, differing in the streaming parameter. McpSseConfig and McpHttpConfig are actually type aliases for McpTransportConfig:

public typealias McpSseConfig = McpTransportConfig
public typealias McpHttpConfig = McpTransportConfig

SDK: In-Process Zero Overhead

No network protocol at all — tools are registered directly in-process. Covered in detail in section six below.

ClaudeAI Proxy

Connects to ClaudeAI's proxy endpoint, using server ID for authentication:

let proxyServer: [String: McpServerConfig] = [
    "claude-tools": .claudeAIProxy(McpClaudeAIProxyConfig(
        url: "https://claudeai.example.com/proxy",
        id: "server-abc-123"
    ))
]

Internally, this is just HTTP transport with an added X-ClaudeAI-Server-ID header.

Connection Flow: From Configuration to Tool Pool

How does the Agent merge MCP tools into its own tool pool? Tracing from assembleFullToolPool():

func assembleFullToolPool() async -> ([ToolProtocol], MCPClientManager?) {
    let baseTools = options.tools ?? []

    guard let mcpServers = options.mcpServers, !mcpServers.isEmpty else {
        return (baseTools, nil)
    }

    // Step 1: Separate SDK configs from external configs
    let (sdkTools, externalServers) = await Self.processMcpConfigs(mcpServers)

    // Step 2: Connect to external MCP servers
    var externalTools: [ToolProtocol] = []
    var manager: MCPClientManager? = nil

    if !externalServers.isEmpty {
        let mcpManager = MCPClientManager()
        await mcpManager.connectAll(servers: externalServers)
        externalTools = await mcpManager.getMCPTools()
        manager = mcpManager
    }

    // Step 3: Merge all tools
    let allMCPTools = sdkTools + externalTools
    let pool = assembleToolPool(
        baseTools: getAllBaseTools(tier: .core) + getAllBaseTools(tier: .specialist),
        customTools: baseTools,
        mcpTools: allMCPTools,
        allowed: options.allowedTools,
        disallowed: options.disallowedTools
    )

    return (pool, manager)
}

Three steps:

1. Separate configurations. processMcpConfigs() splits .sdk configs from external configs (stdio/sse/http). SDK configs directly extract tools from InProcessMCPServer, adding namespace prefixes via SdkToolWrapper; external configs are left for MCPClientManager.

2. Connect to external servers. MCPClientManager is an actor that uses withTaskGroup to connect to all servers concurrently. Each connection goes through four steps:

Create Transport → Start Connection → MCP Handshake (initialize) → listTools() to discover tools

Discovered tools are wrapped as MCPToolDefinition — a struct conforming to ToolProtocol. Tool names follow the mcp__{serverName}__{toolName} format to avoid conflicts with built-in tools. For example, the read_file tool on the filesystem server becomes mcp__filesystem__read_file.

3. Assemble tool pool. MCP tools merge with built-in and custom tools, pass through allowedTools/disallowedTools filtering, and form the final tool pool. The LLM sees the filtered complete tool list.

Complete end-to-end usage code:

let agent = createAgent(options: AgentOptions(
    apiKey: "sk-...",
    model: "claude-sonnet-4-6",
    permissionMode: .bypassPermissions,
    mcpServers: [
        "filesystem": .stdio(McpStdioConfig(
            command: "npx",
            args: ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"]
        ))
    ]
))

// Agent Loop auto-connects MCP servers, discovers tools, merges into tool pool on startup
let result = await agent.prompt("List all files in /tmp and read the first one")

Runtime Management

MCP servers aren't just connect-and-forget. During runtime, you may need to check status, reconnect, toggle, or even dynamically swap server sets. The SDK provides four methods.

Check Status: mcpServerStatus()

let status = await agent.mcpServerStatus()
for (name, info) in status {
    print("\(name): \(info.status.rawValue)")  // connected / failed / pending / disabled / needsAuth
    print("  tools: \(info.tools)")             // ["read_file", "write_file", ...]
    if let error = info.error {
        print("  error: \(error)")
    }
}

McpServerStatus has five states (aligned with the TypeScript SDK):

State	Meaning
`connected`	Connected, tools available
`failed`	Connection failed
`pending`	Connecting
`disabled`	Disabled by user
`needsAuth`	Requires authentication

Reconnect: reconnectMcpServer()

Manually reconnect a server after network issues or server restarts:

try await agent.reconnectMcpServer(name: "filesystem")

Implementation: disconnect old connection → clean up state → re-run the connection flow with the initial config. MCPClientManager saves the original config at first connection (originalConfigs), using it directly for reconnection.

Toggle: toggleMcpServer()

Temporarily disable a server (disconnect but keep config), can be re-enabled later:

// Disable
try await agent.toggleMcpServer(name: "filesystem", enabled: false)

// Re-enable
try await agent.toggleMcpServer(name: "filesystem", enabled: true)

Dynamic Replacement: setMcpServers()

Replace the entire MCP server set at runtime. The SDK diffs: new ones get connected, removed ones get disconnected, changed ones get reconnected:

let result = try await agent.setMcpServers([
    "filesystem": .stdio(McpStdioConfig(
        command: "npx",
        args: ["-y", "@modelcontextprotocol/server-filesystem", "/data"]
    )),
    "database": .stdio(McpStdioConfig(
        command: "python3",
        args: ["-m", "my_db_server"]
    ))
])

print("Added: \(result.added)")      // ["database"]
print("Removed: \(result.removed)")  // Previously existing but now absent
print("Errors: \(result.errors)")    // Failed connections

MCPClientManager.setServers() diff logic:

public func setServers(_ servers: [String: McpServerConfig]) async -> McpServerUpdateResult {
    let existingNames = Set(originalConfigs.keys)
    let newNames = Set(servers.keys)

    let addedNames = newNames.subtracting(existingNames)
    let removedNames = existingNames.subtracting(newNames)

    // Changed configs are treated as remove + add
    let changedNames = newNames.intersection(existingNames).filter { name in
        originalConfigs[name] != servers[name]
    }

    let effectiveAdded = addedNames.union(changedNames)
    // ...execute connections and disconnections
}

Removes unneeded servers first, then connects new and changed ones. Changed servers are completely rebuilt, not hot-updated. This matters for long-running Agent applications — you can adjust MCP configuration without restarting the Agent.

MCP Resources: Beyond Tools

The MCP protocol includes resources in addition to tools. Tools "do things"; resources "read data" — e.g., a database MCP Server can expose a query tool alongside a tables resource letting the Agent see which tables exist.

The SDK has two resource-related tools built in: ListMcpResources and ReadMcpResource.

ListMcpResources

Lists available resources from all connected MCP servers:

// Tool description as seen by LLM:
// "List available resources from connected MCP servers.
//  Resources can include files, databases, and other data sources."

// Optional parameter: server — filter by server name

Internal implementation queries each connection via the MCPResourceProvider protocol:

public protocol MCPResourceProvider: Sendable {
    func listResources() async -> [MCPResourceItem]?
    func readResource(uri: String) async throws -> MCPReadResult
}

Resources are represented as MCPResourceItem — with name, description, and URI.

ReadMcpResource

Reads the content of a specified URI resource:

// LLM sees the tool:
// "Read a specific resource from an MCP server."
// Parameters: server (server name), uri (resource URI)

Both tools are read-only, accessing connection info through ToolContext.mcpConnections — no global variables, thread-safe.

In-Process MCP: InProcessMCPServer

InProcessMCPServer is a unique design in the SDK. It lets you create tools with defineTool(), then wrap them as an MCP Server — but without actually running the MCP protocol.

Why? Because sometimes you just want to add your own tools to the Agent's tool pool without cross-process communication. Calling a function directly is far more efficient than going through JSON-RPC serialization.

Basic Usage

// Create tool with defineTool
struct WeatherInput: Codable {
    let city: String
}

let weatherTool = defineTool(
    name: "get_weather",
    description: "Get the current weather for a given city.",
    inputSchema: [
        "type": "object",
        "properties": [
            "city": ["type": "string", "description": "The city name"]
        ],
        "required": ["city"]
    ],
    isReadOnly: true
) { (input: WeatherInput, context: ToolContext) -> String in
    let data: [String: String] = [
        "Beijing": "Sunny, 22C",
        "Tokyo": "Cloudy, 18C",
    ]
    return data[input.city] ?? "No data for \(input.city)"
}

// Wrap as InProcessMCPServer
let server = InProcessMCPServer(
    name: "weather",       // Tool name will be mcp__weather__get_weather
    version: "1.0.0",
    tools: [weatherTool],
    cwd: "/tmp"
)

// Generate config via asConfig(), inject into Agent
let agent = createAgent(options: AgentOptions(
    apiKey: "sk-...",
    model: "claude-sonnet-4-6",
    mcpServers: ["weather": await server.asConfig()]
))

Internal Implementation

InProcessMCPServer is an actor with two working modes:

SDK internal mode (common): When processMcpConfigs() detects a .sdk config, it calls server.getTools() directly and adds namespace prefixes via SdkToolWrapper. Throughout this process, the tool's call() method is invoked directly with zero serialization overhead:

private struct SdkToolWrapper: ToolProtocol, Sendable {
    let serverName: String
    let innerTool: ToolProtocol

    var name: String { "mcp__\(serverName)__\(innerTool.name)" }

    func call(input: Any, context: ToolContext) async -> ToolResult {
        return await innerTool.call(input: input, context: context)
    }
}

Note that SdkToolWrapper's call() directly forwards to innerTool — no JSON-RPC, no Value conversion, just a direct function call.

External client mode: If an external MCP Client wants to connect in, createSession() creates an InMemoryTransport pair, running the full MCP handshake. Protocol overhead exists only in this scenario:

public func createSession() async throws -> (Server, InMemoryTransport) {
    let mcpServer = await getOrCreateMCPServer()
    let session = await mcpServer.createSession()
    let (clientTransport, serverTransport) = await InMemoryTransport.createConnectedPair()
    try await session.start(transport: serverTransport)
    return (session, clientTransport)
}

InProcessMCPServer internally maintains an MCPServer instance (lazy-loaded). When registering tools, each ToolProtocol's call() is wrapped as an MCP handler closure — handling parameter format conversion ([String: Value] to [String: Any]), building ToolContext, and processing error results.

Caveats

Naming restriction: Server name cannot contain __ (double underscore), as it would conflict with the namespace prefix mcp__{server}__{tool}. A precondition check exists in the constructor.
Error handling: When a tool returns isError: true, the MCP layer throws ToolExecutionError, causing the MCP protocol to return isError: true.
Tool registration failure: Triggers assertionFailure, indicating a code bug (e.g., duplicate tool names).

Complete Example: Multi-Tool MCP Server

This is the core of the AdvancedMCPExample, demonstrating multi-tool registration and error handling:

// Weather tool — returns String
let weatherTool = defineTool(
    name: "get_weather",
    description: "Get the current weather for a given city.",
    inputSchema: [
        "type": "object",
        "properties": [
            "city": ["type": "string", "description": "The city name"]
        ],
        "required": ["city"]
    ],
    isReadOnly: true
) { (input: WeatherInput, context: ToolContext) -> String in
    let data: [String: String] = [
        "Beijing": "Sunny, 22C, humidity 45%",
        "Tokyo": "Cloudy, 18C, humidity 65%",
    ]
    return data[input.city] ?? "No data for \(input.city)"
}

// Email validation — returns ToolExecuteResult with error handling
let validationTool = defineTool(
    name: "validate_email",
    description: "Validate an email address.",
    inputSchema: [
        "type": "object",
        "properties": [
            "email": ["type": "string", "description": "The email address"]
        ],
        "required": ["email"]
    ],
    isReadOnly: true
) { (input: ValidateInput, context: ToolContext) -> ToolExecuteResult in
    if !input.email.contains("@") {
        return ToolExecuteResult(
            content: "Invalid email: '\(input.email)' missing '@'",
            isError: true
        )
    }
    return ToolExecuteResult(content: "Email '\(input.email)' is valid.", isError: false)
}

// Package as MCP server
let utilityServer = InProcessMCPServer(
    name: "utility",
    version: "1.0.0",
    tools: [weatherTool, validationTool],
    cwd: "/tmp"
)

// Create Agent
let agent = createAgent(options: AgentOptions(
    apiKey: apiKey,
    model: "claude-sonnet-4-6",
    systemPrompt: "You have weather and email validation tools.",
    permissionMode: .bypassPermissions,
    mcpServers: ["utility": await utilityServer.asConfig()]
))

// LLM will automatically call mcp__utility__get_weather or mcp__utility__validate_email
let result = await agent.prompt("Check weather in Tokyo and validate test@example.com")
print(result.text)

When a tool returns an error, the Agent doesn't crash. The error message is fed back to the LLM, which sees it and adjusts strategy — for example, telling the user the email format is invalid.

Practical Recommendations

Choose the right transport. Use InProcessMCPServer (SDK mode) for in-process tools, stdio for external local tools, and HTTP/SSE for remote tools. Don't use stdio to connect to remote services or HTTP for local CLI tools.

Naming conventions. MCP tool names follow the mcp__{server}__{tool} three-segment format. Keep server names short and meaningful, avoiding double underscores. filesystem is better than fs-tools-v2, because the LLM can directly guess the meaning of mcp__filesystem__read_file.

Error tolerance. MCPClientManager connection failures don't crash the Agent — failed servers get their status marked as error, contributing zero tools. The Agent Loop continues running, just without those tools. Design your system with the same principle: degrade gracefully when external services are unavailable, don't crash entirely.

Use runtime management. Long-running Agent applications should check mcpServerStatus() after startup, retrying failures with reconnectMcpServer(). Use setMcpServers() for dynamic adjustments rather than rebuilding the Agent.

Deep Dive into Open Agent SDK (Swift) Series:

Part 0: Open Agent SDK (Swift): Build AI Agent Applications with Native Swift Concurrency
Part 1: Deep Dive into Open Agent SDK (Part 1): Agent Loop Internals
Part 2: Deep Dive into Open Agent SDK (Part 2): Behind the 34 Built-in Tools
Part 3: Deep Dive into Open Agent SDK (Part 3): MCP Integration in Practice
Part 4: Deep Dive into Open Agent SDK (Part 4): Multi-Agent Collaboration
Part 5: Deep Dive into Open Agent SDK (Part 5): Session Persistence and Security
Part 6: Deep Dive into Open Agent SDK (Part 6): Multi-LLM Providers and Runtime Controls

GitHub: terryso/open-agent-sdk-swift

Deep Dive into Open Agent SDK (Part 2): Behind the 34 Built-in Tools

NEE — Mon, 27 Apr 2026 03:54:33 +0000

The previous article analyzed how the Agent Loop works, including one crucial step: "execute tools." When the LLM says "I need to call Bash," the SDK actually spawns a process to run the command. But the tool system behind this is far more nuanced than simply "calling a function." How are 34 built-in tools organized? How do you safely convert the LLM's JSON input into Swift types? How do you control which tools are available?

This article starts from the protocol definition and examines the Open Agent SDK tool system layer by layer.

ToolProtocol: What a Tool Looks Like

Every tool in the SDK conforms to the ToolProtocol protocol:

public protocol ToolProtocol: Sendable {
    var name: String { get }
    var description: String { get }
    var inputSchema: ToolInputSchema { get }
    var isReadOnly: Bool { get }
    var annotations: ToolAnnotations? { get }

    func call(input: Any, context: ToolContext) async -> ToolResult
}

Five properties and one method. Let's go through each.

name is the tool's unique identifier. The LLM uses this name in tool_use blocks to specify which tool to invoke. All built-in tools use PascalCase naming: Read, Bash, Glob, CronCreate.

description is the tool description shown to the LLM. This text is included as part of the tool definition sent to the API, and its quality directly affects when the LLM chooses to invoke this tool.

inputSchema is a [String: Any] JSON Schema dictionary describing the input structure the tool accepts. It's passed as-is to the input_schema field in API calls.

isReadOnly is a boolean flag telling the Agent Loop whether the tool has side effects. As mentioned in the previous article, the Agent Loop uses this field for bucketing: read-only tools execute concurrently, mutation tools execute serially.

annotations are optional behavioral hints containing four boolean fields:

public struct ToolAnnotations: Sendable, Equatable {
    public let readOnlyHint: Bool       // Read-only, no side effects
    public let destructiveHint: Bool    // May perform irreversible operations
    public let idempotentHint: Bool     // Idempotent, multiple calls produce the same result
    public let openWorldHint: Bool      // Interacts with the external world
}

Note that destructiveHint defaults to true — the SDK takes a "default dangerous" stance, requiring tools to proactively declare themselves safe. These hints don't affect the SDK's own execution logic, but the LLM references them when deciding how to use tools.

ToolResult and ToolExecuteResult

The call() method returns ToolResult, the content fed back to the LLM after tool execution:

public struct ToolResult: Sendable {
    public let toolUseId: String         // Corresponds to the LLM's tool_use ID
    public let content: String           // Text content
    public let typedContent: [ToolContent]?  // Multi-modal content (text, images, resource references)
    public let isError: Bool             // Whether this is an error result
}

There's a compatibility design between content and typedContent: when typedContent has a value, content extracts all .text types and concatenates them; otherwise it returns the stored string directly. This way, older code using only content still works, while new code can use typedContent for non-text content like images.

ToolContent is an enum supporting three content types:

public enum ToolContent: Sendable {
    case text(String)
    case image(data: Data, mimeType: String)
    case resource(uri: String, name: String?)
}

Inside tool closures, ToolExecuteResult is used — structurally almost identical to ToolResult, just missing toolUseId (this ID is auto-filled by the calling layer).

ToolContext: The Tool's Runtime Environment

ToolContext is injected context for each tool execution, with many fields:

Field	Purpose
`cwd`	Current working directory
`toolUseId`	tool_use ID for this invocation
`agentSpawner`	Sub-agent spawner (used by AgentTool)
`cronStore`	Scheduled task store (used by CronTools)
`todoStore`	Todo item store (used by TodoWrite)
`worktreeStore`	Worktree store (used by WorktreeTools)
`planStore`	Plan mode store (used by PlanTools)
`taskStore`	Task management store (used by Task*Tools)
`mailboxStore`	Mailbox store (used by SendMessage)
`teamStore`	Team store (used by TeamCreate)
`hookRegistry`	Hook event registry
`permissionMode`	Permission mode
`canUseTool`	Custom permission check callback
`skillRegistry`	Skill registry (used by SkillTool)
`restrictionStack`	Tool restriction stack
`sandbox`	Sandbox settings
`mcpConnections`	MCP connection info
`fileCache`	File cache
`env`	Custom environment variables

With this many optional fields, the rule is simple: inject what a tool needs; everything else is nil. The Read tool only looks at cwd, sandbox, fileCache; AgentTool only looks at agentSpawner; CronTools only looks at cronStore. Each tool depends only on its specific Store, unaware of and unconcerned with other Stores.

ToolContext also provides two copy methods: withToolUseId() for updating the call ID (called by ToolExecutor on each tool execution), and withSkillContext() for incrementing skill nesting depth (used when SkillTool calls sub-skills).

Three-Tier Tool Architecture

The SDK divides 34 tools into three tiers: Core (10), Advanced (11), and Specialist (13).

Core Tier (10)        Advanced Tier (11)      Specialist Tier (13)
┌──────────┐         ┌──────────────┐        ┌───────────────┐
│ Read      │         │ Agent        │        │ CronCreate    │
│ Write     │         │ Skill        │        │ CronDelete    │
│ Edit      │         │ TaskCreate   │        │ CronList      │
│ Glob      │         │ TaskGet      │        │ LSP           │
│ Grep      │         │ TaskList     │        │ Config        │
│ Bash      │         │ TaskOutput   │        │ TodoWrite     │
│ AskUser   │         │ TaskStop     │        │ EnterPlanMode │
│ ToolSearch│         │ TaskUpdate   │        │ ExitPlanMode  │
│ WebFetch  │         │ SendMessage  │        │ EnterWorktree │
│ WebSearch │         │ TeamCreate   │        │ ExitWorktree  │
└──────────┘         │ TeamDelete   │        │ RemoteTrigger │
                     │ NotebookEdit │        │ ListMcpRes    │
                     └──────────────┘        │ ReadMcpRes    │
                                              └───────────────┘

The tier distinction is based not on technical implementation difficulty, but on dependency complexity and use case.

Core Tier: File System and Shell

The 10 Core tools are the Agent's foundational capabilities — reading files, writing files, searching code, running commands. They share a common trait: they only depend on basic ToolContext fields (cwd, sandbox, fileCache), requiring no Store injection.

Take the Read tool. Its input is a file path with optional offset and limit:

private struct FileReadInput: Codable {
    let file_path: String
    let offset: Int?
    let limit: Int?
}

The execution logic is straightforward: resolve path → check sandbox → query cache → read file → paginate → return content with line numbers. A file caching detail: if context.fileCache is available, it checks the cache first, skipping disk I/O on a hit.

The Bash tool is much more complex, handling timeouts, output truncation, and background processes. Bash's input has 5 fields:

private struct BashInput: Codable {
    let command: String
    let timeout: Int?
    let description: String?
    let runInBackground: Bool?
    let dangerouslyDisableSandbox: Bool?
}

Key implementation details:

Timeout control. Default 120 seconds, maximum 600 seconds. Uses DispatchQueue.global().asyncAfter for timeout, calling process.terminate() when time's up.
Output truncation. Output exceeding 100,000 characters keeps only the first 50,000 + last 50,000, connected with ...(truncated)....
Background execution. When run_in_background = true, the process starts and returns a task ID immediately without waiting for completion.
Process output collection uses ProcessOutputAccumulator, marked @unchecked Sendable because Pipe's readability handler and termination handler both dispatch on the same run loop queue, preventing data races.

Bash's annotations sets destructiveHint: true, explicitly telling the LLM this tool is destructive.

Advanced Tier: Sub-Agents and Task Orchestration

Advanced tier tools start requiring external dependencies — AgentTool needs agentSpawner, Task* tools need taskStore, SendMessage needs mailboxStore and teamStore.

The Agent tool is representative of this tier. Its purpose is letting the LLM "dispatch a sub-agent" for complex tasks:

public func createAgentTool() -> ToolProtocol {
    return defineTool(
        name: "Agent",
        description: "Launch a subagent to handle complex, multi-step tasks autonomously.",
        inputSchema: agentToolSchema,
        isReadOnly: false
    ) { (input: AgentToolInput, context: ToolContext) async throws -> ToolExecuteResult in
        guard let spawner = context.agentSpawner else {
            return ToolExecuteResult(
                content: "Error: Agent spawner not available.",
                isError: true
            )
        }
        // Parse built-in agent types, permission mode, then spawn sub-agent
        let result = await spawner.spawn(
            prompt: input.prompt,
            model: input.model ?? agentDef?.model,
            systemPrompt: agentDef?.systemPrompt,
            allowedTools: agentDef?.tools,
            ...
        )
        return ToolExecuteResult(content: result.text, isError: result.isError)
    }
}

AgentTool's input supports 11 fields: prompt, description, subagent_type, model, name, maxTurns, run_in_background, isolation, team_name, mode, resume. The subagent_type can specify built-in Explore or Plan types, or use a custom name.

Note that agentSpawner is injected through ToolContext as a protocol type — AgentTool doesn't know how sub-agents are created. It just calls spawner.spawn(), with the concrete implementation injected by the Core layer. This dependency inversion means the Tools layer never needs to import the Core module.

Specialist Tier: Domain-Specific Tools

Specialist tier tools have heavier dependencies — each needs its own dedicated Store, and their functionality is highly domain-specific.

CronTools is a set of three tools: CronCreate, CronDelete, CronList, accessing scheduled task storage via context.cronStore:

public func createCronCreateTool() -> ToolProtocol {
    return defineTool(
        name: "CronCreate",
        description: "Create a scheduled recurring task (cron job).",
        inputSchema: cronCreateSchema,
        isReadOnly: false
    ) { (input: CronCreateInput, context: ToolContext) async throws -> ToolExecuteResult in
        guard let cronStore = context.cronStore else {
            return ToolExecuteResult(content: "Error: CronStore not available.", isError: true)
        }
        let job = await cronStore.create(
            name: input.name,
            schedule: input.schedule,
            command: input.command
        )
        return ToolExecuteResult(
            content: "Cron job created: \(job.id) \"\(job.name)\"",
            isError: false
        )
    }
}

All three tools use guard let cronStore = context.cronStore for pre-checks — if the Store isn't injected, they return an error rather than crashing.

The LSP tool is another interesting example. It uses grep to simulate common Language Server Protocol operations (go to definition, find references, symbol search) without depending on an actual language server:

case "goToDefinition", "goToImplementation":
    // 1. Extract symbol name at cursor position using regex
    guard let symbol = getSymbolAtPosition(
        filePath: filePath, line: line, character: character
    ) else { ... }

    // 2. Grep search for definition patterns
    let pattern = "(func|class|struct|enum|protocol|typealias|let|var|export)\\s+\(symbol)"
    let results = await runGrep(
        arguments: ["grep", "-rn", "-E", pattern, cwd],
        cwd: cwd
    )

LSP depends only on context.cwd, requiring no Store — the lightest tool in the Specialist tier.

defineTool: The Factory Function for Custom Tools

The SDK provides the defineTool factory function, letting developers create ToolProtocol-conforming tools with minimal code. It has four overloads covering different use cases.

Basic: Codable Input + String Output

The most commonly used overload accepts a Codable input type and a closure returning String:

let greetTool = defineTool(
    name: "Greet",
    description: "Generate a greeting message.",
    inputSchema: [
        "type": "object",
        "properties": [
            "name": ["type": "string", "description": "Person's name"]
        ],
        "required": ["name"]
    ],
    isReadOnly: true
) { (input: GreetInput, context: ToolContext) async throws -> String in
    return "Hello, \(input.name)!"
}

// Input type only needs to conform to Codable
struct GreetInput: Codable {
    let name: String
}

Internally, defineTool does four things:

Casts the LLM's Any type input to [String: Any]
Serializes to Data using JSONSerialization
Decodes to your defined Input type using JSONDecoder
Calls your closure

If any step fails (input isn't a dictionary, JSON serialization fails, decoding fails, closure throws), it returns an isError: true result instead of crashing the Agent Loop. This means you can safely use try in your closures — errors are gracefully caught.

Structured Output: ToolExecuteResult

If a tool needs to explicitly mark errors (rather than using try to throw), use the overload returning ToolExecuteResult:

let divideTool = defineTool(
    name: "Divide",
    description: "Divide two numbers.",
    inputSchema: [
        "type": "object",
        "properties": [
            "a": ["type": "number"],
            "b": ["type": "number"]
        ],
        "required": ["a", "b"]
    ]
) { (input: DivideInput, context: ToolContext) async throws -> ToolExecuteResult in
    guard input.b != 0 else {
        return ToolExecuteResult(content: "Error: Division by zero.", isError: true)
    }
    return ToolExecuteResult(content: "\(input.a / input.b)", isError: false)
}

Most built-in tools use this overload because many errors are logic-level (file doesn't exist, Store not injected) and aren't well represented by exceptions.

No Input: NoInputTool

Some tools don't need input parameters (e.g., list operations, health checks):

let listTool = defineTool(
    name: "ListItems",
    description: "List all items.",
    inputSchema: ["type": "object", "properties": [:]]
) { (context: ToolContext) async throws -> String in
    return "No items found."
}

The closure only receives ToolContext, completely ignoring input.

Raw Dictionary Input: RawInputTool

The final overload skips Codable decoding, passing the raw [String: Any] dictionary directly to the closure. Useful when input field types are dynamic — e.g., ConfigTool's value field can be a string, number, boolean, array, object, or null:

let configTool = defineTool(
    name: "Config",
    description: "Read or write configuration values.",
    inputSchema: configSchema
) { (input: [String: Any], context: ToolContext) async -> ToolExecuteResult in
    let key = input["key"] as? String ?? ""
    let value = input["value"]  // Any type
    // ...
}

CodingKeys for snake_case

LLM-sent JSON field names typically use snake_case (e.g., file_path, run_in_background), but Swift convention is camelCase. Input types map between them using the CodingKeys enum:

private struct BashInput: Codable {
    let command: String
    let runInBackground: Bool?

    private enum CodingKeys: String, CodingKey {
        case command
        case runInBackground = "run_in_background"
    }
}

This is standard Swift Codable practice — defineTool's internal JSONDecoder automatically uses CodingKeys for field name conversion.

Tool Pool Assembly and Filtering

Tools aren't just thrown at the LLM wholesale. The SDK has an assembly and filtering mechanism.

assembleToolPool

assembleToolPool merges three tool sources into a deduplicated tool pool:

public func assembleToolPool(
    baseTools: [ToolProtocol],     // SDK built-in tools
    customTools: [ToolProtocol]?,  // User-defined custom tools
    mcpTools: [ToolProtocol]?,     // MCP server-provided tools
    allowed: [String]?,
    disallowed: [String]?
) -> [ToolProtocol] {
    // 1. Merge all sources: base + custom + MCP
    var combined = baseTools
    if let customTools { combined.append(contentsOf: customTools) }
    if let mcpTools { combined.append(contentsOf: mcpTools) }

    // 2. Deduplicate by name (latter overwrites former)
    var byName = [String: ToolProtocol]()
    for tool in combined {
        byName[tool.name] = tool
    }

    // 3. Apply filtering rules
    return filterTools(
        tools: Array(byName.values),
        allowed: allowed,
        disallowed: disallowed
    )
}

Deduplication uses a Dictionary — same-named tools encountered later overwrite earlier ones. This means the priority is: MCP > custom > built-in — users can replace built-in tools with custom or MCP tools of the same name.

filterTools

filterTools implements allowlist/denylist filtering:

public func filterTools(
    tools: [ToolProtocol],
    allowed: [String]?,       // Allowlist, nil or empty means no filter
    disallowed: [String]?     // Denylist, nil or empty means no filter
) -> [ToolProtocol] {
    var filtered = tools
    // Apply allowlist first
    if let allowed, !allowed.isEmpty {
        let allowedSet = Set(allowed)
        filtered = filtered.filter { allowedSet.contains($0.name) }
    }
    // Then apply denylist (denylist takes priority over allowlist)
    if let disallowed, !disallowed.isEmpty {
        let disallowedSet = Set(disallowed)
        filtered = filtered.filter { !disallowedSet.contains($0.name) }
    }
    return filtered
}

When both rules exist, the denylist takes priority — even if a tool is in the allowlist, it's excluded if it appears in the denylist.

ToolRestrictionStack: Skills System Tool Restrictions

ToolRestrictionStack is a stack structure used by the Skills system to control tool visibility. When a Skill configures toolRestrictions, it pushes restrictions before execution and pops them after:

let stack = ToolRestrictionStack()
stack.push([.bash, .read])     // Skill A: only Bash and Read
stack.push([.grep, .glob])     // Skill B (nested): only Grep and Glob
// currentAllowedToolNames now returns only Grep and Glob
stack.pop()                     // Skill B done → back to Bash and Read
stack.pop()                     // Skill A done → restore all tools

The stack's LIFO nature ensures correct behavior for nested Skills — inner Skill restrictions override outer ones, automatically restored on exit. Thread safety is ensured by an internal serial DispatchQueue.

currentAllowedToolNames logic is simple: empty stack returns all tools; non-empty stack returns only tool names in the top restriction list.

toApiTool: Converting Tools to API Format

The final step is converting tools to the format required by the Anthropic API:

public func toApiTool(_ tool: ToolProtocol) -> [String: Any] {
    var result: [String: Any] = [
        "name": tool.name,
        "description": tool.description,
        "input_schema": tool.inputSchema
    ]
    if let annotations = tool.annotations {
        result["annotations"] = [
            "readOnlyHint": annotations.readOnlyHint,
            "destructiveHint": annotations.destructiveHint,
            "idempotentHint": annotations.idempotentHint,
            "openWorldHint": annotations.openWorldHint
        ]
    }
    return result
}

annotations are only included when present — saving tokens.

A Complete Custom Tool Example

Tying everything together, here's a custom tool you can run directly — fetching weather:

import Foundation
import OpenAgentSDK

// 1. Define input type
struct WeatherInput: Codable {
    let city: String
    let unit: String?  // "celsius" or "fahrenheit"

    private enum CodingKeys: String, CodingKey {
        case city, unit
    }
}

// 2. Create tool with defineTool
let weatherTool = defineTool(
    name: "Weather",
    description: "Get current weather for a city.",
    inputSchema: [
        "type": "object",
        "properties": [
            "city": [
                "type": "string",
                "description": "City name, e.g. 'Beijing'"
            ],
            "unit": [
                "type": "string",
                "enum": ["celsius", "fahrenheit"],
                "description": "Temperature unit, defaults to celsius"
            ]
        ],
        "required": ["city"]
    ],
    isReadOnly: true,
    annotations: ToolAnnotations(
        readOnlyHint: true,
        destructiveHint: false,
        openWorldHint: true  // Needs to access external API
    )
) { (input: WeatherInput, context: ToolContext) async throws -> ToolExecuteResult in
    let unit = input.unit ?? "celsius"
    // Call weather API (specific implementation omitted)
    let weather = try await fetchWeather(city: input.city, unit: unit)
    return ToolExecuteResult(content: weather, isError: false)
}

// 3. Register with Agent
let agent = createAgent(options: AgentOptions(
    apiKey: "sk-...",
    model: "claude-sonnet-4-6",
    customTools: [weatherTool]  // Custom tools automatically join the tool pool
))

This tool gets merged, deduplicated, and filtered by assembleToolPool along with built-in tools, then sent to the LLM. When the LLM sees the tool definition, it automatically invokes it when it needs weather data. defineTool's internal Codable bridge automatically decodes the LLM's JSON into WeatherInput — you don't need to handle any JSON parsing manually.

Summary

The tool system's design philosophy can be summarized in a few keywords:

Protocol-driven. ToolProtocol specifies only the shape of a tool (name, description, input schema, execution method), not how tools are implemented. This means built-in and custom tools follow the exact same code path.

Dependency injection. ToolContext's 20+ optional fields look like a lot, but each tool only reads the fields it needs. AgentTool doesn't know CronStore exists; CronCreate doesn't know SubAgentSpawner exists.

Tiered organization. The Core/Advanced/Specialist tiers aren't code layers (their code structure is identical), but a division by dependency complexity. Core tools run independently, Advanced tools need Stores, Specialist tools need more specialized domain infrastructure.

Fault tolerance first. defineTool wraps all potential failure points (type casting, serialization, decoding, execution) in do/catch blocks. Any error returns isError: true instead of crashing. Tool errors in the Agent Loop don't propagate — the LLM gets the error message and can adjust strategy.

The next article covers MCP Integration: how the SDK connects to external tool servers, converts MCP tools to ToolProtocol, and coexists with built-in tools in the Agent Loop.

Deep Dive into Open Agent SDK (Swift) Series:

Part 0: Open Agent SDK (Swift): Build AI Agent Applications with Native Swift Concurrency
Part 1: Deep Dive into Open Agent SDK (Part 1): Agent Loop Internals
Part 2: Deep Dive into Open Agent SDK (Part 2): Behind the 34 Built-in Tools
Part 3: Deep Dive into Open Agent SDK (Part 3): MCP Integration in Practice
Part 4: Deep Dive into Open Agent SDK (Part 4): Multi-Agent Collaboration
Part 5: Deep Dive into Open Agent SDK (Part 5): Session Persistence and Security
Part 6: Deep Dive into Open Agent SDK (Part 6): Multi-LLM Providers and Runtime Controls

GitHub: terryso/open-agent-sdk-swift