An Agent shouldn't be locked to a single LLM provider. Different tasks suit different models — simple questions use cheap models, complex reasoning uses expensive ones, and some scenarios even require local models. Runtime needs change too: users might want deeper thinking mid-session, discover the budget is running low and need to downgrade, or switch to a local model to save money.
Open Agent SDK's approach: define a unified LLMClient protocol, with Anthropic and OpenAI-compatible providers each having an implementation. Internally, the Agent uses Anthropic format throughout. Switching providers requires changing only one configuration parameter, and models can be switched dynamically at runtime with adjustable thinking depth and budget control.
This article analyzes the SDK's multi-provider adaptation mechanism and runtime control capabilities.
1. LLMClient Protocol — Unified Interface
First, the protocol definition:
public protocol LLMClient: Sendable {
    nonisolated func sendMessage(
        model: String,
        messages: [[String: Any]],
        maxTokens: Int,
        system: String?,
        tools: [[String: Any]]?,
        toolChoice: [String: Any]?,
        thinking: [String: Any]?,
        temperature: Double?
    ) async throws -> [String: Any]

    nonisolated func streamMessage(
        model: String,
        messages: [[String: Any]],
        maxTokens: Int,
        system: String?,
        tools: [[String: Any]]?,
        toolChoice: [String: Any]?,
        thinking: [String: Any]?,
        temperature: Double?
    ) async throws -> AsyncThrowingStream<SSEEvent, Error>
}
Two core methods: one blocking, one streaming. The parameter list covers all capabilities of mainstream LLM APIs: model selection, message history, token limit, system prompt, tool definitions, tool choice strategy, thinking configuration, and temperature.
Key design decision: return values are always in Anthropic format dictionaries. Whether the underlying API is Anthropic native or OpenAI-compatible, the Agent internally receives the same structure — content arrays with {"type": "text", "text": "..."} or {"type": "tool_use", "name": "...", "input": {...}}, and stop_reason as end_turn/tool_use/max_tokens. This means Agent Loop processing logic doesn't need to care about which API is underneath.
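Because every provider normalizes to the same shape, downstream code can consume responses with plain dictionary traversal. A minimal sketch of what that looks like — the helper names (`textBlocks`, `toolUseBlocks`) are mine for illustration, not SDK API:

```swift
import Foundation

// Hypothetical helpers (not SDK API): pull text and tool calls out of an
// Anthropic-format response dictionary.
func textBlocks(in response: [String: Any]) -> [String] {
    let content = response["content"] as? [[String: Any]] ?? []
    return content
        .filter { $0["type"] as? String == "text" }
        .compactMap { $0["text"] as? String }
}

func toolUseBlocks(in response: [String: Any]) -> [[String: Any]] {
    let content = response["content"] as? [[String: Any]] ?? []
    return content.filter { $0["type"] as? String == "tool_use" }
}

// Example response in the normalized format:
let response: [String: Any] = [
    "content": [
        ["type": "text", "text": "Let me check the file."],
        ["type": "tool_use", "name": "read", "input": ["path": "main.swift"]],
    ],
    "stop_reason": "tool_use",
]
```

The same two helpers work whether the response came from Claude or from an OpenAI-compatible backend — that's the payoff of normalizing at the client boundary.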
Streaming returns use AsyncThrowingStream<SSEEvent, Error>, where SSEEvent is an enum:
public enum SSEEvent: @unchecked Sendable {
    case messageStart(message: [String: Any])
    case contentBlockStart(index: Int, contentBlock: [String: Any])
    case contentBlockDelta(index: Int, delta: [String: Any])
    case contentBlockStop(index: Int)
    case messageDelta(delta: [String: Any], usage: [String: Any])
    case messageStop
    case ping
    case error(data: [String: Any])
}
Eight event types cover all streaming response events from the Anthropic Messages API. The OpenAI-compatible layer's streaming output is converted to the same SSEEvent sequence.
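A consumer of this stream typically folds deltas into accumulated state. A self-contained sketch, using a stand-in enum trimmed to the cases exercised here (the real SDK type carries more payload):

```swift
import Foundation

// Minimal stand-in for the SDK's SSEEvent, trimmed to the cases used below.
enum SSEEvent {
    case messageStart
    case contentBlockDelta(index: Int, delta: [String: Any])
    case messageStop
}

// Sketch: accumulate streamed text deltas the way a stream consumer might.
func accumulateText(_ events: [SSEEvent]) -> String {
    var text = ""
    for event in events {
        if case let .contentBlockDelta(_, delta) = event,
           delta["type"] as? String == "text_delta",
           let chunk = delta["text"] as? String {
            text += chunk
        }
    }
    return text
}
```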
2. AnthropicClient — Native Claude API
AnthropicClient is the Anthropic native implementation of LLMClient, using actor for concurrency safety:
public actor AnthropicClient: LLMClient {
    private let apiKey: String
    private let baseURL: URL // Default https://api.anthropic.com
    private let urlSession: URLSession

    public init(apiKey: String, baseURL: String? = nil, urlSession: URLSession? = nil) {
        self.apiKey = apiKey
        self.baseURL = URL(string: baseURL ?? "https://api.anthropic.com")!
        self.urlSession = urlSession ?? URLSession.shared
    }
}
Requests are POST to /v1/messages with x-api-key and anthropic-version headers:
private nonisolated func buildRequest(body: [String: Any]) throws -> URLRequest {
    var request = URLRequest(url: URL(string: baseURL.absoluteString + "/v1/messages")!)
    request.httpMethod = "POST"
    request.timeoutInterval = 300
    request.setValue(apiKey, forHTTPHeaderField: "x-api-key")
    request.setValue("2023-06-01", forHTTPHeaderField: "anthropic-version")
    request.setValue("application/json", forHTTPHeaderField: "content-type")
    request.httpBody = try JSONSerialization.data(withJSONObject: body, options: [])
    return request
}
Since it uses the Anthropic native API, sendMessage request and response bodies don't need format conversion — request parameters are assembled directly as dictionaries, responses are parsed directly. Streaming mode directly parses Anthropic SSE text.
A security detail: all error messages replace the API Key with *** to prevent key leakage into logs:
let safeMessage = errorMessage.replacingOccurrences(of: apiKey, with: "***")
AnthropicClient directly supports Extended Thinking. When the Agent configures ThinkingConfig, the thinking parameter is passed through:
if let thinking {
    body["thinking"] = thinking
}
3. OpenAI-Compatible Layer — Adapting GLM/Ollama/OpenRouter etc.
OpenAIClient is the heavy lifter. It accepts Anthropic-format parameters, converts them to OpenAI Chat Completion API format, sends the request, then converts the OpenAI response back to Anthropic format. The Agent is completely unaware of the underlying OpenAI-compatible API.
public actor OpenAIClient: LLMClient {
    private let apiKey: String
    private let baseURL: URL // Default https://api.openai.com/v1

    public init(apiKey: String, baseURL: String? = nil, urlSession: URLSession? = nil) {
        self.apiKey = apiKey
        self.baseURL = URL(string: baseURL ?? "https://api.openai.com/v1")!
        self.urlSession = urlSession ?? URLSession.shared
    }
}
Requests go to /chat/completions with Bearer token authentication — standard practice for OpenAI-compatible APIs. Any provider supporting the /v1/chat/completions endpoint works with this client.
Message Format Conversion
Several key differences between Anthropic and OpenAI message formats must be handled during conversion:
1. System Message Position
Anthropic passes the system prompt as a top-level parameter; OpenAI includes it as the first role: "system" message:
if let system {
    result.append(["role": "system", "content": system])
}
2. Tool Result Representation
Anthropic packages multiple tool_results in one role: "user" message's content array; OpenAI requires each tool result as a separate role: "tool" message:
let toolResults = blocks.filter { $0["type"] as? String == "tool_result" }
if !toolResults.isEmpty {
    return toolResults.map { block in
        [
            "role": "tool",
            "tool_call_id": block["tool_use_id"] as? String ?? "",
            "content": block["content"] ?? "",
        ]
    }
}
3. Tool Use Representation
Anthropic uses type: "tool_use" blocks in the content array; OpenAI uses a tool_calls array at the message top level:
result["tool_calls"] = toolUseBlocks.enumerated().map { index, block in
    let inputDict = block["input"] as? [String: Any] ?? [:]
    let arguments = (try? JSONSerialization.data(withJSONObject: inputDict, options: []))
        .flatMap { String(data: $0, encoding: .utf8) } ?? "{}"
    return [
        "id": block["id"] as? String ?? "call_\(index)",
        "type": "function",
        "function": [
            "name": block["name"] as? String ?? "",
            "arguments": arguments, // OpenAI requires a JSON string, not a dictionary
        ],
    ]
}
Note that OpenAI's arguments must be a JSON string, not a dictionary object — serialization is done here.
Response Format Conversion
OpenAI's response structure (choices[0].message) needs conversion to Anthropic format:
// stop_reason mapping
private static func mapStopReason(_ finishReason: String) -> String {
    switch finishReason {
    case "stop": return "end_turn"
    case "tool_calls": return "tool_use"
    case "length": return "max_tokens"
    default: return finishReason
    }
}

// usage mapping
usage = [
    "input_tokens": openAIUsage["prompt_tokens"] as? Int ?? 0,
    "output_tokens": openAIUsage["completion_tokens"] as? Int ?? 0,
]
Streaming Conversion
Streaming conversion is more complex. OpenAI's streaming format (data: {"choices":[{"delta":{...}}]}) must be converted chunk by chunk to Anthropic's SSEEvent sequence:
- First chunk → messageStart
- Text delta → contentBlockDelta(type: "text_delta")
- Tool call start → contentBlockStart(type: "tool_use"); parameter delta → contentBlockDelta(type: "input_json_delta")
- End → contentBlockStop + messageDelta + messageStop
The conversion function tracks how many content blocks are open, whether text blocks are closed, and which tool call blocks are still open to generate correct index values. A safety check ensures messageStop is always emitted, even if the original stream doesn't end normally.
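The state tracking described above can be sketched as a small function. This is a simplification — text-only, single choice, event names as strings — and not the SDK's actual converter, but it shows the same bookkeeping: has the message started, is a content block open, and the safety check that always closes the stream:

```swift
import Foundation

// Sketch of the chunk-by-chunk conversion (simplified: text only, one choice).
// Event names mirror the Anthropic SSE stream; the real converter also
// handles tool calls, block indexes, and usage deltas.
func convertChunks(_ chunks: [[String: Any]]) -> [String] {
    var events: [String] = []
    var messageStarted = false
    var blockOpen = false
    for chunk in chunks {
        guard let choice = (chunk["choices"] as? [[String: Any]])?.first else { continue }
        if !messageStarted {
            events.append("message_start")
            messageStarted = true
        }
        if let delta = choice["delta"] as? [String: Any],
           delta["content"] as? String != nil {
            if !blockOpen {
                events.append("content_block_start")
                blockOpen = true
            }
            events.append("content_block_delta")
        }
        if choice["finish_reason"] as? String != nil {
            if blockOpen {
                events.append("content_block_stop")
                blockOpen = false
            }
            events.append("message_delta")
        }
    }
    // Safety check: always close the stream, even if it ended abnormally.
    if messageStarted { events.append("message_stop") }
    return events
}
```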
Usage Examples
Connecting to different OpenAI-compatible providers only requires changing baseURL and model:
// DeepSeek
let agent = createAgent(options: AgentOptions(
    apiKey: "sk-...",
    model: "deepseek-chat",
    baseURL: "https://api.deepseek.com/v1",
    provider: .openai
))

// Ollama local
let localAgent = createAgent(options: AgentOptions(
    apiKey: "ollama", // Ollama doesn't need a key; any value works
    model: "qwen3:8b",
    baseURL: "http://localhost:11434/v1",
    provider: .openai
))

// GLM
let glmAgent = createAgent(options: AgentOptions(
    apiKey: "xxx.glm-xxx",
    model: "glm-4-plus",
    baseURL: "https://open.bigmodel.cn/api/paas/v4",
    provider: .openai
))
4. Runtime Model Switching
The SDK supports dynamic model switching at runtime without recreating the Agent:
let agent = createAgent(options: AgentOptions(
    apiKey: apiKey,
    model: "claude-sonnet-4-6",
    fallbackModel: "claude-haiku-4-5" // Used if the primary model fails
))

// Use sonnet for a simple question first
let result1 = await agent.prompt("What is 2 + 3?")
print(result1.costBreakdown)
// [CostBreakdownEntry(model: "claude-sonnet-4-6", inputTokens: 45, outputTokens: 3, costUsd: 0.000180)]

// Switch to opus for a reasoning-intensive question
try agent.switchModel("claude-opus-4-6")
let result2 = await agent.prompt("Explain the difference between structs and classes in Swift.")
print(result2.costBreakdown)
// [CostBreakdownEntry(model: "claude-opus-4-6", inputTokens: 52, outputTokens: 156, costUsd: 0.011970)]
switchModel() implementation:
public func switchModel(_ model: String) throws {
    let trimmed = model.trimmingCharacters(in: .whitespacesAndNewlines)
    guard !trimmed.isEmpty else {
        throw SDKError.invalidConfiguration("Model name cannot be empty")
    }
    let oldModel = self.model
    self.model = trimmed
    self.options.model = trimmed
    Logger.shared.info("Agent", "model_switch", data: ["from": oldModel, "to": trimmed])
}
No allowlist validation — whatever model name is passed gets used. Unsupported models will error at the API level. This design choice exists because OpenAI-compatible provider model names can't be exhaustively listed.
fallbackModel is configured in AgentOptions. When the primary model fails completely (retries exhausted), the SDK automatically retries with the fallback:
if let fallbackModel = self.options.fallbackModel, fallbackModel != self.model {
    let fallbackResponse = try await retryClient.sendMessage(
        model: fallbackModel,
        messages: retryMessages, ...
    )
    // Temporarily switch to the fallback for cost tracking
    let originalModel = self.model
    self.model = fallbackModel
    // ... process response
}
Per-Model Cost Breakdown
CostBreakdownEntry records costs grouped by model name:
public struct CostBreakdownEntry: Sendable, Equatable {
    public let model: String
    public let inputTokens: Int
    public let outputTokens: Int
    public let costUsd: Double
}
If models are switched mid-query (or fallback triggered), QueryResult.costBreakdown contains multiple entries with per-model costs. Costs are calculated from built-in price tables:
public nonisolated(unsafe) var MODEL_PRICING: [String: ModelPricing] = [
    "claude-opus-4-6": ModelPricing(input: 15.0 / 1_000_000, output: 75.0 / 1_000_000),
    "claude-sonnet-4-6": ModelPricing(input: 3.0 / 1_000_000, output: 15.0 / 1_000_000),
    "claude-haiku-4-5": ModelPricing(input: 0.8 / 1_000_000, output: 4.0 / 1_000_000),
    // ...
]
Custom models can register pricing via registerModel(_:pricing:):
registerModel("glm-4-plus", pricing: ModelPricing(
    input: 0.1 / 1_000_000, output: 0.1 / 1_000_000
))
5. Thinking and Effort Configuration
ThinkingConfig
The SDK uses the ThinkingConfig enum to control LLM deep thinking:
public enum ThinkingConfig: Sendable, Equatable {
    case adaptive                   // Model decides whether to think
    case enabled(budgetTokens: Int) // Specify the thinking token budget
    case disabled                   // Disable deep thinking
}
Three modes for different uses:
- adaptive: Let the model judge — no thinking for simple questions, automatic thinking for complex ones. Most convenient for daily use.
- enabled(budgetTokens:): Explicitly control the thinking budget — for example, allocate 10,000 thinking tokens for deep analysis.
- disabled: Turn off thinking entirely for maximum speed.
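One plausible way these cases map onto the `thinking` request parameter is sketched below. The dictionary shapes follow the Anthropic API convention; the exact mapping inside the SDK may differ, and the enum is redeclared here only so the sketch is self-contained:

```swift
// Redeclared locally for the sketch; matches the SDK's public cases.
enum ThinkingConfig {
    case adaptive
    case enabled(budgetTokens: Int)
    case disabled
}

// Assumed mapping (not confirmed SDK internals): adaptive omits the
// parameter so the model decides; the other cases set it explicitly.
func thinkingParameter(for config: ThinkingConfig) -> [String: Any]? {
    switch config {
    case .adaptive:
        return nil  // omit the parameter; let the model decide
    case .enabled(let budget):
        return ["type": "enabled", "budget_tokens": budget]
    case .disabled:
        return ["type": "disabled"]
    }
}
```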
EffortLevel
EffortLevel is a higher-level abstraction mapping to specific thinking token budgets:
public enum EffortLevel: String, Sendable, CaseIterable {
    case low    // 1024 tokens
    case medium // 5120 tokens
    case high   // 10240 tokens
    case max    // 32768 tokens

    public var budgetTokens: Int {
        switch self {
        case .low: return 1024
        case .medium: return 5120
        case .high: return 10240
        case .max: return 32768
        }
    }
}
Set in AgentOptions:
let agent = createAgent(options: AgentOptions(
    apiKey: apiKey,
    model: "claude-sonnet-4-6",
    effort: .high // 10240 thinking tokens
))
Runtime Dynamic Adjustment
setMaxThinkingTokens() adjusts the thinking budget between queries:
// Simple question, fewer thinking tokens
try agent.setMaxThinkingTokens(2048)
let r1 = await agent.prompt("Summarize this file.")
// Complex reasoning, increase budget
try agent.setMaxThinkingTokens(16000)
let r2 = await agent.prompt("Design a concurrent data structure for...")
// Disable thinking
try agent.setMaxThinkingTokens(nil)
Positive integer enables thinking with the specified budget; nil disables it. Zero or negative throws SDKError.invalidConfiguration.
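The validation rule can be sketched as follows. The helper and the trimmed-down `SDKError` are illustrative stand-ins, not the SDK's internals:

```swift
// Minimal stand-in for the SDK's error type.
enum SDKError: Error { case invalidConfiguration(String) }

// Sketch of the rule stated above: nil disables thinking, a positive value
// sets the budget, and zero/negative throws invalidConfiguration.
func validateThinkingBudget(_ tokens: Int?) throws -> [String: Any]? {
    guard let tokens else { return nil }  // nil = thinking disabled
    guard tokens > 0 else {
        throw SDKError.invalidConfiguration("Thinking budget must be positive")
    }
    return ["type": "enabled", "budget_tokens": tokens]
}
```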
ModelInfo describes each model's capabilities:
public struct ModelInfo: Sendable, Equatable {
    public let value: String
    public let displayName: String
    public let description: String
    public let supportsEffort: Bool
    public let supportedEffortLevels: [EffortLevel]?
    public let supportsAdaptiveThinking: Bool?
    public let supportsFastMode: Bool?
}
This lets UI layers dynamically show available options based on model capabilities.
6. Skills System
Skills are a special extension mechanism in the SDK — essentially "prompt templates with tool restrictions." A Skill defines a set of prompt instructions, an allowed tool subset, and an optional model override.
Skill Structure
public struct Skill: Sendable {
    public let name: String
    public let description: String
    public let aliases: [String]                    // Aliases, e.g. ["ci"] for commit
    public let userInvocable: Bool                  // Whether users can invoke via /command
    public let toolRestrictions: [ToolRestriction]? // Restrict available tools, nil = all
    public let modelOverride: String?               // Override model during execution
    public let isAvailable: @Sendable () -> Bool    // Runtime availability check
    public let promptTemplate: String               // Prompt template content
    public let whenToUse: String?                   // Tell the LLM when to use this skill
    public let argumentHint: String?                // Argument hint, e.g. "[message]"
    public let baseDir: String?                     // Absolute path to the skill directory
    public let supportingFiles: [String]            // Supporting files (references, scripts, etc.)
}
5 Built-in Skills
The SDK predefines 5 common Skills accessible via the BuiltInSkills namespace:
| Skill | Aliases | Allowed Tools | Function |
|---|---|---|---|
| commit | ci | bash, read, glob, grep | Analyze git diff, generate commit message |
| review | review-pr, cr | bash, read, glob, grep | Review code changes from 5 dimensions |
| simplify | — | bash, read, grep, glob | Review code for reuse, quality, efficiency |
| debug | investigate, diagnose | read, grep, glob, bash | Analyze errors, locate root cause |
| test | run-tests | bash, read, write, glob, grep | Generate and execute test cases |
Each Skill restricts its tool scope. commit only allows bash, read, glob, grep — no file writing needed. debug is also read-only (read, grep, glob, bash), diagnosing without modifying. test is the only built-in Skill allowing write, since it creates test files.
test Skill also has a runtime availability check:
isAvailable: {
    let cwd = FileManager.default.currentDirectoryPath
    let testIndicators = [
        "Package.swift", "pytest.ini", "jest.config",
        "vitest.config", "Cargo.toml", "go.mod",
    ]
    for indicator in testIndicators {
        if FileManager.default.fileExists(atPath: cwd + "/" + indicator) {
            return true
        }
    }
    return false
}
The test Skill is only visible to users when a test framework configuration file is detected.
SkillRegistry
SkillRegistry is a thread-safe skill manager using DispatchQueue for concurrent access protection:
public final class SkillRegistry: @unchecked Sendable {
    private var skills: [String: Skill] = [:]
    private var orderedNames: [String] = []
    private var aliases: [String: String] = [:]
    private let queue = DispatchQueue(label: "com.openagentsdk.skillregistry")

    public func register(_ skill: Skill) { ... }
    public func find(_ name: String) -> Skill? { ... } // Find by name or alias
    public var allSkills: [Skill] { ... }
    public var userInvocableSkills: [Skill] { ... }
}
Register, find, replace, and delete are all queue.sync-protected operations. Aliases automatically build mappings on registration — after registering BuiltInSkills.commit, registry.find("ci") also finds it.
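The alias mechanism boils down to a second dictionary mapping alias → canonical name, consulted on lookup. A simplified, self-contained sketch (skills reduced to name → prompt strings; `MiniSkillRegistry` is illustrative, not SDK code):

```swift
import Foundation

// Simplified sketch of the alias mechanism: registration records each alias
// against the canonical name, so find(_:) resolves both.
final class MiniSkillRegistry {
    private var skills: [String: String] = [:]   // name -> prompt (stand-in for Skill)
    private var aliases: [String: String] = [:]  // alias -> canonical name
    private let queue = DispatchQueue(label: "mini.skillregistry")

    func register(name: String, aliases aliasList: [String], prompt: String) {
        queue.sync {
            skills[name] = prompt
            for alias in aliasList { aliases[alias] = name }
        }
    }

    func find(_ nameOrAlias: String) -> String? {
        queue.sync { () -> String? in
            let canonical = aliases[nameOrAlias] ?? nameOrAlias
            return skills[canonical]
        }
    }
}
```

Both `find("commit")` and `find("ci")` resolve to the same skill after a single registration, mirroring the behavior described above.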
SkillLoader: Filesystem Discovery
Skills don't all need code registration. SkillLoader can automatically discover skills from the filesystem — any directory containing a SKILL.md file is recognized as a skill package.
Scanning directories by priority from low to high:
1. ~/.config/agents/skills (lowest priority)
2. ~/.agents/skills
3. ~/.claude/skills
4. $PWD/.agents/skills
5. $PWD/.claude/skills (highest priority)
Same-named skills discovered later override earlier ones (last-wins).
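Last-wins is just dictionary overwrite order when the per-directory results are merged from lowest to highest priority. A minimal sketch (skills reduced to name → source-directory pairs for illustration):

```swift
// Sketch of the last-wins merge: directories are processed in priority
// order (low first), so later entries overwrite same-named skills.
func mergeDiscovered(_ directories: [[String: String]]) -> [String: String] {
    var merged: [String: String] = [:]
    for dirSkills in directories {   // low priority first
        for (name, source) in dirSkills {
            merged[name] = source    // later entries overwrite earlier ones
        }
    }
    return merged
}
```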
SKILL.md uses YAML frontmatter for metadata:
---
name: polyv-live-cli
description: Manage live streaming services
aliases: live, plv
allowed-tools: Bash, Read, Write, Glob
when-to-use: user asks about live streaming management
argument-hint: [action] [options]
---
# polyv-live-cli Skill
You are a live streaming management assistant...
The allowed-tools in frontmatter is parsed into ToolRestriction arrays, restricting which tools the skill can use during execution.
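Frontmatter parsing of this kind can be sketched with plain string splitting. This is a simplification — flat `key: value` lines between `---` fences, no nested YAML — and not the SDK's actual loader:

```swift
import Foundation

// Sketch of SKILL.md frontmatter parsing (simplified: flat `key: value`
// lines between `---` fences; the real loader also maps allowed-tools
// into ToolRestriction values).
func parseFrontmatter(_ markdown: String) -> (meta: [String: String], body: String) {
    let lines = markdown.components(separatedBy: "\n")
    guard lines.first == "---",
          let end = lines.dropFirst().firstIndex(of: "---") else {
        return ([:], markdown)  // no frontmatter: whole file is the body
    }
    var meta: [String: String] = [:]
    for line in lines[1..<end] {
        guard let colon = line.firstIndex(of: ":") else { continue }
        let key = String(line[..<colon]).trimmingCharacters(in: .whitespaces)
        let value = String(line[line.index(after: colon)...]).trimmingCharacters(in: .whitespaces)
        meta[key] = value
    }
    let body = lines[(end + 1)...].joined(separator: "\n")
    return (meta, body)
}
```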
SkillLoader uses a "progressive loading" strategy: only loading the SKILL.md Markdown body as the prompt template. Supporting files (references, scripts, templates) only have their paths recorded without loading content. The Agent reads them on-demand via Read/Bash tools when needed.
let registry = SkillRegistry()
registry.register(BuiltInSkills.commit)
registry.register(BuiltInSkills.review)
// Discover custom skills from filesystem
let count = registry.registerDiscoveredSkills()
// Or specify directories
registry.registerDiscoveredSkills(from: ["/opt/custom-skills"])
// Or only register whitelisted skills
registry.registerDiscoveredSkills(skillNames: ["polyv-live-cli"])
ToolRestriction
ToolRestriction enum defines restrictable tools:
public enum ToolRestriction: String, Sendable, CaseIterable {
    case bash, read, write, edit, glob, grep
    case webFetch, webSearch, askUser, toolSearch
    case agent, sendMessage
    case taskCreate, taskList, taskUpdate, taskGet, taskStop, taskOutput
    case teamCreate, teamDelete
    case notebookEdit, skill
}
When a Skill sets toolRestrictions: [.bash, .read, .glob], the Agent can only use these three tools during execution. Other tool calls are intercepted.
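The interception check reduces to a membership test against the active restriction set, with `nil` meaning "all tools allowed". A sketch using a subset of the enum's cases (the `isAllowed` helper is illustrative, not SDK API):

```swift
// Subset of the SDK's ToolRestriction cases, enough for the sketch.
enum ToolRestriction: String {
    case bash, read, write, glob, grep
}

// Sketch: before executing a tool call, check it against the active
// skill's allowed set. nil restrictions = every tool is allowed.
func isAllowed(tool: String, restrictions: [ToolRestriction]?) -> Bool {
    guard let restrictions else { return true }
    return restrictions.contains { $0.rawValue == tool.lowercased() }
}
```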
Using Skills in an Agent
To make Skills available to an Agent, add SkillTool to the tools list:
var tools = getAllBaseTools(tier: .core)
tools.append(createSkillTool(registry: registry))

let agent = createAgent(options: AgentOptions(
    apiKey: apiKey,
    model: "claude-sonnet-4-6",
    permissionMode: .bypassPermissions,
    tools: tools
))
// Agent auto-discovers and invokes based on skill list in system prompt
let result = await agent.prompt("Use the commit skill to analyze current changes")
SkillRegistry.formatSkillsForPrompt() generates a skill list snippet injected into the system prompt, including each skill's name, description, and trigger conditions. The LLM sees this list and knows when to invoke which skill.
7. Other Runtime Controls
Budget Control
maxBudgetUsd sets the cost ceiling per query:
let agent = createAgent(options: AgentOptions(
    apiKey: apiKey,
    model: "claude-sonnet-4-6",
    maxBudgetUsd: 0.05 // Maximum 5 cents
))
Cumulative cost is checked after each turn:
if let budget = options.maxBudgetUsd, totalCostUsd > budget {
    status = .errorMaxBudgetUsd
    break
}
When the budget is exceeded, the loop exits immediately. Any text and token statistics already generated are preserved in QueryResult — you get a partial result, not a blank one.
Query Interruption
Two ways to interrupt an in-progress query:
// Method 1: Call interrupt()
agent.interrupt()

// Method 2: Cancel the enclosing Task
let task = Task {
    await agent.prompt("Long running query...")
}
// Later
task.cancel()
interrupt() internally sets the _interrupted flag and cancels the stream task. The Agent Loop checks this flag at multiple checkpoints (loop entry, between read-only/mutation tools, inside SSE event loop, before/after tool execution), exiting immediately on detection.
Dynamic Permission Switching
Runtime permission mode and tool authorization callbacks can be switched:
// Switch permission mode
agent.setPermissionMode(.askForPermission)

// Set a custom authorization callback (takes priority over permissionMode)
agent.setCanUseTool { toolName, input in
    if toolName == "Bash" {
        return .deny("Bash is disabled")
    }
    return .allow
}

// Revert to permissionMode control
agent.setCanUseTool(nil)
setCanUseTool callback takes priority over permissionMode. Calling setPermissionMode() clears any previously set callback.
Environment Variable Configuration
The SDK supports configuration via environment variables. Priority: code settings > environment variables > defaults.
| Environment Variable | Corresponding Field | Default |
|---|---|---|
| CODEANY_API_KEY | apiKey | nil |
| CODEANY_MODEL | model | claude-sonnet-4-6 |
| CODEANY_BASE_URL | baseURL | nil (use provider default) |
Merged using SDKConfiguration.resolved():
// Code-set values take priority; unset values read from environment
let config = SDKConfiguration.resolved(overrides: SDKConfiguration(
apiKey: "sk-...", // Overrides CODEANY_API_KEY
model: "claude-sonnet-4-6" // Overrides CODEANY_MODEL
))
// Environment variables only
let envConfig = SDKConfiguration.fromEnvironment()
Retry Mechanism
All LLM requests are wrapped with withRetry:
public struct RetryConfig: Sendable {
    public let maxRetries: Int                 // Max retries, default 3
    public let baseDelayMs: Int                // Base delay, default 2000 ms
    public let maxDelayMs: Int                 // Max delay, default 30000 ms
    public let retryableStatusCodes: Set<Int>  // Default [429, 500, 502, 503, 529]
}
Exponential backoff + 25% random jitter to avoid thundering herd. Only SDKError.apiError with status codes in the retryable set triggers retries; other errors are thrown directly.
let delay = config.baseDelayMs * (1 << attempt)
let jitterMs = Int(Double(delay) * 0.25 * (Double.random(in: -1...1)))
let totalMs = max(0, min(delay + jitterMs, config.maxDelayMs))
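Leaving the random jitter aside, the deterministic part of the schedule is easy to verify: the delay for attempt n is baseDelayMs × 2ⁿ, capped at maxDelayMs. A self-contained sketch (the `backoffDelays` helper is mine, for illustration):

```swift
import Foundation

// Local stand-in mirroring the SDK's RetryConfig defaults.
struct RetryConfig {
    var maxRetries = 3
    var baseDelayMs = 2000
    var maxDelayMs = 30000
    var retryableStatusCodes: Set<Int> = [429, 500, 502, 503, 529]
}

// Sketch of the backoff schedule with jitter omitted for determinism:
// delay for attempt n is baseDelayMs * 2^n, capped at maxDelayMs.
func backoffDelays(_ config: RetryConfig) -> [Int] {
    (0..<config.maxRetries).map { attempt in
        min(config.baseDelayMs * (1 << attempt), config.maxDelayMs)
    }
}
```

With the defaults this yields delays of 2000, 4000, and 8000 ms before the ±25% jitter is applied.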
Series Recap
Six articles complete, covering the full architecture of Open Agent SDK (Swift):
- Part 0: Project overview — what the SDK does, overall architecture, how to use it
- Part 1: Agent Loop internals — the complete cycle from prompt to multi-turn conversation
- Part 2: 34 built-in tools — ToolProtocol design, three-tier architecture, custom extensions
- Part 3: MCP integration — connecting external tool servers, discovery, and communication
- Part 4: Multi-agent collaboration — Team/Task models, inter-agent communication
- Part 5: Session persistence and security — session storage, permission control, Hook system
- Part 6 (this article): Multi-LLM providers and runtime controls — LLMClient protocol, OpenAI adapter, model switching, Thinking/Effort, Skills system
Starting from the Agent Loop core, the tool system is the loop's "execution" stage, MCP is external tool extension, multi-agent is the collaboration pattern, sessions are state persistence, security and Hooks are governance mechanisms, and this article's multi-provider and runtime controls ensure flexibility — letting the same Agent choose the most appropriate model and control strategy for each scenario.
Deep Dive into Open Agent SDK (Swift) Series:
- Part 0: Open Agent SDK (Swift): Build AI Agent Applications with Native Swift Concurrency
- Part 1: Deep Dive into Open Agent SDK (Part 1): Agent Loop Internals
- Part 2: Deep Dive into Open Agent SDK (Part 2): Behind the 34 Built-in Tools
- Part 3: Deep Dive into Open Agent SDK (Part 3): MCP Integration in Practice
- Part 4: Deep Dive into Open Agent SDK (Part 4): Multi-Agent Collaboration
- Part 5: Deep Dive into Open Agent SDK (Part 5): Session Persistence and Security
- Part 6: Deep Dive into Open Agent SDK (Part 6): Multi-LLM Providers and Runtime Controls
GitHub: terryso/open-agent-sdk-swift