An Agent shouldn't be locked to a single LLM provider. Different tasks suit different models — simple questions use cheap models, complex reasoning uses expensive ones, and some scenarios even require local models. Runtime needs change too: users might want deeper thinking mid-session, discover the budget is running low and need to downgrade, or switch to a local model to save money.
Open Agent SDK's approach: define a unified LLMClient protocol, with Anthropic and OpenAI-compatible providers each having an implementation. Internally, the Agent uses Anthropic format throughout. Switching providers requires changing only one configuration parameter, and models can be switched dynamically at runtime with adjustable thinking depth and budget control.
This article analyzes the SDK's multi-provider adaptation mechanism and runtime control capabilities.
1. LLMClient Protocol — Unified Interface
First, the protocol definition:
public protocol LLMClient: Sendable {
    nonisolated func sendMessage(
        model: String,
        messages: [[String: Any]],
        maxTokens: Int,
        system: String?,
        tools: [[String: Any]]?,
        toolChoice: [String: Any]?,
        thinking: [String: Any]?,
        temperature: Double?
    ) async throws -> [String: Any]

    nonisolated func streamMessage(
        model: String,
        messages: [[String: Any]],
        maxTokens: Int,
        system: String?,
        tools: [[String: Any]]?,
        toolChoice: [String: Any]?,
        thinking: [String: Any]?,
        temperature: Double?
    ) async throws -> AsyncThrowingStream<SSEEvent, Error>
}
Two core methods: one blocking, one streaming. The parameter list covers all capabilities of mainstream LLM APIs: model selection, message history, token limit, system prompt, tool definitions, tool choice strategy, thinking configuration, and temperature.
Key design decision: return values are always in Anthropic format dictionaries. Whether the underlying API is Anthropic native or OpenAI-compatible, the Agent internally receives the same structure — content arrays with {"type": "text", "text": "..."} or {"type": "tool_use", "name": "...", "input": {...}}, and stop_reason as end_turn/tool_use/max_tokens. This means Agent Loop processing logic doesn't need to care about which API is underneath.
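Because every provider normalizes to the same shape, downstream code can consume responses with plain dictionary traversal. A minimal sketch of what that looks like — the helper names (`textBlocks`, `toolUseBlocks`) are mine for illustration, not SDK API:

```swift
import Foundation

// Hypothetical helpers (not SDK API): pull text and tool calls out of an
// Anthropic-format response dictionary.
func textBlocks(in response: [String: Any]) -> [String] {
    let content = response["content"] as? [[String: Any]] ?? []
    return content
        .filter { $0["type"] as? String == "text" }
        .compactMap { $0["text"] as? String }
}

func toolUseBlocks(in response: [String: Any]) -> [[String: Any]] {
    let content = response["content"] as? [[String: Any]] ?? []
    return content.filter { $0["type"] as? String == "tool_use" }
}

// Example response in the normalized format:
let response: [String: Any] = [
    "content": [
        ["type": "text", "text": "Let me check the file."],
        ["type": "tool_use", "name": "read", "input": ["path": "main.swift"]],
    ],
    "stop_reason": "tool_use",
]
```

The same two helpers work whether the response came from Claude or from an OpenAI-compatible backend — that's the payoff of normalizing at the client boundary.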
Streaming returns use AsyncThrowingStream<SSEEvent, Error>, where SSEEvent is an enum:
public enum SSEEvent: @unchecked Sendable {
    case messageStart(message: [String: Any])
    case contentBlockStart(index: Int, contentBlock: [String: Any])
    case contentBlockDelta(index: Int, delta: [String: Any])
    case contentBlockStop(index: Int)
    case messageDelta(delta: [String: Any], usage: [String: Any])
    case messageStop
    case ping
    case error(data: [String: Any])
}
Eight event types cover all streaming response events from the Anthropic Messages API. The OpenAI-compatible layer's streaming output is converted to the same SSEEvent sequence.
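A consumer of this stream typically folds deltas into accumulated state. A self-contained sketch, using a stand-in enum trimmed to the cases exercised here (the real SDK type carries more payload):

```swift
import Foundation

// Minimal stand-in for the SDK's SSEEvent, trimmed to the cases used below.
enum SSEEvent {
    case messageStart
    case contentBlockDelta(index: Int, delta: [String: Any])
    case messageStop
}

// Sketch: accumulate streamed text deltas the way a stream consumer might.
func accumulateText(_ events: [SSEEvent]) -> String {
    var text = ""
    for event in events {
        if case let .contentBlockDelta(_, delta) = event,
           delta["type"] as? String == "text_delta",
           let chunk = delta["text"] as? String {
            text += chunk
        }
    }
    return text
}
```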
2. AnthropicClient — Native Claude API
AnthropicClient is the Anthropic native implementation of LLMClient, using actor for concurrency safety:
public actor AnthropicClient: LLMClient {
    private let apiKey: String
    private let baseURL: URL // Default https://api.anthropic.com
    private let urlSession: URLSession

    public init(apiKey: String, baseURL: String? = nil, urlSession: URLSession? = nil) {
        self.apiKey = apiKey
        self.baseURL = URL(string: baseURL ?? "https://api.anthropic.com")!
        self.urlSession = urlSession ?? URLSession.shared
    }
}
Requests are POST to /v1/messages with x-api-key and anthropic-version headers:
private nonisolated func buildRequest(body: [String: Any]) throws -> URLRequest {
    var request = URLRequest(url: URL(string: baseURL.absoluteString + "/v1/messages")!)
    request.httpMethod = "POST"
    request.timeoutInterval = 300
    request.setValue(apiKey, forHTTPHeaderField: "x-api-key")
    request.setValue("2023-06-01", forHTTPHeaderField: "anthropic-version")
    request.setValue("application/json", forHTTPHeaderField: "content-type")
    request.httpBody = try JSONSerialization.data(withJSONObject: body, options: [])
    return request
}
Since it uses the Anthropic native API, sendMessage request and response bodies don't need format conversion — request parameters are assembled directly as dictionaries, responses are parsed directly. Streaming mode directly parses Anthropic SSE text.
A security detail: all error messages replace the API Key with *** to prevent key leakage into logs:
let safeMessage = errorMessage.replacingOccurrences(of: apiKey, with: "***")
AnthropicClient directly supports Extended Thinking. When the Agent configures ThinkingConfig, the thinking parameter is passed through:
if let thinking {
    body["thinking"] = thinking
}
3. OpenAI-Compatible Layer — Adapting GLM/Ollama/OpenRouter etc.
OpenAIClient is the heavy lifter. It accepts Anthropic-format parameters, converts them to OpenAI Chat Completion API format, sends the request, then converts the OpenAI response back to Anthropic format. The Agent is completely unaware of the underlying OpenAI-compatible API.
public actor OpenAIClient: LLMClient {
    private let apiKey: String
    private let baseURL: URL // Default https://api.openai.com/v1

    public init(apiKey: String, baseURL: String? = nil, urlSession: URLSession? = nil) {
        self.apiKey = apiKey
        self.baseURL = URL(string: baseURL ?? "https://api.openai.com/v1")!
        self.urlSession = urlSession ?? URLSession.shared
    }
}
Requests go to /chat/completions with Bearer token authentication — standard practice for OpenAI-compatible APIs. Any provider supporting the /v1/chat/completions endpoint works with this client.
Message Format Conversion
Several key differences between Anthropic and OpenAI message formats must be handled during conversion:
1. System Message Position
Anthropic passes the system prompt as a top-level parameter; OpenAI includes it as the first role: "system" message:
if let system {
    result.append(["role": "system", "content": system])
}
2. Tool Result Representation
Anthropic packages multiple tool_results in one role: "user" message's content array; OpenAI requires each tool result as a separate role: "tool" message:
let toolResults = blocks.filter { $0["type"] as? String == "tool_result" }
if !toolResults.isEmpty {
    return toolResults.map { block in
        [
            "role": "tool",
            "tool_call_id": block["tool_use_id"] as? String ?? "",
            "content": block["content"] ?? "",
        ]
    }
}
3. Tool Use Representation
Anthropic uses type: "tool_use" blocks in the content array; OpenAI uses a tool_calls array at the message top level:
result["tool_calls"] = toolUseBlocks.enumerated().map { index, block in
    let inputDict = block["input"] as? [String: Any] ?? [:]
    let arguments = (try? JSONSerialization.data(withJSONObject: inputDict, options: []))
        .flatMap { String(data: $0, encoding: .utf8) } ?? "{}"
    return [
        "id": block["id"] as? String ?? "call_\(index)",
        "type": "function",
        "function": [
            "name": block["name"] as? String ?? "",
            "arguments": arguments, // OpenAI requires a JSON string, not a dictionary
        ],
    ]
}
Note that OpenAI's arguments must be a JSON string, not a dictionary object — serialization is done here.
Response Format Conversion
OpenAI's response structure (choices[0].message) needs conversion to Anthropic format:
// stop_reason mapping
private static func mapStopReason(_ finishReason: String) -> String {
    switch finishReason {
    case "stop": return "end_turn"
    case "tool_calls": return "tool_use"
    case "length": return "max_tokens"
    default: return finishReason
    }
}

// usage mapping
usage = [
    "input_tokens": openAIUsage["prompt_tokens"] as? Int ?? 0,
    "output_tokens": openAIUsage["completion_tokens"] as? Int ?? 0,
]
Streaming Conversion
Streaming conversion is more complex. OpenAI's streaming format (data: {"choices":[{"delta":{...}}]}) must be converted chunk by chunk to Anthropic's SSEEvent sequence:
- First chunk → messageStart
- Text delta → contentBlockDelta(type: "text_delta")
- Tool call start → contentBlockStart(type: "tool_use"); parameter delta → contentBlockDelta(type: "input_json_delta")
- End → contentBlockStop + messageDelta + messageStop
The conversion function tracks how many content blocks are open, whether text blocks are closed, and which tool call blocks are still open to generate correct index values. A safety check ensures messageStop is always emitted, even if the original stream doesn't end normally.
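The state tracking described above can be sketched as a small function. This is a simplification — text-only, single choice, event names as strings — and not the SDK's actual converter, but it shows the same bookkeeping: has the message started, is a content block open, and the safety check that always closes the stream:

```swift
import Foundation

// Sketch of the chunk-by-chunk conversion (simplified: text only, one choice).
// Event names mirror the Anthropic SSE stream; the real converter also
// handles tool calls, block indexes, and usage deltas.
func convertChunks(_ chunks: [[String: Any]]) -> [String] {
    var events: [String] = []
    var messageStarted = false
    var blockOpen = false
    for chunk in chunks {
        guard let choice = (chunk["choices"] as? [[String: Any]])?.first else { continue }
        if !messageStarted {
            events.append("message_start")
            messageStarted = true
        }
        if let delta = choice["delta"] as? [String: Any],
           delta["content"] as? String != nil {
            if !blockOpen {
                events.append("content_block_start")
                blockOpen = true
            }
            events.append("content_block_delta")
        }
        if choice["finish_reason"] as? String != nil {
            if blockOpen {
                events.append("content_block_stop")
                blockOpen = false
            }
            events.append("message_delta")
        }
    }
    // Safety check: always close the stream, even if it ended abnormally.
    if messageStarted { events.append("message_stop") }
    return events
}
```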
Usage Examples
Connecting to different OpenAI-compatible providers only requires changing baseURL and model:
// DeepSeek
let agent = createAgent(options: AgentOptions(
    apiKey: "sk-...",
    model: "deepseek-chat",
    baseURL: "https://api.deepseek.com/v1",
    provider: .openai
))

// Ollama local
let localAgent = createAgent(options: AgentOptions(
    apiKey: "ollama", // Ollama doesn't need a key; any value works
    model: "qwen3:8b",
    baseURL: "http://localhost:11434/v1",
    provider: .openai
))

// GLM
let glmAgent = createAgent(options: AgentOptions(
    apiKey: "xxx.glm-xxx",
    model: "glm-4-plus",
    baseURL: "https://open.bigmodel.cn/api/paas/v4",
    provider: .openai
))
4. Runtime Model Switching
The SDK supports dynamic model switching at runtime without recreating the Agent:
let agent = createAgent(options: AgentOptions(
    apiKey: apiKey,
    model: "claude-sonnet-4-6",
    fallbackModel: "claude-haiku-4-5" // Used if the primary model fails
))

// Use sonnet for a simple question first
let result1 = await agent.prompt("What is 2 + 3?")
print(result1.costBreakdown)
// [CostBreakdownEntry(model: "claude-sonnet-4-6", inputTokens: 45, outputTokens: 3, costUsd: 0.000180)]

// Switch to opus for a reasoning-intensive question
try agent.switchModel("claude-opus-4-6")
let result2 = await agent.prompt("Explain the difference between structs and classes in Swift.")
print(result2.costBreakdown)
// [CostBreakdownEntry(model: "claude-opus-4-6", inputTokens: 52, outputTokens: 156, costUsd: 0.011970)]
switchModel() implementation:
public func switchModel(_ model: String) throws {
    let trimmed = model.trimmingCharacters(in: .whitespacesAndNewlines)
    guard !trimmed.isEmpty else {
        throw SDKError.invalidConfiguration("Model name cannot be empty")
    }
    let oldModel = self.model
    self.model = trimmed
    self.options.model = trimmed
    Logger.shared.info("Agent", "model_switch", data: ["from": oldModel, "to": trimmed])
}
No allowlist validation — whatever model name is passed gets used. Unsupported models will error at the API level. This design choice exists because OpenAI-compatible provider model names can't be exhaustively listed.
fallbackModel is configured in AgentOptions. When the primary model fails completely (retries exhausted), the SDK automatically retries with the fallback:
if let fallbackModel = self.options.fallbackModel, fallbackModel != self.model {
    let fallbackResponse = try await retryClient.sendMessage(
        model: fallbackModel,
        messages: retryMessages, ...
    )
    // Temporarily switch to the fallback for cost tracking
    let originalModel = self.model
    self.model = fallbackModel
    // ... process response
}
Per-Model Cost Breakdown
CostBreakdownEntry records costs grouped by model name:
public struct CostBreakdownEntry: Sendable, Equatable {
    public let model: String
    public let inputTokens: Int
    public let outputTokens: Int
    public let costUsd: Double
}
If models are switched mid-query (or fallback triggered), QueryResult.costBreakdown contains multiple entries with per-model costs. Costs are calculated from built-in price tables:
public nonisolated(unsafe) var MODEL_PRICING: [String: ModelPricing] = [
    "claude-opus-4-6": ModelPricing(input: 15.0 / 1_000_000, output: 75.0 / 1_000_000),
    "claude-sonnet-4-6": ModelPricing(input: 3.0 / 1_000_000, output: 15.0 / 1_000_000),
    "claude-haiku-4-5": ModelPricing(input: 0.8 / 1_000_000, output: 4.0 / 1_000_000),
    // ...
]
Custom models can register pricing via registerModel(_:pricing:):
registerModel("glm-4-plus", pricing: ModelPricing(
    input: 0.1 / 1_000_000, output: 0.1 / 1_000_000
))
5. Thinking and Effort Configuration
ThinkingConfig
The SDK uses the ThinkingConfig enum to control LLM deep thinking:
public enum ThinkingConfig: Sendable, Equatable {
    case adaptive                   // Model decides whether to think
    case enabled(budgetTokens: Int) // Specify the thinking token budget
    case disabled                   // Disable deep thinking
}
Three modes for different uses:
- adaptive: Let the model judge — no thinking for simple questions, automatic thinking for complex ones. Most convenient for daily use.
- enabled(budgetTokens:): Explicitly control the thinking budget — for example, allocate 10,000 thinking tokens for deep analysis.
- disabled: Turn off thinking entirely for maximum speed.
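One plausible way these cases map onto the `thinking` request parameter is sketched below. The dictionary shapes follow the Anthropic API convention; the exact mapping inside the SDK may differ, and the enum is redeclared here only so the sketch is self-contained:

```swift
// Redeclared locally for the sketch; matches the SDK's public cases.
enum ThinkingConfig {
    case adaptive
    case enabled(budgetTokens: Int)
    case disabled
}

// Assumed mapping (not confirmed SDK internals): adaptive omits the
// parameter so the model decides; the other cases set it explicitly.
func thinkingParameter(for config: ThinkingConfig) -> [String: Any]? {
    switch config {
    case .adaptive:
        return nil  // omit the parameter; let the model decide
    case .enabled(let budget):
        return ["type": "enabled", "budget_tokens": budget]
    case .disabled:
        return ["type": "disabled"]
    }
}
```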
EffortLevel
EffortLevel is a higher-level abstraction mapping to specific thinking token budgets:
public enum EffortLevel: String, Sendable, CaseIterable {
    case low    // 1024 tokens
    case medium // 5120 tokens
    case high   // 10240 tokens
    case max    // 32768 tokens

    public var budgetTokens: Int {
        switch self {
        case .low: return 1024
        case .medium: return 5120
        case .high: return 10240
        case .max: return 32768
        }
    }
}
Set in AgentOptions:
let agent = createAgent(options: AgentOptions(
    apiKey: apiKey,
    model: "claude-sonnet-4-6",
    effort: .high // 10240 thinking tokens
))
Runtime Dynamic Adjustment
setMaxThinkingTokens() adjusts the thinking budget between queries:
// Simple question, fewer thinking tokens
try agent.setMaxThinkingTokens(2048)
let r1 = await agent.prompt("Summarize this file.")
// Complex reasoning, increase budget
try agent.setMaxThinkingTokens(16000)
let r2 = await agent.prompt("Design a concurrent data structure for...")
// Disable thinking
try agent.setMaxThinkingTokens(nil)
Positive integer enables thinking with the specified budget; nil disables it. Zero or negative throws SDKError.invalidConfiguration.
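The validation rule can be sketched as follows. The helper and the trimmed-down `SDKError` are illustrative stand-ins, not the SDK's internals:

```swift
// Minimal stand-in for the SDK's error type.
enum SDKError: Error { case invalidConfiguration(String) }

// Sketch of the rule stated above: nil disables thinking, a positive value
// sets the budget, and zero/negative throws invalidConfiguration.
func validateThinkingBudget(_ tokens: Int?) throws -> [String: Any]? {
    guard let tokens else { return nil }  // nil = thinking disabled
    guard tokens > 0 else {
        throw SDKError.invalidConfiguration("Thinking budget must be positive")
    }
    return ["type": "enabled", "budget_tokens": tokens]
}
```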
ModelInfo describes each model's capabilities:
public struct ModelInfo: Sendable, Equatable {
    public let value: String
    public let displayName: String
    public let description: String
    public let supportsEffort: Bool
    public let supportedEffortLevels: [EffortLevel]?
    public let supportsAdaptiveThinking: Bool?
    public let supportsFastMode: Bool?
}
This lets UI layers dynamically show available options based on model capabilities.
6. Skills System
Skills are a special extension mechanism in the SDK — essentially "prompt templates with tool restrictions." A Skill defines a set of prompt instructions, an allowed tool subset, and an optional model override.
Skill Structure
public struct Skill: Sendable {
    public let name: String
    public let description: String
    public let aliases: [String]                    // Aliases, e.g. ["ci"] for commit
    public let userInvocable: Bool                  // Whether users can invoke via /command
    public let toolRestrictions: [ToolRestriction]? // Restrict available tools, nil = all
    public let modelOverride: String?               // Override model during execution
    public let isAvailable: @Sendable () -> Bool    // Runtime availability check
    public let promptTemplate: String               // Prompt template content
    public let whenToUse: String?                   // Tell the LLM when to use this skill
    public let argumentHint: String?                // Argument hint, e.g. "[message]"
    public let baseDir: String?                     // Absolute path to the skill directory
    public let supportingFiles: [String]            // Supporting files (references, scripts, etc.)
}
5 Built-in Skills
The SDK predefines 5 common Skills accessible via the BuiltInSkills namespace:
| Skill | Aliases | Allowed Tools | Function |
|---|---|---|---|
| commit | ci | bash, read, glob, grep | Analyze git diff, generate commit message |
| review | review-pr, cr | bash, read, glob, grep | Review code changes from 5 dimensions |
| simplify | — | bash, read, grep, glob | Review code for reuse, quality, efficiency |
| debug | investigate, diagnose | read, grep, glob, bash | Analyze errors, locate root cause |
| test | run-tests | bash, read, write, glob, grep | Generate and execute test cases |
Each Skill restricts its tool scope. commit only allows bash, read, glob, grep — no file writing needed. debug is also read-only (read, grep, glob, bash), diagnosing without modifying. test is the only built-in Skill allowing write, since it creates test files.
test Skill also has a runtime availability check:
isAvailable: {
    let cwd = FileManager.default.currentDirectoryPath
    let testIndicators = [
        "Package.swift", "pytest.ini", "jest.config",
        "vitest.config", "Cargo.toml", "go.mod",
    ]
    for indicator in testIndicators {
        if FileManager.default.fileExists(atPath: cwd + "/" + indicator) {
            return true
        }
    }
    return false
}
The test Skill is only visible to users when a test framework configuration file is detected.
SkillRegistry
SkillRegistry is a thread-safe skill manager using DispatchQueue for concurrent access protection:
public final class SkillRegistry: @unchecked Sendable {
    private var skills: [String: Skill] = [:]
    private var orderedNames: [String] = []
    private var aliases: [String: String] = [:]
    private let queue = DispatchQueue(label: "com.openagentsdk.skillregistry")

    public func register(_ skill: Skill) { ... }
    public func find(_ name: String) -> Skill? { ... } // Find by name or alias
    public var allSkills: [Skill] { ... }
    public var userInvocableSkills: [Skill] { ... }
}
Register, find, replace, and delete are all queue.sync-protected operations. Aliases automatically build mappings on registration — after registering BuiltInSkills.commit, registry.find("ci") also finds it.
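The alias mechanism boils down to a second dictionary mapping alias → canonical name, consulted on lookup. A simplified, self-contained sketch (skills reduced to name → prompt strings; `MiniSkillRegistry` is illustrative, not SDK code):

```swift
import Foundation

// Simplified sketch of the alias mechanism: registration records each alias
// against the canonical name, so find(_:) resolves both.
final class MiniSkillRegistry {
    private var skills: [String: String] = [:]   // name -> prompt (stand-in for Skill)
    private var aliases: [String: String] = [:]  // alias -> canonical name
    private let queue = DispatchQueue(label: "mini.skillregistry")

    func register(name: String, aliases aliasList: [String], prompt: String) {
        queue.sync {
            skills[name] = prompt
            for alias in aliasList { aliases[alias] = name }
        }
    }

    func find(_ nameOrAlias: String) -> String? {
        queue.sync { () -> String? in
            let canonical = aliases[nameOrAlias] ?? nameOrAlias
            return skills[canonical]
        }
    }
}
```

Both `find("commit")` and `find("ci")` resolve to the same skill after a single registration, mirroring the behavior described above.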
SkillLoader: Filesystem Discovery
Skills don't all need code registration. SkillLoader can automatically discover skills from the filesystem — any directory containing a SKILL.md file is recognized as a skill package.
Scanning directories by priority from low to high:
1. ~/.config/agents/skills (lowest priority)
2. ~/.agents/skills
3. ~/.claude/skills
4. $PWD/.agents/skills
5. $PWD/.claude/skills (highest priority)
Same-named skills discovered later override earlier ones (last-wins).
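Last-wins is just dictionary overwrite order when the per-directory results are merged from lowest to highest priority. A minimal sketch (skills reduced to name → source-directory pairs for illustration):

```swift
// Sketch of the last-wins merge: directories are processed in priority
// order (low first), so later entries overwrite same-named skills.
func mergeDiscovered(_ directories: [[String: String]]) -> [String: String] {
    var merged: [String: String] = [:]
    for dirSkills in directories {   // low priority first
        for (name, source) in dirSkills {
            merged[name] = source    // later entries overwrite earlier ones
        }
    }
    return merged
}
```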
SKILL.md uses YAML frontmatter for metadata:
---
name: polyv-live-cli
description: Manage live streaming services
aliases: live, plv
allowed-tools: Bash, Read, Write, Glob
when-to-use: user asks about live streaming management
argument-hint: [action] [options]
---
# polyv-live-cli Skill
You are a live streaming management assistant...
The allowed-tools in frontmatter is parsed into ToolRestriction arrays, restricting which tools the skill can use during execution.
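Frontmatter parsing of this kind can be sketched with plain string splitting. This is a simplification — flat `key: value` lines between `---` fences, no nested YAML — and not the SDK's actual loader:

```swift
import Foundation

// Sketch of SKILL.md frontmatter parsing (simplified: flat `key: value`
// lines between `---` fences; the real loader also maps allowed-tools
// into ToolRestriction values).
func parseFrontmatter(_ markdown: String) -> (meta: [String: String], body: String) {
    let lines = markdown.components(separatedBy: "\n")
    guard lines.first == "---",
          let end = lines.dropFirst().firstIndex(of: "---") else {
        return ([:], markdown)  // no frontmatter: whole file is the body
    }
    var meta: [String: String] = [:]
    for line in lines[1..<end] {
        guard let colon = line.firstIndex(of: ":") else { continue }
        let key = String(line[..<colon]).trimmingCharacters(in: .whitespaces)
        let value = String(line[line.index(after: colon)...]).trimmingCharacters(in: .whitespaces)
        meta[key] = value
    }
    let body = lines[(end + 1)...].joined(separator: "\n")
    return (meta, body)
}
```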
SkillLoader uses a "progressive loading" strategy: only loading the SKILL.md Markdown body as the prompt template. Supporting files (references, scripts, templates) only have their paths recorded without loading content. The Agent reads them on-demand via Read/Bash tools when needed.
let registry = SkillRegistry()
registry.register(BuiltInSkills.commit)
registry.register(BuiltInSkills.review)
// Discover custom skills from filesystem
let count = registry.registerDiscoveredSkills()
// Or specify directories
registry.registerDiscoveredSkills(from: ["/opt/custom-skills"])
// Or only register whitelisted skills
registry.registerDiscoveredSkills(skillNames: ["polyv-live-cli"])
ToolRestriction
ToolRestriction enum defines restrictable tools:
public enum ToolRestriction: String, Sendable, CaseIterable {
    case bash, read, write, edit, glob, grep
    case webFetch, webSearch, askUser, toolSearch
    case agent, sendMessage
    case taskCreate, taskList, taskUpdate, taskGet, taskStop, taskOutput
    case teamCreate, teamDelete
    case notebookEdit, skill
}
When a Skill sets toolRestrictions: [.bash, .read, .glob], the Agent can only use these three tools during execution. Other tool calls are intercepted.
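The interception check reduces to a membership test against the active restriction set, with `nil` meaning "all tools allowed". A sketch using a subset of the enum's cases (the `isAllowed` helper is illustrative, not SDK API):

```swift
// Subset of the SDK's ToolRestriction cases, enough for the sketch.
enum ToolRestriction: String {
    case bash, read, write, glob, grep
}

// Sketch: before executing a tool call, check it against the active
// skill's allowed set. nil restrictions = every tool is allowed.
func isAllowed(tool: String, restrictions: [ToolRestriction]?) -> Bool {
    guard let restrictions else { return true }
    return restrictions.contains { $0.rawValue == tool.lowercased() }
}
```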
Using Skills in an Agent
To make Skills available to an Agent, add SkillTool to the tools list:
var tools = getAllBaseTools(tier: .core)
tools.append(createSkillTool(registry: registry))

let agent = createAgent(options: AgentOptions(
    apiKey: apiKey,
    model: "claude-sonnet-4-6",
    permissionMode: .bypassPermissions,
    tools: tools
))
// Agent auto-discovers and invokes based on skill list in system prompt
let result = await agent.prompt("Use the commit skill to analyze current changes")
SkillRegistry.formatSkillsForPrompt() generates a skill list snippet injected into the system prompt, including each skill's name, description, and trigger conditions. The LLM sees this list and knows when to invoke which skill.
7. Other Runtime Controls
Budget Control
maxBudgetUsd sets the cost ceiling per query:
let agent = createAgent(options: AgentOptions(
    apiKey: apiKey,
    model: "claude-sonnet-4-6",
    maxBudgetUsd: 0.05 // Maximum 5 cents
))
Cumulative cost is checked after each turn:
if let budget = options.maxBudgetUsd, totalCostUsd > budget {
    status = .errorMaxBudgetUsd
    break
}
When the budget is exceeded, the loop exits immediately. Any text and token statistics already generated are preserved in QueryResult — you get a partial result, not a blank one.
Query Interruption
Two ways to interrupt an in-progress query:
// Method 1: Call interrupt()
agent.interrupt()

// Method 2: Cancel the enclosing Task
let task = Task {
    await agent.prompt("Long running query...")
}
// Later
task.cancel()
interrupt() internally sets the _interrupted flag and cancels the stream task. The Agent Loop checks this flag at multiple checkpoints (loop entry, between read-only/mutation tools, inside SSE event loop, before/after tool execution), exiting immediately on detection.
Dynamic Permission Switching
Runtime permission mode and tool authorization callbacks can be switched:
// Switch permission mode
agent.setPermissionMode(.askForPermission)

// Set a custom authorization callback (takes priority over permissionMode)
agent.setCanUseTool { toolName, input in
    if toolName == "Bash" {
        return .deny("Bash is disabled")
    }
    return .allow
}

// Revert to permissionMode control
agent.setCanUseTool(nil)
setCanUseTool callback takes priority over permissionMode. Calling setPermissionMode() clears any previously set callback.
Environment Variable Configuration
The SDK supports configuration via environment variables. Priority: code settings > environment variables > defaults.
| Environment Variable | Corresponding Field | Default |
|---|---|---|
| CODEANY_API_KEY | apiKey | nil |
| CODEANY_MODEL | model | claude-sonnet-4-6 |
| CODEANY_BASE_URL | baseURL | nil (use provider default) |
Merged using SDKConfiguration.resolved():
// Code-set values take priority; unset values read from environment
let config = SDKConfiguration.resolved(overrides: SDKConfiguration(
apiKey: "sk-...", // Overrides CODEANY_API_KEY
model: "claude-sonnet-4-6" // Overrides CODEANY_MODEL
))
// Environment variables only
let envConfig = SDKConfiguration.fromEnvironment()
Retry Mechanism
All LLM requests are wrapped with withRetry:
public struct RetryConfig: Sendable {
    public let maxRetries: Int                 // Max retries, default 3
    public let baseDelayMs: Int                // Base delay, default 2000 ms
    public let maxDelayMs: Int                 // Max delay, default 30000 ms
    public let retryableStatusCodes: Set<Int>  // Default [429, 500, 502, 503, 529]
}
Exponential backoff + 25% random jitter to avoid thundering herd. Only SDKError.apiError with status codes in the retryable set triggers retries; other errors are thrown directly.
let delay = config.baseDelayMs * (1 << attempt)
let jitterMs = Int(Double(delay) * 0.25 * (Double.random(in: -1...1)))
let totalMs = max(0, min(delay + jitterMs, config.maxDelayMs))
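Leaving the random jitter aside, the deterministic part of the schedule is easy to verify: the delay for attempt n is baseDelayMs × 2ⁿ, capped at maxDelayMs. A self-contained sketch (the `backoffDelays` helper is mine, for illustration):

```swift
import Foundation

// Local stand-in mirroring the SDK's RetryConfig defaults.
struct RetryConfig {
    var maxRetries = 3
    var baseDelayMs = 2000
    var maxDelayMs = 30000
    var retryableStatusCodes: Set<Int> = [429, 500, 502, 503, 529]
}

// Sketch of the backoff schedule with jitter omitted for determinism:
// delay for attempt n is baseDelayMs * 2^n, capped at maxDelayMs.
func backoffDelays(_ config: RetryConfig) -> [Int] {
    (0..<config.maxRetries).map { attempt in
        min(config.baseDelayMs * (1 << attempt), config.maxDelayMs)
    }
}
```

With the defaults this yields delays of 2000, 4000, and 8000 ms before the ±25% jitter is applied.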
Series Recap
Six articles complete, covering the full architecture of Open Agent SDK (Swift):
- Part 0: Project overview — what the SDK does, overall architecture, how to use it
- Part 1: Agent Loop internals — the complete cycle from prompt to multi-turn conversation
- Part 2: 34 built-in tools — ToolProtocol design, three-tier architecture, custom extensions
- Part 3: MCP integration — connecting external tool servers, discovery, and communication
- Part 4: Multi-agent collaboration — Team/Task models, inter-agent communication
- Part 5: Session persistence and security — session storage, permission control, Hook system
- Part 6 (this article): Multi-LLM providers and runtime controls — LLMClient protocol, OpenAI adapter, model switching, Thinking/Effort, Skills system
Starting from the Agent Loop core, the tool system is the loop's "execution" stage, MCP is external tool extension, multi-agent is the collaboration pattern, sessions are state persistence, security and Hooks are governance mechanisms, and this article's multi-provider and runtime controls ensure flexibility — letting the same Agent choose the most appropriate model and control strategy for each scenario.
Deep Dive into Open Agent SDK (Swift) Series:
- Part 0: Open Agent SDK (Swift): Build AI Agent Applications with Native Swift Concurrency
- Part 1: Deep Dive into Open Agent SDK (Part 1): Agent Loop Internals
- Part 2: Deep Dive into Open Agent SDK (Part 2): Behind the 34 Built-in Tools
- Part 3: Deep Dive into Open Agent SDK (Part 3): MCP Integration in Practice
- Part 4: Deep Dive into Open Agent SDK (Part 4): Multi-Agent Collaboration
- Part 5: Deep Dive into Open Agent SDK (Part 5): Session Persistence and Security
- Part 6: Deep Dive into Open Agent SDK (Part 6): Multi-LLM Providers and Runtime Controls
GitHub: terryso/open-agent-sdk-swift