From a single on-device model to a hybrid, multimodal, agentic AI platform — and it's going open source
When Apple shipped the Foundation Models framework at WWDC 2025, the pitch was tidy: a Swift API onto the ~3B-parameter model that powers Apple Intelligence, running on-device, no API key, no network round-trip, no per-token cost. Great for privacy-sensitive, offline-friendly work — and deliberately narrow.
WWDC 2026 blows that scope wide open. The framework is no longer "the on-device model with a nice Swift wrapper." It's now a hybrid platform that spans on-device inference, Apple's own server models on Private Cloud Compute, and third-party frontier models like Claude and Gemini — all behind one session API. It gained vision. It gained agentic primitives. It got a Python SDK, a CLI, an evaluations framework, and it's going open source this summer, including on Linux.
This is a tour of what changed, with the Swift that matters.
The on-device model got rebuilt
The headline isn't a new feature — it's that the base model is better. Apple rebuilt the on-device model with stronger reasoning and tool calling, and refined the guardrails to cut false positives (a real pain point in the first release, where benign prompts sometimes tripped safety filters).
Two practical APIs landed (starting in iOS 26.4) that you'll reach for constantly: inspecting context size and counting tokens before you spend them.
let model = SystemLanguageModel()
print(model.contextSize)
// 8192
let count = try await model.tokenCount(
    for: "What are the Japanese characters for origami?"
)
print(count)
The on-device context window is 8,192 tokens. Knowing your token budget up front means you can decide — programmatically — whether a request fits on-device or needs to escalate to a server model.
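One way to act on that budget is a small pure function that decides where a request should run. This is a minimal sketch, not framework API: `InferenceTarget`, `chooseTarget`, and the `reservedForOutput` headroom are hypothetical names and values, and it assumes the prompt and the response share a single context window. The prompt token count would come from `tokenCount(for:)` as shown above.

```swift
/// Where a request should run. Hypothetical type for illustration only.
enum InferenceTarget: Equatable {
    case onDevice
    case server
}

/// Decides whether a prompt fits the on-device context window.
/// - Parameters:
///   - promptTokens: token count of the prompt (e.g. from `tokenCount(for:)`).
///   - contextSize: the model's context window (e.g. `model.contextSize`).
///   - reservedForOutput: headroom kept free for the reply (assumed value).
func chooseTarget(promptTokens: Int,
                  contextSize: Int,
                  reservedForOutput: Int = 1024) -> InferenceTarget {
    // Assumption: prompt and response share one window, so leave room for the reply.
    let available = contextSize - reservedForOutput
    return promptTokens <= available ? .onDevice : .server
}

let shortPrompt = chooseTarget(promptTokens: 12, contextSize: 8192)
let longPrompt = chooseTarget(promptTokens: 7500, contextSize: 8192)
print(shortPrompt, longPrompt)
```

Keeping the decision in a plain function makes it easy to unit test and to tune the headroom without touching session code.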
Vision: the on-device model can now see
The on-device model is multimodal now. You attach an image to a prompt and ask about it — object identification, text extraction, screenshot understanding, receipt parsing — entirely on-device, no cloud hop.
let response = try await session.respond {
    "What animal is this?"
    Attachment(UIImage(...))
}
Attachments accept UIImage, NSImage, CGImage, CIImage (Core Image), CVPixelBuffer, and file URLs, at any size. Larger images simply cost more tokens, so it's worth downsampling when you don't need full resolution.
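Since token cost grows with image size, a resize step before attaching is often worthwhile. The sketch below only computes the target dimensions. `PixelSize` and `downsampledSize` are hypothetical helpers, and the 1024-pixel cap is an arbitrary assumption rather than a documented threshold; the actual resize would go through whichever image API you already use.

```swift
/// Pixel dimensions of an image. Hypothetical helper type.
struct PixelSize: Equatable {
    var width: Int
    var height: Int
}

/// Scales `size` down so its longest side is at most `maxSide`,
/// preserving aspect ratio. Never upscales.
/// The default cap of 1024 is an assumed value, not a framework constant.
func downsampledSize(for size: PixelSize, maxSide: Int = 1024) -> PixelSize {
    let longest = max(size.width, size.height)
    guard longest > maxSide, longest > 0 else { return size }
    let scale = Double(maxSide) / Double(longest)
    return PixelSize(
        width: max(1, Int((Double(size.width) * scale).rounded())),
        height: max(1, Int((Double(size.height) * scale).rounded()))
    )
}

print(downsampledSize(for: PixelSize(width: 4032, height: 3024)))
```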
Private Cloud Compute: Apple's server models, same API
For tasks the small on-device model can't handle, you can now route to Apple's larger server models running on Private Cloud Compute (PCC) — through the same LanguageModelSession.
let model = PrivateCloudComputeLanguageModel()
let session = LanguageModelSession(model: model)
The selling points: a 32K context window, configurable reasoning levels, and crucially no account setup, no auth, no API keys — it inherits the same privacy guarantees as the rest of Apple Intelligence. As of this release it's available on watchOS 27 too.
The business news that made this real for most teams: at the Platforms State of the Union, Apple announced free PCC access for developers with fewer than two million first-time App Store downloads. That removes the infrastructure cost barrier that would otherwise make cloud inference a non-starter for indie and mid-size apps.
You can inspect exactly what a request consumed, including cached and reasoning tokens:
let response = try await session.respond(
    to: "Recommend a craft that doesn't require scissors.",
    contextOptions: ContextOptions(reasoningLevel: .light)
)
print(response.usage.input.totalTokenCount)
print(response.usage.input.cachedTokenCount)
print(response.usage.output.totalTokenCount)
print(response.usage.output.reasoningTokenCount)
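If you want those numbers over a whole session rather than per call, a small accumulator is enough. This is a sketch under stated assumptions: `UsageTotals` is a hypothetical type, its four fields simply mirror the `usage` properties printed above, the sample numbers are made up, and the cache ratio assumes the cached count is a subset of the input total.

```swift
/// Running totals across a session. Hypothetical type for illustration.
struct UsageTotals {
    private(set) var inputTokens = 0
    private(set) var cachedInputTokens = 0
    private(set) var outputTokens = 0
    private(set) var reasoningTokens = 0

    /// Adds the counts reported for one response.
    mutating func add(input: Int, cached: Int, output: Int, reasoning: Int) {
        inputTokens += input
        cachedInputTokens += cached
        outputTokens += output
        reasoningTokens += reasoning
    }

    /// Share of input tokens served from cache.
    /// Assumption: the cached count is included in the input total.
    var cacheHitRatio: Double {
        inputTokens == 0 ? 0 : Double(cachedInputTokens) / Double(inputTokens)
    }
}

var totals = UsageTotals()
totals.add(input: 400, cached: 100, output: 250, reasoning: 50)  // made-up numbers
totals.add(input: 600, cached: 400, output: 150, reasoning: 0)   // made-up numbers
print(totals.inputTokens, totals.cacheHitRatio)
```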
The model abstraction layer — and partner models
This is the architectural centerpiece. A new LanguageModel protocol lets any model — local or server — back a LanguageModelSession. Apple's own models already conform, and the framework ships open-source CoreAILanguageModel and MLXLanguageModel conformers for running local models on the Neural Engine and GPU.
The downstream consequence: swapping models is mostly a Swift Package Manager change. Anthropic and Google publish Swift packages for their frontier models that conform to the same protocol, so the only code that changes is the model value you hand to the session. Your session logic, your prompts, and your tool definitions stay as they are.
// On-device
let onDeviceSession = LanguageModelSession(model: SystemLanguageModel())
// Apple server model
let serverSession = LanguageModelSession(model: PrivateCloudComputeLanguageModel())
// Third-party frontier model, same session API
let partnerSession = LanguageModelSession(model: ClaudeModel(/* ... */))
The partner packages handle auth and billing securely (OAuth, credentials in Keychain) and expose per-token usage including cache and reasoning tokens. The model becomes a configuration choice — prototype on the free on-device model, escalate to Claude or Gemini for the hard queries, ship without rewriting your inference layer.
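The design idea is ordinary protocol-oriented Swift, and it can be shown without the framework at all. The sketch below is an analogy, not Apple's API: `TextModel`, `LocalStub`, `ServerStub`, and `Session` are hypothetical stand-ins, and the stubs return canned strings instead of running inference.

```swift
/// Stand-in for a model protocol. Hypothetical; this is not the
/// framework's `LanguageModel` protocol, only the same shape of idea.
protocol TextModel {
    var name: String { get }
    func complete(_ prompt: String) -> String
}

struct LocalStub: TextModel {
    let name = "local"
    func complete(_ prompt: String) -> String { "[local] \(prompt)" }
}

struct ServerStub: TextModel {
    let name = "server"
    func complete(_ prompt: String) -> String { "[server] \(prompt)" }
}

/// Session logic depends only on the protocol, so the backing model
/// is chosen where the session is constructed and nowhere else.
struct Session {
    let model: any TextModel
    private(set) var history: [String] = []

    init(model: any TextModel) { self.model = model }

    mutating func respond(to prompt: String) -> String {
        let reply = model.complete(prompt)
        history.append(prompt)
        history.append(reply)
        return reply
    }
}

var localSession = Session(model: LocalStub())
print(localSession.respond(to: "Hello"))
```

Swapping `LocalStub()` for `ServerStub()` changes one expression and nothing inside `Session`, which is the property the real protocol is described as providing.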
One constraint worth flagging: you still can't bring your own fine-tuned weights to the on-device runtime. If you need a domain-specific model, that's Core AI's territory, not Foundation Models.
System tools: Vision and Spotlight, built in
The framework now ships tools the model can call directly. Two are Vision-backed — BarcodeReaderTool and OCRTool — letting the model reason over visual information without you wiring up Vision yourself.
The more interesting one is a Spotlight-powered search tool that enables fully local Retrieval-Augmented Generation. Queries route through Spotlight first, then the results feed into the model's context. This replaces the custom vector-database infrastructure teams were standing up for on-device search — for a lot of use cases, Spotlight already had the index.
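The retrieve-then-augment flow itself is simple, and a toy version makes the shape clear. Everything here is an assumption for illustration: `Snippet`, `retrieve`, and `augmentedPrompt` are hypothetical names, the in-memory array and naive keyword match stand in for the Spotlight index, and the prompt template is made up.

```swift
import Foundation

/// A retrieved document snippet. Hypothetical type; a real app would get
/// results from the Spotlight-backed tool rather than an in-memory array.
struct Snippet {
    let title: String
    let text: String
}

/// Naive keyword retrieval standing in for the search step:
/// keeps snippets containing any query word, best matches first.
func retrieve(_ query: String, from corpus: [Snippet], limit: Int) -> [Snippet] {
    let words = query.lowercased().split(separator: " ").map(String.init)
    let scored = corpus.map { snippet -> (Snippet, Int) in
        let haystack = snippet.text.lowercased()
        let score = words.filter { haystack.contains($0) }.count
        return (snippet, score)
    }
    return scored
        .filter { $0.1 > 0 }
        .sorted { $0.1 > $1.1 }
        .prefix(limit)
        .map { $0.0 }
}

/// Folds the retrieved snippets into the prompt ahead of the question.
func augmentedPrompt(question: String, hits: [Snippet]) -> String {
    let context = hits.map { "- \($0.title): \($0.text)" }.joined(separator: "\n")
    return "Use these notes:\n\(context)\n\nQuestion: \(question)"
}

let notes = [
    Snippet(title: "Crane", text: "Fold origami paper into a crane"),
    Snippet(title: "Collage", text: "Use glue for a paper collage"),
    Snippet(title: "Scarf", text: "Wool scarf knitting pattern"),
]
let hits = retrieve("origami paper", from: notes, limit: 2)
print(augmentedPrompt(question: "How do I fold a crane?", hits: hits))
```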
Dynamic Profiles: the agentic primitive
This is the big new building block for agentic apps. A DynamicProfile is a declarative description of a session's instructions and tools — and a single session can swap between profiles as the app's mode changes, while preserving conversation history.
The WWDC sample is a crafts app that flips between a "craft analysis" mode and a "brainstorm" mode. You describe each profile as a struct:
struct CraftProfile: LanguageModelSession.DynamicProfile {
    var body: some DynamicProfile {
        Profile {
            Instructions {
                """
                You are an expert crafting assistant. \
                Record craft project image analyses \
                using the recordImageAnalysis tool.
                """
            }
            RecordImageAnalysisTool()
        }
    }
}
let session = LanguageModelSession(profile: CraftProfile())
And here's where it gets powerful — you can vary the model and reasoning level per profile branch. Quick analysis stays on the fast on-device model; the open-ended brainstorm escalates to Private Cloud Compute with deep reasoning — all within one profile, history intact:
struct CraftProfile: LanguageModelSession.DynamicProfile {
    let states: CraftProjectStates

    var body: some DynamicProfile {
        switch states.mode {
        case .craftAnalysis:
            Profile {
                Instructions { /* ... */ }
                RecordImageAnalysisTool()
                SwitchModeTool(states: states)
            }
        case .brainstorm:
            Profile {
                Instructions { /* ... */ }
                BrainstormRecordTool()
            }
            .model(states.privateCloudCompute)
            .reasoningLevel(.deep)
        }
    }
}
A profile resolves to exactly one active branch at a time. The model itself can drive the transition by calling a tool — here a SwitchModeTool that mutates app state, which the framework observes to rebuild the session:
struct SwitchModeTool: Tool {
    let description = "Switch to a different mode."
    // Shared app state: the same object the profile reads in its `body`.
    let states: CraftProjectStates

    @Generable
    struct Arguments {
        let mode: Mode
    }

    func call(arguments: Arguments) async throws -> some PromptRepresentable {
        // Mutating the shared state is what drives the profile switch.
        states.mode = arguments.mode
        return "Successfully switched to \(arguments.mode)."
    }
}
This is the cleanest expression yet of multi-tool, multi-mode agent behavior on Apple platforms — declarative, type-safe, and with model selection as a first-class part of the design.
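To make the mechanics concrete without the framework, here is a plain-Swift model of the same idea: one piece of shared state, one active branch resolved from it, and a history that survives the switch. All names (`Mode`, `ModeState`, `ResolvedProfile`, `resolve`) are hypothetical, and the instruction strings and tool names are placeholders rather than the sample app's real values.

```swift
enum Mode { case craftAnalysis, brainstorm }

/// What a resolved profile boils down to in this sketch.
struct ResolvedProfile: Equatable {
    let instructions: String
    let tools: [String]
    let usesServerModel: Bool
}

/// Shared mutable state: a reference type, so a tool can mutate it.
final class ModeState {
    var mode: Mode = .craftAnalysis
}

/// Resolves exactly one branch for the current mode.
func resolve(_ state: ModeState) -> ResolvedProfile {
    switch state.mode {
    case .craftAnalysis:
        return ResolvedProfile(instructions: "Analyze craft images.",
                               tools: ["recordImageAnalysis", "switchMode"],
                               usesServerModel: false)
    case .brainstorm:
        return ResolvedProfile(instructions: "Brainstorm project ideas.",
                               tools: ["brainstormRecord"],
                               usesServerModel: true)
    }
}

let state = ModeState()
var history = ["user: What is this?"]
let before = resolve(state)
state.mode = .brainstorm              // what a mode-switching tool would do
history.append("user: Give me ideas") // history carries across the switch
let after = resolve(state)
print(before.usesServerModel, after.usesServerModel, history.count)
```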
Evaluations: measuring quality, not vibes
Anyone who's shipped an LLM feature knows the trap: you tweak a prompt, it feels better, you ship, and you have no idea if you actually improved anything or just got lucky on three test prompts. Apple's new Evaluations framework is a Swift framework for quantifying accuracy as you iterate, so you can understand the statistical impact of a change before shipping. This is the boring-but-essential piece that makes the rest production-viable.
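To see why sample size matters here, consider a generic statistical sketch. It does not use the Evaluations framework's API, which this post doesn't show; `wilsonInterval` is a hypothetical helper implementing the standard Wilson score interval, and the pass counts are made-up numbers.

```swift
/// Accuracy with a Wilson score interval (z = 1.96 gives roughly 95%).
/// Hypothetical helper; not part of any Apple framework.
func wilsonInterval(passed: Int, total: Int, z: Double = 1.96)
    -> (accuracy: Double, low: Double, high: Double) {
    precondition(total > 0 && passed >= 0 && passed <= total)
    let n = Double(total)
    let p = Double(passed) / n
    let z2 = z * z
    let denominator = 1 + z2 / n
    let center = (p + z2 / (2 * n)) / denominator
    let halfWidth = z * (p * (1 - p) / n + z2 / (4 * n * n)).squareRoot() / denominator
    return (p, center - halfWidth, center + halfWidth)
}

// Made-up results: old prompt passes 41 of 50 cases, new prompt passes 44 of 50.
let baseline = wilsonInterval(passed: 41, total: 50)
let candidate = wilsonInterval(passed: 44, total: 50)

// Overlapping intervals suggest the gain may be noise at this sample size.
let overlaps = candidate.low <= baseline.high && baseline.low <= candidate.high
print(baseline, candidate, overlaps)
```

With only 50 cases, an 82% to 88% improvement leaves the two intervals overlapping, which is exactly the "got lucky on a few prompts" situation a proper evaluation run is meant to expose.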
Tooling: an fm CLI and a Python SDK
Two additions that broaden where these models live.
In macOS 27, an fm command-line tool brings on-device and PCC models to the terminal — fm chat for interactive use, or pipe text through it in shell scripts to summarize, extract, or generate.
And there's a Python SDK exposing the same on-device model as the Swift framework, aimed at data scientists and researchers:
import asyncio
import apple_fm_sdk as fm

async def main():
    model = fm.SystemLanguageModel()
    is_available, reason = model.is_available()
    if is_available:
        session = fm.LanguageModelSession(model=model)
        # respond() is awaited, so it has to run inside a coroutine
        response = await session.respond(prompt="Hello!")
        print(response)

asyncio.run(main())
Open source — and it runs on Linux
The quiet bombshell: the Foundation Models framework is going open source this summer, and because it's Swift, it runs wherever Swift runs — including Linux servers. Alongside it, Apple is shipping a Foundation Models framework utilities package, updated between OS releases, with emerging building blocks: transcript management, a skill API, and a chat-completions interface for interop.
For a framework that started life as strictly on-device, Apple-platform-only, this is a remarkable reframing. The session API you write against on iPhone is, increasingly, a portable abstraction.
What this means for how you build
Put it together and the recommended pattern for an app shipping in the iOS 27 cycle is a deliberate hybrid:
- Use the on-device model for fast, free, offline, privacy-sensitive work — now with vision and better reasoning.
- Escalate to Private Cloud Compute when you need a bigger context window or deeper reasoning, free under two million downloads, with the same privacy posture.
- Route to Claude or Gemini for the genuinely hard queries — same session code, swapped via SPM.
- Compose it all with Dynamic Profiles, and validate it with the Evaluations framework before you ship.
The model is no longer a fixed dependency baked into your architecture — it's a configuration choice you can change without touching your logic. That's the real story of WWDC 2026: not any single model, but the abstraction layer that finally makes model choice a runtime decision.
A couple of caveats to keep on your radar: there's still no bring-your-own-fine-tune path for the on-device runtime (that's Core AI), and Siri AI features won't be available in the EU or China at iOS 27 launch — EU developers can't even test them during development — which matters if your user base spans those regions.
Where to start
Download the Xcode 27 beta, run fm chat in Terminal to feel out the on-device model, then pull Apple's sample app and get hands-on with Dynamic Profiles and the Evaluations framework. The deep-dive sessions on PCC, evaluations, the Xcode instrument, and dynamic profiles are where the real detail lives.