Apple Just Quietly Bet Its AI Future on Google
Yesterday, Apple unveiled a new AI architecture that, for the first time, is built around Google's Gemini models. Paired with the new Core AI framework (Apple's developer-facing runtime for running models locally on Apple silicon), this is the most consequential shift in Apple's AI strategy since the launch of Apple Intelligence.
Let's break down what actually changed, what the new Core AI framework does, and what it means if you're building iOS or macOS apps in 2026.
The Headline: Apple + Gemini = The New Default
Until now, Apple Intelligence relied on a mix of:
- Apple's own on-device foundation models (roughly 3B parameters)
- OpenAI's GPT-4o for the optional "Writing Tools" cloud fallback
- Private Cloud Compute for heavier tasks
With yesterday's announcement, Gemini replaces GPT-4o as Apple's primary cloud LLM partner, and Core AI becomes the unified runtime for invoking any model — Apple, Gemini, or third-party — from a single Swift API.
This is bigger than a vendor swap. Apple is signaling that:
- On-device is still the default for privacy and latency.
- Gemini is the cloud escalation path when a query is too complex for the local model.
- Developers get a single API (CoreAI.Model) to call any supported model without writing glue code.
What's Actually in Core AI
The new CoreAI framework (documented at developer.apple.com/documentation/coreai) is Apple's answer to the fragmentation problem. Instead of juggling Core ML, Create ML, the Foundation Models API, and ad-hoc URLSession calls to OpenAI/Anthropic, you now get one runtime.
Key capabilities:
1. Unified Model Interface
import CoreAI

let session = LanguageModelSession(
    model: .gemini,
    fallback: .apple("apple-foundation-3b")
)
let response = try await session.respond(to: "Summarize this contract")
That same call also works with .claude, .llama, or any local GGUF you drop into your app bundle.
2. Automatic Routing
Core AI inspects the prompt, your privacy tier, the device's thermal state, and network conditions, then picks the right model automatically. Simple queries stay on-device; complex ones escalate to Gemini in the cloud; sensitive prompts never leave the device.
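To make the routing idea concrete, here is a minimal sketch of how such a decision could be written by hand. Every name in it (RoutingContext, Backend, route, the 2,000-token threshold) is hypothetical and invented for illustration. It is not Core AI's API or its actual policy, only one plausible ordering of the four signals described above. The ThermalLevel cases mirror the levels an app can already read from ProcessInfo.thermalState.

```swift
// Hypothetical types for illustration only; none of this is Core AI API.
enum PrivacyTier { case standard, sensitive }

enum ThermalLevel: Int, Comparable {
    case nominal = 0, fair, serious, critical
    static func < (lhs: ThermalLevel, rhs: ThermalLevel) -> Bool {
        lhs.rawValue < rhs.rawValue
    }
}

enum Backend: Equatable { case onDevice, cloud }

struct RoutingContext {
    var privacy: PrivacyTier
    var thermal: ThermalLevel
    var isOnline: Bool
    var promptTokens: Int   // crude stand-in for "prompt complexity"
}

/// Picks a backend from the context.
/// The default token limit is an arbitrary placeholder, not a documented value.
func route(_ ctx: RoutingContext, onDeviceTokenLimit: Int = 2_000) -> Backend {
    // Sensitive prompts and offline devices never escalate to the cloud.
    if ctx.privacy == .sensitive || !ctx.isOnline { return .onDevice }
    // Prompts larger than the local model can handle escalate.
    if ctx.promptTokens > onDeviceTokenLimit { return .cloud }
    // A hot device offloads work when the network allows it.
    if ctx.thermal >= .serious { return .cloud }
    return .onDevice
}
```

The order of the checks is the design choice worth noticing: the privacy rule comes first so that no later condition can override it.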
3. Tool Calling, Native
Function-calling works the same way regardless of backend. You define a @Tool macro, register it with the session, and Core AI handles prompt formatting differences between Gemini, Claude, and local models.
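The @Tool macro itself is not reproduced here, since its exact shape is not shown above. As a rough sketch of what backend-independent tool calling comes down to, the following plain-Swift registry maps a tool name to a handler and dispatches a parsed call to it. ToolCall, ToolRegistry, ToolError, and the word_count tool are all hypothetical names, and a real runtime would also have to translate each provider's wire format into something like ToolCall.

```swift
// Hypothetical shapes for illustration; this is not the @Tool macro.
struct ToolCall {
    let name: String
    let arguments: [String: String]
}

enum ToolError: Error, Equatable {
    case unknownTool(String)
    case missingArgument(String)
}

struct ToolRegistry {
    typealias Handler = ([String: String]) throws -> String
    private var handlers: [String: Handler] = [:]

    /// Registers a handler under the name the model will use to call it.
    mutating func register(_ name: String, handler: @escaping Handler) {
        handlers[name] = handler
    }

    /// Runs the handler for a call, whichever backend produced the call.
    func dispatch(_ call: ToolCall) throws -> String {
        guard let handler = handlers[call.name] else {
            throw ToolError.unknownTool(call.name)
        }
        return try handler(call.arguments)
    }
}

var registry = ToolRegistry()
registry.register("word_count") { args in
    guard let text = args["text"] else {
        throw ToolError.missingArgument("text")
    }
    // Counts whitespace-separated words in the supplied text.
    return String(text.split(separator: " ").count)
}
```

Because the registry only ever sees a ToolCall, the handlers stay the same when the backend changes. That is the property the article attributes to Core AI.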
4. Streaming + Structured Output
First-class support for AsyncSequence<String> streams and Decodable return types. No more manual JSON-mode hacks.
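For comparison, this is roughly the manual step that a Decodable return type would replace: take the model's raw text, hope it is valid JSON, and decode it yourself. The sketch uses only Foundation's JSONDecoder. The Recipe fields and the decode helper are assumptions made up for the example.

```swift
import Foundation

// Hypothetical output type; the field names are invented for this example.
struct Recipe: Decodable, Equatable {
    let title: String
    let minutes: Int
    let ingredients: [String]
}

/// Decodes raw model text into a typed value, throwing if the text is not
/// valid JSON or does not match the expected shape.
func decode<T: Decodable>(_ type: T.Type, from modelOutput: String) throws -> T {
    try JSONDecoder().decode(type, from: Data(modelOutput.utf8))
}

// Stand-in for text returned by a model that was asked for JSON.
let raw = #"{"title":"Pancakes","minutes":20,"ingredients":["flour","eggs","milk"]}"#
let recipe = try decode(Recipe.self, from: raw)
```

The fragile part is that nothing forces the model to emit matching JSON, so every call site needs its own error handling. A runtime that owns both the prompt formatting and the decoding can remove that burden.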
Why Gemini Specifically?
Three reasons keep coming up in Apple's developer briefings:
- Multimodal parity. Gemini's native audio/image/video understanding is more mature than Apple's in-house models, which is why Siri's new visual features needed it.
- Cost. After the latest pricing war, Gemini undercuts GPT-4o by roughly 40% on input tokens — meaningful when Siri handles billions of requests a day.
- TPU supply. Apple has been quietly renting Google's TPU pods for Foundation Model training. The Core AI deal is rumored to be a bundled compute + license agreement.
The OpenAI partnership isn't dead — Writing Tools still let users pick ChatGPT as an alternative escalation — but Gemini is now the default Siri intelligence in iOS 19 and macOS 16.
What This Means for Developers
The Good
- One API to learn. If you've been writing separate code paths for OpenAI and on-device, you can collapse them.
- Better offline behavior. Core AI's routing means your app will work on a plane without you writing a network check.
- Structured outputs are finally first-class. languageModel.respond(to: UserQuery(), generating: Recipe.self) is a beautiful Swift idiom.
The Gotchas
- Gemini calls still need a privacy disclosure. Even though Apple routes them, the App Store guidelines require a manifest entry for any third-party AI provider your app invokes.
- Local model size matters. Core AI will run a 3B model on an A17 Pro, but a 7B will need an M-series chip. Plan your app bundle size accordingly.
- Latency variance. Cloud escalations add 300–800ms. If you're building a real-time UI, prefer prompts that fit the on-device model.
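For the model size gotcha, a back-of-the-envelope estimate helps: weight storage is roughly parameter count times bits per weight, divided by eight. The quantization levels below are assumptions (nothing above says what precision bundled models use), and the estimate ignores runtime memory such as activations and the KV cache.

```swift
/// Approximate weight storage in gigabytes (10^9 bytes):
/// parameters × bits per weight ÷ 8 bits per byte.
func weightSizeGB(parameters: Double, bitsPerWeight: Double) -> Double {
    parameters * bitsPerWeight / 8 / 1e9
}

// Assumed 4-bit quantization, a common choice for on-device models.
let small4bit = weightSizeGB(parameters: 3e9, bitsPerWeight: 4)    // 1.5 GB
let large4bit = weightSizeGB(parameters: 7e9, bitsPerWeight: 4)    // 3.5 GB

// The same 3B model at 16-bit precision, for contrast.
let small16bit = weightSizeGB(parameters: 3e9, bitsPerWeight: 16)  // 6.0 GB
```

Even at 4 bits per weight, a bundled 7B model adds gigabytes to the download, which is the practical reason to plan for bundle size up front.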
The Bigger Picture
Apple is making a bet that the future of consumer AI is hybrid: small, fast, private models on-device, with a much larger model in the cloud as a fallback. That's not a new idea — it's exactly what Google has been doing with Gemini Nano on Pixel phones — but Apple's twist is putting developers, not end users, at the controls.
The Core AI framework effectively turns every iPhone, iPad, and Mac into a Gemini client with on-device intelligence as a fallback. For Apple, that's a privacy story and a developer story at the same time. For Google, it's distribution on a scale Android can only dream of.
For us as builders, the lesson is simple: stop hard-coding a single model provider. The next year of mobile AI is going to look more like a runtime decision than an architectural one.
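If you would rather own the abstraction layer yourself, the core of it is small. The sketch below defines a provider-neutral protocol and a fallback chain that tries each model in order. TextModel, FallbackChain, ModelUnavailable, and the two stub backends are hypothetical names. A real backend would wrap an SDK or network call where the stubs return canned values.

```swift
// A provider-neutral interface; all names here are hypothetical.
protocol TextModel {
    var name: String { get }
    func respond(to prompt: String) async throws -> String
}

struct ModelUnavailable: Error {}

/// Tries each model in order and returns the first successful reply.
/// If every model fails, the last error is rethrown.
struct FallbackChain: TextModel {
    let name = "fallback-chain"
    let models: [any TextModel]

    func respond(to prompt: String) async throws -> String {
        var lastError: Error = ModelUnavailable()
        for model in models {
            do {
                return try await model.respond(to: prompt)
            } catch {
                lastError = error
            }
        }
        throw lastError
    }
}

// Stub backends standing in for real providers.
struct OfflineCloudModel: TextModel {
    let name = "cloud-stub"
    func respond(to prompt: String) async throws -> String {
        throw ModelUnavailable()   // simulates having no network
    }
}

struct EchoLocalModel: TextModel {
    let name = "local-stub"
    func respond(to prompt: String) async throws -> String {
        "[\(name)] \(prompt)"
    }
}

let chain = FallbackChain(models: [OfflineCloudModel(), EchoLocalModel()])
```

Calling try await chain.respond(to: "hi") falls through the failing cloud stub and returns the local stub's reply. Swapping providers then means changing the array, not the call sites.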
What's your take? Will Core AI change how you structure your iOS/macOS apps, or do you prefer to stay model-agnostic with your own abstraction layer? Let me know in the comments.
Sources: MacRumors, Apple Developer Documentation (Core AI), Hacker News discussion.