Gabriel Anhaia
The Apple-Gemini Deal Anthropic Wasn't In: Google Cloud Next 2026


April 22, 2026. Las Vegas. Day one of Google Cloud Next. Thomas Kurian walked the keynote audience through the usual roadmap and then said the part that ended up on every Apple-watcher's feed by lunchtime: Gemini will underpin the next generation of Apple Foundation Models, and a more personal Siri built on that stack will ship later this year. The joint Apple-Google statement is the primary source. The 9to5Mac recap and the MacRumors writeup both have the timeline straight.

If you were paying attention in January, this isn't a new deal. Apple confirmed the Gemini partnership three months ago, with Tim Cook on the record about the privacy posture. What changed at Cloud Next is the public framing: Google now puts Apple on its slide as a flagship cloud customer, and Kurian is willing to say "later this year" on stage in front of an audience that will hold him to it.

Worth naming the team that wasn't on the slide. Anthropic. Through 2024 and 2025 Anthropic was widely reported as a contender to land the Apple deal. Claude on Siri was, in some quarters, a foregone conclusion. The conclusion was wrong.

What Apple actually bought, and what it didn't

Read the press carefully. Apple did not move its inference to Google Cloud. The architecture has three layers and only one of them is new.

The on-device model on your iPhone is still Apple's. Per AppleInsider's reporting, Apple is distilling Gemini into smaller variants suited for on-device inference. Distillation: train a smaller student model to imitate the outputs of a larger teacher, so the student captures much of the teacher's behavior at a fraction of the size. The student lives on the Neural Engine. Gemini is the teacher; the model that runs on your phone is still Apple's.
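To make the distillation idea concrete, here is a minimal sketch in plain Swift, with made-up logits and no Apple API involved: the student is scored against the teacher's temperature-softened output distribution rather than hard labels, which is the core of the student-teacher setup described above.

```swift
import Foundation

// Temperature-scaled softmax: the teacher's logits are softened so the
// student learns from the full output distribution, not just the argmax.
func softmax(_ logits: [Double], temperature: Double) -> [Double] {
    let scaled = logits.map { $0 / temperature }
    let maxVal = scaled.max() ?? 0
    let exps = scaled.map { exp($0 - maxVal) }   // shift for numeric stability
    let sum = exps.reduce(0, +)
    return exps.map { $0 / sum }
}

// One distillation loss term: cross-entropy of the student's softened
// distribution against the teacher's. Minimized when the two match.
func distillationLoss(teacherLogits: [Double],
                      studentLogits: [Double],
                      temperature: Double) -> Double {
    let p = softmax(teacherLogits, temperature: temperature)  // teacher targets
    let q = softmax(studentLogits, temperature: temperature)  // student predictions
    return zip(p, q).reduce(0) { $0 - $1.0 * log($1.1) }
}
```

In a real pipeline this term is averaged over a large corpus of teacher outputs and drives gradient updates to the student; the sketch only shows the per-example objective.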

The mid-tier Private Cloud Compute layer also stays Apple. PCC is Apple's verifiable, attested server tier that runs on Apple silicon in Apple data centers, with the privacy guarantees Apple has been advertising since WWDC 2024. The Apple Intelligence Wikipedia entry tracks the shape of it. Nothing about the Gemini deal moves PCC to Google.

The new layer is the heavy server-side reasoning that didn't fit in either of the above. The "more personal Siri" demoed in 2024 and quietly delayed in 2025 needs context-aware reasoning over your mail, calendar, messages, and screen content. The model size that does that well is not the model size that fits on an iPhone. That tier is where Gemini comes in, served from Google Cloud infrastructure, accessed through Apple's privacy-preserving plumbing.

Tim Cook went on record saying privacy rules don't change. The joint statement is consistent with that: Apple keeps the user-data boundary, Google provides the model and the cloud underneath. Whether you fully buy the privacy story is a separate post. The architectural shape is what matters here.

What it means for app developers

If you build an iOS app and have been writing against Apple's Foundation Models framework, here is what is and isn't changing.

The Foundation Models API surface stays. Apple introduced it at WWDC 2025 as the route through which third-party apps can call the on-device LLM with the same privacy guarantees Apple uses for itself. That API does not become a Gemini API. It stays an Apple API. What changes is that the model behind it gets better, because Apple is training it against a Gemini-derived teacher.

You will still hit the on-device model first. The on-device model has a hard constraint set: small context window, narrow token vocabulary on some intents, refusal behavior tuned for what fits. When the on-device model declines or returns low confidence, the system today either falls back to PCC or, in the Apple Intelligence flow, can defer to ChatGPT with user opt-in. Later this year, the deferral target for some queries becomes the Gemini-backed cloud tier. App developers don't pick the route. Apple picks.

What this means in practice. Your code does not need to know whether Gemini handled a request. Your code does need to handle the case where the on-device model declines, refuses, or returns a lower-quality output, and you need to decide what to do next.

A small routing pattern in Foundation Models

The Foundation Models framework lets you specify whether a session should fall back to a remote model. Today that fallback can be Apple's PCC tier or, with explicit user consent, an external provider through Apple Intelligence. Either way, your app code looks roughly like this.

Pseudocode. Names below are illustrative; verify the current FoundationModels symbols against your target SDK before pasting.

import FoundationModels

enum AnswerSource {
    case onDevice
    case cloud
    case declined
}

struct Answer {
    let text: String
    let source: AnswerSource
}

actor AnswerService {
    private let onDevice = LanguageModelSession(
        model: .systemDefault
    )
    private let cloud = LanguageModelSession(
        model: .systemCloudFallback
    )

The service itself is small. On-device first, cloud second, declined as a real state.

    func answer(_ prompt: String) async throws -> Answer {
        do {
            let r = try await onDevice.respond(
                to: prompt,
                options: .init(temperature: 0.2)
            )
            if r.confidence >= 0.6 {
                return Answer(
                    text: r.content,
                    source: .onDevice
                )
            }
        } catch LanguageModelError.declined {
            // on-device refused; try cloud
        } catch LanguageModelError.contextWindowExceeded {
            // prompt too big; cloud handles it
        }

        do {
            let r = try await cloud.respond(
                to: prompt,
                options: .init(temperature: 0.2)
            )
            return Answer(text: r.content, source: .cloud)
        } catch LanguageModelError.declined {
            return Answer(text: "", source: .declined)
        }
    }
}

What this gives you:

  • A predictable contract for your view layer: you always get an Answer, and you know whether it came from the device or the cloud.
  • A confidence floor that keeps low-quality on-device responses from leaking into your UI when a higher-quality cloud answer is available.
  • An explicit declined state your UI can render. "I can't answer that here" beats an empty string, which matters more once cloud routing involves a remote provider users may have opted out of.
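On the view side, the contract is easy to consume. A sketch, reusing the illustrative AnswerService and Answer types from above; the SwiftUI names are the real framework's, everything model-related is still hypothetical:

```swift
import SwiftUI

struct AnswerView: View {
    let service = AnswerService()
    @State private var answer: Answer?

    var body: some View {
        Group {
            switch answer?.source {
            case .onDevice, .cloud:
                // Same rendering path regardless of where the text came from.
                Text(answer?.text ?? "")
            case .declined:
                // An explicit state beats an empty string.
                Text("I can't answer that here.")
            case nil:
                ProgressView()
            }
        }
        .task {
            answer = try? await service.answer("Summarize my week")
        }
    }
}
```

Note that the view never branches on which vendor served the answer; it only distinguishes device, cloud, and declined, which is exactly the contract the service exposes.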

The piece you don't write is the remote provider. The system picks it. Today it's PCC and in some cases ChatGPT. Later this year, for some queries, it's Gemini through Google Cloud. That choice is not yours to make in app code, and that is the point of the Foundation Models abstraction.

Three implications for the stack

1. Server-side reasoning is no longer the on-device-first story. Apple spent 2024 and most of 2025 pitching Apple Intelligence as on-device first, with PCC as the privacy-preserving overflow. The Gemini deal admits a third tier exists: bigger models, served from Google's infrastructure, for queries that need them. If you are building an app whose value depends on heavy reasoning, you can stop pretending the on-device model will catch up by next October. It will not. The cloud path will carry that load.

2. Anthropic is now competing one layer down. On a plain reading of this announcement, Anthropic's consumer-grade exposure on iPhone narrows. Their growth lane is the developer surface: Claude Code, the API, MCP, agentic frameworks. If you ship agents in your own product, the Anthropic surface is still the most code-fluent in the field. That has not changed. What has changed is the assumption that "Claude on Siri" was inevitable. It was not.

3. The Foundation Models abstraction is doing real work. App code written against LanguageModelSession in 2025 keeps running in 2026 with no change, even though the model behind the cloud session is a Google model now. That is the kind of platform abstraction that pays for itself. If you are building your own LLM-powered product without a similar abstraction (every call hardcoded to a vendor's SDK, every prompt template tied to a model family), the lesson here is to add one. Apple did not name a vendor in the framework. You should not either.
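What "add one" can look like in practice: a small vendor-neutral seam in plain Swift. The protocol and types below are illustrative, not any vendor's SDK; the point is that app code depends on the protocol, and the concrete provider lives behind one adapter at the composition root.

```swift
import Foundation

// The seam: app code talks to this protocol, never to a vendor SDK directly.
protocol TextModel {
    func respond(to prompt: String) async throws -> String
}

// Stand-in for a real provider adapter (on-device model, hosted API, etc.).
// Swapping vendors becomes a one-line change where the adapter is constructed.
struct EchoModel: TextModel {
    func respond(to prompt: String) async throws -> String {
        "echo: " + prompt
    }
}

// Feature code receives the model by injection and names no vendor.
struct Summarizer {
    let model: TextModel

    func summarize(_ text: String) async throws -> String {
        try await model.respond(to: "Summarize: \(text)")
    }
}
```

A usage site then reads `Summarizer(model: EchoModel())`, and replacing `EchoModel` with a different adapter touches no feature code, which is the same property the Foundation Models abstraction gives Apple.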

The on-device narrative survives, with a footnote

Apple's privacy story in April 2026 is the same as it was in April 2024. On-device when possible. PCC when the device can't. External providers come in only with the user's awareness, and with no user data persisted at the provider. Today that means ChatGPT. Later this year, the Gemini-backed cloud tier joins the mix. The AppleInsider explainer is worth reading if you want the careful version.

What changed is what the third tier looks like under the hood. The teacher model for distillation, and the heavy-context server tier for the more-personal Siri, both come from a Google relationship now. Anthropic is not in the picture at that layer.

For app developers, the practical guidance is small and concrete. Keep writing against Foundation Models. Build the routing pattern above. Render a meaningful UI for the cases where the on-device model declines or returns low confidence. Stop assuming the cloud fallback is your problem to design. Apple owns it, and Apple just announced who Apple is calling for it.

The interesting question is the one Kurian didn't answer on stage: what happens to the on-device model's refusal rate as the cloud tier gets stronger? The cheap path is to send more queries to the cloud. The privacy path is to keep on-device first. Apple has spent a decade picking the second answer. The Gemini deal is what they bought to keep picking it.

If this was useful

The AI Agents Pocket Guide covers the orchestration patterns the Foundation Models routing above turns into when your app has tools, retrieval, and multi-step plans, including the boundary work between local and remote models. The Prompt Engineering Pocket Guide covers the prompt-portability question this whole post is shaped by: how to write prompts that survive a model swap, because in 2026 the model behind your API call may not be the one you started with.

AI Agents Pocket Guide

Prompt Engineering Pocket Guide
