Sanjay Chauhan

Posted on May 19 • Originally published at blog.stackademic.com

I Spent £300 on AI API Calls for a Kids' App — Then I Built a Router That Cut It to £40

#ios #swift #ai #opensource

Three AI providers, one Swift app, and the routing layer I wish someone had built before me.

Last November, I started building a story app for my son so that he could move away from YouTube Shorts and similar platforms. These online videos were clearly affecting him: I could see the change in his behaviour, and it was getting difficult to pull him away. The idea was dead simple: he picks a character, a setting, maybe a mood, and the app generates a short illustrated story. Personalised fairy tales. Parents love it, kids love it, everybody’s happy.

The prototype worked brilliantly. I wired up OpenAI’s API — GPT-4o for the story text, DALL-E 3 for the illustrations — fed it some careful prompts, and out came these charming little stories about brave hedgehogs and underwater castles. Beautiful watercolour-style images, too. My son was delighted. My API bill was not.

£94 in the first week. For a prototype. With one user.

The rough maths: each story involved 3–4 DALL-E 3 illustrations plus the text generation. The images were the killer at $0.04 to $0.08 each at standard quality, and my son wanted the HD ones (of course he did). With text generation on top, each complete story was costing around $0.35–0.50. He was requesting 4–5 stories every day, and sometimes more. During development, I was also burning through maybe 20–30 test runs a day: tweaking prompts, adjusting character descriptions, iterating on illustration styles. It adds up sickeningly fast when your primary user has unlimited bedtime leverage and your secondary user (me, of course) can’t stop fiddling with the prompt engineering.

I panicked, obviously; I did not want that kind of load on my wallet. I switched to a cheaper model, gpt-4o-mini instead of gpt-4o, and the stories got noticeably worse. I tried Gemini Flash, which was cheaper still, but the tone wasn’t right for children’s content: it kept producing weirdly formal sentences that sounded like a textbook trying to be whimsical. I went back to GPT-4o for the complex stories and tried to use the cheaper models for simple stuff like generating character names. Now I had three different SDK imports, three different authentication flows, and an if-else tree in my networking layer that was turning into a mess.

That’s when I stopped coding the app and started building what it actually needed: a router.

Here’s what I kept running into, and I suspect you have too if you’ve added AI to an iOS app. Every provider has its own Swift SDK. Anthropic’s works differently from OpenAI’s, which works differently from Google’s. Fair enough — they’re different companies with different APIs.

But the downstream effect is brutal. You end up with something that reads roughly like this (simplified, but not by much):

func generateStory(prompt: String, complexity: StoryComplexity) async throws -> String {
    switch complexity {
    case .simple:
        // character names, descriptions — doesn't need a big model
        let gemini = GenerativeModel(name: "gemini-2.0-flash", apiKey: geminiKey)
        let resp = try await gemini.generateContent(prompt)
        return resp.text ?? ""
    case .medium:
        let openai = OpenAI(apiToken: openAIKey)
        let query = ChatQuery(messages: [.init(role: .user, content: .string(prompt))!],
                              model: .gpt4_o_mini)
        return try await openai.chats(query: query).choices.first?.message.content?.string ?? ""
    case .complex:
        // full narrative — needs the good stuff
        let anthropic = Client(apiKey: anthropicKey)
        let msg = try await anthropic.createMessage(.init(model: .claude_3_5_sonnet,
                                                          messages: [.init(role: .user, content: prompt)]))
        return msg.content.first?.text ?? ""
    }
    // TODO: what happens when one of these goes down? what about timeouts?
    //       what about Anthropic's rate limits at 3am? oh yeah.
    //       oh and DALL-E calls for illustrations are in a completely different function.
    //       this is getting out of hand.
}

Three SDKs. Three response types to unwrap. Three sets of API keys in the Keychain (or worse, hardcoded — don’t look at me like that, it was a prototype). Three different error types to handle. Three different rate limit policies to stay under — Anthropic’s per-minute limits are different from OpenAI’s, and I learned this the hard way at 11pm when the story app started throwing 429s mid-bedtime. And the routing logic — “is this prompt complex enough to justify a £0.03 API call, or can I get away with the £0.001 one?” — is scattered through the business logic where it absolutely does not belong.

I looked around for a framework that solved this in Swift: something like LangChain but built natively for iOS, that felt like Swift rather than a Python port, and didn’t drag in a 200MB dependency. I found some options (good ones, actually), but none of them did the specific thing I needed.

What I Found

I spent a weekend evaluating every Swift AI library I could find. Read through source code, checked commit histories, ran the example projects where they existed. Filed two actual issues in other repos while I was at it.

Here’s what I learned:

MacPaw’s OpenAI library is the most mature option at nearly 3,000 stars. If you’re committed to OpenAI exclusively, stop reading and go use it — it covers the full API surface (chat, images, audio, realtime, even the new responses endpoint) and has 80+ contributors behind it. My problem was that I wasn’t committed to OpenAI exclusively.

The Swift AI SDK ports Vercel’s AI SDK to Swift and supports 37 cloud providers. If you’re building server-side Swift in Vapor or if you need niche providers like Moonshot, Together, or xAI, this is where to look. It also already has MCP tool support, which I’m envious of. But it’s cloud-only: no on-device inference, no MLX, no Apple Foundation Models. I also needed an on-device model for voice-over generation using MLX Swift, which saved real money compared with a cloud service. That’s a separate topic I might cover another day.

AIProxySwift takes a different approach entirely. It solves the API key security problem with a split-key encryption proxy — your actual API key never ships in the binary. If key protection is your primary concern and you don’t want to build a backend, it’s a uniquely smart solution. But it’s not trying to unify anything. Each provider still has its own service interface.

And mi12labs’ SwiftAI has a genuinely clean pattern for on-device-to-cloud fallback: SystemLLM.ifAvailable ?? OpenaiLLM. Elegant. But it’s limited to Apple Foundation Models plus OpenAI, and the routing is manual (nil-coalescing, not intelligent scoring).

All solid. But they all shared the same blind spot.

Routing wasn’t a first-class concern in any of them. Not one could look at a prompt and decide, on its own, whether it needed a £0.03 cloud call or could run for free on-device. Spending? Untracked. And the idea of automatically saying “this prompt contains a child’s name, keep it on-device” — nobody had touched that.

They’re solving different problems and solving them well. I still recommend any of them depending on your situation (see the section at the end on when NOT to use Arbiter). But I needed something that sat a layer above individual providers and made intelligent decisions about where to send each request.

So I built one.

Arbiter: The Idea

Arbiter started as a routing layer inside my story app and grew into an open-source framework. The core concept is simple enough: one API for every AI provider, with a router that decides where each request goes.

The architecture has three tiers (cloud, local, and system) behind a single router:

+--------------------------------------------------------+
|                   Intelligent Router                   |
|       Analyse Request -> Match -> Score -> Route       |
+------+-------------+--------------+-------------+------+
       |             |              |             |
   Cloud APIs    Local Server    On-Device ML    Apple FM
  (Anthropic,    (Ollama)        (MLX on Apple   (iOS 26+)
   OpenAI,                        Silicon)
   Gemini)

You write one call. Arbiter picks the provider, whether cloud or local. Here’s what the multi-provider setup actually looks like (and this is the real API, not pseudo-code):

import Arbiter

let ai = try Arbiter {
    try $0.cloud(.anthropic(from: .keychain))
    try $0.cloud(.openAI(from: .keychain))
    try $0.cloud(.gemini(from: .keychain))
    $0.local(OllamaProvider())
    $0.local(MLXProvider(.auto))  // picks the best model for your device's RAM
    // 8GB? Gets Qwen 2.5 7B. 16GB? Gets Qwen 2.5 14B. 32GB+? Llama 3.3 70B.
    $0.system(AppleFoundationProvider())
    $0.routing(.smart)
    $0.spendingLimit(5.00, action: .fallbackToCheaper)
    $0.privacy(.strict)  // enables PII detection
}
// Arbiter analyses the prompt and routes automatically
let story = try await ai.generate(storyPrompt)

No switch statement. No manual provider selection. API keys live in the Keychain; if you try to hardcode them as strings, the compiler will give you a deprecation warning. That last detail made me smile. I've shipped hardcoded keys to TestFlight exactly once and I'd rather not revisit the experience.

The Routing Engine

The honest-to-god reason I kept building this past the “it works for my app” stage is the router. I couldn’t find anything like it in the Swift world and it felt like a gap worth filling.

When a request comes in, Arbiter runs it through a RequestAnalyser before sending anything to any provider. The analyser classifies the prompt across four dimensions: complexity (trivial through expert), task type (classification, code generation, reasoning, translation, creative writing, etc.), expected output length, and estimated cost per available provider.

Let me make this concrete. Here’s roughly what happens when I ask for “a name for a brave rabbit character”:

RequestAnalyser:
  Complexity: trivial (short prompt, single-output task)
  Task type: creative/classification
  Expected output: ~10 tokens

Scoring (with .smart strategy):
  Apple FM:    capability 0.8 + quality 0.5 + latency 1.0 + privacy 1.0 + cost 1.0 = 4.3
  MLX (Qwen):  capability 0.8 + quality 0.6 + latency 0.9 + privacy 1.0 + cost 1.0 = 4.3
  Ollama:      capability 0.8 + quality 0.7 + latency 0.7 + privacy 0.8 + cost 1.0 = 4.0
  Claude:      capability 1.0 + quality 1.0 + latency 0.3 + privacy 0.2 + cost 0.1 = 2.6

Tie-break: both free, so latency wins → Apple FM (1.0) beats MLX (0.9)
Route: Apple Foundation Models

And when I ask for “a 500-word story about a hedgehog who learns to share, with dialogue and a gentle moral”:

RequestAnalyser:
  Complexity: complex (long-form creative, dialogue, emotional arc)
  Task type: creative writing
  Expected output: ~700 tokens

Scoring (with .smart strategy):
  Claude:      capability 1.0 + quality 1.0 + latency 0.3 + privacy 0.2 + cost 0.3 = 2.8
  GPT-4o:      capability 1.0 + quality 0.9 + latency 0.4 + privacy 0.2 + cost 0.3 = 2.8
  Apple FM:    capability 0.3 + quality 0.3 + latency 1.0 + privacy 1.0 + cost 1.0 = 3.6

Apple FM scored higher? Not quite. For complex tasks, the
router applies a complexity multiplier that upweights capability
and quality. After adjustment:

  Claude:      2.8 × complexity_boost = 4.1  ← winner
  Apple FM:    3.6 × complexity_penalty = 2.4

Route: Anthropic Claude

The exact weights change depending on your routing strategy. .costOptimized hammers the cost factor. .privacyFirst hammers privacy. .qualityFirst hammers capability and quality. .smart tries to balance everything.
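The two walkthroughs above can be reproduced with a small weighted-sum sketch. This is not Arbiter's actual implementation: the `ProviderScore` and `Weights` types, the integer-tenths representation, and the 3x weighting for complex tasks are all assumptions made for illustration. Only the individual factor values are taken from the tables above.

```swift
// Hypothetical sketch of weighted provider scoring; not Arbiter's real types.
struct ProviderScore {
    let name: String
    // Factors are stored in tenths (10 means 1.0) so ties compare exactly.
    let capability: Int
    let quality: Int
    let latency: Int
    let privacy: Int
    let cost: Int   // higher means cheaper
}

struct Weights {
    var capability = 1
    var quality = 1
    var latency = 1
    var privacy = 1
    var cost = 1

    static let balanced = Weights()
    // Assumed weighting for complex tasks: capability and quality count triple.
    static let complexTask = Weights(capability: 3, quality: 3)
}

func total(_ s: ProviderScore, _ w: Weights) -> Int {
    let abilityPart = s.capability * w.capability + s.quality * w.quality
    let practicalPart = s.latency * w.latency + s.privacy * w.privacy + s.cost * w.cost
    return abilityPart + practicalPart
}

// Highest total first; ties are broken by cost, then by latency.
func rank(_ candidates: [ProviderScore], weights: Weights) -> [ProviderScore] {
    candidates.sorted { a, b in
        let (ta, tb) = (total(a, weights), total(b, weights))
        if ta != tb { return ta > tb }
        if a.cost != b.cost { return a.cost > b.cost }
        return a.latency > b.latency
    }
}

// "A name for a brave rabbit character"
let trivialCandidates = [
    ProviderScore(name: "Claude", capability: 10, quality: 10, latency: 3, privacy: 2, cost: 1),
    ProviderScore(name: "Ollama", capability: 8, quality: 7, latency: 7, privacy: 8, cost: 10),
    ProviderScore(name: "MLX (Qwen)", capability: 8, quality: 6, latency: 9, privacy: 10, cost: 10),
    ProviderScore(name: "Apple FM", capability: 8, quality: 5, latency: 10, privacy: 10, cost: 10),
]

// "A 500-word story about a hedgehog who learns to share..."
let complexCandidates = [
    ProviderScore(name: "Apple FM", capability: 3, quality: 3, latency: 10, privacy: 10, cost: 10),
    ProviderScore(name: "GPT-4o", capability: 10, quality: 9, latency: 4, privacy: 2, cost: 3),
    ProviderScore(name: "Claude", capability: 10, quality: 10, latency: 3, privacy: 2, cost: 3),
]

let trivialWinner = rank(trivialCandidates, weights: .balanced)[0].name
let complexWinner = rank(complexCandidates, weights: .complexTask)[0].name
print(trivialWinner, complexWinner)
```

With equal weights the trivial prompt ends in a tie that latency resolves in favour of Apple FM; with the assumed complex-task weights, Claude overtakes Apple FM, which is the same outcome the walkthrough describes.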

What makes this work over time is the ProviderPerformanceTracker. It’s an actor (needed for thread-safe concurrent access from multiple requests) that records success rates, latency, and token throughput per provider per task type. After about 10 requests, it has enough data to start adjusting scores automatically based on real-world performance. If Gemini keeps timing out on creative writing tasks but crushes classification, the router learns that and stops making the same mistake. Data currently persists across launches via UserDefaults.
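As a rough illustration of the idea, here is a minimal sketch of a tracker that stays neutral until it has 10 samples and then scales a provider's score by its observed success rate. The names `PerformanceStats` and `PerformanceTracker`, the string key format, and the choice of success rate as the multiplier are assumptions; Arbiter's real tracker also records throughput and persists its data, which this sketch leaves out.

```swift
// Hypothetical sketch of a per-provider, per-task-type performance tracker.
struct PerformanceStats {
    var successes = 0
    var failures = 0
    var totalLatency = 0.0   // seconds, successful calls only

    var sampleCount: Int { successes + failures }

    var successRate: Double {
        sampleCount == 0 ? 1.0 : Double(successes) / Double(sampleCount)
    }

    mutating func record(success: Bool, latency: Double) {
        if success {
            successes += 1
            totalLatency += latency
        } else {
            failures += 1
        }
    }

    // Multiplier applied to a provider's score. It stays neutral (1.0)
    // until there are enough samples to trust the success rate.
    func scoreMultiplier(minimumSamples: Int = 10) -> Double {
        sampleCount < minimumSamples ? 1.0 : successRate
    }
}

// The actor serialises access so concurrent requests can record safely.
actor PerformanceTracker {
    private var stats: [String: PerformanceStats] = [:]

    func record(provider: String, taskType: String, success: Bool, latency: Double) {
        stats["\(provider)/\(taskType)", default: PerformanceStats()]
            .record(success: success, latency: latency)
    }

    func multiplier(provider: String, taskType: String) -> Double {
        stats["\(provider)/\(taskType)"]?.scoreMultiplier() ?? 1.0
    }
}
```

Keeping the arithmetic in a plain struct and the shared state in the actor makes the scoring rule easy to unit-test without any async code.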

The £300 → £40 Breakdown (Since I Put It in the Title)

I should probably substantiate this. Here’s the rough breakdown from my story app, comparing the first four months:

Before Arbiter (November–December, GPT-4o text + DALL-E 3 HD for everything):

Each complete story: text generation (~$0.02) + 3 illustrations at DALL-E 3 HD ($0.08 each = $0.24) ≈ $0.26 per story
My kid’s usage: ~5 stories/day × 30 days = 150 stories/month ≈ $39/month
My development testing: easily another 15–25 test runs/day in the early weeks ≈ $30–50/month
Worst month (November, heavy dev + usage): ~£74 ($94)
Cumulative Nov–Dec: ~£140

The first week, I was testing constantly — dozens of prompt variations, illustration style tweaks, trying different models. That’s where the £94 came from. Pure kid-usage alone was more like £30–35/month, but I couldn’t separate my API key from his at that point. Lesson learned.

After Arbiter (February–March — intelligent routing): Same ~150 stories/month from my son, but now routed:

Simple tasks (character names, mood selection, colour palettes): Apple FM → free
Medium tasks (short character descriptions, scene-setting): Gemini Flash text → ~£0.10/month
Complex text (full stories with dialogue, emotional arc): Claude or GPT-4o → £4/month
Illustrations: Arbiter doesn’t route image generation (that’s still DALL-E directly), but I separately switched to standard quality for background scenes and only kept HD for hero illustrations → £8/month (down from £20)
Monthly total: £12–14/month

The title’s “£300 → £40” is cumulative over the first 4 months: two months of heavy spending, then two months with routing. The per-month improvement was roughly £55 → £13, which is a 76% reduction. The biggest single win was Arbiter routing simple text tasks to Apple FM for free. The second biggest was me realising, separately from Arbiter, that half the illustration requests didn’t need HD quality.
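For anyone who wants to check the per-story and per-month figures, the arithmetic can be written out as a few lines of Swift. The numbers are the ones quoted in the breakdown; the variable names are mine, and working in cents is just a choice to keep the sums exact.

```swift
// Figures quoted in the breakdown, in US cents to avoid rounding noise.
let textCostCents = 2            // ~$0.02 of text generation per story
let hdImageCostCents = 8         // $0.08 per DALL-E 3 HD illustration
let imagesPerStory = 3
let storiesPerMonth = 5 * 30     // ~5 stories/day

let perStoryCents = textCostCents + imagesPerStory * hdImageCostCents
let monthlyDollars = Double(storiesPerMonth * perStoryCents) / 100

// Per-month figures in pounds, before and after routing.
let beforePounds = 55.0
let afterPounds = 13.0
let reductionPercent = (beforePounds - afterPounds) / beforePounds * 100

print("Per story: \(perStoryCents) cents")
print("Kid usage per month: $\(monthlyDollars)")
print("Reduction: \(Int(reductionPercent))%")
```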

The takeaway applies whether or not you use Arbiter: audit your API calls by task type. Most apps send every request to the same model at the same quality tier. Classifying by complexity and routing accordingly, even manually with an if statement, will cut your bill substantially.
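To show how little code the manual version needs, here is a deliberately crude sketch. Everything in it is hypothetical: the `ModelTier` enum, the keyword list, and the 12-word threshold are placeholders you would tune against your own traffic, and none of it reflects how Arbiter classifies prompts.

```swift
import Foundation

// Hypothetical manual router: names, thresholds, and keywords are illustrative.
enum ModelTier {
    case onDevice       // free: names, moods, palettes
    case cheapCloud     // short descriptions, scene-setting
    case premiumCloud   // full narratives
}

func tier(for prompt: String) -> ModelTier {
    let wordCount = prompt.split(separator: " ").count
    let lowered = prompt.lowercased()
    let wantsLongForm = lowered.contains("story") || lowered.contains("dialogue")

    if wantsLongForm { return .premiumCloud }
    if wordCount > 12 { return .cheapCloud }
    return .onDevice
}

let nameTier = tier(for: "a name for a brave rabbit character")
let storyTier = tier(for: "a 500-word story about a hedgehog who learns to share")
print(nameTier, storyTier)
```

Even a rule this blunt separates the cheap requests from the expensive ones, which is where most of the saving comes from.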

What I Tried First

My first attempt at routing was embarrassingly naive. I wrote a protocol called AIProvider, made each SDK conform to it, and then wrote a Router class with a giant priority list:

// Attempt 1: the priority list (December 2024)
// Spoiler: this lasted about a week

protocol AIProvider {
    func generate(_ prompt: String) async throws -> String
}

enum RouterError: Error {
    case noProvider
}

class Router {
    // Lower priority = cheaper provider; the number doubles as a capability tier
    let providers: [(priority: Int, provider: AIProvider)]

    init(providers: [(priority: Int, provider: AIProvider)]) {
        self.providers = providers
    }

    func route(_ prompt: String, complexity: Int) async throws -> String {
        // just pick the cheapest provider that meets the complexity threshold
        let sorted = providers.sorted { $0.priority < $1.priority }
        for entry in sorted where entry.priority >= complexity {
            return try await entry.provider.generate(prompt)
        }
        throw RouterError.noProvider
    }
}

Simple requests go to provider 1, medium to provider 2, complex to provider 3. It worked for about a week.

Then I noticed Gemini was returning worse results for certain types of children’s content. Not bad, just… off. The tone wasn’t warm enough: sentences like “The rabbit proceeded to the forest clearing” instead of “The rabbit hopped through the tall grass until he found his favourite spot.” But my router didn’t care about result quality. It only knew about complexity as a number. So it kept sending children’s story prompts to the cheapest provider, and I kept manually overriding it.

The second attempt used a strategy pattern: you’d define routing strategies and compose them. Better, but the composition logic got hairy fast. What happens when the privacy strategy says “on-device only” but the complexity strategy says “this needs cloud”? I had conflicts everywhere and no clean way to resolve them. The strategies were just fighting each other.

The version that stuck, the one in Arbiter now, treats routing as a weighted scoring system. Each available provider gets scored across multiple dimensions (capability, cost, privacy compliance, latency history, current availability), and the highest-scoring provider wins. Ties are broken by cost first, then latency. If the winner fails, the router falls back to the next-highest scorer automatically.

Not glamorous, but it’s been running in my story app since January. The routing decisions haven’t needed manual overrides so far — though the project is early and I’m certain edge cases are lurking. The RoutingDebugView has been invaluable for catching scoring surprises before they become problems.

SwiftUI Components (The Bit I Built for Debugging and Now Can’t Live Without)

Four SwiftUI views ship with Arbiter. The one that matters most is RoutingDebugView:

import Arbiter

struct DebugScreen: View {
    let ai: Arbiter

    var body: some View {
        VStack {
            // the chat interface
            ArbiterChatView(ai: ai)

            // live routing decisions — shows why each request
            // went where it went
            RoutingDebugView(router: ai.smartRouter)
        }
    }
}

The debug view shows a live feed: timestamp, which provider was selected, the detected complexity and task type, the score breakdown, and what alternatives were considered. It’s the kind of tool I built because I was tired of adding print() statements and it turned out to be genuinely useful for tuning routing weights.

The other three: ArbiterChatView is a drop-in chat interface with streaming animation, provider badges on each response, and dark mode. ProviderPicker shows configured providers with real-time availability status. UsageDashboard renders spending per provider with bar charts; I added this after the £94 shock because I never wanted to be surprised by a bill again. And one modifier I'd call essential: .swiftAILifecycle(ai). Attach it to your root view and Arbiter automatically unloads on-device models when the system reports memory pressure. Without this, MLX models can hold 2–4GB of RAM that your app could reclaim.

Beyond Simple Text Generation

Four features I use daily that I haven’t mentioned yet.

Streaming works exactly how you’d expect — same routing intelligence, just token-by-token.

let stream = ai.stream("Write a bedtime story about a hedgehog named Amber.")
for try await chunk in stream {
    storyText += chunk.delta
}

Every provider has its own streaming format under the hood (Anthropic’s SSE, OpenAI’s chunked responses, MLX’s token callbacks). Arbiter normalises them all into chunk.delta. Your UI code doesn't care which provider was selected.

Structured output is the one that saves me the most boilerplate. Instead of parsing JSON from a raw string response:

struct StoryOutline: Codable {
    let title: String
    let scenes: [String]
    let moral: String
}

// Arbiter handles the JSON schema, parsing, and validation
let outline: StoryOutline = try await ai.generate(
    "Create a 3-scene story about a kid who learns patience",
    as: StoryOutline.self,
    example: StoryOutline(title: "", scenes: [], moral: "")
)

// outline.scenes is a real [String], not a blob of JSON to parse

This works across all providers. No manual JSON extraction, no JSONDecoder boilerplate, no "the model returned malformed JSON" errors at 2 am. The example parameter gives the model a schema hint for complex types.

Conversations maintain state across turns with an @Observable session — your SwiftUI views update automatically:

@State private var session = ConversationSession(
    systemPrompt: "You are a storyteller for children aged 3-6."
)

// later, in your view:
try await session.send("Tell me about a brave rabbit", using: ai)
try await session.send("Now make her find a treasure", using: ai)
// session.messages updates your UI automatically via @Observable

Each message in the conversation gets routed independently. The first turn might go to Apple FM (simple greeting), the next to Claude (complex narrative). The session handles the context window across provider switches.

Tool calling works across any provider that supports it — cloud providers and Apple FM all expose tools. The router factors tool capability into its scoring, so if your request needs tools, it won’t route to a provider that can’t handle them. I use this in the story app for structured character generation: the model calls a createCharacter tool that returns typed data rather than hoping the prose output contains the right fields.

Privacy, Security, and Caching

This was non-negotiable for a kids’ app. If a prompt contains my kid’s name, I don’t want it leaving the device. Full stop.

Arbiter handles this at two levels. At config time, .privacy(.strict) enables the PrivacyGuard, which scans prompts for PII patterns — email addresses, phone numbers, SSN-like sequences, and credit card numbers using regex patterns. Not perfect (regex never is for this), but it catches the obvious stuff.
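To give a feel for what regex-based PII scanning involves, here is a minimal sketch. It is not PrivacyGuard's code: the `PIIKind` enum, the `detectPII` function, and all four patterns are my own simplified assumptions, and they will produce both false positives and misses on real input.

```swift
import Foundation

// Hypothetical regex-based PII scan; the patterns are deliberately simple.
enum PIIKind: CaseIterable {
    case email, phone, ssn, creditCard

    var pattern: String {
        switch self {
        case .email:      return #"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"#
        case .phone:      return #"\+?\d[\d\s\-]{7,}\d"#
        case .ssn:        return #"\b\d{3}-\d{2}-\d{4}\b"#
        case .creditCard: return #"\b(?:\d[ -]?){13,16}\b"#
        }
    }
}

// Returns every kind of PII whose pattern matches somewhere in the prompt.
func detectPII(in prompt: String) -> Set<PIIKind> {
    Set(PIIKind.allCases.filter { kind in
        prompt.range(of: kind.pattern, options: .regularExpression) != nil
    })
}

let flagged = detectPII(in: "Send the story to amber@example.com")
let clean = detectPII(in: "a name for a brave rabbit character")
print(flagged.contains(.email), clean.isEmpty)
```

Note that patterns like these overlap (an SSN-shaped string also looks like a short phone number), which is fine when any match at all is enough to keep the prompt on-device.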

At request time, you can tag prompts explicitly:

// .health tag forces on-device routing — the prompt never hits a cloud API
let options = RequestOptions(tags: [.health])
let response = try await ai.generate("Summarise my blood pressure readings", options: options)

Built-in tags: .private, .health, .financial, .personal. Or define your own with RequestTag("legal"). Any tagged request is constrained to on-device providers. If no on-device provider is available, the request fails rather than silently sending sensitive data to the cloud. I'd rather show an error than leak PII.
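The fail-closed behaviour described above boils down to a filter plus a guard. The sketch below uses hypothetical types (`Location`, `Candidate`, `PrivacyRoutingError`) rather than Arbiter's own, and collapses tags into a single `isSensitive` flag to keep it short.

```swift
// Hypothetical sketch of tag-based routing; type names are illustrative.
enum Location { case onDevice, cloud }

struct Candidate {
    let name: String
    let location: Location
}

enum PrivacyRoutingError: Error { case noOnDeviceProvider }

func eligibleProviders(from all: [Candidate], isSensitive: Bool) throws -> [Candidate] {
    guard isSensitive else { return all }
    let local = all.filter { $0.location == .onDevice }
    // Fail closed: a tagged request never falls back to the cloud.
    guard !local.isEmpty else { throw PrivacyRoutingError.noOnDeviceProvider }
    return local
}

let mixed = [
    Candidate(name: "Claude", location: .cloud),
    Candidate(name: "Apple FM", location: .onDevice),
]
let cloudOnly = [Candidate(name: "Claude", location: .cloud)]

let sensitiveNames = (try? eligibleProviders(from: mixed, isSensitive: true))?.map { $0.name }
let blocked = try? eligibleProviders(from: cloudOnly, isSensitive: true)
print(sensitiveNames ?? [], blocked == nil)
```

Throwing instead of returning an empty list forces the caller to handle the "no safe provider" case explicitly.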

The security layer goes deeper than routing. Arbiter ships with a RequestSanitiserMiddleware that catches prompt injection attempts ("ignore previous instructions…"), enforces rate limits, blocks oversized prompts, and rejects empty inputs. The LoggingMiddleware automatically redacts API keys and bearer tokens from log output — so sk-ant-api03-abc... becomes sk-ant-***REDACTED*** even at verbose log levels. For a kids' app, I sleep better knowing both of these are on by default.

$0.middleware(LoggingMiddleware(logLevel: .standard))
$0.middleware(RequestSanitiserMiddleware(requestsPerMinute: 30))

There’s also a ResponseCache that I added after realising my kid asks for "the rabbit story" almost every other night. Same prompt, same response, no reason to hit the API twice:

let cache = ResponseCache(maxEntries: 500, ttl: .seconds(300))
// or persist across sessions:
let diskCache = ResponseCache(maxEntries: 1000, ttl: .seconds(600), persistence: .disk)

AI providers get expensive quickly, and caching repeated prompts is a well-known way to save. That cache alone probably saves me another £2–3/month.
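If you want the same effect without Arbiter, a prompt-keyed cache with a time-to-live is only a few lines. This is a sketch under my own assumptions, not the ResponseCache implementation: the `PromptCache` name, the oldest-entry eviction rule, and the injectable `now` parameter (there to make expiry testable) are all hypothetical.

```swift
import Foundation

// Hypothetical TTL cache keyed by prompt; not Arbiter's ResponseCache.
struct PromptCache {
    private struct Entry {
        let response: String
        let storedAt: Date
    }

    let maxEntries: Int
    let ttl: TimeInterval
    private var entries: [String: Entry] = [:]

    init(maxEntries: Int, ttl: TimeInterval) {
        self.maxEntries = maxEntries
        self.ttl = ttl
    }

    mutating func store(_ response: String, for prompt: String, now: Date = Date()) {
        // When full, evict the oldest entry before adding a new key.
        if entries[prompt] == nil, entries.count >= maxEntries,
           let oldest = entries.min(by: { $0.value.storedAt < $1.value.storedAt }) {
            entries[oldest.key] = nil
        }
        entries[prompt] = Entry(response: response, storedAt: now)
    }

    mutating func response(for prompt: String, now: Date = Date()) -> String? {
        guard let entry = entries[prompt] else { return nil }
        if now.timeIntervalSince(entry.storedAt) > ttl {
            entries[prompt] = nil   // expired, drop it
            return nil
        }
        return entry.response
    }
}

var cache = PromptCache(maxEntries: 2, ttl: 300)
let t0 = Date(timeIntervalSince1970: 0)
cache.store("Once upon a time...", for: "the rabbit story", now: t0)
let hit = cache.response(for: "the rabbit story", now: t0.addingTimeInterval(100))
let expired = cache.response(for: "the rabbit story", now: t0.addingTimeInterval(301))
print(hit ?? "miss", expired ?? "miss")
```

Exact-match keys only help when the prompt text is identical, which happens to fit a child asking for the same story by the same name.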

When Providers Fail (Rate Limits, Outages and Exhausted Quotas)

If you’ve shipped an app with AI, you’ve written the “hope and retry” pattern: catch the 429, sleep for a bit, try again, eventually give up and... show an error? Try a different provider with completely separate code? It's ugly, it's brittle, and the next request goes right back to the rate-limited provider because your app learned nothing.

After breaking this twice in testing, I ended up settling on three fallback layers.

Retry with backoff. If Anthropic returns a 429, Arbiter retries with exponential backoff (500ms, 1s, 2s). Configurable: $0.retry(maxAttempts: 3, baseDelay: .milliseconds(500), maxDelay: .seconds(30)).

Automatic fallback. If all retries fail, the request goes to the next-highest-scoring provider. If that fails, it tries the next. Your app code sees none of this — it just gets a response, slightly slower than usual.

Adaptive learning. The ProviderPerformanceTracker records every failure. If Anthropic starts rate-limiting frequently, its success rate drops, the tracker applies a score penalty, and future requests naturally route away until it recovers. The router doesn't just retry — it learns which provider is having a bad day.
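The first two layers can be sketched together. This is an illustration, not Arbiter's code: `TextProvider`, `RetryPolicy`, `FallbackError`, and `generateWithFallback` are hypothetical names, and the sketch retries on every error, whereas a real implementation would probably retry only on rate limits and transient failures.

```swift
// Hypothetical sketch of retry-with-backoff followed by provider fallback.
protocol TextProvider {
    func generate(_ prompt: String) async throws -> String
}

struct RetryPolicy {
    var maxAttempts = 3
    var baseDelay = 0.5    // seconds
    var maxDelay = 30.0    // seconds

    // Delay before retry number `attempt` (0-based): doubles each time, capped.
    func delay(forAttempt attempt: Int) -> Double {
        min(baseDelay * Double(1 << attempt), maxDelay)
    }
}

enum FallbackError: Error { case noProviders }

// `rankedProviders` is assumed to be sorted best-first by the router.
func generateWithFallback(
    _ prompt: String,
    rankedProviders: [any TextProvider],
    policy: RetryPolicy = RetryPolicy()
) async throws -> String {
    var lastError: Error = FallbackError.noProviders
    for provider in rankedProviders {
        for attempt in 0..<policy.maxAttempts {
            do {
                return try await provider.generate(prompt)
            } catch {
                lastError = error
                let isLastAttempt = attempt == policy.maxAttempts - 1
                if !isLastAttempt {
                    let nanos = UInt64(policy.delay(forAttempt: attempt) * 1_000_000_000)
                    try await Task.sleep(nanoseconds: nanos)
                }
            }
        }
        // All retries failed for this provider; move on to the next one.
    }
    throw lastError
}

let defaultPolicy = RetryPolicy()
let delays = (0..<3).map { defaultPolicy.delay(forAttempt: $0) }
print(delays)
```

The third layer, adaptive learning, would sit outside this function: each caught error is also a data point for the performance tracker.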

And the budget version: when you’ve blown through your daily API spend, SpendingGuard drops cloud scores to zero and requests fall through to free on-device providers:

$0.spendingLimit(5.00, action: .fallbackToCheaper)

No error for the user — just slightly different responses. My story app does this with a £2/day cap. I’ve had nights where both Anthropic and OpenAI were misbehaving and the app quietly served stories from Apple FM the entire evening. I only found out from the RoutingDebugView logs the next morning.
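The budget rule is simple enough to sketch in a few lines. The `SpendTracker` type below is a hypothetical stand-in, not SpendingGuard itself, and it ignores details such as resetting the counter at midnight.

```swift
// Hypothetical sketch of a daily spending cap; not Arbiter's SpendingGuard.
struct SpendTracker {
    let dailyLimit: Double
    var spentToday = 0.0

    var isOverBudget: Bool { spentToday >= dailyLimit }

    mutating func record(cost: Double) {
        spentToday += cost
    }

    // Cloud scores drop to zero once the cap is hit; free providers keep theirs.
    func adjusted(score: Double, isCloud: Bool) -> Double {
        (isCloud && isOverBudget) ? 0 : score
    }
}

var budget = SpendTracker(dailyLimit: 2.0)
let cloudBefore = budget.adjusted(score: 2.8, isCloud: true)
budget.record(cost: 2.0)
let cloudAfter = budget.adjusted(score: 2.8, isCloud: true)
let onDeviceAfter = budget.adjusted(score: 3.6, isCloud: false)
print(cloudBefore, cloudAfter, onDeviceAfter)
```

Because the cap is applied as a score adjustment rather than a hard error, the normal ranking and fallback logic keeps working unchanged once the budget runs out.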

When NOT to Use Arbiter

This is important. Arbiter is the wrong choice if:

You only need one provider. If your entire app runs on GPT-4o and you’re happy with that, use MacPaw’s OpenAI library. It’s more mature, better tested, and has a far larger community behind it (80+ contributors). Adding a routing layer for one provider is overhead with no benefit.

You’re building server-side Swift. Arbiter is designed for client-side iOS/macOS/visionOS. If you need a Vapor backend that calls 37 different cloud providers, the Swift AI SDK is the better fit.

API key security is your main concern. If your primary worry is keeping API keys out of your app binary, AIProxySwift has a more sophisticated solution (split-key proxy with DeviceCheck) than anything Arbiter offers. You could use both together, but that’s probably overkill for most apps.

You want the simplest possible on-device → cloud fallback. If all you need is “use Apple FM when available, fall back to OpenAI when it’s not,” mi12labs’ SwiftAI does this in one line: SystemLLM.ifAvailable ?? OpenaiLLM. Arbiter's scoring engine is overkill for this use case.

Use Arbiter when you need two or more of: multiple providers, intelligent routing based on complexity, privacy-based routing, cost tracking with budgets, or on-device + cloud in the same app. That’s the sweet spot.

Some Limitations and Gaps

The framework is at v0.1. That’s early. I’ve been using it in my own app for five months, but one app is not a stress test. Things that aren’t shipped yet: MCP client support, CLI, conversation persistence across app launches.

Test coverage is decent but not where I want it. The routing logic has solid unit tests. The provider integrations are mostly tested against mocks, not live APIs (because running Claude on every CI pass gets expensive fast). I’d love help here, to be frank. If you’ve written integration test harnesses for AI APIs in Swift, I’m all ears.

The complexity analyser uses heuristics I tuned for children’s creative content. They may not generalise perfectly to code generation or medical Q&A. The adaptive tracker should compensate over time, but your first few dozen requests might route suboptimally until it learns your traffic patterns.

Reason to Open-Source It

This was never supposed to be a product. I built it because I needed it and I open-sourced it because the alternative was letting it rot inside a private repo where it helps exactly one person.

When Apple announced the Foundation Models framework at WWDC 2025 (I watched the “Meet the Foundation Models framework” session three times trying to understand the guided generation API), it confirmed what I’d already suspected: on-device AI is a first-class citizen now, not an afterthought. The number of providers a typical app might want is going up, not down. And right now, the answer to “how do I use all of these?” is either “glue code” or “vendor lock-in.”

Arbiter is a third option. MIT-licensed, and I’m actively looking for contributors — provider integrations, routing improvements, docs, or just filing issues when things break. There’s a Contributing Guide in the repo with good-first-issue labels for anyone who wants a low-friction entry point.

Getting Started

Add Arbiter via Swift Package Manager (requires Swift 6.0+, targets iOS 17+ / macOS 14+ / visionOS 1+, Xcode 16+):

// Package.swift
dependencies: [
    .package(url: "https://github.com/sabby3861/Arbiter.git", from: "0.1.0")
]

The quickest way to see routing in action without spending a penny on API keys:

import Arbiter
import SwiftUI

struct QuickStartView: View {
    @State private var ai: Arbiter?

    var body: some View {
        Group {
            if let ai {
                ArbiterChatView(ai: ai)
            } else {
                ProgressView("Loading model...")
            }
        }
        .task {
            ai = try? Arbiter {
                $0.system(AppleFoundationProvider())  // on-device only, free
                $0.routing(.smart)
            }
        }
    }
}

This runs entirely on-device using Apple Foundation Models (iOS 26+ with Apple Intelligence). No API keys, no network, no cost. From there, add cloud providers one by one and watch the RoutingDebugView show how the router's decisions change.

The repo has more complete examples in the Examples/ directory. The README covers multi-provider setup, structured output, conversations, middleware, and the full routing strategy API.

My kid still uses the story app most nights. He has no idea there’s a routing engine deciding whether his hedgehog adventure should be generated on-device or in a data centre somewhere in Virginia. He just knows the stories come faster now and I stopped muttering about API bills at breakfast.

That’s the whole point, really. He shouldn’t have to know.

Sanjay is an iOS developer based in London. He built Arbiter after discovering that adding AI to a children’s app required more SDKs than the app had screens. He’s looking for contributors — especially anyone with experience in on-device ML, MCP protocol, or just a strong opinion about routing algorithms. Find him at @sabby3861 on GitHub.
