This is a little story about how I added on-device AI to my macOS app, Look, using Apple's Foundation Models framework. I want to share what was good, what was painful, some real code, and why I think building on this now is a real opportunity even though the model is still small.
First, what is Look?
Look is a keyboard launcher for macOS. You press a hotkey, you type, and it finds your apps, files and folders super fast. The search core is written in Rust, so it is quick.
But here is the thing: people don't really think in keywords. They think in questions, like "that pdf i downloaded last week", or "what is 18% of 240", or "who painted Guernica". Normal keyword search can't really help with that.
So I added a small AI layer to Look with Apple Intelligence. My one rule was simple: AI is best effort and it must never block or slow down the search. If the model is not available, not ready, or it just fails, then Look works exactly like before. Nothing breaks.
Why on-device and not a cloud LLM?
For a launcher, three things matter a lot. It should be private (your file names never leave the Mac), it should be free (I hit this tool like 100 times a day, I can't pay per token), and it should start fast (no network round trip). Apple's Foundation Models framework gives you a ~3B-parameter model running locally with a clean Swift API. For a launcher, that is just the right shape.
Ref: https://developer.apple.com/documentation/foundationmodels
How I actually integrated it into Look
One important decision first. In Look, ALL the Foundation Models code lives behind one provider type. The rest of the app never imports FoundationModels directly, it only talks to my own little abstraction. So the on-device model is just one swappable "tier", and later I can add a cloud model without touching everything.
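To make the "one swappable tier" idea concrete, here is a minimal sketch of what such a boundary could look like. The names `AIProvider`, `AIProviderChain` and `StubProvider` are my own illustrative assumptions, not Look's real types. The only point is that callers pick the first usable tier and never see which framework sits behind it.

```swift
// Hypothetical provider abstraction. Every name here is illustrative,
// not Look's actual code.
protocol AIProvider {
    var name: String { get }
    var isAvailable: Bool { get }
    /// Rewrites a natural-language query; may throw or be slow.
    func rewrite(query: String) async throws -> String
}

/// Ordered list of tiers, e.g. on-device first, a cloud tier later.
struct AIProviderChain {
    let tiers: [any AIProvider]

    /// The first tier that reports itself usable, or nil when AI is off.
    var active: (any AIProvider)? {
        tiers.first { $0.isAvailable }
    }
}

/// Stand-in tier for the example; a real one would wrap FoundationModels.
struct StubProvider: AIProvider {
    let name: String
    let isAvailable: Bool
    func rewrite(query: String) async throws -> String { query }
}

let chain = AIProviderChain(tiers: [
    StubProvider(name: "on-device", isAvailable: false),
    StubProvider(name: "cloud", isAvailable: true),
])
print(chain.active?.name ?? "no AI tier available")
```

With a shape like this, adding a new tier means adding one conforming type and one array entry, and the search code stays untouched.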
1. Always check availability first
Apple Intelligence is not always there. Maybe the user has old hardware, maybe they didn't turn it on, maybe the model is still downloading. You can read all of this from SystemLanguageModel.default.availability, and you must handle every case.
#if canImport(FoundationModels)
import FoundationModels
#endif

var availability: AIProviderAvailability {
    #if canImport(FoundationModels)
    guard #available(macOS 26, *) else {
        return .unavailable(.requiresNewerOS)
    }
    switch SystemLanguageModel.default.availability {
    case .available:
        return .available
    case .unavailable(.deviceNotEligible):
        return .unavailable(.requiresNewerOS)
    case .unavailable(.appleIntelligenceNotEnabled):
        return .unavailable(.appleIntelligenceNotEnabled) // "Turn on Apple Intelligence in System Settings."
    case .unavailable(.modelNotReady):
        return .unavailable(.modelNotReady) // "The on-device model is still downloading."
    case .unavailable(let other):
        return .unavailable(.other("\(other)"))
    @unknown default:
        return .unavailable(.other("Unknown availability state"))
    }
    #else
    return .unavailable(.requiresNewerOS)
    #endif
}
The trick here is two layers. #if canImport(FoundationModels) keeps the code compiling when the SDK doesn't ship the framework, and #available(macOS 26, *) guards the runtime, so the app can keep a lower deployment target. In Look every AI entry point checks availability.isAvailable and just returns nil if it is false. The AI toggle in Settings shows a green check or a reason, but the search itself never breaks.
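The app-side AIProviderAvailability type is not shown above, so here is a guess at its shape, reconstructed only from how the snippet uses it. The nested `Reason` name, the `settingsMessage` property and the wording for `.requiresNewerOS` are assumptions. The two other messages are the ones from the comments in the code.

```swift
// Guessed shape of the app-side availability type, reconstructed from
// how it is used in the availability snippet. Look's real definition may differ.
enum AIProviderAvailability: Equatable {
    enum Reason: Equatable {
        case requiresNewerOS
        case appleIntelligenceNotEnabled
        case modelNotReady
        case other(String)
    }

    case available
    case unavailable(Reason)

    /// Entry points check this and return nil when it is false.
    var isAvailable: Bool {
        self == .available
    }

    /// Text for the Settings toggle; nil means "show the green check".
    var settingsMessage: String? {
        switch self {
        case .available:
            return nil
        case .unavailable(.appleIntelligenceNotEnabled):
            return "Turn on Apple Intelligence in System Settings."
        case .unavailable(.modelNotReady):
            return "The on-device model is still downloading."
        case .unavailable(.requiresNewerOS):
            return "Requires a newer macOS or an eligible Mac." // placeholder wording
        case .unavailable(.other(let detail)):
            return detail
        }
    }
}
```

Keeping the reason as data (instead of a plain Bool) is what lets the Settings screen explain itself while the search path only ever asks `isAvailable`.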
Ref: https://developer.apple.com/documentation/foundationmodels/systemlanguagemodel
2. Structured output with @Generable (this is the best part)
Okay, this is where Foundation Models really shines. Instead of parsing the model text by hand, you just describe the shape of the output with @Generable and @Guide, and the framework gives you back exactly that type. In Look I use it to turn a vague query like "the folder where i keep invoices" into a typed plan that my Rust engine can run.
@available(macOS 26, *)
@Generable
private struct EngineQueryPlan {
    @Guide(description: "The kind of thing the user wants to find.")
    let kind: PlanKind

    @Guide(description: "Just the keywords to search for, with filler words removed.")
    let searchText: String

    @Generable
    enum PlanKind: String {
        case app, file, folder, recent, any
    }
}

// guided generation, no string parsing. the result IS an EngineQueryPlan
let session = LanguageModelSession(instructions: Self.instructions)
let response = try await session.respond(
    to: query,
    generating: EngineQueryPlan.self
)
return response.content.asIntent() // -> Rust prefix grammar, e.g. app -> a"..., file -> f"...
No JSON prompt engineering, no parsing, no begging the model to "please answer in valid JSON". You just get a typed and validated struct. This removes a whole category of bugs, honestly.
In Look this only runs as a rescue. When the fast local search returns zero results, I rewrite the natural language query into engine grammar and search again. If local search already found something, the model never runs. So you never pay latency for the queries that were already fine.
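The rescue flow can be sketched in a few lines. This is only an illustration under assumptions: `localSearch` and `rewrite` are hypothetical stand-ins for the Rust engine and the guided-generation call, and results are plain strings to keep the example small.

```swift
/// Rescue flow: run the fast local search first and only ask the model
/// to rewrite the query when that search comes back empty.
/// `localSearch` and `rewrite` are hypothetical stand-ins for the Rust
/// engine and the guided-generation call.
func searchWithRescue(
    query: String,
    localSearch: (String) -> [String],
    rewrite: (String) async -> String?
) async -> [String] {
    let direct = localSearch(query)
    if !direct.isEmpty { return direct }   // fast path, the model never runs

    // Best effort: a nil rewrite means the model was unavailable or failed.
    guard let rewritten = await rewrite(query), rewritten != query else {
        return []
    }
    return localSearch(rewritten)
}
```

Because the rewrite returns an optional, "no model", "model failed" and "model had nothing better to say" all collapse into the same outcome: the user sees the same empty result they would have seen without AI.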
3. Streaming short answers
For the inline answer card (like "what is the capital of Norway"), I stream the response so the text appears word by word, and I cap the length so the answer stays small and launcher sized.
func answer(query: String) -> AsyncThrowingStream<String, Error>? {
    AsyncThrowingStream { continuation in
        let task = Task {
            do {
                let session = LanguageModelSession(instructions: Self.answerInstructions)
                let options = GenerationOptions(maximumResponseTokens: 220)
                for try await snapshot in session.streamResponse(to: query, options: options) {
                    if Task.isCancelled { break }
                    continuation.yield(snapshot.content) // snapshot.content is cumulative
                }
                continuation.finish()
            } catch {
                continuation.finish(throwing: error)
            }
        }
        continuation.onTermination = { _ in task.cancel() }
    }
}
Two small details worth stealing. Each streamed snapshot.content is the cumulative answer (not a delta), and the on-device model is the last fallback in Look. Calculator and web sources are tried first, the model only runs when nothing else hits.
Ref: https://developer.apple.com/documentation/foundationmodels/languagemodelsession
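Since every snapshot is cumulative, the simplest consumer just replaces the displayed text with the latest snapshot. If you need only the new part (for example to animate incoming words), a small helper like the one below could work. `newText` is a hypothetical name, and this is a sketch, not Look's code.

```swift
/// Given the previous and current cumulative snapshots, returns only the
/// newly generated tail. If the new snapshot no longer starts with the
/// old one, returns nil so the caller can replace the whole string
/// instead of appending.
func newText(previous: String, current: String) -> String? {
    guard current.hasPrefix(previous) else { return nil }
    return String(current.dropFirst(previous.count))
}

// Usage with hand-written snapshots standing in for a real stream.
var shown = ""
for snapshot in ["Oslo", "Oslo is the", "Oslo is the capital of Norway."] {
    if let tail = newText(previous: shown, current: snapshot) {
        print("append:", tail)
    } else {
        print("replace with:", snapshot)
    }
    shown = snapshot
}
```

The nil branch matters: appending a cumulative snapshot by mistake is the classic bug here, and it shows up as the answer repeating itself on screen.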
4. Prewarm to hide the latency
The first call to the model has a noticeable startup cost. A launcher is opened and closed all the time, so I prewarm() a resident session the moment the window opens or the query starts to look like a sentence.
@MainActor
final class AppleIntelligenceWarmer {
    static let shared = AppleIntelligenceWarmer()
    private var session: LanguageModelSession?

    func prewarm() {
        let warm = session ?? LanguageModelSession()
        session = warm
        warm.prewarm()
    }
}
Ref: https://developer.apple.com/documentation/foundationmodels/languagemodelsession/prewarm()
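What "starts to look like a sentence" means is a judgment call. One possible heuristic is sketched below. The function name, the word list and the thresholds are all assumptions for illustration, not what Look actually ships.

```swift
/// Hypothetical heuristic for "the query starts to look like a sentence".
/// The thresholds and word list are assumptions for illustration only.
func looksLikeSentence(_ query: String) -> Bool {
    let words = query.split(whereSeparator: \.isWhitespace)
    guard let first = words.first else { return false }

    let questionWords: Set<String> = ["what", "who", "where", "when", "why", "how", "which"]
    // Either a question opener followed by something, or three or more words.
    if words.count >= 2 && questionWords.contains(first.lowercased()) {
        return true
    }
    return words.count >= 3
}
```

The idea would be to call the warmer the first time this flips to true while the user is typing, so plain app names like "safari" never wake the model.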
The advantages (what is genuinely great)
- Private by default. Everything runs on the Mac. For a launcher that sees every file name you type, this is not a nice to have, this is the whole reason I felt okay to ship AI at all.
- Zero marginal cost. No API key, no billing, no rate limit. I can let the model run on every qualifying query and not watch a meter.
- Really nice Swift API. The @Generable guided generation is the star. Getting a typed and validated struct straight from the model is so good.
- It is already on the device. I don't bundle a multi-gigabyte model in my app, the OS ships and updates the weights.
- Clean and composable. Sessions, instructions, streaming, options, prewarm. The pieces just snap together into a real feature.
The disadvantages (what still hurts, honestly)
- Availability is a moving target. It needs macOS 26 and eligible hardware, the user has to enable it, and the model can still be downloading. A big chunk of your users just won't have it. So you must design the no-AI path as the default, not the exception.
- It is a small model. ~3B is great for classification, rewriting and short factual answers, but it is clearly weaker than big cloud models at reasoning, long writing or niche knowledge. Scope your feature to what a small model does well.
- First-use latency. Without prewarm the first token is slow. Even with it, on-device inference is not free. For a sub-100 ms tool this is a real thing to budget around.
- Guardrails can surprise you. Generation can get refused or interrupted by safety guardrails. You have to catch those errors and degrade nicely. In Look it just falls back to the raw query.
- macOS 26 only and the API is young. You are building on a brand new surface that will keep changing. The #if canImport plus #available gymnastics is the price of being early.
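The latency and guardrail points both come down to the same pattern: give the model call a time budget, and treat a thrown error or a missed budget exactly like "no AI". Here is a minimal sketch of that pattern with structured concurrency. `bestEffort` is a hypothetical helper, and any budget value you pass is your own assumption to tune. It uses Duration, so it needs macOS 13 or later.

```swift
/// Runs `operation` but gives up after `budget`, returning `fallback`
/// on timeout or on any thrown error (for example a guardrail refusal).
/// Hypothetical helper, not part of FoundationModels.
func bestEffort<T: Sendable>(
    budget: Duration,
    fallback: T,
    _ operation: @escaping @Sendable () async throws -> T
) async -> T {
    await withTaskGroup(of: T?.self) { group in
        // Child 1: the real work. Errors are folded into nil.
        group.addTask { try? await operation() }
        // Child 2: the timer. It always produces nil.
        group.addTask {
            _ = try? await Task.sleep(for: budget)
            return nil
        }
        // Whichever child finishes first decides the outcome.
        let winner = await group.next()
        group.cancelAll()
        // Note: the group still waits for cancelled children to exit,
        // so `operation` should respond to cancellation promptly.
        return (winner ?? nil) ?? fallback
    }
}

// Usage sketch: `rewriteWithModel` is a hypothetical async throwing call.
// let text = await bestEffort(budget: .milliseconds(400), fallback: rawQuery) {
//     try await rewriteWithModel(rawQuery)
// }
```

The caller gets a plain value back either way, which keeps the "never blocking" rule in one place instead of spreading do/catch blocks over every entry point.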
Why building on it now is an opportunity, not a risk

Okay, here is the main point I wanted to make. The local model only gets better, and my code doesn't have to change. Apple ships and upgrades the on-device weights through OS updates. The same LanguageModelSession call I write today will, one year from now, be backed by a smarter and faster model. For free. With zero work from me. The @Generable plan I defined keeps returning the same type, it just gets filled in more accurately.
That really changes the math. Adopting Foundation Models early is not a bet that today's small model is good enough for everything, because it clearly is not. It is a bet that the floor rises under you. The integration work (the provider abstraction, the availability handling, the prewarm, the fallback design) is the hard part, and it is a one-time cost. Once Look speaks fluent Foundation Models, every OS update becomes a silent capability upgrade.
So I am shipping on-device AI in Look today. Scoped tight to what a 3B model does well, always optional, never blocking. And I treat it like infrastructure that I grow into, not a feature I have to perfect right now. If you are building a Mac app, that is the opportunity. Get the plumbing right while the API is still young, and ride the model improvements for free.
Thanks for reading. If you wanna try it, go grab Look and turn on the AI toggle in Settings.
Reference links
- Foundation Models framework: https://developer.apple.com/documentation/foundationmodels
- SystemLanguageModel: https://developer.apple.com/documentation/foundationmodels/systemlanguagemodel
- LanguageModelSession: https://developer.apple.com/documentation/foundationmodels/languagemodelsession
- Guided generation (@Generable / @Guide): https://developer.apple.com/documentation/foundationmodels/generating-swift-data-structures-with-guided-generation
- HIG: Generative AI: https://developer.apple.com/design/human-interface-guidelines/generative-ai
- WWDC25, Meet the Foundation Models framework: https://developer.apple.com/videos/play/wwdc2025/286/

