A while ago I wrote a full llama.cpp implementation for iOS, with an Objective-C++ bridge, because I wanted one thing:
image in → structured JSON out, no cloud required.
It worked. It was fast enough. It was also a lot of plumbing:
- XCFramework builds
- ObjC++ bridge
- tokenizer/eval/sampling internals
- model + projector file choreography
- JSON guardrails everywhere
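For contrast, here is roughly what one of those guardrails looks like. A local model will happily wrap its JSON in chatter or Markdown fences, so the raw completion gets trimmed to the outermost braces and decoded, returning nil instead of crashing when the model drifts off-schema. This is a minimal sketch: `decodeModelJSON` is a hypothetical helper for illustration, not the exact code from that project.

```swift
import Foundation

/// Hypothetical guardrail: pull the outermost {...} span out of raw model
/// text (which may include prose or Markdown code fences) and decode it.
/// Returns nil when there is no JSON object or it doesn't match the schema.
func decodeModelJSON<T: Decodable>(_ raw: String, as type: T.Type) -> T? {
    guard let start = raw.firstIndex(of: "{"),
          let end = raw.lastIndex(of: "}"),
          start < end else {
        return nil  // no JSON object in the output at all
    }
    let candidate = raw[start...end]
    // Decode failures (missing keys, wrong types) become nil, not a crash.
    return try? JSONDecoder().decode(T.self, from: Data(candidate.utf8))
}
```

Every call site then had to decide what "nil" meant: retry, re-prompt, or give up.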
Now, about 6 months later, Apple has shipped Foundation Models image analysis in the Xcode 27.0 beta, and I can finally call a serious on-device model without running that whole engine room myself.
With Foundation Models, the core API is basically:
```swift
import FoundationModels

// Output schema: generation is constrained to fill exactly these fields.
@Generable
struct ReceiptExtraction: Codable {
    var vendor_name: String
    var transaction_date: String
    var total_amount: Double
    var currency: String
    var category: String
    var line_items: [String]
}

// `cgImage` is the captured receipt photo (a CGImage).
let session = LanguageModelSession(model: .default)
let response = try await session.respond(
    generating: ReceiptExtraction.self,
    options: GenerationOptions(
        sampling: .random(top: 20, seed: 1111),  // top-k sampling with a fixed seed
        temperature: 0.1,                         // stay close to deterministic for extraction
        maximumResponseTokens: 384
    )
) {
    """
    Extract receipt information for bookkeeping.
    Return schema-compliant structured output only.
    Format fields for QuickBooks ingestion.
    """
    Attachment(cgImage, orientation: .right)  // the receipt image itself
}
let result = response.content  // a typed ReceiptExtraction, not a string
```
Receipt image in → QuickBooks-ready JSON out.
No bridge.
No gguf.
No mmproj.
No custom decode loop.
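The session hands back a typed `ReceiptExtraction`, not a string; the actual JSON comes from its `Codable` conformance. A minimal sketch of that last step, assuming the payload is simply the struct's fields (the snake_case property names become the JSON keys). The struct is repeated without `@Generable` so the snippet compiles on its own, `receiptJSON` is a hypothetical helper, and the sample values are made up.

```swift
import Foundation

// Mirrors ReceiptExtraction minus @Generable so this builds without
// FoundationModels. In the app, use the @Generable struct instead.
struct ReceiptExtraction: Codable {
    var vendor_name: String
    var transaction_date: String
    var total_amount: Double
    var currency: String
    var category: String
    var line_items: [String]
}

/// Hypothetical helper: encode an extraction as pretty-printed JSON
/// with sorted keys, so the output is stable between runs.
func receiptJSON(_ receipt: ReceiptExtraction) throws -> String {
    let encoder = JSONEncoder()
    encoder.outputFormatting = [.prettyPrinted, .sortedKeys]
    let data = try encoder.encode(receipt)
    return String(decoding: data, as: UTF8.self)
}

// Made-up sample values standing in for `response.content`.
let sample = ReceiptExtraction(
    vendor_name: "Corner Cafe",
    transaction_date: "2025-06-12",
    total_amount: 8.75,
    currency: "USD",
    category: "Meals",
    line_items: ["Latte", "Croissant"]
)
print(try receiptJSON(sample))
```

In the app this collapses to `try receiptJSON(response.content)`; no brace-trimming or retry logic is needed because the schema was enforced during generation.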
Before
- llama.cpp vendor management
- ObjC++ wrappers and thread safety
- bespoke schema/prompt failover handling
- app startup warmups, with model files shipped in the bundle
Now
- native LanguageModelSession
- native Attachment(...) for images
- native structured generation with @Generable
- native prewarm and model availability checks
- native profiling in Instruments
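The availability check and prewarm replace that whole startup-warmup dance. A sketch of how they might be wired up at launch, using `SystemLanguageModel.default.availability` and `LanguageModelSession.prewarm()`; the function name `makeReceiptSession` is hypothetical, and what the unavailable branch does (hide the feature, fall back to manual entry) is up to the app.

```swift
import FoundationModels

/// Hypothetical launch-time helper: returns a prewarmed session when the
/// on-device model can run, or nil so the UI can hide receipt scanning.
func makeReceiptSession() -> LanguageModelSession? {
    switch SystemLanguageModel.default.availability {
    case .available:
        let session = LanguageModelSession(model: .default)
        // Load model resources ahead of the first request to cut latency.
        session.prewarm()
        return session
    case .unavailable(let reason):
        // e.g. .deviceNotEligible, .appleIntelligenceNotEnabled, .modelNotReady
        print("Foundation Models unavailable: \(reason)")
        return nil
    }
}
```

Handing back nil rather than throwing keeps the decision about a fallback path in the UI layer.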
And that is exactly where multimodal inference should have lived from day one.