How I Built a Privacy-First AI App With Zero Backend — iOS Architecture Deep Dive

#ios #architecture #swift #privacy

Most AI apps today are thin wrappers around cloud APIs. Snaap is the opposite — 100% on-device processing, zero backend, zero API calls.

Here's the complete architecture and why I made each decision.

The Stack

┌─────────────────────────────────────────┐
│                SwiftUI                   │
│  Splash · Scan · Home · Settings · Share │
├─────────────────────────────────────────┤
│                UIKit                     │
│     Inbox Card Stack + Swipe Gestures    │
├─────────────────────────────────────────┤
│            Processing Layer              │
│  PhotoKit → Vision → Classifier → Engine │
├─────────────────────────────────────────┤
│              GRDB / SQLite               │
│        Local persistence, migrations      │
└─────────────────────────────────────────┘

Why Hybrid SwiftUI + UIKit?

SwiftUI for static screens — Splash, Home, Settings. Declarative, fast to build, easy to maintain.

UIKit for the Inbox — The card stack with swipe gestures needs UIPanGestureRecognizer with spring physics. UIKit gives precise control over animation timing, transform matrices, and gesture state machines. SwiftUI gestures can't match this fluidity (yet).
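
To make the gesture handling concrete, here is a minimal sketch of a swipeable card. The class name, the 120-point threshold, and the spring parameters are assumptions for illustration, not Snaap's actual values.

```swift
import UIKit

// Hypothetical card view; names and thresholds are illustrative.
final class SwipeCardView: UIView {
    /// Called with true for a right swipe, false for a left swipe.
    var onSwipe: ((Bool) -> Void)?
    private let dismissThreshold: CGFloat = 120

    override init(frame: CGRect) {
        super.init(frame: frame)
        let pan = UIPanGestureRecognizer(target: self, action: #selector(handlePan(_:)))
        addGestureRecognizer(pan)
    }

    required init?(coder: NSCoder) { fatalError("init(coder:) is not supported") }

    @objc private func handlePan(_ pan: UIPanGestureRecognizer) {
        let offset = pan.translation(in: superview)
        switch pan.state {
        case .changed:
            // Follow the finger and tilt slightly in the drag direction.
            transform = CGAffineTransform(translationX: offset.x, y: offset.y)
                .rotated(by: offset.x / 1000)
        case .ended, .cancelled:
            if pan.state == .ended, abs(offset.x) > dismissThreshold {
                onSwipe?(offset.x > 0)
            } else {
                // Spring back to the resting position.
                UIView.animate(withDuration: 0.5, delay: 0,
                               usingSpringWithDamping: 0.7,
                               initialSpringVelocity: 0.5,
                               options: [.allowUserInteraction],
                               animations: { self.transform = .identity },
                               completion: nil)
            }
        default:
            break
        }
    }
}
```

The gesture's state enum is what makes this easy to reason about: the transform tracks the finger while the state is .changed, and the release decides between committing the swipe and springing back.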

The bridge goes both ways: UIHostingController wraps SwiftUI views for UIKit navigation, and UIViewControllerRepresentable embeds the UIKit inbox in the SwiftUI tab navigation.
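
Both directions of the bridge can be sketched in a few lines. InboxViewController, InboxView, HomeView, RootView, and pushSettings are hypothetical names, and the tab layout is an assumption.

```swift
import SwiftUI
import UIKit

// Hypothetical stand-in for the UIKit inbox controller.
final class InboxViewController: UIViewController {}

// Embeds the UIKit inbox in a SwiftUI hierarchy.
struct InboxView: UIViewControllerRepresentable {
    func makeUIViewController(context: Context) -> InboxViewController {
        InboxViewController()
    }

    func updateUIViewController(_ controller: InboxViewController, context: Context) {
        // Nothing to push down from SwiftUI in this sketch.
    }
}

struct HomeView: View {
    var body: some View { Text("Home") }
}

struct RootView: View {
    var body: some View {
        TabView {
            HomeView().tabItem { Label("Home", systemImage: "house") }
            InboxView().tabItem { Label("Inbox", systemImage: "tray") }
        }
    }
}

// The other direction: push a SwiftUI screen onto a UIKit navigation stack.
func pushSettings(from navigationController: UINavigationController) {
    let host = UIHostingController(rootView: Text("Settings"))
    navigationController.pushViewController(host, animated: true)
}
```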

Why GRDB and Not Core Data?

SQL. I want to write queries, not predicates.
Migrations. GRDB DatabaseMigrator is clean and explicit.
Performance. ValueObservation gives me reactive queries, so the UI updates when rows change without manual refetching.
Safety. FetchableRecord decodes rows straight into typed Swift structs.

The schema is dead simple: one table, screenshot, with 15 columns. No relationships. No complex fetches. Core Data would be overkill.
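
As a rough sketch of how these pieces fit together, assuming GRDB 6: the three columns shown (assetID, ocrText, inboxState), the migration name, and the helper functions are guesses for illustration. The real table has 15 columns that the post doesn't list.

```swift
import GRDB

// Hypothetical record; only 3 of the 15 columns are shown, and their names are guesses.
struct Screenshot: Codable, FetchableRecord, PersistableRecord {
    static let databaseTableName = "screenshot"
    var assetID: String      // PhotoKit local identifier
    var ocrText: String?     // cached OCR output
    var inboxState: String   // e.g. "pending"
}

func openDatabase(at path: String) throws -> DatabaseQueue {
    let dbQueue = try DatabaseQueue(path: path)
    var migrator = DatabaseMigrator()
    // Each migration runs once, in registration order.
    migrator.registerMigration("v1_createScreenshot") { db in
        try db.create(table: "screenshot") { t in
            t.column("assetID", .text).primaryKey()
            t.column("ocrText", .text)
            t.column("inboxState", .text).notNull().defaults(to: "pending")
        }
    }
    try migrator.migrate(dbQueue)
    return dbQueue
}

// Re-runs the fetch and calls onChange whenever the tracked rows change.
func observePending(in dbQueue: DatabaseQueue,
                    onChange: @escaping ([Screenshot]) -> Void) -> AnyDatabaseCancellable {
    let observation = ValueObservation.tracking { db in
        try Screenshot
            .filter(Column("inboxState") == "pending")
            .fetchAll(db)
    }
    return observation.start(
        in: dbQueue,
        onError: { error in print("observation failed:", error) },
        onChange: onChange
    )
}
```

The returned cancellable has to be kept alive for as long as updates are wanted; dropping it stops the observation.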

The Processing Pipeline

1. Ingestion (PhotoKit)

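import Photos

// iOS tags screenshots natively, so no ML is needed to find them
let options = PHFetchOptions()
options.predicate = NSPredicate(
    format: "mediaType == %d AND (mediaSubtypes & %d) != 0",
    PHAssetMediaType.image.rawValue,
    PHAssetMediaSubtype.photoScreenshot.rawValue
)
let screenshots = PHAsset.fetchAssets(with: options)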

New screenshots are observed via PHPhotoLibraryChangeObserver and auto-added to the inbox.
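
A minimal sketch of the observer side, assuming the fetch result from the screenshot predicate is handed in. ScreenshotWatcher and its callback are hypothetical names.

```swift
import Photos

// Hypothetical watcher; class and callback names are illustrative.
final class ScreenshotWatcher: NSObject, PHPhotoLibraryChangeObserver {
    private var fetchResult: PHFetchResult<PHAsset>
    private let onNewScreenshots: ([PHAsset]) -> Void

    init(fetchResult: PHFetchResult<PHAsset>,
         onNewScreenshots: @escaping ([PHAsset]) -> Void) {
        self.fetchResult = fetchResult
        self.onNewScreenshots = onNewScreenshots
        super.init()
        PHPhotoLibrary.shared().register(self)
    }

    deinit {
        PHPhotoLibrary.shared().unregisterChangeObserver(self)
    }

    // Photos calls this on an arbitrary background queue.
    func photoLibraryDidChange(_ changeInstance: PHChange) {
        guard let details = changeInstance.changeDetails(for: fetchResult) else { return }
        // Keep the newest snapshot so the next diff is computed against it.
        fetchResult = details.fetchResultAfterChanges
        let inserted = details.insertedObjects
        guard !inserted.isEmpty else { return }
        DispatchQueue.main.async { self.onNewScreenshots(inserted) }
    }
}
```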

2. OCR (Vision Framework)

VNRecognizeTextRequest with recognitionLevel set to .accurate extracts all text. It runs on a background concurrent queue, and results are cached in SQLite so the same asset is never OCR'd twice.
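
The OCR step could look roughly like this on the iOS 15 SDK or later. The queue label and the recognizeText helper are assumptions. The cache lookup is left to the caller, which would check SQLite before calling in.

```swift
import Vision
import CoreGraphics

// Hypothetical queue and helper names.
let ocrQueue = DispatchQueue(label: "ocr", qos: .userInitiated, attributes: .concurrent)

func recognizeText(in image: CGImage,
                   completion: @escaping (Result<String, Error>) -> Void) {
    ocrQueue.async {
        let request = VNRecognizeTextRequest()
        request.recognitionLevel = .accurate
        let handler = VNImageRequestHandler(cgImage: image, options: [:])
        do {
            // perform is synchronous, which is why this runs off the main thread.
            try handler.perform([request])
            let lines = (request.results ?? []).compactMap { observation in
                observation.topCandidates(1).first?.string
            }
            completion(.success(lines.joined(separator: "\n")))
        } catch {
            completion(.failure(error))
        }
    }
}
```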

3. Classification + Context (Rule Engine)

Keyword + regex classifier → template-based sentence engine. See my other post for why I chose rules over LLMs.
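
To illustrate the general shape of a keyword-plus-regex classifier feeding a sentence template, here is a small sketch. The categories, keywords, pattern, and sentence templates are all invented for the example and are not Snaap's actual rules.

```swift
import Foundation

// Hypothetical categories and rules, for illustration only.
enum Category: String { case voucher, receipt, other }

struct Rule {
    let category: Category
    let keywords: [String]
    let pattern: String?   // optional regex that also counts as a match
}

let rules = [
    Rule(category: .voucher,
         keywords: ["voucher", "promo code", "coupon"],
         pattern: #"\b\d{1,2}% off\b"#),
    Rule(category: .receipt,
         keywords: ["receipt", "order total"],
         pattern: nil),
]

// First matching rule wins; rule order encodes priority.
func classify(_ text: String) -> Category {
    let lowered = text.lowercased()
    for rule in rules {
        if rule.keywords.contains(where: { lowered.contains($0) }) {
            return rule.category
        }
        if let pattern = rule.pattern,
           lowered.range(of: pattern, options: .regularExpression) != nil {
            return rule.category
        }
    }
    return .other
}

// Template-based sentence for the card.
func describe(_ category: Category, expiry: String?) -> String {
    switch category {
    case .voucher:
        if let expiry = expiry { return "Voucher that expires on \(expiry)." }
        return "Voucher with no expiry found."
    case .receipt:
        return "Receipt for a purchase."
    case .other:
        return "Uncategorized screenshot."
    }
}
```

A rule table like this stays deterministic and easy to audit: every classification can be traced back to the keyword or pattern that fired.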

4. Dedup + Expiry (Utility)

Perceptual hashing for duplicate detection. Regex date extraction for expiry checking.
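
Both utilities can be sketched in plain Swift. This uses an average hash over an 8x8 grayscale thumbnail as a simple stand-in for pHash, not the DCT-based algorithm itself. The distance threshold of 5 and the dd/MM/yyyy date format are assumptions.

```swift
import Foundation

// Average hash over 64 grayscale values (an 8x8 downscale of the image).
// A simple stand-in for pHash; producing the thumbnail is out of scope here.
func averageHash(_ pixels: [UInt8]) -> UInt64 {
    precondition(pixels.count == 64, "expects an 8x8 grayscale thumbnail")
    let mean = pixels.reduce(0) { $0 + Int($1) } / 64
    var hash: UInt64 = 0
    for (index, pixel) in pixels.enumerated() where Int(pixel) > mean {
        hash |= UInt64(1) << UInt64(index)
    }
    return hash
}

// Two images count as duplicates when their hashes differ in only a few bits.
func isDuplicate(_ a: UInt64, _ b: UInt64, maxDistance: Int = 5) -> Bool {
    (a ^ b).nonzeroBitCount <= maxDistance
}

// Finds the first dd/MM/yyyy date in OCR text (the format is an assumption).
func extractExpiry(from text: String) -> Date? {
    guard let range = text.range(of: #"\b\d{2}/\d{2}/\d{4}\b"#,
                                 options: .regularExpression) else { return nil }
    let formatter = DateFormatter()
    formatter.locale = Locale(identifier: "en_US_POSIX")
    formatter.timeZone = TimeZone(identifier: "UTC")
    formatter.dateFormat = "dd/MM/yyyy"
    return formatter.date(from: String(text[range]))
}

func isExpired(_ expiry: Date, now: Date = Date()) -> Bool {
    expiry < now
}
```

Comparing hashes by Hamming distance rather than equality is what lets near-identical screenshots (a re-crop, a changed status bar) still count as duplicates.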

5. Store (GRDB)

Everything is persisted locally. An inboxState enum tracks whether each screenshot is pending, kept, deleted, archived, or snoozed.
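
A sketch of how such a state enum might look. The five cases come from the description above. The SwipeAction type and the swipe-to-state mapping are purely hypothetical.

```swift
// Raw string values keep the stored column readable when inspecting SQLite.
enum InboxState: String, Codable, CaseIterable {
    case pending, kept, deleted, archived, snoozed
}

// Hypothetical swipe actions; the mapping below is a guess, not Snaap's.
enum SwipeAction { case left, right, up, down }

func nextState(after action: SwipeAction) -> InboxState {
    switch action {
    case .right: return .kept
    case .left:  return .deleted
    case .up:    return .archived
    case .down:  return .snoozed
    }
}
```

Because the enum is backed by String, it should be possible to store it in a text column as-is and filter on its raw value in queries.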

Privacy by Design

No analytics SDK yet (if I add one, it will be TelemetryDeck, which is privacy-first)
No crash reporter that uploads data
No network calls — not even for config
No user accounts — state lives in local SQLite
Photo permission: the user grants it explicitly and can revoke it anytime
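
For the photo permission point, the request could be sketched like this on iOS 14 or later. requestPhotoAccess is a hypothetical helper, and the app's Info.plist also needs an NSPhotoLibraryUsageDescription entry for the prompt to appear.

```swift
import Photos

// Hypothetical helper; reports whether scanning can proceed.
func requestPhotoAccess(completion: @escaping (Bool) -> Void) {
    PHPhotoLibrary.requestAuthorization(for: .readWrite) { status in
        // The handler may run on a background queue, so hop to main for UI work.
        DispatchQueue.main.async {
            switch status {
            case .authorized, .limited:
                completion(true)
            default:
                completion(false)   // denied, restricted, or not determined
            }
        }
    }
}
```

Treating .limited as usable matters here: the user may share only a selection of photos, and the app should work with whatever it is given.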

What I'd Do Differently

The hybrid UIKit/SwiftUI bridge was fiddly. For v2, I'd consider SwiftUI-only if Apple improves gesture APIs.
pHash should run on a lower-priority queue — it blocked the OCR queue briefly.
Multi-language OCR support would be nice (Vision supports many languages, I just need to expose the option).
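
The queue-priority fix mentioned above could be as small as giving each stage its own queue with a different quality of service. The labels and the process helper are hypothetical.

```swift
import Dispatch

// Hypothetical queue labels; the QoS split is the point.
let ocrQueue = DispatchQueue(label: "snaap.ocr", qos: .userInitiated, attributes: .concurrent)
let hashQueue = DispatchQueue(label: "snaap.phash", qos: .utility)

func process(assetID: String,
             ocr: @escaping (String) -> Void,
             hash: @escaping (String) -> Void) {
    ocrQueue.async { ocr(assetID) }     // user-visible: text shows up on the card
    hashQueue.async { hash(assetID) }   // housekeeping: dedup can lag behind
}
```

With separate queues, hashing work no longer sits in front of OCR work, and the lower QoS lets the system schedule it behind user-facing tasks.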

Snaap is free on the App Store: https://apps.apple.com/app/snaap-voucher-reminder-ai/id6770817204