Most AI apps today are thin wrappers around cloud APIs. Snaap is the opposite — 100% on-device processing, zero backend, zero API calls.
Here's the complete architecture and why I made each decision.
The Stack
┌─────────────────────────────────────────┐
│ SwiftUI │
│ Splash · Scan · Home · Settings · Share │
├─────────────────────────────────────────┤
│ UIKit │
│ Inbox Card Stack + Swipe Gestures │
├─────────────────────────────────────────┤
│ Processing Layer │
│ PhotoKit → Vision → Classifier → Engine │
├─────────────────────────────────────────┤
│ GRDB / SQLite │
│ Local persistence, migrations │
└─────────────────────────────────────────┘
Why Hybrid SwiftUI + UIKit?
SwiftUI for static screens — Splash, Home, Settings. Declarative, fast to build, easy to maintain.
UIKit for the Inbox — The card stack with swipe gestures needs UIPanGestureRecognizer with spring physics. UIKit gives precise control over animation timing, transform matrices, and gesture state machines. SwiftUI gestures can't match this fluidity (yet).
The bridge: UIHostingController wraps SwiftUI views for UIKit navigation, UIViewControllerRepresentable embeds the UIKit inbox in SwiftUI tab navigation.
Why GRDB and Not Core Data?
- SQL. I want to write queries, not predicates.
-
Migrations. GRDB
DatabaseMigratoris clean and explicit. -
Performance. GRDB's
ValueObservationfor reactive queries. -
Safety. Type-safe queries via
FetchableRecord.
The schema is dead simple — one table screenshot with 15 columns. No relationships. No complex fetches. Core Data would be overkill.
The Processing Pipeline
1. Ingestion (PhotoKit)
// iOS natively tags screenshots — no ML needed
options.predicate = NSPredicate(
format: "mediaType == %d AND (mediaSubtypes & %d) != 0",
PHAssetMediaType.image.rawValue,
PHAssetMediaSubtype.photoScreenshot.rawValue
)
New screenshots are observed via PHPhotoLibraryChangeObserver and auto-added to the inbox.
2. OCR (Vision Framework)
VNRecognizeTextRequest with .accurate level extracts all text. Runs on a background concurrent queue. Results cached to SQLite — never re-OCR the same asset.
3. Classification + Context (Rule Engine)
Keyword + regex classifier → template-based sentence engine. See my other post for why I chose rules over LLMs.
4. Dedup + Expiry (Utility)
Perceptual hashing for duplicate detection. Regex date extraction for expiry checking.
5. Store (GRDB)
Everything persisted locally. inboxState enum tracks pending/kept/deleted/archived/snoozed.
Privacy by Design
- No analytics SDK (yet — TelemetryDeck is privacy-first when I add it)
- No crash reporter that uploads data
- No network calls — not even for config
- No user accounts — state lives in local SQLite
- Photo permission — user explicitly grants, can revoke anytime
What I'd Do Differently
- The hybrid UIKit/SwiftUI bridge was fiddly. For v2, I'd consider SwiftUI-only if Apple improves gesture APIs.
- pHash should run on a lower-priority queue — it blocked the OCR queue briefly.
- Multi-language OCR support would be nice (Vision supports many languages, I just need to expose the option).
Snaap is free on the App Store: https://apps.apple.com/app/snaap-voucher-reminder-ai/id6770817204
Top comments (0)