How I added a production AI feature to Naked — a psychology-based dating app — using Flutter, Firebase Cloud Functions, and the Claude API.
The product problem
Naked matches people on psychological compatibility: users complete assessments (attachment style, communication style, Big Five…) and get matched on the results. But self-reported questionnaires have a known weakness — people answer how they want to appear, not how they are. A rule-based scoring engine can compute "your attachment style is anxious"; it cannot notice that question 4 and question 11 contradict each other.
That pattern-level analysis is exactly what LLMs are good at. The feature: after completing any assessment, the user can request an AI Insight that analyzes the raw answer pattern — strengths, growth areas, a dating tip, and a consistency analysis that flags contradictions, fence-sitting (all-neutral answers), and social-desirability bias. Once the user's match completes the same test, the insight upgrades with a couple-dynamics comparison.
Architecture
Flutter app (quiz UI, BLoC + Clean Architecture)
│ httpsCallable — Firebase Auth token attached automatically
▼
Firebase Cloud Function (Node.js, the only holder of the API key)
│ auth gate → payload validation → cache check → Claude call
▼
Claude API (claude-opus-4-8, schema-enforced structured output)
│
▼
Firestore (insight persisted per user/test + token-usage audit trail)
Five decisions I'd defend in any review
The API key never ships in the binary. Anything inside an APK/IPA can be extracted in minutes. The app only talks to a Firebase callable function that requires an authenticated user; the Anthropic key is stored in Google Cloud Secret Manager (bound to the function through Firebase's secrets support) and exists only inside the function's runtime.
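A minimal TypeScript sketch of that gate, not the production code. `CallableLikeRequest` is a local stand-in that mirrors the optional `auth` field a Firebase callable handler receives, and the secret name `ANTHROPIC_API_KEY` is an assumption. The environment is passed in as a parameter so the snippet runs without the Firebase SDK.

```typescript
// Local stand-in for the request a callable handler receives (assumption:
// only the fields used here are modelled).
interface CallableLikeRequest<T> {
  auth?: { uid: string }; // absent when no verified Firebase Auth token was sent
  data: T;
}

// Auth gate: reject before any paid work happens.
function requireUid(request: CallableLikeRequest<unknown>): string {
  if (!request.auth) {
    throw new Error("unauthenticated: sign-in required");
  }
  return request.auth.uid;
}

// The key is read from the function's runtime environment, never from the client.
// "ANTHROPIC_API_KEY" is a hypothetical secret name.
function readApiKey(env: Record<string, string | undefined>): string {
  const key = env["ANTHROPIC_API_KEY"];
  if (!key) {
    throw new Error("ANTHROPIC_API_KEY is not available in this runtime");
  }
  return key;
}
```

In a deployed function the second call would likely receive `process.env`; the client never sees the value either way.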
Structured outputs instead of "please reply in JSON". The Claude call uses a JSON schema the API enforces — including enums for the consistency verdict and confidence. No regex extraction, no retry-on-malformed-JSON, and the Flutter side parses straight into typed entities.
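To make the idea concrete, here is a hedged sketch of what such a schema and its typed counterpart could look like. The field names and enum values are illustrative assumptions based on the feature description (strengths, growth areas, a dating tip, a consistency analysis), not the app's real schema. How the schema is attached to the Claude request is deliberately left out.

```typescript
// Hypothetical enum values; the real verdict and confidence labels may differ.
const VERDICTS = ["consistent", "mixed", "inconsistent"] as const;
const CONFIDENCE_LEVELS = ["low", "medium", "high"] as const;

// JSON Schema describing the insight the model must return.
const insightSchema = {
  type: "object",
  additionalProperties: false,
  required: ["strengths", "growthAreas", "datingTip", "consistency"],
  properties: {
    strengths: { type: "array", items: { type: "string" } },
    growthAreas: { type: "array", items: { type: "string" } },
    datingTip: { type: "string" },
    consistency: {
      type: "object",
      additionalProperties: false,
      required: ["verdict", "confidence", "notes"],
      properties: {
        verdict: { type: "string", enum: VERDICTS },
        confidence: { type: "string", enum: CONFIDENCE_LEVELS },
        notes: { type: "string" },
      },
    },
  },
} as const;

// The TypeScript view of the same shape.
interface Insight {
  strengths: string[];
  growthAreas: string[];
  datingTip: string;
  consistency: {
    verdict: (typeof VERDICTS)[number];
    confidence: (typeof CONFIDENCE_LEVELS)[number];
    notes: string;
  };
}

// With the schema enforced upstream, parsing is a plain JSON.parse:
// no regex extraction and no repair loop.
function parseInsight(json: string): Insight {
  return JSON.parse(json) as Insight;
}
```

Deriving the enum types from the same constants the schema uses keeps the two definitions from drifting apart.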
Cost discipline from day one. On a paid path, never trust the client: the function caps answer counts and string lengths so a tampered client can't inflate token spend. Responses are cached in Firestore keyed by a SHA-256 hash of the inputs, so an identical request is served from the cache without a new Claude call. Every generation stores its token usage, so spend is auditable per user from the console.
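The payload caps could be sketched roughly like this. The limits (`MAX_ANSWERS`, `MAX_TEXT_LENGTH`) and the `Answer` shape are assumptions chosen for illustration; the real function's numbers and field names are not given here.

```typescript
// Hypothetical limits; tune to the longest legitimate assessment.
const MAX_ANSWERS = 60;
const MAX_TEXT_LENGTH = 200;

interface Answer {
  questionId: string;
  value: string;
}

// Validate untrusted client data before it can reach the model.
// Throws on anything that would inflate the prompt beyond the caps.
function validateAnswers(raw: unknown): Answer[] {
  if (!Array.isArray(raw) || raw.length === 0 || raw.length > MAX_ANSWERS) {
    throw new Error("invalid-argument: answer count out of range");
  }
  return raw.map((item, index) => {
    const candidate = item as Partial<Answer> | null;
    if (
      !candidate ||
      typeof candidate.questionId !== "string" ||
      typeof candidate.value !== "string"
    ) {
      throw new Error("invalid-argument: answer " + index + " is malformed");
    }
    if (
      candidate.questionId.length > MAX_TEXT_LENGTH ||
      candidate.value.length > MAX_TEXT_LENGTH
    ) {
      throw new Error("invalid-argument: answer " + index + " is too long");
    }
    // Copy only the known fields so extra client-supplied keys are dropped.
    return { questionId: candidate.questionId, value: candidate.value };
  });
}
```

Returning a fresh object per answer also means any unexpected fields a tampered client adds never reach the prompt.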
Cache invalidation by design, not by flag. The cache hash includes the partner's result. Same answers + no partner and same answers + partner data are different hashes — so when the match completes the test, a refresh regenerates the insight automatically. The hash is the staleness check.
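A small sketch of the hash-as-staleness-check idea, using Node's built-in `crypto` module. The `InsightInputs` shape, the `inputHash` field name, and the choice to sort answers before hashing are assumptions made for the example.

```typescript
import { createHash } from "node:crypto";

interface InsightInputs {
  testId: string;
  answers: { questionId: string; value: string }[];
  partnerResult: string | null; // null until the match completes the same test
}

// Build a canonical string (fixed key order, answers sorted by question id)
// and hash it, so equal inputs always map to the same key.
function cacheKey(inputs: InsightInputs): string {
  const sorted = inputs.answers
    .slice()
    .sort((a, b) => (a.questionId < b.questionId ? -1 : a.questionId > b.questionId ? 1 : 0));
  const canonical = JSON.stringify({
    testId: inputs.testId,
    answers: sorted.map((a) => [a.questionId, a.value]),
    partnerResult: inputs.partnerResult,
  });
  return createHash("sha256").update(canonical).digest("hex");
}

// A stored insight is stale exactly when its hash no longer matches the inputs.
function isStale(stored: { inputHash: string } | undefined, inputs: InsightInputs): boolean {
  return stored === undefined || stored.inputHash !== cacheKey(inputs);
}
```

Because `partnerResult` is part of the hashed string, the moment it changes from `null` to a value the stored hash stops matching and the next refresh regenerates the insight.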
A privacy boundary the model must respect. The consistency analysis is private: Firestore rules make insights readable only by their owner, and the system prompt explicitly forbids leaking candour observations into the shared partner-dynamics text. Only the partner's computed result is ever sent to the model — never their raw answers.
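One way to enforce the "computed result only" half of that boundary is at the point where the model input is assembled. The sketch below is an assumption about how that could be structured; `AssessmentRecord`, `ModelInput`, and `buildModelInput` are hypothetical names, and the Firestore rules and system prompt are not shown.

```typescript
interface AssessmentRecord {
  rawAnswers: { questionId: string; value: string }[];
  computedResult: string; // e.g. the scored attachment style
}

// What is allowed to reach the model. There is deliberately no field
// that could carry the partner's raw answers.
interface ModelInput {
  userAnswers: { questionId: string; value: string }[];
  partnerResult: string | null;
}

function buildModelInput(
  user: AssessmentRecord,
  partner: AssessmentRecord | null
): ModelInput {
  return {
    userAnswers: user.rawAnswers,
    // Only the partner's computed result crosses the boundary.
    partnerResult: partner ? partner.computedResult : null,
  };
}
```

Making the restriction part of the type means a later change that tries to pass partner answers has to change `ModelInput` first, which is easy to spot in review.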
Process
TDD: the BLoC tests were written before the implementation — 8 tests covering success, failure, double-tap guarding, stored-insight loading, and partner-aware refresh, with the repository mocked.
Inputs persisted, not just outputs: every quiz submit snapshots the raw answers, so insights can be regenerated later (e.g. when the partner finishes the test) without retaking the quiz.
Production debugging: the first deploy failed transiently mid-creation, leaving the callable without its public-invoker IAM binding (403 at the HTTP layer before any code runs). Diagnosed with a curl smoke test, fixed with one gcloud IAM command — a reminder that "deployed" isn't "verified".
Stack
Flutter · BLoC + Clean Architecture · Firebase (Auth, Firestore, Cloud Functions, Secret Manager) · Claude API (claude-opus-4-8, structured outputs) · Node.js · mocktail/bloc_test
Vlad Vladescu — Senior Mobile Engineer (Flutter & Android), ex-British Telecom (EE app, 12M+ users). Open to senior Flutter / mobile-AI roles, remote EU. linkedin.com/in/vlad-vladescu-180733121