Two years ago I was in CBT (cognitive behavioral therapy) for anxiety and depression.
The homework — keeping a thought diary between sessions — never worked for me.
Typing during a panic attack is impossible. I'd end up filling it in from memory
two hours before the session. A thought record loses most of its value that way.
Voice was the obvious fix. On-device AI was the missing piece.
What I built
Mentalium is a voice CBT diary with offline AI transcription. Press record,
speak the five steps of a CBT thought record — situation, automatic thought,
emotion, reaction, alternative thought — and the AI fills every field. No typing.
No cloud. Your voice never leaves the device.
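To make the five-field shape concrete, here is a minimal sketch of the thought record as a data structure. It is written in Python for brevity rather than the app's Dart. The post does not describe how the transcript gets mapped to fields, so `record_from_segments` and its input shape (five already-separated text segments, one per step) are assumptions for illustration only.

```python
from dataclasses import dataclass, asdict

# The five steps named in the post, in the order they are spoken.
STEPS = ("situation", "automatic_thought", "emotion", "reaction", "alternative_thought")


@dataclass
class ThoughtRecord:
    situation: str = ""
    automatic_thought: str = ""
    emotion: str = ""
    reaction: str = ""
    alternative_thought: str = ""


def record_from_segments(segments):
    """Build a record from five transcribed segments, one per step.

    Assumption: the transcript has already been split into exactly five parts.
    """
    if len(segments) != len(STEPS):
        raise ValueError(f"expected {len(STEPS)} segments, got {len(segments)}")
    return ThoughtRecord(**{step: text.strip() for step, text in zip(STEPS, segments)})
```

Calling `asdict(record)` on the result gives a plain dictionary keyed by step name, which is convenient for storage or export.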
The tech stack
Flutter + Dart for the UI and app logic. One codebase for iOS and Android,
which matters for a solo founder.
Whisper.cpp for transcription. The ggml-small-q5_1 model (~180 MB)
downloads once on first launch, then runs fully offline forever.
- On iOS: compiled with Metal GPU support and called through a Swift bridge. CoreML wasn't flexible enough for the quantized model weights I needed.
- On Android: JNI bridge to the native C++ library. Getting this to compile cleanly for arm64-v8a and x86_64 took a while.
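The "downloads once, then runs offline" behaviour of the model can be sketched as a check-before-fetch helper. This is an illustration in Python, not the app's Dart code. `ensure_model` and the `fetch` callback are hypothetical names, and the temporary `.part` file is one possible way to avoid treating an interrupted download as a complete model.

```python
from pathlib import Path


def ensure_model(model_path, fetch):
    """Return the model path, calling fetch(dest) only if the file is not on disk yet."""
    path = Path(model_path)
    if path.exists() and path.stat().st_size > 0:
        return path  # already downloaded: no network access needed
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_name(path.name + ".part")
    fetch(tmp)         # hypothetical downloader that writes the weights to tmp
    tmp.replace(path)  # rename last, so a partial file never sits at the final path
    return path
```

Every launch after the first takes the early-return branch, so the transcription path never depends on connectivity.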
SQLite (via sqflite) for local storage. Every diary entry stays on-device —
no backend sync, no analytics on the content.
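A rough picture of what local-only storage looks like, using Python's built-in `sqlite3` module as a stand-in for sqflite. The table name, column names, and helper functions are assumptions made for this sketch; the post does not show the real schema.

```python
import sqlite3

# Hypothetical schema: one row per diary entry, one column per CBT step.
SCHEMA = """
CREATE TABLE IF NOT EXISTS entries (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    created_at TEXT NOT NULL DEFAULT (datetime('now')),
    situation TEXT NOT NULL,
    automatic_thought TEXT NOT NULL,
    emotion TEXT NOT NULL,
    reaction TEXT NOT NULL,
    alternative_thought TEXT NOT NULL
)
"""


def open_diary(db_path):
    """Open (or create) the local database file and make sure the table exists."""
    conn = sqlite3.connect(db_path)
    conn.execute(SCHEMA)
    return conn


def save_entry(conn, entry):
    """Insert one entry given as a dict keyed by step name; return its row id."""
    cur = conn.execute(
        "INSERT INTO entries "
        "(situation, automatic_thought, emotion, reaction, alternative_thought) "
        "VALUES (:situation, :automatic_thought, :emotion, :reaction, :alternative_thought)",
        entry,
    )
    conn.commit()
    return cur.lastrowid
```

Nothing here opens a network connection: the database is a single file in the app's private storage.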
AES-256-GCM encryption applies only when the user emails an Excel report
to their therapist. That's the only outbound transmission.
The hard parts
Model size vs. accuracy tradeoff. The tiny Whisper model is fast but misses
nuance, which matters when the content being transcribed is emotional. The small-q5_1
quantized model hits the sweet spot: ~180 MB on disk, good accuracy across
all 7 supported languages, and it runs in real time on a 2020 iPhone.
First launch UX. Downloading 180 MB on first open is a bad experience
if you don't set expectations. I added a progress bar with explicit copy:
"Downloading the AI model — this happens once, then it works offline forever."
Language detection. The app supports 7 languages. Whisper handles multilingual
transcription well, but I had to build language-specific keyword matching
for the cognitive distortion analysis (catastrophizing, mind-reading, etc.)
across EN/DE/FR/ES/IT/PT-BR/RU.
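Language-specific keyword matching of this kind might look roughly like the sketch below. The keyword lists are invented for illustration and cover only two of the seven languages; they are not the app's real vocabulary, and `detect_distortions` is a hypothetical name.

```python
import re

# Hypothetical keyword lists, for illustration only.
DISTORTION_KEYWORDS = {
    "en": {
        "catastrophizing": ["disaster", "ruined", "worst"],
        "mind_reading": ["they think", "everyone thinks"],
    },
    "de": {
        "catastrophizing": ["katastrophe", "ruiniert"],
        "mind_reading": ["alle denken", "sie denken"],
    },
}


def detect_distortions(text, lang):
    """Return sorted distortion labels whose keywords occur in text for that language."""
    lowered = text.casefold()
    found = set()
    for label, keywords in DISTORTION_KEYWORDS.get(lang, {}).items():
        for kw in keywords:
            # Word boundaries avoid matching a keyword inside a longer word.
            if re.search(r"\b" + re.escape(kw.casefold()) + r"\b", lowered):
                found.add(label)
                break
    return sorted(found)
```

Keeping one table per language means a transcript is only ever checked against keywords in its own language, and an unsupported language code simply yields an empty result.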
What I learned
Building for mental health adds constraints most apps don't have. Privacy
isn't a feature — it's the baseline. Users will not trust a voice diary
that uploads to a server, no matter how good the privacy policy is.
On-device AI removes that trust problem entirely.
The CBT methodology also drove some unusual UX decisions. The five-step
structure is fixed — you can't let users skip steps the way a generic journal would.
That rigidity is the point.
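One way to express the "no skipped steps" rule in code is to always ask for the first empty step and to block saving until none remain. This is a sketch under assumptions: the function names and the dict-of-strings input are hypothetical, and the app may enforce the rule differently.

```python
# The five steps in their fixed order.
STEP_ORDER = ("situation", "automatic thought", "emotion", "reaction", "alternative thought")


def next_required_step(filled):
    """Return the first step that is still empty, or None when all five are done."""
    for step in STEP_ORDER:
        if not filled.get(step, "").strip():
            return step
    return None


def can_save(filled):
    """An entry is saveable only when every step has non-blank text."""
    return next_required_step(filled) is None
```

Because the check walks the steps in order, filling in a later step does not let the user bypass an earlier one.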
Where it is now
iOS is live on the App Store with a 7-day free trial.
Android is coming.
Happy to answer questions about the Whisper.cpp integration,
the Flutter architecture, or the CBT-specific UX decisions.
Top comments (2)
Voice as the obvious fix + on-device AI as the missing piece resonates strongly. We're on the opposite side of the same pipeline at AudioProducer.ai (text-to-audio for long-form books, not voice-to-text for diary entries), and the rigid five-field schema being THE point translates cleanly. On our side the structured artifact per chapter has the same shape: per-line speaker map, per-paragraph soundscape annotation, per-character voice card, per-line emotion tag, all fixed slots the writer can re-tag in place. The rigidity is what makes re-renders deterministic and per-slot edits attributable, the same way your fixed five-step makes diary entries comparable across sessions. Curious whether you considered surfacing the cognitive-distortion analysis as another fixed field the user can edit in place after Whisper transcribes, vs running it as a separate post-record analysis pass. Same trade-off we hit between structured-artifact-as-source-of-truth and inline-post-hoc-tagging.