Pavel Trubetskov

Posted on May 27

I built a voice CBT diary with offline AI — here's how it works

#flutter #ai #mentalhealth #buildinpublic

Two years ago I was going through CBT therapy for anxiety and depression.
The homework — keeping a thought diary between sessions — never worked for me.
Typing during a panic attack is impossible. I'd end up filling it in from memory
two hours before the session. CBT loses most of its effect that way.

Voice was the obvious fix. On-device AI was the missing piece.

What I built

Mentalium is a voice CBT diary with offline AI transcription. Press record,
speak the five steps of a CBT thought record — situation, automatic thought,
emotion, reaction, alternative thought — and the AI fills every field. No typing.
No cloud. Your voice never leaves the device.

The tech stack

Flutter + Dart for the UI and app logic. One codebase for iOS and Android,
which matters for a solo founder.

Whisper.cpp for transcription. The ggml-small-q5_1 model (~180 MB)
downloads once on first launch, then runs fully offline forever.
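
A minimal sketch of the "download once, then run offline" check, written in Python for brevity rather than the app's Dart. The file name, the `ensure_model` helper, and the `download` callback are all assumptions for illustration; the post does not describe how the real download is implemented.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable

# File name assumed from the model name mentioned in the post.
MODEL_FILENAME = "ggml-small-q5_1.bin"


def ensure_model(model_dir: Path, download: Callable[[Path], None]) -> Path:
    """Return the local model path, downloading it only if it is missing.

    `download` is a hypothetical callback that writes the model bytes to the
    path it receives (a real app would stream the file over HTTPS).
    """
    model_path = model_dir / MODEL_FILENAME
    if model_path.exists():
        return model_path  # already on disk, so no network access is needed

    model_dir.mkdir(parents=True, exist_ok=True)
    # Write to a temporary file in the same directory, then rename it, so an
    # interrupted download never leaves a half-written model at the final path.
    fd, tmp_name = tempfile.mkstemp(dir=model_dir, suffix=".part")
    os.close(fd)
    tmp_path = Path(tmp_name)
    try:
        download(tmp_path)
        os.replace(tmp_path, model_path)
    finally:
        tmp_path.unlink(missing_ok=True)  # no-op after a successful rename
    return model_path
```

Calling `ensure_model` on every launch is cheap: after the first successful download it only checks that the file exists.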

On iOS: built with Metal GPU acceleration and called through a Swift bridge. Core ML wasn't flexible enough for the quantized model weights I needed.
On Android: JNI bridge to the native C++ library. Getting this to compile cleanly for arm64-v8a and x86_64 took a while.

SQLite (via sqflite) for local storage. Every diary entry stays on-device —
no backend sync, no analytics on the content.
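
To make the storage model concrete, here is a rough sketch of a local-only table for five-step thought records. It uses Python's built-in `sqlite3` module as a stand-in for sqflite, and the table name, column names, and helper functions are assumptions, not the app's actual schema.

```python
import sqlite3

# Hypothetical schema: one row per diary entry, one column per CBT step.
SCHEMA = """
CREATE TABLE IF NOT EXISTS thought_records (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    situation TEXT NOT NULL,
    automatic_thought TEXT NOT NULL,
    emotion TEXT NOT NULL,
    reaction TEXT NOT NULL,
    alternative_thought TEXT NOT NULL
)
"""


def open_diary(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the local database file and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def save_record(conn: sqlite3.Connection, situation: str, automatic_thought: str,
                emotion: str, reaction: str, alternative_thought: str) -> int:
    """Insert one entry and return its row id. Nothing leaves the device."""
    cur = conn.execute(
        "INSERT INTO thought_records "
        "(situation, automatic_thought, emotion, reaction, alternative_thought) "
        "VALUES (?, ?, ?, ?, ?)",
        (situation, automatic_thought, emotion, reaction, alternative_thought),
    )
    conn.commit()
    return cur.lastrowid
```

Parameterized queries (`?` placeholders) matter here because every value is free-form transcribed speech.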

AES-256-GCM encryption is applied only when the user emails an Excel report
to their therapist. That's the only time diary content leaves the device.

The hard parts

Model size vs. accuracy tradeoff. The tiny Whisper model is fast but misses
nuance — important for transcribing emotional content accurately. The small-q5_1
quantized model hits the sweet spot: ~180 MB on disk, good accuracy across
all 7 supported languages, runs in real-time on a 2020 iPhone.

First launch UX. Downloading 180 MB on first open is a bad experience
if you don't set expectations. I added a progress bar with explicit copy:
"Downloading the AI model — this happens once, then it works offline forever."

Language detection. The app supports 7 languages. Whisper handles multilingual
transcription well, but I had to build language-specific keyword matching
for the cognitive distortion analysis (catastrophizing, mind-reading, etc.)
across EN/DE/FR/ES/IT/PT-BR/RU.
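
One way language-specific keyword matching could look, sketched in Python rather than Dart. The keyword lists, the `DISTORTION_KEYWORDS` table, and `detect_distortions` are invented for illustration and cover only two of the seven languages; the app's real lists and matching rules are not described in this post.

```python
import re
from typing import Dict, List

# Hypothetical keyword lists, keyed by language code and then by distortion.
DISTORTION_KEYWORDS: Dict[str, Dict[str, List[str]]] = {
    "en": {
        "catastrophizing": ["disaster", "ruined", "worst"],
        "mind-reading": ["they think", "everyone thinks"],
    },
    "de": {
        "catastrophizing": ["katastrophe", "ruiniert"],
        "mind-reading": ["alle denken"],
    },
}


def detect_distortions(text: str, language: str) -> List[str]:
    """Return the distortions whose keywords appear in `text` for one language."""
    keywords = DISTORTION_KEYWORDS.get(language, {})
    lowered = text.casefold()
    found = []
    for distortion, phrases in keywords.items():
        for phrase in phrases:
            # Word boundaries keep "worst" from matching inside "worsted".
            pattern = rf"\b{re.escape(phrase.casefold())}\b"
            if re.search(pattern, lowered):
                found.append(distortion)
                break  # one hit per distortion is enough
    return found
```

Plain keyword matching like this is cheap and runs offline, but it cannot see negation or context, which is part of why each language needs its own hand-tuned list.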

What I learned

Building for mental health adds constraints most apps don't have. Privacy
isn't a feature — it's the baseline. Users will not trust a voice diary
that uploads to a server, no matter how good the privacy policy is.
On-device AI removes that trust problem entirely.

The CBT methodology also drove some unusual UX decisions. The five-step
structure is fixed — you can't let users skip steps the way a generic journal would.
That rigidity is the point.
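
The fixed structure can be sketched as a record type that reports which steps are still empty, so the UI can refuse to save an incomplete entry. This is an illustrative Python sketch, not the app's Dart model; `ThoughtRecord`, `missing_steps`, and `is_complete` are hypothetical names.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class ThoughtRecord:
    """One CBT thought record; the five steps and their order are fixed."""
    situation: str = ""
    automatic_thought: str = ""
    emotion: str = ""
    reaction: str = ""
    alternative_thought: str = ""

    def missing_steps(self) -> List[str]:
        """Names of steps that are empty or whitespace-only, in step order."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def is_complete(self) -> bool:
        """True only when every one of the five steps has content."""
        return not self.missing_steps()
```

For example, a record with only the first three steps filled reports `reaction` and `alternative_thought` as missing, and `is_complete()` stays false until both are added.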

Where it is now

iOS is live on the App Store with a 7-day free trial.
Android is coming.

→ mentalium.me
→ App Store

Happy to answer questions about the Whisper.cpp integration,
the Flutter architecture, or the CBT-specific UX decisions.

Top comments (4)

AudioProducer.ai • May 27

Voice as the obvious fix + on-device AI as the missing piece resonates strongly. We're on the opposite side of the same pipeline at AudioProducer.ai (text-to-audio for long-form books, not voice-to-text for diary entries), and the rigid five-field schema being THE point translates cleanly. On our side the structured artifact per chapter has the same shape: per-line speaker map, per-paragraph soundscape annotation, per-character voice card, per-line emotion tag, all fixed slots the writer can re-tag in place. The rigidity is what makes re-renders deterministic and per-slot edits attributable, the same way your fixed five-step makes diary entries comparable across sessions. Curious whether you considered surfacing the cognitive-distortion analysis as another fixed field the user can edit in place after Whisper transcribes, vs running it as a separate post-record analysis pass. Same trade-off we hit between structured-artifact-as-source-of-truth and inline-post-hoc-tagging.

Ashan de Silva • May 28

The 'typing during a panic attack is impossible' line is the whole product. That's a real insight most builders would have skipped past.

One thing from building a decision journal myself: the hard part isn't capture, it's what happens after. A journal that just stores entries becomes a graveyard you never reopen. The version that earned its keep pulled structure out of each entry - what happened, the thought, the reframe - so you could look back and actually see patterns over weeks, which is the thing that changes behaviour. Offline AI nails the privacy story; the structure-extraction step is where the day-to-day value lives.

Are you keeping it fully offline for the look-back view too, or does that part need a bigger model? That was the tension I kept hitting.

Pavel Trubetskov • May 29

Hi, Ashan! Mentalium is meant to be used alongside a therapist: records are sent to the therapist and you go through them in a session. Right now I use only Whisper for speech transcription, but I have some ideas on how to improve the app and user experience in the future.
