I built a voice CBT diary with offline AI — here's how it works

Pavel Trubetskov — Wed, 27 May 2026 13:56:22 +0000

Two years ago I was in cognitive behavioral therapy (CBT) for anxiety and depression.
The homework, keeping a thought diary between sessions, never worked for me.
Typing during a panic attack is impossible. I'd end up filling it in from memory
two hours before the session. The exercise loses most of its value that way,
because the point is to capture the thought while it is happening.

Voice was the obvious fix. On-device AI was the missing piece.

What I built

Mentalium is a voice CBT diary with offline AI transcription. Press record,
speak the five steps of a CBT thought record — situation, automatic thought,
emotion, reaction, alternative thought — and the AI fills every field. No typing.
No cloud. Your voice never leaves the device.
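
To make the five-step record concrete, here is a minimal sketch of it as a data structure. It is written in Python for brevity, not in the app's Dart, and the class and method names (ThoughtRecord, missing_steps, is_complete) are hypothetical, not Mentalium's actual code. It covers only the shape of a record, not how the transcript gets split into fields.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ThoughtRecord:
    """One CBT thought record; each field holds the text for one spoken step."""
    situation: Optional[str] = None
    automatic_thought: Optional[str] = None
    emotion: Optional[str] = None
    reaction: Optional[str] = None
    alternative_thought: Optional[str] = None

    def missing_steps(self) -> list[str]:
        """Names of the steps that are still empty, in the fixed five-step order."""
        return [
            f.name
            for f in fields(self)
            if not (getattr(self, f.name) or "").strip()
        ]

    def is_complete(self) -> bool:
        """A record counts as complete only when all five steps have text."""
        return not self.missing_steps()
```

A record with only the first two steps filled reports emotion, reaction and alternative_thought as missing, which is the kind of check a UI could use to prompt for the remaining steps.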

The tech stack

Flutter + Dart for the UI and app logic. One codebase for iOS and Android,
which matters for a solo founder.

Whisper.cpp for transcription. The ggml-small-q5_1 model (~180 MB)
downloads once on first launch, then runs fully offline forever.

On iOS: whisper.cpp is built with Metal GPU acceleration and called through a Swift bridge. Core ML wasn't flexible enough for the quantized model weights I needed.
On Android: a JNI bridge to the native C++ library. Getting this to compile cleanly for arm64-v8a and x86_64 took a while.

SQLite (via sqflite) for local storage. Every diary entry stays on-device —
no backend sync, no analytics on the content.
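
As a rough illustration of what local-only storage can look like, here is a sketch using Python's built-in sqlite3 module rather than sqflite. The table name, column names, and helper functions are assumptions made up for this example; the app's real schema is not shown in this post.

```python
import sqlite3

# Hypothetical schema: one row per thought record, one column per CBT step.
SCHEMA = """
CREATE TABLE IF NOT EXISTS thought_records (
    id                  INTEGER PRIMARY KEY AUTOINCREMENT,
    created_at          TEXT NOT NULL DEFAULT (datetime('now')),
    language            TEXT NOT NULL,
    situation           TEXT NOT NULL,
    automatic_thought   TEXT NOT NULL,
    emotion             TEXT NOT NULL,
    reaction            TEXT NOT NULL,
    alternative_thought TEXT NOT NULL
)
"""


def open_diary(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the local database file and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def save_entry(conn: sqlite3.Connection, language: str, situation: str,
               automatic_thought: str, emotion: str, reaction: str,
               alternative_thought: str) -> int:
    """Insert one entry with parameterized SQL and return its row id."""
    cur = conn.execute(
        "INSERT INTO thought_records "
        "(language, situation, automatic_thought, emotion, reaction, alternative_thought) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (language, situation, automatic_thought, emotion, reaction, alternative_thought),
    )
    conn.commit()
    return cur.lastrowid
```

The NOT NULL constraints mirror the fixed five-step structure: an entry with a skipped step cannot be saved.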

AES-256-GCM encryption applies only when the user emails an Excel report
to their therapist. That's the only time diary content leaves the device.

The hard parts

Model size vs. accuracy tradeoff. The tiny Whisper model is fast but misses
nuance, and nuance matters when the content is emotional. The small-q5_1
quantized model hits the sweet spot: ~180 MB on disk, good accuracy across
all 7 supported languages, and real-time transcription on a 2020 iPhone.
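
The ~180 MB figure can be sanity-checked with a little arithmetic. The sketch below assumes the roughly 244 million parameters OpenAI lists for Whisper small and the ggml q5_1 block layout (32 weights packed into 24 bytes). It is an estimate only: not every tensor in a real model file is quantized, and the file also carries metadata.

```python
PARAMS_SMALL = 244_000_000  # approximate parameter count of Whisper "small"

# ggml q5_1 block: 32 weights stored as an fp16 scale (2 bytes), an fp16
# minimum (2 bytes), 32 high bits (4 bytes) and 32 low nibbles (16 bytes).
BLOCK_WEIGHTS = 32
BLOCK_BYTES = 2 + 2 + 4 + 16  # 24 bytes per block


def q5_1_size_mb(params: int) -> float:
    """Estimated size in MB if every weight were stored as q5_1."""
    bits_per_weight = BLOCK_BYTES * 8 / BLOCK_WEIGHTS  # 6 bits per weight
    return params * bits_per_weight / 8 / 1_000_000


def fp16_size_mb(params: int) -> float:
    """Estimated size in MB for the unquantized fp16 model (2 bytes per weight)."""
    return params * 2 / 1_000_000


print(round(q5_1_size_mb(PARAMS_SMALL)))  # 183
print(round(fp16_size_mb(PARAMS_SMALL)))  # 488
```

So q5_1 lands in the ~180 MB range, well under half the size of the fp16 weights.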

First launch UX. Downloading 180 MB on first open is a bad experience
if you don't set expectations. I added a progress bar with explicit copy:
"Downloading the AI model — this happens once, then it works offline forever."
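
The download-once logic behind that progress bar can be sketched roughly as follows. This is a language-neutral illustration in Python, not the app's Dart code: ensure_model, chunks_factory, and on_progress are hypothetical names, and the network layer is abstracted into any iterable of byte chunks.

```python
import os
from typing import Callable, Iterable


def ensure_model(path: str,
                 chunks_factory: Callable[[], Iterable[bytes]],
                 total_bytes: int,
                 on_progress: Callable[[float], None]) -> bool:
    """Download the model only if it is not on disk yet.

    Returns True if a download happened, False if the file was already there.
    """
    if os.path.exists(path):
        return False  # later launches: nothing to fetch, fully offline

    tmp_path = path + ".part"
    done = 0
    with open(tmp_path, "wb") as f:
        for chunk in chunks_factory():
            f.write(chunk)
            done += len(chunk)
            # Report a 0.0 to 1.0 fraction that a UI could bind to a progress bar.
            on_progress(min(done / total_bytes, 1.0))

    # Rename only after the last chunk is written, so an interrupted
    # download is never mistaken for a complete model.
    os.replace(tmp_path, path)
    return True
```

Writing to a ".part" file first is a design choice for this sketch: if the user closes the app mid-download, the next launch starts again instead of loading a truncated model.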

Language detection. The app supports 7 languages. Whisper handles multilingual
transcription well, but I had to build language-specific keyword matching
for the cognitive distortion analysis (catastrophizing, mind-reading, etc.)
across EN/DE/FR/ES/IT/PT-BR/RU.
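
A minimal sketch of language-specific keyword matching might look like the following. The phrase lists are tiny placeholders invented for this example (only EN and DE are shown), and the function name detect_distortions is hypothetical; the app's real keyword lists and matching rules are not part of this post.

```python
import re

# Hypothetical keyword table: language code -> distortion -> trigger phrases.
DISTORTION_KEYWORDS = {
    "en": {
        "catastrophizing": ["disaster", "ruined", "never recover"],
        "mind_reading": ["they think", "everyone thinks"],
    },
    "de": {
        "catastrophizing": ["katastrophe", "alles ist verloren"],
        "mind_reading": ["sie denken", "alle denken"],
    },
}


def detect_distortions(text: str, language: str) -> list[str]:
    """Return the distortions whose phrases appear in the transcript.

    Matching is case-insensitive and anchored on word boundaries, so
    "ruined" does not fire on a longer word that merely contains it.
    """
    lowered = text.casefold()
    found = []
    for distortion, phrases in DISTORTION_KEYWORDS.get(language, {}).items():
        if any(
            re.search(r"\b" + re.escape(phrase.casefold()) + r"\b", lowered)
            for phrase in phrases
        ):
            found.append(distortion)
    return found
```

Keeping one table per language matters because the phrases are not translations of each other: each language needs the expressions people actually use when they catastrophize or mind-read.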

What I learned

Building for mental health adds constraints most apps don't have. Privacy
isn't a feature — it's the baseline. Users will not trust a voice diary
that uploads to a server, no matter how good the privacy policy is.
On-device AI removes that trust problem entirely.

The CBT methodology also drove some unusual UX decisions. The five-step
structure is fixed: users can't skip steps the way they could in a generic journal.
That rigidity is the point.

Where it is now

iOS is live on the App Store with a 7-day free trial.
Android is coming.

→ mentalium.me
→ App Store