Abhishek Taneja

How’s My Day? — A Voice-First Mood Tracker Using AssemblyAI’s Speech Understanding & Real-Time TTS

AssemblyAI Voice Agents Challenge: Real-Time

This is a submission for the AssemblyAI Voice Agents Challenge

What I Built

Domain Expert Voice Agent

How’s My Day? is a one-shot voice check-in app that helps users feel heard — emotionally, not just functionally.

In just one tap:

  1. The user speaks how they’re feeling
  2. The app transcribes their voice in real-time using AssemblyAI Universal Streaming
  3. It detects the emotional tone of their voice using AssemblyAI’s Speech Understanding
  4. It matches that emotion to a hand-curated emotional tip using Algolia MCP Server
  5. Finally, it reads the tip aloud using AssemblyAI’s Text-to-Speech — or falls back to ElevenLabs if needed
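
Steps 3 and 4 above boil down to a lookup from a detected emotion to a curated tip. Here is a minimal sketch of that idea, assuming a small in-memory table in place of the Algolia index. The emotion labels, the tip texts, and the names `TIPS`, `DEFAULT_TIP`, and `tipFor` are all hypothetical and not taken from the project.

```typescript
// Hypothetical emotion labels; the real set used by the app is not listed in the post.
type Emotion = "happy" | "sad" | "anxious" | "angry" | "neutral";

// Stand-in for the hand-curated tips (the post stores these behind Algolia).
const TIPS: Record<Emotion, readonly string[]> = {
  happy: ["Hold on to this. Write down one thing that went well today."],
  sad: ["It is okay to feel low. A short walk or a message to a friend can help."],
  anxious: ["Breathe in for four counts, out for six, and repeat three times."],
  angry: ["Give yourself ten minutes before you respond to anything."],
  neutral: ["A steady day counts too. Drink some water and stretch."],
};

// Used when the chosen index does not point at a tip.
const DEFAULT_TIP = "Take a slow breath. You showed up today, and that counts.";

// Returns one tip for the emotion. `pick` maps the number of candidates to an
// index; it defaults to a random choice and can be replaced for deterministic tests.
function tipFor(
  emotion: Emotion,
  pick: (count: number) => number = (count) => Math.floor(Math.random() * count),
): string {
  const options = TIPS[emotion];
  return options[pick(options.length)] ?? DEFAULT_TIP;
}
```

Keeping the selection behind a single function means the in-memory table could later be swapped for a search call without touching the rest of the flow.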

✨ The experience feels like being heard by someone who cares — not a chatbot.

🧠 How I Used AssemblyAI

I used AssemblyAI’s Universal-Streaming API to:

  • Capture and transcribe voice input with under 300 ms latency
  • Show live transcript to the user while they speak
  • Handle punctuation and natural pauses beautifully
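
To show a live transcript, the UI has to merge in-progress text with turns that are already complete. The sketch below shows one way to do that as a pure reducer. The message shape (`transcript`, `end_of_turn`) is an assumption modeled on turn-based streaming responses, so check the field names against the Universal-Streaming API reference before relying on them. `TranscriptState`, `applyTurn`, and `render` are illustrative names.

```typescript
// Assumed shape of one streaming message; only the fields this sketch needs.
interface TurnMessage {
  transcript: string;   // text recognized so far in the current turn
  end_of_turn: boolean; // true once the speaker has finished the turn
}

interface TranscriptState {
  finalized: string[]; // completed turns, in order
  partial: string;     // in-progress text for the current turn
}

// Fold one message into the state without mutating the previous state.
function applyTurn(state: TranscriptState, msg: TurnMessage): TranscriptState {
  if (msg.end_of_turn) {
    return { finalized: [...state.finalized, msg.transcript], partial: "" };
  }
  return { finalized: state.finalized, partial: msg.transcript };
}

// Build the string shown to the user: finished turns followed by the live partial.
function render(state: TranscriptState): string {
  return [...state.finalized, state.partial].filter((s) => s.length > 0).join(" ");
}
```

In a browser, `applyTurn` would be called from the WebSocket `message` handler and `render` would feed the transcript element, so a partial result is replaced rather than appended each time it updates.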

What I Learned

  • AssemblyAI’s emotion detection is shockingly accurate — tone alone can reveal so much more than words

  • Transcription feels like magic when done right — and AssemblyAI nailed it

  • Using voice as input and output feels more natural than a chatbot for mental wellness apps

  • People want calm, 1-shot interactions — not 20-message bots

Challenges

  • Browser-based mic streaming + latency management was tricky

  • Emotion ↔ tip mapping needed thoughtful writing

  • Not all users want to hear their feelings read back, so I added a toggle

  • AssemblyAI TTS is clean, but a fallback was needed for broader support
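
The TTS fallback mentioned above can be sketched as "try the primary provider with a time limit, otherwise use the secondary one". This is an illustration of the pattern only, not the project's actual code. The `Speak` type, `withTimeout`, `speakWithFallback`, and the 3000 ms default are assumptions, and the providers are passed in as plain functions so no real TTS API is implied.

```typescript
// A TTS provider turns text into audio bytes. Real providers would wrap an HTTP call.
type Speak = (text: string) => Promise<Uint8Array>;

// Reject if `p` does not settle within `ms` milliseconds; always clear the timer.
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
    p.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}

// Use the primary provider; on any error or timeout, use the fallback instead.
async function speakWithFallback(
  text: string,
  primary: Speak,
  fallback: Speak,
  timeoutMs = 3000, // assumed budget; tune for a one-shot interaction
): Promise<Uint8Array> {
  try {
    return await withTimeout(primary(text), timeoutMs);
  } catch {
    return fallback(text);
  }
}
```

A time limit matters as much as error handling here: for a one-shot check-in, a slow voice response is nearly as disruptive as a failed one.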

Demo

GitHub Repository

How's My Day? - AI-Powered Voice Mood Tracker

A sophisticated, voice-powered mood tracking web application that listens to your feelings and provides supportive, human-like responses using cutting-edge AI technology.

✨ Features

  • 🎤 Professional voice recording - File-based audio capture with high quality
  • 🎯 AI-powered transcription - AssemblyAI integration for accurate speech-to-text
  • 🧠 Enhanced mood detection - Local algorithm with scoring and emotion mapping
  • 🤖 GPT-4o-mini responses - Human-like, empathetic AI-generated support messages
  • 🔊 High-quality TTS - OpenAI text-to-speech with natural voice synthesis
  • ⌨️ Real-time typing animation - Text appears character-by-character during speech
  • 🎨 Modern UI - Clean, responsive design with smooth animations
  • 🚀 Full-stack architecture - Node.js backend with Express server
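
The README does not show the "local algorithm with scoring and emotion mapping", so here is a minimal sketch of how keyword-based mood scoring could work. The word list, the weights, and the names `WEIGHTS` and `scoreMood` are hypothetical. The repository's real algorithm may differ.

```typescript
type Mood = "positive" | "negative" | "neutral";

// Hypothetical keyword weights: positive words add to the score, negative words subtract.
const WEIGHTS = new Map<string, number>([
  ["great", 2], ["happy", 2], ["good", 1], ["calm", 1],
  ["tired", -1], ["stressed", -2], ["sad", -2], ["anxious", -2],
]);

// Sum the weights of known words in the transcript and map the total to a mood label.
function scoreMood(transcript: string): { score: number; mood: Mood } {
  const words = transcript.toLowerCase().match(/[a-z']+/g) ?? [];
  let score = 0;
  for (const word of words) {
    score += WEIGHTS.get(word) ?? 0; // unknown words contribute nothing
  }
  const mood: Mood = score > 0 ? "positive" : score < 0 ? "negative" : "neutral";
  return { score, mood };
}
```

For example, `scoreMood("I feel tired and stressed today")` yields a score of -3 and the label `negative` under these weights. A plain word count like this ignores negation ("not happy"), which is one reason a real implementation would likely need more than a lookup table.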

🚀 Quick Setup Guide

Prerequisites

  • Node.js (v14 or higher)
  • Git (optional) - For cloning the repository
  • Modern web browser - Chrome, Firefox, Safari, or Edge

1. Installation

# Clone or download the project
git clone <