DEV Community

dineshrajdhanapathyDD for AWS Community Builders

Posted on • Originally published at Medium on

Serenova AI Mental Health Companion

A deep dive into building a real-time, speech-to-text AI companion for mental health support using AWS Bedrock and Amazon Nova models.

A Story of Technology, Empathy, and AWS

The Beginning

Imagine a moment late at night.

Someone is feeling overwhelmed, anxious, or alone.

They want to talk to someone - but no one is available.

Therapy appointments take weeks.

Friends may be busy.

And sometimes, people just need someone to listen without judgment.

This is where Serenova was born.

Serenova is an AI-powered mental health companion designed to provide emotional support, guided conversations, and calming interactions anytime, anywhere.

And the entire experience is powered by AWS and Amazon Nova AI models.


The Vision

Our vision was simple:

Build an AI companion that feels safe, responsive, and human-like.

Serenova helps users:

  • Share their thoughts
  • Reduce stress
  • Practice breathing exercises
  • Receive supportive AI responses
  • Detect emotional distress early

But to achieve this, we needed powerful AI and scalable infrastructure.

So we built Serenova on AWS Cloud.

Introduction

What if you could talk to an AI that actually listens, not just to your words, but to the emotion in your voice?

That’s the idea behind Serenova. We built a voice-first mental health companion powered by Amazon Nova models on AWS Bedrock. It combines real-time speech-to-speech conversation (Nova Sonic), intelligent text chat (Nova Pro), and a calming UI designed to make users feel safe.

The Problem

Mental health support has a massive accessibility gap. Therapy is expensive, waitlists are long, and many people don’t feel comfortable reaching out. Text-based chatbots exist, but they miss something fundamental: the human voice carries emotion that text can’t capture. The trembling, the pauses, the pace of speech all tell a story.

We wanted to build something that meets people where they are: a companion that’s always available, feels warm and safe, and can understand not just what you say but how you say it.

This post walks through the technical decisions, architecture, and lessons learned from building and deploying Serenova.

How Serenova Works (Powered by AWS)

AI Brain - Amazon Bedrock

At the heart of Serenova is Amazon Bedrock.

We use Amazon Nova models to power intelligent conversations.

Why Amazon Nova?

We chose Amazon Nova models for three reasons:

Nova Sonic - Speech-to-Speech

Traditional voice AI requires three separate services: speech-to-text, a language model, and text-to-speech. Each adds latency and loses context. Nova Sonic (amazon.nova-sonic-v1:0) handles the entire pipeline in one model: the user speaks, and the AI responds with voice in under 500ms. It also exposes audio features (pitch, volume, pace) that we use for emotion detection.

Nova Pro - Text Chat Fallback

Not every user has a microphone, and not every environment supports voice. Nova Pro (amazon.nova-pro-v1:0) provides intelligent text chat via the Bedrock Converse API. We use it as the primary model in our production deployment on App Runner.

Nova Lite - Fast Responses

For quick, lightweight interactions where speed matters more than depth, Nova Lite (amazon.nova-lite-v1:0) handles fast text responses.

| Model | Model ID | Latency | Use Case |
| ---------- | ----------------------- | ------- | ------------------- |
| Nova Sonic | amazon.nova-sonic-v1:0 | <500ms | Voice conversation |
| Nova Pro | amazon.nova-pro-v1:0 | ~300ms | Text chat, reasoning |
| Nova Lite | amazon.nova-lite-v1:0 | ~150ms | Quick responses |
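As a rough sketch of how the three models divide the work, routing can be a simple lookup. The helper below is illustrative, not from the Serenova codebase; the model IDs are the real Bedrock identifiers from the table above.

```python
# Hypothetical helper: pick a Nova model ID by interaction type.
NOVA_MODELS = {
    "voice": "amazon.nova-sonic-v1:0",   # speech-to-speech, <500ms
    "chat": "amazon.nova-pro-v1:0",      # text chat, reasoning
    "quick": "amazon.nova-lite-v1:0",    # fast, lightweight replies
}

def pick_model(interaction: str) -> str:
    """Return the Bedrock model ID for an interaction type, defaulting to Nova Pro."""
    return NOVA_MODELS.get(interaction, NOVA_MODELS["chat"])
```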

Architecture


Backend Intelligence: AWS App Runner

The Serenova backend runs on AWS App Runner.


Why App Runner?

Because it allows us to:

  • Deploy FastAPI services easily
  • Automatically scale with traffic
  • Handle secure API communication

The backend connects the frontend to Amazon Bedrock to generate AI responses.

React frontend on AWS Amplify talks to a FastAPI backend on App Runner. The backend uses Nova Pro for text chat. Browser SpeechRecognition handles voice-to-text on the client side.

Frontend Experience: AWS Amplify

The Serenova interface is built with React and hosted on AWS Amplify.


Amplify provides:

  • Fast global hosting
  • CI/CD deployment
  • Secure HTTPS access

Users can simply open the website and immediately start interacting with Serenova.

Container Deployment: Amazon ECR + Docker

To solve deployment challenges, we packaged the backend using Docker.


The container image is stored in Amazon Elastic Container Registry (ECR).

App Runner then pulls the image directly from ECR and runs the application reliably.

Local Development: Full backend with Nova Sonic voice streaming over WebSocket, emotion detection, crisis detection, and MCP tool integration.

Three-Tier Graceful Degradation

We designed the system to never break:

  1. Tier 1: Full Voice — Nova Sonic speech-to-speech, <500ms latency
  2. Tier 2: Text Chat — Nova Pro via Converse API, browser SpeechRecognition
  3. Tier 3: Demo Mode — friendly message, crisis resources still available

If Nova Sonic isn’t available, the app falls back to text. If the backend is down, it enters demo mode. The user always sees a working app.
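The fallback decision itself is tiny. Here is a sketch in Python with illustrative names (the real app makes these checks in the frontend):

```python
def choose_tier(voice_available: bool, backend_up: bool) -> str:
    """Pick the highest working tier; the app never shows a broken state."""
    if backend_up and voice_available:
        return "voice"  # Tier 1: Nova Sonic speech-to-speech
    if backend_up:
        return "text"   # Tier 2: Nova Pro text chat
    return "demo"       # Tier 3: demo mode, crisis resources only
```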

The Voice Orb

Voice Interaction: Web Audio + AI


Serenova introduces the Voice Orb, a glowing animated circle that reacts to the user’s voice.

When a user speaks:

1️⃣ Browser captures voice

2️⃣ AI analyzes speech patterns

3️⃣ Serenova generates a supportive response

This creates a natural, human-like conversation experience.

How It Works

```javascript
// Web Audio API for mic level monitoring
const audioContext = new AudioContext();
const analyser = audioContext.createAnalyser();
analyser.fftSize = 256;

const source = audioContext.createMediaStreamSource(stream);
source.connect(analyser);

// Read frequency data every frame
const dataArray = new Uint8Array(analyser.frequencyBinCount);
analyser.getByteFrequencyData(dataArray);
const avg = dataArray.reduce((a, b) => a + b, 0) / dataArray.length;
const level = Math.min(100, Math.round((avg / 128) * 100));
```

The orb scales based on input level, creating a visual feedback loop that makes speaking feel responsive. It has four states:

| State | Animation | Meaning |
| ---------- | ------------------------------- | ----------------- |
| Idle | Soft pulse, mic icon | Ready to listen |
| Listening | Animated bars, glow intensifies | Recording voice |
| Thinking | Bouncing dots | AI processing |
| Responding | Wave animation | AI response ready |

The dark theme with glassmorphism and ambient background orbs creates a calm, safe atmosphere, which matters for a mental health app.

Support Modes and Prompt Engineering

Each support mode has a carefully crafted system prompt that shapes the AI’s personality:

```python
mode_prompts = {
    "crisis": "You are a crisis support counselor. Be calm, direct, and prioritize safety.",
    "cbt": "You are a CBT therapist. Help identify thought patterns and suggest reframing.",
    "regulation": "You are an emotion regulation coach. Guide breathing and grounding techniques.",
    "companion": "You are a warm, empathetic mental health companion. Listen actively, validate feelings. Keep responses short (2-3 sentences).",
}
```

The key insight: mental health prompts need to be warm without being patronizing, supportive without being prescriptive. We iterated on these prompts extensively. Short responses (2–3 sentences) work better than long ones — they feel more like a real conversation.
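Wiring a mode prompt into a request is straightforward: the Bedrock Converse API accepts a system prompt alongside the messages. A minimal sketch (the helper name is ours; it assumes a prompts dict like `mode_prompts` above and a configured `bedrock-runtime` boto3 client):

```python
def build_request(mode: str, history: list, prompts: dict) -> dict:
    """Assemble kwargs for bedrock_client.converse() with the mode's system prompt."""
    return {
        "modelId": "amazon.nova-pro-v1:0",
        "system": [{"text": prompts.get(mode, prompts["companion"])}],
        "messages": history,
        "inferenceConfig": {"maxTokens": 300, "temperature": 0.7, "topP": 0.9},
    }

# Usage (assuming a configured boto3 bedrock-runtime client):
# response = bedrock_client.converse(**build_request("cbt", history, mode_prompts))
```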

Breathing Exercise

The Regulation mode automatically activates a guided breathing exercise:

  • Inhale: 4 seconds
  • Hold: 4 seconds
  • Exhale: 6 seconds

This 4–4–6 pattern is based on clinical breathing techniques. The animated circle with countdown gives users a visual anchor.
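The timer driving the animation can be modeled as a simple phase schedule. A sketch in Python (names are ours; the real frontend runs this logic in JavaScript):

```python
# The 4-4-6 pattern from the list above, as (phase, seconds) steps.
BREATH_PATTERN = [("inhale", 4), ("hold", 4), ("exhale", 6)]

def breathing_schedule(cycles: int) -> list:
    """Expand the 4-4-6 pattern into an ordered list of (phase, seconds) steps."""
    return [step for _ in range(cycles) for step in BREATH_PATTERN]
```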

Crisis Detection

Safety is non-negotiable in a mental health app. Serenova monitors every message for crisis indicators:

```python
crisis_keywords = ["hurt", "harm", "suicide", "kill", "die", "end it"]
detected = any(keyword in text.lower() for keyword in crisis_keywords)
```

When detected:

  • A red banner appears at the top of the screen
  • A modal opens with crisis resources (988 Lifeline, Crisis Text Line, SAMHSA, Veterans Crisis Line, Trevor Project)
  • The 🆘 button is always accessible in the header

This is a keyword-based first layer. The full local backend adds Bedrock AI analysis and emotion-based detection as additional layers.
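One caveat with plain substring matching: a keyword like "die" also matches inside words like "diet". A word-boundary regex tightens the first layer. This is a sketch, and the production keyword list may differ:

```python
import re

crisis_keywords = ["hurt", "harm", "suicide", "kill", "die", "end it"]

# Match whole words/phrases only, case-insensitively.
crisis_pattern = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in crisis_keywords) + r")\b",
    re.IGNORECASE,
)

def detect_crisis(text: str) -> bool:
    """Return True if any crisis keyword appears as a whole word."""
    return bool(crisis_pattern.search(text))
```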

The Deployment Battle: App Runner and Docker

This was our biggest technical challenge. We needed to deploy a Python FastAPI backend to AWS App Runner. Sounds simple. It wasn’t.

What Failed: Source-Based Deployment

App Runner’s Python runtime builds your code in one environment and runs it in another. Pip-installed binaries like uvicorn aren't in PATH at runtime.

We tried everything:

  • uvicorn main:app → "executable not found in $PATH"
  • python3 -m uvicorn main:app → same error
  • /usr/local/bin/uvicorn → still failed

What Worked: Docker via ECR

The solution was building our own Docker image:

```dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY backend/requirements.txt .
RUN pip install --no-cache-dir --upgrade pip && \
    pip install --no-cache-dir -r requirements.txt
COPY backend/main.py .
EXPOSE 8080
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8080"]
```

Inside the Docker container, uvicorn is properly installed and in PATH. We push the image to Amazon ECR, and App Runner pulls it. No PATH issues, no build/run environment mismatch.

The Standalone Backend

We also created a separate backend/main.py: a minimal FastAPI app with no complex imports. No Nova Sonic SDK dependency, no relative imports, just boto3 for Bedrock. This made the Docker image small and the deployment reliable.

Nova Pro Integration: The Converse API

The text chat uses Bedrock’s Converse API, which provides a clean conversation interface:

```python
response = bedrock_client.converse(
    modelId="amazon.nova-pro-v1:0",
    messages=conversation_history,  # sliding window of 20 messages
    inferenceConfig={
        "maxTokens": 300,
        "temperature": 0.7,
        "topP": 0.9,
    },
)
assistant_text = response["output"]["message"]["content"][0]["text"]
```

We keep a 20-message sliding window per session. This gives the AI enough context to maintain a coherent conversation without unbounded memory growth.
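Maintaining that window is a one-line trim per turn. A sketch with illustrative names, using the Converse API message shape:

```python
MAX_TURNS = 20  # messages kept per session

def append_turn(history: list, role: str, text: str) -> list:
    """Append a message and trim the history to the last MAX_TURNS entries."""
    history.append({"role": role, "content": [{"text": text}]})
    return history[-MAX_TURNS:]
```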

Temperature 0.7 gives responses that are warm and varied without being unpredictable — important for mental health where consistency builds trust.

Lessons Learned

1. Docker solves deployment headaches

If your deployment platform has environment quirks, skip the source-based approach and use Docker. You control the entire runtime.

2. Browser APIs are powerful

SpeechRecognition and Web Audio API gave us voice features with zero backend dependencies. The browser is an underrated platform.

3. Graceful degradation is a feature

Users should never see a broken app. Design for failure from the start — voice fails? use text. Backend down? demo mode. Always show something useful.

4. Keep production simple

The standalone backend/main.py with 5 dependencies was the key to a successful deployment. The full backend with Nova Sonic, MCP tools, and emotion detection is for local dev.

5. Mental health AI needs careful tone

The difference between “I understand you’re feeling sad” and “That sounds really tough. I’m here.” is huge. Short, warm, human-sounding responses work best.

What’s Next

  • Full Nova Sonic voice streaming in production (ECS or App Runner with larger image)
  • Persistent conversation history with DynamoDB
  • Real-time emotion detection from audio features
  • Mobile app (React Native)
  • Multi-language support
  • AI-guided journaling with mood tracking
  • Optional therapist dashboard for professional oversight

Tech Stack

| Layer | Technology |
| -------------- | ----------------------------------------------------- |
| AI | Amazon Nova Sonic, Nova Pro, Nova Lite (Bedrock) |
| Backend | Python 3.11, FastAPI, Uvicorn, boto3 |
| Frontend | React 18, CSS3, Web Audio API, SpeechRecognition API |
| Hosting | AWS App Runner (Docker/ECR), AWS Amplify |
| Infrastructure | AWS CDK, DynamoDB, S3, EventBridge, SNS, CloudWatch |
| Container | Docker, Amazon ECR |

Final Thought

We built Serenova because we believe AI can make mental health support more accessible, not as a replacement for human therapists, but as a bridge. A 3 AM companion. A judgment-free space to process your thoughts out loud.

Serenova proves that AI can be compassionate.

By combining:

  • Amazon Nova AI
  • AWS Cloud infrastructure
  • Human-centered design

We created a system that helps people feel heard, supported, and less alone.

Serenova isn’t just technology. It’s a companion for the mind.

Resources:

Amazon Nova Documentation

AWS App Runner Documentation

AWS Amplify Gen 2 Documentation

Serenova is an AI companion and is not a replacement for professional mental health care. If you or someone you know is in crisis, call 988 or text HOME to 741741.

Thank you for taking the time to read my article! If you found it helpful, feel free to like, share, and drop your thoughts in the comments; I’d love to hear from you.

If you want to connect or dive deeper into cloud, AI and DevOps, feel free to follow me on my socials:

💼 LinkedIn

X (formerly Twitter)

👨‍💻 DEV Community

🛡️Medium

🐙 GitHub

🌍AWS Builder Profile
