A deep dive into building a real-time, speech-to-text AI companion for mental health support using AWS Bedrock and Amazon Nova models.
A Story of Technology, Empathy, and AWS
The Beginning
Imagine a moment late at night.
Someone is feeling overwhelmed, anxious, or alone.
They want to talk to someone - but no one is available.
Therapy appointments take weeks.
Friends may be busy.
And sometimes, people just need someone to listen without judgment.
This is where Serenova was born.
Serenova is an AI-powered mental health companion designed to provide emotional support, guided conversations, and calming interactions anytime, anywhere.
And the entire experience is powered by AWS and Amazon Nova AI models.
The Vision
Our vision was simple:
Build an AI companion that feels safe, responsive, and human-like.
Serenova helps users:
- Share their thoughts
- Reduce stress
- Practice breathing exercises
- Receive supportive AI responses
- Detect emotional distress early
But to achieve this, we needed powerful AI and scalable infrastructure.
So we built Serenova on AWS Cloud.
Introduction
What if you could talk to an AI that actually listens, not just to your words, but to the emotion in your voice?
That’s the idea behind Serenova. We built a voice-first mental health companion powered by Amazon Nova models on AWS Bedrock. It combines real-time speech-to-speech conversation (Nova Sonic), intelligent text chat (Nova Pro), and a calming UI designed to make users feel safe.
The Problem
Mental health support has a massive accessibility gap. Therapy is expensive, waitlists are long, and many people don’t feel comfortable reaching out. Text-based chatbots exist, but they miss something fundamental: the emotion the human voice carries, which text can’t capture. The trembling, the pauses, the pace of speech all tell a story.
We wanted to build something that meets people where they are: a companion that’s always available, feels warm and safe, and can understand not just what you say but how you say it.
This post walks through the technical decisions, architecture, and lessons learned from building and deploying Serenova.
How Serenova Works (Powered by AWS)
AI Brain - Amazon Bedrock
At the heart of Serenova is Amazon Bedrock.
We use Amazon Nova models to power intelligent conversations.
Why Amazon Nova?
We chose Amazon Nova models for three reasons:
Nova Sonic - Speech-to-Speech
Traditional voice AI requires three separate services: speech-to-text, language model, text-to-speech. Each adds latency and loses context. Nova Sonic (amazon.nova-sonic-v1:0) handles the entire pipeline in one model - user speaks, AI responds with voice, under 500ms. It also exposes audio features (pitch, volume, pace) that we use for emotion detection.
Nova Pro - Text Chat Fallback
Not every user has a microphone, and not every environment supports voice. Nova Pro (amazon.nova-pro-v1:0) provides intelligent text chat via the Bedrock Converse API. We use it as the primary model in our production deployment on App Runner.
Nova Lite - Fast Responses
For quick, lightweight interactions where speed matters more than depth, Nova Lite (amazon.nova-lite-v1:0) handles fast text responses.
| Model | Model ID | Latency | Use Case |
| ----- | -------- | ------- | -------- |
| Nova Sonic | amazon.nova-sonic-v1:0 | <500ms | Voice conversation |
| Nova Pro | amazon.nova-pro-v1:0 | ~300ms | Text chat, reasoning |
| Nova Lite | amazon.nova-lite-v1:0 | ~150ms | Quick responses |
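The routing between these three models can be sketched as a small lookup helper. This is our own illustration of the idea, not part of the Bedrock SDK; the function name and interaction labels are hypothetical.

```python
# Hypothetical model router mapping an interaction type to a Nova model ID
# from the table above. Illustrative only -- not a Bedrock API.
NOVA_MODELS = {
    "voice": "amazon.nova-sonic-v1:0",   # speech-to-speech, <500ms
    "chat": "amazon.nova-pro-v1:0",      # text chat, reasoning
    "quick": "amazon.nova-lite-v1:0",    # fast, lightweight replies
}

def pick_model(interaction: str) -> str:
    """Return the Bedrock model ID for an interaction type, defaulting to Nova Pro."""
    return NOVA_MODELS.get(interaction, NOVA_MODELS["chat"])
```

Defaulting to Nova Pro mirrors our production setup, where text chat is the primary path.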
Architecture
Backend Intelligence - AWS App Runner
The Serenova backend runs on AWS App Runner.
Why App Runner?
Because it allows us to:
- Deploy FastAPI services easily
- Automatically scale with traffic
- Handle secure API communication
The backend connects the frontend to Amazon Bedrock to generate AI responses.
A React frontend on AWS Amplify talks to a FastAPI backend on App Runner. The backend uses Nova Pro for text chat. Browser SpeechRecognition handles voice-to-text on the client side.
Frontend Experience - AWS Amplify
The Serenova interface is built with React and hosted on AWS Amplify.
Amplify provides:
- Fast global hosting
- CI/CD deployment
- Secure HTTPS access
Users can simply open the website and immediately start interacting with Serenova.
Container Deployment - Amazon ECR + Docker
To solve deployment challenges, we packaged the backend using Docker.
The container image is stored in: Amazon Elastic Container Registry (ECR)
App Runner then pulls the image directly from ECR and runs the application reliably.
Local Development: Full backend with Nova Sonic voice streaming over WebSocket, emotion detection, crisis detection, and MCP tool integration.
Three-Tier Graceful Degradation
We designed the system to never break:
- Tier 1: Full Voice — Nova Sonic speech-to-speech, <500ms latency
- Tier 2: Text Chat — Nova Pro via Converse API, browser SpeechRecognition
- Tier 3: Demo Mode — friendly message, crisis resources still available
If Nova Sonic isn’t available, it falls to text. If the backend is down, it enters demo mode. The user always sees a working app.
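The tier-selection logic above is simple enough to sketch directly. The class and function names here are illustrative, not our actual code:

```python
from dataclasses import dataclass

@dataclass
class Availability:
    """Illustrative health flags for the two things that can fail."""
    nova_sonic: bool
    backend: bool

def select_tier(avail: Availability) -> str:
    """Pick the highest working tier: voice -> text -> demo."""
    if avail.backend and avail.nova_sonic:
        return "voice"   # Tier 1: Nova Sonic speech-to-speech
    if avail.backend:
        return "text"    # Tier 2: Nova Pro text chat
    return "demo"        # Tier 3: static demo mode, crisis resources still shown
```

The key property is that every branch returns something renderable; there is no error path that leaves the user with a blank screen.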
The Voice Orb
Voice Interaction - Web Audio + AI
Serenova introduces a Voice Orb, a glowing animated circle that reacts to the user’s voice.
When a user speaks:
1️⃣ Browser captures voice
2️⃣ AI analyzes speech patterns
3️⃣ Serenova generates a supportive response
This creates a natural, human-like conversation experience.
How It Works
```javascript
// Web Audio API for mic level monitoring
const audioContext = new AudioContext();
const analyser = audioContext.createAnalyser();
analyser.fftSize = 256;
const source = audioContext.createMediaStreamSource(stream);
source.connect(analyser);

// Read frequency data every frame
const dataArray = new Uint8Array(analyser.frequencyBinCount);
analyser.getByteFrequencyData(dataArray);
const avg = dataArray.reduce((a, b) => a + b, 0) / dataArray.length;
const level = Math.min(100, Math.round((avg / 128) * 100));
```
The orb scales based on input level, creating a visual feedback loop that makes speaking feel responsive. It has four states:
| State | Animation | Meaning |
| ---------- | ------------------------------- | ----------------- |
| Idle | Soft pulse, mic icon | Ready to listen |
| Listening | Animated bars, glow intensifies | Recording voice |
| Thinking | Bouncing dots | AI processing |
| Responding | Wave animation | AI response ready |
The dark theme with glassmorphism and ambient background orbs creates a calm, safe atmosphere important for a mental health app.
Support Modes and Prompt Engineering
Each support mode has a carefully crafted system prompt that shapes the AI’s personality:
```python
mode_prompts = {
    "crisis": "You are a crisis support counselor. Be calm, direct, and prioritize safety.",
    "cbt": "You are a CBT therapist. Help identify thought patterns and suggest reframing.",
    "regulation": "You are an emotion regulation coach. Guide breathing and grounding techniques.",
    "companion": "You are a warm, empathetic mental health companion. Listen actively, validate feelings. Keep responses short (2-3 sentences).",
}
```
The key insight: mental health prompts need to be warm without being patronizing, supportive without being prescriptive. We iterated on these prompts extensively. Short responses (2–3 sentences) work better than long ones — they feel more like a real conversation.
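Wiring a mode prompt into a Bedrock Converse call can be sketched like this. The builder function is our own illustration (the Converse API does accept a `system` list of text blocks, but the helper name and structure here are hypothetical):

```python
def build_converse_request(mode: str, history: list) -> dict:
    """Assemble kwargs for bedrock_client.converse() with the mode's system prompt.

    Illustrative sketch: only two modes shown; falls back to the companion prompt.
    """
    mode_prompts = {
        "crisis": "You are a crisis support counselor. Be calm, direct, and prioritize safety.",
        "companion": ("You are a warm, empathetic mental health companion. "
                      "Listen actively, validate feelings. Keep responses short (2-3 sentences)."),
    }
    return {
        "modelId": "amazon.nova-pro-v1:0",
        "system": [{"text": mode_prompts.get(mode, mode_prompts["companion"])}],
        "messages": history,
        "inferenceConfig": {"maxTokens": 300, "temperature": 0.7, "topP": 0.9},
    }
```

Keeping the persona in the `system` field rather than in the message history means the mode can switch mid-session without rewriting past turns.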
Breathing Exercise
The Regulation mode automatically activates a guided breathing exercise:
- Inhale: 4 seconds
- Hold: 4 seconds
- Exhale: 6 seconds
This 4–4–6 pattern is based on clinical breathing techniques. The animated circle with countdown gives users a visual anchor.
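The timing logic behind the animation is just a repeating 4–4–6 cycle. A minimal sketch (the generator name is our own, not from the codebase):

```python
# 4-4-6 breathing pattern used by the Regulation mode (durations in seconds).
PATTERN = [("inhale", 4), ("hold", 4), ("exhale", 6)]

def breathing_schedule(cycles: int):
    """Yield (phase, seconds) tuples for the requested number of 4-4-6 cycles."""
    for _ in range(cycles):
        yield from PATTERN

# One full cycle lasts 14 seconds; the UI countdown is driven from these tuples.
total = sum(seconds for _, seconds in breathing_schedule(1))
```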
Crisis Detection
Safety is non-negotiable in a mental health app. Serenova monitors every message for crisis indicators:
```python
crisis_keywords = ["hurt", "harm", "suicide", "kill", "die", "end it"]
detected = any(keyword in text.lower() for keyword in crisis_keywords)
```

When detected:
- A red banner appears at the top of the screen
- A modal opens with crisis resources (988 Lifeline, Crisis Text Line, SAMHSA, Veterans Crisis Line, Trevor Project)
- The 🆘 button is always accessible in the header
This is a keyword-based first layer. The full local backend adds Bedrock AI analysis and emotion-based detection as additional layers.
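One refinement worth noting: plain substring matching flags words like “diet” (contains “die”) or “skill” (contains “kill”). A word-boundary regex tightens the first layer; this is a sketch of that idea, not our exact production code:

```python
import re

CRISIS_KEYWORDS = ["hurt", "harm", "suicide", "kill", "die", "end it"]
# \b word boundaries avoid substring false positives such as "diet" matching "die".
CRISIS_RE = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in CRISIS_KEYWORDS) + r")\b",
    re.IGNORECASE,
)

def keyword_layer(text: str) -> bool:
    """First detection layer: fast keyword scan before any model call."""
    return CRISIS_RE.search(text) is not None
```

Because this layer is cheap, it runs on every message; the slower AI-based layers only need to run when context is ambiguous.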
The Deployment Battle: App Runner and Docker
This was our biggest technical challenge. We needed to deploy a Python FastAPI backend to AWS App Runner. Sounds simple. It wasn’t.
What Failed: Source-Based Deployment
App Runner’s Python runtime builds your code in one environment and runs it in another. Pip-installed binaries like uvicorn aren't in PATH at runtime.
We tried everything:
- uvicorn main:app → "executable not found in $PATH"
- python3 -m uvicorn main:app → same error
- /usr/local/bin/uvicorn → still failed
What Worked: Docker via ECR
The solution was building our own Docker image:
```dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY backend/requirements.txt .
RUN pip install --no-cache-dir --upgrade pip && \
    pip install --no-cache-dir -r requirements.txt
COPY backend/main.py .
EXPOSE 8080
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8080"]
```
Inside the Docker container, uvicorn is properly installed and in PATH. We push the image to Amazon ECR, and App Runner pulls it. No PATH issues, no build/run environment mismatch.
The Standalone Backend
We also created a separate backend/main.py: a minimal FastAPI app with zero complex imports. No Nova Sonic SDK dependency, no relative imports, just boto3 for Bedrock. This made the Docker image small and the deployment reliable.
Nova Pro Integration: The Converse API
The text chat uses Bedrock’s Converse API, which provides a clean conversation interface:
```python
response = bedrock_client.converse(
    modelId="amazon.nova-pro-v1:0",
    messages=conversation_history,  # sliding window of 20 messages
    inferenceConfig={
        "maxTokens": 300,
        "temperature": 0.7,
        "topP": 0.9,
    },
)
assistant_text = response["output"]["message"]["content"][0]["text"]
```
We keep a 20-message sliding window per session. This gives the AI enough context to maintain a coherent conversation without unbounded memory growth.
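The sliding window itself needs no custom eviction logic; Python’s `collections.deque` with a `maxlen` does it automatically. A sketch (helper name is illustrative; the message dicts follow the Converse API shape):

```python
from collections import deque

MAX_TURNS = 20  # sliding window size per session

def append_turn(history: deque, role: str, text: str) -> None:
    """Append a turn in Converse message format; the deque silently drops the oldest."""
    history.append({"role": role, "content": [{"text": text}]})

# Simulate a 25-turn conversation: only the 20 most recent turns survive.
session = deque(maxlen=MAX_TURNS)
for i in range(25):
    append_turn(session, "user" if i % 2 == 0 else "assistant", f"turn {i}")
```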
Temperature 0.7 gives responses that are warm and varied without being unpredictable — important for mental health where consistency builds trust.
Lessons Learned
1. Docker solves deployment headaches
If your deployment platform has environment quirks, skip the source-based approach and use Docker. You control the entire runtime.
2. Browser APIs are powerful
SpeechRecognition and Web Audio API gave us voice features with zero backend dependencies. The browser is an underrated platform.
3. Graceful degradation is a feature
Users should never see a broken app. Design for failure from the start — voice fails? use text. Backend down? demo mode. Always show something useful.
4. Keep production simple
The standalone backend/main.py with 5 dependencies was the key to a successful deployment. The full backend with Nova Sonic, MCP tools, and emotion detection is for local dev.
5. Mental health AI needs careful tone
The difference between “I understand you’re feeling sad” and “That sounds really tough. I’m here.” is huge. Short, warm, human-sounding responses work best.
What’s Next
- Full Nova Sonic voice streaming in production (ECS or App Runner with larger image)
- Persistent conversation history with DynamoDB
- Real-time emotion detection from audio features
- Mobile app (React Native)
- Multi-language support
- AI-guided journaling with mood tracking
- Optional therapist dashboard for professional oversight
Tech Stack
| Layer | Technology |
|------|---------|
| AI | Amazon Nova Sonic, Nova Pro, Nova Lite (Bedrock)|
| Backend | Python 3.11, FastAPI, Uvicorn, boto3|
| Frontend |React 18, CSS3, Web Audio API, SpeechRecognition API |
| Hosting | AWS App Runner (Docker/ECR), AWS Amplify |
| Infrastructure | AWS CDK, DynamoDB, S3, EventBridge, SNS, CloudWatch |
| Container | Docker, Amazon ECR |
Final Thought
We built Serenova because we believe AI can make mental health support more accessible, not as a replacement for human therapists, but as a bridge. A 3 AM companion. A judgment-free space to process your thoughts out loud.
Serenova proves that AI can be compassionate.
By combining:
- Amazon Nova AI
- AWS Cloud infrastructure
- Human-centered design
We created a system that helps people feel heard, supported, and less alone.
Serenova isn’t just technology. It’s a companion for the mind.
Resources:
Amplify Documentation — AWS Amplify Gen 2 Documentation
Serenova is an AI companion and is not a replacement for professional mental health care. If you or someone you know is in crisis, call 988 or text HOME to 741741.
Thank you for taking the time to read my article! If you found it helpful, feel free to like, share, and drop your thoughts in the comments; I’d love to hear from you.
If you want to connect or dive deeper into cloud, AI and DevOps, feel free to follow me on my socials:
👨💻 DEV Community
🛡️Medium
🐙 GitHub