This is a submission for the DEV Weekend Challenge: Community
## The Community
I built this for developers, freelancers, and technically strong introverts who struggle with high-pressure communication.
Many of us can:
- Write clearly
- Think logically
- Build complex systems
But freeze during:
- Cold outreach calls
- Interviews
- Client pitches
- Real-time conversations
The problem isn’t intelligence.
It’s speaking under pressure.
This project is built specifically for that community — builders who think well but want to speak better.
## What I Built
I built a voice-first AI conversation simulator that lets users practice real conversations in a controlled but reactive environment.
The app allows you to:
- Select a conversation mode (e.g., Cold Outreach, Normal Talk)
- Speak naturally through your microphone
- Receive real-time AI responses
- Trigger dynamic UI feedback based on performance
- Get conversation state updates (confidence, clarity, escalation)
- Hear AI responses via text-to-speech
In Cold Outreach mode, the AI can:
- Interrupt you
- Escalate tone
- End the call if clarity drops too low
The goal is not just conversation.
It’s simulated pressure training.
Instead of practicing what to say, users practice how to say it.
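The hang-up rule above can be sketched as a simple threshold check. This is a minimal sketch assuming a 0-to-1 clarity score; the threshold values and function name are illustrative, not the app's actual code:

```javascript
// Illustrative thresholds for the Cold Outreach pressure rules.
const END_CALL_THRESHOLD = 0.2;
const ESCALATE_THRESHOLD = 0.5;

// Decide the AI's next move from a 0..1 clarity score:
// too unclear → hang up; borderline → escalate tone; otherwise respond.
function nextAction(clarityScore) {
  if (clarityScore < END_CALL_THRESHOLD) return "end_call";
  if (clarityScore < ESCALATE_THRESHOLD) return "escalate_tone";
  return "respond_normally";
}
```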
## Demo
Live prototype:
https://syrup-mode-15103828.figma.site/
Video walkthrough:
## Code
Backend and WebSocket server:
Voice-to-Voice AI Backend
Real-time voice call simulation with AI personalities via Socket.IO.
### Setup

```shell
npm install
cp .env.example .env
# Add your ANTHROPIC_API_KEY to .env
npm run dev
```
### Architecture
Client clicks personality → socket connects with auth → call is live
```
Client (browser/app)
│
├── io({ auth: { personalityId: "cold-professional" } })
│     └── socket.on("call_started")     ← { personalityId, personalityName }
│
├── socket.emit("voice_message")        → AI generates reply (fast, concise)
│     └── socket.on("voice_response")   ← { text, personalityId }
│
└── socket.emit("analysis_tick")        → every 5s with current text (or empty)
      └── socket.on("analysis_result")  ← { status, message }
```
Both channels are independent — analysis never blocks voice responses.
### Socket Events

#### Client → Server

| Event | Payload | Description |
|---|---|---|
| `voice_message` | `{ text }` | Send transcribed speech. Returns AI reply. |
| `analysis_tick` | `{ text }` | 5s tick. `text` can be an empty string. |

Personality is set at connection time via `auth`, not as a separate event.
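Because the personality rides in on the handshake, the server can vet it in connection-time middleware. A hedged sketch in Socket.IO middleware style follows; the personality registry and guard function are illustrative, and only `cold-professional` appears in the post:

```javascript
// Illustrative personality registry; only "cold-professional" is from the post.
const PERSONALITIES = new Set(["cold-professional", "normal-talk"]);

// Socket.IO-style middleware: reject handshakes with an unknown personalityId.
function personalityGuard(socket, next) {
  const id = socket.handshake?.auth?.personalityId;
  if (!id || !PERSONALITIES.has(id)) {
    return next(new Error("unknown personalityId"));
  }
  socket.personalityId = id; // stash for later handlers
  next();
}
```

Wired up with `io.use(personalityGuard)`, a bad handshake fails before any call state is created.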
#### Server → Client
…Frontend interaction logic:
The voice interaction pipeline is fully functional and runs in real time.
## How I Built It

### Architecture Overview
```
Voice Input
  → Browser MediaStream
  → Speech-to-Text
  → WebSocket Server
  → Gemma-3 (LLM inference)
  → Conversation State Classification
  → Frontend UI Updates
  → ElevenLabs Text-to-Speech
  → Audio Playback
```
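Read end to end, the pipeline is a chain of async stages. Below is a minimal sketch with stubbed stages; the stage function names are hypothetical, and in the real app these steps call the STT service, Gemma-3, and ElevenLabs:

```javascript
// Each stage is a stub standing in for the real service call.
const speechToText = async (audio) => `transcript of ${audio}`;   // STT service
const llmReply = async (text) => `reply to: ${text}`;             // Gemma-3
const classifyState = async () => ({ state: "Responding" });      // state classifier
const textToSpeech = async (reply) => `audio(${reply})`;          // ElevenLabs

// Run one turn of the conversation loop: audio in → state + audio out.
async function runTurn(audioChunk) {
  const transcript = await speechToText(audioChunk);
  const reply = await llmReply(transcript);
  const { state } = await classifyState(reply);
  const audioOut = await textToSpeech(reply);
  return { state, audioOut };
}
```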
### Key Technologies
- WebSockets (real-time bidirectional streaming)
- Gemma-3 (LLM processing)
- Speech-to-Text service
- ElevenLabs (TTS synthesis)
- Node.js backend
- Figma Make (interaction prototype & UI layer)
### Why WebSockets?

A traditional REST request/response cycle would add round-trip latency to every turn of the conversation.
Using WebSockets allowed:
- Continuous streaming communication
- Real-time state updates
- Low-latency conversational response
- Dynamic UI reactions synced with backend state
Modeling the call layer over WebSockets instead of a traditional telephony API makes the experience feel immediate and responsive, while giving full control over interaction timing and conversation states.
### State Engine
The backend maintains a simple conversation state machine:
- Listening
- Processing
- Responding
- Evaluating
- Escalating
- Ending Call
These states directly drive UI changes (color shifts, orb reactions, feedback indicators).
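A minimal sketch of how such a state machine might look is below. The transition table is my reading of the states listed above, not the backend's exact code:

```javascript
// Allowed transitions, derived from the states listed above.
const TRANSITIONS = {
  "Listening":   ["Processing"],
  "Processing":  ["Responding"],
  "Responding":  ["Evaluating"],
  "Evaluating":  ["Listening", "Escalating", "Ending Call"],
  "Escalating":  ["Listening", "Ending Call"],
  "Ending Call": [],
};

// Advance the machine, notifying the UI on every valid transition.
function transition(current, next, onChange) {
  if (!TRANSITIONS[current]?.includes(next)) {
    throw new Error(`invalid transition: ${current} -> ${next}`);
  }
  onChange?.(next); // drives orb color / feedback indicators on the frontend
  return next;
}
```

Keeping the table explicit makes illegal jumps (e.g. straight from Listening to Responding) fail loudly instead of desyncing the UI.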
## Reflection
This weekend project explores a simple idea:
> Communication confidence is built through repetition under pressure.
By combining real-time AI processing with reactive interface design, this prototype demonstrates how interaction design can simulate real-world conversational tension.
Future improvements would include:
- Tone analysis based on speech rate and pitch
- Filler word detection scoring
- Detailed post-session analytics
- Session history tracking
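As a taste of the filler-word idea, a possible scorer might look like this. The filler list and scoring formula are illustrative, not a planned implementation:

```javascript
// Illustrative filler list; a real version would be configurable.
const FILLERS = new Set(["um", "uh", "like", "basically", "actually"]);

// Fraction of words that are fillers, as a 0..1 score (lower is better).
function fillerScore(transcript) {
  const words = transcript.toLowerCase().match(/[a-z']+/g) || [];
  if (words.length === 0) return 0;
  const fillerCount = words.filter((w) => FILLERS.has(w)).length;
  return fillerCount / words.length;
}
```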
Thanks for reading — and if you're part of the community that thinks clearly but struggles to speak clearly, I’d love your feedback.