Ateeb Hussain
Master Communication With AI: A Real-Time Voice Simulator for Developers Under Pressure

This is a submission for the DEV Weekend Challenge: Community.

The Community

I built this for developers, freelancers, and technically strong introverts who struggle with high-pressure communication.

Many of us can:

  • Write clearly
  • Think logically
  • Build complex systems

But freeze during:

  • Cold outreach calls
  • Interviews
  • Client pitches
  • Real-time conversations

The problem isn’t intelligence.
It’s speaking under pressure.

This project is built specifically for that community — builders who think well but want to speak better.


What I Built

I built a voice-first AI conversation simulator that lets users practice real conversations in a controlled but reactive environment.

The app allows you to:

  • Select a conversation mode (e.g., Cold Outreach, Normal Talk)
  • Speak naturally through your microphone
  • Receive real-time AI responses
  • Trigger dynamic UI feedback based on performance
  • Get conversation state updates (confidence, clarity, escalation)
  • Hear AI responses via text-to-speech

In Cold Outreach mode, the AI can:

  • Interrupt you
  • Escalate tone
  • End the call if clarity drops too low

The goal is not just conversation.
It’s simulated pressure training.

Instead of practicing what to say, users practice how to say it.


Demo

Live prototype:
https://syrup-mode-15103828.figma.site/

Video walkthrough:


Code

Backend and WebSocket server:

Voice-to-Voice AI Backend

Real-time voice call simulation with AI personalities via Socket.IO.

Setup

```shell
npm install
cp .env.example .env
# Add your ANTHROPIC_API_KEY to .env
npm run dev
```

Architecture

Client clicks personality → socket connects with auth → call is live

```
Client (browser/app)
  │
  ├── io({ auth: { personalityId: "cold-professional" } })
  │   └── socket.on("call_started") ← { personalityId, personalityName }
  │
  ├── socket.emit("voice_message")   → AI generates reply (fast, concise)
  │   └── socket.on("voice_response") ← { text, personalityId }
  │
  └── socket.emit("analysis_tick")   → Every 5s with current text (or empty)
      └── socket.on("analysis_result") ← { status, message }
```

Both channels are independent — analysis never blocks voice responses.
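A minimal client-side sketch of the two channels, following the event names in the diagram. It assumes the `socket.io-client` package as the transport; the `ui` object and its method names are illustrative placeholders, not part of the actual frontend.

```javascript
// Pure helper: builds the Socket.IO auth payload shown in the diagram.
function buildAuth(personalityId) {
  return { auth: { personalityId } };
}

// Wires both channels onto any socket-like emitter.
// The `ui` argument is a hypothetical stand-in for the real frontend layer.
function wireCall(socket, ui) {
  socket.on("call_started", ({ personalityName }) => ui.showCallLive(personalityName));
  socket.on("voice_response", ({ text }) => ui.speak(text)); // hand off to TTS playback
  socket.on("analysis_result", ({ status, message }) => ui.showFeedback(status, message));
}

// Actual connection (requires the socket.io-client package to be installed).
function startCall(serverUrl, personalityId, ui) {
  const { io } = require("socket.io-client");
  const socket = io(serverUrl, buildAuth(personalityId));
  wireCall(socket, ui);
  // Analysis tick every 5s, independent of voice messages.
  setInterval(() => socket.emit("analysis_tick", { text: ui.currentTranscript() || "" }), 5000);
  return socket;
}
```

Because the analysis tick runs on its own timer, a slow analysis round-trip never delays the voice channel.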


Socket Events

Client → Server

| Event | Payload | Description |
| --- | --- | --- |
| `voice_message` | `{ text }` | Send transcribed speech. Returns an AI reply. |
| `analysis_tick` | `{ text }` | Sent every 5s; `text` can be an empty string. |

Personality is set at connection time via auth, not as a separate event.
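Since personality comes in via the connection auth, the server can validate it in a Socket.IO middleware before the call starts. A sketch, assuming the `socket.io` package; the personality list here is illustrative, not the real one:

```javascript
// Illustrative personality table (the real backend has its own).
const PERSONALITIES = {
  "cold-professional": "Cold Professional",
  "normal-talk": "Normal Talk",
};

// Pure validation: resolves an auth payload to a personality, or null.
function resolvePersonality(auth = {}) {
  const name = PERSONALITIES[auth.personalityId];
  return name ? { personalityId: auth.personalityId, personalityName: name } : null;
}

// Socket.IO server wiring (requires the socket.io package).
function attachCallHandlers(io) {
  io.use((socket, next) => {
    const personality = resolvePersonality(socket.handshake.auth);
    if (!personality) return next(new Error("unknown personality"));
    socket.data.personality = personality;
    next();
  });
  io.on("connection", (socket) => {
    // Matches the call_started payload in the architecture diagram.
    socket.emit("call_started", socket.data.personality);
  });
}
```

Rejecting bad auth in middleware means a client with an unknown personality never reaches the connection handler at all.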

Server → Client

| Event | Payload | Description |
| --- | --- | --- |
| `call_started` | `{ personalityId, personalityName }` | Confirms the call is live after auth. |
| `voice_response` | `{ text, personalityId }` | AI reply to a `voice_message`. |
| `analysis_result` | `{ status, message }` | Result of the latest analysis tick. |

Frontend interaction logic:

The voice interaction pipeline is fully functional and runs in real time.


How I Built It

Architecture Overview

Voice Input
→ Browser MediaStream
→ Speech-to-Text
→ WebSocket Server
→ Gemma-3 (LLM inference)
→ Conversation State Classification
→ Frontend UI Updates
→ ElevenLabs Text-to-Speech
→ Audio Playback
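The pipeline above can be sketched as a single async chain. The stage functions are stand-ins for the real services (STT, Gemma-3, ElevenLabs); their names and signatures are assumptions for illustration:

```javascript
// One conversational turn through the pipeline. Each stage is injected,
// so real services and test stubs are interchangeable.
async function handleUtterance(audioChunk, stages) {
  const text = await stages.speechToText(audioChunk);    // Speech-to-Text
  const reply = await stages.llm(text);                  // Gemma-3 inference
  const state = await stages.classifyState(text, reply); // conversation state
  stages.updateUI(state);                                // frontend reaction
  const audio = await stages.textToSpeech(reply);        // ElevenLabs TTS
  return { reply, state, audio };                        // audio goes to playback
}
```

Injecting the stages keeps the orchestration testable without any external service in the loop.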

Key Technologies

  • WebSockets (real-time bidirectional streaming)
  • Gemma-3 (LLM processing)
  • Speech-to-Text service
  • ElevenLabs (TTS synthesis)
  • Node.js backend
  • Figma Make (interaction prototype & UI layer)

Why WebSockets?

Traditional REST would introduce unnecessary latency for conversational flow.

Using WebSockets allowed:

  • Continuous streaming communication
  • Real-time state updates
  • Low-latency conversational response
  • Dynamic UI reactions synced with backend state

This made the “call” experience feel immediate and responsive. The call layer itself is modeled over WebSockets rather than a telephony API, which gives full control over interaction timing and conversation state.

State Engine

The backend maintains a simple conversation state machine:

  • Listening
  • Processing
  • Responding
  • Evaluating
  • Escalating
  • Ending Call

These states directly drive UI changes (color shifts, orb reactions, feedback indicators).
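A minimal sketch of that state machine. The state names mirror the list above; the specific events and transition rules here are illustrative assumptions, including how low clarity escalates and eventually ends the call in Cold Outreach mode:

```javascript
// Transition table: current state → { event: next state }.
// Unknown events leave the state unchanged; ending_call is terminal.
const TRANSITIONS = {
  listening:   { speech_received: "processing" },
  processing:  { reply_ready: "responding" },
  responding:  { reply_spoken: "evaluating" },
  evaluating:  { clarity_ok: "listening", clarity_low: "escalating" },
  escalating:  { recovered: "listening", clarity_critical: "ending_call" },
  ending_call: {},
};

function nextState(current, event) {
  return (TRANSITIONS[current] || {})[event] || current;
}
```

Each transition the server takes can be broadcast to the client, which is what drives the orb reactions and color shifts in the UI.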


Reflection

This weekend project explores a simple idea:

Communication confidence is built through repetition under pressure.

By combining real-time AI processing with reactive interface design, this prototype demonstrates how interaction design can simulate real-world conversational tension.

Future improvements would include:

  • Tone analysis based on speech rate and pitch
  • Filler word detection scoring
  • Detailed post-session analytics
  • Session history tracking

Thanks for reading — and if you're part of the community that thinks clearly but struggles to speak clearly, I’d love your feedback.

Top comments (3)

Maya Bayers

This is such a thoughtful and genuinely useful project. You nailed the problem statement — so many developers are technically brilliant but freeze the moment a real human is on the other end of a call. The fact that the AI can actually interrupt, escalate tone, and even hang up in Cold Outreach mode is brilliant. That kind of unpredictable pressure is exactly what makes real conversations scary, and you've replicated it in a safe environment. Well done! 🎙️

Ateeb Hussain

Glad to hear it! Which personality did you like most?

Ateeb Hussain

Drop your feedback!