This is a submission for the DEV Weekend Challenge: Community
## The Community
I built this for developers, freelancers, and technically strong introverts who struggle with high-pressure communication.
Many of us can:
- Write clearly
- Think logically
- Build complex systems
But freeze during:
- Cold outreach calls
- Interviews
- Client pitches
- Real-time conversations
The problem isn’t intelligence.
It’s speaking under pressure.
This project is built specifically for that community — builders who think well but want to speak better.
## What I Built
I built a voice-first AI conversation simulator that lets users practice real conversations in a controlled but reactive environment.
The app allows you to:
- Select a conversation mode (e.g., Cold Outreach, Normal Talk)
- Speak naturally through your microphone
- Receive real-time AI responses
- Trigger dynamic UI feedback based on performance
- Get conversation state updates (confidence, clarity, escalation)
- Hear AI responses via text-to-speech
In Cold Outreach mode, the AI can:
- Interrupt you
- Escalate tone
- End the call if clarity drops too low
The goal is not just conversation.
It’s simulated pressure training.
Instead of practicing what to say, users practice how to say it.
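The hang-up rule above can be sketched as a simple threshold check. This is a minimal sketch assuming a 0-to-1 clarity score; the threshold values and function name are illustrative, not the app's actual code:

```javascript
// Illustrative thresholds for the Cold Outreach pressure rules.
const END_CALL_THRESHOLD = 0.2;
const ESCALATE_THRESHOLD = 0.5;

// Decide the AI's next move from a 0..1 clarity score:
// too unclear → hang up; borderline → escalate tone; otherwise respond.
function nextAction(clarityScore) {
  if (clarityScore < END_CALL_THRESHOLD) return "end_call";
  if (clarityScore < ESCALATE_THRESHOLD) return "escalate_tone";
  return "respond_normally";
}
```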
## Demo
Live prototype:
https://syrup-mode-15103828.figma.site/
Video walkthrough:
## Code
Backend and WebSocket server:
Voice-to-Voice AI Backend
Real-time voice call simulation with AI personalities via Socket.IO.
### Setup

```shell
npm install
cp .env.example .env
# Add your ANTHROPIC_API_KEY to .env
npm run dev
```
### Architecture
Client clicks personality → socket connects with auth → call is live
```
Client (browser/app)
│
├── io({ auth: { personalityId: "cold-professional" } })
│     └── socket.on("call_started")     ← { personalityId, personalityName }
│
├── socket.emit("voice_message")        → AI generates reply (fast, concise)
│     └── socket.on("voice_response")   ← { text, personalityId }
│
└── socket.emit("analysis_tick")        → every 5s with current text (or empty)
      └── socket.on("analysis_result")  ← { status, message }
```
Both channels are independent — analysis never blocks voice responses.
### Socket Events

#### Client → Server

| Event | Payload | Description |
|---|---|---|
| `voice_message` | `{ text }` | Send transcribed speech. Returns AI reply. |
| `analysis_tick` | `{ text }` | 5s tick. `text` can be an empty string. |

Personality is set at connection time via `auth`, not as a separate event.
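Because the personality rides in on the handshake, the server can vet it in connection-time middleware. A hedged sketch in Socket.IO middleware style follows; the personality registry and guard function are illustrative, and only `cold-professional` appears in the post:

```javascript
// Illustrative personality registry; only "cold-professional" is from the post.
const PERSONALITIES = new Set(["cold-professional", "normal-talk"]);

// Socket.IO-style middleware: reject handshakes with an unknown personalityId.
function personalityGuard(socket, next) {
  const id = socket.handshake?.auth?.personalityId;
  if (!id || !PERSONALITIES.has(id)) {
    return next(new Error("unknown personalityId"));
  }
  socket.personalityId = id; // stash for later handlers
  next();
}
```

Wired up with `io.use(personalityGuard)`, a bad handshake fails before any call state is created.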
#### Server → Client
…Frontend interaction logic:
The voice interaction pipeline is fully functional and runs in real time.
## How I Built It

### Architecture Overview
```
Voice Input
  → Browser MediaStream
  → Speech-to-Text
  → WebSocket Server
  → Gemma-3 (LLM inference)
  → Conversation State Classification
  → Frontend UI Updates
  → ElevenLabs Text-to-Speech
  → Audio Playback
```
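Read end to end, the pipeline is a chain of async stages. Below is a minimal sketch with stubbed stages; the stage function names are hypothetical, and in the real app these steps call the STT service, Gemma-3, and ElevenLabs:

```javascript
// Each stage is a stub standing in for the real service call.
const speechToText = async (audio) => `transcript of ${audio}`;   // STT service
const llmReply = async (text) => `reply to: ${text}`;             // Gemma-3
const classifyState = async () => ({ state: "Responding" });      // state classifier
const textToSpeech = async (reply) => `audio(${reply})`;          // ElevenLabs

// Run one turn of the conversation loop: audio in → state + audio out.
async function runTurn(audioChunk) {
  const transcript = await speechToText(audioChunk);
  const reply = await llmReply(transcript);
  const { state } = await classifyState(reply);
  const audioOut = await textToSpeech(reply);
  return { state, audioOut };
}
```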
### Key Technologies
- WebSockets (real-time bidirectional streaming)
- Gemma-3 (LLM processing)
- Speech-to-Text service
- ElevenLabs (TTS synthesis)
- Node.js backend
- Figma Make (interaction prototype & UI layer)
### Why WebSockets?

A traditional REST request/response cycle would add round-trip latency to every turn of the conversation.
Using WebSockets allowed:
- Continuous streaming communication
- Real-time state updates
- Low-latency conversational response
- Dynamic UI reactions synced with backend state
Modeling the call layer over WebSockets instead of a traditional telephony API makes the experience feel immediate and responsive, while giving full control over interaction timing and conversation states.
### State Engine
The backend maintains a simple conversation state machine:
- Listening
- Processing
- Responding
- Evaluating
- Escalating
- Ending Call
These states directly drive UI changes (color shifts, orb reactions, feedback indicators).
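A minimal sketch of how such a state machine might look is below. The transition table is my reading of the states listed above, not the backend's exact code:

```javascript
// Allowed transitions, derived from the states listed above.
const TRANSITIONS = {
  "Listening":   ["Processing"],
  "Processing":  ["Responding"],
  "Responding":  ["Evaluating"],
  "Evaluating":  ["Listening", "Escalating", "Ending Call"],
  "Escalating":  ["Listening", "Ending Call"],
  "Ending Call": [],
};

// Advance the machine, notifying the UI on every valid transition.
function transition(current, next, onChange) {
  if (!TRANSITIONS[current]?.includes(next)) {
    throw new Error(`invalid transition: ${current} -> ${next}`);
  }
  onChange?.(next); // drives orb color / feedback indicators on the frontend
  return next;
}
```

Keeping the table explicit makes illegal jumps (e.g. straight from Listening to Responding) fail loudly instead of desyncing the UI.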
## Reflection
This weekend project explores a simple idea:
> Communication confidence is built through repetition under pressure.
By combining real-time AI processing with reactive interface design, this prototype demonstrates how interaction design can simulate real-world conversational tension.
Future improvements would include:
- Tone analysis based on speech rate and pitch
- Filler word detection scoring
- Detailed post-session analytics
- Session history tracking
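As a taste of the filler-word idea, a possible scorer might look like this. The filler list and scoring formula are illustrative, not a planned implementation:

```javascript
// Illustrative filler list; a real version would be configurable.
const FILLERS = new Set(["um", "uh", "like", "basically", "actually"]);

// Fraction of words that are fillers, as a 0..1 score (lower is better).
function fillerScore(transcript) {
  const words = transcript.toLowerCase().match(/[a-z']+/g) || [];
  if (words.length === 0) return 0;
  const fillerCount = words.filter((w) => FILLERS.has(w)).length;
  return fillerCount / words.length;
}
```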
Thanks for reading — and if you're part of the community that thinks clearly but struggles to speak clearly, I’d love your feedback.