DEV Community

Harish Kotra (he/him)


Engineering Call of Customer: Building an AI-Powered Voice CX Battle Arena

A technical deep dive into real-time voice processing, procedural Web Audio synthesis, dark-mode D3.js visualization, and Gemini-driven structured scoring.


The Core Engineering Challenge

In customer experience (CX) training, static multiple-choice questionnaires fail to replicate the psychological tension of resolving a live customer dispute. We built Call of Customer to address exactly this: a full-stack, voice-driven, peer-challenged training environment in which users race a countdown timer to de-escalate aggressive customer complaints across escalating rage tiers.

This article reviews the architectural components, the code behind them, and the key engineering trade-offs made to bring this multiplayer training game to life.


Architectural Overview

                               ┌────────────────────────────────┐
                               │     Web Speech Recognition     │
                               │   (Continuous Audio Stream)    │
                               └───────────────┬────────────────┘
                                               │
                                               ▼ Output: String Text
┌────────────────────────┐     ┌────────────────────────────────┐     ┌────────────────────────┐
│     Web Audio Synth    │ ◄───┤        React State Engine      ├────►│       D3 SVG Line      │
│ (Procedural Sound FX)  │     │   (Haptic & QA Calibration)    │     │  (Performance Chart)   │
└────────────────────────┘     └───────────────┬────────────────┘     └────────────────────────┘
                                               │
                                               ▼ Pipeline: Post Action
                               ┌────────────────────────────────┐
                               │     Express API (/api/rooms)   │
                               └───────────────┬────────────────┘
                                               │
                                               ▼ Payload: String Transcript
                               ┌────────────────────────────────┐
                               │   Server-Side Google GenAI     │
                               │   (JSON Structured Output)     │
                               └────────────────────────────────┘

The system operates across three core domains:

  1. The Client Interface: Continuous client-side Speech Recognition (Web Speech API) and real-time canvas-rendered waveform indicators.
  2. Interactive Hardware Feedback: Procedural sound synthesis (Web Audio API) coupled with local haptic parameters (navigator.vibrate) representing game status.
  3. The Intelligent Grading Core: An ESM Node.js Express server that scores transcripts against the complaint's context through the Gemini API, acting as a proxy so the API credentials never reach the client.
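
The haptic half of item 2 is not shown elsewhere in this article, so here is a minimal sketch of how game status could map to vibration patterns. The event names and pattern values are hypothetical; only `navigator.vibrate` itself, which accepts a duration or an array of alternating vibrate/pause durations in milliseconds, comes from the platform. The call is feature-detected because not every browser implements it.

```typescript
// Hypothetical game events; the app's real event names are not shown in the article.
type HapticEvent = 'tick' | 'submit' | 'timeUp';

// Alternating vibrate/pause durations in milliseconds (illustrative values).
const HAPTIC_PATTERNS: Record<HapticEvent, number[]> = {
  tick: [15],
  submit: [40, 30, 40],
  timeUp: [200, 100, 200],
};

function hapticPatternFor(event: HapticEvent): number[] {
  return HAPTIC_PATTERNS[event];
}

// Returns true only if a vibrate function exists and accepted the request.
function triggerHaptic(event: HapticEvent): boolean {
  const nav = (globalThis as unknown as {
    navigator?: { vibrate?: (pattern: number | number[]) => boolean };
  }).navigator;
  if (!nav || typeof nav.vibrate !== 'function') return false;
  return nav.vibrate(hapticPatternFor(event));
}
```

Returning a boolean lets the caller fall back to a purely audible or visual cue on devices without vibration support.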

Deep Technical Drills & Code Snippets

Let's explore the four core systems enabling this performance training engine.

1. High-Fidelity Procedural Sound Synthesis (Web Audio API)

Shipping pre-recorded sound files adds to the initial download and makes the sounds harder to retheme. Call of Customer instead synthesizes its sound effects at runtime with the Web Audio API:

// /src/utils/audio.ts
let audioCtx: AudioContext | null = null;

export function getAudioContext(): AudioContext {
  if (!audioCtx) {
    audioCtx = new (window.AudioContext || (window as any).webkitAudioContext)();
  }
  return audioCtx;
}

export function playSound(type: 'click' | 'success' | 'alert' | 'tick' | 'scoreReveal') {
  try {
    const isMuted = localStorage.getItem('call_of_customer_mute') === 'true';
    if (isMuted) return;

    const ctx = getAudioContext();
    const now = ctx.currentTime;

    if (type === 'click') {
      const osc = ctx.createOscillator();
      const gain = ctx.createGain();
      osc.type = 'triangle';
      osc.frequency.setValueAtTime(150, now);
      osc.frequency.exponentialRampToValueAtTime(80, now + 0.08);
      gain.gain.setValueAtTime(0.2, now);
      gain.gain.exponentialRampToValueAtTime(0.01, now + 0.08);
      osc.connect(gain);
      gain.connect(ctx.destination);
      osc.start(now);
      osc.stop(now + 0.08);
    } else if (type === 'success') {
      // Rising three-step arpeggio played on a single oscillator
      const osc1 = ctx.createOscillator();
      const gain = ctx.createGain();

      osc1.frequency.setValueAtTime(330, now);
      osc1.frequency.setValueAtTime(440, now + 0.08);
      osc1.frequency.setValueAtTime(660, now + 0.16);

      gain.gain.setValueAtTime(0.15, now);
      gain.gain.exponentialRampToValueAtTime(0.01, now + 0.4);

      osc1.connect(gain);
      gain.connect(ctx.destination);
      osc1.start(now);
      osc1.stop(now + 0.4);
    }
  } catch (err) {
    console.warn('Audio Context interaction blocked: ', err);
  }
}

This gives us zero-dependency, low-latency retro game-show sounds for clicks, timer countdowns, and score reveals.
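
The `success` branch above hard-codes three frequency steps. One way to extend the same idea to the other sound types is to describe each effect as data and apply it to an oscillator's `frequency` parameter. The sketch below is an assumption about how that could look, not the app's actual code: `buildSteps`, `applySteps`, and the `FrequencyTarget` interface are hypothetical names, and the only Web Audio call it relies on is `AudioParam.setValueAtTime(value, time)`.

```typescript
// One scheduled frequency change, in seconds on the AudioContext clock.
interface ToneStep {
  frequency: number;
  at: number;
}

// Anything with setValueAtTime; a real AudioParam such as osc.frequency fits this shape.
interface FrequencyTarget {
  setValueAtTime(value: number, time: number): unknown;
}

// Spread a list of frequencies evenly in time, starting at startTime.
function buildSteps(frequencies: number[], stepSeconds: number, startTime: number): ToneStep[] {
  return frequencies.map((frequency, i) => ({
    frequency,
    at: startTime + i * stepSeconds,
  }));
}

// Schedule every step on the target parameter.
function applySteps(target: FrequencyTarget, steps: ToneStep[]): void {
  for (const step of steps) {
    target.setValueAtTime(step.frequency, step.at);
  }
}

// Usage inside playSound (browser only):
//   applySteps(osc1.frequency, buildSteps([330, 440, 660], 0.08, ctx.currentTime));
```

Keeping the schedule as plain data also makes it easy to unit-test the timing without creating an `AudioContext`.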


2. Client-Side Speech Recognition & Quality Assured Inputs

Speech recognition can feel fragile if accidental or low-quality submissions are processed. The frontend pairs continuous speech recognition with minimum word-count and character-length checks before a response is submitted:

// /src/components/RecordingView.tsx
const handleSubmit = (force = false) => {
  const cleanTranscript = transcript.trim();

  if (!force) {
    if (!cleanTranscript) {
      setValidationError("QA Alert: Response is empty! Write or record a customer resolution first.");
      return;
    }

    const words = cleanTranscript.split(/\s+/).filter(Boolean);
    if (words.length < 3) {
      setValidationError("QA Alert: Resolution too brief! Please address customer points with at least 3 words.");
      return;
    }

    if (cleanTranscript.length < 12) {
      setValidationError("QA Alert: Operator response lacks detail! Must be at least 12 characters long.");
      return;
    }
  }

  setValidationError(null);
  onSubmit(cleanTranscript || "Default resolution waiver proposed.", selectedPowerup);
};
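
The snippet above validates the transcript but does not show where it comes from. As a rough sketch, and assuming the component uses the browser's `SpeechRecognition` interface (exposed as `webkitSpeechRecognition` in Chromium-based browsers) with `continuous` and `interimResults` enabled, each `result` event can be folded into a final and an interim string. The helper name `splitTranscript` and the structural types are hypothetical.

```typescript
// Structural subset of the Web Speech API result objects, so the helper is testable offline.
interface AlternativeLike { transcript: string }
interface ResultLike { isFinal: boolean; 0: AlternativeLike }
interface ResultListLike { length: number; [index: number]: ResultLike }

// Concatenate finalized phrases and still-changing interim phrases separately.
function splitTranscript(results: ResultListLike): { finalText: string; interimText: string } {
  let finalText = '';
  let interimText = '';
  for (let i = 0; i < results.length; i++) {
    const phrase = results[i][0].transcript; // best alternative for this phrase
    if (results[i].isFinal) finalText += phrase;
    else interimText += phrase;
  }
  return { finalText: finalText.trim(), interimText: interimText.trim() };
}

// Browser wiring (assumes a setTranscript state setter exists in the component):
//   const Ctor = (window as any).SpeechRecognition || (window as any).webkitSpeechRecognition;
//   if (Ctor) {
//     const recognition = new Ctor();
//     recognition.continuous = true;      // keep listening across pauses
//     recognition.interimResults = true;  // deliver partial phrases while speaking
//     recognition.onresult = (event: { results: ResultListLike }) => {
//       const { finalText, interimText } = splitTranscript(event.results);
//       setTranscript(`${finalText} ${interimText}`.trim());
//     };
//     recognition.start();
//   }
```

Separating final from interim text lets the UI show live feedback while only the settled words count toward the length checks.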

3. Server-Side Difficulty Multiplier and LLM Structured Grading

Inside /server.ts, we parse the evaluation returned by the server-side Gemini API call. Gemini grades the response on Empathy, Professionalism, Resolution, Clarity, and Retention; in the excerpt below, the overall score is then scaled by a difficulty multiplier and capped at 100. If the encounter is set to "Nightmare" mode, the multiplier is 1.5 to reward player bravery!

// /server.ts
const difficultyStr = (complaint.difficulty || 'medium').toLowerCase();
let diffMultiplier = 1.1;

if (difficultyStr === 'easy') diffMultiplier = 1.0;
else if (difficultyStr === 'medium') diffMultiplier = 1.1;
else if (difficultyStr === 'hard') diffMultiplier = 1.3;
else if (difficultyStr === 'nightmare') diffMultiplier = 1.5;

const originalScore = scoreResult.score;
const finalScore = Math.min(100, Math.floor(originalScore * diffMultiplier));

// Inject breakdown for rich UI visualization
scoreResult.originalScore = originalScore;
scoreResult.difficultyMultiplier = diffMultiplier;
scoreResult.difficultyApplied = difficultyStr;
scoreResult.score = finalScore;
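
To make the arithmetic concrete, the same logic can be written as a pure function. This is a restatement of the snippet above for illustration, with a hypothetical name (`applyDifficulty`) and a lookup table in place of the `if` chain; unknown difficulty values fall back to the medium multiplier, as in the original.

```typescript
// Same multipliers as the server snippet, keyed by lower-cased difficulty.
const DIFFICULTY_MULTIPLIERS = new Map<string, number>([
  ['easy', 1.0],
  ['medium', 1.1],
  ['hard', 1.3],
  ['nightmare', 1.5],
]);

interface ScoredResult {
  score: number;               // final score after multiplier and cap
  originalScore: number;       // score before the multiplier
  difficultyMultiplier: number;
  difficultyApplied: string;
}

function applyDifficulty(rawScore: number, difficulty?: string): ScoredResult {
  const difficultyApplied = (difficulty || 'medium').toLowerCase();
  const difficultyMultiplier = DIFFICULTY_MULTIPLIERS.get(difficultyApplied) ?? 1.1;
  return {
    score: Math.min(100, Math.floor(rawScore * difficultyMultiplier)),
    originalScore: rawScore,
    difficultyMultiplier,
    difficultyApplied,
  };
}

console.log(applyDifficulty(60, 'Hard').score);      // 78
console.log(applyDifficulty(70, 'nightmare').score); // 100 (105 before the cap)
console.log(applyDifficulty(50, 'easy').score);      // 50
```

The cap means a strong answer on Nightmare gains fewer points than the multiplier alone suggests: 70 becomes 100, not 105.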

On the frontend, this multiplier is displayed in an eye-catching Difficulty Bonus pill:

// /src/components/ResultsView.tsx
{evaluation.difficultyMultiplier !== undefined && (
  <div className="mb-4 bg-amber-500/10 border border-amber-500/20 rounded-xl p-3 flex items-center justify-between">
    <div className="flex gap-2">
      <Sparkles className="w-4 h-4 text-amber-400" />
      <div className="text-left font-mono">
        <span className="text-[10px] uppercase">Difficulty Bonus Active</span>
        <span className="text-[8px] text-slate-400">Multiplier: x{evaluation.difficultyMultiplier}</span>
      </div>
    </div>
    <span className="bg-amber-500 text-slate-950 px-2 rounded font-black text-xs">
      {/* Show the points actually gained: the server caps the final score at 100 */}
      +{evaluation.score - (evaluation.originalScore ?? evaluation.score)} PTS
    </span>
  </div>
)}
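
Because the server caps the final score at 100, the bonus a player actually receives can be smaller than `originalScore × (multiplier − 1)`. A small helper, sketched here with hypothetical names and assuming the evaluation object carries both `score` and `originalScore` as set in /server.ts, can derive the displayed bonus from the two scores directly:

```typescript
interface EvaluationLike {
  score: number;           // final score after multiplier and cap
  originalScore?: number;  // score before the multiplier, if the server sent it
}

// Points actually gained from the difficulty multiplier (never negative).
function difficultyBonus(evaluation: EvaluationLike): number {
  const original = evaluation.originalScore ?? evaluation.score;
  return Math.max(0, evaluation.score - original);
}

console.log(difficultyBonus({ score: 100, originalScore: 70 })); // 30
console.log(difficultyBonus({ score: 78, originalScore: 60 }));  // 18
console.log(difficultyBonus({ score: 55 }));                     // 0
```

In the first case, a multiplier-based formula would report +35 for a Nightmare round even though the capped score rose by only 30.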

4. Vectorized Historical Calibration Charting (D3.js)

To visualize training progress across matches, we build the chart directly from D3.js scales and path generators and render it as SVG:

// /src/components/AnalyticsChart.tsx
const xScale = d3.scaleLinear()
  .domain([0, data.length - 1])
  .range([paddingLeft, width - paddingRight]);

const yScale = d3.scaleLinear()
  .domain([0, 100])
  .range([height - paddingBottom, paddingTop]);

const empathyLineGenerator = d3.line<HistoryEntry>()
  .x((_, idx) => xScale(idx))
  .y(d => yScale(d.empathy ?? 50))
  .curve(d3.curveMonotoneX);

// We draw custom area overlays to render glowing neon gradient drops beneath paths
const empathyAreaGenerator = d3.area<HistoryEntry>()
  .x((_, idx) => xScale(idx))
  .y0(height - paddingBottom)
  .y1(d => yScale(d.empathy ?? 50))
  .curve(d3.curveMonotoneX);
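
For readers less familiar with D3, a linear scale is a linear interpolation from a data domain to a pixel range. The dependency-free sketch below mimics what a `domain(...)`/`range(...)` pair computes for in-domain inputs; it is a hand-rolled stand-in for intuition, not D3's implementation, and the layout numbers are made up. Note the inverted y range: SVG's y axis grows downward, so a score of 100 must map to the smallest pixel value.

```typescript
// Minimal stand-in for a linear scale: maps [d0, d1] onto [r0, r1].
function linearScale(domain: [number, number], range: [number, number]) {
  const [d0, d1] = domain;
  const [r0, r1] = range;
  return (value: number): number => {
    if (d1 === d0) return (r0 + r1) / 2; // degenerate domain: fall back to the middle of the range
    return r0 + ((value - d0) / (d1 - d0)) * (r1 - r0);
  };
}

// Illustrative layout values (not taken from the app).
const chartHeight = 200;
const chartPaddingTop = 10;
const chartPaddingBottom = 20;

const scoreToPixelY = linearScale([0, 100], [chartHeight - chartPaddingBottom, chartPaddingTop]);

console.log(scoreToPixelY(0));   // 180 (bottom of the plot area)
console.log(scoreToPixelY(50));  // 95
console.log(scoreToPixelY(100)); // 10 (top of the plot area)
```

The area generator's `y0(height - paddingBottom)` in the snippet above is the same bottom edge, which is why the gradient fill hangs from the line down to the x axis.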

Together, these pieces combine low-latency browser APIs, custom styling, and LLM-based structured scoring into a full-stack training tool that runs in any modern browser.

Try it here: https://call-of-customer-783263164775.us-west1.run.app

Code & more: https://www.dailybuild.xyz/project/150-call-of-customer
