Veew - Real-time video calling with live captioning, minutes recording, and speaker diarization.

AssemblyAI Voice Agents Challenge: Real-Time

This is a submission for the AssemblyAI Voice Agents Challenge

What I Built

Veew is a real-time video communication platform that connects users through video calls and enhances the experience with live captioning, automatic minutes generation, and speaker diarization at sub-300ms latency. This project prioritizes a fast and responsive voice experience, ensuring that captions for every spoken word are delivered to all participants in real time. It offers an inclusive solution for individuals with auditory impairments, enabling them to fully participate in video calls.
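Delivering those captions to every participant is a separate step from transcribing them. Below is a rough sketch of one way it could work, assuming each client publishes its finalized captions over a signaling WebSocket shared by the room; the CaptionPayload shape and the signaling connection are illustrative assumptions, not Veew's actual transport.

// Hypothetical caption-broadcast helper. The post doesn't show Veew's
// transport layer, so the shared "signaling" WebSocket and the payload
// shape below are assumptions for illustration only.
type CaptionPayload = {
  roomId: string;
  speaker: string;
  text: string;
  isFinal: boolean;
  timestamp: string;
};

function broadcastCaption(signaling: WebSocket, caption: CaptionPayload): void {
  // Captions are fire-and-forget; only send while the socket is open.
  if (signaling.readyState === WebSocket.OPEN) {
    signaling.send(JSON.stringify({ type: 'caption', ...caption }));
  }
}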

Demo

Live Site

GitHub Repository

Veew - Simplifying Communication

Veew is a video communication platform that uses AssemblyAI's Universal Streaming API to auto-generate live video captions with speaker diarization.

Features

  • Create room: This allows users to start a video channel.
  • Join room: Users can join an already created room to connect with other participants (a minimal create/join sketch follows this list).
  • Live Captioning: Users can enable live captions during a video call.
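The room features boil down to generating a shareable room ID and navigating to it. The helpers below are a hypothetical sketch of that flow; the ID scheme, the /room/ URL pattern, and the navigation calls are assumptions for illustration, not the actual Veew code.

// Hypothetical room helpers; the real logic lives in the repository.
function createRoom(): string {
  // A unique room ID that participants can share to join the same call.
  const roomId = crypto.randomUUID();
  window.location.assign(`/room/${roomId}`);
  return roomId;
}

function joinRoom(roomId: string): void {
  // Joining is simply navigating to an existing room's URL.
  window.location.assign(`/room/${roomId}`);
}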



Technical Implementation & AssemblyAI Integration

AssemblyAI's Universal Streaming played a pivotal role in turning the vision for this application into reality. By providing real-time, speaker-diarized transcription capabilities, it enabled the seamless generation of live video captions with high accuracy. This technology also made it possible to automatically produce well-structured meeting minutes, enhancing both accessibility and post-call productivity.

Below is a snippet of how I integrated AssemblyAI into the application to generate the live captions, as well as the meeting minutes:

const startTranscription = useCallback(async () => {
  try {
    // Reset any previous error and set connection status
    setError(null);
    setConnectionStatus('connecting');

    // Fetch authentication token
    const token = await getToken();
    if (!token) return;

    // Create WebSocket connection with AssemblyAI using token and transcription parameters
    const wsUrl = `wss://streaming.assemblyai.com/v3/ws?sample_rate=16000&speaker_diarization=true&formatted_finals=true&token=${token}`;
    socket.current = new WebSocket(wsUrl);

    // When WebSocket connection is successfully opened
    socket.current.onopen = async () => {
      console.log('🔰🔰🔰AssemblyAI WebSocket connected');
      setIsConnected(true);
      setConnectionStatus('connected');
      setIsListening(true);

      // Access user's microphone
      mediaStream.current = await navigator.mediaDevices.getUserMedia({ audio: true });

      // Create audio context with sample rate matching AssemblyAI
      audioContext.current = new AudioContext({ sampleRate: 16000 });

      // Create a media stream source and script processor node
      const source = audioContext.current.createMediaStreamSource(mediaStream.current);
      scriptProcessor.current = audioContext.current.createScriptProcessor(4096, 1, 1);

      // Connect the audio nodes
      source.connect(scriptProcessor.current);
      scriptProcessor.current.connect(audioContext.current.destination);

      // Process and send audio data on each audio processing event
      scriptProcessor.current.onaudioprocess = (event) => {
        if (!socket.current || socket.current.readyState !== WebSocket.OPEN) return;

        const input = event.inputBuffer.getChannelData(0);
        const buffer = new ArrayBuffer(input.length * 2);
        const view = new DataView(buffer);

        // Convert audio float samples to 16-bit PCM
        for (let i = 0; i < input.length; i++) {
          const s = Math.max(-1, Math.min(1, input[i]));
          view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true); // little-endian
        }

        // Send the audio buffer to the WebSocket
        socket.current.send(buffer);
      };
    };

    // Handle incoming messages from AssemblyAI WebSocket
    socket.current.onmessage = (event) => {
      console.log("⬅️⬅️⬅️ AssemblyAI says:", event.data);

      try {
        const message = JSON.parse(event.data);
        console.log("🟢🟢🟢Parsed message:", message);

        // Handle live partial transcript (for real-time display only)
        if (message.type === 'PartialTranscript') {
          const { text, speaker, created } = message;
          const timestamp = new Date(created || Date.now()).toLocaleTimeString();

          setPartialTranscript({
            text: text || '',
            speaker: speaker || 'Unknown',
            timestamp,
            type: 'partial'
          });
          return;
        }

        // Handle final transcript (Turn or FinalTranscript)
        if (message.type === 'Turn' || message.message_type === 'FinalTranscript') {
          const transcriptText = message.transcript || message.text || '';
          const speakerLabel = `${currentSpeakerRef.current}`;
          const timestamp = new Date(message.created || Date.now()).toLocaleTimeString();
          const transcriptId = message.id || Date.now().toString();

          const finalTranscript: Transcript = {
            text: transcriptText,
            speaker: speakerLabel,
            timestamp,
            id: transcriptId,
            type: 'final'
          };

          // Save the final transcript, keeping earlier ones
          setTranscripts(prev => ({
            ...prev,
            [transcriptId]: finalTranscript
          }));

          // Clear the partial transcript display
          setPartialTranscript(null);

          // Update speaker statistics
          setSpeakers(prev => ({
            ...prev,
            [speakerLabel]: {
              name: speakerLabel,
              lastSeen: timestamp,
              totalMessages: (prev[speakerLabel]?.totalMessages || 0) + 1
            }
          }));

          // Add final transcript to minutes buffer if session is active
          if (minutesInSessionRef.current) {
            setMinutesBuffer(prev => [...prev, finalTranscript]);
          }
        }

      } catch (e) {
        console.error('Error parsing message:', e);
      }
    };

    // Handle WebSocket errors
    socket.current.onerror = (e) => {
      console.error('WebSocket error:', e);
      setError('WebSocket error');
      stopTranscription(); // Gracefully stop transcription on error
    };

    // Handle WebSocket close
    socket.current.onclose = () => {
      console.log('WebSocket closed');
      setIsConnected(false);
      setConnectionStatus('disconnected');
    };

  } catch (err) {
    console.error('startTranscription error:', err);
    setError('Failed to start transcription');
  }
}, []);
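A quick note on the audio path: createScriptProcessor is deprecated in modern browsers, and an AudioWorklet would be the longer-term replacement, but the ScriptProcessor approach keeps the capture-and-convert loop easy to follow.

The getToken call at the top of the hook isn't shown above; it asks a small backend route for a short-lived streaming token so the browser never handles the permanent API key. Below is a rough sketch of what such a route could look like. It assumes AssemblyAI's temporary-token endpoint for v3 streaming (streaming.assemblyai.com/v3/token with an expires_in_seconds parameter); check the AssemblyAI docs for the exact path and parameters.

// Hypothetical server-side token route (Express-style). The endpoint URL and
// the expires_in_seconds parameter are assumptions based on AssemblyAI's
// streaming docs; verify them against the current documentation.
import express from 'express';

const app = express();

app.get('/api/assemblyai-token', async (_req, res) => {
  try {
    const response = await fetch(
      'https://streaming.assemblyai.com/v3/token?expires_in_seconds=60',
      { headers: { Authorization: process.env.ASSEMBLYAI_API_KEY ?? '' } }
    );
    const data = await response.json();
    // The browser-side getToken() helper would read the token from this response.
    res.json({ token: data.token });
  } catch {
    res.status(500).json({ error: 'Failed to create streaming token' });
  }
});

app.listen(3001);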
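For the minutes themselves, the final transcripts collected in minutesBuffer can be turned into a readable document once the session ends. The helper below is a simplified sketch of that idea (the Transcript shape mirrors the one used in the hook); it is not the exact code from the repository.

// Simplified sketch: turn the buffered final transcripts into plain-text
// minutes, listed in the order they were spoken.
interface Transcript {
  text: string;
  speaker: string;
  timestamp: string;
  id: string;
  type: 'partial' | 'final';
}

function buildMinutes(buffer: Transcript[], roomName: string): string {
  const header = `Minutes for ${roomName} - ${new Date().toLocaleString()}`;
  const lines = buffer.map(t => `[${t.timestamp}] ${t.speaker}: ${t.text}`);
  return [header, '', ...lines].join('\n');
}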

The complete code for this project can be found in the linked repository.

This was an amazing challenge to participate in, and I'd like to thank AssemblyAI, as well as the DEV team, for putting it together.
