This is a submission for the AssemblyAI Voice Agents Challenge
KYC Admin Panel – Real-Time Voice KYC with AssemblyAI & Llama 3.3-70B
What I Built
We built a KYC Admin Panel that enables ultra-fast, accurate KYC form filling using real-time voice input.
Our app targets the Real-Time Performance category: AssemblyAI’s Universal-Streaming delivers sub-300ms transcription latency, and Llama 3.3-70B Instruct (served via a Cloudflare Worker) handles field-specific extraction and normalization.
Demo
Demo video: https://youtu.be/GUWSwAg18DY?si=V_SOyv-kxNI9gayr
Live app: https://assembly-ai-voice-agents-challenge.vercel.app/
Screenshot:
GitHub Repository
https://github.com/NishikantaRay/AssemblyAI_Voice_Agents_Challenge
Technical Implementation & AssemblyAI Integration
Real-Time Voice Input
We use AssemblyAI’s Universal-Streaming WebSocket API for real-time transcription.
Key code (from src/app/page.js):    
const wsUrl = `wss://streaming.assemblyai.com/v3/ws?sample_rate=16000&formatted_finals=true&token=${token}`;
    socket.current = new WebSocket(wsUrl);
    const turns = {};
    socket.current.onopen = async () => {
      console.log("WebSocket connection established");
      setIsRecording(true);
      mediaStream.current = await navigator.mediaDevices.getUserMedia({
        audio: true,
      });
      audioContext.current = new AudioContext({ sampleRate: 16000 });
      const source = audioContext.current.createMediaStreamSource(
        mediaStream.current
      );
      scriptProcessor.current = audioContext.current.createScriptProcessor(
        4096,
        1,
        1
      );
      source.connect(scriptProcessor.current);
      scriptProcessor.current.connect(audioContext.current.destination);
      scriptProcessor.current.onaudioprocess = (event) => {
        if (!socket.current || socket.current.readyState !== WebSocket.OPEN)
          return;
        const input = event.inputBuffer.getChannelData(0);
        // Convert Float32 samples to 16-bit PCM, the format Universal-Streaming expects
        const buffer = new Int16Array(input.length);
        for (let i = 0; i < input.length; i++) {
          buffer[i] = Math.max(-1, Math.min(1, input[i])) * 0x7fff;
        }
        socket.current.send(buffer.buffer);
      };
    };
    socket.current.onmessage = (event) => {
      const message = JSON.parse(event.data);
      if (message.type === "Turn") {
        const { turn_order, transcript } = message;
        // Keep each turn keyed by turn_order so the assembled transcript stays in order
        turns[turn_order] = transcript;
        const ordered = Object.keys(turns)
          .sort((a, b) => Number(a) - Number(b))
          .map((k) => turns[k])
          .join(" ");
        setTranscripts({ ...turns });
        if (ordered.trim()) {
          const fieldConfig = kycFields[category][fieldKey];
          setLastTranscript(ordered.trim());
          // Clear existing timer
          if (lastActivityTimer.current) {
            clearTimeout(lastActivityTimer.current);
          }
          // Set timer to process after 3 seconds of no new transcript updates
          lastActivityTimer.current = setTimeout(() => {
            processWhenSilent(ordered.trim(), fieldConfig, fieldKey, category);
          }, 3000);
        }
      }
    };
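The token in the WebSocket URL is a short-lived streaming token rather than the raw API key, so the key never reaches the browser. Below is a minimal sketch of a server-side token route, assuming AssemblyAI’s v3 temporary-token endpoint; the /api/token path, the ASSEMBLYAI_API_KEY env var name, and the 60-second expiry are illustrative choices, not code from the repo:

// src/app/api/token/route.js (illustrative) — mints a temporary Universal-Streaming token
// so the AssemblyAI API key stays on the server.
export async function GET() {
  const res = await fetch(
    "https://streaming.assemblyai.com/v3/token?expires_in_seconds=60",
    { headers: { Authorization: process.env.ASSEMBLYAI_API_KEY } }
  );
  const data = await res.json();
  // The client appends data.token to the wss:// URL as ?token=...
  return Response.json({ token: data.token });
}

The client fetches this route once before opening the WebSocket, so an expired token simply means requesting a fresh one on the next recording.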
LLM-Powered Field Extraction
We use Cloudflare’s Llama 3.3-70B Instruct via a Worker for robust, prompt-based extraction:
const extractDataWithAI = async (transcript, fieldType, fieldKey) => {
    const prompt = createPromptForField(transcript, fieldType, fieldKey);
    const encodedPrompt = encodeURIComponent(prompt);
    const apiUrl = `https://old-poetry-937f.sumeetweb.workers.dev/?prompt=${encodedPrompt}`;
    console.log("Making API call for", fieldKey, "with prompt:", prompt);
    try {
      const response = await fetch(apiUrl);
      const data = await response.json();
      console.log("API response data:", data);
      if (data.success && data.response?.response) {
        const extractedValue = data.response.response.trim();
        console.log("Extracted value from API:", extractedValue);
        return extractedValue;
      } else {
        console.error("API response was not successful:", data);
        throw new Error("API response was not successful");
      }
    } catch (error) {
      console.error("API call failed:", error);
      return transcript;
    }
  };
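createPromptForField is what keeps each request field-specific. The version below is a hypothetical sketch of that helper (the field types and normalization rules are assumptions; the actual prompts live in src/app/page.js):

// Hypothetical sketch of createPromptForField: ask the model for the normalized
// value only, so the response can be written straight into the form field.
const createPromptForField = (transcript, fieldType, fieldKey) => {
  const rules = {
    date: "Return the date in YYYY-MM-DD format.",
    phone: "Return digits only, with country code if spoken.",
    email: "Return a lowercase email address with no spaces.",
    text: "Return the value exactly as it should appear on the form.",
  };
  return (
    `Extract the value for the KYC field "${fieldKey}" from this spoken input: "${transcript}". ` +
    `${rules[fieldType] || rules.text} Respond with the value only, no explanations.`
  );
};

Constraining the model to value-only responses means the extracted string can be dropped into the input with nothing more than a trim(), which is exactly what extractDataWithAI does.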
Performance Benchmarking
End-to-end latency: consistently under 300ms from speech to field fill.
Analytics: usage, accuracy, and response times are tracked and visualized in AnalyticsPage.js.
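Speech-to-field-fill timing can be captured on the client with a pair of timestamps. A minimal sketch (the helper names are ours, not from the repo):

// Illustrative latency instrumentation: timestamp the final transcript and the
// moment the field value is committed, then log the difference.
const latencyMarks = {};

const markTranscriptReady = (fieldKey) => {
  latencyMarks[fieldKey] = performance.now();
};

const markFieldFilled = (fieldKey) => {
  if (latencyMarks[fieldKey] === undefined) return;
  const latencyMs = performance.now() - latencyMarks[fieldKey];
  console.log(`${fieldKey} filled in ${latencyMs.toFixed(0)} ms`);
  // Samples like this can feed the charts in AnalyticsPage.js
};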
Team Submissions:
nishikantaray
sumeetweb
ayushmohanty24