This is a submission for the AssemblyAI Voice Agents Challenge
KYC Admin Panel – Real-Time Voice KYC with AssemblyAI & Llama 3.3-70B
What I Built
We built a KYC Admin Panel that enables ultra-fast, accurate KYC form filling using real-time voice input.
Our app targets the Real-Time Performance category: AssemblyAI’s Universal-Streaming delivers sub-300ms transcription latency, and Llama 3.3-70B Instruct (served via a Cloudflare Worker) handles field-specific extraction and normalization.
Demo
Demo video: https://youtu.be/GUWSwAg18DY?si=V_SOyv-kxNI9gayr
Live app: https://assembly-ai-voice-agents-challenge.vercel.app/
Screenshot:
GitHub Repository
https://github.com/NishikantaRay/AssemblyAI_Voice_Agents_Challenge
Technical Implementation & AssemblyAI Integration
Real-Time Voice Input
We use AssemblyAI’s Universal-Streaming WebSocket API for real-time transcription.
Key code (from src/app/page.js):    
const wsUrl = `wss://streaming.assemblyai.com/v3/ws?sample_rate=16000&formatted_finals=true&token=${token}`;
    socket.current = new WebSocket(wsUrl);
    const turns = {};
    socket.current.onopen = async () => {
      console.log("WebSocket connection established");
      setIsRecording(true);
      mediaStream.current = await navigator.mediaDevices.getUserMedia({
        audio: true,
      });
      audioContext.current = new AudioContext({ sampleRate: 16000 });
      const source = audioContext.current.createMediaStreamSource(
        mediaStream.current
      );
      scriptProcessor.current = audioContext.current.createScriptProcessor(
        4096,
        1,
        1
      );
      source.connect(scriptProcessor.current);
      scriptProcessor.current.connect(audioContext.current.destination);
      scriptProcessor.current.onaudioprocess = (event) => {
        if (!socket.current || socket.current.readyState !== WebSocket.OPEN)
          return;
        const input = event.inputBuffer.getChannelData(0);
        // Convert Float32 samples to 16-bit PCM, the format Universal-Streaming expects
        const buffer = new Int16Array(input.length);
        for (let i = 0; i < input.length; i++) {
          buffer[i] = Math.max(-1, Math.min(1, input[i])) * 0x7fff;
        }
        socket.current.send(buffer.buffer);
      };
    };
    socket.current.onmessage = (event) => {
      const message = JSON.parse(event.data);
      if (message.type === "Turn") {
        const { turn_order, transcript } = message;
        // Keep each turn keyed by turn_order so the assembled transcript stays in order
        turns[turn_order] = transcript;
        const ordered = Object.keys(turns)
          .sort((a, b) => Number(a) - Number(b))
          .map((k) => turns[k])
          .join(" ");
        setTranscripts({ ...turns });
        if (ordered.trim()) {
          const fieldConfig = kycFields[category][fieldKey];
          setLastTranscript(ordered.trim());
          // Clear existing timer
          if (lastActivityTimer.current) {
            clearTimeout(lastActivityTimer.current);
          }
          // Set timer to process after 3 seconds of no new transcript updates
          lastActivityTimer.current = setTimeout(() => {
            processWhenSilent(ordered.trim(), fieldConfig, fieldKey, category);
          }, 3000);
        }
      }
    };
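The token in the WebSocket URL is a short-lived streaming token rather than the raw API key, so the key never reaches the browser. Below is a minimal sketch of a server-side token route, assuming AssemblyAI’s v3 temporary-token endpoint; the /api/token path, the ASSEMBLYAI_API_KEY env var name, and the 60-second expiry are illustrative choices, not code from the repo:

// src/app/api/token/route.js (illustrative) — mints a temporary Universal-Streaming token
// so the AssemblyAI API key stays on the server.
export async function GET() {
  const res = await fetch(
    "https://streaming.assemblyai.com/v3/token?expires_in_seconds=60",
    { headers: { Authorization: process.env.ASSEMBLYAI_API_KEY } }
  );
  const data = await res.json();
  // The client appends data.token to the wss:// URL as ?token=...
  return Response.json({ token: data.token });
}

The client fetches this route once before opening the WebSocket, so an expired token simply means requesting a fresh one on the next recording.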
LLM-Powered Field Extraction
We use Cloudflare’s Llama 3.3-70B Instruct via a Worker for robust, prompt-based extraction:
const extractDataWithAI = async (transcript, fieldType, fieldKey) => {
    const prompt = createPromptForField(transcript, fieldType, fieldKey);
    const encodedPrompt = encodeURIComponent(prompt);
    const apiUrl = `https://old-poetry-937f.sumeetweb.workers.dev/?prompt=${encodedPrompt}`;
    console.log("Making API call for", fieldKey, "with prompt:", prompt);
    try {
      const response = await fetch(apiUrl);
      const data = await response.json();
      console.log("API response data:", data);
      if (data.success && data.response?.response) {
        const extractedValue = data.response.response.trim();
        console.log("Extracted value from API:", extractedValue);
        return extractedValue;
      } else {
        console.error("API response was not successful:", data);
        throw new Error("API response was not successful");
      }
    } catch (error) {
      console.error("API call failed:", error);
      return transcript;
    }
  };
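createPromptForField is what keeps each request field-specific. The version below is a hypothetical sketch of that helper (the field types and normalization rules are assumptions; the actual prompts live in src/app/page.js):

// Hypothetical sketch of createPromptForField: ask the model for the normalized
// value only, so the response can be written straight into the form field.
const createPromptForField = (transcript, fieldType, fieldKey) => {
  const rules = {
    date: "Return the date in YYYY-MM-DD format.",
    phone: "Return digits only, with country code if spoken.",
    email: "Return a lowercase email address with no spaces.",
    text: "Return the value exactly as it should appear on the form.",
  };
  return (
    `Extract the value for the KYC field "${fieldKey}" from this spoken input: "${transcript}". ` +
    `${rules[fieldType] || rules.text} Respond with the value only, no explanations.`
  );
};

Constraining the model to value-only responses means the extracted string can be dropped into the input with nothing more than a trim(), which is exactly what extractDataWithAI does.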
Performance Benchmarking
End-to-end latency: consistently under 300ms from speech to field fill.
Analytics: usage, accuracy, and response times are tracked and visualized in AnalyticsPage.js.
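Speech-to-field-fill timing can be captured on the client with a pair of timestamps. A minimal sketch (the helper names are ours, not from the repo):

// Illustrative latency instrumentation: timestamp the final transcript and the
// moment the field value is committed, then log the difference.
const latencyMarks = {};

const markTranscriptReady = (fieldKey) => {
  latencyMarks[fieldKey] = performance.now();
};

const markFieldFilled = (fieldKey) => {
  if (latencyMarks[fieldKey] === undefined) return;
  const latencyMs = performance.now() - latencyMarks[fieldKey];
  console.log(`${fieldKey} filled in ${latencyMs.toFixed(0)} ms`);
  // Samples like this can feed the charts in AnalyticsPage.js
};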
Team Submissions:
nishikantaray
sumeetweb
ayushmohanty24