This is a submission for the AssemblyAI Voice Agents Challenge
KYC Admin Panel – Real-Time Voice KYC with AssemblyAI & Llama 3.3-70B
What I Built
We built a KYC Admin Panel that lets users fill KYC forms quickly and accurately using real-time voice input.
Our app targets the Real-Time Performance category, combining AssemblyAI’s Universal-Streaming for sub-300ms transcription latency with Cloudflare’s Llama 3.3-70B Instruct LLM for instant, field-specific extraction and normalization.
Demo
YouTube demo: https://youtu.be/GUWSwAg18DY?si=V_SOyv-kxNI9gayr
Live app: https://assembly-ai-voice-agents-challenge.vercel.app/
Screenshot:
GitHub Repository
https://github.com/NishikantaRay/AssemblyAI_Voice_Agents_Challenge
Technical Implementation & AssemblyAI Integration
Real-Time Voice Input
We use AssemblyAI’s Universal-Streaming WebSocket API for real-time transcription.
Key code (from src/app/page.js):
```javascript
const wsUrl = `wss://streaming.assemblyai.com/v3/ws?sample_rate=16000&formatted_finals=true&token=${token}`;
socket.current = new WebSocket(wsUrl);

const turns = {};

socket.current.onopen = async () => {
  console.log("WebSocket connection established");
  setIsRecording(true);
  mediaStream.current = await navigator.mediaDevices.getUserMedia({
    audio: true,
  });
  audioContext.current = new AudioContext({ sampleRate: 16000 });
  const source = audioContext.current.createMediaStreamSource(
    mediaStream.current
  );
  // ScriptProcessorNode is deprecated but still widely supported;
  // AudioWorklet is the modern replacement.
  scriptProcessor.current = audioContext.current.createScriptProcessor(4096, 1, 1);
  source.connect(scriptProcessor.current);
  scriptProcessor.current.connect(audioContext.current.destination);
  scriptProcessor.current.onaudioprocess = (event) => {
    if (!socket.current || socket.current.readyState !== WebSocket.OPEN)
      return;
    const input = event.inputBuffer.getChannelData(0);
    // Convert Float32 samples [-1, 1] to 16-bit signed PCM for AssemblyAI
    const buffer = new Int16Array(input.length);
    for (let i = 0; i < input.length; i++) {
      buffer[i] = Math.max(-1, Math.min(1, input[i])) * 0x7fff;
    }
    socket.current.send(buffer.buffer);
  };
};
```
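The sample-conversion loop in `onaudioprocess` can be factored into a small pure helper so it is easy to unit-test in isolation. A sketch — `floatTo16BitPCM` is our own name, not a function from the repository:

```javascript
// Convert Web Audio Float32 samples in [-1, 1] to 16-bit signed PCM,
// clamping out-of-range values. AssemblyAI's streaming API expects
// raw 16 kHz, 16-bit, mono PCM frames.
function floatTo16BitPCM(float32Samples) {
  const pcm = new Int16Array(float32Samples.length);
  for (let i = 0; i < float32Samples.length; i++) {
    const clamped = Math.max(-1, Math.min(1, float32Samples[i]));
    pcm[i] = clamped * 0x7fff; // scale to [-32767, 32767]
  }
  return pcm;
}
```

The handler would then reduce to `socket.current.send(floatTo16BitPCM(input).buffer);`.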
```javascript
socket.current.onmessage = (event) => {
  const message = JSON.parse(event.data);
  if (message.type === "Turn") {
    const { turn_order, transcript } = message;
    turns[turn_order] = transcript;
    // Reassemble turns in numeric order into one running transcript
    const ordered = Object.keys(turns)
      .sort((a, b) => Number(a) - Number(b))
      .map((k) => turns[k])
      .join(" ");
    setTranscripts({ ...turns });
    if (ordered.trim()) {
      const fieldConfig = kycFields[category][fieldKey];
      setLastTranscript(ordered.trim());
      // Clear existing timer
      if (lastActivityTimer.current) {
        clearTimeout(lastActivityTimer.current);
      }
      // Process after 3 seconds of no new transcript updates
      lastActivityTimer.current = setTimeout(() => {
        processWhenSilent(ordered.trim(), fieldConfig, fieldKey, category);
      }, 3000);
    }
  }
};
```
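The turn-reassembly step (numeric sort of `turn_order` keys, then join) is worth isolating, since a plain string sort would order `"10"` before `"2"`. A sketch of it as a pure function — `orderTurns` is our own name:

```javascript
// Rebuild the full transcript from accumulated "Turn" messages,
// keyed by turn_order. Keys are sorted numerically, not
// lexicographically, so turn 10 follows turn 2.
function orderTurns(turns) {
  return Object.keys(turns)
    .sort((a, b) => Number(a) - Number(b))
    .map((k) => turns[k])
    .join(" ")
    .trim();
}
```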
LLM-Powered Field Extraction
We use Cloudflare’s Llama 3.3-70B Instruct via a Worker for robust, prompt-based extraction:
```javascript
const extractDataWithAI = async (transcript, fieldType, fieldKey) => {
  const prompt = createPromptForField(transcript, fieldType, fieldKey);
  const encodedPrompt = encodeURIComponent(prompt);
  const apiUrl = `https://old-poetry-937f.sumeetweb.workers.dev/?prompt=${encodedPrompt}`;
  console.log("Making API call for", fieldKey, "with prompt:", prompt);
  try {
    const response = await fetch(apiUrl);
    const data = await response.json();
    console.log("API response data:", data);
    if (data.success && data.response?.response) {
      const extractedValue = data.response.response.trim();
      console.log("Extracted value from API:", extractedValue);
      return extractedValue;
    } else {
      console.error("API response was not successful:", data);
      throw new Error("API response was not successful");
    }
  } catch (error) {
    console.error("API call failed:", error);
    // Fall back to the raw transcript if extraction fails
    return transcript;
  }
};
```
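`createPromptForField` itself is not shown in the post. A minimal sketch of what such a field-specific prompt builder could look like — the field rules below are illustrative assumptions, not the repository's actual prompts:

```javascript
// Build a field-specific extraction prompt for the LLM.
// Hypothetical rules; the real createPromptForField lives in src/app/page.js.
function createPromptForField(transcript, fieldType, fieldKey) {
  const rules = {
    date: "Return the date in YYYY-MM-DD format only.",
    phone: "Return digits only, no spaces or punctuation.",
    name: "Return the full name with standard capitalization.",
  };
  const rule = rules[fieldType] || "Return only the extracted value.";
  return (
    `Extract the value for the KYC field "${fieldKey}" ` +
    `from this spoken input: "${transcript}". ${rule} ` +
    `Respond with the value alone, no explanation.`
  );
}
```

Per-field rules like these let the LLM normalize spoken input ("ninth of May twenty twenty") into the exact format the form expects.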
Performance Benchmarking
End-to-end latency: consistently under 300ms from speech to field fill
Analytics: usage, accuracy, and response times are tracked and visualized in AnalyticsPage.js
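One way to back a latency claim with data is a small rolling tracker fed with `performance.now()` deltas between the final transcript and the field fill. A sketch — the class and its API are our own, not taken from the analytics page:

```javascript
// Rolling latency tracker: records speech-to-field-fill durations (ms)
// and reports count / average / p95 for an analytics view.
class LatencyTracker {
  constructor(maxSamples = 200) {
    this.maxSamples = maxSamples;
    this.samples = [];
  }
  record(ms) {
    this.samples.push(ms);
    // Keep only the most recent maxSamples measurements
    if (this.samples.length > this.maxSamples) this.samples.shift();
  }
  stats() {
    const sorted = [...this.samples].sort((a, b) => a - b);
    const n = sorted.length;
    if (n === 0) return { count: 0, avg: 0, p95: 0 };
    const avg = sorted.reduce((sum, x) => sum + x, 0) / n;
    const p95 = sorted[Math.min(n - 1, Math.floor(n * 0.95))];
    return { count: n, avg, p95 };
  }
}
```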
Team Submissions:
nishikantaray
sumeetweb
ayushmohanty24