This is a submission for the AssemblyAI Voice Agents Challenge.
What I Built
I built GetAutoCue, a professional, browser-based teleprompter designed to eliminate the most common frustration for presenters: unnatural pacing. It uses AssemblyAI's Universal-Streaming model to listen to your voice in real-time, automatically scrolling the script to match your natural speaking speed.
I'm submitting this project under the Real-Time Performance prompt: it delivers a highly responsive, low-latency voice experience that makes presenting feel seamless and intuitive. Whether you're recording a video, giving a speech, or practicing a presentation, GetAutoCue ensures the script is always exactly where you need it to be.
Demo
- Live App: https://getautocue.vercel.app
- Video Walkthrough:
Key Features in Action
1. Flawless Voice-Activated Scrolling
The app actively listens, highlighting the current word and dimming spoken words, while scrolling the viewport to keep you perfectly on track.
2. Seamless Script Editing
Paste your script and make edits without ever leaving the teleprompter view.
3. Professional Display Controls
Full control over font size, colors, and horizontal/vertical mirroring for professional beam-splitter glass setups.
GitHub Repository
Technical Implementation & AssemblyAI Integration
The core of GetAutoCue is the `useVoiceMode` hook, which seamlessly integrates AssemblyAI's real-time streaming capabilities into the React/Next.js front-end.
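For context, here's roughly how a component might consume the hook. The return shape is an assumption based on the handlers that appear in the snippets below (`isConnected`, `startListening`, `stopListening`), not the hook's exact public API:

```tsx
// Illustrative only: the hook's real signature isn't shown in this post,
// so the import path and return shape here are assumptions.
import { useVoiceMode } from '@/hooks/useVoiceMode'; // path is hypothetical

function TeleprompterControls({ script }: { script: string }) {
  const { isConnected, startListening, stopListening } = useVoiceMode(script);

  return (
    <button onClick={isConnected ? stopListening : startListening}>
      {isConnected ? 'Stop Listening' : 'Start Listening'}
    </button>
  );
}
```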
1. Establishing the Real-Time Connection
When the user clicks "Start Listening" in the app's voice mode, the client first fetches a temporary authentication token from a Next.js API route, which calls AssemblyAI's token endpoint server-side (keeping the secret API key off the client). The short-lived token is then used to establish a secure WebSocket connection to AssemblyAI.
```javascript
// Now proceed with the actual connection setup
const response = await fetch('/api/assemblyai/token');
const data = await response.json();
if (!data.token) throw new Error('Failed to get AssemblyAI token');

const wsUrl = `wss://streaming.assemblyai.com/v3/ws?sample_rate=16000&token=${data.token}`;
socketRef.current = new WebSocket(wsUrl);

socketRef.current.onopen = () => {
  setIsConnected(true);
};
```
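The API route itself isn't shown in the hook, but a minimal sketch looks something like this. The exact streaming token endpoint URL and `expires_in_seconds` parameter are assumptions based on AssemblyAI's v3 streaming docs, so double-check them before reusing this:

```typescript
// app/api/assemblyai/token/route.ts — minimal sketch of the token-minting route.
// The token endpoint and its query parameter are assumptions; verify against
// AssemblyAI's current documentation.
import { NextResponse } from 'next/server';

export async function GET() {
  const res = await fetch(
    'https://streaming.assemblyai.com/v3/token?expires_in_seconds=60',
    { headers: { Authorization: process.env.ASSEMBLYAI_API_KEY ?? '' } }
  );

  if (!res.ok) {
    return NextResponse.json({ error: 'Failed to mint token' }, { status: 500 });
  }

  const { token } = await res.json();
  // The client only ever sees this short-lived token, never the API key.
  return NextResponse.json({ token });
}
```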
2. Streaming Audio from the Browser
I used the `navigator.mediaDevices.getUserMedia` API to capture microphone input at a 16 kHz sample rate to match AssemblyAI's requirements. An `AudioContext` with a `ScriptProcessorNode` then converts each buffer of raw Float32 audio into 16-bit PCM and sends it through the WebSocket. This ensures a continuous, low-latency flow of data.
```javascript
// Set up audio processing for PCM data
const stream = await navigator.mediaDevices.getUserMedia({
  audio: {
    sampleRate: 16000,
    channelCount: 1,
    echoCancellation: true,
    noiseSuppression: true
  }
});

// Create audio context for processing
const audioContext = new AudioContext({ sampleRate: 16000 });
const source = audioContext.createMediaStreamSource(stream);
const processor = audioContext.createScriptProcessor(4096, 1, 1);

processor.onaudioprocess = (event) => {
  if (socketRef.current?.readyState === WebSocket.OPEN) {
    const inputData = event.inputBuffer.getChannelData(0);

    // Convert float32 audio data to int16 PCM
    const pcmData = new Int16Array(inputData.length);
    for (let i = 0; i < inputData.length; i++) {
      // Clamp the value to [-1, 1] and convert to 16-bit integer
      const sample = Math.max(-1, Math.min(1, inputData[i]));
      pcmData[i] = sample * 0x7FFF;
    }

    socketRef.current.send(pcmData.buffer);
  }
};

// Wire up the processing graph so onaudioprocess starts firing
source.connect(processor);
processor.connect(audioContext.destination);
```
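At 16 kHz, each 4096-sample buffer works out to roughly 256 ms of audio per WebSocket message (4096 / 16000 ≈ 0.256 s), which keeps the transcript feeling immediate without flooding the connection with tiny frames.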
3. Processing Transcripts and Syncing the UI
The magic happens when the WebSocket sends back transcript data. The `onmessage` handler listens for both partial and final transcripts returned from AssemblyAI's Universal-Streaming model.
To achieve the dynamic highlighting and scrolling, I implemented a fuzzy matching algorithm using `fuse.js` (a simplified sketch follows the handler code below). As transcripts arrive, the app:
- Identifies the last spoken word in the transcript.
- Finds its corresponding position in the full script.
- Updates the UI state to highlight the next word as "current" and dim all previous words as "spoken."
- Calculates the progress through the script and smoothly scrolls the container to keep the current word in view.
This approach creates a robust and natural-feeling experience, as the teleprompter doesn't jump but flows with the speaker.
```javascript
socketRef.current.onmessage = (event) => {
  const message = JSON.parse(event.data);

  // Handle different message types from AssemblyAI v3 API
  if (message.message_type === 'PartialTranscript') {
    handlePartialTranscript(message.text);
  } else if (message.message_type === 'FinalTranscript') {
    handleFinalTranscript(message.text);
  } else if (message.transcript) {
    // Handle Turn events with transcript data
    if (message.end_of_turn) {
      handleFinalTranscript(message.transcript);
    } else {
      handlePartialTranscript(message.transcript);
    }
  }
};

socketRef.current.onerror = (error) => {
  stopListening();
};

socketRef.current.onclose = (event) => {
  setIsConnected(false);
};
```
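The matching and scrolling helpers aren't shown above, so here's a simplified sketch of what a handler like `handleFinalTranscript` could do with fuse.js. The Fuse options, helper names, and word-element bookkeeping are illustrative assumptions, not the exact implementation:

```typescript
import Fuse from 'fuse.js';

// Hypothetical state for illustration: the script split into words, one
// rendered DOM node per word, and a setter that marks words spoken/current.
declare const scriptWords: string[];
declare const wordElements: HTMLElement[];
declare function setSpokenUpTo(index: number): void;

// Index the script once so each incoming transcript can be matched cheaply.
const fuse = new Fuse(
  scriptWords.map((word, index) => ({ word, index })),
  { keys: ['word'], threshold: 0.4 }
);

function handleFinalTranscript(transcript: string) {
  const spoken = transcript.trim().split(/\s+/).filter(Boolean);
  if (spoken.length === 0) return;

  // 1. Take the last spoken word and fuzzy-find its position in the script.
  const lastWord = spoken[spoken.length - 1];
  const match = fuse.search(lastWord)[0];
  if (!match) return;
  const currentIndex = match.item.index;

  // 2. Dim everything up to it as "spoken" and highlight the next word.
  setSpokenUpTo(currentIndex);

  // 3. Smoothly scroll so the current word stays in view (no jumping).
  wordElements[currentIndex]?.scrollIntoView({ behavior: 'smooth', block: 'center' });
}
```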
By leveraging AssemblyAI's Universal-Streaming model, I was able to build a teleprompter that feels less like a rigid machine you have to keep pace with and more like a natural, self-pacing reading companion.