
ProRecruit

Originally published at aissence.ai

Building a Real-Time Speech-to-Text Pipeline with Deepgram + Next.js


Real-time speech-to-text (STT) converts spoken audio into text as it is being spoken, typically with under 300 milliseconds of latency. Deepgram's Nova-2 model offers 98.7% accuracy for English at $0.0043 per minute, roughly 3x cheaper than AWS Transcribe.

Prerequisites

  • Node.js 20+, Next.js 15, Deepgram API key (free tier: 45K minutes)

Step 1: Project Setup

```bash
npx create-next-app@latest stt-demo --typescript --tailwind --app
cd stt-demo
npm install @deepgram/sdk
```

Step 2: Backend WebSocket Route

```typescript
import { createClient, LiveTranscriptionEvents } from "@deepgram/sdk";

const deepgram = createClient(process.env.DEEPGRAM_API_KEY!);

const connection = deepgram.listen.live({
  model: "nova-2",
  language: "en",
  smart_format: true,     // punctuation, capitalization, number formatting
  interim_results: true,  // partial transcripts while the user is still speaking
  vad_events: true,       // emit voice-activity events
  endpointing: 300,       // finalize a transcript after 300 ms of silence
});

connection.on(LiveTranscriptionEvents.Transcript, (data) => {
  const transcript = data.channel.alternatives[0]?.transcript;
  if (transcript) console.log("Transcript:", transcript);
});
```

Step 3: Browser Audio Capture

```typescript
const stream = await navigator.mediaDevices.getUserMedia({
  audio: { sampleRate: 16000, channelCount: 1, echoCancellation: true },
});

const mediaRecorder = new MediaRecorder(stream, {
  mimeType: "audio/webm;codecs=opus",
});

mediaRecorder.ondataavailable = (event) => {
  if (event.data.size > 0) sendToWebSocket(event.data);
};

mediaRecorder.start(100); // emit a chunk every 100 ms
```
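`sendToWebSocket` is left undefined above. A minimal sketch, written against a small socket shape so it runs anywhere (`SocketLike` is an illustrative name, not a Web API; in the browser you would pass the page's `WebSocket` to the backend route, and the socket is taken as a parameter here for clarity):

```typescript
// Illustrative stand-in for the subset of WebSocket we use.
interface SocketLike {
  readyState: number; // 1 === OPEN, per the WebSocket readyState values
  send(data: Blob | ArrayBuffer): void;
}

// Forward a recorded chunk only while the connection is open.
// MediaRecorder keeps emitting chunks even if the socket is still
// connecting or has closed; this sketch drops those (production code
// would queue them instead). Returns true when the chunk was sent.
function sendToWebSocket(data: Blob | ArrayBuffer, socket: SocketLike): boolean {
  if (socket.readyState === 1) {
    socket.send(data); // browser WebSocket.send accepts Blob directly
    return true;
  }
  return false;
}
```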

Step 4: Production Optimizations

  • Connection recovery with exponential backoff
  • Audio buffering during reconnection
  • Multi-language support (language: "auto" for 36 languages)
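The backoff schedule from the first bullet can be sketched as a pure helper plus a retry loop. The names and the 1 s base / 30 s cap / 8 attempts are illustrative defaults chosen here, not Deepgram requirements:

```typescript
// Exponential backoff with a cap: 1s, 2s, 4s, ... up to 30s.
function backoffDelayMs(attempt: number, baseMs = 1000, capMs = 30000): number {
  return Math.min(baseMs * 2 ** attempt, capMs);
}

// Usage sketch: re-run `connect` (e.g. re-open the Deepgram live
// connection) after each failure, waiting longer each time.
async function reconnectWithBackoff(
  connect: () => Promise<void>,
  maxAttempts = 8,
): Promise<void> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      await connect();
      return; // connected
    } catch {
      await new Promise((resolve) => setTimeout(resolve, backoffDelayMs(attempt)));
    }
  }
  throw new Error("gave up reconnecting");
}
```

While the loop is waiting, incoming audio chunks should go into the buffer from the second bullet so no speech is lost across the gap.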

At AissenceAI, we use this pipeline to power real-time interview transcription in 42 languages.

Our live coaching feature uses Voice Activity Detection to detect when the interviewer stops speaking.
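As a sketch of that detection logic in isolation, the handler below is written against Node's `EventEmitter` rather than the SDK connection. The event names mirror the strings Deepgram's `LiveTranscriptionEvents` enum maps to in recent SDK versions ("SpeechStarted" from `vad_events`, "UtteranceEnd" which additionally requires the `utterance_end_ms` option), and `onSpeakerStopped` is a name introduced here, so verify both against your SDK version:

```typescript
import { EventEmitter } from "node:events";

// Illustrative silence detector: invoke `callback` once each time the
// speaker stops after having spoken. Returns a probe for the current state.
function onSpeakerStopped(
  connection: EventEmitter,
  callback: () => void,
): { speaking: () => boolean } {
  let speaking = false;
  connection.on("SpeechStarted", () => {
    speaking = true;
  });
  connection.on("UtteranceEnd", () => {
    if (speaking) {
      speaking = false;
      callback(); // e.g. trigger the next coaching prompt
    }
  });
  return { speaking: () => speaking };
}
```

Gating the callback on a preceding `SpeechStarted` avoids firing on silence-only stretches where no utterance actually ended.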


See this in action at aissence.ai.
