🚀 Getting Started with Deepgram Nova-3 for Real-Time Speech-to-Text

#whisper #speechtotext #voicetech #ai

Introduction

Deepgram’s Nova-3 is the latest evolution in speech-to-text AI, offering real-time multilingual transcription, improved accuracy, and instant vocabulary updates. If you’re working on AI-driven transcription, you’ll want to explore this model.

At Transgate.ai, we took our first look at Nova-3, and the results were promising! (Check out our insights here: Transgate’s article).

In this guide, let’s get started with Nova-3 by setting up a simple transcription pipeline using Deepgram’s API.

Step 1: Set Up Your Deepgram API Key

First, sign up at Deepgram and grab your API key.

Then, install the Deepgram SDK along with the ws package, which provides the WebSocket client used in Step 2:

npm install @deepgram/sdk ws
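
Rather than pasting the key into source code, you may prefer to read it from the environment. Here is a minimal sketch of that idea; the helper name getDeepgramApiKey and the variable name DEEPGRAM_API_KEY are assumptions chosen for illustration, not names required by Deepgram.

```javascript
// Hypothetical helper: reads the API key from an environment variable.
// DEEPGRAM_API_KEY is an assumed variable name, not one mandated by Deepgram.
function getDeepgramApiKey(env = process.env) {
  const key = (env.DEEPGRAM_API_KEY || '').trim();
  if (!key) {
    // Fail early with a clear message instead of sending an empty token
    throw new Error('DEEPGRAM_API_KEY is not set');
  }
  return key;
}
```

With this in place, you could run the script as DEEPGRAM_API_KEY=your_key node index.js and call getDeepgramApiKey() wherever the key is needed.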

Step 2: Basic Real-Time Transcription

Using Node.js, we’ll open a WebSocket connection and stream an audio file to it, the same way you would send live audio.

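import WebSocket from 'ws';
import fs from 'fs';

// Replace with your Deepgram API key
const deepgramApiKey = 'YOUR_DEEPGRAM_API_KEY';
const audioFile = 'sample.wav'; // Path to your audio file

// This script talks to the WebSocket endpoint directly, so no SDK client is created.
// The model is selected explicitly with the model query parameter.
const ws = new WebSocket('wss://api.deepgram.com/v1/listen?model=nova-3', {
  headers: { Authorization: `Token ${deepgramApiKey}` },
});

ws.on('open', () => {
  console.log('Connected to Deepgram WebSocket');
  const stream = fs.createReadStream(audioFile);
  stream.on('data', (chunk) => ws.send(chunk));
  // Tell Deepgram the audio is finished so it can return the remaining
  // results before the connection closes, instead of closing it ourselves
  stream.on('end', () => ws.send(JSON.stringify({ type: 'CloseStream' })));
});

ws.on('message', (message) => {
  const data = JSON.parse(message.toString());
  // Metadata and other non-result messages have no channel field, so skip them
  const transcript = data.channel?.alternatives?.[0]?.transcript;
  if (transcript) console.log('Transcript:', transcript);
});

ws.on('error', (err) => console.error('WebSocket error:', err.message));
ws.on('close', () => console.log('Connection closed'));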

What This Script Does:

✅ Connects to Deepgram’s real-time transcription API

✅ Streams an audio file for processing

✅ Logs transcriptions in real time
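
If you want the full transcript rather than a stream of log lines, one option is to collect the finalized segments as they arrive. The sketch below assumes each result message has the shape { is_final, channel: { alternatives: [{ transcript }] } }; the createTranscriptCollector helper is hypothetical and not part of the Deepgram SDK.

```javascript
// Collects transcript segments from streaming messages.
// Assumed message shape (check it against the responses you actually receive):
//   { is_final: boolean, channel: { alternatives: [{ transcript: string }] } }
function createTranscriptCollector() {
  const segments = [];
  return {
    // Feed one raw WebSocket message (string or Buffer)
    add(rawMessage) {
      let data;
      try {
        data = JSON.parse(rawMessage.toString());
      } catch {
        return; // ignore anything that is not JSON
      }
      const text = data?.channel?.alternatives?.[0]?.transcript;
      if (typeof text !== 'string' || text === '') return; // not a result, or silence
      if (data.is_final === false) return; // skip interim results that may still change
      segments.push(text);
    },
    // Join the finalized segments into one string
    text() {
      return segments.join(' ');
    },
  };
}
```

In the script above, you would call collector.add(message) inside the 'message' handler and print collector.text() in the 'close' handler.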

Step 3: Customizing the Transcription

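Nova-3 lets you choose the language and supply key terms you expect to hear. Options are passed as query parameters on the connection URL; with Nova-3, terms go in the keyterm parameter, repeated once per term (the older keywords parameter targets earlier models):

const ws = new WebSocket('wss://api.deepgram.com/v1/listen?model=nova-3&language=en&keyterm=AI&keyterm=transcription', {
  headers: { Authorization: `Token ${deepgramApiKey}` },
});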

This boosts accuracy for domain-specific terms like AI, medical jargon, or industry-specific words.
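
As the list of options grows, building the URL by hand gets error-prone, especially when terms contain spaces. Here is a small sketch that assembles the query string with the standard URL API. The buildListenUrl helper is hypothetical, and it assumes terms are sent through Nova-3's keyterm parameter, repeated once per term.

```javascript
// Hypothetical helper: builds the streaming URL from an options object.
// Assumes key terms are passed as repeated keyterm query parameters.
function buildListenUrl({ model = 'nova-3', language = 'en', keyterms = [] } = {}) {
  const url = new URL('wss://api.deepgram.com/v1/listen');
  url.searchParams.set('model', model);
  url.searchParams.set('language', language);
  for (const term of keyterms) {
    // append (not set) so that every term gets its own keyterm entry
    url.searchParams.append('keyterm', term);
  }
  return url.toString();
}
```

The returned string can be passed as the first argument to new WebSocket(...), with the Authorization header supplied as in Step 2.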

Final Thoughts

Deepgram’s Nova-3 is fast, multilingual, and highly customizable. It’s a powerful tool for anyone building real-time voice applications.

DEV Community