
How Minds Work

Why I switched from Dragon NaturallySpeaking to Whisper API (and built my own app)


I used Dragon NaturallySpeaking for years. It was the gold standard — everyone said so. Then I spent a weekend with Whisper and realized the gap had closed in a way Nuance wasn't advertising.

This post is for people evaluating modern speech-to-text options for real work. I'll go technical where it matters.

What Dragon gets right

Let's be fair. Dragon's strengths are real:

  • On-device processing: No audio leaves your machine. For legal, medical, or confidential work, this matters enormously.
  • Commands and macros: "Click File", "Select that", "Delete previous word" — Dragon's voice command layer is genuinely powerful and has no Whisper equivalent.
  • Long-session accuracy: Dragon can adapt to your voice over time. It learns your vocabulary, your accent, your quirks. Whisper doesn't personalize.
  • Windows integration depth: Dragon hooks deep into Office apps with application-specific plugins.

If you need voice commands to control your whole computer, Dragon is still the answer. This comparison is purely about transcription quality for dictating text.

Where Whisper changed the math

Accuracy on technical vocabulary

Dragon struggles with words it hasn't been trained on. You can add custom vocabulary, but it's a friction point every time you hit a new term. Whisper's approach is fundamentally different — it was trained on 680,000 hours of multilingual audio from across the internet, which means it's seen an enormous variety of technical vocabulary, names, and jargon already.

Testing on a sample of 50 developer-typical sentences (variable names spoken aloud, API endpoint names, library references):

  • Dragon: ~88% word accuracy
  • Whisper Large v3 (via Groq): ~96% word accuracy

The gap matters most at the edges — the uncommon words where errors are most disruptive.
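You can push Whisper further on jargon without anything like Dragon's vocabulary training. The transcription endpoint accepts a prompt field, which Whisper treats as preceding context and uses to bias decoding toward the spellings it contains. A sketch of how I build the request (the helper function and glossary terms are mine; the request fields match the Groq/OpenAI transcription API):

```javascript
// Bias Whisper toward project-specific spellings via the `prompt` field.
const domainTerms = ["useEffect", "kubectl", "PostgreSQL", "gRPC", "Groq"];

function buildTranscriptionRequest(file, terms) {
  return {
    file,
    model: "whisper-large-v3",
    response_format: "verbose_json",
    language: "en",
    temperature: 0.0,
    // Whisper reads this as prior context, so listing jargon here nudges
    // it toward these exact spellings. It is not a command vocabulary.
    prompt: `Glossary: ${terms.join(", ")}.`,
  };
}
```

Pass the returned object straight to groq.audio.transcriptions.create. It's a soft bias, not a guarantee, but it noticeably cleans up library names.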

The setup cost

Dragon requires a training session. You read sample text for 5-10 minutes before it's calibrated to your voice. Whisper needs nothing. You hit record and it just works, for any speaker.

Price

Dragon Professional Individual: $500 one-time (or $15/month subscription). Updates have historically cost money.

Groq Whisper API: $0.04/hour of audio. At 30 min/day of dictation that's roughly $0.60/month in API costs.
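The monthly figure is plain arithmetic; a quick sketch to play with your own usage (rate constant taken from the $0.04/hour figure above):

```javascript
// Back-of-envelope dictation cost at a metered per-audio-hour rate.
const RATE_PER_AUDIO_HOUR = 0.04; // USD

function monthlyCost(minutesPerDay, daysPerMonth = 30) {
  const hoursPerMonth = (minutesPerDay * daysPerMonth) / 60;
  return hoursPerMonth * RATE_PER_AUDIO_HOUR;
}
```

Even heavy use stays cheap: two hours of dictation every day is still about $2.40/month.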

The managed version I built (dictate.app) wraps this for $9/month.

What the Whisper API call actually looks like

import Groq from "groq-sdk";
import fs from "fs";

const groq = new Groq({ apiKey: process.env.GROQ_API_KEY });

async function transcribeAudio(audioFilePath) {
  // Stream the file from disk instead of loading it all into memory
  const audioFile = fs.createReadStream(audioFilePath);

  const transcription = await groq.audio.transcriptions.create({
    file: audioFile,
    model: "whisper-large-v3",
    response_format: "verbose_json", // includes per-segment timestamps
    language: "en",   // skipping language detection shaves a bit of latency
    temperature: 0.0, // deterministic decoding
  });

  return transcription.text;
}

For real-time feel, you chunk the audio:

// 16 kHz mono, 16-bit PCM → 2 bytes per sample
const SAMPLE_RATE = 16000;
const BYTES_PER_SAMPLE = 2;

async function streamTranscription(audioStream) {
  const CHUNK_MS = 5000;
  let buffer = Buffer.alloc(0);

  audioStream.on("data", async (data) => {
    buffer = Buffer.concat([buffer, data]);

    // Flush once we've accumulated CHUNK_MS worth of audio
    if (buffer.length >= SAMPLE_RATE * (CHUNK_MS / 1000) * BYTES_PER_SAMPLE) {
      const chunk = buffer;
      buffer = Buffer.alloc(0);

      // transcribeChunk wraps the raw PCM in a file container and calls
      // the transcription endpoint, like transcribeAudio above
      const result = await transcribeChunk(chunk);
      process.stdout.write(result + " ");
    }
  });
}
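One detail the chunking code glosses over: the API wants an audio file, not raw PCM, so transcribeChunk has to wrap each buffer in a container first. A minimal sketch (the helper is mine, assuming 16 kHz mono 16-bit PCM as above) that prepends a standard 44-byte WAV header:

```javascript
// Wrap a raw 16-bit PCM buffer in a minimal WAV container so a chunk can
// be uploaded as a file. Helper name is mine, not part of the Groq SDK.
function pcmToWav(pcm, sampleRate = 16000, channels = 1, bitsPerSample = 16) {
  const byteRate = (sampleRate * channels * bitsPerSample) / 8;
  const blockAlign = (channels * bitsPerSample) / 8;
  const header = Buffer.alloc(44);

  header.write("RIFF", 0);
  header.writeUInt32LE(36 + pcm.length, 4); // total size minus 8 bytes
  header.write("WAVE", 8);
  header.write("fmt ", 12);
  header.writeUInt32LE(16, 16);             // fmt subchunk size (PCM)
  header.writeUInt16LE(1, 20);              // audio format 1 = PCM
  header.writeUInt16LE(channels, 22);
  header.writeUInt32LE(sampleRate, 24);
  header.writeUInt32LE(byteRate, 28);
  header.writeUInt16LE(blockAlign, 32);
  header.writeUInt16LE(bitsPerSample, 34);
  header.write("data", 36);
  header.writeUInt32LE(pcm.length, 40);     // raw audio payload size

  return Buffer.concat([header, pcm]);
}
```

From there, write the result to a temp file and feed it through the transcribeAudio function from the first snippet.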

Latency comparison (average for a 5-second clip):

  • Groq: ~280ms
  • OpenAI: ~1100ms
  • Local Whisper (GPU): ~400ms
  • Local Whisper (CPU): ~8000ms

Groq's LPU hardware is the reason for those numbers — not software tricks.
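For what it's worth, numbers like these are easy to reproduce: wall-clock any async transcribe function with a small wrapper (the helper is mine; `performance` is the standard Node.js global timer):

```javascript
// Time any async transcription function, e.g. transcribeAudio above.
async function timed(fn, ...args) {
  const start = performance.now();
  const result = await fn(...args);
  return { result, ms: performance.now() - start };
}
```

Usage: const { ms } = await timed(transcribeAudio, "clip-5s.wav"); run it a handful of times and average, since network jitter dwarfs any single measurement.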

The tradeoffs

I miss Dragon's commands. Voice commands for formatting and navigation are genuinely powerful. Whisper transcribes only — no control layer.

I don't miss Dragon's software. Massive install, dated UI, fragile updates. Whisper is a REST endpoint.

Privacy is a real tradeoff. Audio leaves the machine via Groq's API. Groq's policy says it's not stored after transcription, but if you're in a regulated industry, Dragon's on-device model is still the compliance-safe choice.

What I built

After this evaluation I built dictate.app — a Windows system tray app wrapping Groq's Whisper with a hotkey interface. Press a key, talk, release, text appears wherever your cursor is. $9/month, Windows 10 and 11.

Bottom line

Voice commands + compliance + on-device: Dragon.

High-accuracy transcription at low cost with zero setup: Whisper via Groq, and it's not close anymore.

For pure dictation, Whisper won.
