Why I switched from Dragon NaturallySpeaking to Whisper API (and built my own app)
I used Dragon NaturallySpeaking for years. It was the gold standard — everyone said so. Then I spent a weekend with Whisper and realized the gap had closed in a way Nuance wasn't advertising.
This post is for people evaluating modern speech-to-text options for real work. I'll go technical where it matters.
What Dragon gets right
Let's be fair. Dragon's strengths are real:
- On-device processing: No audio leaves your machine. For legal, medical, or confidential work, this matters enormously.
- Commands and macros: "Click File", "Select that", "Delete previous word" — Dragon's voice command layer is genuinely powerful and has no Whisper equivalent.
- Long-session accuracy: Dragon can adapt to your voice over time. It learns your vocabulary, your accent, your quirks. Whisper doesn't personalize.
- Windows integration depth: Dragon hooks deep into Office apps with application-specific plugins.
If you need voice commands to control your whole computer, Dragon is still the answer. This comparison is purely about transcription quality for dictating text.
Where Whisper changed the math
Accuracy on technical vocabulary
Dragon struggles with words it hasn't been trained on. You can add custom vocabulary, but it's a friction point every time you hit a new term. Whisper's approach is fundamentally different: the original models were trained on 680,000 hours of multilingual audio collected from the web, so it has already seen an enormous variety of technical vocabulary, names, and jargon.
I tested both on a sample of 50 developer-typical sentences (variable names spoken aloud, API endpoint names, library references):
- Dragon: ~88% word accuracy
- Whisper Large v3 (via Groq): ~96% word accuracy
The gap matters most at the edges — the uncommon words where errors are most disruptive.
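If you want to run a comparison like this on your own sentences, here is a minimal sketch of one common way to score it: word accuracy as 1 minus word error rate, using a word-level edit distance. The function names and the normalization rules (lowercasing, stripping most punctuation) are my assumptions for illustration, not necessarily how the numbers above were scored.

```javascript
// Word-level Levenshtein distance: counts substitutions, insertions, deletions.
function wordEditDistance(refWords, hypWords) {
  let prev = Array.from({ length: hypWords.length + 1 }, (_, j) => j);
  for (let i = 1; i <= refWords.length; i++) {
    const curr = [i];
    for (let j = 1; j <= hypWords.length; j++) {
      const cost = refWords[i - 1] === hypWords[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[hypWords.length];
}

// Assumed normalization: lowercase, keep letters, digits, underscores, apostrophes.
function tokenize(text) {
  return text
    .toLowerCase()
    .replace(/[^\p{L}\p{N}\s_']/gu, " ")
    .split(/\s+/)
    .filter(Boolean);
}

// pairs: [{ reference: "what was said", hypothesis: "what the engine wrote" }]
function wordAccuracy(pairs) {
  let errors = 0;
  let total = 0;
  for (const { reference, hypothesis } of pairs) {
    const ref = tokenize(reference);
    errors += wordEditDistance(ref, tokenize(hypothesis));
    total += ref.length;
  }
  return total === 0 ? 0 : 1 - errors / total;
}
```

Scoring is pooled across all sentences, so long sentences weigh more than short ones. Averaging per-sentence accuracy instead is an equally reasonable choice, but it gives slightly different numbers.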
The setup cost
Dragon requires a training session. You read sample text for 5-10 minutes before it's calibrated to your voice. Whisper needs nothing. You hit record and it just works, for any speaker.
Price
Dragon Professional Individual: $500 one-time (or $15/month subscription). Updates have historically cost money.
Groq Whisper API: $0.04/hour of audio. At 30 min/day of dictation that's roughly $0.60/month in API costs.
The managed version I built (dictate.app) wraps this for $9/month.
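The API cost arithmetic above can be checked with a few lines. This is just the multiplication written out. The function name is hypothetical, and it ignores any per-request minimums or rounding a provider might apply.

```javascript
// Monthly API cost for a given daily dictation habit.
function monthlyApiCost({ minutesPerDay, daysPerMonth = 30, pricePerAudioHour }) {
  const hoursPerMonth = (minutesPerDay / 60) * daysPerMonth;
  return hoursPerMonth * pricePerAudioHour;
}

// 30 min/day for 30 days is 15 hours of audio; at $0.04/hour that is $0.60.
const cost = monthlyApiCost({ minutesPerDay: 30, pricePerAudioHour: 0.04 });
console.log(`$${cost.toFixed(2)}/month`);
```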
What the Whisper API call actually looks like
import Groq from "groq-sdk";
import fs from "fs";

// The API key is read from the environment rather than hard-coded.
const groq = new Groq({ apiKey: process.env.GROQ_API_KEY });

async function transcribeAudio(audioFilePath) {
  const audioFile = fs.createReadStream(audioFilePath);
  const transcription = await groq.audio.transcriptions.create({
    file: audioFile,
    model: "whisper-large-v3",
    response_format: "verbose_json", // returns extra metadata alongside the text
    language: "en", // skip language auto-detection
    temperature: 0.0, // most deterministic decoding
  });
  return transcription.text;
}
For real-time feel, you chunk the audio:
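const SAMPLE_RATE = 16000; // assumed input: 16 kHz, 16-bit mono PCM
const CHUNK_MS = 5000;
const CHUNK_BYTES = SAMPLE_RATE * (CHUNK_MS / 1000) * 2; // 2 bytes per sample

// transcribeChunk(buffer) is not shown here; it is assumed to upload one
// chunk of audio and resolve to the transcribed text.
function streamTranscription(audioStream) {
  let buffer = Buffer.alloc(0);
  let pending = Promise.resolve(); // one request at a time, so text prints in order
  audioStream.on('data', (data) => {
    buffer = Buffer.concat([buffer, data]);
    // Cut fixed-size chunks and keep any remainder for the next one.
    while (buffer.length >= CHUNK_BYTES) {
      const chunk = buffer.subarray(0, CHUNK_BYTES);
      buffer = buffer.subarray(CHUNK_BYTES);
      pending = pending
        .then(() => transcribeChunk(chunk))
        .then((text) => process.stdout.write(text + " "))
        .catch((err) => console.error("Transcription failed:", err.message));
    }
  });
}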
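The chunking loop hands raw audio to transcribeChunk, which isn't shown. Transcription endpoints generally expect a recognizable file format rather than bare samples, so one piece that helper probably needs is a WAV wrapper. Here is a minimal sketch, assuming the chunks are 16-bit little-endian PCM (the `* 2` in the buffer math suggests 2 bytes per sample). The function name and the defaults are my own.

```javascript
// Prepend a standard 44-byte RIFF/WAVE header to raw 16-bit PCM samples.
function pcmToWav(pcm, sampleRate = 16000, channels = 1) {
  const bytesPerSample = 2; // 16-bit audio
  const header = Buffer.alloc(44);
  header.write("RIFF", 0, "ascii");
  header.writeUInt32LE(36 + pcm.length, 4); // file size minus the first 8 bytes
  header.write("WAVE", 8, "ascii");
  header.write("fmt ", 12, "ascii");
  header.writeUInt32LE(16, 16); // size of the fmt chunk
  header.writeUInt16LE(1, 20); // audio format 1 = uncompressed PCM
  header.writeUInt16LE(channels, 22);
  header.writeUInt32LE(sampleRate, 24);
  header.writeUInt32LE(sampleRate * channels * bytesPerSample, 28); // byte rate
  header.writeUInt16LE(channels * bytesPerSample, 32); // block align
  header.writeUInt16LE(8 * bytesPerSample, 34); // bits per sample
  header.write("data", 36, "ascii");
  header.writeUInt32LE(pcm.length, 40); // size of the sample data
  return Buffer.concat([header, pcm]);
}
```

A 5-second chunk at 16 kHz mono is 160,000 bytes of samples, so the resulting WAV is 160,044 bytes. The result can be written to a temp file and passed to a file-based call like transcribeAudio.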
Latency comparison:
| Provider | Avg latency (5s clip) |
|---|---|
| Groq | ~280ms |
| OpenAI | ~1100ms |
| Local Whisper (GPU) | ~400ms |
| Local Whisper (CPU) | ~8000ms |
Groq's LPU hardware is the reason for those numbers — not software tricks.
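Latency figures like these depend heavily on network and region, so it's worth measuring from your own machine. Here is a rough sketch of a timing harness. The function name is hypothetical, and `task` stands for whatever async call you want to time, such as a transcription request for a fixed 5-second clip.

```javascript
// Time an async task several times and summarize wall-clock latency in ms.
async function measureLatency(task, runs = 10) {
  const samples = [];
  for (let i = 0; i < runs; i++) {
    const start = performance.now();
    await task();
    samples.push(performance.now() - start);
  }
  samples.sort((a, b) => a - b);
  const mean = samples.reduce((sum, x) => sum + x, 0) / samples.length;
  // For an even number of runs this picks the upper of the two middle values.
  const median = samples[Math.floor(samples.length / 2)];
  return { mean, median, min: samples[0], max: samples[samples.length - 1] };
}
```

Runs are sequential on purpose: firing requests in parallel would measure throughput under load rather than per-request latency. Reporting the median alongside the mean helps when one slow request skews the average.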
The tradeoffs
I miss Dragon's commands. Voice commands for formatting and navigation are genuinely powerful. Whisper transcribes only — no control layer.
I don't miss Dragon's software. Massive install, dated UI, fragile updates. Whisper is a REST endpoint.
Privacy is a real tradeoff. Audio leaves the machine via Groq's API. Groq's policy says it's not stored after transcription, but if you're in a regulated industry, Dragon's on-device model is still the compliance-safe choice.
What I built
After this evaluation I built dictate.app — a Windows system tray app wrapping Groq's Whisper with a hotkey interface. Press a key, talk, release, text appears wherever your cursor is. $9/month, Windows 10 and 11.
Bottom line
Voice commands + compliance + on-device: Dragon.
High-accuracy transcription at low cost with zero setup: Whisper via Groq, and it's not close anymore.
For pure dictation, Whisper won.