DEV Community

Digiwares

Building a Browser-Based Voice-to-Text App with the Web Speech API

I recently built a voice-to-text tool that works entirely in the browser — no backend required for the core functionality. Here's what I learned about the Web Speech API and its quirks.

Why Browser-Based?

Privacy is the main sell. Audio never touches my servers: no uploads, no storage, no GDPR headaches on my end. For a simple transcription tool, this is a huge advantage (with one Chrome-shaped caveat, covered below).

The Web Speech API Basics

The API is surprisingly simple:

// Chrome still ships the constructor behind a webkit prefix
const recognition = new (window.SpeechRecognition || window.webkitSpeechRecognition)();

recognition.continuous = true;     // keep listening across pauses
recognition.interimResults = true; // stream partial results while speaking
recognition.lang = 'en-US';

recognition.onresult = (event) => {
  // Join the top alternative of every result received so far
  const transcript = Array.from(event.results)
    .map(result => result[0].transcript)
    .join('');
  console.log(transcript);
};

recognition.start();

recognition.start();

That's it. You now have live speech-to-text.
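One thing the snippet above glosses over: with interimResults enabled, event.results mixes finalized chunks with in-flight guesses that may still change. If you want to render them differently (say, grey out the interim text), a helper like this can split them. splitTranscript is my own name, not part of the API; it expects plain { isFinal, transcript } objects mapped from the results list:

```javascript
// Hypothetical helper: partition recognition results into finalized
// text and still-changing interim text.
function splitTranscript(results) {
  let final = '';
  let interim = '';
  for (const r of results) {
    if (r.isFinal) final += r.transcript;
    else interim += r.transcript;
  }
  return { final, interim };
}

// Inside onresult you'd feed it something like:
// splitTranscript(Array.from(event.results)
//   .map(r => ({ isFinal: r.isFinal, transcript: r[0].transcript })));
```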

The Gotchas Nobody Warns You About

1. Browser support is inconsistent

Chrome sends your audio to Google's servers for recognition (ironically, not fully local). Safari does on-device processing. Firefox support is limited. Always feature-detect before relying on it:

if (!('SpeechRecognition' in window || 'webkitSpeechRecognition' in window)) {
  // Show fallback UI
}

2. It stops listening randomly

The recognizer shuts itself off after a stretch of silence (and sometimes for no obvious reason), firing onend. For continuous dictation you have to restart it yourself:

recognition.onend = () => {
  if (shouldKeepListening) {
    recognition.start();
  }
};
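One wrinkle with the restart-on-onend trick: calling start() while the recognizer is still winding down throws an InvalidStateError in Chrome. Here's a sketch of a more defensive restart; safeRestart and its retry parameters are my own names, not part of the API:

```javascript
// Defensive restart sketch: start() throws if recognition is still
// active, so catch the error and retry a few times after a short delay.
function safeRestart(recognition, retriesLeft = 3, retryMs = 250) {
  try {
    recognition.start();
    return true; // started cleanly
  } catch (err) {
    if (retriesLeft > 0) {
      setTimeout(() => safeRestart(recognition, retriesLeft - 1, retryMs), retryMs);
    }
    return false; // start failed; a retry may be pending
  }
}
```

In the onend handler you'd call safeRestart(recognition) instead of recognition.start().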

3. Punctuation doesn't exist

The API returns raw words with no periods, commas, or capitalization. You'll need to handle this yourself:

function addAutoPunctuation(text) {
  return text
    // Spoken punctuation: "question mark" → "?", "comma" → ","
    .replace(/\s*\bquestion mark\b/gi, '?')
    .replace(/\s*\bcomma\b/gi, ',')
    // Capitalize the first word and anything after sentence-ending punctuation
    .replace(/(^|[.?!] )([a-z])/g, (m, pre, ch) => pre + ch.toUpperCase());
}

4. Language switching is manual

You need to build your own language selector and set recognition.lang accordingly. The API supports 100+ languages but won't auto-detect.
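A minimal sketch of what that selector logic might look like. The LANGS list and resolveLang helper are illustrative, and I'm assuming lang is only read when start() is called, so you stop and restart after changing it:

```javascript
// Illustrative subset: the API accepts BCP-47 tags like 'hi-IN'.
const LANGS = [
  { label: 'English (US)', tag: 'en-US' },
  { label: 'हिंदी',        tag: 'hi-IN' },
  { label: 'Español',      tag: 'es-ES' },
];

// Pure helper: accept a known tag, otherwise fall back to a default.
function resolveLang(tag, fallback = 'en-US') {
  return LANGS.some(l => l.tag === tag) ? tag : fallback;
}

// In the browser you'd wire it up roughly like:
// select.addEventListener('change', () => {
//   recognition.stop();                       // lang is read at start()
//   recognition.lang = resolveLang(select.value);
//   recognition.start();
// });
```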

When Not to Use the Web Speech API

For anything beyond basic dictation, you'll hit walls:

  • Audio file transcription — API only does live mic input
  • Speaker identification — Not supported
  • Timestamps — Not provided
  • Accuracy requirements — Enterprise use cases need Whisper, AssemblyAI, or Deepgram

I ended up building a hybrid: free tier uses Web Speech API for live dictation, Pro tier uses Whisper for file uploads and higher accuracy.

Native Language SEO Bonus

One unexpected win: I built language-specific pages with native script UI. The Hindi page is actually in Hindi (हिंदी में वॉइस टू टेक्स्ट), not just "Hindi Voice to Text" in English.

Result: Started ranking for native-language searches with way less competition than English keywords.

Try It

I built this into voicetotextonline.com — free to use, no signup for basic transcription.

If you're building something similar, happy to answer questions in the comments.
