Building a Browser-Based Voice-to-Text App with the Web Speech API
I recently built a voice-to-text tool that works entirely in the browser — no backend required for the core functionality. Here's what I learned about the Web Speech API and its quirks.
Why Browser-Based?
Privacy is the main sell. Audio is handled by the browser's own speech engine, so nothing ever touches my servers: no uploads, no storage, no GDPR headaches. For a simple transcription tool, this is a huge advantage.
The Web Speech API Basics
The API is surprisingly simple:
const recognition = new (window.SpeechRecognition || window.webkitSpeechRecognition)();
recognition.continuous = true;
recognition.interimResults = true;
recognition.lang = 'en-US';
recognition.onresult = (event) => {
  const transcript = Array.from(event.results)
    .map(result => result[0].transcript)
    .join('');
  console.log(transcript);
};
recognition.start();
That's it. You now have live speech-to-text.
The Gotchas Nobody Warns You About
1. Browser support is inconsistent
Chrome uses Google's servers (ironically, not fully local). Safari uses on-device processing. Firefox support is limited. Always check:
if (!('SpeechRecognition' in window || 'webkitSpeechRecognition' in window)) {
  // Show fallback UI
}
2. It stops listening randomly
The API has a habit of stopping after silence. You need to restart it:
recognition.onend = () => {
  if (shouldKeepListening) {
    recognition.start();
  }
};
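For completeness, here's one way to wire up that flag. This is just a sketch; shouldKeepListening and the button IDs are my own names, not part of the API:
let shouldKeepListening = false;

document.querySelector('#start').addEventListener('click', () => {
  shouldKeepListening = true;
  recognition.start();
});

document.querySelector('#stop').addEventListener('click', () => {
  shouldKeepListening = false;
  recognition.stop(); // onend still fires, but the flag prevents the restart
});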
3. Punctuation doesn't exist
The API returns raw words with no periods, commas, or capitalization. You'll need to handle this yourself:
function addAutoPunctuation(text) {
  return text
    // Handle common spoken patterns like "question mark" → "?"
    .replace(/\s*\bquestion mark\b/gi, '?')
    .replace(/\s*\bcomma\b/gi, ',')
    .replace(/\s*\bperiod\b/gi, '.')
    // Capitalize the first letter and the letter after each sentence-ending mark
    .replace(/(^\s*\w|[.?!]\s+\w)/g, (c) => c.toUpperCase());
  // Inserting periods at pauses needs timing from interim results (not shown here)
}
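Then run the cleanup on the transcript before rendering it. Rough usage sketch; #output is a placeholder for wherever you show the text:
const outputEl = document.querySelector('#output');

recognition.onresult = (event) => {
  const raw = Array.from(event.results)
    .map(result => result[0].transcript)
    .join('');
  outputEl.textContent = addAutoPunctuation(raw);
};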
4. Language switching is manual
You need to build your own language selector and set recognition.lang accordingly. The API supports 100+ languages but won't auto-detect.
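A minimal selector sketch, assuming a <select id="lang"> element with BCP 47 codes as values (the API has no built-in UI, so you bring your own):
const langSelect = document.querySelector('#lang');

langSelect.addEventListener('change', () => {
  recognition.lang = langSelect.value; // e.g. 'hi-IN', 'es-ES', 'fr-FR'
  // A language change typically only takes effect on the next start,
  // so stop and let the onend handler above restart recognition
  if (shouldKeepListening) {
    recognition.stop();
  }
});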
When to NOT Use Web Speech API
For anything beyond basic dictation, you'll hit walls:
- Audio file transcription — API only does live mic input
- Speaker identification — Not supported
- Timestamps — Not provided
- Accuracy requirements — Enterprise use cases need Whisper, AssemblyAI, or Deepgram
I ended up building a hybrid: free tier uses Web Speech API for live dictation, Pro tier uses Whisper for file uploads and higher accuracy.
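The routing itself is simple. A sketch of the Pro path, where /api/transcribe is a placeholder for whatever Whisper-backed endpoint you run (not a real endpoint of this site):
async function transcribeFile(file, userPlan) {
  if (userPlan !== 'pro') {
    throw new Error('File transcription is a Pro feature; use live dictation instead.');
  }
  const body = new FormData();
  body.append('audio', file);
  // Hypothetical server route that runs Whisper on the uploaded file
  const response = await fetch('/api/transcribe', { method: 'POST', body });
  if (!response.ok) throw new Error('Transcription failed');
  const { text } = await response.json();
  return text;
}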
Native Language SEO Bonus
One unexpected win: I built language-specific pages with native script UI. The Hindi page is actually in Hindi (हिंदी में वॉइस टू टेक्स्ट), not just "Hindi Voice to Text" in English.
Result: Started ranking for native-language searches with way less competition than English keywords.
Try It
I built this into voicetotextonline.com — free to use, no signup for basic transcription.
If you're building something similar, happy to answer questions in the comments.