DEV Community

Speech to text in the browser with the Web Speech API

Phil Nash on February 10, 2020

The Web Speech API has two functions, speech synthesis, otherwise known as text to speech, and speech recognition, or speech to text. We previously...

netsi1964 🙏🏻 • Feb 11 '20 • Edited

On iOS (and Android?), whenever you can input text using the keyboard, you already have the option of speech to text, simply by pressing the microphone button - just as a side note :-)

Phil Nash • Feb 11 '20

Oh, but I bet we can build much more powerful and interesting interfaces if we have control over the speech recognition rather than hoping users press that button!

I will be doing more work with this in the future, so keep an eye out for more posts on how we can work with this.

Pablo • Feb 11 '20

Very nice one, Phil!

Phil Nash • Feb 11 '20

Thanks Pablo! 😊

Mark Davies • Feb 11 '20

I really hope this becomes a thing detached from the browser. It would be cool to have this as an API you can plug and play anywhere you want it.

Phil Nash • Feb 11 '20

Behind this browser API is the Google Cloud Speech API, and there are plenty of other speech-to-text APIs available, such as Amazon Transcribe, IBM Watson Speech to Text, and Azure Speech-to-Text, that you can use in any application to achieve this. For example, here's a blog post on how to use the Google Cloud Speech API to live translate a Twilio phone call.

The browser API has two benefits: it is free, whereas you have to pay to use any of those other APIs directly, and you don't have to handle the streaming connection to the API because the browser does it for you.
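
To make that last point concrete, here is a rough sketch of what "the browser does it for you" can look like in code. It is only an illustration, not the code from the post: the helper names `splitTranscript` and `startListening`, the small interfaces, and the `en-US` language setting are all assumptions made for this example. It relies on `SpeechRecognition` (or the prefixed `webkitSpeechRecognition`) being present, which is not the case in every browser.

```typescript
// Minimal shapes for the parts of a recognition result this sketch reads.
// These interfaces are illustrative, not the full Web Speech API types.
interface Alternative { transcript: string }
interface Result { isFinal: boolean; 0: Alternative }
interface ResultList { length: number; [index: number]: Result }

// Hypothetical helper: split results into finalised text and the
// still-changing interim text, using each result's top alternative.
function splitTranscript(results: ResultList): { finalText: string; interimText: string } {
  let finalText = "";
  let interimText = "";
  for (let i = 0; i < results.length; i++) {
    const result = results[i];
    const text = result[0].transcript;
    if (result.isFinal) {
      finalText += text;
    } else {
      interimText += text;
    }
  }
  return { finalText, interimText };
}

// Hypothetical helper: start recognition if the browser supports it.
// Returns a function that stops listening, or null when unsupported.
function startListening(
  onUpdate: (finalText: string, interimText: string) => void
): (() => void) | null {
  const w = globalThis as any;
  const Recognition = w.SpeechRecognition ?? w.webkitSpeechRecognition;
  if (!Recognition) {
    return null; // no speech recognition available here
  }
  const recognition = new Recognition();
  recognition.continuous = true;      // keep listening across pauses
  recognition.interimResults = true;  // report partial results as they arrive
  recognition.lang = "en-US";         // assumed language for this sketch
  recognition.onresult = (event: any) => {
    const { finalText, interimText } = splitTranscript(event.results);
    onUpdate(finalText, interimText);
  };
  recognition.start(); // the browser handles the microphone and streaming
  return () => recognition.stop();
}
```

On a page, you might call `startListening` from a button's click handler and write the two strings into the DOM. There is no API key or socket handling anywhere in the sketch, which is the trade-off described above.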