DEV Community

Speech to text in the browser with the Web Speech API

Phil Nash on February 10, 2020

The Web Speech API has two functions: speech synthesis, otherwise known as text to speech, and speech recognition, or speech to text. We previously...
 
netsi1964 🙏🏻

On iOS (and Android?), whenever you can enter text with the keyboard you already have the option of speech to text, simply by pressing the microphone button. Just as a side note :-)

[Image: the microphone button on iOS]

Phil Nash

Oh, but I bet we can build much more powerful and interesting interfaces if we have control over the speech recognition, rather than hoping that users press that button!

I will be doing more work with this in the future, so keep an eye out for more posts on how we can work with this.
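
To make "control over the speech recognition" a bit more concrete, here is a minimal sketch of starting recognition from code instead of waiting for the keyboard's microphone button. The names `startListening` and `onTranscript`, the `scope` parameter, and the `en-US` language are assumptions for illustration only; the `SpeechRecognition` constructor (prefixed as `webkitSpeechRecognition` in Chrome) is the part that comes from the Web Speech API.

```javascript
// Sketch only: `startListening`, `onTranscript` and `scope` are hypothetical names.
function startListening(onTranscript, scope = globalThis) {
  // Chrome exposes the constructor with a webkit prefix.
  const Recognition = scope.SpeechRecognition || scope.webkitSpeechRecognition;
  if (!Recognition) {
    // No support here; the caller can fall back to typed input.
    return null;
  }

  const recognition = new Recognition();
  recognition.lang = "en-US";          // assumed language for this example
  recognition.continuous = true;       // keep listening after each phrase
  recognition.interimResults = false;  // only report finished phrases

  recognition.addEventListener("result", (event) => {
    // Take the most recent result and its top alternative.
    const latest = event.results[event.results.length - 1];
    onTranscript(latest[0].transcript);
  });

  recognition.start();
  return recognition;
}
```

In a page this could be called from a click handler, for example `startListening((text) => console.log(text))`, and the returned object's `stop()` method ends the session. The browser will still ask the user for microphone permission.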

Mark Davies

I really hope this becomes a thing detached from the browser. This would be a cool thing to have as an API you can plug and play anywhere you want it.

Phil Nash

In Chrome, behind this browser API is the Google Cloud Speech API, and there are plenty of other speech-to-text APIs available, like Amazon Transcribe, IBM Watson Speech to Text, and Azure Speech-to-Text, that you can use in any application to achieve this. For example, here's a blog post on how to use the Google Cloud Speech API to live translate a Twilio phone call.

The benefits of the browser API are that it is free (you have to pay to use any of those APIs directly) and that you don't have to handle the streaming connection to the API; the browser does it for you.
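
As a rough illustration of "the browser does it for you": the page never touches the audio stream, it only reads `result` events. The sketch below assumes `interimResults` is set to `true` on the recognition object; `collectTranscripts` is a hypothetical helper name, while `resultIndex`, `results`, `isFinal` and `transcript` are the fields the Web Speech API puts on the event.

```javascript
// Sketch only: `collectTranscripts` is a hypothetical helper name.
// It splits one `result` event into finished text and still-changing text.
function collectTranscripts(event) {
  let finalText = "";
  let interimText = "";
  // Results before event.resultIndex have not changed since the last event.
  for (let i = event.resultIndex; i < event.results.length; i++) {
    const result = event.results[i];
    const text = result[0].transcript; // top alternative for this result
    if (result.isFinal) {
      finalText += text;
    } else {
      interimText += text;
    }
  }
  return { finalText, interimText };
}
```

A `result` listener could call this on every event, append `finalText` to the page, and show `interimText` in a lighter style until it settles.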

Pablo

Very nice one, Phil!

Phil Nash

Thanks Pablo! 😊