Originally published on webdeasy.de!
With the JavaScript Speech Recognition API you can implement surprisingly powerful features with very little code, features that can quickly make native apps look old. This article shows you how to do it!
The JavaScript Speech Recognition API allows us to access the visitor’s microphone and to capture and evaluate speech input. Some cool things can be built with it: this can even go as far as your own AI! Or how about building your own Amazon Echo (Alexa)? The possibilities are wide open. 🙂
Requirements
In order to use the Speech Recognition API, the browser must support JavaScript, which fortunately is now standard. That said, there are actually people who block “the evil JavaScript”… and even install extra add-ons to do so. 🤯
In addition, the visitor must agree to the use of the microphone once. For this purpose, a pop-up will appear, which may look different depending on the operating system and browser. You can also allow the general use of the microphone on all websites in the browser settings.
How to use the Speech Recognition API
First, we define the interface we are going to use. We have to do this because some browsers only expose it under the prefixed name webkitSpeechRecognition, and others do not support it at all. You can find the current status of browser support at Can I use.
window.SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
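Before creating an instance, it can help to check whether the interface exists at all, so that unsupported browsers get a message instead of an error. The following is only a minimal sketch: the helper name getSpeechRecognition and the variable SpeechRecognitionImpl are made up for this example and are not part of the API.

```javascript
// Hypothetical helper (not part of the API): returns the SpeechRecognition
// constructor found on the given global object, or null if there is none.
function getSpeechRecognition(scope) {
  return scope.SpeechRecognition || scope.webkitSpeechRecognition || null;
}

// In a browser the global object is `window`; outside of one we fall back
// to an empty object so the check simply reports "not supported".
const scope = typeof window !== 'undefined' ? window : {};
const SpeechRecognitionImpl = getSpeechRecognition(scope);

if (SpeechRecognitionImpl === null) {
  console.log('Speech recognition is not supported in this environment.');
}
```

If SpeechRecognitionImpl is not null, it can be used in place of the SpeechRecognition class in the rest of this article.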
Now we create an instance of the SpeechRecognition class. We set the parameter interimResults to true, so that we can retrieve text while the user is still speaking and not only after the API has recognized the end of the speech input. This way we can already start evaluating even though the sentence has not been finished yet.
We also specify the language using the lang parameter.
All events and parameters can also be read directly in the Web Speech API documentation.
// set up SpeechRecognition
const recognition = new SpeechRecognition();
recognition.interimResults = true;
recognition.lang = 'en-US';
Now everything is prepared and we can start waiting for voice input and evaluating it. The result event is triggered whenever the API has a new result. Because interimResults is set to true, this also happens while the user is still speaking, not only when the user has finished the sentence and is taking a break.
The transcript variable holds the recognized text. The Boolean isFinal then tells us whether the input has actually been completed.
Optionally, I added a check to see whether an input starts with a certain word. The following demo is based on the same principle.
// waiting for speech results
recognition.addEventListener('result', event => {
  const transcript = event.results[0][0].transcript;

  // check if the voice input has ended
  if (event.results[0].isFinal) {
    console.log(transcript);

    // check if the input starts with 'hello'
    if (transcript.indexOf('hello') === 0) {
      console.log('You said hello to somebody.');
    }
  }
});
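A plain indexOf check is case-sensitive and also matches words that merely begin with the same letters (for example “helloween”). Since a recognizer may return capitalized or punctuated text, a slightly more forgiving comparison can be useful. This is a sketch under that assumption; the function name startsWithKeyword is hypothetical.

```javascript
// Hypothetical helper: true if the first word of the transcript equals the
// keyword, ignoring case, surrounding whitespace and punctuation.
function startsWithKeyword(transcript, keyword) {
  const firstWord = transcript.trim().toLowerCase().split(/\s+/)[0];
  // Drop punctuation such as a trailing comma ("hello," becomes "hello").
  const cleaned = firstWord.replace(/[^\p{L}\p{N}]/gu, '');
  return cleaned === keyword.toLowerCase();
}
```

Inside the result listener, the condition could then read startsWithKeyword(transcript, 'hello') instead of the indexOf comparison.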
Finally, we start the speech input with the .start() function and call it again whenever an input has ended. This way the Speech Recognition API listens “permanently”.
recognition.addEventListener('end', recognition.start);
recognition.start();
You can change this so that listening is started e.g. when you click on a button – depending on what you want to do.
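As an alternative to restarting unconditionally, listening could be switched on and off with a button. The sketch below is one possible way to do that, not the only one. The function name attachToggle is made up for this example, and it only relies on the start(), stop() and addEventListener() methods of the recognition object.

```javascript
// Hypothetical helper: lets a button switch listening on and off.
// `recognition` must provide start(), stop() and addEventListener();
// `button` must provide addEventListener().
function attachToggle(recognition, button) {
  let listening = false;

  button.addEventListener('click', () => {
    listening = !listening;
    if (listening) {
      recognition.start();
    } else {
      recognition.stop();
    }
  });

  // Restart after each finished input, but only while listening is on.
  recognition.addEventListener('end', () => {
    if (listening) {
      recognition.start();
    }
  });
}
```

In the browser you would pass the recognition instance from above together with a button element of your page. This replaces the two lines that restart the recognition on every end event.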
Example: Voice controlled ToDo List
I also experimented a bit with the Speech Recognition API and built a voice-controlled todo list with it. Using the same principle you can also build your own voice control. Try it yourself: you don’t need as much code as you might think at first!
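To give an idea of how little code such a voice control needs, here is a rough sketch of how recognized text could be turned into list operations. It is not the code of the demo. The function name applyVoiceCommand and the command words “add” and “remove” are assumptions made for this example.

```javascript
// Hypothetical command handler: takes the current todo list and a transcript
// and returns the resulting list. The input array is never modified.
function applyVoiceCommand(todos, transcript) {
  const text = transcript.trim();
  const lower = text.toLowerCase();

  // "add buy milk" appends "buy milk" to the list.
  if (lower.startsWith('add ')) {
    return [...todos, text.slice(4).trim()];
  }
  // "remove buy milk" drops every entry with that text (case-insensitive).
  if (lower.startsWith('remove ')) {
    const target = lower.slice(7).trim();
    return todos.filter(item => item.toLowerCase() !== target);
  }
  // Unknown command: return the list unchanged.
  return todos;
}
```

Such a function would be called with the final transcript from the result listener, and the returned list would then be rendered on the page.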
Conclusion
I myself am a big fan of pure web applications and generally don’t need many native apps. The Speech Recognition API can make a big contribution to this. The implementation is, as you have seen, very simple. Which cool feature do you want to implement with it? Let me know in the comments. 🙂
Top comments (6)
I read your article and I was inspired by all the possibilities. I've updated a project to use voice recognition to aid the website's main navigation.
Hello, I am building an MVC application that translates speech to text. My project then tries to perform a different action (building a video playlist) for each identified word.
In some tests with interimResults=true, the console shows "well", "Wellco", "wellcome", which triggers three actions on the server. Is there any way to tell when the speech engine has identified the complete word?
call function when identifying the whole word
Here is:
recognition.onresult = function (event) {
  var final = "";
  var interim = "";
  for (var i = 0; i < event.results.length; ++i) {
    if (event.results[i].isFinal) {
      final += event.results[i][0].transcript;
      textoFinal.innerHTML = final;
    } else {
      interim += event.results[i][0].transcript;
      texto.innerHTML = interim;
    }
  }
}
I have made a project using speech recognition API, a year ago. But now I will put this idea into my ToDo List web app too.
Thank you. This is fun. I will try it out lol.
Would this work on Safari for iOS?
Unfortunately not yet. Look at Can I Use.