One of my friends is a dermatologist. He has a very busy schedule, seeing up to 60 patients a day. In order to save time, he approached me with a request:
Can you help me make a form, where you fill out the fields using speech recognition? Is that possible?
Yes, indeed it is, but the SpeechRecognition API currently only works in Chrome and Edge (according to MDN, it should also work in Safari 14.1, but I haven't tested that).
Getting started is pretty straightforward:
window.SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
/* Check the value, not the key: the assignment above always creates the property, so an `in` check would always pass */
if (window.SpeechRecognition) { /* It's supported! */ }
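The same check can be wrapped in a small function, which makes it easier to fall back to plain typing when the API is missing. This is only a minimal sketch: `getSpeechRecognition` and its `scope` parameter are names I made up for illustration, not part of the API.

```javascript
// Hypothetical helper: returns the SpeechRecognition constructor, or null.
// `scope` defaults to globalThis (the window object in a browser).
function getSpeechRecognition(scope = globalThis) {
  return scope.SpeechRecognition || scope.webkitSpeechRecognition || null;
}

const Recognition = getSpeechRecognition();
if (Recognition) {
  // Supported: it is safe to call `new Recognition()` here.
} else {
  // Not supported: keep the form usable with the keyboard only.
}
```

Returning `null` instead of writing back to `window` avoids the situation where the property exists but holds `undefined`.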
I've chosen to create a speech object that will hold everything I need:
let speech = {
  enabled: true,
  listening: false,
  recognition: new window.SpeechRecognition(),
  text: ''
};
/* To allow continuous listening: */
speech.recognition.continuous = true;
/* To return interim results to a transcript area: */
speech.recognition.interimResults = true;
/* To set the language: */
speech.recognition.lang = 'en-US';
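If you want to switch language per form or per user, the three settings above can be applied in one go. A minimal sketch, assuming a made-up helper called `configureRecognition`; the `lang` value is a BCP 47 language tag, and 'da-DK' is just an example.

```javascript
// Hypothetical helper, not part of the API: copies settings onto a recognition object.
// The defaults mirror the three settings used in this article.
function configureRecognition(recognition, options = {}) {
  const defaults = { continuous: true, interimResults: true, lang: 'en-US' };
  return Object.assign(recognition, defaults, options);
}

// Usage in a browser (assumes a `speech` object with a `recognition` property):
// configureRecognition(speech.recognition, { lang: 'da-DK' });
```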
The main event listener takes the latest result from the list of results and, if the activeElement is either an <input> or a <textarea>, appends the transcript to the value of that field once the result is final:
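speech.recognition.addEventListener('result', (event) => {
  /* The newest result is the last entry in event.results */
  const audio = event.results[event.results.length - 1];
  /* Each result lists alternatives; the first one is used here */
  speech.text = audio[0].transcript;
  const tag = document.activeElement.nodeName;
  if (tag === 'INPUT' || tag === 'TEXTAREA') {
    /* Only append once the engine has settled on a final transcript */
    if (audio.isFinal) {
      document.activeElement.value += speech.text;
    }
  }
  /* result is the transcript area; it also shows interim text */
  result.innerText = speech.text;
});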
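The listener does two separate things: it reads the newest transcript, and it decides whether the focused element may receive it. Splitting those into two small functions makes each part easier to try out on its own. This is a sketch, and both function names (`latestTranscript`, `appendToField`) are hypothetical.

```javascript
// Hypothetical helper: reads the newest result from a 'result' event.
// `event.results` is list-like, and each result is a list of alternatives.
function latestTranscript(event) {
  const latest = event.results[event.results.length - 1];
  return { text: latest[0].transcript, isFinal: latest.isFinal };
}

// Hypothetical helper: appends text only to an <input> or a <textarea>.
// Returns true if the text was appended, false otherwise.
function appendToField(element, text) {
  const tag = element && element.nodeName;
  if (tag !== 'INPUT' && tag !== 'TEXTAREA') return false;
  element.value += text;
  return true;
}

// Usage in a browser (assumes the `speech` object from this article):
// speech.recognition.addEventListener('result', (event) => {
//   const { text, isFinal } = latestTranscript(event);
//   if (isFinal) appendToField(document.activeElement, text);
// });
```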
The toggle button simply toggles a class and its innerText, as well as triggering:
speech.recognition.start();
/* and */
speech.recognition.stop();
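Put together, the toggle could look something like the sketch below. The class name 'listening' and the two button labels are assumptions of mine, and `toggleListening` is a hypothetical name; only start() and stop() come from the API.

```javascript
// Hypothetical toggle: `speech` has the shape used in this article,
// `button` is the toggle button element.
function toggleListening(speech, button) {
  speech.listening = !speech.listening;
  if (speech.listening) {
    speech.recognition.start();
  } else {
    speech.recognition.stop();
  }
  // The second argument forces the class on or off to match the state.
  button.classList.toggle('listening', speech.listening);
  button.innerText = speech.listening ? 'Stop listening' : 'Toggle listening';
  return speech.listening;
}

// Usage in a browser (assumes a `toggleButton` element exists):
// toggleButton.addEventListener('click', () => toggleListening(speech, toggleButton));
```

Keeping the state in speech.listening means start() is never called twice in a row, which the API does not allow.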
Now we're ready to click the "Toggle listening" button, focus on a form field, and start talking. Go to this CodePen demo, and remember to allow your microphone to be used.
Pause a bit after a sentence, to allow the engine to process the audio and return a transcript.
There's a lot of room for improvement. Maybe you could return a tag cloud of transcripts, and then click to insert the text? What do you think?
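One possible reading of the tag cloud idea is to ask the engine for several alternatives per result, using the maxAlternatives property, and offer each one as a clickable choice. A rough sketch of the first half; `alternativesOf` is a hypothetical helper, and the value 3 is arbitrary.

```javascript
// Hypothetical helper: lists the distinct alternative transcripts of one result.
// Works on anything list-like whose entries have a `transcript` property.
function alternativesOf(result) {
  const texts = Array.from(result, (alternative) => alternative.transcript.trim());
  return [...new Set(texts)];
}

// Usage in a browser (assumes the `speech` object from this article):
// speech.recognition.maxAlternatives = 3;
// Then, inside the 'result' listener, for a final result `audio`:
// const choices = alternativesOf(audio);
```

Each entry in `choices` could then become a button that inserts its text into the focused field.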
To be honest, the SpeechRecognition API does feel a little bit shaky, but I'm sure it will improve in the future. I've tested with various languages, and can confirm it works pretty well with Danish, English and Lithuanian!
Thanks for reading!
Note: Due to browser security restrictions, the CodePen demo doesn't work when embedded.