How To Fill Out A Form With Your Voice


One of my friends is a dermatologist. He has a very busy schedule, seeing up to 60 patients a day. In order to save time, he approached me with a request:

Can you help me make a form where you fill out the fields using speech recognition? Is that possible?

Yes, it is, but the SpeechRecognition API currently only works in Chrome and Edge (according to MDN, it should also work in Safari 14.1, but I haven't tested that).

Getting started is pretty straightforward:



window.SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
/* After the assignment above, the property always exists on window, so check its value instead of using `in` */
if (window.SpeechRecognition) { /* It's supported! */ }
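If you'd rather not write to `window`, the same check can be wrapped in a small function. This is only a sketch: `getSpeechRecognition` and its `root` parameter are hypothetical names of mine, not part of the API.

```javascript
// Hypothetical helper: returns the SpeechRecognition constructor,
// or null when the browser exposes neither the standard nor the prefixed name.
// `root` defaults to globalThis so the function can also be tried with a stub object.
function getSpeechRecognition(root = globalThis) {
  return root.SpeechRecognition || root.webkitSpeechRecognition || null;
}

const Recognition = getSpeechRecognition();
if (Recognition) {
  // Supported: it is safe to call `new Recognition()` here.
} else {
  // Not supported: fall back to plain typing, e.g. hide the toggle button.
}
```

Returning `null` instead of throwing keeps the form usable as a regular form in browsers without speech recognition.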

I've chosen to create a speech object that holds everything I need:



let speech = {
  enabled: true,
  listening: false,
  recognition: new window.SpeechRecognition(),
  text: ''
}

/* To listen continuously: */
speech.recognition.continuous = true;
/* To return interim results to a transcript area: */
speech.recognition.interimResults = true;
/* To set the language: */
speech.recognition.lang = 'en-US';

The main event listener takes the latest result from the list of results and reads the transcript of its first alternative. If the activeElement is either an <input> or a <textarea>, and the result is final, it appends the transcript to the value of that field:



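speech.recognition.addEventListener('result', (event) => {
  /* The most recent result is the last entry in the results list */
  const audio = event.results[event.results.length - 1];
  /* Each result holds one or more alternatives; use the first one */
  speech.text = audio[0].transcript;
  const tag = document.activeElement.nodeName;
  if (tag === 'INPUT' || tag === 'TEXTAREA') {
    /* Only write to the focused field once the result is final */
    if (audio.isFinal) {
      document.activeElement.value += speech.text;
    }
  }
  /* Show interim and final transcripts in the transcript area (`result`) */
  result.innerText = speech.text;
});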
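The logic in that listener can also be split into two small functions that are easy to try without a microphone. This is a sketch, not the code from the demo: `readLatestResult` and `appendTranscript` are hypothetical names, and the separating-space handling is my own addition, on the assumption that a transcript may or may not start with a space.

```javascript
// Hypothetical helper: reads the newest result from a `result` event.
// `event.results` is list-like; each result holds alternatives and an `isFinal` flag.
function readLatestResult(event) {
  const latest = event.results[event.results.length - 1];
  return { text: latest[0].transcript, isFinal: latest.isFinal };
}

// Hypothetical helper: appends a transcript to a field's current value,
// inserting a single space when the value does not already end in whitespace.
function appendTranscript(value, text) {
  const trimmed = text.trim();
  if (!trimmed) return value; // nothing recognized, leave the field alone
  const needsSpace = value !== '' && !/\s$/.test(value);
  return needsSpace ? `${value} ${trimmed}` : value + trimmed;
}
```

Inside the listener, this could be used as `const { text, isFinal } = readLatestResult(event);` followed by `field.value = appendTranscript(field.value, text);` when `isFinal` is true.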

The toggle button simply toggles a class and its innerText, and triggers:



speech.recognition.start();
/* and */
speech.recognition.stop();
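Put together, the toggle could look roughly like this. It's a minimal sketch: the function name `toggleListening`, the `listening` class name, and the `Stop listening` label are assumptions of mine, not taken from the demo.

```javascript
// Hypothetical helper: flips the listening state, starts or stops recognition,
// and updates the button's class and label to match.
function toggleListening(speech, button) {
  speech.listening = !speech.listening;
  if (speech.listening) {
    speech.recognition.start();
  } else {
    speech.recognition.stop();
  }
  // Assumed class name and labels; adjust them to your own markup and styles.
  button.classList.toggle('listening', speech.listening);
  button.innerText = speech.listening ? 'Stop listening' : 'Toggle listening';
  return speech.listening;
}
```

It would be wired up with something like `button.addEventListener('click', () => toggleListening(speech, button));`, where `button` is the toggle button element.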

Now we're ready to click the "Toggle listening" button, focus on a form field, and start talking. Go to this Codepen demo, and remember to allow your microphone to be used.

Pause briefly after each sentence to give the engine time to process the audio and return a transcript.

There's a lot of room for improvement — maybe you could return a tag-cloud of transcripts, and then click-to-insert the text? What do you think?

To be honest, the SpeechRecognition API does feel a little bit shaky, but I'm sure it will improve in the future. I've tested it with various languages, and can confirm it works pretty well with Danish, English, and Lithuanian!

Thanks for reading!