Roopali Singh

Building a Voice Assistant using Web Speech API

Hi there 👋,

In this guide we will learn how to integrate a voice user interface into our web application.

We are working with React. To incorporate a Voice User Interface (VUI) we will use the Web Speech API.

For simplicity we will not be focusing on design.

Our aim is to build a voice assistant which will recognize what we say and answer accordingly.

The Web Speech API allows fine control and flexibility over the speech recognition capabilities in Chrome version 25 and later.

The Web Speech API provides us with two pieces of functionality:

  • Speech Recognition which converts speech to text.
  • Speech Synthesis which converts text to speech.
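
To make the two halves concrete, here is a minimal sketch of how one might check for each of them with the native browser objects, without any npm package. `detectSpeechSupport` is a hypothetical helper written for this guide; the only assumption about the browser is that recognition may be exposed under the `webkit` prefix, as it is in Chrome.

```javascript
// Sketch: detect which halves of the Web Speech API are exposed.
// `detectSpeechSupport` is a hypothetical helper, not part of any library.
function detectSpeechSupport(globalObj) {
  // Speech recognition (speech to text); Chrome uses a webkit prefix.
  const Recognition =
    globalObj.SpeechRecognition || globalObj.webkitSpeechRecognition;

  return {
    recognition: typeof Recognition === "function",
    // Speech synthesis (text to speech) needs both the controller
    // object and the utterance constructor.
    synthesis:
      typeof globalObj.speechSynthesis === "object" &&
      globalObj.speechSynthesis !== null &&
      typeof globalObj.SpeechSynthesisUtterance === "function",
  };
}

// In a browser we pass `window`; outside a browser both flags are false.
const support = detectSpeechSupport(
  typeof window !== "undefined" ? window : {}
);
console.log(support);
```

The packages we install in the next step wrap these objects in React-friendly helpers, so we will not call them directly in the rest of the guide.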

1. We will start by installing two npm packages:

# for speech recognition
npm i react-speech-recognition
# for speech synthesis
npm i react-speech-kit

Now before moving on to the next step, let's take a look at some important functions of Speech Recognition.

Detecting browser support for Web Speech API

if (!SpeechRecognition.browserSupportsSpeechRecognition()) {
    // Render some fallback content
}

Turning the microphone on

SpeechRecognition.startListening();

Turning the microphone off

// It will first finish processing any speech in progress and
// then stop.
SpeechRecognition.stopListening();
// It will cancel the processing of any speech in progress.
SpeechRecognition.abortListening();

Consuming the microphone transcript

// To make the microphone transcript available in our component.
const { transcript } = useSpeechRecognition();

Resetting the microphone transcript

const { resetTranscript } = useSpeechRecognition();

Now we're ready to add Speech Recognition (speech to text) to our web app 🚀

2. In the App.js file, we will check for browser support using react-speech-recognition and add two components, StartButton and Output.

The App.js file should look like this for now:

import React from "react";
import StartButton from "./StartButton";
import Output from "./Output";
import SpeechRecognition from "react-speech-recognition";

function App() {
  // Checking the support
  if (!SpeechRecognition.browserSupportsSpeechRecognition()) {
    return (
      <div>
        Browser does not support Web Speech API (Speech Recognition).
        Please download latest Chrome.
      </div>
    );
  }

  return (
    <div className="App">
      <StartButton />
      <Output />
    </div>
  );
}

export default App;

3. Next we will move to the StartButton.js file.

Here we will add a toggle button to start and stop listening.

import React, { useState } from "react";
import SpeechRecognition from "react-speech-recognition";

function StartButton() {
  const [listen, setListen] = useState(false);

  const clickHandler = () => {
    if (listen === false) {
      // The default value for continuous is false, meaning that
      // when the user stops talking, speech recognition will end.
      // Setting it to true keeps the microphone listening.
      SpeechRecognition.startListening({ continuous: true });
      setListen(true);
    } else {
      SpeechRecognition.abortListening();
      setListen(false);
    }
  };

  return (
    <div>
      <button onClick={clickHandler}>
        <span>{listen ? "Stop Listening" : "Start Listening"}</span>
      </button>
    </div>
  );
}

export default StartButton;

4. Now in the Output.js file, we will use the useSpeechRecognition React hook.

useSpeechRecognition gives a component access to a transcript of speech picked up from the user's microphone.

import React, { useState } from "react";
import { useSpeechRecognition } from "react-speech-recognition";

function Output() {
  const [outputMessage, setOutputMessage] = useState("");

  const commands = [
    // here we will write various different commands and
    // callback functions for their responses.
  ];

  const { transcript, resetTranscript } = useSpeechRecognition({
    commands,
  });

  return (
    <div>
      <p>{transcript}</p>
      <p>{outputMessage}</p>
    </div>
  );
}

export default Output;

5. Before defining the commands, we will add Speech Synthesis in our web app to convert the outputMessage to speech.

In the App.js file, we will now also check for speech synthesis support.

import { useSpeechSynthesis } from "react-speech-kit";

function App() {
  // Call the hook at the top of the component, before any early return.
  const { supported } = useSpeechSynthesis();

  if (!supported) {
    return (
      <div>
        Browser does not support Web Speech API (Speech Synthesis).
        Please download latest Chrome.
      </div>
    );
  }
.
.
.
export default App;

6. Now in the Output.js file, we will use the useSpeechSynthesis() React hook.

But before moving on, let's first take a look at some important functions of Speech Synthesis:

  • speak(): Call to make the browser read some text.
  • cancel(): Call to make SpeechSynthesis stop reading.

We want to call the speak() function each time the outputMessage is changed.

So we add the following lines of code to the Output.js file:

import React, { useEffect, useState } from "react";
import { useSpeechSynthesis } from "react-speech-kit";

function Output() {
  const [outputMessage, setOutputMessage] = useState("");
  const { speak, cancel } = useSpeechSynthesis();

  // speak() will get called each time outputMessage changes
  useEffect(() => {
    speak({
      text: outputMessage,
    });
  }, [outputMessage]);
.
.
.
}

export default Output;
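
For reference, here is a rough sketch of what speaking a message looks like with the browser's own `speechSynthesis` object and `SpeechSynthesisUtterance` constructor, which is the native API that text-to-speech helpers build on. `speakMessage` is a hypothetical helper, and passing in `synth` and `Utterance` as arguments is our own choice so that the function can be tried outside a browser too.

```javascript
// Sketch: speak a message through the native Speech Synthesis API.
// `speakMessage` is a hypothetical helper; `synth` and `Utterance` are
// injected so the function can also be exercised with stand-ins.
function speakMessage(text, synth, Utterance) {
  // Nothing to say, or speech synthesis is not available.
  if (!text || !synth || !Utterance) return false;

  synth.cancel(); // stop anything that is still queued or being read
  synth.speak(new Utterance(text)); // queue the new message
  return true;
}

// Browser usage, for example inside a click handler:
//   speakMessage(
//     "Hi there",
//     window.speechSynthesis,
//     window.SpeechSynthesisUtterance
//   );
```

Returning early on an empty string is a small design choice: our outputMessage starts out as "", so the first render would otherwise ask the browser to read nothing.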

😃 Whoa!
Everything is now set up 🔥
The only thing left is to define our commands 👩‍🎤

7. Now we're back at our Output.js file to complete our commands.

const commands = [
  {
    // The words that match the splat (*) are passed into the
    // callback.

    command: "I am *",

    callback: (name) => {
      resetTranscript();
      setOutputMessage(`Hi ${name}. Nice name`);
    },
  },

  // DATE AND TIME
  {
    command: "What time is it",

    callback: () => {
      resetTranscript();
      setOutputMessage(new Date().toLocaleTimeString());
    },
    // The default value for matchInterim is false, meaning that
    // the command is only matched against final results, which
    // will not change. Setting it to true also matches interim
    // results, so the response can come sooner.
    matchInterim: true,
  },
  {
    // This example would match both:
    // 'What is the date' and 'What is the date today'

    command: 'What is the date (today)',

    callback: () => {
      resetTranscript();
      setOutputMessage(new Date().toLocaleDateString());
    },
  },

  // GOOGLING (search)
  {
    command: "Search * on google",

    callback: (gitem) => {
      resetTranscript();

      // Open a Google search for the query (gitem) in a new tab.
      // encodeURIComponent keeps spaces and symbols from breaking the URL.
      window.open(
        `https://google.com/search?q=${encodeURIComponent(gitem)}`,
        "_blank"
      );

      setOutputMessage(`Okay. Googling ${gitem}`);
    },
  },

  // CALCULATIONS
  {
    command: "Add * and *",

    callback: (numa, numb) => {
      resetTranscript();
      const num1 = parseInt(numa, 10);
      const num2 = parseInt(numb, 10);
      setOutputMessage(`The answer is: ${num1 + num2}`);
    },
  },

  // CLEAR or STOP.
  {
    command: "clear",

    callback: () => {
      resetTranscript();
      cancel();
    },
    isFuzzyMatch: true,
    fuzzyMatchingThreshold: 0.2,

    // isFuzzyMatch is false by default.
    // It determines whether the comparison between speech and
    // command is based on similarity rather than an exact match.

    // fuzzyMatchingThreshold (default is 0.8) takes values between
    // 0 (will match anything) and 1 (needs an exact match).
    //  If the similarity of speech to command is higher than this
    // value, the callback will be invoked.
  },
];
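
To get a feel for how a command like "I am *" or a fuzzy command like "clear" could be matched against what we say, here is a simplified sketch. It is an illustration only and not the code react-speech-recognition runs: `commandToRegExp`, `matchCommand`, `bigrams` and `similarity` are hypothetical helpers, and using the Dice coefficient over letter pairs as the similarity measure is an assumption made for this example.

```javascript
// Illustration only: simplified matching, not the library's own code.

// Turn "I am *" or "What is the date (today)" into a regular expression.
function commandToRegExp(command) {
  const pattern = command.replace(
    /\s*\(([^)]*)\)|\*|[.+?^${}|[\]\\]/g,
    (match, optional) => {
      if (optional !== undefined) return `(?:\\s*${optional})?`; // "(today)"
      if (match === "*") return "(.+)"; // a splat captures the spoken words
      return `\\${match}`; // escape any other special character
    }
  );
  return new RegExp(`^${pattern}$`, "i");
}

// Returns the words captured by each splat, or null when nothing matches.
function matchCommand(command, transcript) {
  const result = commandToRegExp(command).exec(transcript.trim());
  return result ? result.slice(1) : null;
}

// Split a phrase into overlapping letter pairs, ignoring case and spaces.
function bigrams(text) {
  const letters = text.toLowerCase().replace(/\s+/g, "");
  const pairs = [];
  for (let i = 0; i < letters.length - 1; i++) {
    pairs.push(letters.slice(i, i + 2));
  }
  return pairs;
}

// Dice coefficient: 0 means no shared letter pairs, 1 means identical.
function similarity(a, b) {
  const first = bigrams(a);
  const remaining = bigrams(b);
  const total = first.length + remaining.length;
  if (total === 0) return a.toLowerCase() === b.toLowerCase() ? 1 : 0;

  let shared = 0;
  for (const pair of first) {
    const index = remaining.indexOf(pair);
    if (index !== -1) {
      shared += 1;
      remaining.splice(index, 1); // each pair can only be matched once
    }
  }
  return (2 * shared) / total;
}

console.log(matchCommand("I am *", "I am Roopali")); // ["Roopali"]
console.log(matchCommand("Add * and *", "add 2 and 3")); // ["2", "3"]
console.log(matchCommand("clear", "clear it")); // null, no exact match
console.log(similarity("clear", "clear it")); // 0.8
```

With a threshold of 0.2, as in our "clear" command, the phrase "clear it" would count as a match in this sketch because its similarity of 0.8 is above the threshold, even though the exact pattern does not match.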

😃 We have successfully built a voice assistant using the Web Speech API that does as we say 🔥🔥

Note: As of May 2021, the following browsers support the Web Speech API:

  • Chrome (desktop)
  • Chrome (Android)
  • Safari 14.1
  • Microsoft Edge
  • Android webview
  • Samsung Internet

For all other browsers, you can integrate a polyfill.

Here's a demo that I have made with some styling:

I call it Aether

Top comments (4)

Drimdave

Hi hi,
I love this writeup. I'm currently working on something like this and will appreciate if I could have access to the source code so I could learn more and better.

Here is my email address just in case: oyeladedavid1@gmail.com

Roopali Singh Author

Thank you so much. Your comment makes me wanna continue writing blogs.
Here's the link to the repo: voice-assistant

The main files you wanna check are: StartButton.js which starts or stops the speechRecognition
And Output.js which has all the functions in it.

Drimdave

Thank you so much. I deeply appreciate this. Just in case I have further queries, how can I reach out to you?

Roopali Singh Author

You could contact me on any of these or just comment here. I will surely try to help if I can.

Here are my accounts:
LinkedIn: roopalisingh-rs/
Mail: roopali.singh.222@gmail.com
