Hello fellow readers!
In this post we are going to build a very simple speech synthesizer application using React and a browser built-in speech API.
Introduction
Some of you may not know that browsers even have a built-in speech API (at least I didn't), but here we are going to explore it. This API had its first draft in 2012, describing the interfaces for speech recognition and synthesis. The updated draft can be found here if you are curious enough to read it.
Below you can see the current browser support for this API. As you can see, it's pretty widely adopted (even in Safari!).
Here we are going to address only the synthesis API. Our application will have a text input where the user types what should be spoken, and a selector to pick the desired voice.
Important note: during this article I will be using the Chrome browser. You may have different results if you are using another one.
The final result will be like this:
That being said, let's start!
The voice selector component
This component is a simple select element to allow the user to choose between the voices offered by the browser.
Let's begin with the API object itself. If you are using a browser that supports it, you can find this object globally on the window:
console.log(window.speechSynthesis)
Component structure
This component will basically hold a state to the voice list and a select element to choose between them.
The state is typed with SpeechSynthesisVoice, an object with some properties describing a voice offered by the browser, such as: name, language, and a default flag that is true for your browser's default voice.
Let's start with the initial structure and we will be incrementing it later:
import { useState } from 'react';

type VoiceSelectorProps = {
  selected: number;
  setSelected: (index: number) => void;
};

const synth = window.speechSynthesis;

const VoiceSelector = ({ selected = 0, setSelected }: VoiceSelectorProps) => {
  const [voices, setVoices] = useState<SpeechSynthesisVoice[]>([]);

  return (
    <select
      value={selected}
      onChange={(e) => setSelected(parseInt(e.target.value))}
    >
      {voices.map((voice, index) => (
        <option key={index} value={index}>
          {voice.name} ({voice.lang}) {voice.default && ' [Default]'}
        </option>
      ))}
    </select>
  );
};

export default VoiceSelector;
Getting the voice list
In this API, there is a specialized function to get the voices offered by the browser. You can check it out directly on your dev tools:
window.speechSynthesis.getVoices()
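The call returns an array of SpeechSynthesisVoice objects. If you only care about a subset of them, say, the English voices, you could filter the list by its lang property (a BCP 47 tag such as en-US). A small sketch, using a minimal structural type of my own (VoiceLike) so it works with any voice-like object:

```typescript
// Minimal shape of the voice objects we care about
// (SpeechSynthesisVoice has these properties, among others).
type VoiceLike = { name: string; lang: string; default: boolean };

// Keep only the voices whose language tag starts with the given
// prefix, e.g. 'en' matches both 'en-US' and 'en-GB'.
const filterByLang = <T extends VoiceLike>(voices: T[], prefix: string): T[] =>
  voices.filter((voice) => voice.lang.startsWith(prefix));
```

In the browser you would pass window.speechSynthesis.getVoices() as the first argument.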
Let's change our component a little bit to init the state.
Here we will have a populateVoiceList function that calls the API function and sets the state. Then, we call it in a useEffect.
const VoiceSelector = ({ selected = 0, setSelected }: VoiceSelectorProps) => {
  const [voices, setVoices] = useState<SpeechSynthesisVoice[]>([]);

  const populateVoiceList = useCallback(() => {
    const newVoices = synth.getVoices();
    setVoices(newVoices);
  }, []);

  useEffect(() => {
    populateVoiceList();
    if (synth.onvoiceschanged !== undefined) {
      // Some browsers load the voices asynchronously,
      // so refresh the list when they become available
      synth.onvoiceschanged = populateVoiceList;
    }
  }, [populateVoiceList]);

  return (
    <select
      value={selected}
      onChange={(e) => setSelected(parseInt(e.target.value))}
    >
      {voices.map((voice, index) => (
        <option key={index} value={index}>
          {voice.name} ({voice.lang}) {voice.default && ' [Default]'}
        </option>
      ))}
    </select>
  );
};
You may ask yourself why we don't just initialize the state directly with the voices, like this:
const [voices, setVoices] = useState<SpeechSynthesisVoice[]>(synth.getVoices());
There is a little gotcha in the approach above (if you try it, you will most likely see an empty select). According to the Web Speech API Errata (E11 2013-10-17), the voices are loaded asynchronously. Therefore, there is an event called onvoiceschanged (which we are using) that is fired when the voices are ready (this behavior may differ from one browser to another).
You can learn more about this behavior here.
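If you prefer promises over assigning an event handler, the same gotcha can be wrapped once and reused. Here is a minimal sketch of such a wrapper; the synthesizer is passed in as a parameter typed with a loose SynthLike shape of my own, so the helper is easy to exercise outside the browser (in the app you would pass window.speechSynthesis):

```typescript
// Minimal shape of the synthesizer the helper needs.
type SynthLike = {
  getVoices: () => { name: string; lang: string }[];
  onvoiceschanged: (() => void) | null;
};

// Resolves immediately if the voices are already loaded; otherwise
// waits for the onvoiceschanged event to fire.
function waitForVoices(synth: SynthLike): Promise<{ name: string; lang: string }[]> {
  return new Promise((resolve) => {
    const current = synth.getVoices();
    if (current.length > 0) {
      resolve(current);
      return;
    }
    synth.onvoiceschanged = () => resolve(synth.getVoices());
  });
}
```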
That's it for the voice selector component. Let's move next to the application itself.
The application component
Our application component will control the voice selector state and do the magic of converting a text into a speech.
Let's first start with the simple structure. It's a simple form with a text input, the voice selector and a submit button:
const synth = window.speechSynthesis;

const App = () => {
  const [textValue, setTextValue] = useState<string>('');
  const [selectedVoice, setSelectedVoice] = useState<number>(0);

  if (!synth)
    return <span>Aw... your browser does not support Speech Synthesis</span>;

  return (
    <form>
      <input
        type="text"
        value={textValue}
        onChange={(e) => setTextValue(e.target.value)}
      />
      <VoiceSelector selected={selectedVoice} setSelected={setSelectedVoice} />
      <button type="submit">Speak</button>
    </form>
  );
};
As you can see, the application holds two states:
- textValue: controls the input value
- selectedVoice: controls the selected voice
Also, I've added a safety check to ensure the browser supports the speech API.
Let's now attach the submit handler to the form. When the user submits it, the app must read the input's content and speak it with the selected voice. Check it out:
const speak = (e: FormEvent<HTMLFormElement>) => {
  e.preventDefault();
  const utterance = new SpeechSynthesisUtterance(textValue);
  // As the voices were already loaded in the voice selector,
  // we don't need to use the onvoiceschanged event
  utterance.voice = synth.getVoices()[selectedVoice];
  synth.speak(utterance);
};
Let's break it down:
- First, we create a SpeechSynthesisUtterance object with the typed text as the constructor's argument.
- Then, we plug the selected voice into the voice property of the newly created utterance object. Note that we simply call the getVoices function and pick the entry at the selected voice index.
- Last, but not least, we call the speak function of the synthesis API. And voilà! Our synthesizer is ready.
Now we have our complete application component:
const synth = window.speechSynthesis;

const App = () => {
  const [textValue, setTextValue] = useState<string>('');
  const [selectedVoice, setSelectedVoice] = useState<number>(0);

  if (!synth)
    return <span>Aw... your browser does not support Speech Synthesis</span>;

  const speak = (e: FormEvent<HTMLFormElement>) => {
    e.preventDefault();
    const utterance = new SpeechSynthesisUtterance(textValue);
    utterance.voice = synth.getVoices()[selectedVoice];
    synth.speak(utterance);
  };

  return (
    <form onSubmit={speak}>
      <input
        type="text"
        value={textValue}
        onChange={(e) => setTextValue(e.target.value)}
      />
      <VoiceSelector selected={selectedVoice} setSelected={setSelectedVoice} />
      <button type="submit">Speak</button>
    </form>
  );
};
You can run this example here and hear your browser speaking.
Other features
The synthesis API has some cool features that weren't covered here, such as:
- stop: you can stop the speech at any time!
- pitch and rate: you can customize the pitch and rate of the speech
You can learn more about these features and much more in Mozilla's documentation.
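As a rough sketch of how those features could plug into our speak logic: the 0.1–10 range for rate and the 0–2 range for pitch come from the spec, and the clamping helpers below are just a precaution of mine, not part of the API:

```typescript
// Clamp user-provided values to the ranges allowed by the spec:
// rate is 0.1-10 (1 is normal), pitch is 0-2 (1 is normal).
const clampRate = (rate: number) => Math.min(10, Math.max(0.1, rate));
const clampPitch = (pitch: number) => Math.min(2, Math.max(0, pitch));

// Variation of the article's speak logic with pitch and rate applied.
function speakWithOptions(text: string, rate: number, pitch: number) {
  const synth = window.speechSynthesis;
  synth.cancel(); // "stop": drops whatever is currently being spoken
  const utterance = new SpeechSynthesisUtterance(text);
  utterance.rate = clampRate(rate);
  utterance.pitch = clampPitch(pitch);
  synth.speak(utterance);
}
```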
Conclusion
This wraps up our adventure into the world of the speech synthesis API. I hope you all enjoyed it, and if you have any questions or opinions, please use the comment section below!