Opuama Lucky

Text to Speech Powered by Google API

Introduction

The Web Speech API brings voice data into web applications. It lets developers script both text-to-speech output and speech recognition (continuous dictation and voice control), and it gives web pages a JavaScript interface for starting a session and handling results and alternatives.
In this tutorial, we will create a simple webpage that converts typed text to speech with the Web Speech API; in Google Chrome, several of the available voices are supplied by Google. You can check the browser compatibility for the Web Speech API here.

Text To Speech

Text-to-speech (TTS) is a type of assistive technology that reads digital text aloud. It's sometimes called “read aloud” technology. TTS can take words on a computer or other digital device and convert them into audio.

Prerequisites

  • Basic understanding of HTML, Bootstrap, and JavaScript
  • A code editor. I’ll be using Visual Studio Code
  • Web browser. I recommend using Google Chrome or Mozilla Firefox.

Important Concepts

  • Speech Synthesis: the artificial production of human speech. It turns written information into audio where listening is more convenient, especially in mobile applications such as voice-enabled e-mail and unified messaging. In the browser it is exposed as window.speechSynthesis.

  • Speech Synthesis Utterance: a single speech request. It holds the text the speech service should read and information about how to read it (e.g. language, pitch, and volume). In code it is a SpeechSynthesisUtterance object.
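
Both objects live on the browser's global object, so a page can check for them before wiring up any controls. The sketch below is one possible way to do that, not part of the original tutorial. `supportsSpeechSynthesis` is a hypothetical helper name, and it takes the global object as a parameter only so that the check is easy to try outside a browser.

```javascript
// Hypothetical helper (not part of the Web Speech API): reports whether the
// given global object exposes the two speech synthesis entry points.
function supportsSpeechSynthesis(globalObject) {
  return (
    typeof globalObject === "object" &&
    globalObject !== null &&
    "speechSynthesis" in globalObject &&
    "SpeechSynthesisUtterance" in globalObject
  );
}

// In a browser, pass `window`; elsewhere (for example Node.js) it is undefined.
if (typeof window !== "undefined" && supportsSpeechSynthesis(window)) {
  // A speech request: the text to read plus how to read it.
  const utterance = new SpeechSynthesisUtterance("Hello from the browser");
  utterance.lang = "en-US"; // language
  utterance.pitch = 1; // allowed range: 0 to 2
  utterance.volume = 1; // allowed range: 0 to 1
  // Browsers may require a user gesture before speaking, so the request
  // is only queued on the first click.
  document.addEventListener(
    "click",
    () => window.speechSynthesis.speak(utterance),
    { once: true }
  );
}
```

If the check fails, the page could hide the form or show a short notice instead of throwing when the script first touches `speechSynthesis`.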

Building our Front-end

First, make a folder (directory) named Text-speech and create two new files in it: index.html and index.js. The body of index.html contains a form with a textarea, two range inputs (rate and pitch), a select for the voice, and two buttons.

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <link href="https://cdn.jsdelivr.net/npm/bootstrap@5.1.3/dist/css/bootstrap.min.css" rel="stylesheet" 
    integrity="sha384-1BmE4kWBq78iYhFldvKuhfTAU6auU8tT94WrHftjDbrCEXSU1oBoqyl2QvZ6jIW3" crossorigin="anonymous">
    <title>type and speak</title>
</head>

<body  class=" text-center d-flex justify-content-center ">
    <div class="container text-center">
        <img src="image/dul.png" class=" mb-5 col-4 opacity-90 ">  
        <div class="row">   
            <div class="col-md-6 mx-auto">
                <form> 
                    <div class="mb-3">
                        <textarea id="text-input" class="form-control form-control-lg" placeholder="Type Anything Here... "></textarea>
                    </div>
                    <div class="mb-3">
                        <label for="rate">Rate</label>
                        <div id="rate-value" class="badge bg-primary float-end">1</div><br>
                        <input type="range" id="rate" class="form-range" min="0.5" max="2" value="1" step="0.1">
                    </div>
                    <div class="mb-3">
                        <label for="pitch">Pitch</label>
                        <div id="pitch-value" class="badge bg-primary float-end">1</div><br>
                        <input type="range" id="pitch" class="form-range" min="0.5" max="2" value="1" step="0.1">
                    </div>
                    <div class="mb-3">
                        <select id="voice-select" class="form-select form-select-lg mb-2"></select>
                    </div>
                    <button type="submit" class="btn btn-primary btn-lg w-100 mb-2" id="speak">speak it</button>
                    <button type="button" class="btn btn-warning btn-lg w-100" id="pause">pause</button>
                </form>

            </div>
        </div>
    </div>
    <script src="https://cdn.jsdelivr.net/npm/bootstrap@5.1.3/dist/js/bootstrap.bundle.min.js"
        integrity="sha384-ka7Sk0Gln4gmtz2MlQnikT1wXgYsOg+OMhuP+IlRH9sENBO0LRn5q+8nbTov4+1p" crossorigin="anonymous"></script>
    <script src="index.js"></script>

</body>
</html>

Output

[Image: the rendered page]

Next is our JavaScript.
We will go through the process step by step so that everyone can follow along. Open the index.js file.
The first step is to store references to the DOM elements used by the front-end in variables.
Then, set a variable synth to window.speechSynthesis, the entry point to the browser's speech service.

// DOM elements
const textForm = document.querySelector("form");
const textInput = document.querySelector("#text-input");
const vioceSelect = document.querySelector("#voice-select");
const rate = document.querySelector("#rate");
const rateValue = document.querySelector("#rate-value");
const pitch = document.querySelector("#pitch");
const pitchValue = document.querySelector("#pitch-value");

// entry point to the browser's speech synthesis service
const synth = window.speechSynthesis;


Following that, we load the voices. getVoices() returns a list of all available voices, each represented by a SpeechSynthesisVoice object. Some browsers load the voices asynchronously, so the list may be empty on the first call; in that case we wait for the voiceschanged event and read the list again.
Then, for each voice, create an option with data- attributes containing the name and language of the voice so that we can easily grab them later on, and append the options as children of the select.

let voices = [];

// fill the select with one option per available voice
const populateVoices = () => {
  voices = synth.getVoices();
  // clear old options so repeated calls do not duplicate entries
  vioceSelect.innerHTML = "";
  // loop through the voices and create an option for each one
  voices.forEach((voice) => {
    // create the option element
    const option = document.createElement("option");
    // fill the option with the voice name and language
    option.textContent = voice.name + " (" + voice.lang + ")";
    // set the attributes we read back later
    option.setAttribute("data-lang", voice.lang);
    option.setAttribute("data-name", voice.name);
    vioceSelect.appendChild(option);
  });
};

// voices may be ready immediately or only after "voiceschanged" fires
populateVoices();
synth.addEventListener("voiceschanged", populateVoices);
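
Because every voice carries its language, the list can also be used to preselect a sensible default. The following is a minimal sketch and not part of the original tutorial. `findVoiceIndex` is a hypothetical helper that returns the index of the first voice whose language matches a preferred tag, either exactly or by primary language such as `en`, and falls back to 0. The sample voices are made-up stand-ins for real `SpeechSynthesisVoice` objects.

```javascript
// Hypothetical helper: pick the index of the voice to preselect.
// `voices` is any array of objects with a `lang` string, such as the
// SpeechSynthesisVoice objects returned by speechSynthesis.getVoices().
function findVoiceIndex(voices, preferredLang) {
  const wanted = preferredLang.toLowerCase();
  // 1. exact match, e.g. "en-GB" and "en-GB"
  const exact = voices.findIndex((v) => v.lang.toLowerCase() === wanted);
  if (exact !== -1) return exact;
  // 2. same primary language, e.g. "en-AU" and "en-US"
  const primary = wanted.split("-")[0];
  const partial = voices.findIndex(
    (v) => v.lang.toLowerCase().split("-")[0] === primary
  );
  // 3. fall back to the first voice in the list
  return partial !== -1 ? partial : 0;
}

// Sample data standing in for real voices (names are illustrative only).
const sampleVoices = [
  { name: "Voice A", lang: "de-DE" },
  { name: "Voice B", lang: "en-US" },
  { name: "Voice C", lang: "en-GB" },
];

console.log(findVoiceIndex(sampleVoices, "en-GB")); // 2
console.log(findVoiceIndex(sampleVoices, "en-AU")); // 1
console.log(findVoiceIndex(sampleVoices, "fr-FR")); // 0
```

In the page, the result could be assigned to `vioceSelect.selectedIndex` once the options have been appended, with `navigator.language` as the preferred tag.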


List of voices generated

[Image: the generated list of voices]
Next, let's create a function called speak. If the text input is not empty, it creates a new SpeechSynthesisUtterance from textInput.value and stores it in speakText.
Then, assign handlers to speakText.onend and speakText.onerror to react when speech finishes or fails.
Next, we read the data-name attribute of the selected option into selectedVoice, look up the matching voice, and set the pitch and rate before calling synth.speak().

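const speak = () => {
  if (textInput.value !== "") {
    // create the speech request from the typed text
    const speakText = new SpeechSynthesisUtterance(textInput.value);
    // runs when the utterance has been spoken
    speakText.onend = (e) => {
      console.log("Done speaking");
    };
    // runs when the utterance could not be spoken
    speakText.onerror = (e) => {
      console.error("Speech error:", e.error);
    };
    // name of the selected voice (there is no option until voices load)
    const selectedOption = vioceSelect.selectedOptions[0];
    const selectedVoice = selectedOption
      ? selectedOption.getAttribute("data-name")
      : null;
    // find the matching voice; keep the browser default if none matches
    const voice = voices.find((v) => v.name === selectedVoice);
    if (voice) {
      speakText.voice = voice;
    }
    // set rate and pitch (slider values are strings)
    speakText.rate = Number(rate.value);
    speakText.pitch = Number(pitch.value);
    synth.speak(speakText);
  }
};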
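
The slider values arrive as strings, and the Web Speech API specification limits rate to the range 0.1 to 10 and pitch to the range 0 to 2. A small guard like the one below keeps out-of-range or non-numeric input from reaching the utterance. This is a sketch rather than part of the original tutorial, and `clampSetting` is a hypothetical helper name.

```javascript
// Hypothetical helper: convert a slider value to a number within [min, max],
// using `fallback` when the input is not a finite number.
function clampSetting(rawValue, min, max, fallback) {
  const value = Number.parseFloat(rawValue);
  if (!Number.isFinite(value)) return fallback;
  return Math.min(max, Math.max(min, value));
}

console.log(clampSetting("1.5", 0.1, 10, 1)); // 1.5
console.log(clampSetting("25", 0.1, 10, 1)); // 10
console.log(clampSetting("-3", 0, 2, 1)); // 0
console.log(clampSetting("fast", 0, 2, 1)); // 1
```

Inside speak(), the rate assignment could then read `speakText.rate = clampSetting(rate.value, 0.1, 10, 1);`, and the pitch assignment would use the bounds 0 and 2.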


Let's add a submit listener to the form so that speak() runs when the form is submitted. Calling e.preventDefault() stops the browser from reloading the page.

textForm.addEventListener("submit", (e) => {
  e.preventDefault();
  speak();
  textInput.blur();
});

Next, add a change listener to the rate and pitch range sliders so that the badge next to each label shows the current value whenever the slider changes. We've already specified the minimum, maximum, and default values for the sliders in the HTML.

rate.addEventListener("change", (e) => {
  rateValue.textContent = rate.value;
});
pitch.addEventListener("change", (e) => {
  pitchValue.textContent = pitch.value;
});
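
A change event on a range input typically fires only once the user releases the slider, while an input event fires as the value moves. If live feedback is preferred, one possible variation is sketched below. `bindSliderLabel` is a hypothetical helper, not part of the original tutorial, and it accepts any event target with a `value` property so that it can be tried without a page.

```javascript
// Hypothetical helper: keep `label.textContent` in sync with `slider.value`.
function bindSliderLabel(slider, label) {
  const update = () => {
    label.textContent = String(slider.value);
  };
  // "input" fires while the slider moves; "change" fires once the value is committed.
  slider.addEventListener("input", update);
  slider.addEventListener("change", update);
  update(); // show the initial value right away
}

// In the tutorial's page the elements already exist, so this would be enough:
if (typeof document !== "undefined") {
  const rateSlider = document.querySelector("#rate");
  const rateLabel = document.querySelector("#rate-value");
  if (rateSlider && rateLabel) bindSliderLabel(rateSlider, rateLabel);
}
```

The same call with `#pitch` and `#pitch-value` would cover the second slider.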

Lastly, let's add a change listener to the voice select so that the text is spoken again with the newly selected voice.

vioceSelect.addEventListener("change", (e) => {
  speak();
});
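
The markup also contains a pause button that the script never wires up. One way to handle it, sketched here under the assumption that the button should toggle between pausing and resuming, is to check the `speaking` and `paused` flags on the synthesizer. `togglePause` is a hypothetical helper name; `pause()`, `resume()`, `speaking`, and `paused` are standard `SpeechSynthesis` members.

```javascript
// Hypothetical helper: pause speech if it is playing, resume it if it is paused.
// Returns a string describing what was done, which is handy for button labels.
function togglePause(synthesizer) {
  // `speaking` stays true while paused, so `paused` is checked first
  if (synthesizer.paused) {
    synthesizer.resume();
    return "resumed";
  }
  if (synthesizer.speaking) {
    synthesizer.pause();
    return "paused";
  }
  return "idle"; // nothing is being spoken
}

// Wiring for the tutorial's page (skipped outside a browser).
if (typeof document !== "undefined" && "speechSynthesis" in window) {
  const pauseButton = document.querySelector("#pause");
  if (pauseButton) {
    pauseButton.addEventListener("click", (e) => {
      e.preventDefault(); // the button sits inside the form, so avoid submitting it
      togglePause(window.speechSynthesis);
    });
  }
}
```

The returned string could be used to switch the button text between "pause" and "resume".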

Conclusion

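We have now completed the tutorial on converting text to speech in the browser with the Web Speech API and the voices your browser provides. If you followed it from beginning to end, you should have a working page that reads typed text aloud with the chosen rate, pitch, and voice.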

The tutorial's repo is available here. You can fork it and modify it to suit your needs.

Top comments (2)

OriCohen05

Pretty nice and simple, thank you.

Opuama Lucky

You're welcome, and thanks too.