Introduction
The Web Speech API lets web applications work with voice data. It has two parts: speech synthesis (text-to-speech) and speech recognition (dictation and voice control). Both are driven from JavaScript, so a page can control when speech starts and handle results and alternatives.
In this tutorial, we will create a simple web page that converts typed text to speech using the speech synthesis part of the Web Speech API, which is built into the browser. You can check the browser compatibility for the Web Speech API Here
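Because support varies between browsers, it can help to check for the API before using it. The snippet below is a minimal sketch of such a check; the helper name supportsSpeechSynthesis is an assumption made for illustration and is not part of the Web Speech API.

```javascript
// Hypothetical helper (not part of the Web Speech API): returns true when
// the given global object exposes the speech synthesis interfaces.
function supportsSpeechSynthesis(scope) {
  return (
    Boolean(scope) &&
    "speechSynthesis" in scope &&
    "SpeechSynthesisUtterance" in scope
  );
}

// `window` only exists in a browser, so guard the lookup.
const globalScope = typeof window !== "undefined" ? window : undefined;

if (supportsSpeechSynthesis(globalScope)) {
  console.log("Speech synthesis is available.");
} else {
  console.warn("Speech synthesis is not available in this environment.");
}
```

Running the check once at startup lets the page show a fallback message instead of failing when the user clicks the button.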
Text To Speech
Text-to-speech (TTS) is a type of assistive technology that reads digital text aloud. It's sometimes called “read aloud” technology. TTS can take words on a computer or other digital device and convert them into audio.
Prerequisites
- Basic understanding of HTML, Bootstrap, and JavaScript
- A code editor. I’ll be using Visual Studio Code
- Web browser. I recommend using Google Chrome or Mozilla Firefox.
Important Concepts
Speech synthesis: the artificial production of human speech. It is used to turn written information into audio where that is more convenient, especially in mobile applications such as voice-enabled e-mail and unified messaging.
SpeechSynthesisUtterance: represents a speech request. It contains the content the speech service should read and information about how to read it (e.g. language, pitch, and volume).
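To make the idea of a speech request more concrete, here is a small sketch that gathers the settings for one utterance. The helper utteranceSettings and the sample text are assumptions for illustration. The clamping ranges (pitch 0 to 2, rate 0.1 to 10, volume 0 to 1) follow the ranges described for SpeechSynthesisUtterance. The utterance is only created here, not spoken.

```javascript
// Keep a number inside [min, max].
const clamp = (value, min, max) => Math.min(max, Math.max(min, value));

// Hypothetical helper (not from the tutorial): collect the settings
// for one speech request, clamped to the documented ranges.
function utteranceSettings({ text, lang = "en-US", pitch = 1, rate = 1, volume = 1 }) {
  return {
    text,
    lang,
    pitch: clamp(pitch, 0, 2),
    rate: clamp(rate, 0.1, 10),
    volume: clamp(volume, 0, 1),
  };
}

const settings = utteranceSettings({ text: "Hello there", pitch: 3, volume: 0.5 });

// In a browser, copy the settings onto a real utterance object.
if (typeof SpeechSynthesisUtterance !== "undefined") {
  const utterance = Object.assign(new SpeechSynthesisUtterance(), settings);
  console.log(utterance.text, utterance.lang, utterance.pitch);
}
```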
Building our Front-end
First, create a folder named Text-speech with two new files in it, index.html and index.js. The body of index.html contains a form with a textarea, two range inputs, a select, and two buttons.
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<link href="https://cdn.jsdelivr.net/npm/bootstrap@5.1.3/dist/css/bootstrap.min.css" rel="stylesheet"
integrity="sha384-1BmE4kWBq78iYhFldvKuhfTAU6auU8tT94WrHftjDbrCEXSU1oBoqyl2QvZ6jIW3" crossorigin="anonymous">
<title>type and speak</title>
</head>
<body class=" text-center d-flex justify-content-center ">
<div class="container text-center">
<img src="image/dul.png" class=" mb-5 col-4 opacity-90 ">
<div class="row">
<div class="col-md-6 mx-auto">
<form>
<div class="form-group">
<textarea name="" id="text-input" class="form-control form-control-lg" placeholder="Type Anything Here... "></textarea>
</div>
<div class="form-group">
<label for="rate">Rate</label>
<div id="rate-value" class="badge badge-primary float-right">1</div><br>
<input type="range" id="rate" class="custom-range" min="0.5" max="2" value="1" step="0.1">
</div>
<div class="form-group">
<label for="pitch" >Pitch</label>
<div id="pitch-value" class="badge badge-primary float-right">1</div><br>
<input type="range" id="pitch" class="custom-range" min="0.5" max="2" value="1" step="0.1">
</div>
<div class="form-group">
<select id="voice-select" class="form-control form-control-lg mb-2"></select>
</div>
<button class="btn btn-primary btn-lg btn-block" id="speak">speak it</button>
<button class="btn btn-warning btn-lg btn-block" id="pause">pause</button>
</form>
</div>
</div>
</div>
<script src="https://cdn.jsdelivr.net/npm/bootstrap@5.1.3/dist/js/bootstrap.bundle.min.js"
integrity="sha384-ka7Sk0Gln4gmtz2MlQnikT1wXgYsOg+OMhuP+IlRH9sENBO0LRn5q+8nbTov4+1p" crossorigin="anonymous"></script>
<script src="index.js"></script>
</body>
</html>
Output
Next is our JavaScript.
We will go through the process step by step so that everyone understands. Open the index.js file.
The first step is to declare variables for all the DOM elements used in the front-end.
Then, store the speech synthesis controller in a variable: synth = window.speechSynthesis.
//Dom Element
let textForm = document.querySelector("form");
let textInput = document.querySelector("#text-input");
let vioceSelect = document.querySelector("#voice-select");
let rate = document.querySelector("#rate");
let rateValue = document.querySelector("#rate-value");
let pitch = document.querySelector("#pitch");
let pitchValue = document.querySelector("#pitch-value");
let synth = window.speechSynthesis;
Following that, we load the voices. synth.getVoices() returns a list of SpeechSynthesisVoice objects, one for each voice available on the current device. Some browsers load this list asynchronously, so it can be empty at first. In that case, we wait for the voiceschanged event before reading it.
Then, for each voice, create an option element with data- attributes containing the name and language of the voice so that we can easily grab them later on, and append the options as children of the select.
let voices = synth.getVoices();
// fill the select with one option per available voice
const populateVoices = () => {
  voices = synth.getVoices();
  // clear old options so repeated events do not duplicate them
  vioceSelect.innerHTML = "";
  voices.forEach((voice) => {
    // create an option element
    let option = document.createElement("option");
    // fill the option with the voice name and language
    option.textContent = voice.name + " (" + voice.lang + ")";
    // set the attributes we read back later
    option.setAttribute("data-lang", voice.lang);
    option.setAttribute("data-name", voice.name);
    vioceSelect.appendChild(option);
  });
};
if (voices.length !== 0) {
  // voices are already loaded
  populateVoices();
} else {
  // voices load asynchronously, so wait for them
  synth.addEventListener("voiceschanged", populateVoices);
}
List of voices generated
Next, let's create a function called speak. If the text input is not empty, it creates a new SpeechSynthesisUtterance from textInput.value and stores it in speakText.
Then, attach handlers to speakText for the end and error events so that we can react when speech finishes or fails.
Next, we read the data-name attribute of the selected option into selectedVoice, loop through the voices to find the matching one, and then set the pitch and rate.
let speak = () => {
  if (textInput.value !== "") {
    // create the utterance from the typed text
    const speakText = new SpeechSynthesisUtterance(textInput.value);
    // runs when speaking has finished
    speakText.onend = (e) => {
      console.log("Done speaking");
    };
    // runs when speaking fails
    speakText.onerror = (e) => {
      console.error("Speech error:", e.error);
    };
    // name of the voice chosen in the select
    const selectedVoice =
      vioceSelect.selectedOptions[0].getAttribute("data-name");
    // loop through the voices to find the chosen one
    voices.forEach((voice) => {
      if (voice.name === selectedVoice) {
        speakText.voice = voice;
      }
    });
    // set rate and pitch (slider values are strings)
    speakText.rate = Number(rate.value);
    speakText.pitch = Number(pitch.value);
    synth.speak(speakText);
  }
};
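The voice lookup inside speak could also be written with Array.prototype.find. The sketch below shows that variant on plain objects. The function findVoiceByName and the sample voice names are hypothetical. Unlike the forEach loop, which keeps the last matching voice, find returns the first match.

```javascript
// Hypothetical helper: return the first voice whose name matches, or null.
function findVoiceByName(voices, name) {
  return voices.find((voice) => voice.name === name) || null;
}

// Made-up sample data shaped like SpeechSynthesisVoice objects.
const sampleVoices = [
  { name: "Voice A", lang: "en-US" },
  { name: "Voice B", lang: "fr-FR" },
];

const match = findVoiceByName(sampleVoices, "Voice B");
console.log(match ? match.lang : "no match");
```

Returning null for an unknown name makes it easy to skip setting the voice and fall back to the browser default.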
Let's add a submit listener to the form so that speak runs when the form is submitted. Calling e.preventDefault() stops the browser from reloading the page.
textForm.addEventListener("submit", (e) => {
  e.preventDefault();
  speak();
  textInput.blur();
});
Next, add a 'change' listener to the rate and pitch range sliders so that the badge next to each slider shows its current value. We've already specified the minimum, maximum, and default values for each slider in the HTML tag.
rate.addEventListener("change", (e) => {
  rateValue.textContent = rate.value;
});
pitch.addEventListener("change", (e) => {
  pitchValue.textContent = pitch.value;
});
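The change event on a range input fires once the user releases the slider. If you would like the badge to update while the slider is being dragged, the input event is an alternative. The sketch below assumes a hypothetical helper named bindSliderLabel and reuses the element ids from the HTML above.

```javascript
// Hypothetical helper: mirror a slider's value into a label, live.
function bindSliderLabel(slider, label) {
  const update = () => {
    label.textContent = slider.value;
  };
  // "input" fires on every movement, "change" only on release
  slider.addEventListener("input", update);
  // show the starting value right away
  update();
}

// Only touch the DOM when running in a browser.
if (typeof document !== "undefined") {
  const pairs = [
    ["#rate", "#rate-value"],
    ["#pitch", "#pitch-value"],
  ];
  pairs.forEach(([sliderId, labelId]) => {
    const slider = document.querySelector(sliderId);
    const label = document.querySelector(labelId);
    if (slider && label) {
      bindSliderLabel(slider, label);
    }
  });
}
```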
Lastly, let's add a change listener to the voice select so that speak runs whenever a different voice is chosen.
vioceSelect.addEventListener("change", (e) => {
  speak();
});
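The HTML also contains a pause button, which the steps above do not wire up. One possible way to handle it is sketched below, using the pause(), resume(), paused, and speaking members of speechSynthesis. The function togglePause is a hypothetical name. Since the button sits inside the form, the click handler calls preventDefault() so that it does not submit the form.

```javascript
// Hypothetical helper: pause if speaking, resume if already paused.
// Returns a short string describing what was done.
function togglePause(synth) {
  if (synth.paused) {
    synth.resume();
    return "resumed";
  }
  if (synth.speaking) {
    synth.pause();
    return "paused";
  }
  return "idle";
}

// Only touch the DOM when running in a browser that supports the API.
if (typeof document !== "undefined" && "speechSynthesis" in window) {
  const pauseButton = document.querySelector("#pause");
  if (pauseButton) {
    pauseButton.addEventListener("click", (e) => {
      // the button is inside the form, so stop it from submitting
      e.preventDefault();
      togglePause(window.speechSynthesis);
    });
  }
}
```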
Conclusion
We have now completed the tutorial on converting text to speech using the Web Speech API. If you follow this tutorial from beginning to end, you should be able to get it right.
The tutorial's repo is available here. You can fork it and modify it to suit your needs.