In this article, we’ll build a small speech-to-text application in just a few lines of HTML, JavaScript, and CSS.
Plenty of tools can turn speech into text, but here we’ll build our own on top of the browser's SpeechRecognition API, which is part of the Web Speech API. It is available in Chrome, Edge, and Safari, usually under the webkit prefix. Firefox does not enable it by default, so the app will not work there out of the box.
What is the SpeechRecognition API?
The SpeechRecognition API allows web applications to convert spoken words into text using a browser’s built-in speech recognition service. It supports real-time speech input, language selection, and various events to handle speech detection and results. For more detailed information, visit the MDN Web Docs on SpeechRecognition.
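Because support varies between browsers, it can help to check for the API before using it. The snippet below is a minimal sketch of such a check, not part of the original app; the helper name getSpeechRecognition is hypothetical, and it takes the global object as a parameter so it can also be tried outside a browser.

```javascript
// Hypothetical helper: return the SpeechRecognition constructor exposed by
// the given global object, or null when none is available.
function getSpeechRecognition(globalObject) {
  return (
    globalObject.SpeechRecognition ||
    globalObject.webkitSpeechRecognition ||
    null
  );
}

// Browser-only usage; skipped when there is no window (for example in Node).
if (typeof window !== "undefined") {
  const Recognition = getSpeechRecognition(window);
  if (Recognition === null) {
    console.warn("Speech recognition is not supported in this browser.");
  }
}
```

If the helper returns null, the app could disable the start button instead of throwing when it tries to create the recognition object.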
Let’s get started!
Design the structure of your app with an HTML layout.
First, create an HTML file, which will be the layout (structure) of our app. I’ll keep it simple so you can understand it easily. You can add extra design to make it look nicer if you like.
<!DOCTYPE html>
<html lang="en">
<head>
<title>Voice to Text</title>
</head>
<body>
<button id="startButton">Start Voice Input</button>
<button id="clear">Clear</button>
<select id="language" onchange="onLanguageChange()">
<option value="en-US">English</option>
<option value="hi-IN">Hindi</option>
</select>
<div id="output" contentEditable="true">Hello</div>
</body>
</html>
We’ve made a simple layout that includes:
- A button to start voice input,
- A button to clear the text,
- A dropdown to choose a language,
- A box (div) to show the text output.
We’ve used the contentEditable attribute to make sure we can edit the text inside the box.
Also, each element has an id so we can easily refer to them later in our JavaScript code.
Pretty simple, right? But it doesn’t do anything yet. Let’s make it work using JavaScript!
Bring your app to life with JavaScript.
Add the following JavaScript code to your HTML file inside a <script> tag, just before the closing </body> tag.
// Reference to the elements
const startButton = document.getElementById("startButton");
const outputDiv = document.getElementById("output");
const clearButton = document.getElementById("clear");
// Default recognition language (a BCP 47 tag), matching the first dropdown option
const LANG = "en-US";
// Clear the output box when the clear button is clicked
clearButton.addEventListener("click", () => {
outputDiv.textContent = "";
});
// Create a new SpeechRecognition object.
// Chrome, Edge, and Safari expose the constructor with the webkit prefix.
const recognition = new (window.SpeechRecognition ||
window.webkitSpeechRecognition)();
// Set the language of the recognition
recognition.lang = LANG;
// Event listeners for the recognition
recognition.onresult = (event) => {
const transcript = event.results[0][0].transcript;
outputDiv.textContent += ` ${transcript}`;
};
// Update the button label when recognition starts and ends
recognition.onstart = () => (startButton.textContent = "Listening...");
recognition.onend = () => (startButton.textContent = "Start Voice Input");
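// Start listening when the start button is clicked
startButton.addEventListener("click", () => recognition.start());
// Called from the dropdown's onchange attribute to switch the recognition language
function onLanguageChange() {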
recognition.lang = document.getElementById("language").value;
}
In this code, we’ve added functionality to the buttons and the speech recognition object. Here’s how it works:
- When you click the start button, the recognition object begins listening to your speech.
- Once it hears speech, it converts it to text and adds it to the output box (div).
- The clear button removes all text from the output box.
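The handler above reads only the first result, which suits the default single-phrase mode. If you set recognition.continuous = true (something this article does not do), event.results grows as you keep talking, and each entry carries an isFinal flag. A minimal sketch of collecting the finished phrases, using a hypothetical helper named collectFinalTranscript:

```javascript
// Hypothetical helper: join the top alternative of every final result.
// `results` is any array-like value shaped like event.results.
function collectFinalTranscript(results) {
  return Array.from(results)
    .filter((result) => result.isFinal)
    .map((result) => result[0].transcript.trim())
    .join(" ");
}

// Possible browser usage (assumes recognition.continuous = true):
// recognition.onresult = (event) => {
//   outputDiv.textContent = collectFinalTranscript(event.results);
// };
```

Note that this version replaces the content of the output box with the text of the current session rather than appending to it.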
We’ve also added a dropdown to choose the language. When you change the language, onLanguageChange updates the recognition object to match your selection. For this to work, the dropdown must call the function from its onchange attribute: <select id="language" onchange="onLanguageChange()">.
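If you would rather keep event wiring out of the HTML, the same behavior can be attached from JavaScript. This is a sketch of that alternative, not the approach used in the article; bindLanguageSelect is a hypothetical name, and it accepts any object with a value property and an addEventListener method.

```javascript
// Hypothetical helper: keep recognition.lang in sync with a <select>-like element.
function bindLanguageSelect(select, recognition) {
  const sync = () => {
    recognition.lang = select.value;
  };
  select.addEventListener("change", sync);
  sync(); // apply the currently selected option right away
  return sync;
}

// Possible browser usage:
// bindLanguageSelect(document.getElementById("language"), recognition);
```

Calling sync once up front means the recognition language follows the dropdown even if the browser restores a previously selected option on reload.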
Enhancing Visual Appeal with CSS
Now, let’s make our app look nicer. Add the following <style> block to your HTML file, inside <head> and just before the closing </head> tag.
<style>
* {
font-size: 16px;
}
body {
padding: 8px 16px;
font-family: sans-serif;
min-width: 90vw;
min-height: 90vh;
}
button, select, option {
padding: 8px;
margin: 8px;
width: 200px;
}
#output {
padding: 8px;
margin: 8px;
border: 1px solid #ccc;
border-radius: 8px;
min-height: 100%;
height: 85vh;
}
</style>
That’s it! You’ve created a speech-to-text app with just a few lines of code using HTML, JavaScript, and CSS. You can now run your app in a browser and start turning speech into text.
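When you run the app, recognition can fail, for example if microphone access is denied or nothing is heard. The recognition object reports this through its error event, whose error property holds a short code. Below is a minimal sketch of turning those codes into readable messages; the codes come from the Web Speech API, while the helper name and the message wording are assumptions.

```javascript
// Error codes defined by the Web Speech API, mapped to hypothetical messages.
const ERROR_MESSAGES = {
  "no-speech": "No speech was detected. Try speaking again.",
  "audio-capture": "No microphone was found.",
  "not-allowed": "Microphone access was blocked. Check the site permissions.",
  network: "A network error interrupted recognition.",
  "language-not-supported": "The selected language is not supported.",
};

// Hypothetical helper: describe an error code, with a generic fallback.
function describeRecognitionError(code) {
  return ERROR_MESSAGES[code] || `Speech recognition failed (${code}).`;
}

// Possible browser usage:
// recognition.onerror = (event) => {
//   outputDiv.textContent = describeRecognitionError(event.error);
// };
```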
Here is the complete code for you to check out: GitHub Repository and Live Demo.
Feel free to change the layout and styles to match your preferences. You can also add more features, like saving the text to a file or sending it to a server. There are endless possibilities.
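As one example of such a feature, here is a rough sketch of saving the recognized text to a file. It is an illustration rather than part of the demo: the function names and the default file name transcript.txt are made up, and downloadText relies on the browser's document object.

```javascript
// Hypothetical helper: wrap text in a Blob the browser can offer as a download.
function createTextBlob(text) {
  return new Blob([text], { type: "text/plain;charset=utf-8" });
}

// Browser-only hypothetical helper: download the text through a temporary link.
function downloadText(text, filename = "transcript.txt") {
  const url = URL.createObjectURL(createTextBlob(text));
  const link = document.createElement("a");
  link.href = url;
  link.download = filename;
  document.body.appendChild(link);
  link.click();
  link.remove();
  URL.revokeObjectURL(url); // release the object URL once the click is dispatched
}

// Possible browser usage, assuming a button with id "save" exists:
// document.getElementById("save").addEventListener("click", () => {
//   downloadText(document.getElementById("output").textContent);
// });
```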
I hope this article was helpful! If you have any questions or feedback, leave a comment below. Happy coding!