Happy New Year, everyone! In this short tutorial, we will build a simple yet useful real-time speech-to-text web app using the Web Speech API. The feature set is straightforward: click a button to start recording, and your speech is converted to text and displayed on the screen in real time. We'll also play with voice commands: saying "stop recording" will halt the recording. Sounds fun? Okay, let's get into it. 😊
Web Speech API Overview
The Web Speech API is a browser technology that enables developers to integrate speech recognition and synthesis capabilities into web applications. It opens up possibilities for creating hands-free and voice-controlled features, enhancing accessibility and user experience.
Some use cases for the Web Speech API include voice commands, voice-driven interfaces, transcription services, and more.
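Before building anything on top of the API, it helps to check what the current browser actually exposes, since speech recognition is not available everywhere and Chrome ships it under a `webkit` prefix. Here is a minimal sketch of such a check. The helper names `getSpeechRecognition` and `detectSpeechSupport` are made up for this example and are not part of the API.

```javascript
// Returns the SpeechRecognition constructor if the environment exposes one,
// or null otherwise. `scope` defaults to the global object (window in browsers).
function getSpeechRecognition(scope = globalThis) {
  return scope.SpeechRecognition || scope.webkitSpeechRecognition || null;
}

// Reports which halves of the Web Speech API are available:
// recognition (speech to text) and synthesis (text to speech).
function detectSpeechSupport(scope = globalThis) {
  return {
    recognition: getSpeechRecognition(scope) !== null,
    synthesis: 'speechSynthesis' in scope,
  };
}

console.log(detectSpeechSupport());
```

Passing the global object in as a parameter is only a convenience: it lets you try the helpers with a fake object outside the browser.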
Let's Get Started
Now, let's dive into building our real-time speech-to-text web app. I'm going to use Vite to scaffold the project, but feel free to use any build tool of your choice, or none at all, for this mini demo project.
- Create a new Vite project:
npm create vite@latest
- Choose "Vanilla" on the next screen and "JavaScript" on the following one. Use arrow keys on your keyboard to navigate up and down.
HTML Structure
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <link rel="icon" type="image/svg+xml" href="/vite.svg" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <script type="module" src="/main.js"></script>
    <title>Real-time Speech to Text App</title>
  </head>
  <body>
    <div class="container">
      <h1>Real-time STT App</h1>
      <div class="btn-wrapper">
        <button id="startBtn" class="btn-start">
          <svg viewBox="0 0 100 100" class="hidden">
            <!-- Outer circle -->
            <circle
              cx="50"
              cy="50"
              r="40"
              stroke="#ccc"
              stroke-width="5"
              fill="none"
            />
            <!-- Inner circle indicating recording -->
            <circle
              cx="50"
              cy="50"
              r="30"
              stroke="#ccc"
              stroke-width="5"
              fill="none"
            >
              <animate
                attributeName="r"
                values="30; 25; 30"
                dur="1.5s"
                repeatCount="indefinite"
              />
            </circle>
            <!-- Record icon in the center -->
            <circle cx="50" cy="50" r="5" fill="#ccc" />
          </svg>
          <span> Start Recording </span>
        </button>
        <button id="stopBtn" class="btn-stop" disabled>Stop Recording</button>
      </div>
      <div id="result" class="result"></div>
    </div>
  </body>
</html>
CSS Styling
:root {
  font-family: Inter, system-ui, Avenir, Helvetica, Arial, sans-serif;
  line-height: 1.5;
  font-weight: 400;
  font-synthesis: none;
  text-rendering: optimizeLegibility;
  -webkit-font-smoothing: antialiased;
  -moz-osx-font-smoothing: grayscale;
}

* {
  margin: 0;
  padding: 0;
  box-sizing: border-box;
}

body {
  background: radial-gradient(
      circle at 100%,
      rgba(3, 6, 21, 0.9) 15%,
      rgba(189, 205, 226, 0.5) 5%,
      rgba(7, 9, 22, 0.9) 15%
    ),
    url('./public/chevron.png') center/cover;
  height: 100vh;
  padding: 40px 0;
}

.container {
  max-width: 1100px;
  margin: 0 auto;
  display: flex;
  flex-direction: column;
  align-items: center;
  padding: 0 15px;
}

h1 {
  color: #fff;
  font-size: 1.5rem;
  text-transform: uppercase;
}

.btn-wrapper {
  margin-top: 20px;
  display: flex;
  flex-wrap: wrap;
  justify-content: center;
  align-items: center;
  gap: 10px;
}

button {
  display: flex;
  align-items: center;
  column-gap: 5px;
  border: none;
  cursor: pointer;
  padding: 12px 24px;
  border-radius: 3px;
  font-weight: 600;
  box-shadow: 0 0 10px rgba(0, 0, 0, 0.3);
  transition: opacity 400ms ease-in-out;
}

button:disabled {
  opacity: 0.47;
  cursor: default;
}

button:hover:not(:disabled) {
  opacity: 0.9;
}

button > svg {
  height: 1rem;
}

.btn-start {
  background-color: #ff2c4f;
  color: #fff;
}

.btn-stop {
  background-color: rgb(7, 2, 44);
  color: #fff;
}

.result {
  background-color: #fff;
  width: 100%;
  min-height: 200px;
  padding: 10px;
  border-radius: 3px;
  margin-top: 20px;
  box-shadow: 0 0 10px rgba(0, 0, 0, 0.3);
  text-transform: capitalize;
}

.result:empty {
  display: none;
}

.hidden {
  display: none !important;
}

@media screen and (min-width: 768px) {
  h1 {
    font-size: 3.125rem;
    text-transform: capitalize;
  }

  .container {
    padding: 0 30px;
  }

  .result {
    padding: 15px;
  }
}
JavaScript Implementation
const resultElement = document.getElementById('result');
const startBtn = document.getElementById('startBtn');
const animatedSvg = startBtn.querySelector('svg');
const stopBtn = document.getElementById('stopBtn');

startBtn.addEventListener('click', startRecording);
stopBtn.addEventListener('click', stopRecording);

// Chrome exposes the constructor with a webkit prefix.
const SpeechRecognition =
  window.SpeechRecognition || window.webkitSpeechRecognition;
let recognition = null;

if (SpeechRecognition) {
  recognition = new SpeechRecognition();
  recognition.continuous = true; // keep listening across pauses
  recognition.interimResults = true; // report partial results while speaking
  recognition.lang = 'en-US';

  recognition.onstart = () => {
    startBtn.disabled = true;
    stopBtn.disabled = false;
    animatedSvg.classList.remove('hidden');
    console.log('Recording started');
  };

  recognition.onresult = (event) => {
    // Rebuild the whole transcript on every event. Starting the loop at
    // event.resultIndex would only cover the results that just changed, so
    // earlier finalized phrases would disappear from the screen.
    let result = '';
    for (let i = 0; i < event.results.length; i++) {
      const { transcript } = event.results[i][0];
      result += event.results[i].isFinal ? transcript + ' ' : transcript;
    }
    resultElement.innerText = result;

    // Voice command: strip the phrase from the output and stop listening.
    if (result.toLowerCase().includes('stop recording')) {
      resultElement.innerText = result.replace(/stop recording/gi, '');
      stopRecording();
    }
  };

  recognition.onerror = (event) => {
    startBtn.disabled = false;
    stopBtn.disabled = true;
    console.error('Speech recognition error:', event.error);
  };

  recognition.onend = () => {
    startBtn.disabled = false;
    stopBtn.disabled = true;
    animatedSvg.classList.add('hidden');
    console.log('Speech recognition ended');
  };
} else {
  // No recognition support: keep the start button from doing anything.
  startBtn.disabled = true;
  console.error('Speech recognition not supported');
}

function startRecording() {
  if (!recognition) return; // unsupported browser, nothing to start
  resultElement.innerText = '';
  recognition.start();
}

function stopRecording() {
  if (recognition) {
    recognition.stop();
  }
}
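The result handler above reads two things from each entry in `event.results`: its `isFinal` flag and the `transcript` of its first alternative. If you'd like to style interim text differently from finalized text, one option is to pull that loop into a small pure function. The sketch below is one way to do it, not the only one. `splitTranscript` is a made-up helper name, and `sampleResults` is a hand-built stand-in for the real result list so the snippet can run outside the browser.

```javascript
// Splits a results list into finalized text and still-changing interim text.
// `results` only needs to look like event.results: indexable, with a length,
// where each entry has an isFinal flag and alternatives at [0], [1], ...
function splitTranscript(results) {
  let finalText = '';
  let interimText = '';
  for (let i = 0; i < results.length; i++) {
    const transcript = results[i][0].transcript;
    if (results[i].isFinal) {
      finalText += transcript.trim() + ' ';
    } else {
      interimText += transcript;
    }
  }
  return { finalText: finalText.trim(), interimText: interimText.trim() };
}

// Hand-built stand-in for event.results, for illustration only.
const sampleResults = [
  Object.assign([{ transcript: 'hello world' }], { isFinal: true }),
  Object.assign([{ transcript: ' how are' }], { isFinal: false }),
];

console.log(splitTranscript(sampleResults));
// → { finalText: 'hello world', interimText: 'how are' }
```

Inside the handler you could then call `splitTranscript(event.results)` and render the two strings in separate elements, for example with the interim part in a lighter color.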
Conclusion
This simple web app uses the Web Speech API to convert spoken words into text in real time. Users can start and stop recording with the provided buttons or by saying "stop recording". Customize the design and functionality further based on your project requirements.
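If you want more voice commands than "stop recording", a lookup table keeps the handler tidy. The following is a rough sketch under a few assumptions: the `clear text` command and the action names are hypothetical, `parseVoiceCommand` is a made-up helper, and the matching assumes English text where lowercasing does not change string length.

```javascript
// Hypothetical command table: spoken phrase -> action name.
// Only "stop recording" comes from the app above; "clear text" is made up.
const COMMANDS = {
  'stop recording': 'stop',
  'clear text': 'clear',
};

// Looks for the first known phrase in the transcript (case-insensitive).
// Returns the matching action and the transcript with that phrase removed.
function parseVoiceCommand(transcript, commands = COMMANDS) {
  const lower = transcript.toLowerCase();
  for (const [phrase, action] of Object.entries(commands)) {
    const index = lower.indexOf(phrase);
    if (index !== -1) {
      const text = (transcript.slice(0, index) + transcript.slice(index + phrase.length))
        .replace(/\s+/g, ' ')
        .trim();
      return { action, text };
    }
  }
  return { action: null, text: transcript.trim() };
}

console.log(parseVoiceCommand('Buy milk stop recording'));
// → { action: 'stop', text: 'Buy milk' }
```

In the result handler you would then switch on the returned `action`, calling `stopRecording()` for `'stop'` and, say, emptying the result element for `'clear'`.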
Final demo: https://stt.nixx.dev
Feel free to explore the complete code on the GitHub repository.
Now, you have a basic understanding of how to create a real-time speech-to-text web app using the Web Speech API. Experiment with additional features and enhancements to make it even more versatile and user-friendly. 😊 🙏
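One easy user-friendliness win is turning the error codes that arrive in the `error` event into readable messages instead of only logging them. The codes below (`no-speech`, `audio-capture`, `not-allowed`, `network`) are defined by the Web Speech API. The message wording and the `describeRecognitionError` helper are my own assumptions, so adapt them to your app.

```javascript
// Maps Web Speech API error codes to user-facing messages.
// The wording is illustrative; only the codes come from the API.
const ERROR_MESSAGES = new Map([
  ['no-speech', 'No speech was detected. Try speaking again.'],
  ['audio-capture', 'No microphone was found.'],
  ['not-allowed', 'Microphone access was blocked. Check the site permissions.'],
  ['network', 'A network error interrupted recognition.'],
]);

// Returns a readable message for a code, with a generic fallback
// for any code that is not in the table.
function describeRecognitionError(code) {
  return ERROR_MESSAGES.get(code) || `Speech recognition error: ${code}`;
}

console.log(describeRecognitionError('not-allowed'));
```

You could call it from the error handler with `event.error` and show the returned string in the page.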