As technology advances, new problems appear and new solutions are developed for them. Text and voice technologies were created to help people with disabilities, to control devices by voice, and to build translators, speech synthesizers, virtual assistants, and many other applications.
What are the types of voice technologies?
- Speech recognition
- Text-to-speech (speech synthesis)
These technologies rely on Natural Language Processing (NLP), a sub-field of artificial intelligence concerned with processing and manipulating human language at different levels.
Speech recognition
Speech recognition is a sub-field of computer science that develops technologies for recognizing spoken language and translating it into text.
Speech recognition examples
- Siri: the virtual assistant on iOS;
- "Ok Google": searching Google for something using only your voice;
- Automatically generated closed-caption subtitles for people with hearing impairments;
- Hands-free computing: control devices, apps and websites without using your hands.
Text-to-speech
Text-to-speech, also known as speech synthesis, is the reverse of speech recognition: the artificial production of human speech. This means that, given a text input, the computer should output audio that speaks the content of that input.
Text-to-speech examples
- Multimodal interfaces: Stephen Hawking used text-to-speech for many years of his life to communicate.
- App accessibility: describing screen elements through text-to-speech for visually impaired people.
- Translators
Implementing Text-to-Speech in React Native
So, let's begin! For this implementation I will be using Expo, so first check whether it is installed on your machine with the following command:
expo --version
If it returns an error, you can install it by following the official Expo installation guide.
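At the time of writing, the legacy Expo CLI could also be installed globally with npm (this assumes you already have Node.js and npm set up):
npm install --global expo-cli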
After that, let's create a React Native project. I will be using a TypeScript template:
expo init mobile --template expo-template-blank-typescript
Setting up React Navigation
First, let's install the @react-navigation library and its dependencies:
yarn add @react-navigation/native
react-native-screens and react-native-safe-area-context are required for @react-navigation to work:
expo install react-native-screens react-native-safe-area-context
Now we just need to install the bottom-tabs navigator:
yarn add @react-navigation/bottom-tabs
To keep the project organized, let's create a src folder with screens and components subfolders. The structure should look like this:
src
|-----screens
|-----components
App.tsx
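If you prefer the terminal, both subfolders can be created in one step (assuming a Unix-like shell):
mkdir -p src/screens src/components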
Our navigator will be inside the components folder, so create the file TabNav.tsx there.
import React from 'react';
import { createBottomTabNavigator } from '@react-navigation/bottom-tabs';
import { Text } from 'react-native';

const Tab = createBottomTabNavigator();

export default function TabNav() {
  return (
    <Tab.Navigator
      screenOptions={{
        headerShown: false,
        // Hide the default tab icons, since we won't provide any
        tabBarIconStyle: {
          display: 'none',
        },
      }}
    >
      {/* Placeholder screens; we will replace these components later */}
      <Tab.Screen name="TTS" component={() => <Text>TTS SCREEN</Text>} />
      <Tab.Screen name="ASR" component={() => <Text>ASR SCREEN</Text>} />
    </Tab.Navigator>
  );
}
The app will have only two screens, one for Text-to-Speech and the other for Speech Recognition. For now, let's render only a Text component with the screen name.
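One thing to keep in mind: @react-navigation only works when the navigator tree is wrapped in a NavigationContainer. A minimal App.tsx doing that for our TabNav could look like this (the import path assumes the folder structure created above):
import React from 'react';
import { NavigationContainer } from '@react-navigation/native';
import TabNav from './src/components/TabNav';

export default function App() {
  return (
    // NavigationContainer manages the navigation state and must wrap the navigator
    <NavigationContainer>
      <TabNav />
    </NavigationContainer>
  );
}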
Text-to-Speech screen
With the react-navigation configuration done, we can start implementing text-to-speech. Since we are using Expo, we can use Expo's official TTS library, expo-speech.
yarn add expo-speech
This is a very simple library. To use it, let's create a file named text-to-speech.screen.tsx inside the screens folder, where we will put the TTS code.
import React, { useState } from "react";
import { Button, TextInput, View, StyleSheet } from "react-native";
import * as Speech from "expo-speech";

export default function TextToSpeechScreen(): JSX.Element {
  const [iptValue, setIptValue] = useState<string>("");

  function speak(): void {
    const thingToSay = iptValue;
    // The language option sets the voice locale; change it to match your users
    Speech.speak(thingToSay, {
      language: "pt-BR",
    });
  }

  return (
    <View style={styles.container}>
      <TextInput
        style={styles.input}
        placeholder="Type something..."
        value={iptValue}
        onChangeText={(text) => setIptValue(text)}
      />
      <Button title="Listen" onPress={speak} />
    </View>
  );
}

const styles = StyleSheet.create({
  container: {
    flex: 1,
    justifyContent: "center",
    backgroundColor: "#F5FCFF",
    padding: 8,
  },
  input: {
    height: 40,
    borderColor: "gray",
    borderWidth: 1,
    margin: 10,
    padding: 10,
  },
});
It's a very simple screen: just a TextInput and a Button centered with some basic styles. You type something into the input, and when you press the button, the speak function is called.
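Besides language, expo-speech accepts a few other options, and it also exposes Speech.stop() and Speech.isSpeakingAsync(). As a sketch, a hypothetical speakWithOptions variant of the function above could look like this:
async function speakWithOptions(): Promise<void> {
  // If something is already being spoken, stop it before starting over
  if (await Speech.isSpeakingAsync()) {
    await Speech.stop();
  }
  Speech.speak(iptValue, {
    language: "pt-BR",
    pitch: 1.0, // 1.0 is the default voice pitch
    rate: 0.9, // slightly slower than normal speed
    onDone: () => console.log("finished speaking"),
  });
}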
Speech Recognition screen
To use speech recognition with React Native, there is a library called @react-native-voice/voice, so let's add it to our project.
yarn add @react-native-voice/voice
This library was initially designed for react-native-cli projects, so to use it with Expo we need some adaptations. Open your app.json file and add the following content inside the "expo" property:
"plugins": [
[
"@react-native-voice/voice",
{
"microphonePermission": "Allow $(PRODUCT_NAME) to access your microphone",
"speechRecogntionPermission": "Allow $(PRODUCT_NAME) to securely recognize user speech"
}
]
]
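One caveat: since @react-native-voice/voice ships native code, this config plugin only takes effect in a development build; the app will not work inside the standard Expo Go client. Assuming the usual Expo workflow, you can generate the native projects and run the app with:
expo prebuild
expo run:android
(or expo run:ios if you are on a Mac targeting iOS).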
Now, let's create a file named speech-recognition.screen.tsx in the screens folder; this file will hold the Speech Recognition screen. Add the following code to it:
import React, { useState, useEffect } from "react";
import { Button, StyleSheet, Text, View } from "react-native";
import Voice, {
  SpeechResultsEvent,
  SpeechErrorEvent,
} from "@react-native-voice/voice";

export default function SpeechRecognitionScreen() {
  const [results, setResults] = useState<string[]>([]);
  const [isListening, setIsListening] = useState(false);

  useEffect(() => {
    // Called with the recognized transcription candidates
    function onSpeechResults(e: SpeechResultsEvent) {
      setResults(e.value ?? []);
    }
    function onSpeechError(e: SpeechErrorEvent) {
      console.error(e);
    }
    Voice.onSpeechError = onSpeechError;
    Voice.onSpeechResults = onSpeechResults;
    // Release the native recognizer and its listeners when the screen unmounts
    return function cleanup() {
      Voice.destroy().then(Voice.removeAllListeners);
    };
  }, []);

  async function toggleListening() {
    try {
      if (isListening) {
        await Voice.stop();
        setIsListening(false);
      } else {
        setResults([]);
        await Voice.start("en-US"); // locale used for recognition
        setIsListening(true);
      }
    } catch (e) {
      console.error(e);
    }
  }

  return (
    <View style={styles.container}>
      <Text>Press the button and start speaking.</Text>
      <Button
        title={isListening ? "Stop Recognizing" : "Start Recognizing"}
        onPress={toggleListening}
      />
      <Text>Results:</Text>
      {results.map((result, index) => {
        return <Text key={`result-${index}`}>{result}</Text>;
      })}
    </View>
  );
}

const styles = StyleSheet.create({
  container: {
    flex: 1,
    justifyContent: "center",
    alignItems: "center",
    backgroundColor: "#F5FCFF",
  },
});
The code above is straightforward: we import what we need from the @react-native-voice/voice package, register result and error handlers inside a useEffect, and define a toggleListening function that starts or stops recognition when the user presses the button.
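The library can also emit intermediate hypotheses while the user is still speaking. If you want a live transcription effect, here is a sketch of an extra handler you could register inside the same useEffect:
// Partial hypotheses arrive continuously while the user speaks,
// which gives immediate visual feedback
function onSpeechPartialResults(e: SpeechResultsEvent) {
  setResults(e.value ?? []);
}
Voice.onSpeechPartialResults = onSpeechPartialResults;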
Before we test, let's just update our tab navigation file to render the screens we have just created.
First, add the following imports at the top of the file:
import SpeechRecognitionScreen from '../screens/speech-recognition.screen';
import TextToSpeechScreen from '../screens/text-to-speech.screen';
And lastly, replace the two placeholder Tab.Screen lines with the following:
<Tab.Screen name="TTS" component={TextToSpeechScreen} />
<Tab.Screen name="ASR" component={SpeechRecognitionScreen} />
Conclusion
Our app is done. If you did everything correctly, your app should look something like this:
Also, feel free to check the code on GitHub and contribute new features if you want: repository link.
So, that's it! Text-to-Speech and Speech Recognition are very useful technologies that can be combined and integrated into many types of systems, including mobile apps. I know the app isn't the most beautiful, but that's because this article focuses on presenting TTS and ASR rather than styling.