DEV Community

Thirasha Praween


Create a Tool That Synthesizes Natural-Sounding Speech From Text

You've probably used a text-to-speech tool at least once. In this post, we'll build our own text-to-speech tool in Python that can export the generated speech as an audio file.

We'll use the IBM Watson Text to Speech service, which is powered by machine learning models. IBM Watson helps enterprises put AI to work, helping organizations predict future outcomes, automate complex processes, and optimize employees' time.

Register with IBM Cloud

To get started with Text to Speech, you need an IBM Cloud account. Go to IBM Cloud and create a new free account.

After that, create a Lite plan instance of the service: go to the Text to Speech page and click the Create button to create a free instance.

Next, you'll see the Getting started page. Open the Manage page to find the service credentials: the API key and the URL. Registration is now complete.

Model Credential Page

Usage

First, install the ibm_watson package on your computer:

pip install ibm_watson

If you're using Jupyter Notebook, prefix the command with an exclamation mark so that it runs as a shell command:

!pip install ibm_watson

Authenticate

Import the TextToSpeechV1 service class and the IAMAuthenticator, then authenticate with your API key and point the client at your service URL.

from ibm_watson import TextToSpeechV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

Specify the API Key and URL

# Replace the placeholders with the values from the Manage page
url = '<your-api-url>'
api_key = '<your-api-key>'

# Authenticate with IAM and point the client at your service instance
authenticator = IAMAuthenticator(api_key)
tts = TextToSpeechV1(authenticator=authenticator)
tts.set_service_url(url)
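Hardcoding the API key in a script makes it easy to leak by accident (for example, by committing it to Git). A minimal sketch, assuming you export the credentials as environment variables named TTS_APIKEY and TTS_URL (hypothetical names chosen for this example), reads them with the standard library instead:

```python
import os
from typing import Mapping, Tuple


def load_tts_credentials(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (api_key, url) from environment variables.

    TTS_APIKEY and TTS_URL are hypothetical variable names; use any names you like.
    """
    try:
        return env["TTS_APIKEY"], env["TTS_URL"]
    except KeyError as missing:
        # Fail early with a readable message instead of a bare KeyError
        raise RuntimeError(f"environment variable {missing} is not set") from None
```

The returned pair can then be passed to IAMAuthenticator and set_service_url exactly as in the snippet above.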

Set Up Text to Speech

In this step, we'll synthesize speech from a string and from a text file.

From String

# Synthesize the string and write the returned MP3 bytes to speech.mp3
with open('./speech.mp3', 'wb') as audio_file:
    res = tts.synthesize("Hello World! I'm Thirasha", accept='audio/mp3', voice='en-US_AllisonV3Voice').get_result()
    audio_file.write(res.content)

After a moment, the string is synthesized and saved as speech.mp3 in the current working directory.
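Because the same synthesize-and-write pattern repeats for every output file, it can be wrapped in a small helper. This is a sketch, not part of the ibm_watson API: save_speech and ACCEPT_BY_SUFFIX are hypothetical names, and choosing the accept MIME type from the file extension is an assumption of this example (the listed types appear in IBM's audio-format documentation, but check it for the full set). The helper uses the same synthesize(...).get_result().content chain as the code above.

```python
from pathlib import Path

# Hypothetical mapping from file extension to the `accept` MIME type
ACCEPT_BY_SUFFIX = {
    ".mp3": "audio/mp3",
    ".wav": "audio/wav",
    ".flac": "audio/flac",
    ".ogg": "audio/ogg;codecs=opus",
}


def save_speech(tts, text: str, path: str, voice: str = "en-US_AllisonV3Voice") -> Path:
    """Synthesize `text` with a TextToSpeechV1 client and write the audio to `path`."""
    out = Path(path)
    try:
        accept = ACCEPT_BY_SUFFIX[out.suffix.lower()]
    except KeyError:
        raise ValueError(f"unsupported audio extension: {out.suffix!r}") from None
    # Same call chain as in the article: synthesize(...).get_result().content
    audio = tts.synthesize(text, accept=accept, voice=voice).get_result().content
    out.write_bytes(audio)
    return out
```

For example, save_speech(tts, "Hello World!", "hello.wav") would request WAV output instead of MP3. Passing the client in as a parameter also makes the helper easy to test with a stand-in object that has a synthesize method.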

From Text File

with open('SpeechText.txt', 'r') as f:
    text = f.readlines()

Remove line breaks

# Strip each line's newline and join the lines with spaces so words don't run together
text = ' '.join(line.strip() for line in text)
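If the text file also contains blank lines, indentation, or runs of spaces, a slightly more thorough approach is to read the whole file and collapse every run of whitespace into a single space. A minimal sketch, where read_speech_text is a hypothetical helper name and UTF-8 encoding is assumed:

```python
def read_speech_text(path: str) -> str:
    """Read a text file and collapse all whitespace (newlines, tabs, spaces) to single spaces."""
    with open(path, encoding="utf-8") as f:
        # str.split() with no argument splits on any whitespace run and drops empty pieces
        return " ".join(f.read().split())
```

The result can be passed straight to tts.synthesize in place of the joined text.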

Export audio file

with open('./speech.mp3', 'wb') as audio_file:
    res = tts.synthesize(text, accept='audio/mp3', voice='en-US_AllisonV3Voice').get_result()
    audio_file.write(res.content)
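A long text file may exceed the amount of text the service accepts in a single request. IBM documents a per-request input limit (roughly 5 KB at the time of writing; check the current Text to Speech documentation before relying on that number). A minimal sketch, assuming that limit, splits the text at word boundaries into chunks whose UTF-8 size stays under max_bytes; chunk_text is a hypothetical helper, not part of ibm_watson:

```python
def chunk_text(text: str, max_bytes: int = 5000) -> list:
    """Split text at spaces into chunks of at most max_bytes UTF-8 bytes.

    A single word longer than max_bytes ends up alone in its own (oversized) chunk.
    """
    chunks = []
    current = []
    size = 0
    for word in text.split():
        word_size = len(word.encode("utf-8"))
        # Count the joining space only when the chunk already has words
        extra = word_size + (1 if current else 0)
        if current and size + extra > max_bytes:
            chunks.append(" ".join(current))
            current, size = [word], word_size
        else:
            current.append(word)
            size += extra
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Each chunk can then be synthesized separately, for example into speech_0.mp3, speech_1.mp3, and so on.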

Change Language and Voice (Optional)

To change the voice or language, refer to IBM's Languages and Voices documentation.

For example, to use the German female voice de-DE_BirgitV3Voice, change the code as follows:

with open('./germanspeech.mp3', 'wb') as audio_file:
    res = tts.synthesize('Hallo Welt! Ich bin Thirasha', accept='audio/mp3', voice='de-DE_BirgitV3Voice').get_result()
    audio_file.write(res.content)
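Voice names can also be looked up programmatically. The SDK's tts.list_voices().get_result() returns a JSON result; this sketch assumes it is a dict with a "voices" list whose entries carry "name", "language", and "gender" keys (verify against the API reference). find_voices is a hypothetical helper, and the sample data below is illustrative, not a real service response:

```python
from typing import Optional


def find_voices(voices_result: dict, language: str, gender: Optional[str] = None) -> list:
    """Return sorted voice names matching a language code and, optionally, a gender."""
    return sorted(
        v["name"]
        for v in voices_result.get("voices", [])
        if v.get("language") == language and (gender is None or v.get("gender") == gender)
    )


# Illustrative data shaped like the assumed list_voices() result
sample = {
    "voices": [
        {"name": "en-US_AllisonV3Voice", "language": "en-US", "gender": "female"},
        {"name": "de-DE_BirgitV3Voice", "language": "de-DE", "gender": "female"},
        {"name": "en-US_MichaelV3Voice", "language": "en-US", "gender": "male"},
    ]
}
print(find_voices(sample, "de-DE", "female"))  # ['de-DE_BirgitV3Voice']
```

With a real client, you would pass tts.list_voices().get_result() instead of sample.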

That's it: you've created your own text-to-speech tool! 🎉
