Thirasha Praween

Posted on Aug 11, 2021 • Edited on Aug 15, 2021

Create a Tool That Synthesizes Natural-Sounding Speech From Text

#python #jupyter #ai #ibmwatson

You've probably used a text-to-speech tool at least once. In this post, we'll create our own text-to-speech tool with an audio export feature using Python.

We'll use the IBM Watson Text to Speech machine learning model. IBM Watson helps enterprises put AI to work: it helps organizations predict future outcomes, automate complex processes, and optimize employees' time.

Register with IBM Cloud

To get started with the Text to Speech model, you have to register with IBM Cloud. Go to IBM Cloud and create a new free account.

After that, you have to create a Lite plan instance of the model. To do that, go to the Text to Speech model page and create a free instance by clicking the Create button.

Afterward, you'll see the getting started page. Go to the Manage page to get the model credentials, which are the API key and the URL. The registration process is now complete.

Usage

First, install the ibm_watson package on your computer.

pip install ibm_watson

If you are using Jupyter Notebook, add an exclamation mark before the command so that it runs as a shell command.

!pip install ibm_watson
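To confirm the install worked before moving on, a quick check like the sketch below can help. It uses only the standard library; the helper name `is_installed` is made up for this example.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # These are the two modules imported later in this post
    for name in ("ibm_watson", "ibm_cloud_sdk_core"):
        status = "found" if is_installed(name) else "missing"
        print(f"{name}: {status}")
```

If either module is reported as missing, re-run the pip command in the same environment that your notebook or script uses.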

Authenticate

Import the TextToSpeechV1 class and the IAM authenticator, then authenticate with your API key and URL.

from ibm_watson import TextToSpeechV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

Specify the API Key and URL

url = '<your-api-url>'
apiKey = '<your-api-key>'

authenticator = IAMAuthenticator(apiKey)
tts = TextToSpeechV1(authenticator=authenticator)
tts.set_service_url(url)
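Hardcoding credentials in a notebook makes them easy to leak when you share it. One possible alternative, sketched below, is to read them from environment variables. The variable names `TTS_APIKEY` and `TTS_URL` and the helper `load_credentials` are assumptions made for this example, not names required by IBM.

```python
import os


def load_credentials(env=None):
    """Read the API key and service URL from environment variables.

    TTS_APIKEY and TTS_URL are hypothetical names chosen for this example.
    """
    env = os.environ if env is None else env
    missing = [name for name in ("TTS_APIKEY", "TTS_URL") if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env["TTS_APIKEY"], env["TTS_URL"]
```

With both variables set, `apiKey, url = load_credentials()` gives you the two values to pass to `IAMAuthenticator` and `set_service_url` exactly as in the code above.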

Set Up Text to Speech

In this step, we'll look at how to synthesize speech from a string and from a text file.

From String

with open('./speech.mp3', 'wb') as audio_file:
    res = tts.synthesize('Hello World! I\'m Thirasha', accept='audio/mp3', voice='en-US_AllisonV3Voice').get_result()
    audio_file.write(res.content)

After a moment, the string is synthesized into audio and saved as speech.mp3 in the current working directory.
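If you plan to synthesize more than one string, the call can be wrapped in a small function. This is only a sketch: `synthesize_to_file` is a made-up helper name, and it assumes `tts` is a client configured as in the Authenticate step. It sends the request before opening the file, so a failed request does not leave an empty MP3 behind.

```python
def synthesize_to_file(tts, text, path, voice="en-US_AllisonV3Voice"):
    """Synthesize text with the given client and write the MP3 bytes to path."""
    if not text.strip():
        raise ValueError("text must not be empty")
    # Request the audio first; the file is only created if this succeeds
    response = tts.synthesize(text, accept="audio/mp3", voice=voice).get_result()
    with open(path, "wb") as audio_file:
        audio_file.write(response.content)
    return path
```

For example, `synthesize_to_file(tts, "Hello World!", "./speech.mp3")` should produce the same kind of file as the snippet above.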

From Text File

with open('SpeechText.txt', 'r') as f:
    text = f.readlines()

Remove line breaks

# Joining with a space keeps words on adjacent lines from running together
text = [line.strip() for line in text]
text = ' '.join(line for line in text if line)

Export audio file

with open('./speech.mp3', 'wb') as audio_file:
    res = tts.synthesize(text, accept='audio/mp3', voice='en-US_AllisonV3Voice').get_result()
    audio_file.write(res.content)
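When removing line breaks, it is worth making sure that words on adjacent lines stay separated. A compact way to read and clean the file in one go might look like the sketch below. The helper name `read_text_for_speech` is made up for this example, and it assumes the file is UTF-8 encoded.

```python
from pathlib import Path


def read_text_for_speech(path):
    """Read a text file and collapse all runs of whitespace into single spaces."""
    raw = Path(path).read_text(encoding="utf-8")
    # str.split() with no arguments splits on any whitespace, including
    # newlines, so joining with a space keeps adjacent lines apart.
    return " ".join(raw.split())
```

The returned string can be passed to `tts.synthesize` in place of `text`.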

Change Language and Voice (Optional)

If you want to change the voice or language, refer to the IBM Languages and Voices documentation.

For example, if you choose the German female voice de-DE_BirgitV3Voice, the code should be changed like this.

with open('./germanspeech.mp3', 'wb') as audio_file:
    res = tts.synthesize('Hallo Welt! Ich bin Thirasha', accept='audio/mp3', voice='de-DE_BirgitV3Voice').get_result()
    audio_file.write(res.content)
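The available voices can also be looked up from code. The sketch below filters a voice list by language. It assumes the payload has the shape `{'voices': [{'name': ..., 'language': ...}, ...]}`, which is what `tts.list_voices().get_result()` is expected to return; the sample data here is illustrative only, and `voices_for_language` is a made-up helper name.

```python
def voices_for_language(payload, language):
    """Return the sorted voice names in payload that match a language code."""
    return sorted(
        voice["name"]
        for voice in payload.get("voices", [])
        if voice.get("language") == language
    )


# Illustrative payload; with a real client it would come from
# tts.list_voices().get_result()
sample = {
    "voices": [
        {"name": "de-DE_BirgitV3Voice", "language": "de-DE", "gender": "female"},
        {"name": "en-US_AllisonV3Voice", "language": "en-US", "gender": "female"},
        {"name": "de-DE_DieterV3Voice", "language": "de-DE", "gender": "male"},
    ]
}
print(voices_for_language(sample, "de-DE"))
# prints ['de-DE_BirgitV3Voice', 'de-DE_DieterV3Voice']
```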

And that's it! You have created your own text-to-speech tool! 🎉
