Introducing Amazon Polly: How to Use the Text-to-Speech Service

In today's digital age, businesses are increasingly looking for ways to improve their customer experience and operational efficiency. One way to do this is through AI language services such as Amazon Polly, a text-to-speech service that converts text into lifelike speech. In this blog post, we'll give an overview of the service and show you how to get started with it.

Overview of Amazon Polly

Amazon Polly is a managed AWS service that creates lifelike speech from text. It supports a wide variety of languages and voices, and you can adjust the pronunciation, pitch, and speed of the speech through SSML tags and custom lexicons. This makes it a good fit for businesses that want to create personalized voice messages or voice-overs for videos.
Some of the key features and benefits of Amazon Polly include:

  • High-quality speech synthesis: Amazon Polly uses deep learning models to produce natural-sounding speech.
  • Wide language support: Amazon Polly supports dozens of languages and language variants, including English, Spanish, French, German, and Japanese.
  • Customizable speech: You can adjust the pronunciation, pitch, and speed of the speech with SSML tags and custom lexicons, which helps you shape a voice that represents your brand.
  • Easy integration: Amazon Polly is easy to integrate with your applications and workflows, thanks to its API and SDKs.
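To make the "customizable speech" point concrete, here is a minimal sketch of setting speed and pitch through the SSML prosody tag. The helper names build_ssml and synthesize_ssml are hypothetical (they are not part of boto3), and polly_client is assumed to be a boto3 Polly client like the one created later in this post. As far as I know, the pitch attribute is only honored by standard-engine voices, so the sketch asks for the standard engine explicitly.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text, rate="medium", pitch="medium"):
    """Wrap plain text in SSML that sets the speaking rate and pitch."""
    # escape() protects characters such as & and < that would break the SSML
    return (
        f"<speak><prosody rate={quoteattr(rate)} pitch={quoteattr(pitch)}>"
        f"{escape(text)}</prosody></speak>"
    )


def synthesize_ssml(polly_client, ssml, voice_id="Joanna"):
    """Send SSML (rather than plain text) to Polly and return the raw response."""
    return polly_client.synthesize_speech(
        Text=ssml,
        TextType="ssml",      # tell Polly the input is SSML
        OutputFormat="mp3",
        VoiceId=voice_id,
        Engine="standard",    # assumption: pitch changes need the standard engine
    )
```

For example, build_ssml("Welcome back!", rate="slow", pitch="+10%") produces a slower, slightly higher-pitched reading once it is passed to synthesize_ssml.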

Getting started

In the AWS Console, open IAM and create a user with the AmazonPollyFullAccess permission policy attached.

Once the user is created, open the Security Credentials tab for that user and create an access key. Copy the access key ID and the secret access key.
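If you would rather not paste the keys into the notebook itself, one option is to read them from environment variables. The sketch below is only an illustration: load_aws_settings is a hypothetical helper, and the us-west-2 default simply mirrors the region used in this post. The variable names AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_DEFAULT_REGION are the standard ones that boto3 also reads on its own.

```python
import os


def load_aws_settings(env=None, default_region="us-west-2"):
    """Build keyword arguments for boto3.Session from environment variables."""
    env = os.environ if env is None else env
    required = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "aws_access_key_id": env["AWS_ACCESS_KEY_ID"],
        "aws_secret_access_key": env["AWS_SECRET_ACCESS_KEY"],
        "region_name": env.get("AWS_DEFAULT_REGION", default_region),
    }
```

With the variables set, boto3.Session(**load_aws_settings()) gives the same kind of session as hard-coded keys would. Because boto3 picks these variables up automatically, a plain boto3.Session() should work as well.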

Now, use these keys in a Python notebook:

import boto3
from contextlib import closing
import os
import sys
import subprocess

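# Replace the placeholders with the keys you copied above.
# Avoid committing real keys to source control or sharing them in a notebook.
boto_session = boto3.Session(
    aws_access_key_id='YOUR_ACCESS_KEY',
    aws_secret_access_key='YOUR_SECRET_KEY',
    region_name='us-west-2')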

# Create a new Polly client
polly_client = boto_session.client('polly')

# Set the text that you want to synthesize
text = "Hello, welcome to Amazon Polly!"

# Synthesize speech with Amazon Polly
response = polly_client.synthesize_speech(
    Text=text,
    OutputFormat="mp3",
    VoiceId="Joanna"
)

# Access the audio stream from the response
if "AudioStream" in response:
    with closing(response["AudioStream"]) as stream:
        output = os.path.join(os.getcwd(), "speech.mp3")

        try:
            # Open a file for writing the output as a binary stream
            with open(output, "wb") as file:
                file.write(stream.read())
        except IOError as error:
            # Could not write to file, exit gracefully
            print(error)
            sys.exit(-1)

else:
    # The response didn't contain audio data, exit gracefully
    print("Could not stream audio")
    sys.exit(-1)

# Play the audio using the platform's default player
if sys.platform == "win32":
    os.startfile(output)
else:
    # The following works on macOS and Linux. (Darwin = mac, xdg-open = linux).
    opener = "open" if sys.platform == "darwin" else "xdg-open"
    subprocess.call([opener, output])
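
The example above hard-codes the voice "Joanna". To see which other voices are available for a language, a sketch along these lines may help. list_voices is a hypothetical helper; it relies on the describe_voices call of the Polly client and follows NextToken in case the results are paginated.

```python
def list_voices(polly_client, language_code="en-US"):
    """Return (voice id, supported engines) pairs for one language."""
    voices = []
    kwargs = {"LanguageCode": language_code}
    while True:
        page = polly_client.describe_voices(**kwargs)
        for voice in page.get("Voices", []):
            voices.append((voice["Id"], voice.get("SupportedEngines", [])))
        token = page.get("NextToken")
        if not token:
            return voices
        # Ask for the next page of results
        kwargs["NextToken"] = token
```

Calling list_voices(polly_client, "es-ES"), for instance, would return the Spanish voices, and any of the returned ids can be used as the VoiceId in synthesize_speech.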

Conclusion

Amazon Polly is a powerful tool that can help you generate high-quality speech for your applications. With just a few lines of code, you can synthesize speech from text. It can be used to create voiceovers for videos or to generate audio versions of articles and blog posts.
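
For longer inputs such as a full article, keep in mind that a single synthesize_speech request only accepts a limited amount of text. The sketch below assumes a conservative limit of 3000 characters per request (please check the current Polly quotas) and splits the text at sentence boundaries. split_text and synthesize_long_text are hypothetical helpers, and joining the MP3 segments back to back is an assumption that works in most players rather than a guarantee.

```python
import re


def split_text(text, max_chars=3000):
    """Split text into chunks of at most max_chars, breaking at sentence ends where possible."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-wrap a single sentence that is longer than the limit
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def synthesize_long_text(polly_client, text, voice_id="Joanna"):
    """Synthesize each chunk separately and return the concatenated MP3 bytes."""
    audio = b""
    for chunk in split_text(text):
        response = polly_client.synthesize_speech(
            Text=chunk, OutputFormat="mp3", VoiceId=voice_id
        )
        audio += response["AudioStream"].read()
    return audio
```

The returned bytes can be written to a file in the same way as in the example above.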

Thanks for reading

Thank you very much for reading. I hope you found this article interesting and that it will be useful in the future. If you have any questions or ideas you would like to discuss, it will be a pleasure to collaborate and exchange knowledge.
