DEV Community

Jeya Shri

Amazon Polly - She's Holly Molly (AI on AWS series)

In this part of the AI on AWS series, we move beyond text understanding and give applications a voice. Voice interfaces are no longer limited to virtual assistants. They are widely used in learning platforms, accessibility tools, customer support systems, navigation apps, and content platforms.

Amazon Polly makes this possible. It is the AWS service that uses deep learning to convert written text into natural-sounding speech.

What Amazon Polly Is and What Problem It Solves

Amazon Polly is a text-to-speech (TTS) service that converts text into natural-sounding speech. Traditionally, building a speech system meant recording human voices, managing audio files, and handling multiple accents and languages. This was costly, time-consuming, and hard to scale.

Amazon Polly removes these obstacles by offering ready-to-use voices that generate speech on demand. Developers send text to Polly and get an audio stream back.

This enables dynamic speech generation without having to store and manage pre-recorded audio files.

The Magic Behind Amazon Polly

When text is submitted to Amazon Polly, it passes through deep learning models trained on a wide range of human speech samples. These models account for sentence structure, pronunciation, stress, and intonation.

Polly supports both standard and neural voices. Neural voices use more complex models to generate more natural and expressive speech that is closer to human narration.

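The output is returned as an audio stream in formats such as MP3, Ogg Vorbis, or raw PCM. MP3 and Ogg Vorbis can be played directly in most applications. Polly does not return WAV files itself, but its PCM output can be wrapped in a WAV container when one is needed.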
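
Polly's `synthesize_speech` call offers `pcm` rather than WAV as an output format, and that PCM audio is 16-bit, mono, little-endian. If a WAV file is what you need, a rough sketch like the following can add the header using Python's standard `wave` module. The function names `pcm_to_wav_bytes` and `synthesize_wav` are made up for this illustration, and the 16000 Hz rate is simply the value requested in the call.

```python
import io
import wave


def pcm_to_wav_bytes(pcm_bytes: bytes, sample_rate: int = 16000) -> bytes:
    """Wrap raw 16-bit mono PCM samples in a WAV container."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)            # Polly PCM output is mono
        wav_file.setsampwidth(2)            # 16-bit samples are 2 bytes each
        wav_file.setframerate(sample_rate)  # must match the rate requested from Polly
        wav_file.writeframes(pcm_bytes)
    return buffer.getvalue()


def synthesize_wav(text: str, voice_id: str = "Joanna") -> bytes:
    """Request PCM audio from Polly and return WAV bytes (needs AWS credentials)."""
    import boto3  # imported here so the helper above works without boto3

    polly = boto3.client("polly")
    response = polly.synthesize_speech(
        Text=text,
        OutputFormat="pcm",
        SampleRate="16000",
        VoiceId=voice_id,
    )
    return pcm_to_wav_bytes(response["AudioStream"].read(), sample_rate=16000)
```

Calling `synthesize_wav("Hello")` and writing the result to a `.wav` file should give something a regular audio player can open. For most apps, plain MP3 output is simpler.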

Supported Languages, Voices, and Speech Styles

Amazon Polly supports many languages and a wide variety of voices, including male and female voices across regional variants. Some voices also support additional speaking styles, such as conversational, newscaster, or empathetic tones.

This flexibility lets developers select the voice that best fits their application. For example, an educational app might use a soft, clear voice, whereas a news app might use a more authoritative one.
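
To see which voices are available before picking one, you can ask Polly directly. The sketch below uses the boto3 `describe_voices` call and then filters the result locally. The helper names `pick_voices` and `list_voices` are made up for this example, and the sample records are invented stand-ins shaped like the API response, not real voices.

```python
def pick_voices(voices, gender=None, engine=None):
    """Filter voice records (shaped like describe_voices output) by gender and engine."""
    selected = []
    for voice in voices:
        if gender and voice.get("Gender") != gender:
            continue
        if engine and engine not in voice.get("SupportedEngines", []):
            continue
        selected.append(voice["Id"])
    return selected


def list_voices(language_code="en-US"):
    """Fetch the first page of voices for one language (needs AWS credentials)."""
    import boto3  # imported here so pick_voices can be tried without boto3

    polly = boto3.client("polly")
    response = polly.describe_voices(LanguageCode=language_code)
    return response["Voices"]


# Invented records for illustration only; real output comes from list_voices().
sample_voices = [
    {"Id": "VoiceA", "Gender": "Female", "SupportedEngines": ["neural", "standard"]},
    {"Id": "VoiceB", "Gender": "Male", "SupportedEngines": ["standard"]},
]
print(pick_voices(sample_voices, engine="neural"))  # prints ['VoiceA']
```

With real credentials, `pick_voices(list_voices("en-US"), engine="neural")` would return the IDs you can pass as `VoiceId`.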

Real-World Applications of Amazon Polly

Amazon Polly is widely used in e-learning systems to turn learning content into audio lessons. Accessibility tools rely on Polly to read text aloud for visually impaired users. Customer support systems use it in automated call flows to generate spoken responses.

Content creators also use Polly to produce audio versions of blogs, articles, and notifications, making information more accessible and engaging.

Exploring Amazon Polly Using the AWS Console

The AWS Console offers a simple way to experiment with Amazon Polly.

Once you have navigated to the Polly service, you can type or paste text into the console, choose a language and voice, and immediately hear the generated speech. This hands-on approach helps beginners understand how different voices and styles affect the output.

The generated audio can also be downloaded from the console for further testing.

Using Amazon Polly with Python (Example)

The Python example below converts text into an MP3 audio file using Amazon Polly.

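import boto3

# Create a Polly client (uses the AWS credentials and region configured locally)
polly = boto3.client('polly')

# Ask Polly to synthesize the text as MP3 audio
response = polly.synthesize_speech(
    Text="Welcome to the AI on AWS series. This audio was generated using Amazon Polly.",
    OutputFormat='mp3',
    VoiceId='Joanna'
)

# AudioStream is a streaming body; read it fully and write the bytes to disk
with open('speech.mp3', 'wb') as file:
    file.write(response['AudioStream'].read())

print("Audio file generated successfully")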

This code sends the text to Amazon Polly, receives an audio stream, and saves it as an MP3 file. The API is simple and can be integrated into web, mobile, and backend applications.

Speech Synthesis Markup Language (SSML)

Amazon Polly also supports SSML, which gives developers control over pronunciation, pauses, pitch, and speaking rate. SSML can make speech sound more natural and expressive, although the tags available vary by voice engine.

For example, you can insert pauses between sentences, emphasize certain words, or change how specific words are pronounced.

This level of control is valuable for narration, storytelling, and teaching materials.
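
Here is a small sketch of how SSML might be used from Python. It builds an SSML string with a pause between sentences and a slower speaking rate, then passes it to `synthesize_speech` with `TextType="ssml"`. The helper names `build_ssml` and `speak_ssml`, the output file name, and the sample sentences are made up for this example.

```python
from xml.sax.saxutils import escape


def build_ssml(sentences, pause_ms=500, rate="medium"):
    """Join sentences into one SSML document with a pause between each."""
    pause = f'<break time="{pause_ms}ms"/>'
    # Escape &, < and > so plain text cannot break the SSML markup
    body = pause.join(escape(sentence) for sentence in sentences)
    return f'<speak><prosody rate="{rate}">{body}</prosody></speak>'


def speak_ssml(ssml, voice_id="Joanna", path="speech_ssml.mp3"):
    """Send SSML to Polly and save the MP3 (needs AWS credentials)."""
    import boto3  # imported here so build_ssml can be tried without boto3

    polly = boto3.client("polly")
    response = polly.synthesize_speech(
        Text=ssml,
        TextType="ssml",  # tells Polly the input is SSML, not plain text
        OutputFormat="mp3",
        VoiceId=voice_id,
    )
    with open(path, "wb") as audio_file:
        audio_file.write(response["AudioStream"].read())


ssml = build_ssml(["Welcome to the lesson.", "Tom & Jerry is the topic."], rate="slow")
print(ssml)
```

Pronunciation tweaks follow the same pattern: tags such as `<sub alias="...">` can be placed inside the `<speak>` element.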

Streaming vs. File-Based Generation

Amazon Polly supports both streaming audio and file-based generation. Streaming suits real-time applications such as chatbots or voice assistants. File-based generation works better for pre-produced material such as audiobooks or announcements.

This distinction lets developers choose the integration strategy that fits their application's needs.
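
One way to picture the two approaches in code is sketched below. The streaming path reads the `AudioStream` returned by `synthesize_speech` in small chunks, so playback can start before all the audio has arrived. The file-based path uses `start_speech_synthesis_task`, which asks Polly to write the finished audio to an S3 bucket. The helper names and the chunk size are assumptions for this illustration, and the bucket is a placeholder for one you own.

```python
import io


def iter_audio_chunks(audio_stream, chunk_size=4096):
    """Yield audio bytes piece by piece from any object with a read(n) method."""
    while True:
        chunk = audio_stream.read(chunk_size)
        if not chunk:
            break
        yield chunk


def stream_speech(text, voice_id="Joanna"):
    """Streaming path: yield MP3 chunks as they are read (needs AWS credentials)."""
    import boto3  # imported here so the local demo below runs without boto3

    polly = boto3.client("polly")
    response = polly.synthesize_speech(Text=text, OutputFormat="mp3", VoiceId=voice_id)
    yield from iter_audio_chunks(response["AudioStream"])


def start_file_job(text, bucket, voice_id="Joanna"):
    """File-based path: have Polly write the audio to S3 and return the task ID."""
    import boto3

    polly = boto3.client("polly")
    response = polly.start_speech_synthesis_task(
        Text=text,
        OutputFormat="mp3",
        VoiceId=voice_id,
        OutputS3BucketName=bucket,  # placeholder: an S3 bucket you own
    )
    return response["SynthesisTask"]["TaskId"]


# Local demo: an in-memory stream stands in for Polly's AudioStream.
fake_stream = io.BytesIO(b"x" * 10000)
print([len(chunk) for chunk in iter_audio_chunks(fake_stream)])  # prints [4096, 4096, 1808]
```

In a real chatbot, each chunk from `stream_speech` would be handed to the audio player or sent to the client as it arrives.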

Pricing and Cost Factors

Amazon Polly pricing is based on the number of characters converted to speech. Neural voices cost more than standard voices because of their higher quality.

The free tier is enough for beginners exploring the service and for small projects. When scaling applications, monitor character usage to avoid surprise charges.
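
Since billing follows character count, a quick estimate can be made before sending text to Polly. The sketch below is only a rough guide: the function name `estimate_cost` is made up, and the rate is a placeholder, not a real AWS price. Check the Amazon Polly pricing page for current rates for each voice type.

```python
def estimate_cost(texts, price_per_million_chars):
    """Return (total characters, estimated cost) for a batch of texts."""
    total_chars = sum(len(text) for text in texts)
    cost = total_chars / 1_000_000 * price_per_million_chars
    return total_chars, cost


# PLACEHOLDER rate for illustration only; this is not real AWS pricing.
PLACEHOLDER_RATE = 10.0

chars, cost = estimate_cost(["Hello world.", "Welcome back!"], PLACEHOLDER_RATE)
print(f"{chars} characters, estimated cost {cost:.6f}")
```

Logging this number for every request is a simple way to watch usage grow as an application scales.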

When Is Amazon Polly the Right Choice?

Amazon Polly is a strong fit when applications need dynamic, scalable, natural-sounding speech. It is especially useful for accessibility, education, and voice-based user experiences.

If the application needs a custom brand voice or highly personalized speech, additional services or professional voice work might be necessary. For most general use cases, Polly is a capable solution.

Conclusion

Amazon Polly shows how AI can make applications more user-friendly by adding voice features. By abstracting complex speech synthesis behind a simple API, AWS lets developers build voice-enabled systems with little effort.

For those who are just starting to learn, Amazon Polly can be a helpful entry point into AI-based speech synthesis and a meaningful step toward making applications more interactive and inclusive.

Leave your suggestions below! What do you think of Polly?
