DEV Community

Amazon Polly Overview

What is Polly

Polly is the opposite of Transcribe: instead of turning speech into text, it generates audio from text.
It converts text into speech using deep learning.
This lets you build applications that talk.
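
As a rough illustration of turning text into speech, here is a minimal sketch using boto3, the AWS SDK for Python. The voice "Joanna", the MP3 output format, and the helper names are assumptions picked for the example, and the call relies on AWS credentials and a region already being configured.

```python
def build_speech_request(text, voice_id="Joanna", output_format="mp3"):
    """Build the keyword arguments for Polly's SynthesizeSpeech operation.

    The default voice and format are assumptions for this sketch.
    """
    return {"Text": text, "VoiceId": voice_id, "OutputFormat": output_format}


def synthesize_to_file(text, path):
    """Send the text to Polly and write the returned audio to `path`."""
    # Imported here so build_speech_request works without the AWS SDK installed.
    import boto3

    polly = boto3.client("polly")
    response = polly.synthesize_speech(**build_speech_request(text))
    # AudioStream is a streaming body; read() returns the audio bytes.
    with open(path, "wb") as audio_file:
        audio_file.write(response["AudioStream"].read())
```

Calling `synthesize_to_file("Hello from Polly", "hello.mp3")` would then save the spoken sentence as an MP3 file.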

What Amazon Polly can do

Two features are worth knowing: pronunciation lexicons and SSML.

Pronunciation lexicons let you customize how specific words are pronounced.
For example, if a name such as "Ayad" is stylized with a "3" in place of the "A", Polly might read it as "3yad". That is not how it should be pronounced, so you can create a lexicon entry that tells Polly to say "Ayad" instead.

The same works for acronyms: any time Polly sees "AWS", it can say the full "Amazon Web Services" instead of spelling out "A-W-S".
You upload the lexicons and then reference them in the SynthesizeSpeech operation.
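
The lexicon workflow can be sketched as follows. Polly lexicons use the W3C Pronunciation Lexicon Specification (PLS) XML format, where an `alias` replaces a written `grapheme`. The lexicon name "acronyms", the voice, and the helper names are assumptions made for this example; note that a lexicon is only applied when its language matches the language of the voice.

```python
from xml.sax.saxutils import escape

PLS_NAMESPACE = "http://www.w3.org/2005/01/pronunciation-lexicon"


def build_lexicon(aliases, lang="en-US"):
    """Return a PLS document mapping each written form to its spoken alias."""
    lexemes = "".join(
        "  <lexeme>\n"
        f"    <grapheme>{escape(written)}</grapheme>\n"
        f"    <alias>{escape(spoken)}</alias>\n"
        "  </lexeme>\n"
        for written, spoken in aliases.items()
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<lexicon version="1.0" xmlns="{PLS_NAMESPACE}" '
        f'alphabet="ipa" xml:lang="{lang}">\n'
        f"{lexemes}</lexicon>\n"
    )


def build_lexicon_request(text, lexicon_names, voice_id="Joanna"):
    """Build SynthesizeSpeech arguments that apply the named lexicons."""
    return {
        "Text": text,
        "VoiceId": voice_id,
        "OutputFormat": "mp3",
        "LexiconNames": list(lexicon_names),
    }


def upload_and_speak(text, name="acronyms"):
    """Upload the lexicon, then synthesize speech that uses it."""
    import boto3  # needs the AWS SDK plus configured credentials and region

    polly = boto3.client("polly")
    content = build_lexicon({"AWS": "Amazon Web Services", "3yad": "Ayad"})
    polly.put_lexicon(Name=name, Content=content)
    response = polly.synthesize_speech(**build_lexicon_request(text, [name]))
    return response["AudioStream"].read()
```

The lexicon only needs to be uploaded once; later SynthesizeSpeech calls can keep referring to it by name.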

The second feature to know is SSML (Speech Synthesis Markup Language), which gives you finer control over how the speech is produced.
For example, you can emphasize specific words or phrases, use phonetic pronunciation, include breathing sounds or whispering, or use the Newscaster speaking style.

All of this is expressed in the markup, so instead of generating speech from plain text you can mark a passage as whispered and Polly will whisper it, and so on.
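
To make the markup concrete, here is a small sketch that assembles an SSML document with emphasis, a pause, and a whispered passage, then prepares a request with `TextType` set to `ssml`. The helper names and the voice are assumptions for illustration. As far as I know, tag support depends on the voice engine (for instance, the Newscaster style is tied to neural voices while whispering is a standard-voice effect), so check the supported-tags list for the voice you pick.

```python
from xml.sax.saxutils import escape


def emphasize(text, level="strong"):
    """Wrap text in an SSML emphasis tag."""
    return f'<emphasis level="{level}">{escape(text)}</emphasis>'


def whisper(text):
    """Wrap text in Polly's whispered voice effect."""
    return f'<amazon:effect name="whispered">{escape(text)}</amazon:effect>'


def pause(milliseconds):
    """Insert a pause of the given length."""
    return f'<break time="{milliseconds}ms"/>'


def speak(*parts):
    """Join already-built SSML fragments inside the root <speak> element."""
    return "<speak>" + "".join(parts) + "</speak>"


def build_ssml_request(ssml, voice_id="Joanna", engine="standard"):
    """Build SynthesizeSpeech arguments for SSML input instead of plain text."""
    return {
        "Text": ssml,
        "TextType": "ssml",
        "VoiceId": voice_id,
        "Engine": engine,
        "OutputFormat": "mp3",
    }


ssml = speak(
    escape("Polly can "),
    emphasize("stress"),
    escape(" a word,"),
    pause(500),
    whisper("or whisper a secret."),
)
```

Passing `build_ssml_request(ssml)` as keyword arguments to boto3's `synthesize_speech` would return audio in which the last sentence is whispered.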

To summarize: for the pronunciation of stylized words or acronyms, use pronunciation lexicons; for more control over how the text is spoken, such as whispering or phonetic pronunciation, use SSML.
