Adjust speaking rate (SSML) via cURL POST

#shortposts #codenewbie #tts #speakingrate

Stack Overflow is an ocean for learning and exploring. How? Try answering a question and you will understand :)

A few weeks ago, I saw this question on Stack Overflow:

How to adjust the speaking rate in Watson Text-to-Speech using cURL POST?

I had coded an Android chatbot that uses Speech to Text and Text to Speech as add-ons, but I had never bothered about the speaking rate or the Speech Synthesis Markup Language (SSML).

So, what exactly are speaking rate and SSML?

According to VirtualSpeech,

Speaking rate is often expressed in words per minute (wpm). To calculate this value, you’ll need to record yourself talking for a few minutes and then add up the number of words in your speech. Divide the total number of words by the number of minutes your speech took.

Speaking rate (wpm) = total words / number of minutes
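To make the formula concrete: a 300-word speech that takes 2 minutes gives 300 / 2 = 150 wpm. The snippet below is a minimal sketch of the same calculation in bash. It assumes you already have a plain-text transcript of your recording; the function name `speaking_rate` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: speaking_rate <transcript-file> <minutes>
# Applies the formula: wpm = total words / number of minutes.
speaking_rate() {
  local file="$1" minutes="$2" words
  # wc -w counts whitespace-separated words; tr strips the padding some wc builds add
  words=$(wc -w < "$file" | tr -d ' ')
  # awk does the division so fractional minutes such as 2.5 work too
  awk -v w="$words" -v m="$minutes" 'BEGIN { printf "%.1f\n", w / m }'
}
```

For example, `speaking_rate transcript.txt 2` on a 300-word transcript prints `150.0`.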

and according to the IBM Cloud docs,

The Speech Synthesis Markup Language (SSML) is an XML-based markup language that provides annotations of text for speech-synthesis applications. It is a recommendation of the W3C Voice-Browser Working Group that has been adopted as the standard markup language for speech synthesis by the VoiceXML 2.0 specification. SSML provides developers of speech applications with a standard way to control aspects of the synthesis process by enabling them to specify pronunciation, volume, pitch, speed, and other attributes via markup.
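In practice, the speed control comes from the `<prosody>` element and its `rate` attribute. The sketch below wraps text in that element. It is an assumption-based helper (the name `prosody_rate` is hypothetical), and it only accepts the SSML rate keywords plus percentages such as `+50%`. That is a deliberately small subset, so check the service docs for everything it supports.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prosody_rate <rate> <text>
# Wraps text in an SSML <prosody> element. Accepts the SSML rate keywords or a
# percentage such as +50%, -20% or 150% (an assumed subset of valid values).
prosody_rate() {
  local rate="$1" text="$2"
  case "$rate" in
    x-slow|slow|medium|fast|x-fast|default) ;;
    *)
      # Anything else must look like an optionally signed whole-number percentage
      if ! printf '%s' "$rate" | grep -Eq '^[+-]?[0-9]+%$'; then
        echo "unsupported rate: $rate" >&2
        return 1
      fi
      ;;
  esac
  printf '<prosody rate="%s">%s</prosody>\n' "$rate" "$text"
}
```

`prosody_rate fast 'Adult capybaras are one meter long.'` prints the SSML fragment used later in this post, with the attribute in double quotes.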

OK, now I understand the terms.

What's the answer to the Stack Overflow question?

Here's a working example of the POST call in a Unix-like shell such as bash:

curl -X POST -u "apikey:{API_KEY}" \
--header "Accept: audio/wav" \
--header "Content-Type: application/json" \
--data '{"text": "<p><s><prosody rate=\"+50%\">This is the first sentence of the paragraph.</prosody></s><s>Here is another sentence.</s><s>Finally, this is the last sentence.</s></p>"}' \
--output result.wav \
"{URL}/v1/synthesize" -v
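Hand-escaping every `"` inside the SSML is easy to get wrong. The following is a minimal sketch, not an official client, that builds the same `{"text": ...}` body by escaping backslashes and double quotes. It then pipes the body to the same cURL call through `--data @-`, which makes curl read the body from stdin. The function names and the `API_KEY`/`URL` environment variables are hypothetical, and the text is assumed to fit on one line.

```bash
#!/usr/bin/env bash
# Hypothetical helper: tts_payload <ssml>
# Builds the {"text": ...} request body, escaping backslashes and double
# quotes so the JSON stays valid (single-line text assumed).
tts_payload() {
  local escaped
  escaped=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '{"text": "%s"}\n' "$escaped"
}

# Hypothetical wrapper around the same POST call; API_KEY and URL are assumed
# to be exported environment variables holding your credentials and endpoint.
synthesize() {
  local ssml="$1" out="${2:-result.wav}"
  # --data @- makes curl read the request body from stdin
  tts_payload "$ssml" | curl -X POST -u "apikey:${API_KEY}" \
    --header "Accept: audio/wav" \
    --header "Content-Type: application/json" \
    --data @- \
    --output "$out" \
    "${URL}/v1/synthesize"
}
```

With those variables set, `synthesize '<prosody rate="+50%">This is the first sentence.</prosody>' result.wav` should produce the same kind of request as the command above.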

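On a Windows command prompt (cmd), create a JSON file named input.json with the command below. The rate value is wrapped in single quotes so the whole SSML string stays inside cmd's double quotes, where characters like < and > are not treated as redirection.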

echo {"text": "<p><s><prosody rate='+50%'>This is the first sentence of the paragraph.</prosody></s><s>Here is another sentence.</s><s>Finally, this is the last sentence.</s></p>"} > input.json

Then run cURL to generate the result.wav file (in cmd, ^ continues a command on the next line):

curl -X POST -u "apikey:{API_KEY}" ^
--header "Accept: audio/wav" ^
--header "Content-Type: application/json" ^
--data @input.json ^
--output result.wav ^
"{URL}/v1/synthesize" -v
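The same file-based approach also works in bash. The sketch below writes an identical input.json using a quoted here-document, so the shell leaves the body untouched. The function name `write_input_json` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write_input_json [file]
# Bash counterpart of the cmd "echo ... > input.json" step. The quoted 'EOF'
# delimiter stops the shell from expanding $, backticks, or \ in the body.
write_input_json() {
  local out="${1:-input.json}"
  cat > "$out" <<'EOF'
{"text": "<p><s><prosody rate='+50%'>This is the first sentence of the paragraph.</prosody></s><s>Here is another sentence.</s><s>Finally, this is the last sentence.</s></p>"}
EOF
}
```

After calling `write_input_json`, the first cURL command works with `--data @input.json` in place of the inline JSON.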

For the sentence in the actual question, replace the JSON above with this:

{"text":"<prosody rate='fast'>Adult capybaras are one meter long.</prosody>"}
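To hear how the rate keywords differ, you could generate one request body per keyword for the same sentence. This is a minimal sketch with a hypothetical function name. It uses the single-quoted `rate='...'` style from the JSON above and assumes the sentence contains no double quotes.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write_rate_variants <sentence> <dir>
# Writes one request body per SSML rate keyword so the resulting audio
# files can be compared by ear. Assumes the sentence has no double quotes.
write_rate_variants() {
  local sentence="$1" dir="$2" rate
  mkdir -p "$dir"
  for rate in x-slow slow medium fast x-fast; do
    # '\'' inserts a literal single quote, matching the rate='fast' style above
    printf '{"text":"<prosody rate='\''%s'\''>%s</prosody>"}\n' \
      "$rate" "$sentence" > "$dir/input-$rate.json"
  done
}
```

Each file can then be posted with `--data @variants/input-fast.json` (and so on), together with a matching `--output` file name.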

Here are some useful links I followed to create the code sample above; they will help you understand the SSML attributes. Also, check the limitations of <prosody> in the links below.

DEV Community
