DEV Community


Adjust speaking rate (SSML) via cURL POST

Vidyasagar Machupalli (vidyasagarmsc) · 2 min read

Stack Overflow is an ocean for learning and exploring. How? Try answering a question and you will understand :)

A few weeks ago, I saw this question on Stack Overflow:

How to adjust the speaking rate in Watson Text-to-Speech using cURL POST?

I coded an Android chatbot that uses Speech to Text and Text to Speech as add-ons, but I never bothered about the speaking rate or the Speech Synthesis Markup Language (SSML).

(Illustration courtesy: unDraw)

So, what is this speaking rate and SSML?

According to VirtualSpeech,

Speaking rate is often expressed in words per minute (wpm). To calculate this value, you’ll need to record yourself talking for a few minutes and then add up the number of words in your speech. Divide the total number of words by the number of minutes your speech took.

Speaking rate (wpm) = total words / number of minutes
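To make the formula concrete, here is a small bash sketch (not from VirtualSpeech) that counts the words in a transcript and divides by the duration. The helper name speaking_rate and the sample timing are made up for illustration; the duration is taken in seconds and converted to minutes inside the calculation.

```shell
#!/usr/bin/env bash
# Sketch of the formula: speaking rate (wpm) = total words / number of minutes.
# speaking_rate is a hypothetical helper: $1 = transcript, $2 = duration in seconds.
speaking_rate() {
  local transcript="$1" seconds="$2"
  local words
  # wc -w counts whitespace-separated words in the transcript
  words=$(printf '%s' "$transcript" | wc -w)
  # minutes = seconds / 60, so wpm = words * 60 / seconds; awk handles the decimals
  awk -v w="$words" -v s="$seconds" 'BEGIN { printf "%.1f\n", w * 60 / s }'
}

# 8 words spoken in 3 seconds (made-up timing)
speaking_rate "This is the first sentence of the paragraph." 3   # prints 160.0
```

For a real measurement you would pass the transcript of a few minutes of recorded speech, as the quote above suggests.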

and according to the IBM Cloud docs,

The Speech Synthesis Markup Language (SSML) is an XML-based markup language that provides annotations of text for speech-synthesis applications. It is a recommendation of the W3C Voice-Browser Working Group that has been adopted as the standard markup language for speech synthesis by the VoiceXML 2.0 specification. SSML provides developers of speech applications with a standard way to control aspects of the synthesis process by enabling them to specify pronunciation, volume, pitch, speed, and other attributes via markup.

OK, now I understand the terms.

What's the answer to the Stack Overflow question?

Here's a working example of the POST call, which speeds up the first sentence by 50%:

curl -X POST -u "apikey:{API_KEY}" \
--header "Accept: audio/wav" \
--header "Content-Type: application/json" \
--data '{"text": "<p><s><prosody rate=\"+50%\">This is the first sentence of the paragraph.</prosody></s><s>Here is another sentence.</s><s>Finally, this is the last sentence.</s></p>"}' \
--output result.wav \
"{URL}/v1/synthesize" -v
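The nested quoting in the --data argument is the easiest part to get wrong. Below is a minimal bash sketch, assuming plain sentences with no double quotes, backslashes, or XML special characters (&, <, >). The hypothetical helper ssml_payload wraps the text in a <prosody> element with a single-quoted attribute, so the JSON string needs no escaped quotes. TTS_APIKEY and TTS_URL are assumed variable names standing in for {API_KEY} and {URL}.

```shell
#!/usr/bin/env bash
# ssml_payload is a hypothetical helper, not part of any IBM tooling.
# Assumes $1 contains no double quotes, backslashes, or &, <, > characters.
ssml_payload() {
  local text="$1" rate="$2"
  # Single-quoted XML attribute avoids escaping double quotes inside the JSON string
  printf "{\"text\":\"<prosody rate='%s'>%s</prosody>\"}" "$rate" "$text"
}

payload=$(ssml_payload "This is the first sentence of the paragraph." "+50%")
echo "$payload"

# Only call the service when the (assumed) credential variables are set
if [ -n "${TTS_APIKEY:-}" ] && [ -n "${TTS_URL:-}" ]; then
  curl -X POST -u "apikey:${TTS_APIKEY}" \
    --header "Accept: audio/wav" \
    --header "Content-Type: application/json" \
    --data "$payload" \
    --output result.wav \
    "${TTS_URL}/v1/synthesize"
fi
```

Passing "fast" instead of "+50%" changes only the attribute value; which values the service accepts is covered by the <prosody> limitations in the IBM docs.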

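On the Windows command prompt (cmd), first create a JSON file named input.json with the command below. The prosody attribute value is wrapped in single quotes ('+50%'), which XML allows, so no escaping is needed inside the JSON string: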

echo {"text": "<p><s><prosody rate='+50%'>This is the first sentence of the paragraph.</prosody></s><s>Here is another sentence.</s><s>Finally, this is the last sentence.</s></p>"} > input.json

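and then run cURL, passing the file with --data @input.json, to generate result.wav (cmd uses ^ instead of \ to continue a command on the next line):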

curl -X POST -u "apikey:{API_KEY}" ^
--header "Accept: audio/wav" ^
--header "Content-Type: application/json" ^
--data @input.json ^
--output result.wav ^
"{URL}/v1/synthesize" -v

For the sentence in the actual question, replace the contents of input.json with the following:

{"text":"<prosody rate='fast'>Adult capybaras are one meter long.</prosody>"}
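On Linux or macOS, the same two-step approach (write input.json, then post it) can be sketched in bash with a quoted heredoc. Quoting the delimiter ('EOF') stops the shell from expanding anything inside, so the JSON and SSML are written exactly as they should be sent. TTS_APIKEY and TTS_URL are assumed variable names standing in for {API_KEY} and {URL}.

```shell
#!/usr/bin/env bash
# Write the request body verbatim; the quoted 'EOF' disables expansion inside the heredoc
cat > input.json <<'EOF'
{"text":"<prosody rate='fast'>Adult capybaras are one meter long.</prosody>"}
EOF

# TTS_APIKEY and TTS_URL are assumed names; the request is skipped if either is unset
if [ -n "${TTS_APIKEY:-}" ] && [ -n "${TTS_URL:-}" ]; then
  curl -X POST -u "apikey:${TTS_APIKEY}" \
    --header "Accept: audio/wav" \
    --header "Content-Type: application/json" \
    --data @input.json \
    --output result.wav \
    "${TTS_URL}/v1/synthesize"
fi
```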

Here are some useful links I followed to create the code samples above; they will help you understand the SSML attributes. Also, check the limitations of <prosody> in the links below.
