Foreword
This article is based on API 12.
Android developers all know that in Android, text-to-speech playback can be implemented quickly with the system-provided TextToSpeech class. If the system does not ship one, it can also be implemented with third-party SDKs, such as Xunfei's speech synthesis; in short, there are many ways to do it. So how do we synthesize and play speech from a given text in Hongmeng? It is in fact just as simple, because Hongmeng also provides a textToSpeech module.
Implementation steps
Step 1: Create an engine and obtain the text-to-speech instance
Use the system API textToSpeech and call its createEngine method to create an engine. The parameters it receives configure the engine instance, such as the language, mode, timbre, and style.
// Required imports (Core Speech Kit and base services)
import { textToSpeech } from '@kit.CoreSpeechKit';
import { BusinessError } from '@kit.BasicServicesKit';

// Assumes a component member: private ttsEngine?: textToSpeech.TextToSpeechEngine
private createTextToSpeech() {
  let extraParam: Record<string, Object> = { "style": 'interaction-broadcast', "locate": 'CN', "name": 'EngineName' };
  let initParamsInfo: textToSpeech.CreateEngineParams = {
    language: 'zh-CN',
    person: 0,
    online: 1,
    extraParams: extraParam
  };
  textToSpeech.createEngine(initParamsInfo,
    (err: BusinessError, textToSpeechEngine: textToSpeech.TextToSpeechEngine) => {
      if (!err) {
        console.info('Succeeded in creating engine.');
        this.ttsEngine = textToSpeechEngine;
      } else {
        console.error(`Failed to create engine. Code: ${err.code}, message: ${err.message}.`);
      }
    });
}
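If you prefer Promise-style calls, createEngine can also be used without a callback; a minimal sketch, assuming the same initParamsInfo as above:

textToSpeech.createEngine(initParamsInfo)
  .then((engine: textToSpeech.TextToSpeechEngine) => {
    // Keep a reference to the engine for later speak()/stop() calls
    console.info('Succeeded in creating engine.');
    this.ttsEngine = engine;
  })
  .catch((err: BusinessError) => {
    console.error(`Failed to create engine. Code: ${err.code}, message: ${err.message}.`);
  });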
A few notes on the parameters. language is the supported language; currently only Chinese ("zh-CN") is supported. online selects the mode: 0 is online, which is not supported yet; 1 is offline, and currently only offline mode is supported. person is the timbre: 0 is a female voice, currently the only supported timbre. extraParams holds style settings, such as the broadcast style and whether background playback is supported.
So although there are many parameters, most accept only one value at present and are effectively fixed. extraParams also accepts an isBackStage flag for background playback; when it is true, background broadcast is supported, as sketched below.
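For example, a sketch of the engine-creation extraParams with background broadcast enabled (the other fields keep the only values currently supported):

let extraParam: Record<string, Object> = {
  "style": 'interaction-broadcast',
  "locate": 'CN',
  "name": 'EngineName',
  "isBackStage": true  // true: broadcasting can continue in the background
};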
Step 2: Voice playback
To synthesize and play speech, we only need to call the speak method, which takes two parameters. The first is the text to synthesize, which must be Chinese text of no more than 10,000 characters; this is the key constraint. The second holds the parameters for the synthesized broadcast audio, configuring speech rate, volume, pitch, synthesis type, and so on. One of them, requestId, can only be used once within the same instance; setting the same value again will not work. If speak is called multiple times, it is recommended to use a new value each time, such as a timestamp or random number (see the sketch after the code below).
// Call the speak broadcast method
private speak(message: string) {
  let speakListener: textToSpeech.SpeakListener = {
    // Start-of-broadcast callback
    onStart(requestId: string, response: textToSpeech.StartResponse) {
      console.info(`onStart, requestId: ${requestId} response: ${JSON.stringify(response)}`);
    },
    // Broadcast-complete callback
    onComplete(requestId: string, response: textToSpeech.CompleteResponse) {
      console.info(`onComplete, requestId: ${requestId} response: ${JSON.stringify(response)}`);
    },
    // Stop-complete callback, triggered once the stop method has been called and has finished
    onStop(requestId: string, response: textToSpeech.StopResponse) {
      console.info(`onStop, requestId: ${requestId} response: ${JSON.stringify(response)}`);
    },
    // Audio stream callback
    onData(requestId: string, audio: ArrayBuffer, response: textToSpeech.SynthesisResponse) {
      console.info(`onData, requestId: ${requestId} sequence: ${JSON.stringify(response)} audio: ${JSON.stringify(audio)}`);
    },
    // Error callback, triggered when an error occurs during the broadcast
    onError(requestId: string, errorCode: number, errorMessage: string) {
      console.error(`onError, requestId: ${requestId} errorCode: ${errorCode} errorMessage: ${errorMessage}`);
    }
  };
  // Set the callback
  this.ttsEngine?.setListener(speakListener);
  // Set the broadcast-related parameters
  let extraParam: Record<string, Object> = {
    "queueMode": 0,
    "speed": 1,
    "volume": 2,
    "pitch": 1,
    "languageContext": 'zh-CN',
    "audioType": "pcm",
    "soundChannel": 3,
    "playType": 1
  };
  let speakParams: textToSpeech.SpeakParams = {
    requestId: "123456-a", // requestId can only be used once within the same instance; do not set it again
    extraParams: extraParam
  };
  // Call the speak broadcast method
  this.ttsEngine?.speak(message, speakParams);
}
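Since a requestId must not be reused within the same engine instance, one simple option is to generate a fresh ID for every call. A minimal sketch using util.generateRandomUUID from ArkTS (a timestamp would work just as well):

import { util } from '@kit.ArkTS';

// Use a fresh requestId for every speak() call
let speakParams: textToSpeech.SpeakParams = {
  requestId: util.generateRandomUUID(),
  extraParams: extraParam
};
this.ttsEngine?.speak(message, speakParams);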
Stop playback
Call stop() directly:
ttsEngine.stop()
Shut down the engine and release engine resources
ttsEngine.shutdown()
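A natural place to do this is the component's aboutToDisappear lifecycle callback; a minimal sketch, assuming ttsEngine is declared as an optional member of the component:

aboutToDisappear(): void {
  // Stop any ongoing broadcast before releasing the engine
  if (this.ttsEngine?.isBusy()) {
    this.ttsEngine.stop();
  }
  // Release engine resources
  this.ttsEngine?.shutdown();
  this.ttsEngine = undefined;
}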
Broadcast strategies
Different scenarios call for different playback strategies, such as inserting pauses, reading a word as a whole, or reading digits one by one.
Word broadcasting mode
Text format: [hN]. The h is fixed; N can be 0, 1, or 2. 0 lets the engine judge the word playback mode intelligently and is the default; 1 broadcasts letter by letter; 2 broadcasts word by word.
For example:
"hello[h1] world"
hello is pronounced as a whole word; world and the words after it are spelled out letter by letter.
Digital Broadcasting Strategy
The format is the same as above: [nN]. N can be 0, 1, or 2. 0 lets the engine judge the number-processing strategy intelligently and is the default; 1 reads the digits one by one; 2 reads the number as a numeric value. Values longer than 18 digits are not supported and are automatically read digit by digit.
For example:
"[n2]123[n1]456[n0]"
Here, 123 is read as a numeric value, 456 is read digit by digit, and the numbers after that are judged automatically.
Insert a silent pause
The format is [pN], where N is an unsigned integer in milliseconds.
For example:
"你好[p1000]程序员一鸣"
When the statement above is broadcast, a 1000 ms silent pause is inserted after "你好" ("hello").
Specify the pronunciation of Chinese characters
The tone of a Chinese character is specified by appending a digit 1 to 5 to its pinyin, representing the first (yin ping), second (yang ping), third (shang), and fourth (qu) tones and the neutral tone. The format is [=MN], where M is the pinyin and N is the tone: 1 for the first tone, 2 for the second, 3 for the third, 4 for the fourth, and 5 for the neutral tone.
For example:
"[=zhao2]"
the word "zha" will be read as "zha ó".
Related Summary
The text-to-speech capability can only be tested on real devices; the emulator does not support it.