HarmonyOS

Posted on Nov 5 • Edited on Nov 14

Practical Guide: How to Implement Text-to-Speech (TTS) in HarmonyOS

#harmonyos #mobileappdevelopment #codesnippet #typescript

Read the original article: Practical Guide: How to Implement Text-to-Speech (TTS) in HarmonyOS

Preface

Hello everyone, I am Ruocheng. This series aims to help developers quickly implement commonly used features in HarmonyOS apps, providing ready-to-use code examples.

In this article, we will focus on how to implement text-to-speech (TTS) capability in HarmonyOS using the textToSpeech API, enabling your app to convert written text into spoken audio.

Effect Preview

In the demo, clicking the Read Text button reads the text on the page aloud.

Read Text Function Encapsulation

To improve reusability and maintainability, create a new file named textToSpeech.ets in the utils folder to encapsulate text-to-speech-related functions:

Core Utility Class

Below is the complete TextToSpeechManager utility class. It follows the singleton pattern and can be directly copied and used:

import { textToSpeech } from '@kit.CoreSpeechKit';
import { BusinessError } from '@kit.BasicServicesKit';


/**
 * Text-to-speech
 * Use: Core Speech Kit
 * Supports converting up to 10,000 characters of mixed Chinese/English text (Simplified Chinese, Traditional Chinese, numbers, English) into speech with selectable voice styles.
 * Use cases
 * Enables devices such as phones and tablets to perform text-to-speech conversion even offline, supporting playback features (like screen reading) for visually impaired users or situations where reading is inconvenient.
 */
export class TextToSpeechManager{
    private static instance: TextToSpeechManager;

    private constructor() {}

    public static getInstance(): TextToSpeechManager {
        if (!TextToSpeechManager.instance) {
            TextToSpeechManager.instance = new TextToSpeechManager();
        }
        return TextToSpeechManager.instance;
    }

    // Create a TextToSpeechEngine instance
    private ttsEngine: textToSpeech.TextToSpeechEngine|null = null;

    // Set playback parameters
    private extraParam: Record<string, Object>|null = null;

    // Instantiate the SpeakParams object
    private speakParams: textToSpeech.SpeakParams|null = null;

    // SpeakListener object; set the callback information of speak
    private  speakListener: textToSpeech.SpeakListener|null = null;


    /**
     * Create a TextToSpeechEngine instance using the createEngine API.
     * The createEngine API supports two call forms; this article demonstrates the callback form.
     * For the other form, see the API documentation: https://developer.huawei.com/consumer/cn/doc/harmonyos-references/hms-ai-texttospeech
     */
    createEngine(){
        // Set engine creation parameters
        let extraParam: Record<string, Object> = {"style": 'interaction-broadcast', "locate": 'CN', "name": 'EngineName'};
        let initParamsInfo: textToSpeech.CreateEngineParams = {
            language: 'zh-CN',
            person: 0,
            online: 1, // Despite the field name, 1 selects offline mode according to the Core Speech Kit reference
            extraParams: extraParam
        };

        // Call the createEngine method
        textToSpeech.createEngine(initParamsInfo, (err: BusinessError, textToSpeechEngine: textToSpeech.TextToSpeechEngine) => {
            if (!err) {
                console.info('Succeeded in creating engine');
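                // Receive the created engine instance
                this.ttsEngine = textToSpeechEngine;
                // This callback runs asynchronously, so register the listener here, once the engine exists
                this.initParam();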
            } else {
                console.error(`Failed to create engine. Code: ${err.code}, message: ${err.message}.`);
            }
        });
    }

    /**
     * Build the SpeakListener object and register it on the engine.
     * The callbacks report playback start, completion, stop, returned audio data, and errors.
     * This has no effect until the TextToSpeechEngine instance has been created.
     */
    initParam(){
        // Set the speak callback information
        this.speakListener = {
            // Callback for when playback starts
            onStart(requestId: string, response: textToSpeech.StartResponse) {
                console.info(`onStart, requestId: ${requestId} response: ${JSON.stringify(response)}`);
            },
            // Callback for when synthesis and playback are completed
            onComplete(requestId: string, response: textToSpeech.CompleteResponse) {
                console.info(`onComplete, requestId: ${requestId} response: ${JSON.stringify(response)}`);
            },
            // Callback for when playback stops
            onStop(requestId: string, response: textToSpeech.StopResponse) {
                console.info(`onStop, requestId: ${requestId} response: ${JSON.stringify(response)}`);
            },
            // Callback for returning audio data
            onData(requestId: string, audio: ArrayBuffer, response: textToSpeech.SynthesisResponse) {
                // JSON.stringify on an ArrayBuffer prints "{}", so log its size instead
                console.info(`onData, requestId: ${requestId} sequence: ${JSON.stringify(response)} audio byteLength: ${audio.byteLength}`);
            },
            // Error callback
            onError(requestId: string, errorCode: number, errorMessage: string) {
                console.error(`onError, requestId: ${requestId} errorCode: ${errorCode} errorMessage: ${errorMessage}`);
            }
        };

        // Set the callback
        this.ttsEngine?.setListener(this.speakListener);

    }

    /**
     * Call the playback method
     * Developers can actively set the playback policy by modifying speakParams.
     */
    speak(text:string){
        // Set playback parameters
        this.extraParam = {"queueMode": 0, "speed": 1, "volume": 0.1, "pitch": 1, "languageContext": 'zh-CN',
            "audioType": "pcm", "soundChannel": 3, "playType": 1 };

        this.speakParams = {
            // A requestId can be used only once within the same engine instance; do not reuse it.
            requestId: new Date().getTime().toString(),
            extraParams: this.extraParam
        };

        // Call the playback method
        this.ttsEngine?.speak(text, this.speakParams);
    }

    /**
     * Stop playback
     * Call the stop API to terminate speech synthesis and playback.
     */
    stop(){
        // Stop only if the text-to-speech service is currently busy
        if(this.ttsEngine?.isBusy()){
            this.ttsEngine?.stop();
        }
    }
}
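
The speak() method above derives requestId from the current timestamp, which could repeat if two requests are issued within the same millisecond. The following is a minimal sketch of one way to avoid that, assuming the engine accepts any unique string as a requestId. The helper name nextRequestId is hypothetical and is not part of Core Speech Kit.

```typescript
// Hypothetical helper, not part of Core Speech Kit.
// Combines a timestamp with a counter so that two calls made in the
// same millisecond still produce different IDs.
let requestCounter = 0;

function nextRequestId(now: number = Date.now()): string {
    requestCounter += 1;
    return `${now}-${requestCounter}`;
}
```

In the utility class, this would stand in for new Date().getTime().toString() when building speakParams.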

Practical Implementation Steps

Step 1: Import the utility class.

In the page where you need the text-to-speech feature, import the encapsulated class:

import { TextToSpeechManager } from '../utils/textToSpeech';

Step 2: Initialize the TTS engine.

Initialize in the page lifecycle method. When the page opens, create the TextToSpeechEngine instance and initialize related parameters:

    private textToSpeechManager = TextToSpeechManager.getInstance();

    aboutToAppear(): void {
        this.textToSpeechManager.createEngine();
        this.textToSpeechManager.initParam();
    }

Step 3: Implement the text-to-speech feature.

Next, write the business function for text-to-speech. When the user clicks the button, pass the text to be read aloud into this function.

    // Text-to-Speech
    textToSpeech(txt:string){
        this.textToSpeechManager.speak(txt);
    }
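
One caveat for Steps 2 and 3: the engine is delivered through a callback, so it may not exist yet when the user taps the button immediately after the page opens. The sketch below shows a general way to wait for a callback-style result before using it. It deliberately uses a stand-in engine rather than the real Core Speech Kit API, so FakeEngine, fakeCreateEngine, toPromise, and speakWhenReady are all hypothetical names for illustration.

```typescript
// Generic callback shape, similar in spirit to an error-first callback.
type Callback<T> = (err: Error | null, value: T | null) => void;

// Wrap a callback-style starter function in a Promise.
function toPromise<T>(start: (cb: Callback<T>) => void): Promise<T> {
    return new Promise<T>((resolve, reject) => {
        start((err, value) => {
            if (err || value === null) {
                reject(err ?? new Error('no value returned'));
            } else {
                resolve(value);
            }
        });
    });
}

// Stand-in for the real engine; hypothetical, for illustration only.
interface FakeEngine {
    speak(text: string): string;
}

// Stand-in for a callback-style factory that finishes asynchronously.
function fakeCreateEngine(cb: Callback<FakeEngine>): void {
    setTimeout(() => cb(null, { speak: (text) => `spoken: ${text}` }), 0);
}

// Start creation once and keep the pending result.
const engineReady: Promise<FakeEngine> = toPromise(fakeCreateEngine);

// Callers wait for the engine instead of silently doing nothing.
async function speakWhenReady(text: string): Promise<string> {
    const engine = await engineReady;
    return engine.speak(text);
}
```

With this pattern, a request made before the engine is ready is delayed rather than dropped.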

Feature Highlights

Multi-language support: Supports mixed Chinese-English text playback
Offline capability: Works without a network connection
Text limit: Supports up to 10,000 characters per request
Voice options: Multiple voice styles available
Adjustable parameters: Customize speed, volume, and pitch
Status callbacks: Provides complete playback status monitoring
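
To illustrate the adjustable parameters, here is a minimal sketch of a helper that builds the extraParams object used in speak() while keeping speed, volume, and pitch inside a safe range. The ranges in ASSUMED_LIMITS are assumptions and should be checked against the Core Speech Kit reference. The names VoiceSettings, ASSUMED_LIMITS, clamp, and buildSpeakExtraParams are hypothetical, and the defaults of 1 are illustrative (the utility class above uses a volume of 0.1).

```typescript
interface VoiceSettings {
    speed: number;
    volume: number;
    pitch: number;
}

// Assumed [min, max] ranges; verify against the official documentation.
const ASSUMED_LIMITS: Record<keyof VoiceSettings, [number, number]> = {
    speed: [0.5, 2],
    volume: [0, 2],
    pitch: [0.5, 2],
};

// Restrict a value to the given inclusive range.
function clamp(value: number, [min, max]: [number, number]): number {
    return Math.min(max, Math.max(min, value));
}

// Build playback parameters; the remaining keys mirror those in speak().
// The result is assignable to Record<string, Object>.
function buildSpeakExtraParams(
    settings: Partial<VoiceSettings> = {}
): Record<string, number | string> {
    return {
        queueMode: 0,
        speed: clamp(settings.speed ?? 1, ASSUMED_LIMITS.speed),
        volume: clamp(settings.volume ?? 1, ASSUMED_LIMITS.volume),
        pitch: clamp(settings.pitch ?? 1, ASSUMED_LIMITS.pitch),
        languageContext: 'zh-CN',
        audioType: 'pcm',
        soundChannel: 3,
        playType: 1,
    };
}
```

A caller could pass user preferences straight in, for example buildSpeakExtraParams({ speed: 1.5 }), without worrying about out-of-range values.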

Usage Recommendations

Resource management: Call stop() when the page is destroyed to halt playback. Note that stop() only ends the current playback; to release the engine itself, see the shutdown API in the Core Speech Kit reference
Error handling: Handle failures based on the error code and message returned in the onError callback
User experience: For long texts, split into segments to avoid long waiting times
Permission configuration: Ensure the app has the required audio permissions
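
The recommendation to split long texts can be sketched as follows. This is one possible approach rather than an official one: it cuts at the last sentence terminator that fits within the limit and falls back to a hard cut when none is found. The function name splitForSpeech and the list of terminators are assumptions, and length is measured in UTF-16 code units.

```typescript
// Per-request limit stated in this article.
const MAX_CHARS = 10000;

// Sentence terminators to prefer as cut points (an illustrative list).
const TERMINATORS = ['。', '！', '？', '.', '!', '?', '\n'];

function splitForSpeech(text: string, maxChars: number = MAX_CHARS): string[] {
    if (maxChars <= 0) {
        throw new RangeError('maxChars must be positive');
    }
    const segments: string[] = [];
    let rest = text;
    while (rest.length > maxChars) {
        const head = rest.slice(0, maxChars);
        // Find the last terminator inside the allowed window.
        let cut = -1;
        for (const mark of TERMINATORS) {
            cut = Math.max(cut, head.lastIndexOf(mark));
        }
        // Cut just after the terminator, or hard-cut at the limit.
        // A hard cut may split a surrogate pair; this sketch does not guard against that.
        const end = cut >= 0 ? cut + 1 : maxChars;
        segments.push(rest.slice(0, end));
        rest = rest.slice(end);
    }
    if (rest.length > 0) {
        segments.push(rest);
    }
    return segments;
}
```

Each segment would then be passed to speak() in order, with its own requestId.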

Summary

Through this article, we learned how to implement a text-to-speech feature in HarmonyOS apps:

Utility class encapsulation: The TextToSpeechManager class is encapsulated using the singleton pattern for global access
Engine initialization: Create the TTS engine instance with the createEngine() method
Listener configuration: Register the playback callbacks with the initParam() method
Text-to-speech implementation: Set playback parameters and start playback with the speak() method
Playback control: Stop playback with the stop() method

This text-to-speech solution is encapsulated in a single class and easy to integrate, allowing quick adoption in your HarmonyOS projects. It suits news reading, learning assistance, and accessibility features alike. That concludes this guide. Now go and try it out.
