kouwei qing

Posted on Dec 25

HarmonyOS Next - OPUS Audio Encoding in Audio and Video in Practice

#harmonyos

Background

When sending short voice messages in a chat scenario, the recorded audio has to be encoded and compressed before it is sent. Initially, the MP3 encoder was used for compression. Later, because the voice messages were also going to be used to train an ASR model, the voice signal had to be encoded with OPUS instead. Previously, Android did not support MP3 or OPUS encoding; HarmonyOS currently supports both. The encoders supported by HarmonyOS are as follows:

Container Specification	Audio Encoding Type
mp4	AAC, Flac
m4a	AAC
flac	Flac
aac	AAC
mp3	MP3
raw	G711mu
amr	AMR
ogg	opus

After encoding to opus through the system API, the resulting audio file could not be played. Inspecting the file showed that the encoder does not mux its output into a container. Checking the system API showed that the system muxer does not support the ogg container either. The currently supported encapsulation capabilities are as follows:

Encapsulation Format	Video Codec Type	Audio Codec Type	Cover Type
mp4	AVC (H.264), HEVC (H.265)	AAC, MPEG (MP3)	jpeg, png, bmp
m4a	-	AAC	jpeg, png, bmp
mp3	-	MPEG (MP3)	-

Since the HarmonyOS system muxer does not support the ogg container, the ogg encapsulation has to be implemented in the application.

Introduction to opus ogg Encapsulation

Ogg organizes and links logical streams in units of pages. Each page has a page header and page data. The page header has the following definitions:

capture_pattern (page identifier): The ASCII characters 0x4f 'O' 0x67 'g' 0x67 'g' 0x53 'S', with a size of 4 bytes. It marks the beginning of a page.
stream_structure_version (version ID): The version of the Ogg format, with a size of 1 byte. It is currently always 0.
header_type_flag (type identifier): A bit field that identifies the type of the current page, with a size of 1 byte.
- 0x01: The first packet on this page continues a packet from the previous page of the same logical stream. If this bit is not set, the page starts with a new packet.
- 0x02: This page is the first page of the logical stream (the bos flag, beginning of stream). If this bit is not set, it is not the first page.
- 0x04: This page is the last page of the logical stream (the eos flag, end of stream). If this bit is not set, it is not the last page.
granule_position: Position information whose meaning is defined by the codec, with a size of 8 bytes. For audio streams, it stores the total number of PCM samples of the logical stream up to the end of the last packet completed on this page, and the timestamp can be calculated from it. For video streams, it stores the number of encoded video frames up to this page. A value of -1 means that no packet ends on this page. (Little-endian)
serial_number: The ID of the logical stream that this page belongs to, with a size of 4 bytes. It distinguishes this logical stream from the other logical streams in the file, so we can use this value to separate the streams. (Little-endian)
page_sequence_number: The sequence number of this page within its logical stream, with a size of 4 bytes. (Little-endian)
CRC_checksum: Cyclic redundancy check code for verifying the validity of the page, with a size of 4 bytes. It is computed over the whole page, header and data, with this field set to 0. (Little-endian)
number_page_segments: The number of entries in the segment_table of this page, with a size of 1 byte.
segment_table: A table of number_page_segments lacing values, one byte each, with a value range of 0 - 255. Each lacing value is the length in bytes of one segment of the page data. A packet is made up of consecutive segments and ends with the first segment whose length is less than 255, so the length of each packet is the sum of the lacing values up to and including that one. For example, if the segment_table is FF 45 FF FF FF 40 FF 05 FF FF FF 66 (12 segments forming 4 packets), the packet lengths are: FF 45【324】; FF FF FF 40【829】; FF 05【260】; FF FF FF 66【867】. The first packet is 255 + 69 = 324 bytes, the second is 255 × 3 + 64 = 829 bytes, and so on.
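The lacing rule for segment_table can be sketched in a few lines of C++. This is only an illustration under stated assumptions: the function name packetLengths and the unfinished output parameter are hypothetical and are not part of any Ogg library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Decode the lacing values of one page's segment_table into packet lengths.
// A packet ends at the first lacing value below 255. If the table ends with
// a lacing value of 255, the last packet continues on the next page; its
// partial length is reported through `unfinished` instead of the result.
std::vector<std::size_t> packetLengths(const std::vector<std::uint8_t>& segmentTable,
                                       std::size_t* unfinished = nullptr) {
    assert(segmentTable.size() <= 255);  // number_page_segments is one byte
    std::vector<std::size_t> lengths;
    std::size_t current = 0;
    bool pending = false;                // true while a packet is still open
    for (std::uint8_t lacing : segmentTable) {
        current += lacing;
        pending = true;
        if (lacing < 255) {              // terminating segment of the packet
            lengths.push_back(current);
            current = 0;
            pending = false;
        }
    }
    if (unfinished != nullptr) {
        *unfinished = pending ? current : 0;
    }
    return lengths;
}
```

For the table FF 45 FF FF FF 40 FF 05 FF FF FF 66 this yields 324, 829, 260 and 867 (the last one being 255 × 3 + 102).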

Basically, the page header is composed of the above parameters. From this, we can obtain the length of the page header and the length of the entire page:

header_size = 27 + number_page_segments (bytes)
page_size = header_size + the sum of all lacing values in the segment_table (bytes)
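The two formulas can be applied directly to the raw bytes of a page. The sketch below is illustrative only; oggPageSize and kFixedHeaderSize are hypothetical names, and the function only needs the header bytes to be present in the buffer, not the page data.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t kFixedHeaderSize = 27;  // bytes before segment_table

// Returns the total size in bytes of the page that starts at `data`, or 0
// if the buffer does not begin with a complete Ogg page header.
std::size_t oggPageSize(const std::uint8_t* data, std::size_t available) {
    assert(data != nullptr);
    if (available < kFixedHeaderSize || std::memcmp(data, "OggS", 4) != 0) {
        return 0;
    }
    const std::size_t segments = data[26];              // number_page_segments
    const std::size_t headerSize = kFixedHeaderSize + segments;
    if (available < headerSize) {
        return 0;                                       // segment_table is truncated
    }
    std::size_t bodySize = 0;
    for (std::size_t i = 0; i < segments; ++i) {
        bodySize += data[kFixedHeaderSize + i];         // one lacing value per segment
    }
    return headerSize + bodySize;
}
```

Since number_page_segments and every lacing value are at most 255, a page can never exceed 27 + 255 + 255 × 255 = 65307 bytes.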

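The format of the page header:

Byte Offset	Field	Size (bytes)
0	capture_pattern	4
4	stream_structure_version	1
5	header_type_flag	1
6	granule_position	8
14	serial_number	4
18	page_sequence_number	4
22	CRC_checksum	4
26	number_page_segments	1
27	segment_table	number_page_segments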
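To make the layout concrete, here is a minimal sketch that serializes one page carrying a single complete packet, in field order, and then patches in the checksum. It is not how libopusenc is implemented; oggCrc, putLE and buildOggPage are hypothetical helper names. The CRC parameters (polynomial 0x04c11db7, initial value 0, no bit reflection, no final XOR) are the ones defined for Ogg.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Ogg page checksum, computed over the whole page with the CRC field zeroed.
std::uint32_t oggCrc(const std::uint8_t* data, std::size_t size) {
    std::uint32_t crc = 0;
    for (std::size_t i = 0; i < size; ++i) {
        crc ^= static_cast<std::uint32_t>(data[i]) << 24;
        for (int bit = 0; bit < 8; ++bit) {
            crc = (crc & 0x80000000u) ? (crc << 1) ^ 0x04c11db7u : (crc << 1);
        }
    }
    return crc;
}

// Append the lowest `bytes` bytes of `value` to `out` in little-endian order.
void putLE(std::vector<std::uint8_t>& out, std::uint64_t value, int bytes) {
    for (int i = 0; i < bytes; ++i) {
        out.push_back(static_cast<std::uint8_t>(value >> (8 * i)));
    }
}

// Build one page that contains exactly one complete packet.
std::vector<std::uint8_t> buildOggPage(const std::vector<std::uint8_t>& packet,
                                       std::uint8_t headerType, std::uint64_t granule,
                                       std::uint32_t serial, std::uint32_t sequence) {
    // At most 255 lacing values fit, and the last one must be below 255.
    assert(packet.size() < 255u * 255u);
    std::vector<std::uint8_t> page = {'O', 'g', 'g', 'S', 0 /* version */, headerType};
    putLE(page, granule, 8);                               // granule_position
    putLE(page, serial, 4);                                // serial_number
    putLE(page, sequence, 4);                              // page_sequence_number
    putLE(page, 0, 4);                                     // CRC placeholder
    const std::size_t segments = packet.size() / 255 + 1;
    page.push_back(static_cast<std::uint8_t>(segments));   // number_page_segments
    page.insert(page.end(), segments - 1, std::uint8_t{255});
    page.push_back(static_cast<std::uint8_t>(packet.size() % 255));
    page.insert(page.end(), packet.begin(), packet.end()); // page data
    const std::uint32_t crc = oggCrc(page.data(), page.size());
    for (int i = 0; i < 4; ++i) {
        page[22 + i] = static_cast<std::uint8_t>(crc >> (8 * i));
    }
    return page;
}
```

A 300-byte packet, for example, produces a 329-byte page whose segment_table is FF 2D (255 + 45).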

Implementing ogg Encapsulation

Xiph provides libopusenc, an open-source implementation of opus ogg encapsulation. libopusenc depends on the libopus library and performs the opus encoding itself. So the simplest way to get an ogg container is to let libopusenc handle both the opus encoding and the container encapsulation, instead of muxing the output of the system encoder.

The processing flow of libopusenc is as follows: create the comments, create the encoder, write the PCM data, then drain and destroy the encoder.

Creating the Encoder

First, create comments:

#include <opusenc.h>  // libopusenc public header

OggOpusComments *comments = ope_comments_create();
ope_comments_add(comments, "ARTIST", "qingkouwei");
ope_comments_add(comments, "TITLE", "qingkouwei-im");

The comments are metadata tags that describe the audio file, and they can be customized. Next, create the encoder:

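// Note: the fifth argument of ope_encoder_create_file is the channel mapping
// family (0 for mono or stereo), so `quality` must hold that value.
OggOpusEnc *pEnc = ope_encoder_create_file(outputFilePath_, comments, inSamplerate, inChannel,
                                           quality, &error);
if (pEnc == nullptr) {
    // Creation failed; ope_strerror(error) returns a description of the cause.
} else {
    int ret = ope_encoder_ctl(pEnc, OPUS_SET_BITRATE(outBitrate));  // OPE_OK on success
}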

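The parameters are the output file path, the comments, the input sampling rate, the number of channels and, as the fifth argument, the channel mapping family defined by libopusenc (0 for mono or stereo); it is not an audio quality setting. The target bitrate is then set through ope_encoder_ctl with OPUS_SET_BITRATE.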

Encoding PCM Data

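static napi_value encodePCMToOpusOggNative(napi_env env, napi_callback_info info)
{
    size_t argc = 1;
    napi_value args[1] = {nullptr};
    napi_get_cb_info(env, info, &argc, args, nullptr, nullptr);

    // args[0] is an ArrayBuffer of interleaved 16-bit PCM passed from the TS layer.
    void *inputBuffer = nullptr;
    size_t inputLength = 0;
    if (pEnc == nullptr || argc < 1 ||
        napi_get_arraybuffer_info(env, args[0], &inputBuffer, &inputLength) != napi_ok) {
        return nullptr;
    }

    // ope_encoder_write expects the number of samples per channel, not the total number of shorts.
    int samplesPerChannel = static_cast<int>(inputLength / (sizeof(opus_int16) * inChannel));
    ope_encoder_write(pEnc, static_cast<const opus_int16 *>(inputBuffer), samplesPerChannel);
    return nullptr;
}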

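Pass the binary PCM data from the TS layer to the ope_encoder_write function. The third argument of ope_encoder_write is the number of samples per channel, so for interleaved 16-bit PCM it is the byte length divided by 2 and by the number of channels. The ope_encoder_write function encodes the data and writes it to the path specified when the encoder was created.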
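The conversion from an ArrayBuffer byte length to the sample count that ope_encoder_write expects can be isolated in a small helper. This is a sketch under the assumption that the TS layer sends interleaved 16-bit PCM; the name samplesPerChannel is hypothetical and not part of libopusenc.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Number of samples per channel in a buffer of interleaved 16-bit PCM.
// Returns -1 if the buffer does not hold a whole number of frames, where a
// frame is one 16-bit sample for every channel.
int samplesPerChannel(std::size_t byteLength, int channels) {
    assert(channels > 0);
    const std::size_t frameBytes = sizeof(std::int16_t) * static_cast<std::size_t>(channels);
    if (byteLength % frameBytes != 0) {
        return -1;  // trailing partial frame: the caller should not encode it as is
    }
    return static_cast<int>(byteLength / frameBytes);
}
```

For example, a 3840-byte stereo buffer holds 960 samples per channel, which is 20 ms of audio at 48 kHz. Dividing the byte length by 2 alone would report 1920 and make the encoder read past the end of the buffer.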

Closing the Encoding Multiplexer

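static napi_value closeOpusOggEncoderNative(napi_env env, napi_callback_info info)
{
    if (pEnc != nullptr) {
        ope_encoder_drain(pEnc);    // finalize the stream: flush buffered audio and write the last page
        ope_encoder_destroy(pEnc);
        pEnc = nullptr;             // avoid using or destroying the encoder twice
    }
    if (comments != nullptr) {
        ope_comments_destroy(comments);
        comments = nullptr;
    }
    return nullptr;
}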

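Release the encoder and the comments objects. ope_encoder_drain must be called before ope_encoder_destroy so that the remaining buffered audio is written and the stream is finalized. The overall process is relatively simple. Finally, the resulting ogg file can be played normally with a common player.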
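Before handing the file to a player, a quick structural check on its first bytes can catch muxing problems early. The sketch below assumes the beginning of the file has already been read into memory. It checks that the data starts with an Ogg page carrying the bos flag and that the page data begins with the "OpusHead" signature that opens every Ogg Opus stream. looksLikeOggOpus is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// True if `data` starts with a first (bos) Ogg page whose data begins with
// the 8-byte "OpusHead" signature of the Opus identification header.
bool looksLikeOggOpus(const std::uint8_t* data, std::size_t size) {
    assert(data != nullptr);
    constexpr std::size_t kFixedHeaderSize = 27;
    if (size < kFixedHeaderSize || std::memcmp(data, "OggS", 4) != 0) {
        return false;                                  // no capture_pattern
    }
    if ((data[5] & 0x02) == 0) {
        return false;                                  // first page must be bos
    }
    const std::size_t bodyOffset = kFixedHeaderSize + data[26];  // skip segment_table
    if (size < bodyOffset + 8) {
        return false;
    }
    return std::memcmp(data + bodyOffset, "OpusHead", 8) == 0;
}
```

Reading the first 64 bytes of the output file is enough for this check, because the identification header sits alone on the first page.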

Summary

This article introduced how to implement OPUS encoding with OGG container encapsulation in HarmonyOS by using libopusenc, meeting the special audio encoding requirements of the business scenario.
