DEV Community

kouwei qing


HarmonyOS Next - OPUS Audio Encoding in Audio and Video in Practice

Background

To send short voice messages in chat, the recorded audio has to be encoded and compressed. We first used an MP3 encoder. Later, the voice messages were also going to be used to train an ASR model, so the speech had to be encoded with Opus instead. On Android, the system previously offered no MP3 or Opus encoding. HarmonyOS now supports both. The audio encoders supported by HarmonyOS, grouped by container format, are:

| Container format | Audio encoding type |
|------------------|---------------------|
| mp4              | AAC, FLAC           |
| m4a              | AAC                 |
| flac             | FLAC                |
| aac              | AAC                 |
| mp3              | MP3                 |
| raw              | G711mu              |
| amr              | AMR                 |
| ogg              | Opus                |

We encoded Opus through the system API and found that the output file could not be played. Inspecting the file showed why: the encoder emits raw Opus packets and does not multiplex them into a container. The system API documentation also showed that the system muxer does not support the Ogg container. Its supported encapsulation formats are:

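| Encapsulation format | Video codec               | Audio codec     | Cover image type |
|----------------------|---------------------------|-----------------|------------------|
| mp4                  | AVC (H.264), HEVC (H.265) | AAC, MPEG (MP3) | jpeg, png, bmp   |
| m4a                  | -                         | AAC             | jpeg, png, bmp   |
| mp3                  | -                         | MPEG (MP3)      | -                |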

HarmonyOS provides no Ogg muxer, so we have to implement Ogg encapsulation ourselves.

Introduction to Opus Ogg Encapsulation

Ogg organizes and multiplexes logical streams in units of pages. Each page consists of a page header followed by page data. The page header contains the following fields:

  1. capture_pattern (page identifier): 4 bytes. The ASCII characters 0x4f 'O', 0x67 'g', 0x67 'g', 0x53 'S'. It marks the beginning of a page.
  2. stream_structure_version (version): 1 byte. Always 0 in the current version of the format.
  3. header_type_flag (type flags): 1 byte. Identifies the type of the current page:
    • 0x01: continued packet. The first packet data on this page continues a packet from the previous page of the same logical stream. If this bit is not set, the page starts with a new packet.
    • 0x02: beginning of stream (bos). This page is the first page of the logical stream. If this bit is not set, it is not the first page.
    • 0x04: end of stream (eos). This page is the last page of the logical stream. If this bit is not set, it is not the last page.
  4. granule_position: 8 bytes, little-endian. Codec-specific position information.
    • Audio streams: the total number of PCM samples (per channel) output by the logical stream up to this page. A timestamp can be calculated from this value. For Opus, the count is always in 48 kHz units.
    • Video streams: the number of encoded video frames up to this page.
    • A value of -1 means that no packet finishes on this page. In that case the page holds only part of a packet.
  5. serial_number: 4 bytes, little-endian. Identifies the logical stream that this page belongs to and distinguishes it from other logical streams. Pages can be demultiplexed by this value.
  6. page_sequence_number: 4 bytes, little-endian. The sequence number of this page within its logical stream. It increases by one per page, which lets a reader detect lost pages.
  7. CRC_checksum: 4 bytes, little-endian. A CRC-32 over the entire page (header and data) that verifies the page's integrity.
    • It is computed with this field set to zero.
    • It uses generator polynomial 0x04c11db7, initial value 0, and no bit reflection or final XOR.
  8. number_page_segments: 1 byte. The number of entries (lacing values) in this page's segment_table.
  9. segment_table: number_page_segments bytes. Each byte is a lacing value from 0 to 255 that gives the length of one segment.
    • A packet's length is the sum of its consecutive lacing values. A packet ends at the first lacing value less than 255.
    • A packet whose length is an exact multiple of 255 ends with a 0.
    • If the last lacing value on a page is 255, the packet continues on the next page (see flag 0x01).
    • For example, the 12 lacing values FF 45 FF FF FF 40 FF 05 FF FF FF 66 describe 4 packets:
      - FF 45 gives 255 + 69 = 324 bytes.
      - FF FF FF 40 gives 3 × 255 + 64 = 829 bytes.
      - FF 05 gives 255 + 5 = 260 bytes.
      - FF FF FF 66 gives 3 × 255 + 102 = 867 bytes.
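The lacing rule is easy to get wrong, especially the terminating zero. The minimal C++ sketch below implements both directions: splitting a segment table into packet lengths, and producing the lacing values for one packet. The names packetLengths and lacingValues are hypothetical helpers for illustration. They are not part of libogg or libopusenc.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Split an Ogg segment table into the lengths of the packets that end on this page.
// A packet ends at the first lacing value below 255; a trailing run of 255s means
// the last packet continues on the next page, so it is not returned here.
std::vector<std::size_t> packetLengths(const std::vector<std::uint8_t> &segmentTable) {
    std::vector<std::size_t> lengths;
    std::size_t current = 0;
    for (std::uint8_t lacing : segmentTable) {
        current += lacing;
        if (lacing < 255) {  // this segment terminates the packet
            lengths.push_back(current);
            current = 0;
        }
    }
    return lengths;
}

// Build the lacing values for one complete packet of the given length.
// An exact multiple of 255 gets a terminating 0 so readers can see where it ends.
std::vector<std::uint8_t> lacingValues(std::size_t packetLength) {
    std::vector<std::uint8_t> lacing(packetLength / 255, 255);
    lacing.push_back(static_cast<std::uint8_t>(packetLength % 255));
    return lacing;
}
```

Given the example table FF 45 FF FF FF 40 FF 05 FF FF FF 66, packetLengths returns 324, 829, 260 and 867. lacingValues(510) yields 255 255 0, which shows the terminating zero.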

These fields make up the page header. From them we can compute the size of the header and of the whole page:

header_size = 27 + number_page_segments (bytes)
page_size = header_size + sum of all lacing values in segment_table (bytes)
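The field layout and size formulas become concrete in a small page writer. The C++ sketch below is not libogg's or libopusenc's code. buildOggPage, oggCrc and putLE are hypothetical helper names, and the caller is assumed to pass a valid segment table: at most 255 lacing values whose sum equals the payload size.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Ogg CRC-32: polynomial 0x04c11db7, initial value 0, no reflection, no final XOR.
std::uint32_t oggCrc(const std::vector<std::uint8_t> &data) {
    std::uint32_t crc = 0;
    for (std::uint8_t byte : data) {
        crc ^= static_cast<std::uint32_t>(byte) << 24;
        for (int bit = 0; bit < 8; ++bit) {
            crc = (crc & 0x80000000u) ? (crc << 1) ^ 0x04c11db7u : (crc << 1);
        }
    }
    return crc;
}

// Append an unsigned value in little-endian byte order.
static void putLE(std::vector<std::uint8_t> &out, std::uint64_t value, int bytes) {
    for (int i = 0; i < bytes; ++i) {
        out.push_back(static_cast<std::uint8_t>(value >> (8 * i)));
    }
}

// Assemble one Ogg page: 27-byte fixed header + segment table + payload.
std::vector<std::uint8_t> buildOggPage(std::uint8_t headerType, std::int64_t granulePosition,
                                       std::uint32_t serialNumber, std::uint32_t sequenceNumber,
                                       const std::vector<std::uint8_t> &segmentTable,
                                       const std::vector<std::uint8_t> &payload) {
    std::vector<std::uint8_t> page = {'O', 'g', 'g', 'S'};           // capture_pattern
    page.push_back(0);                                               // stream_structure_version
    page.push_back(headerType);                                      // header_type_flag
    putLE(page, static_cast<std::uint64_t>(granulePosition), 8);     // granule_position
    putLE(page, serialNumber, 4);                                    // serial_number
    putLE(page, sequenceNumber, 4);                                  // page_sequence_number
    putLE(page, 0, 4);                                               // CRC_checksum, zero for now
    page.push_back(static_cast<std::uint8_t>(segmentTable.size()));  // number_page_segments
    page.insert(page.end(), segmentTable.begin(), segmentTable.end());
    page.insert(page.end(), payload.begin(), payload.end());
    std::uint32_t crc = oggCrc(page);  // computed over the page with the CRC field zeroed
    for (int i = 0; i < 4; ++i) {      // the CRC field sits at byte offset 22
        page[22 + i] = static_cast<std::uint8_t>(crc >> (8 * i));
    }
    return page;
}
```

The CRC covers the header as well as the data. It must therefore be computed last, once every other field (including granule_position and the segment table) is final.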

The format of the page header:

[Figure: Ogg page header format]

Implementing Ogg Encapsulation

Xiph.Org provides libopusenc, an open-source library that writes Opus audio into Ogg files. libopusenc depends on libopus and performs the Opus encoding itself. The simplest approach is therefore to let libopusenc handle both the Opus encoding and the Ogg encapsulation, rather than combining it with the system encoder.

The processing flow of libopusenc is as follows:

Creating the Encoder

First, create comments:

```cpp
#include <opusenc.h>
// comments is a file-scope OggOpusComments* so it can be released when encoding ends.
comments = ope_comments_create();
ope_comments_add(comments, "ARTIST", "qingkouwei");
ope_comments_add(comments, "TITLE", "qingkouwei-im");
```
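The tags added with ope_comments_add() end up in the stream's OpusTags comment header. The sketch below builds that packet following the layout in RFC 7845, to show what lands in the file. libopusenc builds this packet itself, with its own vendor string, so this is for illustration only. buildOpusTags, putLE32 and putString are hypothetical helper names.

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Append a 32-bit unsigned value in little-endian byte order.
static void putLE32(std::vector<std::uint8_t> &out, std::uint32_t v) {
    for (int i = 0; i < 4; ++i) out.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
}

// Append a length-prefixed UTF-8 string (no terminator).
static void putString(std::vector<std::uint8_t> &out, const std::string &s) {
    putLE32(out, static_cast<std::uint32_t>(s.size()));
    out.insert(out.end(), s.begin(), s.end());
}

// Build an OpusTags packet: magic, vendor string, comment count, then "TAG=value" entries.
std::vector<std::uint8_t> buildOpusTags(const std::string &vendor,
                                        const std::vector<std::pair<std::string, std::string>> &tags) {
    const std::string magic = "OpusTags";
    std::vector<std::uint8_t> packet(magic.begin(), magic.end());
    putString(packet, vendor);
    putLE32(packet, static_cast<std::uint32_t>(tags.size()));  // user comment list length
    for (const auto &tag : tags) {
        putString(packet, tag.first + "=" + tag.second);      // each comment is "TAG=value"
    }
    return packet;
}
```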

These tags let you attach descriptive metadata to the audio file. Next, create the encoder:

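```cpp
int error = OPE_OK;
// pEnc is a file-scope OggOpusEnc*. The fifth argument is the channel mapping
// family (0 = mono/stereo), not a quality setting.
pEnc = ope_encoder_create_file(outputFilePath_, comments, inSamplerate, inChannel, 0, &error);
if (pEnc == nullptr) {
    // Creation failed; ope_strerror(error) describes the OPE_* error code.
} else if (ope_encoder_ctl(pEnc, OPUS_SET_BITRATE(outBitrate)) != OPE_OK) {
    // The requested bitrate (bits per second) was not applied.
}
```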

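The arguments are:
- the output file path,
- the comments object,
- the input sample rate and channel count of the PCM data,
- the channel mapping family (0 for mono or stereo),
- a pointer that receives an error code.

libopusenc resamples the input internally, so the PCM does not have to be 48 kHz. The target bitrate is then set with ope_encoder_ctl().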
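The sample rate, channel count and mapping family passed to ope_encoder_create_file() are recorded in the stream's OpusHead identification header. The sketch below builds that 19-byte header for mapping family 0, following RFC 7845, to show where they go. buildOpusHead is a hypothetical helper, and the pre-skip value is assumed to be supplied by the caller. libopusenc writes this header itself and computes the real pre-skip from the encoder delay. The stored input sample rate is informational and does not change how the stream is decoded.

```cpp
#include <cstdint>
#include <vector>

// Build an OpusHead header for channel mapping family 0 (mono or stereo): 19 bytes.
std::vector<std::uint8_t> buildOpusHead(std::uint8_t channels, std::uint16_t preSkip,
                                        std::uint32_t inputSampleRate) {
    std::vector<std::uint8_t> head = {'O', 'p', 'u', 's', 'H', 'e', 'a', 'd'};
    head.push_back(1);                                   // version
    head.push_back(channels);                            // output channel count
    head.push_back(static_cast<std::uint8_t>(preSkip));  // pre-skip, little-endian
    head.push_back(static_cast<std::uint8_t>(preSkip >> 8));
    for (int i = 0; i < 4; ++i) {                        // original input sample rate
        head.push_back(static_cast<std::uint8_t>(inputSampleRate >> (8 * i)));
    }
    head.push_back(0);                                   // output gain (Q7.8 dB), 0 here
    head.push_back(0);
    head.push_back(0);                                   // channel mapping family 0
    return head;
}
```

For example, buildOpusHead(1, 312, 16000) describes a mono 16 kHz source with an example pre-skip of 312 samples.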

Encoding PCM Data

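```cpp
#include "napi/native_api.h"
#include <opusenc.h>

// pEnc and inChannel are the file-scope values used when the encoder was created.
static napi_value encodePCMToOpusOggNative(napi_env env, napi_callback_info info)
{
    size_t argc = 1;
    napi_value args[1] = {nullptr};
    napi_get_cb_info(env, info, &argc, args, nullptr, nullptr);
    void *inputBuffer = nullptr;
    size_t inputLength = 0;
    if (argc < 1 || pEnc == nullptr ||
        napi_get_arraybuffer_info(env, args[0], &inputBuffer, &inputLength) != napi_ok) {
        return nullptr;
    }
    // Interleaved 16-bit PCM: ope_encoder_write() wants samples per channel, not bytes.
    int samplesPerChannel = static_cast<int>(inputLength / (sizeof(opus_int16) * inChannel));
    ope_encoder_write(pEnc, static_cast<const opus_int16 *>(inputBuffer), samplesPerChannel);
    return nullptr;
}
```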

The ArrayBuffer passed in from the TS layer holds interleaved 16-bit PCM. ope_encoder_write() takes this buffer and its length in samples per channel. It encodes the audio and writes the resulting Ogg pages to the file path given when the encoder was created.
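This step involves two pieces of arithmetic.
- The byte length of the ArrayBuffer must be converted to samples per channel. Dividing by 2 is only correct for mono input.
- The granule position of each Opus page counts samples at 48 kHz, whatever the input rate. Players derive timestamps from it after subtracting the pre-skip.

The sketch below is a minimal version of both. The function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Convert a byte count of interleaved 16-bit PCM into samples per channel.
int samplesPerChannel(std::size_t byteLength, int channels) {
    return static_cast<int>(byteLength / (sizeof(std::int16_t) * static_cast<std::size_t>(channels)));
}

// Opus granule positions count 48 kHz samples; convert an input-rate sample count.
std::int64_t toOpusGranuleUnits(std::int64_t inputSamples, std::int32_t inputSampleRate) {
    return inputSamples * 48000 / inputSampleRate;
}

// Playback time in seconds represented by a page's granule position.
double granuleToSeconds(std::int64_t granulePosition, int preSkip) {
    return static_cast<double>(granulePosition - preSkip) / 48000.0;
}
```

For example, 20 ms of 16 kHz mono PCM is 640 bytes. That gives 320 samples per channel, or 960 in 48 kHz granule units.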

Closing the Encoding Multiplexer

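```cpp
#include "napi/native_api.h"
#include <opusenc.h>

// Finalize the stream and release libopusenc objects (pEnc and comments are file-scope).
static napi_value closeOpusOggEncoderNative(napi_env env, napi_callback_info info)
{
    if (pEnc != nullptr) {
        ope_encoder_drain(pEnc);   // flush buffered audio and write the final page
        ope_encoder_destroy(pEnc); // release the encoder and its output file
        pEnc = nullptr;
    }
    if (comments != nullptr) {
        ope_comments_destroy(comments);
        comments = nullptr;
    }
    return nullptr;
}
```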

Finally, release the encoder and the comments object. The overall process is simple, and the resulting Ogg file plays normally in common players:

[Figure: playback of the generated Ogg file]
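A quick structural check of the output can catch container problems before trying the file in a player. The sketch below walks the pages of an in-memory Ogg file using the page_size formula from the page header section. It is only a sketch: listOggPages and OggPageInfo are hypothetical names, and the code does not verify CRCs.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct OggPageInfo {
    std::uint8_t headerType;      // header_type_flag
    std::int64_t granulePosition; // granule_position
    std::size_t pageSize;         // 27 + number_page_segments + sum(segment_table)
};

// Walk consecutive Ogg pages; returns an empty vector if the buffer does not parse cleanly.
std::vector<OggPageInfo> listOggPages(const std::vector<std::uint8_t> &data) {
    std::vector<OggPageInfo> pages;
    std::size_t pos = 0;
    while (pos < data.size()) {
        if (data.size() - pos < 27 || data[pos] != 'O' || data[pos + 1] != 'g' ||
            data[pos + 2] != 'g' || data[pos + 3] != 'S') {
            return {};
        }
        std::size_t segments = data[pos + 26];
        std::size_t headerSize = 27 + segments;
        if (data.size() - pos < headerSize) return {};
        std::size_t bodySize = 0;
        for (std::size_t i = 0; i < segments; ++i) bodySize += data[pos + 27 + i];
        if (data.size() - pos < headerSize + bodySize) return {};
        std::uint64_t granule = 0;  // 8 little-endian bytes starting at offset 6
        for (int i = 7; i >= 0; --i) granule = (granule << 8) | data[pos + 6 + i];
        pages.push_back({data[pos + 5], static_cast<std::int64_t>(granule), headerSize + bodySize});
        pos += headerSize + bodySize;
    }
    return pages;
}
```

In a correctly finalized file, the first page should carry the bos flag (0x02) and the last page the eos flag (0x04).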

Summary

This article showed how to perform Opus encoding with Ogg encapsulation on HarmonyOS using libopusenc. This works around the system muxer's lack of Ogg support and meets the business requirement for Opus-encoded voice messages.
