DEV Community

kouwei qing


HarmonyOS Next Audio-Video Practice: OPUS Audio Encoding


Background

In chat scenarios with short voice messages, the audio has to be encoded and compressed. MP3 encoding was used at first, but once voice messages were also used for ASR model training, OPUS encoding became necessary for processing the voice signals. Previously, Android did not support MP3 or OPUS encoding, whereas HarmonyOS supports both. HarmonyOS supports the following encoder types:

| Container Format | Audio Encoding Type |
|---|---|
| mp4 | AAC, FLAC |
| m4a | AAC |
| flac | FLAC |
| aac | AAC |
| mp3 | MP3 |
| raw | G711mu |
| amr | AMR |
| ogg | OPUS |

However, the OPUS file produced through the system APIs could not be played. Investigation showed that the encoder outputs raw encoded data without muxing it into a container, and the system Muxer does not support the OGG container. The Muxer currently supports:

| Container Format | Video Codec Type | Audio Codec Type | Cover Image Type |
|---|---|---|---|
| mp4 | AVC (H.264), HEVC (H.265) | AAC, MPEG (MP3) | jpeg, png, bmp |
| m4a | - | AAC | jpeg, png, bmp |
| mp3 | - | MPEG (MP3) | - |

Since HarmonyOS does not support the OGG container, we need to implement it manually.
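Manual encapsulation involves more than wrapping packets in pages. An Ogg OPUS stream (RFC 7845) begins with an OpusHead identification packet alone on the first (BOS) page. An OpusTags comment packet follows, and only then come the audio packets. The sketch below builds the 19-byte OpusHead for channel mapping family 0 (mono or stereo). `buildOpusHead` is a hypothetical name, and libopusenc, used later in this article, writes this header for you.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Appends little-endian integers, as the Ogg OPUS headers require.
static void putLE16(std::vector<uint8_t> &out, uint16_t v) {
    out.push_back(static_cast<uint8_t>(v & 0xFF));
    out.push_back(static_cast<uint8_t>(v >> 8));
}

static void putLE32(std::vector<uint8_t> &out, uint32_t v) {
    for (int i = 0; i < 4; ++i) {
        out.push_back(static_cast<uint8_t>((v >> (8 * i)) & 0xFF));
    }
}

// Builds the 19-byte OpusHead identification packet (RFC 7845) for
// channel mapping family 0, which allows only mono or stereo.
std::vector<uint8_t> buildOpusHead(uint8_t channels, uint16_t preSkip, uint32_t inputSampleRate) {
    assert(channels == 1 || channels == 2);
    std::vector<uint8_t> head;
    const char magic[] = "OpusHead";
    head.insert(head.end(), magic, magic + 8);  // 8-byte signature, no terminator
    head.push_back(1);                          // version
    head.push_back(channels);                   // output channel count
    putLE16(head, preSkip);                     // samples at 48 kHz to drop at the start
    putLE32(head, inputSampleRate);             // original input rate, informational only
    putLE16(head, 0);                           // output gain (Q7.8 dB), 0 = unchanged
    head.push_back(0);                          // channel mapping family 0
    return head;
}
```

For example, `buildOpusHead(1, 312, 16000)` describes a mono stream recorded at 16 kHz. The value 312 is only an illustration: the real pre-skip must match the encoder's lookahead.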

Introduction to OPUS OGG Packaging

OGG organizes logical streams in units of "pages," each consisting of a page header and page data. The page header includes:

  1. capture_pattern (Page Identifier): ASCII characters 0x4F 'O' 0x67 'g' 0x67 'g' 0x53 'S' (4 bytes), marking the start of a page.
  2. stream_structure_version (Version ID): Typically 0 (1 byte).
  3. header_type_flag (Type Indicator): Marks the page type (1 byte):
    • 0x01: The media data on this page belongs to the same packet as the previous page. If unset, this page starts a new packet.
    • 0x02: Indicates the first page of the logical stream (BOS flag). If unset, it is not the first page.
    • 0x04: Indicates the last page of the logical stream (EOS flag). If unset, it is not the last page.
  4. granule_position: Codec-specific position value (8 bytes, little-endian). For audio streams, it stores the number of PCM samples up to and including the last packet that completes on this page, from which timestamps can be calculated; for OPUS, this count is always in 48 kHz samples regardless of the input sample rate. For video streams, its meaning is codec-defined (typically frame-based). A value of -1 means that no packet completes on this page.
  5. serial_number: Stream ID of the page (4 bytes, little-endian), distinguishing this logical stream from others.
  6. page_sequence_number: Sequence number of the page within the logical stream (4 bytes, little-endian), which lets a reader detect lost pages.
  7. CRC_checksum: 32-bit cyclic redundancy check (4 bytes) computed over the entire page with this field set to zero, used to verify page integrity.
  8. number_page_segments: Number of entries in the segment_table (1 byte).
  9. segment_table: One lacing value (0–255) per segment, giving that segment's length in bytes. A packet ends at the first lacing value below 255, so each packet's length is the sum of its lacing values. For example, the hexadecimal segments FF 45 FF FF FF 40 FF 05 FF FF FF 66 (12 segments, 4 packets) yield packet lengths 255+69=324, 3×255+64=829, 255+5=260, and 3×255+102=867.
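The fixed 27-byte part of the header can be read at the offsets implied by the field sizes above:

- capture_pattern at 0
- version at 4
- header_type_flag at 5
- granule_position at 6
- serial_number at 14
- page_sequence_number at 18
- CRC at 22
- number_page_segments at 26

The C++ sketch below illustrates this layout and is not a complete demuxer. `OggPageHeader`, `parseOggPageHeader` and `opusGranuleToSeconds` are hypothetical names. The time conversion assumes the OPUS mapping, where granule positions count 48 kHz samples and include the pre-skip declared in the OpusHead header.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Bits of header_type_flag.
constexpr uint8_t kOggContinued = 0x01;  // page starts with a continued packet
constexpr uint8_t kOggBos = 0x02;        // first page of the logical stream
constexpr uint8_t kOggEos = 0x04;        // last page of the logical stream

// Fixed 27-byte part of an Ogg page header (the segment_table follows it).
struct OggPageHeader {
    uint8_t version;          // offset 4
    uint8_t headerType;       // offset 5
    int64_t granulePosition;  // offset 6, 8 bytes
    uint32_t serialNumber;    // offset 14
    uint32_t pageSequence;    // offset 18
    uint32_t crc;             // offset 22
    uint8_t segmentCount;     // offset 26
};

// Multi-byte header fields are stored little-endian.
static uint64_t readLE(const uint8_t *p, int bytes) {
    uint64_t v = 0;
    for (int i = bytes - 1; i >= 0; --i) {
        v = (v << 8) | p[i];
    }
    return v;
}

// Returns false if the buffer is too short or lacks the "OggS" capture pattern.
bool parseOggPageHeader(const uint8_t *buf, size_t len, OggPageHeader *out) {
    assert(out != nullptr);
    if (buf == nullptr || len < 27 || std::memcmp(buf, "OggS", 4) != 0) {
        return false;
    }
    out->version = buf[4];
    out->headerType = buf[5];
    out->granulePosition = static_cast<int64_t>(readLE(buf + 6, 8));  // -1: no packet ends here
    out->serialNumber = static_cast<uint32_t>(readLE(buf + 14, 4));
    out->pageSequence = static_cast<uint32_t>(readLE(buf + 18, 4));
    out->crc = static_cast<uint32_t>(readLE(buf + 22, 4));
    out->segmentCount = buf[26];
    return true;
}

// OPUS mapping only: granule positions count 48 kHz samples including pre-skip.
double opusGranuleToSeconds(int64_t granulePosition, uint16_t preSkip) {
    return static_cast<double>(granulePosition - preSkip) / 48000.0;
}
```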

The page header length and total page length are calculated as:

```
header_size = 27 + number_page_segments   (bytes)
page_size   = header_size + sum of segment sizes in segment_table
```
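Consider the example segment table FF 45 FF FF FF 40 FF 05 FF FF FF 66. Applying the lacing rule and the two formulas gives packets of 324, 829, 260 and 867 bytes, a 39-byte header, and a 2319-byte page. Below is a minimal sketch that assumes one page's segment table is already in memory; `walkSegmentTable` and `Lacing` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Result of walking one page's segment_table (lacing values).
struct Lacing {
    std::vector<size_t> packetLengths;  // packets that complete on this page
    size_t trailingBytes = 0;           // bytes of a packet continuing on the next page
    size_t headerSize = 0;              // 27 + number_page_segments
    size_t pageSize = 0;                // headerSize + sum of all lacing values
};

Lacing walkSegmentTable(const std::vector<uint8_t> &segments) {
    assert(segments.size() <= 255);  // number_page_segments is a single byte
    Lacing r;
    size_t current = 0;
    size_t bodySize = 0;
    for (uint8_t lace : segments) {
        current += lace;
        bodySize += lace;
        if (lace < 255) {  // a lacing value below 255 terminates the packet
            r.packetLengths.push_back(current);
            current = 0;
        }
    }
    r.trailingBytes = current;  // non-zero only if the table ends with 255
    r.headerSize = 27 + segments.size();
    r.pageSize = r.headerSize + bodySize;
    return r;
}
```

A packet whose length is an exact multiple of 255 ends with a lacing value of 0. If a table ends with 255, the packet continues on the next page, which then has header_type_flag 0x01 set.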

(Figure: OGG page header format)
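The CRC_checksum field uses a 32-bit CRC with these parameters (RFC 3533):

- generator polynomial 0x04C11DB7
- initial value 0
- no bit reflection
- no final XOR

It is computed over the whole page while the four checksum bytes are zero, then stored little-endian. Below is a bit-by-bit sketch; `oggCrc` and `stampOggCrc` are hypothetical names, and a production version would usually use a 256-entry lookup table instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Ogg page CRC: MSB-first, polynomial 0x04C11DB7, init 0, no final XOR.
uint32_t oggCrc(const uint8_t *data, size_t len) {
    uint32_t crc = 0;
    for (size_t i = 0; i < len; ++i) {
        crc ^= static_cast<uint32_t>(data[i]) << 24;
        for (int bit = 0; bit < 8; ++bit) {
            crc = (crc & 0x80000000u) ? (crc << 1) ^ 0x04C11DB7u : (crc << 1);
        }
    }
    return crc;
}

// Writes the checksum into a complete page (header + segment table + data).
// Bytes 22..25 are zeroed first because the CRC covers the page with them zero.
void stampOggCrc(uint8_t *page, size_t pageSize) {
    assert(pageSize >= 27);
    page[22] = page[23] = page[24] = page[25] = 0;
    const uint32_t crc = oggCrc(page, pageSize);
    for (int i = 0; i < 4; ++i) {
        page[22 + i] = static_cast<uint8_t>((crc >> (8 * i)) & 0xFF);  // little-endian
    }
}
```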

Implementing OGG Packaging

Xiph.Org provides the open-source libopusenc library for OPUS encoding with OGG encapsulation; it depends on libopus. The simplest approach is to let libopusenc handle both the OPUS encoding and the OGG packaging.

libopusenc Processing Flow

Creating the Encoder

First, create the comment object that holds the metadata tags:

```cpp
OggOpusComments *comments = ope_comments_create();
ope_comments_add(comments, "ARTIST", "qingkouwei");
ope_comments_add(comments, "TITLE", "qingkouwei-im");
```
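These tags end up in the OpusTags header packet, which uses the Vorbis comment layout (RFC 7845). The packet contains, in order:

- the `OpusTags` signature
- a length-prefixed vendor string
- a comment count
- each `KEY=value` entry, prefixed by its 32-bit little-endian length

The sketch below only illustrates that layout; `buildOpusTags` and the vendor string in the usage are hypothetical. With libopusenc, `ope_comments_add` handles this for you.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

static void putLE32(std::vector<uint8_t> &out, uint32_t v) {
    for (int i = 0; i < 4; ++i) {
        out.push_back(static_cast<uint8_t>((v >> (8 * i)) & 0xFF));
    }
}

// Builds an OpusTags packet: signature, vendor string, comment count and
// length-prefixed "KEY=value" entries (all lengths 32-bit little-endian).
std::vector<uint8_t> buildOpusTags(const std::string &vendor,
                                   const std::vector<std::pair<std::string, std::string>> &tags) {
    std::vector<uint8_t> out;
    const std::string magic = "OpusTags";
    out.insert(out.end(), magic.begin(), magic.end());
    putLE32(out, static_cast<uint32_t>(vendor.size()));
    out.insert(out.end(), vendor.begin(), vendor.end());
    putLE32(out, static_cast<uint32_t>(tags.size()));
    for (const auto &tag : tags) {
        // Field names must be non-empty and must not contain '='.
        assert(!tag.first.empty() && tag.first.find('=') == std::string::npos);
        const std::string entry = tag.first + "=" + tag.second;
        putLE32(out, static_cast<uint32_t>(entry.size()));
        out.insert(out.end(), entry.begin(), entry.end());
    }
    return out;
}
```

For the two tags above with a 4-byte vendor string such as `"demo"`, the packet is 64 bytes long.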

Then create the encoder:

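```cpp
int error = OPE_OK;
// 5th argument is the channel mapping family (0 = mono/stereo), not a quality setting
OggOpusEnc *pEnc = ope_encoder_create_file(outputFilePath_, comments, inSamplerate, inChannel,
                                          0, &error);
if (pEnc != NULL) {
    // Target bitrate in bits per second; ret == OPE_OK on success
    int ret = ope_encoder_ctl(pEnc, OPUS_SET_BITRATE(outBitrate));
} else {
    // Creation failed: ope_strerror(error) describes why
}
```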

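The parameters are the output file path, the comment object, the input sample rate, the channel count, the channel mapping family (0 for mono or stereo), and a pointer that receives an error code. The target bitrate is then set separately with ope_encoder_ctl.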
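The family argument follows the channel mapping families defined in RFC 7845:

- 0 for mono or stereo
- 1 for up to eight channels in Vorbis channel order
- 255 for independent channels with no defined layout

The hypothetical helper below picks a family from the channel count. Whether a given family is accepted is up to the libopusenc build in use.

```cpp
#include <cassert>

// Hypothetical helper: channel mapping family for ope_encoder_create_file.
//   0   -> mono or stereo, no mapping table
//   1   -> 1..8 channels in Vorbis channel order (surround)
//   255 -> up to 255 independent channels with no defined layout
int chooseMappingFamily(int channels) {
    assert(channels >= 1 && channels <= 255);
    if (channels <= 2) {
        return 0;
    }
    if (channels <= 8) {
        return 1;
    }
    return 255;
}
```

For voice messages, which are mono or stereo, this always returns 0, the value used in the call above.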

Encoding PCM Data

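```cpp
static napi_value encodePCMToOpusOggNative(napi_env env, napi_callback_info info)
{
    size_t argc = 1;
    napi_value args[1] = {nullptr};
    napi_get_cb_info(env, info, &argc, args, nullptr, nullptr);
    void *inputBuffer = nullptr;
    size_t inputLength = 0;
    // args[0]: ArrayBuffer of interleaved 16-bit PCM from the TS layer
    if (pEnc == nullptr || argc < 1 ||
        napi_get_arraybuffer_info(env, args[0], &inputBuffer, &inputLength) != napi_ok) {
        return nullptr;
    }
    // ope_encoder_write() expects samples per channel, not the total sample count;
    // inChannel is assumed to be the channel count used when creating pEnc
    int samplesPerChannel = static_cast<int>(inputLength / (sizeof(opus_int16) * inChannel));
    ope_encoder_write(pEnc, static_cast<const opus_int16 *>(inputBuffer), samplesPerChannel);
    return nullptr;
}
```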

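The TS layer passes the recorded PCM data (16-bit, interleaved) to the native layer as an ArrayBuffer. ope_encoder_write encodes the samples and writes the resulting OGG pages to the output file specified when the encoder was created.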
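Because ope_encoder_write counts samples per channel, a byte length must be divided by 2 × channels. If the TS layer delivers chunks that do not end on a whole frame boundary (an assumption about how the recorder chunks data), the leftover bytes have to be carried over to the next call. `PcmFrameBuffer` below is a hypothetical helper that does this.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: collects PCM bytes from the TS layer and releases only
// whole interleaved frames (one 16-bit sample per channel), because
// ope_encoder_write() takes a count of samples per channel.
class PcmFrameBuffer {
public:
    explicit PcmFrameBuffer(int channels)
        : frameBytes_(sizeof(int16_t) * static_cast<size_t>(channels)) {
        assert(channels > 0);
    }

    // Appends a chunk, copies every complete frame into `out` and returns the
    // frame count (samples per channel). Leftover bytes wait for the next chunk.
    int take(const void *chunk, size_t len, std::vector<int16_t> &out) {
        const uint8_t *bytes = static_cast<const uint8_t *>(chunk);
        pending_.insert(pending_.end(), bytes, bytes + len);
        const size_t frames = pending_.size() / frameBytes_;
        const size_t usable = frames * frameBytes_;
        out.resize(usable / sizeof(int16_t));
        if (usable > 0) {
            std::memcpy(out.data(), pending_.data(), usable);
            pending_.erase(pending_.begin(), pending_.begin() + static_cast<std::ptrdiff_t>(usable));
        }
        return static_cast<int>(frames);
    }

    size_t pendingBytes() const { return pending_.size(); }

private:
    size_t frameBytes_;
    std::vector<uint8_t> pending_;
};
```

Inside the NAPI function, this could be used as `int n = buffer.take(inputBuffer, inputLength, pcm); if (n > 0) ope_encoder_write(pEnc, pcm.data(), n);`. That usage assumes `opus_int16` is `int16_t`, as it is on common platforms.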

Closing the Encoder and Muxer

```cpp
static napi_value closeOpusOggEncoderNative(napi_env env, napi_callback_info info)
{
    if (pEnc != nullptr) {
        // Flush buffered audio and finalize the stream before freeing the encoder
        ope_encoder_drain(pEnc);
        ope_encoder_destroy(pEnc);
        pEnc = nullptr;
    }
    if (comments != nullptr) {
        ope_comments_destroy(comments);
        comments = nullptr;
    }
    return nullptr;
}
```

ope_encoder_drain flushes the remaining audio and finalizes the stream; ope_encoder_destroy and ope_comments_destroy then release the encoder and comment objects. The overall process is straightforward, and the resulting OGG file plays correctly in common players.

Summary

This article showed how to produce playable OPUS audio in an OGG container on HarmonyOS, where the system Muxer does not support OGG, by using libopusenc for both the OPUS encoding and the OGG encapsulation.
