HarmonyOS Next Audio-Video Practice: OPUS Audio Encoding
Background
In chat scenarios that use short voice messages, the recorded audio must be encoded and compressed. MP3 encoding was used at first, but once voice messages were also used for ASR model training, OPUS encoding became necessary for processing the voice signals. Previously, Android did not support MP3 or OPUS encoding, whereas HarmonyOS now supports both. The encoder types supported by HarmonyOS are:
Container Format | Audio Encoding Type |
---|---|
mp4 | AAC, Flac |
m4a | AAC |
flac | Flac |
aac | AAC |
mp3 | MP3 |
raw | G711mu |
amr | AMR |
ogg | opus |
After encoding with OPUS via system APIs, the audio file could not be played. Investigation showed the encoder did not automatically perform muxing, and the system Muxer does not support the OGG container. Current packaging capabilities are as follows:
Packaging Format | Video Codec Type | Audio Codec Type | Cover Type |
---|---|---|---|
mp4 | AVC (H.264), HEVC (H.265) | AAC, MPEG (MP3) | jpeg, png, bmp |
m4a | - | AAC | jpeg, png, bmp |
mp3 | - | MPEG (MP3) | - |
Since HarmonyOS does not support the OGG container, we need to implement it manually.
Introduction to OPUS OGG Packaging
OGG organizes logical streams in units of "pages," each consisting of a page header and page data. The page header includes:
- capture_pattern (Page Identifier): the ASCII characters 0x4F 'O' 0x67 'g' 0x67 'g' 0x53 'S' (4 bytes), marking the start of a page.
- stream_structure_version (Version ID): Typically 0 (1 byte).
- header_type_flag (Type Indicator): Marks the page type (1 byte):
  - 0x01: The media data on this page belongs to the same packet as the previous page. If unset, this page starts a new packet.
  - 0x02: Indicates the first page of the logical stream (BOS flag). If unset, it is not the first page.
  - 0x04: Indicates the last page of the logical stream (EOS flag). If unset, it is not the last page.
- granule_position: Position information whose meaning depends on the codec (8 bytes, little-endian). For audio streams, it stores the number of PCM samples up to the end of the last packet completed on this page, from which timestamps can be calculated. For video streams, it stores the number of encoded frames. A value of -1 means that no packet completes on this page.
- serial_number: Stream ID of the page (4 bytes, little-endian), distinguishing this logical stream from others.
- page_sequence_number: Page sequence number in the logical stream (4 bytes, little-endian).
- CRC_checksum: Cyclic Redundancy Check (4 bytes) used to verify page validity.
- number_page_segments: Number of segments in the segment_table (1 byte).
- segment_table: A table of segment lengths, one byte (0–255) per segment. A packet ends at the first segment whose length is less than 255. For example, the segments FF 45 FF FF FF 40 FF 05 FF FF FF 66 (12 segments, 4 packets) yield the packet lengths 255+69=324, 3×255+64=829, 255+5=260, and 3×255+102=867.
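The segment-table rule above can be sketched in a few lines of C++. This is a minimal illustration, not part of any library: the function name packetLengths and the continued output flag are hypothetical names chosen for this example.

```cpp
#include <cstdint>
#include <vector>

// Decode an Ogg segment table (lacing values) into packet lengths.
// A packet ends at the first lacing value below 255. If the table ends
// on a 255, the last packet continues on the next page; that unfinished
// packet is not added to the result and `continued` is set to true.
std::vector<int> packetLengths(const std::vector<uint8_t>& segmentTable,
                               bool* continued = nullptr) {
    std::vector<int> lengths;
    int current = 0;
    bool open = false;  // true while a packet is still accumulating
    for (uint8_t lacing : segmentTable) {
        current += lacing;
        open = true;
        if (lacing < 255) {  // terminating segment (its length may be 0)
            lengths.push_back(current);
            current = 0;
            open = false;
        }
    }
    if (continued != nullptr) {
        *continued = open;
    }
    return lengths;
}
```

For the table FF 45 FF FF FF 40 FF 05 FF FF FF 66, this returns 324, 829, 260, and 867.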
The page header length and total page length are calculated as:
header_size = 27 + number_page_segments (bytes)
page_size = header_size + sum of segment sizes in segment_table
Page header format:
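To make the byte layout concrete, here is a minimal sketch that serializes one page carrying a single complete packet. It assumes the simplest case (one packet per page, packet smaller than 255×255 bytes); buildPage, putLE, and oggCrc are hypothetical helper names for this example. The checksum uses the Ogg CRC-32 parameters (polynomial 0x04C11DB7, initial value 0, no reflection, no final XOR) and is computed over the whole page with the CRC field set to zero.

```cpp
#include <cstdint>
#include <vector>

// Ogg CRC-32, computed bit by bit for clarity rather than speed.
uint32_t oggCrc(const std::vector<uint8_t>& data) {
    uint32_t crc = 0;
    for (uint8_t byte : data) {
        crc ^= static_cast<uint32_t>(byte) << 24;
        for (int bit = 0; bit < 8; ++bit) {
            crc = (crc & 0x80000000u) ? (crc << 1) ^ 0x04C11DB7u : (crc << 1);
        }
    }
    return crc;
}

// Append the lowest `bytes` bytes of `value` in little-endian order.
static void putLE(std::vector<uint8_t>& out, uint64_t value, int bytes) {
    for (int i = 0; i < bytes; ++i) {
        out.push_back(static_cast<uint8_t>(value >> (8 * i)));
    }
}

// Build one page holding exactly one complete packet.
// Returns an empty vector if the packet needs more than 255 segments.
std::vector<uint8_t> buildPage(const std::vector<uint8_t>& packet, uint8_t headerType,
                               uint64_t granulePos, uint32_t serial, uint32_t pageSeq) {
    // Lacing: floor(n / 255) values of 255, then n % 255 (which may be 0).
    std::vector<uint8_t> segments(packet.size() / 255, 255);
    segments.push_back(static_cast<uint8_t>(packet.size() % 255));
    if (segments.size() > 255) {
        return {};
    }

    std::vector<uint8_t> page = {'O', 'g', 'g', 'S'};  // capture_pattern
    page.push_back(0);                                 // stream_structure_version
    page.push_back(headerType);                        // header_type_flag
    putLE(page, granulePos, 8);                        // granule_position
    putLE(page, serial, 4);                            // serial_number
    putLE(page, pageSeq, 4);                           // page_sequence_number
    putLE(page, 0, 4);                                 // CRC placeholder (offset 22)
    page.push_back(static_cast<uint8_t>(segments.size()));
    page.insert(page.end(), segments.begin(), segments.end());
    page.insert(page.end(), packet.begin(), packet.end());

    // header_size = 27 + number_page_segments; the payload follows it.
    const uint32_t crc = oggCrc(page);
    for (int i = 0; i < 4; ++i) {
        page[22 + i] = static_cast<uint8_t>(crc >> (8 * i));
    }
    return page;
}
```

A 324-byte packet produces a segment table of FF 45, so the page is 27 + 2 + 324 = 353 bytes long.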
Implementing OGG Packaging
Xiph provides the open-source libopusenc for OPUS OGG packaging, which depends on the libopus library. The simplest approach is to use libopusenc for both OPUS encoding and container packaging.
libopusenc Processing Flow
Creating the Encoder
First, create comments:
OggOpusComments *comments = ope_comments_create();
ope_comments_add(comments, "ARTIST", "qingkouwei");
ope_comments_add(comments, "TITLE", "qingkouwei-im");
Then create the encoder:
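// The fifth argument of ope_encoder_create_file is the channel mapping family
// (0 for mono or stereo input), not a quality setting.
OggOpusEnc *pEnc = ope_encoder_create_file(outputFilePath_, comments, inSamplerate, inChannel,
    0, &error);
if (pEnc) {
    // Bitrate in bits per second; ope_encoder_ctl returns OPE_OK (0) on success.
    int ret = ope_encoder_ctl(pEnc, OPUS_SET_BITRATE(outBitrate));
} else {
    // Creation failed; ope_strerror(error) returns a readable description.
}
The parameters are the output file path, the comment info, the input sampling rate, the channel count, the channel mapping family, and an error output. Audio quality is not a creation parameter; it is controlled afterwards through ope_encoder_ctl, for example by setting the bitrate.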
Encoding PCM Data
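static napi_value encodePCMToOpusOggNative(napi_env env, napi_callback_info info)
{
    size_t argc = 1;
    napi_value args[1] = {nullptr};
    napi_get_cb_info(env, info, &argc, args, nullptr, nullptr);
    // The single argument is an ArrayBuffer of interleaved 16-bit PCM.
    void *inputBuffer = nullptr;
    size_t inputLength = 0;
    if (napi_get_arraybuffer_info(env, args[0], &inputBuffer, &inputLength) != napi_ok || pEnc == nullptr) {
        return nullptr;
    }
    // ope_encoder_write expects the sample count per channel, not the total number of shorts.
    int samplesPerChannel = static_cast<int>(inputLength / (sizeof(opus_int16) * inChannel));
    ope_encoder_write(pEnc, static_cast<const opus_int16 *>(inputBuffer), samplesPerChannel);
    return nullptr;
}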
The TS layer passes the PCM data as an ArrayBuffer to ope_encoder_write, which encodes it and writes the result to the path specified when the encoder was created.
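One detail worth spelling out is the length argument: ope_encoder_write takes the number of samples per channel, so the byte length of the buffer has to be divided by both the sample size and the channel count. A minimal sketch of that arithmetic, assuming interleaved 16-bit PCM (the helper names samplesPerChannel and durationMs are hypothetical):

```cpp
#include <cstddef>
#include <cstdint>

// Number of samples per channel in an interleaved 16-bit PCM buffer.
// Returns -1 if the byte count is not a whole number of frames.
int samplesPerChannel(size_t byteLength, int channels) {
    if (channels <= 0) {
        return -1;
    }
    const size_t frameBytes = sizeof(int16_t) * static_cast<size_t>(channels);
    if (byteLength % frameBytes != 0) {
        return -1;
    }
    return static_cast<int>(byteLength / frameBytes);
}

// Duration of `samples` samples per channel at `sampleRate` Hz, in milliseconds.
int64_t durationMs(int samples, int sampleRate) {
    return static_cast<int64_t>(samples) * 1000 / sampleRate;
}
```

For example, a 3840-byte stereo buffer holds 960 samples per channel, which is 20 ms of audio at 48 kHz.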
Closing the Encoder Muxer
static napi_value closeOpusOggEncoderNative(napi_env env, napi_callback_info info)
{
    if (pEnc != NULL) {
        // Flush buffered audio and finish the stream before freeing the encoder.
        ope_encoder_drain(pEnc);
        ope_encoder_destroy(pEnc);
        pEnc = NULL;
    }
    if (comments != NULL) {
        ope_comments_destroy(comments);
        comments = NULL;
    }
    return nullptr;
}
Drain the encoder so that the remaining audio is written and the stream is finalized, then release the encoder, comments, and other objects. The overall process is straightforward, and the final OGG container file can be played normally by general players.
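Besides trying the file in a player, a quick sanity check is to inspect the first bytes of the output. The sketch below assumes the beginning of the file has already been read into memory; looksLikeOggOpus is a hypothetical helper name. It checks the page fields described earlier and the "OpusHead" magic that RFC 7845 requires at the start of the first packet.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Returns true if `data` starts with an Ogg BOS page whose first packet
// begins with the "OpusHead" identification header magic.
bool looksLikeOggOpus(const uint8_t* data, size_t len) {
    if (data == nullptr || len < 27) {
        return false;  // shorter than a fixed page header
    }
    if (std::memcmp(data, "OggS", 4) != 0) {
        return false;  // capture_pattern
    }
    if (data[4] != 0) {
        return false;  // stream_structure_version
    }
    if ((data[5] & 0x02) == 0) {
        return false;  // BOS flag must be set on the first page
    }
    const size_t headerSize = 27 + static_cast<size_t>(data[26]);
    if (len < headerSize + 8) {
        return false;  // not enough payload to hold the magic
    }
    return std::memcmp(data + headerSize, "OpusHead", 8) == 0;
}
```

Reading the first 64 bytes or so of the output file and passing them to this function is enough for the check; it does not validate the CRC or the later pages.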
Summary
This article introduces methods to implement OPUS encoding and OGG container packaging in HarmonyOS, addressing special audio encoding requirements in business scenarios.