DEV Community

kouwei qing


HarmonyOS Audio-Video: Audio Capture Practice


Background

Many scenarios in application development require audio capture, such as voice messaging in chat functions, real-time speech-to-text conversion, voice calls, and video calls. On Android and iOS, the system provides two forms of audio capture:

  • Real-time audio stream capture
  • Audio file recording

The system also offers different API forms. For example, on Android:

  • AudioRecord Java interface
  • MediaRecorder Java interface
  • OpenSL ES C/C++ interface
  • AAudio C/C++ interface

Apps being adapted to HarmonyOS need audio capture too. This article walks through implementing it step by step.

Introduction to Audio Recording Interfaces

HarmonyOS provides two audio capture interfaces, one for TS and one for C++:

  • AudioCapturer (TS)
  • OHAudio (C++)

Below is an introduction to the APIs for both languages.

AudioCapturer

Recording with AudioCapturer involves creating an AudioCapturer instance, configuring the capture parameters, starting and stopping capture, and releasing resources. The official documentation provides a state diagram that marks each method call and the state transition it triggers.

createAudioCapturer

Creating a capturer mainly involves parameter configuration:

import { audio } from '@kit.AudioKit';

let audioStreamInfo: audio.AudioStreamInfo = {
  samplingRate: audio.AudioSamplingRate.SAMPLE_RATE_48000, // Sampling rate
  channels: audio.AudioChannel.CHANNEL_2, // Number of channels
  sampleFormat: audio.AudioSampleFormat.SAMPLE_FORMAT_S16LE, // Sample format
  encodingType: audio.AudioEncodingType.ENCODING_TYPE_RAW // Encoding format
};

let audioCapturerInfo: audio.AudioCapturerInfo = {
  source: audio.SourceType.SOURCE_TYPE_MIC,
  capturerFlags: 0
};

let audioCapturerOptions: audio.AudioCapturerOptions = {
  streamInfo: audioStreamInfo,
  capturerInfo: audioCapturerInfo
};

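audio.createAudioCapturer(audioCapturerOptions, (err, data) => {
  if (err) {
    // Creation failed; log the reason and do not use the capturer
    console.error(`Failed to create AudioCapturer. Code: ${err.code}, Message: ${err.message}`);
  } else {
    // Keep the instance for the later on('readData'), start, stop, and release calls
    let audioCapturer = data;
  }
});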

Parameters include two main sections:

  • AudioStreamInfo: Audio format configuration
    • samplingRate: Sampling rate
    • channels: Number of channels
    • sampleFormat: Sample format
    • encodingType: Audio encoding type (only PCM's ENCODING_TYPE_RAW is supported currently)
  • AudioCapturerInfo: Capture configuration
    • source: Audio source type, one of:
      • SOURCE_TYPE_INVALID: Invalid audio source
      • SOURCE_TYPE_MIC: Microphone audio source
      • SOURCE_TYPE_VOICE_RECOGNITION: Speech recognition source
      • SOURCE_TYPE_PLAYBACK_CAPTURE: Playback audio stream (internal recording)
      • SOURCE_TYPE_VOICE_COMMUNICATION: Voice call scenario source
      • SOURCE_TYPE_VOICE_MESSAGE: Short voice message source
    • capturerFlags: Audio capturer flags (0 represents an audio capturer)

on('readData')

The on('readData') method subscribes to audio data read callbacks:

let readDataCallback = (buffer: ArrayBuffer) => {
  // Process the audio stream
}
audioCapturer.on('readData', readDataCallback);

start

The start method begins recording:

import { BusinessError } from '@kit.BasicServicesKit';
audioCapturer.start((err: BusinessError) => {
  if (err) {
    // Handle error
  } else {
    // Recording started
  }
});

stop

The stop method stops recording:

import { BusinessError } from '@kit.BasicServicesKit';
audioCapturer.stop((err: BusinessError) => {
  if (err) {
    // Handle error
  } else {
    // Recording stopped
  }
});

release

The release method destroys the instance and releases resources:

import { BusinessError } from '@kit.BasicServicesKit';
audioCapturer.release((err: BusinessError) => {
  if (err) {
    // Handle error
  } else {
    // Resources released
  }
});

OHAudio

OHAudio is a set of C APIs introduced in API version 10. The APIs share a unified design and support both the normal and the low-latency audio paths. They support only PCM and suit scenarios that need audio input at the native layer. Because many audio encoding libraries are written in C/C++, capturing with OHAudio avoids passing audio data between the TS and C++ layers, which improves efficiency.

To use the recording APIs, link against the libohaudio.so dynamic library and include the header files <native_audiostreambuilder.h> and <native_audiocapturer.h>.

Create a Builder

OH_AudioStreamBuilder* builder;
OH_AudioStreamBuilder_Create(&builder, AUDIOSTREAM_TYPE_CAPTURER);

Configure Audio Stream Parameters

Refer to the following example:

// Set audio sampling rate
OH_AudioStreamBuilder_SetSamplingRate(builder, 48000);
// Set audio channel count
OH_AudioStreamBuilder_SetChannelCount(builder, 2);
// Set audio sample format
OH_AudioStreamBuilder_SetSampleFormat(builder, AUDIOSTREAM_SAMPLE_S16LE);
// Set audio stream encoding type
OH_AudioStreamBuilder_SetEncodingType(builder, AUDIOSTREAM_ENCODING_TYPE_RAW);
// Set the working scenario for the input audio stream
OH_AudioStreamBuilder_SetCapturerInfo(builder, AUDIOSTREAM_SOURCE_TYPE_MIC);

These parameters have the same meaning as the corresponding ones in AudioCapturer.
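The three format parameters together determine how much data the capturer produces, which matters when sizing buffers or estimating file sizes. The sketch below works through that arithmetic for the configuration used in this article (48000 Hz, 2 channels, 16-bit samples). The function names are hypothetical helpers and are not part of OHAudio.

```cpp
#include <cstdint>

// Bytes occupied by one 16-bit PCM sample (S16LE).
constexpr int32_t kBytesPerSampleS16 = 2;

// Bytes in one frame, i.e. one sample for every channel.
constexpr int32_t BytesPerFrame(int32_t channels, int32_t bytesPerSample)
{
    return channels * bytesPerSample;
}

// Bytes produced per second of capture.
constexpr int64_t BytesPerSecond(int32_t sampleRate, int32_t channels, int32_t bytesPerSample)
{
    return static_cast<int64_t>(sampleRate) * BytesPerFrame(channels, bytesPerSample);
}

// Duration in milliseconds represented by a buffer of `length` bytes.
constexpr int64_t BufferDurationMs(int64_t length, int32_t sampleRate, int32_t channels,
                                   int32_t bytesPerSample)
{
    return length * 1000 / BytesPerSecond(sampleRate, channels, bytesPerSample);
}
```

With these settings one frame is 4 bytes and one second of audio is 192000 bytes, so a minute of raw PCM is roughly 11 MB. This is one reason to encode the data before storing it.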

Set Audio Callback Functions

// Custom data read function
int32_t MyOnReadData(
    OH_AudioCapturer* capturer,
    void* userData,
    void* buffer,
    int32_t length)
{
    // Extract 'length' bytes of recording data from 'buffer'
    return 0;
}
// Custom audio stream event function
int32_t MyOnStreamEvent(
    OH_AudioCapturer* capturer,
    void* userData,
    OH_AudioStream_Event event)
{
    // Update recorder state and UI based on the audio stream event
    return 0;
}
// Custom audio interrupt event function
int32_t MyOnInterruptEvent(
    OH_AudioCapturer* capturer,
    void* userData,
    OH_AudioInterrupt_ForceType type,
    OH_AudioInterrupt_Hint hint)
{
    // Update recorder state and UI based on the audio interrupt info
    return 0;
}
// Custom error callback function
int32_t MyOnError(
    OH_AudioCapturer* capturer,
    void* userData,
    OH_AudioStream_Result error)
{
    // Handle the audio error based on 'error'
    return 0;
}

OH_AudioCapturer_Callbacks callbacks;
// Configure callback functions
callbacks.OH_AudioCapturer_OnReadData = MyOnReadData;
callbacks.OH_AudioCapturer_OnStreamEvent = MyOnStreamEvent;
callbacks.OH_AudioCapturer_OnInterruptEvent = MyOnInterruptEvent;
callbacks.OH_AudioCapturer_OnError = MyOnError;

// Set the callback for the audio input stream
OH_AudioStreamBuilder_SetCapturerCallback(builder, callbacks, nullptr);

Configure callback functions via OH_AudioStreamBuilder_SetCapturerCallback.
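The example above passes nullptr as the last argument, which is the userData pointer handed back to every callback. Passing a pointer to your own state instead lets the read callback store data without global variables. The following is a minimal sketch of that pattern. PcmSink and AppendToSink are hypothetical names, and the capturer parameter is typed as void* only so that the sketch compiles without the OHAudio headers.

```cpp
#include <cstdint>
#include <mutex>
#include <vector>

// Hypothetical sink that accumulates captured PCM bytes.
struct PcmSink {
    std::mutex lock;
    std::vector<uint8_t> bytes;
};

// Stand-in with the same parameter shape as MyOnReadData.
// userData is expected to point at a PcmSink owned by the caller.
int32_t AppendToSink(void* /*capturer*/, void* userData, void* buffer, int32_t length)
{
    auto* sink = static_cast<PcmSink*>(userData);
    if (sink == nullptr || buffer == nullptr || length <= 0) {
        return 0; // Nothing to store
    }
    const auto* src = static_cast<const uint8_t*>(buffer);
    // Guard the vector because a consumer thread may read it concurrently.
    std::lock_guard<std::mutex> guard(sink->lock);
    sink->bytes.insert(sink->bytes.end(), src, src + length);
    return 0;
}
```

In a real capturer the sink's address would be passed as the third argument of OH_AudioStreamBuilder_SetCapturerCallback. Taking a mutex and growing a vector inside an audio callback is acceptable for a demo. For latency-sensitive capture a preallocated ring buffer is the more common choice.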

Construct the Recording Audio Stream

OH_AudioCapturer* audioCapturer;
OH_AudioStreamBuilder_GenerateCapturer(builder, &audioCapturer);

Use the Audio Stream

  • OH_AudioStream_Result OH_AudioCapturer_Start(OH_AudioCapturer* capturer): Start recording
  • OH_AudioStream_Result OH_AudioCapturer_Pause(OH_AudioCapturer* capturer): Pause recording
  • OH_AudioStream_Result OH_AudioCapturer_Stop(OH_AudioCapturer* capturer): Stop recording
  • OH_AudioStream_Result OH_AudioCapturer_Flush(OH_AudioCapturer* capturer): Discard cached data
  • OH_AudioStream_Result OH_AudioCapturer_Release(OH_AudioCapturer* capturer): Release the recording instance

Release the Builder

OH_AudioStreamBuilder_Destroy(builder);

Audio Recording Best Practices

Let's implement a full-process audio capture example by recording to an MP3 file.

Permission Application

Audio capture requires the microphone permission, which must be requested from the user at runtime. First, declare the permission in module.json5:

"requestPermissions": [  
  {  
    "name": "ohos.permission.MICROPHONE",  
    "reason": "$string:reason",  
    "usedScene": {  
      "abilities": [  
        "FormAbility"  
      ],  
      "when": "inuse"  
    }  
  }  
],

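Then request the permission from the user at runtime:

import { abilityAccessCtrl, common, Permissions } from '@kit.AbilityKit';
import { BusinessError } from '@kit.BasicServicesKit';

function reqPermissionsFromUser(permissions: Array<Permissions>, context: common.UIAbilityContext): void {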
  let atManager: abilityAccessCtrl.AtManager = abilityAccessCtrl.createAtManager();  
  atManager.requestPermissionsFromUser(context, permissions).then((data) => {  
    let grantStatus: Array<number> = data.authResults;  
    let length: number = grantStatus.length;  
    for (let i = 0; i < length; i++) {  
      if (grantStatus[i] !== 0) {  
        // User denied permission; prompt and guide to settings  
        return;  
      }  
    }  
    // Permission granted successfully  
  }).catch((err: BusinessError) => {  
    console.error(`Failed to request permissions. Code: ${err.code}, Message: ${err.message}`);  
  })  
}

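Call the permission request method in aboutToAppear, and start recording only after the user grants the permission:

// Permissions to request; only the microphone is needed for capture
const permissions: Array<Permissions> = ['ohos.permission.MICROPHONE'];
const context: common.UIAbilityContext = getContext(this) as common.UIAbilityContext;
reqPermissionsFromUser(permissions, context);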

Configure C++ Project

After creating a C++ module, configure the ohaudio dynamic library dependency:

cmake_minimum_required(VERSION 3.5.0)  
project(audiorecorderdemo)  

set(NATIVERENDER_ROOT_PATH ${CMAKE_CURRENT_SOURCE_DIR})  

if(DEFINED PACKAGE_FIND_FILE)  
    include(${PACKAGE_FIND_FILE})  
endif()  

include_directories(${NATIVERENDER_ROOT_PATH}  
                    ${NATIVERENDER_ROOT_PATH}/include)  

add_library(capture SHARED napi_init.cpp)  
target_link_libraries(capture PUBLIC libace_napi.z.so)  
target_link_libraries(capture PUBLIC libohaudio.so)

Configure NAPI methods:

static napi_value start(napi_env env, napi_callback_info info)  
{  
    // Implementation omitted  
    return nullptr;  
}  
static napi_value stop(napi_env env, napi_callback_info info)  
{  
    // Implementation omitted  
    return nullptr;  
}  
EXTERN_C_START  
static napi_value Init(napi_env env, napi_value exports)  
{  
    napi_property_descriptor desc[] = {  
        { "start", nullptr, start, nullptr, nullptr, nullptr, napi_default, nullptr },  
        { "stop", nullptr, stop, nullptr, nullptr, nullptr, napi_default, nullptr }  
    };  
    napi_define_properties(env, exports, sizeof(desc) / sizeof(desc[0]), desc);  
    return exports;  
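}
EXTERN_C_END

// Module descriptor; nm_modname is assumed to match the library name in CMakeLists.txt
static napi_module captureModule = {
    .nm_version = 1,
    .nm_flags = 0,
    .nm_filename = nullptr,
    .nm_register_func = Init,
    .nm_modname = "capture",
    .nm_priv = nullptr,
    .reserved = { 0 },
};

// Register the module when the shared library is loaded
extern "C" __attribute__((constructor)) void RegisterCaptureModule(void)
{
    napi_module_register(&captureModule);
}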

Implement Start Recording

// Custom data read function  
int32_t MyOnReadData(  
    OH_AudioCapturer* capturer,  
    void* userData,  
    void* buffer,  
    int32_t length)  
{  
    // TODO: Extract recording data from buffer  
    return 0;  
}  
// Custom audio stream event function  
int32_t MyOnStreamEvent(  
    OH_AudioCapturer* capturer,  
    void* userData,  
    OH_AudioStream_Event event)  
{  
    // TODO: Update state/UI based on event  
    return 0;  
}  
// Custom audio interrupt event function  
int32_t MyOnInterruptEvent(  
    OH_AudioCapturer* capturer,  
    void* userData,  
    OH_AudioInterrupt_ForceType type,  
    OH_AudioInterrupt_Hint hint)  
{  
    // TODO: Update state/UI based on interrupt info  
    return 0;  
}  
// Custom error callback function  
int32_t MyOnError(  
    OH_AudioCapturer* capturer,  
    void* userData,  
    OH_AudioStream_Result error)  
{  
    // TODO: Handle error based on 'error'  
    return 0;  
}  
// Kept at file scope so that stop() can reach the same instances
static OH_AudioStreamBuilder* builder = nullptr;
static OH_AudioCapturer* audioCapturer = nullptr;

static napi_value start(napi_env env, napi_callback_info info)
{
    OH_AudioStreamBuilder_Create(&builder, AUDIOSTREAM_TYPE_CAPTURER);
    // Set audio parameters
    OH_AudioStreamBuilder_SetSamplingRate(builder, 48000);
    OH_AudioStreamBuilder_SetChannelCount(builder, 2);
    OH_AudioStreamBuilder_SetSampleFormat(builder, AUDIOSTREAM_SAMPLE_S16LE);
    OH_AudioStreamBuilder_SetEncodingType(builder, AUDIOSTREAM_ENCODING_TYPE_RAW);
    OH_AudioStreamBuilder_SetCapturerInfo(builder, AUDIOSTREAM_SOURCE_TYPE_MIC);

    OH_AudioCapturer_Callbacks callbacks;
    // Configure callbacks
    callbacks.OH_AudioCapturer_OnReadData = MyOnReadData;
    callbacks.OH_AudioCapturer_OnStreamEvent = MyOnStreamEvent;
    callbacks.OH_AudioCapturer_OnInterruptEvent = MyOnInterruptEvent;
    callbacks.OH_AudioCapturer_OnError = MyOnError;
    OH_AudioStreamBuilder_SetCapturerCallback(builder, callbacks, nullptr);

    // Create the capturer from the configured builder
    OH_AudioStreamBuilder_GenerateCapturer(builder, &audioCapturer);
    // Start capturing; MyOnReadData is then called with PCM data
    OH_AudioCapturer_Start(audioCapturer);
    return nullptr;
}
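As one possible way to fill in the TODO in MyOnReadData, the sketch below reads the buffer as interleaved 16-bit little-endian samples and returns the peak level, which could drive a volume meter in the UI. It assumes the S16LE format configured above. PeakAmplitudeS16LE is a hypothetical helper and is not an OHAudio function.

```cpp
#include <cstdint>
#include <cstdlib>

// Returns the peak absolute sample value (0..32768) in a buffer of
// interleaved 16-bit little-endian PCM. A trailing odd byte is ignored.
int32_t PeakAmplitudeS16LE(const void* buffer, int32_t length)
{
    if (buffer == nullptr) {
        return 0;
    }
    const auto* bytes = static_cast<const uint8_t*>(buffer);
    int32_t peak = 0;
    for (int32_t i = 0; i + 1 < length; i += 2) {
        // Assemble the sample from its low and high byte, independent of host endianness.
        const auto raw = static_cast<uint16_t>(bytes[i] | (bytes[i + 1] << 8));
        const int32_t sample = static_cast<int16_t>(raw);
        const int32_t magnitude = std::abs(sample);
        if (magnitude > peak) {
            peak = magnitude;
        }
    }
    return peak;
}
```

Inside the callback this would be called as PeakAmplitudeS16LE(buffer, length). The result should be handed to the UI thread rather than used to update the UI directly from the audio callback.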

Best Practice 1:

To avoid unexpected behavior, ensure each callback in OH_AudioCapturer_Callbacks is initialized with a custom callback or a null pointer. For example:

OH_AudioCapturer_Callbacks callbacks;

// Configure needed callbacks
callbacks.OH_AudioCapturer_OnReadData = MyOnReadData;
callbacks.OH_AudioCapturer_OnInterruptEvent = MyOnInterruptEvent;

// (Mandatory) Initialize unused callbacks with null
callbacks.OH_AudioCapturer_OnStreamEvent = nullptr;
callbacks.OH_AudioCapturer_OnError = nullptr;
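An alternative to assigning nullptr member by member is to value-initialize the whole struct before setting the callbacks you need. The sketch below demonstrates this with DemoCallbacks, a hypothetical stand-in struct of function pointers. It assumes OH_AudioCapturer_Callbacks is likewise a plain C struct, in which case `OH_AudioCapturer_Callbacks callbacks = {};` should behave the same way.

```cpp
#include <cstdint>

// Hypothetical stand-in for a callbacks struct: a plain aggregate of function pointers.
struct DemoCallbacks {
    int32_t (*onReadData)(void* capturer, void* userData, void* buffer, int32_t length);
    int32_t (*onStreamEvent)(void* capturer, void* userData, int32_t event);
    int32_t (*onError)(void* capturer, void* userData, int32_t error);
};

// Minimal read callback used only to have something to assign.
int32_t DemoOnReadData(void*, void*, void*, int32_t)
{
    return 0;
}

DemoCallbacks MakeCallbacks()
{
    // Value-initialization sets every member to nullptr before any assignment.
    DemoCallbacks callbacks = {};
    // Only the callbacks that are actually needed get assigned afterwards.
    callbacks.onReadData = DemoOnReadData;
    return callbacks;
}
```

This keeps the code correct if a later SDK version adds more members to the struct, because members that are never assigned still start out as null.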

Best Practice 2:

On devices that support low-latency mode, create the audio recording builder in low-latency mode for scenarios with strict latency requirements (e.g., voice calls), so that captured audio reaches the app with less delay:

OH_AudioStream_LatencyMode latencyMode = AUDIOSTREAM_LATENCY_MODE_FAST;
OH_AudioStreamBuilder_SetLatencyMode(builder, latencyMode);

Audio File Processing

Process the audio data in the read callback: it can be passed to an ASR engine or written directly to a file. The next article will cover encoding to MP3 and writing the result to a file.
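Until an encoder is in place, a simple way to check the captured data is to save it as an uncompressed WAV file, which is raw PCM preceded by a 44-byte header. The sketch below builds that header. It uses WAV rather than the MP3 format this series is working towards, and BuildWavHeader, PutLE, and PutTag are hypothetical helpers.

```cpp
#include <cstdint>
#include <vector>

// Appends `value` to `out` as `size` little-endian bytes.
static void PutLE(std::vector<uint8_t>& out, uint32_t value, int size)
{
    for (int i = 0; i < size; ++i) {
        out.push_back(static_cast<uint8_t>((value >> (8 * i)) & 0xFF));
    }
}

// Appends a four-character chunk tag such as "RIFF".
static void PutTag(std::vector<uint8_t>& out, const char (&tag)[5])
{
    out.insert(out.end(), tag, tag + 4);
}

// Builds the canonical 44-byte RIFF/WAVE header for uncompressed PCM.
std::vector<uint8_t> BuildWavHeader(uint32_t sampleRate, uint16_t channels,
                                    uint16_t bitsPerSample, uint32_t dataBytes)
{
    const auto blockAlign = static_cast<uint16_t>(channels * bitsPerSample / 8);
    const uint32_t byteRate = sampleRate * blockAlign;
    std::vector<uint8_t> header;
    header.reserve(44);
    PutTag(header, "RIFF");
    PutLE(header, 36 + dataBytes, 4); // Size of everything after this field
    PutTag(header, "WAVE");
    PutTag(header, "fmt ");
    PutLE(header, 16, 4);             // Size of the fmt chunk for PCM
    PutLE(header, 1, 2);              // Audio format 1 = PCM
    PutLE(header, channels, 2);
    PutLE(header, sampleRate, 4);
    PutLE(header, byteRate, 4);
    PutLE(header, blockAlign, 2);
    PutLE(header, bitsPerSample, 2);
    PutTag(header, "data");
    PutLE(header, dataBytes, 4);      // Number of PCM bytes that follow
    return header;
}
```

For the settings used here the call would be BuildWavHeader(48000, 2, 16, dataBytes). Because the total size is only known once recording stops, a common approach is to write a placeholder header first and rewrite it when the capture ends.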

Stop and Release

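OH_AudioCapturer_Stop(audioCapturer);
// Release the capturer instance as well, then destroy the builder
OH_AudioCapturer_Release(audioCapturer);
OH_AudioStreamBuilder_Destroy(builder);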

Summary

This article introduced two ways to capture audio on HarmonyOS, AudioCapturer at the TS layer and OHAudio at the C++ layer, and implemented real-time audio capture with the OHAudio interfaces.
