snowlyg

Posted on Jun 10 • Originally published at lodan.me

Debugging WebRTC Audio Playback Latency on OpenHarmony 5.0: ArkWeb, AudioRenderer Underrun, and Native SDK Comparison

#webrtc #openharmony #audio #debugging

This case involved a WebRTC call running inside nweb / ArkWeb on an OpenHarmony 5.0 device. The peer was Chrome, and the call was established normally. The OpenHarmony side received audio data, but playback was clearly wrong: speech was choppy, delayed audio kept accumulating, and the user experience became "the peer finishes speaking, then the device plays it several seconds later."

At first glance, this kind of symptom can look like a network or WebRTC ICE problem. In this case, the useful evidence pointed in a different direction. WebRTC stats showed a connected media path, low round-trip time, low jitter, no packet loss, and continuously increasing send/receive bytes. The more important failure was in the local playback path: ArkWeb supplied decoded audio to the system playback path too slowly, AudioRenderer underrun appeared continuously, and old PCM data kept queuing instead of being dropped.

The native ohos_webrtc project is still relevant in this investigation, but not because it directly patches ArkWeb's internal audio sink. Its value is as a comparison path: if the same call works through native WebRTC, the problem is more likely inside ArkWeb/nweb playback. If native WebRTC also fails, the investigation moves lower into AudioServer, audio host, HAL, driver, or scheduling.

Background: WebRTC in nweb / ArkWeb

The affected scenario used a web-side WebRTC call on OpenHarmony 5.0. The application path looked like this:

Chrome peer
    -> WebRTC network and media path
    -> nweb / ArkWeb on OpenHarmony
    -> AudioRendererSinkInner
    -> AudioRenderer / AudioServer
    -> audio host / HAL / device speaker

The peer side could establish a call normally. The OpenHarmony side could receive audio, but playback had several visible symptoms:

speech was choppy.
AudioRenderer underrun appeared repeatedly.
older audio continued to queue.
the perceived playback became several seconds late.

That last symptom matters. If audio packets were simply missing, the result would be gaps or silence. In this case, old audio continued playing late, which suggested that the local playback path was falling behind real time.

Why the Network Was Not the First Suspect

WebRTC stats did not show a broken media path. The observed values were in a healthy range:

connectionState: connected
iceConnectionState: connected
packetsLost: 0
roundTripTime: 0.002 ~ 0.021 seconds
jitter: 0.022 ~ 0.031 seconds
bytesReceived / bytesSent: increasing continuously

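These numbers do not prove that the network is perfect, but they make it unlikely that network loss, ICE failure, or remote-side media starvation is the primary cause of a multi-second playback delay. Jitter in the 20 to 30 ms range typically costs on the order of tens of milliseconds of jitter-buffer delay, not seconds.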

The key distinction is this:

network problem:
  packets are lost, delayed heavily, or ICE state degrades

local playback problem:
  media arrives, but decoded audio is not delivered to the playback device in real time

In this case, the second explanation matched the logs better.
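The distinction above can be written down as a small check. This is only a sketch: the field names mirror the stats quoted earlier, while the `firstSuspect` function, the `bytesReceivedDelta` and `perceivedDelaySeconds` inputs, and the factors 4 and 10 are assumptions made for illustration, not a standard formula. How the values are sampled (for example from `RTCPeerConnection.getStats()` reports) is left to the caller.

```typescript
// Inputs mirror the stats quoted above; sample values below are illustrative.
interface CallSnapshot {
  packetsLost: number;
  roundTripTimeSeconds: number;
  jitterSeconds: number;
  bytesReceivedDelta: number;    // growth of bytesReceived since the last sample
  perceivedDelaySeconds: number; // measured or estimated playback lag
}

type Suspect = "network" | "local-playback" | "inconclusive";

// Rule of thumb (assumption): the delay a healthy network can plausibly
// explain is half the RTT plus a few jitter intervals. If the perceived delay
// is far beyond that, look at local playback first.
function firstSuspect(s: CallSnapshot): Suspect {
  // No incoming bytes, or any loss, keeps the network on the suspect list.
  if (s.bytesReceivedDelta <= 0 || s.packetsLost > 0) {
    return "network";
  }
  const networkExplainableSeconds =
    s.roundTripTimeSeconds / 2 + 4 * s.jitterSeconds;
  if (s.perceivedDelaySeconds > 10 * networkExplainableSeconds) {
    return "local-playback";
  }
  return "inconclusive";
}

// Worst-case values from the stats above, plus an assumed 3 s perceived delay.
const observed: CallSnapshot = {
  packetsLost: 0,
  roundTripTimeSeconds: 0.021,
  jitterSeconds: 0.031,
  bytesReceivedDelta: 48000,
  perceivedDelaySeconds: 3,
};
console.log(firstSuspect(observed)); // "local-playback"
```

With the worst-case RTT and jitter from this call, the network-explainable delay in this sketch is about 0.13 seconds, far below a playback lag of several seconds.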

Key Logs: RenderFrame Was Too Slow

The most important device-side evidence came from the audio rendering logs:

AudioRendererSinkInner: [RenderFrame] RenderFrame len[7056] cost[100~106]ms
RendererInServer: Underrun
PaRendererStreamImpl underrun: 5757, 5758, 5759...
Buffer is not empty

The important details are:

a single RenderFrame call cost around 100 ms.
the playback side reported underrun.
the upper buffer was still not empty.

That combination is very different from "no audio data arrived." It means audio data existed above the playback layer, but the data was not reaching the low-level playback device at a real-time pace.
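When scanning a longer capture for the same pattern, a small parser helps. The sketch below assumes the log lines have the shape shown in the excerpt above, including the `cost[100~106]` range notation; on a real device the field may hold a single value, so the pattern accepts both. The names `parseRenderFrame` and `summarizeLog` are hypothetical helpers, not part of any OpenHarmony tool.

```typescript
interface RenderFrameSample {
  lenBytes: number;
  costMinMs: number;
  costMaxMs: number;
}

// Matches "RenderFrame len[7056] cost[100~106]ms" and "RenderFrame len[7056] cost[103]ms".
const RENDER_FRAME = /RenderFrame len\[(\d+)\] cost\[(\d+)(?:~(\d+))?\]ms/;

function parseRenderFrame(line: string): RenderFrameSample | null {
  const m = RENDER_FRAME.exec(line);
  if (m === null) {
    return null;
  }
  const min = Number(m[2]);
  const max = m[3] !== undefined ? Number(m[3]) : min;
  return { lenBytes: Number(m[1]), costMinMs: min, costMaxMs: max };
}

// Counts RenderFrame calls at or above slowMs and lines that mention underrun.
function summarizeLog(lines: string[], slowMs: number) {
  let slowRenderFrames = 0;
  let underrunLines = 0;
  for (const line of lines) {
    const sample = parseRenderFrame(line);
    if (sample !== null && sample.costMaxMs >= slowMs) {
      slowRenderFrames++;
    }
    if (/underrun/i.test(line)) {
      underrunLines++;
    }
  }
  return { slowRenderFrames, underrunLines };
}

const excerpt = [
  "AudioRendererSinkInner: [RenderFrame] RenderFrame len[7056] cost[100~106]ms",
  "RendererInServer: Underrun",
  "PaRendererStreamImpl underrun: 5757, 5758, 5759...",
  "Buffer is not empty",
];
console.log(summarizeLog(excerpt, 50)); // { slowRenderFrames: 1, underrunLines: 2 }
```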

For a real-time call, a 100 ms render operation is too expensive. If this repeats, AudioServer runs out of data at the playback deadline, underrun is reported, and the remaining old PCM continues to queue. The user then hears delayed speech instead of live speech.
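The cost can be put next to the amount of audio each call carries. The logs do not state the PCM format, so the sketch below assumes 44.1 kHz, stereo, 16-bit samples, under which 7056 bytes is exactly 40 ms of audio. It also assumes buffers are written one after another and that nothing is dropped. The function names are hypothetical.

```typescript
interface PcmFormat {
  sampleRateHz: number;
  channels: number;
  bytesPerSample: number;
}

// Milliseconds of audio carried by one buffer of lenBytes.
function bufferDurationMs(lenBytes: number, fmt: PcmFormat): number {
  const frames = lenBytes / (fmt.channels * fmt.bytesPerSample);
  return (frames * 1000) / fmt.sampleRateHz;
}

// Wall-clock time spent per unit of audio written; above 1 is slower than real time.
function realTimeRatio(costMs: number, durationMs: number): number {
  return costMs / durationMs;
}

// Backlog added per second of wall-clock time when audio arrives in real time
// but is written out at 1/ratio of real time and never dropped.
function backlogGrowthMsPerSecond(ratio: number): number {
  return ratio <= 1 ? 0 : (1 - 1 / ratio) * 1000;
}

// Assumed format: not stated in the logs.
const assumedFormat: PcmFormat = { sampleRateHz: 44100, channels: 2, bytesPerSample: 2 };
const durationMs = bufferDurationMs(7056, assumedFormat); // 40
const ratio = realTimeRatio(100, durationMs);             // 2.5
console.log(durationMs, ratio, backlogGrowthMsPerSecond(ratio));
```

Under these assumptions each call spends 100 ms to hand over 40 ms of audio, so the backlog grows by roughly 600 ms for every second of the call, which reaches a multi-second delay within a few seconds.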

Failure Chain: Data Exists, Playback Falls Behind

The practical failure chain can be summarized like this:

nweb / ArkWeb audio supply is slow
  -> AudioServer playback buffer runs dry
  -> AudioRenderer underrun appears
  -> old PCM is still queued above the sink
  -> playback delay grows to several seconds

This explains why the call can look connected and still sound unusable. WebRTC may continue to receive packets, decode audio, and update stats, while the local playback sink fails to keep up with real time.

It also explains why simply waiting does not recover the call. If old PCM is not dropped when the renderer falls behind, the playback queue can preserve outdated audio. The device then keeps playing history instead of catching up to the live stream.

For communication products, that behavior is usually worse than a short dropout. A short dropout is noticeable, but delayed conversation breaks turn-taking and makes the call unusable.

What the Native ohos_webrtc Project Provides

The relevant native project is:

https://gitcode.com/openharmony-sig/ohos_webrtc

This project adapts WebRTC m120 to OpenHarmony. Judging from the project history and the exposed SDK surface, its work is mainly around:

ArkTS SDK wrapping.
native WebRTC integration.
audio and video capture/rendering.
AudioDeviceModule adaptation and optimization.
video hardware encode/decode support.
screen capture.
system audio capture.
memory and stability fixes.

That means the project already provides a native WebRTC SDK path for OpenHarmony. It is useful for building a comparison test against the ArkWeb path:

web path:
  Chrome -> nweb / ArkWeb -> AudioRendererSinkInner

native path:
  ArkTS UI -> native ohos_webrtc -> AudioDeviceModule -> AudioRenderer

The native path does not directly repair ArkWeb's internal AudioRendererSinkInner, but it can help answer a critical question: is the playback problem specific to ArkWeb's WebRTC sink, or does it still happen when the media path is moved to native WebRTC?

Hardware Audio Codec Is Not an Existing Project Capability

The current project evidence does not show an implemented hardware audio encoder/decoder path.

The audio codec path still appears to use WebRTC's built-in software factories:

CreateBuiltinAudioEncoderFactory()
CreateBuiltinAudioDecoderFactory()

The typical supported audio codecs in that path are:

Opus
G722
iLBC
G711
L16

The OpenHarmony AVCodec adaptation in the project is mainly visible on the video side:

HardwareVideoEncoderFactory
HardwareVideoDecoderFactory
H264 / H265

The current material does not show a matching hardware audio factory layer such as:

HardwareAudioEncoderFactory
HardwareAudioDecoderFactory
OH_AudioEncoder
OH_AudioDecoder

This should be stated carefully. It does not mean the platform can never support hardware audio codecs. It only means that this project, as observed, should not be treated as already having a hardware audio encode/decode path comparable to its video hardware codec path.

Why Hardware Audio Codec Is Not the First Fix

More importantly, the observed failure does not point first to compressed-audio codec performance.

Hardware audio encoding/decoding mainly affects this boundary:

compressed audio <-> PCM

The logs point to a later boundary:

decoded PCM -> AudioRendererSinkInner -> AudioRenderer -> AudioServer -> HAL

The actual symptoms were:

RenderFrame taking around 100 ms.
AudioRenderer underrun.
upper buffer not empty.
old PCM continuing to queue.
playback delay growing to seconds.

Even if hardware audio codec support were added later, it would not necessarily fix a renderer that cannot deliver PCM to the playback device in real time. The first priority is the playback scheduling and buffer behavior after audio has already reached PCM form.

Native SDK as a Comparison Experiment

The most useful short-term experiment is to run the same call scenario through native ohos_webrtc.

If the native path is smooth:

native ohos_webrtc works normally
  -> the problem is more likely in nweb / ArkWeb audio sink behavior
  -> business logic can consider bypassing web-side WebRTC for calls

If the native path also has underrun and delayed playback:

native ohos_webrtc also stutters
  -> the problem is more likely below ArkWeb
  -> inspect AudioServer, audio_host, HAL, driver, thread priority, and CPU scheduling

This comparison is valuable because it avoids arguing from one log stream. It turns the question into a controlled boundary test:

same device
same network
same call scenario
different WebRTC playback path

That is a better isolation strategy than immediately tuning codec parameters or changing ICE configuration.
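The two branches of the comparison can be captured as a small decision table so that test results map to a next step without debate. This is a sketch; the type and function names are hypothetical, and the case where the web path plays smoothly is outside the scenario described here.

```typescript
type Playback = "smooth" | "underrun-and-delay";

interface NextStep {
  suspect: string;
  actions: string[];
}

// Same device, same network, same call scenario; only the playback path differs.
function isolate(webPath: Playback, nativePath: Playback): NextStep {
  if (webPath === "smooth") {
    // Not covered by this investigation: the fault did not reproduce.
    return {
      suspect: "not reproduced on the web path",
      actions: ["re-check the reproduction before comparing paths"],
    };
  }
  if (nativePath === "smooth") {
    return {
      suspect: "nweb / ArkWeb audio sink",
      actions: ["evaluate moving the call path to the native SDK"],
    };
  }
  return {
    suspect: "below ArkWeb",
    actions: ["AudioServer", "audio_host", "HAL", "driver", "thread priority", "CPU scheduling"],
  };
}

console.log(isolate("underrun-and-delay", "smooth").suspect); // "nweb / ArkWeb audio sink"
```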

AEC Dump Capability and Its Boundary

The native project exposes AEC dump controls through PeerConnectionFactory:

startAecDump(fd: number, max_size_bytes: number): boolean;
stopAecDump(): void;

A typical native SDK usage pattern is:

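import fs from '@ohos.file.fs';

// context and pcf (a PeerConnectionFactory) are assumed to exist already.
const file = fs.openSync(`${context.filesDir}/call_debug.aecdump`,
  fs.OpenMode.READ_WRITE | fs.OpenMode.CREATE | fs.OpenMode.TRUNC);

// In upstream WebRTC, a max size <= 0 means no size limit on the dump.
pcf.startAecDump(file.fd, -1);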

This is useful for inspecting audio-processing behavior in the native SDK call path.

The boundary is equally important:

it applies to the native ohos_webrtc SDK path.
it does not directly capture nweb / ArkWeb internal WebRTC AEC dump.
after the file descriptor is handed to native/WebRTC, ownership must be handled carefully; do not close or reuse it from the application side in a way that breaks the native dump writer.

So AEC dump is useful for the comparison path, but it is not direct visibility into ArkWeb's internal playback sink.

Recommended Direction

The short-term direction is:

Reproduce the same call through native ohos_webrtc.
Compare playback behavior with the nweb / ArkWeb path.
If native playback is smooth, evaluate moving the call path to the native SDK.
If native playback also fails, move the investigation to the system audio stack.

System-side investigation should focus on:

AudioRendererSinkInner RenderFrame calls taking around 100 ms.
low-latency AudioRenderer path.
AudioServer and audio host thread priority.
HAL period and buffer configuration.
CPU governor and big/little core scheduling.
dropping old PCM when playback is already behind real time.

That last point is important for real-time communication. When the renderer is late, preserving all old PCM may be technically faithful, but it makes conversation worse. A real-time call usually needs a catch-up strategy that favors current audio over stale audio.
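A minimal sketch of such a catch-up strategy is a queue with a latency cap that discards the oldest PCM first. The `BoundedPcmQueue` class and the 200 ms cap are assumptions for illustration; this is not how ArkWeb or AudioServer is implemented, and a production version would likely shorten audio more gently (for example with time-stretching) rather than cutting whole chunks.

```typescript
interface PcmChunk {
  durationMs: number;
  data: Int16Array;
}

class BoundedPcmQueue {
  private chunks: PcmChunk[] = [];
  private queuedMs = 0;
  private droppedMs = 0;
  private readonly maxQueuedMs: number;

  constructor(maxQueuedMs: number) {
    this.maxQueuedMs = maxQueuedMs;
  }

  // Adds new audio, then drops the oldest chunks until the queue is back
  // under the cap. The chunk that was just pushed is always kept.
  push(chunk: PcmChunk): void {
    this.chunks.push(chunk);
    this.queuedMs += chunk.durationMs;
    while (this.queuedMs > this.maxQueuedMs && this.chunks.length > 1) {
      const stale = this.chunks.shift()!;
      this.queuedMs -= stale.durationMs;
      this.droppedMs += stale.durationMs;
    }
  }

  // Returns the oldest chunk still queued, or undefined when empty.
  pop(): PcmChunk | undefined {
    const next = this.chunks.shift();
    if (next !== undefined) {
      this.queuedMs -= next.durationMs;
    }
    return next;
  }

  get latencyMs(): number {
    return this.queuedMs;
  }

  get totalDroppedMs(): number {
    return this.droppedMs;
  }
}

// The renderer stalls while ten 40 ms chunks arrive; the cap is 200 ms.
const queue = new BoundedPcmQueue(200);
for (let i = 0; i < 10; i++) {
  queue.push({ durationMs: 40, data: new Int16Array([i]) });
}
console.log(queue.latencyMs, queue.totalDroppedMs); // 200 200
```

The trade-off is deliberate: the listener loses 200 ms of stale speech, but the queued latency stays bounded instead of growing for the rest of the call.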

Conclusion

This OpenHarmony WebRTC issue was not primarily a classic network problem. The call was connected, ICE stayed connected, packets were not being lost, RTT and jitter were low, and bytes continued to move.

The strongest evidence pointed to a real-time failure in local audio playback. AudioRendererSinkInner::RenderFrame cost around 100 ms, AudioServer reported underrun, and the upper buffer was still not empty. That means the system was not simply missing audio data. It had audio data, but the data was not being delivered to the playback device fast enough, and old PCM kept accumulating.

The native ohos_webrtc project provides an important comparison path. It already covers ArkTS SDK wrapping, native WebRTC integration, AudioDeviceModule work, video hardware codec support, screen capture, system audio capture, and AEC dump controls. But it should not be described as having hardware audio encode/decode support based on the current evidence.

For this specific issue, hardware audio codec work is not the first fix. The first priority is to verify whether native WebRTC can bypass the ArkWeb audio sink and then determine whether the remaining bottleneck belongs to ArkWeb, AudioRenderer, AudioServer, audio host, HAL, or scheduling.
