snowlyg

Posted on Jun 15 • Originally published at lodan.me

Debugging Android WebRTC Audio 3A with AEC_DUMP and Audacity

#webrtc #android #audio #debugging

Android WebRTC audio problems often start as a vague report: call quality is bad. The root cause may be echo, background noise, howling feedback, unstable loudness, near-end speech being over-suppressed, or far-end playback leaking back into the microphone.

If the only test method is asking two people to make calls repeatedly and remember whether this build sounds better than the previous one, the results drift quickly. Listeners get tired, room noise changes, and firmware or parameter changes are hard to compare without durable evidence.

The practical goal is to turn subjective listening into inspectable audio material. WebRTC's AEC_DUMP can record the Audio Processing Module (APM) inputs, the reverse (far-end playback) reference, the processed output, and runtime settings. After unpacking it with unpack_aecdump, the generated WAV files can be inspected in Audacity or a similar audio tool.

Background: Listening Memory Is Not Enough

This case came from an Android 14 WebRTC call scenario. The device hardware platform included RK3562, and the application used WebRTC's Audio Processing Module for audio processing. In this article, 3A mainly refers to:

AEC: Acoustic Echo Cancellation.
AGC: Automatic Gain Control.
NS / ANS: Noise Suppression.

In a real product, audio quality is not determined by the WebRTC algorithm alone. Enclosure design, microphone openings, speaker loudness, Audio HAL gain, sample rate, channel layout, playback delay, and scheduling can all affect the final call.

Listening-only testing has several weaknesses:

it is difficult to reproduce the same speech, loudness, and distance;
people cannot reliably remember small differences between builds;
it is hard to distinguish bad raw microphone input from bad 3A output;
it is hard to know whether AEC received a valid far-end reference signal.

Audio debugging needs evidence that can be saved and compared. That is where AEC_DUMP helps.

What AEC_DUMP Gives You

AEC_DUMP records key inputs and settings around the WebRTC audio processing path. It is not a complete replacement for real call testing, but it helps answer concrete questions:

Is the raw microphone input already broken?
Did the far-end playback reference reach AEC?
What changed before and after WebRTC audio processing?
Do sample rate, channel count, and APM switches match the expected device path?

This changes the debugging conversation from "it sounds different" to "we can compare the same captured material."

Test Environment and Boundary

A minimal test setup can use two Android devices:

Android device A
  -> WebRTC call
  -> Android device B

The test application can be based on the public sample project:

https://github.com/snowlyg/AndroidWebRTCGradle

Treat the sample build as a debugging tool, not as a production client. Public notes do not need real device identifiers, deployment details, test accounts, or complete field configuration.

One practical warning: if both devices use speaker mode, are too close to each other, and the volume is high, they can form a strong acoustic loop and produce harsh feedback. Keep volume and distance controlled, and use headphones or a fixed playback source when needed. Otherwise, an extreme acoustic feedback case may be mistaken for a pure software issue.

Capturing AEC_DUMP

A typical capture flow is:

Prepare two Android devices with USB debugging enabled.
Install a WebRTC test application that can save AEC_DUMP.
Enable AEC_DUMP file saving on both devices.
Start a WebRTC call.
Run a fixed test sequence: far-end speech, near-end silence, near-end speech, and double talk.
Hang up the call so the application closes the dump file.
Pull audio.aecdump from the device.

Example command:

adb pull /sdcard/Download/audio.aecdump ./audio.aecdump

The path is only an example. A real application can write the file to its own debug directory, but public material should avoid production paths.
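Comparing builds is easier when each pulled dump carries a traceable name. The following is a minimal sketch, assuming adb is on the PATH; the helper names, the build label, and the file naming scheme are illustrative and not part of the sample project.

```python
import subprocess
from datetime import datetime


def pull_command(remote_path, build_label, serial=None, now=None):
    """Build the adb argv and a local file name that records build and time."""
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    local_name = f"{build_label}_{stamp}.aecdump"  # hypothetical naming scheme
    argv = ["adb"]
    if serial:
        argv += ["-s", serial]  # select one device when two are attached
    argv += ["pull", remote_path, local_name]
    return argv, local_name


def pull_dump(remote_path, build_label, serial=None):
    """Run adb pull and return the local file name; raises if adb fails."""
    argv, local_name = pull_command(remote_path, build_label, serial)
    subprocess.run(argv, check=True)
    return local_name
```

With two devices attached, passing each device serial keeps the two dumps from the same call apart.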

Unpacking audio.aecdump

The WebRTC source tree can build the unpack_aecdump tool. Build directories and target names vary by WebRTC version, but a common command in an existing WebRTC build environment is:

ninja -C out/Default unpack_aecdump

After the tool is available, unpack the captured file:

./unpack_aecdump audio.aecdump

The common output files are:

input.wav
reverse.wav
ref_out.wav
settings.txt

Each file represents a different kind of evidence. They should not be interpreted as the same signal.
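Before opening the files in an audio tool, it can help to print the basic header facts of each WAV. This is a small sketch using only the Python standard library; the function names are illustrative, and the default file list simply mirrors the names above.

```python
import wave


def describe_wav(path):
    """Return sample rate, channel count, bit depth, and duration of one WAV."""
    with wave.open(path, "rb") as w:
        frames = w.getnframes()
        rate = w.getframerate()
        return {
            "path": path,
            "sample_rate": rate,
            "channels": w.getnchannels(),
            "bits": 8 * w.getsampwidth(),
            "seconds": frames / rate if rate else 0.0,
        }


def describe_all(paths=("input.wav", "reverse.wav", "ref_out.wav")):
    """Describe every unpacked WAV so the values can be compared side by side."""
    return [describe_wav(p) for p in paths]
```

A large duration gap between input.wav and reverse.wav is worth noting before any waveform comparison.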

reverse.wav: The Far-End Reference for AEC

reverse.wav is the far-end playback reference that is fed into AEC. In practical terms, it represents what the local speaker side is expected to play.

Use it to check:

whether AEC received a far-end reference signal;
whether the reference audio is empty, intermittent, or abnormally low/high;
whether the reference could plausibly align with the real playback path.

If reverse.wav has no useful content, AEC does not have enough information to cancel speaker audio that leaks back into the microphone. In that case, tuning AEC delay or NS parameters is usually not the first priority.

If reverse.wav looks healthy but echo remains strong, continue with acoustic path, speaker volume, capture gain, delay alignment, and microphone input.
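Whether reverse.wav carries useful content can be roughly checked by measuring how many short windows rise above a silence threshold. The sketch below assumes 16-bit PCM WAV files; the 100 ms window and the -50 dBFS threshold are illustrative values, not WebRTC defaults.

```python
import array
import math
import sys
import wave


def window_levels_dbfs(path, window_s=0.1):
    """RMS level per window in dBFS, all channels pooled (16-bit PCM only)."""
    with wave.open(path, "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM")
        per_window = max(1, int(w.getframerate() * window_s)) * w.getnchannels()
        pcm = array.array("h")
        pcm.frombytes(w.readframes(w.getnframes()))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV data is little-endian
    levels = []
    for start in range(0, len(pcm), per_window):
        chunk = pcm[start:start + per_window]
        rms = math.sqrt(sum(s * s for s in chunk) / len(chunk))
        levels.append(20 * math.log10(rms / 32768) if rms > 0 else -120.0)
    return levels


def active_ratio(levels, threshold_dbfs=-50.0):
    """Fraction of windows louder than the (assumed) silence threshold."""
    if not levels:
        return 0.0
    return sum(1 for level in levels if level > threshold_dbfs) / len(levels)
```

A ratio near zero during a segment where the far end was clearly speaking suggests the reference never reached APM.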

input.wav: Near-End Audio Before 3A

input.wav is the near-end microphone audio before WebRTC APM processing. It is the key file for judging the hardware capture and system input path.

Look for:

raw capture level being too low;
clipping;
obvious noise floor during silence;
far-end playback already overloading the microphone input;
hardware, Audio HAL, capture gain, or microphone path problems before WebRTC processing.

If input.wav is already heavily clipped, AEC, NS, and AGC cannot reliably repair it. Clipping means information has already been lost. Start with hardware gain, system capture path, microphone placement, and speaker volume.
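Clipping can also be counted instead of judged by eye. This is a minimal sketch, assuming 16-bit PCM; the near-full-scale limit of 32700 and the minimum run length of 3 samples are illustrative thresholds for what counts as a flat top.

```python
import array
import sys
import wave


def clipping_report(path, limit=32700, min_run=3):
    """Count near-full-scale samples and flat runs, channel by channel."""
    with wave.open(path, "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM")
        channels = w.getnchannels()
        pcm = array.array("h")
        pcm.frombytes(w.readframes(w.getnframes()))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV data is little-endian
    clipped = runs = 0
    for ch in range(channels):
        run = 0
        for sample in pcm[ch::channels]:  # de-interleave one channel
            if abs(sample) >= limit:
                clipped += 1
                run += 1
                if run == min_run:  # count each flat run once
                    runs += 1
            else:
                run = 0
    total = len(pcm)
    return {
        "clipped_samples": clipped,
        "flat_runs": runs,
        "clipped_ratio": clipped / total if total else 0.0,
    }
```

Isolated full-scale samples can be harmless peaks; repeated flat runs during far-end playback point more strongly at gain or speaker volume.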

ref_out.wav: Output After 3A Processing

ref_out.wav is the processed near-end output that APM produced during the call, as recorded in the dump. The "ref" in the name marks it as a reference output, not as the far-end reference signal. It is useful when compared with input.wav:

input.wav
  -> WebRTC APM
  -> ref_out.wav

Inspect whether:

background noise is reduced;
echo components are weaker;
near-end speech is over-suppressed;
AGC makes loudness unstable;
NS introduces obvious distortion or cuts speech tails.

If input.wav has clear speech but ref_out.wav severely suppresses the near-end speaker, check 3A settings, double-talk behavior, NS strength, AGC behavior, and AEC delay alignment.
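The before/after comparison can be turned into numbers by measuring the level of both files over the same time windows. The sketch below assumes 16-bit PCM and that both files start at the same moment; the one-second window is an arbitrary choice.

```python
import array
import math
import sys
import wave


def levels_dbfs(path, window_s=1.0):
    """RMS level per time window in dBFS (16-bit PCM, channels pooled)."""
    with wave.open(path, "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM")
        per_window = max(1, int(w.getframerate() * window_s)) * w.getnchannels()
        pcm = array.array("h")
        pcm.frombytes(w.readframes(w.getnframes()))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV data is little-endian
    levels = []
    for start in range(0, len(pcm), per_window):
        chunk = pcm[start:start + per_window]
        rms = math.sqrt(sum(s * s for s in chunk) / len(chunk))
        levels.append(20 * math.log10(rms / 32768) if rms > 0 else -120.0)
    return levels


def level_changes(input_path, output_path, window_s=1.0):
    """Rows of (start_s, input_dBFS, output_dBFS, change_dB) per window."""
    pairs = zip(levels_dbfs(input_path, window_s), levels_dbfs(output_path, window_s))
    return [
        (i * window_s, round(before, 1), round(after, 1), round(after - before, 1))
        for i, (before, after) in enumerate(pairs)
    ]
```

Windows are time-based, so the two files may use different sample rates. A strong negative change during near-end speech is a hint of over-suppression; the same change during far-end-only playback is the expected echo reduction.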

settings.txt: Verify Configuration Before Tuning

settings.txt records WebRTC APM initialization and reconfiguration information. At minimum, check:

input / output / reverse sample rate;
input / output / reverse channels;
whether AEC is enabled;
whether AECM is enabled;
whether NS is enabled;
whether AGC is enabled;
whether HPF is enabled.

If sample rate, channel count, or 3A switches do not match expectations, fix configuration first. Otherwise, waveform analysis may be based on the wrong assumptions.

For example, if capture and reverse playback have mismatched sample-rate assumptions, AEC delay reasoning becomes unreliable. If channel count is different from the expected device path, APM output may not represent the real product behavior.
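Because the exact layout of settings.txt can differ between WebRTC versions, a keyword filter is a safer starting point than a strict parser. The sketch below makes no assumption about the file format beyond one setting per line; the keyword list and function names are illustrative.

```python
import re

# Keywords for the checks listed above; extend as needed for a given WebRTC version.
PATTERN = re.compile(
    r"sample.?rate|channel|aec|agc|hpf|high.?pass|echo|noise|(?<![a-z])ns(?![a-z])",
    re.IGNORECASE,
)


def relevant_lines(text):
    """Keep only lines that mention sample rate, channels, or 3A switches."""
    return [line.strip() for line in text.splitlines() if PATTERN.search(line)]


def changed_lines(old_text, new_text):
    """Relevant lines in the new settings that were not present in the old ones."""
    old = set(relevant_lines(old_text))
    return [line for line in relevant_lines(new_text) if line not in old]
```

Running changed_lines on the settings.txt of two builds shows quickly whether a firmware change also altered the APM configuration.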

Using Audacity for Engineering Judgment

Audacity should not be treated as an absolute scoring system. Its value is in practical inspection:

waveform height for loudness and dynamic range;
flat tops for clipping;
silence thickness for noise floor;
spectrogram brightness for noise distribution;
same-time comparison between input.wav and ref_out.wav;
far-end playback sections to see whether reverse.wav leaks into input.wav.
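Leakage of reverse.wav into input.wav can be roughly quantified by correlating the loudness envelopes of the two files. This is a coarse sketch, assuming 16-bit PCM and that both unpacked files start at about the same moment; it is not the delay estimator WebRTC uses internally, and the 10 ms window and lag range are illustrative.

```python
import array
import math
import sys
import wave


def envelope(path, window_s=0.01):
    """Mean absolute amplitude per window, all channels pooled."""
    with wave.open(path, "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM")
        step = max(1, int(w.getframerate() * window_s)) * w.getnchannels()
        pcm = array.array("h")
        pcm.frombytes(w.readframes(w.getnframes()))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV data is little-endian
    result = []
    for start in range(0, len(pcm), step):
        chunk = pcm[start:start + step]
        result.append(sum(abs(s) for s in chunk) / len(chunk))
    return result


def best_lag(reference_env, capture_env, max_lag=50):
    """Return (lag_in_windows, score) where capture best follows reference."""
    if not reference_env or not capture_env:
        return None, 0.0

    def centered(env):
        mean = sum(env) / len(env)
        return [v - mean for v in env]

    ref, cap = centered(reference_env), centered(capture_env)
    best, best_score = None, 0.0
    for lag in range(max_lag + 1):
        n = min(len(ref), len(cap) - lag)
        if n <= 0:
            break
        num = sum(ref[i] * cap[i + lag] for i in range(n))
        den = math.sqrt(
            sum(ref[i] ** 2 for i in range(n)) * sum(cap[i + lag] ** 2 for i in range(n))
        )
        score = num / den if den else 0.0
        if score > best_score:
            best, best_score = lag, score
    return best, best_score
```

Multiplying the returned lag by the window length gives an approximate delay in seconds. A high score during far-end-only sections means the microphone envelope closely follows the playback envelope, which is consistent with strong acoustic leakage.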

A fixed test sequence helps:

0-5s: near-end silence, far-end speech
5-10s: near-end speech, far-end silence
10-15s: double talk
15-20s: near-end silence, room noise only

After each firmware or parameter change, compare the same type of segment instead of relying on a full-call impression.
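Cutting every capture into the same named segments makes that comparison mechanical. The sketch below mirrors the schedule above; the segment names and output naming are illustrative, and it assumes the test sequence started at the beginning of the recording.

```python
import wave

# (name, start_s, end_s) following the fixed test sequence.
SEGMENTS = [
    ("far_end_only", 0, 5),
    ("near_end_only", 5, 10),
    ("double_talk", 10, 15),
    ("room_noise", 15, 20),
]


def split_segments(path, prefix, segments=SEGMENTS):
    """Write one WAV per segment and return the list of written paths."""
    written = []
    with wave.open(path, "rb") as src:
        rate = src.getframerate()
        params = src.getparams()
        for name, start_s, end_s in segments:
            # Clamp the start so short recordings yield empty segments, not errors.
            src.setpos(min(int(start_s * rate), src.getnframes()))
            frames = src.readframes(int((end_s - start_s) * rate))
            out_path = f"{prefix}_{name}.wav"
            with wave.open(out_path, "wb") as dst:
                dst.setparams(params)  # frame count is corrected on close
                dst.writeframes(frames)
            written.append(out_path)
    return written
```

Applying the same split to input.wav and ref_out.wav from two builds gives matching files, for example the double-talk segment of each build, that can be loaded side by side in Audacity.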

Common Decision Paths

An AEC_DUMP review can be organized as several decision paths:

reverse.wav is empty or broken
  -> check render path and AEC reverse stream first

input.wav is clipped
  -> check microphone gain, Audio HAL, speaker volume, acoustic path

input.wav is clean but ref_out.wav is damaged
  -> check APM settings, AGC/NS strength, double-talk behavior, delay

settings.txt is unexpected
  -> fix sample rate, channels, and 3A switches before tuning

The point is not to solve everything in one pass. The point is to locate the layer first: capture is bad, reverse reference is missing, APM configuration is wrong, or the processing strategy needs tuning.
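The decision paths can be written down as a small triage function so that every review reports the same first layer. This is only a sketch: the boolean inputs come from whatever checks the reviewer runs, and the ordering (configuration first, then reference, capture, processing) is one reasonable choice, not a rule from WebRTC.

```python
def first_layer_to_check(settings_match, reverse_has_signal, input_clipped, output_damaged):
    """Map review observations to the first layer worth investigating."""
    if not settings_match:
        return "configuration: fix sample rate, channels, and 3A switches before tuning"
    if not reverse_has_signal:
        return "render path: check the AEC reverse stream"
    if input_clipped:
        return "capture: check microphone gain, Audio HAL, speaker volume, acoustic path"
    if output_damaged:
        return "processing: check APM settings, AGC/NS strength, double-talk behavior, delay"
    return "no layer flagged: capture again and compare"
```

For example, first_layer_to_check(True, True, True, True) reports the capture layer, because a clipped input makes any judgment about the processed output unreliable.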

Relationship to Device Acoustic Tuning

AEC_DUMP is not isolated from device acoustic work. It belongs to the same loop as enclosure tuning, sample-rate adaptation, and AEC delay adjustment.

If input.wav shows a large amount of speaker energy entering the microphone, changing AEC parameters may only reduce the symptom. The device structure may need attention: speaker outlet, microphone sealing, enclosure reflection, pickup distance, and playback loudness.

If settings.txt shows sample-rate or channel assumptions that do not match the real device path, align software configuration before evaluating AEC delay. Otherwise, parameter tuning becomes guesswork.

A safer order is:

capture AEC_DUMP
  -> inspect input / reverse / ref_out / settings
  -> fix obvious capture or config problems
  -> adjust acoustic structure or playback level
  -> tune AEC delay and 3A parameters
  -> capture again and compare

For embedded WebRTC audio work, the reusable rule is simple: visualize first, tune second.

Conclusion

AEC_DUMP does not replace real call testing, and it does not prove that a device sounds good in every environment. But it makes Android WebRTC audio problems recordable, comparable, and reviewable.

For Android WebRTC devices, input.wav, reverse.wav, ref_out.wav, and settings.txt represent different layers of evidence. First confirm that the far-end reference exists, then check whether raw microphone input is healthy, compare audio before and after 3A, and finally verify sample rate, channels, and processing switches.

The value is not adding more ceremony to debugging. The value is avoiding the vague conclusion that "AEC is bad" whenever audio quality fails. The actual issue may be microphone gain, enclosure acoustics, sample-rate assumptions, render reference, delay alignment, or APM configuration. Splitting the evidence makes the fix path much more stable.
