How I fixed a web audio echo problem with a 5-second delay
if you want to do anything with audio in the browser, you’ll have to deal with the Web Audio API. it’s what powers everything — from music apps to AI voice chats.
recently I was building a real-time user ↔ AI voice conversation app, CrystaCode.ai.
everything worked fine except one weird thing:
👉 during the first few seconds of each conversation, the AI’s voice got picked up by the mic. after that? no echo at all.
I spent days trying everything: different routing, gain nodes, worklets, custom DSP, RNNoise — nothing fixed it. then, out of frustration, I added one line:
```javascript
await new Promise(res => setTimeout(res, 5000));
```
between starting the mic and sending audio to the AI. and boom — echo gone forever.
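in context, the fix looked roughly like this — a sketch, where `getMicStream` and `startStreaming` are hypothetical stand-ins for app-specific code, not functions from the original app:

```javascript
// sketch: open the mic first so the browser's AEC starts adapting,
// wait out the warm-up window, and only then stream audio to the AI.
// getMicStream / startStreaming are hypothetical app-level functions.
async function startConversation(
  getMicStream,
  startStreaming,
  warmupMs = 5000,
  sleep = ms => new Promise(res => setTimeout(res, ms))
) {
  const stream = await getMicStream(); // AEC begins learning the room here
  await sleep(warmupMs);               // give it time to converge
  return startStreaming(stream);       // echo-free audio from here on
}
```

the dependencies are injectable only so the ordering is easy to test; in the app itself they'd just be the real mic and streaming code.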
wait... why did this work?
when you call:
```javascript
navigator.mediaDevices.getUserMedia({
  audio: { echoCancellation: true, noiseSuppression: true, autoGainControl: true }
});
```
the browser gives you a mic stream with built-in acoustic echo cancellation (AEC).
that AEC runs natively inside Chrome, Firefox, Safari, or Edge, usually the same code used in WebRTC (like webrtc::EchoCanceller3 in Chrome).
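one caveat: those constraints are requests, not guarantees. the browser reports what it actually applied via `MediaStreamTrack.getSettings()`. a small helper (the track is duck-typed here so the logic runs anywhere, not just in a browser):

```javascript
// check whether the browser actually enabled echo cancellation on a track.
// `track` is a MediaStreamTrack (or anything exposing getSettings()).
function aecEnabled(track) {
  const settings = track.getSettings();
  return settings.echoCancellation === true;
}
```

in the real app you'd call it as `aecEnabled(stream.getAudioTracks()[0])` after `getUserMedia` resolves.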
the browser AEC works like this:
- it watches what’s playing from your speakers
- it listens to your microphone at the same time
- it learns the relationship between the two — basically: “when I play this, the mic hears that”
- it builds a model of your room acoustics and device path
- after a few seconds, it knows how to subtract the speaker sound from the mic signal
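that "learns the relationship" step is, at its core, an adaptive filter. here's a toy NLMS (normalized least-mean-squares) sketch of the idea — real browser AECs like WebRTC's are far more sophisticated, this just shows why convergence takes time:

```javascript
// toy NLMS echo canceller: learns weights w that model the speaker→mic path,
// then subtracts the predicted echo from the mic signal.
function makeNlmsCanceller(taps = 8, mu = 0.5) {
  const w = new Float64Array(taps); // adaptive filter: estimated room response
  const x = new Float64Array(taps); // recent far-end (speaker) samples
  return function cancel(farEnd, mic) {
    // shift far-end history and add the newest speaker sample
    x.copyWithin(1, 0);
    x[0] = farEnd;
    // predict the echo the mic should hear under the current room model
    let echoEstimate = 0;
    for (let i = 0; i < taps; i++) echoEstimate += w[i] * x[i];
    const err = mic - echoEstimate; // residual = mic minus predicted echo
    // NLMS update: nudge the weights toward a better room model
    let norm = 1e-9;
    for (let i = 0; i < taps; i++) norm += x[i] * x[i];
    for (let i = 0; i < taps; i++) w[i] += (mu * err * x[i]) / norm;
    return err; // the "clean" near-end signal
  };
}
```

early on the weights are all zero, so the full echo leaks through; only after many samples does the residual shrink toward zero — which is exactly the warm-up window I kept running into.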
so it needs time to adapt — normally 2–5 seconds. during that time, the AEC hasn’t finished learning yet, so a little bit of echo leaks through.
the bug wasn’t a bug
my web audio code was fine. the browser just needed time to adapt.
by delaying the start of sending mic audio to the AI for 5 seconds, I let the AEC finish adapting. after that, no more echo.
why browsers behave like this
browser AEC uses adaptive algorithms. they constantly adjust to changing acoustics: speaker position, device, user movement, etc.
so, every time you start the mic, the algorithm resets and starts learning again. the first few seconds are a "training phase". if you stream mic audio too early, it still contains echo. after it converges — everything becomes clean.
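a fixed `setTimeout` works, but you can also keep the pipeline running and simply drop frames during that training phase. a hypothetical frame gate (names are illustrative, `now` is injectable only for testing):

```javascript
// returns a gate that swallows audio frames during the AEC warm-up window,
// then passes them through unchanged.
function makeWarmupGate(warmupMs = 5000, now = () => Date.now()) {
  const startedAt = now();
  return function gate(frame) {
    if (now() - startedAt < warmupMs) return null; // still adapting: drop it
    return frame;                                  // converged: send to the AI
  };
}
```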
TL;DR
- problem: ai voice was picked up by mic in first seconds
- reason: browser AEC needs time to adapt
- fix: delay sending mic data for ~5s before starting conversation