How I fixed a web audio echo problem with a 5-second delay
if you want to do anything with audio in the browser, you’ll have to deal with the Web Audio API. it’s what powers everything — from music apps to AI voice chats.
recently I was building a real-time user ↔ AI voice conversation app, CrystaCode.ai.
everything worked fine except one weird thing:
👉 during the first few seconds of each conversation, the AI’s voice got picked up by the mic. after that? no echo at all.
I spent days trying everything: different routing, gain nodes, worklets, custom DSP, RNNoise — nothing fixed it. then, out of frustration, I added one line:
```javascript
await new Promise(res => setTimeout(res, 5000));
```
between starting the mic and sending audio to the AI. and boom — echo gone forever.
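in context, the fix looked roughly like this — a sketch, where `getMicStream` and `startStreaming` are hypothetical stand-ins for app-specific code, not functions from the original app:

```javascript
// sketch: open the mic first so the browser's AEC starts adapting,
// wait out the warm-up window, and only then stream audio to the AI.
// getMicStream / startStreaming are hypothetical app-level functions.
async function startConversation(
  getMicStream,
  startStreaming,
  warmupMs = 5000,
  sleep = ms => new Promise(res => setTimeout(res, ms))
) {
  const stream = await getMicStream(); // AEC begins learning the room here
  await sleep(warmupMs);               // give it time to converge
  return startStreaming(stream);       // echo-free audio from here on
}
```

the dependencies are injectable only so the ordering is easy to test; in the app itself they'd just be the real mic and streaming code.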
wait... why did this work?
when you call:
```javascript
navigator.mediaDevices.getUserMedia({
  audio: { echoCancellation: true, noiseSuppression: true, autoGainControl: true }
});
```
the browser gives you a mic stream with built-in acoustic echo cancellation (AEC).
that AEC runs natively inside Chrome, Firefox, Safari, or Edge, usually the same code used in WebRTC (like webrtc::EchoCanceller3 in Chrome).
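one caveat: those constraints are requests, not guarantees. the browser reports what it actually applied via `MediaStreamTrack.getSettings()`. a small helper (the track is duck-typed here so the logic runs anywhere, not just in a browser):

```javascript
// check whether the browser actually enabled echo cancellation on a track.
// `track` is a MediaStreamTrack (or anything exposing getSettings()).
function aecEnabled(track) {
  const settings = track.getSettings();
  return settings.echoCancellation === true;
}
```

in the real app you'd call it as `aecEnabled(stream.getAudioTracks()[0])` after `getUserMedia` resolves.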
the browser AEC works like this:
- it watches what’s playing from your speakers
- it listens to your microphone at the same time
- it learns the relationship between the two — basically: “when I play this, the mic hears that”
- it builds a model of your room acoustics and device path
- after a few seconds, it knows how to subtract the speaker sound from the mic signal
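that "learns the relationship" step is, at its core, an adaptive filter. here's a toy NLMS (normalized least-mean-squares) sketch of the idea — real browser AECs like WebRTC's are far more sophisticated, this just shows why convergence takes time:

```javascript
// toy NLMS echo canceller: learns weights w that model the speaker→mic path,
// then subtracts the predicted echo from the mic signal.
function makeNlmsCanceller(taps = 8, mu = 0.5) {
  const w = new Float64Array(taps); // adaptive filter: estimated room response
  const x = new Float64Array(taps); // recent far-end (speaker) samples
  return function cancel(farEnd, mic) {
    // shift far-end history and add the newest speaker sample
    x.copyWithin(1, 0);
    x[0] = farEnd;
    // predict the echo the mic should hear under the current room model
    let echoEstimate = 0;
    for (let i = 0; i < taps; i++) echoEstimate += w[i] * x[i];
    const err = mic - echoEstimate; // residual = mic minus predicted echo
    // NLMS update: nudge the weights toward a better room model
    let norm = 1e-9;
    for (let i = 0; i < taps; i++) norm += x[i] * x[i];
    for (let i = 0; i < taps; i++) w[i] += (mu * err * x[i]) / norm;
    return err; // the "clean" near-end signal
  };
}
```

early on the weights are all zero, so the full echo leaks through; only after many samples does the residual shrink toward zero — which is exactly the warm-up window I kept running into.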
so it needs time to adapt — normally 2–5 seconds. during that time, the AEC hasn’t finished learning yet, so a little bit of echo leaks through.
the bug wasn’t a bug
my web audio code was fine. the browser just needed time to adapt.
by delaying the start of sending mic audio to the AI for 5 seconds, I let the AEC finish adapting. after that, no more echo.
why browsers behave like this
browser AEC uses adaptive algorithms. they constantly adjust to changing acoustics: speaker position, device, user movement, etc.
so, every time you start the mic, the algorithm resets and starts learning again. the first few seconds are a "training phase". if you stream mic audio too early, it still contains echo. after it converges — everything becomes clean.
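a fixed `setTimeout` works, but you can also keep the pipeline running and simply drop frames during that training phase. a hypothetical frame gate (names are illustrative, `now` is injectable only for testing):

```javascript
// returns a gate that swallows audio frames during the AEC warm-up window,
// then passes them through unchanged.
function makeWarmupGate(warmupMs = 5000, now = () => Date.now()) {
  const startedAt = now();
  return function gate(frame) {
    if (now() - startedAt < warmupMs) return null; // still adapting: drop it
    return frame;                                  // converged: send to the AI
  };
}
```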
TL;DR
- problem: ai voice was picked up by mic in first seconds
- reason: browser AEC needs time to adapt
- fix: delay sending mic data for ~5s before starting conversation