Here's How
How I used the Web Audio API to simulate eight playback environments, run spectral analysis, and generate actionable mix feedback — all in the browser, with zero dependencies.
Every mixing engineer knows the ritual. You finish a mix, bounce it, transfer it to your phone. Walk to the car. Play it on the kitchen Bluetooth speaker. Try a pair of cheap earbuds. Each loop takes ten minutes. By the third one you've lost your reference and the session momentum is gone.
There are plenty of analyser plugins that tell you what a mix is — spectral balance, loudness, stereo width. But there are very few tools that pressure-test what a mix becomes on the systems where people actually listen.
That's the problem CX Mix Translator solves. Drop a stereo bounce, switch between eight monitoring simulations with level-matched A/B, and walk away with six structured diagnostics and three ranked action items. Audio never leaves the browser.
This post walks through the engineering decisions behind it.
Why the browser?
I wanted zero friction. No plugin installation, no DAW integration, no account creation, no upload to a server. A mixing engineer at 2am should be able to open a URL, drag in a file, and get answers.
The Web Audio API gives you everything you need: decoding, biquad filters, channel splitting, gain control, and an offline rendering context for analysis. The trade-off is that browser audio isn't sample-accurate the way a native DSP plugin is — but for a translation-checking tool, perceptual accuracy matters more than sample accuracy. You're asking "will the vocal disappear on a phone?" not "is this filter at exactly 3.2 kHz?"
The entire tool ships as a single self-contained HTML file. No framework, no build step, no npm install. Open the file, it works.
Simulating eight playback contexts
Each monitoring mode is a deliberately lightweight approximation of a playback context, not a hardware emulation. The goal isn't to perfectly model an iPhone 14 speaker — it's to expose the translation risks that context creates.
Here's what the eight modes target:
Mono — L+R summed. Exposes phase coherence issues and width-dependent elements that collapse.
Phone — Bandwidth-limited to roughly 380 Hz–6.8 kHz with upper-mid emphasis and mono-leaning imaging. Tests whether the vocal survives on a tiny driver.
Laptop — Reduced low-end, narrowed stereo width, modest presence boost. The most common first-listen environment.
Small speaker — Boxy upper-bass emphasis, presence bump, narrow bandwidth. Kitchen Bluetooth territory.
Car — Hyped lows, slight mid scoop, top-end lift. Tests whether the low end translates to a subwoofer-assisted environment.
Club — Heavier low-end energy and broad top-end presence. Tests mix behaviour under sustained loudness.
Low volume — Inverse equal-loudness contour at quiet monitoring level. Tests whether the mix reads when someone plays it softly.
Bypass — Flat reference. The A/B anchor.
Building simulations from biquad chains
Each mode is constructed from cascaded Web Audio BiquadFilterNode instances. A phone simulation, for example, chains a highpass filter (cutting below 380 Hz), a lowpass (cutting above 6.8 kHz), and a peaking filter boosting the 2–4 kHz presence range:
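```javascript
// Phone simulation (simplified)
// Assumes `audioCtx` is an AudioContext and `source` is an audio source node.
// Note: for 'highpass' and 'lowpass', the Web Audio API interprets Q as
// resonance in dB, not as a linear Q.
const hp = audioCtx.createBiquadFilter();
hp.type = 'highpass';
hp.frequency.value = 380;
hp.Q.value = 0.7;
const lp = audioCtx.createBiquadFilter();
lp.type = 'lowpass';
lp.frequency.value = 6800;
lp.Q.value = 0.7;
const presence = audioCtx.createBiquadFilter();
presence.type = 'peaking';
presence.frequency.value = 3000;
presence.gain.value = 3.5; // dB of boost
presence.Q.value = 1.2;
// connect() returns the node it was given, so the chain reads left to right.
source.connect(hp).connect(lp).connect(presence).connect(audioCtx.destination);
```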
The car and club modes add low-shelf boosts. The low-volume mode applies an inverse equal-loudness (Fletcher-Munson-style) curve: it attenuates the bass and treble extremes relative to the mids, simulating the perceptual loss of lows and highs at quiet playback levels.
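Because every mode is a cascade of biquads, the chains lend themselves to a data-driven description. What follows is a minimal sketch, not the tool's actual code: PHONE_CHAIN reuses the values from the phone example above, while buildChain and the spec object shape are hypothetical names introduced for illustration. It relies only on createBiquadFilter() and connect() from the Web Audio API.

```javascript
// Hypothetical spec format: one plain object per biquad stage.
// Values mirror the phone example in this post.
const PHONE_CHAIN = [
  { type: 'highpass', frequency: 380, Q: 0.7 },
  { type: 'lowpass', frequency: 6800, Q: 0.7 },
  { type: 'peaking', frequency: 3000, Q: 1.2, gain: 3.5 },
];

// Builds the cascade and returns its first and last node, so the caller can
// wire source -> input and output -> whatever comes next.
function buildChain(ctx, specs) {
  if (specs.length === 0) throw new Error('chain needs at least one stage');
  const nodes = specs.map((spec) => {
    const node = ctx.createBiquadFilter();
    node.type = spec.type;
    node.frequency.value = spec.frequency;
    if (spec.Q !== undefined) node.Q.value = spec.Q;
    if (spec.gain !== undefined) node.gain.value = spec.gain;
    return node;
  });
  // Connect each stage to the one that follows it.
  for (let i = 0; i < nodes.length - 1; i++) {
    nodes[i].connect(nodes[i + 1]);
  }
  return { input: nodes[0], output: nodes[nodes.length - 1] };
}
```

In a browser, the returned input would receive the source node and the returned output would feed the next stage (a trim gain or the context destination). Keeping modes as data also makes it straightforward to list all eight side by side and compare them.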
Stereo width manipulation
Several modes narrow or collapse the stereo image. This uses an M/S (Mid/Side) matrix built from a channel splitter and merger:
```javascript
// M/S width control
// Assumes `audioCtx` exists and `width` holds the mode's stereo width setting.
const splitter = audioCtx.createChannelSplitter(2);
const merger = audioCtx.createChannelMerger(2);
const midGain = audioCtx.createGain();
const sideGain = audioCtx.createGain();
// Width = 0.0 (mono) to 1.0 (full stereo)
midGain.gain.value = 1.0;
sideGain.gain.value = width; // e.g. 0.3 for "laptop"
// Split L/R, derive Mid (L+R) and Side (L-R), scale, recombine
```
Mono mode sets sideGain to zero. Laptop and phone modes reduce it to 0.2–0.4. This catches elements that only exist in the sides — a wide reverb tail, a panned synth, a stereo-widened vocal — and flags them as translation risks.
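To make the matrix concrete, here is the same math written sample by sample on plain arrays instead of audio nodes. It is a sketch under one assumption: Mid and Side are normalised by 0.5 so that a width of 1.0 returns the input unchanged. applyWidth is a hypothetical helper, not part of the tool.

```javascript
// Sample-domain M/S width: Mid = (L + R) / 2, Side = (L - R) / 2.
// width = 0 collapses to mono, width = 1 leaves the signal untouched.
function applyWidth(left, right, width) {
  if (left.length !== right.length) throw new Error('channel length mismatch');
  const outL = new Float32Array(left.length);
  const outR = new Float32Array(right.length);
  for (let i = 0; i < left.length; i++) {
    const mid = 0.5 * (left[i] + right[i]);
    const side = 0.5 * (left[i] - right[i]) * width;
    outL[i] = mid + side; // L' = Mid + Side
    outR[i] = mid - side; // R' = Mid - Side
  }
  return { left: outL, right: outR };
}
```

A hard-panned element (signal in the left channel only) shows why the narrow modes are revealing: at width 0 it drops to half amplitude in both channels, roughly 6 dB down, while a centred element passes through unchanged.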
Level-matched A/B
This is the detail that makes the tool trustworthy. Each simulation applies a perceptual gain trim so that switching between Bypass and any mode reflects translation differences, not loudness differences. Without this, the louder signal always sounds better and the comparison is useless.
The trim values are calibrated by ear against pink noise through each filter chain, then fine-tuned against a set of reference mixes across genres.
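Calibration by ear cannot be replaced by a formula, but a numeric starting point can shorten it. One assumed approach, shown purely for illustration, is to match RMS level between the bypassed and the processed signal and then adjust by ear from there. The names rms, matchTrimDb, and dbToGain are hypothetical helpers.

```javascript
// Root-mean-square level of a block of samples.
function rms(samples) {
  if (samples.length === 0) return 0;
  let sum = 0;
  for (let i = 0; i < samples.length; i++) sum += samples[i] * samples[i];
  return Math.sqrt(sum / samples.length);
}

// Trim in dB to apply to the processed signal so its RMS matches the reference.
function matchTrimDb(reference, processed) {
  const ref = rms(reference);
  const proc = rms(processed);
  if (ref === 0 || proc === 0) return 0; // silence: nothing to match
  return 20 * Math.log10(ref / proc);
}

// Convert a dB trim to the linear value a GainNode expects.
function dbToGain(db) {
  return Math.pow(10, db / 20);
}
```

Plain RMS ignores how the ear weights different frequencies, so a band-limited mode such as Phone will usually still need a manual correction after this step.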
Spectral analysis
The tool runs six diagnostic categories:
Mono compatibility — does energy drop significantly when summed?
Vocal presence — is the 1–5 kHz range strong enough to cut through?
Low-end translation — does the sub/bass balance hold across contexts?
Harshness risk — is there excessive energy in the 2.5–6 kHz range?
Stereo width — is the mix over-reliant on side information?
Dynamics / limiting — is the crest factor dangerously low?
Offline band analysis
Rather than running the analyser in real time (which ties results to playback position), I render the full file through parallel OfflineAudioContext instances, one per frequency band:
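```javascript
// Renders one frequency band offline and returns its whole-file RMS.
async function analyseBand(buffer, lowFreq, highFreq) {
  // One output channel: a stereo buffer is downmixed to mono at the destination.
  const offline = new OfflineAudioContext(
    1, buffer.length, buffer.sampleRate
  );
  const source = offline.createBufferSource();
  source.buffer = buffer;
  const bp = offline.createBiquadFilter();
  bp.type = 'bandpass';
  // Centre frequency is the geometric mean of the band edges.
  bp.frequency.value = Math.sqrt(lowFreq * highFreq);
  // Assumption: Q = centre / bandwidth. The original only says
  // "calculated from bandwidth" and does not give the formula.
  bp.Q.value = bp.frequency.value / (highFreq - lowFreq);
  source.connect(bp).connect(offline.destination);
  source.start();
  const rendered = await offline.startRendering();
  // computeRMS: helper returning the RMS of a Float32Array.
  return computeRMS(rendered.getChannelData(0));
}
```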
Six bands — sub, low, low-mid, mid, presence, air — each rendered offline, producing RMS values that feed the heuristic warning system. This gives whole-file averages independent of where the playhead happens to be.
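The analyseBand function above leaves two pieces unspecified: the RMS helper and the band edges. A minimal sketch of both follows. The band edges are illustrative assumptions (the post names the six bands but not their frequencies), bandQ uses the textbook definition Q = centre / bandwidth, and toDb is a hypothetical convenience for reading results.

```javascript
// Assumed band edges in Hz; the post only names the bands.
const BANDS = [
  { name: 'sub', low: 20, high: 60 },
  { name: 'low', low: 60, high: 250 },
  { name: 'low-mid', low: 250, high: 500 },
  { name: 'mid', low: 500, high: 2000 },
  { name: 'presence', low: 2000, high: 6000 },
  { name: 'air', low: 6000, high: 16000 },
];

// RMS over a Float32Array (or any array of samples).
function computeRMS(samples) {
  if (samples.length === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < samples.length; i++) {
    sumSquares += samples[i] * samples[i];
  }
  return Math.sqrt(sumSquares / samples.length);
}

// Centre frequency (geometric mean) for a band covering low..high.
function bandCentre(low, high) {
  return Math.sqrt(low * high);
}

// Q for a bandpass whose bandwidth spans low..high.
function bandQ(low, high) {
  return bandCentre(low, high) / (high - low);
}

// Linear RMS to dB relative to full scale; silence maps to -Infinity.
function toDb(value) {
  return 20 * Math.log10(value);
}
```

A single second-order bandpass has gentle skirts, so neighbouring bands overlap. For threshold heuristics that compare bands against each other this is usually acceptable, but the absolute numbers should not be read as a calibrated spectrum.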
Heuristic warnings
Each diagnostic combines the band RMS values with time-domain statistics (peak-to-RMS ratio for dynamics, L/R correlation for mono compatibility, side-to-mid ratio for width). Simple threshold checks produce a status (pass / caution / warning), a one-sentence diagnosis, and a one-line recommendation.
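The three time-domain statistics named above are short to compute. The sketch below shows one plausible formulation of each, plus a threshold check that produces the three statuses. The function names and any threshold values passed in are assumptions for illustration, not the tool's calibrated numbers.

```javascript
// Peak-to-RMS ratio in dB. Heavily limited material scores low.
function crestFactorDb(samples) {
  let peak = 0;
  let sumSquares = 0;
  for (let i = 0; i < samples.length; i++) {
    peak = Math.max(peak, Math.abs(samples[i]));
    sumSquares += samples[i] * samples[i];
  }
  const rmsLevel = Math.sqrt(sumSquares / samples.length);
  return rmsLevel === 0 ? 0 : 20 * Math.log10(peak / rmsLevel);
}

// Zero-lag L/R correlation: +1 is mono-safe, -1 cancels completely in mono.
function correlation(left, right) {
  let lr = 0, ll = 0, rr = 0;
  for (let i = 0; i < left.length; i++) {
    lr += left[i] * right[i];
    ll += left[i] * left[i];
    rr += right[i] * right[i];
  }
  const denom = Math.sqrt(ll * rr);
  return denom === 0 ? 0 : lr / denom;
}

// RMS of Side (L - R) divided by RMS of Mid (L + R).
function sideToMidRatio(left, right) {
  let mid = 0, side = 0;
  for (let i = 0; i < left.length; i++) {
    const m = left[i] + right[i];
    const s = left[i] - right[i];
    mid += m * m;
    side += s * s;
  }
  return mid === 0 ? Infinity : Math.sqrt(side / mid);
}

// Maps a "lower is worse" metric onto pass / caution / warning.
function statusFor(value, warningBelow, cautionBelow) {
  if (value < warningBelow) return 'warning';
  if (value < cautionBelow) return 'caution';
  return 'pass';
}
```

As a usage example, statusFor(crestFactorDb(samples), 6, 9) would flag a crest factor under 6 dB as a warning and under 9 dB as a caution; both numbers are placeholders.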
These findings roll up into a Quick Compare row showing translation risk per mode, and a Top 3 action list ranked by translation impact.
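Ranking the action list can be as simple as a two-key sort. The sketch below is an assumption about how such a ranking might work, not a description of the tool's logic: the SEVERITY weights, the finding shape, and the impact score (a hypothetical 0 to 1 estimate of how broadly an issue affects playback modes) are all invented for illustration.

```javascript
// Illustrative severity weights for the three statuses.
const SEVERITY = { warning: 2, caution: 1, pass: 0 };

// Each finding: { id, status, impact }.
// Returns the non-passing findings, most severe first, ties broken by impact.
function topActions(findings, limit = 3) {
  return findings
    .filter((f) => f.status !== 'pass')
    .sort((a, b) =>
      SEVERITY[b.status] - SEVERITY[a.status] || b.impact - a.impact
    )
    .slice(0, limit);
}
```

Because filter() returns a new array, the caller's list of findings is left in its original order.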
Design decisions
Three principles shaped the interface:
Decisions per minute, not features per screen. Every control lives under the dominant hand. Keyboard shortcuts handle everything: 1–8 switch modes, B flips A/B, Space plays, O opens a file, arrows seek. The goal is to audition all eight modes in under sixty seconds.
Diagnosis and audition share one frame. Findings sit to the right of the waveform, not behind a tab. There are no drill-downs, no second pages, no settings panels. Everything the engineer needs is visible at once.
Engineered tool, not a dashboard. Compact typography, hairline dividers, 4px radii, tracked uppercase labels. The visual language borrows from broadcast monitoring and metering tools, not SaaS dashboards. It should feel like a piece of studio equipment that happens to live in a browser tab.
Export
The session sheet exports as a JPG rendered to a 2× DPR canvas — not a screenshot. This matters because browser screenshot APIs don't capture custom fonts or fine spacing reliably. Instead, the export draws every element programmatically, including manual letter-spacing fallback for the brand typeface (IBM Plex Mono) on the canvas context.
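Manual letter-spacing on a canvas usually means drawing one glyph at a time and advancing the pen by the measured width plus the tracking. The sketch below shows that idea together with a fixed 2× backing store. Both function names are hypothetical, and the code uses only fillText(), measureText(), getContext(), and scale() from the Canvas 2D API.

```javascript
// Draws text one character at a time, adding `tracking` pixels after each
// glyph. Returns the x position after the last character.
function drawTrackedText(ctx, text, x, y, tracking) {
  let cursor = x;
  for (const ch of text) {
    ctx.fillText(ch, cursor, y);
    cursor += ctx.measureText(ch).width + tracking;
  }
  return cursor;
}

// Sizes a canvas for a fixed export scale and returns its 2D context,
// scaled so drawing code can keep using CSS-pixel coordinates.
function createExportContext(canvas, width, height, scale = 2) {
  canvas.width = width * scale;
  canvas.height = height * scale;
  const ctx = canvas.getContext('2d');
  ctx.scale(scale, scale);
  return ctx;
}
```

Drawing per character discards kerning pairs, which matters little for a monospaced face such as IBM Plex Mono. The font must also be loaded before drawing, otherwise measureText() reports the fallback font's widths.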
What I'd build next
LUFS-I and LRA measurement panel
Reference-track loading with matched A/B comparison
Waveform-aware analysis — chorus detection so warnings can flag the loudest section explicitly
Save and recall mode comparison snapshots
The stack
HTML, CSS, Vanilla JavaScript, Web Audio API, Canvas 2D. One file. No dependencies. No build step.
The tool is live at cx-mix-translator.vercel.app. It's the companion to CX Sonic Analyzer, which handles real-time spectral analysis during mixing. Together they cover the two phases where engineers need the most feedback: while mixing (Sonic Analyzer) and before bouncing (Mix Translator).
I'm Christian Okonkwo — I build AI-powered fintech tools, music-tech software, and sports analytics products as a solo builder. More at my portfolio or on GitHub.