Reading a music-theory textbook is one way to drill relative pitch recognition. Loading a webpage is another. This is a ~350-line ear-training trainer that plays a reference C4, then a target note (or a two-note interval pair), through Web Audio and asks the user to identify it. The interesting parts: the one-line equal-temperament frequency formula, the three Web Audio footguns I hit while building it (`AudioContext.state === "suspended"`, missing the first note via `osc.start(currentTime)`, and the click-on-stop without an ADSR envelope), and a small DOM-free pitch module that gets 21 tests under `node --test`.
🌐 Demo: https://sen.ltd/portfolio/pitch-trainer/
📦 GitHub: https://github.com/sen-ltd/pitch-trainer
Three modes:
| Mode | Question | Choices |
|---|---|---|
| Major scale | C4 → one note in the major scale | Do / Re / Mi / Fa / Sol / La / Si |
| Chromatic | C4 → one of 12 pitch classes | C / C# / D / ... / B |
| Interval | Two distinct notes sequentially, then held together | P1 / m2 / M2 / ... / M7 |
## The equal-temperament frequency formula is one line

12-TET anchored at A4 = 440 Hz gives:
```js
export function semitoneToHz(semitone) {
  if (!Number.isFinite(semitone)) return 0;
  return 440 * Math.pow(2, semitone / 12);
}
```
`semitone` is the offset from A4. So:

- `semitoneToHz(0)` = 440 (A4)
- `semitoneToHz(12)` = 880 (A5, one octave up)
- `semitoneToHz(-9)` = 261.63 (C4, middle C)
For users who think in MIDI, the conversion is `semitone = midi - 69` because A4 is MIDI 69.
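That MIDI mapping composes directly with `semitoneToHz`. A small sketch — `midiToHz` is a hypothetical helper for illustration, not part of the repo:

```javascript
// semitoneToHz as shown above, inlined so this sketch is self-contained.
function semitoneToHz(semitone) {
  if (!Number.isFinite(semitone)) return 0;
  return 440 * Math.pow(2, semitone / 12);
}

// Hypothetical helper: A4 is MIDI 69, so the offset from A4 is midi - 69.
function midiToHz(midi) {
  return semitoneToHz(midi - 69);
}

midiToHz(69); // → 440 (A4)
midiToHz(60); // → ~261.63 (middle C)
```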
The `!Number.isFinite(semitone)` guard returns 0 instead of letting `NaN` reach `OscillatorNode.frequency.value`. Web Audio's response to a `NaN` frequency is the audio thread silently dropping the note, which is one of the harder things to debug from the JS side.
The unit test pins precision:

```js
test("semitoneToHz computes equal-temperament semitones to within 1e-9", () => {
  const expected = 440 / Math.pow(2, 9 / 12); // C4
  assert.ok(Math.abs(semitoneToHz(-9) - expected) < 1e-9);
});
```
## Footgun 1: AudioContext ships in suspended state
Browsers refuse to make sound until the user has clicked or tapped at least once. That's the autoplay policy, and it's correct — but the failure mode is silent. The AudioContext starts in `state: "suspended"`, you can `createOscillator()` and `start()` perfectly normally, and nothing comes out.
The fix is to `resume()` inside a user-gesture handler:

```js
function ensureAudioContext() {
  if (!state.audioCtx) {
    state.audioCtx = new (window.AudioContext || window.webkitAudioContext)();
  }
  if (state.audioCtx.state === "suspended") {
    state.audioCtx.resume().catch(() => {});
  }
}
```
The "Start" click runs `ensureAudioContext()` before scheduling any notes. `resume()` returns a promise; ignore the rejection because there's nothing useful to do with it.
Backgrounded tabs are also a relevant case — Chrome and Safari both suspend an inactive context after a while. Re-checking `state` before every playback session is cheap and catches that.
## Footgun 2: osc.start(currentTime) drops the first note
Calling `OscillatorNode.start(ctx.currentTime)` looks correct but isn't. By the time the audio thread picks up the schedule, `ctx.currentTime` has already advanced past the time you told it, so the start is effectively in the past. The note either comes out clipped and quiet, or is dropped entirely.
Shift everything forward by a few milliseconds. I use a 12-ms attack ramp inside an ADSR envelope (which also fixes footgun 3, below), so the effective "earliest sound" is t0 + 0.012s rather than t0 itself.
```js
function playNote(semitoneFromA4, startOffset = 0) {
  const ctx = state.audioCtx;
  const osc = ctx.createOscillator();
  const gain = ctx.createGain();
  osc.frequency.value = semitoneToHz(semitoneFromA4);
  const t0 = ctx.currentTime + startOffset;
  const t1 = t0 + 0.012; // attack peak
  const t2 = t0 + 0.82;  // sustain end
  const t3 = t0 + 0.9;   // release end
  gain.gain.setValueAtTime(0, t0);
  gain.gain.linearRampToValueAtTime(0.35, t1);
  gain.gain.setValueAtTime(0.35, t2);
  gain.gain.exponentialRampToValueAtTime(0.0001, t3);
  osc.connect(gain).connect(ctx.destination);
  osc.start(t0);
  osc.stop(t3 + 0.01);
}
```
Sequencing the reference note and the target note is just two `playNote` calls with different `startOffset`s:

```js
playNote(rootSemitone, 0);
playNote(targetSemitone, NOTE_DURATION_SEC + GAP_BETWEEN_NOTES_SEC);
```
The audio thread schedules both ahead of time. The JS event loop can be sluggish in between; the notes still come out at the right time because the schedule lives on the audio side.
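Interval mode (two notes in sequence, then held together) needs one more slot. The offsets are pure arithmetic, so they can be sketched as a hypothetical helper — `intervalSchedule` isn't in the repo, and the constant values here are assumptions (only the 0.9 s note length is visible in `playNote` above):

```javascript
// Assumed constants: 0.9 s matches the envelope's release end above;
// the gap value is a guess for illustration.
const NOTE_DURATION_SEC = 0.9;
const GAP_BETWEEN_NOTES_SEC = 0.15;

// Hypothetical: compute the startOffset for each playNote call in
// interval mode. Root at 0, target one slot later, then both together.
function intervalSchedule() {
  const slot = NOTE_DURATION_SEC + GAP_BETWEEN_NOTES_SEC;
  return [
    { note: "root", offset: 0 },
    { note: "target", offset: slot },
    { note: "root", offset: 2 * slot },   // held together...
    { note: "target", offset: 2 * slot }, // ...same start time
  ];
}
```

Feeding each entry to `playNote` keeps all the timing on the audio thread; nothing here depends on the event loop being prompt.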
## Footgun 3: no ADSR → audible "tic" on start and stop
A naked oscillator goes 0 → 1 → 0 in zero time. That step is a discontinuity in the waveform, which the human ear hears as a click. The minimum viable Attack/Decay/Sustain/Release envelope above smooths the corners away:
| Phase | Duration | Gain |
|---|---|---|
| Attack | 12 ms | 0 → 0.35 (linear) |
| Sustain | rest | 0.35 |
| Release | 80 ms | 0.35 → 0.0001 (exponential) |
(Decay is skipped because Sustain is the same value as the end of Attack.)
Two technical gotchas:

- `exponentialRampToValueAtTime(0, ...)` throws. The exponential curve can't reach exactly zero. Use a near-zero target like `0.0001`.
- Order matters. `setValueAtTime` followed by `linearRampToValueAtTime` defines a ramp from the set value. Without the explicit `setValueAtTime(0, t0)`, the ramp would start from whatever the gain happened to be (1.0 by default), and you'd hear an instant 1.0 followed by an upward ramp, which is worse than the original click.
## Recent-N exclusion for non-repetitive questions
Pure `Math.random()` will sometimes give the user three Mis in a row, which feels broken even though it's statistically expected. The fix is to exclude the last few picks from the candidate pool:
```js
export function pickFromScale(scaleOffsets, recent = [], excludeN = 2, rng = Math.random) {
  if (!scaleOffsets.length) return null;
  const tail = recent.slice(-excludeN);
  const tailSet = new Set(tail);
  const candidates = scaleOffsets.filter((s) => !tailSet.has(s));
  const pool = candidates.length ? candidates : scaleOffsets;
  return pool[Math.floor(rng() * pool.length)];
}
```
If `excludeN` is bigger than the scale (unusual, but possible if the scale is tiny), the candidate pool becomes empty and the function falls back to the full scale. Better to occasionally repeat than to throw.
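The caller has to keep the `recent` list trimmed to the exclusion window. A minimal sketch of that bookkeeping — `nextQuestion` is illustrative, not the repo's exact code, with `pickFromScale` inlined from above so the sketch is self-contained:

```javascript
// pickFromScale as shown above.
function pickFromScale(scaleOffsets, recent = [], excludeN = 2, rng = Math.random) {
  if (!scaleOffsets.length) return null;
  const tailSet = new Set(recent.slice(-excludeN));
  const candidates = scaleOffsets.filter((s) => !tailSet.has(s));
  const pool = candidates.length ? candidates : scaleOffsets;
  return pool[Math.floor(rng() * pool.length)];
}

// Illustrative caller: push every pick, trim to the exclusion window.
const EXCLUDE_N = 2;
const recent = [];

function nextQuestion(scale, rng) {
  const pick = pickFromScale(scale, recent, EXCLUDE_N, rng);
  recent.push(pick);
  if (recent.length > EXCLUDE_N) recent.shift();
  return pick;
}
```

With an `rng` that always returns 0 and a major scale `[0, 2, 4, 5, 7, 9, 11]`, successive calls yield 0, 2, 4, then 0 again once 0 has aged out of the window.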
`rng` is injected so the tests can pin output:

```js
function rngFrom(values) {
  let i = 0;
  return () => values[i++ % values.length];
}

test("pickFromScale excludes the last `excludeN` recent picks", () => {
  const rng = rngFrom([0, 0, 0]);
  // Recent = [0, 2]. Major scale minus those starts at 4.
  const picked = pickFromScale(SCALES.major, [0, 2], 2, rng);
  assert.equal(picked, 4);
});
```
## Interval labels by absolute distance
Music theory distinguishes ascending and descending intervals; software ear-training rarely cares. The trainer asks "what interval did you hear?" and the user types a label without worrying about direction. So absolute-distance modulo 12 is the right interpretation:
```js
export function intervalName(semitones) {
  if (!Number.isFinite(semitones)) return "?";
  const folded = Math.abs(semitones) % 12;
  return INTERVAL_NAMES[folded];
}
```
- `intervalName(-7)` → `"P5"` (a descending fifth has the same quality as an ascending fifth).
- `intervalName(13)` → `"m2"` (a compound minor ninth folds to the minor second's label).
My first version used signed modulo (`((semitones % 12) + 12) % 12`), which silently inverted descending intervals to their octave-complement labels. The test failure (`intervalName(-7) === "P4"`, expected `"P5"`) caught it — that's the test paying for itself in less than a minute.
## TL;DR
- 12-TET frequency is `440 * 2 ** (semitone / 12)`.
- Always `resume()` the AudioContext inside a click handler. The "audio context is suspended and nothing plays" failure has no visible error.
- Don't call `osc.start(ctx.currentTime)` directly. Add at least 10 ms of attack so the audio thread has headroom.
- Wrap every oscillator in a `GainNode` with a short ADSR envelope. Use `linearRampToValueAtTime` for attack and `exponentialRampToValueAtTime` for release, and never ramp exponentially to zero (use 0.0001).
- For random question selection, exclude the last N picks before sampling; fall back to the full pool if everything got excluded.
- For ear-training UIs, interval labels by absolute distance (mod 12) match the user's mental model better than signed semitone math.
Source: https://github.com/sen-ltd/pitch-trainer — MIT, ~350 lines of JS, 21 unit tests, no build step, zero runtime dependencies.
🛠 Built by SEN LLC as part of an ongoing series of small, focused developer tools. Browse the full portfolio for more.
