Reading a music-theory textbook is one way to drill relative pitch recognition. Loading a webpage is another. This is a ~350-line ear-training trainer that plays a reference C4, then a target note (or a two-note interval pair), through Web Audio and asks the user to identify it. The interesting parts: the one-line equal-temperament frequency formula, the three Web Audio footguns I hit while building it (`AudioContext.state === "suspended"`, missing the first note via `osc.start(currentTime)`, and the click-on-stop without an ADSR envelope), and a small DOM-free pitch module that gets 21 tests under `node --test`.
🌐 Demo: https://sen.ltd/portfolio/pitch-trainer/
📦 GitHub: https://github.com/sen-ltd/pitch-trainer
Three modes:
| Mode | Question | Choices |
|---|---|---|
| Major scale | C4 → one note in the major scale | Do / Re / Mi / Fa / Sol / La / Si |
| Chromatic | C4 → one of 12 pitch classes | C / C# / D / ... / B |
| Interval | Two distinct notes sequentially, then held together | P1 / m2 / M2 / ... / M7 |
## The equal-temperament frequency formula is one line

12-TET anchored at A4 = 440 Hz gives:
```js
export function semitoneToHz(semitone) {
  if (!Number.isFinite(semitone)) return 0;
  return 440 * Math.pow(2, semitone / 12);
}
```
`semitone` is the offset from A4. So:

- `semitoneToHz(0)` = 440 (A4)
- `semitoneToHz(12)` = 880 (A5, one octave up)
- `semitoneToHz(-9)` = 261.63 (C4, middle C)
For users who think in MIDI, the conversion is `semitone = midi - 69` because A4 is MIDI 69.
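That MIDI mapping composes directly with `semitoneToHz`. A small sketch — `midiToHz` is a hypothetical helper for illustration, not part of the repo:

```javascript
// semitoneToHz as shown above, inlined so this sketch is self-contained.
function semitoneToHz(semitone) {
  if (!Number.isFinite(semitone)) return 0;
  return 440 * Math.pow(2, semitone / 12);
}

// Hypothetical helper: A4 is MIDI 69, so the offset from A4 is midi - 69.
function midiToHz(midi) {
  return semitoneToHz(midi - 69);
}

midiToHz(69); // → 440 (A4)
midiToHz(60); // → ~261.63 (middle C)
```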
The `!Number.isFinite(semitone)` guard returns 0 instead of letting `NaN` reach `OscillatorNode.frequency.value`. Web Audio's response to a `NaN` frequency is the audio thread silently dropping the note, which is one of the harder things to debug from the JS side.
The unit test pins precision:

```js
test("semitoneToHz computes equal-temperament semitones to within 1e-9", () => {
  const expected = 440 / Math.pow(2, 9 / 12); // C4
  assert.ok(Math.abs(semitoneToHz(-9) - expected) < 1e-9);
});
```
## Footgun 1: AudioContext ships in suspended state
Browsers refuse to make sound until the user has clicked or tapped at least once. That's the autoplay policy, and it's correct — but the failure mode is silent. The AudioContext starts in `state: "suspended"`, you can `createOscillator()` and `start()` perfectly normally, and nothing comes out.
The fix is to `resume()` inside a user-gesture handler:

```js
function ensureAudioContext() {
  if (!state.audioCtx) {
    state.audioCtx = new (window.AudioContext || window.webkitAudioContext)();
  }
  if (state.audioCtx.state === "suspended") {
    state.audioCtx.resume().catch(() => {});
  }
}
```
The "Start" click runs `ensureAudioContext()` before scheduling any notes. `resume()` returns a promise; ignore the rejection because there's nothing useful to do with it.
Backgrounded tabs are also a relevant case — Chrome and Safari both suspend an inactive context after a while. Re-checking `state` before every playback session is cheap and catches that.
## Footgun 2: osc.start(currentTime) drops the first note
Calling `OscillatorNode.start(ctx.currentTime)` looks correct but isn't. By the time the audio thread picks up the schedule, `ctx.currentTime` has already advanced past the time you told it, so the start is effectively in the past. The note either comes out clipped and quiet, or is dropped entirely.
Shift everything forward by a few milliseconds. I use a 12-ms attack ramp inside an ADSR envelope (which also fixes footgun 3, below), so the effective "earliest sound" is t0 + 0.012s rather than t0 itself.
```js
function playNote(semitoneFromA4, startOffset = 0) {
  const ctx = state.audioCtx;
  const osc = ctx.createOscillator();
  const gain = ctx.createGain();
  osc.frequency.value = semitoneToHz(semitoneFromA4);
  const t0 = ctx.currentTime + startOffset;
  const t1 = t0 + 0.012; // attack peak
  const t2 = t0 + 0.82;  // sustain end
  const t3 = t0 + 0.9;   // release end
  gain.gain.setValueAtTime(0, t0);
  gain.gain.linearRampToValueAtTime(0.35, t1);
  gain.gain.setValueAtTime(0.35, t2);
  gain.gain.exponentialRampToValueAtTime(0.0001, t3);
  osc.connect(gain).connect(ctx.destination);
  osc.start(t0);
  osc.stop(t3 + 0.01);
}
```
Sequencing the reference note and the target note is just two `playNote` calls with different `startOffset`s:

```js
playNote(rootSemitone, 0);
playNote(targetSemitone, NOTE_DURATION_SEC + GAP_BETWEEN_NOTES_SEC);
```
The audio thread schedules both ahead of time. The JS event loop can be sluggish in between; the notes still come out at the right time because the schedule lives on the audio side.
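Interval mode (two notes in sequence, then held together) needs one more slot. The offsets are pure arithmetic, so they can be sketched as a hypothetical helper — `intervalSchedule` isn't in the repo, and the constant values here are assumptions (only the 0.9 s note length is visible in `playNote` above):

```javascript
// Assumed constants: 0.9 s matches the envelope's release end above;
// the gap value is a guess for illustration.
const NOTE_DURATION_SEC = 0.9;
const GAP_BETWEEN_NOTES_SEC = 0.15;

// Hypothetical: compute the startOffset for each playNote call in
// interval mode. Root at 0, target one slot later, then both together.
function intervalSchedule() {
  const slot = NOTE_DURATION_SEC + GAP_BETWEEN_NOTES_SEC;
  return [
    { note: "root", offset: 0 },
    { note: "target", offset: slot },
    { note: "root", offset: 2 * slot },   // held together...
    { note: "target", offset: 2 * slot }, // ...same start time
  ];
}
```

Feeding each entry to `playNote` keeps all the timing on the audio thread; nothing here depends on the event loop being prompt.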
## Footgun 3: no ADSR → audible "tic" on start and stop
A naked oscillator goes 0 → 1 → 0 in zero time. That step is a discontinuity in the waveform, which the human ear hears as a click. The minimum viable Attack/Decay/Sustain/Release envelope above smooths the corners away:
| Phase | Duration | Gain |
|---|---|---|
| Attack | 12 ms | 0 → 0.35 (linear) |
| Sustain | rest | 0.35 |
| Release | 80 ms | 0.35 → 0.0001 (exponential) |
(Decay is skipped because Sustain is the same value as the end of Attack.)
Two technical gotchas:

- `exponentialRampToValueAtTime(0, ...)` throws. The exponential curve can't reach exactly zero. Use a near-zero target like `0.0001`.
- Order matters. `setValueAtTime` followed by `linearRampToValueAtTime` defines a ramp from the set value. Without the explicit `setValueAtTime(0, t0)`, the ramp would start from whatever the gain happened to be (1.0 by default), and you'd hear an instant 1.0 followed by an upward ramp, which is worse than the original click.
## Recent-N exclusion for non-repetitive questions
Pure `Math.random()` will sometimes give the user three Mis in a row, which feels broken even though it's statistically expected. The fix is to exclude the last few picks from the candidate pool:
```js
export function pickFromScale(scaleOffsets, recent = [], excludeN = 2, rng = Math.random) {
  if (!scaleOffsets.length) return null;
  const tail = recent.slice(-excludeN);
  const tailSet = new Set(tail);
  const candidates = scaleOffsets.filter((s) => !tailSet.has(s));
  const pool = candidates.length ? candidates : scaleOffsets;
  return pool[Math.floor(rng() * pool.length)];
}
```
If `excludeN` is bigger than the scale (unusual, but possible if the scale is tiny), the candidate pool becomes empty and the function falls back to the full scale. Better to occasionally repeat than to throw.
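The caller has to keep the `recent` list trimmed to the exclusion window. A minimal sketch of that bookkeeping — `nextQuestion` is illustrative, not the repo's exact code, with `pickFromScale` inlined from above so the sketch is self-contained:

```javascript
// pickFromScale as shown above.
function pickFromScale(scaleOffsets, recent = [], excludeN = 2, rng = Math.random) {
  if (!scaleOffsets.length) return null;
  const tailSet = new Set(recent.slice(-excludeN));
  const candidates = scaleOffsets.filter((s) => !tailSet.has(s));
  const pool = candidates.length ? candidates : scaleOffsets;
  return pool[Math.floor(rng() * pool.length)];
}

// Illustrative caller: push every pick, trim to the exclusion window.
const EXCLUDE_N = 2;
const recent = [];

function nextQuestion(scale, rng) {
  const pick = pickFromScale(scale, recent, EXCLUDE_N, rng);
  recent.push(pick);
  if (recent.length > EXCLUDE_N) recent.shift();
  return pick;
}
```

With an `rng` that always returns 0 and a major scale `[0, 2, 4, 5, 7, 9, 11]`, successive calls yield 0, 2, 4, then 0 again once 0 has aged out of the window.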
`rng` is injected so the tests can pin output:

```js
function rngFrom(values) {
  let i = 0;
  return () => values[i++ % values.length];
}

test("pickFromScale excludes the last `excludeN` recent picks", () => {
  const rng = rngFrom([0, 0, 0]);
  // Recent = [0, 2]. Major scale minus those starts at 4.
  const picked = pickFromScale(SCALES.major, [0, 2], 2, rng);
  assert.equal(picked, 4);
});
```
## Interval labels by absolute distance
Music theory distinguishes ascending and descending intervals; software ear-training rarely cares. The trainer asks "what interval did you hear?" and the user types a label without worrying about direction. So absolute-distance modulo 12 is the right interpretation:
```js
export function intervalName(semitones) {
  if (!Number.isFinite(semitones)) return "?";
  const folded = Math.abs(semitones) % 12;
  return INTERVAL_NAMES[folded];
}
```
- `intervalName(-7)` → `"P5"` (a descending fifth has the same quality as an ascending fifth).
- `intervalName(13)` → `"m2"` (a compound minor ninth folds to the minor second's label).
My first version used signed modulo (`((semitones % 12) + 12) % 12`), which silently inverted descending intervals to their octave-complement labels. The test failure (`intervalName(-7) === "P4"`, expected `"P5"`) caught it — that's the test paying for itself in less than a minute.
## TL;DR
- 12-TET frequency is `440 * 2 ** (semitone / 12)`.
- Always `resume()` the AudioContext inside a click handler. The "audio context is suspended and nothing plays" failure has no visible error.
- Don't call `osc.start(ctx.currentTime)` directly. Add at least 10 ms of attack so the audio thread has headroom.
- Wrap every oscillator in a `GainNode` with a short ADSR envelope. Use `linearRampToValueAtTime` for attack and `exponentialRampToValueAtTime` for release, and never ramp exponentially to zero (use 0.0001).
- For random question selection, exclude the last N picks before sampling; fall back to the full pool if everything got excluded.
- For ear-training UIs, interval labels by absolute distance (mod 12) match the user's mental model better than signed semitone math.
Source: https://github.com/sen-ltd/pitch-trainer — MIT, ~350 lines of JS, 21 unit tests, no build step, zero runtime dependencies.
🛠 Built by SEN LLC as part of an ongoing series of small, focused developer tools. Browse the full portfolio for more.
