SEN LLC

Sample-Free Drum Synthesis in Web Audio — Building Kick, Snare, and Hi-Hat from Oscillators in 60 Lines

Most drum machines on the web load .wav or .mp3 samples and play them back. That works, but you can also synthesise every voice from an OscillatorNode, an AudioBufferSourceNode playing noise, and a BiquadFilterNode: no audio files needed, no fetch wait, instant click-to-play.

Here's a 16-step drum machine where every voice is built in code (≈20 lines each), plus the acoustic reasoning behind why the recipes work.

[Screenshot: drum-machine UI in the four-on-the-floor preset. Kick on steps 1/5/9/13, snare on 5/13, hi-hat on every other step, open hat on 7/15. The playhead outline is on step 7. Dark theme.]

🥁 Demo: https://sen.ltd/portfolio/drum-machine/
📦 GitHub: https://github.com/sen-ltd/drum-machine

Three knobs that make a sound feel like a drum

Three properties matter when synthesising a percussive hit:

| Property | What it controls | How to set it |
| --- | --- | --- |
| Energy curve | How sharp the attack is, how fast it decays | GainNode.gain with setValueAtTime + exponentialRampToValueAtTime |
| Spectrum | The "colour" of the sound: dull, bright, metallic | OscillatorNode.type (sine/triangle/square) + BiquadFilter |
| Pitch motion | High at the attack, dropping into the body (the essence of a kick) | frequency.exponentialRampToValueAtTime |

Sample playback locks all three to "however the recording captured them." Synthesis lets you tune each one independently — small adjustments like "sharper attack" or "thicker body" become parameter changes.
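
To make the "energy curve" and "pitch motion" rows concrete, here is a small sketch of the curve an exponential ramp traces. The formula is the one the Web Audio spec gives for exponentialRampToValueAtTime. The helper name expRampValue is hypothetical and exists only for illustration, and the numbers plugged in are the kick's pitch sweep and amp decay from the next section.

```javascript
// Value of an exponential ramp from v0 (at time t0) to v1 (at time t1), sampled at t.
// Formula from the Web Audio spec for exponentialRampToValueAtTime:
//   v(t) = v0 * (v1 / v0) ^ ((t - t0) / (t1 - t0))
// expRampValue is an illustrative name, not part of the Web Audio API.
function expRampValue(v0, v1, t0, t1, t) {
  if (t <= t0) return v0;
  if (t >= t1) return v1;
  return v0 * Math.pow(v1 / v0, (t - t0) / (t1 - t0));
}

// Kick pitch sweep: 150 Hz -> 48 Hz over 50 ms, sampled halfway (25 ms).
const pitchAtHalf = expRampValue(150, 48, 0, 0.05, 0.025);
console.log(pitchAtHalf.toFixed(1)); // "84.9"

// Kick amp decay: 1.0 -> 0.0001 over 450 ms, sampled halfway (225 ms).
const gainAtHalf = expRampValue(1, 0.0001, 0, 0.45, 0.225);
console.log(gainAtHalf.toFixed(3)); // "0.010"
```

Halfway through the sweep the pitch has already covered well over half the distance to 48 Hz, and halfway through the decay the gain is down to about 1% of its peak. Exponential ramps front-load the change, which is a large part of why they read as percussive.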

Kick — sine sweep + fast amp decay

A real kick drum is "click" → "boom": the beater hits the head with a sharp transient, and the resonance of the shell decays into a low fundamental. One sine oscillator with a fast pitch sweep does this convincingly:

osc.frequency.setValueAtTime(150, time);                       // attack
osc.frequency.exponentialRampToValueAtTime(48, time + 0.05);   // body (50 ms)
// amp envelope: 1.0 → 0.0001 over 450 ms (exponential)

What matters:

  • The 150 Hz → 48 Hz drop creates the "click → boom." 150 alone is too pitched, 48 alone has no attack.
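  • The 50 ms ramp is fast. Because it is exponential, most of the drop happens in the first 25-30 ms, and the ear tends to fuse events that short into a single click rather than hearing pitch motion.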
  • The 450 ms amp decay sits in the right pocket. Shorter sounds tinny ("pip"); longer drags the boom and crowds the next beat.
  • The waveform is always sine. Triangle or square add audible upper harmonics that read as "synth," not "drum."
export function playKick(ctx, time, dest, gain = 1) {
  const osc = ctx.createOscillator();
  osc.type = "sine";
  osc.frequency.setValueAtTime(150, time);
  osc.frequency.exponentialRampToValueAtTime(48, time + 0.05);

  const ampEnv = ctx.createGain();
  ampEnv.gain.setValueAtTime(0, time);
  ampEnv.gain.linearRampToValueAtTime(gain, time + 0.001);
  ampEnv.gain.exponentialRampToValueAtTime(0.0001, time + 0.45);

  osc.connect(ampEnv).connect(dest);
  osc.start(time);
  osc.stop(time + 0.5);
}

Gotcha: exponentialRampToValueAtTime cannot target 0. The ramp is linear in log space, where 0 sits at −∞, and the Web Audio spec has the call throw a RangeError when the target value is 0. Aim for 0.0001 (about −80 dB), or finish with an explicit setValueAtTime(0, end) so that nothing leaks between hits.
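
The snare and hi-hat code later in this article calls an envelopedGain(ctx, time, peak, decay) helper that is never shown. Below is one way it could look, reconstructed from those call sites and from the kick's envelope above. The body, the 1 ms attack, and the FLOOR constant are assumptions, not the repository's actual code.

```javascript
// Hypothetical reconstruction of the envelopedGain helper used by the snare and hat.
// Returns a GainNode whose gain jumps to `peak` and decays to silence over `decay` seconds.
function envelopedGain(ctx, time, peak, decay) {
  const FLOOR = 0.0001;                                    // non-zero target for the exponential ramp
  const env = ctx.createGain();
  env.gain.setValueAtTime(0, time);                        // start silent
  env.gain.linearRampToValueAtTime(peak, time + 0.001);    // 1 ms attack, same as the kick
  env.gain.exponentialRampToValueAtTime(FLOOR, time + decay);
  env.gain.setValueAtTime(0, time + decay);                // hard zero once the ramp ends
  return env;
}

// Usage in a browser (not executed here):
//   const env = envelopedGain(audioCtx, audioCtx.currentTime, 0.6, 0.18);
//   source.connect(env).connect(audioCtx.destination);
```

The final setValueAtTime(0, …) is the "explicit zero" from the gotcha: the exponential ramp gets the gain down to the floor, and the scheduled zero closes the last −80 dB.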

Snare — pitched body + bandpassed noise, mixed in parallel

A real snare is mostly noise. But noise alone reads as "ssshhh," not "snare." The trick is to layer a small pitched component that gives the hit a tonal centre:

[ triangle 200 Hz, 50 ms decay ] ──────────────────┐
                                                   ├─→ master
[ noise → bandpass 1.5 kHz Q=0.6, 180 ms decay ] ──┘

In code:

// Body: 200 Hz triangle, fast decay
const body = ctx.createOscillator();
body.type = "triangle";
body.frequency.setValueAtTime(200, time);
const bodyEnv = envelopedGain(ctx, time, 0.55, 0.05);  // 50 ms
body.connect(bodyEnv).connect(dest);
body.start(time);
body.stop(time + 0.07);

// Noise: white noise → bandpass → longer decay
const noise = ctx.createBufferSource();
noise.buffer = whiteNoiseBuffer;       // pre-built AudioBuffer of white noise
const filter = ctx.createBiquadFilter();
filter.type = "bandpass";
filter.frequency.value = 1500;
filter.Q.value = 0.6;
const noiseEnv = envelopedGain(ctx, time, 0.6, 0.18);  // 180 ms
noise.connect(filter).connect(noiseEnv).connect(dest);
noise.start(time);
noise.stop(time + 0.2);                // the 180 ms envelope has decayed by then

Key choices:

  • Bandpass at 1.5 kHz — the frequency range where snare "snap" lives. Move it lower and you get a tom; higher and it turns into a hi-hat.
  • Q = 0.6 is gentle filtering. Crank Q up to 5+ and it starts ringing like a struck cymbal — useful for percussion, wrong for a snare.
  • Triangle, not sine, for the body. Triangle adds odd-harmonic content; the snare reads firmer.
  • Cache the noise buffer. Build one AudioBuffer of white noise at startup and reuse it for every hit. Allocating per hit costs several KB per beat, which the GC will eventually decide to handle inconveniently.
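
The cached buffer from the last bullet could be built like this. This is a minimal sketch: createNoiseBuffer and its 0.5 s default length are illustrative assumptions. The only real requirement is that the buffer outlasts the longest noise envelope, 180 ms here.

```javascript
// Sketch: fill one mono AudioBuffer with white noise, to be reused for every hit.
// createNoiseBuffer and the 0.5 s default are assumptions made for this example.
function createNoiseBuffer(ctx, seconds = 0.5) {
  const length = Math.floor(ctx.sampleRate * seconds);
  const buffer = ctx.createBuffer(1, length, ctx.sampleRate); // 1 channel
  const data = buffer.getChannelData(0);                      // Float32Array backing the channel
  for (let i = 0; i < length; i++) {
    data[i] = Math.random() * 2 - 1;                          // roughly uniform in [-1, 1]
  }
  return buffer;
}

// Usage in a browser (not executed here):
//   const whiteNoiseBuffer = createNoiseBuffer(audioCtx);  // once, at startup
//   noise.buffer = whiteNoiseBuffer;                       // per hit, no new allocation
```

Each hit still creates a fresh AudioBufferSourceNode, since those are one-shot, but they all point at the same sample data.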

Hi-hat — square wave through a high-Q bandpass

The hi-hat sound is metal-on-metal: a dense cluster of high partials, decaying fast. Recipe:

osc.type = "square";          // square has rich odd harmonics → metallic
osc.frequency.setValueAtTime(7800, time);

filter.type = "bandpass";
filter.frequency.value = 7500;  // 7-8 kHz where cymbals live
filter.Q.value = 12;            // narrow band → "ringing" timbre

// decay: closed = 40 ms / open = 320 ms

What's load-bearing:

  • Square wave, not sine. Square's odd harmonics produce the metallic sheen; sine alone is a pure "blip."
  • Q = 12 is high — this filter rings, which is exactly what a cymbal does.
  • Closed and open hi-hats differ only in decay length — same oscillator, same filter, different envelope. No second voice needed.
export function playHat(ctx, time, dest, open = false, gain = 1) {
  const osc = ctx.createOscillator();
  osc.type = "square";
  osc.frequency.setValueAtTime(7800, time);

  const filter = ctx.createBiquadFilter();
  filter.type = "bandpass";
  filter.frequency.value = 7500;
  filter.Q.value = 12;

  const decay = open ? 0.32 : 0.04;
  const env = envelopedGain(ctx, time, 0.45 * gain, decay);

  osc.connect(filter).connect(env).connect(dest);
  osc.start(time);
  osc.stop(time + decay + 0.02);
}

The lookahead scheduler — why setInterval doesn't work

Voicing is half the problem; the other half is firing the voices on time. The naive shape:

setInterval(() => playStep(currentStep++), 60000 / bpm / 4);

…drifts. The JS event loop makes no promise to fire callbacks on schedule. GC, layout, or a browser extension's content script can each delay the callback by 5-50 ms, and the error accumulates.

The fix from Chris Wilson's "A Tale of Two Clocks" is to schedule against the audio clock, not the JS clock:

function tick() {
  while (state.nextStepTime < audioCtx.currentTime + 0.1) {
    onStep({ step: state.step, time: state.nextStepTime });
    state.nextStepTime += secondsPerStep();
    state.step = (state.step + 1) % stepsPerBar;
  }
}
setInterval(tick, 25);   // 25 ms poll, 100 ms lookahead = 75 ms of slack

The mental model:

  1. audioCtx.currentTime is the actual audio renderer's clock — sample-accurate.
  2. osc.start(t) can take a future time; the audio thread renders the sample at t exactly, regardless of what JS is doing.
  3. The JS timer's only job is to keep the queue topped up with the next 100 ms of events.
  4. Even if JS stalls for 50 ms, the audio thread has already received the next several hits and renders them on time.

The result is drift-free timing. A/B it against the naive version on anything busier than a quiet, idle browser tab and the difference is audible immediately.
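
Wrapped up with an injectable clock and timers, the tick loop might look like the sketch below. The option names mirror the createScheduler call in the test in the next section. The body, the lookahead and pollMs defaults, and the start/stop behaviour are assumptions, not the repository's actual implementation.

```javascript
// Sketch of a lookahead scheduler. audioCtx only needs a `currentTime` field,
// and the timer functions are injectable so the loop can be driven by hand in tests.
function createScheduler({
  audioCtx, bpm, onStep,
  stepsPerBar = 16, beatsPerBar = 4,
  lookahead = 0.1, pollMs = 25,
  setIntervalFn = setInterval, clearIntervalFn = clearInterval,
}) {
  const state = { step: 0, nextStepTime: 0, timer: null };

  // One beat lasts 60 / bpm seconds and is split into stepsPerBar / beatsPerBar steps.
  const secondsPerStep = () => (60 / bpm) * (beatsPerBar / stepsPerBar);

  function tick() {
    // Queue every step that falls inside the lookahead window.
    while (state.nextStepTime < audioCtx.currentTime + lookahead) {
      onStep({ step: state.step, time: state.nextStepTime });
      state.nextStepTime += secondsPerStep();
      state.step = (state.step + 1) % stepsPerBar;
    }
  }

  return {
    start() {
      if (state.timer !== null) return;          // already running
      state.step = 0;
      state.nextStepTime = audioCtx.currentTime; // first step plays immediately
      tick();                                    // fill the window right away
      state.timer = setIntervalFn(tick, pollMs);
    },
    stop() {
      if (state.timer === null) return;
      clearIntervalFn(state.timer);
      state.timer = null;
    },
  };
}
```

At 120 BPM with 16 steps and 4 beats per bar, secondsPerStep() comes out to 0.125 s. In the browser, onStep would look up the pattern for that step and call playKick, playHat and friends with the supplied time.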

Tests — fake the AudioContext

Synthesis correctness ("does the kick sound like a kick?") is verified by ear in a real browser (Chrome). The scheduler, however, is fully testable without an audio engine: inject a fake audioCtx (just a currentTime field) and a fake setInterval:

function makeFakes() {
  const audio = { currentTime: 0 };
  const intervals = [];
  const setIntervalFn = (fn) => {
    intervals.push({ fn, cleared: false });
    return intervals.length - 1;
  };
  const advance = (sec) => {
    audio.currentTime += sec;
    for (const slot of intervals) if (!slot.cleared) slot.fn();
  };
  return { audio, setIntervalFn, advance, /* ... */ };
}

test("at 120 BPM, 16 steps per bar = 0.125 s per step", () => {
  const fakes = makeFakes();
  const queued = [];
  const sched = createScheduler({
    audioCtx: fakes.audio,
    bpm: 120, stepsPerBar: 16, beatsPerBar: 4,
    onStep: ({ time }) => queued.push(time),
    setIntervalFn: fakes.setIntervalFn,
    clearIntervalFn: fakes.clearIntervalFn,
  });
  sched.start();
  fakes.advance(1.0);
  assert.ok(queued.length > 1);   // guard: the spacing loop must not pass vacuously
  for (let i = 1; i < queued.length; i++) {
    assert.ok(Math.abs(queued[i] - queued[i - 1] - 0.125) < 1e-9);
  }
});

17 tests run under node --test in 70 ms. No browser, no jsdom.

Takeaways

  • Kick: sine 150 Hz → 48 Hz exponential sweep, 450 ms amp decay. The pitch sweep is the magic; the decay length is the pocket.
  • Snare: triangle 200 Hz body (50 ms) layered with bandpassed white noise at 1.5 kHz Q=0.6 (180 ms). Noise alone reads as "shhh"; the body gives the hit a tonal centre.
  • Hi-hat: square 7.8 kHz through bandpass 7.5 kHz Q=12. Closed (40 ms) and open (320 ms) differ only in envelope decay — same nodes, two voices.
  • exponentialRampToValueAtTime(0, …) doesn't work — log space has no zero. Use 0.0001 or finish with setValueAtTime(0, end).
  • Lookahead scheduler (Chris Wilson pattern): 25 ms JS poll, 100 ms ahead queue, drift-free under main-thread jitter.
  • Inject the audio clock for tests so the scheduler is verified deterministically without booting an AudioContext.

Full source on GitHub. MIT.

Third in the Web Audio series: metronome (timing) → pitch-detector (analysis) → drum-machine (synthesis).
