Learning piano is hard enough without guessing which finger hits which key. Most sheet music doesn't bother telling you, and when it does, the fingering is often generic or just plain wrong for your hand size. I got tired of that, so I built a tool that reads a MusicXML score, runs a full physics-and-biomechanics simulation in your browser, and shows you exactly how your hands should move across the keys.
No servers. No uploads. Just drop a .xml or .mxl file into the page and watch your browser plan out every finger placement in real time. You can try it yourself on our free piano finger visualization tool.
Why Do This in the Browser?
You might think: "Isn't this the kind of heavy computation that belongs on a server?" Turns out, keeping everything client-side has some serious advantages.
Your Scores Stay Private
Musicians work on original compositions, unpublished arrangements, and copyrighted material. Uploading a MusicXML file to a remote server means trusting someone else with your intellectual property. When the entire analysis runs inside your browser, the file never leaves your device. Not even a byte gets transmitted.
Instant Feedback
No network round-trips means no loading spinners while a server crunches numbers. A typical score with a few hundred notes gets analyzed in under a second on a modern laptop. The animation starts the moment you hit "Process Score."
Offline Capability
Once the page loads, you can use it without an internet connection. Great for practice rooms with spotty Wi-Fi or flights where you want to review fingerings for upcoming repertoire.
Zero Setup
No software to install, no plugins, no DAW integration. If you have a web browser, you have a fully functional piano fingering analyzer.
How the Whole Thing Works
Here's the bird's-eye view: unzip and parse the MusicXML into a structured note sequence, run the fingering optimizer for each hand, then render the keyboard on a canvas and animate the hands in sync with synthesized audio.
The heavy lifting happens in a custom JavaScript port of the pianoplayer engine, adapted to run entirely in the browser. Let's dig into each stage.
Parsing MusicXML Without a Server-Side XML Library
MusicXML is the standard exchange format for digital sheet music. It's XML-based, often zipped (.mxl), and can be surprisingly messy. Our parser lives in musicxml_io.js and handles the entire pipeline from raw bytes to structured note events.
The Core Data Model
Every note in the score gets normalized into an INote object:
export class INote {
  constructor() {
    this.name = null;
    this.isChord = false;
    this.isBlack = false;
    this.pitch = 0;
    this.octave = 0;
    this.x = 0.0;        // physical key position in cm
    this.time = 0.0;     // onset time in seconds
    this.duration = 0.0;
    this.fingering = 0;  // assigned finger (1-5)
    this.measure = 0;
    this.staff = 0;
  }
}
The x field is crucial. It maps a MIDI pitch to a physical position on the keyboard in centimeters, measured from a reference point. This lets us reason about actual hand spans and finger distances rather than abstract semitone counts.
From XML to Note Sequence
The parser walks the MusicXML DOM tree, extracts <part> elements, resolves <pitch> tags with accidentals, and builds a timeline of EventInfo objects. It also handles duplicated notes that sometimes appear in poorly exported scores — we keep the longer duration and drop the rest.
function _pitch_from_note(noteEl) {
  const pitchEl = elFindDirect(noteEl, "pitch");
  const step = elFindDirect(pitchEl, "step").textContent.trim().toUpperCase();
  const alter = parseInt(elFindDirect(pitchEl, "alter")?.textContent || "0", 10);
  const octave = parseInt(elFindDirect(pitchEl, "octave").textContent, 10);
  const semitone = _STEP_TO_SEMITONE[step] + alter;
  const midi = (octave + 1) * 12 + semitone;
  return new PitchInfo(_note_name(step, alter), octave, midi);
}
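The MIDI arithmetic above is easy to sanity-check in isolation. Here is a standalone sketch — the STEP_TO_SEMITONE table mirrors what the parser's _STEP_TO_SEMITONE is assumed to contain, and midiFromPitch is just an illustrative helper, not part of the engine:

```javascript
// Semitone offset of each letter name within an octave (C major reference)
const STEP_TO_SEMITONE = { C: 0, D: 2, E: 4, F: 5, G: 7, A: 9, B: 11 };

// Same formula as the parser: MIDI 0 is C-1, so octave is shifted by one
function midiFromPitch(step, alter, octave) {
  return (octave + 1) * 12 + STEP_TO_SEMITONE[step] + alter;
}
```

Middle C (C4) lands on MIDI 60 and A4 on MIDI 69, which is exactly what the synthesizer later assumes when it tunes A4 to 440 Hz.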
Once we have the events, noteseq_from_part converts them into the INote sequence used by the optimizer. Chords get special handling: notes sharing the same onset are grouped, and each chord note gets a tiny time offset (default 50ms) so the optimizer treats them as simultaneous but distinguishable events.
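The grouping step can be sketched roughly like this — groupChords is illustrative, not the engine's actual noteseq_from_part, but it shows the same idea of fanning a 50 ms offset across notes that share an onset:

```javascript
const CHORD_OFFSET_S = 0.05; // 50 ms default offset between chord notes

// Group events sharing an onset into a chord, offsetting each member slightly
function groupChords(events) {
  // Sort by onset so simultaneous notes become adjacent
  const sorted = [...events].sort((a, b) => a.time - b.time);
  const out = [];
  let i = 0;
  while (i < sorted.length) {
    let j = i + 1;
    while (j < sorted.length && sorted[j].time === sorted[i].time) j++;
    // Notes i..j-1 share an onset: mark them and fan out tiny offsets
    for (let k = i; k < j; k++) {
      const n = { ...sorted[k] };
      n.isChord = j - i > 1;
      n.time = sorted[i].time + (k - i) * CHORD_OFFSET_S;
      out.push(n);
    }
    i = j;
  }
  return out;
}
```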
The Hand Optimizer: Teaching Math to Play Piano
This is where things get interesting. Assigning fingers to notes isn't just about "thumb on C, middle finger on E." A good fingering minimizes hand movement, avoids awkward stretches, respects the natural strengths of each finger, and stays within the player's physical reach.
We model each hand as a Hand object with biomechanical constraints:
export class Hand {
  constructor(noteseq, side = "right", size = "M") {
    this.LR = side;
    this.frest = [null, -7.0, -2.8, 0.0, 2.8, 5.6];   // relaxed finger positions (cm)
    this.weights = [null, 1.1, 1.0, 1.1, 0.9, 0.8];   // finger strength weights
    this.bfactor = [null, 0.3, 1.0, 1.1, 0.8, 0.7];   // black-key comfort factors
    this.size = size;
    this.hf = Hand.size_factor(size); // hand-size multiplier
    this.max_span_cm = 21.0 * this.hf;
    this.max_follow_lag_cm = 2.5 * this.hf;
    this.min_finger_gap_cm = 0.15 * this.hf;
  }
}
Hand sizes range from XXS to XXL, scaling the maximum comfortable span from about 7 cm up to 25 cm. A child with small hands gets very different fingerings than an adult with large hands.
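One plausible shape for the size_factor lookup — the exact factors in the engine may differ, but M must map to 1.0 so the 21 cm span is the baseline, and the extremes should land near the 7 cm and 25 cm spans mentioned above:

```javascript
// Hypothetical hand-size multipliers; M is the 1.0 baseline (21 cm span)
function sizeFactor(size) {
  const factors = {
    XXS: 0.33, // ~7 cm span — small child
    XS: 0.55,
    S: 0.8,
    M: 1.0,    // 21 cm baseline span
    L: 1.1,
    XL: 1.15,
    XXL: 1.19, // ~25 cm span — very large hands
  };
  return factors[size] ?? 1.0; // unknown sizes fall back to the baseline
}
```

Because every constraint (max_span_cm, max_follow_lag_cm, min_finger_gap_cm) is multiplied by the same factor, the whole biomechanical model scales with the player's hand.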
The Optimization Strategy
For each note, we need to pick a finger (1–5). The naive approach would try all 5 possibilities for every note — 5^n combinations for n notes, an exponential blowup. Instead, we use a sliding window with depth-limited backtracking search.
The algorithm looks at a window of upcoming notes (up to 9, automatically adjusted based on note density) and tries every valid fingering combination:
optimize_seq(nseq, istart) {
  const depth = this.depth; // current window size (up to 9 notes)
  const u_start = istart === 0 ? [...this.fingers] : [istart];
  let best_fingering = [0, 0, 0, 0, 0, 0, 0, 0, 0];
  let minvel = 1.0e10;
  const candidate = [0, 0, 0, 0, 0, 0, 0, 0, 0];
  const backtrack = (level) => {
    if (level === depth) {
      const velocity = this.ave_velocity(candidate, nseq);
      if (velocity < minvel) {
        best_fingering = [...candidate];
        minvel = velocity;
      }
      return;
    }
    const choices = level === 0 ? u_start : this.fingers;
    for (const finger of choices) {
      // skip() also receives extra context arguments, elided here
      if (level > 0 && this.skip(candidate[level - 1], finger, nseq[level - 1], nseq[level] /* … */)) {
        continue;
      }
      candidate[level] = finger;
      backtrack(level + 1);
    }
  };
  backtrack(0);
  return [best_fingering, minvel];
}
Pruning Bad Combinations Early
The skip function is the secret sauce. It eliminates physically impossible or ergonomically terrible transitions before they waste computation time:
- Same finger on two different consecutive notes? Skip.
- Crossing fingers in the wrong direction (e.g., finger 3 moving left of finger 4 on the right hand)? Skip.
- Impossible stretches inside a chord? Skip.
- Thumb hitting a black key while moving upward? Usually skip.
This pruning turns an exponential search into something that finishes in milliseconds.
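The rules above can be sketched as a chain of cheap rejection tests. This is an illustrative stand-in for the engine's skip(), which takes more context, but the spirit is the same:

```javascript
// Hedged sketch of the pruning rules; field names follow the INote model above
function skip(prevFinger, finger, prevNote, note, hand) {
  const dx = note.x - prevNote.x; // horizontal move between the two notes (cm)
  // Same finger on two different consecutive notes
  if (finger === prevFinger && dx !== 0 && !note.isChord) return true;
  // Crossing non-thumb fingers the wrong way (right hand: a higher-numbered
  // finger must not land left of a lower-numbered one, and vice versa)
  if (prevFinger > 1 && finger > 1) {
    if (finger > prevFinger && dx < 0) return true;
    if (finger < prevFinger && dx > 0) return true;
  }
  // Impossible stretch inside a chord
  if (note.isChord && Math.abs(dx) > hand.max_span_cm) return true;
  // Thumb hitting a black key while moving up the keyboard: usually reject
  if (finger === 1 && note.isBlack && dx > 0) return true;
  return false;
}
```

Each test is a constant-time comparison, so rejecting a branch costs almost nothing while cutting an entire subtree out of the search.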
The Cost Function: Minimizing Average Velocity
For each valid fingering sequence, we compute an "average finger velocity" — essentially, how much effort it takes to move the hand into position:
ave_velocity(fingering, notes) {
  let vmean = 0.0;
  for (let i = 1; i < this.depth; i++) {
    const na = notes[i - 1];
    const nb = notes[i];
    const fb = fingering[i];
    const finger_pos = this.current_positions[fb]; // where finger fb rests now (cm)
    const dx = Math.abs(nb.x - finger_pos);
    const dt = Math.abs(nb.time - na.time) + 0.1; // +0.1 s keeps fast passages finite
    let v = dx / dt;
    const weight = this.weights[fb] ?? 1.0;
    if (nb.isBlack) {
      v /= weight * this.bfactor[fb]; // comfort factor discounts black-key moves
    } else {
      v /= weight;
    }
    vmean += v;
  }
  return vmean / (this.depth - 1);
}
Black keys get a comfort bonus because shorter fingers (thumb, pinky) struggle with them. Weaker fingers get penalized. The sequence with the lowest average velocity wins.
Position Constraints
After picking a finger for the current note, the algorithm updates where the other fingers "should" be resting. These positions are constrained so fingers don't overlap, don't stretch beyond the hand span, and don't lag too far behind their relaxed targets:
_apply_position_constraints(finger_positions, fi, note_x, targets) {
  // Limit how far each finger may lag behind its relaxed target position
  for (const j of [1, 2, 3, 4, 5]) {
    const pos = finger_positions[j];
    const target = targets[j];
    const lag = pos - target;
    if (lag > this.max_follow_lag_cm) finger_positions[j] = target + this.max_follow_lag_cm;
    else if (lag < -this.max_follow_lag_cm) finger_positions[j] = target - this.max_follow_lag_cm;
  }
  // Keep the engaged finger pinned to the key it is pressing
  finger_positions[fi] = note_x;
  // Enforce minimum finger gaps (each finger sits right of its neighbor)
  for (const j of [2, 3, 4, 5]) {
    const a = finger_positions[j - 1];
    const b = finger_positions[j];
    if (b < a + this.min_finger_gap_cm) finger_positions[j] = a + this.min_finger_gap_cm;
  }
  // Enforce maximum hand span by clamping the outer fingers toward the center
  const span = finger_positions[5] - finger_positions[1];
  if (span > this.max_span_cm) {
    const center = (finger_positions[5] + finger_positions[1]) / 2;
    finger_positions[1] = center - this.max_span_cm / 2;
    finger_positions[5] = center + this.max_span_cm / 2;
  }
}
Rendering the Keyboard and Animating the Hands
Once the optimizer produces a fingerseq — a timeline of finger positions for every note — we need to visualize it. The UI is a React client component that renders everything on a <canvas> element.
Drawing the Keyboard
The keyboard is drawn from scratch for every frame. White keys are rectangles; black keys are slightly offset and shorter. We map MIDI pitches to horizontal positions using a constant key width (KEYBSIZE = 16.5 cm per octave):
function pitchToX(pitch) {
  const octave = Math.floor(pitch / 12) - 1;
  const pc = pitch % 12;
  const names = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"];
  const name = names[pc];
  const whitePos = { C: 0.5, D: 1.5, E: 2.5, F: 3.5, G: 4.5, A: 5.5, B: 6.5 };
  const blackPos = { "C#": 1.0, "D#": 2.0, "F#": 4.0, "G#": 5.0, "A#": 6.0 };
  const step = name in blackPos ? blackPos[name] : whitePos[name];
  return octave * KEYBSIZE + step * (KEYBSIZE / 7);
}
The canvas uses DPR scaling (device pixel ratio) so it looks crisp on Retina displays.
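DPR scaling follows the standard pattern: size the backing store in device pixels, keep the CSS size in logical pixels, and scale the context so drawing code can think entirely in CSS pixels. This is a generic sketch of that pattern, not the app's exact code:

```javascript
// Backing-store dimensions in device pixels for a given CSS size and DPR
function backingStoreSize(cssWidth, cssHeight, dpr) {
  return {
    width: Math.round(cssWidth * dpr),
    height: Math.round(cssHeight * dpr),
  };
}

// Browser-side setup: called once per resize (not runnable outside the DOM)
function setupCanvas(canvas, cssWidth, cssHeight) {
  const dpr = window.devicePixelRatio || 1;
  const { width, height } = backingStoreSize(cssWidth, cssHeight, dpr);
  canvas.width = width;                  // backing store in device pixels
  canvas.height = height;
  canvas.style.width = `${cssWidth}px`;  // CSS size stays in logical pixels
  canvas.style.height = `${cssHeight}px`;
  const ctx = canvas.getContext("2d");
  ctx.scale(dpr, dpr);                   // draw in CSS pixels from here on
  return ctx;
}
```

On a 2x Retina display, an 800x300 CSS canvas gets a 1600x600 backing store, which is what keeps thin finger lines from blurring.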
Finger Visualization
Each finger is drawn as a colored line from the root of the hand down to the key surface. Engaged fingers (currently pressing a key) are drawn fully opaque and hover slightly lower; relaxed fingers are semi-transparent and raised:
function drawFingers(ctx, hand, color, scaleX, minX, canvasHeight) {
  const fingers = {
    1: { tipOffset: 30, wid: 15 }, // thumb: longer, thicker
    2: { tipOffset: 10, wid: 10 },
    3: { tipOffset: 0, wid: 10 },  // middle: shortest reach
    4: { tipOffset: 12, wid: 9 },
    5: { tipOffset: 26, wid: 8 },  // pinky: thin, awkward reach
  };
  for (let f = 1; f <= 5; f++) {
    const finger = fingers[f];
    // s (hand state), isEngaged, keyTop, and rootY come from the enclosing scope
    const pos = s.current_positions[f];
    const renderPos = s.side === "left" ? -pos : pos; // un-mirror the left hand
    const cx = xCmToCanvas(renderPos, minX, scaleX, 40);
    const hover = isEngaged ? 0 : 8; // relaxed fingers hover above the keys
    const tipY = keyTop + 10 + hover + finger.tipOffset;
    ctx.beginPath();
    ctx.moveTo(cx, rootY);
    ctx.lineTo(cx, tipY);
    ctx.lineWidth = finger.wid;
    ctx.strokeStyle = color;
    ctx.globalAlpha = isEngaged ? 0.9 : 0.5;
    ctx.stroke();
  }
}
Right hand is red, left hand is blue. Active keys get a translucent overlay matching the hand color.
The Animation Loop
When you hit Play, a requestAnimationFrame loop drives the simulation. It tracks elapsed time, updates which notes should be pressed or released, locks engaged fingers to their key positions, and advances the hand posture:
const loop = () => {
  if (!playingRef.current) return;
  const elapsed = ((performance.now() - t0Ref.current) / 1000) * speedRef.current;
  updateHand(rhRef.current, elapsed);
  updateHand(lhRef.current, elapsed);
  drawFrame(elapsed);
  animIdRef.current = requestAnimationFrame(loop);
};
The updateHand function uses two index pointers — press_idx and release_idx — to efficiently track note onsets and offsets without scanning the entire sequence every frame.
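The two-pointer idea can be sketched like this. Field names here (noteseq, engaged, current_positions) are assumptions based on the structures shown earlier, and the release pass is simplified to assume offsets arrive roughly in onset order:

```javascript
// Notes are sorted by onset, so each pointer only ever moves forward —
// the whole update is amortized O(1) per frame instead of O(n)
function updateHand(hand, elapsed) {
  const notes = hand.noteseq;
  // Press pointer: consume every note whose onset has arrived
  while (hand.press_idx < notes.length && notes[hand.press_idx].time <= elapsed) {
    const n = notes[hand.press_idx];
    hand.engaged[n.fingering] = n;             // lock the finger to its key
    hand.current_positions[n.fingering] = n.x;
    hand.press_idx++;
  }
  // Release pointer: consume every note whose offset has passed
  while (
    hand.release_idx < notes.length &&
    notes[hand.release_idx].time + notes[hand.release_idx].duration <= elapsed
  ) {
    const n = notes[hand.release_idx];
    if (hand.engaged[n.fingering] === n) hand.engaged[n.fingering] = null;
    hand.release_idx++;
  }
}
```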
Synthesizing Sound in the Browser
Visuals are great, but hearing the notes helps you internalize the timing. We generate audio using the Web Audio API — no MP3s, no SoundFonts, just raw oscillators:
const playNote = (pitch, duration) => {
  const now = audioCtx.currentTime;
  const freq = 440 * Math.pow(2, (pitch - 69) / 12); // MIDI 69 = A440
  const osc = audioCtx.createOscillator();
  const gain = audioCtx.createGain();
  osc.type = "sine";
  osc.frequency.setValueAtTime(freq, now);
  gain.gain.setValueAtTime(0.3, now);
  gain.gain.exponentialRampToValueAtTime(0.001, now + duration); // decay envelope
  osc.connect(gain);
  gain.connect(audioCtx.destination);
  osc.start(now);
  osc.stop(now + duration);
};
It's a simple sine wave with an exponential decay envelope. Not concert-grand realism, but perfectly adequate for following the melodic line and checking rhythm. There's also a metronome click (square wave at 1200 Hz) that ticks along at the current BPM, complete with a little CSS pendulum animation on the UI.
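The metronome works the same way; a plausible sketch of the click scheduler (scheduleClicks and its parameters are illustrative, not the app's exact code):

```javascript
const beatInterval = (bpm) => 60 / bpm; // seconds per beat

// Schedule a run of short square-wave clicks at 1200 Hz, one per beat
function scheduleClicks(audioCtx, bpm, beats) {
  const beatLen = beatInterval(bpm);
  const t0 = audioCtx.currentTime;
  for (let i = 0; i < beats; i++) {
    const t = t0 + i * beatLen;
    const osc = audioCtx.createOscillator();
    const gain = audioCtx.createGain();
    osc.type = "square";
    osc.frequency.setValueAtTime(1200, t);
    gain.gain.setValueAtTime(0.15, t);
    gain.gain.exponentialRampToValueAtTime(0.001, t + 0.05); // 50 ms click
    osc.connect(gain);
    gain.connect(audioCtx.destination);
    osc.start(t);
    osc.stop(t + 0.05);
  }
}
```

Scheduling every click up front on the AudioContext clock keeps the ticks rock-steady even when the main thread is busy redrawing the canvas.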
Handling Left Hands vs. Right Hands
Piano scores often combine both hands in a single part with two staves. The engine automatically routes staff 1 to the right hand and staff 2 to the left hand. For the optimizer, left-hand logic is symmetric: we simply mirror the x-coordinates (anote.x = -anote.x), run the same optimization, then flip everything back for display.
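The mirroring trick amounts to three lines. This is a hedged sketch — optimizeLeftHand is illustrative, with optimizeRightHand standing in for the real optimizer entry point:

```javascript
// Flip x so the left hand looks like a right hand, optimize, then flip back
function optimizeLeftHand(noteseq, optimizeRightHand) {
  for (const n of noteseq) n.x = -n.x; // mirror the keyboard
  optimizeRightHand(noteseq);          // identical algorithm for both hands
  for (const n of noteseq) n.x = -n.x; // restore real positions for display
  return noteseq;
}
```

Because the biomechanical model is symmetric, one optimizer implementation covers both hands with no left-specific code paths.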
Users can also override this and process only one hand, or specify a custom measure range if they only want to practice a difficult passage.
Putting It All Together
The complete pipeline from file drop to animated playback: unzip and parse the MusicXML, normalize everything into INote sequences, route staves to hands, run the backtracking optimizer (mirrored for the left hand), then drive the canvas animation and Web Audio synthesis from the resulting fingerseq timeline.
Why This Approach Works
The magic isn't any single breakthrough — it's the combination of several pragmatic choices:
Biomechanical modeling beats rule-based heuristics. By simulating actual finger positions and hand spans, we get fingerings that feel natural rather than theoretically "correct."
Depth-limited backtracking with aggressive pruning gives us optimal-ish results without waiting for a GPU cluster. Most scores process in under a second.
Client-side execution removes every privacy, latency, and availability concern. The tool works anywhere, instantly.
Canvas + Web Audio keeps the stack simple. No WebGL, no WASM audio engines, no external dependencies beyond the parser itself.
Try It on Your Own Scores
Got a MusicXML file sitting around? Whether it's a Bach prelude, a pop lead sheet, or your own composition, you can see exactly how your hands should navigate the keyboard.
Head over to our free piano finger visualization tool and give it a spin. Upload your score, hit Process, then press Play. You'll see your fingers dance across the keys in real time — all without a single packet leaving your browser.