MIDI files are great for editing, but terrible for sharing. Try sending a .mid file to a friend and watch them struggle to open it. Most phones don't have MIDI players. Social platforms don't support MIDI uploads. What you actually need is an MP3 — something that plays everywhere.
The usual workflow involves opening a DAW, loading a SoundFont, bouncing to audio, and then compressing. That's a lot of steps for something that should be simple. I built a browser tool that handles the entire pipeline: upload a MIDI or MusicXML file, pick an instrument from 28 options ranging from grand piano to 8-bit chiptune, and download a properly encoded MP3. No software installation, no account, no uploads to a server.
Try it out on our free MIDI to MP3 converter.
Why Keep This in the Browser?
Audio rendering is traditionally a desktop task. But forcing users to install a DAW just to convert a MIDI ringtone is absurd.
Zero Network Uploads
Your MIDI file stays on your device. The synthesis, rendering, and MP3 encoding all happen inside your browser's JavaScript engine. There's no cloud server processing your music, no queue, no "we'll email you when it's done."
Instant and Unlimited
A typical 3-minute MIDI file converts in under 10 seconds. There's no daily limit, no watermark, no premium tier. Convert as many files as you want.
Cross-Platform
Because it runs in a browser, it works on macOS, Windows, Linux, ChromeOS, even tablets. The only requirement is a modern browser with Web Audio API support.
The Full Pipeline
Here's what happens from file drop to MP3 download:

1. Parse the MIDI or MusicXML file into a list of note events
2. Build a Tone.js synthesizer for the chosen instrument
3. Render the notes to a raw audio buffer with Tone.Offline
4. Convert the Float32 samples to Int16 and encode to MP3 with lamejs
5. Trigger a local download of the resulting blob

Let's walk through each stage.
Parsing MIDI and MusicXML
The tool accepts both formats. MIDI is parsed with @tonejs/midi:
async function parseMidiFile(file: File): Promise<NoteEvent[]> {
  const { Midi } = await import('@tonejs/midi');
  const arrayBuffer = await file.arrayBuffer();
  const midi = new Midi(arrayBuffer);
  // Use the first track that actually contains notes
  const track = midi.tracks.find((t) => t.notes.length > 0);
  if (!track) return [];
  return track.notes.map((n) => ({
    time: n.time,
    duration: n.duration,
    note: n.name,
    velocity: n.velocity,
  }));
}
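The `note` field holds scientific pitch names, which `@tonejs/midi` derives from raw MIDI note numbers via `note.name`. To make that mapping concrete, here's a minimal sketch (the helper name is ours, and it spells accidentals as sharps only):

```typescript
// Hypothetical helper: convert a MIDI note number to a scientific pitch name.
// @tonejs/midi already does this via `note.name`; shown here for illustration.
const PITCH_NAMES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B'];

function midiToName(midi: number): string {
  const octave = Math.floor(midi / 12) - 1; // MIDI 60 is middle C, i.e. "C4"
  return PITCH_NAMES[midi % 12] + octave;
}
```

So `midiToName(60)` gives `"C4"` and `midiToName(69)` gives `"A4"` (concert A).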
MusicXML is parsed with the browser's built-in DOMParser:
function parseMusicXML(xmlText: string): NoteEvent[] {
  const parser = new DOMParser();
  const doc = parser.parseFromString(xmlText, 'application/xml');
  const parserError = doc.querySelector('parsererror');
  if (parserError) throw new Error('Invalid MusicXML file');

  const notes: NoteEvent[] = [];
  const parts = doc.querySelectorAll('part');
  let globalBpm = 120;

  for (const part of parts) {
    const measures = part.querySelectorAll('measure');
    let divisions = 4; // ticks per quarter note
    let measureTime = 0; // start of the current measure, in seconds

    for (const measure of measures) {
      const attr = measure.querySelector('attributes');
      if (attr) {
        const divElem = attr.querySelector('divisions');
        if (divElem) divisions = parseInt(divElem.textContent || '4', 10);
      }
      const soundTempo = measure.querySelector('sound[tempo]');
      if (soundTempo) {
        globalBpm = parseFloat(soundTempo.getAttribute('tempo') || '120');
      }

      let currentTick = 0;
      let lastNoteTime = 0;

      for (const note of measure.querySelectorAll('note')) {
        const durationTicks = parseInt(note.querySelector('duration')?.textContent || '0', 10);
        const isChord = note.querySelector('chord') !== null;
        if (note.querySelector('rest') !== null) {
          currentTick += durationTicks;
          continue;
        }

        // Build a pitch name like "C#4" from <step>, <alter>, <octave>
        const pitch = note.querySelector('pitch');
        if (!pitch) continue;
        const step = pitch.querySelector('step')?.textContent || 'C';
        const alter = parseInt(pitch.querySelector('alter')?.textContent || '0', 10);
        const octave = pitch.querySelector('octave')?.textContent || '4';
        const noteName = step + (alter === 1 ? '#' : alter === -1 ? 'b' : '') + octave;

        const noteDuration = (durationTicks / divisions) * (60 / globalBpm);
        if (isChord) {
          // Chord members share the previous note's start time and don't advance the clock
          notes.push({ time: lastNoteTime, duration: noteDuration, note: noteName, velocity: 0.8 });
        } else {
          lastNoteTime = measureTime + (currentTick / divisions) * (60 / globalBpm);
          notes.push({ time: lastNoteTime, duration: noteDuration, note: noteName, velocity: 0.8 });
          currentTick += durationTicks;
        }
      }
      measureTime += (currentTick / divisions) * (60 / globalBpm);
    }
  }
  return notes;
}
Both parsers produce the same output: an array of NoteEvent objects with time (seconds), duration (seconds), note (e.g., "C4"), and velocity (0–1).
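One detail every downstream stage depends on is the render length, which the offline renderer needs up front. A sketch of how it might be derived from the parsed events (the helper name and the 2-second tail are our assumptions; the article only specifies "slightly longer than the last note"):

```typescript
interface NoteEvent {
  time: number;     // start time in seconds
  duration: number; // length in seconds
  note: string;     // scientific pitch name, e.g. "C4"
  velocity: number; // 0-1
}

// Hypothetical helper: render length = end of the latest-ending note plus a
// short tail so release envelopes and reverb aren't cut off.
function renderLength(notes: NoteEvent[], tailSeconds = 2): number {
  const lastEnd = notes.reduce((end, n) => Math.max(end, n.time + n.duration), 0);
  return lastEnd + tailSeconds;
}
```

Note that the latest-ending note isn't necessarily the last one in the array, which is why the reduce scans all of them.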
The Instrument Engine: 28 Sounds, Zero Samples
Unlike the MIDI player, which uses SoundFont samples, this converter uses Tone.js's built-in synthesizers. That means no external sample downloads, no loading spinners, and instant instrument switching. The trade-off is that the sounds are synthesized rather than sampled — more like a vintage keyboard than a concert grand.
Instrument Configuration
We define 28 instruments with specific oscillator types and envelope settings:
const INSTRUMENTS: Instrument[] = [
  { id: 'piano', name: 'Piano', type: 'triangle', envelope: { attack: 0.005, decay: 0.3, sustain: 0.1, release: 1.2 } },
  { id: 'strings', name: 'Strings', type: 'sawtooth', envelope: { attack: 0.2, decay: 0.3, sustain: 0.6, release: 1.5 } },
  { id: 'flute', name: 'Flute', type: 'sine', envelope: { attack: 0.05, decay: 0.2, sustain: 0.6, release: 0.8 } },
  { id: 'guitar', name: 'Guitar', type: 'square', envelope: { attack: 0.001, decay: 0.4, sustain: 0.1, release: 0.8 } },
  { id: 'fm', name: 'FM Synth', synthType: 'fm', volume: -6 },
  { id: 'pluck', name: 'Pluck Synth', synthType: 'pluck', volume: -3 },
  // ...and 22 more
];
Most instruments use Tone.PolySynth(Tone.Synth) with a custom oscillator type (triangle, sawtooth, sine, square) and ADSR envelope. The envelope shapes the sound over time:
- Piano: Fast attack (5ms), medium decay, low sustain, long release — mimics a struck string
- Strings: Slow attack (200ms), high sustain, long release — mimics a bowed instrument
- Guitar: Instant attack, short sustain, medium release — plucked character
- Pad: Very slow attack (800ms), high sustain, very long release (2s) — ambient swells
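To make the envelope's effect concrete, here is a minimal linear ADSR sketch covering the held phase of a note (illustrative only: Tone.js uses configurable curves rather than strictly linear ramps, and the function name is ours):

```typescript
interface Envelope { attack: number; decay: number; sustain: number; release: number; }

// Amplitude of a linear ADSR envelope at time t (seconds) while the key is held.
// Release only matters after note-off, so it is not modeled here.
function heldAmplitude(env: Envelope, t: number): number {
  if (t < env.attack) return t / env.attack; // attack: ramp 0 -> 1
  const td = t - env.attack;
  if (td < env.decay) return 1 - (1 - env.sustain) * (td / env.decay); // decay: 1 -> sustain
  return env.sustain; // hold at the sustain level
}
```

With the Strings settings above (attack 0.2, sustain 0.6), the voice is still only halfway up at 100 ms and settles at 60% amplitude once the decay finishes — which is exactly why it swells like a bowed instrument instead of snapping like a piano.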
Specialized Synthesizers
For more exotic sounds, we use Tone.js's advanced synths:
FM Synth uses frequency modulation for bell-like, metallic tones:
synth = new Tone.PolySynth(Tone.FMSynth, {
  harmonicity: 3,
  modulationIndex: 10,
  oscillator: { type: 'sine' },
  envelope: { attack: 0.01, decay: 0.2, sustain: 0.3, release: 0.5 },
  modulation: { type: 'square' },
  modulationEnvelope: { attack: 0.01, decay: 0.2, sustain: 0.3, release: 0.5 },
});
Pluck Synth uses a Karplus-Strong physical model for string sounds:
synth = new Tone.PolySynth(Tone.PluckSynth, {
  attackNoise: 1,
  dampening: 4000,
  resonance: 0.7,
});
Duo Synth stacks two detuned voices for a rich, chorused sound:
synth = new Tone.PolySynth(Tone.DuoSynth, {
  vibratoAmount: 0.5,
  vibratoRate: 5,
  harmonicity: 1.5,
  voice0: { oscillator: { type: 'sine' }, envelope: { attack: 0.01, decay: 0.2, sustain: 0.3, release: 0.5 } },
  voice1: { oscillator: { type: 'sine' }, envelope: { attack: 0.01, decay: 0.2, sustain: 0.3, release: 0.5 } },
});
Mono Synth adds a filter envelope for squelchy bass sounds:
synth = new Tone.PolySynth(Tone.MonoSynth, {
  oscillator: { type: 'sawtooth' },
  envelope: { attack: 0.01, decay: 0.3, sustain: 0.6, release: 0.4 },
  filterEnvelope: { attack: 0.01, decay: 0.3, sustain: 0.5, release: 0.4, baseFrequency: 200, octaves: 3, exponent: 2 },
});
Offline Rendering: Synthesizing Without Sound
Here's the trick that makes this whole thing possible. We need to generate audio, but we don't want to play it through speakers in real time. We want to render it faster than realtime and capture the output as a buffer. Tone.js provides Tone.Offline for exactly this:
const buffer = await Tone.Offline(({ transport }: any) => {
  // Create the synthesizer based on the selected instrument
  let synth: any;
  if (instrument?.synthType === 'fm') {
    synth = new Tone.PolySynth(Tone.FMSynth, { /* ... */ }).toDestination();
  } else if (instrument?.synthType === 'pluck') {
    synth = new Tone.PolySynth(Tone.PluckSynth, { /* ... */ }).toDestination();
  } else {
    synth = new Tone.PolySynth(Tone.Synth, {
      oscillator: { type: instrument?.type || 'triangle' },
      envelope: instrument?.envelope || { attack: 0.02, decay: 0.1, sustain: 0.3, release: 1 },
    }).toDestination();
  }
  synth.volume.value = instrument?.volume ?? -6;

  // Schedule all notes on the offline transport
  const partData = notes.map((n) => [n.time, { note: n.note, duration: n.duration, velocity: n.velocity }]);
  const part = new Tone.Part((time: number, value: any) => {
    synth.triggerAttackRelease(value.note, value.duration, time, value.velocity);
  }, partData);
  part.start(0);
  transport.start();
}, totalDuration, 2, 44100);
The parameters to Tone.Offline are:
- totalDuration — how long to render (slightly longer than the last note)
- 2 — number of channels (stereo)
- 44100 — sample rate (CD quality)
Inside the callback, everything works exactly like real-time playback — we create synths, schedule parts, and start transport. But instead of going to speakers, the audio is written to an offline buffer. A 3-minute song renders in about 5–10 seconds, depending on your CPU.
Encoding to MP3 with lamejs
Once we have a raw audio buffer, we need to compress it to MP3. We use lamejs, a pure-JavaScript port of the LAME encoder.
The Loading Problem and Our Solution
lamejs has a quirk: its source files have implicit global dependencies that break in modern bundlers like Webpack and Turbopack. Importing it as an npm module causes "XXX is not defined" errors at runtime.
Our solution is a lightweight polyfill loader that injects the pre-built lame.min.js via a script tag:
export async function loadLamejs(): Promise<{ Mp3Encoder: any; WavHeader: any } | null> {
  if (typeof window === 'undefined') return null;
  const win = window as any;
  if (win.lamejs?.Mp3Encoder) {
    return win.lamejs;
  }
  return new Promise((resolve, reject) => {
    const script = document.createElement('script');
    script.src = '/lame.min.js';
    script.async = true;
    script.onload = () => {
      if (win.lamejs?.Mp3Encoder) {
        resolve(win.lamejs);
      } else {
        reject(new Error('lamejs loaded but Mp3Encoder not found'));
      }
    };
    script.onerror = () => reject(new Error('Failed to load lame.min.js'));
    document.head.appendChild(script);
  });
}
The minified file (~156 KB) is served from the public directory. Because the script executes in the global scope, its internal references resolve against the same global object the library expects, so nothing is left undefined.
Float32 to Int16 Conversion
The Web Audio API uses Float32 samples in the range [-1, 1]. LAME expects 16-bit signed integers. We convert:
const sampleRate = buffer.sampleRate;
const left = buffer.getChannelData(0);
const right = buffer.numberOfChannels > 1 ? buffer.getChannelData(1) : left;

const leftInt = new Int16Array(left.length);
const rightInt = new Int16Array(right.length);
for (let i = 0; i < left.length; i++) {
  const l = Math.max(-1, Math.min(1, left[i]));
  leftInt[i] = l < 0 ? l * 0x8000 : l * 0x7FFF;
  const r = Math.max(-1, Math.min(1, right[i]));
  rightInt[i] = r < 0 ? r * 0x8000 : r * 0x7FFF;
}
Clamping to [-1, 1] prevents overflow. Negative values map to [-32768, 0) and positive values map to [0, 32767].
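The same logic, packaged as a single-channel helper so the scaling is easy to verify (the function name is ours; the arithmetic matches the loop above):

```typescript
// Convert Web Audio Float32 samples in [-1, 1] to 16-bit signed PCM.
// Values outside the range are clamped first to prevent integer overflow.
function floatTo16(samples: Float32Array): Int16Array {
  const out = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    // Asymmetric scaling: negatives reach -32768, positives top out at 32767
    out[i] = s < 0 ? s * 0x8000 : s * 0x7FFF;
  }
  return out;
}
```

For example, an input of `[-1, 0, 1]` comes out as `[-32768, 0, 32767]`, and anything beyond the valid range pins to those extremes.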
MP3 Encoding
const encoder = new Mp3Encoder(2, sampleRate, 192);
const mp3Data = encoder.encodeBuffer(leftInt, rightInt);
const mp3End = encoder.flush();
const mp3Blob = new Blob([mp3Data, mp3End], { type: 'audio/mpeg' });
We encode in stereo at 192 kbps — a good balance between quality and file size. For reference, that's roughly 1.4 MB per minute of audio. The encodeBuffer call processes the entire audio in one shot (fine for short files; longer files could be chunked to avoid memory spikes). Finally, flush() writes the MP3 tail frames including the Xing/VBR header.
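Chunking is straightforward if memory ever becomes a concern. A sketch of what that loop could look like (the function name is ours; the block size of 1152 samples matches one MPEG-1 Layer III frame, and the encoder shape mirrors lamejs's Mp3Encoder interface):

```typescript
// Minimal interface matching the lamejs Mp3Encoder methods we rely on.
interface FrameEncoder {
  encodeBuffer(left: Int16Array, right: Int16Array): Int8Array;
  flush(): Int8Array;
}

// Feed the encoder in fixed-size blocks instead of one giant buffer,
// collecting the encoded chunks as they come out.
function encodeChunked(
  encoder: FrameEncoder,
  left: Int16Array,
  right: Int16Array,
  blockSize = 1152,
): Int8Array[] {
  const chunks: Int8Array[] = [];
  for (let i = 0; i < left.length; i += blockSize) {
    // subarray() creates views, so no per-block copies are made
    const buf = encoder.encodeBuffer(left.subarray(i, i + blockSize), right.subarray(i, i + blockSize));
    if (buf.length > 0) chunks.push(buf);
  }
  const tail = encoder.flush(); // final frames and header data
  if (tail.length > 0) chunks.push(tail);
  return chunks;
}
```

The resulting array can go straight into `new Blob(chunks, { type: 'audio/mpeg' })`, so the download step is unchanged.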
Downloading the Result
The MP3 blob is converted to an object URL and triggered as a download:
const url = URL.createObjectURL(mp3Blob);
const a = document.createElement('a');
a.href = url;
a.download = `${baseName}.mp3`;
document.body.appendChild(a);
a.click();
document.body.removeChild(a);
URL.revokeObjectURL(url);
No server round-trip. The file goes straight from JavaScript memory to your Downloads folder.
Why Synthesizers Instead of SoundFonts?
The MIDI player in the same project uses SoundFont samples for realistic instrument sounds. This converter deliberately uses synthesizers instead. Here's why:
No network requests — SoundFonts require downloading sample files (even sparse sets are ~1-2 MB per instrument). Synthesizers generate audio mathematically, so there's nothing to download.
Instant switching — Changing from piano to FM synth is just a dropdown change. No loading spinner, no cache management.
Smaller bundle — The converter doesn't need Tone.js Sampler or base URL configuration. The core Tone.js synth code is already included.
Creative sounds — You can't get an FM synth or a pluck synth from a SoundFont. The synthesized options offer textures that sampled instruments can't match.
The trade-off is realism. If you want a believable grand piano, use the MIDI player with its SoundFont piano. If you want a quick MP3 export with a pleasant synthetic sound, this converter is perfect.
Try It Yourself
Got a MIDI file from an old keyboard? A MusicXML export from a notation app? Something you downloaded from a retro gaming site?
Upload it to our free MIDI to MP3 converter. Pick an instrument — maybe piano for a classical piece, strings for something cinematic, or chiptune if you're feeling nostalgic. Hit Export and get a real MP3 file that plays on any device.
All synthesis and encoding happens locally in your browser. Your files never leave your machine.
