SEN LLC

A Voice Memo App With MediaRecorder, IndexedDB, and Live Waveform Rendering

MediaRecorder captures microphone audio as WebM. IndexedDB stores the blobs (localStorage is too small for audio). An AnalyserNode feeds a Canvas for the live waveform during recording. Web Speech API provides best-effort transcription. Each API has its own quirks — together they make a fully client-side voice memo tool.

Voice memo apps exist natively on every phone, but web-based ones are surprisingly rare because audio is awkward on the web platform. The APIs exist, but they don't compose cleanly — you have to wire together four different browser APIs and deal with each one's edge cases.

🔗 Live demo: https://sen.ltd/portfolio/voice-memo/
📦 GitHub: https://github.com/sen-ltd/voice-memo

Features:

  • MediaRecorder audio capture
  • IndexedDB blob storage
  • Live waveform during recording (Canvas + AnalyserNode)
  • Playback with scrubber
  • Web Speech API transcription (best-effort)
  • Label + tag + delete
  • Export recordings
  • Japanese / English UI (language-aware recognition)
  • Zero dependencies, 55 tests
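
The export feature isn't shown in the excerpts below; the usual pure-browser pattern is an object-URL download. A sketch under that assumption (`exportMemo` and the memo shape are hypothetical names, not from the repo):

```javascript
// Hypothetical export helper: download a saved memo's blob by pointing a
// temporary anchor with a `download` attribute at an object URL.
function exportMemo(memo) {
  const url = URL.createObjectURL(memo.blob);
  const a = document.createElement('a');
  a.href = url;
  // The extension should match the recorded MIME type; '.webm' assumed here
  // (Safari recordings would be '.mp4').
  a.download = `${memo.label || 'memo'}.webm`;
  a.click();
  URL.revokeObjectURL(url);
}
```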

Recording with MediaRecorder

async function startRecording() {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const recorder = new MediaRecorder(stream, { mimeType: getBestMimeType() });
  const chunks = [];
  recorder.ondataavailable = (e) => chunks.push(e.data);
  recorder.onstop = () => {
    const blob = new Blob(chunks, { type: recorder.mimeType });
    saveToIndexedDB(blob);
    stream.getTracks().forEach(t => t.stop());
  };
  recorder.start();
}

The getBestMimeType() helper picks the first format the browser supports, based on a filtered list of candidates:

export function supportedMimeTypes() {
  return [
    'audio/webm;codecs=opus',
    'audio/webm',
    'audio/ogg;codecs=opus',
    'audio/mp4',
  ].filter((type) => MediaRecorder.isTypeSupported(type));
}

Chrome prefers WebM/Opus, Safari prefers MP4/AAC. Supporting both means falling back to whichever is available.
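
The repo's getBestMimeType() isn't shown above; presumably it just takes the first supported candidate. A plausible implementation (supportedMimeTypes() repeated here so the example is self-contained):

```javascript
// Candidate list filtered down to what this browser can actually record.
function supportedMimeTypes() {
  return [
    'audio/webm;codecs=opus',
    'audio/webm',
    'audio/ogg;codecs=opus',
    'audio/mp4',
  ].filter((type) => MediaRecorder.isTypeSupported(type));
}

// First supported format wins. An empty string is a valid mimeType option:
// it tells MediaRecorder to fall back to the browser's own default.
function getBestMimeType() {
  return supportedMimeTypes()[0] ?? '';
}
```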

Live waveform with AnalyserNode

For the visual feedback during recording, we tap the audio stream through a Web Audio AnalyserNode and read its time-domain data every frame:

const audioCtx = new AudioContext();
const source = audioCtx.createMediaStreamSource(stream);
const analyser = audioCtx.createAnalyser();
analyser.fftSize = 2048;
source.connect(analyser);

const buffer = new Uint8Array(analyser.fftSize);
function draw() {
  analyser.getByteTimeDomainData(buffer);
  drawLiveWaveform(canvas, buffer);
  if (recording) requestAnimationFrame(draw);
}
draw();

The buffer holds PCM samples in the 0-255 range (128 = silence). Rendering is a simple line plot of sample values across the canvas width.
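
The repo's drawLiveWaveform isn't shown; a minimal sketch of that line plot, assuming a plain 2D canvas (function name from the snippet above, body is mine):

```javascript
// Plot the time-domain samples as a polyline across the canvas width.
// Samples are bytes in 0-255, so 128 maps to the vertical midline (silence).
function drawLiveWaveform(canvas, buffer) {
  const ctx = canvas.getContext('2d');
  const { width, height } = canvas;
  ctx.clearRect(0, 0, width, height);
  ctx.beginPath();
  const step = width / buffer.length;
  for (let i = 0; i < buffer.length; i++) {
    const y = (buffer[i] / 255) * height; // map 0-255 onto canvas height
    if (i === 0) ctx.moveTo(0, y);
    else ctx.lineTo(i * step, y);
  }
  ctx.stroke();
}
```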

IndexedDB, not localStorage

Audio blobs are hundreds of KB to megabytes. localStorage has a 5-10 MB per-origin limit and only stores strings — you'd have to base64-encode the blob, which adds 33% overhead. IndexedDB stores binary blobs natively and has a much higher quota:

async function saveMemo(memo) {
  const db = await openDB();
  const tx = db.transaction('memos', 'readwrite');
  const store = tx.objectStore('memos');
  await promisify(store.put(memo));
  return memo.id;
}

The wrapper turns IndexedDB's verbose callback API into promises. IndexedDB also supports secondary indexes, so sorting by createdAt is trivial:

const index = store.index('createdAt');
const memos = await promisify(index.getAll());
memos.sort((a, b) => b.createdAt - a.createdAt);
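
Neither promisify nor openDB appears in the excerpts; hypothetical versions, assuming a `memos` store keyed on `id` with a `createdAt` index (database name is made up):

```javascript
// Wrap a single IDBRequest in a Promise.
function promisify(request) {
  return new Promise((resolve, reject) => {
    request.onsuccess = () => resolve(request.result);
    request.onerror = () => reject(request.error);
  });
}

// Open the database, creating the object store and index on first run.
function openDB() {
  return new Promise((resolve, reject) => {
    const req = indexedDB.open('voice-memos', 1);
    req.onupgradeneeded = () => {
      const store = req.result.createObjectStore('memos', { keyPath: 'id' });
      store.createIndex('createdAt', 'createdAt');
    };
    req.onsuccess = () => resolve(req.result);
    req.onerror = () => reject(req.error);
  });
}
```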

Web Speech API transcription

The tricky part: Web Speech API transcribes live microphone input, not arbitrary blobs. To transcribe a saved memo, we play it back through the speakers while SpeechRecognition listens to the microphone. Not ideal (requires a quiet room), but works:

async function transcribeMemo(memo) {
  const SR = window.SpeechRecognition || window.webkitSpeechRecognition;
  const recog = new SR();
  recog.lang = getSpeechLang(); // 'ja-JP' or 'en-US'
  recog.continuous = true;
  recog.interimResults = false;

  const audio = new Audio(URL.createObjectURL(memo.blob));
  let transcript = '';
  recog.onresult = (e) => {
    for (let i = e.resultIndex; i < e.results.length; i++) {
      transcript += e.results[i][0].transcript;
    }
  };
  recog.start();
  audio.play();
  await new Promise(r => audio.onended = r);
  recog.stop();
  // Wait for recognition to wind down so trailing results are included.
  await new Promise(r => (recog.onend = r));
  return transcript;
}

A better approach would be to send the blob to Whisper or another off-device speech model, but that requires a server. For zero-dependency, pure-browser best-effort transcription, playback-through-the-microphone is about the only option.

Series

This is entry #71 in my 100+ public portfolio series.
