
Jean Michael Mayer

Posted on • Originally published at mood-ring-playlist.edgecasefactory.com

Show Dev: Your Face Is Now a Music Generator

The Dumbest Idea I've Shipped This Month

I built a thing called The Mood Ring Playlist. It points your webcam at your face, reads your emotions, and generates a live music track that shifts as you do.

Smile? Tempo kicks up, major key, bright pads. Scowl at your code review? Minor key, slower, a little more low end. Zone out completely? It drifts into ambient territory.

This article is the "why would you do this" post, plus a couple of technical choices that turned out weirder than I expected.

Why Build This

I wanted to play with two things at once: browser-based face detection and procedural audio. Every "AI music" demo I've seen is either:

  1. Generate a 30-second clip, wait, listen, regenerate.
  2. Upload a giant model to the browser and pray the user's laptop doesn't melt.

I wanted something continuous. Not "click to generate a song" but "the song is you, right now." That framing changed every architectural decision.

Face In The Browser, Music On The Server

Face detection runs entirely client-side. Webcam frames → face landmarks → a small vector of emotion probabilities (happy, sad, angry, surprised, neutral, etc.). None of that leaves your machine. This is both a privacy win and a latency win — I don't want to ship webcam frames across the internet just to find out you're mildly annoyed.
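As a sketch of the last step in that pipeline, here's one way raw per-frame expression scores could become a normalized emotion vector. The field names and normalization scheme are my assumptions, not the app's actual schema:

```javascript
// Hypothetical: turn raw per-frame expression scores into a vector
// whose probabilities sum to 1, dropping any negative noise.
const EMOTIONS = ['happy', 'sad', 'angry', 'surprised', 'neutral'];

function toEmotionVector(scores) {
  const clamped = EMOTIONS.map((e) => Math.max(0, scores[e] ?? 0));
  const total = clamped.reduce((a, b) => a + b, 0) || 1; // avoid divide-by-zero
  return Object.fromEntries(EMOTIONS.map((e, i) => [e, clamped[i] / total]));
}

// A mostly-happy frame with a little surprise.
console.log(toEmotionVector({ happy: 0.7, surprised: 0.2, neutral: 0.1 }));
```

Keeping this on the client is what makes the privacy claim real: only this tiny vector, not pixels, ever goes over the wire.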

The music generation, though, lives in its own Railway service. That was deliberate. The main Next.js app is a thin frontend. The generator is a separate long-running Node service that holds state, manages the synthesis graph, and streams parameter updates based on the emotion vector it receives.

Why split it?

  • Cold starts would kill the vibe. Serverless functions booting mid-song is not a vibe.
  • It's stateful. The generator needs memory of what was just playing so transitions don't sound like someone slapping the radio dial.
  • I can redeploy the UI without killing the audio.
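The second bullet is the interesting one. A minimal sketch of what "stateful" buys you, with an invented class name and smoothing factor (the real service is more involved):

```javascript
// Hypothetical: the generator remembers the tempo it's currently playing
// and eases toward each new target instead of jumping to it.
class MoodDirector {
  constructor(tempo = 100) {
    this.tempo = tempo; // what's playing right now
  }

  // Move 25% of the way toward the target each update — no radio-dial slaps.
  update(targetTempo) {
    this.tempo += (targetTempo - this.tempo) * 0.25;
    return Math.round(this.tempo);
  }
}

const director = new MoodDirector(100);
console.log(director.update(140)); // 110
console.log(director.update(140)); // 118 — eases in over several updates
```

A serverless function can't do this: the `this.tempo` between requests is exactly the memory that dies with each cold start.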

The client just posts emotion snapshots and receives back instructions for what the Web Audio API should do next.

// Debounced sync: wait 250ms after the latest emotion change before
// posting, so the music isn't retargeted on every video frame.
useEffect(() => {
  if (!emotions) return;
  const id = setTimeout(async () => {
    const res = await fetch('/api/mood', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ emotions, t: performance.now() }),
    });
    // The server answers with musical targets for the Web Audio graph.
    const { tempo, key, layers } = await res.json();
    mixer.current?.update({ tempo, key, layers });
  }, 250);
  return () => clearTimeout(id);
}, [emotions]);

That 250ms debounce matters. Early versions updated every frame and the music sounded like a panic attack. Humans don't actually change mood 60 times a second — who knew.
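A complementary trick for jittery signals like this — my own addition, not something the post describes — is to exponentially smooth the emotion vector itself, so a single misread frame can't yank the music around even between debounced posts:

```javascript
// Hypothetical: blend each new reading into a running average.
// alpha controls responsiveness; lower = calmer.
function smooth(prev, next, alpha = 0.2) {
  const out = {};
  for (const k of Object.keys(next)) {
    out[k] = (prev[k] ?? next[k]) * (1 - alpha) + next[k] * alpha;
  }
  return out;
}

let state = { happy: 1, neutral: 0 };
// One noisy "neutral" frame barely moves the needle.
state = smooth(state, { happy: 0, neutral: 1 });
console.log(state.happy.toFixed(2)); // "0.80"
```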

The AI Part, Honestly

The "AI" here isn't a giant model generating waveforms. It's a smaller system that maps an emotion vector to musical parameters: scale, tempo range, instrument layer weights, chord progression tendencies. Think of it as a learned policy over a procedural music engine, not Suno-in-a-box. That's how it stays real-time.

I tried the "just call a big model every few seconds" route first. It sounded great while a clip played and terrible wherever two clips were stitched together. Seams everywhere. The procedural-with-a-smart-director approach wins because continuity matters more than novelty in ambient music.

Known Weirdness

  • If you wear glasses with heavy reflections, it thinks you're surprised a lot.
  • If you laugh hard enough to close your eyes, it briefly assumes you left.
  • Lighting matters more than I'd like. Sorry.

Try It

Give it your face for a minute. See what you sound like today.

👉 mood-ring-playlist.edgecasefactory.com

Headphones recommended. Webcam required. Judgment optional.
