Nagisa Dozono

Posted on Jun 14

How I Built an Audio-Reactive 3D Visualizer with Three.js and the Web Audio API

#threejs #typescript #webaudio #opensource

I recently released Audio Reactive 3D Visualizer v1.0.0, an open-source browser application that turns uploaded music and artwork into real-time visuals.

Users can generate synchronized 3D objects, particles, waveforms, and image effects, then export the result as a 1920×1080 MP4 at 30 FPS.

The complete media-processing workflow runs locally in the browser.

Live demo: https://waveform.tranjectories.xyz/
Source code: https://github.com/7g3n/phase-viz
v1.0.0 release: https://github.com/7g3n/phase-viz/releases/tag/v1.0.0

In this article, I will explain the main ideas behind the project:

Decoding audio with the Web Audio API
Converting frequency data into stable visual signals
Mapping audio data to Three.js scenes
Creating audio-reactive particle systems
Separating real-time playback from video export
Exporting MP4 files with WebCodecs
Falling back to ffmpeg.wasm
Processing uploaded media locally

The code examples below are simplified to explain the architecture. The complete implementation is available in the GitHub repository.

Why I built it

I work as both a frontend developer and a music producer.

Independent musicians often need visual content for music releases, social media, live performances, vocal synth projects, and music-video mockups.

However, producing a complete music video can be expensive and time-consuming.

There is also a communication problem.

A musician may know exactly where a visual should pulse, distort, expand, or collapse, but translating that timing and atmosphere into instructions for another creator can be difficult.

I wanted to create a tool that could quickly transform the structure and energy of a track into a visual starting point.

My main requirements were:

No desktop video editor required
Real-time audio-reactive rendering
Multiple visual styles
Customizable parameters
A fullscreen Live / VJ mode
Full HD MP4 export
Local processing of uploaded media
An open-source codebase that developers can study and extend

Technology stack

The application is built with:

React 19
TypeScript
Vite
Three.js
React Three Fiber
React Three Drei
WebGL
Canvas 2D
Web Audio API
Zustand
Material UI
WebCodecs
mp4-muxer
ffmpeg.wasm
Cloudflare Workers Static Assets

The application currently includes three main visual modes.

3D Visualizer

The 3D mode provides multiple scene presets, particle systems, geometry controls, camera movement, morphing, shaders, and post-processing effects.

Wave Visualizer

The Wave mode renders the uploaded audio as horizontal, circular, or bar-based waveform visualizations.

Image FX

Image FX applies audio-reactive glow, blur, RGB shift, noise, distortion, and pulse effects to uploaded artwork.

Decoding audio in the browser

The first step is loading and decoding an audio file with the Web Audio API.

A simplified implementation looks like this:

const audioContext = new AudioContext();

const arrayBuffer = await file.arrayBuffer();
const audioBuffer = await audioContext.decodeAudioData(arrayBuffer);

The resulting AudioBuffer contains decoded PCM audio data.

This is useful for offline processing such as:

Building waveform previews
Sampling the complete track
Detecting peaks
Estimating BPM
Preparing deterministic export data
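
As an illustration of the first item, a waveform preview can be built by reducing the decoded PCM samples to one peak per bucket. This is a minimal sketch rather than the project's code: buildWaveformPeaks is a hypothetical helper, and it assumes the samples come from a single channel, for example audioBuffer.getChannelData(0).

```typescript
// Reduces PCM samples (-1..1) to `bucketCount` peak values in the range 0..1.
function buildWaveformPeaks(
  samples: Float32Array,
  bucketCount: number
): Float32Array {
  const count = Math.max(0, Math.floor(bucketCount));
  const peaks = new Float32Array(count);

  if (count === 0 || samples.length === 0) {
    return peaks;
  }

  const bucketSize = samples.length / count;

  for (let bucket = 0; bucket < count; bucket += 1) {
    const start = Math.floor(bucket * bucketSize);
    // Always read at least one sample, even when buckets outnumber samples.
    const end = Math.min(
      samples.length,
      Math.max(start + 1, Math.floor((bucket + 1) * bucketSize))
    );

    let peak = 0;
    for (let index = start; index < end; index += 1) {
      peak = Math.max(peak, Math.abs(samples[index] ?? 0));
    }

    peaks[bucket] = Math.min(1, peak);
  }

  return peaks;
}
```

One bucket per horizontal pixel is usually enough for a preview, so a full track collapses to a few hundred or a few thousand values.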

For real-time playback, an AnalyserNode provides continuously updated frequency and waveform data once the playing source node is connected to it.

const analyser = audioContext.createAnalyser();

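// 2048-sample FFT window; frequencyBinCount is always fftSize / 2 (1024 bins here).
analyser.fftSize = 2048;
// Averaging between successive analysis frames (0 = none; 0.8 is the default).
analyser.smoothingTimeConstant = 0.8;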

const frequencyData = new Uint8Array(
  analyser.frequencyBinCount
);

const waveformData = new Uint8Array(
  analyser.fftSize
);

During playback, the arrays are updated with:

analyser.getByteFrequencyData(frequencyData);
analyser.getByteTimeDomainData(waveformData);

These arrays form the connection between the audio engine and the rendering system.
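
As one example of reading those arrays, the time-domain bytes can be collapsed into a single loudness value. This is a minimal sketch, assuming the unsigned 8-bit format that getByteTimeDomainData writes (128 represents silence). rmsLevel is an illustrative name, not a function from the project.

```typescript
// Converts unsigned 8-bit time-domain samples (128 = silence)
// into a root-mean-square level in the range 0..1.
function rmsLevel(waveform: Uint8Array): number {
  if (waveform.length === 0) {
    return 0;
  }

  let sumOfSquares = 0;

  for (let index = 0; index < waveform.length; index += 1) {
    // Recentre around zero and scale to roughly -1..1.
    const sample = ((waveform[index] ?? 128) - 128) / 128;
    sumOfSquares += sample * sample;
  }

  return Math.sqrt(sumOfSquares / waveform.length);
}
```

RMS tracks perceived loudness better than a single peak sample, which makes it a reasonable candidate for an "overall energy" signal.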

Turning FFT data into visual signals

Raw FFT data is not automatically useful for animation.

The renderer usually needs a smaller set of normalized signals, such as:

Overall energy
Bass energy
Midrange energy
High-frequency energy
Peak intensity
Waveform displacement
Smoothed amplitude

A basic helper for averaging a range of frequency bins can look like this:

function averageRange(
  values: Uint8Array,
  start: number,
  end: number
): number {
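  // Clamp both ends so out-of-range arguments never read undefined bins.
  const safeStart = Math.max(0, start);
  const safeEnd = Math.min(end, values.length);

  let sum = 0;

  for (let index = safeStart; index < safeEnd; index += 1) {
    sum += values[index];
  }

  // Guard against division by zero when the range is empty.
  const count = Math.max(1, safeEnd - safeStart);

  // Byte values are 0..255, so dividing by 255 normalizes to 0..1.
  return sum / count / 255;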
}

The spectrum can then be divided into broad frequency bands:

const bass = averageRange(frequencyData, 0, 24);
const mids = averageRange(frequencyData, 24, 96);
const highs = averageRange(frequencyData, 96, 256);

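The ideal ranges depend on the FFT size, sample rate, music style, and desired visual response. With fftSize = 2048 at a 44.1 kHz sample rate, each bin spans about 21.5 Hz, so the ranges above cover roughly 0 to 520 Hz, 520 Hz to 2.1 kHz, and 2.1 to 5.5 kHz.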
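
Band edges are often easier to reason about in hertz than in bin indices. Each bin is sampleRate / fftSize wide, so the conversion is a single division. The sketch below assumes that relationship. hzToBin and bandRanges are hypothetical helpers, and the band edges in hertz are illustrative starting points rather than values taken from the project.

```typescript
type BinRange = { start: number; end: number };

// Maps a frequency in Hz to the nearest FFT bin index.
// Bin width in Hz is sampleRate / fftSize.
function hzToBin(
  hz: number,
  sampleRate: number,
  fftSize: number
): number {
  const binCount = fftSize / 2; // Same value as analyser.frequencyBinCount.
  const bin = Math.round(hz / (sampleRate / fftSize));

  return Math.min(binCount, Math.max(0, bin));
}

// Hypothetical band edges in Hz; tune them per track and visual style.
function bandRanges(
  sampleRate: number,
  fftSize: number
): { bass: BinRange; mids: BinRange; highs: BinRange } {
  const toBin = (hz: number) => hzToBin(hz, sampleRate, fftSize);

  return {
    bass: { start: toBin(20), end: toBin(250) },
    mids: { start: toBin(250), end: toBin(4000) },
    highs: { start: toBin(4000), end: toBin(16000) },
  };
}
```

The resulting start and end values can be passed straight to averageRange, and the bands stay musically consistent when the sample rate is 48 kHz instead of 44.1 kHz.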

For electronic music, bass transients work well for large-scale movement, while high frequencies can control particles, flashes, noise, and smaller details.

Smoothing audio-reactive movement

Directly mapping frequency values to visual objects often produces unstable movement.

Audio data changes extremely quickly, so I smooth the signals before sending them to the renderer.

function smoothValue(
  current: number,
  target: number,
  factor: number
): number {
  return current + (target - current) * factor;
}

Example:

smoothedBass = smoothValue(smoothedBass, bass, 0.12);
smoothedMids = smoothValue(smoothedMids, mids, 0.08);
smoothedHighs = smoothValue(smoothedHighs, highs, 0.06);

Different smoothing factors create different visual characteristics.

A larger factor (fast smoothing) feels responsive and aggressive
A smaller factor (slow smoothing) feels fluid and atmospheric
Separate attack and release speeds create sharp impacts with smoother decay
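
The attack and release idea can be sketched by choosing the factor based on the direction of change. This is a minimal example, not the project's implementation. smoothAttackRelease and factorForDelta are hypothetical names, and the second helper assumes the factors were tuned at 60 FPS.

```typescript
// Rises quickly toward louder targets and falls slowly toward quieter ones.
function smoothAttackRelease(
  current: number,
  target: number,
  attack: number,
  release: number
): number {
  const factor = target > current ? attack : release;

  return current + (target - current) * factor;
}

// Converts a per-frame factor tuned at 60 FPS into the equivalent factor
// for an arbitrary frame duration, so the decay takes the same wall-clock time.
function factorForDelta(
  factorAt60Fps: number,
  deltaSeconds: number
): number {
  return 1 - Math.pow(1 - factorAt60Fps, deltaSeconds * 60);
}
```

For example, smoothAttackRelease(level, bass, 0.5, 0.08) jumps on a kick drum and then eases back, which tends to read as a punch followed by a glow.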

This signal-conditioning stage is one of the most important parts of an audio-reactive system.

The visual result often depends more on normalization and smoothing than on the complexity of the 3D scene itself.

Mapping audio data to Three.js

React Three Fiber makes it possible to manage Three.js scenes through React components.

A simplified reactive mesh might look like this:

import { useFrame } from "@react-three/fiber";
import { useRef } from "react";
import type { Mesh } from "three";

type ReactiveMeshProps = {
  getEnergy: () => number;
};

export function ReactiveMesh({
  getEnergy,
}: ReactiveMeshProps) {
  const meshRef = useRef<Mesh>(null);

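  useFrame((_state, delta) => {
    if (!meshRef.current) {
      return;
    }

    const energy = getEnergy();
    const scale = 1 + energy * 0.8;

    meshRef.current.scale.setScalar(scale);

    // Speeds are in radians per second; multiplying by delta keeps
    // the rotation independent of the display refresh rate.
    meshRef.current.rotation.x +=
      (0.12 + energy * 0.6) * delta;

    meshRef.current.rotation.y +=
      (0.18 + energy * 0.9) * delta;
  });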

  return (
    <mesh ref={meshRef}>
      <icosahedronGeometry args={[1.5, 5]} />
      <meshStandardMaterial wireframe />
    </mesh>
  );
}

In the complete application, audio signals can control:

Geometry deformation
Particle positions
Particle size
Camera movement
Shader uniforms
Noise intensity
Post-processing strength
Image distortion
RGB displacement
Glow and blur
Waveform radius
Layer opacity

The important architectural decision was to keep audio analysis separate from visual rendering.

The analysis layer produces normalized signals.

Each visualizer consumes those signals without needing to understand how the audio was decoded.
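
One way to picture that boundary is a small shared type plus a read function. The sketch below is an assumption about shape only. AudioSignals, SignalSource, createMeshDriver, and createGlowDriver are hypothetical names, and the glow mapping is invented for illustration.

```typescript
// Hypothetical shape of the normalized signals shared by every visual mode.
type AudioSignals = {
  energy: number;
  bass: number;
  mids: number;
  highs: number;
};

// The only thing a visualizer knows about audio: a function returning signals.
type SignalSource = () => AudioSignals;

function createMeshDriver(readSignals: SignalSource) {
  return function currentScale(): number {
    // Same mapping as the ReactiveMesh example: scale grows with energy.
    return 1 + readSignals().energy * 0.8;
  };
}

function createGlowDriver(readSignals: SignalSource) {
  return function currentGlow(): number {
    // Illustrative mapping: glow follows the highs, clamped to 0..1.
    return Math.min(1, Math.max(0, readSignals().highs * 1.5));
  };
}
```

Because both drivers depend only on SignalSource, the same code can be fed by a live AnalyserNode during playback or by precomputed values during export.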

Building audio-reactive particles

Particle systems are especially useful for music visualization because different frequency bands can control different parts of the scene.

For example:

Bass controls the overall radius
Midrange controls turbulence
High frequencies control flickering
Overall energy controls particle size
Waveform values offset individual particle positions

A simplified particle update might look like this:

for (
  let index = 0;
  index < particleCount;
  index += 1
) {
  const offset = index * 3;
  const waveformIndex =
    index % waveformData.length;

  const waveformValue =
    (waveformData[waveformIndex] - 128) / 128;

  positions[offset] =
    basePositions[offset] +
    waveformValue * intensity;

  positions[offset + 1] =
    basePositions[offset + 1] +
    Math.sin(time + index * 0.01) * mids;

  positions[offset + 2] =
    basePositions[offset + 2] +
    bass * depth;
}

After the loop writes into the array behind a Three.js BufferAttribute, the attribute must be flagged so the new data is uploaded to the GPU:

positionAttribute.needsUpdate = true;

Updating thousands of particles every frame can become expensive.

Performance depends on:

Particle count
Attribute update frequency
Geometry complexity
Device pixel ratio
Post-processing passes
Memory allocations inside the render loop

Avoiding unnecessary object creation during each frame is particularly important.
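
A common way to do that is to allocate buffers once and have the per-frame function write into them. The following sketch shows the pattern with a deliberately simple effect. createRadialPulse is a hypothetical helper, not part of the project.

```typescript
function createRadialPulse(basePositions: Float32Array) {
  // Allocated once, outside the render loop.
  const positions = new Float32Array(basePositions.length);

  // Called every frame; reuses the same array instead of creating a new one.
  return function update(
    bass: number,
    strength: number
  ): Float32Array {
    const scale = 1 + bass * strength;

    for (let index = 0; index < basePositions.length; index += 1) {
      positions[index] = (basePositions[index] ?? 0) * scale;
    }

    return positions;
  };
}
```

The returned function hands back the same Float32Array on every call, so the garbage collector has nothing new to clean up between frames.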

Managing application state

The visualizer has many adjustable parameters:

Visual mode
Active preset
Particle count
Particle size
Particle shape
Camera distance
Morph intensity
Waveform style
Image effects
Layer order
Playback position
Export progress
Live-mode controls

I use Zustand for shared application state.

import { create } from "zustand";

type VisualizerState = {
  bass: number;
  mids: number;
  highs: number;
  particleSize: number;

  setAudioBands: (
    bass: number,
    mids: number,
    highs: number
  ) => void;
};

const useVisualizerStore =
  create<VisualizerState>((set) => ({
    bass: 0,
    mids: 0,
    highs: 0,
    particleSize: 1,

    setAudioBands: (bass, mids, highs) =>
      set({ bass, mids, highs }),
  }));

However, not every audio-frame update should trigger a complete React re-render.

For rapidly changing values, refs, dedicated analysis objects, or direct store access can be more efficient.

UI state and render-loop state have different performance requirements.
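
One possible way to respect both requirements is to let the render loop read values every frame while publishing to UI state only a few times per second. The sketch below is an assumption about how that could be wired. createThrottledPublisher is a hypothetical helper, and the caller supplies the clock value.

```typescript
// Wraps a publish callback so it runs at most once per `intervalMs`.
function createThrottledPublisher<T>(
  publish: (value: T) => void,
  intervalMs: number
) {
  let lastPublishedAt = Number.NEGATIVE_INFINITY;

  return function maybePublish(value: T, nowMs: number): boolean {
    if (nowMs - lastPublishedAt < intervalMs) {
      // Too soon: skip the UI update; the render loop already has the value.
      return false;
    }

    lastPublishedAt = nowMs;
    publish(value);

    return true;
  };
}
```

With Zustand, the publish callback could call useVisualizerStore.getState().setAudioBands(...), so level meters in the UI refresh around ten times per second while the scene itself still updates every frame.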

Real-time playback and video export use different clocks

Real-time playback can use the browser animation loop:

requestAnimationFrame(render);

Video export needs deterministic frames.

At 30 FPS, the frame at index frameIndex must always represent the same point in the track:

const time = frameIndex / 30;

The result should not depend on how quickly the computer happens to render.

A simplified export loop looks like this:

const fps = 30;
const totalFrames = Math.ceil(durationSeconds * fps);

for (
  let frameIndex = 0;
  frameIndex < totalFrames;
  frameIndex += 1
) {
  const time = frameIndex / fps;

  updateAudioAnalysisAtTime(time);
  renderVisualizerFrame(time);

  await encodeCurrentFrame(frameIndex, fps);
}

Without fixed timestamps:

Frames may be skipped
Frames may be duplicated
Visual timing may drift
Audio and video may lose synchronization
Results may differ between devices

The live renderer and exporter can share visual logic, but they should not use the same clock.
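
For the export clock, analysis has to be a pure function of the timestamp. The sketch below illustrates that idea with a time-domain RMS over the decoded PCM samples, which is a simpler stand-in for a full offline FFT. energyAtTime is a hypothetical name, and the default window size is an assumption.

```typescript
// RMS energy of a window of `windowSize` samples around `timeSeconds`.
// The result depends only on the arguments, never on wall-clock time.
function energyAtTime(
  samples: Float32Array,
  sampleRate: number,
  timeSeconds: number,
  windowSize = 2048
): number {
  const center = Math.round(timeSeconds * sampleRate);
  const start = Math.max(0, center - Math.floor(windowSize / 2));
  const end = Math.min(samples.length, start + windowSize);

  if (end <= start) {
    return 0; // The timestamp lies beyond the end of the track.
  }

  let sumOfSquares = 0;

  for (let index = start; index < end; index += 1) {
    const sample = samples[index] ?? 0;
    sumOfSquares += sample * sample;
  }

  return Math.sqrt(sumOfSquares / (end - start));
}
```

Calling it with frameIndex / fps yields the same value on every machine and every run, which is the property the export loop relies on.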

Exporting MP4 video with WebCodecs

WebCodecs provides low-level access to browser media encoders.

The export pipeline is approximately:

Render a visual frame
Create a VideoFrame
Send it to a VideoEncoder
Receive encoded chunks
Mux the video and audio tracks into MP4
Create a downloadable Blob

A simplified encoding function looks like this:

function encodeCanvasFrame(
  canvas: HTMLCanvasElement,
  encoder: VideoEncoder,
  frameIndex: number,
  fps: number
): void {
  const timestamp = Math.round(
    (frameIndex / fps) * 1_000_000
  );

  const frame = new VideoFrame(canvas, {
    timestamp,
  });

  encoder.encode(frame);
  frame.close();
}

Closing each VideoFrame is essential.

If frames remain open during a long export, memory usage can grow rapidly.
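
A defensive variant closes the frame even when encode throws, for example because the encoder has entered an error state. This is a sketch using structural types so that it is not tied to a browser environment. ClosableFrame, FrameEncoder, and encodeAndClose are hypothetical names.

```typescript
type ClosableFrame = { close(): void };
type FrameEncoder<F> = { encode(frame: F): void };

// Encodes a frame and always releases it, even if encode() throws.
function encodeAndClose<F extends ClosableFrame>(
  encoder: FrameEncoder<F>,
  frame: F
): void {
  try {
    encoder.encode(frame);
  } finally {
    frame.close();
  }
}
```

A real VideoEncoder and VideoFrame satisfy these shapes, so the helper can wrap the call inside encodeCanvasFrame without further changes.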

A possible encoder configuration is:

const config: VideoEncoderConfig = {
  codec: "avc1.420028", // H.264 Baseline, level 4.0; level 3.1 ("...1f") tops out at 1280×720
  width: 1920,
  height: 1080,
  bitrate: 8_000_000,
  framerate: 30,
};

Before starting, the application should check whether the browser supports the requested configuration:

const support =
  await VideoEncoder.isConfigSupported(config);

if (!support.supported) {
  throw new Error(
    "The requested encoder configuration is not supported"
  );
}
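
The codec string itself is worth checking before asking the browser. For H.264, avc1.PPCCLL packs the profile, constraint flags, and level as three hexadecimal bytes, and the level caps the frame size: level 3.1 (1f) tops out at 1280×720 at 30 FPS, while 1920×1080 at 30 FPS needs level 4.0 (28) or higher. The sketch below is a heuristic built on the H.264 level table. parseAvcCodec, levelCovers, and AVC_LEVEL_LIMITS are hypothetical names, and isConfigSupported remains the authoritative check.

```typescript
type AvcCodecInfo = { profile: number; constraints: number; level: number };

// Parses "avc1.PPCCLL" (three hex bytes) into its numeric parts.
function parseAvcCodec(codec: string): AvcCodecInfo | null {
  const match = /^avc1\.([0-9a-f]{2})([0-9a-f]{2})([0-9a-f]{2})$/i.exec(codec);

  if (!match) {
    return null;
  }

  return {
    profile: parseInt(match[1] ?? "", 16),
    constraints: parseInt(match[2] ?? "", 16),
    level: parseInt(match[3] ?? "", 16), // 31 means level 3.1, 40 means 4.0.
  };
}

// Limits per level, counted in 16x16 macroblocks (frame size, per second).
const AVC_LEVEL_LIMITS: Record<
  number,
  { maxFrameSize: number; maxPerSecond: number }
> = {
  30: { maxFrameSize: 1620, maxPerSecond: 40500 },
  31: { maxFrameSize: 3600, maxPerSecond: 108000 },
  32: { maxFrameSize: 5120, maxPerSecond: 216000 },
  40: { maxFrameSize: 8192, maxPerSecond: 245760 },
  41: { maxFrameSize: 8192, maxPerSecond: 245760 },
  42: { maxFrameSize: 8704, maxPerSecond: 522240 },
  50: { maxFrameSize: 22080, maxPerSecond: 589824 },
  51: { maxFrameSize: 36864, maxPerSecond: 983040 },
};

// Returns true when the level in `codec` allows the given resolution and rate.
function levelCovers(
  codec: string,
  width: number,
  height: number,
  fps: number
): boolean {
  const info = parseAvcCodec(codec);
  const limits = info ? AVC_LEVEL_LIMITS[info.level] : undefined;

  if (!limits) {
    return false; // Not an avc1 string, or a level outside this table.
  }

  const macroblocks = Math.ceil(width / 16) * Math.ceil(height / 16);

  return (
    macroblocks <= limits.maxFrameSize &&
    macroblocks * fps <= limits.maxPerSecond
  );
}
```

A check like this can help choose a codec string per export resolution before the configuration is handed to VideoEncoder.isConfigSupported.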

The project uses mp4-muxer to combine encoded video and audio data into the final MP4 container.

Why I added an ffmpeg.wasm fallback

WebCodecs can be fast, but codec support varies between browsers and operating systems.

A browser may expose the WebCodecs API while rejecting a specific encoder configuration.

For that reason, the application retries failed exports with ffmpeg.wasm.

try {
  await exportWithWebCodecs();
} catch (error) {
  console.warn(
    "WebCodecs export failed. Falling back to ffmpeg.wasm.",
    error
  );

  await exportWithFfmpeg();
}

The fallback path is slower and usually requires more memory, but it makes the application usable in more environments.

The deployed application first attempts to load ffmpeg core assets locally.

If those files are unavailable, it can retrieve the required runtime assets from jsDelivr.

Keeping uploaded media local

Uploaded music and artwork are not sent to an application backend.

The browser handles:

Audio decoding
Frequency analysis
Waveform sampling
Image processing
Three.js rendering
Video encoding
MP4 generation

The completed MP4 is generated as a browser Blob and downloaded locally.

This provides several advantages:

Private or unreleased music does not need to be uploaded
There is no media upload delay
Server-side media storage is unnecessary
Hosting costs stay relatively low
The application can be deployed mostly as static assets

It also introduces limitations:

Large files consume browser memory
Long exports can be CPU-intensive
Performance varies by device
The browser tab must remain open
Complex scenes are difficult on lower-powered mobile devices
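
A rough estimate makes the memory and duration limits concrete. The sketch below is plain arithmetic under stated assumptions: frames are held as 4-byte RGBA, the bitrate is a target rather than a guarantee, and audio is ignored. estimateExport is a hypothetical helper.

```typescript
function estimateExport(
  durationSeconds: number,
  fps: number,
  width: number,
  height: number,
  videoBitrate: number
): {
  totalFrames: number;
  rawFrameBytes: number;
  encodedVideoBytes: number;
} {
  return {
    // Same frame count as the export loop.
    totalFrames: Math.ceil(durationSeconds * fps),
    // One uncompressed RGBA frame.
    rawFrameBytes: width * height * 4,
    // Bits per second times seconds, converted to bytes.
    encodedVideoBytes: Math.ceil((videoBitrate * durationSeconds) / 8),
  };
}
```

For a three-minute track at 1920×1080, 30 FPS, and 8 Mbps, this gives 5,400 frames of about 8.3 MB each and roughly 180 MB of encoded video. Held uncompressed, those frames would total about 44.8 GB, which is why frames need to be encoded and released one at a time.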

Live / VJ mode

The application also includes a fullscreen Live / VJ mode.

In this mode, the application:

Hides the editing interface
Expands the visual output
Supports keyboard controls
Provides temporary effect boosts
Allows the visualizer to be used during performance experiments

The main challenge was keeping editing controls and performance controls connected to the same state without allowing UI updates to interrupt rendering.

Separating the application shell from the visual components made this easier.
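
A temporary effect boost can be modeled as a value that jumps to 1 on a trigger and decays back to 0 over a fixed time. This is a minimal sketch of that idea, not the project's implementation. createEffectBoost is a hypothetical helper with a linear decay.

```typescript
function createEffectBoost(decaySeconds: number) {
  const safeDecay = Math.max(1e-6, decaySeconds);
  let level = 0;

  return {
    // Called from an input handler, for example a key press.
    trigger(): void {
      level = 1;
    },

    // Called once per rendered frame with the elapsed time in seconds.
    update(deltaSeconds: number): number {
      level = Math.max(0, level - deltaSeconds / safeDecay);

      return level;
    },
  };
}
```

The input handler only calls trigger, and the render loop only calls update, so a key press never has to wait for a UI re-render before it affects the picture.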

Project structure

The codebase is organized by responsibility:

src/
├── audio/       # Decoding, FFT, waveform, BPM and analysis
├── export/      # WebCodecs, MP4 muxing and ffmpeg fallback
├── ui/          # Controls and visualizer canvases
├── visual/      # Three.js scenes, particles, shaders and presets
├── App.tsx      # Application shell and orchestration
└── store.ts     # Shared Zustand state

The main architectural principle is:

Analyze the audio once, normalize the result, and let multiple visual systems consume the same signals.

This allows the 3D, waveform, and image-effect modes to share the same musical timing while rendering in completely different ways.

The hardest parts

The most difficult problems had little to do with creating objects in Three.js.

Synchronizing video export

Real-time rendering and fixed-frame export use different timing systems.

The exporter needs predictable timestamps and repeatable audio sampling.

Browser codec differences

WebCodecs availability does not guarantee support for every video or audio codec configuration.

Memory management

Full HD rendering, video encoding, audio processing, and WebAssembly can consume significant amounts of memory.

Stabilizing movement

Raw frequency data is noisy.

Every visual property needs suitable normalization, smoothing, sensitivity, and limits.
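
Those four controls can be expressed as one small mapping per property. The sketch below is an illustration under assumptions. SignalShape and shapeSignal are hypothetical names, and the noise floor is an added parameter that stands in for "normalization" in its simplest form.

```typescript
type SignalShape = {
  sensitivity: number; // Gain applied after the noise floor is removed.
  floor: number;       // Input values at or below this are treated as silence.
  min: number;         // Output when the signal is silent.
  max: number;         // Output when the signal is saturated.
};

function shapeSignal(raw: number, shape: SignalShape): number {
  // Remove the noise floor, then rescale the remainder to 0..1.
  const gated =
    Math.max(0, raw - shape.floor) / Math.max(1e-6, 1 - shape.floor);

  // Apply sensitivity and clamp so the property can never overshoot.
  const boosted = Math.min(1, gated * shape.sensitivity);

  // Map 0..1 onto the range this visual property is allowed to use.
  return shape.min + (shape.max - shape.min) * boosted;
}
```

A particle size might use { sensitivity: 2, floor: 0.2, min: 1, max: 3 }, while a camera shake would usually need a much narrower range.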

Sharing audio logic across visual modes

The 3D, waveform, and image-effect renderers all need access to the same analysis data without becoming tightly coupled.

What I want to improve next

The current roadmap includes:

Better keyboard accessibility
Improved screen-reader support
Responsive layouts for narrower screens
Automated tests for audio analysis and export utilities
Saveable and shareable visual presets
Better long-duration export guidance
A maintained browser compatibility matrix
Additional visual presets
Performance benchmarks for complex scenes

The repository already includes contributing guidelines, issue templates, pull request templates, security instructions, a code of conduct, and an MIT License.

Contributions are welcome.

Minimal starter project

I also released a smaller educational repository that focuses only on the core audio-to-visual pipeline.

It demonstrates how to:

Load a local audio file
Analyze volume, bass, mids, highs, and waveform data
Keep high-frequency audio updates outside normal React rerenders
Drive a Three.js mesh and particle system in real time
Structure the project with React, TypeScript, and React Three Fiber

Repository:

https://github.com/7g3n/web-audio-threejs-starter

Live demo:

https://7g3n.github.io/web-audio-threejs-starter/

If you want to understand the core implementation before exploring the full visualizer, this starter is the best place to begin.

Try the project

If you are interested in Three.js, the Web Audio API, creative coding, audiovisual tools, or browser-based media processing, I would appreciate your feedback.

Issues, pull requests, and other contributions are welcome.

If the project is useful to you, consider starring the repository.

Thanks for reading.

DEV Community