Nagisa Dozono

How I Built an Audio-Reactive 3D Visualizer with Three.js and the Web Audio API

I recently released Audio Reactive 3D Visualizer v1.0.0, an open-source browser application that turns uploaded music and artwork into real-time visuals.

Users can generate synchronized 3D objects, particles, waveforms, and image effects, then export the result as a 1920×1080 MP4 at 30 FPS.

The complete media-processing workflow runs locally in the browser.

Audio Reactive 3D Visualizer interface

In this article, I will explain the main ideas behind the project:

  • Decoding audio with the Web Audio API
  • Converting frequency data into stable visual signals
  • Mapping audio data to Three.js scenes
  • Creating audio-reactive particle systems
  • Separating real-time playback from video export
  • Exporting MP4 files with WebCodecs
  • Falling back to ffmpeg.wasm
  • Processing uploaded media locally

The code examples below are simplified to explain the architecture. The complete implementation is available in the GitHub repository.


Why I built it

I work as both a frontend developer and a music producer.

Independent musicians often need visual content for music releases, social media, live performances, vocal synth projects, and music-video mockups.

However, producing a complete music video can be expensive and time-consuming.

There is also a communication problem.

A musician may know exactly where a visual should pulse, distort, expand, or collapse, but translating that timing and atmosphere into instructions for another creator can be difficult.

I wanted to create a tool that could quickly transform the structure and energy of a track into a visual starting point.

My main requirements were:

  • No desktop video editor required
  • Real-time audio-reactive rendering
  • Multiple visual styles
  • Customizable parameters
  • A fullscreen Live / VJ mode
  • Full HD MP4 export
  • Local processing of uploaded media
  • An open-source codebase that developers can study and extend

Technology stack

The application is built with:

  • React 19
  • TypeScript
  • Vite
  • Three.js
  • React Three Fiber
  • React Three Drei
  • WebGL
  • Canvas 2D
  • Web Audio API
  • Zustand
  • Material UI
  • WebCodecs
  • mp4-muxer
  • ffmpeg.wasm
  • Cloudflare Workers Static Assets

It currently includes three main visual modes.

3D Visualizer

The 3D mode provides multiple scene presets, particle systems, geometry controls, camera movement, morphing, shaders, and post-processing effects.

Wave Visualizer

The Wave mode renders the uploaded audio as horizontal, circular, or bar-based waveform visualizations.

Image FX

Image FX applies audio-reactive glow, blur, RGB shift, noise, distortion, and pulse effects to uploaded artwork.


Decoding audio in the browser

The first step is loading and decoding an audio file with the Web Audio API.

A simplified implementation looks like this:

const audioContext = new AudioContext();

// `file` is the File chosen by the user, for example from an <input type="file">.
const arrayBuffer = await file.arrayBuffer();
const audioBuffer = await audioContext.decodeAudioData(arrayBuffer);

The resulting AudioBuffer contains decoded PCM audio data.

This is useful for offline processing such as:

  • Building waveform previews
  • Sampling the complete track
  • Detecting peaks
  • Estimating BPM
  • Preparing deterministic export data
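
As one illustration of this kind of offline processing, a waveform preview can be built by reducing the decoded samples to a fixed number of min/max pairs. The sketch below is an assumption about how such a step might look, not the project's actual code; `buildWaveformPeaks` and `WaveformPeak` are hypothetical names. In the browser, `samples` could come from `audioBuffer.getChannelData(0)`.

```typescript
// Hypothetical helper: reduce PCM samples to `bucketCount` min/max pairs.
type WaveformPeak = { min: number; max: number };

function buildWaveformPeaks(
  samples: Float32Array,
  bucketCount: number
): WaveformPeak[] {
  const peaks: WaveformPeak[] = [];

  if (bucketCount <= 0 || samples.length === 0) {
    return peaks;
  }

  const bucketSize = samples.length / bucketCount;

  for (let bucket = 0; bucket < bucketCount; bucket += 1) {
    const start = Math.floor(bucket * bucketSize);
    // Always include at least one sample per bucket.
    const end = Math.min(
      samples.length,
      Math.max(start + 1, Math.floor((bucket + 1) * bucketSize))
    );

    let min = Infinity;
    let max = -Infinity;

    for (let index = start; index < end; index += 1) {
      const value = samples[index];
      if (value < min) min = value;
      if (value > max) max = value;
    }

    peaks.push({ min, max });
  }

  return peaks;
}
```

Each pair can then be drawn as one vertical line on a Canvas 2D preview, so the drawing cost depends on the preview width rather than on the track length.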

For real-time playback, an AnalyserNode provides continuously updated frequency and waveform data.

const analyser = audioContext.createAnalyser();

analyser.fftSize = 2048;
analyser.smoothingTimeConstant = 0.8;

const frequencyData = new Uint8Array(
  analyser.frequencyBinCount
);

const waveformData = new Uint8Array(
  analyser.fftSize
);

During playback, the arrays are updated with:

analyser.getByteFrequencyData(frequencyData);
analyser.getByteTimeDomainData(waveformData);

These arrays form the connection between the audio engine and the rendering system.


Turning FFT data into visual signals

Raw FFT data is not automatically useful for animation.

The renderer usually needs a smaller set of normalized signals, such as:

  • Overall energy
  • Bass energy
  • Midrange energy
  • High-frequency energy
  • Peak intensity
  • Waveform displacement
  • Smoothed amplitude

A basic helper for averaging a range of frequency bins can look like this:

function averageRange(
  values: Uint8Array,
  start: number,
  end: number
): number {
  let sum = 0;
  const safeEnd = Math.min(end, values.length);

  for (let index = start; index < safeEnd; index += 1) {
    sum += values[index];
  }

  const count = Math.max(1, safeEnd - start);

  return sum / count / 255;
}

The spectrum can then be divided into broad frequency bands:

const bass = averageRange(frequencyData, 0, 24);
const mids = averageRange(frequencyData, 24, 96);
const highs = averageRange(frequencyData, 96, 256);

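The ideal ranges depend on the FFT size, sample rate, music style, and desired visual response. Each bin spans sampleRate / fftSize Hz, which is about 23.4 Hz at a 48 kHz sample rate with fftSize = 2048, so the ranges above cover roughly 0 to 560 Hz, 560 Hz to 2.25 kHz, and 2.25 to 6 kHz in that setup.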
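
Because bin indices shift with the sample rate and FFT size, it can help to define bands in Hz and convert them to bin ranges at runtime. The following is a minimal sketch under that assumption; `bandToBinRange` and `BinRange` are hypothetical names that do not come from the project.

```typescript
// Hypothetical helper: convert a frequency band in Hz to a bin range
// [start, end) suitable for a function such as averageRange().
type BinRange = { start: number; end: number };

function bandToBinRange(
  lowHz: number,
  highHz: number,
  sampleRate: number,
  fftSize: number
): BinRange {
  // An analyser exposes fftSize / 2 bins (analyser.frequencyBinCount).
  const binCount = fftSize / 2;
  // Width of a single bin in Hz.
  const binWidthHz = sampleRate / fftSize;

  const start = Math.max(0, Math.floor(lowHz / binWidthHz));
  const end = Math.min(binCount, Math.ceil(highHz / binWidthHz));

  // Guard against inverted ranges so callers never get start > end.
  return { start: Math.min(start, end), end };
}
```

In the browser, the last two arguments would typically be `audioContext.sampleRate` and `analyser.fftSize`, for example `bandToBinRange(20, 250, audioContext.sampleRate, analyser.fftSize)` for a bass band.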

For electronic music, bass transients work well for large-scale movement, while high frequencies can control particles, flashes, noise, and smaller details.


Smoothing audio-reactive movement

Directly mapping frequency values to visual objects often produces unstable movement.

Audio data changes extremely quickly, so I smooth the signals before sending them to the renderer.

function smoothValue(
  current: number,
  target: number,
  factor: number
): number {
  return current + (target - current) * factor;
}

Example:

smoothedBass = smoothValue(smoothedBass, bass, 0.12);
smoothedMids = smoothValue(smoothedMids, mids, 0.08);
smoothedHighs = smoothValue(smoothedHighs, highs, 0.06);

Different smoothing values create different visual characteristics.

  • Fast smoothing feels responsive and aggressive
  • Slow smoothing feels fluid and atmospheric
  • Separate attack and release speeds create sharp impacts with smoother decay
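
The attack and release idea can be sketched by choosing the smoothing factor based on whether the signal is rising or falling. This is an illustrative variant of the `smoothValue` helper, not code from the repository; `smoothAttackRelease` and `factorFromTimeConstant` are hypothetical names.

```typescript
// Rising values use the `attack` factor, falling values use `release`.
function smoothAttackRelease(
  current: number,
  target: number,
  attack: number,
  release: number
): number {
  const factor = target > current ? attack : release;
  return current + (target - current) * factor;
}

// Optional: derive a per-frame factor from a time constant, so the
// response feels similar at different frame rates.
function factorFromTimeConstant(
  deltaSeconds: number,
  timeConstantSeconds: number
): number {
  return 1 - Math.exp(-deltaSeconds / timeConstantSeconds);
}
```

A high attack value such as 0.5 combined with a low release value such as 0.05 makes a kick drum hit immediately and then fade out gradually.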

This signal-conditioning stage is one of the most important parts of an audio-reactive system.

The visual result often depends more on normalization and smoothing than on the complexity of the 3D scene itself.


Mapping audio data to Three.js

React Three Fiber makes it possible to manage Three.js scenes through React components.

A simplified reactive mesh might look like this:

import { useFrame } from "@react-three/fiber";
import { useRef } from "react";
import type { Mesh } from "three";

type ReactiveMeshProps = {
  getEnergy: () => number;
};

export function ReactiveMesh({
  getEnergy,
}: ReactiveMeshProps) {
  const meshRef = useRef<Mesh>(null);

  useFrame(() => {
    if (!meshRef.current) {
      return;
    }

    const energy = getEnergy();
    const scale = 1 + energy * 0.8;

    meshRef.current.scale.setScalar(scale);

    meshRef.current.rotation.x +=
      0.002 + energy * 0.01;

    meshRef.current.rotation.y +=
      0.003 + energy * 0.015;
  });

  return (
    <mesh ref={meshRef}>
      <icosahedronGeometry args={[1.5, 5]} />
      <meshStandardMaterial wireframe />
    </mesh>
  );
}

In the complete application, audio signals can control:

  • Geometry deformation
  • Particle positions
  • Particle size
  • Camera movement
  • Shader uniforms
  • Noise intensity
  • Post-processing strength
  • Image distortion
  • RGB displacement
  • Glow and blur
  • Waveform radius
  • Layer opacity

The important architectural decision was to keep audio analysis separate from visual rendering.

The analysis layer produces normalized signals.

Each visualizer consumes those signals without needing to understand how the audio was decoded.
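
One way to express that boundary is a small signal type that the analysis layer produces and every visualizer accepts. The sketch below assumes the band ranges used earlier in this article; `AudioSignals`, `SignalConsumer`, `analyzeFrame`, and `dispatchSignals` are hypothetical names, not the project's actual API.

```typescript
// Hypothetical shape of the normalized signals (all values in 0..1).
type AudioSignals = {
  energy: number;
  bass: number;
  mids: number;
  highs: number;
};

// A visualizer only needs the signals, not the audio pipeline.
type SignalConsumer = (signals: AudioSignals) => void;

// Average of values[start, end), normalized from 0..255 to 0..1.
function meanOfBins(
  values: Uint8Array,
  start: number,
  end: number
): number {
  const safeEnd = Math.min(end, values.length);
  let sum = 0;

  for (let index = start; index < safeEnd; index += 1) {
    sum += values[index];
  }

  return sum / Math.max(1, safeEnd - start) / 255;
}

// Analysis layer: turns one frame of byte frequency data into signals.
function analyzeFrame(frequencyData: Uint8Array): AudioSignals {
  return {
    energy: meanOfBins(frequencyData, 0, frequencyData.length),
    bass: meanOfBins(frequencyData, 0, 24),
    mids: meanOfBins(frequencyData, 24, 96),
    highs: meanOfBins(frequencyData, 96, 256),
  };
}

// Rendering layer: every consumer receives the same signals.
function dispatchSignals(
  signals: AudioSignals,
  consumers: SignalConsumer[]
): void {
  for (const consume of consumers) {
    consume(signals);
  }
}
```

With this split, a new visual mode only has to implement a `SignalConsumer`; the decoding and FFT code stays untouched.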


Building audio-reactive particles

Particle systems are especially useful for music visualization because different frequency bands can control different parts of the scene.

For example:

  • Bass controls the overall radius
  • Midrange controls turbulence
  • High frequencies control flickering
  • Overall energy controls particle size
  • Waveform values offset individual particle positions

A simplified particle update might look like this:

for (
  let index = 0;
  index < particleCount;
  index += 1
) {
  const offset = index * 3;
  const waveformIndex =
    index % waveformData.length;

  const waveformValue =
    (waveformData[waveformIndex] - 128) / 128;

  positions[offset] =
    basePositions[offset] +
    waveformValue * intensity;

  positions[offset + 1] =
    basePositions[offset + 1] +
    Math.sin(time + index * 0.01) * mids;

  positions[offset + 2] =
    basePositions[offset + 2] +
    bass * depth;
}

After you modify the array behind a Three.js BufferAttribute, the attribute must be marked for an update so the new values are uploaded to the GPU:

positionAttribute.needsUpdate = true;

Updating thousands of particles every frame can become expensive.

Performance depends on:

  • Particle count
  • Attribute update frequency
  • Geometry complexity
  • Device pixel ratio
  • Post-processing passes
  • Memory allocations inside the render loop

Avoiding unnecessary object creation during each frame is particularly important.
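
A common way to do this is to allocate typed arrays once and overwrite them in place on every frame. The following sketch illustrates the pattern with hypothetical helpers (`createParticleBuffers`, `applyRadialPulse`); it is not taken from the project.

```typescript
type ParticleBuffers = {
  basePositions: Float32Array;
  positions: Float32Array;
};

// Allocate once, outside the render loop (x, y, z per particle).
function createParticleBuffers(particleCount: number): ParticleBuffers {
  return {
    basePositions: new Float32Array(particleCount * 3),
    positions: new Float32Array(particleCount * 3),
  };
}

// Per-frame update: writes into the existing array and allocates nothing.
function applyRadialPulse(
  positions: Float32Array,
  basePositions: Float32Array,
  amount: number
): void {
  const scale = 1 + amount;

  for (let index = 0; index < positions.length; index += 1) {
    positions[index] = basePositions[index] * scale;
  }
}
```

The same idea applies to vectors, colors, and matrices: create scratch objects once and reuse them, instead of constructing new ones inside the frame callback.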


Managing application state

The visualizer has many adjustable parameters:

  • Visual mode
  • Active preset
  • Particle count
  • Particle size
  • Particle shape
  • Camera distance
  • Morph intensity
  • Waveform style
  • Image effects
  • Layer order
  • Playback position
  • Export progress
  • Live-mode controls

I use Zustand for shared application state.

import { create } from "zustand";

type VisualizerState = {
  bass: number;
  mids: number;
  highs: number;
  particleSize: number;

  setAudioBands: (
    bass: number,
    mids: number,
    highs: number
  ) => void;
};

const useVisualizerStore =
  create<VisualizerState>((set) => ({
    bass: 0,
    mids: 0,
    highs: 0,
    particleSize: 1,

    setAudioBands: (bass, mids, highs) =>
      set({ bass, mids, highs }),
  }));

However, not every audio-frame update should trigger a complete React re-render.

For rapidly changing values, refs, dedicated analysis objects, or direct store access can be more efficient.

UI state and render-loop state have different performance requirements.
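
One possible way to respect that difference is to let the render loop read a plain mutable object every frame, and to forward values to the UI store only a few times per second. The sketch below assumes this approach; `liveSignals` and `createThrottledPublisher` are hypothetical names, and the timestamp is passed in explicitly so the helper stays easy to test.

```typescript
// Mutable holder read by the render loop; writing to it notifies nobody.
const liveSignals = { bass: 0, mids: 0, highs: 0 };

// Hypothetical helper: forward values to the UI at most once per interval.
function createThrottledPublisher<T>(
  publish: (value: T) => void,
  intervalMs: number
): (value: T, nowMs: number) => boolean {
  let lastPublishedAt = -Infinity;

  return (value, nowMs) => {
    if (nowMs - lastPublishedAt < intervalMs) {
      return false; // Skipped: too soon since the last UI update.
    }

    lastPublishedAt = nowMs;
    publish(value);
    return true;
  };
}
```

In the application, `publish` could call a store action such as `setAudioBands`, and `nowMs` could come from `performance.now()`, while the render loop keeps reading `liveSignals` directly.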


Real-time playback and video export use different clocks

Real-time playback can use the browser animation loop:

requestAnimationFrame(render);

Video export needs deterministic frames.

At 30 FPS, the frame at index frameIndex must always represent the same moment in the track:

const time = frameIndex / 30;

The result should not depend on how quickly the computer happens to render.

A simplified export loop looks like this:

const fps = 30;
const totalFrames = Math.ceil(durationSeconds * fps);

for (
  let frameIndex = 0;
  frameIndex < totalFrames;
  frameIndex += 1
) {
  const time = frameIndex / fps;

  updateAudioAnalysisAtTime(time);
  renderVisualizerFrame(time);

  await encodeCurrentFrame(frameIndex, fps);
}

Without fixed timestamps:

  • Frames may be skipped
  • Frames may be duplicated
  • Visual timing may drift
  • Audio and video may lose synchronization
  • Results may differ between devices

The live renderer and exporter can share visual logic, but they should not use the same clock.
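
For the exporter, the analysis has to be a pure function of the timestamp. As a simplified stand-in for a step like `updateAudioAnalysisAtTime`, the sketch below computes an RMS level directly from decoded PCM samples instead of running an FFT. This substitution is my assumption for illustration; `rmsAtTime` is a hypothetical name, and `samples` would come from `audioBuffer.getChannelData(0)` in the browser.

```typescript
// Hypothetical helper: RMS level of the window centered on `timeSeconds`.
// The same inputs always produce the same output, regardless of how
// fast the machine renders.
function rmsAtTime(
  samples: Float32Array,
  sampleRate: number,
  timeSeconds: number,
  windowSize: number
): number {
  const center = Math.round(timeSeconds * sampleRate);
  const start = Math.max(0, center - Math.floor(windowSize / 2));
  const end = Math.min(samples.length, start + windowSize);

  if (end <= start) {
    return 0; // The requested time lies outside the track.
  }

  let sumOfSquares = 0;

  for (let index = start; index < end; index += 1) {
    sumOfSquares += samples[index] * samples[index];
  }

  return Math.sqrt(sumOfSquares / (end - start));
}
```

Calling this once per exported frame with `time = frameIndex / fps` gives every frame a repeatable energy value.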


Exporting MP4 video with WebCodecs

WebCodecs provides low-level access to browser media encoders.

The export pipeline is approximately:

  1. Render a visual frame
  2. Create a VideoFrame
  3. Send it to a VideoEncoder
  4. Receive encoded chunks
  5. Mux the video and audio tracks into MP4
  6. Create a downloadable Blob

A simplified encoding function looks like this:

function encodeCanvasFrame(
  canvas: HTMLCanvasElement,
  encoder: VideoEncoder,
  frameIndex: number,
  fps: number
): void {
  const timestamp = Math.round(
    (frameIndex / fps) * 1_000_000
  );

  const frame = new VideoFrame(canvas, {
    timestamp,
  });

  encoder.encode(frame);
  frame.close();
}
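
The timestamp math can be pulled into a small pure function, together with a policy for requesting key frames at regular intervals. This is a sketch rather than the project's code: `frameTimestampMicros` and `shouldForceKeyFrame` are hypothetical helpers, and the two-second interval in the comment is an arbitrary example value.

```typescript
// Presentation timestamp in microseconds, the unit WebCodecs uses.
function frameTimestampMicros(frameIndex: number, fps: number): number {
  return Math.round((frameIndex / fps) * 1_000_000);
}

// Hypothetical policy: request a key frame every `intervalSeconds`.
function shouldForceKeyFrame(
  frameIndex: number,
  fps: number,
  intervalSeconds: number
): boolean {
  const interval = Math.max(1, Math.round(fps * intervalSeconds));
  return frameIndex % interval === 0;
}

// In the browser, the result could be passed to the encoder like this:
//   encoder.encode(frame, {
//     keyFrame: shouldForceKeyFrame(frameIndex, 30, 2),
//   });
```

Regular key frames keep the exported file seekable, at the cost of a slightly larger output.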

Closing each VideoFrame is essential.

If frames remain open during a long export, memory usage can grow rapidly.
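
A rough calculation shows why. The sketch below assumes each frame is held as uncompressed RGBA at 4 bytes per pixel; actual browser allocations can differ, for example when frames are GPU-backed, so treat the numbers as an order-of-magnitude estimate. The function names are hypothetical.

```typescript
// Approximate size of one uncompressed RGBA frame (4 bytes per pixel).
function rgbaFrameBytes(width: number, height: number): number {
  return width * height * 4;
}

// Approximate bytes retained if `openFrames` frames are never closed.
function retainedFrameBytes(
  width: number,
  height: number,
  openFrames: number
): number {
  return rgbaFrameBytes(width, height) * openFrames;
}

// One 1920×1080 frame: 8,294,400 bytes (about 7.9 MiB).
// One second of unclosed frames at 30 FPS: 248,832,000 bytes (about 237 MiB).
```

For the same reason, it can be worth pausing the export loop while `encoder.encodeQueueSize` is high, so that pending frames do not pile up faster than the encoder can consume them.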

A possible encoder configuration is:

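const config: VideoEncoderConfig = {
  // H.264 Baseline profile, level 4.0. Level 3.1 ("avc1.42001f")
  // tops out at 1280×720, so 1920×1080 needs level 4.0 or higher.
  codec: "avc1.420028",
  width: 1920,
  height: 1080,
  bitrate: 8_000_000,
  framerate: 30,
};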
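
The last two hex digits of an `avc1` codec string encode the H.264 level, and each level limits the frame size and the frame rate. The sketch below picks the lowest level that fits a given resolution, using a subset of the limits from Table A-1 of the H.264 specification. The helper names are hypothetical, the Baseline profile is an assumption, and the browser's own `isConfigSupported` check remains the final authority.

```typescript
// Subset of H.264 level limits: level_idc, maximum frame size in
// macroblocks, and maximum macroblocks per second.
const H264_LEVELS = [
  { levelIdc: 30, maxFrameMbs: 1620, maxMbsPerSecond: 40500 },
  { levelIdc: 31, maxFrameMbs: 3600, maxMbsPerSecond: 108000 },
  { levelIdc: 32, maxFrameMbs: 5120, maxMbsPerSecond: 216000 },
  { levelIdc: 40, maxFrameMbs: 8192, maxMbsPerSecond: 245760 },
  { levelIdc: 42, maxFrameMbs: 8704, maxMbsPerSecond: 522240 },
  { levelIdc: 50, maxFrameMbs: 22080, maxMbsPerSecond: 589824 },
  { levelIdc: 51, maxFrameMbs: 36864, maxMbsPerSecond: 983040 },
];

// H.264 codes frames in 16×16 macroblocks, rounding each side up.
function macroblocksPerFrame(width: number, height: number): number {
  return Math.ceil(width / 16) * Math.ceil(height / 16);
}

// Returns a Baseline-profile codec string, or null if no listed level fits.
function baselineCodecString(
  width: number,
  height: number,
  fps: number
): string | null {
  const frameMbs = macroblocksPerFrame(width, height);
  const level = H264_LEVELS.find(
    (candidate) =>
      frameMbs <= candidate.maxFrameMbs &&
      frameMbs * fps <= candidate.maxMbsPerSecond
  );

  if (!level) {
    return null;
  }

  // "42" = Baseline profile, "00" = constraint flags, then level_idc
  // in hex (always two digits for the levels listed above).
  return `avc1.4200${level.levelIdc.toString(16)}`;
}
```

For 1920×1080 at 30 FPS this yields level 4.0, because a Full HD frame contains 8,160 macroblocks, while 1280×720 at 30 FPS fits exactly into level 3.1.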

Before starting, the application should check whether the browser supports the requested configuration:

const support =
  await VideoEncoder.isConfigSupported(config);

if (!support.supported) {
  throw new Error(
    "The requested encoder configuration is not supported"
  );
}

The project uses mp4-muxer to combine encoded video and audio data into the final MP4 container.


Why I added an ffmpeg.wasm fallback

WebCodecs can be fast, but codec support varies between browsers and operating systems.

A browser may expose the WebCodecs API while rejecting a specific encoder configuration.

For that reason, the application retries failed exports with ffmpeg.wasm.

try {
  await exportWithWebCodecs();
} catch (error) {
  console.warn(
    "WebCodecs export failed. Falling back to ffmpeg.wasm.",
    error
  );

  await exportWithFfmpeg();
}

The fallback path is slower and usually requires more memory, but it makes the application usable in more environments.

The deployed application first attempts to load ffmpeg core assets locally.

If those files are unavailable, it can retrieve the required runtime assets from jsDelivr.


Keeping uploaded media local

Uploaded music and artwork are not sent to an application backend.

The browser handles:

  • Audio decoding
  • Frequency analysis
  • Waveform sampling
  • Image processing
  • Three.js rendering
  • Video encoding
  • MP4 generation

The completed MP4 is generated as a browser Blob and downloaded locally.

This provides several advantages:

  • Private or unreleased music does not need to be uploaded
  • There is no media upload delay
  • Server-side media storage is unnecessary
  • Hosting costs stay relatively low
  • The application can be deployed mostly as static assets

It also introduces limitations:

  • Large files consume browser memory
  • Long exports can be CPU-intensive
  • Performance varies by device
  • The browser tab must remain open
  • Complex scenes are difficult on lower-powered mobile devices

Live / VJ mode

The application also includes a fullscreen Live / VJ mode.

In this mode, the application:

  • Hides the editing interface
  • Expands the visual output
  • Supports keyboard controls
  • Provides temporary effect boosts
  • Allows the visualizer to be used during performance experiments

The main challenge was keeping editing controls and performance controls connected to the same state without allowing UI updates to interrupt rendering.

Separating the application shell from the visual components made this easier.


Project structure

The codebase is organized by responsibility:

src/
├── audio/       # Decoding, FFT, waveform, BPM and analysis
├── export/      # WebCodecs, MP4 muxing and ffmpeg fallback
├── ui/          # Controls and visualizer canvases
├── visual/      # Three.js scenes, particles, shaders and presets
├── App.tsx      # Application shell and orchestration
└── store.ts     # Shared Zustand state

The main architectural principle is:

Analyze the audio once, normalize the result, and let multiple visual systems consume the same signals.

This allows the 3D, waveform, and image-effect modes to share the same musical timing while rendering in completely different ways.


The hardest parts

The most difficult problems had little to do with creating objects in Three.js.

Synchronizing video export

Real-time rendering and fixed-frame export use different timing systems.

The exporter needs predictable timestamps and repeatable audio sampling.

Browser codec differences

WebCodecs availability does not guarantee support for every video or audio codec configuration.

Memory management

Full HD rendering, video encoding, audio processing, and WebAssembly can consume significant amounts of memory.

Stabilizing movement

Raw frequency data is noisy.

Every visual property needs suitable normalization, smoothing, sensitivity, and limits.

Sharing audio logic across visual modes

The 3D, waveform, and image-effect renderers all need access to the same analysis data without becoming tightly coupled.


What I want to improve next

The current roadmap includes:

  • Better keyboard accessibility
  • Improved screen-reader support
  • Responsive layouts for narrower screens
  • Automated tests for audio analysis and export utilities
  • Saveable and shareable visual presets
  • Better long-duration export guidance
  • A maintained browser compatibility matrix
  • Additional visual presets
  • Performance benchmarks for complex scenes

The repository already includes contributing guidelines, issue templates, pull request templates, security instructions, a code of conduct, and an MIT License.

Contributions are welcome.


Minimal starter project

I also released a smaller educational repository that focuses only on the core audio-to-visual pipeline.

It demonstrates how to:

  • Load a local audio file
  • Analyze volume, bass, mids, highs, and waveform data
  • Keep high-frequency audio updates outside normal React rerenders
  • Drive a Three.js mesh and particle system in real time
  • Structure the project with React, TypeScript, and React Three Fiber

Repository:

https://github.com/7g3n/web-audio-threejs-starter

Live demo:

https://7g3n.github.io/web-audio-threejs-starter/

If you want to understand the core implementation before exploring the full visualizer, this starter is the best place to begin.

Try the project

Live demo

https://waveform.tranjectories.xyz/

Source code

https://github.com/7g3n/phase-viz

v1.0.0 release

https://github.com/7g3n/phase-viz/releases/tag/v1.0.0

Music video created with the visualizer

https://www.youtube.com/watch?v=R8ItWr2V_ZA

If you are interested in Three.js, the Web Audio API, creative coding, audiovisual tools, or browser-based media processing, I would appreciate your feedback.

Issues, pull requests, and other contributions are welcome.

If the project is useful to you, consider starring the repository.

Thanks for reading.
