Have you ever wondered how to create a voice changer that runs entirely in the browser? No server uploads, no external APIs, just pure client-side audio processing. In this guide, I'll walk you through how we built our free online voice changer tool at AudioTool, explaining every technical detail along the way.
Why Build a Browser-Only Voice Changer?
Before diving into the code, let's address the elephant in the room: why keep everything in the browser?
Privacy First
When you're recording your voice or uploading personal audio files, the last thing you want is for that data to travel across the internet to some unknown server. By processing everything locally in the browser, your audio never leaves your device. This is crucial for users who care about privacy.
Zero Latency
Server-based processing introduces network latency. With browser-based processing, the audio flows directly from your microphone through the Web Audio API to your speakers. The delay is imperceptible.
No Server Costs
Running audio processing servers is expensive. CPU-intensive tasks like real-time audio effects can quickly rack up cloud computing bills. By offloading this work to the user's browser, we can offer the tool completely free, forever.
Works Offline
Once the page loads, our voice changer works even without an internet connection. This makes it reliable in situations with poor connectivity.
The Architecture Overview
Our voice changer is built as a Next.js 15 application using React 19 and TypeScript. The core audio processing relies on the Tone.js library, which provides a powerful abstraction over the Web Audio API.
At a high level, audio moves through three stages: capture (the MediaRecorder API), processing (Tone.js offline rendering), and output (in-browser playback or a WAV download).
Core Data Structures
Let's look at the key data structures that power our voice changer:
Effect Type Definition
```typescript
type EffectType =
  | "robot"
  | "alien"
  | "monster"
  | "telephone"
  | "underwater"
  | "cave"
  | "chipmunk"
  | "demon"
  | "none";

interface Effect {
  id: EffectType;
  name: string;
  icon: string;
  description: string;
}
```
This type-safe approach ensures we can only select valid effects. The Effect interface provides metadata for rendering the UI.
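The UI can then render the effect picker from a plain array of `Effect` objects. The catalog below is an illustrative sketch: the ids match the `EffectType` union, but the names, icons, and descriptions are placeholders, not the real app's copy.

```typescript
type EffectType =
  | "robot" | "alien" | "monster" | "telephone"
  | "underwater" | "cave" | "chipmunk" | "demon" | "none";

interface Effect {
  id: EffectType;
  name: string;
  icon: string;
  description: string;
}

// Illustrative catalog -- the real names, icons, and copy may differ.
const EFFECTS: Effect[] = [
  { id: "robot", name: "Robot", icon: "🤖", description: "Bitcrushed and filtered" },
  { id: "chipmunk", name: "Chipmunk", icon: "🐿️", description: "Pitched up one octave" },
  { id: "none", name: "Original", icon: "🎙️", description: "No effect applied" },
];
```

Because `id` is typed as `EffectType`, adding an entry with a misspelled id is a compile-time error rather than a silent runtime bug.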
State Management
```typescript
const [isRecording, setIsRecording] = useState(false);
const [isPlaying, setIsPlaying] = useState(false);
const [audioBlob, setAudioBlob] = useState<Blob | null>(null);
const [audioUrl, setAudioUrl] = useState<string | null>(null);
const [selectedEffect, setSelectedEffect] = useState<EffectType>("none");
const [volume, setVolume] = useState(0.8);
const [recordingTime, setRecordingTime] = useState(0);
const [isProcessing, setIsProcessing] = useState(false);
const [permissionDenied, setPermissionDenied] = useState(false); // mic-permission errors, used by startRecording
```
These state variables track the entire user flow from recording to playback.
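The `recordingTime` counter is a plain number of seconds. A helper along these lines (a sketch — `formatTime` is a name introduced here, not shown in the original component) converts it to an `m:ss` string for the timer display:

```typescript
// Format a seconds counter as m:ss for the recording timer display.
// (Illustrative helper; the real component may render the timer differently.)
const formatTime = (totalSeconds: number): string => {
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = totalSeconds % 60;
  return `${minutes}:${seconds.toString().padStart(2, "0")}`;
};
```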
Refs for Audio Handling
```typescript
const mediaRecorderRef = useRef<MediaRecorder | null>(null);
const audioChunksRef = useRef<Blob[]>([]);
const recordingTimerRef = useRef<NodeJS.Timeout | null>(null);
const processedAudioRef = useRef<HTMLAudioElement | null>(null);
const toneInitializedRef = useRef(false);
```
Using refs is crucial here because we need to persist these values across re-renders without triggering React's reconciliation.
Recording Audio from the Microphone
The first step is capturing audio from the user's microphone. We use the MediaRecorder API:
```typescript
const startRecording = async () => {
  try {
    await initTone();
    const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
    setPermissionDenied(false);
    audioChunksRef.current = [];

    const mediaRecorder = new MediaRecorder(stream);
    mediaRecorderRef.current = mediaRecorder;

    mediaRecorder.ondataavailable = (event) => {
      if (event.data.size > 0) {
        audioChunksRef.current.push(event.data);
      }
    };

    mediaRecorder.onstop = () => {
      // MediaRecorder emits compressed audio (typically WebM/Opus), not WAV,
      // so label the blob with the recorder's actual MIME type.
      const audioBlob = new Blob(audioChunksRef.current, {
        type: mediaRecorder.mimeType || "audio/webm",
      });
      setAudioBlob(audioBlob);
      const url = URL.createObjectURL(audioBlob);
      setAudioUrl(url);
      stream.getTracks().forEach((track) => track.stop());
    };

    mediaRecorder.start();
    setIsRecording(true);
    setRecordingTime(0);
    recordingTimerRef.current = setInterval(() => {
      setRecordingTime((prev) => prev + 1);
    }, 1000);
  } catch (err) {
    console.error("Error accessing microphone:", err);
    setPermissionDenied(true);
  }
};
```
The getUserMedia API requests microphone access from the browser. Once granted, we create a MediaRecorder instance that captures audio chunks as they're available. When recording stops, we combine these chunks into a single Blob and create an object URL for playback.
The Heart of the System: Effect Processing
Here's where the magic happens. We use Tone.js's Tone.Offline to render audio with effects applied:
```typescript
const applyEffect = async (
  audioBuffer: AudioBuffer,
  effectType: EffectType
): Promise<AudioBuffer> => {
  if (effectType === "none") return audioBuffer;

  const renderedBuffer = await Tone.Offline(() => {
    const source = new Tone.Player(audioBuffer);
    let effectChain: Tone.ToneAudioNode[] = [];

    switch (effectType) {
      case "robot":
        effectChain = [
          new Tone.BitCrusher(4),
          new Tone.Filter(800, "lowpass"),
          new Tone.Distortion(0.3),
        ];
        break;
      case "alien":
        effectChain = [
          new Tone.PitchShift(8),
          new Tone.Chorus(4, 2.5, 0.5).start(),
          new Tone.Reverb({ decay: 3, wet: 0.4 }),
        ];
        break;
      case "monster":
        effectChain = [
          new Tone.PitchShift(-12),
          new Tone.Distortion(0.5),
          new Tone.Filter(400, "lowpass"),
        ];
        break;
      case "telephone":
        effectChain = [
          new Tone.Filter(3400, "lowpass"),
          new Tone.Filter(300, "highpass"),
          new Tone.BitCrusher(8),
        ];
        break;
      case "underwater":
        effectChain = [
          new Tone.Filter(400, "lowpass"),
          new Tone.AutoFilter(2, 200).start(),
          new Tone.Reverb({ decay: 4, wet: 0.6 }),
        ];
        break;
      case "cave":
        effectChain = [
          new Tone.Reverb({ decay: 5, wet: 0.7 }),
          new Tone.Delay(0.3, 0.4),
          new Tone.Filter(2000, "lowpass"),
        ];
        break;
      case "chipmunk":
        effectChain = [
          new Tone.PitchShift(12),
          new Tone.Filter(8000, "lowpass"),
        ];
        break;
      case "demon":
        effectChain = [
          new Tone.PitchShift(-8),
          new Tone.Distortion(0.4),
          new Tone.Filter(600, "lowpass"),
          new Tone.Reverb({ decay: 2, wet: 0.3 }),
        ];
        break;
    }

    if (effectChain.length > 0) {
      source.chain(...effectChain, Tone.getDestination());
    } else {
      source.toDestination();
    }
    source.start(0);
  }, audioBuffer.duration);

  // Tone.Offline resolves with a ToneAudioBuffer; .get() unwraps the
  // underlying native AudioBuffer.
  return renderedBuffer.get() as AudioBuffer;
};
```
Understanding the Effect Chain
Each effect is a combination of audio processing nodes:
- Robot: Uses BitCrusher to reduce bit depth (creating digital distortion), a lowpass filter to muffle the sound, and distortion for that mechanical edge.
- Alien: Pitch shifts up 8 semitones, adds chorus for a shimmering quality, and reverb for spacey ambience.
- Monster: Pitch shifts down 12 semitones (one octave), adds heavy distortion, and filters out high frequencies for that deep, growling sound.
- Telephone: Simulates telephone bandwidth by bandpass filtering (300 Hz to 3400 Hz) and adding mild bitcrushing.
- Underwater: Heavy lowpass filtering, auto-filter for movement, and reverb for that submerged feeling.
- Cave: Long decay reverb, delay for echoes, and filtering to simulate acoustic space.
- Chipmunk: Pitch shifts up 12 semitones (one octave) with a lowpass filter to prevent harshness.
- Demon: Pitch shifts down 8 semitones with distortion and filtering for an evil, deep voice.
The source.chain(...effectChain, Tone.getDestination()) method connects the audio source through each effect in sequence, finally reaching the destination (output).
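To see what series chaining amounts to, here's a dependency-free sketch — mock nodes, not Tone.js — that wires a list of nodes together the way `chain()` does, connecting each node's output to the next node's input:

```typescript
// Conceptual illustration of series chaining. These are mock nodes that
// record their connections; Tone.js performs the equivalent wiring on
// real Web Audio nodes.
interface MockNode {
  name: string;
  connect(target: MockNode): void;
}

const connections: string[] = [];

const makeNode = (name: string): MockNode => ({
  name,
  connect(target) {
    connections.push(`${name} -> ${target.name}`);
  },
});

const chain = (...nodes: MockNode[]): void => {
  // Connect each node to its successor: n0 -> n1 -> n2 -> ...
  for (let i = 0; i < nodes.length - 1; i++) {
    nodes[i].connect(nodes[i + 1]);
  }
};

chain(
  makeNode("source"),
  makeNode("pitchShift"),
  makeNode("distortion"),
  makeNode("destination")
);
```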
The Complete Processing Flow
Here's how the entire process flows from user action to output:

1. The user records through the microphone or uploads a file, giving us an audio Blob.
2. The Blob is decoded into an AudioBuffer via decodeAudioData.
3. applyEffect renders that buffer through the selected effect chain with Tone.Offline.
4. audioBufferToWav encodes the processed buffer as a WAV Blob.
5. The WAV is played through an Audio element or downloaded as a file.
Converting AudioBuffer to WAV
After processing, we need to export the audio as a downloadable WAV file. Here's the conversion function:
```typescript
const audioBufferToWav = (buffer: AudioBuffer): Blob => {
  const numberOfChannels = buffer.numberOfChannels;
  const sampleRate = buffer.sampleRate;
  const format = 1; // PCM
  const bitDepth = 16;

  const bytesPerSample = bitDepth / 8;
  const blockAlign = numberOfChannels * bytesPerSample;
  const dataLength = buffer.length * numberOfChannels * bytesPerSample;
  const bufferLength = 44 + dataLength;

  const arrayBuffer = new ArrayBuffer(bufferLength);
  const view = new DataView(arrayBuffer);

  // Write WAV header
  const writeString = (view: DataView, offset: number, string: string) => {
    for (let i = 0; i < string.length; i++) {
      view.setUint8(offset + i, string.charCodeAt(i));
    }
  };

  writeString(view, 0, "RIFF");
  view.setUint32(4, 36 + dataLength, true);
  writeString(view, 8, "WAVE");
  writeString(view, 12, "fmt ");
  view.setUint32(16, 16, true);
  view.setUint16(20, format, true);
  view.setUint16(22, numberOfChannels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * blockAlign, true);
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, bitDepth, true);
  writeString(view, 36, "data");
  view.setUint32(40, dataLength, true);

  // Write audio data
  const offset = 44;
  const channels: Float32Array[] = [];
  for (let i = 0; i < numberOfChannels; i++) {
    channels.push(buffer.getChannelData(i));
  }

  let index = 0;
  for (let i = 0; i < buffer.length; i++) {
    for (let channel = 0; channel < numberOfChannels; channel++) {
      const sample = Math.max(-1, Math.min(1, channels[channel][i]));
      const intSample = sample < 0 ? sample * 0x8000 : sample * 0x7FFF;
      view.setInt16(offset + index, intSample, true);
      index += 2;
    }
  }

  return new Blob([arrayBuffer], { type: "audio/wav" });
};
```
This function constructs a proper WAV file header (44 bytes) followed by the interleaved PCM audio data. The WAV format is universally supported and doesn't require any encoding libraries.
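As a sanity check on the header layout, here's a standalone sketch (`writeWavHeader` is a helper introduced here, mirroring the header-writing code above) that writes the 44 bytes and reads the fields back with a DataView:

```typescript
// Write a 44-byte WAV header with the same field layout as above,
// then read the fields back to verify offsets and endianness.
const writeWavHeader = (
  sampleRate: number,
  channels: number,
  dataLength: number
): DataView => {
  const view = new DataView(new ArrayBuffer(44));
  const writeString = (offset: number, s: string) => {
    for (let i = 0; i < s.length; i++) view.setUint8(offset + i, s.charCodeAt(i));
  };
  const blockAlign = channels * 2; // 16-bit samples

  writeString(0, "RIFF");
  view.setUint32(4, 36 + dataLength, true);  // RIFF chunk size
  writeString(8, "WAVE");
  writeString(12, "fmt ");
  view.setUint32(16, 16, true);              // fmt chunk size
  view.setUint16(20, 1, true);               // PCM format
  view.setUint16(22, channels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * blockAlign, true); // byte rate
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, 16, true);              // bit depth
  writeString(36, "data");
  view.setUint32(40, dataLength, true);
  return view;
};

// One second of 16-bit mono at 44.1 kHz is 88200 bytes of sample data.
const header = writeWavHeader(44100, 1, 88200);
```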
Handling File Uploads
Users can also upload existing audio files instead of recording:
```typescript
const handleFileUpload = (event: React.ChangeEvent<HTMLInputElement>) => {
  const file = event.target.files?.[0];
  if (file) {
    if (!file.type.startsWith("audio/")) {
      alert("Please upload a valid audio file");
      return;
    }
    if (audioUrl) {
      URL.revokeObjectURL(audioUrl);
    }
    const url = URL.createObjectURL(file);
    setAudioBlob(file);
    setAudioUrl(url);
    setRecordingTime(0);
  }
};
```
We validate the file type, clean up any previous URLs to prevent memory leaks, and create a new object URL for the uploaded file.
Playback and Download
The playback function processes the audio on-demand:
```typescript
const processAndPlayAudio = async () => {
  if (!audioBlob || !audioUrl) return;
  setIsProcessing(true);
  try {
    if (processedAudioRef.current) {
      processedAudioRef.current.pause();
      processedAudioRef.current = null;
    }

    const arrayBuffer = await audioBlob.arrayBuffer();
    const AudioContextClass =
      window.AudioContext ||
      (window as unknown as { webkitAudioContext: typeof AudioContext }).webkitAudioContext;
    const audioContext = new AudioContextClass();
    const audioBuffer = await audioContext.decodeAudioData(arrayBuffer);

    const processedBuffer = await applyEffect(audioBuffer, selectedEffect);
    const wavBlob = audioBufferToWav(processedBuffer);
    const processedUrl = URL.createObjectURL(wavBlob);

    const audio = new Audio(processedUrl);
    audio.volume = volume;
    processedAudioRef.current = audio;
    audio.onended = () => {
      setIsPlaying(false);
    };

    setIsPlaying(true);
    await audio.play();
    setIsProcessing(false);
  } catch (error) {
    console.error("Error processing audio:", error);
    setIsProcessing(false);
  }
};
```
The download function follows a similar pattern but triggers a file download instead of playback:
```typescript
const downloadAudio = async () => {
  if (!audioBlob) return;
  setIsProcessing(true);
  try {
    const arrayBuffer = await audioBlob.arrayBuffer();
    const AudioContextClass =
      window.AudioContext ||
      (window as unknown as { webkitAudioContext: typeof AudioContext }).webkitAudioContext;
    const audioContext = new AudioContextClass();
    const audioBuffer = await audioContext.decodeAudioData(arrayBuffer);

    const processedBuffer = await applyEffect(audioBuffer, selectedEffect);
    const wavBlob = audioBufferToWav(processedBuffer);
    const url = URL.createObjectURL(wavBlob);

    const timestamp = new Date().toISOString().replace(/[:.]/g, '-');
    const a = document.createElement("a");
    a.href = url;
    a.download = `voice-changer-${selectedEffect}-${timestamp}.wav`;
    document.body.appendChild(a);
    a.click();
    document.body.removeChild(a);
    URL.revokeObjectURL(url);
    setIsProcessing(false);
  } catch (error) {
    console.error("Error downloading audio:", error);
    setIsProcessing(false);
  }
};
```
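The timestamped filename logic can be factored into a pure helper. This is an illustrative refactor — `makeDownloadName` is a name introduced here; the original inlines the expression:

```typescript
// Build the download filename used above: voice-changer-<effect>-<timestamp>.wav
// Colons and dots in the ISO timestamp are replaced so the name is
// filesystem-safe on every platform.
const makeDownloadName = (effect: string, date: Date): string => {
  const timestamp = date.toISOString().replace(/[:.]/g, "-");
  return `voice-changer-${effect}-${timestamp}.wav`;
};

// makeDownloadName("robot", new Date("2025-01-02T03:04:05.000Z"))
// -> "voice-changer-robot-2025-01-02T03-04-05-000Z.wav"
```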
Key Technical Decisions
Why Tone.js?
We chose Tone.js over raw Web Audio API for several reasons:
- Higher-level abstractions: Tone.js provides musical concepts like pitch shifting in semitones, which maps naturally to how we think about voice effects.
- Offline rendering: The Tone.Offline method is essential for our use case. It creates a temporary offline audio context that renders the entire audio file with effects applied, rather than processing in real-time.
- Effect chaining: The chain() method makes it trivial to connect multiple effects in series.
- Cross-browser compatibility: Tone.js handles browser quirks and provides polyfills where needed.
Why WAV Format?
We export to WAV rather than MP3 for simplicity:
- No encoding required: WAV is uncompressed PCM audio, so we just need to write the raw samples with a header.
- Universal support: Every device and application can play WAV files.
- No quality loss: Since we're already processing the audio, there's no need to add another lossy compression step.
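The "raw samples" point is concrete in the sample loop of audioBufferToWav: each float in [-1, 1] is clamped, then scaled to a signed 16-bit integer. As a standalone sketch (`floatTo16Bit` is a name introduced here; `Math.trunc` makes explicit the truncation DataView.setInt16 performs on fractional values):

```typescript
// Clamp a float sample to [-1, 1] and scale it to a signed 16-bit
// integer, as in the audioBufferToWav sample loop. Negative values scale
// by 0x8000 (down to -32768), positive by 0x7FFF (up to 32767).
const floatTo16Bit = (sample: number): number => {
  const s = Math.max(-1, Math.min(1, sample));
  return Math.trunc(s < 0 ? s * 0x8000 : s * 0x7FFF);
};
```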
Memory Management
Audio processing can be memory-intensive. We take several precautions:
- Revoke object URLs: We always call URL.revokeObjectURL() when cleaning up to prevent memory leaks.
- Stop streams: When recording finishes, we stop all tracks on the media stream to release the microphone.
- Clear refs: We null out audio element refs when stopping playback.
Browser Compatibility
Our voice changer works in all modern browsers:
- Chrome/Edge: Full support
- Firefox: Full support
- Safari: Full support (with some Web Audio API quirks handled by Tone.js)
The key APIs we depend on are:
- MediaRecorder: Supported in all modern browsers
- AudioContext: Supported in all modern browsers
- getUserMedia: Supported in all modern browsers (requires HTTPS)
Try It Yourself
Now that you understand how it works, why not try it out? Head over to our free online voice changer and experiment with the different effects. All processing happens right in your browser - your audio never leaves your device.
Whether you want to sound like a robot, an alien, or a monster, the technology is now accessible to everyone, completely free, with no registration required.
Conclusion
Building a browser-based voice changer demonstrates the power of modern web APIs. By leveraging the MediaRecorder API for input, the Web Audio API (via Tone.js) for processing, and standard browser features for output, we can create sophisticated audio applications without any server infrastructure.
The key takeaways:
- Browser audio processing is powerful: You can do real-time and offline audio processing entirely client-side.
- Privacy by design: Keeping data local is not just possible, it's often preferable.
- Tone.js simplifies audio: The Web Audio API is powerful but verbose. Tone.js provides the right level of abstraction.
- WAV is your friend: For browser-based audio tools, WAV export is simple and universally compatible.
If you're building your own audio applications, I hope this guide gives you a solid foundation. The complete source code is available in our repository, and you can see it in action at Free Online Audio Tools.
Happy coding, and have fun with your new voice!

