Niv Dvir
How I Built a Cochlear Spiral Spectrogram That Visualizes Music Like the Inner Ear

What if you could see music the way your inner ear hears it?

I built a visualization system that maps audio frequencies onto a Fermat spiral — the same geometric curve that describes how the human cochlea arranges its frequency-sensitive hair cells. The result reveals the hidden geometry of harmony: you can literally see the difference between a major and minor chord.

The Core Idea

Traditional spectrograms show frequency vs. time as a rectangular heatmap. They're useful but clinical — they don't capture the feeling of music.

The cochlea (your inner ear) isn't rectangular. It's a spiral. High frequencies resonate at the base (the outermost turn of the coil) and low frequencies at the apex (the inner tip), spaced logarithmically, just like musical octaves.

So I asked: what if we visualize frequencies on an actual spiral?

How It Works

1. Audio Analysis (scipy FFT)

  • 381 logarithmically spaced frequency bins (20 Hz to 8 kHz)
  • ISO 226 equal-loudness contours for perceptual accuracy
  • 60 FPS frame-by-frame analysis
# Simplified core: FFT → cochlear frequency mapping
import numpy as np
from scipy.fft import rfft, rfftfreq

def analyze_frame(samples, sample_rate=44100, n_bins=381):
    spectrum = np.abs(rfft(samples))
    freqs = rfftfreq(len(samples), 1/sample_rate)

    # Logarithmic bins: 20 Hz to 8 kHz (cochlear range)
    bin_edges = np.logspace(np.log10(20), np.log10(8000), n_bins + 1)
    amplitudes = np.zeros(n_bins)

    for i in range(n_bins):
        mask = (freqs >= bin_edges[i]) & (freqs < bin_edges[i+1])
        if mask.any():
            amplitudes[i] = spectrum[mask].mean()

    return amplitudes
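The equal-loudness step from the bullet list isn't shown in the snippet above. As a rough stand-in (the standard A-weighting curve is a simpler approximation than the full ISO 226 contours the article uses, and this helper is my own sketch, not the project's code), per-bin perceptual weighting could look like:

```python
import numpy as np

def a_weight(freqs):
    # IEC A-weighting curve as linear gain, normalized to 1.0 at 1 kHz;
    # a rough stand-in for the full ISO 226 equal-loudness contours
    f2 = np.asarray(freqs, dtype=float) ** 2
    ra = (12194.0**2 * f2**2) / (
        (f2 + 20.6**2)
        * np.sqrt((f2 + 107.7**2) * (f2 + 737.9**2))
        * (f2 + 12194.0**2)
    )
    return ra * 10 ** (2.0 / 20)  # +2.00 dB offset so 1 kHz maps to gain 1.0
```

Multiplying each bin's amplitude by `a_weight(bin_center_freqs)` de-emphasizes low bass and very high frequencies the way human hearing does, so the visualization tracks perceived loudness rather than raw energy.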

2. Spiral Mapping (Fermat Spiral)

Each frequency bin gets a position on a Fermat spiral: r = sqrt(θ)

Low frequencies sit at the outer edge, high frequencies spiral inward toward the center. (In the real cochlea the geometry is mirrored: low frequencies resonate at the inner apex and high frequencies at the outer base, but the tonotopic idea is the same.)

# Map frequency bins to spiral coordinates (Fermat spiral: r = sqrt(theta))
# theta runs from 8*pi down to 0 so that bin 0 (20 Hz) lands at the
# outer edge and the highest bins spiral inward
theta = np.linspace(8 * np.pi, 0, n_bins)
r = np.sqrt(theta)
x = r * np.cos(theta)
y = r * np.sin(theta)

3. Chromesthesia Color Mapping

Colors follow a chromesthesia mapping — the neurological phenomenon where people "see" sounds as colors:

  • Low frequencies (bass) → warm reds/oranges
  • Mid frequencies (voice, guitar) → greens/yellows
  • High frequencies (cymbals, harmonics) → cool blues/cyans
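As a minimal sketch of this idea (the project's actual palette is its own; the linear HSV hue ramp below is my assumption), mapping bin index to color could be as simple as:

```python
import colorsys

def bin_to_rgb(bin_index, n_bins=381):
    # Hypothetical hue ramp: red/orange (bass) -> yellow/green (mids) -> cyan (highs)
    hue = 0.5 * bin_index / (n_bins - 1)  # 0.0 = red, ~0.17 = yellow, 0.5 = cyan
    return colorsys.hsv_to_rgb(hue, 1.0, 1.0)
```

Here `bin_to_rgb(0)` gives pure red for the lowest bass bin and `bin_to_rgb(380)` gives cyan for the highest bin, with the mids landing in the yellow/green band in between.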

4. Temporal Features (The Secret Sauce)

Static spectrograms miss the movement of music. I added 5 temporal features, each validated across 1,704 audio samples:

| Feature | What it does | Optimal parameter |
| --- | --- | --- |
| Melodic trails | Short glowing trails following the melody | 10 frames, 0.70 decay |
| Rhythm pulses | Radial pulse on beat hits | 0.50 intensity, 0.25 decay |
| Harmonic auras | Sustained glow for held chords | 4.0 s blend time |
| Atmospheric context | Background mood from a 60 s window | 0.35 influence |
| Harmonic connections | Lines between harmonically related notes | Octave + fifth detection |
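To make the trail mechanism concrete, here is a sketch of how a per-frame decay could work (the buffer logic is my assumption; only the 0.70 decay constant comes from the table):

```python
import numpy as np

DECAY = 0.70  # per-frame trail decay, from the table above

def update_trail(glow, frame_amplitudes, decay=DECAY):
    # Fade the previous glow and keep whichever is brighter:
    # the faded trail or the new frame's amplitudes
    return np.maximum(glow * decay, frame_amplitudes)

glow = np.zeros(381)  # one glow value per frequency bin
```

After ten silent frames a past onset has faded to 0.70**10, about 2.8% of its peak, which is why a 0.70 decay pairs naturally with the ~10-frame trail length in the table.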

Why Harmony Looks Beautiful

This is the magical part. When notes are harmonically related (octaves, fifths, thirds), they land at symmetric positions on the spiral. A major chord creates a visually balanced, symmetric pattern. Dissonance creates asymmetric, chaotic (but still beautiful) patterns.
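One way to see why (a quick numeric sketch, reusing the 20 Hz to 8 kHz range and 8π total angle from the mapping above): with logarithmic bin spacing, a fixed musical interval such as an octave always corresponds to the same angular offset on the spiral, regardless of where it starts, so harmonically related notes always sit in the same geometric relationship to each other.

```python
import numpy as np

F_LO, F_HI, TOTAL_THETA = 20.0, 8000.0, 8 * np.pi

def freq_to_theta(f):
    # Logarithmic position of f within the 20 Hz - 8 kHz range,
    # scaled to the spiral's total angle
    return TOTAL_THETA * np.log(f / F_LO) / np.log(F_HI / F_LO)

# The angular offset of an octave is identical everywhere on the spiral:
octave = freq_to_theta(880.0) - freq_to_theta(440.0)
print(np.degrees(octave))  # same value for 110->220, 220->440, 440->880, ...
```

A perfect fifth (a 3:2 frequency ratio) gets its own fixed angle the same way, which is what makes a major chord's three notes land in a repeatable, balanced pattern.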

Different musical traditions create remarkably different visual signatures:

  • Classical harmony → orderly radial symmetry
  • Arabic maqam → quarter-tone asymmetry with unique geometric beauty
  • EDM/electronic → explosive, pulsing energy patterns

Try It: The Wellspring

I also built a crowdsourcing platform called The Wellspring where people can rate how well these visualizations capture the music. The goal: build an open dataset for AI-powered audio visualization evaluation.

Tech Stack

  • Audio analysis: scipy (FFT), librosa
  • Rendering: PIL (2D), PyVista (3D optional)
  • Video encoding: FFmpeg (H.264, CRF 18, 60 FPS)
  • Web platform: React 18 + TypeScript, Node/Express, PostgreSQL

What's Next

I'm working on browser-based creation tools so anyone can create their own audio-visual harmony — no installation needed. The vision: a global community of creators exploring the intersection of sound and moving image.

The ancient dance between rhythm and movement, renewed with modern tools.


Channel: youtube.com/@NivDvir-ND
The Wellspring: synesthesia-labeler.onrender.com

I'd love to hear your thoughts — especially from anyone working on audio visualization, creative coding, or signal processing!
