The Mathematics and Engineering Behind 3D Heart Animations Synchronized with Music
Introduction
What happens when you combine parametric equations, audio signal processing, and real-time visualization? You get mesmerizing 3D heart animations that dance to music. This project started as a simple mathematical visualization and evolved into a comprehensive system for creating music-synchronized animations using Python, NumPy, Matplotlib, and librosa.
In this deep-dive, I'll walk you through the technical journey of building this system, from the mathematical foundations of the 3D heart shape to the audio engineering that makes hearts pulse in perfect sync with music.
The Key Ideas
At its core, this project solves three main challenges:
- Mathematical Modeling: Creating a 3D heart shape using parametric equations
- Audio Analysis: Extracting musical features (beats, tempo, loudness, bass, onsets) from audio files
- Real-Time Synchronization: Mapping audio features to visual transformations in real-time
The magic happens when these three systems work together: each frame of the animation queries the audio features at that exact moment in time, creating a synchronized experience where the heart "feels" the music.
The Mathematics: Parametric Heart Equations
The 3D heart shape is defined by parametric equations using two parameters: u ∈ [0, π] and v ∈ [0, 2π].
The Core Equations
x = sin(u) · (15·sin(v) - 4·sin(3v))
y = 8·cos(u)
z = sin(u) · (15·cos(v) - 5·cos(2v) - 2·cos(3v) - cos(v))
These equations create a heart shape by:
- Using sin(u) as a scaling factor that varies from 0 to 1 and back to 0
- Combining multiple harmonics of sin(v) and cos(v) to create the heart's characteristic curves
- The coefficients (15, -4, -5, -2, -1) control the shape's proportions
Implementation in Python
# Generate parameter grids
u = np.linspace(0, np.pi, u_points)
v = np.linspace(0, 2 * np.pi, v_points)
u_grid, v_grid = np.meshgrid(u, v)
# Flatten for scatter plot
u_flat = u_grid.flatten()
v_flat = v_grid.flatten()
# Apply parametric equations
x = np.sin(u_flat) * (15 * np.sin(v_flat) - 4 * np.sin(3 * v_flat))
y = 8 * np.cos(u_flat)
z = np.sin(u_flat) * (
    15 * np.cos(v_flat)
    - 5 * np.cos(2 * v_flat)
    - 2 * np.cos(3 * v_flat)
    - np.cos(v_flat)
)
3D Rotation
To rotate the heart around the Y-axis, we apply a rotation matrix:
alpha_deg = frame * 360 * tempo_factor / total_frames
alpha_rad = np.deg2rad(alpha_deg)
x_rotated = x * np.cos(alpha_rad) + z * np.sin(alpha_rad)
y_rotated = y # Y-axis rotation doesn't affect Y coordinate
z_rotated = -x * np.sin(alpha_rad) + z * np.cos(alpha_rad)
The tempo_factor adapts rotation speed based on the music's tempo, making the heart rotate faster during upbeat sections.
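The exact mapping from tempo to rotation speed isn't shown here; as a minimal sketch (the helper name is hypothetical, and the 75 BPM reference matches the normalization used later in effect I2), the factor could be derived from the tempo track like this:
import numpy as np

def get_tempo_factor(current_time, tempo_times, tempo_values, reference_bpm=75.0):
    """Interpolate the tempo curve at the current time and normalize it."""
    current_tempo = np.interp(current_time, tempo_times, tempo_values)
    return current_tempo / reference_bpm
A factor of 1.0 then corresponds to 75 BPM; faster music yields a larger factor and therefore a faster rotation.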
The Sound Engineering: Audio Feature Extraction with Librosa
Librosa is a Python library for music and audio analysis. It provides the tools we need to extract musical features that drive our visualizations.
Audio Analysis Pipeline
The analyze_audio.py script performs a comprehensive analysis:
- Beat Detection: Identifies the rhythmic pulse of the music
- Tempo Tracking: Calculates BPM and tracks tempo changes over time
- Onset Detection: Finds moments when new sounds begin
- RMS Energy (Loudness): Measures overall volume over time
- Bass Analysis: Extracts low-frequency energy
Beat Detection
tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)
beat_times = librosa.frames_to_time(beat_frames, sr=sr)
This gives us a list of timestamps where beats occur, which we use to make the heart pulse.
Dynamic Tempo Tracking
For longer pieces with tempo changes, we track tempo over time:
tempo_values = librosa.beat.tempo(
    y=y, sr=sr,
    aggregate=None,  # keep the per-frame tempo curve instead of a single aggregated value
    hop_length=512
)
tempo_times = librosa.frames_to_time(np.arange(len(tempo_values)), sr=sr, hop_length=512)
This creates a time series of tempo values, allowing the heart's rotation speed to adapt to tempo changes throughout the song.
RMS Energy (Loudness)
rms = librosa.feature.rms(y=y)[0]
rms_times = librosa.frames_to_time(range(len(rms)), sr=sr)
# Normalize to 0-1 range
rms_normalized = (rms - rms.min()) / (rms.max() - rms.min())
RMS energy measures the "loudness" at each moment. We use this to control camera zoom - louder sections bring the camera closer for a more intimate view.
Bass Extraction
# Extract bass frequencies (typically 20-250 Hz)
S = librosa.stft(y)
freqs = librosa.fft_frequencies(sr=sr)
bass_freqs = np.where((freqs >= 20) & (freqs <= 250))[0]
bass_energy = np.sum(np.abs(S[bass_freqs, :]), axis=0)
Bass frequencies create the "thump" in music. We use bass strength to control the heart's brightness - more bass means a brighter, more glowing heart.
Onset Detection
onset_frames = librosa.onset.onset_detect(y=y, sr=sr, units='frames')
onset_times = librosa.frames_to_time(onset_frames, sr=sr)
Onsets mark the beginning of new musical events. Strong onsets trigger additional heartbeat pulses, creating visual emphasis on musical accents.
Output: JSON Feature File
All extracted features are saved to a JSON file:
{
"beat_times": [0.1, 0.5, 0.9, ...],
"tempo_times": [0.0, 0.5, 1.0, ...],
"tempo_values": [120.5, 121.2, 120.8, ...],
"rms_times": [0.0, 0.023, 0.046, ...],
"rms_values": [0.3, 0.5, 0.4, ...],
"bass_times": [0.0, 0.023, 0.046, ...],
"bass_values": [0.2, 0.6, 0.3, ...],
"onset_times": [0.15, 0.8, 1.2, ...]
}
This JSON file becomes the "score" that the animation follows.
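The serialization code itself isn't shown in this post; a minimal sketch of writing the file with json.dump could look like the following, assuming the bass curve was normalized the same way as the RMS curve (NumPy arrays must be converted to plain lists first):
import json
import numpy as np

features = {
    "beat_times": np.asarray(beat_times).tolist(),
    "tempo_times": np.asarray(tempo_times).tolist(),
    "tempo_values": np.asarray(tempo_values).tolist(),
    "rms_times": np.asarray(rms_times).tolist(),
    "rms_values": np.asarray(rms_normalized).tolist(),
    "bass_times": np.asarray(bass_times).tolist(),        # assumed: frame times for the bass curve
    "bass_values": np.asarray(bass_normalized).tolist(),  # assumed: bass_energy normalized to 0-1
    "onset_times": np.asarray(onset_times).tolist(),
}

with open("song_features.json", "w") as f:
    json.dump(features, f)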
Synchronizing Sound and Visuals
The synchronization happens in real-time during animation rendering. Each frame queries the audio features at that exact moment.
The Update Function
Every effect implements an update(frame) function that:
- Calculates the current time: current_second = frame / fps
- Queries audio features at that time
- Applies transformations based on those features
- Updates the 3D scatter plot
Audio Feature Queries
def get_beat_intensity(current_time, beat_times, window=0.1):
    """Returns 0-1 intensity based on proximity to the nearest beat."""
    distances = np.abs(np.array(beat_times) - current_time)
    nearest_distance = np.min(distances)
    if nearest_distance < window:
        intensity = 1.0 - (nearest_distance / window)
        return float(intensity)
    return 0.0
This function checks whether we're near a beat. If we're within 0.1 seconds of a beat, it returns an intensity that is 1.0 exactly on the beat and falls toward 0.0 at the edge of the window; otherwise it returns 0.0.
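The loudness query used in the update function later (get_loudness_at_time) works differently: instead of checking proximity, it interpolates the continuous RMS curve. A minimal sketch, assuming the helper simply wraps np.interp:
import numpy as np

def get_loudness_at_time(current_time, rms_times, rms_values):
    """Linearly interpolate the normalized RMS curve at the current time."""
    return float(np.interp(current_time, rms_times, rms_values))
The bass curve can be queried in exactly the same way.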
Applying Audio Features to Visuals
# Heartbeat pulse on beats
heartbeat_scale = 1.0
if beat_intensity > 0:
    heartbeat_scale = 1.0 + 0.2 * beat_intensity  # Pulse 20% larger

# Apply scaling
x_rotated = x_base * heartbeat_scale
y_rotated = y_base * heartbeat_scale
z_rotated = z_base * heartbeat_scale

# Brightness responds to bass
alpha = 0.5 + 0.4 * bass  # More bass = brighter

# Zoom responds to loudness
zoom_factor = base_zoom - 3 * loudness  # Louder = closer
Each audio feature maps to a visual property:
- Beats → Heartbeat pulse (scale)
- Tempo → Rotation speed
- Loudness → Camera zoom
- Bass → Brightness/alpha
- Onsets → Additional pulses
FFmpeg: Combining Video and Audio
Matplotlib's FFMpegWriter creates video-only files. We use FFmpeg to combine the video with the original audio track.
Video Generation
from matplotlib.animation import FFMpegWriter
writer = FFMpegWriter(fps=30, bitrate=5000)
anim.save('outputs/video.mp4', writer=writer)
This creates a silent video file with the animation.
Audio Combination
ffmpeg -i outputs/video.mp4 \
-i inputs/audio.mp3 \
-c:v copy \
-c:a aac \
-b:a 192k \
-shortest \
-y \
outputs/final_video+audio.mp4
Key FFmpeg options:
- -c:v copy: Copy the video stream (no re-encoding, faster)
- -c:a aac: Encode audio as AAC
- -b:a 192k: Audio bitrate (192 kbps is good quality)
- -shortest: End when the shortest stream ends (syncs video and audio length)
- -y: Overwrite the output file without asking
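If you would rather drive FFmpeg from Python than from a shell script, the same command can be launched with subprocess; this is a sketch under that assumption, not the project's actual build step:
import subprocess

subprocess.run([
    "ffmpeg",
    "-i", "outputs/video.mp4",      # silent animation rendered by Matplotlib
    "-i", "inputs/audio.mp3",       # original audio track
    "-c:v", "copy",                 # copy the video stream, no re-encoding
    "-c:a", "aac", "-b:a", "192k",  # encode audio as 192 kbps AAC
    "-shortest",                    # stop at the end of the shorter stream
    "-y",                           # overwrite without asking
    "outputs/final_video+audio.mp4",
], check=True)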
Automated Workflow
PowerShell build scripts automate the entire process:
# Step 1: Analyze audio
python analyze_audio.py inputs/song.mp3
# Step 2: Generate video
python heart_animation.py --effect I2 `
    --audio-features song_features.json `
    --resolution large `
    --output outputs/video.mp4
# Step 3: Combine with audio
ffmpeg -i outputs/video.mp4 -i inputs/song.mp3 `
    -c:v copy -c:a aac -b:a 192k -shortest `
    outputs/final.mp4
Python and Matplotlib: The Visualization Engine
Matplotlib's 3D Scatter Plot
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
fig = plt.figure(figsize=(width, height), dpi=dpi)
ax = fig.add_subplot(111, projection='3d')
# Create scatter plot
scatter = ax.scatter(x, y, z, c=colors, cmap='magma', s=1, alpha=0.8)
# Set view angle
ax.view_init(elev=30, azim=45)
# Set axis limits (controls zoom)
ax.set_xlim([-zoom_factor, zoom_factor])
ax.set_ylim([-zoom_factor, zoom_factor])
ax.set_zlim([-zoom_factor, zoom_factor])
Animation with FuncAnimation
from matplotlib.animation import FuncAnimation

def update(frame):
    # Calculate current time
    current_second = frame / fps
    # Query audio features
    beat_intensity = get_beat_intensity(current_second, beat_times)
    loudness = get_loudness_at_time(current_second, rms_times, rms_values)
    # Apply transformations
    # ... rotation, scaling, camera movement ...
    # Update scatter plot
    scatter._offsets3d = (x_new, y_new, z_new)
    scatter.set_alpha(alpha)
    # Update camera
    ax.view_init(elev=elevation, azim=azimuth)
    ax.set_xlim([-zoom, zoom])
    return scatter,

anim = FuncAnimation(fig, update, frames=total_frames,
                     interval=1000/fps, blit=False)
Key points:
- blit=False: Required for 3D plots (blitting doesn't work with 3D)
- interval=1000/fps: Controls playback speed (≈33 ms per frame at 30 fps)
- Each frame calls update() with the frame number
Performance Optimization
For multi-heart effects (I2, I3), we use lower point density:
density_multipliers = {
    'lower': 0.35,   # ~5,000 points per heart
    'low': 0.5,      # ~10,000 points
    'medium': 0.75,  # ~22,500 points
    'high': 1.0,     # ~40,000 points
}
Five hearts at 40,000 points each would mean 200,000 points to render per frame. Using the lower density cuts this to about 25,000 points, making rendering feasible.
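Since the total point count is the product of the two parameter grids, the multiplier scales both u_points and v_points. A sketch of that wiring, assuming a 200 × 200 base grid (which is what makes the 'high' setting come out at ~40,000 points):
def grid_size(density='high', base_u=200, base_v=200):
    """Return (u_points, v_points) scaled by the density multiplier."""
    density_multipliers = {'lower': 0.35, 'low': 0.5, 'medium': 0.75, 'high': 1.0}
    m = density_multipliers[density]
    return int(base_u * m), int(base_v * m)

u_points, v_points = grid_size('lower')  # 70 x 70 = 4,900 points (~5,000 per heart)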
Putting It All Together: The Complete Pipeline
1. Audio Analysis (librosa)
   - Input: Audio file (MP3, WAV, etc.)
   - Output: JSON file with beat times, tempo, loudness, bass, onsets
2. Heart Generation (NumPy)
   - Input: Density setting, formula coefficients
   - Output: 3D point cloud (x, y, z coordinates)
3. Animation Rendering (Matplotlib)
   - Input: Heart points, audio features JSON, effect configuration
   - Process: For each frame, query audio features and apply transformations
   - Output: Video file (MP4, no audio)
4. Audio Combination (FFmpeg)
   - Input: Video file + original audio file
   - Output: Final video with synchronized audio
Effect H9: Cuba to New Orleans - A Musical Journey
Duration: ~698 seconds (11.6 minutes)
H9 is an epic journey through a single piece of music, featuring strategic "through-heart" passages where the camera passes through the heart's core at key musical moments.
Technical Highlights
- Dynamic Tempo Adaptation: Rotation speed adapts to tempo changes throughout the 11-minute piece
- Through-Heart Passages: Camera zooms through the heart at specific timestamps, synchronized with musical transitions
- Phase-Based Narrative: 14 distinct phases, each responding to different sections of the music
- Multi-Feature Sync: Combines beats, tempo, loudness, bass, and onsets for comprehensive audio response
Implementation Details
# Phase 3: First through-heart passage (100-120s)
if 100.0 <= current_second < 120.0:
    phase_t = (current_second - 100.0) / 20.0
    # Zoom through heart: 50 → 5 → 50
    zoom_factor = 50 - 45 * np.sin(np.pi * phase_t)
    # Camera passes through
    elevation = 20 + 10 * np.sin(2 * np.pi * phase_t)
The through-heart effect is achieved by zooming the camera from a distance (zoom=50) to very close (zoom=5) and back, creating the illusion of passing through the heart.
Effect I2: Five Hearts - Comprehensive Audio Feature Synchronization
Duration: Matches audio file length
I2 demonstrates multi-heart visualization where each heart responds to a different audio feature, creating a visual symphony.
The Five Hearts
- Heart 1 (Beats): Pulses on detected beats, magma colormap
- Heart 2 (Tempo): Rotation speed adapts to tempo, YlOrRd colormap
- Heart 3 (Loudness): Scales with RMS energy, Blues colormap
- Heart 4 (Bass): Brightness responds to bass, Greens colormap
- Heart 5 (Onsets): Pulses on musical events, Purples colormap
Technical Implementation
# Each heart gets its own scatter plot
scatter1 = ax.scatter(x1, y1, z1, c=colors1, cmap='magma', s=1, alpha=0.8)
scatter2 = ax.scatter(x2, y2, z2, c=colors2, cmap='YlOrRd', s=1, alpha=0.6)
# ... etc for hearts 3, 4, 5

# In the update function, process each heart independently
for i, (scatter, x_orig, y_orig, z_orig) in enumerate(hearts):
    # Assign a feature based on the heart index
    feature_type = i % 5
    if feature_type == 0:    # Beats
        scale = 1.0 + 0.2 * beat_intensity
    elif feature_type == 1:  # Tempo
        tempo_factor = current_tempo / 75.0
        rotation_speed = tempo_factor
    # ... etc

    # Update this heart's scatter plot
    scatter._offsets3d = (x_new, y_new, z_new)
Camera Strategy
The camera switches between modes (a scheduling sketch follows the list below):
- Multi-heart frame: Wide view showing all 5 hearts
- Individual focus: Close-up on one heart at a time
- Orbital motion: Camera orbits around the center
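A hedged sketch of how such scheduling could work, with illustrative segment lengths and angles rather than the effect's real values:
import numpy as np

def camera_for_time(t, heart_positions, segment=8.0):
    """Cycle wide view / individual focus / orbit every `segment` seconds (illustrative)."""
    mode = int(t // segment) % 3
    if mode == 0:  # multi-heart frame: wide, static view of all hearts
        return dict(elev=30, azim=45, zoom=60, center=(0, 0, 0))
    if mode == 1:  # individual focus: close-up on one heart, cycling through them
        idx = int(t // (3 * segment)) % len(heart_positions)
        return dict(elev=20, azim=45, zoom=25, center=heart_positions[idx])
    phase = (t % segment) / segment  # orbital motion around the scene center
    return dict(elev=30, azim=360 * phase, zoom=50, center=(0, 0, 0))
The returned elev/azim values feed ax.view_init, while zoom and center set the axis limits in the update function.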
Effect I3: Birthday Celebration - 11 Hearts to 16 Hearts
Duration: Matches audio file length (typically ~60 seconds for "Happy Birthday")
I3 is a special celebration effect that transitions from 11 hearts to 16 hearts, with number displays (11, 16, 2025) appearing at strategic moments.
The Two-Phase Design
Phase 1 (0-50%): 11 hearts in circular formation
- 1 center heart + 5 inner circle + 5 outer circle
- Each heart syncs to different audio features
Phase 2 (50-100%): 16 hearts in 4x4 grid
- Smooth transition adds 5 more hearts (see the blending sketch below)
- More complex interactions and patterns
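One way to sketch the smooth 11-to-16 transition is to interpolate each heart between the two layouts while the five new hearts grow in from the center; this is illustrative, not the actual I3 code:
import numpy as np

def blend_positions(pos_11, pos_16, t, t_start, t_end):
    """Linearly blend layouts over the transition window [t_start, t_end]."""
    blend = float(np.clip((t - t_start) / (t_end - t_start), 0.0, 1.0))
    positions = []
    for i, target in enumerate(pos_16):
        # Existing hearts start from their 11-heart slot; new hearts start at the center.
        start = pos_11[i] if i < len(pos_11) else (0.0, 0.0, 0.0)
        positions.append(tuple((1 - blend) * np.array(start) + blend * np.array(target)))
    return positions, blend  # blend can also fade in the new hearts' alpha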
Number Display
Numbers are displayed as text overlays using Matplotlib's text annotation:
# Show "11" during 11-heart phase
if current_second < phase5_end and current_second >= total_duration * 0.11:
    fig.text(0.5, 0.85, '11',
             fontsize=72, ha='center', va='center',
             color='white', alpha=0.7, weight='bold')
Heart Positioning
def _get_heart_positions_11(self):
    """11 hearts: 1 center + 5 inner + 5 outer"""
    positions = [(0, 0, 0)]  # Center
    # Inner circle
    for i in range(5):
        angle = 2 * np.pi * i / 5
        x = 20 * np.cos(angle)
        z = 20 * np.sin(angle)
        positions.append((x, 0, z))
    # Outer circle (offset by half a step so the two rings interleave)
    for i in range(5):
        angle = 2 * np.pi * i / 5 + np.pi / 5
        x = 35 * np.cos(angle)
        z = 35 * np.sin(angle)
        positions.append((x, 0, z))
    return positions
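The 16-heart layout isn't shown in the snippet above; a hedged sketch of the 4x4 grid in the same X-Z plane (the 25-unit spacing is an assumption chosen to be comparable to the circle radii above) could be:
def _get_heart_positions_16(self, spacing=25.0):
    """16 hearts: 4x4 grid in the X-Z plane, centered on the origin."""
    positions = []
    for row in range(4):
        for col in range(4):
            x = (col - 1.5) * spacing
            z = (row - 1.5) * spacing
            positions.append((x, 0, z))
    return positions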
Lessons Learned: The "Vibe Coding" Approach
This project evolved through what I call "vibe coding", an iterative, creative development process driven by prompts and experimentation rather than strict upfront planning.
The Prompt-Driven Development
Each effect started as a detailed prompt in Prompt.md describing:
- Visual narrative (phases, camera movements, transitions)
- Audio synchronization strategy
- Technical requirements
- Implementation notes
These prompts served as both specification and inspiration, allowing the code to evolve organically.
Key Learnings
1. Start Simple, Iterate Complex
   - Effect A (simple rotation) → Effect I3 (16 hearts with number display)
   - Each effect built on previous learnings
2. Audio Features Are Rich
   - Initially used only beats
   - Discovered tempo, loudness, bass, and onsets each add unique visual dimensions
   - Combining multiple features creates more nuanced responses
3. Performance Matters
   - 40,000 points per heart × 16 hearts = 640,000 points per frame
   - Lower density (5,000 points per heart) still looks great and cuts the per-frame point count by 8× (80,000 vs. 640,000)
   - Always profile before optimizing
4. Modular Design Wins
   - Base effect class with an update(frame) method
   - Audio sync functions reusable across effects
   - Easy to create new effects by extending the base class
5. Automation Is Essential
   - Build scripts handle: audio analysis → video generation → audio combination
   - Saves hours of manual work
   - Makes experimentation faster
6. Documentation as You Go
   - Prompt.md captures design decisions
   - Code comments explain "why" not just "what"
   - Makes revisiting code months later much easier
The Creative Process
- Listen to the music - Understand its structure, energy, and emotional arc
- Design the narrative - Plan phases that match the music's journey
- Map audio to visuals - Decide which features drive which visual properties
- Implement and iterate - Code, render, watch, adjust, repeat
- Refine timing - Fine-tune phase boundaries and transitions
- Optimize performance - Balance quality and render time
This process is more art than science: it requires intuition, experimentation, and patience.
Demo Videos
See these effects in action:
- H9: Cuba to New Orleans - An 11-minute musical journey with through-heart passages
- I2: Five Hearts - Five hearts, each dancing to different audio features
- I3: Birthday Celebration - 11 hearts transform to 16 hearts with number displays
Conclusion
This project demonstrates how mathematics, signal processing, and visualization can combine to create something beautiful. The technical stack (NumPy for math, librosa for audio, Matplotlib for visualization, FFmpeg for video processing) is powerful yet accessible.
The real magic isn't in any single technology, but in how they work together:
- Mathematics provides the shape
- Audio analysis provides the rhythm
- Synchronization provides the connection
- Visualization provides the beauty
Whether you're interested in parametric equations, audio signal processing, or creative coding, there's something here to explore. The code is open source, the techniques are documented, and the possibilities are endless.
Next Steps:
- Explore the codebase: GitHub Repository
- Try creating your own effect
- Analyze your favorite music
- Make something beautiful
Created with Python, mathematics, and a passion for making music visible.