Vu Hung Nguyen (Hưng)

The Mathematics and Engineering Behind 3D Heart Animations Synchronized with Music

Introduction

What happens when you combine parametric equations, audio signal processing, and real-time visualization? You get mesmerizing 3D heart animations that dance to music. This project started as a simple mathematical visualization and evolved into a comprehensive system for creating music-synchronized animations using Python, NumPy, Matplotlib, and librosa.

In this deep-dive, I'll walk you through the technical journey of building this system, from the mathematical foundations of the 3D heart shape to the audio engineering that makes hearts pulse in perfect sync with music.

Demo Videos:


The Key Ideas

At its core, this project solves three main challenges:

  1. Mathematical Modeling: Creating a 3D heart shape using parametric equations
  2. Audio Analysis: Extracting musical features (beats, tempo, loudness, bass, onsets) from audio files
  3. Real-Time Synchronization: Mapping audio features to visual transformations in real-time

The magic happens when these three systems work together: each frame of the animation queries the audio features at that exact moment in time, creating a synchronized experience where the heart "feels" the music.


The Mathematics: Parametric Heart Equations

The 3D heart shape is defined by parametric equations using two parameters: u ∈ [0, π] and v ∈ [0, 2π].

The Core Equations

x = sin(u) · (15·sin(v) - 4·sin(3v))
y = 8·cos(u)
z = sin(u) · (15·cos(v) - 5·cos(2v) - 2·cos(3v) - cos(v))

These equations create a heart shape by:

  • Using sin(u) as a scaling factor that varies from 0 to 1 and back to 0
  • Combining multiple harmonics of sin(v) and cos(v) to create the heart's characteristic curves
  • The coefficients (15, -4, -5, -2, -1) control the shape's proportions

Implementation in Python

# Generate parameter grids
u = np.linspace(0, np.pi, u_points)
v = np.linspace(0, 2 * np.pi, v_points)
u_grid, v_grid = np.meshgrid(u, v)

# Flatten for scatter plot
u_flat = u_grid.flatten()
v_flat = v_grid.flatten()

# Apply parametric equations
x = np.sin(u_flat) * (15 * np.sin(v_flat) - 4 * np.sin(3 * v_flat))
y = 8 * np.cos(u_flat)
z = np.sin(u_flat) * (
    15 * np.cos(v_flat)
    - 5 * np.cos(2 * v_flat)
    - 2 * np.cos(3 * v_flat)
    - np.cos(v_flat)
)

3D Rotation

To rotate the heart around the Y-axis, we apply a rotation matrix:

alpha_deg = frame * 360 * tempo_factor / total_frames
alpha_rad = np.deg2rad(alpha_deg)

x_rotated = x * np.cos(alpha_rad) + z * np.sin(alpha_rad)
y_rotated = y  # Y-axis rotation doesn't affect Y coordinate
z_rotated = -x * np.sin(alpha_rad) + z * np.cos(alpha_rad)

The tempo_factor adapts rotation speed based on the music's tempo, making the heart rotate faster during upbeat sections.


The Sound Engineering: Audio Feature Extraction with Librosa

Librosa is a Python library for music and audio analysis. It provides the tools we need to extract musical features that drive our visualizations.

Audio Analysis Pipeline

The analyze_audio.py script performs a comprehensive analysis:

  1. Beat Detection: Identifies the rhythmic pulse of the music
  2. Tempo Tracking: Calculates BPM and tracks tempo changes over time
  3. Onset Detection: Finds moments when new sounds begin
  4. RMS Energy (Loudness): Measures overall volume over time
  5. Bass Analysis: Extracts low-frequency energy

Beat Detection

tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)
beat_times = librosa.frames_to_time(beat_frames, sr=sr)

This gives us a list of timestamps where beats occur, which we use to make the heart pulse.

Dynamic Tempo Tracking

For longer pieces with tempo changes, we track tempo over time:

tempo_values = librosa.beat.tempo(
    y=y, sr=sr,
    aggregate=None,   # per-frame tempo estimates instead of one global value
    hop_length=512
)
tempo_times = librosa.times_like(tempo_values, sr=sr, hop_length=512)

This creates a time series of tempo values, allowing the heart's rotation speed to adapt to tempo changes throughout the song.
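
A minimal sketch of how this time series can be turned into the rotation's tempo_factor, interpolating the tempo at the current time and normalizing by a reference BPM (75 is the value the I2 effect uses later; any reference works):

import numpy as np

def get_tempo_factor(current_second, tempo_times, tempo_values, reference_bpm=75.0):
    """Interpolate the tempo at the current time and normalize by a reference BPM."""
    current_tempo = np.interp(current_second, tempo_times, tempo_values)
    return current_tempo / reference_bpm  # > 1 speeds rotation up, < 1 slows it down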

RMS Energy (Loudness)

rms = librosa.feature.rms(y=y)[0]
rms_times = librosa.frames_to_time(range(len(rms)), sr=sr)
# Normalize to 0-1 range
rms_normalized = (rms - rms.min()) / (rms.max() - rms.min())

RMS energy measures the "loudness" at each moment. We use this to control camera zoom - louder sections bring the camera closer for a more intimate view.

Bass Extraction

# Extract bass frequencies (typically 20-250 Hz)
S = librosa.stft(y)
freqs = librosa.fft_frequencies(sr=sr)  # frequency (Hz) of each STFT row
bass_freqs = np.where((freqs >= 20) & (freqs <= 250))[0]
bass_energy = np.sum(np.abs(S[bass_freqs, :]), axis=0)

Bass frequencies create the "thump" in music. We use bass strength to control the heart's brightness - more bass means a brighter, more glowing heart.

Onset Detection

onset_frames = librosa.onset.onset_detect(y=y, sr=sr, units='frames')
onset_times = librosa.frames_to_time(onset_frames, sr=sr)

Onsets mark the beginning of new musical events. Strong onsets trigger additional heartbeat pulses, creating visual emphasis on musical accents.

Output: JSON Feature File

All extracted features are saved to a JSON file:

{
  "beat_times": [0.1, 0.5, 0.9, ...],
  "tempo_times": [0.0, 0.5, 1.0, ...],
  "tempo_values": [120.5, 121.2, 120.8, ...],
  "rms_times": [0.0, 0.023, 0.046, ...],
  "rms_values": [0.3, 0.5, 0.4, ...],
  "bass_times": [0.0, 0.023, 0.046, ...],
  "bass_values": [0.2, 0.6, 0.3, ...],
  "onset_times": [0.15, 0.8, 1.2, ...]
}

This JSON file becomes the "score" that the animation follows.
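
On the animation side, the JSON can be loaded back into NumPy arrays for fast per-frame lookups. A minimal sketch (the filename matches the song_features.json used in the build workflow below):

import json
import numpy as np

with open('song_features.json') as f:
    features = json.load(f)

# Lists become arrays so per-frame lookups stay fast
beat_times = np.array(features['beat_times'])
onset_times = np.array(features['onset_times'])
rms_times = np.array(features['rms_times'])
rms_values = np.array(features['rms_values'])
bass_times = np.array(features['bass_times'])
bass_values = np.array(features['bass_values'])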


Synchronizing Sound and Visuals

The synchronization happens in real-time during animation rendering. Each frame queries the audio features at that exact moment.

The Update Function

Every effect implements an update(frame) function that:

  1. Calculates the current time: current_second = frame / fps
  2. Queries audio features at that time
  3. Applies transformations based on those features
  4. Updates the 3D scatter plot

Audio Feature Queries

def get_beat_intensity(current_time, beat_times, window=0.1):
    """Returns 0-1 intensity based on proximity to nearest beat."""
    distances = np.abs(np.array(beat_times) - current_time)
    nearest_distance = np.min(distances)

    if nearest_distance < window:
        intensity = 1.0 - (nearest_distance / window)
        return float(intensity)
    return 0.0

This function checks if we're near a beat. If we're within 0.1 seconds of a beat, it returns an intensity value (1.0 = exactly on beat, 0.0 = far from beat).
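
The update function shown later also calls get_loudness_at_time, which isn't listed in the post; a minimal sketch that linearly interpolates the normalized RMS series (bass can be looked up the same way):

import numpy as np

def get_loudness_at_time(current_time, rms_times, rms_values):
    """Return the normalized RMS loudness (0-1) at the given time."""
    return float(np.interp(current_time, rms_times, rms_values))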

Applying Audio Features to Visuals

# Heartbeat pulse on beats
heartbeat_scale = 1.0
if beat_intensity > 0:
    heartbeat_scale = 1.0 + 0.2 * beat_intensity  # Pulse 20% larger

# Apply scaling
x_rotated = x_base * heartbeat_scale
y_rotated = y_base * heartbeat_scale
z_rotated = z_base * heartbeat_scale

# Brightness responds to bass
alpha = 0.5 + 0.4 * bass  # More bass = brighter

# Zoom responds to loudness
zoom_factor = base_zoom - 3 * loudness  # Louder = closer

Each audio feature maps to a visual property:

  • Beats → Heartbeat pulse (scale)
  • Tempo → Rotation speed
  • Loudness → Camera zoom
  • Bass → Brightness/alpha
  • Onsets → Additional pulses

FFmpeg: Combining Video and Audio

Matplotlib's FFMpegWriter creates video-only files. We use FFmpeg to combine the video with the original audio track.

Video Generation

from matplotlib.animation import FFMpegWriter

writer = FFMpegWriter(fps=30, bitrate=5000)
anim.save('outputs/video.mp4', writer=writer)

This creates a silent video file with the animation.

Audio Combination

ffmpeg -i outputs/video.mp4 \
       -i inputs/audio.mp3 \
       -c:v copy \
       -c:a aac \
       -b:a 192k \
       -shortest \
       -y \
       outputs/final_video+audio.mp4

Key FFmpeg options:

  • -c:v copy: Copy video stream (no re-encoding, faster)
  • -c:a aac: Encode audio as AAC
  • -b:a 192k: Audio bitrate (192 kbps is good quality)
  • -shortest: End when shortest stream ends (syncs video and audio length)
  • -y: Overwrite output file without asking
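
If you prefer to drive this step from Python rather than a shell script, a minimal sketch with subprocess, reusing the exact options above:

import subprocess

subprocess.run([
    'ffmpeg',
    '-i', 'outputs/video.mp4',       # silent animation video
    '-i', 'inputs/audio.mp3',        # original audio track
    '-c:v', 'copy',                  # copy video stream, no re-encoding
    '-c:a', 'aac', '-b:a', '192k',   # encode audio as 192 kbps AAC
    '-shortest',                     # stop at the shorter stream
    '-y',                            # overwrite without asking
    'outputs/final_video+audio.mp4',
], check=True)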

Automated Workflow

PowerShell build scripts automate the entire process; the equivalent commands look like this:

# Step 1: Analyze audio
python analyze_audio.py inputs/song.mp3

# Step 2: Generate video
python heart_animation.py --effect I2 \
    --audio-features song_features.json \
    --resolution large \
    --output outputs/video.mp4

# Step 3: Combine with audio
ffmpeg -i outputs/video.mp4 -i inputs/song.mp3 \
    -c:v copy -c:a aac -b:a 192k -shortest \
    outputs/final.mp4

Python and Matplotlib: The Visualization Engine

Matplotlib's 3D Scatter Plot

from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(width, height), dpi=dpi)
ax = fig.add_subplot(111, projection='3d')

# Create scatter plot
scatter = ax.scatter(x, y, z, c=colors, cmap='magma', s=1, alpha=0.8)

# Set view angle
ax.view_init(elev=30, azim=45)

# Set axis limits (controls zoom)
ax.set_xlim([-zoom_factor, zoom_factor])
ax.set_ylim([-zoom_factor, zoom_factor])
ax.set_zlim([-zoom_factor, zoom_factor])
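
The colors array in the scatter call above isn't defined in the snippet; one simple choice (an assumption, not necessarily what the project does) is to color points by their radial distance so the colormap follows the heart's shape:

import numpy as np

# Color each point by its distance from the center, normalized to 0-1
radius = np.sqrt(x**2 + y**2 + z**2)
colors = (radius - radius.min()) / (radius.max() - radius.min())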

Animation with FuncAnimation

from matplotlib.animation import FuncAnimation

def update(frame):
    # Calculate current time
    current_second = frame / fps

    # Query audio features
    beat_intensity = get_beat_intensity(current_second, beat_times)
    loudness = get_loudness_at_time(current_second, rms_times, rms_values)

    # Apply transformations
    # ... rotation, scaling, camera movement ...

    # Update scatter plot
    scatter._offsets3d = (x_new, y_new, z_new)
    scatter.set_alpha(alpha)

    # Update camera
    ax.view_init(elev=elevation, azim=azimuth)
    ax.set_xlim([-zoom, zoom])

    return scatter,

anim = FuncAnimation(fig, update, frames=total_frames, 
                    interval=1000/fps, blit=False)

Key points:

  • blit=False: Required for 3D plots (blitting doesn't work with 3D)
  • interval=1000/fps: Controls playback speed (about 33 ms per frame at 30 fps)
  • Each frame calls update() with the frame number

Performance Optimization

For multi-heart effects (I2, I3), we use lower point density:

density_multipliers = {
    'lower': 0.35,   # ~5,000 points per heart
    'low': 0.5,      # ~10,000 points
    'medium': 0.75,  # ~22,500 points
    'high': 1.0      # ~40,000 points
}

With 5 hearts at 40,000 points each, that's 200,000 points to render per frame. The lower density setting reduces this to about 25,000 points, keeping rendering feasible.
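
A minimal sketch of how a density multiplier can translate into grid resolution; the 200 × 200 base grid is an assumption chosen so that the 'high' setting lands near 40,000 points:

base_u_points, base_v_points = 200, 200   # 'high': 200 * 200 = 40,000 points

def grid_size(density):
    """Scale both parameter axes; point count shrinks roughly with the square of the multiplier."""
    multiplier = density_multipliers[density]
    return int(base_u_points * multiplier), int(base_v_points * multiplier)

u_points, v_points = grid_size('lower')   # 70 * 70 = 4,900 points, ~5,000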


Putting It All Together: The Complete Pipeline

  1. Audio Analysis (librosa)

    • Input: Audio file (MP3, WAV, etc.)
    • Output: JSON file with beat times, tempo, loudness, bass, onsets
  2. Heart Generation (NumPy)

    • Input: Density setting, formula coefficients
    • Output: 3D point cloud (x, y, z coordinates)
  3. Animation Rendering (Matplotlib)

    • Input: Heart points, audio features JSON, effect configuration
    • Process: For each frame, query audio features and apply transformations
    • Output: Video file (MP4, no audio)
  4. Audio Combination (FFmpeg)

    • Input: Video file + original audio file
    • Output: Final video with synchronized audio

Effect H9: Cuba to New Orleans - A Musical Journey

Duration: ~698 seconds (11.6 minutes)

H9 is an epic journey through a single piece of music, featuring strategic "through-heart" passages where the camera passes through the heart's core at key musical moments.

Technical Highlights

  • Dynamic Tempo Adaptation: Rotation speed adapts to tempo changes throughout the 11-minute piece
  • Through-Heart Passages: Camera zooms through the heart at specific timestamps, synchronized with musical transitions
  • Phase-Based Narrative: 14 distinct phases, each responding to different sections of the music
  • Multi-Feature Sync: Combines beats, tempo, loudness, bass, and onsets for comprehensive audio response

Implementation Details

# Phase 3: First through-heart passage (100-120s)
if 100.0 <= current_second < 120.0:
    phase_t = (current_second - 100.0) / 20.0
    # Zoom through heart: 50 → 5 → 50
    zoom_factor = 50 - 45 * np.sin(np.pi * phase_t)
    # Camera passes through
    elevation = 20 + 10 * np.sin(2 * np.pi * phase_t)

The through-heart effect is achieved by zooming the camera from a distance (zoom=50) to very close (zoom=5) and back, creating the illusion of passing through the heart.


Effect I2: Five Hearts - Comprehensive Audio Feature Synchronization

Duration: Matches audio file length

I2 demonstrates multi-heart visualization where each heart responds to a different audio feature, creating a visual symphony.

The Five Hearts

  1. Heart 1 (Beats): Pulses on detected beats, magma colormap
  2. Heart 2 (Tempo): Rotation speed adapts to tempo, YlOrRd colormap
  3. Heart 3 (Loudness): Scales with RMS energy, Blues colormap
  4. Heart 4 (Bass): Brightness responds to bass, Greens colormap
  5. Heart 5 (Onsets): Pulses on musical events, Purples colormap

Technical Implementation

# Each heart gets its own scatter plot
scatter1 = ax.scatter(x1, y1, z1, c=colors1, cmap='magma', s=1, alpha=0.8)
scatter2 = ax.scatter(x2, y2, z2, c=colors2, cmap='YlOrRd', s=1, alpha=0.6)
# ... etc for hearts 3, 4, 5

# In update function, process each heart independently
for i, (scatter, x_orig, y_orig, z_orig) in enumerate(hearts):
    # Assign feature based on heart index
    feature_type = i % 5

    if feature_type == 0:  # Beats
        scale = 1.0 + 0.2 * beat_intensity
    elif feature_type == 1:  # Tempo
        tempo_factor = current_tempo / 75.0
        rotation_speed = tempo_factor
    # ... etc

    # Update this heart's scatter plot
    scatter._offsets3d = (x_new, y_new, z_new)

Camera Strategy

The camera switches between modes:

  • Multi-heart frame: Wide view showing all 5 hearts
  • Individual focus: Close-up on one heart at a time
  • Orbital motion: Camera orbits around the center
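
A minimal sketch of such a mode switch; the 10-second interval and the specific angles are illustrative, not taken from the actual effect:

def camera_for(current_second):
    """Cycle camera modes every 10 seconds: wide frame, single-heart focus, orbit."""
    mode = int(current_second // 10) % 3
    if mode == 0:     # multi-heart frame: pull back to show all five hearts
        return 30, 45, 60                          # elevation, azimuth, zoom
    elif mode == 1:   # individual focus: move in close on one heart
        return 20, 45, 15
    else:             # orbital motion: sweep the azimuth around the center
        return 25, (current_second * 36) % 360, 40

Inside update(), the returned values feed ax.view_init() and the axis limits just like in the earlier snippets.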

Effect I3: Birthday Celebration - 11 Hearts to 16 Hearts

Duration: Matches audio file length (typically ~60 seconds for "Happy Birthday")

I3 is a special celebration effect that transitions from 11 hearts to 16 hearts, with number displays (11, 16, 2025) appearing at strategic moments.

The Two-Phase Design

Phase 1 (0-50%): 11 hearts in circular formation

  • 1 center heart + 5 inner circle + 5 outer circle
  • Each heart syncs to different audio features

Phase 2 (50-100%): 16 hearts in 4x4 grid

  • Smooth transition adds 5 more hearts
  • More complex interactions and patterns

Number Display

Numbers are displayed as text overlays using Matplotlib's text annotation:

# Show "11" during 11-heart phase
if current_second < phase5_end and current_second >= total_duration * 0.11:
    fig.text(0.5, 0.85, '11', 
             fontsize=72, ha='center', va='center',
             color='white', alpha=0.7, weight='bold')

Heart Positioning

def _get_heart_positions_11(self):
    """11 hearts: 1 center + 5 inner + 5 outer"""
    positions = [(0, 0, 0)]  # Center

    # Inner circle
    for i in range(5):
        angle = 2 * np.pi * i / 5
        x = 20 * np.cos(angle)
        z = 20 * np.sin(angle)
        positions.append((x, 0, z))

    # Outer circle
    for i in range(5):
        angle = 2 * np.pi * i / 5 + np.pi / 5
        x = 35 * np.cos(angle)
        z = 35 * np.sin(angle)
        positions.append((x, 0, z))

    return positions
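
A companion sketch for the 16-heart grid in Phase 2; the 30-unit spacing is an assumption, since the original spacing isn't shown:

def _get_heart_positions_16(self, spacing=30):
    """16 hearts: 4x4 grid centered on the origin in the X-Z plane."""
    positions = []
    for row in range(4):
        for col in range(4):
            x = (col - 1.5) * spacing
            z = (row - 1.5) * spacing
            positions.append((x, 0, z))
    return positions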

Lessons Learned: The "Vibe Coding" Approach

This project evolved through what I call "vibe coding": an iterative, creative development process driven by prompts and experimentation rather than strict upfront planning.

Prompt-Driven Development

Each effect started as a detailed prompt in Prompt.md describing:

  • Visual narrative (phases, camera movements, transitions)
  • Audio synchronization strategy
  • Technical requirements
  • Implementation notes

These prompts served as both specification and inspiration, allowing the code to evolve organically.

Key Learnings

  1. Start Simple, Iterate Complex

    • Effect A (simple rotation) → Effect I3 (16 hearts with number display)
    • Each effect built on previous learnings
  2. Audio Features Are Rich

    • Initially used only beats
    • Discovered tempo, loudness, bass, and onsets each add unique visual dimensions
    • Combining multiple features creates more nuanced responses
  3. Performance Matters

    • 40,000 points per heart × 16 hearts = 640,000 points per frame
    • Lower density (~5,000 points per heart, ~80,000 points total) still looks great and cuts the per-frame workload by roughly 8×
    • Always profile before optimizing
  4. Modular Design Wins

    • Base effect class with update(frame) method
    • Audio sync functions reusable across effects
    • Easy to create new effects by extending the base class (see the sketch after this list)
  5. Automation Is Essential

    • Build scripts handle: audio analysis → video generation → audio combination
    • Saves hours of manual work
    • Makes experimentation faster
  6. Documentation as You Go

    • Prompt.md captures design decisions
    • Code comments explain "why" not just "what"
    • Makes revisiting code months later much easier
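
To make the modular-design point concrete, a minimal sketch of a base effect class; the class and method names are illustrative, not taken from the repository:

class BaseEffect:
    """Shared machinery: timing and audio-feature access for all effects."""
    def __init__(self, fps, features):
        self.fps = fps
        self.features = features

    def update(self, frame):
        raise NotImplementedError

class BeatPulseEffect(BaseEffect):
    def update(self, frame):
        current_second = frame / self.fps
        # get_beat_intensity is the helper defined earlier in the post
        intensity = get_beat_intensity(current_second, self.features['beat_times'])
        return 1.0 + 0.2 * intensity   # scale factor applied to the heart's points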

The Creative Process

  1. Listen to the music - Understand its structure, energy, and emotional arc
  2. Design the narrative - Plan phases that match the music's journey
  3. Map audio to visuals - Decide which features drive which visual properties
  4. Implement and iterate - Code, render, watch, adjust, repeat
  5. Refine timing - Fine-tune phase boundaries and transitions
  6. Optimize performance - Balance quality and render time

This process is more art than science: it requires intuition, experimentation, and patience.


Demo Videos

See these effects in action:


Conclusion

This project demonstrates how mathematics, signal processing, and visualization can combine to create something beautiful. The technical stack (NumPy for math, librosa for audio, Matplotlib for visualization, FFmpeg for video processing) is powerful yet accessible.

The real magic isn't in any single technology, but in how they work together:

  • Mathematics provides the shape
  • Audio analysis provides the rhythm
  • Synchronization provides the connection
  • Visualization provides the beauty

Whether you're interested in parametric equations, audio signal processing, or creative coding, there's something here to explore. The code is open source, the techniques are documented, and the possibilities are endless.

Next Steps:

  • Explore the codebase: GitHub Repository
  • Try creating your own effect
  • Analyze your favorite music
  • Make something beautiful

Created with Python, mathematics, and a passion for making music visible.
