The Mathematics and Engineering Behind 3D Heart Animations Synchronized with Music
Introduction
What happens when you combine parametric equations, audio signal processing, and real-time visualization? You get mesmerizing 3D heart animations that dance to music. This project started as a simple mathematical visualization and evolved into a comprehensive system for creating music-synchronized animations using Python, NumPy, Matplotlib, and librosa.
In this deep-dive, I'll walk you through the technical journey of building this system, from the mathematical foundations of the 3D heart shape to the audio engineering that makes hearts pulse in perfect sync with music.
The Key Ideas
At its core, this project solves three main challenges:
- Mathematical Modeling: Creating a 3D heart shape using parametric equations
- Audio Analysis: Extracting musical features (beats, tempo, loudness, bass, onsets) from audio files
- Real-Time Synchronization: Mapping audio features to visual transformations in real-time
The magic happens when these three systems work together: each frame of the animation queries the audio features at that exact moment in time, creating a synchronized experience where the heart "feels" the music.
The Mathematics: Parametric Heart Equations
The 3D heart shape is defined by parametric equations using two parameters: u ∈ [0, π] and v ∈ [0, 2π].
The Core Equations
x = sin(u) · (15·sin(v) - 4·sin(3v))
y = 8·cos(u)
z = sin(u) · (15·cos(v) - 5·cos(2v) - 2·cos(3v) - cos(v))
These equations create a heart shape by:
- Using sin(u) as a scaling factor that varies from 0 to 1 and back to 0
- Combining multiple harmonics of sin(v) and cos(v) to create the heart's characteristic curves
- The coefficients (15, -4, -5, -2, -1) control the shape's proportions
Implementation in Python
# Generate parameter grids
u = np.linspace(0, np.pi, u_points)
v = np.linspace(0, 2 * np.pi, v_points)
u_grid, v_grid = np.meshgrid(u, v)
# Flatten for scatter plot
u_flat = u_grid.flatten()
v_flat = v_grid.flatten()
# Apply parametric equations
x = np.sin(u_flat) * (15 * np.sin(v_flat) - 4 * np.sin(3 * v_flat))
y = 8 * np.cos(u_flat)
z = np.sin(u_flat) * (
    15 * np.cos(v_flat)
    - 5 * np.cos(2 * v_flat)
    - 2 * np.cos(3 * v_flat)
    - np.cos(v_flat)
)
3D Rotation
To rotate the heart around the Y-axis, we apply a rotation matrix:
alpha_deg = frame * 360 * tempo_factor / total_frames
alpha_rad = np.deg2rad(alpha_deg)
x_rotated = x * np.cos(alpha_rad) + z * np.sin(alpha_rad)
y_rotated = y # Y-axis rotation doesn't affect Y coordinate
z_rotated = -x * np.sin(alpha_rad) + z * np.cos(alpha_rad)
The tempo_factor adapts rotation speed based on the music's tempo, making the heart rotate faster during upbeat sections.
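The exact mapping from tempo to rotation speed isn't shown here; as a minimal sketch (the helper name is hypothetical, and the 75 BPM reference matches the normalization used later in effect I2), the factor could be derived from the tempo track like this:
import numpy as np

def get_tempo_factor(current_time, tempo_times, tempo_values, reference_bpm=75.0):
    """Interpolate the tempo curve at the current time and normalize it."""
    current_tempo = np.interp(current_time, tempo_times, tempo_values)
    return current_tempo / reference_bpm
A factor of 1.0 then corresponds to 75 BPM; faster music yields a larger factor and therefore a faster rotation.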
The Sound Engineering: Audio Feature Extraction with Librosa
Librosa is a Python library for music and audio analysis. It provides the tools we need to extract musical features that drive our visualizations.
Audio Analysis Pipeline
The analyze_audio.py script performs a comprehensive analysis:
- Beat Detection: Identifies the rhythmic pulse of the music
- Tempo Tracking: Calculates BPM and tracks tempo changes over time
- Onset Detection: Finds moments when new sounds begin
- RMS Energy (Loudness): Measures overall volume over time
- Bass Analysis: Extracts low-frequency energy
Beat Detection
tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)
beat_times = librosa.frames_to_time(beat_frames, sr=sr)
This gives us a list of timestamps where beats occur, which we use to make the heart pulse.
Dynamic Tempo Tracking
For longer pieces with tempo changes, we track tempo over time:
tempo_values = librosa.beat.tempo(
    y=y, sr=sr,
    aggregate=None,  # keep the per-frame tempo curve instead of a single aggregated value
    hop_length=512
)
tempo_times = librosa.frames_to_time(np.arange(len(tempo_values)), sr=sr, hop_length=512)
This creates a time series of tempo values, allowing the heart's rotation speed to adapt to tempo changes throughout the song.
RMS Energy (Loudness)
rms = librosa.feature.rms(y=y)[0]
rms_times = librosa.frames_to_time(range(len(rms)), sr=sr)
# Normalize to 0-1 range
rms_normalized = (rms - rms.min()) / (rms.max() - rms.min())
RMS energy measures the "loudness" at each moment. We use this to control camera zoom - louder sections bring the camera closer for a more intimate view.
Bass Extraction
# Extract bass frequencies (typically 20-250 Hz)
S = librosa.stft(y)
freqs = librosa.fft_frequencies(sr=sr)
bass_freqs = np.where((freqs >= 20) & (freqs <= 250))[0]
bass_energy = np.sum(np.abs(S[bass_freqs, :]), axis=0)
Bass frequencies create the "thump" in music. We use bass strength to control the heart's brightness - more bass means a brighter, more glowing heart.
Onset Detection
onset_frames = librosa.onset.onset_detect(y=y, sr=sr, units='frames')
onset_times = librosa.frames_to_time(onset_frames, sr=sr)
Onsets mark the beginning of new musical events. Strong onsets trigger additional heartbeat pulses, creating visual emphasis on musical accents.
Output: JSON Feature File
All extracted features are saved to a JSON file:
{
"beat_times": [0.1, 0.5, 0.9, ...],
"tempo_times": [0.0, 0.5, 1.0, ...],
"tempo_values": [120.5, 121.2, 120.8, ...],
"rms_times": [0.0, 0.023, 0.046, ...],
"rms_values": [0.3, 0.5, 0.4, ...],
"bass_times": [0.0, 0.023, 0.046, ...],
"bass_values": [0.2, 0.6, 0.3, ...],
"onset_times": [0.15, 0.8, 1.2, ...]
}
This JSON file becomes the "score" that the animation follows.
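The serialization code itself isn't shown in this post; a minimal sketch of writing the file with json.dump could look like the following, assuming the bass curve was normalized the same way as the RMS curve (NumPy arrays must be converted to plain lists first):
import json
import numpy as np

features = {
    "beat_times": np.asarray(beat_times).tolist(),
    "tempo_times": np.asarray(tempo_times).tolist(),
    "tempo_values": np.asarray(tempo_values).tolist(),
    "rms_times": np.asarray(rms_times).tolist(),
    "rms_values": np.asarray(rms_normalized).tolist(),
    "bass_times": np.asarray(bass_times).tolist(),        # assumed: frame times for the bass curve
    "bass_values": np.asarray(bass_normalized).tolist(),  # assumed: bass_energy normalized to 0-1
    "onset_times": np.asarray(onset_times).tolist(),
}

with open("song_features.json", "w") as f:
    json.dump(features, f)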
Synchronizing Sound and Visuals
The synchronization happens in real-time during animation rendering. Each frame queries the audio features at that exact moment.
The Update Function
Every effect implements an update(frame) function that:
- Calculates the current time: current_second = frame / fps
- Queries audio features at that time
- Applies transformations based on those features
- Updates the 3D scatter plot
Audio Feature Queries
def get_beat_intensity(current_time, beat_times, window=0.1):
    """Returns 0-1 intensity based on proximity to the nearest beat."""
    distances = np.abs(np.array(beat_times) - current_time)
    nearest_distance = np.min(distances)
    if nearest_distance < window:
        intensity = 1.0 - (nearest_distance / window)
        return float(intensity)
    return 0.0
This function checks whether we're near a beat. If we're within 0.1 seconds of a beat, it returns an intensity that is 1.0 exactly on the beat and falls toward 0.0 at the edge of the window; otherwise it returns 0.0.
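The loudness query used in the update function later (get_loudness_at_time) works differently: instead of checking proximity, it interpolates the continuous RMS curve. A minimal sketch, assuming the helper simply wraps np.interp:
import numpy as np

def get_loudness_at_time(current_time, rms_times, rms_values):
    """Linearly interpolate the normalized RMS curve at the current time."""
    return float(np.interp(current_time, rms_times, rms_values))
The bass curve can be queried in exactly the same way.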
Applying Audio Features to Visuals
# Heartbeat pulse on beats
heartbeat_scale = 1.0
if beat_intensity > 0:
    heartbeat_scale = 1.0 + 0.2 * beat_intensity  # Pulse 20% larger

# Apply scaling
x_rotated = x_base * heartbeat_scale
y_rotated = y_base * heartbeat_scale
z_rotated = z_base * heartbeat_scale

# Brightness responds to bass
alpha = 0.5 + 0.4 * bass  # More bass = brighter

# Zoom responds to loudness
zoom_factor = base_zoom - 3 * loudness  # Louder = closer
Each audio feature maps to a visual property:
- Beats → Heartbeat pulse (scale)
- Tempo → Rotation speed
- Loudness → Camera zoom
- Bass → Brightness/alpha
- Onsets → Additional pulses
FFmpeg: Combining Video and Audio
Matplotlib's FFMpegWriter creates video-only files. We use FFmpeg to combine the video with the original audio track.
Video Generation
from matplotlib.animation import FFMpegWriter
writer = FFMpegWriter(fps=30, bitrate=5000)
anim.save('outputs/video.mp4', writer=writer)
This creates a silent video file with the animation.
Audio Combination
ffmpeg -i outputs/video.mp4 \
-i inputs/audio.mp3 \
-c:v copy \
-c:a aac \
-b:a 192k \
-shortest \
-y \
outputs/final_video+audio.mp4
Key FFmpeg options:
- -c:v copy: Copy the video stream (no re-encoding, faster)
- -c:a aac: Encode audio as AAC
- -b:a 192k: Audio bitrate (192 kbps is good quality)
- -shortest: End when the shortest stream ends (syncs video and audio length)
- -y: Overwrite the output file without asking
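If you would rather drive FFmpeg from Python than from a shell script, the same command can be launched with subprocess; this is a sketch under that assumption, not the project's actual build step:
import subprocess

subprocess.run([
    "ffmpeg",
    "-i", "outputs/video.mp4",      # silent animation rendered by Matplotlib
    "-i", "inputs/audio.mp3",       # original audio track
    "-c:v", "copy",                 # copy the video stream, no re-encoding
    "-c:a", "aac", "-b:a", "192k",  # encode audio as 192 kbps AAC
    "-shortest",                    # stop at the end of the shorter stream
    "-y",                           # overwrite without asking
    "outputs/final_video+audio.mp4",
], check=True)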
Automated Workflow
PowerShell build scripts automate the entire process:
# Step 1: Analyze audio
python analyze_audio.py inputs/song.mp3
# Step 2: Generate video
python heart_animation.py --effect I2 `
    --audio-features song_features.json `
    --resolution large `
    --output outputs/video.mp4
# Step 3: Combine with audio
ffmpeg -i outputs/video.mp4 -i inputs/song.mp3 `
    -c:v copy -c:a aac -b:a 192k -shortest `
    outputs/final.mp4
Python and Matplotlib: The Visualization Engine
Matplotlib's 3D Scatter Plot
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
fig = plt.figure(figsize=(width, height), dpi=dpi)
ax = fig.add_subplot(111, projection='3d')
# Create scatter plot
scatter = ax.scatter(x, y, z, c=colors, cmap='magma', s=1, alpha=0.8)
# Set view angle
ax.view_init(elev=30, azim=45)
# Set axis limits (controls zoom)
ax.set_xlim([-zoom_factor, zoom_factor])
ax.set_ylim([-zoom_factor, zoom_factor])
ax.set_zlim([-zoom_factor, zoom_factor])
Animation with FuncAnimation
from matplotlib.animation import FuncAnimation

def update(frame):
    # Calculate current time
    current_second = frame / fps
    # Query audio features
    beat_intensity = get_beat_intensity(current_second, beat_times)
    loudness = get_loudness_at_time(current_second, rms_times, rms_values)
    # Apply transformations
    # ... rotation, scaling, camera movement ...
    # Update scatter plot
    scatter._offsets3d = (x_new, y_new, z_new)
    scatter.set_alpha(alpha)
    # Update camera
    ax.view_init(elev=elevation, azim=azimuth)
    ax.set_xlim([-zoom, zoom])
    return scatter,

anim = FuncAnimation(fig, update, frames=total_frames,
                     interval=1000/fps, blit=False)
Key points:
- blit=False: Required for 3D plots (blitting doesn't work with 3D)
- interval=1000/fps: Controls playback speed (≈33 ms per frame at 30 fps)
- Each frame calls update() with the frame number
Performance Optimization
For multi-heart effects (I2, I3), we use lower point density:
density_multipliers = {
    'lower': 0.35,   # ~5,000 points per heart
    'low': 0.5,      # ~10,000 points
    'medium': 0.75,  # ~22,500 points
    'high': 1.0,     # ~40,000 points
}
Five hearts at 40,000 points each would mean 200,000 points to render per frame. Using the lower density cuts this to about 25,000 points, making rendering feasible.
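Since the total point count is the product of the two parameter grids, the multiplier scales both u_points and v_points. A sketch of that wiring, assuming a 200 × 200 base grid (which is what makes the 'high' setting come out at ~40,000 points):
def grid_size(density='high', base_u=200, base_v=200):
    """Return (u_points, v_points) scaled by the density multiplier."""
    density_multipliers = {'lower': 0.35, 'low': 0.5, 'medium': 0.75, 'high': 1.0}
    m = density_multipliers[density]
    return int(base_u * m), int(base_v * m)

u_points, v_points = grid_size('lower')  # 70 x 70 = 4,900 points (~5,000 per heart)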
Putting It All Together: The Complete Pipeline
1. Audio Analysis (librosa)
   - Input: Audio file (MP3, WAV, etc.)
   - Output: JSON file with beat times, tempo, loudness, bass, onsets
2. Heart Generation (NumPy)
   - Input: Density setting, formula coefficients
   - Output: 3D point cloud (x, y, z coordinates)
3. Animation Rendering (Matplotlib)
   - Input: Heart points, audio features JSON, effect configuration
   - Process: For each frame, query audio features and apply transformations
   - Output: Video file (MP4, no audio)
4. Audio Combination (FFmpeg)
   - Input: Video file + original audio file
   - Output: Final video with synchronized audio
Effect H9: Cuba to New Orleans - A Musical Journey
Duration: ~698 seconds (11.6 minutes)
H9 is an epic journey through a single piece of music, featuring strategic "through-heart" passages where the camera passes through the heart's core at key musical moments.
Technical Highlights
- Dynamic Tempo Adaptation: Rotation speed adapts to tempo changes throughout the 11-minute piece
- Through-Heart Passages: Camera zooms through the heart at specific timestamps, synchronized with musical transitions
- Phase-Based Narrative: 14 distinct phases, each responding to different sections of the music
- Multi-Feature Sync: Combines beats, tempo, loudness, bass, and onsets for comprehensive audio response
Implementation Details
# Phase 3: First through-heart passage (100-120s)
if 100.0 <= current_second < 120.0:
    phase_t = (current_second - 100.0) / 20.0
    # Zoom through heart: 50 → 5 → 50
    zoom_factor = 50 - 45 * np.sin(np.pi * phase_t)
    # Camera passes through
    elevation = 20 + 10 * np.sin(2 * np.pi * phase_t)
The through-heart effect is achieved by zooming the camera from a distance (zoom=50) to very close (zoom=5) and back, creating the illusion of passing through the heart.
Effect I2: Five Hearts - Comprehensive Audio Feature Synchronization
Duration: Matches audio file length
I2 demonstrates multi-heart visualization where each heart responds to a different audio feature, creating a visual symphony.
The Five Hearts
- Heart 1 (Beats): Pulses on detected beats, magma colormap
- Heart 2 (Tempo): Rotation speed adapts to tempo, YlOrRd colormap
- Heart 3 (Loudness): Scales with RMS energy, Blues colormap
- Heart 4 (Bass): Brightness responds to bass, Greens colormap
- Heart 5 (Onsets): Pulses on musical events, Purples colormap
Technical Implementation
# Each heart gets its own scatter plot
scatter1 = ax.scatter(x1, y1, z1, c=colors1, cmap='magma', s=1, alpha=0.8)
scatter2 = ax.scatter(x2, y2, z2, c=colors2, cmap='YlOrRd', s=1, alpha=0.6)
# ... etc for hearts 3, 4, 5

# In the update function, process each heart independently
for i, (scatter, x_orig, y_orig, z_orig) in enumerate(hearts):
    # Assign a feature based on the heart index
    feature_type = i % 5
    if feature_type == 0:    # Beats
        scale = 1.0 + 0.2 * beat_intensity
    elif feature_type == 1:  # Tempo
        tempo_factor = current_tempo / 75.0
        rotation_speed = tempo_factor
    # ... etc

    # Update this heart's scatter plot
    scatter._offsets3d = (x_new, y_new, z_new)
Camera Strategy
The camera switches between modes (a scheduling sketch follows the list below):
- Multi-heart frame: Wide view showing all 5 hearts
- Individual focus: Close-up on one heart at a time
- Orbital motion: Camera orbits around the center
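A hedged sketch of how such scheduling could work, with illustrative segment lengths and angles rather than the effect's real values:
import numpy as np

def camera_for_time(t, heart_positions, segment=8.0):
    """Cycle wide view / individual focus / orbit every `segment` seconds (illustrative)."""
    mode = int(t // segment) % 3
    if mode == 0:  # multi-heart frame: wide, static view of all hearts
        return dict(elev=30, azim=45, zoom=60, center=(0, 0, 0))
    if mode == 1:  # individual focus: close-up on one heart, cycling through them
        idx = int(t // (3 * segment)) % len(heart_positions)
        return dict(elev=20, azim=45, zoom=25, center=heart_positions[idx])
    phase = (t % segment) / segment  # orbital motion around the scene center
    return dict(elev=30, azim=360 * phase, zoom=50, center=(0, 0, 0))
The returned elev/azim values feed ax.view_init, while zoom and center set the axis limits in the update function.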
Effect I3: Birthday Celebration - 11 Hearts to 16 Hearts
Duration: Matches audio file length (typically ~60 seconds for "Happy Birthday")
I3 is a special celebration effect that transitions from 11 hearts to 16 hearts, with number displays (11, 16, 2025) appearing at strategic moments.
The Two-Phase Design
Phase 1 (0-50%): 11 hearts in circular formation
- 1 center heart + 5 inner circle + 5 outer circle
- Each heart syncs to different audio features
Phase 2 (50-100%): 16 hearts in 4x4 grid
- Smooth transition adds 5 more hearts (see the blending sketch below)
- More complex interactions and patterns
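One way to sketch the smooth 11-to-16 transition is to interpolate each heart between the two layouts while the five new hearts grow in from the center; this is illustrative, not the actual I3 code:
import numpy as np

def blend_positions(pos_11, pos_16, t, t_start, t_end):
    """Linearly blend layouts over the transition window [t_start, t_end]."""
    blend = float(np.clip((t - t_start) / (t_end - t_start), 0.0, 1.0))
    positions = []
    for i, target in enumerate(pos_16):
        # Existing hearts start from their 11-heart slot; new hearts start at the center.
        start = pos_11[i] if i < len(pos_11) else (0.0, 0.0, 0.0)
        positions.append(tuple((1 - blend) * np.array(start) + blend * np.array(target)))
    return positions, blend  # blend can also fade in the new hearts' alpha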
Number Display
Numbers are displayed as text overlays using Matplotlib's text annotation:
# Show "11" during 11-heart phase
if current_second < phase5_end and current_second >= total_duration * 0.11:
    fig.text(0.5, 0.85, '11',
             fontsize=72, ha='center', va='center',
             color='white', alpha=0.7, weight='bold')
Heart Positioning
def _get_heart_positions_11(self):
    """11 hearts: 1 center + 5 inner + 5 outer"""
    positions = [(0, 0, 0)]  # Center
    # Inner circle
    for i in range(5):
        angle = 2 * np.pi * i / 5
        x = 20 * np.cos(angle)
        z = 20 * np.sin(angle)
        positions.append((x, 0, z))
    # Outer circle (offset by half a step so the two rings interleave)
    for i in range(5):
        angle = 2 * np.pi * i / 5 + np.pi / 5
        x = 35 * np.cos(angle)
        z = 35 * np.sin(angle)
        positions.append((x, 0, z))
    return positions
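The 16-heart layout isn't shown in the snippet above; a hedged sketch of the 4x4 grid in the same X-Z plane (the 25-unit spacing is an assumption chosen to be comparable to the circle radii above) could be:
def _get_heart_positions_16(self, spacing=25.0):
    """16 hearts: 4x4 grid in the X-Z plane, centered on the origin."""
    positions = []
    for row in range(4):
        for col in range(4):
            x = (col - 1.5) * spacing
            z = (row - 1.5) * spacing
            positions.append((x, 0, z))
    return positions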
Lessons Learned: The "Vibe Coding" Approach
This project evolved through what I call "vibe coding", an iterative, creative development process driven by prompts and experimentation rather than strict upfront planning.
The Prompt-Driven Development
Each effect started as a detailed prompt in Prompt.md describing:
- Visual narrative (phases, camera movements, transitions)
- Audio synchronization strategy
- Technical requirements
- Implementation notes
These prompts served as both specification and inspiration, allowing the code to evolve organically.
Key Learnings
1. Start Simple, Iterate Complex
   - Effect A (simple rotation) → Effect I3 (16 hearts with number display)
   - Each effect built on previous learnings
2. Audio Features Are Rich
   - Initially used only beats
   - Discovered tempo, loudness, bass, and onsets each add unique visual dimensions
   - Combining multiple features creates more nuanced responses
3. Performance Matters
   - 40,000 points per heart × 16 hearts = 640,000 points per frame
   - Lower density (5,000 points per heart) still looks great and cuts the per-frame point count by 8× (80,000 vs. 640,000)
   - Always profile before optimizing
4. Modular Design Wins
   - Base effect class with an update(frame) method
   - Audio sync functions reusable across effects
   - Easy to create new effects by extending the base class
5. Automation Is Essential
   - Build scripts handle: audio analysis → video generation → audio combination
   - Saves hours of manual work
   - Makes experimentation faster
6. Documentation as You Go
   - Prompt.md captures design decisions
   - Code comments explain "why" not just "what"
   - Makes revisiting code months later much easier
The Creative Process
- Listen to the music - Understand its structure, energy, and emotional arc
- Design the narrative - Plan phases that match the music's journey
- Map audio to visuals - Decide which features drive which visual properties
- Implement and iterate - Code, render, watch, adjust, repeat
- Refine timing - Fine-tune phase boundaries and transitions
- Optimize performance - Balance quality and render time
This process is more art than science: it requires intuition, experimentation, and patience.
Demo Videos
See these effects in action:
- H9: Cuba to New Orleans - An 11-minute musical journey with through-heart passages
- I2: Five Hearts - Five hearts, each dancing to different audio features
- I3: Birthday Celebration - 11 hearts transform to 16 hearts with number displays
Conclusion
This project demonstrates how mathematics, signal processing, and visualization can combine to create something beautiful. The technical stack (NumPy for math, librosa for audio, Matplotlib for visualization, FFmpeg for video processing) is powerful yet accessible.
The real magic isn't in any single technology, but in how they work together:
- Mathematics provides the shape
- Audio analysis provides the rhythm
- Synchronization provides the connection
- Visualization provides the beauty
Whether you're interested in parametric equations, audio signal processing, or creative coding, there's something here to explore. The code is open source, the techniques are documented, and the possibilities are endless.
Next Steps:
- Explore the codebase: GitHub Repository
- Try creating your own effect
- Analyze your favorite music
- Make something beautiful
Created with Python, mathematics, and a passion for making music visible.