Kunal Chakraborty
# Building an Emotion-Based Music Recommendation System: When AI Meets Your Mood

Creating a music player is straightforward; building an AI that reads your face and curates the perfect playlist is a whole different challenge. For my latest project, I ventured into the fascinating intersection of computer vision and music technology, creating a system that analyzes your facial expressions and recommends Spotify tracks to match your mood.

While the sleek interface and instant recommendations are what users experience, the real magic happens behind the scenes—training emotion detection models, integrating with Spotify's API, and building a Flask backend that ties it all together. Here's how I brought emotions and algorithms into harmony.

## The Tech Stack: AI Meets Music

To build a system that truly understands emotions and delivers perfect music matches, I combined cutting-edge ML libraries with production-ready APIs:

Machine Learning & Computer Vision:

  • OpenCV: The powerhouse for real-time face detection and video stream processing. Captures frames from the webcam and preprocesses them for the emotion detection model.
  • TensorFlow/Keras: Built and trained a Convolutional Neural Network (CNN) on the FER-2013 dataset to classify seven emotions: Happy, Sad, Angry, Surprised, Fearful, Disgusted, and Neutral.
  • DeepFace: Initially experimented with this pre-trained model for quick prototyping before fine-tuning my own model for better accuracy.

Backend & API:

  • Flask: A lightweight Python framework perfect for serving the ML model and handling API requests. Built RESTful endpoints for emotion detection and music recommendations.
  • Spotify Web API: The gateway to millions of songs. Used OAuth 2.0 authentication and accessed audio features (valence, energy, tempo) to match songs with detected emotions.
  • Spotipy: A Python wrapper that made interacting with Spotify's API incredibly simple—from searching tracks to creating custom playlists.

Frontend:

  • HTML/CSS/JavaScript: A clean, responsive interface with real-time webcam preview and dynamic playlist rendering.
  • Chart.js: Visualized emotion probabilities and song audio features in beautiful, interactive charts.

## The Core Features: From Face to Playlist

### Real-Time Emotion Detection

The heart of the system is the emotion detection pipeline. Here's how it works:

  1. Capture: OpenCV accesses the webcam and captures frames at 30 FPS
  2. Detect: Haar Cascade classifier detects faces in each frame
  3. Preprocess: Crop the face, convert to grayscale, resize to 48x48 pixels
  4. Predict: Feed the preprocessed image into the CNN model
  5. Classify: Model outputs probabilities for all seven emotions
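Steps 3–5 can be sketched in a few lines. Here is a pure-NumPy stand-in for the crop/grayscale/resize stage (in the real pipeline `cv2.cvtColor` and `cv2.resize` do this work, and `box` would come from the Haar Cascade detector):

```python
import numpy as np

def preprocess_face(frame, box, size=48):
    """Crop the detected face, convert it to grayscale, and resize it
    to the 48x48 input the CNN expects. `box` is the (x, y, w, h)
    rectangle a Haar Cascade returns; `frame` is an H x W x 3 BGR image."""
    x, y, w, h = box
    face = frame[y:y + h, x:x + w]
    # Luminance-weighted grayscale (the same weights cv2 uses for BGR)
    gray = face @ np.array([0.114, 0.587, 0.299])
    # Nearest-neighbour resize, standing in for cv2.resize
    rows = (np.arange(size) * gray.shape[0] / size).astype(int)
    cols = (np.arange(size) * gray.shape[1] / size).astype(int)
    resized = gray[rows][:, cols]
    # Scale to [0, 1] and add batch/channel axes for Keras
    return (resized / 255.0).reshape(1, size, size, 1)

frame = np.random.randint(0, 256, (480, 640, 3))
batch = preprocess_face(frame, (100, 50, 200, 200))
print(batch.shape)  # (1, 48, 48, 1)
```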

The Challenge: Real-time processing without lag. The solution? Efficient frame skipping and model optimization:

```python
# Process every 3rd frame for better performance
frame_count = 0
current_emotion = "neutral"
while cap.isOpened():            # cap = cv2.VideoCapture(0)
    ret, frame = cap.read()
    if not ret:
        break
    if frame_count % 3 == 0:
        current_emotion = detect_emotion(frame)
    frame_count += 1
```

### Smart Music Mapping

Detecting emotions is only half the battle. The real magic is translating emotions into music that feels right.

The Algorithm: I created an emotion-to-audio-feature mapping based on Spotify's audio analysis:

```python
emotion_mapping = {
    'happy': {'valence': (0.6, 1.0), 'energy': (0.5, 1.0), 'tempo': (100, 180)},
    'sad':   {'valence': (0.0, 0.4), 'energy': (0.0, 0.4), 'tempo': (60, 100)},
    'angry': {'valence': (0.0, 0.5), 'energy': (0.7, 1.0), 'tempo': (120, 200)},
    'calm':  {'valence': (0.4, 0.7), 'energy': (0.2, 0.5), 'tempo': (60, 90)},
    # ... more mappings
}
```

The Result: When you're sad, you get melancholic acoustic tracks with low energy. When you're happy, upbeat pop songs with high valence flood your screen.

### Personalized Recommendations

The system doesn't just pick random songs—it learns from your Spotify listening history:

```python
# sp is an authenticated spotipy.Spotify client

# Get user's top tracks for personalization
top_tracks = sp.current_user_top_tracks(limit=20, time_range='short_term')
top_genres = extract_genres(top_tracks)

# Combine emotion-based features with user preferences
recommendations = sp.recommendations(
    seed_genres=top_genres,
    target_valence=emotion_valence,
    target_energy=emotion_energy,
    limit=10
)
```

## Training the Emotion Detection Model

Building an accurate emotion classifier was the most challenging—and rewarding—part of this project.

### The Dataset Journey

I started with the FER-2013 dataset (35,000+ facial images labeled with emotions). But raw data is messy:

  • Inconsistent lighting conditions
  • Mislabeled images
  • Class imbalance (way more "happy" faces than "disgusted")

The Solution: Data augmentation and class weighting:

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    horizontal_flip=True,
    zoom_range=0.2
)

# Balance classes with weighted loss
class_weights = compute_class_weight('balanced',
                                     classes=np.unique(y_train),
                                     y=y_train)
```
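The `'balanced'` weighting has a simple closed form — each class gets `n_samples / (n_classes * count_c)` — which is easy to verify on a toy label set. A standalone NumPy sketch of what `compute_class_weight` returns:

```python
import numpy as np

def balanced_class_weights(y):
    """What compute_class_weight('balanced', ...) computes:
    weight_c = n_samples / (n_classes * count_c)."""
    classes, counts = np.unique(y, return_counts=True)
    weights = len(y) / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))

# Toy labels: six "happy" (0) vs two "disgusted" (1)
y = np.array([0, 0, 0, 0, 0, 0, 1, 1])
weights = balanced_class_weights(y)
print(weights[1])  # 2.0 -- the minority class is upweighted
```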

### The CNN Architecture

After experimenting with different architectures, I settled on a custom CNN:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Conv2D, BatchNormalization,
                                     MaxPooling2D, Dropout, Flatten, Dense)

model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(48, 48, 1)),
    BatchNormalization(),
    MaxPooling2D(2, 2),
    Dropout(0.25),

    Conv2D(64, (3, 3), activation='relu'),
    BatchNormalization(),
    MaxPooling2D(2, 2),
    Dropout(0.25),

    Conv2D(128, (3, 3), activation='relu'),
    BatchNormalization(),
    MaxPooling2D(2, 2),
    Dropout(0.4),

    Flatten(),
    Dense(256, activation='relu'),
    Dropout(0.5),
    Dense(7, activation='softmax')  # 7 emotions
])
```

The Results: After 50 epochs, the model reached 67% validation accuracy—not perfect, but solid for real-world use.

## Professional Workflow: From Notebook to Production

Moving from a Jupyter notebook to a production Flask app taught me invaluable lessons about ML engineering.

### 1. Model Optimization & Serialization

Trained models are large (40+ MB). I learned to optimize and save them properly:

```python
from tensorflow.keras.models import load_model

# Save the model in HDF5 format
model.save('emotion_model.h5')

# In production, load once at startup
emotion_model = load_model('emotion_model.h5')
```

The Lesson: Loading the model on every request killed performance. Load it once when the Flask app starts.
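A cached loader is one minimal way to enforce the load-once rule. The sketch below uses a throwaway stand-in for the real `load_model` call, just to show the pattern:

```python
from functools import lru_cache

@lru_cache(maxsize=1)
def get_model():
    """Runs the expensive load exactly once; every later call returns
    the cached object. The body is a stand-in -- in the real app it
    would be keras.models.load_model('emotion_model.h5')."""
    return {"name": "emotion_model"}  # pretend this took seconds

first = get_model()
second = get_model()
print(first is second)  # True -- the model was only loaded once
```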

### 2. API Rate Limiting & Caching

Spotify's Web API enforces rate limits, so I implemented caching to avoid hitting them:

```python
from functools import lru_cache

@lru_cache(maxsize=128)
def get_recommendations(emotion, user_id):
    # Memoized per (emotion, user_id) pair; note that lru_cache
    # evicts by recency, not after a fixed time
    return fetch_spotify_recommendations(emotion)
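Worth noting: `functools.lru_cache` evicts by recency, not by age, so on its own it never expires entries after five minutes. A time-bucketed cache key is one lightweight way to approximate a TTL — a sketch with a hypothetical stand-in fetch function:

```python
import time
from functools import lru_cache

TTL_SECONDS = 300  # refresh recommendations every ~5 minutes
calls = {"count": 0}

def fetch_spotify_recommendations(emotion):
    """Hypothetical stand-in for the real Spotify call."""
    calls["count"] += 1
    return (f"{emotion}-track-1", f"{emotion}-track-2")

@lru_cache(maxsize=128)
def _cached(emotion, user_id, bucket):
    # `bucket` changes every TTL_SECONDS, so old entries stop matching
    return fetch_spotify_recommendations(emotion)

def get_recommendations(emotion, user_id):
    return _cached(emotion, user_id, int(time.time() // TTL_SECONDS))

first = get_recommendations("happy", "user-1")
second = get_recommendations("happy", "user-1")  # cache hit
print(calls["count"])  # 1 -- only one real fetch happened
```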

### 3. Error Handling for Real Users

In notebooks, errors crash the cell. In production, they crash the user's experience. I built comprehensive error handling:

```python
import logging
from flask import jsonify
from spotipy.exceptions import SpotifyException

logger = logging.getLogger(__name__)

try:
    emotion = detect_emotion(frame)
except Exception as e:
    logger.error(f"Emotion detection failed: {e}")
    emotion = "neutral"  # Fall back to neutral

try:
    tracks = get_spotify_tracks(emotion)
except SpotifyException:
    return jsonify({"error": "Spotify service unavailable"}), 503
```

### 4. Environment Variables & Secrets

Never expose API keys. I used environment variables for all sensitive data:

```python
import os
from dotenv import load_dotenv

load_dotenv()

SPOTIFY_CLIENT_ID = os.getenv('SPOTIFY_CLIENT_ID')
SPOTIFY_CLIENT_SECRET = os.getenv('SPOTIFY_CLIENT_SECRET')
```

## Deployment Challenges & Solutions

### Challenge 1: Large Model Files

Deploying a 40MB model to Heroku exceeded slug size limits.

Solution: Switched to AWS EC2 with more storage, and optimized the model using TensorFlow Lite for a 60% size reduction.

### Challenge 2: Webcam Access in Browser

Modern browsers require HTTPS for webcam access.

Solution: Set up SSL certificates using Let's Encrypt and configured Flask to serve over HTTPS.
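For reference, Flask's `app.run` accepts an `ssl_context` argument. A minimal sketch, with the certificate paths Let's Encrypt would issue for a hypothetical domain left commented out so the snippet runs without certificates:

```python
import ssl

# TLS context for Flask's app.run(ssl_context=...)
context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
# Hypothetical Let's Encrypt paths for an assumed domain -- uncomment
# once certbot has issued the certificate:
# context.load_cert_chain(
#     '/etc/letsencrypt/live/example.com/fullchain.pem',
#     '/etc/letsencrypt/live/example.com/privkey.pem')
# app.run(host='0.0.0.0', port=443, ssl_context=context)
print(isinstance(context, ssl.SSLContext))  # True
```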

### Challenge 3: Cold Start Latency

First request took 10+ seconds as the model loaded.

Solution: Implemented a "warmup" endpoint that loads the model on server startup and keeps it in memory.

## Lessons Learned

What Worked:

  • Transfer Learning: Using pre-trained face detection models (Haar Cascades) saved weeks of training time.
  • Spotify's Audio Features: Valence and energy metrics are surprisingly accurate for emotion-music matching.
  • Flask's Simplicity: For ML projects, Flask's lightweight nature beats Django's complexity.

What I'd Do Differently:

  • Start with Pre-trained Models: I spent weeks training from scratch. Fine-tuning a pre-trained model (like VGGFace) would have been faster.
  • Add User Feedback Loop: Currently, the system doesn't learn if recommendations are accurate. A "thumbs up/down" feature would improve over time.
  • Better Lighting Handling: The model struggles in low light. Adding brightness normalization would help.
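On the lighting point, histogram equalization is one cheap normalization step that could be slotted into preprocessing before inference. Here is what `cv2.equalizeHist` does, written out in NumPy:

```python
import numpy as np

def equalize(gray):
    """Histogram-equalise an 8-bit grayscale face crop so dim and
    bright rooms produce similar contrast (what cv2.equalizeHist
    does, written out in NumPy)."""
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = np.ma.masked_equal(hist.cumsum(), 0)
    lut = (cdf - cdf.min()) * 255 / (cdf.max() - cdf.min())
    return np.ma.filled(lut, 0).astype(np.uint8)[gray]

# An under-exposed crop: every pixel below 64
dim = np.random.randint(0, 64, (48, 48)).astype(np.uint8)
out = equalize(dim)
print(out.max())  # 255 -- contrast stretched to the full range
```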

## Future Enhancements: Taking It to the Next Level

The current version delivers solid emotion-based recommendations, but there's so much more potential. Here's what's on the roadmap:

### 1. Multi-Person Emotion Detection

Right now, it detects one face. Imagine a party mode where it analyzes everyone's emotions and creates a playlist that fits the room's vibe:

  • Detect multiple faces simultaneously
  • Average emotion scores across all faces
  • Weight recommendations toward the dominant emotion
  • Create collaborative playlists that satisfy the group
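The party-mode averaging could be as simple as taking the mean of the per-face softmax outputs — a sketch, with the seven-emotion label order assumed:

```python
import numpy as np

EMOTIONS = ['happy', 'sad', 'angry', 'surprised',
            'fearful', 'disgusted', 'neutral']

def room_mood(per_face_probs):
    """Average each face's softmax output and return the dominant
    emotion for the whole room."""
    mean = np.mean(per_face_probs, axis=0)
    return EMOTIONS[int(np.argmax(mean))], mean

faces = np.array([
    [0.7, 0.0, 0.0, 0.1, 0.0, 0.0, 0.2],  # guest 1: mostly happy
    [0.5, 0.1, 0.0, 0.2, 0.0, 0.0, 0.2],  # guest 2: happy-ish
    [0.1, 0.1, 0.0, 0.1, 0.0, 0.0, 0.7],  # guest 3: neutral
])
mood, scores = room_mood(faces)
print(mood)  # happy
```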

### 2. Voice Tone Analysis

Facial expressions are only part of the story. Voice carries emotional cues too:

  • Integrate speech emotion recognition using librosa and PyAudio
  • Analyze pitch, tone, and speaking rate
  • Combine facial and vocal emotions for more accurate detection
  • Especially useful for users on phone calls or podcasts

### 3. Emotion History & Trends

Track emotional patterns over time:

  • Build a "mood journal" showing emotion trends across days/weeks
  • Identify triggers (time of day, day of week) for different emotions
  • Recommend music proactively: "You're usually stressed on Monday mornings. Here's a calming playlist."
  • Visualize emotional journeys with interactive graphs

### 4. Integration with Smart Home

Turn the system into an ambient mood manager:

  • Connect with Philips Hue to adjust lighting based on emotion
  • Integrate with smart speakers for hands-free music control
  • Create "mood scenes": Sad = dim lights + melancholic music

### 5. Mobile App

Currently web-based, but a mobile app would be more practical:

  • React Native or Flutter for cross-platform development
  • On-device ML inference using TensorFlow Lite (faster, more private)
  • Background emotion tracking with periodic check-ins
  • Push notifications: "Feeling stressed? Try this playlist."

## Final Reflections

Building an emotion-based music recommendation system taught me that AI isn't just about algorithms—it's about understanding human experiences and translating them into meaningful interactions. The project challenged me to:

  • Bridge the gap between ML research and production engineering
  • Design systems that feel intuitive, not intrusive
  • Balance accuracy with real-time performance
  • Think about privacy (webcam data never leaves the device)

> "Technology should adapt to humans, not the other way around. If your music player doesn't understand your mood, is it really smart?"

This project transformed me from someone who "knows Python" to someone who can build end-to-end AI applications that solve real problems. And with voice analysis, multi-person detection, and smart home integration on the horizon, this is just the beginning of the journey.


## Tech Stack Summary

  • ML/CV: TensorFlow, Keras, OpenCV, DeepFace
  • Backend: Flask, Python, Spotipy
  • APIs: Spotify Web API (OAuth 2.0)
  • Frontend: HTML, CSS, JavaScript, Chart.js
  • Deployment: AWS EC2, Let's Encrypt SSL
  • Future: Speech Recognition, TensorFlow Lite, IoT Integration

Link: under development.


What emotions should music apps detect? How would you improve this system? Share your thoughts in the comments below!
