DEV Community

Scrape Spotify Playlist Data for Powerful Analytics

Spotify streams over 400 billion hours of music annually. That’s an ocean of data, waiting to be explored. Imagine mining that treasure—grabbing playlists, artists, track details—automatically, with a few lines of Python. Powerful, right?
If you want to build music analytics tools, create smart playlists, or fuel your app with fresh music data, scraping Spotify playlists is a game-changer. This guide cuts through the noise and gets you scraping legally and efficiently.

The Toolbox You Need and Why It Matters

First step? Grab the right Python libraries. Install them with:

pip install beautifulsoup4 selenium requests lxml

Why these three?
BeautifulSoup dives deep into HTML to snatch data from static pages.
Selenium handles the tricky stuff — dynamic sites that load content as you scroll or interact.
Requests is your fast lane for talking to APIs without opening a browser.

Get Selenium Ready and Meet ChromeDriver

Selenium doesn’t work alone. It needs a browser driver — and for Chrome, that’s ChromeDriver.
How to set it up:
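Download ChromeDriver from its official site, matching your Chrome version. (Selenium 4.6 and later ship with Selenium Manager, which can usually locate or download a matching driver for you, so the manual download is mainly needed on older setups.)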
Extract it, note the path.
Plug that path into your script.
Test it quickly:

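from selenium import webdriver
from selenium.webdriver.chrome.service import Service

driver_path = "C:/webdriver/chromedriver.exe"  # Adjust to your setup
# Selenium 4 expects the driver path wrapped in a Service object
driver = webdriver.Chrome(service=Service(driver_path))
driver.get("https://google.com")
print("Browser launched, you're set!")
driver.quit()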

If Chrome opens and navigates to Google, you're ready.

Scraping Spotify Playlist Data

Spotify’s playlist pages load songs dynamically. That means you must scroll to load every track before scraping.
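A minimal sketch of that scrolling step, assuming the page grows taller as more tracks load (pages that scroll an inner container would need a different script). The helper name `scroll_until_stable` and its `pause` and `max_rounds` parameters are made up for illustration; `driver` is any object exposing Selenium's `execute_script` method.

```python
import time


def scroll_until_stable(driver, pause=2.0, max_rounds=30):
    """Scroll to the bottom repeatedly until the page height stops growing.

    Returns the number of scrolls performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    rounds = 0
    while rounds < max_rounds:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        rounds += 1
        time.sleep(pause)  # Give lazy-loaded rows time to render
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # Nothing new appeared, so assume the list is complete
        last_height = new_height
    return rounds
```

A loop like this can stand in for a single `scrollTo` call in a scraper. The `max_rounds` cap keeps it from running forever on pages that never stop growing.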
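Inspect the page (F12) and look for a repeating row structure. Simplified, it looks like this (the real class names are auto-generated strings, as the scraper further down shows):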

<div class="tracklist-row">
    <span class="track-name">Song Title</span>
    <span class="artist-name">Artist Name</span>
    <span class="track-duration">3:45</span>
</div>

Your mission? Extract all those song titles, artists, and durations.
Here’s a Python function that does exactly that:

from selenium import webdriver
from bs4 import BeautifulSoup
import time

def get_spotify_playlist_data(playlist_url):
    options = webdriver.ChromeOptions()
    options.add_argument("--headless")  # No browser window needed
    driver = webdriver.Chrome(options=options)

    try:
        driver.get(playlist_url)
        time.sleep(5)  # Crude wait for the initial render

        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(2)  # Wait for more content to load

        html = driver.page_source
    finally:
        driver.quit()  # Always close the browser, even if loading fails

    soup = BeautifulSoup(html, "lxml")
    tracks = []

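    # Spotify's class names are auto-generated and can change without notice;
    # re-check them in DevTools if this returns an empty list.
    for track in soup.find_all(class_="IjYxRc5luMiDPhKhZVUH UpiE7J6vPrJIa59qxts4"):
        name = track.find(
            class_="e-9541-text encore-text-body-medium encore-internal-color-text-base btE2c3IKaOXZ4VNAb8WQ standalone-ellipsis-one-line"
        )
        artist_cell = track.find(class_="e-9541-text encore-text-body-small")
        artist = artist_cell.find("a") if artist_cell is not None else None
        duration = track.find(
            class_="e-9541-text encore-text-body-small encore-internal-color-text-subdued l5CmSxiQaap8rWOOpEpk"
        )
        if name is None or artist is None or duration is None:
            continue  # Skip rows that don't match the expected layout

        tracks.append(
            {"track title": name.text, "artist": artist.text, "duration": duration.text}
        )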

    return tracks

Call it like this (the example URL below happens to point to an album page; a playlist URL is passed the same way):

playlist_url = "https://open.spotify.com/album/7aJuG4TFXa2hmE4z1yxc3n?si=W7c1b1nNR3C7akuySGq_7g"
data = get_spotify_playlist_data(playlist_url)

for track in data:
    print(track)

You have a clean, structured list of tracks ready to use.
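For analytics, the duration strings are easier to work with as numbers. Here is a small sketch, assuming durations arrive as "M:SS" or "H:MM:SS" text like the "3:45" shown earlier. The helper names and the sample rows are invented for illustration.

```python
def duration_to_seconds(text):
    """Convert 'M:SS' or 'H:MM:SS' into a number of seconds."""
    seconds = 0
    for part in text.strip().split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def total_runtime(tracks):
    """Sum the 'duration' field of every track dict, in seconds."""
    return sum(duration_to_seconds(t["duration"]) for t in tracks)


# Made-up rows in the same shape the scraper returns
sample = [
    {"track title": "Song Title", "artist": "Artist Name", "duration": "3:45"},
    {"track title": "Another Song", "artist": "Artist Name", "duration": "1:02:03"},
]
print(total_runtime(sample))  # 225 + 3723 = 3948
```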

Use the Spotify API for Speed and Legality

If you want cleaner data and faster access, the Spotify API is your best friend.

Step 1: Register Your App

Head over to the Spotify Developer Dashboard, sign in, and create an app. You’ll get a Client ID and Client Secret. Keep those safe.

Step 2: Get Your Access Token

import requests
import base64

CLIENT_ID = "your_client_id"
CLIENT_SECRET = "your_client_secret"

credentials = f"{CLIENT_ID}:{CLIENT_SECRET}"
encoded_credentials = base64.b64encode(credentials.encode()).decode()

url = "https://accounts.spotify.com/api/token"
headers = {
    "Authorization": f"Basic {encoded_credentials}",
    "Content-Type": "application/x-www-form-urlencoded"
}
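payload = {"grant_type": "client_credentials"}  # Not named `data`, so the scraped track list isn't overwritten

response = requests.post(url, headers=headers, data=payload)
response.raise_for_status()  # Fail loudly on bad credentials
token = response.json().get("access_token")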

print("Access Token:", token)
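The token endpoint's JSON also carries an `expires_in` value in seconds, so a long-running script will eventually need a fresh token. One possible way to handle that is a tiny cache, sketched below. `TokenCache`, `fetch_token`, `margin`, and `clock` are hypothetical names, and `fetch_token` is assumed to be any callable that performs the POST above and returns the parsed JSON as a dict.

```python
import time


class TokenCache:
    """Reuse an access token until shortly before it expires."""

    def __init__(self, fetch_token, margin=60, clock=time.monotonic):
        self._fetch_token = fetch_token  # Callable returning the token JSON as a dict
        self._margin = margin            # Refresh this many seconds early
        self._clock = clock              # Injectable so the cache can be tested
        self._token = None
        self._expires_at = 0.0

    def get(self):
        now = self._clock()
        if self._token is None or now >= self._expires_at:
            payload = self._fetch_token()
            self._token = payload["access_token"]
            self._expires_at = now + payload["expires_in"] - self._margin
        return self._token
```

With something like this in place, each API call can ask for `cache.get()` instead of holding on to one token for the whole run.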

Step 3: Fetch Artist or Playlist Data

artist_id = "6qqNVTkY8uBg9cP3Jd7DAH"  # Billie Eilish’s Spotify ID
url = f"https://api.spotify.com/v1/artists/{artist_id}"
headers = {"Authorization": f"Bearer {token}"}

response = requests.get(url, headers=headers)
response.raise_for_status()  # Surfaces expired tokens (401) and rate limiting (429)
artist_data = response.json()

print(artist_data)
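The example above fetches an artist. For playlists, the Web API exposes `/v1/playlists/{playlist_id}/tracks`, which returns results page by page with a `next` link. The sketch below follows those links, assuming your app is allowed to read the playlist in question. The helper names are made up, and `get_json` is a hypothetical callable that takes a URL and returns parsed JSON, which keeps the paging logic independent of the HTTP library.

```python
from urllib.parse import urlparse


def playlist_id_from_url(url):
    """Pull the ID out of a URL like https://open.spotify.com/playlist/<id>?si=..."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if "playlist" not in parts or parts.index("playlist") + 1 >= len(parts):
        raise ValueError(f"Not a playlist URL: {url}")
    return parts[parts.index("playlist") + 1]


def fetch_playlist_tracks(playlist_id, get_json):
    """Follow the paging object's `next` links and flatten the tracks."""
    url = f"https://api.spotify.com/v1/playlists/{playlist_id}/tracks"
    tracks = []
    while url:
        page = get_json(url)
        for item in page["items"]:
            track = item.get("track")
            if not track:
                continue  # Unavailable entries can come back without track data
            tracks.append({
                "track title": track["name"],
                "artist": ", ".join(a["name"] for a in track["artists"]),
                "duration_ms": track["duration_ms"],
            })
        url = page.get("next")  # None on the last page ends the loop
    return tracks


def make_get_json(token):
    """Build a `get_json` backed by requests and the bearer token from Step 2."""
    import requests  # Imported here so the helpers above need no third-party package

    def get_json(url):
        response = requests.get(
            url, headers={"Authorization": f"Bearer {token}"}, timeout=10
        )
        response.raise_for_status()
        return response.json()

    return get_json
```

Usage would look like `fetch_playlist_tracks(playlist_id_from_url(some_url), make_get_json(token))`, where `some_url` is a playlist link you supply.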

Save Data Like a Pro

JSON is the go-to format for saving scraped data. Here, data is the track list returned by get_spotify_playlist_data:

import json

with open('tracks.json', 'w', encoding='utf-8') as f:
    json.dump(data, f, ensure_ascii=False, indent=4)
print("Data saved to tracks.json")  # Printed after the file has been closed
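If the data is headed for a spreadsheet rather than another program, CSV may be the handier target. A minimal sketch with the standard library, assuming each track is a dict with the same three keys the scraper produces (`save_tracks_csv` is a made-up name):

```python
import csv


def save_tracks_csv(tracks, path):
    """Write a list of track dicts to a CSV file with a header row."""
    fieldnames = ["track title", "artist", "duration"]
    # newline="" stops the csv module from adding blank lines on Windows
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(tracks)
```

Calling `save_tracks_csv(data, "tracks.csv")` would then sit alongside the JSON export.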

How to Scrape Spotify Ethically and Effectively

Scraping can be a powerful tool — but only if you use it responsibly.
Prefer the official Spotify API to avoid headaches.
Respect rate limits to prevent server overload.
Always check the website’s robots.txt for permissions.
Use proxies wisely to avoid blocks.
Never scrape personal or restricted data.
Doing this keeps your project smooth, legal, and future-proof.
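Two of the points above lend themselves to a short sketch: consulting robots.txt and backing off when the server says you are going too fast. The helper names are invented for illustration. The robots.txt text is passed in as a string so the check runs offline, and the retry helper assumes the `Retry-After` header holds a number of seconds, which is how the Spotify API reports it on HTTP 429.

```python
import time
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, url, user_agent="*"):
    """Check a URL against robots.txt rules supplied as text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def get_with_backoff(send, max_attempts=3, sleep=time.sleep):
    """Call `send()` and retry after the requested delay on HTTP 429.

    `send` is any zero-argument callable returning an object with
    `status_code` and `headers`, such as a requests response.
    """
    response = send()
    attempts = 1
    while response.status_code == 429 and attempts < max_attempts:
        sleep(int(response.headers.get("Retry-After", 1)))
        response = send()
        attempts += 1
    return response
```

In a real run, `RobotFileParser` can also load the live file through its `set_url` and `read` methods, and `send` could be a lambda wrapping a `requests.get` call.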

Conclusion

BeautifulSoup is your go-to for static content, while Selenium handles dynamic, unpredictable pages. Spotify’s API offers the safest and fastest access. By practicing ethical scraping, you protect yourself from blocks and bans. With these tools and strategies, the vast world of Spotify music data is ready for you to explore.
