Swiftproxy - Residential Proxies

Posted on Jun 11

How to Scrape Spotify Playlist Data

Spotify’s catalog holds more than 100 million tracks. Imagine tapping into that vast ocean of data to power your next music app or analytics project. Sounds exciting, right? But how do you get that data?
You could dive into the Spotify API—your legal, go-to gateway. But sometimes, the API falls short, or you want data in a way it doesn’t support. That’s when web scraping steps in.
This guide will show you exactly how to scrape Spotify playlist data with Python—step-by-step. No fluff, just practical code, tools, and best practices you can use today.

Step 1: Get Your Tools Ready

Before jumping into the code, install three essential Python libraries, plus the lxml parser that the scraper in Step 4 passes to BeautifulSoup:

pip install beautifulsoup4 selenium requests lxml

Here’s what each does:

BeautifulSoup digs through static HTML to pull out track names, artists, and more.
Selenium handles dynamic content — think clicking buttons, scrolling pages, loading all those tracks that only appear after you scroll.
Requests is your lightweight tool for API calls or simple HTTP requests.

If your data lives on a static page, Requests and BeautifulSoup alone might do. But for Spotify’s dynamic playlists? You’ll want Selenium in your toolkit.

Step 2: Set Up Selenium’s WebDriver

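Selenium controls a real browser behind the scenes. Recent Selenium releases (4.6 and later) ship with Selenium Manager, which locates or downloads a matching driver automatically, so webdriver.Chrome() often works with no extra setup.
If you prefer to manage the driver yourself, download the ChromeDriver version that matches your Chrome, unzip it, save it somewhere handy, and point Selenium to it like this:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service

driver_path = "C:/webdriver/chromedriver.exe"  # Change to your driver path
# Selenium 4 takes the driver path through a Service object
driver = webdriver.Chrome(service=Service(driver_path))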
driver.get("https://open.spotify.com")

Run that and watch a browser window pop open — magic!

Step 3: Analyze the Spotify Playlist Page

Press F12 on your keyboard to open Chrome’s Developer Tools and inspect the HTML.
Look for the elements holding the info you want — tracks, artists, duration. You might find something like this:

<div class="tracklist-row">
  <span class="track-name">Song Title</span>
  <span class="artist-name">Artist</span>
  <span class="track-duration">3:45</span>
</div>

Keep note of these classes. They’re your data anchors.
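
To see how those class names turn into data, here is a minimal sketch that parses the illustrative markup above with BeautifulSoup. The class names (tracklist-row, track-name, and so on) are the placeholder ones from the sample, not Spotify's real ones, and the parse_rows helper is a name made up for this example. It uses Python's built-in html.parser so no extra parser is needed.

```python
from bs4 import BeautifulSoup

# Illustrative markup only: these are the placeholder classes from the sample above
SAMPLE_HTML = """
<div class="tracklist-row">
  <span class="track-name">Song Title</span>
  <span class="artist-name">Artist</span>
  <span class="track-duration">3:45</span>
</div>
"""


def parse_rows(html):
    """Return one dict per track row found in the given HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for row in soup.select("div.tracklist-row"):
        name = row.select_one(".track-name")
        artist = row.select_one(".artist-name")
        duration = row.select_one(".track-duration")
        # Skip incomplete rows instead of crashing on a missing element
        if name is None or artist is None or duration is None:
            continue
        rows.append({
            "track title": name.get_text(strip=True),
            "artist": artist.get_text(strip=True),
            "duration": duration.get_text(strip=True),
        })
    return rows


rows = parse_rows(SAMPLE_HTML)
print(rows)
```

Once the real class names from DevTools are swapped in, the same loop applies to the HTML that Selenium returns.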

Step 4: Write Your Scraper

Here’s a Python function combining Selenium and BeautifulSoup to scrape the playlist:

from selenium import webdriver
from bs4 import BeautifulSoup
import time

def get_spotify_playlist_data(playlist_url):
    options = webdriver.ChromeOptions()
    options.add_argument("--headless")  # Run without opening browser window
    driver = webdriver.Chrome(options=options)

    try:
        driver.get(playlist_url)
        time.sleep(5)  # Crude fixed wait for the page to render

        # Scroll to the bottom once to trigger lazy loading of more rows
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(2)  # Let content load

        html = driver.page_source
    finally:
        driver.quit()  # Always close the browser, even if loading fails

    soup = BeautifulSoup(html, "lxml")
    tracks = []

    for track in soup.find_all(class_="IjYxRc5luMiDPhKhZVUH UpiE7J6vPrJIa59qxts4"):
        # These generated class names change often; re-check them in DevTools
        name = track.find(class_="e-9541-text encore-text-body-medium encore-internal-color-text-base btE2c3IKaOXZ4VNAb8WQ standalone-ellipsis-one-line")
        artist_cell = track.find(class_="e-9541-text encore-text-body-small")
        artist = artist_cell.find("a") if artist_cell is not None else None
        duration = track.find(class_="e-9541-text encore-text-body-small encore-internal-color-text-subdued l5CmSxiQaap8rWOOpEpk")
        if name is None or artist is None or duration is None:
            continue  # Skip rows whose markup no longer matches

        tracks.append({"track title": name.text, "artist": artist.text, "duration": duration.text})

    return tracks

This script:

Opens the playlist page in a headless browser
Scrolls to the bottom once to trigger loading of more songs (very long playlists may need repeated scrolling)
Extracts song title, artist, and duration
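
A single scroll may not reveal every track on a long playlist. One common pattern is to keep scrolling until the page height stops growing. The sketch below assumes the window itself is the scrolling element, which may not hold for Spotify's player if it renders rows inside an inner container, so treat it as a starting point. The helper name scroll_until_stable and its parameters are hypothetical.

```python
import time


def scroll_until_stable(driver, pause=2.0, max_rounds=20):
    """Scroll to the bottom repeatedly until the page height stops changing.

    `driver` is any object exposing Selenium's execute_script method.
    Returns the number of scrolls performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for rounds in range(1, max_rounds + 1):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # Give lazily loaded rows time to appear
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            return rounds  # Nothing new loaded, so stop
        last_height = new_height
    return max_rounds  # Safety cap so the loop cannot run forever
```

Inside the scraper, a call such as scroll_until_stable(driver) could replace the single scroll and the fixed two-second sleep.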

Step 5: Activate Your Scraper

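Pass a Spotify playlist URL to the function and print your results. Note that the sample URL below points to an album page (its path contains /album/), so swap in your own playlist URL, whose path contains /playlist/: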

playlist_url = "https://open.spotify.com/album/7aJuG4TFXa2hmE4z1yxc3n?si=W7c1b1nNR3C7akuySGq_7g"
data = get_spotify_playlist_data(playlist_url)

for track in data:
    print(track)

You’ve got your playlist data.

Step 6: Using the Spotify API

If you want official access (and you should, to play by the rules), you’ll need an access token. Here’s how:

Register your app at the Spotify Developer Dashboard.
Get your Client ID and Client Secret.

Use this Python snippet to request your token:

import requests
import base64

CLIENT_ID = "your_client_id"
CLIENT_SECRET = "your_client_secret"

credentials = f"{CLIENT_ID}:{CLIENT_SECRET}"
encoded_credentials = base64.b64encode(credentials.encode()).decode()

url = "https://accounts.spotify.com/api/token"
headers = {
    "Authorization": f"Basic {encoded_credentials}",
    "Content-Type": "application/x-www-form-urlencoded"
}
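# Named payload so it does not overwrite the scraped `data` from Step 5
payload = {"grant_type": "client_credentials"}

response = requests.post(url, headers=headers, data=payload, timeout=10)
response.raise_for_status()  # Stop early on bad credentials or network errors
token = response.json().get("access_token")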

print("Access Token:", token)

With this token, you can query Spotify’s API directly — much faster and more reliable than scraping.
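
As a rough sketch of what such a query might look like, the code below requests a playlist's tracks page by page and reshapes them into the same dictionaries the scraper returns. The endpoint path and the field names (items, track, duration_ms, next) reflect the Spotify Web API as I understand it, so check them against the current API reference before relying on them. The function names and the injectable get_json parameter are assumptions made for this example.

```python
def parse_playlist_page(page):
    """Convert one page of a playlist-tracks response into simple dicts."""
    tracks = []
    for item in page.get("items", []):
        track = item.get("track")
        if not track:  # Entries without track data are skipped
            continue
        seconds = track["duration_ms"] // 1000
        tracks.append({
            "track title": track["name"],
            "artist": ", ".join(a["name"] for a in track["artists"]),
            "duration": f"{seconds // 60}:{seconds % 60:02d}",
        })
    return tracks


def fetch_playlist_tracks(playlist_id, token, get_json=None):
    """Follow the `next` links until every page has been collected."""
    if get_json is None:
        import requests  # Imported lazily so the parser works without it

        def get_json(url, headers):
            response = requests.get(url, headers=headers, timeout=10)
            response.raise_for_status()
            return response.json()

    url = f"https://api.spotify.com/v1/playlists/{playlist_id}/tracks"
    headers = {"Authorization": f"Bearer {token}"}
    tracks = []
    while url:
        page = get_json(url, headers)
        tracks.extend(parse_playlist_page(page))
        url = page.get("next")  # None on the last page
    return tracks
```

With the token from the snippet above, fetch_playlist_tracks("your_playlist_id", token) would return a list ready for Step 7. Passing get_json explicitly makes it possible to exercise the pagination logic without touching the network.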

Step 7: Save Your Data

Once you’ve got the data, save it in JSON for easy analysis:

import json

with open('tracks.json', 'w', encoding='utf-8') as json_file:
    json.dump(data, json_file, ensure_ascii=False, indent=4)
    print("Data saved to tracks.json")
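
If you plan to open the results in a spreadsheet, CSV can be handier than JSON. The sketch below assumes each track dictionary has exactly the three keys produced by the scraper in Step 4. The helper name save_tracks_csv is made up for this example.

```python
import csv


def save_tracks_csv(tracks, path):
    """Write the list of track dicts to a CSV file and return the row count."""
    fieldnames = ["track title", "artist", "duration"]
    # newline="" lets the csv module control line endings on every platform
    with open(path, "w", newline="", encoding="utf-8") as csv_file:
        writer = csv.DictWriter(csv_file, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(tracks)
    return len(tracks)
```

Calling save_tracks_csv(data, "tracks.csv") writes one header row followed by one row per track.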

Pro Tips and Best Practices

Use the API whenever possible. It’s legal and stable.
Throttle your requests — don’t hammer Spotify’s servers. Respect rate limits.
Check the site’s robots.txt to see what’s allowed.
Use proxies if you’re running large scrapes to avoid IP bans.
Always handle errors gracefully — pages change and so do classes.
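
The throttling and error-handling tips can be combined in one small helper. This is a minimal sketch, not a complete solution: it retries a failing call with exponentially growing pauses plus a little random jitter. The name call_with_retries and its parameters are hypothetical.

```python
import random
import time


def call_with_retries(func, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call func(); on failure, wait with exponential backoff and try again.

    Re-raises the last exception once all retries are used up.
    """
    for attempt in range(retries + 1):
        try:
            return func()
        except Exception as exc:
            if attempt == retries:
                raise  # Out of retries: let the caller see the error
            # Waits of roughly 1s, 2s, 4s... plus up to 0.5s of jitter
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
            print(f"Attempt {attempt + 1} failed ({exc}); retrying in {delay:.1f}s")
            sleep(delay)
```

For example, call_with_retries(lambda: get_spotify_playlist_data(playlist_url)) would rerun the scraper a few times before giving up, with a growing pause between attempts.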

Final Thoughts

Scraping Spotify playlist data with Python isn’t rocket science, but it does require attention to detail and respect for the platform’s rules. By combining Selenium’s ability to interact with dynamic content and BeautifulSoup’s strength in parsing HTML, it’s possible to extract valuable music data efficiently. Don’t forget the Spotify API either—it’s a reliable and ethical tool for robust data gathering when available.
