Introduction
Music streaming platforms generate massive amounts of data — from chart rankings to artist statistics. Whether you're building a music analytics dashboard, tracking emerging artists, or analyzing genre trends, scraping streaming data opens up powerful insights.
In this tutorial, we'll build a Python scraper that collects Spotify chart data and artist statistics from publicly available sources.
Setting Up the Environment
# HTTP client, HTML parser, and tabular analysis are the core scraping stack.
import requests
from bs4 import BeautifulSoup
import pandas as pd
import json
import time
# Configuration for a scraping API used to fetch JavaScript-rendered pages
# through rotating proxies; obtain a key from your chosen provider.
SCRAPER_API_KEY = "your_key_here"  # TODO: replace with a real API key before running
BASE_URL = "http://api.scraperapi.com"  # every request below is proxied through this endpoint
Scraping Spotify Charts Data
Spotify's public chart pages display top tracks by country and globally. Let's build a scraper for these:
def scrape_spotify_charts(country="global"):
    """Scrape the latest weekly top-tracks chart for one region.

    Args:
        country: Spotify region code (e.g. "global", "us", "gb").

    Returns:
        List of dicts with keys "rank" (int), "title", "artist", "streams".

    Raises:
        requests.HTTPError: if the proxied request returns an error status.
    """
    url = f"https://charts.spotify.com/charts/view/regional-{country}-weekly/latest"
    params = {
        "api_key": SCRAPER_API_KEY,
        "url": url,
        "render": "true",  # chart rows are injected client-side, so JS rendering is required
    }
    # Bound the network call: a hung proxy request would otherwise block forever.
    response = requests.get(BASE_URL, params=params, timeout=60)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    tracks = []
    for row in soup.select("tr[data-testid]"):
        rank = row.select_one(".rank")
        title = row.select_one(".track-name")
        artist = row.select_one(".artist-name")
        streams = row.select_one(".streams")
        # Skip malformed rows rather than crash on a None .text access.
        if not (rank and title and artist and streams):
            continue
        tracks.append({
            "rank": int(rank.text.strip()),
            "title": title.text.strip(),
            "artist": artist.text.strip(),
            "streams": streams.text.strip(),
        })
    return tracks
Collecting Artist Statistics
Beyond charts, artist profile pages contain monthly listeners, follower counts, and popular tracks:
def scrape_artist_stats(artist_id):
    """Collect public statistics from one artist profile page.

    Args:
        artist_id: Spotify artist ID (the token after /artist/ in the URL).

    Returns:
        Dict with the artist id plus monthly listeners, top tracks and
        related artists as parsed by the extract_* helpers.

    Raises:
        requests.HTTPError: if the proxied request returns an error status.
    """
    url = f"https://open.spotify.com/artist/{artist_id}"
    # Rotating/residential proxies (via the scraping API) reduce rate limiting
    # on profile pages.
    params = {
        "api_key": SCRAPER_API_KEY,
        "url": url,
        "render": "true",
    }
    response = requests.get(BASE_URL, params=params, timeout=60)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    return {
        "artist_id": artist_id,
        "monthly_listeners": extract_listeners(soup),
        "top_tracks": extract_top_tracks(soup),
        "related_artists": extract_related(soup),
    }
def extract_listeners(soup):
    """Parse the monthly-listener count from an artist page.

    Args:
        soup: parsed artist-page document (anything exposing .select_one).

    Returns:
        The listener count as an int, or None when the element is missing
        or its text is not a plain number (e.g. after a markup change or
        an abbreviated count like "1.2M").
    """
    listener_elem = soup.select_one("[data-testid=monthly-listeners]")
    if listener_elem is None:
        return None
    text = listener_elem.text.replace(",", "").replace(" monthly listeners", "")
    try:
        return int(text)
    except ValueError:
        # Fail soft instead of crashing the whole artist scrape.
        return None
def extract_top_tracks(soup):
    """Extract the artist's top tracks together with their play counts.

    Rows missing either the track name or the play count are skipped.
    Returns a list of {"name": ..., "plays": ...} dicts.
    """
    results = []
    for row in soup.select("[data-testid=top-tracks] tr"):
        name_cell = row.select_one(".track-name")
        plays_cell = row.select_one(".play-count")
        if name_cell is None or plays_cell is None:
            continue
        results.append({
            "name": name_cell.text.strip(),
            "plays": plays_cell.text.strip(),
        })
    return results
Building a Trend Tracker
The real value comes from tracking changes over time:
def track_chart_movements(countries=None, interval_days=7):
    """Scrape the current charts for several countries and snapshot them to CSV.

    Args:
        countries: region codes to scrape; defaults to a global/US/EU/APAC mix.
        interval_days: intended re-scrape cadence in days.
            NOTE(review): currently unused inside the function — kept for
            interface compatibility; drive the cadence from an external
            scheduler (cron, Airflow, ...).

    Returns:
        DataFrame of all scraped rows (also written to charts_YYYYMMDD.csv).
    """
    if countries is None:
        countries = ["global", "us", "gb", "de", "jp", "br"]
    # One snapshot date for the whole run, hoisted out of the per-track loop.
    snapshot_date = pd.Timestamp.now().strftime("%Y-%m-%d")
    all_data = []
    for country in countries:
        print(f"Scraping charts for {country}...")
        tracks = scrape_spotify_charts(country)
        for track in tracks:
            track["country"] = country
            track["date"] = snapshot_date
            all_data.append(track)
        time.sleep(2)  # Respect rate limits between countries
    df = pd.DataFrame(all_data)
    df.to_csv(f"charts_{pd.Timestamp.now().strftime('%Y%m%d')}.csv", index=False)
    return df
def analyze_trends(historical_dir="./data"):
    """Summarise chart movement from saved charts_*.csv snapshots.

    Args:
        historical_dir: directory containing charts_YYYYMMDD.csv files as
            produced by track_chart_movements().

    Returns:
        DataFrame of the 20 biggest climbers: one row per (title, artist)
        with best_rank, worst_rank, appearances and climb (rank delta).
        An empty DataFrame when no snapshot files exist.
    """
    import glob
    files = glob.glob(f"{historical_dir}/charts_*.csv")
    if not files:
        # pd.concat raises ValueError on an empty list; return an empty,
        # correctly-shaped frame instead so callers can iterate safely.
        return pd.DataFrame(
            columns=["title", "artist", "best_rank", "worst_rank", "appearances", "climb"]
        )
    all_charts = pd.concat([pd.read_csv(f) for f in files])
    risers = all_charts.groupby(["title", "artist"]).agg(
        best_rank=("rank", "min"),
        worst_rank=("rank", "max"),
        appearances=("rank", "count"),
    ).reset_index()
    # A large spread between worst and best rank means a fast climb.
    risers["climb"] = risers["worst_rank"] - risers["best_rank"]
    return risers.sort_values("climb", ascending=False).head(20)
Monitoring Multiple Platforms
For comprehensive analysis, scrape across platforms and compare:
def cross_platform_compare(artist_name):
    """Search for an artist on several streaming platforms and parse each result page.

    Args:
        artist_name: free-text artist name; URL-encoded before being
            interpolated into the platform search URLs.

    Returns:
        Dict mapping platform name -> parsed data from parse_platform_data().

    Raises:
        requests.HTTPError: if a proxied request returns an error status.
    """
    from urllib.parse import quote
    # Encode the name so spaces and special characters don't break the URLs.
    query = quote(artist_name)
    platforms = {
        "spotify": f"https://open.spotify.com/search/{query}",
        "apple_music": f"https://music.apple.com/search?term={query}",
        "deezer": f"https://www.deezer.com/search/{query}",
    }
    results = {}
    for platform, url in platforms.items():
        params = {"api_key": SCRAPER_API_KEY, "url": url, "render": "true"}
        response = requests.get(BASE_URL, params=params, timeout=60)
        response.raise_for_status()
        results[platform] = parse_platform_data(platform, response.text)
        time.sleep(3)  # stay polite between platform requests
    return results
Data Storage and Analysis
import sqlite3
def store_chart_data(tracks, db_path="music_data.db"):
    """Append scraped chart rows to a SQLite "charts" table.

    Args:
        tracks: list of per-track dicts (as produced by scrape_spotify_charts).
            An empty list is a no-op.
        db_path: SQLite file to create or extend.
    """
    if not tracks:
        # A DataFrame built from [] has no columns, which breaks to_sql.
        return
    conn = sqlite3.connect(db_path)
    try:
        pd.DataFrame(tracks).to_sql("charts", conn, if_exists="append", index=False)
    finally:
        # Close even when to_sql raises so the db file isn't left locked.
        conn.close()
Conclusion
By combining chart scraping, artist statistics, and cross-platform comparison, you can build powerful music analytics tools. Remember to respect robots.txt and each platform's terms of service, use rate limiting, and consider a scraping API or proxy service for handling JavaScript rendering and proxy rotation at scale.
The complete code with scheduling and visualization is available in the examples above — adapt it to track the genres and markets that matter to your analysis.
Top comments (0)