DEV Community

agenthustler


Bandcamp Music Data Scraping: Extract Artists, Albums, and Prices with Python

Bandcamp is a popular platform for independent musicians to sell their music directly to fans. Its large catalog of artists and albums makes Bandcamp data valuable for music market research, pricing analysis, and discovering emerging artists.

Here's how to scrape Bandcamp data with Python.

Use Cases

  • Music market research: Analyze pricing trends across genres
  • Artist discovery: Find emerging artists by sales data and reviews
  • Pricing strategy: Compare how artists price their work
  • Genre analysis: Map the indie music landscape
  • Label intelligence: Track independent label catalogs

Scraping Album Pages

# Implementation is proprietary (that IS the moat).
# Skip the build — use our ready-made Apify actor:
# see the CTA below for the link (fpr=yw6md3).
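The post doesn't publish its album scraper, but the first best practice listed later (parse Bandcamp's embedded JSON-LD before touching the HTML) can be sketched with the standard library alone. This is a minimal sketch, not the author's implementation. It assumes the album page carries a `<script type="application/ld+json">` block using schema.org's `MusicAlbum` vocabulary (`name`, `byArtist`, `datePublished`, `keywords`, `track.itemListElement`). The names `JSONLDExtractor`, `parse_album_jsonld`, and `fetch_html` are hypothetical. Check the field names against a live page before relying on them.

```python
import json
import urllib.request
from html.parser import HTMLParser


class JSONLDExtractor(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json"> tag."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._capturing = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._capturing = True
            self._buffer = []

    def handle_data(self, data):
        if self._capturing:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._capturing:
            self.blocks.append("".join(self._buffer))
            self._capturing = False


def _first(value):
    """Return the first element if value is a list, else value itself."""
    if isinstance(value, list):
        return value[0] if value else {}
    return value


def parse_album_jsonld(html):
    """Return a flat album dict from the first MusicAlbum JSON-LD object, or None.

    Assumption: the page uses schema.org MusicAlbum fields (name, byArtist,
    datePublished, keywords, track.itemListElement). Verify on a live page.
    """
    parser = JSONLDExtractor()
    parser.feed(html)
    parser.close()
    for raw in parser.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed blocks
        for item in data if isinstance(data, list) else [data]:
            if not isinstance(item, dict):
                continue
            types = item.get("@type", [])
            types = [types] if isinstance(types, str) else types
            if "MusicAlbum" not in types:
                continue
            artist = _first(item.get("byArtist")) or {}
            keywords = item.get("keywords") or []
            if isinstance(keywords, str):  # schema.org allows a comma-separated string
                keywords = [k.strip() for k in keywords.split(",") if k.strip()]
            track = item.get("track")
            elements = track.get("itemListElement", []) if isinstance(track, dict) else []
            tracks = []
            for element in elements:
                entry = element.get("item") if isinstance(element, dict) else None
                if isinstance(entry, dict) and entry.get("name"):
                    tracks.append(entry["name"])
            return {
                "title": item.get("name"),
                "artist": artist.get("name") if isinstance(artist, dict) else None,
                "release_date": item.get("datePublished"),
                "url": item.get("@id") or item.get("url"),
                "tags": keywords,
                "tracks": tracks,
            }
    return None


def fetch_html(url, timeout=30):
    """Download a page with the standard library and decode it as text."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

To use it, pass a real album URL: `parse_album_jsonld(fetch_html("https://ARTIST.bandcamp.com/album/SLUG"))`, where `ARTIST` and `SLUG` are placeholders. Price is left out on purpose, since where (or whether) it appears in the JSON-LD is something to confirm on a real page.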

Scraping Artist Pages

# Implementation is proprietary (that IS the moat).
# Skip the build — use our ready-made Apify actor:
# see the CTA below for the link (fpr=yw6md3).
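An artist or label page is mostly useful as an index of releases. As a minimal sketch (again, not the author's code), the collector below keeps every link whose path starts with `/album/` or `/track/`, the URL pattern Bandcamp uses for releases. It also strips query strings and fragments so duplicates collapse. `ReleaseLinkCollector` and `extract_release_links` are hypothetical names. The collector only sees links present in the static HTML, so releases a page loads later with JavaScript won't show up.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse, urlunparse


class ReleaseLinkCollector(HTMLParser):
    """Collect unique absolute URLs of <a> links pointing at /album/ or /track/ pages."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._seen = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links, then drop ?query and #fragment for deduplication
        parts = urlparse(urljoin(self.base_url, href))
        clean = urlunparse(parts._replace(query="", fragment=""))
        if parts.path.startswith(("/album/", "/track/")) and clean not in self._seen:
            self._seen.add(clean)
            self.links.append(clean)


def extract_release_links(html, base_url):
    """Return album/track URLs found on an artist or label page, in page order."""
    collector = ReleaseLinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links
```

Each returned URL is an album or track page to fetch next. Label pages often link to releases on other artists' subdomains, and those are kept too, because `urljoin` leaves absolute links unchanged.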

Exploring Genre Tags

# Implementation is proprietary (that IS the moat).
# Skip the build — use our ready-made Apify actor:
# see the CTA below for the link (fpr=yw6md3).
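Once you have album records with tags, exploring genres is mostly counting. This sketch assumes each album is a dict with a `tags` list (the same shape `save_albums_csv` in Saving Data expects). It normalizes tags to lowercase, counts how many albums carry each tag, and counts which tag pairs appear together on the same album. The function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def _normalized_tags(album):
    """Lowercased, stripped, de-duplicated tags for one album."""
    return {tag.strip().lower() for tag in album.get("tags") or [] if tag.strip()}


def tag_frequencies(albums):
    """Count how many albums carry each tag."""
    counts = Counter()
    for album in albums:
        # A set per album avoids double-counting a tag repeated on one release
        counts.update(_normalized_tags(album))
    return counts


def tag_cooccurrence(albums):
    """Count how often each unordered pair of tags appears on the same album."""
    pairs = Counter()
    for album in albums:
        # Sorting makes ("a", "b") and ("b", "a") the same key
        pairs.update(combinations(sorted(_normalized_tags(album)), 2))
    return pairs
```

`tag_frequencies(albums).most_common(10)` returns the ten most common tags with their album counts. The co-occurrence counts hint at which genres overlap, for example which tags most often appear alongside "ambient".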

Price Analysis

import re
import statistics


def analyze_pricing(albums_with_details):
    """Analyze pricing patterns across albums.

    Each album dict should have a "price" string such as "$7 USD",
    "name your price", or "Free". Currencies are not converted.
    """
    prices = []
    name_your_price = 0
    free = 0

    for album in albums_with_details:
        price_str = (album.get("price") or "").lower()
        if "name your price" in price_str:
            name_your_price += 1
        elif "free" in price_str:
            free += 1
        else:
            # Extract the first amount that follows a $, € or £ sign
            match = re.search(r"[$€£](\d+(?:\.\d+)?)", price_str)
            if match:
                prices.append(float(match.group(1)))

    return {
        "total_albums": len(albums_with_details),
        "paid_albums": len(prices),
        "name_your_price": name_your_price,
        "free_albums": free,
        "avg_price": round(sum(prices) / len(prices), 2) if prices else 0,
        "min_price": min(prices) if prices else 0,
        "max_price": max(prices) if prices else 0,
        # statistics.median averages the two middle values for even counts
        "median_price": statistics.median(prices) if prices else 0,
    }
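To get from overall pricing stats to the "pricing trends across genres" use case, the same price parsing can be grouped by tag. This is a minimal sketch, assuming album dicts with a `price` string and a `tags` list like those `analyze_pricing` reads. `median_price_by_tag` and its `min_albums` parameter are hypothetical. Like `analyze_pricing`, it does not convert currencies, so restrict the input to one currency before comparing genres.

```python
import re
import statistics
from collections import defaultdict

# First amount after a $, € or £ sign, e.g. "$7 USD" -> "7"
PRICE_RE = re.compile(r"[$€£](\d+(?:\.\d+)?)")


def median_price_by_tag(albums, min_albums=3):
    """Median fixed price per tag, skipping tags with fewer than min_albums prices."""
    prices_by_tag = defaultdict(list)
    for album in albums:
        match = PRICE_RE.search(album.get("price") or "")
        if not match:
            continue  # name-your-price, free, or unparseable
        price = float(match.group(1))
        # One entry per distinct tag on this album
        for tag in {t.strip().lower() for t in album.get("tags") or [] if t.strip()}:
            prices_by_tag[tag].append(price)
    return {
        tag: statistics.median(prices)
        for tag, prices in sorted(prices_by_tag.items())
        if len(prices) >= min_albums
    }
```

The median is used instead of the mean because a few expensive deluxe editions would otherwise pull a small genre's average up. The `min_albums` threshold keeps tags with only one or two priced releases from dominating the comparison.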

Production Bandcamp Scraping

For large-scale Bandcamp data extraction, the Bandcamp Scraper on Apify handles complex pagination, dynamic content loading, and data normalization automatically, which makes it a good fit for building comprehensive music databases.

When scraping at scale, use ThorData proxies to distribute requests across residential IPs and avoid rate limits.
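The post doesn't show how the proxies plug in. With the standard library, a `urllib.request.ProxyHandler` routes requests through a proxy URL. This is a minimal sketch: `make_proxy_opener` and the `SCRAPER_PROXY_URL` environment variable are hypothetical. The URL format in the docstring (`http://user:pass@host:port`) is a common convention, not ThorData's documented endpoint, so take the real host, port, and credentials from your provider.

```python
import os
import urllib.request


def make_proxy_opener(proxy_url=None):
    """Return a urllib opener that sends HTTP and HTTPS traffic through proxy_url.

    proxy_url is a placeholder such as "http://user:pass@host:port"; the real
    endpoint and credential format come from your proxy provider.
    """
    handlers = []
    if proxy_url:
        handlers.append(urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url}))
    opener = urllib.request.build_opener(*handlers)
    opener.addheaders = [("User-Agent", "Mozilla/5.0")]
    return opener


# Hypothetical environment variable; if unset, urllib falls back to its defaults
opener = make_proxy_opener(os.environ.get("SCRAPER_PROXY_URL"))
# Example request (not executed here):
# html = opener.open("https://bandcamp.com/", timeout=30).read()
```

Keeping credentials in an environment variable rather than in the script avoids committing them to version control.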

Saving Data

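import csv


def save_albums_csv(albums, filename="bandcamp_albums.csv"):
    """Write album dicts to CSV, one row per album."""
    if not albums:
        return

    # Drop the nested tracks list and join tags into one string so that
    # each album fits in a single flat CSV row
    flat_albums = []
    for album in albums:
        flat = {k: v for k, v in album.items() if k != "tracks"}
        flat["tags"] = ", ".join(album.get("tags") or [])
        flat_albums.append(flat)

    # Use the union of keys across all albums (first-seen order); taking only
    # the first album's keys makes DictWriter raise ValueError on extra fields
    fieldnames = list(dict.fromkeys(k for row in flat_albums for k in row))
    with open(filename, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(flat_albums)

    print(f"Saved {len(flat_albums)} albums to {filename}")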
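CSV forces a flat row, so `save_albums_csv` discards each album's track list. If you want to keep nested fields, JSON Lines (one JSON object per line) is a simple alternative that is also easy to append to. This is a minimal sketch with hypothetical function names:

```python
import json


def save_albums_jsonl(albums, filename="bandcamp_albums.jsonl"):
    """Write one JSON object per line, keeping nested fields like tracks and tags."""
    with open(filename, "w", encoding="utf-8") as f:
        for album in albums:
            # ensure_ascii=False keeps non-ASCII artist and album names readable
            f.write(json.dumps(album, ensure_ascii=False) + "\n")
    return len(albums)


def load_albums_jsonl(filename="bandcamp_albums.jsonl"):
    """Read albums back from a JSON Lines file, skipping blank lines."""
    with open(filename, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

To add albums across several runs, open the file with mode `"a"` instead of `"w"`. Each line stays independent, so a partial write only loses the last record.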

Best Practices

  1. Use JSON-LD data: Bandcamp embeds structured data in its pages, so parse that before falling back to HTML scraping
  2. Rate limit: 2-3 seconds between requests
  3. Use ThorData for residential proxies when scraping at volume
  4. Respect artist content: Scrape metadata, not actual audio files
  5. Cache results: Album data rarely changes after release
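Practices 2 and 5 combine naturally in one wrapper: skip the network entirely on a cache hit, and otherwise wait a random 2-3 seconds since the previous request. This is a sketch under stated assumptions. `PoliteFetcher` is a hypothetical name, and `fetch` is any callable that takes a URL and returns the page as text (for example a small `urllib` helper). The cache is keyed by a SHA-256 hash of the URL and never expires.

```python
import hashlib
import random
import time
from pathlib import Path


class PoliteFetcher:
    """Wrap a fetch function with a delay between requests and an on-disk cache."""

    def __init__(self, fetch, cache_dir="bandcamp_cache", min_delay=2.0, max_delay=3.0):
        self.fetch = fetch  # callable: url -> page text
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.min_delay = min_delay
        self.max_delay = max_delay
        self._last_request = None  # monotonic timestamp of the last real request

    def _cache_path(self, url):
        digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
        return self.cache_dir / f"{digest}.html"

    def get(self, url):
        path = self._cache_path(url)
        if path.exists():
            # Cache hit: no request and no delay
            return path.read_text(encoding="utf-8")
        if self._last_request is not None:
            # Wait until a random 2-3 s (by default) has passed since the last request
            wait = random.uniform(self.min_delay, self.max_delay)
            elapsed = time.monotonic() - self._last_request
            if elapsed < wait:
                time.sleep(wait - elapsed)
        self._last_request = time.monotonic()
        body = self.fetch(url)
        path.write_text(body, encoding="utf-8")
        return body
```

Measuring from the previous request rather than sleeping a fixed amount means time spent parsing a page counts toward the delay. Because cached pages never expire here, delete the cache directory when you need fresh prices or tags.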

Conclusion

Bandcamp is a treasure trove of indie music data. From pricing trends to genre analysis, the data powers valuable market insights. Use the techniques above for small projects, or the Bandcamp Scraper on Apify for production workloads.

Happy data mining!
