Bandcamp Music Data Scraping: Extract Artists, Albums, and Prices with Python

#python #webscraping #tutorial #webdev

Bandcamp is a major platform where independent musicians sell their music directly to fans. Its large catalog of artists and albums makes its data useful for music market research, pricing analysis, and discovering emerging artists.

Here's how to scrape Bandcamp data with Python.

Use Cases

Music market research: Analyze pricing trends across genres
Artist discovery: Find emerging artists by sales data and reviews
Pricing strategy: Compare how artists price their work
Genre analysis: Map the indie music landscape
Label intelligence: Track independent label catalogs

Scraping Album Pages

The original implementation for this step is not published in this post. The ready-made Bandcamp Scraper on Apify, covered under Production Bandcamp Scraping below, handles album pages.
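For a small project, one possible starting point is the JSON-LD block that the Best Practices section recommends parsing first. The sketch below is not the author's implementation. It assumes the album page HTML has already been downloaded and that it embeds a schema.org MusicAlbum object in a script tag of type application/ld+json. The function name parse_album_html and the field names it reads (name, byArtist, datePublished, keywords, track.itemListElement) are assumptions that should be checked against a real page. Price is left out because its location in the markup is not assumed here.

```python
import json
from html.parser import HTMLParser


class JsonLdParser(HTMLParser):
    """Collects the text of every <script type="application/ld+json"> tag."""

    def __init__(self):
        super().__init__()
        self._in_json_ld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            self.blocks.append("".join(self._buffer))


def parse_album_html(html):
    """Return a flat album dict from the first MusicAlbum JSON-LD block, or None.

    All field names read here are assumptions about the embedded markup.
    """
    parser = JsonLdParser()
    parser.feed(html)
    parser.close()
    for raw in parser.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # Skip malformed blocks instead of failing the whole page
        if not isinstance(data, dict) or data.get("@type") != "MusicAlbum":
            continue
        keywords = data.get("keywords") or []
        if isinstance(keywords, str):
            # schema.org also allows a comma-separated string
            keywords = [k.strip() for k in keywords.split(",") if k.strip()]
        tracks = (data.get("track") or {}).get("itemListElement") or []
        return {
            "title": data.get("name"),
            "artist": (data.get("byArtist") or {}).get("name"),
            "release_date": data.get("datePublished"),
            "tags": keywords,
            "tracks": [(t.get("item") or {}).get("name") for t in tracks],
        }
    return None
```

The returned dict uses the same "tags" and "tracks" keys that the CSV export later in this post expects.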

Scraping Artist Pages

The original implementation for this step is not published in this post. The ready-made Bandcamp Scraper on Apify, covered under Production Bandcamp Scraping below, handles artist pages.
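As a rough sketch for small projects, and not the author's method, an artist page can be reduced to a list of release URLs by collecting its links. This assumes the page HTML is already downloaded and that releases are linked with paths starting with /album/ or /track/. The function name extract_release_urls and the example.bandcamp.com address are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in the document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def extract_release_urls(html, base_url):
    """Return de-duplicated absolute album/track URLs found in an artist page.

    Assumes release links use /album/<slug> or /track/<slug> paths.
    """
    parser = LinkCollector()
    parser.feed(html)
    parser.close()

    urls = []
    seen = set()
    for href in parser.hrefs:
        absolute = urljoin(base_url, href)
        if not urlparse(absolute).path.startswith(("/album/", "/track/")):
            continue
        # Drop query strings and fragments so the same release is not listed twice
        clean = absolute.split("?")[0].split("#")[0]
        if clean not in seen:
            seen.add(clean)
            urls.append(clean)
    return urls


# Hypothetical usage with an already-downloaded page:
# urls = extract_release_urls(page_html, "https://example.bandcamp.com/music")
```

Each returned URL can then be fetched and parsed as an album page.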

Exploring Genre Tags

The original implementation for this step is not published in this post. The ready-made Bandcamp Scraper on Apify, covered under Production Bandcamp Scraping below, handles tag exploration.
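Genre tags can also be explored offline, from album records that have already been collected. The sketch below assumes each album is a dict with an optional "tags" list, which is the shape the CSV export in this post works with. The function names count_tags and related_tags are illustrative, not part of any library.

```python
from collections import Counter


def _normalized_tags(album):
    """Lowercase and de-duplicate one album's tags."""
    return {t.strip().lower() for t in album.get("tags") or [] if t.strip()}


def count_tags(albums):
    """Count how many albums carry each tag."""
    counts = Counter()
    for album in albums:
        counts.update(_normalized_tags(album))
    return counts


def related_tags(albums, tag):
    """Count tags that appear on the same albums as the given tag."""
    tag = tag.strip().lower()
    counts = Counter()
    for album in albums:
        tags = _normalized_tags(album)
        if tag in tags:
            counts.update(tags - {tag})
    return counts


# Example with made-up records:
albums = [
    {"title": "A", "tags": ["Ambient", "drone"]},
    {"title": "B", "tags": ["ambient", "Jazz"]},
    {"title": "C"},
]
print(count_tags(albums).most_common(1))
print(related_tags(albums, "ambient"))
```

Counting each tag once per album keeps a release that repeats a tag from skewing the totals.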

Price Analysis

import re
from statistics import median

def analyze_pricing(albums_with_details):
    """Analyze pricing patterns across albums."""
    prices = []
    name_your_price = 0
    free = 0

    for album in albums_with_details:
        # Treat a missing or None price as an empty string
        price_str = (album.get("price") or "").lower()
        if "name your price" in price_str:
            name_your_price += 1
        elif "free" in price_str:
            free += 1
        else:
            # Extract the numeric part of a price such as "$7" or "€9.50"
            match = re.search(r'[$€£]\s?(\d+(?:\.\d+)?)', price_str)
            if match:
                prices.append(float(match.group(1)))

    return {
        "total_albums": len(albums_with_details),
        "paid_albums": len(prices),
        "name_your_price": name_your_price,
        "free_albums": free,
        "avg_price": round(sum(prices) / len(prices), 2) if prices else 0,
        "min_price": min(prices) if prices else 0,
        "max_price": max(prices) if prices else 0,
        # statistics.median averages the two middle values for even-length lists
        "median_price": median(prices) if prices else 0,
    }
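One caveat with the function above is that it pools dollar, euro, and pound amounts into a single average. A possible refinement, sketched here under the assumption that the price string starts with one of those three symbols, is to group prices by currency symbol before summarizing. The name prices_by_currency is illustrative.

```python
import re
from collections import defaultdict
from statistics import median

# Captures the currency symbol and the numeric amount separately
PRICE_RE = re.compile(r"([$€£])\s?(\d+(?:\.\d+)?)")


def prices_by_currency(albums):
    """Summarize paid-album prices separately for each currency symbol."""
    grouped = defaultdict(list)
    for album in albums:
        match = PRICE_RE.search(album.get("price") or "")
        if match:
            grouped[match.group(1)].append(float(match.group(2)))

    return {
        symbol: {
            "count": len(values),
            "median": median(values),
            "min": min(values),
            "max": max(values),
        }
        for symbol, values in grouped.items()
    }


# Example with made-up records:
sample = [{"price": "$7"}, {"price": "$9.50"}, {"price": "€10"}, {"price": "name your price"}]
print(prices_by_currency(sample))
```

Albums without a recognizable price, such as "name your price" releases, are simply skipped here.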

Production Bandcamp Scraping

For large-scale Bandcamp data extraction, the Bandcamp Scraper on Apify handles pagination, dynamically loaded content, and data normalization automatically, which makes it a reasonable fit for building a larger music database.

When scraping at scale, use ThorData proxies to distribute requests across residential IPs and avoid rate limits.
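Routing requests through a proxy needs nothing beyond the Python standard library. The sketch below is generic rather than ThorData-specific: the proxy URL, credentials, and User-Agent string are placeholders, and the function name build_proxy_opener is an assumption made for illustration.

```python
import urllib.request


def build_proxy_opener(proxy_url, user_agent="bandcamp-research-script/0.1"):
    """Return a urllib opener that sends HTTP and HTTPS traffic through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    opener = urllib.request.build_opener(handler)
    # Identify the script; the User-Agent value is a placeholder
    opener.addheaders = [("User-Agent", user_agent)]
    return opener


# Hypothetical usage (placeholder proxy address and credentials):
# opener = build_proxy_opener("http://user:password@proxy.example:8000")
# with opener.open("https://example.bandcamp.com/music", timeout=30) as resp:
#     html = resp.read().decode("utf-8", errors="replace")
```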

Saving Data

import csv

def save_albums_csv(albums, filename="bandcamp_albums.csv"):
    if not albums:
        return

    # Drop the nested tracks list and join tags so each album fits on one CSV row
    flat_albums = []
    for album in albums:
        flat = {k: v for k, v in album.items() if k != "tracks"}
        flat["tags"] = ", ".join(album.get("tags") or [])
        flat_albums.append(flat)

    # Use the union of keys (in first-seen order) so albums with extra
    # fields do not make DictWriter raise ValueError
    keys = list(dict.fromkeys(k for flat in flat_albums for k in flat))
    with open(filename, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=keys)
        writer.writeheader()
        writer.writerows(flat_albums)

    print(f"Saved {len(flat_albums)} albums to {filename}")
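The CSV export leaves out the nested track list. If the tracks need to be kept, one option is to write one JSON object per line (JSON Lines) instead. This is a sketch, not part of the original workflow, and the function names and default filename are illustrative.

```python
import json


def save_albums_jsonl(albums, filename="bandcamp_albums.jsonl"):
    """Write one JSON object per line, keeping nested fields such as tracks."""
    with open(filename, "w", encoding="utf-8") as f:
        for album in albums:
            # ensure_ascii=False keeps non-ASCII artist and album names readable
            f.write(json.dumps(album, ensure_ascii=False) + "\n")
    return len(albums)


def load_albums_jsonl(filename="bandcamp_albums.jsonl"):
    """Read the albums back into a list of dicts."""
    with open(filename, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Because each line is independent, new albums can be appended to the file later without rewriting it.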

Best Practices

Use JSON-LD data: Bandcamp embeds structured data, so parse it before falling back to HTML scraping
Rate limit: Wait 2-3 seconds between requests
Proxies: Use ThorData residential proxies when scraping at volume
Respect artist content: Scrape metadata, not actual audio files
Cache results: Album data rarely changes after release
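The rate-limit and caching advice above can be combined in a small wrapper around whatever function downloads a page. The sketch below is one way to do it, under the assumption that an in-memory cache is enough for a single run. The class name PoliteFetcher and its parameters are illustrative, and the fetch function is supplied by the caller.

```python
import time


class PoliteFetcher:
    """Wraps a fetch function with a minimum delay between requests and a cache."""

    def __init__(self, fetch, min_delay=2.0, clock=time.monotonic, sleep=time.sleep):
        self._fetch = fetch          # callable taking a URL and returning the body
        self._min_delay = min_delay  # seconds to leave between real requests
        self._clock = clock
        self._sleep = sleep
        self._last_request = None
        self._cache = {}

    def get(self, url):
        # Cached pages cost no request and no waiting
        if url in self._cache:
            return self._cache[url]

        # Wait only for the part of the delay that has not already elapsed
        if self._last_request is not None:
            wait = self._min_delay - (self._clock() - self._last_request)
            if wait > 0:
                self._sleep(wait)

        body = self._fetch(url)
        self._last_request = self._clock()
        self._cache[url] = body
        return body


# Hypothetical usage with a caller-supplied download function:
# fetcher = PoliteFetcher(download_page, min_delay=2.5)
# html = fetcher.get("https://example.bandcamp.com/album/some-album")
```

Passing the clock and sleep functions in as parameters makes the delay logic testable without real waiting.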

Conclusion

Bandcamp is a treasure trove of indie music data. From pricing trends to genre analysis, the data powers valuable market insights. Use the techniques above for small projects, or the Bandcamp Scraper on Apify for production workloads.

Happy data mining!

DEV Community