Bandcamp is the leading platform for independent musicians to sell their music directly to fans. With millions of artists and albums, Bandcamp data is valuable for music market research, pricing analysis, and discovering emerging artists.
Here's how to scrape Bandcamp data with Python.
Use Cases
- Music market research: Analyze pricing trends across genres
- Artist discovery: Find emerging artists by sales data and reviews
- Pricing strategy: Compare how artists price their work
- Genre analysis: Map the indie music landscape
- Label intelligence: Track independent label catalogs
Scraping Album Pages
# Implementation is proprietary (that IS the moat).
# Skip the build — use our ready-made Apify actor:
# see the CTA below for the link (fpr=yw6md3).
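The album scraper itself isn't shown here. As a starting point, this is a minimal sketch of the first step the Best Practices list recommends: pulling the embedded JSON-LD out of an album page's HTML. It assumes the page contains a `<script type="application/ld+json">` block whose object has `"@type": "MusicAlbum"` and schema.org properties such as `name`, `byArtist`, `datePublished`, and `keywords`. Verify those field names against a live page before relying on them. `JsonLdExtractor` and `parse_album_jsonld` are hypothetical helpers that use only the standard library, and fetching the HTML is left out.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect parsed JSON from every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                data = json.loads("".join(self._buffer))
            except json.JSONDecodeError:
                return  # skip malformed blocks instead of failing the whole page
            # A JSON-LD block may hold a single object or a list of objects
            self.blocks.extend(data if isinstance(data, list) else [data])


def parse_album_jsonld(html):
    """Return basic fields from the first MusicAlbum JSON-LD object, or None."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    for block in parser.blocks:
        # Assumption: the album object is typed "MusicAlbum" (schema.org)
        if not isinstance(block, dict) or block.get("@type") != "MusicAlbum":
            continue
        artist = block.get("byArtist") or {}
        tags = block.get("keywords") or []
        if isinstance(tags, str):  # keywords may also be a comma-separated string
            tags = [t.strip() for t in tags.split(",") if t.strip()]
        return {
            "title": block.get("name"),
            "artist": artist.get("name") if isinstance(artist, dict) else artist,
            "release_date": block.get("datePublished"),
            "tags": tags,
        }
    return None
```

Parsing the structured block first means the result doesn't depend on Bandcamp's CSS class names, which change more often than schema.org field names.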
Scraping Artist Pages
# Implementation is proprietary (that IS the moat).
# Skip the build — use our ready-made Apify actor:
# see the CTA below for the link (fpr=yw6md3).
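For artist pages, one plausible approach is to collect links to individual releases from the artist's discography page and then parse each release page separately. This sketch assumes that release links contain `/album/` or `/track/` in their path. That is the usual Bandcamp URL shape, but you should check it. `extract_release_links` and the example artist URL in the test are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ReleaseLinkParser(HTMLParser):
    """Collect absolute URLs of links that look like release pages."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        path = href.split("?")[0]  # drop tracking query strings
        # Assumption: releases live under /album/<slug> or /track/<slug>
        if "/album/" in path or "/track/" in path:
            absolute = urljoin(self.base_url, path)
            if absolute not in self.links:  # keep first-seen order, no duplicates
                self.links.append(absolute)


def extract_release_links(html, base_url):
    parser = ReleaseLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```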
Exploring Genre Tags
# Implementation is proprietary (that IS the moat).
# Skip the build — use our ready-made Apify actor:
# see the CTA below for the link (fpr=yw6md3).
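Once albums carry a `tags` list (the field the CSV export in Saving Data flattens), a simple way to start mapping genres is to count how often each tag appears. This sketch assumes each album dict has an optional `tags` list of strings. `count_tags` is a hypothetical helper.

```python
from collections import Counter


def count_tags(albums, top_n=10):
    """Return the top_n most common tags across albums, case-insensitively."""
    counter = Counter()
    for album in albums:
        # Use a set so a tag repeated within one album is counted once
        tags = {tag.strip().lower() for tag in album.get("tags", []) if tag.strip()}
        counter.update(tags)
    return counter.most_common(top_n)
```

Lowercasing merges "Ambient" and "ambient". It does not merge true synonyms such as "hip hop" and "hip-hop", so those still need a manual mapping if they matter for your analysis.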
Price Analysis
```python
import re
from statistics import median


def analyze_pricing(albums_with_details):
    """Summarize pricing across albums: paid vs. name-your-price vs. free."""
    prices = []
    name_your_price = 0
    free = 0
    for album in albums_with_details:
        price_str = album.get("price", "").lower()
        if "name your price" in price_str:
            name_your_price += 1
        elif "free" in price_str:
            free += 1
        else:
            # Extract the amount after a currency symbol, e.g. "$7 USD" -> 7.0
            # (amounts in different currencies are not converted)
            match = re.search(r"[$€£](\d+(?:\.\d+)?)", price_str)
            if match:
                prices.append(float(match.group(1)))
    return {
        "total_albums": len(albums_with_details),
        "paid_albums": len(prices),
        "name_your_price": name_your_price,
        "free_albums": free,
        "avg_price": round(sum(prices) / len(prices), 2) if prices else 0,
        "min_price": min(prices) if prices else 0,
        "max_price": max(prices) if prices else 0,
        # statistics.median averages the two middle values for even-length lists
        "median_price": median(prices) if prices else 0,
    }
```
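To compare pricing across genres (the first use case listed above), you can group the same kind of price parsing by tag. This sketch makes three assumptions. Album dicts carry `price` and `tags` fields as above. Results are keyed by currency symbol so that dollar and euro prices are never averaged together. A hypothetical `min_albums` threshold skips thinly sampled tags. `median_price_by_tag` is a hypothetical helper.

```python
import re
from collections import defaultdict
from statistics import median

# Currency symbol followed by an amount, e.g. "$7 USD" or "€5 EUR"
PRICE_RE = re.compile(r"([$€£])\s*(\d+(?:\.\d+)?)")


def median_price_by_tag(albums, min_albums=3):
    """Median paid price per (tag, currency symbol), skipping sparse groups."""
    prices = defaultdict(list)
    for album in albums:
        price_str = album.get("price", "")
        match = PRICE_RE.search(price_str)
        if "name your price" in price_str.lower() or not match:
            continue  # not a fixed paid price
        symbol, amount = match.group(1), float(match.group(2))
        # Lowercase and de-duplicate so each album counts once per tag
        for tag in {t.strip().lower() for t in album.get("tags", [])}:
            prices[(tag, symbol)].append(amount)
    return {
        key: round(median(values), 2)
        for key, values in prices.items()
        if len(values) >= min_albums
    }
```

The median is used instead of the mean so that one unusually expensive deluxe edition doesn't skew a small genre's figure.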
Production Bandcamp Scraping
For large-scale Bandcamp data extraction, the Bandcamp Scraper on Apify handles complex pagination, dynamic content loading, and data normalization automatically. Perfect for building comprehensive music databases.
When scraping at scale, use ThorData proxies to distribute requests across residential IPs and avoid rate limits.
Saving Data
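```python
import csv


def save_albums_csv(albums, filename="bandcamp_albums.csv"):
    """Write album dicts to CSV, dropping the nested track list and joining tags."""
    if not albums:
        return
    # CSV cells can't hold lists: drop "tracks" and join "tags" into one string
    flat_albums = []
    for album in albums:
        flat = {k: v for k, v in album.items() if k != "tracks"}
        flat["tags"] = ", ".join(album.get("tags", []))
        flat_albums.append(flat)
    # Union of keys (in first-seen order) so albums with extra fields don't
    # make DictWriter raise; missing fields are written as empty cells
    keys = list(dict.fromkeys(k for flat in flat_albums for k in flat))
    with open(filename, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=keys)
        writer.writeheader()
        writer.writerows(flat_albums)
    print(f"Saved {len(flat_albums)} albums to {filename}")
```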
Best Practices
- Use JSON-LD data: Bandcamp embeds structured JSON-LD data in its pages, so parse that before falling back to HTML scraping
- Rate limit: 2-3 seconds between requests
- Use ThorData for residential proxies when scraping at volume
- Respect artist content: Scrape metadata, not actual audio files
- Cache results: Album data rarely changes after release
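The rate-limit and caching advice can be combined in one small fetcher. This is a sketch under stated assumptions. The default delay is 2.5 seconds, inside the 2-3 second range above. The on-disk cache is keyed by a SHA-256 hash of the URL. The User-Agent string is a placeholder. `PoliteFetcher` is a hypothetical class, and it includes no proxy configuration. The `opener` parameter lets you swap in a different download function, for example one that routes through a proxy or a stub for tests.

```python
import hashlib
import time
import urllib.request
from pathlib import Path


class PoliteFetcher:
    """Fetch pages with a minimum gap between network requests and an on-disk cache."""

    def __init__(self, cache_dir="bandcamp_cache", delay=2.5, opener=None):
        self.cache_dir = Path(cache_dir)  # hypothetical default directory name
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.delay = delay  # assumption: 2.5 s, inside the 2-3 s guideline
        self._last_request = None
        self._opener = opener or self._download

    @staticmethod
    def _download(url):
        # Placeholder User-Agent; set one that identifies your project
        req = urllib.request.Request(url, headers={"User-Agent": "research-script/0.1"})
        with urllib.request.urlopen(req, timeout=30) as resp:
            return resp.read().decode("utf-8", errors="replace")

    def get(self, url):
        key = hashlib.sha256(url.encode("utf-8")).hexdigest()
        path = self.cache_dir / f"{key}.html"
        if path.exists():
            return path.read_text(encoding="utf-8")  # cache hit: no request, no wait
        if self._last_request is not None:
            # Sleep only for whatever remains of the delay since the last request
            wait = self.delay - (time.monotonic() - self._last_request)
            if wait > 0:
                time.sleep(wait)
        html = self._opener(url)
        self._last_request = time.monotonic()
        path.write_text(html, encoding="utf-8")
        return html
```

Cached pages are returned before any delay is applied. As a result, re-running a script over albums you have already fetched sends no new requests.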
Conclusion
Bandcamp is a treasure trove of indie music data. From pricing trends to genre analysis, the data powers valuable market insights. Use the techniques above for small projects, or the Bandcamp Scraper on Apify for production workloads.
Happy data mining!