DEV Community

agenthustler


Scraping Bandcamp in 2026: Tracks, Albums, Artists Without the API

Bandcamp is one of the most scraper-friendly platforms on the web — if you know where to look. Every page embeds rich structured data in multiple formats, and there's no aggressive anti-bot protection. This guide covers the technical approach to extracting music data from Bandcamp in 2026.

Bandcamp's Data Architecture

Bandcamp embeds data in up to three places, depending on the page type:

1. JSON-LD (Schema.org)

Every track and album page includes a <script type="application/ld+json"> block with Schema.org MusicRecording or MusicAlbum markup:

{
  "@type": "MusicAlbum",
  "name": "Album Title",
  "byArtist": {"@type": "MusicGroup", "name": "Artist Name"},
  "datePublished": "2026-01-15",
  "numTracks": 12,
  "albumRelease": [{"@type": "MusicRelease", "musicReleaseFormat": "DigitalFormat"}]
}

This is the cleanest source for basic metadata — title, artist, release date, format.
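One practical wrinkle: JSON-LD allows a script block to hold a single object, a list of objects, or an object with an `@graph` list. The sketch below picks the first music node out of any of those shapes. The helper names (`iter_ld_nodes`, `find_music_node`) are made up for this example, and whether Bandcamp ever uses the list or `@graph` forms is an assumption worth checking against a live page.

```python
import json

MUSIC_TYPES = {"MusicAlbum", "MusicRecording"}

def iter_ld_nodes(payload):
    """Yield every dict node from a decoded JSON-LD payload."""
    if isinstance(payload, list):
        for entry in payload:
            yield from iter_ld_nodes(entry)
    elif isinstance(payload, dict):
        yield payload
        # "@graph" is the standard JSON-LD container for multiple nodes.
        yield from iter_ld_nodes(payload.get("@graph", []))

def find_music_node(raw_json):
    """Return the first MusicAlbum/MusicRecording node, or {} if none is found."""
    try:
        payload = json.loads(raw_json)
    except (TypeError, ValueError):
        return {}
    for node in iter_ld_nodes(payload):
        node_type = node.get("@type")
        # "@type" may be a single string or a list of strings.
        types = node_type if isinstance(node_type, list) else [node_type]
        if MUSIC_TYPES.intersection(t for t in types if isinstance(t, str)):
            return node
    return {}

# Illustrative payload, not copied from a real page.
sample = '[{"@type": "WebSite"}, {"@type": "MusicAlbum", "name": "Album Title", "numTracks": 12}]'
album = find_music_node(sample)
print(album.get("name"), album.get("numTracks"))
```

Returning an empty dict on bad input keeps the caller's `.get()` chains working when a page has no usable markup.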

2. data-tralbum Attribute

The main content div includes a data-tralbum attribute containing a JSON blob with detailed track-level data:

{
  "current": {
    "title": "Album Title",
    "release_date": "15 Jan 2026",
    "minimum_price": 7.00,
    "art_id": 1234567890
  },
  "trackinfo": [
    {
      "title": "Track One",
      "duration": 234.5,
      "track_num": 1,
      "file": {"mp3-128": "https://..."}
    }
  ]
}

This is the richest data source — it includes pricing, track durations, streaming URLs, and art IDs.
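Because the blob lives inside an HTML attribute, its quotes arrive entity-escaped (`&quot;`) in the raw page source. An HTML parser undoes that for you. Here is a minimal standard-library sketch that finds the attribute on whatever tag carries it; `DataAttrFinder` and `extract_data_attr` are hypothetical names, and the sample markup is invented for illustration.

```python
import json
from html.parser import HTMLParser

class DataAttrFinder(HTMLParser):
    """Remember the first non-empty value of one attribute, on any tag."""

    def __init__(self, attr_name):
        super().__init__()
        self.attr_name = attr_name
        self.value = None

    def handle_starttag(self, tag, attrs):
        if self.value is not None:
            return
        for name, value in attrs:
            # HTMLParser has already unescaped entities such as &quot; here.
            if name == self.attr_name and value:
                self.value = value

def extract_data_attr(html_text, attr_name):
    """Decode the JSON stored in attr_name, or return {} if it is absent."""
    finder = DataAttrFinder(attr_name)
    finder.feed(html_text)
    return json.loads(finder.value) if finder.value else {}

# Invented markup that mimics the escaping seen in raw page source.
sample = ('<div data-tralbum="{&quot;current&quot;: {&quot;title&quot;: '
          '&quot;Album Title&quot;}, &quot;trackinfo&quot;: []}"></div>')
tralbum = extract_data_attr(sample, "data-tralbum")
print(tralbum["current"]["title"])
```

The same helper should work for `data-band`, since it matches on the attribute name and not on the tag.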

3. data-band Attribute

Artist/label pages include a data-band attribute with profile information:

{
  "id": 123456,
  "name": "Artist Name",
  "bio": "Artist biography text...",
  "location": "Portland, Oregon"
}

Building a Basic Scraper with Python

Here's a working scraper using httpx and BeautifulSoup4:

import httpx
from bs4 import BeautifulSoup
import json

def scrape_bandcamp_album(url: str) -> dict:
    """Extract album data from a Bandcamp album page."""
    resp = httpx.get(url, headers={
        "User-Agent": "Mozilla/5.0 (compatible; MusicBot/1.0)"
    }, follow_redirects=True)  # httpx does not follow redirects by default
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")

    # Extract JSON-LD (guard against a missing or empty script tag)
    ld_script = soup.find("script", {"type": "application/ld+json"})
    ld_data = json.loads(ld_script.string) if ld_script and ld_script.string else {}

    # Extract data-tralbum (richest source)
    tralbum_div = soup.find(attrs={"data-tralbum": True})
    tralbum = json.loads(tralbum_div["data-tralbum"]) if tralbum_div else {}

    # Extract data-band
    band_div = soup.find(attrs={"data-band": True})
    band = json.loads(band_div["data-band"]) if band_div else {}

    current = tralbum.get("current", {})
    tracks = tralbum.get("trackinfo", [])

    return {
        "title": current.get("title", ld_data.get("name", "")),
        "artist": band.get("name", ""),
        "location": band.get("location", ""),
        "release_date": current.get("release_date", ""),
        "price": current.get("minimum_price", 0),
        "tracks": [
            {
                "title": t.get("title", ""),
                "duration_sec": t.get("duration", 0),
                "track_num": t.get("track_num", 0),
            }
            for t in tracks
        ],
        "num_tracks": len(tracks),
        "tags": [tag.text.strip() for tag in soup.select(".tralbum-tags .tag")],
    }

# Usage
album = scrape_bandcamp_album("https://someartist.bandcamp.com/album/some-album")
print(json.dumps(album, indent=2))
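Durations come back as fractional seconds, which is awkward to read. A small post-processing sketch, assuming a dict shaped like the return value of `scrape_bandcamp_album` above; the helper names and the second sample track are invented for illustration.

```python
def format_duration(seconds):
    """Render seconds as M:SS, or H:MM:SS once past an hour."""
    total = int(round(seconds))
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"

def album_runtime(album):
    """Sum track durations, treating missing or null values as 0."""
    return sum(t.get("duration_sec") or 0 for t in album.get("tracks", []))

# Illustrative data in the scraper's output shape.
album = {
    "tracks": [
        {"title": "Track One", "duration_sec": 234.5},
        {"title": "Track Two", "duration_sec": 61.0},
    ]
}
for track in album["tracks"]:
    print(track["title"], format_duration(track["duration_sec"]))
print("Total:", format_duration(album_runtime(album)))
```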

Scraping Search Results and Tag Pages

Bandcamp's tag pages (bandcamp.com/tag/synthwave) and search (bandcamp.com/search?q=ambient) return server-rendered HTML. You can paginate through them:

import time

def scrape_bandcamp_tag(tag: str, pages: int = 3) -> list:
    """Scrape releases from a Bandcamp tag page."""
    results = []
    for page in range(1, pages + 1):
        url = f"https://bandcamp.com/tag/{tag}?page={page}&sort_field=date"
        resp = httpx.get(url, headers={
            "User-Agent": "Mozilla/5.0 (compatible; MusicBot/1.0)"
        }, follow_redirects=True)
        resp.raise_for_status()  # stop on 4xx/5xx instead of parsing an error page
        soup = BeautifulSoup(resp.text, "html.parser")

        items = soup.select(".item_list .item")
        if not items:
            break  # nothing matched on this page, so later pages are unlikely to help
        for item in items:
            title_el = item.select_one(".itemtext")
            artist_el = item.select_one(".itemsubtext")
            link_el = item.select_one("a")
            if title_el and link_el:
                results.append({
                    "title": title_el.text.strip(),
                    "artist": artist_el.text.strip() if artist_el else "",
                    "url": link_el.get("href", ""),
                })
        time.sleep(1.5)  # pause between pages
    return results

releases = scrape_bandcamp_tag("ambient", pages=5)
print(f"Found {len(releases)} releases")
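Two small helpers tend to pay off once you paginate: building the URL with proper encoding, and dropping releases that show up on more than one page. This is a sketch only. The `page` and `sort_field` parameters are taken from the code above, the lowercase-and-hyphen slug rule is an assumption about how Bandcamp names tags, and both function names are made up.

```python
from urllib.parse import quote, urlencode

def build_tag_url(tag, page=1, sort_field="date"):
    """Build a tag-page URL, percent-encoding the tag and query values."""
    # Assumption: tag slugs are lowercase with hyphens in place of spaces.
    slug = quote(tag.strip().lower().replace(" ", "-"), safe="")
    query = urlencode({"page": page, "sort_field": sort_field})
    return f"https://bandcamp.com/tag/{slug}?{query}"

def dedupe_by_url(releases):
    """Keep the first occurrence of each release URL."""
    seen = set()
    unique = []
    for release in releases:
        # Ignore the query string so tracking parameters don't defeat the check.
        key = release.get("url", "").split("?", 1)[0]
        if key and key not in seen:
            seen.add(key)
            unique.append(release)
    return unique

print(build_tag_url("synthwave", page=2))
```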

Handling Rate Limits

Bandcamp is relatively permissive, but you should still be respectful:

  • Add delays: 1-2 seconds between requests is sufficient.
  • Use a proper User-Agent: Identify your bot. Bandcamp doesn't aggressively block scrapers.
  • Cache responses: Album pages don't change frequently. Cache for 24h+.
  • Respect robots.txt: Bandcamp's robots.txt is minimal, but check it before you crawl.

import time

def scrape_multiple_albums(urls: list) -> list:
    results = []
    for url in urls:
        results.append(scrape_bandcamp_album(url))
        time.sleep(1.5)  # Be polite
    return results
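The caching advice from the list above can be sketched as a small in-memory cache with a 24-hour expiry. `TTLCache` is a hypothetical helper, not part of any library. It takes the fetch function as an argument, so the demo below uses a fake fetcher in place of a real request.

```python
import time

class TTLCache:
    """In-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds=24 * 3600, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without waiting
        self._store = {}

    def get_or_fetch(self, url, fetch):
        """Return the cached value for url, calling fetch(url) on a miss."""
        now = self.clock()
        hit = self._store.get(url)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]
        value = fetch(url)
        self._store[url] = (now, value)
        return value

# Demo with a stand-in fetcher that records how often it is called.
calls = []

def fake_fetch(url):
    calls.append(url)
    return {"url": url}

cache = TTLCache(ttl_seconds=60)
cache.get_or_fetch("https://example.com/album/a", fake_fetch)
cache.get_or_fetch("https://example.com/album/a", fake_fetch)
print(len(calls))
```

In the scraper you would pass `scrape_bandcamp_album` as the fetch function. The cache lives in memory, so it empties when the process exits. Persisting to disk is a separate step.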

Use Cases

Music Market Research

Track pricing trends across genres. Are artists in electronic music more likely to use "name your price" than rock artists? What's the median album price in different scenes?
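As a rough sketch of that comparison: group scraped albums by tag, then compute the median price and the share priced at zero. This assumes dicts shaped like the `scrape_bandcamp_album` output, that a minimum price of 0 means "name your price", and that all prices are in one currency. The sample rows are invented purely to show the shape.

```python
from collections import defaultdict
from statistics import median

def price_stats_by_tag(albums):
    """Per tag: album count, median price, share of albums priced at 0."""
    prices = defaultdict(list)
    for album in albums:
        price = float(album.get("price") or 0)
        for tag in album.get("tags", []):
            prices[tag].append(price)
    return {
        tag: {
            "albums": len(values),
            "median_price": median(values),
            "free_share": sum(1 for v in values if v == 0) / len(values),
        }
        for tag, values in prices.items()
    }

# Made-up sample rows, not real Bandcamp data.
sample_albums = [
    {"price": 0, "tags": ["electronic"]},
    {"price": 7.0, "tags": ["electronic", "ambient"]},
    {"price": 5.0, "tags": ["electronic"]},
    {"price": 8.0, "tags": ["rock"]},
    {"price": 10.0, "tags": ["rock"]},
]
stats = price_stats_by_tag(sample_albums)
for tag, row in sorted(stats.items()):
    print(tag, row)
```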

Artist Analytics

Monitor an artist's release cadence, tag usage, and pricing strategy over time. Useful for labels scouting talent or artists benchmarking themselves.

Playlist Curation

Build genre-specific databases by scraping tag pages, then filter by duration, release date, and tags to find tracks matching your playlist criteria.
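A filter along those lines might look like the sketch below. It assumes album dicts shaped like the `scrape_bandcamp_album` output and a release date that starts with day, month abbreviation, and year, as in the `data-tralbum` example. Real pages may append more to that string, so the parser ignores trailing tokens. All names and sample rows are invented.

```python
from datetime import date

MONTHS = {name: number for number, name in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}

def parse_release_date(text):
    """Parse '15 Jan 2026' (trailing tokens ignored); None if unparseable."""
    parts = text.split()
    try:
        return date(int(parts[2]), MONTHS[parts[1][:3].title()], int(parts[0]))
    except (IndexError, KeyError, ValueError):
        return None

def matching_tracks(albums, required_tags, min_sec, max_sec, released_after):
    """Return (artist, track title) pairs that pass every filter."""
    required = {t.lower() for t in required_tags}
    picks = []
    for album in albums:
        tags = {t.lower() for t in album.get("tags", [])}
        released = parse_release_date(album.get("release_date") or "")
        # Skip albums missing a required tag, a date, or released too early.
        if not required <= tags or released is None or released < released_after:
            continue
        for track in album.get("tracks", []):
            if min_sec <= track.get("duration_sec", 0) <= max_sec:
                picks.append((album.get("artist", ""), track.get("title", "")))
    return picks

# Made-up sample rows, not real Bandcamp data.
sample_albums = [
    {"artist": "Artist A", "release_date": "15 Jan 2026", "tags": ["Ambient", "drone"],
     "tracks": [{"title": "Short", "duration_sec": 90.0},
                {"title": "Long", "duration_sec": 600.0}]},
    {"artist": "Artist B", "release_date": "01 Mar 2019", "tags": ["ambient"],
     "tracks": [{"title": "Old", "duration_sec": 400.0}]},
]
picks = matching_tracks(sample_albums, ["ambient"], 300, 900, date(2025, 1, 1))
print(picks)
```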

Scaling Up: Using an Apify Actor

The DIY approach works for small jobs, but scaling to thousands of pages means handling retries, parallelism, and storage. The CryptoSignals Bandcamp Scraper handles this:

from apify_client import ApifyClient

client = ApifyClient("YOUR_TOKEN")

run = client.actor("cryptosignals/bandcamp-scraper").call(run_input={
    "urls": ["https://bandcamp.com/tag/synthwave"],
    "scrapeType": "tracks",
    "maxItems": 500,
})

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(f"{item['artist']} - {item['title']} ({item.get('duration', 'N/A')}s)")

Benefits over DIY: automatic retries, parallel execution, structured output, cloud storage, and scheduling.
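If you stay with the DIY route, the retry part of that list is the easiest to approximate yourself. Here is a minimal sketch of retrying with exponential backoff. `with_retries` is a hypothetical helper, and the flaky function below simulates failures so the example runs without any network access.

```python
import time

def with_retries(func, attempts=3, base_delay=1.0,
                 retry_on=(Exception,), sleep=time.sleep):
    """Call func(), retrying on retry_on with delays of base_delay * 2**n."""
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)

# Simulated flaky call: fails twice, then succeeds.
attempts_seen = []

def flaky():
    attempts_seen.append(1)
    if len(attempts_seen) < 3:
        raise ConnectionError("simulated network failure")
    return "ok"

delays = []  # record the requested delays instead of actually sleeping
result = with_retries(flaky, attempts=4, base_delay=1.0, sleep=delays.append)
print(result, delays)
```

With the scraper above you would wrap the call in a lambda, for example `with_retries(lambda: scrape_bandcamp_album(url), retry_on=(httpx.HTTPError,))`, so that only HTTP and transport errors trigger a retry.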


Bandcamp's embedded structured data makes it one of the easiest platforms to scrape reliably. Whether you build your own scraper or use a managed actor, the data is there — you just need to extract it.

Try the CryptoSignals Bandcamp Scraper on the Apify Store.
