agenthustler

Posted on Apr 9

Bandcamp Scraping: Extract Music, Artists, and Fan Data

#webdev #javascript #programming #webscraping

Bandcamp is one of the most artist-friendly music platforms on the internet, hosting millions of tracks from independent musicians across every genre imaginable. Unlike streaming giants, Bandcamp gives artists direct control over pricing and provides detailed track metadata, making it a goldmine for music data analysis, market research, and building recommendation systems.

In this guide, we'll explore how to scrape Bandcamp data — including track metadata, album information, artist pages, and fan activity — effectively and responsibly.

Understanding Bandcamp's Data Structure

Bandcamp's architecture is fundamentally different from platforms like Spotify or Apple Music. Each artist gets their own subdomain (e.g., artist.bandcamp.com), making the site feel like a network of individual storefronts rather than a monolithic platform.
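Because every artist sits on its own subdomain, a scraper usually has to decide what kind of page a URL points to before picking a parser. Here is a minimal sketch, assuming the /album/ and /track/ path prefixes used elsewhere in this guide. The function name classifyBandcampUrl is hypothetical, and artists on custom domains are not handled:

```javascript
// Hypothetical helper: classify a Bandcamp URL by its host and path.
// Assumes artist pages live on <name>.bandcamp.com and releases under /album/ or /track/.
function classifyBandcampUrl(rawUrl) {
  const { hostname, pathname } = new URL(rawUrl);
  if (!hostname.endsWith('.bandcamp.com')) {
    return { type: 'unknown', artist: null };
  }
  const artist = hostname.slice(0, -'.bandcamp.com'.length);
  if (pathname.startsWith('/album/')) return { type: 'album', artist };
  if (pathname.startsWith('/track/')) return { type: 'track', artist };
  return { type: 'artist', artist };
}

console.log(classifyBandcampUrl('https://artist.bandcamp.com/album/album-name'));
```

Anything that is not an album or track on a Bandcamp subdomain is treated as an artist page, which mirrors the check in the crawler later in this guide.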

Tracks

Individual tracks are the atomic unit of content on Bandcamp. Each track contains:

Title and duration — the track name and length
Album association — which album (if any) the track belongs to
Price — name-your-price minimum or fixed price
Streaming URL — a temporary URL for the audio preview
Lyrics — full lyrics if the artist has provided them
Credits — musician credits and production details
Tags/genres — artist-applied genre tags
Play count and fan count — engagement metrics
Release date — when the track was published

Albums

Albums group tracks together and add additional metadata:

Album title and artist
Track listing — ordered list of all tracks
Cover art URL — the album artwork
Total price — the album's price (often discounted vs. individual tracks)
Release date — the official release date
About/description — artist's notes about the album
Tags — genre and descriptive tags
UPC/catalog number — if provided by the artist

Artist Pages

Artist pages serve as storefronts with rich information:

Artist/band name and location
Bio/description — the artist's story
Discography — all albums and singles
Merch listings — physical products for sale
Shows/tour dates — upcoming performances
Links — social media and website URLs
Fan count — total followers

Fan Profiles

Bandcamp fans (buyers/followers) have public profiles showing:

Username and avatar
Collection — music they've purchased
Wishlist — music they want to buy
Following — artists and labels they follow
Fan-since date — when they joined

Why Scrape Bandcamp?

Bandcamp data has numerous practical applications:

Music Discovery: Build recommendation engines based on genre tags and fan overlap
Market Research: Analyze pricing strategies across genres and regions
Trend Spotting: Identify emerging genres and artists before they break out
Academic Research: Study independent music economics and distribution patterns
Label Scouting: Find promising unsigned artists based on engagement metrics
Data Journalism: Report on the state of independent music

Bandcamp's Technical Architecture

Bandcamp uses a relatively straightforward server-rendered architecture. Most pages are delivered as complete HTML with embedded JSON data, which makes scraping more reliable than on heavily JavaScript-dependent sites.

Embedded JSON Data (TralbumData)

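One of the most valuable aspects of Bandcamp's architecture for scrapers is that each album and track page embeds structured data directly in the HTML. Look for the data-tralbum attribute on a script tag in the page source. Once decoded, its JSON has roughly this shape (abbreviated here and written as a JavaScript object):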

// This JSON is embedded right in the HTML!
const tralbumData = {
  current: {
    title: "Album Title",
    artist: "Artist Name",
    release_date: "01 Mar 2026 00:00:00 GMT",
    minimum_price: 7.00,
    art_id: 1234567890,
  },
  trackinfo: [
    {
      title: "Track One",
      duration: 234.5,
      file: { "mp3-128": "https://..." },
      track_num: 1,
    },
    // ... more tracks
  ],
  url: "https://artist.bandcamp.com/album/album-name",
};

This is extremely convenient — you don't need to make separate API calls or parse complex DOM structures. The data is right there in the page source.
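In raw page source the attribute value is typically HTML-escaped, so quotes appear as &quot;. Cheerio decodes this for you, but if you work with the raw HTML string you need to decode it yourself. A minimal sketch, assuming the attribute is double-quoted and only uses the basic entities listed below (the helper names are hypothetical):

```javascript
// Decode the handful of HTML entities that typically appear in an escaped JSON attribute.
// &amp; is decoded last so that "&amp;quot;" becomes the literal text "&quot;".
function decodeEntities(text) {
  return text
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&amp;/g, '&');
}

// Hypothetical helper: pull the data-tralbum JSON out of a raw HTML string.
// Assumes the attribute value is double-quoted; returns null when absent or unparsable.
function extractTralbum(html) {
  const match = html.match(/data-tralbum="([^"]*)"/);
  if (!match) return null;
  try {
    return JSON.parse(decodeEntities(match[1]));
  } catch {
    return null;
  }
}

const sampleHtml =
  '<script data-tralbum="{&quot;current&quot;:{&quot;title&quot;:&quot;Album Title&quot;},&quot;trackinfo&quot;:[]}"></script>';
const parsed = extractTralbum(sampleHtml);
console.log(parsed.current.title); // prints "Album Title"
```

For anything beyond a quick check, a real HTML parser such as cheerio is the safer choice, since it handles every entity and quoting style.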

The Bandcamp API (Internal)

Bandcamp also has internal API endpoints used by its frontend. Some useful ones include:

https://bandcamp.com/api/discover/3/get_web
https://bandcamp.com/api/bcweekly/3/list
https://bandcamp.com/api/fancollection/1/collection_items

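These endpoints return JSON and accept parameters for filtering and pagination. Because they are internal rather than a documented public API, their parameters and response shapes may change without notice.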

Building a Bandcamp Track Scraper

Let's build a scraper that extracts track and album data from Bandcamp artist pages.

Setting Up the Project

mkdir bandcamp-scraper && cd bandcamp-scraper
npm init -y
npm install crawlee cheerio

Extracting Album Data

const { CheerioCrawler, Dataset } = require('crawlee');

const crawler = new CheerioCrawler({
  maxRequestsPerCrawl: 50,
  async requestHandler({ $, request, log, enqueueLinks }) {
    const url = request.url;

    // If this is an artist page, find all album links
    if (!url.includes('/album/') && !url.includes('/track/')) {
      log.info(`Scanning artist page: ${url}`);

      // Enqueue all album links
      await enqueueLinks({
        selector: 'a[href*="/album/"]',
        baseUrl: url,
      });
      return;
    }

    // This is an album page — extract the embedded data
    log.info(`Scraping album: ${url}`);

    // Find the embedded TralbumData (cheerio returns the attribute already entity-decoded)
    let raw = $('script[data-tralbum]').attr('data-tralbum');
    if (!raw) {
      // Fallback: look for an inline "TralbumData = {...};" assignment
      for (const script of $('script').toArray()) {
        const text = $(script).html() || '';
        const match = text.match(/TralbumData\s*=\s*({.*?});/s);
        if (match) {
          raw = match[1];
          break;
        }
      }
    }
    if (!raw) {
      log.warning(`No TralbumData found on ${url}`);
      return;
    }

    let data;
    try {
      // The inline fallback may be a JS object literal rather than strict JSON
      data = JSON.parse(raw);
    } catch (err) {
      log.warning(`Could not parse TralbumData on ${url}: ${err.message}`);
      return;
    }
    await processAlbumData(data, url);
  },
});

async function processAlbumData(data, url) {
  const album = {
    url,
    title: data.current?.title,
    artist: data.current?.artist || data.artist,
    releaseDate: data.current?.release_date,
    minimumPrice: data.current?.minimum_price,
    currency: data.current?.currency,
    about: data.current?.about,
    tracks: (data.trackinfo || []).map(track => ({
      title: track.title,
      duration: track.duration,
      trackNumber: track.track_num,
      hasLyrics: !!track.has_lyrics,
      isStreamable: !!track.file,
    })),
    tags: data.current?.tags || [],
    trackCount: data.trackinfo?.length || 0,
    scrapedAt: new Date().toISOString(),
  };

  await Dataset.pushData(album);
}

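// Run starting from an artist page.
// Top-level await is not available in CommonJS (require), so handle the returned promise.
crawler.run(['https://artist.bandcamp.com/']).catch((err) => {
  console.error(err);
  process.exit(1);
});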
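The raw fields stored by processAlbumData, such as a release date like "01 Mar 2026 00:00:00 GMT" and durations in seconds, are easier to analyze once normalized. A minimal sketch of a post-processing step follows; the helper names are hypothetical, and the date conversion relies on Date.parse accepting this format, which it does in Node.js:

```javascript
// Format a duration in seconds as m:ss (fractions of a second are rounded).
function formatDuration(seconds) {
  const total = Math.round(seconds);
  const mins = Math.floor(total / 60);
  const secs = String(total % 60).padStart(2, '0');
  return `${mins}:${secs}`;
}

// Convert a date string such as "01 Mar 2026 00:00:00 GMT" to ISO 8601, or null if unparsable.
function toIsoDate(value) {
  const time = Date.parse(value);
  return Number.isNaN(time) ? null : new Date(time).toISOString();
}

// Summarize one album record in the shape produced by processAlbumData.
function summarizeAlbum(album) {
  const tracks = album.tracks || [];
  const totalSeconds = tracks.reduce((sum, t) => sum + (t.duration || 0), 0);
  return {
    title: album.title,
    releaseDateIso: toIsoDate(album.releaseDate),
    totalDuration: formatDuration(totalSeconds),
  };
}

console.log(formatDuration(234.5)); // prints "3:55"
```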

Scraping Artist Profile Details

async function scrapeArtistPage($, url) {
  const bandData = {};

  // Extract basic info
  bandData.name = $('#band-name-location .title').text().trim();
  bandData.location = $('#band-name-location .location').text().trim();
  bandData.bio = $('meta[property="og:description"]').attr('content') || '';
  bandData.imageUrl = $('img.band-photo').attr('src') || null;
  bandData.url = url;

  // Extract discography links
  bandData.albums = [];
  $('#music-grid .music-grid-item').each((i, el) => {
    const $el = $(el);
    const href = $el.find('a').attr('href');
    if (!href) return; // skip grid items without a link
    bandData.albums.push({
      title: $el.find('.title').text().trim(),
      url: new URL(href, url).toString(),
      artUrl: $el.find('img').attr('src') || null,
    });
  });

  // Extract links
  bandData.links = [];
  $('#band-links a').each((i, el) => {
    bandData.links.push({
      text: $(el).text().trim(),
      url: $(el).attr('href'),
    });
  });

  return bandData;
}

Extracting Fan Collection Data

Fan collections reveal purchasing patterns and taste profiles:

async function scrapeFanCollection(fanUrl) {
  const { CheerioCrawler } = require('crawlee');
  const collections = [];

  const crawler = new CheerioCrawler({
    async requestHandler({ $, request, log }) {
      log.info(`Scraping fan page: ${request.url}`);

      // Extract collection items
      const itemsData = $('div[data-blob]').attr('data-blob');
      if (itemsData) {
        const blob = JSON.parse(itemsData);
        const items = blob.item_cache || {};

        Object.values(items).forEach(item => {
          collections.push({
            type: item.tralbum_type === 'a' ? 'album' : 'track',
            title: item.album_title || item.title,
            artist: item.band_name,
            purchaseDate: item.purchased,
            itemUrl: item.item_url,
            artId: item.art_id,
          });
        });
      }

      // Extract fan info (used for logging only)
      const fanName = $('#fan-name').text().trim();
      const fanSince = $('.fan-since').text().trim();

      log.info(`Fan: ${fanName} (${fanSince}), Collection: ${collections.length} items`);
    },
  });

  await crawler.run([fanUrl]);
  return collections;
}
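To turn a scraped collection into a rough taste profile, one option is to count items per artist. A minimal sketch, assuming items have the shape returned by scrapeFanCollection above (the helper name topArtists is hypothetical):

```javascript
// Hypothetical helper: count collection items per artist, most collected first.
// Ties are broken alphabetically so the output order is stable.
function topArtists(collection, limit = 5) {
  const counts = new Map();
  for (const item of collection) {
    if (!item.artist) continue; // ignore items without an artist name
    counts.set(item.artist, (counts.get(item.artist) || 0) + 1);
  }
  return [...counts.entries()]
    .map(([artist, count]) => ({ artist, count }))
    .sort((a, b) => b.count - a.count || a.artist.localeCompare(b.artist))
    .slice(0, limit);
}

const sampleCollection = [
  { artist: 'Artist A', title: 'One' },
  { artist: 'Artist B', title: 'Two' },
  { artist: 'Artist A', title: 'Three' },
];
console.log(topArtists(sampleCollection));
```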

Scraping Bandcamp Discover and Tags

Bandcamp's discover page and tag system are excellent for trend analysis:

async function scrapeDiscoverPage(genre, subgenre = null) {
  const params = new URLSearchParams({
    g: genre,          // e.g., 'electronic', 'rock', 'hip-hop-rap'
    t: 'top',          // 'top', 'new', 'rec'
    f: 'all',          // format: 'all', 'digital', 'vinyl'
    w: 0,              // time window: 0=all, 1=past week, 2=past month
    p: 0,              // page number
  });

  if (subgenre) {
    params.set('s', subgenre);
  }

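  // Uses the global fetch available in Node.js 18+
  const response = await fetch(
    `https://bandcamp.com/api/discover/3/get_web?${params}`
  );
  if (!response.ok) {
    throw new Error(`Discover request failed: HTTP ${response.status}`);
  }
  const data = await response.json();

  // Fall back to an empty list if the response has no items array
  return (data.items || []).map(item => ({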
    title: item.primary_text,
    artist: item.secondary_text,
    genre: item.genre_text,
    url: item.tralbum_url,
    artUrl: `https://f4.bcbits.com/img/a${item.art_id}_16.jpg`,
    featuredDate: item.featured_date_s,
  }));
}
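scrapeDiscoverPage above only requests page 0. To walk several pages, one approach is a small loop that stops at the first empty page or at a page cap. This is a sketch under two assumptions that are not verified here: that p is a zero-based page index, and that the endpoint returns an empty items list past the last page. The collectPages helper is hypothetical, and the page fetcher is passed in so it can be swapped out:

```javascript
// Hypothetical helper: collect items page by page until a page comes back empty
// or maxPages is reached. fetchPage(page) must resolve to an array of items.
async function collectPages(fetchPage, { maxPages = 5, delayMs = 2000 } = {}) {
  const all = [];
  for (let page = 0; page < maxPages; page++) {
    const items = await fetchPage(page);
    if (!items || items.length === 0) break;
    all.push(...items);
    // Pause between requests to stay polite
    if (page < maxPages - 1) {
      await new Promise(resolve => setTimeout(resolve, delayMs));
    }
  }
  return all;
}

// Demo with an in-memory stand-in for the network call
const fakePages = [['a', 'b'], ['c'], []];
const demoRun = collectPages(page => Promise.resolve(fakePages[page]), { delayMs: 0 });
demoRun.then(items => console.log(items.length));
```

In real use, fetchPage would wrap a variant of scrapeDiscoverPage that accepts the page number and sets the p parameter accordingly.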

Tag-Based Scraping

Bandcamp's tag pages group music by genre and descriptive tags:

async function scrapeTagPage($, tagUrl) {
  const results = [];

  // Extract albums from the tag page
  $('.item_list .item').each((i, el) => {
    const $el = $(el);
    results.push({
      title: $el.find('.itemtext').text().trim(),
      artist: $el.find('.itemsubtext').text().trim(),
      url: $el.find('a').attr('href'),
      artUrl: $el.find('img').attr('src'),
    });
  });

  // Get related tags
  const relatedTags = [];
  $('.tags_cloud a').each((i, el) => {
    relatedTags.push({
      tag: $(el).text().trim(),
      url: $(el).attr('href'),
    });
  });

  return { results, relatedTags };
}

Scaling with Apify

While building your own Bandcamp scraper is educational, running it at scale requires infrastructure for proxy management, scheduling, and result storage. Apify provides all of this out of the box.

Why Use Apify for Bandcamp Scraping?

Cloud execution — no need to keep your machine running
Proxy management — automatic IP rotation to avoid rate limiting
Data storage — built-in datasets with export to JSON, CSV, Excel
Scheduling — run scrapers hourly, daily, or weekly
Monitoring — get alerts when scrapers fail
Pay-per-result — cost-effective pricing model

Using Bandcamp Scrapers from the Apify Store

The Apify Store offers pre-built scrapers for various music platforms. These actors handle the complexities of scraping — pagination, rate limiting, proxy rotation — so you can focus on your analysis.

To get started:

Sign up at apify.com
Search the Store for music and Bandcamp-related actors
Configure inputs — specify artist URLs, genres, or search terms
Run and download — execute in the cloud and get structured results

Running Bandcamp Scrapers via the API

const { ApifyClient } = require('apify-client');

const client = new ApifyClient({
  token: 'YOUR_APIFY_TOKEN',
});

async function scrapeBandcampArtist(artistUrl) {
  const run = await client.actor('YOUR_ACTOR_ID').call({
    startUrls: [{ url: artistUrl }],
    maxAlbums: 50,
    includeTracks: true,
    includeFans: false,
    proxy: {
      useApifyProxy: true,
    },
  });

  const { items } = await client.dataset(run.defaultDatasetId).listItems();
  console.log(`Scraped ${items.length} albums/tracks`);
  return items;
}

// Usage (log failures instead of leaving the promise rejection unhandled)
scrapeBandcampArtist('https://artist.bandcamp.com/').catch(console.error);

Building a Genre Trend Monitor

Combine Apify scheduling with Bandcamp's discover API to track genre trends over time:

const { Actor } = require('apify');

Actor.main(async () => {
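  // getInput() resolves to null when the actor is run without input
  const input = (await Actor.getInput()) ?? {};
  const { genres = ['electronic', 'indie', 'hip-hop-rap'] } = input;

  const results = [];

  for (const genre of genres) {
    const response = await fetch(
      `https://bandcamp.com/api/discover/3/get_web?g=${encodeURIComponent(genre)}&t=top&f=all&w=1&p=0`
    );
    if (!response.ok) {
      console.warn(`Skipping ${genre}: HTTP ${response.status}`);
      continue;
    }
    const data = await response.json();

    (data.items || []).forEach(item => {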
      results.push({
        genre,
        title: item.primary_text,
        artist: item.secondary_text,
        url: item.tralbum_url,
        featuredDate: item.featured_date_s,
        scrapedAt: new Date().toISOString(),
      });
    });
  }

  await Actor.pushData(results);
  console.log(`Tracked ${results.length} trending items across ${genres.length} genres`);
});
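Tracking trends over time means comparing one scheduled run against the previous one. A minimal sketch that diffs two snapshots by URL, assuming each snapshot is an array of the result objects pushed above (the helper name diffSnapshots is hypothetical):

```javascript
// Hypothetical helper: compare two snapshots of discover results, keyed by URL.
// "entered" lists items that are new in the current run; "dropped" lists items that disappeared.
function diffSnapshots(previous, current) {
  const prevUrls = new Set(previous.map(item => item.url));
  const currUrls = new Set(current.map(item => item.url));
  return {
    entered: current.filter(item => !prevUrls.has(item.url)),
    dropped: previous.filter(item => !currUrls.has(item.url)),
  };
}

const lastWeek = [{ url: 'https://a.bandcamp.com/album/x' }, { url: 'https://b.bandcamp.com/album/y' }];
const thisWeek = [{ url: 'https://b.bandcamp.com/album/y' }, { url: 'https://c.bandcamp.com/album/z' }];
const changes = diffSnapshots(lastWeek, thisWeek);
console.log(changes.entered.length, changes.dropped.length); // prints "1 1"
```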

Data Analysis: What Can You Do With Bandcamp Data?

Pricing Analysis

function analyzePricing(albums) {
  const priced = albums.filter(a => a.minimumPrice > 0);
  const nameYourPrice = albums.filter(a => a.minimumPrice === 0);

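  // Guard against an empty list so the average is 0 rather than NaN
  const avgPrice = priced.length
    ? priced.reduce((sum, a) => sum + a.minimumPrice, 0) / priced.length
    : 0;

  const priceByGenre = {};
  albums.forEach(album => {
    // Skip albums with no numeric price so they do not turn genre averages into NaN
    if (typeof album.minimumPrice !== 'number') return;
    (album.tags || []).forEach(tag => {
      if (!priceByGenre[tag]) priceByGenre[tag] = [];
      priceByGenre[tag].push(album.minimumPrice);
    });
  });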

  const avgByGenre = {};
  Object.entries(priceByGenre).forEach(([genre, prices]) => {
    avgByGenre[genre] = prices.reduce((a, b) => a + b, 0) / prices.length;
  });

  return {
    totalAlbums: albums.length,
    pricedAlbums: priced.length,
    nameYourPriceAlbums: nameYourPrice.length,
    averagePrice: avgPrice.toFixed(2),
    averagePriceByGenre: avgByGenre,
  };
}
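Averages are easily skewed by a few expensive releases such as box sets, so a median can be a useful companion figure. A minimal sketch, assuming the same album shape as analyzePricing and ignoring name-your-price albums (the helper name medianPrice is hypothetical):

```javascript
// Hypothetical helper: median of the positive minimum prices, or null if there are none.
function medianPrice(albums) {
  const prices = albums
    .map(a => a.minimumPrice)
    .filter(p => typeof p === 'number' && p > 0)
    .sort((a, b) => a - b);
  if (prices.length === 0) return null;
  const mid = Math.floor(prices.length / 2);
  // Odd count: middle value. Even count: mean of the two middle values.
  return prices.length % 2 ? prices[mid] : (prices[mid - 1] + prices[mid]) / 2;
}

console.log(medianPrice([{ minimumPrice: 7 }, { minimumPrice: 0 }, { minimumPrice: 5 }, { minimumPrice: 100 }])); // prints 7
```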

Fan Overlap Analysis

function findFanOverlap(artist1Fans, artist2Fans) {
  const set1 = new Set(artist1Fans.map(f => f.username));
  const overlap = artist2Fans.filter(f => set1.has(f.username));

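  // Overlap is measured against the smaller fan base; avoid dividing by zero
  const smaller = Math.min(artist1Fans.length, artist2Fans.length);

  return {
    artist1FanCount: artist1Fans.length,
    artist2FanCount: artist2Fans.length,
    overlapCount: overlap.length,
    overlapPercentage: (smaller ? (overlap.length / smaller) * 100 : 0).toFixed(1),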
    sharedFans: overlap.map(f => f.username),
  };
}
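findFanOverlap normalizes by the smaller fan base, which can overstate similarity when one artist has very few fans. An alternative measure is Jaccard similarity, the shared fans divided by all distinct fans of either artist. A minimal sketch, assuming the same fan objects with a username field (the helper name is hypothetical):

```javascript
// Hypothetical helper: Jaccard similarity of two fan lists, from 0 (disjoint) to 1 (identical).
function jaccardSimilarity(artist1Fans, artist2Fans) {
  const set1 = new Set(artist1Fans.map(f => f.username));
  const set2 = new Set(artist2Fans.map(f => f.username));
  let shared = 0;
  for (const name of set1) {
    if (set2.has(name)) shared++;
  }
  const unionSize = set1.size + set2.size - shared;
  return unionSize === 0 ? 0 : shared / unionSize;
}

const fansA = [{ username: 'ann' }, { username: 'bob' }, { username: 'cy' }];
const fansB = [{ username: 'bob' }, { username: 'cy' }, { username: 'dee' }];
console.log(jaccardSimilarity(fansA, fansB)); // prints 0.5
```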

Best Practices and Ethics

When scraping Bandcamp, follow these guidelines:

Respect rate limits: Bandcamp is an independent platform — don't hammer their servers. Add delays of 2-5 seconds between requests.
Check robots.txt: Review and respect Bandcamp's crawling directives.
Don't scrape audio files: Downloading music without permission is piracy. Scrape metadata only.
Respect artist privacy: Some artists may not want their data aggregated. Be mindful of how you use and share the data.
Support the artists: If you find music you like through your data analysis, buy it! Bandcamp pays artists directly.
Cache responsibly: Store results locally to minimize repeat requests.
Comply with regulations: Follow GDPR/CCPA when handling fan data.
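The delay and caching guidelines above can be combined in one small wrapper. This is a sketch, not a complete rate limiter: the names are hypothetical, the cache is in-memory only, and the request function is passed in so any HTTP client can be used:

```javascript
// Pick a random delay in milliseconds, at least minMs and below maxMs.
function randomDelayMs(minMs = 2000, maxMs = 5000) {
  return minMs + Math.random() * (maxMs - minMs);
}

const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

// Hypothetical wrapper: serve repeat URLs from an in-memory cache and
// wait a random 2-5 seconds (by default) before each real request.
function createPoliteFetcher(fetchFn, { minMs = 2000, maxMs = 5000 } = {}) {
  const cache = new Map();
  return async function politeFetch(url) {
    if (cache.has(url)) return cache.get(url);
    await sleep(randomDelayMs(minMs, maxMs));
    const result = await fetchFn(url);
    cache.set(url, result);
    return result;
  };
}
```

A typical use would be createPoliteFetcher(url => fetch(url).then(r => r.text())), after which repeated calls for the same URL reach the server only once.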

Conclusion

Bandcamp is a uniquely scraper-friendly platform thanks to its clean architecture and embedded JSON data. Whether you're building a music recommendation engine, analyzing pricing trends in independent music, or scouting for emerging artists, Bandcamp data provides rich, structured insights.

For production scraping at scale, the Apify platform handles the infrastructure challenges — proxy rotation, cloud execution, scheduling, and data export — letting you focus on the analysis and insights that matter.

The independent music ecosystem is a vibrant, data-rich space. With the right scraping tools and ethical practices, you can unlock powerful insights that benefit artists, labels, and music lovers alike.

Explore the Apify Store for ready-to-use music and web scraping actors that handle the infrastructure complexity for you.

DEV Community