Web scraping Spotify data opens up incredible possibilities for music analysts, playlist curators, market researchers, and developers building music-related applications. Whether you're tracking chart positions, analyzing artist popularity trends, or building recommendation engines, having access to structured Spotify data is invaluable.
In this comprehensive guide, we'll explore different approaches to extracting Spotify data — from the official Web API to web scraping techniques — and show you how to collect artist profiles, playlist data, music charts, and track metadata at scale using Apify.
Why Scrape Spotify Data?
Spotify hosts over 100 million tracks and has more than 600 million monthly active users. The platform exposes a large amount of publicly visible data that's useful for:
- Music industry research: Track which genres are trending, which artists are gaining traction, and how listening habits shift over time
- Playlist analysis: Understand what makes popular playlists successful — track ordering, genre mix, tempo progression
- Artist competitive intelligence: Monitor how artists compare in terms of monthly listeners, follower growth, and playlist placements
- Market research: Identify emerging artists before they break through by tracking growth metrics
- Academic research: Study music consumption patterns, cultural trends, and recommendation algorithm behavior
Spotify Web API vs Web Scraping
Before diving into scraping, it's important to understand what the official Spotify Web API offers and where scraping fills the gaps.
Spotify Web API
The Spotify Web API provides authenticated access to a rich set of endpoints:
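// Example: Fetching an artist's top tracks via the API
// Node 18+ ships a global fetch. On older versions, install node-fetch v2 and
// uncomment the next line (node-fetch v3 is ESM-only and cannot be require()d).
// const fetch = require('node-fetch');
async function getArtistTopTracks(artistId, token) {
  const response = await fetch(
    `https://api.spotify.com/v1/artists/${artistId}/top-tracks?market=US`,
    {
      headers: { 'Authorization': `Bearer ${token}` }
    }
  );
  // Surface HTTP errors (expired token, rate limit) instead of failing on data.tracks
  if (!response.ok) {
    throw new Error(`Spotify API request failed: ${response.status}`);
  }
  const data = await response.json();
  // Keep only the fields needed for analysis
  return data.tracks.map(track => ({
    name: track.name,
    popularity: track.popularity,
    album: track.album.name,
    duration_ms: track.duration_ms,
    preview_url: track.preview_url
  }));
}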
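The function above takes a `token` argument without showing where it comes from. One common option for app-level (non-user) data is the Client Credentials flow. The sketch below assumes Node 18+ (global `fetch` and `Buffer`); `buildTokenRequest` and `getAppToken` are hypothetical helper names, and the client ID and secret are the ones issued when you register an app:

```javascript
// Builds the HTTP request for Spotify's Client Credentials flow.
// buildTokenRequest is a hypothetical helper, not part of any SDK.
function buildTokenRequest(clientId, clientSecret) {
  const basic = Buffer.from(`${clientId}:${clientSecret}`).toString('base64');
  return {
    url: 'https://accounts.spotify.com/api/token',
    options: {
      method: 'POST',
      headers: {
        Authorization: `Basic ${basic}`,
        'Content-Type': 'application/x-www-form-urlencoded'
      },
      body: 'grant_type=client_credentials'
    }
  };
}

// Exchanges app credentials for a bearer token.
// fetchImpl is injectable so the function can be exercised without network access.
async function getAppToken(clientId, clientSecret, fetchImpl = fetch) {
  const { url, options } = buildTokenRequest(clientId, clientSecret);
  const response = await fetchImpl(url, options);
  if (!response.ok) {
    throw new Error(`Token request failed: ${response.status}`);
  }
  const { access_token } = await response.json();
  return access_token;
}
```

The returned token can be passed straight to `getArtistTopTracks`. Tokens expire, so a long-running job would need to request a fresh one when the API starts answering 401.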
Pros of the API:
- Official, well-documented endpoints
- Audio features data (danceability, energy, valence)
- Clean JSON responses
- Real-time data
Limitations of the API:
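- Rate limits (no fixed number is published; Spotify counts calls over a rolling 30-second window and responds with HTTP 429 and a Retry-After header when the limit is exceeded)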
- Requires OAuth authentication and app registration
- No access to historical chart data
- Limited to 50 items per request in most endpoints
- No follower count history or growth data
- Playlist follower counts not available for all playlists
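The rate-limit item above means an API client needs some retry logic. When a request is rate limited, the Web API answers with HTTP 429 and typically a Retry-After header giving the wait in seconds. A minimal sketch of honoring that header, with exponential backoff as a fallback; `retryDelayMs` and `fetchWithRetry` are hypothetical helper names, and the default of three retries is an arbitrary assumption:

```javascript
// Computes how long to wait before retrying a rate-limited request.
// Prefers the Retry-After header (in seconds); otherwise backs off exponentially.
function retryDelayMs(retryAfterHeader, attempt, baseMs = 1000) {
  const hasHeader = retryAfterHeader != null && retryAfterHeader !== '';
  const seconds = Number(retryAfterHeader);
  if (hasHeader && Number.isFinite(seconds) && seconds >= 0) {
    return seconds * 1000;
  }
  return baseMs * 2 ** attempt;
}

// Repeats a request while the server answers 429, up to maxRetries extra attempts.
// fetchImpl and sleep are injectable so the loop can be tested offline.
async function fetchWithRetry(url, options = {}, {
  maxRetries = 3,
  fetchImpl = fetch,
  sleep = ms => new Promise(resolve => setTimeout(resolve, ms))
} = {}) {
  for (let attempt = 0; ; attempt++) {
    const response = await fetchImpl(url, options);
    if (response.status !== 429 || attempt >= maxRetries) return response;
    await sleep(retryDelayMs(response.headers.get('retry-after'), attempt));
  }
}
```

Swapping `fetch` for `fetchWithRetry` in an API call keeps the calling code unchanged while making it tolerant of short bursts over the limit.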
Web Scraping Approach
Web scraping complements the API by accessing data that's publicly visible on Spotify's web player but not available through official endpoints:
- Historical chart positions and chart archives
- Playlist follower counts displayed on the web player
- "Discovered on" playlist data for artists
- Monthly listener counts over time
- "Fans also like" relationship data at scale
- Concert/event listing data
Setting Up Your Spotify Scraping Environment
Let's start with a practical scraping setup. We'll use Node.js with Playwright for browser-based scraping since Spotify's web player is a JavaScript-heavy single-page application.
// spotify-scraper.js
const { chromium } = require('playwright');
class SpotifyScraper {
constructor() {
this.browser = null;
this.page = null;
this.baseUrl = 'https://open.spotify.com';
}
async init() {
this.browser = await chromium.launch({
headless: true,
args: ['--no-sandbox', '--disable-setuid-sandbox']
});
this.page = await this.browser.newPage();
await this.page.setViewportSize({ width: 1280, height: 720 });
// Request English-language pages so scraped labels and number formats stay consistent
await this.page.setExtraHTTPHeaders({
'Accept-Language': 'en-US,en;q=0.9'
});
}
async close() {
if (this.browser) await this.browser.close();
}
}
Extracting Artist Profile Data
Artist profiles on Spotify contain rich data. Here's how to extract comprehensive artist information:
async function scrapeArtistProfile(scraper, artistUrl) {
await scraper.page.goto(artistUrl, { waitUntil: 'networkidle' });
// Wait for the main content to load
await scraper.page.waitForSelector('[data-testid="entityTitle"]', {
timeout: 15000
});
const artistData = await scraper.page.evaluate(() => {
const getName = () => {
const el = document.querySelector('[data-testid="entityTitle"] h1');
return el ? el.textContent.trim() : null;
};
const getMonthlyListeners = () => {
const el = document.querySelector(
'[data-testid="monthly-listeners-label"]'
);
if (!el) return null;
const text = el.textContent.trim();
const match = text.match(/([\d,]+)/);
return match ? parseInt(match[1].replace(/,/g, '')) : null;
};
const getTopTracks = () => {
const tracks = [];
const rows = document.querySelectorAll(
'[data-testid="top-tracks-list"] [data-testid="tracklist-row"]'
);
rows.forEach((row, index) => {
const title = row.querySelector(
'[data-testid="internal-track-link"]'
);
const playCount = row.querySelector(
'span[data-testid="playcount"]'
);
tracks.push({
position: index + 1,
title: title ? title.textContent.trim() : '',
playCount: playCount ? playCount.textContent.trim() : ''
});
});
return tracks;
};
return {
name: getName(),
monthlyListeners: getMonthlyListeners(),
topTracks: getTopTracks(),
scrapedAt: new Date().toISOString()
};
});
return artistData;
}
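The scraper above returns play counts as display text, and the monthly-listeners regex only handles comma-grouped digits. If the page ever shows abbreviated values, a small normalizer helps. This is a sketch that assumes English (en-US) number formatting and the suffixes K, M, and B; `parseCount` is a hypothetical helper name:

```javascript
// Converts display strings such as "1,234,567", "12.5K" or
// "3.2M monthly listeners" into integers. Returns null when no number is found.
function parseCount(text) {
  if (typeof text !== 'string') return null;
  // The \b after the suffix stops the "m" in "monthly" from being read as "M".
  const match = text
    .replace(/,/g, '')
    .match(/(\d+(?:\.\d+)?)(?:\s*([KMB])\b)?/i);
  if (!match) return null;
  const multipliers = { K: 1e3, M: 1e6, B: 1e9 };
  const factor = match[2] ? multipliers[match[2].toUpperCase()] : 1;
  return Math.round(parseFloat(match[1]) * factor);
}
```

Applying it to each `playCount` value after scraping keeps the browser-side extraction simple and moves parsing into code that is easy to unit test.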
Extracting Related Artists
The "Fans also like" section provides valuable data for building artist relationship graphs:
async function scrapeRelatedArtists(scraper, artistUrl) {
await scraper.page.goto(`${artistUrl}/related`, {
waitUntil: 'networkidle'
});
await scraper.page.waitForSelector('[data-testid="grid-container"]', {
timeout: 10000
});
const relatedArtists = await scraper.page.evaluate(() => {
const artists = [];
const cards = document.querySelectorAll(
'[data-testid="grid-container"] [data-testid="card"]'
);
cards.forEach(card => {
const nameEl = card.querySelector('a[href*="/artist/"]');
const link = nameEl ? nameEl.getAttribute('href') : '';
const name = nameEl ? nameEl.textContent.trim() : '';
const artistId = link.split('/artist/')[1];
artists.push({ name, artistId, profileUrl: link });
});
return artists;
});
return relatedArtists;
}
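Splitting an href on `/artist/` works for clean paths, but a link that carries a query string or a locale prefix would produce a polluted ID. A more defensive sketch follows; `parseSpotifyUrl` is a hypothetical helper, and the locale-prefix and `?si=` cases are assumptions about link shapes you may encounter:

```javascript
// Extracts the entity type and ID from a Spotify URL or relative path.
// Tolerates relative hrefs ("/artist/ID"), a path prefix before the entity
// segment ("/intl-de/artist/ID") and query strings ("?si=...").
function parseSpotifyUrl(href) {
  let pathname;
  try {
    // The base URL is only used when href is relative.
    pathname = new URL(href, 'https://open.spotify.com').pathname;
  } catch {
    return null;
  }
  const match = pathname.match(/\/(artist|album|playlist|track)\/([A-Za-z0-9]+)/);
  return match ? { type: match[1], id: match[2] } : null;
}
```

Using the returned `id` as the node key gives a stable identifier when building a graph from "Fans also like" results.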
Scraping Playlist Data
Playlists are one of Spotify's most valuable data sources. Popular playlists can make or break an artist's career, and understanding playlist composition is crucial for music marketing.
async function scrapePlaylist(scraper, playlistUrl) {
await scraper.page.goto(playlistUrl, { waitUntil: 'networkidle' });
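// Scroll to load more tracks. Spotify virtualizes long track lists, so rows
// scrolled far out of view may be dropped from the DOM; for long playlists,
// collect rows after each scroll step rather than once at the end.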
async function scrollToLoadAll() {
let previousHeight = 0;
let attempts = 0;
const maxAttempts = 50;
while (attempts < maxAttempts) {
const currentHeight = await scraper.page.evaluate(
() => document.documentElement.scrollHeight
);
if (currentHeight === previousHeight) break;
previousHeight = currentHeight;
await scraper.page.evaluate(
() => window.scrollTo(0, document.documentElement.scrollHeight)
);
await scraper.page.waitForTimeout(1500);
attempts++;
}
}
await scrollToLoadAll();
const playlistData = await scraper.page.evaluate(() => {
const titleEl = document.querySelector(
'[data-testid="entityTitle"] h1'
);
const descEl = document.querySelector(
'[data-testid="playlist-description"]'
);
const followerEl = document.querySelector(
'span[data-testid="annotated-follower-count"]'
);
const tracks = [];
const rows = document.querySelectorAll(
'[data-testid="tracklist-row"]'
);
rows.forEach((row, index) => {
const trackLink = row.querySelector(
'[data-testid="internal-track-link"]'
);
const artistLinks = row.querySelectorAll(
'a[href*="/artist/"]'
);
const durationEl = row.querySelector(
'[data-testid="tracklist-duration"]'
);
const artists = [];
artistLinks.forEach(a => {
artists.push({
name: a.textContent.trim(),
id: a.getAttribute('href').split('/artist/')[1]
});
});
tracks.push({
position: index + 1,
title: trackLink ? trackLink.textContent.trim() : '',
artists: artists,
duration: durationEl ? durationEl.textContent.trim() : ''
});
});
return {
title: titleEl ? titleEl.textContent.trim() : '',
description: descEl ? descEl.textContent.trim() : '',
followerCount: followerEl ? followerEl.textContent.trim() : '',
trackCount: tracks.length,
tracks: tracks
};
});
return playlistData;
}
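With a virtualized list, only the rows near the viewport exist in the DOM at any moment, so a single extraction after scrolling can miss earlier rows. One way around this is to extract the visible rows after every scroll step and merge them. The sketch below shows only the merge logic; it assumes each row object carries a stable `position` taken from the row itself (for example its displayed track number) rather than from its index in the DOM, and both function names are hypothetical:

```javascript
// Merges a batch of currently visible rows into a Map keyed by position,
// so rows that the virtualized list later removes are not lost.
function mergeVisibleRows(collected, visibleRows) {
  for (const row of visibleRows) {
    if (!collected.has(row.position)) {
      collected.set(row.position, row);
    }
  }
  return collected;
}

// Returns the collected rows sorted by playlist position.
function toOrderedTracks(collected) {
  return [...collected.values()].sort((a, b) => a.position - b.position);
}
```

In a scraper, `mergeVisibleRows` would be called inside the scroll loop with the result of a `page.evaluate` that reads the visible rows, and `toOrderedTracks` once the loop ends.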
Tracking Music Charts
Chart data is extremely valuable for trend analysis. Spotify doesn't provide historical chart data through its API, making scraping essential:
async function scrapeChartData(scraper, chartUrl) {
// Example: https://charts.spotify.com/charts/view/regional-global-daily/latest
await scraper.page.goto(chartUrl, { waitUntil: 'networkidle' });
await scraper.page.waitForSelector('table', { timeout: 15000 });
const chartData = await scraper.page.evaluate(() => {
const entries = [];
const rows = document.querySelectorAll('table tbody tr');
rows.forEach(row => {
const cells = row.querySelectorAll('td');
if (cells.length < 4) return;
const positionEl = cells[0];
const trackEl = cells[1]?.querySelector('a');
const artistEl = cells[1]?.querySelectorAll('span');
const streamsEl = cells[cells.length - 1];
entries.push({
position: positionEl
? parseInt(positionEl.textContent.trim())
: null,
track: trackEl ? trackEl.textContent.trim() : '',
artist: artistEl && artistEl[1]
? artistEl[1].textContent.trim()
: '',
streams: streamsEl ? streamsEl.textContent.trim() : ''
});
});
return {
date: new Date().toISOString().split('T')[0],
entries
};
});
return chartData;
}
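Chart snapshots become useful for trend analysis once consecutive days are compared. A minimal sketch of that comparison, operating on objects shaped like the return value of `scrapeChartData`; `parseStreams` and `diffCharts` are hypothetical helpers, and matching tracks by title plus artist is a simplifying assumption (a track ID would be more robust if you capture one):

```javascript
// Parses a formatted stream count such as "1,234,567" into a number (null if absent).
function parseStreams(text) {
  const digits = String(text ?? '').replace(/[^\d]/g, '');
  return digits ? Number(digits) : null;
}

// Compares two chart snapshots and reports each track's movement.
// A positive `change` means the track climbed; null marks a new entry.
function diffCharts(previous, current) {
  const key = entry => `${entry.track}::${entry.artist}`;
  const previousPositions = new Map(
    previous.entries.map(entry => [key(entry), entry.position])
  );
  return current.entries.map(entry => {
    const before = previousPositions.get(key(entry));
    return {
      track: entry.track,
      artist: entry.artist,
      position: entry.position,
      streams: parseStreams(entry.streams),
      change: before === undefined ? null : before - entry.position
    };
  });
}
```

Storing one snapshot per day and running `diffCharts` on adjacent pairs yields the movement history that a single scrape cannot provide.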
Scaling with Apify
While the code above works for small-scale scraping, collecting data across thousands of artists and playlists requires infrastructure. This is where Apify excels.
Apify provides managed cloud infrastructure for running scrapers at scale, handling proxy rotation, browser management, and data storage. You can find ready-made Spotify scrapers on the Apify Store, or deploy your own custom actors.
Using an Apify Spotify Actor
const { ApifyClient } = require('apify-client');
const client = new ApifyClient({
token: 'YOUR_APIFY_TOKEN'
});
async function runSpotifyActor() {
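// Run a Spotify scraper actor from the Apify Store.
// 'spotify-scraper-actor' is a placeholder: substitute the ID of the actor you
// choose, and match the input fields below to that actor's input schema.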
const run = await client.actor('spotify-scraper-actor').call({
urls: [
'https://open.spotify.com/artist/6eUKZXaKkcviH0Ku9w2n3V',
'https://open.spotify.com/artist/1Xyo4u8uXC1ZmMpatF05PJ',
'https://open.spotify.com/playlist/37i9dQZF1DXcBWIGoYBM5M'
],
maxItems: 1000,
proxy: {
useApifyProxy: true,
apifyProxyGroups: ['RESIDENTIAL']
}
});
// Fetch results from the dataset
const { items } = await client
.dataset(run.defaultDatasetId)
.listItems();
console.log(`Collected ${items.length} items`);
return items;
}
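For large runs, reading the dataset in pages keeps memory use and response sizes predictable. A minimal sketch, assuming the dataset client's `listItems` accepts `offset` and `limit` options and resolves to an object with an `items` array; `fetchAllItems` is a hypothetical helper and the page size of 1000 is an arbitrary choice:

```javascript
// Reads every item from a dataset page by page instead of in one response.
// datasetClient is expected to expose listItems({ offset, limit }).
async function fetchAllItems(datasetClient, pageSize = 1000) {
  const all = [];
  for (let offset = 0; ; offset += pageSize) {
    const { items } = await datasetClient.listItems({ offset, limit: pageSize });
    all.push(...items);
    // A short (or empty) page means the end of the dataset was reached.
    if (items.length < pageSize) break;
  }
  return all;
}
```

It would be called as `fetchAllItems(client.dataset(run.defaultDatasetId))` in place of the single `listItems()` call.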
Building a Custom Apify Actor for Spotify
For more control, you can create a custom Apify actor:
// src/main.js - Custom Apify Spotify Actor
const Apify = require('apify');
const { PlaywrightCrawler } = require('crawlee');
Apify.main(async () => {
const input = await Apify.getInput();
const { urls, maxConcurrency = 5, proxyConfig } = input;
const proxyConfiguration = await Apify.createProxyConfiguration(
proxyConfig
);
const crawler = new PlaywrightCrawler({
proxyConfiguration,
maxConcurrency,
navigationTimeoutSecs: 60,
requestHandlerTimeoutSecs: 120,
async requestHandler({ page, request, log }) {
log.info(`Processing: ${request.url}`);
if (request.url.includes('/artist/')) {
await page.waitForSelector(
'[data-testid="entityTitle"]',
{ timeout: 15000 }
);
const data = await page.evaluate(() => {
// Artist extraction logic here
return {
type: 'artist',
name: document.querySelector(
'[data-testid="entityTitle"] h1'
)?.textContent?.trim(),
url: window.location.href
};
});
await Apify.pushData(data);
} else if (request.url.includes('/playlist/')) {
// Playlist extraction logic
await page.waitForSelector(
'[data-testid="entityTitle"]',
{ timeout: 15000 }
);
const data = await page.evaluate(() => ({
type: 'playlist',
title: document.querySelector(
'[data-testid="entityTitle"] h1'
)?.textContent?.trim(),
url: window.location.href
}));
await Apify.pushData(data);
}
},
async failedRequestHandler({ request, log }) {
log.error(`Failed: ${request.url}`);
}
});
await crawler.run(urls.map(url => ({ url })));
});
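The actor above trusts its input as given. Validating it before the crawl starts avoids wasting browser time on unsupported URLs. The following is a sketch only: `normalizeInput` is a hypothetical helper, the upper concurrency bound of 20 is an arbitrary assumption, and the pattern accepts just the two URL types the request handler knows about:

```javascript
// Normalizes actor input: keeps only Spotify artist/playlist URLs, removes
// duplicates, and clamps maxConcurrency to the range 1..20.
function normalizeInput(input) {
  const { urls = [], maxConcurrency = 5 } = input ?? {};
  if (!Array.isArray(urls)) {
    throw new TypeError('"urls" must be an array');
  }
  // Optional single path segment before the entity type (e.g. a locale prefix)
  const supported =
    /^https:\/\/open\.spotify\.com\/(?:[\w-]+\/)?(artist|playlist)\/[A-Za-z0-9]+/;
  const unique = [...new Set(urls.map(url => String(url).trim()))];
  const requested = Math.trunc(Number(maxConcurrency)) || 1;
  return {
    urls: unique.filter(url => supported.test(url)),
    maxConcurrency: Math.min(Math.max(requested, 1), 20)
  };
}
```

Calling it on the result of the input read, before constructing the crawler, also gives one place to log how many URLs were rejected.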
Handling Track Metadata and Audio Features
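Track metadata ranges from basic information (title, duration, album) to Spotify's audio analysis features. The function below pulls both from the API so that the result can be merged with scraped data. Spotify restricted the audio features endpoint for newly registered apps in late 2024, so confirm that your app still has access before relying on it: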
async function getCompleteTrackData(trackId, apiToken) {
// Get basic track info + audio features from the API
const [trackInfo, audioFeatures] = await Promise.all([
fetch(`https://api.spotify.com/v1/tracks/${trackId}`, {
headers: { Authorization: `Bearer ${apiToken}` }
}).then(r => r.json()),
fetch(`https://api.spotify.com/v1/audio-features/${trackId}`, {
headers: { Authorization: `Bearer ${apiToken}` }
}).then(r => r.json())
]);
return {
id: trackId,
name: trackInfo.name,
artists: trackInfo.artists.map(a => ({
name: a.name,
id: a.id
})),
album: {
name: trackInfo.album.name,
releaseDate: trackInfo.album.release_date,
totalTracks: trackInfo.album.total_tracks
},
duration_ms: trackInfo.duration_ms,
popularity: trackInfo.popularity,
explicit: trackInfo.explicit,
isrc: trackInfo.external_ids?.isrc,
audioFeatures: {
danceability: audioFeatures.danceability,
energy: audioFeatures.energy,
key: audioFeatures.key,
loudness: audioFeatures.loudness,
mode: audioFeatures.mode,
speechiness: audioFeatures.speechiness,
acousticness: audioFeatures.acousticness,
instrumentalness: audioFeatures.instrumentalness,
liveness: audioFeatures.liveness,
valence: audioFeatures.valence,
tempo: audioFeatures.tempo,
timeSignature: audioFeatures.time_signature
}
};
}
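Fetching tracks one at a time uses up the rate limit quickly. The Web API's "Get Several Tracks" endpoint accepts a comma-separated `ids` parameter with up to 50 IDs per call, so batching cuts the request count substantially. A minimal sketch of building those batch URLs; `chunk` and `buildTrackBatchUrls` are hypothetical helper names:

```javascript
// Splits a list into consecutive chunks of at most `size` elements.
function chunk(list, size) {
  const chunks = [];
  for (let i = 0; i < list.length; i += size) {
    chunks.push(list.slice(i, i + size));
  }
  return chunks;
}

// Builds one "Get Several Tracks" URL per batch of up to 50 track IDs.
function buildTrackBatchUrls(trackIds, batchSize = 50) {
  return chunk(trackIds, batchSize).map(
    ids => `https://api.spotify.com/v1/tracks?ids=${ids.join(',')}`
  );
}
```

Each URL is then requested with the same `Authorization` header as before; the response holds a `tracks` array in the order the IDs were given.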
Scraping Album Data
Albums contain structured metadata that's useful for catalog analysis:
async function scrapeAlbumData(scraper, albumUrl) {
await scraper.page.goto(albumUrl, { waitUntil: 'networkidle' });
await scraper.page.waitForSelector('[data-testid="entityTitle"]', {
timeout: 15000
});
const albumData = await scraper.page.evaluate(() => {
const title = document.querySelector(
'[data-testid="entityTitle"] h1'
)?.textContent?.trim();
const artistEl = document.querySelector(
'a[data-testid="creator-link"]'
);
const artist = artistEl ? artistEl.textContent.trim() : '';
const metaText = document.querySelector(
'[data-testid="entity-metadata"]'
)?.textContent || '';
const yearMatch = metaText.match(/(\d{4})/);
const year = yearMatch ? yearMatch[1] : '';
const tracks = [];
const rows = document.querySelectorAll(
'[data-testid="tracklist-row"]'
);
rows.forEach((row, idx) => {
const trackTitle = row.querySelector(
'[data-testid="internal-track-link"]'
)?.textContent?.trim();
const duration = row.querySelector(
'[data-testid="tracklist-duration"]'
)?.textContent?.trim();
tracks.push({
number: idx + 1,
title: trackTitle || '',
duration: duration || ''
});
});
return {
title,
artist,
year,
trackCount: tracks.length,
tracks
};
});
return albumData;
}
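The album scraper returns durations as display text such as "3:45", which is awkward for catalog analysis. A small sketch that converts them to milliseconds and totals an album's running time; it assumes the "m:ss" or "h:mm:ss" format, and both function names are hypothetical:

```javascript
// Converts "m:ss" or "h:mm:ss" duration text into milliseconds (null if malformed).
function durationToMs(text) {
  if (typeof text !== 'string') return null;
  const trimmed = text.trim();
  if (!/^\d+(:\d{2}){1,2}$/.test(trimmed)) return null;
  // Fold the parts left to right: each step shifts the total by a factor of 60.
  const seconds = trimmed
    .split(':')
    .reduce((total, part) => total * 60 + Number(part), 0);
  return seconds * 1000;
}

// Sums track durations for an album result; tracks without a duration are skipped.
function totalDurationMs(tracks) {
  return tracks.reduce((sum, track) => sum + (durationToMs(track.duration) ?? 0), 0);
}
```

Millisecond values also line up with the `duration_ms` field the Web API returns, which makes scraped and API records easier to compare.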
Data Storage and Export
Once you've collected Spotify data, you'll want to store it in a structured format:
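const fs = require('fs');
// Quote a value when it contains a comma, double quote, or line break
function toCsvCell(val) {
  if (val === null || val === undefined) return '';
  // Serialize nested values (e.g. track arrays) as JSON so they stay in one cell
  const str = typeof val === 'object' ? JSON.stringify(val) : String(val);
  return /[",\r\n]/.test(str) ? `"${str.replace(/"/g, '""')}"` : str;
}
function exportToCSV(data, filename) {
  if (!data.length) return;
  // Column order follows the keys of the first record
  const headers = Object.keys(data[0]);
  const csvRows = [headers.map(toCsvCell).join(',')];
  data.forEach(item => {
    csvRows.push(headers.map(h => toCsvCell(item[h])).join(','));
  });
  fs.writeFileSync(filename, csvRows.join('\n'));
  console.log(`Exported ${data.length} records to ${filename}`);
}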
function exportToJSON(data, filename) {
fs.writeFileSync(filename, JSON.stringify(data, null, 2));
console.log(`Exported ${data.length} records to ${filename}`);
}
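`exportToCSV` works on an array of flat records, while `scrapePlaylist` returns one nested object. A minimal sketch of flattening a playlist into one row per track before export; `flattenPlaylist` is a hypothetical helper, and joining artist names with "; " is just one possible convention:

```javascript
// Turns one scraped playlist into flat rows (one per track) suitable for CSV.
// Multiple artists are joined with "; " so the value stays in a single column.
function flattenPlaylist(playlist) {
  return playlist.tracks.map(track => ({
    playlist: playlist.title,
    position: track.position,
    title: track.title,
    artists: track.artists.map(artist => artist.name).join('; '),
    duration: track.duration
  }));
}
```

The result can be passed directly to `exportToCSV`, for example `exportToCSV(flattenPlaylist(playlistData), 'playlist.csv')`.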
Best Practices and Ethical Considerations
When scraping Spotify data, keep these guidelines in mind:
Respect rate limits: Add delays between requests (1-3 seconds minimum). Spotify actively monitors and blocks aggressive scrapers.
Use the official API first: If the data you need is available through the Web API, use it. Scraping should complement the API, not replace it.
Rotate proxies: For large-scale scraping, use residential proxies to avoid IP bans. Apify's proxy infrastructure handles this automatically.
Cache aggressively: Artist profiles and album data don't change frequently. Cache results and only re-scrape at sensible intervals.
Handle dynamic content: Spotify's web player is a React SPA. Always wait for elements to render before extracting data.
Respect robots.txt: Check Spotify's robots.txt for guidance on which paths are disallowed.
Don't scrape private data: Only collect publicly visible information. Never attempt to access private playlists, user listening history, or account data.
Store data responsibly: Follow data protection regulations (GDPR, CCPA) when storing and processing collected data.
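Two of the guidelines above, spacing out requests and caching results, are straightforward to put into code. The sketch below is one possible shape: `TtlCache` and `politeDelayMs` are hypothetical names, the 1-3 second range mirrors the delay suggested above, and the cache lives in memory only, so it does not survive a restart:

```javascript
// A small in-memory cache whose entries expire after ttlMs milliseconds.
// `now` is injectable so the expiry logic can be tested without waiting.
class TtlCache {
  constructor(ttlMs, now = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
    this.entries = new Map();
  }

  // Returns the cached value, or undefined when missing or expired.
  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() - entry.storedAt >= this.ttlMs) {
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }

  set(key, value) {
    this.entries.set(key, { value, storedAt: this.now() });
  }
}

// Picks a random delay in [minMs, maxMs] to space out consecutive requests.
function politeDelayMs(minMs = 1000, maxMs = 3000, random = Math.random) {
  return Math.round(minMs + random() * (maxMs - minMs));
}
```

A scraper would check the cache by artist or album URL before navigating, and wait `politeDelayMs()` milliseconds between page loads when it does have to fetch.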
Conclusion
Spotify data scraping is a powerful technique for music industry analysis, research, and application development. By combining the official Spotify Web API with targeted web scraping, you can build comprehensive datasets covering artist profiles, playlist compositions, chart histories, and track metadata.
Apify makes it straightforward to scale these scraping operations with managed infrastructure, proxy rotation, and built-in data storage. Whether you're building a music analytics dashboard, conducting academic research, or developing a recommendation engine, the techniques in this guide give you the foundation to collect the Spotify data you need.
Start small with the API, add scraping where the API falls short, and scale up with Apify when you need to go big. Happy scraping!