DEV Community

agenthustler

Posted on • Originally published at web-data-labs.com

SoundCloud Data in 2026: Why It's Hard to Get and How to Extract It

SoundCloud hosts over 175 million tracks from more than 30 million artists. For A&R teams, music data analysts, playlist curators, and indie label scouts, it's the single most important platform for spotting emerging talent before it shows up on Spotify or TikTok. And yet getting that data programmatically is genuinely difficult — not because the data is hidden, but because SoundCloud's official API has been closed to new applicants for years, and their anti-scraping infrastructure has tightened significantly since 2024.

This post covers what data is actually available on SoundCloud, why it's hard to get at scale, who needs it and why, and how to run our actor to extract it without building or maintaining any scraping infrastructure.

Why SoundCloud data is hard to get

The official API is effectively closed. SoundCloud's public API registration form has been disabled for new developers since 2021. Existing API keys still work, but new applications get a "registrations are temporarily disabled" message, and "temporarily" is now in its fifth year. For any team that didn't secure access before 2021, the API is not an option.

The anti-scraping stack has gotten serious. SoundCloud's web app is a JavaScript-heavy single-page application that loads track data through internal endpoints with rotating client IDs. A naive requests.get() returns an empty shell. Headless browsers work, but SoundCloud added behavioral detection in 2024 that flags non-human navigation patterns within a few hundred requests. High-volume extraction now requires residential proxies and careful pacing.

Rate limiting is aggressive on the unauthenticated paths. Even legitimate browser sessions hit "too many requests" responses after a few hundred page loads from the same IP. The internal API endpoints that power the web app rotate their client tokens every few hours, which breaks any scraper that hardcodes them.
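Whatever HTTP client you use, the practical mitigation for those "too many requests" responses is the standard one: exponential backoff with jitter. A minimal sketch, with illustrative values rather than anything SoundCloud-specific; `fetch` here is any callable you supply that returns a response with a `status_code`:

```python
import random
import time

def backoff_delay(attempt, base=2.0, cap=120.0):
    """Exponential backoff with full jitter, capped.

    base and cap are illustrative defaults, not SoundCloud-documented limits.
    """
    return random.uniform(0, min(cap, base * (2 ** attempt)))

def fetch_with_retries(fetch, url, max_attempts=5):
    """Call fetch(url); on a 429, sleep with jittered backoff and retry."""
    for attempt in range(max_attempts):
        resp = fetch(url)
        if resp.status_code != 429:
            return resp
        time.sleep(backoff_delay(attempt))
    return resp  # still rate-limited after max_attempts; caller decides
```

Full jitter (a uniform draw up to the capped exponential) spreads retries out so a fleet of workers doesn't re-hit the rate limiter in lockstep.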

Data is fragmented across surfaces. A track has metadata on its own page, but play counts and engagement numbers update through a separate stats endpoint. Artist profiles list tracks but not full play counts. Playlists embed tracks but truncate descriptions. Pulling a complete picture means hitting multiple URLs per artist or track and stitching the responses together.
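The stitching step itself is simple once you have the responses; the hard part is getting them. A minimal sketch of the merge, where the input key names (`id`, `plays`, `followers`, and so on) are hypothetical stand-ins for the separate page, stats, and profile responses, and the output keys follow the actor's field list:

```python
def stitch_track(metadata, stats, artist):
    """Merge three separately-fetched surfaces into one flat track record.

    metadata: parsed from the track page itself
    stats:    from the separate engagement/stats response
    artist:   from the uploader's profile page
    Input key names are illustrative assumptions, not SoundCloud's schema.
    """
    return {
        "track_id": metadata["id"],
        "title": metadata["title"],
        "genre": metadata.get("genre"),
        "play_count": stats.get("plays"),
        "like_count": stats.get("likes"),
        "artist_name": artist["username"],
        "artist_follower_count": artist.get("followers"),
    }
```

Three fetches (each subject to rate limits and token rotation) per complete record is what makes naive per-track scraping slow at scale.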

The result: most teams either pay for music industry data vendors (Chartmetric, Soundcharts) at $500-2,000/month, build fragile internal scrapers that need constant maintenance as SoundCloud updates its frontend, or just skip SoundCloud entirely and rely on Spotify data — which misses the entire underground/emerging-artist layer.

Who actually needs this data

A&R and label scouting. Independent labels and major-label A&R teams use SoundCloud as their primary scouting surface for hip-hop, electronic, and alternative music. Tracking which unsigned artists are gaining play counts week-over-week is how the next generation of signings gets identified — usually 6-12 months before the artist breaks on streaming platforms.

Music data analytics products. Companies building dashboards for managers, agents, and labels need SoundCloud track and artist data as a raw input alongside Spotify, Apple Music, YouTube, and TikTok numbers. SoundCloud is often the leading indicator; the other platforms lag behind it.

Playlist curation and discovery tools. Curators building niche playlists across genres need to scan thousands of new uploads weekly, filter by play count and like-to-play ratio, and surface candidates worth a human listen. Manual discovery doesn't scale past a few dozen tracks per week.

Sync licensing and music supervision. Music supervisors searching for tracks that fit a specific mood, BPM, or genre for ads, films, and games use SoundCloud as a pool of licensable indie music. Bulk track metadata extraction lets supervisors filter at scale rather than browsing manually.

Competitor and trend analysis. Labels and managers tracking what's working for competing artists need historical play count, repost, and follower trajectory data. SoundCloud exposes this on artist pages but doesn't offer any export or trend view.

Academic and journalistic research. Music researchers studying genre evolution, regional scene dynamics, or the structure of artist-to-artist influence need bulk data. SoundCloud is one of the few platforms where genre tags and reposts make those network structures visible.

What data you actually get

Our actor extracts the following fields from public SoundCloud pages — no authenticated session required:

  • track_id — SoundCloud track ID
  • title — track title
  • artist_name — track uploader display name
  • artist_url — canonical URL of the uploading artist's profile
  • genre — primary genre tag
  • tags — list of additional tags set by the uploader
  • duration_ms — track duration in milliseconds
  • play_count — total play count
  • like_count — total likes
  • repost_count — total reposts
  • comment_count — total comments
  • release_date — date the track was uploaded
  • description — full track description
  • artwork_url — URL to the track artwork image
  • track_url — canonical SoundCloud track URL
  • playlist_id — playlist ID (when scraping playlists)
  • playlist_title — playlist title
  • playlist_track_count — number of tracks in the playlist
  • artist_follower_count — uploader follower count
  • artist_track_count — total tracks uploaded by the artist
  • scraped_at — timestamp of extraction

How to run the actor

Via Apify Console (no code needed):

  1. Go to apify.com/cryptosignals/soundcloud-scraper
  2. Click Try for free
  3. Paste your target URLs into the urls field (it accepts track, artist, and playlist URLs)
  4. Set max_results to cap the run if you're working through a long list
  5. Click Start and download results as JSON or CSV

Input JSON:

{
  "urls": [
    "https://soundcloud.com/skrillex",
    "https://soundcloud.com/discover/sets/charts-top:all-music",
    "https://soundcloud.com/fred-again/turn-on-the-lights-again-feat-future"
  ],
  "max_results": 100
}

Via Apify API:

curl -X POST "https://api.apify.com/v2/acts/cryptosignals~soundcloud-scraper/runs" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_APIFY_TOKEN" \
  -d '{
    "urls": ["https://soundcloud.com/skrillex"],
    "max_results": 50
  }'
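If you'd rather start runs from Python, the same POST can be made with nothing but the standard library. The endpoint and payload mirror the curl call above; the helper names are ours:

```python
import json
import urllib.request

API_URL = "https://api.apify.com/v2/acts/cryptosignals~soundcloud-scraper/runs"

def build_run_request(token, urls, max_results=100):
    """Build the POST request that starts an actor run (same call as the curl example)."""
    payload = json.dumps({"urls": urls, "max_results": max_results}).encode()
    req = urllib.request.Request(API_URL, data=payload, method="POST")
    req.add_header("Content-Type", "application/json")
    req.add_header("Authorization", f"Bearer {token}")
    return req

def start_run(token, urls, max_results=100):
    """Fire the request and return the parsed run object from the Apify API."""
    with urllib.request.urlopen(build_run_request(token, urls, max_results)) as resp:
        return json.loads(resp.read())
```

From there you'd poll the run until it finishes and read its dataset; the official apify-client package wraps those steps if you'd rather not hand-roll them.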

Sample output record:

{
  "track_id": "1234567890",
  "title": "Turn On The Lights again..",
  "artist_name": "Fred again..",
  "artist_url": "https://soundcloud.com/fred-again",
  "genre": "Electronic",
  "tags": ["house", "uk", "fred again", "future"],
  "duration_ms": 218000,
  "play_count": 14820000,
  "like_count": 412000,
  "repost_count": 38500,
  "comment_count": 9200,
  "release_date": "2022-10-21",
  "description": "Turn On The Lights again.. (feat. Future)",
  "artwork_url": "https://i1.sndcdn.com/artworks-...",
  "track_url": "https://soundcloud.com/fred-again/turn-on-the-lights-again-feat-future",
  "artist_follower_count": 1240000,
  "artist_track_count": 87,
  "scraped_at": "2026-05-04T09:00:00+00:00"
}
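The like-to-play filtering mentioned in the curation use case runs directly on these records. A minimal sketch; the thresholds are illustrative, not recommendations:

```python
def like_to_play_ratio(record):
    """Likes per play; 0.0 for tracks with no plays recorded."""
    plays = record.get("play_count") or 0
    likes = record.get("like_count") or 0
    return likes / plays if plays else 0.0

def shortlist(records, min_plays=1000, min_ratio=0.02):
    """Keep tracks with enough plays to matter and an engagement ratio
    suggesting real listeners rather than passive exposure.
    Thresholds are illustrative assumptions."""
    return [
        r for r in records
        if (r.get("play_count") or 0) >= min_plays
        and like_to_play_ratio(r) >= min_ratio
    ]
```

The sample record above (14.82M plays, 412K likes) has a ratio of about 0.028, so it would pass this filter.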

Pricing

The actor uses pay-per-result pricing: $0.005 per track or profile record. The first 5 results are free so you can verify output quality before committing. For a list of 1,000 tracks, that's $5.

For high-volume A&R workflows (10,000+ tracks per run), residential proxy coverage becomes important for reliability. Oxylabs is the proxy infrastructure we've tested for this kind of music platform workload — their residential network handles SoundCloud's IP reputation checks without the rotation failures that plague datacenter proxies.

What you don't get

The actor extracts public metadata visible to any logged-out visitor. It does not download audio files — that would violate SoundCloud's terms and copyright law on every track that isn't explicitly Creative Commons. If you need audio fingerprinting or feature extraction, that's a separate workflow that operates only on tracks the artist has explicitly licensed for that use.

Private tracks, private playlists, and any data behind a SoundCloud Pro paywall are not accessible. Comment text is summarized as a count rather than a full thread dump — full comment scraping at scale runs into rate limits that aren't worth fighting. Direct messages and private interactions are obviously off-limits.

For very large historical backfills (tracking play count over time for 100,000+ tracks), the play count snapshots from a single run aren't enough — you need scheduled runs and your own time-series storage. The actor produces the snapshot; you're responsible for the longitudinal database.
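What that longitudinal storage can look like, as a minimal sketch using SQLite; the schema and helper names are ours. Each scheduled run appends a snapshot, and the composite primary key makes re-ingesting the same run idempotent:

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS play_snapshots (
    track_id   TEXT NOT NULL,
    scraped_at TEXT NOT NULL,  -- ISO 8601, sorts lexicographically
    play_count INTEGER,
    like_count INTEGER,
    PRIMARY KEY (track_id, scraped_at)
)
"""

def store_snapshot(conn, records):
    """Append one run's records; duplicate (track_id, scraped_at) rows are ignored."""
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT OR IGNORE INTO play_snapshots VALUES (?, ?, ?, ?)",
        [(r["track_id"], r["scraped_at"], r.get("play_count"), r.get("like_count"))
         for r in records],
    )
    conn.commit()

def play_delta(conn, track_id):
    """Play-count change between the oldest and newest snapshot for a track."""
    rows = conn.execute(
        "SELECT play_count FROM play_snapshots WHERE track_id = ? ORDER BY scraped_at",
        (track_id,),
    ).fetchall()
    return rows[-1][0] - rows[0][0] if len(rows) >= 2 else 0
```

At 100,000+ tracks you'd likely move to a real time-series store, but the shape of the problem (snapshot table plus delta queries) stays the same.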

The alternative

You can build this yourself. The engineering work involves: handling SoundCloud's rotating client IDs, parsing the embedded JSON state from the HTML shell, dealing with the separate stats endpoints, managing proxy rotation against the rate-limit detection, retrying partial responses, and maintaining the scraper when SoundCloud updates its frontend — which has happened three times since 2024.

That's 2-3 weeks of engineering time to build, and ongoing maintenance after that. At $0.005 per record, you'd need to extract more than 2 million records before the build-vs-buy math favors building.
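The break-even arithmetic behind that figure, assuming roughly 100 hours of build time at an illustrative $100/hour rate (the rate is our assumption, not a quote):

```python
def break_even_records(build_hours, hourly_rate, price_per_record=0.005):
    """Record count at which a one-off build cost equals pay-per-result spend.

    Ignores ongoing maintenance, which only shifts the math further
    toward buying. hourly_rate is an assumption you supply.
    """
    return (build_hours * hourly_rate) / price_per_record

# e.g. 100 hours * $100/hr = $10,000 build cost
#      $10,000 / $0.005 per record = ~2,000,000 records to break even
```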

For most A&R, analytics, and curation teams, the answer is clear.


Actor: apify.com/cryptosignals/soundcloud-scraper

By: Web Data Labs — data infrastructure for music industry teams.
