Devil Scrapes

Posted on Jun 2

Reverb Scraper: pull historical sold prices for any musical gear

#webscraping #python #apify #data

Quick answer: The Reverb Price Guide is the largest public dataset of used-instrument sale prices on the internet — millions of completed transactions across guitars, basses, amps, pedals, synths, and drums — but Reverb offers no bulk export and no API. A Reverb scraper walks the Price Guide search endpoint, drills into each product's transaction history, and returns one flat JSON row per historical sale with both ask and buyer-paid price. The Apify Actor below does it for $0.005 per transaction (~$5.05 per 1,000), with proxy rotation, TLS fingerprinting, and pagination handled for you.

Resellers and vintage-gear investors have quietly used this dataset for years. The catch is extracting it at scale: the public UI shows 50 transactions per page with no download button. Want it as a CSV — for a comps report, a pricing model, a 1962-Strat appreciation study? You pull it yourself. Here is what that takes.

What is the Reverb Price Guide? 🎸

The Reverb Price Guide is a transaction database built into Reverb's marketplace. Unlike active listings, it records completed sales — the prices real buyers paid, not what sellers hope for. Reverb aggregates these into per-model "guide" entries (a Fender Stratocaster 1962 Sunburst, a Marshall JCM800 2203, and so on), each linking to its full, paginated transaction history.

Per transaction it gives you the final buyer-paid price and the original ask price (both, every row), the sale date (day-precision YYYY-MM-DD), the condition tier (Mint, Excellent, Very Good, Good, Fair), and the context: make, model, year, finish, product type. What it does not give you: a bulk export, a CSV download, a public API, or any way to pull more than 50 rows at a time through the UI.

Does Reverb have an API for sold listings? 🔎

No — not an official one. Reverb offers a seller API for managing your own listings, but publishes no documented endpoint for querying historical sold-transaction data from the Price Guide.

There is, however, an undocumented JSON API the Price Guide web UI calls internally. Two endpoints matter:

/api/priceguide?query=<text>&page=N: search guide entries by model name; returns paginated results, each with id, title, make, model, year, finish, and product_type.
/api/priceguide/{id}/transactions?page=N: the full, paginated transaction history for one guide entry. This is the sold data.

These return clean JSON, accept the standard application/hal+json media type with version 3.0 pinning, and (at time of writing) are not Cloudflare-gated. We use them directly.
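A minimal sketch of how the two request URLs and their headers might be assembled. The host, the Accept-Version header name, and the helper names are assumptions for illustration; the text above only gives the paths and the media type.

```python
from urllib.parse import urlencode

# Assumption: the host is not named in the text; adjust if the paths live elsewhere.
BASE_URL = "https://api.reverb.com"

# Assumption: version pinning travels in a separate Accept-Version header.
HEADERS = {
    "Accept": "application/hal+json",
    "Accept-Version": "3.0",
}


def search_url(query: str, page: int = 1) -> str:
    """URL for one page of Price Guide search results."""
    params = urlencode({"query": query, "page": page})
    return f"{BASE_URL}/api/priceguide?{params}"


def transactions_url(guide_id: int, page: int = 1) -> str:
    """URL for one page of a guide entry's transaction history."""
    params = urlencode({"page": page})
    return f"{BASE_URL}/api/priceguide/{guide_id}/transactions?{params}"
```

Because the endpoints are undocumented, treat both the parameters and the response shape as things to re-verify before each large run.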

One thing explicitly does not work: GET /api/listings?state=ended. We verified on 2026-05-16 that this filter is silently ignored — every row comes back state.slug = "live". Live inventory, not sold data. We say so rather than quietly route around it.

What the data looks like 📤

Each transaction is one flat, typed row. A real one from guide 6285 (Fender Stratocaster 1962 Sunburst):

{
  "guide_id": 6285,
  "guide_title": "Fender Stratocaster 1962 Sunburst",
  "make": "Fender",
  "model": "Stratocaster",
  "year": "1962",
  "finish": "Sunburst",
  "product_type": "Electric Guitars",
  "order_id": 21423340,
  "sold_at": "2024-10-04",
  "condition": "Excellent",
  "source": "Reverb",
  "sold_price_amount": 1750.0,
  "sold_price_cents": 175000,
  "sold_price_currency": "USD",
  "sold_price_display": "$1,750",
  "ask_price_amount": 1899.0,
  "ask_price_cents": 189900,
  "ask_price_currency": "USD",
  "ask_price_display": "$1,899",
  "guide_url": "https://reverb.com/price-guide/guide/6285",
  "scraped_at": "2026-05-16T20:45:00+00:00"
}

Twenty-one fields, the same shape every time. sold_price_amount is what the buyer paid; ask_price_amount is what the seller listed it for. The gap between the two is one of the most useful signals in used-gear pricing — whether a model sells above, at, or below ask. Both are Pydantic v2-validated before they hit the dataset.
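As a rough illustration of the ask-versus-sold signal, here is a small sketch that computes how far below ask a row sold. The function name is hypothetical; the field names come from the sample row above.

```python
from typing import Optional


def ask_discount_pct(row: dict) -> Optional[float]:
    """Percent below ask that the buyer paid (negative if sold above ask)."""
    ask = row.get("ask_price_cents")
    sold = row.get("sold_price_cents")
    if not ask or sold is None:
        return None  # missing or zero ask: the gap is undefined
    return round((ask - sold) / ask * 100, 2)


sample = {"ask_price_cents": 189900, "sold_price_cents": 175000}
print(ask_discount_pct(sample))  # 7.85
```

The integer cents fields are used on purpose so the subtraction is exact before the single division.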

The naive approach (and why it falls apart) ⚠️

Spend ten minutes in DevTools and the first instinct is obvious: open a Stratocaster Price Guide page, find the network call that fetches transactions, replay it with requests.get(), paginate, done. It almost works — then it doesn't, for reasons that matter:

1. TLS fingerprint impersonation is built in defensively. The HTML Price Guide pages return HTTP 403 to plain curl because they're Cloudflare-gated. The JSON API is not gated today, but any Python requests or httpx session emits a TLS ClientHello that fingerprints as Python, not a browser. We use curl-cffi with explicit browser impersonation (Chrome 131 TLS plus HTTP/2 SETTINGS frames) so every handshake looks like a real browser. If Reverb adds Cloudflare to the API path tomorrow, we absorb it without re-architecture.

2. Proxy rotation keeps the pipeline alive. We rotate Apify residential proxies, taking a fresh session_id each time a request is blocked, and retry 408 / 429 / 503 responses with exponential backoff (2 → 4 → 8 → 16 → 30 s, max 5 attempts, honouring Retry-After). If the run is rate-limited partway through, the Actor surfaces partial progress instead of a silent empty dataset.

3. Fan-out control prevents runaway bills. A generic query like fender matches 60,000+ entries; without a cap, one run could fan out into tens of thousands of billed rows. The pipeline is two-stage: stage 1 pages the search for matching guide IDs (applying the productType filter), and stage 2 drills into each guide's transaction endpoint. We enforce an XOR between query and guideIds, bound maxListings (1–10,000) and maxGuides (1–200), and validate all of it at parse time, before any network call. Both pagination loops stop cleanly at maxListings.
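The retry schedule from point 2 can be sketched in a few lines. This is an illustration under stated assumptions, not the Actor's code: the names are hypothetical, and letting a server-supplied Retry-After override the computed delay is one plausible reading of "honouring Retry-After".

```python
from typing import Optional

BASE_DELAY_S = 2
MAX_DELAY_S = 30
MAX_ATTEMPTS = 5
RETRYABLE_STATUSES = {408, 429, 503}


def is_retryable(status: int) -> bool:
    """True for the status codes the text lists as worth retrying."""
    return status in RETRYABLE_STATUSES


def backoff_delay(attempt: int, retry_after: Optional[float] = None) -> float:
    """Seconds to wait before retry number `attempt` (1-based)."""
    if not 1 <= attempt <= MAX_ATTEMPTS:
        raise ValueError(f"attempt must be between 1 and {MAX_ATTEMPTS}")
    if retry_after is not None:
        # Assumption: the server's Retry-After wins over the computed delay.
        return float(retry_after)
    # Doubling from 2 s, capped at 30 s: 2, 4, 8, 16, 30.
    return float(min(BASE_DELAY_S * 2 ** (attempt - 1), MAX_DELAY_S))


print([backoff_delay(a) for a in range(1, MAX_ATTEMPTS + 1)])
# [2.0, 4.0, 8.0, 16.0, 30.0]
```

A real client would also swap the proxy session before sleeping; that part depends on the proxy setup and is left out here.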

The Actor 🛠️

I packaged the result as an Apify Actor: Reverb Sold Listings Scraper. Paste a search query in the Apify Console and click Start, or run it programmatically:

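from apify_client import ApifyClient

# Authenticate with your Apify API token.
client = ApifyClient("YOUR_APIFY_TOKEN")

# Start the Actor and wait for the run to finish.
run = client.actor("DevilScrapes/reverb-sold-listings").call(
    run_input={
        "query": "fender stratocaster",     # free-text search (XOR with guideIds)
        "productType": "Electric Guitars",  # category filter in query mode
        "maxListings": 500,                 # cap on billed rows
        "maxGuides": 50,                    # cap on guide entries drilled into
        "useProxy": True,
    }
)

# Stream the rows from the run's default dataset.
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item["sold_at"], item["sold_price_display"], item["condition"])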

Already know the exact Price Guide IDs (find one in any URL like reverb.com/price-guide/guide/6285)? Pass "guideIds": [6285, 44841, 22810] instead of query to skip the search stage entirely.
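To make the difference between query mode and guideIds mode concrete, here is a rough sketch of the two-stage loop with the network calls injected as plain functions. Everything here is hypothetical scaffolding (fetcher signatures, function names), and the productType filter is omitted for brevity.

```python
from typing import Callable, Iterator, List, Optional

# Hypothetical fetchers: each returns one page of dicts, or [] when exhausted.
SearchFn = Callable[[str, int], List[dict]]        # (query, page)
TransactionsFn = Callable[[int, int], List[dict]]  # (guide_id, page)


def collect_guide_ids(query: str, fetch_search: SearchFn, max_guides: int) -> List[int]:
    """Stage 1: page through search results until max_guides IDs are found."""
    ids: List[int] = []
    page = 1
    while len(ids) < max_guides:
        guides = fetch_search(query, page)
        if not guides:
            break
        ids.extend(g["id"] for g in guides[: max_guides - len(ids)])
        page += 1
    return ids


def iter_transactions(guide_ids: List[int], fetch_tx: TransactionsFn,
                      max_listings: int) -> Iterator[dict]:
    """Stage 2: drill into each guide, stopping exactly at max_listings rows."""
    emitted = 0
    for guide_id in guide_ids:
        page = 1
        while emitted < max_listings:
            rows = fetch_tx(guide_id, page)
            if not rows:
                break  # this guide is exhausted, move to the next one
            for row in rows:
                yield {"guide_id": guide_id, **row}
                emitted += 1
                if emitted == max_listings:
                    return
            page += 1


def run_pipeline(fetch_search: SearchFn, fetch_tx: TransactionsFn, *,
                 query: Optional[str] = None, guide_ids: Optional[List[int]] = None,
                 max_guides: int = 20, max_listings: int = 100) -> List[dict]:
    if guide_ids is None:  # query mode: run the search stage first
        guide_ids = collect_guide_ids(query or "", fetch_search, max_guides)
    return list(iter_transactions(guide_ids, fetch_tx, max_listings))
```

Passing guide_ids skips collect_guide_ids entirely, which is why that mode needs no search requests at all.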

Input fields at a glance:

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `query` | string | — | Free-text search (XOR with `guideIds`) |
| `guideIds` | integer[] | — | Explicit guide IDs, skip search (XOR with `query`) |
| `productType` | string | — | Case-insensitive category filter in query mode |
| `maxListings` | integer | 100 | Cap on total rows emitted (1–10,000) |
| `maxGuides` | integer | 20 | Cap on guide entries to drill into in query mode (1–200) |
| `useProxy` | boolean | true | Route through Apify Proxy |
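The rules in this table can be checked before any request goes out. The Actor does this with Pydantic; the sketch below is a plain-Python stand-in with a hypothetical function name, shown only to make the XOR and the bounds explicit.

```python
def validate_input(raw: dict) -> dict:
    """Apply defaults and reject inputs that break the table's rules."""
    has_query = bool(raw.get("query"))
    has_ids = bool(raw.get("guideIds"))
    if has_query == has_ids:
        # Both set or both missing: the XOR is violated either way.
        raise ValueError("provide exactly one of 'query' or 'guideIds'")

    max_listings = raw.get("maxListings", 100)
    max_guides = raw.get("maxGuides", 20)
    if not 1 <= max_listings <= 10_000:
        raise ValueError("maxListings must be between 1 and 10,000")
    if not 1 <= max_guides <= 200:
        raise ValueError("maxGuides must be between 1 and 200")

    return {
        "query": raw.get("query") or None,
        "guideIds": list(raw.get("guideIds") or []),
        "productType": raw.get("productType"),
        "maxListings": max_listings,
        "maxGuides": max_guides,
        "useProxy": raw.get("useProxy", True),
    }
```

Failing here costs nothing, whereas a bad input discovered mid-run has already paid the start fee.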

Results stream into the default dataset — export JSON, CSV, Excel, or XML from the Console, or pull via the Apify API.
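If you would rather fetch the export over HTTP than click through the Console, a sketch like this builds the download URL for the Apify API's dataset items endpoint. The helper name is hypothetical, and passing the token as a query parameter is one option; an Authorization header works too.

```python
from typing import Optional
from urllib.parse import urlencode

# Export formats named in this article; the API may accept others.
EXPORT_FORMATS = {"json", "csv", "xlsx", "xml"}


def dataset_items_url(dataset_id: str, fmt: str = "csv",
                      token: Optional[str] = None) -> str:
    """URL that downloads a dataset's items in the requested format."""
    if fmt not in EXPORT_FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    params = {"format": fmt}
    if token:
        params["token"] = token
    return f"https://api.apify.com/v2/datasets/{dataset_id}/items?{urlencode(params)}"
```

Feed it the defaultDatasetId from the run object and hand the URL to curl, pandas, or a spreadsheet import.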

What you'd actually use this for 💡

Four concrete patterns, not generic "market intelligence":

Reseller pricing models. Pull every transaction for a model over five years, take the median by condition tier, and price intake against the data instead of your gut. A fender stratocaster sweep with productType=Electric Guitars returns hundreds of points across all Strat variants.

Comparable-sales reports. Filter by year and finish, sort by sold_at, and you have the comps report an appraiser charges $50 for — on demand, for any instrument you carry.

Vintage gear investment analysis. sold_at plus sold_price_amount give you a time series of annual medians; ask_price_amount shows whether sellers and buyers are converging or diverging.

Market journalism. Stories on the post-pandemic gear bubble, boutique-pedal corrections, and vintage Marshall appreciation need primary data — and insurers and appraisers need the same numbers for defensible "current market value" grounded in real buyer-paid prices, not a listing page. The Reverb dataset is the canonical source — millions of rows, one run away.
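The first and third patterns above come down to grouped medians. Here is a small sketch using only the standard library; the function names are hypothetical, the field names match the sample row, and rows are filtered to one currency so medians are never taken across mixed units.

```python
from collections import defaultdict
from statistics import median


def median_by_condition(rows, currency="USD"):
    """Median buyer-paid price per condition tier, in major currency units."""
    buckets = defaultdict(list)
    for row in rows:
        if row.get("sold_price_currency") != currency:
            continue
        if row.get("condition") and row.get("sold_price_cents") is not None:
            buckets[row["condition"]].append(row["sold_price_cents"])
    return {tier: median(cents) / 100 for tier, cents in buckets.items()}


def annual_summary(rows, currency="USD"):
    """Per sale year: row count, median sold price, median sold-to-ask ratio."""
    by_year = defaultdict(list)
    for row in rows:
        if row.get("sold_price_currency") != currency:
            continue
        if row.get("sold_price_cents") is not None:
            # sold_at is YYYY-MM-DD, so the first four characters are the year.
            by_year[int(row["sold_at"][:4])].append(row)

    summary = {}
    for year in sorted(by_year):
        group = by_year[year]
        ratios = [r["sold_price_cents"] / r["ask_price_cents"]
                  for r in group if r.get("ask_price_cents")]
        summary[year] = {
            "n": len(group),
            "median_sold": median(r["sold_price_cents"] for r in group) / 100,
            "median_sold_to_ask": median(ratios) if ratios else None,
        }
    return summary
```

A sold-to-ask ratio drifting toward 1.0 over the years suggests sellers and buyers are converging; a falling ratio suggests asks are running ahead of the market.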

Pricing — exact numbers 💰

Pay-per-event. You pay for rows that land and nothing for rows you request but don't receive: no data, no charge. The only fixed cost is the $0.05 Actor start event, charged once per run.

| Event | Price |
| --- | --- |
| Actor start (once per run) | $0.05 |
| Result row emitted | $0.005 |

| Volume | Total cost |
| --- | --- |
| 100 transactions (default trial) | $0.55 |
| 1,000 transactions | $5.05 |
| 10,000 transactions | $50.05 |
| 100,000 transactions | $500.05 |
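The totals in the table follow from one formula: the start fee plus the per-row fee times the number of rows. A quick sketch, with hypothetical function names and Decimal arithmetic so the cents stay exact:

```python
from decimal import Decimal

START_FEE = Decimal("0.05")  # charged once per run
ROW_FEE = Decimal("0.005")   # charged per result row emitted


def run_cost(rows: int) -> Decimal:
    """Total cost in dollars of a single run that emits `rows` rows."""
    return START_FEE + ROW_FEE * rows


def rows_within_budget(budget: str) -> int:
    """Most rows a single run can emit without exceeding `budget` dollars."""
    remaining = Decimal(budget) - START_FEE
    return max(0, int(remaining // ROW_FEE))


print(f"${run_cost(1_000):.2f}")  # $5.05
```

Splitting the same volume across several runs costs an extra $0.05 per additional run, so batching into fewer, larger runs is slightly cheaper.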

Third-party gear-pricing databases license per-seat at $50–200/month for a slice of this data; Reverb publishes no bulk export at any price. The default maxListings=100 keeps trial runs around $0.55, so Apify's free $5 trial credit covers roughly 900 transactions (nine default runs at $0.55 each), or up to 990 in a single run, with no credit card.

The technically interesting bit 🔬

The year field is a str, not an int — deliberate and load-bearing. Reverb's Price Guide mixes formats: "1966", "1950s", "Late 1960s", or empty. Coercing to integer would silently discard every decade-format entry — a data-integrity loss worse than a string year in your analysis.

The sold_price_cents / ask_price_cents integers sit alongside the _amount floats for the same reason: floating-point math on money is a known analytics hazard, and the cents integer is the lossless representation. Both choices are encoded in the Pydantic ResultRow model (year: str | None, sold_price_cents: int | None) so the schema itself communicates intent downstream.
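Both points are easy to demonstrate. The sketch below is an illustration, not part of the Actor: year_to_decade is a hypothetical helper showing how an analysis layer can still bucket the string years, and the two prints show why integer cents are the safer field for arithmetic.

```python
import re
from typing import Optional


def year_to_decade(year: Optional[str]) -> Optional[int]:
    """Map '1966', '1950s' or 'Late 1960s' to a decade; None if no year is present."""
    if not year:
        return None
    match = re.search(r"\d{4}", year)
    if match is None:
        return None
    return int(match.group()) // 10 * 10


# int("1950s") would raise ValueError; the string survives and can still be bucketed.
print(year_to_decade("Late 1960s"))  # 1960

# Float dollars drift, integer cents do not.
print(0.1 + 0.2 == 0.3)  # False
print(10 + 20 == 30)     # True
```

Keeping the raw string in the dataset and deriving the decade downstream means no row is lost at scrape time.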

Limitations 🚧

Day-precision dates only. Reverb publishes YYYY-MM-DD; no timestamps, and we don't fabricate them.
Coverage is what Reverb publishes. Not every sale ends up in the Price Guide. Items not yet mapped to a canonical guide entry are absent — upstream Reverb behaviour, not a scraping limit.
Generic queries need productType narrowing. A bare query=fender matches 60,000+ entries; the maxGuides cap (default 20) keeps the run predictable. Raise it deliberately for a full sweep.
Currency is whatever Reverb returns. Most rows are USD; some EU/UK sales come back EUR or GBP. sold_price_currency is always populated.
No image downloads, no aggregate stats. guide_url is included; fetch images yourself. The dataset is raw transactions — compute medians, ranges, and trends in your analysis layer.

FAQ ❓

Is scraping the Reverb Price Guide legal?
The Price Guide data is public — no login, no API key, no authentication. This Actor reads only what the public UI exposes, at a conservative pace, and collects only transaction metadata (no personal data). As always, review your jurisdiction and use case before running at scale.

Does the Reverb Price Guide have an official API?
No. Reverb offers a seller API for managing listings, but no documented endpoint for bulk Price Guide transaction data. The /api/priceguide paths this Actor uses are undocumented and may change without notice.

Can I export to Google Sheets or a data warehouse?
Yes — export CSV, Excel, JSON, or XML from the Console, webhook the dataset on ACTOR.RUN.SUCCEEDED into Make, Zapier, or n8n, or pull it via the Apify API.

Why does year come back as a string instead of a number?
Reverb mixes formats — "1966", "1950s", "Late 1960s", or blank. Casting to integer would silently drop every decade-format entry, so we preserve the original string.

Try it

The Actor is live on the Apify Store: apify.com/DevilScrapes/reverb-sold-listings.

Free $5 trial credit, no credit card. Run it on fender stratocaster with maxListings=100 and you'll have a dataset of real sold prices — ask and final — in under a minute. Missing a model category, a field, or a use case? Drop it in the comments — I ship based on what people actually need.

Built by Devil Scrapes — Apify Actors that do the dirty work so your dataset stays clean. 😈

DEV Community