Goodreads killed its public API in December 2020. If you need book data — ratings, reviews, author info, shelves — web scraping is now the only realistic option.
This guide covers the best tools available in 2026 for scraping Goodreads, from cloud-based actors to DIY approaches.
Why Goodreads Data Still Matters
Even without an API, Goodreads remains the largest book database on the web:
- 200+ million reviews across millions of titles
- 150+ million registered users generating reading activity
- Rich metadata: genres, series info, edition details, author bios
Common use cases include:
- Book recommendation engines — pull ratings and genre tags to build collaborative filtering models
- Publisher market research — track which genres trend, what ratings new releases get
- Author tracking — monitor new releases, rating changes, review sentiment
- Academic research — reading habits, genre popularity over time
- Library collection development — identify high-demand titles by rating volume
The Scraping Landscape in 2026
Goodreads uses server-rendered HTML, which makes it relatively straightforward to scrape compared to SPAs. However, there are challenges:
- Rate limiting — aggressive request patterns get IP-blocked quickly
- Dynamic content — some sections load via JavaScript (reviews pagination)
- Anti-bot measures — CAPTCHAs appear after sustained scraping
- Layout changes — Goodreads updates its HTML structure periodically
This is why managed scraping platforms like Apify have become popular — they handle proxies, retries, and browser rendering so you don't have to.
Option 1: Apify Store Actors
The Apify Store hosts several Goodreads scrapers built by the community. These run in the cloud with built-in proxy rotation and scheduling.
What to look for in an Apify actor:
- Recent updates (actors abandoned for 6+ months may break on layout changes)
- Proxy support (residential proxies work best for Goodreads)
- Structured output (JSON with consistent field names)
- Search + detail page support (not just one or the other)
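Once you've picked an actor, driving it from Python takes a few lines with the official Apify client. The sketch below assumes a hypothetical actor ID and input schema (`search`, `maxItems`) — every actor defines its own input fields, so check its README first:

```python
def build_run_input(query, max_items=50):
    """Assemble an input payload for a hypothetical Goodreads actor."""
    return {"search": query, "maxItems": max_items}

def run_actor(token, actor_id, run_input):
    # pip install apify-client; imported lazily so the helper above
    # can be used without the dependency installed
    from apify_client import ApifyClient

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    # Results land in a dataset attached to the run; iterate as dicts
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())

if __name__ == "__main__":
    items = run_actor("MY_APIFY_TOKEN", "someuser/goodreads-scraper",
                      build_run_input("science fiction"))
    print(items[:3])
```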
Our Upcoming Actor
We're building a dedicated Goodreads scraper at apify.com/cryptosignals/goodreads-scraper focused on:
- Book search — scrape search results by keyword, genre, or list
- Book details — title, author, rating, review count, genres, description, ISBN, page count
- Author profiles — bio, book list, average rating, follower count
- List scraping — pull entire Goodreads lists (e.g., "Best Science Fiction of 2025")
This actor is upcoming and not yet publicly available — check the link for launch updates.
Option 2: DIY with Python
If you prefer to build your own scraper, here's a minimal approach using requests and BeautifulSoup4:
```python
import requests
from bs4 import BeautifulSoup

def scrape_book(url):
    # A browser-like User-Agent avoids the most basic bot filtering
    headers = {"User-Agent": "Mozilla/5.0"}
    resp = requests.get(url, headers=headers, timeout=30)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")

    # These selectors match Goodreads' current layout and will need
    # updating when the site's HTML changes
    title = soup.select_one("h1.Text__title1")
    author = soup.select_one("span.ContributorLink__name")
    rating = soup.select_one("div.RatingStatistics__rating")

    return {
        "title": title.text.strip() if title else None,
        "author": author.text.strip() if author else None,
        "rating": rating.text.strip() if rating else None,
    }
```
Pros: Full control, no platform fees, customizable output.
Cons: You handle proxies, rate limiting, retries, and maintenance when Goodreads changes its HTML. Expect to spend significant time on infrastructure rather than data analysis.
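A concrete taste of that infrastructure work: rate-limited requests need retry logic. Here's a minimal exponential-backoff sketch; the 429/503 status codes and the delay schedule are illustrative defaults, not tuned values:

```python
import random
import time

import requests

def backoff_delay(attempt):
    # Exponential backoff with jitter: ~1s, ~2s, ~4s, ...
    return 2 ** attempt + random.random()

def get_with_backoff(url, max_retries=5):
    """Fetch a URL, backing off on responses that suggest rate limiting."""
    headers = {"User-Agent": "Mozilla/5.0"}
    for attempt in range(max_retries):
        resp = requests.get(url, headers=headers, timeout=30)
        if resp.status_code not in (429, 503):
            return resp
        time.sleep(backoff_delay(attempt))
    raise RuntimeError(f"Still rate-limited after {max_retries} attempts")
```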
Option 3: Browser Automation
For JavaScript-heavy pages (like paginated reviews), you may need Playwright or Puppeteer:
```python
from playwright.sync_api import sync_playwright

def scrape_reviews(book_url):
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(book_url)
        # Wait for the client-side rendered review list to appear
        page.wait_for_selector("article.ReviewCard")
        reviews = page.query_selector_all("article.ReviewCard")
        data = []
        for review in reviews[:10]:
            text = review.query_selector("span.Formatted")
            stars = review.query_selector("span.RatingStars")
            data.append({
                "text": text.inner_text() if text else "",
                # The star rating is exposed via the element's aria-label
                "rating": stars.get_attribute("aria-label") if stars else "",
            })
        browser.close()
        return data
```
This is slower and more resource-intensive but handles dynamic content that plain HTTP requests miss.
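Note that the code above returns the rating as a raw `aria-label` string. If you want a number, a small parser helps; the label wording assumed here ("Rating 4 out of 5") is a guess about Goodreads' markup, so adjust the pattern to whatever the live page actually emits:

```python
import re

def parse_star_rating(aria_label):
    """Extract a numeric rating from a label like 'Rating 4 out of 5'."""
    match = re.search(r"(\d+(?:\.\d+)?)\s*out of\s*\d+", aria_label)
    if not match:
        return None
    return float(match.group(1))
```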
Comparison Table
| Feature | Apify Actors | DIY Python | Browser Automation |
|---|---|---|---|
| Setup time | Minutes | Hours | Hours |
| Proxy handling | Built-in | Manual | Manual |
| JavaScript support | Yes | No | Yes |
| Cost | Pay per usage | Free (+ proxy costs) | Free (+ proxy costs) |
| Maintenance | Actor maintainer | You | You |
| Scalability | High | Medium | Low |
Legal Considerations
Goodreads' Terms of Service prohibit automated scraping. In practice:
- Scraping public data for research or personal use is generally low-risk
- Scraping at high volume or for commercial redistribution carries more legal exposure
- The Ninth Circuit's 2022 hiQ v. LinkedIn ruling held that scraping publicly accessible data does not violate the CFAA, but the case later settled and this is not settled law everywhere
- Always respect robots.txt and rate-limit your requests
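Checking robots.txt doesn't require any extra dependencies — Python's standard library ships a parser. The sketch below parses rules from raw lines so it runs offline; in a real scraper you'd point `set_url()` at the site's live robots.txt and call `read()`:

```python
from urllib import robotparser

def allowed_by_robots(robots_lines, user_agent, path):
    """Check a path against robots.txt rules supplied as raw lines."""
    rp = robotparser.RobotFileParser()
    rp.parse(robots_lines)
    return rp.can_fetch(user_agent, path)

# Example with made-up rules — fetch the site's actual robots.txt in practice
rules = ["User-agent: *", "Disallow: /private"]
```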
Recommendations
For most users: Start with an Apify actor. The time saved on proxy management and maintenance pays for itself quickly. Check our upcoming Goodreads scraper or browse the Apify Store for alternatives.
For developers who need full control: Build with requests + BeautifulSoup for metadata, add Playwright only for review scraping. Budget time for ongoing maintenance.
For one-off research: A simple Python script with time.sleep(2) between requests is often enough. No need for infrastructure.
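That one-off loop can be written so the throttling is testable without touching the network, by injecting the fetch and sleep functions (pass `requests.get` and leave `sleep` at its default in a real run):

```python
import time

def crawl_politely(urls, fetch, delay=2.0, sleep=time.sleep):
    """Fetch each URL in turn, pausing between requests."""
    results = []
    for i, url in enumerate(urls):
        if i:
            sleep(delay)  # be polite: no back-to-back requests
        results.append(fetch(url))
    return results
```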
Whatever approach you choose, Goodreads remains one of the richest sources of book data on the web — it just takes a bit more work to access it now that the API is gone.