Why Scrape Goodreads?
Goodreads is the world's largest book community — 150+ million members, 3.5 billion books catalogued, and millions of reviews. Whether you're building a book recommendation engine, analyzing reading trends, tracking author performance, or researching the publishing market, Goodreads data is invaluable.
But Goodreads no longer has a public API — it stopped issuing new developer keys in December 2020 and has since retired the program. That means scraping is the only way to access structured book data at scale.
In this guide, I'll show you how to scrape Goodreads books, reviews, and author data using Python — including a ready-to-use solution that handles anti-bot detection, pagination, and data formatting.
What Data Can You Extract from Goodreads?
Here's what's available:
- Book details: title, author, ISBN/ISBN-13, publisher, publication date, page count, edition, format
- Ratings & reviews: average rating, total ratings, total reviews, star distribution, individual review text
- Author info: name, bio, follower count, book count
- Genres & shelves: genre tags, popular shelves, reading lists
- Search results: keyword search, genre browsing, bestseller lists
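If you plan to process these records downstream, it helps to model the fields explicitly instead of passing raw dicts around. A minimal sketch — the field names here are illustrative assumptions, not the exact schema any particular scraper emits:

```python
from dataclasses import dataclass, field


@dataclass
class GoodreadsBook:
    """Illustrative record for one scraped book (field names are assumptions)."""
    title: str
    author: str
    rating: float
    ratings_count: int = 0
    reviews_count: int = 0
    isbn13: str = ""
    genres: list = field(default_factory=list)


book = GoodreadsBook("The Hobbit", "J.R.R. Tolkien", 4.29, genres=["Fantasy"])
print(book.title, book.rating)
```

A typed record like this makes missing fields fail loudly at load time instead of surfacing as `KeyError`s deep in your analysis code.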
Method 1: DIY with Python + BeautifulSoup
You can scrape Goodreads yourself with requests and BeautifulSoup:
```python
import requests
from bs4 import BeautifulSoup
import json


def scrape_goodreads_book(url):
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
    }
    response = requests.get(url, headers=headers, timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")

    # These class names are fragile; Goodreads changes them between redesigns
    title = soup.select_one("h1.Text__title1")
    author = soup.select_one("span.ContributorLink__name")
    rating = soup.select_one("div.RatingStatistics__rating")
    if not (title and author and rating):
        raise ValueError(f"Selectors did not match; page layout may have changed: {url}")

    return {
        "title": title.text.strip(),
        "author": author.text.strip(),
        "rating": float(rating.text.strip()),
        "url": url,
    }


# Example usage
book = scrape_goodreads_book("https://www.goodreads.com/book/show/5907.The_Hobbit")
print(json.dumps(book, indent=2))
```
The problem: Goodreads uses heavy JavaScript rendering, dynamic class names, and aggressive rate limiting. Your DIY scraper will break within weeks as selectors change, and you'll get blocked after a few hundred requests.
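If you do go the DIY route anyway, at minimum wrap your requests in retries with exponential backoff so transient blocks don't kill an entire run. A minimal sketch — the fetch function is passed in as a parameter (my choice, not from any library) so the retry logic can be tested without hitting Goodreads:

```python
import time


def fetch_with_backoff(fetch, url, max_retries=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url), retrying failures with exponential backoff.

    fetch: any callable that returns a result or raises on failure
    sleep: injectable for testing; defaults to time.sleep
    """
    for attempt in range(max_retries):
        try:
            return fetch(url)
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries, surface the error
            # Wait 1s, 2s, 4s, ... between attempts
            sleep(base_delay * (2 ** attempt))
```

In practice you would also rotate User-Agent strings, cap your overall request rate, and honor robots.txt — all things a managed scraper handles for you.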
Method 2: Using the Apify Goodreads Scraper (Recommended)
A more reliable approach is using a managed scraper that handles anti-bot detection, retries, and proxy rotation automatically.
The Goodreads Scraper on Apify extracts structured book data with zero configuration:
```python
from apify_client import ApifyClient

# Initialize the Apify client
client = ApifyClient("YOUR_APIFY_TOKEN")

# Configure the scraper
run_input = {
    "searchTerms": ["science fiction 2026"],
    "maxResults": 50,
    "includeReviews": True
}

# Run the actor
run = client.actor("cryptosignals/goodreads-scraper").call(run_input=run_input)

# Fetch results
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(f"{item['title']} by {item['author']} — {item['rating']}/5 ({item['ratingsCount']} ratings)")
```
Install the client first:

```shell
pip install apify-client
```
Sample Output
Here's what the structured output looks like:
```json
{
  "title": "Project Hail Mary",
  "author": "Andy Weir",
  "rating": 4.52,
  "ratingsCount": 1245678,
  "reviewsCount": 89432,
  "isbn": "0593135202",
  "isbn13": "9780593135204",
  "pages": 496,
  "publisher": "Ballantine Books",
  "publishDate": "2021-05-04",
  "genres": ["Science Fiction", "Fiction", "Audiobook", "Space"],
  "description": "Ryland Grace is the sole survivor on a desperate...",
  "url": "https://www.goodreads.com/book/show/54493401"
}
```
Clean, structured JSON with every field you need — no parsing HTML, no broken selectors, no proxy management.
Use Cases for Goodreads Data
1. Book Recommendation Engines
Scrape ratings, genres, and review sentiment to build collaborative filtering models. Combine with user shelf data to find "readers who liked X also liked Y" patterns.
2. Publishing Market Research
Track which genres are trending, which debut authors are gaining traction, and what publication formats (hardcover vs. ebook vs. audio) are growing. Invaluable for publishers and literary agents.
3. Author Analytics
Monitor an author's rating trajectory over time, track review sentiment, compare performance across titles. Useful for marketing teams and self-published authors.
4. Academic Research
Study reading trends, cultural preferences across regions, or the impact of book-to-film adaptations on ratings. Goodreads data has been used in hundreds of published papers.
5. Competitive Intelligence for Booksellers
Track competitor titles' performance, identify underserved niches, and optimize inventory based on real reader demand rather than publisher push.
Cost Comparison: Goodreads Data Sources
| Method | Cost | Reliability | Speed |
|---|---|---|---|
| DIY scraper | Free (your time) | Low — breaks often | Slow — rate limited |
| Goodreads API | Dead (shut down 2020) | N/A | N/A |
| Apify Goodreads Scraper | $0.01/result, first 100 free | High — maintained | Fast — parallel |
| Data brokers | $200-500/dataset | Medium | One-time dump |
| Manual collection | Free | High | Extremely slow |
At $0.01 per result with the first 100 free, scraping 1,000 books costs $9 (900 billable results). That's less than the price of a single hardcover.
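The arithmetic behind that figure, assuming a flat per-result price and a per-run free allowance (check current pricing on the actor page, since both numbers can change):

```python
def scrape_cost(results, price_per_result=0.01, free_results=100):
    """Dollar cost of one run, assuming the first `free_results` are free."""
    billable = max(0, results - free_results)
    return billable * price_per_result


print(scrape_cost(1_000))  # 900 billable results
print(scrape_cost(50))     # fully covered by the free tier
```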
Advanced: Scraping Goodreads Reviews at Scale
Reviews are the most valuable Goodreads data for NLP and sentiment analysis. Here's how to extract them:
```python
from apify_client import ApifyClient
import pandas as pd

client = ApifyClient("YOUR_APIFY_TOKEN")

# Scrape reviews for a specific book
run_input = {
    "bookUrls": ["https://www.goodreads.com/book/show/5907.The_Hobbit"],
    "includeReviews": True,
    "maxReviews": 500
}

run = client.actor("cryptosignals/goodreads-scraper").call(run_input=run_input)

# Load into pandas for analysis
results = list(client.dataset(run["defaultDatasetId"]).iterate_items())
df = pd.DataFrame(results)

# Basic sentiment breakdown
print(f"Average rating: {df['rating'].mean():.2f}")
print(f"5-star reviews: {len(df[df['rating'] == 5])}")
print(f"1-star reviews: {len(df[df['rating'] == 1])}")
```
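Beyond averages, the full star distribution is usually more informative — a 4.0 made of mostly 5s and 1s tells a different story than one made of 4s. A sketch using synthetic rows in place of scraped reviews (the `rating` column name matches the output above):

```python
import pandas as pd

# Synthetic stand-in for a DataFrame of scraped reviews
df = pd.DataFrame({"rating": [5, 5, 4, 3, 5, 1, 2, 4, 5, 3]})

# Count reviews per star level, highest first
distribution = df["rating"].value_counts().sort_index(ascending=False)
print(distribution)

# Share of clearly positive (4+ star) reviews
share_positive = (df["rating"] >= 4).mean()
print(f"4+ star share: {share_positive:.0%}")
```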
Tips for Scraping Goodreads Effectively
Start with search terms, not URLs. The scraper can find books by keyword, which is faster than collecting individual book URLs.
Use the free tier to test. Every run includes 100 free results — enough to validate your data pipeline before committing.
Export to CSV for spreadsheets. Apify lets you download results as CSV, JSON, or Excel directly from the dashboard.
Schedule recurring scrapes. Set up daily or weekly runs to track how ratings and review counts change over time.
Respect the platform. Don't scrape faster than necessary. The managed scraper handles rate limiting automatically.
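The CSV export tip above can also be done programmatically: load the items into pandas and write them out yourself. A sketch with synthetic items standing in for `iterate_items()` results, so it runs offline:

```python
import pandas as pd

# Stand-in for client.dataset(run["defaultDatasetId"]).iterate_items()
items = [
    {"title": "Project Hail Mary", "author": "Andy Weir", "rating": 4.52},
    {"title": "The Hobbit", "author": "J.R.R. Tolkien", "rating": 4.29},
]

df = pd.DataFrame(items)
df.to_csv("goodreads_books.csv", index=False)  # ready for spreadsheets
print(df.head())
```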
Getting Started
1. Create a free Apify account
2. Go to the Goodreads Scraper
3. Enter your search terms or book URLs
4. Click Start and get structured data in minutes
No credit card needed for the free tier. First 100 results per run are always free.
Built by CryptoSignals on Apify. Have questions or feature requests? Open an issue on the actor page.