How to Scrape IMDb in 2026 (Movie Ratings, Cast, Box Office Data)

#python #webscraping #movies #datascience

IMDb is one of the largest movie databases on the planet (10M+ titles, 500M monthly visitors) and the go-to source for ratings, cast info, and box office data. But there is no free official API, and the non-commercial dataset files IMDb publishes cover only a subset of fields and leave out details such as box office figures.

For the fuller picture, scraping is the practical option left. And in 2026, IMDb actually makes it easier than you might think.

Why IMDb Is Surprisingly Scrapable

IMDb runs on Next.js and embeds two goldmines in every page:

JSON-LD structured data — Schema.org Movie objects with title, rating, director, genre, and more, right in the <head> tag
__NEXT_DATA__ — A full JSON payload with cast lists, box office numbers, runtime, and metadata that the frontend hydrates from

This means you do not need to parse HTML tables or deal with CSS selectors that break every redesign. The data is already structured. You just need to extract it.

No login required. No API key. No rate-limit headers. Just fetch the page and parse the JSON.
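To make the idea concrete, here is a minimal sketch that pulls both embedded payloads out of a page using only the Python standard library. The HTML below is a tiny hand-written stand-in, not a real IMDb response, and the class and function names are made up for this example.

```python
import json
from html.parser import HTMLParser


class ScriptJSONExtractor(HTMLParser):
    """Collect JSON-LD blocks and the __NEXT_DATA__ payload from a page."""

    def __init__(self):
        super().__init__()
        self.json_ld = []      # parsed application/ld+json blocks
        self.next_data = None  # parsed __NEXT_DATA__ payload, if present
        self._kind = None      # which script tag we are currently inside
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag != "script":
            return
        attrs = dict(attrs)
        if attrs.get("id") == "__NEXT_DATA__":
            self._kind = "next"
        elif attrs.get("type") == "application/ld+json":
            self._kind = "ld"
        else:
            self._kind = None
        self._buffer = []

    def handle_data(self, data):
        if self._kind:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != "script" or not self._kind:
            return
        try:
            payload = json.loads("".join(self._buffer))
        except json.JSONDecodeError:
            payload = None  # skip malformed blocks instead of crashing
        if payload is not None:
            if self._kind == "ld":
                self.json_ld.append(payload)
            else:
                self.next_data = payload
        self._kind = None


def extract_embedded_json(html):
    """Return (list of JSON-LD objects, __NEXT_DATA__ object or None)."""
    parser = ScriptJSONExtractor()
    parser.feed(html)
    parser.close()
    return parser.json_ld, parser.next_data


# Hand-written sample page; the field values are placeholders.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">{"@type": "Movie", "name": "Example Title", "aggregateRating": {"ratingValue": 7.4}}</script>
</head><body>
<script id="__NEXT_DATA__" type="application/json">{"props": {"pageProps": {"example": true}}}</script>
</body></html>
"""

json_ld_blocks, next_data = extract_embedded_json(SAMPLE_HTML)
print(json_ld_blocks[0]["name"])
print(sorted(next_data["props"].keys()))
```

In a real run you would pass the fetched page body to extract_embedded_json instead of SAMPLE_HTML. The structure inside __NEXT_DATA__ is not documented, so inspect the payload yourself before relying on any key path.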

The Fast Way: Use a Ready-Made IMDb Scraper

If you want results in minutes instead of hours of coding, the IMDb Scraper on Apify handles all of this out of the box. It supports four modes:

Mode	What It Does
Search	Find movies by keyword, genre, or year
Movie Details	Full data for specific titles
Actor/Person	Filmography, bio, known-for titles
Top Charts	IMDb Top 250, Most Popular, Box Office

Example: Search for Sci-Fi Movies from 2025

{
  "mode": "search",
  "query": "sci-fi 2025",
  "maxItems": 50
}

What You Get Back

Each result includes:

{
  "title": "Emergence",
  "year": 2025,
  "rating": 7.4,
  "genres": ["Sci-Fi", "Thriller"],
  "director": "Denis Villeneuve",
  "cast": ["Timothée Chalamet", "Zendaya"],
  "runtime": "148 min",
  "boxOffice": "$312M",
  "plot": "A physicist discovers...",
  "imdbUrl": "https://www.imdb.com/title/tt1234567/"
}

Clean, structured, ready to pipe into a database or spreadsheet.
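As a rough illustration of the spreadsheet route, the sketch below writes records of the shape shown above to CSV with the standard library. The function name, the column selection, and the "; " separator for list fields are choices made for this example, not something the scraper prescribes.

```python
import csv
import io

# Columns to export; "plot" is deliberately left out to keep rows compact.
FIELDS = ["title", "year", "rating", "genres", "director",
          "cast", "runtime", "boxOffice", "imdbUrl"]


def results_to_csv(results, out):
    """Write scraper results to a file-like object as CSV."""
    writer = csv.DictWriter(out, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    for item in results:
        row = dict(item)
        # Flatten list fields so each record fits on one spreadsheet row.
        for key in ("genres", "cast"):
            row[key] = "; ".join(row.get(key) or [])
        writer.writerow(row)


# The sample record from the section above.
sample = [{
    "title": "Emergence",
    "year": 2025,
    "rating": 7.4,
    "genres": ["Sci-Fi", "Thriller"],
    "director": "Denis Villeneuve",
    "cast": ["Timothée Chalamet", "Zendaya"],
    "runtime": "148 min",
    "boxOffice": "$312M",
    "plot": "A physicist discovers...",
    "imdbUrl": "https://www.imdb.com/title/tt1234567/",
}]

buffer = io.StringIO()
results_to_csv(sample, buffer)
csv_lines = buffer.getvalue().splitlines()
print(csv_lines[0])
print(csv_lines[1])
```

To write a real file, open it with open("movies.csv", "w", newline="", encoding="utf-8") and pass the handle in place of the StringIO buffer.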

Use Cases That Actually Make Money

1. Movie Recommendation Engines
Pull ratings, genres, and cast data for thousands of titles. Feed it into a recommender model: content-based on its own, or collaborative filtering once you add user ratings from your own product. Services like Letterboxd and JustWatch are built on this kind of movie metadata.

2. Film Industry Research
Track box office performance by genre, director, or studio. Hedge funds and entertainment analysts pay for this kind of data, and IMDb is one of the main sources.
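A small sketch of that kind of analysis: parse box office strings like the "$312M" shown earlier and total them per genre. The records here are invented for illustration, and the parser only assumes a K/M/B suffix or a plain number, so real-world values may need more handling.

```python
from collections import defaultdict

_SUFFIXES = {"K": 1e3, "M": 1e6, "B": 1e9}


def parse_box_office(text):
    """Convert a string like '$312M' to dollars as a float; None if unparseable."""
    if not text:
        return None
    cleaned = text.strip().lstrip("$").replace(",", "")
    multiplier = 1.0
    if cleaned and cleaned[-1].upper() in _SUFFIXES:
        multiplier = _SUFFIXES[cleaned[-1].upper()]
        cleaned = cleaned[:-1]
    try:
        return float(cleaned) * multiplier
    except ValueError:
        return None


def box_office_by_genre(records):
    """Sum box office per genre; a multi-genre title counts toward each genre."""
    totals = defaultdict(float)
    for record in records:
        amount = parse_box_office(record.get("boxOffice"))
        if amount is None:
            continue  # skip titles with missing or odd values
        for genre in record.get("genres", []):
            totals[genre] += amount
    return dict(totals)


# Invented records in the same shape as the scraper output above.
records = [
    {"genres": ["Sci-Fi", "Thriller"], "boxOffice": "$312M"},
    {"genres": ["Sci-Fi"], "boxOffice": "$88M"},
    {"genres": ["Drama"], "boxOffice": None},
]

totals = box_office_by_genre(records)
print(totals)
```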

3. Content Aggregation
Build a niche movie site (horror rankings, Oscar predictions, franchise trackers) with auto-updated data. Monetize with ads or affiliate links to streaming platforms.

4. Academic & Data Science Projects
Many IMDb datasets on Kaggle are years old and incomplete. A live scraper gives you current ratings, new releases, and trending titles that static datasets miss.

Building Your Own IMDb Scraper

If you prefer to build from scratch, here is the approach:

1. Fetch the page with a headless browser or HTTP client (IMDb does not heavily block requests)
2. Extract __NEXT_DATA__ from the <script id="__NEXT_DATA__"> tag
3. Parse JSON-LD from <script type="application/ld+json">
4. Merge both sources: JSON-LD has clean Schema.org fields, __NEXT_DATA__ has deeper details like full cast and box office
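The merge step can be sketched roughly as follows. The JSON-LD side uses standard Schema.org Movie property names (name, aggregateRating, genre, director). The "extra" dictionary stands in for whatever you manage to pull out of __NEXT_DATA__ with your own code; its keys are hypothetical, because the layout of that payload is undocumented and can change.

```python
def _as_list(value):
    """Schema.org fields may hold a single item or a list; normalize to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def record_from_json_ld(ld):
    """Flatten a Schema.org Movie object into a simple record."""
    rating = (ld.get("aggregateRating") or {}).get("ratingValue")
    return {
        "title": ld.get("name"),
        "rating": float(rating) if rating is not None else None,
        "genres": _as_list(ld.get("genre")),
        "directors": [p.get("name") for p in _as_list(ld.get("director"))
                      if isinstance(p, dict)],
        "plot": ld.get("description"),
        "imdbUrl": ld.get("url"),
    }


def merge_records(primary, extra):
    """Keep primary values; fill empty fields and add new keys from extra."""
    merged = dict(primary)
    for key, value in extra.items():
        if merged.get(key) in (None, "", []):
            merged[key] = value
    return merged


# Placeholder inputs: a JSON-LD object and fields assumed to come
# from __NEXT_DATA__ (hypothetical keys).
json_ld = {
    "@type": "Movie",
    "name": "Example Title",
    "aggregateRating": {"ratingValue": "7.4"},
    "genre": ["Sci-Fi", "Thriller"],
    "director": [{"@type": "Person", "name": "Jane Doe"}],
}
extra = {"title": "Other Title", "cast": ["Actor A", "Actor B"], "boxOffice": "$312M"}

movie = merge_records(record_from_json_ld(json_ld), extra)
print(movie["title"], movie["rating"], movie["cast"])
```

Treating JSON-LD as the primary source is a design choice here: it follows a published vocabulary, so it is less likely to shift than the frontend payload.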

The main challenge is pagination for search results and handling IMDb's occasional layout changes. A managed scraper like the Apify actor handles retries, proxy rotation, and schema changes automatically.
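If you do handle retries yourself, a minimal version is a wrapper with exponential backoff. This is a generic sketch, not how the Apify actor works internally; the fetch callable, the attempt count, and the delays are all assumptions you would tune.

```python
import time


def fetch_with_retries(fetch, url, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url), retrying on any exception with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch(url)
        except Exception as exc:
            last_error = exc
            if attempt < attempts - 1:
                # Wait 1s, 2s, 4s, ... before the next try.
                sleep(base_delay * 2 ** attempt)
    raise last_error


# Demo with a fake fetcher that fails twice, then succeeds.
calls = {"count": 0}
waits = []


def flaky_fetch(url):
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "<html>ok</html>"


body = fetch_with_retries(flaky_fetch, "https://example.com/page", sleep=waits.append)
print(body, waits)
```

The sleep parameter is injectable so the demo runs instantly; in production you would leave the default and pass a real fetcher, for example a thin wrapper around urllib.request.urlopen.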

Proxy Considerations

IMDb is lenient compared to most sites, but if you are pulling thousands of pages, you will want proxies. ScrapeOps offers a proxy aggregator that works well for entertainment sites — it rotates across multiple providers and handles CAPTCHAs if they appear.

For smaller jobs (under 500 pages), residential proxies are overkill. Datacenter proxies or even raw requests with reasonable delays (2-3 seconds between requests) work fine. For larger jobs where you do want residential IPs to avoid any throttling, ThorData offers good residential proxy bandwidth at per-GB rates — useful if you are pulling thousands of movie pages in a single run.
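For the small-job case, the delay between requests can be as simple as the generator below. It is a sketch with hypothetical names; the 2 to 3 second window comes from the paragraph above, and the random jitter is an added assumption to avoid a perfectly regular request pattern.

```python
import random
import time


def paced(urls, min_delay=2.0, max_delay=3.0, sleep=time.sleep, rng=random.uniform):
    """Yield URLs one at a time, pausing a random interval between them."""
    for index, url in enumerate(urls):
        if index:  # no pause before the very first request
            sleep(rng(min_delay, max_delay))
        yield url


# Demo: record the pauses instead of actually sleeping.
pauses = []
demo_urls = [
    "https://example.com/title/1",
    "https://example.com/title/2",
    "https://example.com/title/3",
]
visited = list(paced(demo_urls, sleep=pauses.append))
print(visited)
print(len(pauses))
```

In a real loop you would write "for url in paced(urls): ..." and fetch inside the body, so the pause always falls between two requests.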

Legal Notes

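IMDb's Terms of Service restrict automated access. In hiQ v. LinkedIn (9th Cir. 2022), the court held that scraping publicly accessible data likely does not violate the Computer Fraud and Abuse Act, but that ruling does not settle breach-of-contract or copyright claims, so the legal position is less clear-cut than it is often presented. That said: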

Do not scrape user reviews or personal data at scale
Do not republish IMDb ratings while claiming them as your own
Respect robots.txt and rate limits
Use the data for analysis, aggregation, or building derivative products
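Checking robots.txt can be automated with the standard library's urllib.robotparser. The rules below are a made-up example, not IMDb's actual robots.txt; in practice you would download the site's real file and pass its text in.

```python
from urllib.robotparser import RobotFileParser


def build_robots_checker(robots_txt, user_agent="*"):
    """Return a function that says whether a URL may be fetched."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return lambda url: parser.can_fetch(user_agent, url)


# Hypothetical rules for illustration only.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

allowed = build_robots_checker(SAMPLE_ROBOTS)
print(allowed("https://example.com/title/tt1234567/"))
print(allowed("https://example.com/private/page"))
```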

Getting Started

The fastest path from zero to data:

1. Go to the IMDb Scraper on Apify
2. Set your mode (search, details, charts, or person)
3. Run it (the free tier gives you enough for testing)
4. Export as JSON, CSV, or push directly to your database

A free official API does not look likely, so scraping will probably remain the practical route. The data is structured, the access is straightforward, and the use cases are real. Start pulling data today.
