itch.io hosts over 500,000 indie games and has roughly 10 million registered users. It is the dominant platform for game jams, experimental titles, and developers who want to publish without a publisher. All of that catalog data — titles, ratings, genres, platforms, creator information — is publicly visible. None of it is available through a bulk API.
This post covers what data is actually accessible on itch.io, who needs it and why, and how to run our Apify actor to extract it without building or maintaining any scraping infrastructure.
The data access problem
There is no public itch.io API for catalog data. itch.io provides a limited API for developers to manage their own games and purchases, but there is no endpoint that lets you query the catalog by genre, rating, platform, or creator. If you want a list of the top-rated horror games, or all puzzle games with more than 1,000 ratings, or every title a specific developer has published — you cannot get that from an API call. The data exists, it is all publicly visible on the site, but there is no programmatic way to retrieve it in bulk.
The catalog is large and fragmented. With 500,000+ games spanning dozens of genres, tags, and platforms, manually compiling even a modest dataset is impractical. A researcher trying to track rating trends across game jams, or a journalist building a "best of" list, or a developer benchmarking their own game against the market — all of them face the same problem: the data is there, but getting it out at any scale requires either hours of manual copy-paste or purpose-built extraction tooling.
Pages are server-rendered. itch.io renders its browse and search pages as standard HTML, so the data is accessible to anyone who can read a web page. Collecting it at scale is a different matter: handling pagination, respecting the site's rate limits, and normalizing the output into a structured format all take non-trivial engineering work to do right.
Who actually needs this data
Indie game developers doing competitive research. Before pricing a new game, launching a jam entry, or picking a genre to target, developers want to understand the landscape. How many games in a given tag have ratings above 4.0? What is the average rating count for top-performing puzzle games? What platforms do the top-rated browser games support? These are answerable questions if you have the data.
Game jam organizers and curators. Jam organizers running events on itch.io often want to survey the catalog of entries — track which games are gaining traction post-jam, compile shortlists for coverage, or analyze genre distribution across submissions. Doing this for dozens or hundreds of games without automated extraction is slow.
Game journalists and critics. Writers covering the indie space need lists. The top-rated games in a specific tag this year. The highest-rated games with under 500 ratings (hidden gems). All titles from a creator whose new release just launched. These are the kinds of queries that make for good editorial angles, and right now most journalists build them manually.
Academic researchers. Game studies researchers use itch.io as a field site. Analyzing genre trends over time, studying rating distributions, tracking which platform combinations correlate with higher engagement — all of this requires structured data from a large sample. itch.io's public catalog is one of the few places where indie game metadata is available at this scale.
Data product builders. Teams building game discovery tools, recommendation engines, or market intelligence products need raw catalog data as an input. itch.io is a natural complement to Steam data for anyone covering the full indie game market, not just the commercial storefront.
What data you actually get
Our actor extracts the following fields from public itch.io game listings — no account or API key required:
- id — itch.io's internal game ID
- title — game title as listed on the platform
- url — direct link to the game page
- creator — developer or publisher name
- creator_url — link to the creator's itch.io profile
- description — short game description from the listing
- genre — genre classification (e.g., Action, Visual Novel, Platformer, RPG)
- rating — average rating on a 0–5 scale
- rating_count — total number of ratings
- platforms — supported platforms: windows, mac, linux, android, browser
- thumbnail — URL of the game's cover image
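The field list above can be written down as a type, which makes it easier to check records before analysis. This is a minimal sketch: the Python types are inferred from the sample output record in this post, and `ItchGame` and `missing_fields` are illustrative names, not part of the actor. Whether any field can be empty or null for some games is not confirmed here.

```python
from typing import List, TypedDict


class ItchGame(TypedDict):
    """One listing-level record, mirroring the documented field list."""

    id: int
    title: str
    url: str
    creator: str
    creator_url: str
    description: str
    genre: str
    rating: float          # average rating, 0-5 scale
    rating_count: int
    platforms: List[str]   # e.g. ["windows", "browser"]
    thumbnail: str


# Field names taken from the class annotations above.
EXPECTED_FIELDS = set(ItchGame.__annotations__)


def missing_fields(record: dict) -> set:
    """Return the documented field names that are absent from a record."""
    return EXPECTED_FIELDS - set(record)
```

Running `missing_fields` over a downloaded dataset is a quick way to spot records that do not match the documented shape.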
How to run the actor
The actor supports four modes: browsing top-rated games, browsing featured/popular games, searching by keyword, and filtering by tag. All modes return the same structured output.
Via Apify Console (no code needed):
- Go to apify.com/cryptosignals/itchio-scraper
- Click Try for free
- Choose your action and set any query parameters
- Click Start and download results as JSON or CSV
Input: top-rated games
{
  "action": "top",
  "maxItems": 100
}
Input: games by tag
{
  "action": "tag",
  "query": "horror",
  "maxItems": 50
}
Input: keyword search
{
  "action": "search",
  "query": "bullet hell",
  "maxItems": 50
}
Input: featured games
{
  "action": "featured",
  "maxItems": 36
}
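The four inputs above follow one pattern: `tag` and `search` carry a `query`, while `top` and `featured` do not. A small helper can build these payloads and catch mistakes before a run is started. This is only a sketch: `build_input` is a hypothetical helper, and the assumption that `query` is mandatory for `tag` and `search` is inferred from the examples rather than confirmed by the actor's documentation.

```python
from typing import Optional

# Grouping inferred from the example inputs in this post (an assumption).
ACTIONS_WITH_QUERY = {"tag", "search"}
ACTIONS_WITHOUT_QUERY = {"top", "featured"}


def build_input(action: str, max_items: int, query: Optional[str] = None) -> dict:
    """Build an actor input dict shaped like the examples in this post."""
    if action not in ACTIONS_WITH_QUERY | ACTIONS_WITHOUT_QUERY:
        raise ValueError(f"unknown action: {action!r}")
    if max_items < 1:
        raise ValueError("max_items must be at least 1")

    payload = {"action": action, "maxItems": max_items}
    if action in ACTIONS_WITH_QUERY:
        if not query or not query.strip():
            raise ValueError(f"action {action!r} needs a non-empty query")
        payload["query"] = query.strip()
    return payload
```

For example, `build_input("tag", 50, "horror")` produces the same structure as the tag input shown earlier.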
Via Apify API:
curl -X POST "https://api.apify.com/v2/acts/cryptosignals~itchio-scraper/runs" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_APIFY_TOKEN" \
  -d '{
    "action": "tag",
    "query": "puzzle",
    "maxItems": 100
  }'
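The same request can be sent from Python using only the standard library. The sketch below mirrors the curl command: same URL, same headers, same JSON body. `build_run_request` and `start_run` are illustrative names, and the shape of the response is not assumed here; `start_run` simply returns whatever JSON the API sends back.

```python
import json
import urllib.request

# URL copied from the curl example in this post.
ACTOR_RUNS_URL = "https://api.apify.com/v2/acts/cryptosignals~itchio-scraper/runs"


def build_run_request(token: str, actor_input: dict) -> urllib.request.Request:
    """Prepare (but do not send) the POST request that starts an actor run."""
    return urllib.request.Request(
        ACTOR_RUNS_URL,
        data=json.dumps(actor_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def start_run(token: str, actor_input: dict) -> dict:
    """Send the request and return the decoded JSON response as-is."""
    with urllib.request.urlopen(build_run_request(token, actor_input)) as resp:
        return json.load(resp)
```

Keeping request construction separate from sending makes it possible to inspect or log the request without touching the network.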
Sample output record:
{
  "id": 123456,
  "title": "HoloCure - Save the Fans!",
  "url": "https://kay-yu.itch.io/holocure",
  "creator": "Kay Yu",
  "creator_url": "https://kay-yu.itch.io",
  "description": "Fan-made inspired by a certain VTuber group.",
  "genre": "Action",
  "rating": 4.94,
  "rating_count": 18432,
  "platforms": ["windows"],
  "thumbnail": "https://img.itch.zone/..."
}
Results are returned as a JSON array and can be exported as CSV directly from the Apify Console.
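Once the JSON array is downloaded, queries like the "hidden gems" list mentioned earlier (highly rated games with under 500 ratings) reduce to a short filter. The sketch below is illustrative: `hidden_gems` is a hypothetical helper, the 500-rating ceiling comes from this post, and the 4.5 minimum rating is an arbitrary assumption you would tune.

```python
from typing import Iterable, List


def hidden_gems(
    records: Iterable[dict],
    min_rating: float = 4.5,   # assumed threshold, not from the actor
    max_ratings: int = 500,    # "under 500 ratings", as described in this post
) -> List[dict]:
    """Return well-rated games with few ratings, best-rated first."""
    picks = [
        r for r in records
        # Skip records where either field is missing or null.
        if r.get("rating") is not None
        and r.get("rating_count") is not None
        and r["rating"] >= min_rating
        and r["rating_count"] < max_ratings
    ]
    # Sort by rating, then by rating count, both descending.
    return sorted(picks, key=lambda r: (-r["rating"], -r["rating_count"]))
```

To run it on an export, load the file with `json.load` and pass the resulting list to `hidden_gems`.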
Pricing
The actor uses pay-per-event pricing — you pay per game extracted, and the first results are free so you can verify output quality before committing to a larger run. For typical research datasets of a few hundred games, the cost is low enough that the build-vs-buy decision is clear. Check the actor page for current pricing.
What you don't get
Price is not available on itch.io's listing pages. itch.io does not expose game prices in browse or search results; pricing is visible only on individual game pages. The actor extracts listing-level data (what you see when browsing by top, featured, search, or tag), not the full detail page for each game.
Individual game page data (full description text, all screenshots, download counts, comment threads, and pricing) would require a separate fetch for each title. The current actor focuses on catalog-level metadata.
The alternative
You can build this yourself. The engineering work involves: handling itch.io's pagination across potentially hundreds of pages, normalizing inconsistent genre and platform markup, implementing polite rate limiting to avoid overloading the server, parsing structured data out of HTML, and maintaining the scraper when the page structure changes. That is a day or two of engineering time to build, plus ongoing maintenance.
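To give a sense of the pagination and rate-limiting part of that work, here is a minimal sketch of a polite page loop. It is not how the actor is implemented. `fetch_page` is a hypothetical callable you would write yourself to download and parse one listing page into a list of records; the HTML parsing, the page URL scheme, and a suitable delay are all left as assumptions.

```python
import time
from typing import Callable, Iterator, List


def crawl_pages(
    fetch_page: Callable[[int], List[dict]],  # hypothetical: page number -> records
    max_pages: int,
    delay_seconds: float = 2.0,               # assumed pause between requests
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[dict]:
    """Yield records page by page, pausing between requests."""
    for page in range(1, max_pages + 1):
        if page > 1:
            # Wait before every request except the first.
            sleep(delay_seconds)
        items = fetch_page(page)
        if not items:
            # An empty page is treated as the end of the listing.
            break
        yield from items
```

The `sleep` parameter exists so the loop can be tested without real delays. Retries, error handling, and markup changes are where most of the ongoing maintenance cost comes from, and none of that is covered here.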
For most research workflows, teams do not need a custom scraper — they need the data. The actor delivers structured JSON ready for analysis without any infrastructure overhead.
Actor: apify.com/cryptosignals/itchio-scraper
By: Web Data Labs — data infrastructure for developers and researchers.