Scraping Steam is a common task in gaming analytics, price tracking, and market research. With over 70,000 games on the platform and millions of daily active users, Steam exposes a large amount of publicly available data, from pricing and discounts to user reviews and player statistics.
In this guide, we'll walk through how Steam's web structure works, what data you can extract, and how to build scrapers that pull game prices, reviews, and player stats efficiently.
Why Scrape Steam?
Steam is the largest digital distribution platform for PC gaming. Businesses and developers scrape Steam for several reasons:
- Price monitoring: Track game prices, discounts, and historical pricing across regions
- Market research: Analyze what genres are trending, what games are selling well
- Competitive analysis: Game developers want to understand pricing strategies and review sentiment
- Investment research: Gaming industry analysts need aggregate data on player counts and engagement
- Wishlist tools: Build price alert services that notify users when games hit their target price
The Steam Store, the Steam Community, and third-party trackers such as SteamDB each expose different data points. Let's break down the structure.
Understanding Steam's Web Structure
Steam's storefront is built as a server-rendered web application with heavy JavaScript enhancement. Here are the key page types:
Game Store Pages
Each game has a store page at https://store.steampowered.com/app/{appid}/{game_name}/. For example, Counter-Strike 2 lives at https://store.steampowered.com/app/730/CounterStrike_2/.
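Because the appid is the only part of the URL that identifies the game, it helps to have small helpers for going between appids and store URLs. The sketch below is illustrative: the names buildStoreUrl and parseAppId are made up for this example, and it assumes the name slug can be left off, as the scraper later in this guide does.

```javascript
// Hypothetical helpers for working with store URLs of the form
// https://store.steampowered.com/app/{appid}/{game_name}/
const STORE_BASE = 'https://store.steampowered.com/app';

// Build a store URL from an appid (the name slug is omitted).
function buildStoreUrl(appId) {
  return `${STORE_BASE}/${appId}/`;
}

// Extract the numeric appid from a store URL, or return null if it does not match.
function parseAppId(url) {
  const match = /^https?:\/\/store\.steampowered\.com\/app\/(\d+)(?:\/|$)/.exec(url);
  return match ? Number(match[1]) : null;
}

console.log(parseAppId('https://store.steampowered.com/app/730/CounterStrike_2/'));
console.log(buildStoreUrl(1245620));
```

Keeping appids as numbers makes them easy to deduplicate before queuing requests.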
Key data points on store pages:
- Game title, description, and developer/publisher info
- Current price and discount percentage
- Release date and early access status
- Tags and categories
- System requirements
- Metacritic score
- Recent and all-time review summary
Review Pages
Steam reviews are accessible at https://store.steampowered.com/appreviews/{appid}?json=1. This is a semi-public API endpoint that returns JSON directly, so no HTML parsing is needed.
Player Statistics
Current player counts are visible on third-party sites such as SteamCharts and through Steam's own API. The Steam Web API provides endpoints like ISteamUserStats/GetNumberOfCurrentPlayers.
Extracting Game Prices
Let's start with the most common use case: extracting game pricing data. Steam has a useful API endpoint for this:
const axios = require('axios');

// Fetch the price_overview block for one app in one region.
async function getGamePrice(appId, countryCode = 'us') {
  const url = 'https://store.steampowered.com/api/appdetails';
  const params = {
    appids: appId,
    cc: countryCode,
    filters: 'price_overview'
  };
  try {
    const response = await axios.get(url, { params });
    const data = response.data?.[appId];
    // Apps without a price (for example free-to-play titles) have no price_overview
    const price = data?.success ? data.data?.price_overview : null;
    if (price) {
      return {
        appId,
        currency: price.currency,
        // Amounts arrive in hundredths of the currency unit
        initial: price.initial / 100,
        final: price.final / 100,
        discountPercent: price.discount_percent
      };
    }
    return null;
  } catch (error) {
    console.error(`Error fetching price for ${appId}:`, error.message);
    return null;
  }
}

// Example: Get price for Elden Ring (appid: 1245620)
getGamePrice(1245620).then(console.log);
This returns structured pricing data including the original price, sale price, and discount percentage. The cc parameter takes a two-letter country code (for example us, gb, de, or jp), so euro pricing is requested through a member state such as de rather than a generic eu code.
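For display purposes, the raw integer amounts can be turned into localized currency strings. The following is a minimal sketch using the built-in Intl.NumberFormat; the function name formatPrice and the sample values are made up for illustration, and it assumes amounts are in hundredths of the currency unit, as the function above does.

```javascript
// Format a price_overview-style object as localized currency strings.
// Assumes initial/final are integers in hundredths of the currency unit.
function formatPrice(priceOverview, locale = 'en-US') {
  const formatter = new Intl.NumberFormat(locale, {
    style: 'currency',
    currency: priceOverview.currency
  });
  return {
    initial: formatter.format(priceOverview.initial / 100),
    final: formatter.format(priceOverview.final / 100),
    saved: formatter.format((priceOverview.initial - priceOverview.final) / 100)
  };
}

// Made-up sample values, not real Steam data
const sample = { currency: 'USD', initial: 5999, final: 4199, discount_percent: 30 };
console.log(formatPrice(sample));
```

Formatting at the edge like this keeps the stored values numeric, which is easier to compare across regions.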
Bulk Price Extraction
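For scraping prices across many games, you need to handle rate limiting carefully. Steam does not publish a limit for this endpoint; a commonly cited figure is around 200 requests per 5 minutes from a single IP. Batching several appids into one request is reported to work only when filters is set to price_overview, as it is here: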
const axios = require('axios');

async function getBulkPrices(appIds, countryCode = 'us') {
  const results = [];
  // Batch size of 100 is a working assumption; lower it if batches start failing
  const chunks = [];
  for (let i = 0; i < appIds.length; i += 100) {
    chunks.push(appIds.slice(i, i + 100));
  }
  for (const chunk of chunks) {
    const url = 'https://store.steampowered.com/api/appdetails';
    const params = {
      appids: chunk.join(','),
      cc: countryCode,
      filters: 'price_overview'
    };
    try {
      const response = await axios.get(url, { params });
      for (const appId of chunk) {
        const data = response.data?.[appId];
        if (data?.success && data.data?.price_overview) {
          const price = data.data.price_overview;
          results.push({
            appId,
            // name is usually absent under this filter, so this falls back to appId
            name: data.data.name || appId,
            price: price.final / 100,
            discount: price.discount_percent,
            currency: price.currency
          });
        }
      }
    } catch (error) {
      // A failed batch is logged and skipped; its appids are not retried here
      console.error('Batch error:', error.message);
    }
    // Respect rate limits: 1.5 s between batches is about 200 requests per 5 minutes
    await new Promise(resolve => setTimeout(resolve, 1500));
  }
  return results;
}
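The bulk function above logs a failed batch and moves on. One way to recover from temporary rate limiting is to retry with exponential backoff. The sketch below is a generic helper, not something Steam prescribes: the names backoffDelay and withRetry are hypothetical, and it assumes rate-limited responses surface as HTTP 429 on the axios-style error.response.status field.

```javascript
// Delay grows as baseDelayMs * 2^attempt, capped at maxDelayMs.
function backoffDelay(attempt, baseDelayMs = 1000, maxDelayMs = 60000) {
  return Math.min(baseDelayMs * 2 ** attempt, maxDelayMs);
}

const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

// Run an async task, retrying on HTTP 429 or 5xx up to `retries` extra times.
// Errors without a response status (for example network failures) are rethrown as-is.
async function withRetry(task, { retries = 3, baseDelayMs = 1000, maxDelayMs = 60000 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await task();
    } catch (error) {
      const status = error.response?.status;
      const retryable = status === 429 || (status >= 500 && status < 600);
      if (!retryable || attempt >= retries) throw error;
      await sleep(backoffDelay(attempt, baseDelayMs, maxDelayMs));
    }
  }
}
```

A batch request could then be wrapped as withRetry(() => axios.get(url, { params })), so a single 429 delays the batch instead of dropping it.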
Scraping Steam Reviews
Steam reviews are incredibly valuable for sentiment analysis and game quality assessment. The review API endpoint returns structured JSON:
const axios = require('axios');

// Fetch one page of reviews. Pass the cursor from the previous page to continue.
async function getReviews(appId, options = {}) {
  const {
    filter = 'recent',
    language = 'english',
    numPerPage = 100,
    cursor = '*'
  } = options;
  const url = `https://store.steampowered.com/appreviews/${appId}`;
  const params = {
    json: 1,
    filter,
    language,
    num_per_page: numPerPage,
    cursor, // axios URL-encodes this value
    purchase_type: 'all'
  };
  try {
    const response = await axios.get(url, { params });
    const data = response.data;
    if (data.success === 1) {
      return {
        reviews: data.reviews.map(review => ({
          recommendationId: review.recommendationid,
          author: {
            steamId: review.author.steamid,
            playtimeForever: review.author.playtime_forever,
            playtimeLastTwoWeeks: review.author.playtime_last_two_weeks,
            lastPlayed: review.author.last_played
          },
          language: review.language,
          review: review.review,
          votedUp: review.voted_up,
          votesUp: review.votes_up,
          votesFunny: review.votes_funny,
          timestampCreated: review.timestamp_created,
          timestampUpdated: review.timestamp_updated,
          steamPurchase: review.steam_purchase,
          receivedForFree: review.received_for_free,
          writtenDuringEarlyAccess: review.written_during_early_access
        })),
        cursor: data.cursor,
        // Summary totals may be missing on later pages, hence the optional chaining
        totalReviews: data.query_summary?.total_reviews,
        totalPositive: data.query_summary?.total_positive,
        totalNegative: data.query_summary?.total_negative,
        reviewScore: data.query_summary?.review_score_desc
      };
    }
    return null;
  } catch (error) {
    console.error(`Error fetching reviews for ${appId}:`, error.message);
    return null;
  }
}

// Paginate through reviews until maxReviews is reached or the pages run out
async function getAllReviews(appId, maxReviews = 1000) {
  let allReviews = [];
  let cursor = '*';
  while (allReviews.length < maxReviews) {
    const result = await getReviews(appId, { cursor });
    if (!result || result.reviews.length === 0) break;
    allReviews = allReviews.concat(result.reviews);
    // Stop if the cursor does not advance, to avoid looping on the last page
    if (!result.cursor || result.cursor === cursor) break;
    cursor = result.cursor;
    // Rate limiting
    await new Promise(resolve => setTimeout(resolve, 1000));
  }
  return allReviews.slice(0, maxReviews);
}
Each review includes the reviewer's playtime (reported in minutes), which is useful for filtering out low-quality reviews. A review from someone with 500+ hours carries different weight than one from someone with 30 minutes.
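One way to act on that idea is to weight each recommendation by how long the reviewer has played. The sketch below is only one possible scheme: the logarithmic weighting and the name playtimeWeightedScore are choices made for this example, and it expects reviews in the shape returned by getReviews above (votedUp plus author.playtimeForever in minutes).

```javascript
// Share of positive recommendations, weighted by log10(1 + hours played).
// Reviews below minMinutes of playtime are ignored. Returns null if nothing qualifies.
function playtimeWeightedScore(reviews, minMinutes = 0) {
  let weightedUp = 0;
  let weightTotal = 0;
  for (const review of reviews) {
    const minutes = review.author.playtimeForever;
    if (minutes < minMinutes) continue;
    // Log scaling keeps a 5,000-hour reviewer from drowning out everyone else
    const weight = Math.log10(1 + minutes / 60);
    weightTotal += weight;
    if (review.votedUp) weightedUp += weight;
  }
  return weightTotal === 0 ? null : weightedUp / weightTotal;
}
```

The result is a number between 0 and 1 that can be compared with the plain positive ratio to see whether experienced players feel differently from newcomers.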
Extracting Player Statistics
Player count data helps you understand game popularity and engagement trends:
const axios = require('axios');

async function getCurrentPlayers(appId) {
  const url = 'https://api.steampowered.com/ISteamUserStats/GetNumberOfCurrentPlayers/v1/';
  const params = { appid: appId };
  try {
    const response = await axios.get(url, { params });
    if (response.data.response.result === 1) {
      return {
        appId,
        currentPlayers: response.data.response.player_count,
        timestamp: new Date().toISOString()
      };
    }
    return null;
  } catch (error) {
    console.error(`Error fetching player count for ${appId}:`, error.message);
    return null;
  }
}

// Track top games player counts
async function getTopGamesPlayers() {
  const topGames = [
    { id: 730, name: 'Counter-Strike 2' },
    { id: 570, name: 'Dota 2' },
    { id: 440, name: 'Team Fortress 2' },
    { id: 1245620, name: 'Elden Ring' },
    { id: 892970, name: 'Valheim' }
  ];
  const results = [];
  for (const game of topGames) {
    const players = await getCurrentPlayers(game.id);
    if (players) {
      results.push({ ...game, ...players });
    }
    await new Promise(resolve => setTimeout(resolve, 500));
  }
  return results;
}
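The endpoint only reports the count at the moment of the request, so trends have to be built from samples you store yourself. Here is a minimal sketch of summarizing such samples; the function name summarizePlayerTrend is hypothetical, and it assumes each sample has the currentPlayers and timestamp fields produced by getCurrentPlayers above.

```javascript
// Summarize a series of player-count samples for one game.
// Each sample: { currentPlayers: number, timestamp: ISO 8601 string }
function summarizePlayerTrend(samples) {
  if (samples.length < 2) return null; // a trend needs at least two points
  // Sort a copy by time so callers can pass samples in any order
  const sorted = [...samples].sort(
    (a, b) => new Date(a.timestamp) - new Date(b.timestamp)
  );
  const first = sorted[0].currentPlayers;
  const last = sorted[sorted.length - 1].currentPlayers;
  const peak = Math.max(...sorted.map(s => s.currentPlayers));
  // Percent change from the earliest to the latest sample
  const changePercent = first === 0 ? null : ((last - first) / first) * 100;
  return { first, last, peak, changePercent };
}
```

Sampling at a fixed interval (for example hourly) keeps comparisons between games fair, since player counts swing heavily with time of day.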
Full Store Page Scraping with Cheerio
Some data, most notably user-defined tags and the review summary labels, is easier to get from the store page itself than from the endpoints above. For that you'll need to parse the HTML:
const axios = require('axios');
const cheerio = require('cheerio');

async function scrapeStorePage(appId) {
  const url = `https://store.steampowered.com/app/${appId}/`;
  const headers = {
    // Cookies that pre-answer the age gate
    'Cookie': 'birthtime=0; lastagecheckage=1-0-2000; wants_mature_content=1',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
  };
  try {
    const response = await axios.get(url, {
      headers,
      maxRedirects: 5
    });
    const $ = cheerio.load(response.data);
    // These selectors depend on Steam's current markup; verify them against a live page
    return {
      title: $('.apphub_AppName').text().trim(),
      description: $('.game_description_snippet').text().trim(),
      releaseDate: $('.date').first().text().trim(),
      developer: $('.dev_row .summary a').first().text().trim(),
      publisher: $('.dev_row .summary a').last().text().trim(),
      tags: $('.app_tag').map((i, el) => $(el).text().trim()).get(),
      // Which summary element is all-time and which is recent depends on page layout
      reviewSummary: $('.game_review_summary').first().text().trim(),
      recentReviews: $('.game_review_summary').last().text().trim(),
      price: $('.game_purchase_price').first().text().trim()
        || $('.discount_final_price').first().text().trim(),
      screenshots: $('.highlight_screenshot_link').map(
        (i, el) => $(el).attr('href')
      ).get(),
      categories: $('.category_list .category_icon').map(
        (i, el) => $(el).next().text().trim()
      ).get()
    };
  } catch (error) {
    console.error(`Error scraping ${appId}:`, error.message);
    return null;
  }
}
Note the cookies: Steam places age gates on mature content, and without these cookies the request is redirected to an age check page.
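If the cookies stop working, the scraper above will quietly parse the age check page and return empty fields. A small guard can catch that. The sketch below assumes the age check lives under an /agecheck/ path on store.steampowered.com and that you can obtain the final URL after redirects from your HTTP client; both are assumptions to verify, and the name isAgeGateUrl is made up for this example.

```javascript
// Return true if a final (post-redirect) URL looks like Steam's age check page.
// Assumes the age gate is served under /agecheck/ on the store host.
function isAgeGateUrl(finalUrl) {
  try {
    const { hostname, pathname } = new URL(finalUrl);
    return hostname === 'store.steampowered.com' && pathname.startsWith('/agecheck/');
  } catch {
    // Not a parseable absolute URL
    return false;
  }
}

console.log(isAgeGateUrl('https://store.steampowered.com/agecheck/app/271590/'));
console.log(isAgeGateUrl('https://store.steampowered.com/app/271590/'));
```

Treating an age-gate hit as an error, rather than as a page with no data, makes cookie problems visible early.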
Handling Steam's Anti-Scraping Measures
Steam employs several anti-scraping measures:
- Rate limiting: Aggressive rate limits that can result in temporary IP bans
- Age gates: Redirect to age verification for mature content
- Region-based content: Different content served based on IP geolocation
- CAPTCHA challenges: For excessive requests from a single IP
To handle these at scale, you'll want proxy rotation and proper request throttling. This is where a managed platform like Apify becomes invaluable.
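For a self-managed setup, proxy rotation can start as simple round-robin selection. The following is a minimal sketch; createProxyRotator is a hypothetical helper, the proxy URLs are placeholders, and how the chosen proxy is applied depends on your HTTP client.

```javascript
// Cycle through a fixed list of proxy URLs in round-robin order.
function createProxyRotator(proxyUrls) {
  if (proxyUrls.length === 0) {
    throw new Error('At least one proxy URL is required');
  }
  let index = 0;
  return {
    // Return the next proxy and advance, wrapping around at the end of the list
    next() {
      const proxy = proxyUrls[index];
      index = (index + 1) % proxyUrls.length;
      return proxy;
    }
  };
}

// Placeholder addresses, not real proxies
const rotator = createProxyRotator([
  'http://proxy-a.example:8080',
  'http://proxy-b.example:8080'
]);
console.log(rotator.next(), rotator.next(), rotator.next());
```

Round-robin spreads requests evenly but does not notice dead or banned proxies; a production setup would also track failures per proxy.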
Using Apify for Steam Scraping
Building and maintaining your own scraping infrastructure for Steam is possible, but it means dealing with proxy management, rate limiting, scheduling, and data storage yourself. The Apify Store has ready-made Steam scrapers that handle all of this.
Apify actors for Steam data can handle:
- Automatic proxy rotation to avoid IP bans
- Scheduled runs for daily price monitoring
- Built-in data storage and export (JSON, CSV, Excel)
- Webhook integration for price alerts
- Scaling to thousands of games without infrastructure management
Here's how you'd use an Apify actor programmatically:
const { ApifyClient } = require('apify-client');

const client = new ApifyClient({
  token: 'YOUR_APIFY_TOKEN'
});

async function runSteamScraper() {
  // The input fields below are illustrative; each actor defines its own input schema
  const run = await client.actor('your-steam-actor-id').call({
    appIds: [730, 570, 1245620],
    includeReviews: true,
    maxReviewsPerGame: 500,
    countryCodes: ['us', 'gb', 'de']
  });
  const { items } = await client.dataset(run.defaultDatasetId).listItems();
  console.log(`Scraped ${items.length} game records`);
  return items;
}
Data Output and Analysis
Once you've collected Steam data, common analysis tasks include:
- Price history graphs: Track discounts during seasonal sales
- Sentiment analysis: Classify reviews as positive/negative/mixed using NLP
- Player count trends: Identify games gaining or losing popularity
- Genre analysis: Find underserved genres with high demand
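// Summarize recommendations (thumbs up/down) and playtime; this is not text-based NLP
function analyzeReviewSentiment(reviews) {
  const total = reviews.length;
  // Avoid dividing by zero when no reviews were collected
  if (total === 0) return null;
  const positive = reviews.filter(r => r.votedUp).length;
  const negative = total - positive;
  // playtimeForever is in minutes
  const avgPlaytime = reviews.reduce(
    (sum, r) => sum + r.author.playtimeForever, 0
  ) / total;
  return {
    total,
    positive,
    negative,
    positivePercent: ((positive / total) * 100).toFixed(1),
    avgPlaytimeHours: (avgPlaytime / 60).toFixed(1),
    // Reviews from players with more than 10 hours (600 minutes)
    highQualityReviews: reviews.filter(
      r => r.author.playtimeForever > 600
    ).length
  };
}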
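Price history can be analyzed in a similar way once you have stored daily snapshots. The sketch below finds the lowest recorded price and the dates on which a target price was met; the function name summarizePriceHistory, the { date, price } record shape, and the sample values are assumptions made for this example.

```javascript
// Summarize stored price snapshots for one game in one region.
// Each entry: { date: 'YYYY-MM-DD', price: number }
function summarizePriceHistory(history, targetPrice) {
  if (history.length === 0) return { lowest: null, hits: [] };
  // Entry with the lowest recorded price (earliest one wins on ties)
  const lowest = history.reduce(
    (best, entry) => (entry.price < best.price ? entry : best)
  );
  // Snapshots at or below the target, e.g. to trigger a price alert
  const hits = history.filter(entry => entry.price <= targetPrice);
  return { lowest, hits };
}

// Made-up sample values, not real Steam data
const history = [
  { date: '2024-06-01', price: 59.99 },
  { date: '2024-06-27', price: 35.99 },
  { date: '2024-07-12', price: 59.99 }
];
console.log(summarizePriceHistory(history, 40));
```

The hits list is what a wishlist or alert tool would act on, while the lowest entry gives a baseline for judging future discounts.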
Legal and Ethical Considerations
When scraping Steam, keep these guidelines in mind:
- Respect robots.txt: Steam's robots.txt disallows certain paths — honor those restrictions
- Rate limit your requests: Don't hammer their servers. Space requests at least 1-2 seconds apart
- Don't scrape private data: Stick to publicly visible information
- Check Steam's Terms of Service: Automated access may violate ToS in some contexts
- Use data responsibly: Don't redistribute raw data commercially without understanding the legal implications
Conclusion
Steam offers a wealth of publicly accessible data through both its semi-public APIs and its web pages. Whether you're building a price tracker, conducting market research, or analyzing gaming trends, the combination of Steam's API endpoints and HTML scraping gives you comprehensive coverage.
For production-grade scraping at scale, platforms like Apify eliminate the infrastructure headaches and let you focus on what matters — the data and insights. Check the Apify Store for ready-made Steam actors that handle proxy rotation, scheduling, and data export out of the box.
The gaming data market continues to grow as more businesses recognize the value of gaming analytics. Getting your scraping pipeline right today positions you well for an increasingly data-driven gaming industry.