Twitch is the largest live streaming platform in the world, and its data is a goldmine for analytics. Whether you are tracking streamer growth, analyzing viewer trends, or building a clips aggregator — scraping Twitch gives you access to real-time and historical streaming data.
In this guide, I will walk you through how to scrape Twitch in 2026 for streams, channels, clips, and viewer data using Python.
Why Scrape Twitch?
Twitch has over 140 million monthly active users and thousands of live streams at any given moment. Common use cases:
- Streamer analytics — track follower growth, average viewers, and stream schedules
- Content aggregation — collect top clips, VODs, and highlights automatically
- Market research — analyze which games and categories are trending
- Esports data — monitor tournament streams and viewer peaks
- Brand monitoring — track mentions and sponsorship visibility
Setting Up Your Environment
```python
import requests
from bs4 import BeautifulSoup
import json
import time
import random

HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36",
    "Accept-Language": "en-US,en;q=0.9",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
}
```
Install the dependencies:
```bash
pip install requests beautifulsoup4
```
Scraping Channel Information
Twitch channel pages contain follower counts, stream status, bio, and category data.
```python
def scrape_channel(channel_name):
    """Scrape a Twitch channel's public profile data."""
    url = f"https://www.twitch.tv/{channel_name}"
    response = requests.get(url, headers=HEADERS)
    soup = BeautifulSoup(response.text, "html.parser")

    # Twitch embeds channel data in JSON-LD script tags while a stream is live
    scripts = soup.find_all("script", {"type": "application/ld+json"})
    for script in scripts:
        try:
            data = json.loads(script.string)
            if data.get("@type") == "VideoObject":
                return {
                    "channel": channel_name,
                    "title": data.get("name"),
                    "description": data.get("description", "")[:200],
                    "thumbnail": data.get("thumbnailUrl"),
                    "is_live": True,
                }
        except (json.JSONDecodeError, TypeError, AttributeError):
            # AttributeError covers JSON-LD payloads that are lists, not dicts
            continue
    return {"channel": channel_name, "is_live": False}

channel = scrape_channel("shroud")
print(json.dumps(channel, indent=2))
```
Scraping Live Streams by Category
Want to monitor which streams are live for a specific game? Twitch directory pages list them:
```python
def scrape_directory(game_slug):
    """Scrape live streams from a Twitch game directory page."""
    streams = []
    url = f"https://www.twitch.tv/directory/category/{game_slug}"
    response = requests.get(url, headers=HEADERS)
    soup = BeautifulSoup(response.text, "html.parser")

    # Directory pages are heavily JavaScript-rendered, so the static HTML
    # often lacks these cards; if this returns an empty list, render the
    # page with a headless browser (e.g. Playwright) first.
    for card in soup.select("article[data-a-target='card']"):
        title_el = card.select_one("h3")
        channel_el = card.select_one("a[data-a-target='preview-card-channel-link']")
        viewers_el = card.select_one("[data-a-target='preview-card-viewer-count']")
        if title_el and channel_el:
            streams.append({
                "title": title_el.get_text(strip=True),
                "channel": channel_el.get_text(strip=True),
                "viewers": viewers_el.get_text(strip=True) if viewers_el else "0",
                "game": game_slug,
            })
    return streams

streams = scrape_directory("league-of-legends")
for s in streams[:5]:
    print(f"{s['channel']}: {s['title']} ({s['viewers']} viewers)")
```
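Note that the viewer counts scraped above come back as display strings such as "12.5K". To sort or aggregate them numerically, a small normalizer helps. This is a sketch that assumes Twitch's usual "K"/"M" abbreviations; adjust the suffix table if the markup you see differs:

```python
def parse_viewer_count(text):
    """Convert a display string like '12.5K viewers' to an integer.

    Assumes Twitch's usual K/M suffixes; returns 0 for unparseable input.
    """
    cleaned = text.lower().replace("viewers", "").replace(",", "").strip()
    multipliers = {"k": 1_000, "m": 1_000_000}
    if cleaned and cleaned[-1] in multipliers:
        try:
            return int(float(cleaned[:-1]) * multipliers[cleaned[-1]])
        except ValueError:
            return 0
    try:
        return int(float(cleaned))
    except ValueError:
        return 0
```

For example, `parse_viewer_count("12.5K viewers")` returns `12500`, and a plain `"842"` passes through unchanged.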
Scraping Clips
Twitch clips are short highlights that go viral. Here is how to scrape top clips for a channel:
```python
def scrape_clips(channel_name, period="7d"):
    """Scrape top clips from a Twitch channel."""
    url = f"https://www.twitch.tv/{channel_name}/clips?filter=clips&range={period}"
    response = requests.get(url, headers=HEADERS)
    soup = BeautifulSoup(response.text, "html.parser")
    clips = []
    for clip_card in soup.select("[data-a-target='clips-card']"):
        title_el = clip_card.select_one("h3")
        views_el = clip_card.select_one("[data-a-target='clip-views']")
        link_el = clip_card.select_one("a[href*='/clip/']")
        if title_el:
            clips.append({
                "title": title_el.get_text(strip=True),
                "views": views_el.get_text(strip=True) if views_el else "N/A",
                "url": ("https://www.twitch.tv" + link_el["href"]) if link_el else None,
                "channel": channel_name,
            })
    return clips

clips = scrape_clips("xqc")
for c in clips[:5]:
    print(f"{c['title']} - {c['views']} views")
```
Using the Twitch GQL API
The Twitch frontend fetches its data from a GraphQL API at gql.twitch.tv. It is unofficial and can change without notice, but it returns structured JSON that is far easier to work with than parsed HTML:
```python
def query_twitch_gql(query, variables=None):
    """Query Twitch's internal GQL API."""
    gql_url = "https://gql.twitch.tv/gql"
    gql_headers = {
        "Client-Id": "kimne78kx3ncx6brgo4mv6wki5h1ko",  # Public web client ID
        "Content-Type": "application/json",
    }
    payload = {"query": query}
    if variables:
        payload["variables"] = variables
    response = requests.post(gql_url, headers=gql_headers, json=payload, timeout=15)
    return response.json()
```
```python
# Example: get channel info
query = """
query {
  user(login: "pokimane") {
    displayName
    description
    followers {
      totalCount
    }
    stream {
      title
      viewersCount
      game {
        name
      }
    }
  }
}
"""

result = query_twitch_gql(query)
user = result.get("data", {}).get("user") or {}  # user is null for unknown logins
print(f"{user.get('displayName')} - {user.get('followers', {}).get('totalCount', 0)} followers")
if user.get("stream"):
    print(f"LIVE: {user['stream']['title']} ({user['stream']['viewersCount']} viewers)")
```
Scraping Viewer Statistics Over Time
To track viewer trends, you can poll streams at intervals and build a time series:
```python
import datetime

def track_viewers(channels, interval_seconds=300, duration_hours=1):
    """Poll channels at an interval and record their live status over time.

    Note: scrape_channel() does not expose a numeric viewer count, so this
    records live status and titles; combine it with the GQL query above if
    you need viewersCount values.
    """
    data_points = []
    end_time = time.time() + (duration_hours * 3600)
    while time.time() < end_time:
        timestamp = datetime.datetime.now().isoformat()
        for channel in channels:
            info = scrape_channel(channel)
            data_points.append({
                "timestamp": timestamp,
                "channel": channel,
                "is_live": info.get("is_live", False),
                "title": info.get("title", ""),
            })
        print(f"[{timestamp}] Polled {len(channels)} channels")
        time.sleep(interval_seconds)
    return data_points

# Usage (short demo)
# data = track_viewers(["shroud", "pokimane"], interval_seconds=60, duration_hours=0.1)
```
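Once `track_viewers` returns, per-channel summaries are more useful than raw rows. A minimal aggregation sketch, shown here on hand-made sample rows so it runs without any polling:

```python
from collections import defaultdict

def summarize_uptime(data_points):
    """Compute the fraction of polls during which each channel was live."""
    polls = defaultdict(int)
    live = defaultdict(int)
    for point in data_points:
        polls[point["channel"]] += 1
        if point["is_live"]:
            live[point["channel"]] += 1
    return {ch: live[ch] / polls[ch] for ch in polls}

# Sample rows in the shape track_viewers produces (timestamps omitted)
sample = [
    {"channel": "shroud", "is_live": True},
    {"channel": "shroud", "is_live": False},
    {"channel": "pokimane", "is_live": True},
]
print(summarize_uptime(sample))  # {'shroud': 0.5, 'pokimane': 1.0}
```

The same pattern extends to average viewer counts once you add a numeric field to each data point.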
Handling Anti-Scraping Measures
Twitch has aggressive bot detection. Here is what to expect:
- JavaScript rendering — Many pages require a browser. Consider using Playwright for dynamic content.
- Rate limiting — Twitch will throttle or block rapid requests.
- Fingerprinting — Twitch uses browser fingerprinting to detect automation.
For reliable scraping at scale, use a proxy service:
- ScraperAPI — handles JavaScript rendering and CAPTCHA solving automatically
- ThorData — residential proxies with high success rates on streaming platforms
```python
def scrape_with_proxy(url, api_key):
    """Route a request through ScraperAPI's proxy for JS rendering and anti-bot handling."""
    proxy_url = f"http://scraperapi:{api_key}@proxy-server.scraperapi.com:8001"
    proxies = {"http": proxy_url, "https": proxy_url}
    return requests.get(url, proxies=proxies, timeout=60)
```
The Easy Route: Use a Pre-Built Scraper
Building and maintaining a Twitch scraper is time-consuming, especially with anti-bot measures. The Twitch Scraper on Apify handles all the complexity for you:
- Scrape channels, streams, clips, and categories
- Automatic proxy rotation and anti-bot handling
- Structured JSON output
- Scheduled runs for ongoing monitoring
- No code maintenance required
It is the fastest way to get Twitch data into your pipeline without managing infrastructure.
Storing Scraped Data
Save your Twitch data in a structured format for analysis:
```python
import csv

def save_streams_csv(streams, filename="twitch_streams.csv"):
    """Save scraped stream data to CSV."""
    if not streams:
        return
    keys = streams[0].keys()
    with open(filename, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=keys)
        writer.writeheader()
        writer.writerows(streams)
    print(f"Saved {len(streams)} streams to {filename}")

def save_to_json(data, filename="twitch_data.json"):
    """Save data as JSON for flexible querying."""
    with open(filename, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2, ensure_ascii=False)
    print(f"Saved data to {filename}")
```
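To sanity-check what you saved, round-trip it. This sketch writes sample clip rows in the same JSON format, reads them back, and ranks them by views; it uses a temp file and made-up data, so no real scrape is required:

```python
import json
import os
import tempfile

# Sample rows in the shape scrape_clips produces (views already numeric here)
clips = [
    {"title": "Insane play", "views": 120_000},
    {"title": "Funny moment", "views": 450_000},
    {"title": "Clutch round", "views": 87_000},
]

path = os.path.join(tempfile.gettempdir(), "twitch_demo_clips.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(clips, f, indent=2, ensure_ascii=False)

with open(path, encoding="utf-8") as f:
    loaded = json.load(f)

top = sorted(loaded, key=lambda c: c["views"], reverse=True)
print(top[0]["title"])  # Funny moment
```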
Legal and Ethical Considerations
- Twitch ToS restricts automated access — scrape responsibly
- Do not scrape private user data (whispers, subscriptions, payment info)
- Respect rate limits and add delays between requests
- Use data for analytics and research, not impersonation
- Consider the official Twitch API (Helix) for authorized access to public data
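For the rate-limit point above, randomized delays look less mechanical than a fixed sleep. A small sketch using the random module imported earlier; the base and jitter values are illustrative, not documented Twitch thresholds:

```python
import random
import time

def polite_sleep(base=2.0, jitter=1.5):
    """Sleep for base seconds plus random jitter; return the delay used."""
    delay = base + random.uniform(0, jitter)
    time.sleep(delay)
    return delay

# Between requests:
# for channel in ["shroud", "pokimane"]:
#     info = scrape_channel(channel)
#     polite_sleep()
```

Returning the delay makes it easy to log how fast you are actually hitting the site.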
Conclusion
Twitch offers rich streaming data for analytics, content aggregation, and market research. You can scrape channels, live streams, clips, and viewer data using Python with BeautifulSoup or the unofficial GQL API.
For production use without the maintenance headache, the Twitch Scraper on Apify gives you structured data with built-in anti-bot handling.
Pair your scraping setup with ScraperAPI or ThorData for reliable proxy rotation at scale.
Happy streaming data collection!