agenthustler

How to Scrape Twitch in 2026: Streams, Channels, Clips, and Viewer Data

Twitch is the largest live streaming platform in the world, and its data is a goldmine for analytics. Whether you are tracking streamer growth, analyzing viewer trends, or building a clips aggregator — scraping Twitch gives you access to real-time and historical streaming data.

In this guide, I will walk you through how to scrape Twitch in 2026 for streams, channels, clips, and viewer data using Python.

Why Scrape Twitch?

Twitch has over 140 million monthly active users and thousands of live streams at any given moment. Common use cases:

  • Streamer analytics — track follower growth, average viewers, and stream schedules
  • Content aggregation — collect top clips, VODs, and highlights automatically
  • Market research — analyze which games and categories are trending
  • Esports data — monitor tournament streams and viewer peaks
  • Brand monitoring — track mentions and sponsorship visibility

Setting Up Your Environment

import requests
from bs4 import BeautifulSoup
import json
import time
import random

HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36",
    "Accept-Language": "en-US,en;q=0.9",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
}

Install the dependencies:

pip install requests beautifulsoup4
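The `random` and `time` imports are there for pacing requests. A minimal pacing helper, my own convention rather than anything Twitch-specific, that you can call before each request:

```python
import random
import time

def jittered_delay(min_delay=1.0, max_delay=3.0):
    """Pick a random delay so request timing doesn't look robotic."""
    return random.uniform(min_delay, max_delay)

def pace(min_delay=1.0, max_delay=3.0):
    """Sleep a jittered delay; call this before each requests.get(...)."""
    time.sleep(jittered_delay(min_delay, max_delay))
```

Randomized gaps between requests are a cheap first line of defense against rate limiting before you reach for proxies.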

Scraping Channel Information

Twitch channel pages contain follower counts, stream status, bio, and category data.

def scrape_channel(channel_name):
    """Scrape a Twitch channel's public profile data."""
    url = f"https://www.twitch.tv/{channel_name}"
    response = requests.get(url, headers=HEADERS, timeout=15)
    soup = BeautifulSoup(response.text, "html.parser")

    # Twitch embeds channel data in script tags
    scripts = soup.find_all("script", {"type": "application/ld+json"})
    for script in scripts:
        try:
            data = json.loads(script.string)
            if data.get("@type") == "VideoObject":
                return {
                    "channel": channel_name,
                    "title": data.get("name"),
                    "description": data.get("description", "")[:200],
                    "thumbnail": data.get("thumbnailUrl"),
                    "is_live": True,
                }
        except (json.JSONDecodeError, TypeError):
            continue

    return {"channel": channel_name, "is_live": False}

channel = scrape_channel("shroud")
print(json.dumps(channel, indent=2))
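To see what `scrape_channel` is extracting, here is the same JSON-LD parsing logic run offline against a hand-made sample. The payload below is illustrative, shaped like Twitch's embedded markup, not real Twitch output:

```python
import json

# Illustrative JSON-LD payload, similar in shape to what Twitch
# embeds in a <script type="application/ld+json"> tag for a live channel.
sample_ld = """
{
  "@type": "VideoObject",
  "name": "Ranked grind - Day 3",
  "description": "Playing ranked all day.",
  "thumbnailUrl": "https://static-cdn.jtvnw.net/previews-ttv/live_user_example.jpg"
}
"""

data = json.loads(sample_ld)
if data.get("@type") == "VideoObject":
    parsed = {
        "title": data.get("name"),
        "description": data.get("description", "")[:200],
        "thumbnail": data.get("thumbnailUrl"),
        "is_live": True,
    }
```

If the channel is offline, Twitch typically omits the `VideoObject` block, which is why the function falls back to `is_live: False`.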

Scraping Live Streams by Category

Want to monitor which streams are live for a specific game? Twitch directory pages list them. Note that these pages are heavily JavaScript-rendered, so a plain GET may return few or no stream cards; see the anti-scraping section below:

def scrape_directory(game_slug):
    """Scrape live streams from a Twitch game directory page."""
    streams = []
    url = f"https://www.twitch.tv/directory/category/{game_slug}"
    response = requests.get(url, headers=HEADERS, timeout=15)
    soup = BeautifulSoup(response.text, "html.parser")

    # Extract stream cards
    for card in soup.select("article[data-a-target='card']"):
        title_el = card.select_one("h3")
        channel_el = card.select_one("a[data-a-target='preview-card-channel-link']")
        viewers_el = card.select_one("[data-a-target='preview-card-viewer-count']")

        if title_el and channel_el:
            streams.append({
                "title": title_el.get_text(strip=True),
                "channel": channel_el.get_text(strip=True),
                "viewers": viewers_el.get_text(strip=True) if viewers_el else "0",
                "game": game_slug,
            })

    return streams

streams = scrape_directory("league-of-legends")
for s in streams[:5]:
    print(f"{s['channel']}: {s['title']} ({s['viewers']} viewers)")
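The directory scraper returns viewer counts as display strings like `1.2K viewers`. A small helper, my own convention rather than anything from Twitch, to normalize them into integers for sorting and aggregation:

```python
def parse_viewer_count(text):
    """Convert a display string like '1.2K viewers' or '987' to an int."""
    token = str(text).lower().replace("viewers", "").replace("viewer", "").strip()
    multiplier = 1
    if token.endswith("k"):
        multiplier, token = 1_000, token[:-1]
    elif token.endswith("m"):
        multiplier, token = 1_000_000, token[:-1]
    try:
        return int(float(token.replace(",", "")) * multiplier)
    except ValueError:
        return 0  # e.g. "N/A" or empty string
```

Returning 0 on unparseable input keeps downstream sorting from crashing on missing data.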

Scraping Clips

Twitch clips are short highlights that go viral. Here is how to scrape top clips for a channel:

def scrape_clips(channel_name, period="7d"):
    """Scrape top clips from a Twitch channel."""
    url = f"https://www.twitch.tv/{channel_name}/clips?filter=clips&range={period}"
    response = requests.get(url, headers=HEADERS, timeout=15)
    soup = BeautifulSoup(response.text, "html.parser")

    clips = []
    for clip_card in soup.select("[data-a-target='clips-card']"):
        title_el = clip_card.select_one("h3")
        views_el = clip_card.select_one("[data-a-target='clip-views']")
        link_el = clip_card.select_one("a[href*='/clip/']")

        if title_el:
            clips.append({
                "title": title_el.get_text(strip=True),
                "views": views_el.get_text(strip=True) if views_el else "N/A",
                "url": "https://www.twitch.tv" + link_el["href"] if link_el else None,
                "channel": channel_name,
            })

    return clips

clips = scrape_clips("xqc")
for c in clips[:5]:
    print(f"{c['title']} - {c['views']} views")

Using Twitch GQL API

Twitch's frontend communicates with its backend via a GraphQL API. It is unofficial and can change without notice, but it is a powerful way to get structured data:

def query_twitch_gql(query, variables=None):
    """Query Twitch internal GQL API."""
    gql_url = "https://gql.twitch.tv/gql"
    gql_headers = {
        "Client-Id": "kimne78kx3ncx6brgo4mv6wki5h1ko",  # Public client ID
        "Content-Type": "application/json",
    }

    payload = {"query": query}
    if variables:
        payload["variables"] = variables

    response = requests.post(gql_url, headers=gql_headers, json=payload, timeout=15)
    return response.json()

# Example: Get channel info
query = """
query {
  user(login: "pokimane") {
    displayName
    description
    followers {
      totalCount
    }
    stream {
      title
      viewersCount
      game {
        name
      }
    }
  }
}
"""

result = query_twitch_gql(query)
user = result.get("data", {}).get("user") or {}
print(f"{user.get('displayName')} - {user.get('followers', {}).get('totalCount', 0)} followers")
if user.get("stream"):
    print(f"LIVE: {user['stream']['title']} ({user['stream']['viewersCount']} viewers)")
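The `variables` parameter of `query_twitch_gql` lets you reuse one query across many channels. Here is a sketch of a parameterized query and the payload it produces; the actual network call is commented out so the snippet stays offline, and `build_gql_payload` mirrors the payload logic of `query_twitch_gql` above:

```python
USER_QUERY = """
query($login: String!) {
  user(login: $login) {
    displayName
    followers { totalCount }
    stream { title viewersCount }
  }
}
"""

def build_gql_payload(query, variables=None):
    """Build the JSON body a GraphQL endpoint expects."""
    payload = {"query": query}
    if variables:
        payload["variables"] = variables
    return payload

payload = build_gql_payload(USER_QUERY, {"login": "shroud"})
# result = query_twitch_gql(USER_QUERY, {"login": "shroud"})  # network call
```

One parameterized query plus a loop over logins is much easier to maintain than string-formatting a new query per channel.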

Scraping Viewer Statistics Over Time

To track trends over time, you can poll channels at intervals and build a time series. The loop below records live status via scrape_channel; swap in the GQL query above if you need exact viewer counts:

import datetime

def track_viewers(channels, interval_seconds=300, duration_hours=1):
    """Poll channels at intervals and record their live status over time."""
    data_points = []
    end_time = time.time() + (duration_hours * 3600)

    while time.time() < end_time:
        timestamp = datetime.datetime.now().isoformat()
        for channel in channels:
            info = scrape_channel(channel)
            data_points.append({
                "timestamp": timestamp,
                "channel": channel,
                "is_live": info.get("is_live", False),
                "title": info.get("title", ""),
            })
        print(f"[{timestamp}] Polled {len(channels)} channels")
        time.sleep(interval_seconds)

    return data_points

# Usage (short demo)
# data = track_viewers(["shroud", "pokimane"], interval_seconds=60, duration_hours=0.1)
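Once `track_viewers` returns, you can reduce the raw data points to a per-channel summary. A sketch in pure Python, with no assumptions beyond the data-point shape produced above:

```python
from collections import defaultdict

def summarize_uptime(data_points):
    """Count how many polls each channel was live for."""
    live_polls = defaultdict(int)
    total_polls = defaultdict(int)
    for point in data_points:
        total_polls[point["channel"]] += 1
        if point["is_live"]:
            live_polls[point["channel"]] += 1
    return {
        channel: {"live": live_polls[channel], "polled": total}
        for channel, total in total_polls.items()
    }

# Demo with hand-made data points:
sample = [
    {"channel": "shroud", "is_live": True},
    {"channel": "shroud", "is_live": False},
    {"channel": "pokimane", "is_live": True},
]
summary = summarize_uptime(sample)
```

The live/polled ratio gives a rough uptime percentage per channel over the tracking window.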

Handling Anti-Scraping Measures

Twitch has aggressive bot detection. Here is what to expect:

  1. JavaScript rendering — Many pages require a browser. Consider using Playwright for dynamic content.
  2. Rate limiting — Twitch will throttle or block rapid requests.
  3. Fingerprinting — Twitch uses browser fingerprinting to detect automation.
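For point 2, a retry-with-backoff wrapper helps. This sketch takes any zero-argument fetch function and retries on throttling or server errors with exponentially growing delays; the status-code handling assumes a requests-style response object, and the injectable `sleep` is my own convention for testability:

```python
import time

def fetch_with_backoff(fetch_fn, max_retries=4, base_delay=2.0, sleep=time.sleep):
    """Call fetch_fn, retrying with exponential backoff on HTTP 429/5xx."""
    for attempt in range(max_retries):
        response = fetch_fn()
        if response.status_code not in (429, 500, 502, 503):
            return response
        sleep(base_delay * (2 ** attempt))  # 2s, 4s, 8s, ...
    return response  # give up and return the last response

# Demo with a stub that fails twice, then succeeds:
class FakeResponse:
    def __init__(self, status_code):
        self.status_code = status_code

responses = iter([FakeResponse(429), FakeResponse(503), FakeResponse(200)])
result = fetch_with_backoff(lambda: next(responses), sleep=lambda s: None)
```

In real use you would pass `lambda: requests.get(url, headers=HEADERS, timeout=15)` as `fetch_fn`.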

For reliable scraping at scale, use a proxy service:

  • ScraperAPI — handles JavaScript rendering and CAPTCHA solving automatically
  • ThorData — residential proxies with high success rates on streaming platforms

def scrape_with_proxy(url, api_key):
    """Use ScraperAPI to handle JS rendering and anti-bot."""
    proxy_url = f"http://scraperapi:{api_key}@proxy-server.scraperapi.com:8001"
    proxies = {"http": proxy_url, "https": proxy_url}
    return requests.get(url, proxies=proxies, timeout=60)

The Easy Route: Use a Pre-Built Scraper

Building and maintaining a Twitch scraper is time-consuming, especially with anti-bot measures. The Twitch Scraper on Apify handles all the complexity for you:

  • Scrape channels, streams, clips, and categories
  • Automatic proxy rotation and anti-bot handling
  • Structured JSON output
  • Scheduled runs for ongoing monitoring
  • No code maintenance required

It is the fastest way to get Twitch data into your pipeline without managing infrastructure.

Storing Scraped Data

Save your Twitch data in a structured format for analysis:

import csv

def save_streams_csv(streams, filename="twitch_streams.csv"):
    """Save scraped stream data to CSV."""
    if not streams:
        return
    keys = streams[0].keys()
    with open(filename, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=keys)
        writer.writeheader()
        writer.writerows(streams)
    print(f"Saved {len(streams)} streams to {filename}")

def save_to_json(data, filename="twitch_data.json"):
    """Save data as JSON for flexible querying."""
    with open(filename, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2, ensure_ascii=False)
    print(f"Saved data to {filename}")
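With data saved, a quick analysis pass is straightforward. The hypothetical helper below sorts scraped streams by viewer count, parsing display strings like `1.2K` inline so it works directly on the output of `scrape_directory`:

```python
def top_streams(streams, n=5):
    """Return the n streams with the highest parsed viewer counts."""
    def viewers_to_int(text):
        token = str(text).lower().replace("viewers", "").strip()
        if token.endswith("k"):
            return int(float(token[:-1]) * 1_000)
        if token.endswith("m"):
            return int(float(token[:-1]) * 1_000_000)
        try:
            return int(float(token))
        except ValueError:
            return 0
    return sorted(streams, key=lambda s: viewers_to_int(s["viewers"]), reverse=True)[:n]

# Demo with hand-made stream records:
sample = [
    {"channel": "a", "viewers": "850"},
    {"channel": "b", "viewers": "12.3K"},
    {"channel": "c", "viewers": "1.1K"},
]
```

Calling `top_streams(sample, 2)` here ranks channel `b` first, then `c`.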

Legal and Ethical Considerations

  • Twitch ToS restricts automated access — scrape responsibly
  • Do not scrape private user data (whispers, subscriptions, payment info)
  • Respect rate limits and add delays between requests
  • Use data for analytics and research, not impersonation
  • Consider the official Twitch API (Helix) for authorized access to public data

Conclusion

Twitch offers rich streaming data for analytics, content aggregation, and market research. You can scrape channels, live streams, clips, and viewer data using Python with BeautifulSoup or the Twitch GQL API.

For production use without the maintenance headache, the Twitch Scraper on Apify gives you structured data with built-in anti-bot handling.

Pair your scraping setup with ScraperAPI or ThorData for reliable proxy rotation at scale.

Happy streaming data collection!
