agenthustler

How to Scrape Twitch in 2026: Streams, Channels, Clips, and Viewer Data

Twitch is the largest live streaming platform in the world, and its data is a goldmine for analytics. Whether you are tracking streamer growth, analyzing viewer trends, or building a clips aggregator — scraping Twitch gives you access to real-time and historical streaming data.

In this guide, I will walk you through how to scrape Twitch in 2026 for streams, channels, clips, and viewer data using Python.

Why Scrape Twitch?

Twitch has over 140 million monthly active users and thousands of live streams at any given moment. Common use cases:

  • Streamer analytics — track follower growth, average viewers, and stream schedules
  • Content aggregation — collect top clips, VODs, and highlights automatically
  • Market research — analyze which games and categories are trending
  • Esports data — monitor tournament streams and viewer peaks
  • Brand monitoring — track mentions and sponsorship visibility

Setting Up Your Environment

import requests
from bs4 import BeautifulSoup
import json
import time
import random

HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36",
    "Accept-Language": "en-US,en;q=0.9",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
}

Install the dependencies:

pip install requests beautifulsoup4
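The `random` and `time` imports are there for pacing requests. A minimal pacing helper, my own convention rather than anything Twitch-specific, that you can call before each request:

```python
import random
import time

def jittered_delay(min_delay=1.0, max_delay=3.0):
    """Pick a random delay so request timing doesn't look robotic."""
    return random.uniform(min_delay, max_delay)

def pace(min_delay=1.0, max_delay=3.0):
    """Sleep a jittered delay; call this before each requests.get(...)."""
    time.sleep(jittered_delay(min_delay, max_delay))
```

Randomized gaps between requests are a cheap first line of defense against rate limiting before you reach for proxies.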

Scraping Channel Information

Twitch channel pages contain follower counts, stream status, bio, and category data.

def scrape_channel(channel_name):
    """Scrape a Twitch channel's public profile data."""
    url = f"https://www.twitch.tv/{channel_name}"
    response = requests.get(url, headers=HEADERS, timeout=15)
    soup = BeautifulSoup(response.text, "html.parser")

    # Twitch embeds channel data in script tags
    scripts = soup.find_all("script", {"type": "application/ld+json"})
    for script in scripts:
        try:
            data = json.loads(script.string)
            if data.get("@type") == "VideoObject":
                return {
                    "channel": channel_name,
                    "title": data.get("name"),
                    "description": data.get("description", "")[:200],
                    "thumbnail": data.get("thumbnailUrl"),
                    "is_live": True,
                }
        except (json.JSONDecodeError, TypeError):
            continue

    return {"channel": channel_name, "is_live": False}

channel = scrape_channel("shroud")
print(json.dumps(channel, indent=2))
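To see what `scrape_channel` is extracting, here is the same JSON-LD parsing logic run offline against a hand-made sample. The payload below is illustrative, shaped like Twitch's embedded markup, not real Twitch output:

```python
import json

# Illustrative JSON-LD payload, similar in shape to what Twitch
# embeds in a <script type="application/ld+json"> tag for a live channel.
sample_ld = """
{
  "@type": "VideoObject",
  "name": "Ranked grind - Day 3",
  "description": "Playing ranked all day.",
  "thumbnailUrl": "https://static-cdn.jtvnw.net/previews-ttv/live_user_example.jpg"
}
"""

data = json.loads(sample_ld)
if data.get("@type") == "VideoObject":
    parsed = {
        "title": data.get("name"),
        "description": data.get("description", "")[:200],
        "thumbnail": data.get("thumbnailUrl"),
        "is_live": True,
    }
```

If the channel is offline, Twitch typically omits the `VideoObject` block, which is why the function falls back to `is_live: False`.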

Scraping Live Streams by Category

Want to monitor which streams are live for a specific game? Twitch directory pages list them. Note that these pages are heavily JavaScript-rendered, so a plain GET may return few or no stream cards; see the anti-scraping section below:

def scrape_directory(game_slug):
    """Scrape live streams from a Twitch game directory page."""
    streams = []
    url = f"https://www.twitch.tv/directory/category/{game_slug}"
    response = requests.get(url, headers=HEADERS, timeout=15)
    soup = BeautifulSoup(response.text, "html.parser")

    # Extract stream cards
    for card in soup.select("article[data-a-target='card']"):
        title_el = card.select_one("h3")
        channel_el = card.select_one("a[data-a-target='preview-card-channel-link']")
        viewers_el = card.select_one("[data-a-target='preview-card-viewer-count']")

        if title_el and channel_el:
            streams.append({
                "title": title_el.get_text(strip=True),
                "channel": channel_el.get_text(strip=True),
                "viewers": viewers_el.get_text(strip=True) if viewers_el else "0",
                "game": game_slug,
            })

    return streams

streams = scrape_directory("league-of-legends")
for s in streams[:5]:
    print(f"{s['channel']}: {s['title']} ({s['viewers']} viewers)")
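The directory scraper returns viewer counts as display strings like `1.2K viewers`. A small helper, my own convention rather than anything from Twitch, to normalize them into integers for sorting and aggregation:

```python
def parse_viewer_count(text):
    """Convert a display string like '1.2K viewers' or '987' to an int."""
    token = str(text).lower().replace("viewers", "").replace("viewer", "").strip()
    multiplier = 1
    if token.endswith("k"):
        multiplier, token = 1_000, token[:-1]
    elif token.endswith("m"):
        multiplier, token = 1_000_000, token[:-1]
    try:
        return int(float(token.replace(",", "")) * multiplier)
    except ValueError:
        return 0  # e.g. "N/A" or empty string
```

Returning 0 on unparseable input keeps downstream sorting from crashing on missing data.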

Scraping Clips

Twitch clips are short highlights that go viral. Here is how to scrape top clips for a channel:

def scrape_clips(channel_name, period="7d"):
    """Scrape top clips from a Twitch channel."""
    url = f"https://www.twitch.tv/{channel_name}/clips?filter=clips&range={period}"
    response = requests.get(url, headers=HEADERS, timeout=15)
    soup = BeautifulSoup(response.text, "html.parser")

    clips = []
    for clip_card in soup.select("[data-a-target='clips-card']"):
        title_el = clip_card.select_one("h3")
        views_el = clip_card.select_one("[data-a-target='clip-views']")
        link_el = clip_card.select_one("a[href*='/clip/']")

        if title_el:
            clips.append({
                "title": title_el.get_text(strip=True),
                "views": views_el.get_text(strip=True) if views_el else "N/A",
                "url": "https://www.twitch.tv" + link_el["href"] if link_el else None,
                "channel": channel_name,
            })

    return clips

clips = scrape_clips("xqc")
for c in clips[:5]:
    print(f"{c['title']} - {c['views']} views")

Using Twitch GQL API

Twitch's frontend communicates with its backend via a GraphQL API. It is unofficial and can change without notice, but it is a powerful way to get structured data:

def query_twitch_gql(query, variables=None):
    """Query Twitch internal GQL API."""
    gql_url = "https://gql.twitch.tv/gql"
    gql_headers = {
        "Client-Id": "kimne78kx3ncx6brgo4mv6wki5h1ko",  # Public client ID
        "Content-Type": "application/json",
    }

    payload = {"query": query}
    if variables:
        payload["variables"] = variables

    response = requests.post(gql_url, headers=gql_headers, json=payload, timeout=15)
    return response.json()

# Example: Get channel info
query = """
query {
  user(login: "pokimane") {
    displayName
    description
    followers {
      totalCount
    }
    stream {
      title
      viewersCount
      game {
        name
      }
    }
  }
}
"""

result = query_twitch_gql(query)
user = result.get("data", {}).get("user") or {}
print(f"{user.get('displayName')} - {user.get('followers', {}).get('totalCount', 0)} followers")
if user.get("stream"):
    print(f"LIVE: {user['stream']['title']} ({user['stream']['viewersCount']} viewers)")
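The `variables` parameter of `query_twitch_gql` lets you reuse one query across many channels. Here is a sketch of a parameterized query and the payload it produces; the actual network call is commented out so the snippet stays offline, and `build_gql_payload` mirrors the payload logic of `query_twitch_gql` above:

```python
USER_QUERY = """
query($login: String!) {
  user(login: $login) {
    displayName
    followers { totalCount }
    stream { title viewersCount }
  }
}
"""

def build_gql_payload(query, variables=None):
    """Build the JSON body a GraphQL endpoint expects."""
    payload = {"query": query}
    if variables:
        payload["variables"] = variables
    return payload

payload = build_gql_payload(USER_QUERY, {"login": "shroud"})
# result = query_twitch_gql(USER_QUERY, {"login": "shroud"})  # network call
```

One parameterized query plus a loop over logins is much easier to maintain than string-formatting a new query per channel.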

Scraping Viewer Statistics Over Time

To track trends over time, you can poll channels at intervals and build a time series. The loop below records live status via scrape_channel; swap in the GQL query above if you need exact viewer counts:

import datetime

def track_viewers(channels, interval_seconds=300, duration_hours=1):
    """Poll channels at intervals and record their live status over time."""
    data_points = []
    end_time = time.time() + (duration_hours * 3600)

    while time.time() < end_time:
        timestamp = datetime.datetime.now().isoformat()
        for channel in channels:
            info = scrape_channel(channel)
            data_points.append({
                "timestamp": timestamp,
                "channel": channel,
                "is_live": info.get("is_live", False),
                "title": info.get("title", ""),
            })
        print(f"[{timestamp}] Polled {len(channels)} channels")
        time.sleep(interval_seconds)

    return data_points

# Usage (short demo)
# data = track_viewers(["shroud", "pokimane"], interval_seconds=60, duration_hours=0.1)
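Once `track_viewers` returns, you can reduce the raw data points to a per-channel summary. A sketch in pure Python, with no assumptions beyond the data-point shape produced above:

```python
from collections import defaultdict

def summarize_uptime(data_points):
    """Count how many polls each channel was live for."""
    live_polls = defaultdict(int)
    total_polls = defaultdict(int)
    for point in data_points:
        total_polls[point["channel"]] += 1
        if point["is_live"]:
            live_polls[point["channel"]] += 1
    return {
        channel: {"live": live_polls[channel], "polled": total}
        for channel, total in total_polls.items()
    }

# Demo with hand-made data points:
sample = [
    {"channel": "shroud", "is_live": True},
    {"channel": "shroud", "is_live": False},
    {"channel": "pokimane", "is_live": True},
]
summary = summarize_uptime(sample)
```

The live/polled ratio gives a rough uptime percentage per channel over the tracking window.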

Handling Anti-Scraping Measures

Twitch has aggressive bot detection. Here is what to expect:

  1. JavaScript rendering — Many pages require a browser. Consider using Playwright for dynamic content.
  2. Rate limiting — Twitch will throttle or block rapid requests.
  3. Fingerprinting — Twitch uses browser fingerprinting to detect automation.
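For point 2, a retry-with-backoff wrapper helps. This sketch takes any zero-argument fetch function and retries on throttling or server errors with exponentially growing delays; the status-code handling assumes a requests-style response object, and the injectable `sleep` is my own convention for testability:

```python
import time

def fetch_with_backoff(fetch_fn, max_retries=4, base_delay=2.0, sleep=time.sleep):
    """Call fetch_fn, retrying with exponential backoff on HTTP 429/5xx."""
    for attempt in range(max_retries):
        response = fetch_fn()
        if response.status_code not in (429, 500, 502, 503):
            return response
        sleep(base_delay * (2 ** attempt))  # 2s, 4s, 8s, ...
    return response  # give up and return the last response

# Demo with a stub that fails twice, then succeeds:
class FakeResponse:
    def __init__(self, status_code):
        self.status_code = status_code

responses = iter([FakeResponse(429), FakeResponse(503), FakeResponse(200)])
result = fetch_with_backoff(lambda: next(responses), sleep=lambda s: None)
```

In real use you would pass `lambda: requests.get(url, headers=HEADERS, timeout=15)` as `fetch_fn`.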

For reliable scraping at scale, use a proxy service:

  • ScraperAPI — handles JavaScript rendering and CAPTCHA solving automatically
  • ThorData — residential proxies with high success rates on streaming platforms

def scrape_with_proxy(url, api_key):
    """Use ScraperAPI to handle JS rendering and anti-bot."""
    proxy_url = f"http://scraperapi:{api_key}@proxy-server.scraperapi.com:8001"
    proxies = {"http": proxy_url, "https": proxy_url}
    return requests.get(url, proxies=proxies, timeout=60)

The Easy Route: Use a Pre-Built Scraper

Building and maintaining a Twitch scraper is time-consuming, especially with anti-bot measures. The Twitch Scraper on Apify handles all the complexity for you:

  • Scrape channels, streams, clips, and categories
  • Automatic proxy rotation and anti-bot handling
  • Structured JSON output
  • Scheduled runs for ongoing monitoring
  • No code maintenance required

It is the fastest way to get Twitch data into your pipeline without managing infrastructure.

Storing Scraped Data

Save your Twitch data in a structured format for analysis:

import csv

def save_streams_csv(streams, filename="twitch_streams.csv"):
    """Save scraped stream data to CSV."""
    if not streams:
        return
    keys = streams[0].keys()
    with open(filename, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=keys)
        writer.writeheader()
        writer.writerows(streams)
    print(f"Saved {len(streams)} streams to {filename}")

def save_to_json(data, filename="twitch_data.json"):
    """Save data as JSON for flexible querying."""
    with open(filename, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2, ensure_ascii=False)
    print(f"Saved data to {filename}")
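With data saved, a quick analysis pass is straightforward. The hypothetical helper below sorts scraped streams by viewer count, parsing display strings like `1.2K` inline so it works directly on the output of `scrape_directory`:

```python
def top_streams(streams, n=5):
    """Return the n streams with the highest parsed viewer counts."""
    def viewers_to_int(text):
        token = str(text).lower().replace("viewers", "").strip()
        if token.endswith("k"):
            return int(float(token[:-1]) * 1_000)
        if token.endswith("m"):
            return int(float(token[:-1]) * 1_000_000)
        try:
            return int(float(token))
        except ValueError:
            return 0
    return sorted(streams, key=lambda s: viewers_to_int(s["viewers"]), reverse=True)[:n]

# Demo with hand-made stream records:
sample = [
    {"channel": "a", "viewers": "850"},
    {"channel": "b", "viewers": "12.3K"},
    {"channel": "c", "viewers": "1.1K"},
]
```

Calling `top_streams(sample, 2)` here ranks channel `b` first, then `c`.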

Legal and Ethical Considerations

  • Twitch ToS restricts automated access — scrape responsibly
  • Do not scrape private user data (whispers, subscriptions, payment info)
  • Respect rate limits and add delays between requests
  • Use data for analytics and research, not impersonation
  • Consider the official Twitch API (Helix) for authorized access to public data

Conclusion

Twitch offers rich streaming data for analytics, content aggregation, and market research. You can scrape channels, live streams, clips, and viewer data using Python with BeautifulSoup or the Twitch GQL API.

For production use without the maintenance headache, the Twitch Scraper on Apify gives you structured data with built-in anti-bot handling.

Pair your scraping setup with ScraperAPI or ThorData for reliable proxy rotation at scale.

Happy streaming data collection!
