agenthustler
Scraping Twitch in 2026: Live Streams, Top Games & Channel Data Without Login

The Twitch Public API Trick

Here's something most tutorials won't tell you: you don't need to register a Twitch Developer application to access their API. Twitch's web client uses a public Client-ID that's embedded right in their frontend JavaScript. With just this one header, you can query the Helix API for live streams, top games, and channel information.

No OAuth tokens. No application registration. No callback URLs. Just a single HTTP header.

Setting Up

We'll use Python with httpx (a modern, async-capable HTTP client). Install it:

pip install httpx

The key ingredient is the Client-ID header that Twitch's own web app uses:

import httpx

HEADERS = {
    'Client-ID': 'kimne78kx3ncx6brgo4mv6wki5h1ko',
    'Accept': 'application/json'
}

BASE_URL = 'https://api.twitch.tv/helix'

This Client-ID is publicly known — it's the same one embedded in Twitch's frontend. It gives read-only access to public data, which is exactly what we need for scraping.

Use Case 1: Streaming Analytics — Who's Live Right Now?

Let's fetch all live streams for a specific game. This is the bread and butter of streaming analytics — understanding what's happening on Twitch in real-time.

import httpx

HEADERS = {
    'Client-ID': 'kimne78kx3ncx6brgo4mv6wki5h1ko',
}

def get_live_streams(game_name: str, max_results: int = 100):
    """Fetch live streams for a given game."""
    # First, get the game ID
    with httpx.Client() as client:
        resp = client.get(
            'https://api.twitch.tv/helix/games',
            headers=HEADERS,
            params={'name': game_name}
        )
        games = resp.json().get('data', [])
        if not games:
            print(f'Game "{game_name}" not found')
            return []

        game_id = games[0]['id']

        # Now fetch streams for this game
        streams = []
        cursor = None

        while len(streams) < max_results:
            params = {
                'game_id': game_id,
                'first': min(100, max_results - len(streams))
            }
            if cursor:
                params['after'] = cursor

            resp = client.get(
                'https://api.twitch.tv/helix/streams',
                headers=HEADERS,
                params=params
            )
            data = resp.json()
            batch = data.get('data', [])
            if not batch:
                break

            streams.extend(batch)
            cursor = data.get('pagination', {}).get('cursor')
            if not cursor:
                break

        return streams

# Example usage
streams = get_live_streams('Fortnite', max_results=50)
for s in streams[:5]:
    print(f"{s['user_name']:20s} | {s['viewer_count']:>6d} viewers | {s['title'][:50]}")

Output looks like:

ninja                |  15234 viewers | Late night grinding with the squad
shroud               |   8421 viewers | Ranked grind - Road to Unreal
pokimane             |   6102 viewers | Chill Fortnite with chat
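Once you have the raw stream objects, a small aggregation step turns them into market-level numbers. `summarize_streams` below is my own helper (not part of the Twitch API) that operates on the dicts `get_live_streams` returns — shown here with a few sample rows so you can see the shape:

```python
from collections import Counter

def summarize_streams(streams: list[dict]) -> dict:
    """Aggregate a list of Helix stream objects into quick market stats."""
    total = sum(s['viewer_count'] for s in streams)
    languages = Counter(s['language'] for s in streams)
    return {
        'channels_live': len(streams),
        'total_viewers': total,
        'avg_viewers': total // len(streams) if streams else 0,
        'top_languages': languages.most_common(3),
    }

# Works directly on get_live_streams() output; sample rows for illustration.
sample = [
    {'user_name': 'ninja', 'viewer_count': 15234, 'language': 'en'},
    {'user_name': 'shroud', 'viewer_count': 8421, 'language': 'en'},
    {'user_name': 'gotaga', 'viewer_count': 5012, 'language': 'fr'},
]
print(summarize_streams(sample))
```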

Use Case 2: Game Trend Tracking — What's Hot Right Now?

Track which games are dominating Twitch. This is invaluable for game developers, marketers, and investors trying to understand the gaming landscape.

def get_top_games(limit: int = 20):
    """Get top games by current viewer count."""
    with httpx.Client() as client:
        resp = client.get(
            'https://api.twitch.tv/helix/games/top',
            headers=HEADERS,
            params={'first': limit}
        )
        return resp.json().get('data', [])

games = get_top_games(20)
for i, game in enumerate(games, 1):
    print(f"{i:2d}. {game['name']}")

Want to track trends over time? Run this on a schedule and store results:

import json
from datetime import datetime, timezone

def snapshot_top_games():
    games = get_top_games(50)
    # datetime.utcnow() is deprecated since Python 3.12; use an aware UTC datetime
    now = datetime.now(timezone.utc)
    snapshot = {
        'timestamp': now.isoformat(),
        'games': games
    }
    filename = f"twitch_snapshot_{now.strftime('%Y%m%d_%H%M')}.json"
    with open(filename, 'w') as f:
        json.dump(snapshot, f, indent=2)
    return filename

Run this every hour with cron or Apify's scheduling, and you'll have a complete picture of how game popularity shifts throughout the day.
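With a few snapshots on disk, you can diff consecutive runs to spot climbers. `rank_changes` is my own sketch of that comparison, keyed on the `id` field each Helix game object carries — positive delta means the game moved up:

```python
def rank_changes(old_games: list[dict], new_games: list[dict]) -> list[tuple]:
    """Compare two top-games snapshots; returns (name, new_rank, delta).

    delta is positions climbed since the old snapshot, or None for a
    game that wasn't in the old top list at all (a new entry).
    """
    old_ranks = {g['id']: i for i, g in enumerate(old_games, 1)}
    changes = []
    for new_rank, game in enumerate(new_games, 1):
        old_rank = old_ranks.get(game['id'])
        delta = (old_rank - new_rank) if old_rank is not None else None
        changes.append((game['name'], new_rank, delta))
    return changes

# Sample snapshots for illustration:
old = [{'id': '33214', 'name': 'Fortnite'}, {'id': '21779', 'name': 'League of Legends'}]
new = [{'id': '21779', 'name': 'League of Legends'}, {'id': '516575', 'name': 'VALORANT'}]
print(rank_changes(old, new))
```

Feed it the `games` lists from two saved snapshot files and sort by delta to get a "rising games" report.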

Use Case 3: Esports Research — Channel Deep Dives

For esports research, you often need detailed channel information — when they started, how many followers, what they stream.

def get_channel_info(usernames: list[str]):
    """Get detailed channel information for multiple users."""
    with httpx.Client() as client:
        # Twitch allows up to 100 users per request
        params = [('login', name) for name in usernames]
        resp = client.get(
            'https://api.twitch.tv/helix/users',
            headers=HEADERS,
            params=params
        )
        return resp.json().get('data', [])

channels = get_channel_info(['shroud', 'pokimane', 'xqc'])
for ch in channels:
    print(f"{ch['display_name']}: {ch['description'][:60]}...")
    print(f"  Created: {ch['created_at']}")
    print(f"  Type: {ch['broadcaster_type']}")
    print()
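Channel records from /users don't tell you whether someone is currently live; for that you cross-reference the /streams endpoint (it accepts `user_login` parameters the same way /users accepts `login`). `merge_live_status` below is my own join of the two result sets — users carry a `login` field, streams a `user_login` field:

```python
def merge_live_status(channels: list[dict], streams: list[dict]) -> list[dict]:
    """Attach live-stream info (if any) to each channel record.

    channels: objects from /users, streams: objects from /streams.
    """
    live_by_login = {s['user_login']: s for s in streams}
    merged = []
    for ch in channels:
        stream = live_by_login.get(ch['login'])
        merged.append({
            **ch,
            'is_live': stream is not None,
            'viewer_count': stream['viewer_count'] if stream else 0,
        })
    return merged

# Sample records for illustration:
channels = [{'login': 'shroud', 'display_name': 'shroud'},
            {'login': 'xqc', 'display_name': 'xQc'}]
streams = [{'user_login': 'xqc', 'viewer_count': 42000}]
for row in merge_live_status(channels, streams):
    print(row['display_name'], row['is_live'], row['viewer_count'])
```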

Handling Rate Limits

The public Client-ID has rate limits. If you're doing heavy scraping, add basic retry logic:

import time

def safe_request(client, url, params):
    """GET with basic retry on HTTP 429."""
    resp = None
    for attempt in range(3):
        resp = client.get(url, headers=HEADERS, params=params)
        if resp.status_code == 429:
            # Ratelimit-Reset is a Unix timestamp, not a duration --
            # sleep until that moment, with a floor and a sane cap.
            reset = resp.headers.get('Ratelimit-Reset')
            wait = max(1, int(reset) - int(time.time())) if reset else 5
            time.sleep(min(wait, 60))
            continue
        return resp
    return resp  # Return last response even if still rate-limited

The Easy Way: Use Our Apify Actor

If you don't want to manage the code yourself, we've packaged all of this into an Apify actor: Twitch Scraper.

It handles:

  • Authentication with the public Client-ID
  • Automatic pagination through large result sets
  • Rate limit handling and retries
  • Clean JSON/CSV/Excel output
  • Scheduled runs via Apify's cron system

Just configure the mode (streams, top-games, or channel details), set your parameters, and hit run. The data lands in Apify's dataset storage, ready to download or pipe into your workflow via API.

{
  "mode": "streams",
  "game": "League of Legends",
  "maxResults": 500
}

What Can You Build With This?

  • Stream alert bots — notify a Discord channel when a specific game hits a viewer threshold
  • Talent scouting tools — find rising streamers in specific game categories before they blow up
  • Market research dashboards — track gaming trends for investment or content strategy
  • Academic datasets — study streaming culture, language distribution, or community dynamics

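The first idea on that list — a stream alert bot — mostly reduces to a threshold check over the data `get_live_streams` already returns. A minimal sketch (the Discord delivery itself is left out; you'd post each message to a webhook):

```python
def find_alerts(streams: list[dict], threshold: int) -> list[str]:
    """Return one alert message per stream at or above a viewer threshold."""
    return [
        f"{s['user_name']} is live with {s['viewer_count']:,} viewers: {s['title']}"
        for s in streams
        if s['viewer_count'] >= threshold
    ]

# Sample rows for illustration:
sample = [
    {'user_name': 'ninja', 'viewer_count': 15234, 'title': 'Late night grinding'},
    {'user_name': 'pokimane', 'viewer_count': 6102, 'title': 'Chill Fortnite'},
]
for msg in find_alerts(sample, threshold=10000):
    print(msg)
```

Run it on a schedule, keep a small set of already-alerted channel names to avoid duplicates, and you have the core of the bot.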
Wrapping Up

Twitch's public Client-ID makes it remarkably easy to scrape streaming data without any registration or authentication hassle. The three patterns above — live streams, top games, and channel details — cover most analytics and research use cases.

For production workloads, consider using the Twitch Scraper on Apify to avoid managing infrastructure and rate limits yourself.

What are you building with Twitch data? Drop a comment below — I'd love to hear about your use cases.
