Recently, I wanted to work with the Twitch API to try to recreate the website twitchroulette.net, where you could view a completely random live stream picked from all of the streams currently live on the site. According to analytics from twitchtracker.com, there is currently an average of over 100,000 Twitch live streams at any given time.
When I went through the Twitch API documentation, I discovered that the endpoint for getting live streams, https://api.twitch.tv/helix/streams, limits the response to a maximum of 100 streams per API call. However, the response includes a pagination field containing a cursor value (a string), which is used in a subsequent request to specify the starting point of the next set of results.
The response body for the GET request at https://api.twitch.tv/helix/streams?first=100 would include the 100 most active live streams, and the data looks like this:
{
  "data": [
    {
      "id": "41375541868",
      "user_id": "459331509",
      "user_login": "auronplay",
      "user_name": "auronplay",
      "game_id": "494131",
      "game_name": "Little Nightmares",
      "type": "live",
      "title": "hablamos y le damos a Little Nightmares 1",
      "viewer_count": 78365,
      "started_at": "2021-03-10T15:04:21Z",
      "language": "es",
      "thumbnail_url": "https://static-cdn.jtvnw.net/previews-ttv/live_user_auronplay-{width}x{height}.jpg",
      "tag_ids": [
        "d4bb9c58-2141-4881-bcdc-3fe0505457d1"
      ]
    },
    ...
  ],
  "pagination": {
    "cursor": "eyJiIjp7IkN1cnNvciI6ImV5SnpJam8zT0RNMk5TNDBORFF4TlRjMU1UY3hOU3dpWkNJNlptRnNjMlVzSW5RaU9uUnlkV1Y5In0sImEiOnsiQ3Vyc29yIjoiZXlKeklqb3hOVGs0TkM0MU56RXhNekExTVRZNU1ESXNJbVFpT21aaGJITmxMQ0owSWpwMGNuVmxmUT09In19"
  }
}
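To make the two parts of that response concrete, here is a small sketch of how the streams and the cursor could be pulled apart. The helper name readPage and the shortened sample values are my own for illustration, and I'm assuming that a missing cursor means there is no further page.

```javascript
// Hypothetical helper: splits one response body into its streams and next cursor.
function readPage(responseJson) {
  const streams = Array.isArray(responseJson.data) ? responseJson.data : [];
  const pagination = responseJson.pagination || {};
  // Assumption: a missing cursor means there is no further page.
  const nextCursor = pagination.cursor || null;
  return { streams, nextCursor };
}

// Shortened, made-up sample in the same shape as the response above.
const sample = {
  data: [{ user_login: 'auronplay', viewer_count: 78365 }],
  pagination: { cursor: 'abc123' }
};

const page = readPage(sample);
console.log(page.streams.length, page.nextCursor);
```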
If you wanted to retrieve the next 100 most active live streams, the subsequent API request URL would need to be:
https://api.twitch.tv/helix/streams?first=100&after=eyJiIjp7IkN1cnNvciI6ImV5SnpJam8zT0RNMk5TNDBORFF4TlRjMU1UY3hOU3dpWkNJNlptRnNjMlVzSW5RaU9uUnlkV1Y5In0sImEiOnsiQ3Vyc29yIjoiZXlKeklqb3hOVGs0TkM0MU56RXhNekExTVRZNU1ESXNJbVFpT21aaGJITmxMQ0owSWpwMGNuVmxmUT09In19
This includes, as its after value, the cursor value returned in the prior response.
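As a rough sketch, the URL for each page could be built like this. The function name buildStreamsUrl is hypothetical; the first and after parameters are the ones shown in the URLs above, and the cursor used here is a made-up placeholder.

```javascript
// Hypothetical helper: builds the request URL for one page of live streams.
function buildStreamsUrl(cursor, pageSize = 100) {
  const url = new URL('https://api.twitch.tv/helix/streams');
  url.searchParams.set('first', String(pageSize));
  // Only add `after` when we have a cursor from a previous response.
  if (cursor) {
    url.searchParams.set('after', cursor);
  }
  return url.toString();
}

console.log(buildStreamsUrl());      // first page
console.log(buildStreamsUrl('abc')); // next page, using a placeholder cursor
```

Using URL and searchParams also takes care of encoding the cursor if it ever contains characters that are not safe in a query string.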
It's not possible to sort the responses by least active, so to reach streams with very few or no viewers, you first have to page through the results for the more active streams.
It is also important to note that the Twitch API is rate-limited to 800 requests per minute, so the maximum number of live streams we could retrieve in that time is 80,000 (800 requests × 100 streams per request), which is substantially lower than the current weekly average. It's therefore plausible that trying to get a truly complete list of live streams would run the risk of causing an HTTP 429 error (too many requests).
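The arithmetic can be sketched in a few lines. The function names are hypothetical, and msUntilReset assumes that a 429 response carries a Ratelimit-Reset header holding a Unix timestamp in seconds, which is my reading of the Twitch documentation rather than something I have verified here.

```javascript
const REQUESTS_PER_MINUTE = 800; // rate limit quoted above
const STREAMS_PER_REQUEST = 100; // maximum page size

// Upper bound on the number of streams retrievable in one minute.
function maxStreamsPerMinute(requestsPerMinute = REQUESTS_PER_MINUTE, pageSize = STREAMS_PER_REQUEST) {
  return requestsPerMinute * pageSize;
}

// Number of requests needed to page through `totalStreams` streams.
function requestsNeeded(totalStreams, pageSize = STREAMS_PER_REQUEST) {
  return Math.ceil(totalStreams / pageSize);
}

// Assumption: `resetHeader` is the Ratelimit-Reset value, a Unix timestamp in seconds.
// Returns how long to wait (in milliseconds) before retrying after a 429.
function msUntilReset(resetHeader, nowMs = Date.now()) {
  const resetMs = Number(resetHeader) * 1000;
  return Math.max(0, resetMs - nowMs);
}

console.log(maxStreamsPerMinute());  // 80000
console.log(requestsNeeded(100000)); // 1000, which exceeds the 800-request budget
```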
In order to try to retrieve as many live streams as possible, while keeping in mind the constraints of the rate-limit and a potentially impatient user, I approached this problem using recursion:
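// Recursively fetches up to `counter` pages of 100 streams each.
// `clientId` and `access_token` are assumed to be defined elsewhere.
function getAllStreams (cursor, data = [], counter = 15) {
  // Stop once the request budget is used up.
  if (counter <= 0) return Promise.resolve(data);
  const request = new Request('https://api.twitch.tv/helix/streams?first=100' + (cursor ? '&after=' + encodeURIComponent(cursor) : ''), {
    method: 'GET',
    headers: {
      'Client-ID': clientId,
      'Authorization': `Bearer ${access_token}`
    }
  });
  return fetch(request).then((response) => response.json()).then((responseJson) => {
    // Keep this page's streams before deciding whether to continue.
    data.push(...responseJson.data);
    const nextCursor = responseJson.pagination && responseJson.pagination.cursor;
    // No cursor means there are no more pages to fetch.
    if (!nextCursor) return data;
    return getAllStreams(nextCursor, data, counter - 1);
  });
}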
I found that each request took about half a second to complete, so I also needed to limit the number of requests in order to keep the user engaged. I specify that limit as the default argument counter. While 1500 streams might not seem like a big number, it does make it possible to recreate the experience of viewing a single random stream.
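Once the streams are collected, picking one at random is straightforward. This is only a sketch: pickRandomStream and channelUrl are hypothetical names, the sample data is made up, and the channel URL pattern (twitch.tv followed by the user_login) is an assumption on my part.

```javascript
// Picks one element uniformly at random; `random` can be swapped out for testing.
function pickRandomStream(streams, random = Math.random) {
  if (streams.length === 0) return null;
  const index = Math.floor(random() * streams.length);
  return streams[index];
}

// Assumption: a channel page lives at https://www.twitch.tv/<user_login>.
function channelUrl(stream) {
  return `https://www.twitch.tv/${stream.user_login}`;
}

// Made-up sample in the same shape as the entries in the API's data array.
const collected = [
  { user_login: 'auronplay', viewer_count: 78365 },
  { user_login: 'example_streamer', viewer_count: 0 }
];

const chosen = pickRandomStream(collected);
console.log(channelUrl(chosen));
```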
I would appreciate any suggestions or critiques of my approach, as this was the first time I've worked with and tried to 'crawl' a paginated API. I just wanted to share the way I went about using this endpoint in order to try to help other developers who attempt to do the same.
Thanks for reading!