GitHub API Rate Limits: The Numbers That Block Your Project
GitHub’s REST API is one of the most generous public APIs out there — until it isn’t. At 5,000 requests per hour (authenticated) or a mere 60 requests per hour (unauthenticated), developers routinely hit walls when building anything beyond basic integrations.
If you’re doing repository analysis, tracking open-source trends, monitoring competitor activity, or aggregating data across thousands of repos — you’ll burn through that quota in minutes.
Let’s look at when the API is sufficient, when it’s not, and when web scraping becomes the pragmatic alternative.
GitHub API Rate Limits Explained (2026)
| Tier | Rate Limit | Auth Required | Best For |
|---|---|---|---|
| Unauthenticated | 60 req/hr | No | Quick lookups |
| Personal Access Token | 5,000 req/hr | Yes | Standard dev work |
| GitHub App | 5,000 req/hr + 50/repo | Yes | Org integrations |
| Enterprise | 15,000 req/hr | Yes | Large-scale use |
Sounds generous until you do the math:
```python
# How fast can you exhaust 5,000 requests?

# Scenario 1: Analyze the top 1,000 Python repos
requests_per_repo = 5  # repo info + contributors + languages + commits + issues
total_requests = 1000 * requests_per_repo  # = 5,000
# Result: one scan = your entire hourly quota

# Scenario 2: Monitor 200 repos for new releases
checks_per_cycle = 200  # one request per repo per cycle
cycles_per_hour = 5000 / checks_per_cycle  # = 25 cycles/hr (one every 2.4 min)
# Seems OK, but add commit history and you're cooked
```
What the API Gives You (and What It Doesn’t)
GitHub’s API is excellent for structured data (a one-call example follows this list):
- Repository metadata, stars, forks
- Issues and pull requests
- Commit history (paginated)
- User profiles and contributions
- Release and tag information
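Most of that list is a single authenticated call away. Here's a minimal sketch against the /repos endpoint; the token is a placeholder and the printed fields are just a sample:

```python
import requests

TOKEN = "ghp_your_token_here"  # placeholder; use your own PAT
HEADERS = {"Authorization": f"token {TOKEN}"}

# One request returns the core metadata for a repository
resp = requests.get("https://api.github.com/repos/psf/requests", headers=HEADERS)
resp.raise_for_status()
repo = resp.json()

print(repo["full_name"], repo["stargazers_count"], "stars,", repo["forks_count"], "forks")
print(repo["open_issues_count"], "open issues")
```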
But several things are not available or practical through the API:
- Trending repositories — no API endpoint for GitHub Trending
- Search ranking factors — can’t see why repos rank where they do
- Contribution graphs at scale — rate-limited per-user fetch
- Topic/tag aggregations — limited search API (30 req/min)
- Bulk profile data — fetching 10K developer profiles = 2+ hours
Real-World Rate Limit Pain Points
```python
import time

import requests

TOKEN = "ghp_your_token_here"
HEADERS = {"Authorization": f"token {TOKEN}"}

def check_rate_limit():
    """Return the remaining core-API quota and the epoch second it resets."""
    r = requests.get("https://api.github.com/rate_limit", headers=HEADERS)
    r.raise_for_status()
    core = r.json()["resources"]["core"]
    return core["remaining"], core["reset"]

remaining, reset = check_rate_limit()
print(f"Remaining: {remaining}/5000")
print(f"Reset in: {reset - time.time():.0f} seconds")

# The dreaded 403:
# {
#   "message": "API rate limit exceeded for user ID 12345.",
#   "documentation_url": "https://docs.github.com/rest/overview/rate-limits-for-the-rest-api"
# }
```
When you hit that 403, your options are:
- Wait — up to 60 minutes for the reset (see the retry sketch after this list)
- Use GraphQL — separate 5,000-point budget, but complex queries cost more points
- Multiple tokens — technically against ToS
- Web scraping — for data the API limits or doesn’t expose
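The first option is at least easy to automate: every API response carries X-RateLimit-Remaining and X-RateLimit-Reset headers, so a wrapper can sleep through the window instead of crashing. A minimal sketch (the function name and one-second buffer are my own choices):

```python
import time

import requests

def github_get(url, headers):
    """GET a GitHub API URL, sleeping through rate-limit windows as needed."""
    while True:
        resp = requests.get(url, headers=headers)
        exhausted = resp.headers.get("X-RateLimit-Remaining") == "0"
        if resp.status_code in (403, 429) and exhausted:
            # Sleep until the window resets, plus a one-second buffer
            reset_at = int(resp.headers["X-RateLimit-Reset"])
            wait = max(reset_at - time.time(), 0) + 1
            print(f"Rate limited, sleeping {wait:.0f}s")
            time.sleep(wait)
            continue
        resp.raise_for_status()
        return resp

# Usage: behaves like requests.get, but waits out rate limits instead of failing
resp = github_get("https://api.github.com/repos/psf/requests",
                  {"Authorization": "token ghp_your_token_here"})
```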
When Web Scraping Makes More Sense
Web scraping GitHub works best for:
1. Trending Repositories
GitHub’s trending page has no API. Period.
```python
import requests
from bs4 import BeautifulSoup

def get_trending(language="python", since="daily"):
    """Scrape GitHub Trending for a language. `since` is daily/weekly/monthly."""
    url = f"https://github.com/trending/{language}?since={since}"
    resp = requests.get(url, headers={"User-Agent": "Mozilla/5.0"})
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")

    repos = []
    # Selectors match GitHub's markup at the time of writing; they change periodically
    for article in soup.select("article.Box-row"):
        name = " ".join(article.select_one("h2 a").text.split())  # collapse whitespace
        description = article.select_one("p")
        stars = article.select_one(".Link--muted.d-inline-block.mr-3")
        repos.append({
            "name": name,
            "description": description.text.strip() if description else "",
            "stars_today": stars.text.strip() if stars else "0",
        })
    return repos

trending = get_trending("python", "weekly")
for repo in trending[:5]:
    print(f"{repo['name']} — {repo['stars_today']}")
```
2. Bulk Data Collection Without Rate Limits
Scraping doesn’t have a 5,000/hour cap — you’re limited only by request pacing and proxy infrastructure.
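In practice, "request pacing" just means spacing your hits out and not retrying aggressively. A minimal sketch of a paced crawl loop (the two-second delay is my assumption, not a GitHub-published threshold):

```python
import time

import requests

# Example crawl set: trending pages for a few languages
urls = [f"https://github.com/trending/{lang}" for lang in ("python", "rust", "go")]

for url in urls:
    resp = requests.get(url, headers={"User-Agent": "Mozilla/5.0"})
    print(url, resp.status_code)
    time.sleep(2)  # assumed polite delay between page loads; tune to taste
```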
3. Data the API Doesn’t Expose
- Repository traffic insights (normally owner-only)
- Dependency graphs in full
- Community health metrics across many repos
Scaling GitHub Scraping
For anything beyond basic scraping, you need to handle:
- GitHub’s bot detection
- JavaScript-rendered content (some pages use React; see the Playwright sketch after this list)
- Session management
- Respectful rate limiting (don’t hammer their servers)
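If you roll your own, a headless browser covers the rendered pages. A minimal sketch with Playwright (assumes `pip install playwright` followed by `playwright install chromium`; the target URL is just an example):

```python
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    # Wait for client-side rendering to settle before grabbing the HTML
    page.goto("https://github.com/trending", wait_until="networkidle")
    html = page.content()
    browser.close()

print(len(html), "bytes of rendered HTML")
```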
Managed scraping tools handle all of this for you. This GitHub scraper on Apify manages proxy rotation and rendering for bulk data extraction:
```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

# Kick off a scraper run and wait for it to finish
run = client.actor("cryptosignals/github-scraper").call(
    run_input={
        "searchQuery": "machine learning",
        "language": "python",
        "maxRepos": 500,
        "includeReadme": True,
    }
)

# Stream results from the run's default dataset
for repo in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(f"{repo['fullName']} | {repo['stars']} stars")
```
API vs Scraping: Decision Matrix
| Use Case | Best Approach | Why |
|---|---|---|
| Single repo data | API | Fast, structured, within limits |
| CI/CD integration | API | Real-time webhooks available |
| Trending repos | Scraping | No API endpoint exists |
| 1000+ repo analysis | Scraping | API quota exhausted in minutes |
| User profile aggregation | Scraping | Bulk fetching is rate-limited |
| Commit monitoring (few repos) | API | Efficient with conditional requests |
| Cross-platform comparison | Scraping | Need to combine multiple sources |
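A footnote on the commit-monitoring row: with conditional requests, GitHub returns 304 Not Modified when your cached ETag still matches, and per GitHub's docs a 304 doesn't count against the primary rate limit. A minimal sketch (URL and token are placeholders):

```python
import requests

HEADERS = {"Authorization": "token ghp_your_token_here"}
url = "https://api.github.com/repos/psf/requests/commits"

# First fetch: store the ETag GitHub returns
first = requests.get(url, headers=HEADERS)
etag = first.headers["ETag"]

# Later polls: send If-None-Match; a 304 means nothing changed
poll = requests.get(url, headers={**HEADERS, "If-None-Match": etag})
if poll.status_code == 304:
    print("No new commits, and the request was free")
else:
    print(f"{len(poll.json())} commits in the latest page")
```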
Hybrid Approach: Best of Both
The smartest strategy combines both:
```python
def get_repo_data(owner, repo, token):
    """Hybrid fetch. The three helpers are placeholders for your own code."""
    # Use the API for structured data while quota remains
    api_data = fetch_from_api(owner, repo, token)

    # Fall back to scraping once the quota is exhausted
    if api_data.get("rate_limited"):
        return fetch_from_scraper(owner, repo)

    # Enrich with data the API doesn't expose (e.g., trending rank)
    api_data["trending_rank"] = get_trending_rank(owner, repo)
    return api_data
```
The Bottom Line
GitHub’s API is excellent for standard integrations and moderate-scale use. But for data analysis, market research, trend tracking, and bulk operations, the rate limits become a genuine blocker.
Web scraping isn’t a replacement for the API — it’s a complement for the cases where 5,000 requests per hour simply isn’t enough, or where the data you need doesn’t have an API endpoint at all.
For production-grade GitHub data collection at scale, managed scraping solutions save weeks of infrastructure work.
Hit GitHub rate limits on a project? What workaround did you use? Share in the comments.