GitHub hosts over 400 million repositories and 100+ million developers. Whether you're building developer tools, analyzing open-source trends, or recruiting engineers, GitHub data is a goldmine. But the official API's rate limits can be a serious bottleneck.
## GitHub API Rate Limits: The Problem

GitHub's REST API allows:

- 60 requests/hour for unauthenticated requests
- 5,000 requests/hour with a personal access token

The Search API is throttled separately and more tightly: 10 requests/minute unauthenticated, 30 requests/minute with a token.
That sounds generous until you need to scan thousands of repos or profile hundreds of developers. Fetching details for each repository in a 500-repo organization burns 500 requests, a tenth of your authenticated hourly budget, before you've collected anything else.
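You can check where you stand at any time via the `/rate_limit` endpoint (the call itself is not counted against your quota). A minimal sketch that parses the relevant fields out of its JSON response:

```python
import requests

def summarize_limits(payload):
    """Extract (remaining, limit, reset) per resource from the JSON
    returned by GET https://api.github.com/rate_limit."""
    return {
        name: (res["remaining"], res["limit"], res["reset"])
        for name, res in payload["resources"].items()
    }

if __name__ == "__main__":
    resp = requests.get("https://api.github.com/rate_limit")
    for name, (remaining, limit, reset) in sorted(summarize_limits(resp.json()).items()):
        print(f"{name}: {remaining}/{limit} (resets at {reset})")
```

Watching the `core` and `search` buckets separately matters because, as noted above, they are metered independently.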
## Three Approaches to GitHub Data at Scale

### 1. Direct API with Smart Pagination

The most straightforward approach is to use the API directly, but be smart about it:
```python
import requests
import time

TOKEN = "ghp_your_token"
headers = {"Authorization": f"token {TOKEN}"}

def search_repos(query, max_results=100):
    repos = []
    page = 1
    while len(repos) < max_results:
        resp = requests.get(
            "https://api.github.com/search/repositories",
            headers=headers,
            params={"q": query, "per_page": 30, "page": page},
        )
        # Respect rate limits: if we got throttled, sleep until the
        # window resets, then retry the same page
        remaining = int(resp.headers.get("X-RateLimit-Remaining", 0))
        if resp.status_code == 403 and remaining == 0:
            reset = int(resp.headers["X-RateLimit-Reset"])
            time.sleep(max(0, reset - time.time()) + 1)
            continue
        resp.raise_for_status()
        data = resp.json()
        repos.extend(data.get("items", []))
        if len(data.get("items", [])) < 30:
            break  # short page means we've hit the last one
        page += 1
    return repos[:max_results]

# Find popular Python AI repos
results = search_repos("language:python topic:ai stars:>100")
for repo in results:
    print(f"{repo['full_name']}: ⭐ {repo['stargazers_count']}")
```
This works for small-scale needs but falls apart when you need data on thousands of entities.
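One more trick stretches the budget further: conditional requests. Send the `ETag` from a previous response in `If-None-Match`, and an unchanged resource comes back as `304 Not Modified` with no body to re-download; historically GitHub has also not charged 304s against the core limit (check the current docs, as this policy has been revisited). A sketch with a simple in-memory cache:

```python
import requests

class ETagCache:
    """In-memory cache keyed by URL. Conditional GETs with If-None-Match
    let GitHub answer 304 for unchanged resources, so we skip
    re-downloading the body (and, depending on current GitHub policy,
    the request may not count against the rate limit)."""

    def __init__(self):
        self._store = {}  # url -> (etag, parsed_body)

    def headers_for(self, url, base=None):
        headers = dict(base or {})
        if url in self._store:
            headers["If-None-Match"] = self._store[url][0]
        return headers

    def get(self, url, base_headers=None):
        resp = requests.get(url, headers=self.headers_for(url, base_headers))
        if resp.status_code == 304:
            return self._store[url][1]  # unchanged: reuse cached body
        resp.raise_for_status()
        body = resp.json()
        if "ETag" in resp.headers:
            self._store[url] = (resp.headers["ETag"], body)
        return body
```

This pays off most for polling workloads, where the majority of resources haven't changed between runs.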
### 2. Free API Endpoint (No Rate Limits)

I built a free API that proxies GitHub data without the rate limit headaches:

https://frog03-20494.wykr.es/api/v1/github
5 modes available:
| Mode | Endpoint | Description |
|---|---|---|
| search-repos | `?mode=search-repos&q=fastapi` | Search repositories |
| search-users | `?mode=search-users&q=python` | Search users |
| user-profile | `?mode=user-profile&username=torvalds` | Full user profile |
| repo-details | `?mode=repo-details&repo=facebook/react` | Repository details |
| org-repos | `?mode=org-repos&org=microsoft` | Organization repos |
Example usage:

```python
import requests

# Search for FastAPI-related repos
resp = requests.get(
    "https://frog03-20494.wykr.es/api/v1/github",
    params={"mode": "search-repos", "q": "fastapi", "limit": 20}
)
for repo in resp.json()["items"]:
    print(f"{repo['full_name']}: ⭐ {repo['stars']}")
```

Quick CLI usage:

```shell
curl "https://frog03-20494.wykr.es/api/v1/github?mode=user-profile&username=torvalds"
```
No API key needed. No rate limits for reasonable usage.
### 3. Cloud Scraper for Large-Scale Collection
For serious data collection — thousands of repos, bulk user profiles, full org analysis — use our GitHub Scraper on Apify:
```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")

# Scrape all repos from an organization
run = client.actor("cryptosignals/github-scraper").call(
    run_input={
        "mode": "org-repos",
        "organization": "microsoft",
        "includeReadme": True,
        "maxItems": 500
    }
)

for repo in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(f"{repo['name']}: {repo['language']} | ⭐ {repo['stars']}")
```
This runs in the cloud with automatic rate limit handling, pagination, and structured output.
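Once a run finishes, you'll usually want the dataset in a flat file for spreadsheets or downstream tools. A minimal sketch that flattens the item dicts into CSV, assuming the `name`/`language`/`stars` fields shown above:

```python
import csv
import io

def items_to_csv(items, fields=("name", "language", "stars")):
    """Flatten scraped repo dicts into CSV text; missing fields become ''."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(fields), extrasaction="ignore")
    writer.writeheader()
    for item in items:
        writer.writerow({f: item.get(f, "") for f in fields})
    return buf.getvalue()

print(items_to_csv([{"name": "vscode", "language": "TypeScript", "stars": 160000}]))
```

Swap the `fields` tuple to match whatever columns your actual run emits.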
## Practical Use Cases

### Developer Recruiting

Find active contributors in specific technologies:
```python
import requests

# Find top Python developers in Berlin
resp = requests.get(
    "https://frog03-20494.wykr.es/api/v1/github",
    params={
        "mode": "search-users",
        "q": "location:Berlin language:python followers:>50"
    }
)
for user in resp.json()["items"]:
    print(f"{user['login']} - {user['bio']}")
```
### Open Source Trend Analysis
Track which technologies are gaining traction by monitoring repo creation rates, star velocity, and fork patterns.
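Star velocity is the easiest of these to compute: average stars per day since creation. A sketch, assuming the ISO-8601 `created_at` timestamp that GitHub's repo objects carry:

```python
from datetime import datetime, timezone

def star_velocity(stars, created_at, now=None):
    """Average stars per day since the repo was created.
    `created_at` uses GitHub's ISO-8601 format, e.g. '2020-01-01T00:00:00Z'."""
    created = datetime.fromisoformat(created_at.replace("Z", "+00:00"))
    now = now or datetime.now(timezone.utc)
    days = max((now - created).total_seconds() / 86400, 1e-9)
    return stars / days

# A repo that earned 1000 stars in its first 100 days averages 10 stars/day
print(star_velocity(1000, "2024-01-01T00:00:00Z",
                    datetime(2024, 4, 10, tzinfo=timezone.utc)))
```

Comparing velocity across repos of different ages is more telling than raw star counts, which favor old projects.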
### Competitive Intelligence
Monitor competitor engineering activity — what languages they're adopting, what projects they're open-sourcing, and who they're hiring.
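One concrete signal is the language mix across an org's public repos. A sketch that tallies it from a list of repo dicts (the `language` field matches the scraper output above; GitHub reports `None` when it can't detect one):

```python
from collections import Counter

def language_breakdown(repos):
    """Count primary languages across repo dicts, skipping repos
    where GitHub reports no language."""
    return Counter(r["language"] for r in repos if r.get("language"))

repos = [
    {"name": "a", "language": "TypeScript"},
    {"name": "b", "language": "Go"},
    {"name": "c", "language": "TypeScript"},
    {"name": "d", "language": None},
]
print(language_breakdown(repos).most_common())
# → [('TypeScript', 2), ('Go', 1)]
```

Run this over snapshots taken a few months apart and adoption shifts become visible.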
### Dependency Auditing
Map your dependency tree and monitor the health of critical open-source projects your product relies on.
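A simple health signal is recency of activity. This sketch flags a repo as stale from its `pushed_at` timestamp (a real field on GitHub's repo objects); the 180-day threshold is an arbitrary assumption you'd tune per project:

```python
from datetime import datetime, timezone

def is_stale(pushed_at, max_age_days=180, now=None):
    """Flag a dependency repo as stale when its last push is older
    than max_age_days. `pushed_at` is GitHub's ISO-8601 timestamp."""
    pushed = datetime.fromisoformat(pushed_at.replace("Z", "+00:00"))
    now = now or datetime.now(timezone.utc)
    return (now - pushed).days > max_age_days

now = datetime(2025, 1, 1, tzinfo=timezone.utc)
print(is_stale("2024-12-01T00:00:00Z", now=now))  # pushed a month ago → False
print(is_stale("2023-06-01T00:00:00Z", now=now))  # untouched for ~19 months → True
```

Mapping package names to their source repos is the manual part; once you have the owner/repo pairs, the repo-details mode above gives you `pushed_at`, open issue counts, and archive status in one call.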
## Choosing the Right Approach
| Need | Best Option |
|---|---|
| Quick lookups, < 100 requests | GitHub API directly |
| Medium scale, no API key hassle | Free API endpoint |
| Large-scale bulk collection | GitHub Scraper on Apify |
## Conclusion
GitHub data is immensely valuable for developer tools, recruiting, market research, and competitive intelligence. The official API is great but rate-limited. For anything beyond casual use, consider our free API endpoint or the full GitHub Scraper for cloud-scale collection.
All the code examples above work today — try them out and let me know what you build with the data.