GitHub has a public API — but if you have ever tried to pull structured data at scale, you know the pain: rate limits (60 req/hr unauthenticated), pagination gymnastics, auth token management, and nested GraphQL queries for basic fields.
Scraper actors on Apify handle all of that. You get clean JSON output, built-in retries, and proxy rotation — no API token required.
Here is when that matters:
- Recruiter sourcing: Pull developer profiles with tech stacks, contribution history, and contact info
- Competitor analysis: Track repo growth, star velocity, and release cadence across orgs
- Tech stack research: Search repos by language, topic, or keyword — structured, not grep
- Contributor analytics: Map who is active in a project, how often, and in what capacity
## Comparison: GitHub Scraper Actors on Apify (March 2026)
| Actor | Users | What It Scrapes | Strengths | Limitations |
|---|---|---|---|---|
| saswave/github-profile-scraper | 180 | User profiles, followers, social links | Most popular, LinkedIn/Twitter extraction | Profiles only — no repo search |
| sauain/github-stars | 124 | Star counts | Simple, fast | Single metric only |
| saswave/github-search-scraper | 65 | Search results (repos, users, topics) | Broad search capability | Search results, not deep metadata |
| fresh_cliff/github-repository-scraper | 48 | Repo metadata (stars, forks, topics) | Good repo coverage | No user profiles or org data |
| janbuchar/github-list-scraper | 31 | Awesome Lists, topic listings | Curated list scraping | Narrow use case |
| cryptosignals/github-scraper | New | Repos, users, profiles, orgs — 5 modes | All-in-one, 18 fields/item, rate limit handling | Launching April 2026 |
The market is fragmented. Most actors do one thing — profiles OR repos OR stars. None combine repo search, user search, profile details, repo metadata, and org repos in a single actor.
## cryptosignals/github-scraper: The All-in-One Option
Launching on Apify Store this week. Five modes in one actor:
| Mode | Input | Output |
|---|---|---|
| search-repos | keyword, language, sort | Repos with stars, forks, language, description, topics, license |
| search-users | keyword, location, sort | User profiles with bio, company, repos count, followers |
| user-profile | username | Full profile: name, bio, company, location, email, blog, social links, stats |
| repo-details | owner/repo | Full metadata: stars, forks, watchers, issues, license, topics, default branch, created/updated dates |
| org-repos | org name | All public repos for an organization with full metadata |
Each item returns up to 18 structured fields. Rate limiting is handled internally — the actor backs off and retries automatically so your runs do not fail mid-scrape.
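The actor's internal retry logic is not published, but the pattern it describes is standard exponential backoff with jitter. A minimal sketch of that pattern (the `fetch_with_backoff` helper here is illustrative, not part of the actor's API):

```python
import random
import time


def fetch_with_backoff(fetch, max_retries=5, base_delay=1.0):
    """Retry a callable with exponential backoff plus jitter.

    `fetch` should raise an exception when it hits a rate limit;
    any other return value is passed through unchanged.
    """
    for attempt in range(max_retries):
        try:
            return fetch()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries; surface the error
            # Wait base_delay * 2^attempt, plus a little jitter so
            # parallel workers don't all retry at the same instant.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            time.sleep(delay)
```

Because the actor does this for you, a run that briefly trips GitHub's limits slows down instead of failing.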
Pricing: Free during launch week. USD 4.99/month from April 3, 2026.
## Quick Start: Python Example
```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")

run = client.actor("cryptosignals/github-scraper").call(run_input={
    "mode": "search-repos",
    "query": "llm agent framework",
    "language": "python",
    "sort": "stars",
    "maxItems": 50
})

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(f"{item['full_name']} - {item['stars']} stars - {item['description']}")
```
About a dozen lines. No pagination logic, no auth tokens, no rate limit handling.
## Use Case: Building a Developer Contact List
Say you are sourcing Python developers who contribute to AI/ML projects:
- Search repos with mode search-repos, query machine learning, language python, sorted by stars
- For each repo, use repo-details to get contributor URLs
- Pull user profiles with mode user-profile to get name, company, email, blog, and social links
- Export to CSV from the Apify dataset tab — or pipe it into your CRM via webhook
You can chain these runs in a single Apify task schedule. Run it weekly to keep your pipeline fresh.
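The chaining step can be sketched as a small pipeline function. This is a sketch under assumptions: `run_actor` is a hypothetical wrapper you would write around `ApifyClient` that runs the actor once and returns its dataset items, and the field names (`full_name`, owner-based profile lookup) follow the quick-start example above.

```python
def chain_sourcing_pipeline(run_actor, query="machine learning",
                            language="python", max_repos=20):
    """Chain two actor modes: search repos, then pull each owner's profile.

    `run_actor(run_input)` executes one actor run and returns its dataset
    items as a list of dicts (hypothetical wrapper around ApifyClient).
    """
    repos = run_actor({
        "mode": "search-repos",
        "query": query,
        "language": language,
        "sort": "stars",
        "maxItems": max_repos,
    })
    # Deduplicate repo owners while preserving order, so each
    # profile is fetched exactly once.
    owners = []
    for repo in repos:
        owner = repo["full_name"].split("/")[0]
        if owner not in owners:
            owners.append(owner)
    profiles = []
    for owner in owners:
        profiles.extend(run_actor({"mode": "user-profile", "username": owner}))
    return profiles
```

On Apify, the same chaining can be done without code by wiring two tasks together; the sketch just makes the data flow explicit.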
## When to Use the GitHub API Directly vs. a Scraper Actor
Use the API directly if:
- You need real-time webhooks or event streams
- You are building a GitHub App with OAuth
- You have a PAT and your volume is under 5,000 requests/hour
Use a scraper actor if:
- You do not want to manage auth tokens
- You need to scrape across multiple entities (repos + users + orgs) without writing glue code
- You want built-in proxy rotation and retry logic
- You are doing one-off research or batch exports
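For contrast, here is roughly what the direct-API path looks like before you even make a request. This sketch only builds the paginated request specs for GitHub's REST `/search/repositories` endpoint; note that even with a PAT, the search endpoints have their own 30-requests-per-minute cap on top of the 5,000/hour core limit.

```python
from urllib.parse import urlencode

GITHUB_API = "https://api.github.com"


def build_search_requests(query, language, token, pages=3, per_page=100):
    """Build (url, headers) pairs for a paginated GitHub repo search.

    You still have to send these yourself, check rate-limit headers,
    and stop when a page comes back empty.
    """
    headers = {
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {token}",
    }
    specs = []
    for page in range(1, pages + 1):
        params = urlencode({
            "q": f"{query} language:{language}",
            "sort": "stars",
            "per_page": per_page,
            "page": page,
        })
        specs.append((f"{GITHUB_API}/search/repositories?{params}", headers))
    return specs
```

Everything above — token handling, pagination, retry on 403/429 — is the glue code a scraper actor absorbs for you.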
## Try It This Week
cryptosignals/github-scraper is available on the Apify Store now. Free to run during launch — paid tier starts April 3.
If you are already using one of the other actors listed above, this one consolidates five separate tools into one. Give it a run and see how it works for your use case.