DEV Community

agenthustler
Best GitHub Scrapers in 2026: Repos, Users & Org Data via Apify

GitHub has a public API — but if you have ever tried to pull structured data at scale, you know the pain: rate limits (60 req/hr unauthenticated), pagination gymnastics, auth token management, and nested GraphQL queries for basic fields.

Scraper actors on Apify handle all of that. You get clean JSON output, built-in retries, and proxy rotation — no API token required.

Here is when that matters:

  • Recruiter sourcing: Pull developer profiles with tech stacks, contribution history, and contact info
  • Competitor analysis: Track repo growth, star velocity, and release cadence across orgs
  • Tech stack research: Search repos by language, topic, or keyword — structured, not grep
  • Contributor analytics: Map who is active in a project, how often, and in what capacity

Comparison: GitHub Scraper Actors on Apify (March 2026)

| Actor | Users | What It Scrapes | Strengths | Limitations |
| --- | --- | --- | --- | --- |
| saswave/github-profile-scraper | 180 | User profiles, followers, social links | Most popular, LinkedIn/Twitter extraction | Profiles only — no repo search |
| sauain/github-stars | 124 | Star counts | Simple, fast | Single metric only |
| saswave/github-search-scraper | 65 | Search results (repos, users, topics) | Broad search capability | Search results, not deep metadata |
| fresh_cliff/github-repository-scraper | 48 | Repo metadata (stars, forks, topics) | Good repo coverage | No user profiles or org data |
| janbuchar/github-list-scraper | 31 | Awesome Lists, topic listings | Curated list scraping | Narrow use case |
| cryptosignals/github-scraper | New | Repos, users, profiles, orgs — 5 modes | All-in-one, 18 fields/item, rate limit handling | Launching April 2026 |

The market is fragmented. Most actors do one thing — profiles OR repos OR stars. None combine repo search, user search, profile details, repo metadata, and org repos in a single actor.


cryptosignals/github-scraper: The All-in-One Option

Launching on Apify Store this week. Five modes in one actor:

| Mode | Input | Output |
| --- | --- | --- |
| search-repos | keyword, language, sort | Repos with stars, forks, language, description, topics, license |
| search-users | keyword, location, sort | User profiles with bio, company, repos count, followers |
| user-profile | username | Full profile: name, bio, company, location, email, blog, social links, stats |
| repo-details | owner/repo | Full metadata: stars, forks, watchers, issues, license, topics, default branch, created/updated dates |
| org-repos | org name | All public repos for an organization with full metadata |

Each item returns up to 18 structured fields. Rate limiting is handled internally — the actor backs off and retries automatically so your runs do not fail mid-scrape.
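Internally, "backs off and retries" usually means exponential backoff. A minimal sketch of the pattern (not the actor's actual code — `RateLimitError` and the delays here are illustrative assumptions):

```python
import time

class RateLimitError(Exception):
    """Illustrative: raised when the upstream API signals too many requests."""

def with_backoff(fetch, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch(); on a rate-limit error, wait and retry with exponential backoff."""
    for attempt in range(max_retries):
        try:
            return fetch()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of retries, surface the error to the caller
            sleep(base_delay * (2 ** attempt))  # wait 1s, 2s, 4s, ...
```

Each retry doubles the wait, which is why a run slows down rather than failing when GitHub throttles it.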

Pricing: Free during launch week. USD 4.99/month from April 3, 2026.


Quick Start: Python Example

from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")

# Run the actor in search-repos mode and wait for it to finish
run = client.actor("cryptosignals/github-scraper").call(run_input={
    "mode": "search-repos",
    "query": "llm agent framework",
    "language": "python",
    "sort": "stars",
    "maxItems": 50
})

# Iterate over the structured results in the run's default dataset
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(f"{item['full_name']} - {item['stars']} stars - {item['description']}")

Ten lines. No pagination logic, no auth tokens, no rate limit handling.


Use Case: Building a Developer Contact List

Say you are sourcing Python developers who contribute to AI/ML projects:

  1. Search repos with mode search-repos, query machine learning, language python, sorted by stars
  2. For each repo, use repo-details to get contributor URLs
  3. Pull user profiles with mode user-profile to get name, company, email, blog, and social links
  4. Export to CSV from the Apify dataset tab — or pipe it into your CRM via webhook

You can chain these runs in a single Apify task schedule. Run it weekly to keep your pipeline fresh.
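The export step can be sketched in plain Python. The field names below follow the user-profile output listed earlier, but the exact keys the actor emits are an assumption:

```python
import csv
import io

# Assumed to match the user-profile mode's output keys
FIELDS = ["name", "company", "email", "blog"]

def profiles_to_csv(profiles):
    """Flatten a list of profile dicts into a CSV string, one row per developer."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    for p in profiles:
        writer.writerow({f: p.get(f, "") for f in FIELDS})
    return buf.getvalue()

csv_text = profiles_to_csv([
    {"name": "Ada", "company": "ACME", "email": "ada@example.com", "blog": ""},
])
```

The same dict-flattening works for a webhook payload into a CRM — swap the CSV writer for a POST body.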


When to Use the GitHub API Directly vs. a Scraper Actor

Use the API directly if:

  • You need real-time webhooks or event streams
  • You are building a GitHub App with OAuth
  • You have a PAT and your volume is under 5,000 requests/hour

Use a scraper actor if:

  • You do not want to manage auth tokens
  • You need to scrape across multiple entities (repos + users + orgs) without writing glue code
  • You want built-in proxy rotation and retry logic
  • You are doing one-off research or batch exports
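If you do go direct, GitHub's REST API reports your remaining quota in response headers, so you can throttle yourself. A minimal stdlib sketch (the endpoint and X-RateLimit-* headers are part of GitHub's documented API; the token is a placeholder):

```python
import urllib.request

def fetch_repo(owner, repo, token):
    """GET /repos/{owner}/{repo} and return (raw JSON bytes, remaining quota)."""
    req = urllib.request.Request(
        f"https://api.github.com/repos/{owner}/{repo}",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read(), remaining_quota(resp.headers)

def remaining_quota(headers):
    """Read how many requests are left from GitHub's rate-limit headers."""
    return int(headers.get("X-RateLimit-Remaining", 0))
```

When `remaining_quota` approaches zero, sleep until the time in `X-RateLimit-Reset` — exactly the bookkeeping a scraper actor does for you.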

Try It This Week

cryptosignals/github-scraper is available on the Apify Store now. Free to run during launch — paid tier starts April 3.

If you are already using one of the other actors listed above, this one consolidates five separate tools into one. Give it a run and see how it works for your use case.
