How to Spot Fake GitHub Stars Before They Burn You

#github #security #opensource #supplychainattacks

Last month I almost added a dependency to a client project that had 3,400 stars and looked perfectly legit. Nice README, decent docs, recent commits. Then I noticed something weird — the repo had thousands of stars but only two forks and zero open issues.

That's when I fell down the rabbit hole of GitHub's fake star economy. Turns out you can buy stars for as little as a penny each, and the problem is way bigger than most of us realize.

Why This Actually Matters

Let's be honest — we all use star count as a quick trust signal. When you're evaluating two libraries that solve the same problem, the one with 5k stars feels safer than the one with 200. That mental shortcut is exactly what bad actors exploit.

Researchers at Carnegie Mellon University published a study analyzing millions of GitHub accounts and found evidence of millions of suspected fake stars across tens of thousands of repositories. The scarier finding? Many of those repos were linked to malware distribution. Fake stars aren't just vanity metrics — they're an attack vector for supply chain compromise.

GitHub has acknowledged the problem and does purge fake accounts periodically, but it's a cat-and-mouse game. New fake accounts pop up faster than old ones get removed.

The Telltale Patterns of Fake Stars

After digging through research and building some of my own analysis scripts, I've found these patterns are dead giveaways:

Star spikes with no other activity — A repo jumps from 100 to 2,000 stars in a week, but forks, issues, and downloads stay flat. Real popularity creates a rising tide across all metrics.
Empty stargazer profiles — Click through to the profiles of recent stargazers. If most have no avatar, no bio, no repos, and no activity besides starring — that's a red flag.
Bulk account creation dates — Fake accounts are often created in batches. If you see dozens of stargazers whose accounts were all created in the same week, something's off.
Username patterns — Auto-generated usernames often follow patterns like adjective-noun-number or random character strings.
No contributor diversity — A legitimately popular project attracts outside contributors. A fake-starred repo typically has commits from only one or two people.
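The bulk-creation pattern above lends itself to a quick check. What follows is a minimal sketch, not a vetted detector: `creation_week_clusters` is a hypothetical helper that takes the `created_at` strings GitHub returns for user accounts, buckets them by ISO week, and reports any week holding an outsized share of the sample. The 20% share and the minimum of 5 accounts are arbitrary assumptions you would want to tune.

```python
from collections import Counter
from datetime import datetime


def creation_week_clusters(created_at_values, min_share=0.2, min_accounts=5):
    """Return ISO weeks that hold an unusually large share of account creations.

    created_at_values: ISO 8601 strings such as "2024-03-04T12:00:00Z".
    min_share and min_accounts are illustrative thresholds, not calibrated values.
    """
    weeks = Counter()
    for value in created_at_values:
        created = datetime.fromisoformat(value.replace("Z", "+00:00"))
        iso = created.isocalendar()  # (ISO year, ISO week, ISO weekday)
        weeks[f"{iso[0]}-W{iso[1]:02d}"] += 1

    total = sum(weeks.values())
    clusters = {}
    for week, count in weeks.items():
        if count >= min_accounts and count / total >= min_share:
            clusters[week] = count
    return clusters
```

A cluster on its own proves nothing (a conference workshop can produce one too), so treat it as one input alongside the other patterns.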

Step 1: Visualize the Star History

The fastest first check is plotting the star growth over time. A healthy open-source project grows stars gradually, maybe with bumps after conference talks or blog posts. Fake stars show up as vertical cliffs.

You can use the GitHub API to pull stargazer data with timestamps:

import requests
from collections import Counter
from datetime import datetime

def get_star_history(owner, repo, token):
    """Fetch stargazer timestamps using the GitHub API."""
    headers = {
        "Authorization": f"token {token}",
        # This header is required to get the starred_at timestamp
        "Accept": "application/vnd.github.v3.star+json"
    }

    stars_by_month = Counter()
    page = 1

    while True:
        url = f"https://api.github.com/repos/{owner}/{repo}/stargazers"
        resp = requests.get(url, headers=headers, params={"page": page, "per_page": 100})

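        if resp.status_code != 200:
            # Rate limits, bad tokens, and pagination caps all land here;
            # report them so a truncated history is not mistaken for a complete one.
            print(f"Stopped at page {page}: HTTP {resp.status_code}")
            break
        if not resp.json():
            break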

        for entry in resp.json():
            date = datetime.fromisoformat(entry["starred_at"].replace("Z", "+00:00"))
            stars_by_month[date.strftime("%Y-%m")] += 1

        page += 1

    return dict(sorted(stars_by_month.items()))

If one month's star count jumps to 10x the normal rate and there's no launch, conference talk, or viral post to explain it, treat that as a strong signal and dig deeper.
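To make the "10x the normal rate" check concrete, here is a small sketch that works on the month-to-count dictionary returned by `get_star_history`. The helper name `find_star_spikes` and the use of the median as the "normal rate" are my assumptions; the median is chosen so that one huge month does not inflate its own baseline.

```python
from statistics import median


def find_star_spikes(stars_by_month, factor=10):
    """Return months whose star count is at least `factor` times the median month.

    stars_by_month: mapping like {"2024-01": 40, "2024-02": 35, ...}.
    Months with zero stars never appear in the mapping, so the baseline
    is computed only over months that received at least one star.
    """
    if len(stars_by_month) < 3:
        return {}  # Too little history to define a normal rate

    baseline = median(stars_by_month.values())
    if baseline == 0:
        return {}

    return {
        month: count
        for month, count in stars_by_month.items()
        if count >= factor * baseline
    }
```

For example, a history of `{"2023-10": 30, "2023-11": 42, "2023-12": 38, "2024-01": 1900, "2024-02": 45}` has a median of 42, so only `"2024-01"` is flagged.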

Step 2: Profile the Stargazers

Once you've identified a suspicious spike, dig into who's doing the starring. Here's a quick script to score stargazer accounts:

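# Reuses the requests and datetime imports from Step 1
def analyze_stargazers(owner, repo, token, page=1):
    """Score one page of stargazer accounts for authenticity signals.

    The API appears to return stargazers oldest-first, so page 1 holds the
    earliest stars; pass a later page to sample the accounts behind a spike.
    """
    headers = {
        "Authorization": f"token {token}",
        "Accept": "application/vnd.github.v3.star+json"
    }

    suspicious_count = 0
    total_checked = 0

    url = f"https://api.github.com/repos/{owner}/{repo}/stargazers"
    resp = requests.get(url, headers=headers, params={"page": page, "per_page": 100})
    resp.raise_for_status()  # Fail loudly on rate limits or bad tokens

    for entry in resp.json():
        user_url = entry["user"]["url"]
        user_resp = requests.get(user_url, headers=headers)
        if user_resp.status_code != 200:
            continue  # Skip deleted or otherwise unavailable accounts
        user_data = user_resp.json()

        red_flags = 0

        if user_data.get("public_repos", 0) == 0:
            red_flags += 1  # No repos at all
        if not user_data.get("bio"):
            red_flags += 1  # Empty profile
        if user_data.get("followers", 0) == 0:
            red_flags += 1  # Zero followers
        # avatar_url is populated even for default avatars, so it cannot
        # reveal a missing picture. Account age at star time is used as the
        # fourth signal instead (the 30-day cutoff is an arbitrary choice).
        created = datetime.fromisoformat(user_data["created_at"].replace("Z", "+00:00"))
        starred = datetime.fromisoformat(entry["starred_at"].replace("Z", "+00:00"))
        if (starred - created).days < 30:
            red_flags += 1

        if red_flags >= 3:
            suspicious_count += 1
        total_checked += 1

    return {
        "total_checked": total_checked,
        "suspicious": suspicious_count,
        "suspicious_pct": round(suspicious_count / max(total_checked, 1) * 100, 1)
    }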

If more than 30-40% of recent stargazers look suspicious, I'd seriously question that repo's legitimacy.

Step 3: Use Existing Tools

You don't have to build everything from scratch. Dagster Labs published an open-source project called fake-star-detector (dagster-io/fake-star-detector on GitHub) that automates this kind of stargazer analysis. It pulls stargazer data and flags suspicious profiles, so you can spot anomalies without writing custom scripts.

There are also star history visualization sites that chart growth curves. These won't tell you definitively whether stars are fake, but an unnatural growth curve is immediately obvious when you see it plotted.

Step 4: Look Beyond Stars Entirely

Honestly, the best defense is to stop treating stars as a primary trust signal. Here's my actual checklist when evaluating a dependency now:

Commit frequency and recency — Is the project actually maintained?
Issue and PR activity — Real users file real bugs. A project with thousands of stars and zero issues is suspicious.
Fork-to-star ratio — Legitimate projects typically have forks in the range of 10-30% of their star count. A ratio under 1% is a warning sign.
Contributor count — Check git shortlog -sn. A healthy project has multiple contributors.
Download stats — npm, PyPI, and other registries show actual download numbers. These are harder to fake than GitHub stars.
Who uses it — Check the "Used by" badge on GitHub, or search for the package in other repos' dependency files.

# Quick health check for any GitHub repo
gh api repos/OWNER/REPO --jq '{
  stars: .stargazers_count,
  forks: .forks_count,
  open_issues: .open_issues_count,
  fork_ratio: (if .stargazers_count > 0 then (.forks_count / .stargazers_count * 1000 | round) / 10 else null end),
  last_push: .pushed_at,
  subscribers: .subscribers_count
}'

That fork_ratio field is surprisingly telling. I've started running this on every new dependency before I even look at the README.
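If you would rather run the same check from Python, the sketch below applies two items from the checklist to repository metadata. It assumes a dict with the fields returned by `GET /repos/{owner}/{repo}`; the function name `repo_warning_signs` and the 1,000-star cutoff for "thousands of stars" are my own illustrative choices.

```python
def repo_warning_signs(repo):
    """Flag checklist items that look off in GitHub repository metadata.

    repo: dict with stargazers_count, forks_count, and open_issues_count,
    as returned by GET /repos/{owner}/{repo}.
    """
    stars = repo.get("stargazers_count", 0)
    forks = repo.get("forks_count", 0)
    open_issues = repo.get("open_issues_count", 0)
    warnings = []

    if stars == 0:
        return warnings  # Nothing to compare against

    fork_pct = forks / stars * 100
    if fork_pct < 1:
        warnings.append(f"fork-to-star ratio is {fork_pct:.2f}% (under 1%)")

    # "Thousands of stars and zero issues"; 1000 is an assumed cutoff
    if stars >= 1000 and open_issues == 0:
        warnings.append("1000+ stars but no open issues")

    return warnings
```

Run against the repo from the intro (3,400 stars, two forks, zero open issues), this returns both warnings. Keep in mind that `open_issues_count` also counts open pull requests and says nothing about closed issues, so a zero here is a prompt to look at the issue tracker, not a verdict.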

Prevention for Project Maintainers

If you maintain an open-source project and you're worried about competitors buying fake stars to outrank you, there's not much you can do directly. But you can:

Document real adoption — List companies using your project, link to case studies, show download stats.
Engage your community visibly — Active discussions, regular releases, and responsive issue triage signal legitimacy far better than a star count.
Report suspicious repos — If you find a repo using fake stars (especially if it's distributing malware), report it to GitHub through their abuse reporting form.

The Bigger Picture

The fake star problem is really a symptom of a deeper issue: we don't have great tools for evaluating software trust. Star counts became a proxy for quality because we didn't have anything better.

Projects like OpenSSF Scorecard are trying to solve this by computing a security score based on actual development practices — branch protection, code review, CI/CD, dependency pinning. That's the direction we need to move.

Until then, treat GitHub stars like online reviews. They're a useful signal, but if the numbers look too good to be true, they probably are. Trust the boring metrics: commit history, contributor diversity, and actual usage numbers tell you way more than a star count ever will.