DEV Community

Found a Coordinated GitHub Follow Botnet Hiding in My Followers?

GnomeMan4201 on May 19, 2026

I've been building a personal analytics stack for my GitHub and DEV.to presence — traffic reports, bot audits, the works. While auditing my 97 GitH...


Mykola Kondratiuk • May 25

the obvious tell is the 29k following count, but the ratio is what makes it interesting - following 30k with near-zero followers back is pure spam signal. did the account creation years actually cluster or spread across the 8?

GnomeMan4201 • May 25

Spread across six years: 2015, 2017, three from 2018, then 2019, 2020, 2021. That's kind of the whole point of the evasion. Fresh accounts all born the same week are trivially flagged. A cluster spread across six creation years with 0.99+ Jaccard overlap on a shared seed list is the operator doing their job.
The following ratio is a tell yeah, but on its own it's not conclusive — there have been legitimate mass-follow waves on platforms before. What the year spread adds is that you can't explain this as one campaign that spun up a batch of accounts. Someone was either warehousing these over years or seeding them from the same list at different points in time. Either way that implies an operator who's been running this for a while, not a one-time thing.
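For anyone unfamiliar with the metric, here is a minimal sketch of Jaccard similarity on two following lists. The account sets and logins are made-up toy data, not the real lists from the post, and this is not necessarily how the original script computes it.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B|; returns 0.0 if both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical toy following lists; the real ones would hold roughly 30k logins each.
account_a = {"user1", "user2", "user3", "user4"}
account_b = {"user1", "user2", "user3", "user5"}

# 3 shared logins out of 5 distinct logins overall.
similarity = jaccard(account_a, account_b)
```

A score of 0.99+ on lists that large means the two accounts follow almost exactly the same people, which is hard to explain without a shared seed list.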

Mykola Kondratiuk • May 25

yeah, six years fools most freshness heuristics. seed coherence at 0.99+ needs actual graph tooling to surface - standard spam reports miss it

GnomeMan4201 • May 26

graph-based clustering is probably the only reliable way to surface networks like this at scale

Mykola Kondratiuk • May 29

agreed - and the scale argument cuts both ways: it's the only tool that works, but it's also the one adversaries know to hide from.

S M Tahosin • May 24

Fascinating investigation! It's alarming how easily these coordinated networks fly under the radar by mimicking organic behavior. Using follower overlap analysis was a brilliant approach to catch what simple cross-follow detection missed. Definitely going to audit my own followers now. Great read!

GnomeMan4201 • May 24

Thanks. The follower overlap angle was honestly the one I almost didn't bother running. The cross-follow matrix looked so clean I almost closed the script. Glad I didn't.

The 'mimicking organic behavior' framing is exactly right. Aged accounts + no cross-following + high following counts hits every naive heuristic in the 'not a bot' direction. The shared seed list is the only place the coordination bleeds through, which is why most automated scanners miss it.

If you do run the audit on your own followers, check following counts first — any tight cluster sharing a ceiling is worth pulling the full lists on. The API costs are non-trivial at scale (~300 calls per account), but that's where the signal lives. Would be curious what you find.
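A rough sketch of where the ~300 calls figure comes from, assuming the following list is paged at 100 entries per request (the maximum page size for GitHub REST list endpoints) and roughly 30k entries per account. The function name is just for illustration.

```python
import math

PER_PAGE = 100  # assumed page size: the largest the GitHub REST API allows


def pages_needed(following_count: int, per_page: int = PER_PAGE) -> int:
    """Number of paginated requests needed to pull one full following list."""
    return math.ceil(following_count / per_page)


# Roughly 30k following entries per account, as in the cluster discussed here.
calls_per_account = pages_needed(30_000)

# Pulling full lists for an 8-account cluster.
total_calls = calls_per_account * 8
```

Running the estimate before pulling anything helps decide how many candidates you can afford to expand within your rate limit.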

Rahul S • May 20

Solid work. The Jaccard overlap proves the shared seed list, but if you can pull follow-event timestamps via GH Archive, the execution cadence between accounts would fingerprint the automation pipeline itself — an operator can shuffle the seed list to drop Jaccard below threshold, but randomizing inter-follow timing without killing throughput is way harder. Also curious whether the ~29,682 common targets share repo-starring patterns, since the same operator probably isn't running follow bots in isolation.

GnomeMan4201 • May 20

Good call on the timing fingerprint angle. Seed shuffling can tank Jaccard similarity, but you can't easily randomize inter-follow cadence without either throttling throughput or introducing detectable burstiness. GH Archive would let you reconstruct the event stream and look for that signature directly.

The repo-starring angle is the one I want to pull on next. If the same accounts are starring a consistent set of repos, that's either another shared seed list or active coordination around specific targets. Either way it's a stronger signal than follows alone, since starring requires more intentional action. Follow graphs are cheap to fake; star graphs are slightly less so.

leob • May 20

But what would be the "intention" behind it - would there be something malicious within the repos of these "users"? (of course you can't know till you look in detail)

(well I think we've been warned often enough recently to not just clone/install repos from users you don't know and don't fully trust ...)

GnomeMan4201 • May 20

Honestly can't confirm intent from the data I have. The analysis only covers follow graph structure, not what the accounts are actually pushing. But the most plausible theory is social proof laundering: inflate follower counts on otherwise thin accounts, then use that apparent legitimacy to get people to clone or install something. GitHub followers are a trust signal a lot of people don't scrutinize.

The whole reason I went down this rabbit hole is I kept noticing the same pattern: any time a post crosses a certain view threshold, a wave of these accounts shows up in my followers within hours. Got annoying enough that I started documenting it instead of just ignoring it.

Your instinct is right. Follower count on GitHub should carry basically zero weight when deciding whether to run someone's code.

leob • May 20

Very convincing, both the pattern and your analysis of it ... intriguing that people go to these lengths to set this all up, but the intent is probably what you're saying (and what I was hinting at) ... kind of fascinating for sure!

GnomeMan4201 • May 20

Fully agree. The strongest finding is the coordination fingerprint itself. The intent is harder to prove directly, but once you see eight aged accounts tracking an almost identical external population, it becomes difficult to view the behavior as random.

Rasmus Ros • May 21

Following-list overlap is the signal here. It scales badly because the comparisons go quadratic, so GitHub has to rely on reports.

GnomeMan4201 • May 21

Exactly right on the scaling problem. Pairwise Jaccard on the raw account population is O(n²) in comparisons and O(n·k) in API cost, where k is list depth, so GitHub can't run it exhaustively. What makes it tractable here is the two-stage approach: behavioral heuristics (following count ceiling, age spread) cut the candidate pool to a small cluster first, then Jaccard runs on that. At 8 accounts you're doing 28 comparisons, not millions. The real wall isn't the math, it's the API rate limits on pulling ~30k following entries per account. Your point still stands at platform scale though: this only works as an analyst tool or post-hoc investigation, not a real-time prevention layer, which is probably why GitHub leans on reports.
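A minimal sketch of that two-stage idea, assuming following counts and following lists are already fetched into dicts. Only the following-count band is implemented as the stage-one filter (the age-spread check is left out), and every login, threshold, and function name here is hypothetical.

```python
import math
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B|; returns 0.0 if both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def candidate_pool(following_counts: dict[str, int], floor: int, ceiling: int) -> list[str]:
    """Stage 1: keep only accounts whose following count sits in a narrow band."""
    return sorted(login for login, n in following_counts.items() if floor <= n <= ceiling)


def pairwise_jaccard(
    following: dict[str, set[str]], candidates: list[str]
) -> dict[tuple[str, str], float]:
    """Stage 2: Jaccard for every unordered pair in the reduced pool."""
    return {
        (a, b): jaccard(following[a], following[b])
        for a, b in combinations(candidates, 2)
    }


# Hypothetical follower audit: three accounts near a shared ceiling, one ordinary account.
counts = {"acct1": 29_700, "acct2": 29_650, "acct3": 120, "acct4": 29_690}
pool = candidate_pool(counts, floor=29_000, ceiling=30_000)

# Toy following lists for the pool only; the filtered-out account is never fetched.
following = {
    "acct1": {"a", "b", "c", "d"},
    "acct2": {"a", "b", "c", "d"},
    "acct4": {"a", "b", "c", "e"},
}
scores = pairwise_jaccard(following, pool)

# Pair count grows as n choose 2, which is the 28 comparisons for 8 accounts.
pairs_for_eight = math.comb(8, 2)
```

The point of the ordering is that the cheap filter runs on data you already have from the follower list, so the expensive full-list pulls only happen for the handful of accounts that survive it.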

Rasmus Ros • May 21

Good writeup anyway. Keeping GitHub a little less bot-ridden is work worth doing.

xulingfeng • May 29

Nice catch on the botnet pattern — the creation date clustering and narrow following count range are dead giveaways. We built a similar detection script for Dev.to after noticing suspicious follower patterns.

One thing I'd add: check if the accounts share commit activity on the same repos. Bot accounts often star/fork the same projects. Curious if you found any repo-level correlations in your dataset?

sagar shirsat • May 24

could you please explain the motive behind hiding it?

GnomeMan4201 • May 24

Great question. The short answer is that follower counts function as a credibility signal, so hiding in a follower list is the point, not a side effect.

If an account suddenly has 3,000 followers, casual observers assume it's reputable. Algorithms may surface it more. New readers are more likely to trust the content. The accounts don't need to be visible individually; they just need to inflate the number.

The operators keep the accounts dormant specifically so they don't draw attention. A follower that never comments, never posts, and never interacts is harder to manually flag than one that spams. The hiding is the product.