GnomeMan4201

Found a Second Layer to a GitHub Follow Botnet?

Beyond Follow Clusters: Second-Order Similarity Patterns in a GitHub Bot Network

This is Part 2 of an ongoing investigation. Part 1 documented the initial discovery: 8 accounts with Jaccard following-list similarity of 0.99+ across ~29,800 entries each, evading cross-follow detection entirely.

After Part 1 published, I kept pulling the data.

Subsequent analysis expanded the cluster to 9 accounts, recovered infrastructure linkage to a specific GitHub identity, and mapped the generation pipeline responsible for all 552 repositories across the cluster. The pipeline left recoverable artifacts in every repository it produced.

Following that same pipeline fingerprint led to an earlier operation — running nine months before the follow botnet was provisioned. The same GitHub identity appears in both. So does the same generator artifact. Four accounts documented in Part 1 appear in both operations.

This post documents what the data shows. Inference is labeled as inference throughout. I am not establishing intent or ownership beyond what the API evidence directly supports.


Methodology

All findings in this report are derived from:

  • GitHub REST API v3 responses (public endpoints, authenticated requests)
  • Raw git commit metadata
  • Repository file retrieval via raw content endpoints
  • WHOIS and DNS records
  • Graph overlap analysis (Jaccard similarity, set intersection, cosine similarity of similarity vectors)

No private repositories, leaked credentials, or non-public systems were accessed. Every finding in the confirmed findings table is reproducible from public API endpoints with a valid GitHub token.

This analysis is limited to publicly available GitHub API data and does not include private network signals, rate-limited endpoints, or non-indexed interactions. Findings reflect the state of public data at time of retrieval.


Epistemic Boundaries

These boundaries apply to the entire report and are stated once here rather than repeated inline.

What this analysis can establish:

  • Observable properties of public GitHub accounts, repositories, and commit metadata
  • Statistical deviation from a naive independent-uniform baseline model
  • Structural matches between artifacts across two time-separated operations
  • Presence of the same authenticated GitHub ID across multiple contexts

What this analysis cannot establish:

  • Who controls any of the accounts documented
  • Whether hajigur69 is an operator, collaborator, or an identity whose credentials were reused
  • Intent or downstream use of the documented infrastructure
  • Whether lynewinter's pairwise similarity to the non-mariwatts accounts meets cluster inclusion criteria — that data was not retrieved
  • Whether the fallback_ label carries the semantic meaning its name implies in the generating tool's context

Inference is labeled as inference when it appears. The confirmed findings table at the end of this report lists only directly observable, API-verifiable data.


Baseline: What Random Accounts Would Look Like

Before presenting the cluster data, it is worth establishing what Jaccard similarity looks like under independent sampling — the null hypothesis.

GitHub has approximately 100 million user accounts. Each cluster account follows ~29,800. Under a naive independent-uniform model — where two accounts select their follows independently and uniformly at random from the full user population:

E[|A ∩ B|] = k² / N = (29,800)² / 100,000,000 ≈ 8.88 accounts
E[Jaccard]  = 8.88 / (2×29,800 − 8.88)         ≈ 0.000149

The 3-sigma upper bound under this model is approximately 0.000299.

The observed cluster minimum is 0.9898, roughly 6,642× the expected Jaccard value under the uniform independence model.

This model makes simplifying assumptions that do not hold on GitHub: following behavior is not uniformly distributed, popular accounts attract disproportionate follows, and community clustering means real accounts share partial follow overlap above the uniform baseline. A realistic null model would produce a higher baseline than 0.000149. The observed values would still exceed it by orders of magnitude, but the precise ratio is model-dependent and should not be read as a formal hypothesis-test result. It is presented as a reference point against the simplest baseline, not as a statistically calibrated rejection threshold.
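The baseline figures above can be reproduced with a few lines of arithmetic. This is a minimal sketch, not the original tooling. It assumes the overlap count is approximately Poisson-distributed, so its standard deviation is the square root of its mean. The post does not say how the 3-sigma bound was derived, so that choice is an assumption that happens to reproduce the quoted value.

```python
import math

N = 100_000_000   # approximate GitHub user population (figure from the text)
k = 29_800        # approximate following-list size per cluster account

# Expected overlap of two independent, uniformly sampled follow sets.
expected_overlap = k * k / N


def jaccard_from_overlap(overlap: float, size: int = k) -> float:
    """Jaccard index of two equal-size sets with the given intersection size."""
    return overlap / (2 * size - overlap)


j_mean = jaccard_from_overlap(expected_overlap)

# Assumption: the overlap count is roughly Poisson, so sigma = sqrt(mean).
sigma = math.sqrt(expected_overlap)
j_3sigma = jaccard_from_overlap(expected_overlap + 3 * sigma)

observed_min = 0.9898
ratio = observed_min / j_mean

print(f"E[overlap] = {expected_overlap:.2f}")
print(f"E[Jaccard] = {j_mean:.6f}")
print(f"3-sigma    = {j_3sigma:.6f}")
print(f"ratio      = {ratio:,.0f}x")
```

Changing N or k shows how sensitive the ratio is to the baseline assumptions, which is why it is a reference point rather than a test statistic.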

Every similarity value in the cluster below sits against this reference.


The Cluster Expanded

Running Jaccard similarity analysis against the original 8 accounts and their extended follower graphs surfaced a ninth account: lynewinter.

lynewinter  ↔  mariwatts   jaccard=0.9898   shared≈29,200

The methodology is identical to Part 1. A coefficient of 0.9898 across ~29,800 following entries places this pair within the same anomalous range as the original cluster. Against the null model baseline of 0.000149, this value is statistically incompatible with independent account behavior.
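For readers who want to check a pair themselves, the coefficient is plain set arithmetic over two following lists. The sketch below uses toy integer IDs as stand-ins; the variable names and data are hypothetical and are not the accounts' real lists.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard index |A & B| / |A | B|; defined as 0.0 when both sets are empty."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


# Hypothetical toy following lists (user IDs), not real data.
following_x = set(range(0, 1000))
following_y = set(range(5, 1005))

shared = len(following_x & following_y)
score = jaccard(following_x, following_y)
print(f"shared={shared} jaccard={score:.4f}")
```

In practice each set would be built from the paginated following endpoint for one account, keyed on numeric user ID so that renamed accounts still match.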

The confirmed cluster is now 9 accounts:

canestein, hazexone, domcomit, kylehyne, jaderytm,
vierystein, hanyvert, mariwatts, lynewinter (partial coverage — 1 confirmed pairwise value)

Similarity Structure of the Full Cluster

The complete pairwise Jaccard matrix for all 9 accounts, computed from following-list intersection over union:

Pairwise Jaccard similarity heatmap for the 9-account cluster. All confirmed values exceed 0.98. Peripheral nodes hazexone and lynewinter highlighted in orange. Matrix includes only confirmed pairwise edges; lynewinter cells against non-mariwatts accounts are unverified and excluded from full-density computation.

Per-account mean first-order similarity within the cluster:

jaderytm      mean=0.9970   min=0.9908   max=0.9998
mariwatts     mean=0.9968   min=0.9898   max=0.9998
kylehyne      mean=0.9970   min=0.9907   max=0.9998
domcomit      mean=0.9967   min=0.9912   max=0.9997
hanyvert      mean=0.9966   min=0.9912   max=0.9997
canestein     mean=0.9969   min=0.9907   max=0.9996
vierystein    mean=0.9962   min=0.9909   max=0.9985
hazexone      mean=0.9912   min=0.9907   max=0.9925  ← peripheral
lynewinter    mean=0.9910   min=0.9898   max=0.9912  ← peripheral

hazexone and lynewinter are the structural outliers of the cluster. Their mean within-cluster similarity (0.9912 and 0.9910 respectively) sits approximately 0.006 below the core group mean of ~0.9969. Both still exceed 0.98 on every confirmed pairwise comparison. One interpretation consistent with, but not uniquely explained by, the data: they were provisioned from the same seed list but at a different time or via a slightly diverged list version. The position in the similarity distribution is observation; the generative explanation is inference.


Second-Order Structure

The "second layer" referenced in the investigation title refers to structure that emerges when comparing the similarity profiles of accounts, not just their direct following overlap.

Each account can be represented as a vector of its Jaccard similarities to all other cluster members. Computing cosine similarity between these vectors yields a second-order metric: how structurally equivalent two accounts are within the similarity graph.

For this cluster, all pairwise cosine similarities between Jaccard vectors compute to >0.9999 at four decimal places.

Important limitation: this result is partly a consequence of low variance across the input vectors. When all Jaccard values in a cluster fall within the range 0.98–0.9998, the similarity vectors are themselves numerically similar regardless of structural origin — cosine saturation at this scale is expected even for accounts that share only approximate overlap. The result does not independently establish shared generation; it is consistent with it, but a high-similarity cluster will produce this outcome under multiple generative models.

What the second-order metric does contribute: the inter-account variance in similarity profiles is extremely low across all 9 accounts. In organic follower networks, accounts accumulate following behavior across different communities over time and typically show differentiated structural positions — some accounts are more similar to high-degree hubs, others to peripheral clusters. The near-zero variance here is consistent with accounts whose following lists were seeded from the same source, but that interpretation is not uniquely supported by this metric alone.

A proper second-order baseline would require computing cosine similarity distributions across random high-degree graph samples of comparable size and density. That computation is not performed here. The metric is presented as a structural observation, not a statistically calibrated discriminator.
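The saturation effect described above is easy to demonstrate. In this sketch the two narrow-band vectors are hypothetical similarity profiles whose entries all sit in the 0.98 to 0.9998 range. They are illustrative values, not rows from the actual matrix. The contrast pair shows what differentiated profiles look like under the same metric.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


# Hypothetical profiles: one "core-like" and one "peripheral-like" row,
# every entry inside the narrow band described in the text.
core_like = [0.9998, 0.9997, 0.9996, 0.9908, 0.9985, 0.9970, 0.9907]
peripheral_like = [0.9912, 0.9925, 0.9907, 0.9910, 0.9909, 0.9911, 0.9908]

# Contrast case: profiles that genuinely differ in shape.
differentiated_a = [0.9, 0.1, 0.5]
differentiated_b = [0.1, 0.9, 0.5]

saturated = cosine(core_like, peripheral_like)
contrast = cosine(differentiated_a, differentiated_b)
print(f"narrow-band profiles:    {saturated:.6f}")
print(f"differentiated profiles: {contrast:.4f}")
```

Even though the two narrow-band rows differ in every entry, their cosine still rounds to 1.0000 at four decimal places, which is the limitation noted above.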


Sanity Check: What Doesn't Fit Cleanly

lynewinter is worth examining as a boundary case.

It was added to the cluster based on a single confirmed pairwise value: Jaccard 0.9898 against mariwatts. Its pairwise values against the remaining 7 accounts were not directly retrieved and are not included in the matrix above.

What this means: lynewinter's cluster membership rests on one confirmed measurement. It satisfies the inclusion threshold. It does not have the full evidentiary support of the core 7 accounts, which have complete pairwise matrices. The heatmap reflects only confirmed values for lynewinter; cells against the 7 non-mariwatts accounts should be treated as unverified.

If lynewinter were removed from the cluster, the core 8-account finding from Part 1 would be unchanged. The lynewinter inclusion is the weakest link in the cluster membership list, and that is worth stating explicitly.


552 Repositories, One Embedded Timestamp per Account, 34-Minute Span

Each of the 9 accounts has between 57 and 63 public repositories. Total across the cluster: 552 repositories.

Every repository was created on May 12, 2026. Fetching the first repository per account and reading the raw README returned an HTML comment — invisible on the rendered page — containing a creation timestamp and a job identifier:

2026-05-12 11:10:39 | hanyvert   | SwapLink     | job=48099
2026-05-12 11:18:46 | jaderytm   | GasSync      | job=39412
2026-05-12 11:27:52 | hazexone   | BitForge     | job=63871
2026-05-12 11:30:00 | canestein  | BlockLink    | job=51606
2026-05-12 11:33:07 | mariwatts  | MintChain    | job=82564
2026-05-12 11:35:37 | vierystein | HashSync     | job=20845
2026-05-12 11:38:58 | kylehyne   | SmartLink    | job=38575
2026-05-12 11:42:07 | lynewinter | YieldChain   | job=78012
2026-05-12 11:44:30 | domcomit   | ProjectCloud | job=26977

The first and last timestamps are 34 minutes apart. The job IDs are non-sequential across accounts — consistent with a job queue dispatching work across multiple workers concurrently, though the data does not rule out other scheduling patterns.
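The span and the job-ID ordering can be checked directly from the nine rows listed above. This is a small sketch over values transcribed from the text; the tuple layout is just a convenience for the example.

```python
from datetime import datetime

# (timestamp, account, job id) rows as listed in the text.
rows = [
    ("2026-05-12 11:10:39", "hanyvert", 48099),
    ("2026-05-12 11:18:46", "jaderytm", 39412),
    ("2026-05-12 11:27:52", "hazexone", 63871),
    ("2026-05-12 11:30:00", "canestein", 51606),
    ("2026-05-12 11:33:07", "mariwatts", 82564),
    ("2026-05-12 11:35:37", "vierystein", 20845),
    ("2026-05-12 11:38:58", "kylehyne", 38575),
    ("2026-05-12 11:42:07", "lynewinter", 78012),
    ("2026-05-12 11:44:30", "domcomit", 26977),
]

stamps = [datetime.strptime(ts, "%Y-%m-%d %H:%M:%S") for ts, _, _ in rows]
span = max(stamps) - min(stamps)

# If job IDs were handed out in timestamp order, this list would be sorted.
job_ids = [job for _, _, job in rows]
jobs_monotonic = job_ids == sorted(job_ids)

print(f"span: {span}")
print(f"job IDs increase with time: {jobs_monotonic}")
```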

The comment format is consistent across all sampled READMEs:

<!-- fallback_BlockLink_20260512113000_51606 -->

The fallback_ prefix is present in every instance retrieved. In template generation systems, a fallback_ label typically indicates the primary generation path failed and a static secondary template was substituted. Whether that interpretation applies here is inference. What is directly observable is that the prefix is consistent across all 552 repositories and across both the 2026 and 2025 operations documented below.
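Extracting the artifact from a raw README is a one-regex job. The pattern below is inferred from the examples shown in this post (prefix, name, 14-digit timestamp, trailing numeric ID); it is a sketch, and the `Artifact` type and `parse_artifact` helper are names made up for the example.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional


class Artifact(NamedTuple):
    name: str
    created: datetime
    trailing_id: int


# Pattern inferred from the examples in the text. The name group is greedy,
# so a name that itself contains underscores would still parse.
ARTIFACT_RE = re.compile(
    r"<!--\s*fallback_(?P<name>.+)_(?P<ts>\d{14})_(?P<id>\d+)\s*-->"
)


def parse_artifact(readme_text: str) -> Optional[Artifact]:
    """Return the first fallback_ artifact found in a raw README, or None."""
    m = ARTIFACT_RE.search(readme_text)
    if m is None:
        return None
    created = datetime.strptime(m["ts"], "%Y%m%d%H%M%S")
    return Artifact(m["name"], created, int(m["id"]))


sample = "# BlockLink\n<!-- fallback_BlockLink_20260512113000_51606 -->\n"
print(parse_artifact(sample))
```

The same function applies unchanged to the 2025 strings, since they share the format.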


Repository Contents

Structural Uniformity

Fetching file trees and raw content from sampled repositories across all 9 accounts returned the same structural pattern.

A representative Python file (blocklink.py, 1,656 bytes):

class BlockLink:
    def run(self) -> bool:
        try:
            self.logger.info("Starting BlockLink processing")
            # Add your main logic here
            self.logger.info("Processing completed successfully")
            return True

The placeholder comment "# Add your main logic here" is all that stands where the method's logic should be. Every sampled repository follows this pattern: a class stub, a logging initializer, an argparse entry point, and a test file that instantiates the stub. No functional implementation was found in any sampled file. Repo names follow a [Word][Suffix] pattern; suffixes drawn from a fixed set: Core, Chain, Sync, Vault, Forge, Link.

Engagement Signals

Across all 552 repositories at time of retrieval:

Signal              | Count
Stars               | 0
Forks               | 0
PyPI uploads        | 0
CI/CD config files  | 0
Open issues         | 0
Pull requests       | 0

These are directly retrievable via the GitHub API. Their absence across 552 repositories is a cluster-level property, not an individual account characteristic.

Generation Artifacts

All 552 repositories contain an embedded HTML comment in the README — invisible on the rendered page — in the format <!-- fallback_NAME_TIMESTAMP_ID -->. The fallback_ label is treated here as an embedded string, not as a confirmed semantic signal about generation pipeline behavior. The LICENSE URL substitution error — documented in the following section — is the more structurally significant generation artifact, as it demonstrates a shared template origin independently of any label interpretation.


A Template Substitution Error Confirms Shared Generation

The LICENSE section of every generated README contains a hardcoded URL using mariwatts as the repository owner regardless of which account's repository it appears in.

From canestein/BlockLink:

See the LICENSE file at https://github.com/mariwatts/BlockLink/blob/main/LICENSE

From lynewinter/YieldChain:

See the LICENSE file at https://github.com/mariwatts/YieldChain/blob/main/LICENSE

The repo name variable was substituted correctly. The account name variable in the LICENSE URL field was not. mariwatts appears to be the base account in the generation template — the value present when the template was authored, not replaced during per-account substitution. Confirmed across multiple accounts. Not present in the mariwatts repositories themselves.
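A check for this substitution gap only needs the README text and the account that actually hosts the repository. The sketch below assumes the LICENSE link follows the URL shape shown in the two examples; the function name is hypothetical.

```python
import re
from typing import Optional

# URL shape taken from the two README lines quoted above.
LICENSE_URL_RE = re.compile(
    r"https://github\.com/(?P<owner>[^/\s]+)/(?P<repo>[^/\s]+)/blob/[^/\s]+/LICENSE"
)


def license_owner_mismatch(readme_text: str, actual_owner: str) -> Optional[str]:
    """Return the owner named in the LICENSE URL if it differs from actual_owner.

    Returns None when no LICENSE URL is found or when the owners match.
    """
    m = LICENSE_URL_RE.search(readme_text)
    if m is None:
        return None
    found = m["owner"]
    return found if found.lower() != actual_owner.lower() else None


line = "See the LICENSE file at https://github.com/mariwatts/BlockLink/blob/main/LICENSE"
print(license_owner_mismatch(line, "canestein"))   # owner in URL differs from host
print(license_owner_mismatch(line, "mariwatts"))   # owner in URL matches host
```

Run across every README in a cluster, a non-None result that is always the same login is the pattern described above.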


The Pipeline Is Linked to a Specific GitHub Identity via Commit Metadata

Every repository across all 9 accounts contains this co-author trailer:

Co-authored-by: Hajigur <66867581+hajigur69@users.noreply.github.com>

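GitHub's ID-based noreply format is NUMERICID+login@users.noreply.github.com. The numeric ID is assigned at account creation and cannot be changed by the account holder. GitHub uses this address for web-based commits when email privacy is enabled; for commits pushed from a local clone, the author email and any Co-authored-by trailer are plain text supplied by whoever created the commit. The string therefore ties these commits to the hajigur69 account's identifier, but its presence alone does not prove that the account holder made or approved them.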

The GitHub account hajigur69 has internal numeric ID 66867581:

curl -s https://api.github.com/users/hajigur69 | python3 -c \
  "import json,sys; u=json.load(sys.stdin); print(u['id'])"
# 66867581

Commits on hajigur69's own repository (Cloud9, created February 2026) carry the same identifier:

Author: Hajigur | 66867581+hajigur69@users.noreply.github.com

Observation

The same authenticated GitHub ID appears in commits across all 9 cluster accounts and in commits authored directly by hajigur69.

Interpretation

This co-author line links the cluster's commit history to a specific GitHub identity. That link does not identify who made the commits: it is compatible with shared control, credential sharing, or credential compromise, and the data does not distinguish between them.

hajigur69: GitHub account created June 13, 2020. At time of retrieval: 903 followers, 679 following. Public bio: lamer.


Infrastructure: carox.tech

Two cluster accounts — canestein and lynewinter — use a custom email domain in their git commit author metadata:

canestein  → locis@carox.tech
lynewinter → doar@carox.tech

WHOIS and DNS:

Creation Date:  2025-07-19
Updated Date:   2025-08-01
Registrar:      Namify Domains Inc
Name Servers:   raphaela.ns.cloudflare.com / uriah.ns.cloudflare.com
A record:       none
MX:             Cloudflare Email Routing (3 records)
TXT:            v=spf1 include:_spf.mx.cloudflare.net ~all

No web presence. MX records point to Cloudflare Email Routing — a free forwarding service. Destination inbox not publicly recoverable. Domain predates the cluster provisioning event by approximately 10 months.


A Malformed Co-Author Address

In addition to the hajigur69 trailer, a second co-author line appears across 8 of the 9 accounts:

Co-authored-by: v <v@users.noreply.github.com>

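There is a GitHub account with login v (ID: 627846). Its ID-based noreply address would be 627846+v@users.noreply.github.com. The string in these commits has no numeric prefix. GitHub's older noreply format, used by accounts that enabled email privacy before July 18, 2017, was login-only (login@users.noreply.github.com), so the string is well-formed in that older format; it is also exactly what a hand-typed or placeholder value would look like.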

The most likely explanations: a user.email set manually in a local git config, a placeholder from a development environment not replaced before deployment, or a test identity carried into production. All three produce the same result — consistent across 8 of 9 accounts, set once, never audited. This is not an attribution of the v GitHub account to this operation.
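Both trailers can be split into their parts mechanically, which makes it easy to tally how often each form appears across a set of commits. This is a sketch: the regex and the `parse_noreply_trailer` name are assumptions for illustration, and the function only parses the string. It says nothing about who wrote the commit.

```python
import re
from typing import Optional, Tuple

# Matches "Co-authored-by: Name <[ID+]login@users.noreply.github.com>".
TRAILER_RE = re.compile(
    r"^Co-authored-by:\s*(?P<name>.*?)\s*"
    r"<(?:(?P<id>\d+)\+)?(?P<login>[^@+<>\s]+)@users\.noreply\.github\.com>\s*$",
    re.IGNORECASE,
)


def parse_noreply_trailer(line: str) -> Optional[Tuple[Optional[int], str]]:
    """Return (numeric_id or None, login) for a GitHub noreply co-author trailer.

    Returns None if the line is not a co-author trailer with a noreply address.
    """
    m = TRAILER_RE.match(line.strip())
    if m is None:
        return None
    numeric_id = int(m["id"]) if m["id"] else None
    return numeric_id, m["login"]


print(parse_noreply_trailer(
    "Co-authored-by: Hajigur <66867581+hajigur69@users.noreply.github.com>"
))
print(parse_noreply_trailer("Co-authored-by: v <v@users.noreply.github.com>"))
```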


An Earlier Operation: The Same Fingerprints, Nine Months Prior

The 66867581+hajigur69 co-author string and the fallback_ generator artifact do not appear for the first time in May 2026.

GitHub's commit search API returns the same string across thousands of commits from a cluster of 22 accounts in a July–August 2025 window:

2025-07-08..2025-07-14  →  1,738 hits
2025-07-15..2025-07-21  →    701 hits
2025-07-22..2025-07-31  →    949 hits
2025-08-01..2025-08-15  →  7,194 hits
2025-08-16..2025-08-31  →      0 hits  ← hard stop

Four accounts from the 2026 follow botnet cluster — canestein, hazexone, domcomit, kylehyne — are present in this earlier commit set. The August 16 cutoff is a directly observable fact. Its cause is not established by this data.


Lyne6666

Lyne6666: created May 3, 2025. 163 public repositories, all with a GitHub API creation timestamp of July 9, 2025, 18:55 UTC.

Observation

LICENSE file SHA across all 163 repositories:

8aa26455d23acf904be3ed9dfb3a3efe3e49245a

Git hashes content. Identical SHA across 163 repositories = identical bytes in every file = single source file, copied without modification.
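The reason an identical SHA implies identical bytes is that git's blob ID is a SHA-1 over a short header plus the file content, with no filename or timestamp mixed in. The sketch below reproduces that computation for repositories using git's default SHA-1 object format; the LICENSE texts are hypothetical stand-ins.

```python
import hashlib


def git_blob_sha1(content: bytes) -> str:
    """Blob ID as git computes it: sha1(b"blob <size>\\0" + content)."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


# Hypothetical file contents standing in for LICENSE files.
license_a = b"MIT License\n\nCopyright (c) Example Holder\n"
license_b = b"MIT License\n\nCopyright (c) Example Holder\n"
license_c = b"MIT License\n\nCopyright (c) Example Holder \n"  # one extra space

print(git_blob_sha1(license_a) == git_blob_sha1(license_b))  # same bytes
print(git_blob_sha1(license_a) == git_blob_sha1(license_c))  # one byte differs
```

A single changed character, such as a different copyright holder or year, would produce a different SHA, so one value across 163 repositories means none of the copies was edited.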

Repository names follow {Tech}{Testnet}{Function}{Suffix}. Every README install section:

pip install git+https://github.com/Lyne6666/{RepoName}.git

Present across all 163 repositories. No postinstall hook content was confirmed in examined repositories.


uhsr

The Lyne6666 commit author email field: uhsr@eteb.me — a private domain, WHOIS-shielded via Identity Digital. Account uhsr created July 10, 2025 — one day after Lyne6666's mass repository creation timestamp.

At time of retrieval: 237 public repositories, 2,972 followers, 30,778 following.

Observation: Commit Volume

July 2025:      1,382 commits  (71% of all-time total at retrieval)
August 2025:      247 commits
September 2025:    21 commits
October 2025:      96 commits

Interpretation

A 71% concentration of all-time commit activity within a single calendar month is statistically atypical for accounts with multi-month histories. It is consistent with a scripted bulk operation rather than incremental development. That is an interpretation; the commit counts are directly retrieved from the API.


The Backdated Commit History

uhsr/AssetMarket contains a .Logs file with ~365 entries spanning January 1–December 31, 2025, format: Logs: YYYY-MM-DD <8charToken>.

Observation

Repository creation date:

curl -s "https://api.github.com/repos/uhsr/AssetMarket" | python3 -c \
  "import json,sys; r=json.load(sys.stdin); print(r['created_at'])"
# 2025-08-02T16:29:22Z

Root commit:

SHA:            4f8f47697eb89c8818820ca92348be01c4544878
Message:        Logs on 2025-01-01
Author date:    2025-01-01T14:47:47Z
Committer date: 2025-01-01T14:47:47Z
Author email:   uhsr@eteb.me

The repository did not exist until August 2, 2025. The root commit carries an author date of January 1, 2025 — 213 days earlier.

Derived Structure

Git stores GIT_AUTHOR_DATE and GIT_COMMITTER_DATE separately. Both can be set by the user when the commit is created. Naive backdating sets only the author date, leaving the committer date at the time the commit was actually created, which is a detectable mismatch. In this root commit, both fields are set identically. The mismatch that typically exposes backdating is absent.
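The two checks described here, author date versus committer date and commit date versus repository creation, can be expressed as a small function over the three timestamps returned by the API. This is a sketch using the values quoted above; the function and key names are made up for the example.

```python
from datetime import datetime


def parse_utc(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp with a trailing Z, as the GitHub API returns."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def backdating_signals(repo_created: str, author_date: str, committer_date: str) -> dict:
    """Compare a commit's two date fields with the repository creation time."""
    created = parse_utc(repo_created)
    authored = parse_utc(author_date)
    committed = parse_utc(committer_date)
    return {
        # Whole days by which the author date precedes repo creation.
        "days_before_repo_creation": (created - authored).days,
        # True when the usual author/committer mismatch is absent.
        "author_committer_match": authored == committed,
        # True when even the committer date is older than the repository.
        "commit_predates_repo": committed < created,
    }


signals = backdating_signals(
    repo_created="2025-08-02T16:29:22Z",
    author_date="2025-01-01T14:47:47Z",
    committer_date="2025-01-01T14:47:47Z",
)
print(signals)
```

A committer date earlier than the repository's creation is not proof of fabrication on its own, since history can be imported from elsewhere, so the result is best read as a flag for manual review.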

Interpretation

The presence of a 213-day pre-creation commit, with both date fields aligned to eliminate the typical detection artifact, is consistent with deliberate fabrication of commit history. The .Logs content — uniform daily entries with 8-character tokens across a full calendar year — is consistent with bulk generation rather than organic accumulation. Both are interpretations. The timestamp mismatch between repo creation and root commit author date is a directly observable, verifiable fact.


The Generator Artifact in the 2025 Repositories

Raw README of uhsr/AssetMarket:

<!-- fallback_AssetMarket_20250802163009_95172 -->

Same embedded string format as the 2026 cluster: a fallback_ prefix, repo name, timestamp, and trailing numeric ID. The fallback_ label is treated as an embedded string whose origin is unknown — it may derive from a template engine, a CI scaffold, a repository bootstrap tool, or a custom generation script. The label alone does not establish what tool produced it or what the label means in that tool's context.

What is directly observable: the format fallback_{name}_{timestamp}_{id} appears in both the 2025 uhsr repositories and across all 552 repositories in the 2026 cluster. That structural match is the finding; the label's semantic meaning within any particular system is not established by this analysis.

Two additional repositories:

uhsr/SmartContract  →  <!-- fallback_SmartContract_20250802162757_83653 -->
uhsr/TokenLab       →  <!-- fallback_TokenLab_20250802161931_80263 -->

Three artifacts, roughly 11-minute window (16:19:31 to 16:30:09):

16:19:31  TokenLab      ID: 80263
16:27:57  SmartContract ID: 83653
16:30:09  AssetMarket   ID: 95172

Observation

The trailing IDs increase non-uniformly — gaps of ~3,390 and ~11,519. On Linux systems, process IDs increment sequentially; irregular gaps are consistent with other processes consuming assignments between runs. This is an interpretation of the pattern, not a definitive conclusion about the execution environment.
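The gaps and the elapsed time follow directly from the three artifact strings. A minimal sketch over the values quoted above:

```python
from datetime import datetime

# (repo, embedded timestamp, trailing ID) from the three README comments.
artifacts = [
    ("TokenLab", "20250802161931", 80263),
    ("SmartContract", "20250802162757", 83653),
    ("AssetMarket", "20250802163009", 95172),
]

times = [datetime.strptime(ts, "%Y%m%d%H%M%S") for _, ts, _ in artifacts]
ids = [trailing for _, _, trailing in artifacts]

# Difference between each ID and the one before it.
id_gaps = [later - earlier for earlier, later in zip(ids, ids[1:])]
window = times[-1] - times[0]

print(f"ID gaps: {id_gaps}")
print(f"elapsed: {window} ({int(window.total_seconds())} s)")
```

The IDs rise with the timestamps here, unlike the 2026 job IDs, which is one reason the two trailing fields may not mean the same thing in both operations.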

The fallback_ prefix and fallback_{name}_{timestamp}_{id} format are identical across both the 2025 and 2026 operations. That is a directly observable structural match.


The Stargazer Overlap

AssetMarket (83 stars), SmartContract (50), DigitalWallet (49) at time of analysis.

Observation

AssetMarket ∩ DigitalWallet ∩ SmartContract = 33 accounts

33 accounts starred all three repositories, which is 67% of DigitalWallet's total star count drawn from a single overlapping pool. Under a model where starring is independent across repositories with no shared promotion mechanism, an overlap this concentrated would not be expected for three repositories with zero followers, zero forks, and no search visibility.
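The overlap itself is a three-way set intersection over stargazer logins. The sketch below uses synthetic logins sized to match the counts in the text (83, 50, and 49 stars with 33 shared); none of the names are real accounts.

```python
def shared_stargazers(*stargazer_lists):
    """Logins that appear in every one of the given stargazer lists."""
    sets = [set(lst) for lst in stargazer_lists]
    return set.intersection(*sets)


# Synthetic data sized like the three repositories in the text.
pool = {f"pool_{i}" for i in range(33)}
asset_market = pool | {f"am_{i}" for i in range(50)}      # 83 stars
smart_contract = pool | {f"sc_{i}" for i in range(17)}    # 50 stars
digital_wallet = pool | {f"dw_{i}" for i in range(16)}    # 49 stars

common = shared_stargazers(asset_market, smart_contract, digital_wallet)
share = len(common) / len(digital_wallet)

print(f"shared by all three: {len(common)}")
print(f"share of DigitalWallet stars: {share:.0%}")
```

With real data, each list would come from the paginated stargazers endpoint for one repository.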

The July 11, 2025 batch — 83 repositories, single day:

★2:  64 repositories  (77%)
★1:  19 repositories  (23%)
★0:   0 repositories

Zero repositories with zero stars. Every repository in the batch has exactly one or two stars, with nothing outside that range.

Two accounts from the 33-account pool, SAPH1TE and ahnshy, also appear in stargazer lists for Lyne6666 repositories. The uhsr and Lyne6666 clusters share no other observable social graph overlap; these two accounts are the only cross-cluster stargazer link found in this data.

Interpretation

What produced the 33-account overlap and the uniform star distribution is not established by this data. The overlap pattern and cross-cluster appearance of two accounts are documented as observations. A coordinated engagement mechanism is one explanation consistent with the data; it is not the only possible explanation.


mohammadtzs

One fork of DigitalWallet exists, made by mohammadtzs. Account created March 2025, 506 public repositories, 100 forks — all from accounts returning 404 at time of retrieval. Fork names included alork1, alork2, alork3, alorki1. mohammadtzs is also present in the 33-account stargazer pool.

Observable: forked a cluster repository, present in the shared stargazer pool, prior forks exclusively from accounts no longer present on the platform.


October 2025: Repository Names

uhsr commit activity: 21 commits in September, 96 in October — concentrated in a 15-minute window, October 20 between 05:04 and 05:19 UTC, across 7 repositories:

awesomepythonTech        → name matches vinta/awesome-python (290k+ stars)
freeprogrammingbooksHub  → EbookFoundation/free-programming-books (340k+)
publicapisAI             → public-apis/public-apis (320k+)
codinginterviewuniversityTools → jwasham/coding-interview-university (310k+)
developerroadmapLab      → kamranahmedse/developer-roadmap (300k+)
systemdesignprimerCloud  → donnemartin/system-design-primer (280k+)
buildyourownxTools       → codecrafters-io/build-your-own-x (330k+)

None contain implementation content. developerroadmapLab description: "enterprise enterprise-grade" — a duplicated token consistent with an unresolved template variable.
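The name matches in the list above follow one rule: strip the punctuation from the popular repository's name, lowercase it, and append a suffix. The sketch below checks that rule against the seven pairs; the helper names are hypothetical and this is not necessarily how the names were originally compared.

```python
import re


def normalize(name: str) -> str:
    """Lowercase and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def matches_popular(candidate: str, popular_repo: str) -> bool:
    """True if candidate starts with the normalized name part of 'owner/name'."""
    base = normalize(popular_repo.split("/")[-1])
    return normalize(candidate).startswith(base)


pairs = [
    ("awesomepythonTech", "vinta/awesome-python"),
    ("freeprogrammingbooksHub", "EbookFoundation/free-programming-books"),
    ("publicapisAI", "public-apis/public-apis"),
    ("codinginterviewuniversityTools", "jwasham/coding-interview-university"),
    ("developerroadmapLab", "kamranahmedse/developer-roadmap"),
    ("systemdesignprimerCloud", "donnemartin/system-design-primer"),
    ("buildyourownxTools", "codecrafters-io/build-your-own-x"),
]

results = {cand: matches_popular(cand, pop) for cand, pop in pairs}
# What is left of each candidate once the normalized popular name is removed.
suffixes = {cand: cand[len(normalize(pop.split("/")[-1])):] for cand, pop in pairs}

print(results)
print(suffixes)
```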


Alternative Explanations

Could the hajigur69 co-author identity appear in unrelated operations by coincidence? The GitHub noreply format embeds an immutable, account-specific numeric ID. The same ID (66867581) appearing across thousands of commits in a 2025 cluster and across all 552 repositories in the 2026 cluster is not plausibly a coincidence: a specific eight-digit ID paired with a specific login does not recur by chance. The identity string would have to be reused deliberately, or the same credentials or commit configuration used in both operations.

Could the fallback_ artifact format be from a widely distributed open-source tool? Possible. If the prefix is a convention from a publicly available README generation tool, its presence in both operations indicates both used the same tool — not necessarily the same operator. No such tool was identified in this research. The artifact format is not established as unique to a single actor.

Could the template substitution error (the mariwatts LICENSE URL) appear independently across unrelated generation pipelines? Less likely than the fallback_ case. A shared template variable left unsubstituted in the same field across 552 repositories from 9 accounts is more parsimoniously explained by a single template source than by independent pipelines converging on the same substitution gap. However, if a widely-used generation tool ships with mariwatts hardcoded as a default account in its LICENSE URL template, the error would appear across any pipeline using that tool without modification. That scenario is not established as absent.

Could the four accounts in both operations be coincidentally shared? Four accounts from the 2026 cluster — canestein, hazexone, domcomit, kylehyne — appear in the 2025 commit-farming activity. Their simultaneous presence in two temporally separated operations is statistically implausible under independence assumptions given the scale of GitHub's account population. Whether that overlap reflects shared control is an inference; the factual overlap is directly retrievable.

I am not establishing that a single individual or organization controls both operations. I am documenting that the same authenticated GitHub identity, the same generator artifact format, and four of the same accounts are present in both.


Summary of Confirmed Findings

Finding | Method
Cluster minimum Jaccard 0.9898 vs null baseline 0.000149 (6,642×) | Analytical null model, GitHub API
Second-order cosine similarity >0.9999 (precision saturation) across all 36 pairs | Cosine of per-account Jaccard vectors
hazexone, lynewinter structurally peripheral (mean ≤ 0.9912) | Within-cluster mean Jaccard
lynewinter cluster membership supported by 1 confirmed pair; 7 unverified | Direct pairwise retrieval
All 552 repos created May 12, 2026 in a 34-minute window | Embedded HTML comment timestamps
<!-- fallback_NAME_TIMESTAMP_ID --> in every README | Direct raw file fetch, all 9 accounts
mariwatts hardcoded in LICENSE URLs across foreign accounts | Direct raw file fetch
66867581+hajigur69 co-author on all cluster commits | Raw commit data, GitHub API
66867581+hajigur69 author on hajigur69's own repository | Raw commit data, GitHub API
v@users.noreply.github.com lacks numeric ID prefix | Raw commit data
locis@carox.tech, doar@carox.tech in commit author fields | Raw commit data
carox.tech: no A record, Cloudflare MX, created July 2025 | WHOIS, DNS
All 552 repos: zero stars, forks, CI, issues, PRs | GitHub API
Same fallback_ format in 2025 uhsr repositories | Direct raw file fetch
uhsr/AssetMarket root commit 213 days before repo creation | GitHub API commit + repo endpoints
Root commit SHA, both date fields set to 2025-01-01 | 4f8f47697eb89c8818820ca92348be01c4544878
Trailing numeric IDs in 3 README files, within a window of 10 minutes 38 seconds | Direct raw file fetch
33-account pool in all three high-value stargazer lists (67% overlap) | Stargazer API cross-reference
SAPH1TE, ahnshy in both uhsr and Lyne6666 stargazer lists | Stargazer API cross-cluster
canestein, hazexone, domcomit, kylehyne in both 2025 and 2026 operations | Commit search API, Part 1 data
October 2025 repos named after widely-starred repositories | GitHub API, name comparison

Constraints on the Null Hypothesis

These results constrain the likelihood of independent behavior under standard sampling assumptions.

The first-order Jaccard values (0.9898–0.9998) are 6,642× the analytically expected baseline for independent accounts following 29,800 users from a pool of 100 million. The second-order structure — cosine similarity of Jaccard vectors at precision saturation (>0.9999) across all 36 account pairs — is consistent with accounts whose similarity profiles derive from a near-identical source. Higher-precision computation may reveal internal structure not visible at four decimal places; the current data does not resolve it.

Any competing explanation must jointly account for the following linkage classes:

  • Commit identity — the same authenticated GitHub ID (66867581+hajigur69) present across all 552 cluster repositories and in that identity's own repository
  • Generator artifact — the fallback_{name}_{timestamp}_{id} format present across both the 2025 and 2026 operations
  • Cross-operation account overlap — four accounts (canestein, hazexone, domcomit, kylehyne) present in both operations

Disclosure

This report has been submitted in full to GitHub Trust & Safety with API-verifiable evidence including the root backdated commit SHA (4f8f47697eb89c8818820ca92348be01c4544878), the generator artifact URLs, the 33-account stargazer overlap, and the complete account list.

All data was retrieved via the GitHub REST API v3 with authenticated requests. No accounts were accessed beyond their public API surface. No systems were compromised.

All account names published here are publicly visible GitHub profiles. This methodology is only verifiable if the data is reproducible.

If you have seen the hajigur69 co-author string or the fallback_ artifact pattern in your own repositories' commit histories — that is the fingerprint documented here. Worth reporting.


All tooling used in this investigation is in BANANA_TREE.
