Beyond Follow Clusters: Second-Order Similarity Patterns in a GitHub Bot Network
This is Part 2 of an ongoing investigation. Part 1 documented the initial discovery: 8 accounts with Jaccard following-list similarity of 0.99+ across ~29,800 entries each, evading cross-follow detection entirely.
After Part 1 published, I kept pulling the data.
Subsequent analysis expanded the cluster to 9 accounts, recovered infrastructure linkage to a specific GitHub identity, and mapped the generation pipeline responsible for all 552 repositories across the cluster. The pipeline left recoverable artifacts in every repository it produced.
Following that same pipeline fingerprint led to an earlier operation — running nine months before the follow botnet was provisioned. The same GitHub identity appears in both. So does the same generator artifact. Four accounts documented in Part 1 appear in both operations.
This post documents what the data shows. Inference is labeled as inference throughout. I am not establishing intent or ownership beyond what the API evidence directly supports.
Methodology
All findings in this report are derived from:
- GitHub REST API v3 responses (public endpoints, authenticated requests)
- Raw git commit metadata
- Repository file retrieval via raw content endpoints
- WHOIS and DNS records
- Graph overlap analysis (Jaccard similarity, set intersection, cosine similarity of similarity vectors)
No private repositories, leaked credentials, or non-public systems were accessed. Every finding in the confirmed findings table is reproducible from public API endpoints with a valid GitHub token.
This analysis is limited to publicly available GitHub API data and does not include private network signals, rate-limited endpoints, or non-indexed interactions. Findings reflect the state of public data at time of retrieval.
Epistemic Boundaries
These boundaries apply to the entire report and are stated once here rather than repeated inline.
What this analysis can establish:
- Observable properties of public GitHub accounts, repositories, and commit metadata
- Statistical deviation from a naive independent-uniform baseline model
- Structural matches between artifacts across two time-separated operations
- Presence of the same authenticated GitHub ID across multiple contexts
What this analysis cannot establish:
- Who controls any of the accounts documented
- Whether hajigur69 is an operator, collaborator, or an identity whose credentials were reused
- Intent or downstream use of the documented infrastructure
- Whether lynewinter's pairwise similarity to the non-mariwatts accounts meets cluster inclusion criteria (that data was not retrieved)
- Whether the fallback_ label carries the semantic meaning its name implies in the generating tool's context
Inference is labeled as inference when it appears. The confirmed findings table at the end of this report lists only directly observable, API-verifiable data.
Baseline: What Random Accounts Would Look Like
Before presenting the cluster data, it is worth establishing what Jaccard similarity looks like under independent sampling — the null hypothesis.
GitHub has approximately 100 million user accounts. Each cluster account follows ~29,800. Under a naive independent-uniform model — where two accounts select their follows independently and uniformly at random from the full user population:
E[|A ∩ B|] = k² / N = (29,800)² / 100,000,000 ≈ 8.88 accounts
E[Jaccard] = 8.88 / (2×29,800 − 8.88) ≈ 0.000149
The 3-sigma upper bound under this model is approximately 0.000299.
The observed cluster minimum is 0.9898, roughly 6,642× the expected Jaccard value under the uniform independence model.
This model makes simplifying assumptions that do not hold on GitHub: following behavior is not uniformly distributed, popular accounts attract disproportionate follows, and community clustering means real accounts share partial follow overlap above the uniform baseline. A realistic null model would produce a higher baseline than 0.000149. The observed values would still exceed it by orders of magnitude, but the precise ratio is model-dependent and should not be read as a formal hypothesis-test result. It is presented as a reference point against the simplest baseline, not as a statistically calibrated rejection threshold.
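The baseline figures can be reproduced in a few lines. This is a minimal sketch under the same naive model; treating the overlap count as roughly Poisson (standard deviation ≈ √mean) is an assumption of the sketch, chosen because it reproduces the 3-sigma figure quoted above.

```python
import math

N = 100_000_000   # approximate GitHub user population (figure from the text)
k = 29_800        # approximate following-list size per cluster account

# Expected overlap when two accounts each pick k follows uniformly at random.
expected_overlap = k * k / N


def jaccard_from_overlap(overlap: float, size: int = k) -> float:
    """Jaccard for two equal-sized sets given the size of their intersection."""
    return overlap / (2 * size - overlap)


expected_jaccard = jaccard_from_overlap(expected_overlap)

# Assumption: the overlap count is roughly Poisson, so sd is about sqrt(mean).
upper_overlap = expected_overlap + 3 * math.sqrt(expected_overlap)
upper_jaccard = jaccard_from_overlap(upper_overlap)

observed_min = 0.9898
ratio = observed_min / expected_jaccard

print(f"E[overlap]   = {expected_overlap:.2f}")
print(f"E[Jaccard]   = {expected_jaccard:.6f}")
print(f"3-sigma high = {upper_jaccard:.6f}")
print(f"ratio        = {ratio:,.0f}x")
```

Swapping in a heavier-tailed null model would raise the baseline and lower the ratio, which is why the ratio is best read as a reference point rather than a test statistic.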
Every similarity value in the cluster below sits against this reference.
The Cluster Expanded
Running Jaccard similarity analysis against the original 8 accounts and their extended follower graphs surfaced a ninth account: lynewinter.
lynewinter ↔ mariwatts jaccard=0.9898 shared≈29,200
The methodology is identical to Part 1. A coefficient of 0.9898 across ~29,800 following entries places this pair within the same anomalous range as the original cluster. Against the null model baseline of 0.000149, this value is orders of magnitude beyond what independent account behavior would produce.
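For readers who want to reproduce a pairwise value, the computation can be sketched as follows. The helper names `fetch_following` and `jaccard` are illustrative, not the tooling used in this investigation; the endpoint is the public `GET /users/{login}/following`, and the token is assumed to be a valid GitHub token supplied by the reader.

```python
import json
import urllib.request


def fetch_following(login: str, token: str) -> set[int]:
    """Collect the numeric IDs of every account `login` follows.

    Pages through GET /users/{login}/following, 100 entries per page,
    until an empty page is returned.
    """
    ids: set[int] = set()
    page = 1
    while True:
        url = (
            f"https://api.github.com/users/{login}/following"
            f"?per_page=100&page={page}"
        )
        req = urllib.request.Request(url, headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        })
        with urllib.request.urlopen(req) as resp:
            batch = json.load(resp)
        if not batch:
            return ids
        ids.update(user["id"] for user in batch)
        page += 1


def jaccard(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B|; defined here as 0.0 when both sets are empty."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


# Toy example with made-up IDs, not cluster data.
a = {1, 2, 3, 4}
b = {2, 3, 4, 5}
print(jaccard(a, b))
```

Comparing numeric IDs rather than logins is a design choice of this sketch: logins can be renamed, IDs cannot.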
The confirmed cluster is now 9 accounts:
canestein, hazexone, domcomit, kylehyne, jaderytm,
vierystein, hanyvert, mariwatts, lynewinter (partial coverage — 1 confirmed pairwise value)
Similarity Structure of the Full Cluster
[Figure: heatmap of the pairwise Jaccard matrix for all 9 accounts, computed from following-list intersection over union.]
Per-account mean first-order similarity within the cluster:
jaderytm mean=0.9970 min=0.9908 max=0.9998
mariwatts mean=0.9968 min=0.9898 max=0.9998
kylehyne mean=0.9970 min=0.9907 max=0.9998
domcomit mean=0.9967 min=0.9912 max=0.9997
hanyvert mean=0.9966 min=0.9912 max=0.9997
canestein mean=0.9969 min=0.9907 max=0.9996
vierystein mean=0.9962 min=0.9909 max=0.9985
hazexone mean=0.9912 min=0.9907 max=0.9925 ← peripheral
lynewinter mean=0.9910 min=0.9898 max=0.9912 ← peripheral
hazexone and lynewinter are the structural outliers of the cluster. Their mean within-cluster similarity (0.9912 and 0.9910 respectively) sits approximately 0.006 below the core group mean of ~0.9969. Both still exceed 0.98 on every confirmed pairwise comparison. One interpretation consistent with, but not uniquely explained by, the data: they were provisioned from the same seed list but at a different time or via a slightly diverged list version. The position in the similarity distribution is observation; the generative explanation is inference.
Second-Order Structure
The "second layer" referenced in the investigation title refers to structure that emerges when comparing the similarity profiles of accounts, not just their direct following overlap.
Each account can be represented as a vector of its Jaccard similarities to all other cluster members. Computing cosine similarity between these vectors yields a second-order metric: how structurally equivalent two accounts are within the similarity graph.
For this cluster, all pairwise cosine similarities between Jaccard vectors compute to >0.9999 at four decimal places.
Important limitation: this result is partly a consequence of low variance across the input vectors. When all Jaccard values in a cluster fall within the range 0.98–0.9998, the similarity vectors are themselves numerically similar regardless of structural origin — cosine saturation at this scale is expected even for accounts that share only approximate overlap. The result does not independently establish shared generation; it is consistent with it, but a high-similarity cluster will produce this outcome under multiple generative models.
What the second-order metric does contribute: the inter-account variance in similarity profiles is extremely low across all 9 accounts. In organic follower networks, accounts accumulate following behavior across different communities over time and typically show differentiated structural positions — some accounts are more similar to high-degree hubs, others to peripheral clusters. The near-zero variance here is consistent with accounts whose following lists were seeded from the same source, but that interpretation is not uniquely supported by this metric alone.
A proper second-order baseline would require computing cosine similarity distributions across random high-degree graph samples of comparable size and density. That computation is not performed here. The metric is presented as a structural observation, not a statistically calibrated discriminator.
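The saturation effect described above is easy to see on synthetic input. In this sketch the four rows are illustrative values chosen to sit in the same range as the cluster; they are not the retrieved matrix, and setting the diagonal to 1.0 is an assumption about how the vectors are aligned.

```python
import math
from itertools import combinations


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


# Illustrative Jaccard rows (diagonal = 1.0). NOT the retrieved cluster data.
rows = {
    "a": [1.0000, 0.9998, 0.9970, 0.9907],
    "b": [0.9998, 1.0000, 0.9968, 0.9908],
    "c": [0.9970, 0.9968, 1.0000, 0.9912],
    "d": [0.9907, 0.9908, 0.9912, 1.0000],
}

# Second-order metric: cosine between every pair of similarity rows.
second_order = {
    (p, q): cosine(rows[p], rows[q]) for p, q in combinations(rows, 2)
}

for pair, value in second_order.items():
    print(pair, f"{value:.6f}")
```

Every pair lands above 0.9999 even though the rows were typed in by hand, which illustrates why the metric cannot separate generative explanations when the first-order values are this tightly packed.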
Sanity Check: What Doesn't Fit Cleanly
lynewinter is worth examining as a boundary case.
It was added to the cluster based on a single confirmed pairwise value: Jaccard 0.9898 against mariwatts. Its pairwise values against the remaining 7 accounts were not directly retrieved and are not included in the matrix above.
What this means: lynewinter's cluster membership rests on one confirmed measurement. It satisfies the inclusion threshold. It does not have the full evidentiary support of the core 7 accounts, which have complete pairwise matrices. The heatmap reflects only confirmed values for lynewinter; cells against the 7 non-mariwatts accounts should be treated as unverified.
If lynewinter were removed from the cluster, the core 8-account finding from Part 1 would be unchanged. The lynewinter inclusion is the weakest link in the cluster membership list, and that is worth stating explicitly.
552 Repositories, One Embedded Timestamp per Account, 34-Minute Span
Each of the 9 accounts has between 57 and 63 public repositories. Total across the cluster: 552 repositories.
Every repository was created on May 12, 2026. Fetching the first repository per account and reading the raw README returned an HTML comment — invisible on the rendered page — containing a creation timestamp and a job identifier:
2026-05-12 11:10:39 | hanyvert | SwapLink | job=48099
2026-05-12 11:18:46 | jaderytm | GasSync | job=39412
2026-05-12 11:27:52 | hazexone | BitForge | job=63871
2026-05-12 11:30:00 | canestein | BlockLink | job=51606
2026-05-12 11:33:07 | mariwatts | MintChain | job=82564
2026-05-12 11:35:37 | vierystein | HashSync | job=20845
2026-05-12 11:38:58 | kylehyne | SmartLink | job=38575
2026-05-12 11:42:07 | lynewinter | YieldChain | job=78012
2026-05-12 11:44:30 | domcomit | ProjectCloud | job=26977
The first and last timestamps are 34 minutes apart. The job IDs are non-sequential across accounts — consistent with a job queue dispatching work across multiple workers concurrently, though the data does not rule out other scheduling patterns.
The comment format is consistent across all sampled READMEs:
<!-- fallback_BlockLink_20260512113000_51606 -->
The fallback_ prefix is present in every instance retrieved. In template generation systems, a fallback_ label typically indicates the primary generation path failed and a static secondary template was substituted. Whether that interpretation applies here is inference. What is directly observable is that the prefix is consistent across all 552 repositories and across both the 2026 and 2025 operations documented below.
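Extracting the artifact from a raw README can be sketched with a regular expression. The pattern below is inferred from the samples quoted in this post (prefix, name, 14-digit timestamp, numeric ID); it is an assumption about the format, not a specification of the generating tool.

```python
import re
from datetime import datetime

# Inferred layout: fallback_<name>_<YYYYMMDDHHMMSS>_<numeric id>
ARTIFACT = re.compile(
    r"<!--\s*fallback_(?P<name>.+)_(?P<ts>\d{14})_(?P<id>\d+)\s*-->"
)


def parse_artifact(readme_text: str):
    """Return (name, timestamp, trailing_id) for the first artifact, else None."""
    match = ARTIFACT.search(readme_text)
    if match is None:
        return None
    timestamp = datetime.strptime(match["ts"], "%Y%m%d%H%M%S")
    return match["name"], timestamp, int(match["id"])


sample = "# BlockLink\n<!-- fallback_BlockLink_20260512113000_51606 -->\n"
print(parse_artifact(sample))
```

The timestamp carries no timezone, so the parsed value is left naive rather than assigned one.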
Repository Contents
Structural Uniformity
Fetching file trees and raw content from sampled repositories across all 9 accounts returned the same structural pattern.
A representative Python file (blocklink.py, 1,656 bytes):
class BlockLink:
    def run(self) -> bool:
        try:
            self.logger.info("Starting BlockLink processing")
            # Add your main logic here
            self.logger.info("Processing completed successfully")
            return True
The comment # Add your main logic here is the only thing standing where an implementation would be. Every sampled repository follows this pattern: a class stub, a logging initializer, an argparse entry point, and a test file that instantiates the stub. No functional implementation was found in any sampled file. Repo names follow a [Word][Suffix] pattern, with suffixes drawn from a fixed set: Core, Chain, Sync, Vault, Forge, Link.
Engagement Signals
Across all 552 repositories at time of retrieval:
| Signal | Count |
|---|---|
| Stars | 0 |
| Forks | 0 |
| PyPI uploads | 0 |
| CI/CD config files | 0 |
| Open issues | 0 |
| Pull requests | 0 |
These are directly retrievable via the GitHub API. Their absence across 552 repositories is a cluster-level property, not an individual account characteristic.
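A tally of the API-level signals can be sketched as below. The field names are the ones returned by `GET /repos/{owner}/{repo}`; the two repository dictionaries are hypothetical stand-ins, not retrieved data. PyPI uploads and CI config files are not part of that response and would need separate checks.

```python
from collections import Counter

# Field names as returned by the GitHub REST API repository object.
SIGNAL_FIELDS = ("stargazers_count", "forks_count", "open_issues_count")


def tally_signals(repos: list[dict]) -> Counter:
    """Sum each engagement field across a list of repository JSON objects."""
    totals = Counter({field: 0 for field in SIGNAL_FIELDS})
    for repo in repos:
        for field in SIGNAL_FIELDS:
            totals[field] += repo.get(field, 0)
    return totals


# Hypothetical stand-ins for API responses.
repos = [
    {"name": "ExampleOne", "stargazers_count": 0, "forks_count": 0, "open_issues_count": 0},
    {"name": "ExampleTwo", "stargazers_count": 0, "forks_count": 0, "open_issues_count": 0},
]

totals = tally_signals(repos)
print(dict(totals))
```

Note that `open_issues_count` in the API counts open pull requests as well as issues, so a zero there covers both rows of the table.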
Generation Artifacts
All 552 repositories contain an embedded HTML comment in the README — invisible on the rendered page — in the format <!-- fallback_NAME_TIMESTAMP_ID -->. The fallback_ label is treated here as an embedded string, not as a confirmed semantic signal about generation pipeline behavior. The LICENSE URL substitution error — documented in the following section — is the more structurally significant generation artifact, as it demonstrates a shared template origin independently of any label interpretation.
A Template Substitution Error Confirms Shared Generation
The LICENSE section of every generated README contains a hardcoded URL using mariwatts as the repository owner regardless of which account's repository it appears in.
From canestein/BlockLink:
See the LICENSE file at https://github.com/mariwatts/BlockLink/blob/main/LICENSE
From lynewinter/YieldChain:
See the LICENSE file at https://github.com/mariwatts/YieldChain/blob/main/LICENSE
The repo name variable was substituted correctly. The account name variable in the LICENSE URL field was not. mariwatts appears to be the base account in the generation template: the value present when the template was authored and never replaced during per-account substitution. The error was confirmed across multiple accounts. It is not visible in the mariwatts repositories themselves, where the hardcoded owner happens to be correct.
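The check for this substitution error can be sketched as a comparison between the owner in the LICENSE URL and the owner of the repository the README came from. The function name and the URL pattern are assumptions based on the two examples quoted above.

```python
import re

# Matches the LICENSE link format seen in the quoted READMEs.
LICENSE_URL = re.compile(
    r"https://github\.com/(?P<owner>[^/\s]+)/(?P<repo>[^/\s]+)/blob/main/LICENSE"
)


def license_owner_mismatch(readme_text: str, actual_owner: str):
    """Return the owner named in the LICENSE URL if it differs from actual_owner.

    Returns None when there is no LICENSE URL or when the owner matches.
    """
    match = LICENSE_URL.search(readme_text)
    if match is None:
        return None
    found = match["owner"]
    return found if found.lower() != actual_owner.lower() else None


line = "See the LICENSE file at https://github.com/mariwatts/BlockLink/blob/main/LICENSE"
print(license_owner_mismatch(line, "canestein"))  # README fetched from canestein
print(license_owner_mismatch(line, "mariwatts"))  # README fetched from mariwatts
```

The comparison is case-insensitive because GitHub logins are.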
The Pipeline Is Linked to a Specific GitHub Identity via Commit Metadata
Every repository across all 9 accounts contains this co-author trailer:
Co-authored-by: Hajigur <66867581+hajigur69@users.noreply.github.com>
GitHub's ID-based noreply format is NUMERICID+login@users.noreply.github.com. The numeric ID is assigned at account creation and cannot be changed by the account holder. A Co-authored-by trailer is part of the commit message, though, and Git does not verify it, so the trailer shows which identity the commit names rather than who pushed it.
The GitHub account hajigur69 has internal numeric ID 66867581:
curl -s https://api.github.com/users/hajigur69 | python3 -c \
"import json,sys; u=json.load(sys.stdin); print(u['id'])"
# 66867581
Commits on hajigur69's own repository (Cloud9, created February 2026) carry the same identifier:
Author: Hajigur | 66867581+hajigur69@users.noreply.github.com
Observation
The same authenticated GitHub ID appears in commits across all 9 cluster accounts and in commits authored directly by hajigur69.
Interpretation
This co-author line links the cluster's commit history to a specific GitHub identity. It does not identify a person. It is compatible with shared control, credential sharing, or credential compromise, and the data does not distinguish between them.
hajigur69: GitHub account created June 13, 2020. At time of retrieval: 903 followers, 679 following. Public bio: lamer.
Infrastructure: carox.tech
Two cluster accounts — canestein and lynewinter — use a custom email domain in their git commit author metadata:
canestein → locis@carox.tech
lynewinter → doar@carox.tech
WHOIS and DNS:
Creation Date: 2025-07-19
Updated Date: 2025-08-01
Registrar: Namify Domains Inc
Name Servers: raphaela.ns.cloudflare.com / uriah.ns.cloudflare.com
A record: none
MX: Cloudflare Email Routing (3 records)
TXT: v=spf1 include:_spf.mx.cloudflare.net ~all
No web presence. MX records point to Cloudflare Email Routing — a free forwarding service. Destination inbox not publicly recoverable. Domain predates the cluster provisioning event by approximately 10 months.
A Malformed Co-Author Address
In addition to the hajigur69 trailer, a second co-author line appears across 8 of the 9 accounts:
Co-authored-by: v <v@users.noreply.github.com>
There is a GitHub account with login v (ID: 627846). Its ID-based noreply address would be 627846+v@users.noreply.github.com; the string in these commits lacks the numeric prefix. GitHub issued login-only noreply addresses to accounts created before July 18, 2017, so the bare form is not impossible for an old account, but it is also exactly what a hand-typed placeholder looks like.
The most likely explanations: a user.email set manually in a local git config, a placeholder from a development environment not replaced before deployment, or a test identity carried into production. All three produce the same result — consistent across 8 of 9 accounts, set once, never audited. This is not an attribution of the v GitHub account to this operation.
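Both trailers can be pulled out of a commit message and classified by whether the address carries a numeric ID prefix. This is a minimal sketch; the regular expressions and the `coauthors` helper are illustrative, and the sample message is assembled from the two trailer lines quoted in this post.

```python
import re

TRAILER = re.compile(
    r"^Co-authored-by:\s*(?P<name>.*?)\s*<(?P<email>[^>]+)>\s*$",
    re.MULTILINE,
)
# Optional "<digits>+" prefix, then the login, then GitHub's noreply domain.
NOREPLY = re.compile(
    r"^(?:(?P<id>\d+)\+)?(?P<login>[^@]+)@users\.noreply\.github\.com$"
)


def coauthors(message: str) -> list[dict]:
    """Extract co-author trailers and split noreply addresses into id/login."""
    found = []
    for trailer in TRAILER.finditer(message):
        parts = NOREPLY.match(trailer["email"])
        found.append({
            "name": trailer["name"],
            "email": trailer["email"],
            "login": parts["login"] if parts else None,
            "numeric_id": int(parts["id"]) if parts and parts["id"] else None,
        })
    return found


message = (
    "Initial commit\n\n"
    "Co-authored-by: Hajigur <66867581+hajigur69@users.noreply.github.com>\n"
    "Co-authored-by: v <v@users.noreply.github.com>\n"
)
for entry in coauthors(message):
    print(entry)
```

A `numeric_id` of None flags an address without the prefix; it says nothing about how the string got there.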
An Earlier Operation: The Same Fingerprints, Nine Months Prior
The 66867581+hajigur69 co-author string and the fallback_ generator artifact do not appear for the first time in May 2026.
GitHub's commit search API returns the same string across thousands of commits from a cluster of 22 accounts in a July–August 2025 window:
2025-07-08..2025-07-14 → 1,738 hits
2025-07-15..2025-07-21 → 701 hits
2025-07-22..2025-07-31 → 949 hits
2025-08-01..2025-08-15 → 7,194 hits
2025-08-16..2025-08-31 → 0 hits ← hard stop
Four accounts from the 2026 follow botnet cluster — canestein, hazexone, domcomit, kylehyne — are present in this earlier commit set. The August 16 cutoff is a directly observable fact. Its cause is not established by this data.
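The windowed counts above come from date-bounded searches. The sketch below builds such queries against `GET /search/commits` and totals the reported hits. The exact query used in the investigation is not stated, so the quoted search string and the `author-date` qualifier are assumptions; no request is sent.

```python
from urllib.parse import urlencode

# Per-window hit counts as reported in the text above.
WINDOWS = [
    ("2025-07-08", "2025-07-14", 1738),
    ("2025-07-15", "2025-07-21", 701),
    ("2025-07-22", "2025-07-31", 949),
    ("2025-08-01", "2025-08-15", 7194),
    ("2025-08-16", "2025-08-31", 0),
]


def search_url(text: str, start: str, end: str) -> str:
    """Build a commit-search URL restricted to an author-date range."""
    # Assumption: a quoted string plus an author-date range qualifier.
    query = f'"{text}" author-date:{start}..{end}'
    params = urlencode({"q": query, "per_page": 1})
    return "https://api.github.com/search/commits?" + params


urls = [search_url("66867581+hajigur69", start, end) for start, end, _ in WINDOWS]
total_hits = sum(hits for _, _, hits in WINDOWS)

print(urls[0])
print(total_hits)
```

In a live run the hit count for each window would be read from the `total_count` field of the search response.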
Lyne6666
Lyne6666: created May 3, 2025. 163 public repositories, all with a GitHub API creation timestamp of July 9, 2025, 18:55 UTC.
Observation
LICENSE file SHA across all 163 repositories:
8aa26455d23acf904be3ed9dfb3a3efe3e49245a
Git hashes content. Identical SHA across 163 repositories = identical bytes in every file = single source file, copied without modification.
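The reasoning rests on how Git derives a blob ID: SHA-1 over a short header plus the file bytes. A minimal sketch of that computation, assuming a SHA-1 object format, which is what GitHub repositories use:

```python
import hashlib


def git_blob_sha1(content: bytes) -> str:
    """Compute the Git blob ID for raw file content.

    Git hashes the header "blob <size>\\0" followed by the content bytes,
    so identical bytes always yield the identical ID.
    """
    header = f"blob {len(content)}\0".encode("ascii")
    return hashlib.sha1(header + content).hexdigest()


same_a = git_blob_sha1(b"MIT License\n")
same_b = git_blob_sha1(b"MIT License\n")
different = git_blob_sha1(b"MIT License \n")  # one extra space

print(same_a == same_b, same_a == different)
```

A single changed byte, including whitespace or a year in a copyright line, produces a different ID, which is why 163 matching values point to an unmodified copy.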
Repository names follow {Tech}{Testnet}{Function}{Suffix}. Every README install section:
pip install git+https://github.com/Lyne6666/{RepoName}.git
Present across all 163 repositories. No postinstall hook content was confirmed in examined repositories.
uhsr
The Lyne6666 commit author email field: uhsr@eteb.me — a private domain, WHOIS-shielded via Identity Digital. Account uhsr created July 10, 2025 — one day after Lyne6666's mass repository creation timestamp.
At time of retrieval: 237 public repositories, 2,972 followers, 30,778 following.
Observation: Commit Volume
July 2025: 1,382 commits (71% of all-time total at retrieval)
August 2025: 247 commits
September 2025: 21 commits
October 2025: 96 commits
Interpretation
A 71% concentration of all-time commit activity within a single calendar month is statistically atypical for accounts with multi-month histories. It is consistent with a scripted bulk operation rather than incremental development. That is an interpretation; the commit counts are directly retrieved from the API.
The Backdated Commit History
uhsr/AssetMarket contains a .Logs file with ~365 entries spanning January 1–December 31, 2025, format: Logs: YYYY-MM-DD <8charToken>.
Observation
Repository creation date:
curl -s "https://api.github.com/repos/uhsr/AssetMarket" | python3 -c \
"import json,sys; r=json.load(sys.stdin); print(r['created_at'])"
# 2025-08-02T16:29:22Z
Root commit:
SHA: 4f8f47697eb89c8818820ca92348be01c4544878
Message: Logs on 2025-01-01
Author date: 2025-01-01T14:47:47Z
Committer date: 2025-01-01T14:47:47Z
Author email: uhsr@eteb.me
The repository did not exist until August 2, 2025. The root commit carries an author date of January 1, 2025 — 213 days earlier.
Derived Structure
Git stores an author date and a committer date separately. Both can be set before a commit is created, via the GIT_AUTHOR_DATE and GIT_COMMITTER_DATE environment variables. Naive backdating sets only the author date, leaving the committer date at the real commit time, which produces a detectable mismatch. In this root commit, both fields are set identically. The mismatch that typically exposes backdating is absent.
Interpretation
The presence of a 213-day pre-creation commit, with both date fields aligned to eliminate the typical detection artifact, is consistent with deliberate fabrication of commit history. The .Logs content — uniform daily entries with 8-character tokens across a full calendar year — is consistent with bulk generation rather than organic accumulation. Both are interpretations. The timestamp mismatch between repo creation and root commit author date is a directly observable, verifiable fact.
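The 213-day figure and the two checks described above can be recomputed directly from the timestamps quoted in this section. This is a minimal sketch; the variable names are illustrative.

```python
from datetime import datetime, timezone


def parse_github_time(value: str) -> datetime:
    """Parse the ISO 8601 UTC timestamps the GitHub API returns."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


# Values quoted above for uhsr/AssetMarket and its root commit.
repo_created = parse_github_time("2025-08-02T16:29:22Z")
author_date = parse_github_time("2025-01-01T14:47:47Z")
committer_date = parse_github_time("2025-01-01T14:47:47Z")

gap = repo_created - author_date
predates_repo = author_date < repo_created      # commit claims to be older than the repo
dates_match = author_date == committer_date     # no author/committer mismatch

print(gap.days, predates_repo, dates_match)
```

A commit that predates its repository is not proof of fabrication on its own, since history can be imported from elsewhere; the check only flags candidates for the kind of inspection described above.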
The Generator Artifact in the 2025 Repositories
Raw README of uhsr/AssetMarket:
<!-- fallback_AssetMarket_20250802163009_95172 -->
Same embedded string format as the 2026 cluster: a fallback_ prefix, repo name, timestamp, and trailing numeric ID. The fallback_ label is treated as an embedded string whose origin is unknown — it may derive from a template engine, a CI scaffold, a repository bootstrap tool, or a custom generation script. The label alone does not establish what tool produced it or what the label means in that tool's context.
What is directly observable: the format fallback_{name}_{timestamp}_{id} appears in both the 2025 uhsr repositories and across all 552 repositories in the 2026 cluster. That structural match is the finding; the label's semantic meaning within any particular system is not established by this analysis.
Two additional repositories:
uhsr/SmartContract → <!-- fallback_SmartContract_20250802162757_83653 -->
uhsr/TokenLab → <!-- fallback_TokenLab_20250802161931_80263 -->
Three artifacts, 11-minute window (10 minutes 38 seconds from first to last):
16:19:31 TokenLab ID: 80263
16:27:57 SmartContract ID: 83653
16:30:09 AssetMarket ID: 95172
Observation
The trailing IDs increase non-uniformly, with gaps of 3,390 and 11,519. If the IDs are process IDs, the pattern fits a Linux system, where PIDs are assigned sequentially and irregular gaps reflect other processes consuming assignments between runs. This is an interpretation of the pattern, not a definitive conclusion about the execution environment.
The fallback_ prefix and fallback_{name}_{timestamp}_{id} format are identical across both the 2025 and 2026 operations. That is a directly observable structural match.
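The ID gaps and the elapsed time between the three 2025 artifacts can be recomputed from the strings themselves. This sketch splits on underscores from the right, which assumes the last two fields are always the timestamp and the numeric ID.

```python
from datetime import datetime

# Artifact strings quoted above, without the HTML comment delimiters.
artifacts = [
    "fallback_TokenLab_20250802161931_80263",
    "fallback_SmartContract_20250802162757_83653",
    "fallback_AssetMarket_20250802163009_95172",
]


def split_artifact(value: str):
    """Return (name, timestamp, trailing_id) from one artifact string."""
    prefix_and_name, ts, trailing = value.rsplit("_", 2)
    name = prefix_and_name.split("_", 1)[1]  # drop the "fallback" prefix
    return name, datetime.strptime(ts, "%Y%m%d%H%M%S"), int(trailing)


parsed = sorted((split_artifact(a) for a in artifacts), key=lambda item: item[1])

# Difference between consecutive trailing IDs, in timestamp order.
id_gaps = [later[2] - earlier[2] for earlier, later in zip(parsed, parsed[1:])]
span = parsed[-1][1] - parsed[0][1]

print(id_gaps)
print(span)
```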
The Stargazer Overlap
AssetMarket (83 stars), SmartContract (50), DigitalWallet (49) at time of analysis.
Observation
AssetMarket ∩ DigitalWallet ∩ SmartContract = 33 accounts
33 accounts starred all three repositories, 67% of DigitalWallet's total star count from a single overlapping pool. A model in which starring is distributed independently across repositories, with no shared promotion mechanism, does not readily produce 33 accounts converging on all three repositories, given repositories with zero followers, zero forks, and no search visibility.
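The overlap itself is a three-way set intersection over the stargazer lists. The sketch below uses made-up logins to show the operation, then recomputes the 67% share from the counts quoted above; the helper name is illustrative.

```python
def shared_stargazers(*stargazer_lists):
    """Return the set of logins present in every stargazer list."""
    sets = [set(logins) for logins in stargazer_lists]
    return set.intersection(*sets)


# Made-up logins for illustration only.
repo_a = ["u1", "u2", "u3", "u4"]
repo_b = ["u2", "u3", "u4", "u5"]
repo_c = ["u3", "u4", "u6"]

common = shared_stargazers(repo_a, repo_b, repo_c)
print(sorted(common))

# Share of the smallest repository's stars held by the common pool,
# using the counts reported in the text: 33 shared, 49 stars on DigitalWallet.
share_of_smallest = 33 / 49
print(f"{share_of_smallest:.0%}")
```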
The July 11, 2025 batch — 83 repositories, single day:
★2: 64 repositories (77%)
★1: 19 repositories (23%)
★0: 0 repositories
Zero repositories with zero stars. Every repository in the batch has exactly one or two stars, and none has more.
Two accounts from the 33-account pool, SAPH1TE and ahnshy, also appear in stargazer lists for Lyne6666 repositories. The uhsr and Lyne6666 clusters share no other observable social graph overlap. These two accounts are the only cross-cluster link found in this data.
Interpretation
What produced the 33-account overlap and the uniform star distribution is not established by this data. The overlap pattern and cross-cluster appearance of two accounts are documented as observations. A coordinated engagement mechanism is one explanation consistent with the data; it is not the only possible explanation.
mohammadtzs
One fork of DigitalWallet exists, made by mohammadtzs. Account created March 2025, 506 public repositories, 100 forks — all from accounts returning 404 at time of retrieval. Fork names included alork1, alork2, alork3, alorki1. mohammadtzs is also present in the 33-account stargazer pool.
Observable: forked a cluster repository, present in the shared stargazer pool, prior forks exclusively from accounts no longer present on the platform.
October 2025: Repository Names
uhsr commit activity: 21 commits in September, 96 in October — concentrated in a 15-minute window, October 20 between 05:04 and 05:19 UTC, across 7 repositories:
awesomepythonTech → name matches vinta/awesome-python (290k+ stars)
freeprogrammingbooksHub → EbookFoundation/free-programming-books (340k+)
publicapisAI → public-apis/public-apis (320k+)
codinginterviewuniversityTools → jwasham/coding-interview-university (310k+)
developerroadmapLab → kamranahmedse/developer-roadmap (300k+)
systemdesignprimerCloud → donnemartin/system-design-primer (280k+)
buildyourownxTools → codecrafters-io/build-your-own-x (330k+)
None contain implementation content. developerroadmapLab description: "enterprise enterprise-grade" — a duplicated token consistent with an unresolved template variable.
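The name match can be checked mechanically: strip punctuation and case from both names and test whether the uhsr repository name starts with the popular one. This is a minimal sketch; the `normalise` and `borrows_name` helpers are illustrative, and the pairs are the seven listed above.

```python
import re


def normalise(name: str) -> str:
    """Lowercase a repository name and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def borrows_name(candidate: str, popular_repo: str) -> bool:
    """True if the candidate name begins with the popular repository's name."""
    return normalise(candidate).startswith(normalise(popular_repo))


# (uhsr repository, name of the widely-starred repository)
PAIRS = [
    ("awesomepythonTech", "awesome-python"),
    ("freeprogrammingbooksHub", "free-programming-books"),
    ("publicapisAI", "public-apis"),
    ("codinginterviewuniversityTools", "coding-interview-university"),
    ("developerroadmapLab", "developer-roadmap"),
    ("systemdesignprimerCloud", "system-design-primer"),
    ("buildyourownxTools", "build-your-own-x"),
]

matches = [borrows_name(candidate, popular) for candidate, popular in PAIRS]
print(matches)
```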
Alternative Explanations
Could the hajigur69 co-author identity appear in unrelated operations by coincidence? The GitHub noreply format embeds an immutable, account-specific numeric ID. The same ID (66867581) appearing across thousands of commits in a 2025 cluster and across all 552 repositories in the 2026 cluster is not something chance produces, because the string is specific to one account. Either the identity was reused deliberately, or the same credentials were used in both operations.
Could the fallback_ artifact format be from a widely distributed open-source tool? Possible. If the prefix is a convention from a publicly available README generation tool, its presence in both operations indicates both used the same tool — not necessarily the same operator. No such tool was identified in this research. The artifact format is not established as unique to a single actor.
Could the template substitution error (the mariwatts LICENSE URL) appear independently across unrelated generation pipelines? Less likely than the fallback_ case. A shared template variable left unsubstituted in the same field across 552 repositories from 9 accounts is more parsimoniously explained by a single template source than by independent pipelines converging on the same substitution gap. However, if a widely-used generation tool ships with mariwatts hardcoded as a default account in its LICENSE URL template, the error would appear across any pipeline using that tool without modification. That scenario is not established as absent.
Could the four accounts in both operations be coincidentally shared? Four accounts from the 2026 cluster — canestein, hazexone, domcomit, kylehyne — appear in the 2025 commit-farming activity. Their simultaneous presence in two temporally separated operations is statistically implausible under independence assumptions given the scale of GitHub's account population. Whether that overlap reflects shared control is an inference; the factual overlap is directly retrievable.
I am not establishing that a single individual or organization controls both operations. I am documenting that the same authenticated GitHub identity, the same generator artifact format, and four of the same accounts are present in both.
Summary of Confirmed Findings
| Finding | Method |
|---|---|
| Cluster minimum Jaccard 0.9898 vs null baseline 0.000149 (6,642×) | Analytical null model, GitHub API |
| Second-order cosine similarity >0.9999 (precision saturation) across all 36 pairs | Cosine of per-account Jaccard vectors |
| hazexone, lynewinter structurally peripheral (mean ≤ 0.9912) | Within-cluster mean Jaccard |
| lynewinter cluster membership supported by 1 confirmed pair; 7 unverified | Direct pairwise retrieval |
| All 552 repos created May 12, 2026 in a 34-minute window | Embedded HTML comment timestamps |
| <!-- fallback_NAME_TIMESTAMP_ID --> in every README | Direct raw file fetch, all 9 accounts |
| mariwatts hardcoded in LICENSE URLs across foreign accounts | Direct raw file fetch |
| 66867581+hajigur69 co-author on all cluster commits | Raw commit data, GitHub API |
| 66867581+hajigur69 author on hajigur69's own repository | Raw commit data, GitHub API |
| v@users.noreply.github.com lacks numeric ID prefix | Raw commit data |
| locis@carox.tech, doar@carox.tech in commit author fields | Raw commit data |
| carox.tech: no A record, Cloudflare MX, created July 2025 | WHOIS, DNS |
| All 552 repos: zero stars, forks, CI, issues, PRs | GitHub API |
| Same fallback_ format in 2025 uhsr repositories | Direct raw file fetch |
| uhsr/AssetMarket root commit 213 days before repo creation | GitHub API commit + repo endpoints |
| Root commit 4f8f47697eb89c8818820ca92348be01c4544878 has both date fields set to 2025-01-01 | Raw commit data |
| fallback_ artifacts in 3 uhsr README files within an 11-minute window, trailing IDs increasing | Direct raw file fetch |
| 33-account pool in all three high-value stargazer lists (67% overlap) | Stargazer API cross-reference |
| SAPH1TE, ahnshy in both uhsr and Lyne6666 stargazer lists | Stargazer API cross-cluster |
| canestein, hazexone, domcomit, kylehyne in both 2025 and 2026 operations | Commit search API, Part 1 data |
| October 2025 repos named after widely-starred repositories | GitHub API, name comparison |
Constraints on the Null Hypothesis
These results constrain the likelihood of independent behavior under standard sampling assumptions.
The first-order Jaccard values (0.9898–0.9998) are at least 6,642× the analytically expected baseline for independent accounts following 29,800 users from a pool of 100 million. The second-order structure, cosine similarity of Jaccard vectors at precision saturation (>0.9999) across all 36 account pairs, is consistent with accounts whose similarity profiles derive from a near-identical source. Higher-precision computation may reveal internal structure not visible at four decimal places; the current data does not resolve it.
Any competing explanation must jointly account for the following linkage classes:
- Commit identity: the same authenticated GitHub ID (66867581+hajigur69) present across all 552 cluster repositories and in that identity's own repository
- Generator artifact: the fallback_{name}_{timestamp}_{id} format present across both the 2025 and 2026 operations
- Cross-operation account overlap: four accounts (canestein, hazexone, domcomit, kylehyne) present in both operations
Disclosure
This report has been submitted in full to GitHub Trust & Safety with API-verifiable evidence including the root backdated commit SHA (4f8f47697eb89c8818820ca92348be01c4544878), the generator artifact URLs, the 33-account stargazer overlap, and the complete account list.
All data was retrieved via the GitHub REST API v3 with authenticated requests. No accounts were accessed beyond their public API surface. No systems were compromised.
All account names published here are publicly visible GitHub profiles. This methodology is only verifiable if the data is reproducible.
If you have seen the hajigur69 co-author string or the fallback_ artifact pattern in your own repositories' commit histories — that is the fingerprint documented here. Worth reporting.
All tooling used in this investigation is in BANANA_TREE.
