If your platform ranks accounts by how much they post, you have built a Sybil farm with extra steps.
That sounds like a hot take. It is also a measured result. We just ran a discrimination test on synthetic populations of 50 genuine identities and 50 Sybils, scoring each pubkey four different ways and computing the rank-based AUC for each scoring regime. The result is a clean inversion: the metric every social platform uses to surface "active" accounts is the worst of the four separators between humans and bot farms in the test.
This post walks through the numbers, explains the inversion, and shows what scoring regime survives.
The four regimes we tested
Each pubkey in the synthetic populations gets scored four ways:
- Multi-dim depth: sum across four orthogonal dimensions (social engagement, spatial activity, NIP-13 PoW work, inbound vouches), with a "no single dim dominates" structural constraint.
- Social only: just the social dimension. Bidirectional engagement with deep peers, replies, mentions.
- Follower count: distinct accounts the user has p-tagged. The closest Nostr equivalent of "followers."
- Post volume: raw event count.
Genuine identities were drawn from four archetypes (active social user, builder with heavy PoW + modest social, lurker with strong vouch network, balanced moderate user). Sybils were drawn from five grinder strategies (volume grinder, follower gamer, PoW farm, reaction bot, spatial spammer). Everything is synthetic; once a population is sampled, scoring is deterministic and reproducible against the open-source @powforge/identity scoring formula.
We measured AUC (the probability that a randomly chosen genuine identity ranks above a randomly chosen Sybil under that regime) and FPR at TPR 90% (set the threshold that admits 90% of genuine identities, then measure what fraction of Sybils also sneak through).
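Both metrics are simple to compute from per-pubkey scores. Below is a minimal TypeScript sketch, assuming higher scores mean "more likely genuine" and that a threshold admits every pubkey scoring at or above it. `rankAuc` and `fprAtTpr` are hypothetical helper names, not exports of @powforge/identity, and the tie handling is an assumption rather than the post's documented convention.

```typescript
// Rank-based AUC: the fraction of (genuine, Sybil) pairs in which the genuine
// pubkey scores higher. Ties count as half a win (the Mann-Whitney convention).
// O(n * m), which is fine for populations of 50 + 50.
function rankAuc(genuine: number[], sybil: number[]): number {
  let wins = 0;
  for (const g of genuine) {
    for (const s of sybil) {
      if (g > s) wins += 1;
      else if (g === s) wins += 0.5;
    }
  }
  return wins / (genuine.length * sybil.length);
}

// FPR at a target TPR: choose the highest threshold that still admits at least
// `targetTpr` of the genuine pubkeys (admit = score >= threshold), then report
// the fraction of Sybils that clear the same threshold.
function fprAtTpr(genuine: number[], sybil: number[], targetTpr = 0.9): number {
  const sorted = genuine.slice().sort((a, b) => b - a); // descending
  // Number of genuine pubkeys to admit; the epsilon guards against float error.
  const k = Math.max(1, Math.ceil(targetTpr * sorted.length - 1e-9));
  const threshold = sorted[k - 1];
  const admitted = sybil.filter((s) => s >= threshold).length;
  return admitted / sybil.length;
}
```

For instance, `rankAuc([10, 20, 30, 40], [5, 15, 25, 35])` is 0.625: the genuine scores win 10 of the 16 pairs.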
The killer stat
| Score regime | AUC | FPR at TPR 90% | Verdict |
|---|---|---|---|
| Multi-dim (4 dims) | 1.000 | 0% | perfect rank separation |
| Single-dim: follower count | 0.645 | 40% | weak, barely above chance |
| Single-dim: social only | 0.531 | 60% | no signal |
| Single-dim: post volume | 0.095 | 98% | INVERTED, actively rewards Sybils |
Look at that bottom row.
AUC 0.095 means that, for a randomly chosen genuine/Sybil pair, the Sybil has the higher post count 90.5% of the time. Sort the population by post volume descending and the top of the list skews heavily toward Sybils. Volume is not a noisy signal of legitimacy. Volume is a noisy signal of illegitimacy, and the noise is small.
If you set the volume threshold low enough to admit 90% of genuine identities, it also admits 98% of the Sybils. That is barely better than no filter at all.
Both numbers reproduced across two independent stochastic runs, so the inversion is not an artifact of one lucky draw.
Why post volume inverts
The Sybil archetypes by design include a "volume grinder": bot accounts that hammer out 5,000 short notes with 2-8 distinct peers. They look like extremely active accounts. They are extremely active accounts.
The genuine archetypes include a lurker with strong vouch ties and a builder who spends most cycles writing code, not posts. Real humans post a lot less than spam bots, because real humans have other things to do.
The post-volume distributions:
| Post volume | Genuine | Sybil |
|---|---|---|
| Median | 131 | 606 (5x genuine median) |
| p90 | 254 | 3,670 |
The Sybil median is about 5x the genuine median, and the Sybil p90 is 14x the genuine p90 (28x the genuine median). Volume does separate the two populations; it just ranks them in the wrong order.
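The ratios above follow directly from the reported summary statistics. This short sketch recomputes them, and includes a linear-interpolation percentile for anyone rebuilding the table from raw post counts. The interpolation convention is an assumption, since the post does not say how its p90 was computed.

```typescript
// Linear-interpolation percentile (the same convention as NumPy's default).
// p is a fraction in [0, 1]; assumes a non-empty array, not necessarily sorted.
function percentile(values: number[], p: number): number {
  const sorted = values.slice().sort((a, b) => a - b);
  const rank = p * (sorted.length - 1);
  const lo = Math.floor(rank);
  const hi = Math.ceil(rank);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (rank - lo);
}

// Ratios quoted in the text, computed from the reported summary statistics.
const reported = { genuineMedian: 131, genuineP90: 254, sybilMedian: 606, sybilP90: 3670 };
const sybilMedianVsGenuineMedian = reported.sybilMedian / reported.genuineMedian; // ~4.6
const sybilP90VsGenuineP90 = reported.sybilP90 / reported.genuineP90;             // ~14.4
const sybilP90VsGenuineMedian = reported.sybilP90 / reported.genuineMedian;       // ~28.0
```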
Single-dim social only doesn't help either. Reaction bots and follower gamers can fake "social activity" cheaply: 800-3000 reactions on a recurring loop, 200-800 outbound replies to nobody who replies back. The social-only AUC of 0.531 is statistically indistinguishable from random.
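How close to random is 0.531 with 50 pubkeys per class? A rough check, assuming the textbook large-sample null variance of the Mann-Whitney statistic and no ties (this is a standard approximation, not the post's own analysis):

```typescript
// Standard error of the AUC under the null hypothesis that scores carry no
// information. Var(U) = n1 * n2 * (n1 + n2 + 1) / 12 (no ties), and
// AUC = U / (n1 * n2), so Var(AUC) = (n1 + n2 + 1) / (12 * n1 * n2).
function nullAucStdErr(nGenuine: number, nSybil: number): number {
  return Math.sqrt((nGenuine + nSybil + 1) / (12 * nGenuine * nSybil));
}

// How many null standard errors an observed AUC sits away from 0.5.
function zVsChance(auc: number, nGenuine: number, nSybil: number): number {
  return (auc - 0.5) / nullAucStdErr(nGenuine, nSybil);
}

const se = nullAucStdErr(50, 50);           // ~0.058
const zSocial = zVsChance(0.531, 50, 50);   // ~0.53
const zFollower = zVsChance(0.645, 50, 50); // ~2.5
const zVolume = zVsChance(0.095, 50, 50);   // ~-7.0
```

Under this approximation, social-only sits about half a standard error above chance, follower count about 2.5 standard errors above (a real but weak signal), and post volume about 7 standard errors below 0.5, in the inverted direction.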
Follower count fares slightly better at 0.645 because the more sophisticated grinder strategies stop short of 1000+ follower lists, but it is still the cheapest grind to fake. Sock-puppet rings cross-follow each other for free.
What survives
Multi-dim depth at AUC 1.000 means: on this test, with the v0.7.2 scoring formula, there is no overlap between the genuine and Sybil distributions. Every genuine identity outranks every Sybil. FPR at TPR 90% is 0%. The threshold that admits 90% of genuine identities admits 0 of 50 Sybils.
The structural reason: faking one dimension is cheap; faking four dimensions simultaneously is expensive.
- Volume grinder maxes raw post count but has no inbound vouches and no zaps.
- Follower gamer maxes follower count but has no NIP-13 PoW work invested.
- PoW farm maxes the PoW dimension but has no real social ties (and after the v0.7.2 log2-scaling fix, can no longer dominate the score through raw PoW alone).
- Reaction bot looks engaged but has no bidirectional peer relationships.
- Spatial spammer floods one coordinate region and shows no activity in any other dimension.
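The PoW-farm case is easiest to see with numbers. A minimal sketch, assuming a hypothetical profile whose other three dimensions are already on a small normalized scale while the PoW input is a raw work count. The values are invented for illustration and this is not the library's v0.7.2 formula; it only shows why a log2 transform stops one huge PoW input from swamping an additive score.

```typescript
// Hypothetical per-pubkey inputs. social/spatial/vouches are assumed to be
// pre-normalized small numbers; powWork is a raw, unbounded work count.
interface Inputs { social: number; spatial: number; powWork: number; vouches: number; }

// Additive score with PoW taken linearly: unbounded work dominates.
function linearScore(x: Inputs): number {
  return x.social + x.spatial + x.powWork + x.vouches;
}

// Same score with PoW compressed to log2(1 + work): doubling the work adds
// roughly one point instead of doubling the PoW contribution.
function log2Score(x: Inputs): number {
  const powTerm = Math.log(1 + x.powWork) / Math.LN2;
  return x.social + x.spatial + powTerm + x.vouches;
}

const powFarm: Inputs = { social: 0, spatial: 0, powWork: 4096, vouches: 0 };
const builder: Inputs = { social: 6, spatial: 2, powWork: 256, vouches: 3 };

console.log(linearScore(powFarm), linearScore(builder)); // 4096 267
console.log(log2Score(powFarm) < log2Score(builder));    // true (~12.0 vs ~19.0)
```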
Every grinder strategy spikes one dimension. Genuine identities spread across multiple dimensions because real humans accumulate signal organically across years.
The "no single dim dominates" constraint, implemented in the multi-dim aggregator rather than in any individual dimension, is what penalizes those spikes: a flat profile across four dimensions beats a sharp spike in one. That shape constraint, together with the v0.7.2 log2 PoW scaling, is what produces AUC 1.000 on this test.
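One way to express a "no single dim dominates" constraint is to cap each dimension's contribution relative to the others. The sketch below is a hypothetical aggregator, not the one in @powforge/identity: it caps each dimension at `ratio` times the mean of the remaining dimensions, so a spike with nothing behind it contributes almost nothing.

```typescript
// Hypothetical shape-constrained aggregator over per-dimension scores
// (e.g. [social, spatial, pow, vouches]). Each dimension contributes at most
// `ratio` times the mean of the other dimensions. Requires >= 2 dimensions.
function balancedScore(dims: number[], ratio = 2): number {
  if (dims.length < 2) throw new Error("need at least two dimensions");
  const total = dims.reduce((a, b) => a + b, 0);
  let score = 0;
  for (const d of dims) {
    const othersMean = (total - d) / (dims.length - 1);
    score += Math.min(d, ratio * othersMean);
  }
  return score;
}

// Three profiles with the same raw sum (12):
console.log(balancedScore([3, 3, 3, 3]));  // 12: flat profile keeps its full score
console.log(balancedScore([6, 3, 2, 1]));  // 10: the leading dimension is trimmed to 4
console.log(balancedScore([12, 0, 0, 0])); // 0: a lone spike is capped at 2 * 0
```

The cap keeps the score additive for balanced profiles; a multiplicative form such as a geometric mean would punish uneven profiles more harshly.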
What this means for spam detection
The takeaway is not "use multi-dim instead of post volume." The takeaway is harder than that.
Every social platform that surfaces "trending" or "active" accounts using engagement-count metrics is recommending Sybil farms to its users. HN sorts by upvote velocity; Reddit by score and comments; Twitter by reply count; Nostr clients by zap count. Any single-dim metric is grindable, and on a benchmark Sybil population the cheapest grinds (volume, reactions, cross-following) range from no signal to outright inversion against legitimacy.
The fix is not a better single-dim metric. There is no better single-dim metric. The fix is composition: make the score depend on multiple dimensions of irreversible work, derived from data the user does not control (peer reactions, real Lightning zaps from funded wallets, NIP-13 PoW bits committed in events, vouches from already-deep identities).
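The NIP-13 signal is cheap for any reader to verify from public data: NIP-13 defines an event's difficulty as the number of leading zero bits in its 32-byte event id. A small sketch of that count, taking the id as a hex string. How @powforge/identity aggregates difficulty into its PoW dimension is not shown here.

```typescript
// NIP-13 difficulty: count the leading zero bits of an event id given as hex.
function nip13Difficulty(eventIdHex: string): number {
  let bits = 0;
  for (let i = 0; i < eventIdHex.length; i++) {
    const ch = eventIdHex.charAt(i);
    if (!/^[0-9a-fA-F]$/.test(ch)) throw new Error("invalid hex character: " + ch);
    const nibble = parseInt(ch, 16);
    if (nibble === 0) {
      bits += 4; // a zero hex digit is four zero bits; keep scanning
      continue;
    }
    // First non-zero nibble: count its leading zero bits (0 to 3), then stop.
    for (let mask = 8; (nibble & mask) === 0; mask >>= 1) bits++;
    break;
  }
  return bits;
}

// Nine zero hex digits (36 bits), then 'e' (binary 1110) adds none.
console.log(nip13Difficulty("000000000e9d97a1ab09fc381030b346cdd7a142ad57e6df0b46dc9bef6c7e2d")); // 36
```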
We packaged this scoring regime as @powforge/identity (npm). It is open source, deterministic, and computable from any caller's own read of public Nostr history. No allowlist, no KYC, no central scoring server. Pull it into your ranker, your news feed, or your governance vote tally, and the Sybil farms get heavily discounted.
Reproduce it yourself
The scoring formula lives in the public @powforge/identity npm package. The synthetic-population generator is a few hundred lines: sample the four genuine archetypes and the five grinder strategies, run every pubkey through the same scoring engine, sort by score, and compute rank-based AUC. It is deterministic apart from the sampling RNG, has no database dependency, and runs in under 2 seconds on a laptop.
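For readers rebuilding the benchmark, here is a compact sketch of that pipeline: seed a PRNG, sample genuine and Sybil profiles, score them, compute rank-based AUC. Everything in it is an assumption: the Park-Miller PRNG, the two toy samplers (stand-ins for the four archetypes and five strategies, deliberately well separated), and the capped aggregator, which is not the @powforge/identity formula. It will not reproduce the post's numbers; it only shows the shape of the harness.

```typescript
// Park-Miller "minimal standard" LCG: tiny and seedable, so a given seed
// always yields the same synthetic population. Not for cryptographic use.
function parkMiller(seed: number): () => number {
  let state = Math.floor(Math.abs(seed)) % 2147483647;
  if (state === 0) state = 1;
  return () => {
    state = (state * 16807) % 2147483647; // exact: the product stays below 2^53
    return (state - 1) / 2147483646;      // uniform in [0, 1)
  };
}

// Hypothetical profile: four dimension scores plus raw post count.
interface Profile { dims: number[]; posts: number; }

// Toy genuine sampler: moderate activity spread across all four dimensions.
function sampleGenuine(rand: () => number): Profile {
  const dims = [0, 1, 2, 3].map(() => 2 + 4 * rand());
  return { dims, posts: Math.round(50 + 200 * rand()) };
}

// Toy Sybil sampler: one dimension spiked, the rest near zero, many posts.
function sampleSybil(rand: () => number): Profile {
  const spiked = Math.floor(4 * rand());
  const dims = [0, 1, 2, 3].map((i) => (i === spiked ? 10 + 10 * rand() : 0.5 * rand()));
  return { dims, posts: Math.round(300 + 4000 * rand()) };
}

// Shape-constrained aggregate: each dimension capped at 2x the mean of the others.
function cappedScore(dims: number[]): number {
  const total = dims.reduce((a, b) => a + b, 0);
  let score = 0;
  for (const d of dims) score += Math.min(d, (2 * (total - d)) / (dims.length - 1));
  return score;
}

// Pairwise AUC, ties counted as half.
function auc(genuine: number[], sybil: number[]): number {
  let wins = 0;
  for (const g of genuine) for (const s of sybil) wins += g > s ? 1 : g === s ? 0.5 : 0;
  return wins / (genuine.length * sybil.length);
}

function runBenchmark(seed: number, n = 50): { multiDim: number; postVolume: number } {
  const rand = parkMiller(seed);
  const genuine: Profile[] = [];
  const sybil: Profile[] = [];
  for (let i = 0; i < n; i++) genuine.push(sampleGenuine(rand));
  for (let i = 0; i < n; i++) sybil.push(sampleSybil(rand));
  return {
    multiDim: auc(genuine.map((p) => cappedScore(p.dims)), sybil.map((p) => cappedScore(p.dims))),
    postVolume: auc(genuine.map((p) => p.posts), sybil.map((p) => p.posts)),
  };
}

console.log(runBenchmark(42)); // { multiDim: 1, postVolume: 0 } with these toy samplers
```

Because the toy samplers never overlap, this yields AUC 1 and 0 rather than the post's 1.000 and 0.095; swapping in realistic archetype distributions and the real scoring engine is where the actual measurement happens.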
If you want the exact archetype distributions, the stability re-run, and the failure mode that capped earlier versions of this test at AUC 0.800 (a PoW-farm hijack closed by the v0.7.2 log2 scaling), reach out and I'll share the full results dump.
Caveat
AUC 1.000 on a synthetic test is not a claim that real-world adversaries with adaptive strategies are perfectly distinguishable. It says: against the five canonical grinder strategies modeled here, multi-dim with v0.7.2 scoring leaves no overlap. Adversaries will design new strategies that probe the score surface; the next test (real-relay validation against hand-curated Nostr identities) is the harder one.
Cite the number with the synthetic-population qualifier. It is honest evidence that single-dim metrics fail at the population level and that a structural multi-dim regime can close the gap on the strategies we know how to model.
But the inversion on post volume, AUC 0.095, is the headline. If you take one thing away from this post, take that. The metric every platform uses to surface activity is the metric that most reliably surfaces bot farms. Stop using it.
Open source library: npm install @powforge/identity (powforge.dev/explorer)
Whitepaper context: powforge.dev/whitepaper