Of 160 addressable Fediverse instances, only 28 made it through as brand safe.
The initial harvest pool stored only registration flags (open, approval, captcha) and never filtered for brand safety. A naive provision fedi run would therefore have hit harassment sites, extremist propaganda, non-English regional servers, and niche adult communities. A dry run confirmed that it would.
The first line of defense lives in tools/vet_candidates.py. For each addressable candidate we pull live instance metadata: declared languages, description, and reachability. The gate then applies three criteria.
First, a curated denylist of toxic domains. Second, an English-language requirement: an instance passes if it declares English, or if its language mix shows a low ratio of non-Latin script. The script ratio is what catches Japanese, Chinese, and Russian instances on neutral TLDs. Third, an exclusion for niche adult keywords and TLDs. The gate fails closed: any uncertainty, such as an unreachable host, opaque metadata, or an unknown TLD, causes an immediate fail. The result is written back into fedi_candidates.json as the brand_safe, lang, and vet_reason fields.
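The three criteria and the fail-closed behavior can be sketched roughly as follows. This is not the contents of tools/vet_candidates.py. The list contents, the `Metadata` shape, the reason strings, the 0.1 threshold, and the choice to measure the script ratio on the description text are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Illustrative placeholders only, not the real curated lists.
DENYLIST = {"toxic.social"}
ADULT_KEYWORDS = {"nsfw", "kink"}
ADULT_TLDS = {"xxx", "porn"}
KNOWN_TLDS = {"social", "org", "net", "com", "online"}
MAX_NON_LATIN_RATIO = 0.1  # assumed threshold


@dataclass
class Metadata:
    reachable: bool
    languages: Optional[list]    # declared languages, None if opaque
    description: Optional[str]   # None if opaque


def non_latin_ratio(text: str) -> float:
    """Share of letters outside the Latin blocks (code points >= U+0250)."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 1.0  # nothing to judge, so treat it as uncertain
    return sum(ord(ch) >= 0x250 for ch in letters) / len(letters)


def vet(domain: str, meta: Optional[Metadata]) -> Tuple[bool, str]:
    """Return (brand_safe, vet_reason). Any uncertainty fails closed."""
    domain = domain.lower()
    tld = domain.rsplit(".", 1)[-1]
    if domain in DENYLIST:
        return False, "denylist"
    if meta is None or not meta.reachable:
        return False, "unreachable"
    if meta.languages is None or meta.description is None:
        return False, "opaque_metadata"
    if tld not in KNOWN_TLDS | ADULT_TLDS:
        return False, "unknown_tld"
    desc = meta.description.lower()
    if tld in ADULT_TLDS or any(k in desc for k in ADULT_KEYWORDS):
        return False, "adult"
    declares_en = "en" in [lang.lower() for lang in meta.languages]
    if not declares_en and non_latin_ratio(meta.description) > MAX_NON_LATIN_RATIO:
        return False, "non_english"
    return True, "ok"
```

Note that a Latin-script description with no declared language passes this sketch, which is the same blind spot the simple rule has for Italian, Polish, and other Latin-script languages.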
Automation alone left 43 survivors. tools/vet_overrides.py encodes the Opus review of those survivors. The review demoted 15 instances that the rule set had passed but could not really judge: Latin-script Italian, Polish, German, Portuguese, French, and Danish servers, which the simple rule cannot tell apart from English, plus a yaoi art server and a hypnosis kink server. The script aborts if a survivor appears that has not been reviewed, forcing a manual decision.
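The abort-on-unreviewed behavior might look something like the sketch below. The `domain` key, the `REVIEWED` mapping, the example domains, and the `manual_review_demoted` reason string are hypothetical. Only the brand_safe and vet_reason field names come from the pipeline described here.

```python
# Hypothetical review decisions keyed by domain: True keeps, False demotes.
REVIEWED = {
    "keep.example": True,
    "regional.example": False,
}


def apply_overrides(candidates: list, reviewed: dict) -> list:
    """Demote reviewed-and-rejected survivors; abort on any unreviewed one."""
    survivors = [c for c in candidates if c.get("brand_safe") is True]
    unreviewed = sorted(c["domain"] for c in survivors if c["domain"] not in reviewed)
    if unreviewed:
        # Refuse to continue: a human has to decide on these first.
        raise SystemExit(f"unreviewed survivors, decide manually: {unreviewed}")
    for c in survivors:
        if not reviewed[c["domain"]]:
            c["brand_safe"] = False
            c["vet_reason"] = "manual_review_demoted"
    return candidates
```

Aborting rather than defaulting to "keep" means a newly harvested instance cannot slip through just because nobody has looked at it yet.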
After enrichment the pool shrank from 160 addressable records to 28 brand-safe instances: 21 Mastodon, 6 Mbin, and 1 Misskey. The coordinator now selects a target only if brand_safe is true, so the provision fedi step can never pick an unvetted instance. Two new tests guard this contract.
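A minimal sketch of that contract and the kind of tests that could guard it, assuming candidates are plain dicts loaded from fedi_candidates.json. The function and test names are invented for illustration and are not the two tests mentioned above.

```python
def select_targets(candidates: list) -> list:
    """Keep only candidates explicitly marked brand_safe: true.

    The strict identity check means a missing field, None, or a truthy
    string such as "yes" is treated as unvetted.
    """
    return [c for c in candidates if c.get("brand_safe") is True]


def test_unvetted_instances_are_never_selected():
    pool = [
        {"domain": "a.example", "brand_safe": True},
        {"domain": "b.example", "brand_safe": False},
        {"domain": "c.example"},                       # never vetted
        {"domain": "d.example", "brand_safe": "yes"},  # malformed value
    ]
    assert [c["domain"] for c in select_targets(pool)] == ["a.example"]


def test_empty_pool_selects_nothing():
    assert select_targets([]) == []
```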
We also had to remediate the live state. Two instances, organica social (Brazilian Portuguese) and expressional social (Danish regional), had been enabled before the gate existed. Both fail the new criteria, so we disabled them but kept their tokens, which makes the change reversible.
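One way to make a disable reversible is to flip a flag and leave the credentials alone. The registry layout, the `enabled`, `token`, and `disabled_reason` field names, and the example domain are assumptions for this sketch, not the real on-disk format.

```python
def disable_instance(registry: dict, domain: str, reason: str) -> None:
    """Turn an instance off without discarding its token."""
    entry = registry[domain]
    entry["enabled"] = False
    entry["disabled_reason"] = reason
    # entry["token"] is deliberately left untouched.


def rollback(registry: dict, domain: str) -> None:
    """Re-enable a previously disabled instance using its retained token."""
    entry = registry[domain]
    if not entry.get("token"):
        raise ValueError(f"{domain} has no retained token to roll back with")
    entry["enabled"] = True
    entry.pop("disabled_reason", None)
```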
The test registry was also stale: it reported nine enabled instances, while the on-disk state listed a different set. We reconciled the registry to match the true state.
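Drift of this kind is easy to surface with a set difference. The sketch below assumes both sides can be reduced to sets of enabled domains. The function name and the result keys are made up for illustration.

```python
def registry_drift(registry_enabled: set, disk_enabled: set) -> dict:
    """Compare what the registry claims is enabled with the on-disk state."""
    return {
        # Listed as enabled in the registry but not enabled on disk.
        "stale_in_registry": sorted(registry_enabled - disk_enabled),
        # Enabled on disk but unknown to the registry.
        "missing_from_registry": sorted(disk_enabled - registry_enabled),
    }
```

Both lists being empty is the condition a reconciliation check would assert on.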
The tradeoffs are clear. The language rule throws out many legitimate non-English servers, and the manual review then removes the Latin-script ones the rule lets through. Manual overrides add operational overhead and room for human error. The denylist must be kept up to date, or we risk both false positives and false negatives. Latency increased because each candidate requires a live metadata fetch.
If I could redo this, I would replace the binary language rule with a probabilistic model that scores English likelihood from content snippets. I would also maintain a separate allowlist for known high-value regional servers, letting them pass the gate after a lightweight review. Finally, I would stage the gate rollout and monitor false-positive rates before hard-blocking any instance.
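As a very rough illustration of scoring instead of a yes/no rule, the sketch below rates a snippet by the share of its tokens that are common English function words. This is a crude stand-in for a real probabilistic language model, and the word list and function name are assumptions. Unlike a script-ratio check, though, it does separate English from other Latin-script languages.

```python
import re

# Small, hand-picked list of English function words (illustrative only).
ENGLISH_FUNCTION_WORDS = frozenset({
    "the", "a", "an", "is", "are", "of", "to", "in", "for",
    "and", "this", "that", "with", "who", "we", "you", "on", "it",
})


def english_score(snippet: str) -> float:
    """Fraction of tokens that are common English function words (0.0 to 1.0)."""
    tokens = re.findall(r"[a-z']+", snippet.lower())
    if not tokens:
        return 0.0  # no evidence at all, so no English signal
    return sum(t in ENGLISH_FUNCTION_WORDS for t in tokens) / len(tokens)
```

A staged rollout could log this score next to the current brand_safe decision for a while, and choose a threshold only after looking at the disagreements.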