I built HardGuard25 because I got fed up at YouTube in 2014, and it took me 12 years to finally ship this properly

#showdev #opensource #a11y #python

Back in 2014 I was working at YouTube and kept running into the same maddening problem: IDs with characters that looked the same. I was focused on standing up YouTube's first external-facing LMS, for YouTube Certified Online, and on the unique identifiers for its training courses, but I noticed how everyone around me was constantly messing up the YouTube video IDs themselves. Misreading an O as a 0, squinting at I and l and 1, and doing double takes on B and 8. For dyslexic readers, it was hard to mind the p's and q's, let alone the b's and d's. Support folks dictating codes over the phone, swapping an S for a 5. Printed labels that were genuinely ambiguous depending on the font.

Now I'm not dyslexic. But my brother is, and I have a dear friend who is legally blind, and family members with limitations who I help navigate technology every day. When I was working out which characters to cut, I wasn't optimizing for a theoretical average user, I was thinking about the specific people in my life who get failed by design decisions that nobody bothered to question. The mirror pairs aren't just a "nice to have" removal. For a meaningful portion of your users, they're the difference between an ID that works and one that doesn't.

I looked around for a standard. There wasn't one people were actually using. So I quietly worked out which characters to cut and started using the resulting set for everything I built.

In 2019 I finally wrote it up. Since then I've shared it on Medium, LinkedIn, and anywhere else I could get people to look at it. The reaction was always the same: "Oh, that's obvious. Why isn't this the default?"

Today I'm sharing it as an actual installable library for the first time.

HardGuard25 is a 25-character alphanumeric set for human-friendly unique IDs.

0 1 2 3 4 5 6 7 8 9 A C D F G H J K M N P R U W Y
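To get a feel for how the alphabet is used, here's a minimal Python sketch of drawing a random ID from it. This isn't the library's code. `generate_id` is a hypothetical name, and the sketch assumes only that each character should be picked uniformly at random, which Python's standard `secrets` module does without modulo bias.

```python
import secrets

# The 25-character HardGuard25 alphabet, as listed above.
ALPHABET = "0123456789ACDFGHJKMNPRUWY"

def generate_id(length: int) -> str:
    """Hypothetical helper: return a random ID of `length` characters.

    secrets.choice picks each character uniformly with a
    cryptographically strong random source.
    """
    if length < 1:
        raise ValueError("length must be positive")
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

print(generate_id(8))  # an 8-character ID; the exact output is random
```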

Why remove 11 characters, not 4

Crockford Base32 has been around since 2002. It removes I, L, O, and U — four characters. It's the most common "unambiguous" encoding people reach for.

Four wasn't enough.

HardGuard25 removes every character that creates visual ambiguity, for any reader, at any size, in any common typeface. Here's the full removal list:

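🚫 Digit lookalikes: O (matches 0), I and L (I and lowercase l both match 1), S (matches 5), Z (matches 2), B (matches 8). Crockford removes I, L, and O for the same reason; HardGuard25 also removes S, Z, and B. (Crockford's fourth removal, U, isn't a digit lookalike, and HardGuard25 keeps it.)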

🪞 Dyslexia mirror pairs: d/b, q/p, 3/E. Dyslexic readers commonly reverse these, so there's no reason to include both sides of a pair: D, P, and 3 stay; B, Q, and E go.

⚙️ Operator and context lookalikes: V (resembles U in some fonts), T (looks like +), X (multiplication symbol, also used as a redaction placeholder). These cause parsing confusion in spreadsheets, URLs, and printed labels.
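As a sanity check, the removals can be applied mechanically. This Python sketch starts from the 36 uppercase alphanumerics, drops the 11 characters absent from the alphabet, grouped by the reasons above (L belongs with the digit lookalikes, since lowercase l reads as 1), and confirms that what's left is exactly the published set. The grouping constant names are illustrative, not part of the library.

```python
import string

# All 36 uppercase alphanumerics: 0-9 plus A-Z.
FULL = string.digits + string.ascii_uppercase

# The 11 removals, grouped by reason (illustrative names).
DIGIT_LOOKALIKES = set("OILSZB")   # O/0, I/1, l/1, S/5, Z/2, B/8
MIRROR_PAIRS = set("QE")           # q/p keeps P, 3/E keeps 3; b/d keeps D (B already gone)
OPERATOR_LOOKALIKES = set("VTX")   # V/U, T/+, X/multiplication sign

REMOVED = DIGIT_LOOKALIKES | MIRROR_PAIRS | OPERATOR_LOOKALIKES
ALPHABET = "".join(c for c in FULL if c not in REMOVED)

print(len(REMOVED))  # 11
print(ALPHABET)      # 0123456789ACDFGHJKMNPRUWY
```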

The tradeoff: HardGuard25 codes run 1–2 characters longer than Crockford Base32 for the same entropy. If your IDs are machine-read 99% of the time, Crockford is fine. If a human ever has to read, type, speak, or transcribe your ID, every one of those 11 characters is a support ticket waiting to happen.
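The length penalty follows directly from the math: each HardGuard25 character carries log2(25) ≈ 4.64 bits, while a Crockford Base32 character carries exactly 5. Here's a quick Python sketch (`chars_needed` is a hypothetical helper) that computes the shortest length reaching a given number of bits in each alphabet:

```python
import math

def chars_needed(bits: float, alphabet_size: int) -> int:
    """Smallest ID length whose keyspace holds at least `bits` bits."""
    bits_per_char = math.log2(alphabet_size)
    return math.ceil(bits / bits_per_char)

for bits in (40, 64, 80, 128):
    crockford = chars_needed(bits, 32)  # 5 bits per character
    hardguard = chars_needed(bits, 25)  # ~4.64 bits per character
    print(f"{bits:>3} bits: Crockford {crockford}, HardGuard25 {hardguard}")

# Prints:
#  40 bits: Crockford 8, HardGuard25 9
#  64 bits: Crockford 13, HardGuard25 14
#  80 bits: Crockford 16, HardGuard25 18
# 128 bits: Crockford 26, HardGuard25 28
```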

Quickstart

JavaScript

npm install @snapsynapse/hardguard25

import { generate, validate, checkDigit } from '@snapsynapse/hardguard25';

generate(8);                        // "AC3H7PUW"
generate(8, { checkDigit: true });  // "AC3H7PUWR"
validate("AC3H-7PUW");              // true
checkDigit("AC3H7PUW");             // "R"

Python

pip install hardguard25

from hardguard25 import generate, validate, check_digit

generate(8)                    # "AC3H7PUW"
generate(8, check_digit=True)  # "AC3H7PUWR"
validate("AC3H-7PUW")          # True
check_digit("AC3H7PUW")        # "R"

import "github.com/snapsynapse/hardguard25/go"

id, _ := hardguard25.Generate(8)           // "AC3H7PUW"
id, _ = hardguard25.GenerateWithCheck(8)   // "AC3H7PUWR"
ok := hardguard25.Validate("AC3H-7PUW")   // true
ch, _ := hardguard25.CheckDigit("AC3H7PUW") // 'R'
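The quickstarts don't say which algorithm produces that `R`; SPEC.md documents it. To illustrate the general idea, here's a Python sketch of a simple weighted-sum check character mod 25, the same family as ISBN-10's weighted check digit. Treat it as a stand-in, not HardGuard25's algorithm: for `AC3H7PUW` it yields `6`, not the library's `R`. The weights 1 to 4 (cycled) are an assumption chosen because none shares a factor with 25 and adjacent weights differ by 1 or 3. That means every single-character substitution and every swap of two adjacent payload characters changes the check character.

```python
ALPHABET = "0123456789ACDFGHJKMNPRUWY"
N = len(ALPHABET)       # 25
WEIGHTS = (1, 2, 3, 4)  # cycled per position; all coprime to 25

def weighted_check_char(payload: str) -> str:
    """Hypothetical check character: weighted sum of character values mod 25."""
    total = sum(
        WEIGHTS[i % len(WEIGHTS)] * ALPHABET.index(ch)
        for i, ch in enumerate(payload)
    )
    return ALPHABET[total % N]

def has_valid_check_char(code: str) -> bool:
    """True if the last character matches the check computed over the rest."""
    return len(code) >= 2 and weighted_check_char(code[:-1]) == code[-1]

print(weighted_check_char("AC3H7PUW"))    # 6
print(has_valid_check_char("AC3H7PUW6"))  # True
print(has_valid_check_char("CA3H7PUW6"))  # False: first two characters swapped
```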

How long do IDs need to be?

Length	Possible IDs	Good for
4	390,625	Small inventory, event tickets
6	244 million	Medium business scale
8	152 billion	Large systems
16	2.33 × 10²²	Cross-system identifiers
20	9.09 × 10²⁷	Public tokens

Rule of thumb: 4–5 characters for small business, 8+ for large systems, 16–22 for tokens and cross-org use.
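The keyspace for a given length is simply 25 raised to that length. The table doesn't show collision risk for randomly generated IDs, which the birthday bound estimates. SPEC.md has the author's actual collision guidance. This Python sketch (hypothetical helper names) only applies the standard approximation p ≈ 1 - e^(-k(k-1)/2N) for k random IDs drawn from N possibilities.

```python
import math

ALPHABET_SIZE = 25

def keyspace(length: int) -> int:
    """Number of distinct IDs of the given length."""
    return ALPHABET_SIZE ** length

def collision_probability(num_ids: int, length: int) -> float:
    """Approximate chance that at least two of `num_ids` random IDs collide.

    Birthday-bound approximation: p ~ 1 - exp(-k(k-1) / 2N).
    expm1 keeps precision when the probability is tiny.
    """
    n = keyspace(length)
    return -math.expm1(-num_ids * (num_ids - 1) / (2 * n))

for length in (4, 6, 8):
    p = collision_probability(10_000, length)
    print(f"{length} chars: {keyspace(length):,} IDs, collision risk {p:.2%}")
```

With 10,000 random IDs, 4 characters all but guarantees a duplicate, 6 characters gives roughly an 18% chance of one, and 8 characters keeps it around 0.03%.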

When not to use it

HardGuard25 is not a general-purpose encoding scheme. Skip it for:

Cryptographic keys — use proper key derivation
Blockchain consensus — use domain-specific formats
Systems requiring UUID guarantees — use UUIDv7 or ULID

It's for IDs that humans interact with. Full stop.

Try it

Interactive generator: hardguard25.com

Full spec in SPEC.md — covers entropy math, collision guidance, normalization rules, check digit algorithm, test vectors, and accessibility notes.

Found a problem with it? Please raise an issue on the GitHub project!

Do you use a custom character set for your IDs, or do you default to UUID/ULID? Curious how many people are still hitting the O/0 problem in production 👇