Every transaction needs an identity. It is the thing you point to when something goes wrong, when support gets involved, or when a customer asks, “Can you check this?” That is what a transaction ID is. It may look like a small piece of data, but it carries a lot of responsibility.
What sounds simple, just a short identifier, gets more nuanced once it is customer-facing, and that is where the real challenge began. The ID had to be short and easy for customers to read and share, yet unique at scale and safe to expose publicly. It needed to work reliably across distributed systems, handle retries without duplication, and avoid confusing or ambiguous characters.
Every choice came with trade-offs. More randomness meant less readability. More structure risked predictability. Shorter length increased collision risk, while longer length hurt usability.
This post walks through the constraints, trade-offs, and the design that ultimately held up.
BOUNDARIES WE NEED TO CONSIDER
- Exactly 10 characters
- Alphanumeric
- Globally unique, forever
- Customer-facing: read aloud, typed on phones, screenshotted, dictated to support agents
That last point is what makes this hard. An internal ID can be ugly. A customer-facing ID has to survive the real world.
THE OPTIONS WE CONSIDERED
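- Sequential counter (1, 2, 3...)
  - Leaks transaction volume to anyone watching
  - Predictable, so IDs can be enumerated: a fraud risk
- UUID truncated to 10 chars
  - Wastes most of the UUID's randomness (a v4 UUID has 122 random bits; its first 10 hex characters keep only 40)
  - Higher collision probability than necessary
- Snowflake-style (timestamp + machine ID + sequence)
  - Snowflake IDs are 64 bits, but 10 alphanumeric characters hold at most ~59.5 bits (62^10 < 2^64), so encoding one requires lossy truncation
  - Time-ordered IDs leak business metrics
- Timestamp + random hybrid
  - At ~10K TPS, about 10 transactions share each millisecond, so the random part still has to keep them apart
  - The time portion eats your character budget
  - A 4-character Base62 timestamp has only 62^4 ≈ 14.8 million ticks: it wraps after about 171 days (under 6 months) at one-second resolution, and after about 4 hours at millisecond resolution
  - Still leaks volume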
- Pure CSPRNG random + DB unique constraint [Our Recommended Solution]
A Cryptographically Secure Pseudo-Random Number Generator (CSPRNG) is an algorithm designed to produce output that is practically indistinguishable from true randomness and, crucially, unpredictable: even an attacker who has seen earlier outputs cannot feasibly predict the next one.
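The capacity arithmetic behind the rejected options can be checked with exact integer math. This is a minimal sketch: the class and method names are hypothetical, and the 4-character timestamp is assumed to count whole seconds.

```java
import java.math.BigInteger;

// Capacity arithmetic behind the rejected options (illustrative only).
class IdCapacity {
    // Number of distinct strings of `length` symbols over an alphabet of `radix` symbols.
    static BigInteger keyspace(int radix, int length) {
        return BigInteger.valueOf(radix).pow(length);
    }

    // True if every value of `bits` bits fits in `length` symbols of `radix`.
    static boolean fits(int bits, int radix, int length) {
        return keyspace(radix, length).compareTo(BigInteger.ONE.shiftLeft(bits)) >= 0;
    }

    // Days until a `length`-symbol counter in `radix`, ticking once per second, wraps.
    static double wrapDaysAtOneSecond(int radix, int length) {
        return keyspace(radix, length).doubleValue() / 86_400.0;
    }

    public static void main(String[] args) {
        // Snowflake-style IDs are 64 bits; 10 Base62 characters hold only ~59.5 bits.
        System.out.println("64 bits fit in 10 Base62 chars? " + fits(64, 62, 10));
        // A 4-character Base62 timestamp has 62^4 = 14,776,336 ticks.
        System.out.printf("Wraps after %.0f days at 1 s resolution%n", wrapDaysAtOneSecond(62, 4));
        System.out.printf("Wraps after %.1f hours at 1 ms resolution%n",
                keyspace(62, 4).doubleValue() / 1000.0 / 3600.0);
    }
}
```

Using BigInteger keeps the comparison against 2^64 exact, which a long cannot represent.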
THE FINAL DESIGN
- 9 random characters from Crockford Base32
- 1 Damm checksum character
- Database UNIQUE constraint as the source of truth
- Bounded retry on the rare collision
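The last two bullets carry the uniqueness guarantee, so the generator never has to. Below is a minimal sketch of that insert loop, assuming a hypothetical TransactionStore whose tryInsert returns false on a duplicate key. In a PostgreSQL-backed version, that would mean catching the unique-violation error (SQLSTATE 23505). The attempt limit is a placeholder, not a figure from the post.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical stand-in for the transactions table: an insert fails only if the ID
// already exists, mimicking a UNIQUE-constraint violation.
interface TransactionStore {
    boolean tryInsert(String publicId);
}

class InMemoryStore implements TransactionStore {
    private final Set<String> ids = new HashSet<>();

    @Override
    public synchronized boolean tryInsert(String publicId) {
        return ids.add(publicId); // false plays the role of "duplicate key"
    }
}

class BoundedRetryInserter {
    private final TransactionStore store;
    private final Supplier<String> idSource;
    private final int maxAttempts;

    BoundedRetryInserter(TransactionStore store, Supplier<String> idSource, int maxAttempts) {
        this.store = store;
        this.idSource = idSource;
        this.maxAttempts = maxAttempts;
    }

    // Draws a fresh candidate per attempt; the store, not the generator, decides uniqueness.
    String insertNew() {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            String candidate = idSource.get();
            if (store.tryInsert(candidate)) {
                return candidate;
            }
        }
        // With a 2^45 keyspace, exhausting the retries points to a broken RNG, not bad luck.
        throw new IllegalStateException("No unique ID after " + maxAttempts + " attempts");
    }
}
```

Drawing a new candidate on every attempt matters: resubmitting the same colliding ID would fail every time.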
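WHY CROCKFORD BASE32 (NOT BASE62 OR BASE64)?
The alphabet is "0123456789ABCDEFGHJKMNPQRSTVWXYZ": 32 characters, all uppercase, with I, L, O, and U deliberately removed. I, L, and O are too easily confused with 1 and 0, and U is dropped to reduce the chance of accidental obscenities.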
Why? Because every customer-facing ID eventually gets:
- Read over a phone call
- Typed on a small keyboard
- Screenshotted and re-typed by someone else
- Spoken in Bangla, English, or both
Mixed-case Base62 does pack more entropy into each character (about 5.95 bits versus 5), but it creates real failure modes:
"Capital K or small k?"
"Was that O or zero?"
"Is that a 1, an l, or an I?"
Single-case Crockford Base32 eliminates these conversations entirely.
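Here is a minimal sketch of both sides of the alphabet choice. It draws the 9 random characters with SecureRandom and normalizes customer input using the lenient decoding rules in Crockford's spec: case-insensitive, O read as 0, I and L read as 1, hyphens ignored. The class and method names are hypothetical, and ignoring spaces as well is our own assumption.

```java
import java.security.SecureRandom;

class CrockfordBase32 {
    static final String ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ";
    private static final SecureRandom RNG = new SecureRandom();

    // Each symbol is a uniform draw from 32 values (5 bits), so 9 symbols give 45 bits.
    static String randomBody(int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(ALPHABET.charAt(RNG.nextInt(32)));
        }
        return sb.toString();
    }

    // Normalizes what a customer typed or dictated; returns null if any symbol is invalid.
    static String normalize(String input) {
        StringBuilder sb = new StringBuilder(input.length());
        for (char raw : input.toCharArray()) {
            char c = Character.toUpperCase(raw);
            if (c == '-' || c == ' ') continue;          // hyphens per spec; spaces: our assumption
            if (c == 'O') c = '0';                       // lookalike mappings from the spec
            else if (c == 'I' || c == 'L') c = '1';
            if (ALPHABET.indexOf(c) < 0) return null;    // rejects U and anything non-Crockford
            sb.append(c);
        }
        return sb.toString();
    }
}
```

Normalizing before lookup means "7k3q-x9abc" and "7K3QX9ABC" resolve to the same stored value.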
WHY DAMM CHECKSUM?
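Most homemade checksums are plain or weighted sums of character values mod N, and their guarantees are weaker than they look. A plain sum catches single-character typos but misses every adjacent transposition ("KH" mistyped as "HK"), which is the SECOND most common human error. A position-weighted sum like sum(i * char_value) mod N only gives clean guarantees when N is prime; mod 32, some single-character typos cancel out (a value off by 16 in a position with weight 2 shifts the sum by exactly 32).
The Damm algorithm, built on a totally anti-symmetric quasigroup of order 32, needs no prime modulus (such quasigroups exist for every alphabet size except 2 and 6) and catches: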
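The weak spots of homemade sums are easy to reproduce. A plain sum cannot see a transposition, and a position-weighted sum mod 32 can miss a single wrong character because 32 is not prime. This small sketch uses two hypothetical checksums over Crockford symbol values (H = 17, K = 19). It is for comparison only, not a recommendation.

```java
// Hypothetical homemade checksums over Crockford symbol values (0..31), for comparison only.
class HomemadeChecksums {
    // Plain sum mod 32: order-insensitive, so transpositions are invisible.
    static int plainSum(int[] values) {
        int sum = 0;
        for (int v : values) sum += v;
        return sum % 32;
    }

    // Position-weighted sum mod 32 with weights 1, 2, 3, ...
    // Because 32 is not prime, an even weight times a difference of 16 vanishes mod 32.
    static int weightedSum(int[] values) {
        int sum = 0;
        for (int i = 0; i < values.length; i++) sum += (i + 1) * values[i];
        return sum % 32;
    }

    public static void main(String[] args) {
        int[] kh = {19, 17}, hk = {17, 19};          // "KH" vs "HK"
        int[] typed = {5, 3}, mistyped = {5, 19};    // "53" vs "5K": second symbol off by 16
        System.out.println("plain sum misses KH/HK: " + (plainSum(kh) == plainSum(hk)));
        System.out.println("weighted sum misses 53/5K: " + (weightedSum(typed) == weightedSum(mistyped)));
    }
}
```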
- 100% of single-character substitutions
- 100% of adjacent transpositions
For a payment system where customers dictate IDs over the phone, this matters a lot. A miss here means a customer's typo gets accepted as valid and looks up the wrong transaction.
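Below is a minimal sketch of a base-32 Damm check character. It assumes one standard construction of a totally anti-symmetric quasigroup: x∘y = 2x ⊕ y, computed in the finite field GF(32) with modulus x^5 + x^2 + 1. The post does not say which quasigroup it used, so treat this choice (and the names Damm32, checkChar, isValid) as assumptions. Any valid order-32 quasigroup gives the same detection guarantees, but different check characters.

```java
// Damm check character over the Crockford Base32 alphabet.
// Assumed quasigroup: x∘y = 2·x XOR y in GF(2^5) with modulus x^5 + x^2 + 1.
class Damm32 {
    static final String ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ";

    // Multiply by 2 (the polynomial x) in GF(2^5), reducing by x^5 + x^2 + 1 (0b100101).
    private static int times2(int v) {
        v <<= 1;
        if ((v & 0x20) != 0) v ^= 0x25;
        return v;
    }

    // The quasigroup operation: a Latin square with total anti-symmetry,
    // because 2x ^ y == 2y ^ x implies 3x == 3y, hence x == y.
    private static int op(int interim, int digit) {
        return times2(interim) ^ digit;
    }

    // Runs the string through the quasigroup starting from interim 0.
    private static int fold(String s) {
        int interim = 0;
        for (char c : s.toCharArray()) {
            int d = ALPHABET.indexOf(c);
            if (d < 0) throw new IllegalArgumentException("Not a Crockford symbol: " + c);
            interim = op(interim, d);
        }
        return interim;
    }

    // The check symbol c must satisfy op(interim, c) == 0, i.e. c == 2·interim (XOR self-inverse).
    static char checkChar(String body) {
        return ALPHABET.charAt(times2(fold(body)));
    }

    // A full ID (body + check symbol) is valid exactly when it folds back to 0.
    static boolean isValid(String idWithCheck) {
        return !idWithCheck.isEmpty() && fold(idWithCheck) == 0;
    }
}
```

Appending checkChar(body) to the 9 random characters gives the 10-character ID. Running isValid before any database lookup rejects a typo instead of letting it match some other transaction.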
WHY NOT JUST USE EPOCH AS A SEED?
A common temptation: "Let me seed Random() with currentTimeMillis() for extra randomness."
This is a security anti-pattern.
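- java.util.Random has only 48 bits of state, recoverable from 2 consecutive nextInt() outputs by brute-forcing the 16 hidden bits
- A millisecond timestamp has only ~10 bits of entropy if the attacker knows the issue time to within a second (2^10 ≈ 1,000 candidates)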
- XOR-ing low-entropy sources doesn't create high entropy
SecureRandom already seeds itself from the operating system's entropy sources (on Linux, a kernel pool fed by interrupt timing, hardware instructions like RDRAND, and more), mixed by cryptographic code that experts audit for a living.
The rule: trust your CSPRNG. Don't try to "improve" it.
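Here is why the epoch seed fails, in a few lines. java.util.Random is deterministic for a given seed, so an attacker who knows roughly when an ID was issued can replay every candidate millisecond. This is a minimal sketch: the class and method names are hypothetical, and the one-second search window is an illustrative assumption.

```java
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.Random;

class SeedingDemo {
    // Anti-pattern: the output is a pure function of the timestamp.
    static int[] epochSeededSymbols(long epochMillis, int count) {
        Random rng = new Random(epochMillis);
        int[] out = new int[count];
        for (int i = 0; i < count; i++) out[i] = rng.nextInt(32);
        return out;
    }

    // What an attacker does: try every millisecond in a window around the known issue time.
    static long recoverSeed(int[] observed, long windowStart, int windowMillis) {
        for (long t = windowStart; t < windowStart + windowMillis; t++) {
            if (Arrays.equals(epochSeededSymbols(t, observed.length), observed)) return t;
        }
        return -1; // seed not in the searched window
    }

    // Preferred: let SecureRandom seed itself. Don't call setSeed with a timestamp; with some
    // providers (notably SHA1PRNG), setSeed before the first output replaces the seed entirely.
    static int[] secureSymbols(int count) {
        SecureRandom rng = new SecureRandom();
        int[] out = new int[count];
        for (int i = 0; i < count; i++) out[i] = rng.nextInt(32);
        return out;
    }
}
```

A 1,000-iteration loop is all it takes to recover the seed, and with it every ID generated from that seed.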
THE NUMBERS
Random portion keyspace: 32^9 ≈ 35 trillion
Even at 1 billion transactions issued, the per-insert collision probability is:
10^9 / (3.5 × 10^13) ≈ 0.00003
That's about 1 retry per 35,000 inserts, an operationally trivial rate.
The DB UNIQUE constraint catches it; the app retries; the customer never knows.
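The same arithmetic, reproduced as a sketch. The per-insert figure matches the post. The all-attempts-fail bound and the total-retries estimate (the usual n^2 / 2N approximation) are our own additions, and the class name is hypothetical.

```java
// Collision arithmetic for a 9-symbol Crockford Base32 body.
class CollisionMath {
    // 32^9 = 2^45 = 35,184,372,088,832 (exactly representable as a double).
    static final double KEYSPACE = Math.pow(32, 9);

    // Chance that one fresh draw hits one of `issued` existing IDs.
    static double perInsertCollision(double issued) {
        return issued / KEYSPACE;
    }

    // Chance that all `attempts` independent draws collide.
    static double allAttemptsCollide(double issued, int attempts) {
        return Math.pow(perInsertCollision(issued), attempts);
    }

    // Expected total retries while issuing the first n IDs: sum of k/N for k < n, about n^2 / 2N.
    static double expectedRetries(double n) {
        return n * n / (2 * KEYSPACE);
    }

    public static void main(String[] args) {
        double p = perInsertCollision(1e9);
        System.out.printf("per-insert collision at 1e9 issued: %.2e (1 in %.0f)%n", p, 1 / p);
        System.out.printf("all 3 attempts collide: %.1e%n", allAttemptsCollide(1e9, 3));
        System.out.printf("expected retries over the first 1e9 inserts: %.0f%n", expectedRetries(1e9));
    }
}
```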
KEY LESSONS
- Customer-facing IDs are a UX problem first, an engineering problem second.
- Time in the ID is a leak, not a feature. Keep timestamps in a separate column.
- A large random keyspace + a DB unique constraint is simpler and safer than any "guarantee uniqueness" algorithm.
- The checksum matters more than people think. Use Damm or Verhoeff, not a homemade weighted sum.
- SecureRandom is the floor, not the ceiling. Anything less is malpractice for payments.
- Keep internal sequence IDs (BIGSERIAL) for ordering and audit. Never expose them to customers.
If you're designing payment infrastructure, financial IDs, or any high-stakes user-facing identifier — happy to discuss in the comments.
What does your team use for transaction ID generation? Any war stories?