Designing a Transaction ID for a Payment System: What I Learned

Every transaction needs an identity. It is the thing you point to when something goes wrong, when support gets involved, or when a customer asks, “Can you check this?” That is what a transaction ID is. It may look like a small piece of data, but it carries a lot of responsibility.

What sounds simple, just a short identifier, becomes more nuanced once it is customer-facing. The ID had to be short and easy for customers to read and share, yet unique at scale and safe to expose publicly. It needed to work reliably across distributed systems, handle retries without duplication, and avoid confusing or ambiguous characters.

Every choice came with trade-offs. More randomness meant less readability. More structure risked predictability. Shorter length increased collision risk, while longer length hurt usability.

This post walks through the constraints, trade-offs, and the design that ultimately held up.

CONSTRAINTS WE HAD TO MEET

Exactly 10 characters
Alphanumeric
Globally unique, forever
Customer-facing - read aloud, typed on phones, screenshotted, dictated to support agents

That last point is what makes this hard. An internal ID can be ugly. A customer-facing ID has to survive the real world.

THE OPTIONS WE CONSIDERED

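Sequential counter (1, 2, 3...)
- Predictable: one ID reveals its neighbours
- Leaks transaction volume, and needs a coordinated counter across distributed nodes
UUID truncated to 10 chars
- A UUID is 128 bits; 10 alphanumeric characters hold under 60, so truncation is lossy
- Higher collision probability than necessary: 10 hex characters carry only 40 bits
- Time-ordered variants leak business metrics
Snowflake-style (timestamp + machine ID + sequence)
- Leaks transaction volume to anyone watching
- Predictable, which creates fraud risk
Timestamp + random hybrid
- In a large system (~10K TPS), multiple transactions share the same millisecond
- Time portion eats your character budget
- A 4-character Base62 timestamp overflows in under 6 months at one-second resolution (about 171 days), and in about 4 hours at millisecond resolution
- Still leaks volume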
Pure CSPRNG random + DB unique constraint [Our Recommended Solution]
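
The character-budget figures in the options above can be checked with a few lines of arithmetic. This is a minimal sketch; the class and method names (IdBudget, bits, capacity) are made up for illustration.

```java
final class IdBudget {
    // Bits of information that `length` characters from a `radix`-symbol alphabet can hold.
    static double bits(int radix, int length) {
        return length * (Math.log(radix) / Math.log(2));
    }

    // Number of distinct values a `length`-character field can hold.
    static long capacity(int radix, int length) {
        long n = 1;
        for (int i = 0; i < length; i++) {
            n = Math.multiplyExact(n, (long) radix);
        }
        return n;
    }

    public static void main(String[] args) {
        long ticks = capacity(62, 4); // 62^4 = 14,776,336 timestamp values
        System.out.printf("10 chars of Base32: %.1f bits%n", bits(32, 10));
        System.out.printf("10 chars of Base62: %.1f bits%n", bits(62, 10));
        System.out.printf("4-char Base62 timestamp, 1 s ticks: %.0f days%n", ticks / 86_400.0);
        System.out.printf("4-char Base62 timestamp, 1 ms ticks: %.1f hours%n", ticks / 3_600_000.0);
    }
}
```

Neither a 64-bit Snowflake value nor a 128-bit UUID fits in 10 characters of either alphabet, which is why both need truncation.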

A Cryptographically Secure Pseudo-Random Number Generator (CSPRNG) is an algorithm designed to produce sequences of numbers that are practically indistinguishable from true randomness and, crucially, are unpredictable.

THE FINAL DESIGN

9 random characters from Crockford Base32
1 Damm checksum character
Database UNIQUE constraint as the source of truth
Bounded retry on the rare collision
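
The random body and the bounded retry can be sketched roughly as follows. This is an illustration under assumptions, not the production code: IdStore is a hypothetical stand-in for an INSERT into a column with a UNIQUE constraint, MAX_ATTEMPTS is an arbitrary bound, and the check character is left out because it is a deterministic function of the body, so uniqueness of the body implies uniqueness of the full ID.

```java
import java.security.SecureRandom;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

final class TxnIdIssuer {
    static final String ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"; // Crockford Base32
    static final int BODY_LENGTH = 9;
    static final int MAX_ATTEMPTS = 5; // assumed bound, not taken from the article

    // Hypothetical stand-in for the database: returns false where a real
    // INSERT would fail with a unique-constraint violation.
    interface IdStore {
        boolean insertIfAbsent(String id);
    }

    private final SecureRandom rng = new SecureRandom();
    private final IdStore store;

    TxnIdIssuer(IdStore store) {
        this.store = store;
    }

    // Draws 9 symbols, each uniformly from the 32-symbol alphabet.
    String randomBody() {
        StringBuilder sb = new StringBuilder(BODY_LENGTH);
        for (int i = 0; i < BODY_LENGTH; i++) {
            sb.append(ALPHABET.charAt(rng.nextInt(ALPHABET.length())));
        }
        return sb.toString();
    }

    // Tries a fresh random body on each attempt; the store decides uniqueness.
    String issue() {
        for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
            String body = randomBody();
            if (store.insertIfAbsent(body)) {
                return body;
            }
        }
        throw new IllegalStateException("no free ID after " + MAX_ATTEMPTS + " attempts");
    }

    public static void main(String[] args) {
        Set<String> seen = ConcurrentHashMap.newKeySet(); // in-memory substitute for the table
        TxnIdIssuer issuer = new TxnIdIssuer(seen::add);
        System.out.println(issuer.issue());
    }
}
```

The application never checks for existence before inserting; it inserts and lets the constraint answer, which avoids a race between the check and the write.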

WHY CROCKFORD BASE32 (NOT BASE62)?

The alphabet is "0123456789ABCDEFGHJKMNPQRSTVWXYZ" — 32 characters, all uppercase, with I, L, O, U deliberately removed.

Why? Because every customer-facing ID eventually gets:

Read over a phone call
Typed on a small keyboard
Screenshotted and re-typed by someone else
Spoken in Bangla, English, or both

Mixed-case Base62 gives you more entropy per character (about 5.95 bits versus 5), but it creates real failure modes:
"Capital K or small k?"
"Was that O or zero?"
"That's a 1, an l, or an I?"

Single-case Crockford Base32 makes these questions moot: there is only one case, and because O, I, and L are never issued, a lookup can safely read them as 0, 1, and 1.
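
On the input side, this leniency might look like the sketch below. It follows the decoding rules in Crockford's description (accept either case, read O as 0 and I or L as 1, ignore hyphens). Dropping spaces is an extra assumption added here for typed input, and the class name CrockfordInput is hypothetical.

```java
import java.util.Locale;

final class CrockfordInput {
    // Normalizes what a customer typed before the ID is validated or looked up.
    static String normalize(String raw) {
        StringBuilder out = new StringBuilder(raw.length());
        for (char ch : raw.toUpperCase(Locale.ROOT).toCharArray()) {
            switch (ch) {
                case '-':
                case ' ':
                    break;              // separators carry no information
                case 'O':
                    out.append('0');    // letter O is never issued, so it must be zero
                    break;
                case 'I':
                case 'L':
                    out.append('1');    // I and L are never issued, so they must be one
                    break;
                default:
                    out.append(ch);     // anything else is checked by later validation
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(normalize("o1l-iK")); // prints 0111K
    }
}
```

U is deliberately not mapped to anything: it is not a valid symbol, so an ID containing it should simply fail validation.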

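WHY A DAMM CHECKSUM?

Most homemade checksums are plain or weighted sums, such as sum(i * char_value) mod N. A plain sum misses every transposition, and a weighted sum with a poorly chosen modulus misses some single-character typos or some adjacent transpositions ("KH" mistyped as "HK"), which are the second most common human error. Even the widely used Luhn algorithm misses the transposition 09 ↔ 90.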

A Damm checksum over a totally anti-symmetric quasigroup of order 32 catches:

100% of single-character substitutions
100% of adjacent transpositions

For a payment system where customers dictate IDs over the phone, this matters a lot. A miss here means a mistyped ID is accepted as valid and the lookup returns the wrong transaction.
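
A Damm-style check character over 32 symbols could be sketched as below. The article does not say which quasigroup was used, so the operation here is an assumption: c * d = 2c XOR d, with the multiplication by 2 done in GF(32) using the reduction polynomial x^5 + x^2 + 1. That operation is a quasigroup, and a swap of adjacent symbols x and y changes the result by a nonzero multiple of (x XOR y), which is the property Damm's scheme relies on.

```java
final class DammBase32 {
    static final String ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"; // Crockford Base32

    // Multiplies by the field element 2 in GF(32), reducing by x^5 + x^2 + 1 (0x25).
    static int mul2(int v) {
        v <<= 1;
        return (v & 0x20) != 0 ? v ^ 0x25 : v;
    }

    // Folds the symbols through the assumed quasigroup operation, starting from 0.
    static int fold(String s) {
        int c = 0;
        for (int i = 0; i < s.length(); i++) {
            int d = ALPHABET.indexOf(s.charAt(i));
            if (d < 0) {
                throw new IllegalArgumentException("not a Crockford symbol: " + s.charAt(i));
            }
            c = mul2(c) ^ d;
        }
        return c;
    }

    // Returns the symbol that brings the fold of body + check back to 0.
    static char checkChar(String body) {
        return ALPHABET.charAt(mul2(fold(body)));
    }

    // A 10-character ID is valid when folding all of it, check character included, gives 0.
    static boolean isValid(String id) {
        return id.length() == 10 && fold(id) == 0;
    }

    public static void main(String[] args) {
        String body = "7ZK3M9QXA"; // example body, not a real transaction
        String id = body + checkChar(body);
        System.out.println(id + " valid: " + isValid(id));
    }
}
```

Validation needs no database round trip, so a mistyped ID can be rejected at the input form before any lookup happens.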

WHY NOT JUST USE EPOCH TIME AS A SEED?

A common temptation: "Let me seed Random() with currentTimeMillis() for extra randomness."

This is a security anti-pattern.

java.util.Random has only 48 bits of state, recoverable from two consecutive nextInt() outputs
Epoch milliseconds carry only ~10 bits of entropy if the attacker knows the time to within a second
XOR-ing low-entropy sources doesn't create high entropy

SecureRandom already draws on the operating system's entropy sources, which typically mix clock readings, hardware interrupts, and instructions such as RDRAND, all combined by experts who audit this code for a living.

The rule: trust your CSPRNG. Don't try to "improve" it.
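
To make the risk concrete, here is a small sketch of why a time-seeded java.util.Random is guessable. The names SeedDemo, firstOutput, recoverSeed, and secureBytes are illustrative, and the one-second search window is an assumption about what the attacker knows.

```java
import java.security.SecureRandom;
import java.util.Random;

final class SeedDemo {
    // Anti-pattern: every output is a pure function of the seed.
    static long firstOutput(long seedMillis) {
        return new Random(seedMillis).nextLong();
    }

    // Tries every millisecond in [guess - window, guess + window] and returns
    // the first seed that reproduces the observed output, or -1 if none does.
    static long recoverSeed(long observed, long guessMillis, long windowMillis) {
        for (long s = guessMillis - windowMillis; s <= guessMillis + windowMillis; s++) {
            if (firstOutput(s) == observed) {
                return s;
            }
        }
        return -1;
    }

    // What to use instead: no seed supplied, nothing to guess.
    static byte[] secureBytes(int n) {
        byte[] out = new byte[n];
        new SecureRandom().nextBytes(out);
        return out;
    }

    public static void main(String[] args) {
        long secretSeed = System.currentTimeMillis();
        long observed = firstOutput(secretSeed);
        // The attacker only knows the time to within a second: about 2,000 candidates.
        long recovered = recoverSeed(observed, secretSeed + 400, 1_000);
        System.out.println("seed recovered: " + (recovered == secretSeed));
        System.out.println("secure bytes drawn: " + secureBytes(8).length);
    }
}
```

A search over a couple of thousand candidates finishes almost instantly, and once the seed is known every later ID from that generator is known too.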

THE NUMBERS

Random portion keyspace: 32^9 ≈ 35 trillion

Even with 1 billion IDs already issued, the probability that the next insert collides is:
10^9 / (3.5 × 10^13) ≈ 0.00003

That's about 1 retry per 35,000 inserts, which is operationally negligible.
The DB UNIQUE constraint catches it; the app retries; the customer never knows.
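
The same arithmetic as a small sketch, so other volumes can be plugged in. The class and method names are illustrative, and the model assumes IDs are drawn uniformly from the full 32^9 keyspace.

```java
final class CollisionMath {
    static final long KEYSPACE = 1L << 45; // 32^9 = 2^45 = 35,184,372,088,832

    // Chance that one new random ID matches any of the `issued` existing ones.
    static double perInsertCollisionProbability(long issued) {
        return (double) issued / KEYSPACE;
    }

    public static void main(String[] args) {
        long issued = 1_000_000_000L;
        double p = perInsertCollisionProbability(issued);
        System.out.printf("keyspace: %,d%n", KEYSPACE);
        System.out.printf("per-insert collision probability: %.6f%n", p);
        System.out.printf("inserts per expected retry: %.0f%n", 1.0 / p);
    }
}
```

The probability grows linearly with the number of IDs issued, so the retry rate is worth tracking as a metric over the life of the system.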

KEY LESSONS

Customer-facing IDs are a UX problem first, an engineering problem second.
Time in the ID is a leak, not a feature. Keep timestamps in a separate column.
A large random keyspace + a DB unique constraint is simpler and safer than any "guarantee uniqueness" algorithm.
The checksum matters more than people think. Use Damm or Verhoeff, not a homemade weighted sum.
SecureRandom is the floor, not the ceiling. Anything less is malpractice for payments.
Keep internal sequence IDs (BIGSERIAL) for ordering and audit. Never expose them to customers.

If you're designing payment infrastructure, financial IDs, or any high-stakes user-facing identifier — happy to discuss in the comments.

What does your team use for transaction ID generation? Any war stories?