Why UUID v4 Is the Safe Default (And When to Break That Rule)

#productivity #tools #webdev

Most developers reach for UUID v4 by default and never think much more about it. That instinct is correct for the vast majority of situations. Understanding exactly why it is correct will help you recognize the minority of situations where a different version makes more sense.

This is not a comprehensive tour of every UUID version. It is a focused explanation of why v4 became the default, the specific conditions that make it the wrong choice, and what to use instead when those conditions arise.


Why v4 Became the Default

UUID v4 fills 128 bits with random data (drawn from a cryptographically secure generator in any modern implementation), except for the four bits that encode the version number and the two bits that encode the variant. That leaves 122 random bits. There is no timestamp, no machine identifier, no sequence number, and no ordering information embedded in the value.
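To make the bit accounting concrete, here is a minimal sketch that assembles a v4 UUID by hand in Node.js from 16 bytes of crypto.randomBytes() and then overwrites the version and variant bits. The helpers uuidV4FromBytes and formatUuid are hypothetical names for illustration; in real code, crypto.randomUUID() does this for you.

```javascript
const { randomBytes } = require('node:crypto');

// Hypothetical helper: format 16 bytes as the canonical 8-4-4-4-12 hex string.
function formatUuid(bytes) {
  const hex = Buffer.from(bytes).toString('hex');
  return [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20, 32),
  ].join('-');
}

// Hypothetical helper: start from 128 random bits, then fix 4 version bits and 2 variant bits.
function uuidV4FromBytes(bytes = randomBytes(16)) {
  const b = Buffer.from(bytes); // copy so the caller's buffer is not mutated
  b[6] = (b[6] & 0x0f) | 0x40; // high nibble of byte 6 = 0100 (version 4)
  b[8] = (b[8] & 0x3f) | 0x80; // top two bits of byte 8 = 10 (RFC 4122 variant)
  return formatUuid(b);
}

const id = uuidV4FromBytes();
console.log(id); // the 13th hex digit is always "4"; the 17th is one of 8, 9, a, b
```

Six bits are fixed, so 128 - 6 = 122 bits remain random, which is the number that matters for the collision math.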

This makes v4 the right default for three reasons.

No information leakage. The value reveals nothing about when or where it was created. If you expose a v4 UUID in a URL, an API response, a cookie, or a log file, you are leaking nothing about your infrastructure. This is not the case with v1, which embeds a timestamp and usually the MAC address of the generating machine, or with ULID, which embeds a millisecond-precision creation timestamp.
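To see what "embeds a creation timestamp" means in practice, the sketch below recovers the creation time from a ULID. It assumes the layout from the ULID specification (the first 10 Crockford base32 characters encode a 48-bit Unix timestamp in milliseconds); decodeUlidTime is a hypothetical helper, not the API of any particular ULID library.

```javascript
// Crockford base32 alphabet used by ULID (no I, L, O, U).
const CROCKFORD = '0123456789ABCDEFGHJKMNPQRSTVWXYZ';

// Hypothetical helper: read the millisecond timestamp from the first 10 characters of a ULID.
function decodeUlidTime(ulid) {
  const timePart = ulid.slice(0, 10).toUpperCase();
  let ms = 0;
  for (const ch of timePart) {
    const value = CROCKFORD.indexOf(ch);
    if (value === -1) throw new Error(`Invalid ULID character: ${ch}`);
    // 10 base32 digits = 50 bits, well below 2^53, so plain numbers stay exact.
    ms = ms * 32 + value;
  }
  return ms;
}

// A sample ULID: anyone who sees it can read off when it was created.
const ms = decodeUlidTime('01ARYZ6S41TSV4RRFFQ69G5FAV');
console.log(ms, new Date(ms).toISOString()); // 1469918176385 2016-07-30T22:36:16.385Z
```

No equivalent decoding is possible for a v4 UUID, because none of its random bits carry meaning.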

No coordination required. Because v4 is random, any node in any system can generate a UUID independently without communicating with any central authority. There are no clocks to synchronize, no sequence counters to maintain, and no node identifiers to assign, so there is nothing to misconfigure that could make two machines produce the same ID. Distributed systems that need to generate identifiers at each node without round-tripping to a centralized ID generator benefit directly from this.

Universal support. UUID v4 is defined in RFC 4122 (since superseded by RFC 9562, which keeps v4 unchanged) and is supported by virtually every mainstream language, database engine, ORM, and client SDK, either in the standard library or through a widely used package. When you pass a UUID v4 to a third-party API, a database driver, or a validation library, you can expect it to work without custom handling.

The Collision Question

The objection that comes up most often is collision probability. If v4 UUIDs are random, how do you know two systems won't generate the same one?

The answer is probability. A v4 UUID has 122 random bits, so there are 2^122 (about 5.3 x 10^36) possible values. By the birthday bound, you would need to generate roughly 2.7 quintillion UUIDs (about 2.7 x 10^18) before the probability of any collision reaches 50%; at one billion UUIDs per second, that takes about 86 years. No organization is generating UUIDs anywhere close to that volume, so for practical purposes the collision risk is negligible.
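The 50% figure comes from the standard birthday approximation p ≈ 1 - exp(-n^2 / (2N)), where n is the number of IDs generated and N = 2^122 is the number of possible v4 values. Here is a quick sketch for checking the numbers yourself; the function names are illustrative.

```javascript
// Number of distinct v4 UUIDs: 122 random bits after the version and variant bits.
const N = 2 ** 122;

// Birthday approximation: probability of at least one collision among n random IDs.
function collisionProbability(n) {
  // -Math.expm1(-x) computes 1 - e^(-x) accurately even when x is tiny.
  return -Math.expm1(-(n * n) / (2 * N));
}

// Inverse: how many IDs it takes to reach collision probability p.
function idsForProbability(p) {
  return Math.sqrt(2 * N * Math.log(1 / (1 - p)));
}

console.log(idsForProbability(0.5).toExponential(3)); // "2.715e+18"
console.log(collisionProbability(1e12)); // about 9.4e-14 for one trillion IDs
```

Even a trillion IDs leaves the collision probability around one in ten trillion.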

The random source matters. Cryptographically secure generators produce well-distributed, unpredictable output, while poorly seeded pseudo-random number generators can produce predictable or even repeated sequences. In modern environments this is rarely a concern as long as you avoid hand-rolled generators built on Math.random(): browsers provide crypto.randomUUID() (in secure contexts), backed by a cryptographically secure RNG, Node.js exposes crypto.randomUUID() as well, and the uuid npm package draws from the platform's cryptographic RNG.
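The seeding risk is easy to demonstrate. The sketch below uses mulberry32, a small non-cryptographic PRNG chosen purely for illustration, to simulate two servers that seed a hand-rolled generator with the same value (for example, a startup time rounded to the second). weakUuid is a hypothetical anti-pattern, not something to copy; the last line shows the safe path through Node's built-in crypto.randomUUID().

```javascript
const { randomUUID } = require('node:crypto');

// mulberry32: a tiny seeded PRNG. Fine for simulations, NOT cryptographically secure.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function next() {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Anti-pattern (hypothetical): a v4-shaped ID built from a seeded, predictable PRNG.
function weakUuid(rng) {
  const b = Buffer.alloc(16);
  for (let i = 0; i < 16; i++) b[i] = Math.floor(rng() * 256);
  b[6] = (b[6] & 0x0f) | 0x40; // version 4
  b[8] = (b[8] & 0x3f) | 0x80; // RFC 4122 variant
  const hex = b.toString('hex');
  return `${hex.slice(0, 8)}-${hex.slice(8, 12)}-${hex.slice(12, 16)}-${hex.slice(16, 20)}-${hex.slice(20)}`;
}

// Two "servers" that happened to start with the same seed.
const seed = 1700000000;
const serverA = mulberry32(seed);
const serverB = mulberry32(seed);
console.log(weakUuid(serverA) === weakUuid(serverB)); // true: identical "random" IDs

// Safe path: the OS-backed CSPRNG, with no seed to share or guess.
console.log(randomUUID() === randomUUID()); // false
```

Both IDs look like valid v4 UUIDs, which is why this kind of bug is hard to spot by inspecting the values.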

When v4 Is the Wrong Choice

Two scenarios genuinely call for a different version.

When two systems need to agree on an identifier without communicating. If you have an event sourcing system where an event ID should be derivable from the event's content, or a deduplication pipeline where you need to know whether you have already processed a given record, v4 is the wrong tool. V4 is random by definition, so two systems cannot independently arrive at the same v4 UUID for the same entity.

V5 is the correct choice here. V5 takes a namespace UUID and a name string as inputs and applies SHA-1 to produce a deterministic UUID. The same namespace plus the same name always produces the same v5 UUID, on any machine, at any time.

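```javascript
import { v5 as uuidv5 } from 'uuid';

// The standard DNS namespace from RFC 4122, used here only as a placeholder.
// In production, generate your own namespace UUID once and hard-code it.
const MY_NAMESPACE = '6ba7b810-9dad-11d1-80b4-00c04fd430c8';

const id = uuidv5('user@example.com', MY_NAMESPACE);
// Always returns the same UUID for this email in this namespace
```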
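The uuid package handles the hashing for you, but the algorithm is short enough to sketch with Node's built-in crypto module, which makes the determinism easy to see. uuidV5 below is a hypothetical, minimal reimplementation for illustration only: SHA-1 over the namespace bytes followed by the UTF-8 name, truncated to 16 bytes, with the version and variant bits set. The python.org value is the same test vector shown in Python's uuid module documentation.

```javascript
const { createHash } = require('node:crypto');

const DNS_NAMESPACE = '6ba7b810-9dad-11d1-80b4-00c04fd430c8';

function uuidToBytes(uuid) {
  return Buffer.from(uuid.replace(/-/g, ''), 'hex');
}

function formatUuid(bytes) {
  const hex = bytes.toString('hex');
  return `${hex.slice(0, 8)}-${hex.slice(8, 12)}-${hex.slice(12, 16)}-${hex.slice(16, 20)}-${hex.slice(20, 32)}`;
}

// Hypothetical minimal v5: SHA-1(namespace bytes + UTF-8 name), keep 16 bytes, set version/variant.
function uuidV5(name, namespace) {
  const hash = createHash('sha1')
    .update(uuidToBytes(namespace))
    .update(Buffer.from(name, 'utf8'))
    .digest();
  const b = Buffer.from(hash.subarray(0, 16)); // copy the first 16 of 20 bytes
  b[6] = (b[6] & 0x0f) | 0x50; // version 5
  b[8] = (b[8] & 0x3f) | 0x80; // RFC 4122 variant
  return formatUuid(b);
}

console.log(uuidV5('python.org', DNS_NAMESPACE)); // 886313e1-3b8a-5372-9b90-0c9aee199e5d

// Both systems must hash the exact same string, so normalize before hashing.
// Lowercasing the whole address is an assumed convention here; pick one rule and document it.
const normalizeEmail = (email) => email.trim().toLowerCase();
console.log(
  uuidV5(normalizeEmail(' User@Example.com '), DNS_NAMESPACE) ===
    uuidV5('user@example.com', DNS_NAMESPACE)
); // true
```

The normalization step is the part teams most often get wrong: "User@example.com" and "user@example.com" hash to different v5 UUIDs.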

When database insert performance at scale is a confirmed problem. V4 UUIDs are random, which means each new row is inserted at a random position in the database's B-tree primary key index. For write-heavy tables at very high volume, this causes more frequent index page splits, poorer cache locality, and higher fragmentation than sequential integer primary keys. The effect is most pronounced in engines that cluster the table itself on the primary key, such as MySQL's InnoDB.

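If you have benchmarked your specific workload and confirmed this is causing measurable latency, you have several options: switch to a time-sortable identifier such as ULID (which encodes a timestamp prefix) or UUID v7, or use an integer primary key internally and expose UUID v4 at the API layer. UUID v7 fits in any native UUID column, so the main requirement is a generator in your database or application library. Keep in mind that ULID and v7 both embed the creation time, which gives up the no-leakage property described earlier.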
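For reference, UUID v7 (defined in RFC 9562) puts a 48-bit Unix millisecond timestamp in the most significant bits, followed by the version, variant, and random bits, so IDs generated later sort later. The sketch below uses a hypothetical uuidV7 helper to show that layout. It assumes Node's Buffer.writeUIntBE for the 6-byte timestamp and does not implement the optional monotonic counter RFC 9562 describes, so IDs created within the same millisecond are in random order relative to each other.

```javascript
const { randomBytes } = require('node:crypto');

// Hypothetical minimal v7 generator: 48-bit ms timestamp + version/variant + random bits.
function uuidV7(nowMs = Date.now()) {
  const b = randomBytes(16);
  b.writeUIntBE(nowMs, 0, 6); // bytes 0-5: big-endian Unix time in milliseconds
  b[6] = (b[6] & 0x0f) | 0x70; // version 7
  b[8] = (b[8] & 0x3f) | 0x80; // RFC 4122 / RFC 9562 variant
  const hex = b.toString('hex');
  return `${hex.slice(0, 8)}-${hex.slice(8, 12)}-${hex.slice(12, 16)}-${hex.slice(16, 20)}-${hex.slice(20)}`;
}

// Recover the embedded timestamp: the same kind of leakage ULID has.
function uuidV7Time(uuid) {
  return parseInt(uuid.replace(/-/g, '').slice(0, 12), 16);
}

const earlier = uuidV7(Date.UTC(2024, 0, 1));
const later = uuidV7(Date.UTC(2024, 0, 2));
console.log(earlier < later); // true: string order follows creation time
console.log(new Date(uuidV7Time(later)).toISOString()); // 2024-01-02T00:00:00.000Z
```

Because new values share a growing prefix, inserts land near the right edge of the B-tree instead of at random positions, which is the property that helps write-heavy tables.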

Before making this change, verify the problem exists in your actual workload. Many teams who worry about this never observe it in practice because their write rates are within comfortable bounds.


A Decision Framework

Here is a simple way to think about version selection:

Do you just need a unique ID with no other requirements? Start with v4.
Does the same entity need the same ID across independent systems? Use v5 with a stable namespace.
Do you need natural sort order and have confirmed performance problems with v4? Evaluate ULID or UUID v7.
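Do you need the creation time recoverable from the ID, and will the IDs never appear publicly? V1 is an option, but weigh the MAC address exposure, and note that v1 strings do not sort in creation order because the low bits of the timestamp come first.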

For most applications, the answer is v4 everywhere. The cases where you genuinely need v1, v5, or a sortable format are specific enough that you will know them when you encounter them.

The Version Selection Anti-Pattern to Avoid

The mistake most teams make is not choosing the wrong version initially. It is choosing different versions for different parts of the codebase without documenting why. When you audit the codebase six months later, you find user IDs are v4, order IDs are v1, and content IDs are v5, with no explanation anywhere for why each was chosen.

This creates implicit dependencies that are hard to refactor. Code that works with v5 IDs assumes it can recompute an ID from its inputs, and code that sorts by ID may rely on time-ordered values. API consumers may have written validation logic that checks the version digit. Changing versions mid-product requires either a migration or accepting mixed data in the same column.
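Here is what that hidden dependency can look like on the consumer side: a hypothetical validator that pins the version digit to 4. It accepts the IDs it was written for and rejects perfectly valid UUIDs of any other version, which is exactly what breaks when a producer switches versions. The regex names are illustrative.

```javascript
// Hypothetical consumer-side check that hard-codes "version 4".
const STRICT_V4 = /^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i;

// A version-agnostic check: RFC 4122 / RFC 9562 variant, versions 1 through 8.
const ANY_VERSION = /^[0-9a-f]{8}-[0-9a-f]{4}-[1-8][0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i;

const v4Id = '3b241101-e2bb-4255-8caf-4136c566a962';
const v5Id = '886313e1-3b8a-5372-9b90-0c9aee199e5d'; // v5 of "python.org" in the DNS namespace

console.log(STRICT_V4.test(v4Id), STRICT_V4.test(v5Id)); // true false
console.log(ANY_VERSION.test(v4Id), ANY_VERSION.test(v5Id)); // true true
```

Unless a consumer genuinely depends on a version-specific property, validating the general UUID shape is the more robust contract.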

The fix is straightforward and costs almost nothing upfront: document the chosen UUID version, and the reason for it, next to the code that generates each identifier type. A short comment above the ID generation function or in the data model file is enough. It is the first place anyone should look when they encounter unexpected behavior involving UUIDs, and it prevents the subtle bugs that come from mixing versions unknowingly.
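As an illustration, the convention can be as lightweight as a JSDoc comment on each generator. The module below is hypothetical: the ID types, the namespace value, and the function names are placeholders, and it uses only Node's built-in crypto module (with a compact v5 helper inline) so that it runs on its own.

```javascript
const { randomUUID, createHash } = require('node:crypto');

/**
 * User IDs: UUID v4.
 * Why: exposed in URLs and API responses; must not reveal creation time or origin.
 */
function newUserId() {
  return randomUUID();
}

// Placeholder namespace for content IDs: generated once with v4, hard-coded, never changed.
const CONTENT_NAMESPACE = '1b671a64-40d5-491e-99b0-da01ff1f3341';

/**
 * Content IDs: UUID v5 in CONTENT_NAMESPACE, name = canonical source URL.
 * Why: the ingest and dedup services must derive the same ID independently.
 * Changing the namespace or the name format changes every ID.
 */
function contentIdFor(canonicalUrl) {
  const hash = createHash('sha1')
    .update(Buffer.from(CONTENT_NAMESPACE.replace(/-/g, ''), 'hex'))
    .update(canonicalUrl, 'utf8')
    .digest();
  hash[6] = (hash[6] & 0x0f) | 0x50; // version 5
  hash[8] = (hash[8] & 0x3f) | 0x80; // RFC 4122 variant
  const hex = hash.subarray(0, 16).toString('hex');
  return `${hex.slice(0, 8)}-${hex.slice(8, 12)}-${hex.slice(12, 16)}-${hex.slice(16, 20)}-${hex.slice(20)}`;
}
```

The "Why" line is the part that matters six months later: it records the requirement that drove the choice, not just the version number.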

When to Audit Your Current UUID Usage

If you are joining a project that is already in production and has not documented its UUID usage, a brief audit is worth doing. The quickest approach: search for UUID generation calls, note the version used, and check whether foreign keys that reference those IDs are declared with compatible types. Look for any places where UUID strings are stored in VARCHAR columns instead of native UUID types. These are a common source of unexpected behavior in queries and joins: depending on the collation, the same UUID written in upper and lower case may not compare equal, and joining a text column to a native UUID column typically requires an explicit cast.
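For the audit itself, the version of a stored UUID can be read from its 13th hex digit and the variant from the top two bits of its 17th (ignoring dashes). The helpers below are a hypothetical sketch for tallying a sample of values pulled from a column; how you fetch that sample is up to you. A mixed tally, or the same UUID stored with different letter case, are the signals worth writing down.

```javascript
// Hypothetical audit helper: classify one UUID string by version and variant.
function describeUuid(value) {
  const hex = value.trim().replace(/-/g, '');
  if (!/^[0-9a-fA-F]{32}$/.test(hex)) return { valid: false };
  const version = parseInt(hex[12], 16);
  const variantBits = parseInt(hex[16], 16) >> 2; // top two bits of the 17th hex digit
  const variant = variantBits === 0b10 ? 'RFC 4122/9562' : 'other';
  return { valid: true, version, variant, canonical: hex.toLowerCase() };
}

// Hypothetical audit helper: count versions and spot case-only duplicates in a sample.
function tallyVersions(values) {
  const counts = {};
  const seen = new Map(); // canonical lowercase form -> first raw spelling seen
  let caseVariants = 0;
  for (const raw of values) {
    const info = describeUuid(raw);
    const key = info.valid ? `v${info.version} (${info.variant})` : 'invalid';
    counts[key] = (counts[key] || 0) + 1;
    if (info.valid) {
      const first = seen.get(info.canonical);
      if (first === undefined) seen.set(info.canonical, raw);
      else if (first !== raw) caseVariants++;
    }
  }
  return { counts, caseVariants };
}

const sample = [
  '3b241101-e2bb-4255-8caf-4136c566a962',
  '3B241101-E2BB-4255-8CAF-4136C566A962', // same UUID, different case: may not match as text
  '886313e1-3b8a-5372-9b90-0c9aee199e5d',
  'not-a-uuid',
];
console.log(tallyVersions(sample));
```

For this sample the tally reports two v4 values, one v5 value, one invalid string, and one case-only duplicate.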

The audit does not need to result in a migration. More often it results in documentation that makes the current state explicit, which is enough to prevent future consistency problems.

To generate or validate UUIDs without setting up a project, the free UUID generator by EvvyTools produces v4 and v5 UUIDs, validates pasted values, and reports the version and variant information. The detailed guide on UUID versions and use cases covers each version with code examples across JavaScript, Python, and SQL.