Why it's time to ditch UUIDv4 and switch to UUIDv7!

#uuid #database #performance #node

I've used UUIDv4 as my go-to identifier for database primary keys for a long time, ever since moving away from sequential integer IDs (auto-increment/SERIAL). UUIDv4 reminds me of the time when we didn't have better alternatives for distributed systems.

Despite being widely adopted, UUIDv4 has some issues that can be easily fixed with a "more modern" alternative.

I've recently started using UUIDv7 and it does have many advantages in comparison with UUIDv4.

First of all, it's fast where it matters. Thanks to its time-ordered structure, UUIDv7 is often reported to be 2-5x faster for inserts than UUIDv4, although the exact gain depends on your database, table size, and workload. Writing records to the database has been a real pleasure: in my experience, inserts and index maintenance are considerably faster.

Second, it naturally sorts by creation time, which UUIDv4 doesn't do at all. With UUIDv4, every new record lands at a random position in your B-tree index, causing page splits and fragmentation, and performance degrades over time. UUIDv7 keys instead arrive in roughly increasing order, so new entries are appended at the end of the index, which also gives better cache locality for index operations. Furthermore, with UUIDv4 you still need a separate created_at timestamp column if you want to sort records chronologically.

UUIDv7 (and a few other time-ordered identifiers, such as ULID) handles this under the hood. This way, when you're dealing with high-volume inserts and large databases, you avoid bad surprises such as severe performance degradation or bloated indexes.
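To make the "random position vs. append" idea concrete, here is a minimal sketch that counts how often a new key sorts after every key seen so far (an append at the right edge of a sorted index) rather than somewhere in the middle. It is not a real B-tree, and `makeUuidV7` and `countAppends` are hypothetical helpers written only for this illustration; the only assumption is Node's built-in crypto module.

```javascript
const { randomBytes, randomUUID } = require("crypto");

// Hypothetical helper (not a library API): builds a UUIDv7-shaped string
// from a millisecond timestamp plus random bits.
function makeUuidV7(ms) {
  const bytes = randomBytes(16);
  bytes.writeUIntBE(ms, 0, 6);          // bytes 0-5: 48-bit timestamp, big-endian
  bytes[6] = (bytes[6] & 0x0f) | 0x70;  // version nibble = 7
  bytes[8] = (bytes[8] & 0x3f) | 0x80;  // variant bits = 10
  const hex = bytes.toString("hex");
  return `${hex.slice(0, 8)}-${hex.slice(8, 12)}-${hex.slice(12, 16)}-${hex.slice(16, 20)}-${hex.slice(20)}`;
}

// Counts keys that sort after every earlier key, i.e. inserts that would
// land at the right edge of a sorted index instead of in the middle.
function countAppends(keys) {
  let max = "";
  let appends = 0;
  for (const key of keys) {
    if (key > max) {
      appends++;
      max = key;
    }
  }
  return appends;
}

const total = 1000;
const start = Date.now();
// One ID per millisecond, so the timestamps are strictly increasing.
const v7Keys = Array.from({ length: total }, (_, i) => makeUuidV7(start + i));
const v4Keys = Array.from({ length: total }, () => randomUUID());

console.log("UUIDv7 appends:", countAppends(v7Keys), "of", total);
console.log("UUIDv4 appends:", countAppends(v4Keys), "of", total);
```

With distinct, increasing timestamps, every UUIDv7 key is an append, while only a small fraction of the random UUIDv4 keys are. Within the same millisecond, the order depends on the random bits unless the generator adds a counter.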

Here's how UUIDv7 structures its data:

018c8e8a-9d4e-7890-a123-456789abcdef
└─timestamp─┘ └───random bits────┘

The first 48 bits contain a Unix timestamp in milliseconds, so UUIDs generated over time sort in creation order (at millisecond granularity). The rest is random, apart from 4 version bits (the 7 that starts the third group) and 2 variant bits. You can also very easily start using UUIDv7 in existing projects without migrating old UUIDv4 records: both can coexist in the same column.
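Because the timestamp sits in the first 48 bits, it can be read back from the ID itself. The following is a minimal sketch, assuming the canonical hyphenated string form; `uuidv7Timestamp` is a hypothetical helper name, not part of any library.

```javascript
// Hypothetical helper: decodes the 48-bit millisecond timestamp
// embedded at the start of a UUIDv7 string.
function uuidv7Timestamp(uuid) {
  const hex = uuid.replace(/-/g, "");
  // After removing hyphens, character 12 is the version nibble.
  if (hex.length !== 32 || hex[12] !== "7") {
    throw new Error("not a UUIDv7: " + uuid);
  }
  // First 12 hex characters = 48 bits = Unix time in milliseconds.
  return parseInt(hex.slice(0, 12), 16);
}

const ms = uuidv7Timestamp("018c8e8a-9d4e-7890-a123-456789abcdef");
console.log(ms, new Date(ms).toISOString());
```

This is handy for debugging or rough chronological filtering, but it also shows why a UUIDv7 should not be treated as a secret: anyone who sees the ID can recover its creation time.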

// Node.js usage
const { v7: uuidv7 } = require('uuid');
const id = uuidv7();
console.log(id); // e.g. 018c8e8a-9d4e-7890-a123-456789abcdef

// example database record
{
  "id": "018c8e8a-9d4e-7890-a123-456789abcdef",
  "user_id": "018c8e8a-9d50-7000-b345-6789abcdef01",
  "created_at": "2023-12-21T22:41:38.126Z"
}

It's also worth mentioning that UUIDv7 keeps the same 128-bit format as UUIDv4, so it works with the existing UUID columns in your database.
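Since both versions share the same textual layout, the same validation code can accept a column that mixes them and still tell them apart when needed. A small sketch, where `UUID_PATTERN` and `uuidVersion` are hypothetical names used only for this example:

```javascript
const { randomUUID } = require("crypto");

// Matches the canonical 8-4-4-4-12 hex layout shared by every UUID version.
const UUID_PATTERN = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Hypothetical helper: returns the version number, or null if the value
// is not a canonically formatted UUID.
function uuidVersion(value) {
  if (!UUID_PATTERN.test(value)) return null;
  return parseInt(value[14], 16); // first character of the third group
}

// A "column" mixing an old UUIDv4 record with a new UUIDv7 one.
const column = [randomUUID(), "018c8e8a-9d4e-7890-a123-456789abcdef"];
console.log(column.map(uuidVersion));
```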

Benchmark

UUIDv4 generation: 2,847ms
UUIDv7 generation: 2,763ms

Note: My benchmark was run with Node v20.10.0. It measures ID generation only; no database is involved.

// Run: node benchmark-uuid-comparison.js

// UUIDv4 benchmark (Node's built-in generator)
const { randomUUID } = require("crypto");

console.time("UUIDv4 generation");
for (let i = 0; i < 10_000_000; i++) {
  randomUUID();
}
console.timeEnd("UUIDv4 generation");

// UUIDv7 benchmark (uuid package from npm)
const { v7: uuidv7 } = require("uuid");

console.time("UUIDv7 generation");
for (let i = 0; i < 10_000_000; i++) {
  uuidv7();
}
console.timeEnd("UUIDv7 generation");

// Note: the underscores in 10_000_000 are numeric separators
// https://github.com/pH-7/GoodJsCode#-clearreadable-numbers

The real performance difference shows up in actual database operations where UUIDv7's sequential nature prevents index fragmentation.

Downsides...

The main case where you should still use UUIDv4 is when you explicitly don't want temporal ordering. This happens when you're building security tokens, API keys, or session IDs, where predictability could be a security concern. The problem is that UUIDv7's timestamp-based structure reveals when the identifier was created. In these scenarios you need something completely unpredictable: purely random identifiers that don't leak any information about creation time.
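For that kind of value, here is one possible sketch using Node's built-in cryptographic random source. Note that it uses raw random bytes rather than a UUID (crypto.randomUUID() would also avoid the timestamp), and `generateToken` is a hypothetical helper name.

```javascript
const { randomBytes } = require("crypto");

// Hypothetical helper: random bytes from the OS CSPRNG, encoded as
// URL-safe base64 without padding. No timestamp or ordering is embedded.
function generateToken(byteLength = 32) {
  return randomBytes(byteLength).toString("base64url");
}

const token = generateToken();
console.log(token);
```

The default of 32 bytes (256 bits of randomness) is an assumption for this example; pick a length that matches your own security requirements.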

However, for database primary keys and foreign keys, UUIDv7 is the clear winner, so it's worth trying in your next project 😉

On large, insert-heavy tables it can give a noticeable boost to database performance, as well as better index efficiency over time, which is what matters most at the end of the day, right? 😊

Alternatives

UUIDv6 is another time-based UUID option (also relatively new and not very popular). It is essentially UUIDv1 with the timestamp fields reordered so that IDs sort by creation time. It provides sequential ordering like UUIDv7, but it still carries a node field (a MAC address or a random node ID) in its structure, which UUIDv7 avoids entirely for privacy reasons.

When to still use UUIDv4

Lastly, UUIDv4 is still perfectly valid for security tokens, API keys, and session IDs (where you don't want creation time leakage). It can also be acceptable in PostgreSQL, where tables are stored as heaps rather than clustered on the primary key, which reduces (but does not remove) the random insertion penalty.