Ban Duong

Posted on Feb 22

The Evolution of UUIDs: How v7 Improves Database Efficiency

#uuid #database #programming

What is a UUID?
UUID Format and Generation Methods
Comparing UUID v1, v4, and v7
How UUID v4 and UUID v7 Affect Databases
- Understanding B-Tree Indexing
- Page Splits in B-Trees
Why UUID v4 Causes Fragmentation and Cache Inefficiency
- Example of UUID v4 Inserts
- How Random Writes Lead to Cache Inefficiency
Why UUID v7 is Better for Databases
- Example of UUID v7 Inserts
- Comparison of UUID v4 vs UUID v7 in Databases
Conclusion
- TL;DR
- Long Version

1. What is a UUID?

A UUID (Universally Unique Identifier) is a 128-bit unique identifier used in computing and data management. It ensures global uniqueness without requiring a central authority, making it ideal for distributed systems.

UUIDs are commonly used in:

Databases (as primary keys to uniquely identify records).
Distributed systems (ensuring uniqueness across multiple nodes).
Session tracking (assigning unique session IDs for users).
Transaction identifiers (maintaining consistency in financial systems).

2. UUID Format and Generation Methods

A UUID is represented as a 36-character string with a standardized format:

xxxxxxxx-xxxx-Mxxx-Nxxx-xxxxxxxxxxxx

M → UUID version (1, 4, 7, etc.).
N → Variant (typically 8, 9, A, or B).
Other parts contain timestamps, random values, or hashed data, depending on the UUID version.
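
The layout above can be checked with Python's standard `uuid` module. This is a small sketch, not part of any particular database setup; it generates a v4 UUID and reads the version and variant characters at the M and N positions.

```python
import uuid

u = uuid.uuid4()
text = str(u)

print(text)
print(len(text))     # 32 hex digits + 4 hyphens
print(text[14])      # the M position: version digit
print(text[19])      # the N position: variant digit (8, 9, a, or b)
print(u.version)     # version as parsed by the uuid module
```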

How UUIDs Are Generated

UUIDs can be created using different methods:

Time-based (e.g., UUID v1, UUID v7) → Uses timestamps, sometimes combined with MAC addresses or randomness.
Random (e.g., UUID v4) → Generated using a random number generator.
Hash-based (e.g., UUID v3, UUID v5) → Generated using hashes of fixed input values.
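
As a rough illustration, the three generation methods map onto functions in Python's standard `uuid` module. The name `"example.com"` is just a placeholder input. UUID v7 is omitted here, on the assumption that a built-in v7 generator is unavailable on older Python versions.

```python
import uuid

v1 = uuid.uuid1()                                   # time-based: timestamp + node ID (often the MAC address)
v3 = uuid.uuid3(uuid.NAMESPACE_DNS, "example.com")  # hash-based: MD5 of namespace + name
v4 = uuid.uuid4()                                   # random
v5 = uuid.uuid5(uuid.NAMESPACE_DNS, "example.com")  # hash-based: SHA-1 of namespace + name

for u in (v1, v3, v4, v5):
    print(u.version, u)

# Hash-based UUIDs are deterministic: the same input always yields the same UUID.
same_again = uuid.uuid5(uuid.NAMESPACE_DNS, "example.com")
print(v5 == same_again)
```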

3. Comparing UUID v1, v4, and v7

Feature	UUID v1 (Timestamp + MAC)	UUID v4 (Random)	UUID v7 (Timestamp + Random)
Generation Method	Time-based + MAC address	Fully random	Time-based + Random
Uniqueness	High (MAC ensures uniqueness)	High (randomized)	High (timestamp + random)
Sortability	❌ Not sortable as stored (fast-changing low timestamp bits come first)	❌ Not sortable	✅ Time-ordered (millisecond precision)
Database Performance	❌ Poor index locality due to timestamp field order	❌ Random inserts (poor indexing)	✅ Optimized for indexing
Privacy	❌ MAC address is exposed	✅ No embedded information	✅ No MAC address (creation time is visible)
Use Case	Legacy systems	General purpose, distributed systems	Databases, logs, event tracking
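
To see why UUID v1 does not sort by time even though it contains a timestamp, here is a small sketch. The helper `v1_from_timestamp` is hypothetical and exists only for illustration: it packs a 60-bit timestamp into the v1 field layout, where the low 32 bits of the timestamp come first in the string.

```python
import uuid

def v1_from_timestamp(ts: int) -> uuid.UUID:
    """Build a v1-layout UUID from a 60-bit timestamp (illustrative helper)."""
    time_low = ts & 0xFFFFFFFF                          # lowest 32 bits lead the string
    time_mid = (ts >> 32) & 0xFFFF
    time_hi_version = ((ts >> 48) & 0x0FFF) | 0x1000    # top nibble = version 1
    # Fixed clock sequence and node so only the timestamp differs.
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version, 0x80, 0x00, 0x000000000001))

earlier = v1_from_timestamp(0x0FFFFFFFF)   # just before the low 32 bits wrap around
later = v1_from_timestamp(0x100000000)     # one tick later

print(earlier, earlier.time)
print(later, later.time)
print(earlier.time < later.time)    # chronological order
print(str(earlier) < str(later))    # string (index) order
```

The later UUID sorts before the earlier one as a string, which is the ordering a text or byte-wise index would see.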

4. How UUID v4 and UUID v7 Affect Databases

1. Understanding B-Tree Indexing

Most relational databases (like MySQL InnoDB, PostgreSQL, and SQLite) use B-Tree indexes to organize primary keys efficiently.

How B-Trees Work

A B-Tree is a self-balancing tree structure where nodes store multiple sorted keys.
When searching for a key (like a UUID), the database traverses the tree from root to leaf.
Since data is sorted, operations like searching, inserting, and deleting run in O(log n) time complexity.
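
A back-of-the-envelope sketch of the O(log n) claim: if every page holds about `fanout` keys, the number of levels a lookup traverses grows very slowly with the number of rows. The fanout of 100 is an assumed round number, not a measured value for any specific database.

```python
def btree_levels(num_keys: int, fanout: int) -> int:
    """Smallest number of levels whose capacity (fanout ** levels) covers num_keys."""
    levels, capacity = 1, fanout
    while capacity < num_keys:
        capacity *= fanout
        levels += 1
    return levels

for n in (1_000, 1_000_000, 1_000_000_000):
    print(n, btree_levels(n, fanout=100))
```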

2. Page Splits in B-Trees

Each node (page) has a fixed size (e.g., 16 KB by default in MySQL InnoDB).
A new key is placed at its sorted position, and it usually fits into an existing page that still has free space.
But if that page is full, the database splits it into two pages, each roughly half full, which increases fragmentation.

5. Why UUID v4 Causes Fragmentation and Cache Inefficiency

UUID v4 is fully random, meaning:

New inserts land anywhere in the index, not in a predictable order.
The database must modify different pages, causing frequent page splits and fragmentation.
Recently inserted records end up on pages scattered across the whole index, so queries on recent data must load many pages, making caching inefficient.

Example of UUID v4 Inserts

Imagine a B-Tree index with 4 pages, each storing sorted UUIDs:

Page 1: [ UUID1 | UUID5 | UUID8 | UUID12 ]  
Page 2: [ UUID15 | UUID18 | UUID22 | UUID26 ]  
Page 3: [ UUID30 | UUID35 | UUID40 | UUID45 ]  
Page 4: [ UUID50 | UUID55 | UUID60 | UUID65 ]

Now, inserting a random UUID v4 (UUID33):

It lands between UUID30 and UUID35 in Page 3.
If Page 3 is full, the database splits it into two pages.
Each split leaves two partly empty pages, so repeated random inserts spread the data over more pages and make reads slower.
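
The UUID33 example can be replayed with a toy model. This sketch uses plain integers in place of UUIDs and a list of sorted lists in place of real pages; `insert_key` and `PAGE_CAPACITY` are illustrative names, and real engines choose split points with more care.

```python
import bisect

PAGE_CAPACITY = 4  # keys per page, matching the four-key pages in the example

def insert_key(pages, key):
    """Insert key into a list of sorted pages; return True if a page split occurred."""
    # Pick the last page whose first key is <= key (or the first page).
    first_keys = [page[0] for page in pages]
    idx = max(bisect.bisect_right(first_keys, key) - 1, 0)
    page = pages[idx]
    if len(page) < PAGE_CAPACITY:
        bisect.insort(page, key)
        return False
    # Page is full: split it in half, then insert into the matching half.
    mid = len(page) // 2
    left, right = page[:mid], page[mid:]
    bisect.insort(left if key < right[0] else right, key)
    pages[idx:idx + 1] = [left, right]
    return True

pages = [
    [1, 5, 8, 12],
    [15, 18, 22, 26],
    [30, 35, 40, 45],
    [50, 55, 60, 65],
]
split = insert_key(pages, 33)
print(split)
for page in pages:
    print(page)
```

After the insert, the old Page 3 has become two pages that are only partly filled, which is the wasted space the text calls fragmentation.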

How Random Writes Lead to Cache Inefficiency

Databases store frequently accessed pages in memory (buffer pool).
But since UUID v4 spreads inserts randomly across the index, successive writes touch different pages, constantly evicting older pages from the buffer pool.
This leads to more disk I/O and poorer cache performance.
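
A minimal sketch of this effect, assuming a simple LRU buffer pool (real buffer pools use more elaborate policies). The page counts, pool size, and the figure of 50 inserts per page are made-up values chosen only to make the contrast visible.

```python
import random
from collections import OrderedDict

def hit_rate(page_accesses, pool_size):
    """Fraction of page accesses served from an LRU pool of pool_size pages."""
    pool = OrderedDict()
    hits = 0
    for page in page_accesses:
        if page in pool:
            hits += 1
            pool.move_to_end(page)        # mark as most recently used
        else:
            pool[page] = True
            if len(pool) > pool_size:
                pool.popitem(last=False)  # evict the least recently used page
    return hits / len(page_accesses)

rng = random.Random(0)
TOTAL_PAGES = 10_000   # assumed index size in pages
POOL_SIZE = 500        # assumed buffer pool size in pages
NUM_INSERTS = 50_000

# Random keys: each insert lands on an arbitrary page of the index.
random_pattern = [rng.randrange(TOTAL_PAGES) for _ in range(NUM_INSERTS)]
# Ordered keys: inserts stay on the current last page (assumed 50 inserts per page).
ordered_pattern = [i // 50 for i in range(NUM_INSERTS)]

random_hits = hit_rate(random_pattern, POOL_SIZE)
ordered_hits = hit_rate(ordered_pattern, POOL_SIZE)
print(f"random keys:  {random_hits:.2%} cache hits")
print(f"ordered keys: {ordered_hits:.2%} cache hits")
```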

6. Why UUID v7 is Better for Databases

UUID v7 solves the fragmentation issue because:

The first 48 bits are a Unix timestamp in milliseconds, so values sort by creation time.
Of the remaining 80 bits, 6 encode the version and variant, and the other 74 are random, ensuring uniqueness.
New inserts almost always go to the last page in a B-Tree index, which avoids most fragmentation.
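
To make the bit layout concrete, here is a minimal sketch of a v7-style generator. `uuid7_sketch` is a hypothetical helper, not a library function: it places a 48-bit millisecond timestamp in front, fills the rest with random bits, and then overwrites the 4 version bits and 2 variant bits. It makes no attempt to keep values ordered within the same millisecond, which production generators may handle with a counter.

```python
import os
import time
import uuid

def uuid7_sketch() -> uuid.UUID:
    """Assemble a UUID v7-style value (illustrative, not a full implementation)."""
    unix_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")   # 80 random bits; 6 are overwritten below
    value = (unix_ms & ((1 << 48) - 1)) << 80 | rand
    value &= ~(0xF << 76)   # clear the version nibble
    value |= 0x7 << 76      # version 7
    value &= ~(0x3 << 62)   # clear the variant bits
    value |= 0x2 << 62      # variant "10"
    return uuid.UUID(int=value)

first = uuid7_sketch()
time.sleep(0.002)           # make sure the millisecond timestamp advances
second = uuid7_sketch()

print(first)
print(second)
print(str(first) < str(second))   # later UUID sorts after the earlier one
```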

Example of UUID v7 Inserts

Page 1: [ UUID1 | UUID5 | UUID8 | UUID12 ]  
Page 2: [ UUID15 | UUID18 | UUID22 | UUID26 ]  
Page 3: [ UUID30 | UUID35 | UUID40 | UUID45 ]  
Page 4: [ UUID50 | UUID55 | UUID60 | UUID65 ]  
Page 5: [ UUID70 | UUID75 | UUID80 | UUID85 ]  ← New inserts go here

Since new entries almost always append to the last page:

Fewer page splits → Less fragmentation → Faster inserts.
Recent data stays in memory, so queries on it hit the cache → Better cache performance.

Comparison of UUID v4 vs UUID v7 in Databases

Factor	UUID v4 (Random Inserts)	UUID v7 (Sequential Inserts)
Write Pattern	Inserts anywhere, causing fragmentation	Writes append to the last page
Page Splits	❌ Frequent, due to random insert locations	✅ Rare, only when last page is full
Cache Efficiency	❌ Poor – random pages evicted frequently	✅ High – recent data stays in memory
Query Performance	❌ Slow – queries on recent rows load many scattered pages from disk	✅ Fast – queries on recent rows read from cached pages
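
The comparison can be reproduced in miniature with a toy index. This is a sketch under stated assumptions: integers stand in for UUIDs, `PAGE_CAPACITY` is an arbitrary value, full pages split in half, and an ascending insert past the end of the index opens a fresh page instead of splitting, loosely mirroring how many engines treat append-style inserts. The same keys are inserted twice, once in sorted order (v7-like) and once shuffled (v4-like).

```python
import bisect
import random

PAGE_CAPACITY = 64  # assumed keys per page

def build_index(keys):
    """Insert keys one by one; return (pages, number_of_mid_index_splits)."""
    pages, firsts, splits = [], [], 0
    for key in keys:
        if not pages:
            pages.append([key])
            firsts.append(key)
            continue
        idx = max(bisect.bisect_right(firsts, key) - 1, 0)
        page = pages[idx]
        if len(page) < PAGE_CAPACITY:
            bisect.insort(page, key)
            firsts[idx] = page[0]
        elif idx == len(pages) - 1 and key > page[-1]:
            # Ascending insert past the end: start a new rightmost page.
            pages.append([key])
            firsts.append(key)
        else:
            # Full page in the middle of the index: split it in half.
            mid = len(page) // 2
            left, right = page[:mid], page[mid:]
            bisect.insort(left if key < right[0] else right, key)
            pages[idx:idx + 1] = [left, right]
            firsts[idx:idx + 1] = [left[0], right[0]]
            splits += 1
    return pages, splits

def fill_factor(pages):
    """Average fraction of each page that is actually used."""
    return sum(len(p) for p in pages) / (len(pages) * PAGE_CAPACITY)

ordered_keys = list(range(20_000))
shuffled_keys = ordered_keys[:]
random.Random(42).shuffle(shuffled_keys)

seq_pages, seq_splits = build_index(ordered_keys)
rand_pages, rand_splits = build_index(shuffled_keys)
seq_fill, rand_fill = fill_factor(seq_pages), fill_factor(rand_pages)

print(f"ordered:  {len(seq_pages)} pages, {seq_splits} splits, {seq_fill:.0%} full")
print(f"shuffled: {len(rand_pages)} pages, {rand_splits} splits, {rand_fill:.0%} full")
```

In this model the shuffled keys need noticeably more pages to hold the same data, because every split leaves partly empty pages behind.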

7. Conclusion

TL;DR

Why UUID v4 is inefficient:

Random inserts cause index fragmentation and frequent page splits.
Queries over recently created rows suffer, since those rows are spread across many pages.
Cache performance drops, as pages are constantly replaced.

Why UUID v7 is better:

Sorted inserts lead to fewer page splits.
Faster reads of recent or time-adjacent rows, which sit on one page or a few adjacent pages.
Recent data remains in memory, improving database performance.

Long Version

Why UUID v4 is Inefficient

1. Random Inserts Cause Index Fragmentation

UUIDv4 values are completely random, meaning new entries can be inserted anywhere in the B-Tree index. This leads to data fragmentation, as records are scattered across different pages instead of being grouped together.

2. Frequent Page Splits Increase Write Overhead

When a UUID v4 is inserted into a full page, the database must split the page to make room for new values. Since inserts happen randomly, page splits occur more frequently, increasing the database's workload and reducing efficiency.

3. Poor Sequential Read Performance

Because UUID v4 values carry no ordering, rows created close together in time end up far apart in the index, so reading them requires fetching data from multiple non-contiguous pages. This results in:

More disk I/O, slowing down queries.
Inefficient caching, as different pages are loaded into memory instead of reusing recently accessed ones.

Why UUID v7 is Better

1. Sequential Inserts Improve Write Performance

UUID v7 values sort by creation timestamp, meaning new values are almost always appended to the latest page. This results in:

Fewer random writes, reducing fragmentation.
Minimal page splits, since new entries naturally go to the end of the index.

2. Faster Sequential Reads

Since UUID v7 values are stored in increasing time order, queries over a time range can retrieve data from a single page or a few adjacent pages, making range scans much faster.

3. Better Cache Efficiency

With UUID v7, recent data remains in memory longer because:

New records are appended rather than scattered.
Queries accessing recent entries will likely hit cached pages, reducing disk reads and improving performance.
