Ban Duong

The Evolution of UUIDs: How v7 Improves Database Efficiency

Table of Contents

  1. What is a UUID?
  2. UUID Format and Generation Methods
  3. Comparing UUID v1, v4, and v7
  4. How UUID v4 and UUID v7 Affect Databases
  5. Why UUID v4 Causes Fragmentation and Cache Inefficiency
  6. Why UUID v7 is Better for Databases
  7. Conclusion

1. What is a UUID?

A UUID (Universally Unique Identifier) is a 128-bit identifier used in computing and data management. It makes collisions so improbable that values can be treated as globally unique without a central authority, which makes it ideal for distributed systems.

UUIDs are commonly used in:

  • Databases (as primary keys to uniquely identify records).
  • Distributed systems (ensuring uniqueness across multiple nodes).
  • Session tracking (assigning unique session IDs for users).
  • Transaction identifiers (maintaining consistency in financial systems).

2. UUID Format and Generation Methods

A UUID is represented as a 36-character string with a standardized format:

xxxxxxxx-xxxx-Mxxx-Nxxx-xxxxxxxxxxxx
  • M → UUID version (1, 4, 7, etc.).
  • N → Variant (typically 8, 9, A, or B).
  • Other parts contain timestamps, random values, or hashed data, depending on the UUID version.
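
To make the layout above concrete, here is a minimal Python sketch that reads the M and N digits straight out of the string form and cross-checks them with the standard `uuid` module. The helper name `describe` is made up for this illustration.

```python
import uuid

def describe(text: str) -> tuple[str, str]:
    """Return the version digit (M) and the variant digit (N) of a UUID string."""
    groups = text.split("-")            # five groups: 8-4-4-4-12 hex digits
    return groups[2][0], groups[3][0]   # M opens group 3, N opens group 4

sample = str(uuid.uuid4())
m, n = describe(sample)
print(sample, "| version digit:", m, "| variant digit:", n)

# The standard library derives the same fields from the 128-bit value.
parsed = uuid.UUID(sample)
print(parsed.version, parsed.variant == uuid.RFC_4122)
```

For a v4 UUID the version digit is `4`, and the variant digit is one of `8`, `9`, `a`, or `b`.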

How UUIDs Are Generated

UUIDs can be created using different methods:

  • Time-based (e.g., UUID v1, UUID v7) → Uses timestamps, sometimes combined with MAC addresses or randomness.
  • Random (e.g., UUID v4) → Generated using a random number generator.
  • Hash-based (e.g., UUID v3, UUID v5) → Generated by hashing a namespace and a name (MD5 for v3, SHA-1 for v5), so the same input always yields the same UUID.
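
The three families can be tried out with Python's standard `uuid` module. This is only an illustration: the name `"example.com"` is an arbitrary input chosen for the demo, and UUID v7 is left out because the sketch sticks to functions that have long been part of the standard library.

```python
import uuid

time_based = uuid.uuid1()     # timestamp + node ID (often the MAC address)
random_based = uuid.uuid4()   # random bits plus fixed version/variant bits

# Hash-based: the same namespace and name always produce the same UUID.
hash_md5 = uuid.uuid3(uuid.NAMESPACE_DNS, "example.com")
hash_sha1 = uuid.uuid5(uuid.NAMESPACE_DNS, "example.com")

for label, value in [("v1", time_based), ("v4", random_based),
                     ("v3", hash_md5), ("v5", hash_sha1)]:
    print(label, value, "version:", value.version)
```

Running it twice changes the v1 and v4 values but not the v3 and v5 ones, which is the practical difference between the generation methods.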

3. Comparing UUID v1, v4, and v7

| Feature | UUID v1 (Timestamp + MAC) | UUID v4 (Random) | UUID v7 (Timestamp + Random) |
|---|---|---|---|
| Generation Method | Time-based + MAC address | Fully random | Time-based + Random |
| Uniqueness | High (timestamp + clock sequence + MAC) | High (122 random bits) | High (timestamp + random) |
| Sortability | ❌ Not sortable as stored: the low-order timestamp bits come first | ❌ Not sortable | ✅ Time-ordered (millisecond precision) |
| Database Performance | ❌ Causes fragmentation due to timestamp field order | ❌ Random inserts (poor index locality) | ✅ Optimized for indexing |
| Privacy | ❌ MAC address is exposed | ✅ No embedded host data | ✅ No MAC address (creation time is visible) |
| Use Case | Legacy systems | General purpose, distributed systems | Databases, logs, event tracking |

4. How UUID v4 and UUID v7 Affect Databases

1. Understanding B-Tree Indexing

Most relational databases (like MySQL InnoDB, PostgreSQL, and SQLite) use B-Tree indexes to organize primary keys efficiently.

How B-Trees Work

  • A B-Tree is a self-balancing tree structure where nodes store multiple sorted keys.
  • When searching for a key (like a UUID), the database traverses the tree from root to leaf.
  • Since data is sorted, operations like searching, inserting, and deleting run in O(log n) time complexity.
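
As a rough sketch of why O(log n) lookups stay cheap, the function below estimates how many levels a tree needs for a given number of keys. It is a simplification: `keys_per_page=100` is an assumed fan-out, and every level is treated as having the same fan-out.

```python
def btree_height(num_keys: int, keys_per_page: int) -> int:
    """Smallest number of levels such that keys_per_page ** levels >= num_keys."""
    levels, capacity = 1, keys_per_page
    while capacity < num_keys:
        capacity *= keys_per_page   # each extra level multiplies reachable keys
        levels += 1
    return levels

for n in (1_000, 1_000_000, 1_000_000_000):
    print(f"{n:>13,} keys -> {btree_height(n, keys_per_page=100)} levels")
```

Under these assumptions a thousand keys need 2 levels, a million need 3, and a billion need 5, so a lookup reads only a handful of pages even for very large tables.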

2. Page Splits in B-Trees

  • Each node (page) has a fixed size (e.g., 16KB in MySQL InnoDB).
  • A new key goes into the page that covers its position in the sort order. If that page has free space, the insert is cheap.
  • If the page is full, the database splits it: it allocates a new page and moves roughly half of the keys there. Both pages are left partly empty, which increases fragmentation.

5. Why UUID v4 Causes Fragmentation and Cache Inefficiency

UUID v4 consists almost entirely of random bits (122 of 128), meaning:

  • New inserts land anywhere in the index, not in a predictable order.
  • The database must modify different pages, causing frequent page splits and fragmentation.
  • Queries on recent records require loading multiple scattered pages, making caching inefficient.

Example of UUID v4 Inserts

Imagine a B-Tree index with 4 pages, each storing sorted UUIDs:

Page 1: [ UUID1 | UUID5 | UUID8 | UUID12 ]  
Page 2: [ UUID15 | UUID18 | UUID22 | UUID26 ]  
Page 3: [ UUID30 | UUID35 | UUID40 | UUID45 ]  
Page 4: [ UUID50 | UUID55 | UUID60 | UUID65 ]  

Now, inserting a random UUID v4 (UUID33):

  • It lands between UUID30 and UUID35 in Page 3.
  • If Page 3 is full, the database splits it into two pages.
  • More inserts increase fragmentation, making reads slower.
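
The walk-through above can be sketched in a few lines of Python. Integers stand in for the labels UUID1 to UUID65, and the `insert` helper and its half-and-half split are a simplified illustration, not how any particular database engine implements page splits.

```python
import bisect

PAGE_CAPACITY = 4  # matches the four-key pages in the example above

pages = [
    [1, 5, 8, 12],
    [15, 18, 22, 26],
    [30, 35, 40, 45],
    [50, 55, 60, 65],
]

def insert(pages: list[list[int]], key: int) -> int:
    """Insert key into the page covering it; split that page if it overflows."""
    first_keys = [page[0] for page in pages]
    idx = max(bisect.bisect_right(first_keys, key) - 1, 0)  # page whose range holds key
    page = pages[idx]
    bisect.insort(page, key)                                # keep the page sorted
    if len(page) > PAGE_CAPACITY:
        mid = len(page) // 2
        pages[idx:idx + 1] = [page[:mid], page[mid:]]       # one full page becomes two
    return idx

touched = insert(pages, 33)
for number, page in enumerate(pages, start=1):
    print(f"Page {number}: {page}")
```

After the insert, the old Page 3 has become `[30, 33]` and `[35, 40, 45]`: two half-empty pages where there used to be one full page.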

How Random Writes Lead to Cache Inefficiency

  • Databases store frequently accessed pages in memory (buffer pool).
  • But since UUID v4 spreads inserts randomly, each insert touches a different page, constantly evicting older pages from the buffer pool.
  • This leads to more disk I/O and poorer cache performance.
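
The cache effect can be approximated with a toy LRU buffer pool. All numbers here (`NUM_PAGES`, `POOL_SIZE`, `INSERTS`, ten inserts per page in the sequential case) are assumptions picked for the demo, and real buffer pools use more elaborate eviction policies than plain LRU.

```python
import random
from collections import OrderedDict

def count_misses(page_accesses, pool_size: int) -> int:
    """Count accesses that miss an LRU cache holding at most pool_size pages."""
    pool = OrderedDict()
    misses = 0
    for page in page_accesses:
        if page in pool:
            pool.move_to_end(page)        # mark as most recently used
        else:
            misses += 1                   # page must be read from disk
            pool[page] = True
            if len(pool) > pool_size:
                pool.popitem(last=False)  # evict the least recently used page
    return misses

NUM_PAGES, POOL_SIZE, INSERTS = 1_000, 50, 10_000
rng = random.Random(0)

# Random keys: every insert may touch any page of the index.
random_pattern = [rng.randrange(NUM_PAGES) for _ in range(INSERTS)]
# Ordered keys: ten consecutive inserts hit the same page, then move on.
sequential_pattern = [i // 10 for i in range(INSERTS)]

random_misses = count_misses(random_pattern, POOL_SIZE)
sequential_misses = count_misses(sequential_pattern, POOL_SIZE)
print("random:", random_misses, "sequential:", sequential_misses)
```

In this model the ordered pattern misses only once per page, while the random pattern misses on most accesses because the pool holds just a small fraction of the index.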

6. Why UUID v7 is Better for Databases

UUID v7 solves the fragmentation issue because:

  • The first 48 bits are a Unix timestamp in milliseconds, so values sort in creation order.
  • The remaining 80 bits hold the 4-bit version, the 2-bit variant, and 74 random bits, ensuring uniqueness.
  • New inserts land at or near the last page in a B-Tree index, avoiding most fragmentation.
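
Below is a hand-rolled sketch of a UUID v7 value, written so that it does not depend on any particular library's `uuid7` function. It follows the RFC 9562 layout as I understand it: 48-bit millisecond timestamp, 4-bit version, 12 random bits, 2-bit variant, 62 random bits. The function name `uuid7_sketch` is made up, and the sketch does not guarantee ordering between values created in the same millisecond.

```python
import os
import time
import uuid
from typing import Optional

def uuid7_sketch(unix_ms: Optional[int] = None) -> uuid.UUID:
    """Build a UUID v7-style value from a millisecond timestamp and random bits."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")        # 80 random bits to start with
    value = ((unix_ms & ((1 << 48) - 1)) << 80) | rand  # timestamp in the top 48 bits
    value = (value & ~(0xF << 76)) | (0x7 << 76)        # overwrite 4 bits: version 7
    value = (value & ~(0x3 << 62)) | (0x2 << 62)        # overwrite 2 bits: RFC variant
    return uuid.UUID(int=value)

earlier = uuid7_sketch(1_700_000_000_000)
later = uuid7_sketch(1_700_000_000_001)
print(earlier)
print(later)
print(earlier < later)  # the later timestamp always sorts after the earlier one
```

Because the timestamp occupies the most significant bits, comparing two values generated in different milliseconds is decided by creation time alone.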

Example of UUID v7 Inserts

Page 1: [ UUID1 | UUID5 | UUID8 | UUID12 ]  
Page 2: [ UUID15 | UUID18 | UUID22 | UUID26 ]  
Page 3: [ UUID30 | UUID35 | UUID40 | UUID45 ]  
Page 4: [ UUID50 | UUID55 | UUID60 | UUID65 ]  
Page 5: [ UUID70 | UUID75 | UUID80 | UUID85 ]  ← New inserts go here

Since new entries almost always land on the last page:

  • Fewer page splits → Less fragmentation → Faster inserts.
  • Recently written pages stay in memory → Better cache performance for queries on recent data.
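
To get a feel for the difference, this small simulation inserts the same number of ordered and random keys into fixed-size leaf pages. It is a toy model with several assumptions: integers stand in for UUIDs, `capacity=64` is arbitrary, and an insert past the end of the last full page starts a new page instead of splitting, which mirrors the kind of right-most-page optimization engines commonly apply to ascending keys.

```python
import bisect
import random

def simulate_inserts(keys, capacity: int = 64):
    """Insert keys into sorted leaf pages; return (mid-page splits, average fill)."""
    pages = [[]]
    lows = [float("-inf")]   # lower bound of each page; first page accepts anything
    splits = 0
    for key in keys:
        idx = bisect.bisect_right(lows, key) - 1
        page = pages[idx]
        is_append = idx == len(pages) - 1 and (not page or key > page[-1])
        if len(page) < capacity:
            bisect.insort(page, key)
        elif is_append:
            pages.append([key])          # grow the index at its right edge
            lows.append(key)
        else:
            bisect.insort(page, key)
            mid = len(page) // 2
            pages[idx:idx + 1] = [page[:mid], page[mid:]]  # split a full page in half
            lows.insert(idx + 1, page[mid])
            splits += 1
    fill = sum(len(p) for p in pages) / (len(pages) * capacity)
    return splits, fill

rng = random.Random(42)
seq_splits, seq_fill = simulate_inserts(range(20_000))
rand_splits, rand_fill = simulate_inserts(rng.sample(range(10**9), 20_000))
print(f"ordered: {seq_splits} splits, {seq_fill:.0%} full")
print(f"random:  {rand_splits} splits, {rand_fill:.0%} full")
```

In this model the ordered keys never split an existing page and leave pages almost completely full, while the random keys trigger many splits and leave noticeably emptier pages.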

Comparison of UUID v4 vs UUID v7 in Databases

| Factor | UUID v4 (Random Inserts) | UUID v7 (Sequential Inserts) |
|---|---|---|
| Write Pattern | Inserts anywhere, causing fragmentation | Writes append to the last page |
| Page Splits | ❌ Frequent, due to random insert locations | ✅ Rare, only when last page is full |
| Cache Efficiency | ❌ Poor – random pages evicted frequently | ✅ High – recent data stays in memory |
| Query Performance | ❌ Slow – queries require loading multiple pages from disk | ✅ Fast – queries read from cached pages |

7. Conclusion

TL;DR

Why UUID v4 is inefficient:

  • Random inserts cause index fragmentation and frequent page splits.
  • Queries over recently created records suffer, because those records are spread across many pages.
  • Cache performance drops, as pages are constantly replaced.

Why UUID v7 is better:

  • Sorted inserts lead to fewer page splits.
  • Faster reads of recent records, as they sit on a few adjacent pages.
  • Recent data remains in memory, improving database performance.

Long Version

Why UUID v4 is Inefficient

1. Random Inserts Cause Index Fragmentation

UUID v4 values are random, meaning new entries can be inserted anywhere in the B-Tree index. This leads to data fragmentation, as records are scattered across different pages instead of being grouped together.

2. Frequent Page Splits Increase Write Overhead

When a UUID v4 is inserted into a full page, the database must split the page to make room for new values. Since inserts happen randomly, page splits occur more frequently, increasing the database's workload and reducing efficiency.

3. Poor Sequential Read Performance

Because UUID v4 values are unordered, records created around the same time end up on non-contiguous pages, so reading a batch of recent records means fetching many scattered pages. This results in:

  • More disk I/O, slowing down queries.
  • Inefficient caching, as different pages are loaded into memory instead of reusing recently accessed ones.

Why UUID v7 is Better

1. Sequential Inserts Improve Write Performance

UUID v7 is sorted by timestamp, meaning new values are appended to, or land very near, the latest page. This results in:

  • Fewer random writes, reducing fragmentation.
  • Minimal page splits, since new entries naturally go to the end of the index.

2. Faster Sequential Reads

Since UUID v7 values are stored in increasing order, sequential queries can retrieve data from a single page or adjacent pages, making range scans much faster.
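
One way to take advantage of this ordering is to turn a time window into a pair of UUID bounds for a range scan on the key itself. The sketch below is an illustration under two assumptions: the keys follow the RFC 9562 v7 layout, and the database compares UUIDs in plain byte order. The helper names are made up for this example.

```python
import uuid

def uuid7_lower_bound(unix_ms: int) -> uuid.UUID:
    """Smallest UUID v7 value that carries the given millisecond timestamp."""
    value = (unix_ms & ((1 << 48) - 1)) << 80   # 48-bit timestamp in the top bits
    value |= 0x7 << 76                          # version 7
    value |= 0x2 << 62                          # RFC variant; all random bits zero
    return uuid.UUID(int=value)

def uuid7_range(start_ms: int, end_ms: int) -> tuple[uuid.UUID, uuid.UUID]:
    """Bounds for `id >= low AND id < high`, covering start_ms <= t < end_ms."""
    return uuid7_lower_bound(start_ms), uuid7_lower_bound(end_ms)

# A one-minute window; the timestamps are arbitrary example values.
low, high = uuid7_range(1_700_000_000_000, 1_700_000_060_000)
print(low)
print(high)
```

Every v7 key created inside the window falls between `low` (inclusive) and `high` (exclusive), so the scan only walks the adjacent index pages that hold that slice of time.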

3. Better Cache Efficiency

With UUID v7, recent data remains in memory longer because:

  • New records are appended rather than scattered.
  • Queries accessing recent entries will likely hit cached pages, reducing disk reads and improving performance.
