Adil Khan

Posted on Jun 12

UUIDv4 vs UUIDv7 in PostgreSQL: A B-Tree Index Performance Deep Dive

#postgres #python #fastapi #systemdesign

Migrating MedicoSync: UUIDv4 → UUIDv7

(Why I'm doing it, and why it's cheap to defer)

When building MedicoSync (an open-source FastAPI + PostgreSQL medical records platform), hiding internal database IDs was a key requirement on Day 1 [source: 0.1.1].

The Problem with Integers: Simple sequential numbers like /api/v1/patients/42 leak data. A malicious user can guess the next numbers and scan your endpoints like /43, /44, or /45.
The Vulnerability: This mistake is called an IDOR (Insecure Direct Object Reference) vulnerability.

To fix this security risk, I used standard UUIDv4 keys [source: 0.1.1]. It stopped the endpoint scanning problem perfectly. However, random text strings create a hidden slowdown inside relational database storage engines over time.

THE PROBLEM: B-Tree Page Splits with UUIDv4

PostgreSQL keeps database rows organized on disk using a B-Tree index map. Think of your drive storage space as numbered storage boxes that are already completely full:

┌───────────┐ ┌───────────┐ ┌───────────┐ ┌───────────┐
│   Box 1   │ │   Box 2   │ │   Box 3   │ │   Box 4   │
│  [ FULL ] │ │  [ FULL ] │ │  [ FULL ] │ │  [ FULL ] │
└───────────┘ └───────────┘ └───────────┘ └───────────┘

Because UUIDv4 is fully random, a brand-new row ID might belong right in the middle of Box 2. PostgreSQL cannot just append it to the end. It must force Box 2 open, execute a B-Tree page split, and reorganize the disk layout:

┌───────────┐ ┌───────────┐┌───────────┐ ┌───────────┐ ┌───────────┐
│   Box 1   │ │  Box 2A   ││  Box 2B   │ │   Box 3   │ │   Box 4   │
│  [ FULL ] │ │  [ HALF ] ││  [ HALF ] │ │  [ FULL ] │ │  [ FULL ] │
└───────────┘ └───────────┘└─────▲─────┘ └───────────┘ └───────────┘
                                 │
                     [ New random row forced here. ]
                     [ Everything else shifts down. ]

When tables grow to 1 million+ rows, this constant random disk fragmentation slows down write speeds.

THE FIX: UUIDv7's Time-Ordered Prefix

UUIDv7 solves this problem by arranging its internal layout chronologically:

Layout = [ 48-bit UNIX Timestamp ][ 80 Random Bits ]

Because the prefix is time-prefixed, every newly generated UUIDv7 is greater than the last one. PostgreSQL can now cleanly append new rows directly to the very end of the index tree:

┌───────────┐ ┌───────────┐ ┌───────────┐ ┌───────────┐ ┌───────────┐
│   Box 1   │ │   Box 2   │ │   Box 3   │ │   Box 4   │ │   Box 5   │
│  [ FULL ] │ │  [ FULL ] │ │  [ FULL ] │ │  [ FULL ] │ │ [ NEW ROW]│
└───────────┘ └───────────┘ └───────────┘ └───────────┘ └─────▲─────┘
                                                             │
                                                  [ Appends cleanly ]
                                                  [ with zero splits]

No page splits, no disk index reorganizing, and smooth data writing performance at scale.

THE STORAGE TRADE-OFF

You must balance system storage memory against data security:

UUID (v4 or v7) ➔ Consumes 16 bytes of disk and RAM index space.
BIGINT ➔ Consumes 8 bytes of disk and RAM index space.

UUIDs take up double the index storage space compared to standard integers, but they keep sensitive resources hidden.

RESOLVING QUERY OVERHEAD: THE DUAL-ID PATTERN

Fixing the write path with UUIDv7 doesn't fully close the topic. There's a related pattern worth knowing about for systems like this: the internal ID / external ID split, where a lightweight BIGINT is used for internal joins and lookups, while a UUID is exposed externally as the public-facing identifier.

A 16-byte UUID is more expensive to push through CPU cache lines and join operations than an 8-byte integer. At a high enough query volume, decoupling "what the database uses internally" from "what the API exposes externally" can shave meaningful overhead off hot-path queries.

┌────────────────────────────────────────────────────────┐
│                   INCOMING API REQUEST                 │
│              GET /api/v1/patients/[ UUID ]             │
└───────────────────────────┬────────────────────────────┘
                            │ Public Route Boundary
                            ▼
┌────────────────────────────────────────────────────────┐
│                   BACKEND SERVICE LAYER                │
│    1. Match external public 16-byte UUID               │
│    2. Extract internal private 8-byte BIGINT (Fast)    │
└───────────────────────────┬────────────────────────────┘
                            │ Internal Network Only
                            ▼
┌────────────────────────────────────────────────────────┐
│                   POSTGRESQL DATABASE ENGINE           │
│     Executes blazing fast JOINs via internal BIGINT    │
└────────────────────────────────────────────────────────┘

1. The Internal ID (BIGINT / 8 Bytes): Acts as the primary database key for table lookups and internal resource joins. This key never leaves your private backend network.
2. The External ID (UUID / 16 Bytes): A standalone unique column used solely for frontend payloads and public routing links to hide database context from automated scanners.

📝 Note: I haven't implemented this in MedicoSync — at MVP scale, the query-cost difference is negligible compared to the UUIDv7 write-path fix. But I saw this pattern often, and it's the natural next thing to evaluate if MedicoSync's traffic ever grows to where join performance on the patients/records tables becomes a bottleneck. Worth flagging as the "next escalation point" if you're designing a similar system and expect serious scale early. I will publish a full step-by-step implementation guide on this dual-ID abstraction layer next week!

ARCHITECTURAL SUMMARY: WHICH ONE SHOULD YOU CHOOSE?

Choosing between these options depends on the current lifecycle state of your database project:

Scenario 1: Starting a Brand New Project ➔ Use native UUIDv7. It provides security for public endpoints out of the box while naturally preserving index write performance on disk.
Scenario 2: Inheriting a Legacy System (Stuck on UUIDv4) ➔ Do not rush into a high-risk schema rewrite. You can protect endpoints against scanning attacks by embedding tracking metadata directly inside secure, encrypted JWT (JSON Web Token) Payloads. (I will be breaking down this pattern in an upcoming article!).
Scenario 3: High-Throughput Systems ➔ If your system writes millions of entries a day, decouple lookup stress entirely by adding a distributed caching store like Redis to protect your primary database tables.

THE ARCHITECTURAL VERDICT

At MedicoSync's current MVP scale, the UUIDv4 → UUIDv7 gap isn't an urgent fire [source: 0.1.1].

The reason it's cheap to defer is a foundational choice made on Day 1: Alembic migrations [source: 0.1.1]. Because the schema is wrapped in Alembic from the start, updating the SQLAlchemy model's UUID generation strategy and auto-generating the migration is a scaffolded, low-risk change — not a rewrite [source: 0.1.1].

Engineering isn't about shipping a flawless schema on day one [source: 0.1.1]. It's about building something clean enough to evolve when you learn there's a better way [source: 0.1.1].

LET'S CONNECT

🛠️ Building a secure, high-performance Python/FastAPI backend, data pipeline, or custom workflow automation engine? Let’s design your platform together:
👉 Work with me on Fiverr: Fiverr Profile

💾 Explore the full tested architecture codebase:
👉 GitHub Profile: FanoyG GitHub

🛠️ Working through similar backend/schema decisions? Happy to chat — find me here on dev.to or connect on LinkedIn.

DEV Community