ahmet gedik

Posted on Jul 1

Adopting PostgreSQL UUID v7 Primary Keys for Video Metadata at Scale

#postgres #database #scaling #backend

Last quarter our discovery catalog crossed 40 million video-metadata rows spread across eight regions, and our BIGSERIAL primary keys finally became the bottleneck we could no longer paper over. Each region runs its own ingest cron on its own cadence, and merging regional shards meant remapping integer IDs by hand. Worse, our public watch URLs exposed sequential integers, so anyone could fetch two URLs a day apart, subtract, and estimate our daily ingest volume. We run TrendVidStream, a global multi-region streaming-discovery service, and the change that finally stuck was moving primary keys to PostgreSQL UUID v7. This is the concrete story of why we did it and exactly how, including a backfill of 40 million live rows without a maintenance window.

The problem with BIGSERIAL at multi-region scale

We operate eight regional ingest pipelines. Each is a cron job that pulls fresh streaming metadata, normalizes it, and writes it into Postgres. With a single monotonic BIGSERIAL sequence, growth exposed four distinct failures:

Shard merges required ID remapping. When the same video surfaced in two regions, reconciling rows meant translating one region's integer ID space into another. Every foreign key into comments, watch history, and recommendation edges had to be rewritten inside the same transaction.
Sequential IDs leaked business data. A watch URL like /watch/4821993 is a gift to competitors. Catalog velocity is a trade secret for a discovery product, and a plain incrementing counter hands it out for free.
The right-most index page became a write hotspot. Every region appended to the same increasing sequence, so under concurrent ingest the last B-tree leaf page turned into a lock-contention point.
ID generation needed a database round trip. You cannot know a row's ID until after the INSERT returns. That makes client-side reference graphs and large batch inserts awkward, especially in our PHP 8.4 ingest layer where we want to build the whole object graph in memory first.

What we actually needed was globally unique identifiers that any process — PHP, a Go worker, a Python backfill script — could mint locally, with no coordination, while still keeping the locality benefits that made BIGSERIAL fast in the first place.

Why random UUID v4 was the wrong fix

The obvious reaction is "just use uuid with gen_random_uuid()." We tried it in a staging shard and the numbers were ugly. Random UUID v4 keys are uniformly distributed, which means consecutive inserts land in random positions across the primary-key B-tree. The consequences at our row counts:

Index fragmentation. Inserts touch random leaf pages, so the index never stays densely packed. Our v4 staging index was roughly 30% larger on disk than the equivalent BIGSERIAL index for the same row count.
Cache thrashing. Because writes scatter, the pages you need are rarely the pages already in shared_buffers. Buffer hit ratio dropped and read latency on the ingest path climbed.
WAL amplification. Random insert positions trigger more full-page writes, which inflates WAL volume and slows down streaming replication to our read replicas.

UUID v4 solved uniqueness and the leakage problem but traded away the locality that made our hot insert path cheap. UUID v7 is the version that keeps both.

What UUID v7 actually is

UUID v7, standardized in RFC 9562, is a 128-bit identifier whose leading 48 bits are a Unix timestamp in milliseconds, followed by version and variant bits and then random data. The layout matters:

Bits 0-47: Unix epoch milliseconds, big-endian.
Bits 48-51: version (0111 = 7).
Bits 52-63: 12 bits of randomness (rand_a).
Bits 64-65: variant (10).
Bits 66-127: 62 bits of randomness (rand_b).

Because the timestamp is the most significant part, IDs minted later sort after IDs minted earlier. New rows therefore append near the right edge of the B-tree, exactly like BIGSERIAL, so you keep insert locality, dense index pages, and cheap range scans. At the same time the 74 bits of randomness give you collision-resistant global uniqueness without any central sequence. That is the whole pitch: time-ordered like an integer, decentralized like a UUID.

Generating UUID v7 inside PostgreSQL

PostgreSQL 18 ships a native uuidv7() function, which is what we use on our primary cluster. The table definition is refreshingly boring:

-- PostgreSQL 18+: native generator
CREATE TABLE videos (
    uuid       uuid PRIMARY KEY DEFAULT uuidv7(),
    region     text        NOT NULL,
    title      text        NOT NULL,
    duration_s int         NOT NULL,
    published  timestamptz NOT NULL DEFAULT now()
);

If you are still on Postgres 14-17, you do not need an extension. This pure-SQL function reuses gen_random_uuid() for the random bytes, overlays the millisecond timestamp into the first six bytes, and flips the version bits:

-- PostgreSQL 14-17: no extension required
CREATE OR REPLACE FUNCTION uuid_generate_v7()
RETURNS uuid
AS $$
BEGIN
  RETURN encode(
    set_bit(
      set_bit(
        overlay(
          uuid_send(gen_random_uuid())
          PLACING substring(
            int8send(floor(extract(epoch FROM clock_timestamp()) * 1000)::bigint)
            FROM 3
          )
          FROM 1 FOR 6
        ),
        52, 1
      ),
      53, 1
    ),
    'hex'
  )::uuid;
END
$$ LANGUAGE plpgsql VOLATILE;

The int8send call serializes the 64-bit millisecond timestamp; we take its trailing six bytes (FROM 3) because the top two bytes are always zero for current dates, and overlay drops them into the front of the UUID. The two set_bit calls turn the version nibble into 0111. Wrap either form in DEFAULT and your existing INSERT statements need no changes at all.

Generating UUID v7 in the application layer with PHP 8.4

The real win is generating IDs before the row ever reaches the database. Our ingest layer builds an entire object graph — video, channel reference, region tags — in memory, then bulk-inserts it. We mint the IDs in PHP 8.4 so foreign keys are wired up client-side. You can pull in symfony/uid and call Uuid::v7(), but the dependency-free version is short enough that we inlined it:

<?php
declare(strict_types=1);

function uuidv7(): string
{
    // 48-bit millisecond timestamp
    $unixTsMs = (int) (microtime(true) * 1000);
    $tsHex    = str_pad(dechex($unixTsMs), 12, '0', STR_PAD_LEFT);

    // 80 random bits; we overwrite version + variant nibbles below
    $rand = random_bytes(10);
    $rand[0] = chr((ord($rand[0]) & 0x0f) | 0x70); // version 7
    $rand[2] = chr((ord($rand[2]) & 0x3f) | 0x80); // variant 10xx

    $hex = $tsHex . bin2hex($rand);

    return sprintf(
        '%s-%s-%s-%s-%s',
        substr($hex, 0, 8),
        substr($hex, 8, 4),
        substr($hex, 12, 4),
        substr($hex, 16, 4),
        substr($hex, 20, 12)
    );
}

// Build the graph in memory, then one round trip:
$videoId = uuidv7();
$pdo->prepare(
    'INSERT INTO videos (uuid, region, title, duration_s) VALUES (?, ?, ?, ?)'
)->execute([$videoId, 'eu-west', 'Highlight Reel', 212]);

Generating IDs in PHP also lets our SQLite FTS5 search index — which we keep as a per-region read-side cache for instant autocomplete — reference the exact same UUID the moment a row is created, instead of waiting on a RETURNING clause and a second write. The canonical metadata lives in Postgres; the FTS5 mirror just copies the already-known key.

Generating UUID v7 in Go for the ingest workers

Our highest-throughput ingest workers are written in Go, where minting IDs locally removes a round trip from the hot loop entirely. The implementation maps directly onto the RFC layout:

package main

import (
    "crypto/rand"
    "fmt"
    "time"
)

// UUIDv7 returns a time-ordered RFC 9562 UUID v7 string.
func UUIDv7() (string, error) {
    var u [16]byte

    ts := uint64(time.Now().UnixMilli())
    u[0] = byte(ts >> 40)
    u[1] = byte(ts >> 32)
    u[2] = byte(ts >> 24)
    u[3] = byte(ts >> 16)
    u[4] = byte(ts >> 8)
    u[5] = byte(ts)

    if _, err := rand.Read(u[6:]); err != nil {
        return "", err
    }
    u[6] = (u[6] & 0x0f) | 0x70 // version 7
    u[8] = (u[8] & 0x3f) | 0x80 // variant 10xx

    return fmt.Sprintf("%x-%x-%x-%x-%x",
        u[0:4], u[4:6], u[6:8], u[8:10], u[10:16]), nil
}

func main() {
    id, err := UUIDv7()
    if err != nil {
        panic(err)
    }
    fmt.Println(id) // e.g. 0195a3f1-7c2a-7b3e-9d41-3f2a8c1e6b00
}

Because every worker stamps the leading 48 bits from its own clock, IDs minted across eight regions still sort into a single coherent global timeline as long as the boxes are within a second or two of NTP — which they are. No sequence, no coordination, no contention on a shared counter.

Backfilling 40 million live rows without downtime

The hard part was migrating the existing catalog. We could not lock 40 million rows or take the catalog offline. The plan was incremental: add a nullable uuid column, backfill it in small batches from a Python cron job, then promote it to the primary key in a final swap.

The critical property of UUID v7 made this safe: for historical rows we set the timestamp portion from each row's real published time, so the backfilled UUIDs sort in the same order as the original sequence. New rows mint a UUID from the current clock. Both interleave cleanly.

import os
import time
import uuid
import psycopg2
from psycopg2.extras import execute_values

BATCH = 5000

def uuid7(unix_ms: int) -> uuid.UUID:
    rand_b = os.urandom(10)
    raw = bytearray(unix_ms.to_bytes(6, "big") + rand_b)
    raw[6] = (raw[6] & 0x0F) | 0x70  # version 7
    raw[8] = (raw[8] & 0x3F) | 0x80  # variant 10xx
    return uuid.UUID(bytes=bytes(raw))

def backfill(dsn: str) -> None:
    conn = psycopg2.connect(dsn)
    conn.autocommit = False
    with conn.cursor() as cur:
        while True:
            cur.execute(
                "SELECT id, extract(epoch FROM published) * 1000 "
                "FROM videos WHERE uuid IS NULL ORDER BY id LIMIT %s",
                (BATCH,),
            )
            rows = cur.fetchall()
            if not rows:
                break
            updates = [(str(uuid7(int(ms))), pk) for pk, ms in rows]
            execute_values(
                cur,
                "UPDATE videos SET uuid = data.u::uuid "
                "FROM (VALUES %s) AS data(u, id) "
                "WHERE videos.id = data.id",
                updates,
            )
            conn.commit()
            print(f"backfilled {len(rows)} rows")
            time.sleep(0.2)  # keep replication lag low
    conn.close()

if __name__ == "__main__":
    backfill(os.environ["DATABASE_URL"])

The time.sleep(0.2) between commits is deliberate: it keeps replica lag bounded so our read-side search caches never fall behind during the multi-hour backfill. We ran this as an ordinary cron entry, the same way every other job in our FTP-deployed stack ships — no orchestration framework, just a script and a crontab line. When the column was fully populated we added the unique index CONCURRENTLY, then ran one short transaction to swap the primary key over.

Index behavior and time-range partitioning

Once UUID v7 was the primary key, an unexpected bonus appeared: because the leading 48 bits encode time, a range of UUIDs maps directly to a range of time. That makes range partitioning on the UUID column behave almost exactly like partitioning on a timestamp.

CREATE TABLE videos (
    uuid       uuid NOT NULL DEFAULT uuidv7(),
    region     text NOT NULL,
    title      text NOT NULL,
    published  timestamptz NOT NULL DEFAULT now(),
    PRIMARY KEY (uuid)
) PARTITION BY RANGE (uuid);

-- The leading hex of a v7 UUID is the millisecond timestamp,
-- so these bounds carve the table into time windows.
CREATE TABLE videos_2026_q1 PARTITION OF videos
    FOR VALUES FROM ('01952000-0000-7000-8000-000000000000')
                 TO ('01964000-0000-7000-8000-000000000000');

CREATE TABLE videos_2026_q2 PARTITION OF videos
    FOR VALUES FROM ('01964000-0000-7000-8000-000000000000')
                 TO ('01975000-0000-7000-8000-000000000000');

What this bought us in practice:

Cheap retention. Dropping a stale quarter is a DROP TABLE on one partition, not a 40-million-row DELETE.
Pruning for free. Queries that filter on recent activity hit only the newest partitions, because recent UUIDs live in a known numeric range.
Dense indexes again. Inserts append to the active partition's right edge, restoring the BIGSERIAL locality we missed with v4.

What changed in production

After the cutover, the measurable differences across our eight-region deployment were:

Shard merges stopped requiring ID remapping — a row's identity is the same everywhere it appears, so reconciliation became a simple ON CONFLICT DO NOTHING.
Watch URLs stopped leaking catalog size, because the ID no longer encodes a guessable counter.
Insert-path contention dropped, since each region appends to its own right-most leaf pages instead of fighting over one shared sequence.
Index size stayed close to the old BIGSERIAL footprint, far below the random-v4 staging numbers, thanks to time-ordered inserts.

Conclusion

UUID v7 turned out to be the rare change that fixed a structural problem without introducing a new one. It gave us decentralized, globally unique, non-guessable identifiers while preserving the insert locality and small indexes that made integer keys fast. The migration path — nullable column, batched backfill from real timestamps, concurrent index build, atomic swap — let us move 40 million rows with zero downtime on the same plain cron-and-FTP tooling we use for everything else. If you are running a multi-region service still leaning on BIGSERIAL, and especially if your IDs are visible in public URLs, UUID v7 is worth the afternoon. Generate the keys in your application layer, lean on the time prefix for partitioning, and you keep almost everything you liked about sequential integers while shedding the parts that do not scale.

DEV Community