ahmet gedik

Posted on Jun 2

PostgreSQL UUID v7 Primary Keys for Video Metadata at European Scale

#postgres #performance #php #database

Last quarter our video metadata table crossed 47 million rows. Insert latency on the primary key index doubled in six weeks, autovacuum started chewing through I/O at peak European traffic hours (19:00-22:00 CET), and our nightly export to Cloudflare R2 went from 11 minutes to 38. The culprit was not query volume — it was the uuid_generate_v4() we picked three years ago when ViralVidVault was a weekend project. Random UUIDs were shredding our B-tree.

This is the story of migrating 47M rows from UUID v4 to UUID v7 on a live PostgreSQL 16 cluster, without downtime, while staying GDPR-compliant for our European user base. If you run a write-heavy table with UUID primary keys, you probably have the same problem and don't know it yet.

Why Random UUIDs Hurt at Scale

PostgreSQL's default index is a B-tree. B-trees love sequential inserts: new keys land in the rightmost leaf page, that page stays hot in shared_buffers, and you get near-O(1) inserts with minimal WAL churn. Random UUIDs do the opposite. Every insert hits a different leaf page, evicts something useful from cache, and forces a full page write to WAL the first time it's touched in a checkpoint window.

With our workload — roughly 180 video metadata inserts per second during European peak, plus 40 updates/sec for view counters — we measured:

14.2 GB/hour of WAL during peak (mostly full-page writes)
Cache hit ratio on the PK index dropped from 99.4% to 91.7% over six months
p99 insert latency climbed from 3ms to 27ms
pg_stat_user_indexes showed the PK index at 4.2x the size of the table data

UUID v7 fixes this by encoding a millisecond Unix timestamp in the first 48 bits, followed by 74 bits of randomness. Inserts cluster at the right edge of the B-tree, exactly like a bigserial would, but you keep the distributed-generation property that made you pick UUIDs in the first place. RFC 9562 was published in May 2024, so you're not on the bleeding edge anymore.

Generating UUID v7 in PHP 8.4

We run PHP 8.4 on LiteSpeed. The ramsey/uuid library added v7 support in 4.5, but for hot paths I prefer a dependency-free generator. Here's what we ship:

<?php
declare(strict_types=1);

final class UuidV7
{
    public static function generate(): string
    {
        $unixTsMs = (int) (microtime(true) * 1000);
        $randBytes = random_bytes(10);

        $bytes = pack('J', $unixTsMs << 16) . $randBytes;
        $bytes = substr($bytes, 0, 16);

        $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x70);
        $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80);

        $hex = bin2hex($bytes);
        return sprintf(
            '%s-%s-%s-%s-%s',
            substr($hex, 0, 8),
            substr($hex, 8, 4),
            substr($hex, 12, 4),
            substr($hex, 16, 4),
            substr($hex, 20, 12)
        );
    }

    public static function extractTimestamp(string $uuid): int
    {
        $hex = str_replace('-', '', $uuid);
        $tsHex = substr($hex, 0, 12);
        return (int) hexdec($tsHex);
    }
}

$id = UuidV7::generate();
echo $id . "\n";
echo date('c', (int)(UuidV7::extractTimestamp($id) / 1000)) . "\n";

The extractTimestamp helper is genuinely useful in production. We use it for shard routing and for sanity-checking that a row's created_at matches its ID — if they diverge by more than a few seconds, something is wrong with clock sync on the writer.

Benchmarked on our app servers (Xeon Gold 6338, PHP 8.4 with OPcache JIT), this generates 1.1M UUIDs/sec single-threaded. The ramsey/uuid equivalent does about 340K/sec because of object allocation overhead. For a hot ingestion path that matters.

The Postgres Side

PostgreSQL 18 will ship uuidv7() as a built-in. Until then, we use a SQL function. Here's the one we deployed:

CREATE OR REPLACE FUNCTION uuidv7() RETURNS uuid
AS $$
  SELECT encode(
    set_bit(
      set_bit(
        overlay(uuid_send(gen_random_uuid())
                placing substring(int8send((extract(epoch FROM clock_timestamp()) * 1000)::bigint) from 3)
                from 1 for 6),
        52, 1),
      53, 1),
    'hex')::uuid;
$$ LANGUAGE SQL VOLATILE;

CREATE TABLE videos_v7 (
    id uuid PRIMARY KEY DEFAULT uuidv7(),
    youtube_id text NOT NULL,
    title text NOT NULL,
    duration_seconds integer NOT NULL,
    view_count bigint NOT NULL DEFAULT 0,
    region_code char(2) NOT NULL,
    published_at timestamptz NOT NULL,
    fetched_at timestamptz NOT NULL DEFAULT now(),
    metadata jsonb NOT NULL DEFAULT '{}'::jsonb
) WITH (fillfactor = 95);

CREATE INDEX videos_v7_published_brin ON videos_v7 USING brin (published_at) WITH (pages_per_range = 32);
CREATE INDEX videos_v7_region_published ON videos_v7 (region_code, published_at DESC) WHERE view_count > 10000;

Two details worth calling out. First, fillfactor = 95 leaves room for HOT updates on view_count without page splits — critical because we update view counts constantly. Second, the BRIN index on published_at is tiny (under 200KB for the full table) because sequential UUIDs mean physical row order roughly matches insertion time, which roughly matches published_at. With v4 IDs, a BRIN on published_at was useless because rows were scattered.

The partial index on region_code, published_at only covers videos with significant view counts. About 8% of our rows qualify. That index is 340MB instead of the 4.2GB it would be without the WHERE clause.

Migrating 47 Million Rows Without Downtime

This is where most v4-to-v7 migration guides hand-wave. You cannot just ALTER TABLE a primary key on a live 47M-row table. Here's what actually worked.

Step 1: Add a new column with v7 defaults for new rows. Backfill existing rows in batches during off-peak hours.

ALTER TABLE videos ADD COLUMN id_v7 uuid DEFAULT uuidv7();

CREATE INDEX CONCURRENTLY videos_id_v7_idx ON videos (id_v7);

Step 2: Backfill. We did this with a Python script using a server-side cursor and small batches. The naive UPDATE ... WHERE id_v7 IS NULL LIMIT 5000 approach is a trap because it does a sequential scan every iteration.

import psycopg
import time
import uuid_utils as uuid

BATCH_SIZE = 2000
SLEEP_BETWEEN_BATCHES = 0.05

def backfill(conn):
    last_id = '00000000-0000-0000-0000-000000000000'
    processed = 0
    started = time.monotonic()

    while True:
        with conn.cursor() as cur:
            cur.execute(
                """
                SELECT id FROM videos
                WHERE id > %s AND id_v7 IS NULL
                ORDER BY id
                LIMIT %s
                """,
                (last_id, BATCH_SIZE)
            )
            rows = cur.fetchall()
            if not rows:
                break

            updates = [(str(uuid.uuid7()), r[0]) for r in rows]
            cur.executemany(
                "UPDATE videos SET id_v7 = %s WHERE id = %s AND id_v7 IS NULL",
                updates
            )
            conn.commit()
            last_id = rows[-1][0]
            processed += len(rows)

            if processed % 50000 == 0:
                rate = processed / (time.monotonic() - started)
                print(f'{processed:,} rows, {rate:.0f}/sec, last_id={last_id}')

        time.sleep(SLEEP_BETWEEN_BATCHES)

with psycopg.connect('dbname=vvv user=migrator', autocommit=False) as conn:
    backfill(conn)

Keyset pagination on the existing PK avoids the sequential-scan trap. Small batches with a brief sleep keep replication lag under 200ms on our read replicas. The 50ms sleep is enough — without it, we saw lag spike to 4-5 seconds during the backfill.

Total time for 47M rows: 11 hours, mostly run overnight in 4-hour windows. WAL generation stayed manageable (about 9 GB/hour during backfill, well under our archive throughput).

Step 3: Add a NOT NULL constraint and swap the primary key. This is the only step that needs an exclusive lock, and we keep it under 2 seconds.

ALTER TABLE videos ALTER COLUMN id_v7 SET NOT NULL;

BEGIN;
LOCK TABLE videos IN ACCESS EXCLUSIVE MODE;
ALTER TABLE videos DROP CONSTRAINT videos_pkey;
ALTER TABLE videos ADD CONSTRAINT videos_pkey PRIMARY KEY USING INDEX videos_id_v7_idx;
ALTER TABLE videos RENAME COLUMN id TO id_v4_legacy;
ALTER TABLE videos RENAME COLUMN id_v7 TO id;
COMMIT;

Application code reads from id and doesn't care which UUID version sits behind it. We kept id_v4_legacy around for 60 days for our analytics exports that referenced the old IDs, then dropped it.

Cloudflare Workers and Edge Caching

We sit behind Cloudflare in front of LiteSpeed. UUID v7 has a nice property at the edge: because IDs sort chronologically, you can implement efficient cursor-based pagination without an extra index. Here's the Worker we use for our /api/videos endpoint:

export default {
  async fetch(request, env) {
    const url = new URL(request.url);
    const cursor = url.searchParams.get('cursor');
    const region = url.searchParams.get('region') || 'EU';
    const limit = Math.min(parseInt(url.searchParams.get('limit') || '50'), 100);

    const cacheKey = `videos:${region}:${cursor || 'head'}:${limit}`;
    const cached = await env.VIDEO_KV.get(cacheKey, 'json');
    if (cached) {
      return Response.json(cached, {
        headers: { 'cache-control': 'public, max-age=60', 'x-cache': 'HIT' }
      });
    }

    const params = new URLSearchParams({ region, limit: String(limit) });
    if (cursor) params.set('after', cursor);

    const origin = await fetch(`${env.ORIGIN}/api/videos?${params}`, {
      headers: { 'x-internal-token': env.ORIGIN_TOKEN }
    });
    const data = await origin.json();

    await env.VIDEO_KV.put(cacheKey, JSON.stringify(data), { expirationTtl: 60 });
    return Response.json(data, {
      headers: { 'cache-control': 'public, max-age=60', 'x-cache': 'MISS' }
    });
  }
};

The origin query is just SELECT ... FROM videos WHERE region_code = $1 AND id > $2 ORDER BY id DESC LIMIT $3. No separate created_at index needed because UUID v7 IDs already encode time. We dropped a 1.8GB (region_code, created_at) index when we moved to v7.

GDPR Considerations

Serving European users means thinking carefully about what a UUID leaks. UUID v7 embeds a millisecond timestamp, which is metadata. For our videos table this is fine — the row already has published_at and fetched_at columns, and the video itself is public.

For anything tied to a user, we use a different strategy. Our user_sessions table uses UUID v4 specifically because we don't want the session ID to leak the creation time. Our analytics_events table uses UUID v7 but the event IDs are never exposed to clients — they exist only for internal sorting and joining. The rule of thumb we follow:

Public, time-relevant entities (videos, blog posts, channels) — UUID v7
User-linked, privacy-sensitive entities (sessions, consent records, individual events) — UUID v4 or opaque tokens
Server-internal entities never exposed to clients — UUID v7, get the index performance benefit

This maps cleanly to GDPR Article 5(1)(c) — data minimisation. If a client doesn't need to know when a record was created, don't encode it in the ID you hand them.

What We Measured After the Migration

Three weeks after cutover, on the same hardware with similar traffic:

WAL generation: 14.2 GB/hour → 3.8 GB/hour during peak
p99 insert latency: 27ms → 4.1ms
PK index size: 4.2x table data → 1.3x table data after a REINDEX CONCURRENTLY
Cache hit ratio on PK index: 91.7% → 99.6%
Nightly R2 export: 38 minutes → 9 minutes
Autovacuum I/O at peak: dropped roughly 60% based on pg_stat_io

The nightly export win was the biggest surprise. The export reads rows in PK order and ships them to Cloudflare R2 for our cold archive. With v4 IDs, that was effectively a random read pattern against a 280GB table. With v7 IDs, it's a sequential scan and the kernel readahead actually helps.

When Not to Use UUID v7

A few cases where v7 is the wrong answer:

You need unguessable IDs for security purposes (use opaque tokens, not UUIDs of any kind)
You have multi-master writes across regions with significant clock skew — v7 assumes synchronised clocks, and skew above a few seconds will create out-of-order inserts that partially defeat the clustering benefit
Your table is small (under a million rows) and will stay small — v4 is fine and you don't need the migration risk
You're on PostgreSQL 12 or older — gen_random_uuid() requires pgcrypto and the SQL helper above gets uglier

For everything else, especially write-heavy tables on modern Postgres, UUID v7 is the default I'd pick for any new project today. It gives you distributed generation, sortability, B-tree friendliness, and a free timestamp — without forcing you to centralise ID allocation through a sequence server.

Conclusion

Migrating 47M rows from UUID v4 to v7 took us about two weeks of planning and one weekend of careful execution. The payoff was measurable across every metric that mattered: latency, WAL volume, index size, backup duration. If you're running a write-heavy Postgres table with random UUIDs and you're starting to feel the pain, you don't need a new database — you need v7. PostgreSQL 18 will make this even easier with a built-in function, but the SQL helper above works fine on 14 and later. The migration playbook is straightforward once you accept that the backfill has to be batched and paced; the swap itself is a two-second lock window. Worth every minute.

DEV Community