DEV Community

Cover image for 🚀 I Built UUIDs That Look Random But Sort Like Timestamps (50% Smaller Indexes!)
Eugene
Eugene

Posted on

🚀 I Built UUIDs That Look Random But Sort Like Timestamps (50% Smaller Indexes!)

TL;DR

Built a PostgreSQL extension that creates UUIDs looking like random v4 but containing hidden timestamps. Result: 50% smaller indexes with perfect privacy protection.

Available now on PGXN: pgxn install pg_uuid_v8


The Problem Every Backend Dev Faces 🤔

You know this pain:

  • UUID v4: Random, secure, but terrible for DB indexing (fragmentation nightmare)
  • UUID v7: Great for indexing, but reveals creation timing (privacy issues)

For years, we've been stuck choosing between performance and privacy.

The Solution: Steganographic UUIDs 🔮

What if UUIDs could be both random AND fast? Enter steganography - hiding encrypted timestamps inside random-looking UUIDs.

-- Looks like normal UUID v4
SELECT uuid_v8_generate();
-- bf3fcf45-9476-4138-bf48-03933d90dc2d

-- But contains hidden timestamp!
SELECT uuid_stego_extract_timestamp('bf3fcf45-9476-4138-bf48-03933d90dc2d');
-- 1714127712849302 (microseconds since epoch)
Enter fullscreen mode Exit fullscreen mode

Technical Deep Dive 🛠️

UUID Structure

Standard v4:    xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx  
Steganographic: TTTTTTTT-TTTT-4RRR-yRRR-RRRRRRRRRRRR

T = Encrypted timestamp bits (48 bits)
R = Random bits  
4 = Version marker (v4 compliant)
Enter fullscreen mode Exit fullscreen mode

Encryption Algorithm

// XOR encryption with SHA-256 key derivation
uint64 encrypt_timestamp(uint64 timestamp, const char* seed) {
    uint64 key = sha256_derive_key(seed);
    return (timestamp ^ key) & 0xFFFFFFFFFFFFULL;
}
Enter fullscreen mode Exit fullscreen mode

PostgreSQL C Extension Implementation

PG_FUNCTION_INFO_V1(uuid_v8_generate);
Datum uuid_v8_generate(PG_FUNCTION_ARGS) {
    pg_uuid_t *uuid = palloc(sizeof(pg_uuid_t));

    uint64 timestamp = get_current_timestamp_us();
    uint64 key = generate_key_from_seed(stego_seed);
    uint64 encrypted = crypt_timestamp(timestamp, key);

    // Embed in first 48 bits + random fill
    embed_in_uuid(uuid, encrypted);
    set_uuid_v4_bits(uuid);

    PG_RETURN_UUID_P(uuid);
}
Enter fullscreen mode Exit fullscreen mode

Performance Results 📈

Tested on 500k records:

Index Type Size Performance
UUID v4 Full 3.1 MB Sequential Scan
Stego Functional 1.5 MB Index Scan
-- The magic: functional indexing
CREATE INDEX events_time_idx ON events 
USING btree (uuid_stego_extract_timestamp(id));

-- Fast time-based queries
EXPLAIN ANALYZE SELECT * FROM events 
WHERE uuid_stego_extract_timestamp(id) BETWEEN start_ts AND end_ts;
-- Index Scan using events_time_idx (fast!)
Enter fullscreen mode Exit fullscreen mode

Getting Started (Now Available on PGXN!) 📦

Easy Installation

# Install from PGXN (recommended)
pgxn install pg_uuid_v8

# Or build from source
git clone https://github.com/ineron/pg_uuid_v8
cd pg_uuid_v8
make && sudo make install
Enter fullscreen mode Exit fullscreen mode

Setup

CREATE EXTENSION pg_uuid_v8;
SELECT uuid_v8_set_seed('your_secret_2024');

CREATE TABLE events (
    id uuid PRIMARY KEY DEFAULT uuid_v8_generate(),
    data jsonb
);

CREATE INDEX ON events (uuid_stego_extract_timestamp(id));
Enter fullscreen mode Exit fullscreen mode

Queries That Actually Work

-- Efficient pagination
SELECT * FROM events 
WHERE uuid_stego_extract_timestamp(id) > last_timestamp
ORDER BY uuid_stego_extract_timestamp(id) LIMIT 100;

-- Time range analytics  
SELECT count(*) FROM events 
WHERE uuid_stego_in_range(id, '2024-01-01', '2024-12-31');
Enter fullscreen mode Exit fullscreen mode

The Secret Sauce: Functional Indexes 🔑

PostgreSQL's functional indexes are the key. Instead of indexing the full 16-byte UUID, we index the extracted 8-byte timestamp:

-- Traditional approach (large index)
CREATE INDEX ON table (uuid_column);  -- 16 bytes per entry

-- Steganographic approach (compact index)  
CREATE INDEX ON table (uuid_stego_extract_timestamp(uuid_column));  -- 8 bytes per entry
Enter fullscreen mode Exit fullscreen mode

Result: 50% space savings + Index Scan performance!

Security & Encryption Modes 🔒

The extension supports multiple encryption modes:

-- Fast privacy protection (default)
SELECT uuid_v8_set_encryption_mode('XOR');

-- GDPR/compliance ready
SELECT uuid_v8_set_encryption_mode('AES128');

-- Maximum security
SELECT uuid_v8_set_encryption_mode('AES256');
Enter fullscreen mode Exit fullscreen mode

Threat Model

Prevents: Timing analysis attacks, creation pattern discovery

Protects: Privacy-sensitive applications (healthcare, finance)

⚠️ Note: XOR mode provides privacy, AES modes provide cryptographic security

Real-World Use Cases 🎯

Perfect For:

  • High-volume APIs (millions of records daily)
  • Privacy-sensitive systems (healthcare, finance)
  • Multi-tenant platforms (SaaS applications)
  • Time-series data with privacy requirements

Migration Example:

-- Existing UUID v4 tables
ALTER TABLE existing_table 
ADD COLUMN new_id uuid DEFAULT uuid_v8_generate();

-- Create functional index
CREATE INDEX ON existing_table (uuid_stego_extract_timestamp(new_id));

-- Gradually migrate queries
Enter fullscreen mode Exit fullscreen mode

Performance Deep Dive 📊

Index Size Analysis

-- Compare index sizes
SELECT 
    indexname,
    pg_size_pretty(pg_total_relation_size(indexname::regclass)) as size
FROM pg_indexes 
WHERE tablename = 'test_table';

--           indexname           |  size  
-- ------------------------------+--------
--  test_table_uuid_idx          | 3.1 MB
--  test_table_stego_func_idx    | 1.5 MB
Enter fullscreen mode Exit fullscreen mode

Query Performance

-- Query plan comparison
EXPLAIN (ANALYZE, BUFFERS) 
SELECT * FROM large_table 
WHERE uuid_stego_extract_timestamp(id) BETWEEN start_ts AND end_ts;

-- Result: Index Scan using large_table_stego_time_idx
-- Buffers: shared hit=3, Execution Time: 0.040 ms
Enter fullscreen mode Exit fullscreen mode

Lessons Learned 📚

What Worked:

  1. Functional indexes are PostgreSQL's superpower
  2. Standard compliance (UUID v4) ensures easy adoption
  3. PGXN distribution makes installation trivial
  4. Multiple encryption modes satisfy different security requirements

Challenges Overcome:

  1. LLVM bitcode compilation issues (solved with NO_LLVM=1)
  2. Memory management in PostgreSQL extensions (palloc vs malloc)
  3. PGXN metadata spec compliance (v1.0.0 validation)

Architecture Decisions 🏗️

Why C Extension vs PL/pgSQL?

  • Performance: Native speed for cryptographic operations
  • Integration: Deep PostgreSQL type system integration
  • Security: Compiled code vs interpreted SQL

Why Functional Indexes vs Custom Index Types?

  • Compatibility: Works with all PostgreSQL versions
  • Maintenance: Leverages existing B-tree infrastructure
  • Flexibility: Standard PostgreSQL query optimization

Community & Open Source 🌟

  • License: PostgreSQL License (permissive)
  • PGXN: https://pgxn.org/dist/pg_uuid_v8/
  • GitHub: Full source + comprehensive docs
  • Tests: Regression suite across PostgreSQL versions

Try It Yourself! 🚀

# Quick start
pgxn install pg_uuid_v8
psql -c "CREATE EXTENSION pg_uuid_v8; SELECT uuid_v8_generate();"
Enter fullscreen mode Exit fullscreen mode

Benchmark Your Use Case

-- Create test table
CREATE TABLE benchmark (
    id uuid DEFAULT uuid_v8_generate(),
    data text DEFAULT 'sample data'
);

-- Insert test data
INSERT INTO benchmark (data) 
SELECT 'test-' || generate_series(1,100000);

-- Create functional index
CREATE INDEX benchmark_time_idx ON benchmark 
(uuid_stego_extract_timestamp(id));

-- Test performance
EXPLAIN ANALYZE 
SELECT * FROM benchmark 
WHERE uuid_stego_extract_timestamp(id) > 
    (EXTRACT(EPOCH FROM NOW() - INTERVAL '1 hour') * 1000000)::bigint;
Enter fullscreen mode Exit fullscreen mode

What's Next? 🔮

Working on:

  • Version management for seamless upgrades
  • Monitoring functions for performance analytics
  • ORM integrations for popular frameworks
  • Cloud platform deployment guides (AWS RDS, Google Cloud SQL)

Discussion 💬

  • Have you hit UUID performance walls in your applications?
  • What's your current approach to time-based indexing?
  • Tried steganographic UUIDs yet? Drop your benchmark results!

The PostgreSQL ecosystem is amazing for solving real-world problems. What database challenges are you tackling?


pg_uuid_v8

A PostgreSQL extension for steganographic UUIDs with embedded timestamps.

PGXN version PostgreSQL 12+

Overview

pg_uuid_v8 addresses the performance vs privacy trade-off in UUID usage by implementing steganographic UUIDs. These UUIDs maintain full compatibility with the UUID v4 format while embedding hidden timestamps that enable efficient indexing and range queries.

Features

  • UUID v4 Compatibility: Generated UUIDs pass standard v4 validation (correct version and variant bits)
  • Hidden Timestamps: Microsecond-precision timestamps embedded using steganographic techniques
  • Configurable Encryption: XOR, AES-128, and AES-256 modes for timestamp obfuscation
  • Functional Indexing: Support for PostgreSQL functional indexes on extracted timestamps
  • Range Queries: Efficient time-based queries using hidden timestamp data
  • Seed Management: Configurable encryption seeds via PostgreSQL GUC variables

Technical Approach

Standard UUID implementations present a trade-off between indexing performance and timestamp privacy:

  • UUID v4: Random values provide good privacy but result in poor B-tree index performance due to random insertion patterns
  • UUID v7

Building better databases one UUID at a time 🛠️

Links:

Top comments (0)