You ship a clean DDD layout. Aggregates, value objects, repositories, the lot. Then a Postgres consultant looks at your orders table and asks why the primary key index is twice the size it should be and why writes started getting slower around the time the table crossed 50 million rows. The answer is sitting in your domain layer. Your OrderID is a UUID v4: 122 bits of pure randomness. Every insert lands on a different B-tree leaf page, splits it, and burns write IOPS for nothing.
Aggregate identity feels like a footnote when you're sketching the domain. It is not. The ID type is the one piece of the aggregate that touches every layer: constructor, repository, index. URLs and logs too. Pick wrong once and the cost compounds for the lifetime of the system.
Four candidates worth considering: UUID v4, UUID v7, ULID, Snowflake. The differences show up in production.
## What "good identity" actually means for an aggregate
Five properties matter. Different IDs trade them off differently.
- Distributed generation, no coordination. You should be able to mint an ID inside the aggregate constructor on any pod, without calling a sequence server.
- Lexicographic ordering matches creation time. When the string sorts, the rows sort. Cursor pagination, log debugging, time-window queries all get easier.
- Embedded creation timestamp. Reading an ID and recovering "when was this aggregate born" is a small superpower for ops.
- B-tree friendliness on the write side. Monotonic-ish IDs append to the rightmost leaf. Random IDs splatter across the tree, splitting pages and bloating the index.
- URL safety + reasonable length. It will appear in URLs, in logs, on customer support tickets. Shorter and case-tolerant beats longer and confusing.
A sixth one — collision space — is table stakes. Anything below ~80 bits of randomness per millisecond is too risky for a distributed system. All four candidates pass.
## UUID v4: the default that quietly hurts
UUID v4 is 128 bits, 122 of which are random (RFC 9562). It is what uuid.New() gives you in most ecosystems and what most aggregates default to.
```go
package order

import "github.com/google/uuid"

type OrderID uuid.UUID

func NewOrderID() OrderID {
	return OrderID(uuid.New()) // v4, random
}
```
Two problems follow. First, every OrderID you mint is uniformly distributed across the 122-bit random space. That kills B-tree locality on the write side: on a Postgres btree(id) primary key with random UUIDs, every insert touches a near-random leaf page. Under sustained insert load you pay random-write IOPS, frequent page splits, and a noticeably larger index than the equivalent monotonic case. This is easy to reproduce on a laptop with the harness later in this post. Second, you cannot recover when the aggregate was created from the ID alone. Every debugging session that starts with "what happened around 14:32 UTC?" needs a second column.
UUID v4 is fine for low-write tables and for IDs that never see an index. For an aggregate root that gets inserted thousands of times per second, it is the wrong default.
## UUID v7: v4's better-behaved successor
UUID v7 was standardised in RFC 9562 (May 2024, replacing the old RFC 4122). The layout is the part that matters: 48 bits of Unix milliseconds at the front, then a 4-bit version field, then 74 bits split between two random fields (rand_a is 12 bits, rand_b is 62 bits) with the 2-bit variant in the middle. What that buys you:
- IDs minted in the same millisecond differ only in their random tail. The string representation sorts close to chronological order.
- Writes hit a small, hot region of the B-tree — the rightmost leaf page and a few neighbours. Page splits drop. Index size drops. Range scans for "the last hour of orders" get cheap.
```go
package order

import "github.com/google/uuid"

type OrderID uuid.UUID

func NewOrderID() (OrderID, error) {
	u, err := uuid.NewV7()
	if err != nil {
		return OrderID{}, err
	}
	return OrderID(u), nil
}

func (id OrderID) String() string {
	return uuid.UUID(id).String()
}
```
The randomness budget is 74 bits, not 122. That's still ~1.9e22 distinct IDs per millisecond. Collision risk is not the deciding factor.
The one footgun: not every Go UUID library returns monotonic v7 IDs within a millisecond. The spec allows it, the spec does not require it. google/uuid v1.6+ supports NewV7. If you need strict per-process monotonicity (rare for aggregates, common for event-sourcing keys), check the library's docs or pick a v7 lib that documents monotonic counters.
## ULID: same idea, friendlier on the eye
ULID is a 128-bit ID with 48 bits of millisecond timestamp and 80 bits of randomness, encoded in 26 characters of Crockford base32 (no I, L, O, U). Case-insensitive and URL-safe. It predates UUID v7 by years and was the original "lexicographically sortable UUID" answer. The spec also defines a monotonic generation mode where same-millisecond IDs increment the 80-bit random component instead of regenerating it.
```go
package order

import (
	"crypto/rand"
	"time"

	"github.com/oklog/ulid/v2"
)

type OrderID ulid.ULID

// Note: a Monotonic entropy source is not safe for concurrent use;
// guard it with a mutex if NewOrderID runs on multiple goroutines.
var entropy = ulid.Monotonic(rand.Reader, 0)

func NewOrderID() (OrderID, error) {
	u, err := ulid.New(ulid.Timestamp(time.Now()), entropy)
	if err != nil {
		return OrderID{}, err
	}
	return OrderID(u), nil
}

func (id OrderID) String() string {
	return ulid.ULID(id).String()
}
```
A ULID renders as 01ARZ3NDEKTSV4RRFFQ69G5FAV — 26 characters versus a UUID's 36. That difference shows up in every URL, every log line, every JSON payload. Over a billion rows it is real bytes saved.
The trade-off: Postgres has a native uuid type with 16-byte storage and standard tooling. ULIDs get stored either as bytea (16 bytes, but opaque to standard tooling, with no native pretty-printing) or as text (26 bytes, so larger rows and indexes). UUID v7 gets you most of ULID's ergonomics (same B-tree locality, same embedded timestamp) with native database support. For greenfield Postgres-backed aggregates, that often tips the decision toward v7.
## Snowflake: 64 bits, but you pay for it
Twitter's Snowflake (the ID format originally published by Twitter in 2010, unrelated to the Snowflake data warehouse from Snowflake Inc.) has a layout of 1 sign bit + 41 bits of millisecond timestamp (Twitter's epoch, ~69 years of headroom) + 10 bits of machine ID + 12 bits of per-millisecond sequence. The whole thing fits in a signed int64.
```go
package order

import "github.com/bwmarrin/snowflake"

var node *snowflake.Node

func InitNode(machineID int64) error {
	n, err := snowflake.NewNode(machineID)
	if err != nil {
		return err
	}
	node = n
	return nil
}

type OrderID int64

func NewOrderID() OrderID {
	return OrderID(node.Generate().Int64())
}
```
8-byte primary keys (half a UUID), strictly monotonic per-machine output, a clear "later ID always means later creation" guarantee. The price: 10 bits of machine ID is now configuration you own. Every pod, every Lambda, every cron worker that mints IDs needs a unique value in [0, 1024). Container schedulers don't hand those out, so you end up with a sidecar service, a ConfigMap, or a leases table: all infrastructure you didn't have before. The other gotcha is the 41-bit timestamp horizon. The millisecond counter runs out roughly 69 years after whatever epoch you choose, and the rollover is silent. Pick the wrong epoch, or blindly accept the library default, and that failure is already scheduled.
Snowflake makes sense when ID size is the bottleneck: high-volume event tables, IoT telemetry, anything with a 16-byte primary key that the team already wishes were 8. For most line-of-business aggregates, 8 bytes saved per row is not worth the operational tax.
## A B-tree shape benchmark you can run
The "random IDs hurt your index" claim is easy to verify. The skeleton below is the inner loop you can drop into a small program that creates four id tables, calls benchInsert for each ID generator, and then reads pg_relation_size('orders_v4_id_idx') etc. for the index sizes. What you should expect to see (not what is reported here — run it on your own hardware before quoting any figure in a PR review):
```go
package main

import (
	"context"
	"time"

	"github.com/jackc/pgx/v5"
)

const N = 1_000_000

func benchInsert(
	ctx context.Context,
	conn *pgx.Conn,
	table string,
	gen func() string,
) (time.Duration, error) {
	start := time.Now()
	for i := 0; i < N; i++ {
		_, err := conn.Exec(
			ctx,
			"INSERT INTO "+table+"(id) VALUES ($1)",
			gen(),
		)
		if err != nil {
			return 0, err
		}
	}
	return time.Since(start), nil
}
```
The expected shape on an unloaded Postgres 16 with synchronous_commit=off: UUID v4 inserts are slow and the index ends up bigger. v7, ULID, Snowflake all cluster together at the fast end. The exact ratio depends on shared_buffers and cache warmth. Expect a meaningful, often several-fold difference in insert throughput between random and monotonic IDs at sustained high concurrency on a tight buffer pool. Expect very little difference if your table fits in RAM.
The point of running it yourself is not to learn that v7 is faster than v4 (it is). It is to see the gap on your hardware, with your row sizes, with your fill factor. That number is what should drive the decision.
## The decision rule
Short form, in the order you should ask the questions:
- Are you on Postgres or any DB with a native uuid type, and is the table going to grow past a few million rows with sustained writes? Use UUID v7. It is the new default: RFC-standardised, native DB support, B-tree-friendly, embedded timestamp.
- Are IDs going to appear in URLs, customer-facing tickets, or logs read by humans, and do you want the shorter, case-tolerant string? Use ULID. Same B-tree behaviour as v7, friendlier rendering, mature Go library.
- Is the primary key size itself the bottleneck (IoT, event ingestion, billions of rows where 8 bytes vs 16 changes the architecture)? Use the Snowflake ID format (Twitter's, not the data warehouse). Accept the machine-ID infrastructure cost and pick a sane epoch.
- None of the above, and the table is small or write-light? UUID v4 is fine. Don't optimise the boring case.
What never works: picking UUID v4 because "it's the default" and discovering at 50M rows that your hottest aggregate's primary key is also your biggest index. The cost of switching the ID type after the fact is migrating every row, every foreign key, every event in your archive. Pick once, pick deliberately.
## Where it lives in the aggregate
Whichever ID you pick, the constructor is where you mint it. Never the database. Never the HTTP handler.
```go
package order

import "time"

func NewOrder(customerID CustomerID) (*Order, error) {
	id, err := NewOrderID()
	if err != nil {
		return nil, err
	}
	return &Order{
		id:         id,
		customerID: customerID,
		status:     StatusDraft,
		createdAt:  time.Now().UTC(),
	}, nil
}
```
The aggregate owns its identity. The repository receives an aggregate that already knows its ID and writes it. The HTTP layer never invents IDs and never patches them. That keeps the boundary clean and lets you swap the underlying ID strategy in one place when the team moves from v4 to v7. Which, if your tables are growing, is the call.
## If this was useful
Aggregate identity is one of those topics where a 90-minute decision compounds for years. Hexagonal Architecture in Go spends a chunk of its persistence chapter on exactly this — ID generation at the aggregate root, repository contracts, how the chosen ID type leaks (or doesn't) into the database adapter, and what to do when you inherit a v4-keyed table and need to migrate it without downtime. Pairs with The Complete Guide to Go Programming if the language fundamentals are still settling in.
