erickh826
No wall clock required: a 64-bit ID generator from a counter + reversible permutation


permid64 — Clock-free, persistent, reversible-permutation 64-bit ID generation.

GitHub: erickh826/permid64 · PyPI: pip install permid64

English version of the article; roadmap and limitation notes aligned with the current project README.


Part 1 — The classic trilemma

Almost every system that needs unique identifiers forces a choice among three familiar patterns: database auto-increment, random UUIDs, or timestamp-centric Snowflake-style IDs. Each looks workable at first; the painful trade-offs often show up only at scale. Before we see how permid64 sidesteps this triangle, it helps to be precise about what hurts in each option.

Auto-increment: your volume is public

An auto-increment primary key is the simplest uniqueness story, but it exposes the counter at the external boundary. An attacker who places two orders and subtracts the IDs learns exactly how many orders were issued in between.

Example: at 10:00 you receive order_id = 100420; at 14:00 another order returns 100780 — 360 orders in four hours, your growth curve laid bare. A competitor can automate small orders to sample that curve over time. For e-commerce or SaaS where such leakage matters, it is unacceptable. Sequential IDs also invite IDOR-style probing: increment or decrement the id and try other users’ resources.
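For concreteness, the observer's arithmetic is trivial; a sketch using the hypothetical order numbers from the example above:

```python
# Two order ids sampled four hours apart (hypothetical values from the text).
id_at_10am = 100420
id_at_2pm = 100780

orders_in_window = id_at_2pm - id_at_10am   # every id in between was issued
orders_per_hour = orders_in_window / 4

print(orders_in_window)   # 360
print(orders_per_hour)    # 90.0 -- the growth curve, from two data points
```

Two cheap probes per day are enough to chart daily volume indefinitely.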

UUID v4: the 128-bit random tax

UUID v4 uses 122 random bits so that, statistically, you get global uniqueness without coordination — which is why it is everywhere. The 128-bit width costs space in every index entry; worse, fully random keys interact badly with B+ tree indexes.

Page splits depend strongly on key distribution:

  • Sequential keys: new rows land in the rightmost leaf; on average you trigger one split only every B inserts (B = keys per page). The working set stays small and buffer-pool friendly.
  • Random keys: each insert may hit an arbitrary leaf. Once the index exceeds RAM, almost every insert becomes a cache miss and a disk read.

With buffer pool size C_buf (in pages), insert rate r (pages/sec), and fanout B, random-key cache-miss pressure rises sharply once row count n passes roughly:

n > \frac{C_{\mathrm{buf}}}{r \cdot B}

After that, random inserts tend to cost one random I/O each and throughput collapses. On PostgreSQL or MySQL/InnoDB with a B+ tree clustered primary key, UUID v4 keys can slow writes by orders of magnitude once data outgrows memory. That is the structural UUID-as-PK problem.

Snowflake / timestamp IDs: you married the wall clock

Snowflake and relatives (TSID, ULID, …) embed wall-clock time in the high bits so ids are roughly monotonic — which fixes the B+ tree random-write pain. Correctness then depends on monotonic real time, which production clusters do not always provide.

Where clock issues bite:

  • NTP step backward: correcting a fast clock can move system time backwards; if the step exceeds the sequence field’s capacity, duplicate ids become possible.
  • VM pause / resume: after live migration or snapshot restore, guest time can jump backward by seconds or minutes.
  • Leap seconds: OSes disagree on how to represent them; some paths can yield duplicate gettimeofday() results.

Once wall time goes backward, a timestamp scheme either emits duplicates or blocks until time catches up — neither is great. “Wait until the clock exceeds the last timestamp” moves the failure mode from duplicates to stalls; large NTP corrections can stall long enough to trip upstream timeouts.
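To make the stall-vs-duplicate dilemma concrete, here is a minimal timestamp-ID sketch (illustrative only, not Snowflake's or any library's actual code) that chooses to fail loudly on rollback:

```python
import time

class TimestampIdSketch:
    """Minimal Snowflake-style issuer: timestamp in the high bits,
    a 12-bit sequence field to disambiguate ids within one millisecond."""

    def __init__(self):
        self.last_ms = -1
        self.seq = 0

    def next_id(self, now_ms=None) -> int:
        ms = int(time.time() * 1000) if now_ms is None else now_ms
        if ms < self.last_ms:
            # Clock went backwards. The only safe options are to stall
            # until ms >= last_ms again, or to refuse; silently reusing
            # an old timestamp would risk duplicate ids.
            raise RuntimeError(f"clock moved back {self.last_ms - ms} ms")
        if ms == self.last_ms:
            self.seq += 1                 # same millisecond: bump sequence
        else:
            self.last_ms, self.seq = ms, 0
        return (ms << 12) | self.seq

gen = TimestampIdSketch()
a = gen.next_id(now_ms=1_000)
b = gen.next_id(now_ms=1_000)             # same ms -> sequence increments
print(b - a)                              # 1
try:
    gen.next_id(now_ms=990)               # simulated NTP step backward
except RuntimeError as e:
    print(e)
```

Replacing the `raise` with a sleep loop gives the "wait until the clock catches up" variant; a 10-second NTP correction then becomes a 10-second stall on the id path.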


Part 2 — Breaking the deadlock: counter in, permutation out

permid64 can be summarized in one line:

Uniqueness comes from the counter; the random-looking surface comes from a reversible permutation.

The counter supplies a strictly increasing sequence — under the documented persistence rules, that is the core uniqueness argument. A bijection over 64 bits maps consecutive inner values to values that look unstructured from the outside, so observers cannot read business volume or ordering off the wire. Because the map is invertible, every id can be decode()’d back to metadata with no information loss. You get “opaque externally” and “traceable internally” at the same time.

A clock-free observation

Claim: If an id generator f never reads wall-clock time, then f is immune to clock rollback (with respect to time as an input).

Argument: Suppose f depends only on persistent counter state s ≥ 0 and fixed parameters θ. Then f(s, θ) does not mention a time variable t; whatever NTP or the hypervisor does to the clock, the issued sequence is driven by monotonic advances of s. So clock skew does not enter the uniqueness story for f as defined. ∎

Pipeline shape

Encode (issue an id):

seq  = source.next()                    # monotonic counter (persistent)
raw  = layout.compose(instance_id, seq) # pack 16-bit shard + 48-bit seq
id64 = permutation.forward(raw)         # bijection: obfuscate

Decode (recover metadata):

raw  = permutation.inverse(id64)        # reverse bijection
meta = layout.decompose(raw)            # → DecodedId(instance_id, sequence)

End-to-end, the design is a three-stage pipeline: Source → Layout → Permutation. Each stage has a narrow job and can be swapped independently (single responsibility). For example, you could replace the permutation with an AES-based format-preserving encryption (FPE) construction without rewriting the counter or layout layers. permid64 expresses layer contracts with Python Protocol so implementations can be injected. The next parts unpack each layer.
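As a sketch of what such Protocol-based layer contracts can look like (names and toy implementations below are illustrative, not permid64's actual API):

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class Source(Protocol):
    """Contract: hand out monotonically increasing integers."""
    def next(self) -> int: ...

@runtime_checkable
class Permutation(Protocol):
    """Contract: a bijection on 64-bit integers."""
    def forward(self, x: int) -> int: ...
    def inverse(self, y: int) -> int: ...

class InMemorySource:
    """Toy source: monotonic counter without persistence."""
    def __init__(self):
        self._n = 0
    def next(self) -> int:
        n = self._n
        self._n += 1
        return n

class XorPermutation:
    """Toy bijection: XOR with a fixed mask is its own inverse."""
    MASK = 0xA5A5A5A5A5A5A5A5
    def forward(self, x: int) -> int:
        return x ^ self.MASK
    def inverse(self, y: int) -> int:
        return y ^ self.MASK

src, perm = InMemorySource(), XorPermutation()
assert isinstance(src, Source) and isinstance(perm, Permutation)
assert perm.inverse(perm.forward(src.next())) == 0
```

Because the contracts are structural, swapping `XorPermutation` for a Feistel or FPE implementation requires no changes to the other layers.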


Part 3 — The three layers in depth

3.1 Source: block reservation

Problem: After restart, the counter must still advance; the naive approach is to fsync on every next() — too slow. On a fast NVMe, one fsync might be on the order of tens of microseconds, capping you around ~100k durable ops/sec, which is not enough for hot id paths.

Idea: Block reservation. You do not persist every value; you persist a high-water mark — the upper bound of the reserved range. Reserve block_size ids at once (default 4096), then serve the next 4095 ids from memory with no extra disk I/O.

Restart-safety (sketch)

Claim: With block reservation, a process that restarts after a crash cannot re-issue an id that was already issued before the crash.

Setup:

  • Let H be the last high-water mark successfully written to disk before any sequence s ≥ H − B is handed out (B = block size).
  • A crash loses all in-memory state.

Goal: Any s' issued after restart is strictly greater than any s issued before the crash.

Sketch:

  1. Before the crash, the last durable watermark is H. In that block, issued values satisfy H − B ≤ s < H.
  2. After the crash, the process reloads H from disk and reserves a new block: H' = H + B.
  3. The new block starts at H, so post-restart values satisfy s' ≥ H > s for all pre-crash s.

So the old and new ranges are disjoint — no duplicates. ∎

Gaps are acceptable

A crash wastes the rest of the current block (up to B − 1 unused values). That is intentional:

Accept gaps to guarantee uniqueness.

Gaps do not create collisions; they only skip numbers. For an id generator, uniqueness is the hard requirement; consecutive values are not. Because ids pass through a permutation, external observers cannot infer where gaps occurred anyway.

Pseudocode

import os
from pathlib import Path

class PersistentCounterSource:
    def __init__(self, state_file: str, block_size: int = 4096):
        self.path = Path(state_file)
        self.block_size = block_size
        self._next = 0
        self._limit = 0    # _next < _limit while block is live

    def _read_highwater(self) -> int:
        """Return the persisted high-water mark H (0 if no file yet)."""
        return int(self.path.read_text()) if self.path.exists() else 0

    def _atomic_write(self, value: int) -> None:
        """Write to a temp file, fsync, then atomically rename into place."""
        tmp = self.path.with_suffix(".tmp")
        with open(tmp, "w") as f:
            f.write(str(value))
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.path)

    def _reserve_block(self) -> None:
        """Read H from disk, write H' = H + B, set in-memory range."""
        H = self._read_highwater()
        H_new = H + self.block_size
        self._atomic_write(H_new)
        self._next = H
        self._limit = H_new

    def next(self) -> int:
        if self._next >= self._limit:
            self._reserve_block()           # one I/O every block_size calls
        val = self._next
        self._next += 1
        return val

Amortized disk cost is O(1/B) per next().
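The restart-safety argument can be exercised end to end with an in-memory stand-in for the state file (a toy simulation, not the library's code):

```python
BLOCK = 4096

class CounterSim:
    """Block-reservation counter; `disk` is a dict standing in for
    the fsync'd state file and is all that survives a crash."""

    def __init__(self, disk: dict):
        self.disk = disk
        self.nxt = self.limit = 0          # in-memory state, lost on crash

    def next(self) -> int:
        if self.nxt >= self.limit:
            h = self.disk.get("highwater", 0)
            self.disk["highwater"] = h + BLOCK   # durable write first
            self.nxt, self.limit = h, h + BLOCK
        v = self.nxt
        self.nxt += 1
        return v

disk = {}
gen = CounterSim(disk)
issued_before = [gen.next() for _ in range(10)]   # 0..9 from block [0, 4096)

# Simulated crash: in-memory state gone, only `disk` survives.
gen2 = CounterSim(disk)
first_after = gen2.next()
print(first_after)                 # 4096 -- new block; ids 10..4095 are gapped
assert first_after > max(issued_before)
```

The gap of 4086 unused values is the price of one durable write per block, exactly as the sketch above argues.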


3.2 Layout: bit packing

The layout packs instance_id (shard tag) and sequence into one 64-bit integer. The default split is:

\underbrace{b_{63} \cdots b_{48}}_{\text{instance\_id (16 bits)}}\;\underbrace{b_{47} \cdots b_{0}}_{\text{sequence (48 bits)}}

Capacity:

  • Up to 2^16 = 65,536 distinct shards.
  • Per shard, 2^48 ≈ 2.81 × 10^14 ids.

Even at a sustained one million ids per second, a single shard would take roughly nine years to exhaust 48 bits.
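A quick back-of-envelope check of that headroom, assuming a constant one-million-ids-per-second issue rate:

```python
# 48-bit per-shard capacity, consumed at a sustained 1M ids/sec.
capacity = 2 ** 48                       # 281,474,976,710,656 ids
seconds_to_exhaust = capacity / 1_000_000
years_to_exhaust = seconds_to_exhaust / (365 * 24 * 3600)
print(f"{years_to_exhaust:.1f} years")   # 8.9 years
```

Plenty for most services, and a wider sequence field (see below) stretches it further.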

Compose:

\mathrm{raw} = (\mathrm{instance\_id} \ll 48) \mid \mathrm{seq}

Decompose:

\mathrm{seq} = \mathrm{raw} \bmod 2^{48}, \quad \mathrm{instance\_id} = \mathrm{raw} \gg 48

Layout64 is configurable — e.g. Layout64(instance_bits=10, sequence_bits=54) trades shard count for per-shard headroom. The only hard rule is instance_bits + sequence_bits == 64.
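A minimal self-contained sketch of the default 16/48 split (function names are illustrative, not the library's API):

```python
INSTANCE_BITS = 16
SEQUENCE_BITS = 48
SEQ_MASK = (1 << SEQUENCE_BITS) - 1

def compose(instance_id: int, seq: int) -> int:
    # Pack the shard tag into the high bits, the sequence into the low bits.
    assert 0 <= instance_id < (1 << INSTANCE_BITS)
    assert 0 <= seq <= SEQ_MASK
    return (instance_id << SEQUENCE_BITS) | seq

def decompose(raw: int) -> tuple:
    # Invert compose: shift out the sequence, mask it back off.
    return raw >> SEQUENCE_BITS, raw & SEQ_MASK

raw = compose(42, 7)
print(hex(raw))            # 0x2a000000000007
print(decompose(raw))      # (42, 7)
```

Changing the two constants (keeping their sum at 64) gives the `Layout64(instance_bits=10, sequence_bits=54)` trade-off described above.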


3.3 Permutation: mixing without losing invertibility

Why a bijection, not a hash?

If you “hid” the counter with SHA-256, you could not implement decode() to recover instance_id and sequence — hashes are one-way. permid64 needs a permutation (bijection) on 64-bit strings:

Definition: A map P: \mathbb{Z}_{2^{64}} \to \mathbb{Z}_{2^{64}} with P^{-1}(P(x)) = x for all x \in [0, 2^{64}).

Two built-ins ship today.


MultiplyOdd permutation

Map:

f(x) = (ax + b) \bmod 2^{64}

with a odd and b any 64-bit integer.

Invertibility

Claim: Odd a has a modular inverse a⁻¹ modulo 2^64, and

f^{-1}(y) = a^{-1}(y - b) \bmod 2^{64}

Sketch:

  1. Odd a implies gcd(a, 2^64) = 1; Bézout gives a⁻¹ with a · a⁻¹ ≡ 1 (mod 2^64).
  2. From y ≡ ax + b: y − b ≡ ax; multiply by a⁻¹ to recover x.
  3. Compose to check f(f⁻¹(y)) = y.

So f is a bijection on ℤ_{2^64}. ∎

Default constants
  • a = 0x9E3779B185EBCA87 — a 64-bit golden-ratio–related multiplier (odd), in the spirit of Knuth's multiplicative hashing for good bit diffusion.
  • b = 0x6A09E667F3BCC909 — drawn from the fractional bits of √2 (the same source as SHA-2's initialization constants), used as an independent additive offset.
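The forward and inverse maps are a few lines in Python; this sketch uses the constants above and the three-argument pow for the modular inverse (a direct transcription of the math, not necessarily the library's exact code):

```python
A = 0x9E3779B185EBCA87            # odd multiplier (from the text)
OFFSET = 0x6A09E667F3BCC909       # additive offset b (from the text)
M = 1 << 64
A_INV = pow(A, -1, M)             # exists because gcd(A, 2^64) = 1

def forward(x: int) -> int:
    return (A * x + OFFSET) % M

def inverse(y: int) -> int:
    return (A_INV * (y - OFFSET)) % M

for x in (0, 1, 42, M - 1):       # round-trip check across the range
    assert inverse(forward(x)) == x

print(hex(forward(0)))            # 0x6a09e667f3bcc909 -- just the offset
print(hex(forward(1)))            # looks nothing like forward(0)
```

Note that `pow(A, -1, M)` (Python 3.8+) computes the modular inverse directly; older code would use the extended Euclidean algorithm.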

Feistel64 permutation

A Feistel network is a classic construction: the round function F need not be invertible; the round structure still yields a permutation.

permid64 uses a balanced Feistel on 64 bits: left and right halves L, R are 32 bits each.

One Feistel round
L_{i+1} = R_i, \quad R_{i+1} = L_i \oplus F(K_i, R_i)
Inverting one round

Claim: Given (L_{i+1}, R_{i+1}) and round key K_i, you can recover (L_i, R_i) uniquely.

Proof:

From L_{i+1} = R_i, we get R_i = L_{i+1}.

From R_{i+1} = L_i \oplus F(K_i, R_i), substituting R_i = L_{i+1}:

R_{i+1} = L_i \oplus F(K_i, L_{i+1})

XOR both sides with F(K_i, L_{i+1}):

L_i = R_{i+1} \oplus F(K_i, L_{i+1})

So the inverse round is:

L_i = R_{i+1} \oplus F(K_i, L_{i+1}), \quad R_i = L_{i+1}

Takeaway: F does not need an inverse; on the way back, R_i is known as L_{i+1}, so you only re-evaluate F(K_i, L_{i+1}). ∎

Six-round forward / inverse (symbolic)

Forward (keys K_0, …, K_5 in order):

\begin{aligned}
\text{Round }0\text{:}\quad & L_1 = R_0, \quad R_1 = L_0 \oplus F(K_0, R_0) \\
\text{Round }1\text{:}\quad & L_2 = R_1, \quad R_2 = L_1 \oplus F(K_1, R_1) \\
\text{Round }2\text{:}\quad & L_3 = R_2, \quad R_3 = L_2 \oplus F(K_2, R_2) \\
\text{Round }3\text{:}\quad & L_4 = R_3, \quad R_4 = L_3 \oplus F(K_3, R_3) \\
\text{Round }4\text{:}\quad & L_5 = R_4, \quad R_5 = L_4 \oplus F(K_4, R_4) \\
\text{Round }5\text{:}\quad & L_6 = R_5, \quad R_6 = L_5 \oplus F(K_5, R_5)
\end{aligned}

Output is (L_6, R_6).

Inverse (keys K_5, …, K_0 in reverse):

\begin{aligned}
\text{Round }5'\text{:}\quad & L_5 = R_6 \oplus F(K_5, L_6), \quad R_5 = L_6 \\
\text{Round }4'\text{:}\quad & L_4 = R_5 \oplus F(K_4, L_5), \quad R_4 = L_5 \\
\text{Round }3'\text{:}\quad & L_3 = R_4 \oplus F(K_3, L_4), \quad R_3 = L_4 \\
\text{Round }2'\text{:}\quad & L_2 = R_3 \oplus F(K_2, L_3), \quad R_2 = L_3 \\
\text{Round }1'\text{:}\quad & L_1 = R_2 \oplus F(K_1, L_2), \quad R_1 = L_2 \\
\text{Round }0'\text{:}\quad & L_0 = R_1 \oplus F(K_0, L_1), \quad R_0 = L_1
\end{aligned}

This recovers (L_0, R_0).

Round function F (as implemented)
def _round_f(r: int, k: int) -> int:
    x = (r ^ k) & MASK32                   # XOR round key
    x = (x * 0x9E3779B1) & MASK32          # Knuth-style multiply (32-bit)
    x ^= (x >> 16)                          # xor-shift: high → low bits
    x = ((x << 5) | (x >> 27)) & MASK32    # rotate left by 5
    x = (x * 0x85EBCA6B) & MASK32          # MurmurHash3 finalizer constant
    x ^= (x >> 13)                          # xor-shift mix again
    return x & MASK32

Constants:

  • 0x9E3779B1 — 32-bit golden-ratio–style multiplier for avalanche.
  • 0x85EBCA6B — MurmurHash3 finalizer constant for a second mixing stage.
Round-key derivation

From one 64-bit seed, xorshift64*-style steps yield the per-round 32-bit subkeys:

@staticmethod
def _derive_round_keys(key: int, rounds: int) -> list[int]:
    keys = []
    x = key & MASK64
    for _ in range(rounds):
        x ^= (x >> 12) & MASK64
        x ^= (x << 25) & MASK64
        x ^= (x >> 27) & MASK64
        x = (x * 0x2545F4914F6CDD1D) & MASK64
        keys.append(x & MASK32)
    return keys

Comparing the two permutations

| | MultiplyOdd | Feistel64 |
|---|---|---|
| Speed | ~500M ids/sec | ~150M ids/sec |
| Mixing | Affine (single round) | Nonlinear, 6 rounds |
| Invertibility | Odd multiply has a modular inverse | Structural Feistel property |
| Math | f(x) = (ax + b) mod 2^64 | Feistel network |
| When | Throughput-first, "good enough" mixing | Stronger statistical obscurity |

Part 4 — Engineering trade-offs and roadmap

Limitations (read before adopting)

permid64 is not trying to be the one id scheme for every system. A few non-negotiables:

  1. Single-process state file today: PersistentCounterSource is not safe for multiple processes sharing one state file. On POSIX it takes a best-effort fcntl.flock during block reservation, but that is not a substitute for correct sharding. The supported pattern is a distinct instance_id plus a distinct state file per process until multi-process locking lands. See README Limitations for detail.

  2. Not cryptography: Feistel here is obfuscation, not AES. Anyone who knows the permutation parameters can decode() ids. Do not use these ids as secret tokens or auth credentials.

  3. Manual instance_id: You must assign unique ids per shard/process yourself. Collisions happen if two live generators reuse the same instance_id. Use env vars or config management and validate at startup. Central allocation is on the roadmap (ReservedBlockSource, v0.5 in the table below).
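A hypothetical startup check in that spirit; the environment variable name and the validation helper are assumptions, not part of permid64:

```python
import os

def load_instance_id(instance_bits: int = 16) -> int:
    """Read the shard id from the environment and fail fast if it is
    missing or out of range for the configured layout."""
    raw = os.environ.get("PERMID64_INSTANCE_ID")
    if raw is None:
        raise RuntimeError("PERMID64_INSTANCE_ID is not set; refusing to guess")
    instance_id = int(raw)
    if not 0 <= instance_id < (1 << instance_bits):
        raise RuntimeError(f"instance_id {instance_id} out of range "
                           f"[0, {1 << instance_bits})")
    return instance_id

os.environ["PERMID64_INSTANCE_ID"] = "42"   # demo only
print(load_instance_id())                    # 42
```

Failing at startup is far cheaper than discovering duplicate ids after two generators silently shared a shard.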

Benchmarks (permutation only)

Apple M2 / Python 3.12–class numbers from the project’s benchmark script:

| Permutation | Throughput | Latency (order of) |
|---|---|---|
| MultiplyOdd | ~480M ids/sec | ~2 ns |
| Feistel (6 rounds) | ~140M ids/sec | ~7 ns |
| Identity (no mixing) | ~520M ids/sec | ~2 ns |

These figures are pure CPU permutation cost; block reservation amortizes fsync to about once per 4096 ids in steady state. For the authoritative roadmap (multi-process locking, encodings, allocator, reference implementations), see the README in the repo.

Roadmap (from README)

| Version | Theme | What's included | Status |
|---|---|---|---|
| v0.1 | Core primitives | PersistentCounterSource, Feistel / Multiplicative permutation, decode() | Released |
| v0.2 | Encoding & Config | IdentityPermutation, fixed-width Base62 + Crockford Base32, Id64Config + build_id64 | Released |
| v0.3 | Multi-process safety | Process-safe file locking (fcntl / msvcrt), documented single-machine multi-process patterns, safer state file semantics | Next |
| v0.4 | Human-friendly output | Check digit, PrefixedEncoder, FormatSpec (segmented display, ambiguity-reduced alphabets) | Planned |
| v0.5 | Distributed sources | ReservedBlockSource (central allocator, Redis / PG-backed block rental), instance_id helpers | Planned |
| v0.6 | Solution presets | OrderIdGenerator, TicketIdGenerator, CorrelationIdGenerator, IoTEventIdGenerator | Planned |
| v0.7 | Formal spec | Cross-language bit layout, profile aliases, compatibility document | Planned |
| v1.0 | Reference implementations | Go + Rust reference implementations, three-language cross-decode tests | Planned |

Quick start

from permid64 import Id64

gen = Id64.feistel(
    instance_id=42,
    state_file="my_service.state",
    key=0xDEADBEEFCAFEBABE,
)

id_val = gen.next_u64()          # int
token  = gen.next_base62()     # str

meta = gen.decode(id_val)
print(meta.instance_id)          # 42
print(meta.sequence)             # 0, 1, 2, ...

meta2 = gen.decode_base62(token)

Summary

permid64 escapes parts of the auto-increment / UUID / timestamp trilemma by proving uniqueness from a monotonic counter and hiding structure with a bijection you can invert for operations.

| | Auto-increment | UUID v4 | Snowflake | permid64 |
|---|---|---|---|---|
| Width (bits) | 32/64 | 128 | 64 | 64 |
| Depends on | DB sequence | RNG | Wall clock (NTP) | Persistent counter (file) |
| Clock rollback | none | none | high risk | none (no clock in path) |
| Surface | strictly increasing | random | roughly increasing | random-looking (invertible) |
| B+ tree friendliness | excellent | poor | good | moderate |
| Decodable | trivially — leaks count | no | yes (time + worker) | yes (instance_id + sequence) |
| Volume leakage | severe | none | some inference possible | none from the sequential counter alone |
| Infra | database | none | coordination for worker ids | durable local storage |
| Cross-process | DB guarantees | probabilistic | worker-id scheme | today: manual shards; roadmap: locking + allocator |

There is no perfect id — only the right id for the constraints. If you want 64-bit, clock-free, decodable, opaque-looking identifiers in pure Python with zero runtime dependencies, permid64 is worth a serious look. The layered design also leaves room to swap in stronger permutations or add allocation without breaking the mental model.
