erickh826
No wall clock required: a 64-bit ID generator from a counter + reversible permutation


permid64 — Clock-free, persistent, reversible-permutation 64-bit ID generation.

GitHub: erickh826/permid64 · PyPI: pip install permid64

English version of the article; roadmap and limitation notes aligned with the current project README.


Part 1 — The classic trilemma

Almost every system that needs unique identifiers forces a choice among three familiar patterns: database auto-increment, random UUIDs, or timestamp-centric Snowflake-style IDs. Each looks workable at first; the painful trade-offs often show up only at scale. Before we see how permid64 sidesteps this triangle, it helps to be precise about what hurts in each option.

Auto-increment: your volume is public

An auto-increment primary key is the simplest uniqueness story, but it exposes the counter at the external boundary. An attacker who places two orders and subtracts the IDs learns exactly how many orders were issued in between.

Example: at 10:00 you receive order_id = 100420; at 14:00 another order returns 100780 — 360 orders in four hours, your growth curve laid bare. A competitor can automate small orders to sample that curve over time. For e-commerce or SaaS where such leakage matters, it is unacceptable. Sequential IDs also invite IDOR-style probing: increment or decrement the id and try other users’ resources.
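For concreteness, the observer's arithmetic is trivial; a sketch using the hypothetical order numbers from the example above:

```python
# Two order ids sampled four hours apart (hypothetical values from the text).
id_at_10am = 100420
id_at_2pm = 100780

orders_in_window = id_at_2pm - id_at_10am   # every id in between was issued
orders_per_hour = orders_in_window / 4

print(orders_in_window)   # 360
print(orders_per_hour)    # 90.0 -- the growth curve, from two data points
```

Two cheap probes per day are enough to chart daily volume indefinitely.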

UUID v4: the 128-bit random tax

UUID v4 uses 122 random bits so that, statistically, you get global uniqueness without coordination — which is why it is everywhere. The 128-bit width costs space in every index entry; worse, fully random keys interact badly with B+ tree indexes.

Page splits depend strongly on key distribution:

  • Sequential keys: new rows land in the rightmost leaf; on average you trigger one split only every B inserts (B = keys per page). The working set stays small and buffer-pool friendly.
  • Random keys: each insert may hit an arbitrary leaf. Once the index exceeds RAM, almost every insert becomes a cache miss and a disk read.

With buffer pool size C_buf (in pages), insert rate r (pages/sec), and fanout B, random-key cache-miss pressure rises sharply once row count n passes roughly:

n > \frac{C_{\mathrm{buf}}}{r \cdot B}

After that, random inserts tend to cost one random I/O each and throughput collapses. On PostgreSQL or MySQL/InnoDB with a B+ tree clustered primary key, UUID v4 keys can slow writes by orders of magnitude once data outgrows memory. That is the structural UUID-as-PK problem.

Snowflake / timestamp IDs: you married the wall clock

Snowflake and relatives (TSID, ULID, …) embed wall-clock time in the high bits so ids are roughly monotonic — which fixes the B+ tree random-write pain. Correctness then depends on monotonic real time, which production clusters do not always provide.

Where clock issues bite:

  • NTP step backward: correcting a fast clock can move system time backwards; if the step exceeds the sequence field’s capacity, duplicate ids become possible.
  • VM pause / resume: after live migration or snapshot restore, guest time can jump backward by seconds or minutes.
  • Leap seconds: OSes disagree on how to represent them; some paths can yield duplicate gettimeofday() results.

Once wall time goes backward, a timestamp scheme either emits duplicates or blocks until time catches up — neither is great. “Wait until the clock exceeds the last timestamp” moves the failure mode from duplicates to stalls; large NTP corrections can stall long enough to trip upstream timeouts.
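To make the stall-vs-duplicate dilemma concrete, here is a minimal timestamp-ID sketch (illustrative only, not Snowflake's or any library's actual code) that chooses to fail loudly on rollback:

```python
import time

class TimestampIdSketch:
    """Minimal Snowflake-style issuer: timestamp in the high bits,
    a 12-bit sequence field to disambiguate ids within one millisecond."""

    def __init__(self):
        self.last_ms = -1
        self.seq = 0

    def next_id(self, now_ms=None) -> int:
        ms = int(time.time() * 1000) if now_ms is None else now_ms
        if ms < self.last_ms:
            # Clock went backwards. The only safe options are to stall
            # until ms >= last_ms again, or to refuse; silently reusing
            # an old timestamp would risk duplicate ids.
            raise RuntimeError(f"clock moved back {self.last_ms - ms} ms")
        if ms == self.last_ms:
            self.seq += 1                 # same millisecond: bump sequence
        else:
            self.last_ms, self.seq = ms, 0
        return (ms << 12) | self.seq

gen = TimestampIdSketch()
a = gen.next_id(now_ms=1_000)
b = gen.next_id(now_ms=1_000)             # same ms -> sequence increments
print(b - a)                              # 1
try:
    gen.next_id(now_ms=990)               # simulated NTP step backward
except RuntimeError as e:
    print(e)
```

Replacing the `raise` with a sleep loop gives the "wait until the clock catches up" variant; a 10-second NTP correction then becomes a 10-second stall on the id path.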


Part 2 — Breaking the deadlock: counter in, permutation out

permid64 can be summarized in one line:

Uniqueness comes from the counter; the random-looking surface comes from a reversible permutation.

The counter supplies a strictly increasing sequence — under the documented persistence rules, that is the core uniqueness argument. A bijection over 64 bits maps consecutive inner values to values that look unstructured from the outside, so observers cannot read business volume or ordering off the wire. Because the map is invertible, every id can be decode()’d back to metadata with no information loss. You get “opaque externally” and “traceable internally” at the same time.

A clock-free observation

Claim: If an id generator f never reads wall-clock time, then f is immune to clock rollback (with respect to time as an input).

Argument: Suppose f depends only on persistent counter state s ≥ 0 and fixed parameters θ. Then f(s, θ) does not mention a time variable t; whatever NTP or the hypervisor does to the clock, the issued sequence is driven by monotonic advances of s. So clock skew does not enter the uniqueness story for f as defined. ∎

Pipeline shape

Encode (issue an id):

seq  = source.next()                    # monotonic counter (persistent)
raw  = layout.compose(instance_id, seq) # pack 16-bit shard + 48-bit seq
id64 = permutation.forward(raw)         # bijection: obfuscate

Decode (recover metadata):

raw  = permutation.inverse(id64)        # reverse bijection
meta = layout.decompose(raw)            # → DecodedId(instance_id, sequence)

End-to-end, the design is a three-stage pipeline: Source → Layout → Permutation. Each stage has a narrow job and can be swapped independently (single responsibility). For example, you could replace the permutation with an AES-based format-preserving encryption (FPE) construction without rewriting the counter or layout layers. permid64 expresses layer contracts with Python Protocol so implementations can be injected. The next parts unpack each layer.
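As a sketch of what such Protocol-based layer contracts can look like (names and toy implementations below are illustrative, not permid64's actual API):

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class Source(Protocol):
    """Contract: hand out monotonically increasing integers."""
    def next(self) -> int: ...

@runtime_checkable
class Permutation(Protocol):
    """Contract: a bijection on 64-bit integers."""
    def forward(self, x: int) -> int: ...
    def inverse(self, y: int) -> int: ...

class InMemorySource:
    """Toy source: monotonic counter without persistence."""
    def __init__(self):
        self._n = 0
    def next(self) -> int:
        n = self._n
        self._n += 1
        return n

class XorPermutation:
    """Toy bijection: XOR with a fixed mask is its own inverse."""
    MASK = 0xA5A5A5A5A5A5A5A5
    def forward(self, x: int) -> int:
        return x ^ self.MASK
    def inverse(self, y: int) -> int:
        return y ^ self.MASK

src, perm = InMemorySource(), XorPermutation()
assert isinstance(src, Source) and isinstance(perm, Permutation)
assert perm.inverse(perm.forward(src.next())) == 0
```

Because the contracts are structural, swapping `XorPermutation` for a Feistel or FPE implementation requires no changes to the other layers.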


Part 3 — The three layers in depth

3.1 Source: block reservation

Problem: After restart, the counter must still advance; the naive approach is to fsync on every next() — too slow. On a fast NVMe, one fsync might be on the order of tens of microseconds, capping you around ~100k durable ops/sec, which is not enough for hot id paths.

Idea: Block reservation. You do not persist every value; you persist a high-water mark — the upper bound of the reserved range. Reserve block_size ids at once (default 4096), then serve the next 4095 ids from memory with no extra disk I/O.

Restart-safety (sketch)

Claim: With block reservation, a process that restarts after a crash cannot re-issue an id that was already issued before the crash.

Setup:

  • Let H be the last high-water mark successfully written to disk before any sequence s ≥ H − B is handed out (B = block size).
  • A crash loses all in-memory state.

Goal: Any s' issued after restart is strictly greater than any s issued before the crash.

Sketch:

  1. Before the crash, the last durable watermark is H. In that block, issued values satisfy H − B ≤ s < H.
  2. After the crash, the process reloads H from disk and reserves a new block: H' = H + B.
  3. The new block starts at H, so post-restart values satisfy s' ≥ H > s for all pre-crash s.

So the old and new ranges are disjoint — no duplicates. ∎

Gaps are acceptable

A crash wastes the rest of the current block (up to B − 1 unused values). That is intentional:

Accept gaps to guarantee uniqueness.

Gaps do not create collisions; they only skip numbers. For an id generator, uniqueness is the hard requirement; consecutive values are not. Because ids pass through a permutation, external observers cannot infer where gaps occurred anyway.

Pseudocode

import os
from pathlib import Path

class PersistentCounterSource:
    def __init__(self, state_file: str, block_size: int = 4096):
        self.path = Path(state_file)
        self.block_size = block_size
        self._next = 0
        self._limit = 0    # _next < _limit while block is live

    def _read_highwater(self) -> int:
        """Return the persisted high-water mark H (0 if no file yet)."""
        return int(self.path.read_text()) if self.path.exists() else 0

    def _atomic_write(self, value: int) -> None:
        """Write to a temp file, fsync, then atomically rename into place."""
        tmp = self.path.with_suffix(".tmp")
        with open(tmp, "w") as f:
            f.write(str(value))
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.path)

    def _reserve_block(self) -> None:
        """Read H from disk, write H' = H + B, set in-memory range."""
        H = self._read_highwater()
        H_new = H + self.block_size
        self._atomic_write(H_new)
        self._next = H
        self._limit = H_new

    def next(self) -> int:
        if self._next >= self._limit:
            self._reserve_block()           # one I/O every block_size calls
        val = self._next
        self._next += 1
        return val

Amortized disk cost is O(1/B) per next().
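The restart-safety argument can be exercised end to end with an in-memory stand-in for the state file (a toy simulation, not the library's code):

```python
BLOCK = 4096

class CounterSim:
    """Block-reservation counter; `disk` is a dict standing in for
    the fsync'd state file and is all that survives a crash."""

    def __init__(self, disk: dict):
        self.disk = disk
        self.nxt = self.limit = 0          # in-memory state, lost on crash

    def next(self) -> int:
        if self.nxt >= self.limit:
            h = self.disk.get("highwater", 0)
            self.disk["highwater"] = h + BLOCK   # durable write first
            self.nxt, self.limit = h, h + BLOCK
        v = self.nxt
        self.nxt += 1
        return v

disk = {}
gen = CounterSim(disk)
issued_before = [gen.next() for _ in range(10)]   # 0..9 from block [0, 4096)

# Simulated crash: in-memory state gone, only `disk` survives.
gen2 = CounterSim(disk)
first_after = gen2.next()
print(first_after)                 # 4096 -- new block; ids 10..4095 are gapped
assert first_after > max(issued_before)
```

The gap of 4086 unused values is the price of one durable write per block, exactly as the sketch above argues.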


3.2 Layout: bit packing

The layout packs instance_id (shard tag) and sequence into one 64-bit integer. The default split is:

\underbrace{b_{63} \cdots b_{48}}_{\text{instance\_id (16 bits)}}\;\underbrace{b_{47} \cdots b_{0}}_{\text{sequence (48 bits)}}

Capacity:

  • Up to 2^16 = 65,536 distinct shards.
  • Per shard, 2^48 ≈ 2.81 × 10^14 ids.

Even at a sustained one million ids per second, a single shard would take roughly nine years to exhaust 48 bits.
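A quick back-of-envelope check of that headroom, assuming a constant one-million-ids-per-second issue rate:

```python
# 48-bit per-shard capacity, consumed at a sustained 1M ids/sec.
capacity = 2 ** 48                       # 281,474,976,710,656 ids
seconds_to_exhaust = capacity / 1_000_000
years_to_exhaust = seconds_to_exhaust / (365 * 24 * 3600)
print(f"{years_to_exhaust:.1f} years")   # 8.9 years
```

Plenty for most services, and a wider sequence field (see below) stretches it further.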

Compose:

\mathrm{raw} = (\mathrm{instance\_id} \ll 48) \mid \mathrm{seq}

Decompose:

\mathrm{seq} = \mathrm{raw} \bmod 2^{48}, \quad \mathrm{instance\_id} = \mathrm{raw} \gg 48

Layout64 is configurable — e.g. Layout64(instance_bits=10, sequence_bits=54) trades shard count for per-shard headroom. The only hard rule is instance_bits + sequence_bits == 64.
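A minimal self-contained sketch of the default 16/48 split (function names are illustrative, not the library's API):

```python
INSTANCE_BITS = 16
SEQUENCE_BITS = 48
SEQ_MASK = (1 << SEQUENCE_BITS) - 1

def compose(instance_id: int, seq: int) -> int:
    # Pack the shard tag into the high bits, the sequence into the low bits.
    assert 0 <= instance_id < (1 << INSTANCE_BITS)
    assert 0 <= seq <= SEQ_MASK
    return (instance_id << SEQUENCE_BITS) | seq

def decompose(raw: int) -> tuple:
    # Invert compose: shift out the sequence, mask it back off.
    return raw >> SEQUENCE_BITS, raw & SEQ_MASK

raw = compose(42, 7)
print(hex(raw))            # 0x2a000000000007
print(decompose(raw))      # (42, 7)
```

Changing the two constants (keeping their sum at 64) gives the `Layout64(instance_bits=10, sequence_bits=54)` trade-off described above.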


3.3 Permutation: mixing without losing invertibility

Why a bijection, not a hash?

If you “hid” the counter with SHA-256, you could not implement decode() to recover instance_id and sequence — hashes are one-way. permid64 needs a permutation (bijection) on 64-bit strings:

Definition: A map P: \mathbb{Z}_{2^{64}} \to \mathbb{Z}_{2^{64}} with P^{-1}(P(x)) = x for all x \in [0, 2^{64}).

Two built-ins ship today.


MultiplyOdd permutation

Map:

f(x) = (ax + b) \bmod 2^{64}

with a odd and b any 64-bit integer.

Invertibility

Claim: Odd a has a modular inverse a⁻¹ modulo 2^64, and

f^{-1}(y) = a^{-1}(y - b) \bmod 2^{64}

Sketch:

  1. Odd a implies gcd(a, 2^64) = 1; Bézout gives a⁻¹ with a · a⁻¹ ≡ 1 (mod 2^64).
  2. From y ≡ ax + b: y − b ≡ ax; multiply by a⁻¹ to recover x.
  3. Compose to check f(f⁻¹(y)) = y.

So f is a bijection on ℤ_{2^64}. ∎

Default constants
  • a = 0x9E3779B185EBCA87 — a 64-bit golden-ratio–related multiplier (odd), in the spirit of Knuth's multiplicative hashing for good bit diffusion.
  • b = 0x6A09E667F3BCC909 — drawn from the fractional bits of √2 (the same source as SHA-2's initialization constants), used as an independent additive offset.
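The forward and inverse maps are a few lines in Python; this sketch uses the constants above and the three-argument pow for the modular inverse (a direct transcription of the math, not necessarily the library's exact code):

```python
A = 0x9E3779B185EBCA87            # odd multiplier (from the text)
OFFSET = 0x6A09E667F3BCC909       # additive offset b (from the text)
M = 1 << 64
A_INV = pow(A, -1, M)             # exists because gcd(A, 2^64) = 1

def forward(x: int) -> int:
    return (A * x + OFFSET) % M

def inverse(y: int) -> int:
    return (A_INV * (y - OFFSET)) % M

for x in (0, 1, 42, M - 1):       # round-trip check across the range
    assert inverse(forward(x)) == x

print(hex(forward(0)))            # 0x6a09e667f3bcc909 -- just the offset
print(hex(forward(1)))            # looks nothing like forward(0)
```

Note that `pow(A, -1, M)` (Python 3.8+) computes the modular inverse directly; older code would use the extended Euclidean algorithm.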

Feistel64 permutation

A Feistel network is a classic construction: the round function F need not be invertible; the round structure still yields a permutation.

permid64 uses a balanced Feistel on 64 bits: left and right halves L, R are 32 bits each.

One Feistel round
L_{i+1} = R_i, \quad R_{i+1} = L_i \oplus F(K_i, R_i)
Inverting one round

Claim: Given (L_{i+1}, R_{i+1}) and round key K_i, you can recover (L_i, R_i) uniquely.

Proof:

From L_{i+1} = R_i, we get R_i = L_{i+1}.

From R_{i+1} = L_i \oplus F(K_i, R_i), substituting R_i = L_{i+1}:

R_{i+1} = L_i \oplus F(K_i, L_{i+1})

XOR both sides with F(K_i, L_{i+1}):

L_i = R_{i+1} \oplus F(K_i, L_{i+1})

So the inverse round is:

L_i = R_{i+1} \oplus F(K_i, L_{i+1}), \quad R_i = L_{i+1}

Takeaway: F does not need an inverse; on the way back, R_i is known as L_{i+1}, so you only re-evaluate F(K_i, L_{i+1}). ∎

Six-round forward / inverse (symbolic)

Forward (keys K_0, …, K_5 in order):

\begin{aligned}
\text{Round }0\text{:}\quad & L_1 = R_0, \quad R_1 = L_0 \oplus F(K_0, R_0) \\
\text{Round }1\text{:}\quad & L_2 = R_1, \quad R_2 = L_1 \oplus F(K_1, R_1) \\
\text{Round }2\text{:}\quad & L_3 = R_2, \quad R_3 = L_2 \oplus F(K_2, R_2) \\
\text{Round }3\text{:}\quad & L_4 = R_3, \quad R_4 = L_3 \oplus F(K_3, R_3) \\
\text{Round }4\text{:}\quad & L_5 = R_4, \quad R_5 = L_4 \oplus F(K_4, R_4) \\
\text{Round }5\text{:}\quad & L_6 = R_5, \quad R_6 = L_5 \oplus F(K_5, R_5)
\end{aligned}

Output is (L_6, R_6).

Inverse (keys K_5, …, K_0 in reverse):

\begin{aligned}
\text{Round }5'\text{:}\quad & L_5 = R_6 \oplus F(K_5, L_6), \quad R_5 = L_6 \\
\text{Round }4'\text{:}\quad & L_4 = R_5 \oplus F(K_4, L_5), \quad R_4 = L_5 \\
\text{Round }3'\text{:}\quad & L_3 = R_4 \oplus F(K_3, L_4), \quad R_3 = L_4 \\
\text{Round }2'\text{:}\quad & L_2 = R_3 \oplus F(K_2, L_3), \quad R_2 = L_3 \\
\text{Round }1'\text{:}\quad & L_1 = R_2 \oplus F(K_1, L_2), \quad R_1 = L_2 \\
\text{Round }0'\text{:}\quad & L_0 = R_1 \oplus F(K_0, L_1), \quad R_0 = L_1
\end{aligned}

This recovers (L_0, R_0).

Round function F (as implemented)
def _round_f(r: int, k: int) -> int:
    x = (r ^ k) & MASK32                   # XOR round key
    x = (x * 0x9E3779B1) & MASK32          # Knuth-style multiply (32-bit)
    x ^= (x >> 16)                          # xor-shift: high → low bits
    x = ((x << 5) | (x >> 27)) & MASK32    # rotate left by 5
    x = (x * 0x85EBCA6B) & MASK32          # MurmurHash3 finalizer constant
    x ^= (x >> 13)                          # xor-shift mix again
    return x & MASK32

Constants:

  • 0x9E3779B1 — 32-bit golden-ratio–style multiplier for avalanche.
  • 0x85EBCA6B — MurmurHash3 finalizer constant for a second mixing stage.
Round-key derivation

From one 64-bit seed, xorshift64*-style steps yield the per-round 32-bit subkeys:

@staticmethod
def _derive_round_keys(key: int, rounds: int) -> list[int]:
    keys = []
    x = key & MASK64
    for _ in range(rounds):
        x ^= (x >> 12) & MASK64
        x ^= (x << 25) & MASK64
        x ^= (x >> 27) & MASK64
        x = (x * 0x2545F4914F6CDD1D) & MASK64
        keys.append(x & MASK32)
    return keys

Comparing the two permutations

| | MultiplyOdd | Feistel64 |
|---|---|---|
| Speed | ~500M ids/sec | ~150M ids/sec |
| Mixing | Affine (single round) | Nonlinear, 6 rounds |
| Invertibility | Odd multiply has a modular inverse | Structural Feistel property |
| Math | f(x) = (ax + b) mod 2^64 | Feistel network |
| When | Throughput-first, "good enough" mixing | Stronger statistical obscurity |

Part 4 — Engineering trade-offs and roadmap

Limitations (read before adopting)

permid64 is not trying to be the one id scheme for every system. A few non-negotiables:

  1. Single-process state file today: PersistentCounterSource is not safe for multiple processes sharing one state file. On POSIX it takes a best-effort fcntl.flock during block reservation, but that is not a substitute for correct sharding. The supported pattern is a distinct instance_id plus a distinct state file per process until multi-process locking lands. See README Limitations for detail.

  2. Not cryptography: Feistel here is obfuscation, not AES. Anyone who knows the permutation parameters can decode() ids. Do not use these ids as secret tokens or auth credentials.

  3. Manual instance_id: You must assign unique ids per shard/process yourself. Collisions happen if two live generators reuse the same instance_id. Use env vars or config management and validate at startup. Central allocation is on the roadmap (ReservedBlockSource, v0.5 in the table below).
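A hypothetical startup check in that spirit; the environment variable name and the validation helper are assumptions, not part of permid64:

```python
import os

def load_instance_id(instance_bits: int = 16) -> int:
    """Read the shard id from the environment and fail fast if it is
    missing or out of range for the configured layout."""
    raw = os.environ.get("PERMID64_INSTANCE_ID")
    if raw is None:
        raise RuntimeError("PERMID64_INSTANCE_ID is not set; refusing to guess")
    instance_id = int(raw)
    if not 0 <= instance_id < (1 << instance_bits):
        raise RuntimeError(f"instance_id {instance_id} out of range "
                           f"[0, {1 << instance_bits})")
    return instance_id

os.environ["PERMID64_INSTANCE_ID"] = "42"   # demo only
print(load_instance_id())                    # 42
```

Failing at startup is far cheaper than discovering duplicate ids after two generators silently shared a shard.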

Benchmarks (permutation only)

Apple M2 / Python 3.12–class numbers from the project’s benchmark script:

| Permutation | Throughput | Latency (order of) |
|---|---|---|
| MultiplyOdd | ~480M ids/sec | ~2 ns |
| Feistel (6 rounds) | ~140M ids/sec | ~7 ns |
| Identity (no mixing) | ~520M ids/sec | ~2 ns |

These figures are pure CPU permutation cost; block reservation amortizes fsync to about once per 4096 ids in steady state. For the authoritative roadmap (multi-process locking, encodings, allocator, reference implementations), see the README in the repo.

Roadmap (from README)

| Version | Theme | What's included | Status |
|---|---|---|---|
| v0.1 | Core primitives | PersistentCounterSource, Feistel / Multiplicative permutation, decode() | Released |
| v0.2 | Encoding & Config | IdentityPermutation, fixed-width Base62 + Crockford Base32, Id64Config + build_id64 | Released |
| v0.3 | Multi-process safety | Process-safe file locking (fcntl / msvcrt), documented single-machine multi-process patterns, safer state file semantics | Next |
| v0.4 | Human-friendly output | Check digit, PrefixedEncoder, FormatSpec (segmented display, ambiguity-reduced alphabets) | Planned |
| v0.5 | Distributed sources | ReservedBlockSource (central allocator, Redis / PG-backed block rental), instance_id helpers | Planned |
| v0.6 | Solution presets | OrderIdGenerator, TicketIdGenerator, CorrelationIdGenerator, IoTEventIdGenerator | Planned |
| v0.7 | Formal spec | Cross-language bit layout, profile aliases, compatibility document | Planned |
| v1.0 | Reference implementations | Go + Rust reference implementations, three-language cross-decode tests | Planned |

Quick start

from permid64 import Id64

gen = Id64.feistel(
    instance_id=42,
    state_file="my_service.state",
    key=0xDEADBEEFCAFEBABE,
)

id_val = gen.next_u64()          # int
token  = gen.next_base62()     # str

meta = gen.decode(id_val)
print(meta.instance_id)          # 42
print(meta.sequence)             # 0, 1, 2, ...

meta2 = gen.decode_base62(token)

Summary

permid64 escapes parts of the auto-increment / UUID / timestamp trilemma by proving uniqueness from a monotonic counter and hiding structure with a bijection you can invert for operations.

| | Auto-increment | UUID v4 | Snowflake | permid64 |
|---|---|---|---|---|
| Width (bits) | 32/64 | 128 | 64 | 64 |
| Depends on | DB sequence | RNG | Wall clock (NTP) | Persistent counter (file) |
| Clock rollback | none | none | high risk | none (no clock in path) |
| Surface | strictly increasing | random | roughly increasing | random-looking (invertible) |
| B+ tree friendliness | excellent | poor | good | moderate |
| Decodable | trivially — leaks count | no | yes (time + worker) | yes (instance_id + sequence) |
| Volume leakage | severe | none | some inference possible | none from the sequential counter alone |
| Infra | database | none | coordination for worker ids | durable local storage |
| Cross-process | DB guarantees | probabilistic | worker-id scheme | today: manual shards; roadmap: locking + allocator |

There is no perfect id — only the right id for the constraints. If you want 64-bit, clock-free, decodable, opaque-looking identifiers in pure Python with zero runtime dependencies, permid64 is worth a serious look. The layered design also leaves room to swap in stronger permutations or add allocation without breaking the mental model.
