DEV Community

Dan Draper for CipherStash

Posted on • Originally published at cipherstash.com

Fixing a 1-in-256 bug in CLWW order-preserving encryption

Many databases let you sort and range-query encrypted data using order-preserving encryption (OPE): a scheme where ciphertext bytes compare in the same order as their plaintexts, so standard B-tree indexes and ORDER BY clauses Just Work. A well-known construction is Chenette-Lewi-Weis-Wu (CLWW) from 2015 — small, fast, and widely deployed. The paper gives an order-revealing scheme with a custom comparator, and notes in passing that the same ciphertext bytes can be compared lexicographically to yield an order-preserving scheme as a side benefit.

But that reinterpretation is only almost right. Roughly 1 in 256 comparisons disagrees with the true plaintext order. In this post I'll walk through why, with a worked example, and show how two small changes to the encoding eliminate the residual — at the cost of one extra byte per ciphertext and a single arithmetic pass.

This post walks through the construction and the intuition. If you want the full derivation — triangular probability distributions, the signal/noise argument, empirical verification, and a discussion of what the scheme leaks — head over to the companion engineering note: Exact OPE from CLWW ORE via Backward Carry + MSB-Bit Placement.

Why order-preserving encryption?

When you encrypt a column in a database with a standard symmetric cipher — AES-GCM, ChaCha20, pick your favourite — you get great confidentiality. You also lose every other useful property of the column. The ciphertext byte order has nothing to do with the plaintext order, so your B-tree index, your ORDER BY clauses, and your range queries (WHERE price BETWEEN 100 AND 500) stop working entirely.

Order-preserving encryption fixes this, at a cost. In an OPE scheme, if plaintext x < y, then enc(x) < enc(y) under the standard comparison your database already understands — usually byte-lexicographic compare, i.e. memcmp. Drop these ciphertexts into any off-the-shelf sorted index and range scans work out of the box. No custom comparator. No query-rewriting middleware.

To make that concrete, here's a handful of 8-bit integer plaintexts alongside illustrative ciphertexts — the bytes are made up rather than produced by running the scheme, but they follow its rules (9 bytes each, reserved leading byte, shared prefix where the high-order plaintext bits agree, first differing byte offset by ~128). Only the first 6 of 9 bytes are shown:

| Plaintext | Ciphertext (hex) |
|-----------|------------------|
| 7 | 00 5e 38 12 9a 2c … |
| 42 | 00 5e 38 92 3f c4 … |
| 100 | 00 5e b8 4c 92 3f … |
| 200 | 00 de 7a 23 51 8c … |
| 250 | 00 de 7a a3 ff 22 … |

Plaintexts in ascending order produce ciphertexts in ascending byte-lexicographic order: memcmp over any two rows returns the same verdict as comparing the plaintexts directly. Drop these into a B-tree index and ORDER BY / range queries work out of the box. You can also see the scheme's leakage profile hiding in plain sight — the closer two plaintexts' high-order bits are, the more leading bytes their ciphertexts share. More on that in a minute.
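To see the property mechanically, here's a quick check in Python over the illustrative bytes from the table (remember: these ciphertext bytes are made up for exposition, not output of the real scheme — only their ordering matters):

```python
# Illustrative (plaintext, ciphertext-prefix) pairs from the table above.
rows = [
    (7,   bytes.fromhex("005e38129a2c")),
    (42,  bytes.fromhex("005e38923fc4")),
    (100, bytes.fromhex("005eb84c923f")),
    (200, bytes.fromhex("00de7a23518c")),
    (250, bytes.fromhex("00de7aa3ff22")),
]

# Sorting by ciphertext (plain byte-lex compare, i.e. what a B-tree does)
# yields the same order as sorting by plaintext.
by_ct = sorted(rows, key=lambda r: r[1])
assert [p for p, _ in by_ct] == [7, 42, 100, 200, 250]
```

Python's `bytes` comparison is exactly byte-lexicographic, so this is the same verdict `memcmp` would give.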

The cost is leakage. A scheme that preserves order also leaks order (tautologically), and typically a bit more — usually some information about the common prefix of two plaintexts. For many workloads that tradeoff is acceptable; for others you'd reach for a construction with a tighter leakage profile, like the Lewi–Wu Block ORE scheme — which is what CipherStash uses for most production paths, exposed to Postgres through EQL. Block ORE needs a custom comparator, which in Postgres means registering a custom operator class via CREATE OPERATOR CLASS / CREATE OPERATOR FAMILY. Some managed platforms permit those statements without superuser (AWS RDS does, which is why EQL runs there today); others don't (Supabase, as of writing). When you're on a platform in the latter category, CLWW OPE is the next best thing — standard lex-compare on the ciphertext works with any stock B-tree index, no custom operator class required. I'll touch on what CLWW specifically leaks toward the end.

CLWW in 60 seconds

CLWW is a bit-by-bit ORE scheme. For an n-bit plaintext p = b₀ b₁ … bₙ₋₁ (MSB-first), it produces an n-byte ciphertext. The construction feeds the plaintext bits through a keyed hash function one bit at a time — we'll model that keyed hash as a pseudorandom function (PRF), a function whose output is indistinguishable from uniform random bytes to anyone who doesn't have the key. In practice that's HMAC-SHA256, BLAKE3 in keyed mode, or similar; our implementation uses BLAKE3. At each step we pull one PRF byte and add the plaintext bit to it:

[Figure: CLWW ORE flow diagram]

cᵢ = (PRFᵢ + bᵢ) mod 256

The arrows between the PRF boxes are the important part. PRFᵢ depends on all the plaintext bits b₀ … bᵢ₋₁ that came before it. So for two plaintexts X and Y that first differ at bit k:

  • At positions 0 through k, the PRF inputs are identical in both plaintexts, so PRFᵢ^X = PRFᵢ^Y.
  • At positions k+1 onward, the chain has diverged (one side saw bₖ = 0, the other saw bₖ = 1), so the PRFs look independent-random.

At the first differing byte itself, then, the ciphertexts differ by exactly b_k^Y − b_k^X, which is +1 or −1. The native CLWW comparator exploits this: find the first differing byte, check whether c_Y ≡ c_X + 1 (mod 256). If yes, X < Y; otherwise Y < X. Note the mod 256 in that check — we'll come back to it.
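The native comparator is a few lines. A sketch in Python (assuming same-length ciphertexts, as the scheme produces):

```python
def clww_compare(cx: bytes, cy: bytes) -> int:
    """Native CLWW ORE comparator (sketch): -1 if X < Y, 0 if equal, 1 if X > Y.

    The ciphertexts agree up to the byte of the first differing plaintext
    bit, where they differ by exactly +1 or -1 mod 256. So at the first
    differing byte, X < Y iff c_Y == c_X + 1 (mod 256).
    """
    for a, b in zip(cx, cy):
        if a != b:
            return -1 if b == (a + 1) % 256 else 1
    return 0  # identical ciphertexts => identical plaintexts

# The mod-256 check also covers the wrap-around pair (0xFF, 0x00):
assert clww_compare(bytes([0x12, 0x42]), bytes([0x12, 0x43])) == -1
assert clww_compare(bytes([0x12, 0xFF]), bytes([0x12, 0x00])) == -1
```

That last assertion is precisely the case where plain lex compare disagrees, as we'll see next.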

The paper's OPE reinterpretation — and the 1/256 bug

Section 3.2 of the CLWW paper observes that the ciphertext bytes agree up to the byte of the first differing plaintext bit, and at that byte one side is exactly one more than the other (mod 256). So if you forget about the custom comparator and just sort the ciphertexts lexicographically, you'll usually get the right answer.

"Usually" is doing a lot of work there.

Consider X = 0 and Y = 1 as 8-bit plaintexts. Bits of X are 0000 0000; bits of Y are 0000 0001. The PRF chain for Y sees only zeros up to bit 6, so PRF₀ … PRF₇ are identical in both plaintexts. Every ciphertext byte agrees, except byte 7 — which encodes the one differing bit.

Now imagine PRF₇ = 0xFF. A 1-in-256 event.

| Byte | PRF | X's bit | Y's bit | c^X | c^Y |
|------|-----|---------|---------|-----|-----|
| 0–6 | PRFᵢ | 0 | 0 | PRFᵢ | PRFᵢ |
| 7 | 0xFF | 0 | 1 | 0xFF | (0xFF + 1) mod 256 = 0x00 |

Lex compare: bytes 0 through 6 agree, byte 7 has 0x00 on the Y side and 0xFF on the X side. The comparator says Y < X. The plaintext says Y > X. Wrong answer.

The CLWW comparator handles this case because its check is c_Y ≡ c_X + 1 (mod 256), which is true for (0xFF, 0x00) as well as for (0x42, 0x43). Lex compare doesn't have the mod-256 awareness — when the byte wraps around, the ordering flips.

Per-comparison error rate: 1/256 ≈ 3.9 × 10⁻³. For a range scan touching a few thousand ciphertexts, that's multiple mis-orderings per query. Unacceptable.

First fix: backward carry

Here's the key reframe: instead of thinking of the ciphertext as a byte stream, think of it as a big-endian integer. When PRFₖ + bₖ wraps past 256, we don't lose the +1 — we propagate it as an arithmetic carry to the byte on the left, exactly like schoolbook long addition. We reserve a leading byte at the top of the ciphertext to absorb any carry that makes it all the way up.

fn encrypt_ope_v1(K, p):
    hasher = keyed_hasher(K)

    # Pass 1: generate PRF bytes, record the plaintext bits.
    prfs = []; bits = []
    for bit in bits_msb_first(p):
        prfs.push(hasher.next_byte())
        bits.push(bit)
        hasher.update(bit)

    # Pass 2: compute P + B as an (n+1)-byte big-endian integer,
    # with a reserved leading byte that absorbs the final carry.
    out = [0] + prfs
    carry = 0
    for i in reversed(0..n):
        total      = out[i + 1] + bits[i] + carry
        out[i + 1] = total mod 256
        carry      = total >> 8       # always 0 or 1
    out[0] = carry
    return out

Comparison is plain byte-lex compare — no custom comparator required.

Let's replay the failing case. X = 0, Y = 1, PRF₇ = 0xFF. The ciphertext is now 9 bytes (one reserved + eight content).

  • X (all bits zero): pass 2 adds nothing and no carry propagates. c^X = [0, PRF₀, …, PRF₆, 0xFF].
  • Y (bit 7 = 1): at byte 8 we compute 0xFF + 1 + 0 = 0x100 → byte 8 becomes 0x00, carry out = 1. At byte 7 we compute PRF₆ + 0 + 1 = PRF₆ + 1, carry out = 0. Other bytes unchanged. c^Y = [0, PRF₀, …, PRF₅, PRF₆ + 1, 0x00].

Lex compare: c^X and c^Y agree through byte 6. At byte 7, c^X has PRF₆ and c^Y has PRF₆ + 1. Y is bigger — correctly. ✓

The +1 didn't disappear when the byte wrapped; it just moved one position to the left, where it reappears at the byte the lex comparator hits first.
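The pass-2 carry arithmetic is easy to replay in isolation. Here's a Python sketch with the PRF bytes supplied explicitly (rather than drawn from a keyed hash) so the 1-in-256 case can be forced; the byte values other than the final 0xFF are arbitrary:

```python
def add_bits_with_carry(prfs, bits, weight=1):
    """Pass 2 of encrypt_ope_v1: compute P + B as an (n+1)-byte
    big-endian integer, with a reserved leading byte for the final carry."""
    n = len(prfs)
    out = [0] + list(prfs)
    carry = 0
    for i in reversed(range(n)):
        total = out[i + 1] + bits[i] * weight + carry
        out[i + 1] = total % 256
        carry = total >> 8          # always 0 or 1
    out[0] = carry
    return bytes(out)

# Replay the failing case: X = 0, Y = 1, with PRF_7 forced to 0xFF.
prfs = [0x5e, 0x38, 0x12, 0x9a, 0x2c, 0x77, 0x41, 0xFF]
c_x = add_bits_with_carry(prfs, [0] * 8)
c_y = add_bits_with_carry(prfs, [0] * 7 + [1])

assert c_x == bytes([0, 0x5e, 0x38, 0x12, 0x9a, 0x2c, 0x77, 0x41, 0xFF])
assert c_y == bytes([0, 0x5e, 0x38, 0x12, 0x9a, 0x2c, 0x77, 0x42, 0x00])
assert c_x < c_y   # lex compare now agrees with the plaintext order
```

Byte 8 wrapped to 0x00 on the Y side, but the carry bumped byte 7 from 0x41 to 0x42, so the lex comparator resolves correctly.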

But wait — this isn't exact

A big improvement, but it turns out this scheme is only almost right. The per-comparison error rate drops from ~1/256 to around 7.7 × 10⁻⁶ — about 500× better — but it's still not zero.

The residual is a signal/noise mismatch. At the first differing byte, the plaintext injects a +1 signal — but the backward carry coming up from the byte to its right can also swing the result by ±1. Signal and noise are the same order of magnitude, so in rare PRF configurations they cancel exactly and the two ciphertexts tie where the plaintexts don't. The full derivation — triangular distributions, tail probabilities, the small correction for the m = 2 case — lives in the engineering note.

The question is: can we kill the residual without making the ciphertext larger?

Second fix: place the bit at the MSB

The signal-vs-noise framing suggests the obvious move: make the signal bigger. We can't shrink the noise — ±1 per byte is intrinsic to how the carry works — but we can inject the plaintext bit at a higher-weight position within its byte.

Concretely: instead of adding +bit (which puts the signal at bit 0 of the byte, weight 1), add +bit · 128 (which puts it at bit 7, weight 128). Same ciphertext size. Same PRF chain. One line of code:

-    total = out[i + 1] + bits[i]       + carry
+    total = out[i + 1] + bits[i] * 128 + carry

The signal is now +128 at the first differing byte. The carry noise is still ±1. They can't cancel — lex compare resolves at the first differing byte by a margin of at least 127 (or, if that byte happens to wrap and the carry propagates one position left, at the byte before it, which sits at 256× the integer weight and so dominates anyway). The engineering note has the full case analysis.

Exact OPE. Verified empirically: 1 310 700 adjacent u16 pairs under 20 randomly sampled keys, zero ordering errors.
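You can run a smaller version of that check yourself. The sketch below puts both fixes together in Python; keyed BLAKE2b over the bit prefix stands in for the chained keyed hasher (the post's implementation uses BLAKE3), and the function name `encrypt_ope` is illustrative, not the crate's API:

```python
import hashlib

def encrypt_ope(key: bytes, p: int, n: int = 8) -> bytes:
    """Exact OPE sketch: backward carry + MSB bit placement.

    Stand-in PRF: keyed BLAKE2b over the bit prefix b_0..b_{i-1},
    so PRF_i depends on the key and all earlier plaintext bits.
    """
    bits = [(p >> (n - 1 - i)) & 1 for i in range(n)]        # MSB-first
    prfs = [
        hashlib.blake2b(bytes(bits[:i]), key=key, digest_size=1).digest()[0]
        for i in range(n)
    ]
    # Backward carry pass, plaintext bit injected at weight 128 (the MSB).
    out = [0] + prfs
    carry = 0
    for i in reversed(range(n)):
        total = out[i + 1] + bits[i] * 128 + carry
        out[i + 1] = total % 256
        carry = total >> 8
    out[0] = carry            # reserved leading byte absorbs the final carry
    return bytes(out)

# Mini-verification: every adjacent u8 pair orders correctly under
# plain byte-lex compare, for this (arbitrary) demo key.
key = b"an arbitrary demo key"
cts = [encrypt_ope(key, p) for p in range(256)]
assert all(cts[p] < cts[p + 1] for p in range(255))
```

Each ciphertext is 9 bytes (one reserved plus eight content), and no custom comparator appears anywhere in the check.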

Wrap-up

Two small changes — a backward carry pass with a reserved byte, and placing the plaintext bit at the top bit of its byte rather than the bottom — turn CLWW's paper-suggested OPE reinterpretation from a ~1/256-wrong scheme into an exact one. Same number of PRF evaluations, one extra ciphertext byte, no custom comparator needed.

What this gives you: drop-in ciphertexts for any ordered index or sort operation, with ordering that matches plaintext ordering exactly.

What it doesn't give you: strict-ORE-level leakage hiding (the scheme leaks first-differing-bit position, same as CLWW ORE), and no decryption — for round-trip you pair the OPE ciphertext with an authenticated symmetric cipher like AES-GCM.

The implementation lives in our cllw-ore crate on crates.io, with a fuller derivation of the error rate and the exactness argument in the engineering note.

If you've got a production encrypted-database use case that needs order-preserving indexes — or you just enjoy small clean cryptographic primitives — I'd love to hear what you think.
