FHE has a precision-testing problem, and random testing misses it
If you compile an ML model to run under Fully Homomorphic Encryption,
the encrypted and plaintext versions should give the same answer on
every input. Under CKKS — the scheme most FHE ML deployments use —
they usually do. But CKKS is approximate: noise accumulates across
multiplications, and on specific inputs it can push the decrypted
output far from the plaintext result.
In one worked example from the patent specification behind this work,
the plaintext circuit returns 0.893 on a particular input while the
FHE circuit returns 0.402, a prediction error of 0.49. If that
model is making credit decisions at a 0.5 cutoff, the decision flips.
Why random testing doesn't find these
High-divergence inputs aren't uniformly distributed. They cluster in
regions where:
- intermediate values feeding a multiplication are large
- deep paths through the circuit have eaten most of the noise budget
- accumulated noise sits close to the decryptability threshold
Those regions can have volume on the order of 10⁻⁵ of the declared
input space. Random sampling almost never lands there. You can run ten
thousand tests, see nothing, ship, and then have a specific customer
profile reproducibly return the wrong answer.
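A back-of-envelope calculation makes the point concrete. It assumes each uniform random test lands in the bad region independently, with probability p equal to the region's volume fraction. The chance that n tests hit it at least once is then 1 − (1 − p)^n. The 10⁻⁵ figure comes from the text above; the helper names are just for illustration.

```python
import math


def hit_probability(p: float, n: int) -> float:
    """Chance that at least one of n independent uniform samples lands in a
    region occupying a fraction p of the input space."""
    return 1.0 - (1.0 - p) ** n


def samples_needed(p: float, confidence: float) -> int:
    """Smallest n with hit_probability(p, n) >= confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))


p = 1e-5  # volume fraction of the high-divergence region
for n in (500, 10_000):
    print(f"{n:>6,} tests: {hit_probability(p, n):.1%} chance of at least one hit")
print(f"tests for a 95% chance: {samples_needed(p, 0.95):,}")
```

Under these assumptions, ten thousand tests give about a 9.5% chance of even one hit. Reaching 95% confidence takes roughly 300,000 tests.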
Adversarial search instead
fhe-oracle runs CMA-ES (Covariance Matrix Adaptation Evolution
Strategy) with a three-term noise-aware fitness that rewards
candidates for:
- plaintext vs FHE divergence
- noise budget consumed
- multiplicative depth utilised
When an instrumented FHE adapter is present, all three terms are used.
Without one, the oracle falls back to divergence alone. Either way the search
concentrates its budget on the narrow regions where precision bugs
live.
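The shape of such a fitness function can be sketched as follows. This is not fhe-oracle's code. The `NoiseStats` fields, their [0, 1] normalisation and the weights are all assumptions made for illustration. The sketch only shows the structure described above: three weighted terms when adapter telemetry is available, and divergence alone when it is not.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NoiseStats:
    """Telemetry an instrumented adapter might report (hypothetical fields)."""
    budget_consumed: float  # fraction of the noise budget used up, 0..1
    depth_fraction: float   # multiplicative depth reached / max depth, 0..1


# Hypothetical weights: divergence dominates, noise terms steer the search.
W_DIV, W_NOISE, W_DEPTH = 1.0, 0.1, 0.1


def fitness(plain_out: float, fhe_out: float,
            stats: Optional[NoiseStats] = None) -> float:
    """Score to maximise: higher means a more promising candidate input."""
    divergence = abs(plain_out - fhe_out)
    if stats is None:
        # No instrumented adapter: fall back to divergence alone.
        return divergence
    return (W_DIV * divergence
            + W_NOISE * stats.budget_consumed
            + W_DEPTH * stats.depth_fraction)
```

CMA-ES implementations usually minimise. A search loop built on this sketch would therefore hand the optimiser the negated score.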
Numbers from a benchmark you can run
The repo ships a patent-reference benchmark: a simulated CKKS logistic
regression with a 3-term polynomial sigmoid approximation, accurate
for |logit| ≤ 3 and blowing up beyond it.
At an equal 500-evaluation budget:
```
Random sampling, operational range [-0.3, 0.3]^5
  diverging inputs found: 0
  max error: 3.52e-4

FHE Oracle / CMA-ES, adversarial range [-5, 5]^5
  max error: 1.50

max-error ratio: 4,259x
```
The asymmetry between the two ranges is deliberate. Random testing
samples from the training distribution. The oracle explores the full
declared input space of the deployed model. That gap is exactly where
real precision bugs hide.
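A toy stand-in shows why the range asymmetry matters. It is not the repo's benchmark. The weights are invented, and the "FHE" side is just a 3-term Taylor polynomial of the sigmoid, with no encryption or noise. That polynomial is accurate near zero but degrades earlier than the benchmark's approximation. The searcher is a plain (1+1)-ES with the 1/5th success rule, a much simpler relative of CMA-ES. Both sides get the same 500-evaluation budget, as in the comparison above.

```python
import math
import random

DIM = 5
WEIGHTS = [0.6, -0.4, 0.5, 0.3, -0.2]  # hypothetical model weights


def logit(x):
    return sum(w * xi for w, xi in zip(WEIGHTS, x))


def plaintext_fn(x):
    """Reference model: exact logistic sigmoid of the logit."""
    return 1.0 / (1.0 + math.exp(-logit(x)))


def fhe_fn(x):
    """Stand-in circuit: 3-term Taylor polynomial of the sigmoid.
    Accurate near 0, diverges as |logit| grows."""
    z = logit(x)
    return 0.5 + z / 4 - z**3 / 48


def error(x):
    return abs(plaintext_fn(x) - fhe_fn(x))


def random_search(lo, hi, budget, rng):
    """Uniform random sampling: largest error over `budget` draws."""
    return max(error([rng.uniform(lo, hi) for _ in range(DIM)])
               for _ in range(budget))


def es_search(lo, hi, budget, rng, sigma=1.0):
    """(1+1)-ES with the 1/5th success rule, maximising error in the box."""
    best = [rng.uniform(lo, hi) for _ in range(DIM)]
    best_err = error(best)
    for _ in range(budget - 1):
        # Gaussian mutation, clipped back into the declared bounds.
        cand = [min(hi, max(lo, b + rng.gauss(0.0, sigma))) for b in best]
        cand_err = error(cand)
        if cand_err > best_err:
            best, best_err = cand, cand_err
            sigma *= 1.5           # success: widen the step size
        else:
            sigma *= 1.5 ** -0.25  # failure: shrink (balances at 1/5 success)
    return best_err


rng = random.Random(42)
rnd_err = random_search(-0.3, 0.3, 500, rng)  # operational range
es_err = es_search(-5.0, 5.0, 500, rng)       # full declared range
print(f"random,   [-0.3, 0.3]^5: max error {rnd_err:.2e}")
print(f"(1+1)-ES, [-5, 5]^5:     max error {es_err:.2f}")
print(f"ratio: {es_err / rnd_err:,.0f}x")
```

The exact numbers differ from the benchmark's, but the pattern is the same. Sampling the operational range stays in the polynomial's accurate zone, while even a simple search over the declared range walks toward a corner where |logit| is large.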
Reproduce it with no FHE library needed:
```bash
pip install fhe-oracle cma numpy
git clone https://github.com/BAder82t/fhe-oracle
cd fhe-oracle
python benchmarks/patent_logistic_regression.py --seed 42
```
Using it on your own circuit
```python
from fhe_oracle import FHEOracle

def plaintext_fn(x):
    ...  # your reference implementation

def fhe_fn(x):
    ...  # your FHE-compiled circuit (e.g. concrete-ml predict_proba)

oracle = FHEOracle(
    plaintext_fn=plaintext_fn,
    fhe_fn=fhe_fn,
    input_dim=10,
    input_bounds=[(-3.0, 3.0)] * 10,
    seed=42,
)

result = oracle.run(n_trials=500, threshold=0.01)
print(result.verdict)      # "PASS" or "FAIL"
print(result.max_error)    # largest divergence found
print(result.worst_input)  # input vector that triggered it
```
worst_input is the actionable artifact. Feed it back into both
functions deterministically, confirm the divergence, and use it to
fix your noise budget, modulus schedule, or polynomial approximation.
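Replaying the input a few times is worthwhile because CKKS encryption is randomised. Each fresh encryption carries slightly different noise, so a single replay does not show that the divergence is reproducible. Below is a minimal sketch of that check. `confirm_divergence` is a hypothetical helper, not part of fhe-oracle, and it assumes both functions return a scalar.

```python
from typing import Callable, Sequence


def confirm_divergence(
    plaintext_fn: Callable[[Sequence[float]], float],
    fhe_fn: Callable[[Sequence[float]], float],
    x: Sequence[float],
    threshold: float,
    repeats: int = 3,
) -> bool:
    """Replay one input `repeats` times; True if every FHE run still
    diverges from the plaintext result by more than `threshold`."""
    expected = plaintext_fn(x)
    errors = [abs(expected - fhe_fn(x)) for _ in range(repeats)]
    for i, err in enumerate(errors, start=1):
        print(f"run {i}: |plaintext - fhe| = {err:.3g}")
    return all(err > threshold for err in errors)


# Usage with toy stand-ins that mimic the 0.893 vs 0.402 example.
worst = [0.0] * 10  # placeholder for result.worst_input
print(confirm_divergence(lambda x: 0.893, lambda x: 0.402, worst, threshold=0.01))
```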
CI gate
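```python
# oracle_check.py
import os

from fhe_oracle import FHEOracle
from my_model import plaintext_fn, fhe_fn  # your own module

oracle = FHEOracle(
    plaintext_fn=plaintext_fn,
    fhe_fn=fhe_fn,
    input_dim=10,
    input_bounds=[(-3.0, 3.0)] * 10,
    seed=42,  # fixed seed keeps CI runs reproducible
)

result = oracle.run(
    n_trials=int(os.environ.get("ORACLE_N_TRIALS", "500")),
    threshold=float(os.environ.get("ORACLE_THRESHOLD", "0.01")),
)

# Log the verdict and the reproducer so a failing run is actionable.
print(result.verdict, result.max_error, result.worst_input)
raise SystemExit(0 if result.verdict == "PASS" else 1)
```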
```yaml
# .github/workflows/fhe-precision.yml
name: FHE Precision Test
on: [push, pull_request]
jobs:
  fhe-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: "3.11" }
      # Also install whatever my_model needs (e.g. your FHE library).
      - run: pip install fhe-oracle cma numpy
      - run: python oracle_check.py
```
The job fails when divergence exceeds the threshold. Make it a
required status check under branch protection and it blocks the
merge. A full template is in examples/ in the repo.
Real FHE backends
Adapters for OpenFHE, Microsoft SEAL, and Zama Concrete ship with the
library. The adapter interface has six methods — wrapping an existing
circuit takes about twenty lines.
Scope
fhe-oracle tests black-box semantic divergence between a plaintext
function and its FHE-compiled counterpart. It does not verify the
cryptographic security of the FHE scheme, test for side-channel leaks,
or replace formal verification. A PASS verdict means the adversarial
search did not find a divergence within the evaluation budget — not a
proof that none exists.
Source and benchmarks:
github.com/BAder82t/fhe-oracle
(AGPL-3.0, commercial licences available)
Patent pending: PCT/IB2026/053378
Happy to answer questions about the fitness function, the three noise
heuristics (Multiplication Magnifier, Depth Seeker, Near-Threshold
Explorer), or wiring up a real FHE adapter.