
BAder82t


FHE programs have precision bugs random testing can't find — here's an adversarial search tool that does

FHE has a precision-testing problem, and random testing misses it

If you compile an ML model to run under Fully Homomorphic Encryption,
the encrypted and plaintext versions should give the same answer on
every input. Under CKKS — the scheme most FHE ML deployments use —
they usually do. But CKKS is approximate: noise accumulates across
multiplications, and on specific inputs it can push the decrypted
output far from the plaintext result.

In one worked example from the patent specification behind this work,
the plaintext circuit returns 0.893 on a particular input while the
FHE circuit returns 0.402, a prediction error of 0.49. If that model
is making credit decisions with a cut-off at, say, 0.5, the two
versions land on opposite sides of it and the decision flips.

Why random testing doesn't find these

High-divergence inputs aren't uniformly distributed. They cluster in
regions where:

  • intermediate values feeding a multiplication are large
  • deep paths through the circuit have eaten most of the noise budget
  • accumulated noise sits close to the decryptability threshold

Those regions can have volume on the order of 10⁻⁵ of the declared
input space. Random sampling almost never lands there. You can run ten
thousand tests, see nothing, ship, and then have a specific customer
profile reproducibly return the wrong answer.
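To put a rough number on that, assume the bad region really does occupy a fraction 10⁻⁵ of the input space and that tests are drawn independently and uniformly (both simplifying assumptions). The chance that at least one of n tests lands in it is 1 − (1 − p)ⁿ:

```python
# Back-of-the-envelope sketch: chance that uniform random testing
# hits a failure region occupying a fraction p of the input space.
# Assumes independent, uniform samples; p = 1e-5 is the order of
# magnitude quoted above, not a measured value.

def hit_probability(p: float, n_tests: int) -> float:
    """Probability that at least one of n_tests samples lands in the region."""
    return 1.0 - (1.0 - p) ** n_tests

p = 1e-5
for n in (500, 10_000, 100_000):
    print(f"{n:>7} tests: {hit_probability(p, n):.1%} chance of a hit")
```

Under those assumptions, ten thousand tests miss the region roughly nine times out of ten.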

Adversarial search instead

fhe-oracle runs CMA-ES (Covariance Matrix Adaptation Evolution
Strategy) with a three-term noise-aware fitness that rewards
candidates for:

  1. plaintext vs FHE divergence
  2. noise budget consumed
  3. multiplicative depth utilised

When an instrumented FHE adapter is present, all three terms are used.
Without one, it falls back to divergence alone. Either way the search
concentrates its budget on the narrow regions where precision bugs
live.
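As a rough illustration of how such a fitness might be combined, here is a minimal sketch. The weights, the `NoiseReport` fields, and the 0..1 normalisation are all assumptions made for this example; they are not taken from fhe-oracle's source.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class NoiseReport:
    """Hypothetical telemetry an instrumented adapter might expose."""
    budget_used: float  # fraction of the noise budget consumed, 0..1
    depth_used: float   # fraction of available multiplicative depth, 0..1


def fitness(
    x: Sequence[float],
    plaintext_fn: Callable[[Sequence[float]], float],
    fhe_fn: Callable[[Sequence[float]], float],
    noise_fn: Optional[Callable[[Sequence[float]], NoiseReport]] = None,
    weights: tuple = (1.0, 0.5, 0.5),  # illustrative values, not tuned
) -> float:
    """Higher means 'more suspicious'.

    Falls back to divergence alone when no noise telemetry is available.
    """
    divergence = abs(plaintext_fn(x) - fhe_fn(x))
    if noise_fn is None:
        return divergence
    report = noise_fn(x)
    w_div, w_noise, w_depth = weights
    return (w_div * divergence
            + w_noise * report.budget_used
            + w_depth * report.depth_used)


# Toy stand-ins so the sketch runs without an FHE library.
plain = lambda x: sum(x)
noisy = lambda x: sum(x) + 0.01 * sum(v * v for v in x)
telemetry = lambda x: NoiseReport(budget_used=0.8, depth_used=0.5)

print(fitness([1.0, 2.0], plain, noisy))             # divergence only
print(fitness([1.0, 2.0], plain, noisy, telemetry))  # all three terms
```

CMA-ES implementations such as the `cma` package minimise by convention, so a search loop built on this sketch would hand the optimiser the negated fitness.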

Numbers from a benchmark you can run

The repo ships a patent-reference benchmark: a simulated CKKS logistic
regression with a 3-term polynomial sigmoid approximation, accurate
for |logit| ≤ 3 and blowing up beyond it.
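To see why a low-degree polynomial behaves this way, here is a sketch using the three-term Taylor expansion of the sigmoid around zero. The coefficients are an assumption chosen for illustration; the benchmark's actual polynomial may differ and is likely a better fit across [-3, 3]. The Taylor version is only tight near zero, but it shows the same qualitative shape: a small error for small logits and an error that keeps growing once the logit leaves the fitted range.

```python
import math


def sigmoid(z: float) -> float:
    """Reference sigmoid, as the plaintext model would compute it."""
    return 1.0 / (1.0 + math.exp(-z))


def poly_sigmoid(z: float) -> float:
    """Three-term Taylor expansion of the sigmoid at z = 0 (assumed form)."""
    return 0.5 + z / 4.0 - z ** 3 / 48.0


for z in (0.3, 1.0, 3.0, 5.0):
    err = abs(sigmoid(z) - poly_sigmoid(z))
    print(f"logit={z:>4}: sigmoid={sigmoid(z):.4f} "
          f"poly={poly_sigmoid(z):.4f} error={err:.2e}")
```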

At an equal 500-evaluation budget:

Random sampling (operational range [-0.3, 0.3]^5)
  0 diverging inputs found
  max error: 3.52e-4

FHE Oracle / CMA-ES (adversarial range [-5, 5]^5)
  max error: 1.50
  ratio: 4,259x

The asymmetry between the two ranges is deliberate. Random testing
samples from the training distribution. The oracle explores the full
declared input space of the deployed model. That gap is exactly where
real precision bugs hide.

Reproduce it with no FHE library needed:

pip install fhe-oracle cma numpy
git clone https://github.com/BAder82t/fhe-oracle
cd fhe-oracle
python benchmarks/patent_logistic_regression.py --seed 42

Using it on your own circuit

from fhe_oracle import FHEOracle

def plaintext_fn(x):
    ...  # your reference implementation

def fhe_fn(x):
    ...  # your FHE-compiled circuit (e.g. concrete-ml predict_proba)

oracle = FHEOracle(
    plaintext_fn=plaintext_fn,
    fhe_fn=fhe_fn,
    input_dim=10,
    input_bounds=[(-3.0, 3.0)] * 10,
    seed=42,
)
result = oracle.run(n_trials=500, threshold=0.01)

print(result.verdict)      # "PASS" or "FAIL"
print(result.max_error)    # largest divergence found
print(result.worst_input)  # input vector that triggered it

worst_input is the actionable artifact. Feed it back into both
functions deterministically, confirm the divergence, and use it to
fix your noise budget, modulus schedule, or polynomial approximation.
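A replay check along those lines might look like the sketch below. The helper name `confirm_divergence` and its parameters are hypothetical, and the two toy functions stand in for your real plaintext and FHE callables.

```python
from typing import Callable, Sequence


def confirm_divergence(
    plaintext_fn: Callable[[Sequence[float]], float],
    fhe_fn: Callable[[Sequence[float]], float],
    x: Sequence[float],
    threshold: float,
    repeats: int = 5,
) -> bool:
    """Replay x several times; True only if every run exceeds threshold.

    Repeating guards against a one-off result, since a real FHE run
    samples fresh encryption noise each time.
    """
    errors = [abs(plaintext_fn(x) - fhe_fn(x)) for _ in range(repeats)]
    print(f"errors over {repeats} runs: "
          f"min={min(errors):.3g} max={max(errors):.3g}")
    return min(errors) > threshold


# Toy stand-ins (assumptions for illustration only).
plain = lambda x: sum(x)
fhe_like = lambda x: sum(x) + (0.5 if max(x) > 2.0 else 0.0)

worst_input = [2.5, 0.1]  # in practice this would be result.worst_input
print(confirm_divergence(plain, fhe_like, worst_input, threshold=0.01))
```

Once the divergence reproduces, the same input can be kept as a fixed regression case alongside the search.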

CI gate

# oracle_check.py
import os
from fhe_oracle import FHEOracle
from my_model import plaintext_fn, fhe_fn

oracle = FHEOracle(
    plaintext_fn=plaintext_fn,
    fhe_fn=fhe_fn,
    input_dim=10,
    input_bounds=[(-3.0, 3.0)] * 10,
    seed=42,  # fixed seed, as in the example above, so CI runs are repeatable
)
result = oracle.run(
    n_trials=int(os.environ.get("ORACLE_N_TRIALS", "500")),
    threshold=float(os.environ.get("ORACLE_THRESHOLD", "0.01")),
)
raise SystemExit(0 if result.verdict == "PASS" else 1)

Then run the script from a GitHub Actions workflow:

# .github/workflows/fhe-precision.yml
name: FHE Precision Test
on: [push, pull_request]
jobs:
  fhe-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: "3.11" }
      - run: pip install fhe-oracle
      - run: python oracle_check.py

The job fails when divergence exceeds the threshold; if the check is
marked as required in branch protection, that blocks the merge. A full
template is in examples/ in the repo.

Real FHE backends

Adapters for OpenFHE, Microsoft SEAL, and Zama Concrete ship with the
library. The adapter interface has six methods — wrapping an existing
circuit takes about twenty lines.

Scope

fhe-oracle tests black-box semantic divergence between a plaintext
function and its FHE-compiled counterpart. It does not verify the
cryptographic security of the FHE scheme, test for side-channel leaks,
or replace formal verification. A PASS verdict means the adversarial
search did not find a divergence within the evaluation budget — not a
proof that none exists.


Source and benchmarks:
github.com/BAder82t/fhe-oracle
(AGPL-3.0, commercial licences available)

Patent pending: PCT/IB2026/053378

Happy to answer questions about the fitness function, the three noise
heuristics (Multiplication Magnifier, Depth Seeker, Near-Threshold
Explorer), or wiring up a real FHE adapter.
