Pure State Labs

Posted on Jul 1

The 99% Fidelity Lie: Why Your Quantum Control Models Are Lying to You

#python #opensource #tools #quantum

The shortcut everyone takes when designing a gate

Say you're building a two-qubit gate for a superconducting chip. The path of least resistance looks like this: optimize the pulse against a clean, closed-system simulation, admire the unitary fidelity, then tack on a factor of exp(-t_g / T) at the end to "handle" decoherence.

The problem is that this lies to you. On a qubit pair that's actually decoherence-limited, that after-the-fact factor overstates the CZ fidelity you'll really get by around 3e-3. The reason is simple: the optimizer never saw the noise while it was working, so it never had a reason to trade gate speed against it. You end up with a pulse tuned for a chip that doesn't exist.
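To make the shortcut concrete, here is a minimal sketch of the multiply-after estimate. The function name and the numbers are illustrative assumptions, not gradpulse code or measured values.

```python
import math

def multiply_after_fidelity(unitary_fidelity, gate_time, coherence_time):
    """Closed-system fidelity scaled by exp(-t_g / T) after the fact."""
    return unitary_fidelity * math.exp(-gate_time / coherence_time)

# Hypothetical numbers: a 60 ns gate on a pair with T = 30 us.
estimate = multiply_after_fidelity(0.9999, 60e-9, 30e-6)
print(f"multiply-after estimate: {estimate:.6f}")
```

The coherence time only enters in the final multiplication. The quantity the optimizer actually maximized, the unitary fidelity, never depends on it, so nothing pushes the pulse toward a shorter gate.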

gradpulse does the opposite. It drops the whole open system — T1 relaxation, T-phi dephasing, and leakage up into higher transmon levels — right into the forward pass, and then optimizes through it with autodiff. Whatever pulse comes out was shaped against the noise from the start.

There's a second rule it lives by, and it's the one I find more interesting: it won't report a fidelity it can't independently reproduce.

Built and maintained by Pure State Labs Inc. MIT-licensed.

The whole thing in one snippet

Hand it a qubit pair and a target gate, and it finds the microwave-and-flux pulse that runs that gate through a full open-system sim.

pip install gradpulse
# with the validation + plotting extras
pip install "gradpulse[validate,viz]"

import gradpulse as gp
from gradpulse import viz

# generic profile, or load a measured Braket/IBM calibration
profile = gp.ParametricCouplerProfile()

result = gp.optimize_cz(profile)   # optimize a CZ
viz.plot_pulse(result)             # the pulse
viz.plot_convergence(result)       # how it got there

# where's the error coming from?
budget = result["optimizer"].error_budget(result["best_raw_param"])
print(f"control + leakage: {budget['r_control_leakage']:.2e}")
print(f"decoherence floor: {budget['r_decoherence']:.2e}")
print(f"channel unitarity: {budget['unitarity']:.6f}")

Under the hood it's differentiable GRAPE on PyTorch, so it uses your GPU if you have one and backpropagates gradients straight through the Lindblad evolution.

Three solvers, zero shared code

Here's the bug that quietly wrecks control-simulation work: a solver can be totally self-consistent and still flat wrong. Flip a sign in an operator, normalize a collapse rate the wrong way, order your basis states differently than you meant to — the code happily agrees with itself and prints a confident, bogus fidelity.

gradpulse guards against this by running every fidelity through three solvers that don't share any operator-building or matrix-exponential code:

The PyTorch optimizer — the differentiable Lie-Trotter propagator that does the actual work.
A matched QuTiP integrator — same Trotter step, but it builds its operators and contractions on its own. It lines up with the optimizer to about 1e-14. That catches transcription bugs, but it says nothing about the splitting error the two have in common.
A dependency-free NumPy Liouvillian — takes the exact exponential of the full generator, backed up by QuTiP's adaptive mesolve ODE. These two nail down the splitting error the first pair can't see.

On the headline CZ, the Trotter-based solvers differ from the exact references by the first-order Trotter error, roughly 2e-7, and that gap shrinks linearly as you shrink dt. Freeze the pulse, sub-step, and Richardson-extrapolate to dt -> 0, and the independent solvers agree to about 1e-13.
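The linear-in-dt gap and the Richardson step can be reproduced on a toy problem. The sketch below is not gradpulse's CZ model: it assumes a single driven qubit with T1 decay in dimensionless units, and it uses a small hand-rolled matrix exponential to stay NumPy-only. It compares a Lie-Trotter propagator against the exact exponential of the full Liouvillian.

```python
import numpy as np

def expm(a, terms=18, squarings=8):
    """Matrix exponential by scaling-and-squaring a truncated Taylor series."""
    a = np.asarray(a, dtype=complex) / 2**squarings
    out = np.eye(a.shape[0], dtype=complex)
    term = np.eye(a.shape[0], dtype=complex)
    for k in range(1, terms + 1):
        term = term @ a / k
        out = out + term
    for _ in range(squarings):
        out = out @ out
    return out

# Toy single qubit: resonant drive plus T1 decay (illustrative values).
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sm = np.array([[0, 1], [0, 0]], dtype=complex)  # lowering operator |0><1|
eye = np.eye(2, dtype=complex)
omega, gamma, total_time = 1.0, 0.05, 2.0
ham = 0.5 * omega * sx

# Row-major vectorization: vec(A rho B) = kron(A, B.T) @ vec(rho).
liou_h = -1j * (np.kron(ham, eye) - np.kron(eye, ham.T))
cdc = sm.conj().T @ sm
liou_d = gamma * (np.kron(sm, sm.conj())
                  - 0.5 * np.kron(cdc, eye)
                  - 0.5 * np.kron(eye, cdc.T))

# Reference: exact exponential of the full generator.
exact = expm(total_time * (liou_h + liou_d))

def trotter(n_steps):
    """First-order Lie-Trotter propagator with n_steps equal steps."""
    dt = total_time / n_steps
    step = expm(dt * liou_h) @ expm(dt * liou_d)
    return np.linalg.matrix_power(step, n_steps)

coarse, fine = trotter(50), trotter(100)
err_coarse = np.linalg.norm(coarse - exact)
err_fine = np.linalg.norm(fine - exact)

# Richardson extrapolation cancels the O(dt) term of the splitting error.
richardson = 2 * fine - coarse
err_rich = np.linalg.norm(richardson - exact)

print(f"error at dt:      {err_coarse:.2e}")
print(f"error at dt / 2:  {err_fine:.2e}")
print(f"ratio:            {err_coarse / err_fine:.2f}")
print(f"after Richardson: {err_rich:.2e}")
```

Halving dt should roughly halve the splitting error, and the extrapolated propagator should sit much closer to the exact one than either Trotter run.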

If any two solvers drift apart at the operating point, CI fails the build. That's the whole point: agreement is something that gets tested on every commit, not a line in a paper nobody rechecks.

Okay, but does it match real hardware?

What gradpulse actually predicts is a gate's decoherence floor — the error you'd see if T1 and T2 were the only things that mattered. Real gates carry more than that (control error, crosstalk, non-Markovian junk), so the floor is a lower bound. It equals the measured error when a gate is coherence-limited, and sits under it otherwise.

That gives you something you can actually falsify: the floor should never come out above the measured error of a coherence-limited gate.
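For intuition about what a coherence-only error looks like, the sketch below evaluates the textbook first-order estimate of the average gate error from T1 and pure dephasing. This is not gradpulse's open-system calculation, which also tracks leakage and the actual pulse. The function names and all numbers are hypothetical.

```python
def dephasing_rate(t1, t2):
    """Pure-dephasing rate 1/T_phi, from 1/T2 = 1/(2*T1) + 1/T_phi."""
    return 1.0 / t2 - 1.0 / (2.0 * t1)

def coherence_limit_error(gate_time, t1s, t2s):
    """First-order average gate error from T1 and T_phi alone.

    Assumes Markovian noise, qubits that stay in the computational
    subspace, and gate_time much shorter than every coherence time.
    """
    dim = 2 ** len(t1s)
    rate_sum = sum(1.0 / t1 + dephasing_rate(t1, t2)
                   for t1, t2 in zip(t1s, t2s))
    return dim / (2.0 * (dim + 1)) * gate_time * rate_sum

# Hypothetical pair: 60 ns gate, T1 = T2 = 30 us on both qubits.
floor = coherence_limit_error(60e-9, [30e-6, 30e-6], [30e-6, 30e-6])
measured = 3.1e-3  # hypothetical measured CZ error

print(f"coherence-only estimate: {floor:.2e}")
print(f"measured error:          {measured:.2e}")
print("consistent" if floor <= measured
      else "floor above measurement: check the inputs")
```

The falsification test is the last line: a coherence-only number that lands above the measurement points to a problem in the model or in the calibration data.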

Three published gates, three different groups

Each one ships as a cited JSON file in the repo. Adding a fourth device means dropping in another JSON, not writing code.

Device	Measured CZ error	gradpulse floor	Ratio (floor / measured)
Sung 2021, tunable coupler (PRX 11, 021058)	2.4e-3	2.4e-3	0.99x
Marxer 2023, long coupler (PRX Quantum 4, 010314)	1.9e-3	1.9e-3	1.01x
Stehlik 2021, parametric, pair 11 (PRL 127, 080505)	4.9e-3	5.1e-3	1.05x

Run it across all 11 Stehlik pairs and the floor stays a lower bound every time (11/11, median 0.37x), only catching up to the measurement on pair 11 — the pair under the most decoherence pressure. The fast, high-coherence pairs sit down at 0.2-0.4x, because gradpulse correctly blames most of their error on control sources rather than decoherence. That matches what the authors themselves said: their short-gate error is loss of adiabaticity, not T1/T2.

A live 108-qubit chip

The bigger test was Rigetti's Cepheus-1-108Q. Using the measured interleaved-RB CZ fidelity and both qubits' T1/T2 for all 160 active pairs — and reading each pair's real gate duration out of the native calibration rather than assuming one — the floor comes in at or below the measured error on 150 of 160 pairs, median 0.66x, with nothing fitted and nothing cherry-picked.

A few things from the validation log are worth pulling out, because they're where the discipline shows:

Error bars, not "percent off." Every published CZ error has its own RB standard error, and on this chip that ranges from +/-12% to +/-42%. So saying a prediction is "1.9% off" is meaningless noise. The honest number is distance in sigma. On the 41 pairs where the floor saturates the measurement, 32 land within 1 sigma and all 41 within 2 sigma, with a median of 0.60 sigma. But that subset is defined by saturation, so it tells you how tight the bound is, not how accurate the model is on its own. The real claim stays the one-sided bound across all 160.
It flags bad calibration instead of swallowing it. One pair came back with a CZ error below its own coherence floor, which is impossible for simultaneous measurements. Turned out the fault was in the data: the device benchmarks T1/T2 and the CZ at different times, and it drifts about 2x over a few hours. Because gradpulse never lets a gate beat its own coherence limit, it surfaced the inconsistency rather than quietly "matching" it.
No fudge factor for the gap. The lower-bound behavior isn't the model failing. The Braket calibration only exposes idling-point T1/T2, not the gate-effective values the literature anchors used. A missing input, reported as a missing input.
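The sigma-distance bookkeeping from the first point above is simple enough to sketch. The function name, the 2-sigma flagging threshold, and the numbers are assumptions for illustration, not values from the validation log.

```python
def compare_to_measurement(floor, measured, sigma, n_sigma=2.0):
    """Return the floor/measured ratio, the signed distance in sigma,
    and whether the floor exceeds the measurement by more than n_sigma."""
    ratio = floor / measured
    distance = (floor - measured) / sigma  # positive: floor above measurement
    violates = distance > n_sigma
    return ratio, distance, violates

# Hypothetical pair: measured 5.0e-3 with a 20% RB standard error.
ratio, distance, violates = compare_to_measurement(5.2e-3, 5.0e-3, 1.0e-3)
print(f"ratio {ratio:.2f}x, distance {distance:+.2f} sigma, flagged: {violates}")

# A floor far above the measurement gets flagged as inconsistent data.
_, far_distance, flagged = compare_to_measurement(8.0e-3, 5.0e-3, 1.0e-3)
print(f"distance {far_distance:+.2f} sigma, flagged: {flagged}")
```

A "4% off" ratio here is only 0.2 sigma, which is why the percentage alone says little about agreement.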

Four architectures, one bar for validation

There are four gate models in the box, and each one carries its own independent QuTiP cross-check of its operators. Three of them also get a library-independent, NumPy-only Liouvillian referee, so "three solvers, no shared code" holds across architectures, not just for the headline gate.

Parametric-coupler CZ — tunable transmons with a flux-activated coupler that gets dispersively eliminated, plus AC-Stark pre-compensation and a differentiable transmission-line response. This is the flagship gate and the one Cepheus tests.
Explicit tunable coupler — a 27-D open-system model that keeps the coupler as a live transmon (the way it actually is on Rigetti and Google hardware), so coupler leakage gets modeled instead of assumed away.
Cross-resonance ZX — fixed-frequency transmons with always-on exchange, derived-quadrature DRAG, and echoed-CR sequences that refocus ZZ into a single-qubit term.
General N-qubit register — arbitrary coupling graphs, optimizing a target gate on a subset while holding identity on everything else, so crosstalk and frequency collisions land inside the optimization. You can drive disjoint subgraphs at once (parallel CZ), with sparse/Krylov propagation and MPS/TEBD trajectory unraveling kicking in once N is 6 or more.
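For the N-qubit case, "target gate on a subset, identity elsewhere" comes down to how the target unitary is built. A minimal NumPy sketch, assuming ideal two-level qubits and a qubit-0-is-most-significant bit ordering (the function name is hypothetical and this is not gradpulse's API):

```python
import numpy as np

def cz_with_spectators(n_qubits, pair):
    """Diagonal target: CZ on `pair`, identity on every other qubit."""
    a, b = pair
    diag = np.ones(2 ** n_qubits)
    for index in range(2 ** n_qubits):
        # Qubit 0 is the most significant bit of the basis-state index.
        bit_a = (index >> (n_qubits - 1 - a)) & 1
        bit_b = (index >> (n_qubits - 1 - b)) & 1
        if bit_a and bit_b:
            diag[index] = -1.0  # phase flip only when both targets are |1>
    return np.diag(diag)

# CZ between qubits 0 and 2 of a 3-qubit register; qubit 1 is a spectator.
target = cz_with_spectators(3, (0, 2))
print(np.diag(target))
```

Scoring the optimized evolution against this full-register target, rather than against a bare 4x4 CZ, is what makes unwanted action on spectators count as error.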

Pulling a gate's error apart

One fidelity number is where you start, not where you stop. There's a whole set of tools for dissecting and stressing a gate, and all of them run in CI:

Error budgets split the infidelity into a coherent control/leakage piece and a decoherence floor, with channel unitarity computed separately as a sanity check.
Crosstalk and collisions — near-resonant frequency collisions, lossy two-level-system defects that carry their own T1, and always-on ZZ between spectators.
Colored noise — analytic dephasing filter functions F(f), Monte-Carlo 1/f^alpha sweeps with cross-qubit correlation, and finite-temperature bath jumps.
Spectral optimization — build pulses directly in a band-limited Fourier/CRAB basis, so they're band-limited by construction: about 6x fewer parameters and no post-hoc smoothing penalty to babysit.
Leakage in the loop — sweep a cross-resonance gate's duration and split each optimized gate's error. Slow gates stay coherence-limited and the one-line coherence formula tracks reality. Fast gates flip over to leakage-limited, where coherent leakage buries the decoherence floor and that same formula under-predicts the true error by something like 60x. That fast regime is exactly where an open-system, leakage-aware loop pays for itself.
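The "band-limited by construction" idea from the spectral-optimization item can be shown in a few lines. This sketch uses a plain sine series rather than CRAB's randomized frequencies, and the sample count, harmonic count, and random coefficients are arbitrary assumptions.

```python
import numpy as np

n_samples, n_harmonics = 240, 20              # hypothetical sizes
tau = np.arange(n_samples) / n_samples        # time as a fraction of the gate

rng = np.random.default_rng(0)
coeffs = rng.normal(size=n_harmonics)         # the only optimization parameters

# Sine harmonics vanish at the start of the pulse and fit the window exactly.
harmonics = np.arange(1, n_harmonics + 1)
basis = np.sin(2 * np.pi * np.outer(harmonics, tau))  # (n_harmonics, n_samples)
pulse = coeffs @ basis

# Check the spectrum: nothing should appear above the highest harmonic.
spectrum = np.abs(np.fft.rfft(pulse))
out_of_band = spectrum[n_harmonics + 1:].max()

print(f"parameters: {n_harmonics} instead of {n_samples} samples")
print(f"largest out-of-band component: {out_of_band:.1e}")
```

Because the optimizer only ever touches the coefficients, no choice of parameters can push energy above the cutoff, so there is no smoothing penalty to tune.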

Getting the pulse off the laptop

gradpulse doesn't just score pulses, it designs and exports them:

OpenPulse 3.0 / OpenQASM 3 — vendor-neutral export with the DRAG quadrature baked into the I/Q arrays, re-checked offline against an independent parser. (qiskit.pulse got dropped in Qiskit 2.0, so this targets the standard that's actually alive.)
Amazon Braket — export to PulseSequence and ArbitraryWaveform, plus the native-calibration reader and the per-shot cost math used in the Cepheus study. You only need AWS credentials for the final submission.
Interleaved RB — generate native-transpiled, verbatim-boxed IRB circuits that either benchmark the device's native CZ (which validates the model) or a gradpulse-designed pulse (which tests the optimizer), with canary, cost, and online-status guards before anything submits.

And a gradpulse-designed pulse has already run on Cepheus at canary depth — accepted and executed inside the RB verbatim box.
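For readers new to interleaved RB, the gate error comes from two fitted decay parameters: one from the reference sequences and one from the sequences with the gate interleaved. The sketch below applies the standard interleaved-RB estimator. The function name and decay values are hypothetical, and it says nothing about how gradpulse fits the decays.

```python
def interleaved_rb_error(p_reference, p_interleaved, n_qubits=2):
    """Gate error estimate from reference and interleaved RB decay parameters."""
    dim = 2 ** n_qubits
    return (dim - 1) / dim * (1.0 - p_interleaved / p_reference)

# Hypothetical fitted decay parameters for a two-qubit benchmark.
error = interleaved_rb_error(p_reference=0.980, p_interleaved=0.975)
print(f"estimated CZ error: {error:.2e}")
```

The estimate depends on the ratio of two fitted decays, which is one reason the standard errors quoted earlier are as wide as they are.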

The section most READMEs hide: what it can't do

gradpulse is upfront about its limits, which is honestly the part I'd steal for my own projects:

It predicts, it doesn't calibrate. The fidelities are simulated, cross-checked, and matched against measured coherence budgets, but a full on-device fidelity measurement of a gradpulse-designed pulse hasn't been run yet. Expect an open-loop transfer to start out below a device's tuned native CZ until closed-loop calibration closes the gap.
The floor is a lower bound on error. It only equals the measured error when a gate is coherence-limited. The analysis suite models the missing terms (control, crosstalk, structured noise), but none of them is a fitted correction dragging the answer back to a measured number.
Exact optimization taps out around four qubits. Density-matrix GRAPE is a workstation tool up to ~4 qubits. The sparse and MPS paths push evaluation into bigger weakly-entangling registers, but not optimization. gradpulse tunes one entangling gate among its neighbors; it isn't a whole-circuit compiler.
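A back-of-envelope count shows why density-matrix methods stop scaling so quickly. The sketch assumes three levels per transmon and a dense complex128 superoperator. That is an illustrative worst case, not a statement about how gradpulse stores its propagators.

```python
def dense_superoperator_bytes(n_transmons, levels=3, bytes_per_entry=16):
    """Memory for one dense superoperator acting on vectorized density matrices."""
    hilbert_dim = levels ** n_transmons
    liouville_dim = hilbert_dim ** 2          # density matrix flattened to a vector
    return liouville_dim ** 2 * bytes_per_entry

for n in range(2, 6):
    gigabytes = dense_superoperator_bytes(n) / 1e9
    print(f"{n} transmons: {gigabytes:.3g} GB")
```

Under these assumptions each added transmon multiplies the storage by 81, so the step from four to five elements moves a single propagator from under a gigabyte to tens of gigabytes.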

Why I think the design matters

There's no shortage of gorgeous fidelities in the quantum-control literature that fall apart the moment they meet a real chip. gradpulse's two rules — optimize through the actual open system, and never print a number three independent solvers can't reproduce — are a decent blueprint for building simulation tools people are willing to trust.

If you work on superconducting gates, the quickest way to get a feel for it is to run the quickstart above, then open examples/decoherence_in_the_loop.py and watch the multiply-after shortcut fall over. After that, notebooks/03_hardware_validation.ipynb reproduces the 160-pair Cepheus scatter (it's free: it just reads the committed sweep).

pip install gradpulse
python -m gradpulse            # welcome banner + quickstart

Repo: https://github.com/PureStateLabs/gradpulse — MIT-licensed — built by Pure State Labs Inc.

Questions or collaboration: info@purestatelabs.com

DEV Community