What happens when you apply production-grade security, formal verification, and 3,300+ tests to a physics problem most people solve with MATLAB scripts.
The Audacity
A year ago, I started building SCPN-Control — a framework for controlling tokamak fusion reactors. By myself. As a solo developer.
If you know anything about fusion energy, you know this is absurd. Tokamak control is the domain of national labs, billion-dollar collaborations, and teams of fifty physicists. The DIII-D Plasma Control System has been in production for twenty years. RAPTOR at EPFL runs on actual hardware at TCV. OMFIT at General Atomics has a thousand users.
I have none of that. I have a laptop, a workstation (a former mining rig), two rather old servers, a GitHub account, and an unhealthy tolerance for partial differential equations.
But I also have something those projects don't: a neuro-symbolic controller architecture with formal verification, end-to-end differentiable physics, and a safety-case infrastructure that treats a research codebase like flight software.
This post isn't a sales pitch. It's a technical autopsy of what happens when you build production-grade scientific software without a production team — and why the result might matter even if it never touches a real tokamak.
What It Actually Is
SCPN-Control is a multi-layered framework. At the bottom: physics kernels. At the top: a compiler that turns formal specifications into stochastic computing circuits. In between: everything.
Layer 1: The Physics Kernel
The core/ module solves the Grad-Shafranov equation — the elliptic PDE that describes magnetohydrodynamic equilibrium in a tokamak. This isn't a toy implementation. It supports:
- Fixed and free-boundary solvers (SOR, multigrid, Anderson acceleration, Newton-Kantorovich)
- H-mode pedestal profiles via the modified hyperbolic tangent (mtanh)
- Coil optimization with shape, X-point, and divertor target constraints
- Toroidal 1/R stencil (not the naive Cartesian Laplacian that most educational codes use)
The solver is written in Python with a Rust acceleration backend, but it falls back to Python gracefully if the native library isn't available. This matters because fusion researchers run code on everything from laptops to HPC clusters.
# From src/scpn_control/core/fusion_kernel.py
# The GS* operator — note the sign-corrected toroidal term
a_E = 1.0 / dR2 - 1.0 / (2.0 * R_safe * self.dR) # East coefficient
a_W = 1.0 / dR2 + 1.0 / (2.0 * R_safe * self.dR) # West coefficient
a_N = 1.0 / dZ2 # North
a_S = 1.0 / dZ2 # South
a_C = -2.0 * (1.0 / dR2 + 1.0 / dZ2) # Center
The free-boundary solver is experimental and documented as such. The fixed-boundary solver is robust enough to generate equilibria for controller testing.
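To make the stencil concrete, here is a minimal sketch that applies the same five coefficients at a single grid point and compares the result with a flux function whose Grad-Shafranov operator is known in closed form. The helper `gs_operator_at` and the test function ψ = R⁴ + Z² are illustrative assumptions, not part of SCPN-Control.

```python
def gs_operator_at(psi, R, Z, dR, dZ):
    """Apply the 5-point toroidal stencil for
    d2psi/dR2 - (1/R) dpsi/dR + d2psi/dZ2 at one point.

    `psi` is any callable psi(R, Z). This helper is hypothetical and only
    mirrors the coefficient layout quoted above.
    """
    dR2, dZ2 = dR * dR, dZ * dZ
    a_E = 1.0 / dR2 - 1.0 / (2.0 * R * dR)   # East: psi(R + dR, Z)
    a_W = 1.0 / dR2 + 1.0 / (2.0 * R * dR)   # West: psi(R - dR, Z)
    a_N = 1.0 / dZ2                          # North: psi(R, Z + dZ)
    a_S = 1.0 / dZ2                          # South: psi(R, Z - dZ)
    a_C = -2.0 * (1.0 / dR2 + 1.0 / dZ2)     # Center
    return (a_E * psi(R + dR, Z) + a_W * psi(R - dR, Z)
            + a_N * psi(R, Z + dZ) + a_S * psi(R, Z - dZ)
            + a_C * psi(R, Z))


def psi_test(R, Z):
    # Analytic test function: the operator applied to R**4 + Z**2
    # gives 12 R**2 - 4 R**2 + 2 = 8 R**2 + 2.
    return R**4 + Z**2


R0, Z0, h = 1.7, 0.3, 1e-2
numeric = gs_operator_at(psi_test, R0, Z0, h, h)
exact = 8.0 * R0**2 + 2.0
error = abs(numeric - exact)          # second-order accurate, O(h**2)

# psi = R**2 lies in the null space of the operator (2 - 2 = 0).
null_space = gs_operator_at(lambda R, Z: R * R, R0, Z0, h, h)
```

If the sign of the 1/R term were flipped, the same test function would give 16R² + 2 instead of 8R² + 2, so a check like this catches the sign error immediately.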
Layer 2: The Controller Stack
The control/ module implements five NMPC (Nonlinear Model Predictive Control) backends:
- Internal — projected gradient descent (fallback, zero dependencies)
- SciPy — SLSQP for general nonlinear problems
- OSQP — first-order ADMM for sparse, real-time problems
- CasADi — IPOPT with exact Hessian for research
- acados — structure-exploiting SQP with HPIPM, the solver used in autonomous vehicles and robotics
The acados integration is the one I care about. It uses an augmented-state formulation for slew-rate constraints, validates dynamics residuals post-solve, and fails closed if the solver status isn't zero.
# From src/scpn_control/control/nmpc_controller.py
# Augmented state: [x_k, u_{k-1}] for slew-rate constraints
model.disc_dyn_expr = ca.vertcat(x_next, u) # u becomes next u_last
model.con_h_expr = u - u_last # slew-rate constraint
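The augmented-state idea can be illustrated without any solver. The sketch below is a plain-Python toy, not the acados formulation: `augmented_step`, `plant`, and `du_max` are hypothetical names, and the plant is an arbitrary first-order lag chosen only to have something to step.

```python
def augmented_step(f, z, u):
    """One step of the augmented system z = (x, u_last).

    `f` is a placeholder for the discrete plant model x_next = f(x, u).
    Returns the next augmented state and the slew quantity u - u_last,
    which a solver would bound as a path constraint.
    """
    x, u_last = z
    slew = u - u_last
    return (f(x, u), u), slew   # the applied u becomes the next u_last


def plant(x, u):
    # Toy first-order lag, purely illustrative.
    return 0.9 * x + 0.1 * u


du_max = 0.5                    # assumed slew-rate bound
z = (0.0, 0.0)                  # (x_0, u_{-1})
inputs = [0.4, 0.8, 1.0, 1.8]   # the last move jumps by 0.8
violations = []
for k, u in enumerate(inputs):
    z, slew = augmented_step(plant, z, u)
    if abs(slew) > du_max:
        violations.append(k)
```

Carrying `u_last` inside the state is what lets a rate limit be written as an ordinary per-stage constraint instead of a constraint that couples two stages.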
But the part that's actually unique is the transport gradient tuning. Using JAX, the controller can backpropagate through the gyrokinetic transport model to optimize source schedules. This means the MPC doesn't just optimize control actions — it optimizes the physics model it's using.
# JAX autodiff through transport — unique in fusion control
def tune_transport_coefficients_for_tracking(self, ...):
    # Gradient of tracking error w.r.t. transport coefficients
    # Finite-difference audit enforced by default
No other fusion control framework has end-to-end differentiable transport. This is either brilliant or insane, and I'm not sure which yet.
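The "finite-difference audit" mentioned in the snippet is worth spelling out. A minimal sketch, assuming a toy 0-D model in which steady-state temperature is source divided by diffusivity: the hand-derived gradient below stands in for what `jax.grad` would return, and every function name here is hypothetical.

```python
def tracking_error(chi, sources, targets):
    # Toy 0-D model: steady-state temperature T = S / chi.
    return sum((s / chi - t) ** 2 for s, t in zip(sources, targets))


def tracking_error_grad(chi, sources, targets):
    # Hand-derived dJ/dchi; stands in for an autodiff gradient.
    return sum(2.0 * (s / chi - t) * (-s / chi**2)
               for s, t in zip(sources, targets))


def finite_difference_audit(f, grad, chi, args, eps=1e-6, rtol=1e-5):
    """Compare an analytic gradient against a central difference.

    Returns (passed, relative_error).
    """
    fd = (f(chi + eps, *args) - f(chi - eps, *args)) / (2.0 * eps)
    an = grad(chi, *args)
    rel_err = abs(fd - an) / max(abs(an), 1e-12)
    return rel_err <= rtol, rel_err


sources = [1.0, 2.0, 3.0]
targets = [0.4, 1.1, 1.4]
ok, rel_err = finite_difference_audit(
    tracking_error, tracking_error_grad, 2.0, (sources, targets))

# A gradient with the wrong sign must fail the audit.
bad_ok, _ = finite_difference_audit(
    tracking_error,
    lambda c, s, t: -tracking_error_grad(c, s, t),
    2.0, (sources, targets))
```

The point of the audit is that a gradient-based tuner only updates the model when an independent numerical estimate agrees with the differentiated one.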
Layer 3: The Neuro-Symbolic Compiler
This is the part that makes SCPN-Control different from every other plasma control project.
The scpn/ module implements a Stochastic Petri Net compiler. You write a control policy as a Petri net — places, transitions, inhibitor arcs, timing delays — and the compiler turns it into a stochastic computing circuit that runs on deterministic bitstreams.
Why stochastic computing? Because it's inherently fault-tolerant. A bit-flip in a stochastic bitstream changes the probability by 1/N, not by orders of magnitude. For a fusion reactor where a controller failure means a $2 billion machine eats itself, this matters.
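The 1/N claim is easy to check numerically. A minimal sketch, assuming a unipolar encoding where the value is the fraction of ones in the stream, compared against a single bit-flip in an IEEE-754 double. The helper names are illustrative.

```python
import struct


def stochastic_value(bits):
    # Unipolar stochastic encoding: value = fraction of ones in the stream.
    return sum(bits) / len(bits)


def flip_float_bit(x, bit):
    # Flip one bit of the IEEE-754 binary64 representation of x.
    (raw,) = struct.unpack("<Q", struct.pack("<d", x))
    (out,) = struct.unpack("<d", struct.pack("<Q", raw ^ (1 << bit)))
    return out


N = 1024
stream = [1] * 256 + [0] * 768            # encodes p = 0.25
flipped = list(stream)
flipped[0] ^= 1                           # single bit-flip in the stream
sc_error = abs(stochastic_value(flipped) - stochastic_value(stream))

fp_value = 0.25
fp_flipped = flip_float_bit(fp_value, 62)  # most significant exponent bit
fp_error = abs(fp_flipped - fp_value)
```

Here `sc_error` is exactly 1/1024, while the same single-bit upset in the top exponent bit turns 0.25 into 2¹⁰²². The worst-case damage in the stochastic encoding is bounded by the stream length; in positional binary it depends on which bit gets hit.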
The compiler produces two paths:
- Oracle path: standard floating-point arithmetic for debugging
- Stochastic path: bitstream-based computation with antithetic variates
# From src/scpn_control/scpn/controller.py
# Antithetic variates for variance reduction
base = rng.random((n_pairs, self._nT))
low_hits = np.sum(base < p_fire[None, :], axis=0)
high_hits = np.sum(base > (1.0 - p_fire)[None, :], axis=0)
counts = low_hits + high_hits
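To show why pairing each draw with its mirror helps, here is a small standalone experiment using only the standard library. It is a sketch of the technique under the assumption of a single firing probability `p`, not a copy of the controller code; all function names are hypothetical.

```python
import random


def antithetic_estimate(p, n_pairs, seed):
    """Estimate p from n_pairs uniform draws u and their mirrors 1 - u."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_pairs):
        u = rng.random()
        hits += (u < p) + (u > 1.0 - p)   # u > 1 - p  means  (1 - u) < p
    return hits / (2 * n_pairs)


def plain_estimate(p, n_samples, seed):
    """Estimate p from n_samples independent uniform draws."""
    rng = random.Random(seed)
    return sum(rng.random() < p for _ in range(n_samples)) / n_samples


def mean(values):
    return sum(values) / len(values)


def variance(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


p = 0.3
# Same bitstream length (1000 bits) for both estimators, 400 replicates.
anti = [antithetic_estimate(p, 500, seed) for seed in range(400)]
plain = [plain_estimate(p, 1000, seed) for seed in range(400)]
```

For p ≤ 0.5 the two events `u < p` and `u > 1 - p` can never both occur for the same draw, which is the negative correlation that lowers the variance of the count. Because the generator is seeded, repeating a call with the same seed returns the same estimate.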
The stochastic path is deterministic (seeded RNG), reproducible, and formally verifiable. Which brings me to the part I'm actually proud of.
The Security Story
Scientific software is usually terrible at security. Researchers write code that reads arbitrary files, executes shell commands, and exposes internal state over the network because "it's just for internal use."
I treated this like flight software from day one.
WebSocket Hardening
The phase/ module exposes tokamak state over WebSockets for real-time monitoring. The original implementation was an unauthenticated pipe. The current version has:
- Bearer token + API key authentication
- Token-bucket rate limiting (20 commands/sec per client)
- TLS enforcement with loopback-only default
- Browser origin allowlisting
- Command allowlisting (set_psi, set_pac_gamma, reset, stop)
- 64KB payload caps
- Timeout-based backpressure with explicit disconnect counters
# From src/scpn_control/phase/ws_phase_stream.py
def _bucket_rate_limited(self, buckets, key, now):
    capacity = float(self.command_rate_limit)
    refill_rate = capacity / self.command_rate_window_s
    updated_at, tokens = buckets.get(key, (now, capacity))
    elapsed = max(0.0, now - updated_at)
    tokens = min(capacity, tokens + elapsed * refill_rate)
    limited = tokens < 1.0
    if not limited:
        tokens -= 1.0
    buckets[key] = (now, tokens)
    return limited
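To see the bucket behave, the same arithmetic can be wrapped in a standalone class and driven with explicit timestamps. The class name, constructor, and client keys below are hypothetical. Only the refill-and-spend logic follows the snippet.

```python
class TokenBucketLimiter:
    """Standalone wrapper around the token-bucket arithmetic quoted above."""

    def __init__(self, rate_limit, window_s):
        self.command_rate_limit = rate_limit
        self.command_rate_window_s = window_s
        self.buckets = {}

    def rate_limited(self, key, now):
        capacity = float(self.command_rate_limit)
        refill_rate = capacity / self.command_rate_window_s
        # A new client starts with a full bucket.
        updated_at, tokens = self.buckets.get(key, (now, capacity))
        elapsed = max(0.0, now - updated_at)   # never refill on a backwards clock
        tokens = min(capacity, tokens + elapsed * refill_rate)
        limited = tokens < 1.0
        if not limited:
            tokens -= 1.0                      # spend one token per command
        self.buckets[key] = (now, tokens)
        return limited


limiter = TokenBucketLimiter(rate_limit=20, window_s=1.0)
burst = [limiter.rate_limited("client-a", now=0.0) for _ in range(21)]
after_refill = limiter.rate_limited("client-a", now=0.1)
other_client = limiter.rate_limited("client-b", now=0.0)
```

A burst of 20 commands passes, the 21st is rejected, and 0.1 s later two tokens have been refilled. In a real server `now` would typically come from `time.monotonic()` so that wall-clock adjustments cannot hand out free tokens.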
C++ Compilation Sandbox
The Rust/C++ solver is compiled on-demand if the prebuilt binary isn't available. The compilation path now has:
- SHA-256 source verification against a manifest
- hmac.compare_digest for timing-safe comparison
- Stack canaries (-fstack-protector-strong)
- Full RELRO binding (-Wl,-z,relro -Wl,-z,now)
- -mtune=generic instead of -march=native (CPU feature leak eliminated)
- 120-second compilation timeout
- Minimal environment (only PATH, TMPDIR, SystemRoot preserved)
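The checklist above could be wired together roughly as follows. This is a sketch under stated assumptions, not the project's build code: the function names, the `g++` default, and the `-O2 -shared -fPIC` options are illustrative, and nothing is actually compiled here.

```python
import hashlib
import hmac
import os


def source_matches_manifest(source_bytes, expected_sha256_hex):
    """Timing-safe check of a source blob against a manifest digest."""
    actual = hashlib.sha256(source_bytes).hexdigest()
    return hmac.compare_digest(actual, expected_sha256_hex.lower())


def minimal_env(parent_env=None):
    """Keep only the variables the compiler needs to start."""
    parent_env = os.environ if parent_env is None else parent_env
    keep = ("PATH", "TMPDIR", "SystemRoot")
    return {k: parent_env[k] for k in keep if k in parent_env}


def hardened_compile_command(source_path, output_path, compiler="g++"):
    # Hardening flags mirror the list above; the compiler name and the
    # -O2/-shared/-fPIC options are assumptions for this sketch.
    return [
        compiler, "-O2", "-shared", "-fPIC",
        "-mtune=generic",
        "-fstack-protector-strong",
        "-Wl,-z,relro", "-Wl,-z,now",
        "-o", output_path, source_path,
    ]


COMPILE_TIMEOUT_S = 120  # intended for subprocess.run(..., timeout=...)

blob = b"int main() { return 0; }\n"
manifest_digest = hashlib.sha256(blob).hexdigest()
ok = source_matches_manifest(blob, manifest_digest)
tampered = source_matches_manifest(blob + b"// extra", manifest_digest)
env = minimal_env({"PATH": "/usr/bin", "HOME": "/root", "LD_PRELOAD": "x.so"})
cmd = hardened_compile_command("solver.cpp", "solver.so")
```

The command list would then be handed to `subprocess.run(cmd, env=minimal_env(), timeout=COMPILE_TIMEOUT_S, check=True)`, but only after the digest check has passed.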
Fault Injection Gating
The stochastic controller has a bit-flip fault injection mode for testing. It requires two independent gates to enable:
# Double-gated — nuclear safety standard
if self._sc_bitflip_rate > 0.0 and not self._allow_fault_injection:
    raise ValueError("sc_bitflip_rate > 0 requires allow_fault_injection=True.")
if self._sc_bitflip_rate > 0.0 and os.environ.get("SCPN_ALLOW_CONTROLLER_FAULT_INJECTION") != "1":
    raise ValueError("sc_bitflip_rate > 0 requires SCPN_ALLOW_CONTROLLER_FAULT_INJECTION=1.")
This is how SCRAM systems work. You don't accidentally enable safety overrides.
Path Traversal Elimination
JSONL logging is constrained to a verified root directory with symlink protection:
def _resolve_jsonl_log_path(log_path, log_root):
    resolved = candidate.resolve(strict=False)
    try:
        resolved.relative_to(root)
    except ValueError as exc:
        raise ValueError("log_path must resolve under log_root.") from exc
    if resolved.suffix != ".jsonl":
        raise ValueError("log_path must use a .jsonl suffix.")
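The excerpt does not show how `candidate` and `root` are built, so the following self-contained version fills that in with one plausible reading (relative paths are joined onto the resolved root). Treat those two lines, and the helper `is_rejected`, as assumptions made for the sake of a runnable example.

```python
from pathlib import Path
import tempfile


def resolve_jsonl_log_path(log_path, log_root):
    """Resolve log_path under log_root, rejecting escapes and bad suffixes."""
    # Assumption: how root and candidate are derived is not in the excerpt.
    root = Path(log_root).resolve(strict=False)
    candidate = Path(log_path)
    if not candidate.is_absolute():
        candidate = root / candidate
    # resolve() collapses ".." segments and follows existing symlinks.
    resolved = candidate.resolve(strict=False)
    try:
        resolved.relative_to(root)
    except ValueError as exc:
        raise ValueError("log_path must resolve under log_root.") from exc
    if resolved.suffix != ".jsonl":
        raise ValueError("log_path must use a .jsonl suffix.")
    return resolved


def is_rejected(log_path, log_root):
    try:
        resolve_jsonl_log_path(log_path, log_root)
    except ValueError:
        return True
    return False


with tempfile.TemporaryDirectory() as tmp:
    accepted = resolve_jsonl_log_path("runs/shot_001.jsonl", tmp)
    accepted_inside = Path(tmp).resolve() in accepted.parents
    traversal_rejected = is_rejected("../outside.jsonl", tmp)
    suffix_rejected = is_rejected("runs/shot_001.txt", tmp)
```

The important detail is the order: resolve first, then compare against the root. Checking the raw string for `..` before resolving would miss a symlink that points outside the log directory.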
Is this overkill for a research project? Yes. But the discipline carries over. When you write every path resolution as if an attacker controls the input, you stop writing bugs even when no attacker exists.
The Formal Verification Layer
This is where I think SCPN-Control is genuinely ahead of the curve — not just for fusion, but for control systems in general.
The scpn/ module includes a Z3 bounded model checking integration for compiled Petri net controllers. It proves:
- Place invariants: Markings never exceed bounds
- Temporal response: If condition A fires, condition B responds within N steps
- Recurrence: Certain states are always revisited
- Exclusivity: Conflicting actions never fire simultaneously
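The flavour of a bounded check can be shown on a four-transition net without an SMT solver. Note the swap: the sketch below uses explicit-state breadth-first enumeration in plain Python rather than Z3, and the net, place names, and function names are all invented for illustration. It checks a place invariant and an exclusivity property up to a fixed depth and returns a counterexample path when one exists.

```python
def enabled(marking, transition):
    return all(marking[p] >= n for p, n in transition["take"].items())


def fire(marking, transition):
    nxt = dict(marking)
    for p, n in transition["take"].items():
        nxt[p] -= n
    for p, n in transition["give"].items():
        nxt[p] += n
    return nxt


def bounded_check(initial, transitions, prop, depth):
    """Search all markings reachable within `depth` firings.

    Returns None if `prop` holds on every marking visited, otherwise the
    list of transition names leading to the first violation found.
    """
    frontier = [(initial, [])]
    seen = {tuple(sorted(initial.items()))}
    for _ in range(depth + 1):
        next_frontier = []
        for marking, path in frontier:
            if not prop(marking):
                return path
            for name, t in transitions.items():
                if enabled(marking, t):
                    m2 = fire(marking, t)
                    key = tuple(sorted(m2.items()))
                    if key not in seen:
                        seen.add(key)
                        next_frontier.append((m2, path + [name]))
        frontier = next_frontier
    return None


safe_net = {
    "start_heat": {"take": {"ready": 1}, "give": {"heat_on": 1}},
    "stop_heat":  {"take": {"heat_on": 1}, "give": {"ready": 1}},
    "start_gas":  {"take": {"ready": 1}, "give": {"gas_on": 1}},
    "stop_gas":   {"take": {"gas_on": 1}, "give": {"ready": 1}},
}
# Faulty variant: start_gas no longer consumes the shared "ready" token.
faulty_net = dict(safe_net, start_gas={"take": {}, "give": {"gas_on": 1}})
m0 = {"ready": 1, "heat_on": 0, "gas_on": 0}


def exclusive(m):
    return not (m["heat_on"] > 0 and m["gas_on"] > 0)


def bounded(m):
    return all(v <= 1 for v in m.values())


proof_ok = bounded_check(m0, safe_net, lambda m: exclusive(m) and bounded(m), 10)
counterexample = bounded_check(m0, faulty_net, exclusive, 10)
```

An SMT encoding answers the same question symbolically and scales to nets where enumeration is hopeless, but the contract is the same: either no violation exists within the bound, or a concrete firing sequence that breaks the property is returned.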
Each proof produces a manifest with SHA-256 digests, schema versioning, and mandatory counterexample paths for failed proofs. The safety-case infrastructure requires:
- Formal controller proof (Z3/SMT)
- Audited differentiable-transport evidence (JAX + finite-difference validation)
- Digital-twin update evidence (TRANSP/TSC backed)
- All bound to a canonical controller artifact digest
- Explicit readiness gate — blocked until all evidence is present
# Safety-case admission — fail-closed
if not manifest.has_all_required_evidence():
    raise SafetyCaseNotReadyError("Controller artifact lacks required evidence.")
This is documentation discipline in the style of a 10 CFR 50 nuclear safety case. For a solo project. Written by one person.
I don't know if this makes me disciplined or delusional. But I know of no other fusion control project that has formal verification of controller logic.
The Honesty Problem
Here's where I stop talking about what's good and tell you what's broken.
49 out of 50 physics fidelity gaps are still open.
The ROADMAP explicitly states:
- "Local-dispersion path overpredicts the GENE CBC reference"
- "Latest 2000-step adiabatic run did not reach saturated chi_i"
- "Do not publish a saturated CBC chi_i value until the longer campaign passes"
- "Must not be presented as quantitative cross-code agreement"
The native TGLF-equivalent transport model exists. The nonlinear gyrokinetic solver exists. But they haven't been quantitatively compared against real TGLF or GENE runs. The infrastructure is there — interfaces to GACODE, GENE, GS2, CGYRO, QuaLiKiz — but the evidence isn't.
There is no real hardware timing evidence yet.
The E2E latency benchmark infrastructure is tamper-evident and schema-versioned, but all measurements are synthetic. I haven't run the control loop on a Raspberry Pi 4 or Jetson Nano and published p50/p95/p99 distributions. The <1ms real-time claim is theoretical.
The H-mode Newton Jacobian is incomplete.
The fixed-boundary Newton solver uses a Jacobian derived for L-mode linear profiles. For H-mode mtanh profiles, the Jacobian is wrong, which means convergence is unreliable for the most physically relevant regime.
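The missing ingredient is the derivative of the pedestal profile with respect to its argument. A minimal sketch, assuming one common form of the modified tanh, mtanh(x) = ((1 + αx)eˣ − e⁻ˣ) / (eˣ + e⁻ˣ); the parameterisation actually used in the project may differ, and the function names are illustrative. The analytic derivative is checked against a central difference.

```python
import math


def mtanh(x, alpha):
    # Assumed form of the modified tanh; reduces to tanh(x) when alpha = 0.
    return ((1.0 + alpha * x) * math.exp(x) - math.exp(-x)) / (
        math.exp(x) + math.exp(-x))


def mtanh_prime(x, alpha):
    # Quotient rule on N / D with
    #   N = (1 + alpha*x) e^x - e^-x,   D = e^x + e^-x.
    ex, emx = math.exp(x), math.exp(-x)
    N = (1.0 + alpha * x) * ex - emx
    D = ex + emx
    dN = (1.0 + alpha + alpha * x) * ex + emx
    dD = ex - emx
    return (dN * D - N * dD) / (D * D)


def central_difference(f, x, h=1e-6):
    return (f(x + h) - f(x - h)) / (2.0 * h)


alpha = 0.1
max_err = max(
    abs(mtanh_prime(x, alpha)
        - central_difference(lambda s: mtanh(s, alpha), x))
    for x in (-2.0, -0.5, 0.0, 0.7, 3.0)
)

# Sanity check: with alpha = 0 the derivative must equal 1 - tanh(x)**2.
tanh_limit_err = abs(mtanh_prime(0.3, 0.0) - (1.0 - math.tanh(0.3) ** 2))
```

Because the Newton Jacobian needs the derivative of the source term with respect to ψ, this profile derivative enters through the chain rule wherever the L-mode version currently assumes a constant slope.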
I document all of this. The ROADMAP is 50+ entries of "not done yet." But documentation doesn't close gaps. Only work does.
What I Learned About Building Software Alone
If you're a solo developer building something ambitious, here's what actually matters:
1. Test Coverage Is a Forcing Function
I have 3,300+ tests and 99%+ coverage. Not because I'm a testing zealot, but because without a team to catch my mistakes, the tests are the team. The CI runs 25 jobs across Linux, Windows, and macOS. Every PR gate fails if coverage drops by 0.1%.
The ratchet effect is real: once you have 99% coverage, you can't justify a lazy commit that drops it to 98%. The number forces discipline.
2. Security Hardening Is Just Input Validation at Scale
Every security fix I implemented was fundamentally about validating assumptions:
- Is this file path where I think it is?
- Is this config value finite and positive?
- Is this library the one I compiled?
When you write every function as if the caller is malicious, you write better code even for internal APIs. The WebSocket hardening made the streaming layer more robust against network partitions. The C++ compilation sandbox caught a -march=native portability bug. Security and correctness are the same thing viewed from different angles.
3. Type Hints Are Documentation That Doesn't Lie
The entire codebase uses from __future__ import annotations and strict type hints. Pydantic v2 models validate configs at the boundary. This isn't just for IDE autocomplete — it's for catching physics bugs.
When plasma_current_target is typed as float and validated as > 0, you can't accidentally pass a negative current or a string. In scientific computing where a sign error means the plasma goes the wrong way, this matters.
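The same boundary check can be sketched with the standard library alone. The dataclass below is a stand-in for a Pydantic v2 model, not the project's config class: `ControllerConfig`, `horizon_steps`, and `rejection_reason` are hypothetical names, and only `plasma_current_target` comes from the text above.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class ControllerConfig:
    """Stdlib stand-in for a validated config model."""
    plasma_current_target: float   # must be finite and > 0
    horizon_steps: int = 20        # hypothetical second field

    def __post_init__(self):
        value = self.plasma_current_target
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError("plasma_current_target must be a real number.")
        if not math.isfinite(value) or value <= 0.0:
            raise ValueError("plasma_current_target must be finite and > 0.")
        if self.horizon_steps < 1:
            raise ValueError("horizon_steps must be >= 1.")


def rejection_reason(**kwargs):
    """Return the exception class name if the config is rejected, else None."""
    try:
        ControllerConfig(**kwargs)
    except (TypeError, ValueError) as exc:
        return type(exc).__name__
    return None


valid = rejection_reason(plasma_current_target=15.0e6)
negative = rejection_reason(plasma_current_target=-15.0e6)
as_string = rejection_reason(plasma_current_target="15e6")
not_finite = rejection_reason(plasma_current_target=float("nan"))
```

Note the finiteness check: NaN compares false against every bound, so a plain `value <= 0` test would let it through.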
4. Honest Documentation Builds Trust Faster Than Hype
The ROADMAP says "49 open fidelity gaps." The README says "requires external resources for experimental validation." The code says raise ValueError("degenerate equilibrium") instead of silently returning NaN.
This is terrible marketing. It's excellent engineering. The fusion community is skeptical of unvalidated claims — with good reason. I'd rather have ten people trust the code because the limitations are explicit than a thousand people distrust it because I oversold.
5. The "Kitchen Sink" Problem Is Real (for now)
The project has 57+ modules. Equilibrium, transport, MHD, edge physics, neural nets, RL, stochastic computing, FPGA export, real-time control, federated learning, formal verification.
Each of these is a career. Together, they're a maintenance burden. The risk isn't that any single module is wrong — it's that the integration surface becomes too large to validate.
My current strategy: extract a scpn-core package with just the GS solver + basic controllers, and keep scpn-control as the full framework. The Unix philosophy applies even to tokamaks.
Where This Goes Next
The immediate priorities are unglamorous:
- Hardware timing evidence. Run the E2E benchmark on a Raspberry Pi 4. Measure p50/p95/p99. Publish the results.
- Close 5-10 physics fidelity gaps. Install GACODE. Run native TGLF-equivalent vs real TGLF. Document agreement.
- Fix the H-mode Jacobian. Implement mtanh derivative for the Newton solver.
The medium-term goals are where the project gets interesting:
- End-to-end differentiable scenario: Couple JAX GS solver → differentiable transport → NMPC. Gradient-through-equilibrium is genuinely unique — no one has this.
- Certified neuro-symbolic control: Expand Z3 proofs to CTL/LTL specifications. Auto-generate safety certificates. This could be the basis for ITER safety documentation.
- Cross-facility federated learning: Extend the FedAvg/FedProx disruption predictor with differential privacy guarantees. Multi-site learning without data sharing.
Why You Should Care (Even If You Don't Care About Fusion)
If you're a software engineer, SCPN-Control is a case study in what happens when you apply production discipline to a research problem. Most scientific code is written to produce a paper. This is written to produce a system.
If you're a physicist, the neuro-symbolic controller architecture is a genuinely new approach to safety-critical control. Stochastic computing + formal verification + differentiable physics is a combination that doesn't exist anywhere else.
If you're a solo developer wondering if you can build something that competes with teams of fifty people: you can. But you have to be more disciplined than they are. You don't have a colleague to catch your sign errors. You have tests, types, and the humility to document what you don't know.
The Code
SCPN-Control is open source under AGPL-3.0-or-later with commercial licensing available.
- GitHub: github.com/anulum/scpn-control
- Docs: README + ROADMAP (the ROADMAP is the most honest document I've ever written)
- Tests: pytest with 3,300+ cases, 99%+ coverage
- Install: pip install scpn-control (with optional dependencies for acados, JAX, etc.)
I'm not asking for stars (though I don't mind them). I'm asking for scrutiny. If you know plasma physics, tear apart the transport solver. If you know control theory, break the NMPC. If you know security, find the holes I missed.
The project is only useful if it's correct. And it's only correct if people prove it wrong.
I appreciate shares, likes, contributions, and sponsorships or donations to keep me going. We are open to collaboration.
Miroslav Šotek builds software for problems that are supposed to require teams. He is usually wrong about how hard things are, but occasionally right about how to build them.