I want to tell you about a problem that the BCI industry doesn't talk about openly.
The hardware is impressive. ECoG arrays with 128 electrodes. Utah arrays with 96 channels, and thread-based implants pushing past 1,024. Stentrode endovascular devices that don't even require open-brain surgery. The signal acquisition hardware has improved dramatically over the last decade.
The software is running on Windows.
Neuralink's first clinical system used a Python-based processing pipeline on a laptop. BrainGate2 — arguably the most successful invasive BCI programme in history — runs its signal processing on a PC with a standard OS scheduler. BCI2000, the dominant open-source BCI framework, is 25 years old and doesn't have hard real-time guarantees anywhere in its architecture.
For a system that's physically connected to a human nervous system, this is a problem I couldn't ignore.
The latency constraint that nobody enforces
There's a physiological threshold that matters enormously for BCI usability: 14 milliseconds. That's approximately the delay below which a human perceives a feedback event as caused by their own action. Above it, the loop feels broken — you take an action, wait, then see the result. The brain cannot learn to control an interface it experiences as laggy.
Most clinical BCI systems have end-to-end latencies in the 50–200ms range. Not because of the hardware — the signal acquisition is fast. Because of the software stack.
When you run signal processing in Python, on a non-RT Linux kernel, with a general-purpose OS scheduler, you have no guarantees about when your code runs. You have average latency, not worst-case latency. For a system that needs to provide consistent feedback to retrain someone's motor cortex, average isn't good enough.
I wanted worst-case latency. I wanted to measure it with a logic analyser, not infer it from distributions. So I built AxonOS.
What AxonOS actually is
AxonOS is a bare-metal, #![no_std] Rust kernel designed specifically for BCI devices. It runs on STM32F407 + Cortex-A53 hardware (though the architecture is hardware-agnostic). The full pipeline — from ADC sample to decoded intent output — has a measured WCET of 1.03 milliseconds.
Here's the pipeline breakdown:
| Stage | WCET |
|---|---|
| ADC DMA transfer (ADS1299, 27 bytes @ 8 MHz SPI) | 27.2 µs |
| DMA ISR overhead | 8.4 µs |
| Kalman filter (8 channels, CCM SRAM) | 94.1 µs |
| FIR bandpass 8–30 Hz (8-tap Hann) | 78.8 µs |
| Artifact rejection | 49.4 µs |
| CSP spatial filter (Jacobi, warm-start) | 318.3 µs |
| LDA classification | 44.6 µs |
| IPC write (wait-free SPSC) | 0.11 µs |
| **M4F total WCET** | **621.0 µs** |
| Riemannian MDM gate (A53) | 192.3 µs |
| Privacy pipeline (ZKP attestation) | 214.7 µs |
| **End-to-end WCET** | **1028.3 µs** |
Hard deadline: 3.25ms. Utilisation: 31.6%. Zero deadline misses across 10,000 consecutive epochs.
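The budget arithmetic is easy to check. A quick host-side sketch (not kernel code) summing the measured per-stage WCETs from the table — note the published totals differ from the raw sum at the rounding level, since the per-stage figures are themselves rounded:

```rust
fn main() {
    // Measured per-stage WCETs in microseconds, from the pipeline table above.
    let m4f_stages = [27.2, 8.4, 94.1, 78.8, 49.4, 318.3, 44.6, 0.11];
    let a53_stages = [192.3, 214.7];

    let m4f_total: f64 = m4f_stages.iter().sum();
    let end_to_end: f64 = m4f_total + a53_stages.iter().sum::<f64>();

    // Hard deadline for the full epoch.
    let deadline_us = 3250.0;
    let utilisation = 100.0 * end_to_end / deadline_us;

    println!("M4F total:   {:.1} µs", m4f_total);   // ≈ 620.9 µs
    println!("End-to-end:  {:.1} µs", end_to_end);  // ≈ 1028.0 µs
    println!("Utilisation: {:.1}%", utilisation);   // ≈ 31.6%
}
```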
Why Rust, and why no_std
The choice of Rust wasn't about language preference. It was about two specific properties that matter for a system physically connected to human neural tissue.
Memory safety without a runtime. A use-after-free bug in a BCI kernel isn't a segfault — it's undefined behaviour in a system that's sending electrical signals to someone's brain. Rust's borrow checker eliminates this class of bugs at compile time. No GC, no runtime, no overhead. #![no_std] means none of Rust's standard library either — no heap allocation, no panics, no OS abstractions. Everything that runs in the kernel is explicit and bounded.
Zero-cost abstractions that actually compile to good assembly. The DSP pipeline — Kalman filter, CSP projection, LDA classification — is written as idiomatic iterator-based Rust. The compiler produces SIMD-vectorised assembly on the Cortex-M4F without any hand-written intrinsics. Here's what the Kalman update step looks like:
```rust
// axonos-dsp/src/kalman.rs
#![no_std]

/// Single-channel Kalman filter for EEG signal tracking.
/// All state in CCM SRAM (zero wait-state on STM32F407).
#[derive(Debug, Clone)]
pub struct KalmanFilter {
    /// State estimate
    x: f32,
    /// Error covariance
    p: f32,
    /// Process noise variance (Q)
    q: f32,
    /// Measurement noise variance (R)
    r: f32,
}

impl KalmanFilter {
    pub const fn new(q: f32, r: f32) -> Self {
        Self { x: 0.0, p: 1.0, q, r }
    }

    /// Update step. Called at 250 Hz per channel.
    /// Inlined — compiler produces ~8 FP instructions, no function call overhead.
    #[inline(always)]
    pub fn update(&mut self, measurement: f32) -> f32 {
        // Predict
        self.p += self.q;
        // Update
        let k = self.p / (self.p + self.r); // Kalman gain
        self.x += k * (measurement - self.x);
        self.p *= 1.0 - k;
        self.x
    }
}
```
The `#[inline(always)]` attribute and the absence of heap allocation mean the compiler can see the entire call chain and optimise across pipeline stage boundaries. The 8-channel Kalman stage runs in 94 µs worst-case; the same computation in Python with NumPy takes ~8 ms.
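To make the behaviour concrete, here's a host-side usage sketch. The filter is reproduced from above so the snippet compiles standalone, and the noise constants are illustrative, not the tuned kernel values. Fed a constant input, the estimate converges quickly:

```rust
// Same single-channel filter as above, reproduced so this snippet stands alone.
pub struct KalmanFilter { x: f32, p: f32, q: f32, r: f32 }

impl KalmanFilter {
    pub const fn new(q: f32, r: f32) -> Self {
        Self { x: 0.0, p: 1.0, q, r }
    }
    pub fn update(&mut self, measurement: f32) -> f32 {
        self.p += self.q;
        let k = self.p / (self.p + self.r);
        self.x += k * (measurement - self.x);
        self.p *= 1.0 - k;
        self.x
    }
}

fn main() {
    // Small process noise, larger measurement noise: the filter smooths.
    let mut kf = KalmanFilter::new(0.001, 0.1);

    // Feed a constant 10 µV signal for one second at 250 Hz;
    // the estimate converges toward it.
    let mut est = 0.0;
    for _ in 0..250 {
        est = kf.update(10.0);
    }
    assert!((est - 10.0).abs() < 0.01);
    println!("estimate after 1 s at 250 Hz: {est:.3} µV");
}
```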
The scheduler: EDF, not fixed-priority preemption
Most RTOS designs use fixed-priority preemptive scheduling. AxonOS uses Earliest Deadline First (EDF) on the A53 application core.
The reason: fixed-priority scheduling wastes headroom. If your high-priority task completes early, that slack is gone. EDF is provably optimal on a single core — if any schedule can meet all deadlines, EDF will. And for a BCI pipeline with a hard 3.25 ms budget and 8 concurrent agents that need to run within it, EDF lets you pack more work into the same window.
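The core of EDF fits in a few lines. Here's a host-side illustration using `std` and a binary heap keyed on absolute deadline (the in-kernel version is of course `no_std` and allocation-free; the names and fields here are assumptions, not the AxonOS task structure):

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// A ready task, identified by id with an absolute deadline in µs.
struct Task {
    deadline_us: u64,
    id: u32,
}

/// EDF ready queue: a min-heap on deadline via `Reverse`,
/// so `pop()` always yields the earliest-deadline task.
struct EdfQueue {
    ready: BinaryHeap<Reverse<(u64, u32)>>,
}

impl EdfQueue {
    fn new() -> Self {
        Self { ready: BinaryHeap::new() }
    }
    fn release(&mut self, t: Task) {
        self.ready.push(Reverse((t.deadline_us, t.id)));
    }
    /// Dispatch decision: run whichever task's deadline is nearest.
    fn next(&mut self) -> Option<u32> {
        self.ready.pop().map(|Reverse((_, id))| id)
    }
}

fn main() {
    let mut q = EdfQueue::new();
    q.release(Task { deadline_us: 3250, id: 1 }); // pipeline epoch
    q.release(Task { deadline_us: 900,  id: 2 }); // tighter deadline
    q.release(Task { deadline_us: 2000, id: 3 });
    assert_eq!(q.next(), Some(2)); // earliest deadline runs first
    assert_eq!(q.next(), Some(3));
    assert_eq!(q.next(), Some(1));
}
```

The hard part in practice isn't this dispatch rule — it's admission control and bounding the heap operations so the scheduler itself has a WCET.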
Measured EDF scheduling jitter across 100,000 scheduling events:
| Percentile | Jitter |
|---|---|
| P50 | 1.2 µs |
| P90 | 3.8 µs |
| P99 | 5.4 µs |
| P99.9 | 6.5 µs |
| Max observed | 11.3 µs |
P99.9 jitter of 6.5µs. On a non-RT Linux kernel, P99.9 scheduling jitter is typically 1–50ms depending on system load. This is the difference between a system you can reason about and one you can only measure statistically.
Privacy as a type constraint
Neural data is the most intimate data that exists. It's not just health data — it can potentially reveal cognitive states, emotional responses, attention, and in some research contexts, proto-linguistic content.
Most BCI systems handle privacy through policy: "we don't store raw data," "data is anonymised before leaving the device." AxonOS handles it through the type system.
Raw neural data is represented as a `RawSample` type that is `!Send` and `!Sync` — it cannot cross thread boundaries. It can only be consumed by the Secure Enclave's classification pipeline, which produces `IntentVector` output. Application code never has access to `RawSample`. Not through policy — through the type system: safe application code that touches raw samples simply will not compile.
```rust
// axonos-secure-enclave/src/types.rs

/// Raw neural sample from the ADC pipeline.
/// !Send + !Sync: cannot cross thread boundaries.
/// Only constructable within the Secure Enclave module.
pub struct RawSample {
    channels: [f32; 8],
    timestamp_us: u64,
    _private: (), // Prevents construction outside this module
}

// RawSample is explicitly NOT Send or Sync.
// This is enforced at compile time — application code cannot
// store, transmit, or share raw neural data.
// (Negative impls require the nightly `negative_impls` feature.)
impl !Send for RawSample {}
impl !Sync for RawSample {}

/// The only output of the Secure Enclave.
/// Contains no raw neural information — only the classified intent.
#[derive(Debug, Clone, Copy)]
pub struct IntentVector {
    pub embedding: [f32; 32],
    pub confidence: f32,
    pub timestamp_us: u64,
    pub session_id: SessionId, // Opaque — maps to a model, not a person
    pub epoch: u32,
}
```
If you try to write application code that accesses `RawSample`, your code won't compile. That's the guarantee I want: not "we promise not to look," but "we structurally cannot."
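For readers on stable Rust, where negative impls like `impl !Send` aren't available: the same property is conventionally expressed with a raw-pointer `PhantomData` marker, since raw pointers are neither `Send` nor `Sync`. A hedged sketch of that pattern (not the actual AxonOS code):

```rust
use std::marker::PhantomData;

/// Stable-Rust equivalent of `impl !Send`/`impl !Sync`:
/// a raw-pointer PhantomData field opts the type out of both auto traits.
#[allow(dead_code)]
pub struct RawSample {
    channels: [f32; 8],
    timestamp_us: u64,
    _not_send_sync: PhantomData<*const ()>,
}

impl RawSample {
    /// In the real layout this constructor would be private to the enclave module.
    fn new(channels: [f32; 8], timestamp_us: u64) -> Self {
        Self { channels, timestamp_us, _not_send_sync: PhantomData }
    }
}

fn main() {
    let s = RawSample::new([0.0; 8], 42);
    assert_eq!(s.timestamp_us, 42);
    // This would NOT compile — RawSample is !Send:
    // std::thread::spawn(move || drop(s));
    println!("constructed at t={} µs; cannot cross threads", s.timestamp_us);
}
```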
The accuracy numbers (honest ones)
AxonOS uses Riemannian geometry for classification — specifically the Minimum Distance to Mean (MDM) classifier operating on covariance matrices in the space of symmetric positive definite matrices. This is not a novel algorithm; it's been in the BCI literature since Barachant et al. (2012). But it's dramatically underdeployed because implementing it correctly in a real-time embedded context is non-trivial.
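The decision rule itself is tiny — the work is all in the matrix geometry. Here's a minimal illustration for 2×2 SPD matrices, using the log-Euclidean metric as a stand-in for the affine-invariant one (the real pipeline works on 8×8 covariances with a proper Riemannian mean; everything here, including the class means, is a simplified assumption for illustration):

```rust
/// Symmetric 2x2 SPD matrix [[a, b], [b, c]] stored as (a, b, c).
type Sym2 = (f64, f64, f64);

/// Matrix logarithm of a 2x2 SPD matrix via closed-form eigendecomposition.
fn log_spd(m: Sym2) -> Sym2 {
    let (a, b, c) = m;
    if b.abs() < 1e-12 {
        return (a.ln(), 0.0, c.ln());
    }
    let mean = 0.5 * (a + c);
    let d = (0.25 * (a - c) * (a - c) + b * b).sqrt();
    let (l1, l2) = (mean + d, mean - d);
    // Eigenvector for l1 is proportional to (l1 - c, b).
    let (vx, vy) = (l1 - c, b);
    let n2 = vx * vx + vy * vy;
    let (ex, ey, exy) = (vx * vx / n2, vy * vy / n2, vx * vy / n2);
    // log M = ln(l1) v1 v1^T + ln(l2) (I - v1 v1^T).
    (
        l1.ln() * ex + l2.ln() * (1.0 - ex),
        (l1.ln() - l2.ln()) * exy,
        l1.ln() * ey + l2.ln() * (1.0 - ey),
    )
}

/// Log-Euclidean distance: Frobenius norm of log(A) - log(B).
/// (Off-diagonal term counted twice because the matrix is symmetric.)
fn dist(a: Sym2, b: Sym2) -> f64 {
    let (la, lb) = (log_spd(a), log_spd(b));
    let (da, db, dc) = (la.0 - lb.0, la.1 - lb.1, la.2 - lb.2);
    (da * da + 2.0 * db * db + dc * dc).sqrt()
}

/// MDM: assign a trial covariance to the nearest class mean.
fn classify(trial: Sym2, class_means: &[Sym2]) -> usize {
    class_means
        .iter()
        .enumerate()
        .min_by(|(_, &x), (_, &y)| dist(trial, x).partial_cmp(&dist(trial, y)).unwrap())
        .map(|(i, _)| i)
        .unwrap()
}

fn main() {
    // Two hypothetical class means: "rest" (low variance) vs
    // "movement" (higher, correlated variance).
    let rest = (1.0, 0.0, 1.0);
    let movement = (4.0, 1.5, 3.0);
    let trial = (3.5, 1.2, 2.8); // covariance resembling movement
    assert_eq!(classify(trial, &[rest, movement]), 1);
    println!("trial assigned to class {}", classify(trial, &[rest, movement]));
}
```

The embedded difficulty the paragraph above alludes to is exactly this eigendecomposition step: at 8×8 it needs an iterative solver with a bounded iteration count to preserve the WCET guarantee.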
Results on BCI Competition IV Dataset 2a (4-class motor imagery, standard evaluation):
| Method | Channels | 4-class Accuracy | ITR (bits/min) |
|---|---|---|---|
| FBCSP + SVM (2008 winner) | 22 | 68.7% | 22.4 |
| EEGNet (deep learning) | 22 | 77.8% | 31.6 |
| Riemannian + EA | 22 | 79.4% | 34.2 |
| AxonOS MDM + EA | 8 | 82.4% | 38.1 |
82.4% 4-class accuracy with 8 electrodes, outperforming all published baselines that use 22. Dropping from 22 to 8 channels (64% fewer electrodes) costs only about 0.7% accuracy relative to running the same pipeline on the full montage, because CSP concentrates the relevant motor imagery variance into a small number of spatial components, making the additional channels nearly redundant for this task.
The honest per-subject breakdown: Subject S03 achieves 66.2% and S06 achieves 68.4% — both above chance (25%), neither clinically useful without additional work. These are the "BCI illiterate" population (~15-20% of users) for whom standard µ-rhythm modulation is unreliable. Zero-shot adaptation and federated learning personalisation address this separately.
What I actually want from Dev.to readers
I've written 38 articles on Medium covering the full architecture in depth — from the DMA acquisition chain through the Riemannian classification, the IPC design, the privacy layer, and the market context. The series is 100,000+ words and I'm told it reads like a textbook on embedded Rust BCI systems.
But I want to talk to developers directly, not write textbooks.
If you've worked on embedded Rust and see something in the architecture that looks wrong, I want to hear it. The EDF scheduler implementation on a non-RTOS core is the part I'm least confident about — I know it works for my workload, but I'd like a second set of eyes.
If you've built anything in the BCI/neurotechnology space — commercial or academic — I'd genuinely like to compare notes on what the real deployment problems are. The first clinical pilot (ALS centre, northeastern US, dev kit ships April 2026) will surface problems I haven't anticipated.
If you're an embedded Rust developer interested in contributing, the workspace is at github.com/AxonOS-org. The codebase is 5 crates, 20 source files, 3826 lines of #![no_std] Rust. The hard parts are already working. What's missing is the SDK layer that makes it usable by people who aren't me.
The one question I'll pose to start discussion:
The biggest unsolved problem in the pipeline right now is cross-session signal stability. The Riemannian covariance matrices shift between sessions due to electrode impedance changes, neural adaptation, and user fatigue. I use a simplified Kalman-based drift estimator (SDKF) that handles slow drift reasonably well but fails on abrupt state changes. How would you approach this? The constraint is that the adaptation must run in real-time at 250Hz on a Cortex-A53.
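To make the baseline concrete for discussion: a random-walk Kalman tracker over one slowly drifting scalar feature looks roughly like this. This is a hedged sketch of the general idea, with made-up constants — not the actual SDKF implementation:

```rust
/// Random-walk Kalman tracker for a slowly drifting scalar feature
/// (e.g. the log-variance of one channel). Illustrative only.
struct DriftTracker { est: f64, p: f64, q: f64, r: f64 }

impl DriftTracker {
    fn new(initial: f64) -> Self {
        // q small: we expect slow drift. r large: single-epoch features are noisy.
        Self { est: initial, p: 1.0, q: 1e-4, r: 0.5 }
    }
    /// One update per epoch; returns the drift-corrected feature.
    fn correct(&mut self, observed: f64) -> f64 {
        self.p += self.q;
        let k = self.p / (self.p + self.r);
        self.est += k * (observed - self.est);
        self.p *= 1.0 - k;
        observed - self.est // re-centre around the tracked baseline
    }
}

fn main() {
    let mut t = DriftTracker::new(0.0);
    let mut last = 0.0;
    for i in 0..2000 {
        // Linear baseline drift, e.g. from slow electrode impedance change.
        last = t.correct(0.001 * i as f64);
    }
    // Slow drift is absorbed down to a small residual lag
    // (roughly drift-rate / steady-state gain). An abrupt jump,
    // by contrast, takes many epochs to re-absorb with a gain tuned
    // this low — which is exactly the failure mode described above.
    println!("residual after slow drift: {last:.3}");
}
```

The tension is visible in the constants: a gain low enough to reject epoch noise is too sluggish for abrupt state changes, which is why something like jump detection or a multi-model filter seems needed on top.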
Where to go from here
The architecture is documented at depth on Medium @AxonOS — each article goes into one subsystem in detail with full code. The series starts with the philosophical motivation (Article #5 — why conversational AI is architecturally wrong for BCI) and works through the complete technical stack.
The Rust workspace: github.com/AxonOS-org
If you're building anything in embedded real-time systems, BCI, or privacy-preserving edge AI, I'm interested in talking: axonosorg@gmail.com