Ruslan Manov

Posted on Apr 6

The Day the Swarm Got Scared -- And Saved Everyone

#rust #robotics #opensource #algorithms

T+0.000s: The Valley

Two hundred drones crossed the ridgeline at 0347 local time, flying a Vee formation at fifteen-meter spacing. They had no GPS. They had not had GPS for eleven minutes, ever since the electronic warfare blanket rolled across the valley like an invisible fog. The SAM corridor below them was a dark geometry of overlapping kill envelopes, and every drone in the swarm knew this because every drone in the swarm had been talking to every other drone, constantly, through a protocol borrowed from epidemiology.

The swarm was not afraid. Not yet.

Fear, in the STRIX system, is not a metaphor. It is a 64-bit floating-point number between zero and one, computed forty times per second by a subsystem adapted from behavioral economics. At T+0, the fleet-wide fear parameter sat at F=0.08. Background noise. The algorithmic equivalent of a steady hand.

What happened next pushed it to 0.73.

The Particle Filter: Navigating Blind

Eleven minutes without GPS is a long time for an inertial measurement unit. IMUs drift. Accelerometers accumulate bias. Gyroscopes precess. Without correction, a drone flying on dead reckoning will be hundreds of meters off-position within minutes.

STRIX does not rely on dead reckoning alone. It uses a dual particle filter -- 200 particles per drone, each particle a hypothesis about where the drone actually is in six-dimensional space: [x, y, z, vx, vy, vz].

// strix-core/src/particle_nav.rs
// Each particle is a 6D state hypothesis weighted by likelihood.
// When GPS is denied, the filter relies on IMU prediction alone,
// but cross-validates against gossip-relayed neighbor positions.

pub struct ParticleNavFilter {
    pub particles: Vec<[f64; 6]>,
    pub weights: Vec<f64>,
    pub n_particles: usize,
    // ...
}

Here is the critical insight: even without GPS, the drones are not navigating alone. Each drone broadcasts its best position estimate through the gossip protocol. When Drone 47 hears from Drone 48 that it is approximately 15 meters to its left, and Drone 47's particle filter has a cluster of hypotheses that agree with this, those particles gain weight. The particles that disagree quietly die.

The swarm navigates by consensus. Two hundred particle filters, each with 200 particles, form a distributed estimation engine of 40,000 simultaneous hypotheses about the state of the world. GPS denial does not blind this system. It degrades it. There is a difference.
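
To make the reweighting step concrete, here is a minimal sketch of how one gossip-relayed neighbor report could reshape a particle cloud. It is not the STRIX implementation: the function name, the `expected_offset` parameter, and the Gaussian likelihood with spread `sigma_m` are all assumptions chosen for illustration.

```rust
/// Reweight 6D particles [x, y, z, vx, vy, vz] against one neighbor report.
/// `neighbor_pos` is the neighbor's own position estimate; `expected_offset`
/// is where this drone should sit relative to that neighbor (hypothetical:
/// for example, its formation slot). `sigma_m` is an assumed measurement spread.
pub fn reweight_by_neighbor(
    particles: &[[f64; 6]],
    weights: &mut [f64],
    neighbor_pos: [f64; 3],
    expected_offset: [f64; 3],
    sigma_m: f64,
) {
    assert_eq!(particles.len(), weights.len());
    let mut total = 0.0;
    for (p, w) in particles.iter().zip(weights.iter_mut()) {
        // Squared distance between this hypothesis and the position
        // implied by the neighbor's report.
        let mut d2 = 0.0;
        for axis in 0..3 {
            let implied = neighbor_pos[axis] + expected_offset[axis];
            d2 += (p[axis] - implied).powi(2);
        }
        // Gaussian likelihood: agreeing particles keep their weight,
        // disagreeing particles shrink toward zero.
        *w *= (-0.5 * d2 / (sigma_m * sigma_m)).exp();
        total += *w;
    }
    // Renormalize so the weights still sum to one.
    if total > 0.0 {
        for w in weights.iter_mut() {
            *w /= total;
        }
    }
}
```

With two equally weighted particles, one sitting on the implied position and one 45 meters off, a 5-meter spread leaves essentially all of the weight on the first. A real filter would follow this with resampling, which the sketch leaves out.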

When the EW engine detects GPS denial, it triggers an automated response:

// strix-core/src/ew_response.rs
// GPS denial triggers noise expansion in the particle filter,
// widening the hypothesis cloud to account for increased uncertainty.

pub enum EwResponse {
    ExpandNavigationNoise { noise_multiplier: f64 },
    GossipFallback { reduced_fanout: usize, priority_only: bool },
    ForceRegime(Regime),
    // ...
}

The process noise multiplier expands. The particle cloud widens. Uncertainty increases, but honestly -- the system knows what it does not know, and acts accordingly.

T+12.400s: First Blood

Drone 7 ceased transmitting at T+12.4 seconds.

There was no warning. No gradual degradation of telemetry. One tick it was there, broadcasting its state through the gossip protocol at three-peer fanout. The next tick it was not. The heartbeat counter incremented past the timeout threshold, and the swarm's loss analyzer activated.

// strix-auction/src/antifragile.rs
// The loss analyzer classifies the kill and creates an exclusion zone.
// This is where the swarm starts learning.

pub fn record_loss(&mut self, record: LossRecord) -> Vec<u32> {
    let orphans = record.orphaned_tasks.clone();
    self.loss_records.push_back(record.clone());
    self.adapt_from_loss(&record);
    orphans
}

Three things happened simultaneously within 2 milliseconds of detecting the loss:

First, the loss was classified. Drone 7 was in ENGAGE regime at 500 meters altitude with a known threat bearing. Classification: SAM. Kill zone radius: 2,000 meters. Penalty weight: 0.8.

pub fn classify_loss(
    regime: Regime,
    threat_bearing: Option<f64>,
    altitude: f64,
) -> LossClassification {
    match (regime, threat_bearing) {
        (Regime::Engage, Some(_)) if altitude > 200.0 => LossClassification::Sam,
        (Regime::Engage, Some(_)) => LossClassification::SmallArms,
        (Regime::Patrol, None) => LossClassification::Collision,
        (_, Some(_)) => LossClassification::ElectronicWarfare,
        _ => LossClassification::Unknown,
    }
}

Second, Drone 7's orphaned tasks were identified and flagged for immediate re-auction. The auctioneer's needs_reauction flag flipped to true.

Third, and this is the part that matters: a kill zone materialized in the swarm's shared spatial memory. Not a GPS coordinate. Not a waypoint. A pheromone. A digital scent of death, deposited at Drone 7's last known position, repelling every drone that came near.

// strix-mesh/src/stigmergy.rs
// Threat pheromone: "Danger here" -- repels drones from hazardous areas.

pub enum PheromoneType {
    Explored,  // "I've been here"
    Threat,    // "Danger here"
    Target,    // "Interesting target"
    Rally,     // "Regroup here"
    Corridor,  // "Safe path"
}

The pheromone field is a sparse 3D grid with 10-meter cells. Each deposit is about 20 bytes. The gradient computation that steers drones away from danger is O(1) per cell -- a central-difference calculation across neighboring cells. The gradient itself points toward rising concentration, so for threat pheromones the steering vector is its negation: a three-component vector pointing away from the deposit.

The swarm did not need to be told to avoid the area where Drone 7 died. It could smell the danger.
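
A sparse grid with a central-difference gradient fits in a few dozen lines. The sketch below is illustrative only: the `PheromoneGrid` type, its hash-map storage, and the method names are assumptions, not the contents of stigmergy.rs, and it ignores pheromone decay and pheromone types.

```rust
use std::collections::HashMap;

/// Sparse 3D pheromone grid: only cells that hold a deposit use memory.
pub struct PheromoneGrid {
    cell_size_m: f64,
    cells: HashMap<(i32, i32, i32), f64>,
}

impl PheromoneGrid {
    pub fn new(cell_size_m: f64) -> Self {
        Self { cell_size_m, cells: HashMap::new() }
    }

    /// Map a position in meters to integer cell indices.
    fn cell_of(&self, pos: [f64; 3]) -> (i32, i32, i32) {
        let idx = |v: f64| (v / self.cell_size_m).floor() as i32;
        (idx(pos[0]), idx(pos[1]), idx(pos[2]))
    }

    pub fn deposit(&mut self, pos: [f64; 3], amount: f64) {
        let cell = self.cell_of(pos);
        *self.cells.entry(cell).or_insert(0.0) += amount;
    }

    fn concentration(&self, cell: (i32, i32, i32)) -> f64 {
        self.cells.get(&cell).copied().unwrap_or(0.0)
    }

    /// Central-difference gradient at the cell containing `pos`.
    /// Six lookups, regardless of how many deposits exist.
    pub fn gradient(&self, pos: [f64; 3]) -> [f64; 3] {
        let (i, j, k) = self.cell_of(pos);
        let h = 2.0 * self.cell_size_m;
        [
            (self.concentration((i + 1, j, k)) - self.concentration((i - 1, j, k))) / h,
            (self.concentration((i, j + 1, k)) - self.concentration((i, j - 1, k))) / h,
            (self.concentration((i, j, k + 1)) - self.concentration((i, j, k - 1))) / h,
        ]
    }

    /// Steering direction for a threat pheromone: the negated gradient,
    /// which points away from rising concentration.
    pub fn repulsion(&self, pos: [f64; 3]) -> [f64; 3] {
        let g = self.gradient(pos);
        [-g[0], -g[1], -g[2]]
    }
}
```

Depositing one unit in the cell just east of a drone yields a gradient of 0.05 per meter along x with 10-meter cells, so the repulsion vector points west.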

T+12.406s: The Market Reacts

Six milliseconds after the loss, the combinatorial auction repriced everything.

The STRIX auction is a sealed-bid market. Every drone evaluates every available task independently and submits a composite score based on proximity, capability match, energy reserves, urgency, and risk exposure. The auctioneer collects all bids and solves the assignment problem using a modified Hungarian algorithm.

// strix-auction/src/bidder.rs
// Bid scoring function. Note the risk term: kill-zone proximity
// and fear level directly suppress bids on dangerous tasks.
//
// total = urgency*10 + capability*3 + proximity*5 + energy*2 - risk*4

When Drone 7 died, two things changed in the market. First, its tasks became orphans -- supply dropped. Second, the kill zone inflated the risk term for every task near grid 7-Alpha -- demand cratered. The market did not need a commander to say "avoid that area." The prices said it. No drone bid competitively on tasks inside the kill zone because the math would not let them.
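
The scoring formula from the bidder comment can be written out directly. In this sketch the `BidInputs` struct and the idea that each term is a normalized score are assumptions; only the weights come from the comment above.

```rust
/// Inputs to the composite bid. Treating each field as a normalized
/// score is an assumption made for this illustration.
#[derive(Debug, Clone, Copy)]
pub struct BidInputs {
    pub urgency: f64,
    pub capability: f64,
    pub proximity: f64,
    pub energy: f64,
    pub risk: f64,
}

/// total = urgency*10 + capability*3 + proximity*5 + energy*2 - risk*4
pub fn bid_score(b: &BidInputs) -> f64 {
    b.urgency * 10.0 + b.capability * 3.0 + b.proximity * 5.0 + b.energy * 2.0
        - b.risk * 4.0
}
```

For a task with urgency 0.5, capability 1.0, proximity 0.8 and energy 0.6, the score is 13.2 at zero risk and 10.0 once a risk of 0.8 is applied. The same drone, the same task, and a kill zone nearby is enough to lose the auction to a safer alternative.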

The fear parameter rose from 0.08 to 0.31. This was not panic. This was information. The SwarmFearAdapter translated the loss event into the language of behavioral economics:

// strix-swarm/src/fear_adapter.rs
// STRIX telemetry mapped to PhiSim's behavioral economics model:
//
// | PhiSim concept       | STRIX signal                        |
// |----------------------|-------------------------------------|
// | drawdown             | Attrition rate (1 - alive/initial)  |
// | vol_ratio            | Threat intensity (1 + intent score) |
// | anomaly_count        | CUSUM breaks this tick              |
// | consecutive_losses   | Consecutive ticks with drone loss   |

At F=0.31, the formation spacing widened. The FormationConfig applies fear-modulated spacing: as fear rises, drones spread apart. Wider formation, harder to hit with a single salvo. Less aerodynamic efficiency, but the auction already repriced for that -- the scoring function factors in the additional transit cost.

T+23.800s: The Feint

At T+23.8, the adversarial particle filter detected something interesting.

The second particle filter -- the one that tracks enemy threats rather than friendly drones -- had been watching a cluster of radar returns moving south along the valley floor. The threat tracker maintained its own 100-particle hypothesis cloud per target, and the intent detection pipeline had been analyzing the movement pattern through three layers of signal processing.

// strix-core/src/intent.rs
// 3-layer pipeline: Hurst persistence -> closing acceleration -> vol compression
//
// Layer 1: Hurst persistence     -> purposeful trajectory? [H > 0.55]
// Layer 2: Closing acceleration  -> accelerating toward us?
// Layer 3: Volatility compression -> formation tightening?
//              |
//   Confidence-weighted fusion
//              |
//   IntentScore in [-1, 1] + IntentClass

The Hurst exponent for the southern cluster was 0.42. Below the purposeful threshold of 0.55. The movement was mean-reverting -- zigzagging, not advancing. The closing acceleration was near zero. The volatility ratio was high: 1.8, indicating loose, disorganized movement.

The intent pipeline classified this as IntentClass::Neutral, bordering on Retreating.

But here is where it gets subtle. The CUSUM anomaly detector noticed something the intent pipeline alone would miss: the southern cluster's radar cross-section kept changing. Large, then small, then large. Inconsistent with real aircraft. Consistent with decoys -- inflatable or electronic emitters designed to draw attention and waste resources.

// strix-core/src/anomaly.rs -- CUSUM detects distributional shifts.
// When the signature variance of a target group breaks the cusum threshold,
// the system flags potential deception.
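
For readers who have not met CUSUM before, a textbook two-sided detector looks roughly like this. It is a generic sketch rather than the code in anomaly.rs: the `Cusum` type, its parameter names, and the reset-on-alarm behavior are assumptions, and it watches any scalar signal (a signature variance would be one candidate).

```rust
/// Two-sided CUSUM detector for a shift in the mean of a scalar signal.
pub struct Cusum {
    target_mean: f64,
    slack: f64,     // k: deviations smaller than this are ignored
    threshold: f64, // h: alarm level for the cumulative sums
    s_pos: f64,
    s_neg: f64,
}

impl Cusum {
    pub fn new(target_mean: f64, slack: f64, threshold: f64) -> Self {
        Self { target_mean, slack, threshold, s_pos: 0.0, s_neg: 0.0 }
    }

    /// Feed one sample; returns true when a break is flagged.
    pub fn update(&mut self, x: f64) -> bool {
        let dev = x - self.target_mean;
        // Accumulate evidence of an upward and a downward shift separately,
        // never letting either sum go negative.
        self.s_pos = (self.s_pos + dev - self.slack).max(0.0);
        self.s_neg = (self.s_neg - dev - self.slack).max(0.0);
        let alarm = self.s_pos > self.threshold || self.s_neg > self.threshold;
        if alarm {
            // Restart accumulation after reporting the break.
            self.s_pos = 0.0;
            self.s_neg = 0.0;
        }
        alarm
    }
}

/// Index of the first sample that triggers an alarm, if any.
pub fn first_break(samples: &[f64], detector: &mut Cusum) -> Option<usize> {
    samples.iter().position(|&x| detector.update(x))
}
```

With a target mean of 0, slack 0.5 and threshold 3.0, five quiet samples followed by a jump to 2.0 raise the alarm on the third shifted sample. A single outlier is not enough; a sustained shift is.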

The adversarial particle filter's weight distribution was bimodal: half the particles clustered on "real threat, low intent" and half on "decoy, ignore." The Hurst persistence analysis tipped the balance. Real threat formations show persistent trajectories (H > 0.55). Decoys wander. H=0.42 was the signature of something pretending to be threatening but failing at the physics of it.

The fear parameter ticked up to 0.38 on the initial detection, then back down to 0.29 as the system accumulated evidence of deception. This is the dual-process architecture at work -- fear rises fast (System 1), but the analytical pipeline (System 2) can override it with evidence. The swarm did not freeze. It did not divert resources to chase phantoms. It maintained course.

The XAI narrator logged the reasoning:

[t=24.1s] Threat response (prob=31%): Maintaining course -- southern cluster
classified as FEINT. Confidence: 78%.

Every alternative was recorded. "Divert 30 drones south" scored 0.34, rejected for "Insufficient threat probability, Hurst below purposeful threshold." The glass box held.

T+47.200s: The Cascade

This is where the story could have ended badly.

At T+47.2, the EW blanket intensified. The comms jamming layer that had been degrading mesh connectivity surged to SEVERE. Sixty drones lost their gossip links simultaneously. Not destroyed -- silenced. Their particle filters kept running, their IMUs kept integrating, but they could not hear the swarm and the swarm could not hear them.

Then the SAM corridor opened up.

In thirty seconds, between T+47 and T+77, the swarm lost sixty drones. Not lost-connection. Lost. Destroyed. The loss analyzer fired sixty times in thirty seconds. Sixty kill zones materialized across the valley floor. Sixty sets of orphaned tasks flooded the auction queue.

The fear parameter did this:

T+47.0: F = 0.29
T+50.0: F = 0.51
T+55.0: F = 0.62
T+60.0: F = 0.68
T+65.0: F = 0.71
T+70.0: F = 0.73

F=0.73. The swarm was terrified.

What does terror look like in a combinatorial auction? It looks like this:

// strix-auction/src/antifragile.rs
// Fear-amplified kill zone penalties. At F=0.73, the multiplier is 2.095.
// SAM kill zones with base penalty 0.8 become 1.676 -- effectively
// making it economically impossible to bid on tasks inside them.

pub fn kill_zone_penalties_with_fear(&self, fear: f64) -> Vec<(Position, f64, f64)> {
    let f = fear.clamp(0.0, 1.0);
    let multiplier = 1.0 + f * 1.5; // 1.0 -> 2.5
    self.kill_zones
        .iter()
        .map(|kz| (kz.center, kz.radius, kz.penalty * multiplier))
        .collect()
}

At F=0.73, the fear multiplier hit 2.095. Every SAM kill zone's penalty weight of 0.8 became 1.676. The auction's risk term (-risk*4) for tasks inside those zones was so massive that no bid could overcome it. The market had, in effect, priced those areas at infinity. No drone went there. No commander needed to draw a red line on a map. The red line drew itself, from the blood of the fallen.
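
The arithmetic is easy to check. The sketch below repeats the multiplier from the excerpt above and adds a hypothetical `risk_term` helper; feeding the amplified penalty straight into the -risk*4 term is an assumption about how the two pieces connect.

```rust
/// Fear multiplier as in the excerpt: 1.0 at F=0, 2.5 at F=1.
pub fn fear_multiplier(fear: f64) -> f64 {
    1.0 + fear.clamp(0.0, 1.0) * 1.5
}

/// Contribution of the risk term to a bid, assuming (hypothetically) that
/// the fear-amplified penalty is used directly as `risk` in `-risk*4`.
pub fn risk_term(base_penalty: f64, fear: f64) -> f64 {
    -4.0 * base_penalty * fear_multiplier(fear)
}
```

At F=0.73 the multiplier is 1 + 0.73 * 1.5 = 2.095, a base penalty of 0.8 becomes 1.676, and under the assumption above the risk term subtracts about 6.7 points from any bid inside the zone. At the resting fear level of 0.08 the same zone costs about 3.6 points.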

But here is where anti-fragility kicked in. The kill zones did not just warn. They taught.

// Each additional loss in the same zone GROWS the radius.
// After 4 merges with growth factor 1.3: base * 1.3^4 = base * 2.86
// The system overestimates danger on purpose. Better to avoid
// too much than too little.

fn merge_into_existing_zone(&mut self, record: &LossRecord) -> bool {
    for kz in &mut self.kill_zones {
        let dist = kz.center.distance_to(&record.position);
        if dist < self.merge_distance {
            kz.loss_count += 1;
            kz.radius *= self.zone_growth_factor;
            kz.penalty = (kz.penalty * 1.1).min(1.0);
            // Shift centre towards the new loss (weighted average).
            let w = 1.0 / kz.loss_count as f64;
            kz.center = Position::new(
                kz.center.x * (1.0 - w) + record.position.x * w,
                kz.center.y * (1.0 - w) + record.position.y * w,
                kz.center.z * (1.0 - w) + record.position.z * w,
            );
            return true;
        }
    }
    false
}

Three losses near the same coordinates? The kill zone radius expanded by a factor of 1.3 per loss. The penalty weight climbed toward 1.0. The evade bias at that position -- the probability of entering EVADE regime when nearby -- stacked additively. The swarm was not just avoiding the danger. It was building an increasingly accurate map of it, and the more it suffered, the better the map became.
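
A standalone version of the merge rule makes the growth easy to trace. The `KillZone` struct here is a simplified stand-in with array positions, and `merge_loss` is a hypothetical free function; the update steps mirror the excerpt above.

```rust
/// Simplified kill zone for illustration; positions are plain arrays.
#[derive(Debug, Clone)]
pub struct KillZone {
    pub center: [f64; 3],
    pub radius: f64,
    pub penalty: f64,
    pub loss_count: u32,
}

/// Apply one merge using the same update rule as the excerpt.
pub fn merge_loss(kz: &mut KillZone, loss_pos: [f64; 3], growth: f64) {
    kz.loss_count += 1;
    kz.radius *= growth;
    // Penalty rises 10% per loss and saturates at 1.0.
    kz.penalty = (kz.penalty * 1.1).min(1.0);
    // Running mean: the centre drifts toward where losses actually occur.
    let w = 1.0 / kz.loss_count as f64;
    for axis in 0..3 {
        kz.center[axis] = kz.center[axis] * (1.0 - w) + loss_pos[axis] * w;
    }
}
```

Starting from a SAM zone with radius 2,000 meters and penalty 0.8, four merges at growth factor 1.3 give a radius of about 5,712 meters. The penalty goes 0.88, 0.968, then saturates at 1.0. If all four later losses fall 100 meters east of the first, the centre ends up 80 meters east, the mean of the five loss positions.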

The antifragile score -- sum over kill zones of (loss_count * ln(1 + loss_count) * radius_growth) -- climbed past 50.0. By Taleb's measure, the system was more robust after losing 60 drones than it had been with 200.
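
The score formula is short enough to sketch. The `ZoneStats` struct is hypothetical, and reading `radius_growth` as the ratio of current radius to original radius is an assumption; the source only names the term.

```rust
/// Per-zone inputs to the score. `radius_growth` is assumed here to be
/// current radius divided by the radius at creation.
pub struct ZoneStats {
    pub loss_count: u32,
    pub radius_growth: f64,
}

/// score = sum over zones of loss_count * ln(1 + loss_count) * radius_growth
pub fn antifragile_score(zones: &[ZoneStats]) -> f64 {
    zones
        .iter()
        .map(|z| {
            let n = z.loss_count as f64;
            n * (1.0 + n).ln() * z.radius_growth
        })
        .sum()
}
```

A fresh zone with one loss and no growth contributes ln 2, roughly 0.69. Because the per-zone term grows faster than linearly in the loss count, one zone that has absorbed five losses scores well above five separate single-loss zones, so the measure rewards concentrated, confirmed knowledge of where the danger is.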

T+78.000s: The Reformation

One hundred and forty drones remained. They were scattered, terrified (F=0.73), and navigating on inertial alone in a GPS-denied environment thick with SAM coverage and comms jamming.

They reformed in four seconds.

The gossip protocol is designed for exactly this scenario. Each surviving drone selected three random peers from its known-alive list and exchanged state digests. If the digests differed -- and they all differed, because sixty drones had just vanished -- full state exchanges followed. Within two gossip rounds, the surviving 140 drones had converged on a shared picture of who was left and where everyone was.

// strix-mesh/src/gossip.rs
// O(log N) convergence via epidemic gossip.
// Two rounds to synchronize 140 nodes after catastrophic loss.

// Conflict resolution:
// - General data: newer timestamp wins.
// - Threat data: union -- never discard threat information.
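
The two conflict-resolution rules can be sketched as a merge over a simplified swarm view. The `SwarmView` and `DroneState` types, and encoding threats as grid cells, are assumptions made for this example and not the gossip.rs data model.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Debug, Clone, PartialEq)]
pub struct DroneState {
    pub timestamp: f64,
    pub position: [f64; 3],
}

/// One drone's picture of the swarm (illustrative, heavily simplified).
#[derive(Debug, Clone, Default)]
pub struct SwarmView {
    pub drones: HashMap<u32, DroneState>,
    /// Threat cells; the tuple encoding is hypothetical.
    pub threats: HashSet<(i32, i32, i32)>,
}

impl SwarmView {
    /// Merge a peer's view into ours.
    pub fn merge_from(&mut self, peer: &SwarmView) {
        // General data: newer timestamp wins.
        for (id, theirs) in &peer.drones {
            let is_newer = self
                .drones
                .get(id)
                .map_or(true, |ours| theirs.timestamp > ours.timestamp);
            if is_newer {
                self.drones.insert(*id, theirs.clone());
            }
        }
        // Threat data: union, so nothing learned about danger is dropped.
        self.threats.extend(peer.threats.iter().copied());
    }
}
```

Both rules are commutative and idempotent, so the order in which digests arrive does not change the final picture. That property is what lets random peer selection converge without a coordinator.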

The formation engine computed new slot positions for 140 drones in Vee formation. The correction velocity vectors pointed each drone toward its new slot using proportional control with speed clamping:

// strix-core/src/formation.rs
// v_corr = (delta / ||delta||) * min(||delta||, v_max)
// If within deadband: v_corr = 0.
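
Written out as a function, the correction rule is only a few lines. The function name and the array-based vector type are assumptions; the formula is the one in the comment above.

```rust
/// Correction velocity toward a formation slot:
/// v_corr = (delta / ||delta||) * min(||delta||, v_max), zero inside the deadband.
pub fn correction_velocity(
    current: [f64; 3],
    slot: [f64; 3],
    v_max: f64,
    deadband: f64,
) -> [f64; 3] {
    let delta = [
        slot[0] - current[0],
        slot[1] - current[1],
        slot[2] - current[2],
    ];
    let dist = (delta[0] * delta[0] + delta[1] * delta[1] + delta[2] * delta[2]).sqrt();
    // Close enough: do nothing. This also guards the division below.
    if dist <= deadband {
        return [0.0; 3];
    }
    // Proportional to the error, but never faster than v_max.
    let speed = dist.min(v_max);
    [
        delta[0] / dist * speed,
        delta[1] / dist * speed,
        delta[2] / dist * speed,
    ]
}
```

A drone 50 meters from its slot with a 10 m/s cap gets a 10 m/s command along the line to the slot; a drone 5 meters away gets 5 m/s; a drone inside the deadband gets zero. The clamp keeps far-away drones from commanding impossible speeds, and the deadband stops drones from jittering around their slots.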

And here is where the CBF -- the Control Barrier Function -- earned its keep. One hundred and forty drones, all simultaneously repositioning in three dimensions, in comms-degraded conditions. The potential for mid-air collision was enormous.

// strix-core/src/cbf.rs
// CBF safety clamp: TTC-aware collision avoidance.
// Runs AFTER formation control, BEFORE velocity commands are sent.
// Every velocity vector that would violate the safety barrier gets
// rotated and scaled to the nearest safe vector.

pub struct CbfConfig {
    pub min_separation: f64,     // meters
    pub altitude_floor_ned: f64, // NED convention
    pub altitude_ceiling_ned: f64,
    pub alpha: f64,              // decay rate -- aggressiveness
    pub max_correction: f64,     // m/s cap
}

The CBF is a mathematical guarantee. Not a best-effort collision avoidance. Not a "try to maintain separation." A hard constraint that modifies every velocity command to ensure that the barrier function -- a measure of the safety margin remaining between two drones -- never drops below zero. If two drones are on a collision course, the CBF does not ask. It corrects. And it does so with the minimum modification necessary to the desired velocity, preserving mission intent to the maximum extent physics allows.
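
For a single pair of drones, the minimum-modification idea has a closed form. The sketch below uses a standard distance-based barrier, h = ||p_i - p_j||^2 - d_min^2, and the condition dh/dt + alpha*h >= 0. It is not the TTC-aware clamp in cbf.rs: the `PairCbf` type is hypothetical, the neighbor's velocity is treated as fixed, and altitude limits and the correction cap are omitted.

```rust
/// Pairwise control barrier function (illustrative, single constraint).
pub struct PairCbf {
    pub min_separation: f64, // meters
    pub alpha: f64,          // decay rate
}

fn dot(a: [f64; 3], b: [f64; 3]) -> f64 {
    a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
}

impl PairCbf {
    /// Barrier value; h >= 0 means the pair is at a safe distance.
    pub fn barrier(&self, p_i: [f64; 3], p_j: [f64; 3]) -> f64 {
        let r = [p_i[0] - p_j[0], p_i[1] - p_j[1], p_i[2] - p_j[2]];
        dot(r, r) - self.min_separation * self.min_separation
    }

    /// Smallest change to `v_des` that satisfies dh/dt + alpha*h >= 0.
    pub fn filter(
        &self,
        p_i: [f64; 3],
        p_j: [f64; 3],
        v_j: [f64; 3],
        v_des: [f64; 3],
    ) -> [f64; 3] {
        let r = [p_i[0] - p_j[0], p_i[1] - p_j[1], p_i[2] - p_j[2]];
        let h = self.barrier(p_i, p_j);
        // dh/dt = 2 r.(v_i - v_j), so the condition is a.v_i >= b with:
        let a = [2.0 * r[0], 2.0 * r[1], 2.0 * r[2]];
        let b = dot(a, v_j) - self.alpha * h;
        let slack = dot(a, v_des) - b;
        let a2 = dot(a, a);
        if slack >= 0.0 || a2 == 0.0 {
            return v_des; // already safe, leave the command untouched
        }
        // Project onto the constraint boundary: the minimum-norm correction.
        let lambda = -slack / a2;
        [
            v_des[0] + lambda * a[0],
            v_des[1] + lambda * a[1],
            v_des[2] + lambda * a[2],
        ]
    }
}
```

Take a drone 10 meters from a stationary neighbor, with a 5-meter minimum separation and alpha = 1, commanding 20 m/s straight at it. The filter cuts the closing speed to 3.75 m/s and leaves every other component alone. A command pointing away from the neighbor passes through unchanged.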

Zero collisions during the reformation. The benchmark is 1.15ms per tick for 20 drones; scaled up to the full 140, the entire CBF pass still ran in under 10ms. That was tight enough that the correction commands arrived before the drones had moved appreciably toward each other.

T+82.000s: The Market Finds Equilibrium

The auction re-ran at T+82.0. All surviving drones submitted sealed bids on all remaining tasks, with kill-zone penalties applied, fear-modulated risk terms included, and the intent pipeline's assessment of remaining threats factored into urgency multipliers.

The market cleared in 4.86ms.

Auction result:
  - 137 tasks assigned (of 142 remaining)
  - 5 tasks unassigned (inside active kill zones, no viable bid)
  - Total welfare: 847.3 (down from 1,204.1 pre-attrition)
  - Antifragile score: 58.4

The five unassigned tasks were inside the densest kill zones. The market's judgment: no drone should go there. The risk-adjusted cost exceeded the task value. This was not cowardice. This was the auction computing, in 4.86 milliseconds, a truth that would take a human commander minutes to reach: those tasks were not worth another drone.

The fear parameter began to decay. No new losses. The gossip protocol confirmed all 140 surviving drones were in formation and executing their assigned tasks. The CUSUM detectors settled. The Hurst exponent of the fleet's own movement pattern climbed back above 0.6 -- purposeful, directed.

T+82.0: F = 0.71
T+90.0: F = 0.64
T+100.0: F = 0.55
T+120.0: F = 0.42

The swarm was calming down. Not because someone told it to. Because the math said the danger was receding.

T+127.000s: The Glass Box

The XAI narrator had been recording every decision the entire time. Not summarizing. Not approximating. Every single decision trace, with full reasoning chains, alternatives considered, confidence levels, and input states.

This is the glass box.

// strix-xai/src/trace.rs
// Every decision emits a DecisionTrace with:
// - Timestamp
// - Decision type (TaskAssignment, RegimeChange, FormationChange,
//                  ThreatResponse, ReAuction, LeaderElection)
// - Full inputs (drone IDs, regime, metrics, fear/courage/tension)
// - Reasoning chain (numbered steps with data)
// - All alternatives considered (with scores and rejection reasons)
// - Output action + confidence score

pub struct DecisionTrace {
    pub id: u64,
    pub timestamp: f64,
    pub decision_type: DecisionType,
    pub inputs: TraceInputs,
    pub reasoning: Vec<ReasoningStep>,
    pub alternatives_considered: Vec<Alternative>,
    pub output: TraceOutput,
    pub confidence: f64,
}

At the command center, a human operator -- the one who had been watching the entire engagement unfold -- requested the after-action review. The mission replay system aggregated 4,847 decision traces into a structured timeline.

// strix-xai/src/replay.rs
// MissionReplay aggregates all traces into a timeline with
// statistics, key moments, and what-if analysis capability.

The narrator produced the report at DetailLevel::Detailed. Every decision, every alternative, every rejection reason. But between the lines of structured data, a story emerged.

Not because anyone programmed it to tell stories. Because when you trace the complete decision history of a system that learned from sixty deaths, the trace reads like one.

The After-Action Report

=== STRIX Mission Replay: Operation Ridgeline ===
Duration: 127.0s | Drones: 200 initial, 140 surviving | Traces: 4,847

KEY MOMENTS:

[t=12.4s] Unit 7 ceased. Classification: SAM. Kill zone established at
(3847.2, 1204.5, 502.1), radius 2000m, penalty 0.80.
The market remembered.

[t=12.4s] Re-auction triggered. 3 orphaned tasks redistributed among 199
remaining drones in 4.2ms. No bid entered for grid 7-Alpha.
At t=12.4s, no drone bid on grid 7-Alpha again.

[t=24.1s] Southern cluster assessed as FEINT.
Hurst=0.42 (below purposeful threshold 0.55).
Closing acceleration: -0.12 m/s^2 (below attack threshold 0.50).
Volatility ratio: 1.80 (expanding, not compressing).
Decision: Maintain course. Confidence: 78%.
  Alternative: Divert 30 drones south (score=0.34) -- rejected:
  "Insufficient threat probability, Hurst below purposeful threshold."
The swarm chose not to chase ghosts.

[t=47.2s-77.0s] CASCADE EVENT. 60 units lost in 30.0 seconds.
Fear: 0.29 -> 0.73.
Kill zones established: 60. Merged zones: 12.
Auction repriced: 60 re-auction cycles, mean latency 3.8ms.
Antifragile score: 12.1 -> 58.4.
The swarm suffered. The swarm learned.

[t=78.0s] Reformation complete. 140 drones, Vee formation.
Gossip convergence: 97.1% in 2 rounds (3.2 seconds).
CBF interventions: 23 (zero collisions).
The swarm reformed while scared, and nothing touched.

[t=82.0s] Market equilibrium. 137/142 tasks assigned.
5 tasks unpriced (inside kill zones, welfare < threshold).
The market found the boundary of acceptable risk.

[t=127.0s] Mission complete. Fear: 0.42 (decaying).
Final antifragile score: 62.7.

DETERMINISTIC REPLAY AVAILABLE: All 4,847 traces stored.
Full tick-by-tick replay at original timing enabled.

The operator stared at the screen for a long time after reading it.

At T+12.4s, Unit 7 ceased. The market remembered. At T+12.4s, no drone bid on grid 7-Alpha again.

It was not poetry. It was a database query formatted as text. But it read like an epitaph, because the math of loss and memory and avoidance, when you trace it honestly, has a cadence that sounds like grief.

The Algorithms Are Real

STRIX is an open-source Rust project. Apache 2.0. Every algorithm described in this story is implemented, tested, and benchmarked.

Numbers:

34,889 lines of Rust across 9 crates
7,493 lines of Python (PyO3 bindings + simulation)
671 tests
1.15ms per tick (20 drones)
4.86ms auction clear (100 drones)
Scaling target: 2,000+ drones

The nine crates:

strix-core: Dual particle filter, CUSUM anomaly detection, regime detection, formation control, CBF safety, EW response, threat intent pipeline, ROE engine
strix-auction: Combinatorial auction, sealed-bid market, anti-fragile kill zones, fear-modulated risk pricing
strix-mesh: Gossip protocol (O(log N) convergence), digital pheromone fields (stigmergy), fractal communication
strix-xai: Glass-box trace recording, natural-language narration, deterministic mission replay
strix-swarm: Integration orchestrator, tick loop, PhiSim fear adapter
strix-adapters: MAVLink, ROS2, simulator interfaces
strix-python: PyO3 bindings for the entire stack
strix-playground: Scenario engine, threat presets, benchmarking
strix-optimizer: SMCO parameter optimization, Pareto analysis

What makes it different:

Dual particle filter -- no competitor has both friendly navigation and adversarial intent prediction running simultaneously
Anti-fragile kill zones -- the swarm measurably improves after losses, inspired by Taleb's anti-fragility
Fear meta-parameter -- behavioral economics (Kahneman) modulates every subsystem through a single continuous signal
Combinatorial auction -- market-based task allocation with kill-zone repricing, not centralized planning
Digital pheromones + gossip -- fully decentralized, no single point of failure, bio-inspired coordination
Glass-box XAI -- every decision traced, narrated, replayable; zero black-box decisions
Deterministic replay -- entire missions can be replayed tick-by-tick for after-action review and what-if analysis