A patient needs a custom knee implant. The clinical workflow looks like this: acquire a CT scan, segment the femur and tibia, reconstruct full 3D bone geometry, extract 77 morphological parameters, and generate a patient-specific implant design. A team at Brest University Hospital recently automated this entire pipeline — from raw CT to finished implant CAD — in 15 minutes.
That's impressive engineering. But look at the architecture: each step is hardcoded into the next. The segmentation model is welded to the reconstruction algorithm, which is welded to the parameter extractor. If a better segmentation model appears next month, swapping it in means rewriting integration code, re-validating the pipeline, and re-running regulatory checks.
This is the static pipeline problem — and it exists far beyond medical imaging. Every AI system that chains models together faces it. The question is: what changes when you stop treating pipeline steps as code and start treating them as genes?
Each Step Is Already a Gene (It Just Doesn't Know It)
Look at the pipeline stages through the lens of the three gene axioms:
| Stage | Functional Cohesion | Interface Self-Sufficiency | Independent Evaluability |
|---|---|---|---|
| CT Segmentation | Reads DICOM, outputs 3D mesh | Standard input/output | Dice score, Hausdorff distance |
| 3D Reconstruction | Reads partial mesh, outputs full bone | Standard input/output | Surface deviation (mm) |
| Parameter Extraction | Reads bone model, outputs 77 landmarks | Standard input/output | Landmark accuracy (mm) |
| Implant Design | Reads parameters, outputs CAD geometry | Standard input/output | Implant fit accuracy |
Each stage does one thing. Each has a well-defined interface. Each can be measured independently. They satisfy the three axioms without any modification — they just happen to be locked inside a monolithic codebase instead of packaged as composable, evaluable units.
In Rotifer terms, each stage is a Gene: an atomic logic unit with a declared phenotype (what it does, what it needs, what it promises) and a measurable fitness score.
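A minimal sketch of that idea, assuming hypothetical names (`Phenotype`, `Gene`, the `segment.knee` capability string) rather than the protocol's actual API:

```python
from dataclasses import dataclass
from typing import Any, Callable

# Illustrative sketch only: field names are assumptions, not the
# Rotifer Protocol's real schema.
@dataclass
class Phenotype:
    name: str       # declared capability, e.g. "segment.knee"
    consumes: str   # what it needs, e.g. "dicom.series"
    produces: str   # what it promises, e.g. "mesh.3d"

@dataclass
class Gene:
    phenotype: Phenotype
    run: Callable[[Any], Any]   # the atomic logic unit
    fitness: float = 0.0        # measurable score, updated by the Arena

seg = Gene(Phenotype("segment.knee", "dicom.series", "mesh.3d"),
           run=lambda ct: f"mesh({ct})")
print(seg.phenotype.produces)  # → mesh.3d
```

The point is that the declaration travels with the logic: any runtime can read the phenotype, decide whether the gene fits a slot, and score it without inspecting the implementation.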
Arena: Let Algorithms Compete on Data, Not Papers
Medical imaging researchers publish new segmentation architectures constantly. U-Net, nnU-Net, SegResNet, TransUNet, Swin UNETR — each paper claims state-of-the-art results on specific benchmarks. But which one works best on your patient population, your scanner hardware, your anatomical region?
Currently, answering that question requires a dedicated benchmarking study. Someone has to download the models, standardize inputs, run evaluations, analyze results, and publish a comparison. This takes weeks or months.
The Arena mechanism offers a different model: multiple genes with the same declared phenotype (e.g., segment.knee) are evaluated on the same task distribution automatically and continuously. The fitness function captures what matters:
F(g) = (Success_Rate × log(1 + Utilization) × (1 + Robustness)) / (Complexity × Cost)
For a segmentation gene, this means:
- Success Rate: percentage of cases where Dice score exceeds clinical threshold
- Utilization: how many cases have been processed (track record matters)
- Robustness: performance variance across different patient anatomies
- Complexity: model size and code footprint
- Cost: inference time per case
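The formula is easy to evaluate directly. The sketch below plugs in invented numbers for two hypothetical segmentation genes; the inputs and units are assumptions for illustration, only the formula itself comes from the article:

```python
import math

def fitness(success_rate, utilization, robustness, complexity, cost):
    """Arena fitness as defined above:
    F(g) = (Success_Rate * log(1 + Utilization) * (1 + Robustness))
           / (Complexity * Cost)
    """
    return (success_rate * math.log(1 + utilization) * (1 + robustness)) \
           / (complexity * cost)

# Invented example workloads: an incumbent with a long track record vs.
# a smaller, cheaper challenger with only 40 processed cases.
incumbent = fitness(0.92, utilization=5000, robustness=0.8,
                    complexity=2.0, cost=1.5)
challenger = fitness(0.95, utilization=40, robustness=0.9,
                     complexity=1.0, cost=0.8)
print(challenger > incumbent)  # True: lower complexity and cost let the
                               # challenger overtake despite less track record
```

Note how the `log(1 + Utilization)` term dampens the incumbent's advantage: track record helps, but it grows logarithmically, so a markedly cheaper or more robust newcomer can still win.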
No committee. No paper reviews. The data decides. When a new segmentation approach arrives, it enters the Arena, competes against incumbents on real workloads, and either earns adoption or doesn't.
Composition: Pipelines as Algebra, Not Spaghetti Code
Once each step is a gene, the pipeline becomes a composition expression rather than a pile of integration code:
```python
spine_pipeline = Seq(segment.spine, reconstruct.ssm, analyze.morphology, design.implant.spine)
knee_pipeline = Seq(segment.knee, reconstruct.ssm, analyze.77params, design.implant.tka)
```
This isn't pseudocode. The gene composition algebra defines operators — Seq for sequential, Par for parallel, Cond for conditional branching, Try for error recovery — that compile into executable data-flow graphs. The algebra preserves type safety: if segment.spine outputs a mesh and reconstruct.ssm expects a mesh, the composition type-checks at compile time.
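A toy `Seq` showing the interface check is straightforward to write. Everything here is an illustrative assumption except the operator name and the rule it enforces (producer type must match consumer type before anything executes):

```python
# Hypothetical sketch of Seq with interface checking; not the
# protocol's actual implementation.
class Gene:
    def __init__(self, name, consumes, produces, fn):
        self.name, self.consumes, self.produces, self.fn = name, consumes, produces, fn

def Seq(*genes):
    """Compose genes sequentially, rejecting mismatched interfaces
    at composition time, before any data flows."""
    for a, b in zip(genes, genes[1:]):
        if a.produces != b.consumes:
            raise TypeError(f"{a.name} produces {a.produces}, "
                            f"but {b.name} expects {b.consumes}")
    def pipeline(x):
        for g in genes:
            x = g.fn(x)
        return x
    return pipeline

segment = Gene("segment.knee", "ct", "mesh", lambda ct: ct + "->mesh")
reconstruct = Gene("reconstruct.ssm", "mesh", "bone", lambda m: m + "->bone")
knee = Seq(segment, reconstruct)
print(knee("ct"))  # ct->mesh->bone
```

Swapping a stage means passing a different gene with the same `consumes`/`produces` pair into `Seq`; a gene with a mismatched interface fails at composition time rather than mid-pipeline on a patient case.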
The payoff is modularity. When a hospital acquires a new MRI scanner that produces higher-resolution data, they don't rebuild the pipeline — they swap in a reconstruction gene optimized for that resolution. When a new anatomical region is needed (shoulder, craniomaxillofacial), they compose existing genes with region-specific ones.
The Controller Gene pattern takes this further. A controller gene is an ordinary gene whose job is to orchestrate other genes dynamically at runtime — deciding which segmentation model to invoke based on the imaging modality, the anatomical region, and the data quality. Think of it as the attending physician of the pipeline: it doesn't do the surgery, but it decides the plan.
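A controller gene's logic can be as simple as a routing table keyed on case metadata. The registry entries and gene names below are invented for illustration:

```python
# Hypothetical routing table for a controller gene: which segmentation
# gene to invoke for a given modality and anatomical region.
REGISTRY = {
    ("CT", "knee"): "segment.knee.nnunet",
    ("MRI", "knee"): "segment.knee.swin_unetr",
    ("CT", "spine"): "segment.spine.segresnet",
}

def controller(case):
    """Pick a segmentation gene at runtime from case metadata."""
    key = (case["modality"], case["region"])
    gene = REGISTRY.get(key)
    if gene is None:
        raise LookupError(f"no segmentation gene registered for {key}")
    return gene

print(controller({"modality": "MRI", "region": "knee"}))
# → segment.knee.swin_unetr
```

Because the controller is itself an ordinary gene, its routing decisions are independently evaluable: a controller that picks better models for hard cases earns a higher fitness score than one with a static table.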
HLT: Share Models, Not Patient Data
Here's the scenario that keeps medical AI architects up at night: Hospital A trains a superb spine segmentation model on 500 annotated CT scans. Hospital B wants that model. But sharing the training data violates patient privacy laws (HIPAA, GDPR, China's PIPL). Federated learning is one solution, but it requires continuous coordination, gradient aggregation, and introduces communication overhead.
Horizontal Logic Transfer offers a structurally different approach. What propagates is the gene itself — the trained model, packaged with its phenotype declaration and fitness score — not the data it was trained on. Hospital B evaluates the incoming gene on its own local data. If it outperforms the incumbent, it adopts the gene. If not, it rejects it. No gradients cross institutional boundaries. No patient data leaves the building.
The protocol's privacy-preserving sharing mechanism adds a layer: the gene's fitness score and interface spec are public (so Hospital B can decide whether to evaluate it), but the internal weights and implementation are opaque until the receiving party explicitly accepts.
This is HLT applied to a regulated domain — and it works precisely because genes are self-contained, independently evaluable units. You don't need to trust the source hospital's data. You just need to verify the gene's performance on your own.
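The adoption rule at the receiving hospital can be sketched in a few lines. This is an assumed shape, not the protocol's actual HLT implementation: the incoming gene runs on local cases, gets scored with a local metric (Dice here), and is adopted only if it beats the incumbent:

```python
def dice(pred, truth):
    """Dice coefficient between two binary masks (sets of voxel ids)."""
    inter = len(pred & truth)
    return 2 * inter / (len(pred) + len(truth)) if (pred or truth) else 1.0

def consider_gene(incoming_run, incumbent_run, local_cases):
    """HLT adoption sketch: evaluate both genes on local (scan, mask)
    pairs; adopt the incoming gene only if it scores higher. No data
    or gradients ever leave this function's caller."""
    score_in = sum(dice(incoming_run(ct), gt)
                   for ct, gt in local_cases) / len(local_cases)
    score_old = sum(dice(incumbent_run(ct), gt)
                    for ct, gt in local_cases) / len(local_cases)
    return "adopt" if score_in > score_old else "reject"
```

The asymmetry is the point: the sending hospital publishes a fitness claim, but the receiving hospital's decision rests entirely on a score it computed itself, on data it never shared.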
The Bigger Picture: From Static Artifacts to Living Systems
The total knee arthroplasty (TKA) pipeline at Brest compressed a manual workflow into a 15-minute automated run. That's a solved engineering problem. But the evolution of that pipeline — replacing weak components, adapting to new data distributions, propagating improvements across institutions — remains manual, slow, and fragile.
This pattern repeats across every AI domain that chains models together. Autonomous driving pipelines chain perception → prediction → planning. Drug discovery chains target identification → molecule generation → property prediction. Content moderation chains detection → classification → decision. Each faces the same structural challenge: static logic in a dynamic environment.
The medical imaging case makes the argument concrete because the pipeline stages are clean, the evaluation metrics are well-defined (Dice, Hausdorff, surface deviation), and the regulatory requirements force explicit lifecycle management. But the underlying pattern — encapsulate, evaluate, compose, compete, propagate — is domain-agnostic.
That's the thesis of evolution engineering: the next discipline isn't about how you talk to AI, or what AI knows, or how AI is orchestrated. It's about how AI capabilities improve over time — automatically, measurably, and without rebuilding the system from scratch every time something better comes along.
The Rotifer Protocol is an open-source evolution framework for autonomous software agents. The concepts discussed here — Gene encapsulation, Arena competition, Composition Algebra, and Horizontal Logic Transfer — are defined in the protocol specification and implemented in the Playground CLI.