JXIONG

Posted on Jun 1

Building a Biomedical World Model in Python: SteeraMed Core

#ai #steerability #worldmodel #steeramed

What if you could ask: "Which compound is most likely to reverse this specific patient's molecular aging?" — and get a 4-layer auditable evidence chain, not a black-box recommendation?

That's what SteeraMed Core does. It's an open-source Python package that applies the "world model" concept from reinforcement learning to biomedicine: quantify how an individual's biology deviates from normal, then simulate which compounds can steer it back.

No PyTorch. No TensorFlow. No GPU. Just numpy, pandas, scipy, and matplotlib.

pip install steeramed-core

Why This Matters

Epigenetic clocks (Horvath 2013, Hannum 2013) can measure your "biological age" from DNA methylation. But measuring aging is only step one — the real challenge is intervention: how do you steer molecular state back toward a younger profile?

The Hallmarks of Aging framework (López-Otín et al., Cell 2013, >15k citations) defined 9 hallmarks of aging, expanded to 12 in the 2023 update (Hallmarks of aging: An expanding universe). SteeraMed operationalizes this framework using the MSigDB Hallmark 50 gene sets (Liberzon et al., Cell Systems 2015) — 50 curated biological pathway gene sets covering aging, cancer, immunity, and metabolism.

Note: MSigDB "Hallmark" (50 pathway gene sets) ≠ Hallmarks of Aging (12 aging pillars). SteeraMed uses the former as its functional module definitions.

What Is a Biomedical World Model?

In reinforcement learning, a world model (Ha & Schmidhuber, 2018) is an internal simulator that predicts the consequences of different actions. Think of AlphaGo mentally simulating "if I play here, what will my opponent do?"

Applied to biomedicine:

State representation — Quantify how an individual deviates across 50 biological pathway modules using DNA methylation
Action simulation — Simulate "if we apply compound X, which disrupted modules get corrected" on the PPI network
Auditable reasoning — Generate a 4-layer traceable evidence chain, not a black-box output

	Traditional Systems Biology / AI Drug Discovery	Biomedical World Model
Unit of analysis	Population mean	Individual (N-of-1)
Inference direction	Forward (drug → effect)	Reverse (deviation → corrective drug)
Output	Drug repurposing candidates	4-layer individualized evidence chain
Confidence	Clinical trial statistics	Bootstrap resampling confidence

Project Architecture

steeramed_core/
├── __init__.py              # Entry: EvidenceChain + load_example_patient
├── __main__.py              # CLI: interactive case selector + batch mode
├── core/                    # Core algorithms
│   ├── config.py            # Global config + disease presets
│   ├── delta.py             # N-of-1 delta vector computation
│   ├── evidence_chain.py    # 4-layer evidence chain data structures
│   └── semo.py              # SA scoring + compound ranking
├── presets/                 # Pre-computed data (3 real clinical cases)
│   ├── catalog.json         # Case catalog
│   ├── datasets.json        # GEO dataset metadata
│   ├── positive_controls.json # Known drug ground truth
│   └── example_patients/    # 3 JSON patient files
├── viz/                     # Visualization (Nature-style theme)
│   ├── theme.py             # Color palette + rcParams
│   ├── hallmark_bar.py      # Hallmark perturbation bar chart
│   ├── drug_ranking.py      # Top-10 compound ranking chart
│   ├── evidence_network.py  # Drug-PPI-Hallmark network graph
│   ├── patient_card.py      # Single-page patient summary card
│   ├── evidence_view.py     # Scientist view (Paper Fig 6/8)
│   └── patient_view.py      # Patient view (Paper Fig 4/7)
└── examples/                # Reproduction scripts
    ├── reproduce_aging_patient_view.py
    ├── reproduce_ra_evidence_chain.py
    └── reproduce_dep_evidence_chain.py

Minimal dependencies:

dependencies = [
    "numpy>=1.21",
    "pandas>=1.3",
    "scipy>=1.7",
    "matplotlib>=3.5",
]

The 4-Layer Evidence Chain

Layer 1: Which Hallmark Pathways Are Disrupted?

Map DNA methylation to PPI (protein-protein interaction) network modules. Evaluate all 50 modules and flag the ones that deviate significantly from age-matched controls.

# core/delta.py — N-of-1 Delta vector
def compute_n1_delta(patient_genes, control_genes):
    """
    Δ_i = x_i - x̄_matched
    Matched controls: same sex, age ±5 years, K=10
    """
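    # control_genes holds only the matched controls (rows = controls, columns = genes)
    matched_mean = control_genes.mean(axis=0)  # per-gene mean across matched controls
    return patient_genes - matched_mean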
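
The matching rule in the docstring (same sex, age within ±5 years, K=10) is not shown in the snippet. Here is a minimal sketch of one way to pick matched controls before computing the delta, assuming the data sits in NumPy arrays. The names `select_matched_controls` and `n1_delta` are hypothetical, not part of the package API, and keeping the K controls closest in age is an assumption.

```python
import numpy as np

def select_matched_controls(patient_age, patient_sex, ctrl_ages, ctrl_sexes,
                            k=10, caliper=5.0):
    """Return indices of up to k controls with the same sex and |age diff| <= caliper.

    Assumption: among eligible controls, the k closest in age are kept.
    """
    ctrl_ages = np.asarray(ctrl_ages, dtype=float)
    ctrl_sexes = np.asarray(ctrl_sexes)
    age_gap = np.abs(ctrl_ages - patient_age)
    eligible = np.flatnonzero((ctrl_sexes == patient_sex) & (age_gap <= caliper))
    # Stable sort so that ties keep their original order
    order = np.argsort(age_gap[eligible], kind="stable")
    return eligible[order][:k]

def n1_delta(patient_values, control_values, matched_idx):
    """Delta_i = x_i - mean of the matched controls, computed per gene."""
    control_values = np.asarray(control_values, dtype=float)
    matched_mean = control_values[matched_idx].mean(axis=0)
    return np.asarray(patient_values, dtype=float) - matched_mean

# Toy data (illustrative only): 5 controls x 3 genes
ctrl_ages = [48, 52, 70, 50, 55]
ctrl_sexes = ["M", "M", "M", "F", "M"]
ctrl_values = np.array([
    [0.10, 0.50, 0.90],
    [0.20, 0.40, 0.80],
    [0.90, 0.90, 0.90],
    [0.30, 0.30, 0.30],
    [0.30, 0.60, 0.70],
])
idx = select_matched_controls(51, "M", ctrl_ages, ctrl_sexes, k=2)
delta = n1_delta([0.25, 0.45, 0.85], ctrl_values, idx)
```

In this toy run the 70-year-old control fails the age caliper and the female control fails the sex match, so only the remaining three are eligible and the two closest in age are used.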

Example findings in the aging case:

NAD+ metabolism module disrupted (Loss of NAD+)
Inflammatory modules upregulated (TNFα/NF-κB, IL-6/JAK/STAT3)
Protein homeostasis disturbed (Unfolded Protein Response)
Some modules remain normal (Hedgehog, Notch signaling)

Layer 2: Which Compounds Can "Steer Back"?

Compute a Steerability Alignment Score (SA Score) — essentially a Welch-type contrast statistic comparing methylation deltas of compound target genes vs. non-target genes.

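# core/semo.py
from scipy import stats

def welch_t(a, b):
    # Stand-in for the package's helper (assumption): SciPy's unequal-variance t-test
    return stats.ttest_ind(a, b, equal_var=False).statistic

def compute_sa_score(delta, target_genes, all_genes):
    """
    SA Score = Welch t-statistic
    Compare compound target genes vs non-targets in disrupted modules
    """
    # delta: pandas Series of per-gene deltas, indexed by gene symbol
    target_delta = delta[delta.index.isin(target_genes)]
    non_target_delta = delta[~delta.index.isin(target_genes)]
    return welch_t(target_delta, non_target_delta)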
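
For reference, this is the standard Welch t-statistic written out for the two groups in the snippet above. Mapping the symbols onto the SA Score is an assumption based on the docstring; the package may apply extra normalization that is not shown here.

```latex
t \;=\; \frac{\bar{\Delta}_{T} - \bar{\Delta}_{N}}
             {\sqrt{\dfrac{s_{T}^{2}}{n_{T}} + \dfrac{s_{N}^{2}}{n_{N}}}}
```

Here $\bar{\Delta}_{T}$ and $\bar{\Delta}_{N}$ are the mean deltas of target and non-target genes, $s_{T}^{2}$ and $s_{N}^{2}$ their sample variances, and $n_{T}$, $n_{N}$ the group sizes. Unlike Student's t-test, the two variances are not pooled, which matters when a compound has far fewer targets than non-targets.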

Compound-target data comes from the STITCH database. The ranking uses importance voting across bootstrap samples:

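from collections import defaultdict

def rank_compounds_by_importance(sa_matrix, compounds):
    """
    Importance Voting: each sample's top-1 compound gets 1 vote
    """
    # Assumes sa_matrix is a DataFrame: rows = samples, columns = compounds
    votes = defaultdict(int)
    for _, sample_sa in sa_matrix.iterrows():
        top_compound = sample_sa.idxmax()
        votes[top_compound] += 1
    # Most-voted compound first
    return sorted(votes.items(), key=lambda x: -x[1])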

Aging case results: Niacin ranks #1 (targets NAD+ metabolism) and Colchicine #2 (anti-inflammatory). Two of the top five hits are known geroprotectors.

Layer 3: Mechanism Traceability

Trace each compound's mechanism: compound targets → PPI network neighbors → hub genes → corresponding Hallmark pathway.

Example: Niacin → NAMPT/NAPRT → NAD+ metabolism module → Loss of NAD+
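
The trace can be pictured as a walk over three lookup tables. This is a minimal sketch assuming plain dictionaries for compound targets, PPI neighbors, and module membership. The function name, the table layout, and the placeholder gene `GENE_X` are all hypothetical, and hub-gene selection is simplified to "any target or direct neighbor that belongs to a module".

```python
def trace_mechanism(compound, targets, ppi_neighbors, gene_to_module):
    """Return sorted (target, gene, module) triples linking a compound to modules.

    A target links to a module either directly (gene == target) or through
    one PPI neighbor.
    """
    paths = set()
    for target in targets.get(compound, ()):
        candidates = {target} | set(ppi_neighbors.get(target, ()))
        for gene in candidates:
            module = gene_to_module.get(gene)
            if module is not None:
                paths.add((target, gene, module))
    return sorted(paths)

# Toy lookup tables (illustrative only, not taken from STITCH or STRING)
targets = {"niacin": ["NAMPT", "NAPRT"]}
ppi_neighbors = {"NAMPT": ["GENE_X"]}
gene_to_module = {
    "NAMPT": "NAD+ metabolism",
    "NAPRT": "NAD+ metabolism",
    "GENE_X": "NAD+ metabolism",
}

paths = trace_mechanism("niacin", targets, ppi_neighbors, gene_to_module)
for target, gene, module in paths:
    print(f"niacin -> {target} -> {gene} -> {module}")
```

Returning explicit triples rather than a single score is what keeps this layer auditable: every compound-to-module link can be checked against the underlying databases.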

Layer 4: Bootstrap Confidence

Ranking stability is tested with 1000 bootstrap resamples. The retention rate of the top-1 compound across those resamples determines the evidence level:

Level	Bootstrap Stability	Meaning
STRONG	≥80%	Robust recommendation
MODERATE	50% to <80%	Reasonable evidence
EXPLORATORY	<50%	Hypothesis-generating only
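
The thresholds in the table translate directly into a small helper. This is a sketch using only the standard library; `top1_retention` and `evidence_level` are hypothetical names, and how the package generates each resample is not covered here.

```python
from collections import Counter

def top1_retention(winners):
    """Return the most frequent top-1 compound and the fraction of resamples it won."""
    if not winners:
        raise ValueError("winners must not be empty")
    compound, count = Counter(winners).most_common(1)[0]
    return compound, count / len(winners)

def evidence_level(retention):
    """Map a retention rate in [0, 1] to the evidence levels in the table above."""
    if retention >= 0.80:
        return "STRONG"
    if retention >= 0.50:
        return "MODERATE"
    return "EXPLORATORY"

# Toy input: the top-1 compound from each of 10 hypothetical resamples
winners = ["A"] * 6 + ["B"] * 3 + ["C"]
compound, rate = top1_retention(winners)
level = evidence_level(rate)
print(compound, rate, level)
```

With this mapping, the 24.5% stability reported for the depression case falls into EXPLORATORY.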

The evidence chain is a clean dataclass:

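# core/evidence_chain.py
from dataclasses import dataclass
from typing import List

# PPIModule and CompoundMatch are data structures from the package (not shown here)
@dataclass
class EvidenceChain: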
    patient_id: str
    disease: str
    perturbed_modules: List[PPIModule]     # Layer 1
    top_compounds: List[CompoundMatch]     # Layer 2
    mechanism_map: dict                    # Layer 3
    bootstrap_stability: dict              # Layer 4

    def summary(self) -> str: ...
    def to_dict(self) -> dict: ...
    def to_json(self, path: str) -> None: ...

    @classmethod
    def from_dict(cls, data: dict) -> 'EvidenceChain':
        # Backward compatible: ignores unknown fields
        ...
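
One common way to get a backward-compatible `from_dict` is to filter the incoming keys against the declared dataclass fields. The sketch below uses a simplified stand-in class, `ChainSketch`, which is hypothetical and has fewer fields than `EvidenceChain`; the package's actual implementation may differ.

```python
from dataclasses import dataclass, field, fields, asdict

@dataclass
class ChainSketch:
    """Hypothetical stand-in for EvidenceChain with simplified field types."""
    patient_id: str
    disease: str
    top_compounds: list = field(default_factory=list)
    bootstrap_stability: dict = field(default_factory=dict)

    def to_dict(self) -> dict:
        return asdict(self)

    @classmethod
    def from_dict(cls, data: dict) -> "ChainSketch":
        # Keep only keys that match declared fields; drop everything else
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in data.items() if k in known})

payload = {
    "patient_id": "ra_patient_303",
    "disease": "RA",
    "top_compounds": ["pentoxifylline"],
    "added_in_future_version": 123,  # unknown key, silently ignored
}
chain = ChainSketch.from_dict(payload)
print(chain)
```

Giving the optional fields defaults means older JSON files that lack them still load, while files written by newer versions with extra keys do not raise a `TypeError`.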

Quick Start

Current version: SteeraMed Core is a proof-of-concept demo with 3 built-in real clinical cases from GEO. Custom data upload (450K/EPIC methylation arrays) is coming in future releases. Follow updates at steerable.world.

Interactive Mode

pip install steeramed-core
python -m steeramed_core

SteeraMed Core — N-of-1 Evidence Chain Explorer
=================================================

Select a patient case:
  [1] Aging · Population Screening
  [2] RA · 51M · T-cell Perturbation
  [3] Depression · 52M · Innate Immunity

Enter choice [1-3]:

Batch Mode

python -m steeramed_core --all             # Generate all cases
python -m steeramed_core --case ra_303     # Specific case
python -m steeramed_core --list            # List available cases

Python API — Load & Inspect

from steeramed_core import EvidenceChain, load_example_patient

patient = load_example_patient("ra_patient_303")
print(patient.summary())

# Inspect the 4 layers
print(f"Disrupted modules: {len(patient.perturbed_modules)}")
print(f"Top compound: {patient.top_compounds[0].compound_id}")
print(f"Bootstrap stability: {patient.bootstrap_stability}")

Python API — Generate All Charts

from steeramed_core import load_example_patient
from steeramed_core.viz.patient_card import plot_patient_card
from steeramed_core.viz.drug_ranking import plot_drug_ranking
from steeramed_core.viz.hallmark_bar import plot_hallmark_bar
from steeramed_core.viz.evidence_network import plot_evidence_network

data = load_example_patient("ra_patient_303").to_dict()

# 4 charts: hallmark perturbation, drug ranking, network, patient card
for fn, name in [
    (plot_hallmark_bar, "hallmark_bar"),
    (plot_drug_ranking, "drug_ranking"),
    (plot_evidence_network, "evidence_network"),
    (plot_patient_card, "patient_card"),
]:
    fig = fn(data)
    fig.savefig(f"{name}.png", dpi=300, bbox_inches="tight")

Python API — Publication-Grade Figures

from steeramed_core.viz.evidence_view import plot_evidence_chain
from steeramed_core.viz.patient_view import plot_patient_view

# Scientist view — 3-panel evidence chain (Paper Fig 6/8 style)
fig = plot_evidence_chain(data)
fig.savefig("evidence_chain.png", dpi=300, bbox_inches="tight")

# Patient view — 3-panel card (Paper Fig 4/7 style)
fig = plot_patient_view(data)
fig.savefig("patient_view.png", dpi=300, bbox_inches="tight")

Validation Results

Retrospective positive control validation on 3 GEO datasets:

Cohort	Disease	N	Key Finding	Evidence
GSE40279 (Hannum)	Aging	656	Niacin #1, 2/5 geroprotectors	MODERATE
GSE42861	Rheumatoid Arthritis	689	6/10 known RA drugs recovered, pentoxifylline #1	STRONG
GSE128235	Depression (MDD)	533	Creatine #1, innate immunity dominant	EXPLORATORY

The RA cohort is the strongest result: known RA drugs were recovered at 5.8x random expectation, which suggests that PPI module-level alignment captures meaningful drug-disease matches.

The depression cohort's top-1 compound (creatine) has only 24.5% bootstrap stability — honestly flagged as EXPLORATORY. This reflects the high heterogeneity and weak methylation signal in MDD.

Configuration

All hyperparameters live in core/config.py:

# PPI network
PPI_SCORE_CUTOFF = 400    # STRING confidence threshold
PPI_MIN_SIZE = 20         # Min genes per module
PPI_MAX_SIZE = 800        # Max genes per module

# Compound targets
STITCH_SCORE = 200        # STITCH confidence threshold
# Target count range: [60, 300]

# Bootstrap
BOOTSTRAP_N1 = 200        # N-of-1 resampling iterations
BOOTSTRAP_GROUP = 100     # Group-level iterations

# Matching
MATCH_K = 10              # Number of matched controls
MATCH_CALIPER = 5         # Age matching window (years)
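
To make the size bounds concrete, here is a sketch of how a min/max filter over gene sets could look. The helper `filter_by_size` is hypothetical, the gene names are placeholders, and treating both bounds as inclusive is an assumption.

```python
PPI_MIN_SIZE = 20   # values copied from the config excerpt above
PPI_MAX_SIZE = 800

def filter_by_size(gene_sets, min_size, max_size):
    """Keep gene sets whose number of unique genes lies in [min_size, max_size]."""
    return {
        name: genes
        for name, genes in gene_sets.items()
        if min_size <= len(set(genes)) <= max_size
    }

# Toy modules with placeholder gene names
modules = {
    "too_small": [f"G{i}" for i in range(5)],
    "in_range": [f"G{i}" for i in range(50)],
    "too_large": [f"G{i}" for i in range(1000)],
}
kept = filter_by_size(modules, PPI_MIN_SIZE, PPI_MAX_SIZE)
print(sorted(kept))
```

The same helper could apply the compound target count range by passing 60 and 300 as the bounds.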

Visualization API

Function	Chart Type	Size	Purpose
`plot_hallmark_bar()`	Horizontal bar	8xN in	Hallmark perturbation magnitude
`plot_drug_ranking()`	Horizontal bar	8xN in	Top-10 compound ranking
`plot_evidence_network()`	Bipartite network	10x7 in	Drug-Hallmark alignment
`plot_patient_card()`	Card layout	8x10 in	Single-page patient summary
`plot_evidence_chain()`	3-panel	7.2x9.0 in	Scientist view (Paper Fig 6/8)
`plot_patient_view()`	3-panel card	7.5x10.5 in	Patient view (Paper Fig 4/7)

All viz functions use the non-interactive Agg backend via matplotlib.use('Agg'), so they work on headless servers and in CI.

Honest Limitations

Retrospective validation only — positive control recovery, not prospective clinical trials
Bootstrap confidence varies — depression case is only 24.5% stable (EXPLORATORY)
Single omics — DNA methylation only; transcriptomics/proteomics coming in future versions
Simple matching — age ±5 years + sex, doesn't control for cell composition
Demo stage — 3 preset cases, custom data upload coming soon

References

López-Otín C, et al. The hallmarks of aging. Cell, 2013, 153(6): 1194-1217.
López-Otín C, et al. Hallmarks of aging: An expanding universe. Cell, 2023, 186(2): 243-278.
Horvath S. DNA methylation age of human tissues and cell types. Genome Biology, 2013, 14: R115.
Hannum G, et al. Genome-wide methylation profiles reveal quantitative views of human aging rates. Molecular Cell, 2013, 49(2): 359-367.
Ha D, Schmidhuber J. World models. arXiv:1803.10122, 2018.
Liberzon A, et al. The Molecular Signatures Database Hallmark Gene Set Collection. Cell Systems, 2015, 1: 417-425.
Xiong J. World Models for Biomedicine: A Steerability Framework. Preprints.org, 2026. DOI: 10.20944/preprints202605.0366.v1
Xiong J. SteeraMed: A Biomedical World Model for N-of-1 Intervention Reasoning. Preprints.org, 2026. DOI: 10.20944/preprints202605.1578.v1

DEV Community