cosmosoneness

Posted on Jun 7 • Originally published at Medium

Building a Brain in Pure Python

#python #neuroscience #opensource #learning

This post originally appeared on Medium. Cross-posting it here for the dev.to community.

Six months ago I gave myself a challenge: build the entire pipeline for whole-brain simulation, in pure Python, on a single laptop. From raw imaging to consciousness metrics. End-to-end. No black boxes.

Today I'm opening up the first preview.

Whole Brain Emulation v0.1: a connectome-level neural simulation framework with Hodgkin–Huxley channels, multi-compartment cable equations, STDP, glia, neuromodulation, neurovascular coupling, consciousness metrics, and biological-benchmark validation. 137 tests passing.

This is the second project in my Cosmos Research Institute — a personal portfolio of "computational emulation" systems. The first one was a multi-scale human-body simulator (cardio, respiratory, nervous, endocrine, immune — 255 tests). This one zooms into the brain.

Why bother?

There's a category of software I find irresistibly beautiful: systems that take a real-world process — a beating heart, an ecosystem, a star — and put it inside a Python for loop where you can poke it and watch what happens.

The brain is the most ambitious target in that category. Real whole-brain emulation is decades away. But the architecture — the data flow from imaging through simulation to validation — that we can build today, end-to-end, at small scale. And once that scaffolding exists, scaling up is mostly a matter of compute, not science.

So that's what v0.1 is. A working scaffold, every subsystem represented, ready to be scaled.

What's inside

The system is split into fourteen subpackages that compose into one pipeline:

raw imaging → segmentation → morphology → connectome →
neuron/synapse models → simulation → plasticity & neuromodulation →
consciousness metrics → validation against biological data

Concretely, that's:

wbe.scanner — load EM/MRI/NWB/SWC, segment, build a brain atlas, extract morphology trees
wbe.connectome — directed weighted multigraph of neurons + synapses, scipy-sparse adjacency, small-world / rich-club analysis
wbe.neuron_models — Hodgkin–Huxley, AdEx, Izhikevich, LIF, and full multi-compartment cable solvers (Hines tridiagonal matrices)
wbe.synapse_models — AMPA / NMDA (with Mg²⁺ block) / GABA-A / GABA-B receptors, Tsodyks–Markram short-term plasticity, gap junctions
wbe.simulation — the engine itself: event scheduler, monitors, checkpoints, integrators, distributed partitioning
wbe.plasticity — STDP, triplet-STDP, reward-modulated STDP with eligibility traces, synaptic scaling, structural plasticity
wbe.glial — astrocytes with calcium waves, microglia with state machines, oligodendrocyte myelination
wbe.neuromodulation — dopamine, serotonin, NE, ACh as 3-D reaction-diffusion concentration fields
wbe.vascular — blood-vessel graph, Balloon-Windkessel model for BOLD prediction, metabolism, ischaemia
wbe.consciousness — Integrated Information Φ, Perturbational Complexity Index, Global Workspace ignition detection
wbe.validation — F-I curves, spike waveforms, firing-rate distributions, oscillatory spectra — all measured against biological benchmarks
wbe.io — NWB, SONATA, GraphML, NIfTI, Zarr import/export
wbe.api — FastAPI REST + WebSocket server for external dashboards

Everything is held together by wbe.core — typed identifiers (NeuronID = "CTX-00142857"), Coordinate3D, biophysical constants, math utilities (Nernst, GHK, HH gating, integrators).
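
To give a flavor of what a math utility like the Nernst helper involves, here is a minimal sketch of a reversal-potential function. The function name, its signature, and the ion concentrations are assumptions chosen for illustration; they are not the actual wbe.core API.

```python
import math

R = 8.314462618   # gas constant, J/(mol*K)
F = 96485.33212   # Faraday constant, C/mol


def nernst_potential_mv(conc_out_mm, conc_in_mm, valence=1, temp_k=310.15):
    """Reversal potential in mV for one ion species (hypothetical helper)."""
    if conc_out_mm <= 0 or conc_in_mm <= 0:
        raise ValueError("concentrations must be positive")
    volts = (R * temp_k) / (valence * F) * math.log(conc_out_mm / conc_in_mm)
    return 1000.0 * volts


# Textbook-style mammalian concentrations in mM (assumed values).
e_k = nernst_potential_mv(5.0, 140.0)     # potassium: roughly -89 mV
e_na = nernst_potential_mv(145.0, 12.0)   # sodium: roughly +67 mV
print(f"E_K={e_k:.1f} mV  E_Na={e_na:.1f} mV")
```

Keeping the temperature and valence as explicit arguments is one way to avoid silently mixing room-temperature and body-temperature constants.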

The whole pipeline in one script

This is roughly what the end-to-end "hello world" looks like:

import math

from wbe.scanner import SyntheticDataGenerator, NeuronSegmenter, MorphologyExtractor
from wbe.connectome import ConnectomeBuilder, NeuronNode
from wbe.simulation import SimulationEngine, SimulationConfig
from wbe.plasticity import PairSTDP, SynapticScaling, PlasticityScheduler
from wbe.consciousness import MacroPhiEstimator, PerturbationalComplexityIndex
from wbe.validation import ValidationSuite, FICurveValidator

# 1. Synthetic imaging → morphologies
volume = SyntheticDataGenerator(seed=42).generate_volume((64, 64, 64), n_neurons=50)
labels = NeuronSegmenter().segment_neurons(volume, min_size=100)
morphos = MorphologyExtractor().extract_from_labels(labels, min_voxels=300)

# 2. Build a 50-neuron cortical connectome
builder = ConnectomeBuilder()
builder.add_neurons([NeuronNode(...) for m in morphos])
builder.apply_distance_dependent_rule(lambda d: 0.3 * math.exp(-d / 100))
builder.apply_dales_law()
connectome = builder.build()

# 3. Simulate for 1 second with STDP + homeostasis
config = SimulationConfig(dt=0.025, duration=1000.0, method="rk4", seed=42)
engine = SimulationEngine(connectome=connectome, config=config)
ps = PlasticityScheduler()
ps.add_rule(PairSTDP())
ps.add_rule(SynapticScaling(target_rate=5.0))
for _ in range(40_000):  # 40,000 steps × 0.025 ms = 1,000 ms, matching duration
    engine.step(dt=0.025)
    ps.update_all(connectome, dt=0.025)

# 4. Consciousness metrics
phi  = MacroPhiEstimator().estimate(engine.get_state().voltages)
pci  = PerturbationalComplexityIndex().compute(spike_train=..., perturbation_times=[500])

# 5. Validate against biology
report = ValidationSuite(validators=[FICurveValidator()]).run(...)
print(f"Φ={phi:.3f}  PCI={pci:.3f}  fidelity={report.summary_score:.0%}")

Every line above is implemented. Not every line is optimal — at 50 neurons it's a toy. But the data types, the contracts between modules, the validation hooks — those are real, and they're what determines whether the system can grow.
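
The distance-dependent rule in step 2 is worth unpacking. The sketch below shows one plausible reading of it: each ordered pair of neurons is connected with probability 0.3 · exp(-d / 100). The units (micrometres), the helper names, and the sampling loop are assumptions for illustration; how ConnectomeBuilder actually applies the rule is not shown in this post.

```python
import math
import random


def connection_probability(d_um):
    """Same formula as the lambda in step 2; d is assumed to be in micrometres."""
    return 0.3 * math.exp(-d_um / 100.0)


def sample_edges(positions, rng):
    """Draw directed edges (i, j) independently for every ordered pair."""
    edges = []
    for i, p in enumerate(positions):
        for j, q in enumerate(positions):
            if i == j:
                continue  # no self-connections in this sketch
            if rng.random() < connection_probability(math.dist(p, q)):
                edges.append((i, j))
    return edges


rng = random.Random(42)
# 50 hypothetical soma positions scattered in a 300 µm cube.
positions = [tuple(rng.uniform(0.0, 300.0) for _ in range(3)) for _ in range(50)]
edges = sample_edges(positions, rng)
print(f"{len(edges)} directed edges among {len(positions)} neurons")
```

The probability is 0.3 for touching neurons and drops by a factor of e for every additional 100 units of distance, so most sampled connections are local.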

What I learned along the way

A few things I didn't expect when I started:

The hardest part isn't the math. Hodgkin–Huxley fits in 30 lines. Cable equations are tridiagonal solves. STDP is two exponential decays. The hard part is the bookkeeping: neuron IDs, coordinate frames, units, validation, reproducibility, and the fact that a Connectome needs to expose itself simultaneously as a list of nodes, a sparse matrix, a NetworkX graph, and a GraphML export. Architecture eats biology.
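
The "30 lines" claim is easy to check. Below is a minimal single-compartment sketch using the classic squid-axon parameters and forward Euler. It is a standalone illustration, not the wbe.neuron_models implementation, and the function names are made up for this example.

```python
import math

# Classic squid-axon parameters: mV, ms, mS/cm^2, uF/cm^2, uA/cm^2.
C_M, G_NA, G_K, G_L = 1.0, 120.0, 36.0, 0.3
E_NA, E_K, E_L = 50.0, -77.0, -54.387


def _ratio(x, y):
    """x / (1 - exp(-x / y)), returning the limit y as x approaches 0."""
    return y if abs(x) < 1e-7 else x / (1.0 - math.exp(-x / y))


def rates(v):
    """Opening (alpha) and closing (beta) rates for the m, h, n gates."""
    a_m = 0.1 * _ratio(v + 40.0, 10.0)
    b_m = 4.0 * math.exp(-(v + 65.0) / 18.0)
    a_h = 0.07 * math.exp(-(v + 65.0) / 20.0)
    b_h = 1.0 / (1.0 + math.exp(-(v + 35.0) / 10.0))
    a_n = 0.01 * _ratio(v + 55.0, 10.0)
    b_n = 0.125 * math.exp(-(v + 65.0) / 80.0)
    return a_m, b_m, a_h, b_h, a_n, b_n


def simulate_hh(i_ext, t_stop=50.0, dt=0.01):
    """Return the voltage trace (mV) for a constant injected current."""
    v = -65.0
    a_m, b_m, a_h, b_h, a_n, b_n = rates(v)
    m, h, n = a_m / (a_m + b_m), a_h / (a_h + b_h), a_n / (a_n + b_n)
    trace = []
    for _ in range(round(t_stop / dt)):
        a_m, b_m, a_h, b_h, a_n, b_n = rates(v)
        i_ion = (G_NA * m**3 * h * (v - E_NA)
                 + G_K * n**4 * (v - E_K)
                 + G_L * (v - E_L))
        dv = (i_ext - i_ion) / C_M
        m += dt * (a_m * (1.0 - m) - b_m * m)
        h += dt * (a_h * (1.0 - h) - b_h * h)
        n += dt * (a_n * (1.0 - n) - b_n * n)
        v += dt * dv
        trace.append(v)
    return trace


rest = simulate_hh(0.0)      # no input: stays near the resting potential
driven = simulate_hh(10.0)   # 10 uA/cm^2: fires action potentials
print(f"peak at rest: {max(rest):.1f} mV, peak when driven: {max(driven):.1f} mV")
```

Forward Euler with a 0.01 ms step is the simplest choice that stays stable here; a production solver would more likely use an exponential or implicit scheme.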

Pydantic v2 with frozen=True is a quiet superpower. Coordinate3D, SpikeEvent, SynapseID are all frozen — immutable, hashable, usable as dict keys, validated at construction. The whole system gets safer for free.
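
As a rough illustration of what frozen models buy you, here is a minimal Pydantic v2 sketch. The field names on this Coordinate3D are assumptions for the example; the real wbe.core class may look different.

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class Coordinate3D(BaseModel):
    """Hypothetical stand-in for the frozen coordinate type."""
    model_config = ConfigDict(frozen=True)

    x: float
    y: float
    z: float


soma = Coordinate3D(x=1.0, y=2.0, z=3.0)

# Frozen models are hashable, so equal coordinates hit the same dict entry.
labels = {soma: "soma"}
same_place = labels[Coordinate3D(x=1.0, y=2.0, z=3.0)]

# Mutation is rejected after construction.
try:
    soma.x = 5.0
    mutation_blocked = False
except ValidationError:
    mutation_blocked = True

# Bad input is rejected at construction.
try:
    Coordinate3D(x="not a number", y=0.0, z=0.0)
    bad_input_blocked = False
except ValidationError:
    bad_input_blocked = True

print(same_place, mutation_blocked, bad_input_blocked)
```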

Struct-of-arrays beats array-of-structs at scale. A Synapse object costs maybe 200 bytes of Python overhead. A million synapses is 200 MB before you've stored any data. Four parallel np.ndarray columns (pre_id, post_id, weight, delay) at 4 bytes per entry cost 4 columns × 4 bytes × 10⁶ synapses = 16 MB. The connectome stores data this way.
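
The arithmetic can be checked directly. The sketch below builds four parallel columns with 4-byte dtypes and sums their sizes; the dict layout and the helper name are assumptions for illustration, not the connectome's actual storage class.

```python
import numpy as np


def make_synapse_table(n_synapses, n_neurons, seed=0):
    """Build four parallel columns with 4-byte dtypes (hypothetical layout)."""
    rng = np.random.default_rng(seed)
    return {
        "pre_id": rng.integers(0, n_neurons, size=n_synapses, dtype=np.int32),
        "post_id": rng.integers(0, n_neurons, size=n_synapses, dtype=np.int32),
        "weight": rng.random(n_synapses, dtype=np.float32),
        "delay": np.full(n_synapses, 1.0, dtype=np.float32),
    }


n_neurons = 1_000
table = make_synapse_table(1_000_000, n_neurons)

# 4 columns x 4 bytes x 1e6 rows = 16,000,000 bytes of raw data.
total_bytes = sum(col.nbytes for col in table.values())

# Columnar layout also makes whole-network queries a single vectorized call,
# e.g. the summed incoming weight for every postsynaptic neuron.
incoming = np.bincount(table["post_id"], weights=table["weight"],
                       minlength=n_neurons)
print(total_bytes, incoming.shape)
```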

The consciousness module is a Rorschach test. Computing exact Φ as Integrated Information Theory defines it is intractable at scale, because the number of partitions to evaluate explodes with system size, so what I have are estimators: MacroPhiEstimator does atomic-bipartition sampling, and PerturbationalComplexityIndex uses virtual TMS and Lempel–Ziv compression. Whether they "measure consciousness" depends on which theorist you ask. They are interesting numbers to compute regardless.
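
To make the Lempel–Ziv ingredient less abstract: PCI-style measures score how compressible a binarized response is. The sketch below counts phrases in an LZ78-style parse of a bit string. This is a simpler relative of the Lempel–Ziv complexity usually used for PCI, it is not taken from PerturbationalComplexityIndex, and the function name is hypothetical.

```python
import random


def lz78_phrase_count(bits):
    """Count phrases in an LZ78-style parse: each phrase is the shortest
    prefix of the remaining input that has not been seen as a phrase before."""
    seen = set()
    phrase = ""
    count = 0
    for ch in bits:
        phrase += ch
        if phrase not in seen:
            seen.add(phrase)
            count += 1
            phrase = ""
    if phrase:
        count += 1  # trailing partial phrase still counts once
    return count


regular = "0" * 1024                      # perfectly predictable response
rng = random.Random(0)
irregular = "".join(rng.choice("01") for _ in range(1024))

c_regular = lz78_phrase_count(regular)
c_irregular = lz78_phrase_count(irregular)
print(c_regular, c_irregular)
```

A predictable string parses into few, long phrases, while an irregular one needs many short phrases. Real PCI pipelines add binarization and normalization steps on top of a count like this.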

What's next

v0.1 is a scaffold. v0.2 is about replacing every synthetic component with a real one:

Load actual mouse cortex data from the Allen Cell Types Database
Hook in the BICCN connectome instead of distance-dependent synthetic rules
Validate against published F-I curves from real patch-clamp recordings
Add CUDA kernels for the inner integration loop via CuPy

The repo is the place to follow along: Cosmos / WholeBrainEmulation.

Full tutorial: TUTORIAL.md — twelve sections, end-to-end worked example, twelve-week curriculum.

If any of this is useful to you, or if you're building something adjacent and want to compare notes, find me on GitHub: @Oscar-Wu-Po-Wei.

More from the Cosmos series soon — stellar dynamics, ecology, economic agents. The goal is to keep building emulators of "the world we live in" until the institute earns its name.
