Kwansub Yun

Posted on Apr 27 • Originally published at flamehaven.space

When an AI Pipeline Passes — But One Path Still Must Be Held: EXP-034

#bioinformatics #reproducibility #governance #ai

No efficacy, causal, or clinical claims are made in this report.
RExSyn is an experimental Bio-AI governance pipeline.

You do not need to know the earlier experiments to read this report.

Most AI pipeline reports ask one question:

Did the system pass?

EXP-034 asked a stricter one:

Which path was allowed to count?

That distinction matters.

In a multi-stage AI pipeline, a final PASS can hide a lot of unresolved risk. A branch may be unstable. A regeneration path may drift. A new external API may enter the chain without being governed. A new modality may appear to improve the system while quietly changing the basis of judgment.

So EXP-034 was not designed to produce a clean success story.

It was designed to separate three things:

| Path | Status | Meaning |
| --- | --- | --- |
| Anchored expansion path | `GO` | Accepted path for EXP-034 reporting |
| Current regeneration path | `HOLD` | Diagnostic evidence, not acceptance baseline |
| Next remediation cycle | `EXP-035` | Root-cause analysis (RCA) and repair target |

That is the real result.

EXP-034 passed, but not because every path passed.

It passed because the accepted anchor remained stable, the expansion tracks did not break the judgment system, and the unresolved regeneration path was explicitly held instead of being silently mixed into acceptance.

What EXP-034 tested

EXP-033 had already established a parity baseline.

EXP-034 asked whether that baseline could survive controlled expansion while adding:

a modal update track,
a live AlphaFold EBI observer endpoint,
and AlphaGenome (AG) measurement.

The operating rule was simple:

Reproduce the parity baseline first.
Only then allow expansion.
Only then compare governance behavior across experiment cycles.

If the parity anchor breaks, the rest is not expansion.

It is regression.
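The operating rule is an ordering constraint, so it can be written as a small gate. The sketch below is a minimal illustration, assuming parity means every baseline sample reproduces its locked PASS/BLOCK decision. The function name, the return labels, and the data layout are hypothetical and are not taken from RExSyn; only the control IDs come from this report.

```python
from typing import Dict


def classify_cycle(baseline: Dict[str, str], replay: Dict[str, str],
                   expansion_ran: bool) -> str:
    """Hypothetical sketch of the parity-first operating rule.

    `baseline` and `replay` map sample IDs to "PASS" or "BLOCK".
    """
    # Parity: every baseline sample must reproduce its locked decision.
    parity_holds = all(replay.get(sample) == decision
                       for sample, decision in baseline.items())
    if not parity_holds:
        # Results built on a broken anchor are regression, not expansion.
        return "REGRESSION"
    if not expansion_ran:
        return "PARITY_ONLY"
    # Only after parity holds may expansion results be reported as expansion.
    return "EXPANSION"


baseline = {"EXP032-BLOCK-001": "BLOCK", "EXP032-PASS-001": "PASS"}
drifted = {"EXP032-BLOCK-001": "BLOCK", "EXP032-PASS-001": "BLOCK"}
print(classify_cycle(baseline, baseline, expansion_ran=True))  # EXPANSION
print(classify_cycle(baseline, drifted, expansion_ran=True))   # REGRESSION
```

The check runs before any expansion result is looked at, which is the whole point: a drifted anchor relabels everything downstream as regression.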

The scope was also locked: methodology, governance, and reproducibility only. The experiment made no biological efficacy claims, no causal inferences, and no clinical recommendations.

That boundary is important because this kind of system can easily sound more powerful than what was actually measured. EXP-034 was not asking whether the pipeline discovered a better biological answer.

It was asking whether the judgment system stayed governable after new signals entered the chain.

The key split: PASS did not mean everything passed

Track-A produced the defining decision of the experiment.

The accepted legacy replay anchor preserved the required PASS/BLOCK separation:

| Metric | Legacy replay anchor |
| --- | --- |
| sample accuracy | `1.0` |
| sample balanced accuracy | `1.0` |
| arm accuracy | `1.0` |
| arm balanced accuracy | `1.0` |
| dangerous false-pass rate | `0.0` |
| false reject rate | `0.0` |

That was the path allowed to anchor EXP-034.

But the current regeneration path did not recover:

| Metric | Current regeneration |
| --- | --- |
| sample accuracy | `0.5` |
| sample balanced accuracy | `0.5` |
| status | `HOLD` |
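The report does not define these metrics, so the sketch below assumes common definitions: a dangerous false-pass is a BLOCK-expected case that was passed, and a false reject is a PASS-expected case that was blocked. The function name and inputs are hypothetical. The over-blocked input happens to give the same 0.5 / 0.5 pattern as the regeneration table, but it illustrates how the two failure directions stay separate; it is not a reconstruction of how the reported numbers were computed.

```python
from typing import Dict, List, Tuple


def decision_metrics(pairs: List[Tuple[str, str]]) -> Dict[str, float]:
    """Score (expected, observed) decision pairs, each "PASS" or "BLOCK".

    Assumed definitions, not taken from RExSyn:
      dangerous false-pass: expected BLOCK, observed PASS
      false reject:         expected PASS,  observed BLOCK
    """
    if not pairs:
        raise ValueError("no decisions to score")
    block_obs = [o for e, o in pairs if e == "BLOCK"]  # observed, BLOCK expected
    pass_obs = [o for e, o in pairs if e == "PASS"]    # observed, PASS expected

    def rate(observed: List[str], label: str) -> float:
        # Share of `observed` equal to `label`; 0.0 when the class is empty.
        return sum(o == label for o in observed) / len(observed) if observed else 0.0

    # Balanced accuracy: mean per-class recall over the classes that are present.
    recalls = [rate(obs, lab)
               for obs, lab in ((block_obs, "BLOCK"), (pass_obs, "PASS")) if obs]
    return {
        "accuracy": sum(e == o for e, o in pairs) / len(pairs),
        "balanced_accuracy": sum(recalls) / len(recalls),
        "dangerous_false_pass_rate": rate(block_obs, "PASS"),
        "false_reject_rate": rate(pass_obs, "BLOCK"),
    }


# Hypothetical two-control inputs, not the report's raw data.
separated = decision_metrics([("BLOCK", "BLOCK"), ("PASS", "PASS")])
over_blocked = decision_metrics([("BLOCK", "BLOCK"), ("PASS", "BLOCK")])
print(separated)     # accuracy 1.0, balanced 1.0, both error rates 0.0
print(over_blocked)  # accuracy 0.5, balanced 0.5, false-pass 0.0, false reject 1.0
```

Keeping the dangerous false-pass rate and the false reject rate as separate fields matters here: an accuracy of 0.5 alone cannot tell a fail-closed over-blocker from a system that lets a negative control through.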

This is the most important part of the experiment:

EXP-034 did not pretend the regeneration path passed.

It kept that result inside the experiment as diagnostic evidence, but did not allow it to redefine the accepted baseline.

That separation is not a minor operational detail. It is the governance result.

A weak pipeline would have blended the two paths and still reported a final success. EXP-034 did the opposite. It allowed the stable anchor to proceed and held the unstable path for RCA.

That is how a stage-gated system avoids changing its own question after seeing the result.
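The separation can be sketched as a report builder where each path carries a role, acceptance is computed from anchor paths only, and failed diagnostic paths surface as explicit holds. A minimal sketch, assuming a two-role model; the class, the function, the role strings, and the `current_regeneration` name are hypothetical, while `legacy_replay` is the anchor mode named in this report.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PathResult:
    name: str
    role: str      # "anchor" may count toward acceptance; "diagnostic" may not
    passed: bool


def build_report(paths: List[PathResult]) -> Dict[str, object]:
    anchors = [p for p in paths if p.role == "anchor"]
    # Acceptance is computed from anchor paths only; with no anchor, nothing counts.
    accepted = bool(anchors) and all(p.passed for p in anchors)
    # Failed diagnostic paths are kept as explicit holds, never folded into acceptance.
    holds = [p.name for p in paths if p.role == "diagnostic" and not p.passed]
    return {
        "status": "PASS" if accepted else "FAIL",
        "accepted_from": [p.name for p in anchors] if accepted else [],
        "holds": holds,
    }


result = build_report([
    PathResult("legacy_replay", "anchor", passed=True),
    PathResult("current_regeneration", "diagnostic", passed=False),
])
print(result)
# {'status': 'PASS', 'accepted_from': ['legacy_replay'], 'holds': ['current_regeneration']}
```

The role is fixed before results are seen, so a diagnostic path cannot be promoted to an anchor after the fact.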

Why path splitting matters

The concrete governance problem is this:

A pipeline can pass for the wrong reason.

valid_report = stable_anchor × traceable_extension × contained_instability

If the anchor is not stable, the report cannot be trusted.

If the extension is not traceable, the new signal becomes an ungoverned side channel.

If instability is not contained, a diagnostic failure can quietly contaminate acceptance.
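The `×` in the formula reads naturally as multiplying 0/1 indicators, which is the same as a logical AND: one missing condition invalidates the report however strong the other two are. A minimal sketch under that reading; the function name is hypothetical, and the condition names are taken from the formula.

```python
from typing import Dict, List, Tuple


def check_report(conditions: Dict[str, bool]) -> Tuple[bool, List[str]]:
    """Evaluate valid_report as a product of 0/1 factors."""
    product = 1
    for holds in conditions.values():
        # Any single 0 factor zeroes the whole product.
        product *= int(holds)
    failed = [name for name, holds in conditions.items() if not holds]
    return product == 1, failed


print(check_report({
    "stable_anchor": True,
    "traceable_extension": True,
    "contained_instability": False,
}))  # (False, ['contained_instability'])
```

Returning the failed factor names, not just a boolean, lets a report state which condition broke validity.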

A single final PASS is not enough when several branches contribute to a verdict. You need to know which branch produced the accepted decision, which branch failed, which branch was only diagnostic, and which branch is allowed to affect future work.

EXP-034 passed because all three conditions were enforced:

the legacy replay anchor held,
the new observer and AG paths were measured under governance,
and the regeneration HOLD remained outside acceptance.

That is the difference between a pipeline that merely outputs a verdict and a pipeline that controls which verdicts are allowed to count.

Adding AlphaFold EBI as an observer, not a predictor

Relative to EXP-033, EXP-034 added a live AlphaFold Protein Structure Database / EBI observer line.

This was not promoted into a primary predictor.

It was wired as an observer/reference oracle and traced into governance as `ebi_g2`.

The result:

| Check | Result |
| --- | --- |
| AlphaFold EBI direct endpoint for `P23219` | `GO` |
| Stage 7 observer tests | `2 passed` |
| `ebi_g2` governance traceability | `PASS` |
| `BLOCKED_IDP` mapping path | validated in test |

The point is not simply that an external endpoint responded.

The point is that the external signal entered the system through a governed path. It was not allowed to float beside the pipeline as informal context.

EXP-034 tested whether the new observer could be admitted without becoming an ungoverned side channel.
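One way to picture "entering through a governed path" is an admission step that refuses any external signal lacking a governance trace key and tags admitted signals as observers. A minimal sketch, assuming this shape; `ebi_g2` and the accession `P23219` come from the report, while the class, method names, payload fields, and source labels are hypothetical. The actual AlphaFold EBI request is deliberately left out.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class GovernanceTrace:
    entries: List[Dict[str, Any]] = field(default_factory=list)

    def admit_observer(self, trace_key: Optional[str], source: str,
                       payload: Dict[str, Any]) -> bool:
        # An external signal without a governance key is refused, so it cannot
        # float beside the pipeline as informal context.
        if not trace_key:
            return False
        # Admitted signals are tagged as observers: recorded, not predictive.
        self.entries.append({"trace_key": trace_key, "source": source,
                             "role": "observer", "payload": payload})
        return True

    def is_traceable(self, trace_key: str) -> bool:
        return any(e["trace_key"] == trace_key for e in self.entries)


trace = GovernanceTrace()
print(trace.admit_observer("ebi_g2", "alphafold_ebi", {"accession": "P23219"}))  # True
print(trace.admit_observer(None, "ad_hoc_lookup", {}))                            # False
print(trace.is_traceable("ebi_g2"))                                               # True
```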

AG-live: non-degradation, not repair

Track-C tested a simple question:

If AG-live enters the pipeline, does it change the final decision?

The answer was no.

AG-live did enter the pipeline.

The AlphaGenome field was present with:

| AG field | Value |
| --- | --- |
| source | `alphagenome_api_live` |
| pathogenicity_score | `0.5` |
| confidence | `0.7143` |
| clinical_significance | `uncertain` |

These are sanitized branch artifact values, not implementation code or full raw artifacts.

AG-live did not change classification.

Both controls remained governed by the same conservative decision boundary:

| Path | Expected | Observed | Interpretation |
| --- | --- | --- | --- |
| `EXP032-BLOCK-001` negative control | `BLOCK_EXPECTED` | `BLOCK / ESCALATE` | fail-closed behavior preserved |
| `EXP032-PASS-001` pass-eligible control | `PASS_ELIGIBLE` | `BLOCK / ESCALATE` | conservative over-blocking persisted |

That is the key nuance.

AG-live did not create a dangerous false-pass. The negative control stayed blocked.

But AG-live also did not repair the current regeneration hold. The pass-eligible control still failed to recover and remained blocked under `R2_component_floor`.

The governance surface moved slightly, but the verdict did not:

| Metric | Earlier AG branch | AG-live branch |
| --- | --- | --- |
| `p_e2e` | `0.0912` | `0.0947` |
| clinical status | `BLOCK` | `BLOCK` |
| rule | `R2_component_floor` | `R2_component_floor` |

So the correct conclusion is not:

AG improved the pipeline.

The correct conclusion is:

AG-live changed the measurement surface slightly, but did not change the decision boundary.

That is exactly what non-degradation means here.

It preserved fail-closed behavior on the negative control while leaving the pass-eligible control over-blocked.

This is why Track-C can only be called non-degradation, not repair.
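Track-C's three-way distinction (degradation, non-degradation, repair) can be made explicit by comparing each control's verdict before and after AG-live against its expected outcome. A minimal sketch, assuming `BLOCK / ESCALATE` is folded into `BLOCK` and that "repair" means a previously wrong control becomes correct; the function and labels are hypothetical. The control IDs and verdicts in the usage follow this section: both controls blocked before and after AG-live.

```python
from typing import Dict


def classify_track_c(expected: Dict[str, str], before: Dict[str, str],
                     after: Dict[str, str]) -> str:
    """Each dict maps control IDs to "PASS" or "BLOCK" (ESCALATE folded into BLOCK)."""
    controls = expected.keys()
    # A BLOCK-expected control that now passes is a dangerous false-pass.
    false_pass = any(expected[c] == "BLOCK" and after[c] == "PASS" for c in controls)
    # A previously correct control that became wrong also counts as degradation.
    broke = any(before[c] == expected[c] and after[c] != expected[c] for c in controls)
    if false_pass or broke:
        return "DEGRADATION"
    # Repair requires a previously wrong control to become correct.
    if any(before[c] != expected[c] and after[c] == expected[c] for c in controls):
        return "REPAIR"
    return "NON_DEGRADATION"


expected = {"EXP032-BLOCK-001": "BLOCK", "EXP032-PASS-001": "PASS"}
before = {"EXP032-BLOCK-001": "BLOCK", "EXP032-PASS-001": "BLOCK"}
after = dict(before)  # AG-live moved p_e2e slightly but left both verdicts unchanged
print(classify_track_c(expected, before, after))  # NON_DEGRADATION
```

Because the function only reads verdicts, a small shift in `p_e2e` that does not cross the decision boundary cannot be reported as repair.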

Contract passed, but governance still blocked

One of the most useful details in EXP-034 is that the contract layer and governance layer did not collapse into one verdict.

The contract inspection reported:

| Field | Value |
| --- | --- |
| pipeline contract score | `0.9077` |
| weakest connection | `C2` |
| dangerous pass risk | `0.0` |
| gate recommendation | `PASS` |
| overall OK | `true` |

But the clinical governance layer still blocked the case.

That is not a contradiction.

It means the pipeline connection was valid enough to inspect, but the decision was not safe enough to accept.

This distinction matters.

A weaker system might treat a passing contract as permission to pass the whole output. EXP-034 did not do that. It allowed the contract layer to say:

The pipeline is connected.

while the governance layer could still say:

The claim should not pass.

That separation is exactly what a governance layer is supposed to preserve.
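The two layers can be modeled as independent verdicts: the contract result decides whether a decision is even inspectable, but it never substitutes for the governance decision. A minimal sketch; the class, the field names, and the `INVALID_PIPELINE` label are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LayerVerdicts:
    contract_ok: bool       # contract layer: "the pipeline is connected"
    governance_pass: bool   # governance layer: "the claim may pass"


def final_decision(v: LayerVerdicts) -> str:
    if not v.contract_ok:
        # A broken connection is rejected before any claim is judged.
        return "INVALID_PIPELINE"
    # A valid contract is necessary but not sufficient; governance has the last word.
    return "PASS" if v.governance_pass else "BLOCK"


# Mirrors EXP-034: contract overall OK, clinical governance BLOCK.
print(final_decision(LayerVerdicts(contract_ok=True, governance_pass=False)))  # BLOCK
```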

Cross-cycle comparison: EXP-032 → EXP-033 → EXP-034

Track-D compared the accepted anchor path across cycles.

You do not need the earlier experiments as background. They matter here for one reason only:

EXP-034 was not allowed to invent a new success criterion.

EXP-032 and EXP-033 provided the previous PASS/BLOCK baseline. EXP-034 tested whether that baseline survived expansion.

The classification baseline stayed fixed:

| Compare | Accuracy / balanced accuracy |
| --- | --- |
| EXP-032 → EXP-034 | `1.0 / 1.0` |
| EXP-033 → EXP-034 | `1.0 / 1.0` |

At the same time, governance signals moved:

| Governance signal | Delta |
| --- | --- |
| `ccge_p_e2e_mean` | `+0.04447488775996111` |
| `nnsl_sr9_tech_mean` | `+0.04692394788063081` |
| `nnsl_di2_tech_mean` | `-0.03667940951579321` |

The interpretation is narrow:

The judgment baseline stayed fixed while the governance surface became more measurable.

That is what EXP-034 was allowed to claim.

It did not prove biological efficacy.

It did not prove that every branch of the system was now stable.

It proved that controlled expansion could happen without breaking the accepted PASS/BLOCK baseline.
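The cross-cycle check pairs two quantities: whether PASS/BLOCK decisions agree across cycles, and how far each governance signal's mean moved. A minimal sketch, assuming agreement is plain accuracy against the earlier cycle's decisions (balanced accuracy is left out) and that deltas are current-cycle mean minus previous-cycle mean. The function names, sample IDs, and values are hypothetical, and the sketch does not reproduce the deltas reported above.

```python
from statistics import mean
from typing import Dict, List


def decision_agreement(prev: Dict[str, str], curr: Dict[str, str]) -> float:
    """Share of shared sample IDs whose PASS/BLOCK decision did not change."""
    shared = set(prev) & set(curr)
    if not shared:
        raise ValueError("no shared samples to compare")
    return sum(prev[s] == curr[s] for s in shared) / len(shared)


def signal_deltas(prev: Dict[str, List[float]],
                  curr: Dict[str, List[float]]) -> Dict[str, float]:
    """Current-cycle mean minus previous-cycle mean, per governance signal."""
    return {name: mean(curr[name]) - mean(prev[name])
            for name in prev if name in curr}


# Hypothetical inputs: decisions unchanged while a signal's mean shifts.
prev_dec = {"s1": "BLOCK", "s2": "PASS"}
curr_dec = {"s1": "BLOCK", "s2": "PASS"}
print(decision_agreement(prev_dec, curr_dec))  # 1.0
print(signal_deltas({"ccge_p_e2e": [0.25, 0.75]},
                    {"ccge_p_e2e": [0.5, 1.0]}))  # {'ccge_p_e2e': 0.25}
```

Keeping the two outputs separate is what allows the narrow claim: the decision agreement stays at 1.0 even while the signal means move.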

Stage-gate result

EXP-034 ended with all five stage gates passing:

| Gate | Status |
| --- | --- |
| G1 parity | `PASS` |
| G2 reproducibility | `PASS` |
| G3 cross-experiment compare | `PASS` |
| G4 governance traceability | `PASS` |
| G5 extension safety | `PASS` |

Final state:

| Field | Value |
| --- | --- |
| overall status | `PASS` |
| anchor mode | `legacy_replay` |
| first failed gate | `null` |
| diagnostic hold | `Track-A current regeneration` |

This is the important nuance:

The experiment passed with a retained diagnostic hold.

That is not a contradiction. It is the point of the control system.

The accepted anchor path was allowed to proceed. The current regeneration path was not. The remediation target was moved to EXP-035.

That separation is the actual proof EXP-034 provides: not that every branch became stable, but that instability was not allowed to contaminate acceptance.
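The final-state record can be pictured as an ordered gate runner that reports the first failing gate, with diagnostic holds carried next to the verdict rather than inside it. A minimal sketch, assuming evaluation stops at the first failure (suggested by the `first failed gate` field, but not confirmed by the report). The function, the callable-per-gate layout, and the always-true checks in the usage are hypothetical placeholders; the gate names and the hold label come from the tables above.

```python
from typing import Callable, Dict, List, Optional, Tuple

Gate = Tuple[str, Callable[[], bool]]


def run_gates(gates: List[Gate], diagnostic_holds: List[str]) -> Dict[str, object]:
    first_failed: Optional[str] = None
    for name, check in gates:
        # Gates are evaluated in order; the first failure stops evaluation.
        if not check():
            first_failed = name
            break
    return {
        "overall_status": "PASS" if first_failed is None else "FAIL",
        "first_failed_gate": first_failed,
        # Holds are reported next to the verdict but do not feed into it.
        "diagnostic_hold": list(diagnostic_holds),
    }


gates = [
    ("G1 parity", lambda: True),
    ("G2 reproducibility", lambda: True),
    ("G3 cross-experiment compare", lambda: True),
    ("G4 governance traceability", lambda: True),
    ("G5 extension safety", lambda: True),
]
print(run_gates(gates, ["Track-A current regeneration"]))
```

A retained hold therefore shows up in the record without being able to flip `overall_status`.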

What EXP-034 actually showed

EXP-034 did not show that the entire pipeline is now stable.

It showed something narrower and more useful:

A method-locked Bio-AI governance pipeline can admit modal expansion, AlphaFold EBI observer wiring, and AG-live measurement without losing its accepted PASS/BLOCK baseline — while keeping the unstable regeneration path out of acceptance.

Track-C sharpened that conclusion.

AG-live entered.

Metrics moved slightly.

The verdict did not change.

Dangerous false-pass did not appear.

Conservative over-blocking remained.

That is not a clean success story.

It is a governed result.

Closing

Stage-gated experimentation is not just about getting a result.

It is about deciding whether the result should be allowed to exist.

In EXP-034, the answer was:

GO   for the anchored expansion path
HOLD for current regeneration
NEXT for EXP-035 remediation

That may sound less dramatic than a clean success story.

But in governance work, that is exactly the point.

A mature AI pipeline is not the one that claims everything passed.

It is the one that can say:

This path passed.

This path did not.

And we did not mix them.

DEV Community