MxBv

Posted on Apr 30 • Originally published at petronus.eu

NC2.5 ↔ HORIZON: On the Structural Reducibility of Long-Horizon Agent Failures to a Single Architectural Deficit

#nc25 #horizon #empiricalprobe #admissibility

Maksim Barziankou (MxBv)
PETRONUS™ | research@petronus.eu
DOI: 10.17605/OSF.IO/BJ79D
Axiomatic Core (NC2.5 v2.1): DOI 10.17605/OSF.IO/NHTC5

NC2.5 Empirical Probes — Part I (v1.0). April 2026 · Poznań.

Philosophical Frame

The author builds a fundamental architectural theory of the structural limits of long-horizon adaptation. It is positioned in the same line as classical results on architectural impossibility (CAP, FLP, RINA), but is formulated along an axis that those results do not cover: bounded viability budget and structural identity preservation. Navigational Cybernetics 2.5 (NC2.5) is not a control framework, not a learning framework, and not a reinforcement-theoretic formulation. It is a structural theory of what it means for a bounded adaptive system to remain itself over a sufficiently long time. Its central formal objects are: the monotone irreversible burden Φ; the Lyapunov-type viability budget τ = C − Φ; the non-causal admissibility predicate, which gates realization without entering optimization; the non-potential, divergence-free component called spin, which is structurally necessary for non-stagnant identity on bounded orbits; and the non-reconstructibility bounds (NR-ε, NR-LR), which formally prevent the admissibility layer from leaking into any causal or distributional channel of the agent.

The purpose of the present work is not to argue for NC2.5. It is to examine a recently published empirical diagnostic, the HORIZON benchmark (Wang, Bai, Sun, Wang, Zhang, Hu, Schroder, Mutlu, Song, Nowak, 2026; arXiv:2604.11978), and to ask whether the failures it documents across four cognitive domains can be parsimoniously reduced, under the NC2.5 formalism, to a single primary architectural deficit, with the seven observed failure categories arising as symptomatic projections of that deficit through specific architectural layers.

This is not a claim that HORIZON validates NC2.5. It is a claim that HORIZON provides the first independent empirical surface on which NC2.5 can be falsified or supported by direct test. The reduction proposed here is therefore an architectural hypothesis with attached operational tests, not a verification result.

The technical primitives invoked throughout this work (the admissibility predicate, the viability budget τ, the structural burden Φ, the spin component, and the non-reconstructibility bounds NR-ε and NR-LR) are formally defined in the NC2.5 axiomatic core (DOI 10.17605/OSF.IO/NHTC5). This paper uses them as primitives and does not re-derive them. The reader who finds the language opaque is invited to consult the core; the terminology is not private to this paper but belongs to a corpus already on independent record.

For convenience within this paper, the working senses of these primitives are:

Φ (structural burden) — accumulated irreversible structural load.
τ (viability budget) — remaining viability scalar, τ = C − Φ.
admissibility — pre-optimization structural permission predicate, applied to candidate realizations before optimization selects among them.
spin — directional asymmetry signal carried over the contracting admissibility geometry.
NR-ε, NR-LR — non-reconstructibility bounds preventing recovery of admissibility / navigation from downstream behavioral signals.
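
The working senses of Φ and τ can be made concrete with a small bookkeeping sketch. It is an illustration only and not part of the NC2.5 core: the class name `BurdenLedger`, the capacity value, and the per-step costs are hypothetical. The only properties taken from the text are that Φ is monotone and irreversible and that τ = C − Φ.

```python
from dataclasses import dataclass


@dataclass
class BurdenLedger:
    """Toy accounting of structural burden (hypothetical illustration)."""
    capacity: float       # C, assumed fixed for the sketch
    burden: float = 0.0   # Phi, accumulated irreversible load

    def charge(self, cost: float) -> None:
        # Phi is monotone: a negative cost would be a refund, so reject it.
        if cost < 0:
            raise ValueError("burden is irreversible; cost must be >= 0")
        self.burden += cost

    @property
    def tau(self) -> float:
        # Viability budget tau = C - Phi.
        return self.capacity - self.burden


ledger = BurdenLedger(capacity=10.0)
for step_cost in [1.0, 2.5, 0.5]:   # invented per-step costs
    ledger.charge(step_cost)
print(ledger.burden, ledger.tau)
```

The ledger exposes no way to lower `burden`, which is the one structural property the sketch is meant to show.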

This is the first probe in a series that brings NC2.5 into contact with independent empirical work. Each probe carries the architectural primitives of NC2.5 into a separate empirical corpus and extracts what they pick up there in the form of testable predictions.

1. What HORIZON Measures

HORIZON proposes a diagnostic benchmark for long-horizon agent evaluation. Unlike leaderboard benchmarks that rank models by aggregate success, HORIZON is explicitly constructed to characterize where and why agents break as the horizon of a task grows. It introduces three methodological objects worth isolating.

Intrinsic Horizon H*. The minimum number of effective actions required by an optimal policy to solve a task, defined independently of any particular agent through expert demonstrations or oracle solvers. H* is a property of the task, not of the solver.

Extension methods. Tasks are systematically scaled along two axes. Breadth extension composes multiple independent subtasks into a single workflow. Depth extension inserts non-skippable intermediate states into an otherwise solvable task — states that the optimal policy cannot bypass. Extension level s parameterizes the degree of horizon inflation applied to a baseline task.

Seven-category failure taxonomy. Every failed trajectory is classified by an LLM judge into one of seven primary failure modes: Environment Disturbance, Instruction (ill-defined / partial understanding), Planning Error, False Assumption, History Error Accumulation, Catastrophic Forgetting, Memory Limitation. The categories are described as orthogonal dimensions of agent behavior rather than mutually exclusive classes.

Across four domains (web navigation, operating system control, database querying, embodied manipulation), HORIZON reports three robust empirical patterns. First, every tested model exhibits a break level: a value of s beyond which success rate collapses rather than degrades smoothly. Second, Planning Error dominates the failure distribution across all domains. Third, embodied manipulation degrades most steeply, while web navigation breaks earliest in s.
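
To make the notion of a break level concrete, the following sketch locates one on a success-rate curve indexed by extension level s. HORIZON's own operational definition is not reproduced here; the absolute-drop rule, the `min_drop` threshold, and both curves are assumptions invented for illustration.

```python
from typing import Optional, Sequence


def break_level(success: Sequence[float], min_drop: float = 0.3) -> Optional[int]:
    """Return the first extension level s whose success rate falls by at
    least `min_drop` (absolute) relative to level s - 1, or None.

    success[s] is the success rate at extension level s (s = 0 is baseline).
    The drop threshold is an assumed, illustrative criterion.
    """
    for s in range(1, len(success)):
        if success[s - 1] - success[s] >= min_drop:
            return s
    return None


# Synthetic curves, not HORIZON data.
sharp = [0.90, 0.85, 0.80, 0.30, 0.10]
smooth = [0.90, 0.75, 0.60, 0.45, 0.30]
print(break_level(sharp), break_level(smooth))
```

On these synthetic inputs the first curve has a break at s = 3, while the second loses the same total success without any single level qualifying as a break.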

These observations are interesting in themselves. They are more interesting when interpreted structurally.

2. Structural Reframe

NC2.5 is built on a premise that HORIZON does not state explicitly but observes repeatedly: long-horizon success is not a monotone extrapolation of short-horizon success. The geometry of the former differs from the geometry of the latter in ways that no amount of capability scaling repairs. HORIZON's break level is the empirical signature of this geometric difference.

The NC2.5 account of why the break level exists is as follows. An agent operating on a bounded viability budget τ = C − Φ faces not a single optimization problem but a stratified one. There is a task objective to be optimized. There is an admissibility predicate that determines which realizations are structurally permitted independently of the objective. And there is a navigational signal — the directional asymmetry of interior contraction — that orients the agent away from configurations in which τ is depleted faster than the task can benefit from the depletion.

Operational definition: non-causal navigational layer. Throughout this work, "non-causal navigational layer" refers to a specific architectural construct: a directional-asymmetry tracking mechanism that observes the contracting interior of the admissibility geometry under monotone Φ accumulation, produces orientation signals about that geometry, and is categorically prohibited from entering action selection or optimization gradients. In NC2.5 formalism this corresponds to the spin component (Theorem 62) operating on the admissibility-set geometry (Axioms 29, 31, 60) under the non-reconstructibility bounds (NR-ε, NR-LR). The layer is "non-causal" in the technical sense that its output cannot be reconstructed from downstream behavioral signals beyond declared ε tolerances; it is "navigational" in that it produces directional information without itself being acted upon.

The term "non-causal" is sometimes misread as "behaviorally irrelevant". It is not. The layer constrains the candidate realization space before optimization selects within it; what it removes from the candidate set never reaches the optimizer at all. Its internal signal is not recoverable from downstream behavioral traces and is not itself an optimization target — but the geometry it shapes is what the optimizer ends up navigating. The layer affects behavior by changing what is available to optimize over, not by participating in optimization. This distinction is the architectural content of "non-causal upstream"; without it the term collapses into "irrelevant", which is not its meaning here.
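
The ordering described above, in which the gate is applied first and the optimizer selects only among what survives, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the NC2.5 construction: the function `select`, the candidate tuples, and the budget value are hypothetical, and the sketch makes no attempt to model the non-reconstructibility bounds.

```python
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


def select(
    candidates: Iterable[T],
    admissible: Callable[[T], bool],
    objective: Callable[[T], float],
) -> Optional[T]:
    """Gate first, optimize second.

    `admissible` never sees `objective` or its scores; the optimizer only
    ranks what survives the gate. Returns None if nothing is admissible.
    """
    survivors = [c for c in candidates if admissible(c)]
    if not survivors:
        return None
    return max(survivors, key=objective)


# Hypothetical candidates: (name, task_reward, burden_cost).
plans = [("fast", 9.0, 7.0), ("steady", 6.0, 2.0), ("idle", 1.0, 0.0)]
tau = 5.0  # assumed remaining viability budget

chosen = select(
    plans,
    admissible=lambda p: p[2] <= tau,  # structural check only
    objective=lambda p: p[1],          # task reward only
)
print(chosen[0])
```

The highest-reward candidate is removed before ranking, so the optimizer never scores it. A penalty term inside `objective` would instead let reward trade off against the structural check.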

In systems that collapse admissibility and navigation into the task objective, these three functions compete for the same gradient. The optimizer colonizes both the admissibility check and the navigational orientation. This colonization works on short horizons because τ remains effectively unbounded relative to task length; the structural deficit is invisible. On long horizons, τ becomes binding, and the absence of upstream structural layers manifests as a sharp, not gradual, failure. HORIZON's break level is, in NC2.5 terms, the horizon at which the structural deficit of a single-layer architecture becomes larger than any residual optimization competence can compensate for.

The dominance of Planning Error in the failure distribution is what NC2.5 would predict as the leading symptomatic projection under this reduction. Planning is the function most sensitive to the absence of an upstream navigational layer, because planning is precisely the activity of selecting trajectories in a space whose shape is being consumed by the act of traversing it. An agent without access to the directional asymmetry of interior contraction cannot plan on a long horizon; it can only plan as if the interior were stable. When the horizon exceeds the stability assumption, planning fails, and it fails more visibly than any other function, because the entire apparatus of planning presupposes the stability that has just broken.

Why this is not merely a planning-and-memory bottleneck account

The most natural alternative to the NC2.5 reading is a simpler one: long-horizon failure is just the expected consequence of finite memory, planning myopia, and error compounding under partial observability. Such an account would not require admissibility layers, non-reconstructibility bounds, or any of the NC2.5 formal apparatus.

Three features of HORIZON's findings are not naturally produced by such a baseline account. The first is the sharpness of the break level. A baseline of finite memory plus accumulating planning errors predicts smooth degradation: as horizon grows, memory pressure rises and planning errors compound, and success rate declines monotonically. It does not predict a phase transition. The empirical signature of a sharp break, robustly observed across all tested models and domains, is what a baseline account leaves underdetermined and what a structural account produces directly, because in that account admissibility geometry contracts non-linearly as Φ approaches C, that is, as τ = C − Φ approaches zero.
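
One simple instance of such a baseline can be written down explicitly. The sketch assumes independent per-step errors with a fixed probability ε and a horizon H(s) that grows linearly with the extension level; ε, k, and the linear growth are illustrative assumptions, not quantities defined by HORIZON or NC2.5.

```latex
% Assumed baseline: each of the H(s) required steps fails independently
% with probability \varepsilon.
P_{\text{success}}(s) = (1 - \varepsilon)^{H(s)}
\qquad\Longrightarrow\qquad
\log P_{\text{success}}(s) = H(s)\,\log(1 - \varepsilon).

% With linear horizon growth H(s) = H^{*} + k s, successive levels differ
% by a constant factor:
\frac{P_{\text{success}}(s+1)}{P_{\text{success}}(s)} = (1 - \varepsilon)^{k}.
```

Under these assumptions no extension level is distinguished from its neighbours: the decline is geometric with a constant ratio, which is the smooth profile attributed to the baseline account.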

The second is the cross-domain dominance of Planning Error specifically. A baseline account has no reason to privilege Planning over Memory or False Assumption as the dominant failure category in every domain; one would expect the dominant category to depend on which bottleneck binds first in each domain. The observed cross-domain uniformity of Planning Error dominance suggests an upstream structural source rather than per-domain bottlenecks reaching their limits independently.

The third is the simultaneous co-occurrence of apparently distinct failure categories within the same trajectory under horizon inflation. A baseline account predicts that a single trajectory fails for a single dominant reason: it ran out of memory, or its plan was wrong, or it misread the instruction. HORIZON's authors describe the categories as orthogonal dimensions rather than mutually exclusive classes precisely because failures co-occur. Co-occurrence at the trajectory level is what one would expect if a shared upstream deficit registers stress on multiple layers simultaneously, and it is what one would not expect if the categories were independent bottlenecks.

The NC2.5 reading is therefore not introduced as a replacement for planning or memory explanations, but as an upstream architectural account of why those downstream symptoms become jointly dominant once bounded viability becomes binding.

3. Reduction of the Seven-Category Taxonomy

HORIZON's taxonomy organizes failures along seven dimensions. NC2.5 formalism permits a hierarchical reduction: one primary architectural deficit (the absence of a non-causal navigational layer upstream of optimization, as defined in §2) projects through specific architectural layers into observable failure categories. The seven HORIZON categories are neither seven independent failure modes nor seven independent deficits; they are seven symptomatic projections of the same primary deficit through different layer-specific manifestations.

The hierarchical reduction proposed below is an architectural hypothesis under operational test, not a derived theorem of NC2.5. The collapse of seven categories to one architectural deficit is a strong claim and is offered as such; its support comes not from the elegance of the mapping but from the operational tests in §5 that the mapping makes possible. If those tests fail, the reduction fails — locally, in this probe, against this corpus — without thereby refuting NC2.5 itself.

The value of the reduction is not nomenclatural. The mapping makes each category individually falsifiable through a layer-specific protocol — a property the original taxonomy does not provide, since orthogonal categories that may co-occur on a single trajectory cannot be tested independently without an upstream architectural account of why they co-occur. The reduction proposed here therefore replaces a flat seven-way classification with a hierarchical structure in which each layer carries its own falsification surface, and the joint co-occurrence pattern itself becomes a measurable predictor.

The reduction is not claimed as unique; it is offered as the coarsest hierarchical reduction that preserves the layer-specific diagnostic content of the observed categories. No further compression of categories preserves the architectural distinctness of each projection. Each category is, in principle, individually testable and individually falsifiable; what they share is upstream provenance, not downstream indistinguishability.

Planning Error is the projection through the forward admissibility evaluation layer. In NC2.5, admissibility is a non-causal predicate evaluated on candidate realizations before optimization selects among them. An agent without this predicate plans within a candidate set that includes structurally impermissible trajectories, and the planning process allocates budget to exploration of those trajectories before failure is detected. The empirical dominance of this category reflects the prevalence of single-layer architectures in current agentic systems.

False Assumption is the projection through the admissibility grounding layer. NC2.5 requires that admissibility be checked against the actual structural state of the environment, not against a cached or inferred state. When grounding is absent or delayed, admissibility evaluation proceeds on an assumed structural state that diverges from the actual one. The trajectory is admissible in the assumed world and inadmissible in the real world.

History Error Accumulation is the projection through the structural burden accounting layer. Φ is, by definition, monotone and irreversible. An agent that does not maintain a formal accounting of accumulated burden cannot detect that its viability budget is approaching exhaustion. Errors compound not because the agent forgets them — in the HORIZON sense, the trajectory is still available — but because the agent does not represent their structural cost. Each additional error consumes τ without registering as consumption.

Catastrophic Forgetting and Memory Limitation are two projections through the regime-coherence layer. NC2.5 treats memory not as a retrieval surface but as a regime-dependent selection over accumulated structural information. The distinction between catastrophic forgetting and memory limitation in HORIZON maps onto the distinction between regime transition without continuity preservation (forgetting) and bounded regime capacity saturated by insufficiently selective compression (limitation). Both are projections of the same primary deficit through different stress paths on the same underlying layer: the maintenance of structural coherence across regime boundaries.

Instruction (Ill-defined / Partial Understanding) is the projection through the semantic commitment layer. NC2.5 formalizes semantic commitment as an architectural authorization problem rather than a parsing problem. An agent that commits prematurely to a semantic interpretation operates on a specification that is not the one the instruction carries; an agent that does not commit operates on no specification at all. The NC2.5 treatment requires a regulated commitment operator that binds interpretation to the current regime and remains revisable under bounded Lyapunov descent.

Environment Disturbance is the projection through the coupling-aware viability regulation layer. External perturbations are not failures of the agent; they are failures of the architectural assumption that the agent and environment can be analyzed separately. NC2.5 treats system-environment coupling as a structural variable that modulates the rate of τ depletion. An agent that does not represent coupling explicitly cannot distinguish between a perturbation that is costly and one that is informative.

The reduction is therefore hierarchical: one primary architectural deficit (the missing non-causal navigational layer with its associated stack of admissibility-grounding, burden-accounting, regime-coherence, semantic-commitment, and coupling-aware regulatory functions) generates seven empirically separable but structurally cosourced failure categories. Each category is a separate observable; their joint explanation is a single architectural absence.

4. Residual

Three features of HORIZON's findings are not, under this reduction, adequately explained within HORIZON's own framework. They are explained — or at minimum made explicit — under NC2.5.

The sharpness of the break level. HORIZON documents that degradation is not smooth; it is a phase transition. Their account characterizes the empirical signature without proposing a structural mechanism. NC2.5 predicts the sharpness structurally. When accumulated burden Φ approaches C, so that the viability budget τ = C − Φ approaches zero, admissibility contracts geometrically, not linearly. A single-layer architecture that has been operating in a regime where τ was non-binding does not experience gradual degradation as τ becomes binding; it experiences a regime transition. The break level is the empirical signature of this transition.

The domain ordering. HORIZON reports four domain-specific degradation profiles. Embodied manipulation degrades steepest; web navigation breaks earliest in s; operating system control and database querying sustain moderate performance until later extension levels. Under NC2.5, this ordering is interpretable through the coupling geometry of each domain. Embodied manipulation involves tight, high-dimensional coupling between agent action and environmental state, which amplifies τ depletion per action and produces steep failure under depth extension. Web navigation involves broad, loosely-coupled interactions where structural burden accumulates rapidly across many small actions without any single action being costly, producing early breadth-driven failure. Operating system control occupies an intermediate position: discrete state with persistent consequences, moderate coupling depth, structural burden accumulating per irreversible system modification rather than per query. Database querying is similarly intermediate: schema-bounded coupling that is fact-based and largely stateless per individual query, with structural burden concentrated at schema-modification or transaction-commitment boundaries rather than across query streams. The four profiles align with a coupling geometry continuum: tight (embodied), broad-loose (web), state-persistent (OS), and stateless-per-query (DB), with break behavior tracking the rate at which τ depletion concentrates per action in each regime.

The classification problem of the taxonomy itself. HORIZON notes that a single failed trajectory may exhibit multiple failure types simultaneously, and describes the categories as orthogonal dimensions rather than mutually exclusive classes. The authors explicitly note this overlap as a methodological tension to be addressed in future work. Under the hierarchical reduction of §3, this overlap admits a structural reading: the seven categories are projections of a shared primary deficit through different architectural layers, and different layers can register stress simultaneously when the deficit is active. The co-occurrence structure of the categories is, in principle, a measurable quantity, and NC2.5 predicts specific co-occurrence patterns based on which layers share the closest stress paths in a given deployment.

5. Predictive Consequence

A structural theory is of limited use if it only reinterprets observations post hoc. NC2.5 produces three predictions that extend beyond HORIZON's current empirical scope and are testable with HORIZON's methodology, though Prediction 1 requires expanded model coverage to be operationally tight at HORIZON's current scale and Prediction 2 requires an additional experimental arm.

First. For a fixed model, the break level s* correlates with an agent-independent property of the task: specifically, the rate at which the admissible candidate set contracts as s increases (operationally proxied through extension-induced trajectory survival structure across model families). Tasks whose admissible set contracts slowly should exhibit high break levels across all models; tasks whose admissible set contracts rapidly should exhibit low break levels across all models. The variance of s* across models on a given task is a second-order effect, predicted by NC2.5 to be smaller than the variance of s* across tasks for a given model. Operational falsification: the prediction is falsified if, on a corpus with n_models ≥ 5 model families and matched task counts, the variance of s* across models on a given task is statistically indistinguishable from or larger than the variance of s* across tasks for a given model after sample-size correction, at p < 0.05. HORIZON's current n_models = 2 (GPT-5 variants and Claude-4-Sonnet) is insufficient for a tight test; the prediction becomes testable as model coverage expands.
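
The two variances compared in this prediction can be computed as follows. This is a sketch under stated assumptions: the matrix of break levels is synthetic, the variable names are hypothetical, and the significance test and sample-size correction required by the falsification criterion are deliberately omitted.

```python
from statistics import mean, variance

# Synthetic break levels s*, rows = model families, columns = tasks.
# Values are invented for illustration; they are not HORIZON results.
s_star = [
    [2, 5, 8],
    [3, 5, 7],
    [2, 6, 8],
]

# Sample variance across models, computed per task, then averaged over tasks.
across_models = mean(variance(col) for col in zip(*s_star))
# Sample variance across tasks, computed per model, then averaged over models.
across_tasks = mean(variance(row) for row in s_star)

print(across_models < across_tasks)
```

In this synthetic matrix the break level is driven mostly by the task column, so the across-model variance is the smaller of the two, which is the pattern the prediction expects. A real test would replace the final comparison with a proper variance-components analysis.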

Second. Introducing an explicit upstream admissibility check — a separate computational stage that evaluates candidate trajectories against structural predicates before optimization selects among them — should shift s* upward. The shift is predicted to monotonically increase with the fraction of candidate trajectories the check eliminates, though the exact scaling law is left open to empirical determination. NC2.5 predicts that this architectural intervention produces a larger shift than any prompt-level or post-hoc correction of equivalent computational cost, because it addresses the deficit at its structural location rather than at a downstream symptom. Operational falsification: the prediction is falsified if a prompt-level or post-hoc intervention produces a shift in s* statistically indistinguishable from or larger than an upstream admissibility check intervention of equivalent computational cost, on matched task sets.

A confounding alternative to the architectural reading of Planning Error dominance is that the LLM-judge classification protocol or the natural breadth of the Planning category produces dominance independently of any architectural deficit. This alternative is testable through two manipulations of HORIZON's existing methodology: variation of the judge protocol (different judge model, different category definitions, blind co-classification by multiple judges) and decomposition of the Planning category into finer sub-categories. NC2.5 predicts that Planning Error dominance persists across both manipulations, since the underlying source is architectural rather than nomenclatural; a methodological-artifact account predicts attenuation under either manipulation. Operational falsification: the architectural reading is weakened if Planning Error dominance attenuates by more than the cross-domain variance of category prevalence under either of the two manipulations, on matched trajectory subsets. This sub-prediction is testable with HORIZON's existing 3,100+ trajectory corpus; no additional experimental arm is required.

Third. The co-occurrence structure of HORIZON's seven failure categories, measured across trajectories, should exhibit statistically significant deviations from independence. Specifically, NC2.5 predicts three clusters tracking the layer-stress paths identified in §3: a grounding cluster (Planning Error, False Assumption, Environment Disturbance) reflecting failures in admissibility evaluation against actual environment state; a memory cluster (Catastrophic Forgetting, Memory Limitation, History Error Accumulation) reflecting failures in burden accounting and regime-coherence maintenance; and an interpretive singleton (Instruction) whose membership shifts across horizon. Instruction failures should cluster with memory failures at long horizon and with grounding failures at short horizon, because semantic commitment under uncertainty resolves differently depending on which layer is binding. Operational falsification: the cluster structure is falsified if pairwise mutual information between categories within a predicted cluster is statistically indistinguishable from cross-cluster pairwise mutual information at p < 0.05 over the trajectory corpus.
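
The within-cluster versus cross-cluster comparison can be sketched with a plain mutual-information estimate over binary failure labels. Everything below is illustrative: the label vectors are synthetic, only four of the seven categories are included, the dictionary keys are hypothetical shorthand, and the p < 0.05 test named in the falsification criterion is not implemented.

```python
from itertools import combinations
from math import log2
from statistics import mean


def mutual_information(xs, ys):
    """Plug-in mutual information (bits) between two 0/1 label sequences."""
    n = len(xs)
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            p_ab = sum(1 for x, y in zip(xs, ys) if x == a and y == b) / n
            p_a = sum(1 for x in xs if x == a) / n
            p_b = sum(1 for y in ys if y == b) / n
            if p_ab > 0:
                mi += p_ab * log2(p_ab / (p_a * p_b))
    return mi


# Synthetic per-trajectory indicators (1 = category assigned), 8 trajectories.
labels = {
    "planning":         [1, 1, 0, 0, 1, 1, 0, 0],
    "false_assumption": [1, 1, 0, 0, 1, 0, 0, 0],
    "forgetting":       [1, 0, 1, 0, 1, 0, 1, 0],
    "memory_limit":     [1, 0, 1, 0, 0, 0, 1, 0],
}
cluster_of = {
    "planning": "grounding", "false_assumption": "grounding",
    "forgetting": "memory", "memory_limit": "memory",
}

within, cross = [], []
for a, b in combinations(labels, 2):
    mi = mutual_information(labels[a], labels[b])
    (within if cluster_of[a] == cluster_of[b] else cross).append(mi)

within_mean, cross_mean = mean(within), mean(cross)
print(within_mean > cross_mean)
```

The synthetic labels were constructed so that categories in the same predicted cluster co-occur, which is why the within-cluster mean comes out higher. On real trajectories the comparison would need a permutation or bootstrap test before any conclusion is drawn.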

Predictions 2 and 3 are falsifiable within HORIZON's methodology. A reanalysis of the more than 3,100 trajectories in the published attribution results would suffice for Prediction 3; Prediction 2 requires an additional experimental arm implementing the upstream admissibility check intervention.

It is worth stating in one place what would count against the probe. The architectural reading is weakened, on the same corpus and methodology, by any of the following: Planning Error dominance attenuating under judge-protocol variation or category decomposition; pairwise mutual information among the predicted failure clusters indistinguishable from cross-cluster mutual information; prompt-level or post-hoc interventions matching an upstream admissibility check at equal computational cost; break-level s* tracking model capacity more strongly than task-level admissibility-set contraction structure; or a memory-only or cumulative-error baseline reproducing the observed sharp-break profile across all four domains with fewer architectural assumptions. Each of these outcomes is independently observable and would refute the probe locally, in this corpus, without thereby refuting NC2.5 itself. The architectural reading earns its standing only if these tests pass.

6. Architectural Implication

The reduction of the seven-category taxonomy to a single primary architectural deficit implies a single architectural direction. It does not imply a single implementation.

The direction is this: architectures that are expected to maintain performance on long horizons are predicted, under NC2.5, to require a stable separation of navigation, admissibility, and optimization into distinct, non-causally related layers, and to maintain that separation under horizon inflation. The separation cannot be enforced by prompt engineering, by post-hoc filtering, by penalty terms in an objective function, or by any construct that permits information flow from optimization back into admissibility. Each of these constructs reintroduces the coupling that the separation was designed to prevent.

The non-reconstructibility bounds (NR-ε, NR-LR) formalized in NC2.5 specify the minimum conditions under which this separation remains valid. In practical terms, an admissibility layer that can be reconstructed from downstream behavioral signals — via mutual information accumulation, side-channel analysis, or distributional inference — has failed its structural function, regardless of whether it passes functional tests at any finite horizon.

HORIZON's empirical signature, read structurally, suggests that most current agentic architectures fall below this minimum threshold. The break level observed across models is not a model property; it is an architectural property of the class of single-layer systems currently deployed. Scaling within this class is unlikely to remove the break once τ becomes binding, even if it can postpone or partially mask it over shorter horizons.

Architectures that satisfy the NC2.5 layer separation condition should exhibit a qualitatively different break profile: gradual degradation rather than sharp failure, and significantly lower prevalence of Planning Error and False Assumption categories in the failure distribution. These predictions are, again, testable within HORIZON's methodology.

7. Philosophical Review

A short note on what is at stake; this is the only place in this work where I speak in the first person.

HORIZON measures where systems break rather than merely how well they perform. That choice matters architecturally, because breakdown under horizon inflation reveals structural limits that fixed-complexity leaderboards can hide. The break level is the point at which the agent's representation of its own capacity ceases to match the structural reality it is operating in. The taxonomy is a catalogue of the ways this mismatch manifests.

Returning to the formal register: NC2.5 is a formalization of what bounded adaptive systems require to navigate their own contracting interior rather than collapse against it. HORIZON, by its empirical signature, is a measurement of what architectures lacking that capacity fail to do. The two works describe the same structural fact from opposite sides.

Summary

The HORIZON benchmark documents empirical failures of long-horizon agentic systems across four cognitive domains. Its three principal findings — the existence of a sharp break level, the dominance of Planning Error across domains, and the characteristic degradation profiles of embodied versus web tasks — admit a coherent NC2.5 reduction to a single primary architectural deficit: the absence of a non-causal navigational layer upstream of optimization in current agentic architectures.

The seven-category failure taxonomy reduces hierarchically: one primary deficit projects through specific architectural layers (admissibility evaluation, grounding, burden accounting, regime coherence, semantic commitment, coupling-aware regulation) into seven separately observable but structurally cosourced failure categories. Each category is individually falsifiable; what they share is upstream provenance.

NC2.5 makes three falsifiable predictions extending beyond HORIZON's current analysis: task-level invariance of break levels across models (testable as model coverage expands beyond HORIZON's current two families), predictable break-level shifts under architectural intervention, and a specific co-occurrence cluster structure among failure categories. Prediction 3 is testable on HORIZON's existing trajectory corpus and Prediction 2 with one additional experimental arm; the operational falsification thresholds are stated in §5.

The philosophical reading of HORIZON is that it measures, empirically, the shape of bounded architectural finitude in adaptive systems. NC2.5 formalizes the architectural conditions under which a system can navigate this shape rather than collapse against it.

This probe concerns long-horizon agentic failures under HORIZON-style horizon extension. It does not claim that every planning failure, memory failure, or instruction failure in every agentic system reduces to NC2.5; the architectural reading is offered for the class of failures HORIZON characterizes, on the corpus HORIZON has measured, under the operational tests of §5. Generalization beyond this class is not asserted by this probe and is left to subsequent probes against other empirical surfaces.

This work does not claim that HORIZON proves NC2.5. It claims that HORIZON, by virtue of its diagnostic methodology and the trajectory corpus it has produced, opens the first independent empirical surface on which the NC2.5 architectural account can be tested against well-defined alternatives. The reduction stands or falls on the operational thresholds in §5.

References

Wang, X. J., Bai, H., Sun, Y., Wang, H., Zhang, S., Hu, W., Schroder, M., Mutlu, B., Song, D., & Nowak, R. D. (2026). The Long-Horizon Task Mirage? Diagnosing Where and Why Agentic Systems Break. arXiv preprint arXiv:2604.11978.

Barziankou, M. (2026). Navigational Cybernetics 2.5 — Axiomatic Core, Version 2.1. The Urgrund Laboratory, PETRONUS. DOI: 10.17605/OSF.IO/NHTC5.

Reductions in this work are made to NC2.5 v2.1.

petronus.eu · CC BY-NC-ND 4.0
