Naoki Higuchi (CSCT-NAIL)

A New AI Architecture Without Prior Distributions: Stream-Based AI and Compositional Inference

The Problem with Current AI

The foundation of current AI is the Transformer/Attention model—undeniably successful. But why it succeeds and where its limits lie remain subjects of debate.

Hallucination—generating outputs that deviate from training data—has no fundamental solution despite numerous proposed fixes. Causal reasoning and compositional inference over unseen combinations remain weak points with clear limitations.

A Different Foundation

I designed a new AI architecture based on different principles. It's grounded in theory but implemented as working code.

The core ideas are simple:

  • We live in a continuously moving world → Input must be a stream
  • Human cognition is the reference → better to assume no prior distribution
  • To approximate neurons → Various gate mechanisms are needed

These are designed to be describable as physical dynamics, and they actually work.

However, the resulting architecture is far from existing AI. Understanding it requires knowledge across multiple domains—semiotics, neuroscience, geometry, information theory—rather than statistics alone. Bayesian inference is not required.

The CSCT Engine

This architecture is governed by axioms called CSCT (Clock-Selected Compression Theory). The key rules:

"Input is a stream (continuous). Basic coefficients have non-negative constraints. There is an external input called an 'anchor', and gate opening/closing is controlled in synchronization with the anchor's change (velocity). This gate mechanism achieves discretization. Input information becomes discrete codes, reusable for inference even after learning stops."

Design Philosophy: Cognition as a Projected Dynamical System

The CSCT Engine is a neurodynamical simulator implementing the 5 axioms in computable form. It mathematically models cognition as a Projected Dynamical System (PDS) evolving on a Simplex.

Unlike standard deep learning approaches that operate in unconstrained vector spaces allowing negative weights (subtractive interference), the CSCT Engine enforces a non-negative constraint (C ≥ 0) with unit sum (Σp_k = 1).

This ensures the system cannot "rewind" entropy via mathematical cancellation, but must "construct" meanings within the Simplex defined by codebook vectors, thereby grounding internal symbols in physical causality.
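
To make the constraint concrete, here is a minimal NumPy sketch (not the CSCT Engine's actual code) of a single update step of a projected dynamical system: an unconstrained update is followed by a Euclidean projection back onto the probability simplex, so the state always satisfies C ≥ 0 and Σp_k = 1. The function and variable names are illustrative only.

```python
import numpy as np

def project_to_simplex(v: np.ndarray) -> np.ndarray:
    """Euclidean projection of v onto the probability simplex
    {p : p_k >= 0, sum_k p_k = 1}."""
    u = np.sort(v)[::-1]                       # sort descending
    css = np.cumsum(u) - 1.0                   # cumulative sums minus the target sum
    ks = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / ks > 0)[0][-1]  # last index satisfying the KKT condition
    tau = css[rho] / (rho + 1)                 # shift that enforces the unit sum
    return np.maximum(v - tau, 0.0)            # clip to the non-negative orthant

# One hypothetical Euler step of a projected dynamical system on the simplex:
# the raw update may leave the simplex; the projection restores C >= 0 and sum = 1.
p = np.full(4, 0.25)                           # current state on the simplex
drive = np.array([0.6, -0.4, 0.1, -0.5])       # illustrative, unconstrained update direction
p_next = project_to_simplex(p + 0.5 * drive)
print(p_next, p_next.sum())                    # non-negative entries summing to 1
```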

Architecture Diagram

Figure: Overview of the CSCT Engine architecture.

Two Architectures: SingleGate and MultiGate

The CSCT Engine has two main architectures:

| Aspect | SingleGate | MultiGate |
| --- | --- | --- |
| Biological correspondence | Peripheral processing | Central processing |
| Computational cost | Low | High |
| Application | Single-source waveform | Inter-channel relationships |
| Phase processing | None (velocity-based) | θ-phase temporal modulation |

SingleGate performs clock selection directly from input features (position, velocity, acceleration). The gate opens/closes according to the anchor's rate of change (velocity), but no explicit phase calculation is performed.
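
As a rough illustration of this rule (the function names, thresholds, and waveforms below are my own assumptions, not the engine's code), a velocity-gated discretizer might look like this: when the anchor's rate of change exceeds a threshold, the gate opens and the current feature vector is snapped to its nearest codebook entry as a discrete code.

```python
import numpy as np

def single_gate_codes(features, anchor, codebook, vel_threshold=0.05):
    """Toy illustration: emit a discrete code index whenever the anchor's
    rate of change (velocity) exceeds a threshold; otherwise emit None."""
    codes = []
    prev = anchor[0]
    for x, a in zip(features[1:], anchor[1:]):
        velocity = abs(a - prev)              # anchor velocity (finite difference)
        prev = a
        if velocity > vel_threshold:          # gate open: discretize
            k = int(np.argmin(np.linalg.norm(codebook - x, axis=1)))
            codes.append(k)                   # nearest codebook vector = discrete code
        else:                                 # gate closed: no code emitted
            codes.append(None)
    return codes

# Hypothetical stream: a slowly drifting anchor with one abrupt jump.
t = np.linspace(0, 1, 200)
anchor = np.where(t < 0.5, 0.0, 1.0) + 0.01 * np.sin(20 * t)
features = np.stack([np.sin(6 * t), np.cos(6 * t)], axis=1)
codebook = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
print([c for c in single_gate_codes(features, anchor, codebook) if c is not None][:10])
```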

MultiGate has three neurological gate mechanisms:

  • Na⁺ Gate: Corresponds to sodium channels—high-speed, sparse clock selection
  • θ Phase: Corresponds to hippocampal θ rhythm—time-dependent rhythm modulation
  • NMDA Gate: Corresponds to NMDA receptors—integration window opens at θ phase peak
combined_gate = Na⁺_activation × NMDA_activation

The NMDA gate opens and closes depending on θ phase. This matches the timing when LTP (Long-Term Potentiation) occurs in the hippocampus, making it a computational implementation of θ-γ coupling known in neuroscience.
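
A toy reading of this combination (all parameters and waveforms are assumptions for illustration, not the engine's actual values): the Na⁺ gate fires sparsely on fast anchor changes, and the NMDA window is a function of θ phase that is widest at the phase peak.

```python
import numpy as np

def multi_gate(velocity, theta_phase, na_threshold=0.1, nmda_width=0.5):
    """Toy reading of the MultiGate rule: a fast Na+ gate driven by the anchor's
    velocity, multiplied by an NMDA integration window that opens near the
    theta phase peak (wrapped phase = 0)."""
    na_activation = (np.abs(velocity) > na_threshold).astype(float)    # sparse, high-speed gate
    wrapped = np.angle(np.exp(1j * theta_phase))                       # fold phase into (-pi, pi]
    nmda_activation = np.exp(-(wrapped ** 2) / (2 * nmda_width ** 2))  # window peaked at phase 0
    return na_activation * nmda_activation                             # combined_gate

# Hypothetical stream: ~8 Hz theta rhythm, random anchor velocity.
dt, steps = 0.001, 2000
t = np.arange(steps) * dt
theta_phase = 2 * np.pi * 8.0 * t                  # accumulated theta phase
velocity = np.random.default_rng(0).normal(0, 0.1, steps)
gate = multi_gate(velocity, theta_phase)
print(f"gate open on {np.mean(gate > 0.5):.1%} of steps")
```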

Structural Differences from Attention Models

Time Structure

| Aspect | Attention Model | CSCT Engine |
| --- | --- | --- |
| Processing | Batch (static frames, text, video/audio via position encoding) | Stream (continuous time-series) |
| Self-reference | S created from S(t-1) | Forward flow only |
| Parallelism | Processes all information at once | Progresses through the stream |

In current Attention models, the self-referential state S is constructed from S(t-1). This backward-indexed structure lets static images, text, and video/audio (converted to position information) be handled uniformly.

In contrast, the CSCT Engine is stream-based, so it currently cannot handle static images as elegantly.

This may seem more constrained and slower. However, it's an architecture with high potential for processing while moving—a fundamentally different capability.

Distance Structure

| Aspect | Attention Model | CSCT Engine |
| --- | --- | --- |
| Norm | L2 (codes mix, hard to extract) | L1-like (codes separable) |
| Negative values | Allowed | Not allowed |
| Vector addition | Unrestricted | Constrained to the simplex |
| Convex hull | Local pseudo-hull only | Globally closed hull |

Why Transformer "Attention" Is Insufficient

The Transformer attention mechanism is defined as:

Attention(Q, K, V) = softmax(QK^T / √d_k) V

Since softmax outputs sum to unity, the result is a convex combination of value vectors V. Superficially, this appears to satisfy the convex hull constraint.
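
This is easy to verify numerically with plain NumPy (independent of the CSCT code): the softmax rows are non-negative and sum to one, so each output row lies in the convex hull of the rows of V.

```python
import numpy as np

rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(5, 8)), rng.normal(size=(7, 8)), rng.normal(size=(7, 8))

scores = Q @ K.T / np.sqrt(K.shape[1])
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)   # softmax: non-negative, rows sum to 1
out = weights @ V                                # each output row is a convex combination of rows of V

print(np.allclose(weights.sum(axis=-1), 1.0), (weights >= 0).all())
# True True. The hull is only "local" to this layer's V, though; nothing ties
# successive layers to one globally shared simplex.
```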

However, this constitutes only a local pseudo-hull:

  • Each layer has its own local hull but doesn't form a globally closed simplex
  • No boundary means ever-increasing resources are needed to counteract entropy production

The CSCT Engine, by contrast, requires a geometric simplex structure, disallows negative values, and forbids unconstrained vector addition.

This constraint creates a geometric convex hull in information space, enabling compositional inference. Conversely, no amount of improvement to current AI can eliminate hallucination or achieve true compositional inference.

Experiment 8: Meaning Extraction

Purpose

EX8 tests whether a frozen system can infer unseen inputs after discrete codes acquire meaning through anchoring.

Experimental Design

Training Data:

  • Singles: A, B
  • Composites: A+B, A+C, B+C

Withheld: C alone (never shown as single)

Test: After stopping learning (no gradient), can the model infer C (never directly taught) given only anchor 'c'?

Three Geometric Conditions

We vary C's position to examine convex hull effects:

  • IN_HULL: C is a convex combination of A and B (inside hull)
  • OUT_HULL: C is orthogonal to both A and B (outside hull)
  • RANDOM: C is randomly initialized (baseline)
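
The sketch below shows one plausible way such conditions could be constructed (the dimension, scales, and function names are mine, not the published experiment code): IN_HULL places C on the segment between A and B, while OUT_HULL removes every component of C lying in the span of A and B.

```python
import numpy as np

def make_condition(condition: str, dim: int = 16, seed: int = 0):
    """Sketch of the three geometric placements of C relative to A and B."""
    rng = np.random.default_rng(seed)
    A, B = rng.normal(size=dim), rng.normal(size=dim)
    if condition == "IN_HULL":                    # C is a convex combination of A and B
        w = rng.uniform(0.2, 0.8)
        C = w * A + (1 - w) * B
    elif condition == "OUT_HULL":                 # C orthogonal to both A and B
        C = rng.normal(size=dim)
        B_perp = B - (B @ A) / (A @ A) * A        # Gram-Schmidt: orthogonalize the span basis
        for u in (A, B_perp):
            C -= (C @ u) / (u @ u) * u            # remove all components lying in span{A, B}
    else:                                         # RANDOM baseline
        C = rng.normal(size=dim)
    return A, B, C

A, B, C = make_condition("OUT_HULL")
print(round(float(C @ A), 6), round(float(C @ B), 6))   # ~0: orthogonal to both
```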

Results (30 seeds per condition, 90 total runs)

| Condition | Success Rate | Withheld Similarity |
| --- | --- | --- |
| IN_HULL | 96.7% (29/30) | 0.979 ± 0.025 |
| RANDOM | 53.3% (16/30) | 0.682 ± 0.370 |
| OUT_HULL | 16.7% (5/30) | 0.701 ± 0.242 |

Statistical test: Kruskal-Wallis H = 42.52, p = 5.85 × 10⁻¹⁰
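
For readers who want to run this kind of test themselves, the Kruskal-Wallis statistic can be computed with SciPy as below. The arrays here are placeholders drawn from the reported means and standard deviations, not the actual per-seed scores, so the printed values will not match the reported ones exactly.

```python
import numpy as np
from scipy.stats import kruskal

# Placeholder arrays standing in for the 30 per-seed withheld-similarity scores
# of each condition (the real values come from the EX8 runs).
rng = np.random.default_rng(0)
in_hull = np.clip(rng.normal(0.979, 0.025, 30), 0, 1)
random_ = np.clip(rng.normal(0.682, 0.370, 30), 0, 1)
out_hull = np.clip(rng.normal(0.701, 0.242, 30), 0, 1)

H, p = kruskal(in_hull, random_, out_hull)   # rank-based test, no normality assumption
print(f"H = {H:.2f}, p = {p:.2e}")
```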

Key Discovery: Ungrounded Symbol Acquisition

Interestingly, in OUT_HULL, C acquired a unique code in 80% of seeds. Yet reconstruction accuracy remained low.

We term this "Ungrounded Symbol Acquisition": a discrete code is assigned and manipulated, but it lacks representational content within the codebook's constructive capacity.

This provides a mathematical instantiation of Searle's "Chinese Room" argument—the system can "handle" the symbol without the symbol bearing reconstructable meaning.

Experiment 9: Syntax Emergence

Purpose

EX9 tests whether syntactic inference can arise once meaning has been discretized. It is the mirror of EX8:

  • EX8 (Meaning): Withheld primitive must be inferred from observed composites
  • EX9 (Syntax): Withheld composite must be inferred from observed primitives and composites

Experimental Design

Training Data:

  • Singles: A, B, C
  • Composites: A+B, A+C

Withheld: B+C (never trained)

Test: Given anchor 'b+c', can the model infer B+C (never directly taught)?

Results (30 seeds per condition, 90 total runs)

| Condition | Success Rate | Withheld Similarity |
| --- | --- | --- |
| IN_HULL | 66.7% (20/30) | 0.890 ± 0.138 |
| RANDOM | 33.3% (10/30) | 0.767 ± 0.253 |
| OUT_HULL | 13.3% (4/30) | 0.742 ± 0.168 |

Statistical test: Kruskal-Wallis H = 19.13, p = 7.00 × 10⁻⁵

Discussion: Syntax as Interpolation, Not Algebra

EX9 results show that syntax in CSCT emerges not as algebraic rule manipulation (combining symbols independent of content) but as discovery of barycentric coordinates on the codebook simplex.

Rather than learning subtraction, the system discovers barycentric coordinates:

f: y_{A+B} → ½v_A + ½v_B
f: y_{A+C} → ½v_A + ½v_C

Because this mapping rule is geometrically consistent, the system generalizes to the withheld input y_{B+C} → ½v_B + ½v_C.
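
The following minimal sketch illustrates this geometry under the simplifying assumption that a composite is the equal-weight mixture of its parts (the vectors and names are illustrative, not the experiment code): the withheld composite already has non-negative barycentric coordinates summing to one on the codebook, so interpolation alone reaches it.

```python
import numpy as np

rng = np.random.default_rng(0)
v_A, v_B, v_C = rng.normal(size=(3, 16))          # codebook vectors (illustrative)
codebook = np.stack([v_A, v_B, v_C])

# Trained composites, modeled here as equal-weight mixtures (the barycentric reading).
y_AB, y_AC = 0.5 * (v_A + v_B), 0.5 * (v_A + v_C)
y_BC = 0.5 * (v_B + v_C)                          # withheld: never trained on

# Recover barycentric coordinates of the withheld composite on the codebook simplex.
w, *_ = np.linalg.lstsq(codebook.T, y_BC, rcond=None)
print(np.round(w, 3), round(float(w.sum()), 3))   # ~[0, 0.5, 0.5], sum ~1: inside the hull

# Because the mapping "composite -> average of its parts" is geometrically
# consistent on the trained pairs, applying it to the untrained pair lands
# exactly on y_BC; no subtraction or symbol algebra is needed.
print(np.allclose(0.5 * v_B + 0.5 * v_C, y_BC))   # True
```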

Conclusion

From these results, I conclude that human cognition—especially intellectual activity—is built on constraints.

Through nine experiments, we demonstrated that:

  1. Discretization emerges reliably from continuous dynamics via clock selection (EX1-3)
  2. Irreversible anchors provide stability against noise and drift, outperforming reversible (self-referential) systems long-term (EX4)
  3. Binding is achieved as implicit synchronization through locking to a shared anchor (EX5)
  4. Internal time progresses proportionally to the signal's rate of change (flux), halting when no signal is present (EX7)
  5. Semantic grounding requires convex-hull membership; symbols assigned outside the hull remain ungrounded (EX6, EX8)
  6. Syntax emerges as barycentric interpolation, not algebraic abstraction; compositional inference degrades outside the trained hull (EX9)

Modern AI pursues "infinite expansion" through scaling. I propose the inverse: closed constraint may be a necessary condition for stable, efficient intelligence.

The CSCT Engine may be the first program that can describe (design) our cognitive activity as an "OS."

If you're interested, please run the code. There are no licensing restrictions, and this is just the beginning. This is not speculation—it's a working program.

