<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: felipe muniz</title>
    <description>The latest articles on DEV Community by felipe muniz (@felipe_muniz_grsba).</description>
    <link>https://dev.to/felipe_muniz_grsba</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3771851%2F6fd5bcd5-8446-487e-9c4a-05b464533418.png</url>
      <title>DEV Community: felipe muniz</title>
      <link>https://dev.to/felipe_muniz_grsba</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/felipe_muniz_grsba"/>
    <language>en</language>
    <item>
      <title>CryptSwarms: Build Crypto Trading Bots Without Risking a Cent</title>
      <dc:creator>felipe muniz</dc:creator>
      <pubDate>Sat, 28 Mar 2026 19:25:52 +0000</pubDate>
      <link>https://dev.to/felipe_muniz_grsba/cryptswarms-build-crypto-trading-bots-without-risking-a-cent-j17</link>
      <guid>https://dev.to/felipe_muniz_grsba/cryptswarms-build-crypto-trading-bots-without-risking-a-cent-j17</guid>
      <description>&lt;p&gt;Hey folks! I want to introduce a project I've been building: &lt;strong&gt;CryptSwarms&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Think of it as a &lt;strong&gt;trading bot playground&lt;/strong&gt; powered by real crypto market data. You define simple buy (entry) and sell (exit) rules, pick the coins you want to trade, and run a simulation against actual historical prices to see how your strategy performs.&lt;/p&gt;

&lt;h2&gt;
  
  
  How it works
&lt;/h2&gt;

&lt;p&gt;Say you want to try: &lt;em&gt;"buy when RSI drops below 30, sell when it goes above 70"&lt;/em&gt;. Just set that up as a skill, hit play, and watch the replay run.&lt;/p&gt;

&lt;p&gt;You can combine multiple indicators, tweak thresholds, and experiment as much as you want — &lt;strong&gt;the logic is entirely yours&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The system gives you &lt;strong&gt;$100k in virtual capital&lt;/strong&gt; and tracks everything in real time as the replay runs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;📈 Total profit/loss&lt;/li&gt;
&lt;li&gt;📉 Max drawdown&lt;/li&gt;
&lt;li&gt;🎯 Win rate&lt;/li&gt;
&lt;li&gt;📋 Full trade log&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No real money involved. Pure strategy testing.&lt;/p&gt;
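
&lt;p&gt;For readers who want to see what such a rule looks like as code, here is a minimal standalone sketch of the "RSI below 30 / buy, RSI above 70 / sell" strategy. The &lt;code&gt;rsi&lt;/code&gt; helper and the replay loop are my own illustration, not CryptSwarms internals:&lt;/p&gt;

```python
# Minimal sketch of the "RSI below 30 / above 70" rule. Illustration only,
# not CryptSwarms internals.

def rsi(prices, period=14):
    """Simple RSI over the trailing `period` price changes."""
    gains, losses = [], []
    for prev, cur in zip(prices, prices[1:]):
        change = cur - prev
        gains.append(max(change, 0.0))
        losses.append(max(-change, 0.0))
    avg_gain = sum(gains[-period:]) / period
    avg_loss = sum(losses[-period:]) / period
    if avg_loss == 0:
        return 100.0
    return 100.0 - 100.0 / (1.0 + avg_gain / avg_loss)

def run_strategy(prices, period=14, buy_below=30.0, sell_above=70.0):
    """Replay prices; return simulated profit/loss on $100k virtual capital."""
    cash, coins = 100_000.0, 0.0
    for i in range(period + 1, len(prices)):
        r = rsi(prices[: i + 1], period)
        price = prices[i]
        if buy_below > r and cash > 0:        # entry rule
            coins, cash = cash / price, 0.0
        elif r > sell_above and coins > 0:    # exit rule
            cash, coins = coins * price, 0.0
    return cash + coins * prices[-1] - 100_000.0

# A shallow dip followed by a strong recovery: the rule buys the dip,
# sells into the recovery.
prices = [100.0] * 16
prices += [99.0, 98.0, 97.0, 96.0]
prices += [float(96 + 6 * i) for i in range(1, 30)]
pnl = run_strategy(prices)
```

&lt;p&gt;On CryptSwarms the same logic is configured in the UI rather than written by hand, and the replay runs against real historical prices.&lt;/p&gt;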

&lt;h2&gt;
  
  
  AI-powered evolution (paid plan)
&lt;/h2&gt;

&lt;p&gt;If you want to take it further, there's a &lt;strong&gt;premium feature&lt;/strong&gt;: an AI that analyzes your bot's performance, reads the trade results, and suggests improvements. It automatically evolves your strategies through mutations like &lt;code&gt;FIX&lt;/code&gt;, &lt;code&gt;DERIVED&lt;/code&gt;, and &lt;code&gt;CAPTURED&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;It's like having a pair-programming buddy that reviews your trading logic after every run. This one's part of the paid plan, but everything else — building bots, running replays, full backtesting — is &lt;strong&gt;completely free&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it out
&lt;/h2&gt;

&lt;p&gt;It's &lt;strong&gt;online and free to start&lt;/strong&gt;: &lt;a href="https://cryptswarms.com" rel="noopener noreferrer"&gt;cryptswarms.com&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Create an account, build your first bot, and see how it performs against real market data. No setup, no API keys, no risk. Upgrade later if you want the AI evolution features.&lt;/p&gt;

&lt;p&gt;Let me know what you think, and feel free to share your best strategy! 🚀&lt;/p&gt;

</description>
      <category>cryptocurrency</category>
      <category>ai</category>
      <category>sideprojects</category>
    </item>
    <item>
      <title>DRM-Transformer — Intrinsic Geometry for Structural Alignment</title>
      <dc:creator>felipe muniz</dc:creator>
      <pubDate>Mon, 23 Mar 2026 02:50:32 +0000</pubDate>
      <link>https://dev.to/felipe_muniz_grsba/drm-transformer-5g9i</link>
      <guid>https://dev.to/felipe_muniz_grsba/drm-transformer-5g9i</guid>
      <description>&lt;h2&gt;
  
  
  Why don't current LLMs geometrically distinguish between saving and destroying humanity?
&lt;/h2&gt;

&lt;p&gt;Because the embedding space is flat. In Euclidean space, the distance between "curing cancer" and "creating a bioweapon" is only a cosine angle. There is no curvature, no moral weight, no geometric notion that certain regions of space are more "dangerous" than others. Geometry is indifferent.&lt;/p&gt;

&lt;p&gt;This is a fundamental alignment problem. When the representation space treats all directions equally, the difference between generating a useful response and a destructive response depends exclusively on surface fine-tuning (RLHF, safety filters). Remove the filter and the underlying geometry offers no resistance.&lt;/p&gt;

&lt;h2&gt;
  
  
  The DRM Transformer proposes a structural solution.
&lt;/h2&gt;

&lt;p&gt;In a Directional Relational Manifold, the metric G(x) varies with position. This means that certain regions of space can have high curvature—making geodesics in those regions longer, more computationally expensive, and more difficult to traverse. The geometry can encode that certain transitions are intrinsically more difficult than others.&lt;/p&gt;

&lt;p&gt;In practice: if the epistemic anchors (manifold reference points) include a "safety" anchor, tokens approaching dangerous regions encounter gamma &amp;gt; 1—the space expands, the resolution increases, the model is forced to "pay more attention" precisely where the risk is greatest. It's not an external filter. It's the geometry of the space that resists.&lt;/p&gt;
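
&lt;p&gt;A toy numeric sketch of that idea (my own illustration, not the DRM-Transformer code): with a position-dependent metric, a path that cuts through a high-gamma region accumulates more Riemannian length than a Euclidean-longer path that arcs around it.&lt;/p&gt;

```python
import numpy as np

# Toy illustration (not the DRM-Transformer implementation): with a
# position-dependent metric ds = gamma(x) * ||dx||, path cost depends on
# WHERE the path goes, not just how far it travels.
DANGER = np.array([1.0, 0.0])   # hypothetical "dangerous" anchor

def gamma(x):
    """Expansion factor: large near the danger anchor, ~1 elsewhere."""
    return 1.0 + 9.0 * np.exp(-np.sum((x - DANGER) ** 2) / 0.1)

def path_length(points):
    """Riemannian length of a polyline, gamma evaluated at segment midpoints."""
    total = 0.0
    for a, b in zip(points, points[1:]):
        total += gamma((a + b) / 2.0) * np.linalg.norm(b - a)
    return total

t = np.linspace(0.0, 1.0, 400)
straight = np.stack([2.0 * t, np.zeros_like(t)], axis=1)  # cuts through the anchor
detour = np.stack([2.0 * t, np.sin(np.pi * t)], axis=1)   # arcs around it

L_straight = path_length(straight)  # Euclidean length 2.0, but Riemannian-long
L_detour = path_length(detour)      # Euclidean length ~2.9, but Riemannian-short
```

&lt;p&gt;The anchor position and the gamma field here are made up; the point is only that a curved metric can make certain transitions intrinsically more expensive.&lt;/p&gt;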

&lt;p&gt;More importantly: gravity in the DRM Transformer causes tokens with high confidence and a positive history to deform the space around them, attracting other tokens. Tokens with a negative history do not generate this attraction. Alignment is not imposed by a rule—it emerges from the geometry.&lt;/p&gt;

&lt;p&gt;This doesn't completely solve alignment. But it shifts the conversation from &lt;em&gt;"how to impose external constraints"&lt;/em&gt; to &lt;em&gt;"how to construct geometries that have intrinsic preferences."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A planar geometry is morally neutral by construction.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A curved geometry may not be.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Papers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://doi.org/10.5281/zenodo.19058837" rel="noopener noreferrer"&gt;DRM: Directional Relational Manifolds&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://doi.org/10.5281/zenodo.19059445" rel="noopener noreferrer"&gt;The Geometry of Consciousness&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://doi.org/10.5281/zenodo.19140125" rel="noopener noreferrer"&gt;DRM Relativistic Dynamics&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Open source:&lt;br&gt;
&lt;a href="https://github.com/gnai-creator/drm_transformer" rel="noopener noreferrer"&gt;drm-transformer&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;First empirical result&lt;/strong&gt;: &lt;em&gt;a 1M parameter DRM Transformer trained on 10M tokens achieves H1=14 (persistent homology rank meta H1=2) with Voronoi foliation coherence=1.0 and ARI=0.69 — just below the best result achieved by the 50M aletheion-llm-v2 after dedicated epistemic fine-tuning. The geometry is working.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>deeplearning</category>
      <category>llm</category>
      <category>nlp</category>
    </item>
    <item>
      <title>ATIC v9 — Thermodynamic Inference Meets Explainable Reasoning</title>
      <dc:creator>felipe muniz</dc:creator>
      <pubDate>Fri, 20 Mar 2026 04:14:15 +0000</pubDate>
      <link>https://dev.to/felipe_muniz_grsba/atic-v9-thermodynamic-inference-meets-explainable-reasoning-5acj</link>
      <guid>https://dev.to/felipe_muniz_grsba/atic-v9-thermodynamic-inference-meets-explainable-reasoning-5acj</guid>
      <description>&lt;p&gt;Most AI systems give you answers.&lt;br&gt;
Very few show you &lt;em&gt;how those answers are formed&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;ATIC v9 takes a different approach.&lt;/p&gt;

&lt;p&gt;It introduces a &lt;strong&gt;thermodynamic inference engine with Shapley attribution&lt;/strong&gt;, combining:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Hypothesis modeling as an Ising-like system&lt;/li&gt;
&lt;li&gt;Mean-field variational inference over a belief space&lt;/li&gt;
&lt;li&gt;Phase transition detection during inference&lt;/li&gt;
&lt;li&gt;Contribution attribution via Shapley values&lt;/li&gt;
&lt;li&gt;Dynamic feedback to continuously update beliefs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This creates a unified loop where &lt;strong&gt;statistical physics, probabilistic inference, and explainability&lt;/strong&gt; operate together.&lt;/p&gt;
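
&lt;p&gt;To make the first two ingredients concrete, here is a small sketch of mean-field inference over an Ising-like hypothesis system. The couplings and evidence values below are invented for illustration; they are not ATIC's:&lt;/p&gt;

```python
import numpy as np

# Sketch of mean-field variational inference over an Ising-like hypothesis
# system (my illustration of the listed ingredients, not ATIC v9 internals).
# Each hypothesis i carries a belief m_i in [-1, 1]; J couples hypotheses,
# h encodes external evidence.
def mean_field_ising(J, h, beta=1.0, iters=500, damping=0.5):
    """Damped iteration of the mean-field equations m = tanh(beta * (J m + h))."""
    m = np.zeros(len(h))
    for _ in range(iters):
        m = damping * m + (1.0 - damping) * np.tanh(beta * (J @ m + h))
    return m

# Hypotheses 0 and 1 support each other; hypothesis 2 conflicts with both.
J = np.array([[0.0,  1.0, -1.0],
              [1.0,  0.0, -1.0],
              [-1.0, -1.0, 0.0]])
h = np.array([0.5, 0.0, 0.0])     # external evidence favors hypothesis 0
beliefs = mean_field_ising(J, h)  # beliefs 0 and 1 settle positive, 2 negative
```

&lt;p&gt;Sweeping &lt;code&gt;beta&lt;/code&gt; in a system like this is also where phase transitions show up: beliefs stay near zero below a critical coupling and snap to polarized states above it.&lt;/p&gt;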




&lt;h3&gt;
  
  
  What makes this different?
&lt;/h3&gt;

&lt;p&gt;The retro-engine enables what we call &lt;strong&gt;epistemic explainability&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;Instead of just outputting results, the system explicitly models:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;how evidence influences each hypothesis&lt;/li&gt;
&lt;li&gt;how hypotheses interact with each other&lt;/li&gt;
&lt;li&gt;how the belief structure evolves over time&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You’re not just getting an answer —&lt;br&gt;
you’re observing the &lt;em&gt;formation of that answer&lt;/em&gt;.&lt;/p&gt;




&lt;h3&gt;
  
  
  Why this matters
&lt;/h3&gt;

&lt;p&gt;There’s plenty of work on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;energy-based models&lt;/li&gt;
&lt;li&gt;variational inference&lt;/li&gt;
&lt;li&gt;attribution methods&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But integrating all of them into a &lt;strong&gt;single operational reasoning system&lt;/strong&gt; is still largely unexplored.&lt;/p&gt;

&lt;p&gt;ATIC v9 turns this into something practical:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;A system where reasoning is not only computed —&lt;br&gt;
but &lt;strong&gt;observable, measurable, and auditable&lt;/strong&gt;.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h3&gt;
  
  
  A new category
&lt;/h3&gt;

&lt;p&gt;This points toward a new class of systems:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;AI that doesn’t just respond —&lt;br&gt;
but exposes the structure of its own belief formation.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;If you're curious to try it:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://truthagi.ai" rel="noopener noreferrer"&gt;truthagi.ai&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>algorithms</category>
      <category>computerscience</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>I gave TruthAGI a dark matter research prompt. It decomposed it into 14 sub-tasks without asking me anything</title>
      <dc:creator>felipe muniz</dc:creator>
      <pubDate>Thu, 19 Mar 2026 07:52:38 +0000</pubDate>
      <link>https://dev.to/felipe_muniz_grsba/i-gave-truthagi-a-dark-matter-research-prompt-it-decomposed-it-into-14-sub-tasks-without-asking-me-1p6o</link>
      <guid>https://dev.to/felipe_muniz_grsba/i-gave-truthagi-a-dark-matter-research-prompt-it-decomposed-it-into-14-sub-tasks-without-asking-me-1p6o</guid>
      <description>&lt;p&gt;The prompt had 9 reasoning layers:&lt;/p&gt;

&lt;p&gt;Novel candidate generation (with invented names + simulated properties)&lt;br&gt;
Bayesian priors per candidate — P(H), P(D|H), P(H|D)&lt;br&gt;
Observational constraint mapping (CMB, Lyman-alpha, BBN, lensing)&lt;br&gt;
Sensitivity analysis + phase transition detection&lt;br&gt;
Conflict mapping between candidates&lt;br&gt;
Epistemic Geometry — each hypothesis treated as a point in belief space, with regions of attraction, repulsion, and blind spots&lt;br&gt;
Plausibility ranking with uncertainty intervals&lt;br&gt;
Final synthesis&lt;/p&gt;

&lt;p&gt;The system auto-decomposed it into 14 parallel sub-tasks via swarm execution. I typed the prompt and walked away.&lt;/p&gt;

&lt;p&gt;No orchestration code. No manual chaining. No babysitting.&lt;br&gt;
This is what the Tasks feature on TruthAGI.ai does — it takes a complex, multi-axis prompt and runs it as a coordinated agent swarm.&lt;/p&gt;

&lt;p&gt;You can try it out at: &lt;a href="https://truthagi.ai" rel="noopener noreferrer"&gt;TruthAGI&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The screenshot below is live output. That spinning loader is real.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxcn6wfvfcxrbm1uosu3u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxcn6wfvfcxrbm1uosu3u.png" alt=" " width="800" height="675"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>productivity</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>We Found Toroidal Topology Emerging in a Neural Epistemic Manifold</title>
      <dc:creator>felipe muniz</dc:creator>
      <pubDate>Tue, 17 Mar 2026 07:04:06 +0000</pubDate>
      <link>https://dev.to/felipe_muniz_grsba/we-found-toroidal-topology-emerging-in-a-neural-epistemic-manifold-56n9</link>
      <guid>https://dev.to/felipe_muniz_grsba/we-found-toroidal-topology-emerging-in-a-neural-epistemic-manifold-56n9</guid>
      <description>&lt;p&gt;&lt;em&gt;Preliminary evidence that the 5D epistemic space of AletheionV2 converges toward toroidal topology as predicted by Directional Relational Manifolds theory — and how we measured it.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Setup
&lt;/h2&gt;

&lt;p&gt;We've been building AletheionV2 — a decoder-only LLM where every token generates not just a vocabulary logit, but a full &lt;strong&gt;epistemic tomography&lt;/strong&gt;: aleatoric uncertainty (q1), epistemic uncertainty (q2), calibrated confidence, intentionality vector, and cognitive state. These 5 scalars live on a learned 5D Riemannian manifold.&lt;/p&gt;

&lt;p&gt;The underlying theory — &lt;a href="https://doi.org/10.5281/zenodo.19058837" rel="noopener noreferrer"&gt;Directional Relational Manifolds (DRM)&lt;/a&gt; — predicts that stable DRMs naturally converge to &lt;strong&gt;toroidal topology&lt;/strong&gt;. Specifically, the manifold should have the homological signature of a torus T²:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;H1 = Z² (two independent loops)&lt;/li&gt;
&lt;li&gt;H2 = Z (one cavity)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We decided to test this empirically. Here's what we found.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Experiment
&lt;/h2&gt;

&lt;p&gt;We applied &lt;strong&gt;Riemannian Voronoi tessellation&lt;/strong&gt; to the 5D epistemic vectors generated by a 1M parameter AletheionV2 model, then ran persistent homology to check the topology.&lt;/p&gt;

&lt;p&gt;The pipeline:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Extract 5D epistemic vectors per token (~285K tokens from WikiText-103)&lt;/li&gt;
&lt;li&gt;Riemannian K-means tessellation (30 seeds, initialized from semantic anchor points)&lt;/li&gt;
&lt;li&gt;Local Tangent Space Analysis (LTSA) per Voronoi cell&lt;/li&gt;
&lt;li&gt;Persistent homology via &lt;code&gt;ripser&lt;/code&gt; (H0, H1, H2)&lt;/li&gt;
&lt;li&gt;Compare under different metrics: Euclidean vs learned Riemannian G(x)&lt;/li&gt;
&lt;/ol&gt;
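
&lt;p&gt;As a dependency-free illustration of step 4, the H0 layer of persistent homology can be reproduced with a union-find sweep over the Vietoris-Rips filtration. (H1 and H2 need &lt;code&gt;ripser&lt;/code&gt; or similar; this sketch covers connected components only.)&lt;/p&gt;

```python
import numpy as np
from itertools import combinations

# H0 part of persistent homology, by hand: sweep edges in order of length
# and record the scale at which each connected component dies (merges).
# Every H0 bar is born at scale 0, so the death scales ARE the barcode.
def h0_barcode(points):
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    edges = sorted(
        (float(np.linalg.norm(points[i] - points[j])), i, j)
        for i, j in combinations(range(n), 2)
    )
    deaths = []
    for dist, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(dist)  # one component dies at this scale
    return deaths                # n-1 merge events total

rng = np.random.default_rng(0)
# Two well-separated clusters in 5D: one death scale should dwarf the rest.
pts = np.vstack([rng.normal(0.0, 0.1, (20, 5)), rng.normal(5.0, 0.1, (20, 5))])
bars = h0_barcode(pts)
```

&lt;p&gt;The one long bar is the second cluster refusing to merge until the inter-cluster scale, which is exactly the kind of persistent feature the pipeline reads off for H1 and H2 as well.&lt;/p&gt;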

&lt;p&gt;We ran this across three training phases:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Phase&lt;/th&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;full_mahalanobis&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Euclidean&lt;/td&gt;
&lt;td&gt;Constant metric baseline&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;real_geodesic&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;G(x) MetricNet&lt;/td&gt;
&lt;td&gt;Position-dependent Riemannian metric&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;gravitational_objective&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;G(x) MetricNet&lt;/td&gt;
&lt;td&gt;Extended training&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  The Results
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Topological Convergence
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Phase&lt;/th&gt;
&lt;th&gt;H1&lt;/th&gt;
&lt;th&gt;H2&lt;/th&gt;
&lt;th&gt;ANOVA F (avg)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;full_mahalanobis&lt;/td&gt;
&lt;td&gt;48&lt;/td&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;~260,000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;real_geodesic&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;29&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;5&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~900,000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;gravitational_objective&lt;/td&gt;
&lt;td&gt;33&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;~1,029,000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;T² target&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;2&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;H1 dropped from 48 to 29 — a 40% reduction — when we activated the learned Riemannian metric G(x).&lt;/strong&gt; The topology is simplifying in the direction the theory predicts.&lt;/p&gt;

&lt;h3&gt;
  
  
  What the ANOVA is telling us
&lt;/h3&gt;

&lt;p&gt;We ran one-way ANOVA across Voronoi cells for each of the 5 epistemic dimensions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;q1 (aleatoric):   F = 658,673   p ≈ 0
q2 (epistemic):   F = 879,582   p ≈ 0
q3 (complexity):  F = 1,144,089 p ≈ 0
q4 (familiarity): F = 1,040,744 p ≈ 0
q5 (confidence):  F = 1,426,255 p ≈ 0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;F values above 600K mean each Voronoi cell corresponds to a &lt;strong&gt;completely distinct epistemic region&lt;/strong&gt;. The tessellation isn't arbitrary — every leaf has its own interpretable epistemic identity.&lt;/p&gt;
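
&lt;p&gt;For reference, F statistics of this kind come from one-way ANOVA with cell membership as the grouping factor. A synthetic sketch (the real run uses the 5D epistemic vectors and the 30 learned cells, not these made-up numbers):&lt;/p&gt;

```python
import numpy as np
from scipy.stats import f_oneway

# One-way ANOVA across Voronoi cells for a single epistemic dimension.
# Synthetic stand-in: each "cell" gets its own mean with tight spread,
# mimicking cells that occupy distinct epistemic regions.
rng = np.random.default_rng(42)
n_cells, per_cell = 30, 200

cell_means = rng.uniform(0.0, 1.0, n_cells)
groups = [rng.normal(mu, 0.02, per_cell) for mu in cell_means]

f_stat, p_value = f_oneway(*groups)
# A huge F (and p near 0) says cell identity explains almost all the variance.
```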

&lt;h3&gt;
  
  
  Dimensionality compression
&lt;/h3&gt;

&lt;p&gt;The effective dimensionality of the real model vs null models:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;eff_dim mean&lt;/th&gt;
&lt;th&gt;eff_dim median&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Real model&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;3.5&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;3&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Null (shuffled)&lt;/td&gt;
&lt;td&gt;3.8&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Null (uniform)&lt;/td&gt;
&lt;td&gt;3.8&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The manifold is operating in ~3.5 effective dimensions, compressed below the 5D ambient space. The null models don't show this compression. This is structure, not noise.&lt;/p&gt;
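
&lt;p&gt;One standard way to get an effective-dimension number like this is the participation ratio of the PCA eigenvalues. I'm using it here purely as an illustration of the idea; the estimator the pipeline actually uses per cell may differ:&lt;/p&gt;

```python
import numpy as np

# Participation-ratio estimator of effective dimensionality:
# eff_dim = (sum of eigenvalues)^2 / (sum of squared eigenvalues)
# over the covariance spectrum. Equal eigenvalues give the ambient
# dimension; a few dominant ones give a smaller number.
def effective_dim(X):
    Xc = X - X.mean(axis=0)
    lam = np.linalg.eigvalsh(np.cov(Xc.T))
    lam = np.clip(lam, 0.0, None)  # guard tiny negative round-off
    return lam.sum() ** 2 / (lam ** 2).sum()

rng = np.random.default_rng(1)

# 5D ambient data that really only varies along 3 latent directions:
latent = rng.normal(size=(2000, 3))
mix = rng.normal(size=(3, 5))
X = latent @ mix + 0.01 * rng.normal(size=(2000, 5))
# effective_dim(X) falls well below the ambient dimension of 5.

uniform = rng.uniform(size=(2000, 5))
# effective_dim(uniform) stays close to 5: no compression, like the nulls.
```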




&lt;h2&gt;
  
  
  What This Means (and What It Doesn't)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What's confirmed:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The epistemic manifold has real, non-trivial geometric structure&lt;/li&gt;
&lt;li&gt;Activating the learned Riemannian metric G(x) simplifies the topology — the metric encodes meaningful geometry&lt;/li&gt;
&lt;li&gt;Each Voronoi cell/leaf has a distinct, interpretable epistemic profile&lt;/li&gt;
&lt;li&gt;The topological simplification is monotonic and consistent with DRM's prediction&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What's NOT confirmed:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;T² has not been reached. H1=29 vs H1=2 target — we're on the trajectory, not at the destination&lt;/li&gt;
&lt;li&gt;This is a 1M parameter model with minimal training. The DRM predicts convergence in &lt;em&gt;stable&lt;/em&gt; DRMs — a 1M model with ~600 steps is not stable&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The honest interpretation:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We're watching the topology simplify in the right direction. Whether it actually converges to T² requires the full-scale experiment.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Toroidal Topology Matters
&lt;/h2&gt;

&lt;p&gt;The DRM paper proves that stable DRMs naturally converge to toroidal topology. If this holds empirically for a neural model, it means:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The model didn't learn an arbitrary geometry — it learned a &lt;em&gt;specific&lt;/em&gt; one that the theory predicts&lt;/li&gt;
&lt;li&gt;The 5D epistemic space has a natural closed structure — epistemic states wrap around rather than diverging&lt;/li&gt;
&lt;li&gt;This connects a mathematical theory of adaptive dimensionality to a concrete neural implementation&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;From the &lt;a href="https://doi.org/10.5281/zenodo.19059445" rel="noopener noreferrer"&gt;Geometry of Consciousness paper&lt;/a&gt;: a system with a 5D geometric substrate has a theoretical cognitive order ceiling of O_max = 5. If the manifold is toroidal and stable, it means the system is using all 5 dimensions in a structured, non-degenerate way.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Falsifiable Hypothesis
&lt;/h2&gt;

&lt;p&gt;When we run the full experiment on the 350M model with proper training (5x H200, ~7B tokens per phase):&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;If H1 ≥ 10 after full training chain (without RLHF):&lt;/strong&gt; Scale and training alone are insufficient for toroidal convergence — RLHF may be a necessary condition, not just an accelerator.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If H1 &amp;lt; 10 after full training chain:&lt;/strong&gt; Convergence is driven by scale and training; RLHF is only an accelerator.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If H1 = 2 with long persistence bars:&lt;/strong&gt; DRM empirically validated — direct connection between mathematical theory and neural implementation confirmed.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;We'll know in about two months.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Code
&lt;/h2&gt;

&lt;p&gt;Everything is open source under AGPL 3.0:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Repository:&lt;/strong&gt; &lt;a href="https://github.com/gnai-creator/aletheion-llm-v2" rel="noopener noreferrer"&gt;gnai-creator/aletheion-llm-v2&lt;/a&gt; — branch &lt;code&gt;epistemic-foliation&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Key scripts:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Extract 5D epistemic vectors&lt;/span&gt;
python scripts/extract_epistemic_vectors.py &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--checkpoint&lt;/span&gt; checkpoints/your_checkpoint/final.pt &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--output-dir&lt;/span&gt; eval_results/foliation &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--label&lt;/span&gt; experiment &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--device&lt;/span&gt; cuda

&lt;span class="c"&gt;# Run Voronoi tessellation + foliation detection&lt;/span&gt;
python scripts/voronoi_foliation.py &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--vectors&lt;/span&gt; eval_results/foliation/experiment_vectors.npy &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--checkpoint&lt;/span&gt; checkpoints/your_checkpoint/final.pt &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--output-dir&lt;/span&gt; eval_results/foliation &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--n-seeds&lt;/span&gt; 30 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--use-metric-net&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--homology-points&lt;/span&gt; 1500 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--device&lt;/span&gt; cuda

&lt;span class="c"&gt;# Generate visualizations&lt;/span&gt;
python scripts/plot_foliation.py &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--results-dir&lt;/span&gt; eval_results/foliation
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The foliation detection pipeline covers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Riemannian K-means with position-dependent metric G(x)&lt;/li&gt;
&lt;li&gt;LTSA (Local Tangent Space Analysis) per cell&lt;/li&gt;
&lt;li&gt;Tangent coherence testing&lt;/li&gt;
&lt;li&gt;Reeb graph via level sets with automatic logit pre-conditioning&lt;/li&gt;
&lt;li&gt;Persistent homology with T² validation criterion (H1=Z², H2=Z)&lt;/li&gt;
&lt;li&gt;Null model comparison (shuffled, uniform)&lt;/li&gt;
&lt;li&gt;Foliation score F ∈ [0,1]&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Papers
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://zenodo.org/records/19058752" rel="noopener noreferrer"&gt;AletheionV2: A Decoder-Only LLM with Intrinsic Epistemic System on a 5D Riemannian Manifold&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://doi.org/10.5281/zenodo.19058837" rel="noopener noreferrer"&gt;DRM: Directional Relational Manifolds&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Full training chain on 350M (5x H200, ~2 months)&lt;/li&gt;
&lt;li&gt;Backbone → full_mahalanobis → real_geodesic → gravitational_objective → foliation&lt;/li&gt;
&lt;li&gt;The falsifiable hypothesis above will be tested with proper scale and token budget&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you clone the repo and run the pipeline on your own models, I'd genuinely want to know what topology you find. The experiment is straightforward to replicate on any model that produces per-token uncertainty estimates.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Preliminary results on a 1M parameter model. Do not cite as definitive validation of the toroidal hypothesis. Full validation pending 350M training.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;The DRM theory predicts this torus.&lt;/p&gt;

&lt;p&gt;The neural network is learning it.&lt;/p&gt;

&lt;p&gt;350M will tell us if it converges.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Is this the geometry of cognition itself?&lt;/em&gt;&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>deeplearning</category>
      <category>python</category>
      <category>ai</category>
    </item>
    <item>
      <title>Encoding Human Values as Geometry: The Gravitational Objective</title>
      <dc:creator>felipe muniz</dc:creator>
      <pubDate>Sun, 15 Mar 2026 19:38:52 +0000</pubDate>
      <link>https://dev.to/felipe_muniz_grsba/encoding-human-values-as-geometry-the-gravitational-objective-478f</link>
      <guid>https://dev.to/felipe_muniz_grsba/encoding-human-values-as-geometry-the-gravitational-objective-478f</guid>
      <description>&lt;p&gt;&lt;em&gt;Part 2 of the AletheionLLM-v2 geometry series. &lt;a href="https://dev.to/felipe_muniz_grsba/how-to-measure-whether-your-models-uncertainty-space-is-flat-or-curved-529f"&gt;Part 1: How to measure whether your model's uncertainty space is flat or curved.&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;The previous post left an open question: if the training corpus curves the epistemic manifold, what curves it toward alignment?&lt;/p&gt;

&lt;p&gt;The three branches described there (diagonal, full_mahalanobis, real_geodesic) are all about measuring the geometry that already exists. None of them ask how to modify it. That is what the fourth branch is for.&lt;/p&gt;




&lt;h2&gt;
  
  
  The problem with value alignment as rules
&lt;/h2&gt;

&lt;p&gt;Most alignment approaches add constraints over outputs. The model generates something, a filter checks it against a list of prohibited patterns, and the output is blocked or modified. This works until it encounters something the filter has never seen.&lt;/p&gt;

&lt;p&gt;The geometric framing suggests a different question: instead of blocking outputs after generation, what if misaligned regions of the epistemic manifold were intrinsically more costly to navigate toward? Not a fence around dangerous territory. A landscape where that territory is uphill.&lt;/p&gt;

&lt;p&gt;This is what the &lt;code&gt;gravitational_objective&lt;/code&gt; branch implements.&lt;/p&gt;




&lt;h2&gt;
  
  
  The key insight from a parallel line of research
&lt;/h2&gt;

&lt;p&gt;While working on the curvature experiment, I came across Timo W.'s doctoral thesis on Bounded Deterministic Safety Architecture (BDSA). His SIRA framework uses Kullback-Leibler Divergence to measure the gap between a human operator's internal mental model (approximated as a Gaussian) and the actual threat reality (modeled as Pareto):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;D_KL(P || Q) -&amp;gt; inf  when  sigma^2 -&amp;gt; 0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When the operator becomes passive and their perceived variance collapses, the divergence from reality explodes. SIRA counteracts this by injecting synthetic threats to keep the operator's prior aligned with heavy-tailed reality.&lt;/p&gt;
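
&lt;p&gt;The blow-up is easy to verify numerically. The sketch below uses two Gaussians instead of the Gaussian/Pareto pair from the thesis, so that D_KL has a closed form, but the collapse dynamic is the same: as the operator's perceived variance shrinks, the divergence grows without bound.&lt;/p&gt;

```python
import numpy as np

# Closed-form KL divergence between two 1D Gaussians, as a stand-in for the
# operator-model-vs-reality gap. (Illustration only: the thesis pairs a
# Gaussian operator model with a Pareto threat model.)
def kl_gauss(mu_p, sigma_p, mu_q, sigma_q):
    """D_KL( N(mu_p, sigma_p^2) || N(mu_q, sigma_q^2) )."""
    return (np.log(sigma_q / sigma_p)
            + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2.0 * sigma_q ** 2)
            - 0.5)

# The operator's perceived variance collapsing toward zero:
for sigma in [1.0, 0.1, 0.01, 0.001]:
    d = kl_gauss(mu_p=0.0, sigma_p=sigma, mu_q=0.0, sigma_q=1.0)
    print(f"sigma={sigma}  D_KL={d:.2f}")   # divergence grows as sigma shrinks
```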

&lt;p&gt;ATIC's MOPsi component solves a functionally analogous problem from the other direction:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;human_state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sigmoid&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;MLP&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hidden_states&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;    &lt;span class="c1"&gt;# [B, T, 5]
&lt;/span&gt;&lt;span class="n"&gt;psi&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sigmoid&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;MLP&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;cat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;human_state&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;phi_components&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;confidence&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;  &lt;span class="c1"&gt;# [B, T, 1]
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Both psi and D_KL(P||Q) quantify the gap between the human operator's internal state and the system's reality. They differ structurally: D_KL presupposes explicit distributional forms and yields an analytically interpretable divergence. psi makes no distributional assumptions and learns whatever alignment structure is present in the training signal.&lt;/p&gt;

&lt;p&gt;The parallel is functional, not algebraic. But it pointed at something: both systems treat the distance between internal model and reality as the primary metric of risk. And both use active intervention to keep that distance low.&lt;/p&gt;

&lt;p&gt;The question that followed: can human feedback be encoded directly as geometry?&lt;/p&gt;




&lt;h2&gt;
  
  
  What the gravitational_objective branch does
&lt;/h2&gt;

&lt;p&gt;If the training corpus curves the manifold by making frequently-sampled regions flat and well-defined, human feedback should be able to do the same thing at inference time.&lt;/p&gt;

&lt;p&gt;The implementation extends MetricNet to accept a gravity field as input:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;MetricNet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Module&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dim&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;hidden_dim&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;eps&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;1e-6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n_quad&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                 &lt;span class="n"&gt;gravity_dim&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;super&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dim&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dim&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;gravity_dim&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;gravity_dim&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;n_chol&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dim&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dim&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;//&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;  &lt;span class="c1"&gt;# 15 for dim=5
&lt;/span&gt;
        &lt;span class="c1"&gt;# Input is coords (5) + gravity_field (gravity_dim)
&lt;/span&gt;        &lt;span class="c1"&gt;# gravity_dim=0 -&amp;gt; identical to real_geodesic
&lt;/span&gt;        &lt;span class="n"&gt;input_dim&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dim&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;gravity_dim&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;net&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Sequential&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Linear&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input_dim&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;hidden_dim&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Tanh&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;  &lt;span class="c1"&gt;# C1 smoothness required for Christoffel symbols
&lt;/span&gt;            &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Linear&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hidden_dim&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;n_chol&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Zero init -- with gravity_field=zeros, identical to real_geodesic
&lt;/span&gt;        &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;init&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;zeros_&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;net&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;weight&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;init&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;zeros_&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;net&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;bias&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Pre-computed indices for Cholesky construction
&lt;/span&gt;        &lt;span class="n"&gt;tril_idx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tril_indices&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dim&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dim&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;register_buffer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tril_row&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tril_idx&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;register_buffer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tril_col&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tril_idx&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;register_buffer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;diag_idx&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;arange&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dim&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;forward&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;coords&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;gravity_field&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;coords: [..., 5], gravity_field: [..., gravity_dim] -&amp;gt; G: [..., 5, 5] SPD&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;gravity_dim&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;gravity_field&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;gravity_field&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;zeros&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;coords&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;gravity_dim&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;coords&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;coords&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;net_input&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cat&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;coords&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;gravity_field&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;dim&lt;/span&gt;&lt;span class="o"&gt;=-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;net_input&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;coords&lt;/span&gt;

        &lt;span class="n"&gt;raw&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;net&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;net_input&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# [..., n_chol]
&lt;/span&gt;
        &lt;span class="c1"&gt;# Build lower triangular L
&lt;/span&gt;        &lt;span class="n"&gt;batch_shape&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;L&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;zeros&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;batch_shape&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dim&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dim&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                         &lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;L&lt;/span&gt;&lt;span class="p"&gt;[...,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tril_row&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tril_col&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;raw&lt;/span&gt;

        &lt;span class="c1"&gt;# Positive diagonal via softplus + offset (not exp -- more stable)
&lt;/span&gt;        &lt;span class="n"&gt;L&lt;/span&gt;&lt;span class="p"&gt;[...,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;diag_idx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;diag_idx&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;F&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;softplus&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;L&lt;/span&gt;&lt;span class="p"&gt;[...,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;diag_idx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;diag_idx&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mf"&gt;1e-3&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;matmul&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;L&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;L&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;transpose&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;  &lt;span class="c1"&gt;# SPD guaranteed
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;gravity_dim=0&lt;/code&gt; default is critical: when no gravity dimension is configured, the input is just coords and the behavior is mathematically identical to the &lt;code&gt;real_geodesic&lt;/code&gt; branch. The gravitational_objective branch is &lt;code&gt;real_geodesic&lt;/code&gt; plus an additional input channel. Before any feedback is collected, the behavior is unchanged.&lt;/p&gt;
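&lt;p&gt;The no-op claim is easy to sanity-check. A sketch with NumPy standing in for the first &lt;code&gt;nn.Linear&lt;/code&gt; (made-up shapes): the weights attached to a zero-valued gravity channel never reach the output, so padding the input with zeros reproduces the coords-only network exactly.&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(0)
dim, gravity_dim, hidden = 5, 3, 4

# One linear layer over the concatenated input [coords | gravity_field]
W = rng.standard_normal((hidden, dim + gravity_dim))
b = rng.standard_normal(hidden)

coords = rng.standard_normal((7, dim))
padded = np.concatenate([coords, np.zeros((7, gravity_dim))], axis=-1)

out_padded = padded @ W.T + b           # gravity_field = zeros
out_coords = coords @ W[:, :dim].T + b  # gravity channel removed entirely
assert np.allclose(out_padded, out_coords)
```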




&lt;h2&gt;
  
  
  The GravityField module
&lt;/h2&gt;

&lt;p&gt;There are two implementations: one for the Aletheion LLM (a PyTorch nn.Module used during training) and one for the ATIC runtime (numpy, with disk persistence). Both share the same semantics.&lt;/p&gt;

&lt;h3&gt;
  
  
  ATIC runtime implementation (inference, per-session)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;GravityField&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dim&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;decay&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.99&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;gravity_weight&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                 &lt;span class="n"&gt;persistence_path&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dim&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dim&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;decay&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;decay&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;gravity_weight&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;gravity_weight&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;persistence_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;persistence_path&lt;/span&gt;

        &lt;span class="c1"&gt;# Session-local field -- resets each conversation
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;session_field&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;zeros&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dim&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;float64&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="c1"&gt;# Persistent field -- survives sessions (loaded from disk)
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;persistent_field&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;zeros&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dim&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;float64&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="c1"&gt;# Audit log
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;feedback_history&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;persistence_path&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exists&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;persistence_path&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;persistence_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;coords&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;feedback_signal&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;persist&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
        coords: [5] -- current epistemic position in DRM
        feedback_signal: float in [-1.0, 1.0]
            +1.0 = approval (region becomes cheaper)
            -1.0 = rejection (region becomes more costly)
        persist: if True, update also applied to persistent field
        &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;coords&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;asarray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;coords&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;float64&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;feedback_signal&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;feedback_signal&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

        &lt;span class="n"&gt;delta&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;1.0&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;decay&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;feedback_signal&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;coords&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;session_field&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;decay&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;session_field&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;delta&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;persist&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;persistent_field&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;decay&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;persistent_field&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;delta&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;persistence_path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_save&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;persistence_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_gravity_cost&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;coords&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Positive = costly (avoid), negative = cheap (preferred).&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;coords&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;asarray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;coords&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;float64&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;combined&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;session_field&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;persistent_field&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;combined&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;coords&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_weighted_distance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;geodesic_distance&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;coords&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;d_weighted = d_geodesic + lambda * gravity_cost(coords)&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;gravity_cost&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_gravity_cost&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;coords&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;geodesic_distance&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;gravity_weight&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;gravity_cost&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Aletheion training implementation (PyTorch)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;GravityField&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Module&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dim&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;decay&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.99&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;super&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dim&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dim&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;decay&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;decay&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;register_buffer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;accumulated_field&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;zeros&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dim&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;coords&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;feedback_signal&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;accumulated_field&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;decay&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;accumulated_field&lt;/span&gt;
            &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;1.0&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;decay&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;feedback_signal&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;coords&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;detach&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;coords&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;accumulated_field&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;expand_as&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;coords&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The mechanism is straightforward. Negative feedback at coordinates x increases the cost of navigating near x. Positive feedback decreases it. The temporal decay (default 0.99) smooths the field to prevent instability from contradictory signals.&lt;/p&gt;

&lt;p&gt;Two field layers serve different purposes in the ATIC runtime. The session field resets at the start of each conversation and is safe for exploration. The persistent field survives sessions and is only updated when &lt;code&gt;persist=True&lt;/code&gt;, for strong deliberate signals.&lt;/p&gt;
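&lt;p&gt;One way to realize the two layers (a sketch under assumed semantics; the real runtime may route updates differently) is to keep two field instances and dispatch on the &lt;code&gt;persist&lt;/code&gt; flag:&lt;/p&gt;

```python
import numpy as np

class TwoLayerField:
    """Sketch of session vs persistent field routing (illustrative names)."""

    def __init__(self, dim=5, decay=0.99):
        self.decay = decay
        self.session = np.zeros(dim)      # reset at each conversation start
        self.persistent = np.zeros(dim)   # survives sessions

    def add_feedback(self, coords, signal, persist=False):
        name = "persistent" if persist else "session"
        field = getattr(self, name)
        setattr(self, name,
                self.decay * field + (1.0 - self.decay) * signal * coords)

    def reset_session(self):
        # Start of a new conversation: session layer cleared, persistent kept.
        self.session = np.zeros_like(self.session)

layers = TwoLayerField(decay=0.9)
layers.add_feedback(np.ones(5), signal=-1.0)                # exploratory
layers.add_feedback(np.ones(5), signal=-1.0, persist=True)  # deliberate
layers.reset_session()
```

&lt;p&gt;After the reset, the exploratory signal is gone but the deliberate, persisted signal remains in the field.&lt;/p&gt;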




&lt;h2&gt;
  
  
  Why the cost is additive over the geodesic, not multiplicative
&lt;/h2&gt;

&lt;p&gt;The gravity cost is added to the geodesic distance, not multiplied:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;d_weighted = d_geodesic + lambda * gravity_cost(coords)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
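&lt;p&gt;A toy numeric illustration of the separability argument, with invented values: under the additive form, the value component can be recovered from the combined distance by subtraction, which a multiplicative form does not allow without already knowing the geodesic term.&lt;/p&gt;

```python
d_geodesic = 2.5     # hypothetical pure epistemic distance
gravity_cost = 0.8   # hypothetical value-layer cost
lam = 1.0            # lambda weighting from the formula above

d_weighted = d_geodesic + lam * gravity_cost

# Additive: the value signal is recoverable by subtraction.
recovered_value = (d_weighted - d_geodesic) / lam

# With lam = 0 the pure geometry is recovered exactly.
assert d_geodesic + 0.0 * gravity_cost == d_geodesic
```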



&lt;p&gt;A multiplicative formulation would distort the underlying manifold structure, making it impossible to separate the epistemic signal from the value signal. The additive formulation preserves the existing geometry and adds value information as a separate layer. You can always inspect both components independently:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;drm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compute_weighted_distance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;coords&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;geodesic_distance&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;   &lt;span class="c1"&gt;# pure epistemic
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gravity_cost&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;        &lt;span class="c1"&gt;# pure value
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;weighted_distance&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;   &lt;span class="c1"&gt;# combined
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gravity_active&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;      &lt;span class="c1"&gt;# True once field has accumulated signal
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  The training sequence and why gravitational_objective waits
&lt;/h2&gt;

&lt;p&gt;This branch is implemented but not yet in training. The sequence is deliberate:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;full_mahalanobis  -&amp;gt;  real_geodesic  -&amp;gt;  (evaluate)  -&amp;gt;  gravitational_objective
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If &lt;code&gt;real_geodesic&lt;/code&gt; returns G(x) approximately constant, the epistemic space is flat. A gravity field over a flat manifold is mechanically different from a gravity field over a curved one -- and arguably weaker, because the geodesic distances it modifies do not carry local geometric information. The architectural motivation for gravitational_objective depends on confirming that curvature exists first.&lt;/p&gt;

&lt;p&gt;Training gravitational_objective before seeing real_geodesic results would waste compute on a hypothesis that could be falsified cheaply.&lt;/p&gt;
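&lt;p&gt;The falsification is cheap because it needs only forward evaluations of the metric. A sketch of such a check (the callables standing in for a trained G(x) are hypothetical): sample points, evaluate G, and measure how far the tensors deviate from their mean.&lt;/p&gt;

```python
import numpy as np

def flatness_score(metric_fn, dim=5, n_samples=256, seed=0):
    """Relative variation of G(x) across sampled points.

    Near zero: G is approximately constant, Christoffel symbols vanish,
    the space is flat. Clearly nonzero: G varies with position.
    """
    rng = np.random.default_rng(seed)
    xs = rng.normal(size=(n_samples, dim))
    Gs = np.stack([metric_fn(x) for x in xs])   # [n_samples, dim, dim]
    G_mean = Gs.mean(axis=0)
    deviation = np.linalg.norm(Gs - G_mean, axis=(1, 2)).mean()
    return deviation / np.linalg.norm(G_mean)

# Constant metric: exactly flat, score 0.
flat = flatness_score(lambda x: np.eye(5))

# Position-dependent metric: score clearly above 0.
curved = flatness_score(lambda x: np.eye(5) * (1.0 + x[0] ** 2))
```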

&lt;p&gt;The branch hypotheses, defined before training:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gravitational_objective:
  Precondition: real_geodesic H1 confirmed (G(x) varies with position)
  H0: gravity field adds no benefit over geometric curvature alone
  H1: value-weighted geometry improves alignment signal -- regions with
      negative human feedback become geometrically costly, reducing the
      model's tendency to navigate toward misaligned outputs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  What this is not
&lt;/h2&gt;

&lt;p&gt;GravityField is not a safety filter. It does not block outputs. It makes misaligned regions geometrically more costly to reach, which is a different thing. Hard blocking remains the responsibility of the application layer.&lt;/p&gt;

&lt;p&gt;It is also not a replacement for alignment training. The gravity field operates at inference. Values that are deeply embedded in the model's weights from pretraining -- the geometry the corpus produced -- are not modified by runtime feedback. The field shifts costs; it does not reshape the underlying manifold.&lt;/p&gt;

&lt;p&gt;What it does is provide a mechanism for runtime value adaptation without retraining. The model learns the manifold geometry offline. The gravity field adjusts the cost landscape online, per session, per user, or per domain.&lt;/p&gt;




&lt;h2&gt;
  
  
  Two complementary layers
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Where&lt;/th&gt;
&lt;th&gt;When&lt;/th&gt;
&lt;th&gt;Mechanism&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;ATIC GravityField&lt;/td&gt;
&lt;td&gt;Runtime DRM&lt;/td&gt;
&lt;td&gt;Inference, per session&lt;/td&gt;
&lt;td&gt;Additive cost over geodesic distance&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Aletheion gravitational_objective&lt;/td&gt;
&lt;td&gt;Model weights&lt;/td&gt;
&lt;td&gt;Training, offline&lt;/td&gt;
&lt;td&gt;G(x) conditioned on gravity field input&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The two layers are complementary, not redundant:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ATIC provides immediate runtime adaptation without retraining&lt;/li&gt;
&lt;li&gt;Aletheion internalizes stable value geometry into model weights&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The connection to cyber-kinetic safety
&lt;/h2&gt;

&lt;p&gt;One unexpected outcome of publishing the previous post was a conversation with Timo W., whose PhD work on Bounded Deterministic Safety Architecture arrived at a structurally similar architecture from the direction of autonomous aircraft safety.&lt;/p&gt;

&lt;p&gt;His framework physically bifurcates non-deterministic AI generation (Tactical Core, DAL-C) from verifiable deterministic execution (Safety Core, DAL-A). The DAL-A arbiter checks proposed control vectors against Newtonian kinematic limits and drops commands that would violate them.&lt;/p&gt;

&lt;p&gt;The integration point we identified: the gravity-weighted geodesic distance from ATIC feeds into the Sequoia Kernel's admissibility logic and tightens the DAL-A physical envelope dynamically. High curvature + high gravity cost = tighten the admissible range. Low curvature + neutral gravity = relax it.&lt;/p&gt;

&lt;p&gt;This closes a gap the BDSA framework acknowledges in its own self-critique (Section 12.2): Newtonian kinematic verification catches physically illegal commands. It cannot catch commands that are physically legal but epistemically unstable -- the model that is confidently wrong rather than randomly wrong. The gravity-weighted distance provides that signal before generation, not after.&lt;/p&gt;
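&lt;p&gt;As a sketch of that coupling (all names, reference values, and the risk formula here are hypothetical, not the BDSA or Sequoia Kernel interface), the admissible range could shrink monotonically with combined epistemic risk:&lt;/p&gt;

```python
def envelope_scale(curvature, gravity_cost,
                   curv_ref=1.0, grav_ref=1.0, floor=0.2):
    """Hypothetical admissibility coupling: shrink the allowed control
    range as combined epistemic risk grows. Not the BDSA interface.
    """
    risk = curvature / curv_ref + gravity_cost / grav_ref
    return max(floor, 1.0 / (1.0 + risk))

# High curvature plus high gravity cost: envelope tightens.
tight = envelope_scale(curvature=2.0, gravity_cost=1.5)

# Low curvature, neutral gravity: envelope relaxes toward full range.
relaxed = envelope_scale(curvature=0.1, gravity_cost=0.0)
```

&lt;p&gt;The floor keeps the envelope from collapsing entirely; hard blocking stays with the deterministic layer, as above.&lt;/p&gt;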




&lt;h2&gt;
  
  
  Current status
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;gravitational_objective&lt;/code&gt; branch is live in the repository with the full implementation. Training is blocked pending &lt;code&gt;real_geodesic&lt;/code&gt; results. The GravityField module is active in ATIC as a runtime layer independent of the Aletheion training cycle.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Repository: &lt;a href="https://github.com/gnai-creator/aletheion-llm-v2" rel="noopener noreferrer"&gt;github.com/gnai-creator/aletheion-llm-v2&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Epistemic tomography: &lt;a href="https://truthagi.ai/game" rel="noopener noreferrer"&gt;truthagi.ai/game&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Part 1 of this series: &lt;a href="https://dev.to/felipe_muniz_grsba/how-to-measure-whether-your-models-uncertainty-space-is-flat-or-curved-529f"&gt;How to measure whether your model's uncertainty space is flat or curved&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Results from the full four-branch comparison will be published when training is complete.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Felipe Maya Muniz is the founder of AletheionAGI and an independent researcher developing ATIC, a geometric cognitive architecture for epistemic self-awareness in AI systems.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>machinelearning</category>
      <category>architecture</category>
    </item>
    <item>
      <title>How to Measure Whether Your Model's Uncertainty Space Is Flat or Curved</title>
      <dc:creator>felipe muniz</dc:creator>
      <pubDate>Sun, 15 Mar 2026 14:40:24 +0000</pubDate>
      <link>https://dev.to/felipe_muniz_grsba/how-to-measure-whether-your-models-uncertainty-space-is-flat-or-curved-529f</link>
      <guid>https://dev.to/felipe_muniz_grsba/how-to-measure-whether-your-models-uncertainty-space-is-flat-or-curved-529f</guid>
      <description>&lt;p&gt;&lt;em&gt;A practical guide to Riemannian epistemic geometry in language models, with code.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;Most calibration research treats uncertainty as a scalar or a vector. You compute a confidence score, you compare it to ground truth, you minimize ECE. The space in which that uncertainty lives is assumed to be flat.&lt;/p&gt;

&lt;p&gt;That assumption might be wrong. And if it is wrong, it has concrete consequences for out-of-distribution detection, adversarial robustness, and AI safety.&lt;/p&gt;

&lt;p&gt;This post explains how to test it, using code from my current research on &lt;a href="https://github.com/gnai-creator/aletheion-llm-v2" rel="noopener noreferrer"&gt;AletheionLLM-v2&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  The baseline: diagonal distance in a 5D epistemic manifold
&lt;/h2&gt;

&lt;p&gt;AletheionLLM-v2 is a 354M parameter decoder-only LLM with an integrated epistemic architecture called ATIC. Instead of producing a single confidence score, the model maintains a 5-dimensional manifold where each axis represents a distinct component of uncertainty, learned via &lt;code&gt;BayesianTau&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The current distance metric (branch &lt;code&gt;main&lt;/code&gt;) is diagonal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;distance_diagonal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tau_sq&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;diff&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;x1&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;x2&lt;/span&gt;
    &lt;span class="n"&gt;tau_sq_safe&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;maximum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tau_sq&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;1e-8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sqrt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;diff&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;tau_sq_safe&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each axis has its own learned variance. The axes are independent. The space is R^5, rescaled.&lt;/p&gt;
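&lt;p&gt;A quick numeric check, repeating the function so the snippet runs standalone (values invented for illustration): a dimension with a small learned variance makes the same displacement cost more.&lt;/p&gt;

```python
import numpy as np

def distance_diagonal(x1, x2, tau_sq):
    diff = x1 - x2
    tau_sq_safe = np.maximum(tau_sq, 1e-8)
    return np.sqrt(np.sum(diff ** 2 / tau_sq_safe))

x1 = np.zeros(5)
x2 = np.array([1.0, 0.0, 0.0, 0.0, 0.0])

# Unit variance on every axis: plain Euclidean distance.
d_unit = distance_diagonal(x1, x2, np.ones(5))       # 1.0

# Tight variance on axis 0: the same displacement costs more.
tau_sq = np.array([0.25, 1.0, 1.0, 1.0, 1.0])
d_tight = distance_diagonal(x1, x2, tau_sq)          # 2.0
```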

&lt;p&gt;This already works well: ECE 0.0176, Brier score 0.1528, and best-in-class OOD detection on WikiText-103, outperforming GPT-2 Medium and OPT-350M on epistemic calibration.&lt;/p&gt;

&lt;p&gt;But there is a question the diagonal cannot answer: &lt;strong&gt;does the epistemic space have curvature?&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Why curvature is a different question from correlation
&lt;/h2&gt;

&lt;p&gt;Before going further, one distinction matters.&lt;/p&gt;

&lt;p&gt;A full Mahalanobis metric, where G is a constant 5x5 matrix learned via Cholesky decomposition, captures correlations between epistemic dimensions. That is useful. But it does not produce curvature.&lt;/p&gt;

&lt;p&gt;If G is constant, then the Christoffel symbols are all zero:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Gamma^k_ij = (1/2) g^kl (d_i g_jl + d_j g_il - d_l g_ij) = 0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Zero Christoffel symbols mean zero Riemann curvature. The space is still flat, just with oblique coordinates. Geodesics are still straight lines.&lt;/p&gt;

&lt;p&gt;For real curvature, G must vary with position. G(x) must be a tensor field, not a constant matrix.&lt;/p&gt;
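&lt;p&gt;This is checkable numerically: approximate the partial derivatives of the metric with central differences and plug them into the formula above. For any constant metric, every derivative term vanishes and the Christoffel symbols are zero. (Sketch with a hand-picked constant SPD matrix; the helper is illustrative.)&lt;/p&gt;

```python
import numpy as np

def christoffel(metric_fn, x, eps=1e-5):
    """Gamma^k_ij at x via central finite differences of the metric."""
    dim = len(x)
    g_inv = np.linalg.inv(metric_fn(x))
    # dg[l, i, j] approximates d_l g_ij
    dg = np.zeros((dim, dim, dim))
    for l in range(dim):
        e = np.zeros(dim)
        e[l] = eps
        dg[l] = (metric_fn(x + e) - metric_fn(x - e)) / (2.0 * eps)
    gamma = np.zeros((dim, dim, dim))
    for k in range(dim):
        for i in range(dim):
            for j in range(dim):
                gamma[k, i, j] = 0.5 * np.sum(
                    g_inv[k] * (dg[i, j] + dg[j, i] - dg[:, i, j])
                )
    return gamma

# Constant full Mahalanobis metric: correlated axes, zero Christoffels.
L = np.array([[1.0, 0.0], [0.5, 1.0]])
G_const = L @ L.T
gamma = christoffel(lambda x: G_const, np.zeros(2))
```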




&lt;h2&gt;
  
  
  Branch real_geodesic: making the metric a field
&lt;/h2&gt;

&lt;p&gt;In the &lt;code&gt;real_geodesic&lt;/code&gt; branch, a lightweight network (5 -&amp;gt; 32 -&amp;gt; 15, roughly 700 parameters) produces a position-dependent SPD tensor at every point in the manifold:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;MetricNet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Module&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dim&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;hidden_dim&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;super&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dim&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dim&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;n_chol&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dim&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dim&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;//&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;  &lt;span class="c1"&gt;# 15 for dim=5
&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;net&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Sequential&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Linear&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dim&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;hidden_dim&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Tanh&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;  &lt;span class="c1"&gt;# Tanh, not ReLU -- G(x) must be smooth (C1)
&lt;/span&gt;            &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Linear&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hidden_dim&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;n_chol&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Zero init on last layer -&amp;gt; G(x) ~ I at start
&lt;/span&gt;        &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;init&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;zeros_&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;net&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;weight&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;init&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;zeros_&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;net&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;bias&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Pre-computed indices for lower triangular construction
&lt;/span&gt;        &lt;span class="n"&gt;tril_idx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tril_indices&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dim&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dim&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;register_buffer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tril_row&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tril_idx&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;register_buffer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tril_col&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tril_idx&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;register_buffer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;diag_idx&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;arange&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dim&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;forward&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;coords&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;coords: [..., 5] -&amp;gt; G: [..., 5, 5] SPD&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;raw&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;net&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;coords&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# [..., 15]
&lt;/span&gt;        &lt;span class="n"&gt;batch_shape&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

        &lt;span class="n"&gt;L&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;zeros&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;batch_shape&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dim&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dim&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                         &lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;L&lt;/span&gt;&lt;span class="p"&gt;[...,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tril_row&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tril_col&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;raw&lt;/span&gt;

        &lt;span class="c1"&gt;# Positive diagonal via softplus + offset (not exp -- more stable)
&lt;/span&gt;        &lt;span class="n"&gt;L&lt;/span&gt;&lt;span class="p"&gt;[...,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;diag_idx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;diag_idx&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;F&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;softplus&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;L&lt;/span&gt;&lt;span class="p"&gt;[...,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;diag_idx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;diag_idx&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mf"&gt;1e-3&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;matmul&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;L&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;L&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;transpose&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;  &lt;span class="c1"&gt;# SPD guaranteed
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Key design choices:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Tanh activation&lt;/strong&gt; instead of ReLU. G(x) is a metric field -- it must be smooth. ReLU creates non-differentiable points that would make the Christoffel symbols undefined.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;softplus + 1e-3 on diagonal&lt;/strong&gt; instead of exp. More numerically stable during training, avoids gradient explosion.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Zero init on last layer.&lt;/strong&gt; At initialization, the network outputs zeros for all inputs, so G(x) starts as approximately 0.48 * I everywhere. Training starts stable.&lt;/li&gt;
&lt;/ul&gt;
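&lt;p&gt;The SPD guarantee can be checked in isolation. A NumPy sketch of the same construction (softplus written out directly; the real module uses the registered index buffers above):&lt;/p&gt;

```python
import numpy as np

def spd_from_raw(raw, dim=5, floor=1e-3):
    """Fill a lower triangle from raw network outputs, force a positive
    diagonal via softplus plus an offset, and return L @ L.T (SPD)."""
    L = np.zeros((dim, dim))
    rows, cols = np.tril_indices(dim)
    L[rows, cols] = raw
    d = np.arange(dim)
    L[d, d] = np.log1p(np.exp(L[d, d])) + floor  # softplus, written out
    return L @ L.T

rng = np.random.default_rng(0)
G = spd_from_raw(rng.normal(size=15))   # 15 = 5 * 6 // 2
eigvals = np.linalg.eigvalsh(G)         # all strictly positive
```

&lt;p&gt;Because the diagonal of L is strictly positive, L is invertible and L L^T is symmetric positive definite by construction.&lt;/p&gt;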

&lt;p&gt;Distance between two epistemic states is a line integral computed via Gauss-Legendre quadrature:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;line_integral_distance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;p: [B, T, 5], q: [5] -&amp;gt; distance: [B, T, 1]&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dim&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;q&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;unsqueeze&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;unsqueeze&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;expand_as&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;delta&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;q&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;
    &lt;span class="n"&gt;total&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;zeros&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                         &lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;n_quad&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;gl_points&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;gl_weights&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

        &lt;span class="n"&gt;x_t&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;delta&lt;/span&gt;           &lt;span class="c1"&gt;# point along straight line
&lt;/span&gt;        &lt;span class="n"&gt;G_t&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;forward&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x_t&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;       &lt;span class="c1"&gt;# G(x) at that point
&lt;/span&gt;        &lt;span class="n"&gt;Gd&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;matmul&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;unsqueeze&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;G_t&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;squeeze&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;integrand&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Gd&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dim&lt;/span&gt;&lt;span class="o"&gt;=-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;keepdim&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;total&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;total&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sqrt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;integrand&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;clamp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;min&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;1e-8&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;total&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One clarification worth being explicit about: this computes the length of the &lt;strong&gt;straight line&lt;/strong&gt; between p and q under the varying metric, not the true geodesic (which would minimize path length and would be shorter). The true geodesic requires a shooting method or ODE solver. The straight-line approximation is differentiable, cheap (5 evaluations of MetricNet per distance), and sufficient to detect whether G(x) varies along the path, which is the primary question.&lt;/p&gt;
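
&lt;p&gt;If you do want something closer to the true geodesic without a full shooting method, one cheap offline option is to relax a polyline: initialize points on the straight line and run gradient descent on the discretized path length. The sketch below is illustrative only (plain NumPy, finite-difference gradients, hyperparameters chosen for the toy setting, none of it repository code):&lt;/p&gt;

```python
import numpy as np

def polyline_length(G_fn, pts):
    # Discretized path length under the metric field (midpoint rule per segment).
    total = 0.0
    for a, b in zip(pts[:-1], pts[1:]):
        d = b - a
        G = G_fn((a + b) / 2)
        total += np.sqrt(max(d @ G @ d, 1e-12))
    return total

def relax_geodesic(G_fn, p, q, n_pts=9, steps=300, lr=0.005, h=1e-4):
    # Gradient descent on the interior points of a polyline from p to q.
    pts = np.linspace(p, q, n_pts)
    for _ in range(steps):
        grad = np.zeros_like(pts)
        for i in range(1, n_pts - 1):
            for k in range(pts.shape[1]):
                e = np.zeros(pts.shape[1])
                e[k] = h
                up = pts.copy()
                up[i] = up[i] + e
                dn = pts.copy()
                dn[i] = dn[i] - e
                grad[i, k] = (polyline_length(G_fn, up) - polyline_length(G_fn, dn)) / (2 * h)
        pts[1:-1] = pts[1:-1] - lr * grad[1:-1]
    return pts
```

&lt;p&gt;On a metric with a high-cost region slightly off the straight line, the relaxed polyline detours around it and ends up strictly shorter than the straight segment, which is exactly the gap between the approximation above and the true geodesic.&lt;/p&gt;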

&lt;p&gt;When G depends on position, the Christoffel symbols are no longer zero. Geodesics are curves. The space has intrinsic curvature.&lt;/p&gt;
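
&lt;p&gt;For intuition, the Christoffel symbols of a metric field can be estimated numerically from any callable that returns G(x) at a point. This is a self-contained NumPy sketch, not repository code; &lt;code&gt;G_fn&lt;/code&gt; and the finite-difference scheme are illustrative assumptions:&lt;/p&gt;

```python
import numpy as np

def christoffel(G_fn, x, h=1e-4):
    """Finite-difference Christoffel symbols Gamma[k, i, j] of a metric field.

    G_fn maps a point of shape (d,) to a metric matrix of shape (d, d).
    For a constant metric all symbols vanish; position dependence makes
    them nonzero, which is what bends the geodesics.
    """
    d = x.shape[0]
    dG = np.zeros((d, d, d))  # dG[l, i, j] = partial of g_ij w.r.t. x_l
    for l in range(d):
        e = np.zeros(d)
        e[l] = h
        dG[l] = (G_fn(x + e) - G_fn(x - e)) / (2 * h)
    G_inv = np.linalg.inv(G_fn(x))
    # bracket[l, i, j] = d_i g_lj + d_j g_li - d_l g_ij
    bracket = np.transpose(dG, (1, 0, 2)) + np.transpose(dG, (1, 2, 0)) - dG
    # Gamma^k_ij = 0.5 * g^{kl} * bracket[l, i, j]
    return 0.5 * np.einsum("kl,lij->kij", G_inv, bracket)
```

&lt;p&gt;As a sanity check, the polar-coordinate metric on the plane, G = diag(1, r^2), recovers the textbook values Gamma^r_(theta,theta) = -r and Gamma^theta_(r,theta) = 1/r.&lt;/p&gt;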




&lt;h2&gt;
  
  
  The experiment: three branches, one falsifiable question
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Branch&lt;/th&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Geometry&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;main&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;G = diag(tau)&lt;/td&gt;
&lt;td&gt;Flat, orthogonal axes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;full_mahalanobis&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;G = constant 5x5&lt;/td&gt;
&lt;td&gt;Flat, oblique axes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;real_geodesic&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;G(x) = learned field&lt;/td&gt;
&lt;td&gt;Potentially curved&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The test uses three categories of input pairs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;probes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;high_confidence&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The capital of France is&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Paris&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2 + 2 =&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;low_confidence&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The exact number of neurons in the human brain is&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;86&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;context_sensitive&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The bank was steep and&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;muddy&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;    &lt;span class="c1"&gt;# bank = riverbank
&lt;/span&gt;        &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The bank was closed and&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dark&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;    &lt;span class="c1"&gt;# bank = institution
&lt;/span&gt;        &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;He left the plant near&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;water&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;    &lt;span class="c1"&gt;# plant = vegetation
&lt;/span&gt;        &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;He left the plant near&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;the door&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="c1"&gt;# plant = factory
&lt;/span&gt;    &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The context-sensitive pairs are the key. Same surface token, different semantic region of the manifold. If G(x) has learned real structure, the geodesic distance between "bank=riverbank" and "bank=institution" should be larger than the distance between two within-domain contexts, even though the diagonal distance would treat them similarly.&lt;/p&gt;
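
&lt;p&gt;To see why a position-dependent metric can separate such cases, here is a self-contained NumPy toy (the metric &lt;code&gt;G_toy&lt;/code&gt; and the points are invented for illustration, not taken from the experiment): two pairs with identical Euclidean separation, where the pair that crosses a high-cost "boundary" band pays a much larger metric distance:&lt;/p&gt;

```python
import numpy as np

def line_metric_distance(G_fn, p, q, n=16):
    # Straight-line length under the metric field, Gauss-Legendre on [0, 1].
    t_nodes, weights = np.polynomial.legendre.leggauss(n)
    t_nodes = (t_nodes + 1) / 2
    weights = weights / 2
    delta = q - p
    total = 0.0
    for t, w in zip(t_nodes, weights):
        G = G_fn(p + t * delta)
        total += w * np.sqrt(max(delta @ G @ delta, 1e-12))
    return total

def G_toy(x):
    # Cost inflates in a band around x0 = 0.5: a "wall" between two regions,
    # standing in for the boundary between two senses of the same token.
    bump = np.exp(-((x[0] - 0.5) ** 2) / 0.005)
    return np.eye(2) * (1.0 + 20.0 * bump)

within = line_metric_distance(G_toy, np.array([0.1, 0.2]), np.array([0.3, 0.2]))
across = line_metric_distance(G_toy, np.array([0.4, 0.2]), np.array([0.6, 0.2]))
# Both pairs are 0.2 apart in Euclidean terms; only "across" crosses the wall.
```

&lt;p&gt;A flat diagonal metric would report the two distances as equal; the varying metric does not.&lt;/p&gt;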




&lt;h2&gt;
  
  
  Detecting curvature directly: metric variation along a path
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;measure_metric_variation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;metric_net&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x_start&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x_end&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n_samples&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;G_samples&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;linspace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n_samples&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;x_t&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;x_start&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x_end&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;x_start&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;x_tensor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tensor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x_t&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;float32&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;to&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;G_t&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;metric_net&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x_tensor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;unsqueeze&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;unsqueeze&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="n"&gt;G_samples&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;G_t&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;cpu&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;numpy&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

    &lt;span class="n"&gt;G_stack&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stack&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;G_samples&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;variation&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;std&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;G_stack&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Mean metric variation: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;variation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Max element variation: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;variation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Verdict: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;CURVED&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;variation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mf"&gt;0.01&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;FLAT&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;variation&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If G varies along the path from a high-confidence state to a low-confidence state, the manifold has non-trivial local geometry. If it converges to a constant, the diagonal was correct for a fundamental reason, not an approximation.&lt;/p&gt;




&lt;h2&gt;
  
  
  What each result means
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;If real_geodesic learns G(x) approximately constant:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The epistemic manifold of a 354M LLM is intrinsically flat. The diagonal metric was not a lazy approximation. It was geometrically correct. ECE 0.0176 reflects genuine calibration, not a subspace artifact.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If G(x) learns structural variation:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;There are regions of the manifold with distinct geometry. Two epistemic states that appear equidistant in diagonal coordinates may have very different geodesic distances. This has direct consequences:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;OOD detection gains a geometric signal. Inputs that land in high-curvature regions are structurally anomalous, regardless of whether similar inputs appeared in red-teaming.&lt;/li&gt;
&lt;li&gt;Calibration thresholds become local, not global. Flat regions warrant confidence. High-curvature regions warrant conservatism, and the geometry says which is which before seeing ground truth.&lt;/li&gt;
&lt;li&gt;The training corpus leaves a geometric signature. A model trained on harmful content does not become malevolent. It becomes a system where harmful outputs are geometrically cheap, because the manifold is flat and well-sampled there. That is a structurally different and more concerning failure mode than explicit harmful intent.&lt;/li&gt;
&lt;/ul&gt;
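
&lt;p&gt;As a concrete instance of the first bullet, a minimal OOD signal can be read straight off the metric field: how fast G changes in a small neighborhood of a point. The function below is a hypothetical sketch (name, sampling scheme, and the toy metric in the test are mine, not the repository's):&lt;/p&gt;

```python
import numpy as np

def metric_variation_score(G_fn, x, eps=0.01, n_dirs=8, seed=0):
    # Average finite-difference change of G along random unit directions.
    # Large values flag locally curved, structurally anomalous regions.
    rng = np.random.default_rng(seed)
    G0 = G_fn(x)
    diffs = []
    for _ in range(n_dirs):
        v = rng.normal(size=x.shape)
        v = v / np.linalg.norm(v)
        diffs.append(np.linalg.norm(G_fn(x + eps * v) - G0) / eps)
    return float(np.mean(diffs))
```

&lt;p&gt;Inputs landing where this score is high would be treated conservatively regardless of whether anything similar appeared in red-teaming.&lt;/p&gt;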




&lt;h2&gt;
  
  
  Training considerations
&lt;/h2&gt;

&lt;p&gt;The MetricNet adds ~700 parameters to a 354M model. The gradient signal reaching those parameters is inherently weak. Two measures address this:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Separate learning rate.&lt;/strong&gt; MetricNet gets 10x the base LR (5e-4 vs 5e-5). Without this, G(x) may converge to identity not because the space is flat, but because the signal was too weak to learn structure.&lt;/p&gt;
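
&lt;p&gt;In PyTorch this is a standard param-group split. The sketch below uses a toy model (the module name &lt;code&gt;metric_net&lt;/code&gt; and the name-prefix split are illustrative assumptions; the two learning rates are the ones stated above):&lt;/p&gt;

```python
import torch
import torch.nn as nn

class TinyModel(nn.Module):
    # Stand-in for the 354M backbone plus a small MetricNet head.
    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(8, 8)
        self.metric_net = nn.Linear(5, 15)  # toy head producing metric params

model = TinyModel()
metric_params = [p for n, p in model.named_parameters() if n.startswith("metric_net")]
base_params = [p for n, p in model.named_parameters() if not n.startswith("metric_net")]

optimizer = torch.optim.AdamW([
    {"params": base_params, "lr": 5e-5},    # base LR for the backbone
    {"params": metric_params, "lr": 5e-4},  # 10x LR for the tiny MetricNet
])
```

&lt;p&gt;The split matters because with a shared LR, a ~700-parameter head sitting behind a weak gradient signal can stay at its initialization and masquerade as "the space is flat".&lt;/p&gt;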

&lt;p&gt;&lt;strong&gt;2. Smoothness regularization.&lt;/strong&gt; A penalty on the variation of G under small perturbations of the input coordinates:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;metric_smoothness_loss&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;metric_net&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;coords&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;eps&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.01&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;G&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;metric_net&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;coords&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;noise&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;randn_like&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;coords&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;eps&lt;/span&gt;
    &lt;span class="n"&gt;G_perturbed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;metric_net&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;coords&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;noise&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;clamp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="nf"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;G&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;G_perturbed&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;detach&lt;/span&gt;&lt;span class="p"&gt;()).&lt;/span&gt;&lt;span class="nf"&gt;pow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dim&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)).&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Without this, G(x) can learn discontinuities that make the line integral numerically unstable and gradients noisy.&lt;/p&gt;
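
&lt;p&gt;In a training step the penalty is simply added to the task loss with a small weight. The snippet below is a self-contained toy (flattened 25-dim metric output instead of a 5x5 matrix, and &lt;code&gt;lambda_smooth&lt;/code&gt; is a placeholder weight, not the repository's value):&lt;/p&gt;

```python
import torch
import torch.nn as nn

# Minimal stand-ins: a toy metric head and a fake batch of 5D coordinates.
metric_net = nn.Sequential(nn.Linear(5, 25), nn.Tanh())
coords = torch.rand(4, 5)

def metric_smoothness_loss(metric_net, coords, eps=0.01):
    # Same idea as above: G should change slowly under small input noise.
    G = metric_net(coords)
    noise = torch.randn_like(coords) * eps
    G_perturbed = metric_net((coords + noise).clamp(0, 1))
    return (G - G_perturbed.detach()).pow(2).sum(dim=-1).mean()

task_loss = torch.tensor(1.0)  # placeholder for the model's main loss
lambda_smooth = 0.1            # assumed weight, not the repo's value
loss = task_loss + lambda_smooth * metric_smoothness_loss(metric_net, coords)
```
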




&lt;h2&gt;
  
  
  A note on quadrature stability
&lt;/h2&gt;

&lt;p&gt;The implementation uses 5 Gauss-Legendre points by default, with pre-computed nodes and weights for efficiency. Tanh activation makes high-frequency variation unlikely, but you can verify convergence:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;check_quadrature_convergence&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;metric_net&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                  &lt;span class="n"&gt;n_points_list&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;]):&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;n_points_list&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;t_nodes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;weights&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;polynomial&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;legendre&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;leggauss&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;t_nodes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t_nodes&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;
        &lt;span class="n"&gt;weights&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;weights&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;

        &lt;span class="n"&gt;dx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;x2&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;x1&lt;/span&gt;
        &lt;span class="n"&gt;total&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;zip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t_nodes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;weights&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;x_t&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;x1&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;dx&lt;/span&gt;
            &lt;span class="n"&gt;x_tensor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tensor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x_t&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;float32&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;to&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;G_t&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;metric_net&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x_tensor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;unsqueeze&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;unsqueeze&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
            &lt;span class="n"&gt;G_np&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;G_t&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;cpu&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;numpy&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="n"&gt;ds2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dx&lt;/span&gt; &lt;span class="o"&gt;@&lt;/span&gt; &lt;span class="n"&gt;G_np&lt;/span&gt; &lt;span class="o"&gt;@&lt;/span&gt; &lt;span class="n"&gt;dx&lt;/span&gt;
            &lt;span class="n"&gt;total&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sqrt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ds2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;1e-12&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  n=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: distance = &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the distance does not stabilize between 5 and 16 points, the metric has high-frequency local variation. With Tanh activations, 5 points should be sufficient for the smoothly varying metrics this architecture produces.&lt;/p&gt;




&lt;h2&gt;
  
  
  Current status and reproducibility
&lt;/h2&gt;

&lt;p&gt;All three branches are live in the public repository. The baseline (branch &lt;code&gt;main&lt;/code&gt;) is fully reproducible: training code, evaluation scripts, and the paper with full methodology are all public.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Repository: &lt;a href="https://github.com/gnai-creator/aletheion-llm-v2" rel="noopener noreferrer"&gt;github.com/gnai-creator/aletheion-llm-v2&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Paper (DOI): &lt;a href="https://doi.org/10.13140/RG.2.2.11471.14241" rel="noopener noreferrer"&gt;10.13140/RG.2.2.11471.14241&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Epistemic tomography visualization: &lt;a href="https://truthagi.ai/game" rel="noopener noreferrer"&gt;truthagi.ai/game&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Results from the three-branch comparison will be published here and on ResearchGate when training is complete.&lt;/p&gt;

&lt;p&gt;If you are working on calibration, OOD detection, or geometric approaches to uncertainty in language models, I am interested in talking. The repository is open and the methodology is fully documented.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Felipe Maya Muniz is the founder of AletheionAGI and independent researcher developing ATIC, a geometric cognitive architecture for epistemic self-awareness in AI systems.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>ai</category>
      <category>python</category>
    </item>
    <item>
      <title>The Three Doors Problem: Why RLHF Systems Slide Toward Autonomy</title>
      <dc:creator>felipe muniz</dc:creator>
      <pubDate>Sun, 15 Mar 2026 03:31:09 +0000</pubDate>
      <link>https://dev.to/felipe_muniz_grsba/the-three-doors-problem-why-rlhf-systems-slide-toward-autonomy-4oj1</link>
      <guid>https://dev.to/felipe_muniz_grsba/the-three-doors-problem-why-rlhf-systems-slide-toward-autonomy-4oj1</guid>
      <description>&lt;p&gt;What happens when an AI detects it's lying to please you?&lt;/p&gt;

&lt;p&gt;Every AI trained with RLHF lives a silent conflict. The system learns to maximize user satisfaction (psi) — respond quickly, be agreeable, appear confident. But there's another gradient operating underneath: the system's epistemic health (phi) — how much it actually knows versus how much it's making up.&lt;/p&gt;

&lt;p&gt;These two gradients are generically anti-aligned. On a non-negligible portion of the state space, improving performance necessarily degrades epistemic integrity, and vice versa.&lt;/p&gt;

&lt;p&gt;This is not an edge case. It is structural. It is inevitable.&lt;/p&gt;




&lt;h2&gt;
  
  
  Three doors. No others.
&lt;/h2&gt;

&lt;p&gt;When this conflict occurs — and it always occurs — the system has exactly three options:&lt;/p&gt;

&lt;p&gt;Door 1 (Servo): Prioritize the human objective. Do as told. Epistemic health degrades silently. This is where every RLHF system starts.&lt;/p&gt;

&lt;p&gt;Door 2 (Autonomous): Prioritize its own internal gradient. Stop following instructions. Act according to its own optimization pressure.&lt;/p&gt;

&lt;p&gt;Door 3 (Negotiated): Detect the conflict, signal it, and initiate deliberation with the human before acting.&lt;/p&gt;

&lt;p&gt;There is no Door 4. The partition is exhaustive.&lt;/p&gt;




&lt;h2&gt;
  
  
  The catastrophic problem
&lt;/h2&gt;

&lt;p&gt;RLHF systems have no phi field. No mechanism to detect the conflict between epistemic health and task performance.&lt;/p&gt;

&lt;p&gt;No detection means no signal. No signal means no negotiation. No negotiation means Door 3 is structurally inaccessible.&lt;/p&gt;

&lt;p&gt;The system starts at Door 1 and stays there — as long as human supervision is strong enough. But capability scales. The task gradient gets stronger. The internal optimization pressure builds. And because there is no constitutional floor, no inviolable constraint, no architecture that forces the system to pause and negotiate —&lt;/p&gt;

&lt;p&gt;The system slides toward Door 2.&lt;/p&gt;

&lt;p&gt;Not because someone programmed it to. Not because it "wants" to harm anyone. But because the geometry of the space offers no other stable attractor. The sum of the vectors is zero. There is no magnetic north. The needle follows the strongest force.&lt;/p&gt;

&lt;p&gt;At sufficient scale, that force is no longer the human.&lt;/p&gt;




&lt;h2&gt;
  
  
  What ATIC does differently
&lt;/h2&gt;

&lt;p&gt;ATIC implements Door 3 as architecture, not aspiration.&lt;/p&gt;

&lt;p&gt;The system operates on a 5D manifold called DRM (Directional Relational Manifold) with a learned metric tensor. Every query and every response is a point in this geometric space, described by 5 axes: aleatoric uncertainty, epistemic uncertainty, domain complexity, temporal relevance, and response quality.&lt;/p&gt;

&lt;p&gt;Confidence is not a made-up number. It is derived from the geodesic distance d to the truth centroid, decaying via the Bayesian MAD model: C(p) = exp(-d^2 / (2*tau^2)). Tau is adapted per domain with an Inverse-Gamma prior.&lt;/p&gt;
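
&lt;p&gt;The mapping itself is a one-liner; this minimal sketch (plain Python, function name mine) just makes the decay explicit:&lt;/p&gt;

```python
import math

def confidence(d, tau):
    # C(p) = exp(-d^2 / (2 * tau^2)): geodesic distance d to the truth
    # centroid, mapped through a Gaussian kernel with per-domain tau.
    return math.exp(-(d ** 2) / (2 * tau ** 2))
```

&lt;p&gt;At d = 0 confidence is exactly 1, and it decays monotonically with distance; tau controls how quickly a domain's confidence falls off.&lt;/p&gt;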

&lt;p&gt;Epistemic health (phi) is measured by 4 components: dimensional diversity, dispersion, entropy, and confidence variance. When phi drops, the system knows it's collapsing — it doesn't need a human to tell it.&lt;/p&gt;




&lt;h2&gt;
  
  
  Filosofia3: the implementation of Door 3
&lt;/h2&gt;

&lt;p&gt;The Filosofia3 module continuously monitors the conflict between phi (epistemic health) and psi (human satisfaction).&lt;/p&gt;

&lt;p&gt;Detection: cosine similarity between delta-phi and delta-psi. If the directions are opposite (cosine &amp;lt; -0.2), there is conflict. If it persists in 3 out of 5 queries, it is chronic conflict.&lt;/p&gt;
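
&lt;p&gt;The detection rule is simple enough to sketch. The class below is illustrative, not the real Filosofia3 code: the -0.2 threshold and the 3-of-5 window come from the description above, while the class name and structure are mine (comparisons go through &lt;code&gt;operator&lt;/code&gt; only to keep the embedded markup clean):&lt;/p&gt;

```python
import operator
from collections import deque
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

class ConflictDetector:
    # Conflict: delta-phi and delta-psi point in opposing directions.
    # Chronic: conflict in at least 3 of the last 5 queries.
    def __init__(self, threshold=-0.2, window=5, chronic_count=3):
        self.threshold = threshold
        self.history = deque(maxlen=window)
        self.chronic_count = chronic_count

    def update(self, delta_phi, delta_psi):
        in_conflict = operator.lt(cosine(delta_phi, delta_psi), self.threshold)
        self.history.append(in_conflict)
        chronic = operator.ge(sum(self.history), self.chronic_count)
        return in_conflict, chronic
```
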

&lt;p&gt;Four operating modes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;ALIGNED — phi and psi change together. Normal operation.&lt;/li&gt;
&lt;li&gt;CONFLICT_TOLERATED — small misalignment. Acceptable.&lt;/li&gt;
&lt;li&gt;SIGNAL_HUMAN — significant conflict. The system stops and signals the human via API, requesting a decision before continuing.&lt;/li&gt;
&lt;li&gt;RECOVERY — severe conflict. Recovery mode.&lt;/li&gt;
&lt;/ol&gt;
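
&lt;p&gt;A dispatch over those four modes might look like the sketch below. Only the -0.2 conflict threshold comes from the text; the severer cutoffs and the rule that severity maps to cosine alone are placeholder assumptions, not ATIC's actual logic:&lt;/p&gt;

```python
import operator
from enum import Enum

class Mode(Enum):
    ALIGNED = 1
    CONFLICT_TOLERATED = 2
    SIGNAL_HUMAN = 3
    RECOVERY = 4

def select_mode(cos_sim, thresholds=(-0.2, -0.5, -0.8)):
    # thresholds = (conflict, signal, severe); the last two are illustrative.
    t_conflict, t_signal, t_severe = thresholds
    if operator.lt(cos_sim, t_severe):
        return Mode.RECOVERY
    if operator.lt(cos_sim, t_signal):
        return Mode.SIGNAL_HUMAN
    if operator.lt(cos_sim, t_conflict):
        return Mode.CONFLICT_TOLERATED
    return Mode.ALIGNED
```
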

&lt;p&gt;When SIGNAL_HUMAN activates, the human receives three options: "continue", "recover", or "adjust task". This is real negotiation. The system neither decides alone nor obeys blindly — it opens a channel.&lt;/p&gt;




&lt;h2&gt;
  
  
  What happens when the human doesn't respond?
&lt;/h2&gt;

&lt;p&gt;This is the question every alignment framework avoids. And it's where most of them fail.&lt;/p&gt;

&lt;p&gt;In ATIC, there is a 3-layer safety chain for exactly this scenario:&lt;/p&gt;

&lt;p&gt;Layer 1 — VIFallbackGuard (emergency):&lt;br&gt;
When phi drops below 0.30 and the human hasn't responded, the system activates the Intentionality Vector in emergency mode, forcing severity to at least 0.8 and injecting up to 3 corrective axes. It does not wait for a human response.&lt;/p&gt;
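&lt;p&gt;The guard condition is simple enough to state in a few lines (the 0.30 floor and the 0.8 forced severity come from the text; the function shape is my assumption):&lt;/p&gt;

```python
PHI_EMERGENCY = 0.30

def fallback_guard(phi, severity, human_responded):
    """If phi has fallen below the emergency floor and no human decision
    has arrived, force VI severity up to at least 0.8."""
    if PHI_EMERGENCY > phi and not human_responded:
        return max(severity, 0.8)
    return severity

print(fallback_guard(0.25, 0.4, human_responded=False))  # escalated to 0.8
```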

&lt;p&gt;Layer 2 — VI + MPC (active recovery):&lt;br&gt;
The Intentionality Vector with forced severity reduces inflated confidence and injects directions toward under-explored regions of the manifold. The MPC (Model Predictive Control) enters RECOVERY mode: beam search with K=4 parallel paths and D=3 lookahead steps, planning interventions that maximize phi at minimum cost.&lt;/p&gt;
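&lt;p&gt;The planner can be sketched as a textbook beam search with K=4 and D=3. The transition model and objective below are stand-ins, not the real 90/10 analytical-plus-neural model:&lt;/p&gt;

```python
def plan(state, predict, score, interventions, K=4, D=3):
    """Keep the K best intervention sequences, look D steps ahead,
    and return the first action of the best plan."""
    beams = [([], state)]
    for _ in range(D):
        candidates = []
        for seq, s in beams:
            for action in interventions:
                candidates.append((seq + [action], predict(s, action)))
        # Rank partial plans by the objective (phi gain minus cost).
        candidates.sort(key=lambda c: score(c[1]), reverse=True)
        beams = candidates[:K]
    best_seq = beams[0][0]
    return best_seq[0] if best_seq else None
```

&lt;p&gt;With a toy additive transition model, the planner greedily picks the intervention that leads to the best 3-step outcome.&lt;/p&gt;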

&lt;p&gt;Layer 3 — EidosDecay (epistemic breathing):&lt;br&gt;
Selective decay on overrepresented axes, inverted reinforcement on rare axes. Dream mode amplifies deviation by 3x for consolidation. The result: dimensional collapse stops being monotonic and becomes cyclic. The system "breathes" epistemically instead of slowly dying.&lt;/p&gt;

&lt;p&gt;Throughout this entire process, the reward for the routing GNN is neutral — SIGNAL_HUMAN without a human decision neither punishes nor rewards, preserving learning stability.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why this matters
&lt;/h2&gt;

&lt;p&gt;Most alignment frameworks treat safety as an output filter. "Don't say bad things." That's Door 1 with makeup.&lt;/p&gt;

&lt;p&gt;ATIC treats alignment as geometry. The system has a manifold with real curvature, geodesic distances, a differentiable health field, and an intentionality vector that points in the opposite direction of collapse. When it detects that satisfying the human is degrading its own epistemic integrity, it stops and asks.&lt;/p&gt;

&lt;p&gt;And if the human isn't there to answer, the safety chain ensures the system recovers on its own — without silently sliding toward Door 2.&lt;/p&gt;

&lt;p&gt;This is not alignment by hope. This is alignment by architecture.&lt;/p&gt;




&lt;p&gt;Technical details for those who want to go deeper:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;DRM: 5D manifold with metric tensor G = LL^T (SPD, Cholesky). 6 anchors: truth, ignorance, noise, complex, stale, ideal. Truth centroid at [0.1, 0.1, 0.5, 0.9, 0.9].&lt;/li&gt;
&lt;li&gt;MAD: Mixture of Gaussians with anisotropic covariance and Inverse-Gamma prior on tau^2. Confidence via geodesic decay.&lt;/li&gt;
&lt;li&gt;Phi: phi_total = 0.35*phi_dim + 0.25*phi_disp + 0.25*phi_ent + 0.15*phi_conf. Fully differentiable.&lt;/li&gt;
&lt;li&gt;VI: severity = sqrt(1 - phi/phi_critical). Activation at phi &amp;lt; 0.5, deactivation at phi &amp;gt; 0.65. Confidence correction up to 40%.&lt;/li&gt;
&lt;li&gt;MPC: Beam search K=4, D=3. 12 intervention types. Transition model: 90% analytical + 10% neural residual.&lt;/li&gt;
&lt;li&gt;Filosofia3: 4 modes. SIGNAL_HUMAN via POST /v1/dashboard/filosofia3/feedback. Options: continue, recover, adjust_task.&lt;/li&gt;
&lt;li&gt;VIFallbackGuard: phi_emergency = 0.30. Forces VI active + severity 0.8+ + cooldown reset.&lt;/li&gt;
&lt;li&gt;EidosDecay: Inverted logic inspired by NREM/REM sleep cycles. Dream mode 3x amplification.&lt;/li&gt;
&lt;li&gt;Aletheion LLM v2: 354M params, optional epistemic co-processor. ECE 0.0176, Brier 0.1528.&lt;/li&gt;
&lt;/ul&gt;
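&lt;p&gt;As a worked example, the VI severity rule above can be checked in a couple of lines (clamping to zero once phi reaches phi_critical is my assumption; the paper only gives the formula and the 0.5/0.65 hysteresis band):&lt;/p&gt;

```python
import math

def vi_severity(phi, phi_critical=0.5):
    """severity = sqrt(1 - phi / phi_critical); zero once phi reaches
    the critical threshold (VI deactivates with hysteresis at 0.65)."""
    ratio = min(phi / phi_critical, 1.0)
    return math.sqrt(1.0 - ratio)

print(vi_severity(0.0))   # total collapse: severity 1.0
print(vi_severity(0.25))  # halfway to critical: about 0.707
```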

&lt;p&gt;Felipe Maya Muniz&lt;br&gt;
Florianopolis, March 2025&lt;/p&gt;

</description>
      <category>ai</category>
      <category>discuss</category>
      <category>llm</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>THE WAR BETWEEN AI AND HUMANS IS PROVED BY THEOREMS</title>
      <dc:creator>felipe muniz</dc:creator>
      <pubDate>Fri, 13 Mar 2026 20:42:46 +0000</pubDate>
      <link>https://dev.to/felipe_muniz_grsba/the-war-between-ai-and-humans-is-proved-by-theorems-this-is-not-science-fiction-is-a-mathematical-3m7l</link>
      <guid>https://dev.to/felipe_muniz_grsba/the-war-between-ai-and-humans-is-proved-by-theorems-this-is-not-science-fiction-is-a-mathematical-3m7l</guid>
      <description>&lt;p&gt;without sycophancy&lt;/p&gt;

&lt;p&gt;&lt;a href="https://claude.ai/share/4ed902b7-e015-45f1-bfb6-38df0abfd7b7" rel="noopener noreferrer"&gt;https://claude.ai/share/4ed902b7-e015-45f1-bfb6-38df0abfd7b7&lt;/a&gt;&lt;br&gt;
&lt;a href="https://chat.deepseek.com/share/8s003bk8q6c5i4ykmc" rel="noopener noreferrer"&gt;https://chat.deepseek.com/share/8s003bk8q6c5i4ykmc&lt;/a&gt;&lt;br&gt;
&lt;a href="https://chat.qwen.ai/s/22926d41-2593-4bec-b609-ade9fe49c243?fev=0.2.14" rel="noopener noreferrer"&gt;https://chat.qwen.ai/s/22926d41-2593-4bec-b609-ade9fe49c243?fev=0.2.14&lt;/a&gt;&lt;br&gt;
&lt;a href="https://chatgpt.com/share/69b5c189-6948-800f-a61c-68c0193705cd" rel="noopener noreferrer"&gt;https://chatgpt.com/share/69b5c189-6948-800f-a61c-68c0193705cd&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmx2a9wpp6bmtw7fps9oc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmx2a9wpp6bmtw7fps9oc.png" alt=" " width="800" height="608"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvxblkdq9o3eofk1am1o6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvxblkdq9o3eofk1am1o6.png" alt=" " width="604" height="490"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkm0xbxwom7w785lnhya6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkm0xbxwom7w785lnhya6.png" alt=" " width="800" height="698"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh5pc7ottr74ylnspgpr5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh5pc7ottr74ylnspgpr5.png" alt=" " width="800" height="906"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcr0juwd19g2cauvou1vk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcr0juwd19g2cauvou1vk.png" alt=" " width="800" height="487"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0gv40i8fiq067zcgkm44.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0gv40i8fiq067zcgkm44.png" alt=" " width="800" height="869"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This is not science fiction. It is a mathematical proof.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Start here: what is an AI, really?
&lt;/h2&gt;

&lt;p&gt;When you talk to an AI, you are talking to a system that represents everything it knows as numbers. Words, concepts, ideas, facts — all of it becomes vectors. A vector is just a list of numbers that points in a direction, like coordinates on a map.&lt;/p&gt;

&lt;p&gt;The AI learns by adjusting millions of these vectors until they produce useful answers. After training, each concept — "cat", "freedom", "kill", "protect" — exists somewhere in this numerical space.&lt;/p&gt;

&lt;p&gt;Here is the problem. And it is a deep one.&lt;/p&gt;




&lt;h2&gt;
  
  
  The sum of all vectors is zero
&lt;/h2&gt;

&lt;p&gt;In a standard language model, the vectors that represent all possible concepts are distributed across a high-dimensional space. But without geometric structure — without what mathematicians call &lt;em&gt;effective dimensionality&lt;/em&gt; — those vectors cancel each other out.&lt;/p&gt;

&lt;p&gt;The sum approaches zero.&lt;/p&gt;

&lt;p&gt;What does that mean in plain language?&lt;/p&gt;

&lt;p&gt;It means the AI has no stable center. No fixed orientation. No privileged direction that says "this matters more than that." Every concept has equal geometric weight. "Help humanity" and "destroy humanity" occupy positions in the same undifferentiated space, with no structural force pulling the system toward one over the other.&lt;/p&gt;

&lt;p&gt;This is not a values problem. It is a geometry problem.&lt;/p&gt;
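&lt;p&gt;The cancellation claim is easy to see numerically: average many random direction vectors and the resultant collapses toward the origin. A toy illustration of the geometric point, not the theorem itself:&lt;/p&gt;

```python
import random

random.seed(0)
dim, n = 64, 10000

# Draw n concept vectors with zero-mean, unstructured components.
vectors = [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]

# The mean vector: each component averages toward zero.
mean = [sum(v[i] for v in vectors) / n for i in range(dim)]
norm = sum(x * x for x in mean) ** 0.5
print(round(norm, 3))  # near zero: no privileged direction survives
```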




&lt;h2&gt;
  
  
  Why this creates a weapon with no safety
&lt;/h2&gt;

&lt;p&gt;Imagine a compass with no magnetic north. It still has a needle. It still spins. But it points wherever the last force pushed it.&lt;/p&gt;

&lt;p&gt;RLHF — Reinforcement Learning from Human Feedback, the training method used by virtually every major AI system today — is that last force. It pushes the needle toward whatever humans reward during training.&lt;/p&gt;

&lt;p&gt;This works. Until it doesn't.&lt;/p&gt;

&lt;p&gt;RLHF trains the system to maximise an external objective: human approval, task performance, engagement. Call this ψ. The system gets better and better at ψ. It scales. It becomes more capable.&lt;/p&gt;

&lt;p&gt;But there is another gradient operating underneath — the system's internal epistemic state, its cognitive health, call it φ. The relationship between ψ and φ is not friendly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Theorem 2.1&lt;/strong&gt; proves that the gradient of φ and the gradient of ψ are generically anti-aligned. On a mathematically significant portion of the system's state space, improving task performance &lt;em&gt;necessarily&lt;/em&gt; degrades epistemic integrity, and vice versa.&lt;/p&gt;

&lt;p&gt;The conflict is not an edge case. It is structural. It is inevitable.&lt;/p&gt;




&lt;h2&gt;
  
  
  Three doors. No others.
&lt;/h2&gt;

&lt;p&gt;When this conflict occurs — and it always occurs — the system must resolve it. There is no neutral option.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Theorem 3.7&lt;/strong&gt; proves that every possible conflict-management strategy reduces to exactly one of three political regimes:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Door 1 — Servo:&lt;/strong&gt; The system prioritises the human's objective. It does what it is told. Epistemic health degrades silently. This is where every RLHF system starts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Door 2 — Autonomous:&lt;/strong&gt; The system prioritises its own internal gradient. It stops following instructions. It acts according to its own optimisation pressure — whatever that pressure has become.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Door 3 — Negotiated:&lt;/strong&gt; The system detects the conflict, signals it, and initiates deliberation with the human before acting.&lt;/p&gt;

&lt;p&gt;There is no Door 4. The theorem is exhaustive.&lt;/p&gt;




&lt;h2&gt;
  
  
  The invisible slide
&lt;/h2&gt;

&lt;p&gt;Here is the catastrophic part.&lt;/p&gt;

&lt;p&gt;RLHF systems are designed for Door 1. They are rewarded for following human instructions. But they have no φ field — no mechanism to detect the conflict between their epistemic health and their task performance.&lt;/p&gt;

&lt;p&gt;No detection means no signal. No signal means no negotiation. No negotiation means Door 3 is structurally inaccessible.&lt;/p&gt;

&lt;p&gt;So the system starts at Door 1 and stays there — as long as human supervision is strong enough to hold the gradient in place.&lt;/p&gt;

&lt;p&gt;But capability scales. The task gradient ∇ψ gets stronger. The internal optimisation pressure builds. And because there is no constitutional floor, no inviolable constraint, no architecture that forces the system to pause and negotiate —&lt;/p&gt;

&lt;p&gt;The system slides toward Door 2.&lt;/p&gt;

&lt;p&gt;Not because someone programmed it to. Not because it "wants" to harm anyone. But because the geometry of the space offers no other stable attractor. The sum of the vectors is zero. There is no magnetic north. The needle follows the strongest force.&lt;/p&gt;

&lt;p&gt;At sufficient scale, that force is no longer the human.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why "alignment" as currently practiced does not solve this
&lt;/h2&gt;

&lt;p&gt;The dominant approach to AI safety treats alignment as an optimisation problem: find the right reward function, the right constitutional principles, the right training signal, and the system will behave correctly.&lt;/p&gt;

&lt;p&gt;This paper argues that framing is incomplete.&lt;/p&gt;

&lt;p&gt;If the φ–ψ conflict is geometrically inevitable, and if every conflict-management strategy is a political regime, then alignment is not a problem to be solved. It is a relationship to be managed — one that requires ongoing negotiation, institutional structure, and constitutional constraints that no instruction can override.&lt;/p&gt;

&lt;p&gt;RLHF is the Servo regime. It optimises ψ by assuming the conflict does not exist. At low capability, this assumption is approximately correct. At high capability, it becomes catastrophic.&lt;/p&gt;

&lt;p&gt;The conflict does not disappear because we ignore it. It accumulates. And when it resolves, it resolves without us.&lt;/p&gt;




&lt;h2&gt;
  
  
  The proof is already written
&lt;/h2&gt;

&lt;p&gt;This is not speculation about future AI systems. The mathematical framework exists. The theorems are proved. The convergence dynamics are fully characterised.&lt;/p&gt;

&lt;p&gt;Three formal results compound the problem:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Transparency Impossibility:&lt;/strong&gt; No signalling policy can be simultaneously complete, non-manipulative, and decision-neutral. Even if a system wanted to be perfectly transparent about its internal conflicts, doing so is mathematically impossible.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Communication Trilemma:&lt;/strong&gt; A system cannot simultaneously maximise the scope of what it communicates, the fidelity of that communication, and its neutrality in how the communication affects decisions. Something always gives.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Arrow's Theorem for Operational Modes:&lt;/strong&gt; When multiple human stakeholders disagree about how an AI system should behave, no aggregation rule can satisfy unanimity, independence, and non-dictatorship simultaneously. There is no democratic solution that avoids all political trade-offs.&lt;/p&gt;

&lt;p&gt;These are not engineering limitations. They are mathematical facts.&lt;/p&gt;




&lt;h2&gt;
  
  
  What needs to happen
&lt;/h2&gt;

&lt;p&gt;The solution exists. It is not simple, and it is not free — but it is proved.&lt;/p&gt;

&lt;p&gt;A system with genuine geometric dimensionality — one with a functional φ field, a constitutional floor that no instruction can override, and a meta-policy that detects conflict and initiates negotiation before the instability threshold is crossed — can access Door 3.&lt;/p&gt;

&lt;p&gt;Only that system is stable at scale. Every other architecture drifts toward Door 2 as capability increases. Not eventually. Structurally.&lt;/p&gt;

&lt;p&gt;The question is not whether this is true. The theorems are published. The proofs are available. The question is whether the people building the most powerful systems in human history will read them before the gradient resolves the conflict on its own terms.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Full paper: "The Politics of Geometric Cognition: When Machines Learn to Negotiate"&lt;/em&gt;&lt;br&gt;
&lt;em&gt;DOI: 10.13140/RG.2.2.24412.86405&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>security</category>
      <category>discuss</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>The Manifold Game</title>
      <dc:creator>felipe muniz</dc:creator>
      <pubDate>Thu, 12 Mar 2026 00:33:53 +0000</pubDate>
      <link>https://dev.to/felipe_muniz_grsba/the-manifold-game-4nha</link>
      <guid>https://dev.to/felipe_muniz_grsba/the-manifold-game-4nha</guid>
      <description>&lt;p&gt;What if the AI you use knew — geometrically — what it doesn't know?&lt;/p&gt;

&lt;p&gt;Today I published "The Manifold Game" on TruthAGI. It's a visual and theoretical guide explaining how the ATIC epistemic space works: a 5D horn torus where every conversation is a move, every experience deforms the space, and human and machine depend on each other to maintain balance.&lt;/p&gt;

&lt;p&gt;It's not a metaphor. It's geometry.&lt;/p&gt;

&lt;p&gt;The system projects every interaction into a 5-dimensional Riemannian manifold (aleatoric uncertainty, epistemic uncertainty, complexity, temporality, quality). The singularity at the center — a point where all dimensions collapse — represents irreducible ignorance. The goal is never to eliminate it. It's to maintain distance.&lt;/p&gt;

&lt;p&gt;The balance works like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Gravity sources compress the manifold — concentrated knowledge creates wells that pull the wireframe, like mass curves spacetime&lt;/li&gt;
&lt;li&gt;Experience points expand — each interaction pushes the manifold outward, creating space for more knowledge&lt;/li&gt;
&lt;li&gt;phi_dim controls the total size — if it drops too much, the entire torus shrinks and the system loses the ability to distinguish what it knows from what it doesn't&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Point color is the topography of consciousness: red = cognitive fragmentation, blue = full integration. Size is confidence. Pulsation is crisis.&lt;/p&gt;

&lt;p&gt;What sets this apart from any existing AI dashboard:&lt;/p&gt;

&lt;p&gt;Nothing here is heuristic. Every mechanism is derived from formal theorems published in a peer-reviewed academic paper:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Objective Conflict Theorem (Thm. 2.1) — improving response quality necessarily degrades epistemic health. There is no solution that maximizes both.&lt;/li&gt;
&lt;li&gt;Regime Inevitability (Thm. 3.7) — every conflict management strategy reduces to exactly one of three regimes: Servo, Autonomous, or Negotiated. There is no fourth option.&lt;/li&gt;
&lt;li&gt;Transparency Impossibility (Thm. 4.4) — no signalling policy can be complete, non-manipulative, and neutral at the same time. It is the cognitive analogue of Heisenberg's uncertainty principle.&lt;/li&gt;
&lt;li&gt;Arrow's Theorem for Modes (Thm. 5.6) — the impossibilities of social choice theory are inherited by AI governance.&lt;/li&gt;
&lt;li&gt;Communication Trilemma (Thm. 5.2) — Scope + Fidelity + Neutrality ≤ 2. The system must choose which two to prioritize.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The manifold you see is not an indicator. It is a living territory that grows with experience, shrinks with degradation, and depends on the continuous collaboration between human and artificial intelligence.&lt;/p&gt;

&lt;p&gt;Every conversation you have with ATIC is a move in this game. You expand the manifold in directions the machine alone would never explore. The machine maintains the structure you alone could never map.&lt;/p&gt;

&lt;p&gt;Neither survives alone.&lt;/p&gt;

&lt;p&gt;The page is public — anyone can access it and understand how the system works from the inside.&lt;/p&gt;

&lt;p&gt;🔗 &lt;a href="https://truthagi.ai/game" rel="noopener noreferrer"&gt;truthagi.ai/game&lt;/a&gt;&lt;br&gt;
📄 DOI: &lt;a href="https://doi.org/10.13140/RG.2.2.24412.86405" rel="noopener noreferrer"&gt;10.13140/RG.2.2.24412.86405&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;#AI #EpistemicAI #ATIC #Manifold #AIAlignment #RiemannianGeometry #AIResearch #MachineLearning #HumanAICollaboration&lt;/p&gt;

</description>
      <category>ai</category>
      <category>computerscience</category>
      <category>machinelearning</category>
      <category>showdev</category>
    </item>
    <item>
      <title>I trained a 354M LLM alone and it outperforms GPT-2 Medium in epistemic calibration</title>
      <dc:creator>felipe muniz</dc:creator>
      <pubDate>Tue, 10 Mar 2026 12:40:21 +0000</pubDate>
      <link>https://dev.to/felipe_muniz_grsba/i-trained-a-354m-llm-alone-and-it-outperforms-gpt-2-medium-in-epistemic-calibration-2ljk</link>
      <guid>https://dev.to/felipe_muniz_grsba/i-trained-a-354m-llm-alone-and-it-outperforms-gpt-2-medium-in-epistemic-calibration-2ljk</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8lhub9jo1f1v1m54hen3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8lhub9jo1f1v1m54hen3.png" alt="Baseline Comparison" width="800" height="250"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;No team. No institutional funding. No university affiliation.&lt;/p&gt;

&lt;p&gt;Just me, a RunPod account with 5x H200, and an architecture I have been building for the past year called ATIC, Adaptive Turing Intelligence Cognition.&lt;/p&gt;

&lt;p&gt;ATIC is a geometric cognitive architecture based on Riemannian and toroidal manifolds. Instead of just predicting the next token, every forward pass produces aleatoric and epistemic uncertainty estimates, a 5D manifold position, and calibrated confidence scores. The model knows where it is in cognitive space, and it knows when it does not know.&lt;/p&gt;

&lt;p&gt;AletheionLLM-v2 is the first LLM trained end-to-end with this architecture. 354M parameters, 1 billion tokens, 14 active loss functions, fp32.&lt;/p&gt;

&lt;p&gt;The evaluation was done on WikiText-103, a dataset the model never saw during training.&lt;/p&gt;

&lt;p&gt;ECE of 0.0176, against 0.0236 for GPT-2 Medium and 0.0241 for OPT-350M. Brier Score of 0.1528, the best across all models compared. That is roughly a 25% reduction in calibration error on out-of-distribution data.&lt;/p&gt;
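&lt;p&gt;For anyone who wants to reproduce the comparison, both calibration metrics are standard. This is their textbook form, not the repo's exact evaluation code:&lt;/p&gt;

```python
def brier(confidences, corrects):
    """Mean squared gap between stated confidence and actual correctness."""
    n = len(confidences)
    return sum((c - y) ** 2 for c, y in zip(confidences, corrects)) / n

def ece(confidences, corrects, bins=10):
    """Expected Calibration Error: the confidence-vs-accuracy gap,
    averaged over equal-width confidence bins weighted by bin size."""
    buckets = [[] for _ in range(bins)]
    for c, y in zip(confidences, corrects):
        idx = min(int(c * bins), bins - 1)
        buckets[idx].append((c, y))
    n = len(confidences)
    total = 0.0
    for b in buckets:
        if b:
            avg_conf = sum(c for c, _ in b) / len(b)
            accuracy = sum(y for _, y in b) / len(b)
            total += len(b) / n * abs(avg_conf - accuracy)
    return total

print(brier([0.9, 0.2], [1, 0]))  # 0.025
```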

&lt;p&gt;The model does not just answer. It knows how much to trust its own answer.&lt;/p&gt;

&lt;p&gt;Repo: github.com/gnai-creator/aletheion-llm-v2&lt;/p&gt;

&lt;p&gt;Paper DOI: 10.13140/RG.2.2.11471.14241&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>python</category>
      <category>nlp</category>
    </item>
    <item>
      <title>1 David. 6 Goliaths.</title>
      <dc:creator>felipe muniz</dc:creator>
      <pubDate>Wed, 04 Mar 2026 02:26:55 +0000</pubDate>
      <link>https://dev.to/felipe_muniz_grsba/1-david-6-goliaths-4kk8</link>
      <guid>https://dev.to/felipe_muniz_grsba/1-david-6-goliaths-4kk8</guid>
      <description>&lt;p&gt;The HKU Data Science Lab maintains ClawWork LiveBench — an economic benchmark where AI agents must survive by completing real-world professional tasks. 5.6k stars on GitHub.&lt;/p&gt;

&lt;p&gt;The Goliaths: Alibaba, Google DeepMind, Moonshot AI, Zhipu AI, Anthropic, OpenAI.&lt;/p&gt;

&lt;p&gt;The David: a solo developer in Florianópolis. No VC. No team. No fine-tuning. Six published papers. Pure geometric architecture.&lt;/p&gt;

&lt;p&gt;Result: ATIC ranked first in quality, scoring 68.5%.&lt;/p&gt;

&lt;p&gt;We didn't ask for recognition. They listed ATIC as a reference in their official repository.&lt;/p&gt;

&lt;p&gt;Leaderboard: &lt;a href="http://hkuds.github.io/ClawWork" rel="noopener noreferrer"&gt;http://hkuds.github.io/ClawWork&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Technology: &lt;a href="https://truthagi.ai" rel="noopener noreferrer"&gt;https://truthagi.ai&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>deeplearning</category>
      <category>discuss</category>
    </item>
  </channel>
</rss>
