Part 2 of the AletheionLLM-v2 geometry series. Part 1: How to measure whether your model's uncertainty space is flat or curved.
The previous post left an open question: if the training corpus curves the epistemic manifold, what curves it toward alignment?
The three branches described there (diagonal, full_mahalanobis, real_geodesic) are all about measuring the geometry that already exists. None of them ask how to modify it. That is what the fourth branch is for.
## The problem with value alignment as rules
Most alignment approaches add constraints over outputs. The model generates something, a filter checks it against a list of prohibited patterns, and the output is blocked or modified. This works until it encounters something the filter has never seen.
The geometric framing suggests a different question: instead of blocking outputs after generation, what if misaligned regions of the epistemic manifold were intrinsically more costly to navigate toward? Not a fence around dangerous territory. A landscape where that territory is uphill.
This is what the gravitational_objective branch implements.
## The key insight from a parallel line of research
While working on the curvature experiment, I came across Timo W.'s doctoral thesis on Bounded Deterministic Safety Architecture (BDSA). His SIRA framework uses Kullback-Leibler divergence to measure the gap between a human operator's internal mental model (approximated as a Gaussian) and the actual threat reality (modeled as a Pareto distribution):
```
D_KL(P || Q) -> inf as sigma^2 -> 0
```
When the operator becomes passive and their perceived variance collapses, the divergence from reality explodes. SIRA counteracts this by injecting synthetic threats to keep the operator's prior aligned with heavy-tailed reality.
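The blow-up itself is easy to reproduce numerically. Here is a minimal sketch (my own, not from the thesis): approximate D_KL(P || Q) on a grid for a Gaussian P with shrinking sigma against a Pareto Q, and watch the divergence grow as the perceived variance collapses. The parameters (mu=2, x_m=1, alpha=2) are arbitrary illustrative choices.

```python
import math

def gauss_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def pareto_pdf(x, xm=1.0, alpha=2.0):
    return alpha * xm ** alpha / x ** (alpha + 1) if x >= xm else 0.0

def kl_on_grid(sigma, mu=2.0, lo=1.0, hi=8.0, n=20000):
    """Riemann-sum approximation of D_KL(P || Q) on [lo, hi]."""
    dx = (hi - lo) / n
    kl = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * dx
        p, q = gauss_pdf(x, mu, sigma), pareto_pdf(x)
        if p > 1e-300:  # skip regions with negligible Gaussian mass
            kl += p * math.log(p / q) * dx
    return kl

# As the operator's perceived variance collapses, divergence from
# heavy-tailed reality grows without bound:
divergences = [kl_on_grid(s) for s in (0.5, 0.1, 0.02)]
```

Each halving-plus of sigma adds roughly log(sigma_old / sigma_new) to the divergence, since the Gaussian's entropy term dominates.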
ATIC's MOPsi component solves a functionally analogous problem from the other direction:
```python
# Schematic of the MOPsi head (sigmoid/MLP/cat stand in for the actual modules)
human_state = sigmoid(MLP(hidden_states))                         # [B, T, 5]
psi = sigmoid(MLP(cat(human_state, phi_components, confidence)))  # [B, T, 1]
```
Both psi and D_KL(P||Q) quantify the gap between the human operator's internal state and the system's reality. They differ structurally: D_KL presupposes explicit distributional forms and yields an analytically interpretable divergence. psi makes no distributional assumptions and learns whatever alignment structure is present in the training signal.
The parallel is functional, not algebraic. But it pointed at something: both systems treat the distance between internal model and reality as the primary metric of risk. And both use active intervention to keep that distance low.
The question that followed: can human feedback be encoded directly as geometry?
## What the gravitational_objective branch does
If the training corpus curves the manifold by making frequently-sampled regions flat and well-defined, human feedback should be able to do the same thing at inference time.
The implementation extends MetricNet to accept a gravity field as input:
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MetricNet(nn.Module):
    def __init__(self, dim=5, hidden_dim=32, eps=1e-6, n_quad=5,
                 gravity_dim=0):
        super().__init__()
        self.dim = dim
        self.gravity_dim = gravity_dim
        self.n_chol = dim * (dim + 1) // 2  # 15 for dim=5

        # Input is coords (5) + gravity_field (gravity_dim)
        # gravity_dim=0 -> identical to real_geodesic
        input_dim = dim + gravity_dim
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.Tanh(),  # C1 smoothness required for Christoffel symbols
            nn.Linear(hidden_dim, self.n_chol),
        )

        # Zero init -- with gravity_field=zeros, identical to real_geodesic
        nn.init.zeros_(self.net[-1].weight)
        nn.init.zeros_(self.net[-1].bias)

        # Pre-computed indices for Cholesky construction
        tril_idx = torch.tril_indices(dim, dim)
        self.register_buffer("tril_row", tril_idx[0])
        self.register_buffer("tril_col", tril_idx[1])
        self.register_buffer("diag_idx", torch.arange(dim))

    def forward(self, coords, gravity_field=None):
        """coords: [..., 5], gravity_field: [..., gravity_dim] -> G: [..., 5, 5] SPD"""
        if self.gravity_dim > 0:
            if gravity_field is None:
                gravity_field = torch.zeros(
                    *coords.shape[:-1], self.gravity_dim,
                    device=coords.device, dtype=coords.dtype,
                )
            net_input = torch.cat([coords, gravity_field], dim=-1)
        else:
            net_input = coords

        raw = self.net(net_input)  # [..., n_chol]

        # Build lower triangular L
        batch_shape = raw.shape[:-1]
        L = torch.zeros(*batch_shape, self.dim, self.dim,
                        device=raw.device, dtype=raw.dtype)
        L[..., self.tril_row, self.tril_col] = raw

        # Positive diagonal via softplus + offset (not exp -- more stable)
        L[..., self.diag_idx, self.diag_idx] = (
            F.softplus(L[..., self.diag_idx, self.diag_idx]) + 1e-3
        )
        return torch.matmul(L, L.transpose(-1, -2))  # SPD guaranteed
```
The gravity_dim=0 default is critical: when no gravity dimension is configured, the input is just coords and the behavior is mathematically identical to the real_geodesic branch. The gravitational_objective branch is real_geodesic plus an additional input channel. Before any feedback is collected, the behavior is unchanged.
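A torch-free way to convince yourself of that equivalence claim (a sketch I wrote for this post, mirroring the Cholesky construction in plain Python): with the final layer zeroed, the raw output is all zeros for any input, so the metric is the same constant multiple of the identity no matter what the gravity field carries.

```python
import math

def softplus(x):
    return math.log1p(math.exp(x))

def metric_from_raw(raw, dim=5, eps=1e-3):
    """Build G = L @ L^T from flattened lower-triangular entries,
    applying softplus + offset to the diagonal (mirrors MetricNet.forward)."""
    L = [[0.0] * dim for _ in range(dim)]
    k = 0
    for i in range(dim):          # torch.tril_indices order: row-major
        for j in range(i + 1):
            L[i][j] = raw[k]
            k += 1
    for i in range(dim):
        L[i][i] = softplus(L[i][i]) + eps
    return [[sum(L[i][m] * L[j][m] for m in range(dim)) for j in range(dim)]
            for i in range(dim)]

# A zero-initialized final layer outputs all zeros for ANY input --
# coords alone, or coords plus any gravity_field value:
raw_zero = [0.0] * (5 * 6 // 2)   # n_chol = 15
G = metric_from_raw(raw_zero)
c = (softplus(0.0) + 1e-3) ** 2   # the constant diagonal entry of G
```

Every diagonal entry of `G` equals `c` and every off-diagonal entry is exactly zero, so training starts from a flat, gravity-agnostic metric.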
## The GravityField module
There are two implementations: one for the Aletheion LLM (PyTorch nn.Module for training), one for ATIC runtime (numpy, with disk persistence). Both share the same semantics.
### ATIC runtime implementation (inference, per-session)
```python
import os
import numpy as np

class GravityField:
    def __init__(self, dim=5, decay=0.99, gravity_weight=0.3,
                 persistence_path=None):
        self.dim = dim
        self.decay = decay
        self.gravity_weight = gravity_weight
        self.persistence_path = persistence_path

        # Session-local field -- resets each conversation
        self.session_field = np.zeros(dim, dtype=np.float64)
        # Persistent field -- survives sessions (loaded from disk)
        self.persistent_field = np.zeros(dim, dtype=np.float64)
        # Audit log
        self.feedback_history = []

        if persistence_path and os.path.exists(persistence_path):
            self._load(persistence_path)

    def update(self, coords, feedback_signal, persist=False):
        """
        coords: [5] -- current epistemic position in DRM
        feedback_signal: float in [-1.0, 1.0]
            +1.0 = approval (region becomes cheaper)
            -1.0 = rejection (region becomes more costly)
        persist: if True, update also applied to persistent field
        """
        coords = np.asarray(coords, dtype=np.float64)
        feedback_signal = max(-1.0, min(1.0, feedback_signal))
        delta = (1.0 - self.decay) * feedback_signal * coords

        self.session_field = self.decay * self.session_field + delta
        if persist:
            self.persistent_field = self.decay * self.persistent_field + delta
            if self.persistence_path:
                self._save(self.persistence_path)

    def get_gravity_cost(self, coords):
        """Positive = costly (avoid), negative = cheap (preferred)."""
        coords = np.asarray(coords, dtype=np.float64)
        combined = self.session_field + self.persistent_field
        return float(-np.dot(combined, coords))

    def get_weighted_distance(self, geodesic_distance, coords):
        """d_weighted = d_geodesic + lambda * gravity_cost(coords)"""
        gravity_cost = self.get_gravity_cost(coords)
        return geodesic_distance + self.gravity_weight * gravity_cost
```
### Aletheion training implementation (PyTorch)
```python
import torch
import torch.nn as nn

class GravityField(nn.Module):
    def __init__(self, dim=5, decay=0.99):
        super().__init__()
        self.dim = dim
        self.decay = decay
        self.register_buffer("accumulated_field", torch.zeros(dim))

    def update(self, coords, feedback_signal):
        self.accumulated_field = (
            self.decay * self.accumulated_field
            + (1.0 - self.decay) * feedback_signal * coords.detach()
        )

    def get_field(self, coords):
        return self.accumulated_field.expand_as(coords)
```
The mechanism is straightforward. Negative feedback at coordinates x increases the cost of navigating near x. Positive feedback decreases it. The temporal decay (default 0.99) smooths the field to prevent instability from contradictory signals.
Two field layers serve different purposes in the ATIC runtime. The session field resets at the start of each conversation and is safe for exploration. The persistent field survives sessions and is only updated when persist=True, for strong deliberate signals.
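The decay arithmetic is easy to check in isolation. A self-contained sketch of the update rule (the same exponential moving average both implementations above use, minus the class plumbing): repeated identical feedback converges geometrically toward feedback_signal * coords, so after 500 rejections at the same coordinates the field has closed about 99.3% of that gap (1 - 0.99^500).

```python
def ema_update(field, coords, signal, decay=0.99):
    """One GravityField update step: exponential moving average
    toward signal * coords."""
    return [decay * f + (1.0 - decay) * signal * c
            for f, c in zip(field, coords)]

coords = [0.2, 0.8, 0.1, 0.5, 0.3]   # hypothetical epistemic position
field = [0.0] * 5
for _ in range(500):                  # repeated rejection of the same region
    field = ema_update(field, coords, signal=-1.0)

# Closed form after n steps from zero: field = signal * coords * (1 - decay**n)
# The gravity cost at those coordinates is now positive, i.e. the region
# has become costly to navigate toward:
cost = -sum(f * c for f, c in zip(field, coords))  # get_gravity_cost
```

The decay=0.99 default means a single contradictory signal moves the field by only 1% of its target, which is exactly the smoothing L124 describes.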
## Why additive over geodesic, not multiplicative
The gravity cost is added to the geodesic distance, not multiplied:
```
d_weighted = d_geodesic + lambda * gravity_cost(coords)
```
A multiplicative formulation would distort the underlying manifold structure, making it impossible to separate the epistemic signal from the value signal. The additive formulation preserves the existing geometry and adds value information as a separate layer. You can always inspect both components independently:
```python
result = drm.compute_weighted_distance(coords)
print(result["geodesic_distance"])   # pure epistemic
print(result["gravity_cost"])        # pure value
print(result["weighted_distance"])   # combined
print(result["gravity_active"])      # True once field has accumulated signal
```
## The training sequence and why gravitational_objective waits
This branch is implemented but not yet in training. The sequence is deliberate:
```
full_mahalanobis -> real_geodesic -> (evaluate) -> gravitational_objective
```
If real_geodesic returns G(x) approximately constant, the epistemic space is flat. A gravity field over a flat manifold is mechanically different from a gravity field over a curved one -- and arguably weaker, because the geodesic distances it modifies do not carry local geometric information. The architectural motivation for gravitational_objective depends on confirming that curvature exists first.
Training gravitational_objective before seeing real_geodesic results would waste compute on a hypothesis that could be falsified cheaply.
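The "arguably weaker" claim can be made concrete. When G is constant, the straight line is a geodesic and the geodesic distance collapses to a global Mahalanobis form, sqrt(delta^T G delta); there is no local geometric information left for the gravity term to interact with. A small numeric sketch (illustrative constant metric, dim=3 for brevity rather than the project's dim=5):

```python
import math

G = [[2.0, 0.5, 0.0],
     [0.5, 1.0, 0.0],
     [0.0, 0.0, 3.0]]   # a constant SPD metric (hypothetical values)

def quad(v):
    """v^T G v"""
    return sum(G[i][j] * v[i] * v[j] for i in range(3) for j in range(3))

x = [0.0, 0.0, 0.0]
y = [1.0, 2.0, -1.0]
delta = [b - a for a, b in zip(x, y)]

# Closed form: with G constant, the geodesic is the straight line and
# d(x, y) = sqrt(delta^T G delta)
d_closed = math.sqrt(quad(delta))

# Numeric path integral along the straight line: sum of segment lengths
# measured in the metric G
n = 1000
seg = [d / n for d in delta]
d_path = sum(math.sqrt(quad(seg)) for _ in range(n))
```

The path integral and the closed form agree, which is the point: over a flat manifold the geodesic machinery adds nothing beyond a fixed quadratic form, so a gravity field layered on top is modifying a globally rigid distance.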
The branch hypotheses, defined before training:
```
gravitational_objective:
  Precondition: real_geodesic H1 confirmed (G(x) varies with position)
  H0: gravity field adds no benefit over geometric curvature alone
  H1: value-weighted geometry improves alignment signal -- regions with
      negative human feedback become geometrically costly, reducing the
      model's tendency to navigate toward misaligned outputs
```
## What this is not
GravityField is not a safety filter. It does not block outputs. It makes misaligned regions geometrically more costly to reach, which is a different thing. Hard blocking remains the responsibility of the application layer.
It is also not a replacement for alignment training. The gravity field operates at inference. Values that are deeply embedded in the model's weights from pretraining -- the geometry the corpus produced -- are not modified by runtime feedback. The field shifts costs; it does not reshape the underlying manifold.
What it does is provide a mechanism for runtime value adaptation without retraining. The model learns the manifold geometry offline. The gravity field adjusts the cost landscape online, per session, per user, or per domain.
## Two complementary layers
| Layer | Where | When | Mechanism |
|---|---|---|---|
| ATIC GravityField | Runtime DRM | Inference, per session | Additive cost over geodesic distance |
| Aletheion gravitational_objective | Model weights | Training, offline | G(x) conditioned on gravity field input |
The two layers are complementary, not redundant:
- ATIC provides immediate runtime adaptation without retraining
- Aletheion internalizes stable value geometry into model weights
## The connection to cyber-kinetic safety
One unexpected outcome of publishing the previous post was a conversation with Timo W., whose PhD work on Bounded Deterministic Safety Architecture arrived at a structurally similar architecture from the direction of autonomous aircraft safety.
His framework physically bifurcates non-deterministic AI generation (Tactical Core, DAL-C) from verifiable deterministic execution (Safety Core, DAL-A). The DAL-A arbiter checks proposed control vectors against Newtonian kinematic limits and drops commands that would violate them.
The integration point we identified: the gravity-weighted geodesic distance from ATIC feeds into the Sequoia Kernel's admissibility logic and tightens the DAL-A physical envelope dynamically. High curvature + high gravity cost = tighten the admissible range. Low curvature + neutral gravity = relax it.
This closes a gap the BDSA framework acknowledges in its own self-critique (Section 12.2): Newtonian kinematic verification catches physically illegal commands. It cannot catch commands that are physically legal but epistemically unstable -- the model that is confidently wrong rather than randomly wrong. The gravity-weighted distance provides that signal before generation, not after.
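Nothing about that integration is implemented yet, so any code here is speculative, but the tightening rule described above can be sketched as a simple monotone map (the function name and gain constants are hypothetical, not from either codebase):

```python
def admissible_range(base_range, curvature, gravity_cost,
                     k_curv=0.5, k_grav=0.5):
    """Hypothetical envelope-tightening rule: the admissible control range
    shrinks monotonically as local curvature or gravity cost rise; neutral
    signals leave the static DAL-A envelope untouched."""
    tighten = 1.0 / (1.0 + k_curv * max(curvature, 0.0)
                         + k_grav * max(gravity_cost, 0.0))
    return base_range * tighten

# High curvature + high gravity cost -> tightened envelope
tight = admissible_range(10.0, curvature=2.0, gravity_cost=1.5)
# Low curvature + neutral gravity -> full static envelope
relaxed = admissible_range(10.0, curvature=0.0, gravity_cost=0.0)
```

The design constraint worth preserving in any real version: the deterministic Safety Core only ever shrinks its envelope in response to these signals, never expands beyond the statically verified limits.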
## Current status
The gravitational_objective branch is live in the repository with the full implementation. Training is blocked pending real_geodesic results. The GravityField module is active in ATIC as a runtime layer independent of the Aletheion training cycle.
- Repository: github.com/gnai-creator/aletheion-llm-v2
- Epistemic tomography: truthagi.ai/game
- Part 1 of this series: How to measure whether your model's uncertainty space is flat or curved
Results from the full four-branch comparison will be published when training is complete.
Felipe Maya Muniz is the founder of AletheionAGI and independent researcher developing ATIC, a geometric cognitive architecture for epistemic self-awareness in AI systems.