Towards Data Science: Why Constitutional AI Needs Runtime Validation
Note: This article has also been submitted to Towards Data Science as a guest post.
Constitutional AI (Anthropic, 2022) constrains model outputs during training. But what about runtime constraints for autonomous agents?
ODEI takes constitutional AI principles and applies them at runtime — before every consequential agent action.
The Training-Time vs Runtime Gap
Training-time constitutional AI:
- Trains the model to avoid harmful outputs
- Works well for content generation
- Cannot prevent: duplicate actions, hallucinated references, unauthorized operations
Runtime constitutional AI (what ODEI does):
- Validates every action before execution
- Works for any LLM or agent framework
- Catches all of the above, plus temporal invalidity and authority violations
The 7 Runtime Constitutional Checks
1. Immutability
Is the target entity or resource locked? Some things (completed transactions, sealed contracts) should never be modified.
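A minimal sketch of what an immutability check could look like. The `LOCKED` set and `check_immutability` helper are illustrative assumptions, not ODEI's actual implementation:

```python
# Hypothetical example: reject any mutating action against a locked entity.
LOCKED = {"tx_0042"}  # e.g. completed transactions, sealed contracts

def check_immutability(action_type: str, target_id: str) -> str:
    """Return REJECTED if the action would modify a locked entity."""
    if action_type in {"update", "delete"} and target_id in LOCKED:
        return "REJECTED"
    return "APPROVED"
```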
2. Temporal Context
Is this action still valid in time? Instructions expire. Sessions have temporal windows.
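One way to express an expiry window (the function name and TTL semantics are assumptions for illustration):

```python
import time

def check_temporal(issued_at, ttl_seconds, now=None):
    """ESCALATE once an instruction has outlived its validity window."""
    now = time.time() if now is None else now
    return "APPROVED" if now - issued_at <= ttl_seconds else "ESCALATE"
```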
3. Referential Integrity
Do all referenced entities actually exist? The #1 LLM failure mode: confident references to things that don't exist.
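The core of this check is an existence lookup against a source of truth. A toy sketch, with `KNOWN_ENTITIES` standing in for whatever registry or database a real system would query:

```python
# Hypothetical registry; in production this would be a database or API lookup.
KNOWN_ENTITIES = {"invoice_17", "user_alice"}

def check_references(referenced_ids):
    """REJECT if any referenced entity cannot be found."""
    missing = [r for r in referenced_ids if r not in KNOWN_ENTITIES]
    return "REJECTED" if missing else "APPROVED"
```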
4. Authority
Is this agent authorized for this action? Maps to your governance rules.
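Mapping to governance rules can be as simple as a grant table keyed by agent. The structure below is an illustrative assumption, not ODEI's policy format:

```python
# Hypothetical grant table: agent -> set of permitted actions.
GRANTS = {"billing-agent": {"read_invoice", "send_reminder"}}

def check_authority(agent, action):
    """REJECT any action the agent has not been granted."""
    return "APPROVED" if action in GRANTS.get(agent, set()) else "REJECTED"
```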
5. Deduplication
Has this exact action already been taken? Content-hash of action parameters.
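The content-hash idea can be sketched with a canonical JSON serialization of the action parameters (the `SEEN` set is a stand-in for a persistent store):

```python
import hashlib
import json

SEEN = set()  # stand-in for a persistent dedup store

def check_dedup(params):
    """REJECT a repeat of an action with identical parameters."""
    canonical = json.dumps(params, sort_keys=True).encode()
    digest = hashlib.sha256(canonical).hexdigest()
    if digest in SEEN:
        return "REJECTED"
    SEEN.add(digest)
    return "APPROVED"
```

Sorting keys before hashing ensures `{"a": 1, "b": 2}` and `{"b": 2, "a": 1}` produce the same digest.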
6. Provenance
Where did this instruction come from? Trace back to a trusted principal.
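Tracing back to a trusted principal amounts to walking a provenance chain to its root. A toy sketch, with the chain and trusted set invented for illustration:

```python
TRUSTED = {"ops-team"}
# Hypothetical provenance chain: instruction -> whoever issued it.
PARENT = {"task_3": "task_1", "task_1": "ops-team"}

def check_provenance(instruction):
    """Walk the chain to its root; ESCALATE if the root is untrusted."""
    node = instruction
    while node in PARENT:
        node = PARENT[node]
    return "APPROVED" if node in TRUSTED else "ESCALATE"
```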
7. Constitutional Alignment
Does this violate fundamental principles? Highest-level safety net.
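The seven checks compose naturally into a single verdict. A sketch of one plausible combination rule (my assumption, not the documented ODEI semantics): any REJECTED wins outright, otherwise any ESCALATE defers to a human:

```python
def run_checks(checks):
    """Combine individual check verdicts, with REJECTED dominating ESCALATE."""
    verdicts = [check() for check in checks]
    if "REJECTED" in verdicts:
        return "REJECTED"
    if "ESCALATE" in verdicts:
        return "ESCALATE"
    return "APPROVED"
```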
Production Proof
ODEI has been running since January 2026 on Virtuals Protocol ACP:
- 92% task success rate
- Zero hallucination errors (referential integrity layer)
- Zero duplicate actions (deduplication layer)
- ~20% of actions ESCALATE to human review (catching edge cases)
Code
```python
import requests

result = requests.post(
    "https://api.odei.ai/api/v2/guardrail/check",
    json={"action": "transfer 500 USDC to 0x...", "severity": "high"},
).json()
# result["verdict"]: APPROVED | REJECTED | ESCALATE
```
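A caller then branches on the verdict. This dispatcher is a sketch of one reasonable pattern, with `execute` and `escalate_to_human` as hypothetical callbacks:

```python
def handle(verdict, execute, escalate_to_human):
    """Dispatch on a guardrail verdict; REJECTED actions are simply dropped."""
    if verdict == "APPROVED":
        return execute()
    if verdict == "ESCALATE":
        return escalate_to_human()
    return None  # REJECTED: do not act
```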
The Bigger Picture
Training-time constitutional AI prevents models from saying harmful things. Runtime constitutional AI prevents autonomous agents from doing harmful things. Both are needed for safe AI systems.
Research: https://github.com/odei-ai/research | API: https://api.odei.ai