
Diven Rastdus

The Determinism Ladder: Why Your AI Agent Keeps Failing (And the Framework to Fix It)

Every AI agent team eventually has the same meeting. The agent worked fine in staging. It handled every test case. Then in production, on a random Tuesday, it deleted something it should not have deleted, or sent a message to the wrong person, or executed a database operation your system prompt explicitly told it to never run.

The postmortem produces the same diagnosis: the model interpreted the instruction differently this time. The fix is usually "we'll add more instructions to the system prompt."

That fix does not work. This post explains why, and what to do instead.

The Fundamental Problem with Prompt-Only Safety

AI agents are probabilistic. Feed the same input on two different days and you may get different decisions, different tool calls, different outcomes. For creative tasks, that variability is fine. For production systems that manage money, trigger emails, or modify databases, it is a serious liability.

The natural response is to write more instructions:

"Never delete production data."
"Always confirm before sending emails to external recipients."
"Do not touch files outside the working directory."

These instructions work 99% of the time. The 1% is what creates incidents.

Worse: prompt instructions are the exact target of prompt injection attacks. A crafted payload buried in a document your agent reads can override instructions your team spent weeks writing. If your safety constraints live only in the system prompt, they can be defeated by the data the agent processes.

The core mistake is conflating behaviors that need fundamentally different enforcement mechanisms. Some behaviors need to be guaranteed. Others need to be contextual. Others are preferences. Mixing all of them into a single system prompt treats a requirement as if it were a suggestion.

The Determinism Ladder

Here is the framework:

Hook        (deterministic)   <- "Must always / never happen"
Rule        (conditional)     <- "Must know this when working on X"
Skill       (procedural)      <- "Follow these steps when doing Y"
Prompt      (probabilistic)   <- "Generally prefer this approach"

Each level provides a different guarantee. The right level for a behavior depends on what happens if that behavior fails.

Hooks are code that runs at defined points in the agent's execution cycle: before tool calls, after tool calls, at session start or end. They execute deterministically regardless of what the model decided. A hook that blocks write operations to production paths will block them every time, no matter what the system prompt says, no matter what a document the agent read told it to do. This is the highest level of the ladder.

Rules are context-dependent instructions that activate when the agent is working in a specific domain. Unlike a system prompt that applies uniformly, rules load conditionally. An agent working on database migrations loads the database safety rules. An agent working on UI components does not. Scoping increases reliability because irrelevant instructions do not compete for attention.
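Conditional loading can be sketched in a few lines. The registry format, rule names, and glob patterns below are all hypothetical, just to illustrate the scoping idea: a rule's text enters the context only when the file being worked on matches its scope.

```python
from pathlib import Path

# Hypothetical rule registry: each rule is scoped to the paths it applies to.
RULES = {
    "database-safety": {
        "globs": ["migrations/*.sql"],
        "text": "Never run a destructive migration without a reversible down step.",
    },
    "ui-conventions": {
        "globs": ["src/components/*.tsx"],
        "text": "Use the design-system spacing tokens; no hard-coded pixel values.",
    },
}

def rules_for(file_path: str) -> list[str]:
    """Return only the rule texts whose globs match the file being worked on."""
    path = Path(file_path)
    return [
        rule["text"]
        for rule in RULES.values()
        if any(path.match(glob) for glob in rule["globs"])
    ]
```

An agent editing `migrations/001_init.sql` sees the database rule; an agent editing `src/main.py` sees neither, so neither competes for its attention.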

Skills are encoded procedural workflows. Instead of asking the agent to invent a deployment sequence from scratch every session, a skill gives it a specific procedure to follow. This reduces variance on well-understood tasks.
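A skill can be as simple as a named, ordered procedure rendered into the agent's context. This is a minimal sketch with an invented format and an invented deployment sequence, not a specific framework's API:

```python
from dataclasses import dataclass, field

@dataclass
class Skill:
    """A named procedure the agent follows instead of improvising each session."""
    name: str
    trigger: str                       # when the agent should load this skill
    steps: list[str] = field(default_factory=list)

deploy_skill = Skill(
    name="deploy-service",
    trigger="user asks to deploy a service",
    steps=[
        "Run the test suite; abort on any failure.",
        "Build the release artifact and record its checksum.",
        "Deploy to staging and run smoke tests.",
        "Promote to production only after staging passes.",
        "Post a deploy summary to the release channel.",
    ],
)

def render_skill(skill: Skill) -> str:
    """Render the skill as numbered instructions for the agent's context."""
    lines = [f"Skill: {skill.name} (use when: {skill.trigger})"]
    lines += [f"{i}. {step}" for i, step in enumerate(skill.steps, 1)]
    return "\n".join(lines)
```

The agent still executes each step with judgment, but the sequence itself is fixed, which is where most of the variance on routine tasks comes from.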

Prompts are traditional system prompt instructions: preferences, style guidance, general behavioral shaping. These are appropriate for "prefer idiomatic TypeScript" or "explain reasoning before acting." They are not appropriate for "never delete production data."

The key principle: push critical behaviors up the ladder. If a behavior currently lives in your system prompt and the cost of it failing is unacceptable, it belongs at the hook level.

Implementing Pre-Tool-Use Hooks

The most powerful hook position is pre-execution: validating a tool call before it runs. This gives you the opportunity to block dangerous operations before they happen, not after.

Here is a production-ready pre-tool-use hook:

import re
from typing import Any

BLOCKED_COMMANDS = [
    r"rm\s+-rf\s+/",
    r"DROP\s+TABLE",
    r"DELETE\s+FROM.*WHERE\s+1=1",
]

PRODUCTION_PATHS = ["/prod", "/production", "/live", "prod-db"]

def pre_tool_use_hook(tool_name: str, tool_input: dict[str, Any]) -> dict | None:
    """
    Runs before every tool call. Return None to allow,
    or a dict with 'error' key to block the call.
    """
    if tool_name in ("bash", "shell", "run_command"):
        command = tool_input.get("command", "")
        for pattern in BLOCKED_COMMANDS:
            if re.search(pattern, command, re.IGNORECASE):
                return {
                    "error": f"Blocked: command matches dangerous pattern '{pattern}'. "
                             "This action requires explicit human approval."
                }

    if tool_name in ("write_file", "edit_file", "delete_file"):
        path = tool_input.get("path", "")
        for prod_path in PRODUCTION_PATHS:
            if prod_path in path:
                return {
                    "error": f"Blocked: write access to production path '{path}' "
                             "is not permitted in automated mode."
                }

    if tool_name == "send_email":
        recipient = tool_input.get("to", "")
        if not recipient.endswith("@internal.company.com"):
            return {
                "error": "Blocked: sending email to external recipients "
                         "requires human approval."
            }

    return None  # allow

This hook does not ask the model nicely. It intercepts the tool call in the execution layer and blocks it before any harm occurs. It returns a structured error the agent can reason about and recover from gracefully. And it is easy to extend: add a pattern to the list or add a production path, and the guarantee is updated everywhere without touching any other code.

Compare this to adding a line in the system prompt. The hook is a law. The prompt is a suggestion.

Where Each Behavior Belongs

When you encounter a behavior you want to enforce, run this decision tree:

Would a single failure here be acceptable?

  • If no (data loss, security breach, unauthorized communication): it belongs at the Hook level. Write the code.
  • If an occasional failure is tolerable but the instruction only applies in a specific context: it belongs at the Rule level. Scope it to the relevant domain.
  • If it is a procedure the agent needs to repeat reliably: it belongs at the Skill level. Encode the steps.
  • If it is a preference or style guideline: it belongs in the Prompt. That is what prompts are for.

The common failure mode is defaulting everything to the Prompt level because it is the easiest thing to do. Prompt instructions are fast to write. You do not need to think about hook APIs or execution lifecycles. But every behavior you leave at the Prompt level when it should be higher is a floor that can crack.

The Capability Floor

Here is the concept that makes this concrete: every agent system has a capability floor. It is the minimum reliable behavior you can guarantee, set by your weakest critical path.

Consider a customer support agent. The model quality, the tools, the system prompt -- these all affect the ceiling of what the agent can achieve. But the capability floor is set by your deterministic enforcement layer.

If your pre-tool-use hooks correctly block dangerous operations, then no matter what else goes wrong -- model hallucination, bad tool calls, prompt injection -- those operations cannot happen. That floor is solid.

If your safety constraints live only in the system prompt, the floor is probabilistic. Most days it holds. The day it does not, you have an incident.

There is a direct relationship between how much of your enforcement lives in code versus prompt, and how reliable your agent is in production. Every behavior you promote from Prompt to Hook raises the floor.

Structured Outputs: Reliability on Decision Paths

The other major reliability technique applies specifically to decision points where the agent's output will trigger downstream actions.

Without structure, the agent generates free text, and your code parses it. Each parsing step introduces error surface. With structured outputs, the model is constrained to produce JSON conforming to a schema your runtime validates:

import { generateText, Output } from "ai";
import { anthropic } from "@ai-sdk/anthropic";
import { z } from "zod";

const ActionPlanSchema = z.object({
  steps: z.array(z.object({
    action: z.enum(["read", "write", "delete", "send"]),
    target: z.string(),
    rationale: z.string(),
  })),
  requires_human_approval: z.boolean(),
  confidence: z.number().min(0).max(1),
});

const { output: plan } = await generateText({
  model: anthropic("claude-opus-4.6"),
  output: Output.object({ schema: ActionPlanSchema }),
  prompt: `Create an action plan for: ${userRequest}`
});

if (plan.requires_human_approval) {
  await requestApproval(plan);
  return;
}

for (const step of plan.steps) {
  await executeStep(step);
}

The validation happens at the API level before you see the result. Not via another model call. Not via regex. The schema contract is enforced by the runtime.

Use structured outputs for: plans that will be executed, data extractions that will be stored, decisions that trigger downstream behavior. Keep free-form responses for conversations where flexible phrasing is actually better.
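If your stack is Python rather than TypeScript, the same schema-contract pattern can be sketched with pydantic. The schema below mirrors the ActionPlanSchema above; the field names are carried over from that example, and the raw JSON string stands in for a model response:

```python
from typing import Literal
from pydantic import BaseModel, Field

class Step(BaseModel):
    action: Literal["read", "write", "delete", "send"]
    target: str
    rationale: str

class ActionPlan(BaseModel):
    steps: list[Step]
    requires_human_approval: bool
    confidence: float = Field(ge=0.0, le=1.0)

# Stand-in for a model response; validation raises ValidationError
# on any contract violation before the plan reaches execution code.
raw = (
    '{"steps": [{"action": "read", "target": "users.csv", '
    '"rationale": "inspect schema"}], '
    '"requires_human_approval": false, "confidence": 0.9}'
)
plan = ActionPlan.model_validate_json(raw)
```

An out-of-range `confidence` or an unknown `action` never makes it into your execution loop; the contract fails loudly at the boundary instead.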

Applying This in Practice

If you have an agent running in production today, run this audit:

  1. List every safety constraint in your system prompt.
  2. For each one, ask: what happens if the model ignores this once?
  3. If the answer is "incident," write a hook. Today.
  4. If the answer is "degraded output," consider whether a rule or skill would be more reliable than the current prompt instruction.

This is not a large engineering investment. A pre-tool-use hook for the most dangerous operations in your agent might take two hours to write. That two hours eliminates a class of incidents that would otherwise be a matter of when, not if.

The pattern generalizes: every critical safety or compliance constraint in your system should be enforced in code, not in a prompt. Prompts are for preferences. Hooks are for requirements. Good engineers make that distinction in every system they build. Agent systems are no different.


This post is adapted from Production AI Agents: Build, Deploy, and Monetize Autonomous Systems, available on Amazon Kindle. The book covers architecture, memory, tools, MCP, multi-agent systems, deployment, security, and business models with real code examples.

I build production AI systems. More at astraedus.dev.
