Penloom Studio

Posted on Jul 1

Never trust an LLM's output directly. Here's the validation layer I put on every agent.

#ai #llm #claudeai #typescript

Here's a failure mode I've seen in nearly every AI agent codebase I've reviewed: the agent receives a model response, trusts the JSON it contains, and calls .result.items[0].id, which throws Cannot read properties of null at 2 AM because the model returned {"result": null} on an edge case.

The model didn't hallucinate the content. It hallucinated the structure.

This is surprisingly common, and the fix isn't "use a better prompt." The fix is a validation layer that runs between the raw model output and the code that acts on it.

Why structured output isn't enough

Claude and GPT-4 both support structured output modes that constrain the model to emit valid JSON matching a given schema. This is genuinely useful and you should use it. But it doesn't fully solve the problem, for two reasons:

1. JSON-valid is not semantically valid.

The model can emit perfectly valid JSON that conforms to your schema and still be wrong. A string field that should be a UUID might contain a made-up identifier that fails a database lookup. A numeric field labeled confidence_score might be 847 when your code expects a float between 0 and 1. The schema enforces types, not semantics.

2. Not all LLM calls use structured output.

If you're doing multi-step reasoning, chain-of-thought steps, tool call parsing, or processing outputs from models that don't support native JSON mode, you're parsing free-text responses. You need to handle that robustly.
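To make the first point concrete, here is a minimal sketch of a semantic check that runs after a type-level schema check has already passed. The TicketRef shape, the knownTicketIds set, and the checkSemantics function are hypothetical names invented for this example; a real system would look the ID up in its own datastore rather than an in-memory set.

```typescript
// Hypothetical shape that has already passed a type-level schema check.
interface TicketRef {
  ticket_id: string;        // typed as string, but must refer to a real ticket
  confidence_score: number; // typed as number, but must lie in [0, 1]
}

// Returns a list of semantic problems; an empty list means the value is usable.
function checkSemantics(value: TicketRef, knownTicketIds: Set<string>): string[] {
  const problems: string[] = [];

  if (!knownTicketIds.has(value.ticket_id)) {
    problems.push(`ticket_id: "${value.ticket_id}" does not exist`);
  }
  if (
    !Number.isFinite(value.confidence_score) ||
    value.confidence_score < 0 ||
    value.confidence_score > 1
  ) {
    problems.push(`confidence_score: ${value.confidence_score} is outside 0-1`);
  }
  return problems;
}

// Both fields have the right types, yet both values are wrong.
const known = new Set(["a3f1c2d4"]);
const problems = checkSemantics(
  { ticket_id: "made-up-id", confidence_score: 847 },
  known
);
console.log(problems);
```

Returning a list of problems rather than throwing keeps the check composable with the classify step described next: the caller decides whether a semantic failure means retry, fallback, or escalation.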

The pattern: parse, validate, classify

Every agent call I build now goes through three stages:

raw model output
     ↓
  [PARSE]   – extract the structure from the text
     ↓
 [VALIDATE] – assert the structure matches expectations
     ↓
 [CLASSIFY] – categorize the outcome so the caller can handle it

Here's the TypeScript implementation I actually use:

import { z } from "zod";

// 1. Define the schema for what you expect
const AnalysisResultSchema = z.object({
  sentiment: z.enum(["positive", "negative", "neutral"]),
  confidence: z.number().min(0).max(1),
  key_points: z.array(z.string()).min(1).max(10),
  action_required: z.boolean(),
  follow_up: z.string().optional(),
});

type AnalysisResult = z.infer<typeof AnalysisResultSchema>;

// 2. The parse-validate-classify wrapper
type AgentOutput<T> =
  | { ok: true; data: T }
  | { ok: false; reason: "parse_failure" | "validation_failure" | "empty_response"; raw: string; error?: string };

function parseAgentOutput<T>(
  raw: string,
  schema: z.ZodSchema<T>
): AgentOutput<T> {
  // Guard: empty or whitespace-only response
  if (!raw.trim()) {
    return { ok: false, reason: "empty_response", raw };
  }

  // Extract JSON from the response — models often wrap it in prose or code fences
  const jsonMatch = raw.match(/```(?:json)?\s*([\s\S]*?)```/) ||
                    raw.match(/(\{[\s\S]*\}|\[[\s\S]*\])/);

  const jsonString = jsonMatch ? jsonMatch[1] ?? jsonMatch[0] : raw.trim();

  let parsed: unknown;
  try {
    parsed = JSON.parse(jsonString);
  } catch (err) {
    return {
      ok: false,
      reason: "parse_failure",
      raw,
      error: err instanceof Error ? err.message : "JSON.parse failed",
    };
  }

  const result = schema.safeParse(parsed);
  if (!result.success) {
    return {
      ok: false,
      reason: "validation_failure",
      raw,
      // .issues is the canonical list of validation problems on a ZodError
      error: result.error.issues.map(e => `${e.path.join(".")}: ${e.message}`).join("; "),
    };
  }

  return { ok: true, data: result.data };
}

The AgentOutput<T> discriminated union forces the caller to handle both the happy path and the failure paths. You can't accidentally access output.data without first checking output.ok.

Putting it together in a real agent call

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

async function analyzeCustomerFeedback(
  feedback: string
): Promise<AgentOutput<AnalysisResult>> {
  const response = await client.messages.create({
    model: "claude-sonnet-4-5",
    max_tokens: 512,
    system: `You analyze customer feedback. Always respond with JSON matching this schema exactly:
{
  "sentiment": "positive" | "negative" | "neutral",
  "confidence": number between 0 and 1,
  "key_points": array of strings (1-10 items),
  "action_required": boolean,
  "follow_up": optional string
}
No prose. No markdown. Just the JSON object.`,
    messages: [{ role: "user", content: feedback }],
  });

  const rawText = response.content
    .filter((b): b is Anthropic.TextBlock => b.type === "text")
    .map(b => b.text)
    .join("");

  return parseAgentOutput(rawText, AnalysisResultSchema);
}

// Calling code handles both outcomes explicitly
const result = await analyzeCustomerFeedback(userFeedback);

if (!result.ok) {
  // Log the failure with full context for debugging
  console.error("Agent output invalid", {
    reason: result.reason,
    error: result.error,
    raw: result.raw.slice(0, 500), // don't log huge payloads
  });

  // Decide what to do: retry, fall back, surface to user, etc.
  return handleValidationFailure(result.reason);
}

// TypeScript knows result.data is AnalysisResult here
const { sentiment, confidence, key_points } = result.data;
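The calling code above hands the failure reason to handleValidationFailure without showing it. One possible shape is sketched below, purely as an assumption: the FailureAction type and the choice of action per reason are invented for this example, and the reason type is repeated locally so the block stands alone.

```typescript
// The three failure reasons from AgentOutput<T>, repeated so this block stands alone.
type FailureReason = "parse_failure" | "validation_failure" | "empty_response";

// Hypothetical set of actions the caller might take; not from the original code.
type FailureAction =
  | { kind: "retry"; withFeedback: boolean }
  | { kind: "fallback"; message: string };

function handleValidationFailure(reason: FailureReason): FailureAction {
  switch (reason) {
    case "parse_failure":
      // Malformed JSON is often transient, so a plain retry is reasonable.
      return { kind: "retry", withFeedback: false };
    case "validation_failure":
      // The model can use the validation error to correct itself.
      return { kind: "retry", withFeedback: true };
    case "empty_response":
      // There is nothing to correct, so fall back instead of retrying.
      return { kind: "fallback", message: "Analysis unavailable, please try again later." };
    default: {
      // Compile-time exhaustiveness check: adding a new reason without a
      // matching case turns this assignment into a type error.
      const unreachable: never = reason;
      throw new Error(`Unhandled failure reason: ${String(unreachable)}`);
    }
  }
}
```

The never assignment in the default branch is the useful part: if a fourth reason is ever added to the union, the compiler points at every handler that forgot about it.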

The retry logic that actually works

Not all validation failures are permanent. Sometimes the model produces malformed JSON on the first try but gets it right on a retry. The key is distinguishing which failures are worth retrying.

async function analyzeWithRetry(
  feedback: string,
  maxAttempts = 3
): Promise<AnalysisResult> {
  let lastError = "";
  let attempts = 0; // how many calls were actually made

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    attempts = attempt;
    const result = await analyzeCustomerFeedback(feedback);

    if (result.ok) return result.data;

    lastError = result.error ?? result.reason;

    // Don't retry empty responses; something else is wrong
    if (result.reason === "empty_response") break;

    // Parse and validation failures are both retried. This version only logs
    // the error; to feed it back, pass it in the next prompt: "Your last
    // response failed validation: {lastError}. Try again."
    if (attempt < maxAttempts) {
      console.warn(`Attempt ${attempt} failed (${result.reason}): ${lastError}`);
    }
  }

  // Report the real attempt count, which is lower than maxAttempts after an early break
  throw new Error(`Failed after ${attempts} attempt(s). Last error: ${lastError}`);
}

The pattern of feeding the validation error back to the model in the retry prompt is particularly effective. Instead of blindly retrying, you're telling the model what went wrong. In my experience this gets you to a valid output on the second attempt about 80% of the time when the first attempt had a validation failure.
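Here is a minimal sketch of what feeding the error back could look like. It assumes a hypothetical attempt function that sends a message list to the model and runs the parse-validate-classify step on the reply; the Message, Failure, and Outcome types and the wording of the retry prompt are also assumptions made for this example, not part of any SDK.

```typescript
// Minimal local copies of the article's result types so this block stands alone.
type Failure = {
  ok: false;
  reason: "parse_failure" | "validation_failure" | "empty_response";
  raw: string;
  error?: string;
};
type Outcome<T> = { ok: true; data: T } | Failure;

type Message = { role: "user" | "assistant"; content: string };

// Replay the failed answer and tell the model exactly what was wrong with it.
function buildRetryMessages(input: string, previousRaw: string, error: string): Message[] {
  return [
    { role: "user", content: input },
    { role: "assistant", content: previousRaw },
    {
      role: "user",
      content: `Your last response failed validation: ${error}. Respond again with only the corrected JSON object.`,
    },
  ];
}

// `attempt` is hypothetical: it calls the model with `messages` and classifies the reply.
async function retryWithFeedback<T>(
  input: string,
  attempt: (messages: Message[]) => Promise<Outcome<T>>,
  maxAttempts = 3
): Promise<T> {
  let messages: Message[] = [{ role: "user", content: input }];
  let lastError = "";

  for (let n = 1; n <= maxAttempts; n++) {
    const result = await attempt(messages);
    if (result.ok) return result.data;

    lastError = result.error ?? result.reason;
    if (result.reason === "empty_response") break; // not worth retrying

    messages =
      result.reason === "validation_failure"
        ? buildRetryMessages(input, result.raw, lastError) // retry with feedback
        : [{ role: "user", content: input }]; // parse failure: plain retry
  }

  throw new Error(`No valid output. Last error: ${lastError}`);
}
```

Passing attempt in as a parameter keeps the loop independent of any particular client, which also makes it easy to exercise with a fake model in tests.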

What to log when validation fails

When validation fails in production, you need enough information to understand and fix the problem — but not so much that you're logging personally identifiable information or burning storage costs.

// Good: structured, queryable, safe
console.error(JSON.stringify({
  event: "agent_validation_failure",
  reason: result.reason,
  error_path: result.error, // which field failed
  response_length: result.raw.length,
  response_prefix: result.raw.slice(0, 100), // enough to see the pattern
  model: "claude-sonnet-4-5",
  timestamp: new Date().toISOString(),
}));

After a week of production logs, you'll see patterns. Maybe the model consistently omits the confidence field for certain categories of input. Maybe it returns arrays as strings when the input contains newlines. Those patterns tell you where to strengthen your prompt or add extra coercion logic.
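As an illustration of that last kind of fix, here is a sketch of a coercion step for one pattern: key_points arriving as a single newline-separated string instead of an array. The coerceKeyPoints function is a hypothetical helper invented for this example; it would run on the parsed object before schema validation, and it deliberately leaves anything it does not recognize for the validator to reject.

```typescript
// Hypothetical coercion step, run on the parsed object before schema validation.
// It repairs only one pattern: key_points as a newline-separated string.
function coerceKeyPoints(parsed: unknown): unknown {
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return parsed; // leave anything unexpected for the validator to reject
  }
  const record = parsed as Record<string, unknown>;
  const keyPoints = record["key_points"];
  if (typeof keyPoints !== "string") return parsed;

  const items = keyPoints
    .split("\n")
    // Drop a leading bullet ("-", "*") or list number ("1.") followed by whitespace
    .map(line => line.replace(/^\s*(?:[-*]|\d+\.)\s+/, "").trim())
    .filter(line => line.length > 0);

  return { ...record, key_points: items };
}

const repaired = coerceKeyPoints({
  sentiment: "negative",
  key_points: "- slow checkout\n- missing invoice",
});
console.log(repaired);
```

Keeping each coercion this narrow is a deliberate choice: a repair that only fires on a pattern you have actually seen in the logs is easy to remove once the prompt is fixed.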

The 10-minute version if you just want to ship

If Zod feels like overkill, or your agent isn't written in TypeScript, here's a minimal Python version that still catches the most common failures:

import json
from typing import TypedDict

class AnalysisResult(TypedDict):
    sentiment: str
    confidence: float
    action_required: bool

REQUIRED_KEYS = {"sentiment", "confidence", "action_required"}
VALID_SENTIMENTS = {"positive", "negative", "neutral"}

def parse_analysis(raw: str) -> AnalysisResult | None:
    # Strip code fences if present
    text = raw.strip()
    if text.startswith("```"):
        text = text.split("```")[1]
        if text.startswith("json"):
            text = text[4:]

    try:
        data = json.loads(text.strip())
    except json.JSONDecodeError:
        return None

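    # The top-level value must be a JSON object, not a list or scalar
    if not isinstance(data, dict):
        return None

    # Check required keys
    if not REQUIRED_KEYS.issubset(data.keys()):
        return None

    # Check types and semantic constraints
    sentiment = data["sentiment"]
    if not isinstance(sentiment, str) or sentiment not in VALID_SENTIMENTS:
        return None
    confidence = data["confidence"]
    # bool is a subclass of int in Python, so exclude it explicitly
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        return None
    if not (0 <= confidence <= 1):
        return None
    if not isinstance(data["action_required"], bool):
        return None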

    return data

Not as composable as Zod, but it catches the common failure modes: missing keys, wrong enum values, out-of-range numbers.

The principle

LLMs are probabilistic. They do not guarantee that their structured output will be valid — even when you ask nicely. A production agent needs a deterministic layer that classifies every output as valid or invalid before any code acts on it. Build that layer first, log its failures, and let the failure data tell you where your prompt needs to improve.

The validation layer doesn't slow you down — it makes your agent debuggable. Without it, you're flying blind.

I cover validation patterns, retry logic, and production reliability in the free Reliable Agent Field Guide: penloomstudio.com/field-guide.html

DEV Community