Sitaram Srivatsavai

Posted on Mar 26

Structured Outputs Are the Contract Your AI Agent Is Missing

#ai #json #architecture #llm

Free-text agent responses look great in demos. They fall apart the moment you try to automate anything downstream. Here's the pattern that fixes it.

Your AI agent just summarized a payer's response to a benefits verification request. The output reads beautifully:

"The patient's coverage appears to be active under the Gold Plus plan. Prior authorization is likely required based on the payer's language, though this isn't entirely clear. The copay seems to be around $50-75, and there may be a quantity limit of 30 tablets per month."

Now try writing code that processes that. Does "appears to be active" mean coverage is confirmed? Is prior auth required or not? What's the copay — $50 or $75? And good luck extracting "30 tablets per month" reliably when the next response says "one month supply" or "qty limit: 30 units."

This is the problem that structured outputs solve. And once I started treating them as the contract between the LLM and the rest of my system, every other design decision got easier.

The Problem: Beautiful Text, Useless Automation

I've built AI agents for regulated workflows in Life Sciences — benefits verification, clinical trial site selection, medical inquiry response. In every case, the agent's output needed to drive downstream automation: update a record's status, create follow-up tasks, route exceptions to the right person, trigger the next step in a workflow.

My first implementations all made the same mistake. I let the LLM return natural language summaries and then tried to parse them. Sometimes I wrote regex. Sometimes I added a second LLM call to "extract the fields." Sometimes I just gave up and had a human re-read the summary and fill in the fields manually.

Every approach was fragile because each one depended on the LLM phrasing things the same way every time. And consistency is exactly what LLMs don't guarantee.
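To make the fragility concrete, here is a minimal sketch of the regex approach. The helper name extractQuantityLimit and the pattern are hypothetical, written only for this illustration; the three input strings are the phrasings quoted earlier in this post.

```javascript
// Hypothetical helper: pull a monthly quantity limit out of free text.
// The pattern is tuned to exactly one phrasing.
function extractQuantityLimit(text) {
  const match = text.match(/(\d+)\s+tablets per month/i);
  return match ? Number(match[1]) : null;
}

const phrasings = [
  "there may be a quantity limit of 30 tablets per month",
  "one month supply",
  "qty limit: 30 units",
];

// Print what the parser recovers from each phrasing.
for (const text of phrasings) {
  console.log(`${JSON.stringify(text)} -> ${extractQuantityLimit(text)}`);
}
```

Only the first phrasing matches. The other two return null, and each new payer phrasing means another pattern to write and maintain.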

The Fix: Define a Schema, Enforce It in the Prompt

A structured output is a JSON object with a defined schema that the LLM must conform to. Instead of asking "summarize this payer response," you ask "extract the following fields from this payer response and return them as JSON."

Here's what the output contract looks like for a benefits verification summary:

{
  "coverageConfirmed": true,
  "priorAuthRequired": false,
  "copayNotes": "$50 copay per fill",
  "deductibleNotes": "$500 annual, not yet met",
  "limitationsNotes": "Specialty pharmacy required",
  "missingInfo": ["Effective date not stated"],
  "confidence": 82
}

Every field has a type. Every field has a purpose. And the downstream logic is trivial:

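// This is the entire routing logic. No parsing. No interpretation.
// `summary` is the parsed JSON above; `createTask` and `recordId` come from your system.
let status;
if (summary.missingInfo.length === 0 && summary.confidence >= 75) {
  status = "Verified";
} else {
  status = "Needs Follow-up";
  // One follow-up task per missing item.
  for (const item of summary.missingInfo) {
    createTask(`Follow up: ${item}`, recordId);
  }
}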

Compare that to trying to extract the same decisions from a paragraph of prose. The structured output turned a fragile NLP problem into a simple conditional.

How to Get the LLM to Comply

The schema alone isn't enough — you need to tell the LLM exactly what you expect. The prompt template has three parts:

1. Role and constraints:

You are a benefits verification analyst. Extract structured data
from the payer response below. Never invent information that isn't
in the response. If a field cannot be determined, use null for
booleans, "Not stated" for strings, and add the missing item to
the missingInfo array.

2. Context (merge fields from your system):

Patient: Jane Doe
Plan: AcmeHealth Gold Plus
Drug: RX-OMNI 10mg
Payer response: [raw text pasted or ingested here]

3. Output schema with field descriptions:

Return ONLY a JSON object matching this schema:
{
  "coverageConfirmed": boolean or null (is coverage active for this drug/plan?),
  "priorAuthRequired": boolean or null (does the payer require prior auth?),
  "copayNotes": string (copay amount and terms, or "Not stated"),
  "deductibleNotes": string (deductible details, or "Not stated"),
  "limitationsNotes": string (quantity limits, step therapy, etc., or "Not stated"),
  "missingInfo": [string] (list of fields that couldn't be determined),
  "confidence": number 0-100 (how complete/clear was the payer response?)
}

The field descriptions inside the schema are doing heavy lifting. They tell the LLM what each field means, what the valid values are, and what to do when information is missing. Without them, you get inconsistent interpretations — one run puts copay info in limitationsNotes, the next puts it in copayNotes.
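The three parts can be assembled in code so that every request uses the same template. This is a minimal sketch, not a definitive implementation: buildPrompt and the constant names are hypothetical, and the template text is adapted from the sections above.

```javascript
// Part 1: role and constraints.
const ROLE_AND_CONSTRAINTS = [
  "You are a benefits verification analyst. Extract structured data",
  "from the payer response below. Never invent information that isn't",
  "in the response. If a field cannot be determined, use null for",
  'booleans, "Not stated" for strings, and add the missing item to',
  "the missingInfo array.",
].join("\n");

// Part 3: output schema with a description for every field.
const OUTPUT_SCHEMA = [
  "Return ONLY a JSON object matching this schema:",
  "{",
  '  "coverageConfirmed": boolean or null (is coverage active for this drug/plan?),',
  '  "priorAuthRequired": boolean or null (does the payer require prior auth?),',
  '  "copayNotes": string (copay amount and terms, or "Not stated"),',
  '  "deductibleNotes": string (deductible details, or "Not stated"),',
  '  "limitationsNotes": string (quantity limits, step therapy, etc., or "Not stated"),',
  '  "missingInfo": [string] (list of fields that couldn\'t be determined),',
  '  "confidence": number 0-100 (how complete/clear was the payer response?)',
  "}",
].join("\n");

// Part 2 (context) is filled in per request from your system's merge fields.
function buildPrompt({ patient, plan, drug, payerResponse }) {
  const context = [
    `Patient: ${patient}`,
    `Plan: ${plan}`,
    `Drug: ${drug}`,
    `Payer response: ${payerResponse}`,
  ].join("\n");
  return [ROLE_AND_CONSTRAINTS, context, OUTPUT_SCHEMA].join("\n\n");
}

const prompt = buildPrompt({
  patient: "Jane Doe",
  plan: "AcmeHealth Gold Plus",
  drug: "RX-OMNI 10mg",
  payerResponse: "[raw text pasted or ingested here]",
});
console.log(prompt);
```

Keeping the role and schema as constants means only the context changes between runs, which makes differences in output easier to trace back to the payer response rather than to prompt drift.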

If you're using a platform that supports structured output enforcement natively (OpenAI's response_format, Salesforce's Prompt Builder structured outputs), use it. If not, the prompt-based approach works well as long as you validate the JSON before processing it.
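For the prompt-based approach, validation can be a small function that runs before any routing logic sees the data. The sketch below is one way to do it under stated assumptions: parseSummary and its return shape are hypothetical, and the checks mirror the benefits verification schema above (booleans may be null, as the prompt instructs).

```javascript
const NULLABLE_BOOLEAN_FIELDS = ["coverageConfirmed", "priorAuthRequired"];
const STRING_FIELDS = ["copayNotes", "deductibleNotes", "limitationsNotes"];

// Parse the raw LLM text and check it against the contract.
// Returns { ok, errors, summary }; summary is null when validation fails.
function parseSummary(rawText) {
  let data;
  try {
    data = JSON.parse(rawText);
  } catch (err) {
    return { ok: false, errors: [`Invalid JSON: ${err.message}`], summary: null };
  }
  if (data === null || typeof data !== "object" || Array.isArray(data)) {
    return { ok: false, errors: ["Top-level value must be a JSON object"], summary: null };
  }

  const errors = [];
  for (const field of NULLABLE_BOOLEAN_FIELDS) {
    // A missing field is undefined, which fails both checks.
    if (data[field] !== null && typeof data[field] !== "boolean") {
      errors.push(`${field} must be a boolean or null`);
    }
  }
  for (const field of STRING_FIELDS) {
    if (typeof data[field] !== "string") {
      errors.push(`${field} must be a string`);
    }
  }
  if (!Array.isArray(data.missingInfo) ||
      !data.missingInfo.every((item) => typeof item === "string")) {
    errors.push("missingInfo must be an array of strings");
  }
  if (typeof data.confidence !== "number" ||
      !(data.confidence >= 0 && data.confidence <= 100)) {
    errors.push("confidence must be a number from 0 to 100");
  }

  const ok = errors.length === 0;
  return { ok, errors, summary: ok ? data : null };
}

// Usage: a chatty, non-JSON reply is rejected instead of reaching the router.
console.log(parseSummary("Sure! Here is the JSON you asked for...").ok);
```

When validation fails, a reasonable default is to treat the run like a low-confidence result and send it to follow-up rather than retrying silently.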

Three Domains, Same Pattern

The schema changes, but the pattern is identical across every use case I've built.

Clinical trial site feasibility:

{
  "enrollmentCapacity": "15-20 patients/year",
  "therapeuticExperience": "3 prior Phase III oncology trials",
  "regulatoryReadiness": "IRB approved, ethics committee pending",
  "riskFlags": ["Ethics committee approval pending"],
  "overallScore": 72,
  "recommendation": "Conditionally activate"
}

The riskFlags array drives automatic task creation — one task per flag. The overallScore drives routing: 80+ with no flags goes to "Activation Ready," 60-79 goes to "Under Review," below 60 goes to "On Hold." No human reads a paragraph and decides what to do. The schema decides.
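The routing rules above can be sketched as a single function. The name routeSite and the task title format are hypothetical. The rules do not say what happens to a score of 80 or more that still has open flags, so this sketch assumes it falls back to "Under Review".

```javascript
// Map a site feasibility assessment to a status plus follow-up tasks.
function routeSite(assessment) {
  const { overallScore, riskFlags } = assessment;

  // One task per risk flag (title format is an assumption).
  const tasks = riskFlags.map((flag) => `Resolve: ${flag}`);

  let status;
  if (overallScore < 60) {
    status = "On Hold";
  } else if (overallScore >= 80 && riskFlags.length === 0) {
    status = "Activation Ready";
  } else {
    // 60-79, plus the assumed case of 80+ with open flags.
    status = "Under Review";
  }
  return { status, tasks };
}

// Usage with the example assessment above.
const decision = routeSite({
  overallScore: 72,
  riskFlags: ["Ethics committee approval pending"],
});
console.log(decision.status, decision.tasks);
```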

Medical inquiry response:

{
  "answer": "Based on the Phase III LIBERTY trial...",
  "confidence": "high",
  "sourceDocs": ["DOC-2024-001", "DOC-2024-047"],
  "escalateFlag": false,
  "nextBestAction": "send_response"
}

The nextBestAction field is the routing mechanism. If the confidence is low or escalateFlag is true, the workflow opens a collaborative review instead of queuing the response to be sent. The sourceDocs array makes the response auditable: a reviewer can verify that the answer is grounded in approved content.
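A minimal sketch of that routing decision follows. The function name routeInquiry and the two returned queue names are hypothetical, and so is the fail-safe choice to send any nextBestAction other than "send_response" to review.

```javascript
// Decide where a medical inquiry response goes next.
function routeInquiry(response) {
  const needsReview =
    response.confidence === "low" ||
    response.escalateFlag === true ||
    // Assumption: anything other than an explicit "send_response" is reviewed.
    response.nextBestAction !== "send_response";

  return needsReview ? "open_collaborative_review" : "queue_for_send";
}

// Usage with the example response above.
const destination = routeInquiry({
  answer: "Based on the Phase III LIBERTY trial...",
  confidence: "high",
  sourceDocs: ["DOC-2024-001", "DOC-2024-047"],
  escalateFlag: false,
  nextBestAction: "send_response",
});
console.log(destination);
```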

The Audit Advantage

Structured outputs give you something free-text summaries never can: a complete, machine-readable audit trail.

Every time my agent runs, I store three things on the record:

AI_Summary_JSON__c  = raw JSON output (the full structured response)
AI_Last_Run_By__c   = the user who triggered the agent
AI_Last_Run_At__c   = timestamp

Six months from now, when compliance asks "why was this case marked as Verified?", I can show them the exact JSON the agent produced, with the exact confidence score and the exact fields it extracted. I can show who triggered it and when. Try doing that with a paragraph of prose that was copy-pasted into a notes field.
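As a rough sketch, the three audit fields can be built in one place, right where the agent's output is received. The helper name buildAuditFields and the ISO 8601 timestamp format are assumptions; how the fields are written to the record depends on your platform and is not shown.

```javascript
// Build the audit fields for one agent run.
// rawJson is stored exactly as the agent returned it, not re-serialized,
// so the record shows what the agent actually produced.
function buildAuditFields(rawJson, userId, now = new Date()) {
  return {
    AI_Summary_JSON__c: rawJson,
    AI_Last_Run_By__c: userId,
    AI_Last_Run_At__c: now.toISOString(),
  };
}

// Usage: "user-123" is a placeholder user identifier.
const auditFields = buildAuditFields(
  '{"coverageConfirmed": true, "confidence": 82}',
  "user-123"
);
console.log(auditFields);
```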

In regulated industries — Life Sciences, healthcare, financial services — this isn't a nice-to-have. It's the difference between "we can demonstrate what happened" and "we think the AI said it was fine."

When NOT to Use Structured Outputs

Structured outputs are the right pattern when the agent's output drives automation. They're the wrong pattern when:

The user is having a conversation and wants a natural language response
The task is creative or exploratory (brainstorming, writing, ideation)
The output is the final product, not an intermediate step in a workflow

If a human is the consumer of the output, natural language is fine. If a system is the consumer, demand a schema.

Start With the Schema

If I could go back and give myself one piece of advice before building my first AI agent, it would be this: design the output schema before you write a single prompt.

The schema forces you to answer the hard questions early. What fields does the downstream system need? What are the valid values? What happens when information is missing? What score threshold triggers which action?
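One way to capture those answers before writing any prompt is a single contract object that the prompt, the validator, and the routing logic all read from. This is a sketch under assumptions: the object shape, the version string, and the helper diffAgainstContract are hypothetical; the field names and the threshold of 75 come from the benefits verification example.

```javascript
// Hypothetical contract definition, written before any prompt exists.
const benefitsSummaryContract = {
  name: "benefitsVerificationSummary",
  version: "1.0.0", // assumed versioning scheme
  fields: {
    coverageConfirmed: "boolean or null",
    priorAuthRequired: "boolean or null",
    copayNotes: "string",
    deductibleNotes: "string",
    limitationsNotes: "string",
    missingInfo: "array of strings",
    confidence: "number 0-100",
  },
  routing: { verifiedMinConfidence: 75 },
};

// Report which contract fields a payload lacks and which keys it adds.
function diffAgainstContract(contract, payload) {
  const expected = Object.keys(contract.fields);
  const actual = Object.keys(payload);
  return {
    missing: expected.filter((key) => !actual.includes(key)),
    unexpected: actual.filter((key) => !expected.includes(key)),
  };
}

// Usage: a payload that drifted from the contract.
console.log(
  diffAgainstContract(benefitsSummaryContract, {
    coverageConfirmed: true,
    copay: "$50",
  })
);
```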

Every prompt you write, every test you create, every piece of routing logic you build — all of it flows from the schema. It's the contract between the AI and the rest of your system. Get it right first, and everything downstream gets simpler.

Get it wrong — or skip it and use free text — and you'll spend the next six months writing parsers, debugging inconsistent extractions, and explaining to stakeholders why the agent "sometimes gets it right."

Define the contract. Enforce it. Version it. Your future self will thank you.