My most-read post on this blog is about prompt contracts — treating prompts like API specs with defined inputs, outputs, and error handling.
But the original had a gap: it told you what to specify, not how to validate that the AI followed the spec.
This is the v2. It adds validation rules you can check automatically.
Quick Recap: What's a Prompt Contract?
A prompt contract is a structured prompt with three sections:
INPUT: what you're giving the AI
OUTPUT: what you expect back (format, structure, length)
ERROR: what to do when something doesn't match
If you haven't read the original, the core idea is: ambiguous prompts produce ambiguous outputs. Specs produce spec-compliant outputs.
The Problem With v1
Prompt contracts work. But they rely on you manually checking whether the output matches. That's fine for one-off tasks. It breaks down when you're running prompts in scripts, pipelines, or daily workflows.
You need machine-checkable rules.
Validation Rules: The v2 Addition
Add a VALIDATION block to your contract:
INPUT:
- code_diff: string (unified diff format)
- language: "typescript" | "python" | "go"
OUTPUT:
- format: JSON
- schema: { "issues": [{ "line": int, "severity": "error"|"warning", "message": string }] }
- max_items: 10
VALIDATION:
- output must be valid JSON (parseable)
- every issue must reference a line number present in the diff
- severity must be one of the allowed values
- message must be under 200 characters
- if no issues found, return { "issues": [] } (not null, not empty string)
ERROR:
- if output fails validation, retry once with: "Your output failed validation: {error}. Fix and return only the corrected JSON."
- if retry also fails, return the raw output tagged as UNVALIDATED
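Each of those validation rules maps to a straightforward check in code. Here's a minimal sketch, assuming the contract's output schema above; the hunk-header parsing is a simplified take on the unified diff format:

```python
import json
import re

ALLOWED_SEVERITIES = {"error", "warning"}

def diff_line_numbers(diff: str) -> set[int]:
    """Extract new-file line numbers from unified diff hunk headers."""
    lines: set[int] = set()
    current = None
    for raw in diff.splitlines():
        m = re.match(r"@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@", raw)
        if m:
            current = int(m.group(1))  # start of the new-file hunk
            continue
        if current is None:
            continue
        if not raw.startswith("-"):  # context and added lines advance the counter
            lines.add(current)
            current += 1
    return lines

def validate_issues(raw_output: str, diff: str) -> list[str]:
    """Return a list of validation errors; empty list means the output passes."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError as e:
        return [f"not valid JSON: {e}"]
    issues = data.get("issues")
    if not isinstance(issues, list):
        return ['"issues" must be a list (use [] when empty, not null)']
    valid_lines = diff_line_numbers(diff)
    errors = []
    for i, issue in enumerate(issues):
        if issue.get("line") not in valid_lines:
            errors.append(f"issue {i}: line {issue.get('line')} not in diff")
        if issue.get("severity") not in ALLOWED_SEVERITIES:
            errors.append(f"issue {i}: invalid severity {issue.get('severity')!r}")
        msg = issue.get("message")
        if not isinstance(msg, str) or len(msg) >= 200:
            errors.append(f"issue {i}: message missing or over 200 characters")
    return errors
```

Note that the function returns errors rather than raising: the error messages feed directly into the retry prompt from the ERROR block.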
Why This Matters
With validation rules, you can:
- Auto-retry on structural failures (JSON parse errors, missing fields)
- Log quality metrics (what % of outputs pass validation first-try?)
- A/B test prompts with a consistent scoring mechanism
- Chain prompts safely — downstream steps can trust the upstream output
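The auto-retry and metrics points above can be combined into one small wrapper. A sketch, where `call_llm` and `validate` are stand-ins for your model call and your validation function:

```python
def run_with_validation(prompt, call_llm, validate, max_retries=1):
    """Call the model, validate the output, and retry with errors fed back.

    Returns a dict with the output, whether it validated, and the attempt
    count -- the attempt count is the raw material for first-try pass-rate
    metrics and A/B comparisons between prompt versions.
    """
    output = call_llm(prompt)
    for attempt in range(max_retries + 1):
        errors = validate(output)
        if not errors:
            return {"output": output, "validated": True, "attempts": attempt + 1}
        if attempt < max_retries:
            output = call_llm(
                f"Your output failed validation: {errors}. "
                "Fix and return only the corrected output."
            )
    # Retries exhausted: hand back the raw output, tagged as unvalidated
    return {"output": output, "validated": False, "attempts": max_retries + 1}
```

Downstream steps then branch on the `validated` flag instead of trusting every response blindly.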
A Real Example: Code Review Contract
Here's a complete contract I use for automated PR reviews:
# Code Review Contract v2.1
## INPUT
- diff: unified diff of the PR
- context: file-level summary (max 500 tokens)
- focus_areas: list of strings (e.g., ["security", "performance"])
## OUTPUT
- format: Markdown
- sections: Summary (2-3 sentences), Issues (bulleted list), Verdict (APPROVE / REQUEST_CHANGES)
- max_length: 800 words
## VALIDATION
- must contain exactly 3 sections with the headers above
- Issues section: each bullet must start with [ERROR], [WARNING], or [SUGGESTION]
- Verdict must be one of the two allowed values
- Summary must not exceed 3 sentences
- No line references to files not in the diff
## ERROR
- missing section → retry with "You're missing the {section} section"
- invalid verdict → retry with "Verdict must be APPROVE or REQUEST_CHANGES"
- 2 consecutive failures → flag for human review
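The Markdown rules in this contract validate just as mechanically as JSON. A sketch of the section, bullet-tag, and verdict checks; the regexes are assumptions about the output's header style, so adapt them to yours:

```python
import re

def validate_review_md(text: str) -> list[str]:
    """Check a Markdown review against the code review contract's rules."""
    errors = []
    for name in ("Summary", "Issues", "Verdict"):
        if not re.search(rf"^#+ {name}\b", text, re.MULTILINE):
            errors.append(f"missing section: {name}")
    # Every bullet inside the Issues section must carry a severity tag
    in_issues = False
    for line in text.splitlines():
        if re.match(r"^#+ ", line):
            in_issues = line.lstrip("# ").startswith("Issues")
            continue
        if in_issues and line.startswith("- "):
            if not re.match(r"- \[(ERROR|WARNING|SUGGESTION)\]", line):
                errors.append(f"untagged issue bullet: {line[:40]}")
    if not re.search(r"\b(APPROVE|REQUEST_CHANGES)\b", text):
        errors.append("verdict must be APPROVE or REQUEST_CHANGES")
    return errors
```

Each error message doubles as the retry prompt from the ERROR block, so one function drives both validation and recovery.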
The Validation Checklist Template
Copy this and adapt it for any prompt:
## VALIDATION
- [ ] Output format matches spec (JSON / Markdown / plain text)
- [ ] All required fields present
- [ ] Field types correct (strings are strings, numbers are numbers)
- [ ] Values within allowed ranges / enums
- [ ] Length constraints met (min/max words, items, characters)
- [ ] No hallucinated references (files, URLs, variables that don't exist in input)
- [ ] Consistent with input (doesn't contradict the given context)
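The trickiest item on that checklist is the hallucinated-references check, since it compares output against input rather than against a fixed schema. A minimal sketch for file paths (the regex and extension list are illustrative assumptions):

```python
import re

def find_hallucinated_files(output: str, input_files: set[str]) -> list[str]:
    """Flag file paths mentioned in the output that don't exist in the input."""
    mentioned = set(re.findall(r"[\w./-]+\.(?:py|ts|go)", output))
    return sorted(mentioned - input_files)
```

The same shape works for URLs, variable names, or function names: extract candidates from the output, subtract the set you actually provided, and treat anything left over as a validation failure.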
Wiring It Up in Code
If you're calling an LLM from code, validation is just a function:
def validate_review(output: dict) -> list[str]:
    errors = []
    required = ["Summary", "Issues", "Verdict"]
    for section in required:
        if section not in output:
            errors.append(f"Missing section: {section}")
    if output.get("Verdict") not in ("APPROVE", "REQUEST_CHANGES"):
        errors.append(f"Invalid verdict: {output.get('Verdict')}")
    return errors

# Usage
result = call_llm(prompt)
issues = validate_review(result)
if issues:
    result = call_llm(f"Fix these validation errors: {issues}\n\nOriginal output: {result}")
Simple. No framework needed.
When to Skip Validation
Not everything needs a contract. If you're brainstorming, exploring, or doing creative work, contracts add friction without value.
Use contracts when:
- Output feeds into another system
- You're running the prompt more than once
- Incorrect output has a real cost (time, money, bugs)
Skip contracts when:
- You're thinking out loud
- The output is for your eyes only
- You'll edit it heavily anyway
Key Takeaway
Prompt contracts v1 told you to define the spec. v2 tells you to enforce it.
The validation block turns your prompt from a wish into a testable requirement. That's the difference between "AI-assisted" and "AI-reliable."
Grab the template above and try it on your most-used prompt. Share what breaks — I'm collecting failure modes for a follow-up post.