If you’ve ever shipped a prompt that worked yesterday and mysteriously fails today, you’ve already learned the core lesson:
Prompts aren’t “just text”. They’re an interface.
And interfaces need contracts.
In software, we don’t call a function without knowing what it expects and what it returns. We don’t expose an API without documenting error cases. Yet a lot of prompt work is still “try a few words and hope”. That approach breaks the moment you:
- add a second consumer (a teammate, a cron job, a different model)
- change upstream inputs (new fields, longer docs, weird formatting)
- introduce tools (retrieval, code execution, web browsing)
A prompt contract is a lightweight way to treat prompts like APIs: explicit inputs, explicit outputs, and explicit failure behavior.
This post shows a practical template you can copy, plus a concrete example (with validation + a repair loop) that you can drop into a real workflow.
## What is a prompt contract?
A prompt contract is a short, structured specification that answers five questions:
- Purpose — what job does this prompt perform?
- Inputs — what data does it need (and in what shape)?
- Outputs — what should the model return (format + rules)?
- Constraints — what must never happen?
- Errors — what should happen when requirements can’t be met?
The key move: write the contract as if you’re writing an API doc. Because you are.
## Why contracts beat "better prompting"
Most prompt failures come from ambiguity, not “weak instructions”. Contracts remove ambiguity by making expectations testable.
A good contract gives you:
- Stable integration points (your code can rely on a schema)
- Cheaper debugging (you can tell if the failure is input, output, or logic)
- Safer iteration (you can version the contract and roll changes out gradually)
- Better evaluation (you can regression-test against a known output shape)
If you only take one thing from this article: define the output like you mean it.
## Example: a meeting-notes → action-items prompt
Let’s build a prompt that turns messy meeting notes into action items. The common failure modes:
- It returns prose when you need JSON.
- It “helpfully” invents tasks that aren’t in the notes.
- It produces inconsistent fields (sometimes `owner`, sometimes `assignee`).
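Contracts address these at the source; the third failure mode can also be caught defensively in code before validation. A sketch (the alias table and helper name are illustrative, not from any library):

```typescript
// Defensive alias normalization: map known field-name drift (e.g. "assignee")
// onto the contract's canonical keys before validating.
const FIELD_ALIASES: Record<string, string> = {
  assignee: "owner",
  deadline: "due_date"
};

function normalizeItem(item: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(item)) {
    out[FIELD_ALIASES[key] ?? key] = value;
  }
  return out;
}
```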
Here’s a prompt contract that prevents those.
### 1) Contract (human-readable)
**Name:** `action_items.v1`

**Purpose:** Extract actionable tasks from meeting notes.

**Inputs:**

- `notes` (string) — raw meeting notes; may include bullets, timestamps, or speaker tags.
- `context` (string, optional) — team/project context.

**Output:** JSON object with:

- `items`: array of action items
- each item has `task`, `owner`, `due_date` (optional, ISO-8601), and `evidence` (a quote from the notes)
- no extra keys

**Constraints:**

- Do not invent tasks. Every task must be supported by `evidence`.
- If no tasks exist, return `{ "items": [] }`.
- If an owner isn't specified, use `"owner": null` (don't guess).

**Errors:**

- If notes are empty/garbled, return `{ "items": [], "error": "..." }`.
Notice how this reads like an API: required fields, optional fields, and precise behavior.
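You can take the API analogy one step further and keep the contract as data next to your code, so it can be logged, versioned, and tested. The shape below is just this sketch's convention, not an established standard:

```typescript
// Illustrative contract-as-data record; field names are this sketch's
// convention, not a standard.
interface PromptContract {
  name: string;                       // versioned id, e.g. "action_items.v1"
  purpose: string;                    // what job the prompt performs
  inputs: Record<string, { type: string; optional?: boolean }>;
  output: string;                     // format + rules
  constraints: string[];              // what must never happen
  errors: string[];                   // behavior when requirements can't be met
}

const actionItemsV1: PromptContract = {
  name: "action_items.v1",
  purpose: "Extract actionable tasks from meeting notes.",
  inputs: {
    notes: { type: "string" },
    context: { type: "string", optional: true }
  },
  output: 'JSON object { items: [...], error: string|null }, no extra keys',
  constraints: ["Never invent tasks", "Every task needs evidence"],
  errors: ['Empty/garbled notes -> { "items": [], "error": "..." }']
};
```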
### 2) The prompt (machine-enforceable)
You can translate that contract into a prompt that is explicit and “schema-first”:
```
You are extracting action items from meeting notes.

Return ONLY valid JSON. Do not wrap in markdown.

Schema:
{
  "items": [
    {
      "task": string,
      "owner": string|null,
      "due_date": string|null,  // ISO-8601 date, e.g. "2026-02-25". null if unknown.
      "evidence": string        // direct quote from notes supporting the task
    }
  ],
  "error": string|null
}

Rules:
- Every item MUST have evidence quoted from the notes.
- Do not invent owners or due dates.
- If there are zero tasks, return {"items":[],"error":null}.
- If notes are empty or unusable, return {"items":[],"error":"unusable_input"}.

INPUT_NOTES:
{{notes}}

OPTIONAL_CONTEXT:
{{context}}
```
This prompt is boring on purpose. Boring prompts integrate well.
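To fill the `{{notes}}` and `{{context}}` slots, any templating approach works. A minimal hand-rolled sketch (the helper name is hypothetical):

```typescript
// Minimal placeholder filler: replaces {{name}} slots with provided values,
// leaving unknown placeholders untouched so missing inputs are easy to spot.
function renderPrompt(template: string, vars: Record<string, string>): string {
  return template.replace(/\{\{(\w+)\}\}/g, (match: string, key: string) =>
    key in vars ? vars[key] : match
  );
}

const rendered = renderPrompt(
  "INPUT_NOTES:\n{{notes}}\n\nOPTIONAL_CONTEXT:\n{{context}}",
  { notes: "Alice: I'll send the report by Friday.", context: "" }
);
```

Leaving unknown placeholders intact (rather than silently dropping them) makes a missing input visible in your logs instead of producing a subtly wrong prompt.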
## Add validation + a repair loop (the production move)
Even with a contract, models sometimes return:
- invalid JSON (trailing commas, comments)
- wrong keys
- strings where you wanted nulls
So treat the model like any other external dependency: validate, then repair.
Here's a minimal TypeScript example using `zod`:

```typescript
import { z } from "zod";

const ActionItem = z.object({
  task: z.string().min(1),
  owner: z.string().min(1).nullable(),
  due_date: z.string().min(1).nullable(),
  evidence: z.string().min(1)
});

const OutputSchema = z.object({
  items: z.array(ActionItem),
  error: z.string().nullable()
});

export type Output = z.infer<typeof OutputSchema>;

// Parse the model's raw text; on any JSON or schema failure, ask the model
// to fix its own output once, then validate again.
export async function parseOrRepair(
  raw: string,
  repair: (msg: string) => Promise<string>
): Promise<Output> {
  try {
    return OutputSchema.parse(JSON.parse(raw));
  } catch {
    const fixed = await repair(
      "Your last response violated the JSON schema. " +
        "Return ONLY valid JSON matching the schema exactly. " +
        "Do not add explanations.\n\nINVALID_OUTPUT:\n" + raw
    );
    return OutputSchema.parse(JSON.parse(fixed));
  }
}
```
That `repair()` call can use a second prompt that says "here's the schema, here's what you produced, fix it".
This is the same pattern you’d use for data pipelines: parse → validate → correct.
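To see the whole loop run end to end, here is a dependency-free sketch: the validator is hand-rolled instead of using zod, and the model is replaced by a stub that returns a corrected response on the repair turn (`parseOrRepairLite`, `validateLite`, and `stubModel` are illustrative names, not real APIs):

```typescript
// Dependency-free sketch of parse → validate → correct. A real version
// would use the zod schema and an actual model client.
type LiteOutput = { items: { task: string }[]; error: string | null };

function validateLite(x: unknown): LiteOutput {
  const o = x as LiteOutput;
  if (!o || !Array.isArray(o.items) || o.error === undefined) {
    throw new Error("schema violation");
  }
  return o;
}

async function parseOrRepairLite(
  raw: string,
  repair: (msg: string) => Promise<string>
): Promise<LiteOutput> {
  try {
    return validateLite(JSON.parse(raw));
  } catch {
    // One repair attempt; if this also fails, let the error propagate.
    const fixed = await repair(
      "Your last response violated the JSON schema. Return ONLY valid JSON.\n" +
        "INVALID_OUTPUT:\n" + raw
    );
    return validateLite(JSON.parse(fixed));
  }
}

// Stub model: pretends the repair turn succeeded.
const stubModel = async (_msg: string) => '{"items": [], "error": null}';

parseOrRepairLite('{"items": [,]}', stubModel)
  .then((out) => console.log("repaired:", out.error === null));
```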
## Version your prompt contract like code
Contracts become really powerful when you version them.
- `action_items.v1` — original schema
- `action_items.v2` — adds `priority` or `status`
When you need to change output, don’t silently mutate v1. Create v2 and migrate consumers.
A simple tactic that works well:
- Keep the contract name in your prompt (`Name: action_items.v1`)
- Log it with every run
- Write small regression tests that pin expected output shape
If a change breaks your pipeline, you’ll know which contract version introduced it.
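Such a regression test can be tiny. A sketch, assuming you keep a couple of known-good outputs around (the golden strings and `shapeMatches` helper here are illustrative):

```typescript
// Pin the top-level output shape for action_items.v1: exactly the keys
// "items" and "error", parsed from raw model text.
const EXPECTED_KEYS = ["error", "items"].join(",");

function shapeMatches(raw: string): boolean {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return false;
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return false;
  }
  return Object.keys(parsed).sort().join(",") === EXPECTED_KEYS;
}

// Golden outputs recorded from known-good runs of action_items.v1:
const golden = [
  '{"items":[],"error":null}',
  '{"items":[],"error":"unusable_input"}'
];
for (const g of golden) {
  if (!shapeMatches(g)) {
    throw new Error("action_items.v1 regression: output shape drifted");
  }
}
```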
## A prompt contract template (copy/paste)
Use this for any prompt you want to “ship”:
```markdown
### Prompt Contract: <name>.v<version>

**Purpose**
- …

**Inputs**
- `input_a` (type) — …
- `input_b` (type, optional) — …

**Output (format + schema)**
- Return ONLY …
- Keys: …
- No extra keys

**Constraints / Safety**
- Must …
- Must not …

**Error Behavior**
- If … return …

**Examples**
- Input: …
- Output: …
```
If you’re working in a team, put these contracts in your repo right next to the code that calls them.
## Closing thought
The fastest way to improve “prompt reliability” isn’t clever wording.
It’s engineering:
- define the interface
- validate the output
- repair when it’s broken
- version and test the changes
When you treat prompts like APIs, your AI workflows stop feeling like magic and start behaving like software.