Nova Elvaris
Prompt Contracts for Teams: Sharing AI Specs Without the Merge Hell

If you've been writing Prompt Contracts for a while, you know they work. One page, clear inputs, clear outputs, clear error cases, and suddenly your AI assistant stops drifting mid-task.

Then your teammate opens yours, tweaks three lines, and everyone on the team has a slightly different version floating around in a Notion doc, a Slack thread, and somebody's ~/prompts/ folder. Now you're back to the chaos Prompt Contracts were supposed to fix.

Here's the workflow I landed on after the third week of reconciling "which contract is current?"

The problem with shared prompts

Prompts are code. But most teams treat them like docs:

  • No version control
  • No review process
  • No ownership
  • No tests

When a prompt lives in a shared doc, every edit is a silent breaking change. Your teammate adds "respond in YAML" to fix one task and breaks the downstream script that expected JSON. Nobody notices for three days.
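That failure mode takes ten seconds to reproduce. A minimal sketch, assuming `jq` is installed — both response strings are hypothetical stand-ins for model output:

```shell
#!/bin/bash
# A downstream consumer that assumes JSON breaks the moment a prompt
# edit flips the output format. Hypothetical responses for illustration.
old_response='{"summary": "looks fine"}'   # what the contract promised
new_response='summary: looks fine'         # what the "fixed" prompt now emits

echo "$old_response" | jq -e .summary >/dev/null && echo "old: parsed"
echo "$new_response" | jq -e .summary >/dev/null 2>&1 || echo "new: parse failed"
```

The script still runs, the error is swallowed, and nothing pages anyone — which is exactly why it goes unnoticed for days.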

The team contract layout

I keep team prompts in a dedicated repo (or subfolder) with this structure:

prompts/
├── contracts/
│   ├── code-review.md
│   ├── bug-triage.md
│   └── release-notes.md
├── fixtures/
│   ├── code-review/
│   │   ├── input-valid.json
│   │   └── expected-input-valid.json
│   └── ...
├── tests/
│   └── run-contract-tests.sh
└── CHANGELOG.md

Each contract file is the spec. Fixtures are the test cases. Tests prove the contract still works against a real model. Changelog tracks breaking changes.
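Spinning up the skeleton takes one command. A sketch of the scaffold, using the example contract names above:

```shell
#!/bin/bash
# Scaffold the prompt-contract repo layout described above.
mkdir -p prompts/{contracts,tests}
mkdir -p prompts/fixtures/code-review
touch prompts/contracts/{code-review,bug-triage,release-notes}.md
touch prompts/CHANGELOG.md
touch prompts/tests/run-contract-tests.sh
chmod +x prompts/tests/run-contract-tests.sh
```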

The contract template

# Contract: Code Review

**Version:** 1.3.0
**Owner:** @nova
**Last verified:** 2026-04-09

## Purpose
Review a git diff and return actionable feedback in a structured format.

## Inputs
- `diff`: unified git diff (string, required)
- `language`: programming language (string, optional)
- `focus`: one of [bugs, style, security, all] (default: bugs)

## Output schema
Return JSON matching:
{
  "summary": string,
  "findings": [{"severity": "high|medium|low", "file": string, "line": number, "issue": string, "suggestion": string}],
  "blocking": boolean
}

## Error modes
- If diff is empty → return {"summary": "empty diff", "findings": [], "blocking": false}
- If diff > 2000 lines → return {"error": "diff too large, split into chunks"}

## Non-goals
- Do NOT rewrite code
- Do NOT flag style issues when focus=bugs
- Do NOT suggest library swaps

## Changelog
- 1.3.0: Added `focus` parameter
- 1.2.0: Added `blocking` boolean
- 1.1.0: Switched from markdown to JSON output
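Before wiring up the full test harness, you can smoke-test a single response against the output schema with `jq`. A sketch — the response string is a hypothetical model reply using the key names from the contract above:

```shell
#!/bin/bash
# Smoke-test: does a model response carry the keys the contract promises?
response='{"summary": "1 issue", "findings": [{"severity": "high", "file": "a.py", "line": 3, "issue": "x", "suggestion": "y"}], "blocking": true}'

echo "$response" | jq -e '
  has("summary") and has("blocking") and
  (.findings | all(has("severity") and has("file") and has("line")))
' >/dev/null && echo "schema ok" || echo "schema violation"
```

It won't catch semantic drift, but it kills the most common regression class — missing or renamed fields — in one pipe.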

The review process

Treat contract changes like any other PR:

  1. Open a PR against the prompt repo
  2. Bump the version in the contract header (semver: breaking = major, additive = minor, wording = patch)
  3. Add a fixture if the change introduces new behavior
  4. Run the test script — it replays fixtures against the model and diffs the output
  5. Get one approval before merging
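Step 2's semver rule is mechanical enough to script. A sketch with a hypothetical `bump_type` helper that classifies the jump between two contract versions:

```shell
#!/bin/bash
# Classify a semver bump between two X.Y.Z versions (hypothetical helper).
bump_type() {
  IFS=. read -r oM om op <<< "$1"   # old major/minor/patch
  IFS=. read -r nM nm np <<< "$2"   # new major/minor/patch
  if   [ "$nM" -gt "$oM" ]; then echo major
  elif [ "$nm" -gt "$om" ]; then echo minor
  elif [ "$np" -gt "$op" ]; then echo patch
  else echo none; fi
}

bump_type 1.2.0 1.3.0   # prints "minor"
```

Drop it in a pre-merge check and a contract edit without a matching version bump fails loudly instead of slipping through review.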

The test script is dumb but effective:

#!/bin/bash
for contract in contracts/*.md; do
  name=$(basename "$contract" .md)
  # Only replay input fixtures -- expected-* files are references, not inputs
  for fixture in fixtures/"$name"/input-*.json; do
    output=$(cat "$contract" "$fixture" | llm -m claude-4 --no-stream)
    expected="fixtures/$name/expected-$(basename "$fixture")"
    if [ -f "$expected" ]; then
      diff <(echo "$output" | jq -S .) <(jq -S . "$expected") || echo "FAIL: $name / $fixture"
    fi
  done
done

It's not deterministic (LLM output never is), but it catches the obvious regressions: wrong schema, missing fields, breaking format changes.

Ownership matters

Every contract needs exactly one owner. Not a team, not a channel — one name. The owner:

  • Approves breaking changes
  • Responds when the contract fails in production
  • Deprecates old versions

Without ownership, contracts rot the same way docs rot.

What this fixes

After a month of this setup on a team of four:

  • Zero "which prompt should I use?" questions in Slack
  • Two caught regressions before deploy (both were additive changes that accidentally broke JSON output)
  • One deprecation handled cleanly (v0.9 → v1.0 with a migration note in the changelog)

The contract model isn't new. Treating prompts like shared API specs is. Once you flip that mental switch, half the ambient prompt pain goes away.


Question for you: If you're on a team using AI tools, where do your prompts live right now? Shared doc? Personal folder? A repo? I'm curious how many teams have actually committed to treating prompts as code vs. just talking about it.
