DEV Community

Kotaro Andy
Kotaro Andy

Posted on

Formal Agent Contracts: Bring Mathematical Rigor to Multi-Agent Development with a Claude Code Plugin

When multiple AI agents collaborate on a system, something always goes wrong at the boundaries. Agent A assumes one format, Agent B expects another, and the bug surfaces three layers deep at 2 AM. We've all been there.

Formal Agent Contracts is a Claude Code plugin that attacks this problem head-on. It lets you define precise, machine-verifiable contracts between agents using VDM-SL (Vienna Development Method – Specification Language), then automatically verify, prove, and generate code from those contracts. The twist: you don't need to know VDM-SL at all. Claude handles the formalism; you describe what your agents do in plain English.

What Does It Actually Do?

The plugin provides six skills that form a pipeline:

  1. define-contract — Describe your agent's role in natural language. Claude converts it into a VDM-SL formal specification with types, preconditions, postconditions, and invariants.

  2. verify-spec — Run VDMJ (the VDM-SL reference toolchain) to syntax-check and type-check your spec, then auto-generate proof obligations.

  3. smt-verify — Convert proof obligations to SMT-LIB and solve them with Z3. Get back: proved, counterexample found, or unknown.

  4. generate-code — Produce TypeScript or Python scaffolds from your spec, complete with runtime contract checks that throw clear errors on violations.

  5. integrated-workflow — Run the full pipeline (define → verify → prove → generate → test) in one session.

  6. formal-methods-guide — Ask Claude to explain any VDM-SL concept along the way.

Installation

Prerequisites

Java 11+ is required for VDMJ (the VDM-SL checker). Python 3.8+ is needed if you want Z3-based automated proving.

Install VDMJ:

# Download the JAR from GitHub releases
mkdir -p ~/.vdmj
curl -L -o ~/.vdmj/vdmj.jar \
  https://github.com/nickbattle/vdmj/releases/latest/download/vdmj-4.6.0-SNAPSHOT.jar
Enter fullscreen mode Exit fullscreen mode

Install Z3 (optional, for SMT proving):

pip install z3-solver
Enter fullscreen mode Exit fullscreen mode

Install the Plugin

In a Claude Code session (terminal or Cowork), run:

/plugin install formal-agent-contracts@iid
Enter fullscreen mode Exit fullscreen mode

That's it. The six skills are now available in your session.

Walkthrough: Building a Task Manager Agent

Let's build a real example from scratch. We'll define a task management agent with status transitions, verify the spec catches invalid transitions, and generate code with runtime contract enforcement.

Step 1: Define the Contract

Tell Claude what your agent does:

You: Define a task manager agent. Tasks have an ID, title, status
     (Todo / InProgress / Done), and priority (Low / Medium / High).
     Done tasks cannot transition back to any other status.
Enter fullscreen mode Exit fullscreen mode

Claude invokes the define-contract skill and produces a VDM-SL specification:

types
  Status = <Todo> | <InProgress> | <Done>;
  Task :: id     : nat1
          title  : seq1 of char
          status : Status
  inv t == len t.title <= 100;

operations
  ChangeStatus: TaskId * Status ==> ()
  ChangeStatus(taskId, newStatus) == ...
    pre  taskId in set dom board and
         ValidTransition(board(taskId).status, newStatus)
    post board(taskId).status = newStatus;
Enter fullscreen mode Exit fullscreen mode

The pre (precondition) says: the task must exist and the transition must be valid. The post (postcondition) guarantees the status is actually updated. The inv (invariant) constrains title length at all times. These three constructs form the contract — the rules every caller and implementation must respect.

Step 2: Verify the Spec

You: Verify this spec.
Enter fullscreen mode Exit fullscreen mode

Claude runs VDMJ and reports:

✅ Syntax check: PASSED
✅ Type check: PASSED
📋 Proof Obligations generated: 38
   - CreateTask satisfies TaskBoard invariant
   - ChangeStatus preserves title across transition
   - DeleteTask reduces board cardinality by exactly 1
Enter fullscreen mode Exit fullscreen mode

Proof obligations (POs) are questions automatically derived from your contracts. For instance, "After ChangeStatus, is board(taskId).title still the same as before?" You didn't write this check — the tool inferred it from your post condition.

Step 3: Prove Obligations with Z3

You: Prove the POs with Z3.
Enter fullscreen mode Exit fullscreen mode

Claude converts each PO to SMT-LIB and runs Z3:

PO #12 (ChangeStatus preserves title): ✅ Proved
PO #15 (DeleteTask cardinality):       ✅ Proved
PO #23 (CreateTask invariant):         ✅ Proved
...
37/38 proved, 1 unknown (requires manual review)
Enter fullscreen mode Exit fullscreen mode

Step 4: Generate Code

You: Generate TypeScript from this spec.
Enter fullscreen mode Exit fullscreen mode

Claude produces implementation code with runtime contract checks baked in:

changeStatus(taskId: TaskId, newStatus: Status): void {
  // --- Pre-conditions (from VDM-SL spec) ---
  checkPre(
    this.board.has(taskId),
    `taskId ${taskId} not in dom board`
  );
  checkPre(
    validTransition(this.board.get(taskId)!.status, newStatus),
    `Invalid transition: ${this.board.get(taskId)!.status}${newStatus}`
  );

  // --- Operation body ---
  const task = this.board.get(taskId)!;
  const updated = { ...task, status: newStatus };
  this.board.set(taskId, updated);

  // --- Post-conditions (from VDM-SL spec) ---
  checkPost(
    this.board.get(taskId)!.status === newStatus,
    `status must be ${newStatus}`
  );
}
Enter fullscreen mode Exit fullscreen mode

Try violating the contract:

agent.changeStatus(taskId, "Todo");
// → ContractError: Pre-condition failed:
//   Invalid transition: Done → Todo
Enter fullscreen mode Exit fullscreen mode

No silent bugs. No debugging session at 2 AM.

One-Shot: The Integrated Workflow

If you want the full pipeline in one go:

You: Run the integrated workflow for a task management agent.
     Tasks have ID, title, status (Todo/InProgress/Done),
     priority. Done tasks can't go back.
Enter fullscreen mode Exit fullscreen mode

Claude orchestrates all phases — define, verify, prove, generate, test — handling errors and retries automatically, and produces a session report at the end.

Why Formal Contracts Instead of Tests?

Tests are inductive: you check a finite number of cases and hope they cover enough. Formal contracts are deductive: you prove properties hold for all possible inputs.

Consider this scenario without contracts:

// This silently succeeds — the bug surfaces later, somewhere else
task.status = "Todo";  // Was "Done" — should be forbidden!
Enter fullscreen mode Exit fullscreen mode

With contracts:

changeStatus(taskId, "Todo");
// → ContractError: Invalid transition: Done → Todo
Enter fullscreen mode Exit fullscreen mode

The spec also doubles as living documentation. It precisely describes what each agent does, what it expects, and what it guarantees — and it never drifts out of sync with the code, because the code is generated from it.

Resources


Built by IID Systems. Licensed under MIT.

Top comments (0)