Kotaro Andy

Posted on Apr 1

Formal Agent Contracts: Bring Mathematical Rigor to Multi-Agent Development with a Claude Code Plugin

#claudecode #ai #formalmethods #agents

When multiple AI agents collaborate on a system, something always goes wrong at the boundaries. Agent A assumes one format, Agent B expects another, and the bug surfaces three layers deep at 2 AM. We've all been there.

Formal Agent Contracts is a Claude Code plugin that attacks this problem head-on. It lets you define precise, machine-verifiable contracts between agents using VDM-SL (Vienna Development Method – Specification Language), then automatically verify, prove, and generate code from those contracts. The twist: you don't need to know VDM-SL at all. Claude handles the formalism; you describe what your agents do in plain English.

What Does It Actually Do?

The plugin provides six skills that form a pipeline:

define-contract — Describe your agent's role in natural language. Claude converts it into a VDM-SL formal specification with types, preconditions, postconditions, and invariants.
verify-spec — Run VDMJ (the VDM-SL reference toolchain) to syntax-check and type-check your spec, then auto-generate proof obligations.
smt-verify — Convert proof obligations to SMT-LIB and solve them with Z3. Get back: proved, counterexample found, or unknown.
generate-code — Produce TypeScript or Python scaffolds from your spec, complete with runtime contract checks that throw clear errors on violations.
integrated-workflow — Run the full pipeline (define → verify → prove → generate → test) in one session.
formal-methods-guide — Ask Claude to explain any VDM-SL concept along the way.

Installation

Prerequisites

Java 11+ is required for VDMJ (the VDM-SL checker). Python 3.8+ is needed if you want Z3-based automated proving.

Install VDMJ:

# Download the JAR from GitHub releases
mkdir -p ~/.vdmj
curl -L -o ~/.vdmj/vdmj.jar \
  https://github.com/nickbattle/vdmj/releases/latest/download/vdmj-4.6.0-SNAPSHOT.jar

Install Z3 (optional, for SMT proving):

pip install z3-solver

Install the Plugin

In a Claude Code session (terminal or Cowork), run:

/plugin install formal-agent-contracts@iid

That's it. The six skills are now available in your session.

Walkthrough: Building a Task Manager Agent

Let's build a real example from scratch. We'll define a task management agent with status transitions, verify the spec catches invalid transitions, and generate code with runtime contract enforcement.

Step 1: Define the Contract

Tell Claude what your agent does:

You: Define a task manager agent. Tasks have an ID, title, status
     (Todo / InProgress / Done), and priority (Low / Medium / High).
     Done tasks cannot transition back to any other status.

Claude invokes the define-contract skill and produces a VDM-SL specification:

types
  Status = <Todo> | <InProgress> | <Done>;
  Task :: id     : nat1
          title  : seq1 of char
          status : Status
  inv t == len t.title <= 100;

operations
  ChangeStatus: TaskId * Status ==> ()
  ChangeStatus(taskId, newStatus) == ...
    pre  taskId in set dom board and
         ValidTransition(board(taskId).status, newStatus)
    post board(taskId).status = newStatus;

The pre (precondition) says: the task must exist and the transition must be valid. The post (postcondition) guarantees the status is actually updated. The inv (invariant) constrains title length at all times. These three constructs form the contract — the rules every caller and implementation must respect.

Step 2: Verify the Spec

You: Verify this spec.

Claude runs VDMJ and reports:

✅ Syntax check: PASSED
✅ Type check: PASSED
📋 Proof Obligations generated: 38
   - CreateTask satisfies TaskBoard invariant
   - ChangeStatus preserves title across transition
   - DeleteTask reduces board cardinality by exactly 1

Proof obligations (POs) are questions automatically derived from your contracts. For instance, "After ChangeStatus, is board(taskId).title still the same as before?" You didn't write this check — the tool inferred it from your post condition.

Step 3: Prove Obligations with Z3

You: Prove the POs with Z3.

Claude converts each PO to SMT-LIB and runs Z3:

PO #12 (ChangeStatus preserves title): ✅ Proved
PO #15 (DeleteTask cardinality):       ✅ Proved
PO #23 (CreateTask invariant):         ✅ Proved
...
37/38 proved, 1 unknown (requires manual review)

Step 4: Generate Code

You: Generate TypeScript from this spec.

Claude produces implementation code with runtime contract checks baked in:

changeStatus(taskId: TaskId, newStatus: Status): void {
  // --- Pre-conditions (from VDM-SL spec) ---
  checkPre(
    this.board.has(taskId),
    `taskId ${taskId} not in dom board`
  );
  checkPre(
    validTransition(this.board.get(taskId)!.status, newStatus),
    `Invalid transition: ${this.board.get(taskId)!.status} → ${newStatus}`
  );

  // --- Operation body ---
  const task = this.board.get(taskId)!;
  const updated = { ...task, status: newStatus };
  this.board.set(taskId, updated);

  // --- Post-conditions (from VDM-SL spec) ---
  checkPost(
    this.board.get(taskId)!.status === newStatus,
    `status must be ${newStatus}`
  );
}

Try violating the contract:

agent.changeStatus(taskId, "Todo");
// → ContractError: Pre-condition failed:
//   Invalid transition: Done → Todo

No silent bugs. No debugging session at 2 AM.

One-Shot: The Integrated Workflow

If you want the full pipeline in one go:

You: Run the integrated workflow for a task management agent.
     Tasks have ID, title, status (Todo/InProgress/Done),
     priority. Done tasks can't go back.

Claude orchestrates all phases — define, verify, prove, generate, test — handling errors and retries automatically, and produces a session report at the end.

Why Formal Contracts Instead of Tests?

Tests are inductive: you check a finite number of cases and hope they cover enough. Formal contracts are deductive: you prove properties hold for all possible inputs.

Consider this scenario without contracts:

// This silently succeeds — the bug surfaces later, somewhere else
task.status = "Todo";  // Was "Done" — should be forbidden!

With contracts:

changeStatus(taskId, "Todo");
// → ContractError: Invalid transition: Done → Todo

The spec also doubles as living documentation. It precisely describes what each agent does, what it expects, and what it guarantees — and it never drifts out of sync with the code, because the code is generated from it.

Resources

Plugin install: /plugin install formal-agent-contracts@iid in Claude Code
Source: github.com/kotaroyamame/formal-agent-contracts
Full task manager example: included in the plugin at examples/task-manager/
Research paper: Formal-Spec-Driven Development

Built by IID Systems. Licensed under MIT.

DEV Community