rokoss21

Posted on Jan 19

FACET: Contracts + Gates for LLM Systems

#webdev #ai #programming #opensource

Stop doing improv theatre in production. Ship agents like software.

Agentic tooling is moving fast: CLIs that edit repositories, frameworks that orchestrate swarms, tool-calling APIs everywhere. And still, most teams that try to run “agents” in production hit the same wall:

outputs drift between runs
“structured output” breaks at the worst moment
tool calls happen at the wrong time, with the wrong shape
debugging turns into story-time (“it worked yesterday…”)
trust collapses exactly when you need it most

The root cause isn’t that models aren’t smart enough.
It’s that we keep shipping non-contractual behavior.

This post argues a simple thesis:

Reliability in LLM systems doesn’t come from better prompts.
It comes from contracts and gates — with the system holding veto power.

FACET v2.0 is a compiler-grade, deterministic agent configuration language designed around that thesis. A spec passes through five phases: strict AST parsing → type checking (FTS) → reactive compute (R-DAG) → deterministic context packing (Token Box Model) → canonical JSON render.

A short failure story: “theatre in production”

A team ships an “agentic PR bot”. It edits code, runs tests, and posts a confident summary.

One day the bot “fixes” an issue by adding a dependency. Tests pass locally. The PR merges.
In production, a transitive change triggers a locale/timezone edge case. A downstream service fails for a subset of users. Rollback takes hours because nobody can answer:

Was the agent allowed to introduce new dependencies?
Which tool calls did it run, with what arguments, in what order?
Can we replay the run?
What evidence exists beyond “agent said it’s fine”?

The bot didn’t “misbehave”. It acted exactly as designed: it operated without enforceable boundaries.

That’s the pattern: not “bad model”, but missing veto power.

Contracts and gates: the difference between a demo and a pipeline

Most agent stacks look like this:

Prompt + JSON hope → model writes → parse fails → retry culture → merge anyway

A contractual pipeline looks like this:

Contract → validate inputs + permissions → generate artifact → validate artifact → gates → commit (or reject)

Two key primitives make this real:

Contracts: define what’s allowed and what “valid” means
Gates: run reality checks (tests, security, perf) and block state changes

FACET makes both primitives first-class — not conventions, not best-effort prompts.

Part 1 — Contracts in FACET (real examples)

FACET v2.0 treats agent behavior as a compiled spec. That starts with strict structure and typing.

1) Tool contracts with `@interface` (typed tools, not “tool descriptions”)

In FACET, tools aren’t loose JSON blobs. They are typed interfaces that compile into provider tool schemas.

@interface WeatherAPI
  fn get_current(city: string) -> struct {
    temp: float
    condition: string
  }

@system
  tools: [$WeatherAPI]

This is a contract:

the tool name exists
args are typed (city: string)
return shape is typed (struct { temp: float, condition: string })
the compiler can emit canonical provider schemas during render

In practice, this eliminates a whole class of runtime failures: wrong arg names, wrong types, ambiguous “tool results”.
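
To make the idea concrete outside FACET itself, here is a minimal Python sketch of what enforcing such a tool contract at the call boundary might look like. `ToolContract`, `ContractError`, `validate_call`, and `validate_result` are hypothetical names invented for this illustration; they are not part of the FACET compiler, and the real type system (FTS) is certainly richer than `isinstance` checks.

```python
from dataclasses import dataclass


class ContractError(Exception):
    """A call or result does not match the declared tool contract."""


@dataclass(frozen=True)
class ToolContract:
    name: str
    params: dict   # argument name -> expected Python type
    returns: dict  # result field  -> expected Python type


def _check(label, schema, value):
    # Reject unknown or missing fields first, then check each field's type.
    missing = sorted(schema.keys() - value.keys())
    extra = sorted(value.keys() - schema.keys())
    if missing or extra:
        raise ContractError(f"{label}: missing={missing} extra={extra}")
    for field, expected in schema.items():
        if not isinstance(value[field], expected):
            raise ContractError(f"{label}: {field!r} must be {expected.__name__}")


def validate_call(contract, args):
    _check(f"{contract.name} args", contract.params, args)


def validate_result(contract, result):
    _check(f"{contract.name} result", contract.returns, result)


# Mirrors the WeatherAPI interface declared above (illustrative only).
WEATHER = ToolContract(
    name="WeatherAPI.get_current",
    params={"city": str},
    returns={"temp": float, "condition": str},
)

validate_call(WEATHER, {"city": "Berlin"})                     # passes
validate_result(WEATHER, {"temp": 10.0, "condition": "Rain"})  # passes
```

A call such as `validate_call(WEATHER, {"town": "Berlin"})` raises `ContractError` instead of reaching the tool, which is the "wrong arg names" failure class moved from runtime surprise to explicit rejection.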

2) Inputs are explicit with `@input` (no hidden dependencies)

FACET forces you to declare runtime inputs in @vars via @input(...).

@vars
  user_query: @input(type="string")
  user_photo: @input(type="image", max_dim=1024)

This matters because:

missing input is not “guess it” — it’s an error
constraints (like image size) are enforced at runtime
inputs become leaf nodes in the R-DAG (deterministic dependency graph)

This is fail-closed engineering: if data isn’t provided, the system does not hallucinate a substitute.
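
The fail-closed behavior described above can be sketched in a few lines of Python. This is an illustration under assumptions, not FACET's implementation: `InputSpec`, `resolve_inputs`, and the `max_len` constraint are hypothetical stand-ins for whatever `@input(...)` compiles to.

```python
from dataclasses import dataclass
from typing import Optional


class MissingInputError(Exception):
    """A declared input was not supplied by the caller."""


class ConstraintError(Exception):
    """A supplied input violates its declared type or constraint."""


@dataclass(frozen=True)
class InputSpec:
    type: type
    max_len: Optional[int] = None  # hypothetical constraint for the sketch


def resolve_inputs(specs, provided):
    """Return validated inputs, or raise. Never substitutes a default."""
    resolved = {}
    for name, spec in specs.items():
        if name not in provided:
            raise MissingInputError(name)
        value = provided[name]
        if not isinstance(value, spec.type):
            raise ConstraintError(f"{name}: expected {spec.type.__name__}")
        if spec.max_len is not None and len(value) > spec.max_len:
            raise ConstraintError(f"{name}: longer than {spec.max_len}")
        resolved[name] = value
    return resolved


specs = {"user_query": InputSpec(type=str, max_len=200)}
inputs = resolve_inputs(specs, {"user_query": "what to wear today in Berlin?"})
```

The design choice worth noting is the absence of any fallback branch: there is no code path that produces a value the caller did not supply.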

3) Variables are reactive, deterministic, and immutable after compute (R-DAG)

FACET variables can depend on other variables. Evaluation happens via R-DAG in topological order; cycles and invalid orders are errors.

@vars
  raw_query: $user_query |> trim()
  query_lang: $raw_query |> detect_lang()
  normalized: $raw_query |> normalize(lang=$query_lang)

Key point: once computed, the variable map becomes immutable.
This makes runs reproducible and debuggable: the same inputs produce the same computed state (in Pure Mode).
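
For readers who want to see the mechanics, here is a small Python sketch of the same idea: evaluate variables in topological order, reject cycles, and freeze the result. It uses the standard-library `graphlib` module; the `evaluate` function and the node format are assumptions made for this example, not FACET's R-DAG implementation.

```python
from graphlib import CycleError, TopologicalSorter
from types import MappingProxyType


def evaluate(inputs, nodes):
    """nodes maps a variable name to (dependency names, function of those deps)."""
    graph = {name: set(deps) for name, (deps, _) in nodes.items()}
    values = dict(inputs)
    # static_order() yields dependencies before dependents and raises
    # CycleError if the graph contains a cycle.
    for name in TopologicalSorter(graph).static_order():
        if name in values:
            continue  # an input leaf, already known
        if name not in nodes:
            raise KeyError(f"undeclared variable: {name}")
        deps, fn = nodes[name]
        values[name] = fn(**{d: values[d] for d in deps})
    # Read-only view: the computed state cannot be mutated afterwards.
    return MappingProxyType(values)


nodes = {
    "raw_query": (("user_query",), lambda user_query: user_query.strip()),
    "query_len": (("raw_query",), lambda raw_query: len(raw_query)),
}
state = evaluate({"user_query": "  hello  "}, nodes)
```

Attempting `state["raw_query"] = "x"` raises `TypeError`, and a spec where two variables depend on each other fails with `CycleError` before any value is computed.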

4) Lenses have trust levels (Pure / Bounded / Volatile)

FACET introduces trust levels for transformations (lenses):

Level 0 — Pure: deterministic, no I/O
Level 1 — Bounded external: allowed only with deterministic params, cacheable
Level 2 — Volatile: nondeterministic, only in Execution Mode

A pipeline makes the contract explicit:

@vars
  summary: $normalized
    |> summarize(model="gpt-5.2", temperature=0)   # Level 1 (bounded)
    |> to_markdown()                               # Level 0 (pure)

This is where “determinism is a property of the system” becomes concrete.
In Pure Mode, you simply cannot smuggle in volatility “because it felt right”.
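
A rough Python sketch of how such a mode check could work is shown below. Everything here is hypothetical (`Trust`, `Lens`, `check_pipeline`, the `"pure"` and `"execution"` mode strings), and it assumes that bounded lenses are acceptable outside Execution Mode whenever their parameters are pinned, which the text above implies but does not spell out.

```python
from dataclasses import dataclass
from enum import IntEnum


class Trust(IntEnum):
    PURE = 0      # deterministic, no I/O
    BOUNDED = 1   # external, but deterministic params and cacheable
    VOLATILE = 2  # nondeterministic


class TrustViolation(Exception):
    """A lens is not permitted in the current mode."""


@dataclass(frozen=True)
class Lens:
    name: str
    level: Trust
    deterministic_params: bool = True


def check_pipeline(lenses, mode):
    """Reject the pipeline before anything runs if a lens breaks the mode."""
    for lens in lenses:
        if lens.level is Trust.VOLATILE and mode != "execution":
            raise TrustViolation(f"{lens.name}: volatile lens outside execution mode")
        if lens.level is Trust.BOUNDED and not lens.deterministic_params:
            raise TrustViolation(f"{lens.name}: bounded lens needs pinned params")


pipeline = [
    Lens("summarize", Trust.BOUNDED, deterministic_params=True),
    Lens("to_markdown", Trust.PURE),
]
check_pipeline(pipeline, mode="pure")  # passes
```

The check runs over the declared pipeline, not over observed behavior, so a violation is caught before any model call is made.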

Part 2 — Gates in FACET (not vibes, executable checks)

A contract without gates is still fragile. Gates give the system the right to say: no.

FACET v2.0 includes a first-class testing system via @test.

5) Tests as executable gates with mocks and assertions (`@test`)

@test "basic greeting"
  vars:
    username: "TestUser"

  mock:
    WeatherAPI.get_current: { temp: 10, condition: "Rain" }

  assert:
    - output contains "umbrella"
    - cost < 0.01

This is CI thinking applied to agent specs:

tests execute the full 5-phase pipeline
tools can be mocked (deterministic runs)
assertions can check output and telemetry

In other words: “agent done” is not a feeling — it’s passing checks.

Part 3 — Deterministic context packing (Token Box Model) is a gate too

Even when contracts and tests exist, real systems fail because context is managed ad hoc. Prompts overflow, critical instructions get truncated, and the model “drifts” because the context layout changed.

FACET treats context like layout, not like concatenated strings.

6) Token Box Model: deterministic allocation + critical overflow as a hard failure

The model is simple:

your prompt is a set of sections (@system, @user, history, docs, etc.)
each section has min/grow/shrink/priority
critical sections are those with shrink == 0 and must never be dropped or compressed

If critical sections can’t fit, FACET raises a hard error (critical overflow).
This is a gate: the system refuses to ship an invalid prompt.

That single decision kills an entire class of “mysterious agent regressions” caused by silent truncation.
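
The following Python sketch shows the shape of such an allocator in a deliberately simplified form. It is not the Token Box Model algorithm from the FACET standard: it ignores `grow`, treats `shrink` only as a critical/non-critical flag, and assumes that a higher `priority` means "keep first". The names `Section`, `allocate`, and `CriticalOverflow` are invented for the example.

```python
from dataclasses import dataclass


class CriticalOverflow(Exception):
    """Critical sections alone exceed the token budget."""


@dataclass(frozen=True)
class Section:
    name: str
    size: int       # tokens the section wants
    min: int        # smallest size at which it is still useful
    shrink: float   # 0 marks the section as critical
    priority: int   # assumption: higher is allocated first


def allocate(sections, budget):
    critical = [s for s in sections if s.shrink == 0]
    flexible = [s for s in sections if s.shrink != 0]

    needed = sum(s.size for s in critical)
    if needed > budget:
        # Hard failure: never truncate a critical section silently.
        raise CriticalOverflow(f"need {needed} tokens, budget is {budget}")

    alloc = {s.name: s.size for s in critical}
    remaining = budget - needed
    # Sorting on (priority, name) keeps the result independent of input order.
    for s in sorted(flexible, key=lambda s: (-s.priority, s.name)):
        grant = min(s.size, remaining)
        if grant < s.min:
            grant = 0  # below its minimum the section is dropped entirely
        alloc[s.name] = grant
        remaining -= grant
    return alloc


sections = [
    Section("system", size=300, min=300, shrink=0, priority=100),
    Section("history", size=1000, min=200, shrink=1.0, priority=10),
    Section("docs", size=500, min=100, shrink=1.0, priority=50),
]
layout = allocate(sections, budget=1000)
```

With a budget of 1000, `system` keeps its full 300 tokens, `docs` gets 500, and `history` is cut to the remaining 200. With a budget of 250 the call raises `CriticalOverflow` rather than returning a layout with a truncated system section.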

Part 4 — What “enforced before generation” actually means (no magic)

This phrase can sound controversial, so here’s the precise version:

FACET enforces a double barrier:

Before action (pre-check): validate inputs, tool interfaces, allowed operations, budgets, deterministic mode constraints
Before state change (post-check): validate produced artifacts, run gates, reject if any invariant breaks

So the flow is:

validate → generate → validate → gate → commit

This is how compilers and CI pipelines behave.
Production agent systems should do the same.
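
As an orchestration-level illustration, the validate → generate → validate → gate → commit flow might be sketched in Python like this. `run_step`, `Rejected`, and the callback signatures are assumptions for the example; FACET defines the spec side, not this function.

```python
class Rejected(Exception):
    """The system vetoed the step; no state was changed."""


def run_step(request, *, pre_checks, generate, post_checks, gates, commit):
    # Barrier 1: nothing is generated unless the request is admissible.
    for check in pre_checks:
        check(request)  # each check raises Rejected to veto

    artifact = generate(request)

    # Barrier 2: nothing is committed unless the artifact is valid
    # and every gate passes.
    for check in post_checks:
        check(artifact)
    failed = [name for name, gate in gates.items() if not gate(artifact)]
    if failed:
        raise Rejected(f"gates failed: {failed}")

    commit(artifact)
    return artifact


def no_new_dependencies(artifact):
    if artifact.get("added_dependencies"):
        raise Rejected("new dependencies are not allowed")


committed = []
result = run_step(
    {"task": "fix typo"},
    pre_checks=[],
    generate=lambda req: {"diff": "...", "added_dependencies": []},
    post_checks=[no_new_dependencies],
    gates={"tests_green": lambda artifact: True},
    commit=committed.append,
)
```

The important property is that `commit` is the last call and is unreachable on any failure path. Applied to the PR bot story above, a `no_new_dependencies` post-check would have rejected the change before merge.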

Part 5 — A small, concrete canonical output artifact

FACET’s final output is a canonical JSON structure (before provider-specific transformations). Here’s a simplified “what your orchestration layer can log and replay” shape:

{
  "meta": {"profile": "hypervisor", "mode": "pure"},
  "tools": [
    {"name": "WeatherAPI.get_current", "input_schema": {"city": "string"}}
  ],
  "sections_order": ["system", "tools", "history", "user"],
  "user": {"query": "what to wear today in Berlin?"},
  "gates": [
    {"gate": "tests_green", "pass": true},
    {"gate": "critical_overflow", "pass": true}
  ]
}

Notice the difference vs typical systems:

there is an explicit mode
tools are typed
section order is deterministic
gates and outcomes are visible
this is loggable and replayable
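
One way an orchestration layer could exploit a canonical artifact for logging and replay is to serialize it deterministically and fingerprint it. The Python sketch below uses sorted keys and compact separators as a generic canonicalization; the exact canonical form FACET emits is defined by its standard and may differ, and `canonical_bytes` and `fingerprint` are names made up here.

```python
import hashlib
import json


def canonical_bytes(doc):
    """Serialize with sorted keys and no insignificant whitespace."""
    text = json.dumps(doc, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def fingerprint(doc):
    """Stable identifier for a rendered artifact, usable as a log or cache key."""
    return hashlib.sha256(canonical_bytes(doc)).hexdigest()


run_a = {"meta": {"profile": "hypervisor", "mode": "pure"},
         "sections_order": ["system", "tools", "history", "user"]}
run_b = {"sections_order": ["system", "tools", "history", "user"],
         "meta": {"mode": "pure", "profile": "hypervisor"}}

same = fingerprint(run_a) == fingerprint(run_b)  # key order does not matter
```

Two runs that render the same artifact get the same fingerprint regardless of key order, while a change in `sections_order` (a list, so order is significant) produces a different one. That makes "did the context layout change between yesterday and today?" a string comparison.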

Part 6 — Tooling matters: the reference CLI (`fct`) makes this operational

FACET isn’t only a philosophy; it specifies tooling expectations. A reference CLI (fct) is part of the standard:

fct build file.facet — resolution + type checking
fct run file.facet --input input.json — full 5-phase pipeline → canonical JSON
fct test file.facet — execute @test blocks, report failures + telemetry
fct inspect ... — introspect AST/R-DAG/context allocation (debuggability)

When the language includes these operations, teams stop inventing bespoke glue.

Closing: stop shipping theatre — ship standards

LLMs are powerful components — but without enforceable boundaries they introduce entropy at the exact moment correctness, security, and reliability matter most.

Contracts + gates aren’t bureaucracy.
They’re the difference between a cool demo and a shippable system.

FACET’s core bet is simple:

Treat agent behavior like compiled software:
parse, type-check, compute deterministically, pack context deterministically, render canonical JSON — and never commit state unless gates pass.

Repositories

FACET Compiler: https://github.com/rokoss21/facet-compiler
FACET Standard: https://github.com/rokoss21/facet-standard
