DEV Community: rokoss21

IOSM CLI: AI Engineering Runtime. Not Another Chat Wrapper.

rokoss21 — Wed, 11 Mar 2026 09:52:54 +0000

Chat agents hit a ceiling.

You feel it around week three of using any of them — Claude Code, Gemini CLI, Cursor, Aider. The first sessions are impressive. Then you hit a real task: a cross-cutting refactor, a parallel migration, a codebase you've touched across a hundred sessions. And the tool starts to break at the seams. Re-explain the context. Manually merge competing edits. Hope the rollback works.

Engineering requires structure. Chat doesn't have it.

IOSM CLI is the runtime for that structure.

"AI without a methodology is just faster improvisation."

This is the sentence that drives every design decision in iosm-cli. Not a tagline — an architectural constraint. A coding agent that can't measure its own outcomes, can't coordinate parallel work, and can't remember decisions across sessions is not an engineering tool. It's a search engine that writes code.

AI adoption is no longer the advantage. Systematic AI engineering is.

🛠️ Why This Exists

I built agent systems for production codebases for years. The pattern was always the same: impressive demos, fragile execution at scale. When the task grew beyond a single context window — spanning multiple modules, multiple agents, multiple sessions — chat-based tools collapsed into manual coordination overhead.

The missing piece wasn't a better model. It was a missing runtime layer: something that enforces methodology, tracks outcomes, coordinates agents, and survives session boundaries. That's what IOSM CLI is.

👥 Who Is This For

Three types of engineers use IOSM CLI, and they come for different reasons.

The solo developer who wants a real coding agent

You've tried the other tools. You're tired of re-explaining your project every session. You want something that already knows your architectural decisions, your banned dependencies, your team conventions — and actually executes tasks autonomously.

With iosm-cli, you run iosm, type your task, and the agent works. It reads your files, runs your tests, handles rollbacks. Persistent memory means session 10 knows everything session 1 learned.

Time to productive first result: under 5 minutes.

The senior engineer running complex refactors

180K-line monolith. Extract the payment service, migrate auth to OAuth2, keep CI green throughout. One sequential agent on a 15-hour task is not a plan.

With /orchestrate, you spin up parallel agents with dependency ordering, file lock guarantees, and git worktree isolation. You get a coordinated team in one command, with a consolidated result you can actually review.

The kind of work you previously couldn't safely delegate to AI.

The team lead operationalizing AI coding

You need engineering workflows that are auditable, reproducible, safe for shared codebases. Every cycle should leave traces: what changed, why, what the metrics were before and after.

With IOSM cycles, every run captures baseline metrics, hypothesis cards, and outcome deltas in .iosm/cycles/. The next engineer resumes from the same artifact state.

AI coding as a team engineering system, not a solo productivity hack.

⚡ Barrier to Entry: Minimal

The tool is layered. You start at the bottom and unlock depth only when you need it.

Day 1 — Three commands to a working agent

npm install -g iosm-cli
cd your-project
iosm

Inside the session:

/login     → guided API key setup (30 seconds)
/model     → pick provider + model
your task  → start immediately

No YAML. No config files. No methodology training. The default full profile gives you a capable coding agent with full filesystem access, shell tooling, and semantic search — ready out of the box.

Week 1 — Unlock depth when you need it

Shift+Tab           → switch to iosm profile
/init               → bootstrap IOSM workspace
/iosm 0.95          → run your first structured cycle

Entirely optional. Stay in full profile forever if it works. The IOSM layer appears when you need measurable, auditable improvement cycles — not before.

No provider lock-in

export ANTHROPIC_API_KEY="..."      # Claude models
export OPENAI_API_KEY="..."         # OpenAI GPT models
export GEMINI_API_KEY="..."         # Google Gemini models
export OPENROUTER_API_KEY="..."     # 100+ models via OpenRouter

Node.js >=20.6.0 is the only hard requirement. Everything else is optional.

🆚 Honest Positioning vs Other Tools

This isn't a "we win every cell" table. It's a map so you can pick the right tool.

	Claude Code	Gemini CLI	Cursor	OpenCode	IOSM CLI
Provider	Claude-native	Gemini-native	Any (IDE)	Any (75+ providers)	Any
Mode	Terminal	CLI	IDE	Terminal	Terminal runtime
IOSM methodology	—	—	—	—	✅
Parallel agent orchestration	Partial ¹	—	—	—	✅
Structured checkpoint / rollback	Partial ²	—	Partial ³	—	✅
Persistent cross-session memory	Via CLAUDE.md	—	Via Automations	—	✅
Semantic / vector code search	Agentic ⁴	—	✅ IDE-native	—	✅ terminal-native
MCP support	✅	✅	✅	✅	✅
SDK / JSON-RPC mode	—	—	—	—	✅
Free tier	❌ paid only ⁵	✅ 1000 req/day	✅ Hobby (limited)	✅ open-source	✅ open-source

Notes (to keep this honest):

¹ Claude Code supports parallel subagents via /batch and skills, but without dependency DAGs, file locks, or worktree isolation.

² Claude Code introduced checkpoints for exploration in 2025, but without structured rollback to named states.

³ Cursor has a "Restore Checkpoint" UI button within a session, but not an explicit CLI-level /checkpoint + /rollback workflow.

⁴ Claude Code does deep agentic codebase search (reads and understands files in-context with a 200K token window) — not vector embeddings, but highly capable for many use cases.

⁵ Claude Code requires a paid Pro ($20/mo) or Max ($100+/mo) plan. The free claude.ai web interface handles general coding questions but is not the Claude Code CLI agent.

The pattern: Claude Code and Gemini CLI are go-to choices for their respective native models. Cursor excels at IDE-integrated flows. OpenCode is the best fully open-source lightweight option. IOSM CLI is the only terminal runtime that combines structured methodology, coordinated parallel execution with dependency ordering, and a full platform layer — across any provider.

Different tools for different jobs. IOSM CLI is not a "better chat" — it is a different category: an engineering runtime.

🏗️ Three Architectural Layers

The runtime is layered. Each layer adds capabilities. You get value at any level.

Layer 1 — Runtime: Agents, Orchestration, Worktrees

The base is a full coding agent with direct filesystem and shell access. Real file reads, real diffs, real test runs — no hallucinated paths.

For complex work, /orchestrate turns one agent into a coordinated team:

/orchestrate --parallel --agents 4 \
  --profiles iosm_analyst,explore,iosm_verifier,full \
  --depends 3>1,4>2 \
  --locks schema,config \
  --worktree \
  Refactor auth module, verify invariants, document changes

Dependency DAG (--depends 3>1,4>2): agent 3 waits for 1, agent 4 waits for 2
File locks (--locks schema,config): zero parallel write collisions
Git worktrees (--worktree): main branch untouched until merge

This is continuous dispatch — tasks launch the moment their dependencies are satisfied, not when an arbitrary wave completes. 3–5× reduction in wall-clock time for parallelizable work.

Layer 2 — Methodology: IOSM Cycles, Metrics, Artifacts

IOSM is Improve → Optimize → Shrink → Modularize — a four-phase iterative loop that turns vague "make this better" requests into measurable engineering decisions.

Shift+Tab              # switch to iosm profile
/init                  # bootstrap workspace
/iosm 0.95 --max-iterations 5

/init generates:

iosm.yaml — thresholds, weights, governance policies
IOSM.md — operator + agent playbook
.iosm/cycles/ — artifact workspace for cycle history

Every cycle run captures: baseline metrics → hypothesis cards → improve/verify/optimize iterations → outcome deltas → artifact write.

iosm> Baseline captured
iosm> Planned cycle from team artifacts: simplify auth module
iosm> Running improve -> verify -> optimize loop
iosm> Result: simplicity +18%, modularity +11%, performance +6%
iosm> Artifacts written to .iosm/cycles/2026-03-10-001/

These numbers are real and reproducible. Not vibes. Not impressions. Measurable deltas with full decision log.

Also in this layer:

/memory — persistent project facts across sessions. Active decisions, anti-patterns, architectural constraints. The agent loads them at startup.
/contract — hard engineering constraints the agent enforces. "No new dependencies without approval." "Test coverage must stay above 80%."
/semantic — intent-based code search. Query by meaning, not tokens. "Find all places handling token expiry" — across renamed variables and different module boundaries.
/singular — before implementing anything complex, run feasibility analysis across 3 variants. Choose before you build.

Layer 3 — Platform: SDK, JSON-RPC, MCP

iosm-cli is a foundation you build on, not a closed product.

SDK — embed the runtime in your own tooling:

import { createAgent } from 'iosm-cli';

const agent = await createAgent({
  model: 'sonnet',
  profile: 'iosm',
  tools: ['read', 'write', 'bash']
});

await agent.run('Analyze auth module security posture');

JSON-RPC — wire into CI pipelines and custom dashboards:

iosm --json-rpc --port 3042

Print mode — pipe to other tools:

iosm -p "Audit src/ for dead code" --output-format json | jq '.findings'

MCP — connect any external tool ecosystem:

/mcp    # interactive MCP server manager

🔄 A Full Production Workflow

No demo. A real scenario: refactor an authentication module safely, with verification, measurable outcomes, under 3 hours.

$ iosm
IOSM CLI v0.1.3 [full]

you> /orchestrate --parallel --agents 4 \
     --profiles iosm_analyst,explore,iosm_verifier,full \
     --depends 3>1,4>2 --locks schema,config --worktree \
     Refactor auth module, verify security invariants, document changes

iosm> Team run started: #77
iosm> agent[1] architecture map complete
iosm> agent[2] implementation patch set prepared
iosm> agent[3] verification suite and rollback checks ready
iosm> agent[4] integration validation passed
iosm> Consolidated patch plan generated

→ Shift+Tab (switch to iosm profile)
→ /init
→ /iosm 0.95 --max-iterations 5

iosm> Baseline captured
iosm> Planned cycle from team artifacts: simplify auth module
iosm> Running improve -> verify -> optimize loop
iosm> Result: simplicity +18%, modularity +11%, performance +6%
iosm> Artifacts written to .iosm/cycles/2026-03-10-001/

Outcome: completed in ~2.5 hours. Measurable deltas. Full audit trail. Safe to present to the team and repeat next week.

📦 Install

npm install -g iosm-cli
cd your-project
iosm

# In session:
/doctor    # verify model + auth + tools are healthy

For maximum performance on large codebases:

# macOS
brew install ripgrep fd ast-grep comby jq yq semgrep

# Ubuntu/Debian
sudo apt-get install -y ripgrep fd-find jq yq sed

🌐 Open Spec, Open Runtime

The methodology is a separate, versioned specification: github.com/rokoss21/IOSM — formal definitions, schemas, artifact templates, quality gate validators.

The spec is the contract. The CLI is one implementation. Nothing stops you from running IOSM cycles in your CI, your own orchestrator, your custom tooling. The spec is the invariant.

One Last Thing

Most teams have already adopted some AI coding tool. Most have hit the ceiling: autocomplete works, quick boilerplate works, but anything requiring real coordination across sessions, files, or agents — breaks down.

The next gap in engineering teams won't be "do you use AI?" Everyone will. The gap will be how systematically.

Stop prompting. Start executing.

GitHub: github.com/rokoss21/iosm-cli

npm: npm install -g iosm-cli

Docs: github.com/rokoss21/iosm-cli/docs

IOSM Spec: github.com/rokoss21/IOSM

Swarm-IOSM: Orchestrating Parallel AI Agents with Quality Gates

rokoss21 — Mon, 19 Jan 2026 08:46:24 +0000

TL;DR: Swarm-IOSM is an orchestration engine for Claude Code that transforms complex development tasks into coordinated parallel work streams. It implements continuous dispatch scheduling (no wave barriers), hierarchical file lock management, and enforces IOSM quality gates before merge. Real-world speedup: commonly 3-8x faster than sequential execution.

The Parallel Agent Problem

You're working on a complex feature. It needs:

Codebase analysis to understand existing patterns
Architecture design for the new system
Implementation across 3 modules (independent)
Integration tests
Security audit

Traditional approach: One agent does everything sequentially. 15 hours of wall-clock time.

What if you could run analysis, design, and implementation in parallel? 4-6 hours.

But here's the catch: parallel AI agents need coordination. They can't all edit the same file. They need to share knowledge. And you need quality guarantees before merging their work.

That's what Swarm-IOSM solves.

What is Swarm-IOSM?

Swarm-IOSM is a Claude Code Skill that orchestrates parallel AI agent execution with built-in quality enforcement. It combines:

Continuous Dispatch Loop — Tasks launch immediately when dependencies are met (no artificial wave barriers)
File Lock Management — Hierarchical conflict detection prevents parallel write chaos
PRD-Driven Planning — Structured requirements → decomposition → execution
IOSM Quality Gates — Automated code quality, performance, and modularity checks
Auto-Spawn Protocol — Agents discover new work during execution

Core Model

Touches → Locks → Gates → Done

A correctness model for parallel agent work:

Declare what files you touch
Acquire locks to prevent conflicts
Pass quality gates
Ship

Key Innovation: Continuous Dispatch

Traditional orchestration waits for entire "waves" to complete:

Wave 1: [T01, T02, T03] → Wait for ALL to finish
Wave 2: [T04, T05]      → Can't start until Wave 1 done

Swarm-IOSM uses continuous scheduling:

T01 done → T04 starts IMMEDIATELY (even if T02, T03 still running)

This eliminates idle time and maximizes parallelism. Here's the dispatch algorithm:

while not gates_met:
    # 1. Collect ready tasks (deps satisfied, no conflicts)
    ready = [t for t in backlog if deps_satisfied(t) and not conflicts(t)]

    # 2. Classify by mode (background vs foreground)
    bg = [t for t in ready if can_auto_background(t)]
    fg = [t for t in ready if needs_user_input(t)]

    # 3. Dispatch batch (max 3-6 tasks)
    launch_parallel(bg[:6], mode='background')
    launch_parallel(fg[:2], mode='foreground')

    # 4. Monitor & spawn
    for report in collect_completed():
        spawn_candidates = parse_spawn_candidates(report)
        backlog.extend(deduplicate(spawn_candidates))

    # 5. Check gates
    if all_gates_pass():
        break

Result: Tasks launch as soon as they're ready, not when an arbitrary wave completes.

Live Example: Adding Redis Caching

Let's walk through a real track from examples/demo-track/.

Problem

API endpoint /api/natal/chart has 450ms P95 latency. Database CPU at 75% during peak hours.

Goal

Add Redis caching to reduce latency to <200ms and achieve 80%+ cache hit rate.

Step 1: Create Track

/swarm-iosm new-track "Add Redis caching to API endpoints"

Claude generates:

PRD.md — 10 sections (Problem, Goals, Requirements, Risks, IOSM Targets)
spec.md — Technical design with acceptance tests
plan.md — Task breakdown with dependencies

Generated plan (7 tasks):

T01: Analyze current performance (Explorer, 1h, read-only)
T02: Design caching strategy (Architect, 2h, foreground)
T03: Implement cache service (Implementer-A, 3h, background)
T04: Add caching to /natal endpoint (Implementer-B, 2h, background, after T03)
T05: Add caching to /transits endpoint (Implementer-C, 2h, background, after T03)
T06: Integration testing (TestRunner, 2h, background, after T04+T05)
T07: Security audit + merge (Integrator, 1h, foreground, after T06)

Step 2: Execute Plan

/swarm-iosm implement

Orchestrator creates continuous_dispatch_plan.md:

## Initial Ready Set
- T01 (Explorer, background)

## Expected Timeline
Batch 1: T01 → completes in 1h
Batch 2: T02 → completes in 2h (total: 3h)
Batch 3: T03 → completes in 3h (total: 6h)
Batch 4: T04, T05 (PARALLEL) → completes in 2h (total: 8h)
Batch 5: T06 → completes in 2h (total: 10h)
Batch 6: T07 → completes in 1h (total: 11h)

Serial estimate: 13h
Parallel estimate: 11h
Speedup: ~1.2x

But wait — T01 discovers an N+1 query issue:

## SpawnCandidates (from T01 report)

| ID | Subtask | Touches | Effort | Severity |
|----|---------|---------|--------|----------|
| SC-01 | Optimize calculate_aspects N+1 query | `backend/core/astro/natal.py` | M | medium |

Orchestrator auto-spawns SC-01 and adjusts timeline.

Step 3: Integration & Quality Gates

/swarm-iosm integrate demo-add-caching

Generated iosm_report.md:

## Gate Evaluation Summary

| Gate | Target | Final | Status |
|------|--------|-------|--------|
| Gate-I (Code Quality) | ≥0.75 | 0.89 | ✅ PASS |
| Gate-O (Performance) | Tests pass | All pass | ✅ PASS |
| Gate-M (Modularity) | No circular deps | Pass | ✅ PASS |
| Gate-S (Simplicity) | API stable | N/A | ⚪ SKIP |

IOSM-Index: 0.85 ✅ (threshold: 0.80)

**Result:** APPROVED FOR PRODUCTION MERGE

Results

⚡ P95 latency: 450ms → 180ms (60% improvement)
🎯 Cache hit rate: 82%
✅ All tests passing (24 unit + 6 integration)
🔒 Zero production errors during rollout
⏱️ Total time: 9.25h parallel vs 16h+ sequential (~1.7x faster)

Technical Deep Dive

1. File Lock Management

Challenge: How do you prevent two agents from editing the same file simultaneously?

Solution: Hierarchical lock manager with folder/file awareness.

Lock rules:

def conflicts(lock_a: str, lock_b: str) -> bool:
    a, b = normalize(lock_a), normalize(lock_b)
    # Exact match
    if a == b:
        return True
    # Folder contains file
    if a.startswith(b + '/') or b.startswith(a + '/'):
        return True
    return False

Example:

## Lock Plan

Tasks with overlapping touches (sequential only):
- `backend/core/__init__.py`: T03, T04 → ❌ Cannot run parallel
- `backend/api/`: T05, T06 → ❌ Folder conflict

Safe parallel execution:
- `backend/auth.py` (T02) + `backend/payments.py` (T07) → ✅ No overlap

Read-only tasks: Always parallel (no locks needed).

2. IOSM Quality Gates

Four gates enforce production-grade quality:

Gate-I: Improve (Code Quality)

semantic_coherence: ≥0.95  # Clear naming, no magic numbers
duplication_max: ≤0.05     # Max 5% duplicate code
invariants_documented: true # Pre/post-conditions
todos_tracked: true        # All TODOs in issue tracker

Measured by: AST analysis, clone detection, docstring coverage.

Gate-O: Optimize (Performance & Resilience)

latency_ms:
  p50: ≤100
  p95: ≤200
  p99: ≤500
error_budget_respected: true
chaos_tests_pass: true
no_obvious_inefficiencies: true  # N+1 queries, memory leaks

Measured by: Load testing (locust, k6), chaos engineering, profiling.

Gate-M: Modularize (Clean Boundaries)

contracts_defined: 1.0       # 100% of modules
change_surface_max: 0.20     # ≤20% of codebase touched
no_circular_deps: true
coupling_acceptable: true

Measured by: Dependency graph analysis, interface stability.

Gate-S: Shrink (Minimal Complexity)

api_surface_reduction: ≥0.20  # Or justified growth
dependency_count_stable: true
onboarding_time_minutes: ≤15

Measured by: Public API count, requirements.txt diff, README clarity.

IOSM-Index Calculation:

IOSM-Index = (Gate-I + Gate-O + Gate-M + Gate-S) / 4
Production Threshold: ≥ 0.80

Auto-spawn rules:

Gate-I < 0.75 → Spawn clarity/duplication fixes
Gate-O fails → Spawn test/performance fixes
Gate-M fails → Spawn boundary clarification tasks

3. Auto-Spawn Protocol

Problem: Agents discover issues during execution (e.g., N+1 queries, missing tests).

Solution: Structured SpawnCandidates section in reports.

Format:

## SpawnCandidates

| ID | Subtask | Touches | Effort | User Input | Severity | Dedup Key | Accept Criteria |
|----|---------|---------|--------|------------|----------|-----------|-----------------|
| SC-01 | Fix missing type annotation | `backend/auth.py` | S | false | medium | auth.py\|type-annot | mypy passes |
| SC-02 | Clarify API contract | `docs/api_spec.yaml` | M | true | high | api_spec\|contract | Contract approved |

Orchestrator actions:

Parse SpawnCandidates from completed task reports
Deduplicate by dedup_key (prevents duplicate work)
If needs_user_input=false and severity != critical → auto-spawn
If needs_user_input=true → Add to blocked queue
Run new tasks through planner and dispatch

Spawn protection: Budget limits (default: 20 auto-spawns per track) prevent infinite loops.

4. Cost Tracking & Model Selection

Model selection rules:

Model	Use Case	Cost (per 1M tokens)
Haiku	Read-only analysis	$0.25 / $1.25
Sonnet	Standard implementation	$3.00 / $15.00
Opus	Architecture, security	$15.00 / $75.00

Budget controls:

Default limit: $10.00 per track
⚠️ 80% usage → Warning
🛑 100% usage → Pause execution

Check current spend:

## Cost Tracking (from iosm_state.md)
- budget_total: $10.00
- spent_so_far: $6.50
- remaining: $3.50

Real-World Use Cases

1. Greenfield Feature (Email Notifications)

Task: Add complete email notification system to SaaS app

Plan:

T01: Design email templates (Architect, foreground)
T02: Implement SMTP service (Implementer-A, background)
T03: Add queue system (Implementer-B, background, parallel with T02)
T04: Write integration tests (TestRunner, background, after T02+T03)
T05: Add API endpoints (Implementer-C, background, after T02)

Results:

⚡ ~3x faster (4-6h parallel vs 12-15h sequential)
✅ 100% test coverage (Gate-O enforcement)
📉 Minimal technical debt (Gate-I: 0.92)

2. Brownfield Refactoring (Payment Module)

Task: Refactor legacy payment processing (5000+ LOC, 3 years old)

Workflow:

Plan mode: Explorer analyzes codebase (read-only, safe)
PRD with rollback strategy
Comprehensive regression tests (before touching code)
Parallel implementation (2 modules refactored simultaneously)
Gate-M fails: Circular dependency detected
Auto-spawn: "Break circular import between Payment and Invoice"
Re-check Gate-M: Pass ✅

Results:

🎯 Gate-driven quality — Forced resolution of hidden issues
🔒 Safe refactor — All tests passing before merge
📊 Measured improvement — 40% reduction in module coupling

3. Multi-Module Feature (Multi-Tenant Architecture)

Task: Add multi-tenancy (affects 8 modules)

Plan: 20+ tasks across 5 waves

Wave 1: T01 Design schema (Architect, critical path)
Wave 2: T02-T04 Database migrations (3 parallel implementers)
Wave 3: T05-T10 Update 6 modules (6 parallel implementers)
Wave 4: T11-T15 Tests (5 parallel test runners)
Wave 5: T16 Integration

Auto-spawn: 3 critical tasks discovered during execution

Results:

📈 High parallelism — 6 modules updated simultaneously
💰 Budget control — $6.50 spent (within $10 limit)
⏱️ Time savings — ~18h parallel vs 60h+ sequential

Getting Started (5 Minutes)

Installation

# Clone into Claude Code skills directory
git clone https://github.com/rokoss21/swarm-iosm.git .claude/skills/swarm-iosm

Verify: type /swarm-iosm in Claude Code.

Create Your First Track

/swarm-iosm new-track "Add user authentication with JWT"

Claude will:

Ask questions (mode: greenfield/brownfield, priorities, constraints)
Generate PRD (10 sections)
Create plan.md with task breakdown
Show orchestration plan

Execute

/swarm-iosm implement

Watch the magic:

Parallel agents launch automatically
Progress tracked in iosm_state.md
Reports appear in reports/ directory

Integrate

/swarm-iosm integrate <track-id>

Quality gates run automatically. You get iosm_report.md with pass/fail.

Commands Reference

Command	Description
`/swarm-iosm setup`	Initialize project context
`/swarm-iosm new-track "<desc>"`	Create feature track
`/swarm-iosm implement`	Execute plan (auto mode)
`/swarm-iosm status`	Check progress
`/swarm-iosm watch`	Live monitoring (v1.3)
`/swarm-iosm simulate`	Dry-run with timeline (v1.3)
`/swarm-iosm resume`	Resume after crash (v1.3)
`/swarm-iosm retry <task-id>`	Retry failed task (v1.2)
`/swarm-iosm integrate <id>`	Merge and run gates

What Swarm-IOSM is NOT

To set clear expectations:

❌ Not a general-purpose workflow engine — Designed specifically for Claude Code agent orchestration
❌ Not a replacement for CI/CD — Complements your pipeline, doesn't replace it
❌ Not a code generator "autopilot" — Requires human oversight and decision-making
❌ Not safe to run unattended on production repos — Always review changes before merge

Architecture Overview

┌──────────────────────────────────────────────────────────────────────┐
│                    ORCHESTRATOR (Main Claude Agent)                  │
│  ┌─────────────────────────────────────────────────────────────────┐ │
│  │              Continuous Dispatch Loop (v1.1+)                   │ │
│  │  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────────────┐ │ │
│  │  │ Collect  │→ │ Classify │→ │ Conflict │→ │ Dispatch Batch   │ │ │
│  │  │  Ready   │  │  Modes   │  │  Check   │  │ (max 3-6 tasks)  │ │ │
│  │  └──────────┘  └──────────┘  └──────────┘  └──────────────────┘ │ │
│  │       ↑                                           │             │ │
│  │       │        ┌──────────┐  ┌──────────┐         ↓             │ │
│  │       └────────│  IOSM    │←─│ Auto-    │←────────┘             │ │
│  │                │  Gates   │  │ Spawn    │                       │ │
│  │                └──────────┘  └──────────┘                       │ │
│  └─────────────────────────────────────────────────────────────────┘ │
│                                   │                                  │
│               ┌───────────────────┼───────────────────┐              │
│               ↓                   ↓                   ↓              │
│  ┌────────────────────┐ ┌────────────────────┐ ┌─────────────────┐   │
│  │   Subagent (BG)    │ │   Subagent (BG)    │ │  Subagent (FG)  │   │
│  │   Explorer         │ │   Implementer-A    │ │  Architect      │   │
│  │   read-only        │ │   write-local      │ │  needs_user     │   │
│  └────────────────────┘ └────────────────────┘ └─────────────────┘   │
│               │                   │                   │              │
│               ↓                   ↓                   ↓              │
│         reports/T01.md      reports/T02.md      reports/T03.md       │
│         + SpawnCandidates   + SpawnCandidates   + Escalations        │
└──────────────────────────────────────────────────────────────────────┘

IOSM Framework Integration

Swarm-IOSM implements the IOSM methodology (Improve → Optimize → Shrink → Modularize) as an executable system:

┌────────────────────────────────────────────────────────────────────────────┐
│                           IOSM FRAMEWORK                                   │
│                   https://github.com/rokoss21/IOSM                         │
├────────────────────────────────────────────────────────────────────────────┤
│                                                                            │
│    ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────────────┐    │
│    │ IMPROVE  │ →  │ OPTIMIZE │ →  │  SHRINK  │ →  │   MODULARIZE     │    │
│    │          │    │          │    │          │    │                  │    │
│    │ Clarity  │    │ Speed    │    │ Simplify │    │ Decompose        │    │
│    │ No dups  │    │ Resil.   │    │ Surface  │    │ Contracts        │    │
│    │ Invars   │    │ Chaos    │    │ Deps     │    │ Coupling         │    │
│    └────┬─────┘    └────┬─────┘    └────┬─────┘    └────────┬─────────┘    │
│         │               │               │                   │              │
│    ┌────▼─────┐    ┌────▼─────┐    ┌────▼─────┐    ┌────────▼─────────┐    │
│    │ Gate-I   │    │ Gate-O   │    │ Gate-S   │    │     Gate-M       │    │
│    │ ≥0.85    │    │ ≥0.75    │    │ ≥0.80    │    │     ≥0.80        │    │
│    └──────────┘    └──────────┘    └──────────┘    └──────────────────┘    │
│                                                                            │
│    IOSM-Index = (Gate-I + Gate-O + Gate-S + Gate-M) / 4                    │
│    Production threshold: ≥ 0.80                                            │
└────────────────────────────────────────────────────────────────────────────┘

Version History

v2.1 (2026-01-19) — Current

Automated State Management (iosm_state.md auto-generated)
Status Sync CLI (--update-task)
Improved Report Conflict Detection

v2.0 (2026-01-18)

Inter-Agent Communication (shared_context.md)
Task Dependency Visualization (--graph)
Anti-Pattern Detection
Template Customization

v1.3 (2026-01-17)

Simulation Mode (/swarm-iosm simulate) with ASCII Timeline
Live Monitoring (/swarm-iosm watch)
Checkpointing & Resume (/swarm-iosm resume)

v1.2 (2026-01-16)

Concurrency Limits (Resource Budgets)
Cost Tracking & Model Selection (Haiku/Sonnet/Opus)
Intelligent Error Diagnosis & Retry (/swarm-iosm retry)

v1.1 (2026-01-15)

Continuous Dispatch Loop (no wave barriers)
Gate-Driven Continuation
Auto-Spawn from SpawnCandidates
Touches Lock Manager

Contributing

We welcome contributions! Key areas:

Gate Automation Scripts — Measure IOSM criteria automatically
CI/CD Integration — GitHub Actions, GitLab CI examples
Language-Specific Checkers — Python, TypeScript, Rust evaluators
More Examples — Real-world track demonstrations
IDE Integration — VS Code extension

See CONTRIBUTING.md for guidelines.

Conclusion

Swarm-IOSM proves that AI agent orchestration can be both fast (3-8x speedup through parallelism) and safe (quality gates before merge).

The continuous dispatch model eliminates artificial wave barriers, file lock management prevents conflicts, and IOSM gates enforce production-grade standards.

Key takeaway: Don't choose between speed and quality. With proper orchestration, you get both.

Try it today:

git clone https://github.com/rokoss21/swarm-iosm.git .claude/skills/swarm-iosm
/swarm-iosm new-track "Your next feature"

FACET: Contracts + Gates for LLM Systems

rokoss21 — Mon, 19 Jan 2026 08:28:53 +0000

Stop doing improv theatre in production. Ship agents like software.

Agentic tooling is moving fast: CLIs that edit repositories, frameworks that orchestrate swarms, tool-calling APIs everywhere. And still, most teams that try to run “agents” in production hit the same wall:

outputs drift between runs
“structured output” breaks at the worst moment
tool calls happen at the wrong time, with the wrong shape
debugging turns into story-time (“it worked yesterday…”)
trust collapses exactly when you need it most

The root cause isn’t that models aren’t smart enough.
It’s that we keep shipping non-contractual behavior.

This post argues a simple thesis:

Reliability in LLM systems doesn’t come from better prompts.
It comes from contracts and gates — with the system holding veto power.

FACET v2.0 is a compiler-grade, deterministic agent configuration language designed around that thesis: strict AST → type checking (FTS) → reactive compute (R-DAG) → deterministic context packing (Token Box Model) → canonical JSON render.

A short failure story: “theatre in production”

A team ships an “agentic PR bot”. It edits code, runs tests, and posts a confident summary.

One day the bot “fixes” an issue by adding a dependency. Tests pass locally. The PR merges.
In production, a transitive change triggers a locale/timezone edge case. A downstream service fails for a subset of users. Rollback takes hours because nobody can answer:

Was the agent allowed to introduce new dependencies?
Which tool calls did it run, with what arguments, in what order?
Can we replay the run?
What evidence exists beyond “agent said it’s fine”?

The bot didn’t “misbehave”. It acted exactly as designed: it operated without enforceable boundaries.

That’s the pattern: not “bad model”, but missing veto power.

Contracts and gates: the difference between a demo and a pipeline

Most agent stacks look like this:

Prompt + JSON hope → model writes → parse fails → retry culture → merge anyway

A contractual pipeline looks like this:

Contract → validate inputs + permissions → generate artifact → validate artifact → gates → commit (or reject)

Two key primitives make this real:

Contracts: define what’s allowed and what “valid” means
Gates: run reality checks (tests, security, perf) and block state changes

FACET makes both primitives first-class — not conventions, not best-effort prompts.

Part 1 — Contracts in FACET (real examples)

FACET v2.0 treats agent behavior as a compiled spec. That starts with strict structure and typing.

1) Tool contracts with `@interface` (typed tools, not “tool descriptions”)

In FACET, tools aren’t loose JSON blobs. They are typed interfaces that compile into provider tool schemas.

@interface WeatherAPI
  fn get_current(city: string) -> struct {
    temp: float
    condition: string
  }

@system
  tools: [$WeatherAPI]

This is a contract:

the tool name exists
args are typed (city: string)
return shape is typed (struct { temp: float, condition: string })
the compiler can emit canonical provider schemas during render

In practice, this eliminates a whole class of runtime failures: wrong arg names, wrong types, ambiguous “tool results”.

2) Inputs are explicit with `@input` (no hidden dependencies)

FACET forces you to declare runtime inputs in @vars via @input(...).

@vars
  user_query: @input(type="string")
  user_photo: @input(type="image", max_dim=1024)

This matters because:

missing input is not “guess it” — it’s an error
constraints (like image size) are enforced at runtime
inputs become leaf nodes in the R-DAG (deterministic dependency graph)

This is fail-closed engineering: if data isn’t provided, the system does not hallucinate a substitute.

3) Variables are reactive, deterministic, and immutable after compute (R-DAG)

FACET variables can depend on other variables. Evaluation happens via R-DAG in topological order; cycles and invalid orders are errors.

@vars
  raw_query: $user_query |> trim()
  query_lang: $raw_query |> detect_lang()
  normalized: $raw_query |> normalize(lang=$query_lang)

Key point: once computed, the variable map becomes immutable.
This makes runs reproducible and debuggable: the same inputs produce the same computed state (in Pure Mode).

4) Lenses have trust levels (Pure / Bounded / Volatile)

FACET introduces trust levels for transformations (lenses):

Level 0 — Pure: deterministic, no I/O
Level 1 — Bounded external: allowed only with deterministic params, cacheable
Level 2 — Volatile: nondeterministic, only in Execution Mode

A pipeline makes the contract explicit:

@vars
  summary: $normalized
    |> summarize(model="gpt-5.2", temperature=0)   # Level 1 (bounded)
    |> to_markdown()                               # Level 0 (pure)

This is where “determinism is a property of the system” becomes concrete.
If you’re in Pure Mode: you simply cannot smuggle volatility in “because it felt right”.

Part 2 — Gates in FACET (not vibes, executable checks)

A contract without gates is still fragile. Gates give the system the right to say: no.

FACET v2.0 includes a first-class testing system via @test.

5) Tests as executable gates with mocks and assertions (`@test`)

@test "basic greeting"
  vars:
    username: "TestUser"

  mock:
    WeatherAPI.get_current: { temp: 10, condition: "Rain" }

  assert:
    - output contains "umbrella"
    - cost < 0.01

This is CI thinking applied to agent specs:

tests execute the full 5-phase pipeline
tools can be mocked (deterministic runs)
assertions can check output and telemetry

In other words: “agent done” is not a feeling — it’s passing checks.

Part 3 — Deterministic context packing (Token Box Model) is a gate too

Even when contracts and tests exist, real systems fail because context is managed ad hoc. Prompts overflow, critical instructions get truncated, and the model “drifts” because the context layout changed.

FACET treats context like layout, not like concatenated strings.

6) Token Box Model: deterministic allocation + critical overflow as a hard failure

The model is simple:

your prompt is a set of sections (@system, @user, history, docs, etc.)
each section has min/grow/shrink/priority
critical sections are those with shrink == 0 and must never be dropped or compressed

If critical sections can’t fit, FACET raises a hard error (critical overflow).
This is a gate: the system refuses to ship an invalid prompt.

That single decision kills an entire class of “mysterious agent regressions” caused by silent truncation.

Part 4 — What “enforced before generation” actually means (no magic)

This phrase can sound controversial, so here’s the precise version:

FACET enforces a double barrier:

Before action (pre-check):
validate inputs, tool interfaces, allowed operations, budgets, deterministic mode constraints
Before state change (post-check):
validate produced artifacts, run gates, reject if any invariant breaks

So the flow is:

validate → generate → validate → gate → commit

This is how compilers and CI pipelines behave.
Production agent systems should do the same.

Part 5 — A small, concrete canonical output artifact

FACET’s final output is a canonical JSON structure (before provider-specific transformations). Here’s a simplified “what your orchestration layer can log and replay” shape:

{
  "meta": {"profile": "hypervisor", "mode": "pure"},
  "tools": [
    {"name": "WeatherAPI.get_current", "input_schema": {"city": "string"}}
  ],
  "sections_order": ["system", "tools", "history", "user"],
  "user": {"query": "what to wear today in Berlin?"},
  "gates": [
    {"gate": "tests_green", "pass": true},
    {"gate": "critical_overflow", "pass": true}
  ]
}

Notice the difference vs typical systems:

there is an explicit mode
tools are typed
section order is deterministic
gates and outcomes are visible
this is loggable and replayable

Part 6 — Tooling matters: the reference CLI (`fct`) makes this operational

FACET isn’t only a philosophy; it specifies tooling expectations. A reference CLI (fct) is part of the standard:

fct build file.facet — resolution + type checking
fct run file.facet --input input.json — full 5-phase pipeline → canonical JSON
fct test file.facet — execute @test blocks, report failures + telemetry
fct inspect ... — introspect AST/R-DAG/context allocation (debuggability)

When the language includes these operations, teams stop inventing bespoke glue.

Closing: stop shipping theatre — ship standards

LLMs are powerful components — but without enforceable boundaries they introduce entropy at the exact moment correctness, security, and reliability matter most.

Contracts + gates aren’t bureaucracy.
They’re the difference between a cool demo and a shippable system.

FACET’s core bet is simple:

Treat agent behavior like compiled software:
parse, type-check, compute deterministically, pack context deterministically, render canonical JSON — and never commit state unless gates pass.

Repositories

FACET Compiler: https://github.com/rokoss21/facet-compiler
FACET Standard: https://github.com/rokoss21/facet-standard

Parallel Agents Are Easy. Shipping Without Chaos Isn’t.

rokoss21 — Sun, 18 Jan 2026 09:14:24 +0000

Introducing Swarm-IOSM — a Parallel Subagent Orchestration Engine for Claude Code

Everyone is building multi-agent workflows now.

Swarm prompts. Agent teams. Tool calling. “Auto-developers”.

And yet… most of them collapse the moment you try to use them on real codebases.

Not because the models can’t code.

Because parallel development has two hard problems that prompt-chains don’t solve:

Safe concurrency (two agents writing into the same file is not “parallelism”, it’s a race condition)
Stop conditions (how do you know the result is shippable, not just “it ran”)

I built Swarm-IOSM to turn agent orchestration into an engineering discipline:
locks, dispatch scheduling, gates, and anti-chaos rules — executable, repeatable, and production-oriented.

GitHub: https://github.com/rokoss21/swarm-iosm

The Hidden Failure Mode of “Agent Swarms”

Here’s the truth nobody wants to say out loud:

Most “agent swarms” are just concurrency without a correctness model.

They don’t fail spectacularly. They fail quietly:

Agent A fixes a bug and touches auth.py
Agent B adds a feature and also touches auth.py
You merge both and discover behavior drift
The PR looks large, architecture degrades, confidence drops
Then the swarm spawns more tasks to “fix” the mess
Congratulations, you built a self-replicating backlog generator

The root cause is simple:

“Parallel agents” ≠ Parallel development

Parallel development requires conflict prevention, not conflict resolution.

Swarm-IOSM: IOSM Methodology + Execution Engine

IOSM is the methodology:

Improve → Optimize → Shrink → Modularize
A disciplined loop that forces engineering quality to remain measurable, not performative.

Swarm-IOSM is the execution engine:

PRD-driven decomposition
Continuous dispatch scheduling
File-conflict prevention via lock discipline
Auto-spawn protocol for discoveries
Quality gates as stop conditions

It’s not “a prompt”.

It’s a workflow runtime for parallel software development inside Claude Code.

The Architecture: An Orchestrator That Does Not Implement

Swarm-IOSM is intentionally designed around one rule:

The Orchestrator does NOT implement.

The main agent coordinates only.

All implementation work happens in subagents, each producing a report.

This is not a style preference — it’s a safety boundary.

When the orchestrator writes code, it stops being a scheduler and becomes “yet another contributor”, losing global coordination ability.

So Swarm-IOSM splits responsibilities cleanly:

Orchestrator = scheduling + gates + conflict check + state tracking
Subagents = execution + reports + spawn candidates

The Core Engine: Continuous Dispatch (No Wave Barriers)

Most orchestration frameworks work like this:

Prepare plan → run wave 1 → wait → run wave 2 → wait → merge

That’s not how software work actually flows.

Reality is continuous: tasks unblock tasks every minute.

Swarm-IOSM implements continuous dispatch scheduling:

tasks move through states: backlog → ready → running → done
as soon as dependencies are satisfied, tasks are eligible to run
you dispatch ready tasks immediately (no waiting for a “wave boundary”)

This is what makes it feel fast.

It maximizes parallelism without turning the repo into a battlefield.

The Missing Primitive: “Touches” Lock Manager

This is the centerpiece.

Swarm-IOSM treats a codebase like a shared memory system.

If agents are threads, then files are memory regions.

So Swarm introduces a primitive that classic “agent swarms” ignore:

Touches = the set of files/folders a task may modify.

Each task declares:

Touches: auth.py, services/auth/
Concurrency class:
- read-only (no locks, always safe)
- write-local (lock only touches)
- write-shared (exclusive, sequential)

Then Swarm enforces locks:

folder lock blocks everything inside it
file lock blocks only that file
read-only tasks remain parallel always

Result:

✅ real parallelism
✅ predictable merges
✅ no random collisions “because agent decided to edit config too”

Auto-Spawn… Without Infinite Task Proliferation

Auto-spawn sounds cool until you actually run it.

A naive swarm will spawn tasks forever.

Swarm-IOSM forces auto-spawn to be bounded and deduplicated:

spawn budget total
per-gate budgets
dedup key: <primary_touch>|<intent_category>
severity thresholds
anti-loop counters (max iterations without progress)

This is what transforms “agent creativity” into something you can safely run in an engineering process.

IOSM Gates: Stop Conditions That Mean Something

Most systems “stop” when tasks finish.

Swarm-IOSM stops when quality is achieved.

It tracks four gate families:

Gate-I (Improve)

Clarity, invariants, low duplication.

Gate-O (Optimize)

Latency budget, error budget, chaos checks, no obvious inefficiencies.

Gate-S (Shrink)

Surface area reduction, dependency stability, onboarding time.

Gate-M (Modularize)

Contracts, coupling limits, no circular dependencies.

Swarm is not just “agents executing tasks”.

It’s agents executing tasks until the system crosses a production threshold.

Quick Start (The Happy Path)

Swarm-IOSM lives here:

https://github.com/rokoss21/swarm-iosm

1) Install as a Claude Code skill

Project-level:

git clone https://github.com/rokoss21/swarm-iosm.git .claude/skills/swarm-iosm

User-level:

git clone https://github.com/rokoss21/swarm-iosm.git ~/.claude/skills/swarm-iosm

2) Initialize project context

/swarm-iosm setup

3) Create a feature track

/swarm-iosm new-track "Add user authentication with JWT"

Swarm generates PRD + plan and returns a track id like:

2026-01-17-001

4) Validate & generate a continuous dispatch plan

python .claude/skills/swarm-iosm/scripts/orchestration_planner.py \
  swarm/tracks/<track-id>/plan.md --validate

python .claude/skills/swarm-iosm/scripts/orchestration_planner.py \
  swarm/tracks/<track-id>/plan.md --continuous

5) Execute

/swarm-iosm implement

6) Integrate

/swarm-iosm integrate <track-id>

This produces integration artifacts and quality gate reporting.

Why This Is Different From “Yet Another Agent Framework”

This part matters.

Swarm-IOSM doesn’t compete with “prompt frameworks” by being smarter.

It wins by being stricter.

Swarm-IOSM treats a repo as a concurrency system.

Locks are not optional.

Swarm-IOSM treats quality as a stop condition.

No gates = no ship.

Swarm-IOSM treats spawn as a budgeted resource.

Infinite loops are a design bug, not “agent autonomy”.

You can replace models, providers, or toolchains.

But you can’t replace engineering discipline with vibes.

Real-World Fit: Where Swarm-IOSM Shines

Use Swarm-IOSM when:

multi-file features require coordination
brownfield refactoring needs guardrails
parallel implementation streams are valuable
acceptance criteria must exist (not “it compiles”)

Avoid Swarm-IOSM when:

it’s a single-file change
you want quick fixes without planning
you’re doing purely exploratory research

A hammer is not a screwdriver.

A swarm is not a substitute for architecture.

The Meta-Point: This Is Part of a Bigger Stack

I’m building a full deterministic engineering ecosystem around AI systems:

IOSM = methodology layer
Swarm-IOSM = execution/orchestration layer
FACET = deterministic contract layer for AI behavior

If you’ve read my FACET articles, you already know the thesis:

We don’t need “more prompting”.
We need engineering primitives: contracts, determinism, orchestration rules, replayable artifacts.

Swarm-IOSM is exactly that philosophy applied to parallel agent development.

Closing Thoughts

Parallel agents are not the hard part.

The hard part is shipping without chaos:

no file conflicts
no accidental coupling
no architecture collapse
no infinite spawn loops
gates that enforce engineering quality

Swarm-IOSM is my answer to that.

If you’re using Claude Code and you’ve ever tried to scale beyond a single agent — try it:

https://github.com/rokoss21/swarm-iosm

And if you want the next deep dive, I can write a follow-up:

the touches lock hierarchy rules
a demo track walkthrough
and how IOSM gates can be automated for CI.

History and Rationale of FACET

rokoss21 — Wed, 17 Dec 2025 00:01:11 +0000

Purpose of This Document

This document records the historical context, architectural motivations, and rationale behind the design decisions of FACET.

It exists to answer a recurring future question:

Why was FACET designed this way, and not differently?

This is not a changelog and not a roadmap.
It is a rationale document intended for:

future maintainers
standard reviewers
enterprise architects
historians of AI infrastructure

1. Pre-FACET Era (≈ 2018–2022)

1.1 Prompt Engineering as an Anti-Pattern

Early LLM systems treated prompts as:

opaque strings
mutable runtime artifacts
informal contracts

As systems grew, prompt engineering evolved into:

copy-paste templates
ad-hoc retries
regex-based JSON extraction
post-hoc validation

Failures were handled after generation, not prevented.

This era established a false assumption:

LLM unreliability is inherent and unavoidable.

1.2 Structured Output Did Not Solve the Core Problem

Later approaches introduced:

JSON schemas in prompts
function / tool calling APIs
Pydantic-style validators

However:

schemas were advisory, not enforced
providers interpreted constraints differently
invalid states were still produced
validation happened after the model responded

The system still allowed invalid intermediate states.

2. FACET v1.x (2022–2024): Lessons Learned

FACET v1.x originated as a deterministic prompt templating system.

It introduced:

structured blocks
conditional logic
early lens pipelines
canonical JSON output

2.1 What v1.x Got Right

determinism mattered
canonical JSON enabled caching and diffing
composition beat monolithic prompts

2.2 What v1.x Could Not Solve

no type system
no execution model
no formal notion of invalid state
no prevention of tool-call failures

FACET v1.x reduced chaos, but did not eliminate it.

3. The Breaking Point (2024–2025)

By 2024, several systemic failures became unavoidable:

multi-tool agents failing nondeterministically
provider-specific tool-call rules causing silent breakage
streaming vs non-streaming divergence
context truncation corrupting logic
retries masking correctness bugs

At scale, these failures were:

expensive
non-reproducible
impossible to audit

The industry response remained reactive:

Add retries. Add validators. Add guardrails.

This approach did not converge.

4. The Core Insight

FACET v2.0 is built on a single foundational realization:

You cannot build reliable systems on top of nondeterministic contracts.

The problem was not LLMs.
The problem was lack of a contract layer.

5. FACET v2.0 (2025): A Structural Reset

FACET v2.0 was intentionally designed as:

a compiler, not a template engine
a contract system, not a helper library
an execution model, not a runtime patch

5.1 Determinism as a System Property

FACET does not attempt to make models deterministic.

Instead:

invalid states are prevented upstream
contracts are enforced before execution
outputs are canonicalized

Determinism is achieved by architecture, not by probability control.

5.2 Canonical JSON as Intermediate Representation

FACET introduced Canonical JSON as its IR:

provider-neutral
hash-stable
diff-friendly
replayable

This decouples:

authoring
execution
provider rendering

and prevents vendor lock-in.

5.3 Execution Phases and R-DAG

FACET formalized execution into five phases:

Resolution
Type Checking
Reactive Compute (R-DAG)
Layout (Token Box Model)
Render

This eliminated:

implicit execution order
hidden side effects
runtime guesswork

5.4 Token Box Model

Context handling was redefined as:

a resource allocation problem
with explicit priorities
deterministic compression rules

This replaced:

truncation heuristics
"best effort" packing
silent loss of critical data

5.5 Adapters as Pure Translators

Adapters were intentionally constrained:

no logic
no inference
no recovery

This preserves:

auditability
replayability
long-term stability

6. Rejected Alternatives (By Design)

FACET explicitly rejected:

probabilistic retries
self-healing prompts
adaptive prompt rewriting
runtime schema repair

These techniques obscure failure rather than eliminate it.

7. Long-Term Positioning

FACET is designed to age like:

LLVM
SQL
JSON Schema

Not like:

an agent framework
a vendor SDK
a prompt toolkit

It is intended to remain:

boring
strict
predictable

for decades.

8. Historical Attribution

FACET — Deterministic Contract Layer (since 2025)

Author: Emil Rokossovskiy (rokoss21)

The central idea predates industry consensus.

When determinism became urgent, the architecture already existed.

Status

This document is informative.

It does not define new requirements, but explains why the requirements exist.

End of document.

FACET Glossary

rokoss21 — Tue, 16 Dec 2025 23:50:17 +0000

This glossary defines normative terminology used across the FACET standard, specification, and ecosystem documents.

All terms listed here are intended to be interpreted consistently across implementations, adapters, documentation, and discussions.

FACET

FACET — A deterministic contract layer and language (NADL) for defining, validating, and executing AI system behavior.

FACET treats AI behavior as compiled software, not probabilistic improvisation.

Determinism

Determinism — The property that identical inputs produce identical outputs.

In FACET, determinism is defined at the system level, not at the model level.

Determinism applies to:

execution order
context layout
tool-calling semantics
canonical JSON output

Contract

Contract — A formally defined, enforced agreement describing:

valid inputs
valid outputs
execution constraints
resource bounds

A contract differs from a schema in that it is enforced before execution, not merely validated after generation.

Contract Layer

Contract Layer — The architectural boundary where AI behavior is constrained, validated, and canonicalized.

The contract layer prevents invalid states from entering execution.

FACET implements a contract layer via:

types (FTS)
interfaces
execution phases
Canonical JSON

NADL (Neural Architecture Description Language)

NADL — A declarative language used to describe AI system architecture, behavior, and constraints.

FACET v2.0 is a NADL.

Canonical JSON

Canonical JSON — A deterministic, normalized intermediate representation (IR) produced by FACET.

Canonical JSON is:

provider-agnostic
structurally stable
hashable
replayable

It is the single source of truth for execution, caching, testing, and auditing.

IR (Intermediate Representation)

Intermediate Representation (IR) — A normalized internal form used between compilation stages.

Canonical JSON serves as FACET’s IR, analogous to LLVM IR.

AST (Abstract Syntax Tree)

AST — A structured representation of a parsed FACET document.

The AST is:

immutable after type checking
the input to R-DAG construction

FTS (Facet Type System)

Facet Type System (FTS) — A strict, language-neutral type system used by FACET.

FTS governs:

variable types
tool interfaces
lens signatures
multimodal values

Interface (`@interface`)

Interface — A typed contract defining a callable tool.

Interfaces specify:

function name
parameters
return type

Interfaces compile into provider-specific tool schemas.

Lens

Lens — A transformation function applied to values within FACET.

Lenses are categorized by trust level:

Level 0 — Pure (fully deterministic)
Level 1 — Bounded external (deterministic under constraints)
Level 2 — Volatile (non-deterministic)

R-DAG (Reactive Dependency Graph)

R-DAG — A directed acyclic graph representing variable dependencies.

R-DAG guarantees:

no cycles
deterministic evaluation order
single execution per node

Token Box Model

Token Box Model — A deterministic algorithm for context allocation under token budgets.

It defines:

critical vs flexible sections
compression rules
drop order

Adapter

Adapter — A pure translation layer that maps Canonical JSON to provider-specific payloads.

Adapters:

MUST be deterministic
MUST NOT add logic
MUST NOT mutate semantics

Adapters are translators, not collaborators.

Provider Payload

Provider Payload — The final request format required by a specific AI provider API.

Provider payloads are derived views of Canonical JSON.

Pure Mode

Pure Mode — An execution mode in which all behavior is fully deterministic.

Pure Mode forbids:

randomness
unrestricted I/O
volatile lenses

Pure Mode outputs are canonical.

Execution Mode

Execution Mode — A permissive mode allowing volatile lenses and external side effects.

Execution Mode outputs are not canonical.

Snapshot Testing (Golden Tests)

Snapshot Testing — A testing method where output is compared against a known-good snapshot.

FACET uses snapshot testing for:

Canonical JSON
adapter outputs
regression detection

Vendor Lock-in

Vendor Lock-in — Dependency on a specific provider’s undocumented or unstable behavior.

FACET mitigates vendor lock-in by:

enforcing provider-agnostic Canonical JSON
isolating provider logic in adapters

Reproducibility

Reproducibility — The ability to replay executions and obtain identical results.

Reproducibility in FACET is defined by:

FACET document
inputs
execution mode
Canonical JSON

Invalid State

Invalid State — Any state that violates a contract, type, constraint, or execution rule.

FACET prevents invalid states before execution.

Summary

This glossary defines the shared language of the FACET ecosystem.

Correct use of these terms is required for:

specification compliance
adapter implementation
meaningful technical discussion

Status: Normative reference document

FACET vs Existing Approaches

rokoss21 — Tue, 16 Dec 2025 23:43:51 +0000

Status: Informative (but engineering-focused)

This document positions FACET — Deterministic Contract Layer (since 2025) against common industry approaches for structured outputs, tool-calling, and agent orchestration.

FACET’s core thesis is simple:

Reliability is not a prompt property. It’s a system property.

Most stacks attempt to “coerce” reliability after the model produces an invalid state (validators, retries, repair prompts). FACET enforces validity before generation through compilation, typing, canonicalization, deterministic layout, and replayable artifacts.

Executive Map

FACET is best understood as:

A standard (spec + conformance levels)
A compiler (AST → type-check → R-DAG → Token Box → Canonical JSON)
A contract boundary (tool schema + deterministic context + replay)
A provider-decoupling layer (Canonical JSON as IR; adapters as views)

If you already have an agent stack, FACET is not competing with your business logic.
It competes with the fragile parts: prompt glue, schema drift, provider quirks, ad hoc truncation, and non-replayable runs.

Comparison Table (High Signal)

Approach	What it actually is	Strengths	Failure mode	What FACET adds
“JSON schema in prompt”	Best-effort instruction	Simple, low overhead	Model deviates; post-hoc repair	Compile-time contracts + deterministic rejection
SDK tool/function calling	Vendor tool schema + runtime loop	Good DX, integrations	Provider quirks; invalid arguments; streaming drift	Canonical contracts + deterministic sequencing constraints
Pydantic validation (post-hoc)	Runtime validation of model output	Strong typing; great errors	You already paid for a bad sample; retries/repair loops	Prevent invalid states upstream; replayable artifacts
Instructor / Guardrails / Output Fixers	Validators + repair prompting	Practical mitigation	“Fixing” can mutate semantics; non-deterministic	Deterministic compilation + stable IR for audits
Agent frameworks (LangChain, etc.)	Orchestration + memory + tools	Fast iteration	Hidden heuristics; brittle prompt stacks	Standard contracts + canonical execution model
“We just retry”	Operational band-aid	Sometimes works	Cost blowups; latency; silent drift	Deterministic success criteria; lower ops burden

Note: FACET does not replace providers, SDKs, or orchestration frameworks. It standardizes the contract boundary they all currently treat as “best-effort”.

1) FACET vs “JSON Schema in the Prompt”

What the industry does

Teams paste JSON Schema (or a shape description) into system instructions and hope the model follows it.

Why it breaks

A schema in a prompt is advisory, not enforceable.
The model can:
- omit required fields
- output wrong types
- hallucinate keys
- emit extra commentary
- violate nested constraints
When it fails, systems react with:
- “Try again”
- repair prompts
- regex hacks

What FACET does differently

FACET turns schemas into typed contracts enforced by:

FTS (Facet Type System)
Phase ordering (compile-time checks before render)
Canonical JSON (stable structure, explicit nulls)
Adapter boundary (provider view derived from the same IR)

Result: a run either produces a valid canonical state or fails before it pollutes downstream execution.

2) FACET vs Vendor SDK Tool Calling

What the industry does

Use tool/function calling via OpenAI / Anthropic / Gemini SDKs.

Why it still breaks in production

Even when using “structured tools”, you face:

provider-specific sequencing constraints
tool name casing or normalization differences
streaming vs non-streaming inconsistencies
serialization and “invisible dict args” class bugs
subtle incompatibilities between SDK helpers

What FACET adds

FACET treats provider constraints as first-class compile-time inputs.

Interfaces (@interface) are typed and validated.
Provider constraints are captured during compilation (targeting profiles).
Adapters are passive translators (no repair).

You still use the provider SDK. FACET makes the tool-calling boundary deterministic and replayable.

3) FACET vs Pydantic (Post-hoc Validation)

What Pydantic gives you

Pydantic is excellent at validating Python values against types/models.

Where it cannot help alone

Validation after the model output is already generated means:

you still pay latency/cost for invalid samples
you need retry/repair loops
behavior diverges across providers/modes
multi-tool chains fail in the middle

FACET’s shift in order-of-operations

Instead of:

generate
validate
retry

FACET pushes reliability upstream:

compile contracts
constrain generation
reject invalid states before they execute

Pydantic remains useful inside the host application — FACET complements it by preventing invalid tool states and making runs replayable.

4) FACET vs Guardrails / “Output Fixers” / Repair Prompts

What these tools do well

They reduce pain quickly by:

validating outputs
re-asking the model to correct mistakes
forcing JSON-only responses

The hidden risk

Repair systems can:

mutate meaning while “fixing” structure
introduce non-determinism (different fix attempts)
hide root-cause: your contract boundary is porous

FACET’s principle

A contract layer should not patch.
It should prevent.

FACET is compatible with guardrails, but flips the default: deterministic compilation and canonicalization first; “repair” becomes optional and explicitly non-canonical.

5) FACET vs Agent Frameworks

Agent frameworks are essential for orchestration.
FACET does not compete with:

routing
memory strategies
tool registries
business workflows

FACET competes with:

prompt sprawl
undocumented heuristics
non-replayable execution
vendor lock-in at the message/tool boundary

FACET can be used as a contract boundary inside any framework.

6) FACET vs “Just Use Retries”

Retries are the industry’s default reliability strategy.

Why retries are a tax

latency increases non-linearly
cost becomes unpredictable
partial failures pollute state
error handling grows faster than features

FACET’s alternative

deterministic failure boundaries
canonical replay
stable hashing and caching
snapshot-based regression tests

Operationally, this reduces the “unknown unknowns” that appear at scale.

Canonical JSON as IR

FACET’s Canonical JSON is the IR that makes everything else possible:

Diffability: stable diffs between runs
Hashing: stable cache keys
Replays: deterministic reproduction of incidents
Audits: exact historical payload reconstruction
Vendor switching: adapters render provider payloads as views

In compiler terms:

.facet = source
Canonical JSON = IR
Provider payloads = target-specific codegen

LLVM Analogy (For Future Readers)

FACET deliberately follows a compiler architecture familiar to systems engineers:

Source language: .facet
Front-end: parse → AST → type-check (FTS)
Mid-end: deterministic evaluation graph (R-DAG)
Resource allocator: Token Box Model (context algebra)
IR: Canonical JSON (stable, hashable)
Back-ends: provider adapters (OpenAI/Anthropic/Gemini/etc.)

Just as LLVM enabled multiple backends from one IR, FACET enables multiple providers from one canonical contract.

When to Use FACET

FACET is strongest when:

tool chains are multi-step
failures are expensive
you need deterministic replay
you must support multiple providers
you need formal governance (tests, audits, compliance)

If your use case is a single prompt in a toy script, FACET may be overkill.
If your use case is production agents, it becomes a reliability layer.

Practical Adoption Paths

Path A — Contract Boundary Only

keep your existing framework
introduce FACET only for:
- interface contracts
- canonical JSON
- snapshot tests

Path B — Deterministic Runs for CI

use @test + canonical snapshots
regress tool schemas and prompts without hitting production

Path C — Full Hypervisor Profile

R-DAG variables
Token Box Model
deterministic caching and replay
adapters per provider

Summary

FACET is not trying to be “one more wrapper.”
It is a standard and compiler that:

replaces best-effort schemas with enforceable contracts
makes context layout deterministic
makes runs replayable and auditable
isolates vendor churn behind adapters

When reliability becomes urgent, the solution should already be written.

FACET — Deterministic Contract Layer (since 2025)

Compliance Levels

rokoss21 — Tue, 16 Dec 2025 23:35:38 +0000

Purpose

This document defines compliance levels for FACET-related implementations.

While the FACET v2.0 specification defines what is correct, compliance levels define how completely a given component (compiler, adapter, runtime, SDK integration) adheres to the FACET contract model.

This allows the ecosystem to:

distinguish partial integrations from full implementations
avoid false claims of determinism
set clear expectations for enterprise use
evolve the standard without breaking attribution or trust

Compliance levels are declarative and auditable.

Core Principle

Not all FACET integrations are equal — and that must be explicit.

A component MUST declare its compliance level.

Silently claiming "FACET-compatible" without meeting the requirements of a level is considered non-compliant.

Compliance Levels Overview

FACET defines four compliance levels:

Level	Name	Scope
L0	Conceptual	Documentation / ideas only
L1	Structural	Canonical JSON & schema adherence
L2	Deterministic	Full determinism & reproducibility
L3	Reference	Spec-complete, reference-grade

Level 0 — Conceptual Compliance (L0)

Audience: blog posts, design docs, experimental prototypes

Definition

The implementation:

references FACET concepts (contracts, determinism, Canonical JSON)
does NOT implement formal compilation or guarantees

Allowed Claims

"FACET-inspired"
"FACET concepts applied"
"Contract-based approach"

Forbidden Claims

deterministic execution
reproducibility guarantees
FACET-compatible

Notes

L0 is not an implementation level.
It exists to allow discussion without misleading users.

Level 1 — Structural Compliance (L1)

Audience: SDK extensions, tooling, lightweight integrations

Definition

The implementation:

produces or consumes Canonical JSON
follows canonical ordering and explicit null rules
enforces schema shape stability

Required Properties

stable key ordering
explicit null for missing optional fields
deterministic serialization

Non-Requirements

full R-DAG execution
Token Box Model
strict determinism across runs

Allowed Claims

"FACET-compatible (structural)"
"Canonical JSON compliant"

Common Examples

logging / auditing tools
snapshot testing harnesses
visualization layers

Level 2 — Deterministic Compliance (L2)

Audience: production agent systems, enterprise deployments

Definition

The implementation:

fully enforces deterministic execution
produces identical Canonical JSON for identical inputs
rejects invalid states before provider execution

Required Properties

strict Facet Type System (FTS)
deterministic R-DAG execution
deterministic Token Box Model layout
canonical JSON as the single source of truth
no retries as a correctness mechanism

Guarantees

reproducible outputs
stable hashing
replayable executions
deterministic failure modes

Allowed Claims

"Deterministic"
"FACET-compliant"
"Reproducible agent execution"

Level 3 — Reference Compliance (L3)

Audience: standards bodies, auditors, long-term infrastructure

Definition

The implementation:

satisfies all FACET v2.0 normative requirements
passes the official FACET golden test suite
is suitable as a reference implementation

Required Properties

full spec coverage (all execution phases)
golden tests with published fixtures
strict adapter requirements
hermetic execution guarantees
documented versioning and change history

Privileges

Only L3 implementations may claim:

"FACET Reference Implementation"
"Spec-complete"
"FACET Standard"

Adapters and Compliance

Provider adapters have their own compliance axis.

An adapter may be:

L1 compliant (structural mapping only)
L2 compliant (deterministic mapping + golden tests)

Adapters can never be L3 on their own.
They inherit system-level compliance.

Misrepresentation Clause

Claiming a higher compliance level than implemented is a spec violation.

Non-compliant claims:

"Deterministic" without reproducibility
"FACET-compatible" without Canonical JSON
"Standard" without spec coverage

Such claims invalidate trust and interoperability.

Rationale

Compliance levels exist to prevent:

marketing-driven overclaims
partial integrations masquerading as standards
ecosystem fragmentation

A deterministic contract layer only works if trust is explicit.

Summary

FACET compliance is not binary.

It is tiered, explicit, and enforceable.

If a system does not declare its compliance level, it has none.

Status

This document defines normative compliance levels for the FACET ecosystem.

Adapter Requirements

rokoss21 — Tue, 16 Dec 2025 23:31:04 +0000

Purpose

This document defines normative requirements for FACET-compatible provider adapters.

Adapters are the only layer allowed to translate Canonical JSON into provider-specific payloads (OpenAI, Anthropic, Gemini, local runtimes, etc.).

They exist to map, not to interpret, fix, enrich, or re‑decide behavior.

Adapters are translators, not collaborators.

Core Principle

Adapters MUST be behaviorally passive.

They MUST NOT:

introduce new logic
infer missing data
reorder execution semantics
apply provider-specific heuristics
silently recover from invalid states

All intelligence, validation, and determinism belong above the adapter boundary.

Architectural Position

FACET enforces a strict layered architecture:

.facet document
      ↓
Typed AST
      ↓
R-DAG execution
      ↓
Token Box Model
      ↓
Canonical JSON   ← SINGLE SOURCE OF TRUTH
      ↓
Provider Adapter
      ↓
Provider Payload

Adapters operate only on Canonical JSON.
They MUST NOT accept partially-compiled or provider-shaped inputs.

Mandatory Adapter Properties

A compliant adapter MUST satisfy all of the following.

1. Deterministic Mapping

Given identical Canonical JSON input:

adapter output MUST be byte-for-byte identical
mapping MUST be pure and stateless
no randomness, clocks, environment state, or I/O allowed

Adapters MUST be referentially transparent functions:

output = adapter(canonical_json)

2. No Semantic Repair

Adapters MUST NOT attempt to "fix" provider constraints by modifying semantics.

Forbidden behaviors include:

renaming tools to match provider casing quirks
injecting missing fields
reordering messages to satisfy undocumented rules
splitting or merging tool calls

If Canonical JSON violates a provider constraint, the adapter MUST fail loudly.

Silent recovery is corruption.

3. Provider Constraints Are Declarative Inputs

All provider-specific constraints MUST be declared upstream, during compilation.

Examples:

required tool-call turn ordering
serialization restrictions
streaming limitations
tool name casing rules

Adapters may only apply constraints that were already resolved into Canonical JSON.

They MUST NOT discover or infer constraints dynamically.

This requirement implies that provider targeting is an explicit compilation choice
(e.g. target = "gemini", profile = "strict_chat"), not a runtime adaptation.

Adapters MUST NOT compensate for missing or incorrect target selection.

4. One-to-One Structural Mapping

Adapters MUST preserve structure:

one canonical tool → one provider tool definition
one canonical message → one provider message
explicit null fields MUST remain explicit

Adapters MUST NOT:

collapse multiple messages
expand single messages
drop empty or null fields

5. Failure Containment

Adapters MUST be a failure boundary.

If a provider:

rejects a payload
changes undocumented behavior
introduces breaking changes

The failure MUST:

surface as an adapter error
NOT mutate Canonical JSON
NOT poison caches or history

Canonical JSON remains valid and replayable.

Explicit Prohibitions

Adapters MUST be safe to execute in zero-trust environments.

Adapters MUST NOT:

perform validation (already done by compiler)
run type checks
execute lenses
call LLMs
fetch external resources
access filesystem or network

Adapters are not execution engines.

Versioning Requirements

Adapters MUST:

declare supported Canonical JSON version(s)
fail on incompatible versions
be forward-incompatible by default

This prevents silent misinterpretation of newer contracts.

Testing Requirements

Every adapter implementation MUST include:

Golden tests

Canonical JSON input → exact provider payload snapshot

Negative tests

invalid Canonical JSON → deterministic failure

Round-trip safety

adapter output MUST NOT affect canonical replay hashes

Snapshot tests MUST be stable across environments.

Relationship to Reproducibility

Adapters MUST NOT compromise reproducibility guarantees.

Reproducibility is defined entirely by:

FACET document
inputs
execution mode
Canonical JSON

Adapters are excluded from the reproducibility contract.

They are replaceable.

Design Rationale

Why adapters are intentionally constrained:

to prevent vendor lock-in
to localize API churn
to enable long-term replay and auditing
to keep the compiler authoritative

Once adapters are allowed to "help", determinism collapses.

Summary

Adapters exist to answer one question only:

"How does this Canonical JSON look in this provider’s dialect?"

Anything beyond that violates the contract.

Status

This document defines normative requirements for FACET-compatible adapters.

Any adapter violating these rules is non-compliant by design.

Tool-Calling Failure Modes

rokoss21 — Tue, 16 Dec 2025 23:23:00 +0000

Purpose

This document catalogs real, recurring failure modes observed in LLM tool-calling systems across major providers and agent frameworks.

Its goal is to:

make failures explicit and enumerable
demonstrate that these failures are systemic, not user error
show why post-hoc validation and retries are structurally insufficient
define the problem space a deterministic contract layer must solve

This is not a critique of any single provider.
It is a taxonomy of failure modes that emerge when probabilistic generation is asked to satisfy implicit contracts.

Core Observation

Tool calling today fails not because models are weak, but because:

Tool contracts are implicit, informal, and enforced only after generation.

LLMs are expected to infer:

schema shape
parameter types
tool names
sequencing rules
provider-specific constraints

…without those constraints being part of the execution model.

The result is a predictable set of failure classes.

Failure Class 1: Schema Shape Violations

Description

The model produces a tool call whose JSON structure does not match the declared schema.

Examples

missing required fields
extra unexpected fields
wrong nesting depth
arrays where objects are expected

Real-World Symptoms

Pydantic validation errors
silent field dropping
runtime exceptions after generation

Why Retries Fail

Retries re-sample from the same unconstrained distribution.
They reduce probability but do not eliminate invalid states.

Failure Class 2: Type Mismatches

Description

The model emits values of the wrong type for otherwise valid fields.

Examples

numbers as strings ("42" instead of 42)
booleans as text ("true")
objects serialized as strings

Real-World Symptoms

deserialization failures
silent coercion bugs
inconsistent behavior across SDKs

Root Cause

Schemas exist only as instructions, not as constraints on generation.

Failure Class 3: Tool Name Drift

Description

The model references a tool name that does not exactly match the declared identifier.

Examples

casing drift (process_payment → Process_Payment)
partial names (search → search_docs)
hallucinated tool names

Impact

downstream dispatch failure
silent no-op behavior
hard-to-debug agent stalls

Failure Class 4: Parameter Visibility Loss

Description

Certain parameter shapes are ignored or dropped by provider APIs or SDK layers.

Examples

dict arguments not visible to OpenAI-powered agents
binary payloads failing serialization

Impact

tools invoked with incomplete inputs
agents behaving inconsistently between sync and stream modes

Root Cause

Mismatch between:

declared tool schema
provider transport format
SDK serialization logic

Failure Class 5: Sequencing Violations

Description

The model produces a valid-looking tool call at an invalid point in the conversation.

Examples

Gemini requiring tool calls immediately after user or tool response turns
tool calls emitted after assistant messages

Symptoms

provider-side INVALID_ARGUMENT errors
conversation reset or termination

Why This Is Fundamental

Sequencing rules are provider-specific and not visible to the model.

Failure Class 6: Streaming vs Non-Streaming Drift

Description

The same agent behaves differently in streaming and non-streaming modes.

Examples

tool calls appearing only in one mode
different output shapes
missing final tool invocation

Impact

non-reproducible behavior
broken production parity

Failure Class 7: Multi-Tool Chain Collapse

Description

Agents fail when chaining multiple tools in a single reasoning flow.

Symptoms

early termination
partial execution
invalid intermediate state

Root Cause

Each tool call compounds uncertainty.
Without contracts, error probability grows multiplicatively.

Failure Class 8: Context-Induced Tool Corruption

Description

Tool calls degrade as context grows or is truncated heuristically.

Examples

truncated tool schema
partial parameter emission
hallucinated defaults

Root Cause

Context overflow handled by truncation, not allocation.

Why Validation and Retries Cannot Fix This

Post-generation validation:

detects invalid states after they exist
cannot prevent invalid intermediate steps
cannot guarantee convergence

Retries:

reduce probability
increase cost
do not change the state space

This is equivalent to catching compiler errors at runtime.

The Missing Layer

All listed failures share one property:

They occur because tool contracts are not part of the execution model.

A deterministic system must:

encode schema, types, and sequencing before generation
reject invalid states before emission
treat provider constraints as first-class

This is the problem space FACET addresses at the contract layer.

Status

This document defines an informative but implementation-grounded taxonomy of tool-calling failures.

It is intended to support:

adapter design
contract systems
future standardization efforts

The failures described here are not hypothetical.
They are observed, reproducible, and systemic.

Token Box Model

rokoss21 — Tue, 16 Dec 2025 23:17:42 +0000

Purpose

The Token Box Model defines a deterministic context allocation algorithm for LLM execution.

Its purpose is to replace ad-hoc truncation, heuristic compression, and retry-based prompt handling with a formal, reproducible layout model.

In FACET, context is not a side-effect of string concatenation — it is a compiled artifact.

Problem Statement

Modern LLM systems fail under context pressure because:

token limits are enforced late (after prompt assembly)
truncation is implicit and non-deterministic
critical instructions may be silently dropped
different runs drop different parts of context
provider tokenizers behave differently

This leads to:

non-reproducible agent behavior
debugging instability
production-only failures

The Token Box Model addresses this by making context layout explicit, typed, and deterministic.

Core Concept

The context is treated as a finite-capacity container with a fixed token budget.

Each logical block of prompt data is represented as a Section with explicit layout constraints.

The compiler is responsible for fitting all sections into the available budget without violating invariants.

Section Definition

Each Section has the following properties:

Field	Type	Description
`priority`	int	Removal order (lower = dropped earlier)
`base_size`	int	Token count after render
`min`	int	Minimum guaranteed size
`grow`	float	Weight for expansion
`shrink`	float	Weight for compression
`strategy`	LensPipeline	Compression strategy

Critical Sections

A Section is Critical if:

shrink == 0

Critical sections:

MUST NOT be compressed
MUST NOT be truncated
MUST NOT be dropped

If all critical sections do not fit, execution MUST fail.

Deterministic Algorithm

Let:

S = all sections
B = token budget
size[i] = base_size of section i

Step 1 — Fixed Load

Critical = { i | shrink[i] == 0 }
FixedLoad = sum(size[i] for i in Critical)

If:

FixedLoad > B

→ FAIL with ContextCriticalOverflow

Step 2 — Free Space

FreeSpace = B - FixedLoad

Step 3 — Expansion (Optional)

Expandable = { i | grow[i] > 0 }

FreeSpace MAY be distributed proportionally:

extra[i] = FreeSpace * (grow[i] / sum(grow))

Step 4 — Compression

If total size exceeds budget:

Deficit = total_size - B
Flexible = { i | shrink[i] > 0 }
Sort Flexible by (priority ASC, shrink DESC)

For each section:

Apply compression strategy
Recompute size
Truncate to min if needed
Drop section if still oversized

Stop when Deficit <= 0.

Determinism Guarantees

Given:

identical sections
identical priorities
identical token budget

The resulting context layout is:

byte-for-byte identical
order-stable
provider-independent

This makes context cacheable, diffable, and replayable.

Why This Matters

Without a formal layout model:

retries hide bugs
prompt behavior drifts
context loss is invisible

With the Token Box Model:

failures are explicit
critical instructions are protected
behavior is reproducible

This turns context handling from a heuristic into an engineering discipline.

Relationship to FACET Execution

The Token Box Model is executed in:

Phase 4 — Layout

Inputs:

computed variable values
rendered sections
token budget

Output:

finalized ordered context

Any violation aborts execution before provider interaction.

Design Principle

Context is not text.
Context is a resource.

The Token Box Model makes that resource explicit, bounded, and deterministic.

Status

This document defines the normative Token Box Model for FACET v2.0 and later.

All compliant implementations MUST follow this algorithm when performing context layout.

Adapter Philosophy

rokoss21 — Tue, 16 Dec 2025 23:12:05 +0000

Purpose

This document defines the architectural role and strict limitations of provider adapters in the FACET ecosystem.

Adapters exist to translate Canonical JSON into provider-specific payloads. They are not execution engines, not sources of truth, and not places where logic is allowed to accumulate.

Adapters are translators, not decision-makers.

This principle is foundational to FACET’s long-term correctness, reproducibility, and vendor independence.

The Core Rule

All semantic decisions MUST be completed before an adapter is invoked.

Once Canonical JSON exists:

No logic may be added
No structure may be inferred
No defaults may be applied
No recovery heuristics may run

Adapters perform mechanical transformation only.

Why Adapters Must Be Dumb

Modern LLM stacks routinely collapse because adapters grow "helpful" behavior:

filling in missing fields
renaming tools dynamically
reordering messages to satisfy undocumented rules
retrying failed calls with mutated payloads
patching provider quirks ad hoc

This creates systems where:

behavior differs per provider
bugs cannot be reproduced
audits become impossible
fixes introduce new regressions

FACET treats this as an architectural failure, not an implementation detail.

Canonical JSON as the Contract Boundary

Adapters consume Canonical JSON and emit provider payloads.

They MUST treat Canonical JSON as:

immutable
complete
authoritative

If a provider rejects a payload derived from valid Canonical JSON, the adapter MUST fail loudly.

Adapters are not allowed to “make it work”.

Failure is information. Mutation is corruption.

What Adapters Are Allowed To Do

Adapters MAY:

rename fields to match provider APIs
transform message layouts (e.g. roles → blocks)
map FACET interfaces to provider tool schemas
attach provider-required metadata
split or merge fields when explicitly specified

All transformations MUST be:

deterministic
stateless
reversible in principle

What Adapters Are Forbidden To Do

Adapters MUST NOT:

infer missing values
change execution order
modify tool arguments
drop or add messages
reinterpret context priorities
retry with modified payloads
apply provider-specific heuristics silently

Any of the above breaks:

determinism
reproducibility
trust

Failure Containment Model

FACET intentionally localizes all provider-specific failures to the adapter layer.

If a provider:

changes an API
introduces undocumented constraints
breaks streaming semantics

Then:

Canonical JSON remains valid
stored executions remain replayable
only the adapter needs updating

This sharply bounds blast radius and prevents systemic corruption.

Adapters vs Frameworks

Most agent frameworks embed logic inside provider integrations:

Agent Logic
  ↓
Provider Wrapper
  ↓
Model

FACET inverts this:

FACET Compiler
  ↓
Canonical JSON (IR)
  ↓
Adapter (View)
  ↓
Provider

This inversion is what makes determinism possible.

Adapters Are Replaceable

Because adapters are:

stateless
mechanical
non-authoritative

They can be:

swapped
versioned independently
rewritten without touching agent logic

Vendor lock-in becomes structurally impossible.

Design Principle

If fixing a bug requires changing adapter logic, the bug was upstream.

Adapters reveal incompatibilities; they do not hide them.

This is the only sustainable way to build systems that survive provider churn.

Status

This document defines the normative adapter philosophy for FACET-based systems.

Any implementation that embeds decision-making logic inside adapters is non-compliant by design.

DEV Community: rokoss21

IOSM CLI: AI Engineering Runtime. Not Another Chat Wrapper.

🛠️ Why This Exists

👥 Who Is This For

The solo developer who wants a real coding agent

The senior engineer running complex refactors

The team lead operationalizing AI coding

⚡ Barrier to Entry: Minimal

Day 1 — Three commands to a working agent

Week 1 — Unlock depth when you need it

No provider lock-in

🆚 Honest Positioning vs Other Tools

🏗️ Three Architectural Layers

Layer 1 — Runtime: Agents, Orchestration, Worktrees

Layer 2 — Methodology: IOSM Cycles, Metrics, Artifacts

Layer 3 — Platform: SDK, JSON-RPC, MCP

🔄 A Full Production Workflow

📦 Install

🌐 Open Spec, Open Runtime

One Last Thing

Swarm-IOSM: Orchestrating Parallel AI Agents with Quality Gates

The Parallel Agent Problem

What is Swarm-IOSM?

Core Model

Key Innovation: Continuous Dispatch

Live Example: Adding Redis Caching

Problem

Goal

Step 1: Create Track

Step 2: Execute Plan

Step 3: Integration & Quality Gates

Results

Technical Deep Dive

1. File Lock Management

2. IOSM Quality Gates

Gate-I: Improve (Code Quality)

Gate-O: Optimize (Performance & Resilience)

Gate-M: Modularize (Clean Boundaries)

Gate-S: Shrink (Minimal Complexity)

3. Auto-Spawn Protocol

4. Cost Tracking & Model Selection

Real-World Use Cases

1. Greenfield Feature (Email Notifications)

2. Brownfield Refactoring (Payment Module)

3. Multi-Module Feature (Multi-Tenant Architecture)

Getting Started (5 Minutes)

Installation

Create Your First Track

Execute

Integrate

Commands Reference

What Swarm-IOSM is NOT

Architecture Overview

IOSM Framework Integration

Version History

v2.1 (2026-01-19) — Current

v2.0 (2026-01-18)

v1.3 (2026-01-17)

v1.2 (2026-01-16)

v1.1 (2026-01-15)

Contributing

Conclusion

Links

FACET: Contracts + Gates for LLM Systems

A short failure story: “theatre in production”

Contracts and gates: the difference between a demo and a pipeline

Part 1 — Contracts in FACET (real examples)

1) Tool contracts with @interface (typed tools, not “tool descriptions”)

2) Inputs are explicit with @input (no hidden dependencies)

3) Variables are reactive, deterministic, and immutable after compute (R-DAG)

4) Lenses have trust levels (Pure / Bounded / Volatile)

Part 2 — Gates in FACET (not vibes, executable checks)

5) Tests as executable gates with mocks and assertions (@test)

Part 3 — Deterministic context packing (Token Box Model) is a gate too

6) Token Box Model: deterministic allocation + critical overflow as a hard failure

Part 4 — What “enforced before generation” actually means (no magic)

Part 5 — A small, concrete canonical output artifact

Part 6 — Tooling matters: the reference CLI (fct) makes this operational

Closing: stop shipping theatre — ship standards

Repositories

1) Tool contracts with `@interface` (typed tools, not “tool descriptions”)

2) Inputs are explicit with `@input` (no hidden dependencies)

5) Tests as executable gates with mocks and assertions (`@test`)

Part 6 — Tooling matters: the reference CLI (`fct`) makes this operational

Interface (`@interface`)