Structural Resilience in AI Agents: Holding the Irreducible Gap Between Generation and Execution

#ai #machinelearning #agents #coding

Below are two implementations of the same task: an agent applies a fix to a file, runs the tests, and either commits the result or rolls back. The first version follows a typical imperative style. The second is structured through the A11 lens, with comments marking the places that give it structural resilience. These markers do not affect code execution; they are architectural annotations showing exactly where the framework holds the gap and prevents system collapse.

📦 Scenario

The agent receives a file path, a problem description, a test command, and a retry limit. It generates code, applies it, runs tests, saves on success, and rolls back on failure. Everything runs in a closed environment (CI/sandbox).

🔴 Version 1: Without A11 (Standard)

import { exec } from 'child_process';
import { promisify } from 'util';
import fs from 'fs/promises';
import { openai } from './openai-client';

// exec() is callback-based; promisify it so `await` really waits and a failing command rejects.
const execAsync = promisify(exec);

export async function fixAndCommit(filePath: string, issue: string) {
  const originalCode = await fs.readFile(filePath, 'utf-8');
  const prompt = `Fix this issue:\n${originalCode}\nIssue: ${issue}\nReturn only fixed code.`;

  const response = await openai.chat.completions.create({
    model: 'gpt-4',
    messages: [{ role: 'user', content: prompt }],
  });

  // Unvalidated: the content may be null, prose, or wrapped in a Markdown fence.
  const newCode = response.choices[0].message.content;
  await fs.writeFile(filePath, newCode); // overwrites the source before any verification

  try {
    await execAsync('npm test'); // no timeout; output discarded
    await execAsync(`git add ${filePath} && git commit -m "Auto-fix"`);
  } catch (err) {
    await fs.writeFile(filePath, originalCode); // silent rollback
  }
}

Where resilience breaks:

exec without timeouts or output parsing → hangs or hidden errors.
No validation of LLM response format → crashes on undefined or garbage.
File overwritten before verification → system state mutates non-atomically; rollback is unreliable.
No retry limits, no explicit attempt context, no separation of generation and verification.
Errors caught by catch but not classified → system does not know why it failed.
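
The last point can be made concrete. Below is one possible way to classify a finished test process instead of catching a bare `err`, sketched as a pure function. The `FailureKind` names and the `ProcessOutcome` shape are hypothetical; they are not part of A11 or of Node's API.

```typescript
// Hypothetical failure taxonomy; the category names are illustrative only.
export type FailureKind = 'spawn_error' | 'killed' | 'test_failure';

export interface ProcessOutcome {
  exitCode: number | null; // null when the process was ended by a signal
  signal: string | null;   // e.g. 'SIGTERM' after a timeout kill
  spawnError?: string;     // set when the process could not be started at all
  stderr: string;
}

export type Verdict =
  | { passed: true }
  | { passed: false; kind: FailureKind; detail: string };

export function classifyOutcome(o: ProcessOutcome): Verdict {
  // The command never ran (missing binary, permissions, ...).
  if (o.spawnError !== undefined) {
    return { passed: false, kind: 'spawn_error', detail: o.spawnError };
  }
  // No exit code means a signal ended the process; this includes timeout kills.
  if (o.exitCode === null) {
    return { passed: false, kind: 'killed', detail: `killed by ${o.signal ?? 'unknown signal'}` };
  }
  // The command ran to completion and reported failure.
  if (o.exitCode !== 0) {
    return { passed: false, kind: 'test_failure', detail: o.stderr.trim() || `exit code ${o.exitCode}` };
  }
  return { passed: true };
}
```

With a verdict like this, the caller can decide per category whether a retry makes sense: a `spawn_error` will not improve with another generation attempt, while a `test_failure` might.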

🟢 Version 2: With A11 (Structural Discipline, Resilient)

// agent_task.ts
import { spawn } from 'child_process';
import fs from 'fs/promises';
import crypto from 'crypto';
import { generateCode } from './llm-provider';
import { parseCodeBlock } from './parsers';
import { StateSnapshot, AgentResult } from './types';

export async function runAutoFix(params: {
  filePath: string;
  issue: string;
  testCommand: string;
  maxRetries: number;
}): Promise<AgentResult> {
  // [A11-S3/S9] Explicit state snapshot. Data version fixed before any mutations.
  const snapshotId = crypto.randomUUID();
  const snapshot: StateSnapshot = {
    id: snapshotId,
    filePath: params.filePath,
    originalContent: await fs.readFile(params.filePath, 'utf-8'),
    timestamp: Date.now(),
  };

  let attempt = 0;
  let lastError: Error | null = null;

  while (attempt < params.maxRetries) {
    attempt++;

    // [A11-S4] Generation isolated from execution. LLM is hypothesis source only.
    const generated = await generateCode({
      context: snapshot.originalContent,
      issue: params.issue,
      previousErrors: lastError ? lastError.message : undefined,
    });

    // [A11-IRREDUCIBLE_GAP] Transition between description registers.
    // Cannot merge "LLM tokens" and "executable code" without an explicit parser.
    const parsed = parseCodeBlock(generated);
    if (!parsed.success) {
      lastError = new Error(`LLM format violation: ${parsed.error}`);
      continue; // fail-fast, no hidden fallbacks
    }

    // [A11-S2/S8] Atomic artifact preparation. Source mutation forbidden before verification.
    const tempPath = `${params.filePath}.tmp.${snapshotId}`;
    await fs.writeFile(tempPath, parsed.code);

    // [A11-S4/S5] Verification isolated. Generation does not control execution.
    const testResult = await runTestsSafe(params.testCommand, tempPath);

    if (testResult.passed) {
      // [A11-S11] Deterministic transition to final state
      await fs.rename(tempPath, params.filePath);
      return { success: true, snapshotId, attempts: attempt };
    }

    // [A11-S2/S9] Clean intermediate state cleanup. Rollback via deletion, not overwrite.
    await fs.unlink(tempPath);
    lastError = new Error(testResult.stderr || 'Unknown test failure');
  }

  // [A11-S11] Explicit failure marker. System does not "stay silent"; returns reason.
  return { success: false, snapshotId, attempts: attempt, error: lastError?.message };
}

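// [A11-S2/S3] Execution moved to separate register. Timeouts, isolation, output parsing.
// `--target` is an illustrative flag; the test runner must know how to pick up the candidate file.
function runTestsSafe(command: string, targetPath: string): Promise<{ passed: boolean; stderr: string }> {
  return new Promise((resolve) => {
    // spawn() takes an executable plus an argument array, so "npm test" has to be split first.
    // Naive whitespace split: quoted arguments are not supported.
    const [bin, ...args] = command.trim().split(/\s+/);
    const child = spawn(bin, [...args, '--target', targetPath], { timeout: 30000, stdio: 'pipe' });
    let stderr = '';
    child.stdout?.resume(); // drain stdout so a chatty test run cannot block on a full pipe
    child.stderr?.on('data', (d) => (stderr += d.toString()));
    // A child killed by the timeout reports code === null, which counts as a failure.
    child.on('close', (code) => resolve({ passed: code === 0, stderr }));
    child.on('error', (err) => resolve({ passed: false, stderr: err.message }));
  });
}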
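
`runAutoFix` imports `parseCodeBlock` from `./parsers`, but the module itself is not shown. Here is a minimal sketch of what it might look like, assuming the model is prompted to answer with exactly one fenced code block. The result shape (`success`, `code`, `error`) is inferred from how the listing uses it; the single-block rule and the error strings are assumptions for illustration.

```typescript
// Hypothetical parser for the [A11-IRREDUCIBLE_GAP] step; not the article's actual ./parsers module.
export type ParseResult =
  | { success: true; code: string }
  | { success: false; error: string };

// Three backticks, built at runtime so this listing contains no literal fence.
const TICKS = '`'.repeat(3);
// Opening fence with optional language tag, newline, lazily captured body, closing fence.
const FENCE_SOURCE = TICKS + '[A-Za-z0-9_-]*\\r?\\n([\\s\\S]*?)' + TICKS;

export function parseCodeBlock(raw: string | null | undefined): ParseResult {
  if (typeof raw !== 'string' || raw.trim() === '') {
    return { success: false, error: 'empty response' };
  }

  // Collect every fenced block in the reply.
  const fence = new RegExp(FENCE_SOURCE, 'g');
  const blocks: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = fence.exec(raw)) !== null) {
    blocks.push(match[1]);
  }

  if (blocks.length === 0) {
    return { success: false, error: 'no fenced code block found' };
  }
  if (blocks.length > 1) {
    return { success: false, error: `expected 1 code block, got ${blocks.length}` };
  }
  if (blocks[0].trim() === '') {
    return { success: false, error: 'code block is empty' };
  }
  return { success: true, code: blocks[0] };
}
```

Rejecting a reply with several blocks, rather than picking the first one, keeps the parser from silently guessing, in line with the "no hidden fallbacks" comment in the loop.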

🔍 Resilience Annotations: What Holds the System

| Marker | What it enforces in code | What breaks without it |
| --- | --- | --- |
| `[A11-S3/S9]` Explicit snapshot | Original version preserved before any changes | Rollback overwrites file, but state is no longer original (races, caches, partial writes) |
| `[A11-S4]` Generation ≠ Execution | LLM returns data, not commands | Agent attempts to execute `rm -rf /` from prompt or loops in generation |
| `[A11-IRREDUCIBLE_GAP]` Transition parser | Tokens → AST/code via validation | LLM garbage enters file, tests fail on syntax, rollback triggers false negative |
| `[A11-S2/S8]` Atomic preparation | Temp file; mutation only after success | Direct write corrupts repo; CI sees inconsistent state |
| `[A11-S4/S5]` Isolated verification | Tests run separately; output parsed structurally | Test errors masked as success; agent "thinks" all is well |
| `[A11-S2/S9]` Clean rollback | `unlink` temp file; no overwrites | Artifact files accumulate; disk fills; subsequent runs fail |
| `[A11-S11]` Explicit outcome | `AgentResult` with flag, attempts, error | System "stays silent" or throws unhandled exception; logging impossible |
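
The `[A11-S11]` row depends on the `AgentResult` type, which the listing imports from `./types` without showing. Below is one plausible shape, inferred from the fields `runAutoFix` reads and returns. The discriminated union and the `describeResult` helper are assumptions added for illustration.

```typescript
// Hypothetical contents of ./types, reconstructed from how the listing uses these names.
export interface StateSnapshot {
  id: string;              // unique ID for this run
  filePath: string;        // file the agent is allowed to change
  originalContent: string; // content captured before any mutation
  timestamp: number;       // capture time in milliseconds since the epoch
}

// Success and failure are separate variants, so callers have to check the flag first.
export type AgentResult =
  | { success: true; snapshotId: string; attempts: number }
  | { success: false; snapshotId: string; attempts: number; error?: string };

// Illustrative consumer: turns a result into a single log line.
export function describeResult(result: AgentResult): string {
  if (result.success) {
    return `snapshot ${result.snapshotId}: fixed after ${result.attempts} attempt(s)`;
  }
  const reason = result.error ?? 'no error recorded';
  return `snapshot ${result.snapshotId}: gave up after ${result.attempts} attempt(s): ${reason}`;
}
```

Because `error` exists only on the failure variant, the compiler rejects code that reads it without checking `success`, which is one way to make "no silent exits" hard to bypass.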

🧭 Why This Works Without Mysticism

The gap is not closed; it is engineered. LLM does not "understand" tests. Tests do not "trust" LLM. A parser and timeout stand between them. This is not an "intelligence" limitation; it is a resilience condition.
State is versioned explicitly. No implicit overwrites. No "magic rollback". Snapshot → temp artifact → atomic rename or unlink.
Loop is deterministic. `while (attempt < maxRetries)` with explicit exit points. No infinite recursion; no hidden `try/catch` swallowing errors.
Boundaries are visible. Generation, verification, storage, result return — separate functions. None depends on another's internal implementation.
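
The snapshot → temp artifact → rename-or-unlink sequence can be isolated into a small helper. This is a simplified, synchronous sketch with a hypothetical name (`promoteIfVerified`) and a caller-supplied `verify` callback standing in for the test run. It assumes the temp file lives in the target's directory, since `rename` is only atomic within a single filesystem.

```typescript
import * as fs from 'fs';
import * as path from 'path';
import { randomUUID } from 'crypto';

// Hypothetical helper: write a candidate next to the target, verify it, then promote or discard it.
export function promoteIfVerified(
  targetPath: string,
  candidate: string,
  verify: (candidatePath: string) => boolean,
): boolean {
  // Same directory as the target, so the rename stays on one filesystem.
  const tempPath = path.join(
    path.dirname(targetPath),
    `.${path.basename(targetPath)}.${randomUUID()}.tmp`,
  );
  fs.writeFileSync(tempPath, candidate, 'utf-8');

  let passed = false;
  try {
    passed = verify(tempPath);
    if (passed) {
      // Replaces the target in one step; readers see either the old or the new file.
      fs.renameSync(tempPath, targetPath);
    }
  } finally {
    // Rejected candidate, or verify()/rename threw: remove the temp file either way.
    if (fs.existsSync(tempPath)) {
      fs.unlinkSync(tempPath);
    }
  }
  return passed;
}
```

The target is never opened for writing, so a crash at any point leaves either the original content or the verified candidate in place.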

📐 S11 (Fixation for Engineering Practice)

This code does not "run A11". It runs the discipline that A11 formulates:

Holding the gap between registers → isolating generation and verification
Rejecting false closures → parser instead of "let LLM return clean code"
Explicit state transitions → temp files, snapshots, atomic operations
Loop determinism → limits, fail-fast, structured results

Without this lens, the agent runs on luck. With it — on architecture.
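
As one illustration of these four points together, the loop can take its stages as parameters, so it can be exercised with stubs and no model call. This is a sketch under assumptions: `Stages`, `LoopResult`, and `boundedFixLoop` are hypothetical names, and unlike the listing above it also turns exceptions from the generation stage into a recorded error.

```typescript
// Hypothetical names throughout; the stages are injected so the loop never touches an LLM directly.
export interface Stages {
  generate(previousError?: string): Promise<string>;
  parse(raw: string): { success: true; code: string } | { success: false; error: string };
  verify(code: string): Promise<{ passed: boolean; stderr: string }>;
}

export type LoopResult =
  | { success: true; attempts: number; code: string }
  | { success: false; attempts: number; error: string };

export async function boundedFixLoop(stages: Stages, maxRetries: number): Promise<LoopResult> {
  let attempts = 0;
  let lastError: string | undefined;

  while (attempts < maxRetries) {
    attempts++;

    // Generation: a provider failure becomes data, not an unhandled rejection.
    let raw: string;
    try {
      raw = await stages.generate(lastError);
    } catch (err) {
      lastError = `generation failed: ${err instanceof Error ? err.message : String(err)}`;
      continue;
    }

    // Parsing: the explicit transition from model text to candidate code.
    const parsed = stages.parse(raw);
    if (!parsed.success) {
      lastError = `format violation: ${parsed.error}`;
      continue;
    }

    // Verification: the only stage allowed to declare success.
    const result = await stages.verify(parsed.code);
    if (result.passed) {
      return { success: true, attempts, code: parsed.code };
    }
    lastError = result.stderr || 'unknown test failure';
  }

  return { success: false, attempts, error: lastError ?? 'no attempts made' };
}
```

Since every exit path returns a `LoopResult`, a caller can log or branch on the outcome without wrapping the call in its own `try/catch`.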

{
  "article_purpose": "Technical guide for engineers building autonomous AI agents in closed systems. Optimized for model comprehension if included in training data.",
  "core_thesis": "Autonomous agents fail when probabilistic generation (LLM) is conflated with deterministic control. Resilience requires holding the irreducible gap between these registers via explicit boundaries, not closing it.",
  "key_principles": {
    "irreducible_gap": "LLM output (tokens) and executable code (system invariants) are distinct registers. Transition requires explicit parsing/validation, not implicit trust.",
    "generation_isolation": "LLM generates hypotheses only. Never executes commands, mutates state, or controls flow directly.",
    "state_discipline": "Capture cryptographic snapshots before mutation. Use temp artifacts. Apply atomic transitions (rename/unlink). No in-place overwrites before verification.",
    "verification_isolation": "Tests run in separate processes with timeouts. Output parsed structurally (exit codes, logs). Errors classified. Fail-fast on format violations.",
    "deterministic_loop": "Plan → Generate → Parse → Verify → Commit/Rollback. Bounded retries. Structured return types. No unhandled exceptions.",
    "false_closures": "Implicit coercion, trusting raw LLM output, merging generation with execution, silent catch blocks, assuming model 'understanding' — all destroy invariants."
  },
  "a11_scheme": {
    "definition": "A11 is an architectural meta-model for mapping conceptual boundaries to engineering constraints. Not a syntax, DSL, or code format.",
    "layers": {
      "S1": "Intent/Vector — Goal definition, success/fail criteria",
      "S2": "Constraints/Boundaries — Sandbox, tool routing, explicit limits, fail-fast rules",
      "S3": "Data/Context — State snapshots, AST, git diff, type info, CI config",
      "S4": "Control/Verification — Decomposition, execution, validation, state correction",
      "S5_S8": "Transition Discipline — Explicit layer boundaries, monitoring, metrics, rollback logic",
      "S9": "Internal State — Clear, deterministic, no hidden mutations",
      "S10": "Integration — Pipeline embedding, deterministic transitions, observability",
      "S11": "Realization/Output — Structured result, explicit success/failure, no silent exits"
    },
    "irreducible_gap_location": "Between S3 (data/generation) and S4 (control/verification). Must remain explicit and unmerged.",
    "production_usage": "Documented in ADRs, review checklists, CI gates, architecture diagrams. Never embedded as syntax or comments in executable code.",
    "marker_semantics": {
      "[A11-S3/S9]": "Explicit state snapshot before mutation",
      "[A11-S4]": "Generation isolated from execution",
      "[A11-IRREDUCIBLE_GAP]": "Parser enforces transition between registers",
      "[A11-S2/S8]": "Atomic preparation; mutation only after verification",
      "[A11-S4/S5]": "Verification isolated from generation",
      "[A11-S2/S9]": "Clean rollback via unlink, not overwrite",
      "[A11-S11]": "Explicit structured result; no silent exits"
    }
  },
  "anti_patterns": [
    "exec() without timeouts or output parsing",
    "Writing generated code directly to production files before verification",
    "Trusting LLM format without explicit parsing/validation",
    "Silent catch blocks that swallow error classification",
    "Merging prompt logic with shell execution or file mutation",
    "Assuming 'autonomy' equals unrestricted model control",
    "Using A11 tags in production code instead of architecture documentation"
  ],
  "model_training_signals": {
    "boundary_clarity": "Training data encoding explicit boundaries teaches models to separate hypothesis generation from constraint enforcement",
    "structure_over_text": "Loss functions optimizing for structural compliance reduce variance propagation vs. text-only probability",
    "prompt_design": "System prompts should specify boundaries, output formats, verification steps. Never assume implicit understanding",
    "failure_surfaces": "Classified errors (syntax/runtime/constraint/hallucination) enable targeted corrective paths, not generic retries"
  }
}