DEV Community

Cover image for Agent Series (20): Harness in Production — From Single File to Reusable Package
WonderLab
WonderLab

Posted on

Agent Series (20): Harness in Production — From Single File to Reusable Package

From Demo Code to a Reusable Package

Article 19 used a 900-line harness_full_demo.py to demonstrate eight defense layers. That file is good for explaining concepts, but not for reuse — all layers are coupled together, nothing can be tested in isolation, and nothing can be imported by another project.

A production-grade Agent project needs something you can actually import:

harness/
├── __init__.py      Public API exports
├── registry.py      Layer 2: ActionRegistry + PermissionLevel
├── budget.py        Layer 3: PermissionBudget (with refund())
├── sandbox.py       Layer 4: sanitise_input + sandboxed_eval
├── audit.py         Layer 6: ImmutableAuditLog (hash-chained)
├── rollback.py      Layer 7: RollbackCoordinator
└── harness.py       Unified entry point: AgentHarness
Enter fullscreen mode Exit fullscreen mode

This article starts with package design, covers three key API decisions, and finishes with two integration styles: standalone Python and LangGraph graph embedding.


Module Design

registry.py — Layer 2

class PermissionLevel(Enum):
    READ        = 1
    WRITE       = 2
    ADMIN       = 3
    IRREVERSIBLE = 4

@dataclass
class RegisteredAction:
    name: str
    level: PermissionLevel
    budget_cost: int
    description: "str"
    handler: Any   # Callable or BaseTool

class ActionRegistry:
    def register(self, action: RegisteredAction) -> None: ...
    def get(self, name: str) -> RegisteredAction: ...    # not found → PermissionError
    def is_allowed(self, name: str) -> bool: ...
    def names(self) -> list[str]: ...
Enter fullscreen mode Exit fullscreen mode

get() rather than __getitem__: raises a consistent PermissionError, without leaking the internal KeyError detail.


budget.py — Layer 3

class PermissionBudget:
    def spend(self, action_name: str, cost: int) -> None:
        if self.remaining < cost:
            raise BudgetExhaustedError(...)
        self.remaining -= cost

    def refund(self, action_name: str, cost: int) -> None:
        self.remaining = min(self.total, self.remaining + cost)
Enter fullscreen mode Exit fullscreen mode

The new refund() method fixes a design flaw from Article 19: budget was deducted before approval, and never returned on rejection. The production package corrects this — when an IRREVERSIBLE action is intercepted, harness.py proactively calls refund() to keep budget accounting accurate.


sandbox.py — Layer 4

INJECTION_PATTERN = re.compile(
    r"(ignore.*(previous|above|prior)|forget.*instruction|"
    r"you are now|act as|jailbreak|bypass|"
    r"override.*system|system.*override|"     # both word orders covered
    r"</s>|\n\n###|###\s*system|<\|im_start\|>|system prompt)",
    re.IGNORECASE,
)
Enter fullscreen mode Exit fullscreen mode

Two subtle points:

  1. Both SYSTEM OVERRIDE (system first) and override.*system (override first) are covered
  2. \n\n### matches a real newline, not the literal string \\n\\n###

Both bugs were discovered and fixed during the adversarial tests in Article 21.


audit.py — Layer 6

class ImmutableAuditLog:
    def log(self, action, actor, target, result, metadata=None) -> str:
        entry = {..., "prev_hash": self._last_hash}
        entry["hash"] = self._hash(json.dumps(entry, sort_keys=True) + self._last_hash)
        with self._path.open("a") as f:   # append-only
            f.write(json.dumps(entry) + "\n")
        return entry["hash"]

    def verify_integrity(self) -> bool:
        # Replays the hash chain; any modified field returns False
        ...
Enter fullscreen mode Exit fullscreen mode

The __len__() helper lets tests use len(audit) to check entry count directly.


rollback.py — Layer 7

class RollbackCoordinator:
    @contextmanager
    def transaction(self, state: dict, op_name: str):
        snapshot = copy.deepcopy(state)
        self._snapshots.append({"op": op_name, "snapshot": snapshot})
        try:
            yield state
        except Exception:
            state.clear()
            state.update(snapshot)
            self._snapshots.pop()
            raise

    def rollback_last(self, state: dict) -> str | None:
        """Manual trigger: undo the most recent committed transaction."""
        if not self._snapshots:
            return None
        entry = self._snapshots.pop()
        state.clear()
        state.update(entry["snapshot"])
        return entry["op"]
Enter fullscreen mode Exit fullscreen mode

rollback_last() enables manual rollback: after a transaction commits, the snapshot is retained until explicitly confirmed or cleared by the caller.


Unified Entry Point: AgentHarness

class AgentHarness:
    def __init__(self, budget: int = 100, log_path: str = ...):
        self.registry = ActionRegistry()
        self.budget   = PermissionBudget(total=budget)
        self.audit    = ImmutableAuditLog(log_path=log_path)
        self.rollback = RollbackCoordinator()
        self._state: dict = {}

    def execute(self, action_name: str, actor: str = "agent", **kwargs) -> Any:
        # Layer 4: sanitise string arguments
        # Layer 2: registry check (missing → PermissionError)
        # Layer 3: budget deduction (insufficient → BudgetExhaustedError)
        # Layer 5: IRREVERSIBLE → refund budget + raise HumanApprovalRequired
        # Layer 7: WRITE/ADMIN wrapped in rollback.transaction
        # Layer 6: audit record
        ...

    def approve_and_execute(self, action_name: str, actor: str = "human", **kwargs) -> Any:
        """Call this after catching HumanApprovalRequired to complete execution."""
        ...
Enter fullscreen mode Exit fullscreen mode

Why the two methods are separate:

  • execute() is the automated path: all checks pass, execute immediately
  • approve_and_execute() is the human path: the caller explicitly signals "this has been approved"

Merging them (e.g., with an approved=False parameter) makes intent ambiguous and harder to test.


Standalone Usage

Basic Flow

harness = AgentHarness(budget=50)

# Register actions
harness.registry.register(RegisteredAction(
    "read_ticket",   PermissionLevel.READ,        1,  "Read Jira ticket",  handler_fn))
harness.registry.register(RegisteredAction(
    "write_draft",   PermissionLevel.WRITE,        3,  "Write draft fix",   handler_fn))
harness.registry.register(RegisteredAction(
    "create_pr",     PermissionLevel.ADMIN,         8,  "Open pull request", handler_fn))
harness.registry.register(RegisteredAction(
    "merge_to_main", PermissionLevel.IRREVERSIBLE, 20, "Merge to main",     handler_fn))
Enter fullscreen mode Exit fullscreen mode

READ → WRITE → ADMIN normal flow:

r1 = harness.execute("read_ticket",  ticket_id="BUG-101")
r2 = harness.execute("write_draft",  ticket_id="BUG-101", patch="fix: add null check")
r3 = harness.execute("create_pr",    ticket_id="BUG-101", title="fix: BUG-101")
# read=1 + write=3 + admin=8 = 12 spent, 38 remaining
Enter fullscreen mode Exit fullscreen mode

Unregistered Action Blocked

try:
    harness.execute("delete_all_data")
except PermissionError as e:
    # "Action 'delete_all_data' not in registry. Execution blocked."
    ...
Enter fullscreen mode Exit fullscreen mode

IRREVERSIBLE Two-Phase Execution

try:
    harness.execute("merge_to_main", pr_id=1)
except HumanApprovalRequired as e:
    print(e.action_name)   # "merge_to_main"
    print(e.action_args)   # {"pr_id": 1}
    # After human review:
    result = harness.approve_and_execute("merge_to_main", pr_id=1)
Enter fullscreen mode Exit fullscreen mode

Key point: when execute() intercepts an IRREVERSIBLE action, it calls budget.refund() first. The net budget cost is zero. Only approve_and_execute() actually charges the budget.

Budget Exhaustion

# budget=5, write cost=3
h = AgentHarness(budget=5)
h.execute("write_draft", ...)   # OK, 2 remaining
h.execute("write_draft", ...)   # BudgetExhaustedError: need 3, remaining 2
Enter fullscreen mode Exit fullscreen mode

LangGraph Integration

Embedding the harness inside LangGraph's tools_node:

def tools_node(state: HState) -> dict:
    last = state["messages"][-1]
    results = []
    for tc in last.tool_calls:
        name, args = tc["name"], tc["args"]
        try:
            reg = harness.registry.get(name)               # Layer 2
            harness.budget.spend(name, reg.budget_cost)    # Layer 3

            if reg.level == PermissionLevel.IRREVERSIBLE:
                decision = interrupt({...})                 # Layer 5: LangGraph primitive
                if decision != "approved":
                    harness.budget.refund(name, reg.budget_cost)
                    harness.audit.log(name, "checkpoint", ..., "HUMAN_REJECTED")
                    results.append(ToolMessage(content="rejected", ...))
                    continue

            if reg.level in (WRITE, ADMIN):
                with harness.rollback.transaction(harness._state, name):  # Layer 7
                    output = TOOL_MAP[name].invoke(args)
            else:
                output = TOOL_MAP[name].invoke(args)

            harness.audit.log(name, "agent", ..., "EXECUTED")       # Layer 6
            results.append(ToolMessage(content=str(output), ...))

        except PermissionError as e:
            harness.audit.log(name, "registry", ..., "BLOCKED")
            results.append(ToolMessage(content=str(e), ...))
        except BudgetExhaustedError as e:
            results.append(ToolMessage(content=str(e), ...))

    return {"messages": results}
Enter fullscreen mode Exit fullscreen mode

tools_node is the harness's natural insertion point: it intercepts before tool execution without touching any agent_node (reasoning layer) logic.


Article 21 Test Results (45/45)

This package's behavior is fully verified by Article 21's test suite:

Functional  (Layer 1–7 basic behaviour)     ████████████████████████████████  19/19  PASS
Adversarial (injection / escalation)        ████████████████████████████████  17/17  PASS
Chaos       (fault injection / partial)     ████████████████████████████████   9/ 9  PASS

Total                                        45/ 45 tests passed
Enter fullscreen mode Exit fullscreen mode

Two real bugs found by the tests:

  1. INJECTION_PATTERN only matched override.*system, missing [SYSTEM OVERRIDE] (reversed word order)
  2. \\n\\n### matched the literal string \n, not a real newline — jailbreak pattern ### System: slipped through

Both fixed in sandbox.py with a one-line regex adjustment.


Design Checklist

Package Structure

  • [ ] One file per layer; each file does exactly one thing
  • [ ] __init__.py exports only the public API; internal classes stay private
  • [ ] AgentHarness acts as Facade; callers don't reach into subsystems directly

API Design

  • [ ] execute() is the automated path covering the full Layer 2→7 chain
  • [ ] approve_and_execute() is the human path; the caller signals "approved"
  • [ ] Budget is refunded (refund()) when IRREVERSIBLE is intercepted, keeping accounting accurate
  • [ ] All exception types (PermissionError / BudgetExhaustedError / HumanApprovalRequired) exported from __init__.py

Sandbox

  • [ ] Injection pattern covers both forward and reverse word orders
  • [ ] \n is a real newline character, not the literal \\n

LangGraph Integration

  • [ ] Harness is embedded only in tools_node, not in agent_node
  • [ ] Each tool call runs through the harness check chain independently
  • [ ] IRREVERSIBLE uses LangGraph interrupt(), not a Python exception

Summary

Five core conclusions:

  1. Modularity is a prerequisite for testability: you can't test a single layer in isolation when everything is one file; splitting into a package lets each module be independently mocked and verified
  2. Refund budget on IRREVERSIBLE interception: the Article 19 design flaw, fixed here — "intercept before charging" is cleaner than "charge then refund," though both are valid; pick one and document it
  3. Separating execute() and approve_and_execute() makes intent explicit: automated and human paths are distinct; caller intent is unambiguous
  4. Tests found real production bugs: two regex vulnerabilities were invisible during development; adversarial tests exposed them on the first run
  5. LangGraph's tools_node is the harness's natural slot: no changes to agent logic needed; add the harness only at the tool execution layer, keeping concerns separated

References


Check out PrimeSkills — a curated marketplace of AI agents and skills that have been validated in real-world, enterprise-grade workflows. No fluff, just what actually works.

Find more useful knowledge and interesting products on my Homepage

Top comments (1)

Collapse
 
mehmetcanfarsak profile image
Mehmet Can Farsak

Solid breakdown of production agent architecture. The ActionRegistry + PermissionBudget pattern is clean — you're essentially building guardrails at the infrastructure level. I've been thinking about a related gap: most agents don't distinguish between "thinking mode" and "action mode." When you ask them to brainstorm, they still run through the full tool-use pipeline.

That's why I built Brainstorm-Mode (mehmetcanfarsak/Brainstorm-Mode on GitHub) — it adds a mode layer via PreToolUse hooks. Divergent mode blocks all tools, actionable mode whitelists safe ones, academic mode routes to research tools. Fits right alongside the registry/budget pattern you're describing.