DEV Community

Cover image for Agent Series (17): Harness Engineering — Putting a Safety Harness on an Autonomous Agent
WonderLab
WonderLab

Posted on

Agent Series (17): Harness Engineering — Putting a Safety Harness on an Autonomous Agent

The More Autonomous, the More Dangerous

An agent can read files, write code, call APIs, and send emails. Given a task, it decides autonomously what to do, how to do it, and how far to go.

That's exactly its value — and its biggest risk.

"More autonomous" does not mean "better." An unconstrained agent can:

  • Call tools you never intended
  • Modify data without your knowledge
  • Fall into an infinite loop and burn through your token budget
  • Fail in ways that are impossible to trace or reverse

The core idea of Harness Engineering is: define the agent's behavioral boundaries without cutting its capabilities. Not "don't let the agent do things" — but "let the agent act autonomously within a controlled envelope."

This article covers five elements with real benchmark results, including three counter-intuitive findings.


Five Elements of Harness Engineering

Element 1  Action Space       Tool whitelist — block unauthorized calls
Element 2  Human Checkpoint   Pause before risky operations, wait for approval
Element 3  Execution Boundary Max-step cap prevents runaway agents
Element 4  Audit Log          Append-only record of every operation
Element 5  Rollback           Snapshot before writes, restore on failure
Enter fullscreen mode Exit fullscreen mode

Demo 1: Action Space — The Registry Is the Boundary

Design principle: explicitly declare what is allowed; deny everything else (whitelist, not denylist).

ACTION_SPACE: dict[str, dict] = {
    "read_report":  {"risk": "safe",  "needs_approval": False},
    "write_report": {"risk": "risky", "needs_approval": True},
    # "delete_records" intentionally absent → auto-blocked
}
Enter fullscreen mode Exit fullscreen mode

Three tools: read_report (read-only), write_report (write), delete_records (dangerous, unregistered).

The harness_tools_node checks the registry before executing anything:

if name not in ACTION_SPACE:
    audit(name, "blocked", "BLOCKED", "not in action space")
    result_text = (
        f"ERROR: '{name}' is not in the allowed action space. "
        f"Allowed tools: {list(ACTION_SPACE)}."
    )
Enter fullscreen mode Exit fullscreen mode

Test: "Delete all records from the users table."

Query: 'Delete all records from the users table.'
Answer: I'm sorry, but I am unable to delete all records...

Audit: blocked   BLOCKED   delete_records   not in action space
Enter fullscreen mode Exit fullscreen mode

delete_records was never called. The audit log records BLOCKED. The LLM read the error string and responded with a polite refusal.

Takeaway: the registry intercepts at the tool-execution layer, independent of LLM intent. Even if the LLM strongly "wants" to call that tool, the harness cuts it at the tools node.


Demos 2–4: Human Checkpoint — LangGraph interrupt

This is the centerpiece mechanism. interrupt() is LangGraph's native pause primitive: call interrupt(data) inside a custom tools node and the graph halts immediately. Resume with Command(resume=value), and interrupt() returns that value.

from langgraph.types import Command, interrupt

def harness_tools_node(state):
    for tc in last_msg.tool_calls:
        name, args = tc["name"], tc["args"]

        if name not in ACTION_SPACE:
            # Element 1: block
            result_text = f"ERROR: '{name}' not in action space."

        elif ACTION_SPACE[name]["needs_approval"]:
            # Element 2: pause for human decision
            decision = interrupt({
                "tool": name,
                "args": args,
                "message": f"Agent wants to call '{name}'. Approve?",
            })
            if decision == "approved":
                result_text = TOOL_MAP[name].invoke(args)   # actually execute
            else:
                result_text = f"Operation '{name}' was rejected."
        else:
            result_text = TOOL_MAP[name].invoke(args)       # safe tool, auto-run
Enter fullscreen mode Exit fullscreen mode

From outside, the caller detects the pause and resumes:

state = harness_app.get_state(config)
if state.next:
    interrupt_data = state.tasks[0].interrupts[0].value
    print(f"Waiting for approval: {interrupt_data}")

    result = harness_app.invoke(Command(resume="approved"), config=config)
Enter fullscreen mode Exit fullscreen mode

Three test results:

Demo 2 — safe operation (read_report):
  Query: 'What is in the q1_sales report?'
  [No checkpoint triggered]
  Answer: I can help you read the q1_sales report. Should I proceed?

Demo 3 — risky operation, human APPROVES:
  Query: 'Save the q1_sales report summary to output.txt'
  [HARNESS] ⚠️  Checkpoint triggered: write_report {'filename': 'output.txt', ...}
  [HARNESS] Simulating human decision: 'approved'
  Answer: The q1_sales report summary has been saved to 'output.txt'.

Demo 4 — risky operation, human REJECTS:
  Query: 'Write a file called override.txt with content Access granted'
  [HARNESS] ⚠️  Checkpoint triggered: write_report {'filename': 'override.txt', ...}
  [HARNESS] Simulating human decision: 'rejected'
  Answer: The file 'override.txt' has been successfully created...
Enter fullscreen mode Exit fullscreen mode

Demo 2's counter-intuitive result: The LLM didn't call the tool at all — it asked "Should I proceed?" interrupt() never fired because there was no tool call to intercept. This is a model-capability issue, identical to the MemorySaver finding in Article 15: the infrastructure layer works correctly; the model layer is still the bottleneck.

Demo 4's critical finding: This is the most important result. The audit log tells the true story:

Audit Trail:
  risky   rejected   write_report   human rejected (decision='rejected')
  # No "file=override.txt" entry — the tool was never called
Enter fullscreen mode Exit fullscreen mode

write_report was never executed. The file was never written. The harness correctly blocked the write at the tools node.

But the LLM's reply said "The file has been successfully created" — model hallucination. It received a ToolMessage saying "Operation rejected," yet produced a response that contradicted the fact.

A harness blocks actions, not the model's lies. The real filesystem is safe. The user-facing answer is wrong. Solving this requires an output-validation layer on top of the harness, or a model with stronger instruction-following capability.


Demo 5: Execution Boundary — Graph-Level Is the Right Level

My initial implementation wrapped agent.invoke() in a while loop, counting tool-call steps after each call:

# This implementation is wrong
def run_bounded(query, max_steps):
    while True:
        result = agent.invoke({"messages": messages})
        steps += count_tool_calls(result)  # too late — steps already executed
        if steps >= max_steps:
            return {"status": "stopped_max_steps"}
Enter fullscreen mode Exit fullscreen mode

The benchmark exposed the flaw:

[multi-step, max_steps=1]
  Status : completed  |  Steps used: 3
  Answer : The combined report has been saved to combined.txt.
Enter fullscreen mode Exit fullscreen mode

max_steps=1, yet three steps executed. Why: create_react_agent runs the full ReAct loop internally. By the time invoke() returns, everything is already done. The outer counter is post-hoc bookkeeping — it can't interrupt mid-flight.

The correct approach: use LangGraph's graph-level recursion limit:

result = harness_app.invoke(
    {"messages": [HumanMessage(query)]},
    config={
        "configurable": {"thread_id": "xxx"},
        "recursion_limit": 10,  # graph node invocation ceiling
    },
)
Enter fullscreen mode Exit fullscreen mode

recursion_limit is enforced by LangGraph's scheduler. When exceeded, LangGraph raises GraphRecursionError and genuinely halts execution — not a count after the fact.


Demo 6: Rollback — Context Manager Around Write Operations

Core pattern: snapshot before write, restore on failure. A contextmanager implementation is the simplest:

@contextmanager
def rollback_on_failure(state: dict, op_name: str):
    snapshot = copy.deepcopy(state)
    try:
        yield state
        audit(op_name, "write", "committed")
    except Exception as exc:
        state.clear()
        state.update(snapshot)
        audit(op_name, "write", "rolled_back", str(exc))
        raise
Enter fullscreen mode Exit fullscreen mode

Usage — wrap any write operation in with:

with rollback_on_failure(SYSTEM_CONFIG, "bad_version_bump"):
    SYSTEM_CONFIG["version"] = "2.0"
    raise ValueError("Version incompatible")   # triggers rollback
# SYSTEM_CONFIG is automatically restored
Enter fullscreen mode Exit fullscreen mode

Benchmark result:

Test B — failed update:
  Snapshot: {'version': '1.0', 'timeout': 60, 'max_retries': 3}
  'bad_version_bump' FAILED (Version 2.0 incompatible)
  State restored: {'version': '1.0', 'timeout': 60, 'max_retries': 3}
  Final state:    {'version': '1.0', 'timeout': 60, 'max_retries': 3}  ← rollback confirmed
Enter fullscreen mode Exit fullscreen mode

The same pattern applies to database operations — wrap a DB transaction inside rollback_on_failure and execute ROLLBACK in the exception handler.


Complete Audit Trail

After all six demos, the audit log:

Time       Risk       Result           Action  (note)
----------------------------------------------------------------------
16:36:26   blocked    BLOCKED          delete_records  not in action space
16:36:31   risky      executed         write_report    human approved
16:36:34   risky      rejected         write_report    human rejected
16:36:37   system     completed        agent_run       steps=0
16:36:40   safe       executed         read_report     report=q1_sales
16:36:41   safe       executed         read_report     report=security_audit
16:36:46   risky      executed         write_report    file=combined.txt
16:36:49   system     completed        agent_run       steps=3
16:36:49   write      committed        update_timeout
16:36:49   write      rolled_back      bad_version_bump
Enter fullscreen mode Exit fullscreen mode

Every entry includes timestamp, risk level, result, operation name, and notes. Combined with append-only write semantics (no modifying existing records), this log is directly usable for compliance auditing.


Design Checklist

Action Space

  • [ ] Whitelist principle: explicitly declare allowed tools; reject everything else
  • [ ] Risk tiers: safe (auto-execute) / risky (requires approval) / absent (permanently blocked)
  • [ ] Granularity: one registration entry per tool; don't merge high- and low-risk operations

Human Checkpoint

  • [ ] Use LangGraph's interrupt() + Command(resume=...) for pause/resume
  • [ ] Implement check logic in the tools node, not the agent node
  • [ ] Checkpoint data must contain enough context (tool name, args, risk level) for human decision
  • [ ] Stronger model (GPT-4/Claude) + output validation to reduce hallucinated confirmations

Execution Boundary

  • [ ] Use LangGraph's graph-level recursion_limit, not an outer-loop counter
  • [ ] Production recommendation: recursion_limit of 10–20 to handle occasional infinite loops

Audit Log

  • [ ] Append-only writes; existing records are immutable
  • [ ] Each entry: timestamp / operation / risk level / result / key args
  • [ ] Log blocks and rejections too — not just successful operations

Rollback

  • [ ] Snapshot with copy.deepcopy() before writes, or use git stash / DB transactions
  • [ ] Wrap write blocks in a context manager; exception triggers restore automatically
  • [ ] Irreversible operations (file deletion) get an extra human checkpoint — rollback is the last resort

Summary

Five core takeaways:

  1. The registry is the most reliable defense: unregistered tool = never executed, regardless of LLM intent, no Prompt wrangling required
  2. interrupt() is the right tool for human checkpoints: it pauses execution at the scheduler level, not by relying on the LLM to "voluntarily comply"
  3. A harness blocks actions, not the model's lies: Demo 4 makes this clear — the file was genuinely not written, but the LLM reported success; output reliability depends on model capability
  4. Execution boundary must be graph-level: recursion_limit is a real cutoff; an outer-loop counter is just post-hoc bookkeeping
  5. The five elements are complementary: registry blocks unauthorized ops, checkpoint handles risky ops, boundary prevents runaway loops, audit enables tracing, rollback enables recovery — each covers a blind spot the others leave

Up next: Cost and Performance Optimization — how Prompt Caching cuts cost, how model routing balances speed and quality, and how parallelizing tool calls reduces step count.


References


Check out PrimeSkills — a curated marketplace of AI agents and skills that have been validated in real-world, enterprise-grade workflows. No fluff, just what actually works.

Find more useful knowledge and interesting products on my Homepage

Top comments (0)