DEV Community

Ken Imoto


Harness Engineering for AI Code Review -- How OpenAI, Anthropic, and HumanLayer Control Agent-to-Agent Review

The Problem: Code Review Can't Keep Up

AI agents now write code 10x faster than humans.

OpenAI's Codex team generated over 1 million lines of code in 5 months, with 3 engineers merging an average of 3.5 PRs/day each. Anthropic's long-running agents code continuously for 6+ hours.

New problem: code review can't keep up.

Imagine a factory line running 10x faster, but the quality inspectors are the same headcount. The inspection queue stretches out the door.

OpenAI's answer: agent-to-agent review -- AI reviews AI-written code. But "just ask AI to review" doesn't work. You need a control system. That system is the harness, and the discipline of designing it is harness engineering.

What Is Harness Engineering?

Mitchell Hashimoto (HashiCorp founder) defined it:

When an agent makes a mistake, improve the environment so it never makes the same mistake again.

HumanLayer positions this as a subset of Context Engineering.

Harness engineering = designing the configuration that manages an agent's context window. We went from tweaking prompts (prompt engineering) to designing entire environments (harness engineering).

OpenAI Codex Team's Approach

AGENTS.md as a "Table of Contents"

OpenAI initially built a massive AGENTS.md -- coding conventions, architecture decisions, project context, everything in one file. It failed.

Context is a scarce resource. A giant instruction file pushes out task details, code, and relevant documentation. When everything is "important," nothing is.

The fix: keep AGENTS.md to ~100 lines as a table of contents.

# AGENTS.md (~100 lines)

## Architecture
→ docs/architecture/overview.md

## API Conventions  
→ docs/api/conventions.md

## Testing
→ docs/testing/strategy.md

## Security
→ docs/security/guidelines.md

Details live in docs/. The agent references them only when needed. This is Progressive Disclosure applied to AI context.
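The steps above can be sketched in code. Here is a minimal, hypothetical illustration of progressive disclosure: parse the AGENTS.md table of contents, then pull in only the docs whose section is relevant to the current task (the parsing format and matching heuristic are my assumptions, not OpenAI's actual implementation):

```python
# Hypothetical sketch of progressive disclosure over an AGENTS.md
# table of contents. The "## Section" + "→ path" format mirrors the
# example above; the relevance heuristic is an illustrative stand-in.

def parse_toc(agents_md: str) -> dict[str, str]:
    """Map each '## Section' heading to the doc path on the arrow line below it."""
    toc, section = {}, None
    for line in agents_md.splitlines():
        if line.startswith("## "):
            section = line[3:].strip()
        elif line.startswith("→ ") and section:
            toc[section] = line[2:].strip()
    return toc

def docs_for_task(task: str, toc: dict[str, str]) -> list[str]:
    """Load only the docs whose section name appears in the task description."""
    return [path for name, path in toc.items() if name.lower() in task.lower()]

toc = parse_toc("""## Architecture
→ docs/architecture/overview.md

## Testing
→ docs/testing/strategy.md""")

# Only the testing doc enters the context window; the rest stays out.
print(docs_for_task("Add testing for the payments module", toc))
```

The point is the shape, not the heuristic: the table of contents is cheap to keep in context, and the expensive detail is fetched on demand.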

The Agent-to-Agent Review Loop

Here's OpenAI's actual flow:

  1. Codex generates code changes
  2. Codex runs its own local review
  3. Requests additional agent reviews (local + cloud)
  4. Responds to feedback and fixes
  5. Loops until all agent reviewers pass
  6. Humans intervene only on escalation

Humans step in for exactly 3 cases: new architecture decisions, security-sensitive changes, and product direction calls. Everything mechanical is agent-to-agent.
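The loop above can be sketched as a small control function. Everything here is a hypothetical stand-in (the reviewer callables, the escalation labels, the fix counter); it shows only the control flow: loop until all agent reviewers pass, escalate on the three human-only cases or when the loop stalls:

```python
# Sketch of the agent-to-agent loop, with hypothetical names throughout.

ESCALATION_TRIGGERS = {"new-architecture", "security-sensitive", "product-direction"}

def review_loop(change: dict, reviewers: list, max_rounds: int = 5) -> str:
    """Loop until every agent reviewer passes, or escalate to a human."""
    for _ in range(max_rounds):
        if ESCALATION_TRIGGERS & set(change.get("labels", [])):
            return "escalate-to-human"          # the three human-only cases
        feedback = [review(change) for review in reviewers]  # local + cloud agents
        failures = [f for f in feedback if f != "pass"]
        if not failures:
            return "merged"
        change["fixes"] = change.get("fixes", 0) + 1  # agent responds and fixes
    return "escalate-to-human"                   # stuck loop falls back to a human

# Toy reviewer: passes once at least one fix round has happened.
lint_agent = lambda c: "pass" if c.get("fixes", 0) >= 1 else "fail: style"
print(review_loop({"labels": []}, [lint_agent]))  # → merged
```

The `max_rounds` cap matters: an unbounded fix loop between two agents is itself a failure mode worth escalating.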

Educational Linter Design

OpenAI's custom linters embed "why" and "how to fix" in every error message:

ERROR: Module 'payments' imports from 'users' internal package.
WHY: Cross-module internal imports break module boundaries.
     See docs/architecture/module-boundaries.md
FIX: Use the public API: import { getUserById } from '@app/users'

Error messages = teaching moments. The agent doesn't need to understand the entire architecture. It just needs clear feedback when it crosses a boundary.

Anthropic's Two-Phase Approach

Anthropic tackles a different angle: the "memory gap" problem in long-running agents.

Initializer Agent + Coding Agent

Session 1 (Initializer Agent):
  → Creates init.sh
  → Creates claude-progress.txt
  → Generates 200+ feature list as JSON (all passes: false)
  → Initial git commit

Session 2+ (Coding Agent):
  → Reads claude-progress.txt + git history
  → Implements exactly 1 feature
  → Confirms tests pass
  → Updates passes: true
  → Clean git commit
  → Hands off to next session

The key is one feature at a time, incrementally. This structurally prevents the "try to do everything at once" failure mode.

JSON Over Markdown for Feature Lists

An interesting finding: Anthropic manages feature lists in JSON, not Markdown. The reason: "LLMs tend to improperly rewrite Markdown files, but JSON's strict structure makes it harder to tamper with."

{
  "category": "functional",
  "description": "New chat button creates a new conversation",
  "steps": [
    "Navigate to main interface",
    "Click new chat button",
    "Verify new conversation is created"
  ],
  "passes": false
}

They pair this with a strong instruction: "Do NOT edit or delete tests. Do NOT change passes to true without actually running the test." They call unauthorized test modification "unacceptable."

You wouldn't want a student grading their own exam and reporting "100% correct!" JSON's strict format plus firm instructions: trust but verify.
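The "verify" half can be mechanical: before accepting a `passes: true` claim, re-run the recorded steps. This is a sketch under my own assumptions (the `verify_step` executor is hypothetical; the feature shape matches the JSON above):

```python
# "Trust but verify" sketch: a passes: true claim is only accepted if
# every recorded step actually verifies. verify_step is a hypothetical
# executor for one step string.

def audit_feature(feature: dict, verify_step) -> bool:
    """Return True if the feature's passes flag is consistent with reality."""
    if not feature["passes"]:
        return True                       # nothing claimed, nothing to audit
    return all(verify_step(step) for step in feature["steps"])

feature = {
    "category": "functional",
    "description": "New chat button creates a new conversation",
    "steps": ["Navigate to main interface", "Click new chat button",
              "Verify new conversation is created"],
    "passes": True,
}
print(audit_feature(feature, verify_step=lambda s: True))  # honest claim holds
```

The firm instruction discourages the agent from cheating; the audit catches it when instruction alone isn't enough.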

HumanLayer's 6 Levers

HumanLayer organizes harness components into 6 levers:

| Lever | Role | Code Review Application |
| --- | --- | --- |
| System Prompt | Base instructions | Define review criteria |
| Tools / MCP | External tool integration | Invoke SAST/linters |
| Context | Reference information | Architecture docs |
| Sub-agents | Task isolation | Parallel review by concern |
| Hooks | Automatic triggers | Auto-review on PR creation |
| Skills | Knowledge modules | Security/performance review skills |

Sub-agents deserve special attention. HumanLayer calls them "context firewalls." Run security review and performance review as separate sub-agents, and intermediate noise never pollutes the parent thread.

Getting Started: 4 Steps

Step 1: Build Deterministic Checks First

Before involving LLMs, automate what's automatable. Type checking, import boundary validation, naming conventions, test coverage thresholds. These are deterministic: same input, same output, every time.

LLMs handle what deterministic tools can't: design review, readability assessment, security pattern recognition. Use both layers together -- deterministic as the foundation, LLM-as-Judge on top.
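As one concrete deterministic layer, here is a sketch of a naming-convention check (the snake_case rule is an illustrative choice, not a prescribed standard): same source in, same verdict out, every time.

```python
# One deterministic check as a sketch: flag function names that aren't
# snake_case. The rule itself is illustrative; the determinism is the point.

import ast
import re

SNAKE_CASE = re.compile(r"^[a-z_][a-z0-9_]*$")

def naming_violations(source: str) -> list[str]:
    """Return every function name in `source` that isn't snake_case."""
    tree = ast.parse(source)
    return [node.name for node in ast.walk(tree)
            if isinstance(node, ast.FunctionDef) and not SNAKE_CASE.match(node.name)]

print(naming_violations("def getUser():\n    pass\n\ndef get_user():\n    pass"))
# → ['getUser']
```

Checks like this run in milliseconds and never disagree with themselves, which is exactly why they belong beneath the LLM layer rather than inside it.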

Step 2: Define Review Criteria with PASS/FAIL

Not "check security" but "check for SQL injection, XSS, and auth bypass. FAIL if any are found." Explicit criteria that leave no room for interpretation.
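Explicit criteria can be encoded as data rather than prose. A minimal sketch (the criteria text and the toy findings format are my assumptions, not a real review schema):

```python
# Sketch: PASS/FAIL criteria as data. A review fails if any named criterion
# is violated; there is no room for "it's probably fine" interpretation.

CRITERIA = {
    "sql-injection": "FAIL if raw string interpolation reaches a query",
    "xss": "FAIL if user input is rendered without escaping",
    "auth-bypass": "FAIL if any endpoint skips the auth middleware",
}

def verdict(findings: dict[str, bool]) -> str:
    """findings maps criterion name -> violated? Missing keys count as clean."""
    failed = [name for name in CRITERIA if findings.get(name, False)]
    return "FAIL: " + ", ".join(failed) if failed else "PASS"

print(verdict({"sql-injection": False, "xss": True, "auth-bypass": False}))
# → FAIL: xss
```

The same structure doubles as the contract for an LLM-as-Judge prompt: the model is asked to fill in `findings`, not to invent its own rubric.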

Step 3: Isolate Concerns with Sub-agents

Don't give one agent all review concerns. Separate into security agent, performance agent, readability agent. Each gets a focused context window, uncontaminated by other concerns.
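The firewall effect can be shown in miniature. In this hypothetical sketch, each sub-agent keeps its scratch work local and returns only a verdict to the parent (the review functions are toy stand-ins for real sub-agent invocations):

```python
# Context-firewall sketch: intermediate noise stays inside the sub-agent;
# only the final verdict crosses back to the parent thread.

def run_subagent(name: str, review_fn, change: str) -> dict:
    """Isolated run: scratch notes never leave this function's scope."""
    scratch = []                          # intermediate reasoning, never surfaced
    result = review_fn(change, scratch)
    return {"agent": name, "verdict": result}

reviews = [
    run_subagent("security", lambda c, notes: "pass", "diff..."),
    run_subagent("performance", lambda c, notes: "fail: N+1 query", "diff..."),
    run_subagent("readability", lambda c, notes: "pass", "diff..."),
]
print([r["verdict"] for r in reviews])
# → ['pass', 'fail: N+1 query', 'pass']
```

The parent thread sees three short verdicts instead of three full review transcripts, so its own context window stays clean for the next decision.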

Step 4: Feed Failures Back Into the Harness

When AI review misses something:

  1. Add the case as a linter rule or evaluation criterion
  2. Incorporate as a regression test
  3. Guarantee the same failure never recurs

This is the core loop of harness engineering.
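The core loop can be made concrete with a toy harness object (entirely hypothetical structure): each miss becomes a permanent deterministic rule, so the same failure is caught mechanically forever after.

```python
# Sketch of the feedback step: a missed failure becomes a regression rule.
# The Harness class and rule format are illustrative, not a real framework.

class Harness:
    def __init__(self):
        self.rules = []                    # deterministic regression checks

    def add_rule(self, name: str, predicate):
        """Feed a missed failure back in so it can never recur unnoticed."""
        self.rules.append((name, predicate))

    def check(self, code: str) -> list[str]:
        return [name for name, pred in self.rules if pred(code)]

harness = Harness()
# Suppose AI review once missed a bare `except:`; encode it as a rule forever.
harness.add_rule("bare-except", lambda code: "except:" in code)
print(harness.check("try:\n    risky()\nexcept:\n    pass"))
# → ['bare-except']
```

Over time the rule list is the harness's memory: every entry is a mistake the system is now structurally unable to repeat silently.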

Summary

| Organization | Approach | Core Insight |
| --- | --- | --- |
| OpenAI | Agent-to-agent review | Humans only on escalation |
| Anthropic | Initializer + Coding Agent | One feature at a time, incrementally |
| HumanLayer | 6 levers | Sub-agent = context firewall |
| Martin Fowler | Deterministic + LLM hybrid | Custom linters = teaching moments |

Harness engineering isn't "how to delegate work to AI." It's "how to build an environment where AI failures are safe."

You wouldn't ride a horse without reins. AI agents are the same: if you're going to let them run, design the harness first.


References

For more on Context Engineering and harness design, check out my book:
📕 MCP Server Security Practice Guide (Amazon Kindle)
