Ian Johnson

Posted on Jun 9 • Originally published at tacoda.Medium on Jun 9

Harness Engineering: The Landscape


Someone on the team asked last week: where does the rule go that says all migrations need a rollback? Three good answers came up. The project’s CLAUDE.md. The shared rule library we use across our backend repos. The personal config a senior engineer carries from project to project. All three were real homes. We picked one and moved on.

The fact that the question had three plausible answers is the reason this post exists. Anyone shipping with an AI coding agent is now building a harness, whether they call it that or not. The harness has layers. The layers have different owners, different scopes, and different consequences when you get them wrong.

I wrote about the five layers on dev.to a while back. This post revisits them with more nuance, more examples, and one extra abstraction I’ve added to my own work since.

The five layers

A harness is the deliberately shaped configuration around an AI coding agent: everything that sits between the raw model and the work it does. It is what makes the difference between “the model wrote code that looked right” and “the agent shipped code we trust.” Most of what people call AI-assisted development today is really harness engineering, even when no one names it that.

There are five layers worth distinguishing.

Five layers, from the one you mostly select (the model) to the one you build by composing agents (orchestration). Each layer has a different owner and a different scope.

Model harness

The model harness is the tool itself: Claude Code, Cursor, Aider, Copilot, Windsurf, Codex CLI. It is the layer with the least configurability and the most product opinion. The vendor chose the default model, the editing primitives, the sandbox boundaries, the built-in tools, the way context windows work, what runs in the cloud and what runs locally.

You don’t really build this layer. You pick it. The picking matters. Claude Code with a CLAUDE.md and a few hooks behaves nothing like Copilot inline completion, and neither behaves like Devin. Two engineers using the same model through different tools will produce different work.

The nuance: the tools converge fast. Most now support some flavor of project-level instructions, custom slash commands, MCP servers, and shell hooks. The differences live in the defaults: how aggressive the tool is, how much it asks before acting, what it does with errors, how it handles partial work. The product layer has more opinion in it than people give it credit for.

Agent harness

One step up sits the agent harness. This is the user-global layer: your personal config that travels with you between repos. In Claude Code, this is ~/.claude/CLAUDE.md, ~/.claude/settings.json, and the global memory directory. In Cursor, it is the global rules. In Aider, it is the .aider.conf.yml in your home directory.

This is where you encode the things that are true about you regardless of project. Your testing philosophy. Your communication preferences. The fact that you want the agent to ask before deleting files. The fact that you prefer pytest to unittest. The fact that you read math before code, so explanations should lean on functional intuitions.
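
A preference like "ask before deleting files" can live as a sentence in ~/.claude/CLAUDE.md, but a sentence is something the agent can skip. A user-level hook turns it into something enforced. Below is a minimal sketch, assuming Claude Code's documented PreToolUse hook contract (the tool call arrives as JSON on stdin, a shell command sits under tool_input.command, and exit code 2 blocks the call and shows stderr to the agent); check the current hooks docs before relying on those details. The pattern list and the block_reason and run functions are my own hypothetical names, and this version refuses rather than asks, which is the stricter reading of the rule.

```python
"""Hypothetical user-level Claude Code hook: refuse shell deletions.

Assumes the PreToolUse contract from Claude Code's hooks docs: the tool
call arrives as JSON on stdin, and exit code 2 blocks it and shows
stderr to the agent.
"""
import json
import re
import sys
from typing import Optional

# Hypothetical list of shell commands treated as deletions.
DELETE_PATTERNS = [r"\brm\s", r"\bgit\s+clean\b", r"\bfind\b.*\s-delete\b"]


def block_reason(event: dict) -> Optional[str]:
    """Return why a tool call should be blocked, or None to allow it."""
    if event.get("tool_name") != "Bash":
        return None
    command = event.get("tool_input", {}).get("command", "")
    for pattern in DELETE_PATTERNS:
        if re.search(pattern, command):
            return f"Blocked deletion {command!r}: ask the human first."
    return None


def run(stdin_text: str) -> int:
    """Hook body; returns the exit code (0 allows, 2 blocks)."""
    reason = block_reason(json.loads(stdin_text))
    if reason:
        print(reason, file=sys.stderr)
        return 2
    return 0

# As an installed hook script, the last line would be:
#     sys.exit(run(sys.stdin.read()))
```

Registered as a PreToolUse hook in ~/.claude/settings.json, it follows you into every repo, which is exactly the kind of thing the agent layer is for.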

The nuance: the agent harness is where habits become permanent. A rule you’d retype in every project is a rule you should hoist to the agent layer. A rule that’s true in one repo and false in another belongs lower in the stack. People often mix these up and then wonder why the agent forgot the convention.

Project harness

The project harness is the codebase-scoped layer. It is the most active layer in the community right now. CLAUDE.md, AGENTS.md, .cursor/rules, .claude/commands/, project-level hooks, MCP servers checked into the repo, subdirectory CLAUDE.md files scoped to a module.

The project harness is where conventions about this code live. The framework version. The test commands. The deployment story. The thing that’s true about this service that isn’t true about the others.

The nuance: a good project harness is small. The temptation is to dump everything into the root CLAUDE.md and end up with a 2,000-line file the agent skims. Better to push specifics down into subdirectory CLAUDE.md files that only load when the agent touches that area, and push generic engineering practice up into the agent or org layer where it doesn’t have to be repeated per repo.
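
The loading behavior is what makes subdirectory files cheap. Here is a minimal sketch under a simplified model I'm assuming for illustration: the root CLAUDE.md always applies, plus one CLAUDE.md in each directory between the root and the file being touched. Claude Code's real loading rules have more cases, so treat this as a mental model rather than its implementation; applicable_rule_files is a hypothetical name.

```python
from pathlib import PurePosixPath


def applicable_rule_files(touched: str, existing: set[str]) -> list[str]:
    """List the CLAUDE.md files that apply to one touched file, root first.

    `touched` is a repo-relative path; `existing` holds the repo-relative
    paths of every CLAUDE.md. Simplified model: the root file applies,
    plus one CLAUDE.md in each directory between the root and the file.
    """
    parts = PurePosixPath(touched).parent.parts
    candidates = ["CLAUDE.md"]
    for depth in range(1, len(parts) + 1):
        candidates.append(str(PurePosixPath(*parts[:depth], "CLAUDE.md")))
    return [c for c in candidates if c in existing]
```

With billing rules in src/billing/CLAUDE.md, an edit under src/billing/ picks them up and an edit under web/ never pays for them.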

The other nuance: project harnesses rot. The code moves; the rules describing the code drift. A harness without a pruning loop becomes wrong faster than people expect. Anything you write at this layer needs a way to be revisited.
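
One cheap pruning loop is to flag rules that point at files which no longer exist. This sketch assumes your rule files quote paths in backticks, which is a convention, not a requirement; the regex, the skip heuristic for spans that don't look like paths, and the stale_references name are my own. It will miss drift a human read catches, such as a renamed function or a changed command.

```python
import re
from pathlib import Path

# Backtick-quoted spans made of path-ish characters, e.g. `src/billing/`.
PATH_LIKE = re.compile(r"`([\w./-]+)`")


def stale_references(rules_text: str, repo_root: Path) -> list[str]:
    """Return backtick-quoted paths in a rules file that no longer exist."""
    stale = []
    for ref in PATH_LIKE.findall(rules_text):
        if "/" not in ref and "." not in ref:
            continue  # likely a command or identifier, not a path
        if not (repo_root / ref).exists() and ref not in stale:
            stale.append(ref)
    return stale
```

Run it against every CLAUDE.md in CI or on a schedule, and each hit becomes a prompt to fix or delete the rule.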

Organization harness

The organization harness is the cross-project layer, and it is the most underbuilt of the five. Most teams I talk to have no org harness at all. Their agents re-learn the same conventions in every repo: the same security baselines, the same code style, the same compliance requirements, the same internal services to talk to.

What the layer should hold: shared rule libraries pulled into projects, internal MCP servers for org-specific knowledge (the design system, the auth library, the standard observability stack), policy gates that apply to every repo, security baselines that are not negotiable. The point of the org layer is that agents moving between projects do not start from zero.
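
The migration rule from the opening is a good candidate for an org-level gate. A sketch, assuming Alembic-style migrations where each revision file defines upgrade() and downgrade(); has_real_rollback and gate are hypothetical names, and a downgrade() whose body is only pass or a docstring counts as missing.

```python
import ast


def has_real_rollback(migration_source: str) -> bool:
    """True if an Alembic-style migration defines a non-empty downgrade()."""
    tree = ast.parse(migration_source)
    for node in tree.body:
        if isinstance(node, ast.FunctionDef) and node.name == "downgrade":
            # Ignore docstrings, `...`, and bare `pass`; anything else is work.
            real = [
                stmt for stmt in node.body
                if not isinstance(stmt, ast.Pass)
                and not (isinstance(stmt, ast.Expr) and isinstance(stmt.value, ast.Constant))
            ]
            return bool(real)
    return False


def gate(migrations: dict[str, str]) -> list[str]:
    """Return the names of migrations that fail the org rollback policy."""
    return [name for name, src in migrations.items() if not has_real_rollback(src)]
```

Run from CI against the migrations a change adds, it gives every repo the same baseline without anyone restating the rule per project.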

The nuance: the org layer is where governance lives, and it is also where governance can get heavy-handed. The design pattern that works is a cascade. Org rules are defaults; projects can override them unless the org marks a rule as locked. That keeps the org layer a force multiplier rather than a straitjacket.

Orchestration harness

The orchestration harness is the fleet-level layer. It is where multiple agents start to work together: one agent farming work out to subagents, a planner-executor split, a panel of reviewers fanned out on a diff, a queue of agents processing tasks asynchronously.

Examples: Devin, Tessl, CrewAI, LangGraph, Claude Code’s subagents, the various open-source agent frameworks. The unit of thought at this layer is the graph, not the agent. Each node in the graph has its own harness: its own model, its own config, its own role.

The nuance: orchestration sits orthogonally to the other four. A single agent has a model, an agent config, a project, and an org. An orchestrated workflow composes several of those, each with its own version of the lower four layers. Orchestration multiplies harness work; it does not replace it.
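
To make "each node has its own harness" concrete, here is a sketch of the reviewer-panel case. Node, review_panel, and the config keys are hypothetical and not any framework's API; the point is only that every reviewer gets its own copy of the shared lower layers plus a role of its own.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One agent in an orchestrated graph, carrying its own harness."""
    role: str
    model: str
    config: dict[str, str] = field(default_factory=dict)


def review_panel(base: dict[str, str], lenses: dict[str, str], model: str) -> list[Node]:
    """Fan a diff out to one reviewer per lens.

    Each reviewer starts from the same lower-layer config (`base`) and adds
    its own role instructions, so every node is a separate harness.
    """
    return [
        Node(role=f"{lens}-reviewer", model=model,
             config={**base, "instructions": prompt})
        for lens, prompt in lenses.items()
    ]
```

Two lenses means two harnesses to keep correct; that is the multiplication the paragraph above describes.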

Adding a team between org and project

The clean five-layer story holds up well until you take it to a company with many teams. Backend, frontend, mobile, and data have different testing norms, different deployment tooling, different review patterns. Forcing all of them into one org-level config flattens the differences. Pushing the differences down into every project repeats them.

There is room for one more abstraction sitting between org and project: the team. I added it to my own work after one too many “this rule applies to backend only” workarounds got jammed into a top-level config. Team-level conventions live above the project (so they’re shared across the team’s repos) and below the org (so the team can specialize without renegotiating with the company).

Three tiers cascading from org to team to project. Rules marked strict win going down the tree; rules that aren’t marked strict act as defaults the lower layer can adjust.

For a smaller company, the team layer collapses into the org. A solo developer skips both and uses only the project harness. The same shape works across sizes; the cascade only kicks in when there are layers worth cascading.
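
The cascade rule is small enough to write down. A sketch under my own assumptions: layers are plain dictionaries ordered from broadest (org) to narrowest (project), and a rule is identified by a key. It illustrates the strict/default semantics described above; Rule and resolve are hypothetical names, and this is not Keystone's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    value: str
    strict: bool = False


def resolve(layers: list[dict[str, Rule]]) -> dict[str, str]:
    """Merge rule layers ordered from broadest (org) to narrowest (project).

    A lower layer overrides a rule unless a higher layer already marked
    it strict; the first strict setting going down the tree wins.
    """
    effective: dict[str, Rule] = {}
    for layer in layers:
        for key, rule in layer.items():
            current = effective.get(key)
            if current is not None and current.strict:
                continue  # locked higher up; lower layers cannot change it
            effective[key] = rule
    return {key: rule.value for key, rule in effective.items()}
```

A solo developer passes a single layer, a small company passes [org, project], and a larger one passes [org, team, project]; the same function covers all three.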

Who’s exploring this

The harness conversation is young. The people pushing it forward are mostly working in public, and reading their work is the fastest way to get up to speed.

Simon Willison has been documenting his coding-with-LLMs workflow on his blog for years. He’s the person to read if you want to see the day-by-day rhythm of working with agents: what works, what breaks, what shows up on the command line. His writing about llm (his command-line tool), AGENTS.md, and Claude Code captures the practitioner view of the project and agent layers in real time.

Geoffrey Huntley has written some of the sharpest essays I’ve seen on agentic coding. His work on the shift from autocomplete to autonomous loops, and on the engineering discipline that has to grow up around them, has shaped how I think about the agent and project layers.

Birgitta Böckeler and the team at Thoughtworks have done some of the most patient pattern-finding work in the space. The Exploring Generative AI series on martinfowler.com catalogs what’s working, what isn’t, and what’s still ambiguous, with the kind of careful field reporting that consulting work makes possible. If you want patterns rather than hot takes, start there.

Steve Yegge and Gergely Orosz wrote the Pragmatic Engineer piece that named the gap between vibe coding and agentic coding for a lot of working engineers. Yegge’s perspective from Sourcegraph and Orosz’s reach pulled the conversation about agent workflows into mainstream engineering rather than leaving it in an AI niche.

Anthropic’s Claude Code team has done most of the work shipping the primitives the harness layers actually run on. CLAUDE.md, slash commands, hooks, skills, the memory system, settings inheritance. Each release moves the model and agent layers forward and gives the rest of us more raw material.

David Crawshaw (sketch.dev) has written about agents-as-coworkers in a way that grounds the abstraction in lived day-to-day pairing. His pieces are useful because they push back on hype while taking the agents seriously.

Hamel Husain and Eugene Yan write the most useful material I know on evaluation: how you tell, empirically, whether your harness is working. The harness conversation will collapse without good eval practice underneath it, and they are the practitioners pushing eval discipline into the AI coding space.

The space moves fast and the names worth reading change quarter to quarter. The shape of the conversation does not.

Keystone hits 1.0

I built a tool to make the project, team, and org layers easier to start and easier to maintain. It’s called Keystone, and it just hit 1.0.

What it does: Keystone is an agent harness framework. It scaffolds the directories the agent reads, seeds them with sensible defaults, and wires up verification gates that catch the agent’s mistakes before they reach a commit. The harness is markdown-only after install: no central service, no runtime dependency, nothing to break if you uninstall.

What it ships with: a six-phase workflow (spec, plan, implement, verify, review, release), a corpus of guides the agent loads on demand, computational sensors (lint, type-check, test) and inferential sensors (functional, security, risk reviews), state ledgers that track code debt and quality signals, and a learning loop that turns review feedback into rules.
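
The computational sensors are the easiest part to picture. Here is a sketch of running them as ordered gates that stop at the first failure, written independently of how Keystone wires its own gates; run_gates and GateResult are hypothetical names.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class GateResult:
    name: str
    passed: bool
    output: str


def run_gates(gates: list[tuple[str, list[str]]]) -> list[GateResult]:
    """Run computational sensors in order and stop at the first failure.

    Each gate is (name, argv). Stopping early keeps the loop short: there
    is little point running tests on code that fails the type-check.
    """
    results = []
    for name, argv in gates:
        proc = subprocess.run(argv, capture_output=True, text=True)
        passed = proc.returncode == 0
        results.append(GateResult(name, passed, proc.stdout + proc.stderr))
        if not passed:
            break
    return results
```

For a Python project the list might be [("lint", ["ruff", "check", "."]), ("types", ["mypy", "."]), ("tests", ["pytest", "-q"])]; substitute whatever your project already runs. Ordering cheapest-first gives the agent the fastest failing signal to react to.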

Where it adds the cascade: Keystone’s org layer ships as policy plugins, a directory of organization-owned rules that projects pull in on update. Strict policies lock things projects can’t override; non-strict policies act as defaults. The team layer sits between, for organizations big enough to need different policies for different teams. Solo developers can ignore both and use only the project layer.

Why it earns its place: a harness is a lot of files. Writing them from scratch is the reason most teams never start. Keystone solves the empty-room problem by giving you a working harness on day one, then giving you a flywheel for shaping it to your code over time. The cascade means you can share what works without rewriting it in every repo.

A separate post goes into the details of the 1.0 release, the design choices behind it, and how to set it up for a team. For now: it lives at tacoda.dev/keystone, and the install is one command.

Try this in your harness this week

A few concrete moves, in order of how much they’ll teach you about your own setup.

Audit one rule for layer fit. Pick a rule already in one of your CLAUDE.md files. Ask which layer owns it. If it’s true across every project you work on, hoist it to your agent harness. If it’s true for one team’s repos but not the whole company, it belongs in the team layer; if it’s true company-wide, it belongs in the org layer. If it’s only true here, leave it. You’ll find at least one rule in the wrong layer on the first audit.
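
If you want to run the audit over more than one rule, the decision can be written as a heuristic. A sketch with hypothetical names, assuming you can list the repos where a rule holds. It checks the broadest shared layer first, which is my choice rather than a law, and it cannot tell a personal preference from a team convention, so treat its answer as a prompt for judgment.

```python
def suggest_layer(
    holds_in: set[str],
    my_repos: set[str],
    team_repos: dict[str, set[str]],
    org_repos: set[str],
) -> str:
    """Suggest a home for a rule from the set of repos where it holds.

    Checks the broadest shared layer first, so a rule true for a whole
    team lands in the team layer rather than one person's config.
    """
    if org_repos and org_repos <= holds_in:
        return "org"
    for team, repos in team_repos.items():
        if repos and repos <= holds_in:
            return f"team:{team}"
    if my_repos and my_repos <= holds_in:
        return "agent"
    return "project"
```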

Write down the rule you keep retyping. Notice a rule you’ve explained to the agent three times this month? It belongs in the agent or org harness, not in your head. Put it in the right file once. Stop paying the cost again.

Try a subdirectory CLAUDE.md. If your root CLAUDE.md is over a few hundred lines, move the parts that are scoped to specific modules into their own files. Watch how much faster the agent responds when it isn’t skimming a wall of rules per turn.
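
To find the candidates, here is a quick sketch that lists every CLAUDE.md over a line budget, largest first. The 300-line default stands in for "a few hundred" and is an assumption; oversized_rule_files is a hypothetical name, so pick your own threshold.

```python
from pathlib import Path


def oversized_rule_files(repo_root: Path, max_lines: int = 300) -> list[tuple[str, int]]:
    """Find CLAUDE.md files longer than `max_lines`, largest first."""
    found = []
    for path in repo_root.rglob("CLAUDE.md"):
        count = len(path.read_text(encoding="utf-8").splitlines())
        if count > max_lines:
            found.append((str(path.relative_to(repo_root)), count))
    return sorted(found, key=lambda item: item[1], reverse=True)
```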

Use the vocabulary with your team. The five-layer naming is the cheapest tool you can install. Once everyone agrees there are five somewheres to choose from, “where does this go?” stops being a debate and starts being a category question.
