DEV Community

Alan West
Prompt Engineering Is Dead. Harness Engineering Is What Actually Works.

Remember when "prompt engineering" was the hot skill? Write the perfect prompt, get the perfect output. Then we realized that giving the model better context mattered more than crafting clever prompts, and "context engineering" became the thing.

Now there's a third evolution that's quietly replacing both: harness engineering. And if you're using AI coding agents for real work, this is the one that actually moves the needle.

What Harness Engineering Actually Means

The term got popular after Mitchell Hashimoto (HashiCorp co-founder) used it in February 2026 to describe something specific: building mechanisms that prevent AI agents from making the same mistake twice. OpenAI expanded on this when they published how their team built production software using only Codex agents, merging roughly 1,500 pull requests and generating about a million lines of code over five months without writing a single line by hand.

The key insight: they didn't get those results by writing better prompts. They got them by building better harnesses.

Think of it this way. If prompt engineering is telling a horse "turn right," context engineering is giving the horse a map, road signs, and visible terrain so it understands where it's going. Harness engineering is building the reins, the saddle, the fences, and the road itself so that ten horses can run safely at the same time.

It's the infrastructure around the agent, not the instructions to it.

The Three Layers

Prompt engineering — what you say to the model. "Build me an auth system." This matters but has diminishing returns fast.

Context engineering — what information the model has access to. Your codebase, docs, examples, conversation history. This matters a lot more than most people realize.

Harness engineering — the entire environment: tools, constraints, validation, feedback loops, recovery mechanisms. This is what separates "AI generates code" from "AI reliably ships production features."

Each layer builds on the previous one. You still need decent prompts and good context. But the harness is what makes the difference between a demo and a workflow.

Practical Harness Components That Actually Work

I've been running Claude Code as my primary development tool for months now, and the harness I've built around it matters more than any model upgrade. Here are the components that made the biggest difference:

CLAUDE.md: Less Is More

Your CLAUDE.md file gets injected into every conversation. Most people stuff it with everything they can think of. That's wrong.

An ETH Zurich study found that human-written agent instruction files improved performance by only about 4%, while auto-generated ones actually hurt results and cost 20%+ more tokens. The takeaway: keep it short, keep it manual, keep it universally applicable.

My production CLAUDE.md is under 50 lines. It has:

  • Project-specific conventions (naming, file structure)
  • Common pitfalls the agent keeps hitting ("don't modify the migration files directly")
  • Links to actual docs, not paraphrased instructions
  • Nothing that could be inferred from the codebase itself

Every time the agent makes a mistake, I add one line to prevent it next time. That's harness engineering in its simplest form.
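For illustration, a minimal CLAUDE.md in that spirit might look like the sketch below. Every project name, path, and rule here is hypothetical, not from my actual file:

```markdown
# CLAUDE.md

## Conventions
- Components live in src/components/, one file per component, PascalCase names.
- Use the query helpers in db/queries.ts; no raw SQL in route handlers.

## Pitfalls
- Never edit files under migrations/ directly; generate a new migration instead.
- Run `npm run typecheck`, not `tsc` directly, so project references are picked up.

## Docs
- API style guide: docs/api-style.md
- Deployment runbook: docs/deploy.md
```

Note what's absent: nothing about the language, framework, or file layout that the agent could infer by reading the repo itself.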

Context Window: The 60% Rule

Here's a number that changed how I work: past roughly 60% utilization of the context window, more context makes the agent actively worse. Not marginally worse. Measurably, consistently worse.

This means:

  • Don't dump your entire codebase into context "just in case"
  • Run test suites silently. Success should produce zero output. Only surface errors.
  • Use sub-agents as context firewalls. Give a sub-agent one discrete task, get back one result. The parent agent never sees the intermediate noise.

The biggest context killer is verbose tool output. A passing test suite that prints 200 lines of "OK" is actively degrading your agent's performance for the next task.
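A minimal way to get silent-on-success behavior is a wrapper that swallows output unless the command fails. This is my own sketch, not an official tool, and the `npm test` invocation is just a placeholder for whatever your suite uses:

```shell
#!/bin/sh
# quiet_run: run any command, print its combined output only on failure.
quiet_run() {
  out=$("$@" 2>&1)   # capture stdout and stderr together
  status=$?
  if [ "$status" -ne 0 ]; then
    printf '%s\n' "$out"   # surface failure details for the agent
  fi
  return "$status"
}

# Example: a green suite contributes zero lines to the agent's context
# quiet_run npm test
```

Point the agent (or a hook) at `quiet_run` instead of the raw test command, and success costs you nothing in tokens.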

Hooks for Automatic Verification

Set up hooks that run typechecks and builds automatically when the agent completes a task. The trick: exit silently on success, surface only errors. This forces the agent to fix problems before moving on, without polluting context with success messages.
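In Claude Code, hooks are configured in `.claude/settings.json`. A sketch of a post-edit typecheck hook, with field names as documented at the time of writing (verify against the current hooks docs before copying):

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          {
            "type": "command",
            "command": "npm run --silent typecheck"
          }
        ]
      }
    ]
  }
}
```

As I understand the documented behavior, a hook command that exits with code 2 blocks and feeds its stderr back to the agent, while a clean exit adds nothing to context — exactly the silent-on-success pattern.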

This is the "shift worker" pattern from Anthropic's harness engineering guide. Treat the agent like a contractor who comes in for a shift: verify the state of things before starting new work, and leave things clean when the shift ends.

Skills for Progressive Disclosure

Instead of loading all instructions upfront, use skill files that activate only when needed. This keeps specialized knowledge out of the base context until a task actually calls for it.

If the agent is writing a React component, it doesn't need your database migration conventions in context. Load those instructions only when they're relevant.
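With Claude Code's skills feature, only the short frontmatter description sits in base context; the body loads when a task matches it. A hypothetical migration skill at `.claude/skills/db-migrations/SKILL.md` (layout per the skills docs at the time of writing; all conventions below are illustrative):

```markdown
---
name: db-migrations
description: Conventions for writing and reviewing database migration files
---

# Database migrations

- Generate new migrations with `npm run db:migrate:new -- <name>`.
- Never edit a migration that has already been applied; add a follow-up one.
- Every `up()` must have a `down()` that fully reverses it.
```

The React-component session above pays only for the one-line description; the rules load only when migration work actually starts.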

Git as Recovery Infrastructure

Every meaningful change gets a commit with a descriptive message. Not just because that's good engineering practice (though it is), but because agents break things. When they do, you need to roll back cleanly.

The OpenAI Codex team structured their entire workflow around this: agents work on single features sequentially, commit after each, and the next session starts by reading git history to understand what happened before.
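The loop is easy to demo end to end: checkpoint after each change, and when a change is bad, `git revert` restores the last good state while keeping history for the next session to read. A runnable sketch in a throwaway repo (file names and commit messages are made up):

```shell
# Demonstrate the checkpoint/rollback loop in a disposable repo
cd "$(mktemp -d)" && git init -q
git config user.email "agent@example.com"
git config user.name "agent"

echo "v1" > feature.txt
git add -A && git commit -qm "feat: first working version"   # checkpoint

echo "broken" > feature.txt                                  # an agent breaks it
git add -A && git commit -qm "feat: bad change"

git revert --no-edit HEAD >/dev/null   # clean rollback, history preserved
git log --oneline                      # what the next session reads first
cat feature.txt                        # back to "v1"
```

Because the rollback is itself a commit, the next agent shift sees the mistake and its reversal in `git log` instead of inheriting a mystery.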

Why This Matters More Than Model Upgrades

There's a common pattern I see: developers struggling with AI output quality, whose solution is to switch to a bigger model. Opus instead of Sonnet. GPT-5 instead of GPT-4. More tokens, more money, marginally better results.

The harness approach is the opposite. Same model, better environment. Constraints that prevent common failures. Validation that catches errors automatically. Context management that keeps the agent sharp instead of drowning in noise.

In my experience, a well-harnessed Sonnet consistently outperforms an unharnessed Opus. The environment matters more than the engine.

Getting Started

If you want to start building a harness without overthinking it:

  1. Create a CLAUDE.md with your top 10 project rules. Keep it under 50 lines.
  2. Every time the agent makes a mistake you have to manually fix, add a one-line rule to prevent it.
  3. Set up a hook that runs your linter/typecheck after every agent action. Silent on success.
  4. Use sub-agents for isolated tasks instead of cramming everything into one conversation.
  5. Commit frequently. Read git log at the start of every session.

That's it. No framework needed. No special tools. Just deliberate structure around the agent you already use.

The shift from "write better prompts" to "build better harnesses" is the same shift software engineering made decades ago from "write better code" to "build better systems." The individual unit matters less than the architecture around it.
