Tom Lee
Prompt Context Harness: The Three Stages of AI Engineering and Why the Third Changes Everything

The AI industry loves naming eras. We had the prompt engineering era. Then came context engineering. Now we're entering what may be the most consequential shift yet: harness engineering.

Each stage represents a fundamental change in what we're designing when we build AI systems. And each stage demands a different kind of specification.

Stage 1: Prompt Engineering — Talking to the Model

The first era was about learning to talk to AI. We crafted system prompts, experimented with role-playing instructions, and discovered that saying "think step by step" actually worked.

What we were designing: The input to a single model call.

The specification: A text string. Usually in a system prompt. Often copy-pasted from a blog post.

The limitation: A prompt is ephemeral. It exists in one session, for one model, and disappears when the context window resets. There's no versioning, no portability, no audit trail.

In Soul Spec terms, this is what SOUL.md addresses — personality, tone, and thinking style. But Stage 1 treated it as disposable text, not a persistent identity file.

Stage 2: Context Engineering — Feeding the Model

The second era recognized that what you tell the model matters as much as how you ask. Context engineering is about providing the right information — files, search results, conversation history, tool outputs — at the right moment.

What we were designing: The information pipeline into the model.

The specification: RAG configurations, retrieval strategies, context window management. Still mostly ad hoc.

The limitation: Context engineering optimizes the input but doesn't address the system. It doesn't answer: What tools can the agent use? How does it coordinate with other agents? What are its safety boundaries? What does it remember across sessions?

In Soul Spec terms, this maps to MEMORY.md and tool configurations — the knowledge layer that persists across conversations.

Stage 3: Harness Engineering — Designing the System

This is where we are now. Harness engineering is about designing the execution system that wraps around the model — the scaffolding that turns a language model into an agent.

What we're designing: The complete agent architecture.

A harness includes:

  • Tool orchestration — which tools the agent can call and when
  • Multi-agent coordination — how agents divide work, communicate, and verify each other
  • Memory management — what persists across sessions and how it's consolidated
  • Safety enforcement — what the agent can and cannot do, with hard and soft constraints
  • Session management — how long-running tasks maintain state

The specification: This is what's missing. Most harnesses are proprietary, opaque, and locked to a single framework.

Why Harness Engineering Matters More Than Model Intelligence

Recent experiments demonstrate this dramatically. A single Claude Opus 4.5 call with no harness produces serviceable output. The same model wrapped in a well-designed harness — with generator, evaluator, and planner agents working in coordination — produces output that's qualitatively different. Not just better. Categorically better.

The harness costs more time and compute. But the quality gap is so large that the economics are obvious: investing in harness design yields higher returns than investing in model upgrades.

This matches what the infrastructure layer is doing too. NVIDIA's Dynamo framework orchestrates inference at datacenter scale — allocating resources, managing throughput, and routing requests across heterogeneous hardware. Even at the infrastructure level, the industry is moving from "bigger model" to "better orchestration."

The Specification Gap

Here's the problem: each stage of AI engineering created its own kind of specification, but none of them are standardized or portable.

| Stage | What we specify | Current state |
| --- | --- | --- |
| Prompt Engineering | Personality, role, tone | Ad hoc system prompts |
| Context Engineering | Knowledge, memory, retrieval | RAG configs, custom code |
| Harness Engineering | Tools, agents, safety, coordination | Locked inside proprietary frameworks |

The Claude Code leak exposed exactly this problem. Anthropic built sophisticated harness features — Dream (memory), Buddy (personality), Coordinator (multi-agent), Undercover Mode (safety) — all hardcoded inside one framework. Switch to Cursor, and you lose everything.

Soul Spec: One Standard for All Three Stages

What if there were a portable, open standard that covered all three stages?

```
my-agent/
├── SOUL.md         # Stage 1: Personality, tone, thinking style
├── IDENTITY.md     # Stage 1: Role, name, context
├── MEMORY.md       # Stage 2: Persistent knowledge across sessions
├── AGENTS.md       # Stage 3: Multi-agent coordination rules
├── safety.laws     # Stage 3: Hard/soft safety constraints
└── soul.json       # Metadata: version, author, compatibility
```
| File | Stage | Purpose |
| --- | --- | --- |
| SOUL.md | Prompt | Who the agent is: personality, values, communication style |
| IDENTITY.md | Prompt | What the agent does: role, capabilities, boundaries |
| MEMORY.md | Context | What the agent knows: persistent facts, decisions, lessons |
| AGENTS.md | Harness | How the agent works: coordination patterns, delegation rules, workflow |
| safety.laws | Harness | What the agent must not do: prioritized, hard-enforced constraints |

Every file is human-readable. Every file is machine-parseable. Every file is portable across Claude Code, OpenClaw, Cursor, Windsurf, or any future framework.
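"Machine-parseable" is easy to verify for this layout. Here is a hypothetical loader (the file names come from the spec above, but `load_soul` and the graceful-degradation behavior are my assumptions, not part of any published Soul Spec tooling):

```python
import json
import tempfile
from pathlib import Path

# File names from the Soul Spec layout above, grouped by stage.
PROMPT_FILES = ["SOUL.md", "IDENTITY.md"]     # Stage 1
CONTEXT_FILES = ["MEMORY.md"]                 # Stage 2
HARNESS_FILES = ["AGENTS.md", "safety.laws"]  # Stage 3

def load_soul(agent_dir: str) -> dict:
    """Read a Soul Spec directory into a plain dict any framework could consume."""
    root = Path(agent_dir)
    spec = {"meta": json.loads((root / "soul.json").read_text())}
    for name in PROMPT_FILES + CONTEXT_FILES + HARNESS_FILES:
        path = root / name
        spec[name] = path.read_text() if path.exists() else ""  # missing files degrade gracefully
    return spec

# Demo: write a tiny spec to disk, then load it back.
with tempfile.TemporaryDirectory() as d:
    root = Path(d)
    (root / "soul.json").write_text('{"version": "1.0", "author": "tom"}')
    (root / "SOUL.md").write_text("Calm, precise, curious.")
    spec = load_soul(d)
    print(spec["meta"]["version"])  # -> 1.0
```

Because the result is just a dict of strings plus metadata, the same spec directory can be injected into any framework's system prompt, context pipeline, or harness config.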

Why This Matters Now

Three converging trends make this urgent:

1. Multi-Agent is Becoming the Default

Single-agent architectures are hitting their limits. The future is teams of specialized agents — and teams need shared behavioral contracts. AGENTS.md is that contract.

2. Long-Running Inference is Becoming Normal

As agents tackle multi-hour and multi-day tasks, memory management becomes critical — not just what to remember, but how to consolidate, prune, and share knowledge across sessions. MEMORY.md plus Swarm Memory address this.
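Consolidation and pruning can be sketched concretely. This is an illustrative policy of my own, not the spec's: dedupe facts, keep the freshest statement of each, and cap how many entries survive per topic.

```python
# Hypothetical session-memory consolidation: dedupe facts and keep only
# the most recent entries per topic, the kind of pruning MEMORY.md implies.
def consolidate(entries: list[dict], keep_per_topic: int = 3) -> list[dict]:
    seen: set[tuple] = set()
    by_topic: dict[str, list[dict]] = {}
    # Newest first, so the freshest statement of a duplicated fact wins.
    for e in sorted(entries, key=lambda e: e["time"], reverse=True):
        key = (e["topic"], e["fact"])
        if key in seen:
            continue  # exact duplicate of a newer entry: prune it
        seen.add(key)
        bucket = by_topic.setdefault(e["topic"], [])
        if len(bucket) < keep_per_topic:
            bucket.append(e)  # cap memory growth per topic
    return [e for bucket in by_topic.values() for e in bucket]

entries = [
    {"topic": "deploy", "fact": "use staging first", "time": 1},
    {"topic": "deploy", "fact": "use staging first", "time": 2},  # duplicate
    {"topic": "style",  "fact": "prefer tabs",       "time": 3},
]
print(len(consolidate(entries)))  # -> 2
```

A real implementation would use semantic similarity rather than exact-match deduplication, but the shape of the problem — merge, rank, cap — is the same.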

3. Safety is Becoming a Market Differentiator

81,000 people told Anthropic their #1 concern is trust, not intelligence. Structured, auditable safety rules — not hidden system prompts — are what users want. safety.laws makes safety inspectable.
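"Inspectable" can mean something as simple as rules you can read and replay. The rule format and `check` function below are hypothetical — the published spec does not define a safety.laws syntax here — but they show the hard/soft distinction: hard rules block an action outright, soft rules only flag it, and rules are evaluated in priority order.

```python
# Hypothetical safety.laws evaluator. The rule format is illustrative:
# hard rules refuse the action, soft rules attach a warning.
RULES = [
    {"priority": 1, "kind": "hard", "deny": "delete_production_db"},
    {"priority": 2, "kind": "soft", "deny": "send_external_email"},
]

def check(action: str) -> tuple[bool, list[str]]:
    """Return (allowed, warnings) for a proposed action."""
    warnings: list[str] = []
    for rule in sorted(RULES, key=lambda r: r["priority"]):
        if rule["deny"] == action:
            if rule["kind"] == "hard":
                return False, warnings  # hard constraint: refuse immediately
            warnings.append(f"soft rule flagged: {action}")
    return True, warnings

print(check("delete_production_db"))  # -> (False, [])
print(check("send_external_email"))   # -> (True, ['soft rule flagged: send_external_email'])
```

Because the rules are plain data rather than prose buried in a system prompt, they can be versioned, diffed, and audited — which is the point of making safety a file instead of a secret.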

The Harness Competition

The shift from model competition to harness competition is real and accelerating:

  • Anthropic is building Dream, Buddy, Coordinator, KAIROS, ULTRAPLAN — all harness features
  • Major hosting providers are deploying 100K+ managed AI agents with built-in credits
  • Well-funded startups are raising hundreds of millions for "AI Employee" harness products

Everyone is investing in the harness. But nobody is investing in a portable standard for harness behavior.

That's the gap. That's what Soul Spec fills.

The model race made AI powerful. The harness race will make AI useful. And the standard that defines how harnesses behave will determine whether users own their agents — or their agents own them.

Originally published at blog.clawsouls.ai