Preface
Just got back from GDPS 2026 (Global Developer Pioneer Summit) in Shanghai, where I picked up a new term: "Harness Engineer." After sitting through the keynote, I was still fuzzy on what it actually meant — so I spent some time digging into the concept, aiming to explain it as clearly as possible.
The story starts at the very beginning of the large language model era, with prompt engineering. Before Harness Engineers came along, the AI field had already seen two other roles emerge: the Prompt Engineer and the Context Engineer. These three concepts aren't independent — they form a clear evolutionary chain, where each new idea signals a deeper understanding of how humans collaborate with AI.
In this article, I'll walk you through all three concepts from scratch. No jargon pileup. I'll use familiar real-world analogies so that by the end you understand them and can actually apply them.
I. Understanding the Evolution Itself
Before diving into the three concepts, let's establish an overarching framework.
At their core, these three evolutions represent the point of human influence over AI moving further and further back:
| Stage | Point of Influence | Core Question |
|---|---|---|
| Prompt Engineer | The wording of a single conversation | "How do I phrase this so the AI understands?" |
| Context Engineer | The information the AI can see | "What does the AI need to know to do good work?" |
| Harness Engineer | The system the AI runs inside | "What environment do I need to build so AI can work safely, efficiently, and autonomously?" |
One-line summary: From "how to say it" → "what to say" → "what stage to build."
II. Prompt Engineer: The Art of Diplomacy (2022–2023)
The Core Idea
Prompt Engineering was the first wave. After ChatGPT went viral, people discovered a striking truth: the same question, asked in different ways, could produce wildly different quality answers.
Prompt Engineering is the craft of "how to ask" — what phrasing to use, what examples to include, what role to set, what structure to follow...
Real-World Analogy: The Imperial Diplomat
Imagine you've time-traveled to a feudal dynasty. You want the Emperor to approve your project.
A person with no diplomatic training might barge into the throne room and say: "Your Majesty, I've got an idea — just stamp this form for me."
Result: dragged out and flogged.
An experienced diplomat knows:
- Open by praising the Emperor's wisdom and benevolence (role-setting)
- Frame your request as "a contribution to the realm" (reframing)
- Back it up with successful historical precedents (few-shot examples)
- Ask in a way that makes the Emperor feel it was his idea all along (guided prompting)
The Prompt Engineer is this diplomat — they don't change the Emperor, and they don't change the court's rules. They only study how to speak.
Typical Techniques
✅ Role-playing: You are a senior software architect with 20 years of experience...
✅ Chain of thought: Please think through this step by step...
✅ Few-shot: Here are three examples — answer in the same format...
✅ Structured output: Respond in JSON with fields: name, age, score...
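These techniques compose. As a rough sketch (the role text, example pairs, and output schema below are invented for illustration, not a production prompt), a reusable prompt builder might look like:

```python
# Minimal prompt builder combining role-setting, few-shot examples,
# chain-of-thought, and a structured-output instruction.
# All content strings are illustrative placeholders.
def build_prompt(role: str, examples: list[tuple[str, str]], question: str) -> str:
    parts = [f"You are {role}."]
    for q, a in examples:  # few-shot section
        parts.append(f"Q: {q}\nA: {a}")
    parts.append("Think through this step by step.")  # chain of thought
    parts.append("Respond in JSON with fields: name, age, score.")  # structured output
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)

prompt = build_prompt(
    "a senior software architect with 20 years of experience",
    [("What is 2+2?", '{"answer": 4}')],
    "What is 3+3?",
)
```

The point is not the string concatenation; it's that each technique occupies a named, swappable slot, which is what makes prompts maintainable rather than one-off incantations.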
Limitations
Prompt Engineering is fundamentally one-shot, manual, and brittle.
A carefully crafted prompt can fall apart completely when you switch models or tasks. And it doesn't solve a deeper problem: when tasks become complex and require multi-step reasoning and memory, clever wording isn't enough.
III. Context Engineer: The Intelligence Officer's Art (2023–2024)
The Core Idea
As LLM context windows expanded from 4K to 128K to millions of tokens, a new realization emerged: the ceiling of a model's performance is no longer "can it understand this" — it's "what information was it given."
Context Engineering is the discipline of deliberately designing and managing what goes into an AI's context window — what to include, what to leave out, in what order, and how to structure it.
The concept gained massive traction in 2024. Andrej Karpathy (a founding member of OpenAI) put it directly in a 2025 post (paraphrased):
"The term 'prompt engineering' is a bit of a misnomer. The more accurate description is context engineering — the art of carefully designing and managing the entire context window for LLMs."
Real-World Analogy: The Pre-Mission Intelligence Briefing
War breaks out. A General (the AI) needs to make battlefield decisions.
A poor Chief of Staff simply says: "General, the enemy is here — figure it out."
→ The General knows nothing, improvises blindly, and probably fails.
A top-tier Chief of Staff (Context Engineer) prepares in advance:
- Enemy intelligence report (relevant background knowledge)
- Friendly force assessment (currently available resources)
- Terrain maps (task structure and constraints)
- Historical battle cases (relevant few-shot examples)
- Commander's intent (current mission objective)
- Communication protocols (format specs and output requirements)
All of this gets carefully organized and packed into the General's pre-mission briefing folder (the context window), enabling high-quality decisions on the ground.
The core of Context Engineering isn't "what to say" — it's "what documents to give the AI."
Typical Practices
✅ RAG (Retrieval-Augmented Generation): Don't stuff all knowledge into the prompt — dynamically retrieve the most relevant chunks
✅ Memory management: Decide which conversation history is worth keeping and what can be compressed or discarded
✅ Information layering: Put critical info at the start and end of context (LLMs are more sensitive to those positions)
✅ Structured injection: Use XML tags or JSON structure to help the AI locate information faster
✅ System prompt design: CLAUDE.md files and system prompts are quintessential Context Engineering
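To make the RAG bullet concrete, here is a toy retrieval step, with keyword overlap standing in for real embedding similarity and invented knowledge chunks:

```python
import re

# Toy RAG retrieval: score each knowledge chunk by keyword overlap
# with the query and inject only the top matches into the context.
# Real systems use embedding similarity; the overlap score here
# just illustrates the "retrieve, don't stuff" idea.
def retrieve(query: str, chunks: list[str], top_k: int = 1) -> list[str]:
    q_words = set(re.findall(r"\w+", query.lower()))

    def score(chunk: str) -> int:
        return len(q_words & set(re.findall(r"\w+", chunk.lower())))

    return sorted(chunks, key=score, reverse=True)[:top_k]

knowledge = [
    "Next.js 14 uses the App Router by default.",
    "PostgreSQL supports JSONB columns.",
]
context = retrieve("How does routing work in Next.js?", knowledge)
```

Only the winning chunk gets injected into the prompt; the rest of the knowledge base stays out of the context window.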
A Real-World Example
Ever used Claude Code?
When you drop a CLAUDE.md file into a project and write "This project uses Next.js 14, don't modify package.json, follow XX code style" — that's Context Engineering.
You're not teaching Claude "how to speak." You're carefully designing what it sees before it starts working each time. That file gets automatically injected into every conversation context, becoming the AI's "invisible constitution."
Limitations
Context Engineering solves "what the AI sees" — but there's a deeper challenge it doesn't address:
What happens when AI is no longer just "answering questions" but needs to "autonomously execute tasks"?
AI Agents need to call tools, execute code, access files, send requests... These actions touch real-world resources and carry real consequences. Carefully prepared context alone isn't enough anymore.
IV. Harness Engineer: Horse Tamer Meets Safety Engineer (2025–)
The Core Idea
Now we arrive at today's main character.
A Harness Engineer is someone who designs and builds the runtime environment, constraint systems, tool ecosystem, and execution infrastructure for AI Agents.
The word "harness" is beautifully chosen, because it carries three distinct meanings in English — and each one maps perfectly to a dimension of this role:
Three Meanings, Three Perfect Analogies
First Meaning: Horse Harness — Directing Raw Power
A powerful horse has enormous force, but without a harness (reins, collar, yoke), that force is either unusable or will kill the driver.
The harness maker designs equipment that channels the beast's power in a precisely directed way — pulling a plow, driving a cart, carrying a rider.
An AI model is that wild horse. It has stunning capability, but without a carefully designed harness, that capability is uncontrolled. The systems a Harness Engineer builds are what allow AI's power to be released in a controlled, directed manner.
Second Meaning: Safety Harness — Working Safely in Dangerous Environments
The strap a high-rise construction worker wears around their waist is called a Safety Harness.
Without it, the worker can't safely operate at dangerous heights. With it, they can lean out boldly and perform precise work — because even if they slip, they won't fall.
An AI Agent executing tasks in the real world is like that worker on a high-rise — one misstep could delete a production database, send an irreversible email, or trigger a wrong API call.
The permission systems, sandboxes, rollback mechanisms, and approval workflows a Harness Engineer builds are that safety rope — letting AI work boldly while ensuring that even if something goes wrong, the consequences stay contained.
Third Meaning: Test Harness — A Controlled Execution Environment
In software engineering, a Test Harness is a classic concept: it provides a controlled, repeatable, and observable runtime environment for the component under test.
AI Agents need something similar: a clearly defined toolset, a predictable execution environment, complete logging and monitoring. The infrastructure a Harness Engineer builds makes AI Agent behavior testable, debuggable, and observable.
Real-World Analogy: NASA Mission Control
Imagine NASA sends an astronaut (AI Agent) into space to execute a mission.
Just "training the astronaut on how to communicate beforehand" (Prompt Engineering) clearly isn't enough.
Just "giving the astronaut a comprehensive mission briefing" (Context Engineering) isn't enough either.
What actually makes the mission succeed is the massive support system back on Earth:
- Mission Control Center: Real-time parameter monitoring, ready to intervene at any moment (Orchestration)
- Equipment inventory: Carefully designed spacesuits and toolkits, each tool with a clear purpose (Tool Design)
- Communication protocols: Standardized command formats ensuring clear two-way communication (API Design)
- Contingency plans: Pre-set emergency procedures for every failure scenario (Error Handling)
- Permission tiers: The astronaut can autonomously execute some operations, but critical ones require ground confirmation (Permission System)
- Flight logs: All operations recorded throughout, traceable after the fact (Observability)
The Harness Engineer is the person who designs and maintains this "Mission Control Center."
V. Harness Engineering in Practice
Enough concepts — let's look at what a Harness Engineer actually does day to day.
1. Hooks System: The AI's "Reflex Arc"
Claude Code has a feature called Hooks that perfectly embodies Harness Engineering:
```jsonc
// .claude/settings.json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          {
            "type": "command",
            "command": "echo '🔍 About to execute command, running security check...' && security-check.sh"
          }
        ]
      }
    ],
    "PostToolUse": [
      {
        "matcher": "Write",
        "hooks": [
          {
            "type": "command",
            "command": "git add -A && git status"
          }
        ]
      }
    ]
  }
}
```
What this configuration does:
- Every time AI wants to run a Bash command, automatically run a security check script first
- Every time AI finishes writing a file, automatically stage git changes
Notice: this logic isn't in the AI's conversation, not in the Prompt, not in the context — it's in the Harness. The AI doesn't even know these checks are happening, but every single one of its actions is silently governed by this system.
This is the core idea of Harness Engineering: pull control logic out of the AI's "consciousness" and put it in an external system.
2. Permission System: Fine-Grained Trust Boundaries
A well-designed AI Harness isn't just "allow/deny" — it's a fine-grained permission matrix:
✅ Free to execute: read files, run tests, query read-only APIs
⚠️ Requires approval: modify production configs, send external requests, delete files
❌ Permanently banned: modify .env files, directly operate databases, push to main branch
This permission system doesn't rely on "telling the AI not to do something" (Prompt Engineering) — it physically prevents certain operations at the system level.
This is the value of the safety rope — it's not about the astronaut's self-discipline; it's about engineering design that guarantees safety.
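A sketch of such a tiered check, enforced in code outside the model (the tier assignments and operation names below are illustrative, not a standard):

```python
from enum import Enum

class Decision(Enum):
    ALLOW = "allow"
    NEEDS_APPROVAL = "needs_approval"
    DENY = "deny"

# Tier membership is a policy choice; these sets mirror the
# example matrix above and are not a recommendation.
BANNED = {"modify_env", "operate_db_directly", "push_main"}
NEEDS_HUMAN = {"modify_prod_config", "send_external_request", "delete_file"}

def check_permission(operation: str) -> Decision:
    if operation in BANNED:
        return Decision.DENY            # physically blocked, regardless of what the prompt says
    if operation in NEEDS_HUMAN:
        return Decision.NEEDS_APPROVAL  # routed to a human reviewer before execution
    return Decision.ALLOW               # e.g. read files, run tests, query read-only APIs
```

The key property: this function runs in the harness, before any tool executes. The model never gets a chance to "talk its way" past a `DENY`.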
3. Tool Design: Giving AI the Right "Hands"
A core part of a Harness Engineer's work is Tool Design.
A bad tool design:
```python
import subprocess

# Give the AI an "all-purpose tool": arbitrary shell access, no limits
def execute_anything(command: str) -> str:
    return subprocess.run(command, shell=True, capture_output=True, text=True).stdout
```
A good tool design:
```python
# Single-responsibility, clearly bounded toolset
def read_file(path: str) -> str: ...               # Read-only, has path whitelist
def write_file(path: str, content: str): ...       # Has path validation, size limits
def run_tests(test_path: str) -> TestResult: ...   # Sandboxed execution, timeout protection
def search_web(query: str) -> List[Result]: ...    # Has domain whitelist
```
The granularity, boundary clarity, and error handling of tools directly determine the reliability of an AI Agent system. Designing tools isn't a coding problem — it's an architecture problem.
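As one example of what "path validation, size limits" can mean in practice, here is a hedged sketch of `write_file`; the workspace root and byte limit are arbitrary choices for illustration:

```python
from pathlib import Path

# Illustrative constraints, not a spec: confine writes to one
# workspace directory and cap file size.
ALLOWED_ROOT = Path("/tmp/agent_workspace").resolve()
MAX_BYTES = 64 * 1024

def write_file(path: str, content: str) -> None:
    target = (ALLOWED_ROOT / path).resolve()
    # Reject any path that resolves outside the workspace
    # (catches "../" escapes and absolute-path tricks).
    if ALLOWED_ROOT not in target.parents:
        raise PermissionError(f"path escapes workspace: {path}")
    if len(content.encode()) > MAX_BYTES:
        raise ValueError("content exceeds size limit")
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content)
```

Note that both checks live in the tool itself, not in the prompt: even a confused or adversarial model physically cannot write outside the workspace through this interface.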
4. Multi-Agent Orchestration: Designing the "Org Chart"
When tasks are too complex for a single AI to handle, the Harness Engineer designs multi-agent pipelines:
```
User Request
     ↓
[Strategy Agent]  ← Responsible for task decomposition and planning
     ↓
┌─────────────────────────────────────────────┐
│ [Research Agent] [Code Agent] [Test Agent]  │  ← Parallel execution
└─────────────────────────────────────────────┘
     ↓
[Synthesis Agent] ← Responsible for integrating results and quality control
     ↓
Final Output
```
In this architecture, each Agent's responsibility boundaries, communication protocols, shared state management, and failure rollback strategies are all the Harness Engineer's design work.
This is like designing a company's organizational structure — except every "employee" is an AI.
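The pipeline above can be skeletonized in a few lines. The agents here are stubbed as plain functions (a real system would have each one call an LLM), so only the orchestration shape is real:

```python
from concurrent.futures import ThreadPoolExecutor

def strategy_agent(request: str) -> list[str]:
    # Decompose the request into subtasks (stubbed for illustration).
    return [f"research: {request}", f"code: {request}", f"test: {request}"]

def worker_agent(subtask: str) -> str:
    # Stand-in for a Research/Code/Test agent doing real work.
    return f"done({subtask})"

def synthesis_agent(results: list[str]) -> str:
    # Integrate results; sorting makes the merge deterministic.
    return " | ".join(sorted(results))

def run_pipeline(request: str) -> str:
    subtasks = strategy_agent(request)
    with ThreadPoolExecutor() as pool:  # the parallel middle stage
        results = list(pool.map(worker_agent, subtasks))
    return synthesis_agent(results)
```

Everything the Harness Engineer owns lives in `run_pipeline`: how work fans out, how failures would propagate, and how results merge. The agents themselves are replaceable.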
5. Observability: Wiring the "Nervous System"
Real-world AI Agent systems must have complete observability:
```python
# Record every tool call: who called it, when, what params,
# what result, how long it took
@trace_tool_call
def execute_task(task_id: str, params: dict) -> Result:
    ...

# Instrument key decision points
span = tracer.start_span("agent_decision")
span.set_attribute("decision_type", "file_modification")
span.set_attribute("confidence", agent_confidence)
```
An AI system without observability is like a plane without an instrument panel — when something goes wrong, you don't know where, and you can't optimize.
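The `@trace_tool_call` decorator in the snippet above is left undefined; a minimal version might record each call into an in-memory log (a real system would export spans to a tracing backend instead):

```python
import functools
import time

# In-memory trace log; a production harness would ship these
# records to an observability backend.
TRACE_LOG: list[dict] = []

def trace_tool_call(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            result = fn(*args, **kwargs)
            status = "ok"
            return result
        except Exception:
            status = "error"
            raise
        finally:
            # Runs on both success and failure, so every call is logged.
            TRACE_LOG.append({
                "tool": fn.__name__,
                "args": args,
                "status": status,
                "duration_s": time.perf_counter() - start,
            })
    return wrapper

@trace_tool_call
def run_tests(test_path: str) -> str:
    return f"passed: {test_path}"
```

Because the decorator wraps the tool rather than the model, every call gets logged whether or not the AI "remembers" to report what it did.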
VI. The Relationship: Not Replacement, But Nesting
Here's a critical insight: Prompt Engineer, Context Engineer, and Harness Engineer are not in a "who replaces whom" relationship — they're nested inside each other.
```
╔════════════════════════════════════════╗
║        Harness Engineering             ║
║  ┌─────────────────────────────────┐   ║
║  │     Context Engineering         │   ║
║  │  ┌──────────────────────────┐   │   ║
║  │  │   Prompt Engineering     │   │   ║
║  │  └──────────────────────────┘   │   ║
║  └─────────────────────────────────┘   ║
╚════════════════════════════════════════╝
```
An excellent Harness Engineer is necessarily also a strong Context Engineer (because they design system prompts and information injection pipelines), and also understands Prompt Engineering (because they design tool descriptions and task instructions).
The relationship between the three is like:
- Prompt Engineer = The diplomat who knows how to talk to the Emperor
- Context Engineer = The Chief of Staff who prepares the diplomat's complete intelligence package
- Harness Engineer = The Secretary of State who designs the entire diplomatic mission's operating system, protocols, permissions, and contingency plans
VII. Why Is Harness Engineer Suddenly Hot Right Now?
The timing is no accident.
The Trigger: The AI Agent Explosion
In 2025, AI Agents went from "interesting demos" to "real productivity tools." Claude Code, Cursor, Devin, GitHub Copilot Workspace... AI started executing real, consequential operations inside real codebases.
This leap meant: the cost of a mistake went from "AI said something useless" to "AI deleted the production database."
Once AI actions started affecting the real world, someone had to design the infrastructure guaranteeing safety, efficiency, and control — and that person is the Harness Engineer.
The Bottleneck Has Shifted
Early on, model capability was the bottleneck — so everyone studied Prompt Engineering (how to unlock model potential).
Then model capability grew stronger, and information management became the bottleneck — so Context Engineering emerged (how to feed the model the right information).
Now models are smart enough and information can be managed — system architecture is the bottleneck. How do multiple Agents collaborate? How do you guarantee Agent behavior reliability? How do you handle rollbacks when an Agent fails? This is what Harness Engineering solves.
Wherever the capability boundary is, that's where engineering focus moves.
VIII. How Can Developers Get Started?
Some practical advice for those who want to dive in.
Start with Claude Code Hooks
The lowest-friction entry point to Harness Engineering:
```bash
# Create .claude/settings.json in your project
mkdir -p .claude
touch .claude/settings.json
```
Then start with simple hooks:
- Auto-format after every file write
- Print logs before every command execution
- Auto-save a summary after every conversation
Learn Tool Calling Design (Function Calling)
Study OpenAI's and Anthropic's Function Calling specs carefully, and think about:
- How do tool names and descriptions influence the AI's calling decisions?
- How does parameter design reduce misuse?
- How does error handling let the AI fail gracefully?
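For concreteness, here is what a tool definition looks like in the JSON-Schema style these APIs use. The shape shown follows Anthropic's `name` / `description` / `input_schema` layout (OpenAI's differs slightly); the tool itself is an invented example:

```python
# A tool definition the model reads before deciding what to call.
# The name, description, and parameter docs ARE the interface:
# they directly shape the AI's calling decisions.
read_file_tool = {
    "name": "read_file",
    "description": (
        "Read a UTF-8 text file inside the project workspace. "
        "Use this before proposing edits; do not guess file contents."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "path": {
                "type": "string",
                "description": "Path relative to the project root.",
            }
        },
        "required": ["path"],
    },
}
```

Notice how much of the design is prose: the description tells the model when to use the tool and when not to, and `required` plus typed parameters reduce malformed calls before any code runs.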
Study Real Agent Frameworks
- LangGraph: Focused on stateful, multi-step Agent workflows
- CrewAI: Multi-agent role collaboration framework
- Claude Code SDK: Anthropic's official Agent building toolkit
You don't need to master all of them — pick one and go deep on its design philosophy.
Build a Safety-First Mindset From Day One
The most important quality for Harness Engineering isn't technical — it's an intuition for risk:
"What's the worst case if this operation goes wrong?"
"What layer should I add protection at?"
"Should the AI decide this autonomously, or does a human need to confirm?"
Closing Thoughts
Prompt Engineer → Context Engineer → Harness Engineer. This evolutionary chain tells a single story:
Our relationship with AI is evolving from "conversation partner" to "system designer."
The Prompt Engineer treats AI as a tool and studies how to use it well. The Context Engineer treats AI as a professional and studies how to give it complete information and authority. The Harness Engineer treats AI as an organizational member and studies how to build a system where AI can work safely, efficiently, and autonomously.
This isn't just a technical evolution — it's a leap in mental frameworks.
When you start thinking "what system do I need to build for AI to work within" rather than "what do I say to the AI" — you've already set foot on the path of Harness Engineering.
This article is a systematic overview of Prompt Engineering, Context Engineering, and Harness Engineering, informed by hands-on experience with Claude Code. If you have different perspectives or practical experiences, I'd love to hear from you in the comments.