Does this sound familiar?
Your AI just fixed a bug. Two weeks later, the exact same bug is back.
You deploy something, and you have no idea if it actually worked — so you manually test it.
You've written 100 lines of rules in your config file, but the AI still ignores half of them.
Every new chat session, you re-explain the same context from scratch.
I ran into all four of these problems while building an internal AI quoting system for a healthcare company, with no technical background. After months of debugging, I realized that none of them were model problems. They were Harness problems.
What is Harness Engineering?
Harness Engineering is the discipline of building the scaffolding around your AI — the rules, constraints, verification scripts, and knowledge structures that make it produce consistent, reliable output.
Without Harness, even the best model will drift, forget, and repeat the same mistakes.
The data backs this up: research shows that 80% of Agent quality failures come from Harness gaps, not model limitations. And in one benchmark, the same 15 models all improved significantly when only the Harness was changed and the models were left as they were.
The problem is that most people don't know what their Harness is missing. They just know something feels broken.
The framework: two dimensions, not six steps
After studying real production failures and building my own system from scratch, I organized Harness Engineering into two dimensions.
Vertical Quality Layers (Q) — required for every project
| Layer | Name | What it solves |
|---|---|---|
| Q1 | SPEC | AI knows what to build, what not to, and how to verify |
| Q2 | Rules + Security | Hard business limits + security red lines, equally mandatory |
| Q3 | Skills | Repetitive workflows standardized with counter-examples |
| Q4 | Scripts (unified gate) | Nothing is "done" until scripts pass |
Horizontal Scale Layers (S) — enable only when needed
| Layer | Name | When to enable |
|---|---|---|
| S1 | Context | Sessions losing coherence after ~20 turns |
| S2 | dev-map + Memory | Project iterating 2+ months, AI re-inventing solutions |
| S3 | Multi-Agent | Single agent consistently failing on long task chains |
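To make "enable only when needed" concrete, the triggers in the table can be written down as explicit checks. The sketch below is only an illustration: the `ProjectSignals` fields and the `scale_layers_to_enable` function are hypothetical names, and reading the S2 row as "both conditions must hold" is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProjectSignals:
    """Hypothetical observations about a project (names are illustrative)."""
    sessions_lose_coherence: bool   # sessions drift after roughly 20 turns
    months_iterating: float         # how long the project has been iterating
    ai_reinvents_solutions: bool    # the AI re-solves already-solved problems
    long_chain_failures: bool       # a single agent keeps failing long task chains


def scale_layers_to_enable(signals: ProjectSignals) -> List[str]:
    """Return the S layers whose trigger from the table is currently met."""
    layers = []
    if signals.sessions_lose_coherence:
        layers.append("S1 Context")
    # Assumption: S2 needs both the 2+ month age and the re-inventing symptom.
    if signals.months_iterating >= 2 and signals.ai_reinvents_solutions:
        layers.append("S2 dev-map + Memory")
    if signals.long_chain_failures:
        layers.append("S3 Multi-Agent")
    return layers


young_project = ProjectSignals(
    sessions_lose_coherence=True,
    months_iterating=0.5,
    ai_reinvents_solutions=False,
    long_chain_failures=False,
)
print(scale_layers_to_enable(young_project))
```

A project with none of these symptoms gets an empty list back, which matches the idea that the S layers stay off by default.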
The key insight: Q4 is not step four. It's the exit gate for every layer. Code changes, doc updates, multi-agent outputs — all must pass Q4 before anything counts as done.
Most people skip Q4 entirely. That's why the same bug keeps coming back.
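One way to picture Q4 as an exit gate is a single function that every change has to go through. This is a minimal sketch, not Rein's code and not my actual scripts: `Check`, `run_gate`, and the two example checks are hypothetical names, and the lambdas stand in for real verification scripts.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Check:
    """One verification step; `run` returns True when the check passes."""
    name: str
    run: Callable[[], bool]


def run_gate(checks: List[Check]) -> Tuple[bool, List[str]]:
    """Run every check and return (passed, names_of_failed_checks).

    A check that raises is treated as failed, so a crashing script
    can never let a change through by accident.
    """
    failed = []
    for check in checks:
        try:
            ok = check.run()
        except Exception:
            ok = False
        if not ok:
            failed.append(check.name)
    return (not failed, failed)


# Placeholder checks: in a real project these would call actual scripts.
checks = [
    Check("service_alive", lambda: True),
    Check("business_logic_correct", lambda: False),
]

passed, failed = run_gate(checks)
print("DONE" if passed else f"NOT DONE, failed: {failed}")
```

The design choice that matters is that the gate runs all checks and reports every failure, instead of stopping at the first one, so a single run shows everything that still blocks "done".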
What I built: Rein
Rein is an open-source Skill for Claude Code (and any agent supporting the SKILL.md standard) that acts as a silent Harness Engineering advisor throughout your project.
It watches your conversations for patterns — not keywords — and speaks up only when it detects a real gap. When everything's fine, it stays silent. Silence is a feature.
What it detects automatically:
- Repeated failures (same bug fixed twice → missing Rule or regression test)
- Context loss (re-explaining background every session → incomplete project docs)
- Scale shifts (internal tool going external → time to harden your Harness)
- Cost spikes (API bill climbing → identifies token waste sources)
- Over-engineering (more config, slower shipping → tells you what to delete)
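Rein itself works from conversation patterns, and nothing below is its implementation. But the first signal in the list is easy to track by hand too. As a rough sketch, assuming you keep a simple fix log where each entry carries a `bug_id` (a hypothetical field name), any bug that shows up twice is a candidate for a new Rule or regression test:

```python
from collections import Counter
from typing import Dict, List


def repeated_failures(fix_log: List[Dict[str, str]]) -> List[str]:
    """Return the ids of bugs that were fixed two or more times."""
    counts = Counter(entry["bug_id"] for entry in fix_log)
    return sorted(bug_id for bug_id, n in counts.items() if n >= 2)


# Hypothetical fix log; ids and dates are made up for illustration.
fix_log = [
    {"bug_id": "pricing-rounding", "fixed_on": "2024-01-05"},
    {"bug_id": "login-timeout", "fixed_on": "2024-01-09"},
    {"bug_id": "pricing-rounding", "fixed_on": "2024-01-19"},
]

print(repeated_failures(fix_log))
```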
Test results: 97% pass rate across 16 scenarios with Rein vs 52% without.
The biggest gap was in root cause diagnosis: 92% accuracy with Rein, 24% without.
A real example from my project
My verify.sh only checked if the service started. It didn't check if the business logic was correct.
So when the AI "fixed" a pricing calculation bug, it passed my verification — service was running — but the actual calculation was still wrong. Same bug, two weeks later.
After adding a business baseline check (send a known-good quote request and compare the response against the expected output), that class of bug disappeared entirely.
This is Q4. Not just "is the service alive?" but "is the output actually correct?"
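Here is roughly what a business baseline check looks like, written in Python for readability. Everything in it is an assumption made for illustration: the request fields, the expected total, and `compute_quote`, which stands in for a call to the real quoting service.

```python
from decimal import Decimal
from typing import Callable, Dict

# Hypothetical baseline: a request whose correct quote a human has verified.
BASELINE_REQUEST = {"item": "example-plan", "quantity": 10, "unit_price": "12.50"}
BASELINE_EXPECTED = Decimal("125.00")


def compute_quote(request: Dict) -> Decimal:
    """Stand-in for the real pricing logic (an assumption, not the actual system)."""
    return Decimal(request["unit_price"]) * request["quantity"]


def baseline_check(
    quote_fn: Callable[[Dict], Decimal],
    request: Dict,
    expected: Decimal,
) -> bool:
    """Pass only if the quote for the known request matches the verified value."""
    return quote_fn(request) == expected


if baseline_check(compute_quote, BASELINE_REQUEST, BASELINE_EXPECTED):
    print("baseline OK")
else:
    print("baseline FAILED: pricing output does not match the verified quote")
```

A "service is running" check would pass for any `quote_fn` that returns something. This one fails the moment the number is wrong, which is exactly the case my original verify.sh missed.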
Install
git clone https://github.com/DtoTHEmoon/rein-skill.git ~/.claude/skills/rein
Restart your agent. Rein activates automatically — no commands needed.
Also works with: OpenClaw, Codex CLI, Gemini CLI, Cursor, and any agent supporting SKILL.md.
The core philosophy
Start minimal. Add only when you have a real pain point. And know when to subtract — Rein will tell you when your Harness is getting in your own way.
If your scaffolding is slowing you down, it's time to cut.
GitHub: github.com/DtoTHEmoon/rein-skill