DEV Community

simba ji


Harness Engineering in Practice: How I Built Mine in 4-steps

TL;DR: Harness engineering is the layer above context engineering — you build the system (documentation, standards, quality checks, tool configs) that lets AI run unattended. I built one over two days for my project book2skills and ended up with a fully automated book-to-skill publishing pipeline. This post walks through the four steps.


Harness engineering is a term that's been gaining traction recently. Like most emerging concepts, it's ahead of widespread adoption — we're still in the early days, and real-world examples are scarce. That's exactly why I want to share how I've been applying it in my own product.

What Is Harness Engineering?

Earlier this year, OpenAI published a write-up describing how their team built a production app with over a million lines of code — without a single line written by human hands. The engineers weren't writing code. They were building the system that allowed AI to write code reliably.

That system — the documentation, quality standards, constraints, and feedback loops — is the harness.

The word "harness" comes from horse tack: reins, saddle, bridle. A horse is powerful, but without that equipment it just goes wherever it wants. The AI is the horse. The harness is everything that channels its power in the right direction. You're the rider building it.

To place it on the AI engineering timeline, here's how harness engineering fits into the broader arc:

  • Prompt + workflow: Single calls with rigid, hard-coded pipelines. Predictable, but inflexible.
  • Context engineering: Supports longer reasoning chains and more flexible orchestration — but the agent can drift mid-task, and someone usually needs to stay close.
  • Harness engineering: Unattended execution. You write the standards, principles, quality checks, and tool configurations — the "playbook" — upfront. When the agent hits a decision point, it consults the playbook. The better the playbook, the less you need to intervene.
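A toy sketch of the "playbook" idea, assuming it can be reduced to situation-to-guidance lookups. Real harnesses are prose documents, and every name and entry below is invented for illustration:

```python
# Hypothetical playbook the agent consults at decision points,
# instead of a human intervening mid-task. All entries are made up.
PLAYBOOK = {
    "pdf_has_no_text_layer": "run OCR before extraction",
    "chapter_over_20_pages": "summarize in two passes",
    "skill_md_over_500_lines": "split into two subskills",
}

def decide(situation: str) -> str:
    """Return the playbook's guidance, or escalate to the human rider."""
    return PLAYBOOK.get(situation, "ask the rider")
```

The point isn't the lookup itself; it's that decisions the human used to make live in a written artifact the agent can read.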

Still abstract? Let me make it concrete.

My Case: Building a Harness for book2skills

book2skills is a skills factory — it takes classic non-fiction books and distills the frameworks and decision logic inside them into AI-ready skills, open source and free.

The full production pipeline looks like this:

Pick a book → Read it → Extract the skill → Publish to GitHub → Generate usage examples → Generate website content

I handle book selection myself — I enjoy it too much to hand it off. Everything else runs automatically.

The whole harness took me about two days to build, across four steps. None of it was generated in one shot — roughly 60% of my time went into iteration and refinement.

The Four Steps to Building the Harness

Step 1: Write Thorough Requirements

Before building anything, I spent about two hours getting the brief right. I had the AI help me research competitors, understand what users actually want, map out implementation approaches, and find design inspiration. The goal was to turn vague instincts into explicit written standards.

Step 2: Build Skills at the Right Granularity

This took three to four hours and went through three phases:

  • First, get something working end-to-end — even if it's rough, even if the input is a simple PDF and the output is a basic skill.
  • Then refine each skill individually.
  • Then split any skill that grows too long into two.

Skills shouldn't be too large or too small. Too large and the agent drifts. Too granular and you're essentially writing code. Finding the right size takes trial and error.
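The "too large / too small" judgment can be caricatured as a size check; the line-count thresholds below are invented for illustration, not taken from the post:

```python
# Toy heuristic: treat SKILL.md line count as a proxy for skill size.
# Both thresholds are made-up numbers; in practice this is a judgment call.
MAX_LINES = 500   # beyond this, the agent tends to drift
MIN_LINES = 30    # below this, you're basically writing code

def skill_size_verdict(skill_md: str) -> str:
    n = len(skill_md.splitlines())
    if n > MAX_LINES:
        return "split"
    if n < MIN_LINES:
        return "merge or inline"
    return "ok"
```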

In the end, I had a fully automated publishing pipeline: one main skill, book2skills-publisher, orchestrating five subskills.

Each step has its own dedicated skill, all orchestrated by the main skill. But the orchestration isn't rigid like a Coze workflow — it's written in plain language, loosely structured. The agent can adapt as it goes. That flexibility is precisely what makes skills more powerful than hard-coded automation.

The pipeline, once more:

Pick a book → Read it → Extract the skill → Publish to GitHub → Generate usage examples → Generate website content

Here's what each skill does:

Book selection: The one step I do myself. Not because AI can't — I just genuinely love this part.

Reading (read-book-skill): Runs local Python scripts to extract the PDF chapter by chapter, then structures the content into markdown summaries. Claude Code's native PDF capabilities are the foundation here.
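The chapter-splitting part of this step can be sketched in a few lines, assuming the raw text has already been pulled out of the PDF (e.g. with a library like pypdf) and that chapters start with headings like "Chapter 1". Both assumptions, and the function names, are mine, not the actual read-book-skill scripts:

```python
import re

def split_chapters(raw_text: str) -> dict[str, str]:
    """Split extracted book text into chapters keyed by heading.
    Assumes chapter headings sit on their own lines ("Chapter 1 ...")."""
    parts = re.split(r"(?m)^(Chapter \d+.*)$", raw_text)
    # re.split with a capture group yields [preamble, head1, body1, head2, body2, ...]
    return {head.strip(): body.strip()
            for head, body in zip(parts[1::2], parts[2::2])}

def to_markdown_summary(title: str, body: str) -> str:
    """Stub summarizer: keep the first paragraph under a markdown heading."""
    first_para = body.split("\n\n")[0]
    return f"## {title}\n\n{first_para}"
```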

Skill extraction (book-skill-creator): Takes the chapter summaries and distills them into a SKILL.md — following Anthropic's skill-creator spec, covering skill dimensions, query-response frameworks, and output formats. This is the step that most determines skill quality, and the one most worth continuing to refine.

GitHub packaging (write-skill-repo): Takes the SKILL.md and generates a complete, spec-compliant GitHub folder — cleaned SKILL.md, README, LICENSE — zipped and committed to the local repo.
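The packaging step is mechanical enough to sketch end to end. This is a minimal stand-in, assuming the three-file folder layout described above (SKILL.md, README, LICENSE); the MIT placeholder and function name are illustrative, not the real write-skill-repo:

```python
from pathlib import Path
import zipfile

def package_skill(name: str, skill_md: str, readme: str, out_dir: Path) -> Path:
    """Write the skill folder and return the path to its zip archive."""
    folder = out_dir / name
    folder.mkdir(parents=True, exist_ok=True)
    (folder / "SKILL.md").write_text(skill_md)
    (folder / "README.md").write_text(readme)
    (folder / "LICENSE").write_text("MIT License\n")  # placeholder license text
    zip_path = out_dir / f"{name}.zip"
    with zipfile.ZipFile(zip_path, "w") as zf:
        for f in folder.iterdir():
            zf.write(f, arcname=f"{name}/{f.name}")
    return zip_path
```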

Usage examples (book-skills-examples): Most skills ship with no real demonstration of what they actually do. I wanted to fix that. This skill has the AI genuinely invoke the skill, combined with web search, to produce real usage examples — two to three rounds of conversation, with real data references. Readers can see exactly what the skill does before deciding whether it's useful to them.

Website content (write-skills-page-content): Pulls everything together into a publishable page — skill description, examples, install instructions, download link — and deploys it to book2skills.
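As noted above, the real orchestration is loose, plain-language instructions, not code. Still, the dependency order the subskills follow can be sketched; the subskill names are from this post, while `run_skill` is a hypothetical callable standing in for the agent:

```python
# Illustrative only: the main skill's "playbook" is prose, not a loop.
# This just shows each subskill consuming the previous one's output.
PIPELINE = [
    "read-book-skill",
    "book-skill-creator",
    "write-skill-repo",
    "book-skills-examples",
    "write-skills-page-content",
]

def run_pipeline(book: str, run_skill) -> dict[str, str]:
    """Run each subskill in order, feeding forward the previous artifact."""
    artifact, results = book, {}
    for skill in PIPELINE:
        artifact = run_skill(skill, artifact)
        results[skill] = artifact
    return results
```

The flexibility the post describes lives precisely in what this sketch leaves out: the agent can reorder, retry, or skip steps when the plain-language playbook says so.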

Step 3: Write an Anchoring Document

Once the pipeline was running and the project had a clear shape, I wrote a "What is book2skills" document before doing any further refinement. Its real purpose wasn't to explain the product to the outside world — it was to set a boundary for myself, to make sure the skills I was about to build wouldn't be over-engineered or scope-creep into something they weren't meant to be. I ended up publishing it as book2skills' first blog post.

Step 4: Let Skills Iterate on Themselves

This is where things get interesting. If I find a flaw in a skill, I just tell it: "add this rule." It updates its own file. The change takes effect on the next run. No developer needed.
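Mechanically, "add this rule" amounts to the agent editing its own skill file. A minimal sketch, assuming rules accumulate under a markdown section in SKILL.md (the section name and helper are my invention):

```python
from pathlib import Path

def add_rule(skill_md: Path, rule: str) -> None:
    """Append a rule to the skill's own file; takes effect on the next run."""
    text = skill_md.read_text()
    if "## Rules" not in text:
        text += "\n## Rules\n"
    if f"- {rule}" not in text:
        text += f"- {rule}\n"
    skill_md.write_text(text)
```

Because the skill file is both the instructions and the memory, every correction compounds: the next run starts from a slightly better playbook.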

Over time, each skill accumulates its own references and specs — like a master craftsman's handbook, full of standards and precedents for the apprentice to consult. The richer that handbook gets, the stronger the harness becomes.

What's Next

The book-to-skill pipeline is done. Next up: bringing operations, UI, and SEO into the harness as well, which is why more skills keep appearing. I'll keep sharing as that unfolds.

If any of this resonates, feel free to follow along.
