DEV Community: Yuan

I stopped prompt-engineering my AI coding agent. I started engineering the repo instead.

Yuan — Fri, 29 May 2026 02:43:50 +0000

The itch

Quick hello first: I'm a developer based in South Korea, and English isn't my
first language — so I'll keep this plain and let the code and the repo do most
of the talking. (If a sentence reads a little stiff, that's the translation tax,
not the idea.)

If you've handed real work to an AI coding agent, you know the pattern.

It edits the wrong file. It re-introduces an approach your team rejected three
months ago. It invents a folder structure nobody agreed to. You correct it in
the chat, it says "You're absolutely right," and then the next session it does
the exact same thing — because the next session starts with none of that
context.

I spent weeks getting better at prompting. Better system messages, better
examples, sharper instructions. It helped one conversation at a time. But every
new task, every new branch, every teammate's agent started from zero again.

At some point the framing flipped for me: prompt engineering improves one
interaction. It does nothing for the environment that interaction happens in.

The reframe: engineer the repo, not the prompt

So I stopped optimizing prompts and started optimizing the repository. The
idea — I've been calling it harness engineering — is to turn the implicit
context an agent keeps missing into durable artifacts that live in the repo and
survive every session.

It comes down to four components, plus a way to keep them from rotting:

Component	Job	Typical files
Instruction document	Tell the agent how to behave	`AGENTS.md`, `CLAUDE.md`
Architecture constraints	Block invalid structure before merge	linters, type checks, import rules
Feedback loops	Correct behavior fast	tests, CI, pre-commit, examples
Knowledge store	Preserve decisions and dead ends	`docs/decisions`, `docs/failures`

And the part most people skip: garbage collection — drift checks that fail
when docs reference missing files, when temp files sneak in, when the structure
wanders away from what you agreed to.

The operating principle that ties it together:

Every recurring agent failure should become at least one durable artifact — a
clearer instruction, an automated constraint, a test or CI check, a decision
or failure record, or a drift check.

You're not trying to make the agent perfect. You're trying to make the project
easier to understand and harder to damage.

The build

I packaged this into a starter kit. Design constraints I gave myself:

Tool-agnostic. It's prompt-first. You hand any agent the repo URL, it reads the kit, and adapts the pattern to your stack. Not locked to one vendor's agent.
Boring on purpose. MIT licensed, standard-library Python, plain Markdown. No framework to install, nothing to audit for an afternoon.
Conservative. It inspects the target repo first and adds only the missing pieces. It never blindly overwrites your files.

Because I kept tripping over English-only docs myself, I wrote the README in
four languages — English, Korean, Japanese, and Chinese. If you've ever bounced
off a great tool because its docs assumed your first language, you know why that
mattered to me.

A drift check is deliberately tiny — small enough to read in one sitting:

# scripts/check_docs_drift.py (excerpt)
# Fails when a doc links to a path that doesn't exist,
# so your README can't quietly rot.
for doc, reference_value in missing_paths:
    print(f"Missing referenced path in {doc.relative_to(root)}: {reference_value}")
return 1 if missing_paths else 0

The instruction file (AGENTS.md) is just as plain — project overview,
directory rules, exact commands, forbidden actions, PR behavior. Nothing magic.
The value isn't in any one file; it's in having all four pillars present and
enforceable.

The dogfood: building a Django app through the harness

A methodology you only ever describe is a blog post. I wanted to know if it
actually held up, so I took an empty folder and built a small Django app —
a board with post CRUD, ownership permissions, admin user management, search,
pagination, comments — entirely through the harness workflow.

What accumulated in the repo as I went:

8 decision records in docs/decisions/ — why generic-first, why this Django layout, why post-ownership permissions work the way they do.
CI running the same harness checks GitHub Actions runs on every push.
.harness/source.json pinning the exact kit commit the repo absorbed, so "which version of the methodology is this repo on?" has a real answer.

The features weren't the point. The point was that every behavior change shipped
with its verification and its durable memory, automatically, because the
harness made that the path of least resistance.

The moment it bit me — and why that's the best part

Here's the beat I didn't plan for.

I added CI. It immediately went red. The failure was inside my own drift
checker:

Missing referenced path in docs/decisions/0002-initialize-django-config-project.md: .\.venv\Scripts\python.exe
Missing referenced path in docs/harness/adoption-report.md: .\.venv\Scripts\python.exe

My docs documented a Windows virtual-env command, .\.venv\Scripts\python.exe.
My drift checker saw the backslashes and "helpfully" decided it was a file
reference — then checked whether that file existed. It existed on my Windows
machine, so it passed locally. It did not exist in Linux CI. Green on my
laptop, red in the cloud. The classic.

The tool I built to catch drift had drifted.

But this is exactly what the methodology is for. I didn't just patch it and
move on. I fixed the checker to recognize venv commands by executable name
instead of path existence — and then I wrote it down, in
docs/failures/0001-docs-drift-windows-venv-command.md: context, symptoms, root
cause, resolution, prevention. Now the next agent (or the next me, at 1 a.m.)
that touches drift logic reads a one-page record instead of rediscovering the
trap.

A recurring failure became a durable artifact. The principle, applied to the
tool itself.

The result (and an honest limit)

I also built a quick diagnostic, harness doctor, that scores how ready a repo
is for reliable agent collaboration across five axes. Run against the dogfooded
Django app:

Score: 83/100 (baseline evidence scan)
Grade: B+ (baseline)

Now the honest part, because dev.to readers can smell a pitch: that score is a
baseline evidence scan. It checks that durable files and command patterns
exist — it does not yet prove that adopting the harness measurably reduces
repeated agent mistakes. I have the measurement protocol (wrong-file edits,
first-pass verification, repeated-mistake counts) but not enough before/after
runs to claim a number. That's the next thing I'm working on, and it's the
subject of a follow-up post.

So: take this as a strong qualitative result — a real app, real decision and
failure memory, a tool that caught its own bug — not as a benchmark. Yet.

Takeaway

The shift that actually changed how my agents behave wasn't a cleverer prompt.
It was treating the repository as the thing you engineer:

Every recurring agent failure should become a durable artifact.

If your agent keeps making the same mistake, stop re-explaining it in chat.
Write it into the repo once, in a form the next session can't miss.

The kit is MIT-licensed and on GitHub:
https://github.com/baskduf/harness-starter-kit

To try it, open your repo with your agent and point it at that URL — it'll
inspect your stack and add only the harness pieces you're missing. Stars,
issues, and "this broke on my stack" reports are all genuinely welcome — and if
you read the Korean, Japanese, or Chinese README and something sounds off, tell
me; I maintain all four by hand.

Part 2 will be the measurement: can I show, with numbers, that a harnessed repo
makes agents repeat fewer mistakes? Follow along if that's the post you actually
want.

What stack would you try this on? Tell me in the comments — I'm collecting
real targets for the measurement study.

Harness Engineering: Stop Re-Prompting Your Coding Agent Every Session

Yuan — Tue, 26 May 2026 05:10:05 +0000

Every time I started a new agent session, I was re-explaining the same things.

The architecture rules. The patterns to avoid. The decisions I'd already made. The approaches that already failed.

The agent would forget everything and I'd be back to square one.

My first instinct was to write better prompts. Longer, more detailed, more explicit. But that just made the problem worse — now I had a 200-line prompt to maintain, and the agent still forgot it all next session.

The real problem isn't the prompt. Prompts are session-scoped. They disappear.

A Different Approach: Harness Engineering

Instead of putting rules in prompts, what if we put them in the repository itself?

Prompting is temporary. Context is session-scoped. A harness is project-scoped.

The idea is simple: every time an agent makes a recurring mistake, convert it into a durable artifact in the repo instead of re-prompting.

Agent repeats a bad pattern → add a lint rule that blocks it
Agent forgets architecture decisions → write an ADR in docs/decisions/
Agent retries approaches that already failed → log them in docs/failures/
Agent ignores conventions → codify them in AGENTS.md

The repository gets smarter with every mistake. The agent reads the repo at the start of each session and picks up all the rules automatically — no prompting required.

The Five Components of a Harness

1. AGENTS.md — Instruction Document

A file the agent reads at the start of every session. Contains project overview, directory rules, forbidden patterns, test commands, and PR behavior. Think of it as a permanent briefing document.

2. Architecture Constraints

Automated rules that block invalid code before it merges. Linters, type checks, import boundaries, pre-commit hooks. If AGENTS.md says "no direct DB access from routes", add a lint rule that enforces it.

3. Feedback Loops

Signals the agent uses to self-correct. Test failures, CI failures, lint errors. A good harness gives the agent clear, actionable failure messages so it can fix its own mistakes.

4. Knowledge Store (docs/)

Durable context that survives session resets:

docs/decisions/ — why certain architectural choices were made
docs/failures/ — approaches already tried and rejected
docs/conventions/ — project-specific coding rules
docs/domain/ — business terminology and domain knowledge

5. Drift Checks

Scripts that detect when the harness itself goes stale. Is AGENTS.md referencing files that no longer exist? Are there temporary files that never got cleaned up? Drift checks catch this automatically.

harness-starter-kit

I built a starter kit that applies this pattern to any existing project.

Usage:

Clone it inside your target repo:

workspace/
└── target-repo/
    ├── harness-starter-kit/
    └── existing project files

Then give your coding agent this prompt:

Read ./harness-starter-kit first, then apply the harness engineering 
starter kit to this repository. Preserve existing architecture, tools, 
and conventions. Do not overwrite existing files without explaining why.
Finish with a short adoption report.

The agent inspects your repo, adapts to your existing tools (eslint, tsc, ruff, etc.), and installs only the missing harness files.

Key design decisions:

Non-destructive — never overwrites existing files
Tool-agnostic — works with whatever you already use
Agent-agnostic — works with Claude Code, Cursor, Copilot, etc.
Profiles available for generic, Python, and TypeScript projects

Why This Matters

Better prompting is a local fix. Harness engineering is a systemic fix.

Every recurring agent failure should become at least one durable artifact — a clearer rule, an automated constraint, a test, a decision record, or a drift check. That's the core loop.

The goal isn't to write the perfect prompt. It's to build a repository that makes the same mistakes increasingly unlikely over time.

👉 GitHub: harness-starter-kit

Have you been dealing with agents repeating the same mistakes? How are you handling it?