I Built a Harness for My AI Agent. Then I Realized I Needed One for Myself.

#ai #productivity #webdev #career

My agent shipped 12 clean PRs in a weekend.

The product still isn't live three weeks later.

The harness around the model worked. The harness around me didn't exist. And I think that's the actual story of agentic coding in 2026 — not that the model can't build, but that nothing is holding you to finishing what it builds.

What a harness actually is

If you've done serious agentic coding, you already know a model on its own gets very little done. A raw LLM is a horse with no harness: wild power, pointed nowhere. The harness is everything you wrap around it: scoped tasks, clean context, spec-driven loops, and the discipline to keep it inside those limits. That's what turns a model into an agent that actually does work.

The context engineering framing that Andrej Karpathy helped popularize names the failure mode: your agent doesn't get dumb because the model is bad. It gets dumb because you feed it too much, too old, or the wrong thing to read. So you engineer the context. You curate what it sees. You stay in control instead of vibing.
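Here is a minimal sketch of that curation step, to make "too much, too old, or the wrong thing" concrete. Everything in it is an assumption for illustration: the `ContextItem` fields, the thresholds, and the idea that you already have a relevance score from somewhere. Real harnesses do this in many different ways.

```python
from dataclasses import dataclass


@dataclass
class ContextItem:
    text: str
    age_hours: float   # how stale the item is
    relevance: float   # 0.0 to 1.0, scored however you like (hypothetical)
    tokens: int        # rough token count


def curate_context(items, max_age_hours=24.0, min_relevance=0.5, token_budget=4000):
    """Drop stale and off-topic items, then keep the most relevant ones that fit."""
    # "Too old": discard anything past the age limit.
    fresh = [i for i in items if i.age_hours <= max_age_hours]
    # "The wrong thing": discard anything below the relevance floor.
    on_topic = [i for i in fresh if i.relevance >= min_relevance]
    on_topic.sort(key=lambda i: i.relevance, reverse=True)

    # "Too much": fill the budget greedily, most relevant first.
    kept, used = [], 0
    for item in on_topic:
        if used + item.tokens <= token_budget:
            kept.append(item)
            used += item.tokens
    return kept
```

The three filters map one-to-one onto the three failure modes, which is the only point of the sketch. The numbers are placeholders you would tune per project.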

This is real craft, and it works. I got good at it. I can keep an agent on rails for hours.

Here's what nobody told me: I built all of that discipline for the machine, and exactly none of it for myself.

The harness that doesn't exist

The model has a context window I carefully manage. I have a context window too, and on Day 4 of a side project it fills up with my day job, a flaky deploy, and a vague sense that "this idea wasn't that good anyway." Nobody scopes that down. Nobody notices when I go quiet.

The agent has a loop that checks its own output. I have no loop. When I stop, nothing fires. The PRs just sit there, 90% of a product, on a branch the world will never see.
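For contrast, the agent-side loop can be sketched in a few lines. This is an illustration under assumptions, not any particular framework's API: `generate` and `check` are hypothetical callables standing in for a model call and a verifier such as a test run or a spec comparison.

```python
def run_with_checks(generate, check, max_attempts=3):
    """Call generate(feedback) until check(output) passes or attempts run out.

    generate: hypothetical callable taking the previous feedback (or None).
    check: hypothetical callable returning (passed, feedback).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    feedback = None
    for attempt in range(1, max_attempts + 1):
        output = generate(feedback)
        ok, feedback = check(output)
        if ok:
            return {"status": "passed", "attempts": attempt, "output": output}

    # Even on failure something fires: the loop reports instead of going quiet.
    return {"status": "escalate", "attempts": max_attempts, "output": output}
```

The detail that matters is the last line. When the agent stalls, the loop still returns a result that someone has to look at. Nothing equivalent exists on the human side.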

This is the part of software that AI made worse, not better. In a 2025 randomized trial, METR found that experienced open-source developers took 19% longer on real tasks in their own repositories when they were allowed to use AI tools (arxiv.org/abs/2507.09089). That study measures task time, not why projects die, so what follows is my read rather than its finding: AI eats the easy, satisfying parts and leaves you alone at the hard, boring, project-killing part. The agent will give you the plan and the diff. It will not notice when you skip Day 4.

The build getting easier is the trap

Karpathy also called this the decade of agents, not the year — we're closer to the start of that curve than the end of it. And here's the uncomfortable second-order effect: as agentic tools get better, the build collapses to almost nothing. Ninety percent of a product in a weekend is now normal.

Which means the bottleneck didn't disappear. It moved one layer up.

When building was the hard part, "not finished" meant "not built." Now the thing is built, it runs on your machine, and it still isn't shipped. Faster building was supposed to mean more shipping. For me it just produced a bigger graveyard of almost-done: more half-finished repos, not fewer. I have no data on this beyond my own repos, but my bet is that the completion gap is widening, and widening fastest for the people who got good at agentic coding.

What a harness for a human is — and isn't

So what do you wrap around yourself?

Not a dashboard. A dashboard shows you activity; it doesn't notice your absence. A streak counter is just a nicer way to watch yourself quit. A community isn't it either: an audience claps when you post and says nothing when you disappear for two weeks. Neither one is accountability.

The thing that actually works is embarrassingly old: someone who reads. Not software that tracks you — a human who notices when you go quiet and names the move you're avoiding. AI tracks. A human reads. The whole point of a harness is that it acts on the thing it's wrapped around the moment it drifts. For the model, that's context engineering. For you, that's another person with a reason to expect the next step.

You already accept this for your agent. You'd never ship an autonomous loop with no checks and just hope it stays on task. You do exactly that to yourself every time you start a side project alone.

The honest next step

If your projects keep dying at the same point — and it's usually the same point, your point — the useful thing isn't more motivation. It's figuring out where your loop actually breaks.

I built a free diagnostic for exactly that. Seven questions, about two minutes, no signup to see the result. It names where you stop and the one move that breaks it.

You can engineer a perfect harness for the model and still ship nothing. Find out where your own loop breaks: mvpbuilder.io/ship-readiness

Building in public. Day 125.