How I shipped a government contract demo through Claude Code

#ai #tooling #opensource #claude

I'm an engineer with a project management background. I'd never shipped production software before. A few months ago I took on a government contract that needed a working demo by a fixed date. The deadline wasn't flexible. The reviewers were paying attention. All I had was a Claude Code subscription and the time to figure out how to direct it.

I shipped. What I figured out along the way is in a public repo at github.com/NBTDx/claude-cycle-loop. This is the post about how I got there.

What I had and didn't have

What I had going for me: engineering instincts and a project manager's eye for scope. I'd watched teams ship and watched teams fail to ship. You start to see what works and what doesn't.

What I didn't have was anything resembling a developer's working knowledge. I'd never deployed a service. I'd never written a migration. I'm not sure I could have told you what one was. My understanding of testing was vague at best. My opinions about web frameworks were "I think there are several."

The constraint forced a specific question. Not "can Claude write this code?" I already knew the answer to that was yes. The real question was: can I direct Claude to ship something the reviewers will accept, without me having to learn the things I don't know?

The first loop didn't work

My first instinct was long prompts: write two paragraphs describing the feature, then ask Claude to build it end to end. That fell apart quickly. Things I hadn't asked for got built. There was no place to stop and check the plan before it became code. When something went sideways halfway through, there was no clean point to recover from.

The fix wasn't better prompts. The fix was more structure between me and the agent.

The pattern

What emerged was a five-step loop:

1. Verify the previous task didn't regress.
2. Plan the next task: turn its stub into a full spec.
3. Implement exactly what the spec says.
4. Run build-verify until both the build and verify commands exit zero.
5. Close the cycle: mark the task done and commit.
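To make the control flow explicit, here is a minimal Python sketch of one cycle. It assumes each step is a function you supply; the `CycleSteps` and `run_cycle` names are hypothetical, and this is not how the repo itself drives Claude. The cap on build-verify runs is an assumption added for this sketch, so a cycle that can't reach green halts instead of spinning.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CycleSteps:
    # Hypothetical hooks; in practice each would wrap an agent call or a shell command.
    check_regressions: Callable[[], bool]  # True if the previous task still passes
    plan: Callable[[str], str]             # turn a task stub into a full spec
    implement: Callable[[str], None]       # build exactly what the spec says
    build_verify: Callable[[], bool]       # True when both commands exit zero
    fix: Callable[[str], None]             # ask the agent to repair a failing build
    close: Callable[[str], None]           # mark the task done and commit


def run_cycle(stub: str, steps: CycleSteps, max_verify_runs: int = 3) -> str:
    # 1. Stop before planning anything if the last task regressed.
    if not steps.check_regressions():
        return "halted: regression"
    # 2. Plan: the stub becomes a spec.
    spec = steps.plan(stub)
    # 3. Implement the spec and nothing else.
    steps.implement(spec)
    # 4. Build-verify until green, bounded by max_verify_runs (an assumed cap).
    verify_runs = 1
    while not steps.build_verify():
        if verify_runs >= max_verify_runs:
            return "halted: build-verify still failing"
        steps.fix(spec)
        verify_runs += 1
    # 5. Close the cycle.
    steps.close(spec)
    return "done"
```

A regression halts the cycle before any planning happens, which keeps a broken state from compounding into the next task.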

Three things the loop needs on disk: a file with the project's rules that the agent reads at the top of every cycle, a template for what a task spec looks like, and a list of upcoming tasks. That's it.

Inside the spec format, three things turned out to matter more than I expected:

Do-NOT lines. Most scope creep is the agent doing reasonable adjacent work. Naming the adjacent work as out-of-scope kills it more reliably than just omitting it.
A manual test block. Forces the spec to translate into user-visible value. If you can't write the manual test, the spec isn't ready.
Acceptance criteria tied to build-verify commands. "Both commands exit zero" is checkable. "Make sure it works" is not.
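A spec can be checked for those three things before any code gets written. The sketch below is a hypothetical Python linter. It assumes a Markdown spec with `## Manual test` and `## Acceptance criteria` headings and Do-NOT lines containing the literal text `Do NOT`; the repo's template may be laid out differently.

```python
from typing import List, Optional

# Hypothetical headings; match them to your own spec template.
MANUAL_TEST = "## Manual test"
ACCEPTANCE = "## Acceptance criteria"


def section_body(lines: List[str], heading: str) -> Optional[List[str]]:
    """Return the non-blank lines under `heading`, or None if the heading is missing."""
    if heading not in lines:
        return None
    body = []
    for line in lines[lines.index(heading) + 1:]:
        if line.startswith("## "):  # the next section starts here
            break
        if line:
            body.append(line)
    return body


def spec_problems(spec: str) -> List[str]:
    """List the reasons a spec isn't ready; an empty list means it passes."""
    lines = [line.strip() for line in spec.splitlines()]
    problems = []
    if not any("Do NOT" in line for line in lines):
        problems.append("no Do-NOT lines: adjacent work is not ruled out")
    if not section_body(lines, MANUAL_TEST):  # missing or empty section
        problems.append("no manual test: the spec isn't ready")
    criteria = section_body(lines, ACCEPTANCE) or []
    if not any("make verify" in line for line in criteria):
        problems.append("acceptance criteria are not tied to `make verify`")
    return problems
```

Run against a spec whose only acceptance criterion is "Make sure it works," it reports all three problems.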

The cheapest thing I did paid off more than anything else: wrapping build-verify into one command (make verify). The same command runs locally, runs in CI, and runs as the agent's last step before closing a cycle.
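The wrapper itself can be tiny. This Python sketch runs each command in order and returns the first non-zero exit code, or zero if everything passes. The two entries in `VERIFY_COMMANDS` (a `compileall` pass over a hypothetical `src` directory and a `pytest` run) are placeholders for whatever your stack's build and test steps are.

```python
import subprocess
import sys
from typing import Sequence

# Placeholder commands; swap in your project's real build and test steps.
VERIFY_COMMANDS = [
    [sys.executable, "-m", "compileall", "-q", "src"],
    [sys.executable, "-m", "pytest", "-q"],
]


def verify(commands: Sequence[Sequence[str]] = VERIFY_COMMANDS) -> int:
    """Run each command in order; stop at the first failure and return its exit code."""
    for cmd in commands:
        print("$", " ".join(cmd), flush=True)  # echo so logs show what ran
        result = subprocess.run(list(cmd))
        if result.returncode != 0:
            return result.returncode
    return 0
```

A script ending in `raise SystemExit(verify())` could sit behind the `make verify` target, so the agent, CI, and you all get the same pass/fail signal.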

The supervised-vs-autonomous axis

The repo ships three variants of the same loop, at different points on a supervision gradient:

A. Supervised. I plan every task with a long-context AI and review the spec, then a single autonomous builder implements it. Highest quality on novel work, slowest cycles.
B. Dual CLI. Two claude -p calls per cycle: the first writes the spec, the second implements it. Autonomous, but the planner writes the spec to a file I can stop and inspect.
C. Single agent. One claude -p call per cycle, plan and implement in one prompt. Leanest setup, no human review.
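The difference between B and C is mostly how the work is split across `claude -p` calls. This Python sketch shows that split under stated assumptions: the file names `RULES.md` and `SPEC_TEMPLATE.md` and the prompt wording are hypothetical, and any permission or tool flags your setup needs are deliberately left out. The runner is injectable so the flow can be tried without calling Claude.

```python
import subprocess
from typing import Callable, List, Sequence

Runner = Callable[[Sequence[str]], int]


def default_runner(argv: Sequence[str]) -> int:
    return subprocess.run(list(argv)).returncode


def claude_call(prompt: str) -> List[str]:
    # `claude -p` runs one non-interactive prompt and exits.
    # Permission/tool flags are omitted on purpose; add what your setup requires.
    return ["claude", "-p", prompt]


def variant_b(task_stub: str, spec_path: str, run: Runner = default_runner) -> int:
    # Call 1: the planner writes the spec to a file and nothing else.
    planner = (f"Read RULES.md. Expand this task stub into a full spec using "
               f"SPEC_TEMPLATE.md and write it to {spec_path}. Do not write code.\n\n{task_stub}")
    code = run(claude_call(planner))
    if code != 0:
        return code
    # Checkpoint: the spec now exists at spec_path, so you can stop and inspect it here.
    # Call 2: the builder implements exactly that file.
    builder = (f"Read RULES.md. Implement exactly what {spec_path} says, "
               f"then run `make verify` until it exits zero.")
    return run(claude_call(builder))


def variant_c(task_stub: str, run: Runner = default_runner) -> int:
    # One call: plan and implement in the same prompt, no review point in between.
    prompt = ("Read RULES.md. Write a spec for this task using SPEC_TEMPLATE.md, "
              "implement exactly that spec, then run `make verify` until it exits zero.\n\n"
              + task_stub)
    return run(claude_call(prompt))
```

Keeping the planner's output on disk is what makes B inspectable; C trades that checkpoint for one fewer call.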

Early on I lived in A. Once the patterns of what I was building stabilized, I let the planner run autonomously (B). For repetitive work, like backfilling tests or applying refactors across a codebase, I collapsed plan and build into one agent (C). None of them is the answer. Pick whichever ratio of supervision to autonomy fits what you're doing on a given day.

What surprised me

The PM instincts mattered more than the engineering depth would have. You don't need to know how the code works to direct what gets built. You need to know what done looks like, what's in scope, and what's explicitly not. The agent will do exactly what you specify. It will also do exactly what you didn't tell it not to. So you specify both.

If you want it

The repo is at github.com/NBTDx/claude-cycle-loop. MIT licensed, stack-agnostic, three variants you can copy into any project.

If you have engineering thinking and PM instinct but haven't shipped production code before, this is the system that closed the gap for me. If you're already a developer, Variant C is probably the most useful. It's the leanest setup and runs unattended.

I'm one person on one project. This is what worked, not a methodology I'm sure generalizes. Don't run it on a repo you can't git reset --hard. Otherwise, have at it.