Ruben

Posted on Jun 23

Coding Agents Made Me Take Specs Seriously

#productivity #ai #opensource

Specs as survival tools for fragmented work

Everything changed very quickly.

In a matter of days, my work at DataCamp went from coding and debugging mostly by hand to doing a lot of it through Claude Code. Those long sessions where implementation felt manual, almost artistic, did not disappear completely, but they changed.

That shift was gradual, then suddenly obvious.

With Leo's arrival, a lot of things changed in my personal life too. If you do not know what this is about, take a look at the first post in the series. Handling sudden context switches became a skill I had to develop by force. With a baby at home, uninterrupted work sessions do not exist. There is always a diaper change, a bottle, a bath, or some playtime in the middle.

And energy, of course, is not always the same.

Luckily, Leo sleeps pretty well, but just as during the day, the time I can actually use comes in fragments. I can no longer count on having several hours in a row to get into a problem, understand it, implement it, test it, and close it. Now I need the work to survive constant interruptions.

That is where specs stopped feeling like a formality and became a survival tool.

The useful part, if you do not care about my exact setup, is this: agents get much better when the work has already been turned into something testable. Not a perfect document. Not a corporate requirements process. Just enough shape that the agent does not have to guess the product, the constraints, and the finish line at the same time.

Using the moment

This is where agents have allowed me to keep working on my side projects with some continuity. Over time, what started as a collection of prompts and personal habits became Harness: my own development workflow packaged as skills, documents, hooks, subagents, and small guardrails.

It did not start as a product. It started because simply sitting in front of the computer and writing code was no longer enough.

In practice, Harness is a collection of Claude Code skills: slash commands that know which files to read, what questions to ask, and what documents to produce. Each one can run without you sitting there; you come back to the output, review it, and feed the next step. The repo is public if you want to look at how it is structured.

Everything starts with an idea. It usually comes from a need of my own, either professional (a tool I miss as a developer) or personal (a tool that could help me in my day-to-day life).

At first, this step was manual, with some Claude Code help to settle the idea. Now the flow starts with /ideate, which researches competitors, pain signals, and viability before I fall too much in love with it. If it still makes sense, /product-plan turns the vision into audience, positioning, roadmap, UX direction, and risks.

The idea is simple: instead of jumping from a vague intuition straight into implementation, Harness forces me to explain what I want to build, who it is for, why it makes sense, and what should stay out.

The output lives in .harness/product/: idea.md, product.md, roadmap.md, competitors.md, and a CONTEXT.md with the domain vocabulary. That last file sounds small, but it matters. If a concept has a name in the product, I want the agent to use that name in specs, code, and docs.
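To make that vocabulary rule enforceable rather than aspirational, a hook could scan specs and docs for words that drift from the glossary. A minimal sketch, assuming a hypothetical mapping from each canonical term in CONTEXT.md to synonyms you do not want to see. The post does not describe CONTEXT.md's actual format, and the terms below are illustrative.

```python
import re

# Hypothetical glossary derived from CONTEXT.md: canonical term -> synonyms to avoid.
# The real CONTEXT.md format is not shown in the post; this mapping is an assumption.
GLOSSARY: dict[str, list[str]] = {
    "pack": ["bundle", "kit"],
    "hardware profile": ["device preset"],
}


def off_vocabulary_terms(text: str, glossary: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Return (synonym found, canonical term) pairs for discouraged words in text."""
    findings: list[tuple[str, str]] = []
    for canonical, synonyms in glossary.items():
        for synonym in synonyms:
            # Whole-word, case-insensitive match; the optional "s" also catches plurals.
            pattern = r"\b" + re.escape(synonym) + r"s?\b"
            if re.search(pattern, text, flags=re.IGNORECASE):
                findings.append((synonym, canonical))
    return findings


draft = "Export the bundle using the selected device preset."
for found, canonical in off_vocabulary_terms(draft, GLOSSARY):
    print(f"Use '{canonical}' instead of '{found}'")
```

The word-boundary match keeps short synonyms like "kit" from firing inside unrelated words such as "kitchen".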

The next step is /dev-plan. Product specs become architecture, stack decisions, an implementation plan, ADRs, and one feature spec per must-have feature. Each feature file includes the goal, scope, technical approach, edge cases, and acceptance criteria.
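Because every feature file has the same five parts, a missing or empty part is easy to catch before an agent starts. A minimal sketch, assuming each part is a level-2 Markdown heading with the names below. The post lists the parts but not the exact headings Harness writes, so treat `REQUIRED_SECTIONS` and `missing_sections` as hypothetical.

```python
import re

# Assumed heading names for a feature spec; adjust them to your own template.
REQUIRED_SECTIONS = ["Goal", "Scope", "Technical approach", "Edge cases", "Acceptance criteria"]


def missing_sections(spec_markdown: str) -> list[str]:
    """Return required sections that are absent or empty in a feature spec."""
    bodies: dict[str, str] = {}
    current = None
    for line in spec_markdown.splitlines():
        heading = re.match(r"^##\s+(.+?)\s*$", line)  # level-2 headings only
        if heading:
            current = heading.group(1).lower()
            bodies[current] = ""
        elif current is not None:
            bodies[current] += line.strip()
    # A section counts as missing if it never appeared or has no text under it.
    return [name for name in REQUIRED_SECTIONS if not bodies.get(name.lower())]


example_spec = """# Export flow
## Goal
Export SP-404 packs.
## Scope
One pack at a time.
## Edge cases
## Acceptance criteria
- Invalid packs show a clear error before any files are written.
"""
print(missing_sections(example_spec))
```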

So, before writing a single line of code, the project already has a clear shape: what we are building, who it is for, which decisions constrain the work, and how we will know when a feature is done. It is not always perfect, but it is good enough for an agent to start with context.

The difference is small but important. As a very simplified example, here is a loose prompt:

Build the export flow and make sure it works well.

And this is the kind of spec that gives the agent a chance:

Build the export flow for SP-404 packs.

Constraints:
- A pack has at most 16 samples.
- Files must use the hardware profile naming convention.
- Export should not mutate the source samples.

Acceptance criteria:
- I can choose a hardware profile before exporting.
- The exported folder contains the expected file names.
- Invalid packs show a clear error before any files are written.

It is still not a huge document. But now there is something to verify, something to reject, and something to review.
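Those acceptance criteria map almost one-to-one onto checks. Below is a minimal sketch, not the actual project code: `Sample`, `HardwareProfile`, `export_pack`, and the `NN_name.wav` naming rule are hypothetical placeholders, since the post does not spell out the real hardware naming convention.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable

MAX_SAMPLES_PER_PACK = 16  # Constraint from the spec above.


@dataclass(frozen=True)
class Sample:
    name: str
    data: bytes  # bytes are immutable, so exporting cannot alter the source audio


@dataclass(frozen=True)
class HardwareProfile:
    name: str
    # Hypothetical naming rule; the real convention depends on the hardware profile.
    file_name: Callable[[int, Sample], str]


class InvalidPackError(ValueError):
    """Raised before anything is written to disk."""


def export_pack(samples: list[Sample], profile: HardwareProfile, out_dir: Path) -> list[Path]:
    # Validate first: invalid packs must fail before any files are written.
    if len(samples) > MAX_SAMPLES_PER_PACK:
        raise InvalidPackError(
            f"This pack has {len(samples)} samples; the limit is {MAX_SAMPLES_PER_PACK}."
        )
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for index, sample in enumerate(samples, start=1):
        path = out_dir / profile.file_name(index, sample)
        path.write_bytes(sample.data)  # writes a copy; the Sample objects are untouched
        written.append(path)
    return written


# Hypothetical profile: zero-padded slot number plus sample name.
EXAMPLE_PROFILE = HardwareProfile("example", lambda i, s: f"{i:02d}_{s.name}.wav")

# Acceptance-style checks in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    pack = [Sample("kick", b"\x00\x01"), Sample("snare", b"\x02\x03")]
    snapshot = list(pack)
    exported = [p.name for p in export_pack(pack, EXAMPLE_PROFILE, Path(tmp) / "ok")]
    unchanged = pack == snapshot

    rejected_dir = Path(tmp) / "rejected"
    try:
        export_pack([Sample(f"s{i}", b"") for i in range(17)], EXAMPLE_PROFILE, rejected_dir)
        rejected = False
    except InvalidPackError:
        rejected = True
    nothing_written = not rejected_dir.exists()

print(exported, unchanged, rejected, nothing_written)
```

Making the profile a required argument covers "choose a hardware profile before exporting", and validating before the `mkdir` call is what makes "no files written for invalid packs" checkable at all.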

The whole planning phase usually takes me one or two naps. I would rather settle the concepts properly in a long conversation than change strategy halfway through implementation. When time is this fragmented, improvising gets expensive.

Once I have those documents, the next step begins: implementation.

Learning to delegate

For implementation, I have tried several strategies and skills. The ones that stuck are the ones now built into Harness.

/implement reads the specs, classifies which features can be done AFK and which ones need human approval, and implements the current phase as vertical slices. That part matters: it does not send one agent to do "the frontend" and another to do "the backend." Each agent implements one feature end to end, from the user-facing entry point down to whatever data layer it needs.
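The post does not say how /implement decides what is safe to run AFK, so here is a sketch of the idea under an invented rule: a feature is held for approval when it touches an area on a hypothetical sensitive list, and every other feature becomes one vertical-slice task owned by a single agent. Names like `NEEDS_APPROVAL`, `SliceTask`, and `plan_phase` are illustrative only.

```python
from dataclasses import dataclass, field

# Hypothetical rule: features touching these areas wait for a human; everything else is AFK.
NEEDS_APPROVAL = {"auth", "payments", "data-migration"}


@dataclass
class Feature:
    name: str
    touches: set[str] = field(default_factory=set)


@dataclass
class SliceTask:
    """One agent, one feature, end to end (never "the frontend" or "the backend")."""
    feature: str
    layers: tuple[str, ...] = ("entry point", "domain logic", "data layer")


def plan_phase(features: list[Feature]) -> tuple[list[SliceTask], list[str]]:
    """Split a phase into AFK slice tasks and feature names held for approval."""
    afk: list[SliceTask] = []
    held: list[str] = []
    for feature in features:
        if feature.touches & NEEDS_APPROVAL:
            held.append(feature.name)
        else:
            afk.append(SliceTask(feature.name))
    return afk, held


afk, held = plan_phase([
    Feature("export-pack", {"filesystem"}),
    Feature("account-login", {"auth"}),
])
print([task.feature for task in afk], held)
```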

/qa comes after that and tests those features against their acceptance criteria. It does not stop at running the test suite. For an API it can use curl, for a CLI it can run commands with fixtures, for a web app it can use Playwright. The point is to verify visible behavior.
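For the CLI case, checking visible behavior can be as simple as running the command against a fixture and asserting on its exit code and output. This is a hypothetical helper, not the code behind /qa. The API and web cases would use curl or Playwright instead, and the stand-in command below is just a Python one-liner so the sketch runs anywhere.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class CliCheck:
    criterion: str            # the acceptance criterion, in the spec's own words
    argv: list[str]           # command to run
    stdin: str = ""           # fixture input
    expect_in_stdout: str = ""
    expect_exit_code: int = 0


def run_cli_check(check: CliCheck) -> tuple[bool, str]:
    """Run one command against its fixture and judge only what a user would see."""
    result = subprocess.run(
        check.argv, input=check.stdin, capture_output=True, text=True, timeout=30
    )
    if result.returncode != check.expect_exit_code:
        return False, f"exit code {result.returncode}, expected {check.expect_exit_code}"
    if check.expect_in_stdout not in result.stdout:
        return False, f"stdout is missing {check.expect_in_stdout!r}"
    return True, "ok"


# Stand-in CLI: a Python one-liner that upper-cases stdin.
upper_check = CliCheck(
    criterion="The tool prints the pack name in upper case.",  # hypothetical criterion
    argv=[sys.executable, "-c", "import sys; print(sys.stdin.read().strip().upper())"],
    stdin="my pack\n",
    expect_in_stdout="MY PACK",
)
print(upper_check.criterion, "->", run_cli_check(upper_check))
```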

There are also continuity skills: /update-docs syncs docs with the real state of the code, /next-step inspects the repo and recommends what should happen next, /handoff leaves a summary so the next session does not start blind, and /task handles small changes on shipped products.
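A handoff note works best when it is short and points to the real files instead of re-pasting them. A minimal sketch, assuming a hypothetical Markdown layout. The section for decisions that have not reached an ADR yet mirrors what the author describes in the comments below, but the structure, the `Handoff` class, and the path are illustrative.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Handoff:
    done: list[str]
    next_step: str
    read_first: list[str]  # paths to real specs/ADRs; point at them, do not re-paste them
    decisions_not_in_adr: list[str] = field(default_factory=list)

    def render(self, day: date) -> str:
        lines = [f"# Handoff {day.isoformat()}", "", "## Done"]
        lines += [f"- {item}" for item in self.done]
        lines += ["", "## Next step", self.next_step, "", "## Read first"]
        lines += [f"- {path}" for path in self.read_first]
        if self.decisions_not_in_adr:
            # Flag these loudly so they get promoted to an ADR instead of aging out.
            lines += ["", "## Decisions not in an ADR yet"]
            lines += [f"- [ ] {decision}" for decision in self.decisions_not_in_adr]
        return "\n".join(lines) + "\n"


note = Handoff(
    done=["Pack size validation"],
    next_step="Add the hardware profile picker.",
    read_first=[".harness/features/export.md"],  # hypothetical path
    decisions_not_in_adr=["Reject oversized packs instead of truncating them."],
)
print(note.render(date.today()))
```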

Git worktrees are still the main piece for parallelization. Several agents can work on the same repo, on the same machine, in different branches, without stepping on each other.
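The worktree part is plain Git. A small sketch of the per-task setup, assuming one new branch and one sibling directory per roadmap item. The `agent/` branch prefix and the directory naming are hypothetical choices. The sketch builds the `git worktree add -b <branch> <path>` command and only runs it when asked.

```python
import re
import subprocess
from pathlib import Path


def slugify(text: str) -> str:
    """Lower-case, hyphen-separated name that is safe for branches and folders."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def worktree_command(repo: Path, roadmap_item: str) -> list[str]:
    """Build a `git worktree add` call: a new branch plus a sibling directory for one task."""
    slug = slugify(roadmap_item)
    branch = f"agent/{slug}"
    path = repo.parent / f"{repo.name}-{slug}"
    return ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(path)]


def create_worktree(repo: Path, roadmap_item: str) -> None:
    """Actually create the worktree; raises CalledProcessError if git fails."""
    subprocess.run(worktree_command(repo, roadmap_item), check=True)


print(" ".join(worktree_command(Path("/code/sampler"), "Export flow for SP-404 packs")))
```

Each agent session then starts inside its own worktree directory, so two sessions never share a checkout.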

Magic.

So, once the tools are on the table, the only thing left is to put the agents to work. I choose two projects and two roadmap items, open a terminal session and a worktree for each one, and start them before Leo wakes up.

The important part here is not having many tools. The important part is that each tool receives something clearer than a loose prompt written in a hurry. It gets context, goals, constraints, acceptance criteria, and a reasonable idea of when to stop.

That, for me, is the big difference between asking an agent for something and working with specs.

You can copy the shape without copying Harness: write down the problem, the constraints, the user flow, the non-goals, the acceptance criteria, and the weird edge cases you already know about. That alone removes a lot of the ambiguity that agents otherwise fill with confidence.
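Even without any tooling, a tiny template helps, because empty sections stay visible instead of silently disappearing. A minimal sketch: `SpecDraft` is a hypothetical helper whose fields mirror the list above, and its plain-text output follows the layout of the SP-404 example spec.

```python
from dataclasses import dataclass, field

# Section labels mirror the shape described above.
SECTIONS = [
    ("Problem", "problem"),
    ("Constraints", "constraints"),
    ("User flow", "user_flow"),
    ("Non-goals", "non_goals"),
    ("Acceptance criteria", "acceptance_criteria"),
    ("Edge cases", "edge_cases"),
]


@dataclass
class SpecDraft:
    title: str
    problem: str = ""
    constraints: list[str] = field(default_factory=list)
    user_flow: list[str] = field(default_factory=list)
    non_goals: list[str] = field(default_factory=list)
    acceptance_criteria: list[str] = field(default_factory=list)
    edge_cases: list[str] = field(default_factory=list)

    def render(self) -> str:
        out = [self.title, ""]
        for label, attr in SECTIONS:
            value = getattr(self, attr)
            out.append(f"{label}:")
            if not value:
                out.append("- TODO")  # keep gaps visible instead of letting the agent guess
            elif isinstance(value, str):
                out.append(value)
            else:
                out.extend(f"- {item}" for item in value)
            out.append("")
        return "\n".join(out).rstrip() + "\n"


export_spec = SpecDraft(
    title="Build the export flow for SP-404 packs.",
    constraints=["A pack has at most 16 samples.", "Export should not mutate the source samples."],
    acceptance_criteria=["Invalid packs show a clear error before any files are written."],
)
print(export_spec.render())
```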

The afternoon nap

During the first nap of the afternoon, it is time to review the code generated by the agents.

I run the code review in each worktree. If it finds something, I review it manually, test it, and if there is something I do not like, I guide the agent to fix it. When everything is in order, I merge directly into main, no PRs or anything, full YOLO style. They are personal projects where I am the only one working, after all.

That does not mean there is no review. I am more and more convinced that review is the central part of the work. It just does not always look like opening a PR and waiting for checks. In these projects, review is me testing the flow, reading the diff, spotting weird assumptions, and deciding whether the result still matches the original intent.

Finally, when everything is integrated, I run /update-docs so the project documents, the README, and the Harness specs do not drift away from the code.

One of the fastest ways to lose control of a project with agents is to let documentation fall behind. If the next session starts with old context, the agent will work on a reality that no longer exists.
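A rough guard against that drift is to flag docs that have not been touched since the code last changed. A minimal sketch that uses file modification times as an assumed proxy for "last changed". The `docs/` and `src/` layout is hypothetical.

```python
import os
import tempfile
from pathlib import Path


def newest_mtime(root: Path, pattern: str) -> float:
    """Latest modification time of files under root matching pattern (0.0 if none)."""
    return max((p.stat().st_mtime for p in root.rglob(pattern) if p.is_file()), default=0.0)


def stale_docs(docs: Path, code: Path, code_pattern: str = "*.py") -> list[Path]:
    """Markdown docs last touched before the newest code change: candidates for /update-docs."""
    latest_code_change = newest_mtime(code, code_pattern)
    return sorted(
        doc for doc in docs.rglob("*.md")
        if doc.is_file() and doc.stat().st_mtime < latest_code_change
    )


# Demo with throwaway files and fixed timestamps (seconds since the epoch).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "docs").mkdir()
    (root / "src").mkdir()
    files = {"docs/old.md": 1_000, "src/app.py": 2_000, "docs/new.md": 3_000}
    for rel, stamp in files.items():
        path = root / rel
        path.write_text("placeholder")
        os.utime(path, (stamp, stamp))
    stale = [p.name for p in stale_docs(root / "docs", root / "src")]

print(stale)
```

Modification times reset on a fresh checkout, so in a real repo the commit time from `git log -1 --format=%ct -- <path>` is a sturdier signal.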

And after that? Well, if it has not happened already, there is probably a diaper change waiting for me, some playtime, some drool and babbling... honestly, I am completely in love with this baby.

It does not feel like programming

This new way of approaching software engineering does not feel like programming. And that is not necessarily bad.

Code has become cheaper, faster, and more accessible, but ideas, taste, instinct, and high-level knowledge are, from my point of view, the most valuable things right now. It is very easy to generate a lot of mediocre code quickly if you are not clear about what you are trying to build.

Working this way has allowed me to move forward in a way I could not have before, bring back projects that were practically dead, and at the same time enjoy time with my little one. Of course, it does not always go well. I have had bad experiences because of vague specs, hidden context, and incorrect implementations that had to be rolled back.

But that is where I have learned the most.

Taking specs seriously has not been an aesthetic decision or a trend. It has been a way to move one level up, wear product and architecture hats, and build something solid even when I only have one nap ahead of me.

Top comments (18)

Alex Shev • Jun 23

This is the real shift with coding agents. Specs stop being ceremony and become runtime context. The better the decision trail, constraints, and acceptance criteria are, the less the agent has to infer from vibes.

Ruben • Jun 23

"Runtime context", I am stealing that. That is exactly it. The spec used to be something you wrote to feel responsible and then never opened again. Now it is the thing the agent actually reads at the moment of doing the work, so a vague line in the spec turns straight into a vague line in the code

Alex Shev • Jun 23

Please steal it. The interesting part is that specs become executable context. They are no longer just a planning artifact; they are one of the inputs that shapes the actual edit, test, and review path.

Danil • Jun 23

Runtime context is the reframe. The spec stops being ceremony you write once and becomes what the agent actually reads as it acts, so a vague line in it turns straight into vague code.

Alex Shev • Jun 23

Exactly. Runtime context is a useful way to say it because it makes the spec part of execution, not documentation. The agent is not just reading requirements; it is using them as operating memory while it edits.

UnitBuilds • Jun 23

Yip, I feel that. The one time at my job I actually got a thorough 4-page document on what the module was meant to be, we discussed it and I made the changes. Then I fed it to an agent to see how well it would do... It finished the entire 1500 LOC module in 30 min. No, actually done, not 'now find the bug' done.

Ruben • Jun 23

That is exactly the thing that flipped for me. The agent was never the bottleneck; the ambiguity was. Give it a real spec and it just goes. The hard part is that most of the time nobody writes the 4 pages, so we blame the agent for guessing.

UnitBuilds • Jun 23

Exactly. Two fatal errors: context starvation and context bloat. Both lead to hallucinations and drift. If you want something specific, be specific. Agents aren't meant to guess; they're meant to translate natural language into code.

Raffaele Zarrelli • Jun 23

The line that lands hardest here is that if the next session starts with old context, the agent works on a reality that no longer exists. That is the whole game with fragmented sessions, and you built the right reflex for it with /handoff and /update-docs. Specs make the work testable, but the half that rots fastest is not the spec, it is the decisions and the why: what you chose, what you ruled out, what is still open. A stale spec is annoying, a stale decision is the one that makes the agent confidently rebuild something you already killed.

What helped me was treating that operating state as its own file layer that the agent reads at the start of a task and writes back at the end, not a doc I have to remember to update. Decisions with a status, open questions, what changed since last session, sitting next to the specs in .harness-style folders. Your /handoff is basically that as a command; mine is the same instinct as a closing habit. I open-sourced the shape for Claude Cowork and Claude Code if you want to compare notes: cowork-os (MIT, just folders plus a write-back step).

One thing I keep wrestling with on the handoff: does yours summarize the session freely, or do you pin certain things verbatim (constraints, killed options) so the compression cannot quietly drop them? That boundary is where mine leaks the most.

Ruben • Jun 23

Good question, and you put your finger on the exact thing I was scared of. The split is by layer, not one or the other.

The handoff doc itself summarizes freely. It is disposable, it just gets the next session moving, and it points to the real files instead of re-pasting them.

The stuff I cannot let drift never lives in that summary. Constraints sit in the feature specs, and decisions plus killed options sit in ADRs. An ADR has an "Options considered" section, so the approach I rejected stays written down right next to why it lost. That is the part that keeps the agent from happily rebuilding something I already killed.

The handoff only grabs decisions from the conversation that have not made it into an ADR yet, and it drops the "I picked X because it was faster today" kind of reasoning on purpose.

So, summarize the session, pin the decisions, different files. A killed option does not survive because I quoted it in a summary, it survives because it gets its own little tombstone in the ADRs.

By the way, I've skimmed your cowork-os and it looks promising. I'll take a closer look when I have some time 😉

Raffaele Zarrelli • Jun 23

The tombstone framing is the keeper here. A killed option survives because it gets its own marker next to why it lost, not because someone remembered to quote it in a summary. That is the one thing a disposable doc should never be trusted to hold. And thanks for looking at cowork-os, no rush at all.

The place I still get bitten is the door between the two layers. Your handoff grabs decisions that have not made it into an ADR yet, so there is a window where a real constraint lives only in the disposable summary. If the next handoff re-summarizes it instead of graduating it, it can age out before it ever gets its tombstone. The durable side is solid, the promotion step is where mine leaks.

Mine leans on a closing habit to move things into decisions with a status, which is exactly where I fumble when a session ends mid-thought. So what actually triggers promotion for you: is it /handoff judgment each run, or does referencing a killed option force the ADR to exist on the spot?

Ruben • Jun 23

Honest answer: neither. /handoff is the disposable layer; it never makes the tombstone. It just carries ungraduated decisions forward and flags them as "not in an ADR yet." So the window you spotted is real: the handoff makes it loud instead of closing it.

The real trigger is at decision time: /dev-plan writes ADRs up front, /task has to stop and escalate the second it hits one not already covered. Closest thing to a forcing function, but yeah, nothing forces the ADR the instant a killed option gets referenced.

So mine leaks where yours does, just a step earlier. The fix is the thing you are hinting at: make referencing a killed option spawn the ADR on the spot. Have not built it, but it is the first idea here that kills the judgment call instead of just moving it 😉

Raffaele Zarrelli • Jun 23

Right, and I think the cost hides one step before the spawn. To force the ADR the instant a killed option gets referenced, something first has to recognize that what /task is about to do re-enters a path you already killed. That recognition is the expensive part, and it only works if the killed option left an external handle behind: the thing you chose instead, a flag, a file path, something /task can match against. If the kill lives only as prose in an ADR body, the trigger fires on recall, which is the exact reflex you were trying to stop leaning on. So the on-the-spot ADR quietly depends on killed options being addressable, not just written down. When you sketch it, does a killed option get a stable handle /task checks against, or does catching the reference still come down to the model reading the right ADR at the right moment?

Nazar Boyko • Jun 24

The choice that stood out to me is splitting the work by feature instead of by layer. A "frontend agent" and a "backend agent" both have to guess the contract where they meet, and that seam is exactly where agents drift. Giving one agent the whole slice means the spec it reads is the whole story, with nothing important hiding in another agent's head. The detail about measuring work in naps made me smile too, but that vertical slice is the part I'd steal.

Mykola Kondratiuk • Jun 29

The thing that gets me is that a vague spec was always a problem; agents just made it immediate. Before, a dev would fill gaps with judgment. Now those gaps become literal bugs.

Hiren Kava • Jun 23

I really enjoyed reading your post. What particularly impressed me was not only the AI workflow but also the way you reinterpreted specifications not merely as documentation, but as engineering tools to reduce ambiguity. It felt like you were focusing on optimizing the quality of decision-making as well as coding speed.
