Bjørn Nordlund

Your agent writes code fast. But who challenges the decisions?

I've been thinking about this for a while. Maybe someone already solved it better than me, but here's the problem as I see it.

Your coding agent builds a feature in 20 minutes. Tests pass, code is clean, PR looks professional. You approve it. Three months later nobody uses it, or it works but doesn't actually solve the problem. Nice code, bad solution.

You're sitting alone with your agent

I don't think code reviews catch this. A reviewer sees a diff, maybe a PR description. By the time code is in a PR there's already an implicit agreement that the thing should exist, so nobody questions the premise. Reviews are for correctness, not for "are we building the right thing?"

But with agents it got worse. You're sitting there with Cursor or whatever, building fast, and there's nobody pushing back. No PM going "wait, that's not what the customer asked for." No teammate saying "didn't we agree on something different last week?" It's just you and the agent, and the agent does what you tell it to. The specs live in Jira or someone's head. You're not pulling up acceptance criteria while you code, so features drift and nobody notices until it's shipped.

Specs next to your code

Keeping specs close to the code isn't a new idea. Spec-driven development has been gaining traction with AI agents, and executable specs and living documentation have been around for ages. Whether you vibe code or work more structured, specs first or code first, it doesn't matter much. The point is to have something that says what you're building and why.

I've been trying a simple version of this. A few markdown files per feature in a specs/ folder — who it's for, why it exists, what counts as done. Your agent maintains them as part of normal coding, reads the spec before making changes, updates acceptance criteria when scope shifts. Works in Cursor, Claude Code, Windsurf, Codex.
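To make that concrete, here's a rough sketch of what one of those files could look like. The feature, filenames, and section headings are hypothetical — my point is only that each spec answers who, why, and done:

```markdown
<!-- specs/dashboard-performance/spec.md — hypothetical example -->
# Dashboard performance

## Who it's for
Account managers who open the dashboard dozens of times a day.

## Why it exists
A customer reported the overview page takes 8+ seconds to load.

## What counts as done
- [ ] Overview page loads in under 2 seconds for typical accounts
- [ ] Root cause is documented (query profile, not guesswork)
- [ ] No regression on other dashboard pages
```

Because it's plain markdown in the repo, the agent can read it before touching code and update the checklist when scope shifts, same as any other file.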

```shell
npx skills add bjornno/skills --skill storyline
```

Getting people to confront the hard questions

Specs without someone challenging them are just documentation though. What I actually wanted to solve is how to get people to confront the hard questions before you've shipped.

The skill has a review mode. Say "start review" and the agent reads your specs and diff, finds the most important risk, and makes you take a position on it. Not a list of suggestions, one question.

This works best when it's not just you. Get the team around a screen, pull in the PM, the designer, whoever has context on what users actually need. Start the review in your agent, share your screen, and let it drive the conversation. The agent finds the question, the humans answer it.

Say a user reports the dashboard is slow. The spec says "improve dashboard load time." Three days later your agent has built a Redis caching layer, a denormalized read model, and a background job system:

Premature implementation

The spec says "improve load time" but there's no evidence the root cause was identified. Three days of infrastructure work before anyone checked if it's just one unindexed query.

Are you comfortable investing in this architecture without profiling the actual bottleneck first?

That question is way more valuable when the senior dev is in the room going "wait, did anyone run EXPLAIN on that query?" Or when the PM says "the user only complained about one page, not the whole dashboard." Now there's a decision on record in specs/reviews/ instead of a Slack thread nobody will find again.
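The recorded decision can be tiny. Something like this — again a hypothetical sketch, adapt the format to your team:

```markdown
<!-- specs/reviews/dashboard-caching.md — hypothetical example -->
# Review: dashboard caching layer

**Question raised:** Are we comfortable investing in this architecture
without profiling the actual bottleneck first?

**Decision:** Pause the Redis and read-model work. Run EXPLAIN on the
overview query first; revisit caching only if indexing doesn't fix it.

**Present:** PM, senior dev, me
```

Six months later, when someone asks "why is there no caching layer here?", the answer is in the repo next to the code it's about.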

Hosted reviews

Sometimes you can't get everyone around a screen. Different timezones, or you just want a review on every PR automatically. I built a hosted version for that — same review logic but as a shared platform with a GitHub Action. Still in closed beta while I figure out the model for it. But start with the skill, the hosted thing is extra.

Might already exist

If you know of something that does this better, tell me. I've looked around and haven't found the exact combination of "agent maintains specs and uses them to challenge decisions" but I might have missed it. And if the reviews feel generic or useless, I want to hear that too.


Install: npx skills add bjornno/skills --skill storyline | GitHub
Hosted Reviews: storyline-review.vercel.app
