Bjørn Nordlund

Your agent writes code fast. But who challenges the decisions?

I've been thinking about this for a while. Maybe someone already solved it better than me, but here's the problem as I see it.

Your coding agent builds a feature in 20 minutes. Tests pass, code is clean, PR looks professional. You approve it. Three months later nobody uses it, or it works but doesn't actually solve the problem. Nice code, bad solution.

You're sitting alone with your agent

I don't think code reviews catch this. A reviewer sees a diff, maybe a PR description. By the time code is in a PR there's already an implicit agreement that the thing should exist, so nobody questions the premise. Reviews are for correctness, not for "are we building the right thing?"

But with agents it got worse. You're sitting there with Cursor or whatever, building fast, and there's nobody pushing back. No PM going "wait, that's not what the customer asked for." No teammate saying "didn't we agree on something different last week?" It's just you and the agent, and the agent does what you tell it to. The specs live in Jira or someone's head. You're not pulling up acceptance criteria while you code, so features drift and nobody notices until it's shipped.

Specs next to your code

Keeping specs close to the code isn't a new idea. Spec-driven development has been gaining traction with AI agents, and executable specs and living documentation have been around for ages. Whether you vibe code or work more structured, specs first or code first, it doesn't matter much. The point is to have something that says what you're building and why.

I've been trying a simple version of this. A few markdown files per feature in a specs/ folder — who it's for, why it exists, what counts as done. Your agent maintains them as part of normal coding, reads the spec before making changes, updates acceptance criteria when scope shifts. Works in Cursor, Claude Code, Windsurf, Codex.
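To make that concrete, here's a rough sketch of what one of those files could look like. The feature, filenames, and section headings are hypothetical — my point is only that each spec answers who, why, and done:

```markdown
<!-- specs/dashboard-performance/spec.md — hypothetical example -->
# Dashboard performance

## Who it's for
Account managers who open the dashboard dozens of times a day.

## Why it exists
A customer reported the overview page takes 8+ seconds to load.

## What counts as done
- [ ] Overview page loads in under 2 seconds for typical accounts
- [ ] Root cause is documented (query profile, not guesswork)
- [ ] No regression on other dashboard pages
```

Because it's plain markdown in the repo, the agent can read it before touching code and update the checklist when scope shifts, same as any other file.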

```shell
npx skills add bjornno/skills --skill storyline
```

Getting people to confront the hard questions

Specs without someone challenging them are just documentation though. What I actually wanted to solve is how to get people to confront the hard questions before you've shipped.

The skill has a review mode. Say "start review" and the agent reads your specs and diff, finds the most important risk, and makes you take a position on it. Not a list of suggestions, one question.

This works best when it's not just you. Get the team around a screen, pull in the PM, the designer, whoever has context on what users actually need. Start the review in your agent, share your screen, and let it drive the conversation. The agent finds the question, the humans answer it.

Say a user reports the dashboard is slow. The spec says "improve dashboard load time." Three days later your agent has built a Redis caching layer, a denormalized read model, and a background job system:

Premature implementation

The spec says "improve load time" but there's no evidence the root cause was identified. Three days of infrastructure work before anyone checked if it's just one unindexed query.

Are you comfortable investing in this architecture without profiling the actual bottleneck first?

That question is way more valuable when the senior dev is in the room going "wait, did anyone run EXPLAIN on that query?" Or when the PM says "the user only complained about one page, not the whole dashboard." Now there's a decision on record in specs/reviews/ instead of a Slack thread nobody will find again.
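The recorded decision can be tiny. Something like this — again a hypothetical sketch, adapt the format to your team:

```markdown
<!-- specs/reviews/dashboard-caching.md — hypothetical example -->
# Review: dashboard caching layer

**Question raised:** Are we comfortable investing in this architecture
without profiling the actual bottleneck first?

**Decision:** Pause the Redis and read-model work. Run EXPLAIN on the
overview query first; revisit caching only if indexing doesn't fix it.

**Present:** PM, senior dev, me
```

Six months later, when someone asks "why is there no caching layer here?", the answer is in the repo next to the code it's about.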

Hosted reviews

Sometimes you can't get everyone around a screen. Different timezones, or you just want a review on every PR automatically. I built a hosted version for that — same review logic but as a shared platform with a GitHub Action. Still in closed beta while I figure out the model for it. But start with the skill, the hosted thing is extra.

Might already exist

If you know of something that does this better, tell me. I've looked around and haven't found the exact combination of "agent maintains specs and uses them to challenge decisions" but I might have missed it. And if the reviews feel generic or useless, I want to hear that too.


Install: npx skills add bjornno/skills --skill storyline | GitHub
Hosted Reviews: storyline-review.vercel.app
