Bjørn Nordlund

Code reviews catch bugs. They don't catch bad decisions.

I've seen this happen over and over.

A team picks up a feature. Some back and forth, maybe a whiteboard sketch or a Slack thread that gets too long. Someone starts building. Two weeks later, a PR goes up. Good naming, tests pass, no obvious issues. Approve. Ship!

Three months later, nobody uses it. Or worse, people use it but they're confused, because what shipped doesn't match the problem it was supposed to solve. The problem drifted somewhere along the way and nobody noticed because everyone was looking at the code.

The code was fine. The decision wasn't.

AI agents made this worse

With coding agents (Cursor, Copilot, Claude, whatever you're using), any developer can produce a ton of code in a very short time. Features that used to take a sprint now take a day.

But it means we're shipping faster than we can think. The bottleneck used to be "can we build this?" Now it's "should we build this?" And most teams don't have a process for answering that.

When an agent generates a full feature in 20 minutes, it comes out looking polished. Clean code, tests, PR looks solid. That makes it harder to push back. You feel like a jerk saying "wait, is this even the right thing?" when it's already done and looks professional. So you approve it.

Our review processes are built for a world where code was expensive to write. If someone spent two weeks writing it, it probably made sense to build. But when it took 20 minutes, that's not a given.

Code reviews can't fix this

Code reviews are great at catching bugs, enforcing patterns, sharing knowledge. But they're not designed to catch bad product decisions.

A reviewer sees a diff. Maybe a PR description. They're looking at how the code works, not why it exists. That context is somewhere else, in a planning doc or a Jira ticket or someone's head.

There's also a social thing. By the time code is in a PR, there's an implicit understanding that we've decided to build this. Questioning the whole premise feels awkward. So people don't do it.

The result: code reviews filter for correctness but let through features that solve the wrong problem or have drifted from their original intent.

The gap between specs and code

Most teams have specs somewhere. A doc, an RFC, a ticket with acceptance criteria. And separately they have code reviews. But do you look at the spec during the review? Do you check whether the acceptance criteria from three weeks ago (if they even exist) still match what the PR delivers?

Before you argue about whether a function should be async or not, the team needs to agree on more basic stuff. What problem are we solving? Has the scope changed? If nobody asks these questions in a structured way, they don't get asked.

Closing the gap with living specs

I've been trying to close this gap by building tooling around it. The approach is inspired by spec-driven development, but I'm not trying to generate code from specs or replace code with specs. I'm using specs as shared context that makes real team discussions possible.

I built two things that work together: Storyline, a Cursor skill that keeps specs alive (free, open source), and Storyline Review, a tool that uses those specs to challenge your team on product decisions (closed beta for now, since every review costs me AI tokens).

You keep lightweight specs alongside your code: a few markdown files per feature describing what it's for, who uses it, and what's in scope. When you make a change, Storyline reads those specs and your code diff together and finds the single most important question your team needs to answer.

Not a list of comments. One question. The hard one.

Example: you've been working on an auth feature. The spec says it's about simplifying login for end users. But the last three changes are all admin-facing API work. So the question becomes: "This started as login simplification, but recent work is all admin APIs. Is the scope still right?"

Your team has to take a position. Yes, No, or Unsure. Before any discussion. Then you get the evidence and talk about it. It's uncomfortable on purpose.

You can start with just specs in your repo

Keeping specs alongside your code is valuable on its own, with or without the tool. A specs/ folder with a few markdown files per feature gives your team a shared reference that lives where the code lives. Not in Confluence where nobody looks.

The Storyline skill teaches your coding agent to maintain these specs as part of every code change. It reads the spec before coding, writes acceptance criteria before implementing, and keeps things in sync. Works with Cursor, Claude Code, Copilot, Codex.

```bash
npx skills add bjornno/skills --skill storyline
```

If you have specs, you can also run:

```bash
npx storyline-review create
```

It reads your specs and the latest commit, gives you a review URL. Share it with your team, no account needed. You can also set this up as a GitHub Action on every PR.

There's a product review mode (scope creep, intent drift, weak problem definition) and an architecture review mode (security gaps, API design, missing abstractions). Both work the same: one question, take a position, discuss.

Storyline Review is in beta. Sign up on the homepage to get access.

Try it, tell me what you think

If any of this resonates, give the skill a try. Install it, let your agent write specs for a few days, and see if it changes how your team talks about features. If you want the review tool on top, sign up for the beta.

I'm genuinely curious: does something like this already exist and I missed it? Is the "one hard question" format useful or annoying? Does keeping specs in the repo actually stick, or does it rot like everything else? Let me know in the comments or reach out directly.


Sign up for beta access | Install the spec skill: npx skills add bjornno/skills --skill storyline | GitHub
