We had 26 stories in Sprint 9. 155 story points. Our validated velocity is 38 points per sprint.
That is a 4x overcommitment. We have been doing this since Sprint 0.
This post is about what happened when we finally fixed it, and what we found hiding behind the honest numbers.
The Pattern
Every sprint for 8 consecutive sprints, we planned more work than we could finish. Not by 10 or 20 percent. By 2x to 4x. The sprint would end, we would move unfinished stories to the next sprint, and the next sprint would start even more overloaded than the last.
The MCP framework has a velocity tracking tool. We ran it during the team review. It returned zero. Nobody had ever recorded velocity data. We were planning by ambition, measuring nothing, and wondering why sprints felt rushed.
What Changed
We ran a 14-person AI team review. Each persona audited the project from their expertise. The scrum master said: record the actual velocity. The product owner said: cut to what is achievable. The inception facilitator said: stop calling it a 25-person agency when it is a content tool.
So we did the uncomfortable thing. We moved 19 stories out of Sprint 9 into Sprint 10. Sprint 9 went from 155 points to 42. Seven stories. One sprint. Achievable.
The 19 stories that got moved are not deleted. They are planned, decomposed into tickets, and prioritized. But they are not pretending to fit into a sprint they cannot fit into.
What Was Hiding Behind the Overcommitment
When we stopped cramming and looked at what was actually in Sprint 9, we found something interesting. The 7 stories that survived the cut are all infrastructure:
- SSL/TLS configuration (HTTP-only in production is a blocker)
- Test coverage for 43 untested services
- Accessibility foundations (zero ARIA attributes in the entire UI)
- OpenAPI specification (219 endpoints, zero documentation)
- End-to-end pipeline test (never proven with real services)
- Sidecar deployment verification (4 Docker services never tested together)
- Sprint retrospective (already overdue)
None of these are features. All of them are things we skipped while building features. The overcommitment pattern was hiding infrastructure debt behind feature velocity. We looked productive because we shipped UI components and audio pipelines. But the foundation under those features was untested, undocumented, and insecure.
The Memory Bug That Was Not a Bug
During the first team review, 9 of 14 personas reported that the memory system was dead. Every persona queried their stored memories and got nothing back. We filed it as a critical systemic failure.
The MCP developer looked at the database. 64 memories existed for a single persona. The data was there. The search was failing because agents wrote queries like natural language sentences, but the search engine uses AND-matching on individual keywords. A seven-word query returned zero results. A two-word query found everything.
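The failure mode is easy to reproduce. Here is a minimal sketch of AND-matching keyword search over an in-memory list (the real MCP memory tool's internals are not shown here; the store and memories are illustrative):

```python
# Minimal sketch of AND-matching keyword search. Every word in the
# query must appear in a memory for it to match (AND semantics).
def search(memories: list[str], query: str) -> list[str]:
    words = query.lower().split()
    return [m for m in memories if all(w in m.lower() for w in words)]

# Hypothetical stored memories for illustration.
memories = [
    "sprint 9 velocity recorded as 38 points",
    "ssl configuration blocked by missing certs",
]

# A natural-language sentence: every word must match, so nothing does.
print(search(memories, "what did we decide about the sprint velocity last week"))
# A short keyword query finds the relevant memory.
print(search(memories, "sprint velocity"))
```

The longer the query, the more chances for one word to miss, which is why a seven-word sentence returned zero while a two-word query found everything.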
The fix was two lines of documentation in the tool description explaining how the search works.
We had spent half a planning cycle treating a documentation gap as an architecture failure. The lesson: when every agent agrees something is broken, check whether they are all making the same mistake.
What Developers Can Take From This
Count your actual velocity before planning your next sprint. Not what you think it is. Not what you want it to be. Pull the completed points from your last 5 sprints and average them. If your tracking tool returns zero, you have never measured it.
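The calculation is a one-liner. A sketch of the "count your actual velocity" step, with illustrative sprint totals rather than our real data:

```python
# Points completed in the last 5 sprints (illustrative numbers).
completed_points = [31, 44, 36, 40, 39]

if not completed_points:
    raise ValueError("No velocity data recorded - measure before you plan.")

# Average completed points per sprint is your validated velocity.
velocity = sum(completed_points) / len(completed_points)
print(f"Validated velocity: {velocity:.0f} points per sprint")
```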
Infrastructure debt hides behind feature delivery. You can ship 10 features in a sprint and feel productive. But if none of them have tests, none have documentation, and the API has no authentication, you shipped demos. The backlog should contain infrastructure stories that are not optional.
Overcommitment is a habit, not an accident. We did it 8 sprints in a row. Nobody decided to overcommit. It happened because cutting scope feels like failure and adding scope feels like ambition. The fix is mechanical: your measured per-sprint velocity is your capacity. Do not plan beyond it.
When AI agents unanimously agree, check the shared assumption. Nine personas reaching the same conclusion through the same flawed query pattern is not consensus. It is a shared mode of failure. The more agents agree, the more likely they inherited the same bias.
Sprint 9: 7 stories. 42 points. Infrastructure first, features second. The most productive sprint might be the one where we finally stopped pretending.