This is part 2 of a series documenting an experiment with Definition of Done. Part 1: Trying to Fix 'Done' Before It Breaks Us
Two sprints in, the original hypothesis remains largely untested. The experiment was under-executed: one refinement session with the DoD, one ticket meaningfully shaped by it, and too much environmental noise to draw conclusions about delivery time.
But something unexpected emerged. The DoD didn't primarily change estimation—it forced the creation of a handbook. That handbook is now reshaping how the team talks about code, and surfacing problems the DoD itself can't solve.
Six items became fourteen pages
The DoD itself is minimal: six items under "always included," a few conditional ones, and an explicit "never assumed" list. One page.
But enforcement requires shared definitions. "Errors are handled" points to a page describing the DomainError pattern and how errors bubble through the state layer. "Styling uses prefixes" needs concrete examples. "Code follows existing patterns" is meaningless unless those patterns exist somewhere outside people's heads.
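To make that concrete, here is a minimal sketch of the kind of pattern such a page describes. The names and shape are assumptions for this post, not the team's actual code: a DomainError that carries a stable code and is returned through the state layer instead of thrown at the UI.

```typescript
// Sketch of a DomainError-style pattern: errors carry a stable code so the
// state layer can surface them without string matching. Names are invented.
export class DomainError extends Error {
  constructor(
    public readonly code: string, // stable identifier, e.g. "USER_NOT_FOUND"
    message: string,
  ) {
    super(message);
    this.name = "DomainError";
  }
}

interface User {
  id: string;
  name: string;
}

// State-layer helper: failures come back as values, so the component
// rendering the state decides what to show instead of catching exceptions.
export async function loadUser(id: string): Promise<User | DomainError> {
  const response = await fetch(`/api/users/${encodeURIComponent(id)}`);
  if (!response.ok) {
    return new DomainError("USER_NOT_FOUND", `No user with id ${id}`);
  }
  return (await response.json()) as User;
}
```

The specific shape matters less than the fact that "errors are handled" now resolves to a page everyone can point at.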
The DoD became a table of contents. The handbook is the system it forces into existence.
This was anticipated—documented knowledge architecture was always a prerequisite. What wasn't anticipated was the volume, and how much of that knowledge had previously been implicit. Writing it down didn't invent complexity; it revealed how much implicit knowledge the system had been running on—and hiding. Fourteen pages today—fewer once the linter stops being technical debt and starts enforcing what shouldn't need prose.
Keeping documentation alive
Fourteen pages is a liability if it remains one person's opinions dressed as consensus.
The mitigation is structural: the handbook lives in the monorepo, written in markdown and mermaid, editable in the same PR as the feature that depends on it. "Update handbook" is a DoD item itself.
The logic: when you introduce a pattern, document it in the same commit. When you notice something missing, fix it where you are. The handbook isn't a parallel artifact owned by someone—it's part of the codebase, subject to the same review process.
So far, this structure is holding. Multiple developers have edited the handbook unprompted, in the same PRs as feature work. That's early evidence that ownership is starting to spread.
What hasn't happened yet is pushback—someone proposing to change an existing pattern rather than just adding to it. A handbook nobody challenges may indicate passive compliance rather than genuine shared ownership. That distinction matters, and it's something I'm now watching for explicitly.
Documentation depersonalizes feedback
Before the handbook, code review feedback felt personal. "You should name this variable differently." "This error handling isn't complete." Each comment implicitly claimed authority.
Now there's a page. Naming conventions exist in writing. Error handling has a flowchart. When a comment says "per the naming conventions, this should use the hci- prefix" and links to documentation, the conversation shifts.
The reviewer isn't asserting taste; they're invoking an agreement. The correction comes from the system, not the person.
In these two sprints, about a third of PRs explicitly referenced handbook items—links replacing opinionated comments. Developers proposed additions when they noticed gaps. The handbook created shared language, but more importantly, it created distance between people and feedback.
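For readers outside the team, that kind of comment maps to something like the sketch below. The hci- prefix comes from the review example above; the component and class names are hypothetical.

```typescript
// Hypothetical illustration of the prefix convention a reviewer might link to:
// every class the component emits is namespaced with "hci-" so its styles
// can't collide with whatever page hosts it.
class HciUserCard extends HTMLElement {
  connectedCallback(): void {
    const name = this.getAttribute("name") ?? "Unknown";
    this.innerHTML = `
      <div class="hci-user-card">
        <span class="hci-user-card__name">${name}</span>
      </div>
    `;
  }
}

customElements.define("hci-user-card", HciUserCard);
```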
One ticket, real numbers
Theory is one thing. Here's what the data actually looks like.
A technical debt ticket: document and consolidate error codes. Estimated at one day after refinement using the DoD. Actual time: two to three days.
The developer started the work and found a mess—error codes scattered without consistent patterns. The instinct was natural: fix the system while you're in there. He began refactoring how error codes work across the codebase. Necessary eventually. Not in scope.
"Refactoring adjacent code" sits explicitly under "never assumed" in the DoD. The ticket had been refined with that in mind. The refactor wasn't part of the estimate because it wasn't part of the agreement.
The intervention happened mid-refactor, not before. The response was simple: the DoD had just gone live, so this one passes—but look at the line. The estimate didn't include this work. Without it, the ticket would have landed on time.
There was no pushback. In Sprint Review, the developer brought up the conversation himself as an example of how the DoD would help going forward. The DoD didn't stop the refactor. It made visible an investment that would otherwise have stayed hidden in "it took longer than expected." That visibility is the leverage.
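To picture the in-scope work, "consolidate error codes" means roughly the kind of change sketched below. This is my own illustration; the real codes, names, and module layout are different.

```typescript
// Sketch: error codes gathered into one typed registry instead of string
// literals scattered across features. Codes and copy are invented.
export const ERROR_CODES = {
  USER_NOT_FOUND: "E1001",
  SESSION_EXPIRED: "E1002",
  VALIDATION_FAILED: "E1003",
} as const;

export type ErrorCode = keyof typeof ERROR_CODES;

// One lookup for user-facing copy, so the documentation and the behaviour
// can't drift apart silently.
const ERROR_MESSAGES: Record<ErrorCode, string> = {
  USER_NOT_FOUND: "We could not find that account.",
  SESSION_EXPIRED: "Your session has expired. Please sign in again.",
  VALIDATION_FAILED: "Some fields need attention before saving.",
};

export function messageFor(code: ErrorCode): string {
  return ERROR_MESSAGES[code];
}
```

Changing how those codes flow through the rest of the codebase, the refactor the developer started, is a separate and larger piece of work. That separation is exactly the line the DoD drew.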
What Product saw
Product didn't resist the DoD—they adopted it. Refinement became methodical: take the ticket, run through the DoD. "Always included"—how long? "Never assumed"—do we need any? What's the cost?
Their response wasn't compliance. They proposed embedding DoD items directly in the ticket template.
The real test hasn't come yet: a ticket where DoD-complete estimates are dramatically higher than expected. When that happens, the system either produces explicit trade-offs—include the work or defer it as documented debt—or it buckles under pressure to "just ship." That moment will reveal whether this is process or theater.
Adoption at capacity
In direct conversations, motivations diverge.
Some see the system working and advocate for it. Others are more guarded—process frameworks have come and gone before, and this one looks like more overhead without guaranteed payoff.
The framing that landed with skeptics wasn't aspirational. It was defensive: this is for us, not for anyone else. It clarifies what we agreed to build. It tracks technical debt explicitly. And if gaps become problems later, there's a paper trail—tickets written, trade-offs documented, decisions made with eyes open.
Cynical? Maybe. But practical. And it works. Skepticism hasn't blocked adoption; it's shifted the motivation. Some believe the DoD will make things better. Others believe it will make things defensible. Both are contributing.
What the DoD made visible
The DoD was meant to clarify what "done" means for a ticket. It ended up clarifying something else: scope.
The handbook documents error handling, WebComponent patterns, state management, styling conventions, SSR concerns, authentication flows. Fourteen pages of technical decisions—all within a coherent perimeter.
Here's what that looks like in practice: when a handbook pattern needs discussion in standup, 80% of the room has no context. When Sprint Review evaluates DoD compliance, most attendees can't assess it. The coordination the DoD requires doesn't map to the current rituals—not because the rituals are wrong, but because they serve a broader group.
Reading the handbook back, a question surfaced: whose work is this actually describing?
The answer isn't "the frontend half of a full-stack team." It's closer to "a product team that consumes an API." Our monorepo has its own backend concerns—auth, session, SSR. Our touchpoint with the backend team is contractual: API shape, response formats, error codes. Day-to-day, the collaboration is minimal.
This isn't a criticism. Teams evolve, structures shift. But the DoD forced an articulation of what we actually build—and that articulation suggests a perimeter that might deserve its own coordination rhythms.
What that looks like, I don't know yet. But the next step is concrete: map the actual coordination flows, evaluate options for dedicated frontend rituals, and test whether the mismatch is friction or fracture. That's Part 3.
The current state
The estimation hypothesis remains unvalidated. Testing it properly requires protected refinements, tracked estimates versus actuals, and more samples.
What exists now: a 14-page handbook in the monorepo, edited by multiple developers. A DoD that Product wants embedded in ticket templates. A shared language that depersonalizes feedback. A structural mismatch now impossible to ignore.
The DoD was meant to fix estimation. Instead, it surfaced what estimation had been compensating for.
Whether it ultimately improves delivery is still open. But for the team, the question has shifted from "why did this take so long?" to "what are we actually agreeing to build?"—and that's a different starting point.
Views and experiments described here are my own and don't represent my employer.