Matías Denda

Posted on May 7

Stop estimating in hours. Start estimating in complexity.

#agile #career #productivity #discuss


There's a quiet truth every developer knows but nobody says out loud at sprint planning:

When we estimate in hours, we always estimate low.

Always. The senior estimates low because they want to look efficient. The junior estimates low because they don't want to seem slow. The team estimates low because the PM is in the room. And then the sprint ends, half the tickets roll over, and everyone pretends to be surprised.

After years of watching this play out across teams, languages, and stacks, I've come to believe that the problem isn't that we're bad at estimating hours. The problem is that hours are the wrong unit.

Let me explain why I now estimate in complexity, and why I think it leads to better software, better teams, and — surprisingly — better deadlines.

The misunderstanding at the core

Here's the trap most teams fall into: they treat complexity and time as the same thing measured with different rulers. They're not. They're two independent axes.

Consider translation. Translating a paragraph from English to Spanish is easy. There's almost no complexity. But translating the entire Bible? That's still easy — the per-sentence cognitive load hasn't changed — it's just long. Easy doesn't mean fast.

Now flip it. A complex distributed-systems migration sounds like it should take weeks. But if your platform happens to have the right tooling already in place, you might pull it off in an afternoon. Complex doesn't mean slow.

Once you internalize this, the whole hour-based estimation game starts looking absurd. You're collapsing two dimensions into one and pretending the result is meaningful.

So what is complexity, then?

In the teams I've worked on, we settled on a Fibonacci-ish scale: 3, 5, 8, 13. Anything bigger than 13 wasn't an estimate — it was a signal to break the task down.

The numbers themselves don't matter much. What matters is what they represent:

3 — Well-understood. We've done this kind of thing before. Few moving pieces. Low risk.
5 — Some unknowns, or more pieces involved, but nothing scary.
8 — Several systems touched, real risk, or genuinely new territory.
13 — Too big. Stop. Break it apart.
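If you want the scale written down somewhere your tooling can enforce it, a minimal sketch could look like this. The function name and messages are illustrative and not taken from any particular tracker. The one rule it encodes comes from the list above: 13 or more is a request to split the task, not an estimate.

```python
# Hypothetical helper encoding the team's scale; meanings paraphrase the list above.
SCALE = {
    3: "well-understood, few moving pieces, low risk",
    5: "some unknowns or more pieces, nothing scary",
    8: "several systems touched, real risk, or new territory",
}
SPLIT_THRESHOLD = 13  # 13 and up is a signal to break the task apart


def interpret_estimate(points: int) -> str:
    """Explain an agreed estimate, or say that the task must be split."""
    if points >= SPLIT_THRESHOLD:
        return "too big: break it apart"
    if points not in SCALE:
        # Off-scale values are rejected rather than silently rounded.
        raise ValueError(f"{points} is not on the scale 3, 5, 8, 13")
    return SCALE[points]


print(interpret_estimate(5))   # some unknowns or more pieces, nothing scary
print(interpret_estimate(21))  # too big: break it apart
```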

You can layer in dimensions like uncertainty, coupling, blast radius, dependencies, team familiarity — but the goal isn't to build a precise rubric. The goal is to give the team a shared vocabulary for talking about how hard something is, separate from how long it takes.

The real magic isn't the number — it's the conversation

Here's what nobody tells you about story points: they're not better than hours because they're more accurate. Honestly, they're probably less accurate in absolute terms.

They're better because they change the conversation.

When you ask someone "how long will this take?", the conversation is individual and defensive. Whoever knows the most throws out a number. Everyone else nods. The junior who's actually going to do the work quietly panics, because they know they can't hit that number, but pushing back means admitting they're slower.

When you ask "how complex is this?", the conversation is collective. Why is this a 5 and not a 3? What pieces does it have? What could go wrong? Juniors learn by watching seniors reason through problems. Seniors occasionally discover that something they called "trivial" wasn't trivial at all. The team understands what they're about to build before they build it.

That's what hours don't give you, no matter how precise they are.

The split that changes everything

Here's the part of my workflow that I think is genuinely underrated:

The team estimates complexity. The individual developer estimates their own hours.

Complexity is a property of the problem. Hours are a property of the person solving it.

I'm a senior architect. A junior on my team is not going to take the same time I do on the same task. That's not a flaw — it's reality. Telling a junior "this should take you 3 hours because the senior said so" is one of the cruelest, most counterproductive things we do in this industry. They burn out trying to hit a number that was never theirs to hit.

So instead: the team agrees this task is a 5. Then the developer who picks it up estimates their own hours. Those hours are mostly for them — to plan their day, to learn calibration, to flag early when they're slipping. We sum them as a sanity check against the sprint capacity, but the commitment to the business doesn't come from those numbers. It comes from velocity (more on that in a sec).
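Here is a rough sketch of that split, assuming a hypothetical Task record that carries both the team's points and the owner's own hours. Points are checked against velocity, which is where the commitment comes from. Hours are only summed and compared to capacity as a warning.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical sprint item: team-agreed points plus the owner's own hours."""
    name: str
    points: int         # complexity, estimated by the team
    owner: str          # developer who picked the task up
    owner_hours: float  # that developer's personal estimate


def sprint_check(tasks: list[Task], velocity: int, capacity_hours: float) -> dict:
    """Commit on points vs. velocity; use summed hours only as a warning."""
    planned_points = sum(t.points for t in tasks)
    planned_hours = sum(t.owner_hours for t in tasks)
    return {
        "planned_points": planned_points,
        "within_velocity": planned_points <= velocity,          # the business commitment
        "planned_hours": planned_hours,
        "hours_over_capacity": planned_hours > capacity_hours,  # sanity check only
    }


tasks = [
    Task("login form", 3, "ana", 6),
    Task("payment retry", 5, "leo", 14),
    Task("audit log", 8, "mia", 20),
]
print(sprint_check(tasks, velocity=18, capacity_hours=60))
```

An hours warning doesn't change the commitment. It's a prompt to talk, not a new number to promise.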

Junior devs get the low-complexity tasks first. Not because we don't trust them, but because low-complexity tasks are where it's cheap to be wrong. That's where you learn to estimate without blowing up the sprint.

"But the junior will estimate wrong too"

Yes. They will. That's the point.

I get this objection every time I describe this system: "if the dev estimates their own hours, they can still get it wrong — for any of the reasons people get hours wrong in the first place." True. A junior estimating their own hours will probably underestimate 9 times out of 10. A senior in unfamiliar territory will do the same.

The difference isn't that the estimate magically becomes correct. The difference is what happens when it's wrong.

When hours are imposed top-down by whoever-knows-most, a missed estimate is a personal failure. The junior is just behind. Tough luck, work weekends.

When the dev estimates their own hours, a missed estimate is a calibration signal. It's the moment the team lead — the architect, the TL, the assigned senior — steps in. Not to scold, but to give context. To explain what the dev didn't see. To walk through why this task that looked like 4 hours was actually 12.

This is where the didactic side of the senior matters, and where teams really differ. Some leads let juniors slam their heads against the wall and call it "learning by doing". Others sit down and unpack the problem with them. The system doesn't fix that for you — but at least it makes the moment visible, instead of burying it under a missed deadline that nobody wanted to admit was unrealistic from the start.

Over time, the junior's estimates get sharper. Not because they got faster, but because they learned to see more of the task before starting it. That's a skill hours-based estimation never teaches, because in hours-based estimation, the junior never gets to estimate at all.

What about the unknown?

Every estimation system breaks at the same place: how do you estimate something nobody has done before?

You don't. You spike it.

A spike is a timeboxed investigation. "Spend 4 hours figuring out if this is feasible, then come back." The output isn't an estimate — the output is enough understanding to estimate. And honestly, half the time the spike basically solves the problem, because the hard part wasn't the doing, it was the figuring out.

This is the part I think most teams miss. They try to estimate the unknown anyway, padding numbers "just in case", and end up with stories that are 80% mystery and 20% work. Spikes are the escape valve. Use them.

How to actually do this

If you're sold on the idea but wondering how it looks in practice, there's no single right answer. Here are a few techniques teams use — pick whichever fits your group's vibe:

Planning poker. Everyone on the team has a deck of cards with the values (3, 5, 8, 13, plus a "?" for "I have no idea, we need a spike"). Someone reads the task. Everyone picks a card face-down, then reveals at the same time. If the numbers diverge wildly, the highest and lowest explain their reasoning, and you re-vote. The simultaneous reveal is the whole point — it stops people from anchoring on whatever the most senior person said first.
T-shirt sizes. Same idea, but with S / M / L / XL instead of numbers. Useful for teams that find numbers feel falsely precise, or for early-stage estimation where you just want a rough bucket. You can always map sizes to points later if you need velocity tracking.
Affinity estimation. Print all the tasks on cards, lay them on a table, and have the team physically group them by relative complexity — "this feels about as hard as that one". Fast for large backlogs, and surprisingly accurate, because humans are much better at comparing than at measuring.
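To make the planning poker rules concrete, here is a minimal sketch of scoring one simultaneous reveal. The deck and the "?" card come from the description above. Treating anything short of unanimity as divergence is an assumption made for the example; your team may well tolerate adjacent cards instead.

```python
DECK = ("3", "5", "8", "13", "?")


def poker_round(votes: dict[str, str]) -> tuple[str, object]:
    """Score one simultaneous reveal of planning-poker cards.

    Returns ("spike", None), ("split", 13), ("agreed", points),
    or ("revote", (highest_voters, lowest_voters)).
    """
    for name, card in votes.items():
        if card not in DECK:
            raise ValueError(f"{name} played {card!r}, which is not in the deck")
    if "?" in votes.values():
        # Someone has no idea: investigate with a timeboxed spike first.
        return ("spike", None)
    points = {name: int(card) for name, card in votes.items()}
    low, high = min(points.values()), max(points.values())
    if low == high:
        # Unanimous; a unanimous 13 still means "break it apart".
        return ("split", high) if high >= 13 else ("agreed", high)
    # Assumed rule: any disagreement triggers a re-vote, with the extremes explaining first.
    highest = [n for n, p in points.items() if p == high]
    lowest = [n for n, p in points.items() if p == low]
    return ("revote", (highest, lowest))


print(poker_round({"ana": "5", "leo": "5", "mia": "5"}))  # ('agreed', 5)
print(poker_round({"ana": "3", "leo": "8", "mia": "5"}))  # ('revote', (['leo'], ['ana']))
print(poker_round({"ana": "5", "leo": "?"}))              # ('spike', None)
```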

You can mix these. Some teams use affinity estimation for backlog grooming and planning poker for sprint refinement. Others just default to a quick t-shirt sizing in a 15-minute meeting and call it done.

The technique matters less than the conversation it produces. If your team is genuinely talking about the problem — surfacing risks, sharing context, learning from each other — the format is just scaffolding. Pick whatever scaffolding gets you there.

"But the business needs dates"

This is the objection that always comes up, and it's a fair one.

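The answer is velocity. Track how many points your team actually completes per sprint over time. After a few sprints, you have a reasonable measure of your throughput. Divide the remaining points by velocity and you get a number of sprints; use your best and worst recent sprints instead of a single average, and you have a date range.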
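A minimal sketch of that arithmetic, assuming two-week sprints and a short list of past velocities (both hypothetical). Remaining points are divided by the fastest, average, and slowest observed velocity, and the resulting sprint counts are turned into calendar dates.

```python
import math
from datetime import date, timedelta
from statistics import mean


def forecast_sprints(past_velocities: list[int], remaining_points: int) -> tuple[int, int, int]:
    """Sprints needed at the fastest, average, and slowest observed velocity."""
    if not past_velocities or min(past_velocities) <= 0:
        raise ValueError("need at least one past sprint with positive velocity")
    # points / (points per sprint) = number of sprints, rounded up to whole sprints
    optimistic = math.ceil(remaining_points / max(past_velocities))
    likely = math.ceil(remaining_points / mean(past_velocities))
    pessimistic = math.ceil(remaining_points / min(past_velocities))
    return optimistic, likely, pessimistic


def to_dates(start: date, sprints: tuple[int, ...], sprint_weeks: int = 2) -> list[date]:
    """Convert sprint counts into end dates, assuming fixed-length sprints."""
    return [start + timedelta(weeks=sprint_weeks * n) for n in sprints]


sprints = forecast_sprints([18, 22, 15, 21], remaining_points=120)
print(sprints)  # (6, 7, 8)
print(to_dates(date(2024, 6, 3), sprints))
```

With velocities of 18, 22, 15 and 21 and 120 points left, this gives 6, 7 and 8 sprints. The spread is the useful part: it's the honest answer to "when will it be done?"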

I want to be honest, though: velocity isn't magic. It has real problems. It can be gamed by inflating points. It assumes a stable team — when people leave or join, it breaks. It works badly for highly exploratory work. And in the wrong hands it stops being a planning tool and becomes a productivity stick to beat people with.

But used carefully, it gives you something hour-based estimation never does: a system that gets more accurate over time instead of less. The curve is bumpy at first, and then it smooths out. With hours, the curve never smooths out, because the underlying signal was noise from the start.

When this doesn't apply

I'm not selling a silver bullet. Complexity-based estimation is overkill when:

Your team is 1–3 people and you all have the same context anyway
The work is repetitive (pure bug fixing, low-novelty maintenance)
You're in an early prototype phase where everything is changing every day

In those cases, hours — or no estimates at all — are probably fine. Don't impose ceremony where it doesn't earn its keep.

The honest summary

After years of doing this, I don't think estimating in complexity is more accurate than estimating in hours. It probably isn't. But that was never the right question.

The right question is: what kind of conversation do you want your team to have?

When you estimate in hours, the conversation is individual and defensive. Whoever knows the most throws out a number, everyone nods, and the person with the least experience ends up trapped trying to hit a commitment they never had a real say in. Nobody learns. Nobody talks about the problem itself. Numbers just get distributed.

When you estimate in complexity, the conversation is about the problem. Why is this a 5 and not a 3? What's hiding inside it? What risks does it carry? Juniors learn by watching seniors reason. Seniors sometimes realize the "trivial" thing wasn't trivial. The team understands what they're about to do, together, before they do it.

That's what hours don't give you, no matter how precise.

If your team estimates in hours and it works for you, great — keep going. But if you find yourselves fighting estimates that never land, devs burning out, and PMs disappointed sprint after sprint, maybe the hours aren't being measured wrong.

Maybe hours are just the wrong question.

What does estimation look like on your team? Do you fight with hours, swear by points, or have you found something else that works? I'd love to hear it in the comments.

Top comments (2)

Justin • May 8 • Edited

There's so much wisdom in here, and I can actually feel the experience that went into it. From a young engineer to a veteran: this has really helped me understand how teams approach different projects and the subtle depth of project velocity.

Also, I really love your balanced approach and the great examples for understanding where time and complexity actually fit into the lower-level business model.

I'm curious, when you "spike" a task/issue (that term's awesome, by the way), do you explore different solutions at once (forking candidate solutions from an issue), or do you spike only the issue/task that can't be estimated?

Matías Denda • May 8

Thanks for the comment! Really glad it resonated.

Great question — and honestly, both. It depends on the nature of the unknown.
If the uncertainty is "can we do this at all, and how?", then yes, exploring 2–3 candidate solutions during the spike is exactly the move. You're not just estimating, you're de-risking the design choice itself. Sometimes the spike output isn't "this is a 5", it's "approach A is a 5, approach B is a 13, let's go with A".

If the uncertainty is "we know roughly how to do it, we just don't know the cost", then I tend to spike a single path — usually the most likely one — deep enough to find the surprises. No need to fork candidates if you already have a sensible default.

The timebox is what keeps it honest either way. If you're forking 4 solutions in a 4-hour spike, you're going to do all of them badly. Pick the right depth for the time you have.

Also — totally agree the term "spike" is great. I didn't invent it (it comes from XP, Kent Beck era), but it's one of the most useful pieces of vocabulary the agile world produced. A small word that gives you permission to admit you don't know yet.
