Leon Pennings

Originally published at blog.leonpennings.com

How to Test Whether Your Software Solution Actually Fits The Problem

Every application is built once.

There is no second version of the same system, built with different architectural assumptions, run in parallel for a decade, and then compared on maintenance cost, team size, and requirement absorption speed. The alternative is never built. The counterfactual never exists. This is the Singleton Paradox applied to software: because each system is unique, there is no external reference point against which to judge whether it is a good solution to its problem — or merely the only solution anyone bothered to build.

This matters more than it might appear. It means that the quality of an architectural decision can never be measured by comparison. You cannot park the well-modeled system next to the poorly-modeled one and read off the difference. The poorly-modeled system is the only one that exists. So when it becomes expensive to maintain, slow to change, and eventually impossible to extend, those outcomes get attributed to the problem — the domain was complex, the requirements changed, the business grew — rather than to the solution. The solution is never put on trial, because there is nothing to try it against.

The Singleton Paradox does not just make good architecture hard to prove. It makes bad architecture hard to see. The absence of contrast is not neutral. It actively shapes what gets treated as normal. Rising maintenance costs are normal. Growing teams are normal. Slowing feature velocity is normal. Rewrites every seven to ten years are normal. None of this is normal in the sense of being inevitable. All of it is normal in the sense of being what happens when accidental complexity (Fred Brooks' term for the complexity introduced by tools and decisions rather than by the problem itself) compounds over time, and when there is no alternative visible to suggest it could be otherwise.

This creates a specific and solvable problem. If external comparison is unavailable, the only honest measure of whether a system is a good fit for its problem is internal. Not how it compares to another system that was never built, but how it behaves against time. Does it get easier or harder to operate? Does it get cheaper or more expensive to change? Does it remain stable as the domain evolves, or does it accumulate fragility with each passing year?

Those questions have answers. And the answers, taken together, constitute the only reliable verdict on whether the solution fit the problem.


The Ten-Year Cost Test

That internal measure can be made concrete. The Ten-Year Cost Test is a diagnostic any organisation can apply to its own systems — not a comparison against an alternative that was never built, but a set of questions about whether the current architecture is winning or losing against time. The threshold of ten years is not arbitrary. A system that cannot survive a decade without a rewrite has not been maintained; it has been replaced. And replacement, however it gets framed, is the system announcing that it was not a good fit for the problem it was built to solve.

The test is simple. After ten years in production, a well-designed system should satisfy all of the following:

Maintenance cost is the same as or lower than in year one. As the domain model matures and the team's understanding deepens, maintenance should become cheaper, not more expensive. The team knows where everything lives. The rules are explicit and localised. A change that took two days in year one should take two hours in year ten, because the model has been refined and the team has internalised it.

New requirements are absorbed faster as the system matures. A well-modeled domain does not merely keep pace with new understanding — it accelerates. Each addition deepens the team's knowledge of the model and reveals where the next extension naturally fits. When the business learns something new — a new product type, a new regulatory constraint, a new class of customer — the model should be able to absorb it with decreasing effort over time, not constant effort. If absorption speed is flat, the domain model is adequate but not right. If it slows, the model is failing. A well-modeled system gets easier to extend the longer it has been understood.

The team size required to maintain it has not grown significantly. This is perhaps the most honest measure of architectural health. A system that requires more people every year to maintain the same functionality is a system where accidental complexity is compounding. Each new developer adds coordination overhead. Each new layer of abstraction requires more people to understand it. A well-modeled system with low accidental complexity should be maintainable by a small, stable team indefinitely.

The application is as stable or more stable than it was initially. Stability should increase over time as the model matures and edge cases are understood and handled. If the system becomes less stable over time — more incidents, more unexpected interactions, more fragile integrations — accidental complexity is winning.

The cost of running it has not grown faster than the business it serves. Infrastructure costs, operational overhead, and support burden should scale with business growth, not with architectural entropy. A system that costs significantly more to run in year ten than it did in year one, while serving the same number of users, has a structural problem.

Apply this test honestly to any system you have worked on for more than five years. The results are rarely comfortable.
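To make the five criteria something you can track rather than just debate, here is a minimal sketch of the test as a year-over-year self-assessment. This is illustrative only: the class names, the fields, and the thresholds (such as the 20% allowance for team growth) are assumptions of mine, not part of the test itself.

```java
// A hypothetical sketch of the Ten-Year Cost Test as a self-assessment.
// All names and thresholds here are illustrative, not prescribed.

import java.util.List;

public class TenYearCostTest {

    /** One year's worth of the internal measures the test asks about. */
    record YearSnapshot(
            int year,
            double maintenanceCost,          // annual spend keeping existing behaviour working
            double avgDaysToAbsorbRequirement,
            int maintainers,                 // people needed for the same functional scope
            int productionIncidents,
            double runningCost,              // infrastructure plus operational overhead
            double businessVolume            // users, orders, revenue: whatever the system serves
    ) {}

    /** Applies the five criteria by comparing the latest year to year one. */
    static void evaluate(List<YearSnapshot> history) {
        YearSnapshot first = history.get(0);
        YearSnapshot last = history.get(history.size() - 1);

        check("Maintenance cost is the same as or lower than in year one",
                last.maintenanceCost() <= first.maintenanceCost());
        check("New requirements are absorbed faster as the system matures",
                last.avgDaysToAbsorbRequirement() < first.avgDaysToAbsorbRequirement());
        check("Team size has not grown significantly",
                last.maintainers() <= Math.ceil(first.maintainers() * 1.2));
        check("The application is as stable or more stable than initially",
                last.productionIncidents() <= first.productionIncidents());
        check("Running cost has not grown faster than the business it serves",
                last.runningCost() / first.runningCost()
                        <= last.businessVolume() / first.businessVolume());
    }

    static void check(String criterion, boolean passed) {
        System.out.printf("[%s] %s%n", passed ? "PASS" : "FAIL", criterion);
    }

    public static void main(String[] args) {
        // Entirely made-up numbers, shaped like the "average project" described below.
        evaluate(List.of(
                new YearSnapshot(1, 100_000, 2.0, 3, 4, 50_000, 10_000),
                new YearSnapshot(10, 340_000, 9.5, 7, 15, 220_000, 18_000)
        ));
    }
}
```

The numbers matter less than the habit. If nobody in the organisation could fill in those fields for year one and for the current year, that is itself a finding.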


What the Industry Data Actually Shows

Before examining how the average project scores on this test, an important caveat is necessary. Rigorous longitudinal data comparing domain-first versus framework-first approaches over ten-year periods essentially does not exist in published form. The industry does not measure what it should measure. Deployment frequency, recovery time, and project delivery success rates are tracked. Total cost of ownership relative to architectural approach over a decade is not.

This absence is itself the Singleton Paradox operating at industry scale. Nobody ran the controlled experiment. Nobody built both versions of the same system and compared them over ten years. So the precise cost differential between approaches is genuinely unknown in the scientific sense — even though the directional evidence is consistent and substantial.

What does exist:

CISQ, the Consortium for Information and Software Quality, estimated in 2022 that poor software quality costs US organisations approximately $2.41 trillion annually, with a significant portion attributable to accumulated technical debt. The direction of travel is clear even if the precise attribution to architectural choices is not.

The Standish Group CHAOS Report has tracked project success rates for decades. Despite continuous evolution of methodology — agile, DevOps, cloud-native — the underlying success rates have not dramatically improved. This implies the problem is structural rather than methodological. Better processes applied to the wrong architecture produce better-managed failure, not success.

The DORA research — Google's annual State of DevOps reports, now covering over 39,000 professionals — shows a persistently bimodal distribution. The 2024 report found that elite performing teams have change failure rates around 5% and recover from incidents in under an hour. Low performing teams have significantly higher failure rates and recovery times measured in days or weeks. Only 19% of organisations reached elite performance. The low performance cluster, meanwhile, grew from 17% to 25% of respondents between 2023 and 2024. The distribution is not a bell curve. It is two distinct populations. Architecture and approach appear to be the differentiating variable, not team size, budget, or industry.

Amazon Prime Video published a case study in 2023 describing a 90% infrastructure cost reduction after consolidating a distributed microservices monitoring service into a single process — a result specific to that service, not a platform-wide architectural overhaul, but instructive precisely because the team at Amazon chose to be candid about it. Segment, a data platform company, published a similar account. These are self-selected — organisations that consolidated and saved money are more likely to publish than those that saw no benefit — but they are directionally consistent with the argument being made here.

A McKinsey and University of Oxford study of more than 5,400 IT projects — conducted in 2012 and still the most comprehensive published dataset of its kind — found that large IT transformation projects run on average 45% over budget, 7% over time, and deliver 56% less value than predicted. That is first delivery. The trajectory over the subsequent decade is harder to find in rigorous published form — which is itself telling.


Scoring the Average Project

With that context, here is an honest assessment of how the average project scores on each dimension of the Ten-Year Cost Test. These are not precise figures — the data does not support precision — but they represent the consistent direction of the evidence.

Maintenance cost rises significantly on the average project. Industry estimates consistently place maintenance at 60–80% of total software lifecycle cost, and that proportion grows over time rather than shrinking. On framework-first systems, the annual upgrade cycle alone — broken dependencies, reworked configuration, revalidated integrations — consumes engineering capacity that produces zero business value. In the worst cases, maintenance costs grow 800% or more over a decade, eventually triggering a rewrite. In the best cases — domain-first systems with low accidental complexity — maintenance costs stay flat or fall as the model matures.
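For a sense of what that worst-case figure implies as a rate, here is a rough back-of-the-envelope calculation. The arithmetic is mine, not a figure from any of the studies above: growing 800% over ten years means roughly a quarter added to the maintenance bill every year, compounding.

```java
// Back-of-the-envelope: what annual compound growth rate turns year-one maintenance
// cost into roughly 9x (an 800% increase) after ten years? Illustrative arithmetic only.
public class CompoundingMaintenance {
    public static void main(String[] args) {
        double growthFactor = 9.0;   // +800% over the decade
        int years = 10;
        double annualRate = Math.pow(growthFactor, 1.0 / years) - 1;
        System.out.printf("Implied annual growth: %.1f%% per year%n", annualRate * 100); // ~24.6%
    }
}
```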

Requirement absorption speed slows materially on the average project. In a well-modeled system, new requirements should get faster to implement over time — not slower — as the team's understanding deepens and the model reveals where each extension naturally fits. On the average project, the opposite happens. What starts as a two-week feature becomes a two-month project by year five, as each new requirement must navigate accumulated accidental complexity. In distributed systems, a single business rule change triggers API contract renegotiation, versioning decisions, cross-team coordination, and staged deployments. In the worst cases, the system effectively stops absorbing new requirements — every change becomes a major project and the business routes around the software rather than through it. In the best cases, requirement absorption accelerates as the model matures. Flat speed is a warning sign. Slowing speed is a verdict.

Team size grows on the average project. Industry observation consistently shows teams of two to three times the original size by year ten, maintaining the same functional scope. In the worst cases — full microservices architectures with dedicated platform, SRE, and DevOps functions — the team exists primarily to manage its own infrastructure rather than to serve the business. In the best cases, the team stays small and stable. Three developers. Five hundred domain objects. Fifteen years.

Stability declines on the average project. DORA data shows that low-performing teams — a quarter of respondents and growing — have change failure rates approaching fifty percent and recovery times measured in weeks. Production increasingly becomes the final validation environment because the integrated system only meets real conditions there. In the worst cases, the organisation develops a chronic incident culture where production instability is treated as a fact of life rather than an architectural signal. In the best cases, stability improves over time as the model matures and edge cases are properly handled.

Running costs grow faster than business value on the average project. The shift to cloud computing made infrastructure costs more visible but did not reduce them. Microservices architectures run fifty to two hundred containers where a monolith needs three to five, with corresponding cost differentials. In the worst cases, infrastructure cost grows an order of magnitude while business capability grows modestly. In the best cases, running costs remain proportional to business growth throughout the system's life.

The rewrite conversation starts on the average project around year seven. In the worst cases, the conversation starts at year three or four — the system has already become unmaintainable before it is fully understood. In the best cases, the conversation never happens. The system absorbs new requirements, accommodates new technology at its boundaries, and continues to serve the business indefinitely.


The Inverse Is Also True

The data does not merely show that the average project fails the Ten-Year Cost Test. It shows that failure is the expected outcome — so expected that the industry has stopped treating it as failure.

Rising maintenance costs are attributed to business complexity rather than architectural choices. Growing teams are treated as evidence of business success rather than architectural inefficiency. Slowing requirement absorption is explained by changing priorities rather than by accumulated accidental complexity. Declining stability is managed with better monitoring rather than addressed at its source. The rewrite conversation is framed as modernisation rather than recognised as the bill arriving for choices made before the domain was understood.

This normalisation is the most dangerous consequence of the Singleton Paradox operating at industry scale. When everyone is paying the same inflated price, the inflated price becomes the reference point. The cost of accidental complexity is not visible as a cost. It is visible as the cost of software — the natural, inevitable, irreducible price of building systems.

It is not natural. It is not inevitable. It is not irreducible.

It is the compound interest on a specific set of choices, made consistently, across the industry, before domains are understood. Choices that look like engineering because everyone makes them. Choices that the Singleton Paradox ensures will never be clearly falsified, because the alternative is never built.


The Rewrite as the Final Verdict

There is one more signal worth examining. It requires no data, no research, no longitudinal study. It is available in almost every organisation that has been running software for more than a decade.

The rewrite conversation.

When someone in your organisation argues that the current system cannot support where the business is going — that it needs to be modernised, migrated, rebuilt on a new platform — that system has already announced its verdict on the Ten-Year Cost Test. The rewrite is not a sign of business ambition. It is the bill arriving.

The tragedy of the rewrite is not its cost, though the cost is substantial — typically measured in millions and years. The tragedy is what happens after. The new system almost always makes the same choices. The same framework is selected before the domain is understood. The same patterns are applied before the business concepts are named. The same accidental complexity is introduced in the first sprint and compounds through the same lifecycle.

Why? Because the Singleton Paradox means the organisation never learned from the previous system what actually went wrong. The previous system ran in production. The pipeline was green. The architecture followed recognised patterns. The failure was economic and temporal — too slow, too expensive, too fragile to change — not functional. And economic and temporal failure is invisible until it isn't. By the time the rewrite conversation starts, the diagnosis is usually "technical debt" or "legacy architecture" or "we outgrew it." Rarely is the diagnosis accurate: accidental complexity was introduced before the domain was understood, and it compounded for seven years.

So the rewrite reproduces the conditions that made the rewrite necessary. And in another seven to ten years, the conversation starts again.

A well-modeled system does not generate the rewrite conversation. Not because it is perfect, or because requirements don't change, or because technology doesn't evolve. But because the essential complexity — the domain model — is separable from the accidental concerns around it. Frameworks can be replaced without touching the domain. Infrastructure can evolve without restructuring the business logic. The system adapts because its core is stable, and its core is stable because it correctly reflects the domain rather than the technology choices of the year it was built.
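What that separation looks like in code, in its smallest possible form, is sketched below. All names here are hypothetical, and the example shows only the shape of the boundary, not a prescribed design.

```java
// A minimal sketch of a domain core kept free of framework concerns.
// Invoice, PaymentGateway and LoggingPaymentAdapter are hypothetical names.

import java.math.BigDecimal;

// Domain core: plain objects and business rules, no framework imports.
record Invoice(String id, BigDecimal amount, boolean overdue) {
    BigDecimal lateFee() {
        // Essential complexity: this changes only when the business rule changes.
        return overdue ? amount.multiply(new BigDecimal("0.05")) : BigDecimal.ZERO;
    }
}

// A boundary interface: the domain states what it needs, not how it is provided.
interface PaymentGateway {
    void collect(Invoice invoice, BigDecimal total);
}

// The only place that knows about today's technology choice. Swapping the payment
// provider, web framework, or database touches adapters like this, never Invoice.
class LoggingPaymentAdapter implements PaymentGateway {
    @Override
    public void collect(Invoice invoice, BigDecimal total) {
        System.out.printf("Collecting %s for invoice %s%n", total, invoice.id());
    }
}

class Demo {
    public static void main(String[] args) {
        Invoice invoice = new Invoice("INV-42", new BigDecimal("200.00"), true);
        PaymentGateway gateway = new LoggingPaymentAdapter();
        gateway.collect(invoice, invoice.amount().add(invoice.lateFee()));
    }
}
```

The specific pattern is beside the point. What matters is that the rule inside Invoice reads the same regardless of which adapter sits behind PaymentGateway, so the technology of the year the system was built stays at the edge.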

The Ten-Year Cost Test can be applied to any system. And the rewrite conversation, or its absence, is the most honest result that test can produce.


The Uncomfortable Conclusion

The Singleton Paradox means the direct proof will always be unavailable. You cannot park the well-architected system next to the poorly-architected one and read off the difference, because only one of them was ever built. You cannot compare the fifteen-year maintenance cost of a domain-first system against a framework-first system because the framework-first system is the only one that exists.

What you can do is apply the Ten-Year Cost Test to what you have. Ask honestly whether maintenance is getting cheaper or more expensive. Whether new requirements are getting faster or slower to absorb. Whether the team is staying small or growing to manage complexity. Whether the system is getting more stable or less. Whether running costs are proportional to business growth or running ahead of it.

And ask whether the rewrite conversation has started.

The industry data — imprecise as it is, incomplete as it necessarily must be — points consistently in one direction. The average project fails all five dimensions of the test. Maintenance rises. Requirements slow. Teams grow. Stability declines. Costs outpace business value. The rewrite conversation starts around year seven and reproduces the conditions that made it necessary.

This has happened so consistently, for so long, that it has been normalised into invisibility. The inflated cost has become the reference point. The compounding expense of accidental complexity has become indistinguishable from the natural cost of building software — because no one in the room has ever seen it otherwise.

The proof that it can be otherwise exists — in systems maintained by small teams in complex domains, absorbing new requirements cleanly, costing the same to run as they did a decade ago. Those systems exist. They simply never get compared to the alternative, because the alternative was never built.

The absence of that proof in your organisation is not evidence that it is impossible.

It is evidence of the Singleton Paradox.

And the Singleton Paradox is not a law of nature.

It is a consequence of choices. But not random choices — choices made under a specific kind of pressure that has nothing to do with fit. Spring Boot is chosen because the last project used Spring Boot. CQRS is chosen because the architect gave a conference talk on CQRS. Event-driven architecture is chosen because it is what sophisticated teams are supposed to use. These are not engineering decisions. They are career decisions dressed as engineering decisions. No one got fired for choosing the framework everyone else is using. The choice is defensible precisely because it is popular — and because the Singleton Paradox ensures it will never be tested against the alternative, it remains defensible indefinitely, regardless of what it actually costs.

This is the root cause the industry rarely names. Not incompetence. Not malice. The systematic selection of solutions on the basis of social safety rather than demonstrated fit — in an environment where demonstrated fit is structurally impossible to measure.

Choices that can be made differently.
