DEV Community: Leon Pennings

Scrum Works — But Only When the People Making Decisions Feel the Outcomes

Leon Pennings — Mon, 01 Jun 2026 05:18:35 +0000

There is a version of Scrum that serves the product. There is another version that serves the agile transformation process. They use the same vocabulary, run the same ceremonies, and produce very different outcomes.

The first treats the sprint as a learning unit — a cycle of building, showing, and understanding, with the product as the permanent reference point and the business owner as a continuous presence in the work. The second treats the sprint as a reporting unit — a cycle of planning, delivering, and demonstrating, with velocity as the measure of success and the business owner as an end-of-sprint audience.

Most organisations believe they are running the first. Most are running the second. The difference is not methodology. It is consequence. And once you see it, you cannot unsee it.

Where Scrum Actually Came From

In 1986, Hirotaka Takeuchi and Ikujiro Nonaka published a paper in the Harvard Business Review studying how companies like Honda, Canon, and Fuji-Xerox built complex products faster and better than their competitors. They weren't studying software. They were studying what happened when you gave a cross-functional team a difficult goal, genuine autonomy, and full accountability for the outcome.

What they found was not a process. It was a consequence structure. The engineers at Honda were not following a framework. They were people who could not afford to be wrong — whose careers, reputations, and sense of craft were inseparable from whether the thing they built actually worked. The overlapping development phases, the self-organising teams, the continuous learning — these weren't designed. They were the natural behaviour of committed people given a hard problem and the freedom to solve it.

Takeuchi and Nonaka called one of their six key characteristics "multilearning" — the idea that learning had to happen continuously, at every level, through direct contact with the problem. Not through documentation. Not through handoffs. Through people who understood the domain working alongside people who understood the craft, close enough that ignorance was immediately visible and immediately corrected.

Jeff Sutherland and Ken Schwaber read that paper and recognised something important: software teams were failing catastrophically because they were running relay races when they should have been playing rugby. Waterfall's sequential handoffs — requirements to design to development to testing to deployment — introduced months of lag between a decision and its consequences. By the time you discovered the requirements were wrong, you had built on top of them for a year.

Their insight was correct. Tight feedback loops beat long planning cycles. Short iterations beat big-bang releases. Direct domain contact beats document-mediated specification. The Agile Manifesto that followed made the priority order explicit: individuals and interactions over processes and tools, working software over comprehensive documentation, customer collaboration over contract negotiation, responding to change over following a plan.

The right side of those statements had value. It was just less important than the left — and the Manifesto said so explicitly.

That priority order has since been completely inverted — not because Scrum is flawed, but because of what was added to it.

The Pig and the Chicken

Early Scrum folklore told a story about a pig and a chicken who decided to open a restaurant together. The chicken suggested calling it "Ham and Eggs." The pig declined. "For you," the pig said, "that's a contribution. For me, it's a commitment."

The story was eventually removed from the official Scrum literature — perhaps it seemed uncharitable. But the principle it pointed at was exactly right, and its removal is itself a symptom of what went wrong.

Scrum works when the people doing the work are pigs. Fully committed. Consequentially exposed. When the product is wrong, they feel it. When the process slows the team down, they feel it. When the business owner's problem goes unsolved, they feel it. Their skin is in the game and the game's feedback reaches their skin.

Scrum degrades when the chickens accumulate.

A chicken is not a bad person. A chicken is a structurally consequence-free participant: someone who carries authority over how the work happens without bearing the outcome of that authority. They contribute something. They cannot be committed, because the structure doesn't allow it. They will move to the next engagement, the next team, the next organisation. The team will live with the consequences of their recommendations.

The best chickens know this about themselves. The strongest consultant scrum masters and agile coaches actively work to reduce their own authority — pushing consequence back onto the team, making themselves progressively less necessary, effectively working toward their own redundancy. That is a mark of genuine craft in a consequence-free role. But it is a character trait, and you cannot scale character. You cannot hire for it reliably across an organisation. You cannot depend on it as a structural guarantee.

This distinction matters more than any ceremony, any role definition, or any version of the Scrum Guide. Because a team of pigs running imperfect Scrum will self-correct — the feedback is immediate, the incentive to fix problems is intrinsic, and the process will evolve toward what actually serves the work. A team with too many chickens running perfect Scrum will drift toward process performance — because the people with the most authority over the process are the ones least exposed to whether it serves the product.

What Happened to Scrum

Sutherland and Schwaber encoded a pig observation into a framework. That was always going to be difficult — you cannot certify skin in the game. But the framework pointed at the right things. Self-organising teams. Direct customer contact. The scrum master conceived as someone embedded in the team's work, responsible for removing impediments that blocked delivery — not as an external observer of the team's process.

Then the industry arrived.

Not maliciously. Structurally. Organisations running Scrum at scale needed coordination mechanisms. The coordination mechanisms needed owners. The owners needed titles. The titles became roles. The roles became certifications. The certifications became hiring criteria. And at each step, the distance between process authority and process consequence grew a little wider.

The scrum master became a process coach. External to the team. Often shared across multiple teams. Measured on ceremony quality, team satisfaction scores, and adherence to the framework. Not on whether the product served the business. The sentence that captures the failure mode perfectly is one you will recognise if you have heard it: "I'll fix it next week — I have two other teams to coach." That sentence is structurally impossible if the scrum master is inside the team's consequence. It is inevitable if they are outside it.

The product owner — originally a role requiring genuine domain authority and business accountability — became a proxy. A translator sitting between the team and the real decision-maker, filtering business knowledge through the medium of user stories, insulating engineers from the domain rather than connecting them to it.

The infrastructure team, the one Scrum was partly designed to dissolve into the cross-functional whole, re-emerged as the CI/CD team, the platform team, the DevOps function. Different name badge. Same dashboard. Same structural distance from whether the product actually worked for the people who needed it. The dashboard stays green. The pipeline runs. And there is still plenty of time left over for Minesweeper.

And with each new chicken added, consequence density fell. The feedback loops that should have corrected mistakes grew longer and more attenuated. The process filled the gap — providing the appearance of rigour in the absence of the reality it was substituting for.

Scrum was a revolt against exactly this. It became exactly this.

The Definition of Done Is a Symptom

Nothing illustrates the problem more precisely than what happened to the Definition of Done.

Most DoDs are social contracts: Cucumber tests passing, peer review complete, product owner sign-off received. And those boxes answer a different question entirely: not "is this good?" but "whose fault is it if this is wrong?"

If all those boxes are checked, the engineer cannot be blamed if the functionality turns out wrong. They followed the process. The responsibility for whether it was the right thing to build is distributed so thinly across roles and sign-offs that it evaporates entirely. Nobody failed. The process succeeded. The business owner got something they didn't need, delivered on time.

The DoD is not a quality gate. It is a consequence substitute. It exists precisely because the people doing the work cannot feel directly whether it is right — because the business owner is not in the conversation, and because the consequence of being wrong has been spread so thinly across roles that nobody feels it acutely enough without a checklist to reach for.

Now consider the alternative. The business owner is genuinely part of the team — not as a stakeholder who attends the demo, but as a continuous presence in the work. The engineer who hits something that doesn't fit the model talks to them that day. Not next sprint. Not at the demo. That day. The working software is shown informally, mid-sprint, as a thinking tool — not as a deliverable but as a question: is this what you meant? The answer shapes the next two days of work, not the next sprint's backlog.

In that environment, what does the Definition of Done do? It documents what both parties already know, through a checklist that neither party needed to reach the answer. The DoD didn't produce the quality. The conversation did.

This is the cleanest diagnostic available for whether Scrum is serving your product or substituting for it: how prescriptive does your Definition of Done need to be? The more you need it, the further the business owner is from the work. A heavily checkbox-driven DoD is not evidence of good process hygiene. It is a measure of the consequence gap — the distance between the people who build and the people who know whether what was built is right.

Stories Are Not Tickets

The consequence gap shows up everywhere once you know to look for it, but nowhere more clearly than in how user stories are written and treated.

Alistair Cockburn, one of the authors of the Agile Manifesto, described the story card as a token for a conversation — a placeholder that represented a discussion yet to happen, not a specification already agreed. That is a precise and important idea. The card was never meant to replace the conversation. It was meant to prompt it.

What happened instead is that the card became the deliverable. The story became the ticket. And the conversation — the one that would have revealed what the domain actually needed — never happened, because the ticket already contained the answer.

A story written as a ticket describes how. "As an invoice clerk I want to export the invoice to PDF and email it to the customer." That is not a business need. That is a current business process, translated into acceptance criteria, handed to an engineer as a specification. The how has been decided before the what was understood.

The invoice clerk doesn't think of it as a how. For them, that is simply how invoicing works — how it has always worked, how they were trained to think about it. The mental model of the domain and the mental model of the current implementation have merged into one thing. When you ask them what they need, they describe what they do. This is not a failure of articulation. It is the natural epistemology of someone who lives inside a domain.

The engineer who receives "export to PDF and email" as a ticket implements export to PDF and email. The box is ticked. The DoD is met. And the actual business need — that the customer receives timely, accurate confirmation of what they owe — remains unexamined. Maybe PDF email is the right answer. Maybe the customer's system should pull it via API. Maybe the concept of "sending an invoice" is a legacy artefact of a paper-based process that software doesn't need to replicate at all. Nobody asked, because the story was a ticket, not a question.
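The difference between the how and the what can be made concrete in code. What follows is a minimal Python sketch, not a recommended design: every name in it (Invoice, InvoiceDelivery, PdfEmailDelivery, ApiPullDelivery, confirm_amount_owed) is invented for illustration, and the rendering and sending are stubbed out as strings. It keeps the need (the customer learns what they owe) as the interface and treats PDF email as only one candidate answer.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Invoice:
    number: str
    customer: str
    amount_due_cents: int


class InvoiceDelivery(Protocol):
    """The need: the customer learns what they owe. The how is left open."""

    def deliver(self, invoice: Invoice) -> str: ...


class PdfEmailDelivery:
    """The mechanism the ticket assumed (rendering and sending are stubbed)."""

    def deliver(self, invoice: Invoice) -> str:
        return f"emailed PDF of {invoice.number} to {invoice.customer}"


class ApiPullDelivery:
    """An alternative a conversation might surface: the customer's system pulls it."""

    def deliver(self, invoice: Invoice) -> str:
        return f"published {invoice.number} for {invoice.customer} to pull"


def confirm_amount_owed(invoice: Invoice, delivery: InvoiceDelivery) -> str:
    # The calling code depends on the need, not on PDF or email.
    return delivery.deliver(invoice)
```

Written this way, swapping PdfEmailDelivery for ApiPullDelivery is a one-argument change. Written as the ticket, PDF and email are baked into every caller before anyone asked whether they were the right answer.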

Treat the story as a discussion item instead — as new information about a domain the team is trying to understand, not a task to be executed — and the entire dynamic changes. The engineer's job becomes domain archaeology: stripping the legacy how from the domain owner's description to find the what underneath. What problem are you actually solving? What would good look like if you had no constraints from the way you currently do it? What would disappear from your working day if this worked perfectly?

Those questions are uncomfortable. They require the domain owner to separate themselves from their own practice. They require the engineer to be genuinely curious about a domain they don't live in. But that discomfort is where the model gets built — and the model is what the software should reflect.

This is also why two or three weeks without domain contact is so dangerous. Every day the engineer works from the story-as-ticket, they make decisions based on their current understanding of the domain. Each decision becomes the foundation for the next. By the end of the sprint, the assumptions are load-bearing. Changing them isn't a story revision. It is structural rework. And the DoD, dutifully signed off, certifies a structure built on assumptions nobody tested.

The Sprint Is a Learning Unit

The most persistent misunderstanding in Scrum practice is what a sprint is for.

A sprint is not a delivery unit. It is a learning unit. The question it should answer is not "what did we complete?" but "what do we understand now that we didn't understand before, and does the software reflect that understanding?"

This distinction changes everything about how the sprint runs, how the review is conducted, and what the next sprint is for.

If the sprint is a delivery unit, the review is a closing ceremony. Stories are demonstrated. Sign-offs are gathered. The board is cleared. Planning begins for the next batch. The product at the end of the sprint is the output. The backlog is the input for the next cycle.

If the sprint is a learning unit, the review is an opening conversation. The product at the end of the sprint is not the output — it is the new starting point. The most important question in the room is not "did we build what we planned?" but "given what we've built and what we've learned, where does this point next?"

This is why the business owner seeing the product for the first time at the demo is a signal that something has gone wrong — not right. By the time the demo happens, they should already know what's there, because they have been part of the conversation as it developed. The demo is a show and tell for the wider organisation — stakeholders, interested parties, people who benefit from visibility into progress. It is valuable for that. It is not the primary feedback mechanism. The primary feedback mechanism is the continuous conversation between engineer and business owner that has been happening all sprint, informally, around working software that is always current enough to think with.

The backlog, in this model, is not a queue of pre-specified work. It is a set of open questions — hypotheses about what the product needs to become, held loosely and revised continuously as understanding grows. The long-term planning the product owner holds is directional, not prescriptive: functional areas, broad horizons, strategic intent. The specific shape of each sprint emerges from where the product currently stands and what the team has most recently learned.

That is what Takeuchi and Nonaka called multilearning. It was not a process characteristic. It was what inevitably happened when people with full commitment to the outcome worked in direct contact with the domain. The learning was continuous because the consequence of not learning was immediate and personal.

Limiting the Chickens

None of this is an argument against scrum masters, agile coaches, or specialised platform teams. There are excellent people in all of those roles — people who compensate for the structural absence of skin in the game through personal commitment, genuine craft, and deep care about outcomes they will never formally be held accountable for.

But you cannot build a reliable system on character traits. You can admire them. You cannot depend on them at scale. And you cannot ignore what the accumulation of consequence-free authority does to the system around those individuals — however excellent they are.

The question to ask about every role on or around a Scrum team is not "is this person good at their job?" It is two questions: does this person feel it when the product fails to serve the business? Does this person feel it when the process slows the team down?

Two yes answers: keep them close, give them authority, trust their judgment. One yes: useful, but watch the ratio. Two no answers: they may be excellent, but they cannot be the majority, and they should not hold process authority over people who answered yes.

This is not about eliminating external expertise. It is about understanding what external expertise can and cannot provide. A good consultant scrum master brings experience, pattern recognition, and perspective that an internal team member might lack. What they cannot bring is consequence. And when consequence-free authority accumulates — scrum master, agile coach, platform team, architecture review board, all operating with authority over how the work happens but none bearing the outcome — the team learns quickly, and correctly, that the process does not belong to them. It was handed down from outside. So they perform it rather than own it. And a performed process is a checkbox factory almost by definition.

Scrum was partly designed to dissolve the independent infrastructure team — the group whose dashboard was their product, whose relationship to the actual product was mediated by tickets and queues. DevOps recognised the same problem and tried to dissolve the boundary between build and run. What neither fully resolved was the deeper pattern: that any specialised group whose success metric is their own domain, rather than the outcome of the product, will optimise for their domain. The pipeline will run. The ceremonies will happen. The dashboard will stay green.

The answer is not to abolish specialisation. It is to ensure that the people who feel the consequence of the product's success or failure are never outnumbered and never out-authorised by the people who don't.
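The two questions and the two limits above are simple enough to write down as a check. This is a hedged sketch in Python, with hypothetical names (Participant, review) and two assumptions of mine: "majority" is read as strictly more than half of the people counted, and "people who answered yes" is read as anyone with at least one yes.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Participant:
    name: str
    feels_product_failure: bool   # feels it when the product fails the business
    feels_process_drag: bool      # feels it when the process slows the team down
    holds_process_authority: bool = False

    @property
    def yes_count(self) -> int:
        return int(self.feels_product_failure) + int(self.feels_process_drag)


def review(team: List[Participant]) -> List[str]:
    """Return a warning for each limit from the text that the team breaks."""
    warnings = []
    two_no = [p for p in team if p.yes_count == 0]
    # Limit 1: two-no participants cannot be the majority.
    if len(two_no) * 2 > len(team):
        warnings.append("two-no participants are the majority")
    # Limit 2: two-no participants should not hold process authority
    # over people who answered yes to at least one question.
    has_exposed = any(p.yes_count > 0 for p in team)
    for p in two_no:
        if p.holds_process_authority and has_exposed:
            warnings.append(f"{p.name} holds process authority without consequence")
    return warnings
```

For a team of an engineer (two yes), a business owner (one yes), and a shared coach (two no) who holds process authority, review returns a single warning naming the coach. The code cannot tell you who feels what. Answering the two questions honestly for each person is still the hard part.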

Back to Honda

Honda's engineers did not have a Definition of Done. They had a standard they could not compromise, enforced not by a checklist but by the complete absence of distance between themselves and the consequences of falling short.

They did not have a product owner translating customer needs into stories. They had customers whose reactions to prototypes shaped the next iteration of the design directly, through the hands and judgment of the people doing the work.

They did not have a process coach facilitating their ceremonies. They had senior engineers whose authority came from depth of knowledge and shared consequence — people who removed obstacles because the obstacles were in their way too.

What Takeuchi and Nonaka observed was not a process. It was what process looks like when it is fully owned by the people who cannot afford for it to fail. The ceremonies that mattered emerged from the work. The ones that didn't, didn't happen — because nobody with skin in the game had time for them.

Scrum, at its best, is an attempt to recreate that condition in software teams. The framework is sound. The ceremonies are scaffolding. The roles are starting points. None of them are the point.

The point is consequence density. Who in the room cannot afford to be wrong? Who will feel it tomorrow if the model is off? Who has no dashboard to hide behind and no next engagement to retreat to when the product fails?

Keep those people close. Give them authority. Let the process serve them rather than the other way around. Make the business owner's continuous presence the normal condition rather than the exceptional one. Treat the story as a question, not a ticket. Let the sprint answer it. Let the product as it stands be the permanent starting point for the next conversation.

And when you feel the urge to add another sign-off, another role, another ceremony — ask first whether you are closing a genuine gap or substituting for a conversation that should just happen.

Because if the business owner is in the room, you already know the answer. And you don't need a checkbox to confirm it.

This article is part of a series on software engineering craft. The previous piece, "The Gods That Ate the Engineers," examines how the broader software industry mistook its tools for its craft.

The Gods That Ate the Engineers

Leon Pennings — Wed, 27 May 2026 05:55:38 +0000

How software development mistook its tools for its craft — and what it is paying for that mistake

There is a conversation that happens in software teams every day. Someone proposes a simpler approach. Someone else says "but we need this to scale." The first person asks what scale is actually required. The second person explains that the architecture team has decided on the standard stack. The first person points out that the application has twenty-five users. The second person suggests talking to the infrastructure architect.

The conversation ends there. Not because the technical argument was resolved. Because the two engineers were no longer speaking the same language. One was speaking the language of context — what does this problem actually require? The other was speaking the language of compliance — what does the standard say we should do? Those two languages have no shared grammar. The conversation cannot proceed, so it escalates instead.

This is not a story about stubbornness. It is a story about a profession that has progressively lost the vocabulary of first principles, and replaced it with the vocabulary of tools — and what happens when the people who should be having the hard conversation have never been taught the words.

The Measurement Problem Nobody Talks About

Software engineering has a property no other engineering discipline shares: its quality is almost entirely invisible.

A bridge that is over-engineered costs more to build. A building with poor thermal design costs more to heat. Even a book that doesn't serve its readers fails to sell. In each case there is a signal — a cost, a measurement, a market response — that connects engineering decisions to outcomes.

Software has one test: does it work? If the application runs in production, the engineering passes. If it doesn't, it fails. There is no measurement for whether it could have been built in a fraction of the time with a fraction of the complexity. Nobody built that version. There is no reference to compare against.

This is not just a gap in measurement. It is the foundational problem of the entire discipline. Because when the only validation is "it works," everything that produces working software becomes equally valid. The team that spent three months on spikes and produced a distributed microservices architecture that nobody fully understands — it works. The team that spent one day with domain experts, modeled the core concepts, and built a coherent system in three weeks — it also works. The outcomes look identical. The costs are incomparable.

Fred Brooks captured this tragedy in 1986: every system is built only once. There is no second version built with different assumptions, run for five years, and compared on total cost of ownership. The counterfactual does not exist. The cost of bad decisions is permanently invisible.

What fills the vacuum left by absent measurement? Authority. Convention. And demigods.

The Rise of the Demigods

A demigod is not a false god. That is important. A false god has no power. A demigod has real power — but finite power, power over a specific domain, power that has limits it will not advertise.

TDD is a demigod. It genuinely reduces certain classes of bugs. It creates a feedback loop between intention and implementation. Used with understanding, it is a valuable practice. But TDD defines the questions before it discovers the theory. Write the test, make it pass. The test describes an action — a thing the system should do. It says nothing about the mechanism that should enable that action, the underlying structure that would make the action natural rather than bolted-on. You can TDD your way to a perfectly tested mess. The tests are green. The architecture is incoherent. The demigod delivered what it promised and nothing more.
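A small illustration of a perfectly tested mess, offered as a sketch only: the quote function, the customer tiers, and the discount numbers are all invented for this example. Each test describes an action, and each was satisfied by bolting on one more branch.

```python
def quote(unit_price_cents: int, quantity: int, customer_tier: str) -> int:
    """Price an order. Each branch was added to make one test pass."""
    total = unit_price_cents * quantity
    if customer_tier == "gold":
        total = total * 90 // 100      # added for test_gold_gets_10_percent
    if quantity >= 100:
        total = total * 95 // 100      # added for test_bulk_gets_5_percent
    if customer_tier == "gold" and quantity >= 100:
        # added for test_discount_floor: never go below 88% of list price
        total = max(total, unit_price_cents * quantity * 88 // 100)
    return total


def test_gold_gets_10_percent():
    assert quote(1000, 1, "gold") == 900


def test_bulk_gets_5_percent():
    assert quote(1000, 100, "standard") == 95000


def test_discount_floor():
    assert quote(1000, 100, "gold") == 88000
```

All three tests pass. Nothing in the code names the concept the rules belong to, such as a pricing policy or a discount with a floor, so the next rule will arrive as a fourth branch and a fourth green test.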

CQRS is a demigod. Separating reads from writes treats a real symptom — but that symptom is often produced by a deeper failure. When reads and writes conflict, it is frequently because the domain model isn't carrying its weight: state is inconsistent, rules are scattered, the persistence layer has leaked into everything. CQRS resolves the tension by physically separating it, at significant architectural cost, while the cause goes unexamined. The mess that made CQRS feel necessary is sealed behind the architecture and forgotten.

The conventional wisdom holds that complex domains — high-scale transactional systems, regulated industries, extreme concurrency requirements — genuinely justify this kind of architecture. The conventional wisdom has it backwards. Those are precisely the domains where a behavior-carrying domain model would deliver the most value, making invariants explicit and enforcing consistency rules at the model level rather than externalizing them into orchestration layers and read/write splits. What looks like sophisticated enterprise architecture is, in many cases, sophisticated coping with a modeling failure that the architecture was never asked to fix.

Microservices are a demigod. Scrum is a demigod. Each of them originated as an observation — someone looked at good engineering practice, noticed a pattern, and named it. The name spread. The observation became a methodology. The methodology became a certification. The certification became a hiring criterion. And somewhere in that journey, the principle the observation was pointing at quietly disappeared.

What remains is ceremony. Scrum was an insight about feedback loops: build something small, expose it to reality, learn, adjust. Now it is planning poker, velocity points, and a definition of done. The ceremonies survived. The epistemology was discarded. You can run perfect Scrum and never once have a conversation that deepens your understanding of the domain you are building for.

Spring is a demigod — and the most instructive one, because it did not merely obscure first principles. It industrialized their replacement.

Spring's recipe is seductive in its clarity: a controller receives the request, a service orchestrates the logic, a repository handles the persistence. Learn the recipe and you can implement almost any user story. The pattern is consistent, communicable, and scales across teams. It is also procedural programming wearing object-shaped clothing. The service class becomes the address for all behavior, because the recipe has no concept of behavior belonging to the domain objects themselves. Every new requirement gets the same answer: add a method to the service. The mechanism — the structure of the business domain, the responsibilities of its concepts, the rules that govern its behavior — is never considered, because the recipe answered the structural question before you asked it.

This is not a side effect of Spring. It is what Spring teaches. A library gives you capabilities and leaves the thinking to you. Spring gives you the thinking pre-done. Engineers who learned Spring as their foundation did not learn to reason about structure — they learned to apply a structure that was handed to them. When the recipe always fits, you never develop the judgment to know when it doesn't. The capacity atrophies quietly, and working software confirms at every step that nothing is wrong.

Spring did not create the anemic domain model. But it mass-produced it, certified it, and made it the industry default. It turned a modeling failure into a career path.
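The shape of the recipe, and of the anemic model it tends to produce, can be sketched without Spring at all. The following is plain Python rather than Java, it uses no Spring API, and every class name (OrderRecord, OrderRepository, OrderService, Order) is hypothetical. The first half follows the recipe, with a data-only object and the rule living in the service. The second half is one possible alternative in which the order carries its own rule.

```python
from dataclasses import dataclass, field
from typing import Dict, List


# --- Recipe shape: data-only object, all behavior in the service ---

@dataclass
class OrderRecord:
    order_id: str
    line_totals_cents: List[int] = field(default_factory=list)
    status: str = "OPEN"


class OrderRepository:
    """In-memory stand-in for the persistence layer."""

    def __init__(self) -> None:
        self._rows: Dict[str, OrderRecord] = {}

    def save(self, order: OrderRecord) -> None:
        self._rows[order.order_id] = order

    def find(self, order_id: str) -> OrderRecord:
        return self._rows[order_id]


class OrderService:
    """Every new requirement becomes another method here."""

    def __init__(self, repo: OrderRepository) -> None:
        self.repo = repo

    def submit(self, order_id: str) -> None:
        order = self.repo.find(order_id)
        if not order.line_totals_cents:      # the rule lives outside the order
            raise ValueError("cannot submit an empty order")
        order.status = "SUBMITTED"
        self.repo.save(order)


# --- Alternative: the order carries its own rules ---

class Order:
    def __init__(self, order_id: str) -> None:
        self.order_id = order_id
        self._line_totals_cents: List[int] = []
        self._status = "OPEN"

    def add_line(self, total_cents: int) -> None:
        if self._status != "OPEN":
            raise ValueError("cannot change a submitted order")
        self._line_totals_cents.append(total_cents)

    def submit(self) -> None:
        if not self._line_totals_cents:
            raise ValueError("cannot submit an empty order")
        self._status = "SUBMITTED"

    @property
    def status(self) -> str:
        return self._status
```

In the recipe shape, any code holding an OrderRecord can set status to "SUBMITTED" directly and bypass the rule, because the rule belongs to the service and not to the order. In the alternative, the only route to a submitted order runs through the order's own submit method.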

AI is a demigod — the latest, the most powerful, and the most dangerous one the profession has yet encountered.

AI is genuinely transformative at the implementation level. It can generate code, implement features, navigate unfamiliar frameworks, and eliminate enormous amounts of repetitive work. In a well-understood domain with an explicit model, it is an extraordinary accelerator — handling the mechanical expression of things the engineer already understands. That is real and significant power.

But AI has the same hard limit every demigod has. It cannot ask what the mechanism should be before implementing the action. It cannot determine whether a concept belongs in the domain model or whether it is accidental complexity in disguise. It cannot notice that the service class has become a procedural script, or that the architecture has answered the structural questions before anyone understood the structure. Give AI a well-modeled domain and it accelerates good engineering. Give it a recipe and a backlog and it produces Spring-shaped procedural code at a speed no human team could match — complete with tests, documentation, and a green pipeline, none of which will tell you that the map was never drawn.

The previous demigods papered over the absence of first principles. AI industrializes that papering at a velocity that makes the underlying absence nearly impossible to see and nearly impossible to recover from. The mess accumulates faster than any previous generation of engineers could have produced it. Every demigod arrived as a silver bullet. AI is the latest — and the profession is following the pattern with the same fidelity it always has.

When Tools Become Identity

Here is where the measurement problem and the demigod problem combine into something more serious — and where the economic machinery that drives the industry becomes visible.

Software development scaled faster than the supply of engineers who understood it deeply. The response was industrialization. If you are running a software factory, you need interchangeable parts. Interchangeable engineers require standardized tools. You cannot factory-manage engineering judgment — it is invisible, contextual, slow to assess, and impossible to replicate at scale. But you can factory-manage Spring Boot certification. You can standardize on Kubernetes. You can mandate the architecture diagram before the domain conversation happens, because the architecture diagram fits into a project timeline and engineering judgment does not.

The factory model did not choose tools over judgment because it was ignorant of the difference. It chose tools because tools are manageable and judgment is not. That choice, made millions of times in hiring decisions and project kickoffs and architecture reviews, compounded into an industry.

The economic incentives completed the picture. An engineer cannot put "sound engineering judgment" on a CV. They can put Kubernetes, Kafka, Spring Boot, and AWS. The market rewards tool-hoarding because tool-hoarding is legible and judgment is not. So engineers rationally invest in tools. They accumulate certifications. They learn the next framework. The career incentive and the factory requirement point in the same direction, and the profession follows.

The consequence is a generation of practitioners who were never taught the underlying principles — not because they are poor engineers by disposition, but because the path through the profession did not require those principles. Framework knowledge was sufficient. It got them hired. It gets features shipped. It passes the only test anyone applies.

This is where the Dunning-Kruger effect enters — and it enters structurally, not individually. When "it works" is the only feedback signal, the gap between tool expertise and engineering judgment produces no visible failures. The feedback loop that would expose the gap never fires. An engineer who has only ever navigated by Spring's recipe has no evidence that another kind of navigation exists, because both arrive at working software.

What happens when that engineer is challenged on a technical decision? They cannot retreat to first principles, because those principles were never their foundation. They can only defend the tool. And defending the tool looks like defending engineering — because in the world they have always inhabited, they are the same thing.

This is why the conversation about the build agent ends with "talk to the infrastructure architect." Not stubbornness. Not bad faith. The argument has moved to terrain where their map does not reach, and the only available response is to invoke authority rather than reasoning. The map was never drawn because nobody required it.

The Cost of Working Software

At this point a reasonable person might object: so what? It works, doesn't it? The software ships. The business runs. Teams are productive. Perhaps the architecture is heavier than it needs to be, but that is a philosophical concern, not a practical one.

It is not a philosophical concern. It has a price. And that price is paid in headcount, infrastructure spend, and organizational mass — every month, permanently, at a scale most organizations have never stopped to calculate because they have nothing to compare it against.

Start with the team. A Scrum-based delivery organization does not just have engineers. It has product owners to translate business needs into stories, scrum masters to run the ceremonies, agile coaches to optimize the ceremonies, and program managers to coordinate across the teams that have multiplied because the architecture decomposed the system into services each requiring ownership. None of these roles existed before the ceremony required them. They are not a consequence of software complexity. They are a consequence of the process layer that was wrapped around it.

The infrastructure follows the same logic. A well-modeled application, sized honestly to its problem, might run on a handful of servers with a deployment process a single engineer can understand. The standard stack requires container orchestration, service meshes, distributed tracing, centralized log aggregation, secrets management, cloud cost governance, and a security perimeter that scales with the number of services rather than the complexity of the domain. Someone has to build and own that infrastructure — which means an infrastructure team. Someone has to own the pipeline tooling — which means a platform team. Someone has to operate the observability stack that exists entirely because the system is too opaque to reason about directly — which means an observability practice, which means tooling budgets, which means vendor contracts.

Count it all. The ceremony layer, the infrastructure department, the platform team, the observability tooling, the architecture review board that exists because the architecture requires governing. Compare it to a team organized around an honest domain model, sized to the actual problem, with infrastructure that serves the domain rather than managing the accidental complexity the domain was never asked to absorb.

The difference in team size is not marginal. Doubling is optimistic. Tripling is closer. When infrastructure and tooling costs are included, the multiplier on total cost of ownership reaches further than most organizations want to calculate — because the calculation would require admitting that the standard stack is not an engineering choice. It is an organizational commitment, billed indefinitely, justified by working software that could have been built and maintained at a fraction of the cost by a team that understood what it was building.

The most expensive software is the software everyone agrees is fine.

What First Principles Actually Means

First principles in software engineering are not a methodology. They are not a framework. They cannot be certified.

In any engineering discipline, first principles means reasoning from what is actually true about the problem — from the undeniable constraints of physics, economics, or logic — before selecting any tool or approach. In bridge building, you start with loads, materials, and forces. In software, the undeniable truth is the business domain itself: what it does, what it needs, what rules govern it, what concepts exist within it. Everything else is a choice. The domain is not a choice. It is the ground the system must stand on.

First principles therefore begins with a single question: what mechanism does this business need, and what is the structure of that mechanism?

Not: what actions need to happen. Not: what user stories need to be implemented. Not: what does the recipe provide.

The distinction between actions and mechanisms is the one the entire profession routinely misses — and it is the one that determines everything that follows.

An action is something the system does. Place an order. Send an invoice. Notify a customer. Actions are visible, speakable, easy to write as user stories. They are also infinite. There is always another action. A system built around implementing actions never reaches coherence — it reaches a different kind of completeness, the kind where every story is closed and nobody can tell you where any particular rule lives.

A mechanism is the structure that makes actions possible: the domain concepts, their responsibilities, their relationships, the rules they enforce. Mechanisms are finite. A business domain has a bounded set of real concepts. Once you understand them, new actions find their natural place. The mechanism does not need to change because a new action arrived; the action was always expressible in terms of the mechanism. You just had not asked for it yet.
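
To make the distinction concrete, here is a minimal sketch, assuming a hypothetical order domain. The names Order, OrderLine, and reorder, and the rule that a shipped order cannot change, are invented for illustration and are not taken from any real system. The mechanism is the small class that owns the rules; the action is written purely in terms of it.

```python
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass(frozen=True)
class OrderLine:
    product: str
    quantity: int
    unit_price: Decimal

    def total(self) -> Decimal:
        return self.unit_price * self.quantity


@dataclass
class Order:
    """Hypothetical mechanism: owns its lines and the rules about them."""

    lines: list[OrderLine] = field(default_factory=list)
    shipped: bool = False

    def add(self, line: OrderLine) -> None:
        # The rules live in one place, so every action inherits them.
        if self.shipped:
            raise ValueError("a shipped order cannot be changed")
        if line.quantity <= 0:
            raise ValueError("quantity must be positive")
        self.lines.append(line)

    def total(self) -> Decimal:
        return sum((line.total() for line in self.lines), Decimal("0"))

    def ship(self) -> None:
        if not self.lines:
            raise ValueError("an empty order cannot be shipped")
        self.shipped = True


def reorder(previous: Order) -> Order:
    """A 'new action' that needs no new rules: it only uses the mechanism."""
    fresh = Order()
    for line in previous.lines:
        fresh.add(line)
    return fresh
```

In this sketch, reorder arrived after the mechanism existed and required no change to Order. Anyone asking where the "shipped orders are closed" rule lives has exactly one place to look.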

This is why a day spent with domain experts outperforms four sprints of discovery spikes. The spikes are action-oriented. They produce implementations of specific scenarios, each one leaving a deposit of logic somewhere convenient, none of them building toward a coherent structure. The domain conversation is mechanism-oriented. It produces understanding of what the system actually is — and from that understanding, implementations become fast, because they are no longer navigating blind.

The domain expert knows the story. The engineer's job is to understand the mechanisms that story requires — and then model those mechanisms honestly, directly in code, without a documentation layer between the understanding and the implementation. A whiteboard sketch is a thinking tool. The code is the model. There is no pile of upfront design, no architecture document that creates its own maintenance burden and its own resistance to change. Formal documentation does not just resist change mechanically — it raises the social cost of being right. The person who says the abstraction is wrong is not raising a technical question. They are implicitly criticising the judgment of everyone who approved the document. So people stop saying it. Understanding goes directly into structure, continuously, as the understanding grows.

This is not big upfront design. It is the opposite. Big upfront design tries to answer everything before building anything. First principles thinking says: understand what is true now, encode it honestly, and stay honest as truth evolves. A payment system models credit cards — until digital wallets arrive, and the domain reveals the real concept was always a payment method. The model grows because the understanding grew. Not because a story was implemented. Because something was learned.
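
The payment example might look roughly like this once the understanding has grown. This is a sketch under assumptions: PaymentMethod, CreditCard, DigitalWallet, Payment, and receipt_line are hypothetical names chosen for illustration, not a prescribed design.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Protocol


class PaymentMethod(Protocol):
    """The concept the domain revealed once wallets arrived."""

    def describe(self) -> str: ...


@dataclass(frozen=True)
class CreditCard:
    last_four: str

    def describe(self) -> str:
        return f"card ending {self.last_four}"


@dataclass(frozen=True)
class DigitalWallet:
    provider: str

    def describe(self) -> str:
        return f"{self.provider} wallet"


@dataclass(frozen=True)
class Payment:
    amount: Decimal
    method: PaymentMethod  # in the earlier model this was `card: CreditCard`

    def receipt_line(self) -> str:
        return f"{self.amount} paid by {self.method.describe()}"
```

The abstraction here exists because the domain now has two real kinds of payment method, not because a tool asked for a seam. Before wallets arrived, the honest model was the concrete CreditCard on its own.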

The Speed That Nobody Measures

The most persistent myth about principles-first development is that it is slow.

It is not slow. It is the fastest path available — and it gets faster as it goes, while the alternative gets slower.

Tool-driven, action-focused development feels fast because it is always moving. Tickets close. PRs merge. Velocity is high. But the team is navigating by taking the next available turn rather than reading the terrain. Enormous distance is travelled to advance a short way. Each new feature lands in a codebase without a map, and finding where it belongs takes longer each time, because the codebase is larger and less coherent than it was before.

Principles-first development feels slower at the start because the team is reading the map. But the map converts future distance into present understanding. Features find their place. The model tells you where things belong. The implementation follows from the understanding — and because the model is clear, the implementation is the smaller part of the work, not the larger.

The asymmetry compounds over time. Principles-first gets faster. Tool-driven gets slower. They do not just start at different speeds — they move in opposite directions. And because both produce working software, the team on the slower path has no signal that another trajectory exists. The velocity metric measures motion, not progress. You can cover enormous distance going the wrong way and call it delivery.

The Conversation We Can No Longer Have

Something has been lost that is harder to recover than a methodology or a framework.

When two engineers disagree about a technical decision, resolution requires a shared language: first principles. Does this decision reflect the actual complexity of the domain? Is the added mechanism justified by what the domain requires? Is this accidental complexity or essential complexity? Those questions have answers that are reasoned, not asserted. But they require both participants to have internalized the same foundation — to reason from what is true about the problem, rather than advocate from what their tools provide.

When tool expertise replaces engineering judgment, that conversation becomes structurally impossible. Not because people argue in bad faith, but because they are operating from entirely different premises. One person is asking what the domain requires. The other is asserting what the standard stack provides. These are not positions that can be reconciled by better argument. They are not even positions in the same debate.

The engineer with first principles asks: what scale do we actually need? The engineer with tools answers: we use the scalable architecture. The first engineer points to the user count. The second engineer escalates to the infrastructure architect. This is not a failure of communication. It is a failure of shared foundation — and the shared foundation was never built, because the profession stopped requiring it.

The solution is not a new methodology. It is not another demigod. The last thing the industry needs is a certification in first principles thinking. It is the recovery of something quietly discarded as the profession industrialized — the understanding that engineering judgment precedes tool selection, that mechanisms precede actions, that one focused conversation about what the domain actually is will outperform any number of sprints implementing what the domain appears to do.

The Only Test That Matters

Software engineering currently applies one test: does it work?

That test is necessary but nowhere near sufficient. A system can work and be incomprehensible. A system can work and cost ten times what it should have. A system can work and reflect no coherent understanding of the domain it serves. The pipeline is green. The retrospective is positive. The modeling failure is invisible, as it always was.

The organization built to sustain the demigod stack — the scrum masters and platform teams and observability engineers and architecture review boards — has a structural interest in the stack continuing to be necessary. The demigods do not just persist because engineers worship them. They persist because the organizations that grew up around them cannot afford to question them.

That is where the profession is. Not failing. Working. Expensively, slowly, with tripled teams and bloated infrastructure and a generation of engineers who were handed a recipe instead of a craft.

Until someone asks: but what scale do we actually need?

And the room goes quiet.

And someone says: talk to the infrastructure architect.

And nothing changes — until engineers are once again taught that the map comes before the journey, and that knowing how to apply a recipe is not the same as knowing how to think.

The Properties of Enterprise Software That Lasts

Leon Pennings — Thu, 21 May 2026 08:43:16 +0000

"Perfection is achieved not when there is nothing more to add, but when there is nothing more to remove." — Antoine de Saint-Exupéry

Introduction

Enterprise software is different from other software. Not in the technologies used to build it, not in the frameworks, not in the methodologies. It is different in its purpose: it must work correctly today, remain correct over time, survive the people who built it, and adapt to a business domain that will change in ways nobody can fully predict. Most software is built to solve today's problem. Enterprise software must be built to outlast today's understanding.

That is a fundamentally different design goal. And it demands a fundamentally different way of thinking about software — about what matters, what doesn't, and what the job of a developer actually is.

The code is downstream of the thinking. The properties which determine whether enterprise software survives — or quietly becomes the system nobody dares touch — are not primarily technical. They are properties of understanding. And the thinking starts long before the first line is written.

The Six Properties

1. Longevity

The core of enterprise software should be able to survive ten to fifteen years. Not the UI framework. Not the ORM. Not the messaging library. The core: the domain logic, the structural decisions, the way the system understands and represents the business.

This sounds obvious until you consider how rarely it is treated as a design constraint. Most development decisions are made under short-term pressure: the sprint deadline, the current team's preferences, the framework that is fashionable today. None of those inputs have any relationship to what the system will need to be in year eight.

Longevity is not achieved by predicting the future. It is achieved by not over-committing to the present. Every unnecessary dependency, every piece of logic tied to a specific framework's idiom, every abstraction built around today's tooling rather than today's domain — these are bets that the present will continue. In enterprise software, the present never continues long enough.

Longevity is the north star. The properties that follow are the means to achieve it.

2. Upgradeability

Upgradeability is not about keeping dependencies current. Keeping dependencies current is maintenance. Upgradeability is structural: it is the capacity of the system to accept functional change without requiring a rewrite of its core.

This distinction matters enormously. A system can have perfectly up-to-date dependencies and be completely unupgradeable — because its structure was built around the features known at the time, implemented in a way that assumes those features are the final shape of the domain. When the business changes, and it will, there is nowhere to go.

Building for upgradeability means building with the understanding that what you know today is not everything. It does not mean building features you don't need — that is the opposite of the principle. It means implementing what you know today in a way that does not foreclose tomorrow. The structure should be open to extension, refactoring, and replacement at the right level of granularity.

This is also where the conventional wisdom about test coverage becomes a liability. Class-level unit tests — one test class per production class, testing the internal mechanics of each — are a contract on the current implementation. They make refactoring expensive by breaking whenever the internals change, even when the behavior is preserved. Over time, they become the reason the system cannot be restructured: the test suite has calcified the implementation.

Behavioral tests — tests that assert what a piece of functionality does, not how a particular class does it — are a contract on the domain. They survive refactoring because refactoring does not change behavior, only implementation. Upgradeability requires the right level of test coupling. Tests should be coupled to what the system does, not to how it currently does it.
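
A minimal sketch of the difference, assuming a hypothetical discount rule (10% off orders of 100 or more). The names price_after_discount and _TierTable are invented for illustration. The tests assert only on the public outcome, so the internal helper can be renamed, merged, or deleted without touching them.

```python
from decimal import Decimal


class _TierTable:
    """Internal helper; free to change or disappear in a refactor."""

    def rate_for(self, order_total: Decimal) -> Decimal:
        if order_total >= Decimal("100"):
            return Decimal("0.10")
        return Decimal("0")


def price_after_discount(order_total: Decimal) -> Decimal:
    """Public behaviour: orders of 100 or more get 10% off (hypothetical rule)."""
    rate = _TierTable().rate_for(order_total)
    return order_total * (Decimal("1") - rate)


def test_large_orders_get_ten_percent_off() -> None:
    # Coupled to what the system does, not to _TierTable.
    assert price_after_discount(Decimal("200")) == Decimal("180")


def test_small_orders_pay_full_price() -> None:
    assert price_after_discount(Decimal("99")) == Decimal("99")
```

A class-level test would instead instantiate _TierTable and assert on rate_for directly. That test breaks the day the tier table is folded into the pricing function, even though no customer would ever see a different price.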

3. Maintainability

Maintainability in long-lived software is primarily a question of dependency discipline. Every external dependency is a commitment: to a version, to an API contract, to a community that may or may not continue to support it. Over fifteen years, many of those commitments will become liabilities.

The critical discipline is asking, for every dependency: what does this actually buy us? Not in theory but in practice, in this specific system, for this specific use case. The question is not whether a dependency is good in the abstract. A battle-tested cryptography library, a well-maintained time-handling library, a parser for a complex format: these earn their place because the alternative is genuinely worse. The question is whether this dependency serves this production system's domain needs, or whether it serves the tooling, the framework preference, or the developer's convenience.

The dependency that should be rejected without hesitation is the one whose primary justification is testability of the production code. Testability is a testing concern, not a production concern. Production code should not be structured, abstracted, or made more complex to accommodate the needs of the test suite.

This manifests in two particularly damaging patterns. The first is mocking-driven architecture: interfaces created not because the domain has multiple implementations of a concept, but because the test framework needs a seam to inject a mock. An interface with one real implementation, existing purely to enable a unit test, adds a layer of indirection with no domain justification. Every future reader follows the code, hits the interface, and must go find the implementation. The test was marginally easier to write. Every reader pays for that convenience forever.

The second is Aspect-Oriented Programming applied to cross-cutting concerns. The promise was clean separation — keep business logic free of logging, transactions, security, caching. In practice, the result is code where you cannot tell what is executing by reading it. The aspects are invisible in the source. Behavior is woven in at runtime by configuration that must be hunted for separately. You need a debugger to understand what your own code does. That is not decoupling. It is hidden coupling, which is strictly worse than visible coupling because at least visible coupling can be read.

Both patterns share the same failure: a tooling concern reshaped the production code in ways that made it harder to understand. The test suite or the framework became easier to work with. The system became harder to reason about. That is the wrong trade, and it compounds over fifteen years in ways that eventually make the system unreformable.

The simpler path is to make the production code so clear in its intent that the need for complex testing infrastructure is reduced rather than accommodated. Nobody tests string.trim() — not because someone decided it was below the testing threshold, but because its intent and behavior are completely transparent. The ambition for domain logic should be the same. order.send() can be just as obvious if the implementation reads like a statement of business intent rather than a sequence of technical operations.
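
What an order.send() that reads like a statement of business intent could look like is sketched below. Everything in it is an assumption made for illustration: the Customer and Order classes, the rule that only a paid, non-empty order may be sent, and the wording of the notification.

```python
from dataclasses import dataclass, field


@dataclass
class Customer:
    name: str
    notifications: list[str] = field(default_factory=list)

    def notify(self, message: str) -> None:
        self.notifications.append(message)


@dataclass
class Order:
    customer: Customer
    items: list[str]
    paid: bool = False
    sent: bool = False

    def send(self) -> None:
        # Reads as the business rule: only a paid, non-empty order goes out,
        # and the customer hears about it.
        self.must_be_paid()
        self.must_have_items()
        self.sent = True
        self.customer.notify(
            f"Your order of {len(self.items)} item(s) is on its way"
        )

    def must_be_paid(self) -> None:
        if not self.paid:
            raise ValueError("an unpaid order cannot be sent")

    def must_have_items(self) -> None:
        if not self.items:
            raise ValueError("an empty order cannot be sent")
```

There is no interface and no injected collaborator here. A domain expert could read send() top to bottom and confirm or dispute each line, which is the point of the sketch.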

4. Extensibility

Extensibility requires locatability. Before you can extend a piece of functionality, you must be able to find it — and find it with confidence that you have found all of it, not just the most obvious part.

This is where fat services fail. When business logic accumulates in large service classes organised around user stories or features, the domain structure disappears. Logic that belongs together by domain reason is separated. Logic that is separate by domain reason collides in the same class. Over time, the service becomes an archaeological record of every feature request, in chronological order, and understanding what it does requires reading its entire history.

Extensibility is only achievable when the code is structured around the domain — around what the business actually is, not around how it was requested. When that structure exists, adding a new capability means finding the right place in a coherent map. When it does not exist, extending the system means navigating a maze and hoping you found everything relevant.

5. Readability

Readability is not a soft property. It is not aesthetic. It has direct economic consequences over a fifteen-year lifespan that compound in ways that eventually make a system unreformable.

The measure of readability in enterprise software is not whether an experienced developer finds the code elegant. It is whether a non-engineer, such as a domain expert, a compliance officer, or a business analyst, can follow the intent and structure of the code and recognise their domain in it. This does not mean every line reads as plain prose. Some domains have irreducible technical density: complex financial calculations, regulatory rule engines, actuarial models. The bar is not that the implementation is self-explanatory to someone without domain expertise. The bar is that the structure expresses the domain, that the intent is visible, and that the domain expert can follow the logic well enough to identify where their understanding is or is not correctly represented.

If the code reads like hocus pocus at the structural level to the person who understands the business, the code has failed at its most important communication task.

This standard has consequences for every micro-decision in implementation. It argues against stream operations where a for-loop is clearer to a broader audience — not because streams are wrong, but because in domains where large in-memory sets are never permitted by design, the performance justification evaporates and only the readability cost remains. It argues against boilerplate reduction that sacrifices expressiveness for terseness. It argues against every clever idiom that shortens the code for its author while lengthening the cognitive load for its future readers.

"Boilerplate" is only boilerplate if it has no business purpose. Code that is verbose because it is expressing a business process is not boilerplate — it is documentation, in the only place documentation is always current. The argument to reduce it is always an argument to optimise for the writer. In enterprise software, the reader is nearly always more important. The code will be read an order of magnitude more times than it is written, by people who were not present when it was created.

On large data sets specifically: the correct architectural response is not to optimise how they are processed in memory — it is to enforce a boundary that prevents unbounded datasets from reaching the application layer at all. Chunk the data before it is loaded. This is an architectural constraint, not a performance trick. By making large in-memory sets structurally impossible, the design eliminates the entire class of optimisation pressure they create. The complexity of cursor management and pagination lives at the data access boundary, where it belongs, not scattered as stream operations through business logic. The upstream constraint produces downstream simplicity.
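
A minimal sketch of that boundary, using Python's standard sqlite3 module. The invoice table, its columns, and the function names are hypothetical. The cursor handling lives in one data access function, and the business logic is a plain loop over chunks whose size is bounded by construction.

```python
import sqlite3
from collections.abc import Iterator


def invoice_chunks(
    conn: sqlite3.Connection, chunk_size: int = 500
) -> Iterator[list[tuple[int, int]]]:
    """Data access boundary: callers never receive more than chunk_size rows."""
    cursor = conn.execute("SELECT id, amount_cents FROM invoice ORDER BY id")
    while True:
        rows = cursor.fetchmany(chunk_size)
        if not rows:
            return
        yield rows


def total_outstanding_cents(conn: sqlite3.Connection, chunk_size: int = 500) -> int:
    """Business logic: a readable loop, with no cursor management in sight."""
    total = 0
    for chunk in invoice_chunks(conn, chunk_size):
        for _invoice_id, amount_cents in chunk:
            total += amount_cents
    return total
```

A small usage example against an in-memory database:

```python
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoice (id INTEGER PRIMARY KEY, amount_cents INTEGER)")
conn.executemany(
    "INSERT INTO invoice (amount_cents) VALUES (?)", [(100,), (250,), (50,)]
)
print(total_outstanding_cents(conn, chunk_size=2))
```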

Readability is the condition that makes the other properties achievable. Code that reads like the domain can be upgraded because the domain is visible in it. Code that expresses intent clearly can be maintained because its purpose is self-evident. Code that maps the domain accurately can be extended because the map can be followed. It is not one property among five — it is the keystone.

6. Organisation

Organisation is qualitatively different from the first five properties. Those are visible in the codebase — you can read them, measure them, argue about them in a code review. Organisation is visible in what the codebase was allowed to become. It is the soil in which the other properties grow or fail to grow. Making it an explicit pillar says: this cannot be managed by ignoring it.

The question every development team eventually confronts is whether the organisation is supportive or restrictive. The honest answer is that it is almost always intended to be supportive and frequently experienced as restrictive — and the gap between those two is where a significant amount of enterprise software complexity originates.

The most common form this takes is architectural mandate without domain justification. Platform teams, rightly responsible for consistency and infrastructure standards, apply patterns designed for large distributed systems universally — including to applications that are, by domain definition, a single coherent thing. Microservices architectures get mandated for systems with no independent scaling requirements, no team boundary that would justify a service boundary, no domain reason for a network boundary to exist. The result is artificial complexity: deployment pipelines for services with no independent reason to exist, network calls where function calls would suffice, operational overhead that consumes development capacity without adding production value.

The architecture was not wrong for all systems. It was wrong for this system, for this domain, at this scale. But the mandate did not ask about the domain. It asked about organisational standards. And the production system pays the difference on every deployment, every change, every new hire who must learn the infrastructure before they can touch the domain.

This is organisational complexity billed to the production system. It feels like support. From the production system's perspective it is an undiscussed tax with no domain justification.

The Toyota Parallel

Toyota solved this problem in manufacturing and the solution translates directly to software development. The Toyota Way rests on two pillars: continuous improvement, and respect for people. Both are violated by the organisational patterns that produce restrictive environments.

Respect for people, in the Toyota sense, is not about workplace culture. It is an epistemological principle: the people closest to the work hold the most valuable knowledge about the work. On the production floor, the assembly worker who notices something wrong knows something the engineer in the office does not. Toyota's andon cord exists to make that knowledge immediately actionable: any worker can stop the line when they identify a defect, because a defect that travels further down the line costs far more to fix than one stopped at its source.

In software development the people closest to the work are the developers and the domain experts. The domain expert who says "this doesn't reflect how we actually work" is pulling the andon cord. The developer who identifies a structural problem in the architecture is pulling the andon cord. Organisations that route those signals through layers of translation — product owners, project managers, UX designers, platform architects — are not being more rigorous. They are covering the cord in bureaucratic insulation and walking past it.

The second Toyota concept worth applying directly is genchi genbutsu — go and see for yourself. Do not manage from reports. Do not accept translated summaries. Go to where the work happens and observe it directly. For software this means the developer sitting with the domain expert, watching them work, seeing where the system creates friction, understanding the domain from its source rather than from a requirements document that passed through three people before it arrived. Every layer of translation between the domain expert and the developer is a layer where meaning is lost and assumption is substituted.

The third is jidoka — quality built in, not inspected in after the fact. You cannot UX-design your way to a correct domain model. You cannot test your way to a correct domain model. The correctness must be present from the beginning, in the understanding that shaped the implementation. When domain feedback arrives late — filtered through contact persons who are not the domain authorities, interpreted as a UX problem rather than a domain problem — the system has already been built around an incomplete model. Correcting it at that point is expensive. The organisational structure that produced the late feedback is the root cause, not the feedback itself.

Domain Feedback Is Always a Learning Opportunity

When domain experts say a system is too complex or doesn't make sense to them, the instinct in process-first organisations is to call a UX designer. This is solving the wrong problem at the wrong layer. UX is interface orientation — it makes existing concepts easier to navigate. It cannot fix a missing concept. If the domain model is incomplete, no amount of interface polish makes it clearer. You cannot design your way around a hole in the domain.

"Too complex" from a domain expert almost always means one of two things: a concept that exists in their mental model is absent from the system, or the system is telling a story the domain expert doesn't recognise as their own. Both are domain problems. The correct response is a domain conversation, not a design review.

This reframes what domain feedback actually is. It is not obstruction. It is not a sign that the users don't understand the system. It is the most valuable signal available — an authoritative source reporting that the model is incomplete. Organisations that treat it as a learning opportunity produce better software. Organisations that treat it as a user adoption problem produce expensive workarounds for incorrect models.

Discovery-Driven Implementation

The organisational conditions described above — domain experts who can reach the development team, feedback treated as learning, developers trusted to inquire beyond the story — enable something that process-constrained environments make nearly impossible: discovery-driven implementation.

Most software development is story-driven. The solution space is bounded by what was requested. The developer's job is to implement the described behaviour correctly and completely. This produces correct implementations of incomplete specifications, reliably and at scale.

Discovery-driven implementation starts from the same user story but treats it as a symptom description rather than a solution specification. The developer who asks enough questions about the domain — who wants to understand not just what was asked but why, what problem it actually solves, what the current process costs, where it fails — occasionally discovers that the problem as described is not the real problem. The real problem is upstream. And the solution to the real problem makes the described problem structurally impossible rather than better managed.

This kind of insight cannot be mandated. It cannot be specified in advance. It cannot be written as a test before it exists. It emerges from genuine engagement with the domain, from the developer who treats the user story as a starting point rather than a work order, from the organisation that protects the space for that inquiry rather than constraining every hour to story execution.

This is the deepest return on domain understanding: not a better implementation of what was asked, but a solution nobody asked for because nobody knew to ask. Organisations that protect the space for that inquiry, and trust developers to discover and propose, produce software that solves real problems. Organisations that constrain that space to story execution produce software that manages symptoms, expensively, forever.

The Foundation Beneath the Properties

Every property described above is downstream of something that is not a technical practice at all. It is understanding.

You cannot write readable code about something you do not understand. You cannot structure something well that you have not thought through. You cannot know what to leave out — which is often more important than knowing what to put in — unless you understand the domain well enough to recognise what is essential and what is incidental.

The User Story Is Not a Work Order

A user story is a starting point for a conversation, not a specification for implementation. The moment a developer treats it as a work order — something to be implemented against acceptance criteria, tested to green, and closed — they have accepted someone else's translation of the domain as complete and correct. That translation is almost never complete, and sometimes critically incorrect.

The developer's job before the first line of code is to understand the business goal behind the story. Not the described behaviour — the goal. This requires asking questions. Not to clarify ambiguous requirements, but to understand the domain itself. What is this actually trying to achieve? What are the edge cases the domain expert considers obvious? What should this system never do, and why?

Consider a user story about calculating UBO — Ultimate Beneficial Ownership. A developer implementing against the story might write: find all natural persons with ownership percentage above the threshold. That is what the acceptance criteria describe. The tests pass. The implementation is wrong.

A correct understanding of UBO reveals that it is not about direct ownership percentage in isolation. It is about effective control — who ultimately determines the decisions of the entity, regardless of how the ownership structure is arranged. The question is not just who is the UBO. It is who else is the UBO. And it is who also has control. If there is no "also" — there is just one.

That small shift in framing immediately surfaces a class of scenarios that the acceptance-criteria reading misses entirely. Consider natural person 1 who holds 4% in company A and 4% in company B. Company A holds 96% in company B. Company B holds 96% in company A. By direct ownership percentage, natural person 1 appears below the UBO threshold. By effective control, natural person 1 is 100% the UBO of both companies — because the circular cross-ownership means neither company has any independent shareholder beyond this person.
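The arithmetic behind that scenario can be sketched in a few lines. This is a minimal illustration, not a UBO implementation: OwnershipGraph, directStake and effectiveStakes are hypothetical names, the sketch looks only at ownership percentages, and it assumes the holdings are structured so that repeated substitution settles on a stable value. Real UBO rules also consider control by other means, and thresholds vary by jurisdiction.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: OwnershipGraph and its methods are illustrative names.
final class OwnershipGraph {
    // holdings.get(owned).get(holder) = fraction of `owned` held directly by `holder`
    private final Map<String, Map<String, Double>> holdings = new HashMap<>();

    void addHolding(String holder, String owned, double fraction) {
        holdings.computeIfAbsent(owned, k -> new HashMap<>()).put(holder, fraction);
    }

    // The acceptance-criteria reading: direct stake only.
    double directStake(String person, String company) {
        return holdings.getOrDefault(company, Collections.<String, Double>emptyMap())
                       .getOrDefault(person, 0.0);
    }

    // Effective stake per company, by repeatedly applying
    // x[c] = direct(person, c) + sum over corporate holders h of share(h, c) * x[h]
    // until the values stop changing.
    Map<String, Double> effectiveStakes(String person) {
        Map<String, Double> x = new HashMap<>();
        for (String company : holdings.keySet()) {
            x.put(company, 0.0);
        }
        for (int round = 0; round < 10_000; round++) {
            Map<String, Double> next = new HashMap<>();
            double maxChange = 0.0;
            for (Map.Entry<String, Map<String, Double>> owned : holdings.entrySet()) {
                double stake = 0.0;
                for (Map.Entry<String, Double> holder : owned.getValue().entrySet()) {
                    // The person counts in full; a corporate holder counts for
                    // whatever share of it the person effectively owns so far.
                    double viaHolder = holder.getKey().equals(person)
                            ? 1.0
                            : x.getOrDefault(holder.getKey(), 0.0);
                    stake += holder.getValue() * viaHolder;
                }
                next.put(owned.getKey(), stake);
                maxChange = Math.max(maxChange, Math.abs(stake - x.get(owned.getKey())));
            }
            x = next;
            if (maxChange < 1e-12) {
                break;
            }
        }
        return x;
    }
}

final class UboExample {
    public static void main(String[] args) {
        OwnershipGraph graph = new OwnershipGraph();
        graph.addHolding("person1", "A", 0.04);
        graph.addHolding("person1", "B", 0.04);
        graph.addHolding("A", "B", 0.96);
        graph.addHolding("B", "A", 0.96);
        System.out.println("direct stake in A:    " + graph.directStake("person1", "A"));
        System.out.println("effective stake in A: " + graph.effectiveStakes("person1").get("A"));
    }
}
```

For the circular structure above, the direct stake stays at 4% while the effective stake converges towards 100% for both companies, which is the gap between the acceptance-criteria reading and the domain reading.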

No test-first methodology surfaces this. No refactoring produces it. Domain understanding produces it, in the conversation before a line of code is written, because a developer who understands what UBO law is actually designed to do recognises this scenario not as an edge case but as a textbook example of what the law was written to catch.

What the Implementation Should Not Be

Domain understanding does not only tell you what to build. It tells you what not to build — and that is often more valuable.

When you understand that UBO is about effective control through any structure, you immediately know the implementation should not be a threshold check on direct ownership percentages. That single "should not" eliminates the naive implementation before it is written. It eliminates an entire class of wrong solutions without a single line of code.

This is the discipline of subtraction. Every constraint that comes from genuine domain understanding is a constraint that prevents future complexity. What is not there cannot introduce a bug. What is not there requires no maintenance. What is not there cannot become the thing nobody dares touch because nobody understands why it exists.

The simplest correct solution is also the most durable one. Not because simplicity is aesthetically preferable, but because complexity compounds. Every unnecessary abstraction, every dependency added for theoretical future benefit, every pattern introduced for a problem the system does not have — each one is a tax on every future change, every new hire, every upgrade cycle. Over fifteen years those taxes become the reason a system becomes unreformable.

The Right Level of Test Coverage

Honest test coverage in enterprise software is not a percentage target. It is a risk assessment.

The question is never "what percentage of lines are covered?" It is: "where are the places this system could be silently wrong, and how quickly would we know?" Tests earn their place where the real-world feedback loop is too slow, too infrequent, or too opaque to catch failures naturally.

A login page that breaks gets reported within minutes — high-frequency paths like these are well covered by integration, smoke, and end-to-end tests that run as part of any competent CI pipeline. Deep unit testing of those flows is redundant effort. A UBO calculation might run once a day for a small compliance team. It could be wrong for weeks before anyone notices. The domain is complex enough that failures are non-obvious. That is precisely where a behavioral test earns its place: not as a development guiderail, but as a specification of correctness for something that does not announce when it is wrong.

In practice, this produces test coverage in the range of 30 to 50 percent — not because the rest of the code is untested, but because the rest of the code is covered by higher-level tests and validated continuously by the people using it. The 30 to 50 percent that is explicitly tested at the unit or behavioral level is the core domain logic: the calculations, the rule evaluations, the business-critical paths where silent failure is a real and consequential risk.

This is a more defensible position than 90 percent coverage that includes getters, setters, login flows, and string formatting. Coverage as a metric measures lines executed, not correctness guaranteed. Behavioral tests on the domain core, combined with integration tests on the main flows and a system simple enough that its failures are visible, produce better assurance than a heavily instrumented suite that tests implementation details nobody will care about in year seven.

The Training Wheels Problem

There is a pattern in software development where tests function not as a quality mechanism but as a substitute for understanding. If the developer does not fully understand what they are building, green tests provide a guiderail: as long as the tests pass, the implementation is probably acceptable.

Training wheels do not teach balance. They teach riding without balance — a different skill entirely. A developer conditioned by green tests as their primary signal learns to satisfy the tests. A developer who understands the domain learns what the business actually needs. Those are not the same education, and in complex domains they produce starkly different results.

The test suite becomes a confidence mechanism decoupled from correctness. The tests reflect the developer's mental model of the domain. If that mental model is incomplete — and without domain inquiry it almost certainly is — the tests are an incomplete specification, confidently asserted as complete. This is worse than no tests. It is false assurance.

The cure is not better tests. It is understanding deep enough that the test's contribution becomes marginal. If the code expresses the domain correctly and reads plainly enough for a domain expert to validate its structure, the test suite's role as documentation and safety net diminishes considerably. A tester who says his functional tests serve as documentation of the application is making an admission: the production code has failed at its most important job. Documentation belongs in the place where it is always current — in code that reads like the domain it represents.

When the Process Becomes the Bug

There is a question worth asking of every engineering practice, every tool, every ceremony: is this the best choice for the production system, or is it the best choice for the process, the tooling, or trend compliance?

The production system is the artifact that matters. Everything else — the sprint board, the Jira backlog, the test suite, the deployment pipeline, the architecture decision records — is support infrastructure. It exists to serve the production system. The moment any of it starts making decisions for the production system, the hierarchy has inverted. And it inverts constantly, quietly, and with complete institutional legitimacy.

Nobody says "we are going to let Jira determine our engineering decisions." But when a five-minute bug fix gets put on the backlog because the process requires it, Jira just made an engineering decision. When a developer adds an abstraction layer to satisfy a test framework rather than to express the domain, the test suite just shaped the production system. When a simple piece of logic gets restructured to comply with a framework convention that has no business relevance, trend compliance just overrode domain clarity.

Process thinking asks: are we following the process correctly? Production thinking asks: what is the best outcome for the system?

When they conflict, the answer should be immediate and unambiguous: the production system wins. The process is a tool. Tools do not have votes.

The Bug Economics

Consider the real cost of a simple bug — a button that doesn't work, an enum stored as an integer instead of a string — when it travels through a process-first system versus a production-first one.

In a production-first system with simple, readable code and a CI pipeline that allows release at any time: the bug is reported, understood, fixed, and released the same day. Total engineering time: five minutes to fix, minutes to release. The user experiences a brief interruption and a same-day resolution.

In a process-first system the same bug looks like this:

Reported and logged: 10 minutes of administration
Discussed in standup or triage: 20 minutes
Estimated and planned into a sprint: 15 minutes in a planning meeting
Picked up one or two sprints later by a developer who must first relearn the context, understand the bug, navigate the abstraction layers, fix the code, fix the broken tests, and write new tests: 60 minutes or more

Total: approximately 105 minutes of engineering time to resolve a 5-minute problem, with the user waiting up to six weeks for a fix that was always trivial. That is a 21-times cost multiplier applied entirely by the process. The bug is not better fixed. The system is not more stable. The outcome is strictly worse in every dimension (cost, speed, and user experience), and the process produced it.

The Kaizen Parallel

This is not a new insight. Toyota's lean manufacturing principles identified this failure mode decades ago under the concept of muda, or waste. Waste in production systems is any activity that consumes resources without adding value. The roughly 100 minutes of process overhead on a 5-minute fix is almost pure waste: motion without value, waiting, unnecessary processing.

The deeper Kaizen principle is that the person closest to the problem is best positioned to fix it. The developer who wrote the code, who understands it today, who can see the bug clearly right now — that person fixing it immediately is the optimal outcome by every measure. Deferring it transfers the problem to a different person at a different time with less context, more overhead, and a worse result.

In practice, this approach does not appear to produce more bugs: teams that have observed both models report comparable defect rates. The difference is resolution time: same-day fixes versus multi-sprint delays. On the metric that actually matters to the business (how long a known problem affects users), the simple, production-first system wins decisively.

The Real Job

The assembly part of software development — implementing a described behaviour to pass a set of tests — is a commodity skill. It is increasingly automatable. It produces measurable output in a sprint and moves tickets across a board. It is the part of the job that process-first thinking measures, rewards, and optimises for.

The understanding part is not a commodity. It is not automatable. It does not show up in velocity metrics or test coverage percentages. But it is the part that determines whether the software is actually correct. It is the part that finds the circular ownership scenario before it becomes a compliance incident. It is the part that knows what to leave out. It is the part that produces code readable enough that a domain expert can spot an error without running a test. It is the part that makes a bug a five-minute fix rather than a two-sprint project. And it is the part that occasionally recognises that the problem as described is a symptom — and builds the thing that makes the symptom impossible.

Everything that is not in direct service of the production system is not neutral overhead today. It is an obstacle tomorrow. The fifteen-year lifespan makes this visible in a way that a two-year project never does. The complexity accumulates. The process overhead compounds. The abstractions added for testability become the walls that trap the system. The dependencies added for framework compliance become the liabilities that prevent the upgrade. The architectural mandates applied without domain justification become the constraints that make every change expensive.

Ask of every decision: is this the best choice for the production system? If the honest answer is "no, but it satisfies the process" — remove it. Whatever is not there cannot break, does not need maintenance, and does not need to be understood.

Simplicity is not the absence of effort. It is the result of understanding deep enough to know what to remove.

The properties described in this article — longevity, upgradeability, maintainability, extensibility, readability, and organisation — are not independent qualities to be optimised separately. They are consequences of a single discipline: understanding the domain well enough to represent it simply, correctly, and durably in code that will outlast the people who wrote it. The process serves that goal. When it stops serving that goal, the process is the bug.

What Is a Rich Domain Model?

Leon Pennings — Tue, 19 May 2026 11:48:05 +0000

Most articles about rich domain models get lost in comparisons to anemic models, debates about OOP mechanics, or pattern catalogues. This is not one of those articles.

A rich domain model is not a technical pattern. It is a discipline — one that produces a living, explicit representation of the essential complexity of a business domain. Understanding what that means, and what it unlocks, requires stepping back from the code entirely.

Essential Complexity, Made Explicit

Start with what a rich domain model actually is.

It is a set of objects, each playing a defined role in the business domain, each owning the responsibility that role entails. Not what state they carry — but what they know, what they decide, and what belongs to them. Think of it less like a data structure and more like a cast of actors: each one has a role, and the role defines everything. What they are responsible for. What they know. What they act on. What they refuse.

This distinction matters more than it might seem. An actor on stage is not described by listing their costume and props. They are described by their role — what they do, what they own, what they are accountable for. The props are incidental. In the same way, a domain object is not defined by the fields it holds. It is defined by its responsibility. State may be part of how it fulfills that responsibility — but it is an implementation detail of the role, not the definition of it.

The contrast with an anemic model follows directly. An anemic model is a cast of actors who have been stripped of their roles. They stand on stage holding props while someone offstage calls out instructions. The data is visible. The knowledge of what to do with it is gone — moved into service classes, transaction scripts, and workflow configurations that grow without principle and conflict without resolution.

Fred Brooks gave us the vocabulary to understand why this matters. He distinguished between essential complexity — the complexity intrinsic to the problem itself, which cannot be removed — and accidental complexity, everything else: the frameworks, the indirections, the patterns applied without cause.

The actors and their roles are the essential complexity. They are not a representation of it or a metaphor for it — they are it, made visible and explicit. Every business rule that is genuinely hard, every lifecycle that has real consequences, every constraint that exists because the business demands it: these find their home in a role, owned by an actor, named and present in the model. You can see the essential complexity. You can point to it. You can reason about it directly.

Once the essential complexity is that explicit, accidental complexity loses its camouflage. It cannot pretend to belong. Every framework choice, every infrastructure decision, every pattern applied can be held up against a simple question: does a domain object — a named actor with a defined role — actually require this? If not, it is accidental complexity, and it has no business being there. The model makes that judgment possible because the essential complexity is no longer hiding.

This is not the same as reducing complexity. The business is as complex as it is. What changes is whether that complexity is visible, owned, and honest — or scattered, implicit, and discovered only when things break. The rich domain model ensures the essential complexity is always primary. Everything else is secondary, and known to be so.

A Tool for Learning the Domain

Most development approaches start from requirements. A user story describes motion through a system: a user does something, something happens. This teaches you the rivers — the flows, the happy paths, the scenarios that have been thought of so far.

A domain model teaches you the terrain. Once you understand the terrain, the rivers make sense. Without it, you are always following water, never knowing where you are.

This distinction matters enormously in practice. When a developer learns a business domain through user stories and debugging, they accumulate procedural knowledge. They learn symptoms. They build a mental model that is a patchwork of scenarios, edge cases, and tribal knowledge. That understanding does not transfer easily and does not survive personnel changes.

When a developer learns through the domain model — starting with the core concepts, understanding their responsibilities and relationships — they learn causes. The what and why of the business becomes clear before the how. Onboarding that previously took months can take hours, not because the business became simpler, but because its essential structure was made explicit and navigable.

Canonical Truth for the Business Domain

A codebase without a domain model has no authoritative reference for what the business believes. Logic accumulates in transaction scripts, in service classes, in stored procedures, in workflow configurations. It is never gathered in one place where you can ask: is this consistent? Does this conflict with that?

The rich domain model is that place. It is not documentation in the sense of comments or wikis — those go stale and lie. It is living documentation, expressed in code, that is wrong only when the code is wrong. When two features conflict, the domain model is the referee. When a new requirement arrives, the model is the context in which it is evaluated — is this already expressed somewhere? Does this contradict something that exists?

Without that context, conflicting logic does not just happen occasionally. It is inevitable. There is no shared reference, so there is no way to prevent divergence. The model prevents it not through process or discipline, but through the simple fact of existing.

What Belongs in the Domain Model

A common misconception is that domain objects are database rows dressed up with methods. This conflation produces models that are anemic by construction — the shape of the schema becomes the shape of the domain, and the domain becomes a mirror of the persistence layer rather than a representation of the business.

The domain object is defined by its responsibility, not by its persistence. Whether it holds state is irrelevant to whether it belongs in the model. What matters is whether it represents a genuine business concept with a defined responsibility.

Some domain objects have state that should be persisted. In that case, the ORM annotations live on the domain object itself — there is no separate entity class, no parallel representation. The domain object is the single source of truth, and persistence is simply a capability some objects happen to have. There is no ORM object that is not a domain object. If one exists, that is the smell — not a feature. Some will object that this violates persistence ignorance — that the domain should not know about its own storage. But a domain object declaring what it needs is not pollution. It is honesty. The alternative — a parallel entity class that mirrors the domain object field by field — is not cleaner architecture. It is the same information written twice, with an extra layer of indirection between them and nothing gained in return.

Other domain objects have no persistent state at all. A CurrencyConversion that owns the rules and cache for converting between currencies is a full citizen of the domain model. An Interaction that represents a session of intent against the domain — carrying the current user, the transaction boundary, the active roles — is a domain object. Neither has a table. Both have clear, defined responsibilities.
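A stateless-in-the-database object like CurrencyConversion might look roughly like the sketch below. Only the class name comes from the text; the rate source, the cache key format, and the rounding rule (two decimals, half-even) are assumptions made for illustration.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiFunction;

// Hypothetical sketch: no table, no ORM mapping, but a clear responsibility.
final class CurrencyConversion {
    // Assumed lookup of a rate for (from, to); where rates come from is out of scope.
    private final BiFunction<String, String, BigDecimal> rateSource;
    private final Map<String, BigDecimal> rateCache = new HashMap<>();

    CurrencyConversion(BiFunction<String, String, BigDecimal> rateSource) {
        this.rateSource = rateSource;
    }

    BigDecimal convert(BigDecimal amount, String from, String to) {
        // Rule owned by the domain object: same currency means no conversion.
        if (from.equals(to)) {
            return amount;
        }
        // Each currency pair is looked up once and then served from the cache.
        BigDecimal rate = rateCache.computeIfAbsent(from + "->" + to,
                key -> rateSource.apply(from, to));
        if (rate == null) {
            throw new IllegalArgumentException("No rate for " + from + " to " + to);
        }
        // Assumed rounding rule for the sketch.
        return amount.multiply(rate).setScale(2, RoundingMode.HALF_EVEN);
    }
}
```

The object holds a cache but nothing that needs a table, and the rules about when and how to convert live in one named place.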

The question is never "does this have a table?" The question is always "does this represent something real in the business, with a responsibility that can be named?"

The Interaction: A Worked Example

Interaction deserves particular attention because it illustrates what correct modeling unlocks beyond the obvious.

Every non-trivial business application has the concept of an interaction: a moment of intent against the domain, initiated by a known user, within a defined transactional boundary, with a lifecycle that has a beginning and an end. This concept exists whether you model it or not. The question is whether it is explicit or scattered across framework configuration, security filters, transaction annotations, and audit log scrapers.

When modeled explicitly — made available within the execution context, whether via ThreadLocal, scoped storage, or whatever the runtime demands — Interaction becomes the natural owner of everything that belongs to that lifecycle. The storage mechanism is an implementation detail. The concept is not.

During an interaction, any part of the domain can ask Interaction.hasUserRole(CancelOrderRole.class) — not as a security check imposed from outside, but as a domain question answered where the action is performed. Authentication is resolved before the interaction begins; a valid Interaction means a valid user. Authorization is expressed where it is enforced.

At the end of an interaction, deferred actions execute within the same transactional boundary. Emails are sent, events are fired, and downstream reactions trigger; if anything fails, everything rolls back, including the email that had not yet been sent. With a message broker bolted onto the outside of an application, this guarantee is very hard to achieve without significant infrastructure overhead. Here it is a natural consequence of the model.

After the end of an interaction, post-transaction actions execute outside the boundary, intentionally and explicitly. On cleanup, state is torn down predictably — no leaked state between requests.
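The lifecycle described so far can be sketched as follows. Interaction, hasUserRole and CancelOrderRole appear in the text; everything else (the Role marker, run, defer, afterTransaction) is a hypothetical shape chosen for the sketch, and the real transaction is reduced to a comment marking where commit or rollback would happen.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

interface Role {}                               // hypothetical marker type
final class CancelOrderRole implements Role {}

final class Interaction {
    private static final ThreadLocal<Interaction> CURRENT = new ThreadLocal<>();

    private final String user;
    private final Set<Class<? extends Role>> roles;
    private final List<Runnable> deferred = new ArrayList<>();
    private final List<Runnable> postTransaction = new ArrayList<>();

    private Interaction(String user, Set<? extends Class<? extends Role>> roles) {
        this.user = user;
        this.roles = new HashSet<>(roles);
    }

    // Runs `work` as one interaction on the current thread.
    static void run(String user, Set<? extends Class<? extends Role>> roles, Runnable work) {
        Interaction interaction = new Interaction(user, roles);
        CURRENT.set(interaction);
        try {
            work.run();
            // Deferred actions still run inside the boundary; an index loop lets
            // a deferred action register further deferred actions.
            for (int i = 0; i < interaction.deferred.size(); i++) {
                interaction.deferred.get(i).run();
            }
            // A real implementation would commit here. An exception thrown above
            // skips this point, so nothing after it runs.
            for (Runnable action : interaction.postTransaction) {
                action.run();
            }
        } finally {
            CURRENT.remove(); // predictable cleanup, no state leaks to the next request
        }
    }

    static boolean hasUserRole(Class<? extends Role> role) {
        return current().roles.contains(role);
    }

    static String currentUser() {
        return current().user;
    }

    static void defer(Runnable action) {
        current().deferred.add(action);
    }

    static void afterTransaction(Runnable action) {
        current().postTransaction.add(action);
    }

    private static Interaction current() {
        Interaction interaction = CURRENT.get();
        if (interaction == null) {
            throw new IllegalStateException("No active interaction");
        }
        return interaction;
    }
}

final class InteractionExample {
    public static void main(String[] args) {
        Interaction.run("alice", Collections.singleton(CancelOrderRole.class), () -> {
            if (Interaction.hasUserRole(CancelOrderRole.class)) {
                Interaction.defer(() -> System.out.println("send cancellation email"));
                Interaction.afterTransaction(() -> System.out.println("notify downstream"));
            }
        });
    }
}
```

Domain code asks its question through the static methods without being handed a context object, and the ordering of work, deferred actions, and post-transaction actions is fixed in one place.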

The audit trail — who did what, when, and did it succeed — emerges naturally because Interaction already knows all of it. It is not assembled from logs after the fact.

None of these capabilities were designed individually. They are all consequences of modeling the right concept. This is what essential complexity, made explicit, produces.

The Order: Lifecycle as Domain Responsibility

The same principle applies to any object with a meaningful lifecycle. Consider an Order.

Order is constructed from an OrderRequest. In its constructor — or through an assemble() method called immediately — it validates that all items have prices (failing fast if not), reserves inventory, creates the Invoice, determines from the request whether fulfillment is pickup or delivery, and if delivery, creates the Shipment internally. No external coordinator performs these steps. The Order knows what it means to be an order.

Once assembled, the Order's state gates what is possible. deliver() is only reachable because assemble() completed. Anything attached to an order — documents, notes, events — is evaluated against the current state. The object enforces its own rules.

The lifecycle of the order is expressed in OrderMilestone objects: Created at LocalDateTime X, ItemsCompleted at X+1, Shipped at X+2. This is not logging in the developer sense. This is the Order remembering its own history. Audit trails, reporting, and debugging are free consequences of a model that is honest about time.
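A rough sketch of such an Order, under stated assumptions: Order, OrderRequest, Invoice, Shipment, OrderMilestone and deliver() are named in the text, while Item, the state enum, the cents-based prices, and the milestone names used here are illustrative. Inventory reservation is reduced to a comment.

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class Item {
    final String name;
    final Long priceInCents; // null models "no price known" (assumption)
    Item(String name, Long priceInCents) { this.name = name; this.priceInCents = priceInCents; }
}

final class OrderRequest {
    final List<Item> items;
    final boolean delivery; // false means pickup
    OrderRequest(List<Item> items, boolean delivery) { this.items = items; this.delivery = delivery; }
}

final class Invoice {
    final long totalInCents;
    Invoice(long totalInCents) { this.totalInCents = totalInCents; }
}

final class Shipment {}

final class OrderMilestone {
    final String name;
    final LocalDateTime at;
    OrderMilestone(String name, LocalDateTime at) { this.name = name; this.at = at; }
}

final class Order {
    private enum State { ASSEMBLED, DELIVERED }

    private final List<OrderMilestone> milestones = new ArrayList<>();
    private final Invoice invoice;
    private final Shipment shipment; // null for pickup orders
    private State state;

    Order(OrderRequest request) {
        long total = 0;
        for (Item item : request.items) {
            if (item.priceInCents == null) {
                throw new IllegalArgumentException("No price for " + item.name); // fail fast
            }
            total += item.priceInCents;
        }
        // Inventory reservation would happen here; omitted in this sketch.
        this.invoice = new Invoice(total);
        this.shipment = request.delivery ? new Shipment() : null;
        this.state = State.ASSEMBLED;
        remember("Created");
    }

    // Only an assembled delivery order can be delivered, and only once.
    void deliver() {
        if (shipment == null) {
            throw new IllegalStateException("Pickup orders are not delivered");
        }
        if (state == State.DELIVERED) {
            throw new IllegalStateException("Order was already delivered");
        }
        state = State.DELIVERED;
        remember("Delivered");
    }

    long invoiceTotalInCents() { return invoice.totalInCents; }

    List<OrderMilestone> milestones() { return Collections.unmodifiableList(milestones); }

    private void remember(String name) {
        milestones.add(new OrderMilestone(name, LocalDateTime.now()));
    }
}
```

Because an Order cannot exist unassembled, callers never have to check whether the invoice or shipment is there, and a second deliver() is refused with a reason expressed in business terms.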

There is no OrderService that knows the steps. There is no OrderProcessor that coordinates the flow. What is often called orchestration is simply the Order's own behavior, waiting to be claimed.

There Is No Such Thing as Orchestration

"Orchestration" is a concept that appears when objects are not carrying enough responsibility. The argument is that some flows are too complex to live in any single object, that something external must coordinate. But this argument always rests on the same foundation: the objects being coordinated are anemic. They cannot coordinate themselves because they hold no behavior.

The stronger claim is this: orchestration is a business process, and every business process has an owner. The moment you ask "whose responsibility is this flow?" the answer is always a named thing in the business. Named things in the business belong in the domain model.

If the checkout flow belongs to Order, there is no orchestration — only an object doing its job. If a more complex cross-domain process exists, the business has a name for it. That name is your object.

The workflow engine question resolves the same way. A workflow engine is infrastructure for implementing an unmodelled requirement. It allows a business process to be encoded without ever being understood. The process runs, tickets close, and the pressure to model never arrives. Meanwhile the process becomes invisible — it lives in configuration, not in the domain, and the model no longer reflects reality.

By making the process explicit in the model, you force the understanding upfront. Traceability, accountability, and auditability are not bolted on afterward — they are natural consequences of a process that is owned and expressed. And the model becomes resistant to casual change. A workflow engine can be reconfigured quietly. A domain object that explicitly models a process requires intentional change. You must touch the model. That is not a constraint — it is a feature.

The Architecture That Emerges

When the domain model is honest and complete, the architecture that surrounds it becomes remarkably simple.

The domain is the center. Everything else is translation. An adapter takes an external signal — an HTTP request, a queue message, a UI event, a file drop — translates it into something the domain understands, and translates the response back. Whether that adapter is called a web service, a UI connector, or a queue client is an implementation detail. Its functional purpose is always the same: adapt an external request to the domain, and an answer from the domain to the outside world.
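As a minimal sketch of that translation role, consider the adapter below. Account and WithdrawalAdapter are hypothetical names, the external request is modelled as a plain string map, and the reply format is invented for the example; an HTTP handler or queue consumer would have the same shape.

```java
import java.util.Map;

// Hypothetical domain object: it owns the rule about valid withdrawals.
final class Account {
    private long balanceInCents;

    Account(long balanceInCents) {
        this.balanceInCents = balanceInCents;
    }

    void withdraw(long amountInCents) {
        if (amountInCents <= 0 || amountInCents > balanceInCents) {
            throw new IllegalArgumentException("Invalid withdrawal of " + amountInCents);
        }
        balanceInCents -= amountInCents;
    }

    long balanceInCents() {
        return balanceInCents;
    }
}

// The adapter knows the outside format and nothing about the business rule.
final class WithdrawalAdapter {
    private final Account account;

    WithdrawalAdapter(Account account) {
        this.account = account;
    }

    String handle(Map<String, String> externalRequest) {
        try {
            // Translate in: external text to a domain call.
            long amount = Long.parseLong(externalRequest.get("amountInCents"));
            account.withdraw(amount);
            // Translate out: domain state to an external answer.
            return "OK balance=" + account.balanceInCents();
        } catch (IllegalArgumentException e) {
            // NumberFormatException is a subclass, so malformed input lands here too.
            return "REJECTED " + e.getMessage();
        }
    }
}
```

Swapping the string map for a different transport changes only the adapter; the Account is untouched.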

This framing eliminates the need for many patterns that exist only because the domain is not carrying its weight. There is no need for a dependency injection container to wire together a domain that is self-contained. There is no need for a repository pattern when persistence is an annotation on the domain object that requires it. There is no layered architecture to enforce when the boundary between domain and adapter is conceptual and obvious.

The complexity budget is spent entirely on essential complexity, because there is nowhere for accidental complexity to hide. Every technology choice can be evaluated against a single question: does a domain object require this? If not, it has no business being there. The domain model is not just a design tool — it is the justifier for every architectural decision, the brake on over-engineering, and the answer to YAGNI grounded not in gut feel but in domain reasoning.

The Principle Underneath

There is a principle that connects everything above:

The ease of implementing something without modeling it is proportional to the hidden cost of never having modeled it.

Every approach that starts from how rather than what — procedural scripts, transaction-script architectures, use-case driven development — shares this characteristic. The requirement is the input, the implementation is the output, and the domain never appears. Each new requirement starts from scratch, because there is no accumulated understanding to build on. The codebase grows. The knowledge does not.

A rich domain model inverts this entirely. The domain is the input. Requirements are queries against that understanding. New requirements find their place in something that already exists — or reveal, through the friction of not fitting, that the domain needs to grow. Either way, understanding accumulates. The model becomes more true over time, not less.

That is what a rich domain model is. Not a pattern. Not a layer. A discipline of making the essential complexity of a business explicit, owned, and honest — and letting everything else follow from that.

Sidebar: On AI-Assisted Development

AI is genuinely useful in a domain-centric codebase — for implementing adapters, generating boilerplate, and accelerating everything that surrounds the model. It pattern-matches well against known structures, and once the domain is understood, there is plenty of that work to do.

Domain modeling is a different activity. It requires understanding what the business actually is — not just what a ticket describes. It requires recognizing when a concept is missing, resisting the obvious implementation in favor of the correct abstraction, and making judgment calls about responsibility that have no objectively correct answer. AI has no access to the lived understanding that produces those judgments.

The most useful role for AI in a modeling context is as a mirror — a Socratic partner for stress-testing a hypothesis about a concept's responsibility or boundary. It surfaces objections, identifies gaps, and forces precision. That is valuable. But the modeling itself remains a human activity, and the discipline of doing it remains more important in an AI-assisted world, not less. Without the model, AI produces procedural code at unprecedented speed — and accumulates the hidden cost of unmodeled requirements faster than any previous approach.

Sidebar: On Practical Effects

The common perception is that a rich domain model requires heavy upfront investment — that you must design everything before writing any code, and that this slows delivery. In practice the opposite is true, and the gap becomes visible quickly.

Early in a project, a team building a rich domain model is establishing core concepts and their responsibilities. This feels slower than a team wiring up framework configuration and generating boilerplate. But by the time the first meaningful features are being built, the domain team is adding behavior to objects that already understand the business. New requirements find their place. The model tells you where things belong. The other team is asking "where does this code go?" for every new feature — and the answers are becoming less consistent, not more.

The acceleration compounds. Maintenance is cheaper because the model is the documentation — it cannot go stale, because it is the code. Debugging is faster because the model expresses business intent, not just technical state. The difference between "the Order refused shipment because it was already delivered" and "some process node returned an unexpected status" is the difference between understanding and archaeology.

The people costs tell the same story. Onboarding a developer onto a well-modeled domain takes hours, not months. The knowledge is in the model, not in the heads of the people who built it. That is not just an efficiency gain; it is a risk reduction. The bus factor of an application with an explicit domain model is structurally higher than that of one without.

The cost of not modeling is real, large, and almost never measured — because there is no comparable version of the same application where it was modeled. You cannot see the cost of understanding you never accumulated. You only feel it, gradually, in every feature that takes longer than it should, every bug that touches more than it should, and every developer who leaves taking knowledge that was never made explicit.

The Architecture Tax — Why Enterprise Software Is Expensive, and Why AI Won't Fix It

Leon Pennings — Mon, 18 May 2026 07:38:59 +0000

The story the industry tells

Enterprise software is expensive. It requires large teams, significant infrastructure, complex deployment pipelines, and sustained operational effort. Requirements that sound simple take weeks. Systems that should be stable require constant attention. The codebase that was coherent at year one is opaque by year four. New developers take months to become productive. Changes that touch multiple parts of the system require coordination that absorbs more time than the implementation itself.

This is treated as a given. Enterprise software is complex, therefore it costs what it costs. The architecture — microservices, distributed infrastructure, containerised deployments, orchestration layers — is presented as the response to that complexity. Sophisticated problems require sophisticated solutions.

The argument this article makes is the opposite.

Most of what the industry calls the cost of enterprise software is not the cost of the domain. It is the cost of workarounds for a missing domain model — compounded over years, normalised by the fact that every team around you is paying the same price and calling it inevitable. The architecture is not the response to the complexity. In most cases, it is the cause of it.

And the reason this remains invisible is that the alternative was never built. You cannot compare your system to the system that does not exist. So the costs accumulate, get attributed to the nature of enterprise software, and become the baseline against which all future decisions are made.

This article is about what is actually in that price tag, and what it would cost without it.

The context problem

When a team starts building a system, the code is small. The domain is not yet fully understood, but the surface area is manageable. A developer can hold the whole thing in their head. A new feature means adding a function. The system works. Nobody is in pain.

Three years later, the same team — or more likely, a partially replaced team — is asking a different question. Not "does this work?" but "where does this live?" Where does the discount calculation happen? Who owns the rule that a cancelled order cannot be reinstated after shipment? If we change how rush orders are priced, how many places do we need to touch, and how many of those will we miss?

These are not questions about the business domain. The business domain has not become harder. An order is still an order. The questions are about the system — specifically, about where the system chose to put things, and whether that choice was made deliberately or simply accumulated over time.

This is the context problem. It is the root cause of most of the complexity that teams eventually reach for distributed architectures to solve. And it has nothing to do with the scale or ambition of the domain. It is a structural property of how the code was organised from the beginning.

Context, in the sense used here, has a specific meaning. It is not a folder, a module name, or a service boundary. It is the answer to a structural question: given a concept in the domain, is there one authoritative location where all rules governing that concept are defined and enforced?

A concept has a context when the answer is yes. It does not have a context when the answer is "it depends" — or "mostly here, but also there, and that other place handles the exception."

The distinction matters because systems do not stay small. Rules accumulate. Exceptions are added. Behaviour that was simple in year one becomes conditional in year two and contradictory in year three. In a system with clear context ownership, that accumulation is manageable — the rules are in one place, contradiction is visible, and the design either holds or signals clearly that it needs to change. In a system without context ownership, accumulation is invisible until it becomes crisis.

Object orientation was supposed to solve this

The context problem is not new. It is precisely the problem that object-oriented programming was designed to address.

Object orientation, in its original conception, was not about classes, inheritance hierarchies, or design patterns. It was about a single structural idea: that data and the rules governing that data belong together, in one place, unreachable from outside except through defined behaviour. An object is not a container for data with methods attached. It is a context — a thing that knows its own state, enforces its own rules, and decides what to do when asked. The outside world cannot manipulate its internals. It can only send messages.

This is context ownership as a structural property of the code. Logic cannot drift to wherever it is convenient to put it, because the object's state is private. The rule that a shipped order cannot be cancelled does not live in a service method that someone has to know to call. It lives on the order itself, enforced by the fact that the order's status cannot be changed except through the order's own behaviour. It is not a convention. It is a constraint.

This is what object orientation was for.

What Java enterprise actually practises

The dominant pattern in Java enterprise development — and in enterprise development more broadly — looks like this:

An Order entity holds fields annotated for persistence. Its fields are private, which gives the appearance of encapsulation. An OrderService contains the business logic — the methods that create, modify, and query orders. An OrderRepository handles the database interaction. Data transfer objects carry information between layers.

This pattern is widely understood to be object-oriented. It uses objects. It has private fields. It has classes with clear names and single responsibilities. Senior developers teach it. Frameworks are built around it. It is the default.

It is procedural programming.

The test is not whether the code uses classes. The test is whether data and the rules governing that data are in the same place. In the service-DTO-repository pattern, they are not. The Order entity holds data. The OrderService holds logic. The logic is separated from the data it governs. That is the definition of procedural code — regardless of the language, regardless of the annotations, regardless of the private keyword on the fields.

The private fields are not encapsulation in any meaningful sense. Encapsulation means the object protects its own invariants. Nothing outside can put it in an invalid state. But if OrderService loads an Order, inspects its fields, and decides what to do — the private keyword is decoration. The order is a struct. The service is a function that operates on it. The fact that both are expressed as classes changes nothing about the structure.
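
To make the point concrete, here is a minimal sketch of the pattern with the framework annotations stripped away. The names `AnemicOrder`, `CleanupJob`, and the `Status` enum are illustrative assumptions, not code from any real system.

```java
// Hypothetical sketch: private fields plus getters and setters protect nothing.
enum Status { DRAFT, CONFIRMED, SHIPPED, CANCELLED }

// The "entity": the field is private, but it is readable and writable from outside.
class AnemicOrder {
    private Status status = Status.DRAFT;

    Status getStatus() { return status; }
    void setStatus(Status status) { this.status = status; }
}

// The "service": the rule lives here, apart from the data it governs.
class OrderService {
    void cancel(AnemicOrder order) {
        if (order.getStatus() == Status.SHIPPED) {
            throw new IllegalStateException("Shipped orders cannot be cancelled.");
        }
        order.setStatus(Status.CANCELLED);
    }
}

// Any other caller can skip the service and, with it, the rule.
class CleanupJob {
    void cancelStale(AnemicOrder order) {
        order.setStatus(Status.CANCELLED); // compiles and runs; the rule is never consulted
    }
}

class AnemicDemo {
    public static void main(String[] args) {
        AnemicOrder shipped = new AnemicOrder();
        shipped.setStatus(Status.SHIPPED);

        new CleanupJob().cancelStale(shipped);
        System.out.println(shipped.getStatus()); // prints CANCELLED
    }
}
```

The service refuses to cancel a shipped order, but only for callers that happen to go through the service. Nothing in the structure makes them do so.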

A senior developer once described object orientation as "just using a lot of objects." In the Spring ecosystem, that description is accidentally accurate. The objects are present. The orientation — the structural commitment to context ownership — is not.

This matters because it means most teams believe they are already doing what a rich domain model offers. The gap between what they believe and what is actually true is where the context problem silently grows — invisible, until it becomes the thing that makes the system expensive.

How a procedural system rots

The rot does not happen at once. It has a characteristic progression that is worth tracing, because understanding the mechanism is what makes the solution legible.

Year one. The system is small. The team is mostly the original team. The rules fit in one or two services. OrderService is coherent because it is young and the domain is still understood by everyone who touches it. Velocity is high. The architecture feels like a good decision.

Year two. The product grows. New rules are added. The team adds members who know the services they own but not the full picture. A pricing exception is added in OrderService because that is where the original pricing logic lives. A second exception is added in PricingService because by then the first developer has left and the new one reasonably concluded that pricing rules belong in the pricing service. Both are correct by local reasoning. Neither is aware of the other.

Year three. The team is running two integration tests that cover the same scenario and produce different results depending on which code path is invoked. A bug report arrives: under certain conditions, the price shown to the customer differs from the price on the invoice. Three services are involved in producing those two numbers. The fix requires coordinating changes across all three, understanding the original intent of logic nobody wrote, and ensuring that the correction does not break the scenarios the divergent logic was accidentally handling correctly.

This is not a failure of discipline. The developers are competent. It is the structural consequence of a system that provided no home for rules — so rules went wherever they were needed, and the system slowly became a map of historical decisions rather than a coherent model of the domain.

No amount of discipline permanently solves a structural problem. Discipline degrades over time and team turnover. Structure does not.

The rich domain model as structural answer

A rich domain model addresses the context problem through structure, not discipline.

The principle is simple: an object owns the rules that govern its own state, and state changes happen only through that object's behaviour. An Order does not have its price calculated by a service. An Order knows its price — it is a property of the order, derived from the order's own data and the rules encoded in the order's own methods. The service does not reach in and manipulate the order's internals. It asks the order to do something, and the order either does it according to its rules, or refuses.

Consider an order system modelled this way:

public class Order {

    private final List<OrderLine> lines;
    private final Customer customer;
    private OrderStatus status;
    private ShippingMethod shippingMethod;

    public Money calculatePrice() {
        Money base = lines.stream()
            .map(OrderLine::lineTotal)
            .reduce(Money.ZERO, Money::add);
        return shippingMethod.applyTo(base);
    }

    public void confirm() {
        if (status != OrderStatus.DRAFT) {
            throw new IllegalStateException("Only draft orders can be confirmed.");
        }
        this.status = OrderStatus.CONFIRMED;
    }

    public void cancel() {
        if (status == OrderStatus.SHIPPED) {
            throw new IllegalStateException("Shipped orders cannot be cancelled.");
        }
        this.status = OrderStatus.CANCELLED;
    }
}

The rule that a shipped order cannot be cancelled lives on the Order. Not in OrderService, not in a validator upstream, not in a flag checked somewhere in the call chain. It lives in the only place it could coherently live: the object that owns the concept. A developer three years from now, touching this code for the first time, cannot accidentally bypass that rule — not because the system trusts their discipline, but because the structure does not give them a way to.
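
The listing above shows no transition that reaches the shipped state, so the following trimmed-down sketch adds one in order to exercise the rule end to end. The `ship()` method, its "only confirmed orders ship" rule, and the read-only `status()` accessor are assumptions made for illustration.

```java
// Hypothetical, trimmed-down Order: only the status transitions are shown.
enum OrderStatus { DRAFT, CONFIRMED, SHIPPED, CANCELLED }

class Order {
    private OrderStatus status = OrderStatus.DRAFT;

    void confirm() {
        if (status != OrderStatus.DRAFT) {
            throw new IllegalStateException("Only draft orders can be confirmed.");
        }
        status = OrderStatus.CONFIRMED;
    }

    // Assumed transition, not in the original listing: only confirmed orders ship.
    void ship() {
        if (status != OrderStatus.CONFIRMED) {
            throw new IllegalStateException("Only confirmed orders can be shipped.");
        }
        status = OrderStatus.SHIPPED;
    }

    void cancel() {
        if (status == OrderStatus.SHIPPED) {
            throw new IllegalStateException("Shipped orders cannot be cancelled.");
        }
        status = OrderStatus.CANCELLED;
    }

    // Read-only view; there is deliberately no setter.
    OrderStatus status() { return status; }
}

class OrderLifecycleDemo {
    public static void main(String[] args) {
        Order order = new Order();
        order.confirm();
        order.ship();
        try {
            order.cancel();
        } catch (IllegalStateException refused) {
            System.out.println(refused.getMessage()); // prints: Shipped orders cannot be cancelled.
        }
        System.out.println(order.status()); // prints: SHIPPED
    }
}
```

Because the status field has no setter, the only way to change it is through `confirm()`, `ship()`, and `cancel()`, each of which checks its own precondition.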

The service that orchestrates this is correspondingly simple:

@Transactional
public OrderConfirmation createOrder(OrderRequest request) {
    Order order = new Order(request);
    inventory.reserve(order);
    return OrderConfirmation.of(order);
}

The database transaction is the failure boundary. If anything fails, nothing happened. There are no compensating calls, no saga steps, no partial states to reconcile. The infrastructure serves the domain. The domain is not distorted to accommodate the infrastructure.

Design pressure as a feature

There is a property of the rich domain model that is easy to overlook: it makes bad design visible before it becomes operational pain.

When a new rule is added that does not fit cleanly — when a developer sits down to implement something and cannot find a natural home for it in the model — that is not an inconvenience. It is a signal. The model is telling you that either the rule is being misunderstood, or the model needs to evolve to accommodate a concept it does not yet represent.

In a procedural system, that signal does not fire. The developer adds a condition to an existing service method, or adds a new service if the feature is large enough. The rule is implemented. It works. The fact that it created divergence from an existing rule, or that it sits awkwardly between two existing concepts, is not visible until months later when something breaks in a way that requires archaeology to understand.

The rich model converts architectural drift from a silent accumulation into an explicit design question. That question is not always comfortable. But discomfort at design time costs a discussion. Discomfort at runtime costs an incident.

The business changed. As it always does.

The system above handles standard orders. The domain is coherent. The rules are clear. Now the business introduces a new requirement.

Rush orders. A customer can request expedited fulfilment. This attracts a surcharge — the order price increases by fifteen percent, and the shipping method is upgraded to express.

In a procedural system, this requires touching multiple places. The pricing calculation needs a condition. The shipping assignment needs a condition. If those live in different services, both need to change, both need to be deployed, and the rule "rush orders cost fifteen percent more and ship express" exists nowhere as a statement. It exists as a set of conditional branches distributed across the system.

In the rich domain model, the question the implementation forces you to answer is: what is a rush order? Is it a type of order? A property? Does it affect the order itself or its fulfilment? Answering that question is the design. And the answer produces something like:

public class Order {

    private final List<OrderLine> lines;
    private final Customer customer;
    private final boolean rush;
    private final ShippingMethod shippingMethod;

    public Order(OrderRequest request) {
        this.lines = List.copyOf(request.lines()); // defensive copy: callers cannot alter the lines afterwards
        this.customer = request.customer();
        this.rush = request.isRush();
        this.shippingMethod = rush
            ? ShippingMethod.EXPRESS
            : ShippingMethod.STANDARD;
    }

    public Money calculatePrice() {
        Money base = lines.stream()
            .map(OrderLine::lineTotal)
            .reduce(Money.ZERO, Money::add);
        Money withShipping = shippingMethod.applyTo(base);
        return rush
            ? withShipping.multiplyBy(1.15)
            : withShipping;
    }
}

The rule lives on the Order. It cannot live anywhere else. Every developer who touches order pricing in the future will find it here, because there is only one place to look.
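
The listings rely on a `Money` type whose definition is not shown. Below is one possible sketch of such a value object, assuming a single currency, two decimal places, and half-up rounding. It also swaps the `double` factor for a decimal string, so that the fifteen percent surcharge is computed with exactly 1.15 rather than the nearest binary floating-point value. All of these choices are assumptions, not the author's implementation.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical Money value object; the original listings do not show its definition.
final class Money {
    static final Money ZERO = new Money(BigDecimal.ZERO);

    private final BigDecimal amount;

    private Money(BigDecimal amount) {
        // Assumption: two decimal places, rounded half-up, single currency.
        this.amount = amount.setScale(2, RoundingMode.HALF_UP);
    }

    static Money of(String amount) {
        return new Money(new BigDecimal(amount));
    }

    Money add(Money other) {
        return new Money(amount.add(other.amount));
    }

    // Takes the factor as a decimal string so that "1.15" means exactly 1.15.
    Money multiplyBy(String factor) {
        return new Money(amount.multiply(new BigDecimal(factor)));
    }

    @Override
    public boolean equals(Object o) {
        // Safe to use BigDecimal.equals here: every amount has scale 2.
        return o instanceof Money && amount.equals(((Money) o).amount);
    }

    @Override
    public int hashCode() { return amount.hashCode(); }

    @Override
    public String toString() { return amount.toPlainString(); }
}

class MoneyDemo {
    public static void main(String[] args) {
        Money base = Money.of("80.00").add(Money.of("19.99"));
        System.out.println(base);                    // 99.99
        System.out.println(base.multiplyBy("1.15")); // 114.99 (114.9885 rounded half-up)
    }
}
```

Instances are immutable, so `Money::add` can be used safely as the accumulator in a stream reduction starting from `Money.ZERO`.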

The business changed again.

Three weeks after the rush order feature ships, a new requirement arrives.

VIP customers do not pay the rush surcharge. The expedited shipping still applies — VIPs get the faster fulfilment — but the fifteen percent price increase is waived as a benefit of their status.

This requirement is three sentences of business logic. What it does to a system without context ownership is disproportionate to its size.

In a procedural system, the question is: where does this condition go? The rush surcharge is currently in — actually, let us retrace that. The original pricing was in OrderService. The rush surcharge was added in PricingService because that seemed more appropriate for a pricing concern. The VIP status lives in CustomerService. A rule that says "apply the surcharge unless the customer is a VIP" now requires either a call from PricingService to CustomerService — coupling two services that were not coupled before — or an orchestration layer that assembles the inputs before calling either, or a flag passed through the call chain from wherever the customer is known to wherever the pricing happens, leaking context across layers that should not share it.

Each of these is a workaround. Each adds a seam. And each seam is a place where, two years from now, someone adds another condition, and the question "what does this order actually cost?" requires reading four services to answer.

In the rich domain model, the question is different and better: who owns the rule that VIP customers are exempt from the rush surcharge? Is it the Order? The Customer? A pricing policy?

This is a domain design question. It has a defensible answer:

public Money calculatePrice() {
    Money base = lines.stream()
        .map(OrderLine::lineTotal)
        .reduce(Money.ZERO, Money::add);
    Money withShipping = shippingMethod.applyTo(base);

    if (rush && !customer.isVip()) {
        return withShipping.multiplyBy(1.15);
    }
    return withShipping;
}

The rule is in one place. It reads as a statement of business intent. It is testable in isolation. When the next requirement arrives — "VIP customers also get free express shipping on rush orders over two hundred euros" — the developer knows exactly where to go, and the existing logic tells them exactly what the current rules are.
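
What "testable in isolation" can look like is sketched below. To keep the example self-contained it is deliberately simplified: amounts are whole cents in a `long`, shipping is left out, and the constructor signature is an assumption rather than the one used in the listings.

```java
// Hypothetical, simplified model: amounts are whole cents and shipping is omitted,
// so that only the rush and VIP rule is exercised.
class Customer {
    private final boolean vip;

    Customer(boolean vip) { this.vip = vip; }

    boolean isVip() { return vip; }
}

class Order {
    private final long baseCents;
    private final boolean rush;
    private final Customer customer;

    Order(long baseCents, boolean rush, Customer customer) {
        this.baseCents = baseCents;
        this.rush = rush;
        this.customer = customer;
    }

    long calculatePriceCents() {
        if (rush && !customer.isVip()) {
            // 15% rush surcharge; integer division truncates, which is exact for the amounts below.
            return baseCents * 115 / 100;
        }
        return baseCents;
    }
}

class RushPricingCheck {
    public static void main(String[] args) {
        Customer regular = new Customer(false);
        Customer vip = new Customer(true);

        // The four combinations of (rush, VIP) for a 100.00 order.
        System.out.println(new Order(10_000, false, regular).calculatePriceCents()); // 10000
        System.out.println(new Order(10_000, true, regular).calculatePriceCents());  // 11500
        System.out.println(new Order(10_000, false, vip).calculatePriceCents());     // 10000
        System.out.println(new Order(10_000, true, vip).calculatePriceCents());      // 10000
    }
}
```

No service, repository, or framework context is needed to check the rule. The whole truth table fits in four lines.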

If the pricing logic grows complex enough, the model signals it:

public Money calculatePrice() {
    Money base = lines.stream()
        .map(OrderLine::lineTotal)
        .reduce(Money.ZERO, Money::add);
    Money withShipping = shippingMethod.applyTo(base);
    return PricingPolicy.forCustomer(customer).apply(withShipping, this);
}

The complexity of calculatePrice has surfaced a new concept: a PricingPolicy. Not because a framework required it, not because a service boundary forced it, but because the model told you that pricing rules had become rich enough to deserve their own home. This is design evolution driven by the domain — the right kind of complexity, appearing at the right time, for the right reason.
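
One possible shape for the extracted concept is sketched here. The article does not define `PricingPolicy`, so the two implementations, the `isRush()` accessor, and the use of whole cents instead of `Money` are all assumptions made to keep the example short and self-contained.

```java
// Hypothetical sketch of the extracted concept; amounts are whole cents.
class Customer {
    private final boolean vip;

    Customer(boolean vip) { this.vip = vip; }

    boolean isVip() { return vip; }
}

interface PricingPolicy {
    long apply(long priceWithShippingCents, Order order);

    // Chooses the policy from what the customer is, not from flags passed in by callers.
    static PricingPolicy forCustomer(Customer customer) {
        return customer.isVip() ? new VipPricing() : new StandardPricing();
    }
}

class StandardPricing implements PricingPolicy {
    @Override
    public long apply(long priceWithShippingCents, Order order) {
        // Rush orders carry the fifteen percent surcharge.
        return order.isRush() ? priceWithShippingCents * 115 / 100 : priceWithShippingCents;
    }
}

class VipPricing implements PricingPolicy {
    @Override
    public long apply(long priceWithShippingCents, Order order) {
        // VIPs keep express shipping on rush orders, but the surcharge is waived.
        return priceWithShippingCents;
    }
}

class Order {
    private final long priceWithShippingCents;
    private final boolean rush;
    private final Customer customer;

    Order(long priceWithShippingCents, boolean rush, Customer customer) {
        this.priceWithShippingCents = priceWithShippingCents;
        this.rush = rush;
        this.customer = customer;
    }

    boolean isRush() { return rush; }

    long calculatePriceCents() {
        return PricingPolicy.forCustomer(customer).apply(priceWithShippingCents, this);
    }
}
```

The Order still answers the question of what it costs. The policy is a collaborator the Order chooses, not a service that reaches into the Order from outside.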

The distributed workaround

Teams that build procedural systems eventually hit the context problem at scale. Logic is spread across a growing codebase with no clear ownership. Rules diverge. The system becomes expensive to change. The industry's standard response is to enforce context through service boundaries. Order rules live in the Order service. Pricing rules live in the Pricing service. The boundary makes it structurally difficult for one service to reach into another's domain.

This is attempting, through infrastructure, to solve a problem that a domain model solves through structure.

The intuition is understandable. The result is a workaround that costs more than the problem it replaces.

Consider what the VIP rush exemption requires in a distributed system. The Order service needs to price a rush order for a VIP customer. It cannot reach into the Pricing service's data — that violates the boundary. So it calls the Pricing service. But the Pricing service needs to know whether the customer is a VIP — and the Customer service owns that. Now the services are coupled in ways the original boundary was meant to prevent, or an orchestration layer is required to assemble inputs before calling either service, or an event-driven flow is constructed in which services react to each other asynchronously — introducing eventual consistency, message ordering concerns, and a debugging surface that spans multiple log streams.

And this is before considering what happens when the action fails halfway through.

In a monolith with a rich domain model, failure costs a rollback. One word. The action either completed or it did not. There is no intermediate state. There is no question of what to clean up.

In the distributed system, there is no transaction. If the order is created but the pricing service fails before responding, the system is in a partial state. That partial state must be resolved, and not by the database, which knows nothing about it, but by compensating logic: a designed, implemented, tested, and maintained sequence of calls that undoes the steps that completed before the failure. The failure paths grow as O(n²) in the number of steps: a failure at step k has to undo the k - 1 steps before it, so four services already mean six step-and-compensation combinations. Each compensation is a domain operation that must be reachable, idempotent, and tested both in isolation and in combination.
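
The compensating logic described above can be sketched as a small saga runner. This is an illustration in plain Java, not a real saga framework: `SagaStep`, `Saga`, and the step names are invented for the example. The `compensationPairs` method shows one way to read the O(n²) claim, counting for every possible failure point the completed steps that have to be undone.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical saga step: an action paired with the compensation that undoes it.
class SagaStep {
    final String name;
    final Runnable action;
    final Runnable compensation;

    SagaStep(String name, Runnable action, Runnable compensation) {
        this.name = name;
        this.action = action;
        this.compensation = compensation;
    }
}

class Saga {
    // Runs the steps in order. If one throws, undoes the completed steps in reverse
    // order and returns false; returns true only if every step succeeded.
    static boolean run(List<SagaStep> steps) {
        Deque<SagaStep> completed = new ArrayDeque<>();
        for (SagaStep step : steps) {
            try {
                step.action.run();
                completed.push(step);
            } catch (RuntimeException failure) {
                while (!completed.isEmpty()) {
                    completed.pop().compensation.run();
                }
                return false;
            }
        }
        return true;
    }

    // A failure at step k must undo the k - 1 steps before it, so across all
    // failure points there are n * (n - 1) / 2 step-and-undo pairs to design and test.
    static int compensationPairs(int n) {
        return n * (n - 1) / 2;
    }
}

class SagaDemo {
    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        List<SagaStep> steps = List.of(
            new SagaStep("createOrder", () -> log.add("order created"), () -> log.add("order deleted")),
            new SagaStep("reserveStock", () -> log.add("stock reserved"), () -> log.add("stock released")),
            new SagaStep("priceOrder", () -> { throw new IllegalStateException("pricing unavailable"); }, () -> { })
        );

        System.out.println(Saga.run(steps));           // false
        System.out.println(log);                       // [order created, stock reserved, stock released, order deleted]
        System.out.println(Saga.compensationPairs(4)); // 6
    }
}
```

Even this toy version leaves out what makes real sagas hard: compensations that themselves fail, and state that other services have already observed before the undo runs.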

Before any of this business logic runs, the infrastructure required to support it exists permanently: a message broker, a saga framework or hand-rolled saga state table, distributed tracing with correlation IDs propagated through every service and every event envelope, an idempotency layer in every service because most message brokers deliver at least once and may therefore deliver the same message twice, API contracts and versioning because a breaking schema change is a production incident in every downstream service, and per-service CI/CD pipelines, databases, and operational overhead, all multiplied by the number of services.
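
As an illustration of just one item on that list, here is roughly what the idempotency layer amounts to in each consumer. The handler name and message shape are hypothetical. In a real service the set of processed ids would have to be persisted in the same local transaction as the state change, which this in-memory sketch does not attempt.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical idempotent consumer: remembers the ids of messages it has already
// applied, so that a redelivered message changes nothing.
class StockReservationHandler {
    private final Set<String> processedMessageIds = new HashSet<>();
    private int reservedUnits = 0;

    // Returns true if the message was applied, false if it was a duplicate.
    boolean handle(String messageId, int units) {
        if (!processedMessageIds.add(messageId)) {
            return false; // already seen: the broker delivered it again
        }
        reservedUnits += units;
        return true;
    }

    int reservedUnits() { return reservedUnits; }
}

class IdempotencyDemo {
    public static void main(String[] args) {
        StockReservationHandler handler = new StockReservationHandler();
        System.out.println(handler.handle("msg-1", 3)); // true
        System.out.println(handler.handle("msg-1", 3)); // false, duplicate delivery
        System.out.println(handler.reservedUnits());    // 3
    }
}
```

This has to exist, in some form, in every service that consumes messages, and none of it is business logic.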

None of this delivers business value. All of it exists solely to reconstruct, at permanent cost, the properties that a single database transaction provided for free: atomicity, consistency, rollback on failure, and a single coherent answer to what just happened.

The VIP rush exemption — three sentences of business requirement — now requires coordinating across three services, with asynchronous event flows, compensating transactions, and a debugging surface that no single developer can hold in their head.

The Russian space program, as the story goes, used a pencil.

The refactorability that distribution destroys

There is a cost of microservices that receives less attention than sagas and eventual consistency, but which compounds more severely over time: the loss of refactorability.

In a rich domain model, a refactoring is a restructuring of code within a coherent boundary. If PricingPolicy needs to become its own concept, the compiler identifies every place that needs to change. You make the changes, run the tests, deploy. The refactoring is complete.

In a distributed system, a refactoring that touches a service contract is a migration. The event schema consumed by downstream services cannot simply change — it requires a versioning strategy, a migration window, a period of running old and new schemas simultaneously, and coordination across teams who own the downstream consumers. The boundary introduced to enforce ownership has become a fossilised contract. The ownership is preserved. The ability to evolve is not.
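
A small sketch of what "running old and new schemas simultaneously" means for a downstream consumer. The event name, the `schemaVersion` field, and both field layouts are invented for illustration; they do not come from any real contract.

```java
import java.util.Map;

// Hypothetical downstream consumer during a migration window: it must accept both
// the old and the new shape of the same event until every producer has moved.
class OrderPricedConsumer {
    // Assumed layouts: v1 carries a single "price" in cents and no version field;
    // v2 splits it into "basePrice" and "surcharge".
    static long totalCents(Map<String, Object> event) {
        int version = (Integer) event.getOrDefault("schemaVersion", 1);
        switch (version) {
            case 1: {
                Long price = (Long) event.get("price");
                return price;
            }
            case 2: {
                Long basePrice = (Long) event.get("basePrice");
                Long surcharge = (Long) event.get("surcharge");
                return basePrice + surcharge;
            }
            default:
                throw new IllegalArgumentException("Unknown schema version: " + version);
        }
    }
}

class SchemaMigrationDemo {
    public static void main(String[] args) {
        Map<String, Object> v1 = Map.of("price", 11_500L);
        Map<String, Object> v2 = Map.of("schemaVersion", 2, "basePrice", 10_000L, "surcharge", 1_500L);

        System.out.println(OrderPricedConsumer.totalCents(v1)); // 11500
        System.out.println(OrderPricedConsumer.totalCents(v2)); // 11500
    }
}
```

Every consumer of the event carries a branch like this until the last producer of the old version is retired. Inside a single codebase the same change would be a rename the compiler checks.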

This is the trade that distribution forces: you gain enforcement of service boundaries, and you lose the ability to change them cheaply. In a domain that is still being understood — which is most domains, for most of their lifetime — that trade is almost always wrong. The boundaries drawn at year one reflect year-one understanding. The domain will teach you things in year two that make those boundaries look naive. In a monolith with a rich domain model, you redraw the boundary and the compiler helps you. In a distributed system, you live with it, or you pay the migration cost. Most teams live with it. The boundaries fossilise. The system carries the imprint of how the domain was understood at its beginning, permanently.

When distribution is genuinely warranted

Distribution has legitimate use cases. They share a common property: they are external constraints on the system, not assessments of the current domain.

Proven, asymmetric load. When one component has a demonstrably different scaling profile (proven by measurement under real conditions, not anticipated in theory), isolating it may be warranted. The question is not "could this theoretically need more scale?" It is "is this the measured bottleneck today, and does the cost of scaling the whole exceed the cost of isolation?" In most systems, no individual component is the bottleneck. The constraint is the atomic action as a whole. Scaling the whole is cheaper and simpler than the industry assumes.

Physical or regulatory constraints. When data must remain within a specific jurisdiction by law, geographic distribution is warranted. The right approach is to deploy a complete instance of the domain within that boundary — not to split the domain action across a jurisdictional boundary. The atomic action stays atomic. The domain model stays unified. What changes is the deployment target, not the architecture.

Notice what is absent from this list: domain concepts that currently appear independent.

Independence is a present-tense assessment of a future-tense system. Two concepts that have no transactional relationship today may acquire one tomorrow when a requirement arrives that neither anticipated. A recommendation engine and a payment processor appear independent until the business introduces a rule that links them. When that happens in a rich domain model, you answer a design question. When it happens in a distributed system, you face a migration — or you violate the boundary with a coupling that was supposed to be impossible, and accumulate the technical debt of a boundary that no longer reflects reality.

Distribution should be warranted by constraints that are immune to domain evolution. Load and regulatory geography qualify. Current domain independence does not. It is a prediction dressed as a structural justification, and systems that are built on predictions about domain shape tend to look naive by the time they are old enough to be evaluated.

The modelling capability problem

A rich domain model does not build itself. It requires developers who can model — who can look at a domain, identify the concepts, understand their rules, and express those rules in objects that own them. This is a different skill from implementing features in a service layer. It is rarer, harder to teach, and not well served by the frameworks and patterns that dominate enterprise Java development.

This is worth stating honestly, because it is the most common objection to everything argued above. "In theory, yes — in practice, we don't have the developers who can do this."

The objection is real. But it is also a consequence of the same feedback loop. The industry has spent two decades building curricula, frameworks, and hiring pipelines around the service-DTO-repository pattern. Developers trained on Spring Boot are trained to think in services and data flows, not in domain concepts and object behaviour. The modelling skill atrophied because the dominant patterns did not require it — and then its absence became a justification for patterns that do not require it.

The distributed architecture does not require modelling capability. It requires operational capability — the ability to manage brokers, sagas, contracts, and deployment pipelines. Those skills are available. They are well-documented. They are what the frameworks teach. So the distributed system gets built, not because it is the right architecture, but because it is the one the available skills support.

What the industry normalised as "enterprise development" is, in significant part, the consequence of this skills gap and the infrastructure that grew up around it. The expensive architecture is the one that does not require the harder skill. The cheaper architecture — cheaper in every long-term dimension — requires developers who can model. Cultivating that capability is a different investment from buying more infrastructure. But it is the one with the compounding return.

But AI will fix this

The most current version of the objection to everything argued above is not about developer skill. It is about AI coding tools. The argument runs: with AI assistance, the cost of writing procedural code drops dramatically. Features are generated in minutes. Boilerplate disappears. The velocity problem that made structural discipline seem expensive is solved by the tool. So the modelling skill gap does not matter — AI fills it.

This is a plausible argument for small systems at early stages. It does not survive contact with the actual problem.

AI coding tools are, in their current form, genuinely impressive at procedural implementation. Describe a feature clearly and the tool produces technically correct, well-structured code, fast. But the tool does not hold the domain. It holds the prompt. It implements what the prompt describes, in whatever pattern the surrounding codebase suggests — which in most enterprise codebases means a service method, a DTO, and a repository call. The implementation is correct with respect to the request. Whether it is consistent with the system's existing rules is a different question, and one the tool is structurally unable to answer reliably.

The contradiction arrives quietly. In January, a developer prompts: "add a fifteen percent surcharge for rush orders." The AI implements it, correctly, in PricingService. In March, a different developer prompts: "VIP customers should not pay extra for rush orders." The AI implements that too, correctly, somewhere in the call chain, perhaps in OrderService, where the customer context is available. Both implementations are technically sound. Neither developer intended a contradiction. The AI had no way to know one existed, because the domain has no center. The question "what does a rush order cost?" is not owned by anything. Its answer is distributed across the history of prompts that touched it.

In a rich domain model, this contradiction surfaces immediately. Both rules must live on Order. When the second developer — or the AI they are directing — goes to implement the VIP exemption, the rush surcharge is already there, visible, in the same method. The conflict is structural and immediate. The developer makes a decision. The model is updated. The system reflects the current understanding of the business.

In a procedural system, the conflict is invisible until a customer receives a price that is neither the intended standard price, nor the intended VIP price, but an artifact of two implementations that never knew about each other.
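
A hypothetical reconstruction of how that artifact comes about is sketched below. `PricingService` and `OrderService` follow the scenario above; `InvoiceService` is an assumption added to give the second code path a name, and amounts are whole cents for brevity.

```java
// Hypothetical reconstruction of the two prompts' results; amounts are whole cents.
class PricingService {
    // January: "add a fifteen percent surcharge for rush orders."
    long price(long baseCents, boolean rush) {
        return rush ? baseCents * 115 / 100 : baseCents;
    }
}

class OrderService {
    private final PricingService pricing = new PricingService();

    // March: "VIP customers should not pay extra for rush orders."
    // Implemented where the customer is known: the checkout path.
    long checkoutPrice(long baseCents, boolean rush, boolean vip) {
        if (vip) {
            return baseCents;
        }
        return pricing.price(baseCents, rush);
    }
}

// Assumed third participant: written before March and never revisited,
// it calls PricingService directly and knows nothing about VIPs.
class InvoiceService {
    private final PricingService pricing = new PricingService();

    long invoicedPrice(long baseCents, boolean rush) {
        return pricing.price(baseCents, rush);
    }
}

class DriftDemo {
    public static void main(String[] args) {
        // A rush order of 100.00 placed by a VIP customer.
        System.out.println(new OrderService().checkoutPrice(10_000, true, true)); // 10000
        System.out.println(new InvoiceService().invoicedPrice(10_000, true));     // 11500
    }
}
```

Each method is correct with respect to the prompt that produced it. The customer still sees one price at checkout and another on the invoice.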

There is a counterargument worth taking seriously: AI tools with sufficient codebase context — through large context windows, retrieval-augmented generation, or persistent memory across sessions — could theoretically detect such contradictions before implementing. Some tools already attempt this. The counterargument is real, and it would be wrong to dismiss it entirely.

But even if the AI detects the contradiction, it cannot resolve it. The question "should VIP customers pay the rush surcharge?" is not answerable by reading the codebase. It is a business decision. The AI can surface the conflict. It cannot determine which rule reflects the current intent of the business, which rule is outdated, or whether both should coexist under different conditions. That requires domain understanding — and domain understanding requires a human with a model, not a tool with a context window.

What the rich domain model provides is not a barrier to AI assistance. It is the structure that makes AI assistance most effective. When the domain is explicit, concepts are well-named, and rules are owned by the objects they govern, AI-generated code within that model tends to be good — because the model itself provides the context the AI needs to generate correctly. The right place to put a new rule is unambiguous. The existing rules are co-located and readable. The AI operates within a structure that guides it toward coherent output.

The deeper issue is velocity. Procedural systems accumulate drift gradually, over years, as developers add logic wherever it is convenient. AI-assisted development does not change the direction of that drift. It changes the speed. What used to take three years of incremental addition now takes months of accelerated feature generation. The same structural absence of context ownership, at an order of magnitude higher throughput. The codebase grows faster than any team's ability to understand it, and the AI has no understanding to compensate with — only pattern matching against what is already there.

AI does not fix the context problem. In a system without a domain model, it compounds it. The same rot, faster. The same contradictions, earlier. The same invisible price tag, arriving sooner.

What AI changes is the cost of implementation. What it does not change — what nothing changes — is that implementation without structure is the most expensive kind. The structure has to come first. The model has to exist before the tool can be trusted to work within it. AI is a powerful accelerant. The question, as always, is what it is accelerating toward.

The invisible price tag

Consider what a mature enterprise system built on microservices actually costs, outside the domain work itself.

A containerised infrastructure running tens or hundreds of services. An orchestration layer — Kubernetes or equivalent — with its own operational model, upgrade cycle, and expertise requirement. A message broker cluster maintained for high availability. A distributed tracing stack. A log aggregation platform, because individual service logs are unreadable without one. A schema registry and contract testing infrastructure. Per-service CI/CD pipelines, each with its own configuration, deployment windows, and rollback strategy. An on-call rotation that covers distributed failure modes — partial outages, broker lag, compensation failures — that do not exist in a single-process system. A platform or infrastructure team whose entire function is to keep the operational substrate running.

None of this is the domain. None of it delivers business value. All of it is the permanent operational cost of workarounds for missing context ownership.

Now consider the same domain in a well-modelled monolith. A small number of deployable artefacts — perhaps one, perhaps a handful if genuine load asymmetry has been measured and justified. A relational database. A load balancer. Standard application monitoring. A CI/CD pipeline that deploys the whole. An on-call rotation that reads stack traces. The failure modes are the domain's failure modes, not the infrastructure's.

The difference in team size, infrastructure cost, and operational overhead is not the cost of enterprise software. It is the cost of the workaround. The domain is the same. The business rules are the same. The problem being solved is the same. What differs is whether the system paid for a domain model or paid for the infrastructure required to simulate one.

This difference is invisible in most organisations because the alternative was never built. The costs of the distributed system accumulate, get attributed to the scale and complexity of the enterprise domain, and become the benchmark against which new decisions are made. The next system is also built with microservices, because that is what enterprise software costs. Since what was built can never be compared with what could have been built, the attribution is never seriously questioned.

What the rich domain model actually gives enterprise software

The argument for the rich domain model in large enterprise systems is not that it is elegant or theoretically correct. It is that it is the mechanism by which enterprise software remains manageable over time.

Oversight. When every rule about an order lives on Order, a developer can understand order behaviour by reading one place. Not by reconstructing a distributed flow across services, event schemas, and asynchronous reactions. One place. This is not a convenience — it is what makes oversight possible as the system grows. Without it, understanding the system requires understanding its history, because the structure no longer maps to the domain.

Insight. A rich domain model makes the domain legible to the team. The concepts are explicit. The rules are expressed in the language of the domain, not buried in service method conditionals and event handler logic. A new developer can read the model and understand the business. A non-technical stakeholder can, with modest translation, verify that the model reflects their understanding. That legibility is not incidental — it is the mechanism by which teams catch misunderstandings before they become bugs.

Simplicity under growth. A procedural system grows by addition — new services, new methods, new conditions. A rich domain model grows by evolution — concepts become richer, responsibilities shift, new objects emerge when the design signals they are needed. Evolution is guided by the model. Addition is guided by expediency. Over five years, the difference in the resulting codebase is not marginal.

Preserved optionality. A well-modelled domain in a single deployable can be split later, when measurement proves a specific boundary is warranted. The model already knows its own concepts — the split follows the domain's natural lines, guided by evidence. A distributed system cannot be reassembled cheaply once contracts have fossilised and team ownership has hardened around service lines. The simple starting point preserves optionality. The complex starting point spends it immediately, in exchange for flexibility that may never be needed.

First principles

There is nothing novel in the argument this article makes.

Structure your thinking before you structure your infrastructure. The question of where a rule lives is a question about the domain. Answer it in the domain — in the model, in the objects that own the concepts — before reaching for any infrastructure to enforce it. Infrastructure that enforces a boundary you have not yet thought through will enforce it permanently and expensively.

The location of a rule is part of the design. A rule in the right place is findable, testable, and changeable. A rule in the place that was convenient to add it becomes a historical artefact, discoverable only by reading the history of the system.

Complexity introduced to compensate for missing structure is the most expensive kind. It does not reduce over time. It compounds. Every saga that exists because a transaction boundary was removed, every contract that fossilises a year-one boundary decision, every service that owns zero domain concepts but exists to coordinate between services that do — these are permanent operational costs, paid every day, for the lifetime of the system.

What the industry calls the cost of enterprise software is largely the cost of not modelling. The infrastructure, the teams, the operational overhead — these are not the price of scale or complexity. They are the price of workarounds for a missing domain model, normalised by the fact that everyone around you is paying the same price and the alternative was never built to compare against.

The rich domain model is not a technique for senior engineers on greenfield systems. It is the thing that makes enterprise software manageable at all — the only mechanism that preserves oversight, insight, and simplicity as a system grows. The alternative is the same complexity, without the structure to contain it, with an expensive distributed scaffolding erected around it to simulate the containment the model would have provided for free.

Build the model. Let the model tell you where the rules live, when the design needs to evolve, and when — if measurement ever demands it — a boundary has genuinely earned the right to become a service.

The model will not mislead you. The path of least resistance will.

Engineering a UI for a Java Backend: Maintainability, Longevity, and Why the Answer Might Surprise You

Leon Pennings — Wed, 13 May 2026 07:59:01 +0000

Most teams pick a UI framework the same way they pick a restaurant — by what is popular right now, what colleagues recommend, or what appeared at the top of a search result. This article takes a different approach: establish what a well-engineered UI for a Java backend actually needs to be, from first principles, and then see what framework honestly satisfies those requirements. The conclusion may not be what you expect.

Part 1: Where the Client Lives

Before requirements, one distinction that frames everything else.

Server-side rendering: the client lives on the server. The server maintains state, computes views, and pushes HTML to the browser. The browser is a display terminal. Every interaction is a round-trip. Network interruptions break the experience. Horizontal scaling requires session affinity or replication.

Fat client: the client lives in the browser. It holds its own state, manages its own behaviour, and calls the server only when it needs data or needs to record an action. Server calls are as simple as API calls. The server is stateless. Network interruptions are survivable. Any server instance handles any request.

This distinction is not a stylistic preference. It determines where state lives, how the system scales, how resilient the user experience is to infrastructure events, and what the server is actually responsible for. Everything that follows builds on it.

Part 2: The Requirements

These requirements are not Java-specific preferences. They describe what any disciplined engineering team should want from a UI layer, regardless of backend language. Java is the context. The principles are universal.

The architecture has three distinct layers, each with different skill requirements:

Layer	Purpose	Skills Required
Platform / component	Defines HTML structure, CSS, GWT wrappers	Semantic HTML, CSS
Communication infrastructure	Communication between browser and server	Java
Feature development	Views, interactions, domain behaviour	Java only

The requirements below apply to the architecture as a whole. The skill boundary is explicit: HTML and CSS expertise is required at the component layer, and only there. Feature developers — the majority of the team, doing the majority of the work — operate entirely in Java.

1. Frontend-Requirement-Down Design

The UI should be designed from what the user needs to accomplish, not from what the backend domain model happens to look like. User interactions frequently span multiple backend domain objects. Designing upward from DTOs or entity shapes produces interfaces that reflect implementation details rather than user intent. The frontend is a peer application with its own concerns — not a projection of the server model.

2. The Browser is the Client's Home

The client must live in the browser. A fat client holds its own state, survives server restarts and transient network interruptions, and communicates with the server only when necessary. Client-side state is typed, structured, and available across the full session — without cookies, without server-side session objects, without distributed session infrastructure. The server is stateless. Scaling follows directly. This is not a performance preference — it is an architectural correctness preference with operational consequences that compound over the lifetime of the system.

3. Compile-Time Validation over Runtime Discovery

Structural integration errors (type mismatches, missing handler implementations, incorrect data shapes, gaps between UI and backend contracts) should fail at build time rather than in browser execution. If the Maven build passes, the integration is structurally correct. Treating the browser as the place where structural errors are discovered is an avoidable cost in debugging time, deployment cycles, and user impact.

4. Minimal Boilerplate per Feature

Adding a new feature — a new view, a new action, a new data field — should require changes in the minimum number of places, ideally one. The codebase structure should guide the developer to the correct location and pattern. Architectural decisions should not be reopened on every addition.

5. 100% Ownership of Components — No Escape Hatches

Component frameworks typically define generic components covering the majority of use cases, then offer escape hatches for the rest. This is presented as flexibility. In practice it is a structural liability: escape hatches couple the project to framework internals, and framework upgrade cycles risk breaking those couplings. Long-term maintainability improves substantially when the project owns its rendered HTML and component contracts completely, so the question of escaping the framework never arises.

6. Semantic HTML is Non-Negotiable — and Must Be Owned

The browser is a world of HTML. Producing semantically correct, standards-compliant HTML should be an explicit engineering goal — not an afterthought, not something delegated to a third-party framework's component library.

Adopting a framework's component library because "we are not HTML/CSS experts" trades a knowledge gap for a control gap. The framework's HTML is a black box. When it changes its DOM structure, CSS breaks. When it revises class naming conventions, the project adapts. The project is permanently downstream of someone else's HTML decisions, on someone else's release cycle.

The correct response is to own the HTML. The investment is made once: define each component in clean, semantically correct HTML. The resulting HTML belongs to the project. It cannot be broken by a third-party upgrade. The component HTML should remain clear and concise — obvious to anyone who opens the file — so that maintenance is equally obvious.

The CSS for each component lives in the same file used to define the component's HTML. One source of truth. No indirection. No ambiguity about which styles apply to which structure. When a component needs to change, HTML and CSS are reviewed together. One CSS file styles the entire application. Component class names are functional and identifiable — they reflect what the component is, not what it looks like. CSS can evolve entirely independently of Java code. A designer can restyle the full application by modifying CSS alone, without touching a single Java class.

7. Any Java Developer Can Build Application Features — No JavaScript Ecosystem Expertise Required

Any Java developer should be able to build application features within this UI architecture without requiring JavaScript, CSS, or HTML knowledge. Not a full-stack developer. Not a Java developer who also knows a JS framework. Any Java developer.

The developer base for Java is large. The pool of Java developers who are also proficient in modern JavaScript, CSS architecture, and semantic HTML is substantially smaller. A framework requiring that intersection creates a staffing constraint that compounds as the team changes over time.

UI engineering involves more than syntax — interaction design, state modelling, async behaviour, information hierarchy. These remain the developer's responsibility. What this architecture removes is the requirement to acquire a second language ecosystem to express them.

Part 3: Why These Are the Right Requirements

Each requirement is independently justifiable. Together they reinforce each other.

Frontend-requirement-down is product thinking applied to architecture. The backend serves the frontend; the frontend serves the user. Reversing this dependency produces interfaces that feel like database forms — and that break whenever the domain model evolves.

Fat client as the client's home reflects what the browser is: a capable, stable application runtime. Treating it as a display terminal forces server infrastructure to compensate for what the client could handle locally — state management, session continuity, resilience to transient failures. These become server problems when they could be client responsibilities, solved more cheaply, closer to the user, without cross-request infrastructure.

Compile-time validation is the highest-leverage quality tool available to the team. A structural error that escapes the build and reaches the browser costs significantly more to find and fix than one the compiler rejects. Compile-time checks cost nothing at runtime. Moving validation earlier is always the better trade.

100% component ownership is the only durable resolution to the escape hatch problem. Partial ownership — using a framework's components for most cases — means living with the framework's HTML decisions, its upgrade cycle, and its constraints indefinitely. Full ownership means none of that. The project defines the components. The project owns the HTML.

Owning semantic HTML is not idealism — it is engineering discipline. HTML is the foundation of everything the browser renders. Teams that do not own their HTML foundation do not fully control their accessibility, CSS architecture, DOM structure, or maintenance costs. A shared component library means this investment is made once and leveraged across every application in the organisation.

A large, accessible developer base recognises that sustainable software is built by teams over time. An architecture requiring rare skill intersections is a staffing risk. Reducing the entry requirement for feature development to "knows Java" is a durable organisational advantage.

Part 4: Why Popular Alternatives Fall Short

Requirement	React / TS	Thymeleaf / HTMX	Vaadin (Flow)	Required from the architecture
Compile-time structural validation	Partial — no unified cross-language bridge	No	Yes	Full
Client lives in the browser	Yes	No	No	Yes
Single language — Java	No	No	Yes	Yes
100% component ownership	Possible, rarely achieved	Partial	No	By construction
No JS ecosystem expertise needed	No	No	No	Yes
Stateless server	Yes	No	No (standard architecture)	Yes
HTML ownership by construction	Possible, not guaranteed	Partial	No	Full

JavaScript Frameworks (React, Angular, Vue, Svelte)

These frameworks can build any UI a browser can render. The evaluation here is not about output capability — it is about architectural fit for a Java backend team.

A Java team adopting a JavaScript framework acquires a second language, a second type system, a second build toolchain, and a second ecosystem to maintain in parallel with the Java backend. The type systems do not share a validation boundary: TypeScript validates the client, Java validates the server, and structural mismatches between them surface at runtime. Shared contracts must be maintained in two places by two compilers.

Component ownership is theoretically possible in these frameworks but structurally not guaranteed. A disciplined team can own their HTML in React. Most teams, in practice, adopt component ecosystems that control the DOM on their behalf — trading ownership for convenience and inheriting the maintenance consequences. The architecture described in this article makes full HTML ownership the default, not the exception.

Framework churn is a real cost. The JavaScript ecosystem changes significantly on a multi-year cycle. Architectural commitments made today carry implicit future migration costs. These are difficult to quantify at decision time and easy to underestimate.

For a Java backend team building domain applications, the trade-offs do not stack up.

Server-Side Rendering (Thymeleaf, Spring MVC, JSP, HTMX)

The client lives on the server. Every interaction is a round-trip. State requires server-side session management. Horizontal scaling requires session affinity or replication. Template expressions are strings — a renamed Java method leaves a broken template the build cannot detect. Dynamic behaviour requires JavaScript added on top, reintroducing a dependency without the benefits of a proper fat client.

Reasonable choices for content sites and simple form-based applications. Not suited for the class of domain application this article addresses.

Vaadin

Vaadin targets the same use case as this architecture. In its standard configuration, Vaadin Flow keeps UI state and UI logic on the server and synchronises them with the browser over HTTP requests, or over a WebSocket connection when server push is enabled. That reintroduces server state and makes server restarts visible to users. Every UI interaction crosses the network. The fat client advantage (local state, local computation, resilience) is surrendered. Earlier Vaadin versions used GWT as their client-side foundation, which is architecturally much closer to what this article describes.

Part 5: What the Architecture Needs — and What Delivers It

The requirements converge on a specific model:

A Java application that runs in the browser
Compiled by a Java toolchain into a browser-executable artifact
With full ownership of HTML output through a project-defined component library
Communicating with the Java backend through a typed, compiler-validated protocol
Built and verified by a single Maven build

This is not a description of a framework. It is a description of a compiler that targets the browser runtime.

The mental model is exact:

Java → JVM bytecode → runs on Linux, Windows or any JVM host

Java → browser-executable artifact → runs in any browser

The browser is a runtime environment, just as the JVM is a runtime environment. The developer writes Java. The compiler bridges the language to the runtime. The output format — bytecode or JavaScript — is an implementation detail, not a concern of the developer. This repositions browser compilation from exotic to normal. It is the same problem Java solved for heterogeneous server environments, applied to a new runtime target.

Once this model is understood, the selection question becomes concrete: what exists that actually delivers it?

GWT — the Google Web Toolkit — is the only mature, production-proven option for Java.

GWT compiles Java to JavaScript. It has done so since 2006, at Google scale. Its type system is Java's type system. Its build integration is Maven. Its module boundaries are compiler constraints — client-only code cannot be invoked on the server; server-only code is not compiled into the client artifact.

The Critical Distinction: GWT as Compiler vs GWT as Component Framework

This distinction is architectural, not rhetorical.

GWT used with JSON communication and its built-in widget library is, in practice, just another web framework. The compiler provides Java syntax, but the architecture is conventional: a third-party component library controls the HTML, communication is untyped, and the project is downstream of GWT's component ecosystem. In this mode GWT offers limited advantage and inherits familiar maintenance liabilities.

GWT used as a pure compiler — with a project-owned component library and a typed communication protocol — is a fundamentally different thing. The HTML belongs to the project. The CSS belongs to the project. The communication contracts are validated by the Java compiler. GWT provides one thing: the ability to write Java that runs in the browser. Everything else is owned by the project.

This distinction also resolves the "GWT is dead" criticism at a structural level. If GWT's component ecosystem were abandoned tomorrow, a project using GWT as a pure compiler would be unaffected. The compiler is the only dependency — and compilers are among the most stable software artifacts in existence. Stability is not abandonment.

Part 6: The Component Library — HTML Owned, CSS Independent, Java Exposed

Semantic HTML Defined Once, Owned Completely

HTML and CSS specialists define every component from scratch. This is a one-time investment — made properly, by people with the relevant expertise — that benefits every subsequent line of feature code written against it. The library covers the full UI vocabulary: root layout, navigation, header, footer, main content area, tables, lists, description lists, forms, dialogs, buttons, selects, confirmation prompts, composite panels.

Each component is clean, semantic HTML. The structure of a table is <table> with <thead>, <tbody>, <th>, and <td>. Navigation is <nav>. A description list is <dl> with <dt> and <dd>. The HTML communicates intent to the browser, to assistive technologies, and to any developer who opens the file. It should remain clear and concise — maintenance should be obvious from inspection.

The CSS for each component lives alongside its HTML definition. One source of truth. A developer maintaining a component sees structure and styling together. One CSS file styles the entire application. Class names are functional — transfer-table, not blue-bordered-grid. Visual redesign is a CSS concern. It requires no Java changes and no recompile of application code.

GWT Wraps Each Component in a Typed Java Class

For each HTML component definition, a GWT Java class emits the correct HTML structure and exposes a typed Java API: builder methods, typed parameters, event handlers — all in Java, all compiler-validated.

Building a wrapper requires knowing what an HTML tag is and when to use it — not CSS, not JavaScript, not layout theory. GWT provides the primitives: set a tag name, compose child elements, assign a class attribute. The wrapper author works in Java, guided by the HTML definition.

Above the wrapper layer, feature developers never write HTML. They never reference a CSS class name. They never open a stylesheet. They instantiate typed Java classes. The HTML is an implementation detail of the wrapper. The CSS is an implementation detail of the stylesheet. Neither is visible, relevant, or accessible to feature developers.
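
A minimal sketch of what such a wrapper could look like, under loud assumptions: the TransferTable class and its builder methods are hypothetical, and plain string building stands in for GWT's element primitives so that the example compiles without the GWT SDK. It only illustrates the shape of the idea: the semantic HTML and the functional class name live inside the wrapper, and the caller sees nothing but typed Java.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical component wrapper. In a GWT project this class would build DOM
// elements; string building is used here only to keep the sketch self-contained.
final class TransferTable {
    private final List<String> headers = new ArrayList<>();
    private final List<List<String>> rows = new ArrayList<>();

    // Builder method: declares one column and returns the wrapper for chaining.
    TransferTable column(String header) {
        headers.add(header);
        return this;
    }

    // Builder method: adds one row; the cell count must match the declared columns.
    TransferTable row(String... cells) {
        if (cells.length != headers.size()) {
            throw new IllegalArgumentException(
                "Expected " + headers.size() + " cells but got " + cells.length);
        }
        rows.add(List.of(cells));
        return this;
    }

    // The semantic structure (table, thead, tbody, th, td) and the functional
    // class name are decided here, once, and nowhere else.
    String toHtml() {
        StringBuilder html = new StringBuilder("<table class=\"transfer-table\"><thead><tr>");
        for (String header : headers) {
            html.append("<th scope=\"col\">").append(escape(header)).append("</th>");
        }
        html.append("</tr></thead><tbody>");
        for (List<String> row : rows) {
            html.append("<tr>");
            for (String cell : row) {
                html.append("<td>").append(escape(cell)).append("</td>");
            }
            html.append("</tr>");
        }
        return html.append("</tbody></table>").toString();
    }

    // Minimal text escaping so cell content cannot inject markup.
    private static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

A feature developer would then write something like new TransferTable().column("From").column("Amount").row("Alice", "10") and never see a tag or a class name.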

Shared Libraries Across the Organisation

The component library is a standalone Maven artifact. Multiple applications can depend on it. Every application in the portfolio gets uniform, semantically correct, standards-compliant HTML automatically. Each application maintains its own CSS where visual design differs — or shares it where it does not.

HTML standards adherence is guaranteed across the portfolio by one maintained library, not by discipline in each project. Accessibility improvements propagate from one place to every application on the next build. When semantic best practices evolve, one wrapper update benefits all consumers.

A strict separation is enforced by construction: component styling lives in the library, screen-level code lives in the application. A developer building a screen cannot accidentally mix component-level styling concerns into application code because they never touch CSS at all. This is the kind of separation that is hard to achieve through convention and trivial to achieve through architecture.

The Maintenance Cycle Reframed

Conventional framework maintenance involves upgrading versions, adapting to API deprecations, resolving conflicts between framework changes and application code, and re-learning patterns the framework revised.

In this architecture:

Visual redesign: update the CSS file. No Java touched. No application recompile.
HTML structure change: update one wrapper class. Application code above it is untouched.
New component: define the HTML, write the wrapper. Immediately available to all feature developers as a typed Java class.
GWT compiler update: affects only the compiler, not the component API or application code.

The dependency on GWT is a dependency on a compiler. Compiler interfaces are more stable than component framework APIs. The upgrade cost is proportional to what actually changed — not to what the framework decided to revise.

Part 7: The Communication Architecture — One Pattern, Always

Command as the Unit of Interaction

Every client-server interaction is a Command — a Java object in the GWT shared package, serializable over GWT-RPC, carrying both the request parameters and, on return, the result. A single RPC endpoint receives all Commands and routes them to Visitor handlers. No servlet proliferation. No REST design decisions. No JSON schema. No API documentation to keep synchronised with implementation.

A Command carries its request parameters to the server. The Visitor populates the result on the same Command object. The Command returns to the client. The same type throughout. The compiler validates the entire round trip.

public class GetIntendedTransferDetailsCommand extends Command {

    private Long id;

    private IntendedTransferDetails intendedTransferDetails;

    // No-argument constructor, required for GWT-RPC serialization
    public GetIntendedTransferDetailsCommand(){}

    public GetIntendedTransferDetailsCommand(Long id) {
        this.id = id;
    }

    public void setIntendedTransferDetails(IntendedTransferDetails details) {
        this.intendedTransferDetails = details;
    }

    public IntendedTransferDetails getIntendedTransferDetails() {
        return intendedTransferDetails;
    }
}

Used on the client:

new GetIntendedTransferDetailsCommand(transferId)
    .execute(new CommandResult<GetIntendedTransferDetailsCommand>() {
        @Override
        public void onResult(GetIntendedTransferDetailsCommand command) {
            panel.add(command.getIntendedTransferDetails().getWidget(o -> reload()));
        }
    });

Any Java developer reads this and understands it immediately. Create a Command with parameters. Execute it asynchronously. The result arrives back on the same Command object. Call whatever you need. No HTTP verbs. No JSON mapping. No async framework to learn. Java objects, Java callbacks, Java types throughout.

The Server-Side Lifecycle

Command arrives at single RPC endpoint
  → Extract session UUID from Command
  → Load and validate UserSession from database
  → Set user context in ThreadLocal       (Interaction begins / transaction opens)
  → Route to Visitor handler by Command type
  → Visitor executes domain logic
  → Visitor populates result on Command
  → Command returned to client
  → ThreadLocal cleared                   (Interaction ends / transaction closes)

The Interaction scope is the transaction scope. Every Command is exactly one Interaction, one transaction boundary, one security check. This is structural — it cannot be accidentally skipped. Server-side input validation applies as in any server-side system; the Interaction boundary enforces scope, not content correctness.
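
The lifecycle above can be sketched in plain Java. Everything named here is an assumption made for illustration: CommandEndpoint, UserContext, UserSession and WhoAmICommand are hypothetical, an in-memory map stands in for the session table in the database, a Consumer stands in for the Visitor routing, and transaction handling is reduced to comments. The point of the sketch is the try/finally shape that makes the Interaction boundary impossible to skip.

```java
import java.util.Map;
import java.util.UUID;
import java.util.function.Consumer;

// Hypothetical base type: every Command carries the caller's session id.
abstract class Command {
    private final UUID sessionId;

    protected Command(UUID sessionId) {
        this.sessionId = sessionId;
    }

    UUID getSessionId() {
        return sessionId;
    }
}

// Hypothetical session record, loaded once per Interaction.
final class UserSession {
    private final String userName;

    UserSession(String userName) {
        this.userName = userName;
    }

    String getUserName() {
        return userName;
    }
}

// Holds the current user for the duration of exactly one Interaction.
final class UserContext {
    private static final ThreadLocal<UserSession> CURRENT = new ThreadLocal<>();

    private UserContext() {
    }

    static void set(UserSession session) {
        CURRENT.set(session);
    }

    static UserSession current() {
        return CURRENT.get();
    }

    static void clear() {
        CURRENT.remove();
    }
}

// The single entry point: one Command in, the same Command out.
final class CommandEndpoint {
    private final Map<UUID, UserSession> sessionStore; // stand-in for the database
    private final Consumer<Command> router;            // stand-in for Visitor routing

    CommandEndpoint(Map<UUID, UserSession> sessionStore, Consumer<Command> router) {
        this.sessionStore = sessionStore;
        this.router = router;
    }

    <C extends Command> C execute(C command) {
        UserSession session = sessionStore.get(command.getSessionId());
        if (session == null) {
            throw new SecurityException("Unknown or expired session");
        }
        UserContext.set(session); // Interaction begins; a real system would open the transaction here
        try {
            router.accept(command); // handler runs domain logic and populates the result
            return command;
        } finally {
            UserContext.clear();    // Interaction ends; a real system would close the transaction here
        }
    }
}

// Hypothetical example Command whose result is the current user's name.
final class WhoAmICommand extends Command {
    private String userName;

    WhoAmICommand(UUID sessionId) {
        super(sessionId);
    }

    void setUserName(String userName) {
        this.userName = userName;
    }

    String getUserName() {
        return userName;
    }
}
```

Because the context is cleared in a finally block, a handler that throws still leaves the thread clean for the next request, which matters when server threads are pooled.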

Security as a Type Property

Commands requiring elevated privileges implement marker interfaces — RequiresAdministrator, for example. The infrastructure checks for these before routing to any application code. Security is declarative, compile-time visible, and cannot be bypassed from application code. No annotation processing, no AOP, no filter chain configuration. Java interfaces and a single infrastructure check. Every privileged Command in the codebase is identifiable by a type search.
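
A minimal sketch of that check, assuming hypothetical names throughout: RequiresAdministrator is the only name taken from the text, while SecurityGate, the two example Commands, and the isAdministrator flag on UserSession are invented for illustration. The infrastructure would call the gate once, before any routing happens.

```java
// Marker interface: carries no methods, only a fact about the type.
interface RequiresAdministrator {
}

// Hypothetical base type for all Commands.
abstract class Command {
}

// An ordinary Command: no marker, no elevated privilege needed.
final class ListOwnTransfersCommand extends Command {
}

// A privileged Command: the requirement is part of its type declaration,
// so a type search for RequiresAdministrator finds it.
final class DeleteUserCommand extends Command implements RequiresAdministrator {
}

// Hypothetical session carrying the caller's role.
final class UserSession {
    private final boolean administrator;

    UserSession(boolean administrator) {
        this.administrator = administrator;
    }

    boolean isAdministrator() {
        return administrator;
    }
}

// The single infrastructure check, run before any application code.
final class SecurityGate {
    private SecurityGate() {
    }

    static void check(Command command, UserSession session) {
        if (command instanceof RequiresAdministrator && !session.isAdministrator()) {
            throw new SecurityException(
                command.getClass().getSimpleName() + " requires an administrator");
        }
    }
}
```

Application code never calls the gate and cannot opt out of it; the only way to change a Command's privilege level is to change its type declaration, which shows up in review.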

Adding a Feature

Add a Command class in the shared package
Add a Visitor implementation on the server
Call the Command from the client

No servlet registration. No routing configuration. No JSON schema. No API documentation update. The type system connects Command to handler. The compiler verifies the connection. Maven validates the whole.

Part 8: Object-Oriented UI — A Natural Consequence

GWT enables a pattern that most frontend architectures make difficult or impossible: applying standard object-oriented principles directly to UI objects. This is not a requirement of the architecture — it is a possibility it unlocks, and one worth examining because it illustrates how far the "just Java" principle extends when taken seriously.

In most frontend architectures, data and behaviour are separated by design. A data object carries fields. Separate components, controllers, reducers, or stores manage what happens when the user interacts with that data. The data object is inert — it knows nothing of its own presentation or behaviour.

In a Java fat client, this separation is a choice, not a constraint. A domain summary object can carry both its data and its behaviour, exactly as a well-designed Java object does in any other context.

The same class that defines how a TransferSummary appears in a table also defines what happens when the user clicks a row: which popup appears, which actions are offered, which Commands are issued for each action, which dialogs are composed for data entry. All co-located. All in Java. The object is alive in browser memory — it holds its full operational context, not just its display data.

public void show(OnResult onResult) {
    Popup popup = new Popup(getSummarizedSummary());
    popup.addPopupButton("View details", button -> showTransferDetails());
    popup.addPopupButton("Edit transaction", button ->
        new GetEditableTransferDetailsCommand(id, sourcePersonId, serviceProviderId)
            .execute(cmd -> cmd.getTransferDetails().edit(onResult))
    );
    popup.show();
}

Adding a field means editing one class. Add the field. Add the table header column. Add the table row cell. One place, one commit, compiler-validated. Compare this to a conventional layered approach: add to backend DTO, update TypeScript interface, update table component, update API response mapper, update state store, update tests per layer. Multiple locations, across potential team boundaries, where a mismatch at any point is a runtime surprise.

Conditional UI behaviour is conditional Java. Whether to show a "Remove agreement" button is if (agreements.size() > 0) — a domain condition expressed directly, not a separate UI state flag.

No DTO duplication. There is no parallel UI model that mirrors the domain object. The domain object is the UI object. The object carried to the client is the object that renders, the object that acts, the object that issues Commands. One model, one place, no synchronisation required.

Summary objects carry more than they display. A summary may show five fields in a table but carry twelve. The hidden fields are operational context — entity identifiers, related references, state flags — that drive popup actions and Command parameters. This is only possible because the fat client keeps the full object in browser memory. No round-trip to reconstruct context. No hidden state pushed into the URL.
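A stripped-down sketch of such a summary object, without any GWT types, could look like this. `AgreementSummary` and its fields are hypothetical, and popup actions are reduced to plain labels so the example stays self-contained.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical summary object; field and action names are illustrative only.
class AgreementSummary {
    // Displayed in the table row.
    private final String customerName;
    private final String status;
    // Carried but not displayed: operational context for popup actions.
    private final long customerId;
    private final List<Long> agreementIds;

    AgreementSummary(String customerName, String status, long customerId, List<Long> agreementIds) {
        this.customerName = customerName;
        this.status = status;
        this.customerId = customerId;
        this.agreementIds = List.copyOf(agreementIds);
    }

    /** What the table shows: two of the four fields. */
    List<String> tableCells() {
        return List.of(customerName, status);
    }

    /** What the row popup offers; the condition is plain domain logic, not a UI flag. */
    List<String> popupActions() {
        List<String> actions = new ArrayList<>();
        actions.add("View customer " + customerId);
        if (!agreementIds.isEmpty()) {
            actions.add("Remove agreement");
        }
        return actions;
    }
}
```

The hidden `customerId` and `agreementIds` never reach the table, yet they are available the moment the user clicks a row.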

This is basic encapsulation applied consistently. It is not a novel pattern. It is OO design working exactly as intended, in a context where most architectures actively prevent it.

Part 9: Testing in a Compiler-Validated Architecture

Compiler validation eliminates a specific class of error: structural integration failures. Type mismatches, missing Visitor implementations, incorrect method signatures, RPC serialization failures — these do not reach runtime in a correctly built system.

This does not replace testing. It removes the need for a category of test.

Domain logic, workflow correctness, UX behaviour, edge cases in user interactions, and business rule validation all require tests. The compiler is not a substitute for verifying that the system does the right thing — it is a guarantee that the system does not break structurally. These are different concerns and both matter.

In practice this means:

Unit tests cover domain logic and Visitor behaviour: pure Java, fast, no browser required. A unit test that checks that every Command has exactly one corresponding Visitor implementation ensures that all Commands can be executed.
Integration tests cover workflow correctness and Command/Visitor round trips.
The compiler covers structural integration: type contracts, module boundaries, serialization correctness.

The result is a test suite that is smaller, faster, and more focused than one that must also catch structural integration failures at runtime.
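The Command-to-Visitor coverage check mentioned above could be sketched like this. It assumes the Command types and the handler registry are available as plain collections; how they are gathered (a hand-maintained list, classpath scanning) is not described in the text, and `CommandCoverage`, `PingCommand`, and `PongCommand` are hypothetical names.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical Commands used only to exercise the check.
class PingCommand {}
class PongCommand {}

final class CommandCoverage {
    private CommandCoverage() {}

    /**
     * Returns the Command types that have no registered handler.
     * A unit test would assert that this set is empty.
     */
    static Set<Class<?>> unhandled(List<Class<?>> commandTypes, Map<Class<?>, ?> handlersByCommand) {
        Set<Class<?>> missing = new LinkedHashSet<>(commandTypes);
        missing.removeAll(handlersByCommand.keySet());
        return missing;
    }
}
```

Keying the registry by Command type also means a second handler for the same Command cannot be registered without replacing the first, which keeps the mapping one-to-one.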

Part 10: Simplicity as an Economic Argument

The requirements above are engineering arguments. They have a direct economic translation.

Getting something to work is the scope for prototypes. Building so that maintainability and cost are optimal is the scope for production code. Almost any framework clears the first bar. Very few clear the second consistently over time. The economic argument for this architecture is entirely a production-scope argument.

Implementation speed. A feature developer adds a panel by writing a Command, a Visitor, and composing typed Java components. No context switch, no second toolchain, no JSON mapping, no parallel type maintenance. The pattern is always the same. A developer who has built one feature understands the pattern for all subsequent features. Onboarding is measured in hours, not weeks.

Maintenance cost. The dominant long-term cost in software is not building features — it is maintaining them. In this architecture, maintenance is localised by construction. A field change is one class. A visual redesign is CSS. An HTML structure update is one wrapper. There is no architectural archaeology to determine where a change belongs. Changes do not ripple.

Upgrade cycle cost. Framework upgrade cost in JavaScript-heavy projects is a recurring drain on development capacity. Major upgrades require rework proportional to how deeply the framework is woven into the application. In this architecture the upgrade cycle is: update CSS when design standards evolve, update wrapper classes when HTML best practices change, update the GWT compiler when a new version is available. Application code is untouched by all three.

Payload and runtime. A mature production application with 3,000–4,000 UI classes produces 15–25 MB of compiled, obfuscated output. This figure covers all application logic, all UI behaviour, and all dynamically generated HTML: the base HTML page is a minimal shell with an empty body, and everything visible is generated by the compiled client. For domain applications used by authenticated users on modern connections, this cost is paid once and then served from the browser cache. It is not a meaningful operational concern.

Organisational scale. A shared component library amortises the HTML investment across every application in the portfolio. Accessibility improvements, semantic updates, and visual redesigns propagate from one place. Teams across multiple projects work from the same HTML foundation without coordinating on it. The per-application maintenance cost trends toward the cost of application logic alone.

Team composition. Because any Java developer can build UI features, the team does not maintain a specialist frontend/backend split with the communication overhead that implies. Junior developers contribute from day one. Senior developers are not bottlenecked on UI concerns. The team required to maintain and extend the system is smaller and easier to staff.

Future safety. HTML and CSS have been backward-compatible for thirty years. An architecture founded on semantic HTML and a Java compiler is not a bet on a framework's commercial continuity — it is a bet on the web platform itself. The compiler-plus-owned-library combination means no part of the stack is dependent on a third party making the right product decisions.

Correctness and economy point in the same direction. That is a consequence of building from first principles rather than from accumulated convention.

Part 11: Addressing the Criticisms

"GWT is abandoned / dead"

GWT 2.13.0 was released February 11, 2026. GWT 2.12.2 in March 2025. 2.12.1 in November 2024. 2.12.0 in October 2024. This is an active project with a consistent release cadence. Version 2.13 removed legacy IE polyfills, modernised project samples to Maven multi-module structure, added JFR events for compiler observability, delivered the largest JRE emulation improvements since 2.9.0, and added support for Jakarta Servlet APIs — meaning this stack runs cleanly on Spring Boot 3 and modern Jakarta EE servers.

Structurally: GWT's JRE emulation has been progressively migrated to JsInterop to converge with J2CL, Google's next-generation Java-to-browser compiler. The API surface — Elemental2, JsInterop annotations, jsinterop-base — is shared between them. J2CL is Google's internal compiler for production-scale web applications today; GWT remains the most stable, Maven-integrated distribution for enterprise teams building on this model.

More fundamentally: this architecture depends on GWT as a compiler, not as a component framework. The project-owned HTML, CSS, and communication infrastructure are independent of GWT's component ecosystem entirely. The compiler is the only dependency — and compilers are among the most stable artifacts in software. The Java compiler's core behaviour has not changed in years. Stability is not abandonment.

"Compile times are too long"

In a mature production application with 3,000–4,000 UI classes, compile times on modern hardware run between one and two minutes. Compile time is a function of permutation count — one permutation per browser per locale combination. With a single RPC endpoint, no code splitting, and a controlled locale set, permutation count is minimal. GWT 2.13.0 added JFR events specifically for compiler observability, making it straightforward to profile and address any compile-time concern. This criticism has most force for large multi-permutation applications. It does not apply here.

"The widget library is inadequate"

Correct — and beside the point. This architecture does not use GWT's widget library. The HTML/CSS component library is defined by the project and owned by the project. The quality of GWT's built-in widgets is irrelevant.

"You still need to know HTML"

At the component wrapper layer, the author needs to know what an HTML tag is and when to use it. Above that layer, feature developers work entirely in Java. The HTML knowledge required is modest, applied once per new component type, and confined to the component library team. It is explicitly not a feature development concern.

"There is no defined development approach in GWT"

This article defines one. Command/Visitor for all communication. Project-owned component wrappers for all HTML output. Domain objects carrying their own UI behaviour. Maven as the single build and validation step. The criticism applies to teams using GWT without architectural intent. The architecture is the answer.

Part 12: When This Architecture Is Not the Right Choice

Public-facing, SEO-critical sites are a different problem domain. This architecture delivers a minimal HTML shell and populates the body dynamically. For tools used by people to get work done, that is the right trade-off. For sites whose success depends on search engine indexing of content or on first-render performance for anonymous users, use server-side rendering. These are different problems and should be solved with different tools.

Teams without Java as their primary language will find less leverage here. The architecture's value comes from keeping Java developers in Java. A team already fluent in TypeScript and React is not gaining that advantage — they would be acquiring a new tool rather than deepening an existing strength.

Deep JavaScript ecosystem integration — complex native browser API interop, third-party JavaScript widgets with no Java wrapper, WebGL pipelines — may add friction at the GWT boundary. GWT provides JsInterop for these scenarios, but it requires the component layer author to understand that boundary explicitly.

Stating these boundaries is scope clarity, not concession. This architecture is designed for domain applications: business tools, dashboards, admin interfaces, internal platforms, data-intensive workflows. For that class of application, it is the strongest available option.

Conclusion

Evaluated from first principles — not by popularity, not by ecosystem size, not by recency — the architecture described here is the most coherent, most maintainable, and most economically sound approach to UI development for Java backend domain applications.

The key insight is not about GWT specifically. It is about where the client should live, what the build should guarantee, and what the team should need to know. The client lives in the browser — fully, not as a thin view over server state. The build guarantees structural correctness — by construction, not by convention. Feature developers work in Java — without HTML, CSS, or JavaScript knowledge, as a daily reality, not an aspiration.

GWT, used as a compiler combined with a project-owned component library, is the only mature option that delivers this model. The compiler provides the Java-to-browser bridge. The component library provides the HTML ownership. Together they provide something no JavaScript framework and no server-side rendering approach offers in this combination: a complete, type-safe, Java-only feature development experience for domain application UI, where the web platform is the foundation and no third party controls the HTML.

The criticisms dissolve on contact with the architecture: compile times are fast at the permutation counts this setup requires; the widget library is irrelevant because it is not used; abandonment misreads compiler stability as stagnation; and the HTML boundary is exactly as thin as it needs to be — confined to the component layer, invisible above it.

The result:

Any Java developer builds UI features from day one
The Maven build is the integration test
Adding a feature is one Command, one Visitor — the same pattern, always, guided by the type system
Visual evolution is CSS. HTML evolution is a wrapper update. Application code is untouched by both.
The client lives in the browser. The server is stateless. Scaling follows directly.
Component ownership is total. The HTML is yours. The CSS is yours. No escape hatches, because there is nothing to escape from.
Implementation is faster. Maintenance is cheaper. Teams stay smaller. Upgrade costs are minimal. The foundation is the web platform itself.

There is no simpler, no more maintainable, no more economically defensible UI architecture for Java backend domain applications — when GWT is understood for what it is: a compiler that makes the browser a first-class Java runtime target, combined with the discipline to own everything above it.

Parts in transit - Why most distributed systems are prematurely complex

Leon Pennings — Sun, 10 May 2026 20:29:58 +0000

The incomparability problem

Here is a question that has no clean answer.

How do you know whether the architecture you chose was the right one?

Not right in the sense of working — most systems work, eventually, after enough effort. Right in the sense of optimal. Right in the sense that the complexity you introduced was warranted by the problem you were solving, and that a simpler approach would have cost more rather than less.

The honest answer, in most cases, is that you cannot know. Because the alternative was never built.

This is not a gap in the data. It is the mechanism of the problem. Most systems are built only once. There is no second system built with different assumptions, run for five years, and compared on total cost of ownership, ease of change, and operational stability. The counterfactual does not exist. Therefore the cost of the wrong choice — if it was the wrong choice — is permanently invisible.

None of this is to say that distributed systems cannot work. Many organisations have made them function, sometimes at considerable scale — usually through exceptional engineering discipline, strong platform investment, and genuine operational maturity. The question is different: how much of the total effort, over years, went into managing the consequences of the distribution itself, rather than advancing the domain? And would a simpler boundary choice have delivered more value with less sustained overhead? The counterfactual remains hard to prove, which is precisely why we need sharper prospective indicators.

And here is what makes the problem genuinely difficult: the entire industry tends to converge on the same patterns at the same time. When every team uses a similar stack, incurs similar coordination overhead, and grows to a similar size — those costs stop being visible as costs. They become the definition of what software costs. Normal and wasteful become indistinguishable.

So the question sharpens. If we cannot compare architectures retrospectively, is there anything we can measure prospectively — before five years have passed — that gives us a leading indicator of whether we are building something appropriately simple, or something unnecessarily complex?

There is. And it comes from an unlikely place.

The warehouse and the system boundary

Consider an order fulfilment operation. An order arrives. A picker walks to the rack holding the product, picks it, and places it on the assembly line. Routine.

Now consider what happens when that order is cancelled.

If the picker has not yet left the rack, cancellation is a system operation. One record updated. The state change is contained. The cost is negligible and the outcome is certain.

If the picker is already walking the floor — part in hand, mid-transit — the picture changes entirely. The picker must be located and reached. The instruction must be communicated and confirmed. The picker turns around, returns the part, re-shelves it in the correct position, and logs the return. The assembly line must be told the part is not coming and adjust accordingly. Each of those steps can fail. Each failure requires its own recovery. If the picker has already placed the part on the line, someone else must retrieve it, the line has already reacted to its arrival, and the cleanup compounds further.

The correction costs more than the original action. Not marginally more — multiplicatively more. More people, more coordination, more opportunity for secondary failure, and a system left in a state requiring verification before it can be trusted again.

This is the principle that makes architectural cost measurable before a system is built:

As long as domain actions happen within a single system boundary, the cost of failure is a rollback. The moment actions propagate outside that boundary, the cost of failure becomes coordination.

This is not a preference. It is a structural property of distributed systems, and it applies regardless of how well the coordination is engineered. You can manage the cost with better tooling. You cannot eliminate it. It is inherent to the boundary crossing.

The warehouse makes this visible in a way that software obscures. In the warehouse, you can see the picker walking. You can see the empty rack. You can see the stalled line. The cost of the part in transit is physically apparent. In software, the equivalent states — the uncommitted saga step, the unacknowledged event, the stalled compensating transaction — are invisible unless you built dedicated instrumentation to see them. The cost is identical. The visibility is not. That invisibility is precisely why the cost became acceptable.

The well-run warehouse minimises the time parts spend in transit, because parts in transit are the expensive state. The leading indicator of a well-designed system is the same: how much of the domain work happens within a single rollback boundary, and how much crosses outside it?

Rollbackability — the degree to which a failed action can be fully undone by the system without external coordination — is a concrete, prospective benchmark for simplicity. If you are designing a system and the failure path requires coordinating compensation across multiple services, you have already committed to a significant and permanent cost. The question is whether the benefit justified it.

In most cases, that question was never asked.

A concrete example: order creation

Take a canonical domain flow: an order is created, inventory is reserved, an invoice is generated, a shipment is planned. Four concepts. One business action. It either succeeds completely or it does not happen.

In a monolith with a well-modelled domain, this is the entirety of the orchestration:

@Transactional
public OrderConfirmation createOrder(OrderRequest request) {
    Order order       = new Order(request);
    Inventory.reserve(order);
    Invoice invoice   = new Invoice(order);
    Shipment shipment = new Shipment(order);
    return OrderConfirmation.of(order, invoice, shipment);
}

The database transaction is the system boundary. If anything fails, nothing happened. The domain concepts — Order, Inventory, Invoice, Shipment — do the work. The technology serves them. Rollbackability is total. The failure path costs nothing beyond the failed attempt itself.

This example is deliberately straightforward — but the principle holds as domain complexity increases. In fact, the more complex the domain, the more important it becomes that the infrastructure does not add noise. A complex financial workflow with regulatory holds is hard enough to reason about correctly without the additional burden of distributed coordination, partial failure states, and eventual consistency layered on top of it.

Now split those four concepts across four services. The business requirement has not changed by a single word. What changes is everything else.

The infrastructure required before writing a line of business logic

A message broker. Services cannot call each other synchronously if you want any resilience. Kafka or RabbitMQ: a three-node production cluster, topic design, schema registry, retention policies, consumer group monitoring, and a local development environment every developer must run and maintain.

Saga infrastructure. There is no transaction. Coordination must be made durable — if the orchestrator crashes mid-flow, it must resume from the correct step. This means a saga framework (Axon, Temporal, AWS Step Functions — each a substantial system with its own operational model and learning curve) or a hand-rolled saga state table with step tracking and a crash recovery process. Either way, there is now a fifth service whose entire existence is accidental complexity. It owns no domain concept. It exists solely because the transaction boundary was removed.

Distributed tracing. Four services produce four independent log streams with no shared identity unless you build one. Jaeger or Zipkin for the trace infrastructure. Every service propagates a correlation ID in HTTP headers, event envelopes, and log output. A log aggregation stack on top, because reconstructing an incident across four separate log streams without tooling is not a debugging workflow — it is an archaeology project.

Idempotency handling, in every service. Message brokers typically guarantee at-least-once delivery, so the same event can and eventually will arrive twice. Every consumer must handle this without creating two invoices or two shipments. That means an idempotency key strategy per event type and a deduplication store, typically a processed-events table, checked on every inbound message. This is not a framework you install. It is code you write, in every service, correctly, and maintain forever.
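To make the shape of that code concrete, here is a minimal in-memory sketch of a deduplicating consumer. `InvoiceConsumer` and its method names are hypothetical, and the `HashSet` stands in for the processed-events table. In a real service the deduplication record and the invoice write would have to commit in the same local transaction, which this sketch does not model.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical consumer for an "order created" event.
class InvoiceConsumer {
    // Stand-in for a processed-events table keyed by event ID.
    private final Set<String> processedEventIds = new HashSet<>();
    private int invoicesCreated = 0;

    /**
     * Handles one delivery of the event.
     * Returns true if an invoice was created, false if the delivery was a duplicate.
     */
    synchronized boolean onOrderCreated(String eventId, String orderId) {
        // Set.add returns false when the ID was already present.
        if (!processedEventIds.add(eventId)) {
            return false; // redelivery: do nothing
        }
        invoicesCreated++; // stand-in for "create the invoice for orderId"
        return true;
    }

    synchronized int invoicesCreated() {
        return invoicesCreated;
    }
}
```

Every consumer in every service needs its own version of this check, which is the cost the paragraph above describes.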

Compensating transactions, per failure path. The rollback equivalent. Designed, coded, tested, and maintained per service per failure scenario. For four services the paths are: inventory fails, so cancel the order; invoice fails, so release inventory and cancel the order; shipping fails, so void the invoice, release inventory, and cancel the order. Each compensation is a domain operation that must exist, be reachable, be idempotent, and be tested both in isolation and in combination. The compensation work grows as O(n²) with the number of services: the three paths above already need 1 + 2 + 3 = 6 compensating steps.
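The control flow behind those failure paths can be sketched in a few lines. This is an in-process illustration only: `SagaStep` and `SagaSketch` are hypothetical, each step is a label with a flag that forces a failure, and the durability, messaging, and retry concerns a real saga needs are deliberately left out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical step: a name plus a flag that simulates failure.
class SagaStep {
    final String name;
    final boolean fails;

    SagaStep(String name, boolean fails) {
        this.name = name;
        this.fails = fails;
    }
}

final class SagaSketch {
    private SagaSketch() {}

    /**
     * Runs steps in order. On the first failure, compensates every
     * completed step in reverse order and stops. Returns a log of what happened.
     */
    static List<String> run(List<SagaStep> steps) {
        List<String> log = new ArrayList<>();
        Deque<SagaStep> completed = new ArrayDeque<>(); // used as a stack
        for (SagaStep step : steps) {
            if (step.fails) {
                log.add(step.name + " failed");
                while (!completed.isEmpty()) {
                    log.add("compensate " + completed.pop().name);
                }
                return log;
            }
            log.add(step.name + " done");
            completed.push(step);
        }
        return log;
    }
}
```

With the steps order, inventory, invoice, shipment and a failing invoice step, the log ends with compensations for inventory and then order, matching the second failure path listed above. Each of those compensations is itself a remote call that can fail, which this sketch does not attempt to handle.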

API contracts and versioning. In a monolith, a method signature change is a compiler error caught before deployment. Across services it is a potential production incident. OpenAPI specifications or event schemas in the schema registry. A versioning strategy for deploying new service versions while old ones are still running. Consumer-driven contract tests — an entirely new test layer that did not exist before.

Per-service operational overhead — multiplied by four. Each service needs its own CI/CD pipeline, its own database (shared databases between services defeat the architectural purpose), its own health checks, its own deployment configuration, its own secret management, and its own database migration strategy.

None of this is business logic. All of it requires expertise to operate correctly. In practice it means a platform or infrastructure team to own the broker and deployment infrastructure, application developers who understand distributed systems failure modes rather than just domain logic, and an ongoing operational load that scales with the number of services — not with the complexity of the domain.

The cost, made visible

The following table makes the prospective cost explicit — before the first line of business logic is written, and before five years have passed.

Concern	Monolith	Microservices	What the split actually costs
Atomicity and failure
Rollback on failure	Database transaction. One word.	Saga pattern. Hundreds of lines.	Design, code, and test a compensating action per service per failure path. O(n²) compensating steps for n services.
Partial failure state	Impossible. Transaction is atomic.	Permanent possibility. Must be designed around.	Order exists, invoice does not. Every consumer of your data now reasons about completeness. Forever.
Consistency	Immediate. Guaranteed.	Eventual. A property you live with.	Not solvable with better tooling. A structural consequence of the boundary choice.
Infrastructure before business logic
Message broker	None.	Kafka or RabbitMQ. 3-node cluster.	Topic design, schema registry, retention policy, consumer group monitoring, local dev setup.
Saga / orchestration	None.	Axon / Temporal / hand-rolled plus a fifth service.	Durable saga state, crash recovery, step tracking. An entire service that owns zero domain concepts.
Distributed tracing	One stack trace.	Jaeger / Zipkin plus correlation IDs everywhere.	Every service propagates trace IDs in headers, event envelopes, and log output. Log aggregation stack on top.
Idempotency	N/A. In-process calls are not redelivered, so duplicates do not arise.	Required in every service. Always.	Deduplication store per service. Idempotency key strategy per event. Written, maintained, tested forever.
API contracts	Compiler. Free.	OpenAPI / schema registry plus versioning strategy.	Consumer-driven contract tests. A breaking change is a production incident. Another test layer that did not exist.
Per-service operational overhead
CI/CD pipelines	1	4+	Independent versioning, deployment windows, rollback strategies. Coordination overhead on every release.
Databases	1	4+	Independent migration strategies per service. Schema changes coordinated across deployment boundaries.
Local dev environment	One process.	4+ services plus broker plus docker-compose.	Onboarding measured in days not hours. Partial environments produce integration bugs that only appear in the full stack.
Debuggability and sustainability
Debug a production failure	One stack trace. One log stream.	Reconstruct a timeline across 4+ log streams.	Clock skew between services. Correlation IDs that were not propagated. Broker lag that shifted event order.
Bug surface	Domain complexity only.	Domain multiplied by accidental complexity.	Each async handoff is a new class of timing bug. Compensating paths run rarely, are tested inadequately, and fail in production.
Codebase legibility	Domain is the code.	Domain distributed across event schemas and API contracts.	"What does order creation actually do?" has no single answer. The behaviour is implicit in subscriptions across four codebases.
Maintenance cost over time	Proportional to domain complexity.	Domain plus accidental complexity.	Accidental complexity does not reduce over time. Services accumulate. Contracts fossilise. Framework versions break. Teams leave.
Scaling
Unit of scale	The atomic action. Run more instances.	Individual steps — which are not the bottleneck.	Invoice creation and shipment planning are simple writes. They are not traffic hotspots. The decomposition solves a problem that does not exist.
Infrastructure to scale	Load balancer plus N identical instances.	Everything above, multiplied.	All the saga, broker, and tracing infrastructure exists solely to reconstruct what the database transaction provided for free.

The scaling argument that is rarely examined closely

The case for microservices typically rests on scalability. You can scale the parts that need scaling independently, rather than scaling everything together.

This sounds rational until you ask what actually needs scaling.

In an order creation flow, the bottleneck is almost never the invoice logic or the shipment record creation. These are simple writes that happen once per order. The thing that needs scaling is the number of concurrent orders being created — the atomic action as a whole.

Scaling the atomic action requires a load balancer and N identical instances of one deployed artefact. Each instance connects to one database. The database handles concurrent transactions reliably, as it has for decades. The infrastructure cost is a fraction of the distributed alternative. The operational complexity is a fraction. The failure surface is a fraction.

A well-modelled core domain is not large. This is not an aspiration — it is what remains when accidental complexity is removed. The essential logic of order-to-shipment fits comfortably in one process, understood by one team. What makes codebases large is not the domain. It is frameworks imposing their structure on domain code, duplication caused by unclear boundaries, accidental complexity accreting around poor models, and boilerplate generated by architectural patterns that do not fit the problem.

Strip those out and the core is small, fast to deploy, cheap to run, and trivially scalable as a unit.

The industry asked "how do we scale the parts?" before asking whether the parts needed to be separate. It then built an entire ecosystem of frameworks, patterns, and operational infrastructure to answer the first question — all solving a decomposition problem that, in most cases, did not need to exist.

When distribution is the right answer — and when the arguments do not hold

Distribution has genuine use cases. They are narrower than the industry's adoption rate suggests, and several of the most commonly cited justifications do not survive close examination.

Physical and regulatory constraints

The standard argument: if data must live in a specific jurisdiction for regulatory reasons, you need a distributed architecture.

The better answer: replicate the full domain logic into that regulatory cell. The atomic action stays atomic. The cell — with its own deployment, its own database, its own complete stack — is the unit of distribution. What you do not do is split the domain action across a jurisdictional boundary, routing parts of it between regions. That creates the coordination cost of distribution without the isolation that justified it. The constraint is geographic. The solution is geographic deployment of the whole, not decomposition of the parts.

Independent scaling profiles

The standard argument: if one component needs more scale than others, separating it avoids scaling everything unnecessarily.

The better answer: the cost of splitting a single component out of an otherwise coherent domain action is large, fixed, and permanent — as the table above makes clear. The question is not only "does this component need more scale?" but "does the benefit of isolating its scale exceed the full coordination cost of the split?" In most cases it does not, because the component that appears to need independent scaling is rarely the actual bottleneck under measurement, and because scaling the whole is cheaper than the industry assumes. If there is no compelling reason not to scale everything, scale everything. Simplicity requires a reason to abandon it, not a reason to adopt it.

Organisational boundaries

The standard argument: Conway's Law — systems tend to mirror the communication structures of the organisations that build them. If teams are separated, align the architecture accordingly.

Conway's Law is a useful observation in retrospect. It describes what tends to happen when architecture is not deliberately managed. It is not a prescription, and it should never be used as one. Using it as a justification for a service boundary is encoding organisational structure permanently into the system — and paying the technical cost of that boundary in every sprint, by every developer, for the lifetime of the product.

The cost of an artificially introduced service boundary compounds over years. The cost of reorganising a team is paid once. Engineering should define the ideal architecture with as few compromises as possible. The organisation should be arranged to serve that architecture, not the other way around. This pays dividends, perhaps not in year one, but reliably by year five and every year thereafter. Teams that succeed with microservices often do so despite the architecture, through heroic platform investment and operational discipline. The patterns can be made to work. The deeper question is whether they were the right starting point for the domain in front of them.

Genuinely independent domain concepts

This is the one case where distribution has a legitimate technical argument — and even here, the bar should be high.

Domain concepts are genuinely independent when they have no transactional relationship with each other. Not merely different in name or ownership, but different in the sense that one completing or failing has no bearing on the integrity of the other. A recommendation engine and a payment processor are genuinely independent. An order and its invoice are not.

The strongest version of this argument comes from systems with a fundamentally asymmetric workload — a platform where reads vastly outnumber writes, where the read path has no transactional requirement, and where the scale difference between the two is large and proven. A social platform where the overwhelming majority of requests are reads with no transactional requirements is a system where isolating the read path separates two genuinely different kinds of work with different resource profiles and different failure tolerances.

But this is a workload argument supported by measurement, not an architectural principle applied by default. It applies to a small fraction of the systems that have adopted microservices, and it should be reached by evidence, not anticipated in advance.

Three tests before splitting a boundary

The rollback test. If this action fails halfway through, what does recovery cost? If the answer is a database rollback, the action belongs inside a single boundary. If the answer is a coordinated sequence of compensating calls across multiple services, each of which can itself fail, ask whether that coordination cost was consciously accepted — or simply inherited from a pattern that was never examined.

The scaling test. Which specific step in this action is the measured bottleneck under current or near-term load? Not the theoretical bottleneck. The step that is demonstrably the constraint today, under real conditions. If the answer is none of them individually, the action does not need decomposition. It needs more instances of the whole.

The standup test. In the daily standup, what language does the team use? If the items are about services, pipelines, brokers, schemas, and migrations — the team is working on accidental complexity. If the items are about domain concepts — what an order means, who owns a responsibility, what a rule actually requires — the team is working on the right problems. You do not need a cost model to apply this test. You need one conversation.
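The rollback test can be made concrete with a small sketch. The example below is an illustration under stated assumptions, not a reference implementation: the `orders` and `stock` tables, the `place_order` function, and the in-memory SQLite database are all hypothetical. It shows an action whose two steps live inside one database transaction, so a failure halfway through costs exactly one rollback.

```python
import sqlite3


def build_demo_db():
    """Create a throwaway in-memory database (hypothetical schema for illustration)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, sku TEXT, quantity INTEGER)")
    conn.execute("CREATE TABLE stock (sku TEXT PRIMARY KEY, available INTEGER)")
    conn.execute("INSERT INTO stock VALUES ('widget', 5)")
    conn.commit()
    return conn


def place_order(conn, order_id, sku, quantity):
    """Hypothetical atomic action: record the order and reserve stock together."""
    # Using the connection as a context manager commits on success and
    # rolls back if an exception escapes the block.
    with conn:
        conn.execute(
            "INSERT INTO orders (id, sku, quantity) VALUES (?, ?, ?)",
            (order_id, sku, quantity),
        )
        cursor = conn.execute(
            "UPDATE stock SET available = available - ? "
            "WHERE sku = ? AND available >= ?",
            (quantity, sku, quantity),
        )
        if cursor.rowcount == 0:
            # Failing here undoes the order insert as well: recovery is a rollback.
            raise ValueError("insufficient stock")
```

If the order row and the stock row lived in two separate services, the same failure would instead require a compensating call to delete the order, and that call could itself fail. That is the coordination cost the test asks about.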

Measuring it in a system you already have

These tests apply prospectively, but they apply equally to systems already in production. A short audit reveals more than any architecture review.

Count the sagas. How many business capabilities require a saga or orchestrator to complete? Each one is a boundary crossing that converted a rollback into a coordination problem. The number tells you how much of the domain is currently in transit.

Measure the standup ratio. Over two weeks, track how many standup items are about infrastructure, services, pipelines, and schemas versus domain concepts, rules, and business questions. The ratio is a direct reading of how much of the team's daily energy is absorbed by accidental complexity.

Trace a failure end to end. Pick a recent production incident. Count the number of log streams, services, and correlation IDs required to reconstruct what happened. That reconstruction cost — in time, in tooling, in expertise — is paid on every incident. It is the maintenance tax of the boundary choices made at design time.

Apply the migration heuristic. A well-modelled monolith can be split later, when measurement proves a specific boundary is warranted. A distributed system can rarely be reassembled cheaply once the boundaries have fossilised into contracts, event schemas, and separate team ownership. Optionality has value. The simpler starting point preserves it. The complex starting point spends it immediately, in exchange for flexibility that may never be needed.
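The standup ratio from the audit above is simple arithmetic, and a few lines are enough to keep score. The sketch below assumes someone tags each standup item by hand as either "infrastructure" or "domain" over the two weeks. The function name `accidental_share` and the two tag values are hypothetical choices made for this example.

```python
from collections import Counter


def accidental_share(tags):
    """Return the fraction of standup items tagged 'infrastructure'.

    `tags` is a list of strings, each either 'infrastructure' or 'domain'.
    Any other tag is ignored, which is an assumption of this sketch.
    """
    counts = Counter(tags)
    total = counts["infrastructure"] + counts["domain"]
    if total == 0:
        raise ValueError("no tagged standup items")
    return counts["infrastructure"] / total
```

For example, a fortnight with 30 infrastructure items and 10 domain items gives `accidental_share(["infrastructure"] * 30 + ["domain"] * 10)`, which evaluates to 0.75: three quarters of the items discussed were about something other than the business.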

First principles

There is nothing novel in the argument this article makes. It is an application of principles that engineering has held for as long as engineering has existed.

Minimise the moving parts. Every component that can fail will eventually fail. Every interface between components is a surface for misunderstanding, for version drift, for timing errors that only appear under conditions nobody anticipated. The system with fewer moving parts is not the primitive system — it is the disciplined one.

Solve the problem in front of you. The system that is over-engineered for scale it has not reached, for distribution it does not need, for independence that its domain does not have — that system is not prepared for the future. It is burdened by it. It is paying, today and every day, for problems it may never have.

Prefer reversibility. The decision that can be undone when it proves wrong is worth more than the decision that cannot, regardless of how confident you are at the time. A monolith that can be split later, when the evidence demands it, is a better starting point than a distributed system that cannot be reassembled after the evidence proves the split was premature.

Measure before you commit. The incomparability problem — the fact that the alternative architecture was never built, so its cost can never be directly compared — cannot be fully solved. But its worst effects can be mitigated by demanding evidence before committing to complexity: evidence of the scaling requirement, evidence of the domain independence, evidence that the coordination cost is worth the benefit it buys.

The software industry has a habit of adopting solutions before fully understanding the problems they were designed to solve, and then normalising the cost of those solutions until the cost becomes invisible. The distributed systems patterns that dominate today were developed by organisations with genuine physical distribution requirements, at a scale that only a small fraction of systems ever reach. They solved real problems. They are also expensive, complex, and failure-prone in ways that compound over time and rarely appear on the original architectural diagram.

The question to ask, before any architectural decision, is not "how do others solve this?" It is "what does this problem actually require?" Start from first principles. Follow the cost. Build the simplest thing that genuinely solves the problem in front of you. Treat every boundary crossing — every point where a database rollback becomes a distributed coordination problem — as a commitment with a known, permanent price tag.

Because it will cost exactly that. Invisibly, continuously, and for as long as the system runs.

AI can build anything except an understanding of what you are building

Leon Pennings — Mon, 04 May 2026 08:28:51 +0000

There is a distinction in software development that the industry has spent twenty years pretending doesn't exist. It is the distinction between building software and understanding what you are building. The first is implementation. The second is engineering. They are not the same thing, they do not require the same skills, and conflating them is the single most expensive mistake a development organisation can make.

The mistake is now being turbocharged by AI. But to understand why, you first need to understand what was already broken.

Part One: The 85% Nobody Talks About

Two Kinds of Work

Ask most developers how long it takes to build a feature and they will give you an implementation estimate. How long to write the code, wire up the endpoints, get the tests green. That estimate — the part where fingers meet keyboard — accounts for roughly 10 to 15 percent of what building good software actually requires.

The other 85 to 90 percent is structuring. Understanding what the system is. Identifying where things belong, not just where they are needed. Naming the concepts correctly. Finding the natural boundaries in the domain. Modelling the business so that the code expresses it rather than merely approximating it.

This is the work that determines whether a system is maintainable in year five, extensible in year seven, or being quietly replaced shortly after.

Most systems are being replaced by year seven. The 85% was skipped.

Three Approaches, One Honest Assessment

There are essentially three ways to approach building a system, and only one of them qualifies as engineering.

The first is upfront design. You model the domain completely before writing code. The risk is rigidity — the model is fixed before the code has had a chance to reveal its gaps. Reality has a way of not fitting the diagram.

The second is evolutionary modelling. You begin with a hypothesis about the domain and use code as a feedback instrument. The model and the implementation refine each other continuously. An hour into implementation the starting model may have changed dramatically — a new concept discovered, a responsibility reassigned, a boundary redrawn. That is not failure. That is the process working. The model remains the authority throughout, but it is a living authority — responsive and correctable, never frozen.

The third approach is template filling. You select a framework. You receive a user story, which functions as a work order. You find the place in the template where this kind of story goes. You implement it there. You close the story.

There is no model in this process. There is no conceptual centre. The framework is the authority, and the code documents what the framework was configured to do. Frameworks turned engineers into assembly line workers, and the Singleton Paradox — the impossibility of comparing the system that was built using approach A against the system using approach B that was never built — hid the cost. This is not a different kind of design. It is the absence of design, wearing design's clothes.

The Model as Construction Tool, Discovery Tool, and Filter

The perception is that domain modelling is slow — that it delays visible output while the team thinks instead of ships. The reality is the opposite.

A domain modelling session is twenty to thirty minutes at a whiteboard, followed by code that shapes the actual business interactions. This is not a prototype or a spike. It is production code — the domain coming into existence, business logic finding its natural form. By the end of the first day there is working code that expresses what the business does. The template developer, meanwhile, is configuring YAML, wiring injections, setting up repositories. The motion looks productive. Not a line of it describes the business.

This is the construction side of what a domain model does. But it has two further functions that are equally important.

It is a discovery tool. When implementation is hard — when a concept resists being placed, when a responsibility has no natural home — that difficulty is information. The model is telling you something is missing, or something is wrong. A trial-and-error developer experiences this friction as a local problem to solve locally. A modelling developer experiences it as the domain asking to be understood more precisely. The response is not a workaround. It is a model refinement.

It is also a filter. If a behaviour cannot be fitted naturally into the model — if no object has a clear reason to own it, if it contradicts what the model already captures — that resistance is a signal. Either the model needs a new concept, or the behaviour itself does not belong in the system. The model's inability to absorb something cleanly is not a failure of the model. It is the model doing its job, filtering out accidental complexity dressed as a requirement. If you cannot fit behaviour into the model, you probably do not need it.

A domain model is simultaneously the thing you build with, the instrument that tells you what you are missing, and the filter that tells you what does not belong. The industry has largely stopped building them.

Essential Complexity vs Accidental Complexity in Code

The essential/accidental distinction from Fred Brooks is not just an architectural principle. It applies at the level of every object, every responsibility, every line of code — and getting it right at that level is what separates systems that age well from systems that don't.

Consider a practical example. When building a system that communicates with external services, the essential complexity is what those communications are — what a request contains, what posting requires, what the business needs to express. The accidental complexity is how those communications happen — the transport protocol, the connection handling, the session management, the specific library in use this year.

Model the responsibilities first. A client object owns the mechanics of communication. A request abstraction defines what communication content looks like. A posting variant adds what posting specifically requires. These are modelled as business responsibilities, technology agnostic. The how — whether the underlying transport is HTTP, MQ, or a database — sits entirely behind those responsibilities, invisible to everything that depends on them.

The consequence is significant. The technology can change completely — from web service to message queue to direct database write — without touching a single line of the business logic that constructs and uses those requests. The essential complexity is stable. The accidental complexity is genuinely replaceable.

This is not an interface trick. It is what happens when you model responsibility first and let technology serve the model, rather than letting technology shape what responsibilities are possible. The difference only becomes visible when the technology needs to change — which it always does, eventually. At that point, a system where accidental complexity was kept genuinely separate from essential complexity absorbs the change quietly. A system where the framework grew roots into the business logic requires the business logic to change when the framework changes. The technology that was supposed to serve the domain ends up constraining it instead.
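A minimal sketch of the arrangement described above may help. Every name here (`Request`, `PostingRequest`, `Client`, `HttpClient`, `QueueClient`, `post_invoice`) is hypothetical, and the two transports only simulate sending so the example stays self-contained. The point is the shape: the business logic builds a request and hands it to a client, and nothing in it changes when the transport does.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """What a communication contains. Says nothing about how it is sent."""
    recipient: str
    body: str


@dataclass(frozen=True)
class PostingRequest(Request):
    """Posting variant: adds what posting requires (assumed here to be an amount)."""
    amount_cents: int


class Client(ABC):
    """Owns the mechanics of communication."""

    @abstractmethod
    def send(self, request: Request) -> str:
        ...


class HttpClient(Client):
    def send(self, request: Request) -> str:
        # Simulated transport: a real client would perform an HTTP call here.
        return f"HTTP POST to {request.recipient}: {request.body}"


class QueueClient(Client):
    def __init__(self) -> None:
        self.queue: list[Request] = []

    def send(self, request: Request) -> str:
        # Simulated transport: a real client would publish to a message broker.
        self.queue.append(request)
        return f"queued for {request.recipient}"


def post_invoice(client: Client, recipient: str, amount_cents: int) -> str:
    """Business logic: constructs and sends a posting, unaware of the transport."""
    request = PostingRequest(recipient=recipient, body="invoice", amount_cents=amount_cents)
    return client.send(request)
```

Calling `post_invoice(HttpClient(), "ledger", 1200)` and `post_invoice(QueueClient(), "ledger", 1200)` exercises the same business code over two different transports. Swapping one for the other touches only the line that chooses the client.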

The User Story as Work Order

Something specific happened to the user story as agile methodology was industrialised. It began as an invitation — a prompt to have a conversation with a domain expert, to understand a piece of the business well enough to model it. It became a specification. Then a work order. Then a checkbox.

In its current form the user story arrives at the developer already closed. The conversation with the domain expert happened upstream, in refinement, in planning, in the product owner's head. The developer receives a summary and works from that. The question the developer asks is not "what is this telling me about the domain" but "where in the template does this go."

The diagnosis is visible in what developers say when asked where the hard part of a system is. A template developer describes framework complexity — which abstraction to use, which pattern applies, how to configure the integration. A modelling developer describes domain complexity — what the business is actually doing here, what concept is missing, what existing object is being asked to carry weight it was not designed for.

These are not the same question. They do not produce the same system. And over seven years, the difference between the systems they produce is not marginal.

Where Things Belong vs Where They Are Needed

The most consequential difference between template filling and domain modelling is not visible in the first sprint. It becomes visible in maintenance, and it compounds with every passing year.

A template developer fixes problems where they occur. A table misbehaves on page B, so page B gets adjusted. The fix works. The story is closed. What is not visible is what has just happened structurally: page B now owns part of the table's behaviour. The table behaves one way on page A and another way on page B, and both pages carry part of the responsibility for what the table does. The next developer to touch either page must understand both. Maintenance has doubled, invisibly, for that one component.

A modelling developer asks a different question: what owns this behaviour? The answer is the table itself. The table owns its own presentation. The page owns its usage of the table. A fix to the table propagates everywhere the table is used, because behaviour lives in the component, not in the pages that consume it.

This is not an aesthetic preference. It is the mechanical difference between maintenance costs that stay flat and maintenance costs that compound.

Multiply this pattern across a codebase over five years and you have the prototype in production — a system held together with toothpicks, paperclips, and glue, where every workaround is load-bearing and every change requires understanding not what the system is, but what it has become.

The difference between fixing the problem where it occurs and fixing it where it belongs is the difference between prototype code and production code. At scale, it is the difference between a system that costs the same to maintain in year seven as it did in year one, and a system that is already being rewritten.
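The table example can be sketched in a few lines. The classes below (`Table`, `PageA`, `PageB`) are hypothetical and the rendering is deliberately trivial. What the sketch tries to show is where the fix lives: the rule for empty cells sits in the table, so both pages pick it up without either page knowing it exists.

```python
class Table:
    """The table owns its own presentation."""

    def __init__(self, rows):
        self.rows = rows

    def render(self):
        # The fix is applied where it belongs: empty cells render as "-"
        # wherever the table is used.
        return "\n".join(
            " | ".join(cell if cell else "-" for cell in row)
            for row in self.rows
        )


class PageA:
    """The page owns its usage of the table, not the table's behaviour."""

    def __init__(self, table):
        self.table = table

    def render(self):
        return "Page A\n" + self.table.render()


class PageB:
    def __init__(self, table):
        self.table = table

    def render(self):
        return "Page B\n" + self.table.render()
```

The template-filling alternative would put the empty-cell handling inside `PageB.render`, leaving `PageA` with the old behaviour and two places to read before the table can be understood.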

The Contradiction Problem

There is a specific consequence of building without a domain model that becomes critical at scale. It is underappreciated, and AI makes it significantly worse.

A domain model is not just a design preference. It is a contradiction-detection mechanism.

When business logic has a conceptual centre — a well-named domain object that owns its own behaviour — contradicting rules become visible. If two requirements make incompatible demands on the same object, you encounter the conflict when you try to model it. The structure surfaces the problem before it reaches production.

When business logic is scattered — across service methods, event handlers, configuration files — contradictions are invisible until they collide in production. Two requirements can contradict each other completely and coexist undetected for months, because there is no common reference point that would make the conflict visible. The system implements both rules, resolves the conflict arbitrarily at runtime, and produces behaviour that nobody designed and nobody can explain.
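One possible way to picture the difference is sketched below. The two requirements are invented for illustration (premium customers get a 10 percent discount; orders containing a sale item get none), and the `Order` class and its `discount_percent` method are hypothetical. Because both rules have to be written into the same method of the same object, the case where they disagree cannot be overlooked: the author has to decide what happens, and until that decision is made the object refuses to answer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    customer_is_premium: bool
    contains_sale_item: bool

    def discount_percent(self) -> int:
        """Single home for discount rules, so conflicting demands meet here."""
        demands = set()
        if self.customer_is_premium:
            demands.add(10)  # hypothetical requirement A: premium customers get 10%
        if self.contains_sale_item:
            demands.add(0)   # hypothetical requirement B: sale items are never discounted
        if len(demands) > 1:
            # The contradiction surfaces at modelling time instead of being
            # resolved arbitrarily at runtime.
            raise ValueError(f"conflicting discount rules: {sorted(demands)}")
        return demands.pop() if demands else 0
```

In the scattered version, requirement A might live in a pricing service and requirement B in a promotion event handler. Both would run, whichever ran last would win, and nothing in the code would record that a decision had been skipped.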

CQRS, microservices, and event-driven architecture were proposed, in part, as responses to the complexity that accumulates without a domain model. The tragedy is that they add architectural elaboration without supplying the missing conceptual centre. They do not make contradictions visible. They distribute logic across more moving parts, which makes contradictions harder to see, not easier. The problem is obscured by the solution.

Part Two: AI Became the Framework

The Same Pattern, Faster

Which brings us to the present moment, and to the claim that AI is transforming software development.

It is. But not in the way most of the conversation assumes.

There are two ways to think about AI-assisted development, and they map precisely onto the distinction between design and template filling established in Part One.

AI is revolutionary in the sense that you conceive what you want, express it, and something builds it. The implementation barrier has been dramatically lowered. Code that would have taken days takes minutes. This is real and significant.

But AI-assisted development is also pure template filling. You are not modelling. You are instructing. The output is code that documents what the prompt said, with AI as the framework. The assembly is faster, the templates are more flexible, the results are more immediately impressive. The absence of a modelling process is identical.

And it inherits both failure modes simultaneously.

From upfront design, it inherits rigidity at the point of prompting. The model — such as it is — is fixed in the prompt. The code cannot talk back, because you are not in dialogue with it. You are receiving output. The feedback loop that makes evolutionary modelling work — where implementation friction becomes structural insight — is broken. The AI absorbs the friction. You never feel it. You never learn from it.

From template filling, it inherits the absence of a conceptual centre. The logic lives in the prompts, scattered and unreconciled, exactly as it lived in the fat services and event handlers before it. Except now it is even less visible, because a service class at least had a name and a location in a codebase. A prompt has neither.

The framework abstracted the developer from the infrastructure. AI abstracts the developer from the code. Each layer of abstraction makes "it works" faster to achieve and the absence of a domain model harder to see.

What the industry is currently calling "AI produces spaghetti" is not a new problem. It is framework templating amplified. The spaghetti was already there. AI makes it faster to produce, more voluminous, and more convincing — because it arrives in clean syntax with passing tests. The structural absence underneath looks better than ever.

AI did not replace the framework. AI became the framework. And it inherited the same problem the framework always had — it can build anything except an understanding of what you are building.

The Maintenance Proposition Does Not Hold

The proposition being made for AI-assisted maintenance is that rewrites are now cheap, so structural problems do not accumulate the same way. This deserves examination.

A rewrite can reproduce the syntax of a system faster than ever before. What it cannot do is verify that the rewrite is correct in the only sense that matters for a business system — that it accurately represents what the business actually does. Correctness here is not syntactic. It is semantic. It requires a reference against which to check the implementation.

The reference is the domain model. And the domain model is exactly what was never built.

So the rewrite, however fast, produces new code that implements the same contradictions, the same scattered logic, the same implicit assumptions. It is not a fix. It is a reprint. The toothpicks are replaced with newer toothpicks. The paperclips are shinier. The structure is identical.

Consider the contradiction problem at scale. Two prompts with conflicting business logic — you will probably spot it. Twenty — possibly. Eighty — almost certainly not. There is no structure that makes the contradiction visible. A rewrite from those eighty prompts does not resolve the contradiction. It reproduces it in fresh syntax. And in another cycle, the same conversation about rewriting will begin again, for the same undiagnosed reasons.

What Disappears and What Doesn't

Frameworks will likely disappear, and probably sooner than the industry expects. Hibernate exists because writing object-relational mapping and database session management by hand is tedious and error-prone for humans. AI has no such limitation. It can write the queries, manage the sessions, and handle the mapping contextually and specifically, without a generic abstraction layer designed for every possible use case. The framework was a productivity tool for human limitations. As those limitations are removed, the justification for the framework dissolves. This is not a loss. Frameworks were always accidental complexity: complexity introduced by tools rather than by the problem itself.

But the domain model does not disappear with the framework. It becomes more critical. Because the framework, for all its costs, at least imposed some structure. Generic, clumsy, domain-agnostic structure — but structure nonetheless. Without it, and without a domain model, the only thing standing between a system and total architectural entropy is the conceptual model in the developer's head.

Or its absence.

The Skill That Cannot Be Prompted

The ability to model a domain — to hold a structural representation in your head, refine it through implementation, and express it in code that means something beyond its own execution — does not appear to be a skill that AI can supply or that prompting can replicate.

It appears to correlate with a specific kind of spatial reasoning: the ability to see a three-dimensional object from its two-dimensional components, to hold structure in the mind and manipulate it without losing the whole. Developers who have this skill behave differently when they encounter implementation friction. Where a template developer sees a local problem to solve locally — a fix applied where the problem occurs rather than where it belongs — a modelling developer sees structural information. The friction is the domain asking to be understood more precisely. The response is not a workaround. It is a model refinement.

You cannot prompt your way to that response. The prompt eliminates the friction. And the friction was the signal.

The Only Honest Measure

There is a simple diagnostic for whether a system was built or merely assembled. Apply it after seven years.

Is maintenance getting cheaper or more expensive? A well-modelled system gets cheaper — the model matures, the team internalises it, changes become faster as understanding deepens. A template-filled system gets more expensive, as accidental complexity compounds and each change must navigate the accumulated residue of earlier decisions made without a model.

Are new requirements getting faster or slower to absorb? A well-modelled domain accelerates — each addition deepens understanding and reveals where the next extension naturally fits. A system without a conceptual centre slows — each requirement negotiates with the existing tangle rather than extending a coherent structure.

Has the rewrite conversation started?

The rewrite is not a sign of business ambition or technical progress. It is the bill arriving for the 85% that was skipped. And it will reproduce the conditions that made it necessary, because the organisation never learned what actually went wrong. The diagnosis will be "technical debt" or "legacy architecture." Rarely will it be the accurate one: that no domain model was ever built, and that without one, the rewrite begins the same accumulation from sprint one.

AI makes none of this cheaper in the long run. It makes the first two years cheaper and the subsequent five more expensive, because the prototype is produced faster and looks more convincing, and the discovery that it is a prototype comes later and costs more.

The 85% cannot be prompted. It cannot be templated. It cannot be abstracted away by a sufficiently powerful framework, however intelligent that framework becomes.

It requires understanding what you are building.

That has always been the hard part. It remains the hard part. And the industry's increasing sophistication at avoiding it is not progress.

It is a more expensive way of arriving at the same rewrite conversation, on roughly the same schedule, having learned roughly the same nothing.

How to Test Whether Your Software Solution Actually Fits The Problem

Leon Pennings — Tue, 28 Apr 2026 06:04:11 +0000

Every application is built once.

There is no second version of the same system, built with different architectural assumptions, run in parallel for a decade, and then compared on maintenance cost, team size, and requirement absorption speed. The alternative is never built. The counterfactual never exists. This is the Singleton Paradox applied to software: because each system is unique, there is no external reference point against which to judge whether it is a good solution to its problem — or merely the only solution anyone bothered to build.

This matters more than it might appear. It means that the quality of an architectural decision can never be measured by comparison. You cannot park the well-modelled system next to the poorly modelled one and read off the difference. The poorly modelled system is the only one that exists. So when it becomes expensive to maintain, slow to change, and eventually impossible to extend, those outcomes get attributed to the problem (the domain was complex, the requirements changed, the business grew) rather than to the solution. The solution is never put on trial, because there is nothing to try it against.

The Singleton Paradox does not just make good architecture hard to prove. It makes bad architecture hard to see. The absence of contrast is not neutral. It actively shapes what gets treated as normal. Rising maintenance costs are normal. Growing teams are normal. Slowing feature velocity is normal. Rewrites every seven to ten years are normal. None of this is normal in the sense of being inevitable. All of it is normal in the sense of being what happens when accidental complexity (Fred Brooks' term for the complexity introduced by tools and decisions rather than by the problem itself) compounds over time, and when there is no alternative visible to suggest it could be otherwise.

This creates a specific and solvable problem. If external comparison is unavailable, the only honest measure of whether a system is a good fit for its problem is internal. Not how it compares to another system that was never built, but how it behaves against time. Does it get easier or harder to operate? Does it get cheaper or more expensive to change? Does it remain stable as the domain evolves, or does it accumulate fragility with each passing year?

Those questions have answers. And the answers, taken together, constitute the only reliable verdict on whether the solution fit the problem.

The Ten-Year Cost Test

That internal measure can be made concrete. The Ten-Year Cost Test is a diagnostic any organisation can apply to its own systems — not a comparison against an alternative that was never built, but a set of questions about whether the current architecture is winning or losing against time. The threshold of ten years is not arbitrary. A system that cannot survive a decade without a rewrite has not been maintained; it has been replaced. And replacement, however it gets framed, is the system announcing that it was not a good fit for the problem it was built to solve.

The test is simple. After ten years in production, a well-designed system should satisfy all of the following:

Maintenance cost is the same as or lower than in year one. As the domain model matures and the team's understanding deepens, maintenance should become cheaper, not more expensive. The team knows where everything lives. The rules are explicit and localised. A change that took two days in year one should take two hours in year ten, because the model has been refined and the team has internalised it.

New requirements are absorbed faster as the system matures. A well-modeled domain does not merely keep pace with new understanding — it accelerates. Each addition deepens the team's knowledge of the model and reveals where the next extension naturally fits. When the business learns something new — a new product type, a new regulatory constraint, a new class of customer — the model should be able to absorb it with decreasing effort over time, not constant effort. If absorption speed is flat, the domain model is adequate but not right. If it slows, the model is failing. A well-modeled system gets easier to extend the longer it has been understood.

The team size required to maintain it has not grown significantly. This is perhaps the most honest measure of architectural health. A system that requires more people every year to maintain the same functionality is a system where accidental complexity is compounding. Each new developer adds coordination overhead. Each new layer of abstraction requires more people to understand it. A well-modeled system with low accidental complexity should be maintainable by a small, stable team indefinitely.

The application is as stable as, or more stable than, it was initially. Stability should increase over time as the model matures and edge cases are understood and handled. If the system becomes less stable over time (more incidents, more unexpected interactions, more fragile integrations), accidental complexity is winning.

The cost of running it has not grown faster than the business it serves. Infrastructure costs, operational overhead, and support burden should scale with business growth, not with architectural entropy. A system that costs significantly more to run in year ten than it did in year one, while serving the same number of users, has a structural problem.

Apply this test honestly to any system you have worked on for more than five years. The results are rarely comfortable.
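To make the five criteria concrete, here is a minimal sketch of the test as a checklist over two snapshots of the same system. Everything in it is an assumption made for illustration: the `SystemSnapshot` fields, their units, the incident count as a proxy for stability, and the 25% tolerance for team growth. None of it comes from a standard metric set, and you would substitute whatever your organisation actually records.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical yearly measurements; field names and units are assumptions.
record SystemSnapshot(
        double maintenanceCost,       // yearly maintenance spend
        double daysPerTypicalChange,  // effort to absorb a representative requirement
        int teamSize,                 // people needed for the same functional scope
        int incidentsPerYear,         // rough proxy for stability
        double runningCost,           // yearly infrastructure and operations spend
        double businessVolume) {      // users, transactions, revenue: your choice
}

final class TenYearCostTest {
    // Assumed reading of "has not grown significantly"; pick your own tolerance.
    private static final double TEAM_GROWTH_TOLERANCE = 1.25;

    // Returns the criteria that fail; an empty list means all five pass.
    static List<String> failures(SystemSnapshot yearOne, SystemSnapshot yearTen) {
        List<String> failed = new ArrayList<>();
        if (yearTen.maintenanceCost() > yearOne.maintenanceCost()) {
            failed.add("maintenance cost rose");
        }
        // Flat absorption speed also counts as a failure here: the criterion
        // asks for requirements to be absorbed faster, not equally fast.
        if (yearTen.daysPerTypicalChange() >= yearOne.daysPerTypicalChange()) {
            failed.add("requirements are not absorbed faster");
        }
        if (yearTen.teamSize() > yearOne.teamSize() * TEAM_GROWTH_TOLERANCE) {
            failed.add("team size grew significantly");
        }
        if (yearTen.incidentsPerYear() > yearOne.incidentsPerYear()) {
            failed.add("stability declined");
        }
        double costGrowth = yearTen.runningCost() / yearOne.runningCost();
        double businessGrowth = yearTen.businessVolume() / yearOne.businessVolume();
        if (costGrowth > businessGrowth) {
            failed.add("running cost outgrew the business");
        }
        return failed;
    }
}
```

Calling `TenYearCostTest.failures(yearOne, yearTen)` with your own numbers gives the list of dimensions on which the system is losing against time. The hard part is not the arithmetic. It is having recorded the year-one numbers at all.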

What the Industry Data Actually Shows

Before examining how the average project scores on this test, an important caveat is necessary. Rigorous longitudinal data comparing domain-first versus framework-first approaches over ten-year periods essentially does not exist in published form. The industry does not measure what it should measure. Deployment frequency, recovery time, and project delivery success rates are tracked. Total cost of ownership relative to architectural approach over a decade is not.

This absence is itself the Singleton Paradox operating at industry scale. Nobody ran the controlled experiment. Nobody built both versions of the same system and compared them over ten years. So the precise cost differential between approaches is genuinely unknown in the scientific sense — even though the directional evidence is consistent and substantial.

What does exist:

The CISQ estimated in 2022 that poor software quality costs US organisations approximately $2.41 trillion annually, with a significant portion attributable to accumulated technical debt. The direction of travel is clear even if the precise attribution to architectural choices is not.

The Standish Group CHAOS Report has tracked project success rates for decades. Despite continuous evolution of methodology — agile, DevOps, cloud-native — the underlying success rates have not dramatically improved. This implies the problem is structural rather than methodological. Better processes applied to the wrong architecture produce better-managed failure, not success.

The DORA research — Google's annual State of DevOps reports, now covering over 39,000 professionals — shows a persistently bimodal distribution. The 2024 report found that elite performing teams have change failure rates around 5% and recover from incidents in under an hour. Low performing teams have significantly higher failure rates and recovery times measured in days or weeks. Only 19% of organisations reached elite performance. The low performance cluster, meanwhile, grew from 17% to 25% of respondents between 2023 and 2024. The distribution is not a bell curve. It is two distinct populations. Architecture and approach appear to be the differentiating variable, not team size, budget, or industry.

Amazon Prime Video published a case study in 2023 describing a 90% infrastructure cost reduction after consolidating a distributed microservices monitoring service into a single process — a result specific to that service, not a platform-wide architectural overhaul, but instructive precisely because the team at Amazon chose to be candid about it. Segment, a data platform company, published a similar account. These are self-selected — organisations that consolidated and saved money are more likely to publish than those that saw no benefit — but they are directionally consistent with the argument being made here.

A McKinsey and University of Oxford study of more than 5,400 IT projects — conducted in 2012 and still the most comprehensive published dataset of its kind — found that large IT transformation projects run on average 45% over budget, 7% over time, and deliver 56% less value than predicted. That is first delivery. The trajectory over the subsequent decade is harder to find in rigorous published form — which is itself telling.

Scoring the Average Project

With that context, here is an honest assessment of how the average project scores on each dimension of the Ten-Year Cost Test. These are not precise figures — the data does not support precision — but they represent the consistent direction of the evidence.

Maintenance cost rises significantly on the average project. Industry estimates consistently place maintenance at 60–80% of total software lifecycle cost, and that proportion grows over time rather than shrinking. On framework-first systems, the annual upgrade cycle alone — broken dependencies, reworked configuration, revalidated integrations — consumes engineering capacity that produces zero business value. In the worst cases, maintenance costs grow 800% or more over a decade, eventually triggering a rewrite. In the best cases — domain-first systems with low accidental complexity — maintenance costs stay flat or fall as the model matures.

Requirement absorption speed slows materially on the average project. In a well-modeled system, new requirements should get faster to implement over time — not slower — as the team's understanding deepens and the model reveals where each extension naturally fits. On the average project, the opposite happens. What starts as a two-week feature becomes a two-month project by year five, as each new requirement must navigate accumulated accidental complexity. In distributed systems, a single business rule change triggers API contract renegotiation, versioning decisions, cross-team coordination, and staged deployments. In the worst cases, the system effectively stops absorbing new requirements — every change becomes a major project and the business routes around the software rather than through it. In the best cases, requirement absorption accelerates as the model matures. Flat speed is a warning sign. Slowing speed is a verdict.

Team size grows on the average project. Industry observation consistently shows teams of two to three times the original size by year ten, maintaining the same functional scope. In the worst cases — full microservices architectures with dedicated platform, SRE, and DevOps functions — the team exists primarily to manage its own infrastructure rather than to serve the business. In the best cases, the team stays small and stable. Three developers. Five hundred domain objects. Fifteen years.

Stability declines on the average project. DORA data shows that low-performing teams, a quarter of respondents in 2024 and a growing share, have change failure rates many times the elite figure of around 5% and recovery times measured in days or weeks. Production increasingly becomes the final validation environment because the integrated system only meets real conditions there. In the worst cases, the organisation develops a chronic incident culture where production instability is treated as a fact of life rather than an architectural signal. In the best cases, stability improves over time as the model matures and edge cases are properly handled.

Running costs grow faster than business value on the average project. The shift to cloud computing made infrastructure costs more visible but did not reduce them. Microservices architectures run fifty to two hundred containers where a monolith needs three to five, with corresponding cost differentials. In the worst cases, infrastructure cost grows an order of magnitude while business capability grows modestly. In the best cases, running costs remain proportional to business growth throughout the system's life.

The rewrite conversation starts on the average project around year seven. In the worst cases, the conversation starts at year three or four — the system has already become unmaintainable before it is fully understood. In the best cases, the conversation never happens. The system absorbs new requirements, accommodates new technology at its boundaries, and continues to serve the business indefinitely.

The Inverse Is Also True

The data does not merely show that the average project fails the Ten-Year Cost Test. It shows that failure is the expected outcome — so expected that the industry has stopped treating it as failure.

Rising maintenance costs are attributed to business complexity rather than architectural choices. Growing teams are treated as evidence of business success rather than architectural inefficiency. Slowing requirement absorption is explained by changing priorities rather than accumulated accidental complexity. Declining stability is managed with better monitoring rather than addressed at its source. The rewrite conversation is framed as modernisation rather than recognised as the bill arriving for choices made before the domain was understood.

This normalisation is the most dangerous consequence of the Singleton Paradox operating at industry scale. When everyone is paying the same inflated price, the inflated price becomes the reference point. The cost of accidental complexity is not visible as a cost. It is visible as the cost of software — the natural, inevitable, irreducible price of building systems.

It is not natural. It is not inevitable. It is not irreducible.

It is the compound interest on a specific set of choices, made consistently, across the industry, before domains are understood. Choices that look like engineering because everyone makes them. Choices that the Singleton Paradox ensures will never be clearly falsified, because the alternative is never built.

The Rewrite as the Final Verdict

There is one more signal worth examining. It requires no data, no research, no longitudinal study. It is available in almost every organisation that has been running software for more than a decade.

The rewrite conversation.

When someone in your organisation argues that the current system cannot support where the business is going — that it needs to be modernised, migrated, rebuilt on a new platform — that system has already announced its verdict on the Ten-Year Cost Test. The rewrite is not a sign of business ambition. It is the bill arriving.

The tragedy of the rewrite is not its cost, though the cost is substantial — typically measured in millions and years. The tragedy is what happens after. The new system almost always makes the same choices. The same framework is selected before the domain is understood. The same patterns are applied before the business concepts are named. The same accidental complexity is introduced in the first sprint and compounds through the same lifecycle.

The reason is the Singleton Paradox: the organisation never learned from the previous system what actually went wrong. The previous system ran in production. The pipeline was green. The architecture was recognised. The failure was economic and temporal (too slow, too expensive, too fragile to change), not functional. And economic, temporal failure is invisible until it isn't. By the time the rewrite conversation starts, the diagnosis is usually "technical debt" or "legacy architecture" or "we outgrew it." Rarely is it the accurate one: accidental complexity was introduced before the domain was understood, and it compounded for seven years.

So the rewrite reproduces the conditions that made the rewrite necessary. And in another seven to ten years, the conversation starts again.

A well-modeled system does not generate the rewrite conversation. Not because it is perfect, or because requirements don't change, or because technology doesn't evolve. But because the essential complexity — the domain model — is separable from the accidental concerns around it. Frameworks can be replaced without touching the domain. Infrastructure can evolve without restructuring the business logic. The system adapts because its core is stable, and its core is stable because it correctly reflects the domain rather than the technology choices of the year it was built.

The Ten-Year Cost Test can be applied to any system. And the rewrite conversation, or its absence, is the most honest result that test can produce.

The Uncomfortable Conclusion

The Singleton Paradox means the direct proof will always be unavailable. You cannot park the well-architected system next to the poorly-architected one and read off the difference, because only one of them was ever built. You cannot compare the fifteen-year maintenance cost of a domain-first system against a framework-first system because the framework-first system is the only one that exists.

What you can do is apply the Ten-Year Cost Test to what you have. Ask honestly whether maintenance is getting cheaper or more expensive. Whether new requirements are getting faster or slower to absorb. Whether the team is staying small or growing to manage complexity. Whether the system is getting more stable or less. Whether running costs are proportional to business growth or running ahead of it.

And ask whether the rewrite conversation has started.

The industry data — imprecise as it is, incomplete as it necessarily must be — points consistently in one direction. The average project fails all five dimensions of the test. Maintenance rises. Requirements slow. Teams grow. Stability declines. Costs outpace business value. The rewrite conversation starts around year seven and reproduces the conditions that made it necessary.

This has happened so consistently, for so long, that it has been normalised into invisibility. The inflated cost has become the reference point. The compounding expense of accidental complexity has become indistinguishable from the natural cost of building software — because no one in the room has ever seen it otherwise.

The proof that it can be otherwise exists — in systems maintained by small teams in complex domains, absorbing new requirements cleanly, costing the same to run as they did a decade ago. Those systems exist. They simply never get compared to the alternative, because the alternative was never built.

The absence of that proof in your organisation is not evidence that it is impossible.

It is evidence of the Singleton Paradox.

And the Singleton Paradox is not a law of nature.

It is a consequence of choices. But not random choices — choices made under a specific kind of pressure that has nothing to do with fit. Spring Boot is chosen because the last project used Spring Boot. CQRS is chosen because the architect gave a conference talk on CQRS. Event-driven architecture is chosen because it is what sophisticated teams are supposed to use. These are not engineering decisions. They are career decisions dressed as engineering decisions. No one got fired for choosing the framework everyone else is using. The choice is defensible precisely because it is popular — and because the Singleton Paradox ensures it will never be tested against the alternative, it remains defensible indefinitely, regardless of what it actually costs.

This is the root cause the industry rarely names. Not incompetence. Not malice. The systematic selection of solutions on the basis of social safety rather than demonstrated fit — in an environment where demonstrated fit is structurally impossible to measure.

Choices that can be made differently.

The Underestimated Power of Encapsulation in Software Engineering

Leon Pennings — Mon, 27 Apr 2026 08:46:44 +0000

Most Java developers today can explain encapsulation. They will tell you it means making fields private and adding getters and setters. They can recite SOLID principles on demand. They know the vocabulary.

What most of them have never experienced is what genuine object-oriented design actually feels like in practice — and that is the real problem.

Object-oriented principles did not disappear because of technology hype or the pace of change. They were never properly learned. A generation of developers was trained on frameworks, not on design. They learned Spring before they understood objects. They learned dependency injection before they understood responsibility. They learned how to make things work before they understood how to structure things well.

The result is an industry where object-oriented vocabulary is used to justify procedural habits. The Interface Segregation Principle — which is fundamentally about keeping responsibilities separate and coherent — gets applied as a rule for how to slice Spring interfaces. Encapsulation becomes a checkbox: private fields, public getters, done. The deeper meaning, and the profound practical value behind it, is lost entirely.

What dominates instead is procedural programming in disguise. Fat service classes orchestrate anemic data bags. Logic is scattered across layers. Objects exist to hold data, not to own behavior. The goal is implementation — make it work, ship it — not design. Not structure. Not a system that remains small, simple, robust, and maintainable as it grows.

This article is about what encapsulation actually means, what it actually does, and why practicing it properly changes both the software you build and the way you think about building it.

What Encapsulation Really Means

Encapsulation means that the "how" stays completely inside the object. Clients see only the "what" — the responsibilities the object fulfills. Nothing about implementation, nothing about mechanism, nothing about technology ever surfaces in the public interface.

Private fields are the minimum. The real discipline is in the public surface of the object. If a method exposes internal data, leaks a storage detail, or forces the caller to know anything about how the object works internally, encapsulation has already failed — regardless of whether the fields are private.

This extends to the constructor. A constructor that accepts implementation details — a storage mechanism, an external resource, a configurable strategy — is already exposing the "how." The object must own its implementation completely, from the moment it comes into existence.

A helpful guiding principle is "Tell, Don't Ask": tell the object what to do. Do not ask it for its data so that you can make decisions with it elsewhere. When you find yourself pulling data out of an object to decide what to do next, that decision almost certainly belongs inside the object itself.
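A small sketch may make the difference tangible. The `Account` class below is a hypothetical example invented for this illustration. It has no getter for its balance, so the overdraft rule cannot drift out into calling code.

```java
import java.math.BigDecimal;

// Hypothetical domain object, invented for illustration.
final class Account {
    private BigDecimal balance;

    Account(BigDecimal openingBalance) {
        this.balance = openingBalance;
    }

    // "Tell": the caller states what it wants. The rule about what is
    // allowed lives here, next to the data it protects.
    boolean withdraw(BigDecimal amount) {
        if (amount.signum() <= 0 || balance.compareTo(amount) < 0) {
            return false; // refused; the balance itself never leaves the object
        }
        balance = balance.subtract(amount);
        return true;
    }
}

// "Ask" style, for contrast. It would need a getBalance() and a setBalance()
// that the class above deliberately does not have:
//
//   if (account.getBalance().compareTo(amount) >= 0) {
//       account.setBalance(account.getBalance().subtract(amount));
//   }
//
// Every caller that writes this owns a private copy of the overdraft rule.
```

With the "tell" version, a change to the rule (a credit limit, say) touches one method. With the "ask" version, it touches every place that ever read the balance.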

The Cognitive Shift: From Technology to Responsibilities

Encapsulation is more than a coding rule. Practiced properly, it becomes a thinking tool that changes how you model systems from the ground up.

When you commit to hiding the "how," you are forced to think clearly about the "what." Technical questions — how do I store this, which framework handles this, which layer does this belong to — become the wrong questions. They are about implementation, and implementation is not your concern at this level. The right questions are: what is this object responsible for? What should it be able to do? Which other objects would it naturally talk to?

In a typical Spring application this shift never happens. Developers think in layers — controller, service, repository — and the central question is always "where does this code go?" That question produces a filing system for procedural code. It does not produce a domain model. The objects that emerge from it are empty by design, because the template has already decided that behavior lives in services, not in objects.

Asking "whose responsibility is this?" produces something entirely different: a coherent network of objects that each own their behavior completely, and that together tell the story of the domain.

Example: A Well-Encapsulated Document

Consider a compliance-heavy application where documents — PDFs, scanned forms, certificates — play a central role. They get created, stored, retrieved, and checked for compliance throughout the system.

The typical Spring-influenced approach treats Document as a data bag:

```java
public class Document {
    private UUID id;
    private String filePath;    // leaks storage details
    private String mimeType;
    private byte[] content;     // exposes raw data

    public String getFilePath() { return filePath; }
    public void setFilePath(String path) { this.filePath = path; }
    public byte[] getContent() { return content; }
    // ... more getters and setters
}
```

This is not an object. It is a struct with ceremony. Every implementation detail is visible and reachable. Logic that belongs to the Document — storage, compliance checking, content retrieval — lives somewhere else, in a service class, spread across layers, written procedurally. Changing the storage mechanism means hunting through the entire codebase because the entire codebase is coupled to the implementation.

Now consider a Document that actually owns its responsibilities:

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.util.UUID;

public class Document {
    private final UUID id;
    private final String name;
    private final String mimeType;

    public Document(String name, String mimeType, InputStream content) {
        this.id = UUID.randomUUID();
        this.name = name;
        this.mimeType = mimeType;
        // becoming a Document includes taking care of its own storage
        // the how is nobody else's business
    }

    public void writeToStream(OutputStream outputStream) {
        // retrieves and writes content, fully internal
        // the caller gets their bytes, nothing more
    }

    public boolean isCompliant() {
        // compliance logic lives here, where it belongs
        // (rules elided in this sketch, so fail loudly rather than guess)
        throw new UnsupportedOperationException("compliance rules elided");
    }

    public String getName() { return name; }
    public String getMimeType() { return mimeType; }
}
```

The Document figures out its own storage as part of coming into existence. It knows how to give its content back via writeToStream. It knows whether it is compliant. No file path is exposed. No byte array leaks out. No storage mechanism is visible to anything outside.

Usage across the system stays clean and expressive:

```java
Document invoice = new Document("invoice.pdf", "application/pdf", contentStream);

transaction.attach(invoice);
invoice.writeToStream(responseStream);
```

Transaction knows it can attach a Document. It does not know — and has no reason to know — how the document stores itself, where it lives, or how it retrieves its content. The Document figures it out. That is the point.
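For readers who want something they can compile, here is one possible filling-in of that outline. It is a sketch under loud assumptions. The in-memory byte array stands in for whatever storage a real Document would choose, the compliance rule is a placeholder invented here, and `attachmentCount()` exists only so the example has something to observe. The point is that none of those choices appear in the public surface.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

final class Document {
    private final UUID id = UUID.randomUUID();
    private final String name;
    private final String mimeType;
    // Assumption: an in-memory copy stands in for the real storage mechanism.
    private final byte[] stored;

    Document(String name, String mimeType, InputStream content) {
        this.name = name;
        this.mimeType = mimeType;
        this.stored = readFully(content);
    }

    void writeToStream(OutputStream outputStream) {
        try {
            outputStream.write(stored);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Placeholder rule, invented for this sketch: a non-empty PDF is compliant.
    boolean isCompliant() {
        return "application/pdf".equals(mimeType) && stored.length > 0;
    }

    String getName() { return name; }
    String getMimeType() { return mimeType; }

    private static byte[] readFully(InputStream content) {
        try {
            return content.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

final class Transaction {
    private final List<Document> attachments = new ArrayList<>();

    // Transaction only knows that it holds Documents, not how they store themselves.
    void attach(Document document) {
        attachments.add(document);
    }

    int attachmentCount() { return attachments.size(); }
}
```

Swapping the byte array for a file, a blob store, or a database row would change the private field and the two private-facing method bodies. `Transaction` and every other caller would compile unchanged.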

Why This Matters at Scale

The benefits of this discipline are not always obvious on a small codebase. They become impossible to ignore as the system grows — and they show up most clearly when things need to change.

The core logic tells the story of the domain. When objects are modeled around responsibilities rather than technical concerns, reading the code means reading the domain. A Transaction attaches a Document. A Document knows whether it is compliant. The objects speak in business terms because they were designed in business terms. There is no framework noise, no layer indirection, no infrastructure vocabulary polluting the domain model. A new developer — or a returning one after six months — can understand what the system does by reading the objects, not by reverse-engineering a tangle of service classes and annotations.

Framework upgrades become a bounded problem. The dominant template in Java development today is well known: logic goes into services, data gets carried by DTOs, persistence is managed by repositories, and domain objects exist mainly to map to database tables. This pattern is taught as architecture. It is actually a prescription for hollowing out the domain. The objects end up empty. The behavior ends up scattered across service classes that have no natural boundary, no clear responsibility, and no reason to stay coherent as the system grows.

The consequence is that the framework and the domain become inseparable — not because of annotations on classes, but because the logic itself has been relocated into framework-managed components. Services are Spring beans. Transaction boundaries are framework concerns. The business reasoning is hosted inside the framework rather than sitting independently of it. When the framework changes, the logic has to move with it, because the logic lives inside it.

When domain objects genuinely own their responsibilities, this changes entirely. The core domain is a network of objects talking to each other in business terms, with no knowledge of the framework hosting them. The framework sits at the edges — handling HTTP, managing sessions, coordinating persistence — but it does not host the logic. Upgrading it, replacing it, or restructuring it becomes a bounded problem. The domain does not change because it was never coupled to the framework in the first place.
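As a rough illustration of "the framework sits at the edges", consider the sketch below. All names are hypothetical, and the endpoint is a plain class standing in for whatever a real framework would provide. The status codes are the only transport vocabulary in the example, and they appear only in the edge class.

```java
import java.util.HashMap;
import java.util.Map;

// Domain core: plain Java, no framework types. Hypothetical example.
final class Subscription {
    private boolean active = true;

    void cancel() {
        if (!active) {
            throw new IllegalStateException("already cancelled");
        }
        active = false;
    }

    boolean isActive() { return active; }
}

// Edge: a stand-in for a framework controller. It translates between the
// transport (ids in, status codes out) and the domain, and decides nothing.
final class CancelSubscriptionEndpoint {
    private final Map<String, Subscription> subscriptions = new HashMap<>();

    void register(String id, Subscription subscription) {
        subscriptions.put(id, subscription);
    }

    int handle(String subscriptionId) {
        Subscription subscription = subscriptions.get(subscriptionId);
        if (subscription == null) {
            return 404; // unknown id
        }
        try {
            subscription.cancel(); // the rule lives in the domain object
            return 204;
        } catch (IllegalStateException e) {
            return 409; // the domain refused; the edge only translates that
        }
    }
}
```

Replacing the endpoint with a different framework's controller would mean rewriting the second class only. `Subscription` has nothing in it to migrate.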

The five to seven year rebuild cycle is not inevitable. Most software organisations accept the full rewrite as a fact of life. After a few years, the codebase has become so entangled with its own technology choices that evolution is no longer possible — the only way forward is to start again. This cycle is expensive, disruptive, and demoralising. It is also, in large part, a consequence of building systems where business logic is hosted inside framework components rather than inside the domain itself.

When the core logic is a network of objects talking to each other in terms of responsibilities, it does not age the same way. The business rules, the domain relationships, the behavioural contracts between objects — these survive. Technology changes around them. The core endures.

Architectural evolution becomes manageable. Moving from a monolith to a distributed architecture, extracting a bounded context, splitting a service — these are genuinely difficult problems when business logic is woven through framework plumbing. When domain objects carry no framework baggage and communicate purely through their responsibilities, the same logic can move between architectural boundaries without fundamental redesign. The objects do not care whether they run in one process or ten. Their responsibilities do not change. Their interfaces do not change. The architecture is a deployment concern, not a domain concern.

The less the core depends on frameworks, the longer it survives. This is the underlying premise. Frameworks evolve, get replaced, fall out of favour, and eventually die. Business logic, when it is well modelled, does not have the same lifecycle. Keeping them genuinely separate — not just in theory, but in practice, through strict encapsulation — means the thing that actually matters, the domain model, accumulates value over time rather than accumulating debt.

Common Pitfalls

The data bag. A class whose primary purpose is to hold data with getters and setters is not an object in any meaningful sense. It is a data structure. Logic that should belong to it lives elsewhere, and that scattered logic is the source of most maintenance pain in large Java codebases.

The leaking constructor. A constructor that accepts implementation details — storage strategies, injected resources, configurable mechanisms — is already exposing the "how." This is dependency injection, and despite its near-universal adoption in Java development, it is a direct violation of encapsulation. The object should own its implementation fully. If it needs to talk to an external resource, it does so internally. That is not a variable, not a configuration point, not something the outside world participates in. It is simply what the object does. The widespread embrace of DI as a default pattern reflects an aversion to singletons and a desire for testability — both legitimate concerns — but it solves them at the cost of encapsulation, and that cost is rarely acknowledged.

Procedural code in disguise. A service class that takes data out of one object, makes decisions about it, and puts results into another object is a procedural function with a class wrapper. The behavior belongs in the objects themselves. The service class is a symptom of objects that do not own their responsibilities.

SOLID as a technical checklist. When principles like Interface Segregation or Single Responsibility are applied to framework configuration and layer boundaries rather than to object design, they produce an architectural cargo cult: the appearance of structure without the substance. These principles are about responsibilities and design, not about how to wire up a Spring context.

A Note on Pragmatism

No codebase exists in a vacuum. Frameworks, ORMs, and serialization libraries are part of real-world development, and they sometimes need to know things about your domain objects. This is accidental complexity — the overhead introduced by the tools and environment you work in, as opposed to the essential complexity of the domain itself.

The key distinction is whether the accidental complexity adapts to the essential, or corrupts it.

JPA annotations on a domain object are a good example of acceptable accidental complexity. They decorate the object — they tell the framework how to map it — but they do not change what the object does, how it reasons, or how it protects its own state. The domain logic is untouched. If you removed JPA tomorrow, the object would still make complete sense. The essential complexity is intact. Accidental complexity that adapts to essential complexity without reshaping it is always acceptable — and recognising that distinction is itself a design skill.

The line is crossed when the framework starts dictating structure. A no-argument constructor that leaves the object in an invalid state. A setter that exists purely because the ORM needs to hydrate a field. A transaction boundary that forces business logic to be organised around framework sessions rather than domain responsibilities. At that point the accidental complexity is no longer adapting to the essential — it is reshaping it. The tool is now designing the domain, and the domain is losing its integrity.

The test is simple: if the accidental complexity were removed, would the core object still be coherent, valid, and complete on its own terms? If yes, the compromise is acceptable. If no, the framework has gone too far and the design needs to push back.
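One way to picture an object that passes that test is a class that cannot be constructed in an invalid state. The `Certificate` below is a hypothetical example with no persistence annotations at all. The assumption is that any mapping a tool needs would be layered on top without adding a public no-argument constructor or setters.

```java
import java.time.LocalDate;
import java.util.Objects;

// Hypothetical domain object, invented for illustration.
final class Certificate {
    private final String holder;
    private final LocalDate validUntil;

    // The only way in. There is no no-argument constructor and no setter,
    // so a Certificate without a holder or an expiry date cannot exist.
    Certificate(String holder, LocalDate validUntil) {
        if (holder == null || holder.isBlank()) {
            throw new IllegalArgumentException("holder is required");
        }
        this.holder = holder;
        this.validUntil = Objects.requireNonNull(validUntil, "validUntil is required");
    }

    // Valid up to and including the expiry date.
    boolean isValidOn(LocalDate date) {
        return !date.isAfter(validUntil);
    }

    boolean isHeldBy(String name) {
        return holder.equals(name);
    }
}
```

If a framework required a public empty constructor plus `setHolder` and `setValidUntil`, every method on the class would have to start defending against half-built instances. That is the point at which the tool is reshaping the domain rather than adapting to it.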

Conclusion: Encapsulation as a Force Multiplier

True encapsulation is strict. The object alone owns and hides everything about how it works. Clients see only responsibilities. The "how" is nobody else's business.

Practiced properly, it changes more than the code. It changes how you think about systems. You stop modeling data flow and start modeling behavior. You stop thinking in layers and start thinking in responsibilities. You stop asking "where does this code go?" and start asking "whose job is this?" The software becomes a network of objects that each know their job and do it — completely, independently, and without leaking their secrets.

That network survives in a way that layered, framework-dependent systems do not. It survives framework upgrades because the framework was never inside it. It survives architectural shifts because the objects carry no architectural assumptions. It survives time because it is organised around the domain — around what the software actually is — rather than around the technology that happens to be running it today.

Most Java developers today have never worked in a codebase built this way. That is not an accusation — it is a consequence of an industry that taught frameworks before it taught design. But it means that for many, genuinely object-oriented development would feel like a different discipline entirely.

It is. And it is worth learning.

Rich domain modelling: a library story

Leon Pennings — Sun, 19 Apr 2026 12:23:04 +0000

Most software doesn't have a domain model. It has a database schema, a set of service classes that orchestrate calls to it, and a collection of user stories that have been implemented one by one, each leaving a small deposit of logic somewhere convenient. This works, until it doesn't — until a framework needs replacing, a regulation changes, or someone asks a question the system was never quite designed to answer, and the answer turns out to be scattered across fourteen service methods and three database joins.

This article is about a different approach, illustrated through a deliberately simple example: a library system. The example is old-fashioned on purpose. The familiarity lets you focus on the reasoning, not the subject matter.

The core argument is this: a rich domain model is not something you design once at the start of a project and then implement. It is something you grow, continuously, as your understanding of the business deepens. Every requirement, every refinement session, every new user story is not just a work order — it is new information about the domain. The question to ask at each step is not "how do we implement this?" but "does this change what we understand the domain to be?"

If the answer is yes, the model changes. Not in a future story. Not as tech debt. Now. The implementation timeline is not sacred. The correctness of the domain is. The cost of a misaligned domain compounds over time — it gets into every new feature, every workaround, every "we can't easily change that" conversation. A missed sprint to correct the model is almost always cheaper than six months of working around a wrong abstraction.

The other side of this is: you only model what you understand. If something is unclear, that is not a reason to guess at an abstraction — it is a reason to ask more. Refinement sessions exist precisely for this. The domain expert knows things the model doesn't yet reflect. The job is to close that gap, incrementally, with each new piece of understanding.

That is what this article shows. Not a perfect model arrived at in one go, but a model that starts where the knowledge starts, and adapts as the knowledge grows.

User story 1: "We want to lend out books"

The first conversation with the domain expert goes predictably. The library wants to lend books. They want to know where each book is — on which shelf, or on loan to whom, from when until when.

From this, the initial domain objects emerge: Book, Lender, and somewhere, the loan dates. And this last point — where do the loan dates live? — is the first real decision.

The path of least resistance puts them in Book. The book knows where it is; if it's on loan, it knows to whom and for how long. It seems natural. But pause here, because this is the decision that will constrain everything that follows.

Ask a simple domain question: is knowing when it was borrowed, and by whom, part of what a book is? A book is a title, an author, a physical object. The loan is an event — an agreement between the library and a person, at a point in time, concerning that book. Two different things. Putting loan dates in Book is the same category of error as storing someone's employment history in their passport: adjacent subjects stitched together because it was convenient.

There is also a practical problem that makes the conceptual one concrete: a book can be borrowed many times, by different people, at different points in time. A single set of loan fields cannot represent that history without overwriting it. The model isn't just conceptually imprecise — it is structurally incapable of answering basic questions the business will eventually ask.

The first model, with its warning signs visible:
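One way that first model might look in Java is sketched below. This is an assumption for illustration, not the article's actual code: the field and method names are hypothetical, and Lender follows the article's use of the word for the person who borrows.

```java
import java.time.LocalDate;

record Lender(String name) { }

// Hypothetical first-cut model: loan data lives directly on the book.
class Book {
    private final String title;
    private final String author;

    // Warning sign: these three fields describe a loan, not a book.
    private Lender borrowedBy;
    private LocalDate borrowedFrom;
    private LocalDate borrowedUntil;

    Book(String title, String author) {
        this.title = title;
        this.author = author;
    }

    // Warning sign: each new loan overwrites the previous one, so history is lost.
    void lendTo(Lender lender, LocalDate from, LocalDate until) {
        this.borrowedBy = lender;
        this.borrowedFrom = from;
        this.borrowedUntil = until;
    }

    String title() { return title; }
    String author() { return author; }
    Lender borrowedBy() { return borrowedBy; }
    LocalDate borrowedFrom() { return borrowedFrom; }
    LocalDate borrowedUntil() { return borrowedUntil; }
}

class FirstModelDemo {
    public static void main(String[] args) {
        Book book = new Book("Example Title", "Example Author");
        book.lendTo(new Lender("Alice"), LocalDate.of(2026, 1, 5), LocalDate.of(2026, 1, 26));
        book.lendTo(new Lender("Bob"), LocalDate.of(2026, 2, 2), LocalDate.of(2026, 2, 23));
        // Only the latest borrower survives; the earlier loan can no longer be found.
        System.out.println(book.borrowedBy().name());
    }
}
```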

Once the problem is recognised, a Loan entity is introduced. It points to a book and a lender, and carries its own data: start date, end date, and a return date for when the item actually comes back.

Book is clean. Each entity is responsible for what it actually is.
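A minimal Java sketch of that split follows, under a few assumptions of my own: the end date is read as the agreed due date, a missing return date means the item is still out, and the method names registerReturn, isActive and isOverdueOn are hypothetical.

```java
import java.time.LocalDate;

// Book and Lender now describe only what they are.
record Book(String title, String author) { }
record Lender(String name) { }

// The loan is its own concept: one book, one lender, one period.
final class Loan {
    private final Book book;
    private final Lender lender;
    private final LocalDate startDate;
    private final LocalDate endDate;   // assumed here to mean the agreed due date
    private LocalDate returnDate;      // null until the item actually comes back

    Loan(Book book, Lender lender, LocalDate startDate, LocalDate endDate) {
        if (endDate.isBefore(startDate)) {
            throw new IllegalArgumentException("end date must not precede start date");
        }
        this.book = book;
        this.lender = lender;
        this.startDate = startDate;
        this.endDate = endDate;
    }

    void registerReturn(LocalDate date) {
        if (returnDate != null) {
            throw new IllegalStateException("loan already closed");
        }
        if (date.isBefore(startDate)) {
            throw new IllegalArgumentException("return date precedes start date");
        }
        this.returnDate = date;
    }

    boolean isActive() { return returnDate == null; }

    boolean isOverdueOn(LocalDate today) {
        return isActive() && today.isAfter(endDate);
    }

    Book book() { return book; }
    Lender lender() { return lender; }
}

class LoanDemo {
    public static void main(String[] args) {
        Book book = new Book("Example Title", "Example Author");
        Loan first = new Loan(book, new Lender("Alice"),
                LocalDate.of(2026, 1, 5), LocalDate.of(2026, 1, 26));
        first.registerReturn(LocalDate.of(2026, 1, 20));
        Loan second = new Loan(book, new Lender("Bob"),
                LocalDate.of(2026, 2, 2), LocalDate.of(2026, 2, 23));
        // Both loans exist side by side; lending the book again erased nothing.
        System.out.println(first.isActive() + " " + second.isActive());
    }
}
```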

Emergent behaviour: what the model now gives you for free

Here is something worth making explicit, because it tends to get overlooked.

When the domain is modelled correctly, it doesn't just solve the problem at hand — it makes available capabilities that nobody wrote a story for.

With Loan as a first-class entity, the model now contains the answers to questions like:

How many times has this book been borrowed in the last year?
Is it borrowed back-to-back — should we order a second copy?
Which items are overdue right now?
Which lender has the most active loans?

No one asked for any of this. And more importantly, no one needs to change the model to support it. These questions are answerable as a natural consequence of the right abstraction — zero additional structural cost. This is what correct domain modelling produces: not just a solution to the stated requirement, but a foundation that doesn't resist future questions.
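To make that concrete, three of those questions can be sketched as plain queries over a list of loans. This is an illustration under assumptions: the Loan record is a simplified stand-in, a null return date means the item is still out, and the LoanQueries helper and its method names are hypothetical.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

record Book(String title) { }
record Lender(String name) { }

// Simplified stand-in: returnDate is null while the item is still out.
record Loan(Book book, Lender lender, LocalDate startDate, LocalDate endDate, LocalDate returnDate) {
    boolean isActive() { return returnDate == null; }
}

final class LoanQueries {
    private LoanQueries() { }

    // How many times has this book been borrowed since a given date?
    static long timesBorrowedSince(List<Loan> loans, Book book, LocalDate since) {
        return loans.stream()
                .filter(l -> l.book().equals(book) && !l.startDate().isBefore(since))
                .count();
    }

    // Which items are overdue right now?
    static List<Loan> overdueOn(List<Loan> loans, LocalDate today) {
        return loans.stream()
                .filter(l -> l.isActive() && today.isAfter(l.endDate()))
                .collect(Collectors.toList());
    }

    // Which lender has the most active loans? Empty when nothing is out.
    static Optional<Lender> lenderWithMostActiveLoans(List<Loan> loans) {
        Map<Lender, Long> activePerLender = loans.stream()
                .filter(Loan::isActive)
                .collect(Collectors.groupingBy(Loan::lender, Collectors.counting()));
        return activePerLender.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey);
    }
}

class LoanQueriesDemo {
    public static void main(String[] args) {
        Book a = new Book("Title A");
        Lender alice = new Lender("Alice");
        Lender bob = new Lender("Bob");
        List<Loan> loans = List.of(
                new Loan(a, alice, LocalDate.of(2025, 3, 1), LocalDate.of(2025, 3, 22), LocalDate.of(2025, 3, 20)),
                new Loan(a, bob, LocalDate.of(2026, 1, 5), LocalDate.of(2026, 1, 26), null));
        LocalDate today = LocalDate.of(2026, 2, 10);
        System.out.println(LoanQueries.timesBorrowedSince(loans, a, today.minusYears(1)));
        System.out.println(LoanQueries.overdueOn(loans, today).size());
        System.out.println(LoanQueries.lenderWithMostActiveLoans(loans));
    }
}
```

None of these methods required a new field or a new entity. They only read what the Loan already records.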

The opposite — loan dates buried in Book — means that every one of those questions requires working around an accidental constraint. The data is there, technically, but it is in the wrong place conceptually, and that mismatch has a cost that accumulates with every new question the business wants to ask.

A correct abstraction doesn't just solve the current problem. It shapes every solution that follows.

User story 2: "We also want to lend out DVDs"

A new requirement arrives. The library wants to lend DVDs too.

On most teams, this is treated as a work order. There is now a DVD entity. Fields are defined — title, director, runtime. The ticket is closed.

This is precisely the failure mode the introduction described: a user story implemented rather than understood. The arrival of this requirement is not an instruction to add DVD. It is new information about the domain. And new information about the domain means it is time to re-examine the model.

The question is not "how do we add DVD?" The question is: was Book ever the right abstraction for this domain?

Think about what the lending system actually cares about. It doesn't care that a book has pages or that a DVD has a runtime. From the perspective of the lending domain, both are things that can be borrowed, returned, and tracked. If you add a DVD entity you are not modelling the lending domain — you are modelling a classification detail that the domain does not act on. And the next story will bring magazines. Then tools. Then a request that breaks the pattern entirely, and by then there are four parallel entity types, duplicated service logic, and a reporting layer full of unions.

The correct response to this user story is not implementation. It is evaluation. And the evaluation reveals that the concept the domain actually needs is not Book — it is a lendable item. Something that can be borrowed, regardless of what it is.

Modelling the domain, not the world

This is the point where a common objection appears: isn't LendableItem with a generic attribute collection just an entity-attribute-value (EAV) pattern with a different name? Isn't it losing type safety? Isn't it too abstract?

These are implementation concerns, not domain concerns. And that distinction matters enormously.

A book and a DVD are genuinely different things in the real world. They have different physical forms, different metadata, different cultural contexts. But the domain model is not a model of the real world. It is a model of how the business operates. And in the lending domain, a book and a DVD are the same thing: an item that can be lent to a person for a period of time, tracked, and returned. The domain acts on that concept. It does not act on the distinction between pages and runtime.

The risk in domain modelling is not abstraction. The risk is the wrong abstraction — and the most common wrong abstraction is modelling the real world instead of the business domain. When that happens, the model fills up with concepts that feel correct because they match physical reality, but that the business never actually operates on as distinct things. Book and DVD as separate domain entities is that mistake. The library doesn't lend books and DVDs differently. It lends items.

LendableItem is not generic for the sake of flexibility. It is precise — precisely what the domain requires.

This is not overengineering. Starting with Book was correct — at the time, only books existed, and naming the concept after the only known instance of it is entirely reasonable. Good domain modelling does not demand abstraction before there is evidence for it. But when the evidence arrives, the model must respond.

The revised model:

Book becomes LendableItem. The type — book, DVD, magazine, whatever comes next — is an ItemType instance defined in data, not in code. Each ItemType carries the attribute definitions relevant to it: a book has ISBN and author; a DVD has runtime and director. The LendableItem holds the attribute values as a key-value collection shaped by the ItemType — not arbitrary data, but controlled variation. A new lendable type can be defined through the UI, without a software release. The domain absorbs the variation without being touched.

Notice what also appears here: LendPolicy. Lending rules — how long something can be borrowed, whether it can be renewed — are not properties of items. They are policies, and policies have their own identity. A 7-day loan period might apply to all DVDs, a 21-day period to most books, and a specific rare edition might carry its own exception — all configurable, without code changes. By modelling LendPolicy as an entity that points to items rather than belonging to them, the granularity becomes a business decision. The domain reflects it correctly.
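The revised model could be sketched in Java roughly as follows. Everything beyond the names LendableItem, ItemType and LendPolicy is an assumption made for illustration: attribute values are kept as strings, a policy targets either a whole type or a single item, and the rule that an item-specific policy wins over a type-wide one is my reading of the rare-edition example.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Set;

// An item type is data: a name plus the attribute names it defines.
record ItemType(String name, Set<String> attributeNames) { }

final class LendableItem {
    private final ItemType type;
    private final Map<String, String> attributes;

    LendableItem(ItemType type, Map<String, String> attributes) {
        // Controlled variation: the values must match the type's definitions exactly.
        if (!attributes.keySet().equals(type.attributeNames())) {
            throw new IllegalArgumentException(
                    type.name() + " requires exactly " + type.attributeNames());
        }
        this.type = type;
        this.attributes = Map.copyOf(attributes);
    }

    ItemType type() { return type; }
    String attribute(String name) { return attributes.get(name); }
}

// A policy has its own identity and points at what it governs.
final class LendPolicy {
    private final int loanDays;
    private final ItemType appliesToType;      // null when item-specific
    private final LendableItem appliesToItem;  // null when type-wide

    private LendPolicy(int loanDays, ItemType type, LendableItem item) {
        this.loanDays = loanDays;
        this.appliesToType = type;
        this.appliesToItem = item;
    }

    static LendPolicy forType(ItemType type, int loanDays) {
        return new LendPolicy(loanDays, type, null);
    }

    static LendPolicy forItem(LendableItem item, int loanDays) {
        return new LendPolicy(loanDays, null, item);
    }

    boolean isItemSpecific() { return appliesToItem != null; }

    boolean appliesTo(LendableItem item) {
        return isItemSpecific() ? appliesToItem == item : appliesToType.equals(item.type());
    }

    int loanDays() { return loanDays; }

    // Assumed precedence: an item-specific policy wins over a type-wide one.
    static int loanDaysFor(LendableItem item, List<LendPolicy> policies) {
        return policies.stream()
                .filter(p -> p.appliesTo(item))
                .max(Comparator.comparing(LendPolicy::isItemSpecific))
                .map(LendPolicy::loanDays)
                .orElseThrow(() -> new IllegalStateException("no policy covers this item"));
    }
}

class RevisedModelDemo {
    public static void main(String[] args) {
        ItemType dvd = new ItemType("DVD", Set.of("runtime", "director"));
        LendableItem film = new LendableItem(dvd, Map.of("runtime", "120", "director", "Example Director"));
        List<LendPolicy> policies = List.of(LendPolicy.forType(dvd, 7));
        System.out.println(LendPolicy.loanDaysFor(film, policies));
    }
}
```

In this sketch the ItemType instances are constructed in code for brevity. In the model the article describes they would be loaded from data, so adding a magazine type means adding a row, not a class.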

What this example is really about

Three things are worth naming directly.

The domain is not a one-off. The biggest misconception about domain modelling is that it happens at the start of a project, produces a diagram, and is then finished. In practice, a domain model is only as good as the understanding that produced it. Understanding grows — through refinement sessions, through new requirements, through conversations with domain experts who reveal nuance the model doesn't yet capture. Every one of those moments is an opportunity to improve the model. Treating them as implementation tickets instead is how misalignment accumulates.

Correctness compounds. A wrong abstraction doesn't just cause one problem. It causes every problem that grows on top of it. When the framework needs replacing five years from now, the core business logic should be the stable thing — the part that doesn't change because it correctly reflects the domain. If the logic has leaked into service methods, database queries, and framework-specific glue, the framework and the logic are inseparable. A rich domain model is what makes the core of the application resilient to the things around it changing.

User stories are input, not instructions. "We want to lend DVDs" is not a specification. It is a piece of information about the business. The correct response is to understand what it reveals about the domain, and let that understanding reshape the model if necessary. On teams where user stories are treated purely as work orders, DVD gets added, the ticket is closed, and the model silently drifts further from reality. On teams where user stories are treated as domain conversations, the arrival of DVD prompts the question that leads to LendableItem — and the system becomes more correct, not just more complete.

A note on SOLID

This article has used two principles from SOLID without naming them. It is worth naming them now — not to add jargon, but because these principles are widely known and almost as widely misunderstood, and the library example shows exactly what they were designed for.

SOLID is a tool for domain modelling. Applied to technical layers — controllers, services, repositories, packages — it is the wrong tool for the job. Not because it produces nothing useful there, but because it is answering questions that belong to a different space. Asking whether your BookService violates the Single Responsibility Principle is like applying flight-route optimisation to a city street map. You will get answers. They will be coherent. They will just not be answers to the right question. The right question is always about the domain.

When SOLID is applied only at the technical layer, the domain model is typically left untouched, a set of anemic objects with no real behaviour, while all the interesting decisions accumulate in a service class whose responsibility nobody can coherently describe. The system is, in a narrow sense, well-structured. It models nothing.

The uncomfortable truth this produces is worth stating plainly: you can apply SOLID perfectly and still end up with a system that does not model the business. The principles do not tell you what to model. They evaluate whether what you have modelled makes sense. If what you have modelled is technical structure rather than domain concepts, SOLID will faithfully validate that structure — and the domain will remain a mess.

Applied to the domain, the principles are genuinely illuminating.

Single Responsibility Principle is what drove the Book → Book + Loan split. The question it asks is not "does this class do too many technical things?" It asks: does this concept carry responsibility that belongs to a different concept? A book is not responsible for knowing when it was borrowed. That is the responsibility of the loan event. One domain question, one correct answer, one new entity. Applied at the domain level, SRP produces clean, stable concepts with clear boundaries. Applied only at the technical level, it tends to produce BookHelper, BookManager, and BookUtil — classes that exist to split code rather than to model anything.

Open/Closed Principle is what drove the Book + DVD → LendableItem + ItemType move. The principle says a model should be open for extension but closed for modification. In domain terms: when new kinds of things appear, the model should absorb them without requiring existing concepts to change. A DVD entity requires a code change and a deployment every time a new item type is introduced. LendableItem with ItemType instances defined in data requires neither — the model is extended through configuration. The domain is open for new item types and closed against needing to touch LendableItem to accommodate them.

The remaining principles have domain equivalents too. But the point here is not to survey all five; it is to show that SOLID belongs in the domain conversation. Bringing it into the technical conversation is not a sequencing problem, it is a category problem. The principles ask domain questions. Technical layers are not a domain. The questions do not apply. It is like applying makeup to a horse: it can be done, but nobody benefits from the result.

The model should always reflect the best current understanding of the domain. When that understanding changes, the model changes with it. Not later. Now.

Software Engineering Is Living The Golden Hammer Antipattern — And Everyone Loves It

Leon Pennings — Tue, 14 Apr 2026 05:25:25 +0000

Why the industry simultaneously agrees with Brooks and ignores him — and why it's structured to stay that way

The Paradox Nobody Talks About

Ask any experienced software engineer about essential versus accidental complexity. They will nod. Ask them about Brooks' central argument in No Silver Bullet — that the hard part of software is the conceptual work of understanding the problem, not the mechanical work of expressing it in code. They will nod again.

Then watch what happens when the next project starts.

Someone opens Spring Initializr. Someone proposes microservices. Someone puts Kubernetes in the architecture diagram before a single domain concept has been named. The technology stack is decided in the first week. The business domain is still being understood in month six.

Nobody in that room forgot Brooks. The choice was never really about Brooks.

That is the paradox this essay is about. Not that the industry is ignorant of the problem — but that it is structured to reproduce it perfectly, indefinitely, at enormous and invisible cost.

What Brooks Actually Said

In 1975, Frederick Brooks published The Mythical Man-Month, based on his experience managing the development of OS/360 at IBM. The project was late, over budget, and initially didn't work particularly well. Brooks spent the rest of his career trying to understand why.

The insight most people remember is the coordination problem. Adding people to a late software project makes it later. Nine women cannot make a baby in one month. Communication overhead scales quadratically. You cannot parallelise work that is fundamentally interdependent. Everyone knows this. It shows up in every post-mortem, every engineering blog, every conference talk about why the rewrite took three years instead of six months.

What people remember less clearly is the deeper argument Brooks made in his 1986 essay No Silver Bullet, later added to the anniversary edition of the book.

Brooks drew a distinction between two kinds of complexity in software. Essential complexity is inherent to the problem itself — the rules, the relationships, the invariants, the genuine difficulty of the business domain being modelled. Accidental complexity is everything else — the tools, the frameworks, the infrastructure, the deployment machinery, the coordination overhead introduced by the way we choose to build systems.

His claim was precise and devastating: there is no silver bullet because the hard part of software is essential complexity, and no tool or methodology can compress it. You cannot automate your way out of needing to understand the problem. You cannot framework your way past the conceptual work.

Then he said something that was either ignored or misunderstood: the industry's persistent belief that the next tool, the next methodology, the next architectural pattern will finally solve the problem of software difficulty is itself the symptom of failing to make this distinction.

That was 1986. Since then the industry has embraced mainstream object orientation, UML, SOA, agile, microservices, event-driven architecture, CQRS, cloud-native development, and AI-assisted coding.

Each one arrived as a silver bullet. Each one was greeted with the same enthusiasm. Each one was applied before the domain was understood.

Brooks' own framework predicted every step of it.

The Golden Hammer The Industry Forgot To Question

There is a well-known antipattern in software called the golden hammer. It describes the tendency to over-apply a familiar tool regardless of whether it fits the problem, and it echoes Maslow's observation that if all you have is a hammer, everything looks like a nail.

The modern software industry does not have one golden hammer. It has a coordinated set of them — and they are chosen as a bundle, before the problem is understood, in almost every project that starts today.

The bundle looks like this: a popular framework for the application layer, microservices for decomposition, an event-driven or REST-based communication model, a cloud platform for deployment, and Kubernetes for orchestration. The specific tools vary by organisation and year. The pattern does not vary.

What makes this particular golden hammer different from the textbook antipattern is a crucial property: it is unfalsifiable.

A normal golden hammer eventually gets retired. Something demonstrates it was the wrong tool — the screw still won't turn, the nail bent, the joint failed. There is a moment of visible failure that creates pressure to reconsider.

The modern software stack has no such moment. If the system runs in production, the stack gets the credit. If the system struggles — if changes are expensive, if the team grows endlessly, if understanding the codebase requires months of archaeology — the blame goes to requirements changing, team turnover, business complexity, or simply the nature of software. The stack is never in the dock.

This is not an accident. It is a structural property of how software success is defined. A system running in production passes the only test anyone applies. There is no test for whether it could have been built at a fraction of the cost with a fraction of the complexity. Nobody built that version. Nobody ever does.

The golden hammer persists not because people are lazy or ignorant — but because the thing that should replace it is invisible to every organisational instrument the industry has built.

Agile Was The Correction. Then It Was Captured.

In 2001, the Agile Manifesto proposed something that was, underneath its somewhat vague language, a precise epistemological claim.

Software development is fundamentally a process of learning. You do not fully understand the domain at the start. You build a version of your understanding, expose it to reality — specifically to the domain experts who live in that business every day — and you refine it. Each iteration is not primarily a delivery mechanism. It is a question: did we understand the domain correctly?

The working software at the end of a sprint is not the point. It is the test. The test of whether your conceptual model of the business — your understanding of what the domain actually is, what rules govern it, what concepts belong together — corresponds to reality. Domain experts are not approving features. They are stress-testing your model.

That is what Agile was. A mechanism for continuously refining essential understanding through structured contact with reality.

That is not what Agile became.

What Agile became was a process for efficiently transcribing user stories into framework components. Two-week sprints. Velocity points. Definition of done. Backlog refinement. The ceremonies survived. The epistemology was quietly discarded.

And then CI/CD completed the transformation.

Continuous integration and continuous deployment are genuinely valuable practices for managing the operational complexity of releasing software. But they introduced a subtle and devastating redefinition of what "production ready" means.

Before, production readiness was at least nominally connected to domain correctness — does this system correctly implement the business? After, production readiness means the pipeline is green. Tests pass. Build succeeds. Deploy proceeds.

These are not the same question. A passing test suite validates that the code does what the code was written to do. It says nothing about whether the code was written to do the right thing. Whether the domain concepts are correctly identified. Whether the invariants are correctly enforced. Whether the model reflects the business reality or merely the user story that described one interaction with it.

You can have one hundred percent test coverage and zero domain correctness. The pipeline will be green. The system will go to production. The retrospective will be positive.
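A small Java sketch can make that gap visible. The rule used here is entirely hypothetical (a library that is closed on Sundays, so no due date may fall on one), as are the class and method names. The point is only that a check written from the user story can pass while the domain rule goes unasked.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;

final class DueDates {
    private DueDates() { }

    // Implements the story exactly as it was written down: "a loan lasts 21 days".
    static LocalDate storyVersion(LocalDate start) {
        return start.plusDays(21);
    }

    // What the domain expert might have said if asked (hypothetical rule):
    // a due date never falls on a Sunday, because the library is closed.
    static LocalDate domainVersion(LocalDate start) {
        LocalDate due = start.plusDays(21);
        return due.getDayOfWeek() == DayOfWeek.SUNDAY ? due.plusDays(1) : due;
    }
}

class CoverageDemo {
    public static void main(String[] args) {
        LocalDate start = LocalDate.of(2026, 3, 1);

        // The check the pipeline runs: it covers every line of storyVersion and passes.
        boolean pipelineGreen = DueDates.storyVersion(start).equals(LocalDate.of(2026, 3, 22));

        // The question the pipeline never asks: is that date acceptable to the business?
        boolean domainAgrees = DueDates.storyVersion(start).equals(DueDates.domainVersion(start));

        System.out.println("pipeline green: " + pipelineGreen);
        System.out.println("domain agrees:  " + domainAgrees);
    }
}
```

The first check was derived from the code's own behaviour, so it cannot fail for a reason the code does not already know about. Only a conversation with the domain expert produces the second one.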

The feedback loop Agile promised — between domain experts and the conceptual model being built — was replaced by a feedback loop between the code and its own tests. We optimised the loop while removing the thing it was supposed to validate.

The Sociological Lock-In

So far this looks like an intellectual failure. Engineers and organisations that know better making choices they shouldn't. A problem of discipline or culture that better education might eventually correct.

It is not. It is structural. And the structure actively selects against correction.

Consider how a software project begins. Before a single domain conversation happens, several things must occur. The project must be staffed. That requires a job posting. A job posting requires a technology stack. The project must be estimated. Estimation requires a known architecture. The kickoff deck must be prepared. The kickoff deck needs something in the architecture diagram.

All of these organisational necessities demand a technology decision at the precise moment when the only intellectually honest answer is: we don't know yet. We haven't understood the domain.

That answer is organisationally impossible to give. So the stack gets chosen. Not out of ignorance. Not out of laziness. Out of genuine organisational necessity. The machinery of project initiation requires it.

And once the stack is chosen, it shapes everything that follows. The hiring criteria. The team composition. The onboarding process. The architecture decisions. The decomposition strategy. The system that emerges is not primarily a model of the business domain. It is primarily an expression of the technology choices made before the domain was understood.

This is not the worst part.

The worst part is what happens at the hiring stage.

Conceptual thinking — the ability to reason about what a business concept actually is, what it should own, what it should never be responsible for, where the real boundaries lie — is extremely difficult to assess in an interview. It requires time, domain context, and a level of conversation that most hiring processes cannot accommodate. It does not show up cleanly on a CV.

Tool fluency shows up immediately. Spring Boot, Kubernetes, Kafka, event-driven architecture — these are expressible, searchable, assessable. You can screen for them in thirty seconds. You can test them in a one-hour technical interview. You can verify them with a take-home assignment.

So organisations hire for tool fluency. Not because they don't value conceptual thinking. Because tool fluency is what their hiring process can see.

The consequence is a team that reaches for the familiar tools. The team ships systems using those tools. Those systems run in production. The hiring criteria get validated. The loop closes.

Engineers who push back on premature technology decisions get filtered out at the CV screen, outvoted in the kickoff meeting, or labelled as impractical idealists who don't understand how real projects work. The selection pressure is quiet, consistent, and almost entirely invisible.

When everyone hired thinks the same way, the golden hammer stops looking like a hammer. It looks like engineering.

The Cost Nobody Can See

Here is the claim that cannot be proven and cannot be dismissed.

A system built with a full modern distributed stack — framework, microservices, cloud infrastructure, orchestration — could in many cases have been built far more simply, maintained by a fraction of the team, and been more correct, more stable, and more responsive to business change.

That statement cannot be verified, because the simpler version was never built. The team that chose the distributed architecture never built the alternative to compare against. The organisation that approved the budget never saw a competing proposal. The engineers who maintained the system never worked on a well-modelled equivalent.

This is not a gap in the data. It is the mechanism of the problem.

Brooks identified it precisely: most systems are built only once. There is no second system built with different assumptions, run for five years, and compared on total cost of ownership, ease of change, and conceptual correctness. The counterfactual does not exist. Therefore the cost of the wrong choice is permanently invisible.

And here is what makes it truly unfalsifiable: the entire industry is paying the same inflated price. There is no reference point. When every team uses the same stack, incurs the same coordination overhead, grows to the same size, and struggles with the same maintenance costs — those costs stop being visible as costs. They become the definition of what software costs. Normal and wasteful become indistinguishable.

But the difference is not just in cost. It is in what the work actually consists of every single day.

In a team organised around accidental complexity, the daily work is about the technology. Configuring services. Connecting components. Managing framework upgrades. Fixing pipeline failures. Debugging integration issues. Updating dependencies. Understanding the codebase means knowing which service owns which endpoint and how the data flows between them. The business domain is somewhere in there, translated into controllers and DTOs and event schemas, but it is not what the day is about.

In a team organised around essential complexity, the daily work is about the domain. Which concept owns this responsibility. What this rule actually means. What the domain expert said yesterday that changed how the team understands the model. The implementation follows from that understanding, and because the model is clear, the implementation is the smaller part of the day, not the larger.

The difference is visible — immediately and without any instrumentation — in the daily standup.

In one team, the language is technical. Spring, Kafka, the pipeline, the service, the endpoint, the migration. Progress is reported in terms of tickets and story completion. The word "business" appears occasionally, usually in the phrase "business requirement."

In the other team, the language is conceptual. The Order, the Invoice, the Payment, what a Shipment is responsible for, whether a Client and a User are really the same thing. Technology appears occasionally, usually briefly, because the implementation of a well-understood concept is rarely the hard part.

You do not need metrics or cost analyses to know which team is working on the right problems. You need one standup.

If every item on the standup is about accidental complexity — go back. Ask what the essential complexity actually demands. Then and only then choose the technology that serves it.

If every garage in the world were built to the standard of a luxury hotel, nobody would know a garage could cost less. The price would simply be what it is. The inflated standard would be the only standard anyone had ever seen.

That is where the software industry is today. Paying Burj Al Arab prices for a garage that needed to store a jar of paint. And maintaining a universal, genuine, unforced consensus that this is simply what garages cost.

Two Rules That Cost Nothing

Most prescriptions for this problem are expensive. Hire differently. Retrain your engineers. Adopt a new methodology. Bring in consultants. Run workshops.

These are not wrong. But they require budget, time, and organisational will that most teams do not have in the moment a project starts.

There are two rules that cost nothing, require no external help, and can be applied starting tomorrow.

Do not choose technology upfront.

Technology enters the project when the domain demands it, not when the kickoff deck needs an architecture diagram. The first weeks of a project produce domain understanding — what the business actually is, what concepts exist in it, what rules govern them. Technology choices follow from that understanding, added only when essential complexity makes them necessary, and only to the degree that it does.

This feels impossible in most organisations. The job posting needs a stack. The estimate needs an architecture. The kickoff slide needs something in the boxes.

Those are real constraints. They are also exactly the organisational machinery that inverts Brooks before the first line of code is written. Recognising that the machinery is the problem is the first step toward not letting it make the decision by default.

Mandate that standups are about business concepts only. Never technology.

This is the litmus test made into a practice. If someone says "I'm working on the Kafka consumer," the immediate question is: what business concept does that serve, and does that business concept actually require it? If the answer is unclear, the technology choice is premature. If the answer is clear, state the business concept first and let the technology be the footnote it should be.
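The litmus test is a conversation, not a tool, but its shape can be sketched in a few lines. The helper below is hypothetical: the `StandupCheck` class and its glossary of business concepts are assumptions made for illustration, and the substring matching is deliberately naive. It flags any standup update that names no business concept, so that the follow-up question can be asked.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper: flags standup updates that name no business concept.
final class StandupCheck {
    // Assumed glossary; a real one would come from the team's own domain language.
    static final Set<String> BUSINESS_CONCEPTS =
            Set.of("order", "invoice", "payment", "shipment", "client");

    // Naive substring match: "Payments" counts, but so would "border" for "order".
    static boolean namesBusinessConcept(String update) {
        String lower = update.toLowerCase(Locale.ROOT);
        return BUSINESS_CONCEPTS.stream().anyMatch(lower::contains);
    }

    // Returns the updates that deserve the follow-up question.
    static List<String> updatesToQuestion(List<String> updates) {
        return updates.stream()
                .filter(update -> !namesBusinessConcept(update))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> updates = List.of(
                "I'm working on the Kafka consumer",
                "An Invoice can now be split across two Payments (via the Kafka consumer)");
        updatesToQuestion(updates).forEach(update ->
                System.out.println("Which business concept does this serve? " + update));
    }
}
```

In this sketch only the first update is flagged. The second mentions the same technology, but as a footnote to a business concept, which is the ordering the rule asks for.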

A standup where every item is about services, frameworks, pipelines, and endpoints is a standup where the team has been captured by accidental complexity. It will feel entirely normal. It will sound like engineering. The terminology will be confident and precise.

But the business domain — the essential complexity that justifies the system's existence — will be invisible. And a team that cannot talk about the business in its daily standup is a team that is not working on the business. It is working on the technology that was supposed to serve it.

These two rules do not solve the problem entirely. The sociological pressures remain. The hiring pipelines remain. The organisational machinery remains. But they create two moments — one at the start of a project, one every single day — where the inversion becomes visible. Where someone can point at the standup and say: we have not mentioned a business concept in three days. What are we actually building?

That question, asked consistently, is more powerful than any methodology.

Closing

The most expensive software is the software everyone agrees is fine.

It runs in production. The pipeline is green. The team is stable. The architecture is recognisable. The job postings write themselves. The onboarding takes three months instead of three days, but that is just how software works. The changes take longer than they should, but the domain is complex. The team keeps growing, but the system keeps growing too. The costs keep rising, but software is expensive.

None of this is inevitable. All of it is a consequence of a single inversion: accidental complexity chosen before essential complexity is understood. A choice made not out of ignorance, but out of organisational necessity, sociological pressure, and the permanent invisibility of the alternative.

Brooks saw it in 1986. Named it clearly. Watched the industry quote him extensively and change nothing.

The golden hammer is not a mistake. It is the product. The template is not a shortcut. It is the destination. The assembly is not the means. It has become the craft.

Two rules. No technology upfront. Standups about the business only.

They will feel radical. They are just Brooks, applied.

Everyone agrees with Brooks.

Then the next project starts.
