Leon Pennings

Posted on May 21 • Originally published at blog.leonpennings.com

The Properties of Enterprise Software That Lasts

#softwareengineering #devops #java #architecture

"Perfection is achieved not when there is nothing more to add, but when there is nothing more to remove." — Antoine de Saint-Exupéry

Introduction

Enterprise software is different from other software. Not in the technologies used to build it, not in the frameworks, not in the methodologies. It is different in its purpose: it must work correctly today, remain correct over time, survive the people who built it, and adapt to a business domain that will change in ways nobody can fully predict. Most software is built to solve today's problem. Enterprise software must be built to outlast today's understanding.

That is a fundamentally different design goal. And it demands a fundamentally different way of thinking about software — about what matters, what doesn't, and what the job of a developer actually is.

The code is downstream of the thinking. The properties that determine whether enterprise software survives, or quietly becomes the system nobody dares touch, are not primarily technical. They are properties of understanding. And the thinking starts long before the first line is written.

The Six Properties

1. Longevity

The core of enterprise software should, in retrospect, survive ten to fifteen years. Not the UI framework. Not the ORM. Not the messaging library. The core — the domain logic, the structural decisions, the way the system understands and represents the business.

This sounds obvious until you consider how rarely it is treated as a design constraint. Most development decisions are made under short-term pressure: the sprint deadline, the current team's preferences, the framework that is fashionable today. None of those inputs have any relationship to what the system will need to be in year eight.

Longevity is not achieved by predicting the future. It is achieved by not over-committing to the present. Every unnecessary dependency, every piece of logic tied to a specific framework's idiom, every abstraction built around today's tooling rather than today's domain — these are bets that the present will continue. In enterprise software, the present never continues long enough.

Longevity is the north star. The properties that follow are the means to achieve it.

2. Upgradeability

Upgradeability is not about keeping dependencies current. Keeping dependencies current is maintenance. Upgradeability is structural: it is the capacity of the system to accept functional change without requiring a rewrite of its core.

This distinction matters enormously. A system can have perfectly up-to-date dependencies and be completely unupgradeable — because its structure was built around the features known at the time, implemented in a way that assumes those features are the final shape of the domain. When the business changes, and it will, there is nowhere to go.

Building for upgradeability means building with the understanding that what you know today is not everything. It does not mean building features you don't need — that is the opposite of the principle. It means implementing what you know today in a way that does not foreclose tomorrow. The structure should be open to extension, refactoring, and replacement at the right level of granularity.

This is also where the conventional wisdom about test coverage becomes a liability. Class-level unit tests — one test class per production class, testing the internal mechanics of each — are a contract on the current implementation. They make refactoring expensive by breaking whenever the internals change, even when the behavior is preserved. Over time, they become the reason the system cannot be restructured: the test suite has calcified the implementation.

Behavioral tests — tests that assert what a piece of functionality does, not how a particular class does it — are a contract on the domain. They survive refactoring because refactoring does not change behavior, only implementation. Upgradeability requires the right level of test coupling. Tests should be coupled to what the system does, not to how it currently does it.
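As a minimal sketch of what a behavioral check can look like, assume a hypothetical `Invoice` class (the class, its methods, and the amounts are invented for illustration, and plain Java is used instead of a test framework to keep the example self-contained). The check only states what an invoice does, so the way `total()` is computed can change without breaking it.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

class InvoiceBehaviour {
    // Behavioral check: describes what an invoice does, in domain terms.
    // It never inspects how Invoice stores or sums its lines.
    static void totalIsTheSumOfAllLines() {
        Invoice invoice = new Invoice();
        invoice.addLine(new BigDecimal("19.99"));
        invoice.addLine(new BigDecimal("5.01"));

        if (invoice.total().compareTo(new BigDecimal("25.00")) != 0) {
            throw new AssertionError("expected a total of 25.00");
        }
    }

    public static void main(String[] args) {
        totalIsTheSumOfAllLines();
        System.out.println("invoice behaviour holds");
    }
}

// Hypothetical domain class, invented for this illustration only.
class Invoice {
    private final List<BigDecimal> lineAmounts = new ArrayList<>();

    void addLine(BigDecimal amount) {
        lineAmounts.add(amount);
    }

    // Whether this is a loop, a running sum, or a cached value is an
    // implementation detail that can be refactored freely.
    BigDecimal total() {
        BigDecimal sum = BigDecimal.ZERO;
        for (BigDecimal amount : lineAmounts) {
            sum = sum.add(amount);
        }
        return sum;
    }
}
```

Replacing the loop in `total()` with a running sum kept up to date in `addLine` would leave this check untouched, which is the kind of test coupling the paragraph above argues for.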

3. Maintainability

Maintainability in long-lived software is primarily a question of dependency discipline. Every external dependency is a commitment: to a version, to an API contract, to a community that may or may not continue to support it. Over fifteen years, many of those commitments will become liabilities.

The critical discipline is asking, for every dependency: what does this actually buy us? Not in theory, but in practice, in this specific system, for this specific use case. The question is not whether a dependency is good in the abstract. A battle-tested cryptography library, a well-maintained time handling library, or a parser for a complex format earns its place because the alternative is genuinely worse. The question is whether this dependency serves this production system's domain needs, or whether it serves the tooling, the framework preference, or the developer's convenience.

The dependency that should be rejected without hesitation is the one whose primary justification is testability of the production code. Testability is a testing concern, not a production concern. Production code should not be structured, abstracted, or made more complex to accommodate the needs of the test suite.

This manifests in two particularly damaging patterns. The first is mocking-driven architecture: interfaces created not because the domain has multiple implementations of a concept, but because the test framework needs a seam to inject a mock. An interface with one real implementation, existing purely to enable a unit test, adds a layer of indirection with no domain justification. Every future reader follows the code, hits the interface, and must go find the implementation. The test was marginally easier to write. Every reader pays for that convenience forever.

The second is Aspect-Oriented Programming used to hide cross-cutting concerns. The promise was clean separation: keep business logic free of logging, transactions, security, caching. In practice, the result is code where you cannot tell what is executing by reading it. The aspects are invisible in the source. Behavior is woven in at compile time or at runtime, by configuration that must be hunted for separately. You need a debugger to understand what your own code does. That is not decoupling. It is hidden coupling, which is strictly worse than visible coupling because at least visible coupling can be read.

Both patterns share the same failure: a tooling concern reshaped the production code in ways that made it harder to understand. The test suite or the framework became easier to work with. The system became harder to reason about. That is the wrong trade, and it compounds over fifteen years in ways that eventually make the system unreformable.

The simpler path is to make the production code so clear in its intent that the need for complex testing infrastructure is reduced rather than accommodated. Nobody tests string.trim() — not because someone decided it was below the testing threshold, but because its intent and behavior are completely transparent. The ambition for domain logic should be the same. order.send() can be just as obvious if the implementation reads like a statement of business intent rather than a sequence of technical operations.
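One way to picture this is the sketch below. The `Order` class, its fields, and its three rules are assumptions made up for the example, not taken from any real system. The point is only that `send()` can be read top to bottom as a statement of when an order may ship.

```java
import java.util.ArrayList;
import java.util.List;

class OrderExample {
    public static void main(String[] args) {
        Order order = new Order("ORD-1");
        order.addItem("keyboard");
        order.confirmPayment();
        order.send();
        System.out.println(order.status());
    }
}

// Hypothetical domain class; names and rules are illustrative assumptions.
class Order {
    enum Status { OPEN, SENT }

    private final String number;
    private final List<String> items = new ArrayList<>();
    private boolean paid;
    private Status status = Status.OPEN;

    Order(String number) {
        this.number = number;
    }

    void addItem(String item) {
        items.add(item);
    }

    void confirmPayment() {
        paid = true;
    }

    Status status() {
        return status;
    }

    // Reads as the business rule: an order is sent once, and only when
    // it contains items and has been paid.
    void send() {
        if (status == Status.SENT) {
            throw new IllegalStateException("Order " + number + " was already sent");
        }
        if (items.isEmpty()) {
            throw new IllegalStateException("Order " + number + " has no items");
        }
        if (!paid) {
            throw new IllegalStateException("Order " + number + " has not been paid");
        }
        status = Status.SENT;
    }
}
```

There is no interface, no injected collaborator, and no seam added for a test framework here. A domain expert could plausibly read `send()` and say whether the three rules match how the business works.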

4. Extensibility

Extensibility requires locatability. Before you can extend a piece of functionality, you must be able to find it — and find it with confidence that you have found all of it, not just the most obvious part.

This is where fat services fail. When business logic accumulates in large service classes organised around user stories or features, the domain structure disappears. Logic that belongs together by domain reason is separated. Logic that is separate by domain reason collides in the same class. Over time, the service becomes an archaeological record of every feature request, in chronological order, and understanding what it does requires reading its entire history.

Extensibility is only achievable when the code is structured around the domain — around what the business actually is, not around how it was requested. When that structure exists, adding a new capability means finding the right place in a coherent map. When it does not exist, extending the system means navigating a maze and hoping you found everything relevant.

5. Readability

Readability is not a soft property. It is not aesthetic. It has direct economic consequences over a fifteen-year lifespan that compound in ways that eventually make a system unreformable.

The measure of readability in enterprise software is not whether an experienced developer finds the code elegant. It is whether the intent and structure are followable to a non-engineer — a domain expert, a compliance officer, a business analyst — who can read the code and recognise their domain in it. This does not mean every line reads as plain prose. Some domains have irreducible technical density: complex financial calculations, regulatory rule engines, actuarial models. The bar is not that the implementation is self-explanatory to someone without domain expertise. The bar is that the structure expresses the domain, that the intent is visible, and that the domain expert can follow the logic well enough to identify where their understanding is or is not correctly represented.

If the code reads like hocus pocus at the structural level to the person who understands the business, the code has failed at its most important communication task.

This standard has consequences for every micro-decision in implementation. It argues against stream operations where a for-loop is clearer to a broader audience — not because streams are wrong, but because in domains where large in-memory sets are never permitted by design, the performance justification evaporates and only the readability cost remains. It argues against boilerplate reduction that sacrifices expressiveness for terseness. It argues against every clever idiom that shortens the code for its author while lengthening the cognitive load for its future readers.
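To make the comparison concrete, here is a small side-by-side sketch. The `Account` class and the 30-day overdue rule are hypothetical, chosen only for illustration. Both methods return the same result. Which one reads better to a non-engineer is a judgment call that depends on the audience.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

class LoopVersusStream {
    public static void main(String[] args) {
        List<Account> accounts = List.of(
                new Account("Acme", 45),
                new Account("Globex", 10),
                new Account("Initech", 31));
        System.out.println(customersToRemindWithLoop(accounts));
        System.out.println(customersToRemindWithStream(accounts));
    }

    // Loop version: every step is a plain statement that can be read aloud.
    static List<String> customersToRemindWithLoop(List<Account> accounts) {
        List<String> customersToRemind = new ArrayList<>();
        for (Account account : accounts) {
            if (account.isOverdue()) {
                customersToRemind.add(account.customerName);
            }
        }
        return customersToRemind;
    }

    // Stream version: same result, but the reader must know the stream API.
    static List<String> customersToRemindWithStream(List<Account> accounts) {
        return accounts.stream()
                .filter(Account::isOverdue)
                .map(account -> account.customerName)
                .collect(Collectors.toList());
    }
}

// Hypothetical domain class; the 30-day rule is an assumption for illustration.
class Account {
    final String customerName;
    final int daysSinceDueDate;

    Account(String customerName, int daysSinceDueDate) {
        this.customerName = customerName;
        this.daysSinceDueDate = daysSinceDueDate;
    }

    boolean isOverdue() {
        return daysSinceDueDate > 30;
    }
}
```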

"Boilerplate" is only boilerplate if it has no business purpose. Code that is verbose because it is expressing a business process is not boilerplate — it is documentation, in the only place documentation is always current. The argument to reduce it is always an argument to optimise for the writer. In enterprise software, the reader is nearly always more important. The code will be read an order of magnitude more times than it is written, by people who were not present when it was created.

On large data sets specifically: the correct architectural response is not to optimise how they are processed in memory — it is to enforce a boundary that prevents unbounded datasets from reaching the application layer at all. Chunk the data before it is loaded. This is an architectural constraint, not a performance trick. By making large in-memory sets structurally impossible, the design eliminates the entire class of optimisation pressure they create. The complexity of cursor management and pagination lives at the data access boundary, where it belongs, not scattered as stream operations through business logic. The upstream constraint produces downstream simplicity.
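A rough sketch of such a boundary follows. `ChunkedReader` is a hypothetical helper, and the page-fetching function stands in for whatever paginated query the data access layer actually uses (here it is faked with an in-memory list). Business logic only ever receives one bounded chunk at a time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Consumer;

class ChunkingExample {
    public static void main(String[] args) {
        // In-memory stand-in for a table; a real system would query a database.
        List<Integer> table = new ArrayList<>();
        for (int i = 1; i <= 10; i++) {
            table.add(i);
        }
        BiFunction<Integer, Integer, List<Integer>> fetchPage = (offset, limit) ->
                table.subList(Math.min(offset, table.size()),
                              Math.min(offset + limit, table.size()));

        ChunkedReader<Integer> reader = new ChunkedReader<>(fetchPage, 4);
        reader.forEachChunk(chunk ->
                System.out.println("processing " + chunk.size() + " rows"));
    }
}

// Hypothetical helper living at the data access boundary.
class ChunkedReader<T> {
    private final BiFunction<Integer, Integer, List<T>> fetchPage; // (offset, limit) -> rows
    private final int chunkSize;

    ChunkedReader(BiFunction<Integer, Integer, List<T>> fetchPage, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.fetchPage = fetchPage;
        this.chunkSize = chunkSize;
    }

    // Pagination bookkeeping stays here; the business logic sees only a
    // list that can never be larger than chunkSize.
    void forEachChunk(Consumer<List<T>> businessLogic) {
        int offset = 0;
        while (true) {
            List<T> chunk = fetchPage.apply(offset, chunkSize);
            if (chunk.isEmpty()) {
                return;
            }
            businessLogic.accept(chunk);
            offset += chunk.size();
        }
    }
}
```

Offset-based paging is used here only because it is short to write. A cursor or keyset query would fit behind the same boundary without the business logic noticing.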

Readability is the condition that makes the other properties achievable. Code that reads like the domain can be upgraded because the domain is visible in it. Code that expresses intent clearly can be maintained because its purpose is self-evident. Code that maps the domain accurately can be extended because the map can be followed. It is not one property among five — it is the keystone.

6. Organisation

Organisation is qualitatively different from the first five properties. Those are visible in the codebase — you can read them, measure them, argue about them in a code review. Organisation is visible in what the codebase was allowed to become. It is the soil in which the other properties grow or fail to grow. Making it an explicit pillar says: this cannot be managed by ignoring it.

The question every development team eventually confronts is whether the organisation is supportive or restrictive. The honest answer is that it is almost always intended to be supportive and frequently experienced as restrictive — and the gap between those two is where a significant amount of enterprise software complexity originates.

The most common form this takes is architectural mandate without domain justification. Platform teams, rightly responsible for consistency and infrastructure standards, apply patterns designed for large distributed systems universally — including to applications that are, by domain definition, a single coherent thing. Microservices architectures get mandated for systems with no independent scaling requirements, no team boundary that would justify a service boundary, no domain reason for a network boundary to exist. The result is artificial complexity: deployment pipelines for services with no independent reason to exist, network calls where function calls would suffice, operational overhead that consumes development capacity without adding production value.

The architecture was not wrong for all systems. It was wrong for this system, for this domain, at this scale. But the mandate did not ask about the domain. It asked about organisational standards. And the production system pays the difference on every deployment, every change, every new hire who must learn the infrastructure before they can touch the domain.

This is organisational complexity billed to the production system. It feels like support. From the production system's perspective it is an undiscussed tax with no domain justification.

The Toyota Parallel

Toyota solved this problem in manufacturing and the solution translates directly to software development. The Toyota Way rests on two pillars: continuous improvement, and respect for people. Both are violated by the organisational patterns that produce restrictive environments.

Respect for people, in the Toyota sense, is not about workplace culture. It is an epistemological principle: the people closest to the work hold the most valuable knowledge about the work. On the production floor, the assembly worker who notices something wrong knows something the engineer in the office does not. Toyota's andon cord exists to make that knowledge immediately actionable — any worker can stop the line when they identify a defect, because the cost of a defect that travels further down the line is exponentially higher than the cost of stopping to fix it now.

In software development the people closest to the work are the developers and the domain experts. The domain expert who says "this doesn't reflect how we actually work" is pulling the andon cord. The developer who identifies a structural problem in the architecture is pulling the andon cord. Organisations that route those signals through layers of translation — product owners, project managers, UX designers, platform architects — are not being more rigorous. They are covering the cord in bureaucratic insulation and walking past it.

The second Toyota concept worth applying directly is genchi genbutsu — go and see for yourself. Do not manage from reports. Do not accept translated summaries. Go to where the work happens and observe it directly. For software this means the developer sitting with the domain expert, watching them work, seeing where the system creates friction, understanding the domain from its source rather than from a requirements document that passed through three people before it arrived. Every layer of translation between the domain expert and the developer is a layer where meaning is lost and assumption is substituted.

The third is jidoka — quality built in, not inspected in after the fact. You cannot UX-design your way to a correct domain model. You cannot test your way to a correct domain model. The correctness must be present from the beginning, in the understanding that shaped the implementation. When domain feedback arrives late — filtered through contact persons who are not the domain authorities, interpreted as a UX problem rather than a domain problem — the system has already been built around an incomplete model. Correcting it at that point is expensive. The organisational structure that produced the late feedback is the root cause, not the feedback itself.

Domain Feedback Is Always a Learning Opportunity

When domain experts say a system is too complex or doesn't make sense to them, the instinct in process-first organisations is to call a UX designer. This is solving the wrong problem at the wrong layer. UX is interface orientation — it makes existing concepts easier to navigate. It cannot fix a missing concept. If the domain model is incomplete, no amount of interface polish makes it clearer. You cannot design your way around a hole in the domain.

"Too complex" from a domain expert almost always means one of two things: a concept that exists in their mental model is absent from the system, or the system is telling a story the domain expert doesn't recognise as their own. Both are domain problems. The correct response is a domain conversation, not a design review.

This reframes what domain feedback actually is. It is not obstruction. It is not a sign that the users don't understand the system. It is the most valuable signal available — an authoritative source reporting that the model is incomplete. Organisations that treat it as a learning opportunity produce better software. Organisations that treat it as a user adoption problem produce expensive workarounds for incorrect models.

Discovery-Driven Implementation

The organisational conditions described above — domain experts who can reach the development team, feedback treated as learning, developers trusted to inquire beyond the story — enable something that process-constrained environments make nearly impossible: discovery-driven implementation.

Most software development is story-driven. The solution space is bounded by what was requested. The developer's job is to implement the described behaviour correctly and completely. This produces correct implementations of incomplete specifications, reliably and at scale.

Discovery-driven implementation starts from the same user story but treats it as a symptom description rather than a solution specification. The developer who asks enough questions about the domain — who wants to understand not just what was asked but why, what problem it actually solves, what the current process costs, where it fails — occasionally discovers that the problem as described is not the real problem. The real problem is upstream. And the solution to the real problem makes the described problem structurally impossible rather than better managed.

This kind of insight cannot be mandated. It cannot be specified in advance. It cannot be written as a test before it exists. It emerges from genuine engagement with the domain, from the developer who treats the user story as a starting point rather than a work order, from the organisation that protects the space for that inquiry rather than constraining every hour to story execution.

The deepest return on domain understanding, then, is not better implementation of what was asked. It is that occasional recognition that the described problem is a symptom. Organisations that protect the space for it, that trust developers to inquire, to discover, and to propose solutions nobody asked for because nobody knew to ask, produce software that solves real problems. Organisations that constrain that space to story execution produce software that manages symptoms, expensively, forever.

The Foundation Beneath the Properties

Every property described above is downstream of something that is not a technical practice at all. It is understanding.

You cannot write readable code about something you do not understand. You cannot structure something well that you have not thought through. You cannot know what to leave out — which is often more important than knowing what to put in — unless you understand the domain well enough to recognise what is essential and what is incidental.

The User Story Is Not a Work Order

A user story is a starting point for a conversation, not a specification for implementation. The moment a developer treats it as a work order — something to be implemented against acceptance criteria, tested to green, and closed — they have accepted someone else's translation of the domain as complete and correct. That translation is almost never complete, and sometimes critically incorrect.

The developer's job before the first line of code is to understand the business goal behind the story. Not the described behaviour — the goal. This requires asking questions. Not to clarify ambiguous requirements, but to understand the domain itself. What is this actually trying to achieve? What are the edge cases the domain expert considers obvious? What should this system never do, and why?

Consider a user story about calculating UBO — Ultimate Beneficial Ownership. A developer implementing against the story might write: find all natural persons with ownership percentage above the threshold. That is what the acceptance criteria describe. The tests pass. The implementation is wrong.

A correct understanding of UBO reveals that it is not about direct ownership percentage in isolation. It is about effective control: who ultimately determines the decisions of the entity, regardless of how the ownership structure is arranged. The question is not just who is a UBO. It is who else is a UBO, and who also has control. If nobody else does, there is exactly one.

That small shift in framing immediately surfaces a class of scenarios that the acceptance-criteria reading misses entirely. Consider natural person 1 who holds 4% in company A and 4% in company B. Company A holds 96% in company B. Company B holds 96% in company A. By direct ownership percentage, natural person 1 appears below the UBO threshold. By effective control, natural person 1 is 100% the UBO of both companies — because the circular cross-ownership means neither company has any independent shareholder beyond this person.
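The arithmetic behind that claim can be checked with a short sketch. It assumes a deliberately simplified model that is not a statement of what any UBO regulation prescribes: a person's effective share in a company is their direct share plus, for every corporate shareholder, that shareholder's stake multiplied by the person's effective share in the shareholder. For the scenario above this gives x = 0.04 + 0.96x for each company, so x = 1. The class and method names are hypothetical, and the code finds the same answer by repeated substitution.

```java
class UboExample {
    public static void main(String[] args) {
        // Natural person 1 holds 4% of company A (index 0) and 4% of company B (index 1).
        double[] direct = {0.04, 0.04};
        // heldBy[i][j] is the fraction of company i held by company j.
        double[][] heldBy = {
            {0.00, 0.96},   // 96% of A is held by B
            {0.96, 0.00}    // 96% of B is held by A
        };
        double[] effective = EffectiveOwnership.of(direct, heldBy);
        System.out.printf("A: %.4f, B: %.4f%n", effective[0], effective[1]);
    }
}

// Hypothetical helper under the simplified linear model described above.
class EffectiveOwnership {
    static double[] of(double[] direct, double[][] heldBy) {
        int n = direct.length;
        double[] current = new double[n];
        // Repeated substitution: follow the ownership chain one more level
        // per round until the shares stop changing (or a round limit is hit).
        for (int round = 0; round < 100_000; round++) {
            double[] next = new double[n];
            double largestChange = 0.0;
            for (int i = 0; i < n; i++) {
                next[i] = direct[i];
                for (int j = 0; j < n; j++) {
                    next[i] += heldBy[i][j] * current[j];
                }
                largestChange = Math.max(largestChange, Math.abs(next[i] - current[i]));
            }
            current = next;
            if (largestChange < 1e-12) {
                break;
            }
        }
        return current;
    }
}
```

Both values converge to 1.0, that is 100%, even though the direct stake is only 4% in each company. A check on direct percentages alone would never see this, because the remaining 96% of each company loops back to the same person through the other company.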

No test-first methodology surfaces this. No refactoring produces it. Domain understanding produces it, in the conversation before a line of code is written, because a developer who understands what UBO law is actually designed to do recognises this scenario not as an edge case but as a textbook example of what the law was written to catch.

What the Implementation Should Not Be

Domain understanding does not only tell you what to build. It tells you what not to build — and that is often more valuable.

When you understand that UBO is about effective control through any structure, you immediately know the implementation should not be a threshold check on direct ownership percentages. That single "should not" eliminates the naive implementation before it is written. It eliminates an entire class of wrong solutions without a single line of code.

This is the discipline of subtraction. Every constraint that comes from genuine domain understanding is a constraint that prevents future complexity. What is not there cannot introduce a bug. What is not there requires no maintenance. What is not there cannot become the thing nobody dares touch because nobody understands why it exists.

The simplest correct solution is also the most durable one. Not because simplicity is aesthetically preferable, but because complexity compounds. Every unnecessary abstraction, every dependency added for theoretical future benefit, every pattern introduced for a problem the system does not have — each one is a tax on every future change, every new hire, every upgrade cycle. Over fifteen years those taxes become the reason a system becomes unreformable.

The Right Level of Test Coverage

Honest test coverage in enterprise software is not a percentage target. It is a risk assessment.

The question is never "what percentage of lines are covered?" It is: "where are the places this system could be silently wrong, and how quickly would we know?" Tests earn their place where the real-world feedback loop is too slow, too infrequent, or too opaque to catch failures naturally.

A login page that breaks gets reported within minutes — high-frequency paths like these are well covered by integration, smoke, and end-to-end tests that run as part of any competent CI pipeline. Deep unit testing of those flows is redundant effort. A UBO calculation might run once a day for a small compliance team. It could be wrong for weeks before anyone notices. The domain is complex enough that failures are non-obvious. That is precisely where a behavioral test earns its place: not as a development guiderail, but as a specification of correctness for something that does not announce when it is wrong.

In practice, this produces test coverage in the range of 30 to 50 percent — not because the rest of the code is untested, but because the rest of the code is covered by higher-level tests and validated continuously by the people using it. The 30 to 50 percent that is explicitly tested at the unit or behavioral level is the core domain logic: the calculations, the rule evaluations, the business-critical paths where silent failure is a real and consequential risk.

This is a more defensible position than 90 percent coverage that includes getters, setters, login flows, and string formatting. Coverage as a metric measures lines executed, not correctness guaranteed. Behavioral tests on the domain core, combined with integration tests on the main flows and a system simple enough that its failures are visible, produce better assurance than a heavily instrumented suite that tests implementation details nobody will care about in year seven.

The Training Wheels Problem

There is a pattern in software development where tests function not as a quality mechanism but as a substitute for understanding. If the developer does not fully understand what they are building, green tests provide a guiderail: as long as the tests pass, the implementation is probably acceptable.

Training wheels do not teach balance. They teach riding without balance — a different skill entirely. A developer conditioned by green tests as their primary signal learns to satisfy the tests. A developer who understands the domain learns what the business actually needs. Those are not the same education, and in complex domains they produce starkly different results.

The test suite becomes a confidence mechanism decoupled from correctness. The tests reflect the developer's mental model of the domain. If that mental model is incomplete — and without domain inquiry it almost certainly is — the tests are an incomplete specification, confidently asserted as complete. This is worse than no tests. It is false assurance.

The cure is not better tests. It is understanding deep enough that the test's contribution becomes marginal. If the code expresses the domain correctly and reads plainly enough for a domain expert to validate its structure, the test suite's role as documentation and safety net diminishes considerably. A tester who says their functional tests serve as documentation of the application is making an admission: the production code has failed at its most important job. Documentation belongs in the place where it is always current: in code that reads like the domain it represents.

When the Process Becomes the Bug

There is a question worth asking of every engineering practice, every tool, every ceremony: is this the best choice for the production system, or is it the best choice for the process, the tooling, or trend compliance?

The production system is the artifact that matters. Everything else — the sprint board, the Jira backlog, the test suite, the deployment pipeline, the architecture decision records — is support infrastructure. It exists to serve the production system. The moment any of it starts making decisions for the production system, the hierarchy has inverted. And it inverts constantly, quietly, and with complete institutional legitimacy.

Nobody says "we are going to let Jira determine our engineering decisions." But when a five-minute bug fix gets put on the backlog because the process requires it, Jira just made an engineering decision. When a developer adds an abstraction layer to satisfy a test framework rather than to express the domain, the test suite just shaped the production system. When a simple piece of logic gets restructured to comply with a framework convention that has no business relevance, trend compliance just overrode domain clarity.

Process thinking asks: are we following the process correctly? Production thinking asks: what is the best outcome for the system?

When they conflict, the answer should be immediate and unambiguous: the production system wins. The process is a tool. Tools do not have votes.

The Bug Economics

Consider the real cost of a simple bug — a button that doesn't work, an enum stored as an integer instead of a string — when it travels through a process-first system versus a production-first one.

In a production-first system with simple, readable code and a CI pipeline that allows release at any time: the bug is reported, understood, fixed, and released the same day. Total engineering time: five minutes to fix, minutes to release. The user experiences a brief interruption and a same-day resolution.

In a process-first system the same bug looks like this:

Reported and logged: 10 minutes of administration
Discussed in standup or triage: 20 minutes
Estimated and planned into a sprint: 15 minutes in a planning meeting
Picked up one or two sprints later by a developer who must first relearn the context, understand the bug, navigate the abstraction layers, fix the code, fix the broken tests, and write new tests: 60 minutes or more

Total: the steps above add up to 105 minutes or more, so approximately 110 minutes of engineering time to resolve a 5-minute problem, with the user waiting up to six weeks for a fix that was always trivial. That is roughly a 22-times cost multiplier applied entirely by the process. The bug is not better fixed. The system is not more stable. The outcome is strictly worse in every dimension (cost, speed, and user experience), and the process produced it.

The Kaizen Parallel

This is not a new insight. Toyota's lean manufacturing principles identified this failure mode decades ago under the concept of muda — waste. Waste in production systems is any activity that consumes resources without adding value. The 105 minutes of process overhead on a 5-minute fix is almost pure waste: motion without value, waiting, unnecessary processing.

The deeper Kaizen principle is that the person closest to the problem is best positioned to fix it. The developer who wrote the code, who understands it today, who can see the bug clearly right now — that person fixing it immediately is the optimal outcome by every measure. Deferring it transfers the problem to a different person at a different time with less context, more overhead, and a worse result.

Empirically, this approach does not produce more bugs. Teams that have observed both models report comparable defect rates. The difference is resolution time: same-day fixes versus multi-sprint delays. On the metric that actually matters to the business — how long does a known problem affect users — the simple, production-first system wins decisively.

The Real Job

The assembly part of software development — implementing a described behaviour to pass a set of tests — is a commodity skill. It is increasingly automatable. It produces measurable output in a sprint and moves tickets across a board. It is the part of the job that process-first thinking measures, rewards, and optimises for.

The understanding part is not a commodity. It is not automatable. It does not show up in velocity metrics or test coverage percentages. But it is the part that determines whether the software is actually correct. It is the part that finds the circular ownership scenario before it becomes a compliance incident. It is the part that knows what to leave out. It is the part that produces code readable enough that a domain expert can spot an error without running a test. It is the part that makes a bug a five-minute fix rather than a two-sprint project. And it is the part that occasionally recognises that the problem as described is a symptom — and builds the thing that makes the symptom impossible.

Everything that is not in direct service of the production system is not neutral overhead today. It is an obstacle tomorrow. The fifteen-year lifespan makes this visible in a way that a two-year project never does. The complexity accumulates. The process overhead compounds. The abstractions added for testability become the walls that trap the system. The dependencies added for framework compliance become the liabilities that prevent the upgrade. The architectural mandates applied without domain justification become the constraints that make every change expensive.

Ask of every decision: is this the best choice for the production system? If the honest answer is "no, but it satisfies the process" — remove it. Whatever is not there cannot break, does not need maintenance, and does not need to be understood.

Simplicity is not the absence of effort. It is the result of understanding deep enough to know what to remove.

The properties described in this article — longevity, upgradeability, maintainability, extensibility, readability, and organisation — are not independent qualities to be optimised separately. They are consequences of a single discipline: understanding the domain well enough to represent it simply, correctly, and durably in code that will outlast the people who wrote it. The process serves that goal. When it stops serving that goal, the process is the bug.
