Leon Pennings

Posted on Jun 19 • Originally published at blog.leonpennings.com

What is the reason for using a rich domain model in the age of AI?

#ai #softwaredevelopment #java #architecture

Most software architecture debates can't actually be settled. Every system is built once. The alternative approach — the one that wasn't chosen — is never built alongside it, under the same conditions, with the same team, against the same market. So when a system works, "it works" gets quietly promoted to "the approach was right," and when a system rots, the rot gets blamed on the domain being inherently complex, or the requirements changing too much, or the previous developers having been careless. Almost never does anyone conclude that the architecture itself was the variable that mattered, because there is no control group to compare it to.

This is the unfalsifiability problem, and it is the reason architecture discussions tend to be so unproductive. Everyone is generalizing from an n of one, or a handful of isolated ones, with team skill, domain difficulty, and plain luck as uncontrolled variables throughout. Two competent engineers can each have ten years of experience and complete confidence in their conclusions, and still have learned nothing transferable to each other, because neither has ever seen their belief tested against an alternative.

If we can't run the controlled experiment, we need a substitute. Fred Brooks gave us most of one, decades ago, by separating essential complexity — the difficulty that comes from what the problem actually is — from accidental complexity, the difficulty we introduce ourselves through our tools, our process, our representations. Brooks' point was that a lot of suffering in software is self-inflicted, layered on top of a problem that wasn't that hard to begin with.

What Brooks didn't give us is an operational test — a question you can ask in the middle of an actual design decision to tell which kind of complexity you're looking at. That's the test this article is trying to supply: was this decision forced by a genuine, current understanding of the domain, or was it forced by a constraint that existed before that understanding did? Essential complexity should always be the thing leading. Accidental complexity should always be downstream of it, serving it. The moment that order inverts — the moment a technology choice, a deployment topology, or a process gate starts dictating what the domain is allowed to look like — you have accidental complexity in charge, and the system will eventually make you pay for it.

This problem is more urgent now than it has ever been, for a reason this article will come back to at the end: AI has made implementation, the actual writing of code, nearly free. Free implementation removes exactly the kind of friction that used to nudge developers toward a correct model almost by accident, whether or not they could ever have named what they were doing. Once that friction is gone, the only thing left is whether anyone still asks the question above on purpose.

A tool, and what it takes for the tool to work

A rich domain model is, I'll argue, a tool — not a style preference, not an aesthetic about classes versus functions, but a tool built for three specific jobs. It's how you learn what a domain actually is, since the act of trying to give a concept a clean shape is what exposes whether you understood it in the first place. It's how you define a domain precisely enough that "what to build" stops being a matter of taste or memory. And it's how you document a domain in a form that has to keep working — unlike a diagram or a wiki page, which can drift quietly out of date for years with nobody noticing, a domain model that's wrong tends to say so. It is essential complexity made tangible — something you can actually point at — and testable — something that tells you when it's wrong, rather than something you have to take on faith.

That's the claim the rest of this article is going to spend its length defending. It comes with a condition attached, because a tool only does its job under specific circumstances, and most of the software industry's familiar habits — fat service layers, splitting early into bounded contexts or microservices, treating a new user story as a work order instead of as evidence — violate that condition constantly, usually without anyone noticing they've done it.

The condition has three parts. Essential complexity has to stay whole — in one place, reachable by one mind at a time, not dispersed across two hundred services inside a single codebase, and not dispersed across a boundary drawn between teams or deployments. The model has to give feedback when your understanding of it turns out to be incomplete — a behavior with no natural home, a compile error at every site that assumed an old shape, a constraint violated at the exact moment an assumption proves wrong. And folding new insight into the model, once you have it, has to stay cheap — paid once, in one place, rather than hunted for across however many places happened to encode the old understanding. Lose any one of these three and the tool stops being a tool. The code may still run. There may even be a model on a slide somewhere. But the thing that was supposed to be doing this work isn't doing it anymore — only its appearance survives.

Take the first condition first, because it's the one most quietly violated, usually with the best of intentions. When essential complexity lives inside one core model, you can look at the model and see the business. When it doesn't — when it's spread across fat services, buried in repositories, scattered across distributed components, or split along departmental lines that felt obvious at the time — the legibility is the first casualty, and with it goes the ability to even ask the question this article opened with: is the application serving the domain, or has the domain quietly started serving the accidental complexity that was supposed to be in service of it? Once essential complexity stops having one visible, coherent home, that observation can no longer be made by anyone, because there's no longer a single place left to look. What follows is, in effect, an extended demonstration of what it costs to lose each of these three conditions, one at a time — and of how rarely losing them announces itself as a mistake while it's happening.

A model is something you learn through, not something you draw once

Take a deliberately simple example: a library that lends things out. Old and familiar on purpose, so the reasoning is the point, not the subject matter.

The first conversation with the domain expert goes predictably. The library wants to lend books. They want to know where each book is — on a shelf, or on loan to someone, from when until when.

The path of least resistance puts the loan dates directly on Book. The book knows where it is; if it's out, it knows to whom and until when. It seems natural enough that most developers wouldn't pause on it.

But pause on it anyway, because this is the decision that quietly constrains everything downstream of it. Ask a plain domain question: is knowing when a book was borrowed, and by whom, part of what a book is? A book is a title, an author, a physical object. A loan is an event — an agreement between the library and a person, at a point in time, about that book. These are different things, stitched together for convenience, the same category of error as storing someone's employment history inside their passport.

There's a structural problem hiding behind the conceptual one, too. A book gets borrowed many times, by different people, at different points in time. A single set of loan fields on Book can't represent that history without overwriting it on every new loan. This isn't a style complaint — the model is structurally incapable of answering questions the business will eventually ask.

So a Loan entity gets introduced. It points to a book and a borrower, and carries its own data: start date, end date, return date. Book goes back to being just a book. Each concept is responsible for what it actually is.
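A minimal sketch of that separation, assuming Java records and hypothetical field names (isbn, borrowerId and the LoanRegister class are illustrative, not part of the library example itself). The only point is that every loan is its own object, so a new loan never overwrites an earlier one:

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// A book is just a book: it carries no loan state at all.
record Book(String isbn, String title, String author) {}

// A loan is an event linking a book to a borrower over a period of time.
// returnedOn stays null while the item is still out (an assumption of this sketch).
record Loan(Book book, String borrowerId, LocalDate start, LocalDate due, LocalDate returnedOn) {
    boolean isOpen() { return returnedOn == null; }
}

// Hypothetical register that keeps every loan instead of overwriting the last one.
class LoanRegister {
    private final List<Loan> loans = new ArrayList<>();

    Loan lend(Book book, String borrowerId, LocalDate start, LocalDate due) {
        boolean alreadyOut = loans.stream().anyMatch(l -> l.book().equals(book) && l.isOpen());
        if (alreadyOut) {
            throw new IllegalStateException("Already on loan: " + book.title());
        }
        Loan loan = new Loan(book, borrowerId, start, due, null);
        loans.add(loan);
        return loan;
    }

    void giveBack(Loan loan, LocalDate returnedOn) {
        // Records are immutable, so the open loan is replaced by a closed copy.
        int index = loans.indexOf(loan);
        if (index < 0) {
            throw new IllegalArgumentException("Unknown loan");
        }
        loans.set(index, new Loan(loan.book(), loan.borrowerId(), loan.start(), loan.due(), returnedOn));
    }

    // The full history survives because nothing is stored on Book.
    List<Loan> historyOf(Book book) {
        return loans.stream().filter(l -> l.book().equals(book)).toList();
    }
}
```

With loan fields on Book, the second call to lend would have had to overwrite the first borrower. Here it simply appends.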

Nobody asked for this refinement. The user story was "we want to lend out books," not "please separate the concept of a loan from the concept of a book." But the story was never a specification — it was a piece of information about the domain, and the job was to ask what it revealed, not to type it directly into a Book class and close the ticket.

Once Loan exists as its own thing, something becomes visible that nobody requested: how many times a book has been borrowed this year, whether it's going out back-to-back often enough to justify a second copy, which loans are overdue right now, which borrower has the most items out. None of this required touching the model again. It fell out of having put the responsibility in the right place the first time. A correct abstraction doesn't just solve the stated problem — it stops resisting the next ten questions nobody has asked yet.
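As a rough illustration of how those questions fall out, assume loans are kept as plain objects in a list. The record shapes and the LoanQueries helper below are hypothetical. Each question becomes a read over existing Loan objects, and neither Book nor Loan has to change:

```java
import java.time.LocalDate;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

record Book(String title) {}

// returnedOn is null while the loan is still open (an assumption of this sketch).
record Loan(Book book, String borrowerId, LocalDate start, LocalDate due, LocalDate returnedOn) {
    boolean isOpen() { return returnedOn == null; }
    boolean isOverdueOn(LocalDate today) { return isOpen() && due.isBefore(today); }
}

class LoanQueries {
    // How many times was this book borrowed in a given year?
    static long timesBorrowedIn(int year, Book book, List<Loan> loans) {
        return loans.stream()
                .filter(l -> l.book().equals(book) && l.start().getYear() == year)
                .count();
    }

    // Which loans are overdue on a given day?
    static List<Loan> overdueOn(LocalDate today, List<Loan> loans) {
        return loans.stream().filter(l -> l.isOverdueOn(today)).toList();
    }

    // Which borrower currently has the most items out?
    static Optional<String> borrowerWithMostOpenLoans(List<Loan> loans) {
        Map<String, Long> openPerBorrower = loans.stream()
                .filter(Loan::isOpen)
                .collect(Collectors.groupingBy(Loan::borrowerId, Collectors.counting()));
        return openPerBorrower.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey);
    }
}
```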

The second correction

A new requirement arrives: the library wants to lend DVDs too.

The path of least resistance here is just as easy to predict: add a DVD entity. Title, director, runtime. Close the ticket. And this is exactly the failure this whole article is about, in miniature — the request "we also want to lend DVDs" got treated as an instruction to add a DVD class, instead of as new information about a domain that had just revealed something about itself.

The actual question isn't "how do we add DVD." It's: was Book ever the right concept for this domain in the first place? The lending system doesn't care that a book has pages or a DVD has a runtime. It cares that both are things that can be borrowed, tracked, and returned. Model Book and DVD as siblings and the next story brings magazines, then tools, then something that breaks the pattern outright, and four parallel entity types are now duplicating service logic and complicating every report.

The concept the domain actually needed, it turns out, was never Book. It was LendableItem — something that can be lent, regardless of what it physically is. Book becomes LendableItem; what kind of item it is becomes data (ItemType), not a class; the attributes specific to a type (ISBN and author for a book, runtime and director for a DVD) live in a small typed collection shaped by that ItemType. A new lendable thing can be defined through configuration, without a release.
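One way this could look, as a sketch rather than a prescription: ItemType is plain data listing the attributes a type requires, and LendableItem checks its attributes against it. String-valued attributes and the requiredAttributes name are simplifying assumptions. A real model might carry typed values and load its item types from configuration.

```java
import java.util.Map;
import java.util.Set;

// What kind of item something is becomes data: a name plus the attributes that type requires.
record ItemType(String name, Set<String> requiredAttributes) {}

// One class for anything that can be lent. Type-specific details live in a checked attribute map.
final class LendableItem {
    private final String title;
    private final ItemType type;
    private final Map<String, String> attributes;

    LendableItem(String title, ItemType type, Map<String, String> attributes) {
        // The ItemType decides which attributes are valid, so no subclass is needed per kind of item.
        if (!attributes.keySet().equals(type.requiredAttributes())) {
            throw new IllegalArgumentException(
                type.name() + " requires exactly " + type.requiredAttributes()
                    + " but got " + attributes.keySet());
        }
        this.title = title;
        this.type = type;
        this.attributes = Map.copyOf(attributes);
    }

    String title() { return title; }
    ItemType type() { return type; }
    String attribute(String name) { return attributes.get(name); }
}
```

Under these assumptions, introducing magazines means creating one more ItemType value, for example `new ItemType("Magazine", Set.of("issue"))`, and no new class.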

This isn't abstraction for its own sake — starting with Book was the right call when only books existed; naming a concept after its only known instance is reasonable, not naive. The point is that when the second instance arrived, it was evidence, and the model was obligated to respond to evidence rather than absorb it as a special case bolted onto the side of the original guess.

Here is the part worth sitting with: in both corrections, the cost of being wrong was paid exactly once, at exactly one place, and the compiler told you everywhere else that needed to change. Turning Book into LendableItem produces a wave of compile errors at every call site that assumed a Book — every one of them a worked checklist, not a hunt. There is no step where you have to remember which of fourteen services touched the old assumption. The type system already knows.

Picture the alternative: a codebase with two hundred service methods, accumulated over years, several of them written by people who've since left. Some of those services read a book's loan status off a flag on Book. Some duplicate the "is this thing currently out" check inline. Some call into a shared BookService that does it correctly, and some call into an older one that doesn't quite. When the DVD requirement lands, finding every place that encoded an assumption about books is now a research project, conducted from memory and grep, with no tool confirming you found all of them — and if two different developers wrote two of those services, they may have encoded two subtly different mental models of what a book even is, neither of which was ever forced to reconcile with the other, because nothing in the architecture ever made them collide.

That's two of this article's three conditions doing their work at once: the model gave feedback the moment a concept turned out to have no natural home, and folding the correction back in cost one change, enforced by a tool, rather than a hunt across however many places had quietly encoded the old assumption. That's the actual argument for a domain model, stated as plainly as I can: it is the cheapest known way to be wrong, because the error gets caught in one place, by a tool, instead of persisting silently in fourteen places until a domain expert notices the software doing something they never agreed to.

Fat is not a size problem

The library example shows feedback and cheap correction working together, inside a single concept. The first condition — that essential complexity stays whole — fails differently, and far more commonly, and it's worth seeing exactly how, because the failure is almost always mistaken for a different problem than it is.

Take Customer. Almost every enterprise system has one, and almost every one of them eventually starts absorbing things that don't belong to it: a preferredCarrier field set because shipping needed it, a creditLimit because billing needed it, an slaTier because support needed it. Years of this, and Customer is enormous — hundreds of fields, half of them nullable, conditional logic scattered through anything that touches it, and nobody able to describe what Customer actually means anymore, because it means five different things depending on who's asking.

This is a real failure, and the diagnosis matters, because two very different responses are available, and only one of them fixes anything.

The popular response is to split. Give shipping its own ShippingCustomer, billing its own BillingCustomer, support its own SupportCustomer — separate models, separate teams, separate services if you go all the way, joined by some kind of translation layer that maps one context's idea of a customer onto another's. This is the bounded-context move, and on paper it sounds disciplined: each context gets a clean, focused model instead of one bloated shared one.

Look closer and notice what actually happened: ShippingCustomer is not a different concept from the bloated Customer. It's the same god object, just with the bloat partitioned by department instead of concentrated in one file. The information that crept into Customer because nobody asked "whose responsibility is this" hasn't been resolved — it's been relocated, and the relocation comes with a new bill attached. Where before, a change to how loyalty tier affects shipping could be seen and verified in one place, by one compiler, it now has to travel: Billing's context has to publish something, Shipping's context has to subscribe to it, maintain its own copy, and recompute its own derived state asynchronously, hoping the event arrives, hoping the definitions of "loyalty tier" haven't quietly diverged between the two contexts that were specifically built not to share one. The coupling between billing and shipping didn't go away because they're now in different rooms. It just stopped being visible to anyone reading either room on its own — and a dependency you can't see is not a dependency you've solved, it's a dependency that will surface later, in production, as an "integration issue" nobody can trace back to its origin.
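To make the hidden dependency concrete, here is a toy, in-process sketch of the two contexts. Real contexts would sit behind a message broker. The class names, the tier names and the TierChanged event are assumptions made for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

// Billing owns the loyalty tier and announces changes as events.
record TierChanged(String customerId, String newTier) {}

class BillingContext {
    private final Map<String, String> tiers = new HashMap<>();

    // Returns the event that some transport is expected to deliver to other contexts.
    TierChanged upgrade(String customerId, String newTier) {
        tiers.put(customerId, newTier);
        return new TierChanged(customerId, newTier);
    }

    String tierOf(String customerId) {
        return tiers.getOrDefault(customerId, "STANDARD");
    }
}

// Shipping keeps its own copy of the tier, correct only as long as every event arrives.
class ShippingContext {
    private final Map<String, String> tierCopy = new HashMap<>();

    void on(TierChanged event) {
        tierCopy.put(event.customerId(), event.newTier());
    }

    String carrierFor(String customerId) {
        return "GOLD".equals(tierCopy.getOrDefault(customerId, "STANDARD")) ? "DHL" : "UPS";
    }
}
```

If the event returned by `upgrade` is never passed to `on`, billing reports GOLD while shipping keeps choosing the standard carrier, and nothing in either class can notice the disagreement.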

This is the same shape as the god object, except distributed. Splitting the pain across contexts is, at best, splitting the pain — not preventing it.

What the fix actually looks like

The right response to a fat Customer is the same response that turned Book into Loan and LendableItem: ask what responsibility doesn't belong here, and extract it — into a new, named concept, still inside the same model, still reachable by an ordinary reference, still subject to the same compiler.

But extract it carefully, because there's a trap one level down that looks like a fix and isn't. The instinct might be to give Customer a List<ShippingPreference> directly — replace the flat preferredCarrier field with a small polymorphic hierarchy of rules, ranked by precedence. That's progress over the flag, but it's still the same mistake in a thinner disguise: Customer has no business knowing that shipping preferences exist as a concept at all. A ShippingPreference living directly on Customer is Customer quietly absorbing knowledge of how it's consumed downstream — the exact failure that produced ShippingCustomer in the first place, just wearing an interface instead of a flag.

The responsibility that's missing a home isn't "the customer's shipping rules." It's "how this customer relates to shipping" — and that relationship is its own concept, with its own name: CustomerShipping. It holds a reference to the Customer it concerns, and a list of CustomerShippingPreference instances — a default, a tier-based upgrade, an explicit override — each one only meaningful inside the context of shipping, which is exactly where they now live.

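import java.time.Duration;
import java.util.Comparator;
import java.util.List;

// Order, CarrierChoice and Customer are assumed to be defined elsewhere in the model.
// One shipping rule. Among the rules that apply to an order, the highest precedence wins.
interface CustomerShippingPreference {
    int precedence();
    // Rules apply by default; a rule that depends on the order overrides this.
    default boolean appliesTo(Order order) { return true; }
    CarrierChoice resolve(Order order);
}

class DefaultShipping implements CustomerShippingPreference {
    public int precedence() { return 1; }
    public CarrierChoice resolve(Order order) {
        return new CarrierChoice("UPS", Duration.ofDays(3));
    }
}

class GoldTierShipping implements CustomerShippingPreference {
    public int precedence() { return 2; }
    public CarrierChoice resolve(Order order) {
        return new CarrierChoice("DHL", Duration.ofDays(0));
    }
}

class ExplicitDateOverride implements CustomerShippingPreference {
    public int precedence() { return 3; }
    // Assumption: requestedCarrier() returns null when the order carries no explicit request.
    public boolean appliesTo(Order order) { return order.requestedCarrier() != null; }
    public CarrierChoice resolve(Order order) {
        // Assumption: requestedDate() returns a value CarrierChoice accepts as its second argument.
        return new CarrierChoice(order.requestedCarrier(), order.requestedDate());
    }
}

class CustomerShipping {
    private final Customer customer;
    private final List<CustomerShippingPreference> preferences;

    CustomerShipping(Customer customer, List<CustomerShippingPreference> preferences) {
        this.customer = customer;
        this.preferences = List.copyOf(preferences);
    }

    // The arbitration lives here and nowhere else: filter to applicable rules, take the top one.
    CarrierChoice shippingMethodFor(Order order) {
        return preferences.stream()
            .filter(p -> p.appliesTo(order))
            .max(Comparator.comparingInt(CustomerShippingPreference::precedence))
            .map(p -> p.resolve(order))
            .orElseThrow();
    }
}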

Customer itself never branches on tier, never checks for an override, never holds a single field related to shipping — it doesn't even know CustomerShipping exists. Shipment, when it needs a carrier, doesn't ask Customer anything directly. It takes an Order, reads the Customer off it, looks up or builds the CustomerShipping for that customer, and asks that object for the shipping method given the order:

class Shipment {
    Shipment(Order order, CustomerShippingRepository shippingLookup) {
        // Shipment never asks Customer about shipping; it goes through CustomerShipping.
        Customer customer = order.customer();
        CustomerShipping shipping = shippingLookup.forCustomer(customer);
        CarrierChoice carrier = shipping.shippingMethodFor(order);
        // ...
    }
}

A new rule — a holiday rush exception, a regional carrier restriction, a future platinum tier — is a new class implementing CustomerShippingPreference, added to CustomerShipping's list, never touching Customer at all. The arbitration logic — given several applicable rules, which one wins — has exactly one home, and Customer stays exactly as ignorant of shipping as Book stayed ignorant of loans.
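A self-contained sketch of adding such a rule. Order, CarrierChoice and Customer are reduced to hypothetical stand-ins here, and the December check, the precedence value of 4 and the appliesTo method are assumptions of this sketch, added so that a rule can decline orders it does not cover:

```java
import java.time.Duration;
import java.time.LocalDate;
import java.time.Month;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-ins for types the article leaves undefined.
record Customer(String id) {}                       // note: no shipping fields
record Order(LocalDate placedOn) {}
record CarrierChoice(String carrier, Duration transitTime) {}

interface CustomerShippingPreference {
    int precedence();
    // Assumption of this sketch: a rule can decline an order it does not cover.
    default boolean appliesTo(Order order) { return true; }
    CarrierChoice resolve(Order order);
}

class DefaultShipping implements CustomerShippingPreference {
    public int precedence() { return 1; }
    public CarrierChoice resolve(Order order) {
        return new CarrierChoice("UPS", Duration.ofDays(3));
    }
}

// The new rule: one new class. Customer and the existing rules are not edited.
class HolidayRushShipping implements CustomerShippingPreference {
    public int precedence() { return 4; }
    public boolean appliesTo(Order order) {
        return order.placedOn().getMonth() == Month.DECEMBER;
    }
    public CarrierChoice resolve(Order order) {
        return new CarrierChoice("DHL", Duration.ofDays(1));
    }
}

class CustomerShipping {
    private final Customer customer;
    private final List<CustomerShippingPreference> preferences;

    CustomerShipping(Customer customer, List<CustomerShippingPreference> preferences) {
        this.customer = customer;
        this.preferences = List.copyOf(preferences);
    }

    // Highest-precedence rule among those that apply to this order.
    CarrierChoice shippingMethodFor(Order order) {
        return preferences.stream()
            .filter(p -> p.appliesTo(order))
            .max(Comparator.comparingInt(CustomerShippingPreference::precedence))
            .map(p -> p.resolve(order))
            .orElseThrow();
    }
}
```

The only change to existing code is the list handed to CustomerShipping. The arbitration in shippingMethodFor stays exactly as it was.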

This is a handful of small, plainly readable classes. It is not impressive-looking code. And it resolves the problem more correctly, with less effort, than either the original flag-on-Customer design or the bounded-context split would have, because it identifies what was actually going on in both cases. The problem was never "customer is too big." It was that shipping's view of a customer had no home, so it got jammed either into a field on Customer or into a separate ShippingCustomer clone, when what it needed was its own name, sitting between the two, owning exactly the relationship it represents and nothing else.

Now try to build the same arbitration across three separate services — a shipping-preference service, a loyalty-tier service, an order-override service, however the bounded contexts happened to get drawn. The precedence rule doesn't belong to any one of them; it belongs to the relationship between them — which is precisely the responsibility CustomerShipping exists to hold — and that relationship has nowhere to live except in glue code outside all three contexts once it's been split that way: code nobody will consider part of "the domain," code that has to either make three synchronous calls and recompute the ranking itself, or maintain a denormalized, eventually-stale copy of all three rule types just to compare them locally. Either way, the actual essential complexity here — how privilege and explicit intent interact — has become homeless, in a system specifically designed to give every concept a clean home. Good luck.

There's a second, less obvious benefit to CustomerShipping worth naming, because it points at something larger than this one example. Notice what this design actually is: an add-on. It attaches a new concern to Customer after the fact, without modifying Customer, without Customer ever being aware it exists. That's normally the property bounded contexts and microservices claim for themselves — loosely coupled, independently addable — except here it's achieved without any of the cost usually attached to it, because the looseness came from correct responsibility assignment, not from physical separation. It's glue, without the pain that usually comes with glue.

The boundary as a bet you can't yet price

Here is the order of moves so far, made explicit, because the second move only works after the first one has landed. First: most of what bounded contexts are reached for to fix — a bloated Customer, a god object, departments fighting over one shared model — is solved more simply and more cheaply by keeping the first condition intact inside a single codebase: ask what responsibility doesn't belong, extract it into its own named object, connect it by reference. CustomerShipping is the proof. The usual justification for splitting evaporates once the extraction is done properly, because the thing the split was trying to relieve never had to exist in the first place.

Second, and this is the sharper claim: even where a boundary still looks justified on the day it's drawn (even if the team did genuine, careful event-storming, even if the language really does diverge between two parts of the business), the boundary is a bet, and it's a bet placed with incomplete information, because you cannot know today every cross-cutting rule the business will need tomorrow. This is the same condition failing on a different axis. Inside a codebase, the failure mode was a responsibility with no name of its own. Across a network, it's a boundary that looked justified on the day it was drawn, and wasn't, because the thing that would have falsified it hadn't happened yet. A boundary drawn between Customer-handling and Shipping-handling is implicitly a claim that nothing will ever need to act on both sides of that line atomically. That claim is being made before the business has finished telling you what it needs, and it never finishes, just as the library's understanding of what a lendable thing was did not finish after one conversation.

The rule that eventually crosses the boundary doesn't have to be a compliance requirement. It's tempting to reach for GDPR's right to erasure as the example, because it's vivid and has a regulator attached — and it is a real instance of this, worth walking through on its own merits. A customer asks to be forgotten, and Customer needs to be deleted, fully and verifiably. In a single database, behind a single transaction, this is mostly handled by the database itself: if CustomerShipping references Customer and nobody wrote code to remove it first, the foreign key constraint refuses the delete, loudly, immediately, pointing exactly at what's still attached — the same constraint that should also prevent erasure when an order is still open or a complaint unresolved, again without anyone having to remember to write that check by hand. That failure is itself a small instance of the same learning loop the rest of this article has been describing: a ConstraintViolationException at the moment of deletion is the system telling you, synchronously and for free, that your understanding of "what does removing a customer actually require" was incomplete — caught at the cheapest possible moment, before anything was lost. Spread CustomerShipping's data across an independently owned datastore in a separate service, and that guarantee disappears with it: there's no foreign key spanning two databases, so erasure becomes a saga of calls with compensating logic if any step fails, and the entire guarantee now depends on someone having remembered, months earlier, to wire CustomerShipping into that flow. Forget one service and nothing breaks loudly. The data that should have been gone simply continues to exist, discovered eventually by an audit, if it's discovered at all.
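The contrast can be imitated in a few lines of plain Java. This is a toy stand-in, not a database: SingleStore mimics one foreign key from CustomerShipping to Customer, and ErasureFlow is a hypothetical erasure routine in the split version. All names are assumptions for illustration.

```java
import java.util.HashSet;
import java.util.Set;

// Toy stand-in for a single database with one foreign key: CustomerShipping -> Customer.
class SingleStore {
    private final Set<String> customers = new HashSet<>();
    private final Set<String> customerShippingRows = new HashSet<>(); // keyed by customer id

    void addCustomer(String id) { customers.add(id); }

    void addCustomerShipping(String customerId) {
        if (!customers.contains(customerId)) {
            throw new IllegalStateException("No such customer: " + customerId);
        }
        customerShippingRows.add(customerId);
    }

    void removeCustomerShipping(String customerId) { customerShippingRows.remove(customerId); }

    // Mimics the constraint: the delete is refused while anything still references the customer.
    void deleteCustomer(String id) {
        if (customerShippingRows.contains(id)) {
            throw new IllegalStateException("CustomerShipping still references customer " + id);
        }
        customers.remove(id);
    }

    boolean holdsDataFor(String id) {
        return customers.contains(id) || customerShippingRows.contains(id);
    }
}

// Split version: each service owns its data, and erasure is a list of calls someone must maintain.
class CustomerService {
    final Set<String> customers = new HashSet<>();
    void erase(String id) { customers.remove(id); }
}

class ShippingService {
    final Set<String> shippingPrefs = new HashSet<>();
    void erase(String id) { shippingPrefs.remove(id); }
}

class ErasureFlow {
    private final CustomerService customerService;

    ErasureFlow(CustomerService customerService) { this.customerService = customerService; }

    // ShippingService was never wired in, and nothing here can notice.
    void forget(String id) { customerService.erase(id); }
}
```

In the single store, forgetting to clean up CustomerShipping fails immediately and names what is still attached. In the split version the same omission completes without error and the shipping data stays behind.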

But making GDPR the centerpiece would be a mistake, because it hands every team without a regulator standing over them a clean exit: we're not compliance-heavy, so this doesn't apply to us. It applies to them too, because the same shape of rule shows up constantly with no compliance angle at all. A loyalty program launches eighteen months in, and upgrading a customer mid-month needs to retroactively adjust the shipping terms on every order still in transit — Customer, Order, and CustomerShipping, read and changed together. A fraud signal fires, and every open order and pending shipment for that customer needs to freeze atomically, in one step, not as three separate notifications hoping three separate systems all apply the freeze correctly and in time. An account gets closed, but anything already in transit is contractually entitled to still ship — one rule, reading across three concepts at once, treating each differently based on the others' current state. None of this is compliance. All of it is just Selling, understood a little more completely than it was on day one, the same way Loan and LendableItem were Lending, understood a little more completely than Book ever was.
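Take the account-closure rule as an example. Inside one model it can be a single method that reads all three concepts together. The sketch below assumes simplified, hypothetical versions of Customer, Order and Shipment and an AccountClosure class that does not appear in the article. In a real system the method would run inside one database transaction.

```java
import java.util.List;

enum OrderState { OPEN, CANCELLED }
enum ShipmentState { PENDING, IN_TRANSIT, CANCELLED }

class Customer {
    private boolean closed;
    boolean isClosed() { return closed; }
    void close() { closed = true; }
}

class Shipment {
    private ShipmentState state;
    Shipment(ShipmentState state) { this.state = state; }
    ShipmentState state() { return state; }
    void cancel() { state = ShipmentState.CANCELLED; }
}

class Order {
    private final Customer customer;
    private final Shipment shipment;
    private OrderState state = OrderState.OPEN;

    Order(Customer customer, Shipment shipment) {
        this.customer = customer;
        this.shipment = shipment;
    }

    Customer customer() { return customer; }
    Shipment shipment() { return shipment; }
    OrderState state() { return state; }
    void cancel() { state = OrderState.CANCELLED; }
}

class AccountClosure {
    // One rule in one place: what happens to an order depends on its shipment's current state.
    static void close(Customer customer, List<Order> orders) {
        customer.close();
        for (Order order : orders) {
            if (order.customer() != customer) {
                continue; // someone else's order
            }
            if (order.shipment().state() == ShipmentState.IN_TRANSIT) {
                continue; // already in transit: entitled to still ship
            }
            order.shipment().cancel();
            order.cancel();
        }
    }
}
```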

The price of having split early isn't paid on the day of the split. It's paid the day one of these rules arrives, and what would have been a few small domain objects — a class, a method, a foreign key — turns out instead to require an application integration effort: a saga, a compensating-transaction design, a new piece of cross-service observability just so anyone can tell, after the fact, whether the rule actually applied everywhere it needed to. That price was never on the table when the boundary was drawn, because the rule that triggers it didn't exist yet. The boundary wasn't wrong because the modeling was sloppy. It was wrong because it was a permanent commitment made against a domain that was still, and always will be, in the process of being discovered — and discovery doesn't pause for the convenience of an architecture diagram that's already been agreed on.

There is a name for the assumption that a system can be correctly specified before the work of building it reveals what you didn't know. Waterfall made that assumption about requirements. Bounded contexts make the same assumption one level down, about domain boundaries. The parallel is precise: in both cases, a commitment is made at the moment of least knowledge, the commitment hardens as work accumulates on top of it, and the cost of the thing you didn't know becomes visible only after the commitment has become too expensive to revise. The difference is that waterfall's failure eventually became undeniable enough that the industry moved on from it. The bounded-context version of the same mistake is currently being actively marketed.

The boundary that doesn't justify itself

It's worth being explicit about why this keeps happening, because the architectural move — bounded contexts, services drawn along them — is usually defended with a real and legitimate-sounding observation: the same word genuinely means different things in different parts of a large business. A "policy" to an underwriter is not what a "policy" means to someone handling a claim. A "trade" looks different to the front office than to settlement.

That observation is correct. The conclusion usually drawn from it — therefore, model it five times, once per context, and translate between the copies — is not the only available response, and I'd argue it's rarely the right one. When a single word is doing genuinely different jobs in different parts of the business, that is usually evidence that it was never one concept to begin with. It's evidence of exactly the same mistake Book made before Loan was extracted from it — a name covering more than one responsibility — except at a larger scale, and instead of doing the extraction (naming the actual underlying concepts: a contract, a claim case, a reserve calculation — each with its own identity, its own lifecycle, connected by ordinary references, the same way Order, Invoice, and Shipment are three objects rather than three departments' versions of one), the bounded-context move keeps the original, overloaded name in every room and adds a translation layer at each door. That's not respecting the business's multiple truths. It's declining to find out what the business's multiple truths are actually called.

This is worth stating plainly, because it's easy to mistake for a concession it isn't: the deeper the semantic divergence, the more extraction work is implied, not less — and the more reason to do it inside one model, where the newly-named concepts can still reference each other directly, rather than across a boundary that forces every relationship between them through an anti-corruption layer. A reinsurance contract and the claim filed against it are obviously different things with different lifecycles; that's an argument for ReinsuranceContract and ClaimCase as two well-named, related objects, not for two disconnected "Policy" models maintained by two teams who've agreed never to look directly at each other's data. Genuine semantic depth is the strongest case for doing the modeling work, not the exception that excuses skipping it.
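In code, the two-objects version might look roughly like this. The fields and the assess rule are invented for illustration, since the article names only the two concepts. What matters is that ClaimCase holds an ordinary reference to the contract it was filed against.

```java
import java.math.BigDecimal;
import java.time.LocalDate;

// The contract has its own identity and its own period of validity.
record ReinsuranceContract(String contractNumber, LocalDate effectiveFrom,
                           LocalDate effectiveTo, BigDecimal coverageLimit) {
    boolean covers(LocalDate date) {
        return !date.isBefore(effectiveFrom) && !date.isAfter(effectiveTo);
    }
}

// A claim case has a different lifecycle, and refers to the contract directly.
class ClaimCase {
    enum Status { FILED, ACCEPTED, REJECTED }

    private final ReinsuranceContract contract;
    private final LocalDate lossDate;
    private final BigDecimal claimedAmount;
    private Status status = Status.FILED;

    ClaimCase(ReinsuranceContract contract, LocalDate lossDate, BigDecimal claimedAmount) {
        this.contract = contract;
        this.lossDate = lossDate;
        this.claimedAmount = claimedAmount;
    }

    // Hypothetical rule: payable if the loss falls inside the contract period and under its limit.
    void assess() {
        boolean payable = contract.covers(lossDate)
                && claimedAmount.compareTo(contract.coverageLimit()) <= 0;
        status = payable ? Status.ACCEPTED : Status.REJECTED;
    }

    Status status() { return status; }
}
```

No translation layer is involved: the claim asks the contract a question through a method call, and a change to either class is checked against the other by the compiler.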

None of this is an argument that physical distribution is always wrong. There are real, legitimate reasons to run things as separate deployable units: independent failure isolation that actually matters operationally, genuinely independent scaling needs, regulatory requirements that mandate separation for audit or compliance reasons unrelated to modeling at all. The test for whether a split like that is healthy is simple, and it's the same test from the start of this article: does the domain model have to change shape to accommodate the split? If the answer is no — if the same concepts, the same responsibilities, the same rules hold, and only the mechanism for reaching across them changes from a method call to a network call — then the split is a free, reversible decision about deployment, made after the model earned the right to be trusted, and accidental complexity is correctly staying downstream of essential complexity. If the model does have to change shape — if concepts get duplicated, renamed per-context, or translated through an anti-corruption layer to paper over a divergence nobody actually investigated — then the split came first, and the modeling work that should have preceded it never happened. The boundary became a substitute for understanding, not a consequence of it.
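The healthy case can be sketched as an interface with two interchangeable implementations. Everything here is hypothetical: ShippingLookup, the "carrier:days" wire format, and the Function standing in for an HTTP or RPC client. The domain class Shipment is the same in both deployments.

```java
import java.util.Map;
import java.util.function.Function;

record CarrierChoice(String carrier, int transitDays) {}

// The domain-facing contract: identical whether the answer comes from memory or over a network.
interface ShippingLookup {
    CarrierChoice shippingMethodFor(String customerId);
}

// Same process: a plain method call against local data.
class InProcessShippingLookup implements ShippingLookup {
    private final Map<String, CarrierChoice> choices;

    InProcessShippingLookup(Map<String, CarrierChoice> choices) {
        this.choices = Map.copyOf(choices);
    }

    public CarrierChoice shippingMethodFor(String customerId) {
        return choices.getOrDefault(customerId, new CarrierChoice("UPS", 3));
    }
}

// Separate deployment: `transport` stands in for a network client returning "carrier:days".
class RemoteShippingLookup implements ShippingLookup {
    private final Function<String, String> transport;

    RemoteShippingLookup(Function<String, String> transport) {
        this.transport = transport;
    }

    public CarrierChoice shippingMethodFor(String customerId) {
        String[] parts = transport.apply(customerId).split(":");
        return new CarrierChoice(parts[0], Integer.parseInt(parts[1]));
    }
}

// Domain code depends only on the interface, so it keeps its shape when the deployment changes.
class Shipment {
    final CarrierChoice carrier;

    Shipment(String customerId, ShippingLookup lookup) {
        this.carrier = lookup.shippingMethodFor(customerId);
    }
}
```

If moving to the remote variant forced Shipment or CarrierChoice to be duplicated or renamed, that would be the sign that the split had started to dictate the model.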

Measuring the wrong thing very precisely

A reasonable objection at this point: surely modern engineering practice catches this. Code review, static analysis, test coverage gates, architecture review boards — surely all of this machinery exists to prevent exactly the kind of drift described above.

It doesn't, and it's worth being precise about why, because the machinery is not useless — it's aimed at a different target entirely. A static analyzer can tell you a method is too long, that a class has too many dependencies, that cyclomatic complexity has crossed a threshold. None of that is a domain question. SonarQube has no opinion on whether Customer should hold a preferredCarrier field directly or delegate that entirely to a CustomerShipping object that doesn't exist on Customer at all, because that isn't a code-smell question, it's a question about whether the model corresponds to how the business actually works — and no tool that operates on syntax has any way to check a fact that only exists in a domain expert's head.

So an organization can run an elaborate, expensive process — fully pipelined microservices, every commit reviewed, every merge gated on a green static analysis run, deployment fully automated — and produce, at the end of all of it, a system whose model is confidently, fluently, rigorously wrong. Every visible signal says the engineering is going well, because every visible signal is measuring implementation hygiene, and implementation hygiene and model correctness are different axes that happen to get conflated constantly, because rigor feels like one thing.

This connects back to where the article started. Nothing in the standard toolkit is built to catch a violation of any of the three conditions this article has been tracing — none of them are code-smell questions, and no linter has an opinion on whether essential complexity stayed whole, gave feedback, or remained cheap to correct. The absence of a controlled alternative means a team can run this kind of theater for years, ship working software the whole time, and never learn that a few small, ordinary classes — built around the right concepts instead of the existing process — would have outperformed all of it. A well-designed model with mediocre implementation has a much higher ceiling than a brilliantly implemented wrong one, because the brilliance in the second case is mostly being spent compensating for the model — defensive checks for cases that shouldn't exist, synchronization between copies of state that never needed to be duplicated, translation layers between contexts that never needed separating — and all of that compensating effort gets thrown away the moment someone finally corrects the model underneath it. Effort spent on a correct model compounds. Effort spent on an incorrect one partially evaporates, no matter how rigorously it was reviewed on the way in.

Good engineering practice, by this account, is not the pipeline. It's the discipline of being able to say, clearly, what the model is and why — the implementation afterward is the easy part, and it has always been the easy part. The pipeline measures the easy part very thoroughly.

Why this gets more urgent, not less, with AI

Here is the part that didn't apply five years ago in quite the same way.

Implementation has historically had a floor of friction underneath it that nudged people toward structure almost by accident. Hacking procedural code together against a complex domain became unmanageable quickly: the special cases piled up, the conditionals nested, the same logic got copy-pasted into three places. Developers were pushed toward extracting structure out of self-preservation, even on teams that had never read a line of object-oriented theory. The friction wasn't a deliberate teacher, but it taught something, by making the wrong path visibly painful to keep walking.

AI-assisted coding removes a great deal of that friction — and it's worth being precise about what that means, because "AI breaks the feedback loop" is a slightly different and less accurate claim than what's actually happening. AI doesn't break the loop. It removes the pressure that used to force the loop into existence in the first place, often for teams who never deliberately chose it and couldn't have named it if asked. Take that pressure away and the loop doesn't vanish — it just stops being automatic. From here on, keeping it is a deliberate choice, the same as any discipline that doesn't enforce itself.

But there's a second, subtler effect that goes beyond friction removal, and it maps directly onto all three conditions this article has been tracing. Consider what happens when a library system needs to lend DVDs. A human developer who adds DVD as a sibling of Book, then adds Vinyl six months later, then writes an increasingly complex query to aggregate loan counts across three separate entity types — that developer feels something. Not necessarily consciously, and probably not articulately. But the query is harder to write than it should be. The next story that touches lending takes longer than expected. Something resists. That resistance is a weak signal, easily ignored and often misattributed to "the domain is just complex," but it exists. Occasionally it prompts a conversation, a refactor, or a senior developer asking why this feels harder than it should. It is, in a loose and informal way, the model giving feedback through the second condition.
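The resistance described here can be sketched in a few lines. The following is a hypothetical illustration, assuming Java 16 or later for records. None of the type or field names come from a real library system. It contrasts the sibling shape with one where lending is modeled around a single lendable concept.

```java
import java.util.ArrayList;
import java.util.List;

// Shape 1: each medium is a sibling with its own loan type (hypothetical names).
record BookLoan(String memberId, String isbn) {}
record DvdLoan(String memberId, String discId) {}
record VinylLoan(String memberId, String catalogNo) {}

final class SiblingLoans {
    final List<BookLoan> bookLoans = new ArrayList<>();
    final List<DvdLoan> dvdLoans = new ArrayList<>();
    final List<VinylLoan> vinylLoans = new ArrayList<>();

    // Every new medium adds another term here, and to every other lending rule.
    long loanCount(String memberId) {
        return bookLoans.stream().filter(l -> l.memberId().equals(memberId)).count()
             + dvdLoans.stream().filter(l -> l.memberId().equals(memberId)).count()
             + vinylLoans.stream().filter(l -> l.memberId().equals(memberId)).count();
    }
}

// Shape 2: lending is about a lendable item, and the medium is a detail of it.
enum Medium { BOOK, DVD, VINYL }
record LendableItem(String itemId, Medium medium) {}
record Loan(String memberId, LendableItem item) {}

final class Lending {
    final List<Loan> loans = new ArrayList<>();

    // Unchanged no matter how many media the library decides to lend.
    long loanCount(String memberId) {
        return loans.stream().filter(l -> l.memberId().equals(memberId)).count();
    }
}
```

Both versions return the same number. The difference is what happens on the next story: in the first shape the aggregation grows with every medium, and that growth is the weak signal a developer writing it by hand would feel.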

AI generates the three-way join with exactly the same fluency as the one-way query. It doesn't experience resistance. The code is clean, the tests pass, the feature ships. Nobody in the process felt anything. The signal that a wrong shape generates — growing complexity, queries that accumulate joins, stories that quietly take longer than they should — exists nowhere in the experience of either the AI or the prompter, who is working at a level of abstraction that sees "does this feature work," not "is this implementation getting harder than it ought to be." The feedback that used to live in development, however weakly, has moved entirely to production: corrupt data, incoherent transactions, a simple-sounding feature that turns out to require three months because nobody can find a clean place to put it in a model nobody shaped for it. That's the most expensive place for a feedback loop to live, and it's where AI pushes everything — not just for procedural code or bounded contexts, but for any approach that wasn't built around a model designed to give feedback structurally rather than through the developer's pain.

Whether the model has the right shape is a question that gets answered in a refinement session, by watching a domain expert's reaction to a model that doesn't quite match what's in their head, by treating a new user story as evidence rather than as an instruction. AI has no access to that room. It can implement what it's told with great fluency, but it has no mechanism for discovering that what it was told was an incomplete or slightly wrong description of the domain, because discovering that requires exactly the adversarial, repeated checking against reality that this entire article has been describing as the actual function of a domain model. A model built without that checking is not a faster way to get to a correct system. It's a faster way to arrive, confidently and with clean code, at the same unfalsifiable mistake the rest of the industry has been making for decades, only produced at a speed that makes it considerably harder to notice before the cost compounds.

The bottleneck in software quality was never really implementation, even before AI; it only looked that way because implementation was the part that consumed the most visible hours. Collapse the cost of those hours toward zero, and what's left, undisguised, is the question that was always the only one that mattered: did anyone actually understand what they were building, or did they just build the first plausible shape it was described as, and call it done.

Essential complexity, made tangible and testable

A rich domain model is a tool. A tool to learn what a domain actually is, a tool to define it precisely enough that implementation stops being a guess, a tool to document it in a form that has to keep working, because unlike a wiki page, it can't silently drift out of date without a compiler, or a database constraint, saying so. It is essential complexity made tangible — something you can point at — and testable — something that tells you, specifically and immediately, the moment it's wrong.
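One small way a model can have the compiler "say so" is sketched below, assuming Java 14 or later for switch expressions. The ItemKind and LoanTerms names and the loan periods are invented for illustration.

```java
// Hypothetical names and loan periods, for illustration only.
enum ItemKind { BOOK, DVD, VINYL }

final class LoanTerms {
    private LoanTerms() {}

    // No default branch on purpose: a switch expression over an enum must be
    // exhaustive, so adding a new ItemKind stops this method from compiling
    // until someone decides what the rule is for it.
    static int loanPeriodDays(ItemKind kind) {
        return switch (kind) {
            case BOOK -> 21;
            case DVD -> 7;
            case VINYL -> 14;
        };
    }
}
```

A wiki page listing loan periods would keep rendering happily after a fourth kind of item was introduced. Here the gap shows up as a compile error at the exact rule that has not been decided yet.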

Everything in this article has really been one long demonstration of what happens when the three conditions that tool depends on get broken, one at a time. Split a domain along functional lines that made sense given what was known at the time, and every scenario you already knew about still works — but the cross-cutting rule that arrives later, whether it's a compliance deadline or an ordinary business decision nobody had thought of yet, now costs an integration project instead of a few small classes, because the domain that should have stayed whole was cut before anyone could know what would eventually need to reach across the cut. Disperse the logic into fat services and repositories and DTOs instead, and the model stops giving feedback at all, because there's no longer one place for a wrong assumption to collide with itself and be caught. Hand the implementation to something that writes fluent code without ever asking whether the shape it was given was the right one, and the loop that used to force discovery — slowly, expensively, but eventually — stops being forced. It doesn't disappear. It just stops happening unless someone chooses, deliberately, to make it happen.

Which is where this circles back to where it started. Software architecture is unfalsifiable — no control group, no alternative built alongside the one that shipped, every conclusion drawn from an experience of one. That problem isn't going away. But a rich domain model is the closest substitute available for the experiment nobody gets to run: not proof that a decision was right, but a running, continuous test of whether it still is — for as long as the essential complexity stays whole enough to look at, gives feedback when it's wrong, and stays cheap enough to correct that correcting it remains something a team will actually do, rather than something they agree, in principle, they should.

None of this requires architects and engineers to want it to be true. That's the uncomfortable part, and it's worth ending on. The costs of drifting away from it — the fat service, the boundary drawn early, the AI-fluent implementation of a shape nobody examined — are deferred, distributed across people who didn't make the original decision, and individually invisible at the moment each one gets made. Nobody sets out to make software hard to change. They choose a service split that solves this quarter's problem, a pattern from a conference talk, a completion that passes the tests in front of them. The same unfalsifiability that opened this article is exactly why none of those choices announce themselves as mistakes at the time — there's no control group showing what the alternative would have looked like. A rich domain model doesn't argue anyone out of making those choices. It just makes the cost of having made them visible while the bill is still small enough to pay.

The purpose of a rich domain model is not to be right. It is to make being wrong visible while the cost of correction remains small.
