Leon Pennings

Posted on Jun 10 • Originally published at blog.leonpennings.com

How To Prevent Contradicting AI Prompts

#ai #softwaredevelopment #java #architecture

You've Either Seen This Already, Or You Will

You're building with AI. It's going well. Features appear quickly, the code is clean, the application works. You describe what you need, the AI implements it, you move on.

Fifty prompts in, maybe a hundred, maybe two hundred — something breaks. Not dramatically. A behaviour that should be consistent isn't. A rule that was established early is being violated somewhere downstream. A customer finds an edge case that produces an answer that contradicts another part of the system.

You dig in. The code at each location looks reasonable. Both implementations made sense when they were written. But they cannot both be right. Somewhere, somehow, the application has developed two incompatible beliefs about how something works.

The immediate instinct is to fix the prompt. Be more explicit next time. More structured. More careful about context. Give the AI better instructions and this won't happen again.

That instinct is wrong. And acting on it — more careful prompting, stricter templates, longer context windows — will delay the next contradiction but will not prevent it. Because the contradiction did not come from the prompting. It came from somewhere the prompting cannot reach.

This article is about where it actually comes from. And about a solution that is older than AI, older than the frameworks that preceded it, and consistently buried by an industry that keeps rediscovering the same problem and forgetting the same answer.

The Prompt Isn't The Problem

Here is what the contradiction actually looks like.

A B2B sales platform. Early in the build, prompt 75 establishes what an Order is: it belongs to a single customer, ships to a single delivery address, and is invoiced to a single billing contact. Clean, simple, the AI implements it correctly. Every subsequent prompt that touches Orders — discount calculation, delivery estimation, invoice generation, fulfilment tracking, customer notifications — is written on that assumption. None of those prompts are wrong. They are all consistent with the terrain as it was understood at the time.

Eight months later, a different developer picks up a new requirement. Corporate customers need to split a single order across multiple departments, each with their own delivery address and cost centre. Prompt 235 asks for multi-address order support.

The AI implements it correctly. Locally it is reasonable. But it has just redefined what an Order is — from a thing that belongs to one address to a thing that can belong to many. The terrain underneath has shifted. Every prompt written between 75 and 235 that touched delivery address, invoice recipient, or customer identity was built on ground that no longer exists.

The developer writing prompt 235 does not know this. They were not there for prompt 75. Eight months is long enough for team composition to change, long enough for the original assumption to exist only in the memory of someone who may no longer be on the project. There is no artifact they could have consulted. The assumption was never written down. It was the water everyone was swimming in — until it wasn't.

So where do you look? The AI wrote both implementations correctly. The prompts were both reasonable. There was no mistake at the point of instruction. The contradiction exists in the space between the prompts — in the overall model of what an Order actually is, which was assumed but never defined.

And the cascade is not just these two prompts. It is every prompt in between. Reporting, discounting, fulfilment, notifications — all of it was written on the assumption of a single address. None of it is obviously broken. All of it is now wrong in ways that will only surface when a corporate customer places their first multi-department order.
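A minimal Java sketch of how that drift can look in code. Every name here (`Order`, `DeliveryNotifier`, `addDepartmentAddress`) is a hypothetical illustration, not code from a real platform. It assumes prompt 235 was implemented by adding a list of department addresses next to the original single address, which is one plausible local implementation among several.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout: a sketch of the drift, not real platform code.
final class Order {
    private final String customer;
    private final String deliveryAddress;  // prompt 75: exactly one address
    private final List<String> departmentAddresses = new ArrayList<>();  // prompt 235: added beside it

    Order(String customer, String deliveryAddress) {
        this.customer = customer;
        this.deliveryAddress = deliveryAddress;
    }

    String customer() { return customer; }

    String deliveryAddress() { return deliveryAddress; }

    void addDepartmentAddress(String address) { departmentAddresses.add(address); }

    List<String> departmentAddresses() { return List.copyOf(departmentAddresses); }
}

// Written somewhere between prompt 75 and prompt 235, on the single-address assumption.
final class DeliveryNotifier {
    static String notificationFor(Order order) {
        return "Your order ships to " + order.deliveryAddress();
    }
}

final class SilentDriftDemo {
    public static void main(String[] args) {
        Order order = new Order("Acme Corp", "Head Office");
        order.addDepartmentAddress("Warehouse");
        order.addDepartmentAddress("Lab");
        // Still compiles and still runs, but the two department addresses are never mentioned.
        System.out.println(DeliveryNotifier.notificationFor(order));
    }
}
```

Nothing in this sketch fails. The notifier keeps working exactly as it was written, which is why the problem stays invisible until a multi-department order arrives.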

Better prompting cannot fix this. You cannot write a prompt that corrects a contradiction you do not know exists. You cannot ask the AI to be consistent with a model that was never articulated. The problem is not the quality of the instructions. The problem is the absence of something the instructions could be consistent with.

Why Rebuilding Doesn't Work

The rebuild instinct is understandable. The application is a mess. The logic is scattered. Nobody knows where anything lives. Start over, do it right this time.

But doing it right this time requires understanding the domain correctly this time. And the domain was not understood correctly before — not because the team was incompetent, but because understanding a domain correctly requires implementing it, adjusting it, hitting the contradictions, resolving them with the people who own the domain, and implementing again. That process takes time. It cannot be replaced by more careful planning.

A rebuild without that process reconstructs the same misunderstandings into a cleaner codebase. The new system starts with higher accidental complexity — the lessons of the previous system encoded as defensive patterns — and the fundamental contradiction is still there, now buried deeper.

This is not a failure of AI. This is the predictable result of building without a map. The AI is doing exactly what it is told. The problem is that what it is told has no center — no single coherent explicit model of the domain that all instructions must be consistent with. Without that center, contradictions are not just possible. They are inevitable. And no amount of rebuilding or re-prompting creates that center retroactively.

The center has to come first.

What Fred Brooks Knew

The center has to come first. Fred Brooks identified why in his 1986 essay "No Silver Bullet", roughly forty years ago, and the industry has spent most of that time ignoring him.

Brooks distinguished between two kinds of complexity in software. Essential complexity is the complexity intrinsic to the problem itself — the business rules, the domain constraints, the lifecycle of an Order, the eligibility rules for a customer. It cannot be removed. It does not care what tools you use or what architecture you choose. The business is as complex as it is, and that complexity must be represented somewhere.

Accidental complexity is everything else. The frameworks, the indirections, the patterns applied without cause, the services that exist because nobody decided where the behaviour actually belonged. Accidental complexity is not intrinsic to the problem. It was introduced by the approach. And unlike essential complexity, it can be reduced — or avoided entirely.

The distinction matters because it defines what is permanent and what is replaceable. The essential complexity of an application — correctly modelled — should outlast every framework it ever runs on, every infrastructure decision ever made about it, every team that ever works on it. It is the permanent part. Everything around it is the replaceable part.

The problem the industry keeps having — with frameworks, with outsourcing, with AI — is that accidental complexity accumulates invisibly while essential complexity remains unmapped. The scaffolding grows. The domain shrinks. You end up with systems that are enormously complicated but that nobody truly understands, because the complication is in the support structure, not in the problem the system was built to solve.

Rivers and Terrain

Requirements describe motion. A user does something, something happens, something else is notified. User stories are motion. Process diagrams are motion. Even event-driven architecture — at its conceptual heart — is motion wearing a technical hat. The entire tradition of software specification is built around describing flows.

Flows are rivers. And rivers follow terrain.

The river is not the landscape. It is what happens when water finds the landscape and takes the path of least resistance. Change the landscape and the river moves. The river is a consequence, not a cause. Model only the river and you have captured something real — but something that will change every time the underlying landscape shifts.

Terrain is what things are. A watershed. A valley. A ridge that separates two drainage systems. These don't change when the season changes or when a new road gets built nearby. They predate the rivers and they will outlast them.

In software, the terrain is the domain. What an Order actually is. What it means for a customer to be eligible. What obligations a contract creates and what events discharge them. These things don't change because a new payment provider came along or because the fulfilment process got reorganised. The terrain outlasts the rivers by years — often by decades.

Prompt 75 was a river. Prompt 235 was a river. Both made sense as rivers. They contradicted each other because there was no terrain underneath them — no shared model of what an Order actually is that both rivers had to flow through. Without the terrain, each river gets its own private geography. Eventually they meet and the water goes somewhere it was never supposed to go.

The missing center is the terrain. The fix is to build the map before you build the rivers.

The Domain Expert's River

The natural response is: talk to the domain experts. Capture the requirements thoroughly. Understand the business before building. Let them define the terrain.

This is right in intent and consistently wrong in execution — for a reason that matters enormously.

Domain experts know their domain the way someone knows a city they grew up in. They can navigate it perfectly without being able to draw the map. They know what they do. They know how they do it. They have decades of accumulated practice and judgment. But they know it as motion — as rivers — because motion is how work presents itself. Nobody experiences their job as terrain. They experience it as things they do.

There is a deeper problem. The domain expert's current implementation is already shaped by their tools. The spreadsheet that manages the process, the manual step that exists because the old system could not handle the edge case, the workaround that became standard practice so long ago that nobody remembers it was a workaround — these are all rivers. Rivers shaped by the banks that the tools imposed.

When a business moves from spreadsheets to an application, the naive approach is to reproduce the spreadsheet process in code. The rivers are clearly visible, the domain expert can describe them precisely, the implementation matches. It works. And the technical limitations of the spreadsheet have been permanently encoded into software that has no such limitations.

The constraint that created the workaround is gone. The workaround remains. Now it is load-bearing.

The right conversation with a domain expert is not "how do you do this." It is "why does this need to happen." Not the process — the obligation. Not the river — the terrain feature the river is flowing around.

That question is uncomfortable. It implies the current process might be unnecessary, or suboptimal, or a historical accident. Domain experts have professional identity invested in how they work. The why question asks them to step outside that identity and examine the ground beneath it. Many have never been asked to do that. Some discover, when asked, that the why is murkier than they expected — that two people on the same team have different answers, that the original reason for a rule was forgotten decades ago, that what seemed like policy is actually habit.

The developer who can ask why — and persist through the discomfort until the terrain becomes visible — is doing the hardest and most valuable work in software development. It is not a technical skill. It is closer to archaeology.

The Contextual Center

When the terrain is mapped — when the domain is understood at the level of what things are rather than what they do — it becomes possible to build a contextual center.

The contextual center is the domain model. Not a database schema. Not a service layer. Not a collection of DTOs. The living, honest encoding of what the domain actually is — its entities, their invariants, their obligations, their lifecycles — expressed in code that a domain expert could read and recognise.

When an Order knows what it means to be cancelled — not as a service method called from somewhere, but as behaviour that belongs to Order because cancellation is something that happens to Orders — the contextual center is doing its job. The logic is findable. It is in one place. A new developer can locate it. A domain expert can verify it. A compliance requirement can be checked against it.
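As a rough sketch of what that can look like in Java: the lifecycle states and the rule "a shipped order cannot be cancelled" are assumptions made up for illustration, not rules taken from a real domain. The point is only where the rule lives.

```java
// Hypothetical lifecycle and rule, for illustration only.
final class Order {
    enum Status { PLACED, SHIPPED, CANCELLED }

    private Status status = Status.PLACED;

    Status status() { return status; }

    void ship() {
        if (status != Status.PLACED) {
            throw new IllegalStateException("Only a placed order can ship, but status was " + status);
        }
        status = Status.SHIPPED;
    }

    // The cancellation rule lives here, on Order, and nowhere else.
    void cancel() {
        if (status == Status.SHIPPED) {
            throw new IllegalStateException("A shipped order cannot be cancelled");
        }
        status = Status.CANCELLED;
    }
}

final class OrderCancellationDemo {
    public static void main(String[] args) {
        Order order = new Order();
        order.cancel();
        System.out.println(order.status());
    }
}
```

Anyone who wants to know when an Order may be cancelled has one method to read, and any caller that tries to break the rule is stopped by the Order itself rather than by whichever service happened to remember the check.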

And contradictions become immediately visible. If prompt 235 contradicts prompt 75, the contradiction surfaces the moment you try to encode both in the same place. The Order cannot simultaneously honour two incompatible rules about what it is. The terrain model forces the question that the river implementations never asked.

This is the fix for the contradicting prompt problem. Not better AI. Not more careful prompting. Not an agent that scans for logical inconsistencies. A contextual center that makes contradictions structurally impossible to hide.
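One way the forced question might be resolved, sketched in Java under stated assumptions: the team and the domain experts decide that an Order consists of one or more shipments, each with its own address and cost centre, and that a single-address order is simply the one-shipment case. `Shipment`, `costCentre`, and `deliveryAddresses()` are hypothetical names chosen for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical resolved model: "one address or many?" is answered in one place.
final class Shipment {
    private final String deliveryAddress;
    private final String costCentre;

    Shipment(String deliveryAddress, String costCentre) {
        if (deliveryAddress == null || deliveryAddress.isEmpty()) {
            throw new IllegalArgumentException("A shipment needs a delivery address");
        }
        this.deliveryAddress = deliveryAddress;
        this.costCentre = costCentre;
    }

    String deliveryAddress() { return deliveryAddress; }

    String costCentre() { return costCentre; }
}

final class Order {
    private final String customer;
    private final List<Shipment> shipments;

    Order(String customer, List<Shipment> shipments) {
        if (shipments.isEmpty()) {
            throw new IllegalArgumentException("An order needs at least one shipment");
        }
        this.customer = customer;
        this.shipments = List.copyOf(shipments);
    }

    String customer() { return customer; }

    // There is deliberately no single deliveryAddress() accessor in this model.
    List<String> deliveryAddresses() {
        List<String> addresses = new ArrayList<>();
        for (Shipment shipment : shipments) {
            addresses.add(shipment.deliveryAddress());
        }
        return addresses;
    }
}

final class ResolvedOrderDemo {
    public static void main(String[] args) {
        Order split = new Order("Acme Corp",
                List.of(new Shipment("Head Office", "CC-100"), new Shipment("Lab", "CC-200")));
        System.out.println(split.deliveryAddresses());
    }
}
```

The design choice that matters is the missing accessor. Any code that previously called a single-address method such as `order.deliveryAddress()` would stop compiling against this model, so every river built on the old assumption has to be revisited instead of quietly continuing to run.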

The contextual center also provides the simplicity test. If the domain model is honest — if it correctly reflects the terrain — then implementing a new river should be simple. The new requirement finds its place in something that already exists, or reveals through the friction of not fitting that the model needs to grow. Either outcome deepens understanding. Either outcome improves the system.

If the implementation is getting complicated, the terrain is wrong. The complexity is not a problem to be solved with more framework or more abstraction. It is a signal. The domain is pushing back. Something in the model does not match something in reality, and the code is showing you where.

Complexity is the symptom. Simplicity is the proof.

The Scale Problem

Here is where the industry is currently making its most expensive mistake.

AI works. On small applications, on prototypes, on systems with a limited number of domain objects and a shallow set of business rules, AI-assisted development is genuinely fast and the results are genuinely clean. A developer can build a working application in two days that would have taken two weeks before. That is real. It is not marketing.

The problem is that this success is being treated as proof that the approach scales.

It does not. And the reason it does not is precisely the terrain problem.

On a sufficiently small system, a skilled developer can hold the entire terrain in their head informally. No explicit model is needed because the model exists as intuition. The contradictions surface quickly because the whole system is visible at once. The developer notices when prompt 235 conflicts with prompt 75 because they remember prompt 75. The cognitive map is small enough to carry.

Past the point where that informal map breaks down, everything changes. The developer can no longer hold all of it. The contradictions stop surfacing naturally and start accumulating silently. Each new feature lands in a system that is slightly less understood than it was before. The AI keeps implementing faithfully. The terrain keeps drifting from the model nobody wrote down.

This is the same reason waterfall worked on small projects and failed on large ones. Small projects could be designed upfront because the designer could hold the full domain in their head. Large projects could not because the domain was too complex to fully understand before implementation began. The implementation friction — the discovery process — was not optional on large systems. It was the mechanism by which the design became correct.

The scale threshold is also closer than most teams expect — and AI makes it arrive faster. A real business domain hits the limits of informal terrain mapping sooner than it appears, and AI compresses that timeline further. What took months of traditional development now takes weeks of AI-assisted development. The cognitive collapse happens before anyone realises they are out of their depth. The prototype that took two days felt manageable. The enterprise system that grew from it in two months does not.

A prototype that works is not proof that the architecture scales. It is proof that the architecture works at prototype scale. These are different things, and confusing them is one of the most consistent and expensive mistakes in software development.

Why The Feedback Loop Cannot Be Outsourced

If the terrain needs to be mapped, and domain experts know the terrain, why not map it thoroughly upfront and then implement? Design the domain model first, hand it to AI, let AI build the rivers.

This is waterfall. And the industry already learned — expensively — why it does not work on complex domains.

Waterfall failed not because the process was badly designed. It failed because its founding assumption was wrong. You cannot fully know a complex domain before you implement it. The implementation is part of how you come to know it.

Code is the only medium that does not permit vagueness. A conversation can agree on a concept while each participant imagines something different. A document can describe a process while leaving its edge cases undefined. Code cannot. When you try to implement something ambiguous, the ambiguity surfaces. The implementation forces the question. That forcing is not a bug in the process. It is the mechanism by which the terrain gets mapped.

Agile's real insight — the one that got buried under standups and story points and velocity metrics — was never about delivery speed. It was about shortening the feedback loop between building and learning. The two-week sprint is not valuable because it ships faster. It is valuable because it forces a confrontation with reality every two weeks. Assumptions get tested. Misunderstandings surface. The terrain model gets corrected before it drifts too far from the domain.

Agile slowed down to learn faster. Each sprint is a correction cycle. The terrain is never assumed to be known — it is continuously refined through the friction of implementation.

Now "AI makes waterfall possible again" is being said as though it is a good thing. As though the problem with waterfall was implementation speed. It was not. The problem was the learning gap — the distance between assumption and correction. AI does not close that gap. It widens it. You design upfront, AI implements the full design in days, and the contradictions are baked in at scale before a single domain expert has seen the system running.

The implementation friction is not waste. It is the curriculum. Remove it and you have output without comprehension. Rivers without terrain. Working software that nobody truly understands, built at a speed that makes the misunderstanding very expensive to correct.

The Outsourcing Lesson

This specific mistake — removing the implementation friction in pursuit of cheaper, faster output — has been made before. Recently enough that people who lived through it are still working.

In the first outsourcing boom, the promise was cheaper implementation. Move the development work to lower-cost locations. The rivers would still get built. The application would still ship. Why pay more for the same output?

It worked — in the same way that building rivers without terrain works. The applications shipped. The initial costs were lower. And then the invisible invoice arrived.

Because the friction disappeared. The developer working from a specification document in a different building, in a different time zone, had no access to the terrain discovery process. They implemented what was written. What was written was a river. The why never made the journey, not because anyone was careless, but because the why was not in the document. It was in the conversation, in the hallway, in the moment a developer overhears a domain expert explaining something to a colleague and realises the mental model in the code is wrong.

The industry learned, expensively, that proximity was not a preference. It was the mechanism. The daily friction of shared space and shared context, of being present when the domain expert says something offhand that rewrites your understanding of the terrain, cannot happen asynchronously. It cannot be documented. It cannot be specified in a ticket.

The correction was to bring development back. Not for cultural reasons. Not for communication style. To keep the learning loop intact.

The lesson was learned. Then it was forgotten. Because it was never written down as a principle. It was attributed to communication problems, to cultural differences, to time zone friction. The real cause — that implementation is a learning process and learning cannot be outsourced — was never stated clearly enough to survive as institutional knowledge.

"Get onboard with AI or get left behind" is the same sentence as "outsource or get left behind." Same promise. Same mechanism. Same blind spot. Same invoice, on its way.

Unfalsifiability, Again

Why does this keep happening?

Because working software is unfalsifiable as a measure of quality. The application that shipped, built with rivers and no terrain, always beats the hypothetical application built with a domain model first. The delivered system always beats the unbuilt better one. There is no comparison. The invisible invoice has no line items. The cost shows up as "enterprise complexity", as "technical debt", as "that is just how large systems work", and it is never traced back to the decision to build rivers without mapping the terrain.

This is how the outsourcing lesson got forgotten. The costs arrived years after the decisions. By then the teams had changed. The attribution was impossible.

This is how frameworks became permanent. Spring, CQRS, microservices, event-driven architecture — each one took a real problem and encoded a solution into a methodology. Each introduced accidental complexity that was invisible against the essential complexity it was supposed to manage. Each generated costs that arrived too late and too diffusely to be attributed. Each got adopted more widely because it was working — at the moment of evaluation, the only moment that counted. The pattern became the answer. The practice it was meant to serve got lost inside it.

Domain-Driven Design followed the same path. Its early emphasis on shared language and rich domain models, the genuinely useful insight, gradually became overshadowed by discussions about bounded contexts, repositories, service layers, and event-driven decomposition. The vocabulary survived. The underlying purpose largely did not. Teams learned to say "domain model" while building something that looked like a domain model from the outside and functioned as a collection of data structures with behaviour scattered across service classes. The industry did to DDD what it does to everything else: turned a way of understanding reality into a collection of implementation patterns.
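A small Java sketch of the shape described above, with all names invented for illustration (`OrderData`, `CustomerPortalService`, `FulfilmentService`). It assumes two services that each carry their own idea of when an order may be cancelled.

```java
// Hypothetical anemic model: the "Order" holds data, the rules live elsewhere.
final class OrderData {
    String status = "PLACED";  // no invariants, any caller may write anything
}

final class CustomerPortalService {
    // The rule as this service understands it: cancellation is always allowed.
    static boolean cancel(OrderData order) {
        order.status = "CANCELLED";
        return true;
    }
}

final class FulfilmentService {
    // The rule as this service understands it: shipped orders cannot be cancelled.
    static boolean cancel(OrderData order) {
        if (order.status.equals("SHIPPED")) {
            return false;
        }
        order.status = "CANCELLED";
        return true;
    }
}

final class ScatteredRulesDemo {
    public static void main(String[] args) {
        OrderData shipped = new OrderData();
        shipped.status = "SHIPPED";
        // Same order, same question, two different answers.
        System.out.println("Fulfilment allows cancel: " + FulfilmentService.cancel(shipped));
        System.out.println("Portal allows cancel: " + CustomerPortalService.cancel(shipped));
    }
}
```

Each service looks reasonable when read alone. The contradiction only exists between them, and nothing in the data structure can object to either one.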

And this is how AI will follow the same path. The small application works. The prototype is clean. The approach is validated — at the scale where informal terrain maps are sufficient, at the scale where the developer can hold it all in their head. The success is real. And it proves nothing about what happens at the scale where it matters.

Unfalsifiability will do the rest.

The Career Ceiling Nobody Discusses

Junior developers learn rivers. That is where everyone starts, and it is the right place to start. Rivers are visible, implementable, testable. You can see when they work.

Mid-level developers begin to notice that rivers have shapes: some implementations feel natural and others feel like fighting the problem. This is the first intimation of terrain. The friction is trying to teach something.

Senior developers think in terrain first. They talk to domain experts and hear why rather than how. They implement rivers to test terrain hypotheses and adjust when the implementation pushes back. They read complexity as a diagnostic signal rather than a problem to be solved with more pattern.

The step from mid-level to senior is the step from river-thinking to terrain-thinking. And it is a step that frameworks and patterns have systematically prevented, not because the developers using them lack capability, but because the tools never forced the question. The framework absorbed the friction that would have taught it. The accidental complexity had somewhere to hide. The essential complexity stayed unmapped. The developer got faster at applying patterns, not better at questioning them. The work never demanded more, so more was never developed.

This is not an indictment. It is a description of a system that produced exactly what it was designed to produce. The market said learn the framework, get the job. The framework said here is the structure, fill it in. The application shipped. Unfalsifiability validated everything. The question of whether there was terrain underneath never arose because it never had to.

A significant proportion of working developers entered the field through routes — bootcamps, self-teaching, career changes — that are entirely oriented around framework fluency because that is what gets you hired quickly. That is a rational response to market incentives, not a character flaw. But it means the dominant population of working developers has been optimised for exactly the skill AI is now making unnecessary.

AI does not eliminate these developers. It transforms them into AI operators. The framework templates get replaced by prompts. The pattern application gets replaced by merge request reviews. The output looks similar. The speed increases. And the bar lowers further, because prompting requires even less structural understanding than filling in a framework template did.

What does not change is the invoice. The AI operator builds the same rivers faster, accumulates the same terrain debt faster, and hits the same ceiling faster. The application is cheaper to start and more expensive to maintain — the same curve as always, now compressed. And unfalsifiability protects the transition just as it protected everything before it. The framework developer becomes the AI operator and nothing looks different until the cascade arrives.

How AI Should Actually Be Used

For small applications, AI as primary implementor is fine, for the reasons given in the section on scale: the terrain is shallow enough to hold informally, the contradictions surface quickly, the cognitive map fits in one head. At that scale AI creates no problem that needs solving.

The problem starts when the application grows, or when the development team grows. Past the point where informal terrain maps break down, AI as primary implementor becomes the mechanism by which contradictions accumulate invisibly. Not because AI is the wrong tool — because the approach that worked at small scale does not transfer. Something has to change.

What changes is how AI is used.

AI is a pattern matcher with a vast, structured lexicon and, crucially, with understanding of what that lexicon contains. It has processed an enormous share of what has been written about software, technology, architecture, and domains. That is not nothing. That is a remarkable instrument, if you use it for what it actually is.

What it cannot do is discover terrain. A domain expert's specific business, with its specific history and specific constraints and specific why — that terrain has never been written down anywhere AI was trained on. It exists in conversation, in friction, in implementation. AI has no access to it. The developer is the only instrument that can pick it up.

Which means AI and the developer are genuinely complementary. AI works on the known. The developer works on the specific. They operate on completely different material.

As a discussion partner AI is genuinely useful — thinking out loud, testing an argument, asking what happens if a particular assumption is wrong. Not as a modeller, not as a designer. The conversation is the value. The understanding stays with the developer.

As a technology consultant it earns its place completely. How does this technology work? What are the tradeoffs? How is this done in Java? These are questions AI answers well precisely because they are pattern questions — answered from a lexicon of everything written on the subject. The developer takes that knowledge and decides what it means for the domain model. That decision is never delegated.

The code is written by the developer. Always. Because the act of writing it is the act of learning. The friction of making something work is how the terrain model gets validated. Outsource that friction and you outsource the understanding.

Used this way, AI does not prevent learning. It removes the noise that would otherwise slow it down. The technology questions that used to cost an afternoon now cost ten minutes. Those minutes go back into the terrain work. The friction that was just overhead is gone. The friction that actually teaches something is preserved. The learning does not stop — it accelerates.

Two Approaches, Two Invoices

AI does not level the playing field between the terrain approach and the river approach. It widens the gap between them.

The AI operator — prompting rivers into existence without a contextual center — builds faster than a framework developer ever could. The initial output is impressive. The application ships quickly. But the terrain debt accumulates at the same rate as always, now compressed into a shorter timeline. The contradictions arrive sooner. The cascade of invalidated assumptions hits harder. The ceiling is the same ceiling. The invoice is the same invoice. It just arrives faster, with more confidence on the way there.

The terrain mapper uses AI differently. Not as a primary implementor but as a mirror, a feedback loop, and a technology consultant. The discovery process still happens. The domain expert conversations still happen. The why questions still get asked. The contextual center still gets built. But the iteration cycles are faster, the edge case surfacing is faster, the technology decisions are faster. AI compresses the learning without bypassing it.

This means the cost curve that was already cheaper in the long run gets cheaper in the short run too. The terrain mapper moves faster than before without accumulating the debt that was previously the price of moving fast.

From the outside, at month two, the two approaches look identical. Both are shipping quickly. Both are producing working software. Unfalsifiability does its work. Nobody sees the difference until the contradictions start surfacing — by which point the AI operator is already describing it as enterprise complexity and looking for a pattern to absorb it.

The industry is measuring AI's value in speed. Speed is real. But speed applied to the wrong approach does not reduce cost. It compresses the timeline to the invoice. The question was never how fast you can build rivers. It was always whether the terrain underneath them is honest.

AI makes the right approach faster. It makes the wrong approach faster too. The difference is what you are left with when the speed runs out.

The Solution Is Thirty Years Old

There is no new methodology needed here. The problem is real and urgent and the answer has been available for decades — practised long before it acquired a name, and largely buried since it did.

Build a domain model. Not a framework-prescribed structure, not a pattern applied because the textbook recommends it — an honest, simple encoding of what the domain actually is. Make it the contextual center of the application. Keep it simple enough that a domain expert can read it and recognise it. Keep it simple enough that complexity registers as a signal when it appears.

Talk to domain experts about why, not how. Push through the river they offer you to the terrain underneath. Distinguish what the business requires from what the spreadsheet required. Implement rivers one at a time, learning the terrain as you go. Adjust the model as understanding deepens — because understanding will deepen, because it never stops deepening, and because that is the point.

Build the shared vocabulary between the development team and the domain experts so the words in the code mean the same thing as the words in the business. Not because naming is important for aesthetic reasons, but because shared language is how you know you are mapping the same terrain. When a developer and a domain expert use the same word and mean different things, the terrain model is wrong. The language makes that visible before the code does.

Accept that the first map is wrong. It will be. That is not a failure of the approach — it is the approach working. The map gets corrected through implementation. Each river teaches you something. Each correction makes the next river easier. The terrain model should get more true over time, not more obscure. That is the measure of whether the process is working.

The contradiction between prompt 75 and prompt 235 is the same contradiction that lived in the fat service class, in the three microservices with incompatible Order logic, in the spreadsheet workaround encoded into the application. Different tools, different eras, same missing center.

The center was always the answer. It still is.

Build the map before you build the river. The rivers will be faster for it, and they will still be running in fifteen years.

This article is a follow-up to The Invisible Invoice: The Cost of Building Software Without Understanding It.
