Cover image: Photo by Sunder Muthukumaran on Unsplash
Six patterns that determine where complexity lives in your codebase — and how to choose consciously
A senior engineer made our API faster by caching responses. Query time dropped 80%. We celebrated.
Two months later, the cache was stale. Data was wrong. Users complained. We spent weeks debugging cache invalidation.
The speed didn't come from nowhere. The complexity didn't disappear. We just moved it.
This pattern behaves like a conservation law from physics. Not perfectly, but close enough to be useful.
Why Complexity Relocates (Not Disappears)
In physics, certain quantities can't be created or destroyed. Only transformed or moved. Energy conservation says energy can't be created or destroyed, only converted (chemical to kinetic, kinetic to heat). Momentum conservation says total momentum stays constant in a closed system. Mass conservation says mass doesn't appear or disappear, just rearranges.
These aren't guidelines. They're laws. You can't violate them. You can only work within them.
Software has something similar: essential complexity (the inherent difficulty your problem requires) can only move, not disappear. Larry Tesler famously called it "Conservation of Complexity": complexity can't be eliminated, only moved. UX designers know Tesler's Law intimately. But while this principle is well-recognized in design circles, software architects rarely discuss it explicitly or apply it systematically.
I've noticed we treat "simplification" as if we're eliminating complexity rather than relocating it. We don't measure both sides of the trade. We don't name what's actually being relocated.
This isn't quite like physics conservation laws, where total energy stays exactly constant. Software complexity can increase or decrease. But there's a pattern, and a floor.
Every problem has essential complexity, what Fred Brooks called the inherent difficulty of what you're trying to solve. Authentication must verify identity. Distributed systems must coordinate. These requirements create complexity that can only relocate, or be eliminated by dropping features entirely. You can't design it away.
Then there's accidental complexity, from how we implement solutions. Poor abstractions, unnecessary indirection, tech debt. This can be eliminated through better design.
When net complexity increases (code drops 40%, config grows 60%, net +20%), you're seeing accidental complexity added during relocation. When complexity genuinely disappears (deleting 500 lines of dead code), you're removing accidental complexity that never contributed to solving the problem.
The pattern: essential complexity moves. Accidental complexity varies. And there's a floor: you can't simplify below essential complexity without losing functionality.
To be precise: when we say "complexity relocates," we mean essential complexity (the irreducible difficulty of your problem domain). You can't simplify a tax calculation system below the complexity of the tax code itself. You can only choose where that essential complexity lives in your architecture.
This explains why some systems resist simplification. You're not fighting bad design. You're hitting essential complexity. The question shifts: Where should this essential complexity live to minimize total cost?
When you "simplify" a system, you're not eliminating complexity. You're relocating it. When you make a decision configurable instead of hardcoded, you haven't reduced the number of decisions. You've moved where the decision happens. When you cache data, you haven't eliminated the work of keeping data fresh. You've transformed query complexity into cache invalidation complexity.
Understanding relocation patterns changes how you think about software design. You stop asking "how do I eliminate this complexity?" and start asking "where do I want this complexity to live?"
Six patterns emerge consistently. We'll call them relocation patterns that behave like conservation laws. Not physics-perfect, but strong enough to guide architectural decisions.
The Six Patterns
Pattern 1: Complexity Relocation
The caching story is a perfect example. Before caching, we had high query complexity: every request hit the database, queries were slow, load was high. Cache management complexity was zero because we didn't have a cache. After caching, query complexity dropped dramatically. Requests were fast, database load was low. But cache management complexity exploded. We now had staleness issues, invalidation logic, consistency problems, memory pressure.
Total complexity didn't decrease. We moved it from "slow queries" to "cache management." The system felt simpler in one dimension and more complex in another. The essential complexity of data consistency didn't disappear. It moved from query time to cache invalidation. But if your cache implementation is inefficient, you've added accidental complexity on top.
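To make the relocation concrete, here's a minimal sketch in Python. The TTL, the in-memory cache, and the stand-in database are assumptions for illustration, not our actual system: the read path gets simpler, and the invalidation on the write path is the complexity that arrived.

```python
import time

# Where the complexity moves: from query time to cache invalidation.
_db = {"u1": {"name": "Ada"}}            # pretend database
_cache: dict[str, tuple[float, dict]] = {}
TTL_SECONDS = 60

def fetch_user_from_db(user_id: str) -> dict:
    return dict(_db[user_id])            # the "slow" query we wanted to avoid

def get_user(user_id: str) -> dict:
    entry = _cache.get(user_id)
    if entry and time.monotonic() - entry[0] < TTL_SECONDS:
        return entry[1]                  # fast path: query complexity is gone here...
    user = fetch_user_from_db(user_id)
    _cache[user_id] = (time.monotonic(), user)
    return user

def update_user(user_id: str, fields: dict) -> None:
    _db[user_id].update(fields)
    # ...and this is where it went: miss this invalidation on any write path
    # and readers see stale data for up to TTL_SECONDS.
    _cache.pop(user_id, None)
```

The bug report we spent weeks on lived in that last line, and in every write path that forgot its equivalent.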
I've learned you can't eliminate complexity. You can only move it. The question isn't "how do I make this simpler?" The question is "where should this complexity live?"
Consider adding an abstraction layer. Before abstraction, you have high duplication complexity: the same database query logic appears in twenty places. But you have low abstraction complexity because there's no layer to understand. After creating an ORM, duplication complexity drops to near zero. Database logic lives in one place. But abstraction complexity rises. Now you need to understand the ORM, its query builder, its caching behavior, its transaction handling.
You didn't reduce total complexity. You traded duplication complexity for abstraction complexity. The essential complexity of database operations remains. You just centralized where it lives. Whether abstraction adds accidental complexity depends on design quality.
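Here's a rough sketch of that trade, using a hypothetical users table and a hand-rolled repository standing in for a real ORM: the duplication collapses into one place, and that one place becomes the thing everyone now has to understand.

```python
# Before: the same query logic repeated wherever it's needed.
# Simple locally, duplicated globally.
def get_active_admins(conn):
    return conn.execute(
        "SELECT * FROM users WHERE active = 1 AND role = 'admin'"
    ).fetchall()

# After: one place to look, but one more layer to understand.
class UserRepository:
    def __init__(self, conn):
        self.conn = conn

    def find(self, **filters):
        # Column names come from code, not user input, in this sketch.
        clauses = " AND ".join(f"{col} = ?" for col in filters)
        sql = f"SELECT * FROM users WHERE {clauses}"
        return self.conn.execute(sql, tuple(filters.values())).fetchall()

# repo.find(active=1, role="admin") replaces twenty hand-written queries,
# at the cost of everyone learning how find() builds its SQL.
```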
Whether that's a good trade? Depends on your context. For a system with many developers, centralizing complexity in an abstraction that a few people deeply understand might be better than distributing complexity across the codebase where everyone encounters it. For a tiny system with two developers, the abstraction might not be worth it: the duplication is manageable, the abstraction is overhead.
This is why "simplification" is such a loaded term. When someone says "let's simplify this," what they usually mean is "let's move complexity from where it bothers me to somewhere else." (Which, to be fair, is sometimes exactly what you want.) But recognize you're relocating complexity, not eliminating it.
Where can complexity go? You can push it to infrastructure: move complexity from application code to Kubernetes, but now you need to understand Kubernetes. You can push it to configuration: move complexity from code to config files, but now configuration management becomes complex. You can push it to runtime: use dynamic dispatch instead of explicit wiring, but behavior becomes harder to trace. You can push it to operations: microservices simplify individual services but operational complexity explodes.
The complexity goes somewhere. It doesn't vanish. Choose consciously where you want it to hurt least.
Pattern 2: Knowledge Relocation
Knowledge can't be reduced, only relocated. You can't reduce what needs to be known about a system. You can only change where that knowledge lives.
Take abstraction layers again: before adding an ORM, knowledge about database queries is distributed across every function that touches the database. After adding an ORM, that knowledge concentrates in the ORM layer. Total knowledge hasn't decreased. You still need to understand how queries work, how connections are managed, how errors are handled. You've just relocated the knowledge.
This creates a trade-off. Distributed knowledge means each piece is simple: local context is enough to understand what's happening. But finding patterns is hard because knowledge is scattered. Global understanding requires synthesizing information from many places.
Concentrated knowledge means finding answers is easy: look in the abstraction layer. But each piece is more complex: the ORM is harder to understand than any individual query was. Which distribution is better depends on your team, your system, your change patterns.
When a new developer asks where logic lives, I can say "check the ORM" instead of "check twenty controllers." Same knowledge needed, better location. But now that developer needs to understand the ORM's complexity.
I've seen teams struggle with this trade-off. A microservices architecture distributes knowledge across service boundaries. Each service is simpler to understand in isolation, but understanding cross-service workflows requires mental synthesis of multiple codebases. A monolith centralizes that knowledge. You can trace a request end-to-end in one codebase, but the concentration makes the monolith harder to navigate.
The knowledge exists either way. The question is: where does it hurt least? If you have autonomous teams, distributing knowledge across service boundaries might work. If you have frequent cross-cutting changes, centralizing knowledge in a monolith might be better. You're not reducing knowledge. You're choosing where developers encounter it.
Pattern 3: Decision Relocation
Decisions can't be eliminated. Every decision must be made somewhere. Moving where decisions happen doesn't reduce total decisions.
Consider configuration. You have a decision: "Which database connection string to use?" You can make it in code: if environment equals production, use this connection; otherwise use that one. Or you can make it in config: read from environment variable or config file. Same decision. Different location. Someone still decides what the database URL is. The decision moved from code to configuration. It didn't disappear.
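As a sketch (the DATABASE_URL variable and the connection strings are placeholders), here is the same decision in both homes:

```python
import os

# 1) The decision lives in code: changing it means changing code and deploying.
def connection_string_in_code(environment: str) -> str:
    if environment == "production":
        return "postgresql://prod-db.internal:5432/app"
    return "postgresql://localhost:5432/app_dev"

# 2) The decision lives in configuration: changing it means editing the
#    environment, plus validating that someone actually set it.
def connection_string_from_env() -> str:
    url = os.environ.get("DATABASE_URL")
    if url is None:
        raise RuntimeError("DATABASE_URL is not set")  # a failure mode the config now owns
    return url
```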
The choice of where to make decisions has consequences. Compile-time decisions mean fast runtime but slow development: changing behavior requires changing code. Runtime decisions mean slow runtime but fast iteration: change config and restart. Configuration-time decisions mean flexible behavior but configuration becomes complex: now you have configuration management, templating, validation. Convention-based decisions keep code simple because nothing is wired explicitly, but you must learn the conventions: "magic" behavior that's invisible until you know the pattern.
I've debugged systems where configuration grew so complex it became code by another name. YAML files with conditionals, includes, variable substitution. Essentially a programming language without the tooling. The decisions didn't decrease; they just moved to a less maintainable place.
The reverse is also true. Hard-coding decisions in code means every environment difference requires a code change. I've seen teams with many if-statements checking environment variables because they never moved decisions to configuration. Same total decisions, worse location.
Feature flags are the modern version of this trade-off. You move decisions from deploy time (merge to production) to runtime (toggle in a dashboard). This gives you safety and speed. You can deploy dark and enable gradually. But you pay in testing complexity: with N flags, you have 2^N possible system states. Three flags mean eight configurations to test. Ten flags mean 1,024. The decision didn't disappear. It multiplied.
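The combinatorics are easy to see in a few lines of Python; the flag names here are hypothetical.

```python
from itertools import product

# With N independent boolean flags, the system can be in 2**N states.
flags = ["new_checkout", "async_emails", "beta_search"]

states = list(product([False, True], repeat=len(flags)))
print(len(states))  # 8 combinations for 3 flags; 10 flags would give 1024

for state in states:
    config = dict(zip(flags, state))
    # Each combination is, in principle, a configuration someone should test.
```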
Pick where decisions happen based on who needs to change them and how often. If operators need to change behavior without deploying code, configuration makes sense. If developers need to understand decision logic during debugging, code makes sense. If the decision rarely changes, hard-coding might be fine. You're not reducing decisions. You're choosing who makes them and when.
Pattern 4: Failure Mode Transformation
Failure modes can't be eliminated. They can only be transformed. You can't eliminate how systems fail. You can only trade failure modes you understand for failure modes you don't.
Moving from synchronous to asynchronous is classic. Synchronous systems fail with timeouts, deadlocks, resource exhaustion when threads block. Asynchronous systems fail with message loss when queues drop messages, ordering issues when messages arrive out of sequence, partial failures when some operations complete and others don't. You traded known failures for different failures. Total failure surface area might even increase.
I've debugged async message loss that took days to track down. With sync systems, timeouts show up immediately in logs. I'm not saying one is better. I'm saying they fail differently, and you're choosing which failure mode you'd rather debug.
The same pattern appears everywhere. Move from monolith to microservices? You trade in-process call failures (immediate stack traces) for network call failures (distributed tracing, timeouts, partial failures). Move from SQL to NoSQL? You trade constraint violations (database enforces referential integrity) for data inconsistency (application must enforce integrity).
I've watched teams adopt new technologies expecting them to be "more reliable," then spend months learning their failure modes. The new system wasn't less reliable. It just failed differently. And the team's existing monitoring, debugging practices, and mental models were all tuned to the old failure modes.
This doesn't mean you shouldn't go async, or adopt microservices, or use NoSQL. It means recognize the trade-off. You're not eliminating failure modes: you're choosing which failure modes you'd rather handle. Maybe async failures are easier to handle in your context. Maybe you have better tools for debugging message loss than deadlocks. Maybe your team has experience with distributed systems failure modes. That's a valid trade. Just don't pretend the old failure modes disappeared: they transformed into new ones. And plan to invest in learning how the new system fails.
Pattern 5: Testing Burden Relocation
Testing burden can't be reduced, only relocated. You can't reduce what needs to be tested. You can only move where testing happens.
Type systems are the clearest example. Without static types, you need more runtime tests because type verification happens at runtime: tests must verify both types and logic. With static types, you need fewer runtime tests because type verification happens at compile time: tests verify logic only, types are checked by the compiler.
Testing effort didn't disappear. It moved from runtime tests to compile-time checks. The shift has trade-offs. Compile-time verification gives faster feedback: you know about type errors before running code. But it adds compilation overhead and can't test runtime-only behaviors like "does this API actually return the structure we expect?" Runtime testing gives slower feedback but tests actual system behavior. Same amount of verification work. Different timing.
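A small illustration, assuming a type checker such as mypy is in the toolchain: the typed version pushes the "is this a number?" question to compile time, so the runtime test only has to cover the discount logic itself.

```python
# Without static types: tests must guard against type misuse as well as logic errors.
def apply_discount_untyped(price, rate):
    return price * (1 - rate)

def test_apply_discount_untyped():
    assert apply_discount_untyped(100.0, 0.2) == 80.0
    # What should apply_discount_untyped("100", 0.2) do? We have to decide and test it.

# With static types: the same verification moves to the type checker.
def apply_discount_typed(price: float, rate: float) -> float:
    return price * (1 - rate)

# mypy flags apply_discount_typed("100", 0.2) before the code ever runs,
# so the remaining runtime test verifies only the arithmetic.
```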
The same pattern appears with integration vs. unit tests. Heavy integration testing means you verify actual system behavior but tests are slow and brittle. Heavy unit testing with mocks means tests are fast and isolated but you need integration tests anyway to verify the mocks match reality. The testing burden didn't change. You're choosing between "test real interactions slowly" and "test mock interactions quickly plus verify mocks match."
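Sketched with Python's unittest.mock and a hypothetical PricingService: the mocked test is fast, but it verifies the contract we wrote into the mock, not the contract the real repository honors.

```python
from unittest.mock import Mock

class PricingService:
    def __init__(self, repo):
        self.repo = repo

    def total(self, user_id: str) -> float:
        prices = self.repo.get_cart_prices(user_id)  # assumed to return a list of numbers
        return sum(prices)

def test_total_with_mock():
    repo = Mock()
    repo.get_cart_prices.return_value = [10.0, 5.0]  # the mock's contract, not reality's
    assert PricingService(repo).total("u1") == 15.0
    # If the real repository starts returning dicts instead of numbers, this test
    # still passes -- the burden moved to an integration test that talks to it for real.
```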
I've seen teams swing between extremes. All integration tests: comprehensive but painfully slow, so developers avoid running them. All unit tests with mocks: fast but brittle when mocks drift from reality, leading to "tests pass but production fails." The burden exists either way.
The question is: where do you want verification to happen? Early in development (static types, unit tests, compile-time checks) or late in deployment (runtime tests, integration tests, production monitoring)? Each approach has different feedback loops and different failure modes. You're not reducing testing. You're choosing when you discover problems and how much machinery you need to discover them.
Pattern 6: Assumption Visibility Trade-off
Assumptions can't be eliminated, only made explicit or implicit. You can't reduce assumptions. You can only change their visibility.
An implicit assumption looks like this: a function expects user.email to exist and be a string. The code just calls user.email.lower() and hopes. An explicit assumption documents it: add type hints, add null checks, add validation. Same assumption: user must have an email that's a string. Now it's visible instead of hidden.
Implicit assumptions are cheaper to write but expensive to debug. When they're violated, you get cryptic errors: AttributeError: 'NoneType' object has no attribute 'lower'. You have to trace back to figure out the assumption. Explicit assumptions are expensive to write but cheap to debug. When they're violated, you get clear errors: ValueError: User must have email. Total cost is conserved. You're choosing when to pay it: upfront with explicit checks, or later when debugging implicit assumptions.
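The same assumption, implicit and explicit, in a short sketch (the User shape is hypothetical):

```python
def normalize_email_implicit(user):
    # Assumes user.email exists and is a string; a violation surfaces later as
    # AttributeError: 'NoneType' object has no attribute 'lower'
    return user.email.lower()

def normalize_email_explicit(user) -> str:
    email = getattr(user, "email", None)
    if not isinstance(email, str) or not email:
        raise ValueError("User must have email")  # the assumption, now visible
    return email.lower()
```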
The same trade-off appears with API contracts. Implicit contracts mean less documentation, less validation code, faster development. But when clients violate expectations, you get runtime failures that are hard to diagnose. Explicit contracts mean more upfront work (OpenAPI specs, request validation, comprehensive error messages) but violations are caught immediately with clear feedback.
I've debugged production issues that took hours to diagnose because assumptions were buried deep in code. "Why does this fail for some users but not others?" Eventually you discover an implicit assumption: the code assumes users have an email, but imported users from legacy systems don't. The assumption existed either way. It just wasn't visible until it broke.
The question is: where do you want to pay the cost? Write explicit checks upfront (slower development, clearer debugging) or deal with implicit assumptions when they break (faster development, cryptic failures)? Neither reduces the total assumptions in your system. You're choosing whether to document them in code or discover them during debugging.
Why These Patterns Matter
Once I understood these relocation patterns, how I approached design changed completely. When someone proposes "simplifying" the system, the first question should be: "Where does the complexity go?" It doesn't disappear. It moves. The proposal might still be good: maybe the new location is better. But recognize it's a trade, not an elimination.
This doesn't mean simplification is impossible. You can absolutely reduce total complexity:
Delete dead code: If code contributes nothing to requirements (truly dead), removing it eliminates complexity. No relocation.
Use better abstractions: Replace 50 lines of manual logic with a one-line library call. The library maintains that complexity, but amortized across thousands of users, your system's complexity drops.
Remove accidental complexity: Decouple unnecessarily entangled components. Clean up tech debt. Simplify overly complex solutions.
The key: These eliminate accidental complexity. Essential complexity (what the problem inherently requires) is what relocates, not eliminates.
Common examples: "Let's use microservices to simplify development." Where does complexity go? From code organization to service coordination. You trade monolith complexity for distributed system complexity. "Let's add caching to speed things up." Where does complexity go? From query performance to cache management. You trade slow queries for invalidation logic. "Let's make the API more flexible." Where does complexity go? From API code to API consumers. You trade server complexity for client complexity.
These might all be good decisions. But they're trades, not improvements in absolute terms. Microservices might be the right trade if you have the team size and tooling to handle distributed systems. Caching might be right if query performance is your bottleneck and you can handle invalidation. Flexible APIs might be right if you have sophisticated clients and want to iterate server-side less often.
The key is naming what's being relocated and choosing where you want it to live. Before changing anything, identify the relocating quantity: Is this complexity? Where will it move? Is this knowledge? Where will it concentrate? Is this a decision? Where will it happen instead?
How to Work With These Patterns
Where should complexity live? Where will it hurt least?
Example: API design. You can have a complex API with simple client code, or a simple API with complex client code. Neither eliminates complexity: they distribute it differently. Complex API means server handles edge cases, versioning, validation. Clients just call simple methods. Simple API means server provides primitive operations. Clients compose them to handle edge cases.
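A sketch of the second option, with a hypothetical client that exposes only a primitive list_orders call: pagination, filtering, and aggregation all land in client code.

```python
# Complex API, simple client: one call, the server owns the edge cases.
#   report = client.get_report(team="billing", period="last_quarter")

# Simple API, complex client: the server exposes primitives, the client composes them.
def quarterly_summary(client, team: str) -> dict:
    orders = []
    page = 1
    while True:                                      # the client now owns pagination...
        batch = client.list_orders(team=team, page=page)
        if not batch:
            break
        orders.extend(batch)
        page += 1
    recent = [o for o in orders if o["age_days"] <= 90]          # ...and filtering...
    return {"team": team, "total": sum(o["amount"] for o in recent)}  # ...and aggregation
```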
I've worked with APIs that do everything (clients love it, server team drowns) and APIs that provide primitives (clients write boilerplate but have control). Same complexity, different distribution.
The complexity is conserved. Where should it live? If you have many clients, push complexity to the API: pay the cost once, save it N times. If you have few clients and a rapidly changing server, simple API with complex client code might work fine.
Choose your trades consciously. You can't eliminate conserved quantities. But you can choose better locations. Moving complexity from the hot path to the cold path is usually good: cache invalidation runs less often than queries. Moving complexity from novices to experts is often good: let experienced developers handle the abstraction so junior developers use a simpler interface. Moving complexity from many places to one place is often good: centralize knowledge even if that one place becomes more complex.
But measure both sides. When you move complexity, measure both the source and destination. Code complexity decreased 40%, configuration complexity increased 60%, net result is +20% total complexity. If you only measure one side, you'll think you eliminated complexity. You didn't: you relocated it, and it grew. Measure what you gained and what you paid.
Accept that some things don't simplify. If you keep trying to simplify something and complexity keeps showing up elsewhere, maybe the system has inherent complexity. Some problems are just complex. No architectural cleverness eliminates their complexity. You can only distribute it more or less well. Recognizing irreducible complexity lets you stop fighting it and start managing it.
What Actually Lasts
But step back from the code for a moment. If everything eventually gets rewritten or deleted, what's the point of these choices?
The answer: some things outlast the code. Patterns last. Design patterns outlive implementations. Separation of concerns, dependency injection, event-driven architecture: these patterns transfer across rewrites. The specific code gets replaced but the patterns persist. When you're choosing where complexity lives, you're really choosing patterns. Those patterns will outlast the code.
Understanding lasts. Understanding the domain outlives the code. How the business works, what users need, why systems interact: this knowledge compounds over time. The code gets rewritten but understanding remains. When you're deciding where knowledge should live, invest in shared understanding. Documentation rots but team knowledge grows.
Tests as specification last. Tests document expected behavior. They outlive implementations. When you rewrite, tests preserve requirements while code changes. The investment in test quality pays off when refactoring or replacing code. Tests preserve intent: what should this system do?
Team culture lasts. How your team writes, reviews, and maintains code outlasts any particular codebase. Quality standards, review practices, testing discipline: these transfer to the next system. When you're working with these relocation patterns, you're building patterns of thinking that persist beyond the current code. Invest in culture. It compounds.
The liberation comes from seeing these patterns. Once you understand that complexity relocates rather than disappears, you stop looking for solutions that eliminate it. You look for solutions that put complexity where it belongs. You measure both sides of the trade. You name what's being relocated and choose where it lives. And you invest in what actually lasts: patterns, understanding, and culture. While accepting that code is temporary.
These relocation patterns aren't limitations. They're reality. You can't violate them. But you can work with them. And working with them is better than pretending they don't exist.
Note: Originally published on ITNEXT: https://itnext.io/complexity-cant-be-eliminated-it-can-only-be-moved-d122f7952715