A recent LinkedIn post from John Crickett has been making the rounds. It offers three questions to ask before adding another microservice:
- Does this need to be a service?
- Does this need to be its own service?
- Does this need to be its own service right now?
Crickett credited Craig Ferguson, the comedian, for the format — Ferguson has a bit about three questions a man should ask himself before he speaks. The post is in that register: deliberately light, deliberately compressed, the kind of aphorism a practitioner can carry in their head into a Monday-morning design meeting. Crickett's questions have spread because they work. They surface the right conversation in teams that might otherwise never have it.
What I want to do is unpack what the aphorism is compressing. My claim isn't that Crickett's questions are wrong or incomplete — it's that they're pointing very precisely at something a formal theory of modularization can name. And the something they're pointing at is more unified than the three-part structure suggests: the three questions turn out to be three natural-language phrasings of one structural question. Seeing why is, I think, the most interesting thing you can do with the post.
Let me walk through it.
The criterion: same drivers together, different drivers apart
I'll use the Independent Variation Principle (IVP) as the lens. You don't need to have read the formalization to follow this article. The idea in plain words is:
Every element in a system — every function, every class, every module — has a set of change drivers. A change driver is anything that, when it changes, forces that element to change. Requirements, domain rules, regulations, performance targets, deployment constraints — any cause of future modification.
The principle says: group together elements that share the exact same set of change drivers, and separate elements whose sets of change drivers differ.
That's it. Once the driver sets are fixed, the comparison is binary: two elements either have the same set of change drivers or they don't. No gradient, no threshold, no "similar enough." Same set → together. Different set → apart.
That binary comparison is what distinguishes this principle from SRP's "one reason to change" or CCP's "things that change together." Both of those formulations leave the comparison itself underspecified — "reason" and "change together" admit interpretations a team can argue about indefinitely. IVP shifts the imprecision: the comparison step becomes mechanical, but the driver-discovery step — figuring out what's actually in the driver set for a real element — still requires deep engineering judgment. The principle doesn't make the hard work disappear. It moves it from "how do we compare these modules?" to "what are the actual causes of change for these elements?", which is the question domain expertise can answer.
We'll come back to the discovery step. The point for now is that once the inputs are fixed, the answer follows. That's a stronger guarantee than the classical principles offer, even if it isn't the magic-wand guarantee absolute decidability would suggest.
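To make the "mechanical comparison" claim concrete, here's a minimal sketch in Python. The element names and driver labels are hypothetical, and the hard part — deciding what actually belongs in each driver set — is assumed already done:

```python
from collections import defaultdict

def partition_by_drivers(driver_sets: dict[str, frozenset[str]]) -> list[set[str]]:
    """Group elements whose change-driver sets are exactly equal.

    The comparison is binary set equality: no similarity score,
    no threshold, no "close enough".
    """
    cells = defaultdict(set)
    for element, drivers in driver_sets.items():
        cells[drivers].add(element)
    return list(cells.values())

# Hypothetical driver assignments -- the discovery step is assumed done.
driver_sets = {
    "tax_calculator": frozenset({"tax-code"}),
    "tax_report":     frozenset({"tax-code"}),
    "pricing_engine": frozenset({"pricing-model"}),
}
print(partition_by_drivers(driver_sets))
# two cells: {tax_calculator, tax_report} and {pricing_engine}
```

Note that `tax_calculator` and `tax_report` land together only because their sets are identical; had `tax_report` also carried, say, a reporting-regulation driver, it would get its own cell.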
Question 2 first, because it's the cleanest
Let me take the questions out of order and start with Q2: "Does this need to be its own service?"
Read literally, Q2 is a yes-or-no question. "Its own" means separate from existing services — a binary property. And that maps directly onto the principle's criterion:
Is there an existing service whose set of change drivers equals the set of change drivers for this new capability?
If yes: the new capability belongs inside that existing service from a structural standpoint. Putting them in separate services would mean two services that must change in lockstep whenever any of their shared drivers change — which is the structural shape of a distributed monolith. Other constraints (security boundaries, regulatory isolation, organizational ownership) may still justify keeping them apart, but those would be explicit trade-offs against the structural ideal, not refutations of it.
If no: the new capability belongs in a separate module from a structural standpoint. Keeping it inside an existing service would mean that service now has elements changing for different reasons, tangling concerns that should be independent.
Q2, read structurally, is asking exactly this. It doesn't say "change drivers," but the question it poses is the one the principle formalizes. What it lacks is a method for answering — it tells you to ask the question but doesn't tell you how to compute the answer. We'll come back to that.
Question 3: the temporal illusion
Now Q3: "Does this need to be its own service right now?"
This looks like it adds a new dimension — timing — to Q2. On closer inspection, it doesn't. It adds nothing the principle can see.
Here's why. The principle evaluates a module structure against the current set of change drivers. It has no temporal dimension. It doesn't care about drivers you imagine will exist in the future, or drivers you had last year, or drivers you might have if the product succeeds. It cares about the drivers that actually cause modification now, in the system as it exists.
So "right now" is either redundant or incoherent:
- Redundant if the asker means "given the drivers we actually have today." That's what the principle already assumes. Q3 reduces to Q2.
- Incoherent if the asker means "given drivers we imagine might emerge later." The principle doesn't accept imagined future drivers as inputs. If you don't have evidence that a driver exists and applies to these elements, it isn't in the set, and speculating about it doesn't put it there.
There's a third reading: "should we do this decomposition work in this sprint versus later." That's a project scheduling question, and it has nothing to do with modularization theory. You might defer the work because you're busy, because the team is tired, because another priority dominates — those are real reasons, but the principle says nothing about them.
So Q3 either collapses into Q2 (structural reading) or dissolves into something outside the theory's scope (scheduling reading). Either way, it adds no new structural content.
There is, however, something the "right now" phrasing accidentally brushes against, and it's worth saying explicitly — not as part of Q3, but as a warning about the "drivers we imagine might emerge later" reading above. IVP has a consequence sometimes called the knowledge theorem: the correct partition reflects the causal knowledge you actually have about the system's current drivers. When you add elements designed to serve drivers that don't yet exist — speculative extensibility points, plugin systems waiting for plugins, abstractions anticipating requirements that haven't arrived — those elements don't share drivers with anything real in the system. They land in their own cells, structurally disconnected from the rest, and they push real elements into shapes that accommodate hypothetical concerns rather than actual ones. The partition the principle produces over the enlarged element set is structurally incorrect relative to the system's actual causal reality.
That's a structural claim, not a moral verdict. The principle isn't saying "speculative design is bad." It's saying that a partition built on drivers that haven't materialized doesn't match the partition the system's actual change history will reward, and that mismatch shows up as elements getting touched together for reasons the structure didn't anticipate. Whether the cost of that mismatch outweighs the cost of waiting until the drivers are real is an engineering judgment IVP doesn't make for you. What IVP does say is that the cost is structural and not free. There's an important exception: extensibility justified by currently observed drivers — plugin systems for which plugins already exist, abstractions over already-shipping variations — is a different case. Those drivers are real, the elements that serve them have a place in the partition, and the principle has no objection.
So if Q3 is doing any work beyond Q2, it's this: don't import imagined future drivers into a decomposition you're making today. Decide based on what you actually know causes change in the system as it stands.
Q3 collapses into Q2.
Question 1: the technology trap
Now Q1: "Does this need to be a service?"
There's an intuitive reading of Q1 that makes it look orthogonal to the principle — that it asks about realization technology (service vs. library vs. in-process module) rather than about module boundaries. On this reading, the principle tells you the logical partition, and then a separate engineering decision picks how each module is deployed. Modularization and deployment are treated as two layers: first decide what goes with what, then decide how to ship it. It's a sensible-looking division of labor, and it's something close to the default mental model in much architecture writing.
I want to argue it's incomplete, and the gap it leaves is the one Q1 is actually pointing at.
The gap lies in what counts as a change driver. The two-layer reading tacitly restricts the driver set to functional drivers — business rules, domain concepts, user-facing requirements — and treats everything else as "non-functional" concerns that live downstream. But that split is a convention, not a principle. A change driver, in IVP's sense, is anything whose change forces a corresponding modification of the element. And architectural quality requirements clearly cause modifications:
- "Must scale independently to 10,000 requests per second." If that target shifts — up to 100,000, or down to 100 — caching strategy, concurrency model, data structures, and possibly the storage layer have to change. It's a driver.
- "Must survive the failure of neighboring components." If the failure-tolerance target changes, retry logic, circuit breakers, state replication, and idempotency guarantees have to change. It's a driver.
- "Must be deployable without coordinating with team X." If that independence requirement changes, interface contracts, versioning strategy, and backward-compatibility code have to change. It's a driver.
These satisfy the definition of a driver just as "the tax code might change" or "the pricing model might be revised" do. There's no principled way to admit one kind and exclude the other.
But just admitting architectural-quality drivers into the set isn't by itself what reshapes the partition. There's an intermediate step that's easy to skip past: a quality requirement doesn't act on the system as an abstract force — it acts through concrete elements that implement it. A scalability target above what a single instance can handle isn't satisfied by wishing; some element has to actually distribute the work. A failure-isolation target isn't satisfied by intent; some element has to actually contain failures. A deployment-independence requirement isn't satisfied by good will; some element has to actually broker compatibility. These elements are real components with their own implementations and their own change profiles. They aren't optional decorations on the functional code; they're the concrete carriers of the quality requirements, and the quality requirements can't act on the partition without going through them.
The interesting question is what driver sets those carrier elements have. A circuit breaker's behavior is shaped primarily by the failure modes it guards against, the observability it reports to, and the retry policies it coordinates with. Its implementation responds to changes in those drivers, not to changes in the business rules of the service it fronts. (There's a real edge case here: a circuit breaker whose failure-classification logic is driven by business semantics — "treat this as a failure only for premium customers" — does have business drivers in its set. In that case the principle would correctly group it with the business logic, not with other infrastructure. The general claim isn't that infrastructure drivers and functional drivers never overlap; it's that they often don't, and where they don't, the partition reflects that.) A caching layer's drivers are cache-invalidation semantics and memory pressure, not the functional logic of what's being cached. When the typical case holds — when the carrier element's drivers are genuinely distinct from the functional element's — the principle separates them into different cells, and the resulting partition is different from the one functional drivers alone would produce.
This is where the service-versus-library choice connects to the partition rather than sitting after it. The principle determines the logical module boundary: which elements share a driver set and therefore belong in the same cell. Whether that cell can be realized as an in-process module is a separate question, answered by whether the drivers in the cell are operationally compatible with sharing a process. A driver that requires independent failure containment is not operationally compatible with in-process coexistence — failures inside one process take everything in that process with them, so the failure-isolation driver can't be made actionable inside a shared process. A driver that requires independent horizontal scaling is not operationally compatible with in-process coexistence — you can't scale one component of a process without scaling the whole process. These operational-compatibility judgments aren't part of IVP's formal apparatus; they're engineering facts about runtimes, processes, and networks. But they connect directly to the partition: when a cell's drivers include any whose operational requirements rule out shared-process realization, the only realization that lets all the drivers in the cell be satisfied at once is a separate deployable unit.
So the "is this a service?" question is really asking: when we account for the elements that carry the system's quality requirements, and we let the principle partition over the enlarged element set, does this element land in a cell whose drivers — functional and quality — collectively rule out sharing a process with anything else? That's the same shape as Q2 — is there an existing module this element's driver set matches? — but evaluated over the full driver set rather than just the functional one. The realization decision isn't downstream of the partition; it's the operational consequence of which drivers are in the partition's cells, evaluated against engineering facts about how those drivers can actually be served.
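One way to sketch this connection: attach to each driver an engineering fact — can it be served while sharing a process? — that IVP itself doesn't supply, and let a single incompatible driver force a separate deployable unit for its whole cell. The `Driver`/`Cell` shapes and driver names below are illustrative assumptions, not IVP's formal apparatus:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Driver:
    name: str
    # Engineering fact about runtimes, NOT part of IVP's apparatus:
    # can this driver be served while sharing a process with others?
    shared_process_ok: bool = True

@dataclass(frozen=True)
class Cell:
    elements: frozenset[str]
    drivers: frozenset[Driver]

def realization(cell: Cell) -> str:
    """One driver that rules out shared-process realization is enough
    to force a separate deployable unit for the whole cell."""
    if any(not d.shared_process_ok for d in cell.drivers):
        return "separate deployable unit"
    return "in-process module or library"

# Hypothetical cell: failure isolation can't be served in-process,
# because a failure inside one process takes the whole process down.
resilience = Cell(
    elements=frozenset({"circuit_breaker", "retry_policy"}),
    drivers=frozenset({
        Driver("failure-modes", shared_process_ok=False),
        Driver("observability"),
    }),
)
print(realization(resilience))  # separate deployable unit
```

The partition decides which elements share a cell; the `shared_process_ok` flags — engineering judgments made outside the principle — decide how the cell can be shipped.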
Q1 collapses into Q2.
One question, not three
Here's where we land. Crickett's three questions, read structurally, aren't three independent checks. They're three natural-language phrasings of the same underlying question:
Taking into account the full set of change drivers for this element — functional drivers and quality drivers alike — does its set of drivers equal that of some existing module? If yes, the structural answer is to merge. If no, the structural answer is to separate, and which realization (in-process module, library, separate service) the separated module needs is determined by which drivers are in its cell and how those drivers can be operationally served.
That's the whole decision, once the inputs are fixed. One question, one criterion, binary comparison. The principle's contribution is making the question precise: given an agreed-upon driver assignment, the answer is determined.
That precision is the gap classical principles haven't quite closed. Parnas, in his 1972 paper on decomposing systems into modules, came close — his "design decisions likely to change" is essentially a change driver in everything but the formal apparatus. The intuition was right; what was missing was the explicit treatment of driver-set equality as the criterion. SRP points at separation by reason-to-change but leaves "reason" undefined. Separation of Concerns points at orthogonal grouping but leaves "concern" undefined. CCP correctly targets co-variation but doesn't distinguish causal co-variation (shared drivers) from accidental co-modification (drivers that happened to fire together). All three are pointing at something genuine. What IVP adds is the formal apparatus that makes the criterion precise enough for two engineers who disagree to identify exactly what they disagree about — which driver applies to which element — rather than talking past each other about "reasons" and "concerns."
But here's the catch
The principle makes the question well-posed and decidable. It does not make the question answerable without expertise.
This is the part that matters for practitioners, and it's the part I want to be careful about. The principle consumes a driver set and produces a partition. It does not produce the driver set. Identifying what actually belongs in the driver set for a real system — which architectural qualities are genuine drivers versus current accidental properties, which functional requirements are real causes of future change versus one-time decisions, which deployment constraints are inherent versus incidental — is empirical work. It requires:
- Performance engineering expertise to know which scaling drivers are real. Splitting a service "for independent scalability" when nothing in the system's actual or projected load suggests the element will ever need independent scaling means splitting on a driver that isn't there. The partition looks principled, but the input it rests on is imagined.
- Reliability engineering expertise to know which failure-isolation drivers are real. Blast-radius concerns are drivers when there's a credible failure mode that actually needs to be contained; in their absence, they're a split justified by a driver the system doesn't have.
- Capacity planning and load analysis to distinguish current properties from future causes of change. "It runs fast enough today" is not a driver. "It must maintain sub-10ms latency as traffic grows ten-fold over the next two years" is.
- Deployment and organizational analysis to know which independence drivers are real. Team autonomy is a driver when teams genuinely need to deploy independently; it isn't when the organization doesn't actually work that way.
- Domain knowledge to identify the functional drivers without confusing them with implementation choices.
None of that is the principle. The principle has no method for deciding whether "must scale to 10K RPS independently" is a real driver or an imagined one. That determination comes from engineering disciplines IVP doesn't subsume.
The honest picture is: the principle plus quality knowledge together answer the question Crickett is pointing at. Neither alone does. Quality knowledge without the principle gives you a list of architectural concerns but no principled way to combine them into a partition. The principle without quality knowledge gives you a partition machine with no inputs. Both are common in isolation; the combination is rarer than it should be, and that's where the leverage is.
A worked example: 10,000 users and four servers
Let me make this concrete, because the interaction between capacity math and the principle is where a lot of confusion lives.
Suppose you're told: "this system must serve 10,000 concurrent users on hardware X." A natural question is: does the principle tell you how to modularize this? The honest answer is that the principle, on its own, has nothing to say about the number 10,000, or about hardware X, or about how many servers you'll end up running. It has no model of throughput, queuing, concurrency, or hardware characteristics. If you ask it "how many servers do I need?" it has no answer, not because it's incomplete, but because that's not a modularization question. It's a capacity-planning question, and capacity planning is a separate engineering discipline with its own methods: load modeling, benchmarking, queuing analysis, back-of-envelope arithmetic against hardware specs.
So where does a number like "four servers" come from? From that capacity calculation, done entirely outside the principle. You measure or estimate per-request cost on hardware X, you multiply by concurrency, you apply a safety margin, you get a count. Maybe it's four. Maybe it's fourteen. The principle doesn't care and couldn't produce the number if it tried.
But now something important happens, and this is where the principle re-enters. The capacity calculation has produced knowledge about the system — structural knowledge, not just a number. You now know:
- A single instance cannot handle the required load.
- The workload must be distributable across multiple instances running in parallel.
- Distribution requires elements that didn't previously exist: a way to route requests across instances, a way to share or partition state, a way to coordinate on things that can't be freely replicated, a way to detect and recover from instance failure.
Those "requires" introduce new elements into the system. A load balancer. A session store or sticky-routing scheme. A distributed lock or a partition-key strategy. A health-check mechanism. A failure handler. None of these existed in the pre-analysis system. They exist now because the capacity knowledge forced them into existence.
And here's the key point: each of those new elements carries its own driver set, and those drivers are typically distinct from the drivers of the business logic they serve. They're genuinely new drivers, brought into the system by the structural decisions the capacity analysis forced.
- The load balancer's drivers are the routing strategy, the health-check protocol, the traffic patterns it has to handle. They are not the business rules of the requests passing through it.
- The session store's drivers are the consistency model, the eviction policy, the replication strategy. They are not the domain rules that produced the sessions.
- The partition-key strategy's drivers are the skew characteristics of the key space and the rebalancing cost. They are not the semantics of the partitioned entities.
Once these elements are added to the system, the principle does its usual work over the enlarged element set. Functional elements group with functional elements that share their functional drivers. Infrastructure elements group with infrastructure elements that share their drivers. Where a functional element genuinely participates in an infrastructure driver — say, a business rule whose modification is itself triggered by scaling constraints because it encodes a degradation policy — the principle correctly groups it with the infrastructure cluster rather than its original functional one. The partition is recomputed over a richer set of inputs, and the result is a different modularization than functional drivers alone would give.
Concretely, the resulting cells look something like this:
| Cell | Representative drivers | Operational compatibility with other cells |
|---|---|---|
| App logic | Domain rules, business workflows, validation logic | Compatible with replication; not compatible with sharing a process with the load balancer (cyclic) |
| Load balancing | Routing strategy, health-check protocol, traffic patterns | Cannot live inside an instance it balances |
| Session / state coordination | Consistency model, eviction policy, replication strategy | Must outlive any single instance — independent failure domain |
| Failure detection | Health-check semantics, timeout policies, recovery rules | Must survive failures of what it monitors |
The interesting move happens in the right column. Each cell has, in addition to its driver set, an operational profile: a set of facts about whether the drivers in the cell can be served by sharing a process with another cell. The load balancer cell can't share a process with the app logic cell it balances, because a load balancer running inside one of the app instances can't distribute requests across the four of them — the routing driver becomes unactionable. The session coordination cell can't share a process with any single app instance, because the consistency-and-replication driver requires it to outlive that instance. These aren't claims IVP itself derives. They're engineering facts about runtimes — what a process is, what it means for one process to fail, how routing works. The principle determines that the cells exist as distinct modules; the operational facts determine that some of those modules cannot be realized in-process and must be separate deployable units.
The app logic cell, now freed from the concurrency and routing concerns that have been factored into their own modules, becomes replicable precisely because the concerns that would have prevented replication are no longer tangled into it.
The chain, in order:
- Capacity math (outside the principle): 10,000 users on hardware X cannot be served by a single instance.
- Element addition (outside the principle, prompted by step 1): the system must now contain load balancing, state coordination, failure detection, and whatever else horizontal scaling requires.
- Driver attribution (outside the principle, requires engineering judgment): each new element's drivers are identified — routing drivers for the balancer, consistency drivers for the store, health drivers for the detector.
- Repartition (this is the principle's job): the enlarged element set and enlarged driver set are fed in; the principle produces a new partition separating business logic from infrastructure, routing from state, health checking from everything else. The realization of each cell isn't a later decision — it's the operational consequence of which drivers ended up in the cell, evaluated against the engineering facts about how those drivers can actually be served. Cells whose drivers can't all be served at once inside a shared process need to be separate deployable units, and that follows from step 4 directly.
- Deployment topology (outside the principle, back to capacity math): the separate deployable unit for app logic runs on four instances, because that's what the original capacity analysis said.
Steps 1, 2, 3, and 5 are performance engineering and architectural judgment. The principle has no access to any of them. Step 4 follows deterministically once the elements, the driver attributions, and the engineering facts about which drivers can coexist in a shared process are all in place. The principle does the partition work; the engineering facts decide which cells can be in-process and which can't.
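The chain above can be sketched end to end. Every number, element name, and driver label below is an illustrative assumption; only the grouping step in the middle is the principle's job, and the capacity arithmetic here happens to land on four instances:

```python
from collections import defaultdict
import math

# Step 1 -- capacity math, entirely outside the principle.
# All numbers are illustrative assumptions.
concurrent_users = 10_000
requests_per_user_per_sec = 0.5
instance_capacity_rps = 1_500
safety_margin = 1.2

required_rps = concurrent_users * requests_per_user_per_sec
instances = math.ceil(required_rps * safety_margin / instance_capacity_rps)
needs_horizontal_scaling = instances > 1  # the structural fact the math reveals

# Steps 2-3 -- element addition and driver attribution: engineering
# judgment, also outside the principle. Driver labels are hypothetical.
elements = {
    "order_workflow": frozenset({"domain-rules"}),
    "validation":     frozenset({"domain-rules"}),
}
if needs_horizontal_scaling:
    elements |= {
        "load_balancer": frozenset({"routing-strategy", "health-protocol"}),
        "session_store": frozenset({"consistency-model", "eviction-policy"}),
        "health_check":  frozenset({"health-protocol", "timeout-policy"}),
    }

# Step 4 -- repartition: the principle's only job. Group by exact
# driver-set equality over the enlarged element set.
cells = defaultdict(set)
for element, drivers in elements.items():
    cells[drivers].add(element)
for drivers, members in cells.items():
    print(sorted(members), "<-", sorted(drivers))

# Step 5 -- topology, outside the principle again: the app-logic cell
# runs on `instances` replicas; the count never touches a module boundary.
print("app-logic replicas:", instances)  # app-logic replicas: 4
```

Changing the capacity numbers (while scaling is still needed) changes only step 5's replica count; the partition computed in step 4 is untouched — the quantitative and structural knowledge enter through different doors.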
Two distinct kinds of knowledge
Notice that "there must be a separate service deployed onto four servers" is actually two different claims fused into one sentence, and they enter the principle through different doors.
"There must be a separate service" is a structural claim about elements and drivers. It says: there exist elements whose drivers (combined with engineering facts about runtimes) cannot all be served inside a shared process — they need independent deployment, independent scaling, or an independent failure domain. Once you know that, the principle takes over and propagates the structural consequence: those elements form a cell whose realization is a separate deployable unit. The principle doesn't discover the underlying drivers, but it consumes them and propagates their implications through the partition.
"Deployed onto four servers" is a quantitative fact about capacity and resource allocation. It is not a structural claim about modularization at all. The principle has nothing to say about whether the number is four, forty, or four hundred. You can change the number tomorrow without touching a single module boundary — you adjust a deployment config, you don't re-modularize. But you can't quietly change "separate service" to "in-process library" without reshaping the partition, because the drivers that put it in its own cell haven't gone away.
So the key knowledge, from the principle's point of view, is the structural knowledge. The quantitative part is what made the structural knowledge visible in the first place — capacity math is what revealed that horizontal scaling was necessary — but once the structural fact is established, the specific instance count drops out of the modularization question entirely. If a genie whispered "this system will need to run horizontally scaled" without telling you the number, the principle could already produce the correct partition. If the same genie whispered "this system will need four servers" without telling you why, the principle couldn't do anything with the information — it has no slot for a raw count.
This is the layering the honest answer requires. Capacity math produces numbers and reveals structural necessities. The principle consumes the structural necessities and produces the partition. Operations consumes the partition and the original numbers to decide how many instances of each module run where. Each layer does its own job. When a single mental model tries to carry all three at once, the layers blur, and that blurring is where a lot of microservice sizing confusion comes from.
What the three questions are compressing
The reason Crickett's three questions work is that they compress, in a form practitioners can carry into a meeting, the structural reasoning the principle does formally. That kind of compression is valuable in its own right — in fact it's what good practitioner writing does, and it's what most academic writing fails at. The three questions don't try to do what IVP does; they're a different artifact aimed at a different job. They're the question you ask in the hallway. IVP is what you reach for when the hallway question produces a disagreement neither of you can resolve.
Read this article's argument as unpacking the compression rather than replacing the questions. The reason all three "collapse into one" isn't that the original three are redundant — it's that they're three useful angles on a single underlying question that, viewed structurally, has one shape. Ferguson's three questions before you speak presumably unpack into a single underlying question too: should I say this? That doesn't make Ferguson's three-part formulation useless; it makes it a compression worth understanding.
What the formal lens adds, beyond the compression, is a way to make the question precise enough to settle disagreements. Two engineers asking "does this need to be its own service?" can give different answers and have no shared ground for resolution. Two engineers asking "does this element's driver set match the driver set of any existing module?" can, in principle, identify exactly where they disagree — which driver one of them is including that the other isn't, or which element they're attributing it to. The classical principles SRP, SoC, and CCP each point at related intuitions and run into the same gap when disagreement arises: their core terms aren't sharp enough to localize where the disagreement actually lives. Parnas was already most of the way there in 1972 with "design decisions likely to change"; what was missing was the formal step from that intuition to driver-set equality as the partition criterion.
The practical takeaway
Next time you face a "should this be a service?" decision, Crickett's three questions are a perfectly good prompt — they'll surface the conversation. When they surface a disagreement that the conversation can't resolve, the underlying structural question they're pointing at is this:
What are the change drivers for the elements I'm considering? Include functional drivers, scaling drivers, reliability drivers, deployment-independence drivers, team-autonomy drivers — anything that can genuinely cause these elements to be modified. Then ask: is there an existing module whose set of change drivers matches this one? If yes, the structural answer is to merge. If no, the structural answer is to separate — and which realization (in-process module, library, separate service) the separated module needs is determined by which drivers are in its cell and how those drivers can actually be served operationally.
Be honest about what you know and don't know. If you can't name the quality drivers for an element, you don't have enough information to decide yet. The principle has no "wait until you know" rule — it just has nothing to evaluate when the inputs aren't there. Going and finding out is the right move. Acting on guesses you haven't surfaced as guesses is the move that goes wrong silently.
A few honest scope notes. This article has been talking about server-side, request/response systems with a horizontal-scaling story. The same principle applies elsewhere — to brownfield migrations, FaaS, batch and streaming systems, regulated environments where compliance imposes its own boundaries — but the inputs (what counts as a driver, how the operational compatibility column gets filled in) shift with the context. In a brownfield migration, the order in which you act on the partition is constrained by data and dependency reality; the principle tells you the target, not the path. In regulated industries, compliance boundaries can force separations that aren't visible from drivers alone — those should be modeled as static constraints on realization, not bolted onto the partition. None of these are refutations; they're places where applying the principle requires specific quality knowledge the article hasn't tried to provide.
That's the picture. Three questions, one underlying criterion, and a reminder that no theory of modularization can substitute for knowing your domain and your quality requirements. The theory gives you a precise question; your expertise gives you the inputs; the answer follows. You need both halves.
References
- Loth, Y. (2025). The Independent Variation Principle. Zenodo. https://zenodo.org/records/18024111
- Loth, Y. (2026). IVP as a Meta-Principle: A Unifying Software Architecture Theory. Zenodo. https://zenodo.org/records/18748561