Every serious practitioner agrees that software architecture matters. Almost nobody agrees on what the word means. That's an operational problem disguised as a philosophical nuisance. If you can't define what counts as architectural, you can't scope an architecture review, you can't decide what belongs in an ADR, and you can't tell a junior engineer why their "small refactor" just changed the system's failure modes.
The field has definitions. None of them are wrong, exactly. But each one captures a piece and mistakes it for the whole.
What the canonical definitions get right — and where they stop
Bass, Clements, and Kazman get closest: architecture is "the set of structures needed to reason about the system." That makes architecture relative to reasoning goals, which is the right move. But "set of structures" leaves the boundary underdetermined — the definition doesn't tell you which structures, or how many, or when to stop. Fowler's "the decisions that are hard to change" focuses on irreversibility, which is genuinely where architectural risk lives, but it's circular — you don't know what's hard to change until you try — and it has no mechanism. Why is something hard to change? Coupling? Coordination cost? Contractual obligation? "Hard" carries all that weight silently. Booch's emphasis on cost of change gets closer to an operationalizable metric, but cost is an output of architectural structure, not a definition of it.
ISO 42010 deserves more careful engagement than practitioners usually give it. Its framework — architecture descriptions organized around concerns, viewpoints, and architecture principles — is richer than the single-line "fundamental organization of a system" that most people quote. The viewpoint mechanism is genuine: it says architecture is always described relative to concerns, which is directionally the same insight the coarse-graining frame formalizes. Where 42010 stops is at the operation itself. It specifies that you should organize by concern and should use viewpoints, but doesn't explain why different concerns produce structurally different architectures, or what makes one concern-relative model more faithful than another. The framework is a governance template. It's not a theory of the operation it's governing.
Each gets something right. None answers the question I actually need: what exactly makes an element architectural, and how would I know?
The definition
Software architecture is a concern-relative coarse-grained model of a system — both reasoning instrument and governing structure — stabilized by its decisions, defaults, and inherited constraints.
An element is architectural to the extent that changing it predictably changes governed system-wide properties, the coordination topology around them, or the non-local cost, risk, or feasibility of future change over the relevant horizon.
That's the descriptive core. It says what architecture is without smuggling in what architecture should be. Bad architecture, accidental architecture, architecture-by-default — all fit. Systems accumulate architecture through legacy constraints and organizational accidents as much as through deliberate choice. A definition that excludes those isn't describing the phenomenon; it's prescribing an ideal. And "system" here includes the organizational context — teams, contracts, compliance regimes — because real systems are governed by organizational coupling as much as by technical coupling. Scoping the definition to code alone would leave out half of what actually governs the system.
The normative layer is separate: good architecture is educated coarse-graining — team-legible, property-preserving, and attentive to future change economics. The gap between the descriptive and normative is where architectural judgment lives. A system always has some architecture. Whether that is educated — whether it preserves what matters and remains legible to whoever needs to reason about it — is the thing you're actually evaluating when you review it.
Three things need unpacking: what "educated coarse-graining" means and why it's more than a metaphor, what the model-governance duality buys you, and why the element test works where previous demarcation criteria don't.
Architecture as educated coarse-graining
The idea that architecture involves multiple views organized by concern isn't new — it's been the field's organizing principle since Kruchten's 4+1 model, and it's embedded in every version of Bass et al. and in ISO 42010's viewpoint framework. What's new is the claim about mechanism: architecture is not just "multiple views for different concerns." It's a property-preserving reduction of information, and the properties you choose to preserve determine the reduction you get.
In condensed matter physics, this operation has a precise name. The renormalization group builds effective theories at a chosen scale by systematically integrating out degrees of freedom that are irrelevant at that scale — while preserving the ones that matter. You don't model a lattice by tracking every electron. You choose which physical property you're investigating — magnetic ordering, transport, superconductivity — and you build a model that's faithful to that property while discarding everything else. The model is the theory.
The distinction between naive and educated coarse-graining is the key. Naive coarse-graining just throws information away — average over a region, collapse detail, and hope nothing important was lost. Educated coarse-graining discards detail selectively, preserving the information that governs the properties you care about at the scale you're working at. The result is a model that's simpler than the full system but faithful — within the scope of the chosen concerns — to the behavior that matters.
I claim that software architecture is this operation applied to a software system.
You're building a coarse-grained model — one that's simpler than the codebase, the deployment topology, and the full graph of runtime interactions. The question is whether your model is naive or educated: did you abstract away whatever seemed unimportant and hope for the best, or did you identify the specific structures that govern your chosen system-wide properties?
A quick example makes the distinction concrete. Say your system uses Celery for async task processing and you need to choose a broker. The naive model says: "Celery talks to a broker. Redis or RabbitMQ — doesn't matter, pick one." Both brokers accept tasks and deliver them to workers, so the model is currently correct. The educated model asks which future system-wide properties the broker choice governs. RabbitMQ's AMQP protocol gives you durable queues, routing keys, dead-letter exchanges, and per-queue priority at the protocol level; Redis implements similar features as application-layer conventions through Celery/Kombu. The difference isn't "can versus can't" — it's where the guarantees live, which governs how much you'd need to re-architect if you outgrow the client-side abstractions. Same current dispatch. Different structural ceiling. The naive model isn't wrong — it's uninformed. It classifies a degree of freedom as irrelevant that turns out to govern a property the system will need.
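The distinction shows up directly in configuration. Here is a hedged sketch (the application and queue names are hypothetical; kombu's `queue_arguments` mechanism and the RabbitMQ `x-dead-letter-exchange` / `x-max-priority` arguments are real AMQP-level features):

```python
from celery import Celery
from kombu import Exchange, Queue

# With RabbitMQ, dead-lettering and priority are properties of the queue
# itself, declared to and enforced by the broker:
app = Celery("worker", broker="amqp://guest@localhost//")
app.conf.task_queues = [
    Queue(
        "orders",                    # hypothetical queue name
        Exchange("orders"),
        routing_key="orders",
        queue_arguments={
            "x-dead-letter-exchange": "orders.dlx",  # broker-enforced
            "x-max-priority": 10,                    # protocol-level priority
        },
    ),
]

# With Redis the same Celery code runs, but priority and dead-letter-like
# behavior are emulated client-side by Celery/Kombu conventions, so the
# guarantee lives in library code rather than in the broker protocol:
# app = Celery("worker", broker="redis://localhost:6379/0")
```

Both configurations dispatch tasks identically today; the difference is where the guarantees would live if the system ever needs them, which is exactly the question the educated model asks.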
This framing does real work that "multiple views" doesn't. It explains why different concerns produce different architectures (different properties are governed by different degrees of freedom, so the reduction differs) and why some models are better than others (fidelity to the coupling structure that actually governs the property you're reasoning about, not aesthetic preference). It also names what makes architecture hard: identifying the relevant degrees of freedom requires understanding the system deeply enough to know which details are safe to discard. That judgment is the core architectural skill, and no framework can automate it.
Fairbanks's notion of architectural hoisting — architecture directly owning global properties — is subsumed by this frame: a hoisted property is one that survives the reduction. But coarse-graining also explains why some properties resist hoisting, a point I'll develop below.
Concern-relative and team-legible follow as consequences, not independent axioms. If architecture is a property-preserving reduction, then the properties you choose to preserve determine the reduction — that's the concern-relativity. And the model must be legible to whoever needs to reason about those properties — that's team-legibility. The grain of the model is set by cognitive load: fine enough to preserve what matters, coarse enough that the team can hold it in working memory. Coarse-graining isn't a bonus feature. It's a consequence of the legibility requirement.
Architecture as model and governance
The em-dash clause — "both reasoning instrument and governing structure" — resolves a tension the field has, as far as I can tell, never explicitly named.
The architecture-as-decisions school (Jansen and Bosch, Tyree and Akerman, the entire ADR tradition) treats architecture as the set of decisions that constrain the system. The architecture-as-model school (Bass et al., Kruchten, the viewpoint tradition) treats architecture as a representation you reason with. These are but two aspects of the same thing. Architecture is the governing model stabilized by decisions, where the model makes the decisions legible and the decisions make the model enforceable.
"Stabilized by" rather than "constituted by" is a deliberate choice. Architecture isn't reducible to its decisions — that loses the epistemic role, the model that makes the system thinkable at the right grain. But architecture without enforcement mechanisms degrades: decisions that nobody checks and nothing enforces aren't architectural, they're aspirational. The two roles need each other. Without the model, you have scattered constraints with no way to reason about their interactions. Without the decisions, you have a whiteboard diagram that governs nothing.
Perry and Wolf's tripartite definition — elements, form, rationale — implicitly contains both sides but never names the duality. Making it explicit resolves a tension practitioners live with but rarely articulate: the same engineer who points at the architecture diagram when asked "what's our architecture?" will point at the ADR log when asked "how did we decide that?". Both gestures are right. Architecture is the governing model those diagrams represent, stabilized by the decisions those ADRs record.
The coarse-graining frame explains why the duality is necessary rather than accidental. A property-preserving reduction is simultaneously a cognitive operation (you need a model simple enough to reason with) and a system constraint (the invariants must actually be enforced, or the reduction is fiction). A model with no enforcement is a wish; enforcement with no model is governance by accident.
The element test
Most definitions go vague here. "Significant decisions," "hard to change," "fundamental organization" — intuitions, not tests. Here's the test:
An element is architectural to the extent that changing it predictably changes governed system-wide properties, the coordination topology around them, or the non-local cost, risk, or feasibility of future change over the relevant horizon.
"To the extent that" is deliberate. Architectural relevance is a gradient, not a binary.
The most relevant prior work I'm aware of is Zimmermann's Architectural Significance Test (published 2020, following discussions at ECSA 2020), which provides seven criteria for identifying architecturally significant requirements: cross-cutting impact, high business value, QA sensitivity, first-of-a-kind, and several situational factors. Zimmermann's test is a practitioner checklist — it tells you when to pay architectural attention. The element test I'm proposing operates at a different level: it's a demarcation principle that tries to explain why something is architectural, grounded in a causal framing ("predictably changes governed properties") rather than an advisory one. Zimmermann asks "should this requirement trigger architectural work?" The element test asks "is this element structurally coupled to system-wide behavior?". Those are complementary framings. But the causal grounding is what makes the element test falsifiable: you can name a coupling channel and check whether the predicted propagation actually occurs.
The gradient framing matters practically: it lets you ask "how architectural is this?" rather than forcing a yes/no that the coupling structure of real systems doesn't support.
The element test probes along three dimensions:
Predictable non-local impact on governed properties. You can trace the causal path from the element's variation to a system-wide property change. Not "might, in some scenario, conceivably affect" — trace. "Trace" means: you can name the coupling channel, name the governed property, and describe the mechanism by which variation in one produces variation in the other — before the change is attempted. If you rename a variable and accidentally break a regex in a deployment script that causes a production outage, that's a bug. The variable was not predictably connected to availability. The database's consistency model, the authentication middleware, the message format between services — these are predictably connected to governed properties, and their effects propagate beyond their immediate module boundary.
A fair pushback: couldn't a thorough enough engineer predict the variable → regex → deployment → outage chain and so the variable was predictably connected to availability, just not to you? This is where "predictably" has to be anchored to the architectural model your team is actually reasoning with, not to an omniscient observer. Usually, a coupling channel that only becomes visible by reading deployment scripts isn't architectural under the model in use; it's a channel the model legitimately integrated out. When it bites, you're seeing the IR/UV mixing I'll describe below — not a counterexample to the test, but an instance of the thing the test already says is real.
A related objection: isn't "predictably" doing the same work as Fowler's "hard"? Both depend on the observer's knowledge. The difference is the anchor. "Hard to change" is grounded in nothing — hard for whom, under what constraints, over what horizon? "Predictably changes" is grounded in a traceable causal path articulable ex ante: name the coupling channel, name the governed property, describe the mechanism. The prediction can be wrong — and when it is, that's information about where your coupling map is incomplete. Fowler's test generates no prediction to evaluate; you only discover "hard" retrospectively, which means it can't guide decisions, only narrate the outcome.
Coordination topology. A boundary becomes architectural not only through technical coupling but through the coordination it demands: multiple teams ship across it, a compliance regime inspects it, incident response depends on it, a vendor contract freezes it. The service boundary that two teams own independently is more architectural than the one a single team controls, even if the technical coupling is identical, because changing it requires cross-team coordination whose cost scales with organizational friction.
This isn't Conway's Law as a sociological observation — it's coordination topology as a first-class architectural dimension. The definition absorbs it without a separate theory because coordination topology is an interaction structure, not merely analogous to one. The interacting components include teams, processes, and contracts alongside services and databases, and the coupling channels — approval gates, shared release schedules, cross-team incident ownership — propagate the effects of change non-locally through the organization just as technical coupling channels propagate them through the system. When changing a service boundary requires three teams to coordinate a release, that coordination cost is as architecturally real as a shared database schema.
Future change economics over the relevant horizon. Architecture shapes what you can build next, how expensive it will be, and what becomes impossible. The auth-gateway decision below is the clearest case: it makes adding new services cheaper (auth comes free) while making migration away from the gateway expensive (every service must reimplement). Same current behavior. Different future. The horizon isn't arbitrary — it's the timeframe over which the system needs to remain viable and evolvable for its stakeholders.
A demarcation principle that can only say "yes" isn't a test — it's a rubber stamp. So, let's consider one positive case and one negative case, both broadly legible.
Positive: your team decides to enforce authentication at the API gateway rather than in each service. The element test says architectural on all three dimensions. Governed properties: the system's security posture is now coupled to the gateway's behavior — a misconfiguration there exposes every service behind it. Coordination topology: every team that ships a new service inherits the gateway's auth contract; changing that contract requires coordinating across all of them. Future change economics: adding services is cheaper (auth comes free), but migrating away from the gateway requires every service to reimplement auth — the decision opened one path and narrowed another. You can name the coupling channels, name the governed properties, and trace the propagation before anything goes wrong.
Negative: consider a frontend framework migration — Angular to React, entire UI layer rewritten, every component rebuilt, team retrained. By Fowler's test, this is unambiguously architectural: it's hard to change, expensive, and affects the whole frontend surface. Most practitioners would call it architectural without hesitating.
The element test disagrees — given specific conditions. If the backend API contracts don't change, the deployment topology doesn't change, SLO-relevant behavior (latency, availability, SEO) remains materially the same, the migration doesn't alter release cadence or observability contracts, and a single team owns the frontend, then: governed properties are unaffected, coordination topology is unchanged, and future change economics of the system are the same. What changed is the implementation within a bounded context. It's a large, painful, risky implementation change — but the pain is contained. Nothing propagated non-locally.
The interesting part is where the test draws a finer line than intuition does. If the React app's client-side caching changes the load pattern on the API in ways that affect availability under peak traffic — that specific coupling channel is architectural, and the element test will flag it. But the framework migration itself isn't. Does "predictably" carry too much weight here? Maybe. But "hard to change" can't make the distinction at all. "Predictably changes governed properties" at least generates a prediction you can check. The test separates the pain of a change from its architectural significance, which is exactly the separation practitioners need when deciding what belongs in an ADR versus a project plan.
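The two cases can be sketched as a scoring rubric. This is an illustrative toy, not a formal metric: the scores are judgment calls, and every name here is hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class ElementAssessment:
    """One element scored along the element test's three dimensions."""
    element: str
    # Dimension 1: traceable causal path to a governed system-wide property.
    governed_property_impact: float = 0.0
    # Dimension 2: cross-team, compliance, or contractual coordination demanded.
    coordination_topology: float = 0.0
    # Dimension 3: effect on cost, risk, or feasibility of future change.
    future_change_economics: float = 0.0
    # Named channels make the prediction checkable ex ante.
    coupling_channels: list = field(default_factory=list)

    def architectural_extent(self) -> float:
        # "To the extent that": report the strongest dimension as a gradient
        # rather than forcing a yes/no threshold.
        return max(
            self.governed_property_impact,
            self.coordination_topology,
            self.future_change_economics,
        )

gateway_auth = ElementAssessment(
    element="auth enforced at API gateway",
    governed_property_impact=0.9,  # gateway misconfig exposes every service
    coordination_topology=0.8,     # every team inherits the auth contract
    future_change_economics=0.7,   # migrating off touches all services
    coupling_channels=["gateway config -> security posture of all services"],
)

framework_migration = ElementAssessment(
    element="Angular-to-React rewrite (single team, stable API contracts)",
    governed_property_impact=0.1,
    coordination_topology=0.1,
    future_change_economics=0.1,
)

# Painful is not the same as architectural:
assert gateway_auth.architectural_extent() > framework_migration.architectural_extent()
```

The point of recording `coupling_channels` alongside the scores is the falsifiability claim above: a named channel is a prediction you can later check against what actually propagated.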
What this buys you
It gives "technical debt" a scalpel. Under this definition, architectural debt is specifically: decisions that are degrading governed system-wide properties or closing off future change paths you need open. A tangled service class is messy. A Celery task that bypasses the message bus and writes directly to another service's database — that's architectural debt, because it violates an interaction structure and changes the blast radius of future schema migrations.
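The contrast can be made concrete with a toy sketch (all names hypothetical). The debt-laden variant would write into another service's database directly; the clean variant keeps the interaction structure visible in the model:

```python
from types import SimpleNamespace

class InMemoryBus:
    """Toy stand-in for the message bus."""
    def __init__(self) -> None:
        self.events: list = []

    def publish(self, topic: str, payload: dict) -> None:
        self.events.append((topic, payload))

def on_order_paid(order, bus) -> None:
    # Respects the interaction structure: billing reacts to the event and
    # keeps ownership of its own schema.
    bus.publish("order.paid", {"order_id": order.id, "total": order.total})

# The architectural-debt variant would instead do something like:
#   billing_db.execute("INSERT INTO invoices (order_id, total) VALUES (?, ?)", ...)
# reaching into another service's schema and silently widening the blast
# radius of every future billing migration.

bus = InMemoryBus()
on_order_paid(SimpleNamespace(id=7, total=42), bus)
assert bus.events == [("order.paid", {"order_id": 7, "total": 42})]
```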
It tells you when to write an ADR. Can you trace a non-local effect on a governed property or on future change economics? If yes — architectural, document it. If no — implementation.
It resolves "is X architectural?" by making the answer context-dependent. "We use Redis" is architectural if Redis is enforcing a system invariant — pub/sub as the only inter-service communication path. It's not architectural if it's the implementation behind an interface that could be swapped for Memcached without non-local effects. The isolation level is architectural if your application assumes serializable and would break under read committed. Same technology, different architectural status, depending on which invariants it participates in. The element test just asks: does this element's variation predictably change a governed property or the coordination topology around it?
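A sketch of the first case, with hypothetical names: when Redis sits behind an interface like this, any conforming backend can replace it without non-local effects, so the element test scores it low. No such interface can contain Redis-as-pub/sub-backbone, because there the broker's delivery semantics are the system invariant.

```python
from __future__ import annotations
from typing import Optional, Protocol

class Cache(Protocol):
    """The contract the rest of the system depends on."""
    def get(self, key: str) -> Optional[bytes]: ...
    def set(self, key: str, value: bytes, ttl_s: int) -> None: ...

class InProcessCache:
    """Toy backend: Redis, Memcached, or this dict all satisfy Cache."""
    def __init__(self) -> None:
        self._store: dict = {}

    def get(self, key: str) -> Optional[bytes]:
        return self._store.get(key)

    def set(self, key: str, value: bytes, ttl_s: int) -> None:
        self._store[key] = value  # ttl ignored in this toy backend

# Swapping the backend changes nothing outside this module, so the choice
# fails the element test. If Redis pub/sub were the only inter-service
# communication path, no interface this narrow could hide it.
cache: Cache = InProcessCache()
cache.set("k", b"v", ttl_s=60)
assert cache.get("k") == b"v"
```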
Why the boundary is fuzzy — and why that's a result, not a bug
The whole premise of coarse-graining is scale separation: the micro-level details you've integrated out don't affect the macro-level properties you've chosen to govern. In physics, effective field theories work because scales typically separate — the atomic physics of a crystal doesn't affect its thermodynamics at room temperature. But sometimes scale separation fails. High-energy modes contaminate low-energy observables through channels the effective theory wasn't built to see. This is called IR/UV mixing, and it's structural: no amount of careful coarse-graining can prevent it, because the mixing arises from the coupling structure of the theory itself.
Software systems exhibit a structurally analogous phenomenon. Your architectural model says "the database is behind a repository abstraction, so the database choice is non-architectural." The model is internally consistent. The coarse-graining looks correct. Then your application grows, and the ORM's default transaction isolation level — a detail your model integrated out — turns out to be governing your consistency semantics. The implementation detail coupled to the system-wide property through a channel the architectural model wasn't built to represent. That's invisible coupling: the channel was always there, but the model couldn't see it.
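One way to respond, sketched with SQLAlchemy's real `isolation_level` engine parameter (the connection URL is hypothetical): promote the assumed invariant from an inherited default to an explicit, reviewable line of configuration.

```python
from sqlalchemy import create_engine

# If the application's correctness assumes serializable transactions, say so
# at the boundary instead of inheriting the driver's or database's default
# (commonly READ COMMITTED):
engine = create_engine(
    "postgresql://app@db/prod",      # hypothetical URL
    isolation_level="SERIALIZABLE",  # the invariant, made explicit
)
```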
Or: a retry policy with no exponential backoff, buried in a single service's HTTP client configuration, causes a cascade that saturates the message bus under load. That's scale-dependent coupling: the channel only exists at load levels the model wasn't tested against.
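A minimal sketch of the missing piece, assuming nothing about the service in question: capped exponential backoff with full jitter, which bounds the retry pressure any single client can put on shared infrastructure.

```python
import random

def backoff_delays(base_s: float = 0.1, cap_s: float = 30.0, attempts: int = 6):
    """Yield full-jitter delays: uniform over [0, min(cap_s, base_s * 2**n)]."""
    for attempt in range(attempts):
        ceiling = min(cap_s, base_s * (2 ** attempt))
        yield random.uniform(0.0, ceiling)

# Delays are bounded and grow (on average) with each attempt, instead of a
# fixed-rate hammer that saturates the bus exactly when it is least able
# to absorb the load:
delays = list(backoff_delays())
assert len(delays) == 6
assert all(0.0 <= d <= 30.0 for d in delays)
```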
Or: a logging format string that includes request headers leaks PII into an unencrypted log store, turning an implementation choice into a compliance violation. That's domain-crossing coupling: the channel connects a technical decision to a regulatory property through a path the technical model doesn't represent.
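Closing that channel at the boundary rather than at every call site, as a hedged sketch (the header names and logger wiring are hypothetical): a logging filter that redacts sensitive headers before any sink sees them, so the compliance property stops depending on individual log statements.

```python
import logging

SENSITIVE = {"authorization", "cookie", "x-api-key"}  # hypothetical set

class RedactHeadersFilter(logging.Filter):
    """Redact sensitive headers attached to a record before any handler runs."""
    def filter(self, record: logging.LogRecord) -> bool:
        headers = getattr(record, "headers", None)
        if isinstance(headers, dict):
            record.headers = {
                k: ("[REDACTED]" if k.lower() in SENSITIVE else v)
                for k, v in headers.items()
            }
        return True  # never drop the record, only scrub it

logger = logging.getLogger("app")
logger.addFilter(RedactHeadersFilter())
```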
In each case, a detail the coarse-grained model correctly classified as non-architectural turned out to couple to a governed property through a channel the model wasn't built to represent. This isn't accidental — it's structural. Interface contracts specify what a component does. They can't fully specify what a component is: its latency profile under contention, its failure propagation paths, its resource consumption at scale. Every architectural boundary is a lossy projection from a system's full state space onto the subset the contract exposes, and the information it loses can always couple to system-wide behavior through channels the contract doesn't represent. Ostermann et al. argued at ECOOP 2011 that this is a logical limitation of information hiding, not a practical one — one of the sharpest articulations of the point in the software literature. Spolsky named the symptom ("all non-trivial abstractions, to some degree, are leaky"); Ostermann et al. diagnosed the mechanism. Neither frames the implication for architectural demarcation: if implementation details can always potentially couple to system-wide properties, then no definition can draw a clean — one-size-fits-all — boundary between "architectural" and "non-architectural". The boundary is fuzzy not because our definitions are weak, but because the coupling structure of real systems doesn't respect the separation that definitions assume.
This is why the element test says "to the extent that" rather than "if and only if" — and why the coarse-graining frame earns its keep. It explains why architectural significance resists crisp definition: the system's coupling structure can always promote a micro-level detail to macro-level relevance through a channel invisible at the time the model was drawn. Effective theories in physics have the same structural property. Whether the underlying mechanism is formally identical across physics and software is a stronger claim I'm not making. The structural analogy is explanatory, not decorative, and that's enough.
Where this leaves us
The coarse-graining frame clarifies some things and opens others.
It helps explain the definitions debate: Bass, Fowler, Booch, and ISO 42010 may each be describing one aspect of the same operation — model, propagation symptom, change economics, governance structure — without recognizing the shared mechanism. And it helps explain why the architecture/implementation boundary resists bright lines: the coupling structure of real systems doesn't fully respect scale separation, so the element test offers a dimension to measure along, not a threshold to enforce.
What the definition offers is a mechanism that makes architectural judgment teachable: choose your concerns, model at the grain your team can reason with, identify the elements whose variation changes governed properties or future change economics, and understand that the boundary you've drawn is a function of the coupling structure you can currently see — not a permanent feature of the system. When a "non-architectural" detail causes a system-wide failure, that's not necessarily a failure of judgment. The coarse-grained model didn't fail because it was wrong — it failed because it was incomplete. That's the normal condition of effective models, and knowing it changes how you respond to surprises. It also opens a question for another day: if architectural failure is mis-coarse-graining — classifying a relevant degree of freedom as irrelevant — that should be empirically trackable. But that's another investigation.