Bala Paranj

Posted on Jul 1

Microsegmentation is a Workaround for a Missing Application Map

#cloudsecurity #networking #security #architecture

✓ Human-authored analysis; AI used for formatting and proofreading.

Microsegmentation is the network equivalent of least privilege. It fails for the same reason.

The principle says: only allow the network flows required for the application to function. The word "required" does all the work in that sentence. Nobody defines it. So the industry falls back on the same proxy it uses for IAM: compare what's allowed against what's observed, call the difference unnecessary, and generate findings nobody acts on.

If you've read "Least Privilege is a Workaround for a Missing Specification," this will feel familiar. The diagnosis is the same. The missing artifact is different. The consequences are identical.

The Principle and Its Hidden Assumption

Zero Trust networking says: deny all traffic by default, then explicitly allow only the flows the application needs. Every framework (NIST SP 800-207, CSA's Zero Trust guidance, every cloud provider's best-practices documentation) endorses it. The principle is correct. The implementation is structurally broken.

A developer knows: "The frontend needs to query the search API." But implementing that knowledge in the network layer requires translating it into CIDR blocks, security group IDs, port ranges, and protocol specifications. The intent ("frontend talks to search") is a relationship between two application components. The implementation (sg-0a1b2c3d allows TCP port 8080 from sg-4e5f6a7b) is a relationship between two network constructs.

The translation from application relationship to network rule happens in the developer's head, gets implemented once, and is never updated when the application architecture changes. The security group stays. The application evolves. The gap between them is the network equivalent of privilege creep. It's just as inevitable.

The Three Mismatches

Mismatch 1: Abstraction

Developers think in services. Networks think in IP addresses, ports, and protocols.

"The order service calls the payment service" is a statement about application architecture. Implementing it as a security group rule requires knowing which instances run the payment service, which port it listens on, which protocol it uses, whether it's behind a load balancer, whether the load balancer terminates TLS, and whether the backend port differs from the frontend port. A single application relationship generates five to fifteen network rules depending on the architecture.
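To make the fan-out concrete, here is a minimal sketch of how one relationship expands once a load balancer sits in the path. The service names, ports, and `sg-` group names are hypothetical, and the sketch assumes egress is restricted as well as ingress, so every hop needs a rule on both sides. Health checks, multiple subnets, or extra listeners would add more rules on top of these.

```python
# Hypothetical relationship: "order service calls payment service" through a
# TLS-terminating load balancer. All names and ports are illustrative.
RELATIONSHIP = {
    "source": "order-service",
    "load_balancer": "payment-lb",
    "target": "payment-service",
    "frontend_port": 443,   # port the load balancer listens on
    "backend_port": 8080,   # port the payment instances listen on
}

def expand(rel):
    """Expand one application relationship into per-hop security group rules."""
    hops = [
        (rel["source"], rel["load_balancer"], rel["frontend_port"]),
        (rel["load_balancer"], rel["target"], rel["backend_port"]),
    ]
    rules = []
    for src, dst, port in hops:
        # Assumes egress is restricted too, so each hop needs a rule on both sides.
        rules.append((f"sg-{src}", "egress", f"sg-{dst}", "tcp", port))
        rules.append((f"sg-{dst}", "ingress", f"sg-{src}", "tcp", port))
    return rules

rules = expand(RELATIONSHIP)
for rule in rules:
    print(rule)
print(f"1 relationship -> {len(rules)} rules")
```

Even this simplified case turns a single sentence of intent into four rules across three security groups.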

Nobody maintains this translation manually at scale. So teams take the same shortcut IAM teams take: allow broad access within the VPC and move on. "Allow all traffic within the VPC" is the network equivalent of PowerUserAccess. It works, it's wrong, and it persists because the alternative requires a translation layer that doesn't exist.

Mismatch 2: Time

Security groups are configured at deployment time. Application dependencies change continuously. A new microservice is added. An old integration is retired. A database is migrated to a managed service. A third-party API replaces an internal one.

Each change should update the network rules. None of them do, because the mapping between application components and network constructs isn't maintained. The security group that allowed traffic to the old database still allows traffic to the old database. The old database may still be running (resource sprawl). Its address may have been reassigned to a different service (IP reuse). Either way, the security group rule is stale, and nothing signals that it's stale.

The industry's answer: analyze VPC Flow Logs, find which security group rules correspond to observed traffic, and flag rules with no matching traffic as unused. This is the same backward-looking proxy IAM uses — what's allowed versus what's observed. It has the same flaw. A rule with no traffic in the last 90 days might be the path used for annual database replication, quarterly disaster recovery testing, or the failover route that only activates during an outage. Remove it based on "no traffic observed" and you discover its purpose during the next disaster — when the failover path is closed.

Mismatch 3: Composition

Individual security group rules are evaluated in isolation. Compound network paths are evaluated by nobody.

Security group A allows traffic from the public subnet to the application tier. Security group B allows traffic from the application tier to the database tier. Each rule was reviewed and approved individually. Together, they create a two-hop path from the public internet to the database. The compound path was never evaluated because security groups are evaluated per-resource, not per-path.

This is the same composition mismatch that makes IAM least privilege fail. In IAM, individual permissions look reasonable; compound trust chains create attack paths. In networking, individual security group rules look reasonable. Compound network paths create lateral movement opportunities. Per-resource evaluation approves every link in the chain while missing the chain itself.

In environments with hundreds of security groups, thousands of rules, and multiple VPCs with peering connections, compound network paths can number in the hundreds of thousands. VPC Flow Logs show which paths carry traffic. They don't show which paths could carry traffic. The allowed-but-unused paths are the attacker's map.
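Per-path evaluation is a graph problem. The sketch below treats each approved rule as a directed edge and runs a breadth-first search to list every segment reachable from a starting point, along with the chain of individually approved rules that gets there. The segment names and the rule list are hypothetical; a real implementation would build the edges from security group, NACL, and routing data.

```python
from collections import deque

# Hypothetical allow-rules, one per approved security group rule:
# (source segment, destination segment).
ALLOWED = [
    ("public-subnet", "app-tier"),   # security group A
    ("app-tier", "db-tier"),         # security group B
    ("app-tier", "cache"),
]

def reachable_paths(allowed, start):
    """Breadth-first search over allow-rules; returns the shortest allowed path to each segment."""
    graph = {}
    for src, dst in allowed:
        graph.setdefault(src, []).append(dst)
    paths = {start: [start]}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in paths:
                paths[nxt] = paths[node] + [nxt]
                queue.append(nxt)
    del paths[start]  # the start segment trivially reaches itself
    return paths

paths = reachable_paths(ALLOWED, "public-subnet")
for target, path in sorted(paths.items()):
    print(f"{target}: {' -> '.join(path)}")
```

No single rule mentions both the public subnet and the database tier, yet the search reports the two-hop path between them.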

The Symptom Treatment Industry

The network security industry has built the same symptom-treatment ecosystem that the IAM industry built:

Flow log analyzers compare allowed security group rules against observed traffic patterns. They report: "This security group has 47 rules. Only 12 had matching traffic in the last 90 days. The other 35 are unnecessary." The same problem as IAM: "unnecessary" means "not recently used," which is a proxy for "not needed." The proxy is wrong for the same reason. It mistakes history for intent.

Network visualization tools draw maps of observed traffic flows. Beautiful graphs showing which services talk to which services. The maps are descriptive, not prescriptive. They show what is happening, not what should happen. The map of current traffic is not the specification of intended architecture. A service that hasn't communicated in 90 days might be dormant, deprecated, or critical-but-infrequent. The map can't tell you which.

Automated security group tightening restricts rules based on observed traffic patterns. This is the network equivalent of automated IAM permission removal. It causes the same outages. A security group rule is tightened to only allow observed ports. A quarterly batch process that uses a different port fails. The team adds the port back, loses trust in the automation, and creates an exception. Eventually every critical system has an exception. The automation governs only the systems that don't matter.

Cloud security posture management checks for obviously wrong configurations: security groups allowing 0.0.0.0/0 on port 22, VPCs with no network ACLs, public subnets with direct internet gateway routes. These are valuable checks. But they're the equivalent of checking if the front door is locked. They don't evaluate whether the internal doors between rooms should be open or closed, because that requires knowing the building's purpose.

The Missing Artifact

The artifact that would make microsegmentation automatic is a machine-readable application dependency map — a declaration of which application components need to communicate, over which protocols, and why.

The flow log exists. It answers: "Which instances are talking to which instances?"

The security group configuration exists. It answers: "Which security group rules allow which traffic?"

The application dependency map would answer: "Which application components NEED to communicate, and why?" It doesn't exist as a machine-readable specification.

APPLICATION DEPENDENCY MAP (the missing artifact):

  frontend:
    talks_to:
      - service: search-api
        protocol: https
        reason: "User search queries"
      - service: auth-service
        protocol: https
        reason: "Session validation"

  search-api:
    talks_to:
      - service: elasticsearch
        protocol: tcp:9200
        reason: "Index queries"

  order-service:
    talks_to:
      - service: payment-gateway
        protocol: https
        reason: "Payment processing"
      - service: inventory-db
        protocol: tcp:5432
        reason: "Stock level checks"
    never_talks_to:
      - service: user-db
        reason: "No business reason — PCI scope boundary"

If this map existed, the security group rules would be derived from it — automatically, correctly, and maintainably. The developer declares the application relationships. The system translates them into security group rules with the correct CIDR blocks, ports, and protocols. The verification layer compares the deployed security groups against the derived rules and reports any deviation.
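As a rough illustration of the derivation step, the sketch below rewrites the map above as a Python dict and turns each declared edge into a service-level rule. The dict encoding, the `https` to TCP 443 default, and the function names are assumptions made for this example. Resolving service names to actual security group IDs or CIDR blocks is left out because it depends on the deployment.

```python
# The dependency map from the text, encoded as a Python dict (an assumption;
# the article shows it as YAML-like text).
APP_MAP = {
    "frontend": {"talks_to": [
        {"service": "search-api", "protocol": "https", "reason": "User search queries"},
        {"service": "auth-service", "protocol": "https", "reason": "Session validation"},
    ]},
    "search-api": {"talks_to": [
        {"service": "elasticsearch", "protocol": "tcp:9200", "reason": "Index queries"},
    ]},
    "order-service": {
        "talks_to": [
            {"service": "payment-gateway", "protocol": "https", "reason": "Payment processing"},
            {"service": "inventory-db", "protocol": "tcp:5432", "reason": "Stock level checks"},
        ],
        "never_talks_to": [
            {"service": "user-db", "reason": "No business reason (PCI scope boundary)"},
        ],
    },
}

def parse_protocol(protocol):
    """Map a declared protocol to (ip_protocol, port). 'https' -> TCP 443 is an assumed default."""
    if protocol == "https":
        return ("tcp", 443)
    proto, port = protocol.split(":")
    return (proto, int(port))

def derive_rules(app_map):
    """Return a set of (source, destination, ip_protocol, port) rules, one per declared edge."""
    rules = set()
    for source, spec in app_map.items():
        forbidden = {entry["service"] for entry in spec.get("never_talks_to", [])}
        for edge in spec.get("talks_to", []):
            if edge["service"] in forbidden:
                raise ValueError(f"{source} both talks_to and never_talks_to {edge['service']}")
            proto, port = parse_protocol(edge["protocol"])
            rules.add((source, edge["service"], proto, port))
    return rules

rules = derive_rules(APP_MAP)
for rule in sorted(rules):
    print(rule)
```

The `never_talks_to` entries generate no rules of their own under default-deny; here they act as a consistency check on the declarations.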

The reason field in the map is Chesterton's Fence made machine-readable. G.K. Chesterton's principle states: don't remove a fence until you understand why it was built. Every unused security group rule is a fence whose purpose has been forgotten. In a traditional security group rule, the reason for the rule's existence is a description field that says "Ticket #1234". When the engineer who created the rule leaves the company, the context leaves with them. Nobody dares delete the rule because nobody knows why it exists. Chesterton's Fence enforced by ignorance rather than by design.

The reason field in the application map inverts this: a rule with a documented reason can be evaluated ("Is this reason still valid?"), and a rule whose reason no longer applies can be removed with confidence. A rule without a reason can only be evaluated by observing traffic, which is the same backward-looking proxy, and it is insufficient. The 5,000 unused network paths stay open because each one is a Chesterton's Fence and nobody recorded why it was built.

The irony is that much of this map already exists — just not where security tooling looks for it. Internal Developer Portals and service catalogs like Backstage, Cortex, and OpsLevel maintain service ownership, dependency graphs, API contracts, and communication patterns. Development teams document which services talk to which services, over which protocols, for what purpose. The data that would drive microsegmentation enforcement is sitting in a developer portal being used for onboarding documentation and incident routing. Nobody connects it to the security groups. The application map exists as a wiki page. It should exist as a policy input. The data is there. The enforcement loop is missing.
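One way to picture catalog data as a policy input is the sketch below. It assumes entities shaped like Backstage Component descriptors, where `spec.dependsOn` lists entity references of the form `[kind:][namespace/]name`; the entities themselves are invented for illustration. The relation carries no protocol or reason, so those fields would still have to be declared somewhere.

```python
# Hypothetical catalog entities, shaped like Backstage Component descriptors.
ENTITIES = [
    {"kind": "Component", "metadata": {"name": "frontend"},
     "spec": {"dependsOn": ["component:default/search-api", "component:auth-service"]}},
    {"kind": "Component", "metadata": {"name": "order-service"},
     "spec": {"dependsOn": ["resource:default/inventory-db"]}},
]

def ref_name(ref):
    """Extract the name from an entity reference of the form [kind:][namespace/]name."""
    return ref.rpartition(":")[2].rpartition("/")[2]

def dependency_edges(entities):
    """Return (source, destination) edges from each entity's declared dependencies."""
    edges = []
    for entity in entities:
        source = entity["metadata"]["name"]
        for ref in entity.get("spec", {}).get("dependsOn", []):
            edges.append((source, ref_name(ref)))
    return edges

edges = dependency_edges(ENTITIES)
for source, destination in edges:
    print(f"{source} -> {destination}")
```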

Stewart Brand's Shearing Layers

As with IAM, microsegmentation couples things that change at different rates into a single artifact — the security group.

Application architecture (which services exist and how they connect)
  → changes monthly — new microservices, retired integrations

Deployment topology (which instances run which services, which subnets they're in)
  → changes weekly — scaling events, redeployments, migrations

Network rules (which security group rules allow which traffic)
  → changes at provisioning time — and then never again

The security group tries to express application architecture through network-layer constructs. The application architecture changes monthly. The security group doesn't. The gap between them is microsegmentation drift. A structural consequence of coupling things that change at different rates.

Declared dependencies separate these layers:

Layer 1: APPLICATION MAP (changes when architecture changes)
  "Frontend talks to search-api over HTTPS"
  → changes when services are added or retired
  → owned by the team that builds the application

Layer 2: DERIVED RULES (changes automatically when Layer 1 changes)
  Security group rules, NACLs, and firewall policies derived from the map
  → computed, not hand-authored
  → updates automatically when Layer 1 changes

Layer 3: DEPLOYED CONFIGURATION (verified on every snapshot)
  The actual security groups, NACLs, and routing tables
  → verified against Layer 2 on every snapshot
  → any deviation = finding, not "drift"

When the architecture changes (new microservice added), the team updates Layer 1 — the application map. Layer 2 recomputes the security group rules. Layer 3 is verified against Layer 2. The network rules track the architecture. No drift, because the layers are separated and each one changes at its own rate.
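The loop between the three layers can be sketched in a few lines. The map contents, the deployed snapshot, and the finding labels below are hypothetical; the point is only that Layer 2 is a pure function of Layer 1, and Layer 3 is checked by set difference against Layer 2.

```python
def derive(app_map):
    """Layer 2: compute (source, destination, port) rules from the Layer 1 map."""
    return {(src, dst, port) for src, targets in app_map.items() for dst, port in targets}

def verify(derived, deployed):
    """Layer 3 check: every deviation from the derived rules is a finding."""
    findings = [("undeclared", rule) for rule in sorted(deployed - derived)]
    findings += [("missing", rule) for rule in sorted(derived - deployed)]
    return findings

# Layer 1 (hypothetical): service -> list of (destination, port).
app_map = {
    "frontend": [("search-api", 443)],
    "search-api": [("elasticsearch", 9200)],
}
# Layer 3 (hypothetical snapshot): includes one hand-added rule.
deployed = {
    ("frontend", "search-api", 443),
    ("search-api", "elasticsearch", 9200),
    ("frontend", "elasticsearch", 9200),
}

before = verify(derive(app_map), deployed)
print(before)

# Architecture change: a new microservice is declared in Layer 1 only.
app_map["frontend"].append(("recommendations", 443))
after = verify(derive(app_map), deployed)
print(after)
```

The hand-added rule shows up as an undeclared finding in both runs. The new dependency shows up as missing until the deployed configuration catches up with the map.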

The CDK Parallel

Just as AWS CDK's grantRead() method translates IAM intent into IAM policy, CDK's networking constructs partially address this for networks. A developer can write service.connections.allowFrom(otherService, Port.tcp(443)) — expressing a relationship between services rather than between security group IDs.

And just as with IAM, the industry rejected this model in favor of raw Terraform aws_security_group_rule resources — because security tooling (Checkov, tfsec, Rego policies) analyzes the mechanism (CIDR blocks and port ranges), not the intent (service relationships). Part of this is organizational. Security teams and platform teams are siloed. The security team wants to audit a "firewall rule" they can read. They don't trust the code that generates it. The result: the team that understands the application relationships (platform/dev) doesn't control the security groups, and the team that controls the security groups (security) doesn't understand the application relationships. The gap between intent and mechanism runs through the org chart and the tooling.

Partial Solutions the Industry Has Attempted

The industry has not ignored this problem. Three approaches partially address it. Each one illustrates why the full solution remains missing.

Service mesh (Istio, Linkerd, Consul) implements the missing artifact at the application layer. In an Istio AuthorizationPolicy, a developer declares "App A can call /pay on App B" — the relationship-level specification this article describes. But service mesh operates above the network layer. The underlying cloud infrastructure — security groups, NACLs, VPC routing — is usually left wide open ("allow all within the cluster") because the service mesh handles enforcement at L7. The application-layer map exists. The network-layer microsegmentation doesn't. The intent is declared for the mesh but not for the infrastructure underneath it. Two layers, only one has the specification.

Identity-based networking (SPIFFE/SPIRE) replaces IP addresses with cryptographic identities. If the firewall understands "Service A" instead of "IP 10.0.1.5," the abstraction mismatch from Mismatch 1 dissolves — the developer thinks in services, the network rule references services. This is the industry's official answer to Mismatch 1. But SPIFFE solves the naming problem, not the specification problem. The firewall now references service identities instead of IPs. But the map of which identity should talk to which identity is still derived from observed traffic, not declared from architectural intent. Better names on the same missing specification.

Kubernetes NetworkPolicies are the closest to the machine-readable manifest we need. A NetworkPolicy resource declares which pods can communicate with which pods, by label selector and port. It's declarative, it's machine-readable, and it lives in the same repository as the deployment manifests. The problem: many organizations find NetworkPolicies so difficult to author and maintain that they use flow log analyzers to generate them. The tool observes traffic, produces a NetworkPolicy that matches observed patterns, and the team applies it. The specification, which was supposed to declare architectural intent, is derived from the backward-looking traffic proxy. The manifest exists. It's generated from the wrong source. The circular problem: the industry built the artifact, then populated it from the same symptom data.
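Generating the manifest from declared intent instead of observed traffic could look like the sketch below. It builds a `networking.k8s.io/v1` NetworkPolicy as a plain dict from one declared edge. The helper name, the `app` label convention, and the `default` namespace are assumptions for illustration.

```python
import json

def network_policy(target, callers, port, namespace="default"):
    """Build an ingress NetworkPolicy for `target` that admits only the declared callers."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": f"{target}-ingress", "namespace": namespace},
        "spec": {
            # Assumes pods carry an `app` label matching the service name.
            "podSelector": {"matchLabels": {"app": target}},
            "policyTypes": ["Ingress"],
            "ingress": [{
                "from": [{"podSelector": {"matchLabels": {"app": c}}} for c in callers],
                "ports": [{"protocol": "TCP", "port": port}],
            }],
        },
    }

# Declared edge from the map: search-api talks to elasticsearch on tcp:9200.
policy = network_policy("elasticsearch", ["search-api"], 9200)
print(json.dumps(policy, indent=2))
```

The input is the declared relationship, so the policy stays the same whether or not the traffic happened to occur during an observation window.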

The Macro-Segmentation Fallback

Because the application dependency map doesn't exist, organizations settle for the only map they can maintain manually: macro-segmentation. Production versus development. Public subnet versus private subnet. Application tier versus database tier. These coarse boundaries survive because they're simple enough for humans to reason about and stable enough that they don't need frequent updates.

Macro-segmentation is the acknowledgment that microsegmentation is unachievable without the application map. The granularity that Zero Trust demands (per-service, per-flow) requires a specification that tracks at per-service, per-flow granularity. The specification doesn't exist, so the segmentation stays at the only granularity humans can maintain: per-environment, per-tier. The "micro" in microsegmentation remains an aspiration documented in security architectures and absent from deployed configurations.

The "Flow Logs Tell Us Enough" Fallacy

The most common objection: "We have VPC Flow Logs. We can see what's talking to what. That is the application map."

Flow logs are a map of observed traffic. They are not a map of intended traffic. The distinction matters in three ways:

Absence is ambiguous. A flow that doesn't appear in 90 days of logs could be: deprecated (safe to remove), seasonal (runs quarterly), emergency-only (disaster recovery), or not-yet-needed (feature launching next month). Flow logs can't distinguish between these. Only the application map, the declaration of intent, can distinguish them.

Presence is not approval. A flow that appears in logs could be: intended (legitimate application communication), lateral movement (attacker traversing the network), misconfiguration (service talking to the wrong endpoint), or legacy (old integration nobody remembered to decommission). Flow logs show that traffic happened. They don't show whether it should have happened.

Compound paths are invisible. Flow logs show individual connections: A talked to B, B talked to C. They don't show that A can reach C through B — the compound path. The security group rules allow A→B and B→C, which means A→C is possible. But if A never talked to C, the flow logs show no record. The allowed-but-unused compound path is the one the attacker will use.
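The first two distinctions only become computable once a declaration exists to compare against. In the sketch below, the declared edges, their reasons, and the observed flows are all hypothetical. With both sets in hand, each flow falls into one of three buckets that flow logs alone cannot produce.

```python
# Hypothetical declared edges with their reasons (the application map).
DECLARED = {
    ("frontend", "search-api"): "User search queries",
    ("order-service", "payment-gateway"): "Payment processing",
    ("backup-agent", "inventory-db"): "Quarterly disaster recovery test",
}
# Hypothetical edges seen in 90 days of flow logs.
OBSERVED = {
    ("frontend", "search-api"),
    ("order-service", "payment-gateway"),
    ("frontend", "inventory-db"),  # appears in logs, but nobody declared it
}

def classify(declared, observed):
    """Split edges by comparing observed traffic against declared intent."""
    declared_edges = set(declared)
    return {
        "intended": sorted(observed & declared_edges),
        "undeclared": sorted(observed - declared_edges),
        "declared_but_quiet": sorted(declared_edges - observed),
    }

result = classify(DECLARED, OBSERVED)
for bucket, edges in result.items():
    print(bucket, edges)
for edge in result["declared_but_quiet"]:
    print(f"keep {edge}: {DECLARED[edge]}")
```

The quiet disaster recovery edge is kept because its reason is on record. The undeclared edge is flagged even though the traffic really happened.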

The Identical Pattern

The IAM article described this progression:

1. Principle assumes "needed" is defined (it isn't)
2. Industry compares what's granted against what's used
3. Generates findings nobody acts on
4. Missing artifact: intent specification

Network security follows the same progression exactly:

1. Zero Trust assumes "required flows" are defined (they aren't)
2. Industry compares what's allowed against what's observed
3. Generates findings nobody acts on (5,000 unused network paths)
4. Missing artifact: application dependency map

Same structural gap. Same symptom treatment. Same missing artifact. Different domain.

The fix is the same: declare what should be true (the application map), derive the implementation from the declaration (security group rules from service relationships), and verify the deployed configuration against the derived rules (snapshot comparison). The declaration is the artifact that makes everything else work for network security just as for IAM.

The Path Forward

Start with the application boundary that matters most: the path from the public internet to the database tier. Declare which services are in that path. Declare which connections are intended. Derive the security group rules. Compare against what's deployed. Fix the delta.

The application map doesn't need to cover every service on day one. It needs to cover the paths an attacker would traverse. Those paths are finite, knowable, and declarable. Each declaration is a ratchet — once the intended communication is specified, any unauthorized path is detectable, actionable, and fixable.

The same infrastructure-as-code challenge applies: current IaC is mechanistic about network rules, not semantic about application relationships. The same partial solution exists in CDK and was rejected for the same reason. The same honest challenge applies: the specification must be simpler than the security group rules it replaces, or developers will copy-paste CIDRs into the map and the problem reproduces itself.

And the same closing truth: microsegmentation is not the goal. It's the symptom of not having declared the application's communication architecture. Declare the architecture, and microsegmentation becomes automatic. Without the declaration, it remains what it has always been: a principle everyone endorses and nobody achieves.

DEV Community