Arshdeep Singh

Posted on Jan 17

Stop Drawing Stacks: Seeing Drupal on AWS as a Graph

#aws #drupal #graphtheory #architecture

Every architecture diagram you've ever drawn is a graph. The boxes are nodes; the arrows are edges. Yet most teams treat those diagrams as static documentation rather than a working model they can reason about mathematically. Once you start applying graph theory to how Drupal and AWS actually behave at runtime, hard problems—latency budgets, failure blast radius, refactoring priorities—become graph problems you already know how to solve.

Your platform is a graph

A graph is simply a set of nodes (vertices) connected by edges. In a Drupal-on-AWS platform:

Drupal nodes: content types, configuration entities, services in the container, event subscribers, routes, caches.
AWS nodes: VPCs, subnets, security groups, ALBs, ECS tasks, Lambda functions, RDS clusters, S3 buckets, SQS queues, external SaaS (Auth0, Salesforce).
Edges: "calls API", "publishes to queue", "allowed by security group", "replicates to region", "feeds dashboard".
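As a minimal sketch of this framing, the platform can be written down as a list of labeled edges and turned into an adjacency map. The node names (`alb`, `drupal-ecs-task`, and so on) and the `build_adjacency` helper are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical platform edges: (source, relationship, target).
EDGES = [
    ("alb", "routes to", "drupal-ecs-task"),
    ("drupal-ecs-task", "queries", "rds-cluster"),
    ("drupal-ecs-task", "publishes to queue", "sqs-queue"),
    ("sqs-queue", "triggers", "lambda-indexer"),
    ("lambda-indexer", "calls API", "salesforce"),
]

def build_adjacency(edges):
    """Map each node to the list of (relationship, target) pairs leaving it."""
    graph = defaultdict(list)
    for source, relationship, target in edges:
        graph[source].append((relationship, target))
        graph[target]  # touch the key so sink nodes appear in the map too
    return dict(graph)

graph = build_adjacency(EDGES)
for node, outgoing in graph.items():
    print(node, "->", outgoing)
```

Every later question (paths, cuts, centrality) can be asked of a structure like this one.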

Once you accept this framing, you stop asking vague questions like "Is this architecture clean?" and start asking precise questions like "What's the shortest failure path from this AWS primitive to a broken SLO?"

Drupal as a dependency graph

Drupal is usually described in terms of content types, views, and modules. Operationally, it behaves like several overlapping directed graphs.

Configuration graph

Entity types, bundles, fields, field formatters, views, and access rules form a dependency graph. Change a field storage definition and you can cascade through displays, views, REST resources, and integrations.
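One way to make that cascade concrete is a breadth-first walk over "is depended on by" edges. The sketch below assumes a hand-written map of hypothetical config names; it does not read real Drupal configuration.

```python
from collections import deque

# Hypothetical config dependencies: each key is depended on by its values.
DEPENDENTS = {
    "field.storage.node.field_price": ["field.field.node.product.field_price"],
    "field.field.node.product.field_price": [
        "core.entity_view_display.node.product.default",
        "views.view.product_catalog",
    ],
    "views.view.product_catalog": ["rest.resource.product_feed"],
    "core.entity_view_display.node.product.default": [],
    "rest.resource.product_feed": [],
}

def affected_by(change, dependents):
    """Return every config item downstream of `change`, in BFS order."""
    seen, order, queue = {change}, [], deque([change])
    while queue:
        current = queue.popleft()
        for item in dependents.get(current, []):
            if item not in seen:
                seen.add(item)
                order.append(item)
                queue.append(item)
    return order

impact = affected_by("field.storage.node.field_price", DEPENDENTS)
print(impact)
```

The length of the returned list is a rough measure of how risky a single config change is.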

Runtime call graph

The Symfony service container, event subscribers, and middleware stack define a call graph. Every HTTP request walks a specific path through this graph—touching routing, access checking, entity loading, rendering, and caching nodes in sequence.

Permission graph

Roles, permissions, and route access callbacks form yet another graph. Model "who can reach what" as directed edges and you can visualize privilege-escalation risks as unexpectedly short paths between low-privilege and high-privilege nodes.
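A rough sketch of that idea: treat roles, permissions, and routes as nodes and look for the fewest-hop path from a role to a sensitive route. The roles, permissions, and the deliberately misconfigured `role:contributor` below are hypothetical.

```python
from collections import deque

# Hypothetical "can reach" edges: role -> permission -> route.
CAN_REACH = {
    "role:anonymous": ["perm:access content"],
    "role:contributor": ["perm:access content", "perm:administer users"],
    "perm:access content": ["route:entity.node.canonical"],
    "perm:administer users": ["route:entity.user.collection"],
    "route:entity.node.canonical": [],
    "route:entity.user.collection": [],
}

def shortest_path(graph, start, goal):
    """Return the fewest-hop path from start to goal, or None if unreachable."""
    parents, queue = {start: None}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in graph.get(node, []):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None

risky = shortest_path(CAN_REACH, "role:contributor", "route:entity.user.collection")
safe = shortest_path(CAN_REACH, "role:anonymous", "route:entity.user.collection")
```

A short path from a low-privilege role to an administrative route is the kind of finding worth reviewing by hand.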

A single anonymous page view is actually a walk through all three subgraphs simultaneously. Understanding that walk is the first step to optimizing it.

AWS as an infrastructure graph

AWS architectures are already drawn as graphs; graph theory just makes the math explicit.

Network topology graph

VPCs, subnets, route tables, security groups, and NACLs form a reachability graph. Two nodes can communicate only if there's a valid path through this graph: no path, no packets.
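A simplified sketch of layered reachability: an edge is usable only when both the routing layer and the security-group layer permit it. The two edge sets are hypothetical, and each hop is treated as a direct connection between components, which glosses over ports, protocols, and NACL ordering.

```python
# Hypothetical hops: traffic needs both a route and a security-group rule.
ROUTABLE = {("alb", "drupal"), ("drupal", "rds"), ("drupal", "redis"), ("bastion", "rds")}
SG_ALLOWED = {("alb", "drupal"), ("drupal", "rds"), ("bastion", "rds")}

def can_reach(source, target, routable, sg_allowed):
    """True if a chain of hops exists that is both routable and allowed."""
    usable = routable & sg_allowed  # an edge exists only if both layers permit it
    stack, seen = [source], {source}
    while stack:
        node = stack.pop()
        if node == target:
            return True
        for a, b in usable:
            if a == node and b not in seen:
                seen.add(b)
                stack.append(b)
    return False

print(can_reach("alb", "rds", ROUTABLE, SG_ALLOWED))    # routable and allowed
print(can_reach("alb", "redis", ROUTABLE, SG_ALLOWED))  # routable but no SG rule
```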

Data-flow graph

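S3, Kinesis, SQS, SNS, Lambda, and analytics services form directed graphs of data transformations, ideally directed acyclic graphs (DAGs). Your ETL pipeline, event-driven workflows, and observability stack are usually DAGs whether you drew them that way or not; a cycle, such as a Lambda that writes back to the bucket that triggers it, is typically a bug.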
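One way to check whether a data flow really is acyclic is to attempt a topological sort. The sketch below uses the standard library's `graphlib` (Python 3.9+); the pipeline stages are hypothetical.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical pipeline: each key lists the stages it consumes from.
PIPELINE = {
    "s3-raw": [],
    "sns-upload-topic": ["s3-raw"],
    "sqs-transform-queue": ["sns-upload-topic"],
    "lambda-transform": ["sqs-transform-queue"],
    "s3-curated": ["lambda-transform"],
    "dashboard": ["s3-curated"],
}

def processing_order(pipeline):
    """Return stages in dependency order, or None if the graph has a cycle."""
    try:
        return list(TopologicalSorter(pipeline).static_order())
    except CycleError:
        return None

order = processing_order(PIPELINE)
looped = processing_order({"lambda": ["bucket"], "bucket": ["lambda"]})
```

A `None` result means the "pipeline" contains a feedback loop and is not a DAG at all.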

Service dependency graph

ECS services, Lambda functions, RDS, ElastiCache, and external APIs form a runtime dependency graph. Traces and flow logs let you infer this graph from production traffic rather than relying on outdated documentation.
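As an illustration of inferring that graph, the sketch below counts caller-to-callee edges from parent/child span links. The span record shape (`trace`, `id`, `parent`, `service`) is a simplified assumption, not the format of any particular tracing product.

```python
from collections import Counter

# Hypothetical, simplified span records; real trace formats differ.
SPANS = [
    {"trace": "t1", "id": "a", "parent": None, "service": "drupal"},
    {"trace": "t1", "id": "b", "parent": "a", "service": "redis"},
    {"trace": "t1", "id": "c", "parent": "a", "service": "rds"},
    {"trace": "t2", "id": "d", "parent": None, "service": "drupal"},
    {"trace": "t2", "id": "e", "parent": "d", "service": "rds"},
]

def infer_edges(spans):
    """Count caller -> callee service edges from parent/child span links."""
    by_id = {(s["trace"], s["id"]): s for s in spans}
    edges = Counter()
    for span in spans:
        parent = by_id.get((span["trace"], span["parent"]))
        # Skip root spans and calls that stay inside one service.
        if parent and parent["service"] != span["service"]:
            edges[(parent["service"], span["service"])] += 1
    return edges

edges = infer_edges(SPANS)
```

The counts double as edge weights: a rarely used edge and a hot edge deserve different attention.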

Four graph concepts that sharpen your thinking

1. Paths and latency

User-perceived latency is the sum of the edge weights along the path a request actually takes from browser to data and back; the shortest path through the graph is the best case. CDN, WAF, ALB, PHP-FPM, Redis, RDS: each hop adds weight.

Reducing latency means either removing nodes from the path (for example, serving from edge cache) or reducing edge weights (for example, connection pooling, read replicas). Frame every performance optimization as a path-shortening or weight-reduction exercise.
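To make the path framing concrete, here is a small sketch of Dijkstra's algorithm over a weighted graph. The latency figures are invented for illustration, not measurements.

```python
import heapq

# Hypothetical one-way latencies in milliseconds.
LATENCY_MS = {
    "browser": {"cdn": 20},
    "cdn": {"edge-cache": 1, "alb": 15},
    "edge-cache": {},
    "alb": {"php-fpm": 2},
    "php-fpm": {"redis": 1, "rds": 8},
    "redis": {},
    "rds": {},
}

def cheapest_path(graph, start, goal):
    """Dijkstra's algorithm: return (total_weight, path), or (inf, []) if unreachable."""
    heap = [(0, start, [start])]
    settled = {}
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node in settled:
            continue
        settled[node] = cost
        if node == goal:
            return cost, path
        for neighbor, weight in graph[node].items():
            if neighbor not in settled:
                heapq.heappush(heap, (cost + weight, neighbor, path + [neighbor]))
    return float("inf"), []

to_database = cheapest_path(LATENCY_MS, "browser", "rds")
to_edge = cheapest_path(LATENCY_MS, "browser", "edge-cache")
```

With these numbers, a request served from the edge cache skips three hops, which is the "remove nodes from the path" option in action.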

2. Minimum cuts and resilience

A minimum cut is the smallest set of nodes or edges whose removal disconnects two parts of the graph, such as users from the data they need. A cut of size one is a single point of failure: the lone RDS writer, the shared Redis cluster, the internal auth service everything depends on.

High-availability design is the art of making minimum cuts large and expensive. Multi-AZ RDS, stateless Drupal instances spread across Availability Zones behind a load balancer, and regional failover all increase the number of components that must fail together before users are cut off.
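A brute-force sketch for the simplest case, cuts of size one: remove each intermediate node in turn and check whether the user can still reach the data. The topology is hypothetical, and larger cuts would normally be found with a max-flow algorithm rather than this approach.

```python
# Hypothetical request path with redundant Drupal tasks but one RDS writer.
TOPOLOGY = {
    "user": ["alb"],
    "alb": ["drupal-a", "drupal-b"],
    "drupal-a": ["rds-writer"],
    "drupal-b": ["rds-writer"],
    "rds-writer": ["data"],
    "data": [],
}

def reachable(graph, source, target, removed=frozenset()):
    """Depth-first search that ignores any node listed in `removed`."""
    stack, seen = [source], {source}
    while stack:
        node = stack.pop()
        if node == target:
            return True
        for neighbor in graph.get(node, []):
            if neighbor not in seen and neighbor not in removed:
                seen.add(neighbor)
                stack.append(neighbor)
    return False

def single_points_of_failure(graph, source, target):
    """Nodes whose removal alone disconnects source from target."""
    candidates = set(graph) - {source, target}
    return sorted(n for n in candidates if not reachable(graph, source, target, {n}))

spofs = single_points_of_failure(TOPOLOGY, "user", "data")
```

Here the two Drupal tasks back each other up, so neither appears in the result; the load balancer and the database writer do.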

3. Centrality and hotspots

Betweenness centrality measures how often a node sits on the shortest paths between pairs of other nodes. High-centrality nodes are chokepoints: an API gateway every request flows through, a monolithic "integration" module in Drupal, a single SQS queue feeding multiple consumers.

Focus observability, rate limiting, and capacity planning on high-centrality nodes. Their failure has a disproportionate blast radius. If you can't eliminate centrality, at least instrument it.
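For a sense of how this could be computed without extra libraries, here is a sketch of Brandes' algorithm for unweighted directed graphs, returning unnormalized scores. The service names are hypothetical, and every node must appear as a key in the map.

```python
from collections import deque

# Hypothetical call graph: who calls whom.
CALLS = {
    "web": ["gateway"],
    "mobile": ["gateway"],
    "gateway": ["drupal", "search"],
    "drupal": ["rds"],
    "search": [],
    "rds": [],
}

def betweenness(graph):
    """Brandes' algorithm for unweighted directed graphs (unnormalized)."""
    score = {node: 0.0 for node in graph}
    for source in graph:
        order = []                              # nodes in non-decreasing distance
        preds = {node: [] for node in graph}    # shortest-path predecessors
        sigma = {node: 0 for node in graph}     # number of shortest paths from source
        dist = {node: -1 for node in graph}
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:                 # first time w is discovered
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:      # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {node: 0.0 for node in graph}
        for w in reversed(order):               # accumulate dependencies backwards
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    return score

scores = betweenness(CALLS)
hotspot = max(scores, key=scores.get)
```

In this toy graph the gateway scores highest, so it is the first candidate for extra instrumentation and rate limiting.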

4. Strongly connected components and coupling

A strongly connected component (SCC) is a maximal set of nodes in a directed graph where every node is reachable from every other. In practice, an SCC with more than one node represents a tightly coupled subsystem: Drupal plus a specific internal API, a queue, and a Lambda that all depend on each other.

Changes to one node in an SCC risk breaking the others. Identify SCCs before refactoring; break them apart by introducing explicit, versioned contracts—APIs, schemas, events—rather than implicit runtime dependencies.
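As a sketch of how SCCs might be found before a refactor, the block below implements Tarjan's algorithm recursively, which is fine for small graphs but would need an iterative rewrite for very deep ones. The dependency map is hypothetical.

```python
# Hypothetical runtime dependencies containing one cycle.
DEPENDS_ON = {
    "drupal": ["internal-api", "cdn-purge"],
    "internal-api": ["sqs-queue"],
    "sqs-queue": ["lambda-worker"],
    "lambda-worker": ["drupal"],
    "cdn-purge": [],
}

def strongly_connected_components(graph):
    """Tarjan's algorithm; returns a list of components, each a sorted list."""
    index, low, on_stack, stack, result = {}, {}, set(), [], []

    def visit(node):
        index[node] = low[node] = len(index)
        stack.append(node)
        on_stack.add(node)
        for neighbor in graph.get(node, []):
            if neighbor not in index:
                visit(neighbor)
                low[node] = min(low[node], low[neighbor])
            elif neighbor in on_stack:
                low[node] = min(low[node], index[neighbor])
        if low[node] == index[node]:        # node is the root of a component
            component = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.append(member)
                if member == node:
                    break
            result.append(sorted(component))

    for node in graph:
        if node not in index:
            visit(node)
    return result

components = strongly_connected_components(DEPENDS_ON)
coupled = [c for c in components if len(c) > 1]
```

Any component with more than one member is a candidate for an explicit contract between its parts.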

Using graph thinking day to day

Architecture reviews

Instead of subjective "Is this clean?", ask:

What's the longest path in the critical user journey?
What's the minimum cut between the user and the services the SLO depends on?
Which node has the highest centrality?

Incident analysis

Reconstruct failures as graph walks:

Which edge broke?
What alternative paths existed (or didn't)?
Which node's centrality amplified the blast radius?

Modernization roadmaps

Prioritize refactors by graph metrics:

Decompose the highest-centrality Drupal module first.
Replace a single massive integration edge with a message-driven subgraph.
Break apart the largest SCC into independent, deployable units.

Where to start

You don't need a graph database to benefit from graph thinking. Start with one exercise:

Pick a critical user journey (for example, authenticated page load, checkout, or form submission).
Sketch it as a directed graph: every service, cache, database, and external API is a node; every call or dependency is an edge.
Label edges with latency (p50, p99) and availability (historical uptime).
Identify the minimum cut and the highest-centrality node.
Use those findings to prioritize your next performance or reliability improvement.
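The labeling step above can be sketched with a plain list of edges. All figures are invented. Summing p50 values gives only a rough estimate of end-to-end latency, and multiplying availabilities assumes the hops fail independently.

```python
import math

# Hypothetical journey: (source, target, p50_ms, p99_ms, availability).
JOURNEY = [
    ("browser", "cdn", 20, 80, 0.9999),
    ("cdn", "alb", 15, 60, 0.9999),
    ("alb", "drupal", 2, 10, 0.9995),
    ("drupal", "rds", 8, 120, 0.9995),
]

def summarize(journey):
    """Return (rough p50 total, serial availability, edge with the worst p99)."""
    p50_total = sum(edge[2] for edge in journey)
    availability = math.prod(edge[4] for edge in journey)  # serial chain, independent failures
    worst = max(journey, key=lambda edge: edge[3])
    return p50_total, availability, (worst[0], worst[1])

p50_total, availability, worst_edge = summarize(JOURNEY)
print(f"p50 ~{p50_total} ms, availability ~{availability:.4%}, worst p99 on {worst_edge}")
```

Even with every hop at 99.95% or better, the chain as a whole is less available than its weakest edge, which is one reason long serial paths deserve scrutiny.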

Once you see your Drupal-on-AWS platform as a graph, decisions get crisper. You stop guessing at complexity and start operating on the structure that's actually there.
