T2C for tsquaredc

AWS Networking, End to End: a production blueprint with diagrams and checklists

Start from a real issue, then shape the baseline. In one T2C migration, missing NAT Gateways in a multi-AZ VPC forced egress traffic across zones and raised transfer costs by about 20 percent. Two lines of Terraform fixed it, but a stronger baseline would have prevented it.

Groundwork before the first VPC

Lay out accounts, space, and automation so later growth is straightforward.

1) Accounts and guardrails

Make isolation and ownership explicit.

  • Use AWS Organizations for shared networking, platform, and app accounts

  • Keep separate log archive and security accounts

  • Tag owner, environment, and service everywhere
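
A tagging rule is easy to enforce once it is written as a check. The sketch below is one minimal way to do that in Python. The lowercase tag keys and the `vpc_tags` sample are assumptions for illustration, not a prescribed schema.

```python
# Required tag keys, taken from the list above (lowercase keys are an assumption)
REQUIRED_TAGS = ("owner", "environment", "service")


def missing_tags(resource_tags, required=REQUIRED_TAGS):
    """Return the required tag keys that are absent or blank, in order."""
    return [key for key in required if not resource_tags.get(key, "").strip()]


# Hypothetical tags as they might appear in a plan or inventory export
vpc_tags = {"owner": "platform-team", "environment": "prod"}
print(missing_tags(vpc_tags))  # ['service']
```

Run against every resource in a plan, a non-empty result is a reason to fail the pipeline.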

2) Space and mapping

Decide CIDRs and record AZ mappings early.

  • Non-overlapping CIDRs with room for summarization

  • AZ name-to-ID mappings per account saved in code, because the same AZ name can point to different physical zones in different accounts

  • Plan for IPv6 rather than retrofit it later
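
The CIDR rules above can be checked before anything is deployed. This is a small sketch using Python's standard `ipaddress` module. The `plan` values and the `10.0.0.0/14` supernet are hypothetical and only illustrate the idea of leaving room for summarization.

```python
import ipaddress
from itertools import combinations


def find_overlaps(cidrs):
    """Return every pair of CIDR blocks that overlap each other."""
    nets = [ipaddress.ip_network(c) for c in cidrs]
    return [(str(a), str(b)) for a, b in combinations(nets, 2) if a.overlaps(b)]


# Hypothetical plan: one /16 per environment, all inside one summarizable /14
plan = {"prod": "10.0.0.0/16", "staging": "10.1.0.0/16", "dev": "10.2.0.0/16"}
supernet = ipaddress.ip_network("10.0.0.0/14")

overlaps = find_overlaps(plan.values())
all_in_supernet = all(
    ipaddress.ip_network(cidr).subnet_of(supernet) for cidr in plan.values()
)
print(overlaps, all_in_supernet)  # [] True
```

Keeping every VPC inside one supernet means a single route can describe the whole estate later.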

3) Everything as code

Make changes reviewable and reproducible.

  • VPCs, subnets, and DNS written as IaC

  • Required code review and CI checks for merges

Mini checklist

  • Orgs and baseline accounts created

  • CIDR and IPv6 choices documented

  • AZ mappings captured

  • Pipelines and tags ready

What you should remember

  • Early structure simplifies future change

  • IaC plus tags make audits and handovers faster

VPC blueprint

Keep the shape identical across services for clarity and fault isolation.

1) Tiers and routes

Separate entry, app, and data traffic.

  • Public, private application, and private data subnets in each AZ

  • One route table per tier

  • Security groups for fine rules, network ACLs for coarse controls
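
To keep the shape identical across VPCs, the subnet layout can be generated rather than typed by hand. The sketch below assumes equally sized /20 subnets and the tier names `public`, `app`, and `data`. Both are illustrative choices, and the AZ IDs are examples.

```python
import ipaddress


def carve_subnets(vpc_cidr, az_ids, tiers=("public", "app", "data"), new_prefix=20):
    """Assign one equally sized subnet to each (AZ, tier) pair, in order."""
    blocks = list(ipaddress.ip_network(vpc_cidr).subnets(new_prefix=new_prefix))
    pairs = [(az, tier) for az in az_ids for tier in tiers]
    if len(pairs) > len(blocks):
        raise ValueError("VPC CIDR is too small for this layout")
    return {pair: str(block) for pair, block in zip(pairs, blocks)}


# Example AZ IDs; use the mapping recorded for your own accounts
layout = carve_subnets("10.0.0.0/16", ["use1-az1", "use1-az2"])
for (az, tier), cidr in layout.items():
    print(az, tier, cidr)
```

With a /16 and /20 subnets this uses 6 of the 16 available blocks, which leaves space for a third AZ or an extra tier.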

2) NAT and endpoints

Remove single points of failure and keep traffic private.

  • NAT Gateways per AZ

  • Gateway endpoints for S3 and DynamoDB

  • Interface endpoints for ECR, Secrets Manager, and vendor services

  • VPC Flow Logs to S3 or CloudWatch Logs
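
The migration issue from the introduction is the kind of thing a small check can catch. The sketch below works on a simplified, hypothetical inventory (subnet name, AZ, and the NAT Gateway on its default route) rather than on real AWS API output, and flags private subnets whose egress leaves their own AZ.

```python
def cross_az_nat_routes(private_subnets, nat_gateway_az):
    """Flag private subnets with no NAT Gateway or with a NAT in another AZ."""
    findings = []
    for subnet in private_subnets:
        nat_az = nat_gateway_az.get(subnet.get("default_route_nat"))
        if nat_az is None:
            findings.append((subnet["name"], "no NAT Gateway on default route"))
        elif nat_az != subnet["az"]:
            findings.append((subnet["name"], f"routes via NAT in {nat_az}"))
    return findings


# Hypothetical inventory: NAT Gateway ID -> AZ, plus the private subnets
nat_gateway_az = {"nat-a": "use1-az1", "nat-b": "use1-az2"}
private_subnets = [
    {"name": "app-az1", "az": "use1-az1", "default_route_nat": "nat-a"},
    {"name": "app-az2", "az": "use1-az2", "default_route_nat": "nat-a"},
    {"name": "app-az3", "az": "use1-az3", "default_route_nat": None},
]
for name, problem in cross_az_nat_routes(private_subnets, nat_gateway_az):
    print(name, problem)
```

Here `app-az2` is flagged for crossing zones and `app-az3` for having no NAT Gateway at all.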

3) VPC with subnet tiers

The diagram for this section shows:

  • Two or more AZs, three tiers per AZ

  • IGW on public tier, NAT per AZ

  • Gateway and interface endpoints highlighted

Mini checklist

  • Two or more AZs

  • Three tiers per AZ

  • NAT per AZ, endpoints present

  • Flow Logs enabled

What you should remember

  • Tiers plus per AZ NAT keep paths short and predictable
  • Endpoints reduce egress and exposure

Connecting VPCs and accounts

Scale connectivity without a peering mesh.

1) Transit Gateway as hub

Centralize, separate, and grow.

  • TGW for cross-account and cross-region scale
  • Route tables per environment to isolate production
  • Land VPN or Direct Connect on the hub
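
Per-environment route tables only isolate production if nothing in them points at the wrong attachment. The sketch below checks a simplified model of that rule. The route table names, attachment names, and the `shared` label for the on-premises edge are all hypothetical.

```python
def isolation_violations(route_tables, attachment_env):
    """List routes that point at an attachment from a different environment."""
    violations = []
    for table_name, table in route_tables.items():
        for cidr, attachment in table["routes"].items():
            target_env = attachment_env[attachment]
            # Shared attachments (for example VPN or Direct Connect) are allowed
            if target_env not in (table["env"], "shared"):
                violations.append((table_name, cidr, attachment))
    return violations


# Hypothetical model of a hub with one route table per environment
attachment_env = {
    "vpc-prod-app": "prod",
    "vpc-dev-app": "nonprod",
    "vpn-onprem": "shared",
}
route_tables = {
    "rt-prod": {
        "env": "prod",
        "routes": {"10.0.0.0/16": "vpc-prod-app", "192.168.0.0/16": "vpn-onprem"},
    },
    "rt-nonprod": {
        "env": "nonprod",
        "routes": {"10.2.0.0/16": "vpc-dev-app", "10.0.0.0/16": "vpc-prod-app"},
    },
}
print(isolation_violations(route_tables, attachment_env))
```

In this sample the non-prod table has a route into the production VPC, which is the one violation reported.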

2) Sharing and Zero Trust

Split roles and keep policies tight.

  • Use RAM-based VPC sharing (AWS Resource Access Manager) when the platform team owns networking
  • Narrow routes and identity-first access

3) TGW hub and spoke

  • App VPC spokes attach to the TGW hub
  • Separate route tables for prod and non-prod
  • Optional on-premises edge

Mini checklist

  • TGW chosen, routes segmented
  • Peering used only for simple pairs
  • RAM sharing rules defined

What you should remember

  • Hub with separate route tables avoids path sprawl
  • Clear ownership speeds safe changes

Ingress, egress, and service to service

Pick entry and exit points, then standardize internal policy.

1) Ingress

Match features to needs.

  • ALB for HTTP or HTTPS with WAF and path rules
  • NLB for TCP or TLS pass through
  • Gateway Load Balancer for inline security tools
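
The matching rules above are simple enough to write down as a decision function, which makes the choice reviewable. This is only a sketch: the function and parameter names are made up for illustration, and real selection usually weighs more factors.

```python
def pick_load_balancer(protocol, needs_l7_features=False, inline_inspection=False):
    """Map traffic needs to a load balancer type, following the list above."""
    if inline_inspection:
        return "GWLB"  # traffic must pass through inline security tools
    if protocol.upper() in ("HTTP", "HTTPS") and needs_l7_features:
        return "ALB"  # WAF, path rules, and other layer 7 features
    return "NLB"  # TCP or TLS pass-through


print(pick_load_balancer("https", needs_l7_features=True))  # ALB
print(pick_load_balancer("tcp"))                            # NLB
print(pick_load_balancer("tcp", inline_inspection=True))    # GWLB
```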

2) Egress and private service access

Keep AWS traffic on private links.

  • NAT per AZ for IPv4, egress-only internet gateway for IPv6
  • Gateway and interface endpoints for common services

3) Service to service

Keep relations explicit.

  • Security groups between services
  • VPC Lattice or App Mesh when mesh-wide policy is needed

Mini checklist

  • Correct load balancer type selected
  • NAT per AZ, endpoints configured
  • Service relations documented in security groups

What you should remember

  • Standard ingress and egress reduce one off fixes
  • Private access lowers risk and cost

DNS and Route 53

Use DNS as a safety and rollout tool.

1) Zones and scope

Separate audiences cleanly.

  • Public zones for external endpoints
  • Private zones for internal services across multiple VPCs

2) Routing policies

Ship changes gradually and fail over safely.

  • Weighted records for canaries
  • Latency-based routing for multi-region apps
  • Health checks with failover
  • Short TTLs during launches, longer later
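
With weighted records, each record receives a share of queries equal to its weight divided by the sum of all weights in the group, and Route 53 accepts weights from 0 to 255. The sketch below computes those shares for a canary. The record names are hypothetical, and the all-zero case is treated as an error here to keep the example short.

```python
def traffic_shares(weights):
    """Return each record's expected share of queries: weight / sum of weights."""
    for name, weight in weights.items():
        if not isinstance(weight, int) or not 0 <= weight <= 255:
            raise ValueError(f"weight for {name!r} must be an integer from 0 to 255")
    total = sum(weights.values())
    if total == 0:
        raise ValueError("this sketch needs at least one non-zero weight")
    return {name: weight / total for name, weight in weights.items()}


# Hypothetical canary: 5 percent of queries go to the new stack
print(traffic_shares({"stable": 95, "canary": 5}))
# {'stable': 0.95, 'canary': 0.05}
```

Because resolvers cache answers for the TTL, the observed split only approaches these shares over time, which is one reason to keep TTLs short during a launch.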

3) Route 53 policies

  • Public and private zones, attachments to VPCs
  • Weighted, failover, and latency policies with health checks
  • TTL guidance visible

Mini checklist

  • Zones created and attached
  • Policies matched to service goals
  • Health checks and TTLs tuned

What you should remember

  • DNS can stage, steer, and recover
  • TTLs are an operational control

Security posture

Build controls into everyday delivery.

1) Access and data

Prefer managed and auditable services.

  • Session Manager instead of unmanaged SSH
  • KMS for data at rest, TLS everywhere
  • WAF in front of public apps

2) Detection and standards

Stay aware without noise.

  • GuardDuty and Security Hub across accounts
  • Standard tags for ownership and lifecycle

Mini checklist

  • Session Manager on, SSH closed
  • Encryption and WAF set
  • GuardDuty and Security Hub enabled

What you should remember

  • Defaults in code keep defenses consistent
  • Audits are faster when tags and logs are standard

Observability

Measure what maps to user impact and cost.

1) Logs and metrics

Keep the set small and useful.

  • VPC Flow Logs to S3 with lifecycle rules
  • ALB or NLB access logs per environment
  • NAT bytes, TGW attachments, and WAF counts as key metrics
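
Flow Logs are most useful when they are turned into a few targeted numbers. The sketch below parses records in the default (version 2) Flow Log format, which is space separated with 14 fields, and totals the bytes of rejected traffic. The two sample records are illustrative and the helper names are made up for this example.

```python
# Field order of the default (version 2) VPC Flow Log format
FIELDS = (
    "version account_id interface_id srcaddr dstaddr srcport dstport "
    "protocol packets bytes start end action log_status"
).split()


def parse_flow_record(line):
    """Split one default-format record into a dict keyed by field name."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))


def rejected_bytes(lines):
    """Total bytes across records whose action is REJECT."""
    total = 0
    for line in lines:
        record = parse_flow_record(line)
        # Records without traffic carry "-" instead of a number
        if record["action"] == "REJECT" and record["bytes"].isdigit():
            total += int(record["bytes"])
    return total


# Illustrative sample records
sample = [
    "2 123456789010 eni-1235b8ca123456789 172.31.16.139 172.31.16.21 "
    "20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK",
    "2 123456789010 eni-1235b8ca123456789 172.31.9.69 172.31.9.12 "
    "49761 3389 6 20 4249 1418530010 1418530070 REJECT OK",
]
print(rejected_bytes(sample))  # 4249
```

Custom log formats change the field list, so `FIELDS` would need to match whatever format the Flow Log was created with.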

2) Tracing and retention

Match depth to need.

  • OpenTelemetry for service traces
  • Separate short-term debugging from long-term compliance storage

Mini checklist

  • Flow Logs and LB logs enabled
  • Targeted alarms, not floods
  • Retention policies documented

What you should remember

  • Signal beats volume
  • Clear dashboards guide action

Cost as design input

Treat spend as a design choice, not a surprise.

1) Locality and endpoints

Keep paths short and private.

  • NAT per AZ avoids cross-AZ transfer charges, and gateway endpoints keep S3 and DynamoDB traffic off the NAT
  • Keep traffic in the same AZ where possible
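
A rough per-path estimate makes the locality argument concrete. In the sketch below the per-GB rates are placeholders chosen only for illustration, not quoted prices, so substitute current pricing for your region. The point is the comparison between paths, not the absolute numbers.

```python
def monthly_path_cost(gb_per_month, per_gb_rates):
    """Sum every per-GB charge on a path and scale by monthly volume."""
    return gb_per_month * sum(per_gb_rates)


# Placeholder rates in USD per GB (assumptions, check current pricing)
NAT_PROCESSING = 0.045
CROSS_AZ = 0.01

volume_gb = 1000
same_az_nat = monthly_path_cost(volume_gb, [NAT_PROCESSING])
cross_az_nat = monthly_path_cost(volume_gb, [NAT_PROCESSING, CROSS_AZ])
# Gateway endpoints for S3 and DynamoDB add no per-GB charge
gateway_endpoint = monthly_path_cost(volume_gb, [])

for label, cost in [
    ("NAT in same AZ", same_az_nat),
    ("NAT in another AZ", cross_az_nat),
    ("Gateway endpoint", gateway_endpoint),
]:
    print(f"{label}: {cost:.2f} USD")
```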

2) Data plane choices

Pick tools that match features in use.

  • ALB when you need L7 features, NLB when you do not
  • Watch TGW data processing charges and avoid hairpin routes
  • Create interface endpoints only where required

Mini checklist

  • Locality confirmed, endpoints used
  • Load balancer type justified
  • TGW paths checked for loops

What you should remember

  • Most savings come from locality and private links
  • Review paths before switching platforms

PR level checklist

Bake basics into every change request.

1) Before merge

Keep reviewers focused.

  • CIDRs and IPv6 choices documented
  • Subnets across AZs with route tables per tier
  • Flow Logs on, central storage set
  • TGW routes and attachments reviewed
  • DNS records and routing policies verified
  • Security defaults present, cost note included
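
If the list above lives in the pull request description as a task list, a CI step can refuse to merge while items are open. The sketch below assumes GitHub-style `- [ ]` and `- [x]` checkboxes, which is an assumption about the tooling, and the `pr_body` text is a made-up sample.

```python
import re

# Matches an unchecked task-list item such as "- [ ] Flow Logs on"
UNCHECKED = re.compile(r"^[ \t]*[-*][ \t]+\[ \][ \t]+(.*)$", re.MULTILINE)


def unchecked_items(pr_body):
    """Return the text of every unchecked task-list item in a PR description."""
    return [item.strip() for item in UNCHECKED.findall(pr_body)]


# Hypothetical PR description using the checklist above
pr_body = """
- [x] CIDRs and IPv6 choices documented
- [ ] Flow Logs on, central storage set
- [x] TGW routes and attachments reviewed
- [ ] Security defaults present, cost note included
"""
for item in unchecked_items(pr_body):
    print("open:", item)
```

A CI job could exit non-zero whenever the returned list is not empty.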

Mini checklist

  • Owners and on-call contacts listed
  • All blueprint items ticked

What you should remember

  • Checklists reduce drift and speed merges
  • Documentation is part of the product

Closing

A quiet network is the product of consistent patterns, clear ownership, and small checks that never get skipped. Use the blueprint, add the diagrams, and keep the checklists next to your pull requests.
