T2C for tsquaredc

Posted on Sep 18

AWS Networking, End to End: a production blueprint with diagrams and checklists

Start from a real issue, then shape the baseline. In one T2C migration, missing NAT Gateways in a multi-AZ VPC forced traffic across zones and raised transfer costs by about 20 percent. Two lines of Terraform fixed it. A stronger baseline would have prevented it.

Groundwork before the first VPC

Lay out accounts, space, and automation so later growth is straightforward.

1) Accounts and guardrails

Make isolation and ownership explicit.

Use AWS Organizations for shared networking, platform, and app accounts
Keep separate log archive and security accounts
Tag owner, environment, and service everywhere
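
A tagging rule is easier to enforce when CI can check it. The following is a minimal sketch, assuming resource tags arrive as a plain dictionary and that the three keys are spelled in lowercase; the `missing_tags` helper and the sample tags are hypothetical, not part of any AWS SDK.

```python
# AWS tag keys are case-sensitive, so this check matches keys exactly.
REQUIRED_TAGS = ("owner", "environment", "service")  # assumed key spelling


def missing_tags(tags):
    """Return the required tag keys that are absent or have a blank value."""
    return [key for key in REQUIRED_TAGS if not str(tags.get(key, "")).strip()]


# Hypothetical tags on a VPC, for illustration only.
vpc_tags = {"owner": "platform-team", "environment": "prod"}
print(missing_tags(vpc_tags))
```

A pipeline step could fail the build whenever the returned list is not empty.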

2) Space and mapping

Decide CIDRs and record AZ mappings early.

Non-overlapping CIDRs with room for summarization
AZ name-to-ID mappings per account, saved in code
Plan for IPv6 rather than retrofit it later
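
The CIDR decisions above can be checked mechanically before any VPC exists. This is a rough sketch using only Python's standard `ipaddress` module; the VPC range, AZ IDs, tier names, and the `plan_subnets` and `find_overlaps` helpers are illustrative assumptions, not a prescribed layout.

```python
import ipaddress
from itertools import combinations


def plan_subnets(vpc_cidr, az_ids, tiers, new_prefix):
    """Assign one subnet per (AZ, tier) pair from consecutive blocks of the VPC CIDR."""
    blocks = ipaddress.ip_network(vpc_cidr).subnets(new_prefix=new_prefix)
    plan = {}
    for az in az_ids:
        for tier in tiers:
            block = next(blocks, None)
            if block is None:
                raise ValueError("VPC CIDR is too small for the requested layout")
            plan[(az, tier)] = block
    return plan


def find_overlaps(cidrs):
    """Return every pair of CIDRs that overlap each other."""
    nets = [ipaddress.ip_network(c) for c in cidrs]
    return [(str(a), str(b)) for a, b in combinations(nets, 2) if a.overlaps(b)]


# Example values only: a /16 VPC split into /20 subnets across two AZ IDs.
plan = plan_subnets("10.20.0.0/16", ["use1-az1", "use1-az2"],
                    ["public", "app", "data"], 20)
for (az, tier), block in plan.items():
    print(az, tier, block)

# Check candidate VPC ranges against each other before allocating them.
print(find_overlaps(["10.20.0.0/16", "10.21.0.0/16", "10.20.128.0/17"]))
```

Splitting a /16 into /20 blocks uses six of the sixteen available blocks here, which leaves spare space for later tiers or AZs.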

3) Everything as code

Make changes reviewable and reproducible.

VPCs, subnets, and DNS written as IaC
Required code review and CI checks for merges

Mini checklist

Orgs and baseline accounts created
CIDR and IPv6 choices documented
AZ mappings captured
Pipelines and tags ready

What you should remember

Early structure simplifies future change
IaC plus tags make audits and handovers faster

VPC blueprint

Keep the shape identical across services for clarity and fault isolation.

1) Tiers and routes

Separate entry, app, and data traffic.
Public, private application, and private data subnets in each AZ
One route table per tier
Security groups for fine rules, network ACLs for coarse controls

2) NAT and endpoints

Remove single points and keep traffic private.
One NAT Gateway per AZ
Gateway endpoints for S3 and DynamoDB
Interface endpoints for ECR, Secrets Manager, and vendor services
VPC Flow Logs to S3 or CloudWatch Logs
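
The migration issue from the introduction is the kind of thing a small check can catch. Below is a hedged sketch that flags private subnets whose default route points at a NAT Gateway in a different AZ. It works on plain dictionaries; the input shapes, the resource names, and the `cross_az_nat_routes` helper are assumptions for illustration, and in practice the data would come from your IaC state or an inventory export.

```python
def cross_az_nat_routes(subnet_az, nat_az, default_route):
    """
    Find subnets that send default-route traffic to a NAT Gateway in another AZ.

    subnet_az:     subnet name -> AZ ID
    nat_az:        NAT Gateway name -> AZ ID
    default_route: subnet name -> NAT Gateway used for 0.0.0.0/0
    Returns a list of (subnet, subnet AZ, NAT Gateway, NAT Gateway AZ) tuples.
    """
    findings = []
    for subnet, nat in sorted(default_route.items()):
        if subnet_az[subnet] != nat_az[nat]:
            findings.append((subnet, subnet_az[subnet], nat, nat_az[nat]))
    return findings


# Hypothetical layout: two app subnets sharing a single NAT Gateway.
subnet_az = {"app-a": "use1-az1", "app-b": "use1-az2"}
nat_az = {"nat-a": "use1-az1"}
default_route = {"app-a": "nat-a", "app-b": "nat-a"}

for finding in cross_az_nat_routes(subnet_az, nat_az, default_route):
    print("cross-AZ NAT path:", finding)
```

An empty result means every subnet exits through a NAT Gateway in its own zone.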

3) VPC with subnet tiers

Diagram (not reproduced here) showing:

Two or more AZs, three tiers per AZ
IGW on public tier, NAT per AZ
Gateway and interface endpoints highlighted

Mini checklist

Two or more AZs
Three tiers per AZ
NAT per AZ, endpoints present
Flow Logs enabled

What you should remember

Tiers plus per-AZ NAT keep paths short and predictable
Endpoints reduce egress and exposure

Connecting VPCs and accounts

Scale connectivity without a peering mesh.

1) Transit Gateway as hub

Centralize, separate, and grow.
TGW for cross-account scale, with TGW peering for cross-Region links
Route tables per environment to isolate production
Land VPN or Direct Connect on the hub
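
Route table segmentation is easy to state and easy to break with one extra propagation. The sketch below models TGW route tables as dictionaries and reports any case where an attachment associated with a table can reach an attachment from a different environment. The data shape, the attachment names, the `shared` label for the on-premises edge, and the `isolation_violations` helper are all assumptions for illustration.

```python
def isolation_violations(attachment_env, route_tables):
    """
    Report (table, source, destination) triples that cross environments.

    attachment_env: attachment name -> "prod", "nonprod", or "shared"
    route_tables:   table name -> {"associations": [...], "propagations": [...]}
    An associated attachment routes with that table, and each propagation
    adds a route toward the propagating attachment.
    """
    violations = []
    for table, cfg in sorted(route_tables.items()):
        for src in cfg["associations"]:
            for dst in cfg["propagations"]:
                envs = (attachment_env[src], attachment_env[dst])
                if "shared" in envs:
                    continue  # assumed: shared edge is reachable from every environment
                if envs[0] != envs[1]:
                    violations.append((table, src, dst))
    return violations


# Hypothetical hub: the non-prod table wrongly learns a prod route.
attachment_env = {"prod-app": "prod", "dev-app": "nonprod", "onprem-vpn": "shared"}
route_tables = {
    "rt-prod": {"associations": ["prod-app"],
                "propagations": ["prod-app", "onprem-vpn"]},
    "rt-nonprod": {"associations": ["dev-app"],
                   "propagations": ["dev-app", "onprem-vpn", "prod-app"]},
}
print(isolation_violations(attachment_env, route_tables))
```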

2) Sharing and Zero Trust

Split roles and keep policies tight.
Use AWS RAM to share VPC subnets when the platform team owns networking
Narrow routes and identity-first access

3) TGW hub and spoke

App VPC spokes attach to TGW hub
Separate route tables for prod and non-prod
Optional on-premises edge

Mini checklist

TGW chosen, routes segmented
Peering used only for simple pairs
RAM sharing rules defined

What you should remember

Hub with separate route tables avoids path sprawl
Clear ownership speeds safe changes

Ingress, egress, and service to service

Pick entry and exit points, then standardize internal policy.

1) Ingress

Match features to needs.

ALB for HTTP or HTTPS with WAF and path rules
NLB for TCP or TLS pass-through
Gateway Load Balancer for inline security tools
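
The three choices above reduce to a small decision rule. Here is one way to write it down so that reviews can point at it, as a sketch only; the `choose_load_balancer` function and its parameters are hypothetical and cover just the cases listed in this section.

```python
def choose_load_balancer(protocol, inline_appliance=False):
    """Map a listener protocol to the load balancer type suggested in this section."""
    if inline_appliance:
        return "GWLB"  # traffic must pass through inline security tools
    proto = protocol.upper()
    if proto in ("HTTP", "HTTPS"):
        return "ALB"   # L7 features such as WAF and path rules
    if proto in ("TCP", "TLS"):
        return "NLB"   # L4 pass-through
    raise ValueError(f"no rule for protocol {protocol!r}")


print(choose_load_balancer("https"))
print(choose_load_balancer("tcp"))
print(choose_load_balancer("tcp", inline_appliance=True))
```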

2) Egress and private service access

Keep AWS traffic on private links.
NAT per AZ for IPv4, egress-only internet gateway for IPv6
Gateway and interface endpoints for common services

3) Service to service

Keep relations explicit.
Security groups between services
VPC Lattice or App Mesh when mesh-wide policy is needed

Mini checklist

Correct load balancer type selected
NAT per AZ, endpoints configured
Service relations documented in security groups

What you should remember

Standard ingress and egress reduce one-off fixes
Private access lowers risk and cost

DNS and Route 53

Use DNS as a safety and rollout tool.

1) Zones and scope

Separate audiences cleanly.
Public zones for external endpoints
Private zones for internal services across multiple VPCs

2) Routing policies

Ship changes gradually and fail over safely.
Weighted records for canaries
Latency-based routing for multi-Region apps
Health checks with failover
Short TTLs during launches, longer later
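
For weighted records, each record receives traffic in proportion to its weight divided by the sum of all weights in the group. A quick sketch of that arithmetic helps when picking canary weights; the record names and the `weighted_shares` helper are illustrative, and the sketch deliberately does not model health checks or the all-zero-weights case.

```python
def weighted_shares(weights):
    """Return each record's expected traffic share: weight / sum of weights."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one record needs a positive weight")
    return {name: weight / total for name, weight in weights.items()}


# Hypothetical canary rollout: about 5 percent of queries go to the new stack.
for name, share in weighted_shares({"stable": 95, "canary": 5}).items():
    print(f"{name}: {share:.1%}")
```

Because resolvers cache answers for the TTL, the observed split only approaches these shares over many queries, which is one reason short TTLs help during launches.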

3) Route 53 policies

Public and private zones, attachments to VPCs
Weighted, failover, and latency policies with health checks
TTL guidance visible

Mini checklist

Zones created and attached
Policies matched to service goals
Health checks and TTLs tuned

What you should remember

DNS can stage, steer, and recover
TTLs are an operational control

Security posture

Build controls into everyday delivery.

1) Access and data

Prefer managed and auditable services.
Session Manager instead of unmanaged SSH
KMS for data at rest, TLS everywhere
WAF in front of public apps
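
"SSH closed" can be verified rather than assumed. The following sketch scans a list of inbound rules for anything that exposes port 22 to the whole internet. The rule dictionaries use hypothetical keys (`group`, `protocol`, `from_port`, `to_port`, `cidr`) rather than any SDK's response shape, so treat it as a starting point for adapting to your own inventory format.

```python
import ipaddress


def open_ssh_rules(rules):
    """Return the groups whose inbound rules allow port 22 from any address."""
    findings = []
    for rule in rules:
        all_traffic = rule["protocol"] == "-1"  # "-1" means all protocols and ports
        tcp_22 = rule["protocol"] == "tcp" and rule["from_port"] <= 22 <= rule["to_port"]
        # A prefix length of 0 covers both 0.0.0.0/0 and ::/0.
        from_anywhere = ipaddress.ip_network(rule["cidr"]).prefixlen == 0
        if (all_traffic or tcp_22) and from_anywhere:
            findings.append(rule["group"])
    return findings


# Hypothetical inbound rules for illustration.
rules = [
    {"group": "web", "protocol": "tcp", "from_port": 443, "to_port": 443, "cidr": "0.0.0.0/0"},
    {"group": "legacy", "protocol": "tcp", "from_port": 22, "to_port": 22, "cidr": "0.0.0.0/0"},
    {"group": "bastion", "protocol": "tcp", "from_port": 22, "to_port": 22, "cidr": "10.20.0.0/16"},
]
print(open_ssh_rules(rules))
```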

2) Detection and standards

Stay aware without noise.
GuardDuty and Security Hub across accounts
Standard tags for ownership and lifecycle

Mini checklist

Session Manager on, SSH closed
Encryption and WAF set
GuardDuty and Security Hub enabled

What you should remember

Defaults in code keep defenses consistent
Audits are faster when tags and logs are standard

Observability

Measure what maps to user impact and cost.

1) Logs and metrics

Keep the set small and useful.
VPC Flow Logs to S3 with lifecycle rules
ALB or NLB access logs per environment
NAT bytes, TGW attachments, and WAF counts as key metrics

2) Tracing and retention

Match depth to need.
OpenTelemetry for service traces
Separate short-term debugging from long-term compliance storage

Mini checklist

Flow Logs and LB logs enabled
Targeted alarms, not floods
Retention policies documented

What you should remember

Signal beats volume
Clear dashboards guide action

Cost as design input

Treat spend as a design choice, not a surprise.

1) Locality and endpoints

Keep paths short and private.
NAT per AZ avoids cross-AZ transfer charges, and gateway endpoints keep S3 and DynamoDB traffic off the NAT
Keep traffic in the same AZ where possible
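
A back-of-the-envelope estimate makes the locality argument concrete. The sketch below assumes cross-AZ traffic is billed per GB in each direction; the rate is passed in as a parameter because it varies by Region and over time, and both the sample volume and the sample rate are placeholders, not quoted prices.

```python
def cross_az_monthly_cost(gb_per_month, rate_per_gb_each_way):
    """Estimate cross-AZ transfer cost, assuming each GB is billed out and in."""
    return gb_per_month * rate_per_gb_each_way * 2


# Placeholder numbers: 5,000 GB per month at an assumed 0.01 per GB each way.
print(round(cross_az_monthly_cost(5000, 0.01), 2))
```

Check the current pricing page for your Region before using the result in a design review.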

2) Data plane choices

Pick tools that match features in use.
ALB when you need L7 features, NLB when you do not
Watch TGW processing and avoid hairpins
Create interface endpoints only where required

Mini checklist

Locality confirmed, endpoints used
Load balancer type justified
TGW paths checked for loops

What you should remember

Most savings come from locality and private links
Review paths before switching platforms

PR-level checklist

Bake basics into every change request.

1) Before merge

Keep reviewers focused.
CIDRs and IPv6 choices documented
Subnets across AZs with route tables per tier
Flow Logs on, central storage set
TGW routes and attachments reviewed
DNS records and routing policies verified
Security defaults present, cost note included
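
A checklist only helps if something notices when it is skipped. As a sketch, a CI step could scan the pull request description for ticked items and list whatever is missing. The `- [x]` checkbox convention, the `CHECKLIST` wording, and the `unchecked_items` helper are assumptions; adapt them to whatever template your repository uses.

```python
import re

# Items copied from the "Before merge" list; the wording must match the PR template.
CHECKLIST = [
    "CIDRs and IPv6 choices documented",
    "Subnets across AZs with route tables per tier",
    "Flow Logs on, central storage set",
    "TGW routes and attachments reviewed",
    "DNS records and routing policies verified",
    "Security defaults present, cost note included",
]

# Matches lines such as "- [x] Item text" or "* [X] Item text".
TICKED = re.compile(r"^[ \t]*[-*][ \t]*\[[xX]\][ \t]*(.+)$", re.MULTILINE)


def unchecked_items(pr_body):
    """Return checklist items that are not ticked in the PR description."""
    ticked = {m.group(1).strip().lower() for m in TICKED.finditer(pr_body)}
    return [item for item in CHECKLIST if item.lower() not in ticked]


# Hypothetical PR description with only one item ticked.
pr_body = """
- [x] CIDRs and IPv6 choices documented
- [ ] Flow Logs on, central storage set
"""
for item in unchecked_items(pr_body):
    print("missing:", item)
```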

Mini checklist

Owners and on-call contacts listed
All blueprint items ticked

What you should remember

Checklists reduce drift and speed merges
Documentation is part of the product

Closing

A quiet network is the product of consistent patterns, clear ownership, and small checks that never get skipped. Use the blueprint, add the diagrams, and keep the checklists next to your pull requests.
