Start from a real issue, then shape the baseline. In one T2C migration, missing NAT Gateways in a multi-AZ VPC forced outbound traffic across zones and raised data transfer costs by about 20 percent. Two lines of Terraform fixed it, but a stronger baseline would have prevented it.
Groundwork before the first VPC
Lay out accounts, space, and automation so later growth is straightforward.
1) Accounts and guardrails
Make isolation and ownership explicit.
Use AWS Organizations to separate shared networking, platform, and app accounts
Keep separate log archive and security accounts
Tag owner, environment, and service everywhere
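The tagging rule is easy to check mechanically. Here is a minimal sketch, assuming each resource's tags are available as a plain dictionary. The function name `tag_problems` and the allowed environment values are illustrative assumptions, not part of any AWS API.

```python
from typing import Dict, List

# The three tag keys named in the list above.
REQUIRED_TAGS = ("owner", "environment", "service")
# Hypothetical environment names; replace with your own.
ALLOWED_ENVIRONMENTS = {"prod", "staging", "dev"}


def tag_problems(tags: Dict[str, str]) -> List[str]:
    """Return readable problems for one resource's tags (empty list means OK)."""
    problems = [
        f"missing tag: {key}"
        for key in REQUIRED_TAGS
        if not tags.get(key, "").strip()
    ]
    env = tags.get("environment", "").strip()
    if env and env not in ALLOWED_ENVIRONMENTS:
        problems.append(f"unknown environment: {env}")
    return problems


good = {"owner": "team-payments", "environment": "prod", "service": "checkout"}
bad = {"owner": "team-payments", "environment": "production"}

print(tag_problems(good))
print(tag_problems(bad))
```

A check like this could run in CI against planned resources, so untagged infrastructure never reaches an account.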
2) Space and mapping
Decide CIDRs and record AZ mappings early.
Non-overlapping CIDRs with summarization room
AZ name-to-ID mappings per account saved in code
Plan for IPv6 rather than retrofit it later
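CIDR planning can be sketched with Python's standard `ipaddress` module. The example below carves equal-sized VPC blocks out of one supernet, so the whole estate still summarizes to a single route, and then checks a plan for overlaps. The supernet `10.0.0.0/12`, the `/16` block size, and the account names are assumptions chosen for illustration.

```python
import ipaddress
from typing import Dict, List, Tuple


def carve_vpc_cidrs(supernet: str, new_prefix: int, names: List[str]) -> Dict[str, ipaddress.IPv4Network]:
    """Hand out consecutive, non-overlapping blocks from one supernet."""
    blocks = list(ipaddress.ip_network(supernet).subnets(new_prefix=new_prefix))
    if len(names) > len(blocks):
        raise ValueError(f"{supernet} holds only {len(blocks)} /{new_prefix} blocks")
    return dict(zip(names, blocks))


def find_overlaps(plan: Dict[str, ipaddress.IPv4Network]) -> List[Tuple[str, str]]:
    """Return every pair of names whose CIDRs overlap."""
    names = sorted(plan)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if plan[a].overlaps(plan[b])
    ]


# Hypothetical accounts; unused /16 blocks in the /12 remain free for growth.
plan = carve_vpc_cidrs("10.0.0.0/12", 16, ["shared-network", "platform", "app-prod", "app-dev"])
for name, cidr in plan.items():
    print(f"{name:15} {cidr}")

# A legacy range that was allocated by hand collides with app-prod.
with_legacy = dict(plan, legacy=ipaddress.ip_network("10.2.128.0/17"))
print(find_overlaps(with_legacy))
```

Running the overlap check in CI on every new allocation is one way to keep later peering or Transit Gateway attachments from being blocked by address conflicts.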
3) Everything as code
Make changes reviewable and reproducible.
VPCs, subnets, and DNS written as IaC
Required code review and CI checks for merges
Mini checklist
Organization and baseline accounts created
CIDR and IPv6 choices documented
AZ mappings captured
Pipelines and tags ready
What you should remember
Early structure simplifies future change
IaC plus tags make audits and handovers faster
VPC blueprint
Keep the shape identical across services for clarity and fault isolation.
1) Tiers and routes
Separate entry, app, and data traffic.
Public, private application, private data subnets in each AZ
One route table per tier
Security groups for fine rules, network ACLs for coarse controls
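The three-tier layout can be generated rather than typed by hand. This is a minimal sketch using the standard `ipaddress` module. The tier names, the `/20` subnet size, the VPC CIDR, and the AZ IDs are assumptions for illustration.

```python
import ipaddress
from typing import Dict, List, Tuple

# One subnet of each tier is created in every AZ.
TIERS = ["public", "private-app", "private-data"]


def layout_subnets(vpc_cidr: str, az_ids: List[str], subnet_prefix: int = 20) -> Dict[Tuple[str, str], ipaddress.IPv4Network]:
    """Assign one subnet per (AZ, tier) pair from the VPC CIDR, in order."""
    blocks = list(ipaddress.ip_network(vpc_cidr).subnets(new_prefix=subnet_prefix))
    needed = len(az_ids) * len(TIERS)
    if needed > len(blocks):
        raise ValueError(f"{vpc_cidr} cannot hold {needed} /{subnet_prefix} subnets")
    pairs = [(az, tier) for az in az_ids for tier in TIERS]
    return dict(zip(pairs, blocks))


layout = layout_subnets("10.2.0.0/16", ["use1-az1", "use1-az2"])
for (az, tier), cidr in layout.items():
    print(f"{az:9} {tier:13} {cidr}")
```

Keying the layout by AZ ID rather than AZ name keeps the same physical zone in every account, since AZ names can map to different zones per account.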
2) NAT and endpoints
Remove single points of failure and keep traffic private.
One NAT Gateway per AZ
Gateway endpoints for S3 and DynamoDB
Interface endpoints for ECR, Secrets Manager, and vendor services
VPC Flow Logs to S3 or CloudWatch Logs
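The NAT gap from the opening story is the kind of thing a small check can catch before it shows up on a bill. Below is a minimal sketch, assuming route data has already been exported into simple Python objects. The `PrivateSubnet` class, the function name, and the IDs are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class PrivateSubnet:
    name: str
    az: str
    # NAT gateway targeted by the subnet's 0.0.0.0/0 route, if it has one.
    default_route_nat: Optional[str]


def cross_az_nat_routes(subnets: List[PrivateSubnet], nat_az: Dict[str, str]) -> List[str]:
    """Return subnets whose default route uses a NAT gateway in another AZ."""
    findings = []
    for subnet in subnets:
        nat = subnet.default_route_nat
        if nat is None:
            continue  # no internet route at all, nothing to check
        if nat_az.get(nat) != subnet.az:
            findings.append(subnet.name)
    return findings


# Before: both AZs share one NAT gateway that lives in use1-az1.
nat_az = {"nat-a": "use1-az1"}
subnets = [
    PrivateSubnet("app-a", "use1-az1", "nat-a"),
    PrivateSubnet("app-b", "use1-az2", "nat-a"),
]
print(cross_az_nat_routes(subnets, nat_az))

# After: each AZ routes to its own NAT gateway.
nat_az_fixed = {"nat-a": "use1-az1", "nat-b": "use1-az2"}
subnets_fixed = [subnets[0], PrivateSubnet("app-b", "use1-az2", "nat-b")]
print(cross_az_nat_routes(subnets_fixed, nat_az_fixed))
```

An unknown NAT ID is also reported, because `nat_az.get` returns `None` and never matches a real AZ.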
3) VPC with subnet tiers
[Diagram: VPC with subnet tiers, showing the following]
Two or more AZs, three tiers per AZ
IGW on public tier, NAT per AZ
Gateway and interface endpoints highlighted
Mini checklist
Two or more AZs
Three tiers per AZ
NAT per AZ, endpoints present
Flow Logs enabled
What you should remember
- Tiers plus per-AZ NAT keep paths short and predictable
- Endpoints reduce egress and exposure
Connecting VPCs and accounts
Scale connectivity without a peering mesh.
1) Transit Gateway as hub
Centralize, separate, and grow.
- TGW for cross-account and cross-Region scale
- Route tables per environment to isolate production
- Land VPN or Direct Connect on the hub
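Route table segmentation is easier to trust when it is checked on every change. The sketch below models each TGW route table as a list of destination attachments and flags any route that crosses environments. The attachment names, the environment labels, and the idea of a "shared" label for egress and on-premises attachments are assumptions for illustration.

```python
from typing import Dict, FrozenSet, List, Tuple


def isolation_violations(
    route_tables: Dict[str, List[str]],
    attachment_env: Dict[str, str],
    shared: FrozenSet[str] = frozenset({"shared"}),
) -> List[Tuple[str, str]]:
    """Return (route table, destination) pairs that cross environments.

    route_tables maps a route table's environment to the attachments it can reach.
    Destinations labelled with a shared environment are allowed from anywhere.
    """
    violations = []
    for rt_env, destinations in route_tables.items():
        for dest in destinations:
            dest_env = attachment_env[dest]
            if dest_env != rt_env and dest_env not in shared:
                violations.append((rt_env, dest))
    return violations


# Hypothetical attachments and their environments.
attachment_env = {
    "vpc-orders-prod": "prod",
    "vpc-orders-dev": "nonprod",
    "vpc-egress": "shared",
    "vpn-onprem": "shared",
}
route_tables = {
    "prod": ["vpc-orders-prod", "vpc-egress", "vpn-onprem"],
    # The last entry is a mistake: non-prod can reach a production VPC.
    "nonprod": ["vpc-orders-dev", "vpc-egress", "vpc-orders-prod"],
}
print(isolation_violations(route_tables, attachment_env))
```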
2) Sharing and Zero Trust
Split roles and keep policies tight.
- Use RAM-based VPC sharing when platform owns networking
- Narrow routes and identity-first access
3) TGW hub and spoke
- App VPC spokes attach to TGW hub
- Separate route tables for prod and non-prod
- Optional on-premises edge
Mini checklist
- TGW chosen, routes segmented
- Peering used only for simple pairs
- RAM sharing rules defined
What you should remember
- Hub with separate route tables avoids path sprawl
- Clear ownership speeds safe changes
Ingress, egress, and service to service
Pick entry and exit points, then standardize internal policy.
1) Ingress
Match features to needs.
- ALB for HTTP or HTTPS with WAF and path rules
- NLB for TCP or TLS pass-through
- Gateway Load Balancer for inline security tools
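The selection rule above is simple enough to write down as code, which makes the choice reviewable in a pull request. This is a minimal sketch that encodes only the three cases listed here. The function name and its parameters are illustrative assumptions.

```python
def choose_load_balancer(protocol: str, inline_inspection: bool = False) -> str:
    """Map a listener protocol to a load balancer type using the rule of thumb above."""
    if inline_inspection:
        # Traffic must pass through inline security appliances.
        return "GWLB"
    normalized = protocol.upper()
    if normalized in {"HTTP", "HTTPS"}:
        # Layer 7 features such as WAF and path rules are available.
        return "ALB"
    if normalized in {"TCP", "TLS"}:
        return "NLB"
    raise ValueError(f"no rule for protocol {protocol!r}; decide explicitly")


print(choose_load_balancer("https"))
print(choose_load_balancer("tcp"))
print(choose_load_balancer("tcp", inline_inspection=True))
```

Raising on unknown protocols is deliberate: an unlisted case should prompt a design discussion rather than a silent default.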
2) Egress and private service access
Keep AWS traffic on private links.
- NAT per AZ for IPv4, egress-only internet gateway for IPv6
- Gateway and interface endpoints for common services
3) Service to service
Keep relations explicit.
- Security groups between services
- VPC Lattice or App Mesh when mesh-wide policy is needed
Mini checklist
- Correct load balancer type selected
- NAT per AZ, endpoints configured
- Service relations documented in security groups
What you should remember
- Standard ingress and egress reduce one-off fixes
- Private access lowers risk and cost
DNS and Route 53
Use DNS as a safety and rollout tool.
1) Zones and scope
Separate audiences cleanly.
- Public zones for external endpoints
- Private zones for internal services across multiple VPCs
2) Routing policies
Ship changes gradually and fail over safely.
- Weighted records for canaries
- Latency-based routing for multi-Region apps
- Health checks with failover
- Short TTLs during launches, longer later
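For weighted records, each record's share of DNS responses is its weight divided by the sum of all weights in the set. The sketch below turns a canary ramp into expected shares. The record names and the ramp steps are assumptions for illustration, and real traffic will lag the configured split because resolvers cache answers for the TTL.

```python
from fractions import Fraction
from typing import Dict


def traffic_share(weights: Dict[str, int]) -> Dict[str, Fraction]:
    """Expected share of DNS responses per record: weight / sum of weights."""
    for name, weight in weights.items():
        if not 0 <= weight <= 255:
            raise ValueError(f"{name}: weight {weight} is outside 0-255")
    total = sum(weights.values())
    if total == 0:
        raise ValueError("all weights are zero; this sketch does not model that case")
    return {name: Fraction(weight, total) for name, weight in weights.items()}


# Hypothetical ramp: move the canary from 1% to 50% in three steps.
ramp = [
    {"stable": 99, "canary": 1},
    {"stable": 90, "canary": 10},
    {"stable": 50, "canary": 50},
]
for step in ramp:
    share = traffic_share(step)["canary"]
    print(f"canary weight {step['canary']:>2} -> {float(share):.0%} of responses")
```

This is also why short TTLs matter during a launch: the shorter the TTL, the sooner a weight change is reflected in what clients actually resolve.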
3) Route 53 policies
- Public and private zones, attachments to VPCs
- Weighted, failover, and latency policies with health checks
- TTL guidance visible
Mini checklist
- Zones created and attached
- Policies matched to service goals
- Health checks and TTLs tuned
What you should remember
- DNS can stage, steer, and recover
- TTLs are an operational control
Security posture
Build controls into everyday delivery.
1) Access and data
Prefer managed and auditable services.
- Session Manager instead of unmanaged SSH
- KMS for data at rest, TLS everywhere
- WAF in front of public apps
2) Detection and standards
Stay aware without noise.
- GuardDuty and Security Hub across accounts
- Standard tags for ownership and lifecycle
Mini checklist
- Session Manager on, SSH closed
- Encryption and WAF set
- GuardDuty and Security Hub enabled
What you should remember
- Defaults in code keep defenses consistent
- Audits are faster when tags and logs are standard
Observability
Measure what maps to user impact and cost.
1) Logs and metrics
Keep the set small and useful.
- VPC Flow Logs to S3 with lifecycle rules
- ALB or NLB access logs per environment
- Key metrics: NAT bytes, TGW attachment traffic, and WAF request counts
2) Tracing and retention
Match depth to need.
- OpenTelemetry for service traces
- Separate short-term debugging from long-term compliance storage
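The split between short-term and long-term storage can be expressed as S3 lifecycle rules. The sketch below builds a configuration in the shape that, to my understanding, boto3's `put_bucket_lifecycle_configuration` accepts as `LifecycleConfiguration`; verify it against the current documentation before use. The rule IDs, prefixes, and day counts are hypothetical, and no AWS call is made here.

```python
from typing import Any, Dict, Optional


def lifecycle_rule(rule_id: str, prefix: str, expire_days: int,
                   archive_after_days: Optional[int] = None) -> Dict[str, Any]:
    """Build one lifecycle rule: optional archive transition, then expiration."""
    rule: Dict[str, Any] = {
        "ID": rule_id,
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Expiration": {"Days": expire_days},
    }
    if archive_after_days is not None:
        if archive_after_days >= expire_days:
            raise ValueError("archive transition must happen before expiration")
        rule["Transitions"] = [{"Days": archive_after_days, "StorageClass": "GLACIER"}]
    return rule


lifecycle_configuration = {
    "Rules": [
        # Debugging copies: cheap to lose, so expire quickly.
        lifecycle_rule("debug-flow-logs", "flow-logs/debug/", expire_days=14),
        # Compliance copies: archive after a month, keep for a year.
        lifecycle_rule("compliance-flow-logs", "flow-logs/compliance/",
                       expire_days=365, archive_after_days=30),
    ]
}
print(lifecycle_configuration)
```

Keeping the two classes under separate prefixes means each retention decision is one rule, which is easier to document and audit.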
Mini checklist
- Flow Logs and LB logs enabled
- Targeted alarms, not floods
- Retention policies documented
What you should remember
- Signal beats volume
- Clear dashboards guide action
Cost as design input
Treat spend as a design choice, not a surprise.
1) Locality and endpoints
Keep paths short and private.
- NAT per AZ avoids cross-AZ transfer charges, and gateway endpoints keep S3 and DynamoDB traffic off the NAT
- Keep traffic in the same AZ where possible
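A rough cost model makes the locality argument concrete. The sketch below compares a single shared NAT gateway with one NAT per AZ plus a gateway endpoint. All prices are assumptions passed in as parameters (they resemble commonly quoted us-east-1 list prices but should be checked against current pricing), and the traffic volumes are invented for illustration.

```python
def monthly_network_cost(
    nat_gateways: int,
    gb_through_nat: float,
    gb_cross_az: float,
    *,
    nat_hourly: float = 0.045,      # assumed price per NAT gateway hour
    nat_per_gb: float = 0.045,      # assumed NAT data processing price per GB
    cross_az_per_gb: float = 0.02,  # assumed: 0.01 per GB charged in each direction
    hours: int = 730,
) -> float:
    """Estimate monthly cost in dollars for NAT hours, NAT processing, and cross-AZ transfer."""
    total = (
        nat_gateways * nat_hourly * hours
        + gb_through_nat * nat_per_gb
        + gb_cross_az * cross_az_per_gb
    )
    return round(total, 2)


# Baseline: one NAT for two AZs. 10,000 GB leaves through it,
# and 5,000 GB of that crosses an AZ boundary to reach the NAT.
baseline = monthly_network_cost(nat_gateways=1, gb_through_nat=10_000, gb_cross_az=5_000)

# Improved: a NAT in each AZ, and 4,000 GB of S3 traffic moved to a gateway endpoint.
improved = monthly_network_cost(nat_gateways=2, gb_through_nat=6_000, gb_cross_az=0)

print(baseline, improved)
```

With these assumed numbers the baseline comes to roughly 583 dollars a month and the improved layout to roughly 336, even though the second NAT gateway doubles the hourly charge.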
2) Data plane choices
Pick tools that match features in use.
- ALB when you need L7 features, NLB when you do not
- Watch TGW data processing charges and avoid hairpin routes
- Create interface endpoints only where required
Mini checklist
- Locality confirmed, endpoints used
- Load balancer type justified
- TGW paths checked for loops
What you should remember
- Most savings come from locality and private links
- Review paths before switching platforms
PR level checklist
Bake basics into every change request.
1) Before merge
Keep reviewers focused.
- CIDRs and IPv6 choices documented
- Subnets across AZs with route tables per tier
- Flow Logs on, central storage set
- TGW routes and attachments reviewed
- DNS records and routing policies verified
- Security defaults present, cost note included
Mini checklist
- Owners and on-call contacts listed
- All blueprint items ticked
What you should remember
- Checklists reduce drift and speed merges
- Documentation is part of the product
Closing
A quiet network is the product of consistent patterns, clear ownership, and small checks that never get skipped. Use the blueprint, add the diagrams, and keep the checklists next to your pull requests.