Introduction
DNS is the most critical and least understood part of most infrastructure. When DNS is working, nobody thinks about it. When it breaks, everything breaks. Your application, your API, your email, your monitoring dashboards that would help you debug the DNS issue, all of them depend on DNS resolution.
CDN configuration is similarly important but often treated as an afterthought. Teams deploy CloudFront or Cloudflare, accept the defaults, and never think about cache hit ratios, edge computing, or failover behavior until there is an incident.
This guide covers the DNS and CDN architecture decisions that matter for production systems, from basic setup through multi-region failover, with specific configurations for Route 53, CloudFront, and Cloudflare.
DNS Fundamentals That Matter in Production
Before diving into architecture, let us establish the DNS concepts that directly affect your infrastructure decisions.
Record types you will use most:
| Type | Purpose | Example |
|---|---|---|
| A | Maps domain to IPv4 address | api.example.com → 203.0.113.50 |
| AAAA | Maps domain to IPv6 address | api.example.com → 2001:db8::1 |
| CNAME | Alias to another domain | www.example.com → example.com |
| ALIAS/ANAME | CNAME-like at zone apex | example.com → d1234.cloudfront.net |
| MX | Mail server routing | example.com → 10 mail.example.com |
| TXT | Verification and policy | SPF, DKIM, DMARC records |
| NS | Nameserver delegation | example.com → ns-1234.awsdns-12.org |
TTL (Time To Live) controls how long resolvers cache your records. Lower TTL means faster propagation of changes but more DNS queries hitting your nameservers. Higher TTL means better performance but slower failover.
# Check current TTL and propagation
dig +noall +answer api.example.com A
# Check from multiple locations
dig @8.8.8.8 api.example.com A # Google DNS
dig @1.1.1.1 api.example.com A # Cloudflare DNS
dig @208.67.222.222 api.example.com A # OpenDNS
Production TTL recommendations:
- Static content CDN domains: 86400 (24 hours)
- API endpoints (stable): 300-600 (5-10 minutes)
- API endpoints (failover-ready): 60 (1 minute)
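- During migrations: 60 or lower, lowered at least one full period of the old TTL (typically 24-48 hours) before the change, so that no resolver still holds the long-lived record at cutover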
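The lead time for a migration follows from the old TTL: a resolver that cached the record just before you lowered the TTL may keep it for a full old-TTL period. The sketch below shows that arithmetic. The helper name `latestTtlLowerTime` and the safety margin are illustrative assumptions, and it presumes resolvers honor TTLs, which not all of them do.

```javascript
// Hypothetical helper: the latest moment to lower a record's TTL so that
// every resolver cache has picked up the short TTL before the cutover.
// Assumes resolvers honor TTLs; the margin covers the ones that are slow.
function latestTtlLowerTime(cutover, oldTtlSeconds, marginSeconds = 0) {
  const leadMs = (oldTtlSeconds + marginSeconds) * 1000;
  return new Date(cutover.getTime() - leadMs);
}

// Example: record currently at 86400s (24h), one hour of margin,
// cutover planned for 2026-03-10 12:00 UTC.
const cutover = new Date("2026-03-10T12:00:00Z");
const lowerBy = latestTtlLowerTime(cutover, 86400, 3600);
console.log(lowerBy.toISOString()); // 2026-03-09T11:00:00.000Z
```

Lowering the TTL any later than this leaves a window in which some clients keep resolving to the old address after the change.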
Route 53 Configuration for Production
Route 53 is AWS's DNS service and the most common choice for teams running on AWS. Its killer feature is routing policies that go beyond simple DNS records.
Basic hosted zone setup:
# Create a hosted zone
aws route53 create-hosted-zone \
--name example.com \
--caller-reference "$(date +%s)"
# The output includes nameservers - update these at your domain registrar
Health checks and failover routing:
# Create a health check for the primary endpoint
aws route53 create-health-check --caller-reference "primary-api" \
--health-check-config '{
"IPAddress": "203.0.113.50",
"Port": 443,
"Type": "HTTPS",
"ResourcePath": "/health",
"RequestInterval": 10,
"FailureThreshold": 3,
"EnableSNI": true,
"FullyQualifiedDomainName": "api.example.com"
}'
// Primary record (us-east-1)
{
"Name": "api.example.com",
"Type": "A",
"SetIdentifier": "primary",
"Failover": "PRIMARY",
"TTL": 60,
"ResourceRecords": [{"Value": "203.0.113.50"}],
"HealthCheckId": "abcd-1234-health-check-id"
}
// Secondary record (eu-west-1)
{
"Name": "api.example.com",
"Type": "A",
"SetIdentifier": "secondary",
"Failover": "SECONDARY",
"TTL": 60,
"ResourceRecords": [{"Value": "198.51.100.25"}]
}
When the primary health check fails (3 consecutive failures at 10-second intervals, roughly 30 seconds), Route 53 starts returning the secondary IP. Add the 60-second TTL that resolvers may still be holding, and most clients will fail over within about 90 seconds.
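The 90-second figure is detection time plus TTL. A small sketch of that estimate, using a hypothetical function name, makes it easy to compare other settings. It is an upper-bound approximation and ignores clients or resolvers that cache beyond the TTL.

```javascript
// Rough upper bound on DNS failover time, in seconds.
// detection    = health check interval x failure threshold
// cache expiry = record TTL (clients that resolved just before the switch)
function worstCaseFailoverSeconds(intervalSeconds, failureThreshold, ttlSeconds) {
  const detection = intervalSeconds * failureThreshold;
  return detection + ttlSeconds;
}

console.log(worstCaseFailoverSeconds(10, 3, 60));  // 90
console.log(worstCaseFailoverSeconds(30, 3, 300)); // 390
```

The second call shows why a 30-second interval with a 5-minute TTL is a poor fit for failover: the same outage takes more than six minutes to route around.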
Latency-based routing sends users to the region with the lowest measured network latency, which is usually, but not always, the geographically closest one:
// US East record (alias records take no TTL field; Route 53 rejects one)
{
"Name": "api.example.com",
"Type": "A",
"SetIdentifier": "us-east-1",
"Region": "us-east-1",
"AliasTarget": {
"HostedZoneId": "Z35SXDOTRQ7X7K",
"DNSName": "us-east-alb-1234.us-east-1.elb.amazonaws.com",
"EvaluateTargetHealth": true
}
}
// EU West record
{
"Name": "api.example.com",
"Type": "A",
"SetIdentifier": "eu-west-1",
"Region": "eu-west-1",
"AliasTarget": {
"HostedZoneId": "Z32O12XQLNTSW2",
"DNSName": "eu-west-alb-5678.eu-west-1.elb.amazonaws.com",
"EvaluateTargetHealth": true
}
}
Route 53 estimates the user's location from the source IP of the DNS resolver making the query, or from the EDNS Client Subnet (ECS) data when the resolver forwards it. This works well in most cases, but users can be sent to a suboptimal region when their resolver is far away from them and does not forward ECS, which is the case for some public anycast resolvers.
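The record definitions in this section are fragments; to apply them they need to be wrapped in a change batch for `aws route53 change-resource-record-sets`. The following is a minimal sketch of generating that batch. The helper name `latencyAliasChange` and the output file name are assumptions, and the hosted zone IDs and DNS names are the example values from above. Note that alias records do not take a TTL field.

```javascript
// Builds one entry of a Route 53 change batch for a latency-routed alias record.
// Alias records carry no TTL; Route 53 applies the TTL of the target resource.
function latencyAliasChange(name, region, albZoneId, albDnsName) {
  return {
    Action: "UPSERT",
    ResourceRecordSet: {
      Name: name,
      Type: "A",
      SetIdentifier: region,
      Region: region,
      AliasTarget: {
        HostedZoneId: albZoneId,
        DNSName: albDnsName,
        EvaluateTargetHealth: true,
      },
    },
  };
}

const changeBatch = {
  Comment: "Latency-based routing for api.example.com",
  Changes: [
    latencyAliasChange("api.example.com", "us-east-1", "Z35SXDOTRQ7X7K",
      "us-east-alb-1234.us-east-1.elb.amazonaws.com"),
    latencyAliasChange("api.example.com", "eu-west-1", "Z32O12XQLNTSW2",
      "eu-west-alb-5678.eu-west-1.elb.amazonaws.com"),
  ],
};

// Save the output to a file (for example change-batch.json) and pass it
// to the CLI as: --change-batch file://change-batch.json
console.log(JSON.stringify(changeBatch, null, 2));
```

Generating the batch from one function keeps the per-region records identical except for the fields that are supposed to differ.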
CloudFront CDN Configuration
CloudFront is AWS's CDN with 600+ edge locations globally. For static websites and API acceleration, proper CloudFront configuration can reduce latency by 50-80% for geographically distributed users.
# Create a CloudFront distribution for a static website on S3
aws cloudfront create-distribution --distribution-config '{
"CallerReference": "static-site-2026",
"Origins": {
"Quantity": 1,
"Items": [{
"Id": "S3-static-site",
"DomainName": "my-site-bucket.s3.amazonaws.com",
"S3OriginConfig": {
"OriginAccessIdentity": ""
},
"OriginAccessControlId": "E2QWRUHEXAMPLE"
}]
},
"DefaultCacheBehavior": {
"TargetOriginId": "S3-static-site",
"ViewerProtocolPolicy": "redirect-to-https",
"CachePolicyId": "658327ea-f89d-4fab-a63d-7e88639e58f6",
"Compress": true,
"AllowedMethods": {
"Quantity": 2,
"Items": ["GET", "HEAD"]
}
},
"Aliases": {
"Quantity": 1,
"Items": ["www.example.com"]
},
"ViewerCertificate": {
"ACMCertificateArn": "arn:aws:acm:us-east-1:123456789012:certificate/abc123",
"SSLSupportMethod": "sni-only",
"MinimumProtocolVersion": "TLSv1.2_2021"
},
"HttpVersion": "http2and3",
"DefaultRootObject": "index.html",
"Enabled": true,
"Comment": "Production static site"
}'
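Cache policies control what CloudFront caches and for how long. The managed policies work for most static content, but APIs usually need a custom policy. The one below puts the Authorization header in the cache key so that responses are never shared between users, and its DefaultTTL of 0 means a response is cached only when the origin sends explicit caching headers: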
{
"CachePolicyConfig": {
"Name": "API-Cache-Policy",
"DefaultTTL": 0,
"MaxTTL": 86400,
"MinTTL": 0,
"ParametersInCacheKeyAndForwardedToOrigin": {
"EnableAcceptEncodingGzip": true,
"EnableAcceptEncodingBrotli": true,
"HeadersConfig": {
"HeaderBehavior": "whitelist",
"Headers": {
"Quantity": 1,
"Items": ["Authorization"]
}
},
"QueryStringsConfig": {
"QueryStringBehavior": "all"
},
"CookiesConfig": {
"CookieBehavior": "none"
}
}
}
}
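To see why the policy above whitelists the Authorization header, it helps to model how a cache key is assembled. The sketch below is a conceptual model only: CloudFront's real key format is internal, and the `cacheKey` function and its `policy` shape are illustrative assumptions.

```javascript
// Conceptual model only: CloudFront's real cache key format is internal.
// Mirrors the policy above: all query strings, the Authorization header, no cookies.
function cacheKey(url, headers, policy) {
  const u = new URL(url);
  const parts = [u.host, u.pathname];
  if (policy.queryStrings === "all") parts.push(u.search);
  for (const name of policy.headers) {
    const key = name.toLowerCase();
    parts.push(`${key}=${headers[key] ?? ""}`);
  }
  return parts.join("|");
}

const policy = { queryStrings: "all", headers: ["Authorization"] };
const url = "https://api.example.com/orders?page=1";
const a = cacheKey(url, { authorization: "Bearer alice" }, policy);
const b = cacheKey(url, { authorization: "Bearer bob" }, policy);
const c = cacheKey(url, { authorization: "Bearer alice", cookie: "theme=dark" }, policy);

console.log(a === b); // false: different tokens never share a cached response
console.log(a === c); // true: cookies are not part of the key
```

Every value added to the key splits the cache further, so include only what actually changes the response.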
Origin failover provides automatic failover between origin servers:
{
"OriginGroups": {
"Quantity": 1,
"Items": [{
"Id": "api-origin-group",
"FailoverCriteria": {
"StatusCodes": {
"Quantity": 4,
"Items": [500, 502, 503, 504]
}
},
"Members": {
"Quantity": 2,
"Items": [
{ "OriginId": "api-us-east-1" },
{ "OriginId": "api-eu-west-1" }
]
}
}]
}
}
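When the primary origin returns one of the configured status codes (here 500, 502, 503, or 504), or CloudFront cannot connect to it, CloudFront retries the request against the secondary origin. Failover applies only to GET, HEAD, and OPTIONS requests. It is per-request and happens at the edge, so it is much faster than DNS failover.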
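The decision CloudFront makes for each response can be sketched as a small predicate. This is an illustration rather than CloudFront's implementation: the function name is hypothetical, and connection failures and timeouts, which also trigger failover, are left out.

```javascript
// Methods for which CloudFront attempts origin failover.
const FAILOVER_METHODS = new Set(["GET", "HEAD", "OPTIONS"]);

// Returns true when a response from the primary origin should trigger
// a retry against the secondary origin in the group.
function shouldFailOver(method, status, failoverStatusCodes) {
  return FAILOVER_METHODS.has(method.toUpperCase()) &&
    failoverStatusCodes.includes(status);
}

const criteria = [500, 502, 503, 504]; // same list as the origin group above
console.log(shouldFailOver("GET", 503, criteria));  // true
console.log(shouldFailOver("POST", 503, criteria)); // false: writes are not retried
console.log(shouldFailOver("GET", 404, criteria));  // false: 404 is not in the list
```

Because writes are not retried, origin failover protects read traffic; write paths still depend on DNS-level failover.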
Cloudflare as an Alternative
Cloudflare offers a compelling alternative to CloudFront, especially for teams that are not all-in on AWS. Its free tier is generous, the dashboard is more intuitive, and features like Workers, automatic image optimization, and DDoS protection are part of the same platform, though some of them depend on your plan.
Key Cloudflare configurations for production:
# Using Cloudflare API to configure DNS records
curl -X POST "https://api.cloudflare.com/client/v4/zones/{zone_id}/dns_records" \
-H "Authorization: Bearer $CF_API_TOKEN" \
-H "Content-Type: application/json" \
--data '{
"type": "A",
"name": "api.example.com",
"content": "203.0.113.50",
"ttl": 1,
"proxied": true
}'
When proxied: true, traffic flows through Cloudflare's network, enabling caching, DDoS protection, and WAF rules. When proxied: false (DNS-only), Cloudflare just serves DNS. A ttl of 1 tells Cloudflare to choose the TTL automatically.
Cache Rules, the successor to the older Page Rules, control caching behavior:
# Cache everything on the static assets subdomain
curl -X POST "https://api.cloudflare.com/client/v4/zones/{zone_id}/rulesets" \
-H "Authorization: Bearer $CF_API_TOKEN" \
-H "Content-Type: application/json" \
--data '{
"name": "Static Assets Cache",
"kind": "zone",
"phase": "http_request_cache_settings",
"rules": [{
"expression": "(http.host eq \"static.example.com\")",
"action": "set_cache_settings",
"action_parameters": {
"cache": true,
"edge_ttl": {
"mode": "override_origin",
"default": 2592000
},
"browser_ttl": {
"mode": "override_origin",
"default": 86400
}
}
}]
}'
Cloudflare Workers run JavaScript at the edge, enabling powerful patterns like A/B testing, request routing, and API response transformation without touching your origin:
// worker.js - Edge-side API response caching with stale-while-revalidate
// The cache key is the URL only, so use this for public, non-personalized endpoints
export default {
  async fetch(request, env, ctx) {
    // The Cache API only stores GET requests; pass everything else through
    if (request.method !== 'GET') {
      return fetch(request);
    }
    const cacheKey = new Request(request.url, request);
    const cache = caches.default;
    const cached = await cache.match(cacheKey);
    if (cached) {
      // Return cached response immediately
      // Revalidate in the background if older than 60 seconds
      const age = Date.now() - new Date(cached.headers.get('x-cache-time')).getTime();
      if (age > 60000) {
        ctx.waitUntil(revalidate(request, cacheKey, cache));
      }
      return cached;
    }
    return revalidate(request, cacheKey, cache);
  }
};

async function revalidate(request, cacheKey, cache) {
  const response = await fetch(request);
  // Do not cache errors; a cached 5xx would be served for up to 2 minutes
  if (!response.ok) {
    return response;
  }
  // Copy the response so that its headers become mutable
  const newResponse = new Response(response.body, response);
  newResponse.headers.set('x-cache-time', new Date().toISOString());
  newResponse.headers.set('Cache-Control', 'public, max-age=120');
  await cache.put(cacheKey, newResponse.clone());
  return newResponse;
}
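The two numbers in the Worker above (60 seconds and max-age=120) define three windows in the life of a cached response. The sketch below spells them out as a pure function. The name `cacheState` is hypothetical, and it assumes the Cache API stops returning an entry once its max-age has elapsed.

```javascript
// Mirrors the constants in the Worker above: revalidate after 60s,
// and the cached entry expires after max-age=120.
const REVALIDATE_AFTER_MS = 60_000;
const MAX_AGE_MS = 120_000;

function cacheState(ageMs) {
  if (ageMs >= MAX_AGE_MS) return "miss";          // expired: fetch from origin, client waits
  if (ageMs > REVALIDATE_AFTER_MS) return "stale"; // serve cached copy, refresh in background
  return "fresh";                                  // serve cached copy, no origin request
}

console.log(cacheState(30_000));  // fresh
console.log(cacheState(90_000));  // stale
console.log(cacheState(150_000)); // miss
```

A steadily requested URL never reaches the "miss" window, because the first request in the "stale" window refreshes the entry and resets its age.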
Caching Headers and Cache Control Strategy
Your CDN is only as effective as your caching headers. If your origin sends Cache-Control: no-cache on everything, your CDN is just an expensive reverse proxy.
# Nginx configuration for optimal caching headers
server {
listen 443 ssl http2;
server_name api.example.com;
# Static assets - cache aggressively (hashed filenames for cache busting)
location ~* \.(js|css|png|jpg|jpeg|gif|svg|woff2|ico)$ {
# Set Cache-Control once; adding "expires 1y;" as well would emit a second Cache-Control header
add_header Cache-Control "public, max-age=31536000, immutable";
add_header Vary "Accept-Encoding";
}
# HTML pages - short cache, revalidate
location ~* \.html$ {
add_header Cache-Control "public, max-age=300, must-revalidate";
add_header Vary "Accept-Encoding";
}
# API responses - no CDN cache, allow browser cache
location /api/ {
add_header Cache-Control "private, max-age=0, must-revalidate";
add_header Vary "Authorization, Accept-Encoding";
}
# API responses that are cacheable (public, non-personalized)
location /api/products {
add_header Cache-Control "public, max-age=60, s-maxage=300";
add_header Vary "Accept-Encoding";
}
}
The distinction between max-age and s-maxage is important:
- max-age controls browser cache duration.
- s-maxage controls CDN/proxy cache duration (overrides max-age for shared caches).
This lets you cache at the CDN for 5 minutes while telling browsers to cache for only 1 minute, giving you a balance between performance and freshness.
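The way a browser and a CDN read the same header differently can be shown with a simplified parser. This is a sketch, not a complete Cache-Control implementation: the function name `freshnessSeconds` is hypothetical, it handles only max-age, s-maxage, private, and no-store, and it treats a header without those directives as not cacheable, whereas real caches may apply heuristics.

```javascript
// Simplified Cache-Control reader: handles max-age, s-maxage, private, no-store.
// Returns the freshness lifetime in seconds for a browser (shared: false)
// or for a shared cache such as a CDN (shared: true).
function freshnessSeconds(cacheControl, { shared }) {
  const directives = new Map(
    cacheControl.split(",").map((d) => {
      const [name, value] = d.trim().toLowerCase().split("=");
      return [name, value];
    })
  );
  if (directives.has("no-store")) return 0;
  if (shared && directives.has("private")) return 0;
  if (shared && directives.has("s-maxage")) return Number(directives.get("s-maxage"));
  if (directives.has("max-age")) return Number(directives.get("max-age"));
  return 0; // assumption: no explicit lifetime means "do not cache"
}

const header = "public, max-age=60, s-maxage=300";
console.log(freshnessSeconds(header, { shared: false })); // 60
console.log(freshnessSeconds(header, { shared: true }));  // 300
console.log(freshnessSeconds("private, max-age=0, must-revalidate", { shared: true })); // 0
```

The last call corresponds to the /api/ location in the Nginx configuration: the CDN never stores those responses.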
Cache invalidation when you deploy:
# CloudFront invalidation
aws cloudfront create-invalidation \
--distribution-id E1234ABCDEF \
--paths "/*"
# Targeted invalidation (keeps everything else cached at the edge)
aws cloudfront create-invalidation \
--distribution-id E1234ABCDEF \
--paths "/index.html" "/api/products*"
# Cloudflare cache purge
curl -X POST "https://api.cloudflare.com/client/v4/zones/{zone_id}/purge_cache" \
-H "Authorization: Bearer $CF_API_TOKEN" \
-H "Content-Type: application/json" \
--data '{"purge_everything": true}'
# Cloudflare targeted purge
curl -X POST "https://api.cloudflare.com/client/v4/zones/{zone_id}/purge_cache" \
-H "Authorization: Bearer $CF_API_TOKEN" \
-H "Content-Type: application/json" \
--data '{"files": ["https://example.com/styles.css", "https://example.com/app.js"]}'
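If static assets use hashed filenames, as the Nginx configuration above assumes, each deploy gives them new URLs and they never need to be invalidated. A deploy script can therefore filter them out and invalidate only the files that keep their URL. The sketch below assumes a fingerprint pattern of eight or more hex characters before the extension; both the pattern and the function name are illustrative, so adjust them to your build tool's output.

```javascript
// Assumption: fingerprinted assets look like name.<8+ hex chars>.ext
// (for example app.3f9a1c2b.js). Adjust to match your bundler.
const FINGERPRINTED = /\.[0-9a-f]{8,}\.[a-z0-9]+$/i;

// Returns the paths worth invalidating after a deploy:
// everything that keeps the same URL across releases.
function invalidationPaths(changedFiles) {
  return changedFiles
    .filter((file) => !FINGERPRINTED.test(file))
    .map((file) => (file.startsWith("/") ? file : `/${file}`));
}

const changed = [
  "index.html",
  "assets/app.3f9a1c2b.js",
  "assets/styles.9e107d9d.css",
  "about/index.html",
];
console.log(invalidationPaths(changed)); // [ '/index.html', '/about/index.html' ]
```

The resulting list can be passed to the `--paths` argument of the CloudFront command, or mapped to full URLs for the Cloudflare `files` purge.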
Multi-Region Failover Architecture
For services that require high availability across regions, combine DNS routing with CDN origin failover for defense in depth.
A strong multi-region architecture:
User Request
│
▼
Cloudflare/CloudFront (Global Edge)
│
├─ Cache HIT → Return immediately
│
├─ Cache MISS → Origin Request
│ │
│ ▼
│ Route 53 Latency-Based Routing
│ │
│ ├─ US users → us-east-1 ALB
│ │ │
│ │ └─ Health check fails → eu-west-1 ALB (failover)
│ │
│ └─ EU users → eu-west-1 ALB
│ │
│ └─ Health check fails → us-east-1 ALB (failover)
Implementation checklist for multi-region failover:
- DNS TTL at 60 seconds for all records involved in failover.
- Health checks every 10 seconds with a failure threshold of 3 (30-second detection).
- CDN origin failover for per-request resilience (faster than DNS failover).
- Database replication across regions (Aurora Global Database or DynamoDB Global Tables).
- Session management in a shared store (ElastiCache Global Datastore or DynamoDB).
- Monitoring from multiple regions to distinguish between regional and global outages.
# Monitor DNS resolution from your CI/CD or monitoring system
# Check that failover is working by querying from multiple locations
for resolver in 8.8.8.8 1.1.1.1 208.67.222.222; do
echo "Resolver $resolver:"
dig +short @$resolver api.example.com A
done
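# Monitor CDN cache hit ratio
# CacheHitRate is an additional metric: enable it on the distribution first.
# CloudFront metrics live in us-east-1 and need the Region=Global dimension.
# The date -d syntax below is GNU date (Linux).
aws cloudwatch get-metric-statistics \
--region us-east-1 \
--namespace AWS/CloudFront \
--metric-name CacheHitRate \
--dimensions Name=DistributionId,Value=E1234ABCDEF Name=Region,Value=Global \
--start-time "$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ)" \
--end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
--period 300 \
--statistics Average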
A healthy CDN should have a cache hit ratio above 80% for static content. If it is lower, review your caching headers and cache key configuration. Every cache miss is a request to your origin, which means more load on your infrastructure and higher latency for your users.
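If your CDN reports raw hit and miss counts rather than a ready-made percentage, the ratio is simple to compute yourself. A minimal sketch follows, with hypothetical function names and the 80% threshold taken from the paragraph above.

```javascript
// Cache hit ratio as a percentage of cacheable requests.
function cacheHitRatio(hits, misses) {
  const total = hits + misses;
  return total === 0 ? 0 : (hits * 100) / total;
}

// Threshold from the text: 80% for static content.
function needsReview(hits, misses, thresholdPercent = 80) {
  return cacheHitRatio(hits, misses) < thresholdPercent;
}

console.log(cacheHitRatio(9200, 800)); // 92
console.log(needsReview(9200, 800));   // false
console.log(needsReview(6000, 4000));  // true
```

Tracking this per path prefix rather than for the whole distribution usually makes it clearer which content is dragging the ratio down.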