InstaDevOps

Originally published at instadevops.com

DNS and CDN Architecture: Building Fast, Resilient Global Infrastructure

Introduction

DNS is the most critical and least understood part of most infrastructure. When DNS is working, nobody thinks about it. When it breaks, everything breaks. Your application, your API, your email, your monitoring dashboards that would help you debug the DNS issue, all of them depend on DNS resolution.

CDN configuration is similarly important but often treated as an afterthought. Teams deploy CloudFront or Cloudflare, accept the defaults, and never think about cache hit ratios, edge computing, or failover behavior until there is an incident.

This guide covers the DNS and CDN architecture decisions that matter for production systems, from basic setup through multi-region failover, with specific configurations for Route 53, CloudFront, and Cloudflare.

DNS Fundamentals That Matter in Production

Before diving into architecture, let us establish the DNS concepts that directly affect your infrastructure decisions.

Record types you will use most:

| Type | Purpose | Example |
| --- | --- | --- |
| A | Maps domain to IPv4 address | api.example.com → 203.0.113.50 |
| AAAA | Maps domain to IPv6 address | api.example.com → 2001:db8::1 |
| CNAME | Alias to another domain | www.example.com → example.com |
| ALIAS/ANAME | CNAME-like at zone apex | example.com → d1234.cloudfront.net |
| MX | Mail server routing | example.com → 10 mail.example.com |
| TXT | Verification and policy | SPF, DKIM, DMARC records |
| NS | Nameserver delegation | example.com → ns-1234.awsdns-12.org |

TTL (Time To Live) controls how long resolvers cache your records. Lower TTL means faster propagation of changes but more DNS queries hitting your nameservers. Higher TTL means better performance but slower failover.

# Check current TTL and propagation
dig +noall +answer api.example.com A

# Check from multiple locations
dig @8.8.8.8 api.example.com A     # Google DNS
dig @1.1.1.1 api.example.com A     # Cloudflare DNS
dig @208.67.222.222 api.example.com A  # OpenDNS

Production TTL recommendations:

  • Static content CDN domains: 86400 (24 hours)
  • API endpoints (stable): 300-600 (5-10 minutes)
  • API endpoints (failover-ready): 60 (1 minute)
  • During migrations: 60 or lower, set 24-48 hours before the change
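The migration guidance above comes down to simple arithmetic: resolvers that cached the record just before you lowered the TTL may keep the old value for up to the old TTL, so the lower TTL has to be published at least one old-TTL period before the cutover. A minimal Python sketch of that calculation follows; the function name and the one-hour safety margin are illustrative assumptions, not part of any DNS tooling.

```python
from datetime import datetime, timedelta, timezone


def latest_ttl_lowering_time(cutover: datetime, old_ttl_seconds: int,
                             safety_margin_seconds: int = 3600) -> datetime:
    """Return the latest moment the lowered TTL must be published.

    Resolvers that cached the record just before the TTL change can keep
    it for up to old_ttl_seconds, so the change has to land at least that
    long (plus a margin) before the cutover.
    """
    if old_ttl_seconds < 0 or safety_margin_seconds < 0:
        raise ValueError("TTL and margin must be non-negative")
    return cutover - timedelta(seconds=old_ttl_seconds + safety_margin_seconds)


# Hypothetical cutover time; a 24-hour TTL is currently published
cutover = datetime(2026, 3, 10, 12, 0, tzinfo=timezone.utc)
deadline = latest_ttl_lowering_time(cutover, old_ttl_seconds=86400)
print(deadline.isoformat())  # 2026-03-09T11:00:00+00:00
```

With a 24-hour TTL in place, the TTL change has to go out at least 25 hours before the cutover in this example, which is why the recommendation is to act 24-48 hours ahead.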

Route 53 Configuration for Production

Route 53 is AWS's DNS service and the most common choice for teams running on AWS. Its killer feature is routing policies that go beyond simple DNS records.

Basic hosted zone setup:

# Create a hosted zone
aws route53 create-hosted-zone \
  --name example.com \
  --caller-reference "$(date +%s)"

# The output includes nameservers - update these at your domain registrar

Health checks and failover routing:

# Create a health check for the primary endpoint
aws route53 create-health-check --caller-reference "primary-api" \
  --health-check-config '{
    "IPAddress": "203.0.113.50",
    "Port": 443,
    "Type": "HTTPS",
    "ResourcePath": "/health",
    "RequestInterval": 10,
    "FailureThreshold": 3,
    "EnableSNI": true,
    "FullyQualifiedDomainName": "api.example.com"
  }'

Failover record sets that reference the health check:

// Primary record (us-east-1)
{
  "Name": "api.example.com",
  "Type": "A",
  "SetIdentifier": "primary",
  "Failover": "PRIMARY",
  "TTL": 60,
  "ResourceRecords": [{"Value": "203.0.113.50"}],
  "HealthCheckId": "abcd-1234-health-check-id"
}

// Secondary record (eu-west-1)
{
  "Name": "api.example.com",
  "Type": "A",
  "SetIdentifier": "secondary",
  "Failover": "SECONDARY",
  "TTL": 60,
  "ResourceRecords": [{"Value": "198.51.100.25"}]
}

When the primary health check fails (3 consecutive failures at 10-second intervals, about 30 seconds), Route 53 starts returning the secondary IP. With a 60-second TTL, most clients fail over within roughly 90 seconds.
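The timing estimate above can be written out as a small calculation. This is a sketch under the stated assumptions (detection time is interval times threshold, and a resolver may have cached the primary answer just before the check flipped); the function name is made up for illustration.

```python
def failover_window_seconds(request_interval: int, failure_threshold: int,
                            record_ttl: int) -> dict:
    """Estimate how long clients may keep hitting a failed primary."""
    detection = request_interval * failure_threshold
    # A resolver may have cached the primary answer just before the
    # health check flipped, so it can serve it for one more full TTL.
    worst_case = detection + record_ttl
    return {"detection": detection, "worst_case": worst_case}


window = failover_window_seconds(request_interval=10, failure_threshold=3, record_ttl=60)
print(window)  # {'detection': 30, 'worst_case': 90}
```

Clients that cache DNS answers beyond the TTL, as some runtimes and connection pools do, can take longer than this estimate.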

Latency-based routing sends users to the closest region:

// US East record
// Alias records take their TTL from the target, so no TTL field is set
{
  "Name": "api.example.com",
  "Type": "A",
  "SetIdentifier": "us-east-1",
  "Region": "us-east-1",
  "AliasTarget": {
    "HostedZoneId": "Z35SXDOTRQ7X7K",
    "DNSName": "us-east-alb-1234.us-east-1.elb.amazonaws.com",
    "EvaluateTargetHealth": true
  }
}

// EU West record
{
  "Name": "api.example.com",
  "Type": "A",
  "SetIdentifier": "eu-west-1",
  "Region": "eu-west-1",
  "AliasTarget": {
    "HostedZoneId": "Z32O12XQLNTSW2",
    "DNSName": "eu-west-alb-5678.eu-west-1.elb.amazonaws.com",
    "EvaluateTargetHealth": true
  }
}

Route 53 determines the user's region from the location of the DNS resolver they are using, or from the client's subnet when the resolver sends EDNS Client Subnet (ECS) data. This works well in most cases, but it can route incorrectly when users rely on a public resolver that does not forward ECS and answers from a location far from the user.
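To make the selection logic concrete, here is a simplified Python model of latency-based routing combined with health evaluation. It is not Route 53's actual algorithm: the latency table, the function name, and the numbers are hypothetical, and Route 53 relies on its own latency measurements.

```python
from typing import Dict, Optional


def pick_region(latency_ms: Dict[str, float], healthy: Dict[str, bool]) -> Optional[str]:
    """Pick the healthy region with the lowest latency for one resolver location.

    latency_ms is a hypothetical table of latency from the resolver's
    location to each region.
    """
    candidates = {region: ms for region, ms in latency_ms.items() if healthy.get(region, False)}
    if not candidates:
        return None  # simplification: this sketch gives up when nothing is healthy
    return min(candidates, key=candidates.get)


# Illustrative numbers for a resolver located in Frankfurt
latency = {"us-east-1": 95.0, "eu-west-1": 24.0}
print(pick_region(latency, {"us-east-1": True, "eu-west-1": True}))   # eu-west-1
print(pick_region(latency, {"us-east-1": True, "eu-west-1": False}))  # us-east-1
```

The second call shows why `EvaluateTargetHealth` matters: once the nearest region is marked unhealthy, the next-best region receives the traffic.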

CloudFront CDN Configuration

CloudFront is AWS's CDN with 600+ edge locations globally. For static websites and API acceleration, proper CloudFront configuration can reduce latency by 50-80% for geographically distributed users.

# Create a CloudFront distribution for a static website on S3
aws cloudfront create-distribution --distribution-config '{
  "CallerReference": "static-site-2026",
  "Origins": {
    "Quantity": 1,
    "Items": [{
      "Id": "S3-static-site",
      "DomainName": "my-site-bucket.s3.amazonaws.com",
      "S3OriginConfig": {
        "OriginAccessIdentity": ""
      },
      "OriginAccessControlId": "E2QWRUHEXAMPLE"
    }]
  },
  "DefaultCacheBehavior": {
    "TargetOriginId": "S3-static-site",
    "ViewerProtocolPolicy": "redirect-to-https",
    "CachePolicyId": "658327ea-f89d-4fab-a63d-7e88639e58f6",
    "Compress": true,
    "AllowedMethods": {
      "Quantity": 2,
      "Items": ["GET", "HEAD"]
    }
  },
  "Aliases": {
    "Quantity": 1,
    "Items": ["www.example.com"]
  },
  "ViewerCertificate": {
    "ACMCertificateArn": "arn:aws:acm:us-east-1:123456789012:certificate/abc123",
    "SSLSupportMethod": "sni-only",
    "MinimumProtocolVersion": "TLSv1.2_2021"
  },
  "HttpVersion": "http2and3",
  "DefaultRootObject": "index.html",
  "Enabled": true,
  "Comment": "Production static site"
}'

Cache policies control what CloudFront caches and for how long. The default policies work for most cases, but for APIs you need custom policies:

{
  "CachePolicyConfig": {
    "Name": "API-Cache-Policy",
    "DefaultTTL": 0,
    "MaxTTL": 86400,
    "MinTTL": 0,
    "ParametersInCacheKeyAndForwardedToOrigin": {
      "EnableAcceptEncodingGzip": true,
      "EnableAcceptEncodingBrotli": true,
      "HeadersConfig": {
        "HeaderBehavior": "whitelist",
        "Headers": {
          "Quantity": 1,
          "Items": ["Authorization"]
        }
      },
      "QueryStringsConfig": {
        "QueryStringBehavior": "all"
      },
      "CookiesConfig": {
        "CookieBehavior": "none"
      }
    }
  }
}
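The effect of this policy on the cache key can be illustrated with a short Python sketch. It mirrors the settings above (all query strings, the Authorization header, no cookies) but is only a model: the function name and hashing scheme are assumptions, and it does no normalization of its own.

```python
import hashlib
from typing import Dict, Tuple
from urllib.parse import urlsplit


def cache_key(url: str, headers: Dict[str, str],
              header_allowlist: Tuple[str, ...] = ("authorization",)) -> str:
    """Build an illustrative cache key: host + path + query + allowlisted headers."""
    parts = urlsplit(url)
    lowered = {name.lower(): value for name, value in headers.items()}
    # Only allowlisted headers contribute; cookies are ignored entirely
    header_part = "|".join(f"{name}={lowered.get(name, '')}" for name in header_allowlist)
    raw = f"{parts.netloc}{parts.path}?{parts.query}\n{header_part}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


url = "https://api.example.com/products?page=1"
alice = cache_key(url, {"Authorization": "Bearer alice", "Cookie": "a=1"})
alice_other_cookie = cache_key(url, {"Authorization": "Bearer alice", "Cookie": "a=2"})
bob = cache_key(url, {"Authorization": "Bearer bob", "Cookie": "a=1"})

print(alice == alice_other_cookie)  # True
print(alice == bob)                 # False
```

Putting Authorization in the key keeps one user's response from being served to another, at the cost of a separate cached copy per token and therefore a lower hit ratio.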

Origin failover provides automatic failover between origin servers:

{
  "OriginGroups": {
    "Quantity": 1,
    "Items": [{
      "Id": "api-origin-group",
      "FailoverCriteria": {
        "StatusCodes": {
          "Quantity": 4,
          "Items": [500, 502, 503, 504]
        }
      },
      "Members": {
        "Quantity": 2,
        "Items": [
          { "OriginId": "api-us-east-1" },
          { "OriginId": "api-eu-west-1" }
        ]
      }
    }]
  }
}

When CloudFront gets one of the configured status codes (here 500, 502, 503, or 504) from the primary origin, it automatically retries the request against the secondary origin. Origin failover applies only to GET, HEAD, and OPTIONS requests. It happens per request at the edge, so it is much faster than DNS failover.
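The per-request behavior can be modeled in a few lines of Python. This is a sketch of the decision only, assuming the status codes from the origin group above and failover limited to GET, HEAD, and OPTIONS; the origin functions are hypothetical stand-ins for real HTTP calls.

```python
from typing import Callable, Dict

FAILOVER_STATUS_CODES = {500, 502, 503, 504}
FAILOVER_METHODS = {"GET", "HEAD", "OPTIONS"}

Origin = Callable[[str, str], int]  # (method, path) -> HTTP status code


def fetch_with_failover(method: str, path: str,
                        primary: Origin, secondary: Origin) -> Dict[str, object]:
    """Model per-request origin failover for a single cache miss."""
    status = primary(method, path)
    if method in FAILOVER_METHODS and status in FAILOVER_STATUS_CODES:
        # Retry the same request against the secondary origin
        return {"origin": "secondary", "status": secondary(method, path)}
    return {"origin": "primary", "status": status}


def failing_primary(method: str, path: str) -> int:
    return 503  # hypothetical origin that is currently down


def healthy_secondary(method: str, path: str) -> int:
    return 200


print(fetch_with_failover("GET", "/api/products", failing_primary, healthy_secondary))
# {'origin': 'secondary', 'status': 200}
print(fetch_with_failover("POST", "/api/orders", failing_primary, healthy_secondary))
# {'origin': 'primary', 'status': 503}
```

The POST case is the one to plan for: writes are not retried against the secondary, so they still depend on DNS-level failover.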

Cloudflare as an Alternative

Cloudflare offers a compelling alternative to CloudFront, especially for teams that are not all-in on AWS. Its free tier is generous, the dashboard is more intuitive, and features like Workers, automatic image optimization, and DDoS protection are included.

Key Cloudflare configurations for production:

# Using Cloudflare API to configure DNS records
# (assumes CF_ZONE_ID and CF_API_TOKEN are set in the environment)
curl -X POST "https://api.cloudflare.com/client/v4/zones/$CF_ZONE_ID/dns_records" \
  -H "Authorization: Bearer $CF_API_TOKEN" \
  -H "Content-Type: application/json" \
  --data '{
    "type": "A",
    "name": "api.example.com",
    "content": "203.0.113.50",
    "ttl": 1,
    "proxied": true
  }'

When proxied: true, traffic flows through Cloudflare's network, enabling caching, DDoS protection, and WAF rules. When proxied: false (DNS-only), Cloudflare just serves DNS.

Cache Rules, the newer replacement for Page Rules, control caching behavior:

# Cache everything on the static assets subdomain
curl -X POST "https://api.cloudflare.com/client/v4/zones/$CF_ZONE_ID/rulesets" \
  -H "Authorization: Bearer $CF_API_TOKEN" \
  -H "Content-Type: application/json" \
  --data '{
    "name": "Static Assets Cache",
    "kind": "zone",
    "phase": "http_request_cache_settings",
    "rules": [{
      "expression": "(http.host eq \"static.example.com\")",
      "action": "set_cache_settings",
      "action_parameters": {
        "cache": true,
        "edge_ttl": {
          "mode": "override_origin",
          "default": 2592000
        },
        "browser_ttl": {
          "mode": "override_origin",
          "default": 86400
        }
      }
    }]
  }'

Cloudflare Workers run JavaScript at the edge, enabling powerful patterns like A/B testing, request routing, and API response transformation without touching your origin:

// worker.js - Edge-side API response caching with stale-while-revalidate
export default {
  async fetch(request, env, ctx) {
    // The Cache API only stores GET requests; pass everything else through
    if (request.method !== 'GET') {
      return fetch(request);
    }

    const cacheKey = new Request(request.url, request);
    const cache = caches.default;

    const response = await cache.match(cacheKey);

    if (response) {
      // Return cached response immediately
      // Revalidate in the background if older than 60 seconds
      const age = Date.now() - new Date(response.headers.get('x-cache-time')).getTime();
      if (age > 60000) {
        ctx.waitUntil(revalidate(request, cacheKey, cache));
      }
      return response;
    }

    return await revalidate(request, cacheKey, cache);
  }
};

async function revalidate(request, cacheKey, cache) {
  const response = await fetch(request);
  // Only cache plain 200 responses; pass errors and partial content through uncached
  if (response.status !== 200) {
    return response;
  }
  const newResponse = new Response(response.body, response);
  newResponse.headers.set('x-cache-time', new Date().toISOString());
  // max-age bounds how long the entry stays in the edge cache
  newResponse.headers.set('Cache-Control', 'public, max-age=120');

  await cache.put(cacheKey, newResponse.clone());
  return newResponse;
}

Caching Headers and Cache Control Strategy

Your CDN is only as effective as your caching headers. If your origin sends Cache-Control: no-cache on everything, your CDN is just an expensive reverse proxy.

# Nginx configuration for optimal caching headers
server {
    listen 443 ssl http2;
    server_name api.example.com;

    # Static assets - cache aggressively (hashed filenames for cache busting)
    # "expires" would emit its own Cache-Control header, so only add_header is used
    location ~* \.(js|css|png|jpg|jpeg|gif|svg|woff2|ico)$ {
        add_header Cache-Control "public, max-age=31536000, immutable";
        add_header Vary "Accept-Encoding";
    }

    # HTML pages - short cache, revalidate
    location ~* \.html$ {
        add_header Cache-Control "public, max-age=300, must-revalidate";
        add_header Vary "Accept-Encoding";
    }

    # API responses - not cached by the CDN; browsers must revalidate before reuse
    location /api/ {
        add_header Cache-Control "private, max-age=0, must-revalidate";
        add_header Vary "Authorization, Accept-Encoding";
    }

    # API responses that are cacheable (public, non-personalized)
    location /api/products {
        add_header Cache-Control "public, max-age=60, s-maxage=300";
        add_header Vary "Accept-Encoding";
    }
}

The distinction between max-age and s-maxage is important:

  • max-age controls how long any cache, including the browser, may reuse the response.
  • s-maxage applies only to shared caches such as CDNs and proxies, where it overrides max-age.

This lets you cache at the CDN for 5 minutes while telling browsers to cache for only 1 minute, giving you a balance between performance and freshness.
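The interaction between the two directives can be checked with a small Python sketch. It is a deliberately simplified reading of Cache-Control (it ignores Expires, Age, and heuristic freshness), and the function names are illustrative.

```python
from typing import Dict, Optional


def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control header into lowercase directives and optional arguments."""
    directives: Dict[str, Optional[str]] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def freshness_lifetime(value: str, shared_cache: bool) -> int:
    """Seconds a response may be reused without revalidation (simplified)."""
    d = parse_cache_control(value)
    if "no-store" in d or "no-cache" in d:
        return 0
    if shared_cache and "private" in d:
        return 0  # shared caches such as CDNs must not store private responses
    if shared_cache and d.get("s-maxage") is not None:
        return int(d["s-maxage"])  # s-maxage overrides max-age for shared caches
    if d.get("max-age") is not None:
        return int(d["max-age"])
    return 0


header = "public, max-age=60, s-maxage=300"
print(freshness_lifetime(header, shared_cache=True))   # 300
print(freshness_lifetime(header, shared_cache=False))  # 60
print(freshness_lifetime("private, max-age=0, must-revalidate", shared_cache=True))  # 0
```

The first two calls reproduce the `/api/products` header from the Nginx configuration: five minutes at the CDN, one minute in the browser.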

Cache invalidation when you deploy:

# CloudFront invalidation
aws cloudfront create-invalidation \
  --distribution-id E1234ABCDEF \
  --paths "/*"

# Targeted invalidation (keeps the rest of the edge cache warm)
aws cloudfront create-invalidation \
  --distribution-id E1234ABCDEF \
  --paths "/index.html" "/api/products*"

# Cloudflare cache purge
curl -X POST "https://api.cloudflare.com/client/v4/zones/$CF_ZONE_ID/purge_cache" \
  -H "Authorization: Bearer $CF_API_TOKEN" \
  -H "Content-Type: application/json" \
  --data '{"purge_everything": true}'

# Cloudflare targeted purge
curl -X POST "https://api.cloudflare.com/client/v4/zones/$CF_ZONE_ID/purge_cache" \
  -H "Authorization: Bearer $CF_API_TOKEN" \
  -H "Content-Type: application/json" \
  --data '{"files": ["https://example.com/styles.css", "https://example.com/app.js"]}'

Multi-Region Failover Architecture

For services that require high availability across regions, combine DNS routing with CDN origin failover for defense in depth.

A strong multi-region architecture:

User Request
  │
  ▼
Cloudflare/CloudFront (Global Edge)
  │
  ├─ Cache HIT → Return immediately
  │
  ├─ Cache MISS → Origin Request
  │     │
  │     ▼
  │   Route 53 Latency-Based Routing
  │     │
  │     ├─ US users → us-east-1 ALB
  │     │     │
  │     │     └─ Health check fails → eu-west-1 ALB (failover)
  │     │
  │     └─ EU users → eu-west-1 ALB
  │           │
  │           └─ Health check fails → us-east-1 ALB (failover)

Implementation checklist for multi-region failover:

  1. DNS TTL at 60 seconds for all records involved in failover.
  2. Health checks every 10 seconds with a failure threshold of 3 (30-second detection).
  3. CDN origin failover for per-request resilience (faster than DNS failover).
  4. Database replication across regions (Aurora Global Database or DynamoDB Global Tables).
  5. Session management in a shared store (ElastiCache Global Datastore or DynamoDB).
  6. Monitoring from multiple regions to distinguish between regional and global outages.

Spot checks for DNS resolution and cache performance:
# Monitor DNS resolution from your CI/CD or monitoring system
# Check that failover is working by querying from multiple locations
for resolver in 8.8.8.8 1.1.1.1 208.67.222.222; do
  echo "Resolver $resolver:"
  dig +short @$resolver api.example.com A
done

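# Monitor CDN cache hit ratio
# CloudFront metrics live in us-east-1 and need the Region=Global dimension;
# CacheHitRate also requires additional metrics to be enabled on the distribution.
# (date -d is GNU date syntax)
aws cloudwatch get-metric-statistics \
  --region us-east-1 \
  --namespace AWS/CloudFront \
  --metric-name CacheHitRate \
  --dimensions Name=DistributionId,Value=E1234ABCDEF Name=Region,Value=Global \
  --start-time "$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --period 300 \
  --statistics Average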

A healthy CDN should have a cache hit ratio above 80% for static content. If it is lower, review your caching headers and cache key configuration. Every cache miss is a request to your origin, which means more load on your infrastructure and higher latency for your users.
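As a rough way to act on that threshold, the per-period averages returned by the metrics query can be reduced to a single number and compared against 80%. The Python sketch below assumes the datapoints have already been extracted as percentages; the function names and sample values are made up, and a plain mean ignores how many requests each period carried.

```python
from typing import Iterable, List


def average_hit_rate(datapoints: Iterable[float]) -> float:
    """Plain mean of per-period cache hit rates, in percent."""
    values: List[float] = list(datapoints)
    if not values:
        raise ValueError("no datapoints returned")
    return sum(values) / len(values)


def needs_review(datapoints: Iterable[float], threshold: float = 80.0) -> bool:
    """True when the average hit rate falls below the threshold."""
    return average_hit_rate(datapoints) < threshold


# Hypothetical 5-minute averages for one distribution
samples = [92.0, 88.5, 61.0, 74.5]
print(average_hit_rate(samples))  # 79.0
print(needs_review(samples))      # True
```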

Need Help with Your DevOps?

DNS and CDN architecture is foundational infrastructure that directly impacts your application's performance, reliability, and user experience. At InstaDevOps, we design and implement global infrastructure architectures with multi-region failover, CDN optimization, and DNS strategies that keep your applications fast and available.

We offer fractional DevOps engineering starting at $2,999/month with no long-term contracts. Book a free 15-minute call to discuss your infrastructure needs: https://calendly.com/instadevops/15min
