APIVerve

Posted on Feb 26 • Edited on Mar 16 • Originally published at blog.apiverve.com

The DNS Migration Checklist Nobody Talks About

#dns #infrastructure #devops #migration

"DNS propagation takes 24-48 hours."

You've heard this. You've probably said it. It's become the standard disclaimer for any DNS change.

But here's the thing: it's not really true. Or rather, it's true in a way that obscures what's actually happening and what you can control.

I've watched teams sit nervously for two days after a DNS change, refreshing their browser, wondering if it "propagated yet." I've also watched teams complete migrations in under an hour with zero downtime.

The difference isn't luck. It's preparation.

Why "24-48 Hours" Is Misleading

The 24-48 hour estimate comes from the maximum TTL values commonly seen in DNS records. If your A record has a TTL of 86400 seconds (24 hours), then yes, some DNS resolvers might cache that record for up to 24 hours.

But that's the maximum, not the average. And it assumes you made no preparations.

Here's what actually determines propagation time:

Your TTL settings. A record with a 300-second (5-minute) TTL will propagate in minutes, not days. The TTL tells resolvers how long they can cache the record before checking for updates.

When resolvers last cached. If a resolver cached your record 23 hours ago with a 24-hour TTL, it will check for updates in an hour. If it cached it 1 minute ago, you're waiting almost a full day.

Resolver behavior. Most resolvers respect TTL. Some don't. Some ISPs cache longer than specified to reduce load. Some corporate networks have aggressive caching.

Your preparation. This is the variable you control.
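
The cache arithmetic above can be made concrete. This is only an illustrative sketch: `seconds_until_refresh` is a hypothetical helper, and it assumes a resolver that honors the TTL exactly, which not all of them do.

```python
def seconds_until_refresh(ttl_seconds: int, cached_seconds_ago: int) -> int:
    """How long a TTL-respecting resolver keeps serving its cached answer."""
    return max(ttl_seconds - cached_seconds_ago, 0)


DAY = 86400

# Cached 23 hours ago with a 24-hour TTL: the resolver re-checks in one hour.
print(seconds_until_refresh(DAY, 23 * 3600))
# Cached one minute ago with a 24-hour TTL: almost a full day of the old answer.
print(seconds_until_refresh(DAY, 60))
# The same one-minute-old cache entry with a 5-minute TTL.
print(seconds_until_refresh(300, 60))
```

The only input you can change ahead of time is the TTL, which is why the preparation phase starts there.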

The Preparation Phase (72 Hours Before)

The single most important thing you can do for a smooth DNS migration happens days before the actual change: lower your TTL.

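If your current TTL is 86400 (24 hours), change it to 300 (5 minutes) at least 72 hours before your planned migration. Why 72 hours? The strict minimum is one full old TTL (24 hours here), because every cache still holding the record with the old, high TTL has to expire first. The extra two days are a safety margin for resolvers that hold records longer than the TTL says.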

Here's the logic:

Current state: TTL 86400, IP 1.2.3.4
T-72h: Change TTL to 300 (IP still 1.2.3.4)
Over the next 72 hours, all caches eventually expire and re-fetch
Now everyone has TTL 300 cached
Migration day: Change IP to 5.6.7.8
Within ~5 minutes, every resolver that respects TTL has the new IP

Skip the TTL reduction? Some resolvers will serve the old IP for up to 24 hours after you change it. That's up to 24 hours of users landing on the old server, and a partial outage if that server is already gone.

// Check your current TTL before migration
async function checkCurrentTTL(domain) {
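  const response = await fetch(
    `https://api.apiverve.com/v1/dnslookup?domain=${encodeURIComponent(domain)}&type=A`,
    { headers: { 'x-api-key': 'YOUR_API_KEY' } }
  );
  // Surface HTTP errors (bad key, rate limit) instead of failing on data.ttl below
  if (!response.ok) {
    throw new Error(`DNS lookup failed: HTTP ${response.status}`);
  }
  const { data } = await response.json();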

  console.log(`Current TTL: ${data.ttl} seconds`);

  if (data.ttl > 300) {
    console.log('Warning: TTL is high. Lower to 300 before migration.');
    console.log(`Wait at least ${data.ttl} seconds after lowering before proceeding.`);
  }

  return data;
}
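
The waiting rule in the timeline above can be turned into a concrete timestamp. The following is a minimal sketch, not part of any API: `earliest_safe_cutover` and its `margin_seconds` parameter are hypothetical names, and the dates are made-up examples.

```python
from datetime import datetime, timedelta, timezone


def earliest_safe_cutover(lowered_at: datetime, old_ttl_seconds: int,
                          margin_seconds: int = 0) -> datetime:
    """Earliest time by which every TTL-respecting cache holds the low TTL.

    A cache that fetched the record just before the TTL was lowered keeps
    the old TTL for old_ttl_seconds, so that is the minimum wait.
    """
    return lowered_at + timedelta(seconds=old_ttl_seconds + margin_seconds)


lowered = datetime(2024, 3, 1, 9, 0, tzinfo=timezone.utc)  # example timestamp

# Strict minimum: one full old TTL (24 hours).
print(earliest_safe_cutover(lowered, 86400))
# Old TTL plus a 48-hour margin, which gives the 72-hour guideline.
print(earliest_safe_cutover(lowered, 86400, margin_seconds=2 * 86400))
```

The margin is a judgment call: it buys protection against resolvers that overstay the TTL at the cost of a longer lead time.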

The Infrastructure Phase (24 Hours Before)

Your new infrastructure should be fully operational before you touch DNS. This seems obvious, but I've seen teams change DNS to point to servers that aren't ready.

Set up the new environment completely. Application deployed, SSL certificates installed, database connected, health checks passing.

Test with host file overrides. On your local machine, edit /etc/hosts (or C:\Windows\System32\drivers\etc\hosts) to point the domain to the new IP. Browse the site. Test critical flows. Find problems before users do.

Configure the new server to respond to the domain. Web servers often need explicit configuration to handle requests for a specific domain. Make sure nginx, Apache, or your load balancer is configured correctly.

Set up monitoring on the new infrastructure. You want to know immediately if something goes wrong after cutover, not when users report problems hours later.
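
For the hosts-file test described above, here is a small sketch of the two pieces involved: the override line itself, and a scripted request that reaches the new server by IP while still naming the real domain. The helper names and the `/health` path are hypothetical, and the request is plain HTTP only. For HTTPS, the hosts-file override is the simpler route, because the client then sends the correct hostname for certificate checks.

```python
import urllib.request


def hosts_entry(new_ip: str, domain: str) -> str:
    """Line to add to the hosts file for a local override."""
    return f"{new_ip}\t{domain}"


def request_via_new_ip(new_ip: str, domain: str, path: str = "/") -> urllib.request.Request:
    """Build a plain-HTTP request to the new server that carries the real Host header."""
    return urllib.request.Request(f"http://{new_ip}{path}", headers={"Host": domain})


print(hosts_entry("5.6.7.8", "example.com"))

req = request_via_new_ip("5.6.7.8", "example.com", "/health")  # hypothetical path
print(req.full_url, req.get_header("Host"))

# Sending is left commented out so the sketch has no network side effects:
# with urllib.request.urlopen(req, timeout=10) as resp:
#     print(resp.status)
```

Remember to remove the hosts-file line after testing, or your own machine will keep bypassing DNS.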

The Verification Phase (Hours Before)

Before making the DNS change, verify everything one more time.

Confirm TTL has been lowered long enough. If you lowered TTL 48 hours ago from a 24-hour TTL, you're good. If you lowered it 6 hours ago, wait.

Verify new infrastructure health. Run your test suite against the new servers. Check logs for errors. Confirm SSL certificates are valid using an SSL certificate checker.

Check current DNS records from multiple locations. This establishes your baseline and confirms everything is working.

import requests

def verify_pre_migration(domain, expected_old_ip):
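    response = requests.get(
        'https://api.apiverve.com/v1/dnspropagation',
        params={'domain': domain, 'type': 'A'},
        headers={'x-api-key': 'YOUR_API_KEY'},
        timeout=30,  # don't hang forever on a stalled connection
    )
    response.raise_for_status()  # fail loudly on HTTP errors rather than on a KeyError below
    data = response.json()['data']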

    # Check all locations return expected IP
    for loc in data['propagation']:
        if loc['result'] != expected_old_ip:
            print(f"Warning: {loc['location']} shows {loc['result']}, expected {expected_old_ip}")
            return False

    print("All locations consistent. Ready for migration.")
    return True
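
The SSL check mentioned above can also be scripted against the new server before DNS points at it. This is a sketch using only Python's standard `ssl` and `socket` modules; the function names are hypothetical. It connects to the new IP directly while presenting the real domain via SNI, so the certificate is validated exactly as it will be after cutover.

```python
import socket
import ssl


def days_until_expiry(not_after: str, now_ts: float) -> float:
    """Days between now_ts (Unix seconds) and a certificate's notAfter string."""
    return (ssl.cert_time_to_seconds(not_after) - now_ts) / 86400


def fetch_not_after(server_ip: str, domain: str, timeout: float = 10.0) -> str:
    """Connect to server_ip:443, present `domain` via SNI, return the cert's notAfter.

    The default context verifies the chain and the hostname, so a certificate
    issued for the wrong domain raises ssl.SSLCertVerificationError.
    """
    context = ssl.create_default_context()
    with socket.create_connection((server_ip, 443), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=domain) as tls:
            return tls.getpeercert()["notAfter"]


# Example usage (requires network access to the new server):
# import time
# not_after = fetch_not_after("5.6.7.8", "example.com")
# print(f"{days_until_expiry(not_after, time.time()):.1f} days left")
```

A verification error here is the same failure users would see in their browsers, which makes it worth catching before the DNS change rather than after.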

The Migration Phase

Now you're ready to make the actual DNS change. This should be the least stressful part if you've prepared correctly.

Make the change at your DNS provider. Update the A record (or CNAME, or whatever you're changing) to point to the new destination.

Start monitoring propagation immediately. Don't just wait and hope.

async function monitorPropagation(domain, expectedIP) {
  const checkInterval = 60000; // Check every minute
  const maxWait = 3600000; // Wait up to 1 hour
  const startTime = Date.now();

  while (Date.now() - startTime < maxWait) {
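    const response = await fetch(
      `https://api.apiverve.com/v1/dnspropagation?domain=${encodeURIComponent(domain)}&type=A&expected=${encodeURIComponent(expectedIP)}`,
      { headers: { 'x-api-key': 'YOUR_API_KEY' } }
    );
    // Stop on HTTP errors (bad key, rate limit) rather than misreading them as "not propagated"
    if (!response.ok) {
      throw new Error(`Propagation check failed: HTTP ${response.status}`);
    }
    const { data } = await response.json();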

    const propagated = data.propagation.filter(loc => loc.result === expectedIP);
    const pending = data.propagation.filter(loc => loc.result !== expectedIP);

    console.log(`Propagated: ${propagated.length}/${data.propagation.length}`);

    if (pending.length > 0) {
      console.log('Still waiting on:', pending.map(p => p.location).join(', '));
    }

    if (data.fullyPropagated) {
      console.log('Full propagation confirmed!');
      return true;
    }

    await new Promise(r => setTimeout(r, checkInterval));
  }

  console.log('Warning: Did not reach full propagation within timeout');
  return false;
}

The Parallel Running Phase

Here's what separates zero-downtime migrations from "hope nobody notices" migrations: keep both environments running.

After changing DNS, don't immediately shut down the old infrastructure. Some users will still be hitting it due to caching. Some corporate networks might cache longer than expected.

Run both environments in parallel for at least 24 hours after the DNS change. Monitor both. Compare traffic patterns.

Old infrastructure traffic declining: Good sign. Propagation is working.

Old infrastructure traffic staying high: Something might be cached longer than expected. Keep it running.

Errors on new infrastructure: You caught them before full cutover. Fix them.
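
The traffic comparison above can be reduced to a simple ratio against the old server's pre-cutover baseline. This is a rough sketch: the function name, the status labels, and both thresholds (5% and 50%) are assumptions for illustration, and the request counts would come from whatever your own monitoring exposes.

```python
def old_traffic_status(baseline_per_hour, latest_per_hour,
                       drained_ratio=0.05, high_ratio=0.5):
    """Classify old-server traffic relative to its pre-cutover baseline."""
    if baseline_per_hour <= 0:
        raise ValueError("baseline_per_hour must be positive")
    ratio = latest_per_hour / baseline_per_hour
    if ratio <= drained_ratio:
        return "drained"      # minimal traffic left
    if ratio <= high_ratio:
        return "declining"    # propagation is working
    return "still high"       # something is cached longer than expected


# Example: 12,000 requests/hour before cutover (made-up numbers).
print(old_traffic_status(12000, 9000))
print(old_traffic_status(12000, 4000))
print(old_traffic_status(12000, 300))
```

Whatever thresholds you pick, a "still high" reading a day after cutover is a reason to keep the old environment running, not to shut it down on schedule.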

The Post-Migration Phase

Once propagation is complete and the old infrastructure shows minimal traffic:

Increase TTL back to normal values. A TTL of 300 seconds means resolvers come back to your nameservers every five minutes, which adds up to a lot of DNS queries. For stable records, 3600 (1 hour) or 86400 (24 hours) is more efficient.

Update any hardcoded IPs. Check configuration files, firewall rules, monitoring systems, documentation. Anything that referenced the old IP needs updating. It's also a good time to verify your domain expiration dates so you don't face a lapsed registration on top of a fresh migration.

Decommission old infrastructure. But not too fast. I recommend keeping it available (even if not running) for a week, just in case.

Document what you did. The next migration will be easier with notes from this one.
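
Hunting for hardcoded IPs, as suggested above, is easy to script for anything that lives in text files. The sketch below is one possible approach, with a hypothetical function name. It only covers files on disk, so firewall rules and monitoring systems managed elsewhere still need a manual check.

```python
import re
from pathlib import Path


def find_hardcoded_ip(root: str, old_ip: str) -> list:
    """Return (path, line_number, line) for every text line that mentions old_ip."""
    # Lookarounds keep 1.2.3.4 from matching inside 11.2.3.4 or 1.2.3.45.
    pattern = re.compile(r"(?<![\d.])" + re.escape(old_ip) + r"(?!\.?\d)")
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for number, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                hits.append((str(path), number, line.strip()))
    return hits


# Example usage against a config directory of your choosing:
# for path, number, line in find_hardcoded_ip("/etc/nginx", "1.2.3.4"):
#     print(f"{path}:{number}: {line}")
```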

Common Migration Scenarios

Scenario 1: Changing Hosting Providers

You're moving from AWS to GCP, or from one VPS to another.

Set up new server, deploy application, test thoroughly
Lower TTL to 300 on A record (wait 48h+)
Change A record to new IP
Monitor propagation
Keep old server running 24h
Increase TTL, decommission old server

Scenario 2: Adding a CDN

You're putting Cloudflare or another CDN in front of your origin.

Configure CDN, test with CDN's preview/dev URL
Lower TTL to 300 (wait 48h+)
Change DNS to CDN's provided records (usually CNAME)
Monitor propagation and CDN health
Verify SSL, caching behavior, origin connectivity
Increase TTL

Scenario 3: Changing Nameservers

Moving DNS management to a different provider. This is higher risk because it affects ALL records.

Export complete zone file from current provider
Import to new provider, verify all records match
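Lower TTL on NS records where you can (the delegation TTL in the parent zone is set by the registry and usually can't be changed)
Update nameservers at domain registrar
Wait for NS propagation (this can genuinely take 24-48h) and keep the zone live at the old provider the whole time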
Verify all record types resolve correctly from multiple locations
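
The "verify all records match" step above is where nameserver moves usually go wrong, so it is worth diffing the two zones mechanically. This is a minimal sketch that compares records you have already exported from each provider. The function name and the `(name, type, value)` tuple shape are assumptions, and it does no DNS querying itself.

```python
def diff_zone_records(old_records, new_records):
    """Compare two iterables of (name, type, value) tuples.

    Names are compared case-insensitively with any trailing dot ignored,
    types case-insensitively; values are compared as given.
    """
    def normalize(record):
        name, rtype, value = record
        return (name.rstrip(".").lower(), rtype.upper(), value)

    old_set = {normalize(r) for r in old_records}
    new_set = {normalize(r) for r in new_records}
    return {
        "missing_in_new": sorted(old_set - new_set),
        "extra_in_new": sorted(new_set - old_set),
    }


old_zone = [
    ("example.com.", "A", "1.2.3.4"),
    ("example.com.", "MX", "10 mail.example.com."),
    ("www.example.com.", "CNAME", "example.com."),
]
new_zone = [
    ("example.com", "a", "1.2.3.4"),
    ("www.example.com", "CNAME", "example.com."),
]

print(diff_zone_records(old_zone, new_zone))
```

In this made-up example the MX record never made it to the new provider, which is exactly the kind of gap that only shows up when mail stops arriving.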

The Things That Go Wrong

Even with perfect preparation, things go wrong. Here's what to watch for:

SSL certificate mismatch. The new server doesn't have the right SSL cert, or it expired, or it's for the wrong domain. Users see security warnings or connections fail.

Application configuration. The app is listening on localhost instead of all interfaces. Or it's not configured to accept requests for this domain. Or environment variables are wrong.

Database connectivity. The new server can't reach the database. Security groups, connection strings, credentials — lots can go wrong here.

Missing redirects. Old URLs don't work on the new server. Users hitting cached links get 404s.

Caching surprises. Your CDN is caching the old content. Or your application cache isn't invalidated. Or the browser is caching aggressively.

ISP-level caching. Some ISPs cache DNS longer than TTL specifies. Users on certain networks might hit the old IP for an extended period.

The Rollback Plan

Every migration needs a rollback plan. Before you start, know exactly how to undo the change.

For most DNS changes, rollback is simple: change the record back to the old value. But that requires the old infrastructure still being available.

Here's your rollback checklist:

[ ] Old infrastructure remains running during migration
[ ] Old IP address documented
[ ] DNS change can be reverted in under 5 minutes
[ ] Team knows who can make DNS changes and how
[ ] Monitoring in place to detect problems quickly

If you've lowered TTL, rollback propagates quickly too. That's a feature, not a bug.
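
Deciding when to pull the rollback trigger is easier if the rule is written down before cutover. Here is one possible rule as a sketch: roll back when the error rate on the new infrastructure stays above a threshold for several consecutive samples. The function name, the 5% threshold, and the three-sample window are all assumptions to adapt to your own monitoring.

```python
def should_roll_back(error_rates, threshold=0.05, consecutive=3):
    """True when the last `consecutive` samples all exceed `threshold`.

    error_rates is a list of fractions (0.0 to 1.0), oldest first.
    """
    if len(error_rates) < consecutive:
        return False  # not enough data to decide yet
    return all(rate > threshold for rate in error_rates[-consecutive:])


# A single spike does not trigger a rollback...
print(should_roll_back([0.01, 0.20, 0.01, 0.02]))
# ...but a sustained elevated error rate does.
print(should_roll_back([0.01, 0.20, 0.30, 0.40]))
```

Requiring consecutive samples avoids reverting DNS over a momentary blip, since every revert also has to propagate.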

Automating the Process

For frequent migrations or as part of CI/CD, automate the propagation checking:

import requests
import time
import sys

def migrate_dns(domain, new_ip, api_key, timeout_minutes=60):
    """
    Monitor DNS propagation after making a change.
    Returns True when fully propagated, False on timeout.
    """
    start_time = time.time()
    timeout_seconds = timeout_minutes * 60

    while time.time() - start_time < timeout_seconds:
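        response = requests.get(
            'https://api.apiverve.com/v1/dnspropagation',
            params={
                'domain': domain,
                'type': 'A',
                'expected': new_ip
            },
            headers={'x-api-key': api_key},
            timeout=30,  # a stalled request should not hang the pipeline
        )
        response.raise_for_status()  # treat HTTP errors as a failed run
        data = response.json()['data']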

        propagated_count = sum(
            1 for loc in data['propagation']
            if loc['result'] == new_ip
        )
        total_count = len(data['propagation'])

        print(f"[{int(time.time() - start_time)}s] "
              f"Propagation: {propagated_count}/{total_count}")

        if data['fullyPropagated']:
            print("Migration complete: DNS fully propagated")
            return True

        time.sleep(30)  # Check every 30 seconds

    print("Warning: Timeout reached before full propagation")
    return False

# Usage in CI/CD pipeline
if __name__ == '__main__':
    success = migrate_dns(
        domain='example.com',
        new_ip='5.6.7.8',
        api_key='YOUR_API_KEY'
    )
    sys.exit(0 if success else 1)

The Real Checklist

Here's the complete checklist, condensed:

72+ hours before:

[ ] Lower TTL to 300 seconds
[ ] Verify TTL change propagated

24 hours before:

[ ] New infrastructure fully operational
[ ] Application tested via hosts file override
[ ] SSL certificates valid
[ ] Monitoring configured

Just before:

[ ] Verify current DNS from multiple locations
[ ] Confirm old infrastructure baseline
[ ] Rollback plan documented

During:

[ ] Make DNS change
[ ] Start propagation monitoring
[ ] Watch for errors on new infrastructure

After (24 hours):

[ ] Verify full propagation
[ ] Confirm old infrastructure traffic dropped
[ ] Increase TTL
[ ] Shut down old infrastructure (keep it restorable for a week)

After (1 week):

[ ] Remove old infrastructure completely
[ ] Update documentation

The Bottom Line

DNS migrations don't have to be stressful. The "24-48 hour propagation" warning exists because people skip the preparation steps that make migrations fast and safe.

Lower your TTL early. Run infrastructure in parallel. Monitor propagation actively. Have a rollback plan.

With preparation, most DNS migrations can complete with verified propagation in under an hour. Without preparation, you're hoping that TTL values you set years ago happen to be low enough.

Don't hope. Prepare.

Need to monitor your next DNS migration? The DNS Propagation API checks propagation from multiple global locations in real-time. Verify your changes have reached all regions before decommissioning old infrastructure. The DNS Lookup API helps you check current records and TTL values before you start.
