Dipo Ajayi

Posted on Jun 5

38 Domains, One Session. What the DNS Migration Tools Didn't Show Me.

#devops #infrastructure #automation #networking

Thirty-eight domains. One session. No user-visible downtime.

That's the result. But the process looked nothing like the step-by-step guides promise — and several of the assumptions those guides are built on didn't hold in my environment.

This is what I found, what I can and can't explain, and the pipeline that made the migration safe.

The Setup

I manage DNS for a portfolio of ~40 domains — mostly businesses, NGOs, and client sites registered on Namecheap. The goal was to centralise DNS management under Cloudflare without transferring registrations. Nameservers move; domains stay put.

The standard approach, which every Cloudflare migration guide will walk you through:

Add the zone in Cloudflare
Let the auto-scan pull in your existing records
Review and confirm
Update nameservers at the registrar

I followed that on the first domain.

Cloudflare's auto-scan returned zero records.

Why the Auto-Scan Came Back Empty (and Why I'm Not Entirely Sure)

Let me be precise about what happened and what I can claim.

Cloudflare's zone import doesn't perform a full zone transfer (AXFR) — almost no public authoritative DNS server allows that. Instead, it queries a dictionary of roughly 100–200 common subdomains (www, mail, ftp, smtp, etc.) alongside apex records. If your records live outside that predefined list, the scan won't find them.
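Here is a minimal sketch of a dictionary-style scan that makes this constraint concrete. It is not Cloudflare's implementation. The wordlist, the `dictionary_scan` function and the `resolve` callable are hypothetical, and a small in-memory zone stands in for live DNS so the example runs offline. Any name not in the wordlist is never asked about.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical wordlist for illustration; the real scanner's list is larger.
COMMON_LABELS = ["www", "mail", "ftp", "smtp", "webmail", "autodiscover"]


def dictionary_scan(
    apex: str,
    resolve: Callable[[str], Optional[List[str]]],
    labels: List[str] = COMMON_LABELS,
) -> Dict[str, List[str]]:
    """Probe the apex plus each guessed label and keep only names that answered."""
    found: Dict[str, List[str]] = {}
    for name in [apex] + [f"{label}.{apex}" for label in labels]:
        answer = resolve(name)
        if answer:  # None or [] means nothing is served for this name
            found[name] = answer
    return found


# Offline stand-in for live DNS: a tiny zone with one non-standard name.
FAKE_ZONE = {
    "example.org": ["A 192.0.2.10"],
    "www.example.org": ["CNAME example.org"],
    "portal-v2.example.org": ["A 192.0.2.20"],  # not in the wordlist
}

result = dictionary_scan("example.org", FAKE_ZONE.get)
print(sorted(result))  # ['example.org', 'www.example.org']
```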

That's the architectural constraint. What I can't say with certainty is why the scan returned zero records on multiple active domains in this specific session. The most plausible explanations in my environment: non-standard record setups that fell outside the scan's dictionary, or Namecheap's authoritative servers rate-limiting the scan during a bulk session. I didn't isolate the cause conclusively.

What I can say is that the auto-scan — whatever the reason — was not reliable enough to trust as the sole source of record data. So I stopped relying on it and built an explicit pipeline instead.

The Pipeline: Two Sources, One Hard Gate

For every domain:

Query Namecheap's getHosts API to get the registrar's view of deployed records.
Run dig against the live authoritative nameservers to get what's actually served.
Merge both sources, flag conflicts, resolve them.
Push the reconciled records to Cloudflare via API.
Verify by querying Cloudflare's assigned nameservers directly (dig @ben.ns.cloudflare.com), confirming the zone's records were loaded and resolving correctly before the domain was ever delegated to them publicly.
Only then run the nameserver cutover at Namecheap — and only if verification passed.

That last step is the one that matters most. Verification was a hard gate — the script refused to proceed on a mismatch. Not "looks close enough." Pass or no cutover.
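The gate is easy to express in code. Below is a minimal sketch. It assumes records are normalised into (name, type, value) tuples and that `expected` is the reconciled set pushed to Cloudflare. It also assumes `observed` comes from querying only those names and types against the Cloudflare nameservers, so SOA and NS answers don't show up as noise. `verify_zone`, `gated_cutover` and the `cutover` callable are hypothetical names.

```python
from typing import Callable, Iterable, Set, Tuple

Record = Tuple[str, str, str]  # (name, type, value), already normalised


def verify_zone(expected: Iterable[Record], observed: Iterable[Record]) -> Tuple[bool, Set[Record], Set[Record]]:
    """Exact set comparison: any missing or unexpected record fails the gate."""
    exp, obs = set(expected), set(observed)
    missing, unexpected = exp - obs, obs - exp
    return (not missing and not unexpected), missing, unexpected


def gated_cutover(domain: str, expected: Iterable[Record], observed: Iterable[Record],
                  cutover: Callable[[str], None]) -> bool:
    """Call cutover(domain) only if verification passes; otherwise report and stop."""
    ok, missing, unexpected = verify_zone(expected, observed)
    if not ok:
        # Hard stop: the registrar nameservers are left untouched.
        print(f"{domain}: FAILED missing={sorted(missing)} unexpected={sorted(unexpected)}")
        return False
    cutover(domain)
    return True


calls = []
expected = {("example.org", "A", "192.0.2.10"), ("example.org", "MX", "10 mx.zoho.com.")}
observed = {("example.org", "A", "192.0.2.10")}  # MX missing on the new nameservers
print(gated_cutover("example.org", expected, observed, calls.append))  # failure report, then False
```

The comparison is exact on purpose: a near-match still blocks the cutover.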

One note on the pipeline's known limits: probing with dig can reveal that a wildcard exists (a random name resolves), but it cannot enumerate which subdomains are explicitly defined alongside it. From the outside every name resolves, and an explicit record that returns the same answer as the wildcard is indistinguishable from it. The getHosts API was the primary source of truth for catching those explicitly defined records.
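A common way to detect a wildcard is to query a random label that nobody would define. Below is a minimal sketch using a hypothetical `resolve` callable and an offline fake resolver with simplified wildcard semantics. It also shows the limitation: an explicit record with the same answer as the wildcard looks identical from outside.

```python
import secrets
from typing import Callable, Dict, List, Optional

Resolver = Callable[[str], Optional[List[str]]]


def has_wildcard(apex: str, resolve: Resolver) -> bool:
    """A random label should not exist unless a wildcard answers for it."""
    probe = f"wc-{secrets.token_hex(8)}.{apex}"
    return bool(resolve(probe))


def fake_resolver(zone: Dict[str, List[str]], apex: str) -> Resolver:
    """Offline stand-in with simplified wildcard semantics: exact name, else *.apex."""
    def resolve(name: str) -> Optional[List[str]]:
        if name in zone:
            return zone[name]
        if name.endswith("." + apex):
            return zone.get(f"*.{apex}")
        return None
    return resolve


zone = {
    "*.example.org": ["A 192.0.2.10"],
    "shop.example.org": ["A 192.0.2.10"],  # explicit record, same answer as the wildcard
}
resolve = fake_resolver(zone, "example.org")
print(has_wildcard("example.org", resolve))                            # True
print(resolve("shop.example.org") == resolve("anything.example.org"))  # True
```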

The Gotchas That Would Have Broken Things

Namecheap's hidden MX records (strongest finding)

Domains using Gmail or Namecheap's email forwarding may show zero MX records in the getHosts API. The records are real — they resolve correctly in live DNS — but Namecheap injects them at the platform level, outside the standard host records API.
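A quick way to spot the gap is to count MX entries in the getHosts response for every domain you know receives mail. Below is a minimal sketch. It assumes the XML shape in the sample: `host` elements carrying `Name`, `Type`, `Address`, `MXPref` and `TTL` attributes. The sample is illustrative, not a captured response, so check the field names against Namecheap's API documentation. `parse_hosts` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Illustrative response shape (assumption), not a real captured payload.
SAMPLE = """<ApiResponse Status="OK" xmlns="http://api.namecheap.com/xml.response">
  <CommandResponse Type="namecheap.domains.dns.getHosts">
    <DomainDNSGetHostsResult Domain="example.org" IsUsingOurDNS="true">
      <host HostId="1" Name="@" Type="A" Address="192.0.2.10" MXPref="10" TTL="1800" />
      <host HostId="2" Name="www" Type="CNAME" Address="example.org." MXPref="10" TTL="1800" />
    </DomainDNSGetHostsResult>
  </CommandResponse>
</ApiResponse>"""


def parse_hosts(xml_text: str) -> List[Dict[str, str]]:
    """Return each <host> element's attributes, ignoring the XML namespace."""
    root = ET.fromstring(xml_text)
    return [dict(el.attrib) for el in root.iter() if el.tag.rsplit("}", 1)[-1] == "host"]


hosts = parse_hosts(SAMPLE)
mx = [h for h in hosts if h.get("Type", "").upper() == "MX"]
print(len(hosts), len(mx))  # 2 0: no MX in the API view, even if live DNS serves one
```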

This is the most significant finding in the whole migration. If you migrate DNS for a domain with active email and your only source of truth is the registrar API, the MX records never make it into the new zone, and inbound mail breaks silently at cutover.

Important caveat: this is not universal to all Namecheap zones. Many zones expose MX records normally. The specific configurations affected are those using Namecheap's pre-set mail handling (Gmail preset, email forwarding preset). The lesson is narrower but still critical: for any domain with active email, source MX from live authoritative DNS, not from the registrar API alone.
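In code, the rule means the registrar view never decides MX on its own. Below is a minimal sketch that assumes both sources are already normalised into (preference, exchange) pairs. `choose_mx` is a hypothetical helper. Flagging any disagreement for review is a design choice, not documented behaviour.

```python
from typing import List, Tuple

MX = Tuple[int, str]  # (preference, exchange hostname)


def choose_mx(registrar_mx: List[MX], live_mx: List[MX]) -> Tuple[List[MX], str]:
    """Live authoritative DNS decides MX; the registrar view is only compared against it."""
    reg, live = sorted(set(registrar_mx)), sorted(set(live_mx))
    if not reg and not live:
        return [], "no MX in either source"
    if live and not reg:
        return live, "registrar API shows no MX; using live DNS (possibly platform-injected)"
    if reg != live:
        return live, "registrar API and live DNS disagree; using live DNS, review before cutover"
    return live, "sources agree"


records, note = choose_mx([], [(1, "aspmx.l.google.com."), (5, "alt1.aspmx.l.google.com.")])
print(records, note)
```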

Email forwarding doesn't migrate — only DNS records do

Three domains used Namecheap's email forwarding (eforward*.registrar-servers.com). This is a platform service, not a portable DNS record. When you move nameservers off Namecheap, the forwarding stops.

This distinction matters: a DNS migration moves records, not services built on top of DNS. On those three domains, forwarding was dropped by agreement; web records were migrated. But if you don't audit this in advance, you find out when the first forwarded email bounces after cutover.
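A pre-flight audit for this can be as simple as checking where live MX points. Below is a minimal sketch that assumes live MX exchanges have already been collected per domain. The `eforward` prefix and `registrar-servers.com` suffix come from the hostnames above. `forwarding_dependencies` and the domain names are hypothetical.

```python
from typing import Dict, List


def forwarding_dependencies(live_mx: Dict[str, List[str]]) -> List[str]:
    """Domains whose MX targets are Namecheap's eforward hosts (forwarding stops at cutover)."""
    flagged = []
    for domain, exchanges in live_mx.items():
        for host in exchanges:
            h = host.lower().rstrip(".")
            if h.startswith("eforward") and h.endswith(".registrar-servers.com"):
                flagged.append(domain)
                break
    return sorted(flagged)


live = {
    "client-a.org": ["eforward1.registrar-servers.com.", "eforward2.registrar-servers.com."],
    "client-b.org": ["mx.zoho.com.", "mx2.zoho.com."],
}
print(forwarding_dependencies(live))  # ['client-a.org']
```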

URL redirects aren't DNS records

Namecheap's proprietary URL301 record type handles HTTP redirects inside their platform. It has no DNS equivalent — DNS has no redirect record type. Move the nameservers and those redirects simply stop working.

Six domains needed this rebuilt. The modern replacement is Cloudflare Redirect Rules, built on the Ruleset Engine, rather than legacy Page Rules, which Cloudflare has deprecated. Each rule maps domain.com/* to https://target.com/$1, preserving the path and query string and upgrading HTTP to HTTPS.
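For reference, here is how a Single Redirect rule of that shape might be built. This is a hedged sketch based on my reading of Cloudflare's Rulesets API documentation: the `http_request_dynamic_redirect` phase, `action_parameters.from_value`, `concat()` and `http.request.uri.path`. Verify the field names against the current docs, and check whether dynamic target expressions are available on your plan. `redirect_rule` and the hostnames are hypothetical, and the code only builds the payload without sending it.

```python
import json


def redirect_rule(source_host: str, target_origin: str, status: int = 301) -> dict:
    """One Single Redirect rule: match the host, carry the path over, keep the query string."""
    return {
        "description": f"{source_host} -> {target_origin}",
        "expression": f'(http.host eq "{source_host}")',
        "action": "redirect",
        "action_parameters": {
            "from_value": {
                "status_code": status,
                # Rough equivalent of domain.com/* -> https://target.com/$1;
                # target_origin has no trailing slash because the path starts with "/".
                "target_url": {"expression": f'concat("{target_origin}", http.request.uri.path)'},
                "preserve_query_string": True,
            }
        },
    }


# Intended body for PUT /zones/{zone_id}/rulesets/phases/http_request_dynamic_redirect/entrypoint
payload = {"rules": [redirect_rule("old-brand.org", "https://new-brand.org")]}
print(json.dumps(payload, indent=2))
```

The rule matches the exact host only, so a www variant needs its own condition or rule.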

One practical caveat: the Ruleset Engine endpoint requires an API token with explicit Zone:Redirect Rules:Edit scope. A token scoped only for Zone:Edit + DNS:Edit gets error 10000 (Authentication error) from the rulesets API, which pushes you back onto Page Rules. If your token is scoped for DNS only, that's the fallback you'll land on. It works, but it leaves you on a deprecated feature you'll have to migrate off later.
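Instead of discovering the scope problem halfway through a batch, you can make the failure loud. Below is a minimal sketch that assumes the standard Cloudflare v4 response envelope (`success`, `result`, and `errors` entries with `code`/`message`). `TokenScopeError` and `check_rulesets_response` are hypothetical names. The sketch deliberately raises rather than falling back to Page Rules.

```python
from typing import Any, Dict


class TokenScopeError(RuntimeError):
    """The API token lacks the permission the rulesets call needs."""


def check_rulesets_response(body: Dict[str, Any]) -> Dict[str, Any]:
    """Return the result on success; raise loudly on an auth/scope failure."""
    if body.get("success"):
        return body.get("result") or {}
    codes = {e.get("code") for e in body.get("errors", [])}
    if 10000 in codes:
        raise TokenScopeError(
            "rulesets API returned 10000 (Authentication error): "
            "add redirect-rules edit permission to the token instead of falling back"
        )
    raise RuntimeError(f"Cloudflare API error: {body.get('errors')}")
```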

Proxied records and mail delivery

One domain had its apex A record and mail CNAME proxied (orange-clouded) through Cloudflare's CDN, while the MX pointed to the apex hostname. Cloudflare's CDN proxy only handles HTTP/HTTPS traffic on standard web ports. If your MX record points to an orange-clouded hostname, inbound mail servers trying to establish an SMTP handshake on port 25 will be rejected at Cloudflare's edge — the proxy has no listener for it.

The fix: grey-cloud any hostname used as a mail target. The domain had 12 records in total. Rather than sorting out which ones were MX targets and which weren't, I switched all of them to DNS-only before cutover.
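Either approach can be automated. Below is a minimal sketch that assumes Cloudflare-style record dicts with `type`, `name`, `content` and `proxied` fields, as in the DNS records API. For MX records, `content` is the exchange hostname. `grey_cloud` is a hypothetical helper with a flag for the blanket approach used here.

```python
from typing import Dict, List

PROXIABLE = {"A", "AAAA", "CNAME"}  # the record types Cloudflare can proxy


def grey_cloud(records: List[Dict], all_records: bool = False) -> List[Dict]:
    """Copy records, setting proxied=False on MX targets (or on every proxiable record)."""
    mx_targets = {r["content"].lower().rstrip(".") for r in records if r["type"] == "MX"}
    out = []
    for rec in records:
        rec = dict(rec)  # never mutate the caller's records
        name = rec["name"].lower().rstrip(".")
        if rec["type"] in PROXIABLE and (all_records or name in mx_targets):
            rec["proxied"] = False
        out.append(rec)
    return out


zone = [
    {"type": "A", "name": "example.org", "content": "192.0.2.10", "proxied": True},
    {"type": "CNAME", "name": "mail.example.org", "content": "example.org", "proxied": True},
    {"type": "MX", "name": "example.org", "content": "example.org", "priority": 10},
]
targeted = grey_cloud(zone)                   # only the apex A, which the MX points at
blanket = grey_cloud(zone, all_records=True)  # every proxiable record goes DNS-only
print([r.get("proxied") for r in targeted])   # [False, True, None]
```

The targeted version does not follow CNAME chains. An MX pointing at a CNAME whose target is proxied would slip through, which is a fair argument for the blanket switch.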

What Made Zero User-Visible Downtime Possible

No uptime monitoring was running during cutover — "no user-visible downtime" means no reported email failures, bounced messages, or site outages were observed in the hours following each cutover. That's the honest scope of the claim.

Two things made it the likely outcome:

Original records were never touched. Namecheap's records stayed exactly as they were throughout. During the DNS propagation window — the hours after a nameserver change when external resolvers are still serving cached responses pointing at the old nameservers — users hitting those cached records still resolved correctly, because the old records were intact and live. Rollback at any point was a single API call.
The hard gate on verification. Before the nameserver change at Namecheap, every domain's records had to resolve correctly when queried directly against Cloudflare's designated nameservers. This confirmed the zone was committed and live on Cloudflare's infrastructure before any public traffic would ever touch those nameservers. No domain was cut over until that check passed.

The Breakdown Across 38 Domains

Category	Count	Notes
No-email / simple web	5	Mirror A/CNAME, cut over
Google Workspace email	3	MX sourced from live dig, not getHosts
Zoho email	11	Mirror MX + SPF + DKIM + zoho-verification TXT
URL-forwarding	6	Rebuilt as Cloudflare Redirect Rules
Email-forwarding (eforward)	3	Forwarding dropped; web-only migration
External-DNS (dig-rebuilt)	2	Authoritative NS not Namecheap; all records from live dig
cPanel (whogohost)	1	Grey-clouded all records before cutover
Already on Cloudflare	4	Records confirmed, no action needed
Excluded	1	One domain intentionally left on NS1/AWS — live load-balanced app, requires separate review
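If you want a breakdown like this generated rather than compiled by hand, here is one possible triage. It is a hypothetical first-match classifier, not the logic behind the table. The bucket order, the `triage` name and the hostname patterns are my assumptions. Those patterns are Cloudflare's `*.ns.cloudflare.com`, Namecheap's `registrar-servers.com`, and the Google and Zoho MX hosts. The cPanel bucket and exclusions such as the NS1/AWS domain are left to a human.

```python
from typing import Iterable, List


def _norm(hosts: Iterable[str]) -> List[str]:
    return [h.lower().rstrip(".") for h in hosts]


def triage(ns: Iterable[str], mx: Iterable[str], host_types: Iterable[str]) -> str:
    """First matching bucket wins; the order encodes which concern dominates."""
    ns, mx = _norm(ns), _norm(mx)
    rtypes = {t.upper() for t in host_types}
    if ns and all(h.endswith(".ns.cloudflare.com") for h in ns):
        return "already on Cloudflare"
    if ns and not any(h.endswith(".registrar-servers.com") for h in ns):
        return "external DNS: rebuild from live dig"
    if any(h.startswith("eforward") and h.endswith(".registrar-servers.com") for h in mx):
        return "email forwarding: will stop at cutover"
    if "URL301" in rtypes:
        return "URL forwarding: rebuild as Redirect Rule"
    if any(h.endswith((".google.com", ".googlemail.com")) for h in mx):
        return "Google Workspace: MX from live DNS"
    if any("zoho." in h for h in mx):
        return "Zoho: mirror MX, SPF, DKIM, verification TXT"
    return "simple web" if not mx else "other mail: review manually"


print(triage(["dns1.registrar-servers.com"], ["mx.zoho.com."], ["A", "MX", "TXT"]))
```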

The scan failures, API mismatches, and manual interventions weren't tracked by type in the manifest — that's a gap worth fixing in future runs.

The Broader Point

The takeaway isn't "the tools are broken." It's that DNS migrations cross multiple layers — registrar platform services, API representations, live DNS resolution, CDN proxy behaviour — and no single tool has full visibility across all of them. The auto-scan sees what public DNS serves. The registrar API sees what the UI configured. Neither sees what the other knows.

The safe approach: query multiple sources, require agreement, and don't cut over until the records you pushed are independently verified as live on the new nameservers. That principle scales from 2 domains to 200.

Where the overhead of a full verification pipeline isn't warranted (personal blogs, low-stakes domains), at minimum source MX from live authoritative DNS and confirm email forwarding dependencies before you touch nameservers.

Rather than maintaining a public repo for the entire toolchain (which involves client-specific logic I'd have to scrub and babysit), I've pulled out the technical meat. The core reconciliation function contains the Python that merges the getHosts API payload with the live dig results to build the reconciled Cloudflare payload.
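The sketch below is not that function. It is a minimal illustration of the same merge, assuming both sources are normalised into maps from (name, type) to a set of values. `reconcile` and the map layout are hypothetical. Live DNS wins for MX, and a record known to only one source is kept, which covers names hidden behind a wildcard. Any other disagreement is reported rather than resolved automatically.

```python
from typing import Dict, List, Set, Tuple

Key = Tuple[str, str]        # (fqdn, record type)
RRMap = Dict[Key, Set[str]]  # key -> set of normalised record values


def reconcile(registrar: RRMap, live: RRMap) -> Tuple[RRMap, List[str]]:
    """Merge the registrar-API and live-dig views; return merged records and conflicts."""
    merged: RRMap = {}
    conflicts: List[str] = []
    for key in sorted(set(registrar) | set(live)):
        reg, dug = registrar.get(key, set()), live.get(key, set())
        name, rtype = key
        if rtype == "MX" and dug:
            merged[key] = set(dug)  # MX: live authoritative DNS wins (platform-injected MX)
            if reg and reg != dug:
                conflicts.append(f"{name} MX: registrar {sorted(reg)} vs live {sorted(dug)}")
        elif reg and dug and reg != dug:
            # Unresolved: left out of merged; treat a non-empty conflicts list as a stop.
            conflicts.append(f"{name} {rtype}: registrar {sorted(reg)} vs live {sorted(dug)}")
        else:
            # Sources agree, or only one knows the record (e.g. a name behind a wildcard).
            merged[key] = set(reg or dug)
    return merged, conflicts


registrar = {
    ("example.org", "A"): {"192.0.2.10"},
    ("shop.example.org", "A"): {"192.0.2.20"},  # only the API knows this one
}
live = {
    ("example.org", "A"): {"192.0.2.10"},
    ("example.org", "MX"): {"10 mx.zoho.com."},  # missing from the API view
}
merged, conflicts = reconcile(registrar, live)
print(sorted(merged), conflicts)
```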

Tags: DNS, Cloudflare, DevOps, Infrastructure, Web Development

DEV Community