toolbox-poster

Originally published at toolbox.starnomina.tn

DNS Health Audit: Best Practices for a Reliable Domain

TL;DR
A healthy DNS configuration is the invisible foundation of every online service. Misconfigurations—lame delegations, stale records, missing DNSSEC signatures—silently degrade availability and security. This guide provides a structured DNS health audit checklist, recommended TTL values, and monitoring strategies to keep your zones in top shape.

📑 Table of Contents

  • What Is a DNS Health Audit?
  • The DNS Health Checklist
  • Recommended TTL Values by Record Type
  • DNSSEC Chain of Trust
  • Zone Delegation & Lame Delegation
  • Monitoring Strategies
  • Best Practices
  • Common Mistakes
  • Tools
  • References

What Is a DNS Health Audit?

A DNS health audit is a systematic review of your domain's DNS configuration to identify misconfigurations, security gaps, and performance issues before they cause outages. It covers nameserver redundancy, record accuracy, DNSSEC integrity, and TTL tuning.

📖 Definition — A lame delegation occurs when a parent zone lists a nameserver as authoritative for a child zone, but that nameserver does not actually serve the zone—returning REFUSED or SERVFAIL instead of valid answers.

The DNS Health Checklist

Run through each item below during every audit cycle:

1. NS Redundancy — Verify at least two geographically diverse nameservers are configured and responding authoritatively.

2. SOA Serial — Confirm the SOA serial number increments with every zone change. Stale serials prevent secondary servers from syncing.

3. TTL Sanity — Check that TTL values match the record type's volatility. Overly short TTLs increase query load; overly long ones delay propagation.

4. Dangling CNAMEs — Scan for CNAME records pointing to decommissioned services—these are subdomain takeover vectors.

5. DNSSEC Validation — Verify the DS → DNSKEY → RRSIG chain is intact and signatures are not expired.

6. MX & SPF Alignment — Ensure MX records resolve and SPF includes all legitimate sending IPs.
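Item 4 of the checklist can be sketched roughly as follows. This is a minimal illustration rather than a full scanner: `find_dangling_cnames` and the `resolves` callback are hypothetical names, and the default callback only asks the system resolver whether the target currently has an address.

```python
import socket
from typing import Callable, Dict, List


def resolves_via_system(hostname: str) -> bool:
    """Return True if the system resolver finds an address for hostname."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False


def find_dangling_cnames(
    cnames: Dict[str, str],
    resolves: Callable[[str], bool] = resolves_via_system,
) -> List[str]:
    """Return the owner names whose CNAME target no longer resolves.

    cnames maps each owner name to its CNAME target (assumed to be
    exported from the zone beforehand).
    """
    return sorted(owner for owner, target in cnames.items() if not resolves(target))
```

A target that still resolves can also be claimable at some providers, so treat the result as a first filter and review cloud-hosted targets by hand.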

Recommended TTL Values by Record Type

TTL values balance caching efficiency against change propagation speed. Use the table below as a starting point:

| Record Type | Recommended TTL | Rationale |
|---|---|---|
| A / AAAA | 300–3600 s | Short for failover-enabled hosts; longer for static servers |
| CNAME | 3600 s | Rarely changes once set; cache-friendly |
| MX | 3600 s | Mail routing changes infrequently |
| TXT (SPF/DKIM) | 3600 s | Stable after deployment; lower during rollout |
| NS | 86400 s (24 h) | Nameserver changes are rare and must propagate widely |
| SOA | 86400 s | Negative caching TTL (SOA MINIMUM) should be 300–900 s |
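The TTL sanity check could be automated along these lines. The sketch below hard-codes the ranges from the table above; the names `RECOMMENDED_TTL` and `ttl_finding` are made up for illustration, and the ranges are starting points, not hard limits.

```python
from typing import Optional

# Recommended TTL ranges in seconds, copied from the table above.
RECOMMENDED_TTL = {
    "A": (300, 3600),
    "AAAA": (300, 3600),
    "CNAME": (3600, 3600),
    "MX": (3600, 3600),
    "TXT": (3600, 3600),
    "NS": (86400, 86400),
    "SOA": (86400, 86400),
}


def ttl_finding(record_type: str, ttl: int) -> Optional[str]:
    """Return a short finding if ttl is outside the recommended range, else None."""
    bounds = RECOMMENDED_TTL.get(record_type.upper())
    if bounds is None:
        return None  # no recommendation for this record type
    low, high = bounds
    if ttl < low:
        return f"{record_type} TTL {ttl}s is below the recommended {low}s"
    if ttl > high:
        return f"{record_type} TTL {ttl}s is above the recommended {high}s"
    return None
```

For example, `ttl_finding("A", 86400)` reports a TTL above the recommended range, while `ttl_finding("A", 600)` returns `None`.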

Pro Tip: 💡 Before a planned migration, lower TTLs to 60–300 seconds at least one full old-TTL period in advance; 48 hours comfortably covers a 24-hour TTL. After the migration is verified, restore production TTLs to reduce resolver load.
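The timing follows from how caching works: a resolver that fetched the record just before the TTL was lowered may keep it for up to the old TTL. A small sketch of that arithmetic, with a hypothetical function name and cutover date:

```python
from datetime import datetime, timedelta, timezone


def latest_ttl_lowering_time(cutover: datetime, old_ttl_seconds: int) -> datetime:
    """Latest moment to publish the lowered TTL so that every cache
    still holding the old TTL has expired by the cutover."""
    return cutover - timedelta(seconds=old_ttl_seconds)


cutover = datetime(2025, 6, 10, 12, 0, tzinfo=timezone.utc)  # hypothetical cutover
print(latest_ttl_lowering_time(cutover, 86400))  # 2025-06-09 12:00:00+00:00
```

Lowering the TTL earlier than this leaves a safety margin for resolvers that hold records slightly longer than they should.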

DNSSEC Chain of Trust

DNSSEC adds cryptographic signatures to DNS records so that validating resolvers can detect forged or tampered answers, which protects against cache poisoning. The chain of trust works as follows:

  1. The parent zone publishes a DS record containing a hash of the child's DNSKEY.

  2. The child zone holds a DNSKEY RRset (KSK + ZSK) signed by the KSK.

  3. Every other authoritative record set in the child zone has an RRSIG created with the ZSK.

  4. Resolvers validate from the root trust anchor down through each DS → DNSKEY → RRSIG link.
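The first link of the chain can be checked offline by recomputing what the DS record should contain from the child's DNSKEY. The sketch below assumes digest type 2 (SHA-256) and follows the key tag and digest definitions in RFC 4034; the function names are illustrative, and the base64 public key is assumed to be copied from the DNSKEY record.

```python
import base64
import hashlib
import struct


def name_to_wire(name: str) -> bytes:
    """Encode a domain name in lowercase DNS wire format (length-prefixed labels)."""
    labels = [label for label in name.lower().rstrip(".").split(".") if label]
    return b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"


def dnskey_rdata(flags: int, protocol: int, algorithm: int, public_key_b64: str) -> bytes:
    """Build DNSKEY RDATA: flags (2 bytes), protocol (1), algorithm (1), public key."""
    return struct.pack("!HBB", flags, protocol, algorithm) + base64.b64decode(public_key_b64)


def key_tag(rdata: bytes) -> int:
    """Key tag checksum over the DNSKEY RDATA (RFC 4034, Appendix B)."""
    acc = 0
    for i, byte in enumerate(rdata):
        acc += byte if i & 1 else byte << 8
    acc += (acc >> 16) & 0xFFFF
    return acc & 0xFFFF


def ds_digest_sha256(owner: str, rdata: bytes) -> str:
    """DS digest type 2: SHA-256 over the owner name (wire format) plus DNSKEY RDATA."""
    return hashlib.sha256(name_to_wire(owner) + rdata).hexdigest().upper()
```

If the key tag, algorithm, and digest computed from the KSK do not match any DS record published by the parent, validating resolvers cannot build the chain and will treat the zone as bogus.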

⚠️ Expired DNSSEC signatures are among the most common causes of DNSSEC-related outages. Monitor RRSIG expiry dates and automate re-signing and key rollovers.
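Monitoring expiry mostly comes down to date arithmetic. The sketch below assumes the expiration field has been copied from an RRSIG record in its usual YYYYMMDDHHmmSS presentation form (UTC); `rrsig_days_left` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def rrsig_days_left(expiration: str, now: datetime) -> float:
    """Days until an RRSIG expires, given its YYYYMMDDHHmmSS expiration field (UTC).

    A negative result means the signature has already expired.
    """
    expires = datetime.strptime(expiration, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    return (expires - now).total_seconds() / 86400
```

In practice `now` would be `datetime.now(timezone.utc)`; passing it in keeps the function easy to test.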

Zone Delegation & Lame Delegation

When you delegate a subdomain (e.g., app.example.com) to a separate set of nameservers, the parent zone contains NS records pointing to those servers. A lame delegation occurs when:

  • The listed nameserver is unreachable or misconfigured.

  • The nameserver responds but is not authoritative for the zone (returns REFUSED).

  • The glue records in the parent zone have stale IP addresses.

Detect lame delegations by querying each listed nameserver directly, for example with dig +norec @ns1.example.com example.com SOA, and verifying that the aa (Authoritative Answer) flag is set in the response.
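The same check can be scripted without dig by sending a non-recursive SOA query and reading the header flags of the reply. This is a simplified sketch using only the standard library: it assumes IPv4 and UDP, does not verify the response ID, and the function names are illustrative.

```python
import socket
import struct

AA_FLAG = 0x0400     # Authoritative Answer bit in the DNS header flags
RCODE_MASK = 0x000F  # low four bits carry the response code
RCODE_NAMES = {0: "NOERROR", 2: "SERVFAIL", 3: "NXDOMAIN", 5: "REFUSED"}


def build_soa_query(zone: str, query_id: int = 0x1234) -> bytes:
    """Build a non-recursive SOA query (RD bit clear, like dig +norec)."""
    header = struct.pack("!HHHHHH", query_id, 0x0000, 1, 0, 0, 0)
    labels = [l for l in zone.rstrip(".").split(".") if l]
    qname = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"
    return header + qname + struct.pack("!HH", 6, 1)  # QTYPE=SOA, QCLASS=IN


def classify_response(response: bytes) -> str:
    """Return 'ok' for an authoritative answer, else a lame-delegation reason."""
    if len(response) < 12:
        return "lame: truncated or empty response"
    _id, flags, _qd, _an, _ns, _ar = struct.unpack("!HHHHHH", response[:12])
    rcode = flags & RCODE_MASK
    if rcode != 0:
        return f"lame: {RCODE_NAMES.get(rcode, rcode)}"
    if not flags & AA_FLAG:
        return "lame: aa flag not set"
    return "ok"


def check_nameserver(ns_ip: str, zone: str, timeout: float = 3.0) -> str:
    """Send the query over UDP to one nameserver and classify its reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_soa_query(zone), (ns_ip, 53))
            response, _ = sock.recvfrom(4096)
        except OSError:  # includes timeouts
            return "lame: unreachable"
    return classify_response(response)
```

Run `check_nameserver` once for every address behind every NS record, including the glue addresses published by the parent, since a single stale entry is enough to cause intermittent failures.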

Monitoring Strategies

Passive Monitoring

Enable query logging on your authoritative servers and analyze patterns: unexpected NXDOMAIN spikes, query volume anomalies, and SERVFAIL rates.
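Once response codes have been extracted from the query log, spotting anomalies is a matter of comparing rates against a baseline. The sketch below assumes the log has already been parsed into a sequence of response-code strings; the thresholds are hypothetical placeholders, not recommendations from any standard.

```python
from collections import Counter
from typing import Dict, Iterable, List

# Hypothetical alert thresholds; tune them against your own baseline.
THRESHOLDS = {"SERVFAIL": 0.01, "NXDOMAIN": 0.20}


def rcode_rates(rcodes: Iterable[str]) -> Dict[str, float]:
    """Return the share of each response code in a batch of logged responses."""
    counts = Counter(rcodes)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {rcode: count / total for rcode, count in counts.items()}


def rcodes_over_threshold(rates: Dict[str, float]) -> List[str]:
    """List the response codes whose rate exceeds its configured threshold."""
    return sorted(r for r, limit in THRESHOLDS.items() if rates.get(r, 0.0) > limit)
```

Comparing each batch against the same window from the previous day or week tends to be more useful than a fixed threshold alone.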

Active Monitoring

Schedule synthetic queries from multiple global vantage points every 60 seconds. Alert on response time > 200 ms, SERVFAIL responses, or RRSIG expiration within 7 days.
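The alert rules above can be expressed as a small pure function that runs on each probe result. `ProbeResult` and `alerts_for` are hypothetical names; how the probe collects its measurements is left out of this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProbeResult:
    rcode: str                        # e.g. "NOERROR" or "SERVFAIL"
    response_ms: float                # round-trip time of the synthetic query
    rrsig_days_left: Optional[float]  # None for unsigned zones


def alerts_for(probe: ProbeResult) -> List[str]:
    """Apply the three alert rules to one synthetic query result."""
    alerts = []
    if probe.rcode == "SERVFAIL":
        alerts.append("SERVFAIL response")
    if probe.response_ms > 200:
        alerts.append("response time above 200 ms")
    if probe.rrsig_days_left is not None and probe.rrsig_days_left <= 7:
        alerts.append("RRSIG expires within 7 days")
    return alerts
```

Requiring the same alert from more than one vantage point before paging someone helps filter out local network noise.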

Best Practices

  • Use at least two nameservers on different networks, ideally hosted with different providers.

  • Automate SOA serial increments in your CI/CD pipeline.

  • Audit DNS records quarterly—remove orphaned records from decommissioned services.

  • Set the SOA MINIMUM field, which RFC 2308 defines as the negative caching TTL, to 300–900 seconds.

  • Store zone files in version control for auditability.
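Automating the serial increment can be as small as one function called from the pipeline before the zone is published. The sketch below assumes the widely used YYYYMMDDnn serial convention, which is a convention and not a protocol requirement; `next_serial` is a hypothetical name.

```python
from datetime import date


def next_serial(current: int, today: date) -> int:
    """Next SOA serial in YYYYMMDDnn form, always greater than the current one."""
    base = int(today.strftime("%Y%m%d")) * 100  # first serial of the day
    return max(base, current + 1)
```

Taking the maximum guarantees the serial still increases when a zone changes several times on the same day, at the cost of drifting away from the date form after more than 100 changes in one day.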

Common Mistakes

| Mistake | Impact | Fix |
|---|---|---|
| Single nameserver | Complete DNS failure on one server outage | Add a secondary NS on a different network |
| TTL of 86400 s on A records for load-balanced hosts | 24-hour delay before failover is visible | Lower TTL to 300 s or use DNS-based health checks |
| Forgetting to update DS record after KSK rollover | DNSSEC validation failure → domain unreachable | Automate DS updates; use CDS/CDNSKEY (RFC 7344) |
| Dangling CNAME to deprovisioned cloud resource | Subdomain takeover risk | Audit CNAMEs monthly; remove stale records |
| Lame delegation left after nameserver migration | Intermittent resolution failures | Query every NS directly and verify authoritative flag |

Tools

DNS Checker — Global propagation check across 50+ resolvers.

DNS Lookup — Query any record type against any resolver.

CNAME Lookup — Resolve CNAME chains and detect dangling records.

References

  • 📄 RFC 1035 — Domain Names: Implementation and Specification

  • 📄 RFC 4033 — DNS Security Introduction and Requirements (DNSSEC)

  • 📄 RFC 2308 — Negative Caching of DNS Queries

  • 📄 RFC 7344 — Automating DNSSEC Delegation Trust Maintenance

🎯 Key Takeaway: A DNS health audit is not a one-time task. Schedule quarterly reviews covering NS redundancy, TTL tuning, DNSSEC signature freshness, and dangling record cleanup. Automate what you can and monitor the rest.


