Eunji

Why Lowering ndots Breaks Alpine Pods (But Not Debian) — A Deep Dive into glibc vs musl Resolvers

If you're running Alpine-based pods in Kubernetes and someone tells you to lower ndots for better DNS performance — don't. Or at least, read this first.

We had 5 DNS queries firing for every external domain lookup. The fix seemed obvious: drop ndots:5 to ndots:2. An AI reviewer warned me it might break internal service resolution. The reasoning didn't hold up when I read the resolver code, so I went ahead — and broke things in a way I didn't expect.

The AI was right about the symptom but wrong about the cause. The breakage is real, but it lives in libc, not in the search algorithm.

TL;DR

  • Lowering ndots reduces DNS query amplification, but breaks internal service resolution on Alpine pods.
  • The cause isn't CoreDNS or Kubernetes — it's that musl libc skips the search list when dots ≥ ndots, while glibc falls back gracefully.
  • If you're on Alpine: switch base images, use FQDNs with a trailing dot, or roll out per-workload via dnsConfig.

The starting point: 5 queries for one domain

Every external DNS lookup in our cluster was producing 4-5 queries. This is well-known behavior — it's caused by the default ndots:5 combined with Kubernetes' three-entry search list.

The path of least resistance is to lower ndots. Most external domains have dots ≥ 2 (www.google.com, api.example.com), so ndots:2 skips the search-list traversal for them entirely.

spec:
  dnsConfig:
    options:
      - name: ndots
        value: "2"
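To see where the amplification comes from, here's a minimal sketch of the candidate-FQDN order a glibc-style resolver builds from the default Kubernetes search list. The helper is my own illustration, not resolver source:

```python
# Sketch of the candidate order a glibc-style resolver tries, given the
# default Kubernetes search list and an ndots threshold.
SEARCH = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]

def candidates(name, ndots, search=SEARCH):
    if name.endswith("."):                 # trailing dot: absolute, as-is only
        return [name]
    dots = name.count(".")
    as_is = name + "."
    expanded = [f"{name}.{d}." for d in search]
    if dots >= ndots:                      # as-is first, search as fallback
        return [as_is] + expanded
    return expanded + [as_is]              # search first, as-is last

# ndots:5: api.example.com (2 dots < 5) walks the whole search list first.
print(candidates("api.example.com", 5))
# ndots:2: dots >= 2, so the as-is query goes out first and usually wins.
print(candidates("api.example.com", 2)[0])
```

Under ndots:5, three useless cluster-suffixed names go out (each for both A and AAAA) before the real one; under ndots:2, the real name is tried first.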

Before shipping the change, I asked an AI assistant to review it. It warned that internal service resolution might break, with this reasoning:

With ndots:5 and a query for "my-svc.default" (1 dot):
  dots 1 < ndots 5 → search first
  my-svc.default.<ns>.svc.cluster.local  → NXDOMAIN
  my-svc.default.svc.cluster.local       → NOERROR  ✓

With ndots:1:
  dots 1 ≥ ndots 1 → original first
  my-svc.default                         → NXDOMAIN (doesn't exist externally)
  ... then what?

The "then what" was the question. I read the resolver source and concluded the AI was wrong: after the original query fails, the resolver should fall back to search-list traversal. The lookup should still succeed.

I was reading the wrong source.


The three config files that matter

Before going deeper, it helps to know that the entire surface area of "Kubernetes DNS" lives in three files. Knowing which one controls what saves a lot of pain.

| # | File | What it controls | Where it lives | Reload |
|---|------|------------------|----------------|--------|
| 1 | Corefile | Zones, plugin chain, fallthrough — all CoreDNS behavior | coredns ConfigMap → /etc/coredns/Corefile | Runtime |
| 2 | CoreDNS's resolv.conf | Upstream DNS that CoreDNS forwards to | CoreDNS Pod's /etc/resolv.conf | Pod creation only |
| 3 | App Pod's resolv.conf | ndots, search list — the part this post is about | App Pod's /etc/resolv.conf | Pod creation only |

A Pod's DNS settings come from spec.dnsPolicy (default: ClusterFirst), which generates a resolv.conf like this inside the Pod:

nameserver 10.96.0.10
search <namespace>.svc.cluster.local svc.cluster.local cluster.local
options ndots:5

That's the file libc reads. And libc is the part that decides whether to walk the search list or skip it.


How a query travels (and where the retry loop lives)

A single external lookup with `ndots:5` produces multiple queries. Each NXDOMAIN triggers a retry with the next entry from the search list. CoreDNS doesn't decide this — the libc resolver inside your Pod does.

The key thing to notice in the flow: when a Pod's resolver gets NXDOMAIN, it retries with the next FQDN from the search list. That retry loop is where query amplification comes from. Lowering ndots is appealing because it skips this loop for high-dot names.

CoreDNS itself doesn't care about ndots. It just answers whatever FQDN arrives. The retry decision happens entirely on the client side, inside libc.
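That client-side loop can be sketched like this (the dict stands in for CoreDNS; names and the helper are hypothetical):

```python
# Sketch of the client-side retry loop: libc walks its candidate list and
# stops at the first NOERROR. Anything missing from the dict is an NXDOMAIN.
dns_table = {"my-svc.default.svc.cluster.local.": "10.96.12.34"}

def resolve(candidate_fqdns, table):
    tried = []
    for fqdn in candidate_fqdns:
        tried.append(fqdn)
        if fqdn in table:                  # NOERROR: stop retrying
            return table[fqdn], tried
    return None, tried                     # every candidate was NXDOMAIN

# ndots:5-style order for "my-svc.default" (1 dot < 5): search list first.
order = [
    "my-svc.default.default.svc.cluster.local.",
    "my-svc.default.svc.cluster.local.",
    "my-svc.default.cluster.local.",
    "my-svc.default.",
]
addr, tried = resolve(order, dns_table)
print(addr, len(tried))                    # succeeds on the second attempt
```

Every entry in `tried` before the hit is one round of amplification, and each round is doubled in practice by parallel A and AAAA queries.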


glibc vs musl: same input, different behavior

Same input, same `ndots:2`, same Kubernetes config. The only difference is the libc inside the container. glibc treats the search list as a fallback; musl treats it as mutually exclusive with `dots ≥ ndots`.

Here's the part the AI got right (in spirit) and I missed: the resolver isn't part of CoreDNS, Kubernetes, or even your app. It's the libc shipped in your container image. Different libcs implement search/ndots differently.

glibc — falls back gracefully

Distros: Debian, Ubuntu, CentOS, RHEL, Amazon Linux.

When dots ≥ ndots, glibc tries the original first. If that returns NXDOMAIN, it walks the search list anyway as a fallback. One or two extra queries, but resolution succeeds.

my-svc.default                         → NXDOMAIN
↓ search fallback
my-svc.default.<ns>.svc.cluster.local  → NXDOMAIN
my-svc.default.svc.cluster.local       → NOERROR  ✓

The fallback logic lives in __res_context_search(). The relevant part:

// dots ≥ ndots OR trailing dot → try as-is first
if (dots >= statp->ndots || trailing_dot) {
    ret = __res_context_querydomain (ctx, name, NULL, class, type, ...);
    if (ret > 0 || trailing_dot ...)
        return (ret);
    saved_herrno = h_errno;
    tried_as_is++;
    // ... falls through to search loop below
}

// Run search list when at least one entry might apply
if ((!dots && (statp->options & RES_DEFNAMES) != 0) ||
    (dots && !trailing_dot && (statp->options & RES_DNSRCH) != 0)) {
    for (size_t domain_index = 0; !done; ++domain_index) {
        const char *dname = __resolv_context_search_list (ctx, domain_index);
        if (dname == NULL) break;
        ret = __res_context_querydomain (ctx, name, dname, class, type, ...);
        // ...
    }
}

The critical detail: the as-is attempt and the search loop are sequential, not exclusive. Failure of the first does not prevent the second.

musl — stops on the first miss

Distros: Alpine.

musl is intentionally minimal. When dots ≥ ndots, it sets *search = 0 and never enters the search loop.

my-svc.default            → NXDOMAIN                           ✗
End. No search attempted.

From name_from_dns_search():

// Count dots. If dots ≥ ndots OR trailing dot → zero out the search list.
for (dots=l=0; name[l]; l++) if (name[l]=='.') dots++;
if (dots >= conf.ndots || name[l-1]=='.') *search = 0;

// ...

// Walk the search list, splitting on whitespace.
// When *search was zeroed above, *p == 0 and the loop condition fails
// immediately — the search list is attempted ZERO times.
for (p=search; *p; p=z) {
    for (; isspace(*p); p++);
    for (z=p; *z && !isspace(*z); z++);
    if (z==p) break;
    // ... query combined FQDN
}

// Final fallback: query the original as-is, exactly once.
return name_from_dns(buf, canon, name, family, &conf);

Setting *search = 0 isn't a bug. It's deliberate. The next question is why.


Why doesn't musl just add the fallback?

This has come up on the musl mailing list more than once. Maintainer Rich Felker rejects it consistently. The clearest example is from Andrey Arapov's 2019 thread:

  • The ask: A small DNS misconfiguration causes musl's resolver to stop on a single SERVFAIL. It doesn't even try the FQDN. Is this intentional?
  • Felker's answer: "If a lookup ends in SERVFAIL, the result is indeterminate. That should be reported to the caller as an error, not silently fallen back from. Otherwise the lookup result depends on transient nameserver failures."

The principle is determinism. The moment fallback is allowed, the same query can return different answers between runs, and an attacker can induce transient failures to manipulate which answer wins. From day one, musl's stance has been: "we don't reproduce the dangerous behavior of other implementations."

If you're on musl and want to avoid this entirely: set ndots:1 and don't depend on short names.

This is a values disagreement, not a bug. Both libcs are doing what they intended. The mismatch only becomes a Kubernetes problem because Kubernetes hands every Pod a search list and assumes the resolver will use it.


Reproducing it with kind

Four pods, two libcs, two ndots values.

| Pod | libc | ndots |
|-----|------|-------|
| alpine-ndots5 | musl | 5 (default) |
| alpine-ndots2 | musl | 2 |
| debian-ndots5 | glibc | 5 (default) |
| debian-ndots2 | glibc | 2 |

Setup

# kind-config.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
name: dns-poc
nodes:
  - role: control-plane
kind create cluster --config kind-config.yaml

Patch CoreDNS to log every query:

# coredns-log-patch.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        log
        errors
        health { lameduck 5s }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
           pods insecure
           fallthrough in-addr.arpa ip6.arpa
           ttl 30
        }
        prometheus :9153
        forward . /etc/resolv.conf { max_concurrent 1000 }
        cache 30
        loop
        reload
        loadbalance
    }
kubectl apply -f coredns-log-patch.yaml
kubectl rollout restart deployment/coredns -n kube-system

The four test pods (full manifest in the original Korean post). Key bits:

# alpine-ndots2: musl + ndots:2
spec:
  dnsConfig:
    options:
      - name: ndots
        value: "2"
  containers:
    - name: shell
      image: alpine:3.20
      # ...

Test: resolve kubernetes.default.svc (2 dots)

This name has 2 dots. Under ndots:5, dots < ndots → search first. Under ndots:2, dots ≥ ndots → original first. The libc difference only surfaces in the ndots:2 case.
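The branch each pod takes follows from a simple dot count, sketched here for the four test pods:

```python
# Which branch each test pod takes for this name: count dots, compare ndots.
name = "kubernetes.default.svc"
dots = name.count(".")                     # 2

branches = {}
for pod, ndots in [("alpine-ndots5", 5), ("alpine-ndots2", 2),
                   ("debian-ndots5", 5), ("debian-ndots2", 2)]:
    branches[pod] = "as-is first" if dots >= ndots else "search first"

print(branches)
```

Both ndots:5 pods go search-first and behave identically; the two ndots:2 pods go as-is first, and what happens after that NXDOMAIN is exactly where glibc and musl part ways.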

NAME=kubernetes.default.svc
for p in alpine-ndots5 alpine-ndots2 debian-ndots5 debian-ndots2; do
  printf "===== [%s] =====\n" "$p"
  kubectl exec "$p" -- sh -c "getent hosts $NAME; echo exit=\$?"
done

Result

===== [alpine-ndots5] =====
10.96.0.1       kubernetes.default.svc.cluster.local
exit=0

===== [alpine-ndots2] =====
exit=2                          # ← musl, no search fallback → fails

===== [debian-ndots5] =====
10.96.0.1       kubernetes.default.svc.cluster.local
exit=0

===== [debian-ndots2] =====
10.96.0.1       kubernetes.default.svc.cluster.local
exit=0                          # ← glibc, NXDOMAIN then search fallback → succeeds

Same query. Same cluster. Same ndots:2. The only thing that changed is the libc.

CoreDNS logs confirm it

kubectl logs -n kube-system -l k8s-app=kube-dns -f --tail=20 --prefix

alpine-ndots2 — only the original name arrives at CoreDNS. No search-expanded queries:

... kubernetes.default.svc.   AAAA  NXDOMAIN
... kubernetes.default.svc.   A     NXDOMAIN

debian-ndots2 — original first, then the entire search list, then success:

... kubernetes.default.svc.                                A  NXDOMAIN
... kubernetes.default.svc.default.svc.cluster.local.      A  NXDOMAIN
... kubernetes.default.svc.svc.cluster.local.              A  NXDOMAIN
... kubernetes.default.svc.cluster.local.                  A  NOERROR  10.96.0.1

This is exactly what the source code predicted. musl exits the search loop on the first iteration; glibc walks every entry.

# cleanup
kind delete cluster --name dns-poc

Resolver behavior, side by side

| Condition | glibc | musl |
|-----------|-------|------|
| dots < ndots | search first → on failure, original | search first → on failure, original |
| dots ≥ ndots | original first → on failure, search fallback | original only → on failure, stop |

Under the default ndots:5, most names have fewer than 5 dots, so both libcs try search first and the difference doesn't surface. The moment you lower ndots, more names cross into dots ≥ ndots territory — and that's where musl's missing fallback turns into a real outage.
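The two rows of that table can be sketched as candidate-list builders. This is my own simplification of each libc's logic, not the actual source:

```python
# Simplified candidate lists per libc. glibc keeps the search list as a
# fallback after a failed as-is attempt; musl zeroes it when dots >= ndots.
SEARCH = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]

def glibc_candidates(name, ndots, search=SEARCH):
    if name.endswith("."):
        return [name]
    as_is, expanded = [name + "."], [f"{name}.{d}." for d in search]
    # The as-is attempt and the search loop are sequential, never exclusive.
    return as_is + expanded if name.count(".") >= ndots else expanded + as_is

def musl_candidates(name, ndots, search=SEARCH):
    if name.endswith("."):
        return [name]
    if name.count(".") >= ndots:
        search = []                        # *search = 0: loop never entered
    return [f"{name}.{d}." for d in search] + [name + "."]

# kubernetes.default.svc (2 dots) under ndots:2:
print(musl_candidates("kubernetes.default.svc", 2))    # as-is only, then stop
print(glibc_candidates("kubernetes.default.svc", 2))   # as-is, then full search
```

The name that actually resolves, kubernetes.default.svc.cluster.local., appears only in glibc's list, matching the exit=2 vs exit=0 results above.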


What to do about it

If you want to lower ndots and you have any musl-based workloads:

  1. Reconsider your base image. Move from alpine to debian-slim or distroless. The biggest hammer, but it solves the whole class of problems, not just this one.
  2. Use FQDNs at the application level. my-svc.default.svc.cluster.local. (with the trailing dot) skips the search list regardless of libc.
  3. Roll out per-workload. Apply dnsConfig to specific deployments first, not the whole cluster.
  4. Run NodeLocal DNSCache in parallel. Independent of ndots, a cache layer dramatically cuts CoreDNS load and softens the cost of the search loop on glibc workloads.


The bigger lesson

The thing I keep coming back to: the abstraction you're tuning (Kubernetes' ndots) and the layer where the behavior actually lives (libc resolver) can be miles apart. The Kubernetes docs talk about ndots. The Pod spec exposes ndots. CoreDNS configures things adjacent to ndots. And none of them are the layer that decides what happens when dots ≥ ndots.

The AI reviewer wasn't wrong to flag the risk. It just couldn't see one layer down. Neither could I, until the test pods told me.

When something in a layered system behaves unexpectedly, "why" usually doesn't have a clean answer at the layer you're operating in. Tracing the call all the way down to the C source is, surprisingly often, faster than reading another blog post.


Originally published in Korean on my blog. Part 2 will cover NodeLocal DNSCache as an alternative path — getting most of the latency win without touching ndots.

