The etcd Endpoint Trap
A cluster migration just took your whole control plane offline. In the next few minutes you'll find out why, and fix it the way the CKA exam expects.
This is a CKA Troubleshooting walkthrough. Every command and its output below comes from a live cluster, and you can reproduce the whole thing yourself (scripts at the end).
The scenario
A single-node kubeadm cluster was migrated to a new machine. The control plane won't come up. Your task: identify the broken component, find the root cause, fix the config, restart, and verify.
How the control plane actually starts
The kubelet runs the control plane as static pods: it reads each manifest from /etc/kubernetes/manifests/ and keeps the pod running. The kube-apiserver is one of those pods, and it cannot start unless it can reach etcd. Give it a wrong --etcd-servers endpoint and it crashes on startup, which takes the whole API down and, with it, everything you'd normally use to debug.
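To confirm where the kubelet looks for static pods on a given node, you can read its config file. A minimal sketch, assuming the kubeadm default config location /var/lib/kubelet/config.yaml and an unquoted staticPodPath value; the helper name static_pod_path is made up for this example:

```bash
#!/usr/bin/env bash
# Print the directory the kubelet watches for static pod manifests.
# Assumes the kubeadm default config path unless another file is given.
static_pod_path() {
  local config="${1:-/var/lib/kubelet/config.yaml}"
  sed -n 's/^staticPodPath:[[:space:]]*//p' "$config"
}

# Usage on a control-plane node:
#   ls "$(static_pod_path)"
```

On a kubeadm control-plane node the listing should include etcd.yaml and kube-apiserver.yaml, the two manifests this scenario is about.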
Step 1 — Reproduce the symptom
First, reproduce the symptom: kubectl get nodes is refused.
$ kubectl get nodes
The connection to the server cka-scenario1-control-plane:6443 was refused - did you specify the right host or port?
A refused connection on the API port is a control-plane problem, not a workload problem.
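If you want to rule out a kubeconfig problem, you can probe the API port directly from the node. This is only a sketch: port_state is a hypothetical helper, it relies on bash's /dev/tcp feature and the coreutils timeout command, and it reports "closed" for both a refusal and a timeout.

```bash
#!/usr/bin/env bash
# Report whether a TCP port accepts connections.
# Prints "open" or "closed"; "closed" covers refused and timed-out attempts.
port_state() {
  local host="$1" port="$2"
  if timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo open
  else
    echo closed
  fi
}

# Usage on the control-plane node:
#   port_state 127.0.0.1 6443
```

A "closed" result on 6443 tells you the same thing the kubectl error did: nothing is serving the API, so the problem sits with the control plane itself.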
Step 2 — Investigate from the node
kubectl can't help now, so drop to the node. The kubelet itself is active, but follow its log and you'll see it stuck in a loop, restarting the apiserver over and over.
$ systemctl is-active kubelet
active
$ journalctl -u kubelet -f
...
Jun 20 02:14:28 cka-scenario1-control-plane kubelet[6720]: I0620 02:14:28.159315 6720 kubelet.go:3482] "Trying to delete pod" pod="kube-system/kube-apiserver-cka-scenario1-control-plane" podUID="95dcb4d5-1292-48bc-b96b-af2bce2ffe2e"
Jun 20 02:14:37 cka-scenario1-control-plane kubelet[6720]: E0620 02:14:37.431018 6720 status_manager.go:1164] "Failed to get status for pod" err="Get \"https://172.18.0.6:6443/api/v1/namespaces/kube-system/pods/kube-apiserver-cka-scenario1-control-plane\": net/http: TLS handshake timeout - error from a previous attempt: read tcp 172.18.0.6:37382->172.18.0.6:6443: read: connection reset by peer" podUID="ed99dd090df2692b82e8b55ea870a211" pod="kube-system/kube-apiserver-cka-scenario1-control-plane"
Jun 20 02:14:38 cka-scenario1-control-plane kubelet[6720]: E0620 02:14:38.160470 6720 mirror_client.go:139] "Failed deleting a mirror pod" err="Delete \"https://172.18.0.6:6443/api/v1/namespaces/kube-system/pods/kube-apiserver-cka-scenario1-control-plane\": net/http: TLS handshake timeout" pod="kube-system/kube-apiserver-cka-scenario1-control-plane"
Jun 20 02:14:38 cka-scenario1-control-plane kubelet[6720]: I0620 02:14:38.174907 6720 kubelet.go:3482] "Trying to delete pod" pod="kube-system/kube-apiserver-cka-scenario1-control-plane" podUID="95dcb4d5-1292-48bc-b96b-af2bce2ffe2e"
Jun 20 02:14:47 cka-scenario1-control-plane kubelet[6720]: E0620 02:14:47.382325 6720 mirror_client.go:139] "Failed deleting a mirror pod" err="Delete \"https://172.18.0.6:6443/api/v1/namespaces/kube-system/pods/kube-apiserver-cka-scenario1-control-plane\": read tcp 172.18.0.6:60614->172.18.0.6:6443: read: connection reset by peer" pod="kube-system/kube-apiserver-cka-scenario1-control-plane"
Jun 20 02:14:48 cka-scenario1-control-plane kubelet[6720]: I0620 02:14:48.193112 6720 kubelet.go:3482] "Trying to delete pod" pod="kube-system/kube-apiserver-cka-scenario1-control-plane" podUID="95dcb4d5-1292-48bc-b96b-af2bce2ffe2e"
Jun 20 02:14:48 cka-scenario1-control-plane kubelet[6720]: E0620 02:14:48.193362 6720 mirror_client.go:139] "Failed deleting a mirror pod" err="Delete \"https://172.18.0.6:6443/api/v1/namespaces/kube-system/pods/kube-apiserver-cka-scenario1-control-plane\": dial tcp 172.18.0.6:6443: connect: connection refused" pod="kube-system/kube-apiserver-cka-scenario1-control-plane"
Jun 20 02:14:48 cka-scenario1-control-plane kubelet[6720]: E0620 02:14:48.193529 6720 pod_workers.go:1324] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"kube-apiserver\" with CrashLoopBackOff: \"back-off 10s restarting failed container=kube-apiserver pod=kube-apiserver-cka-scenario1-control-plane_kube-system(ed99dd090df2692b82e8b55ea870a211)\"" pod="kube-system/kube-apiserver-cka-scenario1-control-plane" podUID="ed99dd090df2692b82e8b55ea870a211"
Jun 20 02:14:48 cka-scenario1-control-plane kubelet[6720]: E0620 02:14:48.383587 6720 status_manager.go:1164] "Failed to get status for pod" err="Get \"https://172.18.0.6:6443/api/v1/namespaces/kube-system/pods/kube-apiserver-cka-scenario1-control-plane\": dial tcp 172.18.0.6:6443: connect: connection refused" podUID="ed99dd090df2692b82e8b55ea870a211" pod="kube-system/kube-apiserver-cka-scenario1-control-plane"
Jun 20 02:14:48 cka-scenario1-control-plane kubelet[6720]: E0620 02:14:48.384437 6720 status_manager.go:1164] "Failed to get status for pod" err="Get \"https://172.18.0.6:6443/api/v1/namespaces/kube-system/pods/kube-apiserver-cka-scenario1-control-plane\": dial tcp 172.18.0.6:6443: connect: connection refused" podUID="ed99dd090df2692b82e8b55ea870a211" pod="kube-system/kube-apiserver-cka-scenario1-control-plane"
Jun 20 02:14:49 cka-scenario1-control-plane kubelet[6720]: E0620 02:14:49.038880 6720 status_manager.go:1164] "Failed to get status for pod" err="Get \"https://172.18.0.6:6443/api/v1/namespaces/kube-system/pods/kube-apiserver-cka-scenario1-control-plane\": dial tcp 172.18.0.6:6443: connect: connection refused" podUID="ed99dd090df2692b82e8b55ea870a211" pod="kube-system/kube-apiserver-cka-scenario1-control-plane"
The kubelet is healthy; the static pod it manages keeps dying. So drop down a level.
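The kubelet log is noisy, and the one line that matters is the CrashLoopBackOff entry. A small filter can pull the container name out of it. This is a sketch that assumes the log format shown above; crashlooping_containers is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Read kubelet log lines on stdin and print each container name that the
# kubelet reports as being in CrashLoopBackOff, once per name.
crashlooping_containers() {
  sed -n 's/.*for [\\"]*\([A-Za-z0-9-]*\)[\\"]* with CrashLoopBackOff.*/\1/p' \
    | sort -u
}

# Usage on the control-plane node:
#   journalctl -u kubelet --no-pager --since "10 minutes ago" | crashlooping_containers
```

Once you know which container is crash-looping, its own logs usually name the cause. If crictl is available on the node, crictl ps -a --name kube-apiserver lists the container IDs and crictl logs followed by an ID prints that container's output.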
Step 3 — Find the root cause
kubectl still can't reach the API, so read the static pod manifests directly and compare the endpoint the apiserver dials with the one etcd actually listens on.
$ cat /etc/kubernetes/manifests/kube-apiserver.yaml | grep etcd-servers
- --etcd-servers=https://127.0.0.1:2399
$ cat /etc/kubernetes/manifests/etcd.yaml | grep client-urls
kubeadm.kubernetes.io/etcd.advertise-client-urls: https://172.18.0.6:2379
- --advertise-client-urls=https://172.18.0.6:2379
- --listen-client-urls=https://127.0.0.1:2379,https://172.18.0.6:2379
There it is: the apiserver is told to dial https://127.0.0.1:2399, but etcd listens on port 2379. That mismatch, left over from the migration, is the bug. Compare the manifests; don't guess.
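The same comparison can be scripted. The sketch below assumes the kubeadm manifest layout shown above, where each flag is a "- --flag=value" list item, and it compares the URLs as plain strings, so an etcd that listens on 0.0.0.0 would be reported as a mismatch even if it is reachable. The function names are made up for this example.

```bash
#!/usr/bin/env bash
# Print the value of a "- --flag=value" argument from a static pod manifest.
manifest_flag() {
  local flag="$1" file="$2"
  sed -n "s/^[[:space:]]*- --${flag}=//p" "$file" | head -n 1
}

# Succeed only if every endpoint in --etcd-servers also appears in
# etcd's --listen-client-urls (exact string comparison).
etcd_endpoints_match() {
  local apiserver_manifest="$1" etcd_manifest="$2"
  local servers listen endpoint
  servers=$(manifest_flag etcd-servers "$apiserver_manifest")
  listen=$(manifest_flag listen-client-urls "$etcd_manifest")
  if [ -z "$servers" ]; then
    echo "no --etcd-servers flag found"
    return 1
  fi
  local IFS=','   # split the comma-separated endpoint list
  for endpoint in $servers; do
    case ",$listen," in
      *",$endpoint,"*) echo "OK: $endpoint" ;;
      *) echo "MISMATCH: apiserver dials $endpoint, etcd listens on $listen"
         return 1 ;;
    esac
  done
}

# Usage on the control-plane node:
#   etcd_endpoints_match /etc/kubernetes/manifests/kube-apiserver.yaml \
#                        /etc/kubernetes/manifests/etcd.yaml
```

Against the manifests in this scenario it reports a mismatch between 2399 and the two 2379 listen URLs.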
Step 4 — Fix it
Open the manifest in vi (the way you would on the exam) and change the --etcd-servers line back to the endpoint where etcd is actually listening:
-    - --etcd-servers=https://127.0.0.1:2399
+    - --etcd-servers=https://127.0.0.1:2379
Save with :wq. The kubelet reloads the static pod automatically; restart the kubelet to be sure:
$ systemctl restart kubelet
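If you'd rather script the edit than use vi, a sed one-liner does the same job. This is a sketch under a few assumptions: GNU sed (for -i), the kubeadm manifest layout shown earlier, and a hypothetical helper name fix_etcd_servers. It keeps the backup outside the manifests directory (default /tmp, also an assumption), since the kubelet can pick up stray files in that directory as manifests.

```bash
#!/usr/bin/env bash
# Rewrite the --etcd-servers flag in an apiserver manifest.
#   $1 = manifest path, $2 = correct endpoint, $3 = backup directory (default /tmp)
fix_etcd_servers() {
  local manifest="$1" endpoint="$2" backup_dir="${3:-/tmp}"
  # Back up first, outside the directory the kubelet watches.
  cp "$manifest" "$backup_dir/$(basename "$manifest").bak" || return 1
  # Keep the indentation and flag name, replace only the value.
  sed -i "s|^\([[:space:]]*- --etcd-servers=\).*|\1${endpoint}|" "$manifest"
}

# Usage on the control-plane node (as root):
#   fix_etcd_servers /etc/kubernetes/manifests/kube-apiserver.yaml https://127.0.0.1:2379
```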
Step 5 — Verify
Prove the cluster is healthy: check the apiserver pod and the node.
$ kubectl get pods -n kube-system -l component=kube-apiserver
NAME                                         READY   STATUS    RESTARTS   AGE
kube-apiserver-cka-scenario1-control-plane   1/1     Running   0          40m
$ kubectl get nodes
NAME                          STATUS   ROLES           AGE   VERSION
cka-scenario1-control-plane   Ready    control-plane   40m   v1.36.1
Apiserver Running with no restarts, node Ready: recovered.
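The apiserver can take a little while to come back after the fix, so the first kubectl call may still be refused. A small polling helper saves you from retrying by hand. A minimal sketch; wait_until is a made-up name and the one-second interval is arbitrary:

```bash
#!/usr/bin/env bash
# Retry a command once per second until it succeeds or the timeout expires.
#   $1 = timeout in seconds, remaining arguments = command to run
wait_until() {
  local timeout="$1"; shift
  local deadline=$(( SECONDS + timeout ))
  until "$@" >/dev/null 2>&1; do
    if [ "$SECONDS" -ge "$deadline" ]; then
      return 1
    fi
    sleep 1
  done
}

# Usage: wait up to two minutes for the API to answer, then verify.
#   wait_until 120 kubectl get --raw=/readyz && kubectl get nodes
```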
Exam tips
- The culprit varies: it could be a wrong IP, a wrong port, or a wrong cert path.
- External etcd? Test it directly with curl against its health endpoint: curl https://:2379/health
- Always confirm the kubelet is active first.
- Manifests in /etc/kubernetes/manifests are the source of truth.
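A bare curl against etcd will usually fail the TLS handshake, because etcd expects a client certificate. Here is a sketch of the health check with certificates, assuming the kubeadm default paths under /etc/kubernetes/pki. For external etcd, the paths are whatever the apiserver's --etcd-cafile, --etcd-certfile and --etcd-keyfile flags point to. Both function names are hypothetical.

```bash
#!/usr/bin/env bash
# Succeed if the JSON on stdin reports a healthy etcd member.
health_ok() {
  grep -q '"health"[[:space:]]*:[[:space:]]*"true"'
}

# Query etcd's /health endpoint using the apiserver's etcd client certificate.
#   $1 = endpoint such as https://127.0.0.1:2379
#   $2 = PKI directory (assumed kubeadm default: /etc/kubernetes/pki)
etcd_health() {
  local endpoint="$1"
  local pki="${2:-/etc/kubernetes/pki}"
  curl --silent --max-time 5 \
    --cacert "$pki/etcd/ca.crt" \
    --cert "$pki/apiserver-etcd-client.crt" \
    --key "$pki/apiserver-etcd-client.key" \
    "$endpoint/health"
}

# Usage on the control-plane node (as root):
#   etcd_health https://127.0.0.1:2379 | health_ok && echo "etcd is healthy"
```

If the check passes on the port etcd listens on and fails on the port the apiserver is configured with, you have found the same mismatch from a different angle.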
Recap
- Control plane down usually means a static pod that won't start
- /etc/kubernetes/manifests is the source of truth
- Verify with kubectl get pods and kubectl get nodes
Reproduce this yourself
The entire scenario is scripted on a throwaway kind cluster, so you can break and fix it as many times as you like:
git clone https://github.com/The-Cyber-Sidekick/TCS_CKA_2026_Exam_Scenarios.git && cd TCS_CKA_2026_Exam_Scenarios/scenario1-etcd-endpoint-trap
./setup.sh # creates the cluster AND arms the fault (exam-style: hands you a broken cluster)
# troubleshoot by hand, or:
./solution.sh # apply the answer key and recover
If this helped, subscribe to The Cyber SideKick on YouTube (https://www.youtube.com/channel/UCxZcRycR7OjnFzQYXfkrTqw) for more CKA troubleshooting drills, and grab the newsletter at https://thecybersidekick.beehiiv.com.