The Cyber Sidekick

Posted on Jun 20

CKA Exam Study 2026, Scenario 1: The etcd Endpoint Trap

#kubernetes #containers #learning #cka

The etcd Endpoint Trap

A cluster migration just took your whole control plane offline. In the next few minutes you'll find out why, and fix it the way the CKA exam expects.

This is a CKA Troubleshooting walkthrough. Every command below is shown with its real output from a live cluster, and you can reproduce the whole thing yourself (scripts at the end).

The scenario

A single-node kubeadm cluster was migrated to a new machine. The control plane won't come up. Your task: identify the broken component, find the root cause, fix the config, restart, and verify.

Single-node kubeadm cluster, freshly migrated
Control plane will not start
Find the broken component
Root-cause it, fix it, verify

How the control plane actually starts

The kubelet runs the control plane as static pods: it reads each manifest from /etc/kubernetes/manifests/ and keeps the pod running. The kube-apiserver is one of those pods, and it cannot start unless it can reach etcd. A wrong --etcd-servers endpoint therefore makes the apiserver crash on startup, which takes the whole API down and, with it, everything you'd normally use to debug.
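
The manifest directory comes from the staticPodPath field of the kubelet's configuration file, so it is worth confirming on an unfamiliar node. A minimal sketch, assuming the kubeadm default config location /var/lib/kubelet/config.yaml (static_pod_path is a hypothetical helper name, not a kubelet tool):

```shell
# Print the directory the kubelet scans for static pod manifests.
# Assumption: the kubelet config lives at the kubeadm default location
# unless another file is passed as the first argument.
static_pod_path() {
  awk '$1 == "staticPodPath:" { print $2 }' "${1:-/var/lib/kubelet/config.yaml}"
}

# Usage on the node:
#   static_pod_path
#   ls "$(static_pod_path)"
```

If the printed path is not /etc/kubernetes/manifests, the manifests to inspect in the later steps live wherever it points.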

Step 1 — Reproduce the symptom

First, reproduce the symptom. kubectl get nodes is refused, which means nothing is answering on the API port: the API server is down.

$ kubectl get nodes
The connection to the server cka-scenario1-control-plane:6443 was refused - did you specify the right host or port?

A refused connection on the API port is a control-plane problem, not a workload problem.

Step 2 — Investigate from the node

kubectl can't help us now, so drop to the node. The kubelet itself is active, but follow its log and you'll see it stuck in a loop, restarting the apiserver over and over (look for the CrashLoopBackOff entry).

$ systemctl is-active kubelet
active

$ journalctl -u kubelet -f
...
Jun 20 02:14:28 cka-scenario1-control-plane kubelet[6720]: I0620 02:14:28.159315    6720 kubelet.go:3482] "Trying to delete pod" pod="kube-system/kube-apiserver-cka-scenario1-control-plane" podUID="95dcb4d5-1292-48bc-b96b-af2bce2ffe2e"
Jun 20 02:14:37 cka-scenario1-control-plane kubelet[6720]: E0620 02:14:37.431018    6720 status_manager.go:1164] "Failed to get status for pod" err="Get \"https://172.18.0.6:6443/api/v1/namespaces/kube-system/pods/kube-apiserver-cka-scenario1-control-plane\": net/http: TLS handshake timeout - error from a previous attempt: read tcp 172.18.0.6:37382->172.18.0.6:6443: read: connection reset by peer" podUID="ed99dd090df2692b82e8b55ea870a211" pod="kube-system/kube-apiserver-cka-scenario1-control-plane"
Jun 20 02:14:38 cka-scenario1-control-plane kubelet[6720]: E0620 02:14:38.160470    6720 mirror_client.go:139] "Failed deleting a mirror pod" err="Delete \"https://172.18.0.6:6443/api/v1/namespaces/kube-system/pods/kube-apiserver-cka-scenario1-control-plane\": net/http: TLS handshake timeout" pod="kube-system/kube-apiserver-cka-scenario1-control-plane"
Jun 20 02:14:38 cka-scenario1-control-plane kubelet[6720]: I0620 02:14:38.174907    6720 kubelet.go:3482] "Trying to delete pod" pod="kube-system/kube-apiserver-cka-scenario1-control-plane" podUID="95dcb4d5-1292-48bc-b96b-af2bce2ffe2e"
Jun 20 02:14:47 cka-scenario1-control-plane kubelet[6720]: E0620 02:14:47.382325    6720 mirror_client.go:139] "Failed deleting a mirror pod" err="Delete \"https://172.18.0.6:6443/api/v1/namespaces/kube-system/pods/kube-apiserver-cka-scenario1-control-plane\": read tcp 172.18.0.6:60614->172.18.0.6:6443: read: connection reset by peer" pod="kube-system/kube-apiserver-cka-scenario1-control-plane"
Jun 20 02:14:48 cka-scenario1-control-plane kubelet[6720]: I0620 02:14:48.193112    6720 kubelet.go:3482] "Trying to delete pod" pod="kube-system/kube-apiserver-cka-scenario1-control-plane" podUID="95dcb4d5-1292-48bc-b96b-af2bce2ffe2e"
Jun 20 02:14:48 cka-scenario1-control-plane kubelet[6720]: E0620 02:14:48.193362    6720 mirror_client.go:139] "Failed deleting a mirror pod" err="Delete \"https://172.18.0.6:6443/api/v1/namespaces/kube-system/pods/kube-apiserver-cka-scenario1-control-plane\": dial tcp 172.18.0.6:6443: connect: connection refused" pod="kube-system/kube-apiserver-cka-scenario1-control-plane"
Jun 20 02:14:48 cka-scenario1-control-plane kubelet[6720]: E0620 02:14:48.193529    6720 pod_workers.go:1324] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"kube-apiserver\" with CrashLoopBackOff: \"back-off 10s restarting failed container=kube-apiserver pod=kube-apiserver-cka-scenario1-control-plane_kube-system(ed99dd090df2692b82e8b55ea870a211)\"" pod="kube-system/kube-apiserver-cka-scenario1-control-plane" podUID="ed99dd090df2692b82e8b55ea870a211"
Jun 20 02:14:48 cka-scenario1-control-plane kubelet[6720]: E0620 02:14:48.383587    6720 status_manager.go:1164] "Failed to get status for pod" err="Get \"https://172.18.0.6:6443/api/v1/namespaces/kube-system/pods/kube-apiserver-cka-scenario1-control-plane\": dial tcp 172.18.0.6:6443: connect: connection refused" podUID="ed99dd090df2692b82e8b55ea870a211" pod="kube-system/kube-apiserver-cka-scenario1-control-plane"
Jun 20 02:14:48 cka-scenario1-control-plane kubelet[6720]: E0620 02:14:48.384437    6720 status_manager.go:1164] "Failed to get status for pod" err="Get \"https://172.18.0.6:6443/api/v1/namespaces/kube-system/pods/kube-apiserver-cka-scenario1-control-plane\": dial tcp 172.18.0.6:6443: connect: connection refused" podUID="ed99dd090df2692b82e8b55ea870a211" pod="kube-system/kube-apiserver-cka-scenario1-control-plane"
Jun 20 02:14:49 cka-scenario1-control-plane kubelet[6720]: E0620 02:14:49.038880    6720 status_manager.go:1164] "Failed to get status for pod" err="Get \"https://172.18.0.6:6443/api/v1/namespaces/kube-system/pods/kube-apiserver-cka-scenario1-control-plane\": dial tcp 172.18.0.6:6443: connect: connection refused" podUID="ed99dd090df2692b82e8b55ea870a211" pod="kube-system/kube-apiserver-cka-scenario1-control-plane"

The kubelet is healthy; the static pod it manages keeps dying. So drop down a level.
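
The kubelet log is noisy, and the one line that names the failing container is easy to miss. A small sketch for pulling it out, assuming the log format shown above (crashlooping_containers is a hypothetical helper name):

```shell
# Read kubelet log lines on stdin and print each distinct container
# that the kubelet reports as being in CrashLoopBackOff.
crashlooping_containers() {
  grep 'CrashLoopBackOff' | grep -o 'container=[^ ]*' | cut -d= -f2 | sort -u
}

# Usage on the node:
#   journalctl -u kubelet --no-pager --since "10 min ago" | crashlooping_containers
```

Once the container is identified, its own log usually states the error outright. On a containerd node that means crictl ps -a --name kube-apiserver to find the container ID, then crictl logs with that ID.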

Step 3 — Find the root cause

kubectl still can't reach the API, so don't guess: compare the static pod manifests directly. The apiserver's --etcd-servers flag has to point at an address and port that etcd actually serves clients on.

$ cat /etc/kubernetes/manifests/kube-apiserver.yaml | grep etcd-servers
    - --etcd-servers=https://127.0.0.1:2399

$ cat /etc/kubernetes/manifests/etcd.yaml | grep client-urls
    kubeadm.kubernetes.io/etcd.advertise-client-urls: https://172.18.0.6:2379
    - --advertise-client-urls=https://172.18.0.6:2379
    - --listen-client-urls=https://127.0.0.1:2379,https://172.18.0.6:2379

There it is: the apiserver is told to dial 127.0.0.1:2399, but etcd is listening on port 2379. That mismatch, left over from the migration, is the bug.
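
The same comparison can be scripted so the mismatch is reported rather than spotted by eye. This is only a sketch: it assumes a single --etcd-servers URL and the kubeadm flag layout shown above, and both function names are hypothetical.

```shell
# Print the value of a "- --<flag>=<value>" argument from a static pod manifest.
manifest_flag() {  # usage: manifest_flag <flag-name> <manifest-file>
  sed -n "s/^ *- --$1=//p" "$2"
}

# Succeed only if the apiserver's etcd endpoint is one of etcd's listen URLs.
# Assumption: --etcd-servers holds exactly one URL.
etcd_endpoint_matches() {  # usage: etcd_endpoint_matches <apiserver.yaml> <etcd.yaml>
  local target listen
  target=$(manifest_flag etcd-servers "$1")
  listen=$(manifest_flag listen-client-urls "$2")
  if [ -z "$target" ] || [ -z "$listen" ]; then
    echo "flag not found in manifest" >&2
    return 2
  fi
  case ",$listen," in
    *",$target,"*) return 0 ;;
    *) echo "mismatch: apiserver dials $target, etcd listens on $listen" >&2
       return 1 ;;
  esac
}

# Usage on the node:
#   cd /etc/kubernetes/manifests && etcd_endpoint_matches kube-apiserver.yaml etcd.yaml
```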

Step 4 — Fix it

Open /etc/kubernetes/manifests/kube-apiserver.yaml in vi (the way you would on the exam) and change the --etcd-servers line back to the endpoint where etcd is actually listening:

-    - --etcd-servers=https://127.0.0.1:2399
+    - --etcd-servers=https://127.0.0.1:2379

Save with :wq. The kubelet notices the changed manifest and recreates the static pod automatically; restarting the kubelet just makes sure the change is picked up right away:

$ systemctl restart kubelet
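
If you would rather not hand-edit, the same one-line change can be made with sed. A hedged sketch: fix_etcd_endpoint is a hypothetical helper, it assumes GNU sed (for -i), and it writes its backup outside the manifest directory on purpose, because the kubelet may try to load a stray copy left there as another static pod.

```shell
# Replace the --etcd-servers value in a manifest, keeping a backup elsewhere.
fix_etcd_endpoint() {  # usage: fix_etcd_endpoint <manifest> <correct-url> <backup-dir>
  cp "$1" "$3/$(basename "$1").bak" &&
    sed -i "s|^\( *- --etcd-servers=\).*|\1$2|" "$1"
}

# Usage on the node (the /root backup location is an arbitrary choice):
#   fix_etcd_endpoint /etc/kubernetes/manifests/kube-apiserver.yaml \
#     https://127.0.0.1:2379 /root
```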

Step 5 — Verify

Prove the cluster is healthy: check the apiserver pod, then the node.

$ kubectl get pods -n kube-system -l component=kube-apiserver
NAME                                         READY   STATUS    RESTARTS   AGE
kube-apiserver-cka-scenario1-control-plane   1/1     Running   0          40m

$ kubectl get nodes
NAME                          STATUS   ROLES           AGE   VERSION
cka-scenario1-control-plane   Ready    control-plane   40m   v1.36.1

Apiserver Running with no restarts, node Ready: recovered.
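
Right after the restart the apiserver can take a short while to answer, so a first kubectl call may still be refused. A small retry helper avoids mistaking that for a failed fix. This is a sketch; wait_until is a hypothetical name and the 60-attempt limit is an arbitrary choice.

```shell
# Re-run a command about once per second until it succeeds
# or the attempt budget is used up.
wait_until() {  # usage: wait_until <max-attempts> <command> [args...]
  local attempts=$1 i=0
  shift
  while [ "$i" -lt "$attempts" ]; do
    "$@" >/dev/null 2>&1 && return 0
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# Usage on the node:
#   wait_until 60 kubectl get --raw=/readyz && kubectl get nodes
```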

Exam tips

The culprit may differ: it could be a wrong IP, a wrong port, or a wrong cert path. If etcd is external, test it directly with curl against its health endpoint. And always confirm the kubelet is active first.

Culprit varies: IP vs port vs cert path
External etcd? curl its /health endpoint on port 2379
Always confirm kubelet is active first
Manifests in /etc/kubernetes/manifests are the source of truth
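
For the wrong-cert-path variant, one quick check is whether every etcd certificate file the apiserver manifest names actually exists. A sketch under two assumptions: the flags use the --etcd-cafile / --etcd-certfile / --etcd-keyfile form, and the paths are the same on the host as inside the container (true for the kubeadm default hostPath mount of /etc/kubernetes/pki). missing_etcd_certs is a hypothetical helper name.

```shell
# Print every --etcd-*file path in the manifest that does not exist on disk.
missing_etcd_certs() {  # usage: missing_etcd_certs <kube-apiserver.yaml>
  sed -n 's/^ *- --etcd-[a-z]*file=//p' "$1" | while IFS= read -r path; do
    [ -e "$path" ] || echo "$path"
  done
}

# Usage on the node:
#   missing_etcd_certs /etc/kubernetes/manifests/kube-apiserver.yaml
```

No output means all referenced files are present; it does not prove they are the right certificates.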

Recap

Control plane down usually means a static pod that won't start
/etc/kubernetes/manifests is the source of truth
Verify with get pods / get nodes

Reproduce this yourself

The entire scenario is scripted on a throwaway kind cluster, so you can break and fix it as many times as you like:

git clone https://github.com/The-Cyber-Sidekick/TCS_CKA_2026_Exam_Scenarios.git && cd TCS_CKA_2026_Exam_Scenarios/scenario1-etcd-endpoint-trap
./setup.sh        # creates the cluster AND arms the fault (exam-style: hands you a broken cluster)
# troubleshoot by hand, or:
./solution.sh     # apply the answer key and recover

If this helped, subscribe to The Cyber Sidekick on YouTube (https://www.youtube.com/channel/UCxZcRycR7OjnFzQYXfkrTqw) for more CKA troubleshooting drills, and grab the newsletter at https://thecybersidekick.beehiiv.com.
