Kubernetes Interview Questions (Senior / Staff DevOps Engineer Level)
1️⃣ What problem does Kubernetes solve that people misunderstand?
Answer (Senior):
Kubernetes does not solve deployment complexity — it solves state reconciliation. It continuously enforces desired state, but it does not understand correctness, business logic, or configuration validity. Most outages happen because people assume Kubernetes validates things it doesn’t.
2️⃣ Why are crashes safer than silent failures in Kubernetes?
Answer:
Crashes are loud — restarts, alerts, and visibility. Silent failures happen when Pods are Running but misconfigured, causing incorrect behavior with no alerts. Kubernetes optimizes for availability, not correctness, so DevOps must force failures early when config is invalid.
3️⃣ Explain a real scenario where Pods are healthy but users see errors.
Answer:
A ConfigMap change removed a required DB host variable. Pods restarted cleanly and passed liveness probes, but queries silently failed or hit defaults. No alerts fired because Kubernetes only saw healthy Pods.
4️⃣ What does “Kubernetes is eventually consistent” mean in practice?
Answer:
Changes propagate asynchronously. Services, Endpoints, kube-proxy rules, DNS, and controllers update independently. During this window, traffic may be partially routed or temporarily dropped. Engineers must design for short-lived inconsistency.
5️⃣ Why does Kubernetes restart Pods instead of escalating failures?
Answer:
Because Kubernetes assumes failures are transient. Escalation is an organizational concern, not a platform concern. Kubernetes restores declared state; humans define correctness and alerting.
6️⃣ How does Kubernetes actually route traffic to Pods?
Answer:
The endpoints controller watches Pod readiness and updates Endpoints/EndpointSlices for each Service. kube-proxy then programs kernel-level iptables/IPVS rules on every node — there is no user-space proxy process in the data path. If the Endpoints are empty, traffic blackholes silently.
7️⃣ What’s the most common Service bug in production?
Answer:
Label-selector mismatch. The Service exists, DNS resolves, but Endpoints are empty. Kubernetes reports everything healthy, but traffic goes nowhere.
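The mismatch above can be sketched in two manifests (all names and images are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  selector:
    app: web-api        # Service selects "web-api"...
  ports:
    - port: 80
      targetPort: 8080
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web-app      # ...but the Pods are labeled "web-app"
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
        - name: web
          image: web:1.0
```

Everything applies cleanly and DNS resolves, but `kubectl get endpoints web` shows `<none>` — the first thing worth checking when a Service "works" and traffic doesn't.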
8️⃣ Why are readiness probes more important than liveness probes?
Answer:
Readiness controls traffic. Liveness only controls restarts. Most production incidents are caused by traffic hitting unready applications, not dead ones.
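The split can be made concrete in a container spec fragment (paths, port, and thresholds are illustrative):

```yaml
readinessProbe:          # gates traffic: failing => Pod removed from Endpoints
  httpGet:
    path: /readyz
    port: 8080
  periodSeconds: 5
  failureThreshold: 3
livenessProbe:           # gates restarts: failing => container is killed
  httpGet:
    path: /livez
    port: 8080
  periodSeconds: 10
  failureThreshold: 6    # deliberately more tolerant than readiness
```

A Pod that fails readiness simply stops receiving traffic and can recover in place; a Pod that fails liveness is restarted, so liveness should trip only on genuinely unrecoverable states.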
9️⃣ When should you avoid liveness probes entirely?
Answer:
For applications with long startup, GC pauses, migrations, or blocking I/O. Misused liveness probes cause restart storms and amplify outages.
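For slow-starting applications, a startup probe is usually the better tool: it suspends the liveness probe until startup completes, instead of removing liveness entirely. A sketch (path, port, and thresholds are illustrative):

```yaml
startupProbe:            # liveness is held off until this succeeds once
  httpGet:
    path: /livez
    port: 8080
  periodSeconds: 10
  failureThreshold: 30   # tolerates up to ~5 minutes of startup before restarting
```

Without this, a liveness probe tuned for steady state kills the container mid-startup, which triggers another slow startup — the restart storm described above.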
🔟 How do probes prevent silent misconfiguration?
Answer:
Readiness probes can validate config, downstream dependencies, and feature flags. If config is invalid, traffic is blocked even though the Pod is Running.
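One way to wire this up is an exec-based readiness probe that checks the config itself (variable names are illustrative; this assumes `sh` and `nc` exist in the image):

```yaml
readinessProbe:
  exec:
    command:
      - sh
      - -c
      - 'test -n "$DB_HOST" && nc -z "$DB_HOST" 5432'   # config present AND DB reachable
  periodSeconds: 10
```

A rollout that ships a broken ConfigMap now produces Pods that are Running but never Ready — loud, visible, and blocked from traffic, instead of silently wrong.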
1️⃣1️⃣ Why do experienced teams prefer file-based secrets?
Answer:
Environment variables leak via exec, logs, crash dumps, and debugging tools. File-based secrets reduce exposure, support rotation, and align with audit requirements.
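A minimal sketch of projecting a Secret as files instead of env vars (names are illustrative):

```yaml
spec:
  containers:
    - name: app
      image: app:1.0
      volumeMounts:
        - name: db-creds
          mountPath: /etc/secrets    # app reads /etc/secrets/username, /etc/secrets/password
          readOnly: true
  volumes:
    - name: db-creds
      secret:
        secretName: db-credentials
```

As a bonus for rotation: the kubelet periodically refreshes volume-mounted Secrets in running Pods (with the notable exception of `subPath` mounts), whereas env-based secrets are frozen at container start.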
1️⃣2️⃣ What happens when a Kubernetes Secret is deleted?
Answer:
Existing Pods keep their env-based secret values until restart. Any new Pod that references the deleted Secret fails with CreateContainerConfigError. This creates an immediate outage during rescheduling or scaling — the failure surfaces exactly when the cluster is under stress.
1️⃣3️⃣ Why is secret rotation a common outage cause?
Answer:
Because env-based secrets require Pod restarts. Teams rotate secrets but forget to restart workloads, leading to authentication failures hours later.
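The missing step is usually a one-liner (Deployment name is illustrative; requires cluster access):

```shell
# After rotating a Secret consumed via env vars, restart the workload
# so new Pods pick up the new value:
kubectl rollout restart deployment/web
kubectl rollout status deployment/web --timeout=120s
```

Teams that automate rotation should automate the rollout restart with it, or switch to file-based secrets so the restart is no longer required.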
1️⃣4️⃣ What is the most dangerous Kubernetes object?
Answer:
ConfigMaps. A missing ConfigMap referenced via env will at least block the container from starting — but a ConfigMap with a wrong value doesn't crash anything. It changes behavior silently, which is far harder to detect.
1️⃣5️⃣ Why do StatefulSets exist when Deployments could use PVCs?
Answer:
StatefulSets guarantee identity, ordering, and stable DNS. Databases and quorum-based systems require identity consistency that Deployments cannot provide.
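The identity guarantee comes from the pairing of a headless Service with the StatefulSet (names, image, and sizes are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: db
spec:
  clusterIP: None          # headless: creates per-Pod DNS records
  selector:
    app: db
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: db
spec:
  serviceName: db          # yields stable names: db-0.db, db-1.db, db-2.db
  replicas: 3
  selector:
    matchLabels:
      app: db
  template:
    metadata:
      labels:
        app: db
    spec:
      containers:
        - name: db
          image: postgres:16
  volumeClaimTemplates:    # each replica gets its own PVC, bound to its ordinal
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
```

`db-0` keeps its name and its volume across rescheduling. A Deployment with a shared PVC gives you neither: Pod names are random and replicas fight over the same volume.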
1️⃣6️⃣ What happens if the API server goes down?
Answer:
Workloads keep running. No scheduling, scaling, self-healing, or updates occur — a Pod lost to a node failure will not be rescheduled until the API server returns. Kubernetes becomes operationally frozen but not unavailable.
1️⃣7️⃣ Why is Kubernetes not a deployment tool?
Answer:
It doesn’t execute workflows. It reconciles state. Deployment safety comes from probes, rollout strategies, and validation — not Kubernetes itself.
1️⃣8️⃣ What’s the difference between control plane and data plane failures?
Answer:
Control plane failures affect management. Data plane failures affect traffic. Data plane issues cause user-visible outages faster.
1️⃣9️⃣ How do you debug “Service exists but browser shows nothing”?
Answer:
Check Pod readiness → labels → Endpoints → container listening port → correct NodePort/Ingress path. Services never fail alone.
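That checklist maps to a short command sequence, outside-in (Service and label names are illustrative; requires cluster access):

```shell
kubectl get pods -l app=web -o wide        # Running is not enough — are they Ready?
kubectl get endpoints web                  # empty => selector or readiness problem
kubectl get svc web -o yaml                # do selector labels match the Pod labels?
kubectl exec deploy/web -- ss -tlnp        # listening on targetPort? (if ss exists in the image)
kubectl describe ingress web               # correct backend Service, port, and path?
```

Stop at the first layer that disagrees with your expectation — that layer, not the Service, is the bug.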
2️⃣0️⃣ What Kubernetes metric do you trust the least?
Answer:
Pod status. A Pod can be Running and completely broken.
2️⃣1️⃣ What’s the biggest Kubernetes anti-pattern you see?
Answer:
Treating Kubernetes as a PaaS that guarantees correctness. It guarantees state enforcement, not correctness.
2️⃣2️⃣ Why is GitOps safer than imperative kubectl commands?
Answer:
Because state is auditable, reproducible, and reviewable. Imperative commands create drift and hidden changes.
2️⃣3️⃣ How do you prevent bad config from reaching production?
Answer:
Schema validation, startup validation, readiness gating, config versioning, and environment isolation.
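Two of those checks can run before anything touches the cluster (paths are illustrative; `kubeconform` is a common third-party validator):

```shell
# Server-side dry run: schema, defaulting, and admission errors, with no changes applied
kubectl apply --dry-run=server -f manifests/

# Offline schema validation in CI, no cluster required
kubeconform -strict -summary manifests/
```

Neither catches semantically wrong values — that's what startup validation and readiness gating are for — but they stop malformed manifests at the door.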
2️⃣4️⃣ Why does Kubernetes not validate configuration semantics?
Answer:
Because it’s application-agnostic. Kubernetes cannot know what configuration is valid for your business logic.
2️⃣5️⃣ What Kubernetes feature causes the most outages?
Answer:
Rolling updates without readiness probes.
2️⃣6️⃣ How do you design for zero downtime in Kubernetes?
Answer:
Readiness probes, maxUnavailable control, slow startup protection, connection draining, and backward-compatible releases.
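Most of that list lives in one Deployment spec. A sketch with illustrative names and values:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0    # never drop below desired capacity
      maxSurge: 1          # bring up a new Pod before removing an old one
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      terminationGracePeriodSeconds: 30
      containers:
        - name: web
          image: web:1.1
          readinessProbe:          # new Pods receive traffic only when ready
            httpGet:
              path: /readyz
              port: 8080
          lifecycle:
            preStop:               # drain: give endpoint removal time to
              exec:                #   propagate before SIGTERM arrives
                command: ["sh", "-c", "sleep 10"]
```

The `preStop` sleep covers the eventual-consistency window from question 4: endpoint removal and kube-proxy rule updates are asynchronous, so a terminating Pod may still receive traffic for a few seconds. Backward-compatible releases remain an application concern — Kubernetes can't provide that part.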
2️⃣7️⃣ What’s a Kubernetes outage that monitoring won’t catch?
Answer:
Wrong config with valid Pods. No errors, no alerts, wrong behavior.
2️⃣8️⃣ Why is “kubectl exec” dangerous in production?
Answer:
It leaks environment variables, secrets, and runtime state into terminals, logs, and tickets.
2️⃣9️⃣ What does Kubernetes NOT protect you from?
Answer:
Bad configuration, wrong assumptions, poor release practices, and human error.
3️⃣0️⃣ If everything looks healthy but users complain, what do you check first?
Answer:
Configuration, feature flags, routing, and readiness behavior — not Pod health.
🎯 INTERVIEW CLOSING STATEMENT (VERY POWERFUL)
“Kubernetes keeps systems running.
DevOps engineers keep systems correct.”