Hey folks, my name is Immanuel and I am a certified Kubernetes administrator. Here, I have collected some questions about Kubernetes that you can use to prepare for interviews or the CKA exam.
So, you think you know Kubernetes? You've deployed a few apps, scaled a few pods, and now you're ready for the big leagues. Let's see how you handle some questions that go beyond the basics. Let's go!
1. Explain the roles of the Kubernetes control plane components and how they interact to keep the cluster in sync
Answer:
- kube-apiserver is the gatekeeper for all API calls; it validates and processes requests, persists objects to etcd, and serves as the hub for control-plane communication.
- etcd is the distributed key-value store holding the cluster’s desired state.
- kube-scheduler monitors unscheduled Pods and binds them to appropriate Nodes based on resource requirements and policies.
- kube-controller-manager runs controllers (e.g., ReplicaSet, Endpoint) that reconcile actual state with the desired state.
- cloud-controller-manager handles cloud-specific controllers (e.g., load-balancer provisioning) when running in a cloud environment. Together they form a feedback loop: the API server validates requests and persists state to etcd; the controllers and the scheduler watch for changes through the API server, take action, and write the results back via the API server.
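On a kubeadm-provisioned cluster these components run as static Pods, so a quick way to see them all is (a sketch assuming kubeadm's default labels and kube-system namespace):
# List the control-plane Pods and the nodes they run on
kubectl -n kube-system get pods -l tier=control-plane -o wide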
2. How do you perform a rolling update without downtime?
Answer:
This is where Deployments really shine. A rolling update allows you to update your application with zero downtime by incrementally replacing old Pods with new ones. You can control the process with maxSurge and maxUnavailable.
- maxSurge: The maximum number of Pods that can be created above the desired number of Pods.
- maxUnavailable: The maximum number of Pods that can be unavailable during the update.
Command:
To trigger a rolling update, you can use kubectl set image:
kubectl set image deployment/nginx-deployment nginx=nginx:1.16.1 --record
To monitor the rollout status:
kubectl rollout status deployment/nginx-deployment
YAML Example (Rolling Update Strategy):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
  selector:
    matchLabels:
      app: nginx
  template:
    # ... rest of the template
And some additional info: the --record flag in commands like kubectl set image deployment/nginx-deployment nginx=nginx:1.16.1 --record has been deprecated and is planned to be removed in future versions of kubectl.
Currently, if you run the command without --record, the change cause is not recorded automatically. There is no direct built-in alternative flag that replicates --record functionality exactly. Instead, the recommended approach is to manually annotate the resource with the change cause using kubectl annotate, for example:
kubectl annotate deployment/nginx-deployment kubernetes.io/change-cause="Updated nginx image to 1.16.1"
This manual annotation serves as the alternative way to record the reason for changes in deployments and other resources.
3. How would you design a highly available etcd cluster for Kubernetes, and secure it?
- Deploy odd-numbered etcd members (3 or 5) across failure domains to avoid split-brain.
- Use mutual TLS for client-to-server and peer-to-peer communication, generating certificates per member.
- Enable etcd encryption at rest for Kubernetes Secrets (see Q44).
- Regularly back up etcd with etcdctl snapshot save and test restores with etcdctl snapshot restore (example commands below).
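A backup/restore sketch, assuming the certificate paths of a default kubeadm install:
# Take a snapshot (TLS flags point at kubeadm's default etcd certificates)
ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  snapshot save /backup/etcd-snapshot.db
# Restore into a fresh data directory, then point etcd at it
ETCDCTL_API=3 etcdctl snapshot restore /backup/etcd-snapshot.db \
  --data-dir=/var/lib/etcd-restored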
4. What's the difference between a Service and an Ingress?
Answer:
Think of a Service as the internal load balancer for your Pods. It provides a stable IP address and DNS name for a set of Pods, so you don't have to worry about individual Pod IPs changing. There are different types of Services (ClusterIP, NodePort, LoadBalancer), but they all handle internal traffic.
An Ingress, on the other hand, is an API object that manages external access to the services in a cluster, typically HTTP. It acts as a reverse proxy and can provide load balancing, SSL termination, and name-based virtual hosting. You need an Ingress Controller (like NGINX or Traefik) for the Ingress resource to work.
In short: Service is for internal traffic, Ingress is for external traffic.
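For comparison, here is a minimal ClusterIP Service (a sketch; the selector and ports are illustrative, and the name test matches the backend used in the Ingress example below):
apiVersion: v1
kind: Service
metadata:
  name: test
spec:
  selector:
    app: test
  ports:
  - port: 80
    targetPort: 8080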
YAML Example (Ingress):
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: minimal-ingress
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  rules:
  - http:
      paths:
      - path: /testpath
        pathType: Prefix
        backend:
          service:
            name: test
            port:
              number: 80
5. Describe the CRI, CNI, and CSI interfaces and their roles in Kubernetes
Answer:
- Container Runtime Interface (CRI): Pluggable interface between kubelet and container runtimes (Docker, containerd, CRI-O).
- Container Network Interface (CNI): Defines how Pods get networking (IP assignment, routes) via plugins (Calico, Flannel, Cilium).
- Container Storage Interface (CSI): Standard for volume plugins, enabling dynamic provisioning, snapshots, and volume expansion for block and file storage.
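A few quick ways to see which implementations a cluster is using (a sketch; the CNI config path assumes the common /etc/cni/net.d convention):
# Container runtime per node (CRI)
kubectl get nodes -o wide
# CNI plugin configuration on a node
ls /etc/cni/net.d/
# Registered CSI drivers
kubectl get csidrivers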
6. How would you troubleshoot a Pod that is stuck in a CrashLoopBackOff state?
Answer:
Ah, the dreaded CrashLoopBackOff. This means your Pod is starting, crashing, and then Kubernetes is trying to restart it, only for it to crash again. Here's the troubleshooting checklist:
- Describe the Pod: This gives you information about the Pod's state, events, and any error messages.
kubectl describe pod <pod-name>
- Check the logs: The container's own output is usually the fastest clue; --previous shows the logs of the last crashed instance.
kubectl logs <pod-name> --previous
- Check the image: Make sure the container image is correct and exists. A typo in the image name is a common culprit.
- Check resource limits: If your application exceeds its memory limit, it is OOM-killed and restarted; kubectl describe will show an OOMKilled reason.
- Check liveness probes: A misconfigured liveness (or startup) probe can kill a healthy container before it finishes starting.
7. What are Taints and Tolerations?
Answer:
Taints are applied to nodes, and they repel Pods that don't have a matching Toleration. This is a way to control which Pods can be scheduled on which nodes.
For example, you might want to dedicate a set of nodes to a specific application or prevent certain Pods from running on a particular node.
Command (Taint a node):
kubectl taint nodes node1 key=value:NoSchedule
YAML Example (Toleration in a Pod):
apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  containers:
  - name: my-container
    image: nginx
  tolerations:
  - key: "key"
    operator: "Equal"
    value: "value"
    effect: "NoSchedule"
8. Explain how to deploy a custom scheduler and integrate it with the main scheduling framework or extenders
Answer:
- Build your scheduler binary (e.g., with client-go) that watches for unscheduled Pods and binds them to Nodes, and package it as a container image.
- Deploy it as a Deployment in its own namespace.
- In the Pod spec, set spec.schedulerName: custom-scheduler (example below).
- (Optional) Use the Scheduling Framework's plugin model by writing a plugin and registering it in the KubeSchedulerConfiguration passed to kube-scheduler via --config.
- Extenders can be configured in the kube-scheduler config to call external HTTP endpoints for additional predicate/prioritization logic.
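A minimal Pod spec that selects the custom scheduler (a sketch; custom-scheduler must match the schedulerName your scheduler watches for):
apiVersion: v1
kind: Pod
metadata:
  name: scheduled-by-custom
spec:
  schedulerName: custom-scheduler
  containers:
  - name: app
    image: nginx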
9. What's the role of etcd in a Kubernetes cluster?
Answer:
etcd is the brain of your Kubernetes cluster. It's a consistent and highly-available key-value store used as Kubernetes' backing store for all cluster data. Everything you see in your cluster—Pods, Services, Deployments, Secrets—is stored in etcd.
If etcd becomes unavailable, the cluster can no longer make changes: you can't create or update resources, and nothing new gets scheduled, although existing workloads keep running on the nodes. That's why it's so critical to have a reliable and backed-up etcd cluster.
10. How can you debug CRI issues using crictl?
Answer:
# List pods/containers managed by CRI:
crictl pods
crictl ps -a
# Pull an image manually:
crictl pull nginx:latest
# Inspect container logs:
crictl logs <container-id>
# Exec into a running container:
crictl exec -it <container-id> sh
# Check runtime version:
crictl version
crictl talks directly to the CRI socket, bypassing kubelet.
11. How do you manage secrets in Kubernetes?
Answer:
Kubernetes Secrets are objects that let you store and manage sensitive information, such as passwords, OAuth tokens, and SSH keys. They are stored as base64-encoded strings, which is encoding, not encryption: anyone with access to etcd can read them unless encryption at rest is enabled (see Q44).
For more secure secret management, you should consider using a tool like HashiCorp Vault, AWS Secrets Manager, or Azure Key Vault, and then use a Kubernetes operator to inject those secrets into your Pods.
Command (Create a secret):
kubectl create secret generic my-secret --from-literal=password='s3cr3t'
YAML Example (Using a secret in a Pod):
apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  containers:
  - name: my-container
    image: nginx
    env:
    - name: SECRET_PASSWORD
      valueFrom:
        secretKeyRef:
          name: my-secret
          key: password
12. Describe the Kubernetes networking model, including CNI and NetworkPolicies
Answer:
- Flat network: every Pod gets a unique IP, and any Pod can reach any other Pod by default.
- CNIs implement this: they allocate IPs and set up routes.
- NetworkPolicy objects define ingress/egress rules at the Pod level, enforced by the CNI plugin (e.g., Calico or Cilium).
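As a concrete illustration, a common starting point is a default-deny policy for a namespace (a sketch; combine it with allow rules like the one shown in Q17):
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress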
13. Explain the difference between livenessProbe and readinessProbe
Answer:
Both are used to check the health of a container, but they have different purposes:
- livenessProbe: Checks if the container is still healthy. If the probe fails, the kubelet kills the container and restarts it. This is useful for catching deadlocks or other situations where the application is running but not responding.
- readinessProbe: Checks if the container is ready to accept traffic. If the probe fails, the Pod's IP address is removed from the endpoints of all Services that match the Pod. This is useful when an application needs some time to start up before it can handle requests.
YAML Example (Probes):
apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  containers:
  - name: my-container
    image: my-app
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 3
      periodSeconds: 3
    readinessProbe:
      httpGet:
        path: /ready
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 5
14. Explain how a service mesh (e.g., Istio) integrates with Kubernetes and show an example VirtualService
Answer:
- Service mesh runs sidecar proxies (Envoy) in Pods to intercept traffic.
- The control plane (istiod) uses CRDs to configure routing and policies.
- Example:
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: reviews-vs
spec:
  hosts:
  - reviews
  http:
  - match:
    - headers:
        end-user:
          exact: "jason"
    route:
    - destination:
        host: reviews
        subset: v2
  - route:
    - destination:
        host: reviews
        subset: v1
This routes user jason to v2 of the reviews service.
15. What is a Persistent Volume (PV) and a Persistent Volume Claim (PVC)?
Answer:
In Kubernetes, Pods are ephemeral. When a Pod dies, its data is lost. To solve this, we have Persistent Volumes (PVs) and Persistent Volume Claims (PVCs).
- Persistent Volume (PV): A piece of storage in the cluster that has been provisioned by an administrator or dynamically via a StorageClass. It's a resource in the cluster, just like CPU or memory on a node.
- Persistent Volume Claim (PVC): A request for storage by a user. It's similar to how a Pod consumes CPU and memory. A PVC consumes PV resources.
This separation of concerns allows developers to request storage without needing to know the details of the underlying storage infrastructure.
YAML Example (PV and PVC):
# Persistent Volume
apiVersion: v1
kind: PersistentVolume
metadata:
  name: my-pv
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/mnt/data"
---
# Persistent Volume Claim
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 3Gi
16. How do you write and deploy a ValidatingAdmissionWebhook? Provide a sample manifest
Answer:
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: no-latest-tag
webhooks:
- name: no-latest.example.com
  admissionReviewVersions: ["v1"]
  sideEffects: None
  rules:
  - apiGroups: [""]
    apiVersions: ["v1"]
    operations: ["CREATE", "UPDATE"]
    resources: ["pods"]
  clientConfig:
    service:
      name: webhook-svc
      namespace: default
      path: "/validate"
    caBundle: <base64-CA-cert>
This configuration only wires up the webhook; the server behind webhook-svc implements the actual validation logic, e.g., rejecting any Pod that uses the :latest image tag.
17. How does network policy work in Kubernetes?
Answer:
By default, all Pods in a Kubernetes cluster can communicate with each other. Network Policies allow you to restrict this communication. They are like a firewall for your Pods.
You can define rules that specify which Pods are allowed to communicate with which other Pods. Network Policies are implemented by a network plugin, so you need to be using a networking solution that supports them, like Calico or Cilium.
YAML Example (Network Policy):
This policy allows traffic to Pods with the label app=db only from Pods with the label app=backend.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: db-policy
spec:
  podSelector:
    matchLabels:
      app: db
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: backend
18. With PSP deprecated, how do you enforce security standards via PodSecurityAdmission?
Answer:
kubectl label ns default \
pod-security.kubernetes.io/enforce=restricted \
pod-security.kubernetes.io/audit=baseline
This enforces the “restricted” profile on the default namespace, blocking privileged containers and hostPath volumes.
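The same labels can also be set declaratively in the Namespace manifest (a sketch):
apiVersion: v1
kind: Namespace
metadata:
  name: default
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: baseline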
19. What is a Sidecar container and what is it used for?
Answer:
A Sidecar is a container that runs alongside your main application container in the same Pod. They share the same network namespace and can share volumes. This pattern is used to extend or enhance the functionality of the main container without changing its code.
Common use cases for sidecars include:
- Logging: A sidecar can collect logs from the main application and forward them to a central logging system.
- Monitoring: A sidecar can export metrics from the application.
- Service Mesh: In a service mesh like Istio, a sidecar proxy (like Envoy) is injected into each Pod to handle traffic management, security, and observability.
YAML Example (Sidecar for logging):
apiVersion: v1
kind: Pod
metadata:
  name: my-app-with-sidecar
spec:
  containers:
  - name: main-app
    image: my-app
    volumeMounts:
    - name: shared-logs
      mountPath: /var/log
  - name: sidecar-container
    image: fluentd
    volumeMounts:
    - name: shared-logs
      mountPath: /var/log
  volumes:
  - name: shared-logs
    emptyDir: {}
20. Demonstrate a StorageClass for CSI-based dynamic provisioning
Answer:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: csi-sc
provisioner: csi.example.com
parameters:
  type: gp2
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
This uses the CSI driver csi.example.com to provision volumes on-demand.
21. How would you secure a Kubernetes cluster?
Answer:
Securing a Kubernetes cluster is a multi-layered process. Here are some key areas to focus on:
- RBAC (Role-Based Access Control): Use RBAC to control who can access the Kubernetes API and what they can do. Be as specific as possible with permissions (a minimal Role/RoleBinding example follows this list).
- Network Policies: As discussed earlier, use Network Policies to restrict traffic between Pods.
- Pod Security Admission (the successor to the removed PodSecurityPolicy): Enforce security standards for Pods, such as preventing them from running as root or accessing the host network.
- Secrets Management: Use a secure solution for managing secrets, not just the default Kubernetes Secrets.
- Image Scanning: Scan your container images for vulnerabilities before deploying them.
- Regularly update Kubernetes: Keep your cluster up to date with the latest security patches.
- Secure etcd: Restrict access to etcd and encrypt its data at rest.
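To illustrate the RBAC point above, a minimal Role and RoleBinding granting read-only access to Pods in one namespace (a sketch; the namespace and user names are illustrative):
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: dev
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: dev
subjects:
- kind: User
  name: jane
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io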
22. Compare kube-proxy modes and show how to enable IPVS
Answer:
- iptables mode programs Linux iptables rules; it works everywhere but degrades at very large Service counts.
- IPVS mode uses the Linux IPVS load balancer; it is far more efficient with thousands of Services.
To enable IPVS, edit the kube-proxy ConfigMap (kubeadm clusters keep it in kube-system):
kind: ConfigMap
apiVersion: v1
metadata:
  name: kube-proxy
  namespace: kube-system
data:
  config.conf: |
    apiVersion: kubeproxy.config.k8s.io/v1alpha1
    kind: KubeProxyConfiguration
    mode: "ipvs"
Then restart the kube-proxy Pods (kubectl -n kube-system rollout restart daemonset kube-proxy) and make sure the IPVS kernel modules are loaded on the nodes.
23. What is a Custom Resource Definition (CRD)?
Answer:
A Custom Resource Definition (CRD) allows you to extend the Kubernetes API with your own custom resources. This is a powerful way to build custom automation and workflows on top of Kubernetes.
When you create a CRD, you are essentially creating a new type of resource that you can manage with kubectl, just like you would with built-in resources like Pods and Deployments. To make these custom resources useful, you typically write a custom controller (or operator) that watches for changes to your custom resources and takes action based on them.
YAML Example (CRD):
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: crontabs.stable.example.com
spec:
  group: stable.example.com
  versions:
  - name: v1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              cronSpec:
                type: string
              image:
                type: string
              replicas:
                type: integer
  scope: Namespaced
  names:
    plural: crontabs
    singular: crontab
    kind: CronTab
    shortNames:
    - ct
24. How do you write a custom controller using controller-runtime? Outline the key components
Answer:
- Scaffold a project with Kubebuilder (example commands below).
- Define a Reconciler struct implementing Reconcile(ctx, req).
- Use mgr.GetClient() to Get, Update, and Patch resources.
- Register your controller with builder.ControllerManagedBy(mgr).
- Deploy it as a Deployment with the required RBAC in-cluster.
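The scaffolding commands look roughly like this (a sketch; the domain, group, kind, and image registry are placeholders):
# Initialize the project module and manifests
kubebuilder init --domain example.com --repo example.com/memcached-operator
# Generate the API types and a controller skeleton
kubebuilder create api --group cache --version v1alpha1 --kind Memcached
# Build and deploy the controller into the cluster
make docker-build docker-push IMG=<registry>/memcached-operator:v0.1
make deploy IMG=<registry>/memcached-operator:v0.1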
25. What is Helm and why would you use it?
Answer:
Helm is a package manager for Kubernetes. It allows you to define, install, and upgrade even the most complex Kubernetes applications. Think of it like apt or yum for Kubernetes.
Helm uses a packaging format called charts. A chart is a collection of files that describe a related set of Kubernetes resources. With Helm, you can:
- Manage the complexity of application deployments.
- Easily share and reuse applications.
- Perform repeatable deployments.
- Manage application releases.
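A typical workflow, assuming the public Bitnami chart repository:
# Add a chart repository and install a release
helm repo add bitnami https://charts.bitnami.com/bitnami
helm install my-web bitnami/nginx
# Upgrade with an overridden value, then roll back if needed
helm upgrade my-web bitnami/nginx --set replicaCount=3
helm rollback my-web 1
# See what is deployed
helm list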
26. How do you enforce policies with OPA Gatekeeper? Provide a ConstraintTemplate and Constraint
Answer:
apiVersion: templates.gatekeeper.sh/v1beta1
kind: ConstraintTemplate
metadata:
  name: k8srequiredlabels
spec:
  crd:
    spec:
      names:
        kind: K8sRequiredLabels
  targets:
  - target: admission.k8s.gatekeeper.sh
    rego: |
      package k8srequiredlabels
      violation[{"msg": msg}] {
        not input.review.object.metadata.labels["app"]
        msg := "Every resource must have an 'app' label"
      }
---
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: require-app-label
spec:
  match:
    kinds:
    - apiGroups: [""]
      kinds: ["Pod"]
This rejects Pods without an app label.
27. How do you drain a node for maintenance?
Answer:
To safely perform maintenance on a node, you should first drain it. This will cordon the node (mark it as unschedulable) and then evict all the Pods running on it. The evicted Pods will be rescheduled on other available nodes, as long as they are managed by a controller like a Deployment.
Command:
kubectl drain <node-name> --ignore-daemonsets
The --ignore-daemonsets flag is usually required because DaemonSet Pods cannot be evicted (their controller would immediately recreate them); without the flag, drain refuses to proceed if such Pods are present.
After maintenance is complete, you can uncordon the node to allow Pods to be scheduled on it again:
kubectl uncordon <node-name>
28. Explain API aggregation and provide an APIService example
Answer:
API aggregation lets the kube-apiserver proxy requests for an entire API group/version to an extension API server running in the cluster (such as metrics-server). You register the extension with an APIService object:
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta1.metrics.k8s.io
spec:
  group: metrics.k8s.io
  version: v1beta1
  service:
    name: metrics-server
    namespace: kube-system
  caBundle: <base64-CA-cert>
  groupPriorityMinimum: 100
  versionPriority: 10
This makes the custom metrics.k8s.io API available through the main API server.
29. What is the role of the kube-proxy?
Answer:
kube-proxy is a network proxy that runs on each node in your cluster. It is responsible for implementing the Kubernetes Service concept. It maintains network rules on nodes that allow network communication to your Pods from network sessions inside or outside of your cluster.
kube-proxy can operate in several modes, most notably iptables and IPVS (the legacy userspace mode has been removed in recent releases). The iptables mode is the default and most common.
30. How do you configure an HPA to scale on a Prometheus metric rps?
Answer:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metric:
        name: rps
      target:
        type: AverageValue
        averageValue: 100
Install and configure prometheus-adapter with rules mapping PromQL to custom.metrics.k8s.io.
31. What are StatefulSets and how are they different from Deployments?
Answer:
StatefulSets are used to manage stateful applications, such as databases. They are similar to Deployments, but they provide a few key features that are important for stateful apps:
- Stable, unique network identifiers: Pods in a StatefulSet have a persistent, unique hostname based on their name and an ordinal index (e.g., web-0, web-1).
- Stable, persistent storage: Each Pod in a StatefulSet gets its own persistent storage that is tied to its identity. If a Pod is rescheduled, it is reattached to the same storage.
- Ordered, graceful deployment and scaling: Pods in a StatefulSet are created, updated, and deleted in a specific order.
- Ordered, automated rolling updates.
YAML Example (StatefulSet):
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  selector:
    matchLabels:
      app: nginx
  serviceName: "nginx"
  replicas: 3
  template:
    # ...
  volumeClaimTemplates:
  - metadata:
      name: www
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 1Gi
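The stable network identities rely on the headless Service named in serviceName; a minimal one might look like this (a sketch matching the labels above):
apiVersion: v1
kind: Service
metadata:
  name: nginx
spec:
  clusterIP: None
  selector:
    app: nginx
  ports:
  - port: 80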
32. Show a ServiceMonitor to scrape metrics from a Pod
Answer:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: app-service-monitor
  labels:
    team: frontend
spec:
  selector:
    matchLabels:
      app: frontend
  endpoints:
  - port: http-metrics
    path: /metrics
    interval: 30s
Prometheus Operator watches this CRD and configures scraping accordingly.
33. How can you limit resource usage for a Pod?
Answer:
You can limit the CPU and memory resources that a Pod can use by setting resource requests and limits in the Pod's specification.
- Requests: The amount of resources that are guaranteed for the container. If a container requests a resource, Kubernetes will only schedule it on a node that can provide that amount.
- Limits: The maximum amount of resources that a container can use. If a container tries to exceed its limit, it may be terminated (for memory) or throttled (for CPU).
Resource requests and limits for a Pod may look like this:
apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  containers:
  - name: my-container
    image: my-app
    resources:
      requests:
        memory: "64Mi"
        cpu: "250m"
      limits:
        memory: "128Mi"
        cpu: "500m"
34. How do you use ephemeral containers to debug a running Pod?
Answer:
kubectl debug -it my-pod \
  --image=nicolaka/netshoot \
  --target=my-container \
  -- /bin/bash
This attaches an ephemeral debug container to the Pod: it shares the Pod's network namespace, and --target (the name of an existing container, not the Pod) additionally shares that container's process namespace, so you can run tools like tcpdump or inspect the application's processes and files.
35. What is a DaemonSet?
Answer:
A DaemonSet ensures that all (or some) nodes run a copy of a Pod. As nodes are added to the cluster, Pods are added to them. As nodes are removed from the cluster, those Pods are garbage collected.
This is useful for deploying cluster-wide services, such as:
- A log collector like Fluentd or Logstash.
- A node monitoring agent like Prometheus Node Exporter.
- A cluster storage daemon like Glusterd or Ceph.
YAML Example (DaemonSet):
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd-elasticsearch
spec:
  selector:
    matchLabels:
      name: fluentd-elasticsearch
  template:
    metadata:
      labels:
        name: fluentd-elasticsearch
    spec:
      containers:
      - name: fluentd-elasticsearch
        image: quay.io/fluentd_elasticsearch/fluentd:v2.5.2
        # ...
36. How do you rotate component certificates using Kubernetes CSR API?
Answer:
- Generate CSR manifest:
apiVersion: certificates.k8s.io/v1
kind: CertificateSigningRequest
metadata:
  name: kube-apiserver-csr
spec:
  request: <base64-CSR>
  signerName: kubernetes.io/kube-apiserver-client
  usages:
  - client auth
- Create it: kubectl apply -f csr.yaml
- Approve it: kubectl certificate approve kube-apiserver-csr
- Fetch the signed cert: kubectl get csr kube-apiserver-csr -o jsonpath='{.status.certificate}' | base64 -d > apiserver.crt
37. How do you troubleshoot a service that is not accessible?
Answer:
If you can't access a Service, here's a troubleshooting guide:
- Check the Service description: Make sure the Service exists and has the correct labels.
kubectl describe service <service-name>
- Check the endpoints: Verify that the Service has endpoints (i.e., it's connected to running Pods).
kubectl get endpoints <service-name>
If there are no endpoints, check the label selector in your Service and the labels on your Pods. They must match.
- Check the Pods: Ensure that the Pods targeted by the Service are running and healthy.
- Check Network Policies: If you are using Network Policies, make sure they are not blocking traffic to the Service.
- Check DNS: Try to resolve the Service's DNS name from another Pod in the cluster.
kubectl exec -it <another-pod> -- nslookup <service-name>
38. Demonstrate postStart and preStop hooks in a Pod
Answer:
apiVersion: v1
kind: Pod
metadata:
  name: lifecycle-demo
spec:
  containers:
  - name: demo
    image: nginx
    lifecycle:
      postStart:
        exec:
          command: ["/bin/sh", "-c", "echo 'Started' > /usr/share/message"]
      preStop:
        exec:
          command: ["/bin/sh", "-c", "nginx -s quit"]
postStart runs right after container start; preStop runs before termination.
39. What is the Operator pattern?
Answer:
The Operator pattern is a way to package, deploy, and manage a Kubernetes application. An Operator is a custom controller that uses Custom Resource Definitions (CRDs) to manage an application and its components.
The goal of an Operator is to automate the entire lifecycle of a complex, stateful application, including:
- Deployment and configuration.
- Scaling and high availability.
- Backups and recovery.
- Upgrades.
Essentially, an Operator encodes the operational knowledge of a human operator into software.
40. How do you deploy VPA for automatic resource right-sizing?
Answer:
# Install the VPA components (CRDs, RBAC, recommender, updater, admission controller)
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh
Then create:
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: sample-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: sample-deployment
  updatePolicy:
    updateMode: "Auto"
VPA’s Recommender, Updater, and Admission Controller components handle resource adjustments.
41. How would you handle a situation where a node is NotReady?
Answer:
A NotReady status means the node controller has not heard from the node within the node-monitor-grace-period. This could be due to a network partition, the kubelet crashing, or the node being down.
- Describe the node: Get more information about the node's status and any events.
kubectl describe node <node-name>
- Check the kubelet: SSH into the node and check the status of the kubelet service.
systemctl status kubelet
Check the kubelet logs for errors:
journalctl -u kubelet
- Check network connectivity: Ensure the node can communicate with the control-plane nodes.
- Check resources: The node might be under heavy resource pressure (CPU, memory, or disk), which can prevent the kubelet from functioning correctly.
42. How do you write a Falco rule to detect shells inside containers?
Answer:
- rule: Detect Shell in Container
  desc: Alert when bash is spawned in a container
  condition: container.id != host and proc.name = bash
  output: "Shell in container (user=%user.name container=%container.id)"
  priority: WARNING
Apply via falco_rules.local.yaml and restart Falco.
43. What are init containers and when would you use them?
Answer:
Init containers are containers that run before the main application containers in a Pod. They must run to completion before the main containers are started. If an init container fails, the kubelet restarts it until it succeeds (unless the Pod's restartPolicy is Never, in which case the Pod is marked as failed).
You can use init containers for setup tasks that need to complete before the application starts, such as:
- Waiting for a database or another service to be available.
- Cloning a git repository into a volume.
- Performing database migrations.
- Setting up necessary permissions or configuration files.
YAML Example (Init Container):
apiVersion: v1
kind: Pod
metadata:
  name: my-app-pod
spec:
  containers:
  - name: my-app-container
    image: busybox
    command: ['sh', '-c', 'echo The app is running! && sleep 3600']
  initContainers:
  - name: init-myservice
    image: busybox
    command: ['sh', '-c', 'until nslookup myservice; do echo waiting for myservice; sleep 2; done;']
44. How do you enable etcd at-rest encryption for Secrets?
Answer:
- Create /etc/kubernetes/enc/encryptionConfig.yaml:
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
    providers:
      - aescbc:
          keys:
            - name: key1
              secret: <base64-32-byte-key>
      - identity: {}
- Edit /etc/kubernetes/manifests/kube-apiserver.yaml to add the flag (and mount the /etc/kubernetes/enc directory into the static Pod):
--encryption-provider-config=/etc/kubernetes/enc/encryptionConfig.yaml
- The API server restarts automatically when its static Pod manifest changes; from then on, Secrets are stored encrypted in etcd.
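Existing Secrets are only re-encrypted when they are rewritten; the standard way to force that is:
# Rewrite every Secret so it is stored with the new encryption provider
kubectl get secrets --all-namespaces -o json | kubectl replace -f -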
45. How does Kubernetes handle container runtime interfaces (CRI)?
Answer:
The Container Runtime Interface (CRI) is a plugin interface that enables the kubelet to use a wide variety of container runtimes, without the need to recompile. The kubelet acts as a client, and the CRI shim for a given runtime acts as the server.
This allows you to use different container runtimes like containerd or CRI-O instead of the historically used Docker Engine (whose dockershim integration was removed in Kubernetes 1.24). This modularity is a key strength of Kubernetes.
46. How do you capture traffic between two Pods using netshoot and tcpdump?
Answer:
# Launch a netshoot ephemeral debug container in the Pod (it shares the Pod's network namespace;
# --target additionally shares the process namespace of the named container)
kubectl debug -it my-pod --image=nicolaka/netshoot --target=my-container
# Inside the netshoot shell:
tcpdump -i any -nn dst port 80 -w /tmp/traffic.pcap
Or run a throwaway netshoot Pod with kubectl run tmp-shell --rm -it --image=nicolaka/netshoot and capture from there.
47. What is the difference between a ClusterIP, NodePort, and LoadBalancer service type?
Answer:
These are the three main types of Kubernetes Services:
- ClusterIP: The default type. It exposes the Service on an internal IP in the cluster, reachable only from within the cluster.
- NodePort: Exposes the Service on a static port on each node's IP. You can reach it from outside the cluster via <NodeIP>:<NodePort>. A ClusterIP is created automatically.
- LoadBalancer: Exposes the Service externally using a cloud provider's load balancer. NodePort and ClusterIP services are created automatically, and the external load balancer routes to them. (A combined example follows.)
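A sketch of a Service where only the type field differs (the selector and ports are illustrative):
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  type: LoadBalancer   # or NodePort; omit for the default ClusterIP
  selector:
    app: web
  ports:
  - port: 80
    targetPort: 8080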
48. Show basic Velero commands to back up and restore a namespace
Answer:
# Install Velero (once)
velero install --provider aws --bucket velero-backups --secret-file ./credentials-velero
# Backup namespace
velero backup create nginx-backup --include-namespaces nginx-example
# Simulate disaster
kubectl delete ns nginx-example
# Restore
velero restore create --from-backup nginx-backup
For PV snapshotting, add --snapshot-volumes.
49. How do you manage cluster upgrades?
Answer:
Upgrading a Kubernetes cluster is a critical operation that needs to be done carefully. The general process is:
- Read the release notes: Carefully read the release notes for the version you are upgrading to. Pay attention to any deprecated APIs or breaking changes.
- Upgrade the control-plane nodes: Upgrade the control-plane components (kube-apiserver, kube-scheduler, kube-controller-manager) one node at a time.
- Upgrade the kubelet on worker nodes: After the control plane is upgraded, upgrade the kubelet on the worker nodes. This is typically done by draining each node, performing the upgrade, and then uncordoning it.
- Upgrade kube-proxy: Upgrade the kube-proxy DaemonSet.
- Upgrade other components: Upgrade other cluster components like CoreDNS, the CNI plugin, etc. (kubeadm automates most of this; see the sketch below.)
Using a managed Kubernetes service (like GKE, EKS, or AKS) can greatly simplify this process.
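On a kubeadm cluster, the flow looks roughly like this (a sketch; the version is a placeholder and the package commands depend on your distro; upgrade the kubeadm package on each node before running these):
# On the first control-plane node
kubeadm upgrade plan
kubeadm upgrade apply v1.xx.x
# On every other control-plane node and on workers
kubeadm upgrade node
# Then, per node: drain, upgrade the kubelet/kubectl packages, restart kubelet, uncordon
kubectl drain <node-name> --ignore-daemonsets
apt-get install -y kubelet=1.xx.x-* kubectl=1.xx.x-*
systemctl daemon-reload && systemctl restart kubelet
kubectl uncordon <node-name>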
50. How do you renew all kubeadm-managed certificates?
Answer:
# Check expirations
kubeadm certs check-expiration
# Renew all certs
kubeadm certs renew all
# Restart control-plane static pods (kube-apiserver, controller-manager, scheduler)
On HA clusters, run on each control-plane node.
51. What is a Pod Disruption Budget (PDB)?
Answer:
A Pod Disruption Budget (PDB) limits the number of Pods of a replicated application that are simultaneously down from voluntary disruptions. For example, when you drain a node, the PDB will ensure that a certain number of Pods for your application remain running.
This is crucial for maintaining the availability of your applications during planned maintenance.
YAML Example (PDB):
This PDB ensures that at least 2 Pods with the label app=nginx are available at all times.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: nginx-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: nginx
52. How do you configure the audit policy and log backend for the API server?
Answer:
- Create /etc/kubernetes/audit-policy.yaml:
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
- level: Metadata
- Mount the policy file into the kube-apiserver static Pod and add the flags:
--audit-policy-file=/etc/kubernetes/audit-policy.yaml
--audit-log-path=/var/log/kubernetes/audit/audit.log
--audit-webhook-config-file=/etc/kubernetes/audit-webhook.yaml
- Optionally configure batching and webhook.
53. How can you control the scheduling of a Pod to a specific node?
Answer:
There are several ways to control where a Pod gets scheduled:
- nodeSelector: The simplest way. You add a nodeSelector field to your Pod specification with a set of key-value pairs, and the Pod will only be scheduled on nodes that have all of those labels.
- nodeAffinity: A more expressive way to specify node selection. It allows more complex rules, such as "soft" preferences (preferredDuringSchedulingIgnoredDuringExecution) and more advanced operators.
- podAffinity and podAntiAffinity: These schedule Pods based on the labels of Pods already running on a node. For example, you can co-locate Pods of the same service on the same node (podAffinity) or spread them across different nodes (podAntiAffinity).
- Taints and Tolerations: As discussed earlier, this is another mechanism to control scheduling.
YAML Example (nodeAffinity):
apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  containers:
  - name: my-container
    image: nginx
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype
            operator: In
            values:
            - ssd
54. Demonstrate how to create a PriorityClass and a high-priority Pod that can preempt lower ones
Answer:
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000
globalDefault: false
description: "Critical workload"
---
apiVersion: v1
kind: Pod
metadata:
  name: critical-pod
spec:
  priorityClassName: high-priority
  containers:
  - name: app
    image: nginx
Higher-priority pods enter the scheduling queue first, and preempt lower ones if resources are scarce.
55. What is the role of the cloud-controller-manager?
Answer:
The cloud-controller-manager is a Kubernetes control plane component that embeds cloud-specific control logic. It allows you to link your cluster into your cloud provider's API, and separates out the components that interact with that cloud platform from components that just interact with your cluster.
It is responsible for things like:
- Node Controller: For checking the cloud provider to determine if a node has been deleted in the cloud after it stops responding.
- Route Controller: For setting up routes in the underlying cloud infrastructure.
- Service Controller: For creating, updating, and deleting cloud provider load balancers.
Well, that's all for now. I hope you managed to answer all 55 questions and that the answers helped deepen your understanding of Kubernetes.
If some questions were too difficult, or if any of these Kubernetes concepts were unfamiliar to you, feel free to share your thoughts in the comments.
And if you want to get in touch, you can find me on X (Twitter) — @immanuel_vibe. Bye!