To truly master Kubernetes, a DevOps engineer must look past the kubectl CLI and understand the internal components. When you run a command or deploy a manifest, a highly coordinated workflow occurs between the Control Plane (the brains) and the Worker Nodes (the muscle).
The Control Plane (Master Components)
- kube-apiserver: The front door to the cluster. It exposes the Kubernetes API and is the only component that talks directly to the cluster storage. Every tool (kubectl, CI/CD pipelines, internal controllers) communicates through it.
- etcd: A distributed, consistent key-value store. This is the single source of truth for your cluster. It records the state of every resource. Pro-Tip: If you lose etcd and have no backup, you lose your cluster.
- kube-scheduler: The matchmaker. It watches for newly created Pods that have no assigned node and selects the best Worker Node for them to run on based on resource availability, constraints, and affinity rules.
- kube-controller-manager: The engine behind the controllers. It packages the actual control loops (Deployment controller, Node controller, Endpoints controller) that continuously regulate the state of the cluster.
The Worker Nodes (Data Plane)
- kubelet: The captain of the node. It runs on every machine in the cluster, ensuring that containers are running in a Pod and healthy according to the instructions given by the Control Plane.
- kube-proxy: The network supervisor. It maintains network rules on nodes, allowing network communication to your Pods from inside or outside the cluster.
- Container Runtime: The software responsible for running containers (most commonly containerd or CRI-O).
2. Step-by-Step Lifecycle: What Happens When You Run kubectl apply?
To diagnose infrastructure issues effectively, a DevOps engineer must understand the exact chain of events that occurs when a manifest is deployed.
Step 1: Authentication & Authorization
You execute kubectl apply -f deployment.yaml. The request hits the kube-apiserver. The API server validates your identity (Certificate, Token, or OIDC) and checks your Role-Based Access Control (RBAC) permissions to ensure you are allowed to create deployments in that namespace.
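As a rough mental model of the authorization half of this step, the sketch below checks a request against a list of RBAC-style rules. The `Rule` type, the `is_allowed` helper, and the sample role are hypothetical and heavily simplified; real RBAC also deals with API groups, resource names, and cluster-wide roles.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Simplified stand-in for one rule in a namespaced Role."""
    namespace: str
    resources: frozenset
    verbs: frozenset


def is_allowed(rules, namespace, resource, verb):
    """Return True if any rule grants `verb` on `resource` in `namespace`.

    RBAC is purely additive: there are no deny rules, so a request is
    rejected only when no rule matches it.
    """
    return any(
        r.namespace == namespace and resource in r.resources and verb in r.verbs
        for r in rules
    )


# Hypothetical role bound to the user running `kubectl apply`.
deployer_rules = [
    Rule("staging", frozenset({"deployments"}), frozenset({"get", "create", "patch"})),
]

print(is_allowed(deployer_rules, "staging", "deployments", "create"))     # True
print(is_allowed(deployer_rules, "production", "deployments", "create"))  # False
```

The second call fails because nothing grants access to the production namespace, not because something forbids it.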
Step 2: Mutating & Validating Admission Control
Before saving anything, the API server passes the request through Admission Controllers:
- Mutating webhooks modify the request (e.g., automatically injecting sidecar containers for service meshes or default resource limits).
- Validating webhooks enforce compliance (e.g., rejecting the deployment if it runs as the root user or uses an unapproved container registry).
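The mutate-then-validate ordering can be sketched in a few lines of Python. This is only an illustration: the dictionary shape is a simplified stand-in for a real Pod spec, the registry allow-list and the default limit are made-up values, and treating a missing `runAsUser` as root is an assumption of this sketch.

```python
import copy

APPROVED_REGISTRIES = ("registry.example.com/",)  # hypothetical allow-list


def mutate_default_limits(pod):
    """Mutating step: fill in a default memory limit when none is set."""
    pod = copy.deepcopy(pod)  # never modify the caller's object
    for c in pod["containers"]:
        limits = c.setdefault("resources", {}).setdefault("limits", {})
        limits.setdefault("memory", "256Mi")  # assumed default value
    return pod


def validate(pod):
    """Validating step: return a list of rejection reasons (empty = admit)."""
    errors = []
    for c in pod["containers"]:
        if not c["image"].startswith(APPROVED_REGISTRIES):
            errors.append(f"{c['name']}: image from unapproved registry")
        if c.get("runAsUser", 0) == 0:  # sketch assumption: unset means root
            errors.append(f"{c['name']}: must not run as root")
    return errors


def admit(pod):
    """Run mutation first, then validate the mutated object."""
    mutated = mutate_default_limits(pod)
    return mutated, validate(mutated)


good = {"containers": [{"name": "web", "image": "registry.example.com/web:1.0", "runAsUser": 1000}]}
bad = {"containers": [{"name": "web", "image": "docker.io/library/nginx:latest"}]}

print(admit(good)[1])  # []
print(admit(bad)[1])   # two rejection reasons
```

Validation runs on the mutated object, which is why a validating check can rely on defaults that a mutating step injected.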
Step 3: Etcd Storage (The State Recorded)
Once validated, the API server writes the desired state of the Deployment into etcd.
Step 4: The Controller Manager Steps In
The Deployment Controller (inside the manager) notices the new Deployment object through a watch on the API server; controllers never read etcd directly. It sees that the Deployment needs a ReplicaSet, but none exists yet, so it creates a ReplicaSet object through the API server, which persists it to etcd.
In turn, the ReplicaSet Controller notices this new object, sees you requested replicas: 3, and creates 3 individual Pod objects through the API server, which stores them in etcd.
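The Deployment to ReplicaSet to Pod hand-off can be sketched as two small control loops. Here a plain dictionary called `store` stands in for the API server, and the object shapes and the `-rs` naming are invented for illustration (real ReplicaSet names carry a pod-template hash).

```python
def deployment_controller(store):
    """Create a ReplicaSet for every Deployment that does not own one yet."""
    for name, dep in store["deployments"].items():
        rs_name = f"{name}-rs"  # hypothetical naming scheme
        if rs_name not in store["replicasets"]:
            store["replicasets"][rs_name] = {"owner": name, "replicas": dep["replicas"]}


def replicaset_controller(store):
    """Create Pod objects until each ReplicaSet owns `replicas` Pods."""
    for rs_name, rs in store["replicasets"].items():
        owned = [p for p in store["pods"].values() if p["owner"] == rs_name]
        missing = rs["replicas"] - len(owned)
        index = 0
        while missing > 0:
            pod_name = f"{rs_name}-{index}"
            if pod_name not in store["pods"]:
                # New Pods start without a node; the scheduler fills it in.
                store["pods"][pod_name] = {"owner": rs_name, "nodeName": None}
                missing -= 1
            index += 1


store = {"deployments": {"web": {"replicas": 3}}, "replicasets": {}, "pods": {}}
deployment_controller(store)
replicaset_controller(store)
print(sorted(store["pods"]))  # ['web-rs-0', 'web-rs-1', 'web-rs-2']
```

Each loop only looks at objects of its own kind and writes the next kind down the chain, and running either loop again changes nothing once the counts match.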
Step 5: Scheduling the Pods
At this stage, the 3 Pods exist in etcd but have a status of Pending because their nodeName field is blank. The kube-scheduler detects these unassigned Pods through its watch on the API server, filters out nodes that cannot satisfy their CPU/Memory requests and constraints, and scores the remaining nodes to select the best hosts. It then binds each Pod to its node through the API server, which records the assigned node name in etcd.
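The filter-then-score idea behind scheduling can be sketched as follows. The node and Pod dictionaries are hypothetical, and the single "most free CPU" score is a stand-in for the many scoring plugins the real scheduler combines.

```python
def schedule(pod, nodes):
    """Pick a node name for `pod`, or None if no node fits (Pod stays Pending)."""
    # Filter: keep nodes whose free capacity covers the Pod's requests.
    feasible = [
        n for n in nodes
        if n["free_cpu_m"] >= pod["cpu_m"] and n["free_mem_mi"] >= pod["mem_mi"]
    ]
    if not feasible:
        return None
    # Score: prefer the node left with the most free CPU after placement.
    best = max(feasible, key=lambda n: n["free_cpu_m"] - pod["cpu_m"])
    return best["name"]


nodes = [
    {"name": "node-a", "free_cpu_m": 500, "free_mem_mi": 1024},
    {"name": "node-b", "free_cpu_m": 2000, "free_mem_mi": 512},
    {"name": "node-c", "free_cpu_m": 1500, "free_mem_mi": 4096},
]

print(schedule({"cpu_m": 1000, "mem_mi": 1024}, nodes))  # node-c
print(schedule({"cpu_m": 250, "mem_mi": 256}, nodes))    # node-b
print(schedule({"cpu_m": 4000, "mem_mi": 256}, nodes))   # None
```

The last call returns None, which corresponds to a Pod that stays Pending until capacity appears.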
Step 6: Node Execution via Kubelet
The kubelet running on the selected worker node watches the API server. It notices a Pod has been assigned to its specific node.
- Kubelet calls the Container Runtime Interface (CRI) to pull the container image and start the containers.
- The container runtime invokes the Container Network Interface (CNI) plugin to set up the Pod's network and assign it a unique Pod IP address.
- The kubelet mounts storage through the Container Storage Interface (CSI) driver if persistent volumes are required.
Step 7: Continuous Reconciliation
Once running, the kubelet reports back to the API server that the pod status is Running. The controller manager confirms that the Current State (3 active pods) matches the Desired State (3 requested pods). The reconciliation loop is temporarily satisfied.
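Why the loop is only "temporarily" satisfied becomes clearer with a minimal reconcile function. This is a sketch of the general pattern, not the controller manager's actual code; the action tuples and Pod names are invented.

```python
def reconcile(desired_replicas, running_pods):
    """Return the actions needed to move the observed state to the desired one."""
    diff = desired_replicas - len(running_pods)
    if diff > 0:
        # Too few Pods: ask for `diff` new ones.
        return [("create", None)] * diff
    if diff < 0:
        # Too many Pods: delete the surplus (which ones is a policy detail).
        return [("delete", name) for name in running_pods[:-diff]]
    return []  # Converged: nothing to do until the state drifts again.


print(reconcile(3, ["web-0", "web-1", "web-2"]))  # []
print(reconcile(3, ["web-0", "web-2"]))           # [('create', None)]
print(reconcile(1, ["web-0", "web-1", "web-2"]))  # [('delete', 'web-0'), ('delete', 'web-1')]
```

The second call models a Pod crashing after the cluster had converged: the same function that returned no work a moment ago now asks for a replacement.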
3. What a DevOps Engineer Must Know to Manage K8s Production
Being a DevOps engineer requires moving past basic YAML configurations and managing day-to-day production realities.
A. Resource Management & Scheduling Controls
Improperly configured pods can destabilize an entire cluster. You must always define resource boundaries.
- Requests: The amount of CPU and Memory reserved for a container. The Scheduler uses this number to place the Pod on a node with enough unreserved capacity.
- Limits: The maximum a container is allowed to consume. A container that exceeds its memory limit is killed by the Linux kernel and reported as OOMKilled (Out Of Memory); a container that hits its CPU limit is throttled rather than killed.
- Affinity & Taints: You must know how to dictate placement. Use Taints and Tolerations to repel pods from specific nodes (e.g., keeping regular apps off expensive GPU nodes). Use Node Affinity to attract pods to specific instances.
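To make the units and the taint matching concrete, here is a small sketch. It handles only a subset of quantity suffixes (`m` for CPU; `Ki`, `Mi`, `Gi` for memory), and the toleration check ignores some real-world cases such as an empty key matching every taint. The `dedicated=gpu` taint is a hypothetical example.

```python
def parse_cpu(quantity):
    """Convert a CPU quantity such as '500m' or '2' to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def parse_memory(quantity):
    """Convert a memory quantity such as '256Mi' or '1Gi' to bytes."""
    units = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}
    for suffix, factor in units.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # no suffix: plain bytes


def tolerates(tolerations, taint):
    """True if any toleration matches the taint's key, value, and effect."""
    return any(
        t["key"] == taint["key"]
        # A toleration without an effect matches every effect.
        and t.get("effect", taint["effect"]) == taint["effect"]
        # 'Exists' ignores the value; the default 'Equal' compares it.
        and (t.get("operator", "Equal") == "Exists" or t.get("value") == taint.get("value"))
        for t in tolerations
    )


gpu_taint = {"key": "dedicated", "value": "gpu", "effect": "NoSchedule"}
ml_job = [{"key": "dedicated", "operator": "Equal", "value": "gpu", "effect": "NoSchedule"}]

print(parse_cpu("500m"), parse_memory("256Mi"))  # 500 268435456
print(tolerates([], gpu_taint))                  # False: a regular app is repelled
print(tolerates(ml_job, gpu_taint))              # True
```

A Pod with no tolerations is kept off the tainted node, which is the "keep regular apps off GPU nodes" behavior described above.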
B. Networking, Ingress, and Service Architecture
Pods are ephemeral—they die and change IPs constantly. To expose applications reliably, you must master the layers of K8s networking:
[ Internet ] ---> [ Ingress Controller ] ---> [ ClusterIP Service ] ---> [ Pod Replicas ]
- ClusterIP: The default service type. It exposes the service on an internal cluster-only IP. Best for internal communication between microservices.
- NodePort: Exposes the service on a static port across each Node's IP.
- LoadBalancer: Integrates with your cloud provider (AWS, GCP, Azure) to provision an enterprise-grade external cloud load balancer automatically.
- Ingress Controllers (Nginx, Traefik, ALB): Acting as an application-layer reverse proxy, an Ingress controller routes external HTTP/HTTPS traffic to internal services based on paths or domain names (e.g., api.company.com/v1).
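Host- and path-based routing can be sketched as a lookup where the longest matching path prefix wins. The rule table and the Service names (`api-v1`, `api-default`) are hypothetical, and real Ingress controllers add features such as wildcard hosts and regex paths.

```python
def route(rules, host, path):
    """Return the backend Service for a request, or None if no rule matches."""
    candidates = [
        r for r in rules
        if r["host"] == host
        # Prefix match on whole path segments: '/v1' matches '/v1/users'
        # but not '/v10'.
        and (path == r["path"] or path.startswith(r["path"].rstrip("/") + "/"))
    ]
    if not candidates:
        return None
    # The most specific (longest) matching path takes precedence.
    return max(candidates, key=lambda r: len(r["path"]))["service"]


rules = [
    {"host": "api.company.com", "path": "/", "service": "api-default"},
    {"host": "api.company.com", "path": "/v1", "service": "api-v1"},
]

print(route(rules, "api.company.com", "/v1/users"))  # api-v1
print(route(rules, "api.company.com", "/v10"))       # api-default
print(route(rules, "shop.company.com", "/v1"))       # None
```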
C. Troubleshooting & Day-2 Operations
When production breaks, you must know exactly where to look. Memorize this troubleshooting matrix:
| If a Pod Status is... | It usually means... | Your Next Command Should Be... |
|---|---|---|
| CrashLoopBackOff | The application code is crashing immediately after startup (misconfiguration, missing env variables, database down). | `kubectl logs <pod-name> --previous` |
| ImagePullBackOff | Typo in the image name, wrong tag, or the cluster doesn't have the permission/credentials to pull from a private registry. | `kubectl describe pod <pod-name>` |
| Pending | The scheduler cannot find a node that has enough free CPU or Memory to satisfy the Pod's Requests. | `kubectl describe pod <pod-name>` (Check the events at the bottom) |
| OOMKilled | The container tried to consume more memory than its explicitly declared Limit. | `kubectl describe pod <pod-name>` |
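If you would rather script the matrix than memorize it, a lookup like the one below is enough. The helper name `next_command` is made up, and falling back to `kubectl describe pod` for unlisted statuses is a choice made for this sketch rather than something the matrix prescribes.

```python
NEXT_COMMAND = {
    "CrashLoopBackOff": "kubectl logs {pod} --previous",
    "ImagePullBackOff": "kubectl describe pod {pod}",
    "Pending": "kubectl describe pod {pod}",
    "OOMKilled": "kubectl describe pod {pod}",
}


def next_command(status, pod):
    """Return the first command to run for a Pod in the given status."""
    # Assumed fallback: describing the Pod is a reasonable first step.
    template = NEXT_COMMAND.get(status, "kubectl describe pod {pod}")
    return template.format(pod=pod)


print(next_command("CrashLoopBackOff", "web-0"))  # kubectl logs web-0 --previous
print(next_command("Pending", "web-1"))           # kubectl describe pod web-1
```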