B.R.O.L.Y

Mastering Kubernetes Step by Step Part 2 Pods and Containers Explained

Hands On: Getting Started

Let's dive right in and get practical experience before explaining the theory. First, I recommend installing Docker Desktop so we can run example clusters locally on a single node.

Once installed, make sure to check the settings and enable the Kubernetes cluster option. After that, we'll use kubectl, the command-line interface for Kubernetes (we'll cover this in detail later).

$ kubectl get pods
NAME                                READY   STATUS    RESTARTS   AGE
nginx-deployment-66b6c48dd5-8xqmk   1/1     Running   0          2d
nginx-deployment-66b6c48dd5-k9pzx   1/1     Running   0          2d
redis-master-f46ff57fd-7jq8w        1/1     Running   0          5d

You now have a Kubernetes cluster running on your laptop. Your output will differ from the example above; a fresh cluster may show no pods at all in the default namespace.


Understanding Pods

Kubernetes uses pods as its fundamental deployment unit for several important reasons: they provide an abstraction layer over different workload types, enable resource sharing between containers, add essential features such as health probes and restart policies, and enhance scheduling.

Every application runs inside a pod on Kubernetes. Here's what that means in practice:

  • When you deploy an app, you deploy it in a Pod
  • When you terminate an app, you terminate its Pod
  • When you scale an app up, you add more Pods
  • When you scale an app down, you remove Pods
  • When you update an app, you replace Pods with new ones

Pods as an Abstraction Layer

Pods abstract away the complexity of different workload types. This powerful design means you can run containers, VMs, serverless functions, and WebAssembly apps inside pods, and Kubernetes treats them all the same way.

The benefits of this abstraction:

  • Kubernetes can focus on deploying and managing Pods without needing to understand what's running inside them
  • Different types of workloads can run side-by-side on the same cluster
  • All workloads leverage the full power of the declarative Kubernetes API
  • Every workload benefits from Pod features like health checks, restart policies, and resource limits

How different workloads use Pods:

Containers and WebAssembly apps work directly with standard Pods, standard workload controllers, and standard runtimes. However, serverless functions and VMs need additional components:

  • Serverless functions run in standard Pods but require frameworks like Knative to extend the Kubernetes API with custom resources and controllers
  • Virtual machines need tools like KubeVirt to extend the API and run VMs as pod-like resources (called VirtualMachineInstances or VMIs)

[Diagram: containers, Wasm apps, serverless functions, and VMs, each wrapped in a Pod (or VMI) on the same cluster]

The diagram above shows four different workload types running on the same cluster. Each workload is wrapped in a Pod (or VMI), managed by a controller, and uses a standard runtime interface.

What Pods add to your workloads:

  • Resource sharing between containers
  • Advanced scheduling capabilities
  • Application health probes
  • Restart policies
  • Security policies
  • Termination control
  • Volume management

How Pods Enable Resource Sharing

A Pod can run one or more containers, and all containers within the same Pod share the Pod's execution environment. This shared environment includes:

  • Shared filesystem and volumes (mount namespace)
  • Shared network stack (network namespace) - all containers share the same IP address
  • Shared memory (IPC namespace)
  • Shared process tree (PID namespace)
  • Shared hostname (UTS namespace)

[Diagram: containers in a Pod sharing the Pod's network, mount, IPC, PID, and UTS namespaces]

Real-world example:

Imagine a Pod running at IP address 10.0.10.15 with two containers:

  • A main application container listening on port 8080
  • A sidecar container listening on port 5005

External clients access both containers using the Pod's single IP address (10.0.10.15), but on different ports. Inside the Pod, the containers can communicate with each other using localhost since they share the same network namespace.

Both containers can also mount the same volume to share data. For example, the sidecar might sync static content from a Git repository and store it in a shared volume, while the main container reads that content and serves it as a web page.
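
The sync-and-serve idea above can be sketched as a two-container Pod sharing an emptyDir volume. The names and images here are illustrative, not from the article:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: sync-and-serve              # hypothetical example name
spec:
  containers:
  - name: web                       # serves content from the shared volume
    image: nginx:1.25
    ports:
    - containerPort: 80
    volumeMounts:
    - name: content
      mountPath: /usr/share/nginx/html
  - name: syncer                    # writes synced content into the same volume
    image: example/git-syncer:1.0   # hypothetical sidecar image
    volumeMounts:
    - name: content
      mountPath: /out
  volumes:
  - name: content
    emptyDir: {}                    # Pod-scoped volume shared by both containers
```

Because both containers also share the Pod's network namespace, the web container could reach the syncer over localhost if the syncer exposed a port.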


Pods and Scheduling

Kubernetes guarantees that all containers in the same Pod will be scheduled to the same cluster node. However, you should only group containers in the same Pod if they truly need to share resources like memory, volumes, or networking.

Important principle: If you just want two applications to run on the same node (without resource sharing), place them in separate Pods and use scheduling features to co-locate them.

Advanced Scheduling Features

Pods provide powerful scheduling capabilities:

1. Node Selectors

The simplest way to control Pod placement. You provide a list of node labels, and the scheduler only assigns the Pod to nodes that have all those labels.

nodeSelector:
  disktype: ssd
  environment: production

2. Affinity and Anti-Affinity

More powerful than node selectors, these rules give you fine-grained control over Pod placement.

The basics:

  • Affinity rules attract Pods to specific nodes or other Pods
  • Anti-affinity rules repel Pods away from specific nodes or other Pods
  • Hard rules (requiredDuringScheduling) must be satisfied - the Pod won't schedule if they can't be met
  • Soft rules (preferredDuringScheduling) are preferences - the scheduler tries to honor them but will still schedule the Pod if it can't
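
As a sketch, hard and soft node-affinity rules look like this in a Pod spec (the label keys and values are illustrative):

```yaml
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:    # hard rule: must match
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype
            operator: In
            values: ["ssd"]
      preferredDuringSchedulingIgnoredDuringExecution:   # soft rule: best effort
      - weight: 1
        preference:
          matchExpressions:
          - key: environment
            operator: In
            values: ["production"]
```

The IgnoredDuringExecution suffix means the rule is only evaluated at scheduling time; already-running Pods aren't evicted if node labels change later.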

Node affinity example:

[Diagram: node affinity rule matching node labels]

This works like a node selector - you provide labels, and the scheduler assigns the Pod only to nodes with those labels.

Pod affinity example:

[Diagram: Pod affinity rule co-locating Pods with matching labels]

With Pod affinity, you provide labels of other Pods, and the scheduler places your Pod on nodes that are already running Pods with those labels. This is useful when you want related services to run close together for better performance.

Anti-affinity example:

[Diagram: anti-affinity rule spreading Pods across nodes]
Anti-affinity ensures your Pods spread out. For example, you might use anti-affinity to ensure database replicas run on different nodes for high availability.

3. Topology Spread Constraints

These rules help you distribute Pods evenly across failure domains like zones, regions, or nodes to improve reliability.
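
A minimal sketch (the app label is illustrative):

```yaml
spec:
  topologySpreadConstraints:
  - maxSkew: 1                                # max allowed difference in matching Pods between domains
    topologyKey: topology.kubernetes.io/zone  # spread across zones
    whenUnsatisfiable: DoNotSchedule          # hard constraint; ScheduleAnyway makes it a preference
    labelSelector:
      matchLabels:
        app: web                              # which Pods to count when balancing
```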

4. Resource Requests and Limits

You can specify how much CPU and memory your Pod needs (requests) and the maximum it can use (limits). The scheduler uses this information to place Pods on nodes with sufficient resources.

Deploying Pods

Deploying a Pod involves a carefully orchestrated series of steps across multiple Kubernetes components:

  1. Define the Pod in a YAML manifest file
  2. Post the manifest to the API server
  3. The request is authenticated and authorized
  4. The Pod spec is validated
  5. The scheduler filters nodes based on nodeSelectors, affinity and anti-affinity rules, topology spread constraints, resource requirements and limits, and more
  6. The Pod is assigned to a healthy node meeting all requirements
  7. The kubelet on the node watches the API server and notices the Pod assignment
  8. The kubelet downloads the Pod spec and asks the local container runtime to start it
  9. The kubelet monitors the Pod status and reports status changes back to the API server

If the scheduler can't find a suitable node, it marks the Pod as pending.

Deploying a Pod is an atomic operation. This means a Pod only starts servicing requests when all its containers are up and running.

Pod Deployment Flow

[Diagram: Pod deployment flow from kubectl through the API server, scheduler, and kubelet]

This diagram shows the complete journey from running a kubectl command to having a Pod running on a node. Each component plays a specific role in ensuring the Pod is deployed correctly and securely.


Pod Lifecycle

Pods are designed to be mortal and immutable.

Mortal means a Pod is created, runs, and eventually terminates. Whether it completes its task or fails, it gets deleted and cannot be restarted.

Immutable means you cannot modify them after they're deployed. This can be a huge mindset change if you're from a traditional background where you regularly patched live servers and logged on to them to make fixes and configuration changes. If you need to change a Pod, you create a new one with the changes, delete the old one, and replace it with the new one. If a Pod needs to store data, you should attach a volume and store the data in the volume so it's not lost when the Pod is deleted.

A Typical Pod Lifecycle

You define a Pod in a declarative YAML object that you post to the API server. It goes into the pending phase while the scheduler finds a node to run it on. Assuming it finds a node, the Pod gets scheduled, and the local kubelet instructs the container runtime to start its containers. Once all of its containers are running, the Pod enters the running phase. It remains in the running phase indefinitely if it's a long-lived Pod, such as a web server.

If it's a short-lived Pod, such as a batch job, it enters the succeeded state as soon as all containers complete their tasks.

[Diagram: Pod lifecycle from Pending through Running to Succeeded]

A note on running VMs on Kubernetes: VMs are designed as mutable immortal objects. For example, you can restart them, change their configurations, and even migrate them. This is very different from the design goals of Pods, which is why KubeVirt wraps VMs in a modified Pod-like resource called a VirtualMachineInstance (VMI) and manages them using custom workload controllers.


Restart Policies

Earlier, we said Pods augment apps with restart policies. However, these policies apply to individual containers, not the actual Pod.

Let's consider some scenarios:

You use a Deployment controller to schedule a Pod to a node, and the node fails. When this happens, the Deployment controller notices the failed node, deletes the Pod, and replaces it with a new one on a surviving node. Even though the new Pod is based on the same Pod spec, it has a new UID, a new IP address, and no state from the previous Pod.

The same thing happens when nodes evict Pods during node maintenance or due to resource constraints — the evicted Pod is deleted and replaced with a new one on another node.

This pattern even applies during scaling operations, updates, and rollbacks:

  • Scaling down deletes Pods
  • Scaling up creates new Pods
  • Updates replace old Pods with new ones

The key takeaway: Anytime we say we're "updating" or "restarting" Pods, we really mean replacing them with new ones.

Although Kubernetes can't restart Pods, it can definitely restart containers. This is always done by the local kubelet and governed by the spec.restartPolicy field, which can be set to:

  • Always - Always attempt to restart a failed container
  • Never - Never attempt to restart a container
  • OnFailure - Only restart if the container fails (not if it completes successfully)

The policy is Pod-wide, meaning it applies to all containers in the Pod except for init containers (more on those later).

Choosing the Right Restart Policy

The restart policy you choose depends on the nature of your application:

Long-lived containers host apps such as web servers, databases, and message queues that run indefinitely. If they fail, you want to restart them, so you'll typically use the Always restart policy.

apiVersion: v1
kind: Pod
metadata:
  name: web-server
spec:
  restartPolicy: Always
  containers:
  - name: nginx
    image: nginx:1.21

Short-lived containers typically run batch-style workloads that execute a task through to completion. Most of the time, you're happy when they complete successfully, and you only want to restart them if they fail. As such, you'll probably use the OnFailure restart policy. If you don't care whether they fail or succeed, use the Never policy.

apiVersion: v1
kind: Pod
metadata:
  name: batch-job
spec:
  restartPolicy: OnFailure
  containers:
  - name: data-processor
    image: batch-processor:1.0

Remember: Kubernetes never restarts Pods — when they fail, get scaled, or get updated, Kubernetes always deletes old Pods and creates new ones. However, Kubernetes can restart individual containers within a Pod on the same node.


Static Pods vs Controllers

There are two ways to deploy Pods:

  1. Directly via a Pod manifest (rare)
  2. Indirectly via a workload resource and controller (most common)

Static Pods

Deploying directly from a Pod manifest creates a static Pod that cannot self-heal, scale, or perform rolling updates. This is because static Pods are only managed by the kubelet on the node they're running on, and kubelets are limited to restarting containers on the same node. If the node fails, the kubelet fails as well and cannot do anything to help the Pod.
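
For context, the kubelet creates static Pods from manifest files it watches on disk. The watch directory comes from the kubelet's configuration; the path below is the common default on kubeadm clusters, so treat it as an assumption for your setup:

```yaml
# KubeletConfiguration fragment (sketch)
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
staticPodPath: /etc/kubernetes/manifests   # any Pod manifest dropped here becomes a static Pod
```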

[Diagram: a static Pod managed only by its node's kubelet]

Controller-Managed Pods

On the flip side, Pods deployed via workload resources (like Deployments, StatefulSets, or DaemonSets) get all the benefits of being managed by a highly available controller that can:

  • Restart Pods on other nodes if a node fails
  • Scale Pods when demand changes
  • Perform advanced operations such as rolling updates and versioned rollbacks

The local kubelet can still attempt to restart failed containers, but if the node fails or gets evicted, the controller can restart the Pod on a different node.

[Diagram: controller-managed Pods recreated on surviving nodes]

The controller runs in the Kubernetes control plane and constantly watches the state of your Pods. If reality doesn't match your desired state, it takes action to fix it.


The Pod Network

Every Kubernetes cluster runs a pod network and automatically connects all Pods to it. It's usually implemented as a flat overlay network that spans every cluster node and lets every Pod talk directly to every other Pod, even when the remote Pod is on a different node.

The pod network is implemented by a third-party plugin that interfaces with Kubernetes and configures the network via the Container Network Interface (CNI).

You choose a network plugin at cluster build time, and it configures the Pod network for the entire cluster. Many plugins exist, each with its own pros and cons. However, at the time of writing, Cilium is the most popular and implements advanced features such as security policies and observability.

How the Pod Network Works

The Pod network creates a unified network space where every Pod gets its own unique IP address, and all Pods can communicate directly without NAT (Network Address Translation).

[Diagram: flat pod network connecting Pods across all nodes]

Key characteristics:

  • Each Pod gets a unique IP from the pod network CIDR range (e.g., 10.244.0.0/16)
  • Pods can communicate with any other Pod using its IP address
  • The pod network spans all nodes in the cluster
  • Node IPs (192.168.x.x) are separate from Pod IPs (10.244.x.x)

Network Configuration Example

When a Pod is created, the CNI plugin performs these steps:

  1. Allocates an IP address from the pod network range
  2. Creates a virtual ethernet pair (veth pair) - one end in the Pod's network namespace, one end on the node
  3. Configures routing so the Pod can reach other Pods and external networks
  4. Sets up network policies if defined (firewall rules for Pod-to-Pod traffic)
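
Step 4 refers to Kubernetes NetworkPolicy objects. As a hedged sketch, a policy allowing only frontend Pods to reach backend Pods on port 8080 might look like this (the label names are illustrative, and enforcement requires a CNI plugin that supports network policies):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend   # hypothetical policy name
spec:
  podSelector:
    matchLabels:
      app: backend                  # the Pods this policy protects
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend             # only Pods with this label may connect
    ports:
    - protocol: TCP
      port: 8080
```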

[Diagram: CNI plugin connecting a Pod to the node via a veth pair]

Real-world example:

Imagine you have a three-tier application:

  • Frontend Pods (10.244.1.x) on Node 1
  • Backend API Pods (10.244.2.x) on Node 2
  • Database Pods (10.244.3.x) on Node 3

The frontend Pods can directly call the backend API using its Pod IP (10.244.2.8:8080) even though they're on different physical nodes. The CNI plugin handles all the routing transparently using overlay networking (typically VXLAN or similar encapsulation).

[Diagram: three nodes running five Pods, all attached to the pod network]

The diagram above shows three nodes running five Pods. All five Pods are connected to the pod network and can communicate with each other. You can also see the Pod network spanning all three nodes.

Important distinction: The network is only for Pods, not nodes. As shown in the diagram, you can connect nodes to multiple different networks (management network, storage network, etc.), but the Pod network spans them all, creating a unified communication layer for your applications.


Multi-container Pods

Multi-container Pods are a powerful pattern and very popular in real-world deployments.

According to microservices design patterns, every container should have a single clearly defined responsibility. For example, an application that syncs content from a repository and serves it as a web page has two distinct responsibilities:

  1. Sync the content
  2. Serve the web page

You should design this app with two microservices and give each one its own container — one container responsible for syncing the content and the other responsible for serving it. We call this separation of concerns, or the single responsibility principle, and it keeps containers small and simple, encourages reuse, and makes troubleshooting easier.

Most of the time, you'll put application containers in their own Pods and they'll communicate over the network. However, sometimes putting them in the same Pod is beneficial. Sticking with the sync and serve example, putting the containers in the same Pod allows the sync container to pull content from a remote system and store it in a shared volume where the web container can read it and serve it.

[Diagram: sync and serve containers sharing a volume inside one Pod]

Kubernetes has two main patterns for multi-container Pods: init containers and sidecar containers.


Multi-container Pods: Init Containers

Init containers are a special type of container defined in the Kubernetes API. You run them in the same Pod as application containers, but Kubernetes guarantees they'll start and complete before the main app container starts. It also guarantees they'll only run once.

The purpose of init containers is to prepare and initialize the environment so it's ready for application containers.

Real-world Examples

Example 1: Waiting for a Service

You have an application that should only start when a remote API is accepting connections. Instead of complicating the main application with the logic to check the remote API, you run that logic in an init container in the same Pod. When you deploy the Pod, the init container comes up first and sends requests to the remote API waiting for it to respond. While this is happening, the main app container cannot start. However, as soon as the remote API accepts a request, the init container completes, and the main app container starts.

Example 2: Content Preparation

You have another application that needs a one-time clone of a remote repository before starting. Again, instead of bloating and complicating the main application with the code to clone and prepare the content (knowledge of the remote server address, certificates, auth, file sync protocol, checksum verifications, etc.), you implement that in an init container that is guaranteed to complete the task before the main application container starts.
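
Example 1 is often implemented with a small utility image that polls until the remote service responds. A sketch (the service name and images are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-init
spec:
  initContainers:
  - name: wait-for-api       # must complete before the app container starts
    image: busybox:1.36
    command: ['sh', '-c', 'until nslookup remote-api; do echo waiting; sleep 2; done']
  containers:
  - name: app
    image: example/app:1.0   # hypothetical application image
```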

Init Container Lifecycle

[Diagram: init containers completing before the app container starts]

A drawback of init containers is that they're limited to running tasks before the main app container starts. For something that runs alongside the main app container, you need a sidecar container.


Multi-container Pods: Sidecars

Sidecar containers are regular containers that run at the same time as application containers for the entire lifecycle of the Pod.

Unlike init containers, sidecars are not a special resource in the Kubernetes API — we're currently using regular containers to implement the sidecar pattern. Work is in progress to formalize the sidecar pattern in the API, but at the time of writing, it's still an early alpha feature.

The job of a sidecar container is to add functionality to an app without having to implement it in the actual app. Common examples include sidecars that:

  • Scrape and ship logs
  • Sync remote content
  • Broker connections
  • Transform or munge data
  • Provide encryption and decryption

They're also heavily used by service meshes where the sidecar intercepts network traffic and provides traffic encryption and telemetry.

The figure below shows a multi-container Pod with a main app container and a service mesh sidecar. The sidecar intercepts all network traffic and provides encryption and decryption. It also sends telemetry data to the service mesh control plane.

[Diagram: app container with a service mesh sidecar intercepting traffic]


Experimentation

Basic Pod Manifest

A typical Pod manifest file looks like this:

apiVersion: v1
kind: Pod
metadata:
  name: hello-pod
  labels:
    zone: prod
    version: v1
spec:
  containers:
  - name: hello-ctr
    image: nigelpoulton/k8sbook:1.0
    ports:
    - containerPort: 8080
    resources:
      limits:
        memory: 128Mi
        cpu: 0.5

Let's break down each section:

  • kind: Tells Kubernetes what type of object you're defining. This one defines a Pod, but if you were defining a Deployment, the kind field would say Deployment.

  • apiVersion: Tells Kubernetes what version of the API to use when creating the object. This manifest uses the v1 version of the API.

  • metadata: Names the Pod hello-pod and gives it two labels. You'll use labels in future chapters to connect the Pod to a Service for networking.

  • spec: Where most of the action happens. This example defines a single-container Pod with an application container called hello-ctr. The container is based on the nigelpoulton/k8sbook:1.0 image, listens on port 8080, and tells the scheduler it needs a maximum of 128Mi of memory and half a CPU.

To make it a multi-container Pod, you simply add more containers to the spec.containers list.


Deploying Pods from a Manifest File

Run the following kubectl apply command to deploy the Pod. The command sends the pod.yml file to the API server defined in the current context of your kubeconfig file. It also attaches credentials from your kubeconfig file.

$ kubectl apply -f pod.yml
pod/hello-pod created

Although the output says the Pod is created, it might still be pulling the image and starting the container.

Run a kubectl get pods to check the status:

$ kubectl get pods
NAME        READY   STATUS              RESTARTS   AGE
hello-pod   0/1     ContainerCreating   0          9s

The Pod in the example isn't fully created yet — the READY column shows zero containers ready, and the STATUS column shows why.

Note: Kubernetes automatically pulls (downloads) images from Docker Hub. To use another registry, just add the registry's URL before the image name in the YAML file.
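
For example, to pull the same image from a hypothetical private registry instead of Docker Hub, you'd prefix the image name with the registry address:

```yaml
containers:
- name: hello-ctr
  image: registry.example.com/nigelpoulton/k8sbook:1.0   # hypothetical registry URL
```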

Once the READY column shows 1/1 and the STATUS column shows Running, your Pod will be running on a healthy cluster node and monitored by the node's kubelet.

You'll see how to connect to the app and test it in future chapters.


Inspecting Pods

You've already run a kubectl get pods command and seen that it returns a single line of basic info. However, the following flags provide much more information:

  • -o wide: Gives a few more columns but is still a single line of output
  • -o yaml: Gets you everything Kubernetes knows about the object

The following example shows the output of kubectl get pods with the -o yaml flag. The output is truncated, but notice how it's divided into two main parts:

  • spec: Shows the desired state of the object
  • status: Shows the observed state

$ kubectl get pods hello-pod -o yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"Pod"...}
  name: hello-pod
  namespace: default
spec:                           # Desired state
  containers:
  - image: nigelpoulton/k8sbook:1.0
    imagePullPolicy: IfNotPresent
    name: hello-ctr
    ports:
    - containerPort: 8080
      protocol: TCP
    resources:
      limits:
        cpu: 500m
        memory: 128Mi
  restartPolicy: Always
status:                         # Observed state
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2024-01-03T18:21:51Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2024-01-03T18:22:05Z"
    status: "True"
    type: Ready
  containerStatuses:
  - containerID: containerd://abc123...
    image: nigelpoulton/k8sbook:1.0
    name: hello-ctr
    ready: true
    state:
      running:
        startedAt: "2024-01-03T18:22:04Z"

kubectl describe

Another great command is kubectl describe. This gives you a nicely formatted overview of an object, including lifecycle events.

$ kubectl describe pod hello-pod
Name:         hello-pod
Namespace:    default
Labels:       version=v1
              zone=prod
Status:       Running
IP:           10.1.0.103
Containers:
  hello-ctr:
    Container ID:   containerd://ec0c3e...
    Image:          nigelpoulton/k8sbook:1.0
    Port:           8080/TCP
    State:          Running
      Started:      Wed, 03 Jan 2024 18:22:04 +0000
    Ready:          True
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Events:
  Type    Reason     Age    From               Message
  ----    ------     ----   ----               -------
  Normal  Scheduled  5m30s  default-scheduler  Successfully assigned default/hello-pod to node-1
  Normal  Pulling    5m30s  kubelet            Pulling image "nigelpoulton/k8sbook:1.0"
  Normal  Pulled     5m8s   kubelet            Successfully pulled image "nigelpoulton/k8sbook:1.0"
  Normal  Created    5m8s   kubelet            Created container hello-ctr
  Normal  Started    5m8s   kubelet            Started container hello-ctr

kubectl logs

You can use the kubectl logs command to pull the logs from any container in a Pod. The basic format of the command is kubectl logs <pod>.

If you run the command against a multi-container Pod, you automatically get the logs from the first container in the Pod. However, you can override this by using the --container flag and specifying the name of the container you want the logs from. If you're unsure of the names of containers or the order they appear in a multi-container Pod, just run a kubectl describe pod <pod> command. You can get the same info from the Pod's YAML file.

The following YAML shows a multi-container Pod with two containers. The first container is called app, and the second is called syncer. Running kubectl logs against this Pod without specifying the --container flag will get you the logs from the app container.

apiVersion: v1
kind: Pod
metadata:
  name: logtest
spec:
  containers:
  - name: app                    # First container (default)
    image: nginx
    ports:
    - containerPort: 8080
  - name: syncer                 # Second container
    image: image-name
    volumeMounts:
    - name: html
      mountPath: /tmp/git
  volumes:
  - name: html
    emptyDir: {}

You'd run the following command if you wanted the logs from the syncer container. Don't run this command, as you haven't deployed this Pod yet.

$ kubectl logs logtest --container syncer

kubectl exec

The kubectl exec command is a great way to execute commands inside running containers.

You can use kubectl exec in two ways:

  1. Remote command execution
  2. Exec session

Remote command execution lets you send commands to a container from your local shell. The container executes the command and returns the output to your shell.

An exec session connects your local shell to the container's shell and is the same as being logged on to the container.

Let's look at both, starting with remote command execution.

Run the following command from your local shell. It's asking the first container in the hello-pod Pod to run a ps command.

$ kubectl exec hello-pod -- ps
PID   USER     TIME  COMMAND
  1   root     0:00  node ./app.js
 17   root     0:00  ps

The container executed the ps command and displayed the result in your local terminal.

The format of the command is kubectl exec <pod> -- <command>, and you can execute any command installed in the container. By default, commands execute in the first container in a Pod, but you can override this with the --container flag.

Try running the following command:

$ kubectl exec hello-pod -- curl localhost:8080
OCI runtime exec failed: exec failed: unable to start container process:
exec: "curl": executable file not found in $PATH

This one failed because the curl command isn't installed in the container.

Let's use kubectl exec to get an interactive exec session to the same container. This works by connecting your terminal to the container's terminal, and it feels like you're logged on to the container.

Run the following command to create an exec session to the first container in the hello-pod Pod. Your shell prompt will change to indicate you're connected to the container's shell.

$ kubectl exec -it hello-pod -- sh
/#

The -it flag tells kubectl exec to make the session interactive by connecting your shell's STDIN and STDOUT streams to the STDIN and STDOUT of the first container in the Pod. The sh command starts a new shell process in the session, and your prompt will change to indicate you're now inside the container.

Run the following commands from within the exec session to install the curl binary and then execute a curl command:

/# apk add curl
fetch https://dl-cdn.alpinelinux.org/alpine/v3.18/main/x86_64/APKINDEX.tar.gz
fetch https://dl-cdn.alpinelinux.org/alpine/v3.18/community/x86_64/APKINDEX.tar.gz
(1/5) Installing ca-certificates (20230506-r0)
(2/5) Installing brotli-libs (1.0.9-r14)
(3/5) Installing libunistring (1.1-r1)
(4/5) Installing libidn2 (2.3.4-r1)
(5/5) Installing curl (8.1.2-r0)
OK: 12 MiB in 20 packages

/# curl localhost:8080
<html>
  <head>
    <title>Hello from Kubernetes!</title>
  </head>
  <body>
    <h1>Hello from Kubernetes Storage!</h1>
  </body>
</html>

Making changes like this to live Pods is an anti-pattern, as Pods are designed as immutable objects. However, it's OK for demonstration purposes like this.


Pod Hostnames

Pods get their names from their YAML file's metadata.name field, and Kubernetes uses this as the hostname for every container in the Pod.

If you're following along, you'll have a single Pod deployed called hello-pod. You deployed it from the following YAML file that sets the Pod name as hello-pod:

apiVersion: v1
kind: Pod
metadata:
  name: hello-pod    # Pod hostname - inherited by all containers
  labels:
    zone: prod
    version: v1

Run the following command from inside your existing exec session to check the container's hostname. The command is case-sensitive.

/# env | grep HOSTNAME
HOSTNAME=hello-pod

As you can see, the container's hostname matches the name of the Pod. If it was a multi-container Pod, all containers would have the same hostname.

Because of this, you should ensure Pod names are valid DNS names (lowercase a-z, 0-9, hyphens, and periods).

Type exit to quit your exec session and return to your local terminal.


Check Pod Immutability

Pods are designed as immutable objects, meaning you shouldn't change them after deployment.

Immutability applies at two levels:

  • Object immutability (the Pod)
  • App immutability (containers)

Kubernetes handles object immutability by preventing changes to a running Pod's configuration. However, Kubernetes can't always prevent you from changing the app and filesystem in containers. You're responsible for ensuring containers and their apps are stateless and immutable.

The following example uses kubectl edit to edit a live Pod object. Run kubectl edit pod hello-pod and try changing any of these attributes:

  • Pod name
  • Container name
  • Container port
  • Resource requests and limits

You'll find that Kubernetes prevents most changes to running Pods, enforcing immutability at the object level.


Resource Requests and Resource Limits

Kubernetes lets you specify resource requests and resource limits for each container in a Pod.

  • Requests are minimum values
  • Limits are maximum values

Consider the following snippet from a Pod YAML:

resources:
  requests:
    cpu: 0.5
    memory: 256Mi
  limits:
    cpu: 1.0
    memory: 512Mi

This container needs a minimum of 256Mi of memory and half a CPU. The scheduler reads this and assigns the Pod to a node with enough resources. If it can't find a suitable node, it marks the Pod as Pending, and the cluster autoscaler (if your cluster has one) will attempt to provision a new node.

Assuming the scheduler finds a suitable node, it assigns the Pod to the node, and the kubelet downloads the Pod spec and asks the local runtime to start it. As part of the process, the kubelet reserves the requested CPU and memory, guaranteeing the resources will be there when needed. It also sets a cap on resource usage based on each container's resource limits. In this example, it sets a cap of one CPU and 512Mi of memory. Most runtimes will also enforce resource limits, but how each runtime implements this can vary.

While a container executes, it is guaranteed its minimum requirements (requests). However, it's allowed to use more if the node has additional available resources, but it's never allowed to use more than what you specify in its limits.

For multi-container Pods, the scheduler combines the requests for all containers and finds a node with enough resources to satisfy the full Pod.

Note: If you've been following the examples closely, you'll have noticed that the pod.yml you used to deploy the hello-pod only specified resource limits — it didn't specify resource requests. However, some command outputs have shown both limits and requests. This is because Kubernetes automatically sets requests to match limits if you only specify limits.
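To make the scheduling maths concrete, here's a small sketch in Python (illustrative only, not real scheduler code) that parses memory quantities, defaults missing requests to limits as described in the note above, and sums requests across a Pod's containers. For brevity it handles plain CPU values like "0.5" but not millicore notation like "500m":

```python
def parse_memory(qty: str) -> int:
    """Convert a Kubernetes memory quantity like '256Mi' to bytes."""
    units = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}
    for suffix, factor in units.items():
        if qty.endswith(suffix):
            return int(qty[:-2]) * factor
    return int(qty)  # plain bytes

def effective_requests(containers):
    """Sum per-container requests, defaulting requests to limits when absent."""
    total_cpu, total_mem = 0.0, 0
    for c in containers:
        resources = c.get("resources", {})
        # Kubernetes defaults requests to limits if only limits are set.
        requests = resources.get("requests") or resources.get("limits", {})
        total_cpu += float(requests.get("cpu", 0))
        total_mem += parse_memory(requests.get("memory", "0"))
    return total_cpu, total_mem

pod_containers = [
    {"resources": {"requests": {"cpu": "0.5", "memory": "256Mi"},
                   "limits": {"cpu": "1.0", "memory": "512Mi"}}},
    {"resources": {"limits": {"cpu": "0.25", "memory": "128Mi"}}},  # no requests
]
print(effective_requests(pod_containers))  # (0.75, 402653184) -> 0.75 CPU, 384Mi
```

The scheduler then looks for a node whose unallocated capacity covers that combined total.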


Multi-container Pod Example – Init Container

apiVersion: v1
kind: Pod
metadata:
  name: initpod
  labels:
    app: initializer
spec:
  initContainers:
  - name: init-ctr
    image: busybox:1.28.4
    command: ['sh', '-c', 'until nslookup k8sbook; do echo waiting for k8sbook service; sleep 1; done; echo Service found!']
  containers:
  - name: web-ctr
    image: nigelpoulton/k8sbook:1.0
    ports:
    - containerPort: 8080

Defining a container under the spec.initContainers block makes it an init container that Kubernetes guarantees will run and complete before regular containers.

Regular app containers are defined under the spec.containers block and will not start until all init containers successfully complete.

This example has a single init container called init-ctr and a single app container called web-ctr. The init container runs a loop looking for a Kubernetes Service called k8sbook. As soon as you create the Service, the init container will get a response and exit. This allows the main container to start. You'll learn about Services in a future chapter.
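The init container's wait-for-service loop is essentially a DNS retry: keep resolving the Service name until it succeeds. In Python the same pattern might look like this (a sketch of the idea, not the actual busybox script from the YAML above):

```python
import socket
import time

def wait_for_service(hostname: str, timeout: float = 30.0,
                     interval: float = 1.0) -> bool:
    """Poll DNS until hostname resolves (Service created) or timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            socket.gethostbyname(hostname)  # same check nslookup performs
            return True
        except socket.gaierror:
            print(f"waiting for {hostname} service")
            time.sleep(interval)
    return False

# Inside the cluster, 'k8sbook' only resolves after the Service exists.
print(wait_for_service("localhost", timeout=5))  # True wherever localhost resolves
```

Kubernetes creates a DNS record for every Service, which is why a simple lookup is enough to detect that the Service exists.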

Deploy the multi-container Pod with the following command and then run a kubectl get pods with the --watch flag to see if it comes up.

$ kubectl apply -f initpod.yml
pod/initpod created

$ kubectl get pods --watch
NAME      READY   STATUS     RESTARTS   AGE
initpod   0/1     Init:0/1   0          6s

The Init:0/1 status tells you that the init container is still running, meaning the main container hasn't started yet. If you run a kubectl describe command, you'll see the overall Pod status is Pending.

$ kubectl describe pod initpod
Name:         initpod
Namespace:    default
Status:       Pending
Init Containers:
  init-ctr:
    State:          Running
      Started:      Thu, 04 Jan 2024 10:15:32 +0000
    Ready:          False
Containers:
  web-ctr:
    State:          Waiting
      Reason:       PodInitializing
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  45s   default-scheduler  Successfully assigned default/initpod to node-1
  Normal  Pulling    44s   kubelet            Pulling image "busybox:1.28.4"
  Normal  Pulled     42s   kubelet            Successfully pulled image "busybox:1.28.4"
  Normal  Created    42s   kubelet            Created container init-ctr
  Normal  Started    42s   kubelet            Started container init-ctr

The Pod will remain in this phase until you create a Service called k8sbook. Run the following commands to create the Service and re-check the Pod status.

$ kubectl apply -f initsvc.yml
service/k8sbook created

$ kubectl get pods --watch
NAME      READY   STATUS              RESTARTS   AGE
initpod   0/1     Init:0/1            0          15s
initpod   0/1     PodInitializing     0          3m39s
initpod   1/1     Running             0          3m57s

The init container completes as soon as the Service appears, and the main application container starts. Give it a few seconds to fully start.

If you run another kubectl describe against the initpod Pod, you'll see the init container is in the terminated state because it completed successfully (exit code 0).
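The article doesn't list the contents of initsvc.yml. A minimal Service that would satisfy the init container's lookup might look like the following sketch; the selector and port are assumptions based on the initpod spec above:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: k8sbook          # the name the init container resolves
spec:
  selector:
    app: initializer     # assumed: matches the initpod label above
  ports:
  - port: 8080
    targetPort: 8080     # assumed: the app container listens on 8080
```

The important part is the metadata.name field: the init container exits as soon as DNS resolution for k8sbook succeeds, regardless of whether the Service actually routes traffic anywhere yet.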


Multi-container Pod Example – Sidecar Container

Sidecar containers run alongside the main application container for the entire lifecycle of the Pod. We currently define them as regular containers under the spec.containers section of the Pod YAML (Kubernetes 1.28 added native sidecar support via init containers with restartPolicy: Always, but the approach shown here still works), and their job is to augment the main application container or provide a secondary support service.

The following YAML file defines a multi-container Pod with both containers mounting the same shared volume. It's conventional to list the main app container as the first container and sidecars after it.

apiVersion: v1
kind: Pod
metadata:
  name: sidecar-pod
  labels:
    app: webserver
spec:
  containers:
  - name: ctr-web                              # Main application container
    image: nginx:1.21
    ports:
    - containerPort: 80
    volumeMounts:
    - name: html
      mountPath: /usr/share/nginx/html
  - name: ctr-sync                             # Sidecar container
    image: image-name
    volumeMounts:
    - name: html
      mountPath: /tmp/git
    env:
    - name: GIT_SYNC_REPO
      value: "...."
    - name: GIT_SYNC_BRANCH
      value: "main"
    - name: GIT_SYNC_DEST
      value: "html"
    - name: GIT_SYNC_WAIT
      value: "60"
  volumes:
  - name: html
    emptyDir: {}

The main app container is called ctr-web. It's based on an NGINX image and serves a static web page loaded from the shared html volume.

The second container is called ctr-sync and is the sidecar. It watches a GitHub repo and syncs changes into the same shared html volume.

When the contents of the GitHub repo change, the sidecar copies the updates to the shared volume, where the app container notices and serves an updated version of the web page.
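Stripped of the git details, the sidecar's job reduces to "copy fresh content into the shared volume every GIT_SYNC_WAIT seconds". Here's an illustrative Python sketch of that loop; the real git-sync image does a proper git clone/pull rather than a plain copy:

```python
import os
import shutil
import time

def sync_once(src: str, dest: str) -> None:
    """Copy checked-out content into the shared volume the web server reads."""
    for name in os.listdir(src):
        s, d = os.path.join(src, name), os.path.join(dest, name)
        if os.path.isdir(s):
            shutil.copytree(s, d, dirs_exist_ok=True)
        else:
            shutil.copy2(s, d)

def sync_loop(src: str, dest: str, wait_seconds: int) -> None:
    """Repeat forever, like the sidecar does for the Pod's lifetime."""
    while True:
        # In the real sidecar, a 'git pull' refreshes src before this step.
        sync_once(src, dest)
        time.sleep(wait_seconds)
```

Because both containers mount the same emptyDir volume, anything the sidecar writes to its mount path is immediately visible at NGINX's mount path in the other container.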
