Kubernetes is an open-source container orchestration framework that depends on a container runtime such as Docker or containerd.
Most resources exist in a namespace; a few, such as nodes and the namespaces themselves, are cluster-scoped. By default, a resource is created in the namespace named
default. It is also possible to create new namespaces. If you have used
docker swarm before, you can think of a namespace as a

    apiVersion: v1
    kind: Namespace
    metadata:
      name: development
      labels:
        name: development

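
As a minimal sketch of how a resource ends up in this namespace, the manifest below sets metadata.namespace on a ConfigMap. The ConfigMap name and its data are hypothetical and only serve as an illustration. Without metadata.namespace, the object would be created in the default namespace (or whichever namespace the client is currently configured to use).

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-settings        # hypothetical name, for illustration only
  namespace: development    # the namespace defined in the manifest above
data:
  greeting: hello           # hypothetical key and value
```
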
A pod is a set of one or more containers sharing the same
In Kubernetes, containers are not used directly. Instead, pods are created, which wrap and control the containers. The benefit is that the same pod can stay alive while its container(s) restart or change.
A pod is created from a template such as the one below. Usually, there is a single container per pod, but occasionally tightly coupled containers are put in the same pod.

    template:
      spec:
        containers:
        - name: hello
          image: hello-world
        restartPolicy: OnFailure

Usually, pod templates are used by so-called workload controllers. They are responsible for managing the life cycle of a workload such as a deployment. Below is a workload definition. The corresponding controller makes sure that the state of the system matches the definition.

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: hello-world
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: hello-world
          restartPolicy: OnFailure

Depending on the type of application and pod configuration, different workloads can be used. Below are the default workloads. It is also possible to use custom workloads with specialized behavior.
- Deployment and ReplicaSet (replacing the legacy resource ReplicationController). Deployment is a good fit for managing a stateless application workload on your cluster, where any Pod in the Deployment is interchangeable and can be replaced if needed.
- StatefulSet lets you run one or more related Pods that do track state somehow. For example, if your workload records data persistently, you can run a StatefulSet that matches each Pod with a PersistentVolume. Your code, running in the Pods for that StatefulSet, can replicate data to other Pods in the same StatefulSet to improve overall resilience.
- DaemonSet defines Pods that provide node-local facilities. These might be fundamental to the operation of your cluster, such as a networking helper tool, or be part of an add-on. Every time you add a node to your cluster that matches the specification in a DaemonSet, the control plane schedules a Pod for that DaemonSet onto the new node.
- Job and CronJob define tasks that run to completion and then stop. Jobs represent one-off tasks, whereas CronJobs recur according to a schedule.
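
To illustrate two of the workload types listed above, here is a minimal sketch of a Deployment and a CronJob. The names, the replica count, the nginx image and the schedule are assumptions chosen for illustration; the CronJob simply reuses the hello-world container from the Job example.

```yaml
# Stateless workload: three interchangeable replicas of the same pod.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                    # hypothetical name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web                 # must match the pod template labels below
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: nginx:1.25      # assumed image; any stateless server would do
        ports:
        - containerPort: 80
---
# Recurring task: starts a new Job from jobTemplate every day at 02:00.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: hello-world-nightly    # hypothetical name
spec:
  schedule: "0 2 * * *"        # standard cron syntax
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: hello-world
          restartPolicy: OnFailure
```
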
Services are an abstraction on top of various backends such as sets of pods. They allow the backend to change while the service itself stays the same.
Services can be used both for exposing certain workloads outside the cluster, for example to the public web (NodePort), and for communication between different sets of pods inside the cluster (ClusterIP). In both cases they provide a stable IP address and DNS name. Usually, services match a set of pods using labels and selectors. Oftentimes, an ingress controller is used to expose the services.

    apiVersion: v1
    kind: Service
    metadata:
      name: my-service
    spec:
      selector:
        app: MyApp
      ports:
      - protocol: TCP
        port: 80
        targetPort: 9376

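
The service above has no type set and is therefore a ClusterIP service, reachable only from inside the cluster. As a sketch of the NodePort variant mentioned earlier, the manifest below exposes the same pods on a port of every node. The service name and the nodePort value are assumptions; the port has to lie within the cluster's node port range, which is 30000-32767 by default.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-service-external    # hypothetical name
spec:
  type: NodePort               # ClusterIP is used when type is omitted
  selector:
    app: MyApp                 # same label selector as the service above
  ports:
  - protocol: TCP
    port: 80                   # port of the service inside the cluster
    targetPort: 9376           # port the pods listen on
    nodePort: 30080            # assumed value; reachable as <node-ip>:30080
```
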
The default load balancing for services and pods is performed by kube-proxy and happens on layer 4. The proxy mode chosen for kube-proxy determines the load balancing algorithm.
- userspace mode chooses a backend via a round-robin algorithm.
- iptables mode chooses a backend at random.
- IPVS mode provides more options for balancing traffic to backend Pods; these are:
  - rr: round-robin
  - lc: least connection (smallest number of open connections)
  - dh: destination hashing
  - sh: source hashing
  - sed: shortest expected delay
  - nq: never queue
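
The proxy mode and the IPVS scheduler are set in the kube-proxy configuration. The fragment below is a minimal sketch that selects IPVS mode with the least-connection scheduler. How this configuration reaches kube-proxy depends on the cluster setup (in kubeadm-based clusters it typically lives in a ConfigMap in the kube-system namespace), so treat the fragment as an assumption about your environment rather than a drop-in file.

```yaml
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"          # proxy mode; requires the IPVS kernel modules on the nodes
ipvs:
  scheduler: "lc"     # one of the schedulers listed above, here least connection
```
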
Kubernetes provides DNS. Depending on where a query originates, the same DNS query can yield different results. For example, resources in the same namespace can find each other without their fully qualified domain name (FQDN).
Containers in the same pod can reach each other via the loopback interface (localhost).
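A pod's DNS entry has the following form, where the dots of the pod's IP address are replaced by dashes (for example, 172-17-0-3).

    <pod-ip-address>.<namespace>.pod.cluster.local

Pods created by a Deployment or DaemonSet and exposed by a Service have the following DNS resolution.

    <pod-ip-address>.<service-name>.<namespace>.svc.cluster.local
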
Workloads such as deployments do not have a DNS name themselves. That is why they are usually coupled with services if they need to be reachable by other services or by external requests.
Services are resolved as follows.

    <service-name>.<namespace>.svc.cluster.local

By default, a pod's DNS queries are resolved relative to its own namespace. For example, a query for just <service-name> resolves to the service of that name in the same namespace as the pod making the query.
This is possible because of the search entry in each container's /etc/resolv.conf, which has the following form.

    search <namespace>.svc.cluster.local svc.cluster.local cluster.local

Note that not all DNS-related tools use the search list by default. For example, to get the service IP without the FQDN from within a container using dig, the +search flag has to be used.

    dig +search <service-name>

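
To make the name resolution rules concrete, the sketch below annotates a service with the names under which it can be reached. It assumes the cluster domain is cluster.local, as in the search entry above, and reuses the my-service and development names from the earlier examples purely for illustration.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-service
  namespace: development
spec:
  selector:
    app: MyApp
  ports:
  - protocol: TCP
    port: 80
    targetPort: 9376
# Names that resolve to this service (assuming the default search list):
#   my-service                                 from pods in the development namespace
#   my-service.development                     from pods in any namespace
#   my-service.development.svc.cluster.local   FQDN, from anywhere in the cluster
```
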
By default, Kubernetes uses the process ID of the container to determine if a pod is ready to accept requests. As long as every specified container has a corresponding running process (PID), the pod is considered healthy.
Custom health probes can be specified, for example a livenessProbe via HTTP.
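
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: healthcheck-me
    spec:
      # apps/v1 requires a selector that matches the pod template labels
      selector:
        matchLabels:
          app: healthcheck-me
      template:
        metadata:
          labels:
            app: healthcheck-me
        spec:
          containers:
          - name: healthcheck-me
            image: localhost/checkme
            # restart the container when GET /healthz fails three times in a row
            livenessProbe:
              httpGet:
                path: /healthz
                port: 80
              initialDelaySeconds: 0
              periodSeconds: 10
              timeoutSeconds: 1
              failureThreshold: 3
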
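
A liveness probe decides whether a container gets restarted. Whether a pod receives traffic from a service is controlled by a separate readinessProbe. The pod below is a minimal sketch combining both kinds of probe; the pod name and the /ready endpoint are hypothetical, and the image is the one from the example above.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: readiness-demo          # hypothetical name
  labels:
    app: healthcheck-me
spec:
  containers:
  - name: healthcheck-me
    image: localhost/checkme
    readinessProbe:             # while failing, the pod is taken out of service endpoints
      httpGet:
        path: /ready            # hypothetical endpoint
        port: 80
      periodSeconds: 5
      failureThreshold: 3
    livenessProbe:              # when failing, the container is restarted
      tcpSocket:
        port: 80
      periodSeconds: 10
```
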
Volumes are a mechanism to decouple storage from the container. Kubernetes supports different types of volumes, and a pod can use any number of volume types simultaneously. The main difference between these types is whether they are ephemeral or persistent.
When a pod ceases to exist, Kubernetes destroys ephemeral volumes; however, Kubernetes does not destroy persistent volumes. For any kind of volume in a given pod, data is preserved across container restarts.

    apiVersion: v1
    kind: Pod
    metadata:
      name: configmap-pod
    spec:
      containers:
      - name: test
        image: busybox
        volumeMounts:
        - name: config-vol
          mountPath: /etc/config
      volumes:
      - name: config-vol
        configMap:
          name: log-config
          items:
          - key: log_level
            path: log_level

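
The pod above expects a ConfigMap named log-config with a key log_level to exist in the same namespace. A minimal sketch of such a ConfigMap could look as follows; the value INFO is an assumption made for illustration.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: log-config
data:
  log_level: INFO    # assumed value; becomes the content of /etc/config/log_level
```
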
Below are some of the most common types of volumes. There are more types available, though. For example, big cloud providers such as Azure have their own volume types, which provision storage on the respective cloud platform.
| Volume type | Description |
|---|---|
| configMap (ephemeral) | A ConfigMap provides a way to inject configuration data into pods. |
| emptyDir (ephemeral) | An emptyDir volume is first created when a Pod is assigned to a node, and exists as long as that Pod is running on that node. |
| hostPath | A hostPath volume mounts a file or directory from the host node's filesystem into your Pod. |
| local | A local volume represents a mounted local storage device such as a disk, partition or directory. |
| nfs | An nfs volume allows an existing NFS (Network File System) share to be mounted into a Pod. |
| persistentVolumeClaim | A persistentVolumeClaim volume is used to mount a PersistentVolume into a Pod. |
| secret (ephemeral) | A secret volume is used to pass sensitive information, such as passwords, to Pods. |
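
As a sketch of the persistent case, the manifests below request storage through a PersistentVolumeClaim and mount it into a pod. All names, the requested size and the command are hypothetical, and the example assumes the cluster has a default StorageClass that can provision a matching PersistentVolume.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-claim             # hypothetical name
spec:
  accessModes:
  - ReadWriteOnce              # mountable read-write by a single node
  resources:
    requests:
      storage: 1Gi             # assumed size
---
apiVersion: v1
kind: Pod
metadata:
  name: pvc-pod                # hypothetical name
spec:
  containers:
  - name: test
    image: busybox
    # append a timestamp on every start, then keep the container running
    command: ["sh", "-c", "date >> /data/started.log && sleep 3600"]
    volumeMounts:
    - name: data-vol
      mountPath: /data
  volumes:
  - name: data-vol
    persistentVolumeClaim:
      claimName: data-claim    # refers to the claim defined above
```

Because the claim is persistent, the log file written to /data is still there after the pod has been deleted and recreated with the same claim.
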