Squadcast Community for Squadcast

Posted on Oct 3, 2022 • Edited on Jul 10, 2024 • Originally published at squadcast.com

Kubernetes Health Check Using Probes

Introduction

Kubernetes is an open-source container orchestration platform that significantly simplifies creating and managing applications. Distributed systems like Kubernetes can be hard to manage, as they involve many moving parts, all of which must work for the system to function. Even if a small part breaks, the failure needs to be detected, routed around, and fixed, and these actions need to be automated. Kubernetes lets us do that with the help of readiness and liveness probes. In this blog, we will discuss these probes in detail. But before that, let’s first discuss health checks.

What is a Health Check?

Health checks are a simple way to let the system know whether an instance of your app is working. If the instance of your app is not working, the other services should not access it or send requests to it. Instead, requests should be sent to another instance that is ready, or you should retry sending requests.

The system should be able to bring your app to a healthy state. By default, Kubernetes will start sending traffic to the pod when all the containers inside the pod have started. Kubernetes will restart containers when they crash. This default behavior should be enough to get started. Making deployments more robust becomes relatively straightforward as Kubernetes helps create custom health checks. But before we do that, let's discuss the pod life cycle.

Pod lifecycle

A Kubernetes pod follows a defined life cycle. These are the different phases:

When the pod is first created, it starts in the Pending phase. The scheduler tries to figure out where to place the pod. If the scheduler can’t find a node to place the pod on, the pod remains pending (to check why a pod is stuck in the pending state, run the ‘kubectl describe pod’ command with the pod's name).
Once the pod is scheduled, kubectl reports its status as ContainerCreating while the images required for the application are pulled and the containers start. Strictly speaking, the pod is still in the Pending phase at this point.
Once the containers have been created and at least one of them is running, the pod moves to the Running phase, where it stays until the program completes successfully or is terminated.

To check the status of the pods, run the ‘kubectl get pod’ command and check the STATUS column. In the output below, all the pods are in the Running state. The READY column shows how many of each pod's containers are ready; 1/1 means the pod is ready to accept user traffic.

# kubectl get pod   
NAME                        READY   STATUS    RESTARTS   AGE
my-nginx-6b74b79f57-fldq6   1/1     Running   0          20s
my-nginx-6b74b79f57-n67wp   1/1     Running   0          20s
my-nginx-6b74b79f57-r6pcq   1/1     Running   0          20s

Different Types of Probes in Kubernetes

Kubernetes gives you the following types of health checks:

Readiness probes: This probe will tell you when your app is ready to serve traffic. Kubernetes will ensure the readiness probe passes before allowing a service to send traffic to the pod. If the readiness probe fails, Kubernetes will not send the traffic to the pod until it passes.
Liveness probes: A liveness probe lets Kubernetes know whether your app is healthy. If your app is healthy, Kubernetes will not interfere with the pod, but if it is unhealthy, Kubernetes will kill the affected container and restart it according to the pod's restart policy.

To understand this further, let's take an example of a real-world scenario. You have an application that needs some time to warm up or download the application content from some external source like GitHub. Your application shouldn't receive traffic until it's fully ready. By default, Kubernetes will start sending traffic as soon as the process inside the container starts. Using the readiness probe, Kubernetes will wait until the app has fully started before it allows the service to send traffic to the new copy.

Let's take another scenario where your application hits a bug in the code (maybe an edge case), hangs indefinitely, and stops serving requests. Because the process is still running, Kubernetes will by default keep sending traffic to the broken pod. With a liveness probe, Kubernetes will detect that the app is no longer serving requests and restart the malfunctioning container.

With the theory part done, let us see how to define the probes. A probe can check your app in several ways; this post covers three of them:

HTTP
TCP
Command

Note: You have an option to start by defining either the readiness or liveness probes, as the implementation for both requires a similar template. For example, if we first define livenessProbe, we can use it to define readinessProbe or vice-versa.

HTTP probes (httpGet): This is the most common probe type. Even if your app isn’t an HTTP server, you can usually create a lightweight HTTP server inside your app to respond to the liveness probe. Kubernetes will send an HTTP GET request to a path (for example /healthz) at a given port (8080 in this example). If it gets an HTTP response with a status code in the 200 or 300 range, the container will be marked as healthy; otherwise, it will be marked as unhealthy. (For more information regarding HTTP response codes, refer to this link.) Here is how you can define an HTTP livenessProbe:

livenessProbe:
      httpGet:
        path: /healthz
        port: 8080

An HTTP readiness probe is defined just like the HTTP livenessProbe; you just have to replace liveness with readiness.

readinessProbe:
      httpGet:
        path: /healthz
        port: 8080
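
As a rough illustration of the "lightweight HTTP server" idea, the Python sketch below answers GET requests on the /healthz path and port 8080 used in the examples above. The names HealthHandler and make_server, and the choice of Python's standard http.server module, are assumptions made for illustration only; the post does not prescribe a particular implementation.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: answers 200 on /healthz and 404 elsewhere."""

    def do_GET(self):
        if self.path == "/healthz":
            body = b"ok"
            # Any status code from 200 to 399 is treated as healthy by the probe.
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        # Silence per-request logging so frequent probes do not flood the logs.
        pass


def make_server(port=8080):
    """Build (but do not start) a server listening on all interfaces."""
    return HTTPServer(("", port), HealthHandler)
```

Calling make_server().serve_forever() starts the server. In a real application the handler would typically check something meaningful (for example, whether a worker thread is still responsive) before answering 200.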

TCP probes (tcpSocket): With TCP probes, Kubernetes will try to establish a TCP connection on the specified port (for example, port 8080 in the below example). If it can establish a connection, the container is considered healthy. If it can't, it's considered a failure. These probes will be handy where HTTP or command probes don't work well. For example, the FTP service will be able to use this type of probe.

readinessProbe:
      tcpSocket:
        port: 8080
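
To make the TCP check concrete, here is a minimal Python sketch of the same idea: try to open a connection and report success or failure. The function name tcp_probe and its parameters are hypothetical; this approximates what the probe checks and is not the kubelet's implementation.

```python
import socket


def tcp_probe(host, port, timeout_seconds=1.0):
    """Return True if a TCP connection to host:port can be opened in time."""
    try:
        # create_connection performs the TCP handshake; success means "healthy".
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False
```

Note that a successful connection only shows that something is listening on the port; it says nothing about whether the service behind it answers requests correctly.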

Command probes (exec command): With command probes, Kubernetes will run a command inside your container. If the command returns exit code zero, the container will be marked as healthy; otherwise, it will be marked as unhealthy. This type of probe is useful when you can’t or don’t want to run an HTTP server, but you can run a command that will check whether your app is healthy. The example below uses ‘cat /tmp/healthy’, which exits with code zero only if the file /tmp/healthy exists and can be read.

livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
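
The exit-code rule can be sketched in a few lines of Python. The function name exec_probe is hypothetical, and the sketch assumes a ‘cat’ binary is available on the machine where it runs; it mirrors the rule described above rather than how Kubernetes actually executes the command inside a container.

```python
import subprocess


def exec_probe(command, timeout_seconds=1.0):
    """Run a command and report healthy only if it exits with code zero."""
    try:
        result = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_seconds,
        )
    except (subprocess.TimeoutExpired, OSError):
        # A command that hangs or cannot be started counts as a failure here.
        return False
    return result.returncode == 0
```

For example, exec_probe(["cat", "/tmp/healthy"]) returns True only while the file exists and is readable.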

Probes can be configured in many ways based on how often they need to run, the success and failure thresholds, and how long to wait for responses.

initialDelaySeconds (default value 0): If you know your application needs n seconds (for example, 30 seconds) to warm up, use initialDelaySeconds to delay the first check by that many seconds.
periodSeconds (default value 10): This defines how frequently, in seconds, the check is executed.
timeoutSeconds (default value 1): This defines the maximum number of seconds to wait for a response before the probe attempt times out.
successThreshold (default value 1): This is the minimum number of consecutive successes required for the probe to be considered successful again after a failure. It must be 1 for liveness probes.
failureThreshold (default value 3): This is the number of consecutive failures after which Kubernetes treats the probe as failed and takes action.

Note: By default, Kubernetes treats the probe as failed after three consecutive unsuccessful attempts. In the case of a liveness probe, it will restart the container. In the case of a readiness probe, it will mark the pod as not ready, and the pod will stop receiving traffic from services until the probe passes again.
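
The interplay of successThreshold and failureThreshold can be illustrated with a small Python model. This is a simplified sketch under the assumption that only streaks of consecutive identical results matter; the function name apply_thresholds and its parameters are hypothetical, and the code is not taken from Kubernetes.

```python
def apply_thresholds(results, initial_ok, success_threshold=1, failure_threshold=3):
    """Return the reported state after each probe result.

    results: iterable of booleans, True for a passing probe attempt.
    initial_ok: state reported before any threshold has been reached.
    """
    state = initial_ok
    last = None  # outcome of the previous attempt
    run = 0      # length of the current streak of identical outcomes
    states = []
    for ok in results:
        if ok == last:
            run += 1
        else:
            last, run = ok, 1
        needed = success_threshold if ok else failure_threshold
        if run >= needed:
            # Enough consecutive results to flip (or confirm) the state.
            state = ok
        states.append(state)
    return states
```

With the defaults, two failures followed by a success leave a healthy container untouched, while three failures in a row flip the state to unhealthy.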

For more information about probe configuration, refer to this link.

Let’s combine everything we have discussed so far. The key thing to note here is the use of readinessProbe with httpGet. The first check will be executed after 10 seconds, and then it will be repeated every 5 seconds.

apiVersion: v1
kind: Pod
metadata:
  labels:
    run: nginx
  name: nginx
spec:
  containers:
  - image: nginx
    name: nginx
    readinessProbe:
      httpGet:
        path: /
        port: 80
      initialDelaySeconds: 10
      periodSeconds: 5

Use the ‘kubectl create’ command to create the pod, and specify the YAML manifest file with the ‘-f’ flag. You can give the file any name, but by convention it should end with a ‘.yaml’ extension.

kubectl create -f readinessprobe.yaml
pod/nginx created

If you check the pod's status now, it should show Running under the STATUS column, but the READY column will still show 0/1, which means the pod is not yet ready to accept traffic.

kubectl get pod
NAME    READY   STATUS              RESTARTS   AGE
nginx   0/1     Running             0          16s

Verify the status again after a few seconds, as we set an initial delay of 10 seconds. By now the pod should be ready, with the READY column showing 1/1.

kubectl get pod
NAME    READY   STATUS    RESTARTS   AGE
nginx   1/1     Running   0          28s

To check the detailed status of all the parameters (for example, initialDelaySeconds, periodSeconds, etc.) used when defining readiness probe, run the ‘kubectl describe’ command.

kubectl describe pod nginx |grep -i readiness
    Readiness:      http-get http://:80/ delay=10s timeout=1s period=5s #success=1 #failure=3
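
If you want to check these values in a script, the line printed by ‘kubectl describe’ can be parsed with a short regular expression. The Python sketch below assumes the line has the shape shown above; the function name parse_probe_line is hypothetical.

```python
import re


def parse_probe_line(line):
    """Extract the numeric probe settings from a 'kubectl describe' probe line."""
    settings = {}
    # Matches tokens such as 'delay=10s' and '#failure=3'.
    for key, value in re.findall(r"#?(\w+)=(\d+)s?", line):
        settings[key] = int(value)
    return settings
```

Here delay corresponds to initialDelaySeconds, timeout to timeoutSeconds, period to periodSeconds, and #success and #failure to the two thresholds.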

Let's further reinforce the concept of liveness and readiness probe with the help of an example. First, let's start with a liveness probe. In the below example, we are executing a command, ‘touch healthy; sleep 20; rm -rf healthy; sleep 600’.

In this command, touch creates a file named ‘healthy’. This file will exist in the container for the first 20 seconds, then it will be removed by the ‘rm -rf’ command. Lastly, the container will sleep for 600 seconds.

Then we define the liveness probe. It checks whether the file exists using the ‘cat healthy’ command, starting after an initial delay of 5 seconds. The 'periodSeconds' parameter makes Kubernetes repeat the liveness probe every 5 seconds. Once the file is deleted after 20 seconds, the probe starts to fail, and after three consecutive failures (the default failureThreshold) the container is restarted.

apiVersion: v1
kind: Pod
metadata:
  name: liveness-probe-exec
spec:
  containers:
  - name: liveness-probe
    image: busybox
    args:
    - /bin/sh
    - -c
    - touch healthy; sleep 20; rm -rf healthy; sleep 600
    livenessProbe:
      exec:
        command:
        - cat
        - healthy
      initialDelaySeconds: 5
      periodSeconds: 5
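
To get a feel for the timing, the following Python sketch walks an idealized probe schedule for this manifest and reports when the failure threshold is reached. It assumes probes fire exactly at initialDelaySeconds and then every periodSeconds, and that the file is gone from second 20 onward; the real kubelet adds some variation, so treat the result as an estimate. The function name first_restart_time is hypothetical.

```python
def first_restart_time(file_removed_at, initial_delay, period,
                       failure_threshold=3, horizon=600):
    """Return the time (seconds after container start) of the probe that
    reaches the failure threshold, or None if none does within the horizon."""
    consecutive_failures = 0
    t = initial_delay
    while t <= horizon:
        # 'cat healthy' succeeds only while the file still exists.
        file_exists = t < file_removed_at
        if file_exists:
            consecutive_failures = 0
        else:
            consecutive_failures += 1
            if consecutive_failures >= failure_threshold:
                return t
        t += period
    return None
```

With the values from the manifest (file removed at 20 seconds, initial delay 5, period 5, default failureThreshold 3), the probes at 5, 10, and 15 seconds pass, and those at 20, 25, and 30 seconds fail, so the model predicts a restart roughly 30 seconds after the container starts.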

To create the pod, store the above code in a file that ends with ‘.yaml’ (for example, ‘liveness-probe.yaml’) and run the ‘kubectl create’ command with the ‘-f’ flag followed by the file name.

# kubectl create -f liveness-probe.yaml 
pod/liveness-probe-exec created

Run the ‘kubectl get events’ command, and you will see that the liveness probe has failed, and the container has been killed and restarted.

54s         Normal    Scheduled                 pod/liveness-probe-exec   Successfully assigned default/liveness-probe-exec to controlplane
53s         Normal    Pulling                   pod/liveness-probe-exec   Pulling image "busybox"
52s         Normal    Pulled                    pod/liveness-probe-exec   Successfully pulled image "busybox" in 384.330188ms
52s         Normal    Created                   pod/liveness-probe-exec   Created container liveness-probe
52s         Normal    Started                   pod/liveness-probe-exec   Started container liveness-probe
18s         Warning   Unhealthy                 pod/liveness-probe-exec   Liveness probe failed: cat: can't open 'healthy': No such file or directory
18s         Normal    Killing                   pod/liveness-probe-exec   Container liveness-probe failed liveness probe, will be restarted

You can also verify this with the ‘kubectl get pods’ command. As the RESTARTS column shows, the container has been restarted once.

# kubectl get pods
NAME                  READY   STATUS    RESTARTS   AGE
liveness-probe-exec   1/1     Running   1          24s

Now that you understand how the liveness probe works, let's see how the readiness probe works by tweaking the above example to define it as a readiness probe. In the example below, we execute a command inside the container (sleep 20; touch healthy; sleep 600), which first sleeps for 20 seconds, then creates a file, and finally sleeps for 600 seconds. As the initial delay is set to 15 seconds, the first check runs about 15 seconds after the container starts, before the file exists, so it fails.

apiVersion: v1
kind: Pod
metadata:
  name: readiness-probe-exec
spec:
  containers:
  - name: readiness-probe
    image: busybox
    args:
    - /bin/sh
    - -c
    - sleep 20;touch healthy;sleep 600
    readinessProbe:
      exec:
        command:
        - cat
        - healthy
      initialDelaySeconds: 15
      periodSeconds: 5
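
The same kind of back-of-the-envelope timing works for the readiness case. The Python sketch below assumes probes fire exactly at initialDelaySeconds and then every periodSeconds, and that the file exists from second 20 onward; actual timings on a cluster will vary a little. The function name first_ready_time is hypothetical.

```python
def first_ready_time(file_created_at, initial_delay, period,
                     success_threshold=1, horizon=600):
    """Return the time (seconds after container start) of the probe that
    reaches the success threshold, or None if none does within the horizon."""
    consecutive_successes = 0
    t = initial_delay
    while t <= horizon:
        # 'cat healthy' succeeds only once the file has been created.
        if t >= file_created_at:
            consecutive_successes += 1
            if consecutive_successes >= success_threshold:
                return t
        else:
            consecutive_successes = 0
        t += period
    return None
```

With the values from the manifest (file created at 20 seconds, initial delay 15, period 5, default successThreshold 1), the probe at 15 seconds fails and the probe at 20 seconds is the first that can pass, so the model predicts the pod becomes ready about 20 seconds after the container starts.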

To create a pod, store the above code in a file that ends with ‘.yaml’, and execute the ‘kubectl create’ command, which will create the pod.

# kubectl create -f readiness-probe.yaml 
pod/readiness-probe-exec created

If you run ‘kubectl get events’ here, you can see that the probe failed because the file was not yet present.

63s         Normal    Scheduled                 pod/readiness-probe-exec   Successfully assigned default/readiness-probe-exec to controlplane
62s         Normal    Pulling                   pod/readiness-probe-exec   Pulling image "busybox"
62s         Normal    Pulled                    pod/readiness-probe-exec   Successfully pulled image "busybox" in 156.57701ms
61s         Normal    Created                   pod/readiness-probe-exec   Created container readiness-probe
61s         Normal    Started                   pod/readiness-probe-exec   Started container readiness-probe
42s         Warning   Unhealthy                 pod/readiness-probe-exec   Readiness probe failed: cat: can't open 'healthy': No such file or directory

If you check the status of the container initially, it is not in a ready state.

# kubectl get pods
NAME                   READY   STATUS    RESTARTS   AGE
readiness-probe-exec   0/1     Running   0          5s

But if you check again after 20 seconds, the READY column should show 1/1, meaning the pod is ready.

# kubectl get pods
NAME                   READY   STATUS    RESTARTS   AGE
readiness-probe-exec   1/1     Running   0          27s

Conclusion

Health checks are required for any distributed system, and Kubernetes is no exception. Using health checks gives your Kubernetes services a solid foundation, better reliability, and higher uptime.

Plug: Use K8s with Squadcast for Faster Resolution

Squadcast is an incident management tool that’s purpose-built for site reliability engineering. It allows you to get rid of unwanted alerts, receive relevant notifications and integrate with popular ChatOps tools. You also can work in collaboration using virtual incident war rooms and use automation to eliminate toil.