Kubernetes has disrupted traditional deployment methods and has become very popular. Although it is a great platform to deploy to, it also brings complexity and challenges. Kubernetes manages nodes and workloads seamlessly, and one of its great features is self-healing. For self-healing at the container level, we need health checks, called probes in Kubernetes, unless we want to rely solely on exit codes.
Liveness probes check whether the container is healthy; if it is deemed unhealthy, the container is restarted. This action is different from the action taken by Readiness Probes, which I discussed in my previous post.
Let's look at the components of the probes and dive into how to configure and troubleshoot Liveness Probes.
Probes
Probes are health checks that are executed by the kubelet.
All probes have five parameters that are crucial to configure:
- initialDelaySeconds: Time to wait after the container starts before the first probe runs (default: 0)
- periodSeconds: How often the probe is executed, in seconds (default: 10)
- timeoutSeconds: Time to wait for a reply, in seconds (default: 1)
- successThreshold: Number of consecutive successful probe executions needed to mark the container healthy (default: 1)
- failureThreshold: Number of consecutive failed probe executions needed to mark the container unhealthy (default: 3)
You need to analyze your application's behavior to set these probe parameters.
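To get a feel for how these parameters interact, here is a minimal, illustrative stanza (the values below are hypothetical, not a recommendation for any particular application):
livenessProbe:
  initialDelaySeconds: 5   # wait 5 seconds after the container starts before probing
  periodSeconds: 10        # probe every 10 seconds
  timeoutSeconds: 2        # each probe must answer within 2 seconds
  successThreshold: 1      # one success marks the container healthy again
  failureThreshold: 3      # three consecutive failures mark it unhealthy
  httpGet:
    path: /
    port: 80
With these values, a container that stops responding will keep running for up to roughly periodSeconds × failureThreshold = 10 × 3 = 30 seconds of failed probes before the kubelet restarts it.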
There are three types of probes:
Exec Probe
The exec probe executes a command inside the container (without a shell). The command's exit status determines the health state: zero means healthy; anything else means unhealthy.
livenessProbe:
  initialDelaySeconds: 1
  periodSeconds: 5
  timeoutSeconds: 1
  successThreshold: 1
  failureThreshold: 1
  exec:
    command:
    - cat
    - /etc/nginx/nginx.conf
TCP Probe
The TCP probe checks whether a TCP connection can be opened on the specified port. If the connection can be opened, the probe is deemed successful; a closed port or a connection reset is deemed a failure.
livenessProbe:
  initialDelaySeconds: 1
  periodSeconds: 5
  timeoutSeconds: 1
  successThreshold: 1
  failureThreshold: 1
  tcpSocket:
    host:
    port: 80
HTTP Probe
The HTTP probe makes an HTTP call, and the response status code determines the health state: any status code greater than or equal to 200 and less than 400 is deemed a success; any other status code is deemed a failure.
HTTP probes have additional parameters to configure:
- host: IP address to connect to (default: pod IP)
- scheme: HTTP scheme (default: HTTP)
- path: HTTP path to call
- httpHeaders: Any custom headers you want to send.
- port: Connection port.
Tip: If a Host header is required, set it via httpHeaders.
An example of an HTTP probe.
livenessProbe:
  initialDelaySeconds: 1
  periodSeconds: 2
  timeoutSeconds: 1
  successThreshold: 1
  failureThreshold: 1
  httpGet:
    host:             # left empty, so the probe targets the pod IP
    scheme: HTTP
    path: /
    httpHeaders:
    - name: Host
      value: myapplication1.com
    port: 80
Liveness Probes in Kubernetes
The kubelet executes liveness probes to determine whether the container needs a restart. For example, let's say we have a microservice written in Go, and this microservice has a bug in some part of the code that causes it to freeze at runtime. To mitigate the impact of that bug, we can configure a liveness probe that determines whether the microservice is in a frozen state. This way, the microservice container will be restarted and returned to a pristine condition.
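For such a service, the probe could target a lightweight health endpoint. The sketch below assumes the microservice exposes a hypothetical /healthz endpoint on port 8080; adjust the path and port to whatever your application actually serves:
livenessProbe:
  httpGet:
    path: /healthz   # hypothetical health endpoint of the Go microservice
    port: 8080
  initialDelaySeconds: 3
  periodSeconds: 10
  timeoutSeconds: 1
  failureThreshold: 3
If the service freezes, the /healthz requests start timing out, failureThreshold is reached, and the kubelet restarts the container.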
If your application exits on its own when it encounters such an issue, you won't necessarily need a liveness probe, because the container will be restarted according to the configured (or default) restart policy. But there can still be bugs you don't know about.
Common Pitfalls for Liveness Probes
Probes determine health only from the probe responses; they are not aware of the system dynamics of our microservice/application. If, for any reason, probe replies are delayed or fail for longer than periodSeconds times failureThreshold, the microservice/application will be deemed unhealthy, and a restart of the container will be triggered. Hence it is important to configure the parameters according to your application's behavior.
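To put numbers on this: with failureThreshold: 1, as in the examples above, a single slow or failed response is enough to trigger a restart, while the defaults (periodSeconds: 10, failureThreshold: 3) give the application roughly 10 × 3 = 30 seconds of failing probes before the container is restarted. A more tolerant configuration might look something like this (values are illustrative):
livenessProbe:
  httpGet:
    path: /
    port: 80
  periodSeconds: 10       # probe less aggressively
  timeoutSeconds: 5       # allow slow responses before counting a failure
  failureThreshold: 3     # require several consecutive failures before restarting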
Cascading Failures
Similar to readiness probes, liveness probes can also create a cascading failure if you misconfigure them. If the health endpoint has external dependencies, or any other condition that can prevent an answer from being delivered, it can create a cascading failure; therefore, it is of paramount importance to configure the probe with this behavior in mind.
Crash Loop
Let's assume that our application needs to read a large amount of data into a cache once in a while; unresponsiveness during this period might cause a false alarm, because the probe may fail even though the application is otherwise fine. In this case, the failing liveness probe will restart the container, and most probably it will enter a continuous cycle of restarts. In such a scenario, a readiness probe might be more suitable: the pod will only be removed from service while it executes the maintenance task, and once it is ready to take traffic again, it can start responding to probes.
The liveness endpoint on our microservice (the one the probe will hit) should check the absolute minimum required to show the application is running. This way, liveness checks succeed, the container is not restarted, and service traffic keeps flowing as it should.
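One pattern that follows from this is pairing a stricter readiness probe with a minimal liveness probe on the same container. The sketch below assumes the application exposes two hypothetical endpoints, /ready (which may check caches and dependencies) and /live (which only confirms the process responds):
readinessProbe:
  httpGet:
    path: /ready     # hypothetical endpoint that may check caches, dependencies, etc.
    port: 8080
  periodSeconds: 5
  failureThreshold: 3
livenessProbe:
  httpGet:
    path: /live      # hypothetical endpoint that only confirms the process responds
    port: 8080
  periodSeconds: 10
  failureThreshold: 3
With this split, a long cache load only takes the pod out of the Service endpoints temporarily, while the container is restarted only when the process itself stops responding.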
Example: Sample Nginx Deployment
We will deploy Nginx as a sample app. Below are the Deployment and Service configurations.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: k8s-probes
  labels:
    app: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
        livenessProbe:
          initialDelaySeconds: 1
          periodSeconds: 2
          timeoutSeconds: 1
          successThreshold: 1
          failureThreshold: 1
          httpGet:
            host:
            scheme: HTTP
            path: /
            httpHeaders:
            - name: Host
              value: myapplication1.com
            port: 80
Write this configuration to a file called k8s-probes-deployment.yaml and apply it with this command:
kubectl apply -f k8s-probes-deployment.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app: nginx
  name: nginx
  namespace: default
spec:
  ports:
  - name: nginx-http-port
    port: 80
  selector:
    app: nginx
  sessionAffinity: None
  type: NodePort
Also, write this configuration to a file called k8s-probes-svc.yaml and apply it with this command:
kubectl apply -f k8s-probes-svc.yaml
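To confirm the Service was created, you can list it (the exact output will vary by cluster):
kubectl get svc nginx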
Troubleshooting Liveness Probes
There is no specific endpoint for liveness probe status, so we should use this command to see the events and current status:
kubectl describe pods <POD_NAME>
kubectl get pods
Here we can see our pod is in a running state, and it is ready to receive traffic.
NAME READY STATUS RESTARTS AGE
k8s-probes-7d979f58c-vd2rv 1/1 Running 0 6s
Let's check the applied configuration.
kubectl describe pods k8s-probes-7d979f58c-vd2rv | grep Liveness
Here we can see the parameters we have configured.
Liveness: http-get http://:80/ delay=5s timeout=1s period=5s #success=1 #failure=1
Let's look at the events:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 45s default-scheduler Successfully assigned default/k8s-probes-7d979f58c-vd2rv to k8s-probes
Normal Pulling 44s kubelet Pulling image "nginx"
Normal Pulled 43s kubelet Successfully pulled image "nginx" in 1.117208685s
Normal Created 43s kubelet Created container nginx
Normal Started 43s kubelet Started container nginx
As you can see, there is no indication of either failure or success; successful probe executions are not recorded as events.
Now let's change livenessProbe.httpGet.path in the Deployment to "/do-not-exists", re-apply it, and take a look at the pod status.
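If you don't want to edit and re-apply the manifest, one way to make this change is a JSON patch (the container index 0 matches the single nginx container in our Deployment):
kubectl patch deployment k8s-probes --type=json \
  -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/livenessProbe/httpGet/path", "value": "/do-not-exists"}]'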
kubectl get pods
After changing the path, liveness probes will fail, and the container will be restarted.
NAME READY STATUS RESTARTS AGE
k8s-probes-595bcfdf57-428jt 1/1 Running 4 74s
We can see that container has been restarted four times.
Let's look at the events.
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 53s default-scheduler Successfully assigned default/k8s-probes-595bcfdf57-428jt to k8s-probes
Normal Pulled 50s kubelet Successfully pulled image "nginx" in 1.078926208s
Normal Pulled 42s kubelet Successfully pulled image "nginx" in 978.826238ms
Normal Pulled 32s kubelet Successfully pulled image "nginx" in 971.627126ms
Normal Pulling 23s (x4 over 51s) kubelet Pulling image "nginx"
Normal Pulled 22s kubelet Successfully pulled image "nginx" in 985.155098ms
Normal Created 22s (x4 over 50s) kubelet Created container nginx
Normal Started 22s (x4 over 50s) kubelet Started container nginx
Warning Unhealthy 13s (x4 over 43s) kubelet Liveness probe failed: HTTP probe failed with statuscode: 404
Normal Killing 13s (x4 over 43s) kubelet Container nginx failed liveness probe, will be restarted
Warning BackOff 13s kubelet Back-off restarting failed container
As you can see above, "Liveness probe failed: HTTP probe failed with statuscode: 404" indicates that the probe failed with HTTP status code 404; the status code will also aid in troubleshooting. Just after that, the kubelet informs us that it will restart the container.
Conclusion
Kubernetes liveness probes are life savers when our application is in an undetermined state; they return the application to a pristine condition by restarting the container. However, it is very important that they are configured correctly. Of course, there is no single correct configuration; it all depends on your application and how you want Kubernetes to act in each particular failure scenario. Set the values accordingly and test them against realistic failure scenarios.
Further Reading
- Kubernetes Start Up Probes - Examples & Common Pitfalls
- Kubernetes Readiness Probes - Examples & Common Pitfalls
- Kubernetes Core Probe Documentation
- Configure Liveness, Readiness and Startup Probes
- Kubernetes Container probes Documentation
- Container Lifecycle Hooks Documentation