
How to health check your TopShelf service in Kubernetes

Marcin Wolnik ・4 min read

Recently I worked on migrating a legacy TopShelf service from VM to Kubernetes.

The service was pretty unstable: from time to time it transitioned into a broken state and had to be restarted manually, which was less than ideal.

However, moving the service to Kubernetes wouldn't solve the issue by itself, because no matter how many pods were defined in the Deployment, they would eventually fail one by one due to a deadlock, an infinite loop, or the like, and Kubernetes would never know, since the processes would appear to be happily running.

Health checks in Kubernetes

To remedy such situations we can use one of Kubernetes's health checks, the liveness probe, which signals to Kubernetes whether your service is dead or alive. The other is the readiness probe, which tells Kubernetes when the container is ready to accept traffic. In this post, I'll focus on the former.

There are 3 types of liveness probes:

  • HTTP
  • TCP
  • Command

I'll show how to set up and use the most popular one - the HTTP type.

What does it do?

The probe performs an HTTP GET request against the pod's IP address, on a port and path specified in the pod's specification. If the probe receives a 2xx or 3xx HTTP response code, the pod is considered healthy. If the server returns an error response code (>= 400), or if it doesn't respond at all, the probe is considered a failure and the pod will be restarted.
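To make that success rule concrete, here is a minimal shell sketch (my own illustration, not part of the article's setup) that classifies a status code the same way the probe does; the commented curl line assumes a service listening on localhost:8080:

```shell
# Classify an HTTP status code the way a liveness probe does:
# 2xx/3xx means success, anything >= 400 (or no response) means failure.
probe_result() {
  code="$1"
  if [ "$code" -ge 200 ] && [ "$code" -lt 400 ]; then
    echo "healthy"
  else
    echo "unhealthy"
  fi
}

# Against a running service you could feed it a real code, e.g.:
# probe_result "$(curl -s -o /dev/null -w '%{http_code}' http://localhost:8080/)"
probe_result 200   # healthy
probe_result 500   # unhealthy
```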

Self-host TopShelf

Since TopShelf is not an HTTP server by default, we have to self-host an HTTP endpoint inside it using OWIN (Open Web Interface for .NET).

To do so I've used the Topshelf.Owin NuGet package.

Start by installing the package:

Install-Package Topshelf.Owin

Then change your startup code and configure the OWIN endpoint to run on port 8080:

HostFactory.Run(configure =>
{
    configure.Service<MyHeadacheService>(service =>
    {
        service.ConstructUsing(s => new MyHeadacheService());
        service.WhenStarted(s => s.Start());
        service.WhenStopped(s => s.Stop());

        service.OwinEndpoint(app =>
        {
            app.Domain = "*";
            app.Port = 8080;
        });
    });
});

Now, create a new file, HealthcheckController, and define the API controller for health checks. I'm using Route("") which means the health check endpoint will be served at the root, i.e. localhost:8080.

The health check endpoint will report the service as healthy as long as the static property HealthCheckStatus.IsHealthy is true.

One issue with this kind of check is that it may not verify the actual responsiveness of the service, so it's important that the check performs its task in a similar way to how dependent services would. On the other hand, it shouldn't check the service's own dependencies, since that's the readiness probe's job, as I've mentioned earlier.

For testing purposes, we will add an additional endpoint that puts our service into an artificially broken state, so the health check returns a 500 status code instead of 200. Note that I'm using a Serilog logger that writes to the console every time the endpoint is hit.

public class HealthcheckController : ApiController
{
    [HttpGet, Route("")]
    public IHttpActionResult Healthcheck()
    {
        Log.Information("Healthcheck invoked using User-Agent: {UserAgent}", Request.Headers.UserAgent?.ToString());
        return HealthCheckStatus.IsHealthy ? Ok() : (IHttpActionResult) InternalServerError(new Exception("I'm not healthy. Restart me."));
    }

    [HttpGet, Route("break")]
    public IHttpActionResult Break()
    {
        Log.Information("Oops!");
        HealthCheckStatus.IsHealthy = false;
        return Ok();
    }
}

public static class HealthCheckStatus
{
    public static bool IsHealthy { get; set; } = true;
}

Defining liveness probe

Our facelifted TopShelf service is now ready to serve Kubernetes probes. Next, we have to tell Kubernetes where to health check our service.

In the pod specification we add a section telling Kubernetes to perform the liveness probe on the pod's path / and port 8080:

livenessProbe:
  httpGet:
    path: /
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5
  • initialDelaySeconds - tells the kubelet (the node agent) to wait 5 seconds before performing the first probe.
  • periodSeconds - tells the kubelet to perform a liveness probe every 5 seconds.

The default and minimum timeout for the probe request is 1 second; it can be configured using timeoutSeconds.
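For reference, here is what the probe section could look like with those knobs set explicitly; the timeoutSeconds and failureThreshold values below are illustrative, not from the original deployment:

```yaml
livenessProbe:
  httpGet:
    path: /
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5
  timeoutSeconds: 2     # fail the probe if there's no response within 2 seconds
  failureThreshold: 3   # restart the container after 3 consecutive failures
```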

To see other configuration options, have a look at the official docs.

Now we can deploy our pod(s) on a cluster and see the probes being sent to the pod.

kubectl logs -f podname

[20:25:41 INF] Healthcheck invoked using User-Agent: kube-probe/1.15

With our breaking endpoint (:8080/break) we can now simulate an artificial broken state.

Let's shell into a running pod:

kubectl exec -it podname cmd

and trigger failure:

curl -v localhost:8080/break

Wait long enough for the probe to run, then look for the failed probe in the pod's events:

kubectl describe pod podname

Events:
Type     Reason     Age                  From                     Message
----     ------     ----                 ----                     -------
Warning  Unhealthy  40s (x3 over 70s)    kubelet, ***             Liveness probe failed: HTTP probe failed with statuscode: 500

Verify that the pod has been restarted by listing it:

kubectl get pod podname

NAME      READY   STATUS    RESTARTS   AGE
podname   1/1     Running   1          1h

You can see the RESTARTS column shows just that.

Conclusion

By self-hosting a TopShelf service and adding a health check endpoint to it, you can leverage Kubernetes liveness probes, which can greatly improve the robustness and resilience of your service.
