Spotting Silent Pod Failures in Kubernetes with Grafana

#kubernetes #devops #programming #cloud

Unnoticed Pod Failures in Kubernetes

One critical issue in Kubernetes operations is pod deployment failure. Pods can fail for many reasons, such as CPU constraints, memory constraints, image pull errors, and node failures.

The main problem is that, when these failures go unnoticed, they degrade applications running in production and ultimately leave users with a bad impression.
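To make these failure modes concrete, here is a minimal Python sketch that scans the output of `kubectl get pods -A -o json` and flags the symptoms listed above. It relies on standard Pod status fields (`status.phase`, `status.conditions`, `containerStatuses[].state` and `lastState`); the function names and the `FAILURE_REASONS` set are hypothetical, illustrative rather than exhaustive (init containers, for example, are ignored).

```python
import json

# Container reasons that usually indicate a failing pod (illustrative, not exhaustive).
FAILURE_REASONS = {
    "ImagePullBackOff", "ErrImagePull", "CrashLoopBackOff",
    "CreateContainerConfigError", "OOMKilled", "Error",
}


def pod_failures(pod: dict) -> list[str]:
    """Return human-readable failure descriptions for a single Pod object."""
    name = pod.get("metadata", {}).get("name", "<unknown>")
    status = pod.get("status", {})
    problems = []

    # "Unknown" typically means the control plane cannot reach the pod's node.
    phase = status.get("phase")
    if phase in ("Failed", "Unknown"):
        problems.append(f"{name}: phase={phase}")

    # Scheduling failures, e.g. "Insufficient cpu" or "Insufficient memory".
    for cond in status.get("conditions", []):
        if cond.get("type") == "PodScheduled" and cond.get("reason") == "Unschedulable":
            problems.append(f"{name}: unschedulable ({cond.get('message', '')})")

    # Container-level failures such as image pull errors or crash loops.
    for cs in status.get("containerStatuses", []):
        state = cs.get("state", {})
        waiting = state.get("waiting", {}).get("reason")
        terminated = state.get("terminated", {}).get("reason")
        for reason in (waiting, terminated):
            if reason in FAILURE_REASONS:
                problems.append(f"{name}/{cs.get('name')}: {reason}")
        # While a container is waiting to restart, the previous exit reason
        # (e.g. OOMKilled) explains why it is crash-looping.
        last = cs.get("lastState", {}).get("terminated", {}).get("reason")
        if waiting and last in FAILURE_REASONS:
            problems.append(f"{name}/{cs.get('name')}: previous termination {last}")
    return problems


def failures_from_kubectl_json(text: str) -> list[str]:
    """Parse `kubectl get pods -o json` output (a List with an "items" array)."""
    items = json.loads(text).get("items", [])
    return [problem for pod in items for problem in pod_failures(pod)]
```

A script like this could be fed with `kubectl get pods -A -o json | python check_pods.py`, where the file name is hypothetical; a monitoring stack such as the one discussed below replaces this polling with continuous metric collection.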

How Can We Spot Failures?

Discord is a primary communication channel for many teams. When Kubernetes cluster failures are reported in Discord, they catch developers' attention, and developers can fix them right away. A notification pathway from the Kubernetes cluster to a Discord server would therefore surface failures that would otherwise go unnoticed.
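The Discord end of such a pathway can be as simple as an incoming webhook, which accepts a JSON POST with a `content` field of up to 2000 characters. The minimal Python sketch below, using only the standard library, builds that request from a list of failure strings; the webhook URL, message wording, and User-Agent value are placeholders, and the helper names are hypothetical.

```python
import json
import urllib.request

DISCORD_CONTENT_LIMIT = 2000  # Discord's maximum length for a message's "content"


def build_discord_request(webhook_url: str, failures: list[str]) -> urllib.request.Request:
    """Build (but do not send) a webhook POST summarizing pod failures."""
    lines = "\n".join(f"- {f}" for f in failures)
    content = f"Pod failures detected ({len(failures)}):\n{lines}"
    # Truncate instead of letting Discord reject an over-long message.
    if len(content) > DISCORD_CONTENT_LIMIT:
        content = content[: DISCORD_CONTENT_LIMIT - 3] + "..."
    body = json.dumps({"content": content}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={
            "Content-Type": "application/json",
            # Discord may reject urllib's default User-Agent, so set an explicit one.
            "User-Agent": "pod-failure-notifier/0.1",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> int:
    """Send the request and return the HTTP status (204 on success for webhooks)."""
    with urllib.request.urlopen(request, timeout=10) as resp:
        return resp.status
```

Keeping request construction separate from sending makes the message format easy to check without touching the network; in practice the webhook URL should come from a secret rather than source code.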

Finding the Pathway

We explored several options for building a notification pathway from the Kubernetes cluster to our communication channel. Multiple tools and products can do this, such as Botkube, Grafana and InfluxDB.

We chose Grafana because it is an open-source analytics and monitoring platform. It offers alerting, detailed dashboards for visualizing Kubernetes clusters, and customizable alerts with configurable thresholds. All of these features are available in the free version of Grafana.
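A threshold alert usually does not fire on a single bad sample: a Grafana alert rule compares a query result against a threshold and can require the breach to persist for a pending period before it fires. The Python sketch below models that idea with Normal, Pending and Firing states. It is a simplified illustration, not Grafana's implementation, and the metric (for example, container restarts per evaluation) and parameter values are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ThresholdRule:
    """Simplified threshold rule with a pending period (hypothetical model)."""
    threshold: float
    pending_seconds: float
    breach_started: Optional[float] = None
    state: str = "Normal"

    def evaluate(self, value: float, now: float) -> str:
        if value > self.threshold:
            # Remember when the current breach began.
            if self.breach_started is None:
                self.breach_started = now
            # Fire only once the breach has lasted for the whole pending period.
            if now - self.breach_started >= self.pending_seconds:
                self.state = "Firing"
            else:
                self.state = "Pending"
        else:
            # Any healthy sample resets the rule.
            self.breach_started = None
            self.state = "Normal"
        return self.state


rule = ThresholdRule(threshold=3, pending_seconds=120)
samples = [(0, 1), (60, 5), (120, 6), (180, 7), (240, 0)]  # (seconds, restarts)
states = [rule.evaluate(value, t) for t, value in samples]
print(states)  # ['Normal', 'Pending', 'Pending', 'Firing', 'Normal']
```

The pending period trades alert latency for fewer false alarms: a single restart spike stays in Pending, while a sustained breach reaches Firing, which is the point where a notification to a channel like Discord would be sent.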

Continue reading the full article here: https://journal.hexmos.com/spotting-kube-failures/