Max Espinoza


Rancher, Prometheus, and Alert Manager in 5 minutes

I recently had to take a look at how to set up monitoring and alerting for a Kubernetes cluster. I knew that Rancher offered monitoring via Prometheus and Alert Manager, but never had the chance to sit down and do it.

This is my attempt to explain the important bits to anyone looking to implement monitoring and alerting (without just throwing the docs back at you).

First, let's talk about the main players.

The Tooling Chain

Prometheus

Simply put, Prometheus can be thought of as a cron job with a database.

  • It will scrape endpoints for metrics at set intervals

  • It will run queries against itself at set intervals and record those results for later use (see the sketch below)
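
To make that concrete, here's a minimal sketch of a standalone prometheus.yml. The job name, target, and rule path are illustrative placeholders, not something Rancher generates for you.

# Minimal prometheus.yml sketch -- job name, target, and rule path are placeholders
global:
  scrape_interval: 30s      # how often to scrape each target for metrics
  evaluation_interval: 30s  # how often to evaluate recording/alerting rules

scrape_configs:
  - job_name: example-app
    static_configs:
      - targets: ["example-app.default.svc:8080"]  # hypothetical metrics endpoint

rule_files:
  - /etc/prometheus/rules/*.yaml  # recording and alerting rules live here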

Alert Manager

Alert Manager is a watcher of Prometheus alerts and a notifier with support for tools like xMatters, Slack, PagerDuty, etc.

  • It will look at alerts and notify based on how you configure it

  • You can silence triggered alerts for periods of time

  • For more advanced use cases, you can de-duplicate alerts from multiple sources

Rancher

It's what Ubuntu is to Linux. For this post, you can think of it as the thing that has already installed Alert Manager and Prometheus on your cluster. It also has a nice UI for you to configure those tools without writing the underlying Kubernetes CRDs.

What's neat about Rancher is that when monitoring is enabled, Prometheus is already scraping your cluster for a wide variety of metrics you can set notifications on. However, if you are not using Rancher, be sure to configure Prometheus to scrape targets for the metrics you'd like to notify on. I won't go deep on this here, but a Google search will turn up plenty of resources on how to configure Prometheus targets (there's also a small sketch after the terms below).

Potentially unfamiliar terms:

  • Scraping: Polling and collecting data exposed at an endpoint

  • Targets: The endpoints Prometheus is scraping

  • CRD: Custom Resource Definition -- a custom Kubernetes API object used to configure something on the cluster
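
If your cluster runs the Prometheus Operator (which Rancher's monitoring is built on) and you want to add your own target, a ServiceMonitor is the usual way to do it. This is only a sketch; the names, labels, and port are placeholders for whatever your app actually exposes.

# ServiceMonitor sketch -- metadata, labels, and port are placeholders
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app
  namespace: default
spec:
  selector:
    matchLabels:
      app: example-app      # must match the labels on your app's Service
  endpoints:
    - port: metrics         # the named Service port exposing /metrics
      interval: 30s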

Prometheus' hands-off alerting

Prometheus offers alerting in much the same way that books offer knowledge: it's there, but it's on you to read and act on it. As mentioned before, Prometheus can be configured to query against itself at set intervals and record those results. That's as far as Prometheus goes when it comes to alerting. It's just making notes of which alerts were triggered.


Above: How Prometheus operates, glossing over some details

How exactly do you tell Prometheus what to make note of, then? PrometheusRules, that's how. PrometheusRules are the CRDs that hold the logic of what to alert on (with Alert Manager's help).


Above: Basic pieces of a PrometheusRule.

alert: ApiAppPodCrashing
annotations:
  message: Container {{ $labels.container }} in pod {{ $labels.pod }} in {{ $labels.namespace }} is crashing perpetually.
  recipient: Infrastructure Team
expr: sum by (namespace, pod, container) (kube_pod_container_status_waiting_reason{reason="CrashLoopBackOff", namespace!~"cattle-.*system|fleet-system|node-logging|cis-operator-system|ingress-nginx|kube-system"}) > 0
for: 0s
labels:
  priority: critical
  team: infra

Above: A more detailed view of the alert YAML, with a PromQL expression to detect a crashing app.
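
For completeness, that alert doesn't live on its own; under the Prometheus Operator it sits inside a PrometheusRule resource, roughly like this (the metadata values are placeholders):

# Sketch of the CRD that wraps the alert above -- metadata values are placeholders
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: api-app-alerts
  namespace: cattle-monitoring-system  # placeholder; use whatever namespace your setup watches
spec:
  groups:
    - name: api-app.rules
      rules:
        # ...the alert block shown above (alert, annotations, expr, for, labels) goes here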

Alert Manager doing the actual alerting

Alert Manager is the tooling that reads the alerts Prometheus has recorded and pushes them to notification tools like Slack and xMatters.

What you need to know about Alert Manager is that its configuration is done through the creation of Routes and Receivers. Receivers can be thought of as endpoints to send alerts to: Slack rooms, xMatters integrations, and a bunch more. Routes house the logic of "if you see an alert with these properties, then send these alert records to these receivers."


Above: How routes, receivers, and alerts relate to one another

What's important to note here is that we can use labels to route the alerts to different places. In our setup, we use this to route alerts to various Slack rooms (depending on the app) and to trigger xMatters notifications on priority: critical.
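
Here's a sketch of what that looks like in Alert Manager's configuration. The receiver names, Slack channel, and URLs are placeholders, and newer Alert Manager versions also accept a matchers list in place of match.

# Alertmanager config sketch -- receiver names, channels, and URLs are placeholders
route:
  receiver: default-slack              # fallback if nothing below matches
  routes:
    - match:
        priority: critical             # page xMatters on critical alerts...
      receiver: xmatters-critical
      continue: true                   # ...and keep evaluating sibling routes so a Slack route can still match
    - match:
        team: infra                    # infra-labeled alerts land in the infra Slack room
      receiver: infra-slack
receivers:
  - name: default-slack
    slack_configs:
      - channel: "#alerts"
        api_url: https://hooks.slack.com/services/REPLACE_ME
  - name: infra-slack
    slack_configs:
      - channel: "#infra-alerts"
        api_url: https://hooks.slack.com/services/REPLACE_ME
  - name: xmatters-critical
    webhook_configs:
      - url: https://example.xmatters.com/api/integration/REPLACE_ME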

Configuration all the way down

Monitoring and alerting is configuration files all the way down. PrometheusRules, Routes, and Receivers are the configurations you'd likely be working with.

There is more than one way to actually create or update these configurations. You can apply the PrometheusRule CRDs to the cluster directly and edit the Alert Manager configuration to define the Routes/Receivers. But if you have Rancher, life gets a lot easier.

If you go to the Cluster Explorer > Monitoring tab, you'll find these configurations can all be set up from within the Rancher UI.


Above: View of monitoring page in Rancher.

In short

I've scratched the surface of what you can do and glossed over a lot of detail that you'll find better docs for, but in essence, if you have the right tooling set up (read: Rancher Monitoring), you're only three configuration files away from having monitoring and alerting on services in your Kubernetes cluster.
