
Aleksi Waldén for Polar Squad


Prometheus Observability Platform: Alert routing

Alertmanager is a component, usually bundled with Prometheus, that routes alerts to receivers such as Slack, email, and PagerDuty. It uses a routing tree to send alerts to one or multiple receivers.

Routes define which receivers each alert should be sent to, and you can define matching rules for the routes. The rules are evaluated from top to bottom, and alerts are sent to the matching receivers. Usually, the matchers block is used to match a label name and value for a certain receiver. Notification integrations are configured for each receiver, and multiple options are available, such as email_configs, slack_configs, and webhook_configs.
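For illustration, here is a minimal sketch of such a configuration; the Slack webhook URL and channel are placeholder values, not part of this setup:

route:
  receiver: default-receiver            # fallback when no child route matches
  routes:
    # evaluated top to bottom; the first matching route wins
    - matchers:
        - team="test-team"
      receiver: test-team-receiver

receivers:
  - name: default-receiver              # no integration configured, acts as a sink
  - name: test-team-receiver
    slack_configs:
      - api_url: https://hooks.slack.com/services/XXX   # placeholder webhook URL
        channel: '#test-team-alerts'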

Alertmanager has a web UI that can be used to view current alerts and silence them if needed.

With a platform setup, we usually don’t want multiple Alertmanagers, so we disable the provisioning of the bundled Alertmanager in Prometheus deployments that would otherwise include one automatically. Instead, we use one centralised Alertmanager, for example inside a Kubernetes cluster that is dedicated to monitoring platform usage.
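For example, in the kube-prometheus-stack Helm chart (assuming that chart is in use), the bundled Alertmanager can be switched off with a values override:

alertmanager:
  enabled: false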

Demo

This example assumes that you have completed the previous parts of this series, as the components set up there (VictoriaMetrics and vmalert with an alert rule deployed) are needed.

Now that we have an alert defined and deployed to vmalert, we can add Alertmanager to our platform. Because we are creating this with a platform aspect in mind, we will install Alertmanager as a separate resource, and not as a part of the kube-platform-stack. We will use a tool called amtool, which is bundled with Alertmanager, to test our alert routing.

We can install Alertmanager with the following Helm chart:

helm install alertmanager prometheus-community/alertmanager --create-namespace --namespace alertmanager
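This assumes the prometheus-community Helm repository is already available from an earlier part; if it is not, it can be added first:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update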

We can now port-forward the alertmanager service to access the Alertmanager web UI at http://localhost:9090:

kubectl port-forward -n alertmanager services/alertmanager 9090:9093

To trigger a test alert, we can use the following command from another terminal tab while keeping the port-forward running:

curl -H "Content-Type: application/json" -d '[{"labels":{"alertname":"TestAlert"}}]' localhost:9090/api/v1/alerts
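This command targets the v1 API, which newer Alertmanager releases have removed; on those versions the same payload can be sent to the v2 endpoint instead:

curl -H "Content-Type: application/json" -d '[{"labels":{"alertname":"TestAlert"}}]' localhost:9090/api/v2/alerts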

We can now use amtool to list the currently firing alerts:

amtool alert query --alertmanager.url=http://localhost:9090
---
Alertname   Starts At                Summary  State   
TestAlert   2023-07-07 07:23:55 UTC           active 
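As mentioned earlier, alerts can be silenced from the web UI, but amtool can do this as well. A sketch that silences our test alert for an hour (the comment text is just an example):

amtool silence add alertname="TestAlert" --duration=1h --comment="testing silences" --alertmanager.url=http://localhost:9090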

Let's add a test receiver and routing for it. Below is an example of the configuration we want to pass to Alertmanager in Helm values format.

config:
  receivers:
    - name: default-receiver
    - name: test-team-receiver

  route:
    receiver: 'default-receiver'
    group_wait: 30s
    group_interval: 5m
    repeat_interval: 4h
    routes:
      - receiver: 'test-team-receiver'
        matchers:
        - team="test-team"
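The contents of the config key map directly to a regular alertmanager.yml. If you save that inner section to a file (alertmanager.yml below is just an example name), you can validate it offline before deploying:

amtool check-config alertmanager.yml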

I have converted the above into a JSON one-liner so we can pass it to Helm without having to create an intermediate file.

helm upgrade alertmanager prometheus-community/alertmanager --namespace alertmanager --set-json 'config.receivers=[{"name":"default-receiver"},{"name":"test-team-receiver"}]' --set-json 'config.route={"receiver":"default-receiver","group_wait":"30s","group_interval":"5m","repeat_interval":"4h","routes":[{"receiver":"test-team-receiver","matchers":["team=\"test-team\""]}]}'
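Note that --set-json requires Helm 3.10 or newer. Once the upgrade has gone through, and with the port-forward from earlier still running, we can verify that the new configuration is active:

amtool config show --alertmanager.url=http://localhost:9090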

We can now use amtool to test that an alert that has the label team=test-team gets routed to the test-team-receiver:

amtool config routes test --alertmanager.url=http://localhost:9090 team=test-team
---
test-team-receiver
amtool config routes test --alertmanager.url=http://localhost:9090 team=test     
---
default-receiver

We have now set up an Alertmanager that can route alerts depending on the value of the team label.

Next, we need to update vmalert to route alerts to the Alertmanager using the cluster-local address of the alertmanager service:

helm upgrade vmalert vm/victoria-metrics-alert --namespace victoriametrics --reuse-values --set server.notifier.alertmanager.url="http://alertmanager.alertmanager.svc.cluster.local:9093"
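To confirm that vmalert picked up the new notifier address, we can check its startup logs, which typically print the configured notifier URL. The deployment name below is an assumption and depends on your release name, so adjust as needed:

kubectl -n victoriametrics get deployments
kubectl -n victoriametrics logs deploy/vmalert-victoria-metrics-alert-server | grep -i notifier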

Now we can run a pod that will be crashing to increment the kube_pod_container_status_restarts_total metric by creating a pod that has a typo in the sleep command:

kubectl run crashpod --image busybox:latest --command -- slep 1d
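The pod fails to start and ends up in CrashLoopBackOff, which increments the restart counter that our alert rule watches:

kubectl get pod crashpod --watch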

Next, we port-forward the Alertmanager service again. When we navigate to http://localhost:9090, we should see the alert there:

kubectl port-forward -n alertmanager services/alertmanager 9090:9093

(Screenshot: the Alertmanager web UI showing the firing alert)

We have now set up Alertmanager as our tool for routing alerts from the vmalert component.

Next part: Prometheus Observability Platform: Handling multiple regions
