Alertmanager is a component usually deployed alongside Prometheus to handle routing alerts to receivers such as Slack, e-mail, and PagerDuty. It uses a routing tree to send each alert to one or more receivers.
Routes define which receivers each alert should be sent to. You can define matching rules for the routes; the rules are evaluated from top to bottom, and alerts are sent to the matching receivers. Usually, the matchers block is used to match a label name and value for a certain receiver. Notification integrations are configured per receiver, and there are multiple options available, such as email_configs, slack_configs, and webhook_configs.
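For illustration, a minimal sketch of an Alertmanager configuration with a routing tree and two receivers could look like the following (the e-mail address, Slack webhook URL, and channel name are placeholders, and email_configs would additionally need SMTP settings under the global block):

route:
  receiver: default-receiver
  routes:
    # alerts labelled team="team-a" go to Slack, everything else to the default receiver
    - receiver: team-a-slack
      matchers:
        - team="team-a"
receivers:
  - name: default-receiver
    email_configs:
      - to: oncall@example.com
  - name: team-a-slack
    slack_configs:
      - api_url: https://hooks.slack.com/services/XXX
        channel: '#team-a-alerts'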
Alertmanager has a web UI that can be used to view current alerts and silence them if needed.
With a platform setup, we usually don’t want to run multiple Alertmanagers, so we disable the provisioning of the additional Alertmanagers that some Prometheus deployments include automatically. Instead, we use one centralised Alertmanager, running for example in the Kubernetes cluster that is aimed at monitoring platform usage.
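For example, if Prometheus is deployed with the kube-prometheus-stack Helm chart, the bundled Alertmanager can be turned off with a values override along these lines (a sketch assuming the chart's default value names):

alertmanager:
  enabled: false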
Demo
Prerequisites
This example assumes that you have completed the following steps, as the components from those are needed:
- Prometheus Observability Platform: Prometheus
- Prometheus Observability Platform: Long-term storage
- Prometheus Observability Platform: Alerts
Now that we have an alert defined and deployed to vmalert, we can add Alertmanager to our platform. Because we are creating this with a platform aspect in mind, we will install Alertmanager as a separate resource and not as a part of the kube-prometheus-stack. We will also use a tool called amtool, which is bundled with Alertmanager, to query alerts and to test our routing configuration.
We can install the Alertmanager with the following Helm chart:
helm install alertmanager prometheus-community/alertmanager --create-namespace --namespace alertmanager
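If the prometheus-community chart repository has not been added to Helm yet, add it first (assuming the standard repository location):

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update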
We can now port-forward the Alertmanager service to access the Alertmanager web UI at http://localhost:9090:
kubectl port-forward -n alertmanager services/alertmanager 9090:9093
To trigger a test alert, we can use the following command from another terminal tab while keeping the port-forwarding on:
curl -H "Content-Type: application/json" -d '[{"labels":{"alertname":"TestAlert"}}]' localhost:9090/api/v1/alerts
We can now use amtool to list the currently firing alerts:
amtool alert query --alertmanager.url=http://localhost:9090
---
Alertname Starts At Summary State
TestAlert 2023-07-07 07:23:55 UTC active
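As mentioned earlier, alerts can also be silenced. Besides the web UI, silences can be managed with amtool as well; a sketch of silencing the test alert for an hour and then listing active silences:

amtool silence add alertname=TestAlert --comment="Testing silences" --duration=1h --alertmanager.url=http://localhost:9090
amtool silence query --alertmanager.url=http://localhost:9090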
Let's add a test receiver and a route for it. Below is an example of the configuration we want to pass to Alertmanager, in Helm values format:
config:
  receivers:
    - name: default-receiver
    - name: test-team-receiver
  route:
    receiver: 'default-receiver'
    group_wait: 30s
    group_interval: 5m
    repeat_interval: 4h
    routes:
      - receiver: 'test-team-receiver'
        matchers:
          - team="test-team"
I have converted the above into a JSON one-liner so we can pass it to Helm without having to create an intermediate file:
helm upgrade alertmanager prometheus-community/alertmanager --namespace alertmanager --set-json 'config.receivers=[{"name":"default-receiver"},{"name":"test-team-receiver"}]' --set-json 'config.route={"receiver":"default-receiver","group_wait":"30s","group_interval":"5m","repeat_interval":"4h","routes":[{"receiver":"test-team-receiver","matchers":["team=\"test-team\""]}]}'
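Alternatively, the same values could be kept in a file (here alertmanager-values.yaml, a made-up filename) and passed to Helm with the -f flag instead of the --set-json flags:

helm upgrade alertmanager prometheus-community/alertmanager --namespace alertmanager -f alertmanager-values.yaml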
We can now use amtool to test that an alert that has the label team=test-team gets routed to the test-team-receiver:
amtool config routes test --alertmanager.url=http://localhost:9090 team=test-team
---
test-team-receiver
amtool config routes test --alertmanager.url=http://localhost:9090 team=test
---
default-receiver
We have now set up an Alertmanager that can route alerts depending on the value of the team label.
Next, we need to update vmalert to send alerts to the Alertmanager using the cluster-local address of the Alertmanager service:
helm upgrade vmalert vm/victoria-metrics-alert --namespace victoriametrics --reuse-values --set server.notifier.alertmanager.url="http://alertmanager.alertmanager.svc.cluster.local:9093"
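To sanity-check that the notifier URL was applied to the release, we can inspect the values Helm now has for it:

helm get values vmalert --namespace victoriametrics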
Now we can create a pod that will keep crashing, incrementing the kube_pod_container_status_restarts_total metric, by giving the pod a typo in the sleep command:
kubectl run crashpod --image busybox:latest --command -- slep 1d
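We can watch the pod go into a restart loop (CrashLoopBackOff) and the restart count climb with:

kubectl get pod crashpod --watch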
Next, we port-forward the Alertmanager service again. Once the alert fires, we should see it when we navigate to http://localhost:9090:
kubectl port-forward -n alertmanager services/alertmanager 9090:9093
With that, we have set up Alertmanager as our tool for routing the alerts coming from the vmalert component.
Next part: Prometheus Observability Platform: Handling multiple regions