Prometheus: Alertmanager Web UI alerts Silence

Arseny Zinchenko
DevOps, cloud and infrastructure engineer. Love Linux, OpenSource, and AWS.
Originally published at rtfm.co.ua · 4 min read

The frequency of re-sending active alerts via Alertmanager is configured with the repeat_interval option in the /etc/alertmanager/config.yml file.

We have this interval set to 15 minutes, and as a result, we get notifications about active alerts in our Slack every fifteen minutes.
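For reference, here is a minimal sketch of where repeat_interval lives in the Alertmanager's routing configuration; the receiver name and the other intervals are example values, not necessarily ours:

```yaml
route:
  # receiver name is an example
  receiver: 'slack-notifications'
  group_wait: 30s
  group_interval: 5m
  # re-send notifications for still-firing alerts every 15 minutes
  repeat_interval: 15m
```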

Still, some alerts are "known issues": we have already started investigating or fixing them, but the alert keeps being sent to Slack.

To mute those alerts and prevent them from being sent over and over, they can be marked as "silenced".

An alert can be silenced via the Alertmanager's Web UI, see the documentation.

So, what we will do in this post:

  • update Alertmanager's startup options to enable the Web UI
  • update an NGINX virtual host to get access to the Alertmanager's Web UI
  • check and configure the Prometheus server to send alerts
  • add a test alert to check how to silence it

Alertmanager Web UI configuration

We have our Alertmanager running from a Docker Compose file, so let's add two parameters to its command field: web.route-prefix, which specifies a URI for the Alertmanager Web UI, and web.external-url, which sets the full URL.

The full URL will look like dev.monitor.example.com/alertmanager. Add the parameters:

...
  alertmanager:
    image: prom/alertmanager:v0.21.0
    networks:
      - prometheus
    ports:
      - 9093:9093
    volumes:
      - /etc/prometheus/alertmanager_config.yml:/etc/alertmanager/config.yml
    command:
      - '--config.file=/etc/alertmanager/config.yml'
      - '--web.route-prefix=/alertmanager'
      - '--web.external-url=https://dev.monitor.example.com/alertmanager'
...

Alertmanager is working in a Docker container and is accessible via localhost:9093 from the monitoring host:

root@monitoring-dev:/home/admin# docker ps | grep alert
24ae3babd644 prom/alertmanager:v0.21.0 "/bin/alertmanager -…" 3 seconds ago Up 1 second 0.0.0.0:9093->9093/tcp prometheus_alertmanager_1

In the NGINX virtual host config, add a new upstream pointing to the Alertmanager's Docker container:

...
upstream alertmanager {
    server 127.0.0.1:9093;
}
...

Also, add a new location block in this file to proxy all requests for dev.monitor.example.com/alertmanager to this upstream:

...
    location /alertmanager {

        proxy_redirect off;            
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_pass http://alertmanager$request_uri;
    }
...

Save and reload NGINX and Alertmanager.

Now, open the https://dev.monitor.example.com/alertmanager URL, and you should see the Alertmanager Web UI:

There are no alerts here yet; wait for Prometheus to send new ones.

Prometheus: "Error sending alert" err="bad response status 404 Not Found"

After a new alert appears in the Prometheus server, you can see the following error in its log:

caller=notifier.go:527 component=notifier alertmanager=http://alertmanager:9093/api/v1/alerts count=3 msg="Error sending alert" err="bad response status 404 Not Found"

It happens because currently we have the alertmanagers configured as:

...
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - alertmanager:9093
...

So, we need to add the URI of the Alertmanager by using the path_prefix setting:

...
alerting:
  alertmanagers:
  - path_prefix: "/alertmanager/"
    static_configs:
    - targets:
      - alertmanager:9093
...

Restart Prometheus, and wait for alerts again:

This time, you should see them in the Alertmanager Web UI too:

Alertmanager: an alert Silence

Now, let's add a Silence for an alert to stop it from being re-sent.

For example, to disable re-sending of the alertname="APIendpointProbeSuccessCritical" alert, click the + button on its right side:

Then click the Silence button:

The alertname label was added to the silencing condition with the default duration of 2 hours; add an author and a description of why the alert was silenced:

Click Create, and it's done:
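The same silence can also be created without the Web UI, by POSTing to the Alertmanager's v2 silences API. A sketch for the same alert; the createdBy and comment values here are placeholders:

```shell
# Build a 2-hour silence payload for the Alertmanager's v2 silences API;
# the author and comment are example values.
START=$(date -u +%Y-%m-%dT%H:%M:%SZ)
END=$(date -u -d '+2 hours' +%Y-%m-%dT%H:%M:%SZ)
PAYLOAD=$(cat <<EOF
{
  "matchers": [
    {"name": "alertname", "value": "APIendpointProbeSuccessCritical", "isRegex": false}
  ],
  "startsAt": "${START}",
  "endsAt": "${END}",
  "createdBy": "admin",
  "comment": "Known issue, fix in progress"
}
EOF
)
echo "$PAYLOAD"
# to actually create the silence:
# curl -s -XPOST -H 'Content-Type: application/json' \
#   -d "$PAYLOAD" https://dev.monitor.example.com/alertmanager/api/v2/silences
```

On success, the API returns the new silence's ID in a silenceID field.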

You can check this alert via API now:

root@monitoring-dev:/home/admin# curl -s http://localhost:9093/alertmanager/api/v1/alerts | jq '.data[1]'
{
  "labels": {
    "alertname": "APIendpointProbeSuccessCritical",
    "instance": "http://push.example.com",
    "job": "blackbox",
    "monitor": "monitoring-dev",
    "severity": "critical"
  },
  "annotations": {
    "description": "Cant access API endpoint http://push.example.com!",
    "summary": "API endpoint down!"
  },
  "startsAt": "2020-12-30T11:25:25.953289015Z",
  "endsAt": "2020-12-30T11:43:25.953289015Z",
  "generatorURL": "https://dev.monitor.example.com/prometheus/graph?g0.expr=probe_success%7Binstance%21%3D%22https%3A%2F%2Fokta.example.com%22%2Cjob%3D%22blackbox%22%7D+%21%3D+1&g0.tab=1",
  "status": {
    "state": "suppressed",
    "silencedBy": [
      "ec11c989-f66e-448e-837c-d788c1db8aa4"
    ],
    "inhibitedBy": null
  },
  "receivers": [
    "critical"
  ],
  "fingerprint": "01e79a8dd541cf69"
}

So, this alert will not be sent to Slack or anywhere else, because of the "state": "suppressed" field:

…
  "status": {
    "state": "suppressed",
    "silencedBy": [
      "ec11c989-f66e-448e-837c-d788c1db8aa4"
    ],
…


Done.
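As a side note, silences can also be listed and expired over the same API, without the Web UI. A sketch, assuming the Alertmanager is reachable on localhost:9093 behind the /alertmanager prefix as configured above:

```shell
# Base URL of the Alertmanager API; host and prefix are the ones used above.
AM="http://localhost:9093/alertmanager"
# list all silences:
#   curl -s "$AM/api/v2/silences"
# expire (delete) a silence by its ID, e.g. the one seen in "silencedBy":
#   curl -s -XDELETE "$AM/api/v2/silence/ec11c989-f66e-448e-837c-d788c1db8aa4"
echo "Silences endpoint: $AM/api/v2/silences"
```

An expired silence stays visible in the Web UI for a while under the Expired tab, so the change is easy to verify.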


