Prometheus Alert-manager with Slack, PagerDuty, and Gmail

1. Introduction
The Alertmanager handles alerts sent by client applications such as the Prometheus server. It takes care of deduplicating, grouping, and routing them to the correct receiver integration such as email, PagerDuty, or OpsGenie. It also takes care of silencing and inhibition of alerts.
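Grouping and inhibition are both driven by Alertmanager's configuration file. As a rough illustration (the label names and timings below are placeholders, not part of this tutorial's setup), a route can bundle related alerts into one notification and an inhibit rule can mute warnings while a related critical alert is firing:

route:
  # Bundle alerts that share these labels into a single notification
  group_by: ['alertname', 'job']
  group_wait: 30s       # wait a little so related alerts arrive together
  group_interval: 5m    # how often to send updates for an existing group
  receiver: 'slack-notifications'

inhibit_rules:
  # While a critical alert fires for an instance, mute warnings for that same instance
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['instance']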

2. Run the Alertmanager service
Add an alertmanager service to the Docker Compose file and update the stack.

alertmanager:
    image: prom/alertmanager:v0.22.2
    container_name: alertmanager
    volumes:
    - /etc/alertmanager:/etc/alertmanager
    command:
    - '--config.file=/etc/alertmanager/config.yml'
    - '--storage.path=/alertmanager'
    ports:
    - 9093:9093
    restart: unless-stopped
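To apply the change, recreate the stack. Assuming a standard docker-compose.yml in the working directory (adjust the file name and paths to your own setup), something like this should do:

# Pull the image and (re)create only the alertmanager service
$ docker-compose up -d alertmanager

# Sanity check: Alertmanager should report itself healthy
$ curl http://localhost:9093/-/healthy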

3. Create alerting rules in Prometheus
Move to the server subfolder, open it in your code editor, and create a new rules file. In rules.yml you specify the conditions under which you would like to be alerted.

$ sudo nano /etc/prometheus/rules.yml

After you’ve decided on your alerting conditions, you need to specify them in rules.yml. Its content is going to be the following:

  • Trigger an alert if any of the monitoring targets (node-exporter and cAdvisor) are down for more than 1 minute.
groups:
- name: AllInstances
  rules:
  - alert: InstanceDown
    # Condition for alerting
    expr: up == 0
    for: 1m
    # Annotation - additional informational labels to store more information
    annotations:
      title: 'Instance {{ $labels.instance }} down'
      description: '{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute.'
    # Labels - additional labels to be attached to the alert
    labels:
      severity: 'critical'
  • Trigger an alert if the Docker host CPU is under high load for more than 30 seconds
  - alert: high_cpu_load
    # Condition for alerting
    expr: node_load1 > 1.5
    for: 30s
    # Annotation - additional informational labels to store more information
    annotations:
      title: 'Server under high load'
      description: "Docker host is under high load, the avg load 1m is at {{ $value}}. Reported by instance {{ $labels.instance }} of job {{ $labels.job }}."
    # Labels - additional labels to be attached to the alert
    labels:
      severity: 'warning'
  • Trigger an alert if the Docker host memory is almost full
  - alert: high_memory_load
    # Condition for alerting
    expr: (sum(node_memory_MemTotal) - sum(node_memory_MemFree + node_memory_Buffers + node_memory_Cached) ) / sum(node_memory_MemTotal) * 100 > 85
    for: 30s
    # Annotation - additional informational labels to store more information
    annotations:
      title: 'Server memory is almost full'
      description: "Docker host memory usage is {{ humanize $value}}%. Reported by instance {{ $labels.instance }} of job {{ $labels.job }}."
    # Labels - additional labels to be attached to the alert
    labels:
      severity: 'warning'
  • Trigger an alert if the Docker host storage is almost full
  - alert: high_storage_load
    # Condition for alerting
    expr: (node_filesystem_size{fstype="aufs"} - node_filesystem_free{fstype="aufs"}) / node_filesystem_size{fstype="aufs"}  * 100 > 85
    for: 30s
    # Annotation - additional informational labels to store more information
    annotations:
      title: 'Server storage is almost full'
      description: "Docker host storage usage is {{ humanize $value}}%. Reported by instance {{ $labels.instance }} of job {{ $labels.job }}."
    # Labels - additional labels to be attached to the alert
    labels:
      severity: 'warning'
  • Trigger an alert if a container is down for more than 30 seconds
  - alert: redis_down
    # Condition for alerting
    expr: absent(container_memory_usage_bytes{name="redis"})
    for: 30s
    # Annotation - additional informational labels to store more information
    annotations:
      title: 'Redis down'
      description: "Redis container is down for more than 30 seconds."
    # Labels - additional labels to be attached to the alert
    labels:
      severity: 'critical'
  • Trigger an alert if a container is using more than 10% of total CPU cores for more than 30 seconds
  - alert: redis_high_cpu
    # Condition for alerting
    expr: sum(rate(container_cpu_usage_seconds_total{name="redis"}[1m])) / count(node_cpu{mode="system"}) * 100 > 10
    for: 30s
    # Annotation - additional informational labels to store more information
    annotations:
      title: 'Redis high CPU usage'
      description: "Redis CPU usage is {{ humanize $value}}%."
    # Labels - additional labels to be attached to the alert
    labels:
      severity: 'warning'
  • Trigger an alert if a container is using more than 1.2 GB of RAM for more than 30 seconds.
  - alert: redis_high_memory
    # Condition for alerting
    expr: sum(container_memory_usage_bytes{name="redis"}) > 1200000000
    for: 30s
    # Annotation - additional informational labels to store more information
    annotations:
      title: 'Redis high memory usage'
      description: "Redis memory consumption is at {{ humanize $value}}."
    # Labels - additional labels to be attached to the alert
    labels:
      severity: 'warning'
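Prometheus only evaluates these rules and forwards the resulting alerts if it knows about both the rules file and the Alertmanager instance. If your prometheus.yml does not already contain them, you will need sections along these lines (the alertmanager:9093 target assumes Prometheus and Alertmanager share the same Compose network; adjust it to your environment):

# prometheus.yml (excerpt)
rule_files:
  - /etc/prometheus/rules.yml

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager:9093

You can also validate the rules file before reloading Prometheus. promtool ships inside the prom/prometheus image (the container name prometheus is an assumption here):

$ docker exec prometheus promtool check rules /etc/prometheus/rules.yml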

4. Set up Slack alerts
If you want to receive notifications via Slack, you should be part of a Slack workspace. To set up alerting in your Slack workspace, you’re going to need a Slack API URL. Go to Slack -> Administration -> Manage apps.
[Screenshot: Slack workspace menu, Administration -> Manage apps]

In the Manage apps directory, search for Incoming WebHooks and add it to your Slack workspace.
[Screenshot: Incoming WebHooks in the Slack App Directory]

Next, specify in which channel you’d like to receive notifications from Alertmanager (I’ve created a #monitoring-instances channel). After you confirm and add the Incoming WebHooks integration, the webhook URL (which is your Slack API URL) is displayed. Copy it.
[Screenshot: Incoming WebHooks configuration showing the generated webhook URL]
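Before wiring it into Alertmanager, you can optionally verify the webhook by posting a test message to it directly (replace the URL with the one you just copied):

$ curl -X POST -H 'Content-type: application/json' \
    --data '{"text": "Test message from Incoming WebHooks"}' \
    https://hooks.slack.com/services/YOUR/WEBHOOK/URL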

5. Set up Alertmanager
The AlertManager service is responsible for handling alerts sent by the Prometheus server. AlertManager can send notifications via email, Pushover, Slack, HipChat or any other system that exposes a webhook interface.
The notification receivers can be configured in the alertmanager/config.yml file. Copy the Slack webhook URL into the slack_api_url field and specify a Slack channel.

$ sudo nano /etc/alertmanager/config.yml
global:
  resolve_timeout: 1m
  slack_api_url: 'https://hooks.slack.com/services/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'

route:
  receiver: 'slack-notifications'

receivers:
- name: 'slack-notifications'
  slack_configs:
  - channel: '#monitoring-instances'
    send_resolved: true

Reload the configuration by sending a POST request to the /-/reload endpoint: curl -X POST http://localhost:9093/-/reload. In a couple of minutes (after you stop at least one of your instances), you should receive your alert notifications through Slack, like this:
[Screenshot: Alertmanager alert notification in Slack]
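If you’d rather not stop a real instance every time you want to test the pipeline, you can also push a synthetic alert straight into Alertmanager's v2 API; the alert name and labels below are made up purely for the test:

$ curl -X POST http://localhost:9093/api/v2/alerts \
    -H 'Content-Type: application/json' \
    -d '[{"labels": {"alertname": "TestAlert", "severity": "warning", "instance": "manual-test"},
         "annotations": {"title": "Test alert", "description": "Injected by hand to check Slack routing."}}]'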

If you would like to improve your notifications and make them look nicer, you can use the template below, or use this tool and create your own.

global:
  resolve_timeout: 1m
  slack_api_url: 'https://hooks.slack.com/services/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'

route:
  receiver: 'slack-notifications'

receivers:
- name: 'slack-notifications'
  slack_configs:
  - channel: '#monitoring-instances'
    send_resolved: true
    icon_url: https://avatars3.githubusercontent.com/u/3380462
    title: |-
      [{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}] {{ .CommonLabels.alertname }} for {{ .CommonLabels.job }}
      {{- if gt (len .CommonLabels) (len .GroupLabels) -}}
        {{" "}}(
        {{- with .CommonLabels.Remove .GroupLabels.Names }}
          {{- range $index, $label := .SortedPairs -}}
            {{ if $index }}, {{ end }}
            {{- $label.Name }}="{{ $label.Value -}}"
          {{- end }}
        {{- end -}}
        )
      {{- end }}
    text: >-
      {{ range .Alerts -}}
      *Alert:* {{ .Annotations.title }}{{ if .Labels.severity }} - `{{ .Labels.severity }}`{{ end }}
      *Description:* {{ .Annotations.description }}
      *Details:*
        {{ range .Labels.SortedPairs }} • *{{ .Name }}:* `{{ .Value }}`
        {{ end }}
      {{ end }}
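Templates like this are easy to break with a stray brace, so it is worth validating the configuration before reloading. The prom/alertmanager image ships with amtool (the container name matches the Compose service defined earlier):

$ docker exec alertmanager amtool check-config /etc/alertmanager/config.yml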

And this is the final result:
[Screenshot: formatted Alertmanager notification in Slack]

6. Set up PagerDuty Alerts
PagerDuty is one of the most well-known incident response platforms for IT departments. To set up alerting through PagerDuty, you need to create an account there. (PagerDuty is a paid service, but you can always do a 14-day free trial.) Once you’re logged in, go to Configuration -> Services -> + New Service.
[Screenshot: PagerDuty Configuration -> Services -> New Service]

Choose Prometheus from the Integration types list and give the service a name — I decided to call mine Prometheus Alertmanager. (You can also customize the incident settings, but I went with the default setup.) Then click save.
[Screenshot: PagerDuty service settings with the Prometheus integration selected]

The Integration Key will be displayed. Copy the key.
[Screenshot: PagerDuty Integration Key]

You’ll need to update the content of your Alertmanager config.yml. It should look like the example below, but use your own service_key (the Integration Key from PagerDuty). The pagerduty_url should stay the same, set to https://events.pagerduty.com/v2/enqueue. Save and restart Alertmanager.

global:
  resolve_timeout: 1m
  pagerduty_url: 'https://events.pagerduty.com/v2/enqueue'

route:
  receiver: 'pagerduty-notifications'

receivers:
- name: 'pagerduty-notifications'
  pagerduty_configs:
  - service_key: 0c1cc665a594419b6d215e81f4e38f7
    send_resolved: true
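In practice you’ll often want PagerDuty to page only for critical alerts while everything else keeps going to Slack. A rough sketch of how the two receivers from the previous sections could be combined in one routing tree (both receivers still have to be defined under receivers:, and the severity values must match the labels set in rules.yml):

route:
  receiver: 'slack-notifications'        # default receiver for everything else
  routes:
    - match:
        severity: 'critical'             # critical alerts are routed to PagerDuty instead
      receiver: 'pagerduty-notifications'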

Stop one of your instances. After a couple of minutes, alert notifications should be displayed in PagerDuty.
[Screenshot: triggered incident in PagerDuty]

In the PagerDuty user settings, you can decide how you’d like to be notified. I chose both email and phone call, and I was notified via both.

7. Set up Gmail Alerts
If you prefer to be notified by email, the setup is even easier. Alertmanager simply hands the emails off to an email service, in this case Gmail, which then sends them on your behalf.

It’s not recommended that you use your personal password for this, so you should create an App Password. To do that, go to Account Settings -> Security -> Signing in to Google -> App password (if you don’t see App password as an option, you probably haven’t set up 2-Step Verification and will need to do that first). Copy the newly-created password.

[Screenshot: Google Account App passwords page]

You’ll need to update the content of your Alertmanager config.yml again. The content should look similar to the example below. Don’t forget to replace the email address with your own email address, and the password with your new app password.

global:
  resolve_timeout: 1m

route:
  receiver: 'gmail-notifications'

receivers:
- name: 'gmail-notifications'
  email_configs:
  - to: monitoringinstances@gmail.com
    from: monitoringinstances@gmail.com
    smarthost: smtp.gmail.com:587
    auth_username: monitoringinstances@gmail.com
    auth_identity: monitoringinstances@gmail.com
    auth_password: password
    send_resolved: true

Once again, after a couple of minutes (after you stop at least one of your instances), alert notifications should be sent to your Gmail.
[Screenshot: alert notification email in Gmail]

8. Conclusion
Thank you very much for taking the time to read this. I would really appreciate any comments in the comment section.
Enjoy🎉
