Solved: How to Send Custom Prometheus Alerts to Discord via Webhooks

🚀 Executive Summary

TL;DR: Traditional alerting methods often lead to alert fatigue or high costs. This guide details how to integrate Prometheus, Alertmanager, and Discord via webhooks to establish a robust, real-time, and cost-effective system for sending custom alert notifications directly to your team’s communication channel.

🎯 Key Takeaways

  • Discord webhooks provide a unique URL that allows services like Alertmanager to send messages to specific channels without requiring a bot or user account, acting as a simple yet powerful integration point.
  • Prometheus alerting rules define conditions using PromQL expressions, a for duration to prevent flapping, and labels and annotations to provide context for Alertmanager’s processing and routing.
  • Alertmanager’s configuration (alertmanager.yml) uses routes to define alert grouping logic (group_by, group_wait, repeat_interval) and receivers to specify output destinations; the built-in Discord receiver (discord_configs, Alertmanager v0.25.0+) takes your webhook URL via webhook_url and supports send_resolved: true for resolution notifications.

How to Send Custom Prometheus Alerts to Discord via Webhooks

As a Senior DevOps Engineer at TechResolve, I’ve seen firsthand the challenges of maintaining effective communication during critical system events. In the fast-paced world of site reliability, getting timely notifications about infrastructure and application issues is paramount. Relying solely on email, SMS, or even proprietary SaaS solutions can often lead to alert fatigue, missed incidents, or unnecessary expenses.

Imagine a scenario where your production database CPU spikes, or a critical microservice starts returning 500 errors. How quickly would your team know? Would they have to manually check dashboards, or would an alert immediately notify the right people in their preferred communication channel?

This is where the power of Prometheus for monitoring, Alertmanager for alert processing, and Discord for team communication comes into play. By integrating these tools, you can transform reactive troubleshooting into proactive incident response, without breaking the bank on expensive third-party tools. This tutorial will guide you through setting up a robust, real-time alerting system that pushes custom Prometheus alerts directly to your team’s Discord server using webhooks.

Prerequisites

Before we dive into the configuration, ensure you have the following in place:

  • A Running Prometheus Instance: Your Prometheus server should be up and actively scraping metrics from your targets.
  • A Running Prometheus Alertmanager Instance: Alertmanager is crucial for processing, grouping, and routing alerts generated by Prometheus.
  • Access to a Discord Server: You’ll need permissions to create and manage webhooks within a Discord server and channel.
  • Basic Understanding of YAML: Prometheus and Alertmanager configurations are written in YAML, so familiarity with its syntax is helpful.
  • A Linux Environment: Or any environment where you can access and modify your Prometheus and Alertmanager configuration files.

Step-by-Step Guide

Let’s walk through the process of setting up this integration, from configuring Discord to defining your Alertmanager routes.

Step 1: Create a Discord Webhook

A Discord webhook acts as a unique URL that allows services like Alertmanager to send messages to a specific channel without needing a bot or user account. It’s a simple, yet powerful, integration point.

  1. Open your Discord client or browser.
  2. Navigate to the server where you want to receive alerts.
  3. Right-click on the desired text channel (e.g., #devops-alerts) and select Edit Channel.
  4. In the channel settings, go to Integrations.
  5. Click on Create Webhook (or View Webhooks if you have existing ones, then New Webhook).
  6. Give your webhook a descriptive name (e.g., Prometheus Alerts) and optionally an image.
  7. Ensure the correct channel is selected.
  8. Click Copy Webhook URL. Save this URL; you’ll need it for Alertmanager configuration.
  9. Click Save Changes.

Security Note: Treat your webhook URL like a sensitive credential. Anyone with the URL can post messages to your Discord channel. Do not share it publicly or commit it directly into public repositories.
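
Before wiring the webhook into Alertmanager, it is worth confirming it works on its own. Discord’s webhook endpoint accepts a JSON body with a content field; the placeholder URL below stands in for the one you just copied, and a successful call returns HTTP 204 with the message appearing in the channel.

# Quick manual test of the Discord webhook (replace the placeholder URL)
curl -X POST \
  -H "Content-Type: application/json" \
  -d '{"content": "Test message from the Prometheus alerting setup"}' \
  "YOUR_DISCORD_WEBHOOK_URL_HERE"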

Step 2: Define Prometheus Alerting Rules

Prometheus evaluates expressions based on your scraped metrics to determine if an alert condition is met. These rules are defined in a YAML file, typically named alert.rules.yml or similar, and then included in your main prometheus.yml configuration.

Let’s create a simple rule to trigger an alert if a server’s CPU usage exceeds 90% for at least 5 minutes.

Create a file, for example, /etc/prometheus/rules/cpu_alerts.yml, with the following content:

groups:
  - name: server_cpu_alerts
    rules:
      - alert: HighCPUUsage
        expr: 100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 90
        for: 5m
        labels:
          severity: critical
          team: infrastructure
        annotations:
          summary: "High CPU usage detected on {{ $labels.instance }}"
          description: "The CPU usage on instance {{ $labels.instance }} has been above 90% for more than 5 minutes. Current value: {{ $value | humanize }}%."

Explanation of the rule:

  • alert: HighCPUUsage: The name of the alert.
  • expr: ... > 90: The PromQL expression that, when true, triggers the alert. It derives each instance’s busy CPU percentage (100 minus the idle percentage) from the rate of the idle-mode CPU counter, averaged across cores.
  • for: 5m: The duration for which the expression must be true before the alert fires. This prevents flapping alerts.
  • labels:: Key-value pairs attached to the alert. These are crucial for Alertmanager to group and route alerts.
  • annotations:: Additional information about the alert, often used for more detailed messages. Notice the templating ({{ $labels.instance }}, {{ $value }}), which Prometheus expands when the alert fires.
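
To connect this back to the 500-errors scenario from the introduction, here is a hedged sketch of a second rule you could add under the same rules: list. The metric name http_requests_total and its status label are assumptions for illustration; substitute whatever your services actually expose.

      - alert: HighErrorRate
        expr: sum by (service) (rate(http_requests_total{status=~"5.."}[5m])) / sum by (service) (rate(http_requests_total[5m])) > 0.05
        for: 5m
        labels:
          severity: critical
          team: backend
        annotations:
          summary: "High 5xx error rate on {{ $labels.service }}"
          description: "More than 5% of requests to {{ $labels.service }} returned 5xx responses over the last 5 minutes. Current ratio: {{ $value | humanizePercentage }}."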

Now, ensure your main /etc/prometheus/prometheus.yml includes this rule file:

# prometheus.yml
global:
  scrape_interval: 15s

rule_files:
  - "/etc/prometheus/rules/*.yml" # Or specify the exact path to cpu_alerts.yml

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093 # Your Alertmanager instance address
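
Before restarting anything, you can validate both files with promtool, which ships with Prometheus (the paths below assume the locations used in this guide):

promtool check rules /etc/prometheus/rules/cpu_alerts.yml
promtool check config /etc/prometheus/prometheus.yml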

Step 3: Configure Alertmanager to Route Alerts to Discord

Alertmanager is responsible for deduplicating, grouping, inhibiting, and routing alerts to various receivers. Since v0.25.0, Alertmanager ships a built-in Discord receiver (discord_configs) that accepts the webhook URL from Step 1 directly, and that is what we will use here. (On older versions, the generic webhook_configs payload is not in Discord’s expected format, so you would need a small translation proxy in between.)

Edit your Alertmanager configuration file, usually located at /etc/alertmanager/alertmanager.yml:

# alertmanager.yml
global:
  resolve_timeout: 5m

route:
  group_by: ['alertname', 'cluster', 'service']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 1h
  receiver: 'discord-default' # Default receiver if no specific route matches

  routes:
  - match:
      severity: critical
    receiver: 'discord-critical'
    group_wait: 10s
    repeat_interval: 30m

receivers:
  - name: 'discord-default'
    discord_configs:
      - webhook_url: 'YOUR_DISCORD_WEBHOOK_URL_HERE' # Paste your Discord webhook URL
        send_resolved: true # Send messages when an alert resolves
        # Optional: customize the title/message templates if desired (more advanced)
        # Alertmanager's default Discord template is usually a good starting point

  - name: 'discord-critical'
    discord_configs:
      - webhook_url: 'YOUR_DISCORD_WEBHOOK_URL_HERE' # You could use a different webhook for critical alerts
        send_resolved: true

Explanation of the configuration:

  • global:: General settings like resolve_timeout.
  • route:: This section defines how alerts are grouped and which receiver they go to.
    • group_by:: Alerts with the same values for these labels will be grouped into a single notification.
    • group_wait:: How long to wait before sending the initial notification for a new group of alerts.
    • group_interval:: How long to wait before sending a notification about new alerts that are added to a group for which an initial notification has already been sent.
    • repeat_interval:: How long to wait before re-sending a notification for a firing alert.
    • receiver: 'discord-default': The default receiver if no more specific route matches.
    • routes:: A list of specific routing rules. Here, we’ve defined a route for alerts with severity: critical to go to the discord-critical receiver, potentially with different grouping/repeat intervals.
  • receivers:: Defines different output destinations.
    • name: 'discord-default': A named receiver.
    • discord_configs:: The built-in Discord receiver (available since Alertmanager v0.25.0); this is where we point Alertmanager at the webhook.
    • webhook_url: 'YOUR_DISCORD_WEBHOOK_URL_HERE': Replace this placeholder with the actual Discord webhook URL you copied in Step 1.
    • send_resolved: true: Ensures Alertmanager also sends a notification when an alert status changes from firing to resolved. This is crucial for knowing when issues are fixed.
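
Before restarting, you can validate the file with amtool (bundled with Alertmanager) and, if you prefer to avoid a restart, ask the running Alertmanager to re-read its configuration. The path and port below assume the defaults used in this guide:

amtool check-config /etc/alertmanager/alertmanager.yml
# Optional: hot-reload the running Alertmanager instead of restarting it
curl -X POST http://localhost:9093/-/reload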

Step 4: Restart Services and Test

For your configuration changes to take effect, you need to restart (or reload) both Prometheus and Alertmanager.

On most Linux systems, you can do this with systemd:

sudo systemctl restart prometheus
sudo systemctl restart alertmanager

Verify that both services started without errors:

sudo systemctl status prometheus
sudo systemctl status alertmanager

Check the Prometheus UI (default port 9090) under the “Alerts” tab to see if your rules are loaded. Also, check the Alertmanager UI (default port 9093) to see its status.
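
On a headless server, you can query Alertmanager’s v2 API instead of the UI to confirm it is healthy and to see which alerts it currently holds (the port assumes the default 9093):

curl -s http://localhost:9093/api/v2/status
curl -s http://localhost:9093/api/v2/alerts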

To test the alert:

If you’re monitoring a real server, you can intentionally generate high CPU load for more than 5 minutes to trigger the HighCPUUsage alert (e.g., with stress-ng or a simple infinite loop; a stress-ng sketch follows the test steps below). Alternatively, for a quicker check, you could create a dummy rule that always fires:

  1. Add a temporary rule like this to your cpu_alerts.yml:
         - alert: TestDiscordAlert
           expr: vector(1) # This expression is always true
           for: 1m
           labels:
             severity: warning
             team: devops
           annotations:
             summary: "This is a test alert for Discord integration."
             description: "If you see this, your Discord webhook integration is working!"
  2. Restart (or reload) Prometheus so it picks up the new rule (sudo systemctl restart prometheus).
  3. Wait a minute or two; the alert should fire in Prometheus and then be delivered to your Discord channel via Alertmanager.
  4. Remember to remove this test rule after successful verification!

You should see a message in your designated Discord channel, providing details about the alert.
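
If you would rather exercise the real HighCPUUsage rule than the dummy one, a short burst of artificial load is enough. This sketch assumes stress-ng is installed and that the target machine can tolerate a few minutes of full CPU load:

# One worker per online CPU, for 6 minutes (enough to satisfy the 5m "for" clause)
stress-ng --cpu 0 --timeout 360s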

Common Pitfalls

Even with careful steps, issues can arise. Here are a couple of common problems:

  1. YAML Syntax Errors: YAML is very sensitive to indentation and syntax. A single misplaced space or colon can prevent services from starting. Use a YAML linter (e.g., yamllint) to validate your configuration files before restarting services. Alertmanager’s amtool check-config command can also help.
  2. Incorrect Webhook URL or Discord Permissions: Double-check that the Discord webhook URL in your alertmanager.yml is correct and that the webhook still exists and can post to the channel. If the channel or webhook was deleted, messages simply stop arriving; Alertmanager only records the failed notification attempts in its logs, so check them whenever notifications go missing.
  3. Reload vs. Restart: Both Prometheus and Alertmanager can reload configuration at runtime (via SIGHUP, or a POST to Alertmanager’s /-/reload endpoint), but systemctl reload only works if the service unit defines a reload action. After modifying alertmanager.yml, a full systemctl restart alertmanager is the safest way to ensure all changes are picked up.
  4. Firewall Rules: Ensure your Alertmanager instance can reach Discord’s servers (usually over HTTPS, port 443). If your server is behind a restrictive firewall, you might need to open outgoing connections.
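
Two quick checks cover the last two pitfalls: confirm outbound HTTPS from the Alertmanager host to Discord, and watch Alertmanager’s logs for notification errors (the unit name assumes the systemd setup used earlier):

curl -I https://discord.com        # any HTTP response means port 443 egress is open
journalctl -u alertmanager -f      # watch for notification errors when an alert fires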

Conclusion

You’ve successfully set up a robust, real-time alerting system that integrates Prometheus with Discord. By leveraging webhooks, you’ve created a cost-effective and highly customizable solution for your team to stay informed about critical system events. This direct line of communication ensures that your SysAdmins, Developers, and DevOps Engineers receive actionable alerts precisely when and where they need them, fostering quicker response times and minimizing potential downtime.

From here, you can further enhance your alerting strategy:

  • Advanced Alertmanager Templating: Customize the Discord message format more extensively using Go templates in Alertmanager to include specific links, runbooks, or dashboards.
  • More Sophisticated Routing: Implement more granular routing rules in Alertmanager based on severity, team, or specific alert labels to direct alerts to different Discord channels or even different receivers (e.g., PagerDuty for critical incidents).
  • Alert Silencing and Inhibition: Explore Alertmanager’s UI (default port 9093) to temporarily silence alerts during maintenance windows or inhibit lower-priority alerts when a higher-priority one is firing.
  • Integrate with Other Tools: Consider how these alerts could trigger automated actions or feed into incident management platforms.
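
As a starting point for the templating idea above, here is a hedged sketch of a customized Discord receiver. It assumes Alertmanager v0.25.0+ with the built-in Discord receiver and uses the standard notification template data (.Status, .CommonLabels, .Alerts); adjust the fields to taste and validate with amtool check-config before deploying.

receivers:
  - name: 'discord-critical'
    discord_configs:
      - webhook_url: 'YOUR_DISCORD_WEBHOOK_URL_HERE'
        send_resolved: true
        title: '[{{ .Status | toUpper }}] {{ .CommonLabels.alertname }}'
        message: |-
          {{ range .Alerts }}
          **{{ .Annotations.summary }}**
          {{ .Annotations.description }}
          {{ end }}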

Happy monitoring, and may your alerts always be timely and informative!


Darian Vance

👉 Read the original article on TechResolve.blog
