Most observability stacks that have been running in production for more than a year end up with alerting spread across two systems: Prometheus Alertmanager handling metric-based alerts and Grafana Alerting managing everything else. Engineers add a Slack integration in Grafana because it is convenient, then realize their Alertmanager routing tree already covers the same service. Before long, the on-call team receives duplicated pages, silencing rules live in two places, and nobody is confident which system is authoritative.
This is the alerting consolidation problem, and it affects teams of every size. The question is straightforward: should you standardize on Prometheus Alertmanager, move everything into Grafana Alerting, or deliberately run both? The answer depends on your datasource mix, your GitOps maturity, and how your organization manages on-call routing. This guide breaks down the architecture, features, and operational trade-offs of each system so you can make a deliberate choice instead of drifting into accidental complexity.
Architecture Overview
Before comparing features, you need to understand how each system fits into the alerting pipeline. They occupy the same logical space — “receive a condition, route a notification” — but they get there from fundamentally different starting points.
Prometheus Alertmanager: The Standalone Receiver
Alertmanager is a dedicated, standalone component in the Prometheus ecosystem. It does not evaluate alert rules itself. Instead, Prometheus (or any compatible sender like Thanos Ruler, Cortex, or Mimir Ruler) evaluates PromQL expressions and pushes firing alerts to the Alertmanager API. Alertmanager then handles deduplication, grouping, inhibition, silencing, and notification delivery.
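Concretely, the sender side is a small piece of Prometheus configuration. A minimal sketch, assuming a reachable `alertmanager:9093` endpoint; the rule name and threshold are illustrative:

```yaml
# prometheus.yml (fragment) — tells Prometheus where to push firing alerts
alerting:
  alertmanagers:
    - static_configs:
        - targets: ["alertmanager:9093"]
rule_files:
  - "alert-rules.yml"

# alert-rules.yml — an example rule Prometheus evaluates itself;
# only the resulting firing alerts ever reach Alertmanager
groups:
  - name: example
    rules:
      - alert: HighErrorRate
        expr: rate(http_requests_total{status=~"5.."}[5m]) > 0.05
        for: 10m
        labels:
          severity: critical
```

The division of labor is visible here: the PromQL lives with Prometheus, while Alertmanager only ever sees labeled alert objects.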
```
# Simplified Prometheus → Alertmanager flow
#
# [Prometheus] --evaluates rules--> [firing alerts]
#      |
#      +--POST /api/v2/alerts--> [Alertmanager]
#                                      |
#                            +---------+---------+
#                            |         |         |
#                         [Slack] [PagerDuty] [Email]
```
The entire configuration lives in a single YAML file (alertmanager.yml). This includes the routing tree, receiver definitions, inhibition rules, and silence templates. There is no database, no UI-driven state — just a config file and an optional local storage directory for notification state and silences. This makes it trivially reproducible and ideal for GitOps workflows.
For high availability, you run multiple Alertmanager instances in a gossip-based cluster. They use a mesh protocol to share silence and notification state, ensuring that failover does not result in duplicate or lost notifications. The HA model is well-understood and has been stable for years.
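A typical three-replica deployment just points each instance at its peers. A sketch with assumed hostnames; the cluster flags are the standard Alertmanager clustering options:

```
# Run on each of the three replicas (hostnames are illustrative)
alertmanager \
  --config.file=/etc/alertmanager/alertmanager.yml \
  --cluster.listen-address=0.0.0.0:9094 \
  --cluster.peer=alertmanager-0:9094 \
  --cluster.peer=alertmanager-1:9094 \
  --cluster.peer=alertmanager-2:9094
```

Prometheus should be configured with all replica endpoints and send alerts to every one of them; the gossip mesh deduplicates notifications across the cluster.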
Grafana Alerting: The Integrated Platform
Grafana Alerting (sometimes called “Grafana Unified Alerting,” introduced in Grafana 8 and significantly matured through Grafana 11 and 12) takes a different architectural approach. It embeds the entire alerting lifecycle — rule evaluation, state management, routing, and notification — inside the Grafana server process. Under the hood, it actually uses a fork of Alertmanager for the routing and notification layer, but this is an implementation detail that is invisible to users.
```
# Simplified Grafana Alerting flow
#
# [Grafana Server]
#  ├── Rule Evaluation Engine
#  │    ├── queries Prometheus
#  │    ├── queries Loki
#  │    ├── queries CloudWatch
#  │    └── queries any supported datasource
#  │
#  ├── Alert State Manager (internal)
#  │
#  └── Embedded Alertmanager (routing + notifications)
#                  |
#        +---------+---------+
#        |         |         |
#     [Slack] [PagerDuty] [Email]
```
The critical distinction is that Grafana Alerting evaluates alert rules itself, querying any configured datasource — not just Prometheus. It can fire alerts based on Loki log queries, Elasticsearch searches, CloudWatch metrics, PostgreSQL queries, or any of the 100+ datasource plugins available in Grafana. Rule definitions, contact points, notification policies, and mute timings are stored in the Grafana database (or provisioned via YAML files and the Grafana API).
For high availability in self-hosted environments, Grafana Alerting relies on a shared database and a peer-discovery mechanism between Grafana instances. In Grafana Cloud, HA is fully managed by Grafana Labs.
Feature Comparison
The following table provides a side-by-side comparison of the capabilities that matter most in production alerting systems. Both systems are mature, but they prioritize different things.
| Feature | Prometheus Alertmanager | Grafana Alerting |
|---|---|---|
| Datasources | Prometheus-compatible only (Prometheus, Thanos, Mimir, VictoriaMetrics) | Any Grafana datasource (Prometheus, Loki, Elasticsearch, CloudWatch, SQL databases, etc.) |
| Rule evaluation | External (Prometheus/Ruler evaluates rules and pushes alerts) | Built-in (Grafana evaluates rules directly) |
| Routing tree | Hierarchical YAML-based routing with match/match_re, continue, group_by | Notification policies with label matchers, nested policies, mute timings |
| Grouping | Full support via group_by, group_wait, group_interval | Full support via notification policies with equivalent controls |
| Inhibition | Native inhibition rules (suppress alerts when a related alert is firing) | Supported since Grafana 10.3 but less flexible than Alertmanager |
| Silencing | Label-based silences via API or UI, time-limited | Mute timings (recurring schedules) and silences (ad-hoc, label-based) |
| Notification channels | Email, Slack, PagerDuty, Opsgenie, VictorOps (Splunk On-Call), webhook, WeChat, Telegram, SNS, Webex | All of the above plus Teams, Discord, Google Chat, LINE, Threema, Grafana OnCall, and more via contact points |
| Templating | Go templates in notification config | Go templates with access to Grafana template variables and functions |
| Multi-tenancy | Not built-in; achieved via separate instances or Mimir Alertmanager | Native multi-tenancy via Grafana organizations and RBAC |
| High availability | Gossip-based cluster (peer mesh, well-proven) | Database-backed HA with peer discovery between Grafana instances |
| Configuration model | Single YAML file, fully declarative | UI + API + provisioning YAML files, stored in database |
| GitOps compatibility | Excellent — config file lives in version control natively | Possible via provisioning files or Terraform provider, but requires extra tooling |
| External alert sources | Any system that can POST to the Alertmanager API | Supported via the Grafana Alerting API (external alerts can be pushed) |
| Managed service | Available via Grafana Cloud (as Mimir Alertmanager), Amazon Managed Prometheus | Available via Grafana Cloud |
Alertmanager Strengths
Alertmanager has been a production staple since 2015. Over a decade of use across thousands of organizations has made it one of the most battle-tested components in the CNCF ecosystem. Here is where it genuinely excels.
Declarative, GitOps-Native Configuration
The entire Alertmanager configuration is a single YAML file. There is no hidden state in a database, no click-driven configuration that someone forgets to document. You check it into Git, review it in a pull request, and deploy it through your CI/CD pipeline like any other infrastructure code. This is a significant operational advantage for teams that have invested in GitOps.
```yaml
# alertmanager.yml — everything in one file
global:
  resolve_timeout: 5m
  slack_api_url: "https://hooks.slack.com/services/T00/B00/XXX"

route:
  receiver: platform-team
  group_by: [alertname, cluster, namespace]
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  routes:
    - match:
        severity: critical
      receiver: pagerduty-oncall
      group_wait: 10s
    - match_re:
        team: "^(payments|checkout)$"
      receiver: payments-slack
      continue: true

receivers:
  - name: platform-team
    slack_configs:
      - channel: "#platform-alerts"
  - name: pagerduty-oncall
    pagerduty_configs:
      - service_key: ""
  - name: payments-slack
    slack_configs:
      - channel: "#payments-oncall"

inhibit_rules:
  - source_match:
      severity: critical
    target_match:
      severity: warning
    equal: [alertname, cluster]
```
Every change is auditable. Rollbacks are a git revert away. This matters enormously when you are debugging why an alert did not fire at 3 AM.
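The single-file model also means the configuration can be validated mechanically before it merges. A sketch of a CI check using `amtool`, Alertmanager's companion CLI (assumes `amtool` is installed; the label values are illustrative):

```
# Fail the pipeline on syntax or schema errors
amtool check-config alertmanager.yml

# Preview which receiver a given label set would route to
amtool config routes test --config.file=alertmanager.yml \
  severity=critical team=payments
```

Running the routing test against representative label sets in CI catches the classic failure mode of a new sub-route silently shadowing an existing one.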
Lightweight and Single-Purpose
Alertmanager does one thing: route and deliver notifications. It has no dashboard, no query engine, no datasource plugins. This single-purpose design makes it operationally simple. Resource consumption is minimal — a small Alertmanager instance handles thousands of active alerts on a few hundred megabytes of memory. It starts in milliseconds and requires almost no maintenance.
Mature Inhibition and Routing
Alertmanager’s inhibition rules are first-class citizens. You can suppress downstream warnings when a critical alert is already firing, preventing alert storms from overwhelming your on-call team. The hierarchical routing tree with continue flags allows for nuanced delivery: send to the team channel AND escalate to PagerDuty simultaneously, with different grouping strategies at each level.
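A minimal inhibition sketch; the alert names are illustrative, and the `source_matchers`/`target_matchers` syntax shown here requires Alertmanager 0.22 or later:

```yaml
# Suppress per-instance warnings while a cluster-level critical alert fires
inhibit_rules:
  - source_matchers:
      - severity = critical
      - alertname = KubeAPIDown
    target_matchers:
      - severity = warning
    equal: ["cluster"]
```

The `equal` clause is the important part: inhibition only applies when the source and target alerts agree on those labels, so an outage in one cluster does not mute warnings in another.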
Proven High Availability
The gossip-based HA cluster has been stable for years. Running three Alertmanager replicas behind a load balancer (or using Kubernetes service discovery) gives you reliable notification delivery without shared storage. The protocol handles deduplication across instances automatically, which is the hardest part of distributed alerting.
Grafana Alerting Strengths
Grafana Alerting has matured considerably since its rocky introduction in Grafana 8. By Grafana 11 and 12, it has become a legitimate production alerting platform with capabilities that Alertmanager cannot match on its own.
Multi-Datasource Alert Rules
This is Grafana Alerting’s strongest differentiator. You can write alert rules that query Loki for error log spikes, CloudWatch for AWS resource utilization, Elasticsearch for application errors, or a PostgreSQL database for business metrics — all from the same alerting system. If your observability stack includes more than just Prometheus, this eliminates the need for separate alerting tools per datasource.
```yaml
# Grafana alert rule provisioning example — alerting on Loki log errors
apiVersion: 1
groups:
  - orgId: 1
    name: application-errors
    folder: Production
    interval: 1m
    rules:
      - uid: loki-error-spike
        title: "High error rate in payment service"
        condition: C
        data:
          - refId: A
            datasourceUid: loki-prod
            model:
              expr: 'sum(rate({app="payment-service"} |= "ERROR" [5m]))'
          - refId: B
            datasourceUid: "__expr__"
            model:
              type: reduce
              expression: A
              reducer: last
          - refId: C
            datasourceUid: "__expr__"
            model:
              type: threshold
              expression: B
              conditions:
                - evaluator:
                    type: gt
                    params: [10]
        for: 5m
        labels:
          severity: warning
          team: payments
```
This is something Alertmanager simply cannot do. Alertmanager only receives pre-evaluated alerts — it has no concept of datasources or query execution.
Unified UI for Alert Management
Grafana provides a single pane of glass for alert rule creation, visualization, notification policy management, contact point configuration, and silence management. For teams where not every engineer is comfortable editing YAML routing trees, the visual notification policy editor significantly reduces the barrier to entry. You can see the state of every alert rule, its evaluation history, and the exact notification path it will take — all without leaving the browser.
Native Multi-Tenancy and RBAC
Grafana’s organization model and role-based access control extend naturally to alerting. Different teams can manage their own alert rules, contact points, and notification policies within their organization or folder scope, without seeing or interfering with other teams. Achieving this with standalone Alertmanager requires either running separate instances per tenant or using Mimir’s multi-tenant Alertmanager.
Mute Timings and Richer Scheduling
While Alertmanager supports silences (ad-hoc, time-limited suppressions), Grafana Alerting adds mute timings — recurring time-based windows where notifications are suppressed. This is useful for scheduled maintenance windows, business-hours-only alerting, or suppressing non-critical alerts on weekends. Alertmanager requires external tooling or manual silence creation for recurring windows.
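Mute timings can also be provisioned as files, which keeps them reviewable. A sketch of a provisioning file; the exact schema varies by Grafana version, and the timing name is illustrative:

```yaml
# provisioning/alerting/mute-timings.yml — schema may differ across Grafana versions
apiVersion: 1
muteTimes:
  - orgId: 1
    name: weekend-noncritical
    time_intervals:
      - weekdays: ["saturday", "sunday"]
      - times:
          - start_time: "22:00"
            end_time: "06:00"
```

The timing only takes effect once a notification policy references it, so the suppression scope stays explicit per policy rather than global.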
Grafana Cloud as a Managed Option
For teams that want to avoid managing alerting infrastructure entirely, Grafana Cloud provides a fully managed Grafana Alerting stack. This includes HA, state persistence, and notification delivery without any self-hosted components. The Grafana Cloud alerting stack also includes a managed Mimir Alertmanager, which means you can use Prometheus-native alerting rules if you prefer that model while still benefiting from the managed infrastructure.
When to Use Prometheus Alertmanager
Alertmanager is the right choice when the following conditions describe your environment:
Your metrics stack is Prometheus-native. If all your alert rules are PromQL expressions evaluated by Prometheus, Thanos Ruler, or Mimir Ruler, Alertmanager is the natural fit. There is no added value in routing those alerts through Grafana.
GitOps is non-negotiable. If every infrastructure change must go through a pull request and be fully declarative, Alertmanager’s single-file configuration model is significantly easier to manage than Grafana’s database-backed state. Tools like `amtool` provide config validation in CI pipelines.
You need fine-grained routing with inhibition. Complex routing trees with multiple levels of grouping, inhibition rules, and `continue` flags are more naturally expressed in Alertmanager’s YAML format. The routing logic has been stable and well-documented for years.
You run microservices with per-team routing. If each team owns its routing subtree and the routing logic is complex, Alertmanager’s hierarchical model scales better than UI-driven configuration. Teams can own their section of the config file via CODEOWNERS in Git.
You want minimal operational overhead. Alertmanager is a single binary with minimal resource requirements. There is no database to back up, no migrations to run, and no UI framework to keep updated.
When to Use Grafana Alerting
Grafana Alerting is the right choice when these conditions apply:
You alert on more than just Prometheus metrics. If you need alert rules based on Loki logs, Elasticsearch queries, CloudWatch metrics, or database queries, Grafana Alerting is the only option that handles all of these natively. The alternative is running separate alerting tools per datasource, which is worse.
Your team prefers UI-driven configuration. Not every engineer wants to edit YAML routing trees. If your organization values a visual interface for managing alerts, contact points, and notification policies, Grafana’s UI is a major productivity advantage.
You are using Grafana Cloud. If you are already on Grafana Cloud, using its built-in alerting is the path of least resistance. You get HA, managed notification delivery, and a unified experience without running any additional infrastructure.
Multi-tenancy is a requirement. If multiple teams need isolated alerting configurations with RBAC, Grafana’s native organization and folder-based access model is significantly easier to set up than running per-tenant Alertmanager instances.
You want mute timings for recurring maintenance windows. If your team regularly needs to suppress alerts during scheduled windows (deploy windows, batch processing hours, weekend non-critical suppression), Grafana’s mute timings feature is more ergonomic than creating and managing recurring silences in Alertmanager.
Running Both Together: The Hybrid Pattern
In practice, many production environments run both Alertmanager and Grafana Alerting. This is not necessarily a mistake — it can be a deliberate architectural choice when done with clear boundaries.
Common Hybrid Architecture
The most common pattern looks like this:
Prometheus Alertmanager handles all metric-based alerts. PromQL rules are evaluated by Prometheus or a long-term storage ruler (Thanos, Mimir). Alertmanager owns routing, grouping, and notification for these alerts.
Grafana Alerting handles non-Prometheus alerts: log-based alerts from Loki, business metrics from SQL datasources, and cross-datasource correlation rules.
The key to making this work without chaos is establishing clear ownership rules:
```
# Ownership boundaries for hybrid alerting
#
# Prometheus Alertmanager owns:
#   - All PromQL-based alert rules
#   - Infrastructure alerts (node, kubelet, etcd, CoreDNS)
#   - Application SLO/SLI alerts based on metrics
#
# Grafana Alerting owns:
#   - Log-based alert rules (Loki, Elasticsearch)
#   - Business metric alerts (SQL datasources)
#   - Cross-datasource correlation rules
#   - Alerts for teams that prefer UI-driven management
#
# Shared:
#   - Contact points / receivers use the same Slack channels and PagerDuty services
#   - On-call rotations are managed externally (PagerDuty, Grafana OnCall)
```
Both systems can deliver to the same notification channels. The critical discipline is ensuring that silencing and maintenance windows are applied in both systems when needed. This is the primary operational cost of the hybrid approach.
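Because a silence has to land in both systems, some teams script that step. Both standalone Alertmanager and Grafana's embedded Alertmanager expose a v2 silences API, so the same payload can be posted to each. A minimal sketch; the endpoint URLs in the comments and the label values are assumptions, and the Grafana endpoint additionally needs an auth header:

```python
import datetime
import json
from urllib import request

def build_silence(matchers, hours, author, comment):
    """Build a payload for the Alertmanager v2 silences API (POST /api/v2/silences)."""
    now = datetime.datetime.now(datetime.timezone.utc)
    return {
        "matchers": [
            {"name": k, "value": v, "isRegex": False, "isEqual": True}
            for k, v in matchers.items()
        ],
        "startsAt": now.isoformat(),
        "endsAt": (now + datetime.timedelta(hours=hours)).isoformat(),
        "createdBy": author,
        "comment": comment,
    }

def post_silence(base_url, payload):
    """POST one silence; call once per system during a maintenance window."""
    req = request.Request(
        base_url + "/api/v2/silences",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    return request.urlopen(req)  # response body contains the new silence ID

# Usage sketch — the base URLs below are assumptions for illustration:
#   post_silence("http://alertmanager:9093", payload)
#   post_silence("http://grafana:3000/api/alertmanager/grafana", payload)  # add auth
payload = build_silence({"cluster": "prod-eu"}, 2, "deploy-bot", "Weekly deploy window")
```

Driving both systems from one script keeps the maintenance window consistent, which is exactly the discipline the hybrid pattern depends on.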
Grafana as a Viewer for Alertmanager
Even if you use Alertmanager exclusively for routing and notification, Grafana can serve as a read-only viewer. Grafana natively supports connecting to an external Alertmanager datasource, allowing you to see firing alerts, active silences, and alert groups in the Grafana UI. This gives you the operational visibility of Grafana without moving your alerting logic into it.
```yaml
# Grafana datasource provisioning for external Alertmanager
apiVersion: 1
datasources:
  - name: Alertmanager
    type: alertmanager
    url: http://alertmanager.monitoring.svc:9093
    access: proxy
    jsonData:
      implementation: prometheus
```
Migration Considerations
If you are moving from one system to the other, here are the practical considerations to plan for.
Migrating from Alertmanager to Grafana Alerting
Rule conversion. Your PromQL-based recording and alerting rules defined in Prometheus rule files need to be recreated as Grafana alert rules. Grafana provides a migration tool that can import Prometheus-format rules, but complex expressions may need manual adjustment.
Routing tree translation. Alertmanager’s hierarchical routing tree maps to Grafana’s notification policies, but the semantics are not identical. Test the notification routing thoroughly — the `continue` flag behavior and default routes may differ.
Silence and inhibition migration. Active silences are ephemeral and do not need migration. Inhibition rules need to be recreated in Grafana’s format. Recurring maintenance windows should be converted to mute timings.
Run in parallel first. The safest migration strategy is to run both systems in parallel for two to four weeks, sending notifications from both, then cutting over when you have confidence in the Grafana setup. Accept the temporary noise of duplicate alerts — it is far cheaper than missing a critical page during migration.
Migrating from Grafana Alerting to Alertmanager
Datasource limitation. You can only migrate alerts that are based on Prometheus-compatible datasources. Alerts querying Loki, Elasticsearch, or SQL datasources have no equivalent in Alertmanager — you will need an alternative solution for those.
Rule export. Export Grafana alert rules and convert them to Prometheus-format rule files. The Grafana API (`GET /api/v1/provisioning/alert-rules`) provides structured output that can be transformed with a script.
Contact point mapping. Map Grafana contact points to Alertmanager receivers. The configuration format is different, but the concepts are equivalent.
State loss. Alertmanager does not carry over Grafana’s alert evaluation history. You start fresh. Plan for a brief period where alerts may re-fire as Prometheus evaluates rules that were previously managed by Grafana.
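The export-and-convert step above can be partially scripted. A sketch under stated assumptions: the provisioning endpoint path matches the one mentioned above, the token is a placeholder, and the PromQL expression must still be recovered by hand because Grafana stores queries inside per-datasource models rather than as a flat field:

```python
import json
import urllib.request

def fetch_rules(grafana_url, token):
    """Fetch provisioned alert rules (GET /api/v1/provisioning/alert-rules)."""
    req = urllib.request.Request(
        grafana_url + "/api/v1/provisioning/alert-rules",
        headers={"Authorization": "Bearer " + token},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def to_prometheus_rule(rule, expr):
    """Map one exported Grafana rule to a Prometheus alerting-rule dict.

    `expr` is supplied manually: the PromQL lives inside the rule's query
    models, so automated extraction is datasource-specific.
    """
    return {
        "alert": rule["title"].replace(" ", ""),
        "expr": expr,
        "for": rule.get("for", "5m"),
        "labels": rule.get("labels", {}),
        "annotations": rule.get("annotations", {}),
    }

# Usage sketch (URL and token are placeholders):
#   rules = fetch_rules("http://grafana:3000", "glsa_...")
#   group = {"groups": [{"name": "migrated",
#                        "rules": [to_prometheus_rule(r, "...") for r in rules]}]}
```

Treat the output as a starting point for review, not a finished rule file: thresholds, `for` durations, and annotations all deserve a manual pass before the cutover.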
Decision Framework
If you want a quick decision path, use this framework:
```
Start here:
│
├── Do you alert on non-Prometheus datasources (Loki, ES, SQL, CloudWatch)?
│     ├── YES → Grafana Alerting (at least for those datasources)
│     └── NO ↓
│
├── Is GitOps/declarative config a hard requirement?
│     ├── YES → Alertmanager
│     └── NO ↓
│
├── Do you need multi-tenancy with RBAC?
│     ├── YES → Grafana Alerting (or Mimir Alertmanager)
│     └── NO ↓
│
├── Are you on Grafana Cloud?
│     ├── YES → Grafana Alerting (path of least resistance)
│     └── NO ↓
│
└── Default → Alertmanager (simpler, lighter, well-proven)
```
For many teams, the honest answer is “both” — Alertmanager for the Prometheus-native metric pipeline, Grafana Alerting for everything else. That is a valid architecture as long as the ownership boundaries are documented and the on-call team knows where to look.
Frequently Asked Questions
What is the difference between Alertmanager and Grafana Alerting?
Prometheus Alertmanager is a standalone notification routing engine that receives pre-evaluated alerts from Prometheus and delivers them to receivers like Slack, PagerDuty, or email. It does not evaluate alert rules itself. Grafana Alerting is an integrated alerting platform embedded in Grafana that both evaluates alert rules (querying any supported datasource) and handles notification routing. Alertmanager is configured entirely via YAML, while Grafana Alerting offers a UI, API, and file-based provisioning. The fundamental difference is scope: Alertmanager handles only the routing and notification phase, while Grafana Alerting handles the full lifecycle from query evaluation to notification.
Can Grafana Alerting replace Prometheus Alertmanager?
Yes, for many use cases. Grafana Alerting can evaluate PromQL rules directly against your Prometheus datasource, so you do not strictly need a separate Alertmanager instance. However, there are scenarios where Alertmanager remains the better choice: heavily GitOps-driven environments, teams that need Alertmanager’s mature inhibition rules, or architectures where Prometheus rule evaluation happens externally (Thanos Ruler, Mimir Ruler) and a dedicated Alertmanager is already in the pipeline. If your only datasource is Prometheus and you value declarative configuration, Alertmanager is still simpler and lighter.
Is Grafana Alertmanager the same as Prometheus Alertmanager?
Not exactly. Grafana Alerting uses a fork of the Prometheus Alertmanager code internally for its notification routing engine, but it is not the same product. The Grafana “Alertmanager” you see in the UI is a managed, embedded component with a different configuration interface (notification policies, contact points, mute timings) compared to the standalone Prometheus Alertmanager (routing tree, receivers, inhibition rules in YAML). Grafana can also connect to an external Prometheus Alertmanager as a datasource, which adds to the confusion. When people refer to “Grafana Alertmanager,” they usually mean the embedded routing engine inside Grafana Alerting.
What are the best alternatives to Prometheus Alertmanager?
The most direct alternative is Grafana Alerting, which can receive and route Prometheus alerts while also supporting other datasources. Beyond that, other options include: Grafana OnCall for on-call management and escalation (often used alongside Alertmanager rather than replacing it), PagerDuty or Opsgenie as managed incident response platforms that can receive alerts directly, Keep as an open-source AIOps alert management platform, and Mimir Alertmanager for multi-tenant environments running Grafana Mimir. The choice depends on whether you need an Alertmanager replacement (routing and notification) or a complementary tool for escalation and incident response.
Should I use Prometheus alerts or Grafana alerts for Kubernetes monitoring?
For Kubernetes monitoring specifically, the kube-prometheus-stack (which includes Prometheus, Alertmanager, and a comprehensive set of pre-built alerting rules) remains the industry standard. These rules are PromQL-based and are designed to work with Alertmanager. If you are deploying kube-prometheus-stack, using Alertmanager for metric-based alerts is the straightforward choice. Add Grafana Alerting on top if you also need to alert on logs (via Loki) or non-metric datasources. For Kubernetes-specific monitoring, the combination of Prometheus rules with Alertmanager for routing is the most mature and well-supported path.
Final Thoughts
The Alertmanager vs Grafana Alerting debate is not really about which tool is better — it is about which tool fits your operational context. Alertmanager is simpler, lighter, and more GitOps-friendly. Grafana Alerting is more versatile, more accessible to UI-oriented teams, and the only option if you need multi-datasource alerting. Running both is perfectly valid when the boundaries are clear.
The worst outcome is not picking the “wrong” tool. The worst outcome is running both accidentally, with overlapping coverage, duplicated notifications, and no clear ownership. Whatever you choose, document the decision, define the ownership boundaries, and make sure your on-call team knows exactly where to go when they need to silence an alert at 3 AM.