DEV Community

Alex Spinov
Alex Spinov

Posted on

Grafana OnCall Has a Free API: Open-Source Incident Response and Alert Routing

Grafana OnCall is an open-source incident response tool for DevOps and SRE teams. It provides on-call scheduling, alert routing, and escalation policies — all integrated with your existing monitoring stack.

What Is Grafana OnCall?

Grafana OnCall (formerly Amixr) is an on-call management system that integrates natively with Grafana, Prometheus, Alertmanager, and 100+ monitoring tools. It routes alerts to the right people at the right time.

Key Features:

  • On-call schedules and rotations
  • Alert routing and escalation policies
  • Slack, Telegram, MS Teams notifications
  • Phone calls and SMS alerts
  • Grafana native integration
  • ChatOps for incident management
  • Public API for automation

Installation

# Via Docker Compose
git clone https://github.com/grafana/oncall.git
cd oncall
docker compose up -d

# Or via Helm
helm repo add grafana https://grafana.github.io/helm-charts
helm install oncall grafana/oncall -n oncall --create-namespace
Enter fullscreen mode Exit fullscreen mode

Grafana OnCall API: Automate Incident Response

import requests

BASE = "https://oncall.example.com/api/v1"
HEADERS = {"Authorization": "Bearer YOUR_API_TOKEN"}

# List on-call schedules
schedules = requests.get(f"{BASE}/schedules", headers=HEADERS).json()
for schedule in schedules["results"]:
    print(f"Schedule: {schedule[name]}, Team: {schedule[team_id]}")

# Get current on-call users
oncall = requests.get(
    f"{BASE}/schedules/{schedule_id}/on-call",
    headers=HEADERS
).json()
print(f"On-call now: {oncall[users]}")
Enter fullscreen mode Exit fullscreen mode

Alert Routing Rules

# Create an integration (webhook endpoint)
integration = requests.post(
    f"{BASE}/integrations",
    headers=HEADERS,
    json={
        "type": "webhook",
        "name": "Production Alerts",
        "team_id": "TEAM_ID"
    }
).json()
print(f"Webhook URL: {integration[link]}")

# Create routing rule
route = requests.post(
    f"{BASE}/routes",
    headers=HEADERS,
    json={
        "integration_id": integration["id"],
        "routing_regex": "severity=critical",
        "escalation_chain_id": "CHAIN_ID",
        "position": 0
    }
).json()
Enter fullscreen mode Exit fullscreen mode

Escalation Chains

# Create escalation chain
chain = requests.post(
    f"{BASE}/escalation_chains",
    headers=HEADERS,
    json={"name": "Critical P1", "team_id": "TEAM_ID"}
).json()

# Add escalation policies
# Step 1: Notify on-call
requests.post(f"{BASE}/escalation_policies", headers=HEADERS, json={
    "escalation_chain_id": chain["id"],
    "type": "notify_persons",
    "persons_to_notify": ["USER_ID"],
    "position": 0
})

# Step 2: Wait 5 minutes
requests.post(f"{BASE}/escalation_policies", headers=HEADERS, json={
    "escalation_chain_id": chain["id"],
    "type": "wait",
    "duration": 300,
    "position": 1
})

# Step 3: Notify entire team
requests.post(f"{BASE}/escalation_policies", headers=HEADERS, json={
    "escalation_chain_id": chain["id"],
    "type": "notify_team_members",
    "notify_to_team_members": "TEAM_ID",
    "position": 2
})
Enter fullscreen mode Exit fullscreen mode

Trigger Alert via Webhook

curl -X POST https://oncall.example.com/integrations/v1/webhook/INTEGRATION_TOKEN/ \
  -H "Content-Type: application/json" \
  -d x27{"title": "High CPU on prod-web-01", "message": "CPU usage >95% for 10min", "severity": "critical"}x27
Enter fullscreen mode Exit fullscreen mode

Resources


Need to scrape monitoring data or web sources? Check out my web scraping tools on Apify — production-ready actors for Reddit, Google Maps, and more. Questions? Email me at spinov001@gmail.com

Top comments (0)