The Problem: Alerts Without a Policy Are Chaos
You've got performance alerts, but nobody knows when to treat one as critical versus "review when you can." You either drown in notifications and start ignoring them, or you set thresholds so high that real issues slip through. Without a policy, your team doesn't know when to drop everything, when to acknowledge in the next standup, or how to avoid re-alerting on the same issue every hour.
The solution: An alert policy. It defines:
- What triggers an alert
- When the alert fires (thresholds and conditions)
- Where the alert goes (channels, recipients)
- How the team responds (severity levels, SLAs, escalation)
This template gives you a complete policy you can adopt, adapt, and paste into your team's Slack workspace or documentation. For background on what Core Web Vitals are (LCP, INP, CLS), see our practical guide; for performance budget thresholds and severity criteria, see The Complete Guide to Performance Budgets and our Performance Budget Thresholds Template.
The Alert Policy Template
Section 1: Alert Channels
Define where alerts are routed based on severity.
// Alert Channel Configuration
#perf-alerts-critical
Purpose: P1 and P2 alerts only
Who watches: On-call engineer, tech lead
Expectation: Respond within 30 minutes during business hours
Muted: Never
#perf-alerts
Purpose: All performance alerts (P1-P4)
Who watches: Full engineering team
Expectation: Review during next standup
Muted: Allowed outside business hours
#client-[name]-alerts (per-client channels, optional)
Purpose: Client-specific performance alerts
Who watches: Account manager, assigned developer
Expectation: Review same business day
Muted: Allowed outside business hours
Email (fallback)
Purpose: Backup delivery for all P1 alerts
Recipients: tech-lead@agency.com, ops@agency.com
Expectation: Ensures delivery if Slack is down
Section 2: Severity Levels
Define four severity levels based on metric deviation from budget. The "Poor" thresholds below match Google's Core Web Vitals definitions (LCP > 4s, INP > 500ms, CLS > 0.25).
// Severity Levels
P1 — CRITICAL
Condition: Any Core Web Vital in "Poor" range OR Performance Score < 50
Thresholds:
- LCP > 4.0 seconds
- INP > 500 milliseconds
- CLS > 0.25
- Performance Score < 50
Channel: #perf-alerts-critical + email fallback
Response: Acknowledge within 30 minutes. Begin investigation immediately.
Escalation: If unacknowledged after 1 hour → notify tech lead directly
Cooldown: 4 hours (don't re-alert for same page/metric within 4 hours)
P2 — WARNING
Condition: Any Core Web Vital exceeds budget by > 20% but not in "Poor" range
Thresholds:
- LCP > 3.0 seconds (budget: 2.5s, 20% over)
- INP > 240 milliseconds (budget: 200ms, 20% over)
- CLS > 0.12 (budget: 0.1, 20% over)
- Performance Score < 75
Channel: #perf-alerts-critical
Response: Investigate within 4 hours during business hours.
Escalation: If unresolved after 24 hours → escalate to P1
Cooldown: 8 hours
P3 — NOTICE
Condition: Any Core Web Vital exceeds budget but by < 20%
Thresholds:
- LCP 2.5 – 3.0 seconds
- INP 200 – 240 milliseconds
- CLS 0.10 – 0.12
- Performance Score 75 – 89
Channel: #perf-alerts
Response: Review in next standup. Create ticket if persistent.
Escalation: If still over budget after 3 consecutive tests → escalate to P2
Cooldown: 24 hours
P4 — INFO
Condition: Score improved significantly OR metric returned to budget after violation
Thresholds:
- Performance Score improved by > 10 points
- Previously violated metric now within budget
Channel: #perf-alerts
Response: No action required. For awareness only.
Cooldown: None (fire once per event)
Section 3: What to Include in Each Alert
Define the content of each alert message so the team can triage quickly.
// Alert Message Template
[SEVERITY] Performance Alert — [SITE] — [PAGE]
Metric: [METRIC_NAME]
Current Value: [ACTUAL_VALUE]
Budget: [BUDGET_VALUE]
Strategy: [MOBILE / DESKTOP]
Deviation: [PERCENTAGE OVER BUDGET]
Page URL: [FULL_URL]
Test Time: [TIMESTAMP]
Dashboard: [LINK_TO_MONITORING_DASHBOARD]
Previous Test: [PREVIOUS_VALUE] at [PREVIOUS_TIMESTAMP]
Trend: [IMPROVING / DEGRADING / STABLE]
Example P2 Alert:
[P2] Performance Alert — acme.com — Homepage
Metric: LCP (Largest Contentful Paint)
Current Value: 3,200 ms
Budget: 2,500 ms
Strategy: Mobile
Deviation: 28% over budget
Page URL: https://acme.com/
Test Time: 2026-02-12 07:00 UTC
Dashboard: https://app.apogeewatcher.com/sites/acme/results
Previous Test: 2,400 ms at 2026-02-11 07:00 UTC
Trend: DEGRADING (↑ 800ms in 24 hours)
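The message template can be filled in by a small formatter. A sketch, assuming millisecond metrics (CLS would need different formatting) and with the `format_alert` name and parameter list as illustrative choices; only the deviation is computed, everything else is passed through.

```python
def format_alert(severity: str, site: str, page: str, metric: str,
                 value: float, budget: float, strategy: str,
                 url: str, test_time: str, dashboard: str) -> str:
    """Render the alert message template for a millisecond-based metric."""
    deviation = round((value - budget) / budget * 100)  # percent over budget
    return "\n".join([
        f"[{severity}] Performance Alert — {site} — {page}",
        f"Metric: {metric}",
        f"Current Value: {value:,.0f} ms",
        f"Budget: {budget:,.0f} ms",
        f"Strategy: {strategy}",
        f"Deviation: {deviation}% over budget",
        f"Page URL: {url}",
        f"Test Time: {test_time}",
        f"Dashboard: {dashboard}",
    ])
```

Feeding in the example's numbers (3,200 ms against a 2,500 ms budget) reproduces the 28% deviation shown above.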
Section 4: Response Procedures
Define what happens after an alert fires.
// Response Procedure
Step 1: ACKNOWLEDGE
- React to the Slack message to signal you're looking (e.g. add a reaction)
- If P1: reply in thread confirming you're investigating
Step 2: TRIAGE
- Open the monitoring dashboard link from the alert
- Check if the issue is on multiple pages or isolated to one
- Check if the issue is on mobile only, desktop only, or both
- Check the trend: is this a sudden drop or gradual degradation?
Step 3: IDENTIFY CAUSE
- Recent deployment? Check git log for changes in the last 24 hours
- New third-party script? Check the page source for new external resources
- Server issue? Check TTFB and server response times
- Image/media change? Check if the LCP element changed
- CDN issue? Check CDN provider status page
Step 4: FIX OR ESCALATE
- If the fix is straightforward (e.g., revert a deployment, remove a script):
- Fix it, then re-run a manual test to verify
- If the fix requires investigation:
- Create a ticket with the alert details
- Set priority based on severity level
- Assign to the appropriate developer
Step 5: VERIFY
- After fixing, run a PageSpeed test to confirm the metric is back within budget
- Reply in the alert thread with the resolution
- React to the alert to signal resolution (e.g. checkmark reaction)
Step 6: POST-MORTEM (P1 only)
- For P1 incidents lasting > 2 hours:
- Write a brief post-mortem (what happened, why, how it was fixed)
- Identify preventive measures
- Share in #perf-alerts for team awareness
Section 5: Cooldown and De-Duplication Rules
Prevent alert fatigue with smart cooldown rules. PagerDuty's guide to reducing alert noise covers similar principles: rate limits, deduplication, severity-based routing, and suppression during maintenance. The rules below follow the same approach.
// Cooldown Rules
Rule 1: Per-Page Cooldown
After an alert fires for a specific page + metric + strategy,
do not fire again for the same combination within the cooldown period.
P1: 4 hours
P2: 8 hours
P3: 24 hours
Rule 2: Escalation Override
If a metric worsens during the cooldown period (e.g., P3 → P2 threshold),
fire a new alert at the higher severity regardless of cooldown.
Rule 3: Resolution Resets Cooldown
When a metric returns to within budget, the cooldown resets.
If it exceeds budget again, a new alert fires immediately.
Rule 4: Maintenance Windows
During scheduled maintenance or deployments:
- Suppress P3 and P4 alerts for 2 hours
- P1 and P2 alerts are NEVER suppressed
- Use Slack's /mute-perf-alerts command (if configured) or
temporarily disable non-critical alerts in the monitoring tool
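Rules 1–3 amount to a small piece of state keyed by page + metric + strategy. A sketch (the `CooldownTracker` class and its method names are assumptions, timestamps are epoch seconds, and P4 is untracked since it has no cooldown):

```python
COOLDOWN_HOURS = {"P1": 4, "P2": 8, "P3": 24}
SEVERITY_RANK = {"P1": 3, "P2": 2, "P3": 1}

class CooldownTracker:
    """Per-(page, metric, strategy) cooldown with escalation override."""

    def __init__(self):
        self._last = {}  # key -> (severity, fired_at epoch seconds)

    def should_fire(self, key: tuple, severity: str, now: float) -> bool:
        prev = self._last.get(key)
        if prev:
            prev_sev, fired_at = prev
            within = now - fired_at < COOLDOWN_HOURS[prev_sev] * 3600
            # Rule 1: suppress repeats at the same (or lower) severity.
            # Rule 2: a worsening metric fires at the higher severity anyway.
            if within and SEVERITY_RANK[severity] <= SEVERITY_RANK[prev_sev]:
                return False
        self._last[key] = (severity, now)
        return True

    def resolve(self, key: tuple) -> None:
        # Rule 3: returning to budget resets the cooldown, so a fresh
        # violation alerts immediately.
        self._last.pop(key, None)
```

Maintenance windows (Rule 4) are deliberately left to the monitoring tool, since suppression there must still let P1/P2 through.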
Section 6: Scheduled Reporting (Non-Alert)
In addition to reactive alerts, set up proactive scheduled messages.
// Scheduled Reports
Daily Digest (posted to #perf-alerts at 9:00 AM)
Content:
- Number of tests run in the last 24 hours
- Number of budget violations detected
- Top 3 worst-performing pages (by score)
- Any new alerts that fired overnight
Weekly Summary (posted to #perf-alerts every Monday at 9:00 AM)
Content:
- Week-over-week score trends for all monitored sites
- Pages that improved vs degraded
- Open alerts that haven't been resolved
- API usage summary
Monthly Report (posted to #perf-alerts first Monday of each month)
Content:
- Link to the full monthly performance report
- Top wins and regressions
- Budget compliance percentage per client
- Recommendations for the coming month
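The daily digest is straightforward to assemble from raw test results. A sketch only: the list-of-dicts input shape and the `daily_digest` name are assumptions to adapt to whatever your monitoring tool's API returns, and the violation rule here is simplified to a score threshold.

```python
def daily_digest(results: list[dict]) -> str:
    """Summarise the last 24 hours of test results for the Slack digest.

    Each result is assumed to look like {"page": str, "score": int},
    optionally with a per-page "budget_score" (default 90).
    """
    violations = [r for r in results if r["score"] < r.get("budget_score", 90)]
    worst = sorted(results, key=lambda r: r["score"])[:3]
    lines = [
        f"Tests run (24h): {len(results)}",
        f"Budget violations: {len(violations)}",
        "Worst pages by score:",
    ]
    lines += [f"  {r['page']}: {r['score']}" for r in worst]
    return "\n".join(lines)
```

The weekly and monthly reports follow the same pattern over longer windows, with trend comparison added.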
Quick Start: Minimal Policy
If you're not ready for the full policy, start with this:
- One channel — #perf-alerts for all alerts
- Two severity levels — Critical (score < 50 or any CWV in "Poor" range) and Notice (everything else)
- Cooldown — 8 hours minimum between alerts for the same page/metric
- Response — Acknowledge Critical within 4 hours; review Notice in next standup
Expand to the full policy as your team and client count grow.
Customisation Guide
For Small Teams (1-3 People)
Simplify to two channels:
- #perf-alerts for all alerts
- Email fallback for P1 only
Reduce severity levels to P1 (Critical) and P3 (Notice). Skip P2 and P4.
For Large Agencies (10+ Clients)
Add per-client channels (#client-acme-alerts) and route alerts based on client assignment. Use a rotation for on-call acknowledgment. Consider integrating with PagerDuty or Opsgenie for P1 escalation.
For In-House Teams
Replace "client" language with "product" or "team." Route alerts to the team that owns the page or feature. Add deployment-triggered alerts (test automatically after every deploy).
Implementation Checklist
- [ ] Create Slack channels (#perf-alerts-critical, #perf-alerts)
- [ ] Configure monitoring tool to send alerts to the appropriate channels
- [ ] Set up severity thresholds per the template above
- [ ] Configure cooldown periods
- [ ] Document the response procedure in your team wiki
- [ ] Brief the team on the new alert policy
- [ ] Run a test alert to verify routing works
- [ ] Schedule a 2-week review to adjust thresholds based on alert volume
FAQ
What's the difference between P1, P2, P3, and P4 severity levels?
P1 is critical: any Core Web Vital in "Poor" range or Performance Score below 50. P2 is warning: metrics exceed budget by more than 20%. P3 is notice: metrics exceed budget but by less than 20%. P4 is informational: score improved or a previously violated metric returned to budget.
How do I prevent alert fatigue?
Use cooldowns so the same page/metric doesn't alert repeatedly within a short window. Start with 8–24 hour cooldowns. Escalate only when metrics worsen or persist. Consider alerting only after 2+ consecutive tests over budget to filter natural fluctuation. PagerDuty's reduce-noise guide applies the same logic to incident management: deduplication, severity-based routing, and suppression during maintenance.
Should every team member get every alert?
No. Route P1 and P2 to a small critical channel (on-call engineer, tech lead). Route all alerts to a general channel for visibility. Use per-client channels for agencies so the right account manager sees the right alerts.
Can I integrate performance alerts with PagerDuty or Opsgenie?
Yes. Many monitoring tools support webhooks. Configure a webhook to fire on P1 alerts and point it at your incident management service. PagerDuty and Opsgenie accept incoming webhooks for alert creation.
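As a concrete example, forwarding a P1 alert to PagerDuty's Events API v2 looks roughly like this. The endpoint and payload shape follow PagerDuty's public documentation; the routing key is a placeholder you get from your PagerDuty service, and the function names are illustrative.

```python
import json
import urllib.request

def pagerduty_payload(routing_key: str, summary: str, source: str) -> dict:
    """Build an Events API v2 trigger event for a P1 performance alert."""
    return {
        "routing_key": routing_key,  # placeholder: integration key from PagerDuty
        "event_action": "trigger",
        "payload": {
            "summary": summary,       # e.g. "[P1] LCP 4.2s on acme.com homepage"
            "source": source,         # the monitoring system raising the alert
            "severity": "critical",   # map P1 to PagerDuty's "critical"
        },
    }

def send_to_pagerduty(payload: dict) -> None:
    """POST the event to PagerDuty's enqueue endpoint."""
    req = urllib.request.Request(
        "https://events.pagerduty.com/v2/enqueue",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)  # raises on HTTP errors
```

Opsgenie's Alert API works similarly but with its own endpoint and authentication header; check each provider's docs for the current payload format.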
How often should I review and adjust alert thresholds?
Plan a review every 2–4 weeks for the first few months. If you're getting too many P2 or P3 alerts, thresholds may be too tight (or you have real regressions to fix). If you rarely see alerts, thresholds may be too loose. Use the scheduled digest to spot patterns before changing numbers.
Should I use different thresholds for mobile vs desktop?
Yes, if you test both. Mobile typically performs worse (slower networks, weaker devices). Many teams set the same budget for both but expect more mobile alerts; others set a slightly looser mobile budget (e.g. LCP 3.0s mobile vs 2.5s desktop). Document your choice in the policy so the team knows what to expect.
What you can achieve: Your team has one policy — who gets which alerts, how to respond, and how to avoid fatigue. You can add Slack (or email) alerts without chaos: everyone knows when to act and when to review later.
Apogee Watcher supports email alerts today, with Slack and webhook delivery coming soon. Set configurable thresholds, cooldowns, and per-site budgets. Join the waitlist to be the first to set up automated performance alerts for your team.