The Problem: Alerts Without a Policy Are Chaos
You've got performance alerts, but nobody knows when to treat one as critical versus "review when you can." You either drown in notifications and start ignoring them, or you set thresholds so high that real issues slip through. Without a policy, your team doesn't know when to drop everything, when to acknowledge in the next standup, or how to avoid re-alerting on the same issue every hour.
The solution: An alert policy. It defines:
- What triggers an alert
- When the alert fires (thresholds and conditions)
- Where the alert goes (channels, recipients)
- How the team responds (severity levels, SLAs, escalation)
This template gives you a complete policy you can adopt, adapt, and paste into your team's Slack workspace or documentation. For background on what Core Web Vitals are (LCP, INP, CLS), see our practical guide; for performance budget thresholds and severity criteria, see The Complete Guide to Performance Budgets and our Performance Budget Thresholds Template.
The Alert Policy Template
Section 1: Alert Channels
Define where alerts are routed based on severity.
// Alert Channel Configuration
#perf-alerts-critical
Purpose: P1 and P2 alerts only
Who watches: On-call engineer, tech lead
Expectation: Respond within 30 minutes during business hours
Muted: Never
#perf-alerts
Purpose: All performance alerts (P1-P4)
Who watches: Full engineering team
Expectation: Review during next standup
Muted: Allowed outside business hours
#client-[name]-alerts (per-client channels, optional)
Purpose: Client-specific performance alerts
Who watches: Account manager, assigned developer
Expectation: Review same business day
Muted: Allowed outside business hours
Email (fallback)
Purpose: Backup delivery for all P1 alerts
Recipients: tech-lead@agency.com, ops@agency.com
Expectation: Ensures delivery if Slack is down
Section 2: Severity Levels
Define four severity levels based on metric deviation from budget. The "Poor" thresholds below match Google's Core Web Vitals definitions (LCP > 4s, INP > 500ms, CLS > 0.25).
// Severity Levels
P1 — CRITICAL
Condition: Any Core Web Vital in "Poor" range OR Performance Score < 50
Thresholds:
- LCP > 4.0 seconds
- INP > 500 milliseconds
- CLS > 0.25
- Performance Score < 50
Channel: #perf-alerts-critical + email fallback
Response: Acknowledge within 30 minutes. Begin investigation immediately.
Escalation: If unacknowledged after 1 hour → notify tech lead directly
Cooldown: 4 hours (don't re-alert for same page/metric within 4 hours)
P2 — WARNING
Condition: Any Core Web Vital exceeds budget by > 20% but not in "Poor" range
Thresholds:
- LCP > 3.0 seconds (budget: 2.5s, 20% over)
- INP > 240 milliseconds (budget: 200ms, 20% over)
- CLS > 0.12 (budget: 0.1, 20% over)
- Performance Score < 75
Channel: #perf-alerts-critical
Response: Investigate within 4 hours during business hours.
Escalation: If unresolved after 24 hours → escalate to P1
Cooldown: 8 hours
P3 — NOTICE
Condition: Any Core Web Vital exceeds budget but by < 20%
Thresholds:
- LCP 2.5 – 3.0 seconds
- INP 200 – 240 milliseconds
- CLS 0.10 – 0.12
- Performance Score 75 – 89
Channel: #perf-alerts
Response: Review in next standup. Create ticket if persistent.
Escalation: If still over budget after 3 consecutive tests → escalate to P2
Cooldown: 24 hours
P4 — INFO
Condition: Score improved significantly OR metric returned to budget after violation
Thresholds:
- Performance Score improved by > 10 points
- Previously violated metric now within budget
Channel: #perf-alerts
Response: No action required. For awareness only.
Cooldown: None (fire once per event)
Section 3: What to Include in Each Alert
Define the content of each alert message so the team can triage quickly.
// Alert Message Template
[SEVERITY] Performance Alert — [SITE] — [PAGE]
Metric: [METRIC_NAME]
Current Value: [ACTUAL_VALUE]
Budget: [BUDGET_VALUE]
Strategy: [MOBILE / DESKTOP]
Deviation: [PERCENTAGE OVER BUDGET]
Page URL: [FULL_URL]
Test Time: [TIMESTAMP]
Dashboard: [LINK_TO_MONITORING_DASHBOARD]
Previous Test: [PREVIOUS_VALUE] at [PREVIOUS_TIMESTAMP]
Trend: [IMPROVING / DEGRADING / STABLE]
Example P2 Alert:
[P2] Performance Alert — acme.com — Homepage
Metric: LCP (Largest Contentful Paint)
Current Value: 3,200 ms
Budget: 2,500 ms
Strategy: Mobile
Deviation: 28% over budget
Page URL: https://acme.com/
Test Time: 2026-02-12 07:00 UTC
Dashboard: https://app.apogeewatcher.com/sites/acme/results
Previous Test: 2,400 ms at 2026-02-11 07:00 UTC
Trend: DEGRADING (↑ 800ms in 24 hours)
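The message template can be filled in by a small formatter. A sketch, assuming millisecond metrics (CLS would need different formatting) and with the `format_alert` name and parameter list as illustrative choices; only the deviation is computed, everything else is passed through.

```python
def format_alert(severity: str, site: str, page: str, metric: str,
                 value: float, budget: float, strategy: str,
                 url: str, test_time: str, dashboard: str) -> str:
    """Render the alert message template for a millisecond-based metric."""
    deviation = round((value - budget) / budget * 100)  # percent over budget
    return "\n".join([
        f"[{severity}] Performance Alert — {site} — {page}",
        f"Metric: {metric}",
        f"Current Value: {value:,.0f} ms",
        f"Budget: {budget:,.0f} ms",
        f"Strategy: {strategy}",
        f"Deviation: {deviation}% over budget",
        f"Page URL: {url}",
        f"Test Time: {test_time}",
        f"Dashboard: {dashboard}",
    ])
```

Feeding in the example's numbers (3,200 ms against a 2,500 ms budget) reproduces the 28% deviation shown above.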
Section 4: Response Procedures
Define what happens after an alert fires.
// Response Procedure
Step 1: ACKNOWLEDGE
- React to the Slack message to signal you're looking (e.g. add a reaction)
- If P1: reply in thread confirming you're investigating
Step 2: TRIAGE
- Open the monitoring dashboard link from the alert
- Check if the issue is on multiple pages or isolated to one
- Check if the issue is on mobile only, desktop only, or both
- Check the trend: is this a sudden drop or gradual degradation?
Step 3: IDENTIFY CAUSE
- Recent deployment? Check git log for changes in the last 24 hours
- New third-party script? Check the page source for new external resources
- Server issue? Check TTFB and server response times
- Image/media change? Check if the LCP element changed
- CDN issue? Check CDN provider status page
Step 4: FIX OR ESCALATE
- If the fix is straightforward (e.g., revert a deployment, remove a script):
- Fix it, then re-run a manual test to verify
- If the fix requires investigation:
- Create a ticket with the alert details
- Set priority based on severity level
- Assign to the appropriate developer
Step 5: VERIFY
- After fixing, run a PageSpeed test to confirm the metric is back within budget
- Reply in the alert thread with the resolution
- React to the alert to signal resolution (e.g. checkmark reaction)
Step 6: POST-MORTEM (P1 only)
- For P1 incidents lasting > 2 hours:
- Write a brief post-mortem (what happened, why, how it was fixed)
- Identify preventive measures
- Share in #perf-alerts for team awareness
Section 5: Cooldown and De-Duplication Rules
Prevent alert fatigue with smart cooldown rules. PagerDuty's guide to reducing alert noise covers similar principles: rate limits, deduplication, severity-based routing, and suppression during maintenance. The rules below follow the same approach.
// Cooldown Rules
Rule 1: Per-Page Cooldown
After an alert fires for a specific page + metric + strategy,
do not fire again for the same combination within the cooldown period.
P1: 4 hours
P2: 8 hours
P3: 24 hours
Rule 2: Escalation Override
If a metric worsens during the cooldown period (e.g., P3 → P2 threshold),
fire a new alert at the higher severity regardless of cooldown.
Rule 3: Resolution Resets Cooldown
When a metric returns to within budget, the cooldown resets.
If it exceeds budget again, a new alert fires immediately.
Rule 4: Maintenance Windows
During scheduled maintenance or deployments:
- Suppress P3 and P4 alerts for 2 hours
- P1 and P2 alerts are NEVER suppressed
- Use Slack's /mute-perf-alerts command (if configured) or
temporarily disable non-critical alerts in the monitoring tool
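Rules 1–3 amount to a small piece of state keyed by page + metric + strategy. A sketch (the `CooldownTracker` class and its method names are assumptions, timestamps are epoch seconds, and P4 is untracked since it has no cooldown):

```python
COOLDOWN_HOURS = {"P1": 4, "P2": 8, "P3": 24}
SEVERITY_RANK = {"P1": 3, "P2": 2, "P3": 1}

class CooldownTracker:
    """Per-(page, metric, strategy) cooldown with escalation override."""

    def __init__(self):
        self._last = {}  # key -> (severity, fired_at epoch seconds)

    def should_fire(self, key: tuple, severity: str, now: float) -> bool:
        prev = self._last.get(key)
        if prev:
            prev_sev, fired_at = prev
            within = now - fired_at < COOLDOWN_HOURS[prev_sev] * 3600
            # Rule 1: suppress repeats at the same (or lower) severity.
            # Rule 2: a worsening metric fires at the higher severity anyway.
            if within and SEVERITY_RANK[severity] <= SEVERITY_RANK[prev_sev]:
                return False
        self._last[key] = (severity, now)
        return True

    def resolve(self, key: tuple) -> None:
        # Rule 3: returning to budget resets the cooldown, so a fresh
        # violation alerts immediately.
        self._last.pop(key, None)
```

Maintenance windows (Rule 4) are deliberately left to the monitoring tool, since suppression there must still let P1/P2 through.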
Section 6: Scheduled Reporting (Non-Alert)
In addition to reactive alerts, set up proactive scheduled messages.
// Scheduled Reports
Daily Digest (posted to #perf-alerts at 9:00 AM)
Content:
- Number of tests run in the last 24 hours
- Number of budget violations detected
- Top 3 worst-performing pages (by score)
- Any new alerts that fired overnight
Weekly Summary (posted to #perf-alerts every Monday at 9:00 AM)
Content:
- Week-over-week score trends for all monitored sites
- Pages that improved vs degraded
- Open alerts that haven't been resolved
- API usage summary
Monthly Report (posted to #perf-alerts first Monday of each month)
Content:
- Link to the full monthly performance report
- Top wins and regressions
- Budget compliance percentage per client
- Recommendations for the coming month
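The daily digest is straightforward to assemble from raw test results. A sketch only: the list-of-dicts input shape and the `daily_digest` name are assumptions to adapt to whatever your monitoring tool's API returns, and the violation rule here is simplified to a score threshold.

```python
def daily_digest(results: list[dict]) -> str:
    """Summarise the last 24 hours of test results for the Slack digest.

    Each result is assumed to look like {"page": str, "score": int},
    optionally with a per-page "budget_score" (default 90).
    """
    violations = [r for r in results if r["score"] < r.get("budget_score", 90)]
    worst = sorted(results, key=lambda r: r["score"])[:3]
    lines = [
        f"Tests run (24h): {len(results)}",
        f"Budget violations: {len(violations)}",
        "Worst pages by score:",
    ]
    lines += [f"  {r['page']}: {r['score']}" for r in worst]
    return "\n".join(lines)
```

The weekly and monthly reports follow the same pattern over longer windows, with trend comparison added.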
Quick Start: Minimal Policy
If you're not ready for the full policy, start with this:
- One channel — #perf-alerts for all alerts
- Two severity levels — Critical (score < 50 or any CWV in "Poor" range) and Notice (everything else)
- Cooldown — 8 hours minimum between alerts for the same page/metric
- Response — Acknowledge Critical within 4 hours; review Notice in next standup
Expand to the full policy as your team and client count grow.
Customisation Guide
For Small Teams (1-3 People)
Simplify to two channels:
- #perf-alerts for all alerts
- Email fallback for P1 only
Reduce severity levels to P1 (Critical) and P3 (Notice). Skip P2 and P4.
For Large Agencies (10+ Clients)
Add per-client channels (#client-acme-alerts) and route alerts based on client assignment. Use a rotation for on-call acknowledgment. Consider integrating with PagerDuty or Opsgenie for P1 escalation.
For In-House Teams
Replace "client" language with "product" or "team." Route alerts to the team that owns the page or feature. Add deployment-triggered alerts (test automatically after every deploy).
Implementation Checklist
- [ ] Create Slack channels (#perf-alerts-critical, #perf-alerts)
- [ ] Configure monitoring tool to send alerts to the appropriate channels
- [ ] Set up severity thresholds per the template above
- [ ] Configure cooldown periods
- [ ] Document the response procedure in your team wiki
- [ ] Brief the team on the new alert policy
- [ ] Run a test alert to verify routing works
- [ ] Schedule a 2-week review to adjust thresholds based on alert volume
FAQ
What's the difference between P1, P2, P3, and P4 severity levels?
P1 is critical: any Core Web Vital in "Poor" range or Performance Score below 50. P2 is warning: metrics exceed budget by more than 20%. P3 is notice: metrics exceed budget but by less than 20%. P4 is informational: score improved or a previously violated metric returned to budget.
How do I prevent alert fatigue?
Use cooldowns so the same page/metric doesn't alert repeatedly within a short window. Start with 8–24 hour cooldowns. Escalate only when metrics worsen or persist. Consider alerting only after 2+ consecutive tests over budget to filter natural fluctuation. PagerDuty's reduce-noise guide applies the same logic to incident management: deduplication, severity-based routing, and suppression during maintenance.
Should every team member get every alert?
No. Route P1 and P2 to a small critical channel (on-call engineer, tech lead). Route all alerts to a general channel for visibility. Use per-client channels for agencies so the right account manager sees the right alerts.
Can I integrate performance alerts with PagerDuty or Opsgenie?
Yes. Many monitoring tools support webhooks. Configure a webhook to fire on P1 alerts and point it at your incident management service. PagerDuty and Opsgenie accept incoming webhooks for alert creation.
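As a concrete example, forwarding a P1 alert to PagerDuty's Events API v2 looks roughly like this. The endpoint and payload shape follow PagerDuty's public documentation; the routing key is a placeholder you get from your PagerDuty service, and the function names are illustrative.

```python
import json
import urllib.request

def pagerduty_payload(routing_key: str, summary: str, source: str) -> dict:
    """Build an Events API v2 trigger event for a P1 performance alert."""
    return {
        "routing_key": routing_key,  # placeholder: integration key from PagerDuty
        "event_action": "trigger",
        "payload": {
            "summary": summary,       # e.g. "[P1] LCP 4.2s on acme.com homepage"
            "source": source,         # the monitoring system raising the alert
            "severity": "critical",   # map P1 to PagerDuty's "critical"
        },
    }

def send_to_pagerduty(payload: dict) -> None:
    """POST the event to PagerDuty's enqueue endpoint."""
    req = urllib.request.Request(
        "https://events.pagerduty.com/v2/enqueue",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)  # raises on HTTP errors
```

Opsgenie's Alert API works similarly but with its own endpoint and authentication header; check each provider's docs for the current payload format.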
How often should I review and adjust alert thresholds?
Plan a review every 2–4 weeks for the first few months. If you're getting too many P2 or P3 alerts, thresholds may be too tight (or you have real regressions to fix). If you rarely see alerts, thresholds may be too loose. Use the scheduled digest to spot patterns before changing numbers.
Should I use different thresholds for mobile vs desktop?
Yes, if you test both. Mobile typically performs worse (slower networks, weaker devices). Many teams set the same budget for both but expect more mobile alerts; others set a slightly looser mobile budget (e.g. LCP 3.0s mobile vs 2.5s desktop). Document your choice in the policy so the team knows what to expect.
What you can achieve: Your team has one policy — who gets which alerts, how to respond, and how to avoid fatigue. You can add Slack (or email) alerts without chaos: everyone knows when to act and when to review later.
Apogee Watcher supports email alerts today, with Slack and webhook delivery coming soon. Set configurable thresholds, cooldowns, and per-site budgets. Join the waitlist to be the first to set up automated performance alerts for your team.