Every VoIP provider will have an incident eventually. The question is not whether it happens — it is how the provider responds when it does.
Here is DialPhone's incident response process, documented publicly because we believe transparency builds more trust than pretending incidents do not happen.
Severity Classification
| Severity | Definition | Example | Response Target |
|---|---|---|---|
| P1 — Critical | Service down for multiple customers | Complete outage, no calls | 5 minutes to acknowledge, 30 minutes to mitigate |
| P2 — Major | Service degraded for multiple customers | Call quality below MOS 3.5 | 15 minutes to acknowledge, 2 hours to resolve |
| P3 — Minor | Service affected for single customer | One customer's recording not working | 1 hour to acknowledge, 8 hours to resolve |
| P4 — Low | Cosmetic or non-urgent | Dashboard display error | 4 hours to acknowledge, 48 hours to resolve |
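As an illustration only, the table above can be expressed as a lookup plus a small classification function. This is a minimal sketch, not DialPhone's tooling: the names `ResponseTarget`, `TARGETS`, and `classify`, and the three input attributes, are hypothetical. Only the time targets come from the table.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class ResponseTarget:
    acknowledge: timedelta
    resolve: timedelta  # "mitigate" for P1, "resolve" for P2 to P4


# Targets copied from the severity table; the structure itself is an assumption.
TARGETS = {
    "P1": ResponseTarget(timedelta(minutes=5), timedelta(minutes=30)),
    "P2": ResponseTarget(timedelta(minutes=15), timedelta(hours=2)),
    "P3": ResponseTarget(timedelta(hours=1), timedelta(hours=8)),
    "P4": ResponseTarget(timedelta(hours=4), timedelta(hours=48)),
}


def classify(customers_affected: int, service_down: bool, degraded: bool) -> str:
    """Map hypothetical incident attributes onto the table's definitions."""
    if customers_affected > 1 and service_down:
        return "P1"  # service down for multiple customers
    if customers_affected > 1 and degraded:
        return "P2"  # service degraded for multiple customers
    if customers_affected == 1 and (service_down or degraded):
        return "P3"  # service affected for a single customer
    return "P4"      # cosmetic or non-urgent


severity = classify(customers_affected=40, service_down=True, degraded=False)
print(severity, TARGETS[severity].acknowledge)
```

In practice the boundary cases (for example, how many customers count as "multiple") would need a human judgment call, which is why the on-call engineer confirms scope during assessment.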
What Happens During a P1 (The Worst Day)
Minute 0-5: Detection
Our monitoring catches it before customers call:
- Synthetic call testing from 20 UK locations every 60 seconds
- SIP registration success rate monitored every second
- RTP quality metrics aggregated every minute
- Customer-facing status page auto-updates
Target: Detect within 60 seconds. Acknowledge on status page within 5 minutes.
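To make the synthetic-testing idea concrete, here is a minimal sketch of how one round of probe results could be turned into an alert decision. The function name, the location labels, and the `min_failures` threshold are all assumptions for illustration; the actual alerting rules are not described in this post.

```python
def should_alert(probe_results: dict[str, bool], min_failures: int = 3) -> bool:
    """Return True when enough probe locations report a failed test call.

    probe_results maps a location name to whether its latest synthetic
    call succeeded. min_failures is a hypothetical threshold chosen so
    that a single flaky probe does not page anyone.
    """
    failures = sum(1 for ok in probe_results.values() if not ok)
    return failures >= min_failures


# One 60-second round from 20 hypothetical probe locations.
latest_round = {f"probe-{i:02d}": True for i in range(20)}
for name in ("probe-00", "probe-01", "probe-02", "probe-03"):
    latest_round[name] = False  # four probes fail in the same round

print(should_alert(latest_round))
```

Requiring several failures in the same round trades a little detection speed for fewer false alarms; with probes firing every 60 seconds, a single bad round is still enough to meet the 60-second detection target.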
Minute 5-15: Assessment
On-call engineer (24/7 rotation, UK-based) assesses:
- Scope: how many customers affected?
- Impact: complete outage or degraded service?
- Root cause hypothesis: network, application, or infrastructure?
Status page updated with scope and estimated time to resolution.
Minute 15-30: Mitigation
Immediate actions to restore service:
- If data centre issue: fail over to the secondary DC (active-active, failover in under 3 seconds)
- If application issue: restart affected services, roll back recent changes
- If network issue: reroute traffic through backup paths
Target: Service restored within 30 minutes for P1.
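The three mitigation branches above amount to a small runbook keyed by root-cause category. A hedged sketch of that mapping follows; the `PLAYBOOK` structure, the category keys, and `mitigation_steps` are hypothetical names, and the step text simply restates the list above.

```python
# Hypothetical runbook lookup: category keys are assumptions,
# the actions restate the mitigation list in this section.
PLAYBOOK = {
    "data_centre": ["fail over to secondary DC"],
    "application": ["restart affected services", "roll back recent changes"],
    "network": ["reroute traffic through backup paths"],
}


def mitigation_steps(cause: str) -> list[str]:
    """Return the ordered mitigation steps for a root-cause category."""
    try:
        return PLAYBOOK[cause]
    except KeyError:
        # An unknown category should be loud, not silently ignored.
        raise ValueError(f"unknown root-cause category: {cause!r}") from None


for step in mitigation_steps("application"):
    print("-", step)
```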
Minute 30-60: Confirmation
- Verify service is restored for all affected customers
- Monitor for recurrence
- Notify affected customers by email with incident summary
- Status page updated to "resolved" with timeline
Hour 1-72: Postmortem
Within 72 hours of resolution, we publish a postmortem containing:
- Timeline: Minute-by-minute account of what happened
- Root cause: Technical explanation of why it happened
- Impact: Number of customers affected, duration, call statistics
- Resolution: What we did to fix it
- Prevention: What we are changing to prevent recurrence
Postmortems are published on our status page. No spin. No minimising. If we messed up, we say so.
Real Postmortem Example
Incident: March 2025 — 18-minute partial outage
| Field | Detail |
|---|---|
| Duration | 18 minutes |
| Customers affected | 12% (geographic — London region) |
| Impact | Inbound calls to affected customers failed; outbound and inter-office calls unaffected |
| Root cause | Upstream BGP route leak from transit provider caused London PoP to become unreachable |
| Detection time | 47 seconds (automated monitoring) |
| Mitigation | Traffic rerouted through Manchester PoP at minute 14 |
| Resolution | Transit provider corrected BGP announcement at minute 18 |
| Prevention | Added automated BGP anomaly detection with sub-60-second rerouting |
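The figures in this postmortem can be checked against the P1 targets with a few lines of arithmetic. The sketch below assumes that "minute 14" and "minute 18" are measured from the start of the incident, as the 47-second detection time is; the variable names are illustrative.

```python
from datetime import timedelta

# Timeline from the postmortem table, as offsets from incident start (assumed).
detected_at = timedelta(seconds=47)
mitigated_at = timedelta(minutes=14)
resolved_at = timedelta(minutes=18)

# P1 targets stated earlier in this post.
DETECT_TARGET = timedelta(seconds=60)
MITIGATE_TARGET = timedelta(minutes=30)

detection_to_mitigation = mitigated_at - detected_at

print(detection_to_mitigation)          # 0:13:13
print(detected_at <= DETECT_TARGET)     # True
print(mitigated_at <= MITIGATE_TARGET)  # True
```

Under that assumption the detection-to-mitigation time for this incident was 13 minutes 13 seconds, inside both P1 targets.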
Our Track Record
| Year | P1 Incidents | Total Downtime | Measured Uptime |
|---|---|---|---|
| 2024 | 2 | 25 minutes | 99.995% |
| 2025 | 3 | 47 minutes | 99.991% |
| 2026 (Q1) | 0 | 0 minutes | 100% |
We are not perfect. 47 minutes of downtime in 2025 is 47 minutes too many. But we detected every incident in under 60 seconds, mitigated within 30 minutes, and published full postmortems within 72 hours.
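The uptime percentages in the table follow directly from the downtime totals. The sketch below reproduces them, assuming uptime is measured over the full calendar year (366 days for 2024, 365 for 2025) and that every downtime minute counts in full regardless of how many customers were affected; `uptime_percent` is an illustrative helper name.

```python
def uptime_percent(downtime_minutes: float, days_in_period: int) -> float:
    """Uptime as a percentage of all minutes in the period."""
    total_minutes = days_in_period * 24 * 60
    return 100 * (1 - downtime_minutes / total_minutes)


print(f"2024: {uptime_percent(25, 366):.3f}%")  # 2024: 99.995%
print(f"2025: {uptime_percent(47, 365):.3f}%")  # 2025: 99.991%
```

The same function is a quick way to sanity-check any provider's claim: a "99.99%" year allows roughly 52 minutes of downtime, so the incident count and the uptime figure should agree.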
What to Ask Any Provider
- How many P1 incidents did you have last year?
- What was your longest outage?
- Can I see a postmortem from a recent incident?
- What is your detection-to-mitigation time?
- Do you have a public status page with history?
If they cannot answer all five, their reliability story is marketing, not engineering.
DialPhone answers all five publicly, because your business depends on us answering calls and you deserve to know exactly how seriously we take that responsibility.