Hrish B

Posted on Nov 18

A List of Status Pages Every TechOps Engineer Should Know

#statuspage #monitoring #devops #sitereliabilityengineering

The Importance of Tracking Third-Party Status Pages

As a TechOps engineer, you are responsible for staying on top of the health of the external services your team uses. In a modern team, that covers pretty much every critical service, whether it comes from a cloud provider or a SaaS vendor.
Part of your incident management strategy should be to monitor the status of these services and to get alerted when they are down or degraded.

No service in your stack lives by itself. While debugging an outage in your application, you will need to know the status of the services it depends on.
Most third-party services have a public status page where they post updates about outages and maintenance.

Tracking the status of these services should be a crucial part of your incident management process.

Which Status Pages Should You Track?

Which pages you track depends on the services your applications use. The following lists cover some of the most common ones.

Cloud Providers

Microsoft Azure. Note that Microsoft publishes only incidents with widespread impact on its public status page. For incidents affecting services in your specific account, monitor Azure Service Health in the Azure portal.
Google Cloud Platform
Amazon Web Services
DigitalOcean
Hetzner
Fly.io
Render
Railway
Linode
Vercel
Netlify
Cloudflare

CI/CD

Source Code Repositories

Hosted Databases

CDN and DNS

LLMs and AI Assistants

Observability and Monitoring

Artifact Repositories

SMTP Providers

Payment Gateways

How to Track Status Pages

Tracking status pages can be done in multiple ways:

Manually subscribing to the status page RSS feed, webhooks, Slack, or email notifications.
Using a status page monitoring/aggregator tool.
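
As a rough illustration of the first option, here is a minimal Python sketch of polling a status page RSS feed and picking out entries you have not seen before. It assumes a plain RSS 2.0 feed with `item`, `title`, `link`, and `pubDate` elements; some vendors publish Atom feeds instead, which would need different element names. The `status.example.com` URLs, the sample feed, and the function names are hypothetical and only there for illustration.

```python
import urllib.request
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Set


@dataclass
class StatusItem:
    title: str
    link: str
    published: str


def fetch_feed(url: str, timeout: float = 10.0) -> str:
    """Download a feed and return its body as text (assumes UTF-8)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def parse_status_feed(rss_xml: str) -> List[StatusItem]:
    """Parse an RSS 2.0 document and return its items in feed order."""
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iter("item"):
        items.append(
            StatusItem(
                title=(item.findtext("title") or "").strip(),
                link=(item.findtext("link") or "").strip(),
                published=(item.findtext("pubDate") or "").strip(),
            )
        )
    return items


def new_items(items: List[StatusItem], seen_links: Set[str]) -> List[StatusItem]:
    """Return items whose link is not in seen_links, then record them as seen."""
    fresh = [item for item in items if item.link not in seen_links]
    seen_links.update(item.link for item in fresh)
    return fresh


# Hypothetical feed content, used so the sketch runs without network access.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Vendor Status</title>
    <item>
      <title>Elevated API error rates</title>
      <link>https://status.example.com/incidents/1</link>
      <pubDate>Mon, 01 Jan 2024 10:00:00 +0000</pubDate>
    </item>
    <item>
      <title>Scheduled database maintenance</title>
      <link>https://status.example.com/incidents/2</link>
      <pubDate>Mon, 01 Jan 2024 12:00:00 +0000</pubDate>
    </item>
  </channel>
</rss>"""


if __name__ == "__main__":
    seen: Set[str] = set()
    # In real use, replace SAMPLE_FEED with fetch_feed("<your vendor's feed URL>").
    for entry in new_items(parse_status_feed(SAMPLE_FEED), seen):
        print(f"{entry.published} | {entry.title} | {entry.link}")
```

In practice you would run something like this on a schedule, persist `seen_links` between runs, and forward new entries to your alerting channel. An aggregator tool does the same job across many vendors without you maintaining the polling code.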

This article was first published on https://dev.to/talonx/a-list-of-status-pages-every-techops-engineer-should-know-28kd.
