lazymac
Is OpenAI Down? Real-Time AI Provider Status Monitoring

Your AI pipeline breaks when OpenAI goes down. In 2025, GPT-4 had 12 significant outages. Claude had 8. If your app has no fallback detection, each outage means manual intervention and angry users. Here's how to monitor AI provider status in real time.

The Problem with Official Status Pages

Most AI providers have status pages, but they:

  • Update with 15-30 minute delays after incidents start
  • Don't distinguish between degraded performance and total outages
  • Don't track regional differences (US-East fine, EU degraded)
  • Require manual checking

You need automated, real-time detection.

Real-Time Status Monitoring

const resp = await fetch('https://api.lazy-mac.com/ai-provider-status', {
  headers: { 'Authorization': 'Bearer YOUR_KEY' }
});

const status = await resp.json();

// {
//   "openai": { "status": "operational", "latency_p95_ms": 2340, "error_rate": 0.001 },
//   "anthropic": { "status": "degraded", "latency_p95_ms": 8900, "error_rate": 0.043 },
//   "google": { "status": "operational", "latency_p95_ms": 1200, "error_rate": 0.002 }
// }

if (status.anthropic.status !== 'operational') {
  // Route to backup provider
  await routeToOpenAI(request);
} else {
  await routeToAnthropic(request);
}

Webhook Alerts

Get notified the moment a provider degrades:

# Register a webhook for status changes
import requests

requests.post("https://api.lazy-mac.com/ai-provider-status/webhooks",
    json={
        "url": "https://your-app.com/webhooks/ai-status",
        "providers": ["openai", "anthropic", "google"],
        "alert_on": ["degraded", "outage", "recovered"],
        "latency_threshold_ms": 5000  # Alert if P95 > 5s
    },
    headers={"Authorization": "Bearer YOUR_KEY"}
)
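On the receiving end, your app needs an endpoint that parses the event and reacts. The payload fields below (`provider`, `status`) are assumptions about the webhook body rather than a documented schema — a minimal sketch of the decision logic, kept as a pure function so it's testable without a server:

```python
import json

# Hypothetical webhook payload fields: "provider" and "status".
# The real schema may differ — treat this as a sketch, not a spec.
def handle_status_event(raw_body: bytes) -> str:
    """Return a routing action for an incoming status webhook."""
    event = json.loads(raw_body)
    provider = event["provider"]
    status = event["status"]

    if status in ("degraded", "outage"):
        # Flip traffic away from the affected provider.
        return f"failover:{provider}"
    if status == "recovered":
        # Put the provider back into the rotation.
        return f"restore:{provider}"
    return "noop"

action = handle_status_event(b'{"provider": "anthropic", "status": "degraded"}')
print(action)  # failover:anthropic
```

Wire this into whatever web framework you already run; the framework only needs to hand the raw request body to `handle_status_event` and act on the returned string.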

Building a Resilient AI Router

import time

import requests

class ResilientAIRouter:
    def __init__(self, status_api_key: str):
        self.api_key = status_api_key
        self._cache = {}
        self._cache_ts = 0

    def get_operational_providers(self) -> list[str]:
        # Cache status checks for 30 seconds
        if time.time() - self._cache_ts > 30:
            resp = requests.get(
                "https://api.lazy-mac.com/ai-provider-status",
                headers={"Authorization": f"Bearer {self.api_key}"}
            )
            self._cache = resp.json()
            self._cache_ts = time.time()

        return [
            provider for provider, info in self._cache.items()
            if info["status"] == "operational" and info["error_rate"] < 0.01
        ]

    def route(self, request, preferred="anthropic"):
        operational = self.get_operational_providers()

        if preferred in operational:
            return self.call_provider(preferred, request)

        for fallback in ["openai", "google", "anthropic"]:
            if fallback in operational:
                return self.call_provider(fallback, request)

        raise RuntimeError("All AI providers currently unavailable")
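The fallback decision inside `route` can be factored into a pure function and unit-tested with a mocked status dict, no network calls needed. A sketch, using the same response shape shown earlier:

```python
# Pure selection logic mirroring the router's route() method: prefer one
# provider, otherwise fall back in a fixed order. The status dict uses the
# same shape as the /ai-provider-status response above.
def pick_provider(status: dict, preferred: str,
                  fallback_order=("openai", "google", "anthropic")) -> str:
    operational = [
        name for name, info in status.items()
        if info["status"] == "operational" and info["error_rate"] < 0.01
    ]
    if preferred in operational:
        return preferred
    for name in fallback_order:
        if name in operational:
            return name
    raise RuntimeError("All AI providers currently unavailable")

status = {
    "openai": {"status": "operational", "error_rate": 0.001},
    "anthropic": {"status": "degraded", "error_rate": 0.043},
    "google": {"status": "operational", "error_rate": 0.002},
}
print(pick_provider(status, preferred="anthropic"))  # openai
```

Keeping the selection logic separate from the HTTP call also makes it easy to swap in different fallback orders per workload.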

Historical Uptime Data

The API also provides 30-day uptime history per provider — useful for SLA reporting and choosing the right provider for critical workloads.
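Assuming the history endpoint returns one record per day with an `uptime` fraction (a hypothetical shape — check the docs for the actual fields), an SLA-style monthly number reduces to a simple aggregation:

```python
# Hypothetical 30-day history shape: one record per day with an "uptime"
# fraction between 0 and 1. Aggregate into an SLA-style percentage.
def monthly_uptime_pct(history: list[dict]) -> float:
    if not history:
        return 0.0
    return 100.0 * sum(day["uptime"] for day in history) / len(history)

history = [{"date": f"2025-01-{d:02d}", "uptime": 1.0} for d in range(1, 30)]
history.append({"date": "2025-01-30", "uptime": 0.95})  # one partial outage

print(f"{monthly_uptime_pct(history):.2f}%")  # 99.83%
```

One partial outage in a month is enough to drop below "three nines" — which is exactly why the 30-day history matters when picking a provider for critical workloads.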

AI Provider Status API | Documentation
