Furozq

Posted on Feb 28

How I Detect Website Failures 60 Seconds Before They Happen (Without Heavy ML)

#webdev #javascript #saas #monitoring

Most monitoring tools answer one question:

“Is it up?”

I wanted to answer a different one:

“Is it about to go down?”

The Problem with Traditional Uptime Checks
Downtime rarely happens instantly.

In real-world systems, failure usually looks like this:

T-5 minutes → response time slowly climbs (200ms → 400ms)
T-2 minutes → latency spikes, occasional timeouts
T-1 minute → error rate increases sharply
T-0 → service crash

A traditional uptime check only asks whether the endpoint responds.

It ignores the degradation pattern that leads up to the outage.

The Core Idea: Trend + Volatility > Status
Instead of checking:

isAlive = true / false

I started tracking:

Response time trend
Slope direction
Variance (volatility)
Consecutive instability signals

Because instability is usually visible before failure.
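One way to track "consecutive instability signals" is a small streak counter. This is a minimal sketch, not the code behind the post: the class name `InstabilityStreak` and the default of 3 consecutive checks are assumptions chosen for illustration.

```typescript
// Counts consecutive unstable checks and resets on the first healthy one.
// `required` (default 3) is a hypothetical value, not taken from the post.
class InstabilityStreak {
  private count = 0;
  private readonly required: number;

  constructor(required: number = 3) {
    if (required < 1) throw new RangeError("required must be >= 1");
    this.required = required;
  }

  // Returns true once `required` checks in a row were unstable.
  record(unstable: boolean): boolean {
    this.count = unstable ? this.count + 1 : 0;
    return this.count >= this.required;
  }

  get current(): number {
    return this.count;
  }
}

// Usage: one healthy check in the middle resets the streak.
const streak = new InstabilityStreak(3);
const fired = [true, true, false, true, true, true].map((u) => streak.record(u));
console.log(fired);
```

Requiring several unstable checks in a row keeps a single bad probe from counting as a signal on its own.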

The Lightweight Prediction Model

No heavy ML.
No TensorFlow.
No GPU.

Just math.

1️⃣ Exponential Moving Average (EMA)

EMA smooths out noise while preserving trend.

A single spike doesn’t trigger an alert.
But a gradual climb does.
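A minimal sketch of the smoothing step, assuming latency samples in milliseconds. The function name `ema` and the smoothing factor `alpha = 0.2` are illustrative assumptions, not values from the actual system.

```typescript
// Exponential moving average over a series of latency samples (ms).
// alpha is an assumed smoothing factor in (0, 1]; higher values react faster.
function ema(samples: number[], alpha: number = 0.2): number[] {
  if (alpha <= 0 || alpha > 1) {
    throw new RangeError("alpha must be in (0, 1]");
  }
  const out: number[] = [];
  let prev: number | undefined;
  for (const x of samples) {
    // Seed with the first sample, then blend each new sample into the average.
    prev = prev === undefined ? x : alpha * x + (1 - alpha) * prev;
    out.push(prev);
  }
  return out;
}

// A single 2000 ms spike is damped: the smoothed value peaks near 560 ms,
// then decays back toward 200 ms.
const spike = ema([200, 200, 2000, 200, 200]);

// A steady climb keeps pulling the smoothed value upward on every sample.
const climb = ema([200, 240, 280, 320, 360, 400]);

console.log(spike, climb);
```

A lower `alpha` damps spikes more but also reacts more slowly to a real climb, so the value is a trade-off to tune per service.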

2️⃣ Linear Regression (Slope Detection)

If latency is trending upward, regression tells me:

How fast it’s increasing
Where it will likely be in 5–15 minutes

If projected latency crosses a risk threshold → risk score increases.
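The slope and projection can be sketched with an ordinary least-squares fit. The `Sample` shape and the function names `fitLine` and `projectLatency` are assumptions for illustration; the post does not show its actual implementation.

```typescript
// One latency measurement: t is seconds since some reference point.
interface Sample {
  t: number;
  latencyMs: number;
}

// Ordinary least-squares fit of latency against time.
function fitLine(samples: Sample[]): { slope: number; intercept: number } {
  const n = samples.length;
  if (n < 2) throw new RangeError("need at least two samples");
  const meanT = samples.reduce((s, p) => s + p.t, 0) / n;
  const meanY = samples.reduce((s, p) => s + p.latencyMs, 0) / n;
  let sxy = 0;
  let sxx = 0;
  for (const p of samples) {
    sxy += (p.t - meanT) * (p.latencyMs - meanY);
    sxx += (p.t - meanT) ** 2;
  }
  if (sxx === 0) throw new RangeError("samples must span more than one timestamp");
  const slope = sxy / sxx; // ms of latency gained per second
  return { slope, intercept: meanY - slope * meanT };
}

// Extrapolate the fitted line horizonSec seconds past the last sample.
function projectLatency(samples: Sample[], horizonSec: number): number {
  const { slope, intercept } = fitLine(samples);
  const lastT = samples[samples.length - 1].t;
  return intercept + slope * (lastT + horizonSec);
}

// Latency climbing 200 ms -> 400 ms over four minutes, sampled every 60 s.
const window: Sample[] = [200, 250, 300, 350, 400].map((latencyMs, i) => ({
  t: i * 60,
  latencyMs,
}));
console.log(fitLine(window).slope);        // about 0.83 ms per second
console.log(projectLatency(window, 300));  // about 650 ms five minutes out
```

A straight-line projection is only a rough guide over 5 to 15 minutes, which is why the post treats it as one input to a risk score rather than an alert on its own.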

3️⃣ Variance Analysis

A stable 200ms ± 20ms system is healthy.

A 200ms average swinging between 50ms and 2000ms is unstable.

Variance exposes risk that averages hide.
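To make the contrast concrete, here is a small sketch that compares two series with the same 200 ms average. The sample values and helper names (`mean`, `stdDev`) are made up for illustration.

```typescript
function mean(xs: number[]): number {
  if (xs.length === 0) throw new RangeError("need at least one sample");
  return xs.reduce((sum, x) => sum + x, 0) / xs.length;
}

// Population standard deviation: the square root of the variance.
function stdDev(xs: number[]): number {
  const m = mean(xs);
  const variance = xs.reduce((sum, x) => sum + (x - m) ** 2, 0) / xs.length;
  return Math.sqrt(variance);
}

// Both series average exactly 200 ms.
const stable: number[] = [180, 220, 190, 210, 200];
const swinging: number[] = [...new Array<number>(12).fill(50), 2000];

console.log(mean(stable), stdDev(stable));     // 200, about 14 ms
console.log(mean(swinging), stdDev(swinging)); // 200, about 520 ms
```

Dividing the standard deviation by the mean (the coefficient of variation) gives a unitless volatility figure that is easier to compare across endpoints with different baseline latencies.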

Risk Scoring
All signals combine into a 0–100 instability score.

Instead of binary alerts, I get graded warning levels:

0–29 → stable
30–59 → degrading
60–100 → likely incident

This allows earlier, smarter alerts.
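The post does not give the actual weighting, so the following is only a sketch of how such a score could be assembled. The `Signals` fields, the 50/30/20 weights, the normalisation caps, and the decision to treat a score of exactly 30 as "degrading" and 60 as "likely incident" are all assumptions.

```typescript
interface Signals {
  projectedLatencyMs: number; // e.g. from a regression projection
  thresholdMs: number;        // latency regarded as risky (assumed input)
  coeffOfVariation: number;   // standard deviation divided by mean
  unstableStreak: number;     // consecutive unstable checks
}

type Level = "stable" | "degrading" | "likely incident";

const clamp01 = (x: number): number => Math.min(1, Math.max(0, x));

// Weighted sum of three normalised signals, scaled to 0-100.
// The weights (0.5 / 0.3 / 0.2) and caps are hypothetical.
function riskScore(s: Signals): number {
  if (s.thresholdMs <= 0) throw new RangeError("thresholdMs must be positive");
  const trend = clamp01(s.projectedLatencyMs / s.thresholdMs);
  const volatility = clamp01(s.coeffOfVariation); // CV of 1.0 or more counts as max
  const streak = clamp01(s.unstableStreak / 5);   // 5 or more in a row counts as max
  return Math.round(100 * (0.5 * trend + 0.3 * volatility + 0.2 * streak));
}

function riskLevel(score: number): Level {
  if (score < 30) return "stable";
  if (score < 60) return "degrading";
  return "likely incident";
}

const healthy = riskScore({
  projectedLatencyMs: 220,
  thresholdMs: 1000,
  coeffOfVariation: 0.07,
  unstableStreak: 0,
});
const failing = riskScore({
  projectedLatencyMs: 1200,
  thresholdMs: 1000,
  coeffOfVariation: 0.9,
  unstableStreak: 4,
});
console.log(healthy, riskLevel(healthy));
console.log(failing, riskLevel(failing));
```

Clamping each signal before weighting keeps one extreme input from pushing the score past 100 and makes the weights directly comparable.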

The Result
In controlled stress tests, the system flagged instability:

60–90 seconds before actual downtime
While the service was still technically “up.”

That window is enough to:

Scale horizontally
Trigger failover
Enable CDN fallback
Alert on-call engineers

Why Not Machine Learning?

I initially experimented with ML models.

They were:

Slower
Harder to tune
Resource heavy
Not meaningfully more accurate

For this use case, well-tuned statistical methods came out ahead overall.

Sometimes simple math beats complex AI.

Built with:

Node.js + TypeScript
SQLite
Single VPS (~$6/month)
No Kubernetes

If you're building infrastructure tools, you don't always need complexity.

Sometimes you just need the right signal.

I’m building this as ORVO AI. Feedback from fellow builders is always welcome.
