Why Your Uptime Monitor Lies to You (And What I Built Instead)
Last year my site went down. Well, not really "down" — the server was fine. But users saw a broken page because Cloudflare had issues and my CSS wasn't loading.
My uptime monitor? Green checkmarks everywhere. 100% uptime. Great job.
That's when I realized: traditional monitoring is broken.
The lie of "200 OK"
Every uptime tool does the same thing:
GET https://your-site.com → 200 OK → "All good!"
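In code, that whole "check" is a GET and a status comparison. A minimal sketch in Go (the URL is a placeholder):

```go
// The naive uptime check: fetch the page, look at the status code, done.
package main

import (
	"fmt"
	"net/http"
	"time"
)

func main() {
	client := &http.Client{Timeout: 10 * time.Second}

	resp, err := client.Get("https://your-site.com") // placeholder URL
	if err != nil {
		fmt.Println("DOWN:", err)
		return
	}
	defer resp.Body.Close()

	if resp.StatusCode == http.StatusOK {
		fmt.Println("All good!") // ...according to this check, anyway
	}
}
```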
But your site isn't just your server. It's:
- Fonts from Google
- JS bundles from your CDN
- Payment forms from Stripe
- Images from S3
- Chat widgets, analytics, whatever else
If any of these break, users see a broken page. Your server is fine. Your monitoring is green. Your users are angry.
Remember when AWS S3 went down and half the internet broke? Or those Cloudflare incidents that took out millions of sites? Your server was probably fine during all of that. Your uptime monitor probably said everything was OK.
So I built something different
I made upsonar.io. It checks your site like a real browser would:
- Load the page
- Find all external resources (scripts, styles, images, fonts), as sketched below
- Check each one
- Tell you what's actually broken
Here's what a real check looks like:
example.com: 200 OK ✓
├── fonts.googleapis.com → 200 ✓
├── cdn.example.com/app.js → 200 ✓
├── js.stripe.com/v3 → 200 ✓
├── s3.amazonaws.com/images/hero.webp → 503 ✗
└── widget.intercom.io → timeout ✗
Result: 3/5 dependencies working
Now you know the real picture.
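That "find all external resources" step is ordinary HTML parsing. Here's a stripped-down sketch using golang.org/x/net/html (the real parser also has to deal with srcset, lazy-loading attributes, and resolving relative URLs, all of which this version ignores):

```go
// Walk the parsed DOM and collect the URLs a browser would fetch.
package main

import (
	"fmt"
	"net/http"

	"golang.org/x/net/html"
)

func extractResources(n *html.Node, out *[]string) {
	if n.Type == html.ElementNode {
		attr := ""
		switch n.Data {
		case "script", "img", "iframe":
			attr = "src"
		case "link": // stylesheets, fonts, icons
			attr = "href"
		}
		for _, a := range n.Attr {
			if attr != "" && a.Key == attr && a.Val != "" {
				*out = append(*out, a.Val)
			}
		}
	}
	for c := n.FirstChild; c != nil; c = c.NextSibling {
		extractResources(c, out)
	}
}

func main() {
	resp, err := http.Get("https://example.com")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	doc, err := html.Parse(resp.Body)
	if err != nil {
		panic(err)
	}

	var urls []string
	extractResources(doc, &urls)
	for _, u := range urls {
		fmt.Println(u)
	}
}
```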
How it works
┌──────────────┐      ┌──────────────┐      ┌─────────────┐
│   Frontend   │ ───▶ │    Go API    │ ───▶ │    Redis    │
│  React/Vite  │      │     Echo     │      │    Cache    │
└──────────────┘      └──────┬───────┘      └─────────────┘
                             │
           ┌─────────────────┼─────────────────┐
           ▼                 ▼                 ▼
     ┌────────────┐    ┌────────────┐    ┌────────────┐
     │   Sydney   │    │ Amsterdam  │    │  New York  │
     │ (DO Func)  │    │ (DO Func)  │    │ (DO Func)  │
     └─────┬──────┘    └─────┬──────┘    └─────┬──────┘
           │                 │                 │
           └─────────────────┼─────────────────┘
                             ▼
                      ┌──────────────┐
                      │ Your Website │
                      └──────────────┘
The idea: checks run from where your users actually are. Not from one datacenter.
9 regions right now: Sydney, Singapore, Amsterdam, Frankfurt, London, New York, San Francisco, Toronto, Bangalore. Each one is a serverless function on DigitalOcean — no servers to manage, scales automatically.
What happens on each check:
- Scheduler triggers (every 1-5 minutes depending on your plan)
- API fans out requests to all selected regions in parallel (see the sketch after this list)
- Each function does its thing:
  - HTTP request with detailed timing
  - Parse HTML, find all external resources
  - Check each resource (limited concurrency so we don't hammer the target)
- Results come back, get cached in Redis
- If status changed from last check → notification goes out
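The fan-out step is plain goroutines. A rough sketch (the region names, the result shape, and checkFromRegion are illustrative, not the actual API):

```go
// Fire one check per region concurrently and collect the results.
package main

import (
	"fmt"
	"sync"
)

type CheckResult struct {
	Region string
	OK     bool
}

// checkFromRegion stands in for invoking the serverless function in
// that region (in reality, an HTTPS call to the region's endpoint).
func checkFromRegion(region, target string) CheckResult {
	return CheckResult{Region: region, OK: true}
}

func main() {
	regions := []string{"sydney", "amsterdam", "new-york"}
	target := "https://example.com"

	results := make(chan CheckResult, len(regions))
	var wg sync.WaitGroup

	for _, r := range regions {
		wg.Add(1)
		go func(region string) {
			defer wg.Done()
			results <- checkFromRegion(region, target)
		}(r)
	}
	wg.Wait()
	close(results)

	for res := range results {
		fmt.Printf("%s: ok=%v\n", res.Region, res.OK)
	}
}
```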
The serverless part is nice. Cold starts add maybe 200-300ms, but for monitoring that's fine. And I don't have to babysit servers in 9 datacenters.
Tech stuff (for the curious)
Backend is Go with Echo. Why Go? The net/http/httptrace package is amazing for measuring exactly where time goes in a request. Plus goroutines make it easy to check 50 resources in parallel without the code turning into callback soup.
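Here's roughly what hooking into httptrace looks like. A minimal sketch, not upsonar's actual code; a real monitor would record the durations instead of printing them:

```go
// Time the phases of a single request: DNS, TCP connect, TLS, first byte.
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
	"net/http/httptrace"
	"time"
)

func main() {
	start := time.Now()
	trace := &httptrace.ClientTrace{
		DNSDone: func(httptrace.DNSDoneInfo) {
			fmt.Println("DNS:        ", time.Since(start))
		},
		ConnectDone: func(network, addr string, err error) {
			fmt.Println("TCP connect:", time.Since(start))
		},
		TLSHandshakeDone: func(tls.ConnectionState, error) {
			fmt.Println("TLS:        ", time.Since(start))
		},
		GotFirstResponseByte: func() {
			fmt.Println("First byte: ", time.Since(start))
		},
	}

	req, err := http.NewRequest("GET", "https://example.com", nil)
	if err != nil {
		panic(err)
	}
	req = req.WithContext(httptrace.WithClientTrace(req.Context(), trace))

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("Status:     ", resp.Status)
}
```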
Frontend is React + Vite. Database is just SQLite — simple, no separate service to manage, works fine for this scale.
The interesting bits:
- HTML parsing to extract resources (handling srcset, lazy loading, relative URLs)
- Semaphore pattern for concurrent resource checks (don't want to DDoS the target), sketched below
- TLS probing to check which protocols and ciphers a server supports
I'll write about each of these separately.
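As a teaser, the semaphore is just a buffered channel. A minimal sketch (the limit of 5 and the URLs are arbitrary):

```go
// Cap concurrent resource checks so the target site isn't hammered.
package main

import (
	"fmt"
	"net/http"
	"sync"
	"time"
)

func main() {
	urls := []string{ // resources extracted from the page
		"https://fonts.googleapis.com/css2?family=Inter",
		"https://js.stripe.com/v3",
	}

	client := &http.Client{Timeout: 10 * time.Second}
	sem := make(chan struct{}, 5) // at most 5 checks in flight
	var wg sync.WaitGroup

	for _, u := range urls {
		wg.Add(1)
		go func(url string) {
			defer wg.Done()
			sem <- struct{}{}        // acquire a slot
			defer func() { <-sem }() // release it

			resp, err := client.Head(url)
			if err != nil {
				fmt.Println(url, "→", err)
				return
			}
			resp.Body.Close()
			fmt.Println(url, "→", resp.StatusCode)
		}(u)
	}
	wg.Wait()
}
```

Sending on a full channel blocks the goroutine until a slot frees up, so concurrency is capped at the channel's capacity, with no mutexes or worker-pool bookkeeping.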
Try it
Free tools, no signup:
- Check availability from 9 regions
- Check SSL certificate — expiry, chain, TLS config
- Check domain expiry — WHOIS lookup
- Full health check — timing breakdown, security headers, dependencies
For continuous monitoring there's a free tier — 3 sites, checks every 5 minutes.
What's next
Planning to write about:
- Measuring HTTP phases with Go's httptrace
- Parsing HTML for resource URLs (it's trickier than you'd think)
- Testing TLS versions programmatically
- Concurrent HTTP requests without killing the target
Follow if you're interested.
What do you use for monitoring? Curious if others have hit similar issues with "false positives" from traditional uptime tools.