Why Your Uptime Monitor Lies to You (And What I Built Instead)
Last year my site went down. Well, not really "down" — the server was fine. But users saw a broken page because Cloudflare had issues and my CSS wasn't loading.
My uptime monitor? Green checkmarks everywhere. 100% uptime. Great job.
That's when I realized: traditional monitoring is broken.
The lie of "200 OK"
Every uptime tool does the same thing:
GET https://your-site.com → 200 OK → "All good!"
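In code, that whole "check" is a GET and a status comparison. A minimal sketch in Go (the URL is a placeholder):

```go
// The naive uptime check: fetch the page, look at the status code, done.
package main

import (
	"fmt"
	"net/http"
	"time"
)

func main() {
	client := &http.Client{Timeout: 10 * time.Second}

	resp, err := client.Get("https://your-site.com") // placeholder URL
	if err != nil {
		fmt.Println("DOWN:", err)
		return
	}
	defer resp.Body.Close()

	if resp.StatusCode == http.StatusOK {
		fmt.Println("All good!") // ...according to this check, anyway
	}
}
```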
But your site isn't just your server. It's:
- Fonts from Google
- JS bundles from your CDN
- Payment forms from Stripe
- Images from S3
- Chat widgets, analytics, whatever else
If any of these break, users see a broken page. Your server is fine. Your monitoring is green. Your users are angry.
Remember when AWS S3 went down and half the internet broke? Or those Cloudflare incidents that took out millions of sites? Your server was probably fine during all of that. Your uptime monitor probably said everything was OK.
So I built something different
I made upsonar.io. It checks your site like a real browser would:
- Load the page
- Find all external resources (scripts, styles, images, fonts), as sketched below
- Check each one
- Tell you what's actually broken
Here's what a real check looks like:
example.com: 200 OK ✓
├── fonts.googleapis.com → 200 ✓
├── cdn.example.com/app.js → 200 ✓
├── js.stripe.com/v3 → 200 ✓
├── s3.amazonaws.com/images/hero.webp → 503 ✗
└── widget.intercom.io → timeout ✗
Result: 3/5 dependencies working
Now you know the real picture.
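That "find all external resources" step is ordinary HTML parsing. Here's a stripped-down sketch using golang.org/x/net/html (the real parser also has to deal with srcset, lazy-loading attributes, and resolving relative URLs, all of which this version ignores):

```go
// Walk the parsed DOM and collect the URLs a browser would fetch.
package main

import (
	"fmt"
	"net/http"

	"golang.org/x/net/html"
)

func extractResources(n *html.Node, out *[]string) {
	if n.Type == html.ElementNode {
		attr := ""
		switch n.Data {
		case "script", "img", "iframe":
			attr = "src"
		case "link": // stylesheets, fonts, icons
			attr = "href"
		}
		for _, a := range n.Attr {
			if attr != "" && a.Key == attr && a.Val != "" {
				*out = append(*out, a.Val)
			}
		}
	}
	for c := n.FirstChild; c != nil; c = c.NextSibling {
		extractResources(c, out)
	}
}

func main() {
	resp, err := http.Get("https://example.com")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	doc, err := html.Parse(resp.Body)
	if err != nil {
		panic(err)
	}

	var urls []string
	extractResources(doc, &urls)
	for _, u := range urls {
		fmt.Println(u)
	}
}
```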
How it works
┌──────────────┐      ┌──────────────┐      ┌─────────────┐
│   Frontend   │ ───▶ │    Go API    │ ───▶ │    Redis    │
│  React/Vite  │      │     Echo     │      │    Cache    │
└──────────────┘      └──────┬───────┘      └─────────────┘
                             │
           ┌─────────────────┼─────────────────┐
           ▼                 ▼                 ▼
     ┌────────────┐    ┌────────────┐    ┌────────────┐
     │   Sydney   │    │ Amsterdam  │    │  New York  │
     │ (DO Func)  │    │ (DO Func)  │    │ (DO Func)  │
     └─────┬──────┘    └─────┬──────┘    └─────┬──────┘
           │                 │                 │
           └─────────────────┼─────────────────┘
                             ▼
                      ┌──────────────┐
                      │ Your Website │
                      └──────────────┘
The idea: checks run from where your users actually are. Not from one datacenter.
9 regions right now: Sydney, Singapore, Amsterdam, Frankfurt, London, New York, San Francisco, Toronto, Bangalore. Each one is a serverless function on DigitalOcean — no servers to manage, scales automatically.
What happens on each check:
- Scheduler triggers (every 1-5 minutes depending on your plan)
- API fans out requests to all selected regions in parallel (see the sketch after this list)
- Each function does its thing:
  - HTTP request with detailed timing
  - Parse HTML, find all external resources
  - Check each resource (limited concurrency so we don't hammer the target)
- Results come back, get cached in Redis
- If status changed from last check → notification goes out
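The fan-out step is plain goroutines. A rough sketch (the region names, the result shape, and checkFromRegion are illustrative, not the actual API):

```go
// Fire one check per region concurrently and collect the results.
package main

import (
	"fmt"
	"sync"
)

type CheckResult struct {
	Region string
	OK     bool
}

// checkFromRegion stands in for invoking the serverless function in
// that region (in reality, an HTTPS call to the region's endpoint).
func checkFromRegion(region, target string) CheckResult {
	return CheckResult{Region: region, OK: true}
}

func main() {
	regions := []string{"sydney", "amsterdam", "new-york"}
	target := "https://example.com"

	results := make(chan CheckResult, len(regions))
	var wg sync.WaitGroup

	for _, r := range regions {
		wg.Add(1)
		go func(region string) {
			defer wg.Done()
			results <- checkFromRegion(region, target)
		}(r)
	}
	wg.Wait()
	close(results)

	for res := range results {
		fmt.Printf("%s: ok=%v\n", res.Region, res.OK)
	}
}
```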
The serverless part is nice. Cold starts add maybe 200-300ms, but for monitoring that's fine. And I don't have to babysit servers in 9 datacenters.
Tech stuff (for the curious)
Backend is Go with Echo. Why Go? The net/http/httptrace package is amazing for measuring exactly where time goes in a request. Plus goroutines make it easy to check 50 resources in parallel without the code turning into callback soup.
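Here's roughly what hooking into httptrace looks like. A minimal sketch, not upsonar's actual code; a real monitor would record the durations instead of printing them:

```go
// Time the phases of a single request: DNS, TCP connect, TLS, first byte.
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
	"net/http/httptrace"
	"time"
)

func main() {
	start := time.Now()
	trace := &httptrace.ClientTrace{
		DNSDone: func(httptrace.DNSDoneInfo) {
			fmt.Println("DNS:        ", time.Since(start))
		},
		ConnectDone: func(network, addr string, err error) {
			fmt.Println("TCP connect:", time.Since(start))
		},
		TLSHandshakeDone: func(tls.ConnectionState, error) {
			fmt.Println("TLS:        ", time.Since(start))
		},
		GotFirstResponseByte: func() {
			fmt.Println("First byte: ", time.Since(start))
		},
	}

	req, err := http.NewRequest("GET", "https://example.com", nil)
	if err != nil {
		panic(err)
	}
	req = req.WithContext(httptrace.WithClientTrace(req.Context(), trace))

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("Status:     ", resp.Status)
}
```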
Frontend is React + Vite. Database is just SQLite — simple, no separate service to manage, works fine for this scale.
The interesting bits:
- HTML parsing to extract resources (handling srcset, lazy loading, relative URLs)
- Semaphore pattern for concurrent resource checks (don't want to DDoS the target), sketched below
- TLS probing to check which protocols and ciphers a server supports
I'll write about each of these separately.
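As a teaser, the semaphore is just a buffered channel. A minimal sketch (the limit of 5 and the URLs are arbitrary):

```go
// Cap concurrent resource checks so the target site isn't hammered.
package main

import (
	"fmt"
	"net/http"
	"sync"
	"time"
)

func main() {
	urls := []string{ // resources extracted from the page
		"https://fonts.googleapis.com/css2?family=Inter",
		"https://js.stripe.com/v3",
	}

	client := &http.Client{Timeout: 10 * time.Second}
	sem := make(chan struct{}, 5) // at most 5 checks in flight
	var wg sync.WaitGroup

	for _, u := range urls {
		wg.Add(1)
		go func(url string) {
			defer wg.Done()
			sem <- struct{}{}        // acquire a slot
			defer func() { <-sem }() // release it

			resp, err := client.Head(url)
			if err != nil {
				fmt.Println(url, "→", err)
				return
			}
			resp.Body.Close()
			fmt.Println(url, "→", resp.StatusCode)
		}(u)
	}
	wg.Wait()
}
```

Sending on a full channel blocks the goroutine until a slot frees up, so concurrency is capped at the channel's capacity, with no mutexes or worker-pool bookkeeping.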
Try it
Free tools, no signup:
- Check availability from 9 regions
- Check SSL certificate — expiry, chain, TLS config
- Check domain expiry — WHOIS lookup
- Full health check — timing breakdown, security headers, dependencies
For continuous monitoring there's a free tier — 3 sites, checks every 5 minutes.
What's next
Planning to write about:
- Measuring HTTP phases with Go's httptrace
- Parsing HTML for resource URLs (it's trickier than you'd think)
- Testing TLS versions programmatically
- Concurrent HTTP requests without killing the target
Follow if you're interested.
What do you use for monitoring? Curious if others have hit similar issues with "false positives" from traditional uptime tools.