DEV Community: Adarsh Shukla

I Built a Full-Stack Uptime Monitoring SaaS in 30 Days — Here's Everything I Learned

Adarsh Shukla — Sat, 30 May 2026 06:48:35 +0000

Six months ago I was manually refreshing my client's website after every deployment, praying it stayed up.

That's when I decided to build WhistleBlower — a real-time uptime monitoring tool with alerts, status pages, and incident tracking.

Here's what I built and what I learned.

What WhistleBlower does

🔴 HTTP, TCP, PING, and DNS monitoring — not just websites
📧 Instant alerts via email, Slack, Discord, and SMS
📊 Public status pages — your users always know what's up
💓 Heartbeat monitoring — know when your cron jobs die silently
🔒 SSL certificate expiry alerts — never get caught with an expired cert
👥 Team & on-call scheduling for agencies

The tech stack

Frontend: Next.js 14 + Tailwind CSS
Backend: Node.js + Express + TypeScript
Database: MySQL (Railway)
Emails: Resend
Payments: Razorpay
Deploy: Vercel (frontend) + Railway (backend)
Cron worker: GitHub Actions (free!)

The hardest part

ICMP ping is blocked in many containerized environments, such as Railway and locked-down Docker setups. My PING monitors were silently failing in production while working fine locally.

The fix? A 3-strategy fallback:

ICMP ping (works on bare metal / GitHub Actions)
TCP connect to port 443, then 80
DNS lookup as final fallback

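```typescript
// Assumed result shape; the post doesn't show its definition.
interface CheckResult {
  isUp: boolean;
  latencyMs?: number;
}

// Probe helpers defined elsewhere in the codebase (signatures assumed).
declare function tryICMP(host: string): Promise<CheckResult>;
declare function tryTCP(host: string, port: number): Promise<CheckResult>;
declare function tryDNS(host: string): Promise<CheckResult>;

async function checkPing(host: string): Promise<CheckResult> {
  // Strategy 1: ICMP
  const icmpResult = await tryICMP(host);
  if (icmpResult.isUp) return icmpResult;

  // Strategy 2: TCP fallback (containers block ICMP)
  for (const port of [443, 80]) {
    const tcp = await tryTCP(host, port);
    if (tcp.isUp) return tcp;
  }

  // Strategy 3: DNS
  return tryDNS(host);
}
```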
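The checkPing snippet leaves tryTCP and tryDNS undefined. Here's a minimal sketch of what those two fallbacks could look like with Node's built-in `net` and `dns` modules. The `CheckResult` shape (an `isUp` flag plus optional latency and error) and the 5-second timeout are assumptions, not WhistleBlower's actual code.

```typescript
import * as net from "node:net";
import { promises as dns } from "node:dns";

// Assumed result shape: checkPing only relies on isUp.
interface CheckResult {
  isUp: boolean;
  latencyMs?: number;
  error?: string;
}

// Strategy 2: a plain TCP handshake needs no raw-socket privileges,
// so it still works where ICMP is blocked.
function tryTCP(host: string, port: number, timeoutMs = 5000): Promise<CheckResult> {
  return new Promise((resolve) => {
    const start = Date.now();
    const socket = net.connect({ host, port });
    const finish = (result: CheckResult) => {
      socket.destroy(); // we only wanted the handshake; close it either way
      resolve(result);
    };
    socket.setTimeout(timeoutMs); // fires if the connection attempt stalls
    socket.once("connect", () => finish({ isUp: true, latencyMs: Date.now() - start }));
    socket.once("timeout", () => finish({ isUp: false, error: "timeout" }));
    socket.once("error", (err) => finish({ isUp: false, error: err.message }));
  });
}

// Strategy 3: the hostname still resolves. This is the weakest signal:
// it proves the DNS record exists, not that the host answers.
async function tryDNS(host: string): Promise<CheckResult> {
  const start = Date.now();
  try {
    await dns.lookup(host);
    return { isUp: true, latencyMs: Date.now() - start };
  } catch (err) {
    return { isUp: false, error: err instanceof Error ? err.message : String(err) };
  }
}
```

A successful handshake on 443 or 80 is a much stronger signal than the DNS fallback, so a result that only came from strategy 3 is best treated as "probably up" rather than confirmed.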

What I'd do differently

Start with a free tier plan from day one — I almost didn't add one
Deploy earlier — I spent too long perfecting locally
GitHub Actions as a cron runner is genuinely brilliant for side projects

Try it free

👉 whistle-blower-two.vercel.app

Free plan includes 5 monitors, 5-minute checks, email alerts — no credit card needed.

Would love your feedback in the comments! 🚀

Why Your Website Can Be "Up" And Still Broken: A Deep Dive Into Latency Phases

Adarsh Shukla — Fri, 29 May 2026 06:34:04 +0000

Most uptime monitors tell you one thing: is the server responding? But that binary answer misses the full picture of what your users actually experience.

The 4 Phases of Every HTTP Request

Every time a browser opens a new connection to your website, the request goes through 4 distinct phases (on a reused keep-alive connection, only the last one repeats):

1. DNS Lookup (dns_ms)

Your browser needs to convert yoursite.com into an IP address, which means querying DNS servers. A healthy DNS lookup takes < 50ms. If you're seeing > 200ms, your DNS provider may be slow, or your TTL may be set so low that resolvers rarely have your record cached.

What breaks it: DNS propagation issues, expired records, DDoS attacks on your DNS provider (this has happened to Cloudflare and Dyn).

2. TCP Connect (tcp_ms)

Once the IP is known, the browser opens a TCP connection to your server. This is basically the round-trip time between your user and your server. Expect < 100ms for same-continent users.

What breaks it: Server is too far from users, DDoS, port blocked by firewall.

3. TLS Handshake (tls_ms)

For HTTPS sites, the client and server negotiate encryption keys. This typically adds 20-150ms. If you're seeing > 500ms, your TLS configuration needs tuning (consider enabling TLS session resumption).

What breaks it: Expired certificates (site shows scary red warning), misconfigured cipher suites, revoked certificates.

4. Time to First Byte — TTFB (ttfb_ms)

This is the time from "request sent" to "first byte of response received." It's the most important metric for perceived performance. Target < 200ms. Above 800ms, users start bouncing.

What breaks it: Slow database queries, no caching, memory leaks, cold-start serverless functions.
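To get those four numbers yourself, you can hook the socket events Node emits during an HTTPS request (`lookup`, `connect`, `secureConnect`) plus the response callback for the first byte. This is a sketch assuming Node's `https` module; the `measure` and `toPhases` names are hypothetical, and it isn't necessarily how WhistleBlower measures phases.

```typescript
import * as https from "node:https";
import type { Socket } from "node:net";
import { performance } from "node:perf_hooks";

// Field names mirror the dns_ms / tcp_ms / tls_ms / ttfb_ms labels above.
interface PhaseTimings {
  dns_ms: number;
  tcp_ms: number;
  tls_ms: number;
  ttfb_ms: number;
}

// Timestamps (ms) recorded at the moment each phase completes.
interface Marks {
  start: number;
  lookup: number;
  connect: number;
  secureConnect: number;
  firstByte: number;
}

// Each phase is the gap between two consecutive marks.
function toPhases(m: Marks): PhaseTimings {
  return {
    dns_ms: m.lookup - m.start,
    tcp_ms: m.connect - m.lookup,
    tls_ms: m.secureConnect - m.connect,
    ttfb_ms: m.firstByte - m.secureConnect, // the request is sent right after the handshake
  };
}

// HTTPS only: plain-HTTP sockets never emit "secureConnect".
function measure(url: string): Promise<PhaseTimings> {
  return new Promise((resolve, reject) => {
    const marks: Partial<Marks> = { start: performance.now() };
    // agent: false disables keep-alive reuse, so a fresh connection is opened.
    const req = https.get(url, { agent: false }, (res) => {
      marks.firstByte = performance.now(); // status line and headers have arrived
      res.resume(); // drain the body; we only care when the response started
      resolve(toPhases(marks as Marks));
    });
    req.on("socket", (socket: Socket) => {
      socket.once("lookup", () => { marks.lookup = performance.now(); });
      socket.once("connect", () => {
        marks.lookup ??= marks.start; // no "lookup" event when the URL uses an IP literal
        marks.connect = performance.now();
      });
      socket.once("secureConnect", () => { marks.secureConnect = performance.now(); });
    });
    req.on("error", reject);
  });
}
```

Calling `measure("https://example.com").then(console.log)` logs the four durations. Because keep-alive is disabled, every call opens a new TCP and TLS connection instead of reusing a warm one, which is what a synthetic check should measure.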

Why This Matters More Than Simple Uptime

A server can return HTTP 200 while:

Taking 4 seconds to respond (TTFB issue)
Having a broken TLS configuration
Serving from a CDN that's cached a broken page
Running out of memory (but still technically responding)
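One way to act on this is to stop treating a check as a binary up/down. The sketch below uses hypothetical names and an 800ms cutoff borrowed from the TTFB section above, and reports a slow 200 as "degraded" instead of "up":

```typescript
type Health = "up" | "degraded" | "down";

interface HttpCheck {
  status: number; // HTTP status code, or 0 when no response arrived at all
  ttfbMs: number;
}

// Hypothetical cutoff from the "above 800ms, users start bouncing" rule of thumb.
const SLOW_TTFB_MS = 800;

function classify(check: HttpCheck): Health {
  // Anything outside 2xx/3xx counts as a failed check.
  if (check.status < 200 || check.status >= 400) return "down";
  // A 200 that takes too long is still a problem for users.
  if (check.ttfbMs > SLOW_TTFB_MS) return "degraded";
  return "up";
}
```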

Setting Up Phase-Level Monitoring

Tools like WhistleBlower break down each phase independently, so when your monitoring fires an alert, you already know which part of the request chain failed — not just "something is wrong."

DNS: 12ms  ✅
TCP: 45ms  ✅  
TLS: 89ms  ✅
TTFB: 2400ms ❌ ← your database is choking

This cuts mean time to resolution (MTTR): instead of "the site is slow, dig into everything," you know exactly where to look.

Quick Fixes by Phase

Phase	Slow when	Try this
DNS	> 100ms	Switch to Cloudflare DNS, increase TTL
TCP	> 200ms	Use a CDN, move server closer to users
TLS	> 300ms	Enable TLS session resumption, use HTTP/2
TTFB	> 500ms	Add Redis cache, optimize slow DB queries
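The table translates directly into a rule lookup. This sketch hard-codes its thresholds and suggestions and prints a report in the same style as the phase breakdown earlier in this post; the `report` function and its output format are illustrative, not WhistleBlower's alerting logic.

```typescript
type Phase = "DNS" | "TCP" | "TLS" | "TTFB";

interface Rule {
  slowAboveMs: number;
  tryThis: string;
}

// Thresholds and suggestions taken from the table above.
const RULES: Record<Phase, Rule> = {
  DNS: { slowAboveMs: 100, tryThis: "Switch to Cloudflare DNS, increase TTL" },
  TCP: { slowAboveMs: 200, tryThis: "Use a CDN, move server closer to users" },
  TLS: { slowAboveMs: 300, tryThis: "Enable TLS session resumption, use HTTP/2" },
  TTFB: { slowAboveMs: 500, tryThis: "Add Redis cache, optimize slow DB queries" },
};

// Fixed order so the report always reads DNS, TCP, TLS, TTFB.
const PHASES: Phase[] = ["DNS", "TCP", "TLS", "TTFB"];

// One line per phase; slow phases get a ❌ plus the suggested fix.
function report(timings: Record<Phase, number>): string[] {
  return PHASES.map((phase) => {
    const ms = timings[phase];
    const { slowAboveMs, tryThis } = RULES[phase];
    return ms > slowAboveMs ? `${phase}: ${ms}ms ❌ → ${tryThis}` : `${phase}: ${ms}ms ✅`;
  });
}
```

With the earlier example numbers, `report({ DNS: 12, TCP: 45, TLS: 89, TTFB: 2400 })` flags only TTFB and attaches the caching/query advice.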

WhistleBlower monitors all 4 phases on every check and tracks trends over time. Try it free.

5 Uptime Monitoring Mistakes That Cost Developers Hours of Debugging

Adarsh Shukla — Fri, 29 May 2026 06:31:55 +0000

I've been building and maintaining web applications for years, and I've watched the same monitoring mistakes happen over and over. Here are the 5 most costly ones — and how to fix them.

Mistake 1: Only Monitoring the Home Page

Your / route working tells you almost nothing about whether your app is healthy. What about your checkout API? Your user auth endpoint? Your image upload handler?

Fix: Set up monitors for your critical user paths: login endpoint, payment flow, core API routes.
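A minimal way to cover several critical paths at once: list each one with the status code it should return and probe them all in parallel. The endpoint names, URLs, and the injectable `probe` function are assumptions for illustration.

```typescript
interface Endpoint {
  name: string;
  url: string;
  expectedStatus: number;
}

interface EndpointResult {
  name: string;
  ok: boolean;
  detail: string;
}

// Performs the actual request; passing it in keeps the checker
// independent of any particular HTTP client (and easy to test).
type Probe = (url: string) => Promise<{ status: number }>;

// Check every critical path in parallel; one failure never hides the others.
async function checkEndpoints(endpoints: Endpoint[], probe: Probe): Promise<EndpointResult[]> {
  return Promise.all(
    endpoints.map(async ({ name, url, expectedStatus }): Promise<EndpointResult> => {
      try {
        const res = await probe(url);
        return { name, ok: res.status === expectedStatus, detail: `HTTP ${res.status}` };
      } catch (err) {
        return { name, ok: false, detail: err instanceof Error ? err.message : String(err) };
      }
    }),
  );
}
```

On Node 18+, the built-in `fetch` matches the `Probe` shape, so `checkEndpoints(endpoints, (url) => fetch(url))` works as-is. Note that `fetch` follows redirects by default, so set `expectedStatus` to the final status code rather than a 301/302.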

Mistake 2: Ignoring SSL Expiry Until It's Too Late

SSL certificates expire. When they do, browsers show a full-page red warning that tells users your site is dangerous. Most users leave immediately.

Fix: Monitor your certificate expiry date and alert 30 days before. This is free and takes 5 minutes to set up.
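Here's a sketch of that check using Node's `tls` module: read the certificate's `valid_to` date from a live handshake, count the days left, and alert inside the 30-day window. The helper names are hypothetical.

```typescript
import * as tls from "node:tls";

const DAY_MS = 24 * 60 * 60 * 1000;

// Whole days left before the certificate expires (negative once expired).
function daysUntilExpiry(validTo: string, now: Date = new Date()): number {
  return Math.floor((new Date(validTo).getTime() - now.getTime()) / DAY_MS);
}

// Alert window from this post: 30 days before expiry.
function shouldAlert(daysLeft: number, windowDays = 30): boolean {
  return daysLeft <= windowDays;
}

// Read the server certificate's expiry date during a TLS handshake.
// With the default rejectUnauthorized: true, an already-expired or untrusted
// certificate surfaces as a connection error, which should also trigger an alert.
function fetchValidTo(host: string, port = 443): Promise<string> {
  return new Promise((resolve, reject) => {
    const socket = tls.connect({ host, port, servername: host }, () => {
      const cert = socket.getPeerCertificate();
      socket.end();
      resolve(cert.valid_to); // a date string such as "Jun 29 12:00:00 2026 GMT"
    });
    socket.on("error", reject);
  });
}
```

Usage: `fetchValidTo("example.com").then((v) => shouldAlert(daysUntilExpiry(v)))` resolves to `true` when it's time to renew.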

Mistake 3: Not Monitoring DNS

If your DNS goes down, your site is completely unreachable — even if your server is running perfectly. Yet most monitoring tools only check if the server responds, not if DNS resolves correctly.

Fix: Add DNS monitoring alongside HTTP monitoring. They're different failure modes.
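A DNS check can do more than confirm the name resolves: it can also verify that the answers match the addresses you expect, which catches a record quietly pointing somewhere else. This sketch uses Node's `dns.promises.resolve4` (A records only); the expected-address list and the injectable resolver are assumptions.

```typescript
import { promises as dns } from "node:dns";

interface DnsCheck {
  resolves: boolean;
  addresses: string[];
  unexpected: string[]; // answers that are not in the expected set
}

// The resolver defaults to an A-record query; it is a parameter so the
// check can be exercised without network access.
async function checkDns(
  host: string,
  expected: string[],
  resolve: (host: string) => Promise<string[]> = (h) => dns.resolve4(h),
): Promise<DnsCheck> {
  try {
    const addresses = await resolve(host);
    const unexpected = addresses.filter((a) => !expected.includes(a));
    return { resolves: addresses.length > 0, addresses, unexpected };
  } catch {
    // NXDOMAIN, SERVFAIL, timeouts: the name is effectively unreachable.
    return { resolves: false, addresses: [], unexpected: [] };
  }
}
```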

Mistake 4: Only Checking From One Location

Your server might be fine, but Cloudflare's edge node in Frankfurt could be serving a cached error to all European users. Single-location monitoring misses this entirely.

Fix: Monitor from at least 2 geographic locations. Compare response times.
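Once you have results from two or more locations, the comparison itself is simple. This sketch flags a region that disagrees with the majority on up/down, or one that is more than 3x slower than the fastest healthy region; the region names and the 3x factor are arbitrary assumptions.

```typescript
interface RegionResult {
  region: string;
  up: boolean;
  responseMs: number;
}

function compareRegions(results: RegionResult[], slowFactor = 3): string[] {
  const upCount = results.filter((r) => r.up).length;
  // Ties count as "up", so with two regions the one that is down gets flagged.
  const majorityUp = upCount * 2 >= results.length;
  const fastest = Math.min(...results.filter((r) => r.up).map((r) => r.responseMs));
  const flagged: string[] = [];
  for (const r of results) {
    if (r.up !== majorityUp) {
      flagged.push(`${r.region}: ${r.up ? "up" : "down"} (others disagree)`);
    } else if (r.up && r.responseMs > fastest * slowFactor) {
      flagged.push(`${r.region}: ${r.responseMs}ms (slow)`);
    }
  }
  // A total outage is not a regional discrepancy, so it produces no flags here.
  return flagged;
}
```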

Mistake 5: No Incident Timeline

When your site goes down, the first thing stakeholders ask is "when did this start?" Without a proper incident log, you're guessing.

Fix: Use a monitoring tool that records every check result with timestamps. You want to show exactly: went down at 14:32, resolved at 15:18, duration 46 minutes.
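Building that timeline from raw check results is a small fold over timestamped data: the first failed check opens an incident and the next passing check closes it. A sketch, assuming each check is stored as a timestamp plus an up/down flag (the type and function names are hypothetical):

```typescript
interface Check {
  at: Date;
  up: boolean;
}

interface Incident {
  start: Date;
  end: Date | null; // null while the incident is still open
  durationMin: number | null;
}

function buildIncidents(checks: Check[]): Incident[] {
  const sorted = [...checks].sort((a, b) => a.at.getTime() - b.at.getTime());
  const incidents: Incident[] = [];
  let open: Date | null = null;
  for (const c of sorted) {
    if (!c.up && open === null) {
      open = c.at; // first failed check: "went down at"
    } else if (c.up && open !== null) {
      const minutes = Math.round((c.at.getTime() - open.getTime()) / 60_000);
      incidents.push({ start: open, end: c.at, durationMin: minutes }); // "resolved at"
      open = null;
    }
  }
  if (open !== null) incidents.push({ start: open, end: null, durationMin: null });
  return incidents;
}
```

The recorded duration can only be as precise as your check interval: with 5-minute checks, the real outage may have started up to 5 minutes before the first failed check.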

WhistleBlower handles all 5 of these by default. Check it out.

Building a Public Status Page: What to Show and What to Hide

Adarsh Shukla — Fri, 29 May 2026 06:30:04 +0000

A public status page is one of the highest-leverage things you can do for user trust. When your service has an incident, users who find your status page stay calm. Users who don't find it flood your support inbox.

What You Should Always Show

Current Status — A clear "All systems operational" or "Degraded performance" that updates in real time. Don't make users wonder.

Uptime History — A 90-day bar chart showing daily uptime. Users want to see if outages are rare or a pattern.

Incident History — Past incidents with start time, duration, and resolution. Transparency builds trust.

Response Time Trend — Showing your average response time over the past 30 days proves your site is consistently fast, not just "up."
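Both the 90-day uptime bars and the response time trend can come from the same raw check data. This sketch buckets checks by UTC day; the record shape is assumed, and averaging only passing checks (so timeouts don't skew the trend) is one design choice among several.

```typescript
interface CheckSample {
  at: Date;
  up: boolean;
  responseMs: number;
}

interface DayStats {
  day: string;           // UTC date, "YYYY-MM-DD"
  uptimePct: number;     // share of passing checks that day
  avgResponseMs: number; // averaged over passing checks only
}

function dailyStats(samples: CheckSample[]): DayStats[] {
  const byDay: Record<string, CheckSample[]> = {};
  for (const s of samples) {
    const day = s.at.toISOString().slice(0, 10); // UTC calendar day
    if (!byDay[day]) byDay[day] = [];
    byDay[day].push(s);
  }
  // ISO dates sort chronologically as plain strings.
  return Object.keys(byDay)
    .sort()
    .map((day) => {
      const all = byDay[day];
      const passing = all.filter((s) => s.up);
      const totalMs = passing.reduce((sum, s) => sum + s.responseMs, 0);
      return {
        day,
        uptimePct: (100 * passing.length) / all.length,
        avgResponseMs: passing.length > 0 ? Math.round(totalMs / passing.length) : 0,
      };
    });
}
```

Days with no checks at all simply don't appear in the output, so the chart should render them as "no data" rather than 0% uptime.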

What You Should Hide

Internal System Names — Your users don't care about "postgres-replica-3." Call it "Database" or "Search."

Exact Error Messages — "ECONNREFUSED 10.0.1.5:3306" tells your users nothing but tells attackers a lot.

Partial Outages on Non-Critical Systems — A flapping internal metric service doesn't need to appear as a user-facing incident.
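If your status page is generated from monitoring data, these hiding rules belong in code rather than in someone's memory. A sketch with a hypothetical name mapping: monitors without a public name, or flagged non-critical, never reach the page, and the raw error text is never forwarded.

```typescript
// Hypothetical mapping from internal monitor names to public component names.
const PUBLIC_NAMES: Record<string, string> = {
  "postgres-replica-3": "Database",
  "es-cluster-eu": "Search",
};

interface InternalEvent {
  monitor: string;
  error: string;     // raw error, e.g. "ECONNREFUSED 10.0.1.5:3306"
  critical: boolean; // false for internal-only systems such as a metrics service
}

interface PublicEvent {
  component: string;
  message: string;
}

// Returns null for events that should not appear on the public page at all.
function toPublic(event: InternalEvent): PublicEvent | null {
  const component = PUBLIC_NAMES[event.monitor];
  if (!event.critical || component === undefined) return null;
  // event.error is deliberately ignored: it can leak hosts, ports, and internals.
  return { component, message: `${component} is experiencing issues. We are investigating.` };
}
```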

The Tone Matters

During an incident, your status page message should:

Acknowledge the issue immediately (don't wait until it's fixed)
Give an estimated resolution time even if it's a rough guess
Update at least every 30 minutes
Thank users for their patience

"We are investigating reports of elevated error rates on the API. Our team is actively working on a fix. ETA: 45 minutes. Last updated: 14:47 UTC."

That's it. Don't oversell, don't overpromise.
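If you want tooling to enforce that cadence, two small helpers go a long way: one stamps each message with a UTC time in the format of the example above, and one tells you when the last update is older than 30 minutes. The function names are hypothetical.

```typescript
// "Update at least every 30 minutes."
const MAX_UPDATE_GAP_MS = 30 * 60 * 1000;

// Append the ETA and a UTC "Last updated" stamp, matching the example message.
function statusUpdate(summary: string, etaMinutes: number, at: Date): string {
  const hh = String(at.getUTCHours()).padStart(2, "0");
  const mm = String(at.getUTCMinutes()).padStart(2, "0");
  return `${summary} ETA: ${etaMinutes} minutes. Last updated: ${hh}:${mm} UTC.`;
}

// True once the newest public update is older than the 30-minute cadence.
function isStale(lastUpdated: Date, now: Date = new Date()): boolean {
  return now.getTime() - lastUpdated.getTime() > MAX_UPDATE_GAP_MS;
}
```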

Auto-Generated vs Manual Status Pages

Auto-generated (WhistleBlower's approach): built from real monitoring data, updates automatically, stays accurate, and requires zero manual effort.

Manual: requires a human to update during an incident. Usually accurate but delayed. Good for incident narrative and communication.

Best practice: auto-generate the uptime data, write the incident narrative manually.
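The auto-generated half can be as simple as deriving the headline from current monitor states while leaving the narrative to a human. In this sketch the state names and the "Partial outage" / "Major outage" labels are assumptions; only "All systems operational" and "Degraded performance" come from this post.

```typescript
type ComponentState = "operational" | "degraded" | "down";

// Headline banner derived purely from live monitor states.
function overallStatus(states: ComponentState[]): string {
  if (states.length > 0 && states.every((s) => s === "down")) return "Major outage";
  if (states.includes("down")) return "Partial outage";
  if (states.includes("degraded")) return "Degraded performance";
  return "All systems operational";
}

interface StatusPage {
  banner: string;     // auto-generated from monitoring data
  narrative?: string; // written by a human during an incident
}

function buildStatusPage(states: ComponentState[], narrative?: string): StatusPage {
  return { banner: overallStatus(states), narrative };
}
```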

WhistleBlower generates public status pages automatically from your monitoring data. Share a URL with your users and stop updating uptime numbers by hand.