wilfridterry

Posted on Jun 17

200 OK Is Not the Same as "It Works"

#javascript #monitoring #testing #webdev

A few months ago a team I know shipped a routine Friday deploy. Every monitor stayed green all weekend. On Monday they discovered the signup form had been throwing a JavaScript error since Friday afternoon. The server was returning 200 OK the whole time. The page loaded. The HTML was valid. And not a single person could create an account for three days.

Nobody filed a bug. Customers don't file bugs. They hit a wall and leave.

This is the uncomfortable truth about most monitoring setups: they answer "is the server responding?" when the question that actually matters is "can a customer do the thing they came to do?" Those are not the same question, and the gap between them is where revenue quietly leaks out.

Why `200 OK` lies

A classic uptime check does roughly this:

curl -s -o /dev/null -w "%{http_code}" https://yoursite.com/
# 200

Green. Ship it. But 200 only tells you that something answered, whether that was the origin, a CDN, or a cache in front of it. It says nothing about whether the response is usable. All of the following return 200 while being completely broken for a real human:

A page that renders blank because a JS bundle 404'd and the framework never hydrated.
A checkout button that disappeared after a CSS refactor changed a class name.
A login form that submits to an endpoint now returning 500, but the page itself loads fine.
A third-party script (payments, analytics, a chat widget) that fails and takes the rest of the page down with it.
A layout that "works" but pushes the CTA below a broken hero image, so conversions crater.

The server is healthy. The experience is dead. And the longer your front end leans on client-side rendering, third-party scripts, and multi-step flows, the wider this gap gets.

Three layers, not one

Closing the gap means monitoring at three levels, each catching a class of failure the others miss.

1. Health — but the deep kind

Pinging a URL is table stakes. A genuinely useful health check loads the page once in a real browser and, in that same pass, also surfaces:

HTTP, SSL, DNS, redirects — the boring stuff that still takes you down at 2 a.m. when a cert expires.
Blank-page / empty-render detection — did the DOM actually paint meaningful content, or did you ship an empty <div id="app">?
Broken resources — any sub-resource (JS, CSS, images, fonts) that failed to load.
Console JavaScript errors — the silent killers, since a thrown error can break interactivity without changing the status code.
First-party API calls — did the XHR/fetch calls the page depends on actually succeed?
Core Web Vitals, security headers, basic a11y and SEO — slower-moving signals, but cheap to grab in the same pass.

The key shift: stop treating "responded" as "healthy." Healthy means rendered and interactive.
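To make "rendered and interactive" concrete, here is a minimal sketch of how the signals listed above could be folded into one verdict. It assumes some headless-browser probe has already collected them; the `PageProbe` shape, the `evaluateHealth` name, and the 50-character blank-page threshold are all hypothetical choices for illustration, not any particular tool's API.

```typescript
// Hypothetical shape of what a headless-browser probe might collect for one page load.
interface PageProbe {
  status: number;            // HTTP status of the main document
  visibleTextLength: number; // characters of rendered text found in <body>
  consoleErrors: string[];   // uncaught errors and console.error messages
  failedResources: string[]; // URLs of JS/CSS/images/fonts that failed to load
  failedApiCalls: string[];  // first-party XHR/fetch calls that did not succeed
}

interface HealthVerdict {
  healthy: boolean;
  reasons: string[];
}

// Assumed threshold: fewer visible characters than this counts as a blank render.
const MIN_VISIBLE_TEXT = 50;

function evaluateHealth(probe: PageProbe): HealthVerdict {
  const reasons: string[] = [];

  if (probe.status < 200 || probe.status >= 400) {
    reasons.push(`HTTP status ${probe.status}`);
  }
  if (probe.visibleTextLength < MIN_VISIBLE_TEXT) {
    reasons.push("blank or near-empty render");
  }
  if (probe.consoleErrors.length > 0) {
    reasons.push(`${probe.consoleErrors.length} console error(s)`);
  }
  if (probe.failedResources.length > 0) {
    reasons.push(`${probe.failedResources.length} failed sub-resource(s)`);
  }
  if (probe.failedApiCalls.length > 0) {
    reasons.push(`${probe.failedApiCalls.length} failed API call(s)`);
  }

  // "Responded" is not enough: any single reason marks the page unhealthy.
  return { healthy: reasons.length === 0, reasons };
}

// The signup incident from the intro: 200 OK, page painted, one thrown error.
const signupPage: PageProbe = {
  status: 200,
  visibleTextLength: 1200,
  consoleErrors: ["TypeError: cannot read properties of undefined"],
  failedResources: [],
  failedApiCalls: [],
};
console.log(evaluateHealth(signupPage));
```

A status-only check would have passed this page. Here the single console error is enough to flip the verdict to unhealthy, with the reason attached.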

2. Visual regression — catch what you can't assert

Some breakage has no clean assertion. A button moved. The hero image is 404ing so the layout collapsed. A font swap pushed everything 40px down. You can't easily expect() your way to "the page looks right."

So you do what humans do — you look. Programmatically:

Capture a screenshot on a schedule (daily/weekly for stable pages).
Diff it pixel-by-pixel against the previous baseline.
Surface the changed percentage and the diff image so a human can glance and decide: intended change, or regression?

This is the same idea behind tools like Percy or BackstopJS, applied continuously to production rather than only in CI. A 2% diff on a page when you haven't deployed anything is a great early-warning signal.
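The "changed percentage" part of that loop is simple arithmetic. A rough sketch, assuming both screenshots have already been decoded into same-sized RGBA byte buffers (the decoding step, the `changedPercent` name, and the `tolerance` parameter are assumptions of this example, not how any named tool does it):

```typescript
// Returns the percentage of pixels that differ between two same-sized RGBA
// buffers (4 bytes per pixel). A pixel counts as changed when any channel
// differs by more than `tolerance`.
function changedPercent(
  baseline: Uint8Array,
  current: Uint8Array,
  tolerance: number = 0,
): number {
  if (baseline.length !== current.length || baseline.length % 4 !== 0) {
    throw new Error("buffers must be same-sized RGBA data");
  }
  const totalPixels = baseline.length / 4;
  if (totalPixels === 0) return 0;

  let changed = 0;
  for (let i = 0; i < baseline.length; i += 4) {
    for (let c = 0; c < 4; c++) {
      if (Math.abs(baseline[i + c] - current[i + c]) > tolerance) {
        changed++;
        break; // count each pixel at most once
      }
    }
  }
  return (changed / totalPixels) * 100;
}

// A 2x2 all-white image where one pixel's red channel drops to 0.
const before = new Uint8Array(16).fill(255);
const after = new Uint8Array(16).fill(255);
after[4] = 0;
console.log(changedPercent(before, after)); // 25
```

A small nonzero tolerance is one way to keep anti-aliasing noise from registering as a change. The right value depends on the page, so treat it as something to tune rather than a constant.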

3. Journey monitoring — test the verbs

Health checks test nouns (the page). Journeys test verbs (the actions). This is where real money lives:

Search → add to cart → checkout for ecommerce.
Signup → verify → onboard for SaaS.
Login → load dashboard → key action for almost anything else with accounts.

A journey monitor drives a real (headless) browser through these steps on a schedule and reports failure at the step level. You don't just learn "checkout is broken"; you learn "step 4, clicking 'Place order,' timed out." Historically this meant maintaining Playwright or Cypress scripts, which tend to be brittle because they can break whenever a selector changes. A newer approach is to describe the flow in plain language and let the tooling resolve the steps, which aims to cut the maintenance cost that sinks many synthetic-monitoring efforts.
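Step-level reporting is the part worth sketching, independent of which browser driver sits underneath. In the example below the `JourneyStep` and `JourneyResult` shapes, `runJourney`, and `withTimeout` are hypothetical names; in a real setup each step's `run` would click, type, or assert inside a headless browser, while here the steps are stubs so the control flow is visible.

```typescript
interface JourneyStep {
  name: string;
  run: () => Promise<void>; // would drive the browser in a real monitor
}

interface JourneyResult {
  ok: boolean;
  failedStep?: { index: number; name: string; error: string };
}

// Rejects if `work` has not settled within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
    work.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}

// Runs steps in order and stops at the first failure, reporting which step
// broke (1-based index) instead of a bare pass/fail.
async function runJourney(
  steps: JourneyStep[],
  stepTimeoutMs: number = 10000,
): Promise<JourneyResult> {
  for (let i = 0; i < steps.length; i++) {
    const step = steps[i];
    try {
      await withTimeout(step.run(), stepTimeoutMs);
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      return { ok: false, failedStep: { index: i + 1, name: step.name, error: message } };
    }
  }
  return { ok: true };
}

// Stubbed checkout journey: the last step never finishes.
const checkout: JourneyStep[] = [
  { name: "search for product", run: async () => {} },
  { name: "add to cart", run: async () => {} },
  { name: "open checkout", run: async () => {} },
  { name: "click 'Place order'", run: () => new Promise<void>(() => {}) },
];

runJourney(checkout, 50).then((result) => console.log(result));
```

With the 50 ms timeout used here, the result names step 4 and carries the timeout message. That is the difference between "checkout is broken" and knowing where to look.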

Where this lands in practice

You can absolutely assemble this yourself: a cron'd headless-Chrome script for health, BackstopJS for visual diffs, Playwright for journeys, and something to route alerts. I've gone down that road; the wiring and the upkeep are the expensive parts. Selectors rot, baselines drift, and the alerting glue becomes its own side project.

The other option is a tool that bundles the three layers. NorthDuty is one I looked at recently that's built squarely around this "up but broken" thesis. It runs health checks (every 5 minutes by default), screenshot-based visual diffs, and user-journey checks on the same project. Notably, it lets you define journeys as plain text instead of scripts, and it uses AI to suggest a handful of likely happy-path flows per site. It's free to use right now, so it's easy to point it at a site and see what your current monitoring has been missing. There are others in adjacent spaces (Checkly leans script-first and developer-heavy, Better Stack and Pingdom lean uptime-first, Visualping is visual-only). The point isn't the brand. It's that you should be covering all three layers, however you get there.

A pragmatic starting point

If you want to close the biggest part of the gap with the least effort, in order:

1. Upgrade your health check to detect blank renders, console errors, and failed sub-resources, not just status codes. This alone catches a surprising share of "green but broken" incidents.
2. Add one journey for your single most revenue-critical flow (checkout or signup). One good journey beats ten URL pings.
3. Add visual diffs on your 3–5 highest-traffic, rarely-changing pages, where an unexpected diff is almost always a regression.
4. Set thresholds, not just on/off: alert on response time, health score, SSL expiry, and journey failure, and route them somewhere your team already reads (Slack/Discord/Teams), with maintenance windows to mute planned-work noise.
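The thresholds-plus-maintenance-windows idea from the last item can be sketched as a pure function that turns one check result into a list of alert messages. Everything here is an assumption for illustration: the field names, the 0-100 health score, and the example limits. Delivery to Slack, Discord, or Teams is left out on purpose.

```typescript
interface CheckResult {
  responseTimeMs: number;
  healthScore: number;      // assumed 0-100 scale, higher is better
  sslDaysRemaining: number;
  journeyFailed: boolean;
}

interface Thresholds {
  maxResponseTimeMs: number;
  minHealthScore: number;
  minSslDaysRemaining: number;
}

interface MaintenanceWindow {
  start: Date;
  end: Date;
}

// Returns the alerts to send for one check. Returns nothing while `now`
// falls inside a maintenance window, so planned work stays quiet.
function alertsFor(
  result: CheckResult,
  limits: Thresholds,
  now: Date,
  windows: MaintenanceWindow[] = [],
): string[] {
  const t = now.getTime();
  const muted = windows.some((w) => t >= w.start.getTime() && t < w.end.getTime());
  if (muted) return [];

  const alerts: string[] = [];
  if (result.responseTimeMs > limits.maxResponseTimeMs) {
    alerts.push(`response time ${result.responseTimeMs} ms exceeds ${limits.maxResponseTimeMs} ms`);
  }
  if (result.healthScore < limits.minHealthScore) {
    alerts.push(`health score ${result.healthScore} is below ${limits.minHealthScore}`);
  }
  if (result.sslDaysRemaining < limits.minSslDaysRemaining) {
    alerts.push(`SSL certificate expires in ${result.sslDaysRemaining} day(s)`);
  }
  if (result.journeyFailed) {
    alerts.push("journey failed");
  }
  return alerts;
}

// Example limits; pick values that match your own traffic and renewal habits.
const limits: Thresholds = { maxResponseTimeMs: 2000, minHealthScore: 80, minSslDaysRemaining: 14 };
const slowAndExpiring: CheckResult = {
  responseTimeMs: 2500,
  healthScore: 95,
  sslDaysRemaining: 5,
  journeyFailed: false,
};
console.log(alertsFor(slowAndExpiring, limits, new Date()));
```

Keeping the decision separate from the delivery makes the rules easy to test, and it means a certificate that is about to expire raises an alert days before any on/off check would notice.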

The takeaway

200 OK is a promise from your server, not from your product. The deploys that hurt most are rarely the ones that take the site down — they're the ones that leave it up and quietly broken, where every dashboard is green and your customers are the only ones who know the truth.

Monitor the experience, not just the endpoint.

How does your team catch "up but broken" today — custom scripts, a hosted tool, or do you find out from support tickets? Curious what's actually working for people.
