DevHelm

Posted on Jun 19 • Originally published at devhelm.io

What Is Synthetic Monitoring? The Complete Guide

#guides #reliability

Your API returned 200 OK. Your servers were up. Your dashboards were green. And the "Pay now" button did nothing, because a frontend deploy shipped a JavaScript error that broke the click handler. You found out when refunds started rolling in.

Synthetic monitoring is the practice of running scripted, automated checks against your application from the outside, on a fixed schedule, so you catch broken paths before a real user does. Instead of waiting for traffic to reveal a problem, you generate synthetic traffic — a script that requests an endpoint, or a headless browser that logs in and clicks through checkout — and assert that the result is what it should be.

The name is the giveaway: the traffic is synthetic. It is not a real user; it is a robot pretending to be one, running every 30 seconds from a datacenter in another part of the world, so that the first entity to discover your checkout is broken is a machine you own — not a customer.

What synthetic monitoring actually checks

A synthetic check has three parts: a script (what to do), an assertion (what "correct" means), and a schedule (how often, from where).

The script can be as simple as "GET /health and expect 200" or as involved as "open the homepage, click Sign in, type these credentials, wait for the dashboard, confirm the account balance renders."
The assertion is the part that separates real monitoring from a glorified ping. Status code 200 is not enough — you assert on the response body, a specific element appearing, a redirect landing where it should, or a page finishing under two seconds.
The schedule decides your detection latency. A check every 30 seconds means you learn about a failure within roughly 30 seconds, plus however long the check takes to run and the alert takes to fire; a check every 5 minutes means a broken deploy can bleed for five minutes before anything notices.
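
To make the assertion part concrete, here is a minimal sketch of evaluating a response against more than its status code. The `CheckResponse` and `Assertions` shapes and the `evaluate` function are hypothetical names made up for this illustration, not any vendor's API.

```typescript
// Hypothetical shapes for illustration only; not a real platform's API.
interface CheckResponse {
  status: number;
  body: string;
  durationMs: number;
}

interface Assertions {
  expectedStatus: number;
  bodyMustContain: string;
  maxDurationMs: number;
}

// Returns a list of failure reasons; an empty list means the check passed.
function evaluate(response: CheckResponse, assertions: Assertions): string[] {
  const failures: string[] = [];
  if (response.status !== assertions.expectedStatus) {
    failures.push(`status ${response.status}, expected ${assertions.expectedStatus}`);
  }
  if (!response.body.includes(assertions.bodyMustContain)) {
    failures.push(`body is missing "${assertions.bodyMustContain}"`);
  }
  if (response.durationMs > assertions.maxDurationMs) {
    failures.push(`took ${response.durationMs} ms, limit ${assertions.maxDurationMs} ms`);
  }
  return failures;
}

// A 200 with the wrong body still fails the check.
const result = evaluate(
  { status: 200, body: "<html>Something went wrong</html>", durationMs: 340 },
  { expectedStatus: 200, bodyMustContain: "Account balance", maxDurationMs: 2000 },
);
console.log(result);
```

Returning every failure reason, rather than stopping at the first, is one possible design choice: it tends to make the resulting alert easier to read.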

The core idea is proactive, not reactive. Real-user monitoring tells you what already happened to real people. Synthetic monitoring tells you what would happen to the next person — continuously, including at 3 AM when nobody is shopping but your deploy pipeline just ran.

Synthetic monitoring vs real-user monitoring

These two are complements, not competitors. Real-user monitoring (RUM) instruments your actual frontend and records what real visitors experience — their load times, their errors, their rage clicks. It has perfect fidelity to reality but zero coverage when there is no traffic, and it can only tell you about a broken path after a real person hit it.

Synthetic monitoring has the opposite shape: it runs whether or not anyone is using the app, it covers the exact journeys you choose, and it catches regressions within one check interval of their shipping. The trade-off is that a synthetic script only tests the paths you wrote scripts for. We cover the full comparison, including when each one wins and how teams run both, in synthetic monitoring vs real-user monitoring.

The two layers: API checks and browser checks

"Synthetic monitoring" spans two technically different workloads, and the distinction matters for cost and coverage.

API (HTTP) synthetic checks exercise your endpoints directly. They send a request — often a multi-step sequence like authenticate, create a resource, read it back, delete it — and assert on status codes, headers, response bodies, and JSON paths. They are cheap to run, fast, and catch the majority of backend regressions. This is the same machinery as API monitoring: a request, an assertion, an alert.
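
A JSON-path assertion can be sketched in a few lines. The `getPath` helper and the sample order payload below are assumptions for illustration; real platforms usually ship their own path syntax.

```typescript
// Resolve a dotted path such as "order.items.0.sku" against parsed JSON.
// Returns undefined when any segment along the way is missing.
function getPath(value: unknown, path: string): unknown {
  let current: unknown = value;
  for (const key of path.split(".")) {
    if (current === null || typeof current !== "object") {
      return undefined;
    }
    current = (current as Record<string, unknown>)[key];
  }
  return current;
}

// Hypothetical response body from the "read it back" step of a multi-step check.
const body: unknown = JSON.parse(
  '{"order":{"id":"ord_123","status":"paid","items":[{"sku":"A1"}]}}',
);

const statusIsPaid = getPath(body, "order.status") === "paid";
const firstSku = getPath(body, "order.items.0.sku");
console.log(statusIsPaid, firstSku);
```

Asserting on a specific field like `order.status` catches the case where the endpoint answers 200 but the resource was never actually created or updated.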

Browser synthetic checks drive a real headless browser (most often Chromium, commonly via Playwright) through a user journey: navigate, type, click, wait, assert on what the user actually sees. They catch the class of failure that API checks structurally cannot: the dead button, the broken redirect, the form that submits but never confirms, the third-party script that blocks render. They cost more to run (a browser launch is heavier than an HTTP request), which is why most vendors meter them.

A mature setup uses both: API checks for breadth and speed across every endpoint, browser checks for depth on the two or three journeys that pay your bills.

How a browser synthetic check works

Under the hood, a browser synthetic check is a Playwright (or Playwright-style) script executed on a schedule. A minimal checkout check looks like this:

import { test, expect } from "@playwright/test";

test("checkout flow reaches confirmation", async ({ page }) => {
  await page.goto("https://shop.example.com");
  await page.getByRole("button", { name: "Add to cart" }).click();
  await page.getByRole("link", { name: "Checkout" }).click();
  await page.getByLabel("Card number").fill("4242424242424242");
  await page.getByRole("button", { name: "Pay now" }).click();

  // The assertion that a 200 OK can never make for you:
  await expect(page.getByText("Order confirmed")).toBeVisible({
    timeout: 10000,
  });
});

The script launches a headless Chromium, runs the steps, and the assertion fails if "Order confirmed" does not appear within the 10-second timeout, even though every underlying API returned 200. When it fails, a good platform captures a screenshot, the console errors, and the network waterfall at the moment of failure, so you are not debugging blind. Turning an existing end-to-end test into a production monitor is the core move; we walk through it in Playwright monitoring.

What to monitor (and what not to)

You cannot synthetically monitor everything, and you should not try — every browser check costs compute and adds maintenance. Pick the journeys where failure is expensive and silent:

Authentication — login and signup. If users cannot get in, nothing else matters.
The money path — checkout, subscription upgrade, add payment method. Revenue-bearing, and the most likely to break silently behind a 200.
Core product action — the one thing your product exists to do (send a message, create a report, run a query).
Critical third-party handoffs — the OAuth redirect, the payment provider iframe, the SSO round-trip.

What to leave to cheaper layers: every static page, every read-only endpoint, every internal admin screen. Those belong on uptime and API checks, not on expensive browser journeys.

How often, and from where

Two scheduling decisions shape both your detection speed and your bill.

Interval is the detection-latency lever. A 30-second interval is a common choice for revenue-critical journeys; 5 minutes is acceptable for secondary flows. Faster is not free: a browser check every 30 seconds is 2,880 runs per day per region, so from three regions it is 259,200 runs in a 30-day month for a single check, which is exactly where metered pricing turns into bill shock.
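
The run-count arithmetic is easy to reproduce before committing to an interval. This is a small sketch; `runsPerMonth` is a made-up helper, and the 30-day month is an assumption.

```typescript
// Number of check executions per month for one check.
// Assumes a 30-day month unless told otherwise.
function runsPerMonth(intervalSeconds: number, regions: number, days = 30): number {
  const secondsPerDay = 24 * 60 * 60;
  const runsPerDayPerRegion = secondsPerDay / intervalSeconds;
  return runsPerDayPerRegion * regions * days;
}

console.log(runsPerMonth(30, 3));  // 259200
console.log(runsPerMonth(300, 3)); // 25920
```

Moving the same check from a 30-second to a 5-minute interval cuts the run count by a factor of ten, which is the trade against detection latency described above.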

Location matters because failures are often regional: a CDN edge cert expires in one region, DNS propagates unevenly, a deploy rolls out to one zone first. Running the same check from multiple geographies catches problems a single-origin check misses, and it confirms whether an outage is global or local. The same multi-region logic applies to DNS and SSL certificate checks.
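
One way to turn per-region results into a "global or local" answer is sketched below. The region names and the `classify` function are hypothetical, and a real platform would likely add retries before declaring an outage.

```typescript
type Scope = "healthy" | "regional" | "global";

// Takes a map of region name to pass/fail for the same check in the same round.
function classify(results: Record<string, boolean>): Scope {
  const outcomes = Object.keys(results).map((region) => results[region]);
  const failed = outcomes.filter((passed) => !passed).length;
  if (failed === 0) {
    return "healthy";
  }
  // Every location failing points at the service itself; a subset failing
  // points at something regional (an edge cert, DNS, a partial rollout).
  return failed === outcomes.length ? "global" : "regional";
}

console.log(classify({ "us-east": true, "eu-west": false, "ap-south": true }));
```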

A practical default: 30-second API checks everywhere, 30-second-to-1-minute browser checks on your top journeys, from two or three regions that match where your users are.

Where synthetic monitoring fits in your reliability stack

Synthetic monitoring is a detection layer, and time to detect is the first component of incident metrics such as MTTR. The faster a synthetic check catches a broken deploy, the lower your MTTR, because you cannot start fixing what you have not noticed. Synthetic uptime data is also the cleanest input to an availability SLI and SLO: a check that runs every 30 seconds from outside your infrastructure is a far more honest measure of "is it working for users" than internal health metrics that stay green while the frontend burns.
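
As a rough sketch of how check results become an availability SLI, assume one location, a 30-second interval, and a 30-day window, which gives 86,400 runs. The function names and the 99.9% target below are illustrative assumptions.

```typescript
// Availability SLI: the fraction of check runs that passed in the window.
function availability(passedRuns: number, totalRuns: number): number {
  if (totalRuns === 0) {
    throw new Error("no check runs in the window");
  }
  return passedRuns / totalRuns;
}

// Error budget expressed as the number of failed runs the SLO tolerates.
function errorBudgetRuns(totalRuns: number, slo: number): number {
  return totalRuns * (1 - slo);
}

const totalRuns = 86400; // 30 days of 30-second checks from one location
const sli = availability(totalRuns - 50, totalRuns); // 50 failed runs
const budget = errorBudgetRuns(totalRuns, 0.999);

console.log(sli >= 0.999, budget);
```

With these numbers, a 99.9% target tolerates about 86 failed runs in the window, so 50 failures still meets the SLO.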

It also pairs with dependency awareness. A synthetic checkout check that fails because Stripe is degraded is a different incident than one that fails because you shipped a bug — and knowing which is which up front is the difference between a five-minute acknowledgment and a thirty-minute scramble.

Getting started

The build order that works: cover your endpoints with API and uptime checks first (breadth, cheap, fast), then add browser checks on the two or three journeys that cost you money when they break. For tool selection, see the best synthetic monitoring tools in 2026 and the best practices for what to assert and how often.

The endpoints and uptime underneath those journeys are the foundation — and the cheapest layer to get right first. Set up your API and uptime monitoring, with multi-region checks and a status page that updates from the same data, at app.devhelm.io — your first monitor is live in about 60 seconds, no credit card.
