<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: velprove</title>
    <description>The latest articles on DEV Community by velprove (@velprove).</description>
    <link>https://dev.to/velprove</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3847633%2F8043484f-4748-45bd-841a-38308831f6d2.png</url>
      <title>DEV Community: velprove</title>
      <link>https://dev.to/velprove</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/velprove"/>
    <language>en</language>
    <item>
      <title>Solo Founder Outage Playbook: Survive the 3 AM Call</title>
      <dc:creator>velprove</dc:creator>
      <pubDate>Mon, 11 May 2026 14:00:02 +0000</pubDate>
      <link>https://dev.to/velprove/solo-founder-outage-playbook-survive-the-3-am-call-4ff3</link>
      <guid>https://dev.to/velprove/solo-founder-outage-playbook-survive-the-3-am-call-4ff3</guid>
      <description>&lt;p&gt;&lt;strong&gt;The honest take:&lt;/strong&gt; Your phone goes off at 3:14 AM. The monitor says login is failing in three regions. You are about to be the Incident Commander, the Communications Lead, and the Operations Lead at the same time, because there is no one else awake. Here is the playbook for what happens next. Most of it is deciding which role to be in which minute, not which command to run. The commands are the easy part.&lt;/p&gt;

&lt;h2&gt;
  
  
  You are wearing all three core ICS roles right now
&lt;/h2&gt;

&lt;p&gt;The Google SRE Book's Managing Incidents chapter (Chapter 14) defines four roles for any production incident under the Incident Command System (ICS): Incident Command (IC), Operational Work, Communications, and Planning. For a solo founder, Planning collapses into IC because there is no multi-day coordination to schedule. The three you actively cycle through are IC, Ops, and Comms. The IC owns the incident, makes decisions, and holds the timeline. Comms talks to customers and stakeholders. Ops touches the production system. In a team of five SREs at 3 AM, the roles go to different humans on purpose, because one person trying to do all of them at once is the most reliable way to make a 10-minute outage into a 4-hour one.&lt;/p&gt;

&lt;p&gt;You do not have three humans. You have you. The playbook adaptation for a solo founder is not to pretend the roles do not exist. It is to wear them in sequence, not in parallel. For the first 10 minutes you are the IC and only the IC. You do not push fixes. You do not write status updates. You confirm the outage, you open the incident document, and you decide what happens next. Then you switch to Comms for two minutes and post the first update. Only then do you become the Operations Lead and touch the system.&lt;/p&gt;

&lt;p&gt;That sequence sounds slow. It is faster than the alternative, because the alternative is you SSHed into production with no incident log, no customer message posted, and a half-formed theory of what is broken. Founders skip the IC role at 3 AM because it feels like overhead. It is the opposite. It is the part that keeps the next 60 minutes from spiraling.&lt;/p&gt;

&lt;h2&gt;
  
  
  What you should have set up before 3 AM
&lt;/h2&gt;

&lt;p&gt;You cannot fix the prep gap mid-incident. If you are reading this mid-outage, skip to the next section. If you are reading this on a calm Tuesday, the following list is the one that pays out at 3 AM, and the one most founders postpone because none of it feels urgent until it is.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;A monitor that actually catches the outage.&lt;/strong&gt; Most founders run one HTTP check on the homepage, and most real outages do not flip that check. The &lt;a href="https://velprove.com/blog/uptime-monitoring-saas-founders" rel="noopener noreferrer"&gt;four layers your monitoring should cover&lt;/a&gt; post lays out what to monitor, layered by depth. The short version: HTTP, plus a browser login monitor, plus a multi-step API monitor, plus a public status page.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;A phone alert you cannot sleep through.&lt;/strong&gt; Email at 3 AM is not enough. Use a phone-ringing channel for severity-1 alerts: a dedicated ringtone, a hardware pager, or a paid escalation tool once you have a teammate. For solo founders, an email-to-SMS bridge plus a Do Not Disturb allow-list is the floor.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;A written runbook with three commands.&lt;/strong&gt; One command to roll back the last release, one to check that the database is reachable, one to flip the marketing site to a static maintenance page. If you cannot run those three at 3 AM without thinking, write them down now.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;A public status page URL customers know about.&lt;/strong&gt; Link it from your marketing footer and from your support replies. If customers have to search for it during the incident, the page is too late.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;A pre-written first-message template.&lt;/strong&gt; Three sentences, with blanks for what is broken and when you will update next. Drafting prose at 3 AM on adrenaline is how founders post sentences they later regret.&lt;/li&gt;
&lt;/ul&gt;
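&lt;p&gt;As one concrete shape for that runbook, a single text file in the repo root is enough. Every command below is a placeholder for an assumed stack (a git-push deploy, Postgres, an S3-hosted maintenance page), not a prescription:&lt;/p&gt;

```text
# RUNBOOK.md -- replace every command with your stack's real one
# 1. Roll back the last release (assumes a git-push deploy):
git revert --no-edit HEAD && git push origin main
# 2. Check the database is reachable (assumes Postgres):
pg_isready -h "$DB_HOST" -p 5432
# 3. Flip the marketing site to a static maintenance page
#    (assumes an S3-hosted page behind your CDN):
aws s3 cp maintenance.html s3://YOUR-BUCKET/index.html
```

&lt;p&gt;Rehearse all three on a calm Tuesday. The file is only a runbook if the commands work without edits.&lt;/p&gt;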

&lt;h2&gt;
  
  
  How you find out you're down (and why your monitor lies)
&lt;/h2&gt;

&lt;p&gt;The first lie your monitor tells you is that it knows. The second is that what it knows matches what your customers are seeing. Both lies have specific shapes, and both have specific fixes.&lt;/p&gt;

&lt;p&gt;The most common false-green is a 200 OK on the marketing page while the authenticated dashboard is broken. The &lt;a href="https://velprove.com/blog/anatomy-of-a-silent-outage" rel="noopener noreferrer"&gt;200 OK can be a lie&lt;/a&gt; post walks through 10 documented incidents where HTTP monitoring returned green while real users could not log in, including the June 12, 2025 Cloudflare Workers KV outage that drove Cloudflare Access to a 100% identity-login failure rate while marketing properties on the same network kept serving. If your only monitor is a homepage HTTP check, your dashboard could be down right now and your monitor would not know.&lt;/p&gt;

&lt;p&gt;The second lie is regional. A monitor in one city sees one network path. A regional fiber cut, a CDN routing change, or a single-AZ failure can look like a full outage from one probe and look fine from another. The &lt;a href="https://velprove.com/blog/why-uptime-monitors-miss-outages" rel="noopener noreferrer"&gt;green-while-down problem&lt;/a&gt; post covers why single-region and low-frequency probes miss the real failure surface. Multi-region with a probe interval short enough to catch the incident inside its lifetime is the structural answer.&lt;/p&gt;

&lt;p&gt;Velprove's browser login monitor is the layer that catches what HTTP cannot. It opens a real Chromium context, signs in as a known test user, and asserts on a post-login element. If identity is broken, if a captcha vendor is down, if a session token rotated, the run fails with a screenshot of where it died. Every plan includes a browser login monitor; the Free tier includes one browser login monitor at a 15-minute interval, and every plan probes from 5 global regions.&lt;/p&gt;

&lt;h2&gt;
  
  
  The first 10 minutes: triage before you touch a thing
&lt;/h2&gt;

&lt;p&gt;You wake up. The phone is buzzing. Resist the urge to push a hotfix. Your job for the next 10 minutes is to be Incident Commander first, Operations Lead second. The IC-first sequence is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Confirm from a second source.&lt;/strong&gt; A monitor saying down is one source. Open a private browser window on your phone and try the affected flow yourself. If you can sign in, the monitor may be lying. If you cannot, the monitor is right. 30 seconds.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Open the incident document.&lt;/strong&gt; One text file. Write the wall-clock time, what the monitor said, and what you just confirmed. This is the timeline that becomes the post-mortem at 9 AM. 60 seconds.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Form one hypothesis and write it down.&lt;/strong&gt; If your last release was within the last hour, the hypothesis is "recent change." If the database CPU graph is spiking, the hypothesis is "database." If the monitor failed in one region only, the hypothesis is "regional." You are not committing to the hypothesis; you are writing it down so you can test it. 60 seconds.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Decide stop-the-bleed.&lt;/strong&gt; Stop-the-bleed is never the same as root cause. If recent change is the hypothesis, roll back. If the database is the hypothesis, fail over or scale. If a third party is the hypothesis, flip the affected feature to a degraded fallback. The goal of the next move is not to fix the bug; it is to stop the customer-facing damage while you keep investigating. 5 minutes.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Switch to Comms and post the first update.&lt;/strong&gt; You have not solved it. Post anyway. The next section is the template. 2 minutes.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That whole sequence is roughly 10 minutes. The temptation is to skip steps 2 and 3 and go straight to step 4. The reason experienced ICs do not skip is that step 3 is what saves you from fixing the wrong thing at minute 12.&lt;/p&gt;
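&lt;p&gt;The step-2 incident document can be as small as an append-only timeline. A minimal sketch in Node; the field names and tiny API here are illustrative, not a standard:&lt;/p&gt;

```javascript
// Minimal append-only incident log: one entry per observation,
// rendered as the timeline that becomes the 9 AM post-mortem.
// Field names and API are illustrative, not a standard.
function createIncidentLog(title) {
  const entries = [];
  return {
    // Record one observation with a wall-clock timestamp.
    note(text, at = new Date()) {
      entries.push({ at: at.toISOString(), text });
    },
    // Render as plain text: a title line, then one timestamped line per entry.
    render() {
      return [`# Incident: ${title}`, ...entries.map(e => `${e.at}  ${e.text}`)].join('\n');
    },
  };
}

// Usage at 3 AM: three lines, then stop writing and go triage.
const log = createIncidentLog('Login failures, 3 regions');
log.note('Monitor: browser login failing in us-east, eu-west, ap-south');
log.note('Confirmed: cannot sign in from phone, private window');
log.note('Hypothesis: recent change (deploy 41 min ago)');
```

&lt;p&gt;At 9 AM, the rendered output becomes the Timeline field of the post-mortem with no extra work.&lt;/p&gt;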

&lt;h2&gt;
  
  
  What to say to customers in the first 10 minutes
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://velprove.com/blog/public-status-page-guide" rel="noopener noreferrer"&gt;5-minute first-comms rule&lt;/a&gt; is the operating standard across mature incident response: post something within 5 minutes of confirming the outage, even if you do not know the cause. Customers do not need root cause in the first message. They need acknowledgement, scope, and a next checkpoint.&lt;/p&gt;

&lt;p&gt;A three-sentence template that works at 3 AM:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;em&gt;What is broken, from the customer's point of view.&lt;/em&gt; "We are seeing failed logins for some users." Not "auth-service is throwing 503s." Use the customer-language version.&lt;/li&gt;
&lt;li&gt;&lt;em&gt;That you are aware and investigating.&lt;/em&gt; "Our on-call has acknowledged and is investigating now." You are the on-call. The phrasing still works.&lt;/li&gt;
&lt;li&gt;&lt;em&gt;When you will post the next update.&lt;/em&gt; "Next update within 30 minutes." Then keep that promise. A 30-minute checkpoint with no new information is "Still investigating. Next update by 04:30 UTC."&lt;/li&gt;
&lt;/ol&gt;
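&lt;p&gt;Because the blanks are the only thing you should be deciding at 3 AM, the template can live in code next to the runbook. A sketch; the exact wording is one possible template, not a standard:&lt;/p&gt;

```javascript
// Pre-written first-update template with two blanks: what is broken
// (in customer language) and when the next update lands.
// The wording is one possible template, not a standard.
function firstUpdate(whatIsBroken, nextUpdateMinutes) {
  return [
    `We are seeing ${whatIsBroken}.`,
    'Our on-call has acknowledged and is investigating now.',
    `Next update within ${nextUpdateMinutes} minutes.`,
  ].join(' ');
}

// Three sentences, zero speculation about cause, one promise you can keep.
const msg = firstUpdate('failed logins for some users', 30);
```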

&lt;p&gt;Post the update to your status page first, then to the channel your customers actually watch. For most early-stage SaaS that is one email blast to active users, not a tweet that nobody will see for six hours. Do not speculate on cause. Do not commit to a fix time. Post-now beats perfect every time, because the alternative is customers filing tickets that you have to answer one by one while you are also fixing the system.&lt;/p&gt;

&lt;h2&gt;
  
  
  Mistakes that turn a 10-minute outage into a 4-hour one
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://velprove.com/blog/monitoring-mistakes-small-business" rel="noopener noreferrer"&gt;7 pre-incident mistakes&lt;/a&gt; post covers what to configure before the outage. This list is different. These are the mid-incident behaviors that take a contained outage and unwind it into a bad night.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Panic-pushing a fix without a rollback plan.&lt;/strong&gt; At 3 AM, with degraded judgment, the fix that looks obvious often is not. If you push a change, know exactly how you will roll it back, and rehearse the rollback command before you push the fix.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Fixing the wrong thing because the symptom looks like the cause.&lt;/strong&gt; A failed login can be auth, a database replica, a captcha vendor, or a CDN. The hypothesis from step 3 of the first-10-minutes sequence exists so you can test it, not so you commit to it. If the rollback did not fix it, the hypothesis was wrong. Form a new one.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Going dark on comms because you are deep in the fix.&lt;/strong&gt; Customers do not see the SSH session. They see the silence. If you committed to a 30-minute update, post it on time even if the only new information is "still investigating." The silence is what generates the angry support emails, not the downtime.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Staying up past the point of sound judgment.&lt;/strong&gt; A founder at minute 90 of a 3 AM outage is making worse decisions than the same founder asleep. If stop-the-bleed is in place and customer impact is contained, sleep. Root cause can wait for daylight.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;No pre-written templates.&lt;/strong&gt; If you are drafting prose mid-incident, you are spending IC and Comms cycles on wordsmithing instead of triage. The template lives in your repo before the outage, not in your head during it.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The post-mortem you write at 9 AM (not the one in the SRE book)
&lt;/h2&gt;

&lt;p&gt;The Google SRE Book talks about a "living incident document" updated in real time by the IC. That works when the IC is a different person from the Ops Lead. For a solo founder, the document gets written during triage (the incident log from step 2) and finished at 9 AM after sleep and coffee.&lt;/p&gt;

&lt;p&gt;A solo-scaled post-mortem has five fields. No more.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Timeline.&lt;/strong&gt; Wall-clock times: when the monitor flipped, when you confirmed, when you posted comms, when stop-the-bleed landed, when full recovery was verified. Lift this from the incident log you opened in step 2.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Customer impact.&lt;/strong&gt; Who saw what, for how long. "Roughly 40 users could not sign in for 22 minutes." If you do not know the number, write "unknown, follow-up in action items." A missing number is a learning, not a failure.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Root cause, in one paragraph.&lt;/strong&gt; Plain language. The five-whys version is fine if it fits. Do not write a 20-page narrative; the next you will not read it.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;What worked, what did not.&lt;/strong&gt; If the rollback worked, name it. If the monitor did not catch the failure for 8 minutes, name that. If the first comms post went out in 4 minutes, name that. Be specific.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;One action item, with a date.&lt;/strong&gt; One, not five. The post-mortem with five action items becomes the post-mortem with zero action items completed, because a solo founder does not have the bandwidth to land five concurrent fixes. Pick the one that prevents this exact outage from recurring, put a date on it, and put it in your backlog.&lt;/li&gt;
&lt;/ul&gt;
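&lt;p&gt;If you want the skeleton waiting in the repo before the incident, it can be generated. A sketch assuming the five fields above; the layout is ours, not a standard:&lt;/p&gt;

```javascript
// Generates the five-field solo post-mortem skeleton as plain text.
// The field list mirrors the five fields described above; the layout
// (one heading per field, TODO placeholders) is illustrative.
const POSTMORTEM_FIELDS = [
  'Timeline',
  'Customer impact',
  'Root cause',
  'What worked, what did not',
  'Action item (one, with a date)',
];

function postmortemSkeleton(incidentTitle, date) {
  const header = `# Post-mortem: ${incidentTitle} (${date})`;
  const sections = POSTMORTEM_FIELDS.map(f => `## ${f}\n(TODO)`);
  return [header, ...sections].join('\n\n');
}

const doc = postmortemSkeleton('Login failures, 3 regions', '2026-05-11');
```

&lt;p&gt;Filling five TODOs at 9 AM is a very different task from staring at a blank file.&lt;/p&gt;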

&lt;p&gt;One question that sometimes comes up at the post-mortem stage: do you owe customers a credit? For SaaS without a formal SLA, the answer is usually no, but the goodwill question is real. The &lt;a href="https://velprove.com/blog/sla-vs-slo-vs-sli-customer-guide" rel="noopener noreferrer"&gt;SLA vs SLO vs SLI customer guide&lt;/a&gt; covers what a credit actually obligates you to and what a discretionary credit signals.&lt;/p&gt;

&lt;h2&gt;
  
  
  What an outage actually costs you at 5 customers
&lt;/h2&gt;

&lt;p&gt;The headline numbers from enterprise outage cost surveys do not apply at 5 customers. The &lt;a href="https://velprove.com/blog/website-downtime-cost-small-business" rel="noopener noreferrer"&gt;real cost of downtime for small businesses&lt;/a&gt; post covers the math at small scale, where the cost is not revenue per minute but trust per incident. A 30-minute outage on a $50 monthly plan is roughly three cents of pro-rated revenue if you credit it. The actual cost is one customer deciding they have seen enough and starting to evaluate your competitor. That cost is unrecoverable and does not appear on any spreadsheet.&lt;/p&gt;

&lt;p&gt;The practical implication: at low customer count, the incident response that pays out the most is the first comms message, not the technical recovery speed. A 45-minute outage with proactive updates every 15 minutes is a survivable story. A 12-minute outage that you never acknowledged is the one that loses the customer.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tools a solo founder actually needs (and what you can skip)
&lt;/h2&gt;

&lt;p&gt;The tools market for incident response is shaped for teams of 20 with on-call rotations. Most of it is not what a solo founder needs. Here is the honest floor.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What you actually need.&lt;/strong&gt; A monitor that catches real outages from outside your network, a phone alert that wakes you up, a public status page, and a written runbook. Velprove's Free plan covers the first three at $0 with no credit card required: 10 monitors including a browser login monitor at 15-minute intervals, HTTP and API monitoring at 5-minute intervals, 5 global regions, email alerts, multi-step API monitors with up to 3 steps, 1 status page, and 30-day incident history. The browser login monitor is the layer that catches the silent outages an HTTP check will miss.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What you can skip at single-digit customer count.&lt;/strong&gt; PagerDuty at $21 per user per month (billed annually), Atlassian Statuspage at $29 per month, Better Stack's on-call add-on at $29 per responder per month (billed annually). None of these are bad products. They are scoped for teams that have multiple humans to coordinate. Email alerts on a phone allow-list cover the same ground for one person.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When to upgrade.&lt;/strong&gt; Velprove Starter at $19 per month adds Slack, Discord, Microsoft Teams, and webhook alerts at 1-minute intervals, 3 browser login monitors at 10-minute intervals, and 90-day dashboard incident history. Pro at $49 per month adds PagerDuty integration, 30-second HTTP intervals, 10 browser login monitors at 5-minute intervals, and 1-year incident history. The trigger to upgrade is the first time you miss a real outage because the interval was too slow or the channel was too quiet, not the first time someone tells you to.&lt;/p&gt;

&lt;p&gt;If you are weighing the broader tool landscape rather than the Velprove plan ladder specifically, the &lt;a href="https://velprove.com/blog/choose-uptime-monitoring-tool-2026" rel="noopener noreferrer"&gt;uptime monitoring tool comparison for 2026&lt;/a&gt; covers the major options side by side.&lt;/p&gt;

&lt;h2&gt;
  
  
  Frequently Asked Questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What should a solo founder do first when their website goes down at 3 AM?
&lt;/h3&gt;

&lt;p&gt;Be Incident Commander first, not Operations Lead. Open a single text file, write the timestamp, write what your monitor said, write what you confirmed from a second source, and do not touch production for the first 90 seconds. The most expensive solo-founder outage move is panic-pushing a hotfix into a system you have not finished diagnosing. Confirm the outage from an outside source, declare the incident to yourself in writing, then triage. Stop-the-bleed before root cause.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do I write a status update during an outage when I don't know the cause yet?
&lt;/h3&gt;

&lt;p&gt;Post a three-sentence update inside 10 minutes. Sentence one: what is broken from the customer's point of view. Sentence two: that you are aware and investigating. Sentence three: when you will post the next update. Do not speculate on cause. Do not promise a fix time. The 5-minute first-comms rule beats the perfect post-mortem update every time, because customers tolerate downtime and do not tolerate silence.&lt;/p&gt;

&lt;h3&gt;
  
  
  Should I roll back my last deploy or try to debug forward during an outage?
&lt;/h3&gt;

&lt;p&gt;Roll back. If the outage started within an hour of your last release, the rollback hypothesis is correct often enough that it is the default move. Debugging forward at 3 AM with degraded judgment, no second pair of eyes, and a clock running against your customers is the wrong shape of work. Get the system to the last known good state, then debug at 9 AM when you have coffee and daylight.&lt;/p&gt;

&lt;h3&gt;
  
  
  Do I need a paid incident management tool like PagerDuty as a solo founder?
&lt;/h3&gt;

&lt;p&gt;No, not at single-digit customer count. PagerDuty's value is on-call rotation, escalation policies, and team coordination. A solo founder has no rotation to schedule and no team to coordinate. Email plus a phone alert on a monitor that actually catches real outages is enough. The Velprove free plan includes a browser login monitor, 10 monitors from 5 regions, and email alerts at no charge. Add PagerDuty if and when you have a teammate to escalate to.&lt;/p&gt;

&lt;h3&gt;
  
  
  How long should I stay up during an outage before sleeping and escalating in the morning?
&lt;/h3&gt;

&lt;p&gt;If you have rolled back to a known good state and customer impact is contained, sleep. A founder past the 90-minute mark at 3 AM is making worse decisions than a founder asleep. If you are still in active impact at 5 AM and the rollback did not work, the call is to post a final status update telling customers you are pausing until 8 AM, accept the downtime, and sleep. A 4-hour outage you fix with judgment beats an 8-hour outage you make worse with panic.&lt;/p&gt;

&lt;h3&gt;
  
  
  What goes in a post-mortem when I'm the only person on the team?
&lt;/h3&gt;

&lt;p&gt;Five fields. Timeline with timestamps, customer impact (who saw what, for how long), root cause in one paragraph, what worked and what did not, and one action item with a date attached. Skip the blameless-culture section because there is no one to blame other than yourself, and skip the leadership-review section because there is no leadership. Save the document in your repo. The next outage will rhyme with this one, and the file is the only thing standing between you and repeating it.&lt;/p&gt;

&lt;p&gt;The 3 AM call is going to happen. The question is which of the three core ICS roles you start in, what you have written down before the alert fires, and whether your monitor catches the outage at all. &lt;a href="https://velprove.com/signup" rel="noopener noreferrer"&gt;Start a free Velprove account.&lt;/a&gt; One browser login monitor, 10 monitors total, 5 global regions, email alerts, status page. No credit card required. The setup is five minutes. The next outage will be on its own schedule.&lt;/p&gt;

</description>
      <category>monitoring</category>
      <category>webdev</category>
      <category>devops</category>
      <category>uptime</category>
    </item>
    <item>
      <title>API Health Check Patterns: What /healthz Should Return</title>
      <dc:creator>velprove</dc:creator>
      <pubDate>Sun, 10 May 2026 14:00:03 +0000</pubDate>
      <link>https://dev.to/velprove/api-health-check-patterns-what-healthz-should-return-ln8</link>
      <guid>https://dev.to/velprove/api-health-check-patterns-what-healthz-should-return-ln8</guid>
      <description>&lt;p&gt;&lt;strong&gt;In one paragraph:&lt;/strong&gt; Liveness, readiness, and startup are three different probes that do three different jobs. Conflating them is how a single database hiccup turns into a full fleet restart. Your load balancer reads the HTTP status code, not your JSON body, so a 200 with &lt;code&gt;"status":"degraded"&lt;/code&gt; is invisible. Keep liveness shallow, keep dependency checks in readiness, hide internal hostnames from public probes, and watch your &lt;code&gt;/healthz&lt;/code&gt; from outside the cluster with a free Velprove API monitor. &lt;a href="https://velprove.com/signup" rel="noopener noreferrer"&gt;Start for free&lt;/a&gt;. No credit card required.&lt;/p&gt;

&lt;p&gt;Most outage stories that start with &lt;code&gt;/healthz&lt;/code&gt; end with someone conflating two of the three probes. The endpoint name is one path, but it can answer three questions, and the questions do not have the same right answer. This post is the engineer-pillar layer above our existing tutorial on how to &lt;a href="https://velprove.com/blog/monitor-rest-api-health-endpoint" rel="noopener noreferrer"&gt;monitor your /health endpoint with response validation&lt;/a&gt;. That post is "how to point a monitor at /health." This one is "what /healthz should actually return, why, and what it should never expose."&lt;/p&gt;

&lt;h2&gt;
  
  
  The three probes, defined the way Kubernetes defines them
&lt;/h2&gt;

&lt;p&gt;The cleanest probe taxonomy in production use is the Kubernetes one, even if you do not run Kubernetes. Three probes, three jobs, three different consumers. Use the same separation on ECS, on Nomad, on a fleet of VMs behind an AWS ALB. The names change. The shape does not.&lt;/p&gt;

&lt;h3&gt;
  
  
  Liveness: is this process deadlocked?
&lt;/h3&gt;

&lt;p&gt;Per the Kubernetes &lt;a href="https://kubernetes.io/docs/concepts/configuration/liveness-readiness-startup-probes/" rel="noopener noreferrer"&gt;probe documentation&lt;/a&gt;: "Liveness probes determine when to restart a container. For example, liveness probes could catch a deadlock, where an application is running, but unable to make progress." A failed liveness probe restarts the container. That is the only thing it does. The contract is narrow on purpose: the probe should answer one question, "is this process so wedged that the only recovery is killing it," and the answer should be yes only when killing it is genuinely the right move.&lt;/p&gt;

&lt;h3&gt;
  
  
  Readiness: should this instance be in the load-balancer pool?
&lt;/h3&gt;

&lt;p&gt;The same docs: "Readiness probes determine when a container is ready to accept traffic." A failed readiness probe does not restart anything. It de-routes traffic. From the docs: "If the readiness probe returns a failed state, the EndpointSlice controller removes the Pod's IP address from the EndpointSlices of all Services that match the Pod." This is the probe where you can legitimately check whether the database is reachable, whether your in-process cache is warm, whether the upstream identity provider is responding. A blip in any of those should pull this instance out of the pool, not kill it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Startup: has the app finished initializing?
&lt;/h3&gt;

&lt;p&gt;Startup is the gate during boot: "Startup probes verify whether the application within a container is started. If a startup probe is configured, Kubernetes does not execute liveness or readiness probes until the startup probe succeeds." This is the probe that exists so a slow JVM warm-up or a long migration job does not get killed by a 30-second liveness deadline three seconds after launch.&lt;/p&gt;

&lt;p&gt;The taxonomy is not Kubernetes-only. The same three jobs exist on every orchestrator and every load balancer. Even on the Kubernetes API server itself, the unified &lt;code&gt;/healthz&lt;/code&gt; endpoint is on the way out. Per the &lt;a href="https://kubernetes.io/docs/reference/using-api/health-checks/" rel="noopener noreferrer"&gt;Kubernetes API server health endpoint docs&lt;/a&gt;: "healthz is deprecated (since Kubernetes v1.16), and you should use the more specific livez and readyz endpoints instead." The reason is exactly the conflation problem above. One path that pretends to answer three questions cannot tell its caller which question failed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why HTTP status codes matter more than your JSON body
&lt;/h2&gt;

&lt;p&gt;The number one source of &lt;a href="https://velprove.com/blog/why-uptime-monitors-miss-outages" rel="noopener noreferrer"&gt;false confidence from a 200 OK&lt;/a&gt; on a health endpoint is a body that says one thing and a status code that says another. Your load balancer does not read JSON. It reads three digits.&lt;/p&gt;

&lt;h3&gt;
  
  
  The 200-with-degraded-body anti-pattern
&lt;/h3&gt;

&lt;p&gt;Per the Kubernetes &lt;a href="https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/" rel="noopener noreferrer"&gt;probe configuration docs&lt;/a&gt;: "Any code greater than or equal to 200 and less than 400 indicates success. Any other code indicates failure." Returning a 200 with &lt;code&gt;{"status":"degraded"}&lt;/code&gt; in the body is functionally identical to returning a 200 with &lt;code&gt;{"status":"ok"}&lt;/code&gt;. The orchestrator does not care. Neither does your load balancer.&lt;/p&gt;

&lt;h3&gt;
  
  
  AWS ALB target-group defaults
&lt;/h3&gt;

&lt;p&gt;The defaults on an AWS Application Load Balancer target group, from the &lt;a href="https://docs.aws.amazon.com/elasticloadbalancing/latest/application/target-group-health-checks.html" rel="noopener noreferrer"&gt;AWS ALB target group health checks docs&lt;/a&gt;: 30-second interval, 5-second timeout, 5 healthy checks to bring an instance into rotation, 2 unhealthy checks to take it out. Default path is &lt;code&gt;/&lt;/code&gt;. Default matcher is the literal string &lt;code&gt;200&lt;/code&gt;. Anything outside that exact code is a failure. (GCP and Cloudflare both expose similar interval, timeout, and threshold knobs with different defaults; check your specific LB before assuming.)&lt;/p&gt;

&lt;h3&gt;
  
  
  When to actually return 503
&lt;/h3&gt;

&lt;p&gt;Per &lt;a href="https://www.rfc-editor.org/rfc/rfc9110#section-15.6.4" rel="noopener noreferrer"&gt;RFC 9110 §15.6.4&lt;/a&gt;: "The 503 (Service Unavailable) status code indicates that the server is currently unable to handle the request due to a temporary overload or scheduled maintenance, which will likely be alleviated after some delay." The rule of thumb is short. Return 200 if this instance can still serve traffic. Return 503 if it should be pulled out of rotation. Anything else hides the signal.&lt;/p&gt;
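&lt;p&gt;A minimal sketch of that rule in Node. The &lt;code&gt;deps&lt;/code&gt; shape and function names are illustrative; wire in your real dependency checks:&lt;/p&gt;

```javascript
// Maps a dependency-check result to the HTTP response a readiness
// endpoint should send. The degraded signal lives in the status code,
// not just the JSON body, because the load balancer reads only the code.
// The shape of `deps` is illustrative, not a standard.
function readinessResponse(deps) {
  const ok = deps.failing.length === 0;
  return {
    // 200 = keep this instance in the pool; 503 = pull it out
    // (RFC 9110 §15.6.4: temporary overload or maintenance).
    statusCode: ok ? 200 : 503,
    body: JSON.stringify({ status: ok ? 'ok' : 'unavailable', failing: deps.failing }),
  };
}

// A database outage must flip the status code, not just the body:
const healthy = readinessResponse({ failing: [] });            // 200
const degraded = readinessResponse({ failing: ['postgres'] }); // 503
```

&lt;p&gt;The body is still useful for humans reading &lt;code&gt;curl&lt;/code&gt; output; it is just not the channel the infrastructure listens on.&lt;/p&gt;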

&lt;p&gt;One ALB-specific nuance worth knowing about, from the same AWS docs: "If a target group contains only unhealthy registered targets, the load balancer routes requests to all those targets, regardless of their health status." That is the fail-open backstop. It is not a feature you should rely on. It is the safety net that prevents a bad health check from taking the entire fleet dark in one zone.&lt;/p&gt;

&lt;h2&gt;
  
  
  The deep liveness anti-pattern
&lt;/h2&gt;

&lt;p&gt;** "Incorrect implementation of liveness probes can lead to cascading failures." ** That is verbatim from the &lt;a href="https://kubernetes.io/docs/concepts/configuration/liveness-readiness-startup-probes/" rel="noopener noreferrer"&gt;Kubernetes probe documentation&lt;/a&gt; , and it is the most-cited engineering rule on this topic. The most common incorrect implementation is the one where someone writes a &lt;code&gt;/healthz&lt;/code&gt; handler that opens a connection to the database, runs &lt;code&gt;SELECT 1&lt;/code&gt;, and returns 200 only if the query succeeds.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why a liveness probe that hits the database will eat your fleet
&lt;/h3&gt;

&lt;p&gt;Sandor Szücs, in the canonical engineer reference on this topic (&lt;a href="https://srcco.de/posts/kubernetes-liveness-probes-are-dangerous.html" rel="noopener noreferrer"&gt;Liveness Probes are Dangerous&lt;/a&gt;): "A Liveness Probe in combination with an external DB health check dependency is the worst situation: a single DB hiccup will restart all your containers!" Colin Breck (&lt;a href="https://blog.colinbreck.com/kubernetes-liveness-and-readiness-probes-how-to-avoid-shooting-yourself-in-the-foot/" rel="noopener noreferrer"&gt;Kubernetes Liveness and Readiness Probes&lt;/a&gt;) draws the rule directly: "Avoid checking dependencies in liveness probes. Liveness probes should be inexpensive and have response times with minimal variance."&lt;/p&gt;

&lt;p&gt;The mechanism is simple. A 200ms database blip is normal. A liveness probe with a 5-second timeout treats it as healthy. Now make the blip 6 seconds. Every pod in every region fails its liveness check at the same moment. Every pod gets killed at the same moment. Every pod restarts at the same moment, hits the still-recovering database with a fresh connection storm, and fails liveness again. You have turned a 6-second database hiccup into a 5-minute app-wide cold start.&lt;/p&gt;

&lt;h3&gt;
  
  
  Container vs dependency, the right separation
&lt;/h3&gt;

&lt;p&gt;The Kubernetes docs are explicit about how to split this correctly: "When your app has a strict dependency on back-end services, you can implement both a liveness and a readiness probe. The liveness probe passes when the app itself is healthy, but the readiness probe additionally checks that each required back-end service is available." The liveness handler should answer "is this Node.js process responsive," nothing more. The readiness handler is allowed to be opinionated about dependencies.&lt;/p&gt;

&lt;h3&gt;
  
  
  What "shallow" actually means
&lt;/h3&gt;

&lt;p&gt;A shallow liveness handler in Node looks like this. No I/O. No async fetches. No database connection.&lt;/p&gt;

&lt;p&gt;And a 12-line Kubernetes probe config that points at separate paths so the two probes never get confused:&lt;/p&gt;
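&lt;p&gt;A sketch of the split (port, paths, and timings are illustrative, not prescriptive):&lt;/p&gt;

```yaml
# Liveness: cheap, shallow, in-process only.
livenessProbe:
  httpGet:
    path: /livez
    port: 8080
  periodSeconds: 10
  timeoutSeconds: 1
# Readiness: allowed to be opinionated about dependencies.
readinessProbe:
  httpGet:
    path: /readyz
    port: 8080
  periodSeconds: 5
  timeoutSeconds: 3
```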

&lt;p&gt;The principle holds even if you do not run Kubernetes: keep two paths, point the orchestrator at the cheap one, point the load balancer at the more expressive one. &lt;a href="https://velprove.com/blog/anatomy-of-a-silent-outage" rel="noopener noreferrer"&gt;Silent outages that HTTP monitors miss&lt;/a&gt; almost always start when those two paths get collapsed into one.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to put in the response body
&lt;/h2&gt;

&lt;p&gt;There is no IETF standard for the response body. The expired draft you may have seen cited in older posts is, per the &lt;a href="https://datatracker.ietf.org/doc/draft-inadarei-api-health-check/" rel="noopener noreferrer"&gt;IETF Datatracker page&lt;/a&gt; itself: "This Internet-Draft is no longer active." (Last revision October 16, 2021. Expired April 19, 2022.) Anything still treating &lt;code&gt;application/health+json&lt;/code&gt; as a standard is wrong by four years.&lt;/p&gt;

&lt;p&gt;What does exist is three reference shapes that production frameworks actually use. Pick the one your tooling already speaks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Spring Boot Actuator&lt;/strong&gt; returns a top-level body of &lt;code&gt;&amp;#123;"status":"UP"&amp;#125;&lt;/code&gt; by default. Per the &lt;a href="https://docs.spring.io/spring-boot/reference/actuator/endpoints.html" rel="noopener noreferrer"&gt;Actuator endpoint docs&lt;/a&gt;: "These indicators are shown on the global health endpoint (&lt;code&gt;/actuator/health&lt;/code&gt;). They are also exposed as separate HTTP Probes by using health groups: &lt;code&gt;/actuator/health/liveness&lt;/code&gt; and &lt;code&gt;/actuator/health/readiness&lt;/code&gt;."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ASP.NET Core&lt;/strong&gt; returns plain text by default per the &lt;a href="https://learn.microsoft.com/en-us/aspnet/core/host-and-deploy/health-checks" rel="noopener noreferrer"&gt;ASP.NET Core health checks docs&lt;/a&gt; : the literal string &lt;code&gt;Healthy&lt;/code&gt;, &lt;code&gt;Degraded&lt;/code&gt;, or &lt;code&gt;Unhealthy&lt;/code&gt;, with no JSON body at all unless you add a custom response writer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Kubernetes API server&lt;/strong&gt; returns an empty body with a 200 on success on &lt;code&gt;/livez&lt;/code&gt; and &lt;code&gt;/readyz&lt;/code&gt;. That is the most defensible shape: zero bytes is zero info-disclosure surface.&lt;/p&gt;

&lt;p&gt;A pragmatic body for an internal-facing detailed endpoint, when you want one, looks like this:&lt;/p&gt;
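&lt;p&gt;A sketch; the field names are illustrative, except &lt;code&gt;build_sha&lt;/code&gt;, which the monitoring setup later in this post extracts:&lt;/p&gt;

```json
{
  "status": "pass",
  "service": "api",
  "build_sha": "a1b2c3d",
  "uptime_s": 86123,
  "checked_at": "2026-05-11T14:00:00Z",
  "env": "prod"
}
```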

&lt;p&gt;Six fields, no hostnames, no library versions, no stack traces. A short build SHA is enough to identify what is running. The detailed dependency block belongs behind auth, not on the public probe.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to leak and what to hide
&lt;/h2&gt;

&lt;p&gt;Your &lt;code&gt;/healthz&lt;/code&gt; is reachable from anywhere your app is. Treat it as untrusted-caller-visible. The principle is least exposure on the public probe; dependency detail behind auth or on a separate internal-only path.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Safe to expose&lt;/strong&gt; on a public probe: a top-level status string, a short build SHA or version tag, a service name, a server timestamp. Nothing else.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Do not expose&lt;/strong&gt; on a public probe: dependency hostnames or connection strings (recon material), internal IPs (lateral movement material), library or runtime versions (CVE targeting material), stack traces or partial errors (logic disclosure), database identifiers, or queue depths (capacity reconnaissance).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The two-tier pattern&lt;/strong&gt; is what Spring Actuator and ASP.NET Core both default to: a public, minimal &lt;code&gt;/healthz&lt;/code&gt; for the load balancer, and an authenticated &lt;code&gt;/healthz/details&lt;/code&gt; for humans and internal tooling. Spring exposes the verbose body only when &lt;a href="https://docs.spring.io/spring-boot/reference/actuator/endpoints.html" rel="noopener noreferrer"&gt;management.endpoint.health.show-details&lt;/a&gt; is set; ASP.NET Core defaults to a single status string and forces you to opt in to a custom response writer. Both defaults are secure on purpose. If you write your own framework integration, copy the pattern.&lt;/p&gt;

&lt;h2&gt;
  
  
  Framework cheat sheet
&lt;/h2&gt;

&lt;p&gt;Five common stacks, the canonical setup, and the gotcha each one trips on most often.&lt;/p&gt;

&lt;h3&gt;
  
  
  Spring Boot Actuator (Java)
&lt;/h3&gt;

&lt;p&gt;Spring exposes &lt;code&gt;/actuator/health/liveness&lt;/code&gt; and &lt;code&gt;/actuator/health/readiness&lt;/code&gt; as separate paths once probes are enabled. &lt;strong&gt;Gotcha:&lt;/strong&gt; leaving &lt;code&gt;show-details&lt;/code&gt; on &lt;code&gt;always&lt;/code&gt; in production turns the public probe into an info-disclosure endpoint. Default is &lt;code&gt;never&lt;/code&gt;; keep it that way unless the path is auth-gated.&lt;/p&gt;

&lt;h3&gt;
  
  
  ASP.NET Core (C#)
&lt;/h3&gt;

&lt;p&gt;Two lines: &lt;code&gt;builder.Services.AddHealthChecks()&lt;/code&gt; plus &lt;code&gt;app.MapHealthChecks("/healthz")&lt;/code&gt;. Microsoft's own docs use &lt;code&gt;/healthz&lt;/code&gt; as the example path. &lt;strong&gt;Gotcha:&lt;/strong&gt; registering an &lt;code&gt;AddDbContextCheck&lt;/code&gt; on a &lt;code&gt;MapHealthChecks("/healthz")&lt;/code&gt; endpoint that is wired into the orchestrator's liveness probe is the textbook deep-liveness anti-pattern. Wire dependency checks to a separate readiness path, not the liveness one.&lt;/p&gt;

&lt;h3&gt;
  
  
  Express + terminus (Node.js)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Gotcha:&lt;/strong&gt; terminus also handles graceful shutdown on SIGTERM. The handler in your readiness check should start failing the moment SIGTERM arrives, so the load balancer pulls the pod before the existing connections drain. Most teams forget this and serve 200 from &lt;code&gt;/readyz&lt;/code&gt; right up until the process exits.&lt;/p&gt;

&lt;h3&gt;
  
  
  FastAPI (Python)
&lt;/h3&gt;

&lt;p&gt;Three lines: a plain route decorator, a handler function, and a static return value. &lt;strong&gt;Gotcha:&lt;/strong&gt; FastAPI's dependency injection makes it easy to &lt;code&gt;Depends(get_db_session)&lt;/code&gt; on a liveness route. Do not. The whole point of the deep-liveness section is keeping the database out of the liveness path.&lt;/p&gt;

&lt;h3&gt;
  
  
  Next.js App Router
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Gotcha:&lt;/strong&gt; without &lt;code&gt;dynamic = "force-dynamic"&lt;/code&gt;, the route can be cached at build time and stop reflecting the running process. Cold starts on serverless platforms add a separate signal-vs-noise problem; the dedicated &lt;a href="https://velprove.com/blog/monitor-nextjs-app-production" rel="noopener noreferrer"&gt;Next.js production monitoring guide&lt;/a&gt; covers that piece in depth.&lt;/p&gt;

&lt;h2&gt;
  
  
  Watching the probe from outside the cluster
&lt;/h2&gt;

&lt;p&gt;An internal probe sees what is internal. It does not see DNS misconfiguration. It does not see a regional CDN outage. It does not see a TLS certificate that expired at 03:00 because the renewal cron failed. None of those flip a Kubernetes liveness check, because the kubelet is connecting to the pod over its internal network. Your customers are connecting over the public internet. The two paths are not the same.&lt;/p&gt;

&lt;p&gt;An external monitor against your &lt;code&gt;/healthz&lt;/code&gt;, hitting the same URL your customers do, catches what the orchestrator cannot. Add a multi-step API monitor that hits &lt;code&gt;/healthz&lt;/code&gt; first, then a protected route with a test bearer token, and you cover both reachability and authentication in one check. The 3-step pattern fits inside the free plan; the longer chain is what the &lt;a href="https://velprove.com/blog/multi-step-api-monitoring-guide" rel="noopener noreferrer"&gt;multi-step API monitoring guide&lt;/a&gt; walks through.&lt;/p&gt;

&lt;p&gt;The full setup, four steps:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sign up for a free Velprove account.&lt;/strong&gt; No credit card required. The free plan includes 10 monitors total, 1 browser login monitor, multi-step API monitors with up to 3 steps, 5-minute check intervals, and email alerts. Every plan, including free, runs checks from all five regions: North America, Europe, UK, Asia, and Oceania.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Add a multi-step API monitor with /healthz as Step 1.&lt;/strong&gt; Method GET, URL &lt;code&gt;https://api.yourapp.com/healthz&lt;/code&gt;. Under &lt;em&gt;Save values from response&lt;/em&gt;, extract &lt;code&gt;$.build_sha&lt;/code&gt; into a variable named &lt;code&gt;sha&lt;/code&gt;. Add a Status Code assertion equals 200 plus a Response Time assertion under 3000ms. This is the cheap, public, shallow probe, and the extracted build SHA is what Step 2 will verify against.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Add a Step 2 GET against a protected route with a test bearer token.&lt;/strong&gt; URL &lt;code&gt;https://api.yourapp.com/version&lt;/code&gt;. Pass two headers: &lt;code&gt;Authorization: Bearer &amp;lt;test-token&amp;gt;&lt;/code&gt; for auth, and &lt;code&gt;X-Expected-SHA: &amp;#123;&amp;#123;sha&amp;#125;&amp;#125;&lt;/code&gt; so the protected route can compare its own build SHA against the one the public probe reported. Assert Status Code equals 200. Use a dedicated test account, not a real one. If your public probe and your private route ever disagree on build SHA, you have a deploy-skew bug, and this chain catches it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Configure your check interval and alert channel.&lt;/strong&gt; The free plan runs every 5 minutes with email alerts. Starter at $19 per month adds Slack, Discord, Microsoft Teams, and webhooks at 1-minute intervals. Pro at $49 per month adds PagerDuty at 30-second intervals and 1-year dashboard incident history.&lt;/p&gt;

&lt;p&gt;That is the whole thing. A 2-step monitor that watches the public probe, extracts the build SHA, and verifies the protected route is on the same deploy, from five regions, on a free plan, with commercial use allowed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Frequently Asked Questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What's the difference between liveness, readiness, and startup probes?
&lt;/h3&gt;

&lt;p&gt;Liveness asks "should the orchestrator restart this container?" Readiness asks "should the load balancer send traffic here?" Startup asks "has the app finished initializing yet?" They run on different schedules and trigger different actions. Mixing them up is the most common cause of cascading restart failures on Kubernetes and on any orchestrator that copies the pattern.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why is /healthz deprecated on the Kubernetes API server?
&lt;/h3&gt;

&lt;p&gt;Kubernetes deprecated &lt;code&gt;/healthz&lt;/code&gt; on its own API server in v1.16 in favor of &lt;code&gt;/livez&lt;/code&gt; and &lt;code&gt;/readyz&lt;/code&gt;, because a single endpoint that conflates liveness and readiness cannot tell an orchestrator when to restart versus when to stop routing traffic. The deprecation is a Kubernetes-internal API-server change. Your application can still serve &lt;code&gt;/healthz&lt;/code&gt; if you want; what matters is keeping the underlying probes separated.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can a single endpoint serve both liveness and readiness?
&lt;/h3&gt;

&lt;p&gt;Yes, technically, but you should not. Any failure that should only affect routing (a slow database, a flaky upstream) ends up triggering a restart instead. If the endpoint cannot distinguish "the process is wedged" from "the database is slow," every readiness blip becomes a fleet-wide restart loop. Splitting them is a five-line refactor.&lt;/p&gt;

&lt;h3&gt;
  
  
  Should /healthz require authentication?
&lt;/h3&gt;

&lt;p&gt;The orchestrator probe should not, because adding auth to liveness checks creates a new failure mode (auth service down equals all probes fail). The detailed dependency endpoint should, because dependency hostnames and version strings are recon material. The two-tier pattern (&lt;code&gt;/healthz&lt;/code&gt; public and shallow, &lt;code&gt;/healthz/details&lt;/code&gt; auth-gated) is what Spring Actuator and ASP.NET Core both default to.&lt;/p&gt;

&lt;h3&gt;
  
  
  What status code should a degraded service return?
&lt;/h3&gt;

&lt;p&gt;Return 200 if the instance is still serving real traffic, even if degraded. Return 503 if the instance should not receive traffic. Most load balancers, including AWS ALB by default, route on the 200 range and pull traffic on anything else. Putting &lt;code&gt;"status":"degraded"&lt;/code&gt; in the body of a 200 response is invisible to the load balancer. If you want the LB to act, change the status code.&lt;/p&gt;

&lt;p&gt;Three probes, three jobs, three different right answers. Keep liveness shallow, keep dependency checks in readiness, return 503 when you want traffic pulled, and never put hostnames or stack traces on a public probe. Then watch the result from outside your cluster, because the orchestrator cannot see DNS, certs, or the public internet. &lt;a href="https://velprove.com/signup" rel="noopener noreferrer"&gt;Start for free&lt;/a&gt;, point a 2-step monitor at &lt;code&gt;/healthz&lt;/code&gt;, and have it running in five minutes. No credit card required.&lt;/p&gt;

</description>
      <category>monitoring</category>
      <category>webdev</category>
      <category>devops</category>
      <category>uptime</category>
    </item>
    <item>
      <title>Is Your Host Really 99.9% Uptime? How to Verify</title>
      <dc:creator>velprove</dc:creator>
      <pubDate>Sat, 09 May 2026 14:00:04 +0000</pubDate>
      <link>https://dev.to/velprove/is-your-host-really-999-uptime-how-to-verify-5of</link>
      <guid>https://dev.to/velprove/is-your-host-really-999-uptime-how-to-verify-5of</guid>
      <description>&lt;p&gt;** Direct answer: Probably not, and probably not intentionally. 99.9% uptime means your host is allowed up to 8 hours and 46 minutes of downtime per year per the Wikipedia high-availability standard, or roughly 43 minutes per 30-day month. Most months your host hits the number. The catch is the bad months: when an outage happens, the host's status page often lags the incident by minutes to hours (AWS's own post-mortem on the December 7, 2021 us-east-1 outage admits a 52-minute gap), and every major SLA (AWS, Vercel, Cloudflare, Kinsta, WPEngine) requires you to file a credit claim with your own evidence within a tight window. The fix is independent monitoring that runs from outside your host's network, captures timestamps and response codes during the incident, and gives you the evidence the SLA requires. Velprove's free plan covers HTTP monitoring across 5 regions plus a browser login monitor at no cost. **&lt;/p&gt;

&lt;h2&gt;
  
  
  What 99.9% uptime actually means
&lt;/h2&gt;

&lt;p&gt;The math is settled and primary-sourced. Per the Wikipedia high-availability table (&lt;a href="https://en.wikipedia.org/wiki/High_availability" rel="noopener noreferrer"&gt;en.wikipedia.org/wiki/High_availability&lt;/a&gt;, verified 2026-05-06), 99% uptime allows 3.65 days of downtime per year. 99.9% allows 8.77 hours per year. 99.95% allows 4.38 hours per year. 99.99% allows 52.60 minutes per year. 99.999% allows 5.26 minutes per year. Those are the standard annual figures every provider, every SLA, and every monitoring tool draws from.&lt;/p&gt;

&lt;p&gt;The per-month figures are arithmetic, not quoted. An average month is 30.44 days, or 730.5 hours. 99.9% of 730.5 hours leaves about 43.83 minutes of allowed downtime per month. 99.95% leaves about 21.92 minutes. 99.99% leaves about 4.38 minutes. 99.999% leaves about 26.3 seconds.&lt;/p&gt;
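&lt;p&gt;The arithmetic fits in one helper, with the window length as an argument so the annual and monthly figures fall out of the same formula (function name is illustrative):&lt;/p&gt;

```javascript
// Allowed downtime, in minutes, for an SLA percentage over a window of hours.
function allowedDowntimeMinutes(slaPercent, windowHours) {
  return windowHours * 60 * (1 - slaPercent / 100);
}

const monthHours = 730.5; // average month: 30.44 days
const yearHours = 8766;   // 365.25 days

console.log(allowedDowntimeMinutes(99.9, monthHours).toFixed(2));      // "43.83"
console.log((allowedDowntimeMinutes(99.9, yearHours) / 60).toFixed(2)); // "8.77" hours
console.log(allowedDowntimeMinutes(99.99, monthHours).toFixed(2));     // "4.38"
```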

&lt;p&gt;Cross-check: Hostinger's own marketing page derives the same per-month number, calling out approximately 43.8 minutes of downtime per month at 99.9%. When the vendor's math agrees with the standard, the math is not in dispute. The argument is whether your host actually hits the number, and whether you have the evidence to prove it when they do not.&lt;/p&gt;

&lt;p&gt;One nuance worth keeping in mind. Providers internally target tighter SLOs than they publish externally. A vendor may run to 99.99% internally while publishing a 99.9% SLA, treating the gap as their safety margin. That is why most months the published number is easy to hit. Your job as a customer is not to chase every short outage. It is to monitor consistently so when a bad month happens, the evidence is already on your side. &lt;a href="https://velprove.com/blog/website-monitoring-beginners-guide" rel="noopener noreferrer"&gt;If you are new to uptime monitoring, start here&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What major hosts actually promise
&lt;/h2&gt;

&lt;p&gt;Verbatim SLA language from the seven hosts whose primary-source SLA pages we verified directly on 2026-05-06. Every percentage and every credit mechanic in the table below comes from the host's own legal page, linked in the source column.&lt;/p&gt;

&lt;p&gt;Cloudflare Business publishes "100% Uptime. The Service will serve Customer Content 100% of the time without qualification" for Business-tier customers (&lt;a href="https://www.cloudflare.com/business-sla/" rel="noopener noreferrer"&gt;cloudflare.com/business-sla&lt;/a&gt;). AWS EC2 publishes 99.99% at the region level and 99.5% at the instance level (&lt;a href="https://aws.amazon.com/compute/sla/" rel="noopener noreferrer"&gt;aws.amazon.com/compute/sla&lt;/a&gt;). Vercel publishes 99.99%, but only on the Enterprise plan; Hobby and Pro have no SLA at all (&lt;a href="https://vercel.com/legal/sla" rel="noopener noreferrer"&gt;vercel.com/legal/sla&lt;/a&gt;). Kinsta defaults to 99.9% with a tiered minute-by-minute credit schedule. WPEngine publishes 99.95% with a 5%-per-hour overage credit. Hostinger publishes 99.9% on shared hosting with a 5% credit. OVHcloud publishes 99.99% / 99.95% / 99.90% across its Public Cloud tiers.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Provider&lt;/th&gt;
&lt;th&gt;Published SLA&lt;/th&gt;
&lt;th&gt;Credit mechanic&lt;/th&gt;
&lt;th&gt;Claim deadline&lt;/th&gt;
&lt;th&gt;Verified source&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Cloudflare Business&lt;/td&gt;
&lt;td&gt;100% (verbatim, no qualification)&lt;/td&gt;
&lt;td&gt;Tiered formula, capped at 1 month per 12 months&lt;/td&gt;
&lt;td&gt;5 business days notice, full claim by end of next billing month&lt;/td&gt;
&lt;td&gt;&lt;a href="https://www.cloudflare.com/business-sla/" rel="noopener noreferrer"&gt;cloudflare.com/business-sla&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AWS EC2&lt;/td&gt;
&lt;td&gt;99.99% region / 99.5% instance&lt;/td&gt;
&lt;td&gt;10% / 30% / 100% credit tiers&lt;/td&gt;
&lt;td&gt;End of second billing cycle after the incident&lt;/td&gt;
&lt;td&gt;&lt;a href="https://aws.amazon.com/compute/sla/" rel="noopener noreferrer"&gt;aws.amazon.com/compute/sla&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OVHcloud Public Cloud&lt;/td&gt;
&lt;td&gt;99.99% / 99.95% / 99.90%&lt;/td&gt;
&lt;td&gt;10% / 100% credit tiers, 100% monthly fee cap&lt;/td&gt;
&lt;td&gt;60 calendar days&lt;/td&gt;
&lt;td&gt;&lt;a href="https://us.ovhcloud.com/legal/sla/public-cloud/" rel="noopener noreferrer"&gt;us.ovhcloud.com/legal/sla/public-cloud&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Vercel (Enterprise only)&lt;/td&gt;
&lt;td&gt;99.99%&lt;/td&gt;
&lt;td&gt;10% / 25% / 50% credit tiers, 50% monthly cap&lt;/td&gt;
&lt;td&gt;30 days&lt;/td&gt;
&lt;td&gt;&lt;a href="https://vercel.com/legal/sla" rel="noopener noreferrer"&gt;vercel.com/legal/sla&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;WPEngine&lt;/td&gt;
&lt;td&gt;99.95% (Enhanced 99.99% available)&lt;/td&gt;
&lt;td&gt;5% per hour of overage&lt;/td&gt;
&lt;td&gt;30 days&lt;/td&gt;
&lt;td&gt;&lt;a href="https://wpengine.com/legal/sla/" rel="noopener noreferrer"&gt;wpengine.com/legal/sla&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Kinsta&lt;/td&gt;
&lt;td&gt;99.9% default&lt;/td&gt;
&lt;td&gt;5% / 10% / 15% / 20% tiered by minutes&lt;/td&gt;
&lt;td&gt;30 days&lt;/td&gt;
&lt;td&gt;&lt;a href="https://kinsta.com/legal/service-level-agreement/" rel="noopener noreferrer"&gt;kinsta.com/legal/service-level-agreement&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hostinger&lt;/td&gt;
&lt;td&gt;99.9% shared hosting&lt;/td&gt;
&lt;td&gt;5% credit (future purchases only)&lt;/td&gt;
&lt;td&gt;30 days&lt;/td&gt;
&lt;td&gt;&lt;a href="https://www.hostinger.com/legal/hosting-agreement" rel="noopener noreferrer"&gt;hostinger.com/legal/hosting-agreement&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Two patterns the table makes obvious. First, Vercel and Netlify Hobby/Pro plans have no SLA mechanism at all. Most users on those tiers have no credit recourse when an incident happens; they have community pressure and that is it. Second, every SLA excludes the same broad categories: scheduled maintenance, force majeure, customer-caused issues, third-party routing, and beta features. The exclusions are standard, not bad faith. The leverage from independent monitoring is not to dispute exclusions. It is to prove the outage happened, when it happened, and from where it was visible, so the vendor cannot classify it under an exclusion without evidence.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why your host's status page doesn't reflect reality
&lt;/h2&gt;

&lt;p&gt;On December 7, 2021, AWS us-east-1 went down. AWS's own post-mortem (&lt;a href="https://aws.amazon.com/message/12721/" rel="noopener noreferrer"&gt;aws.amazon.com/message/12721&lt;/a&gt;, verified 2026-05-06) states the outage began at 7:30 AM PST. The same post-mortem states "by 8:22 AM PST, we were successfully updating the Service Health Dashboard." That is a 52-minute gap between when the outage began and when AWS's own status page reflected it. The cause, in AWS's words: "The networking congestion impaired our Service Health Dashboard tooling from appropriately failing over to our standby region."&lt;/p&gt;

&lt;p&gt;AWS is the most-resourced cloud provider on Earth. Their dashboard still lagged a major incident by 52 minutes. Independent monitoring sees the outage at minute one.&lt;/p&gt;

&lt;p&gt;The structural reason is well-documented. The Register's February 2022 reporting on cloud status page accuracy (&lt;a href="https://www.theregister.com/2022/02/24/cloud_service_status_pages_fail/" rel="noopener noreferrer"&gt;theregister.com&lt;/a&gt;, verified 2026-05-06) quotes Tim Perry of HTTP Toolkit: "It is very common to see status pages fail to match reality." Nick Humrich, a former AWS engineer, told The Register that "posting a non-green status to the status page was actually a manager decision, in no way real time." Tim Perry adds: "I strongly suspect this is because the published status is linked to contractual SLAs, financial penalties in those contracts."&lt;/p&gt;

&lt;p&gt;The mechanism is straightforward. The vendor controls the status page. The vendor has financial exposure for marking the page non-green. The customer cannot rely on a self-reported status as evidence of an outage they need to file a credit claim against. Independent monitoring has no financial incentive to lie about whether your site responded. Your host does.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to verify with independent monitoring
&lt;/h2&gt;

&lt;p&gt;The Google SRE Book is the canonical reference here. Google's own engineers describe black-box monitoring (&lt;a href="https://sre.google/sre-book/monitoring-distributed-systems/" rel="noopener noreferrer"&gt;sre.google/sre-book/monitoring-distributed-systems&lt;/a&gt;, verified 2026-05-06) as "testing externally visible behavior as a user would see it" and add that "black-box monitoring is symptom-oriented and represents active, not predicted, problems: 'The system isn't working correctly, right now.'" The same chapter notes that Google itself combines "heavy use of white-box monitoring with modest but critical uses of black-box monitoring." The split maps cleanly onto the customer-vendor relationship: the vendor runs white-box monitoring (their internal metrics). You run black-box monitoring (probing as a user from outside). Both are needed. You cannot trust just one.&lt;/p&gt;

&lt;p&gt;What good independent monitoring looks like in practice: HTTP monitors on your homepage and key pages, multi-region probes so a regional failure does not look like a global failure or vice versa, and a browser login monitor for auth-protected pages. &lt;a href="https://velprove.com/blog/monitor-saas-login-page" rel="noopener noreferrer"&gt;Browser login monitoring catches the partial outage HTTP-only monitoring misses&lt;/a&gt; when your homepage is fine but logged-in customers cannot reach checkout. It is also worth knowing &lt;a href="https://velprove.com/blog/monitoring-mistakes-small-business" rel="noopener noreferrer"&gt;the common mistakes that cause monitoring to miss outages&lt;/a&gt; before you set thresholds.&lt;/p&gt;

&lt;p&gt;Velprove's free plan covers exactly this surface. 10 HTTP monitors at a 5-minute minimum interval. 1 browser login monitor at a 15-minute interval. 5 global monitoring regions on every plan including Free. Email alerts. SSL certificate monitoring. 30-day incident history. 1 status page at velprove.com/status/your-page. No credit card required. The browser login monitor is the piece most monitoring tools either charge for or omit from the free tier entirely.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to file an SLA credit claim
&lt;/h2&gt;

&lt;p&gt;Every primary-source SLA in the table above requires the customer to file a claim. None of them auto-detect a missed SLA and credit your account. Your evidence is what gets the credit issued.&lt;/p&gt;

&lt;p&gt;The deadlines are short and they vary. Cloudflare Business requires notification within 5 business days of the incident, the shortest window in the table. AWS gives you until the end of the second billing cycle after the incident. Vercel, WPEngine, Kinsta, and Hostinger all set a 30-day window. OVHcloud gives 60 calendar days. If you do not have monitoring already running before the outage, by the time you notice the analytics decline a week later, the Cloudflare window is already closed.&lt;/p&gt;

&lt;p&gt;The evidence requirements are explicit. Vercel's SLA demands "log files showing Unscheduled Downtime and the date and time it occurred." AWS demands "request logs documenting the outage." Cloudflare requires "sufficient evidence to support the Claim." The vendor's own status page does not count. The customer is asked to bring the proof.&lt;/p&gt;

&lt;p&gt;Honest framing on the credit value. On a $4 a month Hostinger shared plan, the 5% credit for a missed-SLA month is 20 cents. On a $40 a month Kinsta plan, a one-hour outage credit is $4. The dollar amount is rarely the leverage. The leverage is the documented record. A documented record gives you evidence in renewal negotiations or when escalating to a senior account manager. A pattern of repeated misses is grounds for migrating without an early-termination penalty under most contracts. For agencies, a documented record is what you show your client when you recommend a migration. The framing is not adversarial. You are bringing the data and asking the vendor to honor their own published SLA.&lt;/p&gt;

&lt;h2&gt;
  
  
  When to escalate, renegotiate, or migrate
&lt;/h2&gt;

&lt;p&gt;A simple decision tree, calibrated for the realistic case where you have a few months of independent monitoring data in hand.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;One bad month, otherwise reliable.&lt;/strong&gt; File the credit claim within the deadline, take the credit, move on. Most providers in most quarters hit their published number; one bad month is not a structural problem.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Repeated misses (3 or more months out of 6).&lt;/strong&gt; Bring the documented record to a senior account manager. Ask politely for a service review. The data is what gets a serious response; without it the conversation goes nowhere.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pattern of misses plus slow vendor response on the credit claims.&lt;/strong&gt; Migrate. The documented record is grounds for terminating without an early-termination penalty under most contracts, and it is what you bring to the next vendor when negotiating their SLA terms.&lt;/p&gt;

&lt;p&gt;Before you migrate, rule out the possibility the problem is on your side. If your site keeps going down, &lt;a href="https://velprove.com/blog/wordpress-site-keeps-going-down" rel="noopener noreferrer"&gt;here is how to diagnose whether it is the host or your application&lt;/a&gt;. An external monitor will name the cause within minutes either way.&lt;/p&gt;

&lt;h2&gt;
  
  
  Frequently Asked Questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What does 99.9% uptime actually mean in downtime?
&lt;/h3&gt;

&lt;p&gt;99.9% uptime allows up to 8 hours and 46 minutes of downtime per year, or roughly 43 minutes and 50 seconds per 30-day month, per the Wikipedia high-availability standard. 99.99% allows 52.6 minutes per year, or about 4.38 minutes per month. 99.999% allows 5.26 minutes per year. Most hosts publish 99.9% and most months easily hit it. The math matters when a bad month happens.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do I check my website's uptime independently?
&lt;/h3&gt;

&lt;p&gt;Run an external monitor that probes your site from outside your host's network on a fixed schedule. Velprove's free plan runs HTTP monitors every 5 minutes from 5 global regions and includes a browser login monitor for auth-protected pages at no cost. The point of independence is that the monitor has no financial relationship with your host, so its data is admissible evidence when you file an SLA credit claim.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do I prove my hosting provider is down?
&lt;/h3&gt;

&lt;p&gt;Capture timestamped probe results from a third-party monitor that ran from multiple regions during the incident. The data needs to show response codes or timeouts, the time the failure started, the time it recovered, and the regions affected. Every major SLA (Vercel, AWS, Cloudflare) explicitly requires customer-side log files as evidence. Your host's status page does not count as proof, because your host controls it. Velprove's free plan runs HTTP probes from 5 global regions and stores 30 days of incident history, which is the evidence shape every major SLA requires.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can I get a refund for hosting downtime?
&lt;/h3&gt;

&lt;p&gt;Usually a service credit, not a cash refund. Every major host (AWS, Vercel, Cloudflare, Kinsta, WPEngine, Hostinger, OVHcloud) publishes a credit schedule that pays out a percentage of your monthly bill if the SLA is missed. You file a claim within the deadline (5 business days for Cloudflare Business, 30 days for most others, 60 for OVHcloud) with your evidence. Credit caps are typically 100% of one month's fees. The dollar value is usually small. The documented record is the leverage.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why does my host's status page say green when my site is down?
&lt;/h3&gt;

&lt;p&gt;Status pages are vendor-controlled and often lag the actual incident by minutes to hours. AWS's own post-mortem on the December 7, 2021 us-east-1 outage admits a 52-minute gap between when the outage began (7:30 AM PST) and when their Service Health Dashboard reflected it (8:22 AM PST). Former AWS engineers have stated publicly that posting a non-green status required a manager's decision rather than real-time automation. The structural reason is that status changes are tied to SLA financial exposure, which creates a disincentive to post anything but green.&lt;/p&gt;

&lt;h3&gt;
  
  
  What's the cheapest way to start verifying my host's uptime?
&lt;/h3&gt;

&lt;p&gt;Velprove's free plan covers HTTP monitoring across 5 global regions, a browser login monitor for auth-protected pages, a public status page, and email alerts at no cost. No credit card required. The browser login monitor is the differentiator: most major monitoring tools charge for it or omit it from their free tier entirely. Sign up, point a monitor at your site, and you will have the timestamped evidence trail running before the next bad month happens.&lt;/p&gt;

&lt;p&gt;Your host's 99.9% claim is probably accurate most months. Independent monitoring is what tells you which months it is not, and gives you the timestamped evidence every major SLA already requires you to bring to the credit claim. &lt;a href="https://velprove.com/signup" rel="noopener noreferrer"&gt;Start a free Velprove account.&lt;/a&gt; The free plan includes a browser login monitor most monitoring tools charge for, plus 10 HTTP monitors across 5 global regions and a public status page. No credit card required.&lt;/p&gt;

</description>
      <category>monitoring</category>
      <category>webdev</category>
      <category>devops</category>
      <category>uptime</category>
    </item>
    <item>
      <title>Anatomy of a Silent Outage: 10 Failures HTTP Misses</title>
      <dc:creator>velprove</dc:creator>
      <pubDate>Sat, 09 May 2026 14:00:03 +0000</pubDate>
      <link>https://dev.to/velprove/anatomy-of-a-silent-outage-10-failures-http-misses-20k5</link>
      <guid>https://dev.to/velprove/anatomy-of-a-silent-outage-10-failures-http-misses-20k5</guid>
      <description>&lt;p&gt;** Bottom line: on June 12, 2025, Cloudflare's own Workers KV storage failed for two hours and 28 minutes. Cloudflare Access went to 100% identity-login failure. Turnstile widgets stopped resolving. WARP couldn't register new sessions. Marketing edge stayed up the entire time. Any HTTP monitor pointed at a customer's marketing page returned 200 OK throughout. The same customer's authenticated dashboard was unreachable. That gap, between what an HTTP probe sees and what a real user sees, is a silent outage. Below are 10 failure patterns where HTTP monitoring returns 200 while users see something broken, each with the public incident or vendor documentation that proves it. **&lt;/p&gt;

&lt;h2&gt;
  
  
  The Cloudflare June 12 2025 Workers KV outage (and what your monitor saw)
&lt;/h2&gt;

&lt;p&gt;A silent outage is when an HTTP monitor returns 200 OK while real users cannot complete the action they came to do. Cloudflare's June 12, 2025 Workers KV post-mortem is the cleanest public example from the last twelve months. Per &lt;a href="https://blog.cloudflare.com/cloudflare-service-outage-june-12-2025/" rel="noopener noreferrer"&gt;Cloudflare's own engineering blog post on the June 12, 2025 service outage&lt;/a&gt;, the underlying storage provider for Workers KV failed for 2 hours and 28 minutes. The cascade reached Cloudflare Access, Turnstile, WARP, Workers AI, and parts of the dashboard. Marketing properties served from the edge kept serving.&lt;/p&gt;

&lt;p&gt;That asymmetry is the point. A customer running a marketing site on Cloudflare and a separate Access-protected dashboard would have seen two different realities at the same time. The marketing URL returned 200 OK every minute. The Access redirect to the identity provider hung or failed. An HTTP monitor pointed at the marketing site never flipped. An HTTP monitor pointed at the dashboard URL might have returned 200 on the redirect page itself, then handed the user to a login flow that never resolved.&lt;/p&gt;

&lt;p&gt;The customer-facing impact was binary. Anyone who only owned the marketing surface slept through it. Anyone who depended on Access for the actual product surface lost two and a half hours of authenticated traffic. The monitoring evidence on both sides was a green dashboard.&lt;/p&gt;

&lt;h2&gt;
  
  
  10 failures HTTP monitoring misses
&lt;/h2&gt;

&lt;p&gt;Each pattern below is the same shape: the origin returns 200, the status code says healthy, and the user sees something broken. Where a named public incident exists, the source link goes to the post-mortem or vendor documentation. Two patterns are marked &lt;code&gt;[REPRODUCIBLE-ONLY]&lt;/code&gt; because the failure mode is documented in vendor docs and bug trackers but has no single named public incident to anchor it.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Identity provider dependency outage
&lt;/h3&gt;

&lt;p&gt;Your application authenticates against an identity provider or a token store. The IdP has an outage. Public pages and your origin stay green. The login button takes users to a redirect URL that hangs or errors.&lt;/p&gt;

&lt;p&gt;Cloudflare Access went to 100% failure for all identity-based logins during the June 12, 2025 Workers KV outage, while marketing properties on the same Cloudflare network kept serving, per &lt;a href="https://blog.cloudflare.com/cloudflare-service-outage-june-12-2025/" rel="noopener noreferrer"&gt;Cloudflare's post-mortem&lt;/a&gt;. Xbox Live sign-in was unavailable for roughly 7 hours on July 2, 2024, while account services failed; see &lt;a href="https://www.bleepingcomputer.com/news/technology/xbox-is-down-worldwide-with-users-unable-to-login-play-games/" rel="noopener noreferrer"&gt;BleepingComputer's coverage&lt;/a&gt; and &lt;a href="https://variety.com/2024/digital/news/xbox-live-down-1236059299/" rel="noopener noreferrer"&gt;Variety's coverage&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The HTTP probe saw 200 on the marketing surface and on the login page itself. The user saw an IdP redirect that never returned a session.&lt;/p&gt;

&lt;p&gt;A browser login monitor walks the redirect chain to the IdP and back. If the IdP hangs, the monitor times out at the IdP step and captures a screenshot of where the redirect died.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Auth backend up, sessions broken (token mixup)
&lt;/h3&gt;

&lt;p&gt;Web servers respond. The login form renders. The POST to the login handler returns 200. But the session token issued is invalid, mixed up across users, or the session store has flushed. Users either get bounced back to login or end up on the wrong account.&lt;/p&gt;

&lt;p&gt;Meta's March 5, 2024 outage is the cleanest public case. Servers were reachable and network paths were clear. Users got incorrect-password errors on correct passwords; some users reported being logged into other people's accounts, per &lt;a href="https://www.thousandeyes.com/blog/meta-outage-analysis-march-5-2024" rel="noopener noreferrer"&gt;ThousandEyes' analysis&lt;/a&gt; and &lt;a href="https://borncity.com/win/2024/03/09/after-facebook-glitch-march-5-2024-have-you-been-able-to-log-in-to-other-peoples-accounts/" rel="noopener noreferrer"&gt;Born's reporting on the token-mixup behavior&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The HTTP probe saw 200 on every public surface. The user saw a rejected password they knew was right.&lt;/p&gt;

&lt;p&gt;A browser login monitor signs in as a known test user and asserts on a post-login element specific to that user (the test user's name in the navbar, not a generic dashboard string). A token mix-up trips that assertion immediately.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Captcha vendor outage breaking form submission
&lt;/h3&gt;

&lt;p&gt;The login or signup form embeds a captcha widget from Cloudflare Turnstile, hCaptcha, or reCAPTCHA. When the captcha vendor has elevated latency or a partial outage, the widget either fails to render or the verification token never returns. The login form HTML still loads with HTTP 200.&lt;/p&gt;

&lt;p&gt;Cloudflare's June 12, 2025 outage explicitly listed Turnstile and Challenges among affected services. See &lt;a href="https://blog.cloudflare.com/cloudflare-service-outage-june-12-2025/" rel="noopener noreferrer"&gt;the same Cloudflare post-mortem&lt;/a&gt;. hCaptcha's public outage history on &lt;a href="https://statusgator.com/services/hcaptcha" rel="noopener noreferrer"&gt;StatusGator's tracking page&lt;/a&gt; shows recurring elevated-latency events that produce the same symptom.&lt;/p&gt;

&lt;p&gt;The HTTP probe saw 200 on the form URL. The user saw a captcha that never resolved.&lt;/p&gt;

&lt;p&gt;A browser login monitor waits for the captcha widget to render and for verification to complete before clicking submit. If the widget never resolves within the step timeout, the run fails with the captcha frame visible in the screenshot.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Cookie domain mismatch on subdomain split &lt;code&gt;[REPRODUCIBLE-ONLY]&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;The application is hosted on a subdomain such as &lt;code&gt;app.example.com&lt;/code&gt; and signs cookies for &lt;code&gt;.example.com&lt;/code&gt;. A configuration change switches the cookie scope to host-only. Existing sessions become orphaned and new logins from a sibling subdomain cannot see the cookie.&lt;/p&gt;

&lt;p&gt;The closest documented case is the Backstage cross-subdomain refresh-token loop in &lt;a href="https://github.com/backstage/backstage/issues/28126" rel="noopener noreferrer"&gt;Backstage Issue #28126&lt;/a&gt;, where a prod cookie on a sibling subdomain trapped users in an infinite redirect when signing in to dev.&lt;/p&gt;

&lt;p&gt;The HTTP probe on the root domain saw 200 because the marketing page rendered. The user saw a redirect loop or a still-anonymous navigation bar.&lt;/p&gt;

&lt;p&gt;A browser login monitor opens a fresh Chromium context, hits the login URL, types credentials, and asserts on a post-login element. If the cookie never lands or never gets sent on the redirect, the assertion fails and a screenshot of the redirect loop is captured.&lt;/p&gt;
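&lt;p&gt;The scoping rule itself is mechanical. A sketch of the RFC 6265 domain-match check; illustrative code with hypothetical hostnames, not anyone's production implementation:&lt;/p&gt;

```python
def cookie_sent(cookie_domain, host_only, request_host):
    """Will the browser attach this cookie to a request for request_host?"""
    domain = cookie_domain.lstrip(".").lower()
    host = request_host.lower()
    if host_only:
        return host == domain   # host-only cookie: exact host match only
    # Domain cookie: the host equals the domain or is a subdomain of it.
    return host == domain or host.endswith("." + domain)

# A cookie scoped to .example.com is visible to every subdomain:
assert cookie_sent(".example.com", host_only=False, request_host="app.example.com")
# After a config change to host-only on example.com, app.example.com
# stops seeing the cookie and every session there is silently orphaned:
assert not cookie_sent("example.com", host_only=True, request_host="app.example.com")
```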

&lt;h3&gt;
  
  
  5. CSRF token mismatch after deploy &lt;code&gt;[REPRODUCIBLE-ONLY]&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;A deploy rotates the secret used to sign CSRF tokens, restarts the session store, or changes the session driver. Existing browser sessions hold a token signed with the old secret. The next form submission fails CSRF validation and the user gets bounced back to login.&lt;/p&gt;

&lt;p&gt;The failure mode is widely documented. &lt;a href="https://github.com/laravel/framework/issues/9531" rel="noopener noreferrer"&gt;Laravel Issue #9531&lt;/a&gt; documents CSRF token mismatch after session-store changes locking users out. &lt;a href="https://www.ibm.com/support/pages/apar/IV91742" rel="noopener noreferrer"&gt;IBM tracker IV91742&lt;/a&gt; documents users being logged out of a dashboard after CSRF mismatch on widget interactions.&lt;/p&gt;

&lt;p&gt;The HTTP probe saw 200 on the login page HTML. The returning user saw their submitted form rejected with a session error.&lt;/p&gt;

&lt;p&gt;A browser login monitor handles CSRF token negotiation the way a real user would. A secret rotation breaks the next monitored login cycle and surfaces immediately on the following run.&lt;/p&gt;
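&lt;p&gt;The rotation failure is easy to see in miniature. A sketch of an HMAC-signed CSRF token with hypothetical secrets; real frameworks differ in detail but share this shape:&lt;/p&gt;

```python
import hmac, hashlib

def sign_csrf(session_id, secret):
    """Derive the CSRF token from the session id and the app secret."""
    return hmac.new(secret, session_id.encode(), hashlib.sha256).hexdigest()

def verify_csrf(session_id, token, secret):
    return hmac.compare_digest(sign_csrf(session_id, secret), token)

old_secret, new_secret = b"pre-deploy-secret", b"post-deploy-secret"
token_in_browser = sign_csrf("sess-123", old_secret)

# Before the deploy the token validates:
assert verify_csrf("sess-123", token_in_browser, old_secret)
# After the secret rotates: same session, same token, validation fails,
# and the user's next form POST bounces back to the login page.
assert not verify_csrf("sess-123", token_in_browser, new_secret)
```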

&lt;h3&gt;
  
  
  6. JS bundle 404 with HTML 200
&lt;/h3&gt;

&lt;p&gt;A new release invalidates a hashed JS bundle filename. The HTML still references the old hash because the HTML response was cached at the edge. The browser fetches the old bundle, gets a 404, and the page renders as a blank shell.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/GoogleChrome/workbox/issues/1528" rel="noopener noreferrer"&gt;Workbox Issue #1528&lt;/a&gt; documents the exact pattern: a cached &lt;code&gt;index.html&lt;/code&gt; references hashed assets that no longer exist after a new build is released.&lt;/p&gt;

&lt;p&gt;The HTTP probe saw 200 on the page URL with a valid HTML body. The user saw a blank page with a console full of 404s.&lt;/p&gt;

&lt;p&gt;A browser login monitor executes the script tags. A 404 on the main bundle leaves the page with no event handlers. The login click does nothing, the post-login assertion never resolves, and the screenshot shows the bare HTML shell.&lt;/p&gt;
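&lt;p&gt;You can catch this shape without a full browser by checking each asset URL the served HTML references. A sketch with hypothetical URLs and statuses; in practice the statuses would come from HEAD requests your monitor issues:&lt;/p&gt;

```python
def broken_assets(referenced, asset_status):
    """Bundle URLs referenced by the HTML whose fetch did not return 200.
    referenced:   script/css URLs pulled from the served page
    asset_status: URL to HTTP status, from a HEAD request per asset"""
    return [u for u in referenced if asset_status.get(u) != 200]

# Edge-cached HTML still points at the old hashed bundle...
referenced = ["/app.3f2a1c.js", "/styles.88ab01.css"]
# ...but the new deploy purged old hashes, so the CDN answers 404:
asset_status = {"/app.3f2a1c.js": 404, "/styles.88ab01.css": 200}
print(broken_assets(referenced, asset_status))  # a 200 shell with a dead bundle
```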

&lt;h3&gt;
  
  
  7. Service worker stuck on broken cached version
&lt;/h3&gt;

&lt;p&gt;A previous build registered a service worker that aggressively caches the app shell. The next build is backend-incompatible, but returning visitors are served the old shell from the service-worker cache before the update lifecycle completes. Visitors see the old broken UI; first-time visitors and your HTTP probe see the new working version.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/angular/angular/issues/43163" rel="noopener noreferrer"&gt;Angular Issue #43163&lt;/a&gt; documents the production failure mode where users get a broken cached shell that survives reload, with recovery requiring the kill-switch trick on &lt;code&gt;ngsw.json&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The HTTP probe saw 200 on the new shell. The returning user saw the old shell wired to a backend that no longer matches.&lt;/p&gt;

&lt;p&gt;A browser login monitor uses fresh contexts by default, so it sees the new build. To catch service-worker-stuck visitors, run a second monitor with persistent storage enabled. The failure surfaces as a login-form-handler not bound to the new API.&lt;/p&gt;

&lt;h3&gt;
  
  
  8. Plugin auto-update silently swaps to a different plugin slug
&lt;/h3&gt;

&lt;p&gt;A platform allows automatic plugin updates. An update silently replaces one plugin with a forked or renamed one. Code that referenced the old plugin's internals breaks. PHP errors are thrown, but the response status is committed before the fatal error, so the body arrives partial or blank with HTTP 200.&lt;/p&gt;

&lt;p&gt;The clean public anchor is the ACF to Secure Custom Fields auto-switch on October 12, 2024, which affected sites running Advanced Custom Fields with auto-updates enabled. The full case study lives in our post on &lt;a href="https://velprove.com/blog/wordpress-plugin-update-broke-site-monitoring" rel="noopener noreferrer"&gt;the ACF Secure Custom Fields case study&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The HTTP probe saw 200 on the page URL. The user saw a partially rendered admin screen with the navigation missing.&lt;/p&gt;

&lt;p&gt;A browser login monitor renders the actual page and the post-load DOM assertion fails when the body is empty or missing the expected admin nav.&lt;/p&gt;

&lt;h3&gt;
  
  
  9. Database read-replica lag returning empty results
&lt;/h3&gt;

&lt;p&gt;The application reads from a replica that is lagging or has stopped replicating. Queries return empty result sets, not errors. A login API returns 200 with a null user object. The frontend renders an empty dashboard or a no-data state.&lt;/p&gt;

&lt;p&gt;The class is documented by every major cloud provider. See &lt;a href="https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_ReadRepl.Troubleshooting.html" rel="noopener noreferrer"&gt;AWS RDS read-replica troubleshooting docs&lt;/a&gt; and &lt;a href="https://cloud.google.com/sql/docs/postgres/replication/replication-lag" rel="noopener noreferrer"&gt;GCP Cloud SQL replication-lag docs&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The HTTP probe saw 200 with a valid JSON body. The user saw a dashboard with no data on an account that should have hundreds of records.&lt;/p&gt;

&lt;p&gt;A browser login monitor logs in with a known test user that has known post-login content. If the replica returned empty for that user's record, the dashboard assertion fails.&lt;/p&gt;
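&lt;p&gt;The detection logic is a content assertion, not a status assertion. A minimal sketch; the test account and record names are hypothetical:&lt;/p&gt;

```python
def dashboard_healthy(status, records, expected_min):
    """A 200 with an empty result set for a known-populated test account
    is a failure, even though the status code says healthy."""
    return status == 200 and len(records) >= expected_min

# Normal run: the test account shows its usual records.
assert dashboard_healthy(200, ["invoice-001", "invoice-002"], expected_min=1)
# Lagging replica: same 200, empty payload. A status check passes; this fails.
assert not dashboard_healthy(200, [], expected_min=1)
```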

&lt;h3&gt;
  
  
  10. Internal automation deletes data; monitoring blind to it
&lt;/h3&gt;

&lt;p&gt;A control-plane script with the wrong inputs deletes customer data via a sanctioned workflow path. Internal monitoring sees the workflow as normal because it is a normal workflow. Customer-facing surfaces still respond with 200, but they 404 or empty-state for affected accounts.&lt;/p&gt;

&lt;p&gt;Per &lt;a href="https://www.atlassian.com/blog/atlassian-engineering/post-incident-review-april-2022-outage" rel="noopener noreferrer"&gt;Atlassian's April 2022 post-incident review&lt;/a&gt; , internal monitoring did not detect the issue because the sites were deleted via a standard workflow. The first impacted customer opened a support ticket at 07:46 UTC on April 5th, roughly 8 minutes after the deletion script started at 07:38 UTC. The same incident has a separate SLA-credit dimension covered in &lt;a href="https://velprove.com/blog/sla-vs-slo-vs-sli-customer-guide" rel="noopener noreferrer"&gt;the SLA-credit lens on the same Atlassian incident&lt;/a&gt; .&lt;/p&gt;

&lt;p&gt;Atlassian's internal monitoring saw a normal workflow. The affected customer saw 404s on a workspace that no longer existed for them.&lt;/p&gt;

&lt;p&gt;A browser login monitor that signs into the affected tenant flips to failed within one cycle, because the test account's tenant is gone and the login redirects to a workspace-not-found screen.&lt;/p&gt;

&lt;h2&gt;
  
  
  What a browser login monitor sees that HTTP doesn't
&lt;/h2&gt;

&lt;p&gt;A browser login monitor, like the one Velprove runs on every tier including Free, executes the page in real headless Chromium. It loads the URL, waits for JavaScript to execute, fills the login form, clicks submit, and asserts that a known post-login element actually rendered. The 10 patterns above all surface as either an assertion timeout, a screenshot of the broken state, or a redirect chain that dies at a named step. A status-code probe sees none of that.&lt;/p&gt;

&lt;p&gt;Velprove includes a free browser login monitor on the Free plan. Every tier monitors from 5 global regions. Concretely:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Free, $0:&lt;/strong&gt; 1 browser login monitor at a 15-minute interval, 10 HTTP monitors at a 5-minute interval, multi-step API monitors up to 3 steps, email alerts. No credit card required. &lt;strong&gt;Starter, $19/mo:&lt;/strong&gt; 3 browser login monitors at a 10-minute interval, 25 HTTP monitors at a 1-minute interval, multi-step API monitors up to 5 steps, plus Slack, Discord, Teams, and webhook alerts. &lt;strong&gt;Pro, $49/mo:&lt;/strong&gt; 10 browser login monitors at a 5-minute interval, 100 HTTP monitors at a 30-second interval, multi-step API monitors up to 10 steps, plus PagerDuty.&lt;/p&gt;

&lt;p&gt;If you want a structured walkthrough before signing up, our &lt;a href="https://velprove.com/blog/browser-monitor-vs-http-monitor-decision-tree" rel="noopener noreferrer"&gt;decide if a browser login monitor is the right tool&lt;/a&gt; post asks seven binary questions and lands on a yes/no.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this keeps happening (and the broader case)
&lt;/h2&gt;

&lt;p&gt;The structural reason is simple: an HTTP probe tests reachability and a status code, not application behavior. Modern applications fail at the behavior level in ways that never propagate to status codes. The failure surface has moved up the stack while the dominant monitor type stayed at the status-code check. Every pattern above is a different version of the same gap. &lt;a href="https://velprove.com/blog/why-uptime-monitors-miss-outages" rel="noopener noreferrer"&gt;The broader case for why HTTP monitors miss outages&lt;/a&gt; covers the underlying argument; this post is the receipts.&lt;/p&gt;

&lt;p&gt;The other half of the gap is the API layer. Patterns 2 (token mixup) and 9 (replica lag) both have an API-shape variant where the broken response happens between two services with no human in the loop. For that surface, our guide on &lt;a href="https://velprove.com/blog/multi-step-api-monitoring-guide" rel="noopener noreferrer"&gt;multi-step API monitoring catches token-refresh failures&lt;/a&gt; covers chained-call assertions that go beyond a single status check.&lt;/p&gt;

&lt;h2&gt;
  
  
  A practical monitoring stack that catches all 10 patterns
&lt;/h2&gt;

&lt;p&gt;Three layers, in order of cost and depth.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;HTTP monitors with content assertions.&lt;/strong&gt; Cheap, broad, and the right floor for every public URL. Add a keyword assertion against the response body so a 200 with the wrong content does not pass. &lt;strong&gt;Multi-step API monitors on the auth and critical paths.&lt;/strong&gt; Velprove includes 3-step on Free, 5-step on Starter, and 10-step on Pro. A chained call proves the full authenticate, fetch-token, call, assert-response sequence across service boundaries. &lt;strong&gt;One browser login monitor on the actual login flow.&lt;/strong&gt; Velprove's Free tier includes one, and it catches the patterns HTTP and API checks cannot reach.&lt;/p&gt;
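&lt;p&gt;The first layer is a few lines of stdlib Python. An illustration of the assertion shape, not Velprove's implementation; the keyword is whatever string your page guarantees:&lt;/p&gt;

```python
import urllib.request

def passes(status, body, keyword):
    """The assertion: the right status AND the expected content present."""
    return status == 200 and keyword in body

def content_check(url, keyword, timeout=10.0):
    """Fetch the page and apply the assertion; transport errors fail too."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return passes(resp.status, resp.read().decode("utf-8", "replace"), keyword)
    except Exception:
        return False

# A 200 serving an error shell no longer counts as up:
print(passes(200, "Something went wrong", "Dashboard"))  # False
```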

&lt;p&gt;For SaaS specifically, the closing setup is one browser login monitor on your sign-in flow plus an HTTP layer for breadth. Start with the &lt;a href="https://velprove.com/for/saas" rel="noopener noreferrer"&gt;browser login monitor for SaaS&lt;/a&gt; page for the SaaS-shaped configuration; sign-up takes under five minutes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Frequently Asked Questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What happened during the Cloudflare June 12 2025 Workers KV outage?
&lt;/h3&gt;

&lt;p&gt;Cloudflare's underlying storage provider for Workers KV failed for 2 hours and 28 minutes. The outage cascaded to Cloudflare Access (100% identity-login failure), Turnstile, WARP, Workers AI, and parts of the dashboard. Marketing properties served by Cloudflare's edge stayed online throughout because they don't depend on KV. Per &lt;a href="https://blog.cloudflare.com/cloudflare-service-outage-june-12-2025/" rel="noopener noreferrer"&gt;Cloudflare's engineering blog write-up of the June 12 2025 outage&lt;/a&gt;, every authenticated surface failed while every static surface stayed green.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why did Cloudflare Access fail while marketing sites stayed up?
&lt;/h3&gt;

&lt;p&gt;Cloudflare Access stores identity tokens and session state in Workers KV. When KV's underlying storage went down, Access could not validate identities and rejected 100% of login attempts. Marketing sites on the same Cloudflare network continued serving cached HTML and assets from the edge. An HTTP monitor probing a marketing URL saw 200 OK throughout. A browser login monitor running through the Access flow would have failed at the IdP redirect.&lt;/p&gt;

&lt;h3&gt;
  
  
  What did Atlassian's April 2022 post-incident review say about internal monitoring?
&lt;/h3&gt;

&lt;p&gt;Internal monitoring did not detect the issue because the sites were deleted via a standard workflow, per &lt;a href="https://www.atlassian.com/blog/atlassian-engineering/post-incident-review-april-2022-outage" rel="noopener noreferrer"&gt;Atlassian's published review&lt;/a&gt;. The first impacted customer opened a support ticket at 07:46 UTC on April 5th, roughly 8 minutes after the deletion script started at 07:38 UTC. A control-plane script with the wrong inputs deleted hundreds of customer tenants through a sanctioned code path. Monitoring saw a normal workflow. The customer surface returned 404. For the deeper SLA-credit lens, see &lt;a href="https://velprove.com/blog/sla-vs-slo-vs-sli-customer-guide" rel="noopener noreferrer"&gt;our SLA vs SLO vs SLI guide&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Posted by Velprove.&lt;/p&gt;

</description>
      <category>monitoring</category>
      <category>webdev</category>
      <category>devops</category>
      <category>uptime</category>
    </item>
    <item>
      <title>SLA vs SLO vs SLI: Which One Pays Out When You're Down?</title>
      <dc:creator>velprove</dc:creator>
      <pubDate>Fri, 08 May 2026 14:00:04 +0000</pubDate>
      <link>https://dev.to/velprove/sla-vs-slo-vs-sli-which-one-pays-out-when-youre-down-3fgp</link>
      <guid>https://dev.to/velprove/sla-vs-slo-vs-sli-which-one-pays-out-when-youre-down-3fgp</guid>
      <description>&lt;p&gt;** Quick answer: Only the SLA actually entitles you to money when your vendor goes down. The SLO is the vendor's internal target, almost always tighter than the SLA they sold you, and the gap between the two is the vendor's safety margin and your blind spot. Atlassian's April 2022 outage took 13 days to fully restore, and the contract caps service credits at 100% of one month's fees. If you want to know whether the vendor hit the number, you need an independent SLI source. We will not redo the credit-mechanics walkthrough here. The vendor SLA receipts and the file-a-claim deep dive live in &lt;a href="https://velprove.com/blog/verify-hosting-provider-uptime" rel="noopener noreferrer"&gt;our hosting SLA verification post&lt;/a&gt; . **&lt;/p&gt;

&lt;h2&gt;
  
  
  The three letters in 60 seconds
&lt;/h2&gt;

&lt;p&gt;The three letters get treated as interchangeable in vendor marketing. They are not. From a customer's point of view, only one of them comes with a credit you can actually claim, and the other two exist mostly to set expectations the vendor never formally agreed to. The customer-side definitions are short.&lt;/p&gt;

&lt;h3&gt;
  
  
  SLI: what the vendor measures
&lt;/h3&gt;

&lt;p&gt;A Service Level Indicator is the metric the vendor measures their own service against. The Google SRE Book's SLO chapter is the canonical reference (&lt;a href="https://sre.google/sre-book/service-level-objectives/" rel="noopener noreferrer"&gt;sre.google/sre-book/service-level-objectives&lt;/a&gt;). It frames the SLI as a quantitative measure of some aspect of the service: request latency, error rate, throughput, or availability. For a customer-facing SLA the SLI is almost always some flavor of monthly uptime percentage. AWS EC2's Region-Level SLA, for example, defines an availability incident as all your running instances across two or more Availability Zones in the same region losing external connectivity, with monthly uptime calculated as the percentage of minutes in the month during which that condition held. The SLI is just the measurement. It does not commit the vendor to anything by itself.&lt;/p&gt;

&lt;h3&gt;
  
  
  SLO: what the vendor commits internally
&lt;/h3&gt;

&lt;p&gt;A Service Level Objective is the internal target the vendor's engineers are paged against. The Google SRE Book SLO chapter recommends keeping a safety margin by setting an internal SLO tighter than the SLO advertised to users, so engineering teams have room to recover before the contractual line is crossed. A vendor might run their internal Confluence Cloud team to a 99.99% SLO while publishing 99.9% in the SLA. Missing the SLO triggers an internal incident review. It does not entitle the customer to anything. You never see the SLO number, and the SLO has no contractual force outside the vendor's engineering org.&lt;/p&gt;

&lt;h3&gt;
  
  
  SLA: what the vendor commits to YOU contractually
&lt;/h3&gt;

&lt;p&gt;A Service Level Agreement is the only one of the three the vendor signed in writing with you. Microsoft's own Azure reliability documentation is explicit on the contractual binding (&lt;a href="https://learn.microsoft.com/en-us/azure/reliability/concept-service-level-agreements" rel="noopener noreferrer"&gt;learn.microsoft.com/en-us/azure/reliability/concept-service-level-agreements&lt;/a&gt;): the SLA is the formal financial commitment, defining the uptime target, the credit schedule, and the claim process. If the vendor misses the SLO but hits the SLA, you get nothing. If they miss the SLA, you get a service credit, almost never a cash refund. The SLA is the only one of the three letters that pays out, and only if you file the claim with evidence inside the window.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why the gap between SLO and SLA matters
&lt;/h2&gt;

&lt;p&gt;Most articles about SLA vs SLO vs SLI are written for vendor SREs. The structural insight that matters from a customer angle is simpler. The SLO is tighter than the SLA. The space between them is where most outages live, and you have no contractual claim on that space at all.&lt;/p&gt;

&lt;h3&gt;
  
  
  The vendor's SLO is tighter than the SLA they sold you
&lt;/h3&gt;

&lt;p&gt;The Google SRE Book SLO chapter (&lt;a href="https://sre.google/sre-book/service-level-objectives/" rel="noopener noreferrer"&gt;sre.google/sre-book/service-level-objectives&lt;/a&gt;) recommends keeping a safety margin by setting an internal SLO tighter than the SLO advertised to users. The chapter's primary distinction between SLO and SLA is consequences: if there is no explicit consequence for missing the target, you are looking at an SLO, not an SLA. The chapter frames the internal SLO as the operational alarm, and the SLA as the legal floor. The vendor knows this. Their SREs are paged on the SLO. Their lawyers wrote the SLA. The customer mostly sees marketing copy that quotes the looser of the two and presents it as a guarantee.&lt;/p&gt;

&lt;h3&gt;
  
  
  The gap is the vendor's safety margin
&lt;/h3&gt;

&lt;p&gt;A concrete example. Suppose a vendor publishes a 99.9% monthly SLA and runs internally to a 99.99% SLO. 99.9% allows roughly 43 minutes and 50 seconds of downtime per average month. 99.99% allows about 4 minutes and 23 seconds. The 0.09-percentage-point gap is the vendor's monthly safety margin: roughly 39 and a half minutes that the vendor can burn through quietly without owing anyone a credit. Most months the vendor consumes some of that buffer. Their SREs handle the noisy incidents internally. You see an ordinary green status page and no credit ever fires. The system works as designed. The design is just not optimized for the customer's benefit.&lt;/p&gt;
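&lt;p&gt;The buffer is one subtraction. A sketch assuming the 99.9%/99.99% pairing above and a 30.44-day average month:&lt;/p&gt;

```python
def downtime_budget_minutes(pct, days=30.44):
    """Minutes of downtime a target percentage allows over an average month."""
    return days * 24 * 60 * (1 - pct / 100)

sla_budget = downtime_budget_minutes(99.9)    # the contractual floor you signed
slo_budget = downtime_budget_minutes(99.99)   # the internal alarm the SREs watch
gap = sla_budget - slo_budget                 # the vendor's quiet monthly buffer
print(f"SLA {sla_budget:.1f} min, SLO {slo_budget:.1f} min, gap {gap:.1f} min")
```

&lt;p&gt;Roughly 39 and a half minutes a month of downtime that pages the vendor's engineers but never owes you a credit.&lt;/p&gt;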

&lt;h3&gt;
  
  
  The gap is your blind spot
&lt;/h3&gt;

&lt;p&gt;You care about the SLA, but the vendor's status page and marketing pages talk in SLO-shaped language. "We aim for 99.99%" is an SLO sentence. "Our service is guaranteed at 99.9% per the SLA" is an SLA sentence. The two look similar in marketing copy and lead you to believe you have a tighter contractual claim than you actually have. Reading the actual SLA legal page is the only way to know what was signed. The number you saw in marketing is almost never the number in the legal document, and the legal document is the only one a court or a credit-claim adjudicator cares about.&lt;/p&gt;

&lt;h2&gt;
  
  
  When 13 days of downtime gets you a $0 credit
&lt;/h2&gt;

&lt;p&gt;The cleanest case study of why the SLA gap matters is Atlassian's April 2022 Cloud outage. On April 5, 2022, a faulty maintenance script ran inside Atlassian Cloud and permanently deleted the active customer data of 883 customer sites belonging to 775 customers, instead of the legacy data it was supposed to target. Atlassian published a detailed post-incident review on their engineering blog (&lt;a href="https://www.atlassian.com/blog/atlassian-engineering/post-incident-review-april-2022-outage" rel="noopener noreferrer"&gt;atlassian.com/blog/atlassian-engineering/post-incident-review-april-2022-outage&lt;/a&gt;) describing the cascade of small mistakes that produced the incident. Pragmatic Engineer's reporting at the time (&lt;a href="https://newsletter.pragmaticengineer.com/p/scoop-atlassian" rel="noopener noreferrer"&gt;newsletter.pragmaticengineer.com/p/scoop-atlassian&lt;/a&gt;) added the customer-side timeline.&lt;/p&gt;

&lt;p&gt;The recovery was slow. The first restored customer sites came back on April 8, three days after the incident began. Full restoration of all affected sites was completed on April 18, 13 days in. For a Confluence or Jira tenant, those 13 days were not a degraded experience. They were a complete outage of business-critical tooling for product teams, support teams, and engineering teams who lived inside those products every day.&lt;/p&gt;

&lt;p&gt;Now read the SLA. Atlassian publishes their service level agreement at &lt;a href="https://www.atlassian.com/legal/sla" rel="noopener noreferrer"&gt;atlassian.com/legal/sla&lt;/a&gt; . At time of writing, Atlassian's SLA covers Premium (99.9%) and Enterprise (99.95%) tiers only, caps service credits at 100% of the affected Cloud Product's monthly invoice, and requires customers to file a credit claim within 15 days of the end of the calendar month in which the failure occurred. The cap is the ceiling on the financial obligation. It does not scale with the duration of the outage past that ceiling.&lt;/p&gt;

&lt;p&gt;Run the math on a typical affected tenant. A Confluence Standard tenant at $10 per user per month with 50 seats bills $500 a month. Even a 100% credit on one month's fees caps the recovery at $500. 13 days of business-critical tooling unavailable cost most affected teams orders of magnitude more than $500: blocked product launches, missed support SLAs of their own, customer churn, contractor hours rebuilding wikis from cached pages. The contract Atlassian signed was honored. The contract was not designed to compensate you for the downstream impact of a 13-day outage.&lt;/p&gt;
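&lt;p&gt;The credit-cap arithmetic above can be run directly. A minimal sketch, using the illustrative seat price and seat count from this section, not Atlassian's actual billing:&lt;/p&gt;

```python
# Illustrative SLA credit-cap math. Figures are the examples from this
# section, not any vendor's actual billing.

def max_sla_credit(price_per_seat: float, seats: int, cap_pct: float = 100.0) -> float:
    """Ceiling on a service credit capped at cap_pct of one month's fees."""
    monthly_invoice = price_per_seat * seats
    return monthly_invoice * (cap_pct / 100.0)

credit = max_sla_credit(price_per_seat=10.0, seats=50)  # $10/user, 50 seats
print(f"Maximum credit: ${credit:,.0f}")                # Maximum credit: $500

# The cap does not scale with outage duration: 3 days or 13 days, same ceiling.
for days_down in (3, 13):
    print(f"{days_down} days down: credit still capped at ${credit:,.0f}")
```

&lt;p&gt;Whatever the outage costs you downstream, the recoverable number is that flat ceiling.&lt;/p&gt;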

&lt;p&gt;The punchline is not that Atlassian acted in bad faith. They published a long, candid post-incident review and they paid credits per the contract they signed. The punchline is that reading the SLA before signing would have flagged the credit cap and the claim window, and an affected team would at least have known what their contractual exposure looked like in the worst case. The lost productivity was uninsurable under the contract the vendor signed, and that is the structural truth across most enterprise SaaS SLAs, not just Atlassian's.&lt;/p&gt;

&lt;p&gt;We are not redoing the vendor SLA receipts table here. For a verbatim seven-vendor table with credit mechanics and claim deadlines drawn directly from each vendor's legal page, see &lt;a href="https://velprove.com/blog/verify-hosting-provider-uptime" rel="noopener noreferrer"&gt;the seven-vendor SLA receipts table&lt;/a&gt; .&lt;/p&gt;

&lt;h2&gt;
  
  
  How to read your vendor's SLA in 5 minutes
&lt;/h2&gt;

&lt;p&gt;Most SLA legal pages run two to four pages of dense formatting. You do not need to read every clause. There are five things that actually matter, and they are usually in the same order across vendors. Skim the headline percentage, then go to the parts that determine what the percentage covers and how you collect.&lt;/p&gt;

&lt;h3&gt;
  
  
  Find the carve-out section first
&lt;/h3&gt;

&lt;p&gt;Every SLA is mostly about what is excluded, not what is covered. Microsoft's Azure SLA reading guide ( &lt;a href="https://learn.microsoft.com/en-us/azure/reliability/concept-service-level-agreements" rel="noopener noreferrer"&gt;learn.microsoft.com/en-us/azure/reliability/concept-service-level-agreements&lt;/a&gt; ) walks through the standard exclusion structure: scheduled maintenance, force majeure, customer-caused issues, third-party network failures, and beta features. Atlassian's legal page uses the same structure. AWS does too. The carve-outs are not bad faith. They are standard. The point is that a 99.99% SLA with broad carve-outs (every weekly maintenance window, every partner outage, every beta-tagged feature) is materially weaker than a 99.9% SLA with narrow carve-outs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Find the credit cap
&lt;/h3&gt;

&lt;p&gt;Almost every SLA caps the credit at 100% of one month's fees on the affected service, and most credit schedules max out well before that. On a Hostinger shared plan at $4 a month, the 5% credit for a missed-SLA month is 20 cents. On a Kinsta plan at $40 a month, a one-hour outage credit is $4. The dollar amounts are small at the bottom of the market and not much larger at the enterprise end relative to the cost of an extended outage. The credit is rarely the leverage. The documented record that you are owed it is.&lt;/p&gt;

&lt;h3&gt;
  
  
  Find the claim deadline
&lt;/h3&gt;

&lt;p&gt;Claim windows vary widely. Cloudflare Business is 5 business days. OVHcloud allows 60 calendar days. 30 days is the most common window across SaaS and hosting. You must file inside the window, with evidence, or the credit is forfeited no matter how clear the outage was. If you did not have monitoring running before the outage, then by the time you notice the dip in your analytics a week later, the Cloudflare window has already closed.&lt;/p&gt;
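&lt;p&gt;Deadline math is worth working out the day an outage ends. A sketch of the three window shapes named above; the outage date is hypothetical, and the window lengths are the ones quoted in this section, so check your own SLA:&lt;/p&gt;

```python
# Compute credit-claim filing deadlines from an outage end date.
# Window lengths are the ones named in this section; check your own SLA.
from datetime import date, timedelta

def business_days_after(start: date, days: int) -> date:
    """Walk forward `days` business days, skipping Saturdays and Sundays."""
    current, remaining = start, days
    while remaining:
        current += timedelta(days=1)
        if current.weekday() not in (5, 6):  # 5 = Saturday, 6 = Sunday
            remaining -= 1
    return current

outage_end = date(2026, 5, 11)  # a Monday, for illustration
print("Cloudflare-style (5 business days):", business_days_after(outage_end, 5))
print("Common 30-day window:", outage_end + timedelta(days=30))
print("OVHcloud-style (60 calendar days):", outage_end + timedelta(days=60))
```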

&lt;h3&gt;
  
  
  Find the SLI definition
&lt;/h3&gt;

&lt;p&gt;The SLA points back to a specific SLI, and the SLI definition determines what counts as "up." AWS's EC2 SLA ( &lt;a href="https://aws.amazon.com/compute/sla/" rel="noopener noreferrer"&gt;aws.amazon.com/compute/sla&lt;/a&gt; ) defines the headline-number SLI at the region level, not the instance level. Vercel's SLA ( &lt;a href="https://vercel.com/legal/sla" rel="noopener noreferrer"&gt;vercel.com/legal/sla&lt;/a&gt; ) excludes Hobby and Pro entirely from any uptime SLI. If your single EC2 instance was down but the rest of your multi-AZ deployment was still externally reachable, the AWS Region-Level SLA was not breached and you have no claim against the 99.99% commitment. The SLI definition is where the percentage actually gets calculated, and where most surprises live.&lt;/p&gt;
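&lt;p&gt;The region-level surprise is easier to see in code. A simplified model of the condition described above; the logic paraphrases the shape of the AWS EC2 Region-Level SLA, not its legal text:&lt;/p&gt;

```python
# Simplified model: a minute counts against a region-level SLI only when
# instances in two or more Availability Zones lose external connectivity
# at the same time. Paraphrased shape, not AWS's contractual definition.

def region_minute_unavailable(az_reachable: dict) -> bool:
    """True if this minute counts as unavailable under a region-level SLI."""
    unreachable = [az for az, ok in az_reachable.items() if not ok]
    return len(unreachable) >= 2

# One AZ down: painful for your single instance, but no region-level breach.
print(region_minute_unavailable({"us-east-1a": False, "us-east-1b": True}))   # False
# Two AZs unreachable at once: this minute counts against the 99.99% SLI.
print(region_minute_unavailable({"us-east-1a": False, "us-east-1b": False}))  # True
```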

&lt;h3&gt;
  
  
  Hand-off to the deep dive
&lt;/h3&gt;

&lt;p&gt;Once you know the carve-outs, the cap, the deadline, and the SLI definition, the remaining work is filing the claim with evidence. For credit-claim mechanics across seven major hosts, including the verbatim evidence the SLAs demand and the per-host claim deadlines, see &lt;a href="https://velprove.com/blog/verify-hosting-provider-uptime" rel="noopener noreferrer"&gt;the credit-claim mechanics walkthrough&lt;/a&gt; .&lt;/p&gt;

&lt;h2&gt;
  
  
  How to track the SLI yourself
&lt;/h2&gt;

&lt;p&gt;The SLA is enforceable, but only with evidence. The vendor calculates the SLI from their own infrastructure, and they get to decide what the headline number was for the month. To file a credible claim, or to know whether the vendor is quietly consuming the SLO/SLA gap month after month, you need an independent SLI source measured from outside the vendor's network. This is also &lt;a href="https://velprove.com/blog/why-uptime-monitors-miss-outages" rel="noopener noreferrer"&gt;why standard monitoring misses real outages&lt;/a&gt; when it is configured wrong: low-frequency probes, single region, no auth-protected pages.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why vendor self-reporting can't be the only source
&lt;/h3&gt;

&lt;p&gt;AWS's own post-mortem on the December 7, 2021 us-east-1 outage ( &lt;a href="https://aws.amazon.com/message/12721/" rel="noopener noreferrer"&gt;aws.amazon.com/message/12721&lt;/a&gt; ) admits a 52-minute gap between when the outage began and when their Service Health Dashboard reflected it. AWS is the most resourced cloud vendor on Earth and their own status page lagged a major incident by nearly an hour. The structural reason is that the status page links to SLA financial exposure and the publishing decision sits with operations, not real-time automation. The full reporting and quotes from former AWS engineers on this lives in our &lt;a href="https://velprove.com/blog/verify-hosting-provider-uptime" rel="noopener noreferrer"&gt;hosting SLA verification post&lt;/a&gt; . For the SLA-tracking purposes here, the takeaway is enough: the vendor's self-reported SLI is not, by itself, evidence you can rely on.&lt;/p&gt;

&lt;h3&gt;
  
  
  What an independent monitor needs to do
&lt;/h3&gt;

&lt;p&gt;An SLI tracker for a third-party vendor needs four things: probe cadence at less than half the SLA error budget (a 5-minute interval is the floor for a 99.9% monthly SLA, which leaves about 43 minutes of allowed downtime), multi-region coverage so a single-region blip does not look like a vendor outage, calendar-month rollups because every published SLA is calculated on a calendar-month basis, and incident history that retains at least the longest claim window in your stack (60 days for OVHcloud) so the data is still around when you file.&lt;/p&gt;
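&lt;p&gt;The cadence rule is two lines of arithmetic. A sketch, assuming a 30-day month:&lt;/p&gt;

```python
# Error-budget arithmetic behind the probe-cadence rule in this section.

def monthly_error_budget_minutes(sla_pct: float, days_in_month: int = 30) -> float:
    """Minutes of downtime a monthly SLA permits before it is breached."""
    total_minutes = days_in_month * 24 * 60
    return total_minutes * (1 - sla_pct / 100.0)

budget = monthly_error_budget_minutes(99.9)
print(f"99.9% monthly budget: {budget:.1f} minutes")  # ~43.2 minutes

# Probe at less than half the error budget so a budget-sized outage is
# seen by more than one probe; 5-minute intervals clear this floor easily.
print(f"Probe interval should stay under {budget / 2:.1f} minutes")
```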

&lt;h3&gt;
  
  
  How Velprove fits
&lt;/h3&gt;

&lt;p&gt;Velprove is an independent SLI source for your own SLO tracking. We are not an SLO management tool. We do not let you define formal SLOs, calculate error budgets, or alert on burn-rate windows. We probe the URLs you give us from 5 global regions outside the vendor's network, including a browser login monitor for auth-protected pages, and store timestamped probe results rolled up to monthly uptime percentages and incident counts. The SLO definition stays with you, in whatever spreadsheet or internal tracker you already use. Velprove provides the measurement.&lt;/p&gt;

&lt;p&gt;The free plan is sized for exactly this: 10 monitors, 5-minute intervals on HTTP, 1 browser login monitor at 15-minute intervals, monitoring from 5 global regions, 30-day incident history, email alerts, and SSL certificate monitoring. Multi-step API monitors with up to 3 steps are included. No credit card required. The Free plan is enough to track an SLI for a small set of vendors and produce calendar-month rollups you can attach to a credit claim.&lt;/p&gt;

&lt;h2&gt;
  
  
  When the SLA is too weak to bother tracking
&lt;/h2&gt;

&lt;p&gt;Not every SLA is worth the effort. Some plans have no SLA at all, and some SLAs have carve-outs broad enough to make the headline percentage close to meaningless. In those cases the honest answer is to monitor for your own awareness, not for credit recovery.&lt;/p&gt;

&lt;h3&gt;
  
  
  Vercel Hobby and Pro: no SLA at all
&lt;/h3&gt;

&lt;p&gt;Vercel's SLA ( &lt;a href="https://vercel.com/legal/sla" rel="noopener noreferrer"&gt;vercel.com/legal/sla&lt;/a&gt; ) is titled "Enterprise Service Level Agreement" and applies only to Enterprise customers. Hobby and Pro have no published uptime SLA at all. If you are on Vercel Hobby or Pro, you have no contractual recourse for downtime, period. You are still welcome to monitor the site for your own benefit, but there is no credit claim to file when an outage happens.&lt;/p&gt;

&lt;h3&gt;
  
  
  Most consumer SaaS: 99.9% with carve-outs that gut the number
&lt;/h3&gt;

&lt;p&gt;A common pattern across mid-market SaaS: published 99.9%, scheduled maintenance excluded, third-party services excluded, beta features excluded, customer-side network issues excluded. By the time the carve-outs apply, the effective SLA may cover a quite narrow slice of real failures. The 99.9% headline is accurate as a marketing number and weak as a contractual one. You can still track the SLI honestly. The credit math just is not going to be the reason you bother.&lt;/p&gt;

&lt;h3&gt;
  
  
  When to renegotiate or migrate
&lt;/h3&gt;

&lt;p&gt;Renegotiation is rare and almost always reserved for Enterprise tiers with named account teams. For most plans the realistic options are accept the SLA as a best-effort dependency or migrate to a vendor with a tighter contract. If you are weighing monitoring tools as part of that decision, our review of &lt;a href="https://velprove.com/blog/choose-uptime-monitoring-tool-2026" rel="noopener noreferrer"&gt;how the major monitoring tools compare&lt;/a&gt; covers what matters when you are picking the SLI source itself.&lt;/p&gt;

&lt;h2&gt;
  
  
  Frequently Asked Questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What is the difference between SLA, SLO, and SLI?
&lt;/h3&gt;

&lt;p&gt;SLI is the metric the vendor measures (e.g., monthly uptime percentage); SLO is the internal target the vendor's engineers are paged against (typically tighter than the SLA); SLA is the contractual commitment to the customer that triggers a service credit if missed. Of the three, only the SLA is enforceable by the customer. The SLO is the vendor's internal goal. The SLI is the input both the SLO and the SLA are calculated from. The Google SRE Book SLO chapter (sre.google/sre-book/service-level-objectives) is the canonical reference for the distinction.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is SLO part of SLA?
&lt;/h3&gt;

&lt;p&gt;No, but they are related. The SLA contains an SLO-shaped commitment (e.g., "99.9% monthly uptime") that the vendor signs in writing. The vendor's actual internal SLO is typically tighter than the published SLA number, treated as a private operational target. The two are distinct documents. The SLA is legal. The SLO is operational. The customer only sees the SLA.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is an example of SLI vs SLO?
&lt;/h3&gt;

&lt;p&gt;SLI: AWS EC2's Region-Level SLA defines an availability incident as all your running instances across two or more Availability Zones in the same region losing external connectivity, with monthly uptime calculated as the percentage of minutes in the month during which that condition did not hold. SLO: AWS's internal target for that SLI, set tighter than the published 99.99% SLA so engineers are paged before the SLA is breached. The SLI is the measurement. The SLO is the goal. The SLA wraps a customer-facing version of the SLO into a contract. The AWS Compute SLA (aws.amazon.com/compute/sla) is the public-facing example.&lt;/p&gt;

&lt;h3&gt;
  
  
  Are SLAs legally binding?
&lt;/h3&gt;

&lt;p&gt;Yes, but the binding is narrower than most customers assume. The SLA commits the vendor to a specific service credit if a specific SLI is missed by a specific amount, claimed within a specific window. It does not commit the vendor to compensate the customer for downstream damages, lost revenue, or reputational harm. Most SLAs explicitly disclaim consequential damages. The credit is almost always capped at one month's fees on the affected service, and usually amounts to far less. Microsoft's Azure SLA reading guide (learn.microsoft.com/en-us/azure/reliability/concept-service-level-agreements) is a clean reference for the legal shape.&lt;/p&gt;

&lt;h3&gt;
  
  
  What happens when an SLO is missed but the SLA is met?
&lt;/h3&gt;

&lt;p&gt;Nothing happens to the customer. The vendor's internal SREs are paged and the engineering team works to recover, but no service credit is owed because the SLA threshold was not crossed. This is the gap that makes the SLO/SLA distinction matter to customers. The vendor consumed their internal safety margin. The customer experienced degraded service. The contract is silent.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do I measure my own SLI for a third-party vendor?
&lt;/h3&gt;

&lt;p&gt;Run an independent monitor that probes the vendor from outside their network on a fixed schedule, captures timestamps and response codes, and stores monthly rollups. The monitor needs multi-region coverage so a regional probe failure does not look like a vendor outage, and a probe interval at less than half the SLA error budget. Velprove's free plan covers exactly this surface: 10 HTTP monitors at 5-minute intervals from 5 global regions, 1 browser login monitor at 15-minute intervals, 30-day incident history, no credit card required. Calendar-month rollups for credit claims included.&lt;/p&gt;
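&lt;p&gt;The rollup itself is simple once the probe log exists. A sketch with invented probe data, assuming the monitor stores (timestamp, HTTP status) pairs:&lt;/p&gt;

```python
# Calendar-month SLI rollup from timestamped probe results: the shape of
# evidence a credit claim needs. The probe data below is invented.
from datetime import datetime

probes = [  # (timestamp, http_status) from an external monitor
    (datetime(2026, 5, 1, 0, 0), 200),
    (datetime(2026, 5, 1, 0, 5), 200),
    (datetime(2026, 5, 1, 0, 10), 503),
    (datetime(2026, 5, 1, 0, 15), 503),
    (datetime(2026, 5, 1, 0, 20), 200),
]

def monthly_uptime(probes, year: int, month: int) -> float:
    """Uptime % for one calendar month: successful probes / total probes."""
    in_month = [code for ts, code in probes
                if ts.year == year and ts.month == month]
    up = sum(1 for code in in_month if code in range(200, 400))
    return 100.0 * up / len(in_month)

print(f"May 2026 SLI: {monthly_uptime(probes, 2026, 5):.1f}%")  # 60.0%
```

&lt;p&gt;A real rollup would weight by probe interval rather than counting probes, but the calendar-month boundary and the timestamped raw data are the parts a claim reviewer actually checks.&lt;/p&gt;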

&lt;h3&gt;
  
  
  What's the most important thing to read in an SLA before signing?
&lt;/h3&gt;

&lt;p&gt;The carve-out section, not the headline percentage. Every major SLA excludes scheduled maintenance, force majeure, customer-caused issues, third-party network failures, and beta features. The exclusions determine what the percentage actually covers. A 99.99% SLA with broad carve-outs is materially weaker than a 99.9% SLA with narrow carve-outs. After the carve-outs, read the credit cap (almost always capped at 100% of one month's fees) and the claim window (5 to 60 days, varies wildly).&lt;/p&gt;

&lt;h2&gt;
  
  
  Want to track your vendor's SLI honestly?
&lt;/h2&gt;

&lt;p&gt;The SLA is the only one of the three that pays out, and only if you have the evidence. Velprove's free plan is an independent SLI source for your own SLO tracking: 10 HTTP monitors, 1 browser login monitor every 15 minutes, 5 global regions, 30-day incident history, no credit card required. We are not enterprise SLA tracking software. We are the calendar-month receipts you bring when the credit window opens. &lt;a href="https://velprove.com/signup" rel="noopener noreferrer"&gt;Start a free Velprove account.&lt;/a&gt;&lt;/p&gt;

</description>
      <category>monitoring</category>
      <category>webdev</category>
      <category>devops</category>
      <category>uptime</category>
    </item>
    <item>
      <title>How to Choose the Right Uptime Monitor in 2026</title>
      <dc:creator>velprove</dc:creator>
      <pubDate>Fri, 08 May 2026 14:00:03 +0000</pubDate>
      <link>https://dev.to/velprove/how-to-choose-the-right-uptime-monitor-in-2026-3khb</link>
      <guid>https://dev.to/velprove/how-to-choose-the-right-uptime-monitor-in-2026-3khb</guid>
      <description>&lt;p&gt;** Quick rundown: The right uptime monitor depends on what you actually need it to do, not on which vendor has the slickest landing page. If you are a solo founder, an early-stage SaaS team, or an ecommerce store, Velprove's free plan covers your use case at no cost: HTTP monitors across 5 regions, a browser login monitor for your account or checkout flow, multi-step API monitors, and a public status page. Paid tiers start at Velprove Starter $19 per month and Velprove Pro $49 per month, with no credit card required for free. If you are an agency monitoring more than 25 client sites, Hyperping is built for your use case at scale. If you are a SaaS team with SOC 2, SAML, or Terraform requirements in procurement, Uptime.com or Site24x7 are calibrated for that bar. If you are a hosting reseller monitoring at high volume, HetrixTools wins on bulk-provisioning API and blacklist monitoring. The matrix below shows 15 vendors compared across the axes that actually decide the answer. **&lt;/p&gt;

&lt;h2&gt;
  
  
  Why "best uptime monitor" depends on your use case
&lt;/h2&gt;

&lt;p&gt;Most "best uptime monitor" lists rank vendors top to bottom and never ask what you actually need. That works if your requirements happen to match the listicle author's favorite vendor. It does not work if you are a hosting reseller reading a list whose top pick is calibrated for an enterprise SaaS procurement team, or a solo founder reading a list whose top pick assumes you already have a paid Slack workspace and an on-call rotation.&lt;/p&gt;

&lt;p&gt;The honest framing is that there are six distinct use cases worth separating. Solo founders running indie products. Ecommerce stores with cart and checkout flows that have to keep working. Early-stage SaaS teams whose login page and OAuth flows are the load-bearing monitor surface. Agencies monitoring 25 or more client sites at scale. SaaS teams with enterprise customers asking for SOC 2 Type II, SAML SSO, or a Terraform provider from their monitoring vendor. Hosting resellers provisioning hundreds of monitors per WHMCS account.&lt;/p&gt;

&lt;p&gt;Each segment cares about a different combination of axes: monitor count, browser login monitor availability, multi-step API support, status page features (white-label and custom domain), probe count, procurement compliance, and bulk-provisioning API. No single vendor wins all six segments. We will say so honestly below.&lt;/p&gt;

&lt;p&gt;To set expectations: Velprove is the recommended pick for three of the six segments (solo founder, ecommerce, early-stage SaaS without enterprise procurement requirements). For the other three (agency at scale, enterprise SaaS with SOC 2 procurement, hosting reseller), a different vendor wins and we will route you there. The 15-vendor matrix at the end of this post lays out every cell.&lt;/p&gt;

&lt;h2&gt;
  
  
  Solo founder: what you need and why Velprove fits
&lt;/h2&gt;

&lt;p&gt;The solo founder profile is narrow and well-understood. You are running an indie product. You may have one paying customer or one thousand. You do not have a paid Slack workspace yet. You do not have an on-call rotation. You do not have a procurement department asking for SOC 2 Type II from your monitoring vendor. You need something that tells you when the homepage is down, when the login flow breaks, and when SSL is about to expire, with email alerts you will actually read.&lt;/p&gt;

&lt;p&gt;The non-negotiables are: HTTP monitoring on the homepage and the login URL, a browser login monitor on the auth flow (because the login page rendering breakage is the failure mode an HTTP keyword check misses), email alerts, a public status page you can point at when a customer asks "is it down for me too," and commercial use allowed on the free tier. That last one trips up most founders who default to UptimeRobot. &lt;a href="https://velprove.com/blog/uptimerobot-commercial-alternative" rel="noopener noreferrer"&gt;UptimeRobot Free has been personal-use-only since late 2024&lt;/a&gt; , so if your indie product earns any money, you cannot use it without a Terms of Service violation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Recommended pick: Velprove Free.&lt;/strong&gt; Ten HTTP monitors at a 5-minute interval, one browser login monitor at a 15-minute interval, multi-step API monitors up to three steps, five global regions (North America, Europe, UK, Asia, Oceania), one public status page with a Velprove badge, email alerts, and commercial use allowed. No credit card required. The browser login monitor on the free tier is the differentiator: of the 15 vendors in the matrix below, only Velprove and Checkly include a browser monitor on free, and Checkly requires you to write Playwright scripts. Velprove is point-and-click.&lt;/p&gt;

&lt;p&gt;The honest if-you-outgrow-this routing: when you cross 10 monitors or want 1-minute HTTP intervals, Velprove Starter at $19 per month covers 25 monitors, 1-minute intervals, three browser login monitors, multi-step API up to five steps, and an unbranded status page. When you want PagerDuty, custom-domain status pages, or 30-second intervals, Velprove Pro at $49 per month covers those. One alternative on free: if you genuinely run a personal, non-commercial side project at high monitor count, UptimeRobot Free at 50 monitors beats Velprove Free at 10 on raw count. Pick by whether your project is commercial.&lt;/p&gt;

&lt;h2&gt;
  
  
  Ecommerce store: what you need and why Velprove fits
&lt;/h2&gt;

&lt;p&gt;Ecommerce monitoring is a higher bar than basic HTTP. Your homepage can be green while your checkout API is returning 500s and customers are bouncing on the payment step. The conversion math is brutal: an extra second of latency or a 30-minute checkout outage on a busy Saturday is a real revenue hit, and the fix starts with knowing before your customers tell you.&lt;/p&gt;

&lt;p&gt;What ecommerce stores actually need: a synthetic checkout-flow monitor that adds a product to cart, proceeds through checkout, and asserts the payment confirmation page renders. &lt;a href="https://velprove.com/blog/multi-step-api-monitoring-guide" rel="noopener noreferrer"&gt;Multi-step API monitoring for chained calls&lt;/a&gt; covers cart and checkout APIs that depend on session state and tokens passed between requests. A browser login monitor for the customer account portal catches the partial outage where the store is up but logged-in customers cannot reach order history. A public status page reduces support tickets when something goes wrong. SSL certificate monitoring, because an expired cert kills checkout instantly and silently. Regional probes near customer concentrations, so you see what European or Asian customers see, not just what your US-East monitor sees.&lt;/p&gt;

&lt;p&gt;What is overkill for most non-engineering ecommerce teams: code-first synthetic frameworks like Checkly that require an in-house engineer to maintain Playwright scripts in version control. That can be the right answer for a Shopify Plus team with a dedicated SRE function, but a typical Shopify or WooCommerce store gets faster value from a point-and-click monitor.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Recommended pick: Velprove Pro at $49 per month.&lt;/strong&gt; One hundred HTTP monitors at 30-second intervals, ten browser login monitors at 5-minute intervals, multi-step API up to ten steps for chained cart and checkout calls, all five global regions, custom domain status pages, PagerDuty plus Slack plus Discord plus Teams plus webhooks, and SSL monitoring built in. Velprove Starter at $19 per month is the fit if you do not need PagerDuty or custom-domain status pages yet.&lt;/p&gt;

&lt;p&gt;The honest if-you-outgrow-this routing: &lt;a href="https://velprove.com/blog/pingdom-alternative" rel="noopener noreferrer"&gt;Pingdom adds RUM and waterfall analytics if real-user performance is the bar&lt;/a&gt; for your conversion-rate work, with the budget to match. Site24x7 covers the same ground at a lower price. If you have an in-house engineer who will own the test scripts in version control, &lt;a href="https://velprove.com/blog/checkly-alternative" rel="noopener noreferrer"&gt;Checkly is the code-first synthetic option&lt;/a&gt; .&lt;/p&gt;

&lt;h2&gt;
  
  
  Early-stage SaaS: what you need and why Velprove fits
&lt;/h2&gt;

&lt;p&gt;Early-stage SaaS has the most distinctive monitoring surface of the six segments. Your homepage is rarely the failure mode. Your login page rendering, your OAuth token exchange, your tenant-scoped API calls, and your background job runs are. An HTTP keyword check on the login URL passes when the form rendering is broken because the HTML still contains the keyword. A multi-step API monitor that executes the OAuth flow end to end catches the failure that matters.&lt;/p&gt;

&lt;p&gt;The non-negotiables for an early SaaS team: a &lt;a href="https://velprove.com/blog/monitor-saas-login-page" rel="noopener noreferrer"&gt;browser login monitor on the actual login page&lt;/a&gt; (catches form-render breakage that HTTP keyword checks miss), multi-step API monitoring for OAuth flows where step one requests a token and step two uses that token to make an authenticated call and assert the response shape, regional probes covering tenant geography (your US tenants and your European tenants do not see the same latency), on-call alerting routed through PagerDuty or OpsGenie, and a customer-facing status page customers can subscribe to.&lt;/p&gt;
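&lt;p&gt;The two-step OAuth check described above has a simple skeleton. A sketch with a stubbed transport so it runs standalone; the URLs, field names, and stub responses are hypothetical, and a real monitor would swap the stub for actual HTTP calls:&lt;/p&gt;

```python
# Sketch of a two-step OAuth monitor: step one requests a token, step two
# uses it for an authenticated call and asserts the response shape.
# The transport is a stub so this runs offline; everything it returns is
# invented for illustration.

def fake_fetch(url: str, headers=None, data=None) -> dict:
    """Stand-in transport. Swap for real HTTP calls in a real monitor."""
    if url.endswith("/oauth/token"):
        return {"access_token": "tok-123", "expires_in": 3600}
    if url.endswith("/api/me"):
        assert headers and headers.get("Authorization") == "Bearer tok-123"
        return {"id": "u1", "email": "probe@example.com"}
    raise ValueError(f"unexpected url: {url}")

def check_oauth_flow(fetch=fake_fetch) -> bool:
    # Step 1: exchange client credentials for a token.
    token_resp = fetch("https://example-saas.test/oauth/token",
                       data={"grant_type": "client_credentials"})
    token = token_resp.get("access_token")
    if not token:
        return False  # token exchange broken: alert
    # Step 2: use the token, then assert the response shape, not just a 200.
    me = fetch("https://example-saas.test/api/me",
               headers={"Authorization": f"Bearer {token}"})
    return "id" in me and "email" in me

print("login flow healthy:", check_oauth_flow())  # True when both steps pass
```

&lt;p&gt;The point of step two is the shape assertion: a token endpoint that returns 200 with an empty body passes a plain status check and fails this one.&lt;/p&gt;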

&lt;p&gt;The procurement bar changes the answer. If you have enterprise customers asking for SOC 2 Type II from your monitoring vendor as part of their vendor risk assessment, or your security team mandates SAML SSO for all third-party SaaS, or your platform team wants a Terraform provider for IaC-managed monitors, &lt;strong&gt;Velprove does not include those features today&lt;/strong&gt; and you should read the next H2 instead. Velprove is calibrated for early SaaS where the monitoring bar is technical, not procurement compliance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Recommended pick: Velprove Pro at $49 per month.&lt;/strong&gt; One hundred monitors, 30-second intervals, ten browser login monitors at 5-minute intervals on the login page, multi-step API up to ten steps for chained OAuth and tenant-scoped calls, five global regions, PagerDuty integration, and custom-domain status pages your customers can subscribe to. Velprove Starter at $19 per month covers the same shape minus PagerDuty and custom-domain status pages, and is the right fit for SaaS teams not yet on a paid PagerDuty plan.&lt;/p&gt;

&lt;p&gt;Honest competitor mention. &lt;a href="https://velprove.com/blog/better-stack-alternative" rel="noopener noreferrer"&gt;Better Stack bundles incident management and log aggregation if those are the bigger pain than the monitoring itself&lt;/a&gt; . Better Stack Responder at $34 per month billed monthly (or $29 per month billed annually) packages monitoring, on-call scheduling, and log aggregation under one roof. Velprove does not include on-call rotation scheduling or log aggregation today, so for a team where alert fatigue and log search are bigger pains than the monitoring surface itself, Better Stack is the cleaner pick.&lt;/p&gt;

&lt;h2&gt;
  
  
  Agency at scale: Hyperping is the recommended pick
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Velprove is not the recommended pick for this segment.&lt;/strong&gt; An agency monitoring 25 or more client sites needs a different shape of tool, and Hyperping owns the agency positioning category with a dedicated use-case page and feature set built around it.&lt;/p&gt;

&lt;p&gt;What agencies actually need is well-documented in primary sources. Per uptrue.io's 2026 agency monitoring guide, "a typical web agency monitoring client sites needs 3 to 5 monitors per client, usually an HTTP check, SSL monitoring, and DNS monitoring at minimum. An agency with 50 clients would need around 150 to 250 monitors." Hyperping's own use-cases page lists what the segment buys: "custom domains and branding, full white-label support per client, email and domain whitelist, password protection for stakeholder access, SAML single sign-on via Okta or Azure AD, automated scheduled maintenance notices, and SLA reporting to build trust."&lt;/p&gt;

&lt;p&gt;The math is unforgiving. Fifty client sites times three to five monitors per client lands at 150 to 250 monitors. Velprove Pro caps at 100 monitors. Above roughly 25 client sites, an agency outgrows Velprove Pro entirely, and the API is not built for the bulk-provisioning workflow agencies need. The agency-specific features (per-client access controls, branded alert emails, SAML SSO for stakeholder dashboards) are not part of Velprove's product surface today.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Recommended pick: &lt;a href="https://hyperping.com" rel="noopener noreferrer"&gt;Hyperping&lt;/a&gt; Pro at $74 per month annual or Business at $249 per month annual.&lt;/strong&gt; Pro covers 100 monitors, 10 browser checks, three status pages, and five seats. Business covers 1,000 monitors, 15 seats, and the full agency feature set including SAML SSO, custom client domains, and per-client access. The pricing is flat per tier rather than per-monitor metered, which keeps the unit economics workable as the client roster grows.&lt;/p&gt;

&lt;p&gt;Honest fallback. Velprove Pro at $49 per month works for agencies under roughly 10 client sites that want a lower flat-fee pick and do not need per-client access controls or SAML SSO. The browser login monitor on every client login page, multi-step API support, and custom-domain status pages are all present at $49 per month. That is a real fit for a small studio. It stops being a fit fast as the client roster grows.&lt;/p&gt;

&lt;h2&gt;
  
  
  Enterprise SaaS with SOC 2 procurement: Uptime.com or Site24x7
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Velprove is not the recommended pick for this segment.&lt;/strong&gt; When your enterprise customers ask for SOC 2 Type II reports from your monitoring vendor as part of their vendor risk assessment, when your security team mandates SAML SSO across every third-party SaaS, or when your platform team wants a Terraform provider so monitors are IaC-managed alongside the rest of your infrastructure, Velprove does not meet that bar today.&lt;/p&gt;

&lt;p&gt;The procurement requirements are explicit. SOC 2 Type II from the monitoring vendor itself, with a current attestation report your customer's security team can review. SAML SSO for centralized access control. A Terraform provider for monitor lifecycle managed in version control. Fifty or more probe locations for global performance benchmarking, sometimes a hard requirement when your customer base spans multiple continents. Real user monitoring (RUM) for the user-experience-side metrics that complement the synthetic checks. Escalation policies with L1 to L2 routing rules. A dedicated customer success manager who picks up the phone.&lt;/p&gt;

&lt;p&gt;To be direct: &lt;strong&gt;Velprove does not include SOC 2 Type II, SAML SSO, a Terraform provider, RUM, or page-speed waterfall analytics today&lt;/strong&gt;. The capstone is honest about this. If any of those features are non-negotiable for your procurement cycle, you need a different tool.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Recommended picks:&lt;/strong&gt; &lt;a href="https://velprove.com/blog/uptime-com-alternative" rel="noopener noreferrer"&gt;Uptime.com is the typical pick for SOC 2 plus Terraform procurement requirements&lt;/a&gt;, starting at $7 per month annual ($9 monthly) for the website monitoring base, with 80+ probe locations, a no-code transaction recorder, and a separate $19 per month status page module on the modular calculator. Or &lt;a href="https://velprove.com/blog/site24x7-alternative" rel="noopener noreferrer"&gt;Site24x7 is the all-in-one pick for APM, RUM, and log management under one roof&lt;/a&gt;, starting at $9 per month annual for Web Uptime, with the observability bundle scaling up through APM, log management, AIOps, and network device monitoring.&lt;/p&gt;

&lt;p&gt;Honest if-you-outgrow-this-too. At the highest end (Fortune 500 procurement, $1M+ annual monitoring spend, dedicated observability team), Datadog and New Relic are different product categories entirely and outside the scope of this guide. Most readers do not need to think about that tier yet.&lt;/p&gt;

&lt;h2&gt;
  
  
  Hosting reseller: HetrixTools is the recommended pick
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Velprove is not the recommended pick for this segment.&lt;/strong&gt; Hosting resellers monitor at a fundamentally different scale and shape than the other five segments. A reseller with a few hundred WHMCS clients can be running thousands of monitors, all of them provisioned programmatically as new accounts come online, with per-client status pages and branded alert emails the end customer sees in their support ticket.&lt;/p&gt;

&lt;p&gt;What hosting resellers actually need: an API for programmatic monitor provisioning per client (one HTTP API call from your WHMCS provisioning hook per new account), white-label status pages, multi-client dashboard with per-client grouping, low per-monitor cost (resellers monitor at scale, hundreds to thousands of monitors, where per-monitor metered pricing breaks the unit economics), blacklist monitoring for shared-hosting IP reputation (your end customer's deliverability suffers if your IP block ends up on a major DNSBL), and ideally a WHMCS marketplace module for the official integration path.&lt;/p&gt;
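&lt;p&gt;To make the provisioning-hook shape concrete, here is a minimal Python sketch of what a per-account hook would send. The endpoint, field names, and tag scheme are illustrative, not any specific vendor's API; check your monitoring provider's docs for the real schema.&lt;/p&gt;

```python
import json

# Hypothetical endpoint and schema -- check your monitoring vendor's
# API docs for the real field names and auth header format.
API_URL = "https://api.example-monitor.test/v1/monitors"

def build_monitor_payload(client_domain: str, client_id: str) -> dict:
    """One uptime monitor per new hosting account, tagged per client."""
    return {
        "name": f"{client_domain} (client {client_id})",
        "url": f"https://{client_domain}",
        "type": "http",
        "interval_seconds": 300,
        "tags": [f"client:{client_id}", "auto-provisioned"],
    }

def provision(client_domain: str, client_id: str) -> str:
    """Called from the panel's after-account-create hook."""
    payload = build_monitor_payload(client_domain, client_id)
    # In production this would be a single POST, e.g. with requests:
    #   requests.post(API_URL, json=payload,
    #                 headers={"Authorization": f"Bearer {API_KEY}"})
    return json.dumps(payload)
```

&lt;p&gt;The per-client tag is what makes the multi-client dashboard grouping and per-client status pages work downstream.&lt;/p&gt;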

&lt;p&gt;Velprove Pro caps at 100 monitors. The API is calibrated for single-account use, not for bulk programmatic provisioning across hundreds of WHMCS accounts. Blacklist monitoring across 1,000+ RBLs is not part of the Velprove product surface. A reseller with 30 or more clients outgrows Velprove fast, and the API shape was not the design target.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Recommended pick: &lt;a href="https://velprove.com/blog/hetrixtools-alternative" rel="noopener noreferrer"&gt;HetrixTools is the pick for high-volume reseller monitoring&lt;/a&gt;&lt;/strong&gt;, starting at $9.95 per month for the entry tier and scaling to $49.95 per month for 200 monitors, with monitoring across 1,000+ blacklists, a reseller-friendly REST API, and 12 monitoring locations. For resellers who want the official WHMCS module path, 360 Monitoring via WHMCS MarketConnect handles automated provisioning and single sign-on from the client area. For agency-style resellers who specifically want client-facing branded status pages on custom domains, Hyperping Business at $249 per month annual covers 1,000 monitors with the agency feature set.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 15-vendor matrix
&lt;/h2&gt;

&lt;p&gt;The full comparison across the seven axes that decide the answer: free tier monitor count, lowest paid floor (monthly), browser login monitor on free, multi-step API on free, status pages on free, probe regions, and best-fit segment. Pricing as of 2026-05-06. Vendor pricing changes frequently; the linked vendor page or our dedicated comparison post has the current floor.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Vendor&lt;/th&gt;
&lt;th&gt;Free tier&lt;/th&gt;
&lt;th&gt;Lowest paid (monthly)&lt;/th&gt;
&lt;th&gt;Browser monitor on free&lt;/th&gt;
&lt;th&gt;Multi-step API on free&lt;/th&gt;
&lt;th&gt;Status pages on free&lt;/th&gt;
&lt;th&gt;Probe regions&lt;/th&gt;
&lt;th&gt;Best-fit segment&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Velprove&lt;/td&gt;
&lt;td&gt;10 monitors, 5-min interval&lt;/td&gt;
&lt;td&gt;Starter $19, Pro $49&lt;/td&gt;
&lt;td&gt;Yes, 1 monitor, no code&lt;/td&gt;
&lt;td&gt;Yes, up to 3 steps&lt;/td&gt;
&lt;td&gt;1 (Velprove badge)&lt;/td&gt;
&lt;td&gt;5 on every plan&lt;/td&gt;
&lt;td&gt;Solo founder, ecommerce, early SaaS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;UptimeRobot&lt;/td&gt;
&lt;td&gt;50 monitors (personal use only since late 2024)&lt;/td&gt;
&lt;td&gt;Solo $9 annual ($10 monthly)&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;1 basic&lt;/td&gt;
&lt;td&gt;13&lt;/td&gt;
&lt;td&gt;Personal/non-commercial side projects&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Better Stack&lt;/td&gt;
&lt;td&gt;10 monitors, 1 status page&lt;/td&gt;
&lt;td&gt;Responder $34 ($29 annual)&lt;/td&gt;
&lt;td&gt;No (Playwright metered on paid)&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Multi-region (count not surfaced)&lt;/td&gt;
&lt;td&gt;Incident management bundling&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hyperping&lt;/td&gt;
&lt;td&gt;20 monitors, 5-min interval&lt;/td&gt;
&lt;td&gt;Essentials $24 annual ($29 monthly)&lt;/td&gt;
&lt;td&gt;No (3 on $24 Essentials)&lt;/td&gt;
&lt;td&gt;Not specified&lt;/td&gt;
&lt;td&gt;1 basic&lt;/td&gt;
&lt;td&gt;Multi-region (count not surfaced)&lt;/td&gt;
&lt;td&gt;Agency at scale, white-label client status pages&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pingdom&lt;/td&gt;
&lt;td&gt;None (14-day trial)&lt;/td&gt;
&lt;td&gt;Roughly $10 to $15 (10 monitors)&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No (transaction recorder on paid)&lt;/td&gt;
&lt;td&gt;None on free&lt;/td&gt;
&lt;td&gt;70+ probes&lt;/td&gt;
&lt;td&gt;Enterprise RUM and waterfall analytics&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Site24x7&lt;/td&gt;
&lt;td&gt;50 resources, email-only&lt;/td&gt;
&lt;td&gt;Web Uptime $9 annual ($10 monthly)&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Not enumerated&lt;/td&gt;
&lt;td&gt;3 on $9 Web Uptime&lt;/td&gt;
&lt;td&gt;130+ pool, per-plan gated&lt;/td&gt;
&lt;td&gt;All-in-one observability (APM, RUM, logs)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;StatusCake&lt;/td&gt;
&lt;td&gt;10 monitors, 5-min interval&lt;/td&gt;
&lt;td&gt;Roughly $20 to $23 USD (currency varies)&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;1 on free&lt;/td&gt;
&lt;td&gt;43 in 30 countries&lt;/td&gt;
&lt;td&gt;Free-for-life broad protocol coverage&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pulsetic&lt;/td&gt;
&lt;td&gt;10 monitors, 5-min interval&lt;/td&gt;
&lt;td&gt;Solo $9 (Team $19, Org $49)&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;3 on free&lt;/td&gt;
&lt;td&gt;3 on free, 15 on Team&lt;/td&gt;
&lt;td&gt;Status page count on free, 30-sec on $19&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Checkly&lt;/td&gt;
&lt;td&gt;5 browser checks (Playwright required)&lt;/td&gt;
&lt;td&gt;Team approximately $64&lt;/td&gt;
&lt;td&gt;Yes, but Playwright scripts required&lt;/td&gt;
&lt;td&gt;Yes (Playwright)&lt;/td&gt;
&lt;td&gt;Paid-tier dependent&lt;/td&gt;
&lt;td&gt;20+ public&lt;/td&gt;
&lt;td&gt;Code-first synthetic for engineering teams&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Freshping&lt;/td&gt;
&lt;td&gt;Shut down 2026-03-06&lt;/td&gt;
&lt;td&gt;n/a&lt;/td&gt;
&lt;td&gt;n/a&lt;/td&gt;
&lt;td&gt;n/a&lt;/td&gt;
&lt;td&gt;n/a&lt;/td&gt;
&lt;td&gt;n/a&lt;/td&gt;
&lt;td&gt;Migration: see Freshping vs Velprove breakdown&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;HetrixTools&lt;/td&gt;
&lt;td&gt;15 monitors, 1-min intervals&lt;/td&gt;
&lt;td&gt;$9.95 (up to $49.95 for 200 monitors)&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;1 public&lt;/td&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;Hosting resellers, blacklist monitoring&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cronitor&lt;/td&gt;
&lt;td&gt;5 monitors, 5-min interval (Hacker)&lt;/td&gt;
&lt;td&gt;Approximately $2 per monitor metered&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;1 basic&lt;/td&gt;
&lt;td&gt;12+ across 5 continents&lt;/td&gt;
&lt;td&gt;Cron and scheduled-task monitoring&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Oh Dear&lt;/td&gt;
&lt;td&gt;None (10-day trial)&lt;/td&gt;
&lt;td&gt;EUR 15 (approximately USD 16)&lt;/td&gt;
&lt;td&gt;No (AI assertion layer, not browser)&lt;/td&gt;
&lt;td&gt;Not advertised&lt;/td&gt;
&lt;td&gt;All plans (configurable)&lt;/td&gt;
&lt;td&gt;14&lt;/td&gt;
&lt;td&gt;DNS change and DNSBL monitoring&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Uptime.com&lt;/td&gt;
&lt;td&gt;None (14-day trial)&lt;/td&gt;
&lt;td&gt;$7 annual ($9 monthly)&lt;/td&gt;
&lt;td&gt;No (transaction monitoring is paid)&lt;/td&gt;
&lt;td&gt;Yes (paid only)&lt;/td&gt;
&lt;td&gt;$19 module add-on&lt;/td&gt;
&lt;td&gt;80+&lt;/td&gt;
&lt;td&gt;SOC 2, Terraform, enterprise procurement&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Uptime Kuma&lt;/td&gt;
&lt;td&gt;Open source self-hosted&lt;/td&gt;
&lt;td&gt;$0 software + VPS cost&lt;/td&gt;
&lt;td&gt;No (HTTP, TCP, ping only)&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Self-hosted&lt;/td&gt;
&lt;td&gt;1 (your server)&lt;/td&gt;
&lt;td&gt;Privacy-regulated, deep DevOps capacity&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Two takeaways the matrix makes obvious. First, of the 15 vendors listed, only Velprove and Checkly include a browser-based monitor on the free tier, and Checkly requires writing Playwright scripts. Velprove is point-and-click. If you need to verify a logged-in flow without writing code, that narrows the field to one vendor on free. Second, Velprove is calibrated for monitor counts under 100 and the three segments where browser login plus multi-step API plus five regions covers the use case. Above that scale or for procurement compliance, the matrix routes you to a better-fit vendor, linked above. &lt;a href="https://velprove.com/blog/statuscake-alternative" rel="noopener noreferrer"&gt;StatusCake's free-for-life plan has the broadest free protocol coverage&lt;/a&gt; if that axis matters more than the others. &lt;a href="https://velprove.com/blog/cronitor-alternative" rel="noopener noreferrer"&gt;Cronitor&lt;/a&gt; is the cleanest fit if cron-job and scheduled-task monitoring is your primary axis, &lt;a href="https://velprove.com/blog/oh-dear-alternative" rel="noopener noreferrer"&gt;Oh Dear&lt;/a&gt; wins on DNS change and DNSBL monitoring, and our &lt;a href="https://velprove.com/blog/freshping-vs-velprove" rel="noopener noreferrer"&gt;Freshping vs Velprove migration guide&lt;/a&gt; covers the post-shutdown export window.&lt;/p&gt;

&lt;h2&gt;
  
  
  When to pick a self-hosted option (Uptime Kuma)
&lt;/h2&gt;

&lt;p&gt;The hosted-versus-self-hosted decision is real and worth treating honestly. Uptime Kuma is open source, MIT-licensed, and runs on a $5 to $20 per month VPS. The software cost is zero. The trade-off is that you maintain it. You patch the host, you handle the database backups, you keep the alerting reliable when the VPS itself reboots, and your monitor goes down with the rest of your infrastructure if your provider has an outage.&lt;/p&gt;
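&lt;p&gt;Standing up a test instance is genuinely a one-line job, which is part of the appeal. This is the standard single-container deployment from the Uptime Kuma README (v1 image):&lt;/p&gt;

```shell
# Standard single-container Uptime Kuma deployment (from the project README).
# Monitor data persists in the named volume; the web UI listens on port 3001.
docker run -d --restart=always \
  -p 3001:3001 \
  -v uptime-kuma:/app/data \
  --name uptime-kuma \
  louislam/uptime-kuma:1
```

&lt;p&gt;Everything around that line is the ongoing cost: patching the host, backing up the uptime-kuma volume, and making sure alerts still fire when the VPS itself reboots.&lt;/p&gt;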

&lt;p&gt;When self-hosted is the right answer: privacy-regulated environments where monitoring data cannot leave your infrastructure for compliance reasons, hobbyist or learning use where the maintenance itself is the point, and teams with already-deep DevOps capacity who treat the monitor as one more service in their stack.&lt;/p&gt;

&lt;p&gt;When self-hosted is the wrong answer: any commercial site where the monitor going down with the rest of your infrastructure is unacceptable. The whole point of independent monitoring is that it runs from outside your network so you find out when your network is the problem. A self-hosted monitor on the same VPS as your app fails that test. &lt;a href="https://velprove.com/blog/uptime-kuma-vs-hosted-monitoring" rel="noopener noreferrer"&gt;The full self-hosted vs hosted comparison&lt;/a&gt; covers the trade-offs in detail.&lt;/p&gt;

&lt;h2&gt;
  
  
  Frequently Asked Questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What's the best uptime monitoring tool in 2026?
&lt;/h3&gt;

&lt;p&gt;It depends on your use case. For solo founders, ecommerce stores, and early-stage SaaS teams, Velprove's free plan covers HTTP monitors across 5 regions, a browser login monitor, multi-step API monitors, and a public status page at no cost with no credit card required. For agencies monitoring more than 25 client sites, Hyperping is built for that use case at scale. For SaaS teams with SOC 2 or Terraform requirements, Uptime.com and Site24x7 are calibrated for enterprise procurement. For hosting resellers, HetrixTools wins on bulk-provisioning API and blacklist monitoring. Velprove paid tiers start at Starter $19 and Pro $49 per month for teams that outgrow the free plan.&lt;/p&gt;

&lt;h3&gt;
  
  
  What's the best free uptime monitor in 2026?
&lt;/h3&gt;

&lt;p&gt;Velprove Free is the most generous plan if you need browser-based login monitoring (1 browser login monitor at 15-minute interval) without writing code. UptimeRobot Free has the highest raw monitor count at 50, but it has been personal-use-only since late 2024, so it's not usable for any commercial site. Hyperping Free now includes 20 monitors as of 2026 but no browser checks. &lt;a href="https://velprove.com/blog/pulsetic-alternative" rel="noopener noreferrer"&gt;Pulsetic&lt;/a&gt; Free includes 3 status pages, the most on any free tier. Checkly Free includes 5 browser checks but requires writing Playwright scripts.&lt;/p&gt;

&lt;h3&gt;
  
  
  Should I use UptimeRobot or look at alternatives?
&lt;/h3&gt;

&lt;p&gt;UptimeRobot Free has been personal-use-only since late 2024. If your site earns money or represents a business, you cannot use UptimeRobot Free without violating the Terms of Service. Velprove Free includes 10 monitors, a browser login monitor, and commercial use at no cost. UptimeRobot's paid tiers (Solo $9 annual, Team $38 annual, Enterprise $69 annual) remain commercial-use-allowed. The full UptimeRobot commercial-use breakdown is covered in our dedicated comparison.&lt;/p&gt;

&lt;h3&gt;
  
  
  What's the cheapest uptime monitoring tool?
&lt;/h3&gt;

&lt;p&gt;Free is the cheapest, and several vendors offer commercial-use-allowed free tiers. Velprove Free, Pulsetic Free, StatusCake Free, HetrixTools Free, Hyperping Free, and Better Stack Free all permit commercial use. Among paid floors, Uptime.com Website Monitoring is $7 per month annual, Pulsetic Solo is $9, UptimeRobot Solo is $9 annual, Site24x7 Web Uptime is $9 annual, HetrixTools Bronze is $9.95, Hyperping Essentials is $24 annual, Better Stack Responder is $29 annual, and Velprove Starter is $19 monthly. Cheapest is rarely the best fit; the matrix above shows what each price actually buys you.&lt;/p&gt;

&lt;h3&gt;
  
  
  What uptime monitor should I use for my SaaS?
&lt;/h3&gt;

&lt;p&gt;For early-stage SaaS that needs browser login monitoring on the actual login page, multi-step API monitoring for OAuth flows, regional probes, on-call alerting via PagerDuty, and a customer-facing status page, Velprove Pro at $49 per month is the typical recommendation. For SaaS teams with SOC 2 Type II, SAML SSO, Terraform provider, or RUM in their procurement requirements, Uptime.com or Site24x7 are calibrated for that compliance bar. Velprove does not include SOC 2, SAML, Terraform, or RUM today.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is Velprove better than Better Stack or Uptime.com?
&lt;/h3&gt;

&lt;p&gt;Better and worse depend on use case. Velprove Free includes a browser login monitor; Better Stack Free does not. Velprove is point-and-click; Better Stack Responder at $34 per month bundles incident management, on-call scheduling, and log aggregation that Velprove does not include. Uptime.com includes SOC 2 Type II, Terraform, and 80+ probe locations starting at $7 per month annual; Velprove does not. For solo founders and early-stage SaaS, Velprove typically wins on cost-to-feature ratio. For incident management bundling, Better Stack typically wins. For enterprise procurement compliance, Uptime.com typically wins.&lt;/p&gt;

&lt;h3&gt;
  
  
  What monitor should an agency use for client sites?
&lt;/h3&gt;

&lt;p&gt;For agencies under 10 client sites, Velprove Pro at $49 per month covers white-label status pages, custom domains, multi-step API, and browser login monitors at a flat fee. For agencies above 25 client sites, Hyperping Pro at $74 per month annual or Business at $249 per month annual is built for the agency use case at scale, with white-label client status pages on custom domains, per-client access controls, and SAML SSO. The math: 50 client sites times 3 to 5 monitors per client equals 150 to 250 monitors, which exceeds Velprove Pro's 100-monitor cap.&lt;/p&gt;

&lt;p&gt;If your use case fits one of the three Velprove-recommended segments above (solo founder, ecommerce, or early-stage SaaS without enterprise procurement compliance requirements), &lt;a href="https://velprove.com/signup" rel="noopener noreferrer"&gt;start a free Velprove account&lt;/a&gt;. The free plan includes 10 HTTP monitors across 5 global regions, a browser login monitor for your account or checkout flow, multi-step API monitors up to 3 steps, and a public status page. No credit card required. If your use case fits one of the other three segments, the linked sibling comparison above is the right starting point instead.&lt;/p&gt;

</description>
      <category>monitoring</category>
      <category>webdev</category>
      <category>devops</category>
      <category>uptime</category>
    </item>
    <item>
      <title>Browser Monitor vs HTTP Monitor: A Decision Tree</title>
      <dc:creator>velprove</dc:creator>
      <pubDate>Thu, 07 May 2026 14:00:03 +0000</pubDate>
      <link>https://dev.to/velprove/browser-monitor-vs-http-monitor-a-decision-tree-488o</link>
      <guid>https://dev.to/velprove/browser-monitor-vs-http-monitor-a-decision-tree-488o</guid>
      <description>&lt;p&gt;** Plain answer: HTTP monitors check that a server returns a status code. Browser login monitors run the page in real Chromium, fill the form, and confirm the dashboard actually loads. If your product has a login, a checkout, or any JavaScript-rendered surface, HTTP alone will return 200 while users see a broken screen. If your site is static marketing HTML, HTTP is enough. Velprove and Checkly are the only vendors that include a real browser monitor on the free tier. Better Stack, Datadog, Pingdom, and UptimeRobot all gate it behind paid plans. The seven questions below decide which one you need. **&lt;/p&gt;

&lt;h2&gt;
  
  
  What HTTP monitors actually see (and miss)
&lt;/h2&gt;

&lt;p&gt;An HTTP monitor opens a TCP connection, sends a request, and reads the response code, the headers, and optionally a slice of the response body. That is the entire surface. Per the &lt;a href="https://www.uptrends.com/support/kb/monitor-settings/basic-webpage-checks-versus-real-browser-checks" rel="noopener noreferrer"&gt;Uptrends knowledge base on basic webpage versus real browser checks&lt;/a&gt;, a basic HTTP check only retrieves the initial response. It never executes JavaScript, never fetches stylesheets, never loads the third-party scripts your page depends on, and never touches an iframe. The monitor sees what curl sees, not what a person sees.&lt;/p&gt;

&lt;p&gt;That gap is where most modern outages live. A login page that depends on Auth0, Clerk, or Supabase can return HTTP 200 while the auth widget never finishes loading. A React dashboard can return 200 with an empty HTML shell while client-side hydration fails silently. A Stripe checkout can return 200 while the payment iframe times out against a third-party CDN. In each case the origin is healthy, the status code is healthy, and the user is staring at a broken screen.&lt;/p&gt;

&lt;p&gt;HTTP monitors still earn their spot. They are cheap, they are fast, they catch every outage where the origin server is genuinely down, and they give you sub-minute breadth across dozens of pages at low cost. The point is not that HTTP is broken. The point is that HTTP checks one layer of a stack that has at least three. If your product lives in the upper layers, HTTP alone will let many client-side failures through, and most client-side failures translate directly into customers who cannot complete the action they came for.&lt;/p&gt;
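&lt;p&gt;A toy sketch makes the blind spot concrete. The check below models what a basic HTTP monitor evaluates, with the page content reduced to plain strings for illustration:&lt;/p&gt;

```python
# What a basic HTTP monitor evaluates: a status code plus, optionally,
# a substring match against the raw response body.
def http_check(status_code: int, body: str, keyword: str) -> bool:
    return status_code == 200 and keyword in body

# An SPA origin serves the same near-empty HTML shell whether or not the
# JavaScript bundle later renders anything. (Markup elided to plain text;
# the page title alone is enough to satisfy a keyword assertion.)
SHELL_BODY = "title: Acme Dashboard / div id=root empty until app.js runs"

assert http_check(200, SHELL_BODY, "Acme Dashboard")  # monitor stays green
# If hydration fails, users see a blank page, but neither the status code
# nor the shell text changes -- the check above still passes.
```

&lt;p&gt;The check is correct for the layer it sees; the problem is that the layer users experience is two steps above it.&lt;/p&gt;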

&lt;h2&gt;
  
  
  What browser login monitors actually do
&lt;/h2&gt;

&lt;p&gt;A browser login monitor runs your page inside a real headless Chromium instance. It loads the URL, waits for the JavaScript to execute, fills the username and password fields, clicks the submit button, and then asserts that the post-login surface actually rendered. If the dashboard text appears, the monitor reports up. If the page hangs, throws, or returns the user to the login screen, the monitor reports down with a screenshot of what it saw.&lt;/p&gt;
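&lt;p&gt;The up-or-down decision is worth seeing in code. The sketch below substitutes a stub for the real headless-Chromium driver (an actual implementation would use Playwright-style goto, fill, and click calls); what matters is the final assertion against the post-login surface:&lt;/p&gt;

```python
# The page object below is a stub standing in for a real headless-Chromium
# driver; a real monitor issues Playwright-style goto/fill/click calls.
class FakePage:
    """Simulates a login form: click() decides where the session lands."""
    def __init__(self, login_works: bool):
        self.login_works = login_works
        self.visible_text = "Log in to your account"

    def goto(self, url):
        pass  # a real driver navigates and waits for JS to execute

    def fill(self, selector, value):
        pass  # a real driver types into the form field

    def click(self, selector):
        # After submit: either the dashboard renders, or the user
        # bounces back to the login screen.
        self.visible_text = (
            "Welcome back - Dashboard" if self.login_works
            else "Log in to your account"
        )

def login_check(page, user, password) -> str:
    """Report up only if the post-login surface actually rendered."""
    page.goto("https://app.example.test/login")  # illustrative URL
    page.fill("#email", user)
    page.fill("#password", password)
    page.click("button[type=submit]")
    return "up" if "Dashboard" in page.visible_text else "down"
```

&lt;p&gt;Run with a broken login, &lt;code&gt;login_check(FakePage(login_works=False), "monitor@example.test", "secret")&lt;/code&gt; reports down, even though a plain HTTP probe of the same login URL would still return 200.&lt;/p&gt;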

&lt;p&gt;That shape catches a class of failure HTTP cannot reach. Hydration errors that leave a blank React tree. Third-party script outages that block render. Payment iframes that fail to mount. Login services that return 200 with an error JSON body the UI misreads. Sessions that authenticate but never reach the dashboard because a downstream API call returns 500.&lt;/p&gt;

&lt;p&gt;Velprove specifically runs the page in headless Chromium from our 5 regions (NA, EU, UK, Asia, OCE), captures a screenshot when the monitor fails, and supports username plus password login flows. It does not capture Core Web Vitals or LCP. It does not run on real customer browsers. It does not record flows from a Chrome extension. It does not support OAuth or 2FA login paths. The scope is deliberate: assert that a known credentialed login still leads to a rendered dashboard, every few minutes, from multiple regions.&lt;/p&gt;

&lt;h2&gt;
  
  
  The decision tree (seven binary questions)
&lt;/h2&gt;

&lt;p&gt;The cheapest way to decide is to walk seven binary questions in order. Each one is grounded in a real failure mode rather than a theoretical one. If you want a wider lens that compares vendors as well as monitor types, our &lt;a href="https://velprove.com/blog/choose-uptime-monitoring-tool-2026" rel="noopener noreferrer"&gt;decision framework comparing 15 uptime monitors&lt;/a&gt; covers the vendor side. The seven questions below cover the monitor-type side.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Does your product require a login or session?&lt;/strong&gt; Yes means you need a browser login monitor. HTTP checks return 200 on a login page that fails when JavaScript-driven auth (Auth0, Clerk, Supabase, custom session middleware) breaks. The Xbox Live July 2024 sign-in outage and the Meta March 2024 login disruption both look like this from the outside.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Do you take payments via embedded checkout (Stripe.js, Shopify, PayPal Smart Buttons)?&lt;/strong&gt; Yes means you need a browser login monitor. The Stripe.js library, the Shopify Buy Button, and PayPal Smart Buttons all load client-side. HTTP cannot detect a healthy 200 page where the payment iframe failed to render or a third-party script timed out before mount.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Is your site primarily JavaScript-rendered (React, Vue, Next hydrated, single-page app)?&lt;/strong&gt; Yes means you need a browser monitor. HTTP fetches the HTML shell. If hydration fails, the page is blank to the user, and the response code is still 200. A keyword assertion against the shell text passes while the page is empty.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Do you depend on third-party scripts in the critical path (analytics that block render, A/B testing tools, an embedded support chat that gates conversion, OAuth widgets)?&lt;/strong&gt; Yes means you need a browser monitor. The Cloudflare June 2024 and September 2024 incidents broke downstream services whose own origins returned healthy. A browser monitor sees the actual third-party render path; an HTTP monitor on your own origin sees nothing wrong.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Is the surface a static marketing site (Hugo, Jekyll, plain HTML, a Next.js statically-exported landing page)?&lt;/strong&gt; Yes means HTTP is enough. There is no JavaScript-critical path and no login. An HTTP monitor with a content keyword assertion covers the failure modes that matter at low cost.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Is your product server-to-server APIs only, with no human browser user?&lt;/strong&gt; Yes means a multi-step API monitor is the right tool, and a browser monitor adds no signal. There is no DOM to assert against. The right test is a chained API call that proves the auth response, the token reuse, and the expected response body shape.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Do you need to verify post-login behavior, meaning that the dashboard rendered correctly after authentication succeeded?&lt;/strong&gt; Yes means you need a multi-step browser monitor that asserts on dashboard content, not just on the login response. This is meaningfully different from question one. Question one asks if you have a login at all. Question seven asks whether you need to detect a different failure: authentication succeeds (the JWT call returns 200), but the React dashboard fails to hydrate. The user lands on a blank page behind a successful login. That shape is one of the most common partial outages in modern SaaS, and it is invisible to both an HTTP check on the dashboard URL and a single-step browser check that stops at the login redirect.&lt;/p&gt;

&lt;p&gt;The summary is short. Any "yes" on questions one through four or on question seven means a browser login monitor is on the shopping list. A "yes" on question five points to HTTP-only. A "yes" on question six points to multi-step API monitoring. Most production products land on multiple yes answers and need both an HTTP layer for breadth and a browser layer for depth, which is the topic of a later section.&lt;/p&gt;
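&lt;p&gt;If you prefer the tree as code, here is a direct encoding of the seven questions. The function and argument names are ours, not any vendor's API:&lt;/p&gt;

```python
# A direct encoding of the seven questions above. Answers in, monitor
# shopping list out.
def recommend(has_login: bool, embedded_checkout: bool, js_rendered: bool,
              third_party_critical: bool, static_site: bool, api_only: bool,
              post_login_assertions: bool) -> set:
    needed = set()
    # Questions 1-4: any yes puts a browser monitor on the list.
    if has_login or embedded_checkout or js_rendered or third_party_critical:
        needed.add("browser")
    # Question 7: assert on the dashboard, not just the login response.
    if post_login_assertions:
        needed.add("browser-multistep")
    # Question 6: server-to-server products need chained API checks instead.
    if api_only:
        needed.add("api-multistep")
    # Question 5: a static site with no other yes answers is HTTP-only.
    if static_site and not needed:
        needed.add("http-only")
    # Browser depth is layered on top of HTTP breadth, not instead of it.
    if "browser" in needed or "browser-multistep" in needed:
        needed.add("http")
    return needed
```

&lt;p&gt;A typical SaaS answers yes to questions one, three, and seven, and so lands on the browser-plus-HTTP combination the later section covers.&lt;/p&gt;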

&lt;h2&gt;
  
  
  Real outages where HTTP returned 200 but users couldn't log in
&lt;/h2&gt;

&lt;p&gt;The failure mode is not theoretical. Two real incidents in 2024 prove it. On March 5, 2024, Meta's login services broke for hours. Per &lt;a href="https://variety.com/2024/digital/news/facebook-down-users-report-problems-including-getting-logged-out-1235930538/" rel="noopener noreferrer"&gt;Variety's reporting on the Meta March 2024 outage&lt;/a&gt; , thousands of Facebook and Instagram users were logged out and unable to sign back in. The marketing pages and the public surface were largely reachable. The login flow itself was the failure. An HTTP monitor on facebook.com would have shown green throughout. A browser login monitor that drives a real login flow would have gone red within one cycle.&lt;/p&gt;

&lt;p&gt;On July 2, 2024, Xbox Live sign-in broke across PC, console, and cloud. Users could load Xbox web pages but could not authenticate their accounts. The pattern was the same: surface healthy, auth flow broken, status codes returning normally on the public pages. See &lt;a href="https://velprove.com/blog/why-uptime-monitors-miss-outages" rel="noopener noreferrer"&gt;why uptime monitors miss real outages&lt;/a&gt; for the full pattern.&lt;/p&gt;

&lt;p&gt;Third-party dependency failures show the same shape from a different angle. The &lt;a href="https://blog.cloudflare.com/cloudflare-incident-on-june-20-2024/" rel="noopener noreferrer"&gt;Cloudflare June 20, 2024 incident post-mortem&lt;/a&gt; documents 114 minutes of elevated latency and error rates that broke downstream services whose own origin servers returned healthy all along. If your site loads a Cloudflare-hosted script in the critical render path and that script fails to load, your HTTP monitor on your own origin reports green while real users see a half-rendered page.&lt;/p&gt;

&lt;p&gt;A second Cloudflare incident on September 17, 2024 produced the same downstream pattern. Sites that depended on Cloudflare-routed scripts in the critical render path saw user-visible failures while their own origin HTTP monitors stayed green.&lt;/p&gt;

&lt;p&gt;The cost of those misses is documented. The &lt;a href="https://itic-corp.com/itic-2024-hourly-cost-of-downtime-report/" rel="noopener noreferrer"&gt;ITIC 2024 Hourly Cost of Downtime Survey of more than 1,000 firms&lt;/a&gt; found that the average cost of a single hour of downtime exceeds $300,000 for over 90% of mid-size and large enterprises, and that 41% of large enterprises (over 1,000 employees) report hourly costs between $1 million and $5 million or more. The often-cited $5,600-per-minute figure originates from a 2014 Gartner number and is no longer a current anchor. The ITIC 2024 number is the one to quote in 2026.&lt;/p&gt;

&lt;p&gt;Status pages do not close that gap reliably. Community data tracked by aggregators such as IsDown suggests vendor status pages can lag third-party detection by an hour or more on real incidents. A browser monitor running on your own login path, from your own account, every few minutes, is the only signal that responds to your reality rather than the vendor's communications cadence.&lt;/p&gt;
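&lt;p&gt;The arithmetic on that lag is blunt. Taking the ITIC 2024 figure as a conservative floor:&lt;/p&gt;

```python
# ITIC 2024: over 90% of mid-size and large enterprises report downtime
# costs above $300,000 per hour. Use that as a conservative floor.
HOURLY_COST = 300_000

def miss_cost(detection_lag_minutes: float) -> float:
    """Cost attributable purely to not knowing yet."""
    return HOURLY_COST / 60 * detection_lag_minutes

# A vendor status page lagging an hour behind the real incident:
assert miss_cost(60) == 300_000
# A 20-minute gap between first user impact and your first alert:
assert miss_cost(20) == 100_000
```

&lt;p&gt;Against numbers that size, the cost difference between a free HTTP check and a paid browser monitor rounds to zero.&lt;/p&gt;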

&lt;h2&gt;
  
  
  Pricing reality across the market
&lt;/h2&gt;

&lt;p&gt;Browser monitoring is more expensive than HTTP monitoring. As of May 2026, vendors price browser monitors roughly 5 to 10 times higher than HTTP monitors because they consume real Chromium instances rather than single TCP connections. Where the market diverges sharply is whether browser monitoring is available at all on a free tier.&lt;/p&gt;

&lt;p&gt;Velprove and Checkly are the only vendors that include a real browser monitor on the free plan as of May 2026. Velprove Free gives one browser login monitor at a 15-minute interval, with no credit card required. Checkly Hobby includes 1,500 browser runs per month against a metered model, also free. Better Stack, Datadog, Pingdom, and UptimeRobot all gate browser monitoring behind paid tiers. UptimeRobot does not offer real browser synthetic monitoring at any tier; it covers HTTP, keyword, and port checks only.&lt;/p&gt;

&lt;p&gt;The Velprove plan ladder is straightforward. Free at $0 includes 10 monitors, 5-minute HTTP intervals, 1 browser login monitor at 15-minute cadence, 3-step API monitors, email alerts, and 5 regions. Starter at $19 per month moves to 25 monitors, 1-minute HTTP intervals, 3 browser login monitors at 10-minute cadence, 5-step API monitors, and adds Slack, Discord, Teams, and webhook alerts. Pro at $49 per month moves to 100 monitors, 30-second intervals, 10 browser login monitors at 5-minute cadence, 10-step API monitors, and adds PagerDuty.&lt;/p&gt;

&lt;p&gt;Other vendors land in different places. Better Stack's lowest tier with browser monitoring starts around $29 per month with a 5-monitor cap, which is why our &lt;a href="https://velprove.com/blog/better-stack-alternative" rel="noopener noreferrer"&gt;Velprove vs Better Stack&lt;/a&gt; comparison flags the paid-only browser tier as the central swap reason for cost-sensitive teams. Checkly Starter at $24 per month increases the browser run quota to 3,000 per month with $6.50 per 1,000 in overage; the cost question between Checkly and Velprove comes down to whether a metered run-quota or a flat monitor-with-cadence model fits your workload, which our &lt;a href="https://velprove.com/blog/checkly-alternative" rel="noopener noreferrer"&gt;Velprove vs Checkly comparison&lt;/a&gt; breaks down concretely. Datadog Synthetic Monitoring runs roughly $12 per 1,000 browser test runs on annual commit and $15 to $18 on monthly. Pingdom starts around $10 per month for one advanced synthetic check.&lt;/p&gt;

&lt;p&gt;The pricing reality matters because the monitor type you need can change which vendor is right. A team that needs one browser login monitor for a SaaS sign-in pays $0 with Velprove, $0 with Checkly (if 1,500 runs per month is enough), and $29 or more anywhere else. The seven questions above decide if you need a browser monitor at all. Once you do, the free-tier shortlist is two vendors long.&lt;/p&gt;
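&lt;p&gt;For the metered side, the break-even math is easy to run yourself. The sketch below uses the Checkly Starter figures quoted above as a snapshot; pricing changes, so swap in current numbers:&lt;/p&gt;

```python
# Snapshot of the Checkly Starter figures quoted above ($24/month with
# 3,000 runs included, $6.50 per 1,000 runs over); swap in current pricing.
def metered_monthly_cost(runs: int, base: float = 24.0,
                         included: int = 3000,
                         overage_per_1k: float = 6.50) -> float:
    over = max(0, runs - included)
    return base + over / 1000 * overage_per_1k

# One browser monitor at a 5-minute cadence, 30 days:
runs_per_month = 60 // 5 * 24 * 30          # 8,640 runs
cost = metered_monthly_cost(runs_per_month)  # about $60.66
```

&lt;p&gt;One monitor at a 5-minute cadence already lands around $60 per month on that meter, which is the concrete version of the metered-versus-flat question: a run quota rewards sparse checks, a flat monitor-with-cadence model rewards frequent ones.&lt;/p&gt;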

&lt;h2&gt;
  
  
  When you need both (and how to combine them)
&lt;/h2&gt;

&lt;p&gt;Most production sites need both layers. HTTP monitoring gives you breadth across many URLs at high frequency and low cost. Browser monitoring gives you depth on the surfaces that matter most. The two are complements, not substitutes.&lt;/p&gt;

&lt;p&gt;A practical setup for a typical SaaS looks like this. HTTP monitors at 1-minute cadence on roughly 20 critical pages, including the marketing homepage, the pricing page, the docs landing page, the login page itself (just to detect a hard origin outage), the public API base URL, and any high-traffic blog posts. Plus one browser login monitor that drives the actual login flow and asserts the dashboard rendered. Plus one &lt;a href="https://velprove.com/blog/multi-step-api-monitoring-guide" rel="noopener noreferrer"&gt;multi-step API monitoring&lt;/a&gt; flow that proves the auth-token-call-response chain for any customer integration that depends on your API.&lt;/p&gt;
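&lt;p&gt;Before clicking anything into a dashboard, it can help to write the layered setup above down as plain data. The sketch below is illustrative Python, not a Velprove API; the URLs, type names, and intervals are placeholders standing in for your own inventory.&lt;/p&gt;

```python
# Illustrative inventory of the layered setup described above.
# This calls no monitoring API; it is a planning sketch with
# placeholder URLs and made-up type names.

MONITORS = [
    # Breadth: cheap HTTP checks at 1-minute cadence.
    {"type": "http", "url": "https://example.com/", "interval_s": 60},
    {"type": "http", "url": "https://example.com/pricing", "interval_s": 60},
    {"type": "http", "url": "https://example.com/docs", "interval_s": 60},
    {"type": "http", "url": "https://example.com/login", "interval_s": 60},
    {"type": "http", "url": "https://api.example.com/health", "interval_s": 60},
    # Depth: one real-browser login flow with a post-login assertion.
    {"type": "browser_login", "url": "https://example.com/login",
     "assert_selector": "#dashboard", "interval_s": 600},
    # Depth: one multi-step API chain (auth, token, authed call).
    {"type": "api_multi_step", "steps": 3, "interval_s": 300},
]

def coverage_summary(monitors):
    """Count monitors per type, to sanity-check the breadth/depth mix."""
    summary = {}
    for m in monitors:
        summary[m["type"]] = summary.get(m["type"], 0) + 1
    return summary

print(coverage_summary(MONITORS))
# {'http': 5, 'browser_login': 1, 'api_multi_step': 1}
```

&lt;p&gt;The point of the exercise is the shape: many cheap breadth checks, a small number of deep ones.&lt;/p&gt;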

&lt;p&gt;The same logic applies to specific platforms. For SaaS-shaped products, our guide on how to &lt;a href="https://velprove.com/blog/monitor-saas-login-page" rel="noopener noreferrer"&gt;monitor a SaaS login page&lt;/a&gt; walks through credential storage, region selection, and the dashboard-render assertion. For WordPress, our guide on how to &lt;a href="https://velprove.com/blog/monitor-wordpress-login" rel="noopener noreferrer"&gt;monitor a WordPress login&lt;/a&gt; walks through the wp-login.php and wp-admin assertion path with a dedicated low-privilege test account, and the broader &lt;a href="https://velprove.com/blog/wordpress-uptime-monitoring-guide-2026" rel="noopener noreferrer"&gt;WordPress uptime monitoring guide&lt;/a&gt; covers the full WordPress monitoring stack from solo blog through WooCommerce to agency.&lt;/p&gt;

&lt;p&gt;The right ratio is roughly 10 to 20 HTTP monitors for every 1 browser login monitor on a typical SaaS. The HTTP monitors give you fast detection of origin outages. The browser monitor gives you the truth about whether the customer-facing flow actually works.&lt;/p&gt;

&lt;h2&gt;
  
  
  Frequently Asked Questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What is the difference between HTTP and synthetic monitoring?
&lt;/h3&gt;

&lt;p&gt;HTTP monitoring fetches a URL and checks the response code, headers, and optionally a content keyword. Synthetic browser monitoring runs the page in real headless Chromium, executes JavaScript, fills login forms, and asserts that post-login content actually rendered. HTTP catches origin outages. Browser monitors catch JavaScript bundle errors, third-party script failures, hydration issues, and payment iframes that fail to load. Velprove offers both.&lt;/p&gt;
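&lt;p&gt;The gap between the two check types can be made concrete with a simulation. The sketch below evaluates both checks against the same broken page: the server returns the HTML shell fine (the HTTP-style check passes), but the JavaScript never renders anything (the browser-style assertion fails). The function names and markers are illustrative, not any vendor's API.&lt;/p&gt;

```python
def http_check(status, body, keyword):
    """HTTP-layer check: status code plus an optional keyword in the raw HTML."""
    return status == 200 and keyword in body

def browser_check(rendered_text, marker):
    """Browser-layer check: assert on text that actually rendered after JS ran."""
    return marker in rendered_text

# Simulated outage: the server still returns the page shell and title...
raw_html = "App shell ... id=root ... title: Example App"
# ...but the JS bundle failed, so nothing rendered into the root element.
rendered_text = ""  # what a user (or headless browser) actually sees

print(http_check(200, raw_html, "Example App"))   # True: origin looks healthy
print(browser_check(rendered_text, "Dashboard"))  # False: the page is blank
```

&lt;p&gt;Same page, same moment, opposite answers. That divergence is the entire case for running both layers.&lt;/p&gt;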

&lt;h3&gt;
  
  
  Do I need synthetic monitoring if I already have HTTP checks?
&lt;/h3&gt;

&lt;p&gt;HTTP checks alone are not enough if you have a sign-in flow, an embedded Stripe checkout, a JavaScript-rendered dashboard, or critical third-party scripts. HTTP cannot see those client-side failures. HTTP is enough only if your site is static HTML or a server-rendered marketing page with no login or checkout. Real outages at Xbox Live (July 2024) and Meta (March 2024) broke login flows while origin servers stayed healthy. Add one browser login monitor for the critical flow.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is API monitoring enough?
&lt;/h3&gt;

&lt;p&gt;For server-to-server APIs with no human user, yes. Multi-step API monitoring asserts the auth call returns a token, the next call uses that token, and the response body matches an expected shape. Velprove supports 3-step API monitoring on Free, 5 on Starter, 10 on Pro. If a human ever logs into a UI built on top of those APIs, you also need a browser monitor. The login can succeed at the API layer while the dashboard fails to render.&lt;/p&gt;
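&lt;p&gt;Here is a hedged sketch of what a multi-step chain verifies. The handlers below stand in for real HTTP calls; the shape (step 1 yields a token, step 2 presents it, the response body is checked) is the point, not the fake endpoints.&lt;/p&gt;

```python
def run_chain(steps):
    """Run monitor steps in order, threading extracted variables forward.
    Each step is (call, extract): call(vars) returns a response dict,
    extract(response) returns new variables for later steps."""
    vars_ = {}
    for call, extract in steps:
        response = call(vars_)
        if response.get("status") != 200:
            return False, vars_
        vars_.update(extract(response))
    return True, vars_

# Fake endpoints standing in for real HTTP calls.
def auth_call(v):
    return {"status": 200, "body": {"token": "abc123"}}

def authed_call(v):
    # A real monitor would send v["token"] in an Authorization header.
    ok = v.get("token") == "abc123"
    return {"status": 200 if ok else 401, "body": {"items": [1, 2, 3]}}

steps = [
    (auth_call, lambda r: {"token": r["body"]["token"]}),
    (authed_call, lambda r: {"item_count": len(r["body"]["items"])}),
]
ok, extracted = run_chain(steps)
print(ok, extracted)  # True {'token': 'abc123', 'item_count': 3}
```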

&lt;h3&gt;
  
  
  Why do vendor status pages show green during outages?
&lt;/h3&gt;

&lt;p&gt;Vendor status pages are updated by humans on the vendor's incident-response team. Third-party monitoring detects outages within minutes. Community data tracked by aggregators like IsDown suggests vendor status pages can lag third-party detection by an hour or more. The Cloudflare incidents of June 20 and September 17, 2024 are documented examples. This is why a browser monitor running from your own account, on your own login, is more reliable than a status page subscription.&lt;/p&gt;

&lt;h3&gt;
  
  
  How much does browser monitoring cost?
&lt;/h3&gt;

&lt;p&gt;Velprove and Checkly are the only vendors that include a real browser monitor on the free tier as of May 2026. Velprove Free includes 1 browser login monitor at 15-minute intervals, no credit card required. Starter at $19 per month gives 3 browser monitors at 10-minute intervals. Pro at $49 per month gives 10 monitors at 5-minute intervals. Checkly Hobby is free with 1,500 browser runs per month. Better Stack starts at $29 per month. Datadog charges roughly $12 per 1,000 browser runs on annual commit. Pingdom and UptimeRobot do not include browser monitoring on free plans.&lt;/p&gt;

&lt;p&gt;If your answer to any of questions one, two, three, four, or seven was yes, &lt;a href="https://velprove.com/signup" rel="noopener noreferrer"&gt;start a free Velprove account&lt;/a&gt;. The free plan includes 10 HTTP monitors across 5 global regions, 1 browser login monitor at a 15-minute interval, and multi-step API monitors up to 3 steps. No credit card required.&lt;/p&gt;

</description>
      <category>monitoring</category>
      <category>webdev</category>
      <category>devops</category>
      <category>uptime</category>
    </item>
    <item>
      <title>Uptime.com Alternative: Free to Start, $9 Floor Beat</title>
      <dc:creator>velprove</dc:creator>
      <pubDate>Wed, 06 May 2026 14:00:03 +0000</pubDate>
      <link>https://dev.to/velprove/uptimecom-alternative-free-to-start-9-floor-beat-dg3</link>
      <guid>https://dev.to/velprove/uptimecom-alternative-free-to-start-9-floor-beat-dg3</guid>
      <description>&lt;p&gt;** The short version: Uptime.com starts at $9 a month, with no free tier. That $9 buys 10 basic checks plus exactly one advanced check, so you pick: transaction monitoring OR multi-step API OR page speed, not all three. If you want all three, the floor moves up. Velprove's free plan includes HTTP monitors, a browser login monitor, and a multi-step API monitor at $0. Uptime.com is the right answer if you need SOC 2, a Terraform provider, or 80+ probe locations on day one. Velprove is the right answer if you want to start for free and grow into paid plans on your own timeline. **&lt;/p&gt;

&lt;p&gt;Most readers find this post one of two ways. Either you got quoted Uptime.com's pricing page and the $9 monthly floor felt indie-friendly until you read the fine print, or your 14-day trial expired and you discovered there is no free tier waiting on the other side. Either way, the math on that $9 number is what brought you here, and the math is worth unpacking honestly.&lt;/p&gt;

&lt;p&gt;This post is not about whether Uptime.com is a good company. They are a serious mid-market monitoring vendor with SOC 2 Type II, an enterprise customer wall that includes Microsoft, IBM, and Salesforce, and a no-code transaction recorder reviewers genuinely love. The question is whether the product is calibrated for the shape of your team today, or whether you would be better served starting on a free plan and growing into paid on your own pace.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Uptime.com is good at
&lt;/h2&gt;

&lt;p&gt;Uptime.com has been around since 2013 and has built a real mid-market monitoring product. Here is an honest list of what they do well, and what Velprove does not match today.&lt;/p&gt;

&lt;p&gt;They have completed the AICPA Service Organization Control 2 Type II audit, confirmed on their Trust Center. If your procurement process requires SOC 2 documentation, that is a hard gate Velprove does not currently clear. Uptime.com also supports GDPR with data processing terms and cross-border transfer safeguards, and runs SAML 2.0 SSO with Okta, OneLogin, AWS, and Azure AD.&lt;/p&gt;

&lt;p&gt;The customer logo wall is real: Accenture, IBM, Kraft Heinz, Lending Tree, Microsoft, Palo Alto Networks, Salesforce, VMware, and Webflow are listed as customers. That is a serious enterprise lineup. The flagship advanced product is a no-code transaction recorder that drives a real browser through clicks, form fills, and screenshots. TechRadar called it one of the best site monitoring platforms they have tested. If your team needs to record shopping-cart and form-submission flows without writing code, Uptime.com's recorder UX is genuinely strong.&lt;/p&gt;

&lt;p&gt;They run from 80+ probe locations worldwide across North America, South America, Europe, Africa, Asia, and Oceania. They run a 1-minute minimum interval on the Premium tier, and they offer a Terraform provider for teams that want to manage monitor configs as infrastructure-as-code. If your team needs SOC 2 day one, Terraform-managed monitor configs, or probe coverage in 80+ countries, Uptime.com is calibrated for that. The reason you are reading this post is probably that you want to start before you commit to that price point.&lt;/p&gt;

&lt;h2&gt;
  
  
  The $9 floor and the advanced-check math
&lt;/h2&gt;

&lt;p&gt;The single most important fact about Uptime.com's pricing is that the live calculator on uptime.com/pricing starts at $9 per month on monthly billing or $7 per month on yearly billing for the base Website Monitoring product. There is no free tier. There is a 14-day free trial with no credit card required, and a 30-day money-back guarantee, and yearly billing saves over 20 percent. But after the trial, it is $9 a month or you lose access.&lt;/p&gt;

&lt;p&gt;The $9 floor includes 10 basic checks, 1 advanced check, 25 SMS alerts, email and SMS alerting, 20+ integrations, maintenance windows, customizable reporting, and unlimited group checks. Read the line about "1 advanced check" carefully. That is the catch.&lt;/p&gt;

&lt;p&gt;An advanced check on Uptime.com is one slot, and you spend it on either transaction monitoring (their no-code browser recorder) OR multi-step API monitoring OR page speed monitoring. You cannot run one of each at $9 a month. To monitor an authenticated checkout flow AND a multi-step API workflow at the same time, you cross into a higher preset tier. The legacy Starter preset starts at $20 a month with 1 transaction check, Essential is reported around $67 a month with 5 transaction checks, and Premium scales into the high hundreds with 15 transaction checks. Real-world numbers vary; the structural point is that the second advanced check is what moves you off the $9 floor.&lt;/p&gt;

&lt;p&gt;Status pages are a separate $19 per month module on the calculator. Real User Monitoring is a separate $5 per month module. Probe locations beyond what your tier includes run $1 per month each. So for an indie team that wants "uptime + status page + 1 advanced check" in production, the practical floor is $9 + $19 = $28 per month before adding regions or RUM.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the Velprove free plan includes
&lt;/h2&gt;

&lt;p&gt;Velprove's free plan exists because monitoring is the kind of tool you should be able to evaluate against your actual production surface, not against a 14-day trial clock. The free plan is not a time-limited demo. It is a permanent tier. No credit card required. Commercial use allowed.&lt;/p&gt;

&lt;p&gt;The free plan includes 10 monitors at a 5-minute minimum interval, 1 browser login monitor on a 15-minute interval, multi-step API monitors with up to 3 steps, email alerts, SSL certificate monitoring, 30-day incident history, 1 Velprove-branded status page at velprove.com/status/your-page, and monitoring from 5 global regions (North America, Europe, UK, Asia, Oceania) on every plan including Free. That is a real production-shaped surface for a side project or an early-stage SaaS.&lt;/p&gt;

&lt;p&gt;The browser login monitor is the piece that closes the gap most comparison shoppers care about. Velprove drives a real browser to your login URL, fills in credentials from a dedicated low-privilege test account, follows the post-login redirect, asserts post-login state, and captures a screenshot when anything fails. Uptime.com offers this too, through their transaction recorder, but it counts against your single advanced check slot at the $9 floor. On Velprove it is a separate monitor type with its own quota.&lt;/p&gt;

&lt;p&gt;When you are ready to grow past the free tier, Starter at $19 per month is 25 monitors at a 1-minute interval, 3 browser login monitors at 10 minutes, multi-step API at 5 steps, and Slack, Discord, Teams, and Webhook alerts. Pro at $49 per month is 100 monitors at a 30-second interval, 10 browser login monitors at 5 minutes, multi-step API at 10 steps, PagerDuty, 3 status pages, and custom domain support. Three flat plans, no metered per-monitor charges, no surprise modular surcharges. If you are also cross-shopping incumbent monitors, our &lt;a href="https://velprove.com/blog/pingdom-alternative" rel="noopener noreferrer"&gt;Pingdom alternative&lt;/a&gt; covers that comparison.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where Uptime.com beats Velprove
&lt;/h2&gt;

&lt;p&gt;Honesty section. Per the comparison rules at the top of this post, here is what Uptime.com has today that Velprove does not.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SOC 2 Type II attestation.&lt;/strong&gt; Uptime.com has it. Velprove is not currently SOC 2 attested. If procurement gates on SOC 2, this is a hard stop.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Terraform provider.&lt;/strong&gt; Uptime.com publishes an official Terraform provider that lets you manage checks as infrastructure-as-code. Velprove does not have a Terraform provider today.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;80+ probe locations.&lt;/strong&gt; Uptime.com runs from 80+ locations across six continents. Velprove runs from 5 global regions (North America, Europe, UK, Asia, Oceania). If you need probe coverage in Africa, South America, or specific Asian or European countries beyond our regions, Uptime.com's footprint is wider.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Real User Monitoring.&lt;/strong&gt; Uptime.com offers a RUM product bundled with synthetic monitoring, with page-view-based pricing. Velprove is synthetic monitoring only. We do not offer RUM today.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No-code transaction recorder.&lt;/strong&gt; Uptime.com's transaction recorder records your clicks and form fills directly in the browser. Velprove's browser login monitor is configured manually with selectors and assertions, not recorded.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SAML SSO.&lt;/strong&gt; Uptime.com supports SAML 2.0 SSO with major identity providers. Velprove does not support SAML SSO today.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Enterprise customer references.&lt;/strong&gt; Uptime.com's customer wall includes Microsoft, IBM, Salesforce, VMware, and Accenture. Velprove is calibrated for indie founders and small SaaS teams; the enterprise logo comparison is not the right one.&lt;/p&gt;

&lt;h2&gt;
  
  
  Uptime.com vs Velprove side by side
&lt;/h2&gt;

&lt;p&gt;Capabilities verified against Uptime.com's published documentation as of 2026-05-06 and Velprove's billing configuration. The rows to weigh first are the ones where the price point or feature gap is most material.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Capability&lt;/th&gt;
&lt;th&gt;Velprove Free&lt;/th&gt;
&lt;th&gt;Uptime.com $9/mo floor&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Monthly cost&lt;/td&gt;
&lt;td&gt;$0&lt;/td&gt;
&lt;td&gt;$9 monthly / $7 yearly&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Free tier&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No (14-day trial only)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;HTTP / basic checks&lt;/td&gt;
&lt;td&gt;10 monitors included&lt;/td&gt;
&lt;td&gt;10 basic checks included&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Browser login monitor&lt;/td&gt;
&lt;td&gt;1 included (15-min interval)&lt;/td&gt;
&lt;td&gt;Counts against the 1 advanced-check slot&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi-step API monitor&lt;/td&gt;
&lt;td&gt;Up to 3 steps included&lt;/td&gt;
&lt;td&gt;Counts against the 1 advanced-check slot&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Page speed monitoring&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Counts against the 1 advanced-check slot&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Status page&lt;/td&gt;
&lt;td&gt;1 included (velprove.com/status/your-page)&lt;/td&gt;
&lt;td&gt;Separate $19/mo add-on&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Real User Monitoring&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Separate $5/mo module&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;HTTP check interval floor&lt;/td&gt;
&lt;td&gt;5-min Free, 1-min Starter, 30-sec Pro&lt;/td&gt;
&lt;td&gt;5 to 10 minutes on entry tier&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Monitoring regions&lt;/td&gt;
&lt;td&gt;5 global regions, free on every plan&lt;/td&gt;
&lt;td&gt;80+ probe locations&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SOC 2 Type II&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Terraform provider&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SAML SSO&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Trial / free start&lt;/td&gt;
&lt;td&gt;Free plan, no credit card required&lt;/td&gt;
&lt;td&gt;14-day free trial, no credit card required&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Who should stay on Uptime.com
&lt;/h2&gt;

&lt;p&gt;The honest do-not-switch decision tree. Stay on Uptime.com if any of the following are true for your team today.&lt;/p&gt;

&lt;p&gt;You need SOC 2 Type II compliance documented for an enterprise procurement process. You need a Terraform provider to manage monitor configs as infrastructure-as-code. You need probe coverage in 80+ countries, especially in Africa, South America, or parts of Asia where Velprove's 5 regions do not cover. You are already happy with the no-code transaction recorder UX and your transaction count is below your tier cap. You need RUM bundled with synthetic monitoring inside one tool and one bill. You need SAML SSO and prefer to keep it on the entry tier.&lt;/p&gt;

&lt;p&gt;Uptime.com is calibrated for mid-market and enterprise teams that have already cleared an enterprise procurement budget. If that is you, the $9 floor is the wrong number to anchor on; you would almost certainly land on Essential or Premium and the per-monitor economics start to make sense at that scale. If you are also cross-shopping enterprise platform tools, our &lt;a href="https://velprove.com/blog/site24x7-alternative" rel="noopener noreferrer"&gt;Site24x7 alternative&lt;/a&gt; covers a closely related comparison.&lt;/p&gt;

&lt;h2&gt;
  
  
  Migration playbook if you do switch
&lt;/h2&gt;

&lt;p&gt;The migration from Uptime.com to Velprove takes about 20 minutes if your account is mostly basic checks plus one or two advanced checks. Here is the step by step.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Inventory your Uptime.com monitors
&lt;/h3&gt;

&lt;p&gt;Open your Uptime.com dashboard and list every monitor. Separate basic checks (HTTP, SSL, DNS, ping) from advanced checks (transaction, API, page speed, RUM). Note your current advanced check count and your plan cap. The split tells you what migrates one-to-one and what needs a Velprove plan upgrade.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Sign up for a free Velprove account
&lt;/h3&gt;

&lt;p&gt;Head to &lt;a href="https://velprove.com/signup" rel="noopener noreferrer"&gt;velprove.com/signup&lt;/a&gt; and create a free account. No credit card required. You will be in your dashboard in about 30 seconds.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Recreate your basic HTTP and SSL checks
&lt;/h3&gt;

&lt;p&gt;Recreate your Uptime.com basic checks one-to-one as Velprove HTTP monitors. SSL certificate monitoring is included on every Velprove plan, so you do not need a separate SSL check type. Set the interval to 5 minutes on Free, 1 minute on Starter, or 30 seconds on Pro depending on your plan.&lt;/p&gt;
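&lt;p&gt;SSL expiry is the check most worth sanity-testing during a migration. Python's standard library can already turn a certificate's notAfter string into a countdown; the sketch below does the date math only (no network), with a fixed "now" so the arithmetic is visible. This is a generic stdlib sketch, not Velprove's implementation.&lt;/p&gt;

```python
import ssl
from datetime import datetime, timezone

def days_until_expiry(not_after, now_ts):
    """Days from now_ts (a Unix timestamp) until a cert's notAfter string,
    e.g. 'Jun  1 12:00:00 2026 GMT' as returned by ssl.getpeercert()."""
    expiry_ts = ssl.cert_time_to_seconds(not_after)
    return int((expiry_ts - now_ts) // 86400)

# Fixed reference point so the example is deterministic.
now = datetime(2026, 5, 2, 12, 0, tzinfo=timezone.utc).timestamp()
print(days_until_expiry("Jun  1 12:00:00 2026 GMT", now))  # 30
```

&lt;p&gt;In production the notAfter string comes from the live TLS handshake; a monitor alerts when the countdown drops below a threshold you choose.&lt;/p&gt;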

&lt;h3&gt;
  
  
  Step 4: Recreate your transaction check as a browser login monitor
&lt;/h3&gt;

&lt;p&gt;For your Uptime.com transaction check, recreate the auth flow as a Velprove browser login monitor. The safest approach is to create a dedicated monitoring-only account with the minimum permissions needed to verify the flow, never to use real admin credentials. Velprove drives a real browser through the login URL, fills in the credentials, follows the post-login redirect, asserts post-login state, and captures a screenshot on failure.&lt;/p&gt;
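&lt;p&gt;One practical detail when you recreate the flow: keep the monitoring account's credentials out of the monitor definition itself. A minimal sketch, assuming hypothetical MONITOR_USER and MONITOR_PASS variable names:&lt;/p&gt;

```python
import os

def load_monitor_credentials(env=os.environ):
    """Read the dedicated monitoring account's credentials from the
    environment and fail fast instead of silently monitoring nothing.
    MONITOR_USER / MONITOR_PASS are illustrative names, not a standard."""
    user = env.get("MONITOR_USER")
    password = env.get("MONITOR_PASS")
    if not user or not password:
        raise RuntimeError("monitoring credentials are not configured")
    return user, password

# Passing a dict here keeps the example self-contained; in practice the
# values come from the real process environment or a secrets manager.
creds = load_monitor_credentials({"MONITOR_USER": "probe", "MONITOR_PASS": "s3cret"})
print(creds)  # ('probe', 's3cret')
```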

&lt;h3&gt;
  
  
  Step 5: Recreate your multi-step API check
&lt;/h3&gt;

&lt;p&gt;For your Uptime.com API check, recreate the request chain as a Velprove multi-step API monitor with variable extraction between steps. Step 1 returns a token via JSONPath, step 2 references it as &lt;code&gt;{{token}}&lt;/code&gt; in the header, and the chain runs inside the monitor instead of in a script you maintain separately. Free supports 3 steps, Starter 5, Pro 10.&lt;/p&gt;
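&lt;p&gt;The extraction-plus-substitution mechanics can be sketched in a few lines. This is a dotted-path extractor and a &lt;code&gt;{{var}}&lt;/code&gt; renderer for illustration only, not Velprove's actual JSONPath engine.&lt;/p&gt;

```python
import re

def extract(path, payload):
    """Walk a dotted path like '$.data.token' through nested dicts."""
    value = payload
    for key in path.lstrip("$.").split("."):
        value = value[key]
    return value

def render(template, variables):
    """Replace {{name}} placeholders with previously extracted variables."""
    return re.sub(r"\{\{(\w+)\}\}",
                  lambda m: str(variables[m.group(1)]), template)

# Step 1's response body yields the token; step 2's header references it.
step1_response = {"data": {"token": "abc123"}}
variables = {"token": extract("$.data.token", step1_response)}
header = render("Bearer {{token}}", variables)
print(header)  # Bearer abc123
```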

&lt;h3&gt;
  
  
  Step 6: Run both systems in parallel, then disable Uptime.com
&lt;/h3&gt;

&lt;p&gt;Leave Uptime.com and Velprove running side by side for 24 to 48 hours to confirm Velprove catches the same events. Then disable the migrated monitors in Uptime.com. If you only used the $9 floor, downgrade or close the account before the next billing cycle. To &lt;a href="https://velprove.com/blog/better-stack-alternative" rel="noopener noreferrer"&gt;see how Velprove compares to other modern monitoring tools&lt;/a&gt; before you finalize the swap, the Better Stack comparison is the closest peer.&lt;/p&gt;

&lt;h2&gt;
  
  
  Frequently Asked Questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Does Uptime.com have a free plan?
&lt;/h3&gt;

&lt;p&gt;No. Uptime.com offers a 14-day free trial with no credit card required, but the entry to paid is $9 per month on monthly billing or $7 per month on yearly billing. After the trial expires, you either pay or you lose access. Velprove offers a free plan with no time limit, no credit card required, and no expiration. The free plan includes 10 monitors, 1 browser login monitor on a 15-minute interval, multi-step API monitors with up to 3 steps, and 5 global regions.&lt;/p&gt;

&lt;h3&gt;
  
  
  How much does Uptime.com cost?
&lt;/h3&gt;

&lt;p&gt;Uptime.com's live pricing calculator starts at $9 per month on monthly billing or $7 per month on yearly billing for the base Website Monitoring product. The floor includes 10 basic checks, 1 advanced check, and 25 SMS alerts. To run more than one advanced check at the same time (transaction monitoring, API monitoring, or page speed monitoring), you scale up the calculator. Add-on modules include Status Page at $19 per month, RUM at $5 per month, and additional probe locations at $1 per month each.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is included in Uptime.com's $9 floor?
&lt;/h3&gt;

&lt;p&gt;The $9 monthly floor includes 10 basic checks, 1 advanced check, 25 SMS alerts, email and SMS alerting, 20+ integrations, maintenance windows, customizable reporting, and unlimited group checks. The 1 advanced check slot is the catch: you spend it on transaction monitoring OR API monitoring OR page speed monitoring, not all three. To run a transaction check and a multi-step API check at the same time, you upgrade to the next preset tier or add advanced check slots through the calculator. A status page is a separate $19 per month module on top of the $9 floor. By comparison, Velprove's free plan includes HTTP monitors, a browser login monitor, and multi-step API monitors at $0.&lt;/p&gt;

&lt;h3&gt;
  
  
  Does Uptime.com support browser-based login monitoring?
&lt;/h3&gt;

&lt;p&gt;Yes. Uptime.com offers a no-code transaction recorder that drives a real browser through clicks, form fills, and screenshot capture. It is a strong product. The catch is plan gating: transaction checks count against your advanced check slot, with 1 transaction check on Starter, 5 on Essential, and 15 on Premium. Velprove offers a browser login monitor on every plan including Free, with 1 included on Free at a 15-minute interval, 3 on Starter at 10 minutes, and 10 on Pro at 5 minutes.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is Uptime.com SOC 2 compliant?
&lt;/h3&gt;

&lt;p&gt;Yes. Uptime.com has completed the AICPA Service Organization Control 2 Type II audit, confirmed on their Trust Center. They also support GDPR with data processing terms and cross-border transfer safeguards, and offer SAML 2.0 SSO with Okta, OneLogin, AWS, and Azure AD. If your procurement process requires SOC 2 documentation today, Uptime.com is the right answer. Velprove is not currently SOC 2 attested.&lt;/p&gt;

&lt;h3&gt;
  
  
  When should I pick Velprove over Uptime.com?
&lt;/h3&gt;

&lt;p&gt;Pick Velprove if you want a free plan with no time limit, a browser login monitor at no cost, multi-step API monitoring at no cost, or a single predictable plan price as you grow. Velprove offers Free, Starter at $19 per month, and Pro at $49 per month, with no metered per-monitor charges. Pick Uptime.com if you need SOC 2 Type II compliance, a Terraform provider, RUM bundled with synthetic monitoring, probe coverage in 80+ countries, or already have an enterprise procurement budget. Both can be the right answer; they are calibrated for different team sizes.&lt;/p&gt;

&lt;p&gt;Uptime.com is a serious mid-market monitoring company with real enterprise compliance and a real customer wall. If your team needs what they have, the $9 floor is not the right number to anchor on and you would land higher up the calculator. If your team wants to start for free, run a browser login monitor on day one, and grow into paid plans on a predictable price as the surface expands, Velprove is the simpler swap. &lt;a href="https://velprove.com/signup" rel="noopener noreferrer"&gt;Start a free Velprove account.&lt;/a&gt; No credit card required.&lt;/p&gt;

</description>
      <category>monitoring</category>
      <category>webdev</category>
      <category>devops</category>
      <category>uptime</category>
    </item>
    <item>
      <title>Oh Dear Alternative: Velprove vs Oh Dear (2026 Comparison)</title>
      <dc:creator>velprove</dc:creator>
      <pubDate>Tue, 05 May 2026 14:00:04 +0000</pubDate>
      <link>https://dev.to/velprove/oh-dear-alternative-velprove-vs-oh-dear-2026-comparison-1n5e</link>
      <guid>https://dev.to/velprove/oh-dear-alternative-velprove-vs-oh-dear-2026-comparison-1n5e</guid>
      <description>&lt;p&gt;The honest take: &lt;a href="https://ohdear.app" rel="noopener noreferrer"&gt;Oh Dear&lt;/a&gt; is a polished Belgian uptime monitor with 99% positive Capterra sentiment and flat all-features pricing starting at EUR 15/mo. Velprove is the better Oh Dear alternative when you need a free browser login monitor, configurable 30-second intervals, or per-monitor instead of per-site pricing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Oh Dear vs Velprove at a glance
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;What you get&lt;/th&gt;
&lt;th&gt;Velprove Free&lt;/th&gt;
&lt;th&gt;Oh Dear (EUR 15/mo, 5 sites)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Price&lt;/td&gt;
&lt;td&gt;$0&lt;/td&gt;
&lt;td&gt;~$16 USD&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Browser login monitor&lt;/td&gt;
&lt;td&gt;1 included, 15-min interval&lt;/td&gt;
&lt;td&gt;Not a named product. They offer "AI Monitoring" natural-language assertions instead&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;HTTP check interval&lt;/td&gt;
&lt;td&gt;5 min on Free, 1 min on Starter, 30s on Pro&lt;/td&gt;
&lt;td&gt;1 min, fixed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Monitor regions&lt;/td&gt;
&lt;td&gt;5 (North America, Europe, United Kingdom, Asia, Oceania)&lt;/td&gt;
&lt;td&gt;14 active locations&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DNS change monitoring&lt;/td&gt;
&lt;td&gt;Not in our current plans&lt;/td&gt;
&lt;td&gt;Yes, native&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Free trial / free tier&lt;/td&gt;
&lt;td&gt;Free tier, no credit card&lt;/td&gt;
&lt;td&gt;10-day trial, no credit card. No free tier&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Oh Dear pricing vs Velprove pricing
&lt;/h2&gt;

&lt;p&gt;Oh Dear charges per number of sites and runs in EUR. The pricing page uses a slider, so the discrete tiers below come from Capterra's published tier list. We confirmed the entry tier (5 sites for EUR 15/mo) on 2026-05-04. USD figures are conversions at roughly 1.08 USD per EUR, not Oh Dear's own quotes.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Sites&lt;/th&gt;
&lt;th&gt;Oh Dear monthly (EUR)&lt;/th&gt;
&lt;th&gt;Approx USD&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;EUR 15&lt;/td&gt;
&lt;td&gt;~$16&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;EUR 25&lt;/td&gt;
&lt;td&gt;~$27&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;25&lt;/td&gt;
&lt;td&gt;EUR 50&lt;/td&gt;
&lt;td&gt;~$54&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;50&lt;/td&gt;
&lt;td&gt;EUR 80&lt;/td&gt;
&lt;td&gt;~$86&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;100&lt;/td&gt;
&lt;td&gt;EUR 140&lt;/td&gt;
&lt;td&gt;~$151&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;200&lt;/td&gt;
&lt;td&gt;EUR 220&lt;/td&gt;
&lt;td&gt;~$237&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Every Oh Dear plan includes every feature. Annual billing offers a "modest" discount per their public material, though the exact percentage is not on the pricing page as of this writing.&lt;/p&gt;

&lt;p&gt;Velprove pricing:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Plan&lt;/th&gt;
&lt;th&gt;Price&lt;/th&gt;
&lt;th&gt;Monitors&lt;/th&gt;
&lt;th&gt;HTTP interval&lt;/th&gt;
&lt;th&gt;Browser login monitors&lt;/th&gt;
&lt;th&gt;Regions&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Free&lt;/td&gt;
&lt;td&gt;$0&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;5 min&lt;/td&gt;
&lt;td&gt;1 (15-min interval)&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Starter&lt;/td&gt;
&lt;td&gt;$19/mo&lt;/td&gt;
&lt;td&gt;25&lt;/td&gt;
&lt;td&gt;1 min&lt;/td&gt;
&lt;td&gt;3 (10-min interval)&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pro&lt;/td&gt;
&lt;td&gt;$49/mo&lt;/td&gt;
&lt;td&gt;100&lt;/td&gt;
&lt;td&gt;30 sec&lt;/td&gt;
&lt;td&gt;10 (5-min interval)&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Two pricing differences worth naming.&lt;/p&gt;

&lt;p&gt;First, Velprove has a free plan. Oh Dear does not. Their words: a paid subscription is the entry point, with a 10-day no-card trial and a 30-day money-back guarantee. That's customer-friendly, but it isn't free.&lt;/p&gt;

&lt;p&gt;Second, Oh Dear's per-site model rewards customers with one site per domain and penalizes customers with many URLs on the same domain. If you monitor 30 API URLs across two domains, Oh Dear counts that as roughly 30 sites and lands you on the EUR 80/mo tier. Velprove counts monitors, not domains: those 30 URLs fit in Pro at $49, and a 25-URL subset would fit in Starter at $19. The web-alert.io 2026 review of Oh Dear flags this exact friction point.&lt;/p&gt;
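&lt;p&gt;The per-site vs per-monitor math above can be run as a toy calculator over the tier tables in this post. Illustrative only; prices are as published at the time of writing and should be checked against both vendors' live pricing pages.&lt;/p&gt;

```python
# Toy comparison of per-site vs per-monitor pricing, using the tiers
# quoted above. Illustrative only; check current vendor pricing.
OH_DEAR_TIERS = [(5, 15), (10, 25), (25, 50),
                 (50, 80), (100, 140), (200, 220)]  # (sites, EUR/mo)
VELPROVE_PLANS = [(10, 0), (25, 19), (100, 49)]     # (monitors, USD/mo)

def smallest_tier(tiers, needed):
    """Price of the first tier whose capacity covers the needed count."""
    for capacity, price in tiers:
        if needed <= capacity:
            return price
    return None

urls = 30  # 30 monitored URLs across two domains
print(smallest_tier(OH_DEAR_TIERS, urls))   # 80  (EUR/mo, per-site model)
print(smallest_tier(VELPROVE_PLANS, urls))  # 49  (USD/mo, per-monitor model)
```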

&lt;h2&gt;
  
  
  Where Oh Dear is genuinely strong
&lt;/h2&gt;

&lt;p&gt;Oh Dear is not a weak product, and this section is meant as honest credit, not throat-clearing.&lt;/p&gt;

&lt;h3&gt;
  
  
  DNS change monitoring is native
&lt;/h3&gt;

&lt;p&gt;"Receive a notification whenever your DNS records are modified, intentionally or maliciously." That's their copy and it's a real feature. If detecting unauthorized DNS edits is a top-three requirement for you, Oh Dear has a built-in answer that Velprove does not currently match.&lt;/p&gt;

&lt;h3&gt;
  
  
  Mixed-content scanning and broken links
&lt;/h3&gt;

&lt;p&gt;Oh Dear crawls your site looking for HTTP assets on HTTPS pages and broken internal links. It's bundled with their site-crawl product. Useful if you run content-heavy sites with editors making frequent changes.&lt;/p&gt;

&lt;h3&gt;
  
  
  16 monitor regions
&lt;/h3&gt;

&lt;p&gt;From their docs: Cape Town, Bangalore, Seoul, Singapore, Tokyo, Sydney, Toronto, Frankfurt, London, Paris, Stockholm, Sao Paulo, New York, Dallas, Los Angeles, San Francisco. Velprove runs from 5 regions today (North America, Europe, United Kingdom, Asia, Oceania). If you specifically need African or South American check origin points, Oh Dear has them.&lt;/p&gt;

&lt;h3&gt;
  
  
  All features on every plan
&lt;/h3&gt;

&lt;p&gt;Oh Dear's pricing page says "Every plan has access to all our features." That includes SSO across all tiers (generally available since 2026-04-27), unlimited team members, status pages, and DNS blocklist monitoring. If you hate feature-gated tiers on principle, the model is clean.&lt;/p&gt;

&lt;h3&gt;
  
  
  Customer reception
&lt;/h3&gt;

&lt;p&gt;Capterra reviews include "Easy to implement, and I had great feedback from their helpdesk. They even added a new feature based on my feedback" and "Great Tools beside the default monitoring service. Customer support is good. Nice User Interface." Their public sentiment is strong and earned.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where Velprove is the better fit
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Free browser login monitor on day zero
&lt;/h3&gt;

&lt;p&gt;This is the headline. We include 1 browser login monitor on the free plan, running every 15 minutes. Oh Dear positions their browser-flow story as "AI Monitoring": "Use natural language to verify anything on your pages. Check if login forms work, verify content exists, or validate complex page states." That's a different product. More on the distinction in the next section.&lt;/p&gt;

&lt;h3&gt;
  
  
  Real configurable intervals
&lt;/h3&gt;

&lt;p&gt;Velprove Free runs HTTP checks every 5 minutes, Starter every 1 minute, Pro every 30 seconds. Oh Dear runs uptime checks at a fixed 1-minute cadence with no published way to slow it down for non-production sites or speed it up below 60 seconds. A Capterra reviewer asked for "the ability to change uptime check frequency, especially for non-production sites" as an open wish.&lt;/p&gt;

&lt;h3&gt;
  
  
  Per-monitor pricing, not per-site
&lt;/h3&gt;

&lt;p&gt;If your 25 monitors live across 3 domains, Velprove Starter at $19 covers them. The closest Oh Dear tier (25 sites, EUR 50) is roughly 3x the price for the same monitor count.&lt;/p&gt;

&lt;h3&gt;
  
  
  Failure screenshots when something breaks
&lt;/h3&gt;

&lt;p&gt;When a browser login monitor fails, Velprove captures a screenshot of the page at the moment of failure and retains it: Free keeps 1 screenshot per failing monitor, Starter keeps 5, Pro keeps 30. You see exactly what the user saw at the broken step: a login form that rendered without the submit button, a CAPTCHA that appeared mid-flow, an error toast you didn't know existed. Oh Dear's mixed-content scanner finds HTTP assets on HTTPS pages, which is a different problem class. The screenshot artifact is what you actually open at 2 a.m. when an alert fires.&lt;/p&gt;

&lt;h3&gt;
  
  
  A free tier exists at all
&lt;/h3&gt;

&lt;p&gt;For a developer kicking the tires, a freelancer with 4 client sites, or a side-project owner, "free with one browser login monitor" is qualitatively different from "10-day trial then card on file." The audiences sort themselves.&lt;/p&gt;

&lt;h2&gt;
  
  
  What a browser login monitor actually does
&lt;/h2&gt;

&lt;p&gt;A browser login monitor opens a real headless browser (Chromium-class), navigates to your login page, fills the form, clicks the button, waits for the post-login state, and asserts that the dashboard or post-login page rendered correctly. It catches the things a 200-OK HTTP probe never sees: a JavaScript bundle that fails to hydrate, a session cookie that fails to set on login, a CAPTCHA that started blocking real users, an OAuth callback that times out, a billing-required interstitial that hijacked the redirect.&lt;/p&gt;
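&lt;p&gt;As a sketch, the per-check work described above looks roughly like this Playwright script. Playwright itself, the selectors, and the &lt;code&gt;data-testid&lt;/code&gt; marker are all illustrative assumptions here, not Velprove's actual runner:&lt;/p&gt;

```typescript
// What one browser-login check executes, sketched with Playwright
// (an assumption; selectors and the dashboard marker are placeholders).
async function checkLogin(url: string, user: string, pass: string) {
  // @ts-ignore: playwright is an optional dependency of this sketch
  const { chromium } = await import("playwright");
  const browser = await chromium.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: "networkidle" });
    await page.fill("#email", user); // fill the login form
    await page.fill("#password", pass);
    await page.click("button[type=submit]");
    // Deterministic post-login assertion: fail unless the dashboard renders.
    await page.waitForSelector("[data-testid=dashboard]", { timeout: 10000 });
    return true;
  } catch {
    return false; // a real monitor would capture the failure screenshot here
  } finally {
    await browser.close();
  }
}
```

&lt;p&gt;The &lt;code&gt;waitForSelector&lt;/code&gt; on post-login content is the part a status-code probe cannot replicate: a hydration failure, a cookie that never sets, or a CAPTCHA all fail that wait even when every response was a 200.&lt;/p&gt;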

&lt;p&gt;It matters because login is where most SaaS revenue actually lives. If signed-in users can't reach their dashboard, your uptime monitor's green light is misleading.&lt;/p&gt;

&lt;p&gt;Velprove's free plan includes 1 browser login monitor at a 15-minute interval. Starter raises that to 3 monitors at 10-minute intervals. Pro takes it to 10 monitors at 5-minute intervals. The flow is configured as a multi-step script with real assertions on post-login state.&lt;/p&gt;

&lt;p&gt;Oh Dear's adjacent feature is called "AI Monitoring" and is described as "Use natural language to verify anything on your pages. Check if login forms work, verify content exists, or validate complex page states." That is a different category. We could not find public evidence on ohdear.app that AI Monitoring runs a real headless browser session for sustained logged-in flows the way a browser login monitor does. Natural-language assertion is useful, but it isn't the same shape as a configured login script with deterministic step assertions. Pick the right tool for the question you're asking.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to switch from Oh Dear to Velprove
&lt;/h2&gt;

&lt;p&gt;Moving from Oh Dear to Velprove is straightforward for the common cases. A few things to think about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Export your monitor list.&lt;/strong&gt; Oh Dear has a REST API with recent improvements for AI-agent consumption. Pull the list of sites and their settings to use as a checklist. Velprove doesn't have a bulk import today, so plan for the time to recreate each monitor in the new-monitor wizard (roughly a minute per HTTP monitor, longer for browser login or multi-step flows).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Re-create alert routing.&lt;/strong&gt; Both products handle Slack, Discord, Microsoft Teams, and webhooks. Velprove adds PagerDuty on Pro.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Plan for the per-monitor count.&lt;/strong&gt; If you were paying for the 50-site or 100-site Oh Dear tier mostly because of URLs on one domain, you may consolidate into Velprove Pro at $49 instead.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Check region coverage.&lt;/strong&gt; If you actively rely on Cape Town, Sao Paulo, Toronto, or Stockholm origin points, confirm that Velprove's 5-region map (North America, Europe, United Kingdom, Asia, Oceania) is acceptable before switching.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Status pages.&lt;/strong&gt; Velprove Free includes 1 Velprove-branded status page, Starter includes 1 unbranded, Pro includes 3 with custom domain support.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  When Oh Dear is the right choice
&lt;/h2&gt;

&lt;p&gt;There are real cases where Oh Dear wins. Don't switch out of brand loyalty to a comparison page.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;You actively need DNS change monitoring&lt;/strong&gt; and want it from the same tool that does uptime. Oh Dear has it. We don't, today.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;You want flat all-features pricing&lt;/strong&gt; on principle. Oh Dear's "every plan has every feature" is a clean model.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;You're already happy and the price works.&lt;/strong&gt; The Capterra sentiment is 99% positive for a reason. Switching to save $20/mo is rarely worth the migration tax if the current product solves your problem.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;You need African or South American origin points.&lt;/strong&gt; Cape Town and Sao Paulo are not in our 5-region map.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;You want mixed-content scanning bundled in.&lt;/strong&gt; Oh Dear's site-crawl product covers it natively.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Does Oh Dear have a free plan?
&lt;/h3&gt;

&lt;p&gt;No. Oh Dear's entry point is a paid plan starting at EUR 15/mo for 5 sites, with a 10-day no-card trial and a 30-day money-back guarantee. Velprove does have a free plan, which is why this comparison exists: 10 monitors at a 5-minute interval, 1 browser login monitor at a 15-minute interval, multi-step API monitors with up to 3 steps each, email alerts, and monitoring from 5 global regions. No credit card.&lt;/p&gt;

&lt;h3&gt;
  
  
  How is Velprove's browser login monitor different from Oh Dear's AI Monitoring?
&lt;/h3&gt;

&lt;p&gt;Velprove runs a configured multi-step browser flow in a real headless browser, with deterministic assertions at each step (form fill, click, wait for selector, verify post-login state). Oh Dear's AI Monitoring is a natural-language assertion layer pitched as "Check if login forms work, verify content exists, or validate complex page states." We could not find public evidence that AI Monitoring runs a sustained headless browser session for logged-in flows. They solve overlapping but different problems.&lt;/p&gt;

&lt;h3&gt;
  
  
  Does Velprove monitor DNS changes like Oh Dear?
&lt;/h3&gt;

&lt;p&gt;Oh Dear wins on DNS today. They have native DNS-change notifications and DNS blocklist monitoring on every plan. Velprove's current plans focus on HTTP, browser login, and multi-step API monitors. If DNS change detection is a primary requirement, Oh Dear is the better fit.&lt;/p&gt;

&lt;h3&gt;
  
  
  When should I pick Velprove vs Oh Dear?
&lt;/h3&gt;

&lt;p&gt;Pick Velprove if you want a free browser login monitor, configurable check intervals down to 30 seconds, per-monitor pricing instead of per-site math, or a real free tier to start on. Pick Oh Dear if you need native DNS change monitoring, prefer flat all-features pricing, want check origins in Africa or South America, or are already happy and not looking to move.&lt;/p&gt;

&lt;h3&gt;
  
  
  How hard is it to switch from Oh Dear to Velprove?
&lt;/h3&gt;

&lt;p&gt;For accounts with up to about 25 monitors, an afternoon. There's no bulk import today, so plan for roughly a minute per HTTP monitor in the new-monitor wizard and longer for browser login or multi-step flows that need step-level configuration. Pull your monitor list from Oh Dear's REST API to use as a checklist, re-create each one in Velprove, point your alert integrations (Slack, Discord, Teams, webhooks, PagerDuty on Pro) at the new account, and run both in parallel for a week before turning Oh Dear off. The migration tax is mostly in re-creating any custom assertions and re-pointing status pages.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try Velprove free
&lt;/h2&gt;

&lt;p&gt;Start free with 10 monitors, a real browser login monitor, and 5 monitor regions. No credit card. If you outgrow the free plan, Starter at $19/mo and Pro at $49/mo add 1-minute and 30-second intervals plus more browser login monitors.&lt;/p&gt;

&lt;p&gt;For more comparisons, see our &lt;a href="https://velprove.com/blog/pingdom-alternative" rel="noopener noreferrer"&gt;Pingdom alternative&lt;/a&gt;, &lt;a href="https://velprove.com/blog/better-stack-alternative" rel="noopener noreferrer"&gt;Better Stack alternative&lt;/a&gt;, and &lt;a href="https://velprove.com/blog/checkly-alternative" rel="noopener noreferrer"&gt;Checkly alternative&lt;/a&gt; write-ups.&lt;/p&gt;

</description>
      <category>monitoring</category>
      <category>webdev</category>
      <category>devops</category>
      <category>uptime</category>
    </item>
    <item>
      <title>How to Monitor a Next.js App in Production (2026)</title>
      <dc:creator>velprove</dc:creator>
      <pubDate>Tue, 05 May 2026 14:00:02 +0000</pubDate>
      <link>https://dev.to/velprove/how-to-monitor-a-nextjs-app-in-production-2026-4i88</link>
      <guid>https://dev.to/velprove/how-to-monitor-a-nextjs-app-in-production-2026-4i88</guid>
      <description>&lt;p&gt;&lt;strong&gt;Quick take:&lt;/strong&gt; A standard 200-OK uptime check on a Next.js app misses three real failure modes. Vercel ISR can serve stale content for an unbounded window when revalidation fails. Cold starts on archived functions add latency that can blow past your monitor timeout. Auth-protected routes can return 200 while the actual page is empty, redirected, or rendering an error boundary. The fix is layered: an HTTP monitor with a freshness assertion, a multi-step API monitor for token-auth API routes, and a browser login monitor for &lt;code&gt;/dashboard&lt;/code&gt;. Velprove's free plan covers all three. No credit card required. &lt;a href="https://velprove.com/signup" rel="noopener noreferrer"&gt;Start for free&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why a 200 OK on your Next.js app is not enough
&lt;/h2&gt;

&lt;p&gt;Next.js sits on top of three things that a status-code monitor cannot see through: a caching layer that keeps serving old responses when revalidation fails, a serverless runtime that occasionally adds seconds to a response without breaking it, and an App Router error model that can wrap a thrown Server Component error in a styled fallback page. Each of those returns a 200 to the curl request your monitor makes every 5 minutes. None of them means the application is healthy.&lt;/p&gt;

&lt;p&gt;What does an HTTP monitor miss on a Next.js app? Three failure classes the framework introduces and a status-code check cannot see: ISR revalidation failures, cold-start latency on Vercel and Railway, and auth-protected routes that lie about their state. The same pattern that makes &lt;a href="https://velprove.com/blog/why-uptime-monitors-miss-outages" rel="noopener noreferrer"&gt;uptime monitors miss real outages&lt;/a&gt; on any framework gets worse on Next.js because the framework has more layers between the request and the response.&lt;/p&gt;

&lt;h2&gt;
  
  
  Monitoring Next.js ISR: when 200 OK serves stale content
&lt;/h2&gt;

&lt;p&gt;Incremental Static Regeneration is the most monitor-blind feature in modern Next.js. The whole point of ISR is to keep serving the cached version while a revalidation runs in the background. When that background revalidation fails, the cached version keeps being served. Your reader sees a page from last Tuesday. Your monitor sees a 200.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why ISR failures are invisible to status-code monitors
&lt;/h3&gt;

&lt;p&gt;Per Vercel's &lt;a href="https://vercel.com/docs/incremental-static-regeneration" rel="noopener noreferrer"&gt;ISR documentation&lt;/a&gt;, when a revalidation fails the platform "preserves the stale content and sets a 30-second TTL" before retrying. Failure is defined broadly: network timeouts, function execution errors, or any HTTP status outside the small allow-list of 200, 301, 302, 307, 308, 404, and 410. Every one of those failures is invisible at the HTTP layer of the next request, because the next request is still hitting the cached body. The retry loop runs every 30 seconds, indefinitely, until either the data source recovers or someone notices that the page is stale.&lt;/p&gt;

&lt;h3&gt;
  
  
  What to assert against
&lt;/h3&gt;

&lt;p&gt;The monitor needs a signal that changes when the page is fresh. Three options work in practice. First, a literal date string the route prints in the body, asserted with a body-contains check that you update on a cadence you control. Second, a build ID or deploy SHA the route prints in the body, asserted with body-contains and updated whenever you deploy. Third, the &lt;code&gt;x-nextjs-cache&lt;/code&gt; response header, which Next.js documents as taking the values &lt;code&gt;HIT&lt;/code&gt;, &lt;code&gt;STALE&lt;/code&gt;, &lt;code&gt;MISS&lt;/code&gt;, or &lt;code&gt;REVALIDATED&lt;/code&gt;, asserted with a header-contains check.&lt;/p&gt;

&lt;p&gt;The most portable pattern is a build ID or deploy SHA in the body. The &lt;code&gt;x-nextjs-cache&lt;/code&gt; header is reliable when you read the response directly from the Next.js server, but Vercel and other CDN layers in front of the app can strip or rewrite it before the response reaches your monitor. A value printed in the body travels everywhere the body travels.&lt;/p&gt;
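&lt;p&gt;A minimal sketch of that pattern, assuming a Vercel deployment where &lt;code&gt;VERCEL_GIT_COMMIT_SHA&lt;/code&gt; is injected at build time. The route path and field names are illustrative:&lt;/p&gt;

```typescript
// Hypothetical app/api/version/route.ts exposing a freshness signal for a
// body-contains assertion. VERCEL_GIT_COMMIT_SHA is assumed to exist at
// build time on Vercel; the optional chaining keeps the sketch portable.
export const revalidate = 300; // ISR: regenerate this response every 5 minutes

export async function GET() {
  const sha = (globalThis as any).process?.env?.VERCEL_GIT_COMMIT_SHA ?? "dev";
  return Response.json({
    sha,
    // Frozen at render time under ISR: stops advancing if revalidation stalls.
    generatedAt: new Date().toISOString(),
  });
}
```

&lt;p&gt;Point a body-contains assertion at the current SHA and update it on deploy; if revalidation stalls, &lt;code&gt;generatedAt&lt;/code&gt; stops moving even though the status stays 200.&lt;/p&gt;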

&lt;h3&gt;
  
  
  What this looks like in Velprove
&lt;/h3&gt;

&lt;p&gt;Configure an API monitor against a public ISR page or a small JSON route, then add a JSON path or body-contains assertion against a freshness signal that changes when the page is fresh, like a date string or a build SHA the route prints in the response. The full configuration walk-through, including how to &lt;a href="https://velprove.com/blog/monitor-rest-api-health-endpoint" rel="noopener noreferrer"&gt;monitor your /api/health route with JSON validation&lt;/a&gt;, is the right next read. The Next.js-specific piece is just deciding what value to put in the body.&lt;/p&gt;

&lt;h2&gt;
  
  
  Vercel cold starts and what to monitor
&lt;/h2&gt;

&lt;p&gt;Cold starts on Vercel are not a bug; they are a property of the runtime. The monitor needs to know they exist so its timeout does not turn an occasional cold boot into a fake outage.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cold starts on Node vs. Edge runtime
&lt;/h3&gt;

&lt;p&gt;Vercel's runtime documentation is direct about it: serverless applications "will always have the notion of cold starts." Fluid Compute, the default for new projects since April 23, 2025, reduces the likelihood of cold starts through optimized concurrency, but the docs note "it can still happen such as during periods of low traffic." The most concrete latency claim Vercel publishes is for archived functions, which are unarchived on first invocation and can take "at least 1 second longer than usual" on that boot. The Edge runtime is the architectural escape hatch: it is built on V8 isolates that "don't require a container or virtual machine," which removes the microVM startup cost. Edge has its own constraints, including no support for Cache Components, so the choice is route-by-route, not app-wide.&lt;/p&gt;

&lt;h3&gt;
  
  
  Preview deployments are not what you want to monitor
&lt;/h3&gt;

&lt;p&gt;Preview URLs change every deploy, and the underlying functions are archived after 48 hours of inactivity, compared to 2 weeks for production functions. That archival window is short enough that a monitor pointed at a branch preview URL will hit a cold-start penalty on most checks during a quiet weekend. The result is a monitor that looks unhealthy whenever the team is not pushing, which is exactly when you want a clear signal. Monitor your production domain. Use preview URLs for human eyeballs and CI smoke tests, not for synthetic uptime checks. If you want to verify a preview deploy before promotion, run an ad-hoc check against the commit-specific URL and discard it after the preview is merged.&lt;/p&gt;

&lt;h2&gt;
  
  
  Monitoring auth-protected routes (the /dashboard problem)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Why your /dashboard route can lie to an HTTP monitor
&lt;/h3&gt;

&lt;p&gt;There are three ways a Next.js &lt;code&gt;/dashboard&lt;/code&gt; route can return 200 while being broken, and which one you get depends on your &lt;code&gt;error.tsx&lt;/code&gt; boundary and your proxy configuration. A thrown Server Component error can render a styled error page inside a 200 response. A missing session can issue a redirect to the login page, which most monitors follow and report as a 200 on the login screen. A working build can render an empty shell because the Server Component fetched zero rows. None of these are caught by checking the status code.&lt;/p&gt;
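&lt;p&gt;A hedged sketch of the check a monitor needs in order to tell those three cases apart. The helper names and the &lt;code&gt;data-testid&lt;/code&gt; marker are placeholders, not a Velprove API:&lt;/p&gt;

```typescript
// Placeholder marker and helper names; not a Velprove API.
function classifyDashboard(status: number, html: string) {
  if (status === 307 || status === 302) return "redirect-to-login"; // no session
  if (status !== 200) return "hard-error";
  if (html.includes('data-testid="dashboard"') === false) {
    return "empty-or-error-boundary"; // a 200 with no real dashboard content
  }
  return "healthy";
}

// Monitor side: refuse to follow redirects so the bounce to /login stays
// visible, then require actual post-login content rather than just a 200.
async function checkDashboard(url: string) {
  const res = await fetch(url, { redirect: "manual" });
  const html = res.status === 200 ? await res.text() : "";
  return classifyDashboard(res.status, html);
}
```

&lt;p&gt;The &lt;code&gt;redirect: "manual"&lt;/code&gt; option is the key detail: a monitor that follows redirects reports the login screen's 200 as success.&lt;/p&gt;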

&lt;h3&gt;
  
  
  Browser login monitors for Next.js auth
&lt;/h3&gt;

&lt;p&gt;A browser login monitor signs in the way a user would: it opens your login page, types credentials, clicks submit, and asserts that a piece of post-login content actually rendered. That is the only check that distinguishes a working dashboard from a redirect-to-login or an empty error boundary. Use a test account, not a real admin account, and scope its permissions to read-only. The full setup, including selectors and assertions, is in the &lt;a href="https://velprove.com/blog/monitor-saas-login-page" rel="noopener noreferrer"&gt;browser login monitor walkthrough for SaaS auth&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Multi-step API monitors for token-auth API routes
&lt;/h3&gt;

&lt;p&gt;For App Router API routes that require a bearer token, &lt;a href="https://velprove.com/blog/multi-step-api-monitoring-guide" rel="noopener noreferrer"&gt;chain a login → token → protected-call monitor&lt;/a&gt; that captures the token from the first response and replays it against the protected route.&lt;/p&gt;
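&lt;p&gt;In code, that chain is roughly the following. The endpoints and field names are hypothetical; your auth routes will differ:&lt;/p&gt;

```typescript
// Hypothetical endpoints and field names; substitute your real auth routes.
async function checkProtectedApi(baseUrl: string, email: string, password: string) {
  // Step 1: log in and capture the bearer token from the response body.
  const login = await fetch(baseUrl + "/api/auth/login", {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ email, password }),
  });
  if (login.status !== 200) return false;
  const { token } = await login.json();
  // Step 2: replay the captured token against the protected route.
  const res = await fetch(baseUrl + "/api/projects", {
    headers: { authorization: "Bearer " + token },
  });
  // Step 3: assert on the protected call, not just the login.
  return res.status === 200;
}
```

&lt;p&gt;A single GET with a hard-coded token cannot prove this flow end to end, because it skips the token issuance that actually breaks.&lt;/p&gt;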

&lt;h2&gt;
  
  
  Monitoring self-hosted Next.js and Railway deployments
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Self-hosted Next.js
&lt;/h3&gt;

&lt;p&gt;Running Next.js on your own Node.js process removes the platform layer but keeps the framework layer. The &lt;code&gt;error.tsx&lt;/code&gt; boundary still wraps thrown Server Component errors. There is no built-in liveness route, so &lt;code&gt;app/api/health/route.ts&lt;/code&gt; is your responsibility to write, populate, and keep honest. Auth gating typically lives in &lt;code&gt;proxy.ts&lt;/code&gt; (formerly &lt;code&gt;middleware.ts&lt;/code&gt; before the Next.js 16 rename), which runs on the Node.js runtime by default. Treat the proxy file as a routing concern that can fail like any other route, and make sure your monitor pattern catches a proxy that throws an error instead of redirecting cleanly.&lt;/p&gt;
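&lt;p&gt;A minimal sketch of such a route, with a stubbed database probe standing in for your real subsystem checks:&lt;/p&gt;

```typescript
// Hypothetical app/api/health/route.ts. checkDatabase is a stub standing in
// for a real probe (e.g. a "select 1" against your primary database).
async function checkDatabase() {
  try {
    return true; // replace with a real query in your app
  } catch {
    return false;
  }
}

export async function GET() {
  const db = await checkDatabase();
  const status = db ? "ok" : "degraded";
  // 503 on degradation lets a plain status-code monitor react too, while the
  // JSON body supports a $.status assertion and names the failing subsystem.
  return Response.json({ status, db }, { status: db ? 200 : 503 });
}
```

&lt;p&gt;Keeping the route honest means every flag in the body maps to a real probe; a health route that always returns &lt;code&gt;"ok"&lt;/code&gt; is just a slower 200 check.&lt;/p&gt;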

&lt;h3&gt;
  
  
  Railway sleep and cold-boot
&lt;/h3&gt;

&lt;p&gt;Railway services with &lt;a href="https://docs.railway.com/reference/app-sleeping" rel="noopener noreferrer"&gt;Serverless enabled&lt;/a&gt; enter sleep mode when "no packets are sent from the service for over 10 minutes." The first request to a slept service wakes it, with a small delay that Railway describes as a "cold boot time." Two non-obvious gotchas matter for monitoring. First, the inactivity trigger is &lt;em&gt;outbound&lt;/em&gt; packets, so a Next.js app that receives inbound traffic but does no outbound polling can still sleep. Second, your uptime monitor itself is inbound traffic, so a frequent monitor will keep the service awake and mask the cold-boot behavior your real users hit at midnight. Monitor at a realistic interval and accept the cold boot as part of the SLO.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to monitor on every Next.js app, by route type
&lt;/h2&gt;

&lt;p&gt;Different parts of a Next.js app fail in different ways, and they need different monitor types. Treating the whole app as one URL behind one HTTP monitor is the cheapest way to miss the failure modes covered above. The table below maps the five common route types to the monitor type that actually catches their failure modes.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Route type&lt;/th&gt;
&lt;th&gt;Monitor type&lt;/th&gt;
&lt;th&gt;What to assert&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Static / ISR public pages&lt;/td&gt;
&lt;td&gt;HTTP body assertion&lt;/td&gt;
&lt;td&gt;Freshness signal (date string in body, or build id/deploy SHA in body)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;app/api/health/route.ts&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;API monitor with JSON validation&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;$.status equals "ok"&lt;/code&gt; + response time threshold&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;app/api/&amp;lt;resource&amp;gt;/route.ts&lt;/code&gt; (auth required)&lt;/td&gt;
&lt;td&gt;Multi-step API monitor&lt;/td&gt;
&lt;td&gt;Login, call, assert&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;/dashboard&lt;/code&gt; and other Server Component pages behind auth&lt;/td&gt;
&lt;td&gt;Browser login monitor&lt;/td&gt;
&lt;td&gt;Sign in and assert post-login content&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Webhook receivers (&lt;code&gt;app/api/webhooks/.../route.ts&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;HTTP monitor + dead-letter alert&lt;/td&gt;
&lt;td&gt;Status + log digest&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Static and ISR pages are the ones most likely to silently rot, so they need the freshness assertion specifically. The dedicated &lt;code&gt;/api/health&lt;/code&gt; route should return JSON with a status field plus the few subsystem flags you actually care about, monitored with a JSON path assertion and a response-time threshold tight enough to catch a slow database. Token-protected API routes need the chained multi-step pattern because a single GET cannot prove the auth flow still works end to end. Server Component pages behind auth need the browser monitor because no HTTP-level check can distinguish a real dashboard from a styled error page. Webhook receivers usually need a status-code monitor plus a separate alert on dead-letter queue depth, because the receiver can return 200 while quietly dropping payloads downstream.&lt;/p&gt;

&lt;h2&gt;
  
  
  Setting up a Next.js monitor in Velprove (free)
&lt;/h2&gt;

&lt;p&gt;The free plan covers an HTTP monitor with body assertions, an API monitor with JSON path assertions, and one browser login monitor running every 15 minutes (or slower). That is enough to cover the three failure modes above for one production Next.js app. Every plan, including the free one, runs checks from all five regions: North America, Europe, UK, Asia, and Oceania.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Sign up for a free Velprove account.&lt;/strong&gt; No credit card is required, and the free plan includes 10 monitors, 1 browser login monitor, and email alerts.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Add an HTTP monitor pointing at your Next.js production domain.&lt;/strong&gt; Use the production URL, not a Vercel preview URL, so cold starts on archived preview functions do not pollute the signal.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Add a body or JSON path assertion on a freshness signal.&lt;/strong&gt; A date string or build SHA printed in the response body works on any Next.js setup. Match the value with a body-contains assertion and update it when you deploy.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Add a browser login monitor for your &lt;code&gt;/dashboard&lt;/code&gt; route.&lt;/strong&gt; Create a low-privilege test account first and use those credentials for the monitor, never a real admin login.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Configure your alert channel.&lt;/strong&gt; The free plan sends email alerts. Slack, webhook, Discord, and Microsoft Teams are available on the Starter plan at $19 per month, and PagerDuty is available on Pro at $49 per month. For non-email channels, paste the webhook URL or routing key into Settings → Notifications first, then pick that channel on the monitor.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Velprove runs on Next.js. We run an API monitor with body validation on &lt;code&gt;/api/health&lt;/code&gt;, HTTP body-validation checks on our marketing pages and dashboard route, and a browser login monitor that signs in and asserts it landed on &lt;code&gt;/dashboard&lt;/code&gt;, the same layered setup the post recommends, across all five regions. If you want the broader context outside the framework, the &lt;a href="https://velprove.com/blog/uptime-monitoring-saas-founders" rel="noopener noreferrer"&gt;solo founder's broader monitoring playbook&lt;/a&gt; covers what to monitor across the rest of the stack. Otherwise, &lt;a href="https://velprove.com/signup" rel="noopener noreferrer"&gt;start for free&lt;/a&gt; and have the three monitors above running in about 10 minutes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Frequently Asked Questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  How do you monitor a Next.js app in production?
&lt;/h3&gt;

&lt;p&gt;Use a layered setup that matches the three failure modes the framework introduces. An HTTP monitor with a body assertion on a freshness signal catches ISR revalidation failures that return 200 with stale content. An API monitor with JSON path validation on &lt;code&gt;app/api/health/route.ts&lt;/code&gt; catches subsystem failures. A browser login monitor on &lt;code&gt;/dashboard&lt;/code&gt; catches auth-protected route errors that an HTTP check cannot see. The free plan on Velprove covers all three.&lt;/p&gt;

&lt;h3&gt;
  
  
  Does Next.js have a built-in health endpoint?
&lt;/h3&gt;

&lt;p&gt;No. Next.js does not include a default liveness or readiness route, so you create your own at &lt;code&gt;app/api/health/route.ts&lt;/code&gt; in the App Router. The recommended pattern is a small JSON response with a status field plus a few flags for the subsystems that actually matter for uptime, such as the database and any critical upstream dependency. The full design discussion is in the dedicated &lt;code&gt;/api/health&lt;/code&gt; route guide.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is the best way to detect Vercel ISR revalidation failures?
&lt;/h3&gt;

&lt;p&gt;Status-code monitors miss it because Vercel keeps serving the existing cached 200 response and retries revalidation every 30 seconds in the background. The reliable signal is a freshness assertion: a timestamp embedded in the response body, a build-id header that changes per deploy, or the &lt;code&gt;x-nextjs-cache&lt;/code&gt; response header set to &lt;code&gt;HIT&lt;/code&gt;, &lt;code&gt;STALE&lt;/code&gt;, &lt;code&gt;MISS&lt;/code&gt;, or &lt;code&gt;REVALIDATED&lt;/code&gt;. The body timestamp is the most portable across CDN configurations.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do I monitor cold starts on Vercel?
&lt;/h3&gt;

&lt;p&gt;Set your monitor timeout based on observed p95 plus headroom for cold-boot variance, not on Vercel's function maximum. Per Vercel docs, archived functions can take "at least 1 second longer than usual" on the first invocation after archival, and Fluid Compute reduces but does not eliminate cold starts. Monitor your production domain rather than preview URLs, since preview functions are archived after 48 hours of inactivity and will cold-boot on most checks.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can I monitor a Next.js app on Railway or self-hosted?
&lt;/h3&gt;

&lt;p&gt;Yes, with one platform-specific note for each. Railway services with Serverless enabled sleep after 10 minutes without outbound packets and incur a cold-boot delay on the first request that wakes them, so calibrate your monitor interval and timeout accordingly. Self-hosted Next.js behaves like any other Node.js HTTP service: the same three monitor types apply, and you remain responsible for writing the &lt;code&gt;/api/health&lt;/code&gt; route and the &lt;code&gt;proxy.ts&lt;/code&gt; auth gates yourself.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do you monitor a Next.js auth-protected page?
&lt;/h3&gt;

&lt;p&gt;A browser login monitor signs in as a dedicated low-privilege test user and asserts that an element from the post-login UI actually rendered. HTTP monitors cannot tell a working &lt;code&gt;/dashboard&lt;/code&gt; apart from a login redirect or a styled error boundary, because all three can return 200. Always use a separate test account scoped to read-only permissions for monitoring, never a real admin login, and rotate the credentials on the same cadence as your other secrets.&lt;/p&gt;

</description>
      <category>monitoring</category>
      <category>webdev</category>
      <category>devops</category>
      <category>uptime</category>
    </item>
    <item>
      <title>Uptime Monitoring for WordPress Agencies: 50+ Client Sites, One Dashboard</title>
      <dc:creator>velprove</dc:creator>
      <pubDate>Mon, 04 May 2026 14:00:03 +0000</pubDate>
      <link>https://dev.to/velprove/uptime-monitoring-for-wordpress-agencies-50-client-sites-one-dashboard-1972</link>
      <guid>https://dev.to/velprove/uptime-monitoring-for-wordpress-agencies-50-client-sites-one-dashboard-1972</guid>
      <description>&lt;p&gt;&lt;strong&gt;Bottom line:&lt;/strong&gt; WordPress agencies running 50+ client sites need more than HTTP probes. A 200 from wp-admin can hide a broken login behind a cache or a security plugin lockout, and only a real browser session through wp-login.php catches it. Velprove is a solo-founder-built monitor with a free browser login monitor on every plan, 30-second HTTP intervals on Pro ($49/month), and 3 branded status pages on custom client domains. Free tier covers a 5-site pilot with no credit card.&lt;/p&gt;

&lt;p&gt;Your client's wp-admin returned a 200, but a real browser session through wp-login.php would have caught the failure. That gap is where agency monitoring lives or dies. If a CDN or page-cache layer strips the auth cookie, wp-admin can return a 200 from cache while the underlying request would actually redirect to wp-login.php. Your dashboard is green. Your client is locked out. Your account manager finds out from a Slack message that starts with "hey, quick question." The fix is not a faster ping. The fix is monitoring the same thing your client does: open a browser, log in, see the dashboard.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why agency monitoring is a different problem from single-site monitoring
&lt;/h2&gt;

&lt;p&gt;Single-site monitoring is a yes-or-no question. Agency monitoring is a routing problem, a reporting problem, and a margin problem stacked on top of each other. When you are running anywhere from 30 to 100+ sites for paying clients, the bottleneck is not the probe. The bottleneck is everything around it.&lt;/p&gt;

&lt;h3&gt;
  
  
  The three failure modes that ruin agency margins
&lt;/h3&gt;

&lt;p&gt;First, false negatives: an HTTP probe says 200 while the real site is broken behind a cache or a plugin. Second, false positives: a flaky probe pages your on-call at 3am for a 30-second blip on a $30/month care-plan site. Third, attribution chaos: an alert fires and nobody on the team knows which client owns it, which account manager handles it, or whether the SLA promises a 15-minute or 4-hour response. Each one bleeds time. Time is the only thing an agency sells.&lt;/p&gt;

&lt;h3&gt;
  
  
  What "managing 50+ sites" actually means operationally
&lt;/h3&gt;

&lt;p&gt;Operationally it means you need tags, naming conventions, tier-aware alerting, and reports that write themselves. It means a junior account manager should be able to look at the dashboard at 8am Monday and answer three questions in under a minute: did anything break this weekend, which client is affected, and is it already fixed. If your monitoring tool needs a senior engineer to interpret it, you do not have a monitoring tool. You have a second job.&lt;/p&gt;

&lt;h2&gt;
  
  
  Organizing monitors by client (tags, naming, groups)
&lt;/h2&gt;

&lt;p&gt;The single biggest lever for agency-scale monitoring is naming. Get this right on day one and the rest of the system gets easier. Get it wrong and you will be renaming 200 monitors at month six.&lt;/p&gt;

&lt;h3&gt;
  
  
  A naming convention that survives client churn
&lt;/h3&gt;

&lt;p&gt;Use &lt;code&gt;client-slug:env:check-type&lt;/code&gt;. So &lt;code&gt;acme-co:prod:wp-admin-login&lt;/code&gt; and &lt;code&gt;acme-co:prod:homepage-uptime&lt;/code&gt; sit next to each other alphabetically. When Acme Co churns, you filter on &lt;code&gt;acme-co:&lt;/code&gt; and archive the whole batch in one pass. When Acme Co adds a staging environment, &lt;code&gt;acme-co:staging:homepage-uptime&lt;/code&gt; drops in cleanly. The colon separators are not arbitrary. They sort cleanly, they grep cleanly, and they survive copy-paste into Slack without auto-linking.&lt;/p&gt;

&lt;h3&gt;
  
  
  Naming conventions in lieu of tag dimensions
&lt;/h3&gt;

&lt;p&gt;Velprove uses naming conventions instead of a separate tag UI, which means the dimensions you would otherwise tag get encoded into the name itself. A name like &lt;code&gt;acme-co:prod:wp-admin-login:p1:kinsta&lt;/code&gt; packs five dimensions in one string: &lt;strong&gt;client&lt;/strong&gt; (the slug), &lt;strong&gt;environment&lt;/strong&gt; (prod, staging), &lt;strong&gt;check type&lt;/strong&gt; (homepage-uptime, wp-admin-login, multi-step-checkout), &lt;strong&gt;criticality&lt;/strong&gt; (p1, p2, p3), and &lt;strong&gt;hosting&lt;/strong&gt; (wpengine, kinsta, cloudways, self-managed). The colon-delimited segments give you grep-style filters in any list view that supports text search. Knowing the site runs on WP Engine, for example, tells you to check whether Alternate Cron is enabled (it is opt-in there) before you escalate.&lt;/p&gt;
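&lt;p&gt;Because the dimensions live in the name, any text search becomes a query. A short Python sketch of the idea (the monitor names are hypothetical examples, not real accounts):&lt;/p&gt;

```python
def parse_monitor_name(name):
    """Split a colon-delimited monitor name into its dimensions.

    Convention: client:env:check-type[:criticality[:hosting]].
    The last two segments are optional; zip() drops missing ones.
    """
    keys = ["client", "env", "check", "criticality", "hosting"]
    return dict(zip(keys, name.split(":")))

def filter_monitors(names, needle):
    """Grep-style substring filter, the same operation a dashboard text search does."""
    return [n for n in names if needle in n]

monitors = [
    "acme-co:prod:wp-admin-login:p1:kinsta",
    "acme-co:prod:homepage-uptime:p3:kinsta",
    "beta-inc:prod:homepage-uptime:p1:wpengine",
]
p1_monitors = filter_monitors(monitors, ":p1:")      # every p1 across the book
acme_monitors = filter_monitors(monitors, "acme-co:")  # every Acme monitor
```

The same two filters work in Slack search, in `grep`, and in any list view with a text box, which is the whole argument for encoding dimensions in the name.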

&lt;h3&gt;
  
  
  Why naming conventions beat folder-per-client at scale
&lt;/h3&gt;

&lt;p&gt;Folders look tidy when you have 5 clients. At 50 clients with 4 monitors each, you are clicking into 50 folders to do anything. A flat list with a strict naming convention lets you ask the same questions with a text filter: type &lt;code&gt;acme-co:&lt;/code&gt; and see every Acme monitor, type &lt;code&gt;:p1:&lt;/code&gt; and see every p1 monitor across the book. The trade-off versus a tag UI is real but small, and the workflow is fast once it is muscle memory.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tiered monitoring matched to your care plan tiers
&lt;/h2&gt;

&lt;p&gt;A common industry pattern is to bundle WordPress care plans into three tiers. The dollar bands below reflect what most agencies charge. Your monitoring should match the tier the client is paying for, not a one-size config that loses money on the bottom tier and underdelivers on the top.&lt;/p&gt;

&lt;h3&gt;
  
  
  Basic tier (HTTP uptime + SSL expiry) for $30 to $99 plans
&lt;/h3&gt;

&lt;p&gt;At the $30 to $99 band the client is paying for plugin updates, backups, and a heartbeat. Monitoring should match: HTTP homepage check at a 5-minute interval, SSL expiry check, and a single email alert to the account manager. No paging. No browser sessions. The math has to work, and at $40/month you cannot justify a $5/month-per-monitor cost stack. Velprove's free tier handles this layer at zero cost per site.&lt;/p&gt;

&lt;h3&gt;
  
  
  Professional tier (add keyword + Velprove browser login monitor) for $99 to $199 plans
&lt;/h3&gt;

&lt;p&gt;At the $99 to $199 band the client expects you to catch problems they would not catch themselves. That means a keyword check on the homepage (so a hacked site replacing your client's pharmacy with theirs gets flagged), and a browser login monitor against wp-admin. The browser login monitor is the differentiator. It is the difference between "the site is up" and "the client can actually run their business."&lt;/p&gt;

&lt;h3&gt;
  
  
  Premium tier (add multi-step API checks + 30-second intervals) for $200+ plans
&lt;/h3&gt;

&lt;p&gt;At $200+ the client usually has WooCommerce, a membership system, or a custom API. Browser monitors with custom step sequences handle cart-to-checkout and member-login-to-dashboard flows when the path lives in the UI. Multi-step API monitors handle login-to-API-call chains where each step is an HTTP request and the next step uses a value extracted from the previous response. On Velprove Pro, HTTP monitors run as fast as 30 seconds and browser login monitors run every 5 minutes, which catches outages inside a single SLA window for the homepage and detects wp-admin lockouts within the same maintenance hour. Pager-grade routing is appropriate here. So is a branded status page on a custom domain, covered below for premium clients specifically (Pro includes 3 of these).&lt;/p&gt;

&lt;h2&gt;
  
  
  How do you monitor wp-admin with a real browser session?
&lt;/h2&gt;

&lt;p&gt;This is the lead angle and the section worth reading twice. Most monitoring tools probe wp-admin with curl. Curl gets a 200, the dashboard goes green, and the client is locked out. The fix is &lt;a href="https://velprove.com/blog/monitor-wordpress-login" rel="noopener noreferrer"&gt;monitoring wp-login.php with a real browser session&lt;/a&gt; that actually fills in the form, submits it, and waits for the wp-admin dashboard to render.&lt;/p&gt;

&lt;h3&gt;
  
  
  The cached-200 trap
&lt;/h3&gt;

&lt;p&gt;Here is what goes wrong. Your client runs Cloudflare in front of WP Engine. Cloudflare has a page-cache rule that strips cookies on certain paths. A misconfigured rule, or a too-eager "cache everything" toggle, can land wp-admin in the cache. Now &lt;code&gt;GET /wp-admin/&lt;/code&gt; returns a cached 200 with the dashboard HTML, but only because that HTML was captured when an admin was logged in. A non-authenticated request to the origin would have been a 302 to wp-login.php. Your HTTP probe sees the cached 200 and reports green. Real users see a redirect loop or a blank page. A real browser session does not get fooled. It loads the page in a fresh session, finds the login form, submits credentials, and only reports green if the wp-admin dashboard actually loads.&lt;/p&gt;
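&lt;p&gt;Stripped of the browser plumbing, the detection logic is an assertion on what rendered, not on the status code. A minimal sketch; the marker strings are the stock WordPress element ids, and a real check would adjust them for the theme or security plugin in play:&lt;/p&gt;

```python
def classify_wp_admin_check(status, body):
    """Classify a wp-admin probe result by what rendered, not just the code.

    A 200 is necessary but not sufficient: a cached dashboard snapshot,
    a served login form, and a healthy dashboard can all return 200.
    """
    if status != 200:
        return "non-200"
    if 'id="loginform"' in body or "wp-login.php" in body:
        return "login-redirect"   # origin bounced the fresh session to login
    if 'id="wpadminbar"' in body:
        return "ok"               # admin toolbar only renders for a logged-in user
    return "suspect"              # 200 with neither marker: investigate

result = classify_wp_admin_check(200, 'markup with id="wpadminbar" present')
```

An HTTP probe stops at the first `if`. The browser session gets all the way to the third, which is why only it can tell a cached 200 from a working login.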

&lt;h3&gt;
  
  
  Handling WPS Hide Login and Solid Security custom login URLs
&lt;/h3&gt;

&lt;p&gt;WPS Hide Login (2 million+ active installs) lets your client move wp-login.php to &lt;code&gt;/secret-door&lt;/code&gt;. Solid Security Hide Backend lets you set a custom login URL. Both break naive monitors that hardcode &lt;code&gt;/wp-login.php&lt;/code&gt;. The right answer in your monitor configuration is to point the browser session at the actual custom URL the client is using, not to bypass the security plugin. Use a dedicated WordPress user with the Subscriber role for the monitor account. Subscriber has read-only dashboard access, which is enough to confirm login worked, and it cannot publish, install, or delete anything if the credential ever leaks. Never use a real admin credential for a monitor.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why WordPress nonce expiry doesn't break a properly built login monitor
&lt;/h3&gt;

&lt;p&gt;WordPress nonces have a 12 to 24 hour lifespan, invalidated on logout. People assume nonce expiry will break a long-running login monitor. It does not, because each monitor run starts a fresh browser session. New cookie jar, new login form load, new nonce, new submit. The nonce on every run is fresh. The pitfall is monitors that try to be clever by reusing sessions. Do not reuse sessions. The cost of one extra login per run is milliseconds. The benefit is a monitor that does not silently break at hour 13.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where ManageWP, MainWP, and WP Umbrella stop short
&lt;/h2&gt;

&lt;p&gt;This is not a knock on these tools. ManageWP is excellent for plugin updates and bulk backups. MainWP is excellent if you want a self-hosted control plane. WP Umbrella is excellent for client-friendly PDF reports. None of them, based on their public feature pages as of May 2026, run a real browser through wp-login.php to verify a working login. That is a real gap if you are selling premium-tier care plans where login uptime is the actual SLA.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Browser login monitor&lt;/th&gt;
&lt;th&gt;Per-client public status page&lt;/th&gt;
&lt;th&gt;Pricing model&lt;/th&gt;
&lt;th&gt;Interval floor&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;ManageWP Uptime Monitor&lt;/td&gt;
&lt;td&gt;No (HTTP + keyword)&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;$1/site/month&lt;/td&gt;
&lt;td&gt;60s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MainWP Advanced Uptime Monitor&lt;/td&gt;
&lt;td&gt;No (HTTP/Ping/Keyword)&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Self-hosted, free&lt;/td&gt;
&lt;td&gt;5min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;WP Umbrella&lt;/td&gt;
&lt;td&gt;No (HTTP)&lt;/td&gt;
&lt;td&gt;No (white-label PDFs only)&lt;/td&gt;
&lt;td&gt;€1.99/site/month&lt;/td&gt;
&lt;td&gt;Continuous&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Velprove&lt;/td&gt;
&lt;td&gt;Yes (real browser through wp-login.php)&lt;/td&gt;
&lt;td&gt;Yes (3 on Pro, custom domain)&lt;/td&gt;
&lt;td&gt;Flat ($0 / $19 / $49)&lt;/td&gt;
&lt;td&gt;HTTP 30s, browser 5min (Pro)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The pricing model row matters more than agencies usually think. At 50 sites, $1/site/month is $50/month and grows with your book. A flat $49 Pro plan covers up to 100 monitors total, so a 50-site book with one HTTP monitor per site plus 10 browser login monitors on your premium clients fits cleanly. The cost-per-site keeps falling as you fill the plan. There is also the question of &lt;a href="https://velprove.com/blog/monitor-wordpress-uptime-without-plugins" rel="noopener noreferrer"&gt;monitoring WordPress without installing another plugin&lt;/a&gt; on every client site. ManageWP and MainWP both need a worker plugin. Velprove probes from outside, so there is no plugin install, no auto-update window to worry about, and no plugin-conflict surface to debug.&lt;/p&gt;
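&lt;p&gt;The break-even arithmetic is worth running with your own numbers. A quick sketch using the rates quoted above (the vendors' published per-site prices as of May 2026):&lt;/p&gt;

```python
def per_site_total(sites, rate_per_site):
    """Per-site pricing grows linearly with the book."""
    return sites * rate_per_site

def effective_per_site(flat_rate, sites):
    """Flat pricing: cost per site falls as you fill the plan."""
    return flat_rate / sites

manage_wp_50 = per_site_total(50, 1.00)       # $50/month at 50 sites, keeps growing
pro_per_site = effective_per_site(49.00, 50)  # about $0.98/site at 50 sites
```

At 50 sites the two models cost roughly the same; at 100 monitors the flat plan is half the per-site price, which is the direction an agency book is supposed to move.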

&lt;h2&gt;
  
  
  Branded status pages on your client's custom domain
&lt;/h2&gt;

&lt;p&gt;A branded status page on a client's own domain is the single highest-leverage thing you can add to a premium care plan. It moves the conversation from "is the site down?" to "here is the live status, here is the incident history, here is what we did about it." Clients stop asking. Account managers stop fielding the ask. The page does the work. Velprove Pro includes 3 status pages with custom-domain support, which is the right shape for an agency: branded pages for your top 3 premium clients, and a single agency-internal dashboard for the rest of the book.&lt;/p&gt;

&lt;h3&gt;
  
  
  What belongs on a client-facing status page
&lt;/h3&gt;

&lt;p&gt;Less than you think. Current status (up, degraded, or down), the last 7 days of uptime, the last 3 incidents with a one-line summary, and a link to subscribe. That is it. Do not show the client your raw probe data. Do not show them a flame graph. Do not show them which monitor fired. Show them what they would tell their boss; &lt;a href="https://velprove.com/blog/public-status-page-guide" rel="noopener noreferrer"&gt;what belongs on a public status page&lt;/a&gt; goes deeper on the layout decisions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Custom domain + logo + incident history
&lt;/h3&gt;

&lt;p&gt;The status page should live at &lt;code&gt;status.clientdomain.com&lt;/code&gt;, not &lt;code&gt;status.youragency.com/clients/acme&lt;/code&gt;. A custom subdomain on the client's own domain reinforces that you are an extension of their team. Add their logo, pick a light or dark theme, and let the page show its rolling 30-day incident history. The incident history is the part clients actually screenshot for their own internal stakeholders. Make sure each incident has a clear start time, end time, and a one-paragraph postmortem.&lt;/p&gt;

&lt;h3&gt;
  
  
  Which 3 clients get the dedicated page
&lt;/h3&gt;

&lt;p&gt;A $30/month care-plan client does not need a branded status page. They need an email when the site goes down. Save the dedicated custom-domain pages for your top premium-tier clients, the ones whose internal stakeholders ask "how is the site doing" on a regular cadence and screenshot the status page for their own boss. The other 47 clients in the book get a single agency-branded internal status page (or a tagged group view inside the dashboard) that the agency team uses for triage, not for client-facing reporting. The 3-page cap on Pro is enough because most agency books have a 3-tier shape, and only the top tier actually pays for the white-glove deliverable.&lt;/p&gt;

&lt;h2&gt;
  
  
  How do you route alerts across 50+ sites without alarm fatigue?
&lt;/h2&gt;

&lt;p&gt;Alarm fatigue is the silent killer of agency monitoring. If every alert pages everyone, the team starts ignoring alerts. The first time a real outage gets ignored, you have lost the client.&lt;/p&gt;

&lt;h3&gt;
  
  
  Per-monitor channel mix and the per-account-manager workaround
&lt;/h3&gt;

&lt;p&gt;Each monitor in Velprove can pick its alert channels independently: email, Slack, Discord, Teams, PagerDuty, or a custom webhook. Slack, Discord, Teams, and PagerDuty each take a single destination configured once at the account level, so Velprove on its own routes every Slack alert to the same workspace channel. If your team needs per-account-manager Slack channels (Account Manager A's channel for Acme, AM B's for Beta), point Velprove's custom webhook at a small router (Zapier, n8n, or a 50-line agency proxy) that fans out by monitor name prefix. Email destinations are configured per user account, which covers solo or small teams cleanly.&lt;/p&gt;
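&lt;p&gt;The "50-line agency proxy" version of that router is mostly one function. A sketch, assuming a JSON alert payload that carries the monitor name; the payload shape and channel map are hypothetical, not Velprove's documented webhook schema:&lt;/p&gt;

```python
def route_alert(payload, channel_map, default_channel="#ops-general"):
    """Fan a single-destination webhook out to per-account-manager channels.

    Matches the client slug (the first colon-delimited segment of the
    monitor name) against a slug-to-Slack-channel map. Unknown slugs
    fall through to a default channel so no alert is ever dropped.
    """
    monitor_name = payload.get("monitor_name", "")
    client_slug = monitor_name.split(":", 1)[0]
    return channel_map.get(client_slug, default_channel)

channels = {"acme-co": "#am-alice-acme", "beta-inc": "#am-bob-beta"}
target = route_alert({"monitor_name": "acme-co:prod:wp-admin-login:p1:kinsta"}, channels)
```

The same function body works unchanged as a Zapier code step or an n8n Function node; the naming convention does the routing work.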

&lt;h3&gt;
  
  
  Routing by criticality (page only for premium-tier sites)
&lt;/h3&gt;

&lt;p&gt;Tag-based routing pays off here. p1 monitors page on-call through PagerDuty. p2 monitors post to a team Slack channel. p3 monitors send a daily digest email. A $30/month care-plan site at 3am is not a paging event. A $400/month WooCommerce site at 3am absolutely is. The routing rules should encode that distinction so a tired on-call engineer never has to make the call manually. The same logic applies to &lt;a href="https://velprove.com/blog/monitor-whmcs-portal" rel="noopener noreferrer"&gt;the WHMCS client portal monitoring guide&lt;/a&gt; if you also run hosting reseller infrastructure.&lt;/p&gt;
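&lt;p&gt;Encoded as a rule, the criticality decision looks like this; a sketch on the same hypothetical naming convention, with the action strings standing in for your real PagerDuty, Slack, and digest integrations:&lt;/p&gt;

```python
def alert_action(monitor_name):
    """Map the criticality segment of a monitor name to an alert action.

    p1 pages on-call, p2 posts to team chat, p3 goes to a daily digest.
    A missing or unknown criticality falls back to chat so nothing is
    silently dropped.
    """
    segments = monitor_name.split(":")
    if "p1" in segments:
        return "page-oncall"
    if "p3" in segments:
        return "daily-digest"
    return "team-chat"
```

The fallback matters: a monitor someone forgot to tag should make noise in a channel, not vanish into a digest.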

&lt;h3&gt;
  
  
  Pausing monitors during planned maintenance
&lt;/h3&gt;

&lt;p&gt;Tuesday 2am plugin update window? Pause the relevant monitors before the window opens, re-enable when it closes. Velprove's active toggle on each monitor handles this manually today. Build the pause and re-enable steps into your maintenance runbook so the alert silence is a deliberate part of the change, not a thing someone has to remember at 1:55am. The rule of thumb: if you know in advance, pause in advance. If you pause after the alert fires, you have already woken someone up.&lt;/p&gt;

&lt;h2&gt;
  
  
  How do you send monthly client uptime reports at scale?
&lt;/h2&gt;

&lt;p&gt;Monthly is the most common cadence for client uptime reports. Reports are how you prove the value the client is paying for. A care plan without a monthly report is a care plan the client forgets they are paying for, until renewal time.&lt;/p&gt;

&lt;h3&gt;
  
  
  What clients actually read
&lt;/h3&gt;

&lt;p&gt;Clients read the headline number and the incident summary. That is it. The headline is "99.97% uptime this month." The incident summary is "one incident, 12 minutes, caused by a hosting provider outage in Frankfurt, resolved automatically." A 14-page report with response-time histograms gets skimmed for 8 seconds and filed in a folder nobody opens. Lead with the headline. Let the detail live in an appendix.&lt;/p&gt;

&lt;h3&gt;
  
  
  The status page IS the report
&lt;/h3&gt;

&lt;p&gt;Most agencies overbuild the monthly report. The lowest-overhead version: send the client a one-paragraph email on the first of the month with the headline uptime number, the count and total minutes of incidents, and a link to their custom-domain status page where the incident history lives. The page is already up to date; it is the live artifact, not a snapshot. The email is a summary on top. Total time per client: under 5 minutes when the per-client status page is already running. Velprove does not auto-generate the email itself today; you write the paragraph from the dashboard data and send it through your normal client-comms channel.&lt;/p&gt;
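&lt;p&gt;Since you are writing the paragraph by hand anyway, a template keeps it to thirty seconds per client. A sketch; the client name, figures, and status URL are placeholders you would pull from the dashboard:&lt;/p&gt;

```python
def monthly_report_paragraph(client, uptime_pct, incidents, total_minutes, status_url):
    """Render the headline-first one-paragraph client email."""
    if incidents == 0:
        incident_line = "No incidents this month."
    else:
        noun = "incident" if incidents == 1 else "incidents"
        incident_line = (
            f"{incidents} {noun}, {total_minutes} minutes total, "
            "details on your status page."
        )
    return (
        f"Hi {client} team: {uptime_pct:.2f}% uptime this month. "
        f"{incident_line} Live status and incident history: {status_url}"
    )

body = monthly_report_paragraph("Acme Co", 99.97, 1, 12, "https://status.acme.example")
```

Headline first, incident summary second, link last, which matches the order clients actually read in.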

&lt;h3&gt;
  
  
  Weekly cadence as premium-tier differentiator
&lt;/h3&gt;

&lt;p&gt;For premium-tier clients, send the same shape weekly instead of monthly. The cost is the same once the workflow is templated: you reuse the headline-paragraph format and let the page do the rest. The client gets a Friday-afternoon email with the week's uptime number and any incidents, plus the link to their status page. This positions the premium tier as "you hear from us proactively" versus the basic tier where the client only hears from you when something is wrong.&lt;/p&gt;

&lt;h2&gt;
  
  
  Free tier as the WordPress agency monitoring starting point
&lt;/h2&gt;

&lt;p&gt;Velprove's free tier exists for exactly this kind of pilot. Free includes one browser login monitor (15-minute interval), multi-step API monitors (3 steps), and 5 regions. There is no credit card on signup, and &lt;a href="https://velprove.com/blog/uptimerobot-commercial-alternative" rel="noopener noreferrer"&gt;commercial use on the free tier&lt;/a&gt; is allowed.&lt;/p&gt;

&lt;h3&gt;
  
  
  How to pilot Velprove on 5 sites
&lt;/h3&gt;

&lt;p&gt;Pick 5 representative sites: one basic-tier client, two professional-tier clients, two premium-tier clients. Set up HTTP monitors for all 5. Add the free tier's one wp-admin browser login monitor to your hardest-to-monitor premium client (the one whose admin you actually worry about). Add a multi-step API monitor for one premium client's API or login flow. Run for 14 days. Compare the alert quality to whatever you are using today. The browser login monitor is usually the moment the decision gets made, and Starter at $19 unlocks more browser slots once the workflow is proven.&lt;/p&gt;

&lt;h3&gt;
  
  
  When to graduate to Starter or Pro
&lt;/h3&gt;

&lt;p&gt;Starter at $19/month makes sense around 15 to 25 sites or when you need 1-minute HTTP intervals (Starter includes 25 monitors, 3 browser login monitors at 10-minute intervals). Pro at $49/month makes sense at 30+ sites, when you need 30-second HTTP intervals on premium clients, when you want up to 10 browser login monitors, or when you need branded status pages on custom domains. Pro includes 100 monitors, 10 browser login monitors at 5-minute intervals, and 3 status pages. The break-even versus per-site pricing happens fast. At 50 sites, ManageWP's $1/site/month is $50/month for HTTP-only monitoring with no browser login monitor and no branded status pages. Pro at $49/month covers all of it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Frequently Asked Questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Can I monitor 50+ WordPress sites on the free tier?
&lt;/h3&gt;

&lt;p&gt;Velprove's free tier is built for piloting, not for running an entire agency book. You can absolutely set up 5 to 10 sites on free to validate the workflow, test the browser login monitor against your client stack, and prove the alerting routes work. For 50+ sites you will want Starter at $19 or Pro at $49 because of monitor count and interval-floor needs. The honest answer: pilot on free, graduate to paid once the workflow is proven. Commercial use is allowed on every tier.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do I monitor wp-admin if my client uses WPS Hide Login or Solid Security?
&lt;/h3&gt;

&lt;p&gt;Point your Velprove browser login monitor at the actual custom URL the security plugin is using, not at &lt;code&gt;/wp-login.php&lt;/code&gt;. The monitor logs in through the same door real users do. WPS Hide Login (2 million+ active installs) and Solid Security Hide Backend both let you set a custom login URL. The monitor configuration stores that URL, plus a low-privilege Subscriber-role test account. Never use a real admin credential. If the credential ever leaks, a Subscriber role cannot publish, install, or delete anything.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do I avoid alert fatigue when monitoring this many sites?
&lt;/h3&gt;

&lt;p&gt;Three rules. First, route by criticality: p1 sites page on-call, p2 sites post to Slack, p3 sites send a daily digest. Second, pick the channel mix per monitor so basic-tier sites never trigger your PagerDuty rotation. Third, pause monitors before scheduled maintenance windows so plugin updates and redesign launches do not generate noise (Velprove's active toggle is a manual flip today). Together these rules turn a 50-site book from a 3am pager nightmare into a system where alerts that fire are alerts that matter.&lt;/p&gt;

&lt;h3&gt;
  
  
  What's the cost per site compared to ManageWP or WP Umbrella?
&lt;/h3&gt;

&lt;p&gt;ManageWP is $1/site/month for uptime monitoring, so 50 sites is $50/month. WP Umbrella is €1.99/site/month, so 50 sites is around €100/month. Velprove Pro is a flat $49/month and includes 100 monitors, 10 browser login monitors, 3 branded status pages, and 30-second HTTP intervals. The math fits a 50-site book if you run an HTTP monitor on every site, layer browser login monitors on your top 10 premium accounts, and reserve the 3 branded status pages for your top 3 white-glove clients. Pro lands at roughly the same cost as ManageWP's HTTP-only tier and roughly half the cost of WP Umbrella, with capabilities neither of them include.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can I give each client their own white-label status page?
&lt;/h3&gt;

&lt;p&gt;Pro includes 3 branded status pages with custom-domain support (so the page lives at &lt;code&gt;status.clientdomain.com&lt;/code&gt;, not on a Velprove subdomain), client logo upload, a light or dark theme, and a rolling 30-day incident history on the public page. That is the right shape for an agency: white-glove pages for your top 3 premium clients, and a single agency-internal dashboard or shared status page for the rest of the book. The page shows current status, recent uptime, and incident summaries written for non-technical readers. This is a real differentiator at the premium care-plan tier and a feature that ManageWP, MainWP, and WP Umbrella do not include based on their public feature pages as of May 2026.&lt;/p&gt;

&lt;h3&gt;
  
  
  Should I run weekly or monthly client uptime reports?
&lt;/h3&gt;

&lt;p&gt;Monthly is the most common cadence and the right baseline for basic and professional care-plan tiers. Weekly is a low-cost differentiator for premium-tier clients because it positions the relationship as proactive rather than reactive. Use the per-client status page as the live artifact and send a one-paragraph headline email referencing it (uptime number, incident count and total minutes, link to the page). Velprove does not auto-generate the client email today; agency operators write the paragraph from the dashboard data and send it via their normal client-comms channel. Total time per client per report should be under 5 minutes.&lt;/p&gt;

&lt;p&gt;If you are running 30+ WordPress client sites and tired of HTTP probes that miss real failures, start a free Velprove account and pilot on a handful of sites this week. One browser login monitor, multi-step API monitors (3 steps), and 5-region coverage are all included on free. &lt;a href="https://velprove.com/signup" rel="noopener noreferrer"&gt;Start free&lt;/a&gt; or &lt;a href="https://velprove.com/pricing" rel="noopener noreferrer"&gt;see pricing&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>monitoring</category>
      <category>webdev</category>
      <category>devops</category>
      <category>uptime</category>
    </item>
    <item>
      <title>Shopify Checkout Monitoring: Catch Silent Failures Fast</title>
      <dc:creator>velprove</dc:creator>
      <pubDate>Sun, 03 May 2026 14:00:04 +0000</pubDate>
      <link>https://dev.to/velprove/shopify-checkout-monitoring-catch-silent-failures-fast-mlg</link>
      <guid>https://dev.to/velprove/shopify-checkout-monitoring-catch-silent-failures-fast-mlg</guid>
      <description>&lt;p&gt;&lt;strong&gt;The short version:&lt;/strong&gt; A green Shopify status page does not prove your checkout works. To monitor Shopify checkout reliably, layer three monitor types: an HTTP body assertion on the cart and checkout pages, a multi-step API monitor on the storefront cart-create path, and a browser login monitor on a synthetic test customer. Velprove's free plan runs all three from 5 regions with 10 monitors total, no credit card, and commercial use allowed. &lt;a href="https://velprove.com/signup" rel="noopener noreferrer"&gt;Start for free.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Shopify checkout monitoring is the practice of running scheduled synthetic checks against the cart page, the checkout page, the storefront API cart-create path, and the authenticated customer session, so per-store failures are detected within minutes rather than from a customer ticket. Checkout is the highest-leverage URL on your store and the easiest one to break without noticing. Your homepage is static and cached. Your product pages are template-driven and well-trodden. Your checkout flow touches a hosted Shopify component, every app you have installed that hooks into the cart or the order, your theme's custom Liquid, your shipping rate calculator, your discount and inventory rules, and the payment gateway your customers actually pay through. Any one of those can fail while the rest of the store keeps looking healthy.&lt;/p&gt;

&lt;p&gt;This walks through why Shopify's own status page is structurally unable to detect per-store checkout failures, the real failure modes Shopify merchants have documented in the wild, what to actually monitor in three layers, and how to set the whole thing up on Velprove's free plan in about fifteen minutes.&lt;/p&gt;
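&lt;p&gt;The first of the three layers, the HTTP body assertion, reduces to: fetch the page, then fail unless every expected marker actually rendered. A minimal sketch of that check logic; the marker string is illustrative, and a hosted monitor runs this on a schedule from multiple regions rather than once:&lt;/p&gt;

```python
def body_assertion(status, body, required_markers):
    """Pass only if the page returned 200 AND the expected markers rendered.

    A styled error page, an app-broken cart, and a working cart can all
    return 200; the marker check is what separates them.
    """
    if status != 200:
        return False, f"non-200 status: {status}"
    missing = [m for m in required_markers if m not in body]
    if missing:
        return False, f"markers missing: {missing}"
    return True, "ok"

ok, reason = body_assertion(
    200, 'page with name="checkout" button markup', ['name="checkout"']
)
```

Asserting on an attribute the checkout button carries, rather than on visible text, keeps the check stable across copy changes and translations.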

&lt;h2&gt;
  
  
  Why Shopify's status page does not tell you your checkout works
&lt;/h2&gt;

&lt;p&gt;Shopify runs one of the more transparent status pages in commerce. That is exactly why it is so easy to misread. The page tracks platform-level component health from Shopify's perspective. It does not track your store's checkout from your customer's perspective, and Shopify is upfront about that.&lt;/p&gt;

&lt;h3&gt;
  
  
  Checkout and Storefront are tracked independently
&lt;/h3&gt;

&lt;p&gt;On &lt;a href="https://www.shopifystatus.com/" rel="noopener noreferrer"&gt;Shopify's status page&lt;/a&gt; (verified 2026-05-03), Checkout and Storefront are listed as separate components, alongside Admin, API and Mobile, Third party services, Reports and Dashboards, Point of Sale, Oxygen, and Support. They each have their own status pill. They can each be Operational, Degraded Performance, Partial Outage, Major Outage, or Maintenance, and they regularly disagree. A green Storefront pill tells you nothing about Checkout, and even a green Checkout pill is a global average.&lt;/p&gt;

&lt;h3&gt;
  
  
  Shopify literally tells you the status page misses per-store outages
&lt;/h3&gt;

&lt;p&gt;At the bottom of &lt;a href="https://www.shopifystatus.com/history" rel="noopener noreferrer"&gt;Shopify's own status history page&lt;/a&gt; (verified 2026-05-03) is the line: &lt;em&gt;"Some issues affecting a small percentage of stores may not be reflected here."&lt;/em&gt; Read that twice. The platform tracks platform-wide outages, not per-store ones. If your specific checkout breaks because of an app you installed last week, a theme change you pushed yesterday, or a misconfigured shipping rule, Shopify's status page is not going to tell you. By design.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Cyber Monday 2025 paradox
&lt;/h3&gt;

&lt;p&gt;On December 1, 2025, Shopify's admin and Point of Sale went down for several hours. Per &lt;a href="https://www.pymnts.com/news/ecommerce/2025/shopify-outage-locks-merchants-out-on-cyber-monday" rel="noopener noreferrer"&gt;PYMNTS coverage of the Cyber Monday 2025 outage&lt;/a&gt; (verified 2026-05-03), "merchants worldwide lost access to their admin dashboards and point-of-sale systems" while &lt;em&gt;"Checkout remained fully operational."&lt;/em&gt; The point is not that Shopify went down. The point is that components fail independently. Admin was on fire, checkout was fine. The same structural fact runs in reverse: checkout can be broken on your store while admin and the storefront are completely healthy. A monitor that only watches the homepage cannot tell you the difference.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Shopify checkout actually breaks (real failure modes with real evidence)
&lt;/h2&gt;

&lt;p&gt;A silently broken Shopify checkout is the failure mode this section catalogues. It happens when something specific to your store, not to Shopify, fails. Five patterns show up over and over in the community forums and the dev docs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Third-party app conflicts
&lt;/h3&gt;

&lt;p&gt;The cleanest documented case is a &lt;a href="https://community.shopify.com/t/checkout-button-not-working/392987" rel="noopener noreferrer"&gt;Shopify community thread tracing a dead Buy button to a third-party app&lt;/a&gt; (verified 2026-05-03). A merchant posted on February 8, 2025 that "the Checkout button suddenly stopped working on both mobile as well as Desktop in the last two days." A community helper diagnosed the root cause: the CODKing app was authenticating with an invalid storefront token, and the console logs "show errors related to authentication issues with the storefront token." The app was installed and silently broke the checkout button for every visitor. Shopify's status page did not flag it. Shopify could not have flagged it. The failure was per-store and app-specific. A Shopify checkout app conflict like this is exactly the class of failure a layered monitor catches and the platform status page cannot.&lt;/p&gt;

&lt;h3&gt;
  
  
  Theme-injected JavaScript breaking the Buy button
&lt;/h3&gt;

&lt;p&gt;A theme update or a custom Liquid edit can inject or remove JavaScript that the cart and checkout flow depends on. The page renders, the button is visible, but the click handler is broken or the form submission silently fails. Shopify's own troubleshooting guidance for this class of failure is to test the affected page on the default theme to isolate theme-injected code as the cause. The problem is that you only think to do that after a customer tells you sales stopped. A monitor that clicks the button in a real browser catches the regression on the next 15-minute check.&lt;/p&gt;

&lt;h3&gt;
  
  
  Payment gateway misconfiguration
&lt;/h3&gt;

&lt;p&gt;Shopify Payments is the built-in option, but plenty of stores route some or all of their volume through Stripe direct, PayPal, Klarna, Affirm, or a regional gateway. Each one is a separate API account with its own keys, webhooks, and configuration. A revoked API key, a rotated webhook secret, a payment method disabled by accident, or a regional outage on the gateway side can all silently break checkout for the customers who happen to pick that payment method. The remediation for the webhook half of this story is covered in &lt;a href="https://velprove.com/blog/monitor-stripe-webhooks" rel="noopener noreferrer"&gt;monitor your Stripe webhooks end to end&lt;/a&gt;. The detection half is what this post is for.&lt;/p&gt;

&lt;h3&gt;
  
  
  Shipping rate calculator timeouts
&lt;/h3&gt;

&lt;p&gt;If you use a third-party carrier app to calculate shipping rates at checkout, that app has a hard read-timeout budget Shopify enforces. Per &lt;a href="https://shopify.dev/changelog/dynamic-timeout-for-carrierservice-api-rate-requests" rel="noopener noreferrer"&gt;documented carrier read-timeout values of 3, 5, and 10 seconds&lt;/a&gt; (verified 2026-05-03), the timeout depends on your store's rates-per-minute traffic: 10 seconds under 1500 RPM, 5 seconds between 1500 and 3000 RPM, and 3 seconds over 3000 RPM. If your carrier app cannot answer in time, the customer sees no shipping option and cannot proceed. Shopify's own &lt;a href="https://shopify.dev/docs/apps/build/checkout" rel="noopener noreferrer"&gt;checkout extensibility model&lt;/a&gt; documents the same risk and recommends backup rates so checkout is not blocked when an external call times out. That recommendation is only as good as the merchant's awareness, and a monitor is what gives you that awareness.&lt;/p&gt;
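&lt;p&gt;As a sketch, that tiering is a three-branch lookup. The timeout values and RPM thresholds come from the changelog entry cited above; the function name and the exact handling of the 1500 and 3000 RPM boundaries are our assumptions:&lt;/p&gt;

```python
# Sketch of Shopify's documented carrier read-timeout tiers.
# Values are from the CarrierService changelog cited above; the
# boundary handling at exactly 1500 and 3000 RPM is an assumption.

def carrier_timeout_seconds(rates_per_minute: int) -> int:
    """Read-timeout budget Shopify allows a carrier app at this traffic level."""
    if rates_per_minute < 1500:
        return 10
    if rates_per_minute <= 3000:
        return 5
    return 3
```

&lt;p&gt;If your carrier app's worst-case response time is anywhere near these numbers at your traffic tier, configure backup rates before you need them.&lt;/p&gt;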

&lt;h3&gt;
  
  
  Discount, inventory, and shipping rule edge cases
&lt;/h3&gt;

&lt;p&gt;An April 2026 Shopify community thread documented a store where checkout failed with the error &lt;em&gt;"Shipping not available: Items in the cart do not meet price or weight requirements..."&lt;/em&gt; for entire categories of orders. The shipping zone configuration was correct in the admin and looked correct in preview, but at checkout the rule did not match real-world cart contents. The merchant found out from a customer ticket. A multi-step API monitor that creates a cart with a representative product mix would have caught it on the next run.&lt;/p&gt;

&lt;h2&gt;
  
  
  What standard Shopify uptime monitors miss
&lt;/h2&gt;

&lt;p&gt;A standard ping monitor sends an HTTP request to your storefront URL and reads the status code. If the server responds with 200 OK, the monitor passes. That is a useful check for storefront availability, and it is the right tool for &lt;a href="https://velprove.com/blog/monitor-shopify-store-uptime" rel="noopener noreferrer"&gt;monitoring your Shopify storefront uptime&lt;/a&gt;. It is the wrong tool for checkout.&lt;/p&gt;

&lt;h3&gt;
  
  
  HTTP 200 on the storefront does not prove checkout works
&lt;/h3&gt;

&lt;p&gt;Your homepage returns 200. Your product pages return 200. Your cart page returns 200. Your checkout page returns 200. The HTML renders on every one. But the Buy button is dead because of an app conflict, the cart-create API is timing out, or a payment gateway is rejecting the session. A status-only monitor stays green the entire time. The same trap applies to &lt;a href="https://velprove.com/blog/monitor-woocommerce-checkout" rel="noopener noreferrer"&gt;WooCommerce checkout&lt;/a&gt;. Status codes prove the server responded. They do not prove the application works.&lt;/p&gt;

&lt;h3&gt;
  
  
  The merchant report no one wants to be
&lt;/h3&gt;

&lt;p&gt;On a December 6, 2025 Shopify community post, a merchant reported spending ad money driving customers to a checkout that would not function: customers could not insert a shipping address, the Buy Now button did not work, and in some cases customers were charged but the order never appeared in Shopify. A follow-up commenter described "orders suddenly stop for 6-12 hours at a time despite confirmed high traffic." Shopify orders not coming through despite paid traffic is the failure mode this post is for. Money taken, no order created, no alert anywhere, because the storefront returned 200 OK the whole time.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to actually monitor (the three layers)
&lt;/h2&gt;

&lt;p&gt;Reliable Shopify checkout failure detection needs three layers. Each layer catches a class of failure the others cannot. Together, on Velprove's free plan, they fit inside the 10-monitor cap with room to spare.&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 1: Browser login monitor on a synthetic test customer
&lt;/h3&gt;

&lt;p&gt;Velprove offers a &lt;a href="https://velprove.com/blog/monitor-saas-login-page" rel="noopener noreferrer"&gt;free browser login monitor&lt;/a&gt; that signs into your Shopify store as a synthetic test customer in a real browser, then verifies the authenticated session loaded correctly (a known piece of post-login text is visible, an expected element is on the page, or the URL settled where it should). This is the differentiator and the layer that catches the most failures. It runs the JavaScript your customers run. It loads the apps your customers load on the login and post-login pages. It catches the CODKing-style app conflicts and theme JavaScript regressions that break the storefront login flow itself, which no HTTP check can see. Velprove's free plan includes one browser login monitor running every 15 minutes from any one of 5 regions, with commercial use allowed.&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 2: Multi-step API monitor against your storefront API
&lt;/h3&gt;

&lt;p&gt;A &lt;a href="https://velprove.com/blog/multi-step-api-monitoring-guide" rel="noopener noreferrer"&gt;multi-step API monitoring&lt;/a&gt; chain proves the cart and checkout API path independently of the rendered HTML. This is the layer that catches the "Shopify payment not working" scenario where the gateway-side cart-create call fails before the customer ever sees the payment form. Three steps fit on the free plan: fetch a product by handle from your Storefront API, create a cart with that product, then read the cart back and assert the line-item count. If your storefront API is down, your discount engine is misbehaving, or your inventory subsystem is out of sync, the multi-step run fails on the assertion in the third step.&lt;/p&gt;
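&lt;p&gt;A minimal sketch of that three-step chain, assuming Shopify's public Storefront GraphQL API. The store domain, API version, and assertion helper are placeholders; &lt;code&gt;cartCreate&lt;/code&gt; is a real Storefront API mutation, but verify the field names against your API version's schema before relying on this exact shape:&lt;/p&gt;

```python
# Sketch only: no network calls are made here. Domain and API
# version are placeholders you would replace with your own.
STORE = "yourstore.myshopify.com"
ENDPOINT = f"https://{STORE}/api/2024-04/graphql.json"  # version is an assumption

# Step 1: fetch a product by handle and extract its first variant ID.
PRODUCT_QUERY = """
query($handle: String!) {
  product(handle: $handle) {
    variants(first: 1) { nodes { id } }
  }
}
"""

# Step 2: create a cart containing that variant.
CART_CREATE = """
mutation($variantId: ID!) {
  cartCreate(input: { lines: [{ merchandiseId: $variantId, quantity: 1 }] }) {
    cart { id }
  }
}
"""

# Step 3: read the cart back; the monitor's assertion is this check.
def cart_line_count_ok(cart_response: dict, expected: int) -> bool:
    lines = cart_response["data"]["cart"]["lines"]["nodes"]
    return len(lines) == expected
```

&lt;p&gt;In a multi-step builder, the variant ID extracted in step 1 flows into step 2 as a variable; the function above is just the step-3 assertion expressed in plain code.&lt;/p&gt;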

&lt;h3&gt;
  
  
  Layer 3: HTTP monitors on cart and checkout URLs with body assertions
&lt;/h3&gt;

&lt;p&gt;HTTP monitors with body assertions are the cheapest layer to run and the right baseline for your cart and checkout URLs. Assert on the actual button text the page must contain ("Check out" on the cart, "Continue to shipping" or your checkout button text on the checkout page). Generic strings like "Shopify" pass on error pages too. Specific button text does not.&lt;/p&gt;
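&lt;p&gt;A toy illustration of the difference. Both sample bodies are hypothetical, and both would arrive with a 200 status code; only the specific assertion tells them apart:&lt;/p&gt;

```python
# Two hypothetical 200 OK response bodies: a healthy cart page and
# an error page that still mentions the platform name.
HEALTHY_CART = '<form action="/checkout"><button name="checkout">Check out</button></form>'
ERROR_PAGE = '<div class="error">Shopify - Something went wrong</div>'

def body_assertion_passes(body: str, expected_text: str) -> bool:
    """The entire body assertion: does the response contain the string?"""
    return expected_text in body
```

&lt;p&gt;Asserting "Shopify" passes on the error page too, which is exactly the false green you are trying to avoid; asserting "Check out" fails on the error page and fires the alert.&lt;/p&gt;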

&lt;h2&gt;
  
  
  How to set up Shopify checkout monitoring in Velprove
&lt;/h2&gt;

&lt;p&gt;Six steps, about fifteen minutes, free plan, no credit card. The HowTo schema embedded in this page mirrors the steps below verbatim.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Sign up and create your free Velprove account
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://velprove.com/signup" rel="noopener noreferrer"&gt;Create a free Velprove account&lt;/a&gt;. No credit card, commercial use allowed. The free plan includes 10 monitors total across HTTP, API, Multi-Step, and Browser Login monitor types, all 5 regions, and email alerts. For Shopify checkout, the three you actually want are HTTP for the storefront, Multi-Step for the Storefront API cart-create path, and Browser Login for the customer account flow.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Add HTTP monitors on your cart and checkout URLs with body assertions
&lt;/h3&gt;

&lt;p&gt;Create one HTTP monitor pointed at &lt;code&gt;yourstore.com/cart&lt;/code&gt; and another at &lt;code&gt;yourstore.com/checkout&lt;/code&gt;. On each, add a body assertion for the actual button text the page must always contain. "Check out" on the cart, "Continue to shipping" or your checkout button text on the checkout page. Specific button text catches the case where the page renders 200 OK with a broken button DOM. Generic strings do not.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Add a multi-step API monitor against your storefront API
&lt;/h3&gt;

&lt;p&gt;Open the Multi-Step builder and configure three steps on the free plan. Step 1 calls your Storefront API to fetch a product by handle, extracting the variant ID with &lt;code&gt;{{variableName}}&lt;/code&gt; syntax. Step 2 POSTs the cartCreate mutation with that variant. Step 3 reads the cart back and asserts on the line-item count. This is the layer that catches a cart-create API regression even when the rendered HTML keeps returning 200.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4: Add a free browser login monitor for a synthetic test customer
&lt;/h3&gt;

&lt;p&gt;Create a Shopify customer account dedicated to monitoring. A regular customer account, never an admin or staff login. In Velprove, click new monitor, pick the Browser Login type, paste your account login URL (typically yourstore.com/account/login), and supply the test customer's email and password. Velprove runs a real browser for every check and signs in as the test customer. Add a success indicator on something only authenticated customers see: a piece of text in the header (the customer's name, "My Account", "Logout"), a CSS selector that is only visible when logged in, or a URL pattern the post-login page settles on. The free plan runs the browser login monitor every 15 minutes.&lt;/p&gt;
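&lt;p&gt;Velprove evaluates these indicators inside a real browser; purely to make the three indicator styles concrete, here is the decision logic sketched against a simulated page state. The selector class and URL pattern are hypothetical examples, not Shopify or Velprove internals:&lt;/p&gt;

```python
import re
from dataclasses import dataclass

@dataclass
class PageState:
    """Simulated post-login page: final URL, visible text, visible selectors."""
    url: str
    text: str
    visible_selectors: set

def login_succeeded(page: PageState) -> bool:
    checks = [
        "My Account" in page.text,                        # post-login text
        ".customer-logout" in page.visible_selectors,     # hypothetical logged-in-only selector
        re.search(r"/account/?$", page.url) is not None,  # URL settled on /account
    ]
    # Any one indicator is enough for this sketch; in a real monitor,
    # pick the single most stable one for your theme.
    return any(checks)
```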

&lt;h3&gt;
  
  
  Step 5: Pick your region and review the 5-region distribution
&lt;/h3&gt;

&lt;p&gt;All five Velprove regions (North America, Europe, United Kingdom, Asia, Oceania) are available on every plan including free. Each browser login monitor runs from one region at a time. Pick the region closest to most of your customers. HTTP and multi-step API monitors can be distributed across regions on every plan, which is what catches the regional gateway and CDN failures that only affect part of your traffic.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 6: Configure alerts on email, Slack, or PagerDuty
&lt;/h3&gt;

&lt;p&gt;Email alerts are included on every plan, free included. Starter at $19/mo unlocks Slack, Discord, Teams, and outbound webhooks. Pro at $49/mo adds PagerDuty for on-call escalation. Pick the channel that wakes the right person up, not the channel they already ignore.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Failure mode&lt;/th&gt;
&lt;th&gt;Status-only HTTP monitor&lt;/th&gt;
&lt;th&gt;Layered Shopify monitoring&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Storefront 200, Buy button click handler dead&lt;/td&gt;
&lt;td&gt;Misses&lt;/td&gt;
&lt;td&gt;Catches (browser monitor)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Third-party app breaks checkout JavaScript&lt;/td&gt;
&lt;td&gt;Misses&lt;/td&gt;
&lt;td&gt;Catches (browser monitor)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cart-create API regression&lt;/td&gt;
&lt;td&gt;Misses&lt;/td&gt;
&lt;td&gt;Catches (multi-step API)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Checkout page renders error in body, returns 200&lt;/td&gt;
&lt;td&gt;Misses&lt;/td&gt;
&lt;td&gt;Catches (HTTP body assertion)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Carrier shipping app times out at checkout&lt;/td&gt;
&lt;td&gt;Misses&lt;/td&gt;
&lt;td&gt;Catches (browser monitor)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Storefront completely down (DNS, SSL, 5xx)&lt;/td&gt;
&lt;td&gt;Catches&lt;/td&gt;
&lt;td&gt;Catches&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  A note on what to monitor without breaking Shopify's terms
&lt;/h2&gt;

&lt;p&gt;Monitoring your own store is fine. Monitoring it sloppily is how you end up with a flagged account or a real charge nobody wanted to make. Three rules keep this clean.&lt;/p&gt;

&lt;h3&gt;
  
  
  Use a low-privilege test customer, never real admin or staff
&lt;/h3&gt;

&lt;p&gt;Create a regular Shopify customer account dedicated to monitoring. No staff role, no admin permissions, no access to anything beyond the storefront customer experience. If the monitor's credentials ever leak, the blast radius is one inert customer account that cannot do anything beyond browse and add to cart. Never wire your real admin login or a staff login into a monitor.&lt;/p&gt;

&lt;h3&gt;
  
  
  Use test mode payment gateway settings where available
&lt;/h3&gt;

&lt;p&gt;Velprove's browser login monitor itself never submits a payment, so this is mostly an "if you also build your own synthetic order test" note. If you do, Shopify Payments and most third-party gateways have a test mode that accepts test card numbers without ever creating a real charge. Configure the gateway in test mode for any custom monitoring you build, and never hand a monitor real card numbers.&lt;/p&gt;

&lt;h3&gt;
  
  
  Keep monitor frequency reasonable
&lt;/h3&gt;

&lt;p&gt;Every 5 to 15 minutes per layer is plenty for any store under a thousand orders a day. The free plan's 5-minute HTTP and multi-step intervals plus the 15-minute browser login monitor interval cover the layered approach without putting load on your store. Going faster on a busy storefront is what Starter and Pro are for, not what you need to start.&lt;/p&gt;

&lt;h2&gt;
  
  
  Frequently asked questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  How do I know if my Shopify checkout is broken right now?
&lt;/h3&gt;

&lt;p&gt;The most common cause is a per-store failure that does not surface on Shopify's global status page: a third-party app conflict, a theme JavaScript regression, a payment gateway misconfiguration, or a shipping rate calculator timeout. To know in real time, first check &lt;a href="https://www.shopifystatus.com/" rel="noopener noreferrer"&gt;shopifystatus.com&lt;/a&gt; for global Checkout component status, then run a layered scheduled monitor (HTTP body assertions on cart and checkout, a multi-step API call against your storefront API, and a browser login monitor on a synthetic test customer) every few minutes. Shopify's status page has a published disclaimer that some issues affecting a small percentage of stores may not be reflected there, so a layered monitor is the only reliable way to detect per-store failures.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why does my Shopify checkout button not work for some customers?
&lt;/h3&gt;

&lt;p&gt;The most common causes are a third-party app injecting broken JavaScript (a documented case from the Shopify community traced a dead Buy button to the CODKing app's storefront-token errors), a theme update that altered the Buy button DOM, or a regional gateway issue. Monitoring from multiple regions with a real browser is what catches the regional and app-conflict variants.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can Shopify report 100% uptime while my store is losing sales?
&lt;/h3&gt;

&lt;p&gt;Yes. Shopify tracks Checkout and Storefront as separate components on its status page, and its published disclaimer notes that per-store issues may not surface at all. On December 1, 2025, Shopify's admin was down for hours while checkout itself stayed up. The inverse also happens, where the global storefront pill is green while a single store's checkout is dead from a bad app or theme. Velprove's layered checkout monitor reports per-store checkout uptime even when Shopify's global pill is green.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do I test my Shopify checkout beyond the homepage?
&lt;/h3&gt;

&lt;p&gt;Three things. The cart and checkout pages with body assertions on the actual button text. A multi-step API monitor that fetches a product, creates a cart, and verifies line-item count. And a browser login monitor that signs in as a synthetic test customer in a real browser and verifies the authenticated session, since this catches the JavaScript and app-conflict failures on the login and post-login pages that HTTP monitors cannot see.&lt;/p&gt;

&lt;h3&gt;
  
  
  Will monitoring my own Shopify store violate Shopify's terms?
&lt;/h3&gt;

&lt;p&gt;Not if you do it sensibly. Use a low-privilege test customer account, not an admin or staff login. Keep monitor frequency reasonable (every 5 to 15 minutes is plenty). The Velprove browser login monitor itself stops at session verification, so it never submits an order. If you build any custom synthetic-checkout test on your own (for example a server-side script that hits the Storefront API), keep it short of payment authorization.&lt;/p&gt;

&lt;h3&gt;
  
  
  How fast can I detect a Shopify checkout failure with a free monitor?
&lt;/h3&gt;

&lt;p&gt;Velprove's free plan runs HTTP and multi-step API monitors every 5 minutes from 5 regions, and free browser login monitors every 15 minutes. That means a broken Buy button or a failed cart API call will trigger an email alert within 5 to 15 minutes, depending on which monitor catches it. Starter at $19/mo drops HTTP and multi-step intervals to 1 minute and browser login monitors to 10 minutes.&lt;/p&gt;
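&lt;p&gt;The arithmetic behind those windows, as a back-of-envelope sketch: a failure that begins right after a check waits one full interval before the next run sees it, and half an interval on average. A retry-before-alert confirmation pass (an assumption here, not a documented Velprove policy) adds one more interval per confirming run:&lt;/p&gt;

```python
def detection_latency_minutes(interval_min: float, confirm_runs: int = 0):
    """(average, worst-case) minutes from failure onset to alert,
    ignoring check duration and email delivery time."""
    worst = interval_min * (1 + confirm_runs)
    average = interval_min / 2 + interval_min * confirm_runs
    return average, worst
```

&lt;p&gt;At the free plan's intervals that works out to roughly 2.5 to 5 minutes for the HTTP and multi-step layers and 7.5 to 15 minutes for the browser login layer, which is where the 5-to-15-minute window comes from.&lt;/p&gt;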

&lt;h2&gt;
  
  
  Stop being the last to know your checkout is broken
&lt;/h2&gt;

&lt;p&gt;Checkout is the page on your Shopify store that fails the most ways and gets the least monitoring. The fix is layered: HTTP body assertions on cart and checkout URLs, a multi-step API monitor on the cart-create path, and a browser login monitor that signs in as a synthetic test customer and verifies the authenticated session in a real browser. As of 2026, all three fit on Velprove's free plan with 10 monitors total, 5 regions (North America, Europe, United Kingdom, Asia, Oceania), email alerts, and commercial use allowed. Velprove is a Shopify checkout monitoring tool you can run free without a credit card. &lt;a href="https://velprove.com/signup" rel="noopener noreferrer"&gt;Set up free Shopify checkout monitoring with Velprove&lt;/a&gt;. If you also run a WooCommerce store, &lt;a href="https://velprove.com/blog/monitor-woocommerce-checkout" rel="noopener noreferrer"&gt;the same problem affects WooCommerce checkout&lt;/a&gt; and the same three-layer approach applies.&lt;/p&gt;

</description>
      <category>monitoring</category>
      <category>webdev</category>
      <category>devops</category>
      <category>uptime</category>
    </item>
  </channel>
</rss>
