velprove

Posted on May 5 • Originally published at velprove.com

How to Monitor a Next.js App in Production (2026)

#monitoring #webdev #devops #uptime

Quick take: A standard 200-OK uptime check on a Next.js app misses three real failure modes. Vercel ISR can serve stale content for an unbounded window when revalidation fails. Cold starts on archived functions add latency that can blow past your monitor timeout. Auth-protected routes can return 200 while the actual page is empty, redirected, or rendering an error boundary. The fix is layered: an HTTP monitor with a freshness assertion, a multi-step API monitor for token-auth API routes, and a browser login monitor for /dashboard. Velprove's free plan covers all three, with no credit card required.

Why a 200 OK on your Next.js app is not enough

Next.js sits on top of three things that a status-code monitor cannot see through: a caching layer that keeps serving old responses when revalidation fails, a serverless runtime that occasionally adds seconds to a response without breaking it, and an App Router error model that can wrap a thrown Server Component in a styled fallback page. Each of those returns a 200 to the curl-style request your monitor sends every 5 minutes. None of them means the application is healthy.

What does an HTTP monitor miss on a Next.js app? Three failure classes the framework introduces and a status-code check cannot see: ISR revalidation failures, cold-start latency on Vercel and Railway, and auth-protected routes that lie about their state. The same pattern that makes uptime monitors miss real outages on any framework gets worse on Next.js because the framework has more layers between the request and the response.

Monitoring Next.js ISR: when 200 OK serves stale content

Incremental Static Regeneration is the most monitor-blind feature in modern Next.js. The whole point of ISR is to keep serving the cached version while a revalidation runs in the background. When that background revalidation fails, the cached version keeps being served. Your reader sees a page from last Tuesday. Your monitor sees a 200.

Why ISR failures are invisible to status-code monitors

Per Vercel's ISR documentation, when a revalidation fails the platform "preserves the stale content and sets a 30-second TTL" before retrying. Failure is defined broadly: network timeouts, function execution errors, or any HTTP status outside the small allow-list of 200, 301, 302, 307, 308, 404, and 410. Every one of those failures is invisible at the HTTP layer of the next request, because the next request is still hitting the cached body. The retry loop runs every 30 seconds, indefinitely, until either the data source recovers or someone notices that the page is stale.
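
To make the failure definition above concrete, here is a minimal sketch of the rule as a pure function. The allow-list comes from the paragraph above; the `RevalidationAttempt` type and the function name are hypothetical and are not part of any Vercel or Next.js API.

```typescript
// Status codes the paragraph above lists as successful revalidation results.
const REVALIDATION_OK_STATUSES: ReadonlySet<number> = new Set([
  200, 301, 302, 307, 308, 404, 410,
]);

// Hypothetical model of one background revalidation attempt.
type RevalidationAttempt =
  | { kind: "response"; status: number }
  | { kind: "timeout" }
  | { kind: "functionError" };

// Returns true when the attempt would refresh the cache,
// false when the stale copy would be kept and retried later.
function revalidationSucceeded(attempt: RevalidationAttempt): boolean {
  if (attempt.kind !== "response") return false;
  return REVALIDATION_OK_STATUSES.has(attempt.status);
}
```

Under this model a 500 from the data source and a network timeout land in the same bucket: the visitor still receives the cached body with a 200, which is why the next request gives a status-code monitor nothing to alert on.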

What to assert against

The monitor needs a signal that changes when the page is fresh. Three options work in practice. First, a literal date string the route prints in the body, asserted with a body-contains check that you update on a cadence you control. Second, a build ID or deploy SHA the route prints in the body, asserted with body-contains and updated whenever you deploy. Third, the x-nextjs-cache response header, which Next.js documents as taking the values HIT, STALE, MISS, or REVALIDATED, asserted with a header-contains check.

The most portable pattern is a build ID or deploy SHA in the body. The x-nextjs-cache header is reliable when you read the response directly from the Next.js server, but Vercel and other CDN layers in front of the app can strip or rewrite it before the response reaches your monitor. A value printed in the body travels everywhere the body travels.
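
As a rough illustration of the monitor side, the sketch below treats the build ID in the body as the assertion and the x-nextjs-cache header as optional extra context. The function and type names are hypothetical, and this is not how any particular monitoring product implements its checks.

```typescript
type CacheState = "HIT" | "STALE" | "MISS" | "REVALIDATED";

// Parses the x-nextjs-cache header value. Returns null when the header is
// missing or unrecognized, for example because a CDN stripped or rewrote it.
function parseCacheState(headerValue: string | null | undefined): CacheState | null {
  const v = (headerValue ?? "").trim().toUpperCase();
  return v === "HIT" || v === "STALE" || v === "MISS" || v === "REVALIDATED" ? v : null;
}

interface FreshnessResult {
  fresh: boolean; // decided by the body alone
  cacheState: CacheState | null; // informational only
}

// Body-contains check against the build ID you expect after the latest deploy.
function assessFreshness(
  body: string,
  expectedBuildId: string,
  cacheHeader?: string | null,
): FreshnessResult {
  return {
    fresh: body.includes(expectedBuildId),
    cacheState: parseCacheState(cacheHeader),
  };
}
```

The verdict deliberately ignores the header, so the check behaves the same whether or not a CDN layer passes it through.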

What this looks like in Velprove

Configure an API monitor against a public ISR page or a small JSON route, then add a JSON path or body-contains assertion against a freshness signal, like a date string or a build SHA the route prints in the response. The full configuration walk-through, including how to monitor your /api/health route with JSON validation, is the right next read. The Next.js-specific piece is just deciding what value to put in the body.
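
One possible shape for that value, sketched as a pure function that a page or small JSON route could call while rendering. `VERCEL_GIT_COMMIT_SHA` is a Vercel system environment variable; `BUILD_ID` is a hypothetical fallback name for other hosts, and the payload field names are assumptions.

```typescript
// Hypothetical freshness payload printed by the route.
interface FreshnessPayload {
  buildId: string;
  generatedAt: string; // ISO timestamp of when this output was rendered
}

// env is passed in (for example process.env) so the function stays pure and testable.
function buildFreshnessPayload(
  env: Record<string, string | undefined>,
  now: Date,
): FreshnessPayload {
  return {
    // BUILD_ID is an assumed variable name for non-Vercel hosts.
    buildId: env["VERCEL_GIT_COMMIT_SHA"] ?? env["BUILD_ID"] ?? "unknown",
    generatedAt: now.toISOString(),
  };
}
```

In an ISR route, code like this runs when the page is built or regenerated, so `generatedAt` should stop advancing if regeneration keeps failing, while `buildId` changes only on deploy.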

Vercel cold starts and what to monitor

Cold starts on Vercel are not a bug; they are a property of the runtime. The monitor needs to know they exist so its timeout does not turn an occasional cold boot into a fake outage page.

Cold starts on Node vs. Edge runtime

Vercel's runtime documentation is direct about it: serverless applications "will always have the notion of cold starts." Fluid Compute, the default for new projects since April 23, 2025, reduces the likelihood of cold starts through optimized concurrency, but the docs note "it can still happen such as during periods of low traffic." The most concrete latency claim Vercel publishes is for archived functions, which are unarchived on first invocation and can take "at least 1 second longer than usual" on that boot. The Edge runtime is the architectural escape hatch: it is built on V8 isolates that "don't require a container or virtual machine," which removes the microVM startup cost. Edge has its own constraints, including no support for Cache Components, so the choice is route-by-route, not app-wide.

Preview deployments are not what you want to monitor

Preview URLs change every deploy, and the underlying functions are archived after 48 hours of inactivity, compared to 2 weeks for production functions. That archival window is short enough that a monitor pointed at a branch preview URL will hit a cold-start penalty on most checks during a quiet weekend. The result is a monitor that looks unhealthy whenever the team is not pushing, which is exactly when you want a clear signal. Monitor your production domain. Use preview URLs for human eyeballs and CI smoke tests, not for synthetic uptime checks. If you want to verify a preview deploy before promotion, run an ad-hoc check against the commit-specific URL and discard it after the preview is merged.

Monitoring auth-protected routes (the /dashboard problem)

Why your /dashboard route can lie to an HTTP monitor

There are three ways a Next.js /dashboard route can return 200 while being broken: a styled error boundary, a redirect to the login page that the monitor follows, or an empty shell from a Server Component that fetched zero rows. The response your monitor sees on a route like /dashboard varies based on your error.tsx boundary and your proxy configuration. A thrown Server Component error can render a styled error page inside an error response. A missing session can issue a redirect to the login page, which most monitors follow and report as a 200 on the login screen. A working build can render an empty shell because the Server Component fetched zero rows. None of these are caught by checking the status code.
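
The three cases can be told apart only by looking past the status code. The sketch below shows the kind of assertions involved, assuming you control a marker string in your error.tsx output and another in the real dashboard UI; every name here is hypothetical.

```typescript
// What a check observed after following redirects (field names are hypothetical).
interface ObservedPage {
  status: number;
  finalPath: string; // path of the URL the check ended on
  body: string;
}

// Assumed markers: strings you control in your login route, error.tsx, and dashboard UI.
interface DashboardMarkers {
  loginPath: string;
  errorMarker: string;
  contentMarker: string;
}

type DashboardVerdict =
  | "healthy"
  | "bad_status"
  | "redirected_to_login"
  | "error_boundary"
  | "empty_shell";

function classifyDashboard(page: ObservedPage, m: DashboardMarkers): DashboardVerdict {
  if (page.status !== 200) return "bad_status";
  if (page.finalPath.startsWith(m.loginPath)) return "redirected_to_login";
  if (page.body.includes(m.errorMarker)) return "error_boundary";
  if (!page.body.includes(m.contentMarker)) return "empty_shell";
  return "healthy";
}
```

Only the first branch is something a status-code monitor can see; the other three all arrive as a 200.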

Browser login monitors for Next.js auth

A browser login monitor signs in the way a user would: it opens your login page, types credentials, clicks submit, and asserts that a piece of post-login content actually rendered. That is the only check that distinguishes a working dashboard from a redirect-to-login or an empty error boundary. Use a test account, not a real admin account, and scope its permissions to read-only. The full setup, including selectors and assertions, is in the browser login monitor walkthrough for SaaS auth.

Multi-step API monitors for token-auth API routes

For App Router API routes that require a bearer token, chain a login → token → protected-call monitor that captures the token from the first response and replays it against the protected route.
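
A minimal sketch of that chain, written against an injected HTTP function so it can run without a network. The `/api/login` and `/api/projects` paths, the `token` field in the login response, and all type names are assumptions for illustration; substitute whatever your own routes use.

```typescript
interface HttpResult {
  status: number;
  json: unknown;
}

// Injected transport: in production this would wrap fetch, in tests a fake.
type HttpCall = (req: {
  method: "GET" | "POST";
  url: string;
  headers?: Record<string, string>;
  body?: unknown;
}) => Promise<HttpResult>;

// Pulls a non-empty string token out of the login response, or null.
function extractToken(json: unknown): string | null {
  if (typeof json !== "object" || json === null) return null;
  const token = (json as Record<string, unknown>)["token"];
  return typeof token === "string" && token.length > 0 ? token : null;
}

interface ChainResult {
  ok: boolean;
  failedStep?: "login" | "token" | "protected";
}

// Step 1: log in. Step 2: capture the token. Step 3: replay it on the protected route.
async function runAuthChain(
  call: HttpCall,
  baseUrl: string,
  credentials: { email: string; password: string },
): Promise<ChainResult> {
  const login = await call({ method: "POST", url: `${baseUrl}/api/login`, body: credentials });
  if (login.status !== 200) return { ok: false, failedStep: "login" };

  const token = extractToken(login.json);
  if (token === null) return { ok: false, failedStep: "token" };

  const protectedCall = await call({
    method: "GET",
    url: `${baseUrl}/api/projects`,
    headers: { Authorization: `Bearer ${token}` },
  });
  if (protectedCall.status !== 200) return { ok: false, failedStep: "protected" };

  return { ok: true };
}
```

Reporting which step failed matters in practice, because a broken login endpoint and a protected route that rejects valid tokens usually point to different owners.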

Monitoring self-hosted Next.js and Railway deployments

Self-hosted Next.js

Running Next.js on your own Node.js process removes the platform layer but keeps the framework layer. The error.tsx boundary still wraps thrown Server Component errors. There is no built-in liveness route, so app/api/health/route.ts is your responsibility to write, populate, and keep honest. Auth gating typically lives in proxy.ts (named middleware.ts before the Next.js 16 rename), which runs on the Node.js runtime by default. Treat the proxy file as a routing concern that can fail like any other route, and make sure your monitor pattern catches a proxy that throws an error instead of redirecting cleanly.
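
A hedged sketch of what the core of such a health route might compute. The payload shape, the subsystem names, and the choice to answer 503 when any check fails are assumptions made for this example, not a Next.js convention.

```typescript
// Result of one subsystem probe, e.g. a cheap database query (names are hypothetical).
interface SubsystemCheck {
  name: string;
  ok: boolean;
}

interface HealthPayload {
  status: "ok" | "degraded";
  checks: Record<string, boolean>;
}

// Builds the JSON body and the HTTP status the route would return.
function buildHealthResponse(results: SubsystemCheck[]): {
  httpStatus: 200 | 503;
  payload: HealthPayload;
} {
  const checks: Record<string, boolean> = {};
  for (const r of results) checks[r.name] = r.ok;
  const allOk = results.every((r) => r.ok);
  return {
    httpStatus: allOk ? 200 : 503,
    payload: { status: allOk ? "ok" : "degraded", checks },
  };
}

// In app/api/health/route.ts, a GET handler could run the probes and then return
// Response.json(payload, { status: httpStatus }) with the values computed here.
```

Keeping the probes cheap and few is the part that keeps the route honest: a health check that always reports "ok" because it tests nothing is just a slower status-code check.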

Railway sleep and cold-boot

Railway services with Serverless enabled enter sleep mode when "no packets are sent from the service for over 10 minutes." The first request to a slept service wakes it, with a small delay that Railway describes as a "cold boot time." Two non-obvious gotchas matter for monitoring. First, the inactivity trigger is outbound packets, so a Next.js app that receives inbound traffic but does no outbound polling can still sleep. Second, your uptime monitor itself is inbound traffic, so a frequent monitor will keep the service awake and mask the cold-boot behavior your real users hit at midnight. Monitor at a realistic interval and accept the cold boot as part of the SLO.
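
The second gotcha reduces to a comparison between your check interval and the sleep threshold. A tiny sketch, taking the 10-minute figure from the quote above and treating everything else as an assumption:

```typescript
// Sleep threshold quoted above for Railway services with Serverless enabled.
const RAILWAY_SLEEP_AFTER_MINUTES = 10;

// True when checks arrive often enough that the service is unlikely to sleep,
// which means the monitor will rarely or never observe a cold boot.
function monitorMasksColdBoot(
  intervalMinutes: number,
  sleepAfterMinutes: number = RAILWAY_SLEEP_AFTER_MINUTES,
): boolean {
  return intervalMinutes <= sleepAfterMinutes;
}
```

A 5-minute interval returns true (the cold boot is masked) and a 15-minute interval returns false, at which point the monitor timeout has to tolerate the wake-up delay.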

What to monitor on every Next.js app, by route type

Different parts of a Next.js app fail in different ways, and they need different monitor types. Treating the whole app as one URL behind one HTTP monitor is the cheapest way to miss the failure modes covered above. The table below maps the five common route types to the monitor type that actually catches their failure modes.

| Route type | Monitor type | What to assert |
| --- | --- | --- |
| Static / ISR public pages | HTTP body assertion | Freshness signal (date string in body, or build ID/deploy SHA in body) |
| `app/api/health/route.ts` | API monitor with JSON validation | `$.status equals "ok"` + response time threshold |
| `app/api/<resource>/route.ts` (auth required) | Multi-step API monitor | Login, call, assert |
| `/dashboard` and other Server Component pages behind auth | Browser login monitor | Sign in and assert post-login content |
| Webhook receivers (`app/api/webhooks/.../route.ts`) | HTTP monitor + dead-letter alert | Status + log digest |

Static and ISR pages are the ones most likely to silently rot, so they need the freshness assertion specifically. The dedicated /api/health route should return JSON with a status field plus the few subsystem flags you actually care about, monitored with a JSON path assertion and a response-time threshold tight enough to catch a slow database. Token-protected API routes need the chained multi-step pattern because a single GET cannot prove the auth flow still works end to end. Server Component pages behind auth need the browser monitor because no HTTP-level check can distinguish a real dashboard from a styled error page. Webhook receivers usually need a status-code monitor plus a separate alert on dead-letter queue depth, because the receiver can return 200 while quietly dropping payloads downstream.
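
For the /api/health row, the two assertions can be sketched as a single evaluation over what the check observed. The observation shape and the function name are hypothetical; the assertions themselves are the `$.status equals "ok"` check and the response-time threshold described above.

```typescript
// What one health check observed (field names are hypothetical).
interface HealthObservation {
  httpStatus: number;
  json: unknown;
  elapsedMs: number;
}

// Returns a list of failed assertions; an empty list means the check passed.
function evaluateHealthCheck(obs: HealthObservation, maxElapsedMs: number): string[] {
  const failures: string[] = [];
  if (obs.httpStatus !== 200) {
    failures.push(`unexpected HTTP status ${obs.httpStatus}`);
  }
  // Equivalent of the JSON path assertion $.status equals "ok".
  const status =
    typeof obs.json === "object" && obs.json !== null
      ? (obs.json as Record<string, unknown>)["status"]
      : undefined;
  if (status !== "ok") {
    failures.push('$.status is not "ok"');
  }
  if (obs.elapsedMs > maxElapsedMs) {
    failures.push(`response took ${obs.elapsedMs} ms, limit is ${maxElapsedMs} ms`);
  }
  return failures;
}
```

Collecting every failure instead of stopping at the first makes the alert more useful, since a slow database often shows up as both a degraded status and a blown time budget.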

Setting up a Next.js monitor in Velprove (free)

The free plan covers an HTTP monitor with body assertions, an API monitor with JSON path assertions, and one browser login monitor running every 15 minutes (or slower). That is enough to cover the three failure modes above for one production Next.js app. Every plan, including the free one, runs checks from all five regions: North America, Europe, UK, Asia, and Oceania.

1. Sign up for a free Velprove account. No credit card is required, and the free plan includes 10 monitors, 1 browser login monitor, and email alerts.
2. Add an HTTP monitor pointing at your Next.js production domain. Use the production URL, not a Vercel preview URL, so cold starts on archived preview functions do not pollute the signal.
3. Add a body or JSON path assertion on a freshness signal. A date string or build SHA printed in the response body works on any Next.js setup. Match the value with a body-contains assertion and update it when you deploy.
4. Add a browser login monitor for your /dashboard route. Create a low-privilege test account first and use those credentials for the monitor, never a real admin login.
5. Configure your alert channel. The free plan sends email alerts. Slack, webhook, Discord, and Microsoft Teams are available on the Starter plan at $19 per month, and PagerDuty is available on Pro at $49 per month. For non-email channels, paste the webhook URL or routing key into Settings, Notifications first, then pick that channel on the monitor.

Velprove runs on Next.js. We run an API monitor with body validation on /api/health, HTTP body-validation checks on our marketing pages and dashboard route, and a browser login monitor that signs in and asserts it landed on /dashboard. That is the same layered setup this post recommends, running across all five regions. If you want the broader context outside the framework, the solo founder's broader monitoring playbook covers what to monitor across the rest of the stack. Otherwise, start for free and have the three monitors above running in about 10 minutes.

Frequently Asked Questions

How do you monitor a Next.js app in production?

Use a layered setup that matches the three failure modes the framework introduces. An HTTP monitor with a body assertion on a freshness signal catches ISR revalidation failures that return 200 with stale content. An API monitor with JSON path validation on app/api/health/route.ts catches subsystem failures. A browser login monitor on /dashboard catches auth-protected route errors that an HTTP check cannot see. The free plan on Velprove covers all three.

Does Next.js have a built-in health endpoint?

No. Next.js does not include a default liveness or readiness route, so you create your own at app/api/health/route.ts in the App Router. The recommended pattern is a small JSON response with a status field plus a few flags for the subsystems that actually matter for uptime, such as the database and any critical upstream dependency. The full design discussion is in the dedicated /api/health route guide.

What is the best way to detect Vercel ISR revalidation failures?

Status-code monitors miss it because Vercel keeps serving the existing cached 200 response and retries revalidation every 30 seconds in the background. The reliable signal is a freshness assertion: a date string or a build ID printed in the response body, or the x-nextjs-cache response header, which takes the values HIT, STALE, MISS, or REVALIDATED. A value printed in the body is the most portable across CDN configurations, because CDN layers can strip or rewrite the header.

How do I monitor cold starts on Vercel?

Set your monitor timeout based on observed p95 plus headroom for cold-boot variance, not on Vercel's function maximum. Per Vercel docs, archived functions can take "at least 1 second longer than usual" on the first invocation after archival, and Fluid Compute reduces but does not eliminate cold starts. Monitor your production domain rather than preview URLs, since preview functions are archived after 48 hours of inactivity and will cold-boot on most checks.
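
One way to turn "observed p95 plus headroom" into a number, as a sketch. The nearest-rank percentile method and the 1000 ms default headroom are assumptions; the default mirrors the "at least 1 second" figure quoted above, which is a floor rather than a ceiling, so measure your own cold boots before relying on it.

```typescript
// Nearest-rank percentile over observed response times in milliseconds.
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) throw new Error("need at least one sample");
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

// Suggested monitor timeout: observed p95 plus headroom for a cold boot.
function suggestTimeoutMs(samplesMs: number[], coldStartHeadroomMs: number = 1000): number {
  return percentile(samplesMs, 95) + coldStartHeadroomMs;
}
```

With twenty samples evenly spread from 10 ms to 200 ms, the p95 is 190 ms and the suggested timeout is 1190 ms.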

Can I monitor a Next.js app on Railway or self-hosted?

Yes, with one platform-specific note for each. Railway services with Serverless enabled sleep after 10 minutes without outbound packets and incur a cold-boot delay on the first request that wakes them, so calibrate your monitor interval and timeout accordingly. Self-hosted Next.js behaves like any other Node.js HTTP service: the same three monitor types apply, and you remain responsible for writing the /api/health route and the proxy.ts auth gates yourself.

How do you monitor a Next.js auth-protected page?

A browser login monitor signs in as a dedicated low-privilege test user and asserts that an element from the post-login UI actually rendered. HTTP monitors cannot tell a working /dashboard apart from a login redirect or a styled error boundary, because all three can return 200. Always use a separate test account scoped to read-only permissions for monitoring, never a real admin login, and rotate the credentials on the same cadence as your other secrets.
