Node.js cron job monitoring becomes important the first time a scheduled task quietly stops doing its job.
Your API can be healthy. Your frontend can load. Your uptime monitor can stay green. Meanwhile, a billing sync, cleanup task, report generator, or import job may have stopped running days ago.
That is the tricky part about cron-style work: the failure is often not visible from the outside.
The problem
Node.js scheduled jobs often run outside the normal request path, invisible to users and uptime checks.
They might handle:
- daily email digests
- payment retries
- database cleanup
- cache refreshes
- scheduled notifications
- data imports
- report generation
- third-party API syncs
When one of these breaks, there may be no customer-facing error at first. The job is simply missing.
That missing work can become stale data, failed billing, unprocessed records, or support tickets later.
Why it happens
Node.js cron jobs can break in obvious and non-obvious ways.
A simple job might look like this:
cron.schedule('0 * * * *', async () => {
  await syncCustomers();
});
This can fail because syncCustomers() throws. But scheduled jobs can also fail because:
- the worker process crashed
- the scheduler was not started after deploy
- environment variables changed
- the cron expression is wrong
- the job hangs on an external API
- database queries never return
- the job overlaps with itself
- multiple app instances run the same task
- a server timezone changed
- errors are caught and only logged
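Some of these, like a job overlapping with itself, can be guarded against inside the process. Here is a minimal sketch of an overlap guard; `guarded` and `doWork` are illustrative names, not part of any library:

```javascript
// Minimal in-process overlap guard: if the previous run is still
// going, skip this run instead of starting a second copy.
let running = false;

async function guarded(doWork) {
  if (running) {
    console.log('Previous run still in progress, skipping');
    return 'skipped';
  }
  running = true;
  try {
    return await doWork();
  } finally {
    running = false;
  }
}
```

This only protects a single process; multiple app instances need a shared lock, which is covered later.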
A common mistake is forgetting proper async handling:
cron.schedule('*/15 * * * *', () => {
  syncInventory(); // missing await / error handling
});
This can make production failures harder to notice.
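One way to avoid repeating that mistake is a small wrapper that catches rejections for any job passed to the scheduler. This is a sketch; `safeJob` is an illustrative helper, not part of node-cron:

```javascript
// Wrap an async job so rejections are caught and logged instead of
// becoming unhandled promise rejections inside the scheduler.
function safeJob(name, fn) {
  return async () => {
    try {
      await fn();
    } catch (error) {
      console.error(`${name} failed:`, error);
    }
  };
}
```

Usage would look like `cron.schedule('*/15 * * * *', safeJob('inventory-sync', syncInventory))`.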
Why it's dangerous
Missed scheduled jobs rarely create one neat incident.
They create slow damage.
A sync that fails once may not matter. A sync that fails for three days can create stale data, missing records, broken reports, or customer confusion.
The longer the issue continues, the more painful recovery becomes:
- more data needs reprocessing
- duplicate work becomes more likely
- logs may rotate away
- manual fixes become risky
- customers may notice first
Uptime monitoring does not solve this. It tells you whether an endpoint responds. It does not tell you whether your scheduled jobs actually completed.
How to detect it
The core monitoring question is simple:
Did the job send a success signal within the expected time window?
This is usually called heartbeat monitoring.
The pattern is:
- The scheduled job runs.
- It completes the important work.
- It sends a heartbeat ping.
- A monitor expects that ping on schedule.
- If the ping does not arrive, someone gets alerted.
For example:
- a 15-minute job should check in every 15–20 minutes
- an hourly job should check in every 60–70 minutes
- a daily job should check in every 24–26 hours
This catches problems like missed runs, crashed workers, bad deploys, disabled schedulers, and jobs that hang before completion.
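The monitor-side check behind this pattern is small. A sketch, with illustrative names; a real service would persist the last ping and run this check on a timer:

```javascript
// Is a heartbeat overdue? expectedIntervalMs is the job's schedule,
// graceMs is the slack (e.g. a 15-minute job with 5 minutes of grace
// alerts after 20 minutes of silence).
function isOverdue(lastPingAt, expectedIntervalMs, graceMs, now = Date.now()) {
  return now - lastPingAt > expectedIntervalMs + graceMs;
}
```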
Simple solution
Here is a basic example using node-cron.
npm install node-cron
import cron from 'node-cron';
async function runJob() {
  console.log('Starting customer sync');
  await syncCustomers();
  await fetch('https://quietpulse.xyz/ping/{token}');
  console.log('Customer sync completed');
}

cron.schedule('0 * * * *', async () => {
  try {
    await runJob();
  } catch (error) {
    console.error('Customer sync failed:', error);
    // report to your error tracker or alerting channel here;
    // setting process.exitCode has no effect in a long-running scheduler
  }
});
The key detail: send the heartbeat after the work succeeds.
Do not do this:
await fetch('https://quietpulse.xyz/ping/{token}');
await syncCustomers();
If the sync fails after the ping, your monitor will think the job succeeded.
For older Node.js versions, use a small HTTP client:
npm install undici
import { fetch } from 'undici';
await fetch('https://quietpulse.xyz/ping/{token}');
You can also add a timeout:
async function sendHeartbeat() {
  const controller = new AbortController();
  const timeout = setTimeout(() => controller.abort(), 5000);
  try {
    await fetch('https://quietpulse.xyz/ping/{token}', {
      signal: controller.signal,
    });
  } finally {
    clearTimeout(timeout);
  }
}
Then call it after the job finishes:
async function runJob() {
  await syncCustomers();
  await sendHeartbeat();
}
Instead of building the monitoring side yourself, you can use a heartbeat monitoring service. The important part is the pattern: each successful job run should create an external signal, and missing signals should trigger alerts.
Common mistakes
1. Pinging too early
If you send a heartbeat before the real work, failures after that point are hidden.
Send the heartbeat after successful completion.
2. Relying only on process uptime
A process can be running while the scheduled task is broken.
PM2, Docker, systemd, or Kubernetes can tell you whether a process exists. They cannot always tell you whether a specific job completed.
3. Ignoring long runtimes
A job that usually takes 20 seconds but now takes 30 minutes may be failing in a slower way.
Long runtimes can cause overlap, stale data, and queue buildup.
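A lightweight way to notice this is to time each run and warn past a threshold. A sketch; `timed` is an illustrative helper:

```javascript
// Run a job, measure how long it took, and warn if it exceeded
// the expected maximum runtime.
async function timed(name, maxMs, fn) {
  const start = Date.now();
  try {
    return await fn();
  } finally {
    const elapsed = Date.now() - start;
    if (elapsed > maxMs) {
      console.warn(`${name} took ${elapsed}ms, expected under ${maxMs}ms`);
    }
  }
}
```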
4. Running jobs on every app instance
If your app runs on multiple servers and each one starts the scheduler, the same job may run multiple times.
Use a dedicated worker, external scheduler, or distributed lock when needed.
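The lock logic itself is simple. In production the store would be Redis (`SET key value NX PX ttl`) or a database row; a Map stands in here so the shape of the pattern is runnable:

```javascript
// Sketch of a TTL-based lock: only one caller may hold a key at a
// time, and stale locks expire so a crashed worker cannot block
// the job forever.
const locks = new Map();

function acquireLock(key, ttlMs, now = Date.now()) {
  const expiresAt = locks.get(key);
  if (expiresAt !== undefined && expiresAt > now) return false; // held by someone else
  locks.set(key, now + ttlMs);
  return true;
}

function releaseLock(key) {
  locks.delete(key);
}
```

Each instance would call `acquireLock('customer-sync', ttl)` at the top of the job and skip the run if it returns false.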
5. Swallowing errors
Logging errors is useful, but it is not the same as alerting.
try {
  await syncCustomers();
} catch (error) {
  console.error(error);
}
If nobody reads the logs, this is still a silent failure.
Alternative approaches
Logs
Logs are useful for debugging what happened. They are weaker at detecting something that never happened.
If the job never ran, there may be no log line.
Error tracking
Error tracking tools can catch thrown exceptions and rejected promises.
They help when a job starts and fails loudly. They do not catch every missed run, disabled scheduler, or stuck process.
Uptime checks
Uptime checks are great for websites and APIs.
They do not confirm that a background job completed.
Queue dashboards
If your scheduled job creates queue work, queue metrics can help. Watch queue depth, retries, failed jobs, and processing latency.
But queue metrics may not catch the scheduler failing to enqueue work in the first place.
Database timestamps
You can store last_success_at in your database.
This works, but you still need something that checks whether the timestamp is too old and sends an alert.
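The checking side of that approach might look like this sketch, with an in-memory variable standing in for the `last_success_at` column:

```javascript
// In production this lives in the database; the shape of the check
// is the same either way.
let lastSuccessAt = null;

function recordSuccess(now = Date.now()) {
  lastSuccessAt = now;
}

// A separate alerting loop would call this periodically; a job that
// has never succeeded counts as stale.
function isStale(maxAgeMs, now = Date.now()) {
  return lastSuccessAt === null || now - lastSuccessAt > maxAgeMs;
}
```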
FAQ
What is Node.js cron job monitoring?
It is the practice of checking whether scheduled Node.js tasks run successfully when expected. This includes jobs for syncs, cleanup, billing, reports, imports, and other background work.
How do I detect if a Node.js cron job stopped running?
Send a heartbeat after each successful run. If the heartbeat does not arrive within the expected interval, alert someone.
Are logs enough for Node.js scheduled jobs?
No. Logs help with debugging, but they do not reliably detect missed runs. If the job never starts, logs may not show anything useful.
Should cron jobs run inside the main Node.js app?
For small apps, it can work. For production systems, a dedicated worker, external scheduler, or distributed lock is usually safer.
Conclusion
Node.js cron job monitoring is about detecting missing work, not just errors.
A scheduled job can stop running while the rest of your app looks healthy. Add a heartbeat after successful completion, alert when it goes missing, and you will catch silent failures much earlier.
Originally published at https://quietpulse.xyz/blog/node-js-cron-job-monitoring-best-practices