Cron job monitoring usually answers one question: did the job run?
That's the wrong question.
Your job ran last night. Exited 0. No exceptions in the logs. Your uptime dashboard is green. And somewhere in your database, a table that should have 50,000 new rows from last night's sync has zero.
Not an error. Not a crash. A silence.
The job ran. It just didn't do anything. Maybe the upstream API returned an empty response. Maybe a config change stopped the data flowing. Maybe a schema migration broke a query in a way that returns zero rows instead of throwing. The job saw nothing wrong. So it reported nothing wrong.
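A minimal sketch of how this plays out in code. The names here (`fetchUpstreamRecords`, `nightlySync`) are hypothetical, but the shape is the point: every step "succeeds" even when no work is done.

```typescript
type UpstreamRecord = { id: string };

// Stand-in for the upstream API. Imagine a config change upstream now
// yields an empty response -- no error, just no data.
async function fetchUpstreamRecords(): Promise<UpstreamRecord[]> {
  return [];
}

// Returns the number of records written. Nothing throws on empty input,
// so the process exits 0 and every layer in the chain reports success.
async function nightlySync(): Promise<number> {
  const records = await fetchUpstreamRecords();
  let written = 0;
  for (const record of records) {
    written += 1; // database insert would go here; never runs on []
  }
  return written;
}
```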
You find out when a customer emails you.
This is a silent failure — and it is the hardest class of production failure to catch because the entire failure chain reports success. Your cron scheduler: success. Your job runner: success. Your cron job monitoring tool: success. Meanwhile your data is stale, your pipeline is broken, and nobody knows.
## Why cron job monitoring misses silent failures
The standard model for cron job monitoring is heartbeat monitoring. Your job pings a URL when it runs. If the ping doesn't arrive within a grace period, you get an alert. Simple and effective for missed runs.
Some tools extend this to start/finish pings, which adds hung job detection — jobs that start but never complete. Valuable. But both approaches are still asking only one question: did the job execute?
Silent failures require asking a different question: did the job accomplish anything?
That requires data from inside the job. Data that only your code can provide. No external monitoring tool can know that your nightly sync processed zero records unless you tell it — which is exactly what Crontify's alert rules are designed for.
## How to detect silent failures with alert rules
Crontify introduces alert rules on job output metadata.
When your job calls success(), you attach a metadata object describing what the run actually did:
```typescript
import { CrontifyMonitor } from '@crontify/sdk';

const monitor = new CrontifyMonitor({
  apiKey: process.env.CRONTIFY_API_KEY!,
  monitorId: 'your-monitor-id',
});

await monitor.success({
  meta: {
    rows_processed: 0,
    records_synced: 0,
    api_calls_made: 14,
  },
});
```
You then define alert rules against that metadata in the dashboard:
- `rows_processed eq 0` → fire alert
- `records_synced lt 100` → fire alert
- `api_calls_made gt 1000` → fire alert
The run is logged as a success. Your historical success rate is untouched. But when a rule fires, Crontify sends an immediate alert to your configured channels — Slack, email, Discord, or webhook.
You can stack multiple rules per monitor and mix operators (eq, lt, gt, ne). This is silent failure detection built into the monitoring layer, where it belongs — not bolted onto your job's business logic.
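To make the semantics concrete, here is a sketch of how rules like these can be evaluated against run metadata. The operator names (`eq`, `lt`, `gt`, `ne`) come from the article; the evaluation code itself is illustrative, not Crontify's implementation.

```typescript
type Operator = 'eq' | 'lt' | 'gt' | 'ne';

interface AlertRule {
  field: string; // metadata key, e.g. 'rows_processed'
  op: Operator;
  value: number;
}

function ruleFires(rule: AlertRule, meta: Record<string, number>): boolean {
  const actual = meta[rule.field];
  if (actual === undefined) return false; // field absent: rule can't fire
  switch (rule.op) {
    case 'eq': return actual === rule.value;
    case 'lt': return actual < rule.value;
    case 'gt': return actual > rule.value;
    case 'ne': return actual !== rule.value;
  }
}

// The three rules from the dashboard example above:
const rules: AlertRule[] = [
  { field: 'rows_processed', op: 'eq', value: 0 },
  { field: 'records_synced', op: 'lt', value: 100 },
  { field: 'api_calls_made', op: 'gt', value: 1000 },
];

const meta = { rows_processed: 0, records_synced: 0, api_calls_made: 14 };
const fired = rules.filter((r) => ruleFires(r, meta));
// The first two rules fire; api_calls_made stays quiet at 14.
```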
## The three failure types every cron job monitor should catch
Silent failures are the hardest class to catch. The three below are the ones every developer already knows to watch for:
Missed runs. Your job didn't start at all. Crontify parses your cron expression and knows when each run is expected. If no start ping arrives within your configured grace period, an alert fires. Works with any cron syntax including complex multi-part expressions and timezone-aware schedules.
Hung jobs. Your job started but hasn't finished within your threshold. Catches deadlocks, stuck database queries, and infinite loops that don't throw — the class of failure that keeps a process alive and doing nothing forever. Without start/finish pings, a standard heartbeat monitor can't distinguish "job finished in 2 seconds" from "job has been running for 6 hours".
Failed jobs. Your job explicitly called fail(), or the SDK caught an unhandled exception and reported it automatically.
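The distinction between these states can be captured in a few lines. This is illustrative logic following the article's description — grace period for the start ping, maximum duration for the finish ping — not Crontify's actual scheduler code.

```typescript
interface RunState {
  expectedAt: number;  // scheduled start, epoch ms
  startedAt?: number;  // start ping time, if any
  finishedAt?: number; // success/fail ping time, if any
}

function classify(
  run: RunState,
  now: number,
  graceMs: number,       // grace period for the start ping
  maxDurationMs: number, // hung-job threshold
): 'waiting' | 'running' | 'finished' | 'missed' | 'hung' {
  if (run.finishedAt !== undefined) return 'finished';
  if (run.startedAt === undefined) {
    // No start ping yet: missed once the grace period has elapsed.
    return now > run.expectedAt + graceMs ? 'missed' : 'waiting';
  }
  // Started but not finished: hung once the duration threshold passes.
  return now > run.startedAt + maxDurationMs ? 'hung' : 'running';
}
```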
## Attaching log output to failed runs
When a cron job fails, you need context to diagnose it quickly. Crontify lets you attach the full log output directly to the failure ping:
```typescript
try {
  await runJob();
  await monitor.success();
} catch (err) {
  // In TypeScript, the catch variable is `unknown` -- narrow before reading.
  const message = err instanceof Error ? err.message : String(err);
  const log = err instanceof Error ? err.stack : undefined;
  await monitor.fail({ message, log });
  throw err;
}
```
Up to 10,000 characters, stored separately and delivered directly in the Slack or email alert. No switching to your logging infrastructure at 2am. No searching through CloudWatch or Datadog to find what broke. The context is in the notification.
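The 10,000-character limit comes from the docs above; this helper, and the choice to keep the *end* of the log (where stack traces and final error lines usually live), are an assumed client-side convention, not part of the SDK.

```typescript
const LOG_LIMIT = 10_000;

// Trim a log to the ping size limit, preserving the tail.
function trimLog(log: string, limit: number = LOG_LIMIT): string {
  if (log.length <= limit) return log;
  const marker = '…[truncated]…\n';
  return marker + log.slice(log.length - (limit - marker.length));
}
```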
## Instrumenting your jobs in under a minute
The SDK is zero-dependency TypeScript, published to npm as @crontify/sdk.
```shell
npm install @crontify/sdk
```
wrap() handles start, success, and fail pings automatically:
```typescript
import { CrontifyMonitor } from '@crontify/sdk';

const monitor = new CrontifyMonitor({
  apiKey: process.env.CRONTIFY_API_KEY!,
  monitorId: 'your-monitor-id',
});

await monitor.wrap(async () => {
  const result = await syncDatabase();
  return {
    meta: {
      rows_processed: result.rowCount,
      duration_ms: result.durationMs,
    },
  };
});
```
If the function throws, fail() is called automatically with the error message. If it completes, success() is called with whatever metadata you return. If it never completes within your hung job threshold, the scheduler catches it on the next detection cycle.
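A hypothetical sketch of what a `wrap()` like this does under the hood — start ping, run the function, then success or fail. The `Pinger` interface and this implementation are illustrative; the real SDK's internals may differ.

```typescript
interface SuccessPayload {
  meta?: Record<string, number>;
}

interface Pinger {
  start(): Promise<void>;
  success(payload?: SuccessPayload): Promise<void>;
  fail(payload: { message: string }): Promise<void>;
}

async function wrap(
  pinger: Pinger,
  job: () => Promise<SuccessPayload | undefined>,
): Promise<void> {
  await pinger.start();
  try {
    const result = await job();
    await pinger.success(result); // metadata, if any, rides along
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    await pinger.fail({ message });
    throw err; // rethrow so the process still exits non-zero
  }
}
```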
If you manage multiple monitors in the same process, CrontifyClient caches instances by ID:
```typescript
import { CrontifyClient } from '@crontify/sdk';

const crontify = new CrontifyClient({ apiKey: process.env.CRONTIFY_API_KEY! });

await crontify.monitor('mon_abc123').wrap(async () => { await syncUsers(); });
await crontify.monitor('mon_def456').wrap(async () => { await syncOrders(); });
```
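Caching by ID is a standard memoization pattern, sketched below with a stand-in `Monitor` class (not the real SDK type): same ID, same instance, so repeated `monitor()` calls are cheap.

```typescript
class Monitor {
  constructor(readonly monitorId: string, readonly apiKey: string) {}
}

class MonitorCache {
  private readonly instances = new Map<string, Monitor>();

  constructor(private readonly apiKey: string) {}

  monitor(id: string): Monitor {
    let m = this.instances.get(id);
    if (!m) {
      m = new Monitor(id, this.apiKey);
      this.instances.set(id, m);
    }
    return m; // same ID always returns the same instance
  }
}
```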
Not using TypeScript? The ping endpoints are plain HTTP — `/api/v1/ping/{id}/start`, `/success`, `/fail`. Any language that can make an HTTP request works: Python, Go, Bash, PHP, Ruby.
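Hitting those endpoints directly looks like this, sketched with `fetch`. The paths come from the article; the base URL and the absence of an auth header here are assumptions — check the dashboard for the exact values your monitor expects.

```typescript
const BASE = 'https://crontify.com'; // assumed base URL

type PingEvent = 'start' | 'success' | 'fail';

function pingUrl(monitorId: string, event: PingEvent): string {
  return `${BASE}/api/v1/ping/${monitorId}/${event}`;
}

// Fire-and-forget ping; any HTTP client in any language does the same.
async function ping(monitorId: string, event: PingEvent): Promise<void> {
  await fetch(pingUrl(monitorId, event), { method: 'POST' });
}
```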
## Recovery alerts
Cron job monitoring isn't just about knowing when things break — it's about knowing when they're fixed.
When a monitor that was in a failing state receives a healthy ping, Crontify automatically sends a recovery notification to the same channels. You know when an incident starts. You know when it's over. You're not left refreshing the dashboard to find out if the fix worked.
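The transition described above can be sketched as a tiny state machine: notify on the edge into failure and on the edge back to health, stay quiet otherwise. Illustrative logic, not Crontify's implementation.

```typescript
type MonitorHealth = 'healthy' | 'failing';

function onPing(
  current: MonitorHealth,
  ping: 'success' | 'fail',
): { next: MonitorHealth; notify: 'incident' | 'recovery' | null } {
  if (ping === 'fail') {
    // Only notify on the transition into failure, not on every failed run.
    return { next: 'failing', notify: current === 'healthy' ? 'incident' : null };
  }
  // A healthy ping while failing is the recovery edge.
  return { next: 'healthy', notify: current === 'failing' ? 'recovery' : null };
}
```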
## Frequently asked questions
### What is a silent failure in a cron job?
A silent failure is when a cron job completes without errors — exits 0, no exceptions — but fails to accomplish its intended purpose. For example, a sync job that processes zero records, an email job that sends to an empty list, or a cleanup job that deletes nothing because its filter condition is wrong. Standard monitoring treats these as successes because the job technically ran.
### How do I monitor cron jobs in Node.js?
Install @crontify/sdk from npm, create a monitor in the Crontify dashboard, and wrap your job function with monitor.wrap(). The SDK handles start, success, and failure pings automatically. Full instrumentation takes under 60 seconds.
### What is the difference between a missed run and a hung job?
A missed run means the job never started — no start ping arrived within the grace period after the scheduled time. A hung job means the job started but never finished — a start ping arrived, but no success or fail ping followed within the maximum duration threshold. Both require start/finish ping architecture to detect; a simple heartbeat monitor can only catch missed runs.
### Does cron job monitoring work with languages other than JavaScript?
Yes. Any language that can make an HTTP POST request works with Crontify's ping API. The @crontify/sdk npm package is the easiest path for Node.js and TypeScript projects, but Python, Go, Ruby, PHP, Bash, and any other runtime can ping the three HTTP endpoints directly.
## Start monitoring for free
Crontify is free to get started — 5 monitors, no credit card required.
crontify.com — SDK on npm as @crontify/sdk.
If silent failures are a gap in your monitoring, this is what Crontify was built to close.