Ashish Singh

Posted on Mar 18

I Replaced Cron Jobs with Node Cron in My SaaS. Here's What Broke.

#javascript #webdev #devops #node

node-cron seems like the obvious choice when you're building a Node.js SaaS and need scheduled tasks. It integrates into your codebase, uses familiar cron syntax, and takes about five minutes to set up.

The problem is that it also fails in ways that are hard to notice until they hit production.

I used node-cron for scheduling digest emails, session cleanup, and third-party data syncs. Here's every problem I encountered, why it happened, and what I replaced it with.

What node-cron actually does

It runs cron jobs inside your Node process. That's the entire value proposition.

npm install node-cron

import cron from 'node-cron';

cron.schedule('0 8 * * *', () => {
  sendDailyDigestEmails();
});

No external scheduler. No separate process. The job lives and dies with your Node app.

That convenience also causes all the issues listed below.

Problem 1: Multiple instances run the same job

We were running two app instances for redundancy. Both had the same codebase. At 8 AM, both instances executed sendDailyDigestEmails() at the same time.

Users received two emails. Some received three during a period when we had three instances running. Support tickets came in immediately.

node-cron has no awareness of other instances. It does not coordinate. It does not lock. Every process with the same code runs the same job.

Fix: Distributed lock with Redis

Before executing any job, the instance attempts to acquire a lock. If it fails, another instance is already handling it.

import { createClient } from 'redis';
import cron from 'node-cron';

const redis = createClient({ url: process.env.REDIS_URL });
await redis.connect();

async function acquireLock(key, ttlSeconds) {
  const result = await redis.set(key, '1', {
    NX: true,       // Set only if key does not exist
    EX: ttlSeconds  // Auto-expire after TTL
  });
  return result === 'OK';
}

cron.schedule('0 8 * * *', async () => {
  const acquired = await acquireLock('lock:daily-digest', 300); // 5 min TTL

  if (!acquired) {
    console.log('Job already running on another instance. Skipping.');
    return;
  }

  await sendDailyDigestEmails();
});

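The TTL is important. If an instance acquires the lock and crashes mid-execution, the lock expires on its own instead of blocking every future run. Keep the TTL well below the interval between runs, or a leftover lock from one run can cause the next run to be skipped.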
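One way to make the TTL less delicate is to put the scheduled minute into the lock key, so a lock left over from one run can never block the next one. This is a minimal sketch, assuming a node-redis v4 style client. runLockKey and acquireRunLock are hypothetical helper names, and the client is passed in as a parameter so the logic can be exercised without a live Redis.

```javascript
// Hypothetical helper: one lock key per job per scheduled minute (UTC).
function runLockKey(jobName, firedAt = new Date()) {
  const minute = firedAt.toISOString().slice(0, 16); // "YYYY-MM-DDTHH:mm"
  return `lock:${jobName}:${minute}`;
}

// `client` is assumed to expose node-redis v4's set(key, value, options).
async function acquireRunLock(client, jobName, ttlSeconds, firedAt = new Date()) {
  const result = await client.set(runLockKey(jobName, firedAt), '1', {
    NX: true,       // Only the first instance to ask gets the lock
    EX: ttlSeconds, // Still expires, so old keys do not pile up in Redis
  });
  return result === 'OK';
}
```

With a per-run key, the TTL only has to outlive the window in which other instances might fire for the same minute. It no longer has to be tuned against the job's schedule.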

Problem 2: Jobs disappear on restart

We deploy multiple times a day. Every deployment restarts the Node process.

If a job was scheduled for 3 AM and a deployment happened at 2:58 AM, the job simply did not run. node-cron has no persistence. When the process restarts, the scheduler starts fresh from that moment.

For email digests, a missed run is an inconvenience. For billing jobs or third-party syncs, a missed run can cause real data inconsistencies.

Fix: BullMQ for critical jobs

BullMQ stores repeatable jobs in Redis. The schedule survives process restarts because it lives in Redis, not in memory.

import { Queue, Worker } from 'bullmq';

const emailQueue = new Queue('email-tasks', {
  connection: { url: process.env.REDIS_URL }
});

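// Safe to run on every startup: re-adding a repeatable job with the same
// name and repeat options does not create a second schedule.
await emailQueue.add(
  'daily-digest',
  {},
  {
    // BullMQ 3 and later use `pattern`; older versions used `cron`.
    repeat: { pattern: '0 8 * * *' },
    jobId: 'daily-digest'
  }
);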

const worker = new Worker('email-tasks', async (job) => {
  if (job.name === 'daily-digest') {
    await sendDailyDigestEmails();
  }
}, { connection: { url: process.env.REDIS_URL } });

Even if the app restarts 10 times before the scheduled time, the job still fires when it's supposed to.
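Moving to a queue also makes retries a matter of configuration. The sketch below builds the options object for a repeatable job with retries and exponential backoff. attempts, backoff, removeOnComplete, and removeOnFail are BullMQ job options. The helper name digestJobOptions and the specific numbers are assumptions for illustration. Recent BullMQ versions name the cron expression field pattern (older releases used cron), so check it against the version you have installed.

```javascript
// Hypothetical helper: returns plain job options to pass to queue.add().
function digestJobOptions(pattern, tz = 'UTC') {
  return {
    repeat: { pattern, tz },                          // Cron expression and its timezone
    attempts: 3,                                      // Try each run up to three times
    backoff: { type: 'exponential', delay: 60_000 },  // Base delay of one minute, growing per retry
    removeOnComplete: 100,                            // Keep only the last 100 completed jobs
    removeOnFail: 1000,                               // Keep more failures around for debugging
  };
}
```

It would be used as emailQueue.add('daily-digest', {}, digestJobOptions('0 8 * * *', 'Asia/Kolkata')).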

After this change, I kept node-cron only for non-critical tasks where missing a run has no real impact, like clearing an in-memory cache.

Problem 3: Failed jobs produce no output

This is the one that cost us the most time.

cron.schedule('0 8 * * *', () => {
  sendDailyDigestEmails(); // Not awaited. No catch.
});

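If the promise returned by sendDailyDigestEmails() rejects, nothing handles it. The callback does not await it, and node-cron does not catch rejected promises for you. No error log from your own code. No alert. No retry. The next scheduled run attempts again, fails again, and you still have no idea.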

We had a job silently failing for 11 days because a function was renamed during a refactor. The job ran on schedule, called undefined, threw an error, and the error was swallowed completely.

Fix: Wrap every callback in a guard function

async function runWithGuard(jobName, fn) {
  try {
    console.log(`[cron] Starting: ${jobName}`);
    const start = Date.now();
    await fn();
    console.log(`[cron] Completed: ${jobName} in ${Date.now() - start}ms`);
  } catch (err) {
    console.error(`[cron] Failed: ${jobName}`, err);
    await notifySlack(`Cron job failed: ${jobName}\n${err.message}`);
  }
}

cron.schedule('0 8 * * *', () => {
  runWithGuard('daily-digest', sendDailyDigestEmails);
});

Every job now produces a start log, a completion log with duration, or a failure log with a Slack notification. This one change alone caught five incidents that would have gone unnoticed.
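One gap remains in runWithGuard: if the notifier itself throws (Slack is down, the webhook URL is wrong), the catch block produces a new unhandled rejection. A slightly more defensive sketch follows. The name runGuarded is hypothetical, and the injected notify function stands in for notifySlack. It reports the outcome to the caller and never rejects.

```javascript
// Hypothetical variant of the guard: always resolves, never rejects.
async function runGuarded(jobName, fn, notify = async () => {}) {
  const start = Date.now();
  try {
    await fn(); // Covers synchronous throws as well as rejected promises
    return { ok: true, durationMs: Date.now() - start };
  } catch (err) {
    console.error(`[cron] Failed: ${jobName}`, err);
    try {
      await notify(`Cron job failed: ${jobName}\n${err.message}`);
    } catch (notifyErr) {
      // A broken alert channel must not hide the original failure.
      console.error(`[cron] Could not send alert for ${jobName}`, notifyErr);
    }
    return { ok: false, durationMs: Date.now() - start, error: err };
  }
}
```

Returning a result object instead of swallowing the outcome lets the caller record whether the run actually succeeded.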

Problem 4: Timezone mismatches

Our server runs on UTC. Most users are in IST (UTC+5:30) or US Eastern (UTC-5, or UTC-4 during daylight saving time). We scheduled a morning email for 8 AM.

Indian users received it at 1:30 PM.

node-cron defaults to the server process's timezone. 0 8 * * * on a UTC server fires at 8 AM UTC, which is 1:30 PM IST. This is documented behavior, but it is easy to miss.

Fix: Set timezone explicitly per job

node-cron accepts a timezone option. Use it.

cron.schedule('0 8 * * *', () => {
  runWithGuard('morning-email-india', () => sendMorningEmail('IN'));
}, {
  timezone: 'Asia/Kolkata'
});

cron.schedule('0 8 * * *', () => {
  runWithGuard('morning-email-us', () => sendMorningEmail('US'));
}, {
  timezone: 'America/New_York'
});

If you need per-user timezone scheduling where each user gets an email at 8 AM their local time, cron is not the right tool. You need a job queue where each job carries timezone metadata, and you compute the UTC fire time per user. For region-level grouping, the timezone option works fine.
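To make "compute the UTC fire time per user" concrete, here is a minimal sketch that uses only the built-in Intl API. The function names localParts, offsetMs, and nextLocalHourUtc are hypothetical. It assumes each user record stores an IANA timezone name such as 'Asia/Kolkata', and it does not try to handle target hours that fall inside a daylight saving transition.

```javascript
// Wall-clock fields of `date` as seen in `timeZone` (an IANA name).
function localParts(date, timeZone) {
  const fmt = new Intl.DateTimeFormat('en-US', {
    timeZone,
    hourCycle: 'h23',
    year: 'numeric', month: 'numeric', day: 'numeric',
    hour: 'numeric', minute: 'numeric', second: 'numeric',
  });
  const parts = {};
  for (const { type, value } of fmt.formatToParts(date)) {
    if (type !== 'literal') parts[type] = Number(value);
  }
  return parts;
}

// Offset of `timeZone` from UTC at the instant `date`, in milliseconds.
function offsetMs(date, timeZone) {
  const p = localParts(date, timeZone);
  const asUtc = Date.UTC(p.year, p.month - 1, p.day, p.hour, p.minute, p.second);
  return asUtc - Math.floor(date.getTime() / 1000) * 1000;
}

// Next instant (as a UTC Date) at which the local clock in `timeZone` reads `hour`:00.
function nextLocalHourUtc(now, timeZone, hour) {
  const today = localParts(now, timeZone);
  for (let addDays = 0; addDays <= 2; addDays++) {
    // Treat the wall-clock time as if it were UTC, then correct by the zone offset.
    const wallClock = Date.UTC(today.year, today.month - 1, today.day + addDays, hour);
    let utc = wallClock - offsetMs(new Date(wallClock), timeZone);
    utc = wallClock - offsetMs(new Date(utc), timeZone); // Second pass in case the offset changed
    if (utc > now.getTime()) return new Date(utc);
  }
  throw new Error(`Could not find next ${hour}:00 in ${timeZone}`);
}
```

The difference between the returned date and the current time can then be used as the delay for a one-off queued job per user.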

Problem 5: No visibility into what is scheduled

With a standard crontab, you can run crontab -l and see every registered job. With node-cron, there is nothing built-in. No dashboard. No last-run timestamp. No next-run time.

The 11-day silent failure mentioned above was only discovered because a user complained that data was not updating. It could have continued for weeks.

Fix: A job registry with a health endpoint

const jobRegistry = new Map();

function registerJob(name, schedule, fn, options = {}) {
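  const task = cron.schedule(schedule, async () => {
    const meta = jobRegistry.get(name);
    meta.lastRunAt = new Date();
    meta.status = 'running';

    // runWithGuard swallows errors, so track success explicitly.
    let succeeded = false;
    await runWithGuard(name, async () => {
      await fn();
      succeeded = true;
    });

    meta.status = succeeded ? 'idle' : 'failed';
    if (succeeded) meta.lastSuccessAt = new Date();
  }, options);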

  jobRegistry.set(name, {
    schedule,
    status: 'idle',
    lastRunAt: null,
    lastSuccessAt: null,
    task
  });
}

registerJob('daily-digest', '0 8 * * *', sendDailyDigestEmails, {
  timezone: 'Asia/Kolkata'
});

// Assumes an existing Express app instance named `app`.
app.get('/healthz/crons', (req, res) => {
  const jobs = Object.fromEntries(
    [...jobRegistry.entries()].map(([name, meta]) => [name, {
      schedule: meta.schedule,
      status: meta.status,
      lastRunAt: meta.lastRunAt,
      lastSuccessAt: meta.lastSuccessAt
    }])
  );
  res.json(jobs);
});

An uptime monitor polls /healthz/crons every five minutes and alerts if any job's lastSuccessAt is older than that job's expected interval. The setup takes 20 minutes. The value is immediate.
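Most uptime monitors only look at the HTTP status code, so the staleness comparison has to happen somewhere. One option is to do it on the server and return a non-200 status when a job is overdue. This is a minimal sketch. It assumes each registry entry also stores two hypothetical fields that the registry shown earlier does not have: maxAgeMs (the longest acceptable gap between successes) and registeredAt (so a freshly deployed instance is not flagged before its first run).

```javascript
// Returns the names of jobs whose last success is older than allowed.
// `registry` is a Map of name -> { lastSuccessAt, registeredAt, maxAgeMs }.
function findStaleJobs(registry, now = Date.now()) {
  const stale = [];
  for (const [name, meta] of registry) {
    // Before the first success, measure from registration time instead.
    const baseline = meta.lastSuccessAt ?? meta.registeredAt;
    if (now - baseline.getTime() > meta.maxAgeMs) stale.push(name);
  }
  return stale;
}
```

In the route handler, the result could drive the status code: respond with 503 when findStaleJobs(jobRegistry) returns any names, and 200 otherwise.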

When node-cron is actually the right choice

node-cron is not a bad library. It has a clear use case.

It works well when:

You are running a single instance
Missing an occasional run has no real consequence
The task is lightweight and non-critical
You do not want to introduce Redis just for one job

It becomes a problem when:

You are running multiple instances
You deploy frequently
You need retry logic on failure
You need any observability without building it yourself

What I would set up from day one

BullMQ with repeatable jobs for anything business-critical: billing, emails, external API syncs
node-cron only for throwaway housekeeping that is safe to miss
Redis distributed lock on every node-cron job as a safety net
Job registry with health endpoint before anything reaches production
Every cron callback is async/await with try/catch, no exceptions

Setting up BullMQ with a repeatable job takes about 20 minutes. Debugging the double-email incident and writing apology emails to users took most of a workday. The tradeoff is obvious in hindsight.

If you are currently using node-cron in a multi-instance environment without locks or observability, that is worth fixing before the next incident surfaces it for you.

Curious if anyone has moved to Temporal or Inngest for scheduled jobs in Node. Drop it in the comments.
