SEN LLC


Writing Cron Semantics in ~100 Lines of TypeScript


A tiny in-process cron-style scheduler CLI with zero runtime dependencies. Hand-rolled 5-field parser, AbortController timeouts, retries with exponential backoff, jitter, structured JSON logs, and a graceful-shutdown dance — all in plain Node.

📦 GitHub: https://github.com/sen-ltd/cron-runner


I have a Docker Compose stack that needs to run three shell commands on schedules. The commands are tiny — a pg_dump, a find ... -delete, a heartbeat curl. The "right" answer for running scheduled work is supposed to be systemd timers, but systemd timers want root, want you to ship two files per job, and don't help one bit inside a container.

So what does everyone actually do? Three things, all annoying:

  1. Put cron inside the image. vixie-cron / busybox crond work, but they expect /var/log, they don't forward job output to the container's stdout, and when the daemon crashes the container keeps "running" while firing nothing.
  2. A second container running cron. Same problem, plus now there are two images to maintain.
  3. A full-fat scheduler library. node-cron, cron, node-schedule, @nestjs/schedule — all reasonable, all adding 200–400 KB of transitive deps, and none of them writing the "timeout + retry + kill-the-child + log-the-exit" loop for you. You still have to build that yourself.

I wanted a fourth option: a single tiny Node process that reads a JSON file of jobs and just runs them, with proper logs, proper timeouts, proper retries, and proper shutdown. No dependencies. cron-runner is the result.

The fun technical content is: how much of cron can you rewrite before it gets boring? Turns out about 100 lines of TypeScript gets you the whole 5-field grammar, name resolution (JAN-DEC, MON-SUN), wildcards, ranges, lists, steps, and the Vixie OR-semantics for day-of-month vs day-of-week. Another 50 lines gets you next-fire-time with "good enough" complexity. The remaining work isn't cron — it's the scheduler loop, the child-process plumbing, and the graceful shutdown dance, which turned out to be the more interesting problems.

Design

The whole thing is five small files:

  • cron.ts — parser + nextFire(expr, from). Pure, no I/O, no deps.
  • config.ts — load and validate the JSON config, surface friendly errors.
  • runner.ts — spawn a child command, enforce a timeout, retry on failure.
  • scheduler.ts — dispatch loop that wakes up at each job's next fire.
  • main.ts — CLI glue, --dry-run, --list, --once, graceful shutdown.
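As a sketch of what config.ts's validation might look like — the field names match the example config at the end of the post, but the function name and error wording here are illustrative, not the repo's actual API:

```typescript
// Hypothetical sketch of the JobConfig shape and a minimal validator.
// Field names follow the example config; error messages are illustrative.
export interface JobConfig {
  name: string;
  schedule: string;   // 5-field cron expression
  command: string;    // passed to `sh -c`
  timeout_ms: number;
  max_retries: number;
  jitter_ms: number;
}

export function validateJob(raw: unknown, index: number): JobConfig {
  if (typeof raw !== 'object' || raw === null) {
    throw new Error(`jobs[${index}] must be an object`);
  }
  const j = raw as Record<string, unknown>;
  for (const key of ['name', 'schedule', 'command'] as const) {
    if (typeof j[key] !== 'string' || j[key] === '') {
      throw new Error(`jobs[${index}].${key} must be a non-empty string`);
    }
  }
  for (const key of ['timeout_ms', 'max_retries', 'jitter_ms'] as const) {
    if (typeof j[key] !== 'number' || j[key] < 0) {
      throw new Error(`jobs[${index}].${key} must be a non-negative number`);
    }
  }
  return j as unknown as JobConfig;
}
```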

Three design decisions dominate everything:

1. nextFire is a linear minute search. I briefly tried to write an "intelligent" next-fire that constructs the next matching date field-by-field. It's fiddly, has edge cases around month rollover and leap years, and gains you essentially nothing: for a realistic cron expression, the number of minutes between now and the next fire is small. */5 * * * * is at most 4 iterations. 0 9 * * 1-5 is at most ~72 hours = ~4320 iterations, a <1 ms loop. I capped the search at two years of minutes (~1 M) as a safety net — that's the worst case for truly hostile inputs like 0 0 29 2 * on a non-leap year.

2. The clock, the timer, the spawn, and the RNG are all injectable. Every interesting function takes { now, setTimer, spawn, random, sleep } as dependencies with sensible defaults. The scheduler test suite runs the entire dispatch loop against a fake timer queue in about 2 ms.
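A sketch of what that dependency bag might look like — the names follow the post's `{ now, setTimer, spawn, random, sleep }` list, but the exact types are my guess:

```typescript
// Hypothetical shape of the injectable dependencies: each has a
// production default, and tests swap in fakes.
export interface TimerHandle { cancel(): void; }
export type SetTimer = (fn: () => void, delayMs: number) => TimerHandle;

export interface SchedulerDeps {
  now: () => Date;
  setTimer: SetTimer;
  random: () => number;                  // [0, 1), used for jitter
  sleep: (ms: number) => Promise<void>;  // used between retries
}

export const defaultDeps: SchedulerDeps = {
  now: () => new Date(),
  setTimer: (fn, delayMs) => {
    const t = setTimeout(fn, delayMs);
    return { cancel: () => clearTimeout(t) };
  },
  random: Math.random,
  sleep: (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
};
```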

3. Timeouts are AbortController, not setTimeout(() => kill). The AbortController chains cleanly: there's a process-wide "shutdown" controller, and each attempt creates a per-attempt controller that aborts when either the parent aborts or the per-attempt timer fires. This gives one primitive for two use cases ("kill because user pressed Ctrl-C" and "kill because the job is stuck") with no extra bookkeeping.
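A minimal sketch of that chaining; the helper name `linkedTimeoutSignal` is my invention, not the repo's API:

```typescript
// Hypothetical helper: a per-attempt signal that aborts when either the
// parent (shutdown) signal aborts or the per-attempt timeout fires.
function linkedTimeoutSignal(
  parent: AbortSignal,
  timeoutMs: number,
): { signal: AbortSignal; dispose: () => void } {
  const ctl = new AbortController();
  const onParentAbort = () => ctl.abort();
  if (parent.aborted) ctl.abort();
  else parent.addEventListener('abort', onParentAbort, { once: true });
  const timer = setTimeout(() => ctl.abort(), timeoutMs);
  return {
    signal: ctl.signal,
    // dispose clears the timer and unhooks the parent listener so
    // nothing leaks per attempt.
    dispose: () => {
      clearTimeout(timer);
      parent.removeEventListener('abort', onParentAbort);
    },
  };
}
```

Recent Node releases also ship `AbortSignal.timeout()` and `AbortSignal.any()`, which express the same idea with built-ins.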

The cron parser

Here's the shape. The parser produces a CronExpr where each field is a precomputed Set<number> of valid values and a boolean for "was this literally a wildcard":

export interface CronField {
  kind: FieldKind;
  raw: string;
  values: Set<number>;
  isWildcard: boolean;
}

export interface CronExpr {
  source: string;
  minute: CronField;
  hour: CronField;
  dayOfMonth: CronField;
  month: CronField;
  dayOfWeek: CronField;
}

Why precomputed sets? Because matches(expr, d) is then five hash-set lookups. No loops, no range evaluation at call time, no regex. For nextFire's ~4320-iteration hot loop that matters — not for correctness, but because making it fast-by-default means I never have to think about it again.

The parser itself is a small recursive-descent family: one function per field type, one function per "part" (wildcard, number, range, step, list). Names are resolved by a lookup table:

const MONTH_NAMES: Record<string, number> = {
  jan: 1, feb: 2, mar: 3, apr: 4, may: 5, jun: 6,
  jul: 7, aug: 8, sep: 9, oct: 10, nov: 11, dec: 12,
};
const DOW_NAMES: Record<string, number> = {
  sun: 0, mon: 1, tue: 2, wed: 3, thu: 4, fri: 5, sat: 6,
};

The only real semantic quirk is day-of-week 7. Traditional cron accepts both 0 and 7 for Sunday. The parser collapses 7 to 0 on entry so no other code has to care.
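A sketch of what one of those per-part functions might look like. `parsePart` and `parseFieldValues` are my naming, not necessarily the repo's, but they show the wildcard/number/range/step/list grammar and the 7-to-0 collapse:

```typescript
// Hypothetical sketch: expand one comma-separated part
// ("*", "5", "1-5", "*/15", "10-30/5") into concrete values.
function parsePart(part: string, min: number, max: number): number[] {
  const [base, stepStr] = part.split('/');
  const step = stepStr ? parseInt(stepStr, 10) : 1;
  let lo: number, hi: number;
  if (base === '*') { lo = min; hi = max; }
  else if (base.includes('-')) {
    const [a, b] = base.split('-');
    lo = parseInt(a, 10); hi = parseInt(b, 10);
  } else {
    lo = hi = parseInt(base, 10);
    if (stepStr) hi = max; // Vixie extension: "5/15" = every 15 from 5
  }
  const out: number[] = [];
  for (let v = lo; v <= hi; v += step) out.push(v);
  return out;
}

// Whole field: split on commas, union the parts, collapse dow 7 -> 0.
function parseFieldValues(
  raw: string, min: number, max: number, isDow = false,
): Set<number> {
  const values = new Set<number>();
  for (const part of raw.split(',')) {
    for (const v of parsePart(part, min, max)) {
      values.add(isDow && v === 7 ? 0 : v);
    }
  }
  return values;
}
```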

Vixie OR semantics

The trap in 5-field cron is that day-of-month and day-of-week are ORed, not ANDed, whenever both are restricted. The canonical head-scratcher is:

0 0 15 * MON    # fires at midnight on the 15th OR on any Monday

If you write dow_ok && dom_ok you'll get that example wrong. If you write dow_ok || dom_ok you fix that example but break * * * * MON: the day-of-month wildcard matches every day, so the job fires daily instead of only on Mondays. The rule is:

  • If both dom and dow are wildcards → always match (AND).
  • If exactly one is wildcard → that one doesn't constrain; match on the other.
  • If neither is a wildcard → match if either matches (OR).

The "precomputed set + isWildcard boolean" representation makes this clean:

export function matches(expr: CronExpr, d: Date): boolean {
  if (!expr.minute.values.has(d.getMinutes())) return false;
  if (!expr.hour.values.has(d.getHours())) return false;
  if (!expr.month.values.has(d.getMonth() + 1)) return false;

  const domOk = expr.dayOfMonth.values.has(d.getDate());
  const dowOk = expr.dayOfWeek.values.has(d.getDay());

  if (expr.dayOfMonth.isWildcard && expr.dayOfWeek.isWildcard) return true;
  if (expr.dayOfMonth.isWildcard) return dowOk;
  if (expr.dayOfWeek.isWildcard) return domOk;
  return domOk || dowOk;
}

nextFire is then just "bump the minute and ask matches until it says yes":

export function nextFire(
  expr: CronExpr,
  from: Date,
  maxMinutes = 60 * 24 * 366 * 2,
): Date | null {
  const d = new Date(from.getTime());
  d.setSeconds(0, 0);
  d.setMinutes(d.getMinutes() + 1);
  for (let i = 0; i < maxMinutes; i++) {
    if (matches(expr, d)) return new Date(d.getTime());
    d.setMinutes(d.getMinutes() + 1);
  }
  return null;
}

That's it. That's the cron engine. Everything else is plumbing.

The runner: spawn, timeout, retry

The runner is where I spent the most debugging time, because spawning a child process "correctly" is always more work than you expect. The contract is:

  • Run sh -c <command>.
  • Track stdout/stderr byte counts, not the contents — data is counted and discarded, never buffered.
  • Enforce a timeout via AbortController. On timeout, SIGKILL.
  • On external abort, SIGTERM.
  • Emit exactly one log line per attempt.
  • Retry with exponential backoff (100ms * 2^attempt, capped at 30s) until success or max_retries + 1 attempts.
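The backoff function itself is a one-liner, exactly as described in the contract above:

```typescript
// Exponential backoff: 100ms * 2^attempt, capped at 30s.
const backoffMs = (attempt: number): number =>
  Math.min(100 * 2 ** attempt, 30_000);
```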

Here's the spawn function:

export const defaultSpawn: SpawnFn = ({ command, timeoutMs, signal }) => {
  return new Promise((resolve) => {
    let stdoutBytes = 0;
    let stderrBytes = 0;
    let settled = false;
    let timedOut = false;

    const child = nodeSpawn('sh', ['-c', command], {
      stdio: ['ignore', 'pipe', 'pipe'],
    });

    const timer = setTimeout(() => {
      timedOut = true;
      try { child.kill('SIGKILL'); } catch {}
    }, timeoutMs);

    const abortHandler = () => {
      try { child.kill('SIGTERM'); } catch {}
    };
    signal.addEventListener('abort', abortHandler, { once: true });

    child.stdout?.on('data', (chunk) => { stdoutBytes += Buffer.byteLength(chunk); });
    child.stderr?.on('data', (chunk) => { stderrBytes += Buffer.byteLength(chunk); });

    const finalize = (exitCode: number | null, error?: string) => {
      if (settled) return;
      settled = true;
      clearTimeout(timer);
      signal.removeEventListener('abort', abortHandler);
      resolve({ exitCode, stdoutBytes, stderrBytes, timedOut, error });
    };

    child.on('error', (err) => finalize(null, err.message));
    child.on('close', (code) => finalize(code));
  });
};

Three subtleties:

  1. settled guard. A child can emit both 'error' and 'close' for the same lifecycle (a failed spawn fires 'error', and 'close' can still follow). Without the guard you'd resolve the promise twice.
  2. abort event listener cleanup. Without removeEventListener you leak a listener per attempt, and after 10 attempts Node starts warning about MaxListenersExceededWarning.
  3. Byte counting, not buffering. If a child dumps 500 MB to stdout, buffering it would OOM. We count bytes and drop the content.

The retry loop wrapping this is boring on purpose:

for (let attempt = 1; attempt <= maxAttempts; attempt++) {
  if (externalSignal?.aborted) return false;
  const ctl = new AbortController();
  externalSignal?.addEventListener('abort', () => ctl.abort(), { once: true });
  const started = now();
  const result = await spawn({ command, timeoutMs, signal: ctl.signal });
  const finished = now();
  logger.run({ /* per-run log line */ });
  if (result.exitCode === 0 && !result.timedOut) return true;
  if (attempt < maxAttempts) await sleep(backoffMs(attempt - 1));
}
return false;

The whole retry path is sleep-injectable, which is how tests run 3 attempts + 2 backoffs in 4ms instead of ~300ms of real backoff waiting.
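A sketch of how a test might exploit that, assuming the runner accepts a `sleep` dependency (the helper name here is illustrative):

```typescript
// Hypothetical test fake: a sleep that resolves immediately but records
// every requested delay, so backoff timing can be asserted without waiting.
function makeFakeSleep() {
  const delays: number[] = [];
  const sleep = (ms: number): Promise<void> => {
    delays.push(ms);
    return Promise.resolve();
  };
  return { sleep, delays };
}
```

After running the retry loop with this fake, a test can assert `delays` equals the expected backoff sequence (e.g. `[100, 200]` for two retries).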

The scheduler dispatch loop

The scheduler's one job is to wake up at each next fire, run the job, and re-schedule. The trick is doing it in a way that tests can drive synchronously. Here's the part that actually dispatches:

function schedule(job: JobConfig): void {
  if (stopped) return;
  const next = computeNextDelayMs(job, now(), random);
  if (!next) { logger.warn('no future fire time', { job: job.name }); return; }
  logger.info('scheduled', { job: job.name, fire_at: next.fireAt.toISOString(), delay_ms: next.delayMs });

  const timer = setTimer(() => {
    timers.delete(timer);
    if (stopped) return;
    inflightCount++;
    runJobOnce(job, runnerDeps, { signal: options.shutdownSignal })
      .catch((err) => logger.error('runner threw', { job: job.name, error: err.message }))
      .finally(() => {
        inflightCount--;
        if (!stopped) schedule(job);
        maybeResolveDrain();
      });
  }, next.delayMs);
  timers.add(timer);
}

The fake timer queue in tests looks like this:

const setTimer: SetTimer = (fn, delay) => {
  const entry = { fn, at: now + delay };
  pending.push(entry);
  return { cancel: () => { /* remove from pending */ } };
};

// ...
advanceTo(30_000); // fires anything due, in timestamp order

With this plus now: () => fakeClock, tests can step a scheduler through hours of simulated time in microseconds.

Jitter is a one-liner: after computing the cron minute boundary, add [0, jitterMs) ms. This is what keeps a fleet of cron-runner sidecars from hammering a shared target at the same instant. Even a small 5-second jitter turns a thundering-herd spike into a smooth ramp.
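The jitter computation, as a sketch — `withJitter` is my naming, and `random` is assumed to be the injectable [0, 1) source from the deps bag:

```typescript
// Jitter: after the exact cron minute boundary, delay an extra
// uniform [0, jitterMs) milliseconds.
function withJitter(
  delayMs: number,
  jitterMs: number,
  random: () => number,
): number {
  return delayMs + Math.floor(random() * jitterMs);
}
```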

Graceful shutdown

On SIGINT or SIGTERM the sequence is:

  1. logger.info('shutdown signal received')
  2. shutdown.abort() — signals in-flight children to SIGTERM.
  3. handle.stop() — cancels all pending scheduler timers.
  4. await handle.drain() — resolves when inflightCount reaches 0.
  5. Process exits 0.

The drain() promise is a little dance: it resolves only when stopped === true && inflightCount === 0. It's wired up so that either "stop after last in-flight job finishes" or "stop while already idle" both work.
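A sketch of how that drain promise might be wired — the names mirror the scheduler snippet above, but the exact implementation is my guess:

```typescript
// Hypothetical drain wiring: resolves once stop() has been called AND
// the last in-flight job has finished, in either order.
function makeDrain() {
  let stopped = false;
  let inflightCount = 0;
  let resolveDrain: (() => void) | null = null;
  const drained = new Promise<void>((resolve) => { resolveDrain = resolve; });

  const maybeResolveDrain = () => {
    if (stopped && inflightCount === 0) resolveDrain?.();
  };

  return {
    jobStarted: () => { inflightCount++; },
    jobFinished: () => { inflightCount--; maybeResolveDrain(); },
    stop: () => { stopped = true; maybeResolveDrain(); },
    drain: () => drained,
  };
}
```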

If someone sends SIGKILL, Node dies instantly and Docker takes the child process down with the container. There's no "persist missed runs" logic — if cron-runner dies, you miss runs, and that's by design.

Tradeoffs

This is not a general-purpose scheduler. The things it doesn't do:

  • No persistence of missed runs. Process dies → runs get missed. If you need at-least-once, put the scheduler behind a queue.
  • No clustering. Two cron-runner processes on the same config will double-fire. Single-replica only.
  • Minute resolution. No seconds field. No sub-minute schedules.
  • No year field. Vixie cron semantics only.

If any of those matter, the answer is Nomad periodic, Kubernetes CronJob, or Temporal. This is the "three-job sidecar" size.

Try it

git clone https://github.com/sen-ltd/cron-runner.git
cd cron-runner
docker build -t cron-runner .

cat > config.json << 'EOF'
{
  "jobs": [
    {"name": "hi", "schedule": "* * * * *", "command": "date -u +%FT%TZ", "timeout_ms": 5000, "max_retries": 1, "jitter_ms": 2000}
  ]
}
EOF

docker run --rm -v "$PWD":/work cron-runner /work/config.json --list
docker run --rm -v "$PWD":/work cron-runner /work/config.json --dry-run
docker run --rm -v "$PWD":/work cron-runner /work/config.json --once hi
docker run --rm -v "$PWD":/work cron-runner /work/config.json

The runtime image is 136 MB (node:20-alpine + compiled JS, no node_modules). Tests are 61 vitest cases, all passing in-container. Zero runtime dependencies, full Vixie 5-field cron, injectable clock and spawn, structured JSON logs out of the box.

MIT-licensed. PRs welcome. Next on the list: optional --max-parallel to cap concurrent in-flight jobs across the whole process.
