DEV Community

Lucas Milanez

How I Built a Real-Time Job Scheduler That Fires Within 30 Seconds of Any DateTime

I needed to build a system that fires an action at the exact DateTime stored in a database column — not "in 5 minutes" or "every hour," but "at 2026-06-15T14:30:00Z, within 30 seconds."

This is the architecture I ended up with after 4 months of building Precise Triggers, a monday.com app that schedules automations at precise times. The patterns here apply to any system that needs time-based job execution with sub-minute accuracy.

The Problem

The naive approach to "fire something at a specific time" is a polling loop:

// Don't do this
setInterval(async () => {
  const due = await db.query(
    `SELECT * FROM jobs WHERE fire_at <= NOW() AND status = 'pending'`
  );
  for (const job of due) await execute(job);
}, 10_000); // check every 10s

This works for a demo. In production it fails because:

  • Polling interval = guaranteed delay. If you poll every 10s, a job can be up to 10s late, and about 5s late on average.
  • DB load grows with the number of pending jobs. Without a suitable index, 10,000 scheduled items = 10,000 rows scanned every 10 seconds.
  • No deduplication. Two workers polling simultaneously can execute the same job twice.
  • No concurrency control. If 500 jobs come due in the same second, your worker processes them sequentially while the rest wait.
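Here are the numbers behind the first point. The toy model rests on three simplifying assumptions: polls fire exactly every P milliseconds, each query takes no time, and due times are spread uniformly. Under those assumptions a job is picked up on the first poll at or after its due time, so it waits about P/2 on average and just under P in the worst case. `pollingLateness` and `latencyStats` are illustrative helpers written for this sketch and are not part of Precise Triggers.

```typescript
// Toy model: a poller wakes at every multiple of intervalMs (0, P, 2P, ...).
// A job due at fireAtMs is picked up on the first tick at or after its due time.
function pollingLateness(fireAtMs: number, intervalMs: number): number {
  const nextTick = Math.ceil(fireAtMs / intervalMs) * intervalMs;
  return nextTick - fireAtMs;
}

// Average and worst-case lateness over every due time (1 ms resolution) in one interval.
function latencyStats(intervalMs: number): { avgMs: number; maxMs: number } {
  let total = 0;
  let max = 0;
  for (let t = 0; t < intervalMs; t++) {
    const late = pollingLateness(t, intervalMs);
    total += late;
    max = Math.max(max, late);
  }
  return { avgMs: total / intervalMs, maxMs: max };
}

console.log(latencyStats(10_000)); // { avgMs: 4999.5, maxMs: 9999 }
```

With a 10s poll, the model gives roughly 5s of average lateness and almost 10s in the worst case. Query time, worker load, and sequential processing all come on top of that.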

The Architecture: BullMQ Delayed Jobs

Instead of polling, I pre-schedule each job as a BullMQ delayed job with a precise delay:

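import { Queue } from 'bullmq';
import IORedis from 'ioredis';

// Shared Redis connection (the URL is environment-specific)
const redis = new IORedis(process.env.REDIS_URL ?? 'redis://localhost:6379');

const triggerQueue = new Queue('trigger-queue', { connection: redis });

const SEVEN_DAYS_IN_SECONDS = 7 * 24 * 3600;

async function scheduleJob(
  itemId: string,
  fireAt: Date,
  payload: { triggerId: string } & Record<string, unknown>,
) {
  const delay = fireAt.getTime() - Date.now();

  if (delay <= 0) return; // past dates are skipped

  await triggerQueue.add('fire-trigger', payload, {
    delay,
    // Dedup key: one pending job per (trigger config, item) pair
    jobId: `trigger__${payload.triggerId}__${itemId}`,
    removeOnComplete: { count: 1000, age: SEVEN_DAYS_IN_SECONDS },
    removeOnFail: { count: 1000, age: SEVEN_DAYS_IN_SECONDS },
    attempts: 3,
    backoff: { type: 'exponential', delay: 5000 },
  });
}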

How BullMQ delayed jobs work under the hood:

  1. The job is added to a Redis sorted set with score = Date.now() + delay.
  2. BullMQ's internal loop checks the sorted set and moves jobs to the "waiting" list when their score <= current timestamp.
  3. A Worker picks up waiting jobs and processes them.

The precision is limited by how quickly a worker notices that a delayed job is due and promotes it to the waiting list. In practice, I measure 1-4 seconds of latency, well within my 30-second target.
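The three steps above fit in a toy in-memory model. `ToyDelayedQueue` is an illustrative class written for this sketch. It is not BullMQ's implementation, which keeps this state in Redis. The model shows why a sorted set suits delayed jobs: the next due job is always the first element, so nothing has to scan every pending job to find it.

```typescript
// Toy in-memory model of a delayed-job store, loosely mirroring a Redis sorted set.
interface DelayedJob {
  id: string;
  dueAt: number; // plays the role of the sorted-set score (ms since epoch)
}

class ToyDelayedQueue {
  private delayed: DelayedJob[] = []; // kept sorted by dueAt, like a ZSET
  readonly waiting: string[] = [];

  add(id: string, delayMs: number, now: number): void {
    const job = { id, dueAt: now + delayMs };
    // Insert in score order; ties keep insertion order. (A real ZSET insert is O(log N).)
    const idx = this.delayed.findIndex((j) => j.dueAt > job.dueAt);
    if (idx === -1) this.delayed.push(job);
    else this.delayed.splice(idx, 0, job);
  }

  // Move every job whose score <= now to the waiting list, earliest first.
  promoteDue(now: number): string[] {
    const promoted: string[] = [];
    while (this.delayed.length > 0 && this.delayed[0].dueAt <= now) {
      const job = this.delayed.shift()!;
      this.waiting.push(job.id);
      promoted.push(job.id);
    }
    return promoted;
  }

  // Due time of the earliest pending job: a scheduler can sleep until then.
  nextDueAt(): number | undefined {
    return this.delayed[0]?.dueAt;
  }
}
```

For example, after `q.add('a', 5_000, 0)` and `q.add('b', 1_000, 0)`, `q.promoteDue(999)` returns `[]`, `q.promoteDue(1_000)` returns `['b']`, and `q.nextDueAt()` is `5000`.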

Bulk Scheduling: Scanning a Board

When a user creates a trigger configuration, I need to schedule jobs for every existing item with a future DateTime. This means paginating through potentially thousands of items via the monday.com GraphQL API:

async function bulkScheduleForTrigger(config: TriggerConfig, tenantId: string) {
  const token = await getTenantAccessToken(tenantId);
  let cursor: string | null = null;
  let scheduled = 0;

  do {
    const response = await callMondayApi(token, query, { cursor, boardId: config.boardId });
    const items = response.data.boards[0].items_page.items;
    cursor = response.data.boards[0].items_page.cursor;

    for (const item of items) {
      const dateValue = parseMondayDateTime(item.column_values[0]?.value);
      if (!dateValue || dateValue.getTime() <= Date.now()) continue;

      await scheduleJob(item.id, dateValue, {
        triggerId: config.id,
        itemId: item.id,
        boardId: config.boardId,
        tenantId,
        action: config.action,
        scheduledAt: dateValue.toISOString(),
      });
      scheduled++;
    }
  } while (cursor);

  return { scheduled };
}
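The cursor loop in `bulkScheduleForTrigger` is easier to unit-test when the API call is injected. Below is a minimal sketch. It assumes a hypothetical `fetchPage` callback that wraps the GraphQL request and returns items whose date column has already been parsed. `Page`, `DatedItem`, and `collectFutureItems` are illustrative names, not monday.com types.

```typescript
// Hypothetical page shape: a cursor plus items with an already-parsed date.
interface Page<T> {
  items: T[];
  cursor: string | null; // null when there are no more pages
}

interface DatedItem {
  id: string;
  fireAt: Date | null; // null when the date column is empty or unparseable
}

// Walk every page via the cursor and keep only items whose date is in the future.
async function collectFutureItems(
  fetchPage: (cursor: string | null) => Promise<Page<DatedItem>>,
  now: number = Date.now(),
): Promise<DatedItem[]> {
  const result: DatedItem[] = [];
  let cursor: string | null = null;
  do {
    // Explicit annotation avoids TypeScript's circular-inference error in cursor loops.
    const page: Page<DatedItem> = await fetchPage(cursor);
    for (const item of page.items) {
      if (item.fireAt && item.fireAt.getTime() > now) result.push(item);
    }
    cursor = page.cursor;
  } while (cursor);
  return result;
}
```

Tests can then feed it a couple of in-memory pages. Once the filtered list is built, the scheduling step becomes a plain loop over it.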

Rescheduling: Webhook-Driven Updates

What happens when someone changes a date after the trigger is created? Polling the API for changes would bring back all the problems of the naive approach.

Instead, I register a monday.com webhook that notifies my backend whenever a column value changes:

POST /webhooks/monday
{ "event": { "type": "change_specific_column_value", "pulseId": 12345, "value": { "date": "2026-06-20", "time": "15:00:00" } } }

When this arrives:

  1. Look up the trigger config for this board + column.
  2. Cancel the old BullMQ job (by its deterministic jobId).
  3. Schedule a new job with the updated DateTime.

async function handleColumnChange(event: ColumnChangeEvent) {
  const configs = await findConfigsForColumn(event.boardId, event.columnId);
  const itemId = String(event.pulseId); // the webhook sends a number; job IDs use strings

  for (const config of configs) {
    const jobId = `trigger__${config.id}__${itemId}`;

    // Cancel existing job
    const existingJob = await triggerQueue.getJob(jobId);
    if (existingJob) await existingJob.remove();

    // Schedule new job (if the new date is in the future)
    const newDate = parseDateTime(event.value);
    if (newDate && newDate.getTime() > Date.now()) {
      await scheduleJob(itemId, newDate, {
        triggerId: config.id,
        itemId,
        // ...remaining payload fields, as in bulkScheduleForTrigger
      });
    }
  }
}

The jobId format (trigger__${configId}__${itemId}) ensures deduplication: the same item can't have two pending jobs for the same trigger.
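The interplay between the deterministic ID and the cancel step can be modeled without Redis. `ToyScheduler` and `jobIdFor` are hypothetical names. Only the `trigger__<configId>__<itemId>` format comes from the code above. The model mimics the queue's documented behavior for custom job IDs: adding a job whose ID already exists is ignored. That is why the webhook handler removes the old job first. Without the remove, the updated date would be silently dropped. As far as I know, the same applies to completed or failed jobs that `removeOnComplete`/`removeOnFail` still retain.

```typescript
// Build the deterministic job ID used for dedup.
const jobIdFor = (configId: string, itemId: string): string =>
  `trigger__${configId}__${itemId}`;

// Toy stand-in for the queue: at most one pending entry per job ID.
class ToyScheduler {
  readonly pending = new Map<string, number>(); // jobId -> fireAt (ms)

  // Like adding with a custom jobId: a second add with the same ID is a no-op.
  add(configId: string, itemId: string, fireAt: number): boolean {
    const id = jobIdFor(configId, itemId);
    if (this.pending.has(id)) return false;
    this.pending.set(id, fireAt);
    return true;
  }

  // Webhook path: drop the old job (if any), then schedule the new date if it is in the future.
  reschedule(configId: string, itemId: string, newFireAt: number, now: number): void {
    this.pending.delete(jobIdFor(configId, itemId));
    if (newFireAt > now) this.add(configId, itemId, newFireAt);
  }
}
```

For example, `add('cfg1', '42', 10_000)` returns `true`, and a second `add('cfg1', '42', 20_000)` returns `false`. `reschedule('cfg1', '42', 20_000, 0)` then leaves exactly one pending entry, `trigger__cfg1__42`, set to `20000`.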

Fire-Time Quota Enforcement

The system has a monthly usage limit per tenant (25/500/5000 automations per month depending on plan). The tricky part: where do you enforce it?

If you enforce at creation time, a user can schedule 10,000 triggers while under the limit, and they'll all fire even if the user should be throttled.

I enforce at fire time — the moment the worker picks up the job:

import type { Job } from 'bullmq';

async function processJob(job: Job<TriggerJobData>) {
  const quota = await checkExecutionQuota(job.data.tenantId);

  if (!quota.allowed) {
    // Record as "skipped" (doesn't count toward quota)
    await recordSkippedExecution(
      job.data,
      `Monthly limit reached (${quota.used}/${quota.limit})`,
    );
    return; // don't fire the action
  }

  // Execute the actual action...
  const result = await dispatchAction(job.data);
  await recordExecutionLog(job.data, result);
}
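The decision inside `checkExecutionQuota` reduces to a pure function that is easy to test. The sketch below assumes it compares this month's usage count against a per-plan limit and returns the `{ allowed, used, limit }` shape that `processJob` reads. Only the limits (25/500/5000) come from the article. The plan names and `evaluateQuota` are hypothetical.

```typescript
// Hypothetical plan names; only the limits come from the article.
const MONTHLY_LIMITS = { free: 25, standard: 500, pro: 5000 } as const;
type Plan = keyof typeof MONTHLY_LIMITS;

interface QuotaCheck {
  allowed: boolean;
  used: number;
  limit: number;
}

// `used` = runs that actually happened this month (success + failure, never skipped).
function evaluateQuota(plan: Plan, used: number): QuotaCheck {
  const limit = MONTHLY_LIMITS[plan];
  return { allowed: used < limit, used, limit };
}
```

With this rule, `evaluateQuota('free', 24)` still allows the 25th run. `evaluateQuota('free', 25)` returns `{ allowed: false, used: 25, limit: 25 }`, which produces the "Monthly limit reached (25/25)" log entry. Because the check happens at fire time, a plan change takes effect on the very next job.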

Usage comes from a countMonthlyExecutions function that queries execution_logs for the current month:

SELECT COUNT(*) FROM execution_logs
WHERE tenant_id = $1
  AND outcome IN ('success', 'failure')
  AND executed_at >= date_trunc('month', NOW())

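The key insight: "skipped" executions don't count toward the quota, so usage reflects only actions that actually ran. Suppose skipped runs did count. A tenant on the 25-run plan whose next 975 jobs were skipped would show 1,000 used for the month. Any allowance they gained later in the month (for example, by upgrading to the 500 plan) would already be consumed by runs that never fired.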
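The counting rule can also be unit-tested with an in-memory version of the SQL above. `countMonthlyExecutionsInMemory` and the `ExecutionLog` shape are hypothetical stand-ins, not the app's code. The month boundary uses UTC. That matches `date_trunc('month', NOW())` only if the database session time zone is UTC, because PostgreSQL truncates `timestamptz` values in the session time zone.

```typescript
type Outcome = 'success' | 'failure' | 'skipped';

interface ExecutionLog {
  tenantId: string;
  outcome: Outcome;
  executedAt: Date;
}

// Start of the current month in UTC (assumes the DB session also runs in UTC).
function startOfMonthUtc(now: Date): Date {
  return new Date(Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), 1));
}

// Mirrors: outcome IN ('success', 'failure') AND executed_at >= start of month.
function countMonthlyExecutionsInMemory(logs: ExecutionLog[], tenantId: string, now: Date): number {
  const monthStart = startOfMonthUtc(now).getTime();
  return logs.filter(
    (log) =>
      log.tenantId === tenantId &&
      (log.outcome === 'success' || log.outcome === 'failure') &&
      log.executedAt.getTime() >= monthStart,
  ).length;
}
```

Skipped rows stay in the log for visibility in the UI, but they never reach the count that gates the next run.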

Orphan Detection

What happens when a user deletes a board (or loses access to it) after scheduling triggers? The jobs are still in Redis, and when they fire, the monday.com API returns 404 or 403.

if (error instanceof MondayApiError && (error.statusCode === 404 || error.statusCode === 403)) {
  await handleApiErrorForBoard(boardId, error.statusCode);
  await recordExecutionLog(jobData, 'failure', `Board orphaned (${error.statusCode})`);
  return; // don't retry — the board is gone
}

The handleApiErrorForBoard function marks all trigger configs for that board as status = 'orphaned', which shows a clear state in the UI.
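The 404/403 branch generalizes into a small status-code classifier. Only the 404/403 rule comes from the snippet above. The 401, 429, and 5xx rules and the `classifyApiStatus` name are assumptions added for illustration. For permanent conditions, the worker returns without throwing so BullMQ does not retry. For transient ones, it rethrows so the configured `attempts` and `backoff` apply.

```typescript
// What the worker should do after a failed monday.com API call (illustrative policy).
type ErrorAction = 'mark-orphaned' | 'reauthorize' | 'retry' | 'fail';

function classifyApiStatus(statusCode: number): ErrorAction {
  if (statusCode === 404 || statusCode === 403) return 'mark-orphaned'; // board gone or access lost
  if (statusCode === 401) return 'reauthorize'; // token problem, not a board problem (assumption)
  if (statusCode === 429 || statusCode >= 500) return 'retry'; // likely transient (assumption)
  return 'fail'; // other 4xx: retrying the same request won't help (assumption)
}
```

Keeping the policy in one pure function means a new status code only has to be handled in one place, and the worker's catch block just switches on the result.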

Dead Letter Queue

After 3 failed attempts (exponential backoff), a job moves to a separate DLQ:

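// Inside the worker's catch block. `job.attemptsMade` counts attempts that already
// failed, so this attempt is the last one when attemptsMade + 1 reaches opts.attempts.
const isLastAttempt = job.attemptsMade + 1 >= (job.opts.attempts ?? 1);

if (isLastAttempt) {
  await recordExecutionLog(jobData, 'failure', errorMessage);
  await dlq.add('dead-letter', {
    ...jobData,
    failedAt: new Date().toISOString(),
    errorMessage,
    attemptsMade: job.attemptsMade + 1,
  });
}
throw error; // let BullMQ handle the retry/fail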

The DLQ worker logs the permanent failure and can optionally notify the user.
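It helps to know how long a failing job takes to reach the DLQ. As I understand BullMQ's documentation, its built-in `exponential` strategy waits `2^(attemptsMade - 1) * delay` milliseconds before the next attempt. Assuming that formula with no jitter, the sketch below lists the waits for `attempts: 3, delay: 5000`. The job is retried after 5s and then 10s, so it lands in the DLQ roughly 15s after its first attempt, plus however long the three attempts take. `exponentialBackoffDelays` is an illustrative helper, not a BullMQ API.

```typescript
// Waits between attempts for an exponential backoff of 2^(attemptsMade - 1) * baseDelayMs.
// With `attempts` total tries there are attempts - 1 retries.
function exponentialBackoffDelays(attempts: number, baseDelayMs: number): number[] {
  const delays: number[] = [];
  for (let attemptsMade = 1; attemptsMade < attempts; attemptsMade++) {
    delays.push(2 ** (attemptsMade - 1) * baseDelayMs);
  }
  return delays;
}

// Total time spent waiting between attempts before the job is given up.
function totalBackoffMs(attempts: number, baseDelayMs: number): number {
  return exponentialBackoffDelays(attempts, baseDelayMs).reduce((sum, d) => sum + d, 0);
}
```

`exponentialBackoffDelays(3, 5000)` gives `[5000, 10000]`, for a total of 15000 ms. A fast-failing action can therefore use up all its retries inside the 30-second window. Raising `attempts` to 5 would add waits of 20s and 40s, for 75s in total.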

Token Refresh (Proactive)

monday.com OAuth tokens expire. If you wait for a 401 and then refresh, the job has already failed. Instead, I check token expiry before every API call:

async function getTenantAccessToken(tenantId: string): Promise<string> {
  const { token, expiresAt } = await getStoredToken(tenantId);

  if (isTokenNearExpiry(expiresAt)) {
    const refreshResult = await refreshTenantToken(tenantId);
    if (refreshResult.success) {
      return getFreshToken(tenantId); // re-read from DB
    }
    if (refreshResult.errorType === 'auth') {
      throw new Error('Token expired and refresh failed — re-authorization required');
    }
  }

  return decrypt(token, ENCRYPTION_KEY);
}

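In practice, this means the worker should not hit a 401 during job processing. There are two exceptions. One is a revoked refresh token. The other is a transient refresh failure, where the code falls back to the current, nearly expired token.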
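The article doesn't show the body of `isTokenNearExpiry`. A minimal version compares the remaining lifetime against a safety margin. The 5-minute margin below is an assumption, and so is the injectable `now`, which exists only to make the function testable.

```typescript
// Hypothetical safety margin: refresh when fewer than 5 minutes of validity remain.
const REFRESH_MARGIN_MS = 5 * 60 * 1000;

// True when the token expires within the margin (or has already expired).
function isTokenNearExpiry(
  expiresAt: Date,
  now: number = Date.now(),
  marginMs: number = REFRESH_MARGIN_MS,
): boolean {
  return expiresAt.getTime() - now <= marginMs;
}
```

Choose a margin larger than the longest sequence of API calls a single job makes, plus some allowance for clock skew between your server and the OAuth provider. Otherwise a token that passed the check can still expire mid-job.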

Results

The system handles:

  • Sub-30-second precision for job execution (measured 1-4s in production)
  • Automatic rescheduling via webhook (no polling)
  • Graceful degradation (orphaned boards, expired tokens, quota limits)
  • Zero database polling (each scheduled item is one Redis sorted-set insert, O(log N), rather than a row rescanned on every poll)

The full stack: TypeScript, Fastify, BullMQ 5, ioredis, PostgreSQL, React 18, Docker, Hetzner Cloud.

If you're building something that needs to "do X at time Y" reliably, BullMQ delayed jobs with webhook-driven rescheduling is a solid pattern. The edge cases (orphans, tokens, quotas, DLQ) are where the real engineering lives.


This powers Precise Triggers — a monday.com app that fires automations at exact DateTimes. Free to try if you use monday.com and need minute-level scheduling precision.

Questions about the architecture? Drop them in the comments.
