Your API returns 200. The user sees "success." Behind the scenes, nothing happened because your email service timed out and took the request handler down with it.
Stop processing heavy work inline. Use a job queue.
## The Problem With Synchronous Processing
Every second your request handler spends on non-essential work is a second that connection is occupied. Send an email, generate a PDF, resize an image — do any of that synchronously and you're begging for:
- Request timeouts under load
- Cascading failures when downstream services go down
- Zero visibility into what failed and why
- No retry mechanism
The fix is dead simple: accept the request, enqueue the work, return immediately. Process it in the background.
## BullMQ + Redis: The Setup
BullMQ is the successor to Bull. It's built on Redis Streams, supports TypeScript natively, and handles everything you'd want from a production queue.
```bash
npm install bullmq ioredis
```
```ts
// src/lib/queue.ts
import { Queue } from 'bullmq';
import IORedis from 'ioredis';

export const connection = new IORedis({
  host: process.env.REDIS_HOST || '127.0.0.1',
  port: Number(process.env.REDIS_PORT) || 6379,
  maxRetriesPerRequest: null, // required by BullMQ
});

export const emailQueue = new Queue('email', { connection });
export const reportQueue = new Queue('reports', { connection });
```
That's your queue. Now enqueue a job from your route handler:
```ts
// src/routes/signup.ts
import { Router } from 'express';
import { emailQueue } from '../lib/queue';

const router = Router();

router.post('/signup', async (req, res) => {
  const user = await createUser(req.body);

  await emailQueue.add('welcome-email', {
    userId: user.id,
    email: user.email,
  });

  res.status(201).json({ id: user.id });
  // Email sends in the background. Response is instant.
});
```
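Since BullMQ ships its own TypeScript types, you can also constrain payloads per job name so a typo in an `add()` call fails at compile time. A minimal sketch; the `JobPayloads` map and `buildJob` helper are illustrative, not part of BullMQ:

```typescript
// Map each job name to the payload shape it expects.
interface JobPayloads {
  'welcome-email': { userId: string; email: string };
  'password-reset': { userId: string; email: string; token: string };
}

// The compiler rejects a payload that doesn't match the job name.
function buildJob<N extends keyof JobPayloads>(name: N, data: JobPayloads[N]) {
  return { name, data };
}

const job = buildJob('welcome-email', { userId: 'u1', email: 'a@b.co' });
// queue.add(job.name, job.data) now enqueues a payload the worker can trust.
```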
## Job Types That Actually Matter
**Delayed jobs** — charge a card 30 minutes after signup (lets users cancel):
```ts
await billingQueue.add('trial-charge', { userId }, {
  delay: 30 * 60 * 1000, // 30 minutes
});
```
**Recurring jobs** — daily digest, cleanup cron:
```ts
await reportQueue.add('daily-digest', {}, {
  repeat: { pattern: '0 9 * * *' }, // 9 AM daily
});
```
**Priority jobs** — paid users get processed first:
```ts
await emailQueue.add('password-reset', { userId }, {
  priority: 1, // lower number = higher priority
});

await emailQueue.add('marketing-blast', { campaignId }, {
  priority: 10,
});
```
## Workers: Concurrency and Rate Limiting
A worker pulls jobs off the queue and processes them. You control how many run in parallel.
```ts
// src/workers/email.worker.ts
import { Worker } from 'bullmq';
import { connection } from '../lib/queue';
import { sendEmail } from '../lib/mailer';

const worker = new Worker('email', async (job) => {
  switch (job.name) {
    case 'welcome-email':
      await sendEmail({
        to: job.data.email,
        template: 'welcome',
        vars: { userId: job.data.userId },
      });
      break;
    case 'password-reset':
      await sendEmail({
        to: job.data.email,
        template: 'reset',
        vars: { token: job.data.token },
      });
      break;
  }
}, {
  connection,
  concurrency: 5, // 5 jobs in parallel
  limiter: {
    max: 100,            // max 100 jobs
    duration: 60 * 1000, // per minute
  },
});
```
The limiter is essential when your downstream service has rate limits. Without it, 10 workers with concurrency 5 will hammer your SMTP server with 50 simultaneous requests.
## Failure and Retry Strategies
Jobs fail. Networks flake. Services go down. Your retry config decides whether things self-heal or page you at 3 AM.
```ts
await emailQueue.add('welcome-email', { userId }, {
  attempts: 5,
  backoff: {
    type: 'exponential',
    delay: 2000, // retries after 2s, 4s, 8s, 16s
  },
  removeOnComplete: { age: 24 * 3600 },   // clean up after 24h
  removeOnFail: { age: 7 * 24 * 3600 },   // keep failures for 7 days
});
```
Handle failures explicitly in your worker:
```ts
worker.on('failed', (job, err) => {
  logger.error(`Job ${job?.id} failed: ${err.message}`, {
    queue: 'email',
    jobName: job?.name,
    attemptsMade: job?.attemptsMade,
    data: job?.data,
  });

  if (job && job.attemptsMade === job.opts.attempts) {
    // Final failure — alert on-call
    alerting.notify(`Email job permanently failed: ${job.id}`);
  }
});
```
Use exponential backoff for transient failures (network, rate limits). Use fixed backoff when retrying makes equal sense at any interval. Set `removeOnComplete` aggressively — stale completed jobs bloat Redis memory.
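BullMQ also accepts a custom backoff strategy on the worker (via `settings.backoffStrategy`, paired with `backoff: { type: 'custom' }` on the job), which is useful for adding a cap and jitter. A sketch; the 60s cap and 10% jitter are arbitrary choices, not defaults:

```typescript
// Capped exponential backoff with jitter: 2s, 4s, 8s, ... up to 60s,
// plus up to 10% random jitter so retries from many jobs don't align.
export function backoffDelay(attemptsMade: number): number {
  const base = Math.min(2000 * 2 ** (attemptsMade - 1), 60_000);
  const jitter = Math.random() * base * 0.1;
  return Math.round(base + jitter);
}

// Wiring (assumes the worker setup above):
// new Worker('email', processor, {
//   connection,
//   settings: { backoffStrategy: (attemptsMade) => backoffDelay(attemptsMade ?? 1) },
// });
// ...and enqueue with { attempts: 5, backoff: { type: 'custom' } }.
```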
## Monitoring: Bull Board
You need a dashboard. Bull Board gives you one in minutes.
```ts
// src/admin/queue-dashboard.ts
import { createBullBoard } from '@bull-board/api';
import { BullMQAdapter } from '@bull-board/api/bullMQAdapter';
import { ExpressAdapter } from '@bull-board/express';
import { emailQueue, reportQueue } from '../lib/queue';

const serverAdapter = new ExpressAdapter();
serverAdapter.setBasePath('/admin/queues');

createBullBoard({
  queues: [
    new BullMQAdapter(emailQueue),
    new BullMQAdapter(reportQueue),
  ],
  serverAdapter,
});

// Mount in your Express app
app.use('/admin/queues', authMiddleware, serverAdapter.getRouter());
```
Put `authMiddleware` in front of it. Exposing your job queue to the internet is a security incident waiting to happen.
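What `authMiddleware` looks like is up to you; even HTTP Basic auth beats nothing. A minimal sketch, where the `ADMIN_USER`/`ADMIN_PASS` env vars are placeholders for your real credential store:

```typescript
// Verify an HTTP Basic `Authorization` header against expected credentials.
export function isAuthorized(
  header: string | undefined,
  user: string,
  pass: string,
): boolean {
  if (!header?.startsWith('Basic ')) return false;
  const decoded = Buffer.from(header.slice(6), 'base64').toString('utf8');
  return decoded === `${user}:${pass}`;
}

// Express middleware built on the check above (sketch):
// function authMiddleware(req, res, next) {
//   if (isAuthorized(req.headers.authorization, process.env.ADMIN_USER!, process.env.ADMIN_PASS!)) {
//     return next();
//   }
//   res.set('WWW-Authenticate', 'Basic realm="queues"').status(401).end();
// }
```

For anything beyond an internal tool, prefer a constant-time comparison and your existing SSO layer over raw string equality.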
For production, also emit metrics to Prometheus or Datadog:
```ts
worker.on('completed', (job) => {
  // finishedOn - processedOn = processing time; processedOn - timestamp = queue wait
  metrics.histogram('job.duration', job.finishedOn! - job.processedOn!, {
    queue: 'email', name: job.name,
  });
});

worker.on('failed', () => {
  metrics.increment('job.failed', { queue: 'email' });
});
```
## Production Architecture
Run workers as separate processes, not inside your API server. This gives you:
- Independent scaling (3 API pods, 1 worker pod)
- Isolation (a worker OOM doesn't kill your API)
- Independent deploys
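A separate worker process should also shut down cleanly on SIGTERM so in-flight jobs finish before the pod dies; BullMQ's `worker.close()` waits for active jobs. A sketch of a generic shutdown helper — the worker exports in the comment are assumed from the files above:

```typescript
// Close every worker, waiting for their in-flight jobs to finish.
type Closeable = { close: () => Promise<void> };

export async function closeAll(workers: Closeable[]): Promise<void> {
  await Promise.all(workers.map((w) => w.close()));
}

// dist/workers/index.ts entrypoint (sketch):
// import { emailWorker } from './email.worker';
// process.on('SIGTERM', async () => {
//   await closeAll([emailWorker]);
//   process.exit(0);
// });
```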
```dockerfile
# Dockerfile.worker
FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY dist/ ./dist/
CMD ["node", "dist/workers/index.js"]
```
In Kubernetes, your worker deployment scales on queue depth (assuming a metrics adapter exposes the waiting count as an external metric, here named `bullmq_queue_waiting`):
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: email-worker-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: email-worker
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: External
      external:
        metric:
          name: bullmq_queue_waiting
          selector:
            matchLabels:
              queue: email
        target:
          type: AverageValue
          averageValue: "50"
```
## Common Mistakes
**Running workers inside the API process.** Worker crashes bring down your API. Always separate processes.
**No dead letter handling.** When a job exhausts all retries, it sits in the failed set forever. Log it, alert on it, or move it to a dead letter queue for manual inspection.
**Ignoring Redis memory.** Every completed job stays in Redis unless you set `removeOnComplete`. A queue doing 10K jobs/day will eat gigabytes within weeks. Set aggressive TTLs.
**Not making jobs idempotent.** If a job runs twice (retry after timeout), it shouldn't send two emails or charge twice. Use idempotency keys or check state before acting.
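BullMQ gives you one dedupe lever for free: if you pass a custom `jobId` and a job with that id already exists in the queue, the `add()` is ignored. Deriving the id from the business event makes the enqueue itself idempotent. A sketch; the id scheme is illustrative:

```typescript
// Deterministic job id: the same signup event always maps to the same id,
// so a retried HTTP request can't enqueue the welcome email twice.
export function welcomeEmailJobId(userId: string): string {
  return `welcome-email:${userId}`;
}

// await emailQueue.add('welcome-email', { userId, email }, {
//   jobId: welcomeEmailJobId(userId),
// });
```

This only dedupes while the original job is still in Redis; for charge-style operations, also check state inside the handler before acting.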
**Giant job payloads.** Don't stuff a 5MB PDF into the job data. Store it in S3, put the key in the job. Redis is not a blob store.
**Missing health checks.** If your worker dies silently, jobs pile up. Monitor queue depth. Alert when waiting count exceeds a threshold for more than N minutes.
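A queue-depth check can reuse `queue.getJobCounts('waiting')`, which resolves to an object of counts per state. A sketch of the threshold logic; the interval, threshold, and `alerting` call are placeholders:

```typescript
// Decide whether the backlog has crossed the paging threshold.
export function isBacklogged(
  counts: { waiting?: number },
  threshold: number,
): boolean {
  return (counts.waiting ?? 0) > threshold;
}

// Periodic check (sketch, assuming emailQueue from the setup above):
// setInterval(async () => {
//   const counts = await emailQueue.getJobCounts('waiting');
//   if (isBacklogged(counts, 500)) alerting.notify('email queue backlog > 500');
// }, 60_000);
```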
Part of my Production Backend Patterns series. Follow for more practical backend engineering.