Why Your API Needs a Job Queue in 2026
If your API does anything that takes more than 200ms — sending emails, resizing images, calling third-party services, generating PDFs, or processing webhooks — you should not be doing it in the request/response cycle.
Here's what happens without a job queue:
- Users wait while your server blocks on slow operations
- A spike in traffic can cascade into timeouts across your whole API
- A single failed external call can kill the user experience
- You have zero visibility into what failed and why
BullMQ is the de facto standard job queue for Node.js in 2026. Built on Redis, it's used by thousands of companies processing billions of jobs every day. Version 5.71 (released March 11, 2026) ships with OpenTelemetry tracing support, flow producers for DAG-style job dependencies, rate limiting, priority queues, and dead-letter queue patterns.
This guide walks through building a production-grade background job system from scratch — with real code you can ship today.
Prerequisites
- Node.js 20+ (LTS)
- Redis 7.x
- BullMQ `^5.71.0`

```bash
npm install bullmq ioredis
npm install -D @types/node
```

For telemetry (optional but recommended):

```bash
npm install bullmq-otel @opentelemetry/sdk-node @opentelemetry/exporter-trace-otlp-http
```
Core Concepts: Queue, Worker, Job
BullMQ has three primitives:
| Concept | Role |
|---|---|
| Queue | Accepts and stores jobs in Redis |
| Worker | Pulls jobs off the queue and executes them |
| QueueEvents | Streams job lifecycle events (completed, failed, etc.) |
Jobs flow: `Queue.add()` → Redis → `Worker.process()` → done.
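QueueEvents deserves a quick illustration, since the rest of this guide focuses on Queue and Worker. A minimal sketch, assuming the `email` queue built in Step 1 and the shared Redis connection from `src/redis.ts` (no `<test>` here since it needs a live Redis):

```typescript
// src/events/emailEvents.ts
import { QueueEvents } from 'bullmq';
import { redisConnection } from '../redis';

// QueueEvents listens on a Redis stream, so it observes lifecycle
// events from every worker processing this queue, not just local ones.
const emailEvents = new QueueEvents('email', { connection: redisConnection });

emailEvents.on('completed', ({ jobId, returnvalue }) => {
  console.log(`Job ${jobId} completed with`, returnvalue);
});

emailEvents.on('failed', ({ jobId, failedReason }) => {
  console.error(`Job ${jobId} failed: ${failedReason}`);
});
```

This is handy when the process that enqueued a job wants to know its fate without running a worker itself.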
Step 1: Create a Queue
```ts
// src/queues/emailQueue.ts
import { Queue } from 'bullmq';
import { redisConnection } from '../redis';

export const emailQueue = new Queue('email', {
  connection: redisConnection,
  defaultJobOptions: {
    attempts: 3,
    backoff: {
      type: 'exponential',
      delay: 1000, // 1s, 2s, 4s
    },
    removeOnComplete: { count: 100 }, // keep last 100
    removeOnFail: { count: 500 },     // keep last 500 failures
  },
});
```

```ts
// src/redis.ts
import { Redis } from 'ioredis';

export const redisConnection = new Redis({
  host: process.env.REDIS_HOST || 'localhost',
  port: parseInt(process.env.REDIS_PORT || '6379', 10),
  maxRetriesPerRequest: null, // Required by BullMQ
});
```
Why `maxRetriesPerRequest: null`? BullMQ uses Redis blocking commands (such as `BLPOP`) internally. If you set a max retry count, ioredis throws on blocked commands. Always set it to `null` for BullMQ connections.
Step 2: Add Jobs to the Queue
```ts
// From your API route handler
import { emailQueue } from '../queues/emailQueue';

// Fire-and-forget from your API
app.post('/api/orders', async (req, res) => {
  const order = await db.orders.create(req.body);

  // Enqueueing is fast — we await the add, not the job's result
  await emailQueue.add('order-confirmation', {
    orderId: order.id,
    customerEmail: order.email,
    customerName: order.name,
    total: order.total,
  });

  res.status(201).json({ orderId: order.id });
});
```
Job Options You'll Actually Use
```ts
// Delayed job — run in 5 minutes
await emailQueue.add('follow-up', payload, {
  delay: 5 * 60 * 1000,
});

// Prioritized job (lower number = higher priority)
await emailQueue.add('password-reset', payload, {
  priority: 1, // 1 is the highest priority
});

// Unique job — prevent duplicates by key
await emailQueue.add('daily-digest', payload, {
  jobId: `digest:${userId}:${todayISO}`,
});

// Repeatable job — cron-style
await emailQueue.add('weekly-report', payload, {
  repeat: {
    pattern: '0 9 * * 1', // 9 AM every Monday
    tz: 'Asia/Saigon',
  },
});
```
Step 3: Create Workers
Workers are separate processes (or at least separate Node.js threads). In production, run them on dedicated infrastructure so a queue backlog doesn't steal resources from your API server.
```ts
// src/workers/emailWorker.ts
import { Worker, Job } from 'bullmq';
import { redisConnection } from '../redis';
import { sendEmail } from '../services/email';

const emailWorker = new Worker(
  'email',
  async (job: Job) => {
    console.log(`Processing job ${job.id}: ${job.name}`);

    switch (job.name) {
      case 'order-confirmation':
        await sendEmail({
          to: job.data.customerEmail,
          subject: `Order #${job.data.orderId} Confirmed`,
          template: 'order-confirmation',
          data: job.data,
        });
        break;

      case 'follow-up':
        await sendEmail({
          to: job.data.customerEmail,
          subject: 'How was your order?',
          template: 'follow-up',
          data: job.data,
        });
        break;

      default:
        throw new Error(`Unknown job name: ${job.name}`);
    }

    return { sentAt: new Date().toISOString() };
  },
  {
    connection: redisConnection,
    concurrency: 10, // Process up to 10 jobs simultaneously
  }
);

emailWorker.on('completed', (job, result) => {
  console.log(`Job ${job.id} completed:`, result);
});

emailWorker.on('failed', (job, err) => {
  console.error(`Job ${job?.id} failed:`, err.message);
});

emailWorker.on('error', (err) => {
  console.error('Worker error:', err);
});
```
Concurrency Strategy in 2026
```ts
// For CPU-bound work: 1-2x vCPUs
const imageWorker = new Worker('image-processing', processor, {
  connection: redisConnection,
  concurrency: 2,
});

// For I/O-bound work (email, HTTP calls): 20-50x
const emailWorker = new Worker('email', processor, {
  connection: redisConnection,
  concurrency: 50,
});

// For rate-limited APIs (e.g., Twilio): rate limit at the queue level
const smsWorker = new Worker('sms', processor, {
  connection: redisConnection,
  concurrency: 1,
  limiter: {
    max: 10,        // 10 jobs
    duration: 1000, // per second
  },
});
```
Step 4: Handling Retries and Dead Letter Queues
BullMQ doesn't have a built-in "dead-letter queue" (DLQ) — but you can implement one with the `failed` event:
```ts
// src/workers/emailWorker.ts
import { Queue, Worker, Job } from 'bullmq';
import { redisConnection } from '../redis';

// DLQ: a separate queue for permanently failed jobs
const deadLetterQueue = new Queue('email:dlq', {
  connection: redisConnection,
});

const emailWorker = new Worker('email', processor, {
  connection: redisConnection,
  concurrency: 10,
});

emailWorker.on('failed', async (job: Job | undefined, err: Error) => {
  if (!job) return;

  // Only send to the DLQ after all retries are exhausted
  if (job.attemptsMade >= (job.opts.attempts ?? 1)) {
    await deadLetterQueue.add(
      job.name,
      {
        ...job.data,
        _failedReason: err.message,
        _failedAt: new Date().toISOString(),
        _originalJobId: job.id,
      },
      { removeOnComplete: false }
    );

    // Alert your team (`alerting` stands in for your notification service)
    await alerting.notify({
      channel: 'engineering',
      message: `Job ${job.name} (${job.id}) permanently failed: ${err.message}`,
      severity: 'high',
    });
  }
});
```
DLQ jobs can be inspected, replayed, or escalated. Think of it as a safety net for your most critical async operations.
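Replaying is straightforward: read the failed jobs out of the DLQ and re-add them to the original queue. A sketch under the assumptions above (the `email:dlq` and `email` queue names come from the previous snippet, and the `_failedReason`/`_failedAt`/`_originalJobId` fields are the metadata attached on failure); it needs a live Redis, so no test is attached:

```typescript
// src/scripts/replayDlq.ts
import { Queue } from 'bullmq';
import { redisConnection } from '../redis';

const dlq = new Queue('email:dlq', { connection: redisConnection });
const emailQueue = new Queue('email', { connection: redisConnection });

async function replayDeadLetters(limit = 100): Promise<number> {
  // DLQ entries were added as ordinary jobs and no worker consumes
  // that queue, so they sit in the waiting list until we drain them
  const jobs = await dlq.getJobs(['waiting'], 0, limit - 1);

  for (const job of jobs) {
    // Strip the failure metadata before re-enqueueing
    const { _failedReason, _failedAt, _originalJobId, ...data } = job.data;
    await emailQueue.add(job.name, data);
    await job.remove();
  }

  return jobs.length;
}
```

Run it from a one-off script or an admin endpoint once the underlying outage is fixed.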
Step 5: Flow Producers — DAG-Style Job Dependencies
BullMQ's `FlowProducer` (introduced in v3, matured in v5) lets you define job trees where a parent job waits for all of its children to complete:
```ts
// src/queues/flows.ts
import { FlowProducer } from 'bullmq';
import { redisConnection } from '../redis';

const flow = new FlowProducer({ connection: redisConnection });

// Example: Process a video upload
// 1. Extract audio (child)
// 2. Generate thumbnails (child)
// 3. Both must finish before: Publish video (parent)
await flow.add({
  name: 'publish-video',
  queueName: 'video',
  data: { videoId: '123', userId: 'abc' },
  children: [
    {
      name: 'extract-audio',
      queueName: 'video',
      data: { videoId: '123' },
    },
    {
      name: 'generate-thumbnails',
      queueName: 'video',
      data: { videoId: '123', count: 5 },
    },
  ],
});
```
The parent `publish-video` job won't run until both children complete successfully. By default, if a child fails after all its retries, the parent simply stays blocked in the waiting-children state; set `failParentOnFailure: true` in a child's job options if you want the failure to propagate and mark the parent failed too.
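On the worker side, the parent can collect its children's return values with `getChildrenValues()`. A sketch continuing the video example (the worker file name and publish logic are placeholders; needs a live Redis, so no test attached):

```typescript
// src/workers/videoWorker.ts
import { Worker, Job } from 'bullmq';
import { redisConnection } from '../redis';

const videoWorker = new Worker(
  'video',
  async (job: Job) => {
    if (job.name === 'publish-video') {
      // Map of child job key -> child return value
      const childValues = await job.getChildrenValues();
      console.log('Children produced:', Object.values(childValues));
      // ...publish using the audio track and thumbnails the children returned
      return;
    }
    // ...handle 'extract-audio' and 'generate-thumbnails' here too
  },
  { connection: redisConnection }
);
```

Returning meaningful values from child processors is what makes this pattern useful: the parent never has to re-derive the intermediate artifacts.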
Step 6: OpenTelemetry Tracing (New in 2026)
BullMQ 5 announced telemetry support on January 29, 2026. You can now trace job execution end to end using the `bullmq-otel` package:
```ts
// src/telemetry.ts
import { NodeSDK } from '@opentelemetry/sdk-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { Resource } from '@opentelemetry/resources';

const sdk = new NodeSDK({
  resource: new Resource({ 'service.name': 'job-worker' }),
  traceExporter: new OTLPTraceExporter({
    url: process.env.OTEL_EXPORTER_OTLP_ENDPOINT || 'http://localhost:4318/v1/traces',
  }),
});

sdk.start();
```
```ts
// src/workers/emailWorker.ts
import { Worker } from 'bullmq';
import { BullMQOtel } from 'bullmq-otel';
import { redisConnection } from '../redis';

const emailWorker = new Worker('email', processor, {
  connection: redisConnection,
  concurrency: 10,
  telemetry: new BullMQOtel('email-worker'), // 👈 One line to enable tracing
});
```
This gives you distributed traces showing:
- Time in queue (waiting)
- Processing duration
- Retry attempts
- Which worker instance handled the job
Connect to Jaeger, Grafana Tempo, or Datadog — it all speaks OpenTelemetry.
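To capture the producer side of the trace as well (including time spent waiting in the queue), the same `telemetry` option can be passed to the Queue constructor in BullMQ 5. A sketch, shown as a variant of the queue from Step 1:

```typescript
// src/queues/emailQueue.ts — producer-side tracing variant
import { Queue } from 'bullmq';
import { BullMQOtel } from 'bullmq-otel';
import { redisConnection } from '../redis';

// With telemetry on both the Queue and the Worker, the add-job span
// and the processing span are linked into one distributed trace.
export const emailQueue = new Queue('email', {
  connection: redisConnection,
  telemetry: new BullMQOtel('email-producer'),
});
```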
Step 7: Monitoring Your Queues
Bull Board (UI Dashboard)
```bash
npm install @bull-board/express @bull-board/api
```
```ts
// src/dashboard.ts
import { createBullBoard } from '@bull-board/api';
import { BullMQAdapter } from '@bull-board/api/bullMQAdapter';
import { ExpressAdapter } from '@bull-board/express';
import { emailQueue } from './queues/emailQueue';

const serverAdapter = new ExpressAdapter();
serverAdapter.setBasePath('/admin/queues');

createBullBoard({
  queues: [new BullMQAdapter(emailQueue)],
  serverAdapter,
});

// Protect with auth middleware in production!
app.use('/admin/queues', authMiddleware, serverAdapter.getRouter());
```
Programmatic Queue Metrics
```ts
import { Queue } from 'bullmq';

async function getQueueHealth(queue: Queue) {
  const [waiting, active, completed, failed, delayed] = await Promise.all([
    queue.getWaitingCount(),
    queue.getActiveCount(),
    queue.getCompletedCount(),
    queue.getFailedCount(),
    queue.getDelayedCount(),
  ]);
  return { waiting, active, completed, failed, delayed };
}

// Expose as a health endpoint
app.get('/health/queues', async (req, res) => {
  const emailHealth = await getQueueHealth(emailQueue);
  const isHealthy = emailHealth.failed < 100 && emailHealth.waiting < 10000;

  res.status(isHealthy ? 200 : 503).json({
    status: isHealthy ? 'healthy' : 'degraded',
    queues: { email: emailHealth },
  });
});
```
Production Architecture Checklist
Before shipping your job queue to production in 2026:
Infrastructure
- [ ] Redis 7.x with persistence (`appendonly yes`)
- [ ] Redis Sentinel or Cluster for high availability
- [ ] Workers run as separate processes (not embedded in the API server)
- [ ] Workers deployed with a process manager (PM2, systemd, Docker)
Reliability
- [ ] `maxRetriesPerRequest: null` on all BullMQ Redis connections
- [ ] Exponential backoff configured in `defaultJobOptions`
- [ ] Dead-letter queue implemented for critical jobs
- [ ] `removeOnComplete` and `removeOnFail` set (avoid Redis memory bloat)
Observability
- [ ] Bull Board or an equivalent UI deployed behind auth
- [ ] Queue depth metrics exported to your monitoring stack
- [ ] Worker `failed` events send alerts
- [ ] OpenTelemetry tracing enabled (BullMQ 5+)
Operations
- [ ] Graceful shutdown: `worker.close()` on SIGTERM
- [ ] Job deduplication for idempotent jobs (use `jobId`)
- [ ] Rate limiters configured for external API jobs
```ts
// Graceful shutdown
process.on('SIGTERM', async () => {
  await emailWorker.close();
  await redisConnection.quit();
  process.exit(0);
});
```
Common Pitfalls
1. Blocking the event loop in a worker
Never do synchronous CPU-heavy work in a Worker — it blocks all concurrent jobs.
```ts
// ❌ Bad — blocks the event loop
const result = crypto.pbkdf2Sync(password, salt, 100000, 64, 'sha512');

// ✅ Good — async/non-blocking
const result = await new Promise((resolve, reject) =>
  crypto.pbkdf2(password, salt, 100000, 64, 'sha512', (err, key) =>
    err ? reject(err) : resolve(key)
  )
);
```
2. Bloated or stale job data
Don't store large payloads in job data — by the time a job runs, a snapshotted object may also be out of date. Store references (IDs), not full objects, and fetch fresh data in the worker.
```ts
// ❌ Bad — 50KB user object in every job
await queue.add('export', { user: fullUserObject });

// ✅ Good — just the ID, fetch fresh data in the worker
await queue.add('export', { userId: '123' });
// In the worker: const user = await db.users.findById(job.data.userId);
```
3. Not handling worker crashes
Always register an `error` event listener on your worker. Otherwise, connection and processing errors can fail silently.
```ts
worker.on('error', (err) => {
  // Log to Sentry, Datadog, etc.
  Sentry.captureException(err);
});
```
Scaling Workers Horizontally
One of BullMQ's strengths is horizontal scaling. Multiple workers connecting to the same Redis queue automatically distribute load — no extra configuration needed.
```yaml
# docker-compose.yml (simplified)
services:
  redis:
    image: redis:7-alpine
    volumes:
      - redis-data:/data
    command: redis-server --appendonly yes

  api:
    build: .
    command: node dist/server.js
    environment:
      - REDIS_HOST=redis

  worker:
    build: .
    command: node dist/workers/emailWorker.js
    environment:
      - REDIS_HOST=redis
    deploy:
      replicas: 3 # Scale workers independently
```
Scale worker replicas independently of your API. During a traffic spike, run `docker-compose up --scale worker=10` and your queue drains correspondingly faster — no code changes required.
Wrapping Up
BullMQ 5 in 2026 is a mature, production-ready job queue system that solves the fundamental problem: don't make your users wait for things that can happen in the background.
The key patterns to take away:
- Use queues for anything that takes >200ms or talks to external services
- Run workers as separate processes with proper concurrency tuning
- Implement dead letter queues for critical jobs
- Use FlowProducer for multi-step job pipelines
- Enable OpenTelemetry tracing for production visibility
- Monitor queue depth as a first-class health signal
If you're building API products, check out 1xAPI on RapidAPI for ready-made API endpoints you can integrate without the queue complexity.
Building something with BullMQ? Drop your questions in the comments below.