Why Your API Needs a Job Queue in 2026
If your API does anything that takes more than 200ms — sending emails, resizing images, calling third-party services, generating PDFs, or processing webhooks — you should not be doing it in the request/response cycle.
Here's what happens without a job queue:
- Users wait while your server blocks on slow operations
- A spike in traffic can cascade into timeouts across your whole API
- A single failed external call can kill the user experience
- You have zero visibility into what failed and why
BullMQ is the de facto standard job queue for Node.js in 2026. Built on Redis, it's used by thousands of companies processing billions of jobs every day. Version 5.71 (released March 11, 2026) ships with OpenTelemetry tracing support, flow producers for DAG-style job dependencies, rate limiting, priority queues, and dead-letter queue patterns.
This guide walks through building a production-grade background job system from scratch — with real code you can ship today.
Prerequisites
- Node.js 20+ (LTS)
- Redis 7.x
- BullMQ `^5.71.0`

```bash
npm install bullmq ioredis
npm install -D @types/node
```

For telemetry (optional but recommended):

```bash
npm install bullmq-otel @opentelemetry/sdk-node @opentelemetry/exporter-trace-otlp-http
```
Core Concepts: Queue, Worker, Job
BullMQ has three primitives:
| Concept | Role |
|---|---|
| Queue | Accepts and stores jobs in Redis |
| Worker | Pulls jobs off the queue and executes them |
| QueueEvents | Streams job lifecycle events (completed, failed, etc.) |
Jobs flow: `Queue.add()` → Redis → `Worker.process()` → done.
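QueueEvents deserves a quick illustration, since the rest of this guide focuses on Queue and Worker. A minimal sketch, assuming the `email` queue built in Step 1 and the shared Redis connection from `src/redis.ts` (no `<test>` here since it needs a live Redis):

```typescript
// src/events/emailEvents.ts
import { QueueEvents } from 'bullmq';
import { redisConnection } from '../redis';

// QueueEvents listens on a Redis stream, so it observes lifecycle
// events from every worker processing this queue, not just local ones.
const emailEvents = new QueueEvents('email', { connection: redisConnection });

emailEvents.on('completed', ({ jobId, returnvalue }) => {
  console.log(`Job ${jobId} completed with`, returnvalue);
});

emailEvents.on('failed', ({ jobId, failedReason }) => {
  console.error(`Job ${jobId} failed: ${failedReason}`);
});
```

This is handy when the process that enqueued a job wants to know its fate without running a worker itself.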
Step 1: Create a Queue
```ts
// src/queues/emailQueue.ts
import { Queue } from 'bullmq';
import { redisConnection } from '../redis';

export const emailQueue = new Queue('email', {
  connection: redisConnection,
  defaultJobOptions: {
    attempts: 3,
    backoff: {
      type: 'exponential',
      delay: 1000, // 1s, 2s, 4s
    },
    removeOnComplete: { count: 100 }, // keep last 100
    removeOnFail: { count: 500 },     // keep last 500 failures
  },
});
```

```ts
// src/redis.ts
import { Redis } from 'ioredis';

export const redisConnection = new Redis({
  host: process.env.REDIS_HOST || 'localhost',
  port: parseInt(process.env.REDIS_PORT || '6379', 10),
  maxRetriesPerRequest: null, // Required by BullMQ
});
```
Why `maxRetriesPerRequest: null`? BullMQ uses Redis blocking commands (such as `BLPOP`) internally. If you set a max retry count, ioredis throws on blocked commands. Always set it to `null` for BullMQ connections.
Step 2: Add Jobs to the Queue
```ts
// From your API route handler
import { emailQueue } from '../queues/emailQueue';

// Fire-and-forget from your API
app.post('/api/orders', async (req, res) => {
  const order = await db.orders.create(req.body);

  // Enqueueing is fast — we await the add, not the job's result
  await emailQueue.add('order-confirmation', {
    orderId: order.id,
    customerEmail: order.email,
    customerName: order.name,
    total: order.total,
  });

  res.status(201).json({ orderId: order.id });
});
```
Job Options You'll Actually Use
```ts
// Delayed job — run in 5 minutes
await emailQueue.add('follow-up', payload, {
  delay: 5 * 60 * 1000,
});

// Prioritized job (lower number = higher priority)
await emailQueue.add('password-reset', payload, {
  priority: 1, // 1 is the highest priority
});

// Unique job — prevent duplicates by key
await emailQueue.add('daily-digest', payload, {
  jobId: `digest:${userId}:${todayISO}`,
});

// Repeatable job — cron-style
await emailQueue.add('weekly-report', payload, {
  repeat: {
    pattern: '0 9 * * 1', // 9 AM every Monday
    tz: 'Asia/Saigon',
  },
});
```
Step 3: Create Workers
Workers are separate processes (or at least separate Node.js threads). In production, run them on dedicated infrastructure so a queue backlog doesn't steal resources from your API server.
```ts
// src/workers/emailWorker.ts
import { Worker, Job } from 'bullmq';
import { redisConnection } from '../redis';
import { sendEmail } from '../services/email';

const emailWorker = new Worker(
  'email',
  async (job: Job) => {
    console.log(`Processing job ${job.id}: ${job.name}`);

    switch (job.name) {
      case 'order-confirmation':
        await sendEmail({
          to: job.data.customerEmail,
          subject: `Order #${job.data.orderId} Confirmed`,
          template: 'order-confirmation',
          data: job.data,
        });
        break;

      case 'follow-up':
        await sendEmail({
          to: job.data.customerEmail,
          subject: 'How was your order?',
          template: 'follow-up',
          data: job.data,
        });
        break;

      default:
        throw new Error(`Unknown job name: ${job.name}`);
    }

    return { sentAt: new Date().toISOString() };
  },
  {
    connection: redisConnection,
    concurrency: 10, // Process up to 10 jobs simultaneously
  }
);

emailWorker.on('completed', (job, result) => {
  console.log(`Job ${job.id} completed:`, result);
});

emailWorker.on('failed', (job, err) => {
  console.error(`Job ${job?.id} failed:`, err.message);
});

emailWorker.on('error', (err) => {
  console.error('Worker error:', err);
});
```
Concurrency Strategy in 2026
```ts
// For CPU-bound work: 1-2x vCPUs
const imageWorker = new Worker('image-processing', processor, {
  connection: redisConnection,
  concurrency: 2,
});

// For I/O-bound work (email, HTTP calls): 20-50x
const emailWorker = new Worker('email', processor, {
  connection: redisConnection,
  concurrency: 50,
});

// For rate-limited APIs (e.g., Twilio): rate limit at the queue level
const smsWorker = new Worker('sms', processor, {
  connection: redisConnection,
  concurrency: 1,
  limiter: {
    max: 10,        // 10 jobs
    duration: 1000, // per second
  },
});
```
Step 4: Handling Retries and Dead Letter Queues
BullMQ doesn't have a built-in "dead-letter queue" (DLQ) — but you can implement one with the `failed` event:
```ts
// src/workers/emailWorker.ts
import { Queue, Worker, Job } from 'bullmq';
import { redisConnection } from '../redis';

// DLQ: a separate queue for permanently failed jobs
const deadLetterQueue = new Queue('email:dlq', {
  connection: redisConnection,
});

const emailWorker = new Worker('email', processor, {
  connection: redisConnection,
  concurrency: 10,
});

emailWorker.on('failed', async (job: Job | undefined, err: Error) => {
  if (!job) return;

  // Only send to the DLQ after all retries are exhausted
  if (job.attemptsMade >= (job.opts.attempts ?? 1)) {
    await deadLetterQueue.add(
      job.name,
      {
        ...job.data,
        _failedReason: err.message,
        _failedAt: new Date().toISOString(),
        _originalJobId: job.id,
      },
      { removeOnComplete: false }
    );

    // Alert your team (`alerting` stands in for your notification service)
    await alerting.notify({
      channel: 'engineering',
      message: `Job ${job.name} (${job.id}) permanently failed: ${err.message}`,
      severity: 'high',
    });
  }
});
```
DLQ jobs can be inspected, replayed, or escalated. Think of it as a safety net for your most critical async operations.
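Replaying is straightforward: read the failed jobs out of the DLQ and re-add them to the original queue. A sketch under the assumptions above (the `email:dlq` and `email` queue names come from the previous snippet, and the `_failedReason`/`_failedAt`/`_originalJobId` fields are the metadata attached on failure); it needs a live Redis, so no test is attached:

```typescript
// src/scripts/replayDlq.ts
import { Queue } from 'bullmq';
import { redisConnection } from '../redis';

const dlq = new Queue('email:dlq', { connection: redisConnection });
const emailQueue = new Queue('email', { connection: redisConnection });

async function replayDeadLetters(limit = 100): Promise<number> {
  // DLQ entries were added as ordinary jobs and no worker consumes
  // that queue, so they sit in the waiting list until we drain them
  const jobs = await dlq.getJobs(['waiting'], 0, limit - 1);

  for (const job of jobs) {
    // Strip the failure metadata before re-enqueueing
    const { _failedReason, _failedAt, _originalJobId, ...data } = job.data;
    await emailQueue.add(job.name, data);
    await job.remove();
  }

  return jobs.length;
}
```

Run it from a one-off script or an admin endpoint once the underlying outage is fixed.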
Step 5: Flow Producers — DAG-Style Job Dependencies
BullMQ's `FlowProducer` (introduced in v3, matured in v5) lets you define job trees where a parent job waits for all of its children to complete:
```ts
// src/queues/flows.ts
import { FlowProducer } from 'bullmq';
import { redisConnection } from '../redis';

const flow = new FlowProducer({ connection: redisConnection });

// Example: Process a video upload
// 1. Extract audio (child)
// 2. Generate thumbnails (child)
// 3. Both must finish before: Publish video (parent)
await flow.add({
  name: 'publish-video',
  queueName: 'video',
  data: { videoId: '123', userId: 'abc' },
  children: [
    {
      name: 'extract-audio',
      queueName: 'video',
      data: { videoId: '123' },
    },
    {
      name: 'generate-thumbnails',
      queueName: 'video',
      data: { videoId: '123', count: 5 },
    },
  ],
});
```
The parent `publish-video` job won't run until both children complete successfully. By default, if a child fails after all its retries, the parent simply stays blocked in the waiting-children state; set `failParentOnFailure: true` in a child's job options if you want the failure to propagate and mark the parent failed too.
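On the worker side, the parent can collect its children's return values with `getChildrenValues()`. A sketch continuing the video example (the worker file name and publish logic are placeholders; needs a live Redis, so no test attached):

```typescript
// src/workers/videoWorker.ts
import { Worker, Job } from 'bullmq';
import { redisConnection } from '../redis';

const videoWorker = new Worker(
  'video',
  async (job: Job) => {
    if (job.name === 'publish-video') {
      // Map of child job key -> child return value
      const childValues = await job.getChildrenValues();
      console.log('Children produced:', Object.values(childValues));
      // ...publish using the audio track and thumbnails the children returned
      return;
    }
    // ...handle 'extract-audio' and 'generate-thumbnails' here too
  },
  { connection: redisConnection }
);
```

Returning meaningful values from child processors is what makes this pattern useful: the parent never has to re-derive the intermediate artifacts.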
Step 6: OpenTelemetry Tracing (New in 2026)
BullMQ 5 announced telemetry support on January 29, 2026. You can now trace job execution end to end using the `bullmq-otel` package:
```ts
// src/telemetry.ts
import { NodeSDK } from '@opentelemetry/sdk-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { Resource } from '@opentelemetry/resources';

const sdk = new NodeSDK({
  resource: new Resource({ 'service.name': 'job-worker' }),
  traceExporter: new OTLPTraceExporter({
    url: process.env.OTEL_EXPORTER_OTLP_ENDPOINT || 'http://localhost:4318/v1/traces',
  }),
});

sdk.start();
```
```ts
// src/workers/emailWorker.ts
import { Worker } from 'bullmq';
import { BullMQOtel } from 'bullmq-otel';
import { redisConnection } from '../redis';

const emailWorker = new Worker('email', processor, {
  connection: redisConnection,
  concurrency: 10,
  telemetry: new BullMQOtel('email-worker'), // 👈 One line to enable tracing
});
```
This gives you distributed traces showing:
- Time in queue (waiting)
- Processing duration
- Retry attempts
- Which worker instance handled the job
Connect to Jaeger, Grafana Tempo, or Datadog — it all speaks OpenTelemetry.
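To capture the producer side of the trace as well (including time spent waiting in the queue), the same `telemetry` option can be passed to the Queue constructor in BullMQ 5. A sketch, shown as a variant of the queue from Step 1:

```typescript
// src/queues/emailQueue.ts — producer-side tracing variant
import { Queue } from 'bullmq';
import { BullMQOtel } from 'bullmq-otel';
import { redisConnection } from '../redis';

// With telemetry on both the Queue and the Worker, the add-job span
// and the processing span are linked into one distributed trace.
export const emailQueue = new Queue('email', {
  connection: redisConnection,
  telemetry: new BullMQOtel('email-producer'),
});
```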
Step 7: Monitoring Your Queues
Bull Board (UI Dashboard)
```bash
npm install @bull-board/express @bull-board/api
```
```ts
// src/dashboard.ts
import { createBullBoard } from '@bull-board/api';
import { BullMQAdapter } from '@bull-board/api/bullMQAdapter';
import { ExpressAdapter } from '@bull-board/express';
import { emailQueue } from './queues/emailQueue';

const serverAdapter = new ExpressAdapter();
serverAdapter.setBasePath('/admin/queues');

createBullBoard({
  queues: [new BullMQAdapter(emailQueue)],
  serverAdapter,
});

// Protect with auth middleware in production!
app.use('/admin/queues', authMiddleware, serverAdapter.getRouter());
```
Programmatic Queue Metrics
```ts
import { Queue } from 'bullmq';

async function getQueueHealth(queue: Queue) {
  const [waiting, active, completed, failed, delayed] = await Promise.all([
    queue.getWaitingCount(),
    queue.getActiveCount(),
    queue.getCompletedCount(),
    queue.getFailedCount(),
    queue.getDelayedCount(),
  ]);
  return { waiting, active, completed, failed, delayed };
}

// Expose as a health endpoint
app.get('/health/queues', async (req, res) => {
  const emailHealth = await getQueueHealth(emailQueue);
  const isHealthy = emailHealth.failed < 100 && emailHealth.waiting < 10000;

  res.status(isHealthy ? 200 : 503).json({
    status: isHealthy ? 'healthy' : 'degraded',
    queues: { email: emailHealth },
  });
});
```
Production Architecture Checklist
Before shipping your job queue to production in 2026:
Infrastructure
- [ ] Redis 7.x with persistence (`appendonly yes`)
- [ ] Redis Sentinel or Cluster for high availability
- [ ] Workers run as separate processes (not embedded in the API server)
- [ ] Workers deployed with a process manager (PM2, systemd, Docker)
Reliability
- [ ] `maxRetriesPerRequest: null` on all BullMQ Redis connections
- [ ] Exponential backoff configured in `defaultJobOptions`
- [ ] Dead-letter queue implemented for critical jobs
- [ ] `removeOnComplete` and `removeOnFail` set (avoid Redis memory bloat)
Observability
- [ ] Bull Board or an equivalent UI deployed behind auth
- [ ] Queue depth metrics exported to your monitoring stack
- [ ] Worker `failed` events send alerts
- [ ] OpenTelemetry tracing enabled (BullMQ 5+)
Operations
- [ ] Graceful shutdown: `worker.close()` on SIGTERM
- [ ] Job deduplication for idempotent jobs (use `jobId`)
- [ ] Rate limiters configured for external API jobs
```ts
// Graceful shutdown
process.on('SIGTERM', async () => {
  await emailWorker.close();
  await redisConnection.quit();
  process.exit(0);
});
```
Common Pitfalls
1. Blocking the event loop in a worker
Never do synchronous CPU-heavy work in a Worker — it blocks all concurrent jobs.
```ts
// ❌ Bad — blocks the event loop
const result = crypto.pbkdf2Sync(password, salt, 100000, 64, 'sha512');

// ✅ Good — async/non-blocking
const result = await new Promise((resolve, reject) =>
  crypto.pbkdf2(password, salt, 100000, 64, 'sha512', (err, key) =>
    err ? reject(err) : resolve(key)
  )
);
```
2. Bloated or stale job data
Don't store large payloads in job data — by the time a job runs, a snapshotted object may also be out of date. Store references (IDs), not full objects, and fetch fresh data in the worker.
```ts
// ❌ Bad — 50KB user object in every job
await queue.add('export', { user: fullUserObject });

// ✅ Good — just the ID, fetch fresh data in the worker
await queue.add('export', { userId: '123' });
// In the worker: const user = await db.users.findById(job.data.userId);
```
3. Not handling worker crashes
Always register an `error` event listener on your worker. Otherwise, connection and processing errors can fail silently.
```ts
worker.on('error', (err) => {
  // Log to Sentry, Datadog, etc.
  Sentry.captureException(err);
});
```
Scaling Workers Horizontally
One of BullMQ's strengths is horizontal scaling. Multiple workers connecting to the same Redis queue automatically distribute load — no extra configuration needed.
```yaml
# docker-compose.yml (simplified)
services:
  redis:
    image: redis:7-alpine
    volumes:
      - redis-data:/data
    command: redis-server --appendonly yes

  api:
    build: .
    command: node dist/server.js
    environment:
      - REDIS_HOST=redis

  worker:
    build: .
    command: node dist/workers/emailWorker.js
    environment:
      - REDIS_HOST=redis
    deploy:
      replicas: 3 # Scale workers independently
```
Scale worker replicas independently of your API. During a traffic spike, run `docker-compose up --scale worker=10` and your queue drains correspondingly faster — no code changes required.
Wrapping Up
BullMQ 5 in 2026 is a mature, production-ready job queue system that solves the fundamental problem: don't make your users wait for things that can happen in the background.
The key patterns to take away:
- Use queues for anything that takes >200ms or talks to external services
- Run workers as separate processes with proper concurrency tuning
- Implement dead letter queues for critical jobs
- Use FlowProducer for multi-step job pipelines
- Enable OpenTelemetry tracing for production visibility
- Monitor queue depth as a first-class health signal
If you're building API products, check out 1xAPI on RapidAPI for ready-made API endpoints you can integrate without the queue complexity.
Building something with BullMQ? Drop your questions in the comments below.