Your API returns 200. The user sees "success." Behind the scenes, nothing happened because your email service timed out and took the request handler down with it.
Stop processing heavy work inline. Use a job queue.
## The Problem With Synchronous Processing
Every second your request handler spends on non-essential work is a second that connection is occupied. Send an email, generate a PDF, resize an image — do any of that synchronously and you're begging for:
- Request timeouts under load
- Cascading failures when downstream services go down
- Zero visibility into what failed and why
- No retry mechanism
The fix is dead simple: accept the request, enqueue the work, return immediately. Process it in the background.
## BullMQ + Redis: The Setup
BullMQ is the successor to Bull. It's built on Redis Streams, supports TypeScript natively, and handles everything you'd want from a production queue.
```bash
npm install bullmq ioredis
```
```ts
// src/lib/queue.ts
import { Queue } from 'bullmq';
import IORedis from 'ioredis';

export const connection = new IORedis({
  host: process.env.REDIS_HOST || '127.0.0.1',
  port: Number(process.env.REDIS_PORT) || 6379,
  maxRetriesPerRequest: null, // required by BullMQ
});

export const emailQueue = new Queue('email', { connection });
export const reportQueue = new Queue('reports', { connection });
```
That's your queue. Now enqueue a job from your route handler:
```ts
// src/routes/signup.ts
import { Router } from 'express';
import { emailQueue } from '../lib/queue';

const router = Router();

router.post('/signup', async (req, res) => {
  const user = await createUser(req.body);

  await emailQueue.add('welcome-email', {
    userId: user.id,
    email: user.email,
  });

  res.status(201).json({ id: user.id });
  // Email sends in the background. Response is instant.
});
```
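Since BullMQ ships its own TypeScript types, you can also constrain payloads per job name so a typo in an `add()` call fails at compile time. A minimal sketch; the `JobPayloads` map and `buildJob` helper are illustrative, not part of BullMQ:

```typescript
// Map each job name to the payload shape it expects.
interface JobPayloads {
  'welcome-email': { userId: string; email: string };
  'password-reset': { userId: string; email: string; token: string };
}

// The compiler rejects a payload that doesn't match the job name.
function buildJob<N extends keyof JobPayloads>(name: N, data: JobPayloads[N]) {
  return { name, data };
}

const job = buildJob('welcome-email', { userId: 'u1', email: 'a@b.co' });
// queue.add(job.name, job.data) now enqueues a payload the worker can trust.
```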
## Job Types That Actually Matter
**Delayed jobs** — charge a card 30 minutes after signup (lets users cancel):
```ts
await billingQueue.add('trial-charge', { userId }, {
  delay: 30 * 60 * 1000, // 30 minutes
});
```
**Recurring jobs** — daily digest, cleanup cron:
```ts
await reportQueue.add('daily-digest', {}, {
  repeat: { pattern: '0 9 * * *' }, // 9 AM daily
});
```
**Priority jobs** — paid users get processed first:
```ts
await emailQueue.add('password-reset', { userId }, {
  priority: 1, // lower number = higher priority
});

await emailQueue.add('marketing-blast', { campaignId }, {
  priority: 10,
});
```
## Workers: Concurrency and Rate Limiting
A worker pulls jobs off the queue and processes them. You control how many run in parallel.
```ts
// src/workers/email.worker.ts
import { Worker } from 'bullmq';
import { connection } from '../lib/queue';
import { sendEmail } from '../lib/mailer';

const worker = new Worker('email', async (job) => {
  switch (job.name) {
    case 'welcome-email':
      await sendEmail({
        to: job.data.email,
        template: 'welcome',
        vars: { userId: job.data.userId },
      });
      break;
    case 'password-reset':
      await sendEmail({
        to: job.data.email,
        template: 'reset',
        vars: { token: job.data.token },
      });
      break;
  }
}, {
  connection,
  concurrency: 5, // 5 jobs in parallel
  limiter: {
    max: 100,            // max 100 jobs
    duration: 60 * 1000, // per minute
  },
});
```
The limiter is essential when your downstream service has rate limits. Without it, 10 workers with concurrency 5 will hammer your SMTP server with 50 simultaneous requests.
## Failure and Retry Strategies
Jobs fail. Networks flake. Services go down. Your retry config decides whether things self-heal or page you at 3 AM.
```ts
await emailQueue.add('welcome-email', { userId }, {
  attempts: 5,
  backoff: {
    type: 'exponential',
    delay: 2000, // retries after 2s, 4s, 8s, 16s
  },
  removeOnComplete: { age: 24 * 3600 },   // clean up after 24h
  removeOnFail: { age: 7 * 24 * 3600 },   // keep failures for 7 days
});
```
Handle failures explicitly in your worker:
```ts
worker.on('failed', (job, err) => {
  logger.error(`Job ${job?.id} failed: ${err.message}`, {
    queue: 'email',
    jobName: job?.name,
    attemptsMade: job?.attemptsMade,
    data: job?.data,
  });

  if (job && job.attemptsMade === job.opts.attempts) {
    // Final failure — alert on-call
    alerting.notify(`Email job permanently failed: ${job.id}`);
  }
});
```
Use exponential backoff for transient failures (network, rate limits). Use fixed backoff when retrying makes equal sense at any interval. Set `removeOnComplete` aggressively — stale completed jobs bloat Redis memory.
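BullMQ also accepts a custom backoff strategy on the worker (via `settings.backoffStrategy`, paired with `backoff: { type: 'custom' }` on the job), which is useful for adding a cap and jitter. A sketch; the 60s cap and 10% jitter are arbitrary choices, not defaults:

```typescript
// Capped exponential backoff with jitter: 2s, 4s, 8s, ... up to 60s,
// plus up to 10% random jitter so retries from many jobs don't align.
export function backoffDelay(attemptsMade: number): number {
  const base = Math.min(2000 * 2 ** (attemptsMade - 1), 60_000);
  const jitter = Math.random() * base * 0.1;
  return Math.round(base + jitter);
}

// Wiring (assumes the worker setup above):
// new Worker('email', processor, {
//   connection,
//   settings: { backoffStrategy: (attemptsMade) => backoffDelay(attemptsMade ?? 1) },
// });
// ...and enqueue with { attempts: 5, backoff: { type: 'custom' } }.
```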
## Monitoring: Bull Board
You need a dashboard. Bull Board gives you one in minutes.
```ts
// src/admin/queue-dashboard.ts
import { createBullBoard } from '@bull-board/api';
import { BullMQAdapter } from '@bull-board/api/bullMQAdapter';
import { ExpressAdapter } from '@bull-board/express';
import { emailQueue, reportQueue } from '../lib/queue';

const serverAdapter = new ExpressAdapter();
serverAdapter.setBasePath('/admin/queues');

createBullBoard({
  queues: [
    new BullMQAdapter(emailQueue),
    new BullMQAdapter(reportQueue),
  ],
  serverAdapter,
});

// Mount in your Express app
app.use('/admin/queues', authMiddleware, serverAdapter.getRouter());
```
Put `authMiddleware` in front of it. Exposing your job queue to the internet is a security incident waiting to happen.
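What `authMiddleware` looks like is up to you; even HTTP Basic auth beats nothing. A minimal sketch, where the `ADMIN_USER`/`ADMIN_PASS` env vars are placeholders for your real credential store:

```typescript
// Verify an HTTP Basic `Authorization` header against expected credentials.
export function isAuthorized(
  header: string | undefined,
  user: string,
  pass: string,
): boolean {
  if (!header?.startsWith('Basic ')) return false;
  const decoded = Buffer.from(header.slice(6), 'base64').toString('utf8');
  return decoded === `${user}:${pass}`;
}

// Express middleware built on the check above (sketch):
// function authMiddleware(req, res, next) {
//   if (isAuthorized(req.headers.authorization, process.env.ADMIN_USER!, process.env.ADMIN_PASS!)) {
//     return next();
//   }
//   res.set('WWW-Authenticate', 'Basic realm="queues"').status(401).end();
// }
```

For anything beyond an internal tool, prefer a constant-time comparison and your existing SSO layer over raw string equality.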
For production, also emit metrics to Prometheus or Datadog:
```ts
worker.on('completed', (job) => {
  // finishedOn - processedOn = processing time; processedOn - timestamp = queue wait
  metrics.histogram('job.duration', job.finishedOn! - job.processedOn!, {
    queue: 'email', name: job.name,
  });
});

worker.on('failed', () => {
  metrics.increment('job.failed', { queue: 'email' });
});
```
## Production Architecture
Run workers as separate processes, not inside your API server. This gives you:
- Independent scaling (3 API pods, 1 worker pod)
- Isolation (a worker OOM doesn't kill your API)
- Independent deploys
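A separate worker process should also shut down cleanly on SIGTERM so in-flight jobs finish before the pod dies; BullMQ's `worker.close()` waits for active jobs. A sketch of a generic shutdown helper — the worker exports in the comment are assumed from the files above:

```typescript
// Close every worker, waiting for their in-flight jobs to finish.
type Closeable = { close: () => Promise<void> };

export async function closeAll(workers: Closeable[]): Promise<void> {
  await Promise.all(workers.map((w) => w.close()));
}

// dist/workers/index.ts entrypoint (sketch):
// import { emailWorker } from './email.worker';
// process.on('SIGTERM', async () => {
//   await closeAll([emailWorker]);
//   process.exit(0);
// });
```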
```dockerfile
# Dockerfile.worker
FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY dist/ ./dist/
CMD ["node", "dist/workers/index.js"]
```
In Kubernetes, your worker deployment scales on queue depth (assuming a metrics adapter exposes the waiting count as an external metric, here named `bullmq_queue_waiting`):
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: email-worker-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: email-worker
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: External
      external:
        metric:
          name: bullmq_queue_waiting
          selector:
            matchLabels:
              queue: email
        target:
          type: AverageValue
          averageValue: "50"
```
## Common Mistakes
**Running workers inside the API process.** Worker crashes bring down your API. Always separate processes.
**No dead letter handling.** When a job exhausts all retries, it sits in the failed set forever. Log it, alert on it, or move it to a dead letter queue for manual inspection.
**Ignoring Redis memory.** Every completed job stays in Redis unless you set `removeOnComplete`. A queue doing 10K jobs/day will eat gigabytes within weeks. Set aggressive TTLs.
**Not making jobs idempotent.** If a job runs twice (retry after timeout), it shouldn't send two emails or charge twice. Use idempotency keys or check state before acting.
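BullMQ gives you one dedupe lever for free: if you pass a custom `jobId` and a job with that id already exists in the queue, the `add()` is ignored. Deriving the id from the business event makes the enqueue itself idempotent. A sketch; the id scheme is illustrative:

```typescript
// Deterministic job id: the same signup event always maps to the same id,
// so a retried HTTP request can't enqueue the welcome email twice.
export function welcomeEmailJobId(userId: string): string {
  return `welcome-email:${userId}`;
}

// await emailQueue.add('welcome-email', { userId, email }, {
//   jobId: welcomeEmailJobId(userId),
// });
```

This only dedupes while the original job is still in Redis; for charge-style operations, also check state inside the handler before acting.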
**Giant job payloads.** Don't stuff a 5MB PDF into the job data. Store it in S3, put the key in the job. Redis is not a blob store.
**Missing health checks.** If your worker dies silently, jobs pile up. Monitor queue depth. Alert when waiting count exceeds a threshold for more than N minutes.
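A queue-depth check can reuse `queue.getJobCounts('waiting')`, which resolves to an object of counts per state. A sketch of the threshold logic; the interval, threshold, and `alerting` call are placeholders:

```typescript
// Decide whether the backlog has crossed the paging threshold.
export function isBacklogged(
  counts: { waiting?: number },
  threshold: number,
): boolean {
  return (counts.waiting ?? 0) > threshold;
}

// Periodic check (sketch, assuming emailQueue from the setup above):
// setInterval(async () => {
//   const counts = await emailQueue.getJobCounts('waiting');
//   if (isBacklogged(counts, 500)) alerting.notify('email queue backlog > 500');
// }, 60_000);
```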
Part of my Production Backend Patterns series. Follow for more practical backend engineering.