Adam - The Developer

The Hidden Machinery Behind Background Jobs

This will be a boring article today, but I wanted to write it after talking to a developer who genuinely believed background jobs are the key to faster software.

They’re not hard to add. They’re easy to write, easy to deploy, and deceptively minimal, depending on how much complexity you’re willing to ignore elsewhere.

Let’s talk about something most of us use every day but rarely think about.

You enqueue a job (or, in my language, “drop something into a queue”). Your API returns 202 Accepted in 3 milliseconds. It feels good, responsive, non-blocking, event-driven. All the right words.

But between queue.enqueue() and a worker picking it up, a lot happens across your app, the OS, the network stack, and the broker. Most of it is invisible. All of it matters when something breaks at 2am.

This is that journey, layer by layer. By the end, you’ll have a clearer picture of what “fast” and “non-blocking” actually mean when a queue is involved.


What We Think Is Happening

Most of us carry a mental model that looks roughly like this:

Producer  →  [ Queue ]  →  Worker

Clean. Simple. And not wrong, exactly, just kinda incomplete. The full picture looks more like:

Producer
  → serialize payload to bytes
  → write bytes to TCP socket
  → kernel buffers → NIC → wire
  → broker receives bytes
  → broker deserializes, routes, stores message
  → broker sends ACK to producer
  → worker connects and subscribes
  → broker reads message from storage
  → broker sends to worker over TCP
  → worker deserializes → your function runs
  → worker sends ACK back to broker
  → broker marks message delivered, removes it

Every one of those arrows is real work. Let's walk through each one.


Layer 1: Your Application Code

Here is a fairly typical job enqueue in TypeScript using BullMQ:

import { Queue } from 'bullmq';

const emailQueue = new Queue('emails', {
  connection: { host: 'localhost', port: 6379 }
});

await emailQueue.add('sendWelcome', {
  userId: 'usr_abc123',
  to: 'user@example.com',
  template: 'welcome-v2',
});

console.log('Job enqueued'); // Returns in ~3ms

That await resolves fast. But a few things happened before it did.

Serialization. Your payload object got converted to a flat sequence of bytes. In BullMQ's case, this is JSON. Every nested field, every string, every number, reduced to a string that can cross a process boundary. Your clean TypeScript object does not survive the trip; its data does.
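
A quick way to see what "its data does" means is to round-trip a payload through JSON by hand. This is only a sketch: the `WelcomePayload` class and its fields are made up for the example, and it calls `JSON.stringify` and `JSON.parse` directly instead of going through BullMQ.

```typescript
// Hypothetical payload class, used only to illustrate the round trip.
class WelcomePayload {
  userId: string;
  createdAt: Date;

  constructor(userId: string, createdAt: Date) {
    this.userId = userId;
    this.createdAt = createdAt;
  }

  greeting(): string {
    return `Welcome, ${this.userId}`;
  }
}

// Simulate what crossing a process boundary as JSON does to an object.
function roundTrip(value: unknown): any {
  return JSON.parse(JSON.stringify(value));
}

const sentPayload = new WelcomePayload('usr_abc123', new Date('2024-01-01T00:00:00.000Z'));
const receivedPayload = roundTrip(sentPayload);

console.log(receivedPayload.userId);                    // usr_abc123: plain data survives
console.log(typeof receivedPayload.createdAt);          // string: the Date arrives as ISO text
console.log(receivedPayload instanceof WelcomePayload); // false: class identity is gone
console.log(typeof receivedPayload.greeting);           // undefined: so are methods
```

If a worker needs a real `Date` or a class instance, it has to rebuild one from the plain data it receives.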

Message wrapping. The payload gets wrapped in an envelope before it goes anywhere. Job ID, timestamp, retry count, delay, priority, queue name. Depending on the broker, this metadata overhead can be comparable in size to your actual payload for small messages.
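
To get a feel for that overhead, here is a rough sketch that wraps a small payload in a made-up envelope and compares sizes. The `JobEnvelope` fields are an assumption loosely based on the list above, not BullMQ's or any broker's actual wire format, and the timestamp is fixed so the output is repeatable.

```typescript
// Hypothetical envelope shape; real brokers and libraries use their own fields.
interface JobEnvelope<T> {
  id: string;
  queue: string;
  name: string;
  timestamp: number;
  attemptsMade: number;
  delay: number;
  priority: number;
  data: T;
}

function wrapInEnvelope<T>(queue: string, name: string, data: T): JobEnvelope<T> {
  // Fixed id and timestamp keep the example deterministic.
  return { id: '1', queue, name, timestamp: 1700000000000, attemptsMade: 0, delay: 0, priority: 0, data };
}

// For ASCII-only content, the string length equals the UTF-8 byte count.
function jsonSize(value: unknown): number {
  return JSON.stringify(value).length;
}

const smallPayload = { userId: 'usr_abc123' };
const wrappedJob = wrapInEnvelope('emails', 'sendWelcome', smallPayload);

const payloadBytes = jsonSize(smallPayload);
const totalBytes = jsonSize(wrappedJob);
console.log({ payloadBytes, overheadBytes: totalBytes - payloadBytes });
```

For a payload this small, the envelope ends up larger than the data it carries. The ratio shrinks quickly as payloads grow.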

Connection reuse. BullMQ does not open a new TCP connection to Redis for every job. Each Queue instance keeps a long-lived connection open (through the ioredis client) and sends every enqueue over it. If that connection is down or reconnecting, the command sits in the client's buffer and your await waits there until Redis is reachable again.

So even before a byte has left your machine, your "fast, non-blocking enqueue" has already done serialization, object allocation, connection management, and potentially some waiting. Still fast. Just not free.


Layer 2: The Network Stack and the Kernel

When the library writes to the socket, your application hands control to the OS kernel. This boundary has a name: a syscall. Roughly speaking, it is your code saying "I need to do something I am not allowed to do myself" and the kernel taking over.

On Linux, that write looks like this at the system level:

// What your queue library is ultimately triggering
ssize_t bytes_written = write(socket_fd, message_bytes, message_length);

A few things happen in sequence:

  1. Your bytes are copied from your process's memory into the kernel's TCP send buffer. This is a separate memory region. Once copied, the kernel's copy is out of your code's reach, and your own buffer is free to reuse.
  2. The kernel's TCP stack breaks the data into segments, adds sequence numbers and checksums, and hands them to the network driver.
  3. The NIC DMA-transfers the data to its own buffer and puts it on the wire.

The important part here: write() returns before the data is confirmed received by the broker. It returns when the kernel has accepted the bytes. The actual network transfer happens asynchronously behind your process's back.

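When write() returns, you have confirmed that the kernel accepted your bytes. Not that the broker received them. A client that waits for the broker's reply before resolving (BullMQ's add() does this with Redis) tells you more; a fire-and-forget publish tells you only this.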


Layer 3: Inside the Broker

This is where things differ meaningfully depending on which broker you are using. Let's look at three common ones.

Redis

Redis works entirely in memory. When the message arrives over TCP, Redis's event loop reads the bytes from the socket, deserializes the command, and modifies an in-memory data structure. For a BullMQ job, this is an entry in a Redis sorted set or list, depending on job type.

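No disk is touched on the write path. This is why Redis is fast. It is also the catch: whatever has not been persisted yet lives only in memory. A crash loses everything written since the last snapshot, and with persistence switched off (common in cache-style and container setups), a restart loses everything. Your enqueued jobs, vanished, with no error ever reaching your producers.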

Redis does support persistence (RDB snapshots, AOF logging), but these are asynchronous by default. There is always a small window between when you enqueue and when that data is safely written to disk. That window is your risk to size and accept consciously.
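
Sizing that window is mostly multiplication. The sketch below is a back-of-the-envelope estimate, not a measurement: `jobsAtRisk` is a made-up helper, and the enqueue rate and flush intervals are assumed numbers you would replace with your own.

```typescript
// Rough upper bound on jobs that exist only in memory at the moment of a crash.
// enqueuePerSecond: how fast producers add jobs (assumed, measure your own)
// flushIntervalMs: longest gap between durable writes to disk
function jobsAtRisk(enqueuePerSecond: number, flushIntervalMs: number): number {
  return Math.ceil(enqueuePerSecond * (flushIntervalMs / 1000));
}

// AOF with a once-per-second fsync: about one second of writes is exposed.
console.log(jobsAtRisk(200, 1000));   // 200

// A snapshot every 5 minutes exposes far more.
console.log(jobsAtRisk(200, 300000)); // 60000
```

Whether 200 lost jobs is acceptable depends entirely on what the jobs do, which is the conscious part of the decision.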

RabbitMQ

RabbitMQ has a more layered approach to storage. A message arrives via AMQP, gets deserialized, passes through the exchange's routing logic (which queue does this message belong to, based on binding keys and patterns), and lands in the target queue.

Each message carries more weight than it looks. A payload of 1KB actually occupies roughly 2KB in RabbitMQ's memory once internal metadata is factored in. The internal database keeps an in-memory copy of all data, even on disk nodes.

For persistent messages, RabbitMQ writes to disk before sending the ACK back to your producer. If memory pressure climbs above a configurable threshold, RabbitMQ starts paging messages to disk. If it keeps climbing, RabbitMQ does something that surprises most people who have not read the docs: it blocks publishers. Stops reading from their TCP sockets entirely. Your producer's write() call hangs. Your enqueue appears to freeze. The queue is not broken. It is telling you it is full and it cannot accept more until workers catch up.
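
The blocking behavior is easier to reason about with a toy model. This is not how RabbitMQ is implemented: `ToyBroker`, its watermark, and its method names are invented for illustration, and a real blocked publisher sees a stalled socket write instead of a return value.

```typescript
// Toy model of broker-side flow control, for illustration only.
class ToyBroker {
  private readonly highWatermark: number;
  private readonly messages: string[] = [];

  constructor(highWatermark: number) {
    this.highWatermark = highWatermark;
  }

  // Producers call this. At the watermark the broker refuses to take more,
  // which a real producer would experience as a write that does not complete.
  publish(message: string): 'accepted' | 'blocked' {
    if (this.messages.length >= this.highWatermark) {
      return 'blocked';
    }
    this.messages.push(message);
    return 'accepted';
  }

  // Workers call this. Draining the backlog is the only thing that unblocks producers.
  consume(): string | undefined {
    return this.messages.shift();
  }

  get depth(): number {
    return this.messages.length;
  }
}

const toyBroker = new ToyBroker(2);
console.log(toyBroker.publish('a')); // accepted
console.log(toyBroker.publish('b')); // accepted
console.log(toyBroker.publish('c')); // blocked: the backlog is at the watermark
toyBroker.consume();                 // a worker catches up
console.log(toyBroker.publish('c')); // accepted
```

Nothing on the producer side changed between the blocked call and the accepted one. Only the consumer moved.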

Kafka

Kafka treats the problem differently at a fundamental level. A message is not stored in a data structure in memory. It is appended sequentially to a log file on disk.

Sequential writes are about as fast as storage gets: on spinning disks they avoid seek time, and on SSDs and NVMe drives they still beat small random writes. This is where much of Kafka's throughput comes from. Consumer progress is tracked as a simple integer, an offset into that log file. To read the next message, a consumer increments its offset and fetches. To replay messages, it resets the offset.

This model means multiple independent consumers can read the same data, message replay is trivially possible, and retention is time-based rather than acknowledgment-based. It also means Kafka is disk-heavy and its throughput degrades with larger messages. Fast, well-provisioned storage is not optional for serious Kafka deployments.
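
The log-plus-offset idea fits in a few lines. In this sketch an in-memory array stands in for Kafka's on-disk log, and `ToyLog`, `poll`, and `seek` are illustrative names, not Kafka's client API.

```typescript
// Toy append-only log with per-consumer offsets, for illustration only.
class ToyLog {
  private readonly entries: string[] = [];
  private readonly offsets = new Map<string, number>();

  // Appending never rewrites earlier entries; it returns the new entry's offset.
  append(entry: string): number {
    this.entries.push(entry);
    return this.entries.length - 1;
  }

  // Return the next unread entry for this consumer and advance its offset.
  poll(consumer: string): string | undefined {
    const offset = this.offsets.get(consumer) ?? 0;
    if (offset >= this.entries.length) return undefined;
    this.offsets.set(consumer, offset + 1);
    return this.entries[offset];
  }

  // Replay is just moving the integer back.
  seek(consumer: string, offset: number): void {
    this.offsets.set(consumer, offset);
  }
}

const toyLog = new ToyLog();
toyLog.append('order-created');
toyLog.append('order-paid');

console.log(toyLog.poll('billing'));   // order-created
console.log(toyLog.poll('billing'));   // order-paid
console.log(toyLog.poll('analytics')); // order-created (its own, independent offset)
toyLog.seek('billing', 0);
console.log(toyLog.poll('billing'));   // order-created again (replay)
```

Reading never removes anything, which is why a second consumer and a replay both work without any extra machinery.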


Layer 4: The Event Loop, and What Is Actually Powering It

This is the part that ties everything together. Your broker, your async worker, your web framework: all of them are built on the same foundational mechanism.

To understand it, we need to understand the problem it exists to solve.

A server that handles connections by blocking on a socket read can only handle one connection per thread. To handle 10,000 concurrent connections, you would need 10,000 threads. A thread typically reserves 1 to 8 MB of stack space. That is a lot of memory set aside for threads that are almost entirely idle, just waiting for data.

The solution is I/O multiplexing: a single thread monitors thousands of file descriptors and reacts only when one actually has data.

How It Evolved

select() was Unix's first answer. You pass the kernel a bitmask of file descriptors to watch. The kernel scans every one of them on every call to see which are ready:

// kernel scans ALL of these on every single call
fd_set read_fds;
FD_ZERO(&read_fds);
FD_SET(sock1, &read_fds);
FD_SET(sock2, &read_fds);
// ... up to 1024 max

select(max_fd + 1, &read_fds, NULL, NULL, &timeout);
// returns, then you scan the whole bitmask yourself to find who is ready

This is O(n) per call, where n is your total number of watched file descriptors. With thousands of connections, this burns meaningful CPU just on scanning. Plus there is a hard ceiling of 1,024 descriptors. Fine for an early-nineties Unix workstation. Not fine for anything built in the last twenty years.

poll() removed the 1,024 limit but kept the linear scan. A step forward, not a solution.

epoll is the modern Linux answer. The philosophy shifts: instead of asking the kernel "which of these are ready right now?", you register your file descriptors once and say "notify me when something changes."

// create an epoll instance
// the kernel allocates a red-black tree and a ready list internally
int epfd = epoll_create1(0);

// register a file descriptor once
// kernel inserts it into the red-black tree — O(log n)
struct epoll_event ev;
ev.events = EPOLLIN;       // wake me when data is available to read
ev.data.fd = socket_fd;
epoll_ctl(epfd, EPOLL_CTL_ADD, socket_fd, &ev);

// The event loop — this is what Node.js, Redis, Nginx all do at their core
struct epoll_event events[MAX_EVENTS];
while (1) {
    // process sleeps here, consuming zero CPU
    // Kernel wakes it when at least one FD has data
    int n = epoll_wait(epfd, events, MAX_EVENTS, -1);

    // n is only the count of *ready* file descriptors, not total registered ones
    for (int i = 0; i < n; i++) {
        handle_event(events[i].data.fd);
    }
}

When epoll_create1() is called, the kernel allocates two internal structures: a red-black tree for all registered file descriptors (insertion and lookup in O(log n)), and a linked list for descriptors that currently have data ready.

When epoll_wait() is called, the process is suspended and consumes no CPU.

When a network packet arrives for one of your registered sockets, the NIC fires an interrupt. The kernel processes the incoming data and a callback links that socket's entry onto the ready list (it stays registered in the red-black tree). epoll_wait() wakes up and returns only the descriptors that are ready.

With select, the kernel scans your entire list on every call regardless of how many are actually ready. With epoll, the kernel does the bookkeeping internally and only tells you about the ones you care about. For a server with 10,000 connected clients where 3 of them sent data right now, select scans 10,000, epoll returns 3.

This is why:

  • A single-threaded Node.js process can handle tens of thousands of concurrent connections.
  • Redis processes hundreds of thousands of operations per second on one thread, and more with pipelining.
  • Nginx serves enormous traffic without spawning thousands of OS threads.

Your event-driven architecture is, at its foundation, a process sleeping in the kernel, waiting to be woken by a network interrupt.
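
The "select scans 10,000, epoll returns 3" comparison from earlier can be modeled in a few lines. This is a toy that only counts how many descriptors each approach has to look at. The real mechanisms live in the kernel, nothing here performs I/O, and both function names are made up.

```typescript
interface ScanResult {
  found: number[];
  inspected: number;
}

// select-style: every watched descriptor is checked on every call.
function selectStyleScan(watched: number[], ready: Set<number>): ScanResult {
  let inspected = 0;
  const found: number[] = [];
  for (const fd of watched) {
    inspected++;
    if (ready.has(fd)) found.push(fd);
  }
  return { found, inspected };
}

// epoll-style: the ready list was filled in as events arrived,
// so the caller only ever walks the descriptors that have data.
function epollStyleWait(readyList: number[]): ScanResult {
  return { found: [...readyList], inspected: readyList.length };
}

const watchedFds = Array.from({ length: 10000 }, (_, i) => i);
const readyFds = [17, 4096, 9001];

console.log(selectStyleScan(watchedFds, new Set(readyFds)).inspected); // 10000
console.log(epollStyleWait(readyFds).inspected);                       // 3
```

Both calls find the same three descriptors. The difference is how much work it took to find them, and that work repeats on every iteration of the loop.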


Layer 5: The Worker, Acknowledgments, and What "At Least Once" Actually Means

Below is a TypeScript worker consuming from the same queue:

import { Worker } from 'bullmq';

const worker = new Worker('emails', async (job) => {
  const { userId, to, template } = job.data;

  await sendEmail({ to, template, userId });
  // If we reach here without throwing, BullMQ marks the job completed (the ACK)
  // If we throw, BullMQ marks it failed (the NACK) and retries only if attempts are configured

}, {
  connection: { host: 'localhost', port: 6379 },
  concurrency: 5,   // how many jobs this worker handles in parallel
});

Simple from the outside. Under the hood, the worker process opened a TCP connection to Redis, subscribed to the queue, and entered its own event loop. From the OS's perspective, it is a sleeping process blocked on a socket read, parked in the kernel's wait queue, consuming no CPU.

When Redis has a job ready, it sends data over TCP. The kernel receives the bytes, the socket becomes readable, the wait queue callback fires, the process wakes, BullMQ deserializes the job data, and your function is called.

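When your function returns without throwing, BullMQ reports success to Redis (the ACK, in queue terms) and Redis marks the job complete. If your function throws, the job is marked failed and retried if it has attempts left. If the worker dies before reporting either outcome, the job stays in an active state until its lock expires. After that configurable timeout, it gets requeued.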
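
The "requeue after timeout" step can be sketched as a sweep over active jobs. This is a toy model under stated assumptions: the `lockExpiresAt` field and the `requeueExpired` function are invented for illustration, and BullMQ's real bookkeeping lives in Redis, not in a plain object like this.

```typescript
interface ActiveJob {
  id: string;
  lockExpiresAt: number; // epoch milliseconds
}

interface ToyQueueState {
  waiting: string[];
  active: ActiveJob[];
}

// Move every active job whose lock has expired back to the waiting list.
function requeueExpired(state: ToyQueueState, now: number): ToyQueueState {
  const stillActive = state.active.filter((job) => job.lockExpiresAt > now);
  const expired = state.active.filter((job) => job.lockExpiresAt <= now);
  return {
    waiting: [...state.waiting, ...expired.map((job) => job.id)],
    active: stillActive,
  };
}

const stateBefore: ToyQueueState = {
  waiting: [],
  active: [
    { id: 'job-1', lockExpiresAt: 1000 }, // its worker died and stopped renewing
    { id: 'job-2', lockExpiresAt: 5000 }, // its worker is still healthy
  ],
};

const stateAfter = requeueExpired(stateBefore, 2000);
console.log(stateAfter.waiting);                     // [ 'job-1' ]
console.log(stateAfter.active.map((job) => job.id)); // [ 'job-2' ]
```

Notice that the sweep cannot tell whether job-1's handler finished its side effects before the worker died. It only knows the lock ran out, which is exactly where duplicate execution comes from.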

This is at-least-once delivery. Your function will be called at minimum once. It may be called more than once. Here is why that matters in practice:

import type { Job } from 'bullmq';

// chargeCustomer and sendReceipt are placeholders for your own payment and email code.

// Unsafe: if the worker crashes after the charge but before the job is marked done,
// the job runs again and the customer gets charged twice.
async function processPaymentUnsafe(job: Job) {
  await chargeCustomer(job.data.customerId, job.data.amount);
  await sendReceipt(job.data.email);
}

// Safer: the idempotency key ensures a duplicate call
// is treated as the same operation, not a new one.
// Idempotency is not a nice-to-have in a queue-based system.
// It is a correctness requirement for any handler with side effects in the outside world.
async function processPaymentIdempotent(job: Job) {
  await chargeCustomer(job.data.customerId, job.data.amount, {
    idempotencyKey: job.id,
  });
  await sendReceipt(job.data.email);
}
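
What the receiving side of an idempotency key might do is sketched below. Everything here is an assumption for illustration: `chargeOnce` is a made-up function, and the in-memory `Set` stands in for a durable store such as a database table with a unique key.

```typescript
// Keys of operations that have already been applied.
const processedKeys = new Set<string>();
let chargesMade = 0;

// Hypothetical side effect; a real version would call a payment provider.
function chargeOnce(idempotencyKey: string): 'charged' | 'skipped' {
  if (processedKeys.has(idempotencyKey)) {
    return 'skipped'; // duplicate delivery: same operation, no second charge
  }
  chargesMade++;
  processedKeys.add(idempotencyKey);
  return 'charged';
}

console.log(chargeOnce('job-42')); // charged
console.log(chargeOnce('job-42')); // skipped (redelivery of the same job)
console.log(chargesMade);          // 1
```

In a real system the check and the record have to happen atomically in the same store as the side effect, or be enforced by the external API itself. Otherwise a crash between the two steps reopens the same gap.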

Layer 6: The Things That Quietly Go Wrong

Now that we can see the full path, these failure modes are a lot less surprising.

The queue fills up and nobody is watching. Workers fall behind. Messages pile up. RabbitMQ reaches its memory threshold and stops accepting new messages. Your producers' TCP writes start hanging. Your API response times climb. Engineers look for a slow database query. Add queue depth and consumer lag to your dashboards and save everyone a confusing hour.
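
A dashboard check for this can start very small. The sketch below is one possible shape, not a recommendation of specific thresholds: `assessBacklog` and its parameters are made up, and the rates are numbers you would measure yourself. With BullMQ, `queue.getJobCounts()` is one possible source for the counts, though the exact keys it returns should be checked against its docs.

```typescript
// Hypothetical shape: counts per job state, however you collect them.
interface QueueCounts {
  waiting: number;
  active: number;
}

interface BacklogReport {
  depth: number;
  secondsToDrain: number; // Infinity when workers cannot keep up
  alert: boolean;
}

function assessBacklog(
  counts: QueueCounts,
  enqueuePerSecond: number,
  processedPerSecond: number,
  maxSecondsToDrain: number,
): BacklogReport {
  const depth = counts.waiting + counts.active;
  const netDrainRate = processedPerSecond - enqueuePerSecond;
  let secondsToDrain: number;
  if (depth === 0) {
    secondsToDrain = 0;
  } else if (netDrainRate > 0) {
    secondsToDrain = depth / netDrainRate;
  } else {
    secondsToDrain = Infinity; // the backlog is growing or flat
  }
  return { depth, secondsToDrain, alert: secondsToDrain > maxSecondsToDrain };
}

console.log(assessBacklog({ waiting: 900, active: 100 }, 50, 150, 60));
// { depth: 1000, secondsToDrain: 10, alert: false }
console.log(assessBacklog({ waiting: 900, active: 100 }, 150, 50, 60));
// { depth: 1000, secondsToDrain: Infinity, alert: true }
```

The two calls have the same depth. Only the second one is a problem, which is why depth alone is not enough and the rates matter.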

Data loss because persistence was not configured. Redis holds everything in memory, and whether any of it reaches disk depends on configuration. With persistence off, or with the data directory on a container's ephemeral filesystem, a clean deployment restart flushes the queue. No error is thrown because nothing went wrong from the broker's perspective. The data simply stops existing. Enable AOF persistence and understand the durability tradeoff you are making before your first production incident.

The dead letter queue fills up unnoticed. A dead letter queue is where failed jobs end up after repeated retries. It’s easy to ignore because nothing consumes from it by default. But that’s exactly the problem. If you don’t build a consumer for it, and don’t monitor its depth, you can quietly accumulate months of failed work without realizing it. By the time you notice, you’re digging through months of dead jobs trying to work out which ones still matter.
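
How a job ends up there can be sketched as a retry loop with a last resort. This is a toy under stated assumptions: `runWithRetries`, `maxAttempts`, and the array used as a dead letter list are all illustrative, and real brokers expose this through their own configuration.

```typescript
interface FailedJob {
  id: string;
  attempts: number;
  lastError: string;
}

// Run a handler up to maxAttempts times; park the job if every attempt fails.
function runWithRetries(
  id: string,
  handler: () => void,
  maxAttempts: number,
  deadLetters: FailedJob[],
): 'completed' | 'dead-lettered' {
  let lastError = '';
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      handler();
      return 'completed';
    } catch (error) {
      lastError = error instanceof Error ? error.message : String(error);
    }
  }
  deadLetters.push({ id, attempts: maxAttempts, lastError });
  return 'dead-lettered';
}

const deadLetterList: FailedJob[] = [];
const alwaysFails = () => {
  throw new Error('SMTP timeout');
};

console.log(runWithRetries('job-7', alwaysFails, 3, deadLetterList)); // dead-lettered
console.log(deadLetterList.length); // 1, and it stays there until something reads it
```

The push at the end succeeds silently. Nothing in this loop alerts anyone, so the alerting has to come from whatever watches the list's length.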

Duplicate processing because the handler is not idempotent. A worker completes its work, then crashes before ACKing. The broker redelivers. The external side effect runs twice. This has caused real double-charges, duplicate emails, and inconsistent records at real companies.

A blocking call inside an async worker. If your handler makes a synchronous blocking call, it stalls the event loop for the duration. No other messages are processed. Throughput drops to the speed of the slowest synchronous operation. Use async drivers for databases, HTTP clients, and file I/O inside workers.


What "Fast" Actually Means

When we say a queue makes an endpoint faster, we mean something specific: the HTTP response no longer waits for the work to finish. The work itself takes the same amount of time. Often more, when you factor in serialization, network round trips to the broker, storage, retrieval, deserialization, and worker execution.

What changed is when the user gets their response. Not how long the work takes.
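
The arithmetic is worth writing down once. All numbers below are made-up milliseconds chosen for illustration, and the `Timings` fields are a simplification of the layers covered above.

```typescript
interface Timings {
  enqueueMs: number;   // serialize + round trip to the broker
  queueWaitMs: number; // time the job sits before a worker picks it up
  workMs: number;      // the actual work (sending the email, etc.)
}

// Inline: the user waits for the work, and the work is done when they get a response.
function inlineLatency(t: Timings) {
  return { responseMs: t.workMs, workDoneMs: t.workMs };
}

// Queued: the user waits only for the enqueue; the work finishes later.
function queuedLatency(t: Timings) {
  return { responseMs: t.enqueueMs, workDoneMs: t.enqueueMs + t.queueWaitMs + t.workMs };
}

const exampleTimings: Timings = { enqueueMs: 3, queueWaitMs: 40, workMs: 800 };
console.log(inlineLatency(exampleTimings)); // { responseMs: 800, workDoneMs: 800 }
console.log(queuedLatency(exampleTimings)); // { responseMs: 3, workDoneMs: 843 }
```

The response got about 800 ms faster. The email arrives 43 ms later than it would have inline.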

This is genuinely valuable. Decoupling user-facing latency from background work is a real improvement. But it is a different thing than making work faster, and conflating the two leads to queues in places that add complexity without benefit, and missing queues in places where the decoupling would genuinely help.

A queue earns its place when the work is not needed to form the response, the work is slow enough that waiting feels bad, the work is safe to retry, and the producer and consumer need to scale independently.

A queue adds complexity without clear payoff when you need the result to respond, your latency budget is too tight for broker round trips, or the operation is fast enough that doing it inline is simpler and easier to observe.


The Full Journey, Summarized

Your TypeScript code
  ↓  JSON.stringify → bytes allocated in your process memory
  ↓  write() syscall → bytes copied to kernel TCP send buffer
  ↓  kernel TCP stack → NIC DMA transfer → wire

Broker process (Redis / RabbitMQ / Kafka)
  ↓  epoll_wait() wakes on incoming socket data
  ↓  deserialize bytes → route → store (RAM / disk / log file)
  ↓  ACK sent back to producer over TCP

Worker process
  ↓  sleeping in kernel wait queue, zero CPU consumption
  ↓  broker pushes job data over TCP
  ↓  kernel wakes worker via socket callback
  ↓  deserialize → your async handler function runs
  ↓  ACK on success → broker removes job
  ↓  throw on failure → broker marks the job failed, retries if attempts remain
  ↓  worker crash → broker requeues after timeout

Every arrow is real work with real cost and real failure modes. None of it is magic. All of it is understandable, and understanding it makes you a meaningfully better engineer of the systems that depend on it.


Closing

Queues and background jobs are genuinely good tools. The resilience they add, the ability to absorb traffic spikes, retry failures gracefully, and let producers and consumers scale independently: all of that is real and worth the added complexity.

The point is not to be wary of them. It is to understand what you are working with so you can configure them thoughtfully, monitor the right things, and know where to look when something behaves unexpectedly.

The queue depth is a metric worth watching. The dead letter queue is an inbox worth reading. The ACK timeout is a contract with the broker worth understanding.

Keep an eye on those three and your queues will mostly be a quiet, reliable part of your system doing exactly what they promised.
