
Building a Stateful, Session-Based Worker Tier on Heroku (Circa 2015)

In 2015, building real-time, compute-heavy web applications often meant navigating the limitations of ephemeral cloud environments. Heroku was the undisputed king of PaaS, but its HTTP router enforced a strict 30-second timeout on the initial response. If you needed to run heavy, stateful computations for an active user session, you couldn't do it on the web dyno.

The solution? A custom, cloud-native worker tier that spun up dedicated processes per user session, retained data in memory, and communicated asynchronously. Here is a look at how to architect this system using Node.js, Socket.IO, Redis, and the Heroku Platform API.


1. The Architecture: A Session-Based Worker Model

Unlike traditional background job queues (like Celery or Resque) where anonymous workers pick up stateless tasks, this architecture requires a 1:1 mapping between a user session and a worker process.

When a user connects via a WebSocket, the system provisions a dedicated worker. This worker loads the user's specific dataset into memory and waits for commands. Because the worker holds state, subsequent compute and filter operations happen with near-zero latency.

2. The Worker Provider: Local vs. Cloud Forking

To make this developer-friendly, the system needs an environment-aware "Provider" to handle worker provisioning.

| Environment | Strategy | Mechanism |
| --- | --- | --- |
| Local Development | Node.js Child Process | child_process.fork('worker.js') |
| Heroku Production | One-Off Dynos | Heroku Platform API (POST /apps/{app}/dynos) |

  • In production: When the web dyno recognizes a new session, it makes an authenticated HTTP request to the Heroku Platform API to boot a one-off dyno (e.g., command: "node worker.js --session=xyz").
  • In development: To avoid API rate limits and speed up testing, the web process simply forks a local child process.

An abstract WorkerFactory handles this logic, returning a uniform interface regardless of the underlying environment.

3. Communication: JSON-RPC, Socket.IO, and Redis Lists

Direct process-to-process communication across Heroku dynos isn't natively supported. Instead, Redis acts as the message broker.

  1. The Ingress: The client sends a command over Socket.IO to the web dyno.
  2. The Queue: The web dyno formats this as a JSON-RPC 2.0 payload and pushes it to a session-specific Redis list (e.g., session:xyz:queue) using LPUSH.
  3. The Polling: The worker dyno continuously polls this list.
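The web-dyno half of steps 1 and 2 can be sketched as follows. The buildRpcPayload and enqueueCommand names are illustrative; the client is assumed to be any node_redis-compatible object with an lpush method.

```javascript
// Web-dyno side: wrap a user command as a JSON-RPC 2.0 request and
// LPUSH it onto the session's queue. Names here are illustrative.
var nextId = 0;

function buildRpcPayload(method, params) {
  return JSON.stringify({
    jsonrpc: '2.0',
    id: ++nextId,
    method: method,
    params: params
  });
}

function enqueueCommand(client, sessionId, method, params, callback) {
  client.lpush('session:' + sessionId + ':queue',
               buildRpcPayload(method, params),
               callback);
}
```

On the real web dyno, client would be a connected node_redis client and enqueueCommand would be invoked from the Socket.IO message handler for each incoming command.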

Transactional Slice and Truncate

To safely dequeue messages without losing data if the worker crashes, the worker uses Redis transactions (MULTI / EXEC). Instead of popping one item at a time, the worker pulls the entire batch of pending commands and clears the queue atomically:

// Worker polling loop (node_redis client assumed connected as `redis`)
function poll() {
  redis.multi()
    .lrange('session:xyz:queue', 0, -1) // Slice: get all pending messages
    .ltrim('session:xyz:queue', 1, 0)   // Truncate: start > stop empties the list
    .exec((err, results) => {
      if (!err) {
        const messages = results[0];
        if (messages.length > 0) {
          // LPUSH prepends, so reverse to process commands oldest-first
          processMessages(messages.reverse());
        }
      }
      setTimeout(poll, 250); // poll again shortly
    });
}
poll();

4. In-Memory State and Compute

Once the worker receives a command via the Redis queue, it processes it against the data held in memory. Loading the dataset from a database (like Postgres or MongoDB) happens exactly once when the worker boots.
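A boot sequence under these constraints might look like the sketch below. The --session flag matches the one-off dyno command shown earlier; loadDataset is a stand-in for whatever Postgres or MongoDB query the application actually needs.

```javascript
// Worker boot step: parse the session flag, load data once, then serve
// commands from memory. loadDataset is a hypothetical stand-in.
function parseSessionArg(argv) {
  for (var i = 0; i < argv.length; i++) {
    var match = /^--session=(.+)$/.exec(argv[i]);
    if (match) return match[1];
  }
  return null;
}

var sessionId = parseSessionArg(process.argv);
var dataset = null; // held in memory for the lifetime of the dyno

function boot(loadDataset, onReady) {
  loadDataset(sessionId, function (err, rows) {
    if (err) throw err;
    dataset = rows; // loaded exactly once, at boot
    onReady();
  });
}
```

Every subsequent command then reads from dataset directly, which is what makes the per-session worker worth its dyno cost.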

Fast Filtering via Binary Masking

For complex, multi-faceted filtering (e.g., filtering a catalog of thousands of items by various overlapping criteria), the worker utilizes binary masking.

Instead of iterating through objects and comparing string values, each entry is assigned a bitmask representing its attributes. Filter commands from the user are translated into a target bitmask, and the worker processes the compute command by applying bitwise operators (such as AND, &) against the in-memory array.

This approach takes advantage of V8's raw processing speed, allowing the worker to filter tens of thousands of records in single-digit milliseconds and push the result IDs back to the web dyno via a Redis pub/sub channel.
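The technique can be sketched in a few lines. The attribute names and the tiny catalog here are made up for illustration; a real dataset would precompute maskFor once per record at boot.

```javascript
// Bitmask filtering sketch (attribute names and catalog are illustrative).
var ATTRS = { inStock: 1, onSale: 2, featured: 4, clearance: 8 };

// Precompute one integer mask per item when the dataset is loaded.
function maskFor(attrs) {
  var mask = 0;
  for (var i = 0; i < attrs.length; i++) mask |= ATTRS[attrs[i]];
  return mask;
}

// Filtering is then a single bitwise AND per record.
function filterIds(items, targetMask) {
  var ids = [];
  for (var i = 0; i < items.length; i++) {
    if ((items[i].mask & targetMask) === targetMask) ids.push(items[i].id);
  }
  return ids;
}

var catalog = [
  { id: 1, mask: maskFor(['inStock', 'onSale']) },
  { id: 2, mask: maskFor(['inStock']) },
  { id: 3, mask: maskFor(['onSale', 'featured']) }
];

// Items that are both in stock and on sale:
filterIds(catalog, ATTRS.inStock | ATTRS.onSale); // [1]
```

Because the hot loop touches only integers, V8 can keep everything in fast paths, which is where the single-digit-millisecond filtering comes from.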

5. Bookkeeping and Session Cleanup

Because Heroku one-off dynos cost money (billed by the second), orphaned dynos are a serious risk. Robust bookkeeping is required.

  • Managing Workers: The web dyno maintains a Redis Hash mapping Session_ID -> Worker_Dyno_ID (or local PID).
  • Heartbeats: The worker periodically writes a heartbeat to a Redis key with an expiration (TTL).
  • Expiration & Cleanup:
    • If the client disconnects, the web dyno explicitly sends a "terminate" JSON-RPC command.
    • If the web dyno crashes and fails to send the termination command, the worker monitors its own idle time. If no commands arrive and the WebSocket session heartbeat dies, the worker calls process.exit(0). On Heroku, exiting the process cleanly shuts down the one-off dyno, stopping the billing meter.
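The worker-side half of this bookkeeping can be sketched with a small idle monitor. IDLE_LIMIT_MS, the key name, and the intervals are all illustrative assumptions; setex is the standard node_redis command for a TTL'd key.

```javascript
// Idle monitor sketch (limits, key names, and intervals are illustrative).
var IDLE_LIMIT_MS = 5 * 60 * 1000; // exit after five idle minutes

function makeIdleMonitor(now) {
  var lastActivity = now();
  return {
    touch: function () { lastActivity = now(); },   // call on every command
    shouldExit: function () { return now() - lastActivity > IDLE_LIMIT_MS; }
  };
}

// On the real worker, one timer would combine the heartbeat and the check:
//
//   var monitor = makeIdleMonitor(Date.now);
//   setInterval(function () {
//     redis.setex('session:xyz:heartbeat', 30, '1'); // TTL'd heartbeat key
//     if (monitor.shouldExit()) process.exit(0);     // stop the billing meter
//   }, 10000);
```

Injecting the clock (now) keeps the exit condition testable without waiting five real minutes.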

Summary

By combining the flexibility of Heroku's Platform API, the speed of Redis transactional polling, and the power of in-memory bitwise compute, 2015-era Node.js applications could achieve massive performance gains, bypassing HTTP timeouts and delivering real-time, heavy-duty data processing to the browser.
