
Wade Zimmerman

Posted on • Originally published at wadecodez.Medium

My experiences with concurrency while writing an NPM package.

How to Use Every Core on your Machine using NodeJS

Each job takes seconds to complete, which is expensive in the long run. With the pool, 3000 jobs now finish in under a minute! This is the final result.


Background

You have probably used other languages that have developer-friendly ways to multitask complex jobs. Unfortunately, doing this in JavaScript has always been complicated.

For the longest time, JavaScript and NodeJS were limited by the event loop. Code executes asynchronously, but not in true parallel fashion. However, that changed with the release of worker threads in NodeJS.

After discovering this concept, I immediately wanted to test its full capability. Unfortunately, the existing libraries were overly complex and/or lacked true parallel capabilities.

Goal

I want a package that is perfect for small projects: something that provides a job queue without relying on databases or the filesystem, while delivering obvious performance benefits.

Problem

Many packages are half-baked implementations of concurrency. For example, some packages have code that looks like this.
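The embedded snippet did not survive extraction, so here is a reconstruction of the kind of pattern being criticized: a "pool" that splits the jobs into batches and awaits each batch with Promise.all (the function name is mine, for illustration only).

```javascript
// Naive batching "pool": run `threadCount` jobs at a time, waiting for
// the whole batch to finish before starting the next one.
async function naivePool(jobs, threadCount) {
  const results = [];
  for (let i = 0; i < jobs.length; i += threadCount) {
    const batch = jobs.slice(i, i + threadCount).map((job) => job());
    // Promise.all waits for the SLOWEST job in the batch, so a single
    // 3-hour job stalls every other worker until it finishes.
    results.push(...(await Promise.all(batch)));
  }
  return results;
}
```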

The above code is incorrect because it leaves out some common edge cases:

  • What happens if the pool must terminate abruptly?
  • What happens if the number of jobs is fewer than the thread count?
  • What if one job takes significantly longer than the others?

The last question is the nail in the coffin. If most jobs take 2 seconds to process, but one takes 3 hours, then the entire pool must wait for 3 hours until all the workers are freed up.

Some libraries work around this problem by spawning additional workers, but that means the developer lacks full control over the number of workers. The pool should be deterministic.

Initial Solutions

Since Promise.all waits for every promise to settle before resolving, I immediately thought that Promise.any or Promise.race must be the answer to true parallelism, but I was wrong. In fact, no Promise method alone is sufficient for multitasking.

At first it seemed settled: Promise.race was likely the solution, because Promise.any is flawed. Promise.any must successfully complete at least one promise, or wait for all of them to fail.

What happens if all jobs fail besides one that takes 3 hours? Again, the entire pool must wait 3 hours before that job completes or Promise.any rejects with an AggregateError.
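The failure case is easy to see in miniature: Promise.any settles as soon as one promise fulfills, but when every promise rejects it must wait for all of them before producing the AggregateError.

```javascript
// Promise.any settles as soon as one promise FULFILLS; if every promise
// rejects, it waits for ALL of them before rejecting with AggregateError.
const failFast = Promise.reject(new Error('quick failure'));
const failSlow = new Promise((_, reject) =>
  setTimeout(() => reject(new Error('slow failure')), 50)
);

Promise.any([failFast, failSlow]).catch((err) => {
  // Both reasons arrive, but only after BOTH promises have settled.
  console.log(err instanceof AggregateError, err.errors.length);
});
```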

Unfortunately, Promise.race is not the correct solution either. Sure, it solves the problem of hanging workers, but there is another edge case: how do you retrieve the results from multiple workers if the quickest promise is the only one handled? After all, quick is not always right.
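Concretely, Promise.race hands back only the first settled promise; the results of the slower workers are never observed through the race itself.

```javascript
// Promise.race yields only the FIRST settled promise; the other
// results are dropped as far as the race is concerned.
const fast = new Promise((resolve) => setTimeout(() => resolve('fast'), 10));
const slow = new Promise((resolve) => setTimeout(() => resolve('slow'), 50));

Promise.race([fast, slow]).then((winner) => {
  // 'fast' wins; 'slow' still resolves later, but not through this race.
  console.log(winner);
});
```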

Jobs Hold the Thread

The solution to the Promise.race problem is the workers themselves. It does not matter when a promise resolves, because each worker keeps running in the background.

My solution: every worker takes a thread id from the pool, and when it finishes executing, it gives the id back. This lets the pool dynamically allocate threads.
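The handoff can be sketched like this (a simplified illustration with plain promises, not jpool's actual source; `IdPool` and `withThreadId` are hypothetical names):

```javascript
// A pool of thread ids: jobs borrow an id to run and return it when done.
class IdPool {
  constructor(size) {
    this.freeIds = Array.from({ length: size }, (_, i) => i);
    this.waiters = []; // jobs waiting for an id to free up
  }
  // Resolve immediately if an id is free; otherwise queue the caller.
  acquire() {
    if (this.freeIds.length > 0) return Promise.resolve(this.freeIds.pop());
    return new Promise((resolve) => this.waiters.push(resolve));
  }
  // Hand the id straight to the next waiter, or put it back on the shelf.
  release(id) {
    const waiter = this.waiters.shift();
    if (waiter) waiter(id);
    else this.freeIds.push(id);
  }
}

// Run one job under a borrowed id; the id goes back the moment THIS job
// ends, so a slow job never blocks the ids of the fast ones.
async function withThreadId(pool, job) {
  const id = await pool.acquire();
  try {
    return await job(id);
  } finally {
    pool.release(id);
  }
}
```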

Halting

The last goal is halting all pool execution. Even if there is a 3-hour-long job running, it must halt immediately. Honestly, this was more difficult to figure out than the promise problems above.

My first instinct was to reject the promise, but this is problematic. Passing reasons through the reject call means Promise.race can only surface one rejection reason, yet collecting every reason with another promise put me right back at the drawing board.

Even worse, rejecting the promise lets the main event loop terminate, but the workers turn into zombies! Three hours later, worker output is still clogging your terminal!

Thankfully, I made a discovery: the pool must explicitly terminate its workers. This makes the termination process completely deterministic, so no data is left in a compromised state. The promise resolves only after the race between the jobs settles.

Project Success!

All the tests pass and I met my goals! The pool of workers executes jobs asynchronously without any external tools. It’s on NPM. If you are interested in how to use the library, keep reading!

npm install jpool

Features

The number of threads is variable, and every state is deterministic. A job will either pass, fail, or halt. This allows the pool to shut down gracefully or quit abruptly without zombies or runaway processes.

Basic Example (Main.js)

Cont. Example (Job.js)

See the Difference!

Each terminal window is processing the same set of jobs. From left to right, the programs use 1, 8, and 256 workers. Threads increase memory usage, but the benefits are worth it!


The end

The documentation needs work, otherwise, the package seems stable for v1.0.0. If you want to help, I am accepting PRs. Thank you for reading!

https://github.com/wadez/jpool

Oldest comments (1)

Braydon Harris

This is a great little package, nice job! Can't wait to find a project to try it out on!