How to Use Every Core on Your Machine Using Node.js
Each job takes seconds to complete, which is expensive in the long run. Now 3000 jobs finish in less than a minute! This is the final result.
After discovering this concept, I immediately wanted to test its full capability. Unfortunately, the existing libraries were overly complex and/or lacked true parallel capabilities.
I wanted a package that is perfect for small projects. Something that provides a job queue without relying on databases or the filesystem, while providing obvious performance benefits.
Many packages are half-baked implementations of concurrency. For example, some packages have code that looks like this.
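A minimal sketch of this naive pattern (illustrative only, not any specific package's code): split the jobs into fixed-size chunks and `Promise.all` each chunk.

```javascript
// The naive "pool": split jobs into fixed chunks and await each chunk.
// Every chunk blocks on its slowest job before the next chunk starts.
async function naivePool(jobs, threadCount) {
  const results = [];
  for (let i = 0; i < jobs.length; i += threadCount) {
    const chunk = jobs.slice(i, i + threadCount);
    // All "workers" sit idle here until the slowest job in the chunk finishes.
    results.push(...(await Promise.all(chunk.map((job) => job()))));
  }
  return results;
}
```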
The above code is incorrect because it fails to handle some common edge cases:
- What happens if the pool must terminate abruptly?
- What happens if the number of jobs is fewer than the thread count?
- What if one job takes significantly longer than the others?
The last question is the nail in the coffin. If most jobs take 2 seconds to process, but one takes 3 hours, then the entire pool must wait for 3 hours until all the workers are freed up.
Some libraries work around this problem by spawning additional workers, but that means the developer lacks full control over the number of workers. The pool should be deterministic.
Since `Promise.all` is blocking, I immediately thought that `Promise.race` must be the answer to true parallelism, but I was wrong. Actually, no Promise methods alone are sufficient for multitasking.
So it's settled: `Promise.race` is likely the solution, and `Promise.any` is flawed because `Promise.any` must successfully complete at least one promise, or wait for all of them to fail.
What happens if all jobs fail besides one that takes 3 hours? Again, the entire pool must wait 3 hours before the job completes or throws an error.
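The `Promise.any` problem can be seen in a few lines (the timings here are made up for illustration):

```javascript
// Promise.any fulfills with the first *successful* promise. If every job
// but one rejects, the pool still waits on that lone slow success.
const jobs = [
  Promise.reject(new Error("job 1 failed")),
  new Promise((resolve) => setTimeout(() => resolve("slow success"), 50)),
];

Promise.any(jobs).then((value) => {
  console.log(value); // logs only after the slow job finally resolves
});
```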
`Promise.race` is not the correct solution either. Sure, it solves the problem of hanging workers, but there is another edge case. How will you retrieve the results from multiple workers if the quickest promise is the only one handled? After all, quick is not always right.
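A small demonstration of that edge case (timings invented for illustration):

```javascript
// Promise.race settles with whichever promise finishes first; the
// results of every other worker are simply dropped by this call.
const fast = new Promise((resolve) => setTimeout(() => resolve("fast"), 10));
const slow = new Promise((resolve) => setTimeout(() => resolve("slow"), 50));

Promise.race([fast, slow]).then((winner) => {
  console.log(winner); // "fast" -- the "slow" result never surfaces here
});
```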
The solution to the `Promise.race` problem is the workers themselves. It does not matter when the promise resolves, because the worker keeps running in the background.
My solution: every worker takes a thread id from the pool, and when it finishes executing, it gives the id back. This allows the pool to dynamically allocate threads.
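The id-borrowing idea can be sketched like this (an illustrative sketch of the scheduling concept, not jpool's actual implementation):

```javascript
// Each job borrows a thread id; Promise.race tells the scheduler when
// any in-flight job has handed its id back, so the next job can start.
async function runPool(jobs, threadCount) {
  const results = [];
  const running = new Map(); // id -> in-flight promise
  const freeIds = Array.from({ length: threadCount }, (_, i) => i);

  for (const job of jobs) {
    if (freeIds.length === 0) {
      // Wait until any running job returns its id to the pool.
      await Promise.race(running.values());
    }
    const id = freeIds.shift();
    const task = job().then((result) => {
      results.push(result);
      running.delete(id);
      freeIds.push(id); // give the id back to the pool
    });
    running.set(id, task);
  }

  await Promise.all(running.values()); // drain the remaining jobs
  return results;
}
```

Note that results arrive in completion order, not submission order.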
The last goal was halting all pool execution. Even if there is a 3-hour-long job running, it must halt immediately. Honestly, this was more difficult to figure out than the other promise problems.
My first instinct was rejecting the promise, but this is problematic. I noticed that passing reasons through the `reject` call meant `Promise.race` could only resolve one reason. Yet, promising all reasons put me back to the drawing board.
Even worse, rejecting the promise allows the main event loop to terminate, but the workers turn into zombies! Three hours later, worker output is still clogging your terminal!
Thankfully, I made a discovery: threads must explicitly terminate the worker. This makes the termination process completely deterministic, so no data is compromised. The promise resolves after the job promise race settles.
All the tests pass, and I met my goals! The pool of workers executes jobs asynchronously without any external tools. It's on NPM. If you are interested in how to use the library, keep reading!
```
npm install jpool
```
The number of threads is variable, and all states are deterministic. A job will either pass, fail, or halt. This allows the pool to gracefully shut down or quit abruptly without zombies or runaway processes.
Each terminal window is processing the same set of jobs. From left to right, the programs use 1, 8, and 256 workers. Threads increase memory usage, but the benefits are worth it!
The documentation needs work; otherwise, the package seems stable for v1.0.0. If you want to help, I am accepting PRs. Thank you for reading!