In this post I'm going to show you how to potentially triple your Node application's performance by managing multiple threads. This is an important tutorial, where the methods and examples shown will give you what you need to set up production-ready thread management.
Watch The Video on YouTube
Child Processes, Clustering and Worker Threads
For the longest time, Node has had the ability to be multi-threaded, using either Child Processes, Clustering, or the more recent and preferred method: a module called Worker Threads.
Child processes were the initial means of creating multiple threads for your application and have been available since version 0.10. This was achieved by spawning a Node process for every additional thread you wanted created.
Clustering, which has been a stable release since around version 4, allows us to simplify the creation and management of Child Processes. It works brilliantly when combined with PM2.
Now before we get into multithreading our app, there are a few points that you need to fully understand:
1. Multithreading already exists for I/O tasks
There is a layer of Node that's already multithreaded: the libuv thread-pool. I/O tasks such as file and folder management, TCP/UDP transactions, compression and encryption are handed off to libuv, and if not asynchronous by nature, get handled in libuv's thread-pool.
2. Child Processes/Worker Threads only work for synchronous JavaScript logic
Implementing multithreading using Child Processes or Worker Threads will only be effective for synchronous JavaScript code that's performing heavy-duty operations, such as looping, calculations, etc. If, for example, you try to offload I/O tasks to Worker Threads, you will not see a performance improvement.
3. Creating one thread is easy. Managing multiple threads dynamically is hard
Creating one additional thread in your app is easy enough, as there are tons of tutorials on how to do so. However, creating as many threads as there are logical cores on your machine or VM, and managing the distribution of work to those threads, is far more advanced, and coding that logic is above most of our pay grades 😎.
Thank goodness we live in a world of open source and brilliant contributions from the Node community, meaning there is already a module that gives us the full capability of dynamically creating and managing threads based on the CPU availability of our machine or VM.
Worker Pool
The module we will work with today is called Worker Pool. Created by Jos de Jong, Worker Pool offers an easy way to create a pool of workers for both dynamically offloading computations as well as managing a pool of dedicated workers. It's basically a thread-pool manager for Node JS, supporting Worker Threads, Child Processes and Web Workers for browser-based implementations.
To make use of the Worker Pool module in our application, the following tasks will need to be performed:
- Install Worker Pool
First we need to install the Worker Pool module - npm install workerpool
- Init Worker Pool
Next, we'll need to initialize the Worker Pool on launch of our App
- Create Middleware Layer
We'll then need to create a middleware layer between our heavy duty JavaScript logic and the Worker Pool that will manage it
- Update Existing Logic
Finally, we need to update our App to hand off heavy duty tasks to the Worker Pool when required
Managing Multiple Threads Using Worker Pool
At this point, you have 2 options: Use your own NodeJS app (and install workerpool and bcryptjs modules), or download the source code from GitHub for this tutorial and my NodeJS Performance Optimization video series.
If going for the latter, the files for this tutorial will exist inside the folder 06-multithreading. Once downloaded, enter into the root project folder and run npm install. After that, enter into the 06-multithreading folder to follow along.
In the worker-pool folder, we have 2 files: one is the controller logic for the Worker Pool (controller.js). The other holds the functions that will be triggered by the threads…aka the middleware layer I mentioned earlier (thread-functions.js).
worker-pool/controller.js
'use strict'

const WorkerPool = require('workerpool')
const Path = require('path')

let poolProxy = null

// FUNCTIONS
const init = async (options) => {
  const pool = WorkerPool.pool(Path.join(__dirname, './thread-functions.js'), options)
  poolProxy = await pool.proxy()
  console.log(`Worker Threads Enabled - Min Workers: ${pool.minWorkers} - Max Workers: ${pool.maxWorkers} - Worker Type: ${pool.workerType}`)
}

const get = () => {
  return poolProxy
}

// EXPORTS
exports.init = init
exports.get = get
The controller.js is where we require the workerpool module. We also export 2 functions, called init and get. The init function is executed once, during the load of our application. It instantiates the Worker Pool with the options we provide and a reference to thread-functions.js. It also creates a proxy that is held in memory for as long as our application is running. The get function simply returns the in-memory proxy.
worker-pool/thread-functions.js
'use strict'

const WorkerPool = require('workerpool')
const Utilities = require('../2-utilities')

// MIDDLEWARE FUNCTIONS
const bcryptHash = (password) => {
  return Utilities.bcryptHash(password)
}

// CREATE WORKERS
WorkerPool.worker({
  bcryptHash
})
In the thread-functions.js file, we create worker functions that will be managed by the Worker Pool. For our example, we're going to be using BcryptJS to hash passwords. This usually takes around 10 milliseconds to run, depending on the speed of one's machine, and makes for a good use case when it comes to heavy-duty tasks. Inside the 2-utilities.js file is the function and logic that hashes the password. All we are doing in thread-functions.js is exposing this bcryptHash via workerpool's worker() method. This allows us to keep code centralized and avoid duplication or confusion about where certain operations exist.
2-utilities.js
'use strict'

const BCrypt = require('bcryptjs')

const bcryptHash = async (password) => {
  return await BCrypt.hash(password, 8)
}

exports.bcryptHash = bcryptHash
.env
NODE_ENV="production"
PORT=6000
WORKER_POOL_ENABLED="1"
The .env file holds the port number and sets the NODE_ENV variable to "production". It's also where we specify whether to enable or disable the Worker Pool, by setting WORKER_POOL_ENABLED to "1" or "0".
1-app.js
'use strict'

require('dotenv').config()

const Express = require('express')
const App = Express()
const HTTP = require('http')
const Utilities = require('./2-utilities')
const WorkerCon = require('./worker-pool/controller')

// Router Setup
App.get('/bcrypt', async (req, res) => {
  const password = 'This is a long password'
  let result = null
  let workerPool = null

  if (process.env.WORKER_POOL_ENABLED === '1') {
    workerPool = WorkerCon.get()
    result = await workerPool.bcryptHash(password)
  } else {
    result = await Utilities.bcryptHash(password)
  }

  res.send(result)
})

// Server Setup
const port = process.env.PORT
const server = HTTP.createServer(App)

;(async () => {
  // Init Worker Pool
  if (process.env.WORKER_POOL_ENABLED === '1') {
    const options = { minWorkers: 'max' }
    await WorkerCon.init(options)
  }

  // Start Server
  server.listen(port, () => {
    console.log('NodeJS Performance Optimizations listening on: ', port)
  })
})()
Finally, our 1-app.js holds the code that's executed on launch of our App. First we load the variables from the .env file. We then set up an Express server and create a route called /bcrypt. When this route is triggered, we check whether the Worker Pool is enabled. If yes, we get a handle on the Worker Pool proxy and execute the bcryptHash function that we declared in the thread-functions.js file. This in turn executes the bcryptHash function in Utilities and returns us the result. If the Worker Pool is disabled, we simply execute the bcryptHash function directly in Utilities.
At the bottom of our 1-app.js, you'll see we have a self-invoking function. We're doing this to support async/await, which we use when interacting with the Worker Pool. Here is where we initialize the Worker Pool if it's enabled. The only config we want to override is setting minWorkers to "max". This ensures that the Worker Pool spawns as many threads as there are logical cores on our machine, minus 1 logical core, which is reserved for our main thread. In my case, I have 6 physical cores with hyperthreading, meaning 12 logical cores. So with minWorkers set to "max", the Worker Pool will create and manage 11 threads. Finally, the last piece of code is where we start our server and listen on port 6000.
Testing the Worker Pool
Testing the Worker Pool is as simple as starting the application and, while it's running, performing a GET request to http://localhost:6000/bcrypt. If you have a load testing tool like AutoCannon, you can have some fun seeing the difference in performance when the Worker Pool is enabled/disabled. AutoCannon is very easy to use.
Conclusion
I hope this tutorial has provided insight into managing multiple threads in your Node application. The embedded video at the top of this article provides a live demo of testing the Node App.
Till next time, cheers :)
Top comments (12)
Hi
Great article and I already watched your YouTube video some days ago.
Here's my doubt.
According to your 2nd point, Worker Threads/Child Processes can only process synchronous logic.
But the bcrypt module is an asynchronous task.
These 2 statements look contradictory to me.
Hi there, thank you very much for watching and for the comment.
So, with my 2nd point...take note that I used "bcryptjs", and not the well-known "bcrypt" module, which would get offloaded to the libuv thread-pool because it runs at the OS level. "bcryptjs" is purposely designed to be written completely in JavaScript and to be synchronous, because it's actually a security feature to hash passwords synchronously and cause the delay.
I hope that clears up my 2nd point? Below is from their NPM Docs:
"While bcrypt.js is compatible to the C++ bcrypt binding, it is written in pure JavaScript and thus slower (about 30%), effectively reducing the number of iterations that can be processed in an equal time span."
yes, it all makes sense now.
Thanks for your efforts,
But I wonder what kind of such huge computational tasks exist, especially around DB operations in Node.js programs, so that I could use these techniques to improve performance with respect to DB operations.
Always a pleasure. Most of the time it comes down to advanced business logic that needs to perform a series of processes on data returned from DB or 3rd party queries, etc.
What you always want to do is keep the primary thread and event loop spinning as fast as they can, processing those incoming requests. The moment you have tasks containing synchronous logic that cause even the slightest of delays, hand them over to the Worker Pool 👍.
In my opinion, he wrote the bcryptHash function using an async/await statement, which means the return from that function isn't a promise anymore, so the process was changed to synchronous. CMIIW.
hmm,
but in my opinion, I think any function wrapped in an ASYNC statement would always return a promise, no matter whether it's really asynchronous or not.
I think you missed something — not to offend your opinion, but if you write await in the body of a function that's wrapped with an async statement, the return will be synchronous. You can check this link javascript.info/async-await#await, with the example showAvatar function.
First Picture
Second Picture
I'll give an example: in the first picture, the variable githubUser uses an await statement to retrieve the value from fetch, so it will return the actual value sent by the server.
In the second picture, the return is a Promise, which we know is asynchronous. The value can be fulfilled or rejected and needs a chained function to process the result.
I understand your point; it's definitely returning a resolved or rejected value.
Hi John! Thank you for your article, it helps me a lot currently. But in TypeScript it seems that the pool variable has no minWorkers/maxWorkers properties. I could not find them in the documentation either. Did you use an older version of this library?
Hi Bunyamin. I'm glad the article helped 👍. The current version is 6.1.4 and the one I created a video for was 6.1.0, so it's pretty recent.
Also, in their GitHub docs, you will see references to minWorkers and maxWorkers. Strange how it's not available in the TS Interfaces.
github.com/josdejong/workerpool
From Their README:
The following options are available:
minWorkers: number | 'max'. The minimum number of workers that must be initialized and kept available. Setting this to 'max' will create maxWorkers default workers (see below).
maxWorkers: number. The default number of maxWorkers is the number of CPU's minus one. When the number of CPU's could not be determined (for example in older browsers), maxWorkers is set to 3.
Yes, I know it is written in the docs. I was talking not about the options, but about the data after the pool starts. So, anyway, I found the answer. If someone has the same issue as me: the pool data was moved into a separate .stats() method, so you can view all the worker numbers there.
Anyway, thanks John)
Oh I see. Bleh sorry for misunderstanding your question. I'm glad you came right and this is a great find 👏.