NodeJS is well known for being single-threaded, but that is only partially true: only the event loop runs on a single thread. NodeJS gives us two approaches for running code in parallel: worker_threads and child_process.
- worker_threads are managed by the parent process and can share memory, which makes communication between them easier.
- child_process spawns new processes from the main one; it's useful when we need direct communication with the operating system, but each process costs more memory to create.
Application: The purpose of the application we are going to build is to load a folder's content and upload every file to Google Cloud Storage. The most interesting part is that we will decide how many threads perform this operation, speeding up the upload process.
Main Principles:
- Worker threads: NodeJS module to create threads.
- Streams: if you don't know NodeJS streams, I suggest taking a look at the tutorial I've created here.
- File System: NodeJS provides us with a simple way to access the OS and manipulate files and folders.
Steps to reproduce:
- Load the folder's content
- Create the threads
- Create the upload worker
- Assign the threads to upload worker
Requirements:
We are going to use NodeJS version 16.16. You also need a Google account to access Google Cloud services.
Cloud Storage Service
We need to install the Cloud Storage client library to talk to the Google Cloud service.
npm install @google-cloud/storage
If you need help configuring the Cloud Storage service on your Google account, there are many good tutorials for that; it's not the purpose of this one.
Let's create our first file, cloudStorageFileService.js, to work with our storage.
cloudStorageFileService.js
const { Storage } = require('@google-cloud/storage')
const path = require('path')

const serviceKey = path.join(__dirname, '../gkeys.json')

class CloudStorageFileService {
  // (1) Basic Cloud Storage configuration: project id and credentials file
  constructor() {
    this.storage = new Storage({
      projectId: 'my-project-id',
      keyFilename: serviceKey
    })
  }

  // (2) Returns a Writable Stream pointing at the destination object;
  // createWriteStream() is synchronous, so no async/await is needed here
  uploadFile(bucketName, destFileName) {
    return this.storage
      .bucket(bucketName)
      .file(destFileName)
      .createWriteStream()
  }
}

module.exports = CloudStorageFileService
From the code sections above:
1. Basic configuration to use the Cloud Storage service: the project id and the path to your Google Cloud credentials.
2. Google Cloud Storage provides us a Writable Stream for uploading files.
Thread Controller
The thread controller handles the thread distribution: we want to give a thread to each file and upload them separately.
threadController.js
const { Worker } = require('node:worker_threads')
const { readdir } = require('fs/promises')
const path = require('path')

class ThreadController {
  // (1) The number of threads we want, the files to upload,
  // and a counter for the files already dispatched
  constructor(threadsNumber) {
    this.files = []
    this.threadsNumber = threadsNumber
    this.count = 0
  }

  // (2) Load the names of all files inside the content folder
  async loadFiles() {
    this.files = await readdir(path.join(__dirname, '/content'))
  }

  // (3) Spawn a worker for one file and resolve when the thread exits
  async uploadThread(filePath) {
    return new Promise((resolve, reject) => {
      const worker = new Worker('./fileUploadWorker.js', {
        workerData: {
          file: filePath
        }
      })
      worker.once('error', reject)
      worker.on('exit', () => {
        resolve(filePath)
      })
    })
  }

  // (4) Process the files in batches of `threadsNumber` workers,
  // measuring the total time with the performance API (global since Node 16)
  async execute() {
    const init = performance.now()
    await this.loadFiles()
    let promises = []
    while (this.count < this.files.length) {
      for (let i = this.count; i < this.count + this.threadsNumber; i++) {
        if (this.files[i]) {
          promises.push(this.uploadThread(this.files[i]))
        }
      }
      const result = await Promise.all(promises)
      promises = []
      this.count += this.threadsNumber
      console.log(result)
    }
    const end = performance.now()
    console.log(end - init)
  }
}

module.exports = ThreadController
From the code sections above:
1. Initializing our three main parameters: the number of threads we want, the files we want to upload, and the counter for created threads.
2. Loading all the files contained in the folder we want to process.
3. Here we send the right file path to the worker thread through a Worker object and wait until the thread finishes its process.
4. Running everything together. We give a thread to each file until there are no files left to process: for example, with 5 files and 3 threads, the first pass processes the first 3 files and the second pass processes the 2 remaining. I also added a performance meter to test the behavior with different numbers of threads.
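The batching logic inside execute() can be seen in isolation. This is a simplified sketch (my own helper, with plain async tasks standing in for the workers): process the list in windows of a given size, waiting for each window to finish before starting the next.

```javascript
// Run `task` over `items` in batches of `size`: each batch is started
// concurrently with Promise.all, and the next batch only starts once
// the previous one has fully settled — the same shape as execute().
async function runInBatches(items, size, task) {
  const results = []
  for (let count = 0; count < items.length; count += size) {
    const batch = items.slice(count, count + size).map(item => task(item))
    results.push(...await Promise.all(batch))
  }
  return results
}
```

With 5 items and a batch size of 3, the first iteration runs 3 tasks and the second runs the remaining 2, mirroring the 5-files/3-threads example above.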
File Upload Worker
The upload worker is the thread's code: here we put everything we want the thread to do.
fileUploadWorker.js
const { isMainThread, workerData } = require('node:worker_threads')
const path = require('path')
const { pipeline } = require('stream/promises')
const { createReadStream } = require('fs')
const CloudStorageFileService = require('./cloudStorageFileService')

class FileUploadWorker {
  // (1) Initialize the storage service and read the file name
  // sent by the parent thread through workerData
  constructor() {
    this.storage = new CloudStorageFileService()
    this.filePath = path.join(__dirname, '/content/', workerData.file)
    this.fileName = workerData.file
  }

  // (2) Pipe a Readable Stream of the local file into the
  // Writable Stream that points at the bucket
  async upload() {
    if (!isMainThread) {
      await pipeline(
        createReadStream(this.filePath),
        await this.storage.uploadFile('myfileuploads', this.fileName)
      )
    }
  }
}

// (3) This anonymous async function runs when the thread starts
;(async () => {
  const fileUploader = new FileUploadWorker()
  await fileUploader.upload()
})()
From the code sections above:
1. In the constructor we initialize the Storage service (we could also receive it as a parameter) and get the file path from the parent thread through workerData.
2. Here we check whether we are in a thread dynamically created by us or in the NodeJS main thread. If we are not in the main thread, we create a Readable Stream from the file and upload it.
3. This anonymous function is responsible for executing our created thread.
Executing Everything
To test our application I will use 9 threads, one for each file in my folder. You can experiment with other values to measure the performance.
index.js
const ThreadController = require('./threadController')

const controller = new ThreadController(9)

;(async () => {
  await controller.execute()
})()
Takeaways
- NodeJS is not single-threaded.
- Threads are handy when you need to process a heavy job and don't want to block the NodeJS main thread.
- We can also use multithreading for batch jobs.
You can take a look at the entire code here
Top comments (13)
You don't need multiple threads to upload multiple files in parallel in Node - the single-threaded part is only the instruction pipeline. You could easily issue 100 uploads and have them run in parallel using a Promise.all() - the Async library also has lots of useful calls to batch things up if you don't want to start them all at once. Multiple threads and processes are very handy if you are actually doing processing in your Javascript code, where other operations would be blocked.
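The commenter's suggestion — concurrent uploads on a single thread, no workers involved — can be sketched like this (`uploadOne` is a hypothetical stand-in for whatever function performs one upload):

```javascript
// Concurrency without threads: each call to uploadOne starts its I/O
// immediately; the event loop interleaves the pending operations and
// Promise.all simply waits for all of them to finish.
async function uploadAll(files, uploadOne) {
  return Promise.all(files.map(file => uploadOne(file)))
}
```

Since the heavy lifting in an upload is network I/O, the event loop is free to juggle all of them at once — which is the commenter's point about when threads are and are not needed.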
Makes sense! thanks for your comment!
Hello! Don't hesitate to put colors on your codeblock, like this example, to have a better understanding of your code 😎

Thanks a lot! I didn't know that
You don't need multithreading to upload files concurrently in NodeJS.
NodeJS is not C++ where I/O operations are synchronous by default.
All I/O operations are always async in NodeJS. There are some exceptions though, like synchronous I/O APIs but they are very bad for performance so don't use them.
NodeJS uses libuv to issue non-blocking I/O syscalls and to get I/O event notifications.
Create Workers in NodeJS, only if you are doing CPU bound operations like computing hash of a large ArrayBuffer, or a very simple example would be finding a prime number. These operations will put heavy load on the CPU thread and prevent other tasks from executing.
It is always better to do I/O based operations in an async event loop as OS threads/processes are very expensive. When doing I/O, most of the time, our application spends waiting for a Disk or a Network Device.
If you know Rust see Tokio and Rayon.
See here to learn how Nginx delivers high performance with Non Blocking I/O.
Makes sense! thanks for your comment!
The approach I used here was meant to show how to create worker threads; perhaps it's not the best one!
Awesome insights!
Thanks a lot for sharing such concepts on a perfectly reasonable use-case, 10 out of 10 😁
Thanks a lot for your comment!
I am planning to write more content to show the things under the hood.
That would be amazing! I'm following you to read more about these topics as soon as you publish them 😄
Nice article! About that conceptual part in the introduction, wouldn't child_process be an approach for multiprocessing instead of multithreading?

Good point! Parallel processing would be the better term for that! Thanks for your comment!
i like the 'piscina' library for using multiple threads in node.js, however shared memory would still be awesome.
Nice! I know it. It was created by a brazilian guy I guess.