Node.js CPU intensive 🔥

Adam Crockett 🌀

I posted a little while ago about needing to convert Java classes into TypeScript declarations.
The goal is to give Rhino JS superpowers by using the TypeScript backend to expose and understand what's available in the JavaScript context.

The problem is, I have some 300 jar archives making up the application that I am trying to get TypeScript to understand.
You can unzip a jar, or use the jar command, to list its contents, then scan the output for .class entries; this is the first bottleneck.
If 300 jars contain, let's say, 100 classes each, well, you can imagine: it's a big lot of classes.
There is some disk IO going on here, but I'm not sure how expensive it is. I am spawning and awaiting a promise in a loop to run this command, one at a time I suppose?
Could this be done better?
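For context, this is roughly the shape of the serial version, as a minimal sketch (the run helper and exact flags are illustrative, not my actual code):

```js
const { spawn } = require("node:child_process");

// Wrap a spawned process in a promise and collect its stdout.
function run(cmd, args) {
  return new Promise((resolve, reject) => {
    const child = spawn(cmd, args);
    let out = "";
    child.stdout.on("data", (chunk) => (out += chunk));
    child.on("error", reject);
    child.on("close", (code) =>
      code === 0 ? resolve(out) : reject(new Error(`${cmd} exited with ${code}`))
    );
  });
}

// One jar at a time: list the archive's entries, keep only the classes.
async function listClasses(jarPaths) {
  const classes = [];
  for (const jar of jarPaths) {
    const listing = await run("jar", ["-tf", jar]); // serial: next jar waits
    classes.push(...listing.split("\n").filter((l) => l.endsWith(".class")));
  }
  return classes;
}
```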

Node can handle it, but the CPU in my MacBook Pro 2020 gets damn hot (nothing new there, but it's absolutely not what I want).

Then the next thing: for each class found in this parent loop, loop and run javap to disassemble the class into something that can be parsed into generated TypeScript a step down the line. This is SLOW, and even though we are using spawn, it's still not ideal. There is yet more writing to disk here, as we dump the javap output to files.
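Roughly like this, reusing the run helper from the sketch above (again a sketch; javap -public limits the dump to public members, and -classpath lets it resolve classes from the jar):

```js
const fs = require("node:fs/promises");
const path = require("node:path");

// For each class in a jar, dump its signatures with javap and write them
// to disk for the later parse-to-TypeScript step. Still one process per
// class, still serial: this is the slow part.
async function dumpSignatures(jar, classEntries, outDir) {
  await fs.mkdir(outDir, { recursive: true });
  for (const entry of classEntries) {
    const name = entry.replace(/\.class$/, "").replace(/\//g, ".");
    const text = await run("javap", ["-public", "-classpath", jar, name]);
    await fs.writeFile(path.join(outDir, `${name}.txt`), text);
  }
}
```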

Is Node.js right for this application? Could I use workers or multiple processes? Is the bottleneck javap, or maybe jar? Lots of confusion 😑

Top comments (9)

Mike Talbot ⭐

I'd suggest using something like the async library to cap things at a maximum of n processes running at once - this should reduce thrashing. You'd use forEachLimit or something like that and create a promise awaiting the result of the external process. Try to balance the number of concurrent processes against the number of CPUs, etc.
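Something along these lines, as a sketch (assuming async v3, where eachLimit accepts an async iteratee and returns a promise, plus a promise-returning run wrapper around spawn like the one in the post):

```js
const async = require("async"); // npm i async
const os = require("node:os");

const LIMIT = os.cpus().length; // starting point; benchmark up or down

async function processJars(jarPaths) {
  // At most LIMIT iteratees are in flight at any moment.
  await async.eachLimit(jarPaths, LIMIT, async (jar) => {
    const listing = await run("jar", ["-tf", jar]);
    // ...scan listing for .class entries, queue the javap step...
  });
}
```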

Adam Crockett 🌀

Ah yes, that library came up a lot digging around Stack Overflow; it used to be really popular. Anyway, you mentioned balancing. Where I have 8 CPUs and 3xx processes, should I find a limit divisible by those numbers?

Mike Talbot ⭐

Yeah, I still use it occasionally when I hit those kinds of challenges. I'd go with a limit of 8 or 16 and see which works out best.
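Node can tell you the core count, so the limit doesn't have to be hard-coded:

```js
const os = require("node:os");

// 8 logical cores -> try a limit of 8, then 16, and compare wall-clock time.
const LIMIT = os.cpus().length;
```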

Adam Crockett 🌀

Yeah, it seems to run slower now, but also more stable, so it's a trade-off I'm happy with. Thanks for the tips!

Pierre Vahlberg

Seems the bottleneck is hardware, so the solution should be hardware.

Would it be possible (and practical) to fetch and write files from, for example, S3 buckets, and then scale the processing using APIs such as Lambdas or EC2s running a tiny Node app, like below:

Your main app could parse the file list and distribute jobs, one file at a time, to a scalable "parser endpoint" that gets an S3 object path, converts it, and puts the result in another bucket. This parser service would then scale with the load on the endpoint.

You could probably write an MVP or POC with like 5 jar files
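As a sketch, a parser endpoint like that could be a Lambda handler along these lines (everything here is illustrative: the event shape and bucket names are made up, and the runtime would need a JDK available for javap):

```js
const { S3Client, GetObjectCommand, PutObjectCommand } = require("@aws-sdk/client-s3");
const { execFile } = require("node:child_process");
const { promisify } = require("node:util");
const fs = require("node:fs/promises");

const s3 = new S3Client({});
const exec = promisify(execFile);

// One job = one class file: fetch it, disassemble it, store the text output.
exports.handler = async ({ bucket, key, outBucket }) => {
  const obj = await s3.send(new GetObjectCommand({ Bucket: bucket, Key: key }));
  const local = `/tmp/${key.split("/").pop()}`;
  await fs.writeFile(local, await obj.Body.transformToByteArray());
  const { stdout } = await exec("javap", ["-public", local]);
  await s3.send(new PutObjectCommand({
    Bucket: outBucket,
    Key: key.replace(/\.class$/, ".txt"),
    Body: stdout,
  }));
};
```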

Adam Crockett 🌀

Looks like I'm going to need to learn some GCP; we don't use AWS where I work 😑 not that GCP is bad.

It sounds reasonable to make an app like this. Maybe I can get the whole thing running on my crappy 16 logical cores 😅 (I'm from the dual-core era, so anything above 4 sounds outstanding), then, once I understand how my so-far-cobbled-together Node app works, take that and port it to GCP. It's a great suggestion actually. I did wonder how far optimization of my code would go, but as I said in other threads, it's now slower and more stable, because, as you correctly point out, hardware is the bottleneck here. Still, what a learning experience this is.

Pierre Vahlberg

Your main app would then, as was suggested above, await the async ajax calls: fire away ~20 concurrent ajax calls, then wait until one is done before firing the next. It should be a fairly simple loop.
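In code, that loop can stay tiny. A sketch using Node 18+'s global fetch (the endpoint and job shape are made up):

```js
// Keep ~20 requests in flight; each "worker" pulls the next job as soon
// as its current call settles, so a new request fires when one is done.
async function dispatch(jobs, endpoint, limit = 20) {
  const queue = [...jobs];
  const worker = async () => {
    while (queue.length > 0) {
      const job = queue.shift();
      await fetch(endpoint, {
        method: "POST",
        headers: { "content-type": "application/json" },
        body: JSON.stringify(job),
      });
    }
  };
  await Promise.all(Array.from({ length: limit }, worker));
}
```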

Mohamed Edrah

You could use multiple processes, but I suggest a simpler solution: boost your thread pool size and use parallel scheduling and scheduling queues to run more than one task at a time. Just remember to keep the number of spawned tasks under the number of threads in the thread pool.
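For reference, libuv's thread pool defaults to 4 threads and is sized with the UV_THREADPOOL_SIZE environment variable. A sketch (note the pool covers fs, dns, zlib, and crypto work; spawned child processes don't go through it):

```js
// Preferably set before Node starts:
//   UV_THREADPOOL_SIZE=16 node index.js
// Setting it at the very top of the entry file usually works too, since
// the pool is created lazily on first use.
process.env.UV_THREADPOOL_SIZE = "16";

// From here, keep the number of concurrent pooled tasks (e.g. the
// fs.readFile/fs.writeFile calls around each javap run) at or under 16.
```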

Adam Crockett 🌀

It's a shame I didn't see this comment; I went the fork-process route (1 per jar), so it's now 300 processes for the allocated 'jobs':

  • Parse jars
  • bytecode forEach classpath
  • Parse bytecode to typescript

Although it's not sequential, some jobs finish sooner than others, and when a child process finishes, we pass on to the next one.

The multi-process version is a hell of a lot faster, but running it now kind of stalls and lags my system... One of the jobs is the classpath processing, which could involve quite a lot of org.paths.goo paths, so I think instead of going multi-process there, maybe a queue and a limit on the amount of processing within that job, something like the sketch below?
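A bounded pool is roughly what I have in mind, as a sketch (worker.js is a hypothetical child script that handles one job per message and exits when done):

```js
const { fork } = require("node:child_process");
const os = require("node:os");

// At most `size` children alive at once; a finished child frees a slot
// and the queue backfills it, so fast jobs don't hold anything up.
function runPool(jobs, size = os.cpus().length) {
  const queue = [...jobs];
  return new Promise((resolve) => {
    let active = 0;
    const next = () => {
      if (queue.length === 0 && active === 0) return resolve();
      while (active < size && queue.length > 0) {
        const job = queue.shift();
        active++;
        const child = fork("./worker.js");
        child.send(job); // the worker listens with process.on("message")
        child.on("exit", () => {
          active--;
          next(); // backfill the freed slot
        });
      }
    };
    next();
  });
}
```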

Your suggestions I will need to research 🦉