
Erol

Why Letting Piscina Spin Up Workers Dynamically Broke My Game Server

First, the background: I'm running a Carcassonne alternative web game called TileLord. It also supports a single-player mode, where you play against bots.

Bots are CPU-heavy because they calculate heuristics for every possible move they could make. The obvious fix is to run them in threads on the server side; otherwise, the whole server "lags" each time a bot calculates a move (which is what happened when I first launched the game).
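To make the "lag" concrete, here is a minimal sketch (not TileLord's code) of what happens when CPU-heavy work runs on Node's main thread. `busyWait` and `measureTimerDelay` are made-up names for this illustration, and the busy loop is only a stand-in for a bot's move calculation.

```javascript
// Hypothetical stand-in for a CPU-heavy bot move: spins synchronously for `ms` milliseconds.
function busyWait(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // Burn CPU on the main thread; no other callback can run meanwhile.
  }
}

// Schedules a 0 ms timer, then blocks the main thread for `blockMs`.
// Resolves with how late the timer actually fired.
function measureTimerDelay(blockMs) {
  return new Promise((resolve) => {
    const scheduledAt = Date.now();
    setTimeout(() => resolve(Date.now() - scheduledAt), 0);
    busyWait(blockMs);
  });
}

measureTimerDelay(200).then((delayMs) => {
  console.log(`0 ms timer fired after ~${delayMs} ms`);
});
```

The timer stands in for any other request the server should be handling: it cannot fire until the synchronous work finishes, which is why the work has to move off the main thread.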

Worker threads and Piscina

As mentioned, adding threads was the obvious step, and ChatGPT suggested that I use Piscina. On GitHub, this is their repo description:

A fast, efficient Node.js Worker Thread Pool implementation

So it makes total sense to use it. ChatGPT suggested this configuration, based on the VPS I was running my server on (8 cores):

const minThreads = 2;
const maxThreads = 6;

I rebooted the server, and... the lag was gone! It worked perfectly. Until... my game got more users, and bots started getting slow-ish again: 3 seconds for a bot move instead of 1.5s.

I ran htop on my server and saw random spikes of 800% CPU utilization (all 8 cores at 100%) for a few seconds, then back to 0%.

htop showing all 8 cores hitting 100% utilization

Which doesn't make any sense. There should be at most 6 worker threads, and nothing else should consume much CPU, right?

Wrong! Spawning a worker is very heavy in Node.js, as it needs to:

  • Create a new V8 isolate
  • Parse + execute your worker's JS bundle
  • Run module initialization (require/import)
  • JIT-compile hot code paths
  • Possibly trigger garbage collector activity

These operations are parallelized internally, so when Piscina spawns new workers it can hit all CPU cores at once, causing the whole server to lag for a second or two.
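A rough way to see the startup cost for yourself is to time a spawn directly with Node's built-in `worker_threads` module. This is only a sketch: `timeWorkerSpawn` is a made-up helper, and the inline worker is trivial, so a real bot bundle with its own imports would likely take noticeably longer.

```javascript
const { Worker } = require('node:worker_threads');
const { performance } = require('node:perf_hooks');

// Trivial worker body (an assumption for this sketch): report back as soon as it runs.
const workerSource = `
  const { parentPort } = require('node:worker_threads');
  parentPort.postMessage('ready');
`;

// Resolves with the milliseconds between `new Worker(...)` and the worker's first message.
function timeWorkerSpawn() {
  return new Promise((resolve, reject) => {
    const start = performance.now();
    const worker = new Worker(workerSource, { eval: true });
    worker.once('error', reject);
    worker.once('message', () => {
      const elapsedMs = performance.now() - start;
      worker.terminate().then(() => resolve(elapsedMs), reject);
    });
  });
}

timeWorkerSpawn().then((ms) => {
  console.log(`spawned a trivial worker in ${ms.toFixed(1)} ms`);
});
```

Comparing that number against the time of a single bot move gives a feel for whether spawning on demand is worth it.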

After hard-coding both min and max threads to 6 (so the other 2 CPU cores stay available to the server), my CPU consumption didn't go above 200% (I added the fix 4.2):
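The fixed-size setup can be sketched as a small helper. `fixedPoolOptions` and the `bot-worker.js` filename are made up for illustration; only `filename`, `minThreads` and `maxThreads` are actual Piscina options, and the Piscina call is left as a comment so the snippet runs without the package installed.

```javascript
const os = require('node:os');

// Hypothetical helper: size the pool once, leaving `reservedCores` free for the main server thread.
// With minThreads === maxThreads the pool should not need to grow under load.
function fixedPoolOptions(totalCores, reservedCores) {
  const threads = Math.max(1, totalCores - reservedCores);
  return { minThreads: threads, maxThreads: threads };
}

// On the 8-core VPS from this post, reserving 2 cores gives a fixed pool of 6.
console.log(fixedPoolOptions(os.cpus().length, 2));

// With Piscina installed, the options would be passed roughly like this
// ('bot-worker.js' is a made-up filename):
//   const path = require('node:path');
//   const Piscina = require('piscina');
//   const pool = new Piscina({
//     filename: path.resolve(__dirname, 'bot-worker.js'),
//     ...fixedPoolOptions(os.cpus().length, 2),
//   });
```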

Cpu compute graph over last week

Lesson learned

Worker creation can be far more expensive than the actual work (a bot move computation, which currently takes 50-500ms). In such a case, it might make sense to pre-allocate threads :)
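For a library-free picture of what "pre-allocate" means, here is a minimal sketch built on `worker_threads`. It is not how Piscina works internally: `PrewarmedPool` is a made-up class, the worker just squares a number as a stand-in for a bot move, tasks are handed out round-robin, and error handling is left out.

```javascript
const { Worker } = require('node:worker_threads');

// Inline worker source; squaring a number is a hypothetical stand-in for a bot move.
const workerSource = `
  const { parentPort } = require('node:worker_threads');
  parentPort.on('message', ({ id, value }) => {
    parentPort.postMessage({ id, result: value * value });
  });
`;

class PrewarmedPool {
  constructor(size) {
    this.nextId = 0;
    this.nextWorker = 0;
    this.pending = new Map(); // task id -> resolve callback
    // Pay the spawn cost for every worker once, at startup.
    this.workers = Array.from({ length: size }, () => {
      const worker = new Worker(workerSource, { eval: true });
      worker.on('message', ({ id, result }) => {
        const resolve = this.pending.get(id);
        this.pending.delete(id);
        resolve(result);
      });
      return worker;
    });
  }

  // Sends one task to the next worker in round-robin order; no new threads are created here.
  run(value) {
    return new Promise((resolve) => {
      const id = this.nextId++;
      this.pending.set(id, resolve);
      const worker = this.workers[this.nextWorker];
      this.nextWorker = (this.nextWorker + 1) % this.workers.length;
      worker.postMessage({ id, value });
    });
  }

  // Terminates all workers so the process can exit.
  close() {
    return Promise.all(this.workers.map((worker) => worker.terminate()));
  }
}

const pool = new PrewarmedPool(2);
Promise.all([pool.run(3), pool.run(4)])
  .then((results) => console.log(results))
  .finally(() => pool.close());
```

The point is only that `run` reuses threads that already exist, so a burst of bot moves never triggers a burst of worker startups.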

Sometimes the bug isn’t in your algorithm, it’s in how your workers come into existence.

Read more

I also wrote about How I Built a Free Online Carcassonne Game Alt You Can Play in the Browser