It all started with me trying to improve our Continuous Integration pipeline. I'm a strong believer in having proper CI - the threshold for how much to invest in unit & integration tests is always tricky to set, but to me the bare minimum should be to have linting and type checking run on every commit.
Now, having that bare minimum is great, but it also needs to be as fast as possible. When you want commits & reviews to be fast, CI cannot be the one thing holding you back.
Yet... This is what we would see in the best case scenario on that bare minimum linting & type checking job:
1 minute and 11 seconds just to install dependencies. Obviously, the job has to do more afterwards, and that's where I'd prefer it to spend time.
But wait, there's more. This was the best case scenario. You may know that package managers have caches, and a known trick to speed up installs is to save that cache after CI runs, so it can be reused for subsequent runs. An easy way to do that nowadays is to use actions/setup-node's caching capabilities.
However, the cache cannot always be used. As soon as the lock file changes, typically when adding dependencies, the cache isn't reused because the cache key is usually computed from a hash of the lock file. We would then get:
6 minutes and 31 seconds 🐌.
That's when we really thought we needed to do something.
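The lock-file-based cache key described above can be sketched in a few lines of Node. This is only an illustration of the idea, not the actual implementation of any CI action: the key format, the `cacheKey` helper and the lock file contents are all made up for the example.

```javascript
const crypto = require('crypto');

// Hypothetical helper: derive a cache key from the lock file's contents.
function cacheKey(platform, lockFileContents) {
  const hash = crypto.createHash('sha256').update(lockFileContents).digest('hex');
  return `${platform}-yarn-${hash}`;
}

// Made-up lock file contents, before and after adding a dependency.
const before = 'left-pad@^1.3.0:\n  version "1.3.0"\n';
const after = before + 'slash@^3.0.0:\n  version "3.0.0"\n';

const keyBefore = cacheKey('linux', before);
const keyAfter = cacheKey('linux', after);

// Same lock file: same key, so the saved cache can be restored.
console.log(keyBefore === cacheKey('linux', before)); // true
// Changed lock file: different key, so the install starts from a cold cache.
console.log(keyBefore === keyAfter); // false
```

Any change to the lock file, however small, produces a new key, which is why adding a single dependency was enough to send us back to the slow path.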
Where we stood with Yarn
We have been using Yarn 2 for quite some time, having originally switched to it for its native workspace support, which is great for monorepos, as we happen to have one. We use a lot of different dev tools (in no particular order - Vite, Vitepress, Astro, esbuild, Webpack, Eleventy, Firebase tools, Tailwind...) and many more actual dependencies. It's easy to understand how many dependencies we're bound to have when you see all the frameworks we support, whether on WebComponents.dev or on Backlight.
You may know Yarn 2 for introducing the Plug'n'Play linker. To make it short, it completely abandons the node_modules resolution mechanism and tells Node to depend on Yarn for dependency resolution.
It's a really interesting idea, but dropping node_modules is a compatibility challenge which kept us away from trying it. We stuck with node_modules and are sticking with it for now.
Anyway, because Yarn 3 had been out for a few months with performance improvements, we decided to give it a try to see if that would speed up our builds.
Trying Yarn 3
Upgrading to Yarn 3 is fairly simple:
> yarn set version berry
➤ YN0000: Retrieving https://repo.yarnpkg.com/3.1.1/packages/yarnpkg-cli/bin/yarn.js
➤ YN0000: Saving the new release in .yarn/releases/yarn-3.1.1.cjs
➤ YN0000: Done in 0s 758ms
And there we go, we were upgraded to Yarn 3.
I'll spare you another pair of screenshots, but that got us down a bit, to 4 minutes 50 seconds without cache and 57 seconds with cache.
I'm sparing you the screenshots for a good reason - I did mention we have been using Yarn 2 in that monorepo for a while. We've also been adding so many packages in different workspaces that we ended up with a lot of duplicated dependencies, i.e. with multiple versions of the same packages.
So just for the sake of the comparison and because our original point was to speed up install times, I went ahead and completely removed the yarn.lock file and tested again.
With cache, down to 50 seconds:
And without cache, we got down to 4 minutes and 1 second:
It's fair to say we've sped up our builds quite a lot already, but we wanted to go further yet.
Now, that's where we decided to try out pnpm. But as pointed out by @larixer, there are also options you can set to speed up Yarn. We tried them out to make it a fairer comparison.
@larixer mentions the following 3 options:
nmMode: hardlinks-global
enableGlobalCache: true
compressionLevel: 0
And they do help a lot, especially without cache where we go down to 1 minute 10 seconds:
It is also slightly faster with a cache, yielding 45 seconds:
So if you are running Yarn, do consider trying them out! Chances are they will greatly improve your install times.
Anyhow, let's jump into pnpm!
Enter pnpm
pnpm stands for Performant NPM. Its adoption has been really steady, as it's close to 15k stars on GitHub at the moment. It also comes with out-of-the-box support for workspaces, making it easier for us to consider.
As its name indicates, it really emphasizes performance, both regarding disk space and installation times. In all the figures provided, whether from pnpm or from Yarn, you can see that pnpm comes out faster most of the time.
There seem to be two main reasons for it.
One, being performance-oriented, its implementation targets speed. You may have seen, when installing with yarn or npm, timings for each of the resolution/fetch/link steps. It appears that pnpm does not run those steps one after the other for the whole dependency tree. Instead, each package goes through its own resolve, fetch and link steps, and packages are processed in parallel, which explains why it's so efficient.
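To get a feel for why that matters, here is a toy model of the two strategies. It is a sketch under strong assumptions (made-up durations, hypothetical packages, unlimited parallelism) and says nothing about how pnpm or Yarn are actually implemented.

```javascript
// Made-up step durations (in seconds) for three hypothetical packages.
const packages = [
  { name: 'a', resolve: 1, fetch: 5, link: 1 },
  { name: 'b', resolve: 4, fetch: 1, link: 1 },
  { name: 'c', resolve: 1, fetch: 1, link: 4 },
];
const steps = ['resolve', 'fetch', 'link'];

// Stage by stage: every package must finish a step before the next step
// starts, so each stage lasts as long as its slowest package.
const stagedTotal = steps
  .map((step) => Math.max(...packages.map((pkg) => pkg[step])))
  .reduce((sum, duration) => sum + duration, 0);

// Per package: each package runs its own three steps back to back and
// packages progress independently, so the total is the slowest package's sum.
const pipelinedTotal = Math.max(
  ...packages.map((pkg) => steps.reduce((sum, step) => sum + pkg[step], 0))
);

console.log({ stagedTotal, pipelinedTotal }); // { stagedTotal: 13, pipelinedTotal: 7 }
```

In this model the per-package approach can never be slower than the stage-by-stage one, since a slow fetch for one package no longer delays the linking of every other package.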
The other reason is the way it deals with the node_modules folder.
Centralized addressable cache
pnpm calls it a content-addressable filestore, and we know other package managers like yarn or npm have caches as well, which spare you from re-downloading packages.
The difference with pnpm's is that this cache is also referenced by your node_modules files, which are effectively hard links to that cache. A hard link is another name for the same data on disk: your OS reports those files as regular files, but their content is not duplicated. So the actual disk usage occurs in pnpm's cache, not in your node_modules folder. You save space, and installation time, because there's way less IO involved in setting up that infamous node_modules folder! 🪄
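If hard links are new to you, the following sketch shows what they look like from Node. The file names are stand-ins chosen for the example and do not reflect pnpm's real store layout.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Work in a throwaway directory; "store" and "project" are stand-ins for
// pnpm's store and a project's node_modules, not pnpm's real layout.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'hardlink-demo-'));
const storeFile = path.join(dir, 'store-index.js');
const projectFile = path.join(dir, 'project-index.js');

fs.writeFileSync(storeFile, 'module.exports = 42;\n');
fs.linkSync(storeFile, projectFile); // create a hard link, not a copy

const storeStat = fs.statSync(storeFile);
const projectStat = fs.statSync(projectFile);

console.log(projectStat.isFile()); // reported as a regular file
console.log(storeStat.ino === projectStat.ino); // both names share one inode
console.log(projectStat.nlink); // number of names pointing at that data
```

Both paths behave like ordinary files, yet the content exists only once on disk, which is where the space and IO savings come from.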
Non-flat node_modules
What's also interesting is the way the node_modules is organized with pnpm. npm and yarn (when using the node_modules linker) tend to do hoisting to save space as they're not using links. Hoisting is the act of installing a dependency in a parent directory rather than where it is depended on. So if you have a dependency that can be resolved to the same version pulled by two other packages, they'll try to hoist that dependency to avoid storing that same dependency twice in your node_modules.
The behavior of pnpm is different, somewhat more consistent. It's always setting up the node_modules structure the same way. First, it's non-flat. So running pnpm install vite in an empty folder will result in the following node_modules:
> tree node_modules -L 1
node_modules
└── vite -> .pnpm/vite@2.7.10/node_modules/vite
So our node_modules only contains vite and not all of its dependencies. This may seem unusual, but this avoids phantom dependencies. Phantom dependencies are dependencies that you end up being able to use without explicitly depending on them. This is a rather dangerous practice, because you do not control those - you may update the original dependency, just upgrading it to a new patch, but its dependencies may have been upgraded to major versions breaking your own code!
In our previous example, my source code won't be able to require any other dependency but vite, as it's the only one which was effectively installed to the top of my node_modules.
Now we can see that this folder is actually linking to another folder in node_modules/.pnpm: this is pnpm's Virtual Store where you will find all the packages installed in your project.
If we take a peek at this folder:
> tree node_modules/.pnpm/vite@2.7.10 -L 2
node_modules/.pnpm/vite@2.7.10
└── node_modules
    ├── esbuild -> ../../esbuild@0.13.15/node_modules/esbuild
    ├── postcss -> ../../postcss@8.4.5/node_modules/postcss
    ├── resolve -> ../../resolve@1.21.0/node_modules/resolve
    ├── rollup -> ../../rollup@2.63.0/node_modules/rollup
    └── vite
        ├── bin
        ├── CHANGELOG.md
        ├── client.d.ts
        ├── dist
        ├── LICENSE.md
        ├── node_modules
        ├── package.json
        ├── README.md
        ├── src
        └── types
So, vite itself and its dependencies were installed to node_modules/.pnpm/vite@2.7.10/node_modules.
The magic that makes it all work is that Node, when resolving packages, considers the target of the symlink instead of using the symlink's path itself. So when I do require('vite') from an src/index.js file, Node finds the node_modules/vite entry by iterating on parent directories looking for a node_modules folder containing vite, but actually resolves it to the target of the symlink:
> node -e "console.log(require.resolve('vite'))"
/tmp/foobar/node_modules/.pnpm/vite@2.7.10/node_modules/vite/dist/node/index.js
That means that any further package resolutions needed will effectively be done from this folder - so if that /tmp/foobar/node_modules/.pnpm/vite@2.7.10/node_modules/vite/dist/node/index.js file requires esbuild, it will find it in node_modules/.pnpm/vite@2.7.10/node_modules/esbuild!
This is also why some dependencies don't play nice with pnpm: because they don't resolve symlink targets. But we'll get to that later.
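The layout and resolution behavior described in this section can be reproduced without pnpm at all. The sketch below builds a similar structure by hand in a temporary folder, using two hypothetical packages (demo-a, which requires demo-b). It uses absolute symlinks for brevity where the real layout shown above uses relative ones, and it assumes the platform lets you create symlinks.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const { createRequire } = require('module');

// Throwaway project root; realpath avoids surprises when tmpdir is a symlink.
const root = fs.realpathSync(fs.mkdtempSync(path.join(os.tmpdir(), 'pnpm-layout-')));
const store = path.join(root, 'node_modules', '.pnpm');

// Write a hypothetical package into the hand-made "virtual store".
function writePackage(name, source) {
  const dir = path.join(store, `${name}@1.0.0`, 'node_modules', name);
  fs.mkdirSync(dir, { recursive: true });
  const manifest = { name, version: '1.0.0', main: 'index.js' };
  fs.writeFileSync(path.join(dir, 'package.json'), JSON.stringify(manifest));
  fs.writeFileSync(path.join(dir, 'index.js'), source);
  return dir;
}

const aDir = writePackage('demo-a', "module.exports = 'demo-a uses ' + require('demo-b');");
const bDir = writePackage('demo-b', "module.exports = 'demo-b';");

// demo-b is linked next to demo-a inside the store...
fs.symlinkSync(bDir, path.join(store, 'demo-a@1.0.0', 'node_modules', 'demo-b'), 'dir');
// ...and only demo-a is linked at the top of node_modules.
fs.symlinkSync(aDir, path.join(root, 'node_modules', 'demo-a'), 'dir');

// Resolve as if we were a source file sitting at the project root.
const projectRequire = createRequire(path.join(root, 'index.js'));
const resolvedA = projectRequire.resolve('demo-a'); // path inside the store
const loadedA = projectRequire('demo-a'); // demo-a finds demo-b from the store

// demo-b was never linked at the top level, so project code cannot reach it.
let phantomResolvable = true;
try {
  projectRequire.resolve('demo-b');
} catch {
  phantomResolvable = false;
}

console.log({ resolvedA, loadedA, phantomResolvable });
```

The resolved path points inside the store rather than at the top-level symlink, which is what lets demo-a find demo-b while the project itself cannot.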
Note: peer dependencies get special handling in pnpm. If you're interested in it, I suggest this read.
Now that we have a rough understanding of how pnpm works, let's try using it! 🚀
Migrating to pnpm
pnpm import
pnpm comes with a command to import yarn's locked dependencies:
> pnpm import
There's just one gotcha when you're using it in a monorepo: the workspaces have to be declared in your pnpm-workspace.yaml first. If you don't, then at best pnpm import will only import the dependencies declared in your root file.
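If your workspaces are already declared in the root package.json for Yarn, the pnpm-workspace.yaml content can be derived from them. The script below is a small sketch of that conversion: the package name and globs are hypothetical, and it assumes the globs contain no single quotes.

```javascript
// Hypothetical root package.json contents; the globs are made up.
const rootPackageJson = {
  name: 'my-monorepo',
  private: true,
  workspaces: ['packages/*', 'apps/*'],
};

// Yarn accepts either an array or an object with a "packages" array.
function workspaceGlobs(pkg) {
  const { workspaces } = pkg;
  if (Array.isArray(workspaces)) return workspaces;
  if (workspaces && Array.isArray(workspaces.packages)) return workspaces.packages;
  return [];
}

// Render the globs as the "packages" list expected in pnpm-workspace.yaml.
function toPnpmWorkspaceYaml(pkg) {
  const lines = workspaceGlobs(pkg).map((glob) => `  - '${glob}'`);
  return ['packages:', ...lines].join('\n') + '\n';
}

const yaml = toPnpmWorkspaceYaml(rootPackageJson);
console.log(yaml);
```

Saving that output as pnpm-workspace.yaml at the root of the repository before running pnpm import lets it pick up every workspace.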
Dependencies which have undeclared dependencies
Another kind of problem we ran into is some dependencies having undeclared dependencies. When using yarn it was not a problem, because those undeclared dependencies are often widely used packages that end up hoisted to the root node_modules anyway. For example, after the migration we realized mdjs-core had not declared its dependency on slash.
A simple way to fix this is again through the readPackage hook we mentioned in the previous section. There, you can simply declare the dependency explicitly for mdjs-core:
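// .pnpmfile.cjs, at the root of the workspace
function readPackage(pkg) {
  // @mdjs/core requires slash without declaring it, so declare it here.
  if (pkg.name === '@mdjs/core') {
    pkg.dependencies = {
      ...pkg.dependencies,
      slash: '^3.0.0',
    };
  }
  return pkg;
}
module.exports = { hooks: { readPackage } };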
shamefully-hoist when tools don't play along
We talked about the non-flat node_modules earlier. This structure is unfortunately not compatible with every Node tool.
An example of this is Astro, which at the moment recommends using shamefully-hoist.
Kind of a funny name, meant to dissuade you from using it :-)
As the name implies, this one will hoist all your dependencies in your root node_modules, fixing any incompatibility you may have with dev tools not playing along with the nested node_modules. This typically happens because they don't resolve symlinks to their target.
At the time of this writing, Astro requires it: without it, Astro fails to load its dependencies, with an error like:
Error: The following dependencies are imported but could not be resolved:
react (imported by /not-relevant/testimonial-card/src/index.tsx)
svelte/internal (imported by /not-relevant/double-cta/dist/DoubleCta.svelte.js)
Instead of going this way, I preferred manually adding the missing dependencies to the workspace using Astro. It's a hack, but one I would rather live with than use shamefully-hoist globally, as it would cancel out the advantages of the non-flat node_modules.
How fast is it
I know, that was the whole point of us trying out pnpm - let's see how fast it is!
So, when the cache is hit, we get down to 24 seconds:
And when the cache cannot be used, we get down to a whopping 53 seconds:
Summarizing the results:
| | Without cache | With cache |
|---|---|---|
| yarn 2 (without dedupe) | 6min 31s | 1min 11s |
| yarn 3 (without dedupe) | 4min 50s | 57s |
| yarn 3 | 4min 1s | 50s |
| yarn 3 (optimized) | 1min 10s | 45s |
| pnpm | 58s | 24s |
Honestly, I'm particularly impressed by the results when there's no cache.
I would have expected network to be the bottleneck for both yarn and pnpm in that case, but somehow pnpm still really shines there, while also being faster (at least for us) when the cache is used too!
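To put those figures in relative terms, here is a small script that converts a few rows of the summary table to seconds and computes how many times faster pnpm was. The durations are copied from the table, while the `toSeconds` and `speedup` helpers are just written for this example.

```javascript
// Figures copied from the summary table above.
const results = {
  'yarn 2 (without dedupe)': { withoutCache: '6min 31s', withCache: '1min 11s' },
  'yarn 3 (optimized)': { withoutCache: '1min 10s', withCache: '45s' },
  pnpm: { withoutCache: '58s', withCache: '24s' },
};

// Convert strings such as "6min 31s" or "45s" to a number of seconds.
function toSeconds(text) {
  const match = /^(?:(\d+)min)?\s*(?:(\d+)s?)?$/.exec(text.trim());
  if (!match) throw new Error(`Unrecognized duration: ${text}`);
  return Number(match[1] || 0) * 60 + Number(match[2] || 0);
}

// How many times faster pnpm was than the given setup, to one decimal place.
function speedup(setup, column) {
  const ratio = toSeconds(results[setup][column]) / toSeconds(results.pnpm[column]);
  return ratio.toFixed(1);
}

for (const setup of ['yarn 2 (without dedupe)', 'yarn 3 (optimized)']) {
  console.log(
    `${setup}: ${speedup(setup, 'withoutCache')}x without cache, ` +
      `${speedup(setup, 'withCache')}x with cache`
  );
}
```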
Now I'm happy - the CI is snappy, at least way snappier than it was, and our local install times also benefitted from it. Thank you pnpm!