The node_modules problem

Leonardo Teteo on November 19, 2018

I think I'm not the first one to talk about this problem, even here on dev.to. A quick search for a solution concluded with the imag...
 
 

Words cannot describe how much I love that picture...

 
 

The biggest data set I ever copied made me late for a party. I thought it would take a couple of seconds to copy to a pen drive at the end of the day... and I couldn't explain why I was late. 😀😀😀

 

At the risk of being too snarky this morning:

a library that sums two numbers (hypothetical example, I hope)

npmjs.com/package/addition2

 
 
 

Yup. My first thought when I read that line was, "I don't even have to check. I'm sure it is there..."

 
 

Thanks for making me lose my hope in humanity today. hahahaha

 

That's the reason why some developers can't have nice things.

 

Aah damn, I just needed something that could add four numbers. I'll keep looking!

 

Lol - looks like a good time to submit a pull request!

 
 

I'm using Yarn these days. I have no issues with the disk size cap since I have about 2TB of SSD RAID, and it's so freaking fast and cool that it works just right for me. However, if you or anybody else is willing to try pnpm, it seems the way to go for solving the node_modules size issue.

 

Ooh, I hadn't heard about pnpm and I'm excited to try it and get whole gigabytes back, hahaha

 
 

By the way, there is a pnpm tag on dev.to, so you can follow it: dev.to/t/pnpm

 

Apparently there's also @zkochan on dev.to
A welcome addition 👌

 

I tested it on one of my projects and it does look a bit faster, but I don't know if that's just because everything is cached in memory right now, so I will test it over the coming days to see its real performance.

 

Why copy it at all? The power of npm is that you can recreate the exact environment described in your package.json file with one command. Copying the node_modules directory is pointless.

 

I ended up deleting all the node_modules folders before copying, because it would have been impossible to copy otherwise. But my first thought was convenience: I see my projects folder with dozens of project folders, and the first instinct is to select everything, zip it, and copy it to my backup location.
My main point in the text is that other languages deal with this better, providing what you said, recreation of the environment, without adding a dependency folder with thousands of files and folders to every project.

 

Backing up your computer or replacing your hard drive is not working against the framework/community/tools. Those are common tasks, and if a particular tool makes them awful for users because of some design decisions, then the users are not to blame and are allowed to complain.

 

If your code is under source control, what is the need to copy node_modules for backup?

I think that's where your problem lies, and your article is then about one of the symptoms of not using source control.

node_modules is not designed for moving around. With all due respect, I think you're just using it in a way it was never intended.

I never intended to move them around; copying and pasting all the folders in my projects folder was just the natural action. Once I remembered, I cancelled the copy, deleted all of them, and started the copy again. However, the problem remains: there are too many files, and they are heavy to download and install. npm tends to take a long time here to finish. I think my point stands.

I understand that you come from a different environment (Java/C#). Those were also my first languages before I learned JavaScript through jQuery/React/Node.js. I had to learn that there are different philosophies in play here.

Most of the modules are made by hobbyists who do it in their free time. There are great ways to minimize libraries, e.g. avoid duplication by using peerDependencies in package.json, or make sure you strip away unnecessary files during publish. But you will have to get used to there being a lot of small modules (Unix style: doing one thing and doing it well).

Instead of working against the framework/community/tools, please try to understand them first and then come with improvements in the form of PRs, issues, and encouraging posts about how to do it right. I don't think your current attitude will help you succeed in becoming a better Node.js developer, and the community needs more good developers. Thank you for understanding.

I put a bunch of my projects in Dropbox, and it spends TONS of time syncing node_modules which is completely pointless. If anyone knows if there's a way to tell Dropbox to ignore node_modules folders I'm all ears...

I do use git in certain cases but I don't always need a git repo for every JavaScript project.

I understand. But I would not recommend using Dropbox as source control, because you run into the problems you just mentioned.

When using Azure DevOps, like in my case, adding a git repo can be as fast as creating a project and saving it to Dropbox.

PS: Dropbox gets full fast (not paying for it)!
PPS: Use a .gitignore file but I guess you knew that.

@mroggy85 :

"If your code is under source control what is the need for copy them for backup?"

"I think that's where your problem lies and then your article is about one of the symptoms when not using source control."

Not every project is meant to be on GitHub and a local repository won't fix the problem.

The problem is that you have to delete all the node_modules folders manually in a potentially unbounded number of project folders and then re-install them after moving.

Maybe you can do that with a little CLI magic, but that's not really user friendly and will take its time...

I understand that it can be a little inconvenient that one time you move your project. But I would not call moving project files around a common use case. I would suggest that you review your development process instead.

There are lots of steps between github and a local repo. For example, for small one-off projects I'll start with a local repo and then clone it on my other dev machine (each repo is a remote for the other, so I can easily sync changes back and forth). As it gets bigger, it can be pushed to the gitolite install I have on my media PC (which was just a random machine I had lying around that was always on; a Raspberry Pi would do just as well). Only when I want other people to look at it do I push to Github.

Even if you're just using local repos, git still helps you out; you only need to back up the .git directory. To restore, you put it back in place and run git checkout HEAD -- . in the working directory. No backing up of node_modules necessary!

 

I do it when I archive my projects because there is no guarantee that npm will be there in say 20 years from now.

 

Great rant, I wholeheartedly agree with everything you said.

I think that the major problem is the carelessness with which library developers pull in dependencies.

I have a pet project with one production dependency and three development dependencies and npm has pulled in a total of, no kidding, 2352 packages.

One of those packages is called path-is-inside and is a simple script with fifteen lines of code that checks whether a path is inside another path.

Now, I'm not saying that everybody should be reinventing the wheel all the time everywhere, but if developers are pulling in packages instead of writing fifteen-line utility functions (it's not even hard to figure out if a path is inside another path), no wonder that there are so many jokes about node_modules out there.

Personally, I'm always wary of pulling in dependencies and I always try to avoid packages that seem to have an unreasonable amount of dependencies for the job they do, although I'm more liberal with development dependencies (after all, there's only so much you can avoid when you're stuck with nodemon and webpack).

 

it's not even hard to figure out if a path is inside another path

It's not that it's hard, it's that, between normalization and platform (Windows) specific issues, it's easy to get wrong. Reimplementing it is not worth the 7kb you may save.

 

I would normally concede that it's a grey zone, but path-is-inside doesn't even do path normalisation (which, for the record, you can easily do with node's path module). I'd say that 15 lines of code to save 7kb and a broken implementation is worth every bit of it.

Anyway, this was a particular example, you could argue about every single small package that there is a reason you may want to add it as a dependency instead of implementing it yourself and you would be right. However, if we don't draw the line anywhere, well... we end up with 1875 subfolders in our node_modules folder. I just wish library developers were more careful when adding dependencies and only add them when they are truly required and have decent quality.

I'd say that 15 lines of code to save 7kb and a broken implementation is worth every bit of it.

In all likelihood, you're not saving 7kb; another library you're using probably has the same requirement and probably uses that same package. And if you get all libraries to do as you do, you now have n identical implementations of a 1kb function as opposed to a constant 7kb.

if we don't draw the line anywhere, well... we end up with 1875 subfolders in our node_modules folder.

node_modules folders aren't huge because somebody pulls in a small library. They're huge because people publish unneeded files, don't split their library into more focused libraries, don't make an effort to deduplicate their dependency trees, keep support for old Node versions, or pin to specific versions so other dependencies can't be deduplicated.

I just wish library developers were more careful when adding dependencies

It doesn't have to be a wish; if it is important to you, it is totally within your power to attempt to rectify that which you find a mistake (at least as far as open source goes). Many repositories will be responsive or even appreciative if you make an issue.

 

Hey, just a heads up since you don't reference it in your post! 😃

Yarn has worked on a feature aiming to remove node_modules folders from the equation. It works, has already shipped, and is used in production in various places. As for npm, they've started working on an experimental project sharing some aspects, called Tink.

For more information, the RFC that introduced Plug'n'Play (the name of this "no node_modules" install strategy) is github.com/yarnpkg/rfcs/pull/101 - there's a lot of great discussion there.

 

I knew about Yarn, but I've never actually used it. I will take a look at the PR you linked and at Yarn in general; it may handle this better than npm. Do you think it is better?

 

I'm part of the Yarn core team, so my opinion is biased ;)

Overall both Yarn and npm are pretty good tools. We tend to ship more features, in part thanks to our community which frequently contributes, but npm isn't that far behind.

Plug'n'Play is currently exclusive to Yarn, though (Tink is quite different, I wrote an article about it not too long ago).

I found your text here, I will read it. :)
Since I wrote this article I need to be open to everything, so I will test Yarn thoroughly. It looks nice, starting with the mascot. :D

 

So, without reading other comments, let me add my two cents...

I've done a quick check on a typical microservice at work. I got 200MB and 19k files for node_modules. If I do the same with the virtualenv site-packages directory of another project at work, I get 10k files and 550MB.

I have no idea if these are typical sizes, but it's what I have handy. In summary: two projects, both in production, with similar sizes, node wins on files with a x2 factor and python wins on size with a x2.25 factor. That isn't so bad for node, is it?

And that's the point, node_modules isn't that bad when compared to other systems. Most memes were created when npm made no efforts to deduplicate dependencies, and at that time node_modules WAS a black hole. You could exceed 10 or 20 directory levels very easily, and 100k files wasn't at all uncommon. But npm got better, first we got npm dedup, and finally it started deduping by default.

Yet, node has a key advantage over other languages' dependency systems that you don't mention. You compare node with Java or C#, but it's not a fair comparison, because Java, C#, Python, Ruby and other languages use global or per-project dependencies. Node, instead, has package-local dependencies. This means that, in a node project, each package can declare a dependency on a different version of the same package. You cannot do that in most other languages, and I've had unsolvable situations where two libraries I needed wanted different versions of some third-party lib. Node doesn't have that problem.

So, yeah, node could possibly do better, but right now, in 2018 almost 2019, it is NOT what the memes will make you believe it is.

Also, if you're syncing node_modules between hard drives, you're doing it wrong. It's the whole point of dependency declarations.

 

The memes were just to lighten the mood; I wrote the text based on personal experiences and struggles with node_modules in 2018, so it really does need to improve. I've never come across a situation in Java, the language I have the most experience with, where libraries really needed different versions, since developers generally take care to keep new versions backward compatible. Using well-constructed libraries, you can use the latest version and not worry that other libraries depending on it may eventually need an old feature. If you experienced a version conflict in Java, the library you were using was probably not well maintained. Java is fully object-oriented and takes full advantage of its features to build a healthy dependency management system.

 

The dependency problem has zero to do with OO and all to do with module isolation. In Java, your dependencies are simply "I have this package accessible and can import it". If foo.bar.Baz is on your path, you can import it, that's it. But where you import it from makes no difference, and so you can't have foo.bar v1 and v2 in the same project unless they took care themselves of using different package names.

 
 

PNPM does exactly this, with a global repository for the whole machine.
But regardless of whether you install with pnpm, Yarn, or even npm, you shouldn't carry node_modules around with you; exclude it when copying, or keep your stuff in git so that you can clone without node_modules and compiled/derived artefacts.

 

Just my experience: most node modules have many more additional dependencies than, for example, a C# NuGet package. Node modules are often a source of dependency nightmares for larger projects.

 
 

Why? What is wrong with the new ES6/7 standard?
I am interested in what you are missing.

 

ES6 is a language, a standard library is a little bit more than that.

The absurdity of the amount of third party packages, utilities, snippets, slightly different implementations and so on that you find on npm might be a result of the lack of a common standard library.

That's what I meant.

Thanks, I've read it. Well, I think everyone would agree on the practicality of a common and official standard library. The hard thing would be finding who's going to take the lead on that. You need at least the support of the TC39 committee, the big browser vendors and so on. The issue is that most successful open-source projects have either a BDFL or a cohesive core team, usually there since the beginning, not 20 years after the creation of the language. I hope time, and possibly some fed-up big players in the community, will result in a real conversation being started.

Nobody forbids you from writing your own "is true" package, it just probably shouldn't be pulled in automatically by hundreds of packages 😅

 

It's coming. Chrome already has the first stdlib built-in. More is about to come.

 

pnpm was created to solve this issue and is actively maintained since 2016.

This is how pnpm solves the issue:

  • one version of a package is only ever saved once on a disk in a global store
  • dependencies are imported to node_modules using hard links. So physically files of the package are the same in the global store and in every node_modules
  • symlinks are created inside node_modules to create a nested structure (more about it here)

pnpm's solution is battle tested and does not hook into Node's module resolution algorithm, so it is backward compatible.

Indeed, there are new concepts like Yarn Plug'n'Play and Tink. However, they hook into Node's resolution algorithm. They might change the way we use JavaScript but that will be a long process. pnpm works now.

 

You are right, managing the node_modules folder is one of the biggest challenges for JavaScript developers.

And it doesn't matter which tool we use to install the dependency (Yarn or NPM).

Ultimately, the problem is storage space.

 

I am so tired of seeing posts like this with no solutions. Yes, Yarn. Yes, pnpm. Yes, IMS. They exist. They are not a solution; they are an alternative.

Comparison to Java or equivalent development experiences is not fair nor should one really inform the other this way. Unfortunately, this post does just that.

This is yet another rant without an elixir. What is your actual issue? Too many files? Is your IO/poll/refresh too slow in your IDE?

I come from the world of Node, C, and Bash. I helped build a package manager for C called clib and one for Bash called bpkg. The idea has always been to allow for clean, small, and reusable code, with the package manager helping to remove duplicate code. Some have been more successful at that than others. I've recently started working with Kotlin/Native. This comes with the baggage and tropes of the Java community (Scala too): verbosity and complicated tooling, being married to an IDE and JetBrains (it's quite obvious how they monetize through OSS), and the general attitude of the community that they got things right over everyone else. This post is an example of that, and the comments do not help.

Instead of having your editor open and bike shedding about the dependencies of a node project vs your Java project, audit what you install and use. Audit what it installs and uses, and so on. This may not be convention for the status quo here, but this is kind of what npm, node, OSS (FOSS too), etc expects implicitly.

We don't get shared objects, static archives, dynamic libraries, frameworks, jars, or any type of compiled object in node. There are ways to make them, but that is outside the scope of node. Instead, we get snapshots and wasm. But that's not useful to you.

What we need is reform in how packages are published and installed. We publish blindly, without whitelist, without security, without tests and we practically do the same for installation.

We (mostly) install source with npm that is built and/or executed on the host machine (ideally). This (hopefully) requires audit or knowledge of the source beyond what's advertised by the developers. We are not installing compatible ABI, we are installing JIT VM (except chakra et al) JavaScript that is not compiled, but evaluated. There are no symbols to deduplicate at compile time because there is no real compile time.

Being aware of these things when publishing and installing will not reduce your file and folder count, but it may help you understand it and why it is that way.

 

I'm sorry if I sounded conceited or put Java and C# over JavaScript in any part of my text; that was not the intention. I mainly have experience with these three languages, so it seemed a good idea to compare the various ideas around dependency management, and in the end Java and C# do look better to me on this front, but that is just my opinion and you can disagree with it. I think it is important to look around, see what has already been done about a problem, and use that solution if it fits, rather than create an environment of rivalry between languages and communities and reinvent the wheel just to do something different from language X or framework Y. That's what it sounds like while reading your comment; sorry if that is not what you intended.

You say that we should audit what we use, but auditing can only go so far in my point of view. I can audit the dependencies of my project, meaning that I will try to have as few dependencies as possible, pick only the ones that do just what I want, and so on. But it is the job of the library's developer to audit their own code. Downloading a dependency is a signal that we trust that the library will do its job efficiently and without security risks. We assume libraries are well maintained; if we had to audit the entire code of every library we use, it would be better to implement things ourselves.

 

It's a fun rant but I can't follow it. You assert that deep dependencies are bad, but... why? Because they take a long time to copy?

I think the node community has birthed a mindset of providing reusable code that's really small, targeted, and does just what you need. It can be a mess sometimes, sure, but so is dependency management in other languages. What node does is kind of beautiful, IMO.

 

What Jackson Pollock did was also kind of beautiful.

 

Always try to bundle your app; then you don't need to copy all the node_modules files from one PC to another, you just need to copy the build output, and you can serve that bundle locally.
Or you can use pkg to create an executable, whose size is also small:
npmjs.com/package/pkg

Peace.

 

What kind of bundler do you suggest? That's something I was thinking about to make everything more compact. Webpack? Parcel?

"pkg" looks interesting, I always thought it would be convenient to transform the application into an executable file. :)

 
 

We have known for decades that hiding complexity from the programming user makes systems more complex, especially in node, where everyone tells you to depend on explicit versions. npm handles version dependencies; NuGet doesn't. Remember DLL hell on Windows? Same symptom.
Maybe we should ask ourselves whether it wouldn't be better to focus on API versions and limit them to two. That means that as long as an old version is still needed, a new one can't be established. Additionally there might be a notification: "a" upgrades the API of module "x". Your project depends on "x"; we recommend upgrading this dependency. As long as it isn't upgraded, the API of module "x" can't be developed further.

Remember the developer terms 'responsibility' and 'KISS'?

 

I 100% agree! I posted a question on StackOverflow asking for help on what I can do to remedy this problem, and all I got was a bunch of downvotes from npm fanboys. I agree, npm is great, but node_modules is still a big problem!

 

I'm all for lots of little packages. Most of these packages exist to save time and avoid edge cases.
However there are definitely times when this gets out of hand, like pulling in Mustache.js for the sake of a few one line templates that will be part of the client bundle in a web app.
In my experience, one of the big bloat issues is the large number of projects that include the test files, source files, documentation, etc instead of just the compiled distribution.

 

Part of the problem is the node community themselves. I don't know how many modules that I have seen come bundled with their testware included. If I want the tests, I can go to source.

 

Why not just delete your node_modules folder before copying everything? Since you have package.json, you don't need that folder at all, unless you have installed some packages that haven't been added to package.json.

 

I think you missed the one thing that other languages can't do: in Node.js you can run multiple versions of the same dependency. All the other languages I have worked with do not let you do this. It really does save you from dependency hell when one dependency hasn't updated to the latest yet.

 

This is also why I created the project "mhy".
npm i @mhy/mhy -g
Still a work in progress, but we've started to use it heavily at our company.

 
 

I never could understand why npm doesn't have a single local repository approach with simple linking.

 

Me neither. It would be easier and faster for everybody.

 

LMAO! I had to double-check your cover image because I thought I was the only one with this perception. (I'm a newbie with Node.)

 

It is much more efficient to handle dependencies this way: download once, use in any project locally.

You can install node modules globally and achieve much the same.

 

Copying lots of files on Windows has always been painfully slow. On my Mac, copying a node_modules folder of 625MB and 63,130 files took 30 seconds.
