The common edge case all dependency managers miss

#dependencies #productivity #devops #tooling

What do Ruby's Bundler & RubyGems, Mac's Homebrew, Linux's Apt (Advanced Package Tool) & Debian Packages, JavaScript's NPM & Yarn, Python's Pip, and practically every other dependency installer or manager have in common, other than the fact that they work on dependencies? None of them can handle all edge cases of installing dependencies, because they "naively" manage dependencies.

This post does not aim to shame or belittle the work that has gone into these incredible pieces of software (I've contributed to Bundler myself, and continue to do so!). The dependency managers we do have provide us with an invaluable service, and I am very thankful for them. This post does, however, aim to unpack this claim of naive dependency management, show you an alternative, and explore the differences.

This post is aimed at developers who know a little bit about dependency management, but it does its best to explain concepts along the way (please comment if you don't understand anything and I'll do my best to explain it). It likely isn't too simplistic for people with a more thorough understanding of dependency management either, so it is aimed at developers in general. While I tend to use Ruby-based examples, the concept applies equally to other languages and software like JavaScript, Java, Python, Linux, and Mac.

Enjoy!

How do dependency systems work?

First, let's start with how these dependency installers/managers function. They all work in approximately the same way:

1. Fetch some index/listing of dependencies.
2. Based on some definition of desired dependencies, run a dependency resolution algorithm intended to de-duplicate sub-dependencies and find a version that matches every requirement.
   - The list of desired dependencies can be derived in various ways, such as a file (like package.json, requirements.txt, Brewfile, and Gemfile) or a command (like yarn add foo, gem install foo, or apt-get install foo).
3. Install the list of resulting dependencies from step 2.
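The resolution step above can be sketched as a toy resolver. This is not how Bundler, NPM, or any real tool is implemented. The index data, the `resolve` function, and the version-set format are all hypothetical. It does share one assumption with those tools, though: exactly one version per package name, shared by every requirer.

```python
from typing import Dict, List, Set, Tuple

# (requirer, package name, versions that requirer accepts): a hypothetical format
Requirement = Tuple[str, str, Set[str]]


def version_key(version: str) -> Tuple[int, ...]:
    """Turn "5.7" into (5, 7) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in version.split("."))


def resolve(index: Dict[str, List[str]], requirements: List[Requirement]) -> Dict[str, str]:
    """Naively pick exactly ONE version per package name for everyone."""
    candidates: Dict[str, Set[str]] = {}
    requirers: Dict[str, List[str]] = {}
    for requirer, name, acceptable in requirements:
        # Start from every version the index offers, then narrow it per requirement.
        candidates.setdefault(name, set(index[name]))
        candidates[name] &= acceptable
        requirers.setdefault(name, []).append(requirer)
        if not candidates[name]:
            raise ValueError(
                f"conflict on {name}: no single version satisfies {requirers[name]}"
            )
    # De-duplicated result: the newest remaining version of each package.
    return {name: max(versions, key=version_key) for name, versions in candidates.items()}


if __name__ == "__main__":
    index = {"mysql": ["5.5", "5.7"], "imagemagick": ["6", "7"]}
    print(resolve(index, [("my_app", "mysql", {"5.7"}), ("rmagick", "imagemagick", {"6"})]))
```

Adding a third requirement such as `("another_app", "mysql", {"5.5"})` makes `resolve` raise a `ValueError`. Both apps are perfectly satisfiable on their own. The conflict exists only because the resolver insists on a single shared version.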

These dependencies install into a global or local namespace, may be linked to static system dependencies, and may rely on the state of the system at the time you installed them.

Example

Let's look at an example of this system at play.

In this highly simplified diagram, we see that when we are working on my_app, we depend on Ruby, MySQL, and RMagick. MySQL is expected at /usr/local/bin/mysql and points to MySQL 5.7; RMagick depends on ImageMagick@6, which depends on pkg-config and freetype (among other things).

Theoretically, this system works well. It’s straightforward, with a clear directed dependency graph. However, this is an ideal scenario and rarely what actually happens in the real world.

The problem appears when you have multiple apps running side by side on the same system. Imagine a scenario where my_app and another_app both depend on MySQL: the former on MySQL 5.7 and the latter on MySQL 5.5. Unfortunately, all of their dependencies assume that MySQL is at /usr/local/bin/mysql. This path is often hardcoded into many dependencies, or they look in the same directory set by some global variable and expect the binary to be called mysql. This means we can only have one version in use at a time without some hacky juggling of globally set environment variables.
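Here is a minimal sketch of that collision, assuming a plain dict stands in for /usr/local/bin. Nothing is written to disk. The `install_globally` and `broken_apps` helpers are hypothetical. Because every app's mysql maps to the same path, whichever install happens last silently wins.

```python
from typing import Dict, List

BIN_DIR = "/usr/local/bin"  # the single, shared location every app looks in


def install_globally(apps: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Install each app's binaries into one shared directory.

    apps maps app name -> {binary name: version}; returns path -> installed version.
    """
    bin_dir: Dict[str, str] = {}
    for deps in apps.values():
        for binary, version in deps.items():
            # Same binary name means same path, so a later install overwrites an earlier one.
            bin_dir[f"{BIN_DIR}/{binary}"] = version
    return bin_dir


def broken_apps(apps: Dict[str, Dict[str, str]], bin_dir: Dict[str, str]) -> List[str]:
    """Apps whose expected version is not the one sitting at the shared path."""
    return [
        app
        for app, deps in apps.items()
        if any(bin_dir.get(f"{BIN_DIR}/{b}") != v for b, v in deps.items())
    ]


if __name__ == "__main__":
    apps = {"my_app": {"mysql": "5.7"}, "another_app": {"mysql": "5.5"}}
    bin_dir = install_globally(apps)
    print(bin_dir)                     # {'/usr/local/bin/mysql': '5.5'}
    print(broken_apps(apps, bin_dir))  # ['my_app']
```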

We could give MySQL 5.5 a different binary name (e.g. /usr/local/bin/mysql_55), but most packages and systems expect mysql, not mysql_55, so this causes many problems as well.

This scenario implies that we need to update all dependent apps at once to change the version of MySQL. The issue is exacerbated as we add more apps and services, since there is more chance for it to occur and more opportunity for dependencies to overlap.

What does this mean?

This issue is prevalent in most modern dependency managers. We've made a mistake in assuming that only one version of a piece of software can run at once, or perhaps made a deliberate simplification. I suspect that we've made it this far because servers, and particularly containerized services, often run only one application per system. On your own computer, however, you may be running multiple applications at once.

I've seen this issue happen a lot with a system's dependency on ImageMagick (programmatic image manipulation software), MySQL (database software), and Readline. Ruby, Python, and other languages depend on Readline, so when it changes, you might have to recompile all versions of Ruby and Python, along with all installed gems/eggs/wheels/dependencies.

You can mitigate this issue with patterns like docker-compose. The docker-compose pattern puts all of your code into a container so that the multi-version issue can't happen as easily. This pattern, however, means your editor now has to be capable of working inside the resulting Docker instance, or you need to sync your files to your local machine. While managing hundreds of instances of this pattern, I've seen that syncing cause a lot of issues and confusion about where the source of truth lies.

Setting docker-compose aside (it can still hit this edge case!), we've seen that we can't keep duplicate versions in the same spot because of naming, but we can keep multiple versions in different spots. The issue becomes managing those different spots and telling each dependency how to run in that context.
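One way to picture "multiple versions at different spots" is to give each version its own directory and each app its own PATH. The sketch below creates two fake `mysql` executables in a temporary directory. The directory layout and the `fake_binary` and `app_path` helpers are hypothetical. It then uses Python's `shutil.which` to show the same name resolving to a different file per app. Doing this by hand for every app and every dependency is the "managing those different spots" problem described above.

```python
import os
import shutil
import stat
import tempfile


def fake_binary(root: str, package: str, name: str) -> str:
    """Create <root>/<package>/bin/<name> as a tiny executable script; return its bin dir."""
    bin_dir = os.path.join(root, package, "bin")
    os.makedirs(bin_dir, exist_ok=True)
    path = os.path.join(bin_dir, name)
    with open(path, "w") as f:
        f.write(f"#!/bin/sh\necho {package}\n")
    # Mark the script executable for its owner so shutil.which will accept it.
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
    return bin_dir


def app_path(*bin_dirs: str) -> str:
    """Build an app-specific PATH from only the directories that app depends on."""
    return os.pathsep.join(bin_dirs)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        mysql57 = fake_binary(root, "mysql-5.7", "mysql")
        mysql55 = fake_binary(root, "mysql-5.5", "mysql")
        # Both apps ask for plain "mysql"; each app's PATH decides which file that is.
        print(shutil.which("mysql", path=app_path(mysql57)))
        print(shutil.which("mysql", path=app_path(mysql55)))
```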

There is a system called Nix that takes into account all edge cases of dependency management and can handle those different spots.

A bit of a preamble before the regularly scheduled post

We've just talked about how dependency management is naive in most modern dependency systems. I believe this is a relic from times when systems had fewer dependencies and less hard drive space to keep those dependencies.

The number and size of dependencies have kept growing, and keeping them all would not have been feasible on the small hard drives of the 1990s and early 2000s, when some of the older dependency management software was developed: Linux's Apt (Advanced Package Tool) and Distutils (Python) in 1998, PyPI (Python) in 2003, and RubyGems (Ruby) in 2003/2004. The hard drive size issue is a theory of mine, and I won't dive too deep into it, but I feel it lends some empathy to the decisions of 15-25 years ago. I also suspect that dependency managers all sort of "copied" each other over the years without re-evaluating the underlying dependency theories, or deliberately maintained the simplification. Nevertheless, the dependency managers we do have provide us with an invaluable service, and I am very thankful for them, but I do think there is room for improvement.

What is nix?

Nix is a package manager that is “functional” and “pure.” That means it treats packages like values in purely functional programming languages, such as Haskell: packages are built by functions that don’t have side effects, and once built, they cannot change.

This method differs from systems like Homebrew, NPM, and RubyGems, which may install a package differently depending on what other software you have installed, where certain packages are located, or which environment variables are set.
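To make the contrast concrete, here is a toy comparison. It is not real Homebrew, NPM, RubyGems, or Nix code. `impure_build`, `pure_build`, and the `EXTRA_BUILD_FLAGS` variable are hypothetical. The impure build quietly reads the host environment, so the same request can produce different results on different machines. The pure build can only see the inputs passed to it, so equal inputs always give an equal result.

```python
import os
from typing import Dict


def impure_build(name: str, version: str) -> str:
    """Result depends on whatever happens to be in the caller's environment."""
    flags = os.environ.get("EXTRA_BUILD_FLAGS", "")  # hidden, undeclared input
    return f"{name}-{version} built with [{flags}]"


def pure_build(name: str, version: str, inputs: Dict[str, str]) -> str:
    """Result depends only on the arguments; nothing is read from the host."""
    declared = ",".join(f"{k}={v}" for k, v in sorted(inputs.items()))
    return f"{name}-{version} built with [{declared}]"


if __name__ == "__main__":
    os.environ["EXTRA_BUILD_FLAGS"] = "-O2"
    first = impure_build("mysql", "5.7")
    os.environ["EXTRA_BUILD_FLAGS"] = "-O0"
    second = impure_build("mysql", "5.7")
    print(first == second)  # False: same request, different result

    inputs = {"openssl": "1.1", "cmake": "3.12"}
    print(pure_build("mysql", "5.7", inputs) == pure_build("mysql", "5.7", inputs))  # True
```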

Instead, Nix builds each dependency into its own directory and identifies it by a constructed hash. The hash is computed from everything used to build the package, so we can guarantee it is unique for any build setup. Then, instead of referring to MySQL directly, Nix refers to the hashed copy via a symlink. This setup means we can keep using the same binary name (e.g. mysql, which we've seen is required), while it points to a different version depending on the app you’re using and, in some cases, a different version for different dependencies. You can see this in the following diagrams:

Pure Packages

When I say "pure package", I mean that the package is not impacted or influenced by anything outside of what is specified. Environment variables, other system dependencies, and even your HOME and TMPDIR directories do not affect the resulting dependency.

On Linux, dependencies are built using what is known as a derivation in a virtually isolated area of your system. On Mac, you have no access to TMPDIR or HOME, and both of those environment variables are set to locations that don't exist. Likewise, PATH, which tells your system where to find executables, is empty, so you have no access to your pre-installed dependencies.
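Here is a rough sketch of what this looks like from the build's point of view. It is a simplification under stated assumptions: `build_env`, `run_in_build_env`, and the placeholder path are hypothetical, and real Nix isolation involves much more than environment variables. The key idea is that the child process starts from a fresh environment instead of inheriting the user's. HOME and TMPDIR point somewhere that doesn't exist. PATH is empty unless bin directories are explicitly declared, and adding declared directories is an assumption of this sketch.

```python
import os
import subprocess
import sys
from typing import Dict, List

NONEXISTENT = "/nonexistent-build-home"  # hypothetical placeholder path


def build_env(declared_bin_dirs: List[str]) -> Dict[str, str]:
    """Build an environment from scratch instead of copying os.environ."""
    return {
        "HOME": NONEXISTENT,    # nothing from ~ can leak in
        "TMPDIR": NONEXISTENT,  # nor from the user's temp directory
        "PATH": os.pathsep.join(declared_bin_dirs),  # empty unless declared
    }


def run_in_build_env(code: str, declared_bin_dirs: List[str]) -> str:
    """Run a Python snippet in a child process that sees only build_env()."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=build_env(declared_bin_dirs),  # replaces, rather than extends, the environment
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    os.environ["SECRET_HOST_VAR"] = "leaky"  # set on the host only
    print(run_in_build_env("import os; print(os.environ.get('SECRET_HOST_VAR'))", []))  # None
```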

Instead of relying on what was previously on the system, you specify exactly what is needed to build and run. Nix reuses something already installed in the Nix store if and only if its calculated hash (which determines whether it's compatible) matches the hash of the requested dependency; otherwise, Nix builds a new one.
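The hash-then-reuse rule can be sketched with a dictionary standing in for the store. This is a simplification. Real Nix hashes a full derivation and computes and encodes the result differently, so the `store_path` format here (a truncated SHA-256 hex digest before `name-version` under `/nix/store`) is only illustrative. `Store` and `get_or_build` are hypothetical stand-ins for actually building. Dependencies' own store paths are fed in as inputs, so changing a sub-dependency changes the hash of everything built on top of it.

```python
import hashlib
import json
from typing import Dict

STORE_DIR = "/nix/store"


def store_path(name: str, version: str, inputs: Dict[str, str]) -> str:
    """Derive a store path from everything used to build the package."""
    recipe = {"name": name, "version": version, "inputs": inputs}
    # sort_keys=True makes the hash independent of dict insertion order.
    digest = hashlib.sha256(json.dumps(recipe, sort_keys=True).encode()).hexdigest()
    # Illustrative format only; real Nix computes and encodes its hash differently.
    return f"{STORE_DIR}/{digest[:32]}-{name}-{version}"


class Store:
    """A dict standing in for the store; nothing touches the real filesystem."""

    def __init__(self) -> None:
        self.paths: Dict[str, str] = {}
        self.builds = 0

    def get_or_build(self, name: str, version: str, inputs: Dict[str, str]) -> str:
        """Reuse an entry only if its hash matches; otherwise 'build' a new one."""
        path = store_path(name, version, inputs)
        if path not in self.paths:
            self.builds += 1
            self.paths[path] = f"build output of {name} {version}"
        return path


if __name__ == "__main__":
    store = Store()
    openssl = store.get_or_build("openssl", "1.1", {})
    a = store.get_or_build("mysql", "5.7", {"openssl": openssl})
    b = store.get_or_build("mysql", "5.7", {"openssl": openssl})  # same hash: reused
    old_openssl = store.get_or_build("openssl", "1.0", {})
    c = store.get_or_build("mysql", "5.7", {"openssl": old_openssl})  # new hash: rebuilt
    print(a == b, a == c, store.builds)  # True False 4
```

Both MySQL builds can now live side by side under different paths. Each app's symlinks then decide which one "mysql" means.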

This method creates a guaranteed non-cyclic dependency graph free of conflicts and is the way to "correctly" handle all edge cases of dependency management.

What can I do?

So, what can you do? Honestly, probably not much right now without a bunch of work. However, I hope this helps you understand the dependency conflicts you experience. Unless you're willing to invest the time to switch to NixOS (a Linux distro), use nix-shell (a subshell that activates the appropriate dependencies), or write your own integrations with Nix, there is likely nothing for you to change.

That said, if you're working in a larger organization with many interrelated services, solving these problems may become more pressing, and Nix may be a good solution. As a reference point, at a company where I used to work, with approximately 1000 members on the R&D team, 50% of internal developer support issues were related to dependency management.

Final Word

We've looked at how most modern dependency management systems do not handle an edge case in dependency management. This edge case is hit when there are two divergent requirements for the same dependency, causing a conflict because only one version can be used at a time. We then looked at how Nix solves this issue using an isolated build system and a functionally linked system. I don't think there's much for anyone to do right now without a lot of rewriting of dependency systems, but we can improve error messages!

I hope dependency management improves in the future and becomes much simpler and more approachable. For now, if you're writing a dependency manager, please look at making error messages clearer when dependencies conflict and at providing better education around those conflicts (using methods that I describe in my 2018 RubyKaigi conference presentation!).

Resources

If you want to read more about Nix, here are a few resources:

http://notes.burke.libbey.me/learning-nix/
- My friend wrote up a bunch of stuff he learned about Nix. It's quite helpful!
https://nixcloud.io/tour/
- This link is a tour of the Nix functional language. It's a pretty great tour, and on top of teaching the Nix language, it's a good example of how to teach.
https://nixos.org/nixos/nix-pills/
- These are a few dozen bite-sized "pills" that allow you to take in small doses of Nix to learn.
https://nixos.org/~eelco/pubs/phd-thesis.pdf
- Nix came out of the Ph.D. thesis of a Dutch computer scientist. Their focus was on making a mathematically and logically sound system, and I think they achieved that.