Giovanni Barillari

Posted on Jan 8, 2022 • Edited on Jan 10, 2022

How I made a binary version of Poetry package manager


Almost a year ago, I started producing binary executables of Poetry, a package manager for Python. Here is a technical deep-dive into why and how I made them.

Poetry package manager

Started by Sébastien Eustace, Poetry quickly became, for good reasons, a popular package and dependency management tool in the Python scene, thanks to PEP 518 and the standardised pyproject.toml manifest.

python-poetry / poetry: Python packaging and dependency management made easy

Poetry helps thousands of developers out there manage their Python projects, from dependencies to publication.

And, like many projects in the same space, Poetry is written in Python. Which really makes sense: it has to deal with Python code and dependencies, and working with Poetry clearly means you already have a fully working Python environment.

But (there's always a but) this also means Poetry itself is tightly tied to the Python environment on your machine, VM, or the Docker container running your code. And while this isn't necessarily an issue by itself, it can still cause problems in some cases.

The issue

And this brings us to the rationale behind producing binary versions of Poetry.

As I said a few lines back, Poetry itself depends on the Python environment you set up on your machine. Of course, the team behind Poetry did everything right from a Python packaging perspective: when you install Poetry through the installation script, a virtual environment is created to isolate Poetry from the rest of your projects and packages. But here is the thing: a Python virtual environment is still linked to the Python executable and library from which it was built. Why is this an issue? Because on some platforms, upgrading the Python distribution will simply break the virtual environments created from it.

For instance, on macOS, Python installations are typically handled with Homebrew. But what happens to your Poetry installation when brew upgrades your Python version? It will probably break, and you will need to reinstall Poetry.

Also, any further change in how you manage your Python installation(s) might affect your Poetry instance.
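To make the breakage concrete: a virtual environment created with the standard venv module records the directory of the interpreter it was built from in the `home` key of its pyvenv.cfg file. The sketch below (the helper names are hypothetical, not part of Poetry or its installer) reads that key and reports whether the recorded directory still exists, which is roughly what can go wrong when an upgrade moves or removes the old versioned Python directory.

```python
from __future__ import annotations

from pathlib import Path


def parse_pyvenv_cfg(text: str) -> dict[str, str]:
    """Parse the simple 'key = value' lines used by pyvenv.cfg."""
    config: dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            config[key.strip()] = value.strip()
    return config


def base_interpreter_dir(venv_dir: Path) -> Path | None:
    """Return the directory of the interpreter the venv was created from."""
    cfg = venv_dir / "pyvenv.cfg"
    if not cfg.is_file():
        return None  # not a venv-style environment
    home = parse_pyvenv_cfg(cfg.read_text(encoding="utf-8")).get("home")
    return Path(home) if home else None


def venv_looks_broken(venv_dir: Path) -> bool:
    """True if the recorded base interpreter directory no longer exists."""
    home = base_interpreter_dir(venv_dir)
    return home is not None and not home.is_dir()
```

This is a diagnostic sketch, not a fix: pointing `venv_looks_broken` at the environment your installer created for Poetry would only tell you whether its base interpreter is still where the venv expects it.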

So how do we solve this? We need a way to fully isolate our Poetry environment from the system one.

The PyOxidizer project

Now, since Poetry is a CLI, and none of your interactions with it require access to its code, we just need a way to produce an executable which contains Poetry's code, all of its dependencies, and a Python interpreter.

This is where the PyOxidizer project by Gregory Szorc comes in super handy. To quote the project description:

PyOxidizer is a utility for producing binaries that embed Python [...] PyOxidizer is capable of producing a single file executable - with a copy of Python and all its dependencies statically linked and all resources (like .pyc files) embedded in the executable. You can copy a single executable file to another machine and run a Python application contained within. It just works.

indygreg / PyOxidizer: A modern Python application packaging and distribution tool

But how does it work? Once you have installed PyOxidizer on your system, everything starts with a Starlark file:

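def make_exe():
    # Embed a CPython 3.9 distribution in the executable.
    dist = default_python_distribution(python_version="3.9")

    # Resources that can't be loaded from memory are stored as files under lib/.
    policy = dist.make_python_packaging_policy()
    policy.resources_location_fallback = "filesystem-relative:lib"

    # Search lib/ next to the binary, and run Poetry's console module as __main__.
    config = dist.make_python_interpreter_config()
    config.module_search_paths = ["$ORIGIN/lib"]
    config.run_module = "poetry.console.application"

    exe = dist.to_python_executable(
        name="poetry",
        packaging_policy=policy,
        config=config,
    )
    # pip-install the local ./poetry checkout and embed the resulting resources.
    exe.add_python_resources(exe.pip_install(["./poetry"]))

    # The "install" target below receives this value as its argument.
    return exe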

def make_install(exe):
    files = FileManifest()
    files.add_python_resource(".", exe)
    return files

register_target("exe", make_exe)
register_target("install", make_install, depends=["exe"], default=True)

resolve_targets()

With this content in pyoxidizer.bzl, next to the poetry source code folder, you can run pyoxidizer build --release, and a binary will be produced under the build folder.

Is that so? Well... actually, no.

Making code "oxidizable"

There are a few caveats in using PyOxidizer, mainly because its importer is not the standard one provided with Python, but a custom one provided by the PyOxidizer project. This makes perfect sense: normally, importing something in your Python code means looking for a .py file on the filesystem, but since we're producing a binary executable, the contents of your Python files won't actually reside on the filesystem.

Now, the main issue here is that the Python community has a long history of using the __file__ variable to access resources or to dynamically import packages: this variable won't be set in PyOxidizer, since there is no matching Python source file on disk.
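A toy in-memory importer shows why `__file__` goes missing. This is not PyOxidizer's importer (which is implemented in Rust); it's a minimal sketch built on the standard `importlib.abc` hooks, with a hypothetical `InMemoryFinder` class that serves module source from a dict. Because the module spec is created without an origin, the import system never sets `__file__`, and the classic `os.path.dirname(__file__)` pattern fails with an AttributeError.

```python
from __future__ import annotations

import importlib.abc
import importlib.util
import os


class InMemoryFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Toy finder/loader serving module source code from a dict."""

    def __init__(self, sources: dict[str, str]) -> None:
        self.sources = sources

    def find_spec(self, fullname, path=None, target=None):
        if fullname not in self.sources:
            return None  # let the next finder on sys.meta_path handle it
        # No origin: there is no file on disk backing this module,
        # so the import system will not set module.__file__.
        return importlib.util.spec_from_loader(fullname, self, origin=None)

    def create_module(self, spec):
        return None  # use the default module object

    def exec_module(self, module):
        source = self.sources[module.__name__]
        code = compile(source, f"<in-memory {module.__name__}>", "exec")
        exec(code, module.__dict__)


def data_dir_of(module) -> str:
    # The classic pattern: raises AttributeError for modules without __file__.
    return os.path.dirname(module.__file__)
```

Inserting an `InMemoryFinder` at the front of `sys.meta_path` makes its modules importable as usual; only code that assumes a file on disk notices the difference.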

Considering this, and how closely the oxidized importer follows Python's standard import machinery, you generally have a couple of options in PyOxidizer:

patch the code to replace every use of __file__ with something else, like the importlib.resources library
force PyOxidizer to store those contents as files and fall back to the standard Python importer
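For the first option, the typical change looks like the sketch below: a resource read that builds a path from `__file__` is replaced by a call into `importlib.resources`, which asks the package's loader for the data instead of the filesystem. The function names are illustrative, and whether a given importer supports `importlib.resources.files()` depends on it implementing Python's resource-reader protocol, so it's worth checking the PyOxidizer documentation for your version.

```python
from __future__ import annotations

import importlib.resources
import os


def read_resource_via_file(module_file: str, name: str) -> str:
    # Before: builds a filesystem path from the module's __file__.
    # Breaks when the module was loaded from memory (no __file__).
    path = os.path.join(os.path.dirname(module_file), name)
    with open(path, encoding="utf-8") as fh:
        return fh.read()


def read_resource_via_importlib(package: str, name: str) -> str:
    # After: asks the package's loader for the resource, so it works with
    # any importer implementing the resource-reader protocol.
    return importlib.resources.files(package).joinpath(name).read_text(encoding="utf-8")
```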

The difference between the two approaches is quite obvious: while the latter requires less effort, it also means your executable will still need some other files alongside it when you distribute it, which in turn means those files will be editable by anyone.

This is why I chose to patch Poetry's code and its dependencies.

Patching everything

If you look at the patches in my repository, a few observations stand out:

there's an importlib_metadata patch, even though the embedded Python version is 3.9 and includes the importlib.metadata module. This is because the importlib.metadata implementation in the PyOxidizer project does not implement the full Distribution interface, and some of Poetry's dependencies make calls to that interface
there's a patch for the requests package, because requests loads the SSL certificates from certifi using filesystem paths instead of a resource loader
the virtualenv patch is HUGE. It was needed because Poetry interacts a lot with the virtualenv package, which is designed to work heavily with the filesystem
the poetry-core patch is mainly required to accommodate external dependencies that expect some resource files to be present on the filesystem
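To give an idea of what "the full Distribution interface" covers: `importlib.metadata.Distribution` is an abstract class whose subclasses only have to provide `read_text()` and `locate_file()`, while properties such as `metadata`, `version`, `files` and `requires` are derived from them. The hypothetical `InMemoryDistribution` below serves metadata files from a dict. It is not how PyOxidizer or the patch is written; it only sketches the surface a dependency may rely on when it calls into a distribution object.

```python
from __future__ import annotations

from importlib.metadata import Distribution


class InMemoryDistribution(Distribution):
    """Hypothetical distribution whose metadata files live in a dict."""

    def __init__(self, files: dict[str, str]) -> None:
        self._files = files

    def read_text(self, filename):
        # The API expects None when a metadata file does not exist.
        return self._files.get(filename)

    def locate_file(self, path):
        # Nothing is installed on disk, so there is no real location.
        raise FileNotFoundError(path)


dist = InMemoryDistribution(
    {"METADATA": "Metadata-Version: 2.1\nName: demo\nVersion: 1.2.3\n"}
)
print(dist.metadata["Name"], dist.version)  # prints: demo 1.2.3
```

A dependency that goes further, for instance resolving `dist.files` entries to real paths through `locate_file()`, expects working answers from these methods, which is where an incomplete implementation starts to break.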

Looking at the poetry patch file instead, we can see that most of the code changes how Poetry treats Python environments. Specifically, Poetry's environment handling expects a standard "system" Python distribution, which is normally the one used to install Poetry in the standard way.

Now, since Poetry uses this environment as one of the possible bases for building projects' virtualenvs, I had to change the logic to always look for "external" Python distributions, as the "system" one will be the distribution embedded in the final executable, which, for obvious reasons, cannot be used to create new environments.
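The "look for external Python distributions" idea can be sketched as a scan of PATH for interpreters other than the running executable, which in an embedded build is the binary itself. The function name and the `python3` / `python3.X` naming convention are assumptions for illustration, and as written it only targets POSIX systems; this is not Poetry's or the patch's actual logic.

```python
from __future__ import annotations

import os
import re
from pathlib import Path

# Assumed naming convention for interpreters: "python3" or "python3.X".
PYTHON_NAME = re.compile(r"^python3(\.\d+)?$")


def find_external_pythons(search_path: str, embedded: str) -> list[Path]:
    """Return interpreters on search_path, skipping the embedded executable."""
    embedded_real = os.path.realpath(embedded)
    seen: set[str] = set()
    found: list[Path] = []
    for directory in search_path.split(os.pathsep):
        if not directory or not os.path.isdir(directory):
            continue
        for entry in sorted(os.listdir(directory)):
            if not PYTHON_NAME.match(entry):
                continue
            candidate = os.path.join(directory, entry)
            real = os.path.realpath(candidate)
            # Skip the binary itself and symlinks to an interpreter already found.
            if real == embedded_real or real in seen:
                continue
            if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
                seen.add(real)
                found.append(Path(candidate))
    return found
```

A call such as `find_external_pythons(os.environ.get("PATH", ""), sys.executable)` would list the candidates; a real implementation would also need to check each interpreter's version before creating a virtualenv with it.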

Putting everything together

In the end, the steps involved in producing the final binary are:

forking the code that needs patches
applying the patches
telling PyOxidizer to install the patched code instead of the standard packages available on PyPI
adding some magic to handle special resources that need to be on the filesystem no matter what
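For that last step, the standard library offers a building block: `importlib.resources.as_file()` yields a real filesystem path for a resource, extracting it to a temporary file when the resource only exists in memory. The sketch below uses a hypothetical `resource_path` helper (not necessarily the "magic" used in poetry-bin) and keeps those temporary files alive until the interpreter exits, which is what a consumer expecting a long-lived path, such as a CA bundle location, needs.

```python
from __future__ import annotations

import atexit
import contextlib
import importlib.resources

# Keeps extracted temporary files alive until the interpreter exits.
_resource_files = contextlib.ExitStack()
atexit.register(_resource_files.close)


def resource_path(package: str, name: str) -> str:
    """Return a real filesystem path for a resource inside `package`."""
    ref = importlib.resources.files(package).joinpath(name)
    # as_file() yields the existing path for on-disk resources, or writes
    # the data to a temporary file when the resource only lives in memory.
    return str(_resource_files.enter_context(importlib.resources.as_file(ref)))
```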

Then it's just a matter of writing the GitHub workflow that produces the binaries for all the different platforms.

gi0baro / poetry-bin: Poetry binary builds

And then adding some bonus points, like:

a Homebrew formula
some Docker images with Poetry included
a GitHub Action to set up the binary version of Poetry

Final considerations

Looking back at everything involved in the process, I just want to leave some final thoughts.

First of all, the Poetry project was definitely a game-changer in the Python dependency management scene, so kudos to Sébastien and all the team for their efforts!

Secondly, the PyOxidizer project is, at least in my opinion, THE way of producing binaries from Python code. Comparing it to all the other available solutions leaves me with no doubt: the design, the principles, the correctness and the documentation of PyOxidizer are simply on a different level compared to the competition. Kudos to Gregory!

Then, let's move to the real question: should you compile your Python code into binaries? And – even more importantly – is it worth the hassle?

Well, it depends, of course.

If you already have a Python CLI project, you can stop requiring a Python environment and distribute it with PyOxidizer. This also gives you the advantage of avoiding compatibility code, as the only Python version you need to support is the one you embed with your project.

If you're starting a new project, you're more comfortable with Python than with a compiled language like Rust, and you can write code that respects PyOxidizer's requirements from line zero, then sure, why not? This is what I did with noir.

But in cases where you have dozens of dependencies and/or code you can't control, like in my case, I just can't recommend it. Too many things out of your control can produce unwanted bugs, and today it's still quite hard to test how your code will behave once oxidized. And if you still want to do it, I suggest opening pull requests against the involved open source projects to eliminate at least the __file__ usage, so the next person trying this won't need to walk through the entire patching process again.

And remember: making binaries also means targeting specific architectures and platforms, so Python code that worked on every system with an interpreter may not compile for all the same architectures and platforms. Which is the whole point of interpreted versus compiled languages, isn't it?

In the end, I can only suggest that you precisely define your use case and your audience. Answering the question "should I compile this?" will be easy then.

Cheers,
/G
