DEV Community

Hakim

An introduction to packaging in Python

Learning how to package your code is very useful for any Python developer. It gives you a better understanding of how Python works and, above all, enables you to share your code with others or simply deploy it in a runtime environment.

So how does it work? Why not just share your code via git? Actually, it's more complex than it sounds. There's even a whole working group, the Python Packaging Authority (aka PyPA), which has been working on this subject since 2011, and it's a constantly evolving field.

In this article, we'll take a quick look at packaging in Python and present a simple method for packaging your code in 2023.

TL;DR

  • Use the pyproject.toml file following the setuptools guide.
  • You can simply use pip install build && python -m build to build your package.
  • Adopt src-layout.

Packages in Python

First, a few definitions:

Python module

A Python module is a text file with the .py extension containing Python code.

It's called a module because it "modularizes" the python source code into several .py files. We then use python's import functionality to import elements from another module.

For example, if you create a count_lines.py file containing a count_lines_file function, you can then import this function:

# count_lines.py
def count_lines_file(filepath: str) -> int:
    """Count the number of lines in a file"""
    return sum(1 for _ in open(filepath))
From a Python interpreter started in the same directory:

>>> from count_lines import count_lines_file
>>> count_lines_file("count_lines.py")
3

When you execute the instruction from count_lines import count_lines_file, python looks for the module in the current directory (where python was launched); then in the python installation directory /usr/lib/python3.11; then in the directory where python installs default packages: /usr/lib/python3.11/site-packages.

As soon as the module is found, it is executed in the python environment and the elements defined in it become available.
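Here is a minimal sketch of that behaviour. It assumes a throwaway module called demo_mod (a hypothetical name, not part of the article's project) written to a temporary directory. The directory is added to sys.path, the first import executes the file, and a second import simply reuses the module cached in sys.modules.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Write a throwaway module (hypothetical name: demo_mod) to a temporary directory.
tmp_dir = tempfile.mkdtemp()
Path(tmp_dir, "demo_mod.py").write_text(
    "EXECUTIONS = []\n"
    "EXECUTIONS.append('top-level code ran')\n"
)

# Python can only find the module once its directory is listed in sys.path.
sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()

import demo_mod           # first import: the file is located and executed
import demo_mod as again  # second import: served from the sys.modules cache

print(demo_mod.EXECUTIONS)        # names defined at top level are now available
print(again is demo_mod)          # both names point to the same module object
print("demo_mod" in sys.modules)  # the module is cached after the first import
```

The top-level code runs only once, which is why the list holds a single entry even though the module is imported twice.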

The list of directories in which Python searches for modules is provided by the sys module:

import sys
sys.path
["", "/usr/lib/python3.11/python311.zip", "/usr/lib/python3.11/python311", "/usr/lib/python3.11/site-packages"]

Python package

A package is a folder that groups together a set of python modules and facilitates access to them by creating a namespace: from numpy.linalg import norm.

To create a package, simply create an __init__.py file (which can be empty) in a folder.
The folder is then considered by python as a package.

Let's create a package for the count_lines.py module:

count_package
├── __init__.py
└── count_lines.py

We can now use "dotted" notation to import the count_lines module:

>>> from count_package.count_lines import count_lines_file
>>> count_lines_file("count_package/count_lines.py")
3

When executing the instruction from count_package.count_lines import count_lines_file, Python looks for a count_package folder containing an __init__.py file in the sys.path directories (the current directory and the default directories seen above).

If the package is found, the modules present in it can be accessed using the "dotted" notation.
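To make this concrete, here is a small sketch that recreates the count_package folder in a temporary directory and imports from it. The temporary location is an assumption made for the demo, and the function body uses a with block so the file is closed after counting.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Recreate the count_package layout in a temporary directory.
root = Path(tempfile.mkdtemp())
package_dir = root / "count_package"
package_dir.mkdir()
(package_dir / "__init__.py").write_text("")  # an empty file is enough
(package_dir / "count_lines.py").write_text(
    "def count_lines_file(filepath: str) -> int:\n"
    "    with open(filepath) as handle:\n"
    "        return sum(1 for _ in handle)\n"
)

# The parent directory of the package, not the package itself, goes on sys.path.
sys.path.insert(0, str(root))
importlib.invalidate_caches()

from count_package.count_lines import count_lines_file

# Count the lines of the module we just wrote (three lines of source).
line_count = count_lines_file(str(package_dir / "count_lines.py"))
print(line_count)
```

Note that it is the directory containing count_package that must be on sys.path. Adding count_package itself would only expose count_lines as a top-level module.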

Distribute your package

To distribute your package, create a distribution. This is an archive containing the package to be distributed, which can then be installed using the pip package manager.

There are two main distribution formats:

  • The Source Distribution (sdist) format: this is an archive containing all source code and metadata.
  • The Built Distribution format: this is a distribution format in which a number of things have been pre-compiled to facilitate installation on other environments. This is particularly useful for modules written in C / C++.

The wheel format is the reference Built Distribution format. It is the format developed by the Python Packaging Authority and is widely used to distribute packages.
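A wheel's file name already says a lot about where it can be installed. As far as I understand the wheel specification, the name follows the pattern {distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl. The helper below is a hypothetical, simplified parser written for illustration (real installers validate much more strictly), and the file name is the one our example package would plausibly produce.

```python
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    distribution: str
    version: str
    build_tag: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_name(filename: str) -> WheelName:
    """Split a wheel file name into its dash-separated components (simplified)."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        distribution, version, python_tag, abi_tag, platform_tag = parts
        build_tag = None
    elif len(parts) == 6:
        distribution, version, build_tag, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError(f"unexpected number of components in {filename!r}")
    return WheelName(distribution, version, build_tag, python_tag, abi_tag, platform_tag)


parsed = parse_wheel_name("count_package-0.0.1-py3-none-any.whl")
print(parsed)
```

Here py3-none-any means "any Python 3, no specific ABI, any platform", which is what you get for a pure-Python package. A package with compiled C / C++ code would instead carry tags tied to one interpreter and one platform.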

There are many tools available for creating distributions, but here we'll focus mainly on the tools created by the Python Packaging Authority, which have become indispensable: setuptools, build and twine.

setuptools

setuptools is the tool used by the vast majority of projects to build their distributions.

Let's take our count_package example from earlier and see how to create a distribution with setuptools.

Our project tree might look something like this:

projet_genial
├── count_package
│   ├── __init__.py
│   └── count_lines.py
├── tests
│   └── test_count_lines.py
├── .gitignore
├── LICENSE.md
└── README.md

setuptools needs a configuration file to know what to include in the distribution and all the project metadata.

Unfortunately, three configuration file standards are currently in use (and you can use them all at the same time...): setup.py, setup.cfg and pyproject.toml.

pyproject.toml is the official standard, but is still in the minority compared to setup.py, so we'll be looking at all three file formats.

setup.py

The setup.py file, as its extension suggests, is a python file. It has the following form:

# setup.py
from setuptools import setup

setup(
    name='count_package',
    author='me',
    description='Package for counting the number of lines in files.',
    version='0.0.1',
    python_requires='>=3.7, <4',
    install_requires=[
        'pandas',
        'importlib-metadata; python_version >= "3.8"',
    ],
)

The very fact that it's a python file is both its strength and its weakness: it's possible to build the configuration dynamically in the code, but this makes it difficult to parse and interface with other external tools.

In addition, since this format is specific to setuptools, distributions of the sdist type can only be installed if setuptools has been installed on the target environment and in a compatible version.

Sadly, because the vast majority of projects adopted setuptools and setup.py, it became difficult to propose alternatives: projects like flit had to be built "on top" of setuptools. In that sense, setup.py does not encourage innovation.

Moreover, its use is often problematic.
To take the example from this article, you might be tempted to introduce an if/else condition in your setup.py to manage a dependency needed in Python 2.7 based on sys.version, but in doing so you would be introducing a vicious bug: the dependency will be included or not depending on the environment that builds the distribution, and not depending on the environment that installs it.
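The difference can be sketched as follows. The 3.8 threshold and the importlib-metadata dependency are assumptions chosen for illustration (the article's example targets Python 2.7). The first list is computed by whoever builds the distribution, while the second leaves the decision to the installer through an environment marker.

```python
import sys
from typing import List


def requirements_decided_at_build_time() -> List[str]:
    """Anti-pattern: the condition is evaluated on the machine that builds the distribution."""
    requirements = ["pandas"]
    if sys.version_info < (3, 8):
        requirements.append("importlib-metadata")
    return requirements


# Declarative alternative: the condition travels with the metadata as an
# environment marker and is evaluated on the machine that installs the package.
REQUIREMENTS_WITH_MARKER = [
    "pandas",
    'importlib-metadata; python_version < "3.8"',
]

print(requirements_decided_at_build_time())
print(REQUIREMENTS_WITH_MARKER)
```

With the first approach, a wheel built on a recent Python silently drops the extra dependency for everyone. With the marker, each installer decides for its own interpreter.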

It's also tempting to import your own package from setup.py to manage the version. But by doing so, sdist distributions will crash at installation because the package you are trying to import is not yet present in the Python environment.

In short: please do not use setup.py anymore.

And if you do, use it in a declarative way.

Using the file as a script (python setup.py ...) is deprecated, as the setuptools documentation clearly explains:

It is important to remember, however, that running this file as a script (e.g. python setup.py sdist) is strongly
discouraged, and that the majority of the command line interfaces are (or will be) deprecated (e.g. python setup.py
install, python setup.py bdist_wininst, ...).

We also recommend users to expose as much as possible configuration in a more declarative way via the pyproject.toml
or setup.cfg, and keep the setup.py minimal with only the dynamic parts (or even omit it completely if applicable).

See Why you shouldn't invoke setup.py directly
for more background.

setup.cfg

To address the issues mentioned above and make configuration more declarative, in 2016 pypa created the setup.cfg file format.

The example setup.py file above is equivalent to the following setup.cfg file:

# setup.cfg
[metadata]
name = count_package
version = 0.0.1
author = me
description = Package for counting the number of lines in files.

[options]
python_requires = >=3.7,<4
install_requires =
    pandas
    importlib-metadata; python_version >= "3.8"

This format has had many fans but has recently been superseded by the pyproject.toml format, which is now the official way of declaring python package configuration.

pyproject.toml

In addition to adopting the declarative approach of setup.cfg, the pyproject.toml format introduces a number of new features.

It is now possible (and strongly recommended) to specify the package builder.
It is also a means of centralizing the configuration of numerous development tools in an agnostic way, rather than multiplying configuration files such as tox.ini, .coveragerc, etc.

The pyproject.toml format includes a [build-system] section that defines the builder used to build the package (if it is missing, tools such as pip fall back to the legacy setuptools behaviour):

# pyproject.toml
[build-system]
requires = [
  "setuptools>=60",
  "wheel>=0.30.0",
  "cython>=0.29.4",
]
build-backend = "setuptools.build_meta"

With pyproject.toml it is now possible to declare to pip the dependencies needed for the build!

It is then perfectly possible to specify the use of a builder other than setuptools, such as flit :

# pyproject.toml
[build-system]
requires = ["flit_core >=3.2,<4"]
build-backend = "flit_core.buildapi"

This format is gradually becoming the preferred way of centralizing package configuration.
It is the format preferred by setuptools and a number of third-party tools use it to store their configuration: black, pytest, isort, etc.

Here's a sample pyproject.toml file for our count_package example package:

# pyproject.toml
[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"

[project]
name = "count_package"
version = "0.0.1"
description = "Package for counting the number of lines in files."
authors = [
    {name = "me", email = "email@me.fr"},
]
requires-python = ">=3.8,<4"
dependencies = [
    "pandas",
    'importlib-metadata; python_version >= "3.8"',
]

The setuptools documentation gives more details on configuration in pyproject.toml format.

Build a distribution of your package

Once you've written your configuration file (preferably pyproject.toml), all that's left to do is build your package.

The modern way to do this is to use the build package developed by PyPA:

pip install --upgrade build
python -m build

build will first install the builder specified in your pyproject.toml file, then use it to build an sdist distribution and a wheel.

You can then use the twine package to publish it on the official PyPI repository.

For backward compatibility with older versions of packaging libraries, you can create a minimal setup.py file in addition to the pyproject.toml file:

# setup.py
from setuptools import setup
setup()

Version management

In the pyproject.toml example seen above, the package version is set manually. You therefore need to change it each time you want to publish a new version of your package.

I find the tool setuptools-scm very useful for managing package versions using git or mercurial.

This is done very simply by adding setuptools-scm to the build dependencies in pyproject.toml and declaring the version as dynamic:

# pyproject.toml
[build-system]
requires = ["setuptools>=45", "setuptools_scm[toml]>=6.2"]

[project]
# version = "0.0.1"  # Remove any existing version parameter.
dynamic = ["version"]

[tool.setuptools_scm]
write_to = "src/pkg/_version.py"

When building a package, setuptools-scm will search for the last tag with a valid version number and then deduce the package version number. By default, the version is built from three elements:

  1. The last tag with a valid version number (example: v1.2.3)
  2. The distance to this tag (number of revisions since this tag)
  3. Working directory status (if there are any uncommitted changes)
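To give an intuition of how these three elements combine, here is a deliberately simplified sketch. It is not setuptools-scm's implementation: sketch_version is a hypothetical helper, it assumes tags of the form vX.Y.Z, and the real tool's output contains more detail (such as a commit identifier) and depends on the configured version scheme.

```python
def sketch_version(last_tag: str, distance: int, dirty: bool) -> str:
    """Combine a tag, a commit distance and a dirty flag into a version string (toy model)."""
    release = last_tag[1:] if last_tag.startswith("v") else last_tag
    if distance == 0 and not dirty:
        # Building exactly on a clean tag: the version is the tag itself.
        return release
    # Otherwise, aim at the next release and mark the build as a development version.
    numbers = [int(part) for part in release.split(".")]
    numbers[-1] += 1
    version = ".".join(str(number) for number in numbers) + f".dev{distance}"
    if dirty:
        # Uncommitted changes are flagged in the local part of the version.
        version += "+dirty"
    return version


print(sketch_version("v1.2.3", 0, False))
print(sketch_version("v1.2.3", 4, False))
print(sketch_version("v1.2.3", 4, True))
```

The point to remember is that only a build made on a clean, tagged commit gets a "plain" release number. Anything else is marked as a development version.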

Once the version number has been deduced, a _version.py file will be created inside the distribution at the specified location (e.g. src/pkg/_version.py), allowing the package version to be known from the distribution without the git history being present on the target environment.

If you are in the habit of entering the version number of your packages yourself, please note that valid version formats are governed by PEP 440.
If you don't comply with these specifications, you're likely to run into problems when publishing or installing your packages.
In particular, a version such as v1.2.3-local is invalid, and v1.2.3-dev is not in canonical form (tools that accept it normalize it to 1.2.3.dev0).
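If you want to check a version string before publishing, a regular expression is enough for the common cases. The sketch below is based on the pattern that, to my knowledge, PEP 440 suggests in its appendix for canonical public versions. It is stricter than what tools accept, since some non-canonical spellings are normalized rather than rejected, and it does not cover local version labels (the +something suffix).

```python
import re

# Pattern for canonical public versions, adapted from the PEP 440 appendix.
CANONICAL_PUBLIC_VERSION = re.compile(
    r"^([1-9][0-9]*!)?(0|[1-9][0-9]*)(\.(0|[1-9][0-9]*))*"
    r"((a|b|rc)(0|[1-9][0-9]*))?(\.post(0|[1-9][0-9]*))?(\.dev(0|[1-9][0-9]*))?$"
)


def is_canonical(version: str) -> bool:
    """Return True if the string is a canonical public version identifier."""
    return CANONICAL_PUBLIC_VERSION.match(version) is not None


for candidate in ["1.2.3", "1.2.3rc1", "1.2.3.dev0", "v1.2.3-local", "v1.2.3-dev"]:
    print(candidate, is_canonical(candidate))
```

Sticking to canonical spellings such as 1.2.3, 1.2.3rc1 or 1.2.3.dev0 avoids surprises when publishing.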

The layout

When configuring your package, regardless of the method used (setup.py, setup.cfg or pyproject.toml), you must specify the packages and subpackages you wish to include in your distribution:

# pyproject.toml
[tool.setuptools]
packages = ["mypkg", "mypkg.subpkg1", "mypkg.subpkg2"]
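To see which dotted names such a list has to contain, you can walk a directory and collect every folder that holds an __init__.py. The sketch below is a toy illustration using only the standard library, not the algorithm setuptools uses. discover_packages is a hypothetical helper, and the mypkg tree is created in a temporary directory just for the demo.

```python
import tempfile
from pathlib import Path
from typing import List


def discover_packages(root: Path) -> List[str]:
    """Return the dotted names of all folders under root that contain an __init__.py."""
    names = []
    for init_file in root.rglob("__init__.py"):
        relative = init_file.parent.relative_to(root)
        if relative.parts:  # skip root itself if it happens to hold an __init__.py
            names.append(".".join(relative.parts))
    return sorted(names)


# Build a small demo tree: mypkg with two subpackages.
demo_root = Path(tempfile.mkdtemp())
for folder in ["mypkg", "mypkg/subpkg1", "mypkg/subpkg2"]:
    (demo_root / folder).mkdir(parents=True)
    (demo_root / folder / "__init__.py").write_text("")

found = discover_packages(demo_root)
print(found)
```

Unlike a real discovery tool, this sketch does not check that parent folders are packages themselves, nor does it exclude folders such as tests.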

Fortunately, setuptools has an automatic discovery feature for your packages and subpackages. This is compatible with two classic project layouts:

flat-layout:

count_package
├── count_package
│   ├── __init__.py
│   └── count_lines.py
├── tests
│   └── test_count_lines.py
├── .gitignore
├── LICENSE.md
└── README.md

and src-layout, where the package lives in a src folder:

projet_genial
├── src
│   └── count_package
│       ├── __init__.py
│       └── count_lines.py
├── tests
│   └── test_count_lines.py
├── .gitignore
├── LICENSE.md
└── README.md

The difference may seem minimal, but personally I have a strong preference for src-layout because it prevents bad habits and forces you to understand how the package import and installation system works in Python.

In fact, when you develop your package, you test the functionalities you add to it as you go along.

To do this, we are very tempted to simply import our package from our test file:

# test_file.py
from count_package import count_lines

This will work if you use a flat-layout and run your test module from the directory containing the count_package folder, because as we saw above python includes the current directory in the list of directories where it searches for modules.

However, this is a bad habit for two reasons:

  • Firstly, if you use a setup.py located at the root of your project, it can import the very count_package package it is supposed to install, which can cause bugs when the sdist is installed on another machine if you're not careful.
  • Secondly, you're not really testing the package as it will be installed on other machines! For example, you may not have thought to include data files in your configuration file, and your tests should crash as a result. But since these files are present in your working directory, you won't notice a thing.

For these reasons, I think it's best to opt for a src-layout and use an editable installation for local development, with the command pip install -e . This installs the package so that it points back to your working copy (much like a symbolic link), so any changes you make are immediately reflected in the installed package.
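When in doubt about which copy of a package your tests are really using, you can ask Python where it would import it from. A small sketch using the standard library (module_origin is a hypothetical helper name, and json is used here only because it is always available):

```python
import importlib.util
from pathlib import Path
from typing import Optional


def module_origin(name: str) -> Optional[Path]:
    """Return the file a top-level module or package would be imported from, if any."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None  # not importable, or no single file (e.g. a namespace package)
    return Path(spec.origin)


print(module_origin("json"))                      # a path inside the standard library
print(module_origin("a_module_that_is_missing"))  # None: nothing to import
```

In a src-layout project with an editable install, I would expect module_origin("count_package") to point inside the src folder. With a regular install it should point into site-packages, and if the package is not installed at all it returns None instead of silently picking up the working directory.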

For an in-depth analysis of the benefits of src-layout, I refer you to this article (which dates back to 2014).

Going further

If you want to learn more about the packaging ecosystem, I found the resources listed in this article very useful: 🐍 Best resources on Python packaging 📖

In the meantime, here is a non-exhaustive list of alternatives to setuptools you should consider:

  • Pipenv: allows you to jointly manage your project's virtual environment and its dependencies. It adds a valuable feature: the generation of Pipfile.lock files, which reference exact versions of dependencies so that the development environment can be reproduced identically.
  • Poetry: a powerful tool to manage virtual environments and dependencies (and dependencies of dependencies), generate a poetry.lock file similar to Pipfile.lock, publish your package, etc. However, it does not comply with certain PEP standards.
  • PDM: next-generation package manager for Python. Unlike Poetry, it respects PEP standards.
  • Hatch: PyPA's new tool for managing Python projects. It has many interesting features.
  • uv: a pip replacement written in Rust, advertised as 10 to 100 times faster.

Bonus

Let's be happy

Everything I've told you here may seem like a lot, and yet I've only skimmed the surface. In any case, I think we can count ourselves lucky when we see what the Sam & Max site wrote in 2018:

First we have distutils, setuptools, distribute, and distribute2, which were all at one time the "recommended standards" for packaging a lib.
Then came the days of eggs, exe, and other stuff that easy_install would go and find anywhere in the wild, blindly following links on PyPi.
Not to mention the stuff that had to be compiled at every turn.
Besides, nothing was encrypted when downloaded, and pip wasn't packaged with Python.
Python, it was dying on stupid errors like badly managed encoding...
On top of that, virtualenv was a separate thing, with lots of competitors, and linked system packages by default.
Not to mention that we didn't have python -m.

In short, Python packaging was a real mess. Not to mention a shitty documentation.

It's a lot simpler these days.

References

  1. pypa's guide to python packaging: An Overview of Packaging for Python
  2. An article on wheels: What Are Python Wheels and Why Should You Care?
  3. A series of three very enlightening articles on how Python packaging works, written by Bernát Gábor in 2019 :
  4. An article in praise of setup.cfg on the late Sam & Max site: about setup.cfg
  5. Stackoverflow question: What is pyproject.toml file for
