Juan Luis Cano Rodríguez

Posted on Jun 30

A whirlwind tour of Python packaging

#python #packaging #opensource

What is even packaging for in the first place?

The goal of packaging is publishing a distribution package, "a piece of software that you can install". In other words: something that can be consumed by developers as a dependency of other projects, which themselves might or might not be packaged.

Many Python projects are not meant to be packaged. Web applications are a typical example. For this reason, modern workflow tools make a distinction between applications and libraries. All the following discussion assumes Python projects that are prepared to be packaged.

Terminology, standards, metadata

The starting point is the source tree, "containing its raw source code before being packaged". There are two distribution formats: source distributions (sdists) and built distributions, of which the wheel is the standard format.

sdists are specified in:

PEP 517 – A build-system independent format for source trees (accepted 2017)
PEP 643 – Metadata for Package Source Distributions (accepted 2020)
PEP 625 – Filename of a Source Distribution (accepted 2022)

wheels are specified in:

PEP 427 – The Wheel Binary Package Format 1.0 (accepted 2013)

The key element that enables consumption of the distribution package by downstream tools is metadata. The core metadata specifications have evolved over the years to standardize a common set of fields.

Core metadata has been specified in a range of PEPs, including:

1.0 in PEP 241 – Metadata for Python Software Packages (accepted 2001)
1.2 in PEP 345 – Metadata for Python Software Packages 1.2
2.1 in PEP 566 – Metadata for Python Software Packages 2.1 (accepted 2018)
2.5 in PEP 794 – Import Name Metadata (accepted 2025)

These fields are stored in different ways:

In pyproject.toml files in source trees as TOML keys, as specified by PEP 621
In PKG-INFO files in source distributions as RFC 822 headers
In *.dist-info/METADATA files in built distributions as RFC 822 headers
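As a rough illustration of the header-based format, the sketch below parses a METADATA-style document with the standard library's email parser. The sample content, the project name example-pkg, and the helper read_core_metadata are all made up for this example; real files usually carry many more fields.

```python
from email import message_from_string

# Hypothetical METADATA content for an imaginary project (assumption).
SAMPLE_METADATA = """\
Metadata-Version: 2.1
Name: example-pkg
Version: 0.1.0
Summary: An illustrative package
Requires-Python: >=3.9
Requires-Dist: requests>=2.0
Requires-Dist: tomli; python_version < "3.11"

Long description goes in the message body.
"""


def read_core_metadata(text):
    """Parse header-style core metadata into a small dictionary."""
    message = message_from_string(text)
    return {
        "name": message["Name"],
        "version": message["Version"],
        # Multiple-use fields appear as repeated headers.
        "requires_dist": message.get_all("Requires-Dist") or [],
        # Since metadata 2.1 the description may live in the body.
        "description": message.get_payload().strip(),
    }


metadata = read_core_metadata(SAMPLE_METADATA)
print(metadata["name"], metadata["version"])
print(metadata["requires_dist"])
```

For installed distributions, the standard library's importlib.metadata module exposes the same information without manual parsing.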

By virtue of PEP 621, pyproject.toml might already have a complete representation of the necessary metadata. But in many situations this isn't the case, for example:

The version might come from the version control system or from one of the source tree files (__version__ attribute, an outdated pattern)
The dependencies might depend on the Python version and the operating system
Certain workflow tools might use arbitrary tool-specific configuration (written in the [tool] table of pyproject.toml) to fill the metadata

For this reason, both PEP 621 and PEP 643 allow for metadata fields to be set to dynamic: "intentionally unspecified so another tool can/will provide such metadata". Built distributions, on the other hand, do not allow for dynamic metadata fields: all of them have to be final.

Producing source and built distributions

The process to produce both source distributions and built distributions is standardized in PEP 517, which defines the build backend interface. Any build backend must implement two mandatory hooks:

build_sdist(sdist_directory, config_settings=None) -> str: "Must build a .tar.gz source distribution and place it in the specified sdist_directory. It must return the basename (not the full path) of the .tar.gz file it creates, as a unicode string."
build_wheel(wheel_directory, config_settings=None, metadata_directory=None) -> str: "Must build a .whl file, and place it in the specified wheel_directory. It must return the basename (not the full path) of the .whl file it creates, as a unicode string."

The PEP also proposes some optional hooks (get_requires_for_build_sdist, get_requires_for_build_wheel, and prepare_metadata_for_build_wheel) and defines the shape of config_settings.
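To give a feel for the two mandatory hooks, here is a deliberately tiny toy backend. It is a sketch under several assumptions: the project name example_pkg, its single module, and the metadata are hard-coded and hypothetical, and real backends handle far more (file discovery, config_settings, complete metadata, reproducibility).

```python
import base64
import hashlib
import io
import os
import tarfile
import tempfile
import zipfile

# Everything below describes an imaginary project (assumption).
NAME, VERSION = "example_pkg", "0.1.0"
METADATA = f"Metadata-Version: 2.1\nName: {NAME}\nVersion: {VERSION}\n"
MODULE_SOURCE = 'def hello():\n    return "hello"\n'
PYPROJECT = '[project]\nname = "example_pkg"\nversion = "0.1.0"\n'


def build_sdist(sdist_directory, config_settings=None):
    """Write a .tar.gz sdist and return its basename."""
    basename = f"{NAME}-{VERSION}.tar.gz"
    root = f"{NAME}-{VERSION}"
    files = {
        "PKG-INFO": METADATA,
        "pyproject.toml": PYPROJECT,
        f"{NAME}.py": MODULE_SOURCE,
    }
    with tarfile.open(os.path.join(sdist_directory, basename), "w:gz") as tar:
        for relative_path, text in files.items():
            data = text.encode("utf-8")
            info = tarfile.TarInfo(f"{root}/{relative_path}")
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return basename


def _record_line(path, data):
    """Build one RECORD entry: path, urlsafe base64 sha256 digest, size."""
    digest = base64.urlsafe_b64encode(hashlib.sha256(data).digest()).rstrip(b"=")
    return f"{path},sha256={digest.decode()},{len(data)}"


def build_wheel(wheel_directory, config_settings=None, metadata_directory=None):
    """Write a pure-Python .whl file and return its basename."""
    basename = f"{NAME}-{VERSION}-py3-none-any.whl"
    dist_info = f"{NAME}-{VERSION}.dist-info"
    wheel_file = (
        "Wheel-Version: 1.0\nGenerator: toy-backend\n"
        "Root-Is-Purelib: true\nTag: py3-none-any\n"
    )
    files = {
        f"{NAME}.py": MODULE_SOURCE,
        f"{dist_info}/METADATA": METADATA,
        f"{dist_info}/WHEEL": wheel_file,
    }
    record = []
    with zipfile.ZipFile(os.path.join(wheel_directory, basename), "w") as archive:
        for path, text in files.items():
            data = text.encode("utf-8")
            archive.writestr(path, data)
            record.append(_record_line(path, data))
        # RECORD lists itself without a hash or size.
        record.append(f"{dist_info}/RECORD,,")
        archive.writestr(f"{dist_info}/RECORD", "\n".join(record) + "\n")
    return basename


with tempfile.TemporaryDirectory() as output_directory:
    print(build_sdist(output_directory))
    print(build_wheel(output_directory))
```

Both hooks return only the basename of the artifact they wrote, as the quoted specification requires; the frontend already knows the directory it passed in.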

Projects must declare their build backend in the [build-system] table of pyproject.toml, by virtue of PEP 518 – Specifying Minimum Build System Requirements for Python Projects and the aforementioned PEP 517. This information is then picked up by build frontends, which are the tools responsible for driving the build process by invoking the backend.

For backwards compatibility, in the absence of a pyproject.toml file, build frontends must assume that the backend is setuptools and the hooks are the ones from setuptools.build_meta:__legacy__.

While different build backends might have special features not available in others, they all abide by this common interface, which was introduced on purpose to help move the ecosystem forward. Build frontends are supposed to be relatively minimal, with pypa/build being one example of a "simple, correct" one.

Compiled code (native dependencies)

Certain build backends are specialized in handling non-Python compiled code, also known as native dependencies. Examples include setuptools itself (capable of building C and C++ extension modules), scikit-build-core (CMake), meson-python (Meson), maturin (Rust), and more.

At the same time, compiled code, by its own nature, is not cross-platform. Packagers may need to produce different built distributions for different combinations of Python version and operating system, and installers need to know which built distribution to download. To this end, a series of platform compatibility tags are included in wheel filenames. The first one of such tags was manylinux1, defined in PEP 513 – A Platform Tag for Portable Linux Built Distributions, and many more have been defined over the years.
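The way tags are embedded in wheel filenames can be illustrated with a small parser. This is a simplified sketch of the naming convention from PEP 427; the WheelName type and parse_wheel_filename helper are hypothetical, the filename is only an illustrative example, and real tools would typically rely on a library such as packaging instead.

```python
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tags: list
    abi_tags: list
    platform_tags: list


def parse_wheel_filename(filename):
    """Split name-version(-build)?-python-abi-platform.whl into its parts."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        distribution, version, python, abi, platform = parts
        build = None
    elif len(parts) == 6:
        distribution, version, build, python, abi, platform = parts
    else:
        raise ValueError(f"unexpected number of components: {filename}")
    # Each tag component may hold several dot-separated (compressed) tags.
    return WheelName(
        distribution, version, build,
        python.split("."), abi.split("."), platform.split("."),
    )


example = parse_wheel_filename(
    "numpy-2.0.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
)
print(example.python_tags, example.abi_tags)
print(example.platform_tags)
```

An installer compares these tags against the ones supported by the running interpreter and platform to decide which wheel, if any, it can use.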

Workflow tools

Workflow tools are the user-facing layer of Python packaging. A special one is pip, which is bundled with Python through the ensurepip module of the standard library and is widely regarded as "official". Many other tools have been developed over the years with the goal of simplifying the management of complex projects, improving the user experience, and experimenting with new features. Examples include Pipenv, pip-tools, Poetry, PDM, Hatch, and uv.

Most of these tools predate the standards mentioned above, and adopted them retroactively over the years with varying degrees of success. A full historical perspective is out of scope for this text. At the time of writing these lines, the two most downloaded ones are pip and uv.

Crucially though, because the build process is standardized and build backends are declared in pyproject.toml, any standards-compliant build frontend should be able to build a source distribution and a built distribution of any project, regardless of what workflow tool was used to develop it. In other words: workflow tools are consumed by humans but are decoupled from the distribution build process.

Reproducible environments (or "what about `requirements.txt`?")

One of the staples of Python packaging that hasn't been discussed yet is the requirements.txt file. These files are an interesting case because they are among the oldest and most prevalent mechanisms for statically declaring dependencies and locking environments. So much so that setuptools added experimental support for dynamically loading dependencies from them.

However, these requirements.txt files were never meant to be a packaging standard. In fact, they are pip-specific, they accept pip-specific flags in them, and as such some hacks are required to reliably use them as a source of dependency metadata. The confusion was widespread as early as 2013, and although things have improved since (as discussed above), many developers still rely on them.

Thanks to the pip freeze command, and more so after the popularization of pip-tools, these files have been used to define reproducible environments, or locked dependencies. However, that didn't change the fact that requirements.txt files were never standardized. As such, different workflow tools introduced their own "lockfiles": Pipfile.lock by Pipenv, poetry.lock by Poetry, uv.lock by uv, and so forth.

After several failed attempts, PEP 751 – A file format to record Python dependencies for installation reproducibility finally introduced pylock.toml as a standard format "to enable reproducible installation in a Python environment". Most workflow tools have slowly added export support to it, and it is starting to become more widespread. However, some design decisions made it impossible for workflow tools to completely replace their custom lockfiles, and as such it is unclear when or if these will go away.

Regardless, packagers are still discouraged from using requirements.txt files to record dependencies for their projects, which should be written in the dependencies key of the [project] table and in the [dependency-groups] table of pyproject.toml.

The future of Python packaging

Some unaddressed challenges remain, and there are still areas where there is room for improvement. Hot topics discussed in the last Packaging Summit were native dependencies and security.

Consolidating native dependency handling

Challenges around native dependencies are well understood, but the community is in the process of figuring out a path forward that is well-designed, sustainable, and helps traditionally underserved users.

Historically, many subcommunities were dealing with native dependencies long before the standards above matured. This has been the case for the scientific computing community (created in the mid-90s, with NumPy and SciPy being its cornerstone projects), later on the PyData community (mid-00s, with pandas, matplotlib, and PyArrow), and eventually the deep learning community (mid-10s, with TensorFlow and PyTorch).

Because of the limitations of Python packaging tooling and standards, Conda was created in the early 2010s to address some of these shortcomings. Conda is a language-agnostic packaging system, and as such has much better support for non-Python dependencies. It became extremely successful, and it continues to live as a parallel ecosystem to this day.

In the meantime, the PyPI ecosystem caught up and figured out native ways to ship native dependencies as part of built distributions, as explained above. However, this mechanism is suboptimal, and impacts the experience of all the stakeholders:

Packagers need complex mechanisms to compile wheels for different platforms.
Distributors host heavy artifacts, sometimes close to 1 gigabyte in size.
Users might find themselves with broken environments.

What's worse: because these two ecosystems complement each other and neither one covers every use case on its own, users often need to reconcile or combine them, and this is hard.

As such, several initiatives are under discussion to try to improve the Python packaging experience when native dependencies are involved. A subset of related PEPs follows:

External dependencies: PEP 725 – Specifying external dependencies in pyproject.toml and PEP 804 – An external dependency registry and name mapping mechanism
PEP 771 – Default Extras for Python Software Packages
And more ongoing proposals by the WheelNext initiative

In the meantime, workflow tools like Pixi, widely used in robotics, try to bridge the gap between the Conda and PyPI ecosystems.

Security in packaging

A full deep dive on security aspects is out of scope for this text. Still, it is worth noting that there are efforts to harden the Python ecosystem in the face of mounting so-called supply-chain attacks:

PEP 770 – Improving measurability of Python packages with Software Bill-of-Materials was accepted in April 2025
PEP 777 – How to Re-invent the Wheel and the inclusion of Zstandard in the Python 3.14 standard library via PEP 784 laid the ground for constraining packaging artifacts ("No hard links, no device files, no resource forks or NTFS streams, no xattrs")
Work is also ongoing to better propagate "CVE coverage across repackagings of the same project"
After PyPI introduced OIDC-based trusted publishing, PEP 807 seeks to standardize it

Conclusions

It took many years of heated discussions and experimentation, but in 2026 we can safely say that Python packaging is great, actually. Despite the apparent proliferation of packaging-related tools, most aspects have been standardized, and modern workflow tools have made creating, publishing and consuming Python dependencies straightforward.

Lots of historical details have been left out. Some of them are so drama-rich they could be part of a Netflix series. Maybe I will write about those some day, because they do help understand how we got here and what is the way forward.

Still, I hope this brief overview is useful for folks having to deal with this beautiful ecosystem.

If you want a more visual overview, stay tuned for the recording of Luca Mancusi’s talk at PyCon Italia 2026. For now, here is a beautiful diagram he kindly shared with me:

Happy packaging!

DEV Community