How can I produce a dependency graph for Python packages? Why does PyPI not state the dependencies of Python packages? Let's have a look at these questions and at a solution for Python developers.
PyPI, the Python Package Index, is the main source of open-source Python packages. It provides a way to publish, browse, and obtain open-source Python packages. However, it does not list information about dependencies to users.
Why PyPI Doesn't Know Your Projects Dependencies
Here I would like to refer to an article by Dustin Ingram, one of the PyPI maintainers. The referenced article nicely explains this problem and shows why it is not possible to list all the dependencies for a Python package.
Long story short, the article explains that packaging a Python project can involve executing a Python script that computes dependencies at installation time in the target environment. Because the script can run arbitrary code, the listing of dependencies is created dynamically. This can be seen as powerful, since users can express their needs in code, but it is not necessarily handy. The approach causes headaches to maintainers and developers because dependencies are not statically declared and cannot always be known deterministically in advance.
This issue is slowly getting fixed with static wheel metadata, but source distributions can still suffer from this issue.
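To make the problem concrete, here is a minimal sketch of how an installation script might decide on dependencies from the environment it runs in. The function name compute_install_requires and the package choices are illustrative assumptions, not taken from Dustin's article:

```python
import sys


def compute_install_requires(platform=None, python_version=None):
    """Return a dependency list that depends on the installing environment.

    Hypothetical helper: the package names are only illustrative.
    """
    platform = sys.platform if platform is None else platform
    if python_version is None:
        python_version = sys.version_info[:2]
    else:
        python_version = tuple(python_version)

    requirements = ["requests"]  # needed in every environment
    if platform.startswith("win"):
        requirements.append("pywin32")  # only pulled in on Windows
    if python_version < (3, 7):
        requirements.append("dataclasses")  # backport for old interpreters
    return requirements


# A setup.py would hand the result to setuptools, for example:
# setup(..., install_requires=compute_install_requires())
print(compute_install_requires("linux", (3, 9)))   # ['requests']
print(compute_install_requires("win32", (3, 6)))   # ['requests', 'pywin32', 'dataclasses']
```

The same package yields different dependency lists in different environments, so an index that only stores the uploaded files cannot state one definitive list without running the code.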
Project Thoth and Python Package Dependencies
Project Thoth offers a publicly available cloud Python resolver as an alternative to pip, Pipenv, or Poetry. Naturally, a resolver needs to know the dependency graph to resolve application dependencies. Thoth's trick for obtaining the dependency graph lies in pre-computing dependency information by installing packages into containerized environments.
Imagine a containerized environment such as Fedora 34. It ships a Python interpreter in version 3.9 together with other software packages in specific versions, so the container image provides a prepared environment for installing Python packages. And that is what Thoth's background data aggregation logic does: it installs each Python package into the containerized environment and checks what dependencies the given package has in the given container image.
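The idea of "install, then inspect" can be sketched with the standard library alone. This is not how thoth-solver is implemented; it is a simplified illustration that assumes the package was already installed into the current environment, and the helper names split_requirement and installed_dependencies are made up for this example:

```python
from importlib import metadata


def split_requirement(requirement):
    """Split 'name spec; marker' into (requirement, marker or None).

    Simplified: it splits on the first ';' and does not validate the syntax.
    """
    spec, _, marker = requirement.partition(";")
    return spec.strip(), (marker.strip() or None)


def installed_dependencies(package_name):
    """List (requirement, marker) pairs declared by an installed package.

    Returns None when the package is not installed in this environment.
    """
    try:
        declared = metadata.requires(package_name)
    except metadata.PackageNotFoundError:
        return None
    # metadata.requires() returns None for packages without dependencies.
    return [split_requirement(item) for item in declared or []]


if __name__ == "__main__":
    # The output depends on what is installed in the running environment.
    print(installed_dependencies("pandas"))
```

Running such a check inside each container image is what makes the result specific to that environment, because the metadata is read only after the installation has happened there.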
Of course, there can be nuances when a package does not behave deterministically even in a predefined environment (the example is taken from Dustin's linked article):
import random
from setuptools import setup

dependency = random.choice(['Schrodinger', 'Cat'])

setup(
    name='paradox',
    version='0.0.1',
    description='A nondeterministic package',
    install_requires=[dependency],
)
This is, however, rare and considered a really bad practice. You should not do it. (By the way, Thoth has a solution to fix even this.)
Python Dependency Information in Thoth
A component called thoth-solver is responsible for extracting dependency information together with additional metadata. Other components in Thoth's cloud resolver make sure that the dependency listing is kept up to date with new package releases. Check the following article for more information.
Thoth invests resources to analyze Python packages. Once Python packages are analyzed and dependency information is extracted, data are synced into Thoth's database and made available to users as well as to Thoth's cloud resolver. You can query dependency information on Thoth's API endpoints.
Mind that dependency information is obtained for each containerized environment individually. That way, it is more accurate than the dependency information available on Open Source Insights. Open Source Insights states dependency information for one very specific setup and defaults to the "latest versions" that were found when the dependency information was obtained or refreshed. Thoth shows all the matching versions of available Python packages, even across multiple Python package indexes, for selected GNU/Linux distributions.
Consuming Dependency Information
As of now, Thoth provides API endpoints to consume the computed dependency information. API endpoints are publicly available so feel free to consume available dependency data.
To obtain dependency information for package pandas in version 1.3.3 from PyPI in Fedora 34 running Python 3.9, simply issue the following HTTP GET request:
curl -X 'GET' \
  'https://khemenu.thoth-station.ninja/api/v1/python/package/version/metadata?name=pandas&version=1.3.3&index=https%3A%2F%2Fpypi.org%2Fsimple&os_name=fedora&os_version=34&python_version=3.9' \
  -H 'accept: application/json'
Note that the response lists all the dependency versions, respecting extras and environment markers, alongside other package metadata. The additional metadata includes core Python packaging metadata, the files available, and the packages (modules) brought in when installing pandas==1.3.3 from PyPI into the given environment. Check the thoth-solver documentation for more information.
You can compare the shown dependency listing with Open Source Insights.
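If you prefer Python over curl, the same GET request can be sketched with the standard library. The endpoint and query parameters are copied from the curl command above; the helper names build_metadata_url and fetch_metadata are made up for this example, and the sketch assumes the service is reachable and returns JSON, without assuming anything about the response layout:

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Endpoint taken from the curl example above.
API_URL = "https://khemenu.thoth-station.ninja/api/v1/python/package/version/metadata"


def build_metadata_url(name, version, index="https://pypi.org/simple",
                       os_name="fedora", os_version="34", python_version="3.9"):
    """Build the query URL for one package version in one environment."""
    query = urlencode({
        "name": name,
        "version": version,
        "index": index,
        "os_name": os_name,
        "os_version": os_version,
        "python_version": python_version,
    })
    return f"{API_URL}?{query}"


def fetch_metadata(url, timeout=30):
    """Issue the HTTP GET request and decode the JSON body."""
    request = Request(url, headers={"Accept": "application/json"})
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    url = build_metadata_url("pandas", "1.3.3")
    print(url)
    # Uncomment to perform the request (needs network access):
    # print(json.dumps(fetch_metadata(url), indent=2))
```

Changing os_name, os_version, or python_version lets you ask for the same package as seen from a different environment, provided Thoth has analyzed it there.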
Using Thoth's Resolver
The described dependency data are used in Thoth's resolver. The cloud-based resolver uses reinforcement learning techniques to come up with the best possible libraries for your application. All the dependency resolvers in Python - pip, Pipenv, and Poetry - resolve application dependencies to the latest possible versions, which might not always be the best choice. Check the following tutorial that will walk you through some security-related aspects of Thoth.
If you wish to give Thoth's cloud resolver a try, install Thamos. Thamos is a command line interface to Thoth's backend:
pip install thamos
Once Thamos is installed, check available environments and add dependencies to your project. Finally, ask Thoth's resolver for an advisory on your application:
thamos environments
thamos config
thamos add "flask~=2.0.0"
thamos advise
Check the available help for each Thamos command shown by supplying the --help option. Do not hesitate to provide feedback.
If you wish to keep up with Thoth news, follow @ThothStation on Twitter or check the Thoth-Station YouTube channel.