<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Fridolín Pokorný</title>
    <description>The latest articles on DEV Community by Fridolín Pokorný (@fridex).</description>
    <link>https://dev.to/fridex</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F339135%2F3f0210d1-08dc-4990-a33a-30d333332183.jpg</url>
      <title>DEV Community: Fridolín Pokorný</title>
      <link>https://dev.to/fridex</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/fridex"/>
    <language>en</language>
    <item>
      <title>When checking your Python package sources matters</title>
      <dc:creator>Fridolín Pokorný</dc:creator>
      <pubDate>Sun, 16 Apr 2023 18:12:26 +0000</pubDate>
      <link>https://dev.to/fridex/when-checking-your-python-package-sources-matters-3p8h</link>
      <guid>https://dev.to/fridex/when-checking-your-python-package-sources-matters-3p8h</guid>
      <description>&lt;p&gt;In today's article, we will take a look at a small tool called &lt;a href="https://pypi.org/project/yorkshire/" rel="noopener noreferrer"&gt;Yorkshire&lt;/a&gt;. It's goal is to check configured Python package indexes in projects to make sure only desired package sources are used.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhsj5ktj8xc4tepwin3vg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhsj5ktj8xc4tepwin3vg.png" alt="Yorkshire"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;A cute Yorkshire terrier, &lt;a href="https://pixabay.com/photos/terrier-dog-cute-puppy-pet-canine-410298/" rel="noopener noreferrer"&gt;Pixabay&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Python's packaging allows users to consume packages from multiple sources, i.e., multiple Python package indexes. During installation, the resolution algorithm implemented in pip searches all the configured package indexes to satisfy requirements.&lt;/p&gt;

&lt;p&gt;The resolution algorithm treats all the configured indexes as mirrors. If a package &lt;code&gt;foo&lt;/code&gt; is available on index A as well as on index B, both indexes are treated with the same relevance, considering the versions available. The &lt;code&gt;--index-url&lt;/code&gt; and &lt;code&gt;--extra-index-url&lt;/code&gt; options allow specifying primary and additional indexes, but there is no guarantee which index is actually used. If there is a network issue, the resolution process can fall back to the additional indexes, as they are just mirrors.&lt;/p&gt;

&lt;p&gt;In some cases, users want to consume packages from index A and a specific package from index B. As of today, there is no configuration option in pip to specify which index should be used for a specific package. This opens the door to dependency confusion attacks, such as &lt;a href="https://pytorch.org/blog/compromised-nightly-dependency/" rel="noopener noreferrer"&gt;the PyTorch incident&lt;/a&gt;.&lt;/p&gt;
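
For illustration, the primary and additional indexes mentioned above are passed to pip roughly like this (the package name and the second index URL are placeholders, not a real setup):

```
# Both indexes are treated as equal-priority mirrors during resolution;
# 'foo' may be fetched from either one.
pip install foo \
    --index-url https://pypi.org/simple/ \
    --extra-index-url https://private.example.org/simple/
```

There is no flag in this invocation to pin `foo` to one of the two indexes, which is exactly the gap described above.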

&lt;p&gt;There was &lt;a href="https://discuss.python.org/t/proposal-preventing-dependency-confusion-attacks-with-the-map-file/23414" rel="noopener noreferrer"&gt;a discussion on discuss.python.org about preventing dependency confusion attacks using a map file&lt;/a&gt;. The idea of the map file was not accepted; nevertheless, the proposal &lt;a href="https://peps.python.org/pep-0708/" rel="noopener noreferrer"&gt;PEP-708: Extending the Repository API to Mitigate Dependency Confusion Attacks&lt;/a&gt; pushed the idea of preventing dependency confusion attacks further.&lt;/p&gt;

&lt;p&gt;Until &lt;a href="https://peps.python.org/pep-0708/" rel="noopener noreferrer"&gt;PEP-708&lt;/a&gt; is eventually accepted and implemented, there is room to check how projects configure their Python package indexes. Even once &lt;a href="https://peps.python.org/pep-0708/" rel="noopener noreferrer"&gt;PEP-708&lt;/a&gt; is accepted, it might be a good idea for organizations to check which indexes are used, to monitor the consumption of software in their environments.&lt;/p&gt;

&lt;p&gt;To support checks of the index configuration, a tool called &lt;a href="https://pypi.org/project/yorkshire/" rel="noopener noreferrer"&gt;Yorkshire&lt;/a&gt; was developed. Yorkshire checks the index configuration in files that can be used to specify project dependencies, such as &lt;code&gt;requirements.txt&lt;/code&gt;, &lt;code&gt;pyproject.toml&lt;/code&gt;, or Pipenv files. If multiple Python package indexes are used, Yorkshire reports it. Optionally, it can check that only allowed indexes are configured.&lt;/p&gt;

&lt;p&gt;Let's take &lt;a href="https://python-poetry.org/docs/repositories/#installing-from-private-package-sources" rel="noopener noreferrer"&gt;Poetry's configuration for specifying secondary indexes&lt;/a&gt; as an example. The linked command can generate a &lt;code&gt;pyproject.toml&lt;/code&gt; file similar to this one:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="nn"&gt;[tool.poetry]&lt;/span&gt;
&lt;span class="py"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"foo"&lt;/span&gt;
&lt;span class="py"&gt;version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"1.0.0"&lt;/span&gt;
&lt;span class="py"&gt;description&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"My package"&lt;/span&gt;
&lt;span class="py"&gt;authors&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"Author &amp;lt;author@email.com&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="nn"&gt;[tool.poetry.dependencies]&lt;/span&gt;
&lt;span class="py"&gt;python&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"^3.6"&lt;/span&gt;
&lt;span class="py"&gt;flask&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"^2"&lt;/span&gt;

&lt;span class="nn"&gt;[[tool.poetry.source]]&lt;/span&gt;
&lt;span class="py"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"private_repo"&lt;/span&gt;
&lt;span class="py"&gt;url&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"https://test.pypi.org/simple/"&lt;/span&gt;
&lt;span class="py"&gt;default&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
&lt;span class="py"&gt;secondary&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;

&lt;span class="nn"&gt;[build-system]&lt;/span&gt;
&lt;span class="py"&gt;requires&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"poetry-core"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="py"&gt;build-backend&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"poetry.core.masonry.api"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The configuration above allows Poetry to consume packages hosted on &lt;a href="https://pypi.org/" rel="noopener noreferrer"&gt;PyPI&lt;/a&gt; as well as the ones hosted on the &lt;a href="https://test.pypi.org/" rel="noopener noreferrer"&gt;test PyPI&lt;/a&gt;. Which one will actually be used to consume packages? Well, it depends on the dependencies of the &lt;code&gt;flask&lt;/code&gt; package. Note that the version might be relevant here as well, depending on the environment into which dependencies are installed.&lt;/p&gt;

&lt;p&gt;Let's run Yorkshire on the &lt;code&gt;pyproject.toml&lt;/code&gt; file above:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;yorkshire detect ./pyproject.toml
&lt;span class="go"&gt;2023-04-15 20:13:44,984 [1767887] INFO     yorkshire._lib: Performing detection in pyproject.toml file located at '.'
2023-04-15 20:13:44,985 [1767887] WARNING  yorkshire._lib: File './pyproject.toml' uses an explicitly configured Poetry source: ['https://test.pypi.org/simple/']
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As can be seen, Yorkshire issues a warning, as multiple package sources can eventually be used.&lt;/p&gt;

&lt;p&gt;Next, let's mark the test PyPI index as allowed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;yorkshire detect &lt;span class="nt"&gt;--allowed-index-url&lt;/span&gt; &lt;span class="s2"&gt;"https://test.pypi.org/simple/"&lt;/span&gt; ./pyproject.toml
&lt;span class="go"&gt;2023-04-15 20:16:33,806 [1773955] INFO     yorkshire._lib: Performing detection in pyproject.toml file located at '.'
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The command above shows that the test PyPI index specified in the &lt;code&gt;pyproject.toml&lt;/code&gt; is no longer flagged as a possible issue.&lt;/p&gt;

&lt;p&gt;Yorkshire understands the requirements file types used in the Python ecosystem. Similarly to the &lt;code&gt;pyproject.toml&lt;/code&gt; configuration specific to &lt;a href="https://python-poetry.org/" rel="noopener noreferrer"&gt;Poetry&lt;/a&gt;, Yorkshire supports the configuration used by &lt;a href="https://pdm.fming.dev/" rel="noopener noreferrer"&gt;PDM&lt;/a&gt;, &lt;a href="https://pipenv.pypa.io/" rel="noopener noreferrer"&gt;Pipenv&lt;/a&gt;, &lt;a href="https://pypi.org/project/pip-tools/" rel="noopener noreferrer"&gt;pip-tools&lt;/a&gt;, or &lt;a href="https://pip.pypa.io/" rel="noopener noreferrer"&gt;pip&lt;/a&gt; itself. All these tools have their own specifics (some of them even support assigning packages to an index, as mentioned above).&lt;/p&gt;
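
As a hypothetical illustration, a pip requirements file can configure an additional index inline; Yorkshire would report the extra index configured in a file like this (the private index URL is a placeholder):

```
--index-url https://pypi.org/simple/
--extra-index-url https://private.example.org/simple/

flask
```

Running Yorkshire without passing the private URL via `--allowed-index-url` would flag this file, just as the Poetry example was flagged above.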

&lt;p&gt;Yorkshire &lt;a href="https://github.com/DataDog/yorkshire/blob/14ddb9b9b31dc92f499e0a89d8b070539c9f696d/yorkshire/__init__.py#L14-L15" rel="noopener noreferrer"&gt;provides an API&lt;/a&gt; so that the checks can be incorporated into other projects or systems.&lt;/p&gt;

&lt;p&gt;Organizations can use Yorkshire in their checks or monitoring to make sure only trusted Python package indexes are used. Would you find it useful?&lt;/p&gt;

</description>
      <category>python</category>
      <category>programming</category>
      <category>opensource</category>
      <category>security</category>
    </item>
    <item>
      <title>How to get information about the provenance of Python packages installed</title>
      <dc:creator>Fridolín Pokorný</dc:creator>
      <pubDate>Thu, 13 Apr 2023 20:59:28 +0000</pubDate>
      <link>https://dev.to/fridex/how-to-get-information-about-the-provenance-of-python-packages-installed-4f65</link>
      <guid>https://dev.to/fridex/how-to-get-information-about-the-provenance-of-python-packages-installed-4f65</guid>
      <description>&lt;p&gt;Let's take a look on how to obtain information about the provenance of installed packages in the Python ecosystem. This idea is part of &lt;a href="https://peps.python.org/pep-0710/" rel="noopener noreferrer"&gt;PEP-710&lt;/a&gt; which is in a draft state as of today.       &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnxpgm3fgh3m01kjtysxq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnxpgm3fgh3m01kjtysxq.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;a href="https://goo.gl/maps/aJhsQRMb5sBnv2W37" rel="noopener noreferrer"&gt;Židlochovice - Rozhledna Akátová věž; Czech republic&lt;/a&gt;. Image by author.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;blockquote&gt;
&lt;p&gt;The tutorial uses files that are available at &lt;a href="https://github.com/fridex/pip-provenance" rel="noopener noreferrer"&gt;github.com/fridex/pip-provenance&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Let's create a simple Python application using &lt;a href="https://github.com/chainguard-images/images/tree/main/images/python" rel="noopener noreferrer"&gt;Chainguard's Python image&lt;/a&gt;. The application will be a simple Flask hello-world application. The &lt;code&gt;app.py&lt;/code&gt; script will have the following content:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;flask&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Flask&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Flask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;__name__&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@app.route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;index&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Hello, world!&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;0.0.0.0&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;8080&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Additionally, we will create a &lt;code&gt;requirements.in&lt;/code&gt; file with the following content:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;flask
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We will use &lt;a href="https://pypi.org/project/pip-tools/" rel="noopener noreferrer"&gt;pip-tools&lt;/a&gt; to lock dependencies to specific versions for reproducibility. We will also keep the hashes of the installed Python distributions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;pip-compile --generate-hashes
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The command above will create a &lt;code&gt;requirements.txt&lt;/code&gt; file. An example of such a file can be &lt;a href="https://github.com/fridex/pip-provenance/blob/main/requirements.txt" rel="noopener noreferrer"&gt;found here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Next, let's create a containerized environment with our application.&lt;/p&gt;

&lt;h2&gt;
  
  
  Using the upstream pip
&lt;/h2&gt;

&lt;p&gt;First, we will use the upstream pip, which is also shipped in Chainguard's images. We can directly take the &lt;a href="https://github.com/chainguard-images/images/tree/main/images/python" rel="noopener noreferrer"&gt;Dockerfile as written by Chainguard&lt;/a&gt; with minimal changes to containerize our application:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;cgr.dev/chainguard/python:latest-dev&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;builder&lt;/span&gt;

&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /app&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; requirements.txt .&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt &lt;span class="nt"&gt;--user&lt;/span&gt;

&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; cgr.dev/chainguard/python:latest&lt;/span&gt;

&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /app&lt;/span&gt;
&lt;span class="c"&gt;# Make sure you update Python version in path&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; --from=builder /home/nonroot/.local/lib/python3.11/site-packages /home/nonroot/.local/lib/python3.11/site-packages&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; app.py .&lt;/span&gt;

&lt;span class="k"&gt;ENTRYPOINT&lt;/span&gt;&lt;span class="s"&gt; ["python", "/app/app.py"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The containerized application can be built:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;podman build -f raw/Dockerfile -t pip-provenance:raw .
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Subsequently, the built application can be run and accessed at &lt;a href="http://localhost:8080" rel="noopener noreferrer"&gt;localhost:8080&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;podman run -p 8080:8080 pip-provenance:raw
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, let's imagine someone published this image to a registry and we would like to get information about the packages installed. We can pull the &lt;code&gt;pip-provenance:raw&lt;/code&gt; image and run &lt;code&gt;pip freeze&lt;/code&gt;. Unfortunately, &lt;code&gt;pip freeze&lt;/code&gt; shows only the installed Python packages and their versions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;pip freeze                     
&lt;span class="go"&gt;click==8.1.3
Flask==2.2.3
itsdangerous==2.1.2
Jinja2==3.1.2
MarkupSafe==2.1.2
Werkzeug==2.2.3
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We don't have any information about where these packages were actually installed from, nor do we have any information on the digests of these packages. An exception is packages installed using a direct URL following &lt;a href="https://peps.python.org/pep-0610/" rel="noopener noreferrer"&gt;PEP-610&lt;/a&gt;, but that's not the case in our example.&lt;/p&gt;
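
For context, a PEP-610 `direct_url.json` record for a package installed from a Git URL looks roughly like the following sketch (the URL, revision, and commit values are illustrative placeholders, not from our build):

```
{
  "url": "https://github.com/pallets/flask.git",
  "vcs_info": {
    "vcs": "git",
    "requested_revision": "2.2.3",
    "commit_id": "COMMIT_SHA_PLACEHOLDER"
  }
}
```

Packages installed from an index by name, as in our `requirements.txt`, get no such file, which is the gap PEP-710 addresses.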

&lt;h2&gt;
  
  
  Using the patched pip
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://peps.python.org/pep-0710/" rel="noopener noreferrer"&gt;PEP-710&lt;/a&gt; proposes storing provenance information about installed packages when they are identified by their name and, optionally, their version (which is our case). Let's take a look at what information is stored and how we could access it.&lt;/p&gt;

&lt;p&gt;First, let's adjust our Dockerfile to use &lt;a href="https://github.com/pypa/pip/pull/11865" rel="noopener noreferrer"&gt;a patched version of pip&lt;/a&gt; that follows &lt;a href="https://peps.python.org/pep-0710/" rel="noopener noreferrer"&gt;PEP-710&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;cgr.dev/chainguard/python:latest-dev&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;builder&lt;/span&gt;

&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /app&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; requirements.txt .&lt;/span&gt;
&lt;span class="c"&gt;# -----&amp;gt;%------&lt;/span&gt;
&lt;span class="k"&gt;USER&lt;/span&gt;&lt;span class="s"&gt; root&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--force-reinstall&lt;/span&gt; pip &lt;span class="nb"&gt;install &lt;/span&gt;git+https://github.com/fridex/pip.git@provenance-url
&lt;span class="k"&gt;USER&lt;/span&gt;&lt;span class="s"&gt; nonroot&lt;/span&gt;
&lt;span class="c"&gt;# -----%&amp;lt;------&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt &lt;span class="nt"&gt;--user&lt;/span&gt;

&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; cgr.dev/chainguard/python:latest&lt;/span&gt;

&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /app&lt;/span&gt;
&lt;span class="c"&gt;# Make sure you update Python version in path&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; --from=builder /home/nonroot/.local/lib/python3.11/site-packages /home/nonroot/.local/lib/python3.11/site-packages&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; app.py .&lt;/span&gt;

&lt;span class="k"&gt;ENTRYPOINT&lt;/span&gt;&lt;span class="s"&gt; ["python", "/app/app.py"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's build this application:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;podman build -f patched/Dockerfile -t pip-provenance:patched .
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can run the application and access it at &lt;a href="http://localhost:8080" rel="noopener noreferrer"&gt;localhost:8080&lt;/a&gt;; the changes introduced in pip have no effect on it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;podman run -p 8080:8080 pip-provenance:patched
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Following &lt;a href="https://peps.python.org/pep-0710/" rel="noopener noreferrer"&gt;PEP-710&lt;/a&gt;, pip stores provenance information in &lt;code&gt;*.dist-info&lt;/code&gt; directories located in &lt;code&gt;site-packages&lt;/code&gt;. Let's copy the &lt;code&gt;site-packages&lt;/code&gt; directory out of the containerized environment so that we can check what was installed there (substitute &lt;code&gt;[CONTAINER_HASH]&lt;/code&gt; with the hash of the container that was run in the previous example):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;podman cp [CONTAINER_HASH]:/home/nonroot/.local/lib/python3.11/site-packages site-packages
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can take a look at the &lt;code&gt;provenance_url.json&lt;/code&gt; file for the &lt;code&gt;flask&lt;/code&gt; package*:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; ./site-packages/Flask-2.2.3.dist-info/provenance_url.json | jq
&lt;span class="go"&gt;{
  "archive_info": {
    "hash": "sha256=c0bec9477df1cb867e5a67c9e1ab758de9cb4a3e52dd70681f59fa40a62b3f2d",
    "hashes": {
      "sha256": "c0bec9477df1cb867e5a67c9e1ab758de9cb4a3e52dd70681f59fa40a62b3f2d"
    }
  },
  "url": "https://files.pythonhosted.org/packages/95/9c/a3542594ce4973786236a1b7b702b8ca81dbf40ea270f0f96284f0c27348/Flask-2.2.3-py3-none-any.whl"
}
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This file is created by the patched pip and is described in more detail in &lt;a href="https://peps.python.org/pep-0710/" rel="noopener noreferrer"&gt;PEP-710&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;A small tool called &lt;a href="https://github.com/fridex/pip-preserve" rel="noopener noreferrer"&gt;pip-preserve&lt;/a&gt; can read the content of the &lt;code&gt;site-packages&lt;/code&gt; directory and understands the &lt;code&gt;provenance_url.json&lt;/code&gt; file for each installed Python package. Moreover, if a package was installed using a direct URL, the tool can also read &lt;code&gt;direct_url.json&lt;/code&gt;, as described in &lt;a href="https://peps.python.org/pep-0610/" rel="noopener noreferrer"&gt;PEP-610&lt;/a&gt;, to fully reconstruct the environment. Let's use the tool on the &lt;code&gt;site-packages&lt;/code&gt; directory from our containerized environment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;pip-preserve
&lt;span class="c"&gt;...
&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;pip-preserve &lt;span class="nt"&gt;--ignore-errors&lt;/span&gt; &lt;span class="nt"&gt;--site-packages&lt;/span&gt; ./site-packages      
&lt;span class="gp"&gt;#&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="gp"&gt;#&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;This file is autogenerated by pip-preserve version 0.0.2.post1 with Python 3.10.6.
&lt;span class="gp"&gt;#&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="go"&gt;click==8.1.3 \
  --hash=sha256:bb4d8133cb15a609f44e8213d9b391b0809795062913b383c62be0ee95b1db48
flask==2.2.3 \
  --hash=sha256:c0bec9477df1cb867e5a67c9e1ab758de9cb4a3e52dd70681f59fa40a62b3f2d
itsdangerous==2.1.2 \
  --hash=sha256:2c2349112351b88699d8d4b6b075022c0808887cb7ad10069318a8b0bc88db44
jinja2==3.1.2 \
  --hash=sha256:6088930bfe239f0e6710546ab9c19c9ef35e29792895fed6e6e31a023a182a61
markupsafe==2.1.2 \
  --hash=sha256:f2bfb563d0211ce16b63c7cb9395d2c682a23187f54c3d79bfec33e6705473c6
werkzeug==2.2.3 \
  --hash=sha256:56433961bc1f12533306c624f3be5e744389ac61d722175d543e1751285da612
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As you can see, the tool reconstructed a &lt;code&gt;requirements.txt&lt;/code&gt; file, listing all the installed packages together with their versions and hashes.&lt;/p&gt;

&lt;p&gt;A reader may notice that the reconstructed file has only one hash per package. The reason is that pip installs exactly one artifact for each package. Our original &lt;code&gt;requirements.txt&lt;/code&gt; file &lt;a href="https://github.com/fridex/pip-provenance/blob/main/requirements.txt" rel="noopener noreferrer"&gt;lists multiple hashes that correspond to the Python distributions published on PyPI&lt;/a&gt; at the time the &lt;code&gt;pip-compile&lt;/code&gt; command was run. At installation time, pip takes the one matching the environment into which the Python distribution is installed. For example, pip took the wheel file published for &lt;a href="https://pypi.org/project/Flask/2.2.3/#files" rel="noopener noreferrer"&gt;flask==2.2.3&lt;/a&gt;, not the source distribution available on PyPI (you can verify this by checking the artifact hashes). Using the patched version of pip, we can point to the exact artifact that was installed.&lt;/p&gt;
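
Which artifact matches the environment is decided by the compatibility tags encoded in the wheel filename. A tiny illustrative helper (not part of pip or pip-preserve) that extracts the tag triple:

```python
def wheel_tag_triple(filename: str) -> str:
    """Return the '{python}-{abi}-{platform}' tags from a wheel filename.

    Wheel filenames follow {name}-{version}(-{build})?-{python}-{abi}-{platform}.whl,
    so the tag triple is always the last three dash-separated fields.
    """
    stem = filename.removesuffix(".whl")
    return "-".join(stem.split("-")[-3:])


# A pure-Python wheel is usable everywhere ...
print(wheel_tag_triple("Flask-2.2.3-py3-none-any.whl"))  # py3-none-any
# ... while a binary wheel targets one interpreter and platform.
print(wheel_tag_triple(
    "MarkupSafe-2.1.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
))  # cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64
```

This is why our environment has one Flask hash (the universal wheel) but a platform-specific MarkupSafe wheel.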

&lt;p&gt;If we pass the &lt;code&gt;--direct-url&lt;/code&gt; option to the &lt;code&gt;pip-preserve&lt;/code&gt; tool, we get the exact URLs from which the Python packages were installed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ pip-preserve --ignore-errors --direct-url --site-packages ./site-packages
#
# This file is autogenerated by pip-preserve version 0.0.2.post1 with Python 3.10.6.
#
https://files.pythonhosted.org/packages/c2/f1/df59e28c642d583f7dacffb1e0965d0e00b218e0186d7858ac5233dce840/click-8.1.3-py3-none-any.whl \
  --hash=sha256:bb4d8133cb15a609f44e8213d9b391b0809795062913b383c62be0ee95b1db48
https://files.pythonhosted.org/packages/95/9c/a3542594ce4973786236a1b7b702b8ca81dbf40ea270f0f96284f0c27348/Flask-2.2.3-py3-none-any.whl \
  --hash=sha256:c0bec9477df1cb867e5a67c9e1ab758de9cb4a3e52dd70681f59fa40a62b3f2d
https://files.pythonhosted.org/packages/68/5f/447e04e828f47465eeab35b5d408b7ebaaaee207f48b7136c5a7267a30ae/itsdangerous-2.1.2-py3-none-any.whl \
  --hash=sha256:2c2349112351b88699d8d4b6b075022c0808887cb7ad10069318a8b0bc88db44
https://files.pythonhosted.org/packages/bc/c3/f068337a370801f372f2f8f6bad74a5c140f6fda3d9de154052708dd3c65/Jinja2-3.1.2-py3-none-any.whl \
  --hash=sha256:6088930bfe239f0e6710546ab9c19c9ef35e29792895fed6e6e31a023a182a61
https://files.pythonhosted.org/packages/5a/94/d056bf5dbadf7f4b193ee2a132b3d49ffa1602371e3847518b2982045425/MarkupSafe-2.1.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl \
  --hash=sha256:f2bfb563d0211ce16b63c7cb9395d2c682a23187f54c3d79bfec33e6705473c6
https://files.pythonhosted.org/packages/f6/f8/9da63c1617ae2a1dec2fbf6412f3a0cfe9d4ce029eccbda6e1e4258ca45f/Werkzeug-2.2.3-py3-none-any.whl \
  --hash=sha256:56433961bc1f12533306c624f3be5e744389ac61d722175d543e1751285da612
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Why is this useful?
&lt;/h2&gt;

&lt;p&gt;Okay, now we know we can get information about the provenance of installed packages using &lt;a href="https://peps.python.org/pep-0710/" rel="noopener noreferrer"&gt;PEP-710&lt;/a&gt;. If we take a look at other packages, such as &lt;a href="https://www.tensorflow.org/" rel="noopener noreferrer"&gt;TensorFlow&lt;/a&gt;, we can see that &lt;a href="https://pypi.org/project/tensorflow/2.12.0/#files" rel="noopener noreferrer"&gt;multiple wheel files are published&lt;/a&gt; - each corresponding to a specific environment. If we just &lt;code&gt;pip install tensorflow&lt;/code&gt;, which wheel file is actually used (assuming we do not always have access to installation logs)?&lt;/p&gt;

&lt;p&gt;Also note that there can be specific builds of Python packages hosted on a private Python package index. These wheels can be built with options that might not be expressible using &lt;a href="https://packaging.python.org/en/latest/specifications/platform-compatibility-tags/" rel="noopener noreferrer"&gt;wheel tags&lt;/a&gt;. If you are using a Python environment (not necessarily a containerized one), how do you know the provenance of the installed Python packages (without access to installation logs or any build configuration)?&lt;/p&gt;

&lt;p&gt;The containerized environments built for this article are available at &lt;a href="https://hub.docker.com/repository/docker/fridex/pip-provenance/general" rel="noopener noreferrer"&gt;docker.io/fridex/pip-provenance&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;podman pull fridex/pip-provenance:raw
podman pull fridex/pip-provenance:patched
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can follow related discussion about &lt;a href="https://peps.python.org/pep-0710/" rel="noopener noreferrer"&gt;PEP-710&lt;/a&gt; at &lt;a href="https://discuss.python.org/t/pep-710-recording-the-provenance-of-installed-packages/25428" rel="noopener noreferrer"&gt;discuss.python.org&lt;/a&gt;.&lt;/p&gt;




&lt;p&gt;*Even though the &lt;code&gt;provenance_url.json&lt;/code&gt; files produced by the patched pip keep the &lt;code&gt;hash&lt;/code&gt; key, PEP-710 does not define it. The patched pip implementation reuses code defined by PEP-610 (direct URLs). The &lt;code&gt;hash&lt;/code&gt; key is now deprecated in the &lt;code&gt;direct_url.json&lt;/code&gt; file introduced by PEP-610.&lt;/p&gt;

</description>
      <category>python</category>
      <category>security</category>
      <category>containers</category>
      <category>community</category>
    </item>
    <item>
      <title>Why PyPI Doesn't Know Your Project's Dependencies but Thoth Does</title>
      <dc:creator>Fridolín Pokorný</dc:creator>
      <pubDate>Sun, 13 Feb 2022 16:50:14 +0000</pubDate>
      <link>https://dev.to/fridex/why-pypi-doesnt-know-your-projects-dependencies-but-thoth-does-4eji</link>
      <guid>https://dev.to/fridex/why-pypi-doesnt-know-your-projects-dependencies-but-thoth-does-4eji</guid>
      <description>&lt;p&gt;How can I produce a dependency graph for Python packages? Why doesn't PyPI state the dependencies of Python packages? Let's take a look at these questions and at a solution for Python developers.&lt;/p&gt;

&lt;h2&gt;
  
  
  PyPI
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://pypi.org/" rel="noopener noreferrer"&gt;PyPI&lt;/a&gt;, The Python Package Index, is the main source of open-source Python packages. It provides a way to publish, browse as well as obtain open-source Python packages. However, it does not list information about dependencies to users.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why PyPI Doesn't Know Your Project's Dependencies
&lt;/h2&gt;

&lt;p&gt;Here I would like to &lt;a href="https://dustingram.com/articles/2018/03/05/why-pypi-doesnt-know-dependencies/" rel="noopener noreferrer"&gt;refer to an article by Dustin Ingram&lt;/a&gt;, one of the PyPI maintainers. The referenced article nicely explains this problem and shows why it is not possible to list all the dependencies for a Python package.&lt;/p&gt;

&lt;p&gt;Long story short, the article explains that the packaging of a Python project can execute a Python script that computes dependencies at installation time in the target environment. The script can decide which dependencies should be installed based on arbitrary code execution, creating the dependency listing dynamically. This may seem powerful, as users can express their needs in code, but in practice it is often neither necessary nor handy. The approach causes headaches for maintainers and developers, as dependencies are not statically declared and deterministically known in advance.&lt;/p&gt;

&lt;p&gt;This issue is slowly getting fixed with static wheel metadata, but source distributions can still suffer from it.&lt;/p&gt;
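&lt;p&gt;Static wheel metadata can be read without executing any code - the &lt;code&gt;METADATA&lt;/code&gt; file inside a wheel is an RFC 822-style document listing &lt;code&gt;Requires-Dist&lt;/code&gt; entries. A minimal sketch using only the standard library (the wheel built here is a tiny illustrative stand-in, not a real distribution):&lt;/p&gt;

```python
import io
import zipfile
from email.parser import Parser


def requires_dist(wheel_bytes: bytes) -> list:
    """Read the static Requires-Dist entries from a wheel's METADATA file."""
    with zipfile.ZipFile(io.BytesIO(wheel_bytes)) as wheel:
        meta_name = next(n for n in wheel.namelist() if n.endswith(".dist-info/METADATA"))
        metadata = Parser().parsestr(wheel.read(meta_name).decode())
    return metadata.get_all("Requires-Dist") or []


# Build a tiny in-memory wheel to demonstrate (illustrative package name and deps):
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as wheel:
    wheel.writestr(
        "demo-0.1.dist-info/METADATA",
        "Metadata-Version: 2.1\nName: demo\nVersion: 0.1\n"
        "Requires-Dist: flask~=2.0.0\nRequires-Dist: requests; extra == 'http'\n",
    )

print(requires_dist(buf.getvalue()))  # ["flask~=2.0.0", "requests; extra == 'http'"]
```

&lt;p&gt;No setup script runs here - the dependency listing is declared data, which is exactly why static metadata avoids the problem described above.&lt;/p&gt;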

&lt;h2&gt;
  
  
  Project Thoth and Python Package Dependencies
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://thoth-station.ninja/" rel="noopener noreferrer"&gt;Project Thoth&lt;/a&gt; offers &lt;a href="https://developers.redhat.com/articles/2021/11/17/customize-python-dependency-resolution-machine-learning" rel="noopener noreferrer"&gt;a cloud Python resolver available publicly&lt;/a&gt; as an alternative to pip, Pipenv, or Poetry. Naturally, a resolver needs to know dependency graph to resolve application dependencies. Thoth's trick for obtaining the dependency graph lies in pre-computing dependency information by installing packages into containerized environments.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgd5yglayrh7u3lgp02t7.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgd5yglayrh7u3lgp02t7.jpg" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Imagine a containerized environment such as Fedora 34. It provides a prepared environment for installing Python packages - it ships a Python 3.9 interpreter and other software packages in specific versions. And that is exactly what Thoth's &lt;a href="https://developers.redhat.com/blog/2021/04/26/continuous-learning-in-project-thoth-using-kafka-and-argo" rel="noopener noreferrer"&gt;background data aggregation logic&lt;/a&gt; does: it installs each Python package into the containerized environment and checks what dependencies the given package has in the given container image.&lt;/p&gt;
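&lt;p&gt;The result of such an aggregation can be thought of as a mapping keyed by the environment, since the same package can resolve to different dependencies in different environments. A toy model (not Thoth's actual database schema; the environment keys and dependency data below are illustrative):&lt;/p&gt;

```python
# Pre-computed dependency data: (os_name, os_version, python_version) maps to
# a table of (package, version) -> list of dependency requirements.
db = {
    ("fedora", "34", "3.9"): {
        ("pandas", "1.3.3"): ["numpy>=1.17.3", "python-dateutil>=2.7.3", "pytz>=2017.3"],
    },
    ("ubi", "8", "3.8"): {
        ("pandas", "1.3.3"): ["numpy>=1.17.3", "python-dateutil>=2.7.3", "pytz>=2017.3"],
    },
}


def lookup(db, environment, package, version):
    """Return pre-computed dependencies of a package in a given environment, or None."""
    return db.get(environment, {}).get((package, version))


print(lookup(db, ("fedora", "34", "3.9"), "pandas", "1.3.3"))
```

&lt;p&gt;A resolver querying such a store never needs to execute a package's setup script at resolution time - the expensive, potentially dynamic part happened once, inside the container image.&lt;/p&gt;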

&lt;p&gt;Of course, there can be nuances when a package does not behave deterministically even in a predefined environment (the example is taken from Dustin's article linked above):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;setuptools&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;setup&lt;/span&gt;

&lt;span class="n"&gt;dependency&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;choice&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Schrodinger&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Cat&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="nf"&gt;setup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;paradox&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;version&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;0.0.1&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;A nondeterministic package&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;install_requires&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;dependency&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is, however, rare and considered a &lt;em&gt;really&lt;/em&gt; bad practice. You should not do it. (By the way, Thoth has &lt;a href="https://developers.redhat.com/articles/2021/09/22/thoth-prescriptions-resolving-python-dependencies" rel="noopener noreferrer"&gt;a solution to fix even this&lt;/a&gt;.)&lt;/p&gt;

&lt;h2&gt;
  
  
  Python Dependency Information in Thoth
&lt;/h2&gt;

&lt;p&gt;A component called &lt;a href="https://github.com/thoth-station/solver" rel="noopener noreferrer"&gt;thoth-solver&lt;/a&gt; is responsible for extracting dependency information together with additional metadata. Other components in Thoth's cloud resolver make sure that the dependency listing is kept up to date with new package releases. Check &lt;a href="https://developers.redhat.com/articles/2022/01/14/extracting-dependencies-python-packages" rel="noopener noreferrer"&gt;the following article for more information&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Thoth invests resources to analyze Python packages. Once Python packages are analyzed and dependency information is extracted, data are synced into Thoth's database and made available to users as well as to Thoth's cloud resolver. You can query dependency information on &lt;a href="https://khemenu.thoth-station.ninja/" rel="noopener noreferrer"&gt;Thoth's API endpoints&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Note that dependency information is obtained for each containerized environment individually. That way, the dependency information is more accurate than the dependency information available on &lt;a href="https://deps.dev/" rel="noopener noreferrer"&gt;Open Source Insights&lt;/a&gt;. Open Source Insights states dependency information for &lt;a href="https://blog.deps.dev/pypi/" rel="noopener noreferrer"&gt;a very specific setup&lt;/a&gt; and defaults to the latest versions found when the dependency information was obtained or refreshed. Thoth shows all the matching versions of available Python packages, even &lt;a href="https://developers.redhat.com/articles/2021/12/21/prevent-python-dependency-confusion-attacks-thoth" rel="noopener noreferrer"&gt;across multiple Python package indexes&lt;/a&gt;, for selected GNU/Linux distributions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Consuming Dependency Information
&lt;/h2&gt;

&lt;p&gt;As of now, Thoth &lt;a href="https://khemenu.thoth-station.ninja/" rel="noopener noreferrer"&gt;provides API endpoints&lt;/a&gt; to consume the computed dependency information. API endpoints are publicly available so feel free to consume available dependency data.&lt;/p&gt;

&lt;p&gt;To obtain dependency information for &lt;a href="https://pypi.org/project/pandas/1.3.3/" rel="noopener noreferrer"&gt;package pandas in version 1.3.3&lt;/a&gt; from PyPI in Fedora 34 running Python 3.9, simply issue &lt;a href="https://khemenu.thoth-station.ninja/api/v1/python/package/version/metadata?name=pandas&amp;amp;version=1.3.3&amp;amp;index=https%3A%2F%2Fpypi.org%2Fsimple&amp;amp;os_name=fedora&amp;amp;os_version=34&amp;amp;python_version=3.9" rel="noopener noreferrer"&gt;the following HTTP GET request&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl -X 'GET' \
  'https://khemenu.thoth-station.ninja/api/v1/python/package/version/metadata?name=pandas&amp;amp;version=1.3.3&amp;amp;index=https%3A%2F%2Fpypi.org%2Fsimple&amp;amp;os_name=fedora&amp;amp;os_version=34&amp;amp;python_version=3.9' \
  -H 'accept: application/json'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note all the dependency versions listed, respecting &lt;a href="https://www.python.org/dev/peps/pep-0508/" rel="noopener noreferrer"&gt;extras and environment markers&lt;/a&gt;, besides other package metadata provided. The additional metadata shown include &lt;a href="https://packaging.python.org/en/latest/specifications/core-metadata/" rel="noopener noreferrer"&gt;core Python packaging metadata&lt;/a&gt;, the files available, and the packages (modules) brought in when installing pandas==1.3.3 from PyPI into the given environment. &lt;a href="https://github.com/thoth-station/solver#produced-output" rel="noopener noreferrer"&gt;Check the thoth-solver documentation&lt;/a&gt; for more information.&lt;/p&gt;

&lt;p&gt;You can &lt;a href="https://deps.dev/pypi/pandas/1.3.3" rel="noopener noreferrer"&gt;compare the shown dependency listing with Open Source Insights&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Using Thoth's Resolver
&lt;/h2&gt;

&lt;p&gt;The described dependency data are used in Thoth's resolver. The cloud-based &lt;a href="https://developers.redhat.com/articles/2021/11/17/customize-python-dependency-resolution-machine-learning" rel="noopener noreferrer"&gt;resolver uses reinforcement learning techniques&lt;/a&gt; to come up with the best possible libraries for your application. The mainstream dependency resolvers in Python - pip, Pipenv, and Poetry - resolve application dependencies to the latest possible versions, which might not always be the best choice. &lt;a href="https://redhat-scholars.github.io/managing-vulnerabilities-with-thoth" rel="noopener noreferrer"&gt;Check the following tutorial&lt;/a&gt;, which will walk you through some security-related aspects of Thoth.&lt;/p&gt;

&lt;p&gt;If you wish to give Thoth's cloud resolver a try, install &lt;a href="https://pypi.org/project/thamos" rel="noopener noreferrer"&gt;Thamos&lt;/a&gt;. Thamos is a command line interface to Thoth's backend:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;pip install thamos
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once Thamos is installed, check available environments and add dependencies to your project. Finally, ask Thoth's resolver for an advisory on your application:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;thamos environments
thamos config
thamos add "flask~=2.0.0"
thamos advise
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Check the help available for each Thamos command by supplying the &lt;code&gt;--help&lt;/code&gt; option. Do not hesitate &lt;a href="https://github.com/thoth-station/support" rel="noopener noreferrer"&gt;to provide feedback&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you wish to be updated with Thoth news, follow &lt;a href="http://twitter.com/ThothStation" rel="noopener noreferrer"&gt;@ThothStation on Twitter&lt;/a&gt; or &lt;a href="https://www.youtube.com/channel/UClUIDuq_hQ6vlzmqM59B2Lw" rel="noopener noreferrer"&gt;check Thoth-Station YouTube channel&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;🧙🪄🐍&lt;/p&gt;

</description>
      <category>python</category>
      <category>programming</category>
      <category>opensource</category>
      <category>security</category>
    </item>
    <item>
      <title>(Late) Hacktoberfest 2021 x Monstarlab Prague</title>
      <dc:creator>Fridolín Pokorný</dc:creator>
      <pubDate>Sun, 14 Nov 2021 18:47:11 +0000</pubDate>
      <link>https://dev.to/fridex/late-hacktoberfest-2021-x-monstarlab-prague-5ed</link>
      <guid>https://dev.to/fridex/late-hacktoberfest-2021-x-monstarlab-prague-5ed</guid>
      <description>&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--plSAbNsE--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/gqml0jxm42bnsebu4g3i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--plSAbNsE--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/gqml0jxm42bnsebu4g3i.png" alt="Image description" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;COVID19 made IT conferences and meetups virtual. It still feels more natural to me to meet people, exchange ideas, and socialize. Surprisingly, an e-mail landed in my inbox one day:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Hello Fridolín!&lt;/p&gt;

&lt;p&gt;My name is Tiago, I'm organizing once again the Hacktoberfest in Prague and I found your profile on GitHub and would like to ask you if you're interested in joining us in the event and maybe do a small talk (around 20-25min) related to open source.&lt;br&gt;
%&amp;lt; snip &amp;gt;%&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That's what the invite to "&lt;a href="https://www.meetup.com/Monstarlab-Prague/events/281441439"&gt;(Late) Hacktoberfest 2021 x Monstarlab Prague&lt;/a&gt;" looked like. The conversation with &lt;a href="https://www.linkedin.com/in/taraujodesouza/"&gt;Tiago&lt;/a&gt; resulted in an accepted talk titled "&lt;a href="https://www.youtube.com/watch?v=-Kcrx3ASbaw&amp;amp;list=PLokviu1ft5w3HupZRe2hZK9jG37eEo4YG&amp;amp;index=5"&gt;Full-time Open Source&lt;/a&gt;". The whole meetup had a very friendly atmosphere, with talks on topics from various areas of open source (... and of course, the pizza was tasty! 😋🍕).&lt;/p&gt;

&lt;p&gt;Check the &lt;a href="https://www.youtube.com/playlist?list=PLokviu1ft5w3HupZRe2hZK9jG37eEo4YG"&gt;Hacktoberfest 2021 @ Monstarlab Prague playlist on YouTube&lt;/a&gt; with the recorded talks. Big thanks to Tiago and the other meetup organizers. I definitely recommend watching the talks, and I hope to see everyone interested in open source at this meetup next year.&lt;/p&gt;

&lt;p&gt;If you are not familiar with Hacktoberfest, &lt;a href="https://hacktoberfest.digitalocean.com/"&gt;visit their homepage&lt;/a&gt; to change the world with open-source and win a cool t-shirt or (newly) &lt;a href="https://tree-nation.com/profile/digitalocean"&gt;plant a tree&lt;/a&gt;! 👕🌲&lt;/p&gt;

</description>
      <category>hacktoberfest</category>
      <category>opensource</category>
      <category>meetup</category>
    </item>
    <item>
      <title>How to beat Python’s pip: Software stack resolution pipelines</title>
      <dc:creator>Fridolín Pokorný</dc:creator>
      <pubDate>Mon, 16 Nov 2020 20:44:22 +0000</pubDate>
      <link>https://dev.to/fridex/how-to-beat-python-s-pip-software-stack-resolution-pipelines-19kg</link>
      <guid>https://dev.to/fridex/how-to-beat-python-s-pip-software-stack-resolution-pipelines-19kg</guid>
      <description>&lt;p&gt;&lt;a href="https://dev.to/fridex/how-to-beat-python-s-pip-reinforcement-learning-based-dependency-resolution-2he2"&gt;Following our previous article about reinforcement learning-based dependency resolution&lt;/a&gt;, we will take a look at actions taken during the resolution process. An example can be resolving &lt;a href="https://pypi.org/project/intel-tensorflow/"&gt;intel-tensorflow&lt;/a&gt; instead of &lt;a href="https://pypi.org/project/tensorflow/"&gt;tensorflow&lt;/a&gt; following programmable rules.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--HHMai6B0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/i/9585d4vejsl3qb6pkmst.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--HHMai6B0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/i/9585d4vejsl3qb6pkmst.png" alt="Alt Text" width="800" height="455"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Lungern, Switzerland. Image by the author.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h1&gt;
  
  
  Dependency graphs and software stack resolution
&lt;/h1&gt;

&lt;p&gt;Users and maintainers have limited control over additional semantics when it comes to dependencies installed and resolved in Python applications. Tools such as &lt;a href="https://pypi.org/project/pip/"&gt;pip&lt;/a&gt;, &lt;a href="https://pypi.org/project/pipenv/"&gt;Pipenv&lt;/a&gt;, or &lt;a href="https://pypi.org/project/poetry/"&gt;Poetry&lt;/a&gt; resolve a dependency stack to the latest possible candidate following the dependency specification stated by application developers (direct dependencies) and by library maintainers (transitive dependencies). This can become a limitation, especially considering applications that were written months or years ago and require non-zero attention in maintenance. A trigger to revisit the dependency stack can be &lt;a href="https://www.cvedetails.com/product/53738/Google-Tensorflow.html?vendor_id=1224"&gt;a security vulnerability found in the software stack&lt;/a&gt; or an observation that the given software stack is no longer suitable for the given task and should be upgraded or downgraded (e.g. performance improvements in releases).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://developers.redhat.com/blog/2020/09/30/ai-software-stack-inspection-with-thoth-and-tensorflow/"&gt;Even the resolution of the latest software can lead to issues that library maintainers haven’t spotted or haven’t considered&lt;/a&gt;. We already know that the state space of all the possible software stacks in the Python ecosystem is in many cases &lt;a href="https://dev.to/fridex/how-to-beat-python-s-pip-a-brief-intro-4bec"&gt;too large to explore and evaluate&lt;/a&gt;. Moreover, dependencies in the application stack evolve over time and &lt;a href="https://thoth-station.ninja/j/tf_41902.html"&gt;underpinning or overpinning dependencies happen quite often&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Another aspect to consider is the human being itself. The complexity behind libraries and application stacks becomes a field of expertise on its own. &lt;em&gt;What’s the best performing TensorFlow stack for the given application running on specific hardware&lt;/em&gt;? Should I use &lt;a href="https://tensorflow.pypi.thoth-station.ninja/"&gt;a Red Hat build&lt;/a&gt;, an &lt;a href="https://pypi.org/project/intel-tensorflow/"&gt;Intel build&lt;/a&gt;, or a &lt;a href="https://pypi.org/project/tensorflow/"&gt;Google TensorFlow build&lt;/a&gt;? All of the companies mentioned have dedicated teams focusing on the performance of the builds produced, and certain manpower is required to answer these questions. The performance aspect described is just another item in the vector that goes into evaluating application stack quality.&lt;/p&gt;

&lt;h1&gt;
  
  
  Software stack resolution pipeline and pipeline configuration
&lt;/h1&gt;

&lt;p&gt;Let’s promote the whole resolution process to the server side. In that case, the resolver can use a shared database of knowledge that assists with the software stack resolution. The whole resolution process can be treated as a pipeline made out of units that cooperate to form the most suitable stack for user needs.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Server-side resolution is not required, but it definitely helps with the whole process. Users are not required to maintain the database, and serving software stack resolution as a service has other pros as well (e.g. an allocated pool of resources).&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The software stack resolution pipeline can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;inject new packages&lt;/strong&gt; or new package versions to the dependency graph based on packages resolved (e.g. a package accidentally not stated as a dependency of a library, dependency underpinning issues, ...)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;remove a dependency&lt;/strong&gt; in a specific version or the whole dependency with its dependency subgraph from the dependency graph and let resolver find another resolution path (e.g. a package accidentally stated as a dependency, missing ABI symbols in the runtime environment, dependency overpinning issues, ...)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;score a package occurring in the dependency graph positively — &lt;strong&gt;prioritize resolution of a specific package&lt;/strong&gt; in the dependency graph (e.g. positive performance aspect of a package in a specific version/build)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;score a package in a specific version occurring in the dependency graph negatively — &lt;strong&gt;prioritize resolution of other versions&lt;/strong&gt; (e.g. a security vulnerability present in a specific release)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;prevent resolving a specific package&lt;/strong&gt; in a specific version so that resolver tries to find a different resolution path if any (e.g. buggy package releases)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
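&lt;p&gt;The actions above can be sketched as small units that either adjust a score or prune a resolution path. This is a hypothetical sketch - the class names, method names, and package data below are illustrative, not thoth-adviser's actual API:&lt;/p&gt;

```python
class CVEPenalizationStep:
    """Score a vulnerable package version negatively to prioritize other versions."""

    KNOWN_CVES = {("tensorflow", "2.1.0")}  # illustrative data, not a real CVE feed

    def run(self, package, version):
        if (package, version) in self.KNOWN_CVES:
            return -0.2  # negative reward: prefer other versions
        return 0.0


class RemovalStep:
    """Prevent resolving a specific package version entirely."""

    BLOCKED = {("paradox", "0.0.1")}  # illustrative buggy release

    def run(self, package, version):
        if (package, version) in self.BLOCKED:
            return None  # signal: this resolution path is pruned
        return 0.0


pipeline = [CVEPenalizationStep(), RemovalStep()]


def score(package, version):
    """Run all pipeline units; a None result prunes the path, otherwise sum rewards."""
    total = 0.0
    for unit in pipeline:
        reward = unit.run(package, version)
        if reward is None:
            return None
        total += reward
    return total


print(score("tensorflow", "2.1.0"))  # -0.2
print(score("paradox", "0.0.1"))     # None
```

&lt;p&gt;A resolver guided by such scores naturally drifts away from penalized versions and never enters pruned subgraphs.&lt;/p&gt;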

&lt;p&gt;These pipeline units form &lt;em&gt;autonomous pieces&lt;/em&gt; that know when they should be included in the resolution pipeline (thus be part of the "&lt;em&gt;pipeline configuration&lt;/em&gt;") and know when to perform certain actions during the actual resolution.&lt;/p&gt;

&lt;p&gt;A component called "&lt;em&gt;pipeline builder&lt;/em&gt;" adds pipeline units to the pipeline configuration based on the decision made by the pipeline unit itself. This is done during the phase which creates the pipeline configuration.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ZD7NW7_A--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/i/7gooadzbwg2ud4e89aac.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ZD7NW7_A--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/i/7gooadzbwg2ud4e89aac.gif" alt="Alt Text" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Creation of a resolution pipeline configuration by the pipeline builder. Image by the author.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Once the resolution pipeline is built, it is used during the resolution process.&lt;/p&gt;

&lt;h1&gt;
  
  
  A software stack resolution process
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://dev.to/fridex/how-to-beat-python-s-pip-reinforcement-learning-based-dependency-resolution-2he2"&gt;In the last article, we have described a resolution process as a Markov decision process&lt;/a&gt;. This uncovered the potential to use reinforcement learning algorithms to come up with a suitable software stack candidate for applications.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Latest software is not always the greatest.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://dev.to/fridex/how-to-beat-python-s-pip-reinforcement-learning-based-dependency-resolution-2he2"&gt;The last article described&lt;/a&gt; three main entities used during the resolution process:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Resolver&lt;/strong&gt; — an entity for resolving software following Python packaging specification&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Predictor&lt;/strong&gt; — an entity used for guiding the resolution in the dependency graph&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Software stack resolution pipeline&lt;/strong&gt; — an abstraction for scoring and adjusting the dependency graph&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The whole resolution process is then seen as a cooperation of the three described.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s---bYg6I3T--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/i/jfcss0g16htfgvbdnrqx.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s---bYg6I3T--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/i/jfcss0g16htfgvbdnrqx.gif" alt="Alt Text" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;A resolution process guided by a predictor — magician. The fairy girl corresponds to the resolver which passes the predicted part of the dependency graph (a package) to the scoring pipeline. Results of the scoring pipeline (reward signal) are reported back to the predictor. Image by the author.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The software stack resolution pipeline is formed out of units of different types, each serving its own purpose. An example is the pipeline unit type "Step", which maps to an action taken in a Markov decision process.&lt;/p&gt;

&lt;p&gt;The resolver can be easily extended by &lt;a href="https://thoth-station.ninja/docs/developers/adviser/unit.html"&gt;providing pipeline units that follow the semantics and API and help with the software stack resolution process&lt;/a&gt;. The interface is simple, so anyone can provide their own implementation and extend the resolution process with the provided knowledge. &lt;a href="https://dev.to/fridex/how-to-beat-python-s-pip-solving-python-dependencies-2d6e"&gt;The pre-aggregated knowledge of dependencies&lt;/a&gt; helps with the offline resolution so that the system can score hundreds of software stacks per second.&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/OCX8JQDXP9s"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;The demo shown above demonstrates how pipeline units can be used in a resolution process to come up with a software stack that respects the supplied pipeline configuration. The resolution process finds an &lt;code&gt;intel-tensorflow==2.0.1&lt;/code&gt; software stack instead of the pinned &lt;code&gt;tensorflow==2.1.0&lt;/code&gt; specified in the direct dependency listing. The notebook shown can be found in &lt;a href="https://github.com/thoth-station/notebooks/blob/master/notebooks/development/Pipeline%20units.ipynb"&gt;the linked repository&lt;/a&gt;.&lt;/p&gt;

&lt;h1&gt;
  
  
  Thoth adviser
&lt;/h1&gt;

&lt;p&gt;If you are interested in the resolution process and core principles used in the implementation, you can check &lt;a href="https://thoth-station.ninja/docs/developers/adviser/"&gt;thoth-adviser documentation&lt;/a&gt; and sources available on GitHub.&lt;/p&gt;

&lt;p&gt;Also, check other articles from the “How to beat Python’s pip” series:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dev.to/fridex/how-to-beat-python-s-pip-a-brief-intro-4bec"&gt;A brief intro&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/fridex/how-to-beat-python-s-pip-solving-python-dependencies-2d6e"&gt;Solving Python dependencies&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/fridex/how-to-beat-python-s-pip-dependency-monkey-inspecting-the-quality-of-tensorflow-dependencies-6fc"&gt;Dependency Monkey inspecting the quality of TensorFlow dependencies&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/fridex/how-to-beat-python-s-pip-inspecting-the-quality-of-machine-learning-software-1pkp"&gt;Inspecting the quality of machine learning&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Project Thoth
&lt;/h1&gt;

&lt;p&gt;Project Thoth is an application that aims to help Python developers. If you wish to be updated on any improvements and any progress we make in project Thoth, feel free to &lt;a href="https://www.youtube.com/channel/UClUIDuq_hQ6vlzmqM59B2Lw"&gt;subscribe to our YouTube channel&lt;/a&gt; where we post updates as well as recordings from scrum demos. We also &lt;a href="https://twitter.com/thothstation"&gt;post updates on Twitter&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Stay tuned for any new updates!&lt;/p&gt;

</description>
      <category>python</category>
      <category>datascience</category>
      <category>opensource</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>How to beat Python’s pip: Reinforcement learning-based dependency resolution</title>
      <dc:creator>Fridolín Pokorný</dc:creator>
      <pubDate>Sat, 07 Nov 2020 18:09:27 +0000</pubDate>
      <link>https://dev.to/fridex/how-to-beat-python-s-pip-reinforcement-learning-based-dependency-resolution-2he2</link>
      <guid>https://dev.to/fridex/how-to-beat-python-s-pip-reinforcement-learning-based-dependency-resolution-2he2</guid>
      <description>&lt;p&gt;The next episode of our series will be more theoretical. It prepares the ground for the next article, which will tie together the things we have discussed so far. We will take a look at Monte Carlo tree search, Temporal Difference learning, and the Markov decision process, and how they can be used in a resolution process.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F18y5aidzj9l2ggizrdw1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F18y5aidzj9l2ggizrdw1.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Balos beach on Crete island, Greece. Image by the author.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h1&gt;
  
  
  Markov decision process as a base for resolver
&lt;/h1&gt;

&lt;p&gt;First, let’s take a look at the &lt;a href="https://en.wikipedia.org/wiki/Markov_decision_process" rel="noopener noreferrer"&gt;Markov decision process (MDP)&lt;/a&gt;. At its base, it provides us with a mathematical framework for modeling decision making (see &lt;a href="https://en.wikipedia.org/wiki/Markov_decision_process" rel="noopener noreferrer"&gt;more info in the linked Wikipedia article&lt;/a&gt;). To understand the decision-making process, let’s apply it to the resolution process (other examples can be found on the Internet).&lt;/p&gt;

&lt;p&gt;Instead of implementing a resolver using &lt;a href="https://en.wikipedia.org/wiki/Boolean_satisfiability_problem" rel="noopener noreferrer"&gt;SAT&lt;/a&gt; or &lt;a href="https://en.wikipedia.org/wiki/Backtracking" rel="noopener noreferrer"&gt;backtracking&lt;/a&gt;, we will walk through the dependency graph. In that case, the resolver tries to satisfy the dependencies of an application considering the version range specifications of direct dependencies, and recursively those of the transitive ones, until it finds a valid resolution (or concludes there is none).&lt;/p&gt;

&lt;p&gt;At first, the resolver starts in an “initial state” that holds all the requirements of the application to be included in the application stack. After a few rounds, it ends up in a state &lt;em&gt;sn&lt;/em&gt; that holds two sets: a set of dependencies still to be resolved and a set of dependencies already resolved and included in the application stack.&lt;/p&gt;

&lt;p&gt;In each state &lt;em&gt;sn&lt;/em&gt;, the resolver can take an &lt;em&gt;action&lt;/em&gt; that corresponds to making a &lt;em&gt;decision&lt;/em&gt; on which dependency should be included in the application stack. An example is shown in the figure down below — the resolver can take the action to resolve &lt;code&gt;jinja2==2.10.2&lt;/code&gt; coming from &lt;a href="https://pypi.org/" rel="noopener noreferrer"&gt;PyPI&lt;/a&gt;. By doing so, the resolver adds &lt;code&gt;jinja2==2.10.2&lt;/code&gt; to the resolved dependencies set and adds all the dependencies on which &lt;code&gt;jinja2==2.10.2&lt;/code&gt; directly depends to the unresolved dependencies set (respecting the version range specifications).&lt;/p&gt;
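
&lt;p&gt;The state transition described above can be sketched in a few lines of Python (a minimal illustration only; the class and attribute names are invented, not Thoth’s actual implementation):&lt;/p&gt;

```python
class State:
    """One resolver state: what is still unresolved and what is resolved."""

    def __init__(self, unresolved, resolved=None):
        # package names yet to be resolved
        self.unresolved = set(unresolved)
        # (package, version) pairs already included in the stack
        self.resolved = set(resolved or ())

    def take_action(self, package, version, direct_deps):
        # resolve one package: move it to the resolved set and add its
        # direct dependencies to the unresolved set
        new_state = State(self.unresolved, self.resolved)
        new_state.unresolved.discard(package)
        new_state.resolved.add((package, version))
        new_state.unresolved.update(direct_deps)
        return new_state

# initial state: the direct requirements of the application
s0 = State(unresolved={"jinja2"})
s1 = s0.take_action("jinja2", "2.10.2", {"markupsafe"})
```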

&lt;p&gt;As &lt;code&gt;jinja2==2.10.2&lt;/code&gt; can affect our application stack positively or negatively based on the knowledge we have about this dependency (e.g. build-time errors &lt;a href="https://dev.to/fridex/how-to-beat-python-s-pip-inspecting-the-quality-of-machine-learning-software-1pkp"&gt;spotted in our software inspections&lt;/a&gt;), we can respect this fact by propagating the "&lt;em&gt;reward signal&lt;/em&gt;" that corresponds to the action taken.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fllhdjnjqaeeyrjbm4xto.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fllhdjnjqaeeyrjbm4xto.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Decision-making process — resolving dependencies by walking through the dependency graph and retrieving the reward signal. This process can be modeled as an MDP. Image by the author.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The accumulated reward signal across all the actions taken to reach a final state (all the packages included in the software stack/computed lock file) then corresponds to the overall software stack quality (i.e. the software stack score).&lt;/p&gt;
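
&lt;p&gt;As a toy illustration of this scoring (the packages and reward values below are made up), the software stack score is simply the sum of the per-action reward signals:&lt;/p&gt;

```python
# hypothetical per-action reward signals (values invented)
rewards = {
    ("jinja2", "2.10.2"): 1.0,     # no known issues
    ("markupsafe", "1.1.1"): 0.5,  # e.g. a known minor issue lowers the reward
}

def stack_score(actions):
    # actions: the (package, version) decisions taken while walking the graph
    return sum(rewards.get(action, 0.0) for action in actions)

score = stack_score([("jinja2", "2.10.2"), ("markupsafe", "1.1.1")])
```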

&lt;h1&gt;
  
  
  Reinforcement learning-based dependency resolution and abstractions
&lt;/h1&gt;

&lt;p&gt;The resolution process can be seen as communication between abstractions described below (following &lt;a href="https://en.wikipedia.org/wiki/Object-oriented_programming" rel="noopener noreferrer"&gt;object-oriented programming paradigm&lt;/a&gt;).&lt;/p&gt;

&lt;h2&gt;
  
  
  Resolver
&lt;/h2&gt;

&lt;p&gt;... is an abstraction that can resolve software stacks following rules (e.g. how dependencies should be resolved respecting Python packaging standards). It uses &lt;em&gt;Predictor&lt;/em&gt; to help with guiding which dependencies should be resolved while traversing the dependency graph (we do not need to resolve the latest packages necessarily). &lt;em&gt;Resolver&lt;/em&gt; also triggers the &lt;em&gt;Software stack resolution pipeline&lt;/em&gt; to compute the immediate reward signal that is subsequently forwarded to &lt;em&gt;Predictor&lt;/em&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Predictor
&lt;/h2&gt;

&lt;p&gt;… helps &lt;em&gt;Resolver&lt;/em&gt; to resolve software stacks — acts as a "&lt;em&gt;decision-maker&lt;/em&gt;". It selects dependencies that should be included in a software stack to deliver the required quality (e.g. stable software, secure software, latest software, ...). By selecting packages and observing the reward signal, it learns which packages should be included in a software stack.&lt;/p&gt;

&lt;h2&gt;
  
  
  Software stack resolution pipeline
&lt;/h2&gt;

&lt;p&gt;... is a scoring pipeline that judges actions made by the &lt;em&gt;Predictor&lt;/em&gt;. The output of this pipeline is primarily the "&lt;em&gt;reward signal&lt;/em&gt;" that is computed in the pipeline units. This resolution pipeline can be dynamically constructed on each run to respect user needs (e.g. different pipeline units to deliver "&lt;em&gt;secure software&lt;/em&gt;" in opposite to "&lt;em&gt;well-performing software&lt;/em&gt;" given the operating system and hardware used to run the application).&lt;/p&gt;
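
&lt;p&gt;The communication between the three abstractions can be sketched as follows (a schematic only; the real interfaces in Thoth differ, and the random predictor and the trivial scoring pipeline are placeholders):&lt;/p&gt;

```python
import random

class Predictor:
    def select(self, unresolved):
        # decision-maker: a random choice stands in for a learned policy
        return random.choice(sorted(unresolved))

    def learn(self, action, reward):
        # a real predictor would update its value estimates here
        pass

def resolution_pipeline(action):
    # stands in for the dynamically constructed scoring pipeline units
    return 1.0 if action != "flask" else -1.0

def resolver(requirements, predictor):
    unresolved, resolved, total = set(requirements), [], 0.0
    while unresolved:
        action = predictor.select(unresolved)   # Predictor decides
        unresolved.discard(action)
        resolved.append(action)
        reward = resolution_pipeline(action)    # pipeline scores the action
        predictor.learn(action, reward)         # reward signal forwarded back
        total = total + reward
    return resolved, total

stack, score = resolver({"jinja2", "markupsafe"}, Predictor())
```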

&lt;h1&gt;
  
  
  Monte Carlo tree search and Temporal Difference learning in a resolution process
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/Simulated_annealing" rel="noopener noreferrer"&gt;Simulated annealing&lt;/a&gt; in its &lt;a href="https://en.wikipedia.org/wiki/Adaptive_simulated_annealing" rel="noopener noreferrer"&gt;adaptive form&lt;/a&gt; (ASA) was used as the first type of Predictor in the resolution process. Even though the ASA based predictor does not learn anything about the state space of possible software stacks, it gave a base for resolving software stacks that had many times higher quality than the "&lt;em&gt;latest&lt;/em&gt;" software (as resolved by Pipenv or pip).&lt;/p&gt;

&lt;p&gt;The next natural step for the resolution process was to learn from actions taken during the resolution process. &lt;a href="https://en.wikipedia.org/wiki/Temporal_difference_learning" rel="noopener noreferrer"&gt;Temporal Difference learning&lt;/a&gt; and, later, &lt;a href="https://en.wikipedia.org/wiki/Monte_Carlo_tree_search" rel="noopener noreferrer"&gt;Monte Carlo tree search&lt;/a&gt; principles were used for implementing the next predictor types. As there is no real opponent to play against (unlike in game-playing reinforcement learning algorithms), formulas like &lt;em&gt;UCB1&lt;/em&gt; could not be applied in their direct form. To balance &lt;a href="https://en.wikipedia.org/wiki/Monte_Carlo_tree_search#Exploration_and_exploitation" rel="noopener noreferrer"&gt;exploration and exploitation&lt;/a&gt;, ideas from ASA were reused. &lt;em&gt;Time&lt;/em&gt; became the real opponent, keeping the resolution process responsive enough for users. The number of software stacks successfully resolved so far or the number of iterations done in the resolver became attributes for balancing &lt;a href="https://en.wikipedia.org/wiki/Monte_Carlo_tree_search#Exploration_and_exploitation" rel="noopener noreferrer"&gt;exploration and exploitation&lt;/a&gt; (possibly among other attributes).&lt;/p&gt;
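
&lt;p&gt;A hypothetical sketch of this balancing (not the actual adaptive simulated annealing formulas used in Thoth): a "temperature" derived from the resolver’s progress decides whether to explore a random candidate or exploit the best-scored one seen so far:&lt;/p&gt;

```python
import random

def temperature(iterations_done, iterations_limit):
    # cools down as the resolver approaches its limit: time is the opponent
    return max(0.0, 1.0 - iterations_done / iterations_limit)

def pick(candidates, scores, iterations_done, iterations_limit, rng=random):
    t = temperature(iterations_done, iterations_limit)
    if rng.random() > t:
        # low temperature: mostly exploit the best-scored candidate
        return max(candidates, key=lambda c: scores.get(c, 0.0))
    # high temperature: explore a random candidate
    return rng.choice(candidates)
```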

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/WEJ65Rvj3lc"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;&lt;em&gt;A video that includes core principles of Python dependency resolution and reinforcement learning dependency resolution principles.&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/thoth-station/talks/blob/master/2020-09-25-devconf-us/reinforcement_learning_based_dependency_resolution.pdf" rel="noopener noreferrer"&gt;a link to slides used during the talk&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The video presented above was &lt;a href="https://devconfus2020.sched.com/event/dkYC/reinforcement-learning-based-dependency-resolution" rel="noopener noreferrer"&gt;introduced as part of DevConf.US 2020&lt;/a&gt; and demonstrates these principles in more depth. &lt;a href="https://www.devconf.info/" rel="noopener noreferrer"&gt;Check out the linked annual event&lt;/a&gt;, taking place in the USA, the Czech Republic, and India each year (this year as a virtual event). A lot of cool topics can be explored and you can also participate next year: the event is open, just like open source.&lt;/p&gt;

&lt;p&gt;See &lt;a href="https://www.devconf.info/" rel="noopener noreferrer"&gt;devconf.info&lt;/a&gt; for more info.&lt;/p&gt;

&lt;h2&gt;
  
  
  Approximating maxima of an N-dimensional function using dimension addition and reinforcement learning
&lt;/h2&gt;

&lt;p&gt;See also "&lt;a href="https://towardsdatascience.com/approximating-maxima-of-an-n-dimensional-function-using-dimension-addition-cb79e910fa2b" rel="noopener noreferrer"&gt;Approximating maxima of an N-dimensional function using dimension addition and reinforcement learning&lt;/a&gt;" to get more insights from a slightly different perspective.&lt;/p&gt;




&lt;h1&gt;
  
  
  Project Thoth
&lt;/h1&gt;

&lt;p&gt;Project Thoth is an application that aims to help Python developers. If you wish to be updated on any improvements and any progress we make in project Thoth, feel free to subscribe to our YouTube channel where we post updates as well as recordings from scrum demos. Check also &lt;a href="https://twitter.com/thothstation" rel="noopener noreferrer"&gt;our Twitter&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Stay tuned for any new updates!&lt;/p&gt;

</description>
      <category>python</category>
      <category>machinelearning</category>
      <category>datascience</category>
      <category>opensource</category>
    </item>
    <item>
      <title>micropipenv: the one installation tool that covers Pipenv, Poetry and pip-tools</title>
      <dc:creator>Fridolín Pokorný</dc:creator>
      <pubDate>Tue, 03 Nov 2020 21:06:21 +0000</pubDate>
      <link>https://dev.to/fridex/micropipenv-the-one-installation-tool-that-covers-pipenv-poetry-and-pip-tools-3ee7</link>
      <guid>https://dev.to/fridex/micropipenv-the-one-installation-tool-that-covers-pipenv-poetry-and-pip-tools-3ee7</guid>
      <description>&lt;p&gt;In this article, we will take a look at a tool called "&lt;a href="https://github.com/thoth-station/micropipenv" rel="noopener noreferrer"&gt;micropipenv&lt;/a&gt;". Its main goal is to serve as a common layer for installing Python dependencies as specified by &lt;a href="https://pypi.org/project/pip" rel="noopener noreferrer"&gt;pip&lt;/a&gt;, &lt;a href="https://pypi.org/project/pip-tools" rel="noopener noreferrer"&gt;pip-tools&lt;/a&gt;, &lt;a href="https://pypi.org/project/pipenv" rel="noopener noreferrer"&gt;Pipenv&lt;/a&gt; or &lt;a href="https://pypi.org/project/poetry" rel="noopener noreferrer"&gt;Poetry&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F4jq1tgxu0jewcjsjc116.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F4jq1tgxu0jewcjsjc116.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you are a Python developer, you’ve probably experienced the dependency hell you can easily end up in with Python’s packaging. A tool called &lt;code&gt;pip&lt;/code&gt; has been around for quite some time, but it is not sufficient on its own if you develop or maintain any larger Python-based application.&lt;/p&gt;

&lt;h1&gt;
  
  
  pip-tools
&lt;/h1&gt;

&lt;p&gt;A project called &lt;a href="http://pypi.org/project/pip-tools" rel="noopener noreferrer"&gt;pip-tools&lt;/a&gt; tries to address the lack of dependency management in pip. It uses two text files - &lt;code&gt;requirements.in&lt;/code&gt; and &lt;code&gt;requirements.txt&lt;/code&gt;. The first file, &lt;code&gt;requirements.in&lt;/code&gt;, managed by a user, states the direct dependencies of an application with version range specifications. The latter one, &lt;code&gt;requirements.txt&lt;/code&gt;, managed by &lt;code&gt;pip-tools&lt;/code&gt;, acts as a lock file stating all the packages in the specific versions necessary to run the application (an analogy to &lt;code&gt;npm-shrinkwrap.json&lt;/code&gt; or &lt;code&gt;package-lock.json&lt;/code&gt; from the npm ecosystem).&lt;/p&gt;
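
&lt;p&gt;To make the two files concrete, an illustrative &lt;code&gt;requirements.in&lt;/code&gt; and the lock file that &lt;code&gt;pip-compile&lt;/code&gt; would derive from it might look roughly like this (package versions invented):&lt;/p&gt;

```plaintext
# requirements.in - maintained by you: direct dependencies, version ranges
jinja2>=2.10

# requirements.txt - generated by pip-compile: everything pinned
jinja2==2.11.2
markupsafe==1.1.1         # via jinja2
```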

&lt;p&gt;To use &lt;code&gt;pip-tools&lt;/code&gt;, you need to explicitly maintain a Python virtual environment (if you need one). Starting with Python 3.3, you can issue the following commands to create and activate one:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python3 -m venv venv/
source venv/bin/activate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After you have created the virtual environment, commands &lt;code&gt;pip-compile&lt;/code&gt; and &lt;code&gt;pip-sync&lt;/code&gt; will become your friends.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fghuqi26hct9ei00bttbd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fghuqi26hct9ei00bttbd.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;pip-tools workflow as described in pip-tools package description available on PyPI&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;For dependencies needed to execute your test suite, &lt;code&gt;pip-tools&lt;/code&gt; introduced a convention using &lt;code&gt;dev-requirements.in&lt;/code&gt; and &lt;code&gt;dev-requirements.txt&lt;/code&gt;. The semantics can be analogically deduced.&lt;/p&gt;

&lt;h2&gt;
  
  
  A note on &lt;code&gt;setup.py&lt;/code&gt; and &lt;code&gt;requirements.txt&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;You have probably seen a &lt;code&gt;requirements.txt&lt;/code&gt; file used with &lt;code&gt;setup.py&lt;/code&gt;. Note the difference from the &lt;code&gt;pip-tools&lt;/code&gt; usage, which can be misleading for newcomers. If you want to publish a library, you don’t want to restrict all the versions by pinning transitive dependencies to specific versions, as you don’t know what the resolved software stack will look like on the application side. The final resolution should always happen on the application level, not on the library level. You, as a library maintainer, just want to provide information about the compatibility of your library with the dependencies it uses.&lt;/p&gt;
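
&lt;p&gt;To make the contrast concrete (package names and versions below are invented): a library declares compatibility ranges, while the application’s lock file pins exact versions, including transitive dependencies:&lt;/p&gt;

```python
# what a library declares (setup.py install_requires style): ranges only
library_install_requires = ["jinja2>=2.10"]

# what an application's lock file pins (requirements.txt style): exact
# versions for everything, including transitive dependencies
application_lock = ["jinja2==2.11.2", "markupsafe==1.1.1"]

# the final resolution happens on the application level: every lock file
# entry is an exact pin
assert all("==" in entry for entry in application_lock)
```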

&lt;h1&gt;
  
  
  Pipenv
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://twitter.com/kennethreitz" rel="noopener noreferrer"&gt;Kenneth Reitz&lt;/a&gt;, the author of one of the most popular Python library — &lt;a href="https://pypi.org/project/requests/" rel="noopener noreferrer"&gt;requests&lt;/a&gt;, introduced a project called &lt;a href="https://pipenv.pypa.io/en/latest/" rel="noopener noreferrer"&gt;Pipenv&lt;/a&gt; in &lt;a href="https://www.kennethreitz.org/essays/announcing-pipenv" rel="noopener noreferrer"&gt;January 2017&lt;/a&gt;. The project gained popularity and attention from the community very quickly.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fjtqw85b2ugkv0047rxja.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fjtqw85b2ugkv0047rxja.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Pipenv: Python Dev Workflow for Humans&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The main aim of the project was to simplify dependency management and make it more user-friendly. It helped to maintain a virtual environment and manage dependencies with one single command, using newly introduced files &lt;code&gt;Pipfile&lt;/code&gt; (TOML) and &lt;code&gt;Pipfile.lock&lt;/code&gt; (JSON).&lt;/p&gt;
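
&lt;p&gt;An illustrative &lt;code&gt;Pipfile&lt;/code&gt; might look like this (the package entries are examples); running &lt;code&gt;pipenv lock&lt;/code&gt; produces the matching, fully pinned &lt;code&gt;Pipfile.lock&lt;/code&gt;:&lt;/p&gt;

```toml
[[source]]
url = "https://pypi.org/simple"
verify_ssl = true
name = "pypi"

[packages]
jinja2 = ">=2.10"

[dev-packages]
pytest = "*"
```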

&lt;h1&gt;
  
  
  Pipenv in our team
&lt;/h1&gt;

&lt;p&gt;We, at Red Hat, adopted Pipenv and made it one of the options for dependency management during deployment in the &lt;a href="https://github.com/openshift/source-to-image" rel="noopener noreferrer"&gt;OpenShift’s Source-To-Image&lt;/a&gt; build process (see for example &lt;a href="https://github.com/sclorg/s2i-python-container/tree/master/3.7" rel="noopener noreferrer"&gt;Fedora-based Python container images&lt;/a&gt; or &lt;a href="https://github.com/thoth-station/s2i-thoth" rel="noopener noreferrer"&gt;Thoth’s Python s2i container images&lt;/a&gt;). It gained popularity also in our &lt;a href="https://github.com/aicoe" rel="noopener noreferrer"&gt;AICoE&lt;/a&gt; and &lt;a href="https://github.com/thoth-station" rel="noopener noreferrer"&gt;Thoth team&lt;/a&gt; where de-facto all the repositories with Python source code use Pipenv for dependency management.&lt;/p&gt;

&lt;h1&gt;
  
  
  Poetry
&lt;/h1&gt;

&lt;p&gt;Another option is to use &lt;a href="http://pypi.org/project/poetry" rel="noopener noreferrer"&gt;Poetry&lt;/a&gt;. It looks like Poetry attracted the Python community, especially during the silent phase of Pipenv. Poetry uses a different lock file format than Pipenv and is not compatible with Pipenv or pip at all.&lt;/p&gt;

&lt;h1&gt;
  
  
  micropipenv
&lt;/h1&gt;

&lt;p&gt;Some of these requirements and ideas led to the introduction of a new tool called &lt;a href="http://pypi.org/project/micropipenv" rel="noopener noreferrer"&gt;micropipenv&lt;/a&gt;. No, there is no intention to introduce another &lt;code&gt;pip&lt;/code&gt;, &lt;code&gt;pip-tools&lt;/code&gt;, &lt;code&gt;Poetry&lt;/code&gt; or &lt;code&gt;Pipenv&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fienmpp1jyvupsivjyms1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fienmpp1jyvupsivjyms1.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;a href="https://xkcd.com/1987/" rel="noopener noreferrer"&gt;https://xkcd.com/1987/&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This lightweight implementation (one file, around 1200 LOC with comments) has one optional dependency and can install your requirements from files that are managed using Pipenv, Poetry, pip-tools, or a simple &lt;code&gt;requirements.txt&lt;/code&gt; file as commonly used together with &lt;code&gt;setup.py&lt;/code&gt;. The only thing you need to do is issue:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;micropipenv install
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The tool automatically detects what type of requirements file or lock file you use for managing your dependencies and performs the desired installation.&lt;/p&gt;

&lt;p&gt;Moreover, the tool offers a simple conversion feature where any lock file can be transformed to &lt;code&gt;requirements.txt&lt;/code&gt; and/or &lt;code&gt;requirements.in&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;To produce a pip-tools style &lt;code&gt;requirements.in&lt;/code&gt; and &lt;code&gt;requirements.txt&lt;/code&gt; you can simply perform the following commands, assuming you have Pipenv or Poetry lock files present in the directory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;micropipenv requirements --no-dev &amp;gt; requirements.txt
micropipenv requirements --no-dev --only-direct &amp;gt; requirements.in
micropipenv requirements --no-default &amp;gt; dev-requirements.txt
micropipenv requirements --no-default --only-direct &amp;gt; dev-requirements.in
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you just want to install dependencies, you don’t need to install micropipenv at all. You can simply download it and let it do its one-time job:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl https://raw.githubusercontent.com/thoth-station/micropipenv/master/micropipenv.py | python3 - install
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;See &lt;a href="https://github.com/thoth-station/micropipenv" rel="noopener noreferrer"&gt;documentation and sources&lt;/a&gt; for more info.&lt;/p&gt;

&lt;h1&gt;
  
  
  Thoth
&lt;/h1&gt;

&lt;p&gt;One of the ongoing efforts in Red Hat’s Office of the CTO is a project called Thoth. The newly introduced tool in this article, micropipenv, was born in this project. Check one of our publicly available demos to see micropipenv in action (the &lt;a href="https://www.youtube.com/watch?v=I-QC83BcLuo&amp;amp;t=9m" rel="noopener noreferrer"&gt;micropipenv demo starts at 9:00&lt;/a&gt;):&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/I-QC83BcLuo"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;You can follow our &lt;a href="https://www.youtube.com/channel/UClUIDuq_hQ6vlzmqM59B2Lw" rel="noopener noreferrer"&gt;YouTube channel&lt;/a&gt; for more updates.&lt;/p&gt;




&lt;h1&gt;
  
  
  Why micropipenv?
&lt;/h1&gt;

&lt;p&gt;The main reason behind micropipenv was to reduce the maintenance cost of Pipenv during its silent phase. However, it turned out to be a good idea to have such a minimalistic tool for installing dependencies, especially when it comes to containerized applications. The main advantage of micropipenv turned out to be its size.&lt;br&gt;
When deploying applications in containerized environments, it’s a really good idea to maintain a lock file for the application. As the lock file states the whole dependency stack already resolved, there is no reason to ship Poetry or Pipenv in the container image. A tool that just installs the dependencies from any supplied lock file seems like a minimalistic way to reduce the container image size and the software present in it (and thus shipped with the application).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/fridex/s2i-example-micropipenv" rel="noopener noreferrer"&gt;A simple size comparison done a while back showed approximately 30.4MiB difference&lt;/a&gt; when Pipenv was not installed into the containerized environment in comparison to a single file approach using micropipenv.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;micropipenv can read in &lt;code&gt;Pipfile.lock&lt;/code&gt;, &lt;code&gt;requirements.txt&lt;/code&gt; or &lt;code&gt;poetry.lock&lt;/code&gt; stating already resolved software stack and install it using pip.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The design of the CLI made micropipenv a straightforward tool to make a compatibility layer between all popular Python dependency management tools available out there in the open-source world.&lt;/p&gt;




&lt;h1&gt;
  
  
  Project Thoth
&lt;/h1&gt;

&lt;p&gt;Project Thoth is an application that aims to help Python developers. If you wish to be updated on any improvements and any progress we make in project Thoth, feel free to subscribe to our &lt;a href="https://www.youtube.com/channel/UClUIDuq_hQ6vlzmqM59B2Lw" rel="noopener noreferrer"&gt;YouTube channel&lt;/a&gt; where we post updates as well as recordings from scrum demos. We also have a &lt;a href="https://twitter.com/thothstation" rel="noopener noreferrer"&gt;Twitter account&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>python</category>
    </item>
    <item>
      <title>How to beat Python’s pip: Dependency Monkey inspecting the quality of TensorFlow dependencies</title>
      <dc:creator>Fridolín Pokorný</dc:creator>
      <pubDate>Sun, 01 Nov 2020 12:24:58 +0000</pubDate>
      <link>https://dev.to/fridex/how-to-beat-python-s-pip-dependency-monkey-inspecting-the-quality-of-tensorflow-dependencies-6fc</link>
      <guid>https://dev.to/fridex/how-to-beat-python-s-pip-dependency-monkey-inspecting-the-quality-of-tensorflow-dependencies-6fc</guid>
      <description>&lt;p&gt;In this article, we will &lt;a href="https://dev.to/fridex/how-to-beat-python-s-pip-inspecting-the-quality-of-machine-learning-software-1pkp"&gt;continue inspecting the quality of the software&lt;/a&gt;. Instead of selecting packages to be checked manually, we will use a component called "&lt;em&gt;Dependency Monkey&lt;/em&gt;" which can resolve software stacks following programmed rules and verify the application correctness.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--cGYxyL4P--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/i/f3jpw0k42skkp0vrnhue.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--cGYxyL4P--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/i/f3jpw0k42skkp0vrnhue.png" alt="Alt Text" width="800" height="484"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Neurathen Castle located in the Bastei rocks near Rathen in Saxon Switzerland, Germany. Image by the author.&lt;/em&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Why different combinations of packages?
&lt;/h1&gt;

&lt;p&gt;In the &lt;a href="https://dev.to/fridex/how-to-beat-python-s-pip-inspecting-the-quality-of-machine-learning-software-1pkp"&gt;previous article&lt;/a&gt;, but mainly &lt;a href="https://dev.to/fridex/how-to-beat-python-s-pip-a-brief-intro-4bec"&gt;in the introductory article to "How to beat Python’s pip" series&lt;/a&gt;, we have described a state space of all the possible software stacks that can be resolved for an application stack given the requirements on libraries. Each resolved software stack in such state space can be scored by a scoring function that can compute "&lt;em&gt;how good the given software is&lt;/em&gt;". In the figure below, we can see an interpolated scoring function for resolved software stacks created out of two libraries &lt;code&gt;simplelib&lt;/code&gt; and &lt;code&gt;anotherlib&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--63OkWj9a--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/i/suq3tzu6vyn3wsiozqqo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--63OkWj9a--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/i/suq3tzu6vyn3wsiozqqo.png" alt="Alt Text" width="800" height="440"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The image above shows an interpolated score function for a state-space made when installing two dependencies "&lt;code&gt;simplelib&lt;/code&gt;" and "&lt;code&gt;anotherlib&lt;/code&gt;" in different versions (valid combinations of different versions installed together).&lt;/p&gt;

&lt;p&gt;The interpolated function above shows a score for a two-dimensional state space (one dimension for each package). As we add more packages to an application, this state space becomes larger and larger (especially considering transitive dependencies that need to be added as well to have a valid software stack).&lt;/p&gt;

&lt;p&gt;For real-world applications, we can very easily get tens of dimensions (e.g. by installing &lt;code&gt;tensorflow==2.3.0&lt;/code&gt; we include 36 distinct packages in different versions, thus 36 dimensions plus one dimension for the scoring function). These dimensions introduce distinct input features that affect application behavior as reflected by the scoring function. As we already know &lt;a href="https://dev.to/fridex/how-to-beat-python-s-pip-inspecting-the-quality-of-machine-learning-software-1pkp"&gt;based on our last article&lt;/a&gt;, any issue in any of these packages can introduce a problem in our application (run time or build time).&lt;/p&gt;

&lt;p&gt;All the possible version combinations (all the possible 36-dimensional vectors, following our example) are impossible to test in a reasonable time, so we need some smart picking of which versions should be included in the final resolved stack. One slicing mechanism is the actual resolver — it can slice possible resolutions respecting the version range specifications of packages in the dependency graph. But how do we limit the number of possible stacks to a reasonable sample even more?&lt;/p&gt;
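
&lt;p&gt;A back-of-the-envelope calculation shows why exhaustive testing is hopeless (the per-package version count below is hypothetical):&lt;/p&gt;

```python
import math

# 36 dependency "dimensions" (as with tensorflow==2.3.0) with, say,
# 10 candidate versions each (the per-package count is hypothetical)
versions_per_package = [10] * 36
combinations = math.prod(versions_per_package)
# 10**36 version combinations: impossible to inspect exhaustively
```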

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--x89Q-mvN--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/i/ygf6ojywd9cdj2hicf4d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--x89Q-mvN--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/i/ygf6ojywd9cdj2hicf4d.png" alt="Alt Text" width="133" height="129"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Packages "B" in versions &amp;lt;1.5.0 will be removed based on resolver — they are not valid resolutions following the version range specification of package "A". Hence, they will limit the size of the corresponding feature "B".&lt;/em&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  A smart offline resolver
&lt;/h1&gt;

&lt;p&gt;Besides removing packages based on version range specifications in the resolver, a component called &lt;em&gt;Dependency Monkey&lt;/em&gt; is capable of using "&lt;em&gt;pipeline units&lt;/em&gt;". The whole resolution process is treated as a pipeline made out of pipeline units of different types that decide whether packages should be considered during the resolution: in other words, whether resolved stacks formed out of the selected packages should be inspected.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;An example can be an inspection of a TensorFlow software stack. If we want to test a specific TensorFlow with &lt;a href="https://pypi.org/project/numpy"&gt;NumPy&lt;/a&gt; versions for compatibility, we can skip already tested software stack combinations (e.g. based on the queries to our database with previous test results).&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Pipeline units create a programmable interface to the resolver, which can act based on the pipeline units’ decisions.&lt;/p&gt;
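
&lt;p&gt;A schematic pipeline unit might look like this (the interface and names are illustrative, not Thoth’s actual API): it drops candidate package versions whose combination with the given TensorFlow version has already been inspected:&lt;/p&gt;

```python
# combinations already inspected (stored, e.g., in a results database)
already_inspected = {("tensorflow", "2.1.0", "numpy", "1.17.0")}

def skip_tested_unit(tf_version, candidate):
    package, version = candidate
    # returning False removes the candidate from this resolution
    return ("tensorflow", tf_version, package, version) not in already_inspected

candidates = [("numpy", "1.17.0"), ("numpy", "1.18.0")]
to_test = [c for c in candidates if skip_tested_unit("2.1.0", c)]
```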

&lt;h1&gt;
  
  
  Amun inspections: revisited
&lt;/h1&gt;

&lt;p&gt;In the previous article, "&lt;a href="https://dev.to/fridex/how-to-beat-python-s-pip-inspecting-the-quality-of-machine-learning-software-1pkp"&gt;How to beat Python’s pip: Inspecting the quality of machine learning software&lt;/a&gt;", we introduced a service called &lt;a href="https://github.com/thoth-station/amun-api/"&gt;Amun&lt;/a&gt; that can run software respecting a specification that states how the application is assembled and run. Besides information about the operating system or hardware used, it also accepts a list of packages that should be installed in order to build and run the software.&lt;/p&gt;

&lt;p&gt;As &lt;em&gt;Dependency Monkey&lt;/em&gt; can resolve Python software stacks, it becomes one of the users of the Amun service. Simply put, if &lt;em&gt;Dependency Monkey&lt;/em&gt; resolves a Python software stack that it considers a valid candidate for testing, it submits it to &lt;a href="https://github.com/thoth-station/amun-api/"&gt;Amun&lt;/a&gt; to inspect its quality.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;We use "quality" to describe a certain aspect of the software. One such quality aspect can be performance or other runtime behavior. The fact an application fails to build is also an indicator of the software stack quality.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h1&gt;
  
  
  Dependency Monkey’s resolution pipeline
&lt;/h1&gt;

&lt;p&gt;One can see &lt;em&gt;Dependency Monkey&lt;/em&gt; as a resolver that accepts an input vector and resolves one or multiple software stacks considering the input vector and aggregated knowledge about the software and packages forming the software stacks. This aggregated knowledge can accumulate information about packages or package combinations seen in software stacks.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--M-O8gU6h--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/i/1dro18wivuoc42j7i799.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--M-O8gU6h--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/i/1dro18wivuoc42j7i799.png" alt="Alt Text" width="800" height="262"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Dependency Monkey’s resolution pipeline is formed out of pipeline units that help resolve Python software stacks based on the input vector, considering the knowledge base.&lt;/p&gt;

&lt;h1&gt;
  
  
  Checking different package combinations in TensorFlow stacks
&lt;/h1&gt;

&lt;p&gt;Let’s check some dependencies of a TensorFlow stack (I used TensorFlow in version &lt;a href="https://pypi.org/project/tensorflow/2.1.0/"&gt;2.1.0&lt;/a&gt;; the dependency listing will differ across versions). If we take a &lt;a href="https://dev.to/fridex/how-to-beat-python-s-pip-solving-python-dependencies-2d6e"&gt;look at the direct dependencies of TensorFlow&lt;/a&gt;, we will find packages such as &lt;a href="https://pypi.org/project/h5py/"&gt;h5py&lt;/a&gt;, &lt;a href="https://pypi.org/project/opt-einsum/"&gt;opt-einsum&lt;/a&gt;, &lt;a href="https://pypi.org/project/scipy/"&gt;scipy&lt;/a&gt;, &lt;a href="https://pypi.org/project/Keras-Preprocessing/"&gt;Keras-Preprocessing&lt;/a&gt;, and &lt;a href="https://pypi.org/project/tensorboard/"&gt;tensorboard&lt;/a&gt; in specific versions. They share a common dependency, &lt;a href="https://pypi.org/project/numpy/"&gt;NumPy&lt;/a&gt;, which is also a direct dependency of TensorFlow itself (see &lt;a href="https://gist.github.com/fridex/681605f0bcf6437a71c5ed64883e0a24"&gt;this GitHub gist for the listing&lt;/a&gt;, which can change over time with new package releases). All the packages stated can be installed in different versions, and those versions can have different version range requirements on NumPy. The actual version of NumPy installed depends on the resolver and the resolution process, which can also take into account other libraries that the user requested to install (besides TensorFlow as a single direct dependency). It’s worth pointing out here that any issue in NumPy (even incompatibilities introduced by &lt;em&gt;overpinning&lt;/em&gt; or &lt;em&gt;underpinning&lt;/em&gt;) can &lt;a href="https://dev.to/fridex/how-to-beat-python-s-pip-inspecting-the-quality-of-machine-learning-software-1pkp"&gt;lead to a broken application&lt;/a&gt;. So let’s try to test the TensorFlow stack with different combinations of NumPy.&lt;/p&gt;
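&lt;p&gt;Finding the NumPy versions that satisfy all the version ranges the dependents declare can be sketched with pure Python; the specifiers and available versions below are made up for illustration, real ranges come from package metadata and differ across releases:&lt;/p&gt;

```python
def parse(version):
    """Turn "1.17.4" into a comparable tuple (1, 17, 4)."""
    return tuple(int(part) for part in version.split("."))

# Hypothetical NumPy version ranges declared by several dependents
# (lower bound inclusive, upper bound exclusive).
requirements = {
    "tensorflow": (parse("1.16.0"), parse("2.0.0")),  # numpy>=1.16.0,<2.0.0
    "scipy": (parse("1.13.3"), parse("2.0.0")),       # numpy>=1.13.3,<2.0.0
    "h5py": (parse("1.7.0"), parse("2.0.0")),         # numpy>=1.7.0,<2.0.0
}

available = ["1.13.3", "1.16.0", "1.17.4", "1.18.5", "2.0.0"]

def satisfies_all(version):
    """A NumPy version is a candidate only if every dependent accepts it."""
    v = parse(version)
    return all(low <= v < high for low, high in requirements.values())

print([v for v in available if satisfies_all(v)])  # ['1.16.0', '1.17.4', '1.18.5']
```

&lt;p&gt;Each surviving candidate yields a different, valid software stack; those are the combinations worth inspecting.&lt;/p&gt;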

&lt;p&gt;In the video below, you can see a brief walk-through of Dependency Monkey together with a service called &lt;a href="https://towardsdatascience.com/how-to-beat-pythons-pip-inspecting-the-quality-of-machine-learning-software-f1a028f0c42a"&gt;Amun&lt;/a&gt;. In the first part of the demo (&lt;a href="https://www.youtube.com/watch?v=S3hFn8KRsKc&amp;amp;t=19m25s"&gt;starting at 19:25&lt;/a&gt;), &lt;em&gt;Dependency Monkey&lt;/em&gt; resolves software stacks considering aggregated knowledge (one example of such knowledge is the &lt;a href="https://dev.to/fridex/how-to-beat-python-s-pip-solving-python-dependencies-2d6e"&gt;dependency information needed during the resolution&lt;/a&gt;) and &lt;a href="https://dev.to/fridex/how-to-beat-python-s-pip-inspecting-the-quality-of-machine-learning-software-1pkp"&gt;submits these software stacks to Amun to inspect the quality of the software&lt;/a&gt;. The tested software stack is TensorFlow in version 2.1.0, using the &lt;a href="https://pypi.org/project/tensorflow/"&gt;build published on PyPI&lt;/a&gt;, with different combinations of NumPy resolved (all other packages in the application stack are kept at fixed versions while the NumPy version is adjusted, respecting the dependency graph).&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/S3hFn8KRsKc"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;A note on the video: dependencies that should be locked could also be stated in the direct dependency listing. Note, however, that by doing so the dependency will always be present in all the stacks, even when it would not otherwise be used, and it could affect the dependency graph. That’s why pinning of dependencies is performed on a pipeline unit level.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The second part of the demo (&lt;a href="https://www.youtube.com/watch?v=S3hFn8KRsKc&amp;amp;t=28m13s"&gt;starting at 28:13&lt;/a&gt;) shows a &lt;em&gt;Dependency Monkey&lt;/em&gt; resolution that randomly samples the state space of all the possible TensorFlow stacks. As we already know, this state space is too large, so checking all the combinations is impossible in a reasonable time. &lt;em&gt;Dependency Monkey&lt;/em&gt; randomly generates software stacks that are valid resolutions of the TensorFlow software stack and submits them to Amun, which verifies that the software stack builds and runs correctly.&lt;/p&gt;
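&lt;p&gt;Random sampling of such a state space can be sketched with the standard library; the candidate versions below are illustrative, as the real state space comes from the dependency graph and is vastly larger:&lt;/p&gt;

```python
import itertools
import random

# Hypothetical resolved version candidates per package; the real state
# space comes from the dependency graph and is much larger.
candidates = {
    "tensorflow": ["2.1.0"],
    "numpy": ["1.16.0", "1.17.4", "1.18.5"],
    "urllib3": ["1.24.3", "1.25.10"],
    "six": ["1.14.0", "1.15.0"],
}

all_stacks = list(itertools.product(*candidates.values()))
print(len(all_stacks))  # 1 * 3 * 2 * 2 = 12 possible stacks

random.seed(42)  # deterministic for the example
sample = random.sample(all_stacks, k=3)  # stacks submitted for inspection
for stack in sample:
    print(dict(zip(candidates, stack)))
```

&lt;p&gt;With real dependency graphs, the product of per-package candidates explodes combinatorially, which is why sampling, rather than exhaustive enumeration, is the practical choice.&lt;/p&gt;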

&lt;p&gt;Such random state space sampling can spot issues. One interesting issue in the TensorFlow 2.1 stack is the dependency &lt;code&gt;urllib3&lt;/code&gt; which, when installed in a specific version, can cause runtime errors on TensorFlow imports. See &lt;a href="https://thoth-station.ninja/j/tf_21_urllib3"&gt;this document for a detailed overview&lt;/a&gt;. Note that the installed version can also depend on other libraries an application uses besides TensorFlow, so applications can be affected by this issue even when it is not apparent from their direct dependencies.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/thoth-station/dependency-monkey-zoo/tree/master/tensorflow/inspection-2020-09-07"&gt;A link to Dependency Monkey configuration for demo part 1&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/thoth-station/dependency-monkey-zoo/tree/master/tensorflow/inspection-2020-09-04"&gt;A link to Dependency Monkey configuration for demo part 2&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://thoth-station.ninja/j/tf_21_urllib3"&gt;A link to the issue spotted with urllib3 and TensorFlow&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h1&gt;
  
  
  Project Thoth
&lt;/h1&gt;

&lt;p&gt;Project Thoth is an application that aims to help Python developers. If you wish to be updated on any improvements and any progress we make in project Thoth, feel free to subscribe to our &lt;a href="https://www.youtube.com/channel/UClUIDuq_hQ6vlzmqM59B2Lw"&gt;YouTube channel&lt;/a&gt; where we post updates as well as recordings from scrum demos. Check also &lt;a href="https://twitter.com/thothstation"&gt;our Twitter&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Stay tuned for any updates!&lt;/p&gt;

</description>
      <category>python</category>
      <category>datascience</category>
      <category>machinelearning</category>
      <category>opensource</category>
    </item>
    <item>
      <title>How to beat Python’s pip: Inspecting the quality of machine learning software</title>
      <dc:creator>Fridolín Pokorný</dc:creator>
      <pubDate>Mon, 26 Oct 2020 17:57:50 +0000</pubDate>
      <link>https://dev.to/fridex/how-to-beat-python-s-pip-inspecting-the-quality-of-machine-learning-software-1pkp</link>
      <guid>https://dev.to/fridex/how-to-beat-python-s-pip-inspecting-the-quality-of-machine-learning-software-1pkp</guid>
      <description>&lt;p&gt;Following &lt;a href="https://dev.to/fridex/how-to-beat-python-s-pip-solving-python-dependencies-2d6e"&gt;the previous article written about solving Python dependencies&lt;/a&gt;, we will take a look at the quality of software. This article will cover "inspections" of software stacks and will link a free dataset available on Kaggle. Even though the title says the quality of "machine learning software", principles and ideas can be reused for inspecting any software quality.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Ws55iauK--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/i/pe2j62vrk9lsa4674wzo.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Ws55iauK--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/i/pe2j62vrk9lsa4674wzo.jpeg" alt="Alt Text" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Application (Software &amp;amp; Hardware) Stack
&lt;/h1&gt;

&lt;p&gt;Let’s consider a Python machine learning application. This application can use a machine learning library, such as &lt;a href="https://www.tensorflow.org/"&gt;TensorFlow&lt;/a&gt;. TensorFlow is in that case a direct dependency of the application; by installing it, the machine learning application uses TensorFlow directly and TensorFlow’s dependencies indirectly. Examples of such indirect dependencies of our application are &lt;a href="https://pypi.org/project/numpy/"&gt;NumPy&lt;/a&gt; or &lt;a href="https://pypi.org/project/absl-py/"&gt;absl-py&lt;/a&gt;, which are used by TensorFlow.&lt;/p&gt;

&lt;p&gt;Our machine learning Python application and all the Python libraries run on top of a &lt;a href="https://developers.redhat.com/blog/2020/06/25/red-hat-enterprise-linux-8-2-brings-faster-python-3-8-run-speeds/"&gt;Python interpreter in some specific version&lt;/a&gt;. Moreover, they can use other additional native dependencies (provided by the operating system) such as &lt;a href="https://en.wikipedia.org/wiki/GNU_C_Library"&gt;glibc&lt;/a&gt; or &lt;a href="https://en.wikipedia.org/wiki/CUDA"&gt;CUDA&lt;/a&gt; (if running computations on GPU). To visualize this fact, let’s create a stack with all the items creating the application stack running on top of some hardware.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Eb8dSxlB--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/i/q22wuj0e5uhskskonyzo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Eb8dSxlB--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/i/q22wuj0e5uhskskonyzo.png" alt="Alt Text" width="428" height="474"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Note that an issue in any of the described layers can cause our Python application to misbehave, produce wrong output, raise runtime errors, or simply not start at all.&lt;/p&gt;

&lt;p&gt;Let’s try to identify any possible issues in the described stack by building the software and running it on our hardware. By doing so, we can spot possible issues before pushing our application to a production environment, or fine-tune the software so that we get the best possible performance out of our application on the available hardware.&lt;/p&gt;

&lt;h1&gt;
  
  
  On-demand software stack creation
&lt;/h1&gt;

&lt;p&gt;If our application depends on a TensorFlow release starting with version &lt;a href="https://pypi.org/project/tensorflow/2.0.0/"&gt;2.0.0&lt;/a&gt; (e.g. a requirement on the API offered, &lt;code&gt;tensorflow&amp;gt;=2.0.0&lt;/code&gt;), we can test our application with different versions of TensorFlow up to the &lt;a href="https://pypi.org/project/tensorflow/#history"&gt;2.3.0 release, the most recent on PyPI as of this writing&lt;/a&gt;. The same can be applied to transitive dependencies of TensorFlow, e.g. &lt;a href="https://pypi.org/project/absl-py/"&gt;absl-py&lt;/a&gt;, &lt;a href="https://pypi.org/project/numpy/"&gt;NumPy&lt;/a&gt;, or any other. A version change can be performed analogously for any other dependency in our software stack.&lt;/p&gt;

&lt;h2&gt;
  
  
  Dependency Monkey
&lt;/h2&gt;

&lt;p&gt;Note that a single version change can completely change (or even invalidate) which dependencies, in which versions, will be present in the application stack, given the dependency graph and the version range specifications of the libraries present in the software stack. To create a pinned-down list of packages in specific versions to be installed, a resolver needs to be run to resolve the packages and their version range requirements.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Do you remember &lt;a href="https://dev.to/fridex/how-to-beat-python-s-pip-a-brief-intro-4bec"&gt;the state space described in the first article of "How to beat Python’s pip" series&lt;/a&gt;? Dependency Monkey can in fact create the state space of all the possible software stacks that can be resolved respecting version range specifications. If the state space is too large to resolve in a reasonable time, it can be sampled.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--bByppDl4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/i/bnair2neojhx4lljqb24.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--bByppDl4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/i/bnair2neojhx4lljqb24.png" alt="Alt Text" width="700" height="385"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A component called "Dependency Monkey" is capable of creating different software stacks considering the dependency graph and version specifications of packages in the dependency graph. All of this is done offline, based on pre-computed results from &lt;a href="https://dev.to/fridex/how-to-beat-python-s-pip-solving-python-dependencies-2d6e"&gt;Thoth’s solver runs (see the previous article from the "How to beat Python’s pip" series)&lt;/a&gt;. The results of solver runs are synced into Thoth’s database so that they are available in a queryable form. Doing so enables Dependency Monkey to resolve software stacks at a fast pace (see a &lt;a href="https://www.youtube.com/watch?v=p0ECHhrxEq0"&gt;YouTube video on optimizing Thoth’s resolver&lt;/a&gt;). Moreover, the underlying algorithm can consider Python packages published on different Python indexes (&lt;a href="https://pypi.org/"&gt;besides PyPI&lt;/a&gt;, it can also use &lt;a href="https://tensorflow.pypi.thoth-station.ninja/"&gt;custom TensorFlow builds from an index such as the AICoE one&lt;/a&gt;). We will give a more in-depth explanation of Dependency Monkey in one of the upcoming articles. If you are too eager to wait, &lt;a href="https://thoth-station.ninja/docs/developers/adviser/dependency_monkey.html"&gt;feel free to browse its online documentation&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Amun API
&lt;/h2&gt;

&lt;p&gt;Now, let’s utilize a service called "&lt;a href="https://github.com/thoth-station/amun-api"&gt;Amun&lt;/a&gt;". This service was designed to accept a specification of the software stack and hardware and execute an application given the specification.&lt;/p&gt;

&lt;p&gt;Amun is an &lt;a href="https://www.openshift.com/"&gt;OpenShift&lt;/a&gt; cluster-native application that utilizes OpenShift features (such as builds, the container image registry, …) and &lt;a href="https://argoproj.github.io/"&gt;Argo Workflows&lt;/a&gt; to run the desired software on specific hardware using a specific software environment. The specification is accepted in a JSON format that is subsequently translated into the respective steps needed to test that the given stack builds and runs.&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/yeBjnZpdMwY"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;The video linked above shows how Amun inspections are run and how the created knowledge is aggregated using OpenShift, Argo Workflows, and Ceph. You can see different TensorFlow builds being inspected: &lt;code&gt;tensorflow&lt;/code&gt;, &lt;code&gt;tensorflow-cpu&lt;/code&gt;, &lt;code&gt;intel-tensorflow&lt;/code&gt;, and &lt;a href="https://tensorflow.pypi.thoth-station.ninja/"&gt;community builds of TensorFlow with AVX2 instruction set support available on the AICoE index&lt;/a&gt;.&lt;/p&gt;

&lt;h1&gt;
  
  
  Thoth’s inspection dataset on Kaggle
&lt;/h1&gt;

&lt;p&gt;We (Red Hat) have produced multiple inspections as part of project Thoth, in which we tested different TensorFlow releases and different TensorFlow builds.&lt;/p&gt;

&lt;p&gt;One such dataset is &lt;a href="https://www.kaggle.com/thothstation/thoth-performance-dataset-v10"&gt;Thoth’s performance dataset in version 1 on Kaggle&lt;/a&gt;. It consists of nearly 4,000 files capturing information about inspection runs of TensorFlow stacks. &lt;a href="https://www.kaggle.com/kerneler/starter-thoth-performance-dataset-v1-0-96cc92dd-0"&gt;A notebook published together with the dataset&lt;/a&gt; can help you explore it.&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/_tZo7eIOzJI"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;h1&gt;
  
  
  Project Thoth
&lt;/h1&gt;

&lt;p&gt;Project Thoth is an application that aims to help Python developers. If you wish to be updated on any improvements and progress we make in project Thoth, feel free to &lt;a href="https://www.youtube.com/channel/UClUIDuq_hQ6vlzmqM59B2Lw"&gt;subscribe to our YouTube channel&lt;/a&gt; where we post updates as well as recordings from scrum demos. Follow &lt;a href="http://twitter.com/thothstation"&gt;our Twitter&lt;/a&gt; as well if you want to stay informed.&lt;/p&gt;

&lt;p&gt;Stay tuned for any updates!&lt;/p&gt;

</description>
      <category>python</category>
      <category>machinelearning</category>
      <category>opensource</category>
      <category>datascience</category>
    </item>
    <item>
      <title>How to beat Python’s pip: Solving Python dependencies</title>
      <dc:creator>Fridolín Pokorný</dc:creator>
      <pubDate>Sat, 17 Oct 2020 18:59:09 +0000</pubDate>
      <link>https://dev.to/fridex/how-to-beat-python-s-pip-solving-python-dependencies-2d6e</link>
      <guid>https://dev.to/fridex/how-to-beat-python-s-pip-solving-python-dependencies-2d6e</guid>
      <description>&lt;p&gt;&lt;a href="https://dev.to/fridex/how-to-beat-python-s-pip-a-brief-intro-4bec"&gt;In the previous blog post&lt;/a&gt;, I’ve mentioned the start of the blog series I plan to publish about Python dependencies and how to deal with them. This article is the second one coming out of this series and it will cover obtaining information about Python dependencies. You’ll also gain access to our dataset published on Kaggle.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F3trhqvxb0hpl2ze0kbzl.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F3trhqvxb0hpl2ze0kbzl.jpeg" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Python dependencies
&lt;/h1&gt;

&lt;p&gt;Python’s packaging allows specifying dependencies &lt;a href="https://packaging.python.org/tutorials/packaging-projects/#creating-setup-py" rel="noopener noreferrer"&gt;using a &lt;code&gt;setup.py&lt;/code&gt; script&lt;/a&gt;. This script states all the metadata used by Python’s packaging tooling to seamlessly install Python distributions into environments. It has a pretty nice and straightforward structure and allows you to programmatically define all the needed bits when packaging your Python package.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Another popular way to write your Python package metadata is the &lt;code&gt;setup.cfg&lt;/code&gt; file. In contrast to &lt;code&gt;setup.py&lt;/code&gt;, this file is not Python source code but a static configuration file. &lt;a href="https://docs.python.org/3/distutils/configfile.html" rel="noopener noreferrer"&gt;Refer to distutils documentation for more info&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Having the ability to use a Python script to define all the packaging metadata offers great power. You can code basically anything you want and adjust your package metadata as desired during the &lt;code&gt;setup.py&lt;/code&gt; invocation. But as usual:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;With great power comes great responsibility.&lt;/p&gt;
&lt;/blockquote&gt;
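&lt;p&gt;To see that power concretely, here is a sketch of a &lt;code&gt;setup.py&lt;/code&gt; that computes its dependencies at runtime; the package name and version ranges are made up for illustration:&lt;/p&gt;

```python
# setup.py -- dependencies computed at runtime; this is exactly why the
# script has to be executed to learn what a package depends on.
import sys

from setuptools import find_packages, setup

install_requires = ["flask"]
if sys.version_info < (3, 8):
    # Hypothetical backport needed only on older Python interpreters.
    install_requires.append("importlib-metadata")

setup(
    name="myapp",  # illustrative package name
    version="0.1.0",
    packages=find_packages(),
    install_requires=install_requires,
    python_requires=">=3.6",
)
```

&lt;p&gt;Because the dependency list depends on the interpreter running the script, a static scan of this file cannot tell you what will actually be installed.&lt;/p&gt;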

&lt;h1&gt;
  
  
  Checking dependencies of a Python package
&lt;/h1&gt;

&lt;p&gt;If you visit &lt;a href="https://pypi.org/project/hexsticker/" rel="noopener noreferrer"&gt;any project page on PyPI&lt;/a&gt;, the Python Package Index, you’ll notice there is no dependency information. The great power behind the &lt;code&gt;setup.py&lt;/code&gt; script is the main reason why: if the dependencies are stated in a Python script, the script needs to be executed to obtain the dependency information. Okay, so let’s trigger the installation process of a Python package and see what dependencies are stated there, but wait… What operating system should we use? What Python interpreter version should we use? What native dependencies (such as &lt;a href="https://en.wikipedia.org/wiki/GNU_Compiler_Collection" rel="noopener noreferrer"&gt;gcc&lt;/a&gt; for native extensions) should be present in our environment? What CPU architecture? And… And…&lt;/p&gt;

&lt;p&gt;Obtaining information about a Python package turns out not to be that straightforward. Consider all the factors and variations that can be coded in the &lt;code&gt;setup.py&lt;/code&gt; script, which can, in turn, construct different sets of dependencies or lead to Python package installation issues. Refer to the article "&lt;a href="https://dustingram.com/articles/2018/03/05/why-pypi-doesnt-know-dependencies/" rel="noopener noreferrer"&gt;Why PyPI Doesn’t Know Your Projects’ Dependencies&lt;/a&gt;" written by Dustin Ingram, one of the maintainers of &lt;a href="https://github.com/pypa/warehouse" rel="noopener noreferrer"&gt;Warehouse&lt;/a&gt; (the software that powers PyPI), for more info on this.&lt;/p&gt;

&lt;h1&gt;
  
  
  Why PyPI doesn’t know your project's dependencies but Thoth does
&lt;/h1&gt;

&lt;p&gt;We, at Red Hat, have developed a tool that can check the dependencies of a desired Python package: it’s called &lt;a href="https://github.com/thoth-station/solver" rel="noopener noreferrer"&gt;thoth-solver&lt;/a&gt; (&lt;a href="https://pypi.org/project/thoth-solver/" rel="noopener noreferrer"&gt;see also its PyPI release&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;This Python application can install dependencies from any Python package index conforming to the &lt;a href="https://www.python.org/dev/peps/pep-0503/" rel="noopener noreferrer"&gt;Simple Repository API (PEP 503)&lt;/a&gt; (such as &lt;a href="https://pypi.org/" rel="noopener noreferrer"&gt;pypi.org&lt;/a&gt; or the &lt;a href="http://tensorflow.pypi.thoth-station.ninja/" rel="noopener noreferrer"&gt;AICoE Python package index&lt;/a&gt;) and extract all the metadata of a Python package. One such piece of metadata is the package’s requirements.&lt;/p&gt;

&lt;p&gt;As there is no static metadata to rely on without actually installing a Python package (well, for some wheel distributions it is possible to do so), &lt;a href="https://github.com/thoth-station/solver" rel="noopener noreferrer"&gt;thoth-solver&lt;/a&gt; installs the given package into an environment and extracts the package metadata. The aggregated data is reported in a structured JSON format for any further processing.&lt;/p&gt;

&lt;h1&gt;
  
  
  Checking some of the Thoth solver screws
&lt;/h1&gt;

&lt;p&gt;As stated above, Thoth’s solver downloads and actually installs a Python package. The very first "observation" it captures is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Does the given Python application install into the given environment?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There are numerous reasons why a Python package might not be installable into an environment: a missing or wrong toolchain (e.g. a missing gcc, or a gcc version lacking the proper C/C++ standard), Python interpreter incompatibilities (e.g. Python 2 versus Python 3 issues), missing manylinux support in an older pip release used, or simply a wrong release by maintainers. Basically, anything that can possibly go wrong. The implementation behind thoth-solver captures this fact with all the relevant log information (which is subsequently analyzed within project Thoth to automatically derive why the given package was not installable).&lt;/p&gt;

&lt;p&gt;Once the installation succeeds, the tool obtains all the information about dependencies (and additional metadata) using &lt;a href="https://docs.python.org/3/library/importlib.metadata.html" rel="noopener noreferrer"&gt;&lt;code&gt;importlib.metadata&lt;/code&gt;&lt;/a&gt;. This metadata gathering is done in a fresh virtual environment into which the analyzed package is installed, to reduce any interference with the dependencies of thoth-solver itself or of any other package installed in the environment where thoth-solver runs. The requirements stated are parsed and solved respecting Python standards for dependency specification, so that the resulting document also states dependencies in specific versions (rather than just dependency specifications). The obtained results are subsequently aggregated and reported in the final JSON document together with thoth-solver run metadata (OS, Python interpreter version/build, …).&lt;/p&gt;
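&lt;p&gt;The metadata extraction step can be approximated in a few lines. This sketch only prints a few fields for distributions installed in the current environment; thoth-solver additionally solves the requirement specifications:&lt;/p&gt;

```python
import importlib.metadata

# List distributions installed in the current environment and show the
# metadata fields relevant for dependency resolution.
for dist in list(importlib.metadata.distributions())[:5]:
    name = dist.metadata["Name"]
    requires = dist.requires or []  # requirement strings, possibly with markers
    print(name, dist.version, requires)
```

&lt;p&gt;Running this inside a fresh virtual environment containing only the analyzed package is what keeps the output free of interference from other installed packages.&lt;/p&gt;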

&lt;p&gt;We run thoth-solver as a containerized application in our clusters using different operating systems (such as &lt;a href="https://www.redhat.com/en/blog/introducing-red-hat-universal-base-image" rel="noopener noreferrer"&gt;UBI&lt;/a&gt;, &lt;a href="https://www.redhat.com/en/technologies/linux-platforms/enterprise-linux" rel="noopener noreferrer"&gt;RHEL&lt;/a&gt;, &lt;a href="https://getfedora.org/" rel="noopener noreferrer"&gt;Fedora&lt;/a&gt;, ...), different native dependencies, and different Python interpreter versions (a matrix of all the factors that can influence a Python package installation). The resulting JSON documents of thoth-solver runs are automatically placed onto &lt;a href="https://ceph.io/" rel="noopener noreferrer"&gt;Ceph&lt;/a&gt; and synced into Thoth’s knowledge base. Optimizations of the thoth-solver implementation (such as pre-baking virtual environments into the shipped containers) allowed us to analyze a few hundred Python packages per hour (the only limitations are basically cluster resources and networking). All the dependency information is available on our API endpoints.&lt;/p&gt;

&lt;h1&gt;
  
  
  Thoth solver dataset on Kaggle
&lt;/h1&gt;

&lt;p&gt;If you wish to browse some of the thoth-solver data, you can do so by accessing the &lt;a href="https://www.kaggle.com/thothstation/thoth-solver-dataset-v10" rel="noopener noreferrer"&gt;Kaggle dataset&lt;/a&gt; we published. See also a &lt;a href="https://www.kaggle.com/pacospace/explore-thoth-solver-dataset" rel="noopener noreferrer"&gt;notebook that explores the dataset&lt;/a&gt; or the &lt;a href="https://github.com/thoth-station/datasets" rel="noopener noreferrer"&gt;github.com/thoth-station/datasets&lt;/a&gt; repository with notebooks.&lt;/p&gt;

&lt;p&gt;The dataset consists of 100,000 thoth-solver JSON documents (415.79 MB in total). You can find application stacks of popular Python packages published on PyPI (such as &lt;a href="https://www.tensorflow.org/" rel="noopener noreferrer"&gt;TensorFlow&lt;/a&gt;).&lt;/p&gt;

&lt;h1&gt;
  
  
  Thoth reverse solver
&lt;/h1&gt;

&lt;p&gt;As the solver states dependencies at the point in time when it is run, we wanted to keep our knowledge base up to date with recent Python package releases. Consider a new &lt;code&gt;numpy==1.20.0&lt;/code&gt; release or a new patch &lt;code&gt;numpy==1.18.6&lt;/code&gt; release: do these releases affect any packages that depend on &lt;code&gt;numpy&amp;gt;=1.19&lt;/code&gt;? We can answer this question (offline, without running thoth-solver) using another component called &lt;a href="https://github.com/thoth-station/revsolver" rel="noopener noreferrer"&gt;thoth-revsolver&lt;/a&gt;. Check this demo for more info:&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/bpDzi_Jaj4M"&gt;
&lt;/iframe&gt;
&lt;/p&gt;
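&lt;p&gt;The idea behind reverse solving can be sketched as follows. The stored requirements and package names are made up, and real specifiers follow PEP 440 rather than this simplified &lt;code&gt;&amp;gt;=&lt;/code&gt; check:&lt;/p&gt;

```python
def parse(version):
    """Turn "1.20.0" into a comparable tuple (1, 20, 0)."""
    return tuple(int(part) for part in version.split("."))

# Hypothetical knowledge base entries: dependents and the minimal numpy
# version each one requires (i.e. a stored "numpy>=X" specification).
dependents = {
    "tensorflow==2.1.0": "1.16.0",
    "scipy==1.4.1": "1.13.3",
    "futurelib==0.1.0": "1.21.0",  # not satisfied by the new release yet
}

new_release = "1.20.0"
affected = [
    pkg for pkg, minimum in dependents.items()
    if parse(new_release) >= parse(minimum)
]
print(affected)  # ['tensorflow==2.1.0', 'scipy==1.4.1']
```

&lt;p&gt;Instead of re-solving every dependent from scratch, the reverse solver only re-checks stored specifications against the newly released version.&lt;/p&gt;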




&lt;h1&gt;
  
  
  Project Thoth
&lt;/h1&gt;

&lt;p&gt;Project Thoth is an application that aims to help Python developers. If you wish to be updated on any improvements and any progress we make in project Thoth, feel free to &lt;a href="https://www.youtube.com/channel/UClUIDuq_hQ6vlzmqM59B2Lw" rel="noopener noreferrer"&gt;subscribe to our YouTube channel&lt;/a&gt; where we post our updates as well as our recordings from our scrum demos. You can also &lt;a href="https://twitter.com/thothstation" rel="noopener noreferrer"&gt;follow us on Twitter&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Stay tuned for any new updates!&lt;/p&gt;

</description>
      <category>python</category>
    </item>
    <item>
      <title>How to beat Python’s pip: A brief intro</title>
      <dc:creator>Fridolín Pokorný</dc:creator>
      <pubDate>Mon, 12 Oct 2020 10:01:59 +0000</pubDate>
      <link>https://dev.to/fridex/how-to-beat-python-s-pip-a-brief-intro-4bec</link>
      <guid>https://dev.to/fridex/how-to-beat-python-s-pip-a-brief-intro-4bec</guid>
      <description>&lt;p&gt;The Python’s package installer, &lt;a href="https://pypi.org/project/pip" rel="noopener noreferrer"&gt;pip&lt;/a&gt;, is known to have issues when resolving software stacks. In the upcoming series of articles, I will briefly discuss an approach that helped to resolve versions of libraries for applications faster than pip’s resolution algorithm. Moreover, the resolved software stacks are scored based on various aspects to help with shipping high-quality software.&lt;/p&gt;

&lt;p&gt;Python is one of the fastest-growing programming languages out there. There is no doubt it’s becoming the programming language of choice for data scientists, machine learning engineers, and software developers. In my eyes, Python code is pseudo-code that simply runs: easy to write, easy to maintain. An API server using &lt;a href="https://pypi.org/project/flask" rel="noopener noreferrer"&gt;Flask&lt;/a&gt;, data analysis in &lt;a href="https://pypi.org/project/jupyterlab" rel="noopener noreferrer"&gt;Jupyter notebooks&lt;/a&gt;, or a neural network using &lt;a href="https://pypi.org/project/tensorflow/" rel="noopener noreferrer"&gt;TensorFlow&lt;/a&gt; can all be written in a few lines of code. Any performance-critical parts can be optimized thanks to CPython’s C API. Python is a very effective weapon in anyone’s inventory.&lt;/p&gt;

&lt;h1&gt;
  
  
  Python, pip &amp;amp; resolvers
&lt;/h1&gt;

&lt;p&gt;The Python Packaging Authority (PyPA) is a working group that maintains a core set of projects used in Python packaging. One of these projects is &lt;a href="https://pypi.org/project/pip" rel="noopener noreferrer"&gt;pip&lt;/a&gt;, the PyPA-recommended tool for installing Python packages. If you have developed any Python application, you have most probably used it, or at least considered using it, to install libraries for your project. Similar tools are Pipenv (also maintained by PyPA) and Poetry.&lt;/p&gt;

&lt;p&gt;pip does its job pretty well in most cases — it can install your desired software from PyPI, the Python Package Index that hosts open-source projects. Alternatively, you can use your privately hosted Python indexes as a source of software to be installed. Unfortunately, pip lacks a proper resolver implementation, which can in some cases lead to painful situations. As of today, PyPA is working on a new resolver implementation for pip. Resolvers are usually discussed from the implementation side, but let’s have a look at the resolution problem from the other side.&lt;/p&gt;

&lt;h1&gt;
  
  
  A state-space made out of Python packages
&lt;/h1&gt;

&lt;p&gt;Let’s say we want to create an application that uses two libraries called &lt;code&gt;simplelib&lt;/code&gt; and &lt;code&gt;anotherlib&lt;/code&gt;. These libraries can be installed in different versions, and these versions can have a different impact on the resulting software shipped — e.g. a performance impact, a security impact, or, in the worst case, the application does not assemble at all. Now, let’s create a function that captures such observations and performs "scoring" with respect to the versions included in the installed software. Such a function would have discrete values, and for our artificial example it could look like this visualization (assuming the libraries do not have any transitive dependencies):&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fs154eh8bp487bz9y98gd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fs154eh8bp487bz9y98gd.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;
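&lt;p&gt;To make the idea concrete, here is a tiny sketch of such a scoring function in Python. The library names come from the example above, but the version numbers and scores are invented purely for illustration:&lt;/p&gt;

```python
# A toy scoring function over (simplelib, anotherlib) version pairs.
# The versions and scores below are made up for illustration; in a real
# system they would come from aggregated observations such as
# performance, security, or build failures.
scores = {
    ("1.0", "0.9"): 0.4,
    ("1.0", "1.0"): 0.7,
    ("1.1", "0.9"): 0.9,  # an older combination may score best
    ("1.1", "1.0"): 0.5,
    ("2.0", "0.9"): 0.0,  # e.g. the application does not assemble
    ("2.0", "1.0"): 0.6,  # the latest stack is not the greatest one
}

def best_stack(scores):
    """Return the (simplelib, anotherlib) version pair with the highest score."""
    return max(scores, key=scores.get)

print(best_stack(scores))  # ('1.1', '0.9') rather than the latest ('2.0', '1.0')
```

&lt;p&gt;With only two libraries the maximum can be found by enumeration; with real dependency graphs the state space explodes, which is exactly why a smarter resolver is needed.&lt;/p&gt;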

&lt;p&gt;To make it more intuitive, let’s interpolate the values and plot the resulting figure:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fdllnme6b5jnoyboedvaj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fdllnme6b5jnoyboedvaj.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;On the horizontal axes, you can see different versions of the &lt;code&gt;simplelib&lt;/code&gt; and &lt;code&gt;anotherlib&lt;/code&gt; libraries. On the vertical axis, you can see the values of the scoring function. If you use &lt;a href="https://pypi.org/project/pip" rel="noopener noreferrer"&gt;pip&lt;/a&gt;, &lt;a href="https://pypi.org/project/pipenv" rel="noopener noreferrer"&gt;Pipenv&lt;/a&gt;, or &lt;a href="https://pypi.org/project/poetry" rel="noopener noreferrer"&gt;Poetry&lt;/a&gt;, all of these tools will resolve the most recent versions of the libraries possible — on our graph that would be the rightmost value:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fwaw9j2287u3d0zohqldt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fwaw9j2287u3d0zohqldt.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;But what if we want to ship better software? What if the most recent releases are broken? That would require manual work and increased maintenance costs, and one can easily end up in dependency hell.&lt;/p&gt;

&lt;h1&gt;
  
  
  Thoth’s advice: install the right software
&lt;/h1&gt;

&lt;p&gt;The idea described above gave birth to a project called &lt;a href="https://github.com/thoth-station" rel="noopener noreferrer"&gt;Thoth&lt;/a&gt;. Thoth is a recommendation engine for Python applications that can resolve not the latest, but the greatest set of libraries to install for your application. Thoth’s resolver resolves and scores software stacks based on its aggregated knowledge — hence it’s a server-side resolution. You can submit the requirements you have for your application, and Thoth’s recommendation engine will resolve a software stack that satisfies them.&lt;/p&gt;




&lt;p&gt;In the upcoming articles, I will dive more into Thoth’s internals — how the resolution is performed, what key concepts are implemented, and how the implementation can resolve and score tens, hundreds, or thousands of software stacks per second. One of the concepts used there is reinforcement learning, which helps to resolve high-quality software stacks based on observations in Thoth’s knowledge base, so stay tuned!&lt;/p&gt;

</description>
      <category>python</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>“Termial” random for prioritized picking an item from a list</title>
      <dc:creator>Fridolín Pokorný</dc:creator>
      <pubDate>Thu, 20 Feb 2020 19:10:31 +0000</pubDate>
      <link>https://dev.to/fridex/termial-random-for-prioritized-picking-an-item-from-a-list-22jh</link>
      <guid>https://dev.to/fridex/termial-random-for-prioritized-picking-an-item-from-a-list-22jh</guid>
      <description>&lt;p&gt;Let’s take a look at a solution that randomly picks an item from a list. Instead of assigning equal probability to each item in the list, let’s create an algorithm that assigns the highest probability to the item at index 0 and lower probabilities to all remaining items as the index in the list grows. Can we implement such a function?😓&lt;/p&gt;

&lt;p&gt;You can use routines already available in the Python standard library to pick a random item from a list. Let’s say we want to randomly pick a number from a list of integers. The only thing you need to do is run the following snippet:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;random&lt;/span&gt;
&lt;span class="n"&gt;my_list&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;33&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"deadbeef"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;base&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;my_list&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# results in 42 with a probability of 1 / len(my_list)
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Now let’s say we want to prioritize number 42 over 33, prioritize number 33 over 30, and at the same time, prioritize number 30 over “0xdeadbeef”. We have 4 numbers in total in our list, let’s assign weights to these numbers in the following manner:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;+--------------+-------------+
|    number    |    weight   |
+--------------+-------------+
|     42       |      4      |
|     33       |      3      |
|     30       |      2      |
|  0xdeadbeef  |      1      |
+--------------+-------------+
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
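&lt;p&gt;As a side note, on Python 3.6 or newer the standard library’s &lt;em&gt;random.choices&lt;/em&gt; can already perform a weighted pick with an explicit weight list like the one above. The approach derived in the rest of this article achieves the same distribution without materializing the weights:&lt;/p&gt;

```python
import random

my_list = [42, 33, 30, int("deadbeef", base=16)]
weights = [4, 3, 2, 1]  # one weight per item, highest priority first

# random.choices picks my_list[i] with probability weights[i] / sum(weights),
# e.g. 42 is picked with probability 4/10.
picked = random.choices(my_list, weights=weights, k=1)[0]
```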


&lt;p&gt;The higher the weight is, the higher the probability that we pick the given number.&lt;br&gt;
You can see the weights as the number of “buckets” we assign to each number in the list. Subsequently, we randomly (uniformly) try to hit one bucket. After hitting a bucket, we check which number it corresponds to.&lt;br&gt;
The total number of buckets we can hit is equal to the sum of the weights:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;4 + 3 + 2 + 1 = 10
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The probability of hitting a bucket based on the number from the list is shown below:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;+--------------+-----------------------+
|    number    |       probability     |
+--------------+-----------------------+
|     42       |       4/10 = 0.4      |
|     33       |       3/10 = 0.3      |
|     30       |       2/10 = 0.2      |
|  0xdeadbeef  |       1/10 = 0.1      |
+--------------+-----------------------+
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;To generalize this for &lt;em&gt;n&lt;/em&gt; numbers, we can come up with the following formula:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Od6MHsfv--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://miro.medium.com/max/1024/1%2AfeDrwuLTeVGBZyfOphWwsw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Od6MHsfv--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://miro.medium.com/max/1024/1%2AfeDrwuLTeVGBZyfOphWwsw.png" alt="formula1" width="512" height="32"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In other mathematical words:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--2F30PadC--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://miro.medium.com/max/256/1%2AuYBVFoPwtLKoBmHWgv4Bmw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--2F30PadC--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://miro.medium.com/max/256/1%2AuYBVFoPwtLKoBmHWgv4Bmw.png" alt="formula2" width="128" height="81"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Does this formula look familiar to you? It’s called the &lt;a href="https://en.wikipedia.org/wiki/Termial"&gt;termial of a positive integer &lt;em&gt;n&lt;/em&gt;&lt;/a&gt;; from Wikipedia:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The termial was coined by Donald E. Knuth in his The Art of Computer Programming. It is the additive analog of the factorial function, which is the product of integers from 1 to n. He used it to illustrate the extension of the domain from positive integers to the real numbers.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Now, let’s get our hands dirty with some code. To compute the termial of &lt;em&gt;n&lt;/em&gt;:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;termial_of_n&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;my_list&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;  &lt;span class="c1"&gt;# O(N)
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Another, more efficient, way to compute the termial of n is to use the &lt;a href="https://en.wikipedia.org/wiki/Binomial_coefficient"&gt;Binomial coefficient&lt;/a&gt; and compute &lt;code&gt;(len(my_list) + 1)&lt;/code&gt; over &lt;code&gt;2&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;l&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;my_list&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# (l + 1) over 2 = l! / (2!*(l-2)!) = l * (l - 1) / 2
&lt;/span&gt;&lt;span class="n"&gt;termial_of_n&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;  &lt;span class="c1"&gt;# O(1)
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Finally, we can pick a random (random uniform) bucket from our buckets:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;random&lt;/span&gt;
&lt;span class="n"&gt;choice&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;randrange&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;termial_of_n&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The result stored in the variable &lt;em&gt;choice&lt;/em&gt; holds an integer from 0 to 9 (inclusive) and represents an index into the list of buckets we created earlier:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;+--------------+---------------+---------------+
|    choice    |     bucket    |     number    |
+--------------+---------------+---------------+
|      0       |       1       |       42      |
+--------------+---------------+---------------+
|      1       |       2       |       42      |
+--------------+---------------+---------------+
|      2       |       3       |       42      |
+--------------+---------------+---------------+
|      3       |       4       |       42      |
+--------------+---------------+---------------+
|      4       |       5       |       33      |
+--------------+---------------+---------------+
|      5       |       6       |       33      |
+--------------+---------------+---------------+
|      6       |       7       |       33      |
+--------------+---------------+---------------+
|      7       |       8       |       30      |
+--------------+---------------+---------------+
|      8       |       9       |       30      |
+--------------+---------------+---------------+
|      9       |       10      |   0xdeadbeef  |
+--------------+---------------+---------------+
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;How do we find which number we hit through a randomly picked bucket for any &lt;em&gt;n&lt;/em&gt;?&lt;/p&gt;

&lt;p&gt;Let’s revisit how we computed the termial of &lt;em&gt;n&lt;/em&gt; using the Binomial coefficient based formula:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;l&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;my_list&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;termial_of_n&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Based on the termial function definition, we know that regardless of &lt;em&gt;n&lt;/em&gt;, we always assign 1 bucket to the number at index 0, 2 buckets to the number at index 1, 3 buckets to the number at index 2, and so on (here the index &lt;em&gt;i&lt;/em&gt; counts from the end of the list, since the last item receives the fewest buckets). Using this knowledge, we can transform the Binomial coefficient formula into the following equation:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;choice&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The next step is to find the &lt;em&gt;i&lt;/em&gt; that satisfies the given equation. It is a quadratic equation of the form:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;a*i**2 + b*i + c = 0

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;where:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;a = 1/2
b = 1/2
c = -choice
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;As &lt;em&gt;choice&lt;/em&gt; is always expected to be a non-negative integer (an index into the list of buckets), we can search for a solution that is always a non-negative number (discarding the discriminant root that always results in a negative &lt;em&gt;i&lt;/em&gt;).&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;math&lt;/span&gt;
&lt;span class="c1"&gt;# D = b**2 - 4*a*c
# x1 = (-b + math.sqrt(D)) / (2*a)
# x2 = (-b - math.sqrt(D)) / (2*a)
# Given:
#   a = 1/2
#   b = 1/2
#   c = -choice
# D = (1/2)**2 + 4*0.5*choice = 0.25 + 2*choice
&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;floor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;0.5&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sqrt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.25&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;choice&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The solution has to be rounded down using &lt;em&gt;math.floor&lt;/em&gt; so that every bucket belonging to the same item maps to the same &lt;em&gt;i&lt;/em&gt;. Since &lt;em&gt;i&lt;/em&gt; is the inverted index with respect to &lt;em&gt;n&lt;/em&gt;, the final solution (the index into the original list) is:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;my_list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
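&lt;p&gt;Putting all the pieces above together, a minimal sketch of the whole picker could look like the following (the function name &lt;em&gt;termial_random&lt;/em&gt; is mine; see my gist for the original implementation):&lt;/p&gt;

```python
import math
import random

def termial_random(seq):
    """Pick an item from seq, preferring items at lower indices.

    The item at index k is assigned len(seq) - k buckets out of
    termial(len(seq)) buckets in total, so index 0 is the most likely pick.
    """
    n = len(seq)
    termial_of_n = (n * n + n) // 2  # n * (n + 1) / 2, computed in O(1)
    choice = random.randrange(termial_of_n)  # uniformly hit one bucket
    # Solve i**2/2 + i/2 - choice = 0 for the inverted index i.
    i = math.floor(-0.5 + math.sqrt(0.25 + 2 * choice))
    return seq[n - 1 - i]  # un-invert the index

my_list = [42, 33, 30, int("deadbeef", base=16)]
print(termial_random(my_list))  # 42 is the most likely result (probability 4/10)
```

&lt;p&gt;For &lt;em&gt;n&lt;/em&gt; = 4 this reproduces the 4/10, 3/10, 2/10, 1/10 probabilities from the tables above.&lt;/p&gt;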

&lt;h1&gt;
  
  
  Asymptotic complexity analysis
&lt;/h1&gt;

&lt;p&gt;Let’s assume:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the &lt;em&gt;len&lt;/em&gt; function can return the length of the list in &lt;em&gt;O(1)&lt;/em&gt; time&lt;/li&gt;
&lt;li&gt;the &lt;em&gt;random.randrange&lt;/em&gt; function operates in &lt;em&gt;O(1)&lt;/em&gt; time&lt;/li&gt;
&lt;li&gt;we use the Binomial coefficient based equation for computing the termial of &lt;em&gt;n&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The whole computation is done in &lt;em&gt;O(1)&lt;/em&gt; time with &lt;em&gt;O(1)&lt;/em&gt; space.&lt;/p&gt;

&lt;p&gt;If we used the &lt;em&gt;sum&lt;/em&gt;-based computation of the termial of &lt;em&gt;n&lt;/em&gt;, the algorithm would become &lt;em&gt;O(n)&lt;/em&gt; in time with &lt;em&gt;O(1)&lt;/em&gt; space.&lt;/p&gt;


&lt;h1&gt;
  
  
  Disclaimer &amp;amp; original usage
&lt;/h1&gt;

&lt;p&gt;I designed this algorithm for &lt;a href="https://thoth-station.ninja/docs/developers/adviser/"&gt;Thoth’s recommendation engine&lt;/a&gt;. Its main purpose is to prefer more recent versions of packages in the resolver during the resolution of Python software stacks.&lt;/p&gt;

&lt;p&gt;I would be happy for any feedback or any similar approaches recommended.&lt;/p&gt;

&lt;p&gt;A complete solution can be found in &lt;a href="https://gist.github.com/fridex/8a2442f7e187914af715968097688aa3"&gt;my GitHub gist&lt;/a&gt;.&lt;/p&gt;



</description>
      <category>python</category>
      <category>random</category>
    </item>
  </channel>
</rss>
