Playing with Python's `__path__` attribute


Note: this article was originally published on my personal blog. Check it out here.

Recently, while preparing an article about the Python import system, I discovered the __path__ attribute and found some cool things to do with it, so I wanted to share what I learned.

Take a look at the following example:

myPackage/
├── __init__.py
├── subPackageA/
│   ├── __init__.py
│   └── foo.py       # Has a function foo()
└── subPackageB/
    ├── __init__.py
    └── bar.py       # Has a function bar()

First case: all the __init__.py files of the package and subpackages are empty (that's important, we'll get to that later).

In this case, if we want to import the functions foo and bar from their modules foo and bar, we would have to do the following:

>>> from myPackage.subPackageA.foo import foo
>>> from myPackage.subPackageB.bar import bar

Nothing new here.

Now for the second case: we update myPackage/subPackageA/__init__.py and add this little piece of code:

import os

current_dir = os.path.dirname(os.path.realpath(__file__))
__path__.append(os.path.join(
    os.path.abspath(os.path.join(current_dir, '..')),
    'subPackageB'
))

With that in place we can do some magic:

>>> from myPackage.subPackageA.foo import foo
>>> from myPackage.subPackageA.bar import bar

Note that we imported bar from subPackageA, but bar.py is still under myPackage/subPackageB:

>>> from myPackage.subPackageA import bar
>>> bar.__file__
'<...>/myPackage/subPackageB/bar.py'
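If you want to try this without creating the files by hand, here is a minimal sketch that builds the same tree in a temporary directory and runs the imports. The temporary location and the return values of foo() and bar() are my own assumptions for the demo; the content of subPackageA/__init__.py is the snippet shown above.

```python
import importlib
import os
import sys
import tempfile
import textwrap

# Build the example tree in a throwaway directory (the location is arbitrary).
root = tempfile.mkdtemp()
files = {
    "myPackage/__init__.py": "",
    "myPackage/subPackageA/__init__.py": textwrap.dedent("""\
        import os

        current_dir = os.path.dirname(os.path.realpath(__file__))
        __path__.append(os.path.join(
            os.path.abspath(os.path.join(current_dir, '..')),
            'subPackageB'
        ))
    """),
    # The function bodies are placeholders chosen for this demo.
    "myPackage/subPackageA/foo.py": "def foo():\n    return 'foo'\n",
    "myPackage/subPackageB/__init__.py": "",
    "myPackage/subPackageB/bar.py": "def bar():\n    return 'bar'\n",
}
for relative_path, source in files.items():
    target = os.path.join(root, relative_path)
    os.makedirs(os.path.dirname(target), exist_ok=True)
    with open(target, "w") as handle:
        handle.write(source)

# Forget any myPackage imported earlier in this session, then expose the tree.
for name in [n for n in sys.modules if n.split(".")[0] == "myPackage"]:
    del sys.modules[name]
sys.path.insert(0, root)
importlib.invalidate_caches()

from myPackage.subPackageA.foo import foo
from myPackage.subPackageA.bar import bar
import myPackage.subPackageA.bar as bar_module

print(foo(), bar())
print(bar_module.__file__)  # a path ending in myPackage/subPackageB/bar.py
```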

But what happened here? Why were we able to import bar under subPackageA's namespace just by adding a piece of code inside myPackage/subPackageA/__init__.py? Time for some explanations.

The `__init__.py` file and the usage of the `__path__` attribute

The __init__.py file is a special file that indicates that the current folder is a Python package. During the import process, before Python loads the module or package it is asked to import, it imports every parent package along the dotted path, from the outermost to the innermost, executing each package's __init__.py file.

Take the following structure. Each of the __init__.py files contains a simple print(__name__) so that we can see what's going on.

packageA/
├── __init__.py
└── packageB/
    ├── __init__.py
    └── packageC/
        ├── __init__.py
        └── packageD/
            ├── __init__.py
            └── module.py

And when we try to import the module:

>>> from packageA.packageB.packageC.packageD import module
packageA
packageA.packageB
packageA.packageB.packageC
packageA.packageB.packageC.packageD
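The same experiment can be scripted. The sketch below is only an illustration: it creates the four nested packages in a temporary directory (an arbitrary location), puts print(__name__) in every __init__.py, and captures what gets printed during the import so the order can be inspected.

```python
import contextlib
import importlib
import io
import sys
import tempfile
from pathlib import Path

# Create packageA/packageB/packageC/packageD, each with a printing __init__.py.
root = Path(tempfile.mkdtemp())
directory = root
for name in ("packageA", "packageB", "packageC", "packageD"):
    directory = directory / name
    directory.mkdir()
    (directory / "__init__.py").write_text("print(__name__)\n")
(directory / "module.py").write_text("")

sys.path.insert(0, str(root))
importlib.invalidate_caches()

# Capture everything printed while the import runs.
captured = io.StringIO()
with contextlib.redirect_stdout(captured):
    from packageA.packageB.packageC.packageD import module

import_order = captured.getvalue().splitlines()
for package_name in import_order:
    print(package_name)
```

The parent packages are executed from the outermost to the innermost, and only then is module.py itself loaded.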

So in our example above, the code we added in subPackageA/__init__.py was executed when Python ran from myPackage.subPackageA.foo import foo. This piece of code perhaps looks a bit cryptic, but it simply does one thing: it adds the absolute path of myPackage/subPackageB/ to the __path__ of subPackageA.

The __path__ attribute of a Python package is used by the import system when importing the modules or subpackages of that package. It is a list (or at least an iterable) of directories that are searched when looking for those modules or subpackages.

For example:

>>> import django
>>> django.__path__
['<...>/venv/lib/python3.9/site-packages/django']
>>> import django.db
>>> django.db.__file__
'<...>/venv/lib/python3.9/site-packages/django/db/__init__.py'
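To see the search itself, here is a small sketch using only the standard library (json is picked arbitrarily, so that Django does not need to be installed). It asks importlib's path-based finder to locate a submodule using the package's __path__ as the only search locations, which is roughly what happens during import json.decoder.

```python
import json
from importlib.machinery import PathFinder

print(json.__path__)  # a list containing the directory of the json package

# Look for the submodule only inside the directories listed in json.__path__.
spec = PathFinder.find_spec("json.decoder", json.__path__)
print(spec.origin)  # location of decoder.py inside that directory

# With an empty list of search locations, the same submodule cannot be found.
missing = PathFinder.find_spec("json.decoder", [])
print(missing)  # None
```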

So when we appended the subPackageB directory to subPackageA.__path__, we told Python to also search subPackageB/ when looking for the submodules of subPackageA, which is why we were able to import bar from subPackageA.

Two points to note though:

1. Importing bar through subPackageA works even if subPackageA has not been imported beforehand, that is, before its __path__ has been updated. This is because, to resolve myPackage.subPackageA.bar, Python first imports each parent package in turn and executes its __init__.py (as in the packageA example above), so __path__ is already extended by the time Python looks for bar.
2. It is still possible to import the objects of subPackageB under its own namespace.
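Both points can be checked with a short script. This is a sketch under a few assumptions: the tree lives in a temporary directory, the body of bar() is a placeholder, and the __init__.py of subPackageA is a condensed version of the snippet from the beginning of the article with the same effect. It also shows a side effect worth knowing about: the same bar.py ends up loaded twice, as two distinct module objects.

```python
import importlib
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
# Condensed equivalent of the __init__.py snippet shown earlier.
init_a = (
    "import os\n"
    "package_dir = os.path.dirname(os.path.realpath(__file__))\n"
    "__path__.append(os.path.join(os.path.dirname(package_dir), 'subPackageB'))\n"
)
sources = {
    "myPackage/__init__.py": "",
    "myPackage/subPackageA/__init__.py": init_a,
    "myPackage/subPackageB/__init__.py": "",
    "myPackage/subPackageB/bar.py": "def bar():\n    return 'bar'\n",
}
for relative_path, source in sources.items():
    target = root / relative_path
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(source)

# Start from a clean slate: nothing from myPackage is imported at this point.
for name in [n for n in sys.modules if n.split(".")[0] == "myPackage"]:
    del sys.modules[name]
sys.path.insert(0, str(root))
importlib.invalidate_caches()

# Point 1: the very first import already goes through subPackageA.
import myPackage.subPackageA.bar as bar_via_a
# Point 2: the original namespace still works.
import myPackage.subPackageB.bar as bar_via_b

print(bar_via_a.__name__)  # myPackage.subPackageA.bar
print(bar_via_b.__name__)  # myPackage.subPackageB.bar
print(bar_via_a is bar_via_b)  # False: one file, two separate module objects
print(Path(bar_via_a.__file__).samefile(bar_via_b.__file__))  # True
```

Because the two modules are distinct objects, any module-level state in bar.py exists twice, once per namespace.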

Possible usage of this trick

Now that you know this is possible, you may be wondering what could be a possible real-life use of this.

__path__ can sometimes be used when you're dealing with dynamic imports. For example, Django uses them when dealing with modules provided by the user.

Another possibility is to hide the internal structure of a package. If you can dynamically choose which namespace to bind your objects to, you can basically choose the structure you want for your package, no matter the internal organization of files and folders. Note that this usage is probably just theoretical, as it wouldn't be really useful in real life. I see a few reasons for this limitation:

1. Like I said before, dynamically binding objects to a namespace other than their own doesn't prevent them from being imported via their own namespace (cf. point 2. above).
2. When a package is installed into the site-packages folder (whether system-wide or in a virtualenv), its source files are visible, so people would still be able to figure out how the package is structured internally.
3. If you really want to achieve this, it might be easier to add imports to the __init__.py files, as they are executed when the package is imported, than to mess with the __path__ attribute.
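For comparison, here is what the last option could look like on the same example tree. It is a minimal sketch (temporary directory and placeholder function body are my assumptions): subPackageA/__init__.py simply re-exports bar() with a relative import, and __path__ is left alone.

```python
import importlib
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
sources = {
    "myPackage/__init__.py": "",
    # Re-export bar() with a plain relative import instead of editing __path__.
    "myPackage/subPackageA/__init__.py": "from ..subPackageB.bar import bar\n",
    "myPackage/subPackageB/__init__.py": "",
    "myPackage/subPackageB/bar.py": "def bar():\n    return 'bar'\n",
}
for relative_path, source in sources.items():
    target = root / relative_path
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(source)

# Forget any myPackage imported earlier in this session, then expose the tree.
for name in [n for n in sys.modules if n.split(".")[0] == "myPackage"]:
    del sys.modules[name]
sys.path.insert(0, str(root))
importlib.invalidate_caches()

from myPackage.subPackageA import bar

print(bar())           # bar
print(bar.__module__)  # myPackage.subPackageB.bar
```

With this approach bar.py is loaded only once, and bar.__module__ still reveals where the function really lives.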

A last possibility is to be able to extend packages via some extensions that can be installed separately, and registered to the main package, to make their usage as transparent as possible. Let's take an example to make things a bit easier to understand.

Let's say you create a package imageLib to handle image manipulation. You have functions to resize, crop, rotate... All these functions need a decoded image (i.e. a matrix of pixels) to work, so you need to provide functions for encoding and decoding each common image format (png, jpg, gif...). But you don't want to bundle all the extensions into your lib, so that people only have to download and install the ones they need.

That's where __path__ manipulation comes in handy. The extensions can have a .register() function that will find the __path__ attribute of the lib (or any of its subpackages), and add themselves to it, so that the import can be as smooth as possible.
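Here is a rough sketch of what such a register() function could look like. Apart from imageLib, every name is hypothetical and made up for the illustration: the imageLib.codecs subpackage, the imageLibPng extension, and its decode() function, which does no real decoding. Both packages are created in a temporary directory to keep the example self-contained.

```python
import importlib
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
sources = {
    # The main library, with an empty "codecs" subpackage (hypothetical name).
    "imageLib/__init__.py": "",
    "imageLib/codecs/__init__.py": "",
    # A separately installed extension (hypothetical name) shipping a PNG codec.
    "imageLibPng/__init__.py": (
        "import os\n"
        "import imageLib.codecs\n"
        "\n"
        "def register():\n"
        "    here = os.path.dirname(os.path.abspath(__file__))\n"
        "    if here not in imageLib.codecs.__path__:\n"
        "        imageLib.codecs.__path__.append(here)\n"
    ),
    # Placeholder codec: it only reports how many bytes it received.
    "imageLibPng/png.py": (
        "def decode(data):\n"
        "    return 'decoded %d bytes' % len(data)\n"
    ),
}
for relative_path, source in sources.items():
    target = root / relative_path
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(source)

sys.path.insert(0, str(root))
importlib.invalidate_caches()

import imageLibPng
imageLibPng.register()

# The extension's module is now importable as if it were part of imageLib.
from imageLib.codecs import png

print(png.__name__)             # imageLib.codecs.png
print(png.decode(b"\x89PNG"))   # decoded 4 bytes
```

The check before appending keeps register() safe to call more than once.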

It turns out that this ability is quite popular, and is known as namespace packages. This functionality used to be available through the pkgutil module, but it has been part of Python itself since Python 3.3 (Sept. 2012) with PEP 420. This PEP is really interesting, but that'll be for another article.
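Just as a teaser, here is a tiny sketch of a PEP 420 namespace package. The package name nsdemo and its two modules are made up for the demo. Two separate directories each contain an nsdemo/ folder without any __init__.py, and Python merges them into a single package whose __path__ lists both locations.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Two unrelated locations, each holding one portion of the "nsdemo" package.
first_root = Path(tempfile.mkdtemp())
second_root = Path(tempfile.mkdtemp())
(first_root / "nsdemo").mkdir()
(first_root / "nsdemo" / "png.py").write_text("FORMAT = 'png'\n")
(second_root / "nsdemo").mkdir()
(second_root / "nsdemo" / "jpg.py").write_text("FORMAT = 'jpg'\n")

sys.path[:0] = [str(first_root), str(second_root)]
importlib.invalidate_caches()

# No __init__.py anywhere, yet both modules live under the same package name.
import nsdemo.png
import nsdemo.jpg

print(nsdemo.png.FORMAT, nsdemo.jpg.FORMAT)  # png jpg
print(list(nsdemo.__path__))  # the two nsdemo/ directories
```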