Because I wrote a part1 here, I wanted to follow up with a part2 (this article).
In the first part, I touched on CPython's size (when packaging), but now I want to mention speed.
I never considered speed to be that critical with Python (especially CPython), but on some embedded devices it sure makes a difference.
If you've run Python more than a few times, you've noticed those .pyc files it generates. Those are byte-compiled cache files that are loaded into the Python interpreter.
Usually, Python checks whether there's a .pyc file and whether it still matches the .py file (the .pyc records the modification time and size of the source it was built from). If it matches, Python runs that .pyc.
Otherwise, the file gets re-generated, and then run.
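To make that check concrete, here is a minimal sketch of it in plain Python. It assumes CPython 3.7 or newer (16-byte .pyc header) and the default timestamp-based invalidation; `pyc_is_fresh` and `demo` are names made up for this illustration, not CPython functions, and the real logic lives inside importlib.

```python
import importlib.util
import os
import py_compile
import struct
import tempfile


def pyc_is_fresh(source_path, pyc_path):
    """Return True if a timestamp-based .pyc still matches its source file."""
    with open(pyc_path, "rb") as f:
        header = f.read(16)
    # Header layout: magic (4 bytes), flags (4), source mtime (4), source size (4).
    if len(header) < 16 or header[:4] != importlib.util.MAGIC_NUMBER:
        return False  # truncated, or written by a different Python version
    flags, mtime, size = struct.unpack("<III", header[4:16])
    if flags != 0:
        raise ValueError("hash-based .pyc, not covered by this sketch")
    st = os.stat(source_path)
    return (mtime == (int(st.st_mtime) & 0xFFFFFFFF)
            and size == (st.st_size & 0xFFFFFFFF))


def demo():
    """Compile a throwaway script, then edit it so the cache goes stale."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "hello.py")
        with open(src, "w") as f:
            f.write('print("Hello World")\n')
        pyc = py_compile.compile(
            src, doraise=True,
            invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP)
        fresh_before_edit = pyc_is_fresh(src, pyc)
        with open(src, "a") as f:
            f.write("# edited\n")  # changes the size, so the header no longer matches
        stale_after_edit = not pyc_is_fresh(src, pyc)
        return fresh_before_edit, stale_after_edit
```

Calling `demo()` returns `(True, True)`: the cache matches right after compiling and is detected as stale once the source changes.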
And I never cared about them, but when you're a package maintainer, and you need to save on space on things like OpenWrt (and other embedded things), you'll notice that the Python source file is about the same size as the .pyc file.
Debian (for example) seems to ship both these files.
Initially, I packaged just the source files, because deploying Python source files makes sense for humans. You can see the source and you know exactly what you're getting with CPython and the libraries.
Of course, to prevent people from running out of flash storage on their devices when running Python code, the byte-compiled caches had been disabled.
That meant that starting a Python program involved re-generating the bytecode (in memory, without saving it) every time you ran a Python script.
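For illustration, CPython has a standard switch for this: `sys.dont_write_bytecode`, which is also set by the `-B` flag or the `PYTHONDONTWRITEBYTECODE` environment variable. I'm only assuming that is the kind of mechanism involved here. The sketch below uses a hypothetical helper, `import_creates_cache`, to show that with the switch on an import leaves no `__pycache__` directory behind, so the next start has to compile again.

```python
import importlib
import os
import sys
import tempfile


def import_creates_cache(module_name, dont_write_bytecode):
    """Import a throwaway module and report whether a __pycache__ dir appeared."""
    old_flag = sys.dont_write_bytecode
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, module_name + ".py"), "w") as f:
            f.write("GREETING = 'Hello World'\n")
        sys.path.insert(0, tmp)
        sys.dont_write_bytecode = dont_write_bytecode
        try:
            importlib.invalidate_caches()
            importlib.import_module(module_name)
            return os.path.isdir(os.path.join(tmp, "__pycache__"))
        finally:
            # Restore interpreter state so the sketch has no side effects.
            sys.dont_write_bytecode = old_flag
            sys.path.remove(tmp)
            sys.modules.pop(module_name, None)
```

With `dont_write_bytecode=False` the helper should report that a cache directory was created, and with `True` that none was (assuming no custom `PYTHONPYCACHEPREFIX` redirects the cache elsewhere).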
It seems that the generation of those .pyc is expensive.
There was a report saying that a "Hello World" script in Python would take 500 milliseconds to start (on some slow machines).
When compiling it to .pyc and running that, it would take 70 milliseconds.
So definitely an improvement.
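To get a feel for where that time goes, here is a rough sketch that compares compiling source with loading already-compiled bytecode, which is essentially what reading a .pyc does after the header. This is not the measurement from that report: the synthetic source, the function name `time_compile_vs_load` and the default size are all assumptions for illustration, and the numbers depend entirely on the machine.

```python
import marshal
import time


def time_compile_vs_load(n_functions=2000):
    """Return (seconds to compile source, seconds to unmarshal the same code)."""
    # Build a synthetic module with many small functions.
    source = "\n".join(
        f"def f{i}(x):\n    return x * {i} + 1\n" for i in range(n_functions)
    )

    start = time.perf_counter()
    code = compile(source, "<synthetic>", "exec")
    compile_seconds = time.perf_counter() - start

    # marshal is the serialization format used for the body of a .pyc file.
    payload = marshal.dumps(code)
    start = time.perf_counter()
    marshal.loads(payload)
    load_seconds = time.perf_counter() - start

    return compile_seconds, load_seconds
```

On a desktop machine both numbers are small, but the gap between them is the part that grows painful on a slow embedded CPU.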
Now, what happened was that we switched from shipping Python packages with source (.py) files, to shipping/packaging Python packages with byte-compiled (.pyc) files.
Also, each Python package would get its own python-xxxx-src version, so users who want the Python sources can choose to install them separately.
It is the best compromise here, to:
- keep the Python packages as small as possible
- have some speed (well, not make it worse)
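As a sketch of what shipping only .pyc files can look like with nothing but the standard library: `compileall` with `legacy=True` (the `-b` flag on its command line) writes each .pyc next to its source instead of into `__pycache__`, and CPython can import such a file when the .py is gone. This is not the actual OpenWrt build recipe; `build_sourceless`, `load_pyc` and `demo` are hypothetical helpers for this example.

```python
import compileall
import importlib.util
import os
import tempfile


def build_sourceless(package_dir):
    """Byte-compile a tree in place, then delete the .py sources."""
    if not compileall.compile_dir(package_dir, legacy=True, quiet=1):
        raise RuntimeError("byte-compilation failed")
    for root, _dirs, files in os.walk(package_dir):
        for name in files:
            if name.endswith(".py"):
                os.remove(os.path.join(root, name))


def load_pyc(module_name, pyc_path):
    """Load a module straight from a .pyc file, with no source present."""
    spec = importlib.util.spec_from_file_location(module_name, pyc_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "hello_mod.py"), "w") as f:
            f.write("def greet():\n    return 'Hello World'\n")
        build_sourceless(tmp)
        remaining = sorted(os.listdir(tmp))
        module = load_pyc("hello_mod", os.path.join(tmp, "hello_mod.pyc"))
        return remaining, module.greet()
```

Calling `demo()` returns `(['hello_mod.pyc'], 'Hello World')`: only the byte-compiled file is left, and it still imports and runs. One caveat worth knowing is that a .pyc is tied to the Python version that produced it.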
For reference, I did find some of the old discussions on GitHub about some of these.
They have some historical meaning for me.
One dates from 2014, around when I started: https://github.com/openwrt/packages/issues/474
And one about Debian's reproducible builds, which relates to all these .pyc files: https://github.com/openwrt/packages/issues/5278
Maybe it's a good time to re-spin the old reproducible builds effort for OpenWrt :)
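One reason .pyc files come up in reproducible-build discussions is that the default .pyc header embeds the source file's modification time. The sketch below is an assumption-labeled illustration, not what the linked issue did: it compiles the same source twice with different mtimes and compares the 16-byte headers, once with the timestamp mode and once with the hash-based mode that CPython 3.7+ offers. `pyc_headers_for_mtimes` is a made-up helper name.

```python
import os
import py_compile
import tempfile


def pyc_headers_for_mtimes(mode, mtimes=(1_000_000_000, 1_500_000_000)):
    """Compile one source file under different mtimes; return each .pyc header."""
    headers = []
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "hello.py")
        with open(src, "w") as f:
            f.write('print("Hello World")\n')
        for mtime in mtimes:
            os.utime(src, (mtime, mtime))  # pretend the build ran at another time
            pyc = py_compile.compile(src, doraise=True, invalidation_mode=mode)
            with open(pyc, "rb") as f:
                headers.append(f.read(16))
    return headers
```

With `py_compile.PycInvalidationMode.TIMESTAMP` the two headers differ, because the mtime is baked in. With `py_compile.PycInvalidationMode.UNCHECKED_HASH` they are identical, because the header holds a hash of the source instead. `py_compile` also switches to a hash-based mode on its own when the `SOURCE_DATE_EPOCH` environment variable is set.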