20 years ago I thought Python was a brilliant language for including so many useful libraries in its distribution. This "batteries included" motto was well known and described on the Python about page.
But today it's clear to me that including batteries in a coding language is a deadly flaw, for two reasons: the packaging hurdle and library rot. Details follow.
Why include libraries? Why have a larger standard library?
Batteries included implies that you don't need to worry about downloading or depending on external packages to get stuff done. You can just go. Yes, with Python I often found myself productive without worrying about downloading packages. No `npm install` or `go get` necessary.
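A quick sketch of what "you can just go" means in practice - everything below ships with CPython, so there's no install step (the data is made up for illustration):

```python
# Nothing here needs a package install: json and statistics are
# both part of Python's standard library.
import json
import statistics

payload = '{"scores": [3, 5, 8]}'   # hypothetical input data
data = json.loads(payload)          # parse JSON with the built-in module
mean = statistics.mean(data["scores"])
print(mean)
```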
Another important benefit of batteries included is that they make libraries more likely to be interoperable. Packages want to support standard library usage patterns and types. It's better if there's an official `set` and `Promise` so there isn't an ecosystem battle over such things.
The rotting flaw
The problem with including libraries in Python is that the language itself has to evolve carefully and slowly. But what happens when inevitably there's a new way to do something? A new and better library is created - and now the batteries you included are suboptimal this year, and a decade later your batteries are actual trash you lug around.
Backwards compatibility is critical. I don't want to fix my code once a year when the new Python version rolls out. Moving from Python 2 to 3 sucked for everyone. Even Guido van Rossum, the language's designer, was still talking about the painful transition 10 years later.
So you can't just rip out a standard library module and replace it with a new one. If you include a library in your core language distribution, you're either married to it FOREVER, or you've come to terms with eventually causing your users great pain when they update old code they don't remember writing.
Rotting examples
Python's standard library has 3 option parsing libraries: `getopt`, `optparse`, and `argparse`. I personally voted to include `argparse` because at the time I thought that was a good idea.
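To make the duplication concrete, here's the same `-n 3` flag parsed with each of the three modules - all of which still ship in the standard library today:

```python
# Three stdlib ways to parse the same flag.
import getopt
import optparse
import argparse

argv = ["-n", "3"]

# getopt: the oldest, C-style; returns raw (flag, value) string pairs
opts, rest = getopt.getopt(argv, "n:")

# optparse: deprecated since Python 3.2, but still included
op = optparse.OptionParser()
op.add_option("-n", type="int")
options, _ = op.parse_args(argv)

# argparse: the current recommendation
ap = argparse.ArgumentParser()
ap.add_argument("-n", type=int)
ns = ap.parse_args(argv)

print(opts, options.n, ns.n)
```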
Python includes the `urllib.request` module, though most folks nowadays prefer the `requests` library for making HTTP requests.
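A small side-by-side sketch of the two APIs. To keep it self-contained, no network call is made - we only build the request object (the URL is a placeholder):

```python
# Stdlib version: construct a GET request with urllib.request.
import urllib.request

req = urllib.request.Request(
    "https://example.com/api",           # hypothetical endpoint
    headers={"Accept": "application/json"},
)
print(req.full_url)
print(req.get_header("Accept"))

# The third-party equivalent most people reach for instead
# (requires `pip install requests`):
#   import requests
#   resp = requests.get("https://example.com/api",
#                       headers={"Accept": "application/json"})
```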
There are a few XML parsing libraries in the standard library I'm aware of (`xml.dom.minidom`, `xml.sax`, `xml.parsers.expat`, `xml.etree.ElementTree`) but folks today often go for `BeautifulSoup`.
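For reference, here's the `xml.etree.ElementTree` flavor, probably the most approachable of the stdlib options (the XML snippet is invented for illustration):

```python
# Parse a small XML document with the stdlib's ElementTree API.
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<batteries><lib name='getopt'/><lib name='argparse'/></batteries>"
)
names = [lib.get("name") for lib in doc.findall("lib")]
print(names)
```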
I'm sure there are many more examples.
The packaging hurdle
Aside from the eventual rot of libraries, coders who rely on included batteries are incentivized to avoid the packaging system entirely if they can. That's the whole value you get out of the batteries, right? So your helloworld app does not include a "get packages" step, which means fewer developers engage with the packaging system.
But packaging is really where every piece of software ends up. By incentivizing devs to reach for the batteries and avoid packaging, we get a less battle-tested, lower-quality packaging system.
Counter-argument - the JS package explosion
Almost every package you `npm install` today will cause thousands of files and dependencies to show up in your `node_modules` folder. If Python's standard library didn't have a `math` module, then every sufficiently complicated package would end up transitively depending on all of the popular math packages, because each dependency might use a different one. I'm not sure how to incentivize library creators to reduce this problem.
The confusing state of Python packaging
What happens when I `import abc`? I honestly don't know. The module `abc` might be a local file (`.py` or `.pyc`) or a folder next to my script, somewhere on my `sys.path`, pip-installed, or already somehow in memory.
What is `import abc.xyz`? Is `xyz` a function in the package? A sub-folder?
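One hedged way to answer "where would that import actually come from?" is to ask the import machinery to resolve a name without importing it:

```python
# find_spec resolves a module name to its source without importing it.
import importlib.util

for name in ("abc", "argparse"):
    spec = importlib.util.find_spec(name)
    # spec.origin is a file path, or a marker like "frozen"/"built-in"
    print(name, "->", spec.origin)
```

This doesn't make the lookup rules simpler, but at least it shows you which of the many possible sources won.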
This isn't just confusing for the user, it's also confusing for the library builders setting up their package for distribution, and for package system authors.
Setting up a package for distribution in Python, and downloading packages, is terribly complicated. I love NodeJS's `npm` even though it has its own flaws and competitors. Everyone agrees to use `package.json` and `node_modules` as the "virtual env". `npm` covers all the ground of installing code and binary dependencies on all platforms and generally just works. But with Python, `pip` is only one step in a longer process - you probably need a virtualenv, uv, poetry, conda, chocolatey, apt-get, winget, or something else to avoid version conflicts with other Python scripts on your system, and to install binary packages.
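For contrast, the closest thing Python has today to an agreed-on manifest is `pyproject.toml` (PEP 621). A minimal, hypothetical example:

```toml
# A minimal pyproject.toml - Python's nearest analog to package.json.
[project]
name = "helloworld"
version = "0.1.0"
dependencies = ["requests>=2.31"]

[build-system]
requires = ["setuptools>=68"]
build-backend = "setuptools.build_meta"
```

Unlike `package.json`, though, this file alone doesn't decide which installer or environment tool you use - that choice is the circus described above.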
Part of why this packaging story is so confusing is because Python does not include it in its "helloworld". If Python did show everyone the packaging story - then it would get fixed.
Summary
A language's standard library should really just provide the bare minimum it can. The bare minimum is probably:
- A packaging system
- Operating system APIs
- Types (numbers, strings, arrays, promises, etc)
A core language distribution should only contain what's necessary to install packages, and for packages to be compatible with each other. Leave all the rest to downloadable packages. I hope Python 4 standardizes the packaging circus on a common data structure, like `package.json` did for NodeJS, and that most of today's standard library packages will be removed from the executable distribution and posted on PyPI.
Cheers.