This is a story about a tool that caught a production-impacting bug the day before we released the code. This is also the story of a tool no one uses, and for good reason. By the time you're done reading you'll see why this tool is useful, why it's unusable, and how you can actually use it with your Python project.
(Not a Python programmer? The same problems and solutions likely apply to tools in your ecosystem as well.)
Pylint saves the day
If you're coding in Haskell the compiler's got your back. If you're coding in Java the compiler will usually lend a helping hand. But if you're coding in a dynamic language like Python or Ruby you're on your own: you don't have a compiler to catch bugs for you.
The next best thing is a lint tool that uses heuristics to catch bugs in your code. One such tool is Pylint, and here's how I started using it.
One day at work we realized our builds had been consistently failing for a few days, and it wasn't the usual intermittent failures. After a few days of investigating, my colleague Tom Prince discovered the problem. It was Python code that looked something like this:
for volume in get_volumes():
    do_something(volume)
for volme in get_other_volumes():
    do_something_else(volume)
Notice the typo in the second for loop. Combined with the fact that Python leaks variables from blocks, the last value of volume from the first for loop was used for every iteration of the second loop.
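To make the failure concrete, here is a small self-contained sketch of the same pattern. The functions and their hard-coded data are hypothetical stand-ins, not the real code from the failing build.

```python
# Hypothetical stand-ins for the real functions, with made-up data.
def get_volumes():
    return ["vol-a", "vol-b"]


def get_other_volumes():
    return ["vol-x", "vol-y", "vol-z"]


def process_all():
    seen = []
    for volume in get_volumes():
        seen.append(("first", volume))
    # Typo: the loop variable is "volme", so "volume" below still holds
    # the last value left over from the first loop.
    for volme in get_other_volumes():
        seen.append(("second", volume))
    return seen


result = process_all()
print(result)
```

Every entry recorded by the second loop carries "vol-b", the leftover value from the first loop, and none of the values from get_other_volumes(). Pylint's undefined-loop-variable check is the kind of warning aimed at this pattern, a loop variable used after its loop has ended.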
To see if we could prevent these problems in the future I tried Pylint, re-introduced the bug... and indeed it caught the problem. I then looked at the rest of the output to see what else it had found.
What it had found was a serious bug. It was in code I had written a few days earlier, and the bug completely broke an important feature we were going to ship to users the very next day. Here's a heavily simplified minimal reproducer for the bug:
list_of_printers = []
for i in [1, 2, 3]:
    def printer():
        print(i)
    list_of_printers.append(printer)
for func in list_of_printers:
    func()
The intended result of this reproducer is to print:
1
2
3
But what will actually get printed with this code is:
3
3
3
When you define a nested function in Python that refers to a variable in the outside scope it binds not the value of a variable but the variable itself. In this case that means the i inside printer() ended up always getting the last value of the variable i in the for loop.
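One common way to fix this, sketched here as an illustration and not as the change made in the original code, is to capture the current value as a default argument, which Python evaluates once when the function is defined. The function names below are made up for the example, and the closures return their value instead of printing it so the difference is easy to check.

```python
def make_getters_buggy():
    getters = []
    for i in [1, 2, 3]:
        def getter():
            # Looks up the variable "i" when called, not when defined.
            return i
        getters.append(getter)
    return getters


def make_getters_fixed():
    getters = []
    for i in [1, 2, 3]:
        def getter(i=i):
            # The default value is evaluated at definition time, so each
            # function keeps the value "i" had in its own iteration.
            return i
        getters.append(getter)
    return getters


buggy = [g() for g in make_getters_buggy()]
fixed = [g() for g in make_getters_fixed()]
print(buggy, fixed)
```

Pylint reports the buggy form under the message name cell-var-from-loop, so that is a good candidate for any list of checks you enable.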
And luckily Pylint caught that bug before it shipped; pretty great, right?
Why no one uses Pylint
Pylint is useful, but many projects don't use it. For example, I went and checked just now, and neither Twisted nor Django nor Flask nor Sphinx seems to use Pylint. Why wouldn't these large, sophisticated Python projects use a tool that would automatically catch bugs for them?
One problem is that it's slow, but that's not the real problem; you can always just run it on the CI system with the other slow tests. The real problem is the amount of output.
Here's what I mean: I ran pylint on a checkout of Twisted and the result was 28,000 lines of output (at which point pylint crashed, but I'll assume that's fixed in newer releases). Let me say that again: 28,000 errors or warnings.
That's awful.
And to be fair Twisted has a coding standard that doesn't match the Python mainstream, but massive amounts of noise have been my experience with other projects as well. Pylint has a lot of useful errors... but also a whole lot of utterly useless garbage assumptions about how your code should look. And fundamentally it treats them all the same; e.g. there's a distinction between warnings and errors but in practice both useful and useless stuff is in the warning category.
For example:
W:675, 0: Class has no __init__ method (no-init)
That's not a useful warning. Now imagine a few thousand of those.
How you should use Pylint
So here we have a tool that is potentially useful, but unusable in practice.
What to do? Luckily Pylint has some functionality that can help: you can configure it with a whitelist of lint checks.
First, set up Pylint to do nothing:
- Make a list of all the features you plausibly want to enable from the Pylint docs and configure .pylintrc to whitelist them.
- Comment them all out.
At this point Pylint will do no checks. Next:
- Uncomment a small batch of checks, and run pylint.
- If the resulting errors are real problems, fix them. If the errors are utter garbage, delete those checks from the configuration.
At this point you have a small number of probably useful checks that are passing: you can run pylint and you will only be told about new problems. In other words, you have a useful tool.
Repeat this process a few times, or once a week, enabling a new batch of checks each time until you run out of patience or you run out of Pylint checks to enable.
The end result will be something like this configuration or this configuration; both projects are open source under the Apache 2.0 license, so you can use those as a starting point.
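For a sense of the shape such a file takes, here is a minimal sketch of a whitelist-style .pylintrc. It is not copied from either of those projects. The choice of checks is only an example, and the section and message names should be confirmed against the documentation for the Pylint version you have installed.

```ini
[MESSAGES CONTROL]
# Turn every check off, then enable only the ones that have proven useful.
disable=all
enable=undefined-loop-variable,
       cell-var-from-loop

# Candidates for later batches. Move a name into "enable" once the
# codebase passes it, or delete it if it only produces noise:
#   unused-variable
#   unused-import
```

Keeping the not-yet-enabled candidates as comments in the same file makes the weekly "enable one more batch" step a small edit.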
Go forth and lint
Here's my challenge to you: if you're a Python programmer, go set up Pylint on a project today. It'll take an hour to get some minimal checks going, and one day it will save you from a production-impacting bug. If you're not a Python programmer you can probably find some equivalent tool for your language; go set that up.
And if you're the author of a lint tool, please, try to come up with better defaults. It's better to catch 60% of bugs and have 10,000 software projects using your tool than to catch 70% of bugs and have almost no one use it.
Top comments (4)
Hmm. I played around with it for a while and this is the best solution I could come up with:
(note: I don't know Python)
Maybe you can give wemake-python-styleguide a try? It has even more rules than pylint, but does not even try to mess with types. It has far fewer false positives and is based on flake8.
wemake-services / wemake-python-styleguide: "The strictest and most opinionated python linter ever!" It is a flake8 plugin with some other plugins as dependencies.
Cheers!
There's also an underlying issue that was clearly apparent to me: either your tests (if any) are not covering that piece of code, or no QA / functional testing was done before marking it production-ready, the latter being graver IMHO.
Unless the piece of code does not accurately reflect how the scoped variable is used inside the function, this would almost certainly have been caught during development (tests) or during QA.
Nice post! I don't work with python daily and didn't know about pylint. So by default this tool doesn't show information properly. Is there any other tool?