Last weekend, I stared mentoring people on exercism.io on the Python track. I wasn't sure what to expect, but over the last week I have mentored about 50 people, helping them get their solutions from "tests passing" to "tests passing, readable, and Pythonic." I'm hooked. It's a total blast. I'm going to write a post specifically on that experience. That's not this post. This post is to talk about the three most common mistakes I saw over the last week and some possible alternatives that might be better! So let's start the countdown!
1. Deep Nesting of If Statements or Loops
# Calculating whether or not 'year' is a leap year
if year % 4 == 0:
if year % 100 == 0:
if year % 400 == 0:
return True
else:
return False
else:
return True
else:
return False
A lot of times, I'll pull a line from the Zen of Python to lead off my feedback to a "mentee" (not to be confused with a manitee). When I see this issue, I always lead with
Flat is better than nested.
If you look at your code with your eyes unfocused, looking at the shapes and not reading the words, and you see a bunch of arrows going out and back in again:
\
\
\
\
\
/
/
/
\
\
\
\
/
/
/
/
/
/
It's not definitely a bad thing, but it is a "code smell," or a Spidey Sense that something could possibly be refactored.
So, what can you do instead of nest? There are a couple things to try. The first is inverting your logic and using "early returns" to peel off small pieces of the solution space one at a time.
if year % 400 == 0:
return True
if year % 100 == 0:
return False
if year % 4 == 0:
return True
return False
If the number is divisible by 400, then we immediately return true. Otherwise, for the rest of our code, we can know that year is definitely not divisible by 400. So, at that point, any other year that's divisible by 100 is not a leap year. So we peel off that layer of the onion by returning False.
After that, we can know that year
is definitely not a multiple of 400 or 100, and the remainder of the code follows the same pattern.
The other way to avoid nesting is by using "boolean operators:" and, or, and not
. We can combine if
statements and thus, save ourselves a layer of nesting!
if year % 4 == 0 and (year % 100 != 0 or year % 400 == 0):
return True
else:
return False
Of course, that leads us to our second item...
2. Returning Booleans from If Statements
We'll start with our last example from above:
if year % 4 == 0 and (year % 100 != 0 or year % 400 == 0):
return True
else:
return False
Anytime you find yourself writing:
if something:
return True
else:
return False
You should remember that the clause of an if
statement is itself a boolean!
>>> year = 2000
>>> year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)
True
So, why not type a little less and return the result of the boolean operation directly?
return (year % 4 == 0 and (year % 100 != 0 or year % 400 == 0))
Granted, at this point, the line may be getting a little long, but the code is a little less redundant now!
3. Lists are Like Hammers -- Not Everything is a Nail
Here are two possible ways that this could show up:
some_numbers = [1, 2, 5, 7, 8, ...]
other_numbers = [1, 3, 6, 7, 9, ...]
# Let's try to combine these two without duplicates
for number in other_numbers:
if number not in some_numbers:
some_numbers.append(number)
Or:
data = [["apple", 4], ["banana", 2], ["grape", 14]]
# What fruits do we have?
for item in data:
print(item[0])
# => "apple" "banana" "grape"
# How many grapes do we have?
for item in data:
if item[0] == "grape":
print(item[1])
# => 14
In the first case, you're trying to keep track of some groups of items and you want to combine them without duplicates. This is an ideal candidate for a set
. Sets inherently keep track of their items (although not the order, so don't use a set if the order is important). You can declare them with the built-in set()
function or with squiggle braces ({}
).
some_numbers = {1, 2, 5, 7, 8}
other_numbers = {1, 3, 6, 7, 9}
# Sets use the 'binary or' operator to do "unions"
# which is where they take all of the unique elements
some_numbers | other_numbers
# => {1, 2, 3, 5, 6, 7, 8, 9}
# You can even add single items in!
some_numbers.add(10)
# => {1, 2, 5, 7, 8, 10}
# But adding a duplicate doesn't change anything
some_numbers.add(1)
# => {1, 2, 5, 7, 8, 10}
In the second case, again, order probably isn't critical. You want to keep track of some data by a "label" or something, but be able to keep them all together and list them out as necessary. This time, you're probably looking for a dict
. You can create those with either the dict()
built-in function or, again, squiggle braces ({}
). This time, however, you separate the labels (keys) and the values with a colon.
fruits = {
"apples": 4,
"bananas": 2,
"grapes": 14,
}
You can list out all of the keys (or values!).
list(fruits.keys())
# => ["apples", "bananas", "grapes"]
list(fruits.values())
# => [4, 2, 14]
# Or both!
list(fruits.items())
# => [("apples", 4), ("bananas", 2), ("grapes", 14)]
And you can ask it about (or give it a new value for) specific keys.
# How many grapes are there?
fruits["grapes"]
# => 14
# Not anymore. I ate some.
fruits["grapes"] = 0
fruits["grapes"]
# => 0
Using a list, the your algorithm loops through every item to find the right one. dict
's are built to have very fast lookups, so, even if your dict
is a bazillion fruits long, finding the grapes
is still super fast -- and easy to type! No loops!
Call to Action
Exercism needs mentors! If you think you'd be a good mentor (or even a decent mentor, just on the easy exercises), sign up at their Mentor Signup page. Right now, Rust, Golang, and Elixir are especially swamped and need your help!
Top comments (30)
I think that
... is much clearer than
... or even
Of course this wouldn't always be true but I think that we should make these decisions on the basis of code readability and clarity of intent rather than always going by the number of SLOC or the length of lines.
I personally agree. This example is probably one Boolean too long for a one-liner. I just wanted to show that technique. For some reason, most people I mentor can understand the one-liner better. I don’t know why. Maybe it’s easier for them to understand than inverting the order of comparisons in their head.
Using the expression and putting it in its own function is usually pretty clear. It lets you give a label to the condition. You can also put each comparison on its own line if the expression is too long.
The logical operators stand out more when they're at the beginning of a line.
I agree too
Raymond Hettinger said "one thought per line" and I think this is the best guide for direction with things like this. You can have several things going on as long as they can be thought of (in the realm of your code) as a single thought.
One thing you see all the time from people with previous programming experience is trying to make "C-style" for loops iterating over indices and subscripting the list, i.e.
for i in range(len(list))
instead of the idiomaticfor item in list
.Oh, and generally working against the language and its idioms by shoehorning things into place instead of picking an appropriate data structure or language construct from the outset.
Examples:
Wanting dicts to be ordered instead of using OrderedDict (this is changed in Python 3.7+).
Looping through dicts and accessing elements by key instead of using the appropriate iterator.
Using for loops and indexing to access parts of lists instead of using slices.
Things like that.
dict
ordering is important because keyword arguments are stored in a dict. Consider this:We created an
OrderedDict
, but the constructor was using**kwargs
which means the arguments we passed to it were stored in a regulardict
and got reordered beforeOrderedDict.__init__
could store them in order.With Python 3.6+, the
**kwargs
stores the arguments in the same order they were written, and theOrderedDict
construction works as expected:A little detail on ordered dicts:
Technically dicts have been ordered since 3.6 on (C)Python, it's just that in 3.7 they decided that to be call "Python 3.7" any implementation has to have the insertion order.
So, dicts in Python 3.6+ are ordered only in the official version, which means that you might have portability issues if you depend on that behavior, but in 3.7+ you are garanteed they will be ordered, regardless of the implementation.
Difference with Python 2:
Yep, that's what I meant by 3.7 being different on that point.
I see a lot of this when people want to use the index to do something instead of using
for i in enumerate(foo)
which returns(index, value)
enumerate
is a great tool and it's what you should use when you actually need the index for something, but most of the time people don't really need the index.And there's a better way to use it:
This is a bit faster, because the first condition is met most often:
Shorter version:
Shortest version:
That’s a good point! I feel like that order would be more intuitive for students as well! It does have the slight downside of using “negative comparisons” but I think the ease of understanding definitely outweighs that. 😁
Negative comparisons are needed to eliminate else from source codes.
I’m not 100% sure that else is a bad thing. But I agree that using negative comparisons isn’t a terrible thing. Maybe more of a “all things being equal, try to use positive rather than negative if possible” kind of thing.
Eliminating else made my source code cleaner and faster. I use else in ternary operator only. I transformed structure if-else into ternary expression in many times.
Some languages (XSLT for example) are without "else". This inspired me.
I understand where you come from with your example to check if a year is a leap year. It's a good example since you do need to use quite a few comparisons to figure out if a given year is a leap year. I will have to disagree with you that using a one-liner is a much clearer or even less redundant.
The thing is, if you encounter a code with this style in a massive code base:
With your eyes, you can figure out what is happening there in less than a second. But when you encounter a one-liner like this:
You actually need to stop and in your head, you have to disassemble the logic on that statement. The same thing happens with generators, if you are creating a massive generator that makes the code less readable is better to just use the long form.
One thing worth mention is that when you do code challenges it is quite fun to try and scale down the solution to a one-liner - some platforms even reward you for coming up with a short solution, so it's good to know how to scale down a long if statement like that.
Yes, this is very true. I mentioned in the article that the one-line Boolean is probably getting a little long and dense. Personally, I really like the multiple return version a lot better too. I show the Boolean version to illustrate that as one option for removing nesting. Good point about doing shortest solutions as a challenge :)
Hey, Great post! Reading it i realized that my most recurrent mistake is try to use lists for everything. Checking out your examples of
sets
anddicts
i see their potential as alternatives data structures. Thank you!Glad I could help 😁 thanks for reading!
You forgot my 'favorite'!
Perhaps the best book I've ever read on this subject is How to Make Mistakes in Python by Mike Pirnat, a free eBook from O'Reilly [not to be confused with O'Rly ;) ]
General case of using sudo for every command.
Ah! I haven’t read that one! Thank you!! It’s on the list. 😃
not sure, how I missed
set
syntax in Python :)Thanks!
Good article and all... but that image it's too way disturbing, I mean it was really necessary to put a disgusting image of an animal doing an aberrant act that's been documented as a symptom of captivity? this is a Dev blog, seriously why the disgusting pic at all?
Interesting, I had no idea this was a symptom of distress. Here's some info for anyone who is confused.
I was going for a combination of Python (the language) and eating your own tail (as in shooting yourself in the foot, getting yourself into trouble, causing problems for yourself, etc.). Obviously, if I knew it was a sign of animal distress, I wouldn't have included it. Thanks for the friendly heads up! I'll get that changed.
I'm not a Python programmer, but in general I find return statements which produce a boolean value increadibly hard to read, unless the returned value is a constant (return true / return false) or an aptly named variable. I personally try to avoid return statements where the right side contains any boolean operators.
Useful!
Thanks! Glad it helped!