Ryan Palo

Posted on Aug 4, 2018 • Edited on Aug 6, 2018 • Originally published at assertnotmagic.com

3 Common Mistakes that Python Newbies Make

#python #beginners

Last weekend, I stared mentoring people on exercism.io on the Python track. I wasn't sure what to expect, but over the last week I have mentored about 50 people, helping them get their solutions from "tests passing" to "tests passing, readable, and Pythonic." I'm hooked. It's a total blast. I'm going to write a post specifically on that experience. That's not this post. This post is to talk about the three most common mistakes I saw over the last week and some possible alternatives that might be better! So let's start the countdown!

1. Deep Nesting of If Statements or Loops

# Calculating whether or not 'year' is a leap year

if year % 4 == 0:
    if year % 100 == 0:
        if year % 400 == 0:
            return True
        else:
            return False
    else:
        return True
else:
    return False

A lot of times, I'll pull a line from the Zen of Python to lead off my feedback to a "mentee" (not to be confused with a manitee). When I see this issue, I always lead with

Flat is better than nested.

If you look at your code with your eyes unfocused, looking at the shapes and not reading the words, and you see a bunch of arrows going out and back in again:

\
 \
  \
   \
    \
    /
   /
  /
  \
   \
    \
     \
     /
    /
   /
  /
 /
/

It's not definitely a bad thing, but it is a "code smell," or a Spidey Sense that something could possibly be refactored.

So, what can you do instead of nest? There are a couple things to try. The first is inverting your logic and using "early returns" to peel off small pieces of the solution space one at a time.

if year % 400 == 0:
    return True
if year % 100 == 0:
    return False
if year % 4 == 0:
    return True
return False

If the number is divisible by 400, then we immediately return true. Otherwise, for the rest of our code, we can know that year is definitely not divisible by 400. So, at that point, any other year that's divisible by 100 is not a leap year. So we peel off that layer of the onion by returning False.

After that, we can know that year is definitely not a multiple of 400 or 100, and the remainder of the code follows the same pattern.

The other way to avoid nesting is by using "boolean operators:" and, or, and not. We can combine if statements and thus, save ourselves a layer of nesting!

if year % 4 == 0 and (year % 100 != 0 or year % 400 == 0):
    return True
else:
    return False

Of course, that leads us to our second item...

2. Returning Booleans from If Statements

We'll start with our last example from above:

if year % 4 == 0 and (year % 100 != 0 or year % 400 == 0):
    return True
else:
    return False

Anytime you find yourself writing:

if something:
    return True
else:
    return False

You should remember that the clause of an if statement is itself a boolean!

>>> year = 2000
>>> year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)
True

So, why not type a little less and return the result of the boolean operation directly?

return (year % 4 == 0 and (year % 100 != 0 or year % 400 == 0))

Granted, at this point, the line may be getting a little long, but the code is a little less redundant now!

3. Lists are Like Hammers -- Not Everything is a Nail

Here are two possible ways that this could show up:

some_numbers = [1, 2, 5, 7, 8, ...]
other_numbers = [1, 3, 6, 7, 9, ...]
# Let's try to combine these two without duplicates
for number in other_numbers:
    if number not in some_numbers:
        some_numbers.append(number)

Or:

data = [["apple", 4], ["banana", 2], ["grape", 14]]
# What fruits do we have?
for item in data:
    print(item[0])
# => "apple" "banana" "grape"
# How many grapes do we have?
for item in data:
    if item[0] == "grape":
        print(item[1])
# => 14

In the first case, you're trying to keep track of some groups of items and you want to combine them without duplicates. This is an ideal candidate for a set. Sets inherently keep track of their items (although not the order, so don't use a set if the order is important). You can declare them with the built-in set() function or with squiggle braces ({}).

some_numbers = {1, 2, 5, 7, 8}
other_numbers = {1, 3, 6, 7, 9}
# Sets use the 'binary or' operator to do "unions"
# which is where they take all of the unique elements
some_numbers | other_numbers
# => {1, 2, 3, 5, 6, 7, 8, 9}

# You can even add single items in!
some_numbers.add(10)
# => {1, 2, 5, 7, 8, 10}

# But adding a duplicate doesn't change anything
some_numbers.add(1)
# => {1, 2, 5, 7, 8, 10}

In the second case, again, order probably isn't critical. You want to keep track of some data by a "label" or something, but be able to keep them all together and list them out as necessary. This time, you're probably looking for a dict. You can create those with either the dict() built-in function or, again, squiggle braces ({}). This time, however, you separate the labels (keys) and the values with a colon.

fruits = {
    "apples": 4,
    "bananas": 2,
    "grapes": 14,
}

You can list out all of the keys (or values!).

list(fruits.keys())
# => ["apples", "bananas", "grapes"]
list(fruits.values())
# => [4, 2, 14]

# Or both!
list(fruits.items())
# => [("apples", 4), ("bananas", 2), ("grapes", 14)]

And you can ask it about (or give it a new value for) specific keys.

# How many grapes are there?
fruits["grapes"]
# => 14

# Not anymore.  I ate some.
fruits["grapes"] = 0

fruits["grapes"]
# => 0

Using a list, the your algorithm loops through every item to find the right one. dict's are built to have very fast lookups, so, even if your dict is a bazillion fruits long, finding the grapes is still super fast -- and easy to type! No loops!

Call to Action

Exercism needs mentors! If you think you'd be a good mentor (or even a decent mentor, just on the easy exercises), sign up at their Mentor Signup page. Right now, Rust, Golang, and Elixir are especially swamped and need your help!

Top comments (30)

Shreyas Minocha • Aug 5 '18

I think that

if year % 400 == 0:
    return True
if year % 100 == 0:
    return False
if year % 4 == 0:
    return True
return False

... is much clearer than

if year % 4 == 0 and (year % 100 != 0 or year % 400 == 0):
    return True
else:
    return False

... or even

return (year % 4 == 0 and (year % 100 != 0 or year % 400 == 0))

Of course this wouldn't always be true but I think that we should make these decisions on the basis of code readability and clarity of intent rather than always going by the number of SLOC or the length of lines.

Ryan Palo • Aug 5 '18

I personally agree. This example is probably one Boolean too long for a one-liner. I just wanted to show that technique. For some reason, most people I mentor can understand the one-liner better. I don’t know why. Maybe it’s easier for them to understand than inverting the order of comparisons in their head.

Valentin Berlier • Aug 5 '18 • Edited

Using the expression and putting it in its own function is usually pretty clear. It lets you give a label to the condition. You can also put each comparison on its own line if the expression is too long.

def is_leap_year(year):
    return (
        year % 4 == 0
        and (
            year % 100 != 0 
            or year % 400 == 0
        )
    )

The logical operators stand out more when they're at the beginning of a line.

Miguel Ruiz • Sep 7 '18

I agree too

Corey McCarty • Jan 11 '19

Raymond Hettinger said "one thought per line" and I think this is the best guide for direction with things like this. You can have several things going on as long as they can be thought of (in the realm of your code) as a single thought.

Thomas Landin • Aug 5 '18 • Edited

One thing you see all the time from people with previous programming experience is trying to make "C-style" for loops iterating over indices and subscripting the list, i.e. for i in range(len(list)) instead of the idiomatic for item in list.

Thomas Landin • Aug 5 '18

Oh, and generally working against the language and its idioms by shoehorning things into place instead of picking an appropriate data structure or language construct from the outset.

Examples:

Wanting dicts to be ordered instead of using OrderedDict (this is changed in Python 3.7+).

Looping through dicts and accessing elements by key instead of using the appropriate iterator.

Using for loops and indexing to access parts of lists instead of using slices.

Things like that.

Idan Arye • Aug 5 '18

dict ordering is important because keyword arguments are stored in a dict. Consider this:

$ python3.4
Python 3.4.8 (default, Jun 23 2018, 02:35:33) 
[GCC 8.1.1 20180531] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from collections import OrderedDict
>>> OrderedDict(a=1, b=2, c=3)
OrderedDict([('a', 1), ('c', 3), ('b', 2)])
>>> OrderedDict(c=1, b=2, a=3)
OrderedDict([('a', 3), ('c', 1), ('b', 2)])

We created an OrderedDict, but the constructor was using **kwargs which means the arguments we passed to it were stored in a regular dict and got reordered before OrderedDict.__init__ could store them in order.

With Python 3.6+, the **kwargs stores the arguments in the same order they were written, and the OrderedDict construction works as expected:

$ python3.6
Python 3.6.6 (default, Jun 27 2018, 13:11:40) 
[GCC 8.1.1 20180531] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from collections import OrderedDict
>>> OrderedDict(a=1, b=2, c=3)
OrderedDict([('a', 1), ('b', 2), ('c', 3)])
>>> OrderedDict(c=1, b=2, a=3)
OrderedDict([('c', 1), ('b', 2), ('a', 3)])

rhymes • Aug 5 '18

A little detail on ordered dicts:

Technically dicts have been ordered since 3.6 on (C)Python, it's just that in 3.7 they decided that to be call "Python 3.7" any implementation has to have the insertion order.

So, dicts in Python 3.6+ are ordered only in the official version, which means that you might have portability issues if you depend on that behavior, but in 3.7+ you are garanteed they will be ordered, regardless of the implementation.

Difference with Python 2:

Python 2.7.15 (default, Jul  3 2018, 10:08:31)
[GCC 4.2.1 Compatible Apple LLVM 9.1.0 (clang-902.0.39.2)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> a = {'b': 1, 'c': 3, 'a': '4'}
>>> a
{'a': '4', 'c': 3, 'b': 1}
>>>

Python 3.6.6 (default, Jul 20 2018, 18:34:07)
[GCC 4.2.1 Compatible Apple LLVM 9.1.0 (clang-902.0.39.2)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> a = {'b': 1, 'c': 3, 'a': '4'}
>>> a
{'b': 1, 'c': 3, 'a': '4'}

Thomas Landin • Aug 5 '18

Yep, that's what I meant by 3.7 being different on that point.

Fabio Rosado • Aug 5 '18

I see a lot of this when people want to use the index to do something instead of using for i in enumerate(foo) which returns (index, value)

Thomas Landin • Aug 5 '18

enumerate is a great tool and it's what you should use when you actually need the index for something, but most of the time people don't really need the index.

And there's a better way to use it:

for index, value in enumerate(foo):
    # voilà, the values are already unpacked
    # into two handy variables!

Vlastimil Pospichal • Aug 5 '18

This is a bit faster, because the first condition is met most often:

if year % 4 != 0:
    return False
if year % 100 != 0:
    return True
if year % 400 != 0:
    return False
return True

Shorter version:

if year % 4 != 0:
    return False
if year % 100 != 0:
    return True
return year % 400 == 0

Shortest version:

return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

Ryan Palo • Aug 5 '18

That’s a good point! I feel like that order would be more intuitive for students as well! It does have the slight downside of using “negative comparisons” but I think the ease of understanding definitely outweighs that. 😁

Vlastimil Pospichal • Aug 5 '18

Negative comparisons are needed to eliminate else from source codes.

Ryan Palo • Aug 5 '18

I’m not 100% sure that else is a bad thing. But I agree that using negative comparisons isn’t a terrible thing. Maybe more of a “all things being equal, try to use positive rather than negative if possible” kind of thing.

Vlastimil Pospichal • Aug 5 '18

Eliminating else made my source code cleaner and faster. I use else in ternary operator only. I transformed structure if-else into ternary expression in many times.

Some languages (XSLT for example) are without "else". This inspired me.

Fabio Rosado • Aug 5 '18

I understand where you come from with your example to check if a year is a leap year. It's a good example since you do need to use quite a few comparisons to figure out if a given year is a leap year. I will have to disagree with you that using a one-liner is a much clearer or even less redundant.

The thing is, if you encounter a code with this style in a massive code base:

if year % 400 == 0:
    return True
if year % 100 == 0:
    return False
if year % 4 == 0:
    return True
return False

With your eyes, you can figure out what is happening there in less than a second. But when you encounter a one-liner like this:

return (year % 4 == 0 and (year % 100 != 0 or year % 400 == 0))

You actually need to stop and in your head, you have to disassemble the logic on that statement. The same thing happens with generators, if you are creating a massive generator that makes the code less readable is better to just use the long form.

One thing worth mention is that when you do code challenges it is quite fun to try and scale down the solution to a one-liner - some platforms even reward you for coming up with a short solution, so it's good to know how to scale down a long if statement like that.

Ryan Palo • Aug 5 '18

Yes, this is very true. I mentioned in the article that the one-line Boolean is probably getting a little long and dense. Personally, I really like the multiple return version a lot better too. I show the Boolean version to illustrate that as one option for removing nesting. Good point about doing shortest solutions as a challenge :)

λlvaro Frias • Aug 8 '18

Hey, Great post! Reading it i realized that my most recurrent mistake is try to use lists for everything. Checking out your examples of sets and dicts i see their potential as alternatives data structures. Thank you!

Ryan Palo • Aug 8 '18

Glad I could help 😁 thanks for reading!

Jason C. McDonald • Aug 5 '18 • Edited

You forgot my 'favorite'!

sudo pip install ...

Perhaps the best book I've ever read on this subject is How to Make Mistakes in Python by Mike Pirnat, a free eBook from O'Reilly [not to be confused with O'Rly ;) ]

Ben Sinclair • Aug 6 '18

General case of using sudo for every command.

Ryan Palo • Aug 5 '18

Ah! I haven’t read that one! Thank you!! It’s on the list. 😃

Alexandr K • Aug 5 '18

not sure, how I missed set syntax in Python :)

Thanks!

Esteban Rocha • Aug 6 '18

Good article and all... but that image it's too way disturbing, I mean it was really necessary to put a disgusting image of an animal doing an aberrant act that's been documented as a symptom of captivity? this is a Dev blog, seriously why the disgusting pic at all?

Ryan Palo • Aug 6 '18

Interesting, I had no idea this was a symptom of distress. Here's some info for anyone who is confused.

I was going for a combination of Python (the language) and eating your own tail (as in shooting yourself in the foot, getting yourself into trouble, causing problems for yourself, etc.). Obviously, if I knew it was a sign of animal distress, I wouldn't have included it. Thanks for the friendly heads up! I'll get that changed.

Martin Häusler • Sep 2 '18

I'm not a Python programmer, but in general I find return statements which produce a boolean value increadibly hard to read, unless the returned value is a constant (return true / return false) or an aptly named variable. I personally try to avoid return statements where the right side contains any boolean operators.