DEV Community


Python Pitfalls - Expecting The Unexpected

Martin Heinz ・ Originally published at martinheinz.dev ・ 8 min read

Regardless of which programming language you're coding in, you've probably encountered a good chunk of weird and seemingly inexplicable issues that turned out to be really stupid mistakes or quirks of that specific language. Python aims to be a clean and simple language, yet it also has its share of gotchas and quirks that can surprise both beginners and experienced software developers. So, to spare you unnecessary rage and frustration over some weird issue in your favourite programming language, here is a list of common Python pitfalls that you should try to avoid at all costs.

Mutable Default Arguments Are a Bad Idea

Setting default arguments for a function is very common and useful for defining optional arguments, or arguments that usually take the same predefined value. Setting a default argument to a mutable value such as a list or dict can, however, cause unexpected behavior:

def some_func(args=[]):
    args.append("data")
    ...

some_func()
print(some_func.__defaults__)
# (['data'],)
some_func()
print(some_func.__defaults__)
# (['data', 'data'],)
some_func()
print(some_func.__defaults__)
# (['data', 'data', 'data'],)

# Better solution:
def some_func(args=None):
    if args is None:
        args = []
    ...

The problem with using a mutable value as a default argument is that the default is not re-created every time the function is called. Default values are evaluated only once, when the def statement is executed, so every call that omits the argument receives the same object, including any changes earlier calls made to it. That's harmless for immutable types but a problem for mutable ones. To avoid it, you should always use None (or another immutable value) as the default instead, and check for it in the function body as shown above.
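To see the None-sentinel pattern actually fix the problem, here is a minimal runnable sketch; append_data is a hypothetical name used only for illustration:

```python
def append_data(args=None):
    # None is immutable, so it is safe as a default value;
    # a brand-new list is created on every call that omits 'args'
    if args is None:
        args = []
    args.append("data")
    return args

print(append_data())              # ['data']
print(append_data())              # ['data'] (nothing left over from the previous call)
print(append_data(["existing"]))  # ['existing', 'data']
```

If None is itself a meaningful value for the argument, a common variation is a private sentinel object (for example a module-level `_MISSING = object()`, also a hypothetical name) checked with `is`.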

Even though this might seem like a nuisance and a bug, it's intended behavior, and it can even be exploited to make caching functions that use the persistent mutable default argument as a cache:

def some_func(var, cache={}):
    if var in cache:
        return cache[var]
    result = var ** 2  # Placeholder for some expensive computation
    cache[var] = result
    return result
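For production code, the standard library already ships a memoizing decorator, functools.lru_cache, which gives the same effect without relying on a mutable default. The sketch below swaps it in for the hand-rolled dict cache; slow_square is a hypothetical stand-in for an expensive function:

```python
from functools import lru_cache

@lru_cache(maxsize=None)  # Unbounded cache, like the dict-based version
def slow_square(var):
    # Stand-in for an expensive computation
    return var ** 2

slow_square(4)  # Miss: computed and stored
slow_square(4)  # Hit: served from the cache
slow_square(5)  # Miss: computed and stored
print(slow_square.cache_info())
# CacheInfo(hits=1, misses=2, maxsize=None, currsize=2)
```

Like the dict-based cache, lru_cache requires the arguments to be hashable, since they are used as dictionary keys internally.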

Similar behavior can also be seen with dict.setdefault(key, value). In the code below we can see some surprising results:

data = {}
key = 'some_key'
val = []  # Mutable

data.setdefault(key, val)

print('Before:', data)
# Before: {'some_key': []}
val.append('some_data')
print('After:', data)
# After: {'some_key': ['some_data']}

Even though we didn't touch the data dictionary directly, it was modified when we appended to the default value val. That's because setdefault stores the very object you pass in (not a copy of it) when the key is missing. To avoid this issue, never reuse mutable objects as setdefault defaults; pass in a fresh one on each call instead.
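A short sketch contrasting a reused default, a fresh default per call, and collections.defaultdict (a standard-library alternative that creates a new default value for every missing key); the dictionary names are illustrative only:

```python
from collections import defaultdict

# Reusing one mutable object as the default shares it between keys
shared = []
bad = {}
bad.setdefault("a", shared)
bad.setdefault("b", shared)
bad["a"].append(1)
print(bad)  # {'a': [1], 'b': [1]}

# Passing a fresh list on every call keeps the values independent
good = {}
good.setdefault("a", []).append(1)
good.setdefault("b", []).append(2)
print(good)  # {'a': [1], 'b': [2]}

# defaultdict calls list() to build a new default for each missing key
grouped = defaultdict(list)
grouped["a"].append(1)
grouped["b"].append(2)
print(dict(grouped))  # {'a': [1], 'b': [2]}
```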

NaN (Non-) Reflexivity

Working with floats and non-integer numbers can often be difficult and annoying, but it gets especially weird when you get into Not-a-Number and Infinity territory. So, let's demonstrate this by making a few comparisons with these values:

x = float("NaN")  # Define "Not a Number" (the string is case-insensitive), equivalent of math.nan
y = float("inf")  # Define "Infinity"
z = float("inf")  # Define more "Infinity"

x == 10  # False; Makes sense

y == z   # True; Kinda makes sense

x == x   # False... NaN != NaN

The above code shows the non-reflexivity of NaN: in Python, NaN never compares equal to anything, not even to itself. So, if you need to test for NaN or inf, use math.isnan() and math.isinf(). Also be careful with arithmetic in code that might produce NaN, as NaN silently propagates through arithmetic operations without raising an exception.
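A small sketch of the difference between equality checks and the dedicated math predicates, plus NaN propagating through arithmetic; the sample values are arbitrary:

```python
import math

nan = float("nan")
inf = float("inf")
values = [1.0, nan, inf, -inf]

# Equality can't find NaN: it compares unequal to everything, itself included
print([v == nan for v in values])       # [False, False, False, False]

# The dedicated predicates work as expected
print([math.isnan(v) for v in values])  # [False, True, False, False]
print([math.isinf(v) for v in values])  # [False, False, True, True]

# NaN flows through arithmetic silently instead of raising
result = (nan + 1) * 2 - 3
print(math.isnan(result))               # True
```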

Python is usually clever and generally won't return NaN from its math functions. Instead, they raise an exception: math.exp(1000.0) raises OverflowError: math range error, and both math.sqrt(-1.0) and math.log(0.0) raise ValueError: math domain error (the exact messages may differ between Python versions). You might still encounter NaN with NumPy or pandas, and if you do, remember not to compare NaNs for equality.
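A sketch that checks which exception each of those math calls raises, and shows that plain float arithmetic can still produce NaN quietly. The helper outcome is hypothetical, and it compares exception types only, since the message wording is version-dependent:

```python
import math

def outcome(func, arg):
    # Hypothetical helper: return func(arg), or the name of the exception it raises
    try:
        return func(arg)
    except (OverflowError, ValueError) as exc:
        return type(exc).__name__

print(outcome(math.exp, 1000.0))  # OverflowError
print(outcome(math.sqrt, -1.0))   # ValueError
print(outcome(math.log, 0.0))     # ValueError

# Plain float arithmetic, by contrast, can produce NaN without any exception
inf = float("inf")
print(inf - inf)                  # nan
print(math.isnan(0.0 * inf))      # True
```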

Late Binding Closures

There are quite a few gotchas, pitfalls and surprises surrounding scopes and closures in Python. The most common one, I'd say, is late binding in closures. Let's start with an example:

funcs = []

for i in range(3):
    def some_func(n):
        return i * n
    funcs.append(some_func)

for f in funcs:
    print(f(2))

# 4
# 4
# 4

The code above defines a function inside a loop and adds it to a list. With each iteration the i variable increments, and so does the i inside the defined function, right? Wrong.

Late binding means that the value of i is looked up when each function is called, not when it is defined. All three functions refer to the same variable i in their enclosing scope (here, the global one), and by the time they are called, the loop has finished and i holds 2 (from the last iteration).

There's more than one way to fix this, but the cleanest one in my opinion is to use functools.partial:

from functools import partial

funcs = []

for i in range(3):
    def some_func(i, n):
        return i * n
    funcs.append(partial(some_func, i))

for f in funcs:
    print(f(2))

# 0
# 2
# 4

With partial we can create a new callable object with i already bound to its current value, forcing the value to be captured immediately, which fixes the issue. We can then supply the remaining original parameter n when we actually call the functions.
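Another common fix, shown here as an alternative rather than the author's preferred one, uses a default argument. Because defaults are evaluated once at definition time (the same mechanism behind the mutable-default pitfall), each function keeps the value i had in its own iteration:

```python
funcs = []

for i in range(3):
    # 'i=i' evaluates the default when the def statement runs,
    # so each function stores the current loop value
    def some_func(n, i=i):
        return i * n
    funcs.append(some_func)

print([f(2) for f in funcs])  # [0, 2, 4]
```

The trade-off is that i becomes part of each function's signature, so a caller can accidentally override it (funcs[1](2, 5) returns 10); the partial version binds i positionally and doesn't expose it that way.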

Reassigning Global Variables

Using a lot of global variables is generally discouraged and viewed as bad practice. There are, however, valid reasons to use some global variables, for example to define flags that set the log level of a function.

But what if you decide to flip (reassign) this flag? Well, it can cause a massive headache:

flag = False

def some_func():
    flag = True

def some_other_func():
    if flag:
        ...  # Do something when flag is set to True

some_func()
some_other_func()

Looking at the code above, one might expect the global flag variable to change to True after some_func() runs, but that's not the case. some_func creates a new local variable flag, sets it to True, and that variable disappears when the function returns. The global variable is never touched.

There's a simple fix for this, though. We first need to declare in the function body that we want to refer to the global variable instead of a local one. We do that with global <var_name>, in this case global flag:

flag = False

def some_func():
    global flag
    flag = True

...
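Note that global is only required to rebind a global name. Mutating an object that a global name refers to works without it, which is one way to keep module-level flags without global statements. A minimal sketch; settings and enable_flag are hypothetical names:

```python
settings = {"flag": False}  # Hypothetical module-level settings dict

def enable_flag():
    # No 'global' needed: we mutate the dict that 'settings' refers to,
    # we never rebind the name 'settings' itself
    settings["flag"] = True

def some_other_func():
    return "flag is set" if settings["flag"] else "flag is not set"

print(some_other_func())  # flag is not set
enable_flag()
print(some_other_func())  # flag is set
```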

Another "fun" issue with variables that you might run into (luckily much easier to debug and fix) is modifying an out-of-scope variable. Similar to the previous gotcha, it's caused by assigning to a variable that was defined in an outer scope:

var = 1
def some_func():
    return var

def another_func():
    var += 1
    return var

print(some_func())
# 1
print(another_func())
# UnboundLocalError: local variable 'var' referenced before assignment

Here we try to increment var inside the function's scope, assuming that it will modify the global one. But again, that's wrong.

Any assignment to a name inside a function, including an augmented assignment like +=, makes that name local to the whole function body. Since var += 1 has to read the local var before it has been assigned a value, Python raises UnboundLocalError.
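The "whole function body" part is what surprises people most: the assignment doesn't even have to come before the read. A small sketch, with read_then_assign as a hypothetical name:

```python
var = 1

def read_then_assign():
    # The assignment further down makes 'var' local to the whole function,
    # so this read fails even though it comes before the assignment
    print(var)
    var = 2

try:
    read_then_assign()
    raised = False
except UnboundLocalError:
    raised = True

print(raised)  # True
```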

This can again be fixed with global <var_name> for global variables. The same so-called scoping bug can also occur inside nested functions, where you would use nonlocal <var_name> instead to refer to the variable in the nearest enclosing function scope:

def outer():
    flag = False     # Scope: 'outer'

    def inner():
        nonlocal flag
        flag = True  # Scope: 'outer' (rebound via nonlocal)

    inner()
    print(flag)      # True; Did change
    return ...

outer()
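Applied to the failing another_func from earlier, and to a typical closure that keeps state between calls, the two declarations look like this (make_counter is a hypothetical example):

```python
var = 1

def another_func():
    global var  # Rebind the module-level 'var' instead of creating a local one
    var += 1
    return var

print(another_func())  # 2


def make_counter():
    count = 0

    def increment():
        nonlocal count  # Without this, 'count += 1' raises UnboundLocalError
        count += 1
        return count

    return increment

counter = make_counter()
counter()
counter()
print(counter())  # 3
```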

Proper Way to Define Tuples

One misconception that pretty much every Python developer has ingrained in their mind is that tuples are defined by the surrounding parentheses. Unlike dict or set literals, which are defined by their braces, a Python tuple is defined by the commas separating its elements.

Mistakes originating from this misconception usually arise when we try to define a tuple with just a single element:

x = ("value")  # Not a tuple.
type(x)
# <class 'str'>

x = ("value",)  # This is a tuple.
type(x)
# <class 'tuple'>

x = "value",  # No need for parentheses.
type(x)
# <class 'tuple'>

x = ("some" "value")  # Don't forget the comma, otherwise strings will be implicitly concatenated.
print(x)
# somevalue

In the snippet above, we can see that it's necessary to add a trailing comma after the single element to make Python recognize it as a tuple. We can also omit the parentheses entirely, which is common practice in return statements that return multiple values.
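A quick sketch of the return-statement case, plus the one situation where parentheses alone do make a tuple; min_max is a hypothetical helper:

```python
def min_max(values):
    # The comma builds a tuple; no parentheses needed
    return min(values), max(values)

result = min_max([3, 1, 2])
print(type(result), result)     # <class 'tuple'> (1, 3)

low, high = min_max([3, 1, 2])  # Unpacking the returned tuple
print(low, high)                # 1 3

empty = ()  # The exception: an empty tuple needs the parentheses
print(type(empty), len(empty))  # <class 'tuple'> 0
```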

The last example above shows another related pitfall: if you forget the comma between two string literals, Python implicitly concatenates them into a single string. This implicit concatenation happens anywhere adjacent string literals appear, not just when defining a tuple (in lists or function arguments, for example), so always double-check your strings and iterables if something fishy is happening in your program.
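Here is what that looks like in a multi-line list, where a missing comma is easy to overlook; the list contents are arbitrary:

```python
colors = [
    "red",
    "green"  # Missing comma: "green" and "blue" are concatenated
    "blue",
]

print(colors)       # ['red', 'greenblue']
print(len(colors))  # 2
```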

Indexing Byte Values Instead of Byte Strings

When working with files, we mostly deal with ASCII or UTF-8 text. From time to time, however, you might have to read and write some binary data, and you might be surprised by the results of indexing and iterating over it:

text = "Some data"

for c in text:
    print(c)

# S
# o
# m
# e
# ...

text = b"Some data"  # Binary String!

for c in text:
    print(c)

# 83
# 111
# 109
# 101
# ...

When you index into (or iterate over) a bytes object, you don't get a one-byte bytes object back; you get an integer, the value of that byte (for ASCII text, the ordinal value of the character). If the data is actually text, the simplest fix is to decode it first with the right encoding, e.g. text.decode('utf-8'), to get a proper string. If you'd rather keep the data as bytes, slice instead of indexing (text[0:1] returns b'S'), or use chr(c) to turn an individual byte value into a one-character string, which is only correct for ASCII (single-byte) characters.
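A sketch comparing the options on the same data, including why chr() is only safe for single-byte characters (the sample strings are arbitrary):

```python
data = b"Some data"

print(data[0])               # 83 (indexing bytes gives an int)
print(data[0:1])             # b'S' (slicing keeps bytes)
print(bytes([data[0]]))      # b'S'
print(chr(data[0]))          # S (a str, correct for ASCII bytes)
print(data.decode("utf-8"))  # Some data

# chr() maps each byte to a code point on its own, which breaks multi-byte UTF-8
encoded = "café".encode("utf-8")
print("".join(chr(b) for b in encoded))  # cafÃ©
print(encoded.decode("utf-8"))           # café
```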

Indexing with Negated Variable

Slicing and dicing is one of Python's handiest features, including the ability to use negative indexes, but if you aren't careful with them, you might get unexpected results:

x = "Some data"
print(x[-4:])
# data

print(x[-0:])  # Equivalent to x[:]
# Some data

Slicing with a negated variable, such as x[-n:], returns the last n elements as expected for any positive n. But if n happens to be 0, then -n is just 0, so x[-0:] is the same as x[0:] (and x[:]), and instead of an empty result you get a copy of the whole sequence.
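One way to avoid the -0 trap, sketched with a hypothetical helper last_n, is to compute a non-negative start index instead of negating the count:

```python
def last_n(seq, n):
    # Hypothetical helper: compute a non-negative start index instead of -n,
    # so n == 0 yields an empty slice rather than the whole sequence
    return seq[max(len(seq) - n, 0):]

x = "Some data"
for n in (4, 0):
    print(repr(x[-n:]), repr(last_n(x, n)))
# 'data' 'data'
# 'Some data' ''
```

The max(..., 0) also keeps a count larger than the sequence from wrapping around to a negative start index.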

Why Is It Returning None!?

I left my "favourite" gotcha for last. It's easy to forget whether a function returns a new value or modifies the original in place, especially since there are generally two kinds of methods: list methods, which modify the list in place and return None, and string methods, which leave the original string untouched (strings are immutable) and return a new value.

# List methods: modify the list in-place and return None
some_list = ["b", "c", "a"]
some_list.sort()  # Sorts in-place, returns None
sorted_list = some_list.sort()  # Wrong! -> sorted_list == None

some_list.reverse()
some_list.extend(["another"])
some_list.clear()
some_list.append("value")

# String methods: return a new object, the original string is unchanged
some_string = "Some data"

some_string = some_string.strip()
some_string = some_string.capitalize()
some_string = some_string.upper()
words = some_string.split()  # Returns a list of strings, not a string
encoded = some_string.encode()  # Returns bytes, not a string

I'm guilty of making this mistake way too many times. It's easy to forget the behavior of one of the many string or list methods, and it can lead to hours of debugging. So, if you receive None where there should be a whole string or list, double-check that you're using all of the methods shown above correctly.
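When you want a new list rather than an in-place change, the built-ins sorted() and reversed() are the non-mutating counterparts of list.sort() and list.reverse(). A short sketch with arbitrary numbers:

```python
numbers = [3, 1, 2]

ordered = sorted(numbers)  # Returns a new sorted list
print(ordered, numbers)    # [1, 2, 3] [3, 1, 2]

backwards = list(reversed(numbers))  # reversed() returns an iterator
print(backwards)           # [2, 1, 3]

print(numbers.sort())      # None (sorts in place)
print(numbers)             # [1, 2, 3]
```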

Conclusion

It's inevitable that you will run into these or other similar gotchas and pitfalls, and they will cause a lot of rage and frustration. More often than not, the best way to solve any of these issues is to just step back for a moment. Go for a walk. Go make a cup of coffee. Or at least take a deep breath. Most of the time, all it takes to solve such an issue is to leave it for a bit and come back later.

If that doesn't help, maybe it's time for some rubber duck debugging or to bring in another pair of eyes (the colleague sitting next to you). Oftentimes, when you start explaining the problem to somebody else, you will immediately realise where the problem really is.

When you eventually find the bug and manage to solve it, take a moment to think about what you could have done to find it faster. Next time you run into a similar issue, you might be able to resolve it a bit more quickly.
