Patrick Loeber

Posted on Jul 28, 2020 • Edited on Sep 17, 2020 • Originally published at python-engineer.com

11 Tips And Tricks To Write Better Python Code

#python #programming

You can read the original article on my website:
https://www.python-engineer.com/posts/11-tips-to-write-better-python-code/

In this tutorial I show 11 Tips and Tricks to write better Python code! I show a lot of best practices that improve your code by making your code much cleaner and more Pythonic. Here's the overview of all the tips:

1) Iterate with enumerate() instead of range(len())
2) Use list comprehension instead of raw for-loops
3) Sort complex iterables with the built-in sorted() function
4) Store unique values with Sets
5) Save Memory With Generators
6) Define default values in Dictionaries with .get() and .setdefault()
7) Count hashable objects with collections.Counter
8) Format Strings with f-Strings (Python 3.6+)
9) Concatenate Strings with .join()
10) Merge dictionaries with the double asterisk syntax ** (Python 3.5+)
11) Simplify if-statements with if x in list instead of checking each item separately

1) Iterate with `enumerate()` instead of `range(len())`

If we need to iterate over a list and track both the index and the current item, most people reach for the range(len()) pattern. In this example we iterate over a list, check whether the current item is negative, and if so set that position in the list to 0. While range(len()) works, it's much nicer to use the built-in enumerate function here. On each iteration it yields a tuple of the current index and the current item, so we can check the value directly and still use the index to write back to the list.

data = [1, 2, -3, -4]
# weak:
for i in range(len(data)):
    if data[i] < 0:
        data[i] = 0

# better:
data = [1, 2, -3, -4]
for idx, num in enumerate(data):
    if num < 0:
        data[idx] = 0
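
As a small extension of the example above, enumerate() also accepts an optional start argument, which is handy when the counter should begin at 1 instead of 0. The list names below is made up for illustration and is not part of the original example.

```python
# Hypothetical sample data, only for illustration
names = ["Anna", "Ben", "Cleo"]

numbered = []
# start=1 only shifts the counter; the items visited stay the same
for pos, name in enumerate(names, start=1):
    numbered.append((pos, name))

print(numbered)  # [(1, 'Anna'), (2, 'Ben'), (3, 'Cleo')]
```

Keep in mind that with a start value other than 0 the counter is no longer a valid index into the list.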

2) Use list comprehension instead of raw for-loops

Let's say we want to create a list with certain values, in this case a list with all the squared numbers between 0 and 9. The tedious way would be to create an empty list, then use a for loop, do our calculation, and append it to the list:

squares = []
for i in range(10):
    squares.append(i*i)

A simpler way to do this is list comprehension. Here we only need one line to achieve the same thing:

# better:
squares = [i*i for i in range(10)]

List comprehension can be really powerful, and even include if-statements. If you want to learn more about the syntax and good use cases, I have a whole tutorial about list comprehension here. Note that the usage of list comprehension is a little bit debatable. It should not be overused, especially not if it impairs the readability of the code. But I personally think this syntax is clear and concise.
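
To illustrate the if-statement mentioned above, here is a minimal sketch that keeps only the squares of the even numbers. The name even_squares is just chosen for this example.

```python
# The condition after the for-clause filters which items are used
even_squares = [i*i for i in range(10) if i % 2 == 0]
print(even_squares)  # [0, 4, 16, 36, 64]
```

This still reads well on one line. Once a comprehension needs several conditions or nested loops, a regular for-loop is usually the more readable choice.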

3) Sort complex iterables with the built-in `sorted()` function

If we need to sort some iterable, e.g., a list, a tuple, or a dictionary (for a dictionary, the keys are sorted), we don't need to implement the sorting algorithm ourselves. We can simply use the built-in sorted function. It sorts the items in ascending order and returns a new list. If we want to have the result in descending order, we can use the argument reverse=True. As I said, this works on any iterable, so here we could also use a tuple. But note that the result is a list again!

data = (3, 5, 1, 10, 9)
sorted_data = sorted(data, reverse=True) # [10, 9, 5, 3, 1]

Now let's say we have a complex iterable. Here a list, and inside the list we have dictionaries, and we want to sort the list according to the age in the dictionary. For this we can also use the sorted function and then pass in the key argument that should be used for sorting. The key must be a function, so here we can use a lambda and use a one line function that returns the age.

data = [{"name": "Max", "age": 6}, 
        {"name": "Lisa", "age": 20}, 
        {"name": "Ben", "age": 9}
        ]
sorted_data = sorted(data, key=lambda x: x["age"])
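
As an alternative to the lambda, the standard library's operator.itemgetter can build the key function for us. This is only a sketch of the same sort as above; the variable names by_age and oldest_first are made up for this example.

```python
from operator import itemgetter

data = [{"name": "Max", "age": 6},
        {"name": "Lisa", "age": 20},
        {"name": "Ben", "age": 9}]

# itemgetter("age") returns a function that does x["age"]
by_age = sorted(data, key=itemgetter("age"))
print([d["name"] for d in by_age])  # ['Max', 'Ben', 'Lisa']

# key and reverse can be combined
oldest_first = sorted(data, key=itemgetter("age"), reverse=True)
print([d["name"] for d in oldest_first])  # ['Lisa', 'Ben', 'Max']
```

Whether the lambda or itemgetter reads better is mostly a matter of taste.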

4) Store unique values with Sets

If we have a list with multiple values and need to have only unique values, a nice trick is to convert our list to a set. A Set is an unordered collection data type that has no duplicate elements, so in this case it removes all the duplicates.

my_list = [1,2,3,4,5,6,7,7,7]
my_set = set(my_list) # removes duplicates

If we already know that we want unique elements, like here the prime numbers, we can create a set right away with curly braces. Sets are implemented as hash tables, so checking whether an item is contained is fast on average even for large sets, and they also have some handy methods for calculating the intersections and differences between two sets.

primes = {2,3,5,7,11,13,17,19}
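
Here is a short sketch of the intersection and difference operations mentioned above. The second set odds is made up for illustration, and sorted() is only used so that the printed order is predictable.

```python
primes = {2, 3, 5, 7, 11, 13, 17, 19}
odds = {1, 3, 5, 7, 9}  # hypothetical second set

# intersection: items that are in both sets
common = primes & odds  # same as primes.intersection(odds)
print(sorted(common))  # [3, 5, 7]

# difference: items in primes that are not in odds
only_primes = primes - odds  # same as primes.difference(odds)
print(sorted(only_primes))  # [2, 11, 13, 17, 19]

# membership test
print(7 in primes)  # True
```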

5) Save Memory With Generators

In tip #2 I showed you list comprehension. But a list is not always the best choice. Let's say we have a very large list with 10000 items and we want to calculate the sum over all the items. We can of course do this with a list, but with much larger data we might run into memory issues because the whole list has to be held in memory at once. This is a perfect example where we can use generators. Similar to list comprehension we can use a generator expression (sometimes called generator comprehension) that has the same syntax but with parentheses instead of square brackets. A generator computes our elements lazily, i.e., it produces only one item at a time and only when asked for it. If we calculate the sum over this generator, we see that we get the same correct result.

# list comprehension
my_list = [i for i in range(10000)]
print(sum(my_list)) # 49995000

# generator comprehension
my_gen = (i for i in range(10000))
print(sum(my_gen)) # 49995000

Now let's inspect the size of both the list and the generator with the sys.getsizeof() function from the standard library. For the list we get over 80000 bytes and for the generator only on the order of 100 bytes, because it only generates one item at a time. The exact numbers depend on the Python version and platform, and sys.getsizeof() only reports the size of the object itself, not of the items it refers to. Still, this can make a huge difference when working with large data, so it's always good to keep the generator in mind!

import sys 

my_list = [i for i in range(10000)]
print(sys.getsizeof(my_list), 'bytes') # e.g. 87616 bytes (varies by Python version)

my_gen = (i for i in range(10000))
print(sys.getsizeof(my_gen), 'bytes') # e.g. 128 bytes (varies by Python version)
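
One consequence of the lazy behavior described above is that a generator can only be consumed once. The following sketch shows this with a small made-up generator squares_gen; it also shows that a generator expression can be passed directly to sum() without extra parentheses.

```python
# A generator expression can be passed straight into a function call
total = sum(i for i in range(10000))
print(total)  # 49995000

squares_gen = (i * i for i in range(5))

first = next(squares_gen)    # asks for exactly one item
print(first)  # 0

rest = sum(squares_gen)      # consumes the remaining items: 1 + 4 + 9 + 16
print(rest)  # 30

leftover = sum(squares_gen)  # the generator is now exhausted
print(leftover)  # 0
```

If the values are needed more than once, a list is the better fit, or the generator has to be created again.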

6) Define default values in Dictionaries with `.get()` and `.setdefault()`

Let's say we have a dictionary with different keys like the item and the price of the item. At some point in our code we want to get the count of the items and we assume that this key is also contained in the dictionary. When we simply try to access the key, it will crash our code and raise a KeyError. So a better way is to use the .get() method on the dictionary. This also returns the value for the key, but it will not raise a KeyError if the key is not available. Instead it returns the default value that we specified, or None if we didn't specify it.

my_dict = {'item': 'football', 'price': 10.00}
count = my_dict['count'] # KeyError!

# better:
count = my_dict.get('count', 0) # optional default value

If we want to ask our dictionary for the count and we also want to update the dictionary and put the count into the dictionary if it's not available, we can use the .setdefault() method. If the key is already present, it returns the existing value. Otherwise it inserts the key with the default value that we specified and returns that default, so the next time we check the dictionary the key is available.

count = my_dict.setdefault('count', 0)
print(count) # 0
print(my_dict) # {'item': 'football', 'price': 10.0, 'count': 0}
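
A common use of .setdefault() is grouping items, because the returned value can be modified right away. The following is only a sketch with made-up data (words and by_letter are not part of the original example); it also shows that .get() without a default returns None.

```python
my_dict = {'item': 'football', 'price': 10.00}

# .get() without a second argument falls back to None
missing = my_dict.get('count')
print(missing)  # None

# Hypothetical data: group words by their first letter
words = ["apple", "avocado", "banana", "blueberry", "cherry"]
by_letter = {}
for word in words:
    # creates the empty list on first use, then appends to it
    by_letter.setdefault(word[0], []).append(word)

print(by_letter)
# {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}
```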

7) Count hashable objects with `collections.Counter`

If we need to count how often each element appears in a list, there is a very handy tool in the collections module that does exactly this. We just need to import Counter from collections, and then create our counter object with the list as argument. If we print this, then for each item in our list we see the number of times that this item appears, and the output is already sorted with the most common item in front. This is much nicer than calculating it on our own. If we then want to get the count for a certain item, we can simply access this item, and it will return the corresponding count. If the item is not included, then it returns 0.

from collections import Counter

my_list = [10, 10, 10, 5, 5, 2, 9, 9, 9, 9, 9, 9]
counter = Counter(my_list)

print(counter) # Counter({9: 6, 10: 3, 5: 2, 2: 1})
print(counter[10]) # 3

It also has a very handy method to return the most common items, which - no surprise - is called most_common(). We can specify if we just want the very most common item, or also the second most and so on by passing in a number. Note that this returns a list of tuples. Each tuple has the value as first value and the count as second value. So if we just want to have the value of the very most common item, we call this method and then we access index 0 in our list (this returns the first tuple) and then again access index 0 to get the value.

from collections import Counter

my_list = [10, 10, 10, 5, 5, 2, 9, 9, 9, 9, 9, 9]
counter = Counter(my_list)

most_common = counter.most_common(2)
print(most_common) # [(9, 6), (10, 3)]
print(most_common[0]) # (9, 6)
print(most_common[0][0]) # 9
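
Instead of indexing with [0][0], the single result of most_common(1) can also be unpacked into names, as suggested in the comments below. This is a sketch of that variant; value and count are names chosen for this example.

```python
from collections import Counter

my_list = [10, 10, 10, 5, 5, 2, 9, 9, 9, 9, 9, 9]
counter = Counter(my_list)

# most_common(1) returns a list with exactly one (value, count) tuple,
# so the pattern on the left unpacks both parts in one step
[(value, count)] = counter.most_common(1)
print(value, count)  # 9 6

# Asking for an item that never appeared returns 0 instead of raising
print(counter[42])  # 0
```

The unpacking would raise a ValueError if the counter were empty, so it assumes at least one counted item.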

8) Format Strings with f-Strings (Python 3.6+)

This is new since Python 3.6 and in my opinion is the best way to format a string. We just have to write an f before our string, and then inside the string we can use curly braces and access variables. This is much simpler and more concise compared to the old formatting rules, and it's also faster. Moreover, we can write expressions in the braces that are evaluated at runtime. So here for example we want to print the squared number of our variable i, and we can simply write this operation in our f-String.

name = "Alex"
my_string = f"Hello {name}"
print(my_string) # Hello Alex

i = 10
print(f"{i} squared is {i*i}") # 10 squared is 100
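
Inside the braces, an optional format specifier after a colon controls how the value is rendered. Here is a small sketch with made-up values (price and ratio are not from the original example):

```python
price = 1234.5
ratio = 0.256

two_decimals = f"{price:.2f}"      # fixed number of decimals
with_separator = f"{price:,.2f}"   # thousands separator
as_percent = f"{ratio:.1%}"        # multiplied by 100 and shown with %

print(two_decimals)    # 1234.50
print(with_separator)  # 1,234.50
print(as_percent)      # 25.6%
```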

9) Concatenate Strings with `.join()`

Let's say we have a list with different strings, and we want to combine all elements to one string, separated by a space between each word. The bad way is to do it like this:

list_of_strings = ["Hello", "my", "friend"]

# BAD:
my_string = ""
for i in list_of_strings:
    my_string += i + " "

We defined an empty string, then iterated over the list, and then appended the word and a space to the string. As you should know, a string is an immutable element, so here we have to create new strings each time. This code can be very slow for large lists, and it also leaves an unwanted trailing space at the end of the result, so you should immediately forget this approach! Much better, much faster, and also much more concise is to use the .join() method:

# GOOD:
list_of_strings = ["Hello", "my", "friend"]
my_string = " ".join(list_of_strings)

This combines all the elements into one string and uses the string in the beginning as a separator. So here we use a string with only a space. If we were for example to use a comma here, then the final string has a comma between each word. This syntax is the recommended way to combine a list of strings into one string.
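
A short sketch of the comma variant described above. It also shows one thing to watch out for: .join() only accepts strings, so other items have to be converted first. The list numbers is made up for illustration.

```python
list_of_strings = ["Hello", "my", "friend"]

with_commas = ", ".join(list_of_strings)
print(with_commas)  # Hello, my, friend

# Non-string items would raise a TypeError, so convert them with str()
numbers = [1, 2, 3]
dashed = "-".join(str(n) for n in numbers)
print(dashed)  # 1-2-3
```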

10) Merge dictionaries with the double asterisk syntax ** (Python 3.5+)

This syntax is new since Python 3.5. If we have two dictionaries and want to merge them, we can use curly braces and double asterisks for both dictionaries. So here dictionary 1 has a name and an age, and dictionary 2 also has the name and then the city. After merging with this concise syntax our final dictionary has all 3 keys in it. If a key appears in both dictionaries, the value from the dictionary on the right is used.

d1 = {'name': 'Alex', 'age': 25}
d2 = {'name': 'Alex', 'city': 'New York'}
merged_dict = {**d1, **d2}
print(merged_dict) # {'name': 'Alex', 'age': 25, 'city': 'New York'}
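
To make the handling of duplicate keys visible, here is a sketch where the two dictionaries disagree on the name (the value 'Alexander' is made up for illustration). It also shows the | operator from PEP 584, which the comments below point to; note that this operator requires Python 3.9 or newer.

```python
d1 = {'name': 'Alex', 'age': 25}
d2 = {'name': 'Alexander', 'city': 'New York'}  # hypothetical conflicting name

# The later dictionary overrides values for keys that appear in both
merged = {**d1, **d2}
print(merged)  # {'name': 'Alexander', 'age': 25, 'city': 'New York'}

# Python 3.9+ only: dedicated merge operator with the same result
merged_op = d1 | d2
print(merged_op == merged)  # True

# Both variants build a new dictionary; the inputs stay unchanged
print(d1)  # {'name': 'Alex', 'age': 25}
```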

11) Simplify if-statements with `if x in list` instead of checking each item separately

Let's say we have a list with main colors red, green, and blue. And somewhere in our code we have a new variable that contains some color, so here c = red. Then we want to check if this is a color from our main colors. We could of course check this against each item in our list like so:

colors = ["red", "green", "blue"]

c = "red"

# cumbersome and error-prone
if c == "red" or c == "green" or c == "blue":
    print("is main color")

But this can become very cumbersome, and we can easily make mistakes, for example if we have a typo here for red. Much simpler and much better is just to use the syntax if x in list:

colors = ["red", "green", "blue"]

c = "red"

# better:
if c in colors:
    print("is main color")
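
As pointed out in the comments below, the main colors are really a collection of unique values, so a set (see tip #4) works here as well. A minimal sketch, with main_colors as a name chosen for this example:

```python
main_colors = {"red", "green", "blue"}

c = "red"
is_main = c in main_colors
print(is_main)  # True

is_purple_main = "purple" in main_colors
print(is_purple_main)  # False
```

For a handful of items the difference to a list is negligible; the set mainly pays off when the collection is large or checked very often.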

Conclusion

I hope you enjoyed those tips and learned a few new things! If you have any feedback or other tips you can recommend, please reach out on Twitter or YouTube!

Oldest comments (23)

Kevin Woblick • Jul 28 '20

Excellent tips for Python beginners like me. Clean and precise descriptions and no blah blah. Thank you very much!

Patrick Loeber • Jul 28 '20

Thanks a lot for the feedback :)

Ivan López • Jul 28 '20

thank you so much, excellent post for beginners :)

Patrick Loeber • Jul 29 '20

Thanks for the feedback!

Junaid Mahmud • Jul 28 '20

Thank you very much for this. It helped me to have a clearer idea.

Patrick Loeber • Jul 29 '20

Glad you like it!

Jing Xue • Jul 29 '20

11) why not make colors a set?

Patrick Loeber • Jul 29 '20

Good point! Probably a set would be even better to store the colors here. I just wanted to show a dummy example, and in "real life" code a lot of times you stumble on situations where you have a list that you want to check...

Jing Xue • Jul 29 '20

Understood. Actually come to think of it, when you have only a few items, a list is not necessarily slower and definitely more memory efficient than a set. :-)

Andrew Harpin • Jul 29 '20

I fundamentally disagree with 1, unless there is a performance benefit or some other benefit not covered by your simplistic example.

For a non-python coder or inexperienced python coder, your initial example is much more readable. It is obvious what is happening.

Someone unfamiliar with enumerate needs to learn what it does, what the parameter order is and then apply that understanding to the code below, with the first this is not required.

Additionally, the enumerate is potentially subject to errors as operator precedence is critical to its operation.

For 2, I have a passionate hatred of nested for or if operations, there are occasions where they are necessary, but these are generally quite rare.

As you mention, they affect code readability, but they also affect maintainability of the code, when the addition or change of functionality can become difficult.

A lot of people use them too frequently on the premise of code compaction, but this is just a cover for poor design, where better abstraction would be more beneficial.

Patrick Loeber • Jul 29 '20 • Edited

Thanks for the extensive feedback!
Agree with 2, a lot of people use them too frequently and we should be careful here.

I also agree that range(len()) is fine and might be better suited for a beginner. I would never teach a beginner this method the first time I'm showing the for-loop. But as I said in the beginning, I wanted to show how the code can be more elegant and Pythonic. I'm not sure if better readability always has to mean that the code should be suited for a complete beginner. And the enumerate function shouldn't be too hard to understand once someone learned about it.

Andrew Harpin • Jul 29 '20

I disagree that it is more elegant, my personal opinion is good code can be read by a non coder, at least from an overview perspective.

With the other solution they would struggle as it requires inside knowledge, whereas range and len are much more obvious.

Jose Rodriguez • Jul 29 '20

I also prefer enumerate anytime. range(len()) seems to defeat the motto of “simple over complex”. enumerate seems quite descriptive to me (non-English speaker) and otherwise once explained falls flat. If you are concern about variable unpacking...it makes sense to iterate over an index if you come from C, but python “for-each” approach seems to me more readable. IMHO ❤️

Vedran Čačić • Jul 31 '20

First, please read python.org/dev/peps/pep-0279/. Python design is (or at least it was at the time enumerate was introduced) itself a well-designed community-driven peer-reviewed process. Somebody thought about all your objections and addressed them in a way that was deemed satisfactory by BDFL. Also see nedbatchelder.com/text/iter.html. There is more to loops than C-style "compare, add, dereference, increment" low-level twiddling.

Second, your other objections are just FUD. Parameter order? There is only one parameter. Yes, there is an optional one, but the usual convention (not only in Python) is that the optional parameter always follows the mandatory one. [It's incredibly ironic that the builtin you defend as completely obvious, range, is one of very rare exceptions to that rule.] And what does operator precedence do there? There are no operators at all in the usual use of enumerate.

Banji • Jul 29 '20

Boom!
That's what we need these days
Tnx for sharing
Happy coding with ❤️

Patrick Loeber • Jul 29 '20

Thanks for reading :)

amir • Jul 29 '20

wooow really Tnx , very complete and helpful topics

Patrick Loeber • Jul 29 '20

Glad you like it :)

Ravi • Jul 29 '20

Nice Article, thanks for sharing it.

Vedran Čačić • Jul 31 '20

3) Use the batteries, Luke. from operator import attrgetter.

5) Don't use sys.getsizeof unless you know a lot more about Python's memory model. nedbatchelder.com/blog/202002/sysg...

7) (This should be a separate item, it's incredibly important for writing Pythonic code) Unpack, don't index. In the same way as enumerate is better than range with subscripting, also here. [(top, _)] = counter.most_common(1). Much easier to read than that [0][0] sorcery.

10) Better yet, use a dedicated operator instead of relying on implementation detail. python.org/dev/peps/pep-0584/

11) should use 4) :-P. colors is really a set, right? Not only is it semantically better, it's also asymptotically much faster to search.

Patrick Loeber • Jul 31 '20

Awesome feedback, thanks! Totally agree on those points. I have to admit I did not know about this sys.getsizeof issue...and yes, colors should be a set ;) I just wanted to show a dummy example...

Vedran Čačić • Jul 31 '20 • Edited

Yeah, that's your main problem: your examples are a bit too dummy. :-) But it is so when you take the design that's important for real-life applications and try to justify it with 5 lines of code. :-]

The problem is not with getsizeof. It does the best job it can, given the memory model. The problem is that Python's objects are not boxes of well-defined edges. The question, strictly speaking, doesn't have a meaningful answer. If I say a.t = b.t = x, should x count towards the size of a or b? Or both? Or neither? :-)

(If only one: the memory is the same as after b.t = a.t = x, so it should be symmetric.
If both: there is only one x in memory. After c.t = x, the memory usage doesn't go up by size of x.
If neither: well, then almost nothing uses any memory: objects just refer to other objects, not "contain" them.)