When working with dictionaries in Python, there are two common approaches: using a built-in dictionary or using a defaultdict from the collections module. While both options allow you to store and retrieve data in a key-value format, there are some key differences that make defaultdict the better option in many cases.
In this blog post, we will explore these differences and provide examples of how defaultdict can be a more efficient and effective choice for working with dictionaries in Python.
Here is why you should use defaultdict:
- Cleaner Code
- Avoiding key errors
- Custom Default Values
- Faster Execution
Cleaner Code
One of the primary benefits of using a defaultdict is that it simplifies your code. When you're using a regular dict, you need to manually check whether a key exists in the dictionary before trying to update it.
For example, consider the following code:
my_dict = {}
if 'apple' in my_dict:
    my_dict['apple'] += 1
else:
    my_dict['apple'] = 1
In this code, we check if the key 'apple' exists in my_dict and increment its value if it does. Otherwise, we set its value to 1.
The same code can be written using a defaultdict as follows:
from collections import defaultdict
my_dict = defaultdict(int)
my_dict['apple'] += 1
In this code, we simply access the 'apple' key in my_dict and increment its value. If the key doesn't exist, the defaultdict will create it and initialize its value to the default value of 0.
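The same pattern scales to counting many items. The following is a minimal sketch, assuming a small hypothetical list called words that is not part of the original example. It runs the manual check and the defaultdict version side by side:

```python
from collections import defaultdict

# Hypothetical sample data, chosen only for illustration.
words = ["apple", "pear", "apple", "fig", "pear", "apple"]

# Regular dict: every update needs an explicit membership check.
manual_counts = {}
for word in words:
    if word in manual_counts:
        manual_counts[word] += 1
    else:
        manual_counts[word] = 1

# defaultdict(int): a missing key starts at int() == 0, so += 1 just works.
auto_counts = defaultdict(int)
for word in words:
    auto_counts[word] += 1

print(manual_counts)
print(dict(auto_counts))
```

Both loops produce the same counts; the defaultdict version simply drops the if/else branch.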
Avoiding key errors
Another benefit of using a defaultdict is that it avoids the KeyError exceptions that can occur when accessing nonexistent keys in a regular dictionary.
For example, consider the following code:
my_dict = {}
my_dict['apple'] += 1
In this code, we're trying to increment the value of the 'apple' key in my_dict. However, since the key doesn't exist, this code will raise a KeyError exception.
The same code using a defaultdict would work without raising an exception:
from collections import defaultdict
my_dict = defaultdict(int)
my_dict['apple'] += 1
In this code, if the key 'apple' doesn't exist in my_dict, the defaultdict will create it and initialize its value to 0. Then, we can safely increment its value without worrying about raising a KeyError exception.
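One detail worth knowing is that looking up a missing key with square brackets on a defaultdict also stores that key. The sketch below uses illustrative names (stock and the variables around it are assumptions, not from the original post) to show this side effect, and that a membership test or .get() does not trigger it:

```python
from collections import defaultdict

stock = defaultdict(int)
stock["apple"] += 1

# Membership tests and .get() do not call the default factory.
has_pear = "pear" in stock
pear_via_get = stock.get("pear")
keys_before = sorted(stock)

# Indexing a missing key calls int() and stores the result under that key.
pear_via_index = stock["pear"]
keys_after = sorted(stock)

print(has_pear, pear_via_get, keys_before)
print(pear_via_index, keys_after)
```

If a missing key would indicate a bug, such as a misspelled key, the KeyError from a regular dict may be the more helpful behavior.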
Custom Default Values
With a regular dictionary, you can emulate a default value by checking whether the key exists and, if it does not, setting a value for that key yourself. Here's an example:
my_dict = {}
key = 'foo'
if key not in my_dict:
    my_dict[key] = 0
print(my_dict[key]) # 0
In this example, we create an empty dictionary my_dict. We then check whether the key 'foo' exists in the dictionary. If the key does not exist, we set a default value of 0 for that key. We then print the value associated with the key 'foo'.
The defaultdict class simplifies this by supplying a default value for any nonexistent key. When creating a defaultdict, you pass a default factory as the argument: a callable that takes no arguments and returns the default value. Here's an example:
from collections import defaultdict
def default_value():
    return 'default'
my_dict = defaultdict(default_value)
key = 'foo'
print(my_dict[key]) # default
In this example, we define a function default_value that returns the string 'default'. We then create a defaultdict my_dict with default_value as the default factory. We then try to access the key 'foo'. Since the key does not exist in the dictionary, the defaultdict calls default_value, stores the result under 'foo', and returns 'default'.
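The argument passed to defaultdict can be any callable that takes no arguments. As a sketch with made-up data (the pets list and the variable names below are assumptions for illustration), list groups values by key and a lambda supplies a constant:

```python
from collections import defaultdict

# Hypothetical (owner, pet) pairs.
pets = [("ann", "cat"), ("bob", "dog"), ("ann", "fish")]

# list() returns a new empty list for each missing key.
pets_by_owner = defaultdict(list)
for owner, pet in pets:
    pets_by_owner[owner].append(pet)

# A lambda lets the default be any constant value.
status = defaultdict(lambda: "unknown")
status["ann"] = "active"

print(dict(pets_by_owner))
print(status["ann"], status["bob"])
```

Because the factory is called once per missing key, each owner gets its own separate list rather than a shared one.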
Faster Execution
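A defaultdict can be faster than the built-in dictionary in certain situations, particularly when many new keys are added inside a loop. It does not pre-populate any keys when it is created. Instead, the first lookup of a missing key calls the default factory and stores the result, which removes the explicit existence check we would otherwise write in Python. The following benchmark gives a rough comparison when building a large dictionary. Note that it only assigns to keys, so it never triggers the default factory, and its timings should be read as indicative at best.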
import time
from collections import defaultdict
# Using the built-in dictionary
start_time = time.perf_counter()
my_dict = {}
for i in range(10000000):
    my_dict[i] = i
print("Built-in dictionary took", time.perf_counter() - start_time, "seconds.")
# Using defaultdict
start_time = time.perf_counter()
my_dict = defaultdict(int)
for i in range(10000000):
    my_dict[i] = i
print("Defaultdict took", time.perf_counter() - start_time, "seconds.")
One run of this code printed the following; exact timings vary by machine and from run to run:
Built-in dictionary took 1.4013495445251465 seconds.
Defaultdict took 0.9122903347015381 seconds.
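A benchmark that actually hits missing keys gives a fairer comparison. The sketch below is an illustration under stated assumptions, not the original measurement: the helper names count_with_dict and count_with_defaultdict, the data size, and the repeat count are all made up here. It uses timeit from the standard library and shows no expected timings, because results depend on the machine and the Python version.

```python
import timeit
from collections import defaultdict

# Hypothetical workload: 100,000 values spread over 1,000 distinct keys.
data = [i % 1000 for i in range(100_000)]

def count_with_dict(items):
    """Count items using a regular dict and an explicit membership check."""
    counts = {}
    for item in items:
        if item in counts:
            counts[item] += 1
        else:
            counts[item] = 1
    return counts

def count_with_defaultdict(items):
    """Count items using defaultdict(int), which starts missing keys at 0."""
    counts = defaultdict(int)
    for item in items:
        counts[item] += 1
    return counts

# Both approaches must produce the same counts before timing means anything.
same_result = count_with_dict(data) == count_with_defaultdict(data)

dict_seconds = timeit.timeit(lambda: count_with_dict(data), number=10)
defaultdict_seconds = timeit.timeit(lambda: count_with_defaultdict(data), number=10)

print("results match:", same_result)
print(f"dict with check: {dict_seconds:.3f} s")
print(f"defaultdict:     {defaultdict_seconds:.3f} s")
```

Running it a few times and comparing the two timings is more informative than any single run.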
In Conclusion
It's often good practice to use defaultdict over the built-in dictionary in Python when missing keys should receive a default value, because it simplifies code, supplies default values for nonexistent keys, and can be faster in certain situations. This can make our code more readable and less error-prone.
Top comments (1)
I am not sure about this. I would say that it depends a lot on the specific case. If the key is supposed to be in the dictionary, but it isn't (maybe because of a typo), in my opinion it is better to have an exception raised rather than having the program continue as normal.
This reminds me of old FORTRAN behavior: since you were not required to declare your variables, a typo could create a second, undesired variable and cause errors that were quite tricky to hunt down. A typical case was when the line was too long (FORTRAN compilers silently discarded everything after column 72; columns 73-80 were usually used to put a sequential number on punched cards).