Occasionally, I find myself solving simple Python quizzes when I'm bored. Nothing special, just a way to keep myself in shape. It is also a good way to learn something new: Python evolves, and I'd like to use new features wherever they apply.
But this time, something went wrong.
The quiz was quite simple: what is the return value of the expression {i: i**2 for i in range(3)}.setdefault(2, 10)? I had never used setdefault() before, so it was a bit interesting: a tiny piece of research and a mind exercise.
I started by working out what data I was dealing with. The {i: i**2 for i in range(3)} part produces a dict; it is an example of a dict comprehension. Nothing complex. But what is setdefault() for? I am familiar with the .get(key, <default_value>) method, which I like to use. This beautiful approach returns either the value stored behind the key or <default_value> if the key is missing. But the question should be tricky, shouldn't it? So, I tried to guess the return value.
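To make the starting point concrete, here is a minimal sketch of what the comprehension builds and how .get() behaves on it. The variable names (squares, present, missing) are mine, chosen only for illustration.

```python
# The dict comprehension from the quiz builds a regular dict.
squares = {i: i**2 for i in range(3)}
print(squares)  # {0: 0, 1: 1, 2: 4}

# .get() returns the stored value, or the fallback when the key is missing.
present = squares.get(2, 10)  # key 2 exists
missing = squares.get(5, 10)  # key 5 does not exist
print(present, missing)  # 4 10

# .get() never modifies the dict: key 5 is still absent afterwards.
print(5 in squares)  # False
```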
It was a "semi-blind" guess from the list of possible options: 2 / 10 / 4 / 1. My thinking was: "We have the dict {0: 0, 1: 1, 2: 4}. 2 and 10 are the arguments of the method, so they are probably not the correct answer. Neither is 1". So I picked 4, and that was the correct one.
But I still had no idea why. The Python documentation sheds light on this:
If key is in the dictionary, return its value. If not, insert key with a value of default and return default. default defaults to None.
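The quoted behaviour can be checked with a short sketch. It covers the three cases from the documentation: an existing key, a missing key with a default, and a missing key without one. The variable names are hypothetical.

```python
squares = {i: i**2 for i in range(3)}

# Key 2 exists: the stored value is returned and the default is ignored.
existing = squares.setdefault(2, 10)

# Key 5 is missing: it is inserted with the default, which is also returned.
inserted = squares.setdefault(5, 10)

# No default given: a missing key is inserted with None.
no_default = squares.setdefault(7)

print(existing, inserted, no_default)  # 4 10 None
print(squares)  # {0: 0, 1: 1, 2: 4, 5: 10, 7: None}
```

The first call is the quiz itself, which is why the answer is 4 and not 10.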
That left me even more confused. When I was taught programming, my teacher told me that a method should be self-descriptive. I apply this rule across all my development work because I believe it helps future developers (including myself).
And here I got stuck.
The "set" part is clear: it sets the value for the dict's key. The part that returns the value confuses me the most. OK, let's assume a developer needs to know the value behind the key. That's what the .get() method is for. But setdefault() doesn't make the code easier to understand, because nothing in its name suggests that it returns a value. I would expect to see another, more self-descriptive method name.
On the other hand, maybe I'm just grumbling, and people are used to it? While others use AI to write code, I use it to check existing code for suggestions, so I asked, and Google's Gemini generously gave me this snippet:
items_by_color = {}
items = [('duck', 'purple'), ('water bottle', 'purple'), ('uni-duck', 'pink')]
for item_name, item_color in items:
    # Get the list for the colour, or set it to a new empty list and get that
    items_list = items_by_color.setdefault(item_color, [])
    items_list.append(item_name)
print(items_by_color)
# Output: {'purple': ['duck', 'water bottle'], 'pink': ['uni-duck']}
Ahhh... "So you can write code faster! And there is no need to use if .. else statements anymore for such an implementation". This was nice. But my curiosity whispered, "What is the price of this optimisation?" My guess was simple: if a method may both insert a key and return a value, there should be a slight overhead.
To check this, I created a snippet that included the code with setdefault() call and with if .. else:
def test_no_set_default(sequence: dict) -> dict:
    """Group the keys of the incoming dict by their values, using if..else."""
    reversed = {}
    for k, v in sequence.items():
        if v not in reversed:
            reversed[v] = [k]
        else:
            reversed[v].append(k)
    return reversed

def test_set_default(sequence: dict) -> dict:
    """Group the keys of the incoming dict by their values, using setdefault()."""
    reversed = {}
    for k, v in sequence.items():
        # Ensure the key exists, then look it up again to append
        reversed.setdefault(v, [])
        reversed[v].append(k)
    return reversed
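One detail worth noting: test_set_default() ignores the value that setdefault() returns and looks the key up a second time. Here is a hedged sketch of two alternative spellings of the same grouping: one appends directly to the returned list, the other uses collections.defaultdict from the standard library. The function names are hypothetical, and I make no claim about how either would perform.

```python
from collections import defaultdict

def invert_with_return_value(sequence: dict) -> dict:
    """Group keys by value, appending to the list that setdefault() returns."""
    grouped = {}
    for k, v in sequence.items():
        grouped.setdefault(v, []).append(k)
    return grouped

def invert_with_defaultdict(sequence: dict) -> dict:
    """Group keys by value, letting defaultdict create the missing lists."""
    grouped = defaultdict(list)
    for k, v in sequence.items():
        grouped[v].append(k)
    return dict(grouped)

sample = {"duck": "purple", "water bottle": "purple", "uni-duck": "pink"}
print(invert_with_return_value(sample))
# {'purple': ['duck', 'water bottle'], 'pink': ['uni-duck']}
print(invert_with_defaultdict(sample) == invert_with_return_value(sample))  # True
```

These two variants are only an illustration; they are not part of the measurement described in this post.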
Both test functions, test_no_set_default() and test_set_default(), received the same dictionary, which I filled with fake data:
from faker import Faker

def make_dict(items: int = 1000) -> dict:
    """Create a dictionary of random values"""
    fake = Faker()
    # Keys are fake names, values are fake colour names
    d = {fake.name(): fake.color_name() for _ in range(items)}
    return d
Everything was wrapped into a separate function so that I could run it:
from timeit import timeit

def test_calls(times=1_000_000):
    setup_dict = make_dict()
    t1 = timeit(lambda: test_no_set_default(setup_dict), number=times)
    t2 = timeit(lambda: test_set_default(setup_dict), number=times)
    print(f"no setdefault: {t1}")
    print(f"with setdefault: {t2}")
The result? Here it is:
$ uv run python3 set_default_comparison.py --times 1000000
No setdefault: 100.65228875500907
With setdefault: 136.7275900120003
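The relative overhead follows directly from the two timings printed above. This is plain arithmetic on those reported numbers, not a new measurement; the variable names are mine.

```python
# Timings in seconds, copied from the run above.
t_no_setdefault = 100.65228875500907
t_with_setdefault = 136.7275900120003

# Extra time spent by the setdefault() version, relative to if..else.
overhead = (t_with_setdefault - t_no_setdefault) / t_no_setdefault
print(f"relative overhead: {overhead:.1%}")  # relative overhead: 35.8%
```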
About 36% overhead... Well, not something I would like to use in production. And given the ugly method name, I doubt it should be used widely. Anyway, try to convince me I'm wrong.