Posted on Feb 20, 2020

Customize your own dictionary - Python Tips

#python

TL;DR

Subclass collections.UserDict, not dict.

Longer Description

I need a dictionary-like object that I pass to another function. When the function returns, I need to know whether the dictionary has changed (a key or a value), and if it has, I save it to the database. (If this looks like saving sessions on the server side, that is exactly what it is.) Since this happens frequently, I want the check to be as fast as possible.

The first idea that came to mind looks like this:

session_dict = {}
old_dict = {**session_dict}
# now call function
func(session_dict)
if session_dict != old_dict:
    pass  # change happened, save to the database here

But this method requires copying the dictionary, which costs extra memory, and then comparing two dictionaries. The comparison has to walk the keys and values, recursing into nested containers, so it costs CPU time that grows with the size of the session.

The second idea is to use pickle and compare the serialized bytes:
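A small runnable sketch of the copy-and-compare approach may make the trade-off concrete. Here `func` is a hypothetical stand-in for the code that receives the session. The nested-list case is an added caveat rather than something from my original use case: `{**session_dict}` is a shallow copy, so in-place changes to mutable values are shared with the copy and the comparison misses them. `copy.deepcopy` is one way around that, at even higher cost.

```python
import copy

def func(session):
    # Hypothetical handler: mutates a nested list in place.
    session["cart"].append("book")

session_dict = {"cart": []}
shallow = {**session_dict}           # copies the keys, shares the list object
deep = copy.deepcopy(session_dict)   # copies the nested list as well

func(session_dict)

shallow_detected = session_dict != shallow  # both dicts see the same list
deep_detected = session_dict != deep        # deep copy kept the old empty list
print(shallow_detected, deep_detected)      # False True
```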

import pickle
session_dict = {}
finger_print = pickle.dumps(session_dict)
func(session_dict)
if pickle.dumps(session_dict) != finger_print:
    pass  # change happened, save to the database here

This one uses pickle.dumps to serialize the dictionary to bytes before the function call, then serializes it again afterwards and compares the two results. Serializing takes time that grows with the size of the dictionary, so this may cost even more CPU time.

The third idea is to subclass dict and set a change flag whenever __setitem__ or __delitem__ is called (you may also want to see my previous post on subclassing dict in Python to support dot-syntax access):
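As a rough sketch of the fingerprint idea, the snippet below takes a snapshot before the call and compares afterwards. `func` is again a hypothetical handler. The second half illustrates a side effect that is my own addition: pickled bytes follow key insertion order, so two dictionaries that compare equal can still produce different fingerprints, which means this approach can report false positives.

```python
import pickle

def func(session):
    # Hypothetical handler that modifies the session.
    session["name"] = "bo"

session_dict = {}
fingerprint = pickle.dumps(session_dict)  # bytes snapshot taken before the call
func(session_dict)
pickle_detected = pickle.dumps(session_dict) != fingerprint
print(pickle_detected)  # True

# Equal dictionaries, different insertion order, different pickled bytes.
a = {"x": 1, "y": 2}
b = {"y": 2, "x": 1}
order_sensitive = (a == b) and (pickle.dumps(a) != pickle.dumps(b))
print(order_sensitive)  # True
```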

class ChangeDetectableDict(dict):
    def __init__(self, val=None):
        if val is None:
            val = {}
        super().__init__(val)
        self.changed = False

    def __setitem__(self, item, value):
        super().__setitem__(item, value)
        self.changed = True

    def __delitem__(self, item):
        super().__delitem__(item)
        self.changed = True

This works fine when we set a key:

session_dict = ChangeDetectableDict()
session_dict["name"] = "bo"
print(session_dict.changed) # True

Or delete a key:

session_dict = ChangeDetectableDict({"name": "bo"})
del session_dict["name"]
print(session_dict.changed) # True

This makes change detection O(1), which is great. But you may have spotted a problem: if I set a key to the value it already has, changed is still marked as True:

session_dict = ChangeDetectableDict({"name": "bo"})
session_dict["name"] = "bo"
print(session_dict.changed) # True

In my case that is fine: as long as every real change marks changed as True, I don't mind a few false positives. The real problem is that if I delete a key with pop instead of del, the __delitem__ method is not called, so changed stays False:

session_dict = ChangeDetectableDict({"name": "bo"})
session_dict.pop("name", None)
print(session_dict.changed) # False

This is unacceptable for my use case. It happens because the built-in dict is implemented in C, and methods such as pop work on the underlying storage directly instead of calling __delitem__, so an override in a subclass is bypassed.

So the fourth method I found is to subclass collections.UserDict instead of dict:
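The same bypass applies to other built-in methods, not only pop. The sketch below repeats the dict subclass so that it runs on its own, then tries several mutating methods. The keys and values are made up for illustration.

```python
class ChangeDetectableDict(dict):
    def __init__(self, val=None):
        if val is None:
            val = {}
        super().__init__(val)
        self.changed = False

    def __setitem__(self, item, value):
        super().__setitem__(item, value)
        self.changed = True

    def __delitem__(self, item):
        super().__delitem__(item)
        self.changed = True

d = ChangeDetectableDict({"name": "bo"})
d.update(age=30)               # adds a key without going through __setitem__
d.setdefault("city", "Paris")  # same here
d.pop("name")                  # removes a key without going through __delitem__
d.clear()                      # empties the dict, still no flag
missed_everything = not d.changed
print(missed_everything)  # True: four real changes, none detected
```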

from collections import UserDict

class ChangeDetectableDict(UserDict):
    def __init__(self, val=None):
        if val is None:
            val = {}
        super().__init__(val)
        self.changed = False

    def __setitem__(self, item, value):
        super().__setitem__(item, value)
        self.changed = True

    def __delitem__(self, item):
        super().__delitem__(item)
        self.changed = True

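Now it doesn't matter whether we use del or pop: the __delitem__ method is always called, so changed is marked as True. UserDict inherits pop from collections.abc.MutableMapping, which implements it in terms of __getitem__ and __delitem__.

Back to the beginning: now I can write code like this: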
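To check this beyond pop, here is a small sketch that applies several mutating methods to a fresh instance each time and records the flag. `changed_after` is a hypothetical helper written only for this demonstration, and the sample keys are made up.

```python
from collections import UserDict

class ChangeDetectableDict(UserDict):
    def __init__(self, val=None):
        if val is None:
            val = {}
        super().__init__(val)
        self.changed = False  # reset after the initial fill

    def __setitem__(self, item, value):
        super().__setitem__(item, value)
        self.changed = True

    def __delitem__(self, item):
        super().__delitem__(item)
        self.changed = True

def changed_after(operation):
    # Hypothetical helper: run one operation on a fresh dict, return the flag.
    d = ChangeDetectableDict({"name": "bo"})
    operation(d)
    return d.changed

results = {
    "pop": changed_after(lambda d: d.pop("name", None)),
    "popitem": changed_after(lambda d: d.popitem()),
    "clear": changed_after(lambda d: d.clear()),
    "update": changed_after(lambda d: d.update(age=30)),
    "setdefault": changed_after(lambda d: d.setdefault("city", "Paris")),
    "get": changed_after(lambda d: d.get("name")),  # read only
}
print(results)  # every entry is True except "get", which is False
```

These methods come from the MutableMapping mixin and are written on top of __setitem__ and __delitem__, which is why the two overrides are enough.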

session_dict = ChangeDetectableDict()
func(session_dict)
if session_dict.changed:
    pass  # change happened, save to the database here
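For completeness, here is an end-to-end sketch of that flow. `save_session`, `login`, and `read_only` are hypothetical names, and the `saved` list only stands in for a real database.

```python
from collections import UserDict

class ChangeDetectableDict(UserDict):
    def __init__(self, val=None):
        if val is None:
            val = {}
        super().__init__(val)
        self.changed = False

    def __setitem__(self, item, value):
        super().__setitem__(item, value)
        self.changed = True

    def __delitem__(self, item):
        super().__delitem__(item)
        self.changed = True

saved = []  # stands in for the database

def save_session(session):
    # Hypothetical persistence helper; a real one would write to a database.
    saved.append(dict(session))

def login(session):
    # Hypothetical handler that writes to the session.
    session["user"] = "bo"

def read_only(session):
    # Hypothetical handler that only reads from the session.
    session.get("user")

for handler in (login, read_only):
    session_dict = ChangeDetectableDict()
    handler(session_dict)
    if session_dict.changed:
        save_session(session_dict)

print(saved)  # [{'user': 'bo'}]: only the handler that wrote triggered a save
```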
