Miguel Brito

Posted on Jul 17, 2021 • Edited on Jan 31, 2022 • Originally published at miguendes.me

The Best Way to Compare Two Dictionaries in Python

#machinelearning #tutorial #beginners #python

When I had to compare two dictionaries for the first time, I struggled―a lot!

For simple dictionaries, comparing them is usually straightforward. You can use the == operator, and it will work.

However, when you have specific needs, things become harder. The reason is, Python has no built-in feature allowing us to:

compare two dictionaries and check how many pairs are equal
assert nested dictionaries are equal (deep equality comparison)
find the difference between two dicts (dict diff)
compare dicts that have floating-point numbers as values
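
To make the first item concrete, here is a minimal sketch of counting equal pairs by hand. The helper name count_equal_pairs and the sample dicts are made up for illustration, and the check only looks at the top level of each dict.

```python
def count_equal_pairs(a: dict, b: dict) -> int:
    """Count the keys present in both dicts whose values compare equal."""
    shared_keys = a.keys() & b.keys()
    return sum(1 for key in shared_keys if a[key] == b[key])


a = {'number': 1, 'list': ['one', 'two'], 'name': 'x'}
b = {'list': ['one', 'two'], 'number': 2, 'name': 'x'}

# 'list' and 'name' match; 'number' does not.
equal_pairs = count_equal_pairs(a, b)
print(equal_pairs)
```

This works for flat dicts, but it says nothing about where nested values differ.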

In this article, I will show how you can do those operations and many more, so let’s go.

Why You Need a Robust Way to Compare Dictionaries

Let's imagine the following scenario: you have two simple dictionaries. How can we assert if they match? Easy, right?

Yeah! You could use the == operator, of course!

>>> a = {
    'number': 1,
    'list': ['one', 'two']
}
>>> b = {
    'list': ['one', 'two'],
    'number': 1
}
>>> a == b
True

That's expected, since the dictionaries are the same. But what if one value is different? The result will be False, but can we tell where they differ?

>>> a = {
    'number': 1,
    'list': ['one', 'two']
}
>>> b = {
    'list': ['one', 'two'],
    'number': 2
}
>>> a == b
False

Hmm... Just False doesn't tell us much...

What about the strings inside the list? Let's say that we want to ignore their case.

>>> a = {
    'number': 1,
    'list': ['ONE', 'two']
}
>>> b = {
    'list': ['one', 'two'],
    'number': 1
}
>>> a == b
False

Oops...

What if the number was a float and we considered two floats to be the same if they agree to 3 decimal places? Put another way, we want to check only whether the first 3 digits after the decimal point match.

>>> a = {
    'number': 1,
    'list': ['one', 'two']
}
>>> b = {
    'list': ['one', 'two'],
    'number': 1.00001
}
>>> a == b
False
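
Doing this check by hand is possible, but it takes more code than you might expect. Here is a rough sketch, assuming that "match" means the two numbers look the same once formatted to 3 decimal places. The helper shallow_equal is hypothetical and only handles flat dicts.

```python
from numbers import Real


def shallow_equal(a: dict, b: dict, digits: int = 3) -> bool:
    """Compare two flat dicts, treating numbers as equal when they
    agree after being formatted to `digits` decimal places."""
    if a.keys() != b.keys():
        return False
    for key in a:
        x, y = a[key], b[key]
        if isinstance(x, Real) and isinstance(y, Real):
            # Compare the fixed-point text form, e.g. '1.000' vs '1.000'.
            if format(x, f'.{digits}f') != format(y, f'.{digits}f'):
                return False
        elif x != y:
            return False
    return True


a = {'number': 1, 'list': ['one', 'two']}
b = {'list': ['one', 'two'], 'number': 1.00001}

close_enough = shallow_equal(a, b)
too_far = shallow_equal({'number': 1}, {'number': 1.01})
print(close_enough, too_far)
```

Nested dicts, lists of floats and other containers would each need extra handling, which is where this approach starts to hurt.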

You might also want to exclude some fields from the comparison. As an example, we might now want to leave the list key and its value out of the check. Unless we create a new dictionary without it, no built-in method does that for you.
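
The workaround of building a new dictionary without the key could look like the sketch below. The helper without_keys is my own name for illustration, not something Python provides.

```python
def without_keys(d: dict, excluded: set) -> dict:
    """Return a shallow copy of `d` that omits the excluded keys."""
    return {key: value for key, value in d.items() if key not in excluded}


a = {'number': 1, 'list': ['ONE', 'two']}
b = {'list': ['one', 'two'], 'number': 1}

# The full dicts differ because of 'ONE' vs 'one' ...
equal_as_is = a == b
# ... but they match once the 'list' key is left out.
equal_without_list = without_keys(a, {'list'}) == without_keys(b, {'list'})
print(equal_as_is, equal_without_list)
```

Note that this only removes top-level keys; excluding a field buried inside a nested dict needs more work.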

Can it get any worse?

Yes, what if a value is a numpy array?

>>> import numpy as np
>>> a = {
    'number': 1,
    'list': ['one', 'two'],
    'array': np.ones(3)
}
>>> b = {
    'list': ['one', 'two'],
    'number': 1,
    'array': np.ones(3)
}
>>> a == b
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-4-eeadcaeab874> in <module>
----> 1 a == b

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Oh no, it raises an exception right in our faces!

Damn it, what can we do then?

Using the Right Tool for the Job

Since dicts cannot perform advanced comparisons on their own, there are only two ways to get them: implement the functionality yourself or use a third-party library. At some point in your life you have probably heard the advice not to reinvent the wheel, so in this tutorial we'll follow it and use a library.

We'll adopt a library called deepdiff, from zepworks. deepdiff can pick up the difference between dictionaries, iterables, strings and other objects. It accomplishes that by searching for changes recursively.

deepdiff is not the only kid on the block; there's also Dictdiffer, developed by the folks at CERN. Dictdiffer is also cool but lacks a lot of the features that make deepdiff so interesting. In any case, I encourage you to look at both and determine which one works best for you.

deepdiff is so cool that it works not only with dictionaries but also with other iterables, strings and even custom objects. You can even mix and match, for example by taking the difference between two lists of dicts.

Getting a Simple Difference

In this section, we'll solve the first problem I showed you: finding the key whose value differs between the two dicts. Consider the same code snippet as before, but this time using deepdiff.

In [1]: from deepdiff import DeepDiff

In [2]: a = {
   ...:     'number': 1,
   ...:     'list': ['one', 'two']
   ...: }

In [3]: b = {
   ...:     'list': ['one', 'two'],
   ...:     'number': 2
   ...: }

In [4]: diff = DeepDiff(a, b)

In [5]: diff
Out[5]: {'values_changed': {"root['number']": {'new_value': 2, 'old_value': 1}}}

Awesome! It tells us that the key 'number' had the value 1 in a, but in the new dict, b, it has the value 2.

Ignoring String Case

In our second example, one element of the list was in uppercase, but we didn't care about that. We wanted to ignore case and treat "one" as equal to "ONE".

You can solve that by setting ignore_string_case=True:

In [10]: a = {
    ...:     'number': 1,
    ...:     'list': ['ONE', 'two']
    ...: }
    ...: 

In [11]: b = {
    ...:     'list': ['one', 'two'],
    ...:     'number': 1
    ...: }

In [12]: diff = DeepDiff(a, b, ignore_string_case=True)

In [13]: diff
Out[13]: {}

If we don't do that, DeepDiff tells us exactly which element differs.

In [14]: diff = DeepDiff(a, b)

In [15]: diff
Out[15]: 
{'values_changed': {"root['list'][0]": {'new_value': 'one',
   'old_value': 'ONE'}}}

Comparing Float Values

We also saw a case where we wanted two numbers to count as equal if their first 3 digits after the decimal point match. With DeepDiff, the significant_digits parameter sets the exact number of digits AFTER the decimal point to compare. Also, since floats differ from ints, we might want to ignore the type change as well. We can do that by setting ignore_numeric_type_changes=True.

In [16]: a = {
    ...:     'number': 1,
    ...:     'list': ['one', 'two']
    ...: }

In [17]: b = {
    ...:     'list': ['one', 'two'],
    ...:     'number': 1.00001
    ...: }

In [18]: diff = DeepDiff(a, b)

In [19]: diff
Out[19]: 
{'type_changes': {"root['number']": {'old_type': int,
   'new_type': float,
   'old_value': 1,
   'new_value': 1.00001}}}

In [24]: diff = DeepDiff(a, b, significant_digits=3, ignore_numeric_type_changes=True)

In [25]: diff
Out[25]: {}

Comparing `numpy` Values

When we tried comparing two dictionaries that contain a numpy array, we failed miserably. Fortunately, DeepDiff has our backs here. It supports numpy objects by default!

In [27]: import numpy as np

In [28]: a = {
    ...:     'number': 1,
    ...:     'list': ['one', 'two'],
    ...:     'array': np.ones(3)
    ...: }

In [29]: b = {
    ...:     'list': ['one', 'two'],
    ...:     'number': 1,
    ...:     'array': np.ones(3)
    ...: }

In [30]: diff = DeepDiff(a, b)

In [31]: diff
Out[31]: {}

What if the arrays are different?

No problem!

In [28]: a = {
    ...:     'number': 1,
    ...:     'list': ['one', 'two'],
    ...:     'array': np.ones(3)
    ...: }

In [32]: b = {
    ...:     'list': ['one', 'two'],
    ...:     'number': 1,
    ...:     'array': np.array([1, 2, 3])
    ...: }

In [33]: diff = DeepDiff(a, b)

In [34]: diff
Out[34]: 
{'type_changes': {"root['array']": {'old_type': numpy.float64,
   'new_type': numpy.int64,
   'old_value': array([1., 1., 1.]),
   'new_value': array([1, 2, 3])}}}

It shows that not only the values differ but also the types!

Comparing Dictionaries With `datetime` Objects

Another common use case is comparing datetime objects. This kind of object has the following signature:

class datetime.datetime(year, month, day, hour=0, minute=0, second=0, microsecond=0, tzinfo=None, *, fold=0)

If we have a dict with datetime objects, DeepDiff allows us to compare only certain parts of them. For instance, if we only care about the year, month, and day, then we can truncate the rest.

In [1]: import datetime

In [2]: from deepdiff import DeepDiff

In [3]: a = {
            'list': ['one', 'two'],
            'number': 1,
            'date': datetime.datetime(2020, 6, 17, 22, 45, 34, 513371)
        }

In [4]: b = {
            'list': ['one', 'two'],
            'number': 1,
            'date': datetime.datetime(2020, 6, 17, 12, 12, 51, 115791)
        }

In [5]: diff = DeepDiff(a, b, truncate_datetime='day')

In [6]: diff
Out[6]: {}
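
To see why the diff comes back empty, it helps to picture what truncating to the day means. The standard-library sketch below is only a conceptual illustration, not necessarily how DeepDiff implements it, and truncate_to_day is a hypothetical helper.

```python
import datetime


def truncate_to_day(value: datetime.datetime) -> datetime.datetime:
    """Zero out everything finer than the day."""
    return value.replace(hour=0, minute=0, second=0, microsecond=0)


first = datetime.datetime(2020, 6, 17, 22, 45, 34, 513371)
second = datetime.datetime(2020, 6, 17, 12, 12, 51, 115791)

# The full timestamps differ, but both fall on 2020-06-17.
same_instant = first == second
same_day = truncate_to_day(first) == truncate_to_day(second)
print(same_instant, same_day)
```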

Comparing String Values

We've looked at interesting examples so far, but it's also common to use dicts to store string values. Having a better way of contrasting them can help us a lot! In this section I'm going to show you another lovely feature, the str diff.

In [13]: from pprint import pprint

In [17]: b = {
    ...:     'number': 1,
    ...:     'text': 'hi,\n my awesome world!'
    ...: }

In [18]: a = {
    ...:     'number': 1,
    ...:     'text': 'hello, my\n dear\n world!'
    ...: }

In [20]: ddiff = DeepDiff(a, b, verbose_level=2)

In [21]: pprint(ddiff, indent=2)
{ 'values_changed': { "root['text']": { 'diff': '--- \n'
                                                '+++ \n'
                                                '@@ -1,3 +1,2 @@\n'
                                                '-hello, my\n'
                                                '- dear\n'
                                                '- world!\n'
                                                '+hi,\n'
                                                '+ my awesome world!',
                                        'new_value': 'hi,\n my awesome world!',
                                        'old_value': 'hello, my\n'
                                                     ' dear\n'
                                                     ' world!'}}}

That's nice! We can see the exact lines where the two strings differ.

Excluding Fields

In this last example, I'll show you yet another common use case, excluding a field. We might want to exclude one or more items from the comparison. For instance, using the previous example, we might want to leave out the text field.

In [17]: b = {
    ...:     'number': 1,
    ...:     'text': 'hi,\n my awesome world!'
    ...: }

In [18]: a = {
    ...:     'number': 1,
    ...:     'text': 'hello, my\n dear\n world!'
    ...: }

In [26]: ddiff = DeepDiff(a, b, verbose_level=2, exclude_paths=["root['text']"])
    ...: 

In [27]: ddiff
Out[27]: {}

If you want even more advanced exclusions, DeepDiff also allows you to pass regular expressions. Check this out: https://zepworks.com/deepdiff/current/exclude_paths.html#exclude-regex-paths.

Conclusion

That's it for today, folks! I really hope you've learned something new and useful. Comparing dicts is a common use case since they can be used to store almost any kind of data. As a result, having a proper tool to ease this effort is indispensable. DeepDiff has many features and can do reasonably advanced comparisons. If you ever need to compare dicts, go check it out.