DEV Community

Intersection, union and difference of Sets in Python.

Sebastian G. Vinci on February 04, 2019

I wanted to talk about sets, and four great operations they provide: Intersection: Elements two sets have in common. Union: All the elements fr...
 
rhymes

Hi Sebastian, nice article! Sets are great and "set comprehensions" are so cool. I have a tip for construction:

>>> a = {1, 2, 3}
>>> b = {2, 3, 4}
>>> a, b
({1, 2, 3}, {2, 3, 4})
>>> type(a)
<class 'set'>
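
A minimal sketch of the set comprehension mentioned above, which also builds the set directly without an intermediate list. The names `words` and `lengths` are made up for illustration:

```python
# Hypothetical input data, only for illustration.
words = ["apple", "kiwi", "plum", "pear", "apple"]

# Set comprehension: builds the set directly and drops duplicates.
lengths = {len(word) for word in words}

print(lengths == {4, 5})  # True: "apple" has 5 letters, the others have 4
```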

You can use the set literal notation to avoid creating the intermediate list. Unfortunately, for an empty set you still need the set() function, because {} creates an empty dict:

>>> a = {}
>>> type(a)
<class 'dict'>
>>> a = set()
>>> type(a)
<class 'set'>

I'm not sure they make Python code more readable (well, the difference operator maybe), but you can use a few operators in place of the methods you explained so well above:

>>> a = {1, 2, 3}
>>> b = {2, 3, 4}
>>> a & b # intersection
{2, 3}
>>> a | b # union
{1, 2, 3, 4}
>>> a - b # difference
{1}
>>> a ^ b # symmetric difference
{1, 4}
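
As a quick check, here is a small sketch showing that each operator matches its method counterpart. It also shows one practical difference: the methods accept any iterable, while the operators need a set on both sides. The names `from_list` and `operator_accepts_list` are only for illustration:

```python
a = {1, 2, 3}
b = {2, 3, 4}

# Each operator has a method counterpart that returns the same set.
assert a & b == a.intersection(b) == {2, 3}
assert a | b == a.union(b) == {1, 2, 3, 4}
assert a - b == a.difference(b) == {1}
assert a ^ b == a.symmetric_difference(b) == {1, 4}

# The methods accept any iterable, such as a list.
from_list = a.union([4, 5])

# The operators raise TypeError when the right-hand side is not a set.
try:
    a | [4, 5]
except TypeError:
    operator_accepts_list = False
else:
    operator_accepts_list = True

print(from_list == {1, 2, 3, 4, 5}, operator_accepts_list)  # True False
```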

In the spirit of the Zen of Python (explicit is better than implicit) I don't think I've ever used the operators, aside from the - for difference, which is self-explanatory. The other three remind me too much of bitwise operators in C :D

sur0g • Edited

Actually, the bitwise operations are completely applicable to sets (e.g. the Wikipedia XOR article contains Venn diagrams). That's because the bitwise operations are applied not to the values but to their presence in the set.

cats = {...}
animals = {...}
dogs = {...}
cat_dog_nickelodeon = cats & dogs  # intuitive
other_animals = animals - (cats | dogs)  # beautiful
other_animals = animals.difference(cats.union(dogs))  # much uglier
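
A runnable version of the same idea, as a sketch. The sample names below are hypothetical stand-ins for the `{...}` placeholders:

```python
# Hypothetical sample data standing in for the {...} placeholders.
cats = {"Tom", "Felix", "CatDog"}
dogs = {"Rex", "Pluto", "CatDog"}
animals = cats | dogs | {"Dumbo", "Nemo"}

# Elements present in both sets.
cat_dog_nickelodeon = cats & dogs

# Operator form and method form give the same result.
other_animals = animals - (cats | dogs)
same_result = animals.difference(cats.union(dogs))

# difference() also accepts several iterables at once.
also_same = animals.difference(cats, dogs)

print(cat_dog_nickelodeon == {"CatDog"})                 # True
print(other_animals == same_result == also_same)         # True
```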
 
rhymes

That's because the bitwise operations are applied not to the values but to their presence in the set.

Thanks for reminding me about that. I still prefer the explicit version in "day to day" code though, especially because the methods can be used dynamically if needed!
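
One possible reading of "used dynamically" (an assumption, since the comment doesn't spell it out) is picking the method by name at runtime. A minimal sketch, where the helper `combine` is hypothetical:

```python
def combine(left, right, operation):
    """Apply the set method named by `operation` (e.g. "union") to two sets."""
    allowed = {"intersection", "union", "difference", "symmetric_difference"}
    if operation not in allowed:
        raise ValueError(f"unsupported operation: {operation!r}")
    # Look the method up by name, then call it with the other set.
    return getattr(left, operation)(right)


a = {1, 2, 3}
b = {2, 3, 4}

result = combine(a, b, "symmetric_difference")
print(result == {1, 4})  # True
```

The operators can be passed around in a similar way through the standard `operator` module (`operator.and_`, `operator.or_`, `operator.sub`, `operator.xor`), so this is arguably more a matter of taste than a hard limitation.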

fluffy • Edited

Them being bitwise operators makes them really easy to remember, IMO - after all, what's the fundamental difference between a bitmask and a set of booleans? :)

(Of course in that case difference should look like &~ and not -...)
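
To make the bitmask analogy concrete, here is a rough sketch that encodes a set of small non-negative integers as an int (bit i is set when i is in the set). The helpers `to_mask` and `from_mask` are made up for this illustration:

```python
def to_mask(items):
    """Encode a set of small non-negative ints as a bitmask."""
    mask = 0
    for i in items:
        mask |= 1 << i
    return mask


def from_mask(mask):
    """Decode a non-negative bitmask back into a set of bit positions."""
    return {i for i in range(mask.bit_length()) if (mask >> i) & 1}


a, b = {1, 2, 3}, {2, 3, 4}
ma, mb = to_mask(a), to_mask(b)

# The same operator means the same thing on both representations...
assert from_mask(ma & mb) == a & b
assert from_mask(ma | mb) == a | b
assert from_mask(ma ^ mb) == a ^ b
# ...except difference, which is "and not" on bitmasks but "-" on sets.
assert from_mask(ma & ~mb) == a - b
```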

rhymes

Python is a high-level language though. There's no built-in concept of masking bits so, taking into account that we can't assume C knowledge by the reader, I reckon that:

a.intersection(b)

is more readable than

a & b

especially six months from now with a different programmer tasked to fix something 😆

(also remember that there are no other major contexts in Python itself where a & b means anything)

fluffy

I agree that the long names are more readable, but Python already provides the & and | operators; on integers they perform bitwise logic, same as in C.

>>> 3 & 4
0
>>> 4 & 4
4
>>> 4 & 7
4
>>> 3 | 4
7

Also some things make use of those operators for other purposes; for example, Peewee ORM uses them for its query generator.
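
For a feel of how a library can give & and | its own meaning, here is a toy sketch using the `__and__` and `__or__` special methods. The `Condition` class is entirely hypothetical and is not Peewee's API:

```python
class Condition:
    """Hypothetical query condition; only illustrates overloading & and |."""

    def __init__(self, text):
        self.text = text

    def __and__(self, other):
        # Called for: self & other
        return Condition(f"({self.text} AND {other.text})")

    def __or__(self, other):
        # Called for: self | other
        return Condition(f"({self.text} OR {other.text})")


is_adult = Condition("age >= 18")
is_admin = Condition("role = 'admin'")
is_active = Condition("active = 1")

query = (is_adult | is_admin) & is_active
print(query.text)  # ((age >= 18 OR role = 'admin') AND active = 1)
```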

Anyway, just because a language is high level doesn't mean it doesn't (and shouldn't) provide bitwise functionality. Bitwise operations are still really useful for a lot of purposes in a lot of fields and I absolutely would not discount how necessary they are.

rhymes

Oh thanks fluffy, I totally forgot about those. I see them so rarely that I probably forgot :) My bad!

Anyway, just because a language is high level doesn't mean it doesn't (and shouldn't) provide bitwise functionality. Bitwise operations are still really useful for a lot of purposes in a lot of fields and I absolutely would not discount how necessary they are.

Can't argue with that hehe

Sebastian G. Vinci

Hi, thanks a lot for the comment.

As stated in my post, yes, we can use curly braces to avoid the intermediate collection, but I think, unless performance is an issue, we should use the constructor, as curly braces are also used for dictionaries.

Curly braces may be used too, which will avoid creating an intermediate list, but as they are also used for dictionaries, you cannot create an empty set with curly braces. Also, someone can get confused, so I avoid them and stick to the constructor.

On the other hand, I did not know about those operators, so thanks a lot for bringing them up, although I agree with you that the methods are more readable.

Again, thanks for your comment.

Regards!