Intersection, union and difference of Sets in Python.

Sebastian G. Vinci on February 04, 2019

I wanted to talk about sets, and four great operations they provide: Intersection: Elements two sets have in common. Union: All the elements ...

Hi Sebastian, nice article! Sets are great and "set comprehensions" are so cool. I have a tip for construction:

>>> a = {1, 2, 3}
>>> b = {2, 3, 4}
>>> a, b
({1, 2, 3}, {2, 3, 4})
>>> type(a)
<class 'set'>

You can use the set literal notation to avoid creating the intermediate list. Unfortunately for an empty set you still need the set() function:

>>> a = {}
>>> type(a)
<class 'dict'>
>>> a = set()
>>> type(a)
<class 'set'>

I'm not sure they make Python's code more readable (well, the difference operator maybe) but you can use a few operators in place of the functions you perfectly explained up here:

>>> a = {1, 2, 3}
>>> b = {2, 3, 4}
>>> a & b # intersection
{2, 3}
>>> a | b # union
{1, 2, 3, 4}
>>> a - b # difference
{1}
>>> a ^ b # symmetric difference
{1, 4}

In the spirit of the Zen of Python (explicit is better than implicit) I don't think I've ever used the operators, aside from the - for difference, which is self-explanatory. The other three remind me too much of bitwise operators in C :D
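For anyone comparing the two spellings, here is a minimal sketch of how they line up. One practical difference worth knowing: the methods accept any iterable, while the operators expect a set on both sides. The names `from_method` and `operator_accepts_list` are just for this example.

```python
a = {1, 2, 3}
b = {2, 3, 4}

# Each operator has a method counterpart that returns the same result.
assert a & b == a.intersection(b) == {2, 3}
assert a | b == a.union(b) == {1, 2, 3, 4}
assert a - b == a.difference(b) == {1}
assert a ^ b == a.symmetric_difference(b) == {1, 4}

# The method form accepts any iterable, such as a list.
from_method = a.union([3, 4, 5])

# The operator form raises TypeError when the right-hand side is not a set.
try:
    a | [3, 4, 5]
    operator_accepts_list = True
except TypeError:
    operator_accepts_list = False
```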

 

Actually, the bitwise operations are completely applicable to sets (e.g. the wiki article on XOR contains Venn diagrams). That's because the bitwise operations are applied not to the values but to their presence in the set.

cats = {...}
animals = {...}
dogs = {...}
cat_dog_nickelodeon = cats & dogs  # intuitive
other_animals = animals - (cats | dogs)  # beautiful
other_animals = animals.difference(cats.union(dogs))  # much uglier
 

That's because the bitwise operations are applied not to the values but to their presence in the set.

Thanks for reminding me about that. I still prefer the explicit version in "day to day" code though, especially because they can be used dynamically if needed!
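A rough sketch of what "used dynamically" could look like: since the named versions are ordinary methods, one can be looked up by name at runtime. The `method_name` value and the sample data are made up for illustration.

```python
import functools
import operator

sets = [{1, 2, 3}, {2, 3, 4}, {3, 4, 5}]

# The method name could come from configuration or user input (hypothetical).
method_name = "intersection"
combine = getattr(set, method_name)

# set.intersection accepts several sets at once.
result = combine(*sets)

# The operator form can also be passed around as a function via the
# standard operator module, folding over the sets pairwise.
reduced = functools.reduce(operator.and_, sets)

print(result == reduced)
```

Both approaches give the same set here; the method lookup just reads a little closer to the intent.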

 

Hi, thanks a lot for the comment.

As stated in my post, yes, we can use curly braces to avoid the intermediate collection, but I think, unless performance is an issue, we should use the constructor, as curly braces are also used for dictionaries.

Curly braces may be used too, which will avoid creating an intermediate list, but as they are also used for dictionaries, you can not create an empty set with curly braces. Also, someone can get confused, so I avoid them and stick to the constructor.

On the other hand, I did not know about those operators, so thanks a lot for bringing them up, although I agree with you, the usage of the functions is more readable.

Again, thanks for your comment.

Regards!

 

Them being bitwise operators makes them really easy to remember, IMO - after all, what's the fundamental difference between a bitmask and a set of booleans? :)

(Of course in that case difference should look like &~ and not -...)
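To make the bitmask analogy concrete, here is a small sketch that models subsets of a tiny universe as integers. `UNIVERSE`, `to_mask` and `from_mask` are hypothetical helpers written for this example, not anything from the standard library.

```python
# Sketch: represent subsets of a small, fixed universe as integer bitmasks.
UNIVERSE = ["a", "b", "c", "d"]

def to_mask(items):
    """Set bit i when UNIVERSE[i] is a member of items."""
    return sum(1 << i for i, x in enumerate(UNIVERSE) if x in items)

def from_mask(mask):
    """Recover the set of members whose bit is set in mask."""
    return {x for i, x in enumerate(UNIVERSE) if mask & (1 << i)}

left = {"a", "b", "c"}
right = {"b", "c", "d"}
m_left, m_right = to_mask(left), to_mask(right)

# The integer operators mirror the set operators one to one.
assert from_mask(m_left & m_right) == left & right
assert from_mask(m_left | m_right) == left | right
assert from_mask(m_left ^ m_right) == left ^ right

# On bitmasks, difference is spelled "and not" rather than "-".
assert from_mask(m_left & ~m_right) == left - right
```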

 

Python is a high level language though, there's no builtin concept of masking bits so, taking into account that we can't assume C knowledge by the reader, I reckon that:

a.intersection(b)

is more readable than

a & b

especially six months from now with a different programmer tasked to fix something 😆

(also remember that there are no other major contexts in Python itself where a & b means anything)

I agree that the long names are more readable, but Python does provide operators for & and | already; on integers they provide the bitwise logic, same as in C.

>>> 3 & 4
0
>>> 4 & 4
4
>>> 4 & 7
4
>>> 3 | 4
7

Also some things make use of those operators for other purposes; for example, Peewee ORM uses them for its query generator.
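As an illustration of how a library can give & and | its own meaning, here is a minimal sketch using the `__and__` and `__or__` special methods. The `Condition` class is hypothetical and is not Peewee's actual API; it only shows the general mechanism.

```python
class Condition:
    """Hypothetical query fragment, for illustration only."""

    def __init__(self, text):
        self.text = text

    def __and__(self, other):
        # Called for: self & other
        return Condition(f"({self.text} AND {other.text})")

    def __or__(self, other):
        # Called for: self | other
        return Condition(f"({self.text} OR {other.text})")


adult = Condition("age >= 18")
active = Condition("active = 1")
admin = Condition("role = 'admin'")

# Parentheses matter: & binds more tightly than | in Python.
query = (adult & active) | admin
print(query.text)
```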

Anyway, just because a language is high level doesn't mean it doesn't (and shouldn't) provide bitwise functionality. Bitwise operations are still really useful for a lot of purposes in a lot of fields and I absolutely would not discount how necessary they are.

Oh thanks fluffy, I totally forgot about those. I see them so rarely that they slipped my mind :) My bad!

Anyway, just because a language is high level doesn't mean it doesn't (and shouldn't) provide bitwise functionality. Bitwise operations are still really useful for a lot of purposes in a lot of fields and I absolutely would not discount how necessary they are.

Can't argue with that hehe
