Dylan Anthony

Posted on

# You Should Use Python Sets

This post contains references to Big O notation1, if you aren't familiar with it (or don't remember how it works) now is a good time for a refresher.

There are a ton of data structures available in the standard Python library, most of which go underutilized. Today I'd like to focus on one type in particular you should probably be using more than you are: the `set`. You may be already aware of the general concept of sets in computer science2, the Python set3 is exactly that.

## What is it?

For those of you who don't already know, a `set` is an unordered collection of hashable objects which cannot contain any duplicates. The hashable part is key here, much like the keys in a `dict`4 (A.K.A. associative array 5), each entry is hashed to create an index into the underlying data structure. You can think of a `set` much like a `dict` without any values.

## When to use it?

A `set` is an extremely useful alternative to the more popular `list` in several scenarios:

1. Looking for membership (e.g. the `in` operator). The lookup time for a `set` is O(1) whereas for a list it's O(n)6.
2. If you'll be removing members, same story as `in` here regarding complexity.
3. If you need to compare, contrast, or combine collections - the `set` has operators for all of these.
4. If you cannot have any duplicate values in the collection.

## When not to use it?

1. If you need to have more than one copy of the same value in the collection.
2. If the order of the objects matters
3. If the objects in your collection are not hashable7

## How to use it?

``````set_with_values = {'a', 'b', 'c'}
other_set = set()  # Note {} would be a dict

assert 'a' in set_with_values  # Super speedy operation

assert other_set < set_with_values  # This means other_set is a subset of set_with_values

assert set_with_values > other_set  # Super set

# Set defines a bunch of useful operators
assert set_with_values | other_set == {'a', 'b', 'c', 'd'}
assert set_with_values & other_set == {'c'}
assert set_with_values - other_set == {'a', 'b'}
assert set_with_values ^ other_set == {'a', 'b', 'd'}

set_with_values |= other_set  # You can perform updates with all of those operators
assert set_with_values == {'a', 'b', 'c', 'd'}  # Note only 1 copy of 'c'

set_with_values.remove('d')  # Again, super fast compared to lists
``````

## Conclusion

If you aren't already using sets all over your code, you probably should be. Basically any time you're using `in` with a list or `del` with a list, you should consider using a different structure.