One of the first data structures discussed in Sedgewick's Algorithms is the Bag. A Bag is a data structure, which holds an unordered collection. We say unordered because it is in contrast to ordered collections like conventional arrays. Because we are not concerned with order, we can store the elements in any way we want.

This article will discuss three different implementations of Bag, each using a different internal data structure. By using different internal data structures, a clearer picture is created on the value of these interfaces, and can also help identify time complexity trade-offs.

## The Bag Interface

The image below describes a Bag interface in Java:

It describes five operations:

- The constructor
`Bag()`

takes zero arguments - The method
`add(item)`

inserts an item into the Bag. - The method
`isEmpty()`

tells us if the Bag is empty. - The method
`size()`

tells us the size of the Bag. - The interface
`Iterable<Item>`

in Java allows the use of the`for .. in ..`

loop.

##
Implementing `Iterable`

in Python

Unlike Java, Python does not have a formal concept of a typed interface. However, it does have facilities to allow an object to be iterable, much like Java, using the `__iter__`

and `_next__`

. The following is an example of how these methods work in Python:

```
class DoubleYourNumber:
def __init__(self, start=0, stop=10):
self.start = start
self.stop = stop
def __iter__(self):
self.current = self.start
return self
def __next__(self):
if self.current <= self.stop:
value = self.current * 2
self.current += 1
return value
else:
raise StopIteration
```

The above is an example of a class that supports the Python Iterator interface. Below is an example of how this class could be used to iterate:

```
>>> for value in DoubleYourNumber(stop=5):
... print(value)
...
0
2
4
6
8
10
```

You can get the same effect by using `iter`

and `next`

directly on the iterator object:

```
>>> doubler = DoubleYourNumber(stop=5)
>>> iter_doubler = iter(doubler)
>>> next(iter_doubler)
0
>>> next(iter_doubler)
2
>>> next(iter_doubler)
4
>>> next(iter_doubler)
6
>>> next(iter_doubler)
8
>>> next(iter_doubler)
10
>>> next(iter_doubler)
Traceback (most recent call last):
...
StopIteration
```

In a way, you can think of iterators as following this `while`

loop pattern:

```
iter_object = iter(object)
while True:
try:
value = next(iter_object)
# Do something with `value`
except StopIteration:
break # Leave the infinite while loop
```

We'll be using this pattern in our implementations of Bag.

## Unit Testing a Bag

Since we don't have a formal interface, we can instead rely on unit testing to confirm the implementation is useful for our needs. Below is a set of tests that we can use to verify our implementation.

```
import unittest
from bag_implementation import Bag
class BagUnitTest(unittest.TestCase):
def test_new_bag_is_empty(self):
"""
A Bag should initially be empty.
"""
bag = Bag()
self.assertEqual(bag.is_empty(), True)
def test_add_increases_size_by_one(self):
"""
Calling bag.add(...) should increase it's size by one.
"""
bag = Bag()
bag.add(1)
self.assertEqual(bag.size(), 1)
def test_bag_can_be_iterated_over(self):
"""
Bag uses the iterable Python interface.
"""
bag = Bag()
for i in [1,2,3]:
bag.add(i)
sum = 0
for i in bag:
sum += i
self.assertEqual(sum, 6)
```

##
Creating a Bag Using the Built-In `list`

The first implementation of Bag is mostly a wrapper around `list`

. Since `list`

is an ordered collection, if we drop the ordered constraint `list`

can be used as an unordered collection. Below is a possible implementation using `list`

:

```
class BagWithList(object):
def __init__(self):
self.collection = list()
def add(self, item):
self.collection.append(item)
def size(self):
return len(self.collection)
def is_empty(self):
return len(self.collection) == 0
def __iter__(self):
return iter(self.collection)
```

The official Python wiki has a page on the time complexity of some of the built-in data structures. In particular, the `list`

operations has the following big-O in the average case:

Version | Operation | Big-O |
---|---|---|

`list` |
Append | O(1) |

`list` |
Get Length | O(1) |

`list` |
Iteration | O(n) |

Also this data structure uses O(n) space.

## Using a Linked List

By default Python does not come with a version of a linked list as a native data structure. Causing us to write our own below, with a custom iterator.

```
class LinkedListNode(object):
def __init__(self, value):
self.value = value
self.next_node = None
def add(self, value):
if self.next_node == None:
self.next_node = LinkedListNode(value)
else:
self.next_node.add(value)
def size(self):
if self.next_node == None:
return 1
else:
return 1 + self.next_node.size()
def __iter__(self):
return LinkedListIter(self)
class LinkedListIter(object):
def __init__(self, current_node):
self.current_node = current_node
def __next__(self):
if self.current_node == None:
raise StopIteration
else:
value = self.current_node.value
self.current_node = self.current_node.next_node
return value
```

We could use this implementation of a Linked List in our implementation of a Bag.

```
class BagWithLinkedList(object):
def __init__(self):
self.linked_list = None
def add(self, value):
if self.linked_list == None:
self.linked_list = LinkedListNode(value)
else:
self.linked_list.add(value)
def size(self):
if self.linked_list == None:
return 0
else:
return self.linked_list.size()
def is_empty(self):
return self.linked_list == None
def __iter__(self):
if self.linked_list == None:
return iter([])
else:
return iter(self.linked_list)
```

The time complexity of these methods are not as attractive as the previous. There are other use cases for linked lists that show how to use it effectively. Despite this, it still uses O(n) space complexity, as the `list`

version did.

Version | Operation | Big-O |
---|---|---|

Linked List | Append | O(n) |

Linked List | Get Length | O(n) |

Linked List | Iteration | O(n) |

##
Using `defaultdict`

from `collections`

as a Multiset

A set is a commonly known mathematical object. A set is only concerned with membership, or whether an element belongs to collection or not. Like bags, sets are unordered. However, unlike bags, they only allow an element to appear once. If we relax this constraint, and keep track of the number of times an element appears, then we have a multiset.

Multisets appear in C++ as a built-in data structure. However, Python does not have anything named as a multiset. Instead, there is `collections.Counter`

and `collections.defaultdict`

. The former keeps track of both counts and the insertion order. We only need to keep track of counts, so we will use `defaultdict`

with a default value of `0`

by using `defaultdict(int)`

.

```
from collections import defaultdict
class Multiset(object):
def __init__(self):
self.counts = defaultdict(int)
self.total = 0
def add(self, item):
self.counts[item] += 1
self.total += 1
def size(self):
return self.total
class MultisetIterator(object):
def __init__(self, multiset):
self.multiset = multiset
self.keys = list(multiset.counts.keys()) # Choose a fixed ordering
self.key_index = 0 # Current `keys` index
self.key_repeat = 0 # Number of times we've repeated the current key
def current_key(self):
return self.keys[self.key_index]
def current_count(self):
return self.multiset.counts[self.current_key()]
def __next__(self):
key = self.current_key()
count = self.current_count()
if self.key_repeat == count:
if self.key_index + 1 < len(self.keys):
self.key_index += 1
self.key_repeat = 0
else:
raise StopIteration
self.key_repeat += 1
return self.current_key()
```

Next, we use our `Multiset`

in the implementation of `Bag`

.

```
class BagWithMultiset(object):
def __init__(self):
self.multiset = Multiset()
def add(self, item):
self.multiset.add(item)
def size(self):
return self.multiset.size()
def is_empty(self):
return self.multiset.size() == 0
def __iter__(self):
return MultisetIterator(self.multiset)
```

This version of Bag might be the most performant, given the time and space complexity.

Version | Operation | Big-O Average Case | Big-O Worst Case |
---|---|---|---|

Multiset | Append | O(1) | O(n) |

Multiset | Get Length | O(1) | O(1) |

Multiset | Iteration | O(n) | O(n) |

These results are almost on par with the `list`

implementation. Perhaps the worst case for appending, is worth revisiting. However, the space complexity is not O(n), but in fact O(k) where k is the number of unique elements in the collection. If there are many repeated values, this might be more advantageous than the two other versions of Bag.

## Conclusion

We covered the basics of Python iterators, and considered three different versions of `Bag`

: one using `list`

, one using our hand-rolled linked list implementation, and finally a version of a multiset using Python's `defaultdict`

. This data structure is not very common, and does not have many advantages to ordered arrays. Nevertheless, because of its simplicity, we can use the Bag data structure as an introduction to algorithms.

With that.. happy algorithming, friends!

## Top comments (0)