DEV Community

loading...

Python Collections

Batuhan Osman Taşkaya
python guy
・7 min read

Python has general purpose built-in container types (list, set, tuple, dict). But in some cases they may not be enough. We can get more container types with a standard library module called Collections.

Also collections provides abstract base classes (not this post's topic). They can be used to test whether a class provides a particular interface; for example, whether it is hashable or whether it is a mapping.

Namedtuple

Python's standard tuple uses numerical indexes to get values. namedtuple is a factory function for creating tuple subclasses with named fields for better readability and self-documented code.

It requires 2 arguments by default for creating a subclass with name fields. The first one is typename, it represents the class name (created by namedtuple factory). Second one is field_names; it can take a string, seperated by space / comma chars or an iterable.

Example:

>>> person = namedtuple("Person", ("name","age","gender"))
>>> Samantha = person("Samantha", 33, "Woman")
>>> Samantha.name
'Samantha'
>>> Samantha.age
33
>>> Samantha.gender
'Woman'
>>> 

When namedtuple called, it formats a class template (that inherited from tuple) with typename, field_names and several internal values. Verbose option allows to see formatted class template.

Check formatting step of a class template in namedtuple factory.

And when your field_names are parsed by namedtuple they controlled in a few steps.

  • Is this field_name a string? (All field_names mapped into str when namedtuple called) [Raises TypeError if False]
  • Is this field_name not a valid identifier (checked with isidentifier method)[Raises ValueError if True]
  • Is this field_name a reserved keyword (checked with keyword.iskeyword method) [Raises ValueError if True]
  • Is this field_name starts with underscore (_)? [Raises ValueError if True]
  • Is this field_name used & seen before? (seen is a set and keeps all valid field names)[Raises ValueError if True]

Rename option prevents this errors with renaming value. Check renaming process

>>> pet = namedtuple("Pet", (None, "3m3", True, 7, 7), rename=True)
>>> pet._fields
('_0', '_1', '_2', '_3', '_4')

Methods

_make(i) : Return a new instance from given iterable (Class Method).
_asdict() : Return tuple field names and values as OrderedDict.
_replace(**kw) : Return a new instance with replacing specified fields.
_source : Return formatted class template like verbose option.
_fields : Return fields as Tuple.

Deque

In conventional stacks / queues inputs and outputs are restricted to a single end but in deque (double ended queue) container type, appends & pops can be performed either side of the queue. Also it supports thread-safe and it is super-fast.

It requires no argument for initalizition. You can specify inital value of deque with giving it a iterable. Also you can limit the amount of items a deque can hold. By limiting deque with maxlen we simply tell to deque when you reach the maximum length pop out the item from the opposite end.

>>> item_ids = deque((1,2,3), maxlen=5) 
>>> item_ids[1] 
2
>>> len(item_ids) 
3

Methods

  • append(x) : Appends x to right-end
>>> item_ids.append(4)
>>> item_ids
deque([1, 2, 3, 4], maxlen=5) 
  • appendleft(x) : Appends x to left-end
>>> item_ids.appendleft(0)
>>> item_ids
deque([0, 1, 2, 3, 4], maxlen=5)
  • extend(i): Extend i iterable into deque's right-end
>>> item_ids.extend(["a","b"])
>>> item_ids
deque([2, 3, 4, 'a', 'b'], maxlen=5)
  • extendleft(i) : Extend i iterable into deque's left-end
>>> item_ids.extendleft((1,0))
>>> item_ids
deque([0, 1, 2, 3, 4], maxlen=5)
  • pop() : Pop an element from right-end and return it (raise IndexError if no element)
>>> item_ids.pop() 
4
>>> item_ids
deque([0, 1, 2, 3], maxlen=5)
  • popleft() : Pop an element from left-end and return it (raise IndexError if no element)
>>> item_ids.popleft() 
0
>>> item_ids
deque([1, 2, 3], maxlen=5)
  • remove(x) : Remove first x value if found (raise ValueError if not)
>>> item_ids = deque([2, 3, 10, 10, 10], maxlen=5)
>>> item_ids.remove(10)
>>> item_ids
deque([2, 3, 10, 10], maxlen=5)
  • reverse() : Reverse the deque
>>> item_ids.reverse()
>>> item_ids
deque([10, 10, 3, 2], maxlen=5)
  • count(x) : Total count of x in deque
>>> item_ids.count(10)
2
  • rotate(n): Rotate the deque n times to the right.
>>> item_ids.rotate(1)
>>> item_ids
deque([2, 10, 10, 3], maxlen=5)
>>> item_ids.rotate(2)
>>> item_ids
deque([10, 3, 2, 10], maxlen=5)
  • clear() : Remove all items in deque and set total length to 0
>>> item_ids.clear()
>>> item_ids
deque([], maxlen=5)
  • maxlen : If set return maximum length of deque if not return None
>>> item_ids.maxlen
5

ChainMap

The ChainMap class manages dictionaries or other mappings and gets values from them in order to given key. It takes *maps as a inital value. If no *maps are given, a single empty dictionary is provided so that a new chain always has at least one mapping. Also it uses regular dict API.

>>> d1 = {'name':'Samantha', 'age': 33} 
>>> d2 = {'gender':'Woman', 'name':'Carter'}
>>> m = ChainMap(d1,d2)
ChainMap({'name': 'Samantha', 'age': 33}, {'gender': 'Woman', 'name': 'Carter'})
>>> tuple(m.keys())
('name', 'gender', 'age')
>>> tuple(m.items())
(('name', 'Samantha'), ('gender', 'Woman'), ('age', 33))
>>> m['name']
'Samantha'
>>> m.__getitem__('name')
'Samantha'

It stores all *maps in a list and gives you access to this list via an attribute called maps. The list is ordered from first-searched to last-searched. It is the only stored state and can be modified to change which mappings are searched. The list should always contain at least one mapping.

>>> m.maps
[{'name': 'Samantha', 'age': 33}, {'gender': 'Woman', 'name': 'Carter'}]

If you want to get Carter when you give the name key, you have to reverse maps list.

>>> m.maps = list(reversed(m.maps))
[{'gender': 'Woman', 'name': 'Carter'}, {'name': 'Samantha', 'age': 33}]
>>> m['name']
'Carter'

ChainMap also gives a method for creating a new ChainMap instance optionally with one extra mapping (as first mapping in maps for make it easy to avoid modifying the existing underlying data structures). This method called new_child().

>>> n = m.new_child({'age':22})
>>> n
ChainMap({'age': 22}, {'gender': 'Woman', 'name': 'Amanda'}, {'name': 'Samantha', 'age': 33})
>>> n['age'] 
22

If you want to get parent ChainMap from an instance created with new_child() you can use parent attribute. It returns a new ChainMap containing all of the maps in the current instance except the first one.

>>> n.parents # Get parent of `n`
ChainMap({'gender': 'Woman', 'name': 'Amanda'}, {'name': 'Samantha', 'age': 33})

Counter

Counter is subclass of dict. It helps us counting hashable objects. Counter can return any integer value including zero or negative values.

Initialization Types

You can initalize Counter with 4 diffrent types.

  • With giving nothing
>>> c = Counter()
>>> c['a'] = 3
>>> c
Counter({'a': 3})
  • With giving an iterable
>>> c = Counter("xyzzzzxyz")
>>> c['x']
2
>>> c
Counter({'z': 5, 'x': 2, 'y': 2})
  • With giving an dict & mapping
>>> c = Counter({"x":2, "z":5, "y":2})
>>> c['z']
5
>>> c
Counter({'z': 5, 'x': 2, 'y': 2})
  • With giving **kwargs
>>> c = Counter(x=2, z=5, y=3)
>>> c['y']
3
>>> c
Counter({'z': 5, 'y': 3, 'x': 2})

You can access and set Counter's values by using the regular dictionary API (because Counter inherited from dict). Also you never get KeyError from Counter. Instead of that you get 0 for non-exist keys.

Example:

>>> word = "xyzzz"
>>> c = Counter(word)
>>> c
Counter({'z': 3, 'x': 1, 'y': 1})
>>> c['x'] += 5
>>> c
Counter({'x': 6, 'z': 3, 'y': 1})
>>> c['a']
0

Methods

  • elements() : Returns a iterator (itertools.chain) that produces all of the items.
>>> tuple(c.elements())
('x', 'x', 'x', 'x', 'x', 'x', 'y', 'z', 'z', 'z')
  • most_common(n) : Returns a sequence of the n most frequently encountered values and total counts of this values.
>>> c.most_common(2)
[('x', 6), ('z', 3)]
  • update(x) : Updates Counter object with x's values. x can be a iterable or a mapping.
>>> c.update('abcccccaa')
>>> c
Counter({'x': 6, 'c': 5, 'z': 3, 'a': 3, 'y': 1, 'b': 1})
  • subtract(x) : Substracts values from given iterable / mapping.
>>> c.subtract("xxxxxxxzz")
>>> c
Counter({'c': 5, 'a': 3, 'y': 1, 'z': 1, 'b': 1, 'x': -1})
  • __add__(x) (+) : Combination of 2 Counter objects.
>>> c = Counter(a=4, b=3)
>>> d = Counter(b=2, c=6)
>>> c + d
Counter({'c': 6, 'b': 5, 'a': 4})
  • __sub__(x) (-) : Subtraction of 2 Counter objects (with keeping only positive counts)
>>> c - d
Counter({'a': 4, 'b': 1})
  • __and__(x) (&) : Intersection of 2 Counter objects.
>>> c & d
Counter({'b': 2})
  • __or__(x) (|) : Union of 2 Counter objects.
>>> c | d
Counter({'c': 6, 'a': 4, 'b': 3})

OrderedDict

Python's regular dict doesn't keep order of insertion but a OrderedDict (inherited from regular dict) keeps order of insertion. You can use regular dict api (like fromkeys class method) on it because it is a subclass of dict.

>>> o = OrderedDict(name="Samantha", position="SG1", age=33)
>>> o
OrderedDict([('name', 'Samantha'), ('position', 'SG1'), ('age', 33)])
>>> o['name']
'Samantha'
>>> o['name'] += ' Carter'
>>> o['name']
'Samantha Carter'

Equality Tests

A regular python dict checks only contents when testing. OrderedDict check both entries and insertion orders (order-sensitive).

>>> dict.fromkeys("abc") == dict.fromkeys("cba")
True
>>> OrderedDict.fromkeys("abc") == OrderedDict.fromkeys("cba")
False

Methods

  • popitem(last) : Return and remove an entry in OrderedDict. If last is True removing order is LIFO if not removing order is FIFO. (last default: True)
>>> o.popitem(last=False)
('name', 'Samantha')
>>> o
OrderedDict([('position', 'SG1'), ('age', 33)])
>>> o.popitem()
('age', 33)
>>> o
OrderedDict([('position', 'SG1')])
  • move_to_end(key, last) : Changes order of an entry (specified with key). If last is True moves to the end of dict if not moves to the starting of dict. (last default: True)
>>> o = o.fromkeys("abc")
>>> o.move_to_end('a')
>>> o
OrderedDict([('b', None), ('c', None), ('a', None)])
>>> o.move_to_end('c', last=False)
>>> o
OrderedDict([('c', None), ('b', None), ('a', None)])

Defaultdict

defaultdict is a dict subclass with default values for non-exist entries. It takes a factory for initalizing. Factory helps it with generating values for non-exist entries. Also you can regular dict API with it.

When you try to get a dict entry (that does not exist) via __getitem__, it calls __missing__ instead of raising KeyError. And __missing__ uses factory to fill this key.

>>> def factory() -> str: 
...     return "unknown"
... 
>>> d = defaultdict(factory, red="ff0000", blue="0000ff")
>>> d['red']
'ff0000'
>>> d['green']
'unknown'

UserDict , UserList, UserString

If you need to subclass built-in dict,list or string don't use directly this types for inheriting. Use UserDict , UserList and UserString. Their purpose is being a wrapper for easier subclassing.

Discussion (0)