
Dataclasses In Python

Batuhan Osman Taşkaya ・5 min read

dataclasses is a standard library module (added in Python 3.7) that generates code from type-annotated class specifications. It simplifies writing a class as a mutable data holder and can automatically generate ordering methods (similar to the functools.total_ordering class decorator).

On Python 3.6 you can get a backport of dataclasses via pip (pip install dataclasses).

What were we doing before Dataclasses?

  • namedtuple as a code generator. The downside is that the result is a tuple, which restricts you in many ways.
  • types.SimpleNamespace as a simple data holder. It is very lightweight, though, and often does not meet your needs.
  • ORM models, which describe their specification via class attributes (fields).
  • The traitlets (3rd party) module for creating data holder classes, like ORM models, with rich validation.
  • Modules like attrs (3rd party).

Introduction

We want to store people in dataclasses. A person should have name, age, and job fields.

>>> from dataclasses import dataclass
>>> @dataclass
... class Person:
...     name: str
...     age: int
...     job: str = "Developer"

We decorated our class with the dataclass class decorator. We declared 3 annotated fields with PEP 526's variable annotation syntax and assigned a default value to the job field.
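
To make the code generation concrete, here is a rough hand-written approximation of what the decorator produces for Person. ManualPerson is a hypothetical name used only for this comparison, and the real generated methods differ in detail, so treat it as a sketch rather than the actual output of the module.

```python
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    age: int
    job: str = "Developer"


class ManualPerson:
    """Hand-written approximation of what @dataclass generates for Person."""

    def __init__(self, name: str, age: int, job: str = "Developer"):
        self.name = name
        self.age = age
        self.job = job

    def __repr__(self):
        return (f"{type(self).__name__}(name={self.name!r}, "
                f"age={self.age!r}, job={self.job!r})")

    def __eq__(self, other):
        # Only instances of exactly the same class are compared field by field.
        if other.__class__ is self.__class__:
            return (self.name, self.age, self.job) == (other.name, other.age, other.job)
        return NotImplemented

    # With eq=True and frozen=False the dataclass is unhashable as well.
    __hash__ = None


print(Person("Samantha Carter", 33))
print(ManualPerson("Samantha Carter", 33))
```

The two classes print the same field layout; the dataclass version simply needs far fewer lines.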

When we want to create an instance of a dataclass, we just initialize it like a normal class. Dataclasses use the normal class definition syntax, so you can still use inheritance, metaclasses and other Python class features.

>>> samantha = Person("Samantha Carter", 33)
>>> samantha
Person(name='Samantha Carter', age=33, job='Developer')

This is the generated repr. It shows how the instance was constructed, which is really useful when debugging.

You can access fields by name:

>>> samantha.name
'Samantha Carter'
>>> samantha.age
33
>>> samantha.job
'Developer'

The dataclasses module also offers helper functions such as replace, asdict and astuple. replace takes a dataclass instance and returns a new instance of the same class with the fields given in **kwargs changed. asdict returns the fields as a regular dict (namedtuple's _asdict() returned an OrderedDict before Python 3.8), and astuple returns a tuple of the field values. When we need to unpack a dataclass, we can generate a tuple with astuple and unpack that.

>>> from dataclasses import replace, asdict, astuple
>>> samantha = replace(samantha, job="Frontend Developer")
>>> samantha
Person(name='Samantha Carter', age=33, job='Frontend Developer')
>>> asdict(samantha)
{'name': 'Samantha Carter', 'age': 33, 'job': 'Frontend Developer'}
>>> astuple(samantha)
('Samantha Carter', 33, 'Frontend Developer')
>>> name, age, job = astuple(samantha)
>>> name
'Samantha Carter'

Dataclasses are mutable by default. We can assign to fields in place without creating a new instance.

>>> samantha.age += 1
>>> samantha
Person(name='Samantha Carter', age=34, job='Frontend Developer')

Parameters

The dataclass class decorator takes several parameters that control which code is generated.

  • init (True) : Generates __init__ if not written.

  • repr (True) : Generates __repr__ method if not written.

  • eq (True) : Generates __eq__ method if not written. There is no need for __ne__ because Python falls back to __eq__ and negates the result. The generated method compares a tuple of the instance's fields with a tuple of the other instance's fields, provided both are instances of the same dataclass.

#(Taken from Lib/dataclasses.py)
if eq:
    flds = [f for f in field_list if f.compare]
    self_tuple = _tuple_str('self', flds)
    other_tuple = _tuple_str('other', flds)
    _set_new_attribute(cls, '__eq__',
                       _cmp_fn('__eq__', '==',
                               self_tuple, other_tuple))
  • order (False) : Generates ordering methods (__lt__, __le__, __gt__, __ge__). If order is True while eq is False, a ValueError is raised; if the class already defines any of these methods, a TypeError is raised.

  • unsafe_hash (False) : If True, forces generation of a __hash__ method. If False (the default), the result depends on eq and frozen. If both are True, a __hash__ method is generated. If eq is True and frozen is False, __hash__ is set to None, which marks the class as unhashable. If eq is False, __hash__ is left untouched, so the inherited method is used (for object, id-based hashing).

  • frozen (False) : If True, assigning to a field raises a FrozenInstanceError. If the class already defines __setattr__ or __delattr__, a TypeError is raised.

We can see which dataclass parameters were enabled or disabled during code generation through an attribute called __dataclass_params__:

>>> Person.__dataclass_params__
_DataclassParams(init=True,repr=True,eq=True,order=False,unsafe_hash=False,frozen=False)
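
The interaction between eq, frozen and hashing is easier to follow with a small experiment. The class names and the is_hashable helper below are made up for this illustration; the sketch only relies on the default unsafe_hash=False behaviour described in the parameter list.

```python
from dataclasses import dataclass


@dataclass  # eq=True, frozen=False: __hash__ is set to None
class MutablePoint:
    x: int
    y: int


@dataclass(frozen=True)  # eq=True, frozen=True: __hash__ is generated from the fields
class FrozenPoint:
    x: int
    y: int


@dataclass(eq=False)  # eq=False: the identity-based __hash__ of object is kept
class IdentityPoint:
    x: int
    y: int


def is_hashable(obj) -> bool:
    """Hypothetical helper: report whether hash(obj) succeeds."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


print(is_hashable(MutablePoint(1, 2)))                     # False
print(is_hashable(FrozenPoint(1, 2)))                      # True
print(hash(FrozenPoint(1, 2)) == hash(FrozenPoint(1, 2)))  # True, value-based
print(is_hashable(IdentityPoint(1, 2)))                    # True

# order=True requires eq=True, so the decorator refuses this combination.
try:
    @dataclass(eq=False, order=True)
    class Broken:
        x: int
except ValueError as exc:
    print("ValueError:", exc)
```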

Some Interesting Points

  • The equality comparison does not only check values, it also requires an exact class match. You cannot compare apples with oranges :)

  • With the default parameters (eq=True, frozen=False), __hash__ is set to None so that you don’t get accidental hashability based on identity.

  • The generated class keeps everything from the original definition, so a default value shows up as a class variable and the type declarations show up in the class annotations.

  • The generated class also carries metadata such as __dataclass_params__ and __dataclass_fields__.
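
A short check of these points, using hypothetical Apple, Orange and GreenApple classes that exist only for this sketch:

```python
from dataclasses import dataclass, fields


@dataclass
class Apple:
    weight: int = 100


@dataclass
class Orange:
    weight: int = 100


@dataclass
class GreenApple(Apple):
    pass


print(Apple(100) == Apple(100))       # True: same class, same values
print(Apple(100) == Orange(100))      # False: same values, different class
print(Apple(100) == GreenApple(100))  # False: even a subclass is not an exact match

# The default value is still visible as a class variable,
# and the annotation is still in the class annotations.
print(Apple.weight)
print(Apple.__annotations__)

# fields() is the public way to read what is stored in __dataclass_fields__.
print([f.name for f in fields(Apple)])
```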

Immutable & Hashable & Ordered Dataclass Example

>>> from dataclasses import dataclass
>>> from pprint import pprint
>>> @dataclass(order=True, frozen=True)
... class Item:
...     price: int
...     name: str
...     type: str = "Sword"
...
>>> items = [Item(40, "X"),
...          Item(20, "Y"),
...          Item(30, "XY")]
>>> pprint(sorted(items))
[Item(price=20, name='Y', type='Sword'),
 Item(price=30, name='XY', type='Sword'),
 Item(price=40, name='X', type='Sword')]
>>> powerful_sword = Item(1000, "Powerful")
>>> powerful_sword.price = 500
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 3, in __setattr__
dataclasses.FrozenInstanceError: cannot assign to field 'price'
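
The session above exercises ordering and immutability. The hashable part can be sketched in the same way; the stock dictionary and its quantities are invented for this illustration.

```python
from dataclasses import dataclass, FrozenInstanceError, replace


@dataclass(order=True, frozen=True)
class Item:
    price: int
    name: str
    type: str = "Sword"


# eq=True together with frozen=True gives a value-based __hash__,
# so instances can be used as dict keys or set members.
stock = {Item(40, "X"): 3, Item(20, "Y"): 1}
print(stock[Item(40, "X")])  # an equal instance finds the same entry

# Ordering compares the fields as a tuple in definition order, so price comes first.
cheapest = min(stock)
print(cheapest)

# A frozen instance cannot be changed in place...
try:
    cheapest.price = 5
except FrozenInstanceError:
    print("cannot mutate a frozen instance")

# ...but replace() builds a modified copy.
discounted = replace(cheapest, price=5)
print(discounted)
```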

Go Deeper - Fields

Fields as Factories

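Instead of a fixed default value, we can pass a factory to field() through its default_factory parameter (for example field(default_factory=list)). The factory is called with no arguments each time an instance needs a default, similar to the factory of collections.defaultdict.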
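
A minimal sketch of why the factory matters, using hypothetical Team and BadTeam classes: every instance gets its own list, and a plain mutable default such as [] is rejected by the decorator.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Team:
    name: str
    # list() is called once per instance, so the lists are never shared.
    members: List[str] = field(default_factory=list)


a = Team("a")
b = Team("b")
a.members.append("Samantha")
print(a.members)  # ['Samantha']
print(b.members)  # []

# A bare mutable default is refused when the class is created.
try:
    @dataclass
    class BadTeam:
        members: List[str] = []
except ValueError as exc:
    print("ValueError:", exc)
```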

Attaching Metadata to Fields

If we need to describe a field, we can attach metadata to it. The metadata parameter defaults to None, which is treated as an empty dict; you can pass any mapping.
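
The metadata is not used by dataclasses itself; it is there for your own code to read back, typically through fields(). A small sketch with a hypothetical Product class:

```python
from dataclasses import dataclass, field, fields


@dataclass
class Product:
    name: str
    price: int = field(default=0, metadata={"currency": "Turkish Lira"})


by_name = {f.name: f for f in fields(Product)}
print(by_name["price"].metadata["currency"])  # Turkish Lira
print(dict(by_name["name"].metadata))         # {} when no metadata was given

# The mapping is exposed as a read-only view.
try:
    by_name["price"].metadata["currency"] = "USD"
except TypeError:
    print("metadata is read-only")
```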

Mark as Don't Compare

If you don't want some fields to take part in comparisons, mark them as not comparable with field(compare=False).
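
For example, in this hypothetical Version class the label is carried along but ignored by both equality and ordering:

```python
from dataclasses import dataclass, field


@dataclass(order=True)
class Version:
    major: int
    minor: int
    # Excluded from __eq__ and from the ordering methods.
    label: str = field(default="", compare=False)


print(Version(1, 2, "stable") == Version(1, 2, "beta"))  # True
print(Version(1, 2, "zzz") < Version(1, 3, "aaa"))       # True, only (major, minor) count
```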

Remove it from __repr__ & __init__

field() also lets you exclude a field from the generated __repr__ or __init__. To hide it from repr(), use field(repr=False); to leave it out of the generated __init__(), use field(init=False).
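
A sketch with a hypothetical Account class: the password is accepted by __init__ but hidden from the repr, while login_count appears in the repr but cannot be passed to __init__.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    user: str
    password: str = field(repr=False)                # hidden from repr()
    login_count: int = field(default=0, init=False)  # not an __init__ parameter


acct = Account("sam", "hunter2")
print(acct)           # Account(user='sam', login_count=0)
print(acct.password)  # the value is still stored on the instance

try:
    Account("sam", "hunter2", login_count=3)
except TypeError as exc:
    print("TypeError:", exc)
```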

An Example

Imagine you have books. A book has ISBN, title, author, price, and renters fields.

>>> from dataclasses import dataclass, field
>>> from typing import List
>>> @dataclass
... class Book:
...     author: str = field()
...     title: str = field()
...     isbn: int = field(compare=False)
...     price: int = field(default_factory=int, metadata={"currency": "Turkish Lira"})
...     renters: List[str] = field(default_factory=list, metadata={"max": 5}, repr=False)
...     
...     def rent(self, name):
...         if len(self.renters) >= 5:
...             raise ValueError("5 People Already Rent This Book")
...         self.renters.append(name)
...     
...     def unrent(self, name):
...         self.renters.remove(name)
... 
>>> free = Book("Sam Williams", "Free as in Freedom", 9968237238, 35)
>>> cb   = Book("Eric Raymond", "Cathedral and the Bazaar", 969398332)
>>> free
Book(author='Sam Williams', title='Free as in Freedom', isbn=9968237238, price=35)
>>> cb
Book(author='Eric Raymond', title='Cathedral and the Bazaar', isbn=969398332, price=0)
>>> for i in range(5):
...     free.rent(i)
... 
>>> free.renters
[0, 1, 2, 3, 4]
>>> free.rent("Batuhan")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 11, in rent
ValueError: 5 People Already Rent This Book
>>> free.unrent(1)
>>> free.rent("Batuhan")
>>> free.renters
[0, 2, 3, 4, 'Batuhan']
>>> repr(free)
"Book(author='Sam Williams', title='Free as in Freedom', isbn=9968237238, price=35)"
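
In the session above, rent() hard-codes the limit of 5 even though the renters field already carries {"max": 5} as metadata. As a variation (an assumption about how the metadata might be put to use, not part of the original example), the method can read the limit back through fields(). The sketch also shows the effect of compare=False on isbn.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class Book:
    author: str
    title: str
    isbn: int = field(compare=False)
    price: int = field(default_factory=int, metadata={"currency": "Turkish Lira"})
    renters: List[str] = field(default_factory=list, metadata={"max": 5}, repr=False)

    def rent(self, name: str) -> None:
        # Read the limit from the field's metadata instead of repeating the number.
        limit = next(f for f in fields(self) if f.name == "renters").metadata["max"]
        if len(self.renters) >= limit:
            raise ValueError(f"{limit} people already rent this book")
        self.renters.append(name)


cb = Book("Eric Raymond", "Cathedral and the Bazaar", 969398332)
for reader in ["Ada", "Bob", "Cem", "Dee", "Eve"]:
    cb.rent(reader)

try:
    cb.rent("Batuhan")
except ValueError as exc:
    print("ValueError:", exc)

# isbn is excluded from comparison, so two copies differing only in isbn are equal.
print(Book("Sam Williams", "Free as in Freedom", 1) == Book("Sam Williams", "Free as in Freedom", 2))
```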

Discussion

Martin

Hey Batuhan,

Thanks for the nice post.

I was using dataclasses from the attrs module. One feature that is missing from the Python 3.6 implementation is validators. Are you aware of any good patterns to make this work with Python dataclasses?

Example dataclass using attrs:

import attr
from attr import dataclass

@dataclass
class Item:
    name: str
    price: int = attr.ib()

    @price.validator
    def check_price(self, attr, price):
        if price <= 0:
            raise ValueError('Invalid price')

How do I convert the above example to Python 3.6 dataclasses?

Batuhan Osman Taşkaya Author

You can validate fields after the initialization process with __post_init__.

from dataclasses import dataclass

@dataclass
class Item:
    name: str
    price: int

    def __post_init__(self):
        # Runs right after the generated __init__ has set the fields.
        if self.price <= 0:
            raise ValueError('Invalid price')
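
A quick sketch of how this behaves in use. One caveat worth knowing: __post_init__ runs only when __init__ runs, so it covers construction and replace(), but not plain attribute assignment afterwards.

```python
from dataclasses import dataclass, replace


@dataclass
class Item:
    name: str
    price: int

    def __post_init__(self):
        if self.price <= 0:
            raise ValueError("Invalid price")


sword = Item("Sword", 40)
print(sword)

# Construction with a bad value is rejected.
try:
    Item("Broken sword", 0)
except ValueError as exc:
    print("rejected:", exc)

# replace() calls __init__ again, so the check runs there too.
try:
    replace(sword, price=-1)
except ValueError as exc:
    print("rejected:", exc)

# Direct assignment bypasses __post_init__ entirely.
sword.price = -1
print(sword)
```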

Martin

Thanks, that looks great.

Georgi Tenev

Very nice!

jasonhoo

This is an amazing post, good one.

jasonhoo

Can anyone show me where I can find more Python tutorials like this on dev.to?

jasonhoo

Python is amazing. I like to learn a lot about it, and this tutorial rocks.