American football players "blocking" the kick, not "intercepting."
Special thanks to Luciano Ramalho. I learned most of what I know about descriptors from his workshop at PyBay 2017.
Have you seen code like this, or maybe written some yourself?
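```python
from sqlalchemy import Column, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()  # base class for declarative models

class User(Base):
    __tablename__ = 'users'  # a mapped class needs a table name
    id = Column(Integer, primary_key=True)
    name = Column(String)
```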
This snippet is adapted from the tutorial of SQLAlchemy, a popular ORM package. If you have ever wondered why the attributes `id` and `name` aren't passed into the `__init__` method and bound to the instance like in a regular class, this post is for you.
This post starts by explaining what descriptors are and why to use them. It then shows how to write them in earlier Python versions (<= 3.5), and finally how to write them in Python 3.6 with the new feature described in PEP 487: Simpler customisation of class creation.
If you are in a hurry or just want to know what's new, scroll to the bottom of this article, where you'll find the complete code.
What are descriptors
Raymond Hettinger gives a great definition of a descriptor in the Descriptor HowTo Guide:
In general, a descriptor is an object attribute with “binding behavior”, one whose attribute access has been overridden by methods in the descriptor protocol. Those methods are __get__(), __set__(), and __delete__(). If any of those methods are defined for an object, it is said to be a descriptor.
There are three ways to access an attribute. Let's say we have the attribute `a` on the object `obj`:
- To look up its value: `some_variable = obj.a`
- To change its value: `obj.a = 'new value'`
- To delete it: `del obj.a`
Python is dynamic and flexible enough to let users intercept each of these expressions/statements and bind behaviors to them.
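To make the three access forms concrete, here is a minimal sketch of a data descriptor that only records which hook Python calls. The names `Traced` and `Obj` and the `log` attribute are hypothetical and exist only for illustration. The descriptor receives the attribute name explicitly, which is the pre-3.6 approach used later in this post.

```python
class Traced:
    """Hypothetical descriptor that records every access it intercepts."""

    def __init__(self, name):
        self.name = name  # key under which the value is stored on each instance
        self.log = []     # shared by all instances; for illustration only

    def __get__(self, instance, owner):
        if instance is None:  # looked up on the class (Obj.a): return the descriptor
            return self
        value = instance.__dict__[self.name]
        self.log.append(('get', value))
        return value

    def __set__(self, instance, value):
        self.log.append(('set', value))
        instance.__dict__[self.name] = value

    def __delete__(self, instance):
        self.log.append(('delete', None))
        del instance.__dict__[self.name]


class Obj:
    a = Traced('a')


obj = Obj()
obj.a = 'new value'     # calls Traced.__set__
some_variable = obj.a   # calls Traced.__get__
del obj.a               # calls Traced.__delete__
print(Obj.a.log)
# [('set', 'new value'), ('get', 'new value'), ('delete', None)]
```

Each of the three statements from the list above ends up in the matching descriptor method. Python never touches `obj.__dict__` on its own.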
Why you want to use descriptors
Let's see an example:
```python
class Order:
    def __init__(self, name, price, quantity):
        self.name = name
        self.price = price
        self.quantity = quantity

    def total(self):
        return self.price * self.quantity

apple_order = Order('apple', 1, 10)
apple_order.total()
# 10
```
Besides the lack of proper documentation, there is a bug:
```python
apple_order.quantity = -10
apple_order.total()
# -10, too good of a deal!
```
Instead of switching to getter and setter methods and breaking the API, let's use `property` to enforce that `quantity` is non-negative:
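```python
class Order:
    def __init__(self, name, price, quantity):
        self._name = name
        self.price = price
        self.quantity = quantity  # (1)

    @property
    def quantity(self):
        return self._quantity

    @quantity.setter
    def quantity(self, value):
        if value < 0:
            raise ValueError('Cannot be negative.')
        self._quantity = value  # (2)

    def total(self):
        return self.price * self.quantity


apple_order = Order('apple', 1, 10)
apple_order.quantity = -10
# ValueError: Cannot be negative.
```
We transformed `quantity` from a simple attribute into a non-negative property. Notice that line (1) assigns through the property, so the check also runs when an `Order` is created. Line (2) stores the value under the renamed attribute `_quantity`. If the setter assigned to `self.quantity` instead, it would call itself again and end in a `RecursionError`.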
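As a side note, `property` is itself implemented as a descriptor, which is why assigning to `quantity` ends up in the setter. The following sketch uses a stripped-down `Order`, made up for illustration, and checks this by calling the descriptor methods on the property object directly.

```python
class Order:
    def __init__(self, quantity):
        self.quantity = quantity  # goes through the property setter

    @property
    def quantity(self):
        return self._quantity

    @quantity.setter
    def quantity(self, value):
        if value < 0:
            raise ValueError('Cannot be negative.')
        self._quantity = value


# property objects define the descriptor methods __get__, __set__ and __delete__
print([hasattr(property, m) for m in ('__get__', '__set__', '__delete__')])
# [True, True, True]

order = Order(10)
prop = Order.__dict__['quantity']   # the property object stored on the class
prop.__set__(order, 5)              # same as: order.quantity = 5
print(prop.__get__(order, Order))   # same as: order.quantity
# 5
```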
Are we done? Hell no. We forgot that the `price` attribute cannot be negative either. It might be tempting to just create another property for `price`, but remember the DRY principle: when you find yourself doing the same thing twice, it's a good sign that you should extract the reusable code. Also, more attributes might need to be added to this class in the future. Repeating the code isn't fun for the writer or the reader. Let's see how descriptors can help us.
How to write descriptors
With the descriptors in place, our new class definition would become:
```python
class Order:
    price = NonNegative('price')  # (3)
    quantity = NonNegative('quantity')

    def __init__(self, name, price, quantity):
        self._name = name
        self.price = price
        self.quantity = quantity

    def total(self):
        return self.price * self.quantity

apple_order = Order('apple', 1, 10)
apple_order.total()
# 10

apple_order.price = -10
# ValueError: Cannot be negative.

apple_order.quantity = -10
# ValueError: Cannot be negative.
```
Notice the class attributes defined before the `__init__` method? They look a lot like the SQLAlchemy example shown at the very beginning of this post. This is where we are heading. We need to define the `NonNegative` class and implement the descriptor protocol. Here's how:
```python
class NonNegative:
    def __init__(self, name):
        self.name = name  # (4)

    def __get__(self, instance, owner):
        if instance is None:  # accessed on the class, e.g. Order.price
            return self
        return instance.__dict__[self.name]  # (5)

    def __set__(self, instance, value):
        if value < 0:
            raise ValueError('Cannot be negative.')
        instance.__dict__[self.name] = value  # (6)
```
Line (4): the `name` attribute is needed because when the `NonNegative` object is created on line (3), the assignment to the attribute named `price` hasn't happened yet, so the descriptor has no way of knowing which name it will be bound to. We therefore need to pass the name `price` explicitly to the initializer. It is then used as the key in the instance's `__dict__`.
Later, we'll see how Python 3.6+ lets us avoid this redundancy.
It could be avoided in earlier versions of Python too, but explaining how would take too much effort and is not the purpose of this post, so it's not included.
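To see where the values actually end up, here is a small sketch that uses the pre-3.6 `NonNegative` with two orders (`pear_order` is a hypothetical second instance). It includes an `instance is None` check so that looking the attribute up on the class (`Order.price`) returns the descriptor itself instead of failing.

```python
class NonNegative:
    def __init__(self, name):
        self.name = name  # key used in each instance's __dict__

    def __get__(self, instance, owner):
        if instance is None:  # accessed as Order.price: return the descriptor
            return self
        return instance.__dict__[self.name]

    def __set__(self, instance, value):
        if value < 0:
            raise ValueError('Cannot be negative.')
        instance.__dict__[self.name] = value


class Order:
    price = NonNegative('price')
    quantity = NonNegative('quantity')

    def __init__(self, name, price, quantity):
        self._name = name
        self.price = price
        self.quantity = quantity


apple_order = Order('apple', 1, 10)
pear_order = Order('pear', 2, 5)

# One descriptor object per class attribute, but each instance keeps its own
# values in its own __dict__, under the name passed to NonNegative
print(vars(apple_order))  # {'_name': 'apple', 'price': 1, 'quantity': 10}
print(vars(pear_order))   # {'_name': 'pear', 'price': 2, 'quantity': 5}
print(Order.price.name)   # price
```

Because `NonNegative` defines `__set__`, it is a data descriptor, so `apple_order.price` still goes through `__get__` even though the instance `__dict__` has a `price` key.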
Lines (5) and (6): instead of using the builtin functions `getattr` and `setattr`, we need to reach into the instance's `__dict__` directly. The builtins would be intercepted by the descriptor protocol too and cause a `RecursionError`.
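The following sketch shows what goes wrong with the builtins. `RecursiveNonNegative` is a deliberately broken, hypothetical variant whose `__set__` calls `setattr`. That call goes straight back through the descriptor protocol.

```python
class RecursiveNonNegative:
    """Deliberately broken: uses setattr() instead of instance.__dict__."""

    def __init__(self, name):
        self.name = name

    def __set__(self, instance, value):
        if value < 0:
            raise ValueError('Cannot be negative.')
        # setattr(instance, 'quantity', value) is the same as
        # instance.quantity = value, which calls this __set__ again
        setattr(instance, self.name, value)


class Order:
    quantity = RecursiveNonNegative('quantity')


order = Order()
try:
    order.quantity = 10
except RecursionError:
    hit_recursion_limit = True
else:
    hit_recursion_limit = False
print(hit_recursion_limit)
# True
```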
Welcome to Python 3.6+
We are still repeating ourselves on line (3). How do we get a cleaner API, so that we can write:
```python
class Order:
    price = NonNegative()
    quantity = NonNegative()

    def __init__(self, name, price, quantity):
        ...
```
Let's look at the new addition to the descriptor protocol in Python 3.6:
- `object.__set_name__(self, owner, name)`: Called at the time the owning class `owner` is created. The descriptor has been assigned to `name`.
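A quick way to see when the hook fires is a class that only records what `__set_name__` receives. `Named` and `extra` are hypothetical names for illustration. The hook is called when the owning class is created. An attribute attached to the class afterwards does not get it automatically.

```python
class Named:
    """Hypothetical helper that only records what __set_name__ receives."""

    def __set_name__(self, owner, name):
        self.owner = owner
        self.name = name


class Order:
    price = Named()     # __set_name__(Order, 'price') runs when Order is created
    quantity = Named()  # __set_name__(Order, 'quantity')


print(Order.price.owner.__name__, Order.price.name)  # Order price
print(Order.quantity.name)                            # quantity

# Attached after the class was created: __set_name__ is not called
Order.extra = Named()
print(hasattr(Order.extra, 'name'))                   # False
```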
With this hook, we can remove `__init__` and let Python bind the attribute name to the descriptor:
```python
class NonNegative:
    ...
    def __set_name__(self, owner, name):
        self.name = name
```
Putting all the code together:
```python
class NonNegative:
    def __get__(self, instance, owner):
        if instance is None:  # accessed on the class, e.g. Order.price
            return self
        return instance.__dict__[self.name]

    def __set__(self, instance, value):
        if value < 0:
            raise ValueError('Cannot be negative.')
        instance.__dict__[self.name] = value

    def __set_name__(self, owner, name):
        self.name = name  # called once per attribute when Order is created


class Order:
    price = NonNegative()
    quantity = NonNegative()

    def __init__(self, name, price, quantity):
        self._name = name
        self.price = price
        self.quantity = quantity

    def total(self):
        return self.price * self.quantity


apple_order = Order('apple', 1, 10)
apple_order.total()
# 10

apple_order.price = -10
# ValueError: Cannot be negative.

apple_order.quantity = -10
# ValueError: Cannot be negative.
```
Conclusion
Python is a general-purpose programming language. I love that it has very powerful, highly flexible features that can bend the language tremendously (e.g. metaclasses). It also has high-level APIs and protocols that serve 99% of our needs (e.g. descriptors). I believe in using the right tool for the job, and descriptors are clearly the right tool for binding behaviors to attributes. Metaclasses could potentially do the same thing, but descriptors solve the problem more gracefully. It's also pleasing to see Python evolve to serve everyday needs better.
Here's my conclusion:
- Python 3.6 is by far the greatest Python.
- Descriptors are used to bind behaviors to attribute access.
Comments
When a descriptor class stores per-instance values in its own dictionary, use WeakKeyDictionary instead of an ordinary dict. Otherwise you will run into bugs when you start deleting instances, because those instances won't get garbage collected.
Watch this (at 15:00): youtube.com/watch?v=lmcgtUw5djw
Have you examined how well auto-complete and type inference work with descriptors in various IDEs? Unfortunately, this is the problem I always run into when working with new language features.
Hey Seth, no I haven't. I'd expect most IDEs to support descriptors well, though, since they have been a well-established feature since Python 2.2. The only new feature here is the `__set_name__` protocol, which was added in Python 3.6.
That's what I was looking for. Thanks a lot.