This article was edited by Carolyn Stransky and inspired by the Property-Based Testing with PropEr, Erlang, and Elixir book.
Properties-driven development means the application of property-based testing to test-driven development. Test-driven development asks us to constantly write tests to ensure our code is easily testable and usable. Property-based testing asks us to write test case generators instead of hard-coded example inputs and outputs and ensure given properties hold for our code.
Thinking in terms of properties forces us to be very explicit about what our code can and cannot do. We're effectively adopting a design by contract approach, which helps us understand the problem we're trying to solve before diving into coding.
In this article, we'll learn what properties-driven development looks like. We'll also apply these principles to develop a module for a sorted dictionary.
What's in this tutorial
- What is properties-driven development?
- Example project: Sorted dictionary
- Conclusion
- Additional resources
⚠️ Prerequisites:
- A general understanding of software testing.
- (Optional) Python 3+* if you want to follow along.
* This guide will use Python for code examples, but the concepts aren't limited to any specific programming language. So even if you don't know Python, we'd encourage you to read along anyway.
💻 References:
This GitHub repository contains all the featured code examples as tests. The repository also contains instructions for how to execute them.
What is properties-driven development?
As mentioned above, properties-driven development is test-driven development with property-based tests. Test-driven development asks us to think what our code should do and put that into a test. In the context of this article, it's not that important whether we write our tests before, after, or during the implementation. What's important is that tests guide the development. Property-based testing asks us to formulate those tests as properties.
For example, let's say we're writing code for converting a CSV into a JSON array. Instead of jumping into writing a CSV parser, test-driven development asks us to first come up with test cases.
Here's an example CSV input:
a,b,c
3,2,1
6,3,2
This should be turned into the following JSON:
[
{ "a": 3, "b": 2, "c": 1 },
{ "a": 6, "b": 3, "c": 2 }
]
This example would make a nice unit test for our code! However, the test would have quite a few assumptions baked in:
- Keys are distinct.
- Values are integers.
- There are no missing values.
Even if we wanted to be thorough and write unit tests to cover each of those assumptions... our imaginations are limited. For example, we already forgot to mention the assumptions that the list of keys is non-empty and that there's at least one row in the CSV.
Property-based tests are excellent for forcing us to be explicit about our assumptions. To come up with properties for our CSV-to-JSON parser, we would need to first generate CSVs we expect our parser to be able to handle.
Here's the pseudocode for this type of generator:
- Generate keys: A list of arbitrary strings.
- Generate the number of rows: A non-negative integer.
- Generate rows: For each row and key, generate an arbitrary string value.
Can you see how many tricky cases our generator forces us to cover? The list of keys may be empty or the number of rows may be zero. We also assume our code works with arbitrary strings (including empty strings and strings containing commas). And that those strings can work as both keys and values. Creating a generator ensures that our CSV parser is pushed to its limits.
But this generator might turn out to be too demanding for our use case. If that happens, we can relax the generator to produce friendlier data. For example, we may find that our code crashes with empty input. If we know that empty inputs are filtered out elsewhere in the code, we can change the generator to produce only non-empty inputs and add an assertion in our code that the input is expected be non-empty. Instead of programming by coincidence, we've consciously made the decision what our code is expected to handle.
The generator we created is an example of a bottom-up approach to data generation. Instead of generating random CSVs directly, we generate them step by step and keep track of what goes in. So when the time comes to write an assertion that our code did the right thing, we know what the expected result is. This avoids the problem where we need to duplicate the possibly complex implementation in tests. For example, with our generator, we know the length of the resulting JSON array should be equal to the number of rows generated at step 2. That's a good property! If you want to dive deeper into how to apply property-based testing to CSV parsing, take a look at this chapter from the PropEr testing book.
With that, we should have an understanding of what properties-driven development is. Next, let's apply these principles in an example project.
Example project: Sorted dictionary
As an example project, we'll build a custom dictionary in Python. We'll call the data structure SortedDict
and expect it to always keep its keys in a particular order. This type of sorted dictionary might be useful, for example, when you want to keep track of users sorted by birth date. By keeping keys sorted, we're able to traverse the sorted list of key-value pairs in linear time.
For our project, we'll keep the keys sorted by storing the key-value pairs in a binary search tree.
Note: Because the standard tree could be unbalanced, and therefore ineffective, you should use
sortedcontainers
in production software.
Implementing a sorted dictionary is a good example for properties-driven development for a few reasons. First, while property-based testing is most common in functional programming where data structures are immutable, the example shows it can be equally as useful for testing mutable objects. Second, while the implementation of the binary tree is straightforward, it's also complex enough to deserve good tests. Especially the deletion logic can be error-prone and hide complex bugs.
To implement the binary search tree, we'll resort to the reference algorithms from Introduction to Algorithms textbook.
For the sake of this article, we'll assume the keys are integers and that the keys themselves are used for comparison (instead of providing a custom callable per value like SortedDict
in sortedcontainers
does).
What should it do?
To come up with properties, we first need to come up with the requirements for our SortedDict
. One way to come up with requirements is to play around with the API that we expect our module to have.
First, we expect that we can search for an inserted key:
>>> # Insert and search
>>> sorted_dict = SortedDict()
>>> sorted_dict[2] = 'two'
>>> sorted_dict[2]
'two'
We also expect that keys are always sorted irrespective of the insertion order. Continuing on the previous example:
>>> # Keys are sorted
>>> sorted_dict[1] = 'one'
>>> sorted_dict.keys()
[1, 2]
Like with a standard dictionary in Python, we expect re-inserting an existing key will result in the value being overwritten:
>>> # Handles duplicate keys
>>> sorted_dict[2] = 'two-two'
>>> sorted_dict[2]
'two-two'
We expect that searching for a non-existing key will raise a KeyError
:
>>> # Non-existing key
>>> sorted_dict[3]
Traceback (most recent call last):
KeyError: ...
Note:
...
represents the rest of theKeyError
log.
Finally, if we delete a key, we expect that searching for it will also raise a KeyError
:
>>> # Searching for deleted key
>>> del sorted_dict[1]
>>> sorted_dict[1]
Traceback (most recent call last):
KeyError: ...
Listing requirements
Now that we have an idea of what we expect our code to do, we can try to generalize those previous examples into requirements:
- Key-value pairs added to our sorted dictionary can be searched.
- Adding a key that already exists overwrites the existing one.
- Keys are always sorted.
- Searching for a non-existing key raises a
KeyError
. - Deleting a key and then searching for it raises a
KeyError
.
These requirements will serve as the basis for our properties.
Next, we'll write a property for the first requirement: Being able to insert key-value pairs into the dictionary and then search for them.
Implementing insert and search
Before writing the test case generators, we first add the basic skeleton for SortedDict
:
# sorted_dict.py
class SortedDict:
def __init__(self):
"""
Sorted dictionary, keeps key-value pairs
sorted by the key.
"""
pass
def __setitem__(self, key, item):
"""
Add a key-value pair to the dictionary.
"""
pass
def __getitem__(self, key):
"""
Get a key-value pair from the dictionary.
"""
pass
The __setitem__
method is called when sorted_dict[key] = item
is invoked. Similarly, __getitem__
defines the behaviour for sorted_dict[key]
.
This skeleton doesn't implement setting or getting the keys yet. So once we're done writing the test, we can run it to make sure it fails. This gives us confidence our test works as expected.
Generator
We'll use the Hypothesis library for writing properties for our sorted dictionary. Hypothesis supports generating almost any kind of data you can imagine. It also provides decorators for writing property-based tests.
To get started with writing our first property-based test, we need a generator of key-value tuples. The entry point to generators in Hypothesis is the hypothesis.strategies
module, which we alias as some
below because it reads nicely. Strategies in Hypothesis are essentially "clever data generators" that you can compose to generate very complex data.
Here's how to compose generators in Hypothesis to produce a list of key-value tuples:
# test_sorted_dict.py
import hypothesis.strategies as some
def some_key_value_tuples():
"""
Generator for lists of key-value tuples.
"""
some_keys = some.integers()
some_values = some.binary()
some_kvs = some.tuples(some_keys, some_values)
return some.lists(some_kvs)
The function first defines a generator some_keys
for keys, which we assume are integers. For values, we assume any binary is valid. We then define some_kvs
, a generator of key-value tuples using some_keys
and some_values
as generators of keys and values, respectively. Finally, we return the generator some.lists(some_kvs)
, which generates lists of tuples.
To see what kind of data the generator creates, we can call some_key_value_tuples().example()
:
>>> some_key_value_tuples().example()
[]
>>> some_key_value_tuples().example()
[(53, b'{\x8b\xed\x92\xa8\xcb\x7fq\x95')]
>>> some_key_value_tuples().example()
[(-19443, b'\x16ERa'), (-425, b'')]
Now that we have a generator for lists of key-value tuples, we want to build a generator generating instances of SortedDict
containing those tuples. With Hypothesis, we can use hypothesis.strategies.composite
for this purpose as follows:
# test_sorted_dict.py
@some.composite
def some_sorted_dicts(draw):
"""
Generator of sorted dicts.
"""
key_values = draw(some_key_value_tuples())
sorted_dict = SortedDict()
for key, val in key_values:
sorted_dict[key] = val
return sorted_dict
The @some.composite
decorator injects the draw
function into the decorated function definition. The draw
function can be used to sample values from another generator. Above, we sample a list of key-value tuples from the some_key_value_tuples()
generator and then insert those key-value pairs into our SortedDict
. Now some_sorted_dicts()
is a data generator generating instances of SortedDict
.
However, there's a problem: when we get an instance of SortedDict
from the generator, we don't know anymore what data went into creating the sorted dictionary. We therefore also return the key-value pairs from the generator, with the pairs stored in a standard Python dictionary:
# test_sorted_dict.py
@some.composite
def some_sorted_dicts(draw):
"""
Generator of sorted dicts. Returns a tuple of the sorted dictionary and a dictionary holding the key-value pairs used for data generation.
"""
... # define sorted_dict as as above
expected = {}
for key, val in key_values:
expected[key] = val
return sorted_dict, expected
The variable expected
contains the key-value pairs we expect to find from our sorted dictionary. The key-value pairs it contains act as a "model" for what our sorted dictionary should contain. Using a standard dictionary as the model is useful, because it handles duplicates in the same way we expect SortedDict
to handle, i.e., by overwrite.
Property
With the test case generator some_sorted_dicts
defined, we're ready to state our first property: Any keys inserted into the dictionary can be searched.
# test_sorted_dict.py
from hypothesis import given
@given(dict_and_values=some_sorted_dicts())
def test_insert_and_search(dict_and_values):
"""
Key-value pairs added to sorted dictionary can be searched.
"""
sorted_dict, expected = dict_and_values
for key, value in expected.items():
in_dict = sorted_dict[key]
assert in_dict == value
The test case is marked as a Hypothesis test with the @given
function decorator. When this test is run, Hypothesis will generate 100 different sorted dictionaries and verify that all of the key-value pairs expected in the dictionary are found.
Note: 100 is the default number of satisfying test cases that will run before terminating in Hypothesis. You can change this value by adding a @settings object.
If we run the tests now, they should fail: we haven't implemented __setitem__
and __getitem__
yet. For those, we need the binary search tree.
Binary search tree
For the binary search tree, we'll use the implementation from the Introduction to Algorithms book.
Tree
is defined as a dataclass as follows:
# tree.py
from dataclasses import dataclass
import typing as t
@dataclass
class Tree:
"""
Binary search tree.
"""
root: t.Optional[Node] = None
def __repr__(self):
return "Tree(root={})".format(repr(self.root))
The @dataclass
decorator adds, for example, an __init__
method for creating a tree with an optional root
argument. If the tree is empty, root
is equal to None
. Otherwise, it contains a Node
defined like this:
# tree.py
@dataclass
class Node:
key: int
value: t.Any
parent: t.Optional["Node"] = None
left: t.Optional["Node"] = None
right: t.Optional["Node"] = None
def __repr__(self):
return "Node(key={}, value={}, left=({}), right=({})".format(
self.key, repr(self.value), repr(self.left), repr(self.right)
)
A Node
has a key and a value. It also contains references to its left
and right
children as well as its parent
.
Note:
parent
isn't included in__repr__
to avoid infinite loops (printing a child then prints the parent, which prints the child, which prints the parent...).
Inserting a key and a value to the tree happens as follows:
# tree.py
def insert(tree: Tree, key, value):
"""
Reference insertion algorithm from Introduction to Algorithms.
Operates in-place.
"""
y = None
x = tree.root
while x is not None:
y = x
if key < x.key:
x = x.left
elif key > x.key:
x = x.right
else:
x.value = value
return
z = Node(key=key, value=value, parent=y)
if y is None:
tree.root = z
elif z.key < y.key:
y.left = z
else:
y.right = z
Note: If
key
is equal to an existing key, the value in the corresponding node is updated in-place.
To search from the tree, we need to go down the tree until a match is found. Or, if no match is found, raise a KeyError
:
# tree.py
def search(tree: Tree, key):
node = _search_node(tree, key)
return node.value
def _search_node(tree: Tree, key) -> Node:
if tree.root is None:
raise KeyError("Empty dictionary, can't find key {}".format(key))
x = tree.root
while x is not None:
if key < x.key:
x = x.left
elif key > x.key:
x = x.right
else:
return x
raise KeyError("Key {} not found".format(key))
Our search
involves a helper function _search_node
that searches a Node
by key. We'll need this when implementing deletion later.
Putting it all together
With the tree supporting insert and search, we can use this tree in our SortedDict
:
# sorted_dict.py
from . import tree
class SortedDict:
def __init__(self):
self._tree = tree.Tree()
def __setitem__(self, key, item):
tree.insert(self._tree, key, item)
def __getitem__(self, key):
return tree.search(self._tree, key)
If we now run the property-based test for insertion and search, we should see it pass!
Now we have implemented the first requirement with a property-based test. Let's take a look at our remaining requirements:
Key-value pairs added to our sorted dictionary can be searched.Adding a key that already exists overwrites the existing one.- Keys are always sorted.
- Searching for a non-existing key raises a
KeyError
. - Deleting a key and then searching for it raises a
KeyError
.
In this case, we'll also consider our second requirement implemented as we trust Hypothesis to have generated duplicate keys. We could work on the requirement of keeping keys sorted next, but we want to be sure the requirement also holds after deletions. So that's the next requirement we'll tackle: Deleting a key and then searching for it raises a KeyError
.
Implementing deletion
The requirement states that searching for a deleted key raises a KeyError
. But how can we put that into a property? One way is to use our some_sorted_dicts()
generator from earlier and let Hypothesis draw one of the keys for deletion. We then delete that key and ensure searching raises a KeyError
.
To draw a key to delete, we could use composite
again to create a composite strategy. This strategy would yield a sorted dictionary, the dictionary of expected contents, and a key to delete. However, in this case, it makes more sense to use the built-in data
strategy from Hypothesis. This strategy comes with the draw
method and we can use it to sample a value from a generator during the test.
Here's how we can use data()
to sample a key to delete from the list of inserted keys:
# test_sorted_dict.py
@given(
dict_and_values=some_sorted_dicts(),
data=some.data(),
)
def test_search_after_delete(dict_and_values, data):
"""
Searching for a key after deleting that key raises a KeyError.
"""
sorted_dict, expected = dict_and_values
inserted_keys = list(expected.keys())
some_key = some.sampled_from(inserted_keys)
key_to_delete = data.draw(some_key, label="Key to delete")
del sorted_dict[key_to_delete]
with pytest.raises(KeyError): # type: ignore
sorted_dict[key_to_delete]
This test first uses sampled_from
to create a generator called some_key
. When sampled, some_key
randomly chooses one key from the list of inserted keys. We then use data.draw()
to sample a key that we want to delete. The label
argument is added so that Hypothesis can print a clearer error message if it finds a failing test case. Once a key is drawn, we delete the key and verify that searching for it raises a KeyError
.
But there's a problem with our implementation: The list of inserted keys may be empty. We need to cover that case by ensuring the test only runs for non-empty dictionaries. We can do that by using filter
with our generator:
# test_sorted_dict.py
@given(
dict_and_values=some_sorted_dicts().filter(lambda drawn: len(drawn[1]) > 0),
data=some.data(),
)
def test_search_after_delete(dict_and_values, data):
...
Note: Throughout the rest of the tutorial,
...
represents the rest of the function or class as it was previous written.
The filter ensures that the dictionary is non-empty.
With the property ready, we can now move on to the implementation. The delete operation in our test, del sorted_dict[key]
, translates to sorted_dict.__delitem__(key)
. So we need to define __delitem__
in SortedDict
:
# sorted_dict.py
class SortedDict:
...
def __delitem__(self, key):
return tree.delete(self._tree, key)
Again, we delegate the deletion to the tree with the tree.delete
function. Because the deletion is somewhat tricky but can be copied from the Introduction to Algorithms book, we refer to tree.py
in the accompanying repository for its implementation.
Having tackled deletion, our list of requirements now looks as following:
Key-value pairs added to our sorted dictionary can be searched.Adding a key that already exists overwrites the existing one.- Keys are always sorted.
- Searching for a non-existing key raises a
KeyError
. Deleting a key and then searching for it raises aKeyError
As our last example, we'll write a property-based test for the third requirement: Keys are always sorted. The property for final requirement (Searching for a non-existing key raises a KeyError
) can be found in test_sorted_dict.py
in the accompanying repository.
Ensuring keys are sorted
A property like "Keys are always sorted" is an invariant: Whatever operations are performed, it should remain true. Assuming SortedDict
has a keys()
method returning the list of keys, we can write a property-based test ensuring that all the instances of SortedDict
generated by some_sorted_dicts()
have sorted keys:
# test_sorted_dict.py
@given(dict_and_values=some_sorted_dicts())
def test_keys_sorted(dict_and_values):
"""
Invariant: Keys in the sorted dictionary are sorted.
"""
sorted_dict, _ = dict_and_values
keys = sorted_dict.keys()
assert keys == sorted(keys)
The implementation for sorted_dict.keys()
uses in-order walk to traverse the tree. The implementation can be found in sorted_dict.py
and tree.py
in the accompanying repository.
The previous property ensures that keys are always sorted after they are inserted. But this property should also be checked after deletion. So, we should make sure that the invariant holds in our earlier deletion property:
# test_sorted_dict.py
@given(
dict_and_values=some_sorted_dicts().filter(lambda drawn: len(drawn[1]) > 0),
data=some.data(),
)
def test_search_after_delete(dict_and_values, data):
...
keys = sorted_dict.keys()
assert keys == sorted(keys)
While this works, there's something unsatisfactory about it. The premise of property-based testing is that we can cover all kinds of unexpected cases. In test_keys_sorted
and test_search_after_delete
, we're hard-coding the cases where the invariant should be tested. Is there something we can do to randomize testing the invariant?
Enter stateful testing. With stateful tests, we define operations that can be run, but the framework decides the order.
With Hypothesis, we can use RuleBasedStateMachine
for that. We won't go into details, but here's an example for SortedDict
:
# test_sorted_dict.py
class StatefulDictStateMachine(RuleBasedStateMachine):
def __init__(self):
super().__init__()
self.sorted_dict = SortedDict()
self.state = {}
inserted_keys = Bundle("inserted")
deleted_keys = Bundle("deleted_keys")
@rule(target=inserted_keys, key=some.integers(), value=some.text())
def insert(self, key, value):
event("Inserting key")
self.sorted_dict[key] = value
self.state[key] = value
return key
@rule(key=inserted_keys)
def search(self, key):
# A key inserted before may have already been
# deleted if it was a duplicate, so searching it
# may not succeed. Check the key exists in
# the model dictionary.
assume(key in self.state)
event("Searching existing key")
assert self.sorted_dict[key] == self.state[key]
@rule(key=consumes(inserted_keys))
def delete(self, key):
assume(key in self.state)
event("Deleting key")
del self.sorted_dict[key]
del self.state[key]
@rule(key=some.integers())
def search_non_existing(self, key):
assume(key not in self.state)
event("Searching non-existing key")
with pytest.raises(KeyError): # type: ignore
self.sorted_dict[key]
@invariant()
def keys_sorted(self):
keys = self.sorted_dict.keys()
assert keys == sorted(keys)
TestStatefulDict = StatefulDictStateMachine.TestCase
The methods defined in our RuleBasedStateMachine
, decorated with @rule
, define the commands the state machine runs. The decorator @invariant
ensures that we check after every command if the keys are being sorted. We also keep a model of our current expected state in self.state
to know what keys our SortedDict
is expected to contain.
Note: Stateful tests, like the one above, can work wonders for revealing tricky bugs. However, they can be very hard to debug, so always try to cover as many cases as possible with regular unit and property-based tests.
Final touch: Add doctest
Remember how we played around with the expected API of our SortedDict
to come up with the properties? For example, we assumed the following would hold:
>>> # Insert and search
>>> sorted_dict = SortedDict()
>>> sorted_dict[2] = 'two'
>>> sorted_dict[2]
'two'
It would be great to ensure these smaller examples also hold for our implementation. Writing properties can be complex and error-prone, so it's valuable to have these human-readable examples to serve as anchors. They ensure that no matter what happens, our code works as expected at least with small example inputs.
These tests serve as documentation, which means they make excellent doctests. We can add the tests at the top of our sorted_dict.py
module in a docstring:
# sorted_dict.py
"""
Sorted, mutable dictionary
keeping keys in sorted order.
Examples:
>>> # Insert and search
>>> sorted_dict = SortedDict()
>>> sorted_dict[2] = 'two'
>>> sorted_dict[2]
'two'
>>> # Handles duplicate keys
>>> sorted_dict[2] = 'two-two'
>>> sorted_dict[2]
'two-two'
...
"""
In pytest
, we can enable doctests with the --doctest-modules
flag. Add the following in your pytest.ini
to always run doctests:
# pytest.ini
[pytest]
addopts = --doctest-modules
# Ignore extraneous whitespaces and exception details.
doctest_optionflags= NORMALIZE_WHITESPACE IGNORE_EXCEPTION_DETAIL
When pytest
is run, it now also runs the examples from the documentation. And with that, we're done implementing the features we set out to implement for our SortedDict
!
Conclusion
In this article, we learned how to apply property-based testing to test-driven development. We also used this principle to develop a sorted dictionary. Thinking in terms of properties instead of concrete examples can help us slow down and be precise about what our code is expected to do.
It's important to remember that property-based testing doesn't replace regular unit tests. Instead, it provides a new tool to our testing toolboxes. And sometimes, property-based testing isn't the right tool for the job. For an interesting article on when property-based testing shines, take a look at this article by Brujo Benavides. However, the more experienced we get at writing test case generators and property-based tests, the more natural it becomes to use them for testing almost any kind of code.
Thank you for reading! As always, we're happy to hear your feedback.
Additional resources
- Property-Based Testing with PropEr, Erlang, and Elixir: Excellent book on property-based testing with a chapter on properties-driven development
- From 1 to 10,000 test cases in under an hour: A beginner's guide to property-based testing: Guide to PBT from my colleagues Carolyn and Fredrik
- Hypothesis Quick Start guide: Get started with property-based testing with Hypothesis in Python
- My Take on Property-Based Testing: Insights on when to use property-based testing
- Property-Based Test-Driven Development in Elixir: Mathias Polligkeit's article on properties-driven development
Top comments (2)
In the Ruby world, we use Faker and FactoryBot to help handle the generator code. Makes it very easy to write these explicit generators (e.g. give me a number between 0 and 5000) with a DSL that's very easy to understand.
Thanks for your comment! I haven't coded in Ruby but it's always interesting to see how libraries in other languages solve similar problems.