
How I Sped Up My Django Test Suite by 200%

David (davidkwan95) ・3 min read

tl;dr: Add --parallel at the end of the test command

Background

Earlier this week I was deploying a new feature on a legacy codebase. The whole test suite took 60 minutes to run. If this had been a one-time deployment, I would have let it slide. However, I will be working with this codebase for a while longer, so I needed a quick way to speed up the test suite.

I then looked for a way to make Django tests run in parallel and found it quickly. All it takes is adding the --parallel option to the test command, e.g. python manage.py test --parallel.

So problem solved, right? The test run ended in 5 minutes. Unfortunately, it ended that quickly because the tests failed. After some time figuring out why, I realized that the failing tests were using Redis Queue (django-rq). Apparently the --parallel option doesn't isolate Redis the way it isolates the test database.

I googled how to isolate django-rq when running tests in parallel, but I wasn't able to find a solution. That left me with two choices: abandon my plan to speed up the test suite, or write a custom test runner. I chose the latter.

Writing a custom Django Test Runner to isolate Redis Queue

Before writing the test runner, I read up on how Django isolates the test database. I knew I would either extend or copy most of that logic. After digging around, I found the code where the database isolation happens.

Now let's write the custom test runner. Please note that I am using the django-rq library. If you use a different library, the way to set the Redis connection might be different, but the idea behind the isolation remains the same. We will achieve isolation by assigning a unique Redis database index to each worker's Redis configuration.
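To make the idea concrete, here is a minimal sketch of the worker-to-database mapping. The helper name redis_db_for_worker and the max_databases parameter are hypothetical and not part of Django or django-rq. The default of 16 assumes a stock Redis server, whose databases setting defaults to 16; adjust it if your server is configured differently.

```python
def redis_db_for_worker(worker_id, max_databases=16):
    """Map a 1-based test worker id to a 0-based Redis database index.

    Hypothetical helper for illustration only. max_databases=16 assumes
    the default `databases` value of a stock Redis server.
    """
    db_index = worker_id - 1
    if not 0 <= db_index < max_databases:
        raise ValueError(
            f"worker {worker_id} needs Redis DB {db_index}, "
            f"but only {max_databases} databases are available"
        )
    return db_index


# Each parallel worker gets its own database index.
print([redis_db_for_worker(w) for w in range(1, 5)])  # → [0, 1, 2, 3]
```

The bounds check is there because running more workers than the Redis server has databases would otherwise surface as a confusing connection error in the middle of a test run.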

Firstly, we need to supply a different function to init_worker in Django's ParallelTestSuite. To achieve that, we will write our custom test suite called CustomParallelTestSuite that extends ParallelTestSuite.

from django.test.runner import _init_worker
from django.test.runner import ParallelTestSuite

def redis_parallel_init_worker(counter):
    _init_worker(counter)

class CustomParallelTestSuite(ParallelTestSuite):
    init_worker = redis_parallel_init_worker

We still need to isolate the database, however, which is why we still call the _init_worker function inside our function.
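For context, the worker initializer receives a shared counter and uses it to give each worker process a unique number. The sketch below imitates that hand-off in a single process with multiprocessing.Value. Note that fake_init_worker is a hypothetical stand-in written for this post, not Django's actual _init_worker, which also switches the database connections.

```python
import multiprocessing


def fake_init_worker(counter):
    """Claim the next worker id from a shared counter (illustrative stand-in)."""
    # Holding the lock makes increment-and-read atomic, so no two
    # workers can claim the same id.
    with counter.get_lock():
        counter.value += 1
        return counter.value


# A shared integer ('i' is a C int) starting at 0, similar in spirit to
# the counter a parallel test suite hands to its initializer.
counter = multiprocessing.Value('i', 0)

# Simulate three workers starting up one after another.
worker_ids = [fake_init_worker(counter) for _ in range(3)]
print(worker_ids)  # → [1, 2, 3]
```

Because the ids start at 1, subtracting 1 gives a 0-based index that lines up with Redis database numbering.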

Now we also need to modify the default base runner to use our CustomParallelTestSuite as well. We will do this by extending the DiscoverRunner class.

...
from django.test.runner import DiscoverRunner
...
class CustomDiscoverRunner(DiscoverRunner):
    parallel_test_suite = CustomParallelTestSuite

Sweet! So far we have only created new classes; the behavior of the test runner remains the same. Now it's time to modify redis_parallel_init_worker so that it isolates Redis. The worker numbering is borrowed from _init_worker.

from django.conf import settings
from django.test import runner as django_runner
from django.test.runner import _init_worker
from django.test.runner import DiscoverRunner
from django.test.runner import ParallelTestSuite

_worker_id = 0

def redis_parallel_init_worker(counter):
    """
    Switch to the Redis database dedicated to this worker.
    This helper lives at module-level because of the multiprocessing module's
    requirements.
    """
    global _worker_id

    # Let Django switch this worker to its own test database first.
    _init_worker(counter)

    # Reuse the 1-based id Django just claimed for this worker. Re-reading
    # the shared counter here could race with another worker starting up.
    _worker_id = django_runner._worker_id - 1

    # Mutate the queue config in place so django-rq sees the new DB index.
    settings.RQ_QUEUES['default']['DB'] = _worker_id

With that, each worker is assigned a unique Redis database index based on its _worker_id, and Redis isolation is achieved.
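One detail worth spelling out: the per-worker update only takes effect if the queue's nested config dict is mutated in place, because other code holds a reference to that same dict. The sketch below uses plain dicts to show the difference. RQ_QUEUES_EXAMPLE and its values are made up for illustration and are not read from any real settings module.

```python
# Illustrative stand-in for a django-rq style queue config.
RQ_QUEUES_EXAMPLE = {
    'default': {'HOST': 'localhost', 'PORT': 6379, 'DB': 0},
}

# Another module holding a reference to the same dict object.
shared_view = RQ_QUEUES_EXAMPLE

# dict.copy() is shallow: the outer dict is new, but the nested
# 'default' dict is still the same object as in the original.
shallow = RQ_QUEUES_EXAMPLE.copy()
shallow['default']['DB'] = 3
print(shared_view['default']['DB'])  # → 3, the nested mutation is visible

# Replacing a key on the copy only changes the copy.
shallow['default'] = {'HOST': 'localhost', 'PORT': 6379, 'DB': 7}
print(shared_view['default']['DB'])  # → 3, unchanged
```

In other words, assigning to the 'DB' key of the existing 'default' dict is what changes the connection settings; rebinding a local variable to a new dict would not.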

Full code

Here's the full code for the custom test runner we just wrote.

# myapp/test/runner.py

from django.conf import settings
from django.test import runner as django_runner
from django.test.runner import _init_worker
from django.test.runner import DiscoverRunner
from django.test.runner import ParallelTestSuite

_worker_id = 0


def redis_parallel_init_worker(counter):
    """
    Switch to the Redis database dedicated to this worker.
    This helper lives at module-level because of the multiprocessing module's
    requirements.
    """
    global _worker_id

    # Let Django switch this worker to its own test database first.
    _init_worker(counter)

    # Reuse the 1-based id Django just claimed for this worker. Re-reading
    # the shared counter here could race with another worker starting up.
    _worker_id = django_runner._worker_id - 1

    # Mutate the queue config in place so django-rq sees the new DB index.
    settings.RQ_QUEUES['default']['DB'] = _worker_id


class CustomParallelTestSuite(ParallelTestSuite):
    init_worker = redis_parallel_init_worker


class CustomDiscoverRunner(DiscoverRunner):
    parallel_test_suite = CustomParallelTestSuite

Updating the settings to use the custom test runner

The last thing we need to do is change our settings to use the test runner we've just created. To do this, add TEST_RUNNER = 'path.to.custom_runner.CustomRunnerClass' to settings.py, for example:

# settings.py
TEST_RUNNER = 'myapp.test.runner.CustomDiscoverRunner'

Wrapping up

Now, calling the command with --parallel runs the tests correctly. The test suite now finishes in about 17 minutes instead of 60, a speedup of more than 3 times! Some of the tests could still be optimized if I wanted to go faster, but I'm satisfied with this result for now.

Thanks for reading. I hope this post gives you a basic idea of how to write your own custom test runner if you encounter a similar problem. See you next time.
