tl;dr: Add --parallel at the end of the test command.
Background
Earlier this week I was deploying a new feature on a legacy codebase. The whole test suite took 60 minutes to finish. If this were a one-time deployment, I would have let it slide. I will have to work with this codebase for a while longer, however, so I needed a quick way to speed up the test suite.
I then looked up how to make Django run tests in parallel and found the answer quickly: just add the --parallel option to the test command, e.g. python manage.py test --parallel.
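If you don't want Django to pick the number of processes based on your CPU count, you can also pass a count explicitly (the flag accepts an optional number; exact behaviour may vary slightly between Django versions):
# Run the suite with four worker processes
python manage.py test --parallel 4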
So, problem solved, right? The run finished in 5 minutes. Unfortunately, it finished in 5 minutes because the tests failed. After some time spent figuring out why, I realized that the failing tests were using Redis Queue (django-rq). Apparently the --parallel option doesn't isolate it.
I googled how to isolate django-rq when running tests in parallel, but unfortunately I wasn't able to find a solution. That left me with two choices: abandon my plan to speed up the test suite, or write a custom test runner. I chose the latter.
Writing a custom Django Test Runner to isolate Redis Queue
Before writing the runner, I read up on how Django isolates the test database, since I knew I would either extend or copy most of that logic. After digging around, I found the code where the database isolation happens.
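For context, the core of that logic (the _init_worker helper in django/test/runner.py) looks roughly like the sketch below. This is a paraphrase, not the exact source, and it differs between Django versions: each worker bumps a shared counter to get its own worker id, then points every database connection at a per-worker clone of the test database.
# Simplified sketch of Django's _init_worker; details vary by Django version.
from django.db import connections

_worker_id = 0

def _init_worker(counter):
    global _worker_id
    with counter.get_lock():
        counter.value += 1
        _worker_id = counter.value
    for alias in connections:
        connection = connections[alias]
        # Point this worker at its own clone of the test database,
        # e.g. test_mydb_1, test_mydb_2, ...
        settings_dict = connection.creation.get_test_db_clone_settings(str(_worker_id))
        connection.settings_dict.update(settings_dict)
        connection.close()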
Now let's write the custom test runner. Please note that I am using the django-rq library; the way you configure your Redis connection might be different, but the idea for isolating it remains the same: we will assign a unique Redis database index to each worker's Redis configuration.
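For reference, a minimal django-rq configuration looks something like this (the host, port, and values here are placeholders, not this project's real settings); the DB entry is the Redis database index we are about to vary per worker:
# settings.py (example configuration; values are placeholders)
RQ_QUEUES = {
    'default': {
        'HOST': 'localhost',
        'PORT': 6379,
        'DB': 0,  # Redis database index; each test worker will get its own
    },
}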
First, we need to supply a different function as init_worker in Django's ParallelTestSuite. To achieve that, we will write a custom test suite called CustomParallelTestSuite that extends ParallelTestSuite.
from django.test.runner import _init_worker
from django.test.runner import ParallelTestSuite


def redis_parallel_init_worker(counter):
    _init_worker(counter)


class CustomParallelTestSuite(ParallelTestSuite):
    init_worker = redis_parallel_init_worker
We still need the database isolation, however, which is why we still call the _init_worker function inside our own.
Now we also need the default base runner to use our CustomParallelTestSuite. We will do this by extending the DiscoverRunner class.
...
from django.test.runner import DiscoverRunner
...


class CustomDiscoverRunner(DiscoverRunner):
    parallel_test_suite = CustomParallelTestSuite
Sweet! All we've done up to now is create new classes, so the behavior of the test runner remains the same. Now it's time to modify redis_parallel_init_worker to add the Redis isolation. I copied most of the worker-counter logic from _init_worker.
from django.conf import settings
from django.test.runner import _init_worker
from django.test.runner import DiscoverRunner
from django.test.runner import ParallelTestSuite

_worker_id = 0


def redis_parallel_init_worker(counter):
    """
    Switch to the Redis database dedicated to this worker.

    This helper lives at module-level because of the multiprocessing
    module's requirements.
    """
    from django_rq.settings import QUEUES

    # Let Django do its usual per-worker test-database isolation first.
    _init_worker(counter)

    # Derive a unique id for this worker from the shared counter.
    global _worker_id
    with counter.get_lock():
        _worker_id = counter.value - 1

    # Point this worker's 'default' queue at its own Redis database index.
    settings_dict = getattr(settings, 'RQ_QUEUES', None).copy()
    QUEUES = settings_dict
    QUEUES['default']['DB'] = _worker_id
With that, we've assigned each worker a unique Redis database index based on its _worker_id, and Redis isolation is achieved. (One caveat: a stock Redis server ships with 16 logical databases by default, so this assumes you run fewer test workers than that.)
Full code
Here's the full code for the custom test runner we just wrote.
# myapp/test/runner.py
from django.conf import settings
from django.test.runner import _init_worker
from django.test.runner import DiscoverRunner
from django.test.runner import ParallelTestSuite

_worker_id = 0


def redis_parallel_init_worker(counter):
    """
    Switch to the Redis database dedicated to this worker.

    This helper lives at module-level because of the multiprocessing
    module's requirements.
    """
    from django_rq.settings import QUEUES

    # Let Django do its usual per-worker test-database isolation first.
    _init_worker(counter)

    # Derive a unique id for this worker from the shared counter.
    global _worker_id
    with counter.get_lock():
        _worker_id = counter.value - 1

    # Point this worker's 'default' queue at its own Redis database index.
    settings_dict = getattr(settings, 'RQ_QUEUES', None).copy()
    QUEUES = settings_dict
    QUEUES['default']['DB'] = _worker_id


class CustomParallelTestSuite(ParallelTestSuite):
    init_worker = redis_parallel_init_worker


class CustomDiscoverRunner(DiscoverRunner):
    parallel_test_suite = CustomParallelTestSuite
Updating settings to use the custom test runner
The last thing we need to do is point our settings at the test runner we've just created. To do this, add TEST_RUNNER = 'path.to.custom_runner.CustomRunnerClass' to settings.py, for example:
# settings.py
TEST_RUNNER = 'myapp.test.runner.CustomDiscoverRunner'
Wrapping up
Now, running the test command with --parallel works as intended. The suite finished in about 17 minutes, a speedup of around 3 times! Some of the tests could still be optimized if I wanted to push the speed further, but I'm satisfied with this result for now.
Thanks for reading. I hope this post gives you a basic idea of how to write your own custom test runner if you encounter a similar problem. See you next time.
Top comments (3)
Thank you! This helped me separate the ElasticSearch index as well.
Hi @nguyenbathanh. Could you share your code for ElasticSearch?
In settings.py, set WORKER_INDEX = 0. In the custom test runner, overwrite WORKER_INDEX with each worker's id. Then use WORKER_INDEX from settings as a prefix for your ES index.
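Here is a minimal sketch of that idea, assuming an init_worker similar to the one in this post; WORKER_INDEX and the test_index_name helper are made-up names for illustration.
# Sketch only: reuse the worker id to namespace ElasticSearch indices per worker.
from django.conf import settings
from django.test.runner import _init_worker, ParallelTestSuite


def es_parallel_init_worker(counter):
    _init_worker(counter)
    with counter.get_lock():
        # Give this worker its own prefix so its indices don't collide.
        settings.WORKER_INDEX = counter.value - 1


class ESParallelTestSuite(ParallelTestSuite):
    init_worker = es_parallel_init_worker


def test_index_name(base_name):
    # e.g. base_name='articles' -> 'test_0_articles' for worker 0
    return f"test_{settings.WORKER_INDEX}_{base_name}"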