Today the #DjangoTip will be about using select_related
and prefetch_related
to improve our queries performance.
You can clone this project from [http://github.com/LucasMagnum/django-tip-02](http://github.com/LucasMagnum/django-tip-02)
and follow the examples below.
It takes time to fully understand how select_related
and prefetch_related
works, so keep looking for other resources and I hope this post helps you to understand a bit more. Code and have fun ❤
Before we start, let me ask you one question; Did you clapped today? I would love to hear your claps and read your feedback!
Product model
This will be the model used in all examples below, so we can always come back here and read it again.
https://github.com/LucasMagnum/django-tip-02/blob/master/app/products/models.py
It’s pretty simple, right? The Product
has a foreign key relationship with the Category
and the Category
has one many to many relationships with itself.
Let’s start, shall we?
List of products
Let’s start creating a list of products function, this function will return one list of dicts with the fields: id
, title
and category
name:
https://github.com/LucasMagnum/django-tip-02/blob/master/app/products/queries.py#L128
For test our function, we need products into our database. We have one command to load products into the database, just execute the load_products
command and 500 products and 50 categories will be inserted into our database.
python app/manage.py load_products
Now we open our shell
running python app/manage.py shell
and test our function:
We have all our products, but what’s the problem with this function?
Let’s see how many queries this function does into the database:
Note: I created one debugger decorator to see how many queries each example takes, you don’t have to understand how it works, just have in mind that it shows the number of the queries and the time in seconds for each example :p
https://gist.github.com/LucasMagnum/81a122b7317baccf7c6bb4c9c549c928
Wow! 501 queries for that simple piece of code?? What happened?
For each time we access one foreign key
that is not in cache, another
query will be made to retrieve the value.
In our case, we accessing the category
inside a loop, one query will be made for each time the loop executes.
for product in product_qs:
products.append({
'id': product.id,
'title': product.title,
'category': product.category.name
})
To avoid this high number of queries when accessing foreign keys
or one to one fields
we can use the select_related
method.
Select related
We can use the select_related
method to load the foreign key
values and cache those results for each object. By default Django ORM doesn’t make a JOIN for our results, so each time we need to access the foreign key another query will be made.
Let’s change our code to use the select_related
with our products queryset:
https://github.com/LucasMagnum/django-tip-02/blob/master/app/products/queries.py#L148
Now we can verify that these two functions return the same values and see how many queries the products_list_select_related
does:
https://gist.github.com/LucasMagnum/5d5e54e80a7ec4b1c1b399e65081a4c4
Congrats! Now we did only 1 query instead of 501, it’s a real improvement :D
List of categories
It was an amazing improvement on the last function, now we will create a list of categories function, this will help us to understand the behavior of the prefetch_related
method.
Let’s create our categories list function to retrieve all categories and their subcategories. This function should return the name
, is_active
and a subcategories
list for each category.
https://github.com/LucasMagnum/django-tip-02/blob/master/app/products/queries.py#L7
If we run this function in the debugger mode, we will see how many queries this function is doing:
https://gist.github.com/LucasMagnum/b35cb3d6158644ebbc40ca7e345896ed
Ok, we saw this problem before, we just need to add the select_related
and we will end up with just one query, right? Err, No!
If we try to change the categories_qs
to use select_related
we will receive an error from Django:
Exception when trying to use select_related in a many to many field
We can not use the select_related
with many to many relationships
, do you remember? To improve our queries we need to use a new method called prefetch_related
.
Prefetch related
When we are using many to many relationships
we could use prefetch_related
method to booster our queries =D
Let’s create another function using the prefetch_related
to see our improvements:
Categories list using prefetch related to improve our queries.
We just changed the line 5 to use prefetch_related
on subcategories
field. Now let’s see how many queries this new function is doing:
https://gist.github.com/LucasMagnum/efe74895552ebd24c5897db90b81712a
Good job! We improved the performance of our query, but wait for a second, why we have two queries?
Different from select_related
the prefetch_related
made the JOIN using Python rather than in the database.
In this case, Django made two queries and then join the results into one queryset for us.
Let’s create a new function that will return only the active subcategories using the prefetch_related:
Categories list using the prefetch related function.
Now, when we run the debugger for this function, we see:
https://gist.github.com/LucasMagnum/f5022fc09830a21c534e8ca1f2b8bd92
WOW! We increased the number of queries, how it’s possible?????
Calm down, there is a reason for this:
When we use the prefetch related
we are saying to Django we want all the results to be JOINED, but when we use the filter(is_active=True)
we are changing the primary query and then Django doesn’t JOIN the right results for us.
That’s why we have 52 queries, 51 queries iterating over the categories and 1 query to get all the results in prefetch.
Let’s see how to solve this problem using the new Prefetch
introduced recently by Django (;
Using Prefetch with to_attr
Prefetch using to_attr to return prefetch related values
There’s two changes in this new function, I changed the categories_qs
query to use the custom Prefetch
and changed the loop to use a new attribute active_subcategories
.
Let’s understand what it is doing:
Category.objects.prefetch_related(
Prefetch(
'subcategories',
queryset=Category.objects.filter(is_active=True),
to_attr='active_subcategories'
)
)
We are prefetching all values from subcategories
field, using the Category.objects.filter(is_active=True)
as base queryset
and passing telling Django we want all the prefetched values into the attribute active_subcategories
. This will create an attribute active_subcategories
for each category
returned by our queryset
.
We can see it in our django shell
:
https://gist.github.com/LucasMagnum/a13afb6ad657136d7c17f5e53d2a6f68
As we can see, when we try to access the active_subcategories
without making the prefetch related we will receive an error.
We need to use the new Prefetch
to set the new attribute for our use.
Now we will make a new query to prefetch all the results;
YEES!! Now we can understand a bit better how the new Prefetch
works.
Let’s debug our function now:
https://gist.github.com/LucasMagnum/f6060fe8bc0a1390aed0e27439eb029b
Great!! Now we have only two queries and not 52 =D
Let’s how we could achieve the same result without using the to_attr
parameter.
Using Prefetch without to_attr
If we don’t want to create a new attribute, we can reuse the attribute from the field and just change our base query:
https://gist.github.com/LucasMagnum/21074818968d99d7507d24526fa63818
Here, we are using the Prefetch
without specifying an attribute.
When we are accessing the category.subcategories
we are using the values from the Prefetch
, so we don’t need to filter the is_active
here because the results will come filtered =)
And if we debug this function, we still have only two queries:
https://gist.github.com/LucasMagnum/5bda88c72978ad4679b8062a5cd7d6b0
We can check if our functions are returning the same values:
https://gist.github.com/LucasMagnum/65425f73807da299b08acac8ef393daa
FANTASTIC!!! Everything is working and I hope that you learned a bit more about the new Prefetch today =)
TL;DR;
- Select related & Prefetch related are performance boosters
- Select related does one query to JOIN the tables
- Select related could be used with ForeignKey and OneToOneField
- Select related doesn’t work with ManyToMany fields
- Prefetch related makes two queries and do the JOIN using Python
- Prefetch related loses it effect when you change the base query
Useful links
- https://docs.djangoproject.com/en/1.11/ref/models/querysets/
- https://docs.djangoproject.com/en/1.11/ref/models/querysets/#prefetch-related
- https://docs.djangoproject.com/en/1.11/ref/models/querysets/#django.db.models.query.QuerySet.select_related
See you later, keep coding and have fun ❤
Top comments (0)