DEV Community

Cover image for Fix slow queries in Django when using annotate and subqueries
Eduardo Zepeda
Eduardo Zepeda

Posted on • Originally published at coffeebytes.dev on

Fix slow queries in Django when using annotate and subqueries

Django’s ORM is quite useful and versatile, it can perform most of the common SQL operations, such as filtering, partitioning, joins or sorting information, creating aliases, but it also has its limitations, especially when combined with subqueries, today I’ll tell you about one of its limitations and how to solve it.

Despite its few weaknesses, its ORM is one of the reasons why you should use Django.

Django annotate and subqueries, a performance problem

The Django annotate function, which I already told you about in a post where I explain the differences between annotate and aggregate in Django, is used to add information to a SQL query, this information can be an average, a sum or anything else you want, the problem occurs when that information comes from a subquery.

Let me give you an example:

from django.db.models import F
from django.db.models.expressions import Subquery

first_subquery = Subquery(...)
second_subquery = Subquery(...)

queryset = YourModel.objects.annotate(first_annotation=first_subquery)
    .annotate(second_annotation=second_subquery)
    .annotate(
        third_annotation=F("first_subquery") - F("second_subquery"))
    .annotate(
        fourth_annotation=((F("first_subquery") - F("second_subquery")) / F("second_subquery"))
    )

Enter fullscreen mode Exit fullscreen mode

The problem here arises when we mix subqueries with annotate, and then proceed to use those annotations in other annotations.

Django does not have the ability to recognize that it is already repeating the subqueries over and over again, so the SQL it generates repeats the same subqueries over and over again, resulting in a poorly performing query; we fall into the famous n+1 queries problem.

SQL generated by Django using annotate and subqueries is inefficient.

Worse, where exactly is the problem? The django ORM translates the above queryset into the following SQL query:

SELECT columns
        (SELECT ...first_subquery - SELECT ...second_subquery) AS "third_annotation",
        (SELECT ...first_subquery - SELECT ...second_subquery)/(SELECT ...first_subquery) as "fourth_annotation",
        (SELECT ...first_subquery) as "first_annotation",
        (SELECT ...second_subquery) as "second_annotation"
    FROM table_a LEFT OUTER JOIN table_b
    ON table_a.id = table_b.id
    GROUP BY table_a.id ...

Enter fullscreen mode Exit fullscreen mode

Notice how Django is reusing the SQL from each subquery multiple times during the query, instead of performing the query once and then reusing that value.

If you don’t know how to get the SQL query that Django’s ORM generates, I remind you, qs represents your queryset:

print(qs.query)

Enter fullscreen mode Exit fullscreen mode

How to fix this? Well, one of the ways to fix this SQL query is to use Common Table Expressions (CTEs), however, as of this writing, Django does not support Common Table Expressions (CTEs), so we will have to use a raw query instead of the methods already provided by the Django ORM.

Use Common Table Expressions (CTEs) to improve annotate and subqueries performance.

The solution is to create a raw query, remember that modern versions of django you can use the raw method of your model manager so that Django automatically assigns it to a queryset object of your respective model.

qs = YourModel.objects.raw("YOUR_SQL_RAW_QUERY_GOES_HERE")

Enter fullscreen mode Exit fullscreen mode

The SQL query with the Common Table Expressions (CTEs) that we will use would look like this:

WITH my_cte AS (
    SELECT 
        a.column
        (SELECT ...subquery_one) AS first_annotation, 
        (SELECT ...subquery_two) AS second_annotation
    FROM table_a 
    LEFT OUTER JOIN table_b 
    ON table_a.id = table_b.id 
    GROUP BY table_a.id ...
)
SELECT 
    columns,
    first_annotation, 
    second_annotation, 
    first_annotation - second_annotation AS third_annotation,
    (first_annotation - second_annotation)/first_annotation AS fourth_annotation
FROM my_cte;

Enter fullscreen mode Exit fullscreen mode

As you can see the subqueries are in parentheses and each of them appears only once.

Using Common Table Expressions (CTEs) will allow us an efficient query, avoiding multiple repetitive queries to the database and will give us a performance that outperforms the Django ORM’s query by several orders of magnitude. Perhaps implementing CTEs is one of the actions that can be taken to improve the Django framework.

Top comments (0)