I hit a performance wall at work recently after adding a single extra Count() on a Django queryset that looked harmless. Luckily, the problematic code didn't make it into the production environment, but I had to think hard to figure out what went wrong and find a solution.
The Demo Models
All the following models and entities are made up to illustrate the problem.
Imagine we have a PostgreSQL database storing information about stores, their departments, employees, and products:
The real DB tables can contain many more fields, but in our case we only care about the relations between the models.
So in Django, models.py would look like this:
from django.db.models import Model, CharField, ForeignKey, CASCADE

class Store(Model):
    name = CharField(max_length=200)
    # ...

class Department(Model):
    store = ForeignKey(
        Store, related_name="departments", on_delete=CASCADE
    )
    # ...

class Employee(Model):
    department = ForeignKey(
        Department, related_name="employees", on_delete=CASCADE
    )
    # ...

class Product(Model):
    store = ForeignKey(
        Store, related_name="products", on_delete=CASCADE
    )
    # ...
Seed data (roughly what bit me):
- 2 stores,
- 10 departments per store,
- 500 employees per department (10 000 employees overall),
- 2500 products per store.
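The totals implied by these seed numbers are easy to sanity-check with a few lines of arithmetic:

```python
# Sanity check of the totals implied by the seed numbers above.
stores = 2
departments_per_store = 10
employees_per_department = 500
products_per_store = 2500

total_employees = stores * departments_per_store * employees_per_department
total_products = stores * products_per_store
print(total_employees, total_products)  # 10000 5000
```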
In one place in our system, we show a list of the stores, including the number of employees in each store. Pretty easy with the Django ORM, right?
from django.db.models import Count

stores = Store.objects.annotate(
    total_employees=Count("departments__employees")
).values(
    "id",
    # ...
    "total_employees",
)
This query is relatively fast and works like a charm.
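To see why this single-branch version is harmless, here's a miniature sketch in SQLite (simplified table names and made-up toy rows, not the real shop_* tables):

```python
import sqlite3

# Miniature of the single-branch join: one store, two departments,
# three employees. Table names are simplified stand-ins, not the
# actual Django-generated shop_* tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE store (id INTEGER PRIMARY KEY);
    CREATE TABLE department (id INTEGER PRIMARY KEY, store_id INTEGER);
    CREATE TABLE employee (id INTEGER PRIMARY KEY, department_id INTEGER);
    INSERT INTO store VALUES (1);
    INSERT INTO department VALUES (1, 1), (2, 1);
    INSERT INTO employee VALUES (1, 1), (2, 1), (3, 2);
""")

# With a single join branch each employee row appears exactly once,
# so a plain COUNT is already correct.
count = conn.execute("""
    SELECT COUNT(e.id)
    FROM store s
    LEFT JOIN department d ON d.store_id = s.id
    LEFT JOIN employee e ON e.department_id = d.id
    GROUP BY s.id
""").fetchone()[0]
print(count)  # 3
```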
The Problem
Now let's imagine we're asked to add another counter: the number of products per store. We already have total_employees, so why not just add total_products?
Well, most likely we already have a unit test for this piece of code that checks the logic on a small amount of data. Before releasing, we notice that another JOIN was added and some rows are now duplicated, so instead of a plain COUNT(...) we switch to COUNT(DISTINCT ...), eventually ending up with something like this:
stores = Store.objects.annotate(
    total_employees=Count(
        "departments__employees", distinct=True
    ),
    total_products=Count("products", distinct=True),
).values(
    "id",
    # ...
    "total_employees",
    "total_products",
)
Looks safe to commit, push, wait for the green tests on CI, merge and deploy.
However, after the deployment you'll almost immediately see that everything hangs. As I said, there aren't that many stores (only 2 for now), and not that many departments or employees.
And yet it takes around 10 seconds to fetch the numbers! What's wrong with it?
Let's take a closer look at the generated SQL for this seemingly innocent queryset:
SELECT
"shop_store"."id",
COUNT(DISTINCT "shop_product"."id") AS "total_products",
COUNT(DISTINCT "shop_employee"."id") AS "total_employees"
FROM "shop_store"
LEFT OUTER JOIN "shop_product"
ON "shop_store"."id" = "shop_product"."store_id"
LEFT OUTER JOIN "shop_department"
ON "shop_store"."id" = "shop_department"."store_id"
LEFT OUTER JOIN "shop_employee"
ON "shop_department"."id" = "shop_employee"."department_id"
GROUP BY "shop_store"."id"
And let's check the actual query plan:
GroupAggregate ... rows=2 ...
-> Nested Loop Left Join ... rows=25000000 ...
...
...
Execution Time: 11376.072 ms
Translation: these three JOINs turn into a 25-million-row cartesian mess before GROUP BY and COUNT(DISTINCT):
Products × Departments × Employees
= 2500 × 10 × 500
= 12 500 000 (per store)
× 2 stores
= 25 000 000 joined rows
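The fan-out can be reproduced in miniature with SQLite (simplified table names and tiny made-up row counts, chosen for the demo rather than taken from the real schema):

```python
import sqlite3

# Miniature reproduction of the fan-out: 1 store, 2 departments,
# 3 employees, 2 products. Table names are simplified stand-ins.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE store (id INTEGER PRIMARY KEY);
    CREATE TABLE department (id INTEGER PRIMARY KEY, store_id INTEGER);
    CREATE TABLE employee (id INTEGER PRIMARY KEY, department_id INTEGER);
    CREATE TABLE product (id INTEGER PRIMARY KEY, store_id INTEGER);
    INSERT INTO store VALUES (1);
    INSERT INTO department VALUES (1, 1), (2, 1);
    INSERT INTO employee VALUES (1, 1), (2, 1), (3, 2);
    INSERT INTO product VALUES (1, 1), (2, 1);
""")

# The intermediate result is products x employees per store:
# 2 products * 3 employees = 6 rows, and a naive COUNT sees every
# employee once per product. Only COUNT(DISTINCT) recovers the truth.
joined_rows, naive, distinct = conn.execute("""
    SELECT COUNT(*), COUNT(e.id), COUNT(DISTINCT e.id)
    FROM store s
    LEFT JOIN product p ON p.store_id = s.id
    LEFT JOIN department d ON d.store_id = s.id
    LEFT JOIN employee e ON e.department_id = d.id
""").fetchone()
print(joined_rows, naive, distinct)  # 6 6 3
```

With the article's seed numbers the same multiplication yields 12 500 000 intermediate rows per store, which is exactly what the query plan shows.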
The Fix
There are multiple ways of handling this case, but the easiest fix is to use subqueries:
from django.db.models import Count, IntegerField, OuterRef, Subquery
from django.db.models.functions import Coalesce

subquery_products = Subquery(
    Product.objects.filter(store_id=OuterRef("pk"))
    .values("store_id")
    .annotate(count=Count("pk"))
    .values("count"),
    output_field=IntegerField(),
)

subquery_employees = Subquery(
    Employee.objects.filter(department__store_id=OuterRef("pk"))
    .values("department__store_id")
    .annotate(count=Count("pk"))
    .values("count"),
    output_field=IntegerField(),
)

stores = Store.objects.annotate(
    total_products=Coalesce(subquery_products, 0),
    total_employees=Coalesce(subquery_employees, 0),
).values("id", "total_products", "total_employees")
SQL query:
SELECT "shop_store"."id",
COALESCE((
SELECT COUNT(U0."id") AS "count"
FROM "shop_product" U0
WHERE U0."store_id" = "shop_store"."id"
GROUP BY U0."store_id"), 0
) AS "total_products",
COALESCE((
SELECT COUNT(U0."id") AS "count"
FROM "shop_employee" U0
INNER JOIN "shop_department" U1
ON U0."department_id" = U1."id"
WHERE U1."store_id" = "shop_store"."id"
GROUP BY U1."store_id"), 0
) AS "total_employees"
FROM "shop_store";
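The same shape can be sketched in SQLite on a toy data set (simplified table names and made-up rows, not the real tables) to confirm the counts come out right:

```python
import sqlite3

# Toy data: 1 store, 2 departments, 3 employees, 2 products.
# Table names are simplified stand-ins for the Django-generated ones.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE store (id INTEGER PRIMARY KEY);
    CREATE TABLE department (id INTEGER PRIMARY KEY, store_id INTEGER);
    CREATE TABLE employee (id INTEGER PRIMARY KEY, department_id INTEGER);
    CREATE TABLE product (id INTEGER PRIMARY KEY, store_id INTEGER);
    INSERT INTO store VALUES (1);
    INSERT INTO department VALUES (1, 1), (2, 1);
    INSERT INTO employee VALUES (1, 1), (2, 1), (3, 2);
    INSERT INTO product VALUES (1, 1), (2, 1);
""")

# Each correlated subquery aggregates its own branch independently,
# so no cross-branch row multiplication can happen.
row = conn.execute("""
    SELECT s.id,
           COALESCE((SELECT COUNT(p.id) FROM product p
                     WHERE p.store_id = s.id), 0) AS total_products,
           COALESCE((SELECT COUNT(e.id) FROM employee e
                     JOIN department d ON e.department_id = d.id
                     WHERE d.store_id = s.id), 0) AS total_employees
    FROM store s
""").fetchone()
print(row)  # (1, 2, 3)
```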
Now this one takes a couple of milliseconds, with a modest and predictable plan:
Seq Scan on shop_store ... rows=2 ...
SubPlan 1
-> Seq Scan on shop_product u0 ... rows=2500 ...
SubPlan 2
-> Hash Join ... rows=5000 ...
-> Seq Scan on shop_employee u0_1 ... rows=10000 ...
-> Hash ... rows=10 ...
...
Execution Time: 5.600 ms
No giant intermediate data sets, just two tiny scans:
- before: 11 376 ms (~11 seconds)
- after: 5.6 ms (~2000x faster)
Takeaways
- COUNT(DISTINCT) with multi-branch LEFT JOINs makes the database churn through the entire cartesian product.
- Correlated subqueries aggregate each branch separately and scale linearly with data size.
- Always test aggregate queries against production-sized data before you ship.

Top comments (2)
I don't know how to do it django ORM, but another way to write the query is with UNION.
I find aggregate queries so awkward.
@xwero I don't think it's possible to compose a Django queryset with UNION in FROM, but I see that your query is way more efficient, especially when there are more than just 2 stores. I created 32 stores, 160 000 employees and 80 000 products, and checked the plans. Here's the plan of your query with FROM ( ... UNION ALL ... ):
And here's the plan of the final query from the article:
UNION - ~130 ms; the other one - ~645 ms!
Thank you so much for the comment, I learned something new today :)
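The commenter's exact SQL isn't shown above, but one plausible shape of such a UNION ALL query (my guess, sketched in SQLite on a toy data set with simplified table names) is:

```python
import sqlite3

# Toy data set with simplified table names; this is a guess at the
# commenter's approach, since their actual SQL isn't shown above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (id INTEGER PRIMARY KEY, store_id INTEGER);
    CREATE TABLE employee (id INTEGER PRIMARY KEY, department_id INTEGER);
    CREATE TABLE product (id INTEGER PRIMARY KEY, store_id INTEGER);
    INSERT INTO department VALUES (1, 1), (2, 1);
    INSERT INTO employee VALUES (1, 1), (2, 1), (3, 2);
    INSERT INTO product VALUES (1, 1), (2, 1);
""")

# Each branch is aggregated on its own, then the per-branch counts
# are summed per store, so there is still no cross-branch join.
rows = conn.execute("""
    SELECT store_id,
           SUM(employees) AS total_employees,
           SUM(products)  AS total_products
    FROM (
        SELECT d.store_id, COUNT(e.id) AS employees, 0 AS products
        FROM employee e JOIN department d ON e.department_id = d.id
        GROUP BY d.store_id
        UNION ALL
        SELECT p.store_id, 0, COUNT(p.id)
        FROM product p
        GROUP BY p.store_id
    )
    GROUP BY store_id
""").fetchall()
print(rows)  # [(1, 3, 2)]
```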