loading...
15Five Engineering Blog

Using DjangoQL for a better django admin search

caleb15 profile image Caleb Collins-Parks ・7 min read

A reasonable Django admin search looks like this:

Python:

class MusicianAdmin(admin.ModelAdmin):
    search_fields = ['first_name', 'last_name']

User search input:
john lennon

Resulting SQL:

WHERE (first_name ILIKE '%john%' OR last_name ILIKE '%john%')
AND (first_name ILIKE '%lennon%' OR last_name ILIKE '%lennon%')

Not too bad, right?

But now imagine your search fields look like this:

class ReviewerReporterAdmin(admin.ModelAdmin):
    search_fields = ('reporter__first_name', 'reporter__last_name',
                     'reporter__email', 'reporter__company__name',
                     'reporter__company__subdomain',
                     'reviewer__first_name', 'reviewer__last_name',
                     'reviewer__email',)

The resulting SQL will look like this:

WHERE (reporter__first_name  ILIKE '%john%' OR reporter__last_name ILIKE '%john%' OR reporter__email  ILIKE '%john%' OR reporter__company__name ILIKE '%john%' OR reporter__company__subdomain ILIKE '%john%' OR reviewer__first_name  ILIKE '%john%' OR reviewer__last_name ILIKE '%john%' OR reviewer__email ILIKE '%john%')
AND (reporter__first_name  ILIKE '%lennon%' OR reporter__last_name ILIKE '%lennon%' OR reporter__email  ILIKE '%lennon%' OR reporter__company__name ILIKE '%lennon%' OR reporter__company__subdomain ILIKE '%lennon%' OR reviewer__first_name  ILIKE '%lennon%' OR reviewer__last_name ILIKE '%lennon%' OR reviewer__email ILIKE '%lennon%')

Yikes. But computers are powerful. That's still manageable.

Now try this one on for size:

WHERE (reporter__first_name  ILIKE '%john%' OR reporter__last_name ILIKE '%john%' OR reporter__email  ILIKE '%john%' OR reporter__company__name ILIKE '%john%' OR reporter__company__subdomain ILIKE '%john%' OR reviewer__first_name  ILIKE '%john%' OR reviewer__last_name ILIKE '%john%' OR reviewer__email ILIKE '%john%')
AND (reporter__first_name  ILIKE '%lennon%' OR reporter__last_name ILIKE '%lennon%' OR reporter__email  ILIKE '%lennon%' OR reporter__company__name ILIKE '%lennon%' OR reporter__company__subdomain ILIKE '%lennon%' OR reviewer__first_name  ILIKE '%lennon%' OR reviewer__last_name ILIKE '%lennon%' OR reviewer__email ILIKE '%lennon%')
AND (reporter__first_name  ILIKE '%is%' OR reporter__last_name ILIKE '%is%' OR reporter__email  ILIKE '%is%' OR reporter__company__name ILIKE '%is%' OR reporter__company__subdomain ILIKE '%is%' OR reviewer__first_name  ILIKE '%is%' OR reviewer__last_name ILIKE '%is%' OR reviewer__email ILIKE '%is%')
AND (reporter__first_name  ILIKE '%a%' OR reporter__last_name ILIKE '%a%' OR reporter__email  ILIKE '%a%' OR reporter__company__name ILIKE '%a%' OR reporter__company__subdomain ILIKE '%a%' OR reviewer__first_name  ILIKE '%a%' OR reviewer__last_name ILIKE '%a%' OR reviewer__email ILIKE '%a%')
AND (reporter__first_name  ILIKE '%great%' OR reporter__last_name ILIKE '%great%' OR reporter__email  ILIKE '%great%' OR reporter__company__name ILIKE '%great%' OR reporter__company__subdomain ILIKE '%great%' OR reviewer__first_name  ILIKE '%great%' OR reviewer__last_name ILIKE '%great%' OR reviewer__email ILIKE '%great%')
AND (reporter__first_name  ILIKE '%singer%' OR reporter__last_name ILIKE '%singer%' OR reporter__email  ILIKE '%singer%' OR reporter__company__name ILIKE '%singer%' OR reporter__company__subdomain ILIKE '%singer%' OR reviewer__first_name  ILIKE '%singer%' OR reviewer__last_name ILIKE '%singer%' OR reviewer__email ILIKE '%singer%')
AND (reporter__first_name  ILIKE '%and%' OR reporter__last_name ILIKE '%and%' OR reporter__email  ILIKE '%and%' OR reporter__company__name ILIKE '%and%' OR reporter__company__subdomain ILIKE '%and%' OR reviewer__first_name  ILIKE '%and%' OR reviewer__last_name ILIKE '%and%' OR reviewer__email ILIKE '%and%')
AND (reporter__first_name  ILIKE '%songwriter%' OR reporter__last_name ILIKE '%songwriter%' OR reporter__email  ILIKE '%songwriter%' OR reporter__company__name ILIKE '%songwriter%' OR reporter__company__subdomain ILIKE '%songwriter%' OR reviewer__first_name  ILIKE '%songwriter%' OR reviewer__last_name ILIKE '%songwriter%' OR reviewer__email ILIKE '%songwriter%')

That's what happens when the user searches for "john lennon is a great singer and songwriter". Each word is searched against every field, leading to the giant SQL block above! Here's a real-world example of what this looks like:

(table and field names obfuscated):

SELECT
"banana_table"."id", "banana_table"."create_ts", "banana_table"."type", "banana_table"."sasquatch_id", "banana_table"."yolo_id", "banana_table"."to_scale", "banana_table"."extra", "ff_orange"."id", "ff_orange"."spiders", "ff_orange"."last_epic_win", "ff_orange"."is_superorange", "ff_orange"."passport", "ff_orange"."number_swords", "ff_orange"."scim_orangename", "ff_orange"."orangename", "ff_orange"."mango", "ff_orange"."canonical_mango", "ff_orange"."yolo_id", "ff_orange"."global_id", "ff_orange"."first_kiss", "ff_orange"."last_kiss", "ff_orange"."knightlyTitle", "ff_orange"."IVotedDidYou", "ff_orange"."IVotedDidYou_length", "ff_orange"."aspirational_knightlyTitle", "ff_orange"."aspirational_knightlyTitle_length", "ff_orange"."aspirational_IVotedDidYou", "ff_orange"."aspirational_IVotedDidYou_length", "ff_orange"."vulnerabilities", "ff_orange"."vulnerabilities_length", "ff_orange"."banana_mansion", "ff_orange"."banana_mega_mansion", "ff_orange"."IEnjoyThinkingOfRandomNames", "ff_orange"."ButImNotSureAnyoneWillActuallyReadThem", "ff_orange"."MakingUpRandomNamesIsItsOwnReward", "ff_orange"."is_yolo_admin", "ff_orange"."is_pineapple_admin", "ff_orange"."can_view_all_oranges", "ff_orange"."require_spiders_change", "ff_orange"."is_vip", "ff_orange"."is_vip_bossperson", "ff_orange"."is_a_bro", "ff_orange"."is_not_a_bro", "ff_orange"."saml_orange_id", "ff_orange"."first_mango_eaten", "ff_orange"."last_spiders_change_ts", "ff_orange"."create_ts", "ff_orange"."update_ts", "ff_orange"."num_dirty_mangoes", "ff_orange"."howdouspellmangos", "ff_orange"."last_seen_ts", "ff_orange"."ithinkitswithae", "ff_orange"."ahhhhh", "ff_orange"."imrunningoutofrandomnames", "ff_orange"."howmanymorefields", "ff_orange"."arethere", "ff_orange"."uggghhh", "ff_orange"."is_ides_of_march", "ff_orange"."biweekly_is_ides_of_march", "ff_orange"."biweekly_is_ides_of_march_which_week", "ff_orange"."monthly_is_ides_of_march", "ff_orange"."monthly_is_ides_of_march_which_in_month", "ff_orange"."prefs", "ff_orange"."toilet_paper", "ff_orange"."skip_yolo_wide_questions", "ff_orange"."blorp_type", "ff_orange"."bloop_type", "ff_orange"."number_angry_zebras", "ff_orange"."can_manage_oranges", "ff_yolo"."id", "ff_yolo"."name", "ff_yolo"."sick_hair_yo", "ff_yolo"."IEnjoyThinkingOfRandomNames", "ff_yolo"."prefs", "ff_yolo"."is_dev", "ff_yolo"."apocalypse_start_date", "ff_yolo"."apocalypse_length_days", "ff_yolo"."mango_domain", "ff_yolo"."hypeperson_id", "ff_yolo"."toilet_paper", "ff_yolo"."create_ts", "ff_yolo"."update_ts", "ff_yolo"."arethere", "ff_yolo"."num_dirty_mangoes", "ff_yolo"."converted_to_paying_ts", "ff_yolo"."stopped_paying_ts", "ff_yolo"."country", "ff_yolo"."state", "ff_yolo"."is_ides_of_march", "ff_yolo"."biweekly_is_ides_of_march", "ff_yolo"."biweekly_is_ides_of_march_which_week", "ff_yolo"."monthly_is_ides_of_march", "ff_yolo"."monthly_is_ides_of_march_which_in_month", "ff_yolo"."uggghhh", "ff_yolo"."is_vip_bossperson", "ff_yolo"."is_a_bro", "ff_yolo"."smart_banana_access", "ff_yolo"."happiness_access", "ff_yolo"."smart_potato_access", "ff_yolo"."attachments_access", "ff_yolo"."smart_potato_access", "ff_yolo"."is_sad_banana", "ff_yolo"."is_angry_banana", "ff_yolo"."is_secret_banana", "ff_yolo"."is_nice", "ff_yolo"."is_delicious", "ff_yolo"."carrot_hypeman", "ff_yolo"."potato_hypeman", "ff_yolo"."number_of_people_with_eyebrows", "ff_yolo"."banana_owner_id"
FROM "banana_table"
LEFT OUTER JOIN "ff_orange" ON ("banana_table"."sasquatch_id" = "ff_orange"."id")
LEFT OUTER JOIN "ff_yolo" ON ("banana_table"."yolo_id" = "ff_yolo"."id")
WHERE (
    (UPPER("banana_table"."type"::text) LIKE UPPER('%this%') OR UPPER("ff_orange"."first_kiss"::text) LIKE UPPER('%this%') OR UPPER("ff_orange"."last_kiss"::text) LIKE UPPER('%this%') OR UPPER("ff_orange"."mango"::text) LIKE UPPER('%this%') OR UPPER("ff_yolo"."name"::text) LIKE UPPER('%this%') OR UPPER("ff_yolo"."sick_hair_yo"::text) LIKE UPPER('%this%'))
    AND (UPPER("banana_table"."type"::text) LIKE UPPER('%is%') OR UPPER("ff_orange"."first_kiss"::text) LIKE UPPER('%is%') OR UPPER("ff_orange"."last_kiss"::text) LIKE UPPER('%is%') OR UPPER("ff_orange"."mango"::text) LIKE UPPER('%is%') OR UPPER("ff_yolo"."name"::text) LIKE UPPER('%is%') OR UPPER("ff_yolo"."sick_hair_yo"::text) LIKE UPPER('%is%'))
    AND (UPPER("banana_table"."type"::text) LIKE UPPER('%a%') OR UPPER("ff_orange"."first_kiss"::text) LIKE UPPER('%a%') OR UPPER("ff_orange"."last_kiss"::text) LIKE UPPER('%a%') OR UPPER("ff_orange"."mango"::text) LIKE UPPER('%a%') OR UPPER("ff_yolo"."name"::text) LIKE UPPER('%a%') OR UPPER("ff_yolo"."sick_hair_yo"::text) LIKE UPPER('%a%'))
    AND (UPPER("banana_table"."type"::text) LIKE UPPER('%test%') OR UPPER("ff_orange"."first_kiss"::text) LIKE UPPER('%test%') OR UPPER("ff_orange"."last_kiss"::text) LIKE UPPER('%test%') OR UPPER("ff_orange"."mango"::text) LIKE UPPER('%test%') OR UPPER("ff_yolo"."name"::text) LIKE UPPER('%test%') OR UPPER("ff_yolo"."sick_hair_yo"::text) LIKE UPPER('%test%'))
    AND (UPPER("banana_table"."type"::text) LIKE UPPER('%query%') OR UPPER("ff_orange"."first_kiss"::text) LIKE UPPER('%query%') OR UPPER("ff_orange"."last_kiss"::text) LIKE UPPER('%query%') OR UPPER("ff_orange"."mango"::text) LIKE UPPER('%query%') OR UPPER("ff_yolo"."name"::text) LIKE UPPER('%query%') OR UPPER("ff_yolo"."sick_hair_yo"::text) LIKE UPPER('%query%'))
)
ORDER BY "banana_table"."id" DESC

If your eyes glazed over that example, don't worry. The exact syntax doesn't matter - the point is that such a large query is not a theoretical example, it can actually happen!

Still, when running against a local database the query only took roughly 4 milliseconds. Fast, but the table in my local database only has 70 rows. Slow code will remain silent at that scale, lurking in the background and waiting to pounce. And it likes to rear its head at the worst possible moment - production.

In production we have over 3 MILLION rows. Divide 3,847,951 by 70 to get the ratio of how much larger production is. Multiply the time it took with 70 rows (4ms) by the ratio and we can predict that the same search in production would take 219,882 milliseconds, or 219 seconds*! Ain't nobody got time for that. Seriously, our pages time out after 30 seconds, so there's literally not enough time for that.

  • You could simply use less search fields. But what if the user wants to search against a field you removed? You're reducing functionality to improve performance, which may or may not be an acceptable tradeoff.

  • You could buy a better database. But you might not have the funds for that yet.

  • You could add indexes to your tables. But this requires careful consideration for each table.

  • You could add a dedicated mirror database for django admin. But you might not have the funds for that yet. (This was one of the improvements we made)

  • You could switch to Elasticsearch. But that will take time to implement, add complexity, and again, take funds to host.

If only the user could easily choose what fields they want to search against, that would help relieve the problem. Luckily, there's a library that does exactly that; enter DjangoQL!

By letting users choose what fields to search against you can massively speed up searches. Remember that SQL query I showed earlier?

WHERE (first_name ILIKE '%john%' OR last_name ILIKE '%john%')
AND (first_name ILIKE '%lennon%' OR last_name ILIKE '%lennon%')

If the user selects a specific field the query becomes simpler, and thus faster:

WHERE (first_name ILIKE '%john%')
AND (first_name ILIKE '%lennon%')

If you have a lot of search fields and data, this can result in a large performance improvement:

Page      |  # Items   |  Performance
--------------------------------------------------------------------------
XXXX      |  ~1,000    |  no performance difference
YYYY      |  ~100,000  |  up to ~5x increase in search speed
ZZZZ      | ~1,000,000 |  searching can take so long it times out unless DjangoQL is used

This is also dependent upon network speed, hardware speed, database indexes, and a whole host of other factors. But generally speaking if you have over 100k rows and a bunch of search fields you'll probably see a performance increase.

The neat thing is that the performance improvement is really just a side-benefit of the library. The main benefit is giving your users a query language that is much more powerful and flexible than just being able to do text matches.

The final benefit is that DjangoQL is a very straightforward addition to your code. You just make a simple class that inherits from DjangoQLSearchMixin and admin.ModelAdmin and use that instead of admin.ModelAdmin

Repeat changes in the bottom file for the rest of your files, and that's basically it!

This isn't going to solve your performance problems, but it should help a bit assuming you educate your users on how to use it. It's a easy yet powerful band-aid you can apply while you work on more long-term fixes. Even if you don't care about speeding up your code it's still useful, so it's a win-win!

Discussion

pic
Editor guide