The Django ORM is great but it may not always be a good idea to use it in its most primitive state as it could lead to inefficient queries.
For instance, consider a database model called Record that consists of id, name, created_at and an is_deleted flag indicating whether it has been deleted or not.
from django.db import models
class Record(models.Model):
# id will be created automatically
name = models.CharField(max_length=255)
created_at = models.DateTimeField(auto_now_add=True)
is_deleted = models.BooleanField(default=False)
When you query using Record.objects.filter(is_deleted=False), the underlying database query looks something like this:
SELECT id,
name,
created_at,
is_deleted
FROM records
WHERE NOT is_deleted;
Not only is this querying all fields of the table but it returns a queryset of model instances like this:
>>> Record.objects.filter(is_deleted=False)
<QuerySet [<Record: Record object (1)>, <Record: Record object (2)>, <Record: Record object (3)>]>
In most cases you do not require all of the fields of your model. What is even worse is that if your Record model consists of 10 fields, this query will end up fetching all of them from the database making your query significantly slower.
Fortunately there are two very straightforward ways to make this query efficient. You could either use values or values_list.
Using values() to retrieve specific DB columns
Let’s say you want to display the names of all the records that have not been deleted. You can retrieve only the name column from the database by making a slight modification to the ORM query:
>>> Record.objects.filter(is_deleted=False).values('name')
<QuerySet [{'name': 'First record'}, {'name': 'Second Record'}, {'name': 'Third Record'}]>
As you can see, instead of returning a list of Record objects, this queryset returns a list of dictionaries that consist of just the name field. The underlying query now becomes:
SELECT name
FROM records
WHERE NOT is_deleted;
You can fetch more columns if you wish by providing the list of columns that you need in the values() method.
>>> Record.objects.filter(is_deleted=False).values('id', 'name')
<QuerySet [{'id': 1, 'name': 'First record'}, {'id': 2, 'name': 'Second Record'}, {'id': 3, 'name': 'Third Record'}]>
Using values_list() to retrieve specific DB columns
This works pretty much the same way as values() except that it returns a tuple instead of a dictionary:
>>> Record.objects.filter(is_deleted=False).values_list('name')
<QuerySet [('First record',), ('Second Record',), ('Third Record',)]>
>>>
>>> Record.objects.filter(is_deleted=False).values_list('id', 'name')
<QuerySet [(1, 'First record'), (2, 'Second Record'), (3, 'Third Record')]>
>>>
The nifty thing about values_list() is that if you only need to pass in a single field, you can use the flat=True parameter to return a list of single values instead of tuples.
>>> Record.objects.filter(is_deleted=False).values_list('name',flat=True)
<QuerySet ['First record', 'Second Record', 'Third Record']>
Closing notes
In terms of performance, there is no difference between values() and values_list().
They are both intended as optimizations for retrieving a subset of data while avoiding creation of model instances and making the database query more efficient by only selecting specific fields.
The decision of which one of these to use depends on what kind of data structure you prefer to work with.
I usually find a dictionary more intuitive than a tuple and hence prefer values() over values_list(). However if I only need one field from the query I prefer to use values_list() with flat=True.
Top comments (1)
I suppose if there are foreign keys, fetching each record would involve joining of two tables. Instead if you only request for specific columns, that join would be prevented