loading...

Faster CSV export with Django & Postgres

mehdipourfar profile image Mehdi Pourfar ・2 min read

As a Django developer, there are many times that you are asked for CSV export of your database tables.

To do this, Python’s CSV module is a simple library to use. But when your dataset is large, it becomes so inefficient that often leads to timeout error. There are two problem with this method:

  1. Python is not very fast.
  2. Generating model objects by Django Orm is very resource consuming.

As an alternative approach, I suggest using Postgres CSV functions. This way, not only we don’t need to create thousands of model objects, but also we rapidly generate CSV thanks to Postgres speed.

But do I need to write complex SQL queries?
What should I do if I currently have a filtered queryset, for example, in Django admin?

Don’t worry. There is no need to write SQL. Here, we write a function that accepts a queryset and a filename and returns a CSV response

from django.db import connection
from django.utils import timezone
from django.http import HttpResponse

def qs_to_csv_response(qs, filename):
    sql, params = qs.query.sql_with_params()
    sql = f"COPY ({sql}) TO STDOUT WITH (FORMAT CSV, HEADER, DELIMITER E'\t')"
    filename = f'{filename}-{timezone.now():%Y-%m-%d_%H-%M-%S}.csv'
    response = HttpResponse(content_type='text/csv')
    response['Content-Disposition'] = f'attachment; filename={filename}'
    with connection.cursor() as cur:
        sql = cur.mogrify(sql, params)
        cur.copy_expert(sql, response)
    return response

Suppose we have these models:

from django.db import models
from django.utils.translation import ugettext_lazy as _

class City(models.Model):
    name = models.CharField(max_length=50)

class Place(models.Model):
    PLACE_TYPE_CHOICES = (
        (1, _('Park')),
        (2, _('Cafe')),
        (3, _('Resturant')),
        (4, _('Cinema')),
    )
    name = models.CharField(max_length=50)
    city = models.ForeignKey(City, on_delete=models.CASCADE)
    place_type = models.PositiveSmallIntegerField(
        choices=PLACE_TYPE_CHOICES
    )

If we want to have a CSV export from Place table containing place_id, place_name, place_type and city_name, we pass this query to the function above:

from django.db.models import F

Place.objects.values(
  'id',
  'name',
  'place_type',
  city_name=F('city__name')
)

There is some caveat here. Postgres doesn’t know anything about our pretty, human readable and translated place types and putting some numbers in place_type columns can be completely useless.

The function below can be used to tell Postgres about this kind of mappings using Case and When:

from django.db.models import Case, When, Value, CharField

def map_choices(field_name, choices):
    return Case(
        *[When(**{field_name: value, 'then': Value(str(representation))})
          for value, representation in choices],
        output_field=CharField()
    )

And then, we use it in the query:

Place.objects.values(
  'id',
  'name',
  verbose_type=map_choices('place_type', Place.PLACE_TYPE_CHOICES),
  city_name=F('city__name')
)

By using Django ORM tools like F, Func, ExpressionWrapper, RawSQL … you can easily write more complex queries in a performant way.

Discussion

pic
Editor guide