How to download Pandas Dataframe as Excel or CSV in Django?

#django #pandas

Introduction

Pandas is an essential tool used by Python developers used for data analysis purposes, but what is the point of the analysis if we are not able to provide that insight to the end user. When Django and Pandas are used in conjunction we can create applications that are not only developed faster, but are also smart in utilizing the data that is collected from the user. I recently stumbled on one such use case, where the user interacts with the frontend and turns some filters and then the backend processes that data, and then provide the aggregated data as an Excel or CSV file to the user. I will be describing how I went about doing that below.

Prerequisites

In order to follow along this article, I am assuming you already have a Django project that is using Pandas library ready.

Requirements

In addition to Django and Pandas project ready, you would need to install Openpyxl.

Openpyxl

Openpyxl is a tool that allows to read and write Open Office XML formats such as Excel 2010 xlsx/xlsm/xltx/xltm files.

pip install openpyxl

Once installed, you can create Excel files natively from Python using a simple code like this.

from openpyxl import Workbook

workbook = Workbook()
sheet = workbook.active

sheet["A1"] = "ID"
sheet["B1"] = "Key"

sheet["A2"] = 1
sheet["B2"] = "Test Key"

workbook.save(filename="test.xlsx")

This will create a file called "test.xlsx". Try it out yourself.

HTTP Response

As you know, Django uses HTTP Request and Response to communicate with the client. Therefore all data must be converted to Byte String before sending it over to the client.

A Typical HTTP Response looks something like this.

For more information, checkout this article by Mozilla about HTTP Messages.

As you can see in the above picture, every HTTP response has a body section which is a Byte Array and a Header called Content-Type specifies what type of content it is and which character set the browser should use in order to decode the bytes. In the above case, the server is sending a text/html data and the browser need to use iso-8859-1 charset.

In our case, we are sending bytes that are related to Excel spreadsheets. So googling for a bit I found the content-type header for that is this. Source: Mozilla article about Common MIME Types

application/vnd.openxmlformats-officedocument.spreadsheetml.sheet

What are MIME Types?

MIME is short for Multipurpose Internet Mail Extensions which is a standard that indicates the nature and format of a document sent via the Internet.

Important : Browsers use the MIME type, not the file extension, to determine how to process a URL, so it's important that web servers send the correct MIME type in the response's Content-Type header. If this is not correctly configured, browsers are likely to misinterpret the contents of files and sites will not work correctly, and downloaded files may be mishandled.

For more information, checkout MDN Article about MIME Types

The structure of a MIME Type is according to IANA (Internet Assigned Numbers Authority) is like this.

type/subtype

Therefore, in the MIME Type for Excel files, the type is " application", which belongs to any kind of binary data that cannot be directly decoded to a human-readable form (text, HTML, etc.). Files with type application require some sort of external or third party software to decode and read such as PDFs, Zip files, Excel files, etc.

Download Excel Files using Django and Pandas

Finally, we will see how we can send the Pandas dataframe to client as an excel file.

I am assuming that you already have the code for pandas so I will do some abstraction in a variable called data.

First we will import to the inbuilt library, BytesIO so that we can write the excel file as a Byte array.

BytesIO is a library using which we can write data onto system memory instead of writing them as a file.

from io import BytesIO

Then we will use Python context manager to open a Byte buffer on which we can write the excel file. Context Managers allow you to allocate and release memory precisely when you want it.

with BytesIO() as b
    with pd.ExcelWriter(b) as writer:
        # You can add multiple Dataframes to an excel file
        # Using the sheet_name attribute
      data1.to_excel(writer, sheet_name="DATA 1", index=False)
        data2.to_excel(writer, sheet_name="DATA 2", index=False)

    filename = "analytics_data.xlsx"

    # imported from django.http
    res = HttpResponse(
        b.getvalue(), # Gives the Byte string of the Byte Buffer object
        content_type='application/vnd.openxmlformats-officedocument.spreadsheetml.sheet'
    )
    res['Content-Disposition'] = f'attachment; filename={filename}'
    return res

We will be using context managers to open and close files because that's more efficient than manually closing and releasing the memory after use. If you already know how to export a set of data frames as an excel file, that code should look like this.

with pd.ExcelWriter('data.xlsx') as writer:
    data1.to_excel(writer, sheet_name="DATA 1", index=False)
    data2.to_excel(writer, sheet_name="DATA 2", index=False)

We will simply take this code and wrap it around a BytesIO context manager which gives us a memory buffer b as a file to work with. We will simply take that memory buffer and pass it to pd.ExcelWriter() class and the rest of the code will be the same this that context manager.

with BytesIO() as b
    with pd.ExcelWriter(b) as writer:
      data1.to_excel(writer, sheet_name="DATA 1", index=False)
        data2.to_excel(writer, sheet_name="DATA 2", index=False)

Now that we have our Excel data written in a memory buffer, we can simply convert that to ByteArray using the method b.getvalue() which we will pass as a parameter in Django HttpResponse with appropriate content_type that we discussed earlier.

filename = "analytics_data.xlsx"

# imported from django.http
res = HttpResponse(
  b.getvalue(), # Gives the Byte string of the Byte Buffer object
  content_type='application/vnd.openxmlformats-officedocument.spreadsheetml.sheet'
)

In addition to the Content-Type header, we also need to provide another header called Content-Disposition that tells the browser if we want to show the data in the browser download and save it as a local file as an attachment. We can also pass a filename with this header.

res['Content-Disposition'] = f'attachment; filename={filename}'
return res

Here is the complete code snippet for Django view to download the excel file.

def download_analytics_endpoint(request):
    with BytesIO() as b:
        data = get_analytics_data()

        with pd.ExcelWriter(b) as writer:
            data.to_excel(writer, sheet_name="Data", index=False)

        filename = f"analytics_data.xlsx"
        res = HttpResponse(
            b.getvalue(),
            content_type='application/vnd.openxmlformats-officedocument.spreadsheetml.sheet'
        )
        res['Content-Disposition'] = f'attachment; filename={filename}'
        return res

That is all, now you just have to create a Django URL for this function and you can download any file you want, just convert that file to a Byte string and use the correct Content-Type header.