DEV Community

Cover image for Exporting Pandas DataFrames to Google Sheets
Maria Clara Santana
Maria Clara Santana

Posted on • Updated on

Exporting Pandas DataFrames to Google Sheets

In a recent project, I needed to share the results obtained from some data analysis with Pandas in the format of a CSV on Google Sheets. The first approach I tried was using GSheets which is quite nice but has some downsides:

  • Its authentication relies on oauth2client which has been deprecated.
  • Its import_csv method always replaces the whole spreadsheet, and I needed to have several internal sheets that accounted for different variables of the same data.

So, I decided to come up with my solution. First things first, you need to set up a new project on the Google Cloud Console and create credentials, make sure that they are of the type Service Account, which will allow server-to-server communication. After the credentials are created, download as JSON and move it to the project's folder.

On the side menu, go to APIs & Service and then select Credentials

Then go to the console's library of APIs and enable both the Sheets API and the Drive API, the later is needed to share the spreadsheet you create.

While in the projects' folder, we'll add the following dependencies to our requirements file:

google-api-python-client
google-auth
pandas
numpy
Enter fullscreen mode Exit fullscreen mode

And install them running:

$ pip install -r requirements.txt
Enter fullscreen mode Exit fullscreen mode

Now that our setup is ready, we can authenticate our project with the following code:

credentials = service_account.Credentials.from_service_account_file(
    "./path/to/credentials.json"
)

scopes = credentials.with_scopes(
    [
        "https://www.googleapis.com/auth/spreadsheets",
        "https://www.googleapis.com/auth/drive",
    ]
)
Enter fullscreen mode Exit fullscreen mode

The URLs for the scopes can be found here.

The first thing we need is a way to create a spreadsheet from Pandas DataFrames, so we can define a function that receives two parameters, the Spreadsheet title and a list of DataFrames:

def create_sheet(title, data):
    sheets_service = build("sheets", "v4", credentials=credentials)
    sheets = sheets_service.spreadsheets()

    # Body of create method with a Spreadsheet(https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets#Spreadsheet) instance
    create_body = {"properties": {"title": f"{title} {date.today()}"},
                   "sheets": list(map(lambda d: {"properties": {"title": d.get("title")}}, data))}
    res = sheets.create(body=create_body).execute()
    spreadsheet_id = res.get("spreadsheetId")

    # Transform the DataFrame into a matrix of the columns and values
    def df_to_sheet(df):
        df_columns = [np.array(df.columns)]
        df_values = df.values.tolist()
        df_to_sheet = np.concatenate((df_columns, df_values)).tolist()
        return df_to_sheet

    update_body = {
        "valueInputOption": "RAW",
        "data": list(map(lambda d: {"range": d.get("title"), "values": df_to_sheet(d.get("df"))}, data))
    }

    sheets.values().batchUpdate(spreadsheetId=spreadsheet_id, body=update_body).execute()

    return res
Enter fullscreen mode Exit fullscreen mode

The above function can be called as the following:

data = [
    {
        "title": "Tab/Sheet Title",
        "df": pd.DataFrame({})
    }
]
res = create_sheet("Title", data=data)
Enter fullscreen mode Exit fullscreen mode

With res being an instance of a Spreadsheet.

Now that our spreadsheet has been successfully created, it's currently only available to the user on our credentials file, so we need to define a function to share it.

def share_spreadsheet(spreadsheet_id, options, notify=False):
    drive_service = build("drive", "v3", credentials=credentials)

    res = (
        drive_service.permissions()
        .create(
            fileId=spreadsheet_id,
            body=options,
            sendNotificationEmail=notify,
        )
        .execute()
    )

    return res

Enter fullscreen mode Exit fullscreen mode

All of the possible values for the options argument can be found on the Drive API documentation. You can share it with a specific user or give access to every account under a specific domain.

We can define the main function that combines both creation and share scripts and returns the spreadsheet URL.

def create_and_share_spreadsheet(title, data, permissions):
    sheet = create_sheet(title, data)
    share_spreadsheet(sheet.get("spreadsheetId"), options=permissions.get(
        "info"), notify=permissions.get("notify"))

    return sheet.get("spreadsheetUrl")
Enter fullscreen mode Exit fullscreen mode

Voilá, the whole thing put together in a file can be found in this gist. There's quite a lot of room for improvement still, so it may receive some updates that will differ from this post's snippets.

Thank you for reading!

Oldest comments (1)

Collapse
 
tanghoong profile image
tanghoong

Hi Maria, thanks for the good post.
Just wondering if there is any limitation that googles sheets can hold the data? such as size for data, and reading speed from google sheets as json..