In a recent project, I needed to share the results obtained from some data analysis with Pandas in the format of a CSV on Google Sheets. The first approach I tried was using GSheets which is quite nice but has some downsides:
- Its authentication relies on oauth2client which has been deprecated.
- Its
import_csv
method always replaces the whole spreadsheet, and I needed to have several internal sheets that accounted for different variables of the same data.
So, I decided to come up with my solution. First things first, you need to set up a new project on the Google Cloud Console and create credentials, make sure that they are of the type Service Account, which will allow server-to-server communication. After the credentials are created, download as JSON
and move it to the project's folder.
Then go to the console's library of APIs and enable both the Sheets API and the Drive API, the later is needed to share the spreadsheet you create.
While in the projects' folder, we'll add the following dependencies to our requirements file:
google-api-python-client
google-auth
pandas
numpy
And install them running:
$ pip install -r requirements.txt
Now that our setup is ready, we can authenticate our project with the following code:
credentials = service_account.Credentials.from_service_account_file(
"./path/to/credentials.json"
)
scopes = credentials.with_scopes(
[
"https://www.googleapis.com/auth/spreadsheets",
"https://www.googleapis.com/auth/drive",
]
)
The URLs for the scopes can be found here.
The first thing we need is a way to create a spreadsheet from Pandas DataFrames, so we can define a function that receives two parameters, the Spreadsheet title and a list of DataFrames:
def create_sheet(title, data):
sheets_service = build("sheets", "v4", credentials=credentials)
sheets = sheets_service.spreadsheets()
# Body of create method with a Spreadsheet(https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets#Spreadsheet) instance
create_body = {"properties": {"title": f"{title} {date.today()}"},
"sheets": list(map(lambda d: {"properties": {"title": d.get("title")}}, data))}
res = sheets.create(body=create_body).execute()
spreadsheet_id = res.get("spreadsheetId")
# Transform the DataFrame into a matrix of the columns and values
def df_to_sheet(df):
df_columns = [np.array(df.columns)]
df_values = df.values.tolist()
df_to_sheet = np.concatenate((df_columns, df_values)).tolist()
return df_to_sheet
update_body = {
"valueInputOption": "RAW",
"data": list(map(lambda d: {"range": d.get("title"), "values": df_to_sheet(d.get("df"))}, data))
}
sheets.values().batchUpdate(spreadsheetId=spreadsheet_id, body=update_body).execute()
return res
The above function can be called as the following:
data = [
{
"title": "Tab/Sheet Title",
"df": pd.DataFrame({})
}
]
res = create_sheet("Title", data=data)
With res
being an instance of a Spreadsheet.
Now that our spreadsheet has been successfully created, it's currently only available to the user on our credentials file, so we need to define a function to share it.
def share_spreadsheet(spreadsheet_id, options, notify=False):
drive_service = build("drive", "v3", credentials=credentials)
res = (
drive_service.permissions()
.create(
fileId=spreadsheet_id,
body=options,
sendNotificationEmail=notify,
)
.execute()
)
return res
All of the possible values for the options argument can be found on the Drive API documentation. You can share it with a specific user or give access to every account under a specific domain.
We can define the main function that combines both creation and share scripts and returns the spreadsheet URL.
def create_and_share_spreadsheet(title, data, permissions):
sheet = create_sheet(title, data)
share_spreadsheet(sheet.get("spreadsheetId"), options=permissions.get(
"info"), notify=permissions.get("notify"))
return sheet.get("spreadsheetUrl")
Voilรก, the whole thing put together in a file can be found in this gist. There's quite a lot of room for improvement still, so it may receive some updates that will differ from this post's snippets.
Thank you for reading!
Top comments (1)
Hi Maria, thanks for the good post.
Just wondering if there is any limitation that googles sheets can hold the data? such as size for data, and reading speed from google sheets as json..