Emails are an important part of many of our lives - both personally and professionally. Staying on top of your inbox can be a daunting task. No matter how hard I try, my Gmail inevitably begins overflowing with countless unread messages.
In this guide we will explore how Python can be utilized to effortlessly sort through your inbox, allowing you to regain control.
Note: The purpose of this post isn't to detail a fully-automated AI that can clean our inboxes unsupervised. Rather, the goal is to introduce you to the tools needed to supplement your efforts when cleaning your inbox.
Installing dependencies
Ensure you have python3 and pip installed.
I encourage you to install the dependencies into a virtual environment.
Navigate to your project directory and run the following:
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade google-api-python-client google-auth-httplib2 google-auth-oauthlib pandas
Getting Google API access
Before we can get started, we need to register our application with Google so we can access user data.
We will follow the official instructions to create an OAuth "Desktop app".
- Go to Credentials
- Click Create Credentials > OAuth client ID.
- Click Application type > Desktop app.
- In the Name field, type a name for the credential. This name is only shown in the Google Cloud console.
- Click Create. An OAuth client created popover appears, showing the client details. Click 'Download JSON' and save the file as credentials.json to your project directory.
Analyzing your inbox
In this simple example, we will focus on creating a Python script that gives a breakdown of the most common senders in our inbox.
Create a Python file called gmail_organizer.py in your project directory.
First, let's add the shebang and imports.
#!/usr/bin/env python3
from __future__ import print_function
import os.path
import pandas as pd
import re
from google.auth.transport.requests import Request
from google.oauth2.credentials import Credentials
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build
Then let's create the authentication function. This uses the credentials.json file to authenticate on behalf of a user. Once a user has authenticated, a token.json file will be created in the project directory. This matches the sample code provided by Google.
# If modifying these scopes, delete the file token.json.
SCOPES = ['https://www.googleapis.com/auth/gmail.readonly']
def get_creds():
    creds = None
    # The file token.json stores the user's access and refresh tokens, and is
    # created automatically when the authorization flow completes for the first
    # time.
    if os.path.exists('token.json'):
        creds = Credentials.from_authorized_user_file('token.json', SCOPES)
    # If there are no (valid) credentials available, let the user log in.
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(
                'credentials.json', SCOPES)
            creds = flow.run_local_server(port=0)
        # Save the credentials for the next run
        with open('token.json', 'w') as token:
            token.write(creds.to_json())
    return creds
Now, we create the functions get_inbox_emails(..) and process_email_metadata(..) - these will be doing most of the heavy lifting.
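email_metadata = []

def process_email_metadata(request_id, response, exception):
    # Batch callback: runs once per message with its metadata or an error.
    if exception is not None:
        # Skip messages that could not be fetched instead of crashing.
        return
    message_id = response.get('id')
    headers = response.get('payload', {}).get('headers')
    if headers is not None:
        for header in headers:
            if header['name'] == "From":
                match = re.match(
                    r'(?:.*<)?(.*)@(.*?)(?:>.*|$)', header['value']
                )
                # A From value without an '@' does not match the pattern.
                if match:
                    username, domain = match.groups()
                    email_metadata.append({
                        'message_id': message_id,
                        'username': username,
                        'domain': domain})
                break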
def get_inbox_emails(service):
    # Call the Gmail API
    response = service.users().messages().list(
        userId='me',
        labelIds=['INBOX'],
        maxResults=500  # the API reference documents 500 as the page maximum
    ).execute()

    # Retrieve all message ids
    messages = []
    # 'messages' is missing from the response when a page has no results
    messages.extend(response.get('messages', []))
    while 'nextPageToken' in response:
        page_token = response['nextPageToken']
        response = service.users().messages().list(
            userId='me',
            labelIds=['INBOX'],
            maxResults=500,
            pageToken=page_token
        ).execute()
        messages.extend(response.get('messages', []))

    # Retrieve the metadata for all messages
    step = 100
    num_messages = len(messages)
    for batch in range(0, num_messages, step):
        batch_req = service.new_batch_http_request(callback=process_email_metadata)
        for i in range(batch, min(batch + step, num_messages)):
            batch_req.add(service.users().messages().get(
                userId='me',
                id=messages[i]['id'],
                format="metadata")
            )
        batch_req.execute()
Let's break down what these functions accomplish:
- Create a gmail service class.
- Retrieve all message ids / list all messages. Gmail only returns a limited number of results per page (the API reference documents a maximum of 500), so we have to keep requesting more until there is no nextPageToken in the response.
- Retrieve the metadata for all messages. Gmail does not provide any way to retrieve these details when listing emails, so we need to iterate over each of the message ids we found in the previous step. For performance, we batch these lookups so that each HTTP request asks Gmail for the metadata of up to 100 emails.
- For each email's metadata we receive back, the callback process_email_metadata(..) is called. This is where we process our data. In this example, I process the From: field and apply a regex to extract the email username and domain name. This will allow us to find the most common senders in my inbox.
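To see what the regex in process_email_metadata(..) extracts, it can be tried on a few hand-written From: values without calling Gmail at all. This is only a minimal sketch: parse_sender is a hypothetical helper name, and the sample addresses are invented.

```python
import re

# Same pattern as in process_email_metadata(..): an optional display name
# ending in '<', then the username, an '@', and the domain.
FROM_PATTERN = re.compile(r'(?:.*<)?(.*)@(.*?)(?:>.*|$)')


def parse_sender(from_value):
    """Return (username, domain), or None if no address is found."""
    match = FROM_PATTERN.match(from_value)
    return match.groups() if match else None


if __name__ == '__main__':
    # Invented sample values, for illustration only.
    samples = [
        'Jane Doe <jane@example.com>',
        'noreply@youtube.com',
        'MAILER-DAEMON',
    ]
    for value in samples:
        print(value, '->', parse_sender(value))
```

The last sample contains no @, so the pattern does not match and re.match returns None. Real inboxes do contain such senders, so it is worth guarding against that case.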
Now finally let's create the script entrypoint (calling the functions we've already made above).
def main():
    creds = get_creds()
    service = build('gmail', 'v1', credentials=creds)
    get_inbox_emails(service)

if __name__ == '__main__':
    main()
Printing results
Running the code above will not print anything yet. We need to process the data and display it to the user. We can use Pandas to easily report a descending list of email usernames and domains.
We've already done the work to process this data in process_email_metadata(..), so all we need to do is add the following lines to main() below get_inbox_emails(service):
    # Print the results
    df = pd.DataFrame(email_metadata)
    print("Most common email usernames -----------")
    print(df.groupby('username')
        .size().reset_index(name='count')
        .sort_values(by='count',ascending=False)
        .to_string(index=False))
    print()
    print("Most common email domains -------------")
    print(df.groupby('domain')
        .size().reset_index(name='count')
        .sort_values(by='count',ascending=False)
        .to_string(index=False))
See the complete script on GitHub.
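If you would rather not depend on Pandas for the report, similar tallies can be produced with the standard library. This is a different technique from the groupby approach (collections.Counter instead of a DataFrame), sketched under the assumption that email_metadata has the shape built by process_email_metadata(..). The top_senders helper name and the sample entries are made up for illustration.

```python
from collections import Counter


def top_senders(email_metadata, field):
    """Count how often each value of `field` appears, most common first."""
    counts = Counter(entry[field] for entry in email_metadata)
    return counts.most_common()


if __name__ == '__main__':
    # Invented sample data in the same shape as email_metadata.
    sample = [
        {'message_id': 'a1', 'username': 'info', 'domain': 'example.com'},
        {'message_id': 'a2', 'username': 'noreply', 'domain': 'example.com'},
        {'message_id': 'a3', 'username': 'info', 'domain': 'change.org'},
    ]
    for field in ('username', 'domain'):
        print(f'Most common {field}s:')
        for value, count in top_senders(sample, field):
            print(f'  {value:<15} {count}')
```

Pandas remains the better fit once you want to filter, join, or export the data, but for a plain frequency list the counter is enough.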
Running
From the project directory:
python3 gmail_organizer.py
A new browser window will open prompting you to sign in to your Google account. The script will analyze the emails in the Gmail account associated with the Google account you sign in with at this point. The browser window will warn you that this is unsafe, but that is only because your application is unverified. If necessary, you can go through the process to verify your application.
After running the application, you should get an output similar to the following:
Most common email usernames -----------
username count
info 6
noreply 5
no-reply 2
donotreply 1
...
Most common email domains -------------
domain count
example.com 5
youtube.com 2
change.org 1
...
Extending the script
The example above is deliberately simple. It serves as a scaffold that you can expand to tackle more complex situations. It is possible to extend the script to modify your inbox, including labeling or deleting emails.
Start by making sure you have the correct SCOPES for the operations you are attempting. Google outlines the different scopes here.
To be able to additionally label emails, we need the modify scope. This means we need to update:
# If modifying these scopes, delete the file token.json.
SCOPES = ['https://www.googleapis.com/auth/gmail.readonly']
to:
# If modifying these scopes, delete the file token.json.
SCOPES = ['https://www.googleapis.com/auth/gmail.readonly',
          'https://www.googleapis.com/auth/gmail.modify']
Make sure you delete token.json after changing the scopes.
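Forgetting to delete token.json is an easy mistake, so the check can be scripted. The sketch below assumes that token.json is the JSON written by creds.to_json() and that it stores the granted scopes under a "scopes" key; verify that against your own file before relying on it. token_covers_scopes and remove_stale_token are hypothetical helper names.

```python
import json
import os


def token_covers_scopes(token_path, required_scopes):
    """Return True if the saved token lists every required scope."""
    if not os.path.exists(token_path):
        return False
    with open(token_path) as token_file:
        saved = json.load(token_file)
    # Assumption: granted scopes are stored under the "scopes" key,
    # either as a list or as a space-separated string.
    granted = saved.get('scopes') or []
    if isinstance(granted, str):
        granted = granted.split()
    return set(required_scopes) <= set(granted)


def remove_stale_token(token_path, required_scopes):
    """Delete the token file if it exists but lacks a required scope."""
    if os.path.exists(token_path) and not token_covers_scopes(
            token_path, required_scopes):
        os.remove(token_path)
        return True
    return False
```

Calling remove_stale_token('token.json', SCOPES) before get_creds() would then trigger a fresh sign-in whenever the scope list has grown.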
From here, labeling/starring an email is very straightforward.
def label_emails(service, message_id):
    response = service.users().messages().modify(
        userId='me',
        id=message_id,
        body={
            "addLabelIds":['STARRED']
        }
    ).execute()
Note: For labeling large numbers of emails, consider using batchModify instead (for the same reason we batched the metadata requests earlier: fewer round trips to Gmail).
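As a rough sketch of that batched approach: messages.batchModify accepts a list of message ids together with the label changes, so many emails can be starred in a single request. The default chunk size of 1000 reflects my understanding of the documented per-request limit on ids; treat it as an assumption and check the current API reference. chunked and star_messages are hypothetical helper names.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def star_messages(service, message_ids, chunk_size=1000):
    """Star many messages using one batchModify call per chunk."""
    for ids in chunked(message_ids, chunk_size):
        service.users().messages().batchModify(
            userId='me',
            body={
                'ids': ids,
                'addLabelIds': ['STARRED'],
            }
        ).execute()
```

With this in place, starring everything from one domain could look like star_messages(service, [e['message_id'] for e in email_metadata if e['domain'] == 'example.com']).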
We've already done the work to process this data in process_email_metadata(..), so to star all emails from example.com all we need to do is add the following lines to main() below get_inbox_emails(service):
    for email in email_metadata:
        if email['domain'] == 'example.com':
            label_emails(service, email['message_id'])