Whether you're sending reports, executing long-running tasks, or updating a dashboard, you likely have a dozen or so notebooks that need to run on a regular schedule. You can set yourself reminders, make sure everyone on the team has the script to run when needed (e.g. while you're on vacation), and log in before the boss does so the dashboards are fresh, but at some point the cost of your time will push you toward automation. We're going to cover three ways to get this done:
- Locally, by setting up a process to run a Python script in the background
- With SeekWell, which runs notebooks automatically and remotely
- Remotely, by setting up your own server to run a notebook
1. Locally (on your computer)
Pros: Simple; No additional costs
Cons: Requires your computer be awake and connected to the internet 24/7; Time consuming to set up and varies depending on your operating system
A. Use nbconvert to convert your notebook into a .py file
a. Navigate to the directory of your notebook via your command line
b. Run jupyter nbconvert --to script 'my-notebook.ipynb'
c. The above command will create my-notebook.py
d. Run python my-notebook.py to test it
e. More on nbconvert can be found here
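If you're curious what the conversion actually does: a .ipynb file is just JSON, and --to script essentially concatenates the source of the code cells. Here is a rough sketch of that idea (the toy notebook dict below is made up for illustration; nbconvert itself does more, such as handling magics):

```python
import json  # with a real file you'd use json.load(open("my-notebook.ipynb"))

# A minimal stand-in for a notebook's JSON structure.
notebook = {
    "cells": [
        {"cell_type": "markdown", "source": ["# My analysis\n"]},
        {"cell_type": "code", "source": ["x = 1 + 1\n", "print(x)\n"]},
        {"cell_type": "code", "source": ["print(x * 10)\n"]},
    ]
}

def notebook_to_script(nb):
    """Join the source of every code cell, skipping markdown cells."""
    return "\n".join(
        "".join(cell["source"])
        for cell in nb["cells"]
        if cell["cell_type"] == "code"
    )

script = notebook_to_script(notebook)
print(script)
```

The markdown cell is dropped and only the runnable code survives, which is why the generated .py file can be executed directly on a schedule.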
B. Run the script on a schedule
(Windows) via Task Scheduler
a. Click the Windows Start menu, open Control Panel > Administrative Tools, and click Task Scheduler
b. In the Actions pane, click the Create Basic Task action
c. In the task's Action section, point the task at your Python interpreter with your script as the argument. If your script is located at "E:\testscript.py", specify C:\path\to\python\python.exe "E:\testscript.py". If you don't know your path to Python, check out this video
d. Navigate to the Trigger section and create a new trigger with the schedule you'd like (e.g. every hour). This video does a good job walking through this part
(Mac) via LaunchControl ($15)
a. Open LaunchControl and select Global Agents
b. Find your script. It should show green, indicating that it is executable
c. Under Start Calendar Interval, select when and how often you want the script to run
d. You can set additional rules under Keep Alive
e. You can find more details here
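For reference, LaunchControl is a front end for macOS's built-in launchd, so what it ultimately writes for you is a property list in ~/Library/LaunchAgents. A sketch of an agent that runs a script at the top of every hour (the label and both paths below are placeholders; adjust them to your setup):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.example.run-notebook</string>
    <key>ProgramArguments</key>
    <array>
        <string>/usr/local/bin/python3</string>
        <string>/Users/you/my-notebook.py</string>
    </array>
    <!-- Minute 0 with no Hour key means "at minute 0 of every hour" -->
    <key>StartCalendarInterval</key>
    <dict>
        <key>Minute</key>
        <integer>0</integer>
    </dict>
</dict>
</plist>
```

This is what the Start Calendar Interval settings in the app map to under the hood.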
2. Use SeekWell
Pros: Three click automation from within Jupyter Notebook or the desktop app; Easy and secure access to Google Sheets, Slack and SQL databases; The desktop app includes the ability to use SQL alongside Python
Cons: Requires subscription after free 14 day trial ($49/mo)
SeekWell's Chrome Extension and desktop app allow you to schedule a notebook to run daily, hourly or every 5 minutes with just a couple clicks. You can also send data directly to Google Sheets or Slack without storing API keys in plain text. This makes it easy to automatically refresh dashboards built on Sheets or send alerts to Slack.
There are two ways to automate with SeekWell: using the Chrome Extension within Jupyter Notebooks, or using the desktop app.
Chrome Extension
a. Add the Chrome extension here and create a SeekWell account here.
b. Open a Jupyter Notebook
c. Click on the SeekWell Chrome Extension and select how often you’d like the notebook to run
d. Click save and you’re done! You can manage all your schedules from your dashboard.
e. (Optional) Specify a destination (e.g. Google Sheets or Slack) in the Extension using the notebook metadata. See this video for more info.
SeekWell Desktop app
a. Create a SeekWell account here
b. Download the desktop app as part of the sign-up flow. If you want to send data to Slack, be sure to add that integration too.
c. If you need help connecting to your database, check out this article. Code cells on the left default to SQL. To switch a cell to Python, type /python in the cell and press Enter (or Return)
d. Write your code in the cells and specify a destination for the data. For Google Sheets, navigate to ‘Sheets’ on the right hand side, select a workbook and designate the sheet and cell location in the field just below the code cell using A1 notation (e.g., Sheet2!B10). For Slack, specify a channel (e.g., #alerts).
e. To create a schedule, click on the clock icon in the app. Select how often you’d like to have it run, and the time of day if applicable.
Here’s what it looks like in the app:
f. Click save and you're done! You can manage your schedules from your dashboard.
3. Remotely (in the cloud)
Pros: Only costs computing power; Doesn't break when your computer is off / you're on vacation
Cons: Time consuming and complex to set up; Requires engineering and dev ops resources to get started and maintain; May require storing passwords in plain text on a server
You can set up scripts to run on a server, so they refresh whether or not you’re logged in to your machine or connected to the internet. We're going to use Google Cloud Platform here, but it's possible to do something similar on AWS or your cloud of choice. Here are the steps:
a. Set up a Google Cloud Storage bucket
b. Load your notebook to your bucket
c. Install the gcloud CLI
d. Depending on your prior use of Google Cloud, you may need to enable certain APIs (e.g. Google Cloud Storage)
e. Run the following commands in your terminal (bash); be sure to change REPLACEWITHYOURBUCKET to the bucket you created in step a.
# Compute Engine Instance parameters
export IMAGE_FAMILY="tf-latest-cu100"
export ZONE="us-central1-b"
export INSTANCE_NAME="notebook-executor"
export INSTANCE_TYPE="n1-standard-8"
# Notebook parameters
export INPUT_NOTEBOOK_PATH="gs://REPLACEWITHYOURBUCKET/input.ipynb"
export OUTPUT_NOTEBOOK_PATH="gs://REPLACEWITHYOURBUCKET/output.ipynb"
export STARTUP_SCRIPT="papermill ${INPUT_NOTEBOOK_PATH} ${OUTPUT_NOTEBOOK_PATH}"
gcloud compute instances create $INSTANCE_NAME \
--zone=$ZONE \
--image-family=$IMAGE_FAMILY \
--image-project=deeplearning-platform-release \
--maintenance-policy=TERMINATE \
--accelerator='type=nvidia-tesla-t4,count=2' \
--machine-type=$INSTANCE_TYPE \
--boot-disk-size=100GB \
--scopes=https://www.googleapis.com/auth/cloud-platform \
--metadata="install-nvidia-driver=True,startup-script=${STARTUP_SCRIPT}"
gcloud --quiet compute instances delete $INSTANCE_NAME --zone $ZONE
f. This should run the notebook once and place the results in your bucket as output.ipynb
g. Next, we need a way to trigger this script to run automatically, which we can do with an App Engine cron job
h. Follow the instructions here to create an App Engine instance. This is a Flask web app.
i. Add an endpoint that executes the bash script above (be sure to import subprocess), e.g.
@app.route('/run_notebook')
def run_notebook():
    cmd = 'LONG BASH SCRIPT ABOVE'
    response = subprocess.run(cmd, stderr=subprocess.PIPE, stdout=subprocess.PIPE, shell=True)
    return 'Success!'
j. Deploy your web app with gcloud app deploy
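k. To schedule the endpoint, App Engine reads a cron.yaml file deployed alongside the app. A minimal example (the url below assumes your Flask route is /run_notebook; adjust it to match your app, and deploy it with gcloud app deploy cron.yaml):

```yaml
cron:
- description: "execute notebook hourly"
  url: /run_notebook
  schedule: every 1 hours
```

App Engine will then hit that URL on the schedule, which kicks off the Compute Engine instance and drops output.ipynb in your bucket.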
A little bit of legwork up front can set you and your team up with a steady flow of data without worrying about pushing a button every hour. Let me know in the comments if you run into trouble!