Goals:
- Run notebook files in a Lambda function.
- Allow notebooks to install their own dependencies.
In this instance, I've used the Serverless Framework, but the problems solved likely apply to other frameworks too. After trying a number of approaches, I found that the following works within the constraints of Lambda's read-only file system, where only /tmp is writable:
- Create a dedicated workspace in /tmp.
- Copy the notebook into the workspace, along with a script that creates and activates a virtual environment and then executes the notebook.
- Run the script in a child process and let it run to completion.
Starting with the `serverless.yml` file, note that `IPYTHONDIR` must be set to a directory under `/tmp`, since the rest of a Lambda function's file system is read-only:
```yaml
service: nb-exec
frameworkVersion: '3'

provider:
  name: aws

functions:
  hello:
    handler: handler.hello
    environment:
      IPYTHONDIR: /tmp/ipythondir

plugins:
  - serverless-python-requirements

custom:
  pythonRequirements:
    fileName: requirements.txt
    dockerizePip: true

package:
  patterns:
    - "!.venv/**"
    - "!node_modules/**"
```
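If you'd rather keep this setting next to the code, one alternative is to set the variable from the handler module itself. Processes started with `subprocess.run` inherit `os.environ` by default, so the script and the notebook kernel it launches would see the same value. This is a minimal sketch, not part of the original setup; `ensure_ipython_dir` is a hypothetical helper name.

```python
import os


def ensure_ipython_dir(path="/tmp/ipythondir"):
    """Point IPython at a writable directory unless IPYTHONDIR is already set.

    Hypothetical helper: an in-code alternative to the `environment` block in
    serverless.yml. An existing IPYTHONDIR value is left untouched.
    """
    ipython_dir = os.environ.setdefault("IPYTHONDIR", path)
    # Create the directory up front so it exists before any child process runs.
    os.makedirs(ipython_dir, exist_ok=True)
    return ipython_dir


# Run at import time so the value is in place for every invocation.
ensure_ipython_dir()
```

Because `setdefault` only fills in a missing value, the `serverless.yml` setting still wins if both are present.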
The `requirements.txt` file lists the packages used to execute the notebooks:
```text
nbconvert==7.9.2
ipython==8.16.1
ipykernel==6.25.2
```
Next, the handler in `handler.py` (referenced as `handler.hello` in `serverless.yml`):
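```python
import os
import shutil
import subprocess
import uuid


def hello(event, context):
    # Give each invocation its own workspace under /tmp, the only writable
    # location in the Lambda file system.
    unique_id = str(uuid.uuid4())
    workspace_path = os.path.join("/tmp", f"workspace_{unique_id}")
    os.makedirs(workspace_path, exist_ok=True)

    # Copy the runner script and the notebook out of the read-only
    # deployment package and into the workspace.
    shutil.copy("execute.sh", workspace_path)
    notebook_dir_path = os.path.join(workspace_path, "notebook")
    os.makedirs(notebook_dir_path, exist_ok=True)
    shutil.copy("example.ipynb", notebook_dir_path)

    # Run the script from inside the workspace and block until it finishes;
    # check=True raises if the script exits with a non-zero status.
    execute_script_path = os.path.join(workspace_path, "execute.sh")
    subprocess.run(["bash", execute_script_path], cwd=workspace_path, check=True)
```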
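The handler never removes its workspace. Lambda can reuse a warm execution environment, and `/tmp` (512 MB by default) persists between invocations in that environment, so workspaces may pile up. Below is a minimal sketch of one way to avoid that, assuming you don't need the workspace after the script finishes: `tempfile.TemporaryDirectory` creates a uniquely named directory and deletes it when the `with` block exits. `run_in_workspace` and its `files` mapping (source path to a path inside the workspace) are hypothetical names, not part of the original setup.

```python
import os
import shutil
import subprocess
import tempfile


def run_in_workspace(files, script_name, base_dir="/tmp"):
    """Copy files into a fresh workspace, run script_name there, then clean up.

    Hypothetical helper. `files` maps a source path to a destination path
    relative to the workspace. Returns the CompletedProcess of the script.
    """
    # The directory (e.g. /tmp/workspace_abc123) is removed when the block
    # exits, even if the script fails and check=True raises.
    with tempfile.TemporaryDirectory(prefix="workspace_", dir=base_dir) as workspace:
        for src, rel_dest in files.items():
            dest = os.path.join(workspace, rel_dest)
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            shutil.copy(src, dest)

        return subprocess.run(
            ["bash", os.path.join(workspace, script_name)],
            cwd=workspace,
            capture_output=True,
            text=True,
            check=True,
        )
```

In the handler this would be called as `run_in_workspace({"execute.sh": "execute.sh", "example.ipynb": "notebook/example.ipynb"}, "execute.sh")`. Anything the notebook writes is deleted along with the workspace, so extend the `with` block to read any outputs you need before it exits.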
And finally, the `execute.sh` file:
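```bash
# Stop at the first failing command so errors reach the handler.
set -e

# Make sure dependencies can be picked up from the deployment directory, as
# well as the built-in AWS runtime dependencies.
export PYTHONPATH=$LAMBDA_TASK_ROOT:$LAMBDA_RUNTIME_DIR

# Create a virtual environment in the writable workspace that inherits
# these dependencies.
python3 -m venv .venv --system-site-packages
source .venv/bin/activate

# Execute the notebook; the executed copy is written next to the input
# as notebook/example.nbconvert.ipynb.
python3 -m nbconvert --to notebook --execute ./notebook/example.ipynb
```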
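By default nbconvert writes the executed copy next to the input, as `notebook/example.nbconvert.ipynb` in the workspace, and the handler never reads it. Here is a minimal sketch of pulling the text outputs out of that file using only the standard `json` module. It assumes the nbformat v4 layout (a top-level `cells` list whose code cells carry an `outputs` list); `collect_text_outputs` is a hypothetical helper name.

```python
import json


def collect_text_outputs(path):
    """Return the plain-text outputs of every code cell in an executed notebook.

    Assumes the nbformat v4 JSON layout: a top-level "cells" list whose code
    cells carry an "outputs" list.
    """
    with open(path, encoding="utf-8") as f:
        notebook = json.load(f)

    def as_text(value):
        # Multi-line strings may be stored as a list of lines.
        return "".join(value) if isinstance(value, list) else value

    texts = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        for output in cell.get("outputs", []):
            kind = output.get("output_type")
            if kind == "stream":
                texts.append(as_text(output.get("text", "")))
            elif kind in ("execute_result", "display_data"):
                plain = output.get("data", {}).get("text/plain")
                if plain is not None:
                    texts.append(as_text(plain))
            elif kind == "error":
                texts.append(f"{output.get('ename')}: {output.get('evalue')}")
    return texts
```

From the handler, this could be called as `collect_text_outputs(os.path.join(notebook_dir_path, "example.nbconvert.ipynb"))`. If a cell raises, nbconvert stops and exits with a non-zero status by default; passing `--allow-errors` records the error in the output file instead.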
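One problem I haven't solved is the following error when installing dependencies from within a cell using the `!` shell escape (the message suggests it needs a pseudo-terminal, which the Lambda environment doesn't provide):

```text
!pip install pandas
```

```text
Error: out of pty devices
```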
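Replacing it with a plain `subprocess` call does seem to work fine. Invoking pip through `sys.executable` makes it install into the same environment the kernel is running in:

```python
import subprocess
import sys

subprocess.run([sys.executable, "-m", "pip", "install", "pandas"], check=True)
```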
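If several cells need optional packages, the install call can be wrapped so that it only runs when the import fails. This is a minimal sketch for use inside a notebook cell; `ensure_package` is a hypothetical helper name. `importlib.invalidate_caches()` is called after installing so the import system notices the newly added package within the running kernel.

```python
import importlib
import subprocess
import sys


def ensure_package(module_name, requirement=None):
    """Import module_name, pip-installing `requirement` first if it is missing.

    Hypothetical helper. `requirement` defaults to module_name; pass it when
    the pip name differs (e.g. module "yaml", requirement "pyyaml").
    """
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # Use the kernel's own interpreter rather than the "!" shell escape.
        subprocess.run(
            [sys.executable, "-m", "pip", "install", requirement or module_name],
            check=True,
        )
        # Refresh finder caches so the freshly installed package is found.
        importlib.invalidate_caches()
        return importlib.import_module(module_name)
```

In a cell, `pd = ensure_package("pandas")` then behaves like `import pandas as pd`, installing pandas only on the first run in a fresh workspace.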
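Note that running untrusted code in a Lambda function is not secure: because warm execution environments (including their /tmp) are reused, code in one invocation may see what earlier invocations left behind, and it can reach any AWS resources the function's execution role allows.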