Tanmay Prashant Rane for AWS Community Builders

SageMaker Lifecycle Configurations for custom environment

Preface

Amazon SageMaker is great for data scientists and machine learning engineers who want to explore data and build models. The range of preinstalled packages is enough for most scenarios, but when you need a new or specific version of a package, you can install it with the pip or conda package managers.

What will this blog give you?

A template you can bend and flex to your requirements to get lifecycle configurations done fast. 🚀

The Problem 🧠

The issue is that every time I turn the notebook instance off, the libraries I installed manually are lost, so reinstalling them becomes a routine every time the instance starts. I would much rather have my environment and libraries set up as soon as the instance starts.

Preferences 🧑‍💻

  1. I like to start a new environment with a fresh Python installation using conda. Here's a one-liner for that:

conda create -n <env_name> python=<python version>

conda create -n dev_env python=3.9

  2. Then we can activate the environment:

conda activate <env_name>

conda activate dev_env

  3. Once inside the environment, I like to install packages with pip:

pip install <package name>

pip install pyarrow
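The three steps above can be wrapped in one small function so that the environment name, Python version and package list live in one place. This is only a sketch under a few assumptions: `setup_env`, `run` and `DRY_RUN` are hypothetical names introduced here, and the sketch uses `conda run` instead of `conda activate` so that it also works in non-interactive shells.

```shell
#!/bin/bash
# Sketch only: setup_env, run and DRY_RUN are illustrative names,
# not part of conda or SageMaker.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Usage: setup_env <env_name> <python_version> <package>...
setup_env() {
  local env_name="$1" python_version="$2"
  shift 2
  run conda create -y -n "$env_name" "python=$python_version"
  # conda run executes a command inside the environment without activating it
  local pkg
  for pkg in "$@"; do
    run conda run -n "$env_name" pip install "$pkg"
  done
}

# Preview the commands without touching the machine
DRY_RUN=1 setup_env dev_env 3.9 pyarrow
```

With `DRY_RUN=1` the call only prints the `conda create` and `pip install` commands it would run, which is a cheap way to check the arguments before putting them in a lifecycle script.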

Let's put this in a lifecycle configuration shell script, shall we! 📜

The following script takes care of these tasks:

  1. Create an environment with the desired version of Python, in this case Python 3.9.
  2. Install the required libraries with pip, in this case pandas, pyarrow and TensorFlow.
  3. Make the newly created environment accessible to notebooks. Please read the comments in the script.

#!/bin/bash

set -e

# use ec2-user for operations
sudo -u ec2-user -i <<'EOF'

# environment creation
conda create -n dev_env python=3.9 -y
source activate dev_env

# library installation
pip install pandas
pip install pyarrow
pip install tensorflow==2.9

# make new environment accessible as a notebook kernel
conda install ipykernel -y

source deactivate

EOF
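One thing to keep in mind with the script above: SageMaker stops a lifecycle configuration script that runs for longer than 5 minutes, and a large install such as TensorFlow can get close to that limit. A common workaround is to start the slow part in the background so the on-start script returns quickly. The following is a minimal sketch of that idea, not the author's method; `run_detached` and the paths in the usage comment are hypothetical.

```shell
#!/bin/bash
set -e

# Sketch only: run_detached and the paths in the usage comment are hypothetical.

# Start a command in the background, detached from this script, with all of
# its output captured in a log file. Prints the PID of the background job.
run_detached() {
  local log_file="$1"
  shift
  nohup "$@" > "$log_file" 2>&1 &
  echo "$!"
}

# In an on-start script this could wrap the slow part, for example:
#   run_detached /home/ec2-user/SageMaker/env_setup.log \
#     sudo -u ec2-user -i bash /home/ec2-user/SageMaker/env_setup.sh
```

The trade-off is that the notebook becomes available before the environment is ready, so the kernel may only show up a few minutes after the instance starts. The log file is the place to look if it never does.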

Script placement 🪄

1. Locate lifecycle configurations in console
(Screenshot: Lifecycle configuration in SageMaker console)

2. Create lifecycle configuration to run on notebook start
(Screenshot: Lifecycle configuration creation)

3. Add lifecycle configuration to notebook instance configurations
(Screenshot: SageMaker notebook instance lifecycle configuration)
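The same placement can also be scripted with the AWS CLI instead of the console. This is a rough sketch, assuming the CLI is installed and configured with permissions for SageMaker; the configuration name, instance name and script file name are placeholders, and `encode_script` and `register_config` are hypothetical helper names.

```shell
#!/bin/bash
set -e

# Sketch only: these three values are placeholders, replace them with your own.
CONFIG_NAME="dev-env-on-start"
INSTANCE_NAME="my-notebook-instance"
SCRIPT_FILE="on-start.sh"

# The API expects the script content as a single-line base64 string.
encode_script() {
  base64 < "$1" | tr -d '\n'
}

register_config() {
  # Create a lifecycle configuration whose on-start hook is our script
  aws sagemaker create-notebook-instance-lifecycle-config \
    --notebook-instance-lifecycle-config-name "$CONFIG_NAME" \
    --on-start "Content=$(encode_script "$SCRIPT_FILE")"

  # Attach it to the instance; the instance has to be stopped for this update
  aws sagemaker update-notebook-instance \
    --notebook-instance-name "$INSTANCE_NAME" \
    --lifecycle-config-name "$CONFIG_NAME"
}

# Nothing is sent to AWS unless the script is called with --apply
if [ "${1:-}" = "--apply" ]; then
  register_config
fi
```

Keeping the script in a file and registering it this way makes it easier to version the configuration alongside your notebooks than pasting it into the console editor.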

Results 💡

Kernel Selection!
See the kernel listed in the dropdown 👀
(Screenshot: Kernel dropdown check)

A couple of checks in the terminal
And we have what we need!

(Screenshot: Terminal environment check)
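The terminal checks can be turned into two small helpers if you want to repeat them after every start. This is a sketch, not the exact commands from the screenshot: `env_exists` and `can_import` are hypothetical names, and the helpers assume `conda` is on the PATH of the terminal session.

```shell
#!/bin/bash
# Sketch only: env_exists and can_import are hypothetical helper names.

# Succeeds when `conda env list` reports an environment with this exact name.
env_exists() {
  conda env list | awk '{print $1}' | grep -qx "$1"
}

# Succeeds when every listed module imports inside the environment.
# Usage: can_import <env_name> <module>...
can_import() {
  local env_name="$1"
  shift
  local modules
  modules="$(IFS=,; echo "$*")"   # join module names with commas
  conda run -n "$env_name" python -c "import $modules"
}

# Example, once the instance has started:
#   env_exists dev_env && can_import dev_env pandas pyarrow tensorflow
```

Both helpers report through their exit status only, so they can be chained with `&&` or used in an `if` without parsing any output.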

There we have it 🪄. Let me know if it helped you, and if you know a better method, please comment and let me know! This is only one example of the endless applications of lifecycle configurations. You can find more examples here: SageMaker Lifecycle Configuration scripts
