After a lot of struggle doing this, I finally found a simple way.
IMPORTANT:
I've discovered that if you want to save a model/pipeline and load it again later without running into ModuleNotFoundErrors, you need to make sure the model is built in the same place it gets saved. In the case of a neural network, that means compiling, fitting, and saving in the same module. This was a big headache for me, so I hope you can avoid it.
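For example, "built and saved in the same module" looks roughly like this for a small neural network. This is a minimal sketch, assuming a TensorFlow/Keras version recent enough that models can be pickled; the layer sizes and dummy data are made up for illustration:
import joblib
import numpy as np
import tensorflow as tf

# Dummy data, just for illustration
X = np.random.rand(100, 4)
y = np.random.rand(100, 1)

# Compile, fit, and save all in this one module
model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=1, verbose=0)

joblib.dump(model, "model.pkl")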
We can write and read TensorFlow and sklearn models/pipelines using joblib.
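The model object in the snippets below can be anything joblib can pickle. For instance, it could just as well be an sklearn pipeline; here's a minimal, hypothetical one (the steps are placeholders):
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# A hypothetical pipeline to stand in for `model` in the examples below
model = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression()),
])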
Local Write / Read
import joblib
from pathlib import Path

path = Path(<local path>)

# WRITE
with path.open("wb") as f:
    joblib.dump(model, f)

# READ
with path.open("rb") as f:
    model = joblib.load(f)
We can do the same thing on AWS S3 using a boto3 client:
AWS S3 Write / Read
import tempfile

import boto3
import joblib

s3_client = boto3.client('s3')
bucket_name = "my-bucket"
key = "model.pkl"

# WRITE
with tempfile.TemporaryFile() as fp:
    joblib.dump(model, fp)
    fp.seek(0)
    s3_client.put_object(Body=fp.read(), Bucket=bucket_name, Key=key)

# READ
with tempfile.TemporaryFile() as fp:
    s3_client.download_fileobj(Fileobj=fp, Bucket=bucket_name, Key=key)
    fp.seek(0)
    model = joblib.load(fp)

# DELETE
s3_client.delete_object(Bucket=bucket_name, Key=key)
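If the serialized model is large, boto3's upload_fileobj is a reasonable alternative to put_object, since it uploads the file object in chunks rather than reading the whole pickle into memory with fp.read(). A sketch of the same write using that call, with the same bucket and key as above:
# WRITE (streaming upload; avoids holding the entire pickle in memory at once)
with tempfile.TemporaryFile() as fp:
    joblib.dump(model, fp)
    fp.seek(0)
    s3_client.upload_fileobj(Fileobj=fp, Bucket=bucket_name, Key=key)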