This article is based on this post; however, in this case I'm using CDK V2.
You can take advantage of the benefits of Lambda for machine learning model inference, even with large libraries or pre-trained models.
Solution overview
For the inference Lambda function we use the Docker implementation in order to import the necessary libraries and load the ML model. We do it this way because of the Lambda deployment package size limits (10 GB for container images and 50 MB for .zip files).
We mount Amazon EFS as the file system of the inference Lambda function, which makes it easier and faster to load large models and files into memory for ML inference workloads.
To upload the ML models to the file system, we use a Lambda function that is triggered when you upload a model to your S3 bucket (the architecture is shown in the next diagram).
Architecture overview
The following diagram illustrates the architecture of the solution.
AWS services
- Amazon VPC (required for Amazon EFS)
- Amazon S3
- Amazon EFS
- AWS Lambda
- Amazon API Gateway
Steps to create this architecture from scratch
We are using AWS CDK V2 with TypeScript.
The Docker image for inference was created with scikit-learn as its requirement. You can customize the requirements file to use whichever machine learning framework you want, such as TensorFlow, PyTorch, XGBoost, etc.
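If you want to follow along, these are the imports the snippets below assume. They use the standard aws-cdk-lib v2 module paths; Size and CfnOutput are only needed for the optional sketches further down, so drop them if you don't use those.

import { CfnOutput, Duration, RemovalPolicy, Size } from "aws-cdk-lib";
import { Vpc } from "aws-cdk-lib/aws-ec2";
import { FileSystem, ThroughputMode } from "aws-cdk-lib/aws-efs";
import { Bucket, BucketEncryption, EventType } from "aws-cdk-lib/aws-s3";
import {
  DockerImageCode,
  DockerImageFunction,
  FileSystem as LambdaFileSystem,
} from "aws-cdk-lib/aws-lambda";
import { NodejsFunction } from "aws-cdk-lib/aws-lambda-nodejs";
import { S3EventSource } from "aws-cdk-lib/aws-lambda-event-sources";
import { LambdaIntegration, RestApi } from "aws-cdk-lib/aws-apigateway";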
1) Create VPC, EFS and Access Point
const vpc = new Vpc(this, "MLVpc");
We create a VPC because EFS needs to be inside one.
const fs = new FileSystem(this, "MLFileSystem", {
  vpc,
  removalPolicy: RemovalPolicy.DESTROY,
  throughputMode: ThroughputMode.BURSTING,
  fileSystemName: "ml-models-efs",
});
Here, we create the EFS file system within our VPC. Something to keep in mind is the throughput mode.
In bursting mode, the throughput of your file system depends on how much data you're storing in it: the number of burst credits scales with the amount of storage. Therefore, if you expect your ML models to grow over time and your throughput needs grow roughly in proportion to that size, bursting mode is a good fit.
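If instead you need a fixed baseline of throughput that does not depend on how much data is stored, EFS also supports provisioned mode. Here is a minimal sketch of that variation of the file system above (the 10 MiB/s figure is an assumption, not a recommendation; Size comes from aws-cdk-lib):

// Alternative: provisioned throughput, independent of the amount of stored data
const fsProvisioned = new FileSystem(this, "MLFileSystemProvisioned", {
  vpc,
  throughputMode: ThroughputMode.PROVISIONED,
  provisionedThroughputPerSecond: Size.mebibytes(10), // assumed value; tune for your workload
});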
const accessPoint = fs.addAccessPoint("LambdaAccessPoint", {
  createAcl: {
    ownerGid: "1001",
    ownerUid: "1001",
    permissions: "750",
  },
  path: "/export/lambda",
  posixUser: {
    gid: "1001",
    uid: "1001",
  },
});
Here, we add an access point that our Lambda functions will use to connect to EFS.
2) Create S3 Bucket and Lambda function (to upload content from S3 to EFS)
const bucket = new Bucket(this, "MLModelsBucket", {
  encryption: BucketEncryption.S3_MANAGED,
  bucketName: "machine-learning.models",
});
This is a basic configuration of an S3 bucket.
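If this stack is just for experimentation, you might also want the bucket and its objects to be removed on cdk destroy. A variation of the bucket definition above that does this (for throwaway stacks only, since it deletes your models):

// Variation for throwaway stacks: removes the bucket and its objects with the stack
const bucket = new Bucket(this, "MLModelsBucket", {
  encryption: BucketEncryption.S3_MANAGED,
  bucketName: "machine-learning.models",
  removalPolicy: RemovalPolicy.DESTROY,
  autoDeleteObjects: true,
});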
const MODEL_DIR = "/mnt/ml";

const loadFunction = new NodejsFunction(this, "HandleModelUploaded", {
  functionName: "handle-model-uploaded",
  entry: `${__dirname}/functions/model-uploaded/handler.ts`,
  handler: "handler",
  environment: {
    MACHINE_LEARNING_MODELS_BUCKET_NAME: bucket.bucketName,
    MODEL_DIR,
  },
  vpc,
  filesystem: LambdaFileSystem.fromEfsAccessPoint(accessPoint, MODEL_DIR),
});
// Permission settings
bucket.grantRead(loadFunction);

loadFunction.addEventSource(
  new S3EventSource(bucket, {
    events: [EventType.OBJECT_CREATED],
  })
);
Here, there are a few things to consider:
1) VPC: You must set up the Lambda function in the same VPC as the EFS file system.
2) MODEL_DIR: This is the path where you're going to store and load the machine learning models.
3) Event source: Every new object created in the bucket will trigger this Lambda function (see the sketch below).
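As a reference, here is a minimal sketch of what functions/model-uploaded/handler.ts could look like. It assumes AWS SDK v3 and simply streams each uploaded object into MODEL_DIR; this is an illustration, not necessarily the repository's actual implementation.

// functions/model-uploaded/handler.ts (sketch)
import { S3Client, GetObjectCommand } from "@aws-sdk/client-s3";
import { S3Event } from "aws-lambda";
import { createWriteStream } from "fs";
import { pipeline } from "stream/promises";
import { Readable } from "stream";
import * as path from "path";

const s3 = new S3Client({});
const MODEL_DIR = process.env.MODEL_DIR!;

export const handler = async (event: S3Event): Promise<void> => {
  for (const record of event.Records) {
    const bucket = record.s3.bucket.name;
    const key = decodeURIComponent(record.s3.object.key.replace(/\+/g, " "));

    // Download the uploaded model from S3 and stream it onto the EFS mount
    const { Body } = await s3.send(new GetObjectCommand({ Bucket: bucket, Key: key }));
    const destination = path.join(MODEL_DIR, path.basename(key));
    await pipeline(Body as Readable, createWriteStream(destination));
  }
};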
3) Lambda function (to run inference) and API Gateway
const inferenceFunction = new DockerImageFunction(this, "InferenceModel", {
  functionName: "inference-model",
  code: DockerImageCode.fromImageAsset(
    `${__dirname}/functions/model-inference`
  ),
  environment: {
    MODEL_DIR,
  },
  memorySize: 10240,
  timeout: Duration.seconds(30),
  vpc,
  filesystem: LambdaFileSystem.fromEfsAccessPoint(accessPoint, MODEL_DIR),
});
We're using a Docker image for this Lambda function because of the package size. Remember the Lambda deployment package size limits (10 GB for container images and 50 MB for .zip files).
We set up the maximum memory allowed (10,240 MB) because we keep the loaded model in an internal cache, to avoid loading it on every request.
const api = new RestApi(this, "ApiGateway", {
  restApiName: "inference-api",
  defaultCorsPreflightOptions: {
    allowHeaders: ["*"],
    allowMethods: ["*"],
    allowOrigins: ["*"],
  },
});

const inference = api.root.addResource("inference");
inference.addMethod(
  "POST",
  new LambdaIntegration(inferenceFunction, { proxy: true })
);
Finally, we create a REST API with API Gateway. We use API Gateway V1 (REST API) because CDK V2 hasn't released the API Gateway V2 constructs as stable yet.
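To avoid hunting for the invoke URL in the console, you can optionally print the inference endpoint when the stack deploys. A small sketch (CfnOutput comes from aws-cdk-lib; the output name is arbitrary):

// Optional: print the inference endpoint URL after `cdk deploy`
new CfnOutput(this, "InferenceEndpoint", {
  value: `${api.url}inference`,
});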
GitHub
Let's take a look at the GitHub repository.