This article is based on this post; however, in this case I'm using CDK V2.
You can take advantage of the benefits of Lambda for machine learning model inference, even with large libraries or pre-trained models.
Solution overview
For the inference Lambda function we use the Docker implementation in order to import the necessary libraries and load the ML model. We do it this way because of the Lambda deployment package size limits (10 GB for container images and 50 MB for .zip files).
We mount Amazon EFS as the file system of the inference Lambda function, which makes it easier and faster to load large models and files into memory for ML inference workloads.
To upload the ML models to the file system, we use a Lambda function that is triggered when you upload a model to your S3 bucket (the architecture is shown in the next diagram).
Architecture overview
The following diagram illustrates the architecture of the solution.
AWS services
- Amazon VPC (required for Amazon EFS)
- Amazon S3
- Amazon EFS
- AWS Lambda
- Amazon API Gateway
Steps to create this architecture from scratch
We are using AWS CDK V2 with TypeScript.
The Docker image for inference was created with scikit-learn as its requirement. You can customize the requirements file to use whichever machine learning framework you want, such as TensorFlow, PyTorch, XGBoost, etc.
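If you want to follow along, these are the imports the snippets below assume. They use the standard aws-cdk-lib v2 module paths; Size and CfnOutput are only needed for the optional sketches further down, so drop them if you don't use those.

import { CfnOutput, Duration, RemovalPolicy, Size } from "aws-cdk-lib";
import { Vpc } from "aws-cdk-lib/aws-ec2";
import { FileSystem, ThroughputMode } from "aws-cdk-lib/aws-efs";
import { Bucket, BucketEncryption, EventType } from "aws-cdk-lib/aws-s3";
import {
  DockerImageCode,
  DockerImageFunction,
  FileSystem as LambdaFileSystem,
} from "aws-cdk-lib/aws-lambda";
import { NodejsFunction } from "aws-cdk-lib/aws-lambda-nodejs";
import { S3EventSource } from "aws-cdk-lib/aws-lambda-event-sources";
import { LambdaIntegration, RestApi } from "aws-cdk-lib/aws-apigateway";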
1) Create VPC, EFS and Access Point
const vpc = new Vpc(this, "MLVpc");
We create a VPC because EFS needs to be inside one.
const fs = new FileSystem(this, "MLFileSystem", {
  vpc,
  removalPolicy: RemovalPolicy.DESTROY,
  throughputMode: ThroughputMode.BURSTING,
  fileSystemName: "ml-models-efs",
});
Here, we create the EFS file system within our VPC. Something to keep in mind is the throughput mode.
In bursting mode, the throughput of your file system depends on how much data you're storing in it: the number of burst credits scales with the amount of storage. Therefore, if you expect your ML models to grow over time and your throughput needs grow roughly in proportion to that size, bursting mode is a good fit.
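If instead you need a fixed baseline of throughput that does not depend on how much data is stored, EFS also supports provisioned mode. Here is a minimal sketch of that variation of the file system above (the 10 MiB/s figure is an assumption, not a recommendation; Size comes from aws-cdk-lib):

// Alternative: provisioned throughput, independent of the amount of stored data
const fsProvisioned = new FileSystem(this, "MLFileSystemProvisioned", {
  vpc,
  throughputMode: ThroughputMode.PROVISIONED,
  provisionedThroughputPerSecond: Size.mebibytes(10), // assumed value; tune for your workload
});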
const accessPoint = fs.addAccessPoint("LambdaAccessPoint", {
  createAcl: {
    ownerGid: "1001",
    ownerUid: "1001",
    permissions: "750",
  },
  path: "/export/lambda",
  posixUser: {
    gid: "1001",
    uid: "1001",
  },
});
Here, we add an access point that our Lambda functions will use to connect to EFS.
2) Create S3 Bucket and Lambda function (to upload content from S3 to EFS)
const bucket = new Bucket(this, "MLModelsBucket", {
  encryption: BucketEncryption.S3_MANAGED,
  bucketName: "machine-learning.models",
});
This is a basic configuration of an S3 bucket.
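If this stack is just for experimentation, you might also want the bucket and its objects to be removed on cdk destroy. A variation of the bucket definition above that does this (for throwaway stacks only, since it deletes your models):

// Variation for throwaway stacks: removes the bucket and its objects with the stack
const bucket = new Bucket(this, "MLModelsBucket", {
  encryption: BucketEncryption.S3_MANAGED,
  bucketName: "machine-learning.models",
  removalPolicy: RemovalPolicy.DESTROY,
  autoDeleteObjects: true,
});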
const MODEL_DIR = "/mnt/ml";

const loadFunction = new NodejsFunction(this, "HandleModelUploaded", {
  functionName: "handle-model-uploaded",
  entry: `${__dirname}/functions/model-uploaded/handler.ts`,
  handler: "handler",
  environment: {
    MACHINE_LEARNING_MODELS_BUCKET_NAME: bucket.bucketName,
    MODEL_DIR,
  },
  vpc,
  filesystem: LambdaFileSystem.fromEfsAccessPoint(accessPoint, MODEL_DIR),
});
// Permission settings
bucket.grantRead(loadFunction);

loadFunction.addEventSource(
  new S3EventSource(bucket, {
    events: [EventType.OBJECT_CREATED],
  })
);
Here, there are a few things to consider:
1) VPC: You must set up the Lambda function in the same VPC as the EFS file system.
2) MODEL_DIR: This is the path where you're going to store and load the machine learning models.
3) Event source: Every new object created in the bucket will trigger this Lambda function (see the sketch below).
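As a reference, here is a minimal sketch of what functions/model-uploaded/handler.ts could look like. It assumes AWS SDK v3 and simply streams each uploaded object into MODEL_DIR; this is an illustration, not necessarily the repository's actual implementation.

// functions/model-uploaded/handler.ts (sketch)
import { S3Client, GetObjectCommand } from "@aws-sdk/client-s3";
import { S3Event } from "aws-lambda";
import { createWriteStream } from "fs";
import { pipeline } from "stream/promises";
import { Readable } from "stream";
import * as path from "path";

const s3 = new S3Client({});
const MODEL_DIR = process.env.MODEL_DIR!;

export const handler = async (event: S3Event): Promise<void> => {
  for (const record of event.Records) {
    const bucket = record.s3.bucket.name;
    const key = decodeURIComponent(record.s3.object.key.replace(/\+/g, " "));

    // Download the uploaded model from S3 and stream it onto the EFS mount
    const { Body } = await s3.send(new GetObjectCommand({ Bucket: bucket, Key: key }));
    const destination = path.join(MODEL_DIR, path.basename(key));
    await pipeline(Body as Readable, createWriteStream(destination));
  }
};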
3) Lambda function (to run inference) and API Gateway
const inferenceFunction = new DockerImageFunction(this, "InferenceModel", {
  functionName: "inference-model",
  code: DockerImageCode.fromImageAsset(
    `${__dirname}/functions/model-inference`
  ),
  environment: {
    MODEL_DIR,
  },
  memorySize: 10240,
  timeout: Duration.seconds(30),
  vpc,
  filesystem: LambdaFileSystem.fromEfsAccessPoint(accessPoint, MODEL_DIR),
});
We're using a Docker image for this Lambda function because of the package size. Remember the Lambda deployment package size limits (10 GB for container images and 50 MB for .zip files).
We set up the maximum memory allowed (10,240 MB) because we keep the loaded model in an internal cache, to avoid loading it on every request.
const api = new RestApi(this, "ApiGateway", {
  restApiName: "inference-api",
  defaultCorsPreflightOptions: {
    allowHeaders: ["*"],
    allowMethods: ["*"],
    allowOrigins: ["*"],
  },
});

const inference = api.root.addResource("inference");
inference.addMethod(
  "POST",
  new LambdaIntegration(inferenceFunction, { proxy: true })
);
Finally, we create a REST API with API Gateway. We use API Gateway V1 (REST API) because CDK V2 hasn't released the API Gateway V2 constructs as stable yet.
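To avoid hunting for the invoke URL in the console, you can optionally print the inference endpoint when the stack deploys. A small sketch (CfnOutput comes from aws-cdk-lib; the output name is arbitrary):

// Optional: print the inference endpoint URL after `cdk deploy`
new CfnOutput(this, "InferenceEndpoint", {
  value: `${api.url}inference`,
});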
GitHub
Let's take a look at the GitHub repository.