Matia Rašetina

Posted on Jun 6

How to Deploy Production-Ready AI Models on AWS Lambda Without the $500/Month Bill

#aws #ai #serverless #docker

If you are an aspiring data scientist, ML engineer or DevOps engineer looking to expand your knowledge of AWS services, you’ve come to the right post!

Many ML engineers struggle to deploy models without incurring SageMaker costs. This post shows you how to deploy real-time ML inference using only Lambda and Docker—fully serverless, low-cost, and production-grade.

In this tutorial, we will go over a small project in which we build an AWS Lambda function, packaged as a Docker container image, that predicts the injury duration of soccer players based on the injury description. The technologies we are going to use in this project are:

Python
Docker
AWS services:
- Lambda
- API Gateway
- CloudFormation
- Elastic Container Registry

4 simple steps we are going to take to complete this project are:

Create a script which is going to train and optimize an AI model based on the dataset
Create the Lambda code which will take in the parameters based on which we need to predict the injury duration
Create the CloudFormation template which will build our Lambda by using Docker
Deploy the mentioned CloudFormation template to our AWS environment

The GitHub repository can be found here: https://github.com/mate329/dockerized-AI-Lambda

Important note - all resources and identifiers shown were used in a temporary environment and have since been deleted.

Let’s start!

Training and optimization script for AI Model

Data Description

The dataset we use to train our AI model describes soccer injuries. Injuries are a common occurrence in any sport, so any professional sports club would benefit from a system that predicts the length of an injury from its description. Here are the parameters which are used to describe an injury:

The dataset has been downloaded from https://stemgames.hr/en/event/competition-in-problem-solving-exercises/technology-arena/

Libraries used

Libraries which we are going to import into our training script:

NumPy - used for numerical computing (here, taking the square root when computing RMSE)
Pandas - used for data manipulation, analysis and loading CSV files
the star of our training process - Optuna - used for hyperparameter optimization of AI models
Pickle and Joblib - saving and loading Python objects (Joblib is what we use to save the trained model)
JSON - standard library for working with JSON data
XGBoost - our AI library containing the regressor model which we are going to use
Scikit-Learn - used for splitting data into training and test datasets and for computing evaluation metrics

import numpy as np 
import pandas as pd 
import optuna
import pickle
import joblib
import json
from xgboost import XGBRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

Loading the data and creating datasets

Next, we are going to load the data, get the list of features in our dataset and split the data into training and test datasets.

# Loading the data, where 'kaggle_x_train.csv' contains the injury description
# while 'kaggle_y_train.csv' contains the actual duration of the injury
data_train_x = pd.read_csv('data/kaggle_x_train.csv')
data_train_y = pd.read_csv('data/kaggle_y_train.csv')

# Since the description and actual duration of the injury are in two different
# files, we need to merge them into one object
merged_train = pd.merge(data_train_x, data_train_y[['Id', 'injury_duration']], on='Id')
merged_train.drop(columns=['Id'], inplace=True)

# Separate features and target variable
X = merged_train.drop('injury_duration', axis=1)
y = merged_train['injury_duration']

# Save feature names for Lambda validation
feature_names = list(X.columns)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

AI Model Optimization

Now we are coming to a part of the code where we will define the hyperparameter optimization of our AI model.

Hyperparameter optimization tries to find the combination of model settings (called hyperparameters) that gives the highest performance on a given dataset. To make this process as easy as possible, we are using the Python library Optuna, which does the optimization for us.

We will define a function called objective in which we define the parameters we want to optimize and their ranges. We then pass the params dictionary to the AI model, train it on our data and evaluate the results.
Root Mean Square Error (RMSE) is one of the metrics used for assessing model performance. It is the square root of the average squared difference between the actual and predicted injury duration, so it is expressed in days and penalizes large errors more heavily than small ones.
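To make the metric concrete, here is a small worked example that uses only the Python standard library. The durations are made-up numbers chosen for illustration, not values from the dataset, and the rmse helper is a hypothetical name.

```python
import math

def rmse(actual, predicted):
    """Root Mean Square Error: square root of the mean of the squared differences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical injury durations in days (illustrative values only)
actual_days = [10, 20, 30, 40]
predicted_days = [12, 18, 33, 37]

# Differences are -2, 2, -3, 3 -> squared: 4, 4, 9, 9 -> mean: 6.5
error = rmse(actual_days, predicted_days)
print(f"RMSE: {error:.2f} days")  # RMSE: 2.55 days
```

For the same numbers the mean absolute error would be 2.5 days. RMSE is slightly higher because the two 3-day misses count for more once they are squared.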

To start the optimization process, we call the Optuna library and create something called a study; think of it as starting an experiment. We set the direction to minimize, since we want the final model's predictions to be as close as possible to the actual injury duration. We then start the optimization by calling the optimize method and defining the number of trials we want to run.

def objective(trial):
    # Define hyperparameter space
    params = {
        'n_estimators': trial.suggest_int('n_estimators', 1, 100),
        'max_depth': trial.suggest_int('max_depth', 1, 10),
        'learning_rate': trial.suggest_float('learning_rate', 0.01, 0.3),
        'min_child_weight': trial.suggest_int('min_child_weight', 1, 10),
        'gamma': trial.suggest_float('gamma', 0, 0.3)
    }

    # Create and fit model
    model = XGBRegressor(**params)
    model.fit(X_train, y_train, eval_set=[(X_test, y_test)], verbose=False)

    # Make predictions and return RMSE
    preds = model.predict(X_test)
    mse = mean_squared_error(y_test, preds)
    rmse = np.sqrt(mse)
    return rmse

# Minimize RMSE
study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=200)

Feel free to experiment with the hyperparameter ranges; mine are just an example.
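To build some intuition for what a study does, here is a minimal sketch of the simplest possible strategy, random search, written with the standard library only. This is not how Optuna works internally (its default sampler is smarter than pure random sampling), and toy_objective is a hypothetical stand-in for "train a model and return its RMSE".

```python
import random

def toy_objective(params):
    """Stand-in for training a model; the score is lowest at max_depth=4, learning_rate=0.1."""
    return (params["max_depth"] - 4) ** 2 + 100 * (params["learning_rate"] - 0.1) ** 2

def random_search(objective, n_trials, seed=42):
    """Sample hyperparameters at random and keep the combination with the lowest score."""
    rng = random.Random(seed)
    best_params, best_value = None, float("inf")
    for _ in range(n_trials):
        params = {
            "max_depth": rng.randint(1, 10),           # plays the role of suggest_int
            "learning_rate": rng.uniform(0.01, 0.3),   # plays the role of suggest_float
        }
        value = objective(params)
        if value < best_value:  # equivalent to direction='minimize'
            best_params, best_value = params, value
    return best_params, best_value

best_params, best_value = random_search(toy_objective, n_trials=200)
print(f"Best parameters: {best_params}")
print(f"Best value: {best_value:.4f}")
```

The loop has the same shape as the Optuna study above: propose parameters, score them, remember the best. A library like Optuna mainly improves on how the next set of parameters is proposed.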

Saving needed artifacts for Lambda

At the end of our script, we save the files (called artifacts) which our Lambda is going to use. We use the joblib library to save the final model, trained with the best hyperparameters found during the study.

# Fetch the best hyperparameters found during the study
best_params = study.best_params
print(f"Best parameters: {best_params}")
print(f"Best RMSE: {study.best_value}")

# Train final model with best parameters and evaluate
final_model = XGBRegressor(**best_params)
final_model.fit(X_train, y_train)
final_preds = final_model.predict(X_test)

print("Final model performance:")
print(f"Best parameters: {best_params}")
print(f"RMSE: {np.sqrt(mean_squared_error(y_test, final_preds)):.4f}")

# Export artifacts for AWS Lambda
print("Exporting model artifacts...")

# 1. Save the trained model
joblib.dump(final_model, 'PredictInjuryDurationLambda/injury_model.pkl')
print("Model saved as 'PredictInjuryDurationLambda/injury_model.pkl'")

# 2. Save model metadata
metadata = {
    'feature_names': feature_names,
    'best_params': best_params,
    'model_performance': {
        'rmse': float(np.sqrt(mean_squared_error(y_test, final_preds)))
    },
    'training_date': pd.Timestamp.now().isoformat(),
    'n_features': len(feature_names)
}

with open('PredictInjuryDurationLambda/model_metadata.json', 'w') as f:
    json.dump(metadata, f, indent=2)
print("Metadata saved as 'PredictInjuryDurationLambda/model_metadata.json'")
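The metadata file is what later lets the Lambda check incoming requests against the features the model was trained on. The following sketch shows that round trip in isolation. It writes to a temporary directory, and the three feature names are an illustrative subset rather than the full list from X.columns.

```python
import json
import tempfile
from pathlib import Path

# Illustrative subset of feature names (the real list comes from X.columns)
feature_names = ["age", "is_contact", "swelling"]
metadata = {
    "feature_names": feature_names,
    "n_features": len(feature_names),
}

with tempfile.TemporaryDirectory() as tmp_dir:
    metadata_path = Path(tmp_dir) / "model_metadata.json"
    metadata_path.write_text(json.dumps(metadata, indent=2))

    # Reading the file back mirrors what the Lambda does when it starts
    loaded = json.loads(metadata_path.read_text())

# A cheap consistency check before trusting the file
if loaded["n_features"] != len(loaded["feature_names"]):
    raise ValueError("metadata is inconsistent")
print(loaded["feature_names"])
```

JSON preserves list order, so the feature order saved at training time is the order the Lambda sees.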

Here is the screenshot of the terminal window with Optuna library logging every trial which occurs during optimization, with the final parameters logged at the end:

And that’s it for our model training! Next, we’ll go over the Lambda code and walk step by step through how to dockerize it!

Lambda code and Dockerizing the model

To start off, let’s explain our architecture: the client sends a request to the API Gateway URL, which forwards it to the Lambda. The Lambda runs from the Docker image stored in Elastic Container Registry (ECR) and uses it to process the incoming request and predict the injury duration. After the prediction, the Lambda returns the result to the client via API Gateway.

Here is the architecture diagram:

Lambda code

The following Lambda code loads the incoming payload, validates that no information is missing from the payload and proceeds with the injury duration prediction:

import json
import joblib
import pandas as pd

from aws_lambda_powertools import Logger

# Initialize logger
logger = Logger(service="injury_prediction")

# Load artifacts once when Lambda container starts
model = joblib.load('injury_model.pkl')
with open('model_metadata.json', 'r') as f:
    metadata = json.load(f)

@logger.inject_lambda_context(log_event=True)
def lambda_handler(event, context):
    # Load the incoming event and set up the logger
    event_body = json.loads(event.get('body')) if 'body' in event else event
    request_id = context.aws_request_id
    logger.append_keys(request_id=request_id)

    # Validate event structure
    if 'features' not in event_body:
        logger.error(f"Invalid event structure: {event_body}")
        return {
            'statusCode': 400,
            'body': json.dumps({'error': 'Invalid event structure'})
        }

    # Extract features from event
    features = event_body['features']

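    # Validate features match training (key order in the JSON payload does not matter,
    # because the DataFrame columns are reordered before prediction)
    expected_features = metadata['feature_names']
    if set(features.keys()) != set(expected_features):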
        logger.error(f"Feature mismatch: expected {expected_features}, got {list(features.keys())}")
        return {
            'statusCode': 400,
            'body': json.dumps({'error': 'Feature mismatch'})
        }

    return perform_prediction(features, expected_features)

def perform_prediction(features, expected_features):
    try:
        # Create DataFrame with correct feature order
        X = pd.DataFrame([features])[expected_features]

        # Make prediction
        prediction = model.predict(X)[0]

        logger.info(f"Prediction made successfully: {prediction} days")

        # Return response
        return {
            'statusCode': 200,
            'body': json.dumps({
                'injury_duration_days': float(prediction),
                'model_version': metadata['training_date']
            })
        }
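    except Exception as e:
        # Log the full stack trace so the failure is visible in the Lambda logs
        logger.exception("Prediction failed")
        return {
            'statusCode': 500,
            'body': json.dumps({'error': str(e)})
        }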
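The two validation steps in the handler can be tried without AWS at all. The sketch below uses only the standard library. The helper names parse_event and check_features are hypothetical, and the three expected features are an illustrative subset. It assumes, as the handler does, that API Gateway delivers the request body as a JSON string, while a direct invocation passes the payload itself.

```python
import json

def parse_event(event):
    """Return the request payload whether it arrives via API Gateway or directly."""
    body = event.get("body")
    if body is None:
        return event                  # direct invocation: the event is the payload
    if isinstance(body, str):
        return json.loads(body)       # API Gateway: the body is a JSON string
    return body

def check_features(features, expected_features):
    """Return (missing, unexpected) feature names, ignoring key order."""
    missing = sorted(set(expected_features) - set(features))
    unexpected = sorted(set(features) - set(expected_features))
    return missing, unexpected

expected = ["age", "is_contact", "swelling"]  # illustrative subset

api_gateway_event = {
    "body": json.dumps({"features": {"swelling": 2, "age": 28, "is_contact": 1}})
}
direct_event = {"features": {"age": 28, "tone": 1}}

ok_result = check_features(parse_event(api_gateway_event)["features"], expected)
bad_result = check_features(parse_event(direct_event)["features"], expected)
print(ok_result)   # ([], [])
print(bad_result)  # (['is_contact', 'swelling'], ['tone'])
```

Returning the missing and unexpected names separately makes it possible to put them in the 400 response, which is usually more helpful to a client than a bare "Feature mismatch".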

To improve Lambda performance, the model is loaded into a global variable outside the handler. This follows AWS Lambda best practices: the execution environment keeps global variables between invocations, so the model is loaded only on a cold start and reused until the environment is shut down after a period of inactivity.
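The effect of loading at module level can be simulated locally. In this sketch, load_model is a hypothetical stand-in for joblib.load that counts how often it runs. Calling the handler several times in one process plays the role of several warm invocations in the same execution environment.

```python
import time

LOAD_COUNT = 0

def load_model():
    """Stand-in for joblib.load(...); counts how often the expensive load runs."""
    global LOAD_COUNT
    LOAD_COUNT += 1
    time.sleep(0.01)  # pretend that deserializing the model takes a while
    return {"name": "fake-model"}

# Runs once, when the module is imported (the cold start)
MODEL = load_model()

def lambda_handler(event, context):
    # Warm invocations reuse MODEL instead of loading it again
    return {"statusCode": 200, "model": MODEL["name"]}

for _ in range(3):
    response = lambda_handler({}, None)

print(f"Invocations: 3, model loads: {LOAD_COUNT}")  # model loads: 1
```

Had the load been placed inside lambda_handler, the counter would read 3 and every request would pay the loading cost.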

Defining Dockerfile

To define the configuration of our Docker image, we need to create a Dockerfile. We’ll use an official AWS Lambda image for Python 3.12, copy over the requirements.txt file with our Lambda dependencies, install them and then copy over the artifacts which we’ve generated after training our AI model. Lastly, we’ll define the entry method for our Lambda.

# Use an official AWS Lambda base image for Python
FROM public.ecr.aws/lambda/python:3.12

# Install required Python libraries
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt

# Copy the saved model and Lambda function code into the container
COPY lambda_handler.py ./
COPY injury_model.pkl ./
COPY model_metadata.json ./

# Command to run the Lambda function
CMD ["lambda_handler.lambda_handler"]

Our requirements.txt file for our Lambda is the following:

numpy
pandas
xgboost-cpu
scikit-learn
joblib
aws-lambda-powertools

We are using the xgboost-cpu package instead of the regular xgboost package because Lambda is a CPU-only environment. This also saved us ~800MB in Docker image size, because the CPU-only package doesn’t download the binary files used for interacting with a GPU.

Defining CloudFormation resources

The following CloudFormation template is a very simple one: we just have our Lambda and an API Gateway resource to open it up to the Internet.

The most important configuration is under the Metadata section of our Lambda: we need to define the name of our Dockerfile, the DockerContext and how we will tag the Docker image when it gets created. In addition, PackageType needs to be set to Image, so CloudFormation recognizes that we are trying to deploy a dockerized Lambda.

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: >
  Predict Injury Duration Lambda

Resources:
  PredictInjuryDurationLambda:
    Type: AWS::Serverless::Function
    Properties:
      PackageType: Image
      Timeout: 60
      Architectures:
        - x86_64
      Events:
        Inference:
          Type: HttpApi
          Properties:
            Path: /predict-injury-duration
            Method: post
            ApiId: !Ref PredictInjuryDurationApi
    Metadata:
      Dockerfile: Dockerfile
      DockerContext: ./
      DockerTag: latest

  PredictInjuryDurationApi:
    Type: AWS::Serverless::HttpApi
    Properties:
      CorsConfiguration:
        AllowOrigins:
          - '*'  # Allow requests from any origin
        AllowHeaders:
          - '*'
        AllowMethods:
          - '*'

Outputs:
  PredictInjuryDurationApi:
    Description: "API Gateway endpoint URL for the Inference function"
    Value: !Sub "https://${PredictInjuryDurationApi}.execute-api.${AWS::Region}.amazonaws.com/predict-injury-duration"

  LambdaFunctionArn:
    Description: "Injury Duration Prediction Lambda Function ARN"
    Value: !GetAtt PredictInjuryDurationLambda.Arn

Deploying the Lambda with AWS Serverless Application Model and Docker

To continue, you will need to install the following dependencies:

AWS Serverless Application Model (SAM) CLI - https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/install-sam-cli.html
Docker - https://docs.docker.com/engine/install/

After installing these dependencies, you can continue with the tutorial.

We can build, locally test and deploy our Lambda with 3 simple terminal commands:

sam build - finds our CloudFormation template and builds the artifacts needed for deploying to AWS
sam local start-api - runs our Lambda locally at no cost, so we can confirm that everything is working as expected before deployment
sam deploy - takes the built CloudFormation template resources and sends them to your AWS account to be deployed

Building the resources

Open up your terminal window, go into the folder where you created your template.yaml file (in my case that’s PredictInjuryDurationLambda folder) and run sam build. The expected output should look like:

(venv) matia@matia-H510M-H-V2:./PredictInjuryDurationLambda$ sam build
Building codeuri: /home/matia/dev/cloudkey/mlops-cicd/injuries_lambda/PredictInjuryDurationLambda runtime: None architecture: x86_64 functions: PredictInjuryDurationLambda                   
Building image for PredictInjuryDurationLambda function                                                                                                                                       
Setting DockerBuildArgs for PredictInjuryDurationLambda function                                                                                                                              
Step 1/7 : FROM public.ecr.aws/lambda/python:3.12
3.12: Pulling from lambda/python 
8deb1a9ce5e3: Pull complete 
99a4e43f82e3: Pull complete 
0e56aa1f1c26: Pull complete 
e2ef3e53683d: Pull complete 
b9dec667dad3: Pull complete 
94964360ff6a: Pull complete 
Status: Downloaded newer image for public.ecr.aws/lambda/python:3.12 ---> ef61d0102ac9
Step 2/7 : COPY requirements.txt ./
 ---> bb551c013332
Step 3/7 : RUN pip install --no-cache-dir -r requirements.txt
 ---> Running in 4dd05ae833d0
Collecting numpy (from -r requirements.txt (line 1))
  Downloading numpy-2.2.6-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (62 kB)
Collecting pandas (from -r requirements.txt (line 2))
  Downloading pandas-2.2.3-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (89 kB)
Collecting optuna (from -r requirements.txt (line 3))
  Downloading optuna-4.3.0-py3-none-any.whl.metadata (17 kB)
Collecting xgboost (from -r requirements.txt (line 4))
  Downloading xgboost-3.0.2-py3-none-manylinux_2_28_x86_64.whl.metadata (2.1 kB)
Collecting scikit-learn (from -r requirements.txt (line 5))
  Downloading scikit_learn-1.6.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (18 kB)
Collecting joblib (from -r requirements.txt (line 6))
  Downloading joblib-1.5.1-py3-none-any.whl.metadata (5.6 kB)
Requirement already satisfied: python-dateutil>=2.8.2 in /var/lang/lib/python3.12/site-packages (from pandas->-r requirements.txt (line 2)) (2.9.0.post0)
Collecting pytz>=2020.1 (from pandas->-r requirements.txt (line 2))
  Downloading pytz-2025.2-py2.py3-none-any.whl.metadata (22 kB)
Collecting tzdata>=2022.7 (from pandas->-r requirements.txt (line 2))
  Downloading tzdata-2025.2-py2.py3-none-any.whl.metadata (1.4 kB)
Collecting alembic>=1.5.0 (from optuna->-r requirements.txt (line 3))
  Downloading alembic-1.16.1-py3-none-any.whl.metadata (7.3 kB)
Collecting colorlog (from optuna->-r requirements.txt (line 3))
  Downloading colorlog-6.9.0-py3-none-any.whl.metadata (10 kB)
Collecting packaging>=20.0 (from optuna->-r requirements.txt (line 3))
  Downloading packaging-25.0-py3-none-any.whl.metadata (3.3 kB)
Collecting sqlalchemy>=1.4.2 (from optuna->-r requirements.txt (line 3))
  Downloading sqlalchemy-2.0.41-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (9.6 kB)
Collecting tqdm (from optuna->-r requirements.txt (line 3))
  Downloading tqdm-4.67.1-py3-none-any.whl.metadata (57 kB)
Collecting PyYAML (from optuna->-r requirements.txt (line 3))
  Downloading PyYAML-6.0.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (2.1 kB)
Collecting nvidia-nccl-cu12 (from xgboost->-r requirements.txt (line 4))
  Downloading nvidia_nccl_cu12-2.27.3-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (2.0 kB)
Collecting scipy (from xgboost->-r requirements.txt (line 4))
  Downloading scipy-1.15.3-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (61 kB)
Collecting threadpoolctl>=3.1.0 (from scikit-learn->-r requirements.txt (line 5))
  Downloading threadpoolctl-3.6.0-py3-none-any.whl.metadata (13 kB)
Collecting Mako (from alembic>=1.5.0->optuna->-r requirements.txt (line 3))
  Downloading mako-1.3.10-py3-none-any.whl.metadata (2.9 kB)
Collecting typing-extensions>=4.12 (from alembic>=1.5.0->optuna->-r requirements.txt (line 3))
  Downloading typing_extensions-4.14.0-py3-none-any.whl.metadata (3.0 kB)
Requirement already satisfied: six>=1.5 in /var/lang/lib/python3.12/site-packages (from python-dateutil>=2.8.2->pandas->-r requirements.txt (line 2)) (1.17.0)
Collecting greenlet>=1 (from sqlalchemy>=1.4.2->optuna->-r requirements.txt (line 3))
  Downloading greenlet-3.2.2-cp312-cp312-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl.metadata (4.1 kB)
Collecting MarkupSafe>=0.9.2 (from Mako->alembic>=1.5.0->optuna->-r requirements.txt (line 3))
  Downloading MarkupSafe-3.0.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.0 kB)
Downloading numpy-2.2.6-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (16.5 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 16.5/16.5 MB 6.6 MB/s eta 0:00:00
Downloading pandas-2.2.3-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (12.7 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 12.7/12.7 MB 6.8 MB/s eta 0:00:00
Downloading optuna-4.3.0-py3-none-any.whl (386 kB)
Downloading xgboost-3.0.2-py3-none-manylinux_2_28_x86_64.whl (253.9 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 253.9/253.9 MB 6.7 MB/s eta 0:00:00
Downloading scikit_learn-1.6.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (13.1 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 13.1/13.1 MB 7.4 MB/s eta 0:00:00
Downloading joblib-1.5.1-py3-none-any.whl (307 kB)
Downloading alembic-1.16.1-py3-none-any.whl (242 kB)
Downloading packaging-25.0-py3-none-any.whl (66 kB)
Downloading pytz-2025.2-py2.py3-none-any.whl (509 kB)
Downloading scipy-1.15.3-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (37.3 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 37.3/37.3 MB 7.2 MB/s eta 0:00:00
Downloading sqlalchemy-2.0.41-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.3 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.3/3.3 MB 7.7 MB/s eta 0:00:00
Downloading threadpoolctl-3.6.0-py3-none-any.whl (18 kB)
Downloading tzdata-2025.2-py2.py3-none-any.whl (347 kB)
Downloading colorlog-6.9.0-py3-none-any.whl (11 kB)
Downloading nvidia_nccl_cu12-2.27.3-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (322.4 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 322.4/322.4 MB 6.9 MB/s eta 0:00:00
Downloading PyYAML-6.0.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (767 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 767.5/767.5 kB 9.5 MB/s eta 0:00:00
Downloading tqdm-4.67.1-py3-none-any.whl (78 kB)
Downloading greenlet-3.2.2-cp312-cp312-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl (603 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 603.9/603.9 kB 6.1 MB/s eta 0:00:00
Downloading typing_extensions-4.14.0-py3-none-any.whl (43 kB)
Downloading mako-1.3.10-py3-none-any.whl (78 kB)
Downloading MarkupSafe-3.0.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (23 kB)
Installing collected packages: pytz, tzdata, typing-extensions, tqdm, threadpoolctl, PyYAML, packaging, nvidia-nccl-cu12, numpy, MarkupSafe, joblib, greenlet, colorlog, sqlalchemy, scipy, pandas, Mako, xgboost, scikit-learn, alembic, optuna
Successfully installed Mako-1.3.10 MarkupSafe-3.0.2 PyYAML-6.0.2 alembic-1.16.1 colorlog-6.9.0 greenlet-3.2.2 joblib-1.5.1 numpy-2.2.6 nvidia-nccl-cu12-2.27.3 optuna-4.3.0 packaging-25.0 pandas-2.2.3 pytz-2025.2 scikit-learn-1.6.1 scipy-1.15.3 sqlalchemy-2.0.41 threadpoolctl-3.6.0 tqdm-4.67.1 typing-extensions-4.14.0 tzdata-2025.2 xgboost-3.0.2
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable.It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning.

[notice] A new release of pip is available: 24.3.1 -> 25.1.1
[notice] To update, run: pip install --upgrade pip
 ---> Removed intermediate container 4dd05ae833d0
 ---> 17d74dbc2906
Step 4/7 : COPY lambda_handler.py ./
 ---> 04d29738de3f
Step 5/7 : COPY injury_model.pkl ./
 ---> 8af7e1760bdf
Step 6/7 : COPY model_metadata.json ./
 ---> 6e20d1175fb5
Step 7/7 : CMD ["lambda_handler.lambda_handler"]
 ---> Running in e80c76b0a956
 ---> Removed intermediate container e80c76b0a956
 ---> 49128c299b03
Successfully built 49128c299b03
Successfully tagged predictinjurydurationlambda:latest

Simple! Now let’s test out our Lambda locally to confirm that it’s working.

Testing with the local build

We’ll spin up our stack locally by running sam local start-api which is a great way of verifying that our Lambda works as expected. Here is the terminal output together with the Postman API call to our Lambda.

Feel free to use my payload which I’ve used for testing the Lambda:

{
  "features": {
    "age": 28,
    "is_contact": 1,
    "has_stopped": 1,
    "swelling": 2,
    "tone": 1,
    "palpation": 3,
    "is_contraction_painful": 1,
    "is_stretching_painful": 1,
    "class": 3,
    "is_proximal": 0,
    "is_abdominal": 0,
    "is_distal": 1,
    "fascia_depth": 2,
    "is_hamstring": 0,
    "is_quadriceps": 1,
    "is_add_abd": 0,
    "is_calf": 1,
    "is_belly": 0
  }
}

And here are the results from Postman API call:

You can see that the model predicted that the injury duration, based on these descriptors, will be 17 days.
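If you prefer a script over Postman, the same call can be made with the Python standard library. This is a minimal sketch: it assumes sam local start-api is listening on its default address (127.0.0.1, port 3000), and the helper names build_request and predict are hypothetical. The request is only built here, not sent, so the snippet runs even when nothing is listening.

```python
import json
import urllib.request

# Assumption: `sam local start-api` is running on its default host and port
LOCAL_URL = "http://127.0.0.1:3000/predict-injury-duration"

def build_request(url, features):
    """Build a POST request carrying the features as a JSON body."""
    body = json.dumps({"features": features}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def predict(url, features, timeout=60):
    """Send the request and return the decoded JSON response."""
    request = build_request(url, features)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))

# Shortened payload for illustration; a real call needs every feature
request = build_request(LOCAL_URL, {"age": 28, "is_contact": 1})
print(request.get_method(), request.full_url)
```

With the local API running, predict(LOCAL_URL, features) with the full feature dictionary from the payload above should return the same JSON body that Postman displays.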

Deploying the Lambda to AWS account

You can deploy your Lambda and API Gateway by running the sam deploy -g command (the -g flag stands for “guided”) inside your terminal window where SAM will guide you through the deployment process. The output will look like this:

(venv) matia@matia-H510M-H-V2:./PredictInjuryDurationLambda$ sam deploy -g

Configuring SAM deploy
======================

    Looking for config file [samconfig.toml] :  Found
    Reading default arguments  :  Success

    Setting default arguments for 'sam deploy'
    =========================================
    Stack Name [sam-app]: PredictInjuryStack
    AWS Region [eu-central-1]: 
    #Shows you resources changes to be deployed and require a 'Y' to initiate deploy
    Confirm changes before deploy [y/N]: 
    #SAM needs permission to be able to create roles to connect to the resources in your template
    Allow SAM CLI IAM role creation [Y/n]: 
    #Preserves the state of previously provisioned resources when an operation fails
    Disable rollback [y/N]: 
    PredictInjuryDurationLambda has no authentication. Is this okay? [y/N]: y
    Save arguments to configuration file [Y/n]: 
    SAM configuration file [samconfig.toml]: 
    SAM configuration environment [default]: 

    Looking for resources needed for deployment:

    Managed S3 bucket: aws-sam-cli-managed-default-samclisourcebucket-vvz5tfqz29f9
    A different default S3 bucket can be set in samconfig.toml and auto resolution of buckets turned off by setting resolve_s3=False
     Image repositories: Not found.
     #Managed repositories will be deleted when their functions are removed from the template and deployed
     Create managed ECR repositories for all functions? [Y/n]: 

    Saved arguments to config file
    Running 'sam deploy' for future deployments will use the parameters saved above.
    The above parameters can be changed by modifying samconfig.toml
    Learn more about samconfig.toml syntax at 
    https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/serverless-sam-cli-config.html

75bd0feec80a: Pushed 
af455ae28cb4: Pushed 
4761438a3d36: Pushed 
1687a3ac8fa7: Pushed 
8f989553341c: Pushed 
c678e1c5e5d8: Pushed 
5964a0804fb6: Pushed 
6a9b57324378: Pushed 
f27b91471588: Pushed 
af56b219ad31: Pushed 
0b81e7a3683d: Pushed 
predictinjurydurationlambda-49128c299b03-latest: digest: sha256:fa1c14414b5029b79f0e97c5647331729beb42891e9eb814b8f77c0a5cda5e15 size: 2621

    Deploying with following values
    ===============================
    Stack name                   : PredictInjuryStack
    Region                       : eu-central-1
    Confirm changeset            : False
    Disable rollback             : False
    Deployment image repository  : 
                                       {
                                           "PredictInjuryDurationLambda": "9257xxxxxxxx.dkr.ecr.eu-central-1.amazonaws.com/predictinjurystack1f00e9c8/predictinjurydurationlambdad41e7e42repo"
                                       }
    Deployment s3 bucket         : aws-sam-cli-managed-default-samclisourcebucket-vvz5tfqz29f9
    Capabilities                 : ["CAPABILITY_IAM"]
    Parameter overrides          : {}
    Signing Profiles             : {}

Initiating deployment
=====================

PredictInjuryDurationLambda has no authentication.
    Uploading to PredictInjuryStack/335eac6f8b2db4755ca431660bfaf17f.template  1578 / 1578  (100.00%)

Waiting for changeset to be created..

CloudFormation stack changeset
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Operation                                      LogicalResourceId                              ResourceType                                   Replacement                                  
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Add                                          PredictInjuryDurationApiApiGatewayDefaultSta   AWS::ApiGatewayV2::Stage                       N/A                                          
                                               ge                                                                                                                                         
+ Add                                          PredictInjuryDurationApi                       AWS::ApiGatewayV2::Api                         N/A                                          
+ Add                                          PredictInjuryDurationLambdaInferencePermissi   AWS::Lambda::Permission                        N/A                                          
                                               on                                                                                                                                         
+ Add                                          PredictInjuryDurationLambdaRole                AWS::IAM::Role                                 N/A                                          
+ Add                                          PredictInjuryDurationLambda                    AWS::Lambda::Function                          N/A                                          
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Changeset created successfully. arn:aws:cloudformation:eu-central-1:9257xxxxxxx:changeSet/samcli-deploy1749059624/b0ee6116-2396-4322-bd2e-7a9467225a40

2025-06-04 19:53:50 - Waiting for stack create/update to complete

CloudFormation events from stack operations (refresh every 5.0 seconds)
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
ResourceStatus                                 ResourceType                                   LogicalResourceId                              ResourceStatusReason                         
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
CREATE_IN_PROGRESS                             AWS::CloudFormation::Stack                     PredictInjuryStack                             User Initiated                               
CREATE_IN_PROGRESS                             AWS::IAM::Role                                 PredictInjuryDurationLambdaRole                -                                            
CREATE_IN_PROGRESS                             AWS::IAM::Role                                 PredictInjuryDurationLambdaRole                Resource creation Initiated                  
CREATE_COMPLETE                                AWS::IAM::Role                                 PredictInjuryDurationLambdaRole                -                                            
CREATE_IN_PROGRESS                             AWS::Lambda::Function                          PredictInjuryDurationLambda                    -                                            
CREATE_IN_PROGRESS                             AWS::Lambda::Function                          PredictInjuryDurationLambda                    Resource creation Initiated                  
CREATE_IN_PROGRESS - CONFIGURATION_COMPLETE    AWS::Lambda::Function                          PredictInjuryDurationLambda                    Eventual consistency check initiated         
CREATE_IN_PROGRESS                             AWS::ApiGatewayV2::Api                         PredictInjuryDurationApi                       -                                            
CREATE_IN_PROGRESS                             AWS::ApiGatewayV2::Api                         PredictInjuryDurationApi                       Resource creation Initiated                  
CREATE_COMPLETE                                AWS::ApiGatewayV2::Api                         PredictInjuryDurationApi                       -                                            
CREATE_IN_PROGRESS                             AWS::Lambda::Permission                        PredictInjuryDurationLambdaInferencePermissi   -                                            
                                                                                              on                                                                                          
CREATE_IN_PROGRESS                             AWS::Lambda::Permission                        PredictInjuryDurationLambdaInferencePermissi   Resource creation Initiated                  
                                                                                              on                                                                                          
CREATE_COMPLETE                                AWS::Lambda::Permission                        PredictInjuryDurationLambdaInferencePermissi   -                                            
                                                                                              on                                                                                          
CREATE_COMPLETE                                AWS::Lambda::Function                          PredictInjuryDurationLambda                    -                                            
CREATE_IN_PROGRESS                             AWS::ApiGatewayV2::Stage                       PredictInjuryDurationApiApiGatewayDefaultSta   -                                            
                                                                                              ge                                                                                          
CREATE_IN_PROGRESS                             AWS::ApiGatewayV2::Stage                       PredictInjuryDurationApiApiGatewayDefaultSta   Resource creation Initiated                  
                                                                                              ge                                                                                          
CREATE_COMPLETE                                AWS::ApiGatewayV2::Stage                       PredictInjuryDurationApiApiGatewayDefaultSta   -                                            
                                                                                              ge                                                                                          
CREATE_COMPLETE                                AWS::CloudFormation::Stack                     PredictInjuryStack                             -                                            
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

CloudFormation outputs from deployed stack
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Outputs                                                                                                                                                                                   
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Key                 PredictInjuryDurationApi                                                                                                                                                          
Description         API Gateway endpoint URL for Prod stage for PredictInjuryDurationApi function                                                                                                        
Value               https://hfd90813bg.execute-api.eu-central-1.amazonaws.com/Prod/predict-injury-duration/                                                                               

Key                 LambdaFunctionArn                                                                                                                                                     
Description         Injury Duration Prediction Lambda Function ARN                                                                                                                        
Value               arn:aws:lambda:eu-central-1:9257xxxxxxx:function:PredictInjuryStack-PredictInjuryDurationLambda-B78GP4IFvJbf                                                         
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Successfully created/updated stack - PredictInjuryStack in eu-central-1

And that’s it! Your Lambda containing the AI model is deployed. Take the value of the *PredictInjuryDurationApi* output and paste it into an API testing tool such as Postman to see your Lambda in action!

Sending the same payload we used when testing the Lambda locally returns the same results, which confirms that the deployed function works as expected!
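If you prefer a script over Postman, the same check can be sketched in a few lines of Python using only the standard library. This is a minimal sketch under a few assumptions: the endpoint accepts a POST request with a JSON body and answers with JSON, and the function names `build_request` and `predict` are made up for this example. Replace `API_URL` with the value of your own *PredictInjuryDurationApi* output, since the one shown in this post has been deleted.

```python
import json
import urllib.request

# Assumption: paste the PredictInjuryDurationApi output of YOUR deployment here.
API_URL = "https://<your-api-id>.execute-api.eu-central-1.amazonaws.com/Prod/predict-injury-duration/"


def build_request(url, payload):
    """Build a JSON POST request for the prediction endpoint (hypothetical helper)."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",  # assumption: the route is exposed as POST
    )


def predict(url, payload, timeout=30):
    """Send the payload to the endpoint and return the decoded JSON response."""
    request = build_request(url, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

To try it, load the payload you used for the local test into a dictionary and call `predict(API_URL, payload)`. The first request after a period of inactivity may take noticeably longer, because Lambda has to start the container before running the model.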

It is important to note that the Docker image is pushed to Elastic Container Registry (ECR) on AWS. When you delete the resources created in this project, make sure to also remove the repository and all images stored in it.
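The cleanup can also be scripted. The sketch below assumes you use boto3 and that you know the name of the ECR repository holding the image. The function name `delete_project_resources` and the repository name in the usage note are hypothetical, so look up the real repository name in the ECR console first. The clients are passed in as arguments, which keeps the function easy to dry-run with stand-in objects.

```python
def delete_project_resources(cfn_client, ecr_client, stack_name, repository_name):
    """Delete the CloudFormation stack and the ECR repository with its images.

    cfn_client and ecr_client are expected to be boto3 clients for
    "cloudformation" and "ecr" (assumption: boto3 is installed and configured).
    """
    # Removes the Lambda, the API Gateway and the IAM role created by the stack.
    cfn_client.delete_stack(StackName=stack_name)

    # force=True deletes the repository even if it still contains images.
    ecr_client.delete_repository(repositoryName=repository_name, force=True)

    return stack_name, repository_name
```

A call could look like `delete_project_resources(boto3.client("cloudformation"), boto3.client("ecr"), "PredictInjuryStack", "my-injury-repo")`, where `my-injury-repo` is a placeholder. Stack deletion runs asynchronously, so the stack may stay visible in the console for a short while after the call returns.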

Conclusion

Congratulations! You’ve successfully built and deployed a production-ready AI model with AWS Lambda and Docker. Let’s go over what we’ve learned in this post:

how to write a complete training script with XGBoost as the AI model and Optuna as the optimizer
how to containerize a Lambda function to handle real-time predictions
how to deploy the containerized Lambda and make it accessible from anywhere on the Internet

The result is a very cost-effective solution: there are no servers to provision or maintain, because everything runs serverless. Isn’t it awesome?

Thank you for reading! Wishing you the greatest day!
