DEV Community

Matia Rašetina


How to Deploy Production-Ready AI Models on AWS Lambda Without the $500/Month Bill

If you are an aspiring Data Scientist, ML engineer, or DevOps engineer looking to expand your knowledge of AWS services, you’ve come to the right post!

Many ML engineers struggle to deploy models without incurring SageMaker costs. This post shows you how to serve real-time ML inference using only Lambda and Docker: fully serverless, low-cost, and production-grade.

In this tutorial, we will walk through a small project: an AWS Lambda function, packaged as a Docker container image, that predicts how long a soccer player's injury will last based on the injury description. The technologies used in this project are:

  • Python
  • Docker
  • AWS services:
    • Lambda
    • API Gateway
    • CloudFormation
    • Elastic Container Registry

We will complete the project in four simple steps:

  1. Create a script that trains an AI model on the dataset and optimizes its hyperparameters
  2. Write the Lambda code that takes in the injury descriptors and predicts the injury duration
  3. Create the CloudFormation template that builds our Lambda from a Docker image
  4. Deploy that CloudFormation template to our AWS environment

The GitHub repository can be found here: https://github.com/mate329/dockerized-AI-Lambda

Important note - all resources and identifiers shown were used in a temporary environment and have since been deleted.

Let’s start!

Training and optimization script for AI Model

Data Description

The dataset we use to train our AI model contains records of soccer injuries. Injuries are a common occurrence in any sport, and it would be very beneficial for any professional sports club to have a system that predicts how long an injury will last based on its description. Here are the parameters used to describe an injury:

Dataset description

The dataset has been downloaded from https://stemgames.hr/en/event/competition-in-problem-solving-exercises/technology-arena/

Libraries used

Libraries which we are going to import into our training script:

  • NumPy - used for numerical computing
  • Pandas - used for data manipulation, analysis and loading CSV files
  • the star of our training process - Optuna - used for hyperparameter optimization of AI models
  • Pickle and joblib - saving and loading Python objects (we use joblib to save the trained model)
  • JSON - standard library module for working with JSON data
  • XGBoost - our AI library containing the regressor model which we are going to use
  • Scikit-Learn - used for splitting data into training and test datasets and for computing evaluation metrics
import numpy as np 
import pandas as pd 
import optuna
import pickle
import joblib
import json
from xgboost import XGBRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

Loading the data and creating datasets

Next, we load the data, merge the injury descriptions with their durations, record the list of feature names and split the data into training and test datasets.

# Loading the data, where 'kaggle_x_train.csv' contains the injury description
# while 'kaggle_y_train.csv' contains the actual duration of the injury
data_train_x = pd.read_csv('data/kaggle_x_train.csv')
data_train_y = pd.read_csv('data/kaggle_y_train.csv')

# Since the description and actual duration of the injury are in two different
# files, we need to merge them into one object
merged_train = pd.merge(data_train_x, data_train_y[['Id', 'injury_duration']], on='Id')
merged_train.drop(columns=['Id'], inplace=True)

# Separate features and target variable
X = merged_train.drop('injury_duration', axis=1)
y = merged_train['injury_duration']

# Save feature names for Lambda validation
feature_names = list(X.columns)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

AI Model Optimization

Next, we define the hyperparameter optimization of our AI model.

Hyperparameter optimization is the search for the combination of model settings (called hyperparameters) that gives the best performance on a given dataset. To make this process as easy as possible, we use the Python library Optuna, which runs the search for us.

We define a function called objective, which declares the hyperparameters we want to optimize and their ranges, passes the resulting params dictionary to the model, trains it on our data and returns its score.
Root Mean Square Error (RMSE) is the metric we use to assess model performance. It is the square root of the average squared difference between the actual and the predicted injury duration, so it is expressed in days and penalizes large errors more heavily than small ones.
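As a quick illustration of how RMSE behaves, here is a worked example in plain Python with made-up durations (not taken from the dataset). The mae helper is included only for comparison and is not part of the training script; RMSE comes out slightly higher because squaring gives the 3-day miss more weight than the two 2-day misses.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error: square root of the mean of the squared residuals."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean absolute error, shown only for comparison with RMSE."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical injury durations in days (illustrative values only)
actual = [10, 20, 30]
predicted = [12, 18, 33]
print(f"RMSE: {rmse(actual, predicted):.3f} days")  # RMSE: 2.380 days
print(f"MAE:  {mae(actual, predicted):.3f} days")   # MAE:  2.333 days
```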

To start the optimization process, we create an Optuna study (think of it as starting an experiment). We set its direction to minimize, since we want predictions as close as possible to the actual durations, that is, the lowest RMSE. Then we call the optimize method with the number of trials we want to run.

def objective(trial):
    # Define hyperparameter space
    params = {
        'n_estimators': trial.suggest_int('n_estimators', 1, 100),
        'max_depth': trial.suggest_int('max_depth', 1, 10),
        'learning_rate': trial.suggest_float('learning_rate', 0.01, 0.3),
        'min_child_weight': trial.suggest_int('min_child_weight', 1, 10),
        'gamma': trial.suggest_float('gamma', 0, 0.3)
    }

    # Create and fit model
    model = XGBRegressor(**params)
    model.fit(X_train, y_train, eval_set=[(X_test, y_test)], verbose=False)

    # Make predictions and return RMSE
    preds = model.predict(X_test)
    mse = mean_squared_error(y_test, preds)
    rmse = np.sqrt(mse)
    return rmse

# Minimize RMSE
study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=200)

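Feel free to experiment with the hyperparameter ranges; mine are just an example. Also note that the objective scores every trial on the test set, so the best RMSE reported by the study is somewhat optimistic; a separate validation split would give a fairer estimate.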
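To make the idea of a study more concrete, here is a simplified sketch of what a minimizing study does: sample hyperparameters, score them, keep the best. It is an assumption-laden stand-in, not Optuna's implementation: it uses plain random search over the same ranges, and a toy objective replaces XGBoost training. Optuna's default sampler (TPE) is smarter, since it uses the results of earlier trials to choose later ones.

```python
import random

# Same search space as the objective above: (low, high, type), bounds inclusive.
SPACE = {
    "n_estimators": (1, 100, int),
    "max_depth": (1, 10, int),
    "learning_rate": (0.01, 0.3, float),
    "min_child_weight": (1, 10, int),
    "gamma": (0.0, 0.3, float),
}


def sample_params(rng):
    """Draw one trial's hyperparameters uniformly at random from SPACE."""
    params = {}
    for name, (low, high, kind) in SPACE.items():
        params[name] = rng.randint(low, high) if kind is int else rng.uniform(low, high)
    return params


def minimize(objective, n_trials, seed=0):
    """Run n_trials trials and keep the parameters with the lowest objective value."""
    rng = random.Random(seed)
    best_params, best_value = None, float("inf")
    history = []
    for _ in range(n_trials):
        params = sample_params(rng)
        value = objective(params)
        history.append(value)
        if value < best_value:
            best_params, best_value = params, value
    return best_params, best_value, history


# Hypothetical stand-in for "train XGBoost and return RMSE": a cheap function
# whose minimum sits at learning_rate=0.1 and max_depth=6.
def toy_objective(params):
    return (params["learning_rate"] - 0.1) ** 2 + (params["max_depth"] - 6) ** 2 / 100


best_params, best_value, history = minimize(toy_objective, n_trials=200)
print(best_params, round(best_value, 5))
```

Replacing toy_objective with a function that trains a model and returns its RMSE turns this into a basic random-search tuner, which makes a useful baseline when judging how much Optuna's sampler helps.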

Saving needed artifacts for Lambda

At the end of our script, we save the files (called artifacts) that our Lambda is going to use: the final model, retrained with the best hyperparameters found by Optuna and saved with the joblib library, and a JSON file with the model's metadata.

# Fetch the best hyperparameters found during the study
best_params = study.best_params
print(f"Best parameters: {best_params}")
print(f"Best RMSE: {study.best_value}")

# Train final model with best parameters and evaluate
final_model = XGBRegressor(**best_params)
final_model.fit(X_train, y_train)
final_preds = final_model.predict(X_test)

print("Final model performance:")
print(f"Best parameters: {best_params}")
print(f"RMSE: {np.sqrt(mean_squared_error(y_test, final_preds)):.4f}")

# Export artifacts for AWS Lambda
print("Exporting model artifacts...")

# 1. Save the trained model
joblib.dump(final_model, 'PredictInjuryDurationLambda/injury_model.pkl')
print("Model saved as 'PredictInjuryDurationLambda/injury_model.pkl'")

# 2. Save model metadata
metadata = {
    'feature_names': feature_names,
    'best_params': best_params,
    'model_performance': {
        'rmse': float(np.sqrt(mean_squared_error(y_test, final_preds)))
    },
    'training_date': pd.Timestamp.now().isoformat(),
    'n_features': len(feature_names)
}

with open('PredictInjuryDurationLambda/model_metadata.json', 'w') as f:
    json.dump(metadata, f, indent=2)
print("Metadata saved as 'PredictInjuryDurationLambda/model_metadata.json'")
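Because the Lambda relies on model_metadata.json, it can be worth failing fast at cold start if that file is incomplete or inconsistent. A minimal sketch, assuming the key layout written by the script above; load_metadata and REQUIRED_KEYS are hypothetical names, not part of the repository:

```python
import json
import os
import tempfile

# Keys written by the training script's metadata dictionary
REQUIRED_KEYS = {"feature_names", "best_params", "model_performance", "training_date", "n_features"}


def load_metadata(path):
    """Load model_metadata.json and raise ValueError if it is inconsistent."""
    with open(path, "r", encoding="utf-8") as f:
        metadata = json.load(f)
    missing = REQUIRED_KEYS - set(metadata)
    if missing:
        raise ValueError(f"metadata is missing keys: {sorted(missing)}")
    if metadata["n_features"] != len(metadata["feature_names"]):
        raise ValueError("n_features does not match the length of feature_names")
    return metadata


# Usage with a throwaway file; the values are illustrative, not real training output.
example = {
    "feature_names": ["age", "is_contact"],
    "best_params": {"max_depth": 4},
    "model_performance": {"rmse": 1.0},
    "training_date": "2025-01-01T00:00:00",
    "n_features": 2,
}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model_metadata.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(example, f)
    print(load_metadata(path)["feature_names"])  # ['age', 'is_contact']
```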

Here is a screenshot of the terminal window, with Optuna logging every trial during optimization and the best parameters at the end:

Optuna optimization

That’s it for model training! Next, we’ll go over the Lambda code and walk step by step through dockerizing it.

Lambda code and Dockerizing the model

To start off, let’s explain our architecture: the client sends a request to the API Gateway URL, which forwards it to the Lambda. The Lambda function is packaged as a Docker image stored in Elastic Container Registry (ECR), and the code inside that image processes the incoming request to predict the injury duration. The Lambda then returns the prediction to the client via API Gateway.

Here is the architecture diagram:

Architecture Diagram
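One detail of this request flow matters for the handler: when the call comes through API Gateway, the Lambda receives a proxy event in which the JSON payload is a string in the body field, and API Gateway may base64-encode that string, signalling it with isBase64Encoded. A minimal sketch of a hypothetical parse_event_body helper (not part of the repository) that extends the handler's one-line body parsing to cover a direct invocation, a plain body and a base64-encoded body:

```python
import base64
import json


def parse_event_body(event):
    """Return the request payload as a dict.

    - Direct invocation: the event itself is the payload.
    - API Gateway proxy event: 'body' holds a JSON string, optionally
      base64-encoded when 'isBase64Encoded' is true.
    """
    if "body" not in event:
        return event
    body = event["body"] or ""
    if event.get("isBase64Encoded"):
        body = base64.b64decode(body).decode("utf-8")
    # An empty body yields {}, which the handler then rejects with a 400
    return json.loads(body) if body else {}


payload = {"features": {"age": 28}}
direct = parse_event_body(payload)
proxied = parse_event_body({"body": json.dumps(payload), "isBase64Encoded": False})
encoded = parse_event_body({
    "body": base64.b64encode(json.dumps(payload).encode("utf-8")).decode("ascii"),
    "isBase64Encoded": True,
})
print(direct == proxied == encoded)  # True
```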

Lambda code

The following Lambda code loads the incoming payload, validates that no information is missing from it and then predicts the injury duration:

import json
import joblib
import pandas as pd

from aws_lambda_powertools import Logger

# Initialize logger
logger = Logger(service="injury_prediction")

# Load artifacts once when Lambda container starts
model = joblib.load('injury_model.pkl')
with open('model_metadata.json', 'r') as f:
    metadata = json.load(f)

@logger.inject_lambda_context(log_event=True)
def lambda_handler(event, context):
    # Load the incoming event and set up the logger
    event_body = json.loads(event.get('body')) if 'body' in event else event
    request_id = context.aws_request_id
    logger.append_keys(request_id=request_id)

    # Validate event structure
    if 'features' not in event_body:
        logger.error(f"Invalid event structure: {event_body}")
        return {
            'statusCode': 400,
            'body': json.dumps({'error': 'Invalid event structure'})
        }

    # Extract features from event
    features = event_body['features']

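    # Validate that the feature names match training (order-insensitive,
    # since perform_prediction reorders the columns to the training order)
    expected_features = metadata['feature_names']
    if set(features) != set(expected_features):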
        logger.error(f"Feature mismatch: expected {expected_features}, got {list(features.keys())}")
        return {
            'statusCode': 400,
            'body': json.dumps({'error': 'Feature mismatch'})
        }

    return perform_prediction(features, expected_features)

def perform_prediction(features, expected_features):
    try:
        # Create DataFrame with correct feature order
        X = pd.DataFrame([features])[expected_features]

        # Make prediction
        prediction = model.predict(X)[0]

        logger.info(f"Prediction made successfully: {prediction} days")

        # Return response
        return {
            'statusCode': 200,
            'body': json.dumps({
                'injury_duration_days': float(prediction),
                'model_version': metadata['training_date']
            })
        }
    except Exception as e:
        return {
            'statusCode': 500,
            'body': json.dumps({'error': str(e)})
        }
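The handler's feature check returns a generic "Feature mismatch" error. A small hypothetical helper (diff_features is an illustrative name, not part of the repository) can tell the caller exactly which features are missing or unexpected, ignoring key order since perform_prediction reorders the columns anyway:

```python
def diff_features(received, expected):
    """Compare received feature names with the expected ones, ignoring order.

    Returns (missing, unexpected) as sorted lists so an error message can
    tell the caller exactly what to fix.
    """
    received_set, expected_set = set(received), set(expected)
    missing = sorted(expected_set - received_set)
    unexpected = sorted(received_set - expected_set)
    return missing, unexpected


expected = ["age", "is_contact", "swelling"]  # in the Lambda: metadata['feature_names']
missing, unexpected = diff_features({"swelling": 2, "age": 28, "tone": 1}, expected)
print(missing, unexpected)  # ['is_contact'] ['tone']
```

In lambda_handler, missing and unexpected could be added to the 400 response body next to the error message.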

To improve performance, the model and its metadata are loaded at module level, outside the handler. This follows AWS Lambda best practices: the execution environment reuses module-level objects across subsequent (warm) invocations until it is shut down after a period of inactivity, so only the cold start pays the cost of loading the model.
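A variation on the same idea, sketched under the assumption that you would rather defer loading until the first request: wrap the loader in functools.lru_cache so it runs once per execution environment and warm invocations get the cached object. A dict stands in for the joblib-loaded model so the sketch runs anywhere; get_model, handler and LOAD_CALLS are illustrative names.

```python
from functools import lru_cache

LOAD_CALLS = 0  # counts how often the expensive load actually runs


@lru_cache(maxsize=1)
def get_model():
    """Load the model once per execution environment and reuse it afterwards.

    In the real Lambda this would be joblib.load('injury_model.pkl');
    a plain dict stands in here so the sketch runs without the artifact.
    """
    global LOAD_CALLS
    LOAD_CALLS += 1
    return {"name": "injury_model"}


def handler(event, context=None):
    model = get_model()  # cached after the first call
    return model["name"]


for _ in range(3):  # three invocations in the same execution environment
    handler({})
print(LOAD_CALLS)  # 1
```

Loading at module level, as the handler above does, moves this cost into Lambda's initialization phase instead; both approaches reuse the model on warm invocations.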

Defining Dockerfile

To define the configuration of our Docker image, we need to create a Dockerfile. We’ll use an official AWS Lambda base image for Python 3.12, copy over the requirements.txt file with our Lambda dependencies, install them and then copy over the Lambda code together with the artifacts generated after training our AI model. Lastly, we’ll define the handler that serves as the entry point of our Lambda.

# Use an official AWS Lambda base image for Python
FROM public.ecr.aws/lambda/python:3.12

# Install required Python libraries
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt

# Copy the saved model and Lambda function code into the container
COPY lambda_handler.py ./
COPY injury_model.pkl ./
COPY model_metadata.json ./

# Command to run the Lambda function
CMD ["lambda_handler.lambda_handler"]

The requirements.txt file for our Lambda is the following:

numpy
pandas
xgboost-cpu
scikit-learn
joblib
aws-lambda-powertools

We are using the xgboost-cpu package instead of the regular xgboost package because Lambda has no GPUs, and the CPU-only build saved us ~800MB of Docker image size, since it does not pull in the GPU libraries (such as nvidia-nccl-cu12) that the regular package depends on.

Defining CloudFormation resources

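The following template is a very simple one: it defines our Lambda and an API Gateway HTTP API that exposes it to the Internet. It is an AWS SAM template (note the Transform line), which CloudFormation expands into regular CloudFormation resources during deployment.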

The most important configuration is the Metadata section of our Lambda, where we tell SAM which Dockerfile to use (Dockerfile), which directory to use as the build context (DockerContext) and how to tag the image once it is built (DockerTag). In addition, PackageType needs to be set to Image, so the function is deployed from a container image rather than a .zip package.

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: >
  Predict Injury Duration Lambda

Resources:
  PredictInjuryDurationLambda:
    Type: AWS::Serverless::Function
    Properties:
      PackageType: Image
      Timeout: 60
      Architectures:
        - x86_64
      Events:
        Inference:
          Type: HttpApi
          Properties:
            Path: /predict-injury-duration
            Method: post
            ApiId: !Ref PredictInjuryDurationApi
    Metadata:
      Dockerfile: Dockerfile
      DockerContext: ./
      DockerTag: latest

  PredictInjuryDurationApi:
    Type: AWS::Serverless::HttpApi
    Properties:
      CorsConfiguration:
        AllowOrigins:
          - '*'  # Allow requests from any origin
        AllowHeaders:
          - '*'
        AllowMethods:
          - '*'

Outputs:
  PredictInjuryDurationApi:
    Description: "API Gateway endpoint URL ($default stage) for Inference function"
    Value: !Sub "https://${PredictInjuryDurationApi}.execute-api.${AWS::Region}.amazonaws.com/predict-injury-duration"

  LambdaFunctionArn:
    Description: "Injury Duration Prediction Lambda Function ARN"
    Value: !GetAtt PredictInjuryDurationLambda.Arn

Deploying the Lambda with AWS Serverless Application Model and Docker

To continue, you will need to install the AWS SAM CLI and Docker, and have AWS credentials configured for the account you want to deploy to. Once these are in place, you can continue with the tutorial.

We can build, test locally and deploy our Lambda with 3 simple terminal commands:

  1. sam build - finds our template and builds the artifacts needed for deployment, including the Docker image
  2. sam local start-api - runs our API and Lambda locally at no cost, so we can confirm that everything works as expected before deployment
  3. sam deploy - takes the built resources and deploys them to your AWS account as a CloudFormation stack

Building the resources

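Open your terminal window, go into the folder where you created your template.yaml file (in my case, the PredictInjuryDurationLambda folder) and run sam build. The output should look similar to the following. Note that this log was captured with an earlier requirements.txt that still listed optuna and the full xgboost package, which is why it downloads nvidia-nccl-cu12; with the requirements.txt above, the installed packages will differ.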

(venv) matia@matia-H510M-H-V2:./PredictInjuryDurationLambda$ sam build
Building codeuri: /home/matia/dev/cloudkey/mlops-cicd/injuries_lambda/PredictInjuryDurationLambda runtime: None architecture: x86_64 functions: PredictInjuryDurationLambda                   
Building image for PredictInjuryDurationLambda function                                                                                                                                       
Setting DockerBuildArgs for PredictInjuryDurationLambda function                                                                                                                              
Step 1/7 : FROM public.ecr.aws/lambda/python:3.12
3.12: Pulling from lambda/python 
8deb1a9ce5e3: Pull complete 
99a4e43f82e3: Pull complete 
0e56aa1f1c26: Pull complete 
e2ef3e53683d: Pull complete 
b9dec667dad3: Pull complete 
94964360ff6a: Pull complete 
Status: Downloaded newer image for public.ecr.aws/lambda/python:3.12 ---> ef61d0102ac9
Step 2/7 : COPY requirements.txt ./
 ---> bb551c013332
Step 3/7 : RUN pip install --no-cache-dir -r requirements.txt
 ---> Running in 4dd05ae833d0
Collecting numpy (from -r requirements.txt (line 1))
  Downloading numpy-2.2.6-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (62 kB)
Collecting pandas (from -r requirements.txt (line 2))
  Downloading pandas-2.2.3-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (89 kB)
Collecting optuna (from -r requirements.txt (line 3))
  Downloading optuna-4.3.0-py3-none-any.whl.metadata (17 kB)
Collecting xgboost (from -r requirements.txt (line 4))
  Downloading xgboost-3.0.2-py3-none-manylinux_2_28_x86_64.whl.metadata (2.1 kB)
Collecting scikit-learn (from -r requirements.txt (line 5))
  Downloading scikit_learn-1.6.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (18 kB)
Collecting joblib (from -r requirements.txt (line 6))
  Downloading joblib-1.5.1-py3-none-any.whl.metadata (5.6 kB)
Requirement already satisfied: python-dateutil>=2.8.2 in /var/lang/lib/python3.12/site-packages (from pandas->-r requirements.txt (line 2)) (2.9.0.post0)
Collecting pytz>=2020.1 (from pandas->-r requirements.txt (line 2))
  Downloading pytz-2025.2-py2.py3-none-any.whl.metadata (22 kB)
Collecting tzdata>=2022.7 (from pandas->-r requirements.txt (line 2))
  Downloading tzdata-2025.2-py2.py3-none-any.whl.metadata (1.4 kB)
Collecting alembic>=1.5.0 (from optuna->-r requirements.txt (line 3))
  Downloading alembic-1.16.1-py3-none-any.whl.metadata (7.3 kB)
Collecting colorlog (from optuna->-r requirements.txt (line 3))
  Downloading colorlog-6.9.0-py3-none-any.whl.metadata (10 kB)
Collecting packaging>=20.0 (from optuna->-r requirements.txt (line 3))
  Downloading packaging-25.0-py3-none-any.whl.metadata (3.3 kB)
Collecting sqlalchemy>=1.4.2 (from optuna->-r requirements.txt (line 3))
  Downloading sqlalchemy-2.0.41-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (9.6 kB)
Collecting tqdm (from optuna->-r requirements.txt (line 3))
  Downloading tqdm-4.67.1-py3-none-any.whl.metadata (57 kB)
Collecting PyYAML (from optuna->-r requirements.txt (line 3))
  Downloading PyYAML-6.0.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (2.1 kB)
Collecting nvidia-nccl-cu12 (from xgboost->-r requirements.txt (line 4))
  Downloading nvidia_nccl_cu12-2.27.3-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (2.0 kB)
Collecting scipy (from xgboost->-r requirements.txt (line 4))
  Downloading scipy-1.15.3-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (61 kB)
Collecting threadpoolctl>=3.1.0 (from scikit-learn->-r requirements.txt (line 5))
  Downloading threadpoolctl-3.6.0-py3-none-any.whl.metadata (13 kB)
Collecting Mako (from alembic>=1.5.0->optuna->-r requirements.txt (line 3))
  Downloading mako-1.3.10-py3-none-any.whl.metadata (2.9 kB)
Collecting typing-extensions>=4.12 (from alembic>=1.5.0->optuna->-r requirements.txt (line 3))
  Downloading typing_extensions-4.14.0-py3-none-any.whl.metadata (3.0 kB)
Requirement already satisfied: six>=1.5 in /var/lang/lib/python3.12/site-packages (from python-dateutil>=2.8.2->pandas->-r requirements.txt (line 2)) (1.17.0)
Collecting greenlet>=1 (from sqlalchemy>=1.4.2->optuna->-r requirements.txt (line 3))
  Downloading greenlet-3.2.2-cp312-cp312-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl.metadata (4.1 kB)
Collecting MarkupSafe>=0.9.2 (from Mako->alembic>=1.5.0->optuna->-r requirements.txt (line 3))
  Downloading MarkupSafe-3.0.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.0 kB)
Downloading numpy-2.2.6-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (16.5 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 16.5/16.5 MB 6.6 MB/s eta 0:00:00
Downloading pandas-2.2.3-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (12.7 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 12.7/12.7 MB 6.8 MB/s eta 0:00:00
Downloading optuna-4.3.0-py3-none-any.whl (386 kB)
Downloading xgboost-3.0.2-py3-none-manylinux_2_28_x86_64.whl (253.9 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 253.9/253.9 MB 6.7 MB/s eta 0:00:00
Downloading scikit_learn-1.6.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (13.1 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 13.1/13.1 MB 7.4 MB/s eta 0:00:00
Downloading joblib-1.5.1-py3-none-any.whl (307 kB)
Downloading alembic-1.16.1-py3-none-any.whl (242 kB)
Downloading packaging-25.0-py3-none-any.whl (66 kB)
Downloading pytz-2025.2-py2.py3-none-any.whl (509 kB)
Downloading scipy-1.15.3-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (37.3 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 37.3/37.3 MB 7.2 MB/s eta 0:00:00
Downloading sqlalchemy-2.0.41-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.3 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.3/3.3 MB 7.7 MB/s eta 0:00:00
Downloading threadpoolctl-3.6.0-py3-none-any.whl (18 kB)
Downloading tzdata-2025.2-py2.py3-none-any.whl (347 kB)
Downloading colorlog-6.9.0-py3-none-any.whl (11 kB)
Downloading nvidia_nccl_cu12-2.27.3-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (322.4 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 322.4/322.4 MB 6.9 MB/s eta 0:00:00
Downloading PyYAML-6.0.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (767 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 767.5/767.5 kB 9.5 MB/s eta 0:00:00
Downloading tqdm-4.67.1-py3-none-any.whl (78 kB)
Downloading greenlet-3.2.2-cp312-cp312-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl (603 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 603.9/603.9 kB 6.1 MB/s eta 0:00:00
Downloading typing_extensions-4.14.0-py3-none-any.whl (43 kB)
Downloading mako-1.3.10-py3-none-any.whl (78 kB)
Downloading MarkupSafe-3.0.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (23 kB)
Installing collected packages: pytz, tzdata, typing-extensions, tqdm, threadpoolctl, PyYAML, packaging, nvidia-nccl-cu12, numpy, MarkupSafe, joblib, greenlet, colorlog, sqlalchemy, scipy, pandas, Mako, xgboost, scikit-learn, alembic, optuna
Successfully installed Mako-1.3.10 MarkupSafe-3.0.2 PyYAML-6.0.2 alembic-1.16.1 colorlog-6.9.0 greenlet-3.2.2 joblib-1.5.1 numpy-2.2.6 nvidia-nccl-cu12-2.27.3 optuna-4.3.0 packaging-25.0 pandas-2.2.3 pytz-2025.2 scikit-learn-1.6.1 scipy-1.15.3 sqlalchemy-2.0.41 threadpoolctl-3.6.0 tqdm-4.67.1 typing-extensions-4.14.0 tzdata-2025.2 xgboost-3.0.2
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable.It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning.

[notice] A new release of pip is available: 24.3.1 -> 25.1.1
[notice] To update, run: pip install --upgrade pip
 ---> Removed intermediate container 4dd05ae833d0
 ---> 17d74dbc2906
Step 4/7 : COPY lambda_handler.py ./
 ---> 04d29738de3f
Step 5/7 : COPY injury_model.pkl ./
 ---> 8af7e1760bdf
Step 6/7 : COPY model_metadata.json ./
 ---> 6e20d1175fb5
Step 7/7 : CMD ["lambda_handler.lambda_handler"]
 ---> Running in e80c76b0a956
 ---> Removed intermediate container e80c76b0a956
 ---> 49128c299b03
Successfully built 49128c299b03
Successfully tagged predictinjurydurationlambda:latest

Simple! Now let’s test out our Lambda locally to confirm that it’s working.

Testing with the local build

We’ll spin up our stack locally by running sam local start-api, which is a great way to verify that our Lambda works as expected. Here is the terminal output together with the Postman API call to our Lambda:

Running 'sam local start-api'

Feel free to use the payload I used for testing the Lambda:

{
  "features": {
    "age": 28,
    "is_contact": 1,
    "has_stopped": 1,
    "swelling": 2,
    "tone": 1,
    "palpation": 3,
    "is_contraction_painful": 1,
    "is_stretching_painful": 1,
    "class": 3,
    "is_proximal": 0,
    "is_abdominal": 0,
    "is_distal": 1,
    "fascia_depth": 2,
    "is_hamstring": 0,
    "is_quadriceps": 1,
    "is_add_abd": 0,
    "is_calf": 1,
    "is_belly": 0
  }
}

Here is the response to the Postman API call:

API Call to the locally deployed Lambda

As the response shows, the locally running model predicts an injury duration of 17 days for these descriptors.
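If you prefer a script over Postman, a minimal client sketch using only the standard library is below. It assumes the default sam local start-api address (http://127.0.0.1:3000); for the deployed stack, swap in the API URL from the stack outputs. build_request and predict are illustrative names, and the feature dict is truncated for brevity: pass the full 18-feature dict from the payload above, otherwise the Lambda answers with a 400, which urlopen raises as urllib.error.HTTPError.

```python
import json
import urllib.request

# Default address of `sam local start-api`; replace with the deployed API URL
# from the stack outputs when calling the function on AWS.
ENDPOINT = "http://127.0.0.1:3000/predict-injury-duration"


def build_request(features, url=ENDPOINT):
    """Build a POST request with the {"features": {...}} payload the Lambda expects."""
    body = json.dumps({"features": features}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(features, url=ENDPOINT, timeout=30):
    """Send the request and return the decoded JSON response body."""
    with urllib.request.urlopen(build_request(features, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Building the request does not touch the network; call predict() once the API is running.
request = build_request({"age": 28, "is_contact": 1})
print(request.get_method(), request.full_url)
```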

Deploying the Lambda to AWS account

You can deploy your Lambda and API Gateway by running sam deploy -g in your terminal (the -g flag stands for “guided”), and SAM will guide you through the deployment process. The output will look like this:

(venv) matia@matia-H510M-H-V2:./PredictInjuryDurationLambda$ sam deploy -g

Configuring SAM deploy
======================

    Looking for config file [samconfig.toml] :  Found
    Reading default arguments  :  Success

    Setting default arguments for 'sam deploy'
    =========================================
    Stack Name [sam-app]: PredictInjuryStack
    AWS Region [eu-central-1]: 
    #Shows you resources changes to be deployed and require a 'Y' to initiate deploy
    Confirm changes before deploy [y/N]: 
    #SAM needs permission to be able to create roles to connect to the resources in your template
    Allow SAM CLI IAM role creation [Y/n]: 
    #Preserves the state of previously provisioned resources when an operation fails
    Disable rollback [y/N]: 
    PredictInjuryDurationLambda has no authentication. Is this okay? [y/N]: y
    Save arguments to configuration file [Y/n]: 
    SAM configuration file [samconfig.toml]: 
    SAM configuration environment [default]: 

    Looking for resources needed for deployment:

    Managed S3 bucket: aws-sam-cli-managed-default-samclisourcebucket-vvz5tfqz29f9
    A different default S3 bucket can be set in samconfig.toml and auto resolution of buckets turned off by setting resolve_s3=False
     Image repositories: Not found.
     #Managed repositories will be deleted when their functions are removed from the template and deployed
     Create managed ECR repositories for all functions? [Y/n]: 

    Saved arguments to config file
    Running 'sam deploy' for future deployments will use the parameters saved above.
    The above parameters can be changed by modifying samconfig.toml
    Learn more about samconfig.toml syntax at 
    https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/serverless-sam-cli-config.html

75bd0feec80a: Pushed 
af455ae28cb4: Pushed 
4761438a3d36: Pushed 
1687a3ac8fa7: Pushed 
8f989553341c: Pushed 
c678e1c5e5d8: Pushed 
5964a0804fb6: Pushed 
6a9b57324378: Pushed 
f27b91471588: Pushed 
af56b219ad31: Pushed 
0b81e7a3683d: Pushed 
predictinjurydurationlambda-49128c299b03-latest: digest: sha256:fa1c14414b5029b79f0e97c5647331729beb42891e9eb814b8f77c0a5cda5e15 size: 2621

    Deploying with following values
    ===============================
    Stack name                   : PredictInjuryStack
    Region                       : eu-central-1
    Confirm changeset            : False
    Disable rollback             : False
    Deployment image repository  : 
                                       {
                                           "PredictInjuryDurationLambda": "9257xxxxxxxx.dkr.ecr.eu-central-1.amazonaws.com/predictinjurystack1f00e9c8/predictinjurydurationlambdad41e7e42repo"
                                       }
    Deployment s3 bucket         : aws-sam-cli-managed-default-samclisourcebucket-vvz5tfqz29f9
    Capabilities                 : ["CAPABILITY_IAM"]
    Parameter overrides          : {}
    Signing Profiles             : {}

Initiating deployment
=====================

PredictInjuryDurationLambda has no authentication.
    Uploading to PredictInjuryStack/335eac6f8b2db4755ca431660bfaf17f.template  1578 / 1578  (100.00%)

Waiting for changeset to be created..

CloudFormation stack changeset
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Operation                                      LogicalResourceId                              ResourceType                                   Replacement                                  
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Add                                          PredictInjuryDurationApiApiGatewayDefaultSta   AWS::ApiGatewayV2::Stage                       N/A                                          
                                               ge                                                                                                                                         
+ Add                                          PredictInjuryDurationApi                       AWS::ApiGatewayV2::Api                         N/A                                          
+ Add                                          PredictInjuryDurationLambdaInferencePermissi   AWS::Lambda::Permission                        N/A                                          
                                               on                                                                                                                                         
+ Add                                          PredictInjuryDurationLambdaRole                AWS::IAM::Role                                 N/A                                          
+ Add                                          PredictInjuryDurationLambda                    AWS::Lambda::Function                          N/A                                          
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Changeset created successfully. arn:aws:cloudformation:eu-central-1:9257xxxxxxx:changeSet/samcli-deploy1749059624/b0ee6116-2396-4322-bd2e-7a9467225a40

2025-06-04 19:53:50 - Waiting for stack create/update to complete

CloudFormation events from stack operations (refresh every 5.0 seconds)
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
ResourceStatus                                 ResourceType                                   LogicalResourceId                              ResourceStatusReason                         
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
CREATE_IN_PROGRESS                             AWS::CloudFormation::Stack                     PredictInjuryStack                             User Initiated                               
CREATE_IN_PROGRESS                             AWS::IAM::Role                                 PredictInjuryDurationLambdaRole                -                                            
CREATE_IN_PROGRESS                             AWS::IAM::Role                                 PredictInjuryDurationLambdaRole                Resource creation Initiated                  
CREATE_COMPLETE                                AWS::IAM::Role                                 PredictInjuryDurationLambdaRole                -                                            
CREATE_IN_PROGRESS                             AWS::Lambda::Function                          PredictInjuryDurationLambda                    -                                            
CREATE_IN_PROGRESS                             AWS::Lambda::Function                          PredictInjuryDurationLambda                    Resource creation Initiated                  
CREATE_IN_PROGRESS - CONFIGURATION_COMPLETE    AWS::Lambda::Function                          PredictInjuryDurationLambda                    Eventual consistency check initiated         
CREATE_IN_PROGRESS                             AWS::ApiGatewayV2::Api                         PredictInjuryDurationApi                       -                                            
CREATE_IN_PROGRESS                             AWS::ApiGatewayV2::Api                         PredictInjuryDurationApi                       Resource creation Initiated                  
CREATE_COMPLETE                                AWS::ApiGatewayV2::Api                         PredictInjuryDurationApi                       -                                            
CREATE_IN_PROGRESS                             AWS::Lambda::Permission                        PredictInjuryDurationLambdaInferencePermissi   -                                            
                                                                                              on                                                                                          
CREATE_IN_PROGRESS                             AWS::Lambda::Permission                        PredictInjuryDurationLambdaInferencePermissi   Resource creation Initiated                  
                                                                                              on                                                                                          
CREATE_COMPLETE                                AWS::Lambda::Permission                        PredictInjuryDurationLambdaInferencePermissi   -                                            
                                                                                              on                                                                                          
CREATE_COMPLETE                                AWS::Lambda::Function                          PredictInjuryDurationLambda                    -                                            
CREATE_IN_PROGRESS                             AWS::ApiGatewayV2::Stage                       PredictInjuryDurationApiApiGatewayDefaultSta   -                                            
                                                                                              ge                                                                                          
CREATE_IN_PROGRESS                             AWS::ApiGatewayV2::Stage                       PredictInjuryDurationApiApiGatewayDefaultSta   Resource creation Initiated                  
                                                                                              ge                                                                                          
CREATE_COMPLETE                                AWS::ApiGatewayV2::Stage                       PredictInjuryDurationApiApiGatewayDefaultSta   -                                            
                                                                                              ge                                                                                          
CREATE_COMPLETE                                AWS::CloudFormation::Stack                     PredictInjuryStack                             -                                            
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

CloudFormation outputs from deployed stack
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Outputs                                                                                                                                                                                   
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Key                 PredictInjuryDurationApi                                                                                                                                                          
Description         API Gateway endpoint URL for Prod stage for PredictInjuryDurationApi function                                                                                                        
Value               https://hfd90813bg.execute-api.eu-central-1.amazonaws.com/Prod/predict-injury-duration/                                                                               

Key                 LambdaFunctionArn                                                                                                                                                     
Description         Injury Duration Prediction Lambda Function ARN                                                                                                                        
Value               arn:aws:lambda:eu-central-1:9257xxxxxxx:function:PredictInjuryStack-PredictInjuryDurationLambda-B78GP4IFvJbf                                                         
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Successfully created/updated stack - PredictInjuryStack in eu-central-1

And that’s it! Your Lambda containing the AI model is now deployed. You can copy the *PredictInjuryDurationApi* output value into an API testing tool, such as Postman, to see your Lambda in action!
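If you would rather not copy the URL by hand, it can also be read from the stack outputs. Below is a minimal sketch, assuming `boto3` is installed and AWS credentials are configured. The stack name `PredictInjuryStack`, the output key `PredictInjuryDurationApi` and the region `eu-central-1` come from the deployment log above; the helper name `get_stack_output` is hypothetical.

```python
from typing import Optional


def get_stack_output(
    stack_name: str,
    output_key: str,
    region: str = "eu-central-1",
    client=None,
) -> Optional[str]:
    """Return the value of one CloudFormation stack output, or None if it is missing.

    Hypothetical helper: `client` can be injected (e.g. for testing);
    otherwise a boto3 CloudFormation client is created for `region`.
    """
    if client is None:
        import boto3  # imported lazily so the helper can be tested without boto3

        client = boto3.client("cloudformation", region_name=region)

    # describe_stacks returns {"Stacks": [{"Outputs": [{"OutputKey": ..., "OutputValue": ...}]}]}
    stacks = client.describe_stacks(StackName=stack_name)["Stacks"]
    for output in stacks[0].get("Outputs", []):
        if output["OutputKey"] == output_key:
            return output["OutputValue"]
    return None
```

Calling `get_stack_output("PredictInjuryStack", "PredictInjuryDurationApi")` should return the same endpoint URL that SAM printed in its outputs table.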

Using the same payload we used to test the Lambda locally, we get the same results, which confirms that everything is working!

API Call to the deployed Lambda
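The same call Postman makes can also be scripted. Here is a minimal sketch using only Python's standard library; the helper names `build_prediction_request` and `predict_injury_duration` are hypothetical, and `payload` should be the same JSON body you used for the local test (its fields are not repeated here).

```python
import json
import urllib.request


def build_prediction_request(api_url: str, payload: dict) -> urllib.request.Request:
    """Build a POST request that sends `payload` as a JSON body to the endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        api_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict_injury_duration(api_url: str, payload: dict, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response returned by the Lambda."""
    request = build_prediction_request(api_url, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Pass the *PredictInjuryDurationApi* URL as `api_url`. Keep in mind that the first request after a deployment can be noticeably slower, because Lambda has to load the container image (a cold start).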

It is important to note that the Docker image is pushed to Elastic Container Registry (ECR) on AWS. When you delete the resources created in this project, make sure to also remove the ECR repository and all images stored in it.
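The repository cleanup can be done from the ECR console, or scripted. Below is a minimal sketch, assuming `boto3` and configured credentials; `delete_ecr_repository` is a hypothetical helper, and `repository_name` must be the repository your deployment actually pushed the image to (check the ECR console, as the exact name is not shown in the log above).

```python
def delete_ecr_repository(repository_name: str, region: str = "eu-central-1", client=None) -> str:
    """Delete an ECR repository together with every image it contains.

    Hypothetical helper: `client` can be injected for testing; otherwise a
    boto3 ECR client is created for `region`.
    """
    if client is None:
        import boto3  # lazy import so the helper can be exercised without boto3

        client = boto3.client("ecr", region_name=region)

    # force=True tells ECR to delete the repository even if it still holds images
    response = client.delete_repository(repositoryName=repository_name, force=True)
    return response["repository"]["repositoryName"]
```

Run it only after the CloudFormation stack itself has been deleted, so nothing still references the image.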

Conclusion

Congratulations! You’ve successfully built and deployed a production-ready AI model with AWS Lambda and Docker. Let’s go over what we’ve learned in this post:

  • how to write a complete training script, with XGBoost as the AI model and Optuna as the hyperparameter optimizer
  • how to containerize a Lambda function so it can serve real-time predictions
  • how to deploy the containerized Lambda and make it accessible from anywhere on the Internet through API Gateway

This setup is very cost-effective: there are no servers to provision or maintain, because everything runs serverless and you are billed per invocation instead of for an always-on machine. Isn’t that awesome?
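To put a rough number on "cost-effective", the Lambda part of the bill can be estimated from request count, memory size and average duration. This is a minimal sketch: the default prices are assumed x86 on-demand list prices and may differ by region, architecture and date, so verify them on the AWS Lambda pricing page. The estimate ignores the free tier, API Gateway charges and ECR storage, and the helper name is hypothetical.

```python
def estimate_lambda_monthly_cost(
    requests_per_month: int,
    memory_mb: int,
    avg_duration_ms: float,
    price_per_gb_second: float = 0.0000166667,  # assumed list price, verify for your region
    price_per_million_requests: float = 0.20,  # assumed list price, verify for your region
) -> float:
    """Rough monthly Lambda cost in USD (compute + request charges, no free tier)."""
    # Compute is billed in GB-seconds: memory (GB) x duration (s), summed over invocations
    gb_seconds = requests_per_month * (memory_mb / 1024) * (avg_duration_ms / 1000)
    compute_cost = gb_seconds * price_per_gb_second
    request_cost = (requests_per_month / 1_000_000) * price_per_million_requests
    return compute_cost + request_cost
```

For example, with hypothetical inputs of one million predictions per month, 1024 MB of memory and 200 ms per call, `estimate_lambda_monthly_cost(1_000_000, 1024, 200)` comes to roughly $3.53 under the assumed prices.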

Thank you for reading! Wishing you the greatest day!
