Most ML models die in notebooks. I built one that retrains itself, scales to hundreds of users, and deploys in minutes — here’s the exact AWS + Docker + CI/CD stack I used.
In this project, to keep costs as low as possible, the Machine Learning model was trained on the local machine and the artifacts were then uploaded to AWS. This project also demonstrates an entry-level MLOps workflow — retraining and redeployment are automated through CI/CD, but advanced MLOps features like data versioning, model registry, and monitoring are not yet implemented.
There are a few prerequisites for this project:
- Docker
- Python
- CI/CD pipelines
- basic usage of GitHub
- basic knowledge of AWS services
Link to the code is here — let’s get started!
Part 1: XGBoost Model Training with Optuna
The dataset for this project was taken from the following Kaggle dataset — link to dataset — it is a great example of a simple regression problem, with clear features and data ready for training a Machine Learning model. We are going to use an XGBoost model, for several reasons:
- gradient boosted trees, particularly XGBoost, consistently outperform deep learning models on tabular data, with evidence that XGBoost won most of the Kaggle challenges in 2015 — link
- XGBoost implements L1 and L2 regularization on tree weights, which reduces overfitting
Other model architectures, like deep learning models, might seem more appealing, but with so little data and only 15 features, XGBoost typically outperforms them on tabular datasets this size.
This makes it a great candidate for this project. There are other models which are also worth trying out, but they won't be implemented in this blog post.
To ensure we are deploying the best model possible with the available data, we need to optimize our model's hyperparameters. As in the previous blog post, we are going to use a Python library called Optuna.
Optuna is a hyperparameter optimization library that has become a go-to choice for quick and easy hyperparameter tuning in Machine Learning workflows.
At the end of model optimization and training, the training script will output 2 files, final_model.pkl and label_encoders.pkl, which will be put inside a tarball and uploaded to S3.
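The full training script isn't reproduced here, but here is a minimal sketch of what the Optuna + XGBoost workflow looks like. The dataset path, column names and search space are my own assumptions; only the output files (final_model.pkl, label_encoders.pkl and model.tar.gz) follow what the project describes.
# Sketch of the Optuna + XGBoost training loop - the actual script in the repo
# may differ; dataset path, target column and search space are assumptions.
import pickle
import tarfile
import optuna
import pandas as pd
import xgboost as xgb
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

df = pd.read_csv("data/student_scores.csv")  # hypothetical dataset path

# Encode categorical columns and keep the encoders for inference time
label_encoders = {}
for col in df.select_dtypes(include="object").columns:
    le = LabelEncoder()
    df[col] = le.fit_transform(df[col])
    label_encoders[col] = le

X, y = df.drop(columns=["score"]), df["score"]  # "score" is an assumed target name
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

def objective(trial):
    params = {
        "n_estimators": trial.suggest_int("n_estimators", 100, 1000),
        "max_depth": trial.suggest_int("max_depth", 3, 10),
        "learning_rate": trial.suggest_float("learning_rate", 0.01, 0.3, log=True),
        "subsample": trial.suggest_float("subsample", 0.5, 1.0),
        "reg_alpha": trial.suggest_float("reg_alpha", 1e-8, 10.0, log=True),   # L1
        "reg_lambda": trial.suggest_float("reg_lambda", 1e-8, 10.0, log=True), # L2
    }
    model = xgb.XGBRegressor(**params)
    model.fit(X_train, y_train)
    return mean_squared_error(y_val, model.predict(X_val))

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=50)

# Retrain on the full dataset with the best hyperparameters
final_model = xgb.XGBRegressor(**study.best_params).fit(X, y)

# Persist the artifacts and pack them into the tarball we'll upload to S3
with open("final_model.pkl", "wb") as f:
    pickle.dump(final_model, f)
with open("label_encoders.pkl", "wb") as f:
    pickle.dump(label_encoders, f)
with tarfile.open("model.tar.gz", "w:gz") as tar:
    tar.add("final_model.pkl")
    tar.add("label_encoders.pkl")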
We chose local training since our training dataset is very small and it only takes a couple of minutes to train the XGBoost model. AWS Sagemaker does provide a platform for training your models, which makes it very useful if you have big datasets or need a more powerful machine, but for this use case, local training saves us money by keeping us inside the AWS Free Tier.
Here is the output of the training process:
In the image itself, you can see that we've logged metrics like Mean Squared Error (MSE), R^2 score, Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE). These metrics are very important for assessing model performance. For a more detailed explanation of why we use these metrics, please check out this Medium article.
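As a quick reference, all of these metrics can be computed with scikit-learn; the values below are dummy data just to make the snippet runnable.
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Dummy values - in the training script these would be the validation targets
# and the model's predictions on them.
y_val = np.array([72.0, 65.0, 88.0, 54.0])
predictions = np.array([70.5, 67.0, 85.0, 58.0])

mse = mean_squared_error(y_val, predictions)
rmse = np.sqrt(mse)                            # RMSE is the square root of MSE
mae = mean_absolute_error(y_val, predictions)  # average absolute deviation
r2 = r2_score(y_val, predictions)              # share of variance explained by the model
print(f"MSE={mse:.2f}  RMSE={rmse:.2f}  MAE={mae:.2f}  R^2={r2:.2f}")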
Part 2: Custom Container Strategy
Here’s where AWS defaults fall short.
We are going to use a custom Docker container instead of the public Sagemaker Docker images provided by AWS. The reason is that we want to use the latest libraries instead of the outdated ones inside the pre-built AWS Docker containers (which you can see here - https://docs.aws.amazon.com/sagemaker/latest/dg-ecr-paths/ecr-eu-central-1.html#xgboost-eu-central-1). At the time of writing this blog, the newest XGBoost version available in those pre-built containers is 1.7.4, while the latest version available on PyPI is 3.0.5.
Choosing Sagemaker's pre-built Docker images would handle deployment automatically, but we would lose control over the dependencies and give up the ability to test locally. Using a custom container, like in this project, adds complexity, but if you were doing this at your job, this approach would give you a more reproducible environment with faster iteration cycles.
Our architectural decision to create a custom Docker image has a drawback - we will need to write additional logic to handle the 2 HTTP endpoints which Sagemaker expects by design: /invocations and /ping.
When a resource wants the model to return a value, it provides the necessary payload to the /invocations endpoint, while the /ping endpoint is used as a health check - Sagemaker will periodically send a GET HTTP request to it to verify that everything is up and running. That's why we'll need to install Flask, a lightweight Python library for creating web servers.
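To make this concrete, here is a minimal sketch of what such a server can look like. The actual app.py in the repository may differ; the payload format and feature handling below are simplified assumptions.
# Minimal Flask server implementing the two endpoints Sagemaker expects.
# The real app.py in the repo may differ; payload handling is an assumption.
import os
import pickle
import tarfile

import pandas as pd
from flask import Flask, Response, jsonify, request

MODEL_DIR = os.environ.get("SM_MODEL_DIR", "/opt/ml/model")

app = Flask(__name__)

# When testing locally we mount model.tar.gz directly; on Sagemaker the archive
# is already extracted into /opt/ml/model, so only unpack it if it is present.
tarball = os.path.join(MODEL_DIR, "model.tar.gz")
if os.path.exists(tarball):
    with tarfile.open(tarball) as tar:
        tar.extractall(MODEL_DIR)

with open(os.path.join(MODEL_DIR, "final_model.pkl"), "rb") as f:
    model = pickle.load(f)
with open(os.path.join(MODEL_DIR, "label_encoders.pkl"), "rb") as f:
    label_encoders = pickle.load(f)

@app.route("/ping", methods=["GET"])
def ping():
    # Health check - Sagemaker only cares about the 200 status code
    return Response(status=200)

@app.route("/invocations", methods=["POST"])
def invocations():
    payload = request.get_json()
    features = pd.DataFrame([payload])
    # Apply the same label encoding that was used during training
    for col, encoder in label_encoders.items():
        if col in features:
            features[col] = encoder.transform(features[col])
    prediction = model.predict(features)
    return jsonify({"predicted_score": float(prediction[0])})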
If we used the pre-built Docker image, we would just need to provide the model artifacts and the inference.py file, which contains the logic to get a prediction from the model. On the other hand, there is a great benefit to the custom-container approach - we are able to test it locally by running the Docker image, and we are going to take a look at testing the model locally in the next few steps.
The following code block is our Dockerfile; let's explain it step by step:
# Set the linux/amd64 platform as a base for this Docker Image
# so even if you have an ARM processor, it won't impact the runtime
# A common mistake here is an architecture mismatch between the host machine
# and the machine which runs the image - e.g. if you build the image on an ARM
# processor without pinning the platform, it will only work on machines with
# that CPU architecture and not on regular x86 processors.
ARG TARGETPLATFORM=linux/amd64
FROM --platform=$TARGETPLATFORM python:3.12-slim
# Set the working directory in the container
WORKDIR /app
# Copy the dependencies file to the working directory
COPY requirements.txt .
# Install Python dependencies
RUN pip install --no-cache-dir -r requirements.txt
# Copy the Flask application code into the container
COPY . .
# Set the environment variable for the SageMaker model directory
ENV SM_MODEL_DIR=/opt/ml/model
# Expose port 8080 to allow external access to the Flask application
EXPOSE 8080
# Define the entry point for running the Flask application
ENTRYPOINT ["gunicorn", "-b", "0.0.0.0:8080", "app:app"]
Here are the commands which you need to run in the aws folder to build the image and run the ML model locally:
# Build the image locally
cd aws/docker
docker build -t sagemaker-demo .
# Run the Docker image to create a container for local testing
docker run -p 8080:8080 -v ./model.tar.gz:/opt/ml/model/model.tar.gz sagemaker-demo
This way, we are simulating the Sagemaker environment and we can use API testing tools, like Postman, to see if our model is working as expected.
I've used Postman as the API testing tool to verify that the Flask server is working by sending a GET request to /ping, and that the model is loaded and returns the expected student score based on the received payload when the POST /invocations endpoint is called.
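If you prefer scripting over Postman, the same checks can be done with a few lines of Python. The payload keys below are placeholders - use the real feature names from the dataset.
import requests

# Health check - should print 200
print(requests.get("http://localhost:8080/ping").status_code)

# Prediction - the keys are placeholders, not the project's actual feature names
payload = {"hours_studied": 7, "attendance": 92, "parental_involvement": "High"}
response = requests.post("http://localhost:8080/invocations", json=payload)
print(response.json())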
Part 3: AWS Infrastructure as Code
AWS CDK generates the infrastructure resources automatically and handles the complex resource dependencies. If you did this manually in the AWS console, it would be far more error-prone and would take much more time. This way, we are leveraging our Python knowledge to create these templates. Another benefit of IaC in practice is that we can deploy the same stack in multiple environments (e.g. testing and production) knowing that all resources are deployed in the same way.
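For example, deploying the same stack to a testing and a production environment from the CDK app could look roughly like this - the stack class, module path and account IDs are placeholders, not part of this project's code.
# app.py - sketch of deploying the same stack twice; names and accounts are placeholders
import aws_cdk as cdk

from stacks.student_score_stack import StudentScoreStack  # hypothetical module path

app = cdk.App()

StudentScoreStack(
    app,
    "StudentScoreStack-Testing",
    env=cdk.Environment(account="111111111111", region="eu-central-1"),
)
StudentScoreStack(
    app,
    "StudentScoreStack-Production",
    env=cdk.Environment(account="222222222222", region="eu-central-1"),
)

app.synth()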
We are going to create multiple resources to be able to open up our Sagemaker model to the Internet:
- an AWS Lambda, which takes the request and forwards it to the model
- an API Gateway, which opens up our Lambda to the Internet
- a Docker image asset and a model artifact asset, containing our Docker image with the model dependencies and the trained model artifacts respectively
- a Sagemaker Model and Endpoint, wiring together the Docker image and model artifacts defined above
To visualize our infrastructure, here is an architecture diagram:
There are multiple resources which need to be defined inside the CDK file:
- define the path to the Dockerfile to build our Docker image locally
# 1. Build & push Docker image from local ./container folder
image_asset = ecr_assets.DockerImageAsset(
self,
"XGBImage",
directory="./docker",
)
- define the path to our model.tar.gz which contains the necessary artifacts
# 2. Upload model.tar.gz to S3
model_asset = s3_assets.Asset(
self,
"XGBModel",
path="./model.tar.gz",
)
- define the Sagemaker model to connect the Docker image and model artifacts into one entity
# 3. Create SageMaker Model
sm_model = sagemaker.CfnModel(
self,
"XGBModelResource",
execution_role_arn=role.role_arn,
primary_container=sagemaker.CfnModel.ContainerDefinitionProperty(
image=image_asset.image_uri,
model_data_url=model_asset.s3_object_url,
),
)
# Ensure model.tar.gz upload finishes first
sm_model.node.add_dependency(model_asset)
- deploy the model and expose it as an endpoint - very importantly, we are using a Serverless Inference configuration here to lower costs
# 4. Create EndpointConfig using Serverless Inference
endpoint_config = sagemaker.CfnEndpointConfig(
self,
"XGBEndpointConfig",
production_variants=[
sagemaker.CfnEndpointConfig.ProductionVariantProperty(
model_name=sm_model.attr_model_name,
initial_variant_weight=1.0,
variant_name="AllTraffic",
# Serverless config replaces instance_type/initial_instance_count
serverless_config=sagemaker.CfnEndpointConfig.ServerlessConfigProperty(
memory_size_in_mb=1024,
max_concurrency=5,
),
)
],
)
# and create the endpoint
sm_endpoint = sagemaker.CfnEndpoint(
self,
"XGBEndpoint",
endpoint_config_name=endpoint_config.attr_endpoint_config_name,
)
- lastly, create the Lambda which will be opened up to the Internet, so users can access the ML model without having to sign their requests to the Sagemaker endpoint
# 5. Create the Lambda and open it to public (API Gateway)
predict_student_score_lambda = _lambda.Function(
self,
"PredictStudentScoreLambda",
runtime=_lambda.Runtime.PYTHON_3_12,
handler="lambda_handler.lambda_handler",
code=_lambda.Code.from_asset(
"./PredictStudentScore",
bundling={
"image": _lambda.Runtime.PYTHON_3_12.bundling_image,
"command": ["bash", "-c", "pip install aws-lambda-powertools -t /asset-output && cp -r . /asset-output"],
},
),
environment={
"SAGEMAKER_ENDPOINT_NAME": sm_endpoint.attr_endpoint_name
},
timeout=cdk.Duration.seconds(30)
)
# Allow Lambda to invoke SageMaker endpoint
predict_student_score_lambda.add_to_role_policy(
iam.PolicyStatement(
actions=["sagemaker:InvokeEndpoint"],
resources=[
f"arn:aws:sagemaker:{self.region}:{self.account}:endpoint/{sm_endpoint.attr_endpoint_name}"
]
)
)
# API Gateway
api = apigw.RestApi(
self,
"StudentScoreApi",
rest_api_name="StudentScore API",
description="StudentScore Services API",
deploy=True,
deploy_options=apigw.StageOptions(
stage_name="score"
),
default_cors_preflight_options=apigw.CorsOptions(
allow_origins=apigw.Cors.ALL_ORIGINS,
allow_methods=apigw.Cors.ALL_METHODS,
allow_headers=apigw.Cors.DEFAULT_HEADERS,
),
)
# API Gateway Integrations
predict_score_integration = apigw.LambdaIntegration(predict_student_score_lambda)
# API Gateway Resources and Methods
api.root.add_resource("predict-student-score").add_method("POST", predict_score_integration)
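The Lambda handler itself isn't shown in the stack code above; roughly, it just forwards the request body to the Sagemaker endpoint and returns the response. Here is a sketch - the real lambda_handler.py in the repository may differ.
# lambda_handler.py - sketch of forwarding the API Gateway request to Sagemaker;
# the real handler in the repository may differ.
import json
import os

import boto3

runtime = boto3.client("sagemaker-runtime")
ENDPOINT_NAME = os.environ["SAGEMAKER_ENDPOINT_NAME"]

def lambda_handler(event, context):
    # API Gateway proxy integration passes the request body as a JSON string
    payload = event.get("body") or "{}"

    response = runtime.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="application/json",
        Body=payload,
    )
    prediction = json.loads(response["Body"].read().decode("utf-8"))

    return {
        "statusCode": 200,
        "headers": {"Access-Control-Allow-Origin": "*"},
        "body": json.dumps(prediction),
    }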
And that's it! Our stack is now ready for deployment. You can deploy the stack by simply running cdk deploy.
Part 4: CI/CD Pipeline for Dynamic Model Updates
Another great feature of this project is the CI/CD solution - we've implemented a Github Actions pipeline which, on dataset change, automatically triggers the training process and updates the Sagemaker endpoint in minutes! But first, let's clarify why we are choosing Github Actions instead of, for example, AWS CodePipeline.
GitHub Actions integrates directly with our code repository and offers 2,000 free minutes monthly. AWS CodePipeline would cost $1 per pipeline per month plus compute costs.
Now, let’s go over the CI/CD workflow, step by step.
Connecting our pipeline with AWS
To connect the pipeline with AWS, I found this tutorial inside the Github docs which I think is a great starting point: https://docs.github.com/en/actions/how-tos/secure-your-work/security-harden-deployments/oidc-in-aws. However, to make it easier, I've created a deploy_bootstrap.py script inside the aws/bootstrap folder which will create the OIDC provider for you if you don't have it set up, together with the necessary IAM role which our pipeline will use.
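I won't paste the whole bootstrap script here, but the core of what it does looks roughly like this. The role name, repository filter and thumbprint are illustrative - check the script in the repo for the exact values, and note the role still needs deployment permissions attached.
# Rough sketch of what a bootstrap script like deploy_bootstrap.py does - creates
# the GitHub OIDC provider and an IAM role GitHub Actions can assume.
import json

import boto3

iam = boto3.client("iam")

# 1. Register GitHub's OIDC identity provider (only needed once per AWS account)
provider = iam.create_open_id_connect_provider(
    Url="https://token.actions.githubusercontent.com",
    ClientIDList=["sts.amazonaws.com"],
    ThumbprintList=["6938fd4d98bab03faadb97b34396831e3780aea1"],  # GitHub's published thumbprint
)

# 2. Create a role that only workflows from your repository can assume
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Federated": provider["OpenIDConnectProviderArn"]},
        "Action": "sts:AssumeRoleWithWebIdentity",
        "Condition": {
            "StringEquals": {"token.actions.githubusercontent.com:aud": "sts.amazonaws.com"},
            "StringLike": {"token.actions.githubusercontent.com:sub": "repo:<your-org>/<your-repo>:*"},
        },
    }],
}
role = iam.create_role(
    RoleName="github-actions-deploy-role",
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)
print("Role ARN to store as a GitHub secret:", role["Role"]["Arn"])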
After the aws/bootstrap/deploy_bootstrap.py script creates the IAM role, you will get the output from the script itself. That IAM role ARN will be used inside your Github Actions configuration file, so have it ready for the next steps. One more thing: this step needs to be done manually. The output should look like this (I've redacted some values related to my AWS account):
The steps for wiring the role ARN into the Github Actions pipeline are listed below, and here is a visualization of the process itself:
- go to your Github repo settings
- under Security, click on “Secrets and variables” then on “Actions”
- click on "New Repository Secret", then enter the secret name and the secret value itself - in my example, AWS_ROLE_ARN and the role ARN from the script output
Now you can use the secret inside your pipeline configuration! We’ll input it in later steps.
Name the workflow and define when to run it
Now, let’s define the pipeline configuration file!
The name property is self-explanatory - the name of your workflow.
After that, you can define on which actions to run your pipeline. In this example, the pipeline will be started in two scenarios:
- if the CSV dataset inside the data folder changes and new data is added
- or by manually starting the pipeline
The YAML code which covers the fields mentioned above is in the next code block:
name: Train Model
on:
push:
paths:
- 'data/**'
branches: [main]
workflow_dispatch:
inputs:
model_version:
description: 'Model version tag (optional)'
required: false
default: 'latest'
type: string
Job configuration and environment setup
Now, let’s define the pipeline environment and download the necessary dependencies:
jobs:
train-model:
name: Train Model
runs-on: ubuntu-latest
permissions:
id-token: write # Required for AWS OIDC authentication
contents: read # Required to read repository files
steps:
# Checkout the code and setup Python, NPM and necessary libraries for
# training the ML model
- name: Check out code
uses: actions/checkout@v4
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: '3.12'
- name: Set up Node.js
uses: actions/setup-node@v4
with:
node-version: '22'
- name: Configure AWS credentials
uses: aws-actions/configure-aws-credentials@v4
with:
role-to-assume: ${{ secrets.AWS_ROLE_ARN }}
aws-region: <your-AWS-region>
- name: Install dependencies
run: |
pip install -r requirements.txt # Python ML libraries
npm install -g aws-cdk # CDK CLI tool
Training the model and deploying to AWS via CDK
- name: Run the train_script script
run: |
python train_model.py
- name: Deploy model with CDK
run: |
cdk deploy --require-approval never
working-directory: aws
env:
AWS_DEFAULT_REGION: <your-AWS-region>
And that’s our pipeline configuration file!
The end result should look like this - the first image is a complete overview of the whole job, while the second image shows the deployment to AWS from our pipeline.
Part 5: Testing the API
Using Postman's integrated performance testing, we used the following configuration to test our deployed API:
- “Ramp up” as our load profile
- 30 virtual users
Here are the test results:
From this test run, we can conclude the following:
- the initial latency spike of 500-600 ms was caused by the Lambda cold start - this can easily be fixed by enabling Provisioned Concurrency, however that would increase cost
- after the cold start was resolved, our infrastructure handled the traffic without any issues, consistently returning rock-solid ~100 ms responses, even as the number of requests per second increased
- no errors occurred, confirming that our setup can handle low traffic without issues
There are a couple of things which could additionally be tested:
- more requests with a more aggressive spike instead of a ramp up
- at what request rate does our infrastructure and configuration start throttling?
For production workloads expecting higher traffic, consider testing throttling scenarios and implementing retry logic.
These are up to you if you want to test them; they won't be covered in this blog post. At the time of writing, the test above cost me ~$0.05 USD.
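If you do add retry logic on the client side, a simple sketch with exponential backoff could look like this - the URL and payload are placeholders.
import time

import requests

# Placeholder URL - replace with the API Gateway URL from your deployment
API_URL = "https://<api-id>.execute-api.<region>.amazonaws.com/score/predict-student-score"

def predict_with_retry(payload, retries=3, backoff=0.5):
    # Retries smooth over cold starts and occasional throttling responses
    for attempt in range(retries):
        try:
            response = requests.post(API_URL, json=payload, timeout=10)
            response.raise_for_status()
            return response.json()
        except requests.RequestException:
            if attempt == retries - 1:
                raise
            time.sleep(backoff * 2 ** attempt)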
Conclusion: Lessons Learned
In this blog post, we’ve learned many very useful skills and ways of using AWS:
- creating, training and optimizing our Machine Learning model based on very simple regression data
- a great intro for a beginner Data Engineer
- using a “Custom Container” strategy to create our own environment for our ML model
- we own the environment for our model - we can do whatever we want
- using Docker and AWS CDK with Python to build, test and deploy our application
- you’ve deepened your knowledge about AWS CDK and how to test your code which is deployed
- connecting our AWS account with a Github Actions CI/CD pipeline to have an automated way of deploying the latest version of our ML model
- an easy intro to MLOps and CI/CD, which is always a great field to understand
Thank you for reading! See you in the next blog post!