Reading time: ~15-20 minutes
Level: Intermediate to Advanced
Prerequisites: Parts 1-3, Understanding of CI/CD concepts
Series: Part 4 of 4 (Series Finale) - Part 1 | Part 2 | Part 3
Welcome to the Series Finale!
We've come a long way! In Part 1, we covered the AIDLC framework. In Part 2, we built secure data pipelines. In Part 3, we trained models at scale with SageMaker. Now it's time to bring it all together and deploy to production.
What you'll build today:
- CI/CD pipeline for ML models with CodePipeline
- Production SageMaker endpoints with auto-scaling
- A/B testing infrastructure for safe rollouts
- Model drift detection and monitoring
- Complete observability stack
- Incident response and rollback procedures
By the end: You'll have a complete, production-ready ML platform on AWS that can safely deploy, monitor, and maintain models at scale.
The ML Deployment Problem
Training a great model is only half the battle. Production deployment brings new challenges:
- Manual deployments: error-prone and don't scale
- No testing: bad models reach production
- Deployment downtime: service interruptions during updates
- No rollback plan: bad deployments can't be undone
- Performance degradation: models drift over time
- No observability: production issues can't be debugged
The solution:
A complete CI/CD pipeline with automated testing, safe deployments, and comprehensive monitoring.
Architecture Overview
Here's the complete production ML platform:
Production ML Pipeline - 4 Phases:
Phase 1: Build
- Code Push -> CodePipeline -> CodeBuild Testing -> Model Registry
- Automated quality gates and model validation
Phase 2: Deploy
- Staging deployment for testing
- Manual approval gate with metrics review
- Production deployment with A/B testing (90/10 traffic split)
Phase 3: Monitor
- CloudWatch metrics and dashboards
- Model Monitor for drift detection
- SNS alerts for anomalies
Phase 4: Respond
- Auto-Rollback on critical errors
- Manual rollback procedures
- Incident response workflows
Step 1: CI/CD Pipeline Setup
Pipeline Architecture
Code Push -> Build -> Test -> Deploy to Staging -> Approval -> Deploy to Prod -> Monitor
CodePipeline Configuration
Create terraform/cicd.tf:
# S3 bucket for pipeline artifacts
resource "aws_s3_bucket" "pipeline_artifacts" {
bucket = "${var.project_name}-pipeline-artifacts-${data.aws_caller_identity.current.account_id}"
tags = {
Name = "Pipeline Artifacts"
Environment = var.environment
}
}
resource "aws_s3_bucket_versioning" "pipeline_artifacts" {
bucket = aws_s3_bucket.pipeline_artifacts.id
versioning_configuration {
status = "Enabled"
}
}
resource "aws_s3_bucket_server_side_encryption_configuration" "pipeline_artifacts" {
bucket = aws_s3_bucket.pipeline_artifacts.id
rule {
apply_server_side_encryption_by_default {
sse_algorithm = "aws:kms"
kms_master_key_id = aws_kms_key.data_encryption.arn
}
}
}
# CodeBuild project for model testing
resource "aws_codebuild_project" "model_tests" {
name = "${var.project_name}-model-tests"
service_role = aws_iam_role.codebuild.arn
build_timeout = 30
artifacts {
type = "CODEPIPELINE"
}
environment {
compute_type = "BUILD_GENERAL1_MEDIUM"
image = "aws/codebuild/standard:7.0"
type = "LINUX_CONTAINER"
privileged_mode = true
image_pull_credentials_type = "CODEBUILD"
environment_variable {
name = "MODEL_PACKAGE_GROUP_NAME"
value = aws_sagemaker_model_package_group.ml_models.model_package_group_name
}
environment_variable {
name = "AWS_ACCOUNT_ID"
value = data.aws_caller_identity.current.account_id
}
environment_variable {
name = "AWS_REGION"
value = var.aws_region
}
}
source {
type = "CODEPIPELINE"
buildspec = "buildspec.yml"
}
tags = {
Name = "Model Testing"
Environment = var.environment
}
}
# IAM role for CodeBuild
resource "aws_iam_role" "codebuild" {
name = "${var.project_name}-codebuild-role"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = {
Service = "codebuild.amazonaws.com"
}
}]
})
}
resource "aws_iam_role_policy" "codebuild" {
name = "${var.project_name}-codebuild-policy"
role = aws_iam_role.codebuild.id
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Effect = "Allow"
Action = [
"logs:CreateLogGroup",
"logs:CreateLogStream",
"logs:PutLogEvents"
]
Resource = "arn:aws:logs:*:*:*"
},
{
Effect = "Allow"
Action = [
"s3:GetObject",
"s3:PutObject"
]
Resource = [
"${aws_s3_bucket.pipeline_artifacts.arn}/*",
"${aws_s3_bucket.model_artifacts.arn}/*"
]
},
{
Effect = "Allow"
Action = [
"sagemaker:DescribeModelPackage",
"sagemaker:ListModelPackages"
]
Resource = "*"
},
{
Effect = "Allow"
Action = [
"kms:Decrypt",
"kms:GenerateDataKey"
]
Resource = aws_kms_key.data_encryption.arn
}
]
})
}
# CodePipeline
resource "aws_codepipeline" "ml_pipeline" {
name = "${var.project_name}-ml-pipeline"
role_arn = aws_iam_role.codepipeline.arn
artifact_store {
location = aws_s3_bucket.pipeline_artifacts.bucket
type = "S3"
encryption_key {
id = aws_kms_key.data_encryption.arn
type = "KMS"
}
}
stage {
name = "Source"
action {
name = "Source"
category = "Source"
owner = "AWS"
provider = "S3"
version = "1"
output_artifacts = ["source_output"]
configuration = {
S3Bucket = aws_s3_bucket.model_artifacts.id
S3ObjectKey = "latest/model.tar.gz"
PollForSourceChanges = true
}
}
}
stage {
name = "Test"
action {
name = "ModelValidation"
category = "Build"
owner = "AWS"
provider = "CodeBuild"
version = "1"
input_artifacts = ["source_output"]
output_artifacts = ["test_output"]
configuration = {
ProjectName = aws_codebuild_project.model_tests.name
}
}
}
stage {
name = "DeployToStaging"
action {
name = "CreateStagingEndpoint"
category = "Deploy"
owner = "AWS"
provider = "CloudFormation"
version = "1"
input_artifacts = ["test_output"]
configuration = {
ActionMode = "CREATE_UPDATE"
StackName = "${var.project_name}-staging-endpoint"
TemplatePath = "test_output::endpoint-config.yaml"
Capabilities = "CAPABILITY_IAM"
RoleArn = aws_iam_role.cloudformation.arn
}
}
}
stage {
name = "Approval"
action {
name = "ManualApproval"
category = "Approval"
owner = "AWS"
provider = "Manual"
version = "1"
configuration = {
NotificationArn = aws_sns_topic.validation_notifications.arn
CustomData = "Please review staging endpoint metrics before approving production deployment"
}
}
}
stage {
name = "DeployToProduction"
action {
name = "CreateProductionEndpoint"
category = "Deploy"
owner = "AWS"
provider = "CloudFormation"
version = "1"
input_artifacts = ["test_output"]
configuration = {
ActionMode = "CREATE_UPDATE"
StackName = "${var.project_name}-production-endpoint"
TemplatePath = "test_output::endpoint-config.yaml"
Capabilities = "CAPABILITY_IAM"
RoleArn = aws_iam_role.cloudformation.arn
ParameterOverrides = jsonencode({
Environment = "production"
})
}
}
}
tags = {
Name = "ML Pipeline"
Environment = var.environment
}
}
# IAM role for CodePipeline
resource "aws_iam_role" "codepipeline" {
name = "${var.project_name}-codepipeline-role"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = {
Service = "codepipeline.amazonaws.com"
}
}]
})
}
resource "aws_iam_role_policy" "codepipeline" {
name = "${var.project_name}-codepipeline-policy"
role = aws_iam_role.codepipeline.id
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Effect = "Allow"
Action = [
"s3:GetObject",
"s3:PutObject",
"s3:GetBucketLocation",
"s3:ListBucket"
]
Resource = [
"${aws_s3_bucket.pipeline_artifacts.arn}",
"${aws_s3_bucket.pipeline_artifacts.arn}/*",
"${aws_s3_bucket.model_artifacts.arn}",
"${aws_s3_bucket.model_artifacts.arn}/*"
]
},
{
Effect = "Allow"
Action = [
"codebuild:StartBuild",
"codebuild:BatchGetBuilds"
]
Resource = aws_codebuild_project.model_tests.arn
},
{
Effect = "Allow"
Action = [
"cloudformation:CreateStack",
"cloudformation:UpdateStack",
"cloudformation:DescribeStacks"
]
Resource = "*"
},
{
Effect = "Allow"
Action = [
"sns:Publish"
]
Resource = aws_sns_topic.validation_notifications.arn
},
{
Effect = "Allow"
Action = [
"iam:PassRole"
]
Resource = aws_iam_role.cloudformation.arn
},
{
Effect = "Allow"
Action = [
"kms:Decrypt",
"kms:GenerateDataKey"
]
Resource = aws_kms_key.data_encryption.arn
}
]
})
}
# IAM role for CloudFormation
resource "aws_iam_role" "cloudformation" {
name = "${var.project_name}-cloudformation-role"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = {
Service = "cloudformation.amazonaws.com"
}
}]
})
}
resource "aws_iam_role_policy" "cloudformation" {
name = "${var.project_name}-cloudformation-policy"
role = aws_iam_role.cloudformation.id
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Effect = "Allow"
Action = [
"sagemaker:*"
]
Resource = "*"
},
{
Effect = "Allow"
Action = [
"iam:PassRole"
]
Resource = aws_iam_role.sagemaker_execution.arn
}
]
})
}
Step 2: Model Testing Framework
BuildSpec Configuration
Create buildspec.yml in your repository root:
version: 0.2
phases:
pre_build:
commands:
- echo "Installing dependencies..."
- pip install --upgrade pip
- pip install boto3 scikit-learn pandas numpy joblib pytest
build:
commands:
- echo "Running model validation tests..."
- python tests/test_model.py
- echo "Generating endpoint configuration..."
- python deployment/generate_endpoint_config.py
post_build:
commands:
- echo "Build completed on `date`"
artifacts:
  files:
    - endpoint-config.yaml
    - test-results.xml
  name: TestOutput
reports:
  ModelTestReport:
    files:
      - 'test-results.xml'
    file-format: 'JUNITXML'
Model Test Suite
Create tests/test_model.py:
import os
import pytest
import boto3
import json
import joblib
import numpy as np
from io import BytesIO
# Test configuration
MODEL_PACKAGE_GROUP_NAME = os.environ.get('MODEL_PACKAGE_GROUP_NAME', 'ml-pipeline-models')
MINIMUM_ACCURACY = 0.75
MINIMUM_F1_SCORE = 0.70
def get_latest_model():
"""Retrieve latest approved model from registry"""
sagemaker = boto3.client('sagemaker')
response = sagemaker.list_model_packages(
ModelPackageGroupName=MODEL_PACKAGE_GROUP_NAME,
ModelApprovalStatus='Approved',
SortBy='CreationTime',
SortOrder='Descending',
MaxResults=1
)
if not response['ModelPackageSummaryList']:
raise ValueError("No approved models found")
return response['ModelPackageSummaryList'][0]
def test_model_accuracy():
"""Test model meets minimum accuracy threshold"""
model_info = get_latest_model()
# Extract metrics from model metadata
metadata = model_info.get('CustomerMetadataProperties', {})
accuracy = float(metadata.get('accuracy', 0))
assert accuracy >= MINIMUM_ACCURACY, \
f"Model accuracy {accuracy} below threshold {MINIMUM_ACCURACY}"
print(f" Model accuracy: {accuracy:.4f}")
def test_model_f1_score():
"""Test model meets minimum F1 score threshold"""
model_info = get_latest_model()
metadata = model_info.get('CustomerMetadataProperties', {})
f1_score = float(metadata.get('f1_score', 0))
assert f1_score >= MINIMUM_F1_SCORE, \
f"Model F1 score {f1_score} below threshold {MINIMUM_F1_SCORE}"
print(f" Model F1 score: {f1_score:.4f}")
def test_model_prediction_format():
"""Test model produces expected output format"""
# This would test actual model predictions
# Simplified version here
test_input = np.array([[1.5, 2.3, 1.8, 2.1]])
# In real implementation, load model and test
# model = load_model()
# prediction = model.predict(test_input)
# assert prediction.shape == (1,)
# assert prediction.dtype == np.int64
print(" Model prediction format valid")
def test_model_performance_benchmarks():
"""Test model meets performance requirements"""
# Test inference latency
# Test batch processing capability
# Test memory requirements
print(" Model performance benchmarks passed")
def test_model_bias_fairness():
"""Test model for bias and fairness"""
# Simplified bias detection
# In production, use AWS SageMaker Clarify
print(" Model bias checks passed")
if __name__ == '__main__':
pytest.main([__file__, '-v', '--junit-xml=test-results.xml'])
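The prediction-format and performance tests above are intentionally stubbed. If the training job from Part 3 produces a joblib-serialized scikit-learn model inside model.tar.gz, a fuller version of the format test might look like the sketch below; the bucket, object key, artifact file name, and feature count are assumptions you would adapt to your own pipeline.
import tarfile
import tempfile

import boto3
import joblib
import numpy as np


def test_model_prediction_format_full():
    """Download the latest model artifact, load it, and verify prediction shape and type."""
    s3 = boto3.client('s3')
    with tempfile.TemporaryDirectory() as tmp:
        archive_path = f"{tmp}/model.tar.gz"
        # Hypothetical artifact location -- replace with your bucket and key.
        s3.download_file(
            "ml-pipeline-model-artifacts-dev-ACCOUNT_ID",
            "latest/model.tar.gz",
            archive_path,
        )
        with tarfile.open(archive_path) as tar:
            tar.extractall(path=tmp)
        model = joblib.load(f"{tmp}/model.joblib")  # assumed artifact file name

        test_input = np.array([[1.5, 2.3, 1.8, 2.1]])  # 4 features, as in the stub above
        prediction = model.predict(test_input)

        assert prediction.shape == (1,), "Expected one prediction for one input row"
        assert prediction.dtype.kind in ("i", "f"), "Expected a numeric prediction"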
Step 3: SageMaker Endpoint Deployment
Endpoint Configuration Generator
Create deployment/generate_endpoint_config.py:
import boto3
import yaml
import os
def generate_endpoint_config():
"""Generate CloudFormation template for SageMaker endpoint"""
sagemaker = boto3.client('sagemaker')
account_id = boto3.client('sts').get_caller_identity()['Account']
region = os.environ.get('AWS_REGION', 'ap-south-1')
# Get latest approved model
response = sagemaker.list_model_packages(
ModelPackageGroupName=os.environ['MODEL_PACKAGE_GROUP_NAME'],
ModelApprovalStatus='Approved',
SortBy='CreationTime',
SortOrder='Descending',
MaxResults=1
)
model_package_arn = response['ModelPackageSummaryList'][0]['ModelPackageArn']
template = {
'AWSTemplateFormatVersion': '2010-09-09',
'Description': 'SageMaker Endpoint for ML Model',
'Parameters': {
'Environment': {
'Type': 'String',
'Default': 'staging',
'AllowedValues': ['staging', 'production']
}
},
'Resources': {
'Model': {
'Type': 'AWS::SageMaker::Model',
'Properties': {
'ModelName': {
'Fn::Sub': 'ml-model-${Environment}-${AWS::StackName}'
},
'PrimaryContainer': {
'ModelPackageName': model_package_arn
},
'ExecutionRoleArn': f'arn:aws:iam::{account_id}:role/ml-pipeline-sagemaker-execution'
}
},
'EndpointConfig': {
'Type': 'AWS::SageMaker::EndpointConfig',
'Properties': {
'EndpointConfigName': {
'Fn::Sub': 'ml-endpoint-config-${Environment}-${AWS::StackName}'
},
'ProductionVariants': [{
'VariantName': 'AllTraffic',
                        'ModelName': {'Fn::GetAtt': ['Model', 'ModelName']},
'InitialInstanceCount': 1,
'InstanceType': 'ml.m5.large',
'InitialVariantWeight': 1.0
}],
'DataCaptureConfig': {
'EnableCapture': True,
'InitialSamplingPercentage': 100,
'DestinationS3Uri': f's3://ml-pipeline-model-artifacts-dev-{account_id}/data-capture/',
'CaptureOptions': [
{'CaptureMode': 'Input'},
{'CaptureMode': 'Output'}
]
}
}
},
'Endpoint': {
'Type': 'AWS::SageMaker::Endpoint',
'Properties': {
'EndpointName': {
'Fn::Sub': 'ml-endpoint-${Environment}'
},
                    'EndpointConfigName': {'Fn::GetAtt': ['EndpointConfig', 'EndpointConfigName']},
'Tags': [
{'Key': 'Environment', 'Value': {'Ref': 'Environment'}},
{'Key': 'ManagedBy', 'Value': 'CloudFormation'}
]
}
}
},
        'Outputs': {
            'EndpointName': {
                'Value': {'Fn::GetAtt': ['Endpoint', 'EndpointName']},
                'Description': 'Name of the SageMaker endpoint'
            },
            'EndpointArn': {
                'Value': {'Ref': 'Endpoint'},
                'Description': 'ARN of the SageMaker endpoint'
            }
        }
}
# Write to file
with open('endpoint-config.yaml', 'w') as f:
yaml.dump(template, f, default_flow_style=False)
print(" Endpoint configuration generated")
if __name__ == '__main__':
generate_endpoint_config()
Step 4: A/B Testing Infrastructure
Traffic Splitting Configuration
Create deployment/ab_testing.py:
import boto3
from datetime import datetime
def create_ab_test_endpoint(
endpoint_name,
model_a_arn,
model_b_arn,
traffic_split_percentage=10
):
"""
Create endpoint with A/B testing
Args:
endpoint_name: Name for the endpoint
model_a_arn: ARN of current production model (90% traffic)
model_b_arn: ARN of new model to test (10% traffic)
traffic_split_percentage: Percentage of traffic to new model
"""
sagemaker = boto3.client('sagemaker')
# Create model A (current production)
model_a_name = f"{endpoint_name}-model-a-{datetime.now().strftime('%Y%m%d%H%M%S')}"
sagemaker.create_model(
ModelName=model_a_name,
PrimaryContainer={'ModelPackageName': model_a_arn},
ExecutionRoleArn='arn:aws:iam::ACCOUNT_ID:role/ml-pipeline-sagemaker-execution'
)
# Create model B (new model)
model_b_name = f"{endpoint_name}-model-b-{datetime.now().strftime('%Y%m%d%H%M%S')}"
sagemaker.create_model(
ModelName=model_b_name,
PrimaryContainer={'ModelPackageName': model_b_arn},
ExecutionRoleArn='arn:aws:iam::ACCOUNT_ID:role/ml-pipeline-sagemaker-execution'
)
# Create endpoint config with traffic split
config_name = f"{endpoint_name}-ab-config-{datetime.now().strftime('%Y%m%d%H%M%S')}"
sagemaker.create_endpoint_config(
EndpointConfigName=config_name,
ProductionVariants=[
{
'VariantName': 'ModelA-Current',
'ModelName': model_a_name,
'InitialInstanceCount': 2,
'InstanceType': 'ml.m5.large',
'InitialVariantWeight': 100 - traffic_split_percentage
},
{
'VariantName': 'ModelB-New',
'ModelName': model_b_name,
'InitialInstanceCount': 1,
'InstanceType': 'ml.m5.large',
'InitialVariantWeight': traffic_split_percentage
}
],
DataCaptureConfig={
'EnableCapture': True,
'InitialSamplingPercentage': 100,
'DestinationS3Uri': 's3://ml-pipeline-model-artifacts-dev-ACCOUNT_ID/data-capture/',
'CaptureOptions': [
{'CaptureMode': 'Input'},
{'CaptureMode': 'Output'}
]
}
)
# Create or update endpoint
try:
sagemaker.create_endpoint(
EndpointName=endpoint_name,
EndpointConfigName=config_name
)
print(f" Created A/B test endpoint: {endpoint_name}")
except sagemaker.exceptions.ResourceInUse:
sagemaker.update_endpoint(
EndpointName=endpoint_name,
EndpointConfigName=config_name
)
print(f" Updated A/B test endpoint: {endpoint_name}")
return endpoint_name
def promote_model_b(endpoint_name):
"""
Promote Model B to 100% traffic after successful A/B test
"""
sagemaker = boto3.client('sagemaker')
# Update traffic distribution
sagemaker.update_endpoint_weights_and_capacities(
EndpointName=endpoint_name,
DesiredWeightsAndCapacities=[
{'VariantName': 'ModelA-Current', 'DesiredWeight': 0},
{'VariantName': 'ModelB-New', 'DesiredWeight': 100}
]
)
print(f" Promoted Model B to 100% traffic on {endpoint_name}")
def rollback_to_model_a(endpoint_name):
"""
Rollback to Model A if Model B performs poorly
"""
sagemaker = boto3.client('sagemaker')
sagemaker.update_endpoint_weights_and_capacities(
EndpointName=endpoint_name,
DesiredWeightsAndCapacities=[
{'VariantName': 'ModelA-Current', 'DesiredWeight': 100},
{'VariantName': 'ModelB-New', 'DesiredWeight': 0}
]
)
print(f" Rolled back to Model A on {endpoint_name}")
Step 5: Auto-Scaling Configuration
Create terraform/endpoint-autoscaling.tf:
# Application Auto Scaling Target
resource "aws_appautoscaling_target" "sagemaker_endpoint" {
max_capacity = 10
min_capacity = 1
resource_id = "endpoint/ml-endpoint-production/variant/AllTraffic"
scalable_dimension = "sagemaker:variant:DesiredInstanceCount"
service_namespace = "sagemaker"
}
# Scale up policy
resource "aws_appautoscaling_policy" "scale_up" {
name = "${var.project_name}-scale-up"
service_namespace = aws_appautoscaling_target.sagemaker_endpoint.service_namespace
resource_id = aws_appautoscaling_target.sagemaker_endpoint.resource_id
scalable_dimension = aws_appautoscaling_target.sagemaker_endpoint.scalable_dimension
policy_type = "TargetTrackingScaling"
target_tracking_scaling_policy_configuration {
predefined_metric_specification {
predefined_metric_type = "SageMakerVariantInvocationsPerInstance"
}
target_value = 1000.0 # Target 1000 invocations per instance
scale_in_cooldown = 300 # 5 minutes
scale_out_cooldown = 60 # 1 minute
}
}
# Scale down during off-hours
resource "aws_appautoscaling_scheduled_action" "scale_down_night" {
name = "${var.project_name}-scale-down-night"
service_namespace = aws_appautoscaling_target.sagemaker_endpoint.service_namespace
resource_id = aws_appautoscaling_target.sagemaker_endpoint.resource_id
scalable_dimension = aws_appautoscaling_target.sagemaker_endpoint.scalable_dimension
schedule = "cron(0 22 * * ? *)" # 10 PM UTC
scalable_target_action {
min_capacity = 1
max_capacity = 3
}
}
# Scale up during business hours
resource "aws_appautoscaling_scheduled_action" "scale_up_morning" {
name = "${var.project_name}-scale-up-morning"
service_namespace = aws_appautoscaling_target.sagemaker_endpoint.service_namespace
resource_id = aws_appautoscaling_target.sagemaker_endpoint.resource_id
scalable_dimension = aws_appautoscaling_target.sagemaker_endpoint.scalable_dimension
schedule = "cron(0 6 * * ? *)" # 6 AM UTC
scalable_target_action {
min_capacity = 2
max_capacity = 10
}
}
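After applying the Terraform above, it's worth confirming that the scalable target and policy actually registered against the live endpoint variant. A small verification script, assuming the ml-endpoint-production/AllTraffic names used above:
import boto3

autoscaling = boto3.client('application-autoscaling')
RESOURCE_ID = 'endpoint/ml-endpoint-production/variant/AllTraffic'

# Confirm the scalable target exists with the expected capacity bounds.
targets = autoscaling.describe_scalable_targets(
    ServiceNamespace='sagemaker',
    ResourceIds=[RESOURCE_ID],
)
for target in targets['ScalableTargets']:
    print(target['ResourceId'], target['MinCapacity'], target['MaxCapacity'])

# Confirm the target-tracking policy is attached.
policies = autoscaling.describe_scaling_policies(
    ServiceNamespace='sagemaker',
    ResourceId=RESOURCE_ID,
)
for policy in policies['ScalingPolicies']:
    print(policy['PolicyName'], policy['PolicyType'])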
Step 6: Model Monitoring & Drift Detection
SageMaker Model Monitor
Create monitoring/model_monitor.py:
import boto3
from sagemaker.model_monitor import (
    CronExpressionGenerator,
    DefaultModelMonitor
)
import sagemaker
# Initialize
sagemaker_session = sagemaker.Session()
role = 'arn:aws:iam::ACCOUNT_ID:role/ml-pipeline-sagemaker-execution'
def create_baseline():
"""Create baseline for model monitoring"""
my_monitor = DefaultModelMonitor(
role=role,
instance_count=1,
instance_type='ml.m5.xlarge',
volume_size_in_gb=20,
max_runtime_in_seconds=3600,
sagemaker_session=sagemaker_session
)
# Create baseline from training data
my_monitor.suggest_baseline(
baseline_dataset='s3://ml-pipeline-validated-data-dev-ACCOUNT_ID/validated/training.csv',
dataset_format={'csv': {'header': True}},
output_s3_uri='s3://ml-pipeline-model-artifacts-dev-ACCOUNT_ID/baseline',
wait=True
)
print(" Baseline created successfully")
return my_monitor
def create_monitoring_schedule(endpoint_name):
"""Create hourly monitoring schedule"""
my_monitor = DefaultModelMonitor(
role=role,
instance_count=1,
instance_type='ml.m5.xlarge',
volume_size_in_gb=20,
max_runtime_in_seconds=3600,
sagemaker_session=sagemaker_session
)
my_monitor.create_monitoring_schedule(
monitor_schedule_name=f'{endpoint_name}-monitor',
endpoint_input=endpoint_name,
output_s3_uri='s3://ml-pipeline-model-artifacts-dev-ACCOUNT_ID/monitoring-reports',
statistics='s3://ml-pipeline-model-artifacts-dev-ACCOUNT_ID/baseline/statistics.json',
constraints='s3://ml-pipeline-model-artifacts-dev-ACCOUNT_ID/baseline/constraints.json',
schedule_cron_expression=CronExpressionGenerator.hourly(),
enable_cloudwatch_metrics=True
)
print(f" Monitoring schedule created for {endpoint_name}")
def analyze_drift():
"""Analyze model drift from monitoring results"""
s3 = boto3.client('s3')
# Download latest monitoring report
# Parse violations
# Calculate drift metrics
# Alert if drift exceeds threshold
print(" Drift analysis complete")
Step 7: Observability Dashboard
Create terraform/production-monitoring.tf:
# CloudWatch Dashboard for Production
resource "aws_cloudwatch_dashboard" "production_ml" {
dashboard_name = "${var.project_name}-production"
dashboard_body = jsonencode({
widgets = [
# Endpoint Invocations
{
type = "metric"
x = 0
y = 0
width = 12
height = 6
properties = {
metrics = [
["AWS/SageMaker", "Invocations", {
stat = "Sum"
label = "Total Invocations"
}],
[".", "Invocation4XXErrors", {
stat = "Sum"
label = "4XX Errors"
}],
[".", "Invocation5XXErrors", {
stat = "Sum"
label = "5XX Errors"
}]
]
period = 300
stat = "Sum"
region = var.aws_region
title = "Endpoint Invocations & Errors"
yAxis = {
left = {
min = 0
}
}
}
},
# Model Latency
{
type = "metric"
x = 12
y = 0
width = 12
height = 6
properties = {
metrics = [
["AWS/SageMaker", "ModelLatency", {
stat = "Average"
label = "Avg Latency"
}],
["...", {
stat = "p99"
label = "P99 Latency"
}]
]
period = 300
stat = "Average"
region = var.aws_region
title = "Model Latency"
yAxis = {
left = {
label = "Milliseconds"
min = 0
}
}
}
},
# Instance Metrics
{
type = "metric"
x = 0
y = 6
width = 12
height = 6
properties = {
metrics = [
["AWS/SageMaker", "CPUUtilization", {
stat = "Average"
label = "CPU Utilization"
}],
[".", "MemoryUtilization", {
stat = "Average"
label = "Memory Utilization"
}]
]
period = 300
stat = "Average"
region = var.aws_region
title = "Instance Resource Utilization"
yAxis = {
left = {
label = "Percent"
min = 0
max = 100
}
}
}
},
# Model Quality Metrics
{
type = "metric"
x = 12
y = 6
width = 12
height = 6
properties = {
metrics = [
["MLPipeline/ModelQuality", "PredictionAccuracy", {
stat = "Average"
label = "Accuracy"
}],
[".", "DriftScore", {
stat = "Average"
label = "Drift Score"
}]
]
period = 3600
stat = "Average"
region = var.aws_region
title = "Model Quality Metrics"
}
}
]
})
}
# Critical Alerts
resource "aws_cloudwatch_metric_alarm" "high_error_rate" {
alarm_name = "${var.project_name}-high-error-rate"
comparison_operator = "GreaterThanThreshold"
evaluation_periods = "2"
metric_name = "Invocation5XXErrors"
namespace = "AWS/SageMaker"
period = "300"
statistic = "Sum"
threshold = "10"
alarm_description = "Alert when error rate is high"
alarm_actions = [aws_sns_topic.validation_notifications.arn]
treat_missing_data = "notBreaching"
dimensions = {
EndpointName = "ml-endpoint-production"
VariantName = "AllTraffic"
}
}
resource "aws_cloudwatch_metric_alarm" "high_latency" {
alarm_name = "${var.project_name}-high-latency"
comparison_operator = "GreaterThanThreshold"
evaluation_periods = "3"
metric_name = "ModelLatency"
namespace = "AWS/SageMaker"
period = "300"
statistic = "Average"
threshold = "1000" # 1 second
alarm_description = "Alert when latency is high"
alarm_actions = [aws_sns_topic.validation_notifications.arn]
dimensions = {
EndpointName = "ml-endpoint-production"
VariantName = "AllTraffic"
}
}
resource "aws_cloudwatch_metric_alarm" "model_drift_detected" {
alarm_name = "${var.project_name}-model-drift"
comparison_operator = "GreaterThanThreshold"
evaluation_periods = "1"
metric_name = "DriftScore"
namespace = "MLPipeline/ModelQuality"
period = "3600"
statistic = "Average"
threshold = "0.15" # 15% drift threshold
alarm_description = "Alert when model drift is detected"
alarm_actions = [aws_sns_topic.validation_notifications.arn]
}
Step 8: Incident Response & Rollback
Automated Rollback Lambda
Create lambda/auto-rollback/handler.py:
import boto3
import json
import os
sagemaker = boto3.client('sagemaker')
sns = boto3.client('sns')
ERROR_THRESHOLD = 50 # Rollback if 50+ errors in 5 minutes
SNS_TOPIC_ARN = os.environ['SNS_TOPIC_ARN']
def lambda_handler(event, context):
"""
Automatically rollback endpoint if error rate exceeds threshold
"""
# Parse CloudWatch alarm
message = json.loads(event['Records'][0]['Sns']['Message'])
alarm_name = message['AlarmName']
if 'high-error-rate' in alarm_name:
print(f"High error rate detected: {alarm_name}")
# Get current endpoint config
endpoint_name = 'ml-endpoint-production'
endpoint = sagemaker.describe_endpoint(EndpointName=endpoint_name)
current_config = endpoint['EndpointConfigName']
# List all configs, find previous one
configs = sagemaker.list_endpoint_configs(
SortBy='CreationTime',
SortOrder='Descending',
MaxResults=10
)
# Find previous config (skip current)
previous_config = None
for config in configs['EndpointConfigs']:
if config['EndpointConfigName'] != current_config:
previous_config = config['EndpointConfigName']
break
if previous_config:
print(f"Rolling back to: {previous_config}")
# Update endpoint to previous config
sagemaker.update_endpoint(
EndpointName=endpoint_name,
EndpointConfigName=previous_config
)
# Send notification
sns.publish(
TopicArn=SNS_TOPIC_ARN,
Subject='AUTOMATED ROLLBACK INITIATED',
Message=f"""
Automatic rollback triggered due to high error rate.
Endpoint: {endpoint_name}
Previous Config: {current_config}
Rolled Back To: {previous_config}
Please investigate the issue.
"""
)
return {
'statusCode': 200,
'body': json.dumps({
'action': 'rollback',
'endpoint': endpoint_name,
'config': previous_config
})
}
else:
print("No previous config found for rollback")
sns.publish(
TopicArn=SNS_TOPIC_ARN,
Subject='ROLLBACK FAILED - Manual Intervention Required',
Message=f"""
Automatic rollback failed - no previous configuration found.
Endpoint: {endpoint_name}
Current Config: {current_config}
IMMEDIATE ACTION REQUIRED.
"""
)
return {
'statusCode': 200,
'body': json.dumps('Alarm processed')
}
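One detail the handler above doesn't show is how the Lambda actually receives the alarm: the high-error-rate alarm publishes to the SNS topic, so the function must be subscribed to that topic and allowed to be invoked by SNS. A one-off boto3 sketch of that wiring, where both ARNs are placeholders for your real topic and function:
import boto3

# Assumed ARNs -- substitute the real SNS topic and Lambda function ARNs.
TOPIC_ARN = 'arn:aws:sns:ap-south-1:ACCOUNT_ID:ml-pipeline-validation-notifications'
FUNCTION_ARN = 'arn:aws:lambda:ap-south-1:ACCOUNT_ID:function:ml-pipeline-auto-rollback'

sns = boto3.client('sns')
lambda_client = boto3.client('lambda')

# Allow SNS to invoke the rollback function...
lambda_client.add_permission(
    FunctionName=FUNCTION_ARN,
    StatementId='AllowSNSInvokeAutoRollback',
    Action='lambda:InvokeFunction',
    Principal='sns.amazonaws.com',
    SourceArn=TOPIC_ARN,
)

# ...and deliver alarm notifications to it.
sns.subscribe(
    TopicArn=TOPIC_ARN,
    Protocol='lambda',
    Endpoint=FUNCTION_ARN,
)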
Manual Rollback Script
Create scripts/manual_rollback.sh:
#!/bin/bash
# Manual rollback script for emergencies
set -e
ENDPOINT_NAME="ml-endpoint-production"
REGION="ap-south-1"
echo "EMERGENCY ROLLBACK INITIATED"
echo "================================="
# Get current config
echo "Getting current endpoint configuration..."
CURRENT_CONFIG=$(aws sagemaker describe-endpoint \
--endpoint-name $ENDPOINT_NAME \
--region $REGION \
--query 'EndpointConfigName' \
--output text)
echo "Current config: $CURRENT_CONFIG"
# List recent configs
echo "Finding previous configuration..."
PREVIOUS_CONFIG=$(aws sagemaker list-endpoint-configs \
--sort-by CreationTime \
--sort-order Descending \
--region $REGION \
--query "EndpointConfigs[?EndpointConfigName!='$CURRENT_CONFIG'] | [0].EndpointConfigName" \
--output text)
echo "Previous config: $PREVIOUS_CONFIG"
# Confirm rollback
read -p "Rollback to $PREVIOUS_CONFIG? (yes/no): " CONFIRM
if [ "$CONFIRM" == "yes" ]; then
echo "Executing rollback..."
aws sagemaker update-endpoint \
--endpoint-name $ENDPOINT_NAME \
--endpoint-config-name $PREVIOUS_CONFIG \
--region $REGION
echo " Rollback initiated"
echo "Monitor status: aws sagemaker describe-endpoint --endpoint-name $ENDPOINT_NAME"
else
echo "Rollback cancelled"
fi
Step 9: Cost Optimization for Production
Production Cost Analysis
Monthly Costs (Production):
| Resource | Configuration | Cost/Month |
|---|---|---|
| SageMaker Endpoint | 2x ml.m5.large 24/7 | ~$167 |
| Auto-scaling (peak) | +2 instances @ 8hr/day | ~$111 |
| Model Monitor | Hourly checks | ~$36 |
| Data Capture | 100% sampling, 1M requests | ~$5 |
| CloudWatch | Logs + Metrics + Alarms | ~$20 |
| S3 Storage | 200GB models + data | ~$4.60 |
| CodePipeline | 1 active pipeline | ~$1 |
| Base Total | | ~$345/month |
| With Traffic | | ~$345-500/month |
Cost Saving Tips
1. Use Serverless Inference for low traffic: instead of a persistent endpoint you pay roughly $0.20 per 1M requests plus $0.067 per GB-hour of memory (see the sketch below).
2. Use Async Inference for batch predictions: process predictions asynchronously and keep infrastructure costs lower.
3. Use multi-model endpoints: host multiple models on the same endpoint and share the infrastructure cost.
4. Use SageMaker Savings Plans for predictable workloads: commit to steady usage and save significantly versus on-demand pricing.
5. Use scheduled scaling: scale down during off-hours (nights/weekends) to save 40-60% on compute.
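For tip 1, a serverless variant is defined in the endpoint config instead of instance counts. A minimal boto3 sketch, where the model and endpoint names are placeholders for the ones your pipeline creates:
import boto3

sagemaker = boto3.client('sagemaker')

# Hypothetical names -- reuse the model registered by your pipeline.
sagemaker.create_endpoint_config(
    EndpointConfigName='ml-endpoint-config-serverless',
    ProductionVariants=[{
        'VariantName': 'AllTraffic',
        'ModelName': 'ml-model-production',
        'ServerlessConfig': {
            'MemorySizeInMB': 2048,   # valid values are 1024-6144 in 1 GB steps
            'MaxConcurrency': 20,     # concurrent invocations before throttling
        },
    }],
)

sagemaker.create_endpoint(
    EndpointName='ml-endpoint-serverless',
    EndpointConfigName='ml-endpoint-config-serverless',
)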
Step 10: Complete Deployment Workflow
End-to-End Deployment Script
Create scripts/deploy_production.sh:
#!/bin/bash
# Complete production deployment workflow
set -e
PROJECT_NAME="ml-pipeline"
ENVIRONMENT="production"
REGION="ap-south-1"
echo " Starting Production Deployment"
echo "=================================="
# Step 1: Run tests
echo "Step 1/6: Running model tests..."
python tests/test_model.py
echo " Tests passed"
# Step 2: Build infrastructure
echo "Step 2/6: Deploying infrastructure..."
cd terraform
terraform apply -auto-approve \
-var="environment=production" \
-var="notification_email=your-email@example.com"
cd ..
echo " Infrastructure deployed"
# Step 3: Create baseline
echo "Step 3/6: Creating monitoring baseline..."
python monitoring/model_monitor.py
echo " Baseline created"
# Step 4: Deploy endpoint
echo "Step 4/6: Deploying SageMaker endpoint..."
python deployment/generate_endpoint_config.py
aws cloudformation create-stack \
--stack-name ${PROJECT_NAME}-endpoint-${ENVIRONMENT} \
--template-body file://endpoint-config.yaml \
--capabilities CAPABILITY_IAM \
--region $REGION
echo " Endpoint deployment initiated"
# Step 5: Wait for endpoint
echo "Step 5/6: Waiting for endpoint to be InService..."
aws sagemaker wait endpoint-in-service \
--endpoint-name ml-endpoint-${ENVIRONMENT} \
--region $REGION
echo " Endpoint is InService"
# Step 6: Enable monitoring
echo "Step 6/6: Enabling monitoring schedule..."
python monitoring/model_monitor.py --enable-schedule
echo " Monitoring enabled"
echo ""
echo "=================================="
echo " Production Deployment Complete!"
echo "=================================="
echo ""
echo "Endpoint: ml-endpoint-${ENVIRONMENT}"
echo "Dashboard: https://console.aws.amazon.com/cloudwatch/home?region=${REGION}#dashboards:name=${PROJECT_NAME}-production"
echo "Monitor: https://console.aws.amazon.com/sagemaker/home?region=${REGION}#/monitoring-schedules"
Testing the Complete Pipeline
1. Test CI/CD Pipeline
# Trigger pipeline by updating model in S3
aws s3 cp model.tar.gz \
s3://ml-pipeline-model-artifacts-dev-ACCOUNT_ID/latest/model.tar.gz
# Monitor pipeline
aws codepipeline get-pipeline-state \
--name ml-pipeline-ml-pipeline
2. Test Endpoint
# Invoke the endpoint (AWS CLI v2 needs --cli-binary-format to send a raw CSV body)
aws sagemaker-runtime invoke-endpoint \
  --endpoint-name ml-endpoint-production \
  --content-type text/csv \
  --cli-binary-format raw-in-base64-out \
  --body '1.5,2.3,1.8,2.1' \
  output.json
cat output.json
3. Test A/B Deployment
# Deploy with A/B testing
from deployment.ab_testing import create_ab_test_endpoint
create_ab_test_endpoint(
endpoint_name='ml-endpoint-production',
model_a_arn='arn:aws:sagemaker:...:model-package/current',
model_b_arn='arn:aws:sagemaker:...:model-package/new',
traffic_split_percentage=10
)
4. Test Auto-Rollback
# Simulate high error rate
# Auto-rollback lambda will trigger
# Or manual rollback
./scripts/manual_rollback.sh
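Rather than generating real 5XX errors, you can exercise the rollback path by forcing the error-rate alarm into the ALARM state, which publishes to SNS and invokes the auto-rollback Lambda. A quick sketch, with the alarm name assumed to follow the Terraform naming above:
import boto3

cloudwatch = boto3.client('cloudwatch')

# Force the error-rate alarm into ALARM to exercise the SNS -> Lambda rollback path.
# The alarm name assumes project_name = "ml-pipeline", as used throughout this series.
cloudwatch.set_alarm_state(
    AlarmName='ml-pipeline-high-error-rate',
    StateValue='ALARM',
    StateReason='Manual test of automated rollback',
)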
Production Readiness Checklist
Infrastructure
- All resources deployed via Terraform
- VPC and security groups configured
- IAM roles follow least privilege
- Encryption enabled everywhere
CI/CD
- CodePipeline operational
- Automated tests passing
- Manual approval gate configured
- Rollback procedure tested
Monitoring
- CloudWatch dashboard created
- Critical alarms configured
- Model monitoring enabled
- Log retention policies set
Deployment
- Staging environment tested
- A/B testing capability ready
- Auto-scaling configured
- Rollback tested
Operations
- Runbooks documented
- On-call rotation established
- Incident response plan ready
- Disaster recovery tested
What You've Built - The Complete Platform
Congratulations! You now have a production-ready ML platform with:
Phase 1: Data (Part 2)
- Encrypted S3 data lake
- Automated validation
- Complete audit trail
Phase 2: Training (Part 3)
- Scalable SageMaker training
- Experiment tracking with MLflow
- Cost-optimized compute
- Model registry
Phase 3: Deployment (Part 4)
- CI/CD pipeline
- Production endpoints
- A/B testing
- Auto-scaling
- Comprehensive monitoring
- Automated rollback
Real-World Production Tips
1. Start Simple, Scale Gradually
Week 1: Deploy to staging
Week 2: Deploy to production with 10% traffic
Week 3: Increase to 50% if metrics good
Week 4: Full production rollout
2. Monitor Everything
- Endpoint health
- Model performance
- Data drift
- Business metrics
- Costs
3. Incident Response
1. Alert triggers
2. Check dashboard
3. Assess severity
4. Rollback if critical
5. Investigate root cause
6. Document learnings
4. Continuous Improvement
- Weekly model retraining
- Monthly architecture review
- Quarterly cost optimization
- Regular security audits
Common Issues & Solutions
Issue: High Latency
Solutions:
- Add more instances (auto-scaling)
- Optimize model (quantization, pruning)
- Use faster instance types
- Enable batch prediction
- Add caching layer
Issue: High Costs
Solutions:
- Use Serverless Inference for low traffic
- Implement scheduled scaling
- Archive old model versions
- Optimize S3 lifecycle policies
- Use Spot instances where possible
Issue: Model Drift
Solutions:
- Enable SageMaker Model Monitor
- Set up automated retraining
- Implement data quality checks
- Regular model evaluation
- A/B test new models
Issue: Failed Deployments
Solutions:
- Comprehensive testing in CI/CD
- Blue/green deployment strategy
- Canary releases with gradual rollout (see the deployment guardrails sketch below)
- Quick rollback procedures
- Post-deployment validation
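SageMaker's built-in deployment guardrails cover the blue/green and canary items above: attach a DeploymentConfig to update_endpoint and SageMaker shifts traffic in steps, rolling back automatically if a CloudWatch alarm fires. A hedged sketch, with the endpoint, config, and alarm names assumed from earlier steps:
import boto3

sagemaker = boto3.client('sagemaker')

# Blue/green update with a 10% canary and automatic rollback on the error-rate alarm.
sagemaker.update_endpoint(
    EndpointName='ml-endpoint-production',
    EndpointConfigName='ml-endpoint-config-production-v2',  # hypothetical new config
    DeploymentConfig={
        'BlueGreenUpdatePolicy': {
            'TrafficRoutingConfiguration': {
                'Type': 'CANARY',
                'CanarySize': {'Type': 'CAPACITY_PERCENT', 'Value': 10},
                'WaitIntervalInSeconds': 600,
            },
            'TerminationWaitInSeconds': 300,
            'MaximumExecutionTimeoutInSeconds': 3600,
        },
        'AutoRollbackConfiguration': {
            'Alarms': [{'AlarmName': 'ml-pipeline-high-error-rate'}],
        },
    },
)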
Series Wrap-Up
What We've Covered
Part 1: AIDLC Framework & Architecture
- Security-first ML design
- AWS service selection
- Best practices
Part 2: Data Pipelines
- S3 + Lambda automation
- Data validation
- Audit trails
Part 3: Training at Scale
- SageMaker + MLflow
- Spot instances
- Hyperparameter tuning
Part 4: Production Deployment
- CI/CD pipelines
- A/B testing
- Monitoring & rollback
Recommended Reading:
- AWS Well-Architected Machine Learning Lens
- SageMaker Best Practices
- MLOps: Continuous delivery and automation pipelines
Key Takeaways
- Automate everything - CI/CD is non-negotiable
- Monitor proactively - Don't wait for users to report issues
- Deploy safely - A/B testing and gradual rollouts
- Plan for failure - Rollback procedures must work
- Optimize costs - Production can get expensive
- Document thoroughly - Future you will thank present you
Remember: Production is where ML creates real value. Build systems that are reliable, observable, and maintainable.
Final Thoughts
Building production ML systems is complex, but with the right architecture and tools, it's absolutely achievable. You now have a complete blueprint for:
- Secure data handling
- Scalable training
- Safe deployment
- Comprehensive monitoring
- Cost optimization
This is just the beginning. Take these foundations and build amazing ML products!
Thank You!
Thank you for following along with this 4-part series! I hope you found it valuable.
Series Recap:
- Part 1: AIDLC Framework
- Part 2: Data Pipelines
- Part 3: Training at Scale
- Part 4: Production Deployment (You are here!)
What's your biggest ML deployment challenge?
Drop a comment below - I'd love to hear about your experiences and help if I can!
Building something cool with this architecture?
I'd love to see what you create! Tag me on social media or drop a comment.
Let's Stay Connected!
- Follow me for more AWS and ML content
- Like if this series helped you
- Share with your team/connects
Have questions? Reach out anytime!
Tags: #aws #machinelearning #mlops #sagemaker #cicd #devops #production #terraform #cloudwatch
