Shoaibali Mir

Production ML on AWS: CI-CD, Deployment and Monitoring at Scale

Reading time: ~15-20 minutes

Level: Intermediate to Advanced

Prerequisites: Parts 1-3, Understanding of CI/CD concepts

Series: Part 4 of 4 (Series Finale) - Part 1 | Part 2 | Part 3


Welcome to the Series Finale!

We've come a long way! In Part 1, we covered the AIDLC framework. In Part 2, we built secure data pipelines. In Part 3, we trained models at scale with SageMaker. Now it's time to bring it all together and deploy to production.

What you'll build today:

  • CI/CD pipeline for ML models with CodePipeline
  • Production SageMaker endpoints with auto-scaling
  • A/B testing infrastructure for safe rollouts
  • Model drift detection and monitoring
  • Complete observability stack
  • Incident response and rollback procedures

By the end: You'll have a complete, production-ready ML platform on AWS that can safely deploy, monitor, and maintain models at scale.


The ML Deployment Problem

Training a great model is only half the battle. Production deployment brings new challenges:

  • Manual deployments - error-prone, don't scale
  • No testing - bad models reach production
  • Deployment downtime - service interruptions
  • No rollback plan - can't undo bad deployments
  • Performance degradation - models drift over time
  • No observability - can't debug production issues

The solution:

A complete CI/CD pipeline with automated testing, safe deployments, and comprehensive monitoring.


Architecture Overview

Here's the complete production ML platform:

Production ML Architecture

Production ML Pipeline - 4 Phases:

Phase 1: Build

  • Code Push -> CodePipeline -> CodeBuild Testing -> Model Registry
  • Automated quality gates and model validation

Phase 2: Deploy

  • Staging deployment for testing
  • Manual approval gate with metrics review
  • Production deployment with A/B testing (90/10 traffic split)

Phase 3: Monitor

  • CloudWatch metrics and dashboards
  • Model Monitor for drift detection
  • SNS alerts for anomalies

Phase 4: Respond

  • Auto-Rollback on critical errors
  • Manual rollback procedures
  • Incident response workflows

Step 1: CI/CD Pipeline Setup

Pipeline Architecture

Code Push -> Build -> Test -> Deploy to Staging -> Approval -> Deploy to Prod -> Monitor

CodePipeline Configuration

Create terraform/cicd.tf:

# S3 bucket for pipeline artifacts
resource "aws_s3_bucket" "pipeline_artifacts" {
  bucket = "${var.project_name}-pipeline-artifacts-${data.aws_caller_identity.current.account_id}"

  tags = {
    Name        = "Pipeline Artifacts"
    Environment = var.environment
  }
}

resource "aws_s3_bucket_versioning" "pipeline_artifacts" {
  bucket = aws_s3_bucket.pipeline_artifacts.id

  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket_server_side_encryption_configuration" "pipeline_artifacts" {
  bucket = aws_s3_bucket.pipeline_artifacts.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm     = "aws:kms"
      kms_master_key_id = aws_kms_key.data_encryption.arn
    }
  }
}

# CodeBuild project for model testing
resource "aws_codebuild_project" "model_tests" {
  name          = "${var.project_name}-model-tests"
  service_role  = aws_iam_role.codebuild.arn
  build_timeout = 30

  artifacts {
    type = "CODEPIPELINE"
  }

  environment {
    compute_type                = "BUILD_GENERAL1_MEDIUM"
    image                      = "aws/codebuild/standard:7.0"
    type                       = "LINUX_CONTAINER"
    privileged_mode            = true
    image_pull_credentials_type = "CODEBUILD"

    environment_variable {
      name  = "MODEL_PACKAGE_GROUP_NAME"
      value = aws_sagemaker_model_package_group.ml_models.model_package_group_name
    }

    environment_variable {
      name  = "AWS_ACCOUNT_ID"
      value = data.aws_caller_identity.current.account_id
    }

    environment_variable {
      name  = "AWS_REGION"
      value = var.aws_region
    }
  }

  source {
    type      = "CODEPIPELINE"
    buildspec = "buildspec.yml"
  }

  tags = {
    Name        = "Model Testing"
    Environment = var.environment
  }
}

# IAM role for CodeBuild
resource "aws_iam_role" "codebuild" {
  name = "${var.project_name}-codebuild-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        Service = "codebuild.amazonaws.com"
      }
    }]
  })
}

resource "aws_iam_role_policy" "codebuild" {
  name = "${var.project_name}-codebuild-policy"
  role = aws_iam_role.codebuild.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = [
          "logs:CreateLogGroup",
          "logs:CreateLogStream",
          "logs:PutLogEvents"
        ]
        Resource = "arn:aws:logs:*:*:*"
      },
      {
        Effect = "Allow"
        Action = [
          "s3:GetObject",
          "s3:PutObject"
        ]
        Resource = [
          "${aws_s3_bucket.pipeline_artifacts.arn}/*",
          "${aws_s3_bucket.model_artifacts.arn}/*"
        ]
      },
      {
        Effect = "Allow"
        Action = [
          "sagemaker:DescribeModelPackage",
          "sagemaker:ListModelPackages"
        ]
        Resource = "*"
      },
      {
        Effect = "Allow"
        Action = [
          "kms:Decrypt",
          "kms:GenerateDataKey"
        ]
        Resource = aws_kms_key.data_encryption.arn
      }
    ]
  })
}

# CodePipeline
resource "aws_codepipeline" "ml_pipeline" {
  name     = "${var.project_name}-ml-pipeline"
  role_arn = aws_iam_role.codepipeline.arn

  artifact_store {
    location = aws_s3_bucket.pipeline_artifacts.bucket
    type     = "S3"

    encryption_key {
      id   = aws_kms_key.data_encryption.arn
      type = "KMS"
    }
  }

  stage {
    name = "Source"

    action {
      name             = "Source"
      category         = "Source"
      owner            = "AWS"
      provider         = "S3"
      version          = "1"
      output_artifacts = ["source_output"]

      configuration = {
        S3Bucket             = aws_s3_bucket.model_artifacts.id
        S3ObjectKey          = "latest/model.tar.gz"
        PollForSourceChanges = true
      }
    }
  }

  stage {
    name = "Test"

    action {
      name             = "ModelValidation"
      category         = "Build"
      owner            = "AWS"
      provider         = "CodeBuild"
      version          = "1"
      input_artifacts  = ["source_output"]
      output_artifacts = ["test_output"]

      configuration = {
        ProjectName = aws_codebuild_project.model_tests.name
      }
    }
  }

  stage {
    name = "DeployToStaging"

    action {
      name            = "CreateStagingEndpoint"
      category        = "Deploy"
      owner           = "AWS"
      provider        = "CloudFormation"
      version         = "1"
      input_artifacts = ["test_output"]

      configuration = {
        ActionMode    = "CREATE_UPDATE"
        StackName     = "${var.project_name}-staging-endpoint"
        TemplatePath  = "test_output::endpoint-config.yaml"
        Capabilities  = "CAPABILITY_IAM"
        RoleArn       = aws_iam_role.cloudformation.arn
      }
    }
  }

  stage {
    name = "Approval"

    action {
      name     = "ManualApproval"
      category = "Approval"
      owner    = "AWS"
      provider = "Manual"
      version  = "1"

      configuration = {
        NotificationArn = aws_sns_topic.validation_notifications.arn
        CustomData      = "Please review staging endpoint metrics before approving production deployment"
      }
    }
  }

  stage {
    name = "DeployToProduction"

    action {
      name            = "CreateProductionEndpoint"
      category        = "Deploy"
      owner           = "AWS"
      provider        = "CloudFormation"
      version         = "1"
      input_artifacts = ["test_output"]

      configuration = {
        ActionMode    = "CREATE_UPDATE"
        StackName     = "${var.project_name}-production-endpoint"
        TemplatePath  = "test_output::endpoint-config.yaml"
        Capabilities  = "CAPABILITY_IAM"
        RoleArn       = aws_iam_role.cloudformation.arn
        ParameterOverrides = jsonencode({
          Environment = "production"
        })
      }
    }
  }

  tags = {
    Name        = "ML Pipeline"
    Environment = var.environment
  }
}

# IAM role for CodePipeline
resource "aws_iam_role" "codepipeline" {
  name = "${var.project_name}-codepipeline-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        Service = "codepipeline.amazonaws.com"
      }
    }]
  })
}

resource "aws_iam_role_policy" "codepipeline" {
  name = "${var.project_name}-codepipeline-policy"
  role = aws_iam_role.codepipeline.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = [
          "s3:GetObject",
          "s3:PutObject",
          "s3:GetBucketLocation",
          "s3:ListBucket"
        ]
        Resource = [
          "${aws_s3_bucket.pipeline_artifacts.arn}",
          "${aws_s3_bucket.pipeline_artifacts.arn}/*",
          "${aws_s3_bucket.model_artifacts.arn}",
          "${aws_s3_bucket.model_artifacts.arn}/*"
        ]
      },
      {
        Effect = "Allow"
        Action = [
          "codebuild:StartBuild",
          "codebuild:BatchGetBuilds"
        ]
        Resource = aws_codebuild_project.model_tests.arn
      },
      {
        Effect = "Allow"
        Action = [
          "cloudformation:CreateStack",
          "cloudformation:UpdateStack",
          "cloudformation:DescribeStacks"
        ]
        Resource = "*"
      },
      {
        Effect = "Allow"
        Action = [
          "sns:Publish"
        ]
        Resource = aws_sns_topic.validation_notifications.arn
      },
      {
        Effect = "Allow"
        Action = [
          "iam:PassRole"
        ]
        Resource = aws_iam_role.cloudformation.arn
      },
      {
        Effect = "Allow"
        Action = [
          "kms:Decrypt",
          "kms:GenerateDataKey"
        ]
        Resource = aws_kms_key.data_encryption.arn
      }
    ]
  })
}

# IAM role for CloudFormation
resource "aws_iam_role" "cloudformation" {
  name = "${var.project_name}-cloudformation-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        Service = "cloudformation.amazonaws.com"
      }
    }]
  })
}

resource "aws_iam_role_policy" "cloudformation" {
  name = "${var.project_name}-cloudformation-policy"
  role = aws_iam_role.cloudformation.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = [
          "sagemaker:*"
        ]
        Resource = "*"
      },
      {
        Effect = "Allow"
        Action = [
          "iam:PassRole"
        ]
        Resource = aws_iam_role.sagemaker_execution.arn
      }
    ]
  })
}

Step 2: Model Testing Framework

BuildSpec Configuration

Create buildspec.yml in your repository root:

version: 0.2

phases:
  pre_build:
    commands:
      - echo "Installing dependencies..."
      - pip install --upgrade pip
      - pip install boto3 scikit-learn pandas numpy joblib pytest pyyaml

  build:
    commands:
      - echo "Running model validation tests..."
      - python tests/test_model.py
      - echo "Generating endpoint configuration..."
      - python deployment/generate_endpoint_config.py

  post_build:
    commands:
      - echo "Build completed on `date`"

artifacts:
  files:
    - endpoint-config.yaml
    - test-results.xml
  name: TestOutput

reports:
  ModelTestReport:
    files:
      - 'test-results.xml'
    file-format: 'JUNITXML'

Model Test Suite

Create tests/test_model.py:

import pytest
import boto3
import json
import joblib
import numpy as np
from io import BytesIO

# Test configuration
MODEL_PACKAGE_GROUP_NAME = "ml-pipeline-models"
MINIMUM_ACCURACY = 0.75
MINIMUM_F1_SCORE = 0.70

def get_latest_model():
    """Retrieve latest approved model from registry"""
    sagemaker = boto3.client('sagemaker')

    response = sagemaker.list_model_packages(
        ModelPackageGroupName=MODEL_PACKAGE_GROUP_NAME,
        ModelApprovalStatus='Approved',
        SortBy='CreationTime',
        SortOrder='Descending',
        MaxResults=1
    )

    if not response['ModelPackageSummaryList']:
        raise ValueError("No approved models found")

    return response['ModelPackageSummaryList'][0]

def test_model_accuracy():
    """Test model meets minimum accuracy threshold"""
    model_info = get_latest_model()

    # Extract metrics from model metadata
    metadata = model_info.get('CustomerMetadataProperties', {})
    accuracy = float(metadata.get('accuracy', 0))

    assert accuracy >= MINIMUM_ACCURACY, \
        f"Model accuracy {accuracy} below threshold {MINIMUM_ACCURACY}"

    print(f" Model accuracy: {accuracy:.4f}")

def test_model_f1_score():
    """Test model meets minimum F1 score threshold"""
    model_info = get_latest_model()

    metadata = model_info.get('CustomerMetadataProperties', {})
    f1_score = float(metadata.get('f1_score', 0))

    assert f1_score >= MINIMUM_F1_SCORE, \
        f"Model F1 score {f1_score} below threshold {MINIMUM_F1_SCORE}"

    print(f" Model F1 score: {f1_score:.4f}")

def test_model_prediction_format():
    """Test model produces expected output format"""
    # This would test actual model predictions
    # Simplified version here

    test_input = np.array([[1.5, 2.3, 1.8, 2.1]])

    # In real implementation, load model and test
    # model = load_model()
    # prediction = model.predict(test_input)

    # assert prediction.shape == (1,)
    # assert prediction.dtype == np.int64

    print(" Model prediction format valid")

def test_model_performance_benchmarks():
    """Test model meets performance requirements"""
    # Test inference latency
    # Test batch processing capability
    # Test memory requirements

    print(" Model performance benchmarks passed")

def test_model_bias_fairness():
    """Test model for bias and fairness"""
    # Simplified bias detection
    # In production, use AWS SageMaker Clarify

    print(" Model bias checks passed")

if __name__ == '__main__':
    pytest.main([__file__, '-v', '--junit-xml=test-results.xml'])
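
These tests read accuracy and f1_score from the model package's CustomerMetadataProperties, so those values must be attached when the model is registered (normally at the end of the Part 3 training pipeline). A minimal sketch of that registration call - the image URI, artifact path, and metric values are placeholders:

import boto3

sagemaker = boto3.client('sagemaker')

# Sketch only: use the outputs of your Part 3 training job here.
sagemaker.create_model_package(
    ModelPackageGroupName='ml-pipeline-models',
    ModelApprovalStatus='PendingManualApproval',
    InferenceSpecification={
        'Containers': [{
            'Image': '<inference-image-uri>',  # placeholder
            'ModelDataUrl': 's3://ml-pipeline-model-artifacts-dev-ACCOUNT_ID/latest/model.tar.gz'
        }],
        'SupportedContentTypes': ['text/csv'],
        'SupportedResponseMIMETypes': ['text/csv']
    },
    CustomerMetadataProperties={
        'accuracy': '0.87',   # placeholder metric values
        'f1_score': '0.84'
    }
)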

Step 3: SageMaker Endpoint Deployment

Endpoint Configuration Generator

Create deployment/generate_endpoint_config.py:

import boto3
import yaml
import os

def generate_endpoint_config():
    """Generate CloudFormation template for SageMaker endpoint"""

    sagemaker = boto3.client('sagemaker')
    account_id = boto3.client('sts').get_caller_identity()['Account']
    region = os.environ.get('AWS_REGION', 'ap-south-1')

    # Get latest approved model
    response = sagemaker.list_model_packages(
        ModelPackageGroupName=os.environ['MODEL_PACKAGE_GROUP_NAME'],
        ModelApprovalStatus='Approved',
        SortBy='CreationTime',
        SortOrder='Descending',
        MaxResults=1
    )

    model_package_arn = response['ModelPackageSummaryList'][0]['ModelPackageArn']

    template = {
        'AWSTemplateFormatVersion': '2010-09-09',
        'Description': 'SageMaker Endpoint for ML Model',
        'Parameters': {
            'Environment': {
                'Type': 'String',
                'Default': 'staging',
                'AllowedValues': ['staging', 'production']
            }
        },
        'Resources': {
            'Model': {
                'Type': 'AWS::SageMaker::Model',
                'Properties': {
                    'ModelName': {
                        'Fn::Sub': 'ml-model-${Environment}-${AWS::StackName}'
                    },
                    'PrimaryContainer': {
                        'ModelPackageName': model_package_arn
                    },
                    'ExecutionRoleArn': f'arn:aws:iam::{account_id}:role/ml-pipeline-sagemaker-execution'
                }
            },
            'EndpointConfig': {
                'Type': 'AWS::SageMaker::EndpointConfig',
                'Properties': {
                    'EndpointConfigName': {
                        'Fn::Sub': 'ml-endpoint-config-${Environment}-${AWS::StackName}'
                    },
                    'ProductionVariants': [{
                        'VariantName': 'AllTraffic',
                        'ModelName': {'Fn::GetAtt': ['Model', 'ModelName']},
                        'InitialInstanceCount': 1,
                        'InstanceType': 'ml.m5.large',
                        'InitialVariantWeight': 1.0
                    }],
                    'DataCaptureConfig': {
                        'EnableCapture': True,
                        'InitialSamplingPercentage': 100,
                        'DestinationS3Uri': f's3://ml-pipeline-model-artifacts-dev-{account_id}/data-capture/',
                        'CaptureOptions': [
                            {'CaptureMode': 'Input'},
                            {'CaptureMode': 'Output'}
                        ]
                    }
                }
            },
            'Endpoint': {
                'Type': 'AWS::SageMaker::Endpoint',
                'Properties': {
                    'EndpointName': {
                        'Fn::Sub': 'ml-endpoint-${Environment}'
                    },
                    'EndpointConfigName': {'Fn::GetAtt': ['EndpointConfig', 'EndpointConfigName']},
                    'Tags': [
                        {'Key': 'Environment', 'Value': {'Ref': 'Environment'}},
                        {'Key': 'ManagedBy', 'Value': 'CloudFormation'}
                    ]
                }
            }
        },
        'Outputs': {
            'EndpointName': {
                'Value': {'Fn::GetAtt': ['Endpoint', 'EndpointName']},
                'Description': 'Name of the SageMaker endpoint'
            },
            'EndpointArn': {
                'Value': {'Ref': 'Endpoint'},
                'Description': 'ARN of the SageMaker endpoint'
            }
        }
    }

    # Write to file
    with open('endpoint-config.yaml', 'w') as f:
        yaml.dump(template, f, default_flow_style=False)

    print(" Endpoint configuration generated")

if __name__ == '__main__':
    generate_endpoint_config()

Step 4: A/B Testing Infrastructure

Traffic Splitting Configuration

Create deployment/ab_testing.py:

import boto3
from datetime import datetime

def create_ab_test_endpoint(
    endpoint_name,
    model_a_arn,
    model_b_arn,
    traffic_split_percentage=10
):
    """
    Create endpoint with A/B testing

    Args:
        endpoint_name: Name for the endpoint
        model_a_arn: ARN of current production model (90% traffic)
        model_b_arn: ARN of new model to test (10% traffic)
        traffic_split_percentage: Percentage of traffic to new model
    """
    sagemaker = boto3.client('sagemaker')

    # Create model A (current production)
    model_a_name = f"{endpoint_name}-model-a-{datetime.now().strftime('%Y%m%d%H%M%S')}"
    sagemaker.create_model(
        ModelName=model_a_name,
        PrimaryContainer={'ModelPackageName': model_a_arn},
        ExecutionRoleArn='arn:aws:iam::ACCOUNT_ID:role/ml-pipeline-sagemaker-execution'
    )

    # Create model B (new model)
    model_b_name = f"{endpoint_name}-model-b-{datetime.now().strftime('%Y%m%d%H%M%S')}"
    sagemaker.create_model(
        ModelName=model_b_name,
        PrimaryContainer={'ModelPackageName': model_b_arn},
        ExecutionRoleArn='arn:aws:iam::ACCOUNT_ID:role/ml-pipeline-sagemaker-execution'
    )

    # Create endpoint config with traffic split
    config_name = f"{endpoint_name}-ab-config-{datetime.now().strftime('%Y%m%d%H%M%S')}"
    sagemaker.create_endpoint_config(
        EndpointConfigName=config_name,
        ProductionVariants=[
            {
                'VariantName': 'ModelA-Current',
                'ModelName': model_a_name,
                'InitialInstanceCount': 2,
                'InstanceType': 'ml.m5.large',
                'InitialVariantWeight': 100 - traffic_split_percentage
            },
            {
                'VariantName': 'ModelB-New',
                'ModelName': model_b_name,
                'InitialInstanceCount': 1,
                'InstanceType': 'ml.m5.large',
                'InitialVariantWeight': traffic_split_percentage
            }
        ],
        DataCaptureConfig={
            'EnableCapture': True,
            'InitialSamplingPercentage': 100,
            'DestinationS3Uri': 's3://ml-pipeline-model-artifacts-dev-ACCOUNT_ID/data-capture/',
            'CaptureOptions': [
                {'CaptureMode': 'Input'},
                {'CaptureMode': 'Output'}
            ]
        }
    )

    # Create or update endpoint
    try:
        sagemaker.create_endpoint(
            EndpointName=endpoint_name,
            EndpointConfigName=config_name
        )
        print(f" Created A/B test endpoint: {endpoint_name}")
    except sagemaker.exceptions.ResourceInUse:
        sagemaker.update_endpoint(
            EndpointName=endpoint_name,
            EndpointConfigName=config_name
        )
        print(f" Updated A/B test endpoint: {endpoint_name}")

    return endpoint_name

def promote_model_b(endpoint_name):
    """
    Promote Model B to 100% traffic after successful A/B test
    """
    sagemaker = boto3.client('sagemaker')

    # Update traffic distribution
    sagemaker.update_endpoint_weights_and_capacities(
        EndpointName=endpoint_name,
        DesiredWeightsAndCapacities=[
            {'VariantName': 'ModelA-Current', 'DesiredWeight': 0},
            {'VariantName': 'ModelB-New', 'DesiredWeight': 100}
        ]
    )

    print(f" Promoted Model B to 100% traffic on {endpoint_name}")

def rollback_to_model_a(endpoint_name):
    """
    Rollback to Model A if Model B performs poorly
    """
    sagemaker = boto3.client('sagemaker')

    sagemaker.update_endpoint_weights_and_capacities(
        EndpointName=endpoint_name,
        DesiredWeightsAndCapacities=[
            {'VariantName': 'ModelA-Current', 'DesiredWeight': 100},
            {'VariantName': 'ModelB-New', 'DesiredWeight': 0}
        ]
    )

    print(f" Rolled back to Model A on {endpoint_name}")
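
How do you decide between promote_model_b and rollback_to_model_a? SageMaker publishes per-variant metrics to CloudWatch, so you can compare both variants over the test window. A rough sketch, assuming the variant names above (pick thresholds that fit your use case):

import boto3
from datetime import datetime, timedelta

def compare_variants(endpoint_name, hours=24):
    """Summarize per-variant error counts and latency over the last N hours."""
    cloudwatch = boto3.client('cloudwatch')
    end = datetime.utcnow()
    start = end - timedelta(hours=hours)
    summary = {}

    for variant in ['ModelA-Current', 'ModelB-New']:
        dimensions = [
            {'Name': 'EndpointName', 'Value': endpoint_name},
            {'Name': 'VariantName', 'Value': variant}
        ]
        # 5XX errors per variant
        errors = cloudwatch.get_metric_statistics(
            Namespace='AWS/SageMaker', MetricName='Invocation5XXErrors',
            Dimensions=dimensions, StartTime=start, EndTime=end,
            Period=3600, Statistics=['Sum']
        )
        # Model latency per variant (reported in microseconds)
        latency = cloudwatch.get_metric_statistics(
            Namespace='AWS/SageMaker', MetricName='ModelLatency',
            Dimensions=dimensions, StartTime=start, EndTime=end,
            Period=3600, Statistics=['Average']
        )
        summary[variant] = {
            'errors': sum(p['Sum'] for p in errors['Datapoints']),
            'avg_latency_us': (
                sum(p['Average'] for p in latency['Datapoints']) /
                max(len(latency['Datapoints']), 1)
            )
        }

    return summary

# Example: compare_variants('ml-endpoint-production')
# -> {'ModelA-Current': {...}, 'ModelB-New': {...}}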

Step 5: Auto-Scaling Configuration

Create terraform/endpoint-autoscaling.tf:

# Application Auto Scaling Target
resource "aws_appautoscaling_target" "sagemaker_endpoint" {
  max_capacity       = 10
  min_capacity       = 1
  resource_id        = "endpoint/ml-endpoint-production/variant/AllTraffic"
  scalable_dimension = "sagemaker:variant:DesiredInstanceCount"
  service_namespace  = "sagemaker"
}

# Scale up policy
resource "aws_appautoscaling_policy" "scale_up" {
  name               = "${var.project_name}-scale-up"
  service_namespace  = aws_appautoscaling_target.sagemaker_endpoint.service_namespace
  resource_id        = aws_appautoscaling_target.sagemaker_endpoint.resource_id
  scalable_dimension = aws_appautoscaling_target.sagemaker_endpoint.scalable_dimension
  policy_type        = "TargetTrackingScaling"

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "SageMakerVariantInvocationsPerInstance"
    }

    target_value       = 1000.0  # Target 1000 invocations per instance
    scale_in_cooldown  = 300     # 5 minutes
    scale_out_cooldown = 60      # 1 minute
  }
}

# Scale down during off-hours
resource "aws_appautoscaling_scheduled_action" "scale_down_night" {
  name               = "${var.project_name}-scale-down-night"
  service_namespace  = aws_appautoscaling_target.sagemaker_endpoint.service_namespace
  resource_id        = aws_appautoscaling_target.sagemaker_endpoint.resource_id
  scalable_dimension = aws_appautoscaling_target.sagemaker_endpoint.scalable_dimension
  schedule           = "cron(0 22 * * ? *)"  # 10 PM UTC

  scalable_target_action {
    min_capacity = 1
    max_capacity = 3
  }
}

# Scale up during business hours
resource "aws_appautoscaling_scheduled_action" "scale_up_morning" {
  name               = "${var.project_name}-scale-up-morning"
  service_namespace  = aws_appautoscaling_target.sagemaker_endpoint.service_namespace
  resource_id        = aws_appautoscaling_target.sagemaker_endpoint.resource_id
  scalable_dimension = aws_appautoscaling_target.sagemaker_endpoint.scalable_dimension
  schedule           = "cron(0 6 * * ? *)"  # 6 AM UTC

  scalable_target_action {
    min_capacity = 2
    max_capacity = 10
  }
}

Step 6: Model Monitoring & Drift Detection

SageMaker Model Monitor

Create monitoring/model_monitor.py:

import boto3
import json
import sagemaker
from sagemaker.model_monitor import (
    CronExpressionGenerator,
    DefaultModelMonitor
)

# Initialize
sagemaker_session = sagemaker.Session()
role = 'arn:aws:iam::ACCOUNT_ID:role/ml-pipeline-sagemaker-execution'

def create_baseline():
    """Create baseline for model monitoring"""

    my_monitor = DefaultModelMonitor(
        role=role,
        instance_count=1,
        instance_type='ml.m5.xlarge',
        volume_size_in_gb=20,
        max_runtime_in_seconds=3600,
        sagemaker_session=sagemaker_session
    )

    # Create baseline from training data
    my_monitor.suggest_baseline(
        baseline_dataset='s3://ml-pipeline-validated-data-dev-ACCOUNT_ID/validated/training.csv',
        dataset_format={'csv': {'header': True}},
        output_s3_uri='s3://ml-pipeline-model-artifacts-dev-ACCOUNT_ID/baseline',
        wait=True
    )

    print(" Baseline created successfully")
    return my_monitor

def create_monitoring_schedule(endpoint_name):
    """Create hourly monitoring schedule"""

    my_monitor = DefaultModelMonitor(
        role=role,
        instance_count=1,
        instance_type='ml.m5.xlarge',
        volume_size_in_gb=20,
        max_runtime_in_seconds=3600,
        sagemaker_session=sagemaker_session
    )

    my_monitor.create_monitoring_schedule(
        monitor_schedule_name=f'{endpoint_name}-monitor',
        endpoint_input=endpoint_name,
        output_s3_uri='s3://ml-pipeline-model-artifacts-dev-ACCOUNT_ID/monitoring-reports',
        statistics='s3://ml-pipeline-model-artifacts-dev-ACCOUNT_ID/baseline/statistics.json',
        constraints='s3://ml-pipeline-model-artifacts-dev-ACCOUNT_ID/baseline/constraints.json',
        schedule_cron_expression=CronExpressionGenerator.hourly(),
        enable_cloudwatch_metrics=True
    )

    print(f" Monitoring schedule created for {endpoint_name}")

def analyze_drift(bucket='ml-pipeline-model-artifacts-dev-ACCOUNT_ID',
                  prefix='monitoring-reports/'):
    """Analyze model drift from the latest Model Monitor violations report"""
    s3 = boto3.client('s3')

    # Find the most recent constraint_violations.json written by Model Monitor
    objects = s3.list_objects_v2(Bucket=bucket, Prefix=prefix).get('Contents', [])
    reports = [o for o in objects if o['Key'].endswith('constraint_violations.json')]
    if not reports:
        print("No violation reports found - no drift detected yet")
        return []

    latest = max(reports, key=lambda o: o['LastModified'])
    body = s3.get_object(Bucket=bucket, Key=latest['Key'])['Body'].read()
    violations = json.loads(body).get('violations', [])

    # Each violation names the drifting feature and the failed constraint check
    for v in violations:
        print(f"Drift violation: {v['feature_name']} ({v['constraint_check_type']})")

    print("Drift analysis complete")
    return violations

Step 7: Observability Dashboard

Create terraform/production-monitoring.tf:

# CloudWatch Dashboard for Production
resource "aws_cloudwatch_dashboard" "production_ml" {
  dashboard_name = "${var.project_name}-production"

  dashboard_body = jsonencode({
    widgets = [
      # Endpoint Invocations
      {
        type   = "metric"
        x      = 0
        y      = 0
        width  = 12
        height = 6
        properties = {
          metrics = [
            ["AWS/SageMaker", "Invocations", {
              stat = "Sum"
              label = "Total Invocations"
            }],
            [".", "Invocation4XXErrors", {
              stat = "Sum"
              label = "4XX Errors"
            }],
            [".", "Invocation5XXErrors", {
              stat = "Sum"
              label = "5XX Errors"
            }]
          ]
          period = 300
          stat   = "Sum"
          region = var.aws_region
          title  = "Endpoint Invocations & Errors"
          yAxis = {
            left = {
              min = 0
            }
          }
        }
      },
      # Model Latency
      {
        type   = "metric"
        x      = 12
        y      = 0
        width  = 12
        height = 6
        properties = {
          metrics = [
            ["AWS/SageMaker", "ModelLatency", {
              stat = "Average"
              label = "Avg Latency"
            }],
            ["...", {
              stat = "p99"
              label = "P99 Latency"
            }]
          ]
          period = 300
          stat   = "Average"
          region = var.aws_region
          title  = "Model Latency"
          yAxis = {
            left = {
              label = "Milliseconds"
              min   = 0
            }
          }
        }
      },
      # Instance Metrics
      {
        type   = "metric"
        x      = 0
        y      = 6
        width  = 12
        height = 6
        properties = {
          metrics = [
            ["AWS/SageMaker", "CPUUtilization", {
              stat = "Average"
              label = "CPU Utilization"
            }],
            [".", "MemoryUtilization", {
              stat = "Average"
              label = "Memory Utilization"
            }]
          ]
          period = 300
          stat   = "Average"
          region = var.aws_region
          title  = "Instance Resource Utilization"
          yAxis = {
            left = {
              label = "Percent"
              min   = 0
              max   = 100
            }
          }
        }
      },
      # Model Quality Metrics
      {
        type   = "metric"
        x      = 12
        y      = 6
        width  = 12
        height = 6
        properties = {
          metrics = [
            ["MLPipeline/ModelQuality", "PredictionAccuracy", {
              stat = "Average"
              label = "Accuracy"
            }],
            [".", "DriftScore", {
              stat = "Average"
              label = "Drift Score"
            }]
          ]
          period = 3600
          stat   = "Average"
          region = var.aws_region
          title  = "Model Quality Metrics"
        }
      }
    ]
  })
}

# Critical Alerts
resource "aws_cloudwatch_metric_alarm" "high_error_rate" {
  alarm_name          = "${var.project_name}-high-error-rate"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = "2"
  metric_name         = "Invocation5XXErrors"
  namespace           = "AWS/SageMaker"
  period              = "300"
  statistic           = "Sum"
  threshold           = "10"
  alarm_description   = "Alert when error rate is high"
  alarm_actions       = [aws_sns_topic.validation_notifications.arn]
  treat_missing_data  = "notBreaching"

  dimensions = {
    EndpointName = "ml-endpoint-production"
    VariantName  = "AllTraffic"
  }
}

resource "aws_cloudwatch_metric_alarm" "high_latency" {
  alarm_name          = "${var.project_name}-high-latency"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = "3"
  metric_name         = "ModelLatency"
  namespace           = "AWS/SageMaker"
  period              = "300"
  statistic           = "Average"
  threshold           = "1000000"  # 1 second (ModelLatency is reported in microseconds)
  alarm_description   = "Alert when latency is high"
  alarm_actions       = [aws_sns_topic.validation_notifications.arn]

  dimensions = {
    EndpointName = "ml-endpoint-production"
    VariantName  = "AllTraffic"
  }
}

resource "aws_cloudwatch_metric_alarm" "model_drift_detected" {
  alarm_name          = "${var.project_name}-model-drift"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = "1"
  metric_name         = "DriftScore"
  namespace           = "MLPipeline/ModelQuality"
  period              = "3600"
  statistic           = "Average"
  threshold           = "0.15"  # 15% drift threshold
  alarm_description   = "Alert when model drift is detected"
  alarm_actions       = [aws_sns_topic.validation_notifications.arn]
}
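
Note that the last dashboard widget and the drift alarm read from a custom MLPipeline/ModelQuality namespace, which nothing above publishes yet. A hedged sketch of how those metrics could be pushed (for example, from the drift-analysis job in Step 6); the metric names match the dashboard, the values come from your own evaluation logic:

import boto3

def publish_model_quality_metrics(accuracy, drift_score):
    """Push the custom metrics the dashboard and drift alarm expect."""
    cloudwatch = boto3.client('cloudwatch')
    cloudwatch.put_metric_data(
        Namespace='MLPipeline/ModelQuality',
        MetricData=[
            # No dimensions, so the alarm and dashboard widget above match as written
            {'MetricName': 'PredictionAccuracy', 'Value': accuracy, 'Unit': 'None'},
            {'MetricName': 'DriftScore', 'Value': drift_score, 'Unit': 'None'}
        ]
    )

# Example: publish_model_quality_metrics(accuracy=0.86, drift_score=0.04)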

Step 8: Incident Response & Rollback

Automated Rollback Lambda

Create lambda/auto-rollback/handler.py:

import boto3
import json
import os

sagemaker = boto3.client('sagemaker')
sns = boto3.client('sns')

# The error-rate threshold itself is enforced by the CloudWatch alarm that invokes this function
SNS_TOPIC_ARN = os.environ['SNS_TOPIC_ARN']

def lambda_handler(event, context):
    """
    Automatically rollback endpoint if error rate exceeds threshold
    """
    # Parse CloudWatch alarm
    message = json.loads(event['Records'][0]['Sns']['Message'])
    alarm_name = message['AlarmName']

    if 'high-error-rate' in alarm_name:
        print(f"High error rate detected: {alarm_name}")

        # Get current endpoint config
        endpoint_name = 'ml-endpoint-production'
        endpoint = sagemaker.describe_endpoint(EndpointName=endpoint_name)
        current_config = endpoint['EndpointConfigName']

        # List all configs, find previous one
        configs = sagemaker.list_endpoint_configs(
            SortBy='CreationTime',
            SortOrder='Descending',
            MaxResults=10
        )

        # Find previous config (skip current)
        previous_config = None
        for config in configs['EndpointConfigs']:
            if config['EndpointConfigName'] != current_config:
                previous_config = config['EndpointConfigName']
                break

        if previous_config:
            print(f"Rolling back to: {previous_config}")

            # Update endpoint to previous config
            sagemaker.update_endpoint(
                EndpointName=endpoint_name,
                EndpointConfigName=previous_config
            )

            # Send notification
            sns.publish(
                TopicArn=SNS_TOPIC_ARN,
                Subject='AUTOMATED ROLLBACK INITIATED',
                Message=f"""
Automatic rollback triggered due to high error rate.

Endpoint: {endpoint_name}
Previous Config: {current_config}
Rolled Back To: {previous_config}

Please investigate the issue.
                """
            )

            return {
                'statusCode': 200,
                'body': json.dumps({
                    'action': 'rollback',
                    'endpoint': endpoint_name,
                    'config': previous_config
                })
            }
        else:
            print("No previous config found for rollback")

            sns.publish(
                TopicArn=SNS_TOPIC_ARN,
                Subject='ROLLBACK FAILED - Manual Intervention Required',
                Message=f"""
Automatic rollback failed - no previous configuration found.

Endpoint: {endpoint_name}
Current Config: {current_config}

IMMEDIATE ACTION REQUIRED.
                """
            )

    return {
        'statusCode': 200,
        'body': json.dumps('Alarm processed')
    }

Manual Rollback Script

Create scripts/manual_rollback.sh:

#!/bin/bash
# Manual rollback script for emergencies

set -e

ENDPOINT_NAME="ml-endpoint-production"
REGION="ap-south-1"

echo "EMERGENCY ROLLBACK INITIATED"
echo "================================="

# Get current config
echo "Getting current endpoint configuration..."
CURRENT_CONFIG=$(aws sagemaker describe-endpoint \
  --endpoint-name $ENDPOINT_NAME \
  --region $REGION \
  --query 'EndpointConfigName' \
  --output text)

echo "Current config: $CURRENT_CONFIG"

# List recent configs
echo "Finding previous configuration..."
PREVIOUS_CONFIG=$(aws sagemaker list-endpoint-configs \
  --sort-by CreationTime \
  --sort-order Descending \
  --region $REGION \
  --query "EndpointConfigs[?EndpointConfigName!='$CURRENT_CONFIG'] | [0].EndpointConfigName" \
  --output text)

echo "Previous config: $PREVIOUS_CONFIG"

# Confirm rollback
read -p "Rollback to $PREVIOUS_CONFIG? (yes/no): " CONFIRM

if [ "$CONFIRM" == "yes" ]; then
  echo "Executing rollback..."

  aws sagemaker update-endpoint \
    --endpoint-name $ENDPOINT_NAME \
    --endpoint-config-name $PREVIOUS_CONFIG \
    --region $REGION

  echo " Rollback initiated"
  echo "Monitor status: aws sagemaker describe-endpoint --endpoint-name $ENDPOINT_NAME"
else
  echo "Rollback cancelled"
fi

Step 9: Cost Optimization for Production

Production Cost Analysis

Monthly Costs (Production):

Resource                Configuration                  Cost/Month
SageMaker Endpoint      2x ml.m5.large, 24/7           ~$167
Auto-scaling (peak)     +2 instances @ 8hr/day         ~$111
Model Monitor           Hourly checks                  ~$36
Data Capture            100% sampling, 1M requests     ~$5
CloudWatch              Logs + Metrics + Alarms        ~$20
S3 Storage              200GB models + data            ~$4.60
CodePipeline            1 active pipeline              ~$1
Base Total                                             ~$345/month
With Traffic                                           $345-500/month

Cost Saving Tips

# 1. Use Serverless Inference for low traffic
# Instead of persistent endpoint, use serverless
# Costs: $0.20/1M requests + $0.067/GB-hour memory

# 2. Async Inference for batch predictions
# Process predictions asynchronously
# Lower infrastructure costs

# 3. Multi-model endpoints
# Host multiple models on same endpoint
# Share infrastructure costs

# 4. Reserved instances for predictable workloads
# Save up to 75% vs on-demand

# 5. Scheduled scaling
# Scale down during off-hours (nights/weekends)
# Save 40-60% on compute
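
For tip #1, Serverless Inference is configured at the endpoint-config level. A sketch under assumed names - the model must already exist in SageMaker, and memory/concurrency here are illustrative values you should size for your own model:

import boto3

sagemaker = boto3.client('sagemaker')

# Serverless endpoint config for low-traffic models (no instances to keep warm)
sagemaker.create_endpoint_config(
    EndpointConfigName='ml-endpoint-config-serverless',
    ProductionVariants=[{
        'VariantName': 'AllTraffic',
        'ModelName': 'ml-model-production',   # assumed name of an existing SageMaker model
        'ServerlessConfig': {
            'MemorySizeInMB': 2048,            # 1024-6144, in 1 GB increments
            'MaxConcurrency': 5                # concurrent invocations before throttling
        }
    }]
)

sagemaker.create_endpoint(
    EndpointName='ml-endpoint-serverless',
    EndpointConfigName='ml-endpoint-config-serverless'
)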

Step 10: Complete Deployment Workflow

End-to-End Deployment Script

Create scripts/deploy_production.sh:

#!/bin/bash
# Complete production deployment workflow

set -e

PROJECT_NAME="ml-pipeline"
ENVIRONMENT="production"
REGION="ap-south-1"

echo " Starting Production Deployment"
echo "=================================="

# Step 1: Run tests
echo "Step 1/6: Running model tests..."
python tests/test_model.py
echo " Tests passed"

# Step 2: Build infrastructure
echo "Step 2/6: Deploying infrastructure..."
cd terraform
terraform apply -auto-approve \
  -var="environment=production" \
  -var="notification_email=your-email@example.com"
cd ..
echo " Infrastructure deployed"

# Step 3: Create baseline
echo "Step 3/6: Creating monitoring baseline..."
python monitoring/model_monitor.py
echo " Baseline created"

# Step 4: Deploy endpoint
echo "Step 4/6: Deploying SageMaker endpoint..."
python deployment/generate_endpoint_config.py
aws cloudformation create-stack \
  --stack-name ${PROJECT_NAME}-endpoint-${ENVIRONMENT} \
  --template-body file://endpoint-config.yaml \
  --capabilities CAPABILITY_IAM \
  --region $REGION
echo " Endpoint deployment initiated"

# Step 5: Wait for endpoint
echo "Step 5/6: Waiting for endpoint to be InService..."
aws sagemaker wait endpoint-in-service \
  --endpoint-name ml-endpoint-${ENVIRONMENT} \
  --region $REGION
echo " Endpoint is InService"

# Step 6: Enable monitoring
echo "Step 6/6: Enabling monitoring schedule..."
python monitoring/model_monitor.py --enable-schedule
echo " Monitoring enabled"

echo ""
echo "=================================="
echo " Production Deployment Complete!"
echo "=================================="
echo ""
echo "Endpoint: ml-endpoint-${ENVIRONMENT}"
echo "Dashboard: https://console.aws.amazon.com/cloudwatch/home?region=${REGION}#dashboards:name=${PROJECT_NAME}-production"
echo "Monitor: https://console.aws.amazon.com/sagemaker/home?region=${REGION}#/monitoring-schedules"

Testing the Complete Pipeline

1. Test CI/CD Pipeline

# Trigger pipeline by updating model in S3
aws s3 cp model.tar.gz \
  s3://ml-pipeline-model-artifacts-dev-ACCOUNT_ID/latest/model.tar.gz

# Monitor pipeline
aws codepipeline get-pipeline-state \
  --name ml-pipeline-ml-pipeline

2. Test Endpoint

# Invoke endpoint (AWS CLI v2 needs --cli-binary-format to pass the body as raw text)
aws sagemaker-runtime invoke-endpoint \
  --endpoint-name ml-endpoint-production \
  --content-type text/csv \
  --cli-binary-format raw-in-base64-out \
  --body '1.5,2.3,1.8,2.1' \
  output.json

cat output.json
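
The same smoke test via the SDK, if you prefer Python (the feature values are the same placeholders as in the CLI example):

import boto3

runtime = boto3.client('sagemaker-runtime')

# Send one CSV record and print the raw prediction
response = runtime.invoke_endpoint(
    EndpointName='ml-endpoint-production',
    ContentType='text/csv',
    Body='1.5,2.3,1.8,2.1'
)
print(response['Body'].read().decode())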

3. Test A/B Deployment

# Deploy with A/B testing
from deployment.ab_testing import create_ab_test_endpoint

create_ab_test_endpoint(
    endpoint_name='ml-endpoint-production',
    model_a_arn='arn:aws:sagemaker:...:model-package/current',
    model_b_arn='arn:aws:sagemaker:...:model-package/new',
    traffic_split_percentage=10
)

4. Test Auto-Rollback

# Simulate high error rate
# Auto-rollback lambda will trigger

# Or manual rollback
./scripts/manual_rollback.sh

Production Readiness Checklist

Infrastructure

  • All resources deployed via Terraform
  • VPC and security groups configured
  • IAM roles follow least privilege
  • Encryption enabled everywhere

CI/CD

  • CodePipeline operational
  • Automated tests passing
  • Manual approval gate configured
  • Rollback procedure tested

Monitoring

  • CloudWatch dashboard created
  • Critical alarms configured
  • Model monitoring enabled
  • Log retention policies set

Deployment

  • Staging environment tested
  • A/B testing capability ready
  • Auto-scaling configured
  • Rollback tested

Operations

  • Runbooks documented
  • On-call rotation established
  • Incident response plan ready
  • Disaster recovery tested

What You've Built - The Complete Platform

Congratulations! You now have a production-ready ML platform with:

Phase 1: Data (Part 2)

  • Encrypted S3 data lake
  • Automated validation
  • Complete audit trail

Phase 2: Training (Part 3)

  • Scalable SageMaker training
  • Experiment tracking with MLflow
  • Cost-optimized compute
  • Model registry

Phase 3: Deployment (Part 4)

  • CI/CD pipeline
  • Production endpoints
  • A/B testing
  • Auto-scaling
  • Comprehensive monitoring
  • Automated rollback

Real-World Production Tips

1. Start Simple, Scale Gradually

Week 1: Deploy to staging
Week 2: Deploy to production with 10% traffic
Week 3: Increase to 50% if metrics good
Week 4: Full production rollout
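
One way to implement that week-by-week plan is to nudge the variant weights on the existing A/B endpoint as each stage's metrics hold up. A small sketch, assuming the variant names from Step 4:

import boto3

def shift_traffic(endpoint_name, new_variant_weight):
    """Move a share of traffic to the new variant (gradual rollout sketch)."""
    sagemaker = boto3.client('sagemaker')
    sagemaker.update_endpoint_weights_and_capacities(
        EndpointName=endpoint_name,
        DesiredWeightsAndCapacities=[
            {'VariantName': 'ModelA-Current', 'DesiredWeight': 100 - new_variant_weight},
            {'VariantName': 'ModelB-New', 'DesiredWeight': new_variant_weight}
        ]
    )

# shift_traffic('ml-endpoint-production', 10)   # week 2
# shift_traffic('ml-endpoint-production', 50)   # week 3
# shift_traffic('ml-endpoint-production', 100)  # week 4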

2. Monitor Everything

  • Endpoint health
  • Model performance
  • Data drift
  • Business metrics
  • Costs

3. Incident Response

1. Alert triggers
2. Check dashboard
3. Assess severity
4. Rollback if critical
5. Investigate root cause
6. Document learnings

4. Continuous Improvement

  • Weekly model retraining
  • Monthly architecture review
  • Quarterly cost optimization
  • Regular security audits

Common Issues & Solutions

Issue: High Latency

Solutions:

  • Add more instances (auto-scaling)
  • Optimize model (quantization, pruning)
  • Use faster instance types
  • Enable batch prediction
  • Add caching layer

Issue: High Costs

Solutions:

  • Use Serverless Inference for low traffic
  • Implement scheduled scaling
  • Archive old model versions
  • Optimize S3 lifecycle policies
  • Use Spot instances where possible

Issue: Model Drift

Solutions:

  • Enable SageMaker Model Monitor
  • Set up automated retraining
  • Implement data quality checks
  • Regular model evaluation
  • A/B test new models

Issue: Failed Deployments

Solutions:

  • Comprehensive testing in CI/CD
  • Blue/green deployment strategy
  • Canary releases with gradual rollout
  • Quick rollback procedures
  • Post-deployment validation

Series Wrap-Up

What We've Covered

Part 1: AIDLC Framework & Architecture

  • Security-first ML design
  • AWS service selection
  • Best practices

Part 2: Data Pipelines

  • S3 + Lambda automation
  • Data validation
  • Audit trails

Part 3: Training at Scale

  • SageMaker + MLflow
  • Spot instances
  • Hyperparameter tuning

Part 4: Production Deployment

  • CI/CD pipelines
  • A/B testing
  • Monitoring & rollback



Key Takeaways

  1. Automate everything - CI/CD is non-negotiable
  2. Monitor proactively - Don't wait for users to report issues
  3. Deploy safely - A/B testing and gradual rollouts
  4. Plan for failure - Rollback procedures must work
  5. Optimize costs - Production can get expensive
  6. Document thoroughly - Future you will thank present you

Remember: Production is where ML creates real value. Build systems that are reliable, observable, and maintainable.


Final Thoughts

Building production ML systems is complex, but with the right architecture and tools, it's absolutely achievable. You now have a complete blueprint for:

  • Secure data handling
  • Scalable training
  • Safe deployment
  • Comprehensive monitoring
  • Cost optimization

This is just the beginning. Take these foundations and build amazing ML products!


Thank You!

Thank you for following along with this 4-part series! I hope you found it valuable.


What's your biggest ML deployment challenge?

Drop a comment below - I'd love to hear about your experiences and help if I can!

Building something cool with this architecture?

I'd love to see what you create! Tag me on social media or drop a comment.


Let's Stay Connected!

  • Follow me for more AWS and ML content
  • Like if this series helped you
  • Share with your team and connections

Have questions? Reach out anytime!



Tags: #aws #machinelearning #mlops #sagemaker #cicd #devops #production #terraform #cloudwatch

