Shoaibali Mir

Production ML on AWS: CI-CD, Deployment and Monitoring at Scale

Reading time: ~15-20 minutes

Level: Intermediate to Advanced

Prerequisites: Parts 1-3, Understanding of CI/CD concepts

Series: Part 4 of 4 (Series Finale) - Part 1 | Part 2 | Part 3


Welcome to the Series Finale!

We've come a long way! In Part 1, we covered the AIDLC framework. In Part 2, we built secure data pipelines. In Part 3, we trained models at scale with SageMaker. Now it's time to bring it all together and deploy to production.

What you'll build today:

  • CI/CD pipeline for ML models with CodePipeline
  • Production SageMaker endpoints with auto-scaling
  • A/B testing infrastructure for safe rollouts
  • Model drift detection and monitoring
  • Complete observability stack
  • Incident response and rollback procedures

By the end: You'll have a complete, production-ready ML platform on AWS that can safely deploy, monitor, and maintain models at scale.


The ML Deployment Problem

Training a great model is only half the battle. Production deployment brings new challenges:

  • Manual deployments - error-prone, don't scale
  • No testing - bad models reach production
  • Deployment downtime - service interruptions
  • No rollback plan - can't undo bad deployments
  • Performance degradation - models drift over time
  • No observability - can't debug production issues

The solution:

A complete CI/CD pipeline with automated testing, safe deployments, and comprehensive monitoring.


Architecture Overview

Here's the complete production ML platform:

Production ML Architecture

Production ML Pipeline - 4 Phases:

Phase 1: Build

  • Code Push -> CodePipeline -> CodeBuild Testing -> Model Registry
  • Automated quality gates and model validation

Phase 2: Deploy

  • Staging deployment for testing
  • Manual approval gate with metrics review
  • Production deployment with A/B testing (90/10 traffic split)

Phase 3: Monitor

  • CloudWatch metrics and dashboards
  • Model Monitor for drift detection
  • SNS alerts for anomalies

Phase 4: Respond

  • Auto-Rollback on critical errors
  • Manual rollback procedures
  • Incident response workflows

Step 1: CI/CD Pipeline Setup

Pipeline Architecture

Code Push -> Build -> Test -> Deploy to Staging -> Approval -> Deploy to Prod -> Monitor

CodePipeline Configuration

Create terraform/cicd.tf:

# S3 bucket for pipeline artifacts
resource "aws_s3_bucket" "pipeline_artifacts" {
  bucket = "${var.project_name}-pipeline-artifacts-${data.aws_caller_identity.current.account_id}"

  tags = {
    Name        = "Pipeline Artifacts"
    Environment = var.environment
  }
}

resource "aws_s3_bucket_versioning" "pipeline_artifacts" {
  bucket = aws_s3_bucket.pipeline_artifacts.id

  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket_server_side_encryption_configuration" "pipeline_artifacts" {
  bucket = aws_s3_bucket.pipeline_artifacts.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm     = "aws:kms"
      kms_master_key_id = aws_kms_key.data_encryption.arn
    }
  }
}

# CodeBuild project for model testing
resource "aws_codebuild_project" "model_tests" {
  name          = "${var.project_name}-model-tests"
  service_role  = aws_iam_role.codebuild.arn
  build_timeout = 30

  artifacts {
    type = "CODEPIPELINE"
  }

  environment {
    compute_type                = "BUILD_GENERAL1_MEDIUM"
    image                      = "aws/codebuild/standard:7.0"
    type                       = "LINUX_CONTAINER"
    privileged_mode            = true
    image_pull_credentials_type = "CODEBUILD"

    environment_variable {
      name  = "MODEL_PACKAGE_GROUP_NAME"
      value = aws_sagemaker_model_package_group.ml_models.model_package_group_name
    }

    environment_variable {
      name  = "AWS_ACCOUNT_ID"
      value = data.aws_caller_identity.current.account_id
    }

    environment_variable {
      name  = "AWS_REGION"
      value = var.aws_region
    }
  }

  source {
    type      = "CODEPIPELINE"
    buildspec = "buildspec.yml"
  }

  tags = {
    Name        = "Model Testing"
    Environment = var.environment
  }
}

# IAM role for CodeBuild
resource "aws_iam_role" "codebuild" {
  name = "${var.project_name}-codebuild-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        Service = "codebuild.amazonaws.com"
      }
    }]
  })
}

resource "aws_iam_role_policy" "codebuild" {
  name = "${var.project_name}-codebuild-policy"
  role = aws_iam_role.codebuild.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = [
          "logs:CreateLogGroup",
          "logs:CreateLogStream",
          "logs:PutLogEvents"
        ]
        Resource = "arn:aws:logs:*:*:*"
      },
      {
        Effect = "Allow"
        Action = [
          "s3:GetObject",
          "s3:PutObject"
        ]
        Resource = [
          "${aws_s3_bucket.pipeline_artifacts.arn}/*",
          "${aws_s3_bucket.model_artifacts.arn}/*"
        ]
      },
      {
        Effect = "Allow"
        Action = [
          "sagemaker:DescribeModelPackage",
          "sagemaker:ListModelPackages"
        ]
        Resource = "*"
      },
      {
        Effect = "Allow"
        Action = [
          "kms:Decrypt",
          "kms:GenerateDataKey"
        ]
        Resource = aws_kms_key.data_encryption.arn
      }
    ]
  })
}

# CodePipeline
resource "aws_codepipeline" "ml_pipeline" {
  name     = "${var.project_name}-ml-pipeline"
  role_arn = aws_iam_role.codepipeline.arn

  artifact_store {
    location = aws_s3_bucket.pipeline_artifacts.bucket
    type     = "S3"

    encryption_key {
      id   = aws_kms_key.data_encryption.arn
      type = "KMS"
    }
  }

  stage {
    name = "Source"

    action {
      name             = "Source"
      category         = "Source"
      owner            = "AWS"
      provider         = "S3"
      version          = "1"
      output_artifacts = ["source_output"]

      configuration = {
        S3Bucket             = aws_s3_bucket.model_artifacts.id
        S3ObjectKey          = "latest/model.tar.gz"
        PollForSourceChanges = true
      }
    }
  }

  stage {
    name = "Test"

    action {
      name             = "ModelValidation"
      category         = "Build"
      owner            = "AWS"
      provider         = "CodeBuild"
      version          = "1"
      input_artifacts  = ["source_output"]
      output_artifacts = ["test_output"]

      configuration = {
        ProjectName = aws_codebuild_project.model_tests.name
      }
    }
  }

  stage {
    name = "DeployToStaging"

    action {
      name            = "CreateStagingEndpoint"
      category        = "Deploy"
      owner           = "AWS"
      provider        = "CloudFormation"
      version         = "1"
      input_artifacts = ["test_output"]

      configuration = {
        ActionMode    = "CREATE_UPDATE"
        StackName     = "${var.project_name}-staging-endpoint"
        TemplatePath  = "test_output::endpoint-config.yaml"
        Capabilities  = "CAPABILITY_IAM"
        RoleArn       = aws_iam_role.cloudformation.arn
      }
    }
  }

  stage {
    name = "Approval"

    action {
      name     = "ManualApproval"
      category = "Approval"
      owner    = "AWS"
      provider = "Manual"
      version  = "1"

      configuration = {
        NotificationArn = aws_sns_topic.validation_notifications.arn
        CustomData      = "Please review staging endpoint metrics before approving production deployment"
      }
    }
  }

  stage {
    name = "DeployToProduction"

    action {
      name            = "CreateProductionEndpoint"
      category        = "Deploy"
      owner           = "AWS"
      provider        = "CloudFormation"
      version         = "1"
      input_artifacts = ["test_output"]

      configuration = {
        ActionMode    = "CREATE_UPDATE"
        StackName     = "${var.project_name}-production-endpoint"
        TemplatePath  = "test_output::endpoint-config.yaml"
        Capabilities  = "CAPABILITY_IAM"
        RoleArn       = aws_iam_role.cloudformation.arn
        ParameterOverrides = jsonencode({
          Environment = "production"
        })
      }
    }
  }

  tags = {
    Name        = "ML Pipeline"
    Environment = var.environment
  }
}

# IAM role for CodePipeline
resource "aws_iam_role" "codepipeline" {
  name = "${var.project_name}-codepipeline-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        Service = "codepipeline.amazonaws.com"
      }
    }]
  })
}

resource "aws_iam_role_policy" "codepipeline" {
  name = "${var.project_name}-codepipeline-policy"
  role = aws_iam_role.codepipeline.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = [
          "s3:GetObject",
          "s3:PutObject",
          "s3:GetBucketLocation",
          "s3:ListBucket"
        ]
        Resource = [
          "${aws_s3_bucket.pipeline_artifacts.arn}",
          "${aws_s3_bucket.pipeline_artifacts.arn}/*",
          "${aws_s3_bucket.model_artifacts.arn}",
          "${aws_s3_bucket.model_artifacts.arn}/*"
        ]
      },
      {
        Effect = "Allow"
        Action = [
          "codebuild:StartBuild",
          "codebuild:BatchGetBuilds"
        ]
        Resource = aws_codebuild_project.model_tests.arn
      },
      {
        Effect = "Allow"
        Action = [
          "cloudformation:CreateStack",
          "cloudformation:UpdateStack",
          "cloudformation:DescribeStacks"
        ]
        Resource = "*"
      },
      {
        Effect = "Allow"
        Action = [
          "sns:Publish"
        ]
        Resource = aws_sns_topic.validation_notifications.arn
      },
      {
        Effect = "Allow"
        Action = [
          "iam:PassRole"
        ]
        Resource = aws_iam_role.cloudformation.arn
      },
      {
        Effect = "Allow"
        Action = [
          "kms:Decrypt",
          "kms:GenerateDataKey"
        ]
        Resource = aws_kms_key.data_encryption.arn
      }
    ]
  })
}

# IAM role for CloudFormation
resource "aws_iam_role" "cloudformation" {
  name = "${var.project_name}-cloudformation-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        Service = "cloudformation.amazonaws.com"
      }
    }]
  })
}

resource "aws_iam_role_policy" "cloudformation" {
  name = "${var.project_name}-cloudformation-policy"
  role = aws_iam_role.cloudformation.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = [
          "sagemaker:*"
        ]
        Resource = "*"
      },
      {
        Effect = "Allow"
        Action = [
          "iam:PassRole"
        ]
        Resource = aws_iam_role.sagemaker_execution.arn
      }
    ]
  })
}

Step 2: Model Testing Framework

BuildSpec Configuration

Create buildspec.yml in your repository root:

version: 0.2

phases:
  pre_build:
    commands:
      - echo "Installing dependencies..."
      - pip install --upgrade pip
      - pip install boto3 scikit-learn pandas numpy joblib pytest pyyaml

  build:
    commands:
      - echo "Running model validation tests..."
      - python tests/test_model.py
      - echo "Generating endpoint configuration..."
      - python deployment/generate_endpoint_config.py

  post_build:
    commands:
      - echo "Build completed on `date`"

artifacts:
  files:
    - endpoint-config.yaml
    - test-results.xml
  name: TestOutput

reports:
  ModelTestReport:
    files:
      - 'test-results.xml'
    file-format: 'JUNITXML'

Model Test Suite

Create tests/test_model.py:

import pytest
import boto3
import json
import joblib
import numpy as np
from io import BytesIO

# Test configuration
MODEL_PACKAGE_GROUP_NAME = "ml-pipeline-models"
MINIMUM_ACCURACY = 0.75
MINIMUM_F1_SCORE = 0.70

def get_latest_model():
    """Retrieve latest approved model from registry"""
    sagemaker = boto3.client('sagemaker')

    response = sagemaker.list_model_packages(
        ModelPackageGroupName=MODEL_PACKAGE_GROUP_NAME,
        ModelApprovalStatus='Approved',
        SortBy='CreationTime',
        SortOrder='Descending',
        MaxResults=1
    )

    if not response['ModelPackageSummaryList']:
        raise ValueError("No approved models found")

    return response['ModelPackageSummaryList'][0]

def test_model_accuracy():
    """Test model meets minimum accuracy threshold"""
    model_info = get_latest_model()

    # Extract metrics from model metadata
    metadata = model_info.get('CustomerMetadataProperties', {})
    accuracy = float(metadata.get('accuracy', 0))

    assert accuracy >= MINIMUM_ACCURACY, \
        f"Model accuracy {accuracy} below threshold {MINIMUM_ACCURACY}"

    print(f" Model accuracy: {accuracy:.4f}")

def test_model_f1_score():
    """Test model meets minimum F1 score threshold"""
    model_info = get_latest_model()

    metadata = model_info.get('CustomerMetadataProperties', {})
    f1_score = float(metadata.get('f1_score', 0))

    assert f1_score >= MINIMUM_F1_SCORE, \
        f"Model F1 score {f1_score} below threshold {MINIMUM_F1_SCORE}"

    print(f" Model F1 score: {f1_score:.4f}")

def test_model_prediction_format():
    """Test model produces expected output format"""
    # This would test actual model predictions
    # Simplified version here

    test_input = np.array([[1.5, 2.3, 1.8, 2.1]])

    # In real implementation, load model and test
    # model = load_model()
    # prediction = model.predict(test_input)

    # assert prediction.shape == (1,)
    # assert prediction.dtype == np.int64

    print(" Model prediction format valid")

def test_model_performance_benchmarks():
    """Test model meets performance requirements"""
    # Test inference latency
    # Test batch processing capability
    # Test memory requirements

    print(" Model performance benchmarks passed")

def test_model_bias_fairness():
    """Test model for bias and fairness"""
    # Simplified bias detection
    # In production, use AWS SageMaker Clarify

    print(" Model bias checks passed")

if __name__ == '__main__':
    pytest.main([__file__, '-v', '--junit-xml=test-results.xml'])
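
These tests read accuracy and f1_score from the model package's CustomerMetadataProperties, so those values must be attached when the model is registered (normally at the end of the Part 3 training pipeline). A minimal sketch of that registration call - the image URI, artifact path, and metric values are placeholders:

import boto3

sagemaker = boto3.client('sagemaker')

# Sketch only: use the outputs of your Part 3 training job here.
sagemaker.create_model_package(
    ModelPackageGroupName='ml-pipeline-models',
    ModelApprovalStatus='PendingManualApproval',
    InferenceSpecification={
        'Containers': [{
            'Image': '<inference-image-uri>',  # placeholder
            'ModelDataUrl': 's3://ml-pipeline-model-artifacts-dev-ACCOUNT_ID/latest/model.tar.gz'
        }],
        'SupportedContentTypes': ['text/csv'],
        'SupportedResponseMIMETypes': ['text/csv']
    },
    CustomerMetadataProperties={
        'accuracy': '0.87',   # placeholder metric values
        'f1_score': '0.84'
    }
)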

Step 3: SageMaker Endpoint Deployment

Endpoint Configuration Generator

Create deployment/generate_endpoint_config.py:

import boto3
import yaml
import os

def generate_endpoint_config():
    """Generate CloudFormation template for SageMaker endpoint"""

    sagemaker = boto3.client('sagemaker')
    account_id = boto3.client('sts').get_caller_identity()['Account']
    region = os.environ.get('AWS_REGION', 'ap-south-1')

    # Get latest approved model
    response = sagemaker.list_model_packages(
        ModelPackageGroupName=os.environ['MODEL_PACKAGE_GROUP_NAME'],
        ModelApprovalStatus='Approved',
        SortBy='CreationTime',
        SortOrder='Descending',
        MaxResults=1
    )

    model_package_arn = response['ModelPackageSummaryList'][0]['ModelPackageArn']

    template = {
        'AWSTemplateFormatVersion': '2010-09-09',
        'Description': 'SageMaker Endpoint for ML Model',
        'Parameters': {
            'Environment': {
                'Type': 'String',
                'Default': 'staging',
                'AllowedValues': ['staging', 'production']
            }
        },
        'Resources': {
            'Model': {
                'Type': 'AWS::SageMaker::Model',
                'Properties': {
                    'ModelName': {
                        'Fn::Sub': 'ml-model-${Environment}-${AWS::StackName}'
                    },
                    'PrimaryContainer': {
                        'ModelPackageName': model_package_arn
                    },
                    'ExecutionRoleArn': f'arn:aws:iam::{account_id}:role/ml-pipeline-sagemaker-execution'
                }
            },
            'EndpointConfig': {
                'Type': 'AWS::SageMaker::EndpointConfig',
                'Properties': {
                    'EndpointConfigName': {
                        'Fn::Sub': 'ml-endpoint-config-${Environment}-${AWS::StackName}'
                    },
                    'ProductionVariants': [{
                        'VariantName': 'AllTraffic',
                        'ModelName': {'Fn::GetAtt': ['Model', 'ModelName']},
                        'InitialInstanceCount': 1,
                        'InstanceType': 'ml.m5.large',
                        'InitialVariantWeight': 1.0
                    }],
                    'DataCaptureConfig': {
                        'EnableCapture': True,
                        'InitialSamplingPercentage': 100,
                        'DestinationS3Uri': f's3://ml-pipeline-model-artifacts-dev-{account_id}/data-capture/',
                        'CaptureOptions': [
                            {'CaptureMode': 'Input'},
                            {'CaptureMode': 'Output'}
                        ]
                    }
                }
            },
            'Endpoint': {
                'Type': 'AWS::SageMaker::Endpoint',
                'Properties': {
                    'EndpointName': {
                        'Fn::Sub': 'ml-endpoint-${Environment}'
                    },
                    'EndpointConfigName': {'Fn::GetAtt': ['EndpointConfig', 'EndpointConfigName']},
                    'Tags': [
                        {'Key': 'Environment', 'Value': {'Ref': 'Environment'}},
                        {'Key': 'ManagedBy', 'Value': 'CloudFormation'}
                    ]
                }
            }
        },
        'Outputs': {
            'EndpointName': {
                'Value': {'Fn::GetAtt': ['Endpoint', 'EndpointName']},
                'Description': 'Name of the SageMaker endpoint'
            },
            'EndpointArn': {
                'Value': {'Ref': 'Endpoint'},
                'Description': 'ARN of the SageMaker endpoint'
            }
        }
    }

    # Write to file
    with open('endpoint-config.yaml', 'w') as f:
        yaml.dump(template, f, default_flow_style=False)

    print(" Endpoint configuration generated")

if __name__ == '__main__':
    generate_endpoint_config()

Step 4: A/B Testing Infrastructure

Traffic Splitting Configuration

Create deployment/ab_testing.py:

import boto3
from datetime import datetime

def create_ab_test_endpoint(
    endpoint_name,
    model_a_arn,
    model_b_arn,
    traffic_split_percentage=10
):
    """
    Create endpoint with A/B testing

    Args:
        endpoint_name: Name for the endpoint
        model_a_arn: ARN of current production model (90% traffic)
        model_b_arn: ARN of new model to test (10% traffic)
        traffic_split_percentage: Percentage of traffic to new model
    """
    sagemaker = boto3.client('sagemaker')

    # Create model A (current production)
    model_a_name = f"{endpoint_name}-model-a-{datetime.now().strftime('%Y%m%d%H%M%S')}"
    sagemaker.create_model(
        ModelName=model_a_name,
        PrimaryContainer={'ModelPackageName': model_a_arn},
        ExecutionRoleArn='arn:aws:iam::ACCOUNT_ID:role/ml-pipeline-sagemaker-execution'
    )

    # Create model B (new model)
    model_b_name = f"{endpoint_name}-model-b-{datetime.now().strftime('%Y%m%d%H%M%S')}"
    sagemaker.create_model(
        ModelName=model_b_name,
        PrimaryContainer={'ModelPackageName': model_b_arn},
        ExecutionRoleArn='arn:aws:iam::ACCOUNT_ID:role/ml-pipeline-sagemaker-execution'
    )

    # Create endpoint config with traffic split
    config_name = f"{endpoint_name}-ab-config-{datetime.now().strftime('%Y%m%d%H%M%S')}"
    sagemaker.create_endpoint_config(
        EndpointConfigName=config_name,
        ProductionVariants=[
            {
                'VariantName': 'ModelA-Current',
                'ModelName': model_a_name,
                'InitialInstanceCount': 2,
                'InstanceType': 'ml.m5.large',
                'InitialVariantWeight': 100 - traffic_split_percentage
            },
            {
                'VariantName': 'ModelB-New',
                'ModelName': model_b_name,
                'InitialInstanceCount': 1,
                'InstanceType': 'ml.m5.large',
                'InitialVariantWeight': traffic_split_percentage
            }
        ],
        DataCaptureConfig={
            'EnableCapture': True,
            'InitialSamplingPercentage': 100,
            'DestinationS3Uri': 's3://ml-pipeline-model-artifacts-dev-ACCOUNT_ID/data-capture/',
            'CaptureOptions': [
                {'CaptureMode': 'Input'},
                {'CaptureMode': 'Output'}
            ]
        }
    )

    # Create or update endpoint
    try:
        sagemaker.create_endpoint(
            EndpointName=endpoint_name,
            EndpointConfigName=config_name
        )
        print(f" Created A/B test endpoint: {endpoint_name}")
    except sagemaker.exceptions.ResourceInUse:
        sagemaker.update_endpoint(
            EndpointName=endpoint_name,
            EndpointConfigName=config_name
        )
        print(f" Updated A/B test endpoint: {endpoint_name}")

    return endpoint_name

def promote_model_b(endpoint_name):
    """
    Promote Model B to 100% traffic after successful A/B test
    """
    sagemaker = boto3.client('sagemaker')

    # Update traffic distribution
    sagemaker.update_endpoint_weights_and_capacities(
        EndpointName=endpoint_name,
        DesiredWeightsAndCapacities=[
            {'VariantName': 'ModelA-Current', 'DesiredWeight': 0},
            {'VariantName': 'ModelB-New', 'DesiredWeight': 100}
        ]
    )

    print(f" Promoted Model B to 100% traffic on {endpoint_name}")

def rollback_to_model_a(endpoint_name):
    """
    Rollback to Model A if Model B performs poorly
    """
    sagemaker = boto3.client('sagemaker')

    sagemaker.update_endpoint_weights_and_capacities(
        EndpointName=endpoint_name,
        DesiredWeightsAndCapacities=[
            {'VariantName': 'ModelA-Current', 'DesiredWeight': 100},
            {'VariantName': 'ModelB-New', 'DesiredWeight': 0}
        ]
    )

    print(f" Rolled back to Model A on {endpoint_name}")
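
How do you decide between promote_model_b and rollback_to_model_a? SageMaker publishes per-variant metrics to CloudWatch, so you can compare both variants over the test window. A rough sketch, assuming the variant names above (pick thresholds that fit your use case):

import boto3
from datetime import datetime, timedelta

def compare_variants(endpoint_name, hours=24):
    """Summarize per-variant error counts and latency over the last N hours."""
    cloudwatch = boto3.client('cloudwatch')
    end = datetime.utcnow()
    start = end - timedelta(hours=hours)
    summary = {}

    for variant in ['ModelA-Current', 'ModelB-New']:
        dimensions = [
            {'Name': 'EndpointName', 'Value': endpoint_name},
            {'Name': 'VariantName', 'Value': variant}
        ]
        # 5XX errors per variant
        errors = cloudwatch.get_metric_statistics(
            Namespace='AWS/SageMaker', MetricName='Invocation5XXErrors',
            Dimensions=dimensions, StartTime=start, EndTime=end,
            Period=3600, Statistics=['Sum']
        )
        # Model latency per variant (reported in microseconds)
        latency = cloudwatch.get_metric_statistics(
            Namespace='AWS/SageMaker', MetricName='ModelLatency',
            Dimensions=dimensions, StartTime=start, EndTime=end,
            Period=3600, Statistics=['Average']
        )
        summary[variant] = {
            'errors': sum(p['Sum'] for p in errors['Datapoints']),
            'avg_latency_us': (
                sum(p['Average'] for p in latency['Datapoints']) /
                max(len(latency['Datapoints']), 1)
            )
        }

    return summary

# Example: compare_variants('ml-endpoint-production')
# -> {'ModelA-Current': {...}, 'ModelB-New': {...}}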

Step 5: Auto-Scaling Configuration

Create terraform/endpoint-autoscaling.tf:

# Application Auto Scaling Target
resource "aws_appautoscaling_target" "sagemaker_endpoint" {
  max_capacity       = 10
  min_capacity       = 1
  resource_id        = "endpoint/ml-endpoint-production/variant/AllTraffic"
  scalable_dimension = "sagemaker:variant:DesiredInstanceCount"
  service_namespace  = "sagemaker"
}

# Scale up policy
resource "aws_appautoscaling_policy" "scale_up" {
  name               = "${var.project_name}-scale-up"
  service_namespace  = aws_appautoscaling_target.sagemaker_endpoint.service_namespace
  resource_id        = aws_appautoscaling_target.sagemaker_endpoint.resource_id
  scalable_dimension = aws_appautoscaling_target.sagemaker_endpoint.scalable_dimension
  policy_type        = "TargetTrackingScaling"

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "SageMakerVariantInvocationsPerInstance"
    }

    target_value       = 1000.0  # Target 1000 invocations per instance
    scale_in_cooldown  = 300     # 5 minutes
    scale_out_cooldown = 60      # 1 minute
  }
}

# Scale down during off-hours
resource "aws_appautoscaling_scheduled_action" "scale_down_night" {
  name               = "${var.project_name}-scale-down-night"
  service_namespace  = aws_appautoscaling_target.sagemaker_endpoint.service_namespace
  resource_id        = aws_appautoscaling_target.sagemaker_endpoint.resource_id
  scalable_dimension = aws_appautoscaling_target.sagemaker_endpoint.scalable_dimension
  schedule           = "cron(0 22 * * ? *)"  # 10 PM UTC

  scalable_target_action {
    min_capacity = 1
    max_capacity = 3
  }
}

# Scale up during business hours
resource "aws_appautoscaling_scheduled_action" "scale_up_morning" {
  name               = "${var.project_name}-scale-up-morning"
  service_namespace  = aws_appautoscaling_target.sagemaker_endpoint.service_namespace
  resource_id        = aws_appautoscaling_target.sagemaker_endpoint.resource_id
  scalable_dimension = aws_appautoscaling_target.sagemaker_endpoint.scalable_dimension
  schedule           = "cron(0 6 * * ? *)"  # 6 AM UTC

  scalable_target_action {
    min_capacity = 2
    max_capacity = 10
  }
}

Step 6: Model Monitoring & Drift Detection

SageMaker Model Monitor

Create monitoring/model_monitor.py:

import boto3
import json
import sagemaker
from sagemaker.model_monitor import (
    CronExpressionGenerator,
    DefaultModelMonitor
)

# Initialize
sagemaker_session = sagemaker.Session()
role = 'arn:aws:iam::ACCOUNT_ID:role/ml-pipeline-sagemaker-execution'

def create_baseline():
    """Create baseline for model monitoring"""

    my_monitor = DefaultModelMonitor(
        role=role,
        instance_count=1,
        instance_type='ml.m5.xlarge',
        volume_size_in_gb=20,
        max_runtime_in_seconds=3600,
        sagemaker_session=sagemaker_session
    )

    # Create baseline from training data
    my_monitor.suggest_baseline(
        baseline_dataset='s3://ml-pipeline-validated-data-dev-ACCOUNT_ID/validated/training.csv',
        dataset_format={'csv': {'header': True}},
        output_s3_uri='s3://ml-pipeline-model-artifacts-dev-ACCOUNT_ID/baseline',
        wait=True
    )

    print(" Baseline created successfully")
    return my_monitor

def create_monitoring_schedule(endpoint_name):
    """Create hourly monitoring schedule"""

    my_monitor = DefaultModelMonitor(
        role=role,
        instance_count=1,
        instance_type='ml.m5.xlarge',
        volume_size_in_gb=20,
        max_runtime_in_seconds=3600,
        sagemaker_session=sagemaker_session
    )

    my_monitor.create_monitoring_schedule(
        monitor_schedule_name=f'{endpoint_name}-monitor',
        endpoint_input=endpoint_name,
        output_s3_uri='s3://ml-pipeline-model-artifacts-dev-ACCOUNT_ID/monitoring-reports',
        statistics='s3://ml-pipeline-model-artifacts-dev-ACCOUNT_ID/baseline/statistics.json',
        constraints='s3://ml-pipeline-model-artifacts-dev-ACCOUNT_ID/baseline/constraints.json',
        schedule_cron_expression=CronExpressionGenerator.hourly(),
        enable_cloudwatch_metrics=True
    )

    print(f" Monitoring schedule created for {endpoint_name}")

def analyze_drift(bucket='ml-pipeline-model-artifacts-dev-ACCOUNT_ID',
                  prefix='monitoring-reports/'):
    """Analyze model drift from the latest Model Monitor violations report"""
    s3 = boto3.client('s3')

    # Find the most recent constraint_violations.json written by Model Monitor
    objects = s3.list_objects_v2(Bucket=bucket, Prefix=prefix).get('Contents', [])
    reports = [o for o in objects if o['Key'].endswith('constraint_violations.json')]
    if not reports:
        print("No violation reports found - no drift detected yet")
        return []

    latest = max(reports, key=lambda o: o['LastModified'])
    body = s3.get_object(Bucket=bucket, Key=latest['Key'])['Body'].read()
    violations = json.loads(body).get('violations', [])

    # Each violation names the drifting feature and the failed constraint check
    for v in violations:
        print(f"Drift violation: {v['feature_name']} ({v['constraint_check_type']})")

    print("Drift analysis complete")
    return violations

Step 7: Observability Dashboard

Create terraform/production-monitoring.tf:

# CloudWatch Dashboard for Production
resource "aws_cloudwatch_dashboard" "production_ml" {
  dashboard_name = "${var.project_name}-production"

  dashboard_body = jsonencode({
    widgets = [
      # Endpoint Invocations
      {
        type   = "metric"
        x      = 0
        y      = 0
        width  = 12
        height = 6
        properties = {
          metrics = [
            ["AWS/SageMaker", "Invocations", {
              stat = "Sum"
              label = "Total Invocations"
            }],
            [".", "Invocation4XXErrors", {
              stat = "Sum"
              label = "4XX Errors"
            }],
            [".", "Invocation5XXErrors", {
              stat = "Sum"
              label = "5XX Errors"
            }]
          ]
          period = 300
          stat   = "Sum"
          region = var.aws_region
          title  = "Endpoint Invocations & Errors"
          yAxis = {
            left = {
              min = 0
            }
          }
        }
      },
      # Model Latency
      {
        type   = "metric"
        x      = 12
        y      = 0
        width  = 12
        height = 6
        properties = {
          metrics = [
            ["AWS/SageMaker", "ModelLatency", {
              stat = "Average"
              label = "Avg Latency"
            }],
            ["...", {
              stat = "p99"
              label = "P99 Latency"
            }]
          ]
          period = 300
          stat   = "Average"
          region = var.aws_region
          title  = "Model Latency"
          yAxis = {
            left = {
              label = "Milliseconds"
              min   = 0
            }
          }
        }
      },
      # Instance Metrics
      {
        type   = "metric"
        x      = 0
        y      = 6
        width  = 12
        height = 6
        properties = {
          metrics = [
            ["AWS/SageMaker", "CPUUtilization", {
              stat = "Average"
              label = "CPU Utilization"
            }],
            [".", "MemoryUtilization", {
              stat = "Average"
              label = "Memory Utilization"
            }]
          ]
          period = 300
          stat   = "Average"
          region = var.aws_region
          title  = "Instance Resource Utilization"
          yAxis = {
            left = {
              label = "Percent"
              min   = 0
              max   = 100
            }
          }
        }
      },
      # Model Quality Metrics
      {
        type   = "metric"
        x      = 12
        y      = 6
        width  = 12
        height = 6
        properties = {
          metrics = [
            ["MLPipeline/ModelQuality", "PredictionAccuracy", {
              stat = "Average"
              label = "Accuracy"
            }],
            [".", "DriftScore", {
              stat = "Average"
              label = "Drift Score"
            }]
          ]
          period = 3600
          stat   = "Average"
          region = var.aws_region
          title  = "Model Quality Metrics"
        }
      }
    ]
  })
}

# Critical Alerts
resource "aws_cloudwatch_metric_alarm" "high_error_rate" {
  alarm_name          = "${var.project_name}-high-error-rate"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = "2"
  metric_name         = "Invocation5XXErrors"
  namespace           = "AWS/SageMaker"
  period              = "300"
  statistic           = "Sum"
  threshold           = "10"
  alarm_description   = "Alert when error rate is high"
  alarm_actions       = [aws_sns_topic.validation_notifications.arn]
  treat_missing_data  = "notBreaching"

  dimensions = {
    EndpointName = "ml-endpoint-production"
    VariantName  = "AllTraffic"
  }
}

resource "aws_cloudwatch_metric_alarm" "high_latency" {
  alarm_name          = "${var.project_name}-high-latency"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = "3"
  metric_name         = "ModelLatency"
  namespace           = "AWS/SageMaker"
  period              = "300"
  statistic           = "Average"
  threshold           = "1000000"  # 1 second (ModelLatency is reported in microseconds)
  alarm_description   = "Alert when latency is high"
  alarm_actions       = [aws_sns_topic.validation_notifications.arn]

  dimensions = {
    EndpointName = "ml-endpoint-production"
    VariantName  = "AllTraffic"
  }
}

resource "aws_cloudwatch_metric_alarm" "model_drift_detected" {
  alarm_name          = "${var.project_name}-model-drift"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = "1"
  metric_name         = "DriftScore"
  namespace           = "MLPipeline/ModelQuality"
  period              = "3600"
  statistic           = "Average"
  threshold           = "0.15"  # 15% drift threshold
  alarm_description   = "Alert when model drift is detected"
  alarm_actions       = [aws_sns_topic.validation_notifications.arn]
}
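
Note that the last dashboard widget and the drift alarm read from a custom MLPipeline/ModelQuality namespace, which nothing above publishes yet. A hedged sketch of how those metrics could be pushed (for example, from the drift-analysis job in Step 6); the metric names match the dashboard, the values come from your own evaluation logic:

import boto3

def publish_model_quality_metrics(accuracy, drift_score):
    """Push the custom metrics the dashboard and drift alarm expect."""
    cloudwatch = boto3.client('cloudwatch')
    cloudwatch.put_metric_data(
        Namespace='MLPipeline/ModelQuality',
        MetricData=[
            # No dimensions, so the alarm and dashboard widget above match as written
            {'MetricName': 'PredictionAccuracy', 'Value': accuracy, 'Unit': 'None'},
            {'MetricName': 'DriftScore', 'Value': drift_score, 'Unit': 'None'}
        ]
    )

# Example: publish_model_quality_metrics(accuracy=0.86, drift_score=0.04)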

Step 8: Incident Response & Rollback

Automated Rollback Lambda

Create lambda/auto-rollback/handler.py:

import boto3
import json
import os

sagemaker = boto3.client('sagemaker')
sns = boto3.client('sns')

# The error-rate threshold itself is enforced by the CloudWatch alarm that invokes this function
SNS_TOPIC_ARN = os.environ['SNS_TOPIC_ARN']

def lambda_handler(event, context):
    """
    Automatically rollback endpoint if error rate exceeds threshold
    """
    # Parse CloudWatch alarm
    message = json.loads(event['Records'][0]['Sns']['Message'])
    alarm_name = message['AlarmName']

    if 'high-error-rate' in alarm_name:
        print(f"High error rate detected: {alarm_name}")

        # Get current endpoint config
        endpoint_name = 'ml-endpoint-production'
        endpoint = sagemaker.describe_endpoint(EndpointName=endpoint_name)
        current_config = endpoint['EndpointConfigName']

        # List all configs, find previous one
        configs = sagemaker.list_endpoint_configs(
            SortBy='CreationTime',
            SortOrder='Descending',
            MaxResults=10
        )

        # Find previous config (skip current)
        previous_config = None
        for config in configs['EndpointConfigs']:
            if config['EndpointConfigName'] != current_config:
                previous_config = config['EndpointConfigName']
                break

        if previous_config:
            print(f"Rolling back to: {previous_config}")

            # Update endpoint to previous config
            sagemaker.update_endpoint(
                EndpointName=endpoint_name,
                EndpointConfigName=previous_config
            )

            # Send notification
            sns.publish(
                TopicArn=SNS_TOPIC_ARN,
                Subject='AUTOMATED ROLLBACK INITIATED',
                Message=f"""
Automatic rollback triggered due to high error rate.

Endpoint: {endpoint_name}
Previous Config: {current_config}
Rolled Back To: {previous_config}

Please investigate the issue.
                """
            )

            return {
                'statusCode': 200,
                'body': json.dumps({
                    'action': 'rollback',
                    'endpoint': endpoint_name,
                    'config': previous_config
                })
            }
        else:
            print("No previous config found for rollback")

            sns.publish(
                TopicArn=SNS_TOPIC_ARN,
                Subject='ROLLBACK FAILED - Manual Intervention Required',
                Message=f"""
Automatic rollback failed - no previous configuration found.

Endpoint: {endpoint_name}
Current Config: {current_config}

IMMEDIATE ACTION REQUIRED.
                """
            )

    return {
        'statusCode': 200,
        'body': json.dumps('Alarm processed')
    }

Manual Rollback Script

Create scripts/manual_rollback.sh:

#!/bin/bash
# Manual rollback script for emergencies

set -e

ENDPOINT_NAME="ml-endpoint-production"
REGION="ap-south-1"

echo "EMERGENCY ROLLBACK INITIATED"
echo "================================="

# Get current config
echo "Getting current endpoint configuration..."
CURRENT_CONFIG=$(aws sagemaker describe-endpoint \
  --endpoint-name $ENDPOINT_NAME \
  --region $REGION \
  --query 'EndpointConfigName' \
  --output text)

echo "Current config: $CURRENT_CONFIG"

# List recent configs
echo "Finding previous configuration..."
PREVIOUS_CONFIG=$(aws sagemaker list-endpoint-configs \
  --sort-by CreationTime \
  --sort-order Descending \
  --region $REGION \
  --query "EndpointConfigs[?EndpointConfigName!='$CURRENT_CONFIG'] | [0].EndpointConfigName" \
  --output text)

echo "Previous config: $PREVIOUS_CONFIG"

# Confirm rollback
read -p "Rollback to $PREVIOUS_CONFIG? (yes/no): " CONFIRM

if [ "$CONFIRM" == "yes" ]; then
  echo "Executing rollback..."

  aws sagemaker update-endpoint \
    --endpoint-name $ENDPOINT_NAME \
    --endpoint-config-name $PREVIOUS_CONFIG \
    --region $REGION

  echo " Rollback initiated"
  echo "Monitor status: aws sagemaker describe-endpoint --endpoint-name $ENDPOINT_NAME"
else
  echo "Rollback cancelled"
fi

Step 9: Cost Optimization for Production

Production Cost Analysis

Monthly Costs (Production):

Resource                Configuration                  Cost/Month
SageMaker Endpoint      2x ml.m5.large, 24/7           ~$167
Auto-scaling (peak)     +2 instances @ 8hr/day         ~$111
Model Monitor           Hourly checks                  ~$36
Data Capture            100% sampling, 1M requests     ~$5
CloudWatch              Logs + Metrics + Alarms        ~$20
S3 Storage              200GB models + data            ~$4.60
CodePipeline            1 active pipeline              ~$1
Base Total                                             ~$345/month
With Traffic                                           $345-500/month

Cost Saving Tips

# 1. Use Serverless Inference for low traffic
# Instead of persistent endpoint, use serverless
# Costs: $0.20/1M requests + $0.067/GB-hour memory

# 2. Async Inference for batch predictions
# Process predictions asynchronously
# Lower infrastructure costs

# 3. Multi-model endpoints
# Host multiple models on same endpoint
# Share infrastructure costs

# 4. Reserved instances for predictable workloads
# Save up to 75% vs on-demand

# 5. Scheduled scaling
# Scale down during off-hours (nights/weekends)
# Save 40-60% on compute
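
For tip #1, Serverless Inference is configured at the endpoint-config level. A sketch under assumed names - the model must already exist in SageMaker, and memory/concurrency here are illustrative values you should size for your own model:

import boto3

sagemaker = boto3.client('sagemaker')

# Serverless endpoint config for low-traffic models (no instances to keep warm)
sagemaker.create_endpoint_config(
    EndpointConfigName='ml-endpoint-config-serverless',
    ProductionVariants=[{
        'VariantName': 'AllTraffic',
        'ModelName': 'ml-model-production',   # assumed name of an existing SageMaker model
        'ServerlessConfig': {
            'MemorySizeInMB': 2048,            # 1024-6144, in 1 GB increments
            'MaxConcurrency': 5                # concurrent invocations before throttling
        }
    }]
)

sagemaker.create_endpoint(
    EndpointName='ml-endpoint-serverless',
    EndpointConfigName='ml-endpoint-config-serverless'
)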

Step 10: Complete Deployment Workflow

End-to-End Deployment Script

Create scripts/deploy_production.sh:

#!/bin/bash
# Complete production deployment workflow

set -e

PROJECT_NAME="ml-pipeline"
ENVIRONMENT="production"
REGION="ap-south-1"

echo " Starting Production Deployment"
echo "=================================="

# Step 1: Run tests
echo "Step 1/6: Running model tests..."
python tests/test_model.py
echo " Tests passed"

# Step 2: Build infrastructure
echo "Step 2/6: Deploying infrastructure..."
cd terraform
terraform apply -auto-approve \
  -var="environment=production" \
  -var="notification_email=your-email@example.com"
cd ..
echo " Infrastructure deployed"

# Step 3: Create baseline
echo "Step 3/6: Creating monitoring baseline..."
python monitoring/model_monitor.py
echo " Baseline created"

# Step 4: Deploy endpoint
echo "Step 4/6: Deploying SageMaker endpoint..."
python deployment/generate_endpoint_config.py
aws cloudformation create-stack \
  --stack-name ${PROJECT_NAME}-endpoint-${ENVIRONMENT} \
  --template-body file://endpoint-config.yaml \
  --capabilities CAPABILITY_IAM \
  --region $REGION
echo " Endpoint deployment initiated"

# Step 5: Wait for endpoint
echo "Step 5/6: Waiting for endpoint to be InService..."
aws sagemaker wait endpoint-in-service \
  --endpoint-name ml-endpoint-${ENVIRONMENT} \
  --region $REGION
echo " Endpoint is InService"

# Step 6: Enable monitoring
echo "Step 6/6: Enabling monitoring schedule..."
python monitoring/model_monitor.py --enable-schedule
echo " Monitoring enabled"

echo ""
echo "=================================="
echo " Production Deployment Complete!"
echo "=================================="
echo ""
echo "Endpoint: ml-endpoint-${ENVIRONMENT}"
echo "Dashboard: https://console.aws.amazon.com/cloudwatch/home?region=${REGION}#dashboards:name=${PROJECT_NAME}-production"
echo "Monitor: https://console.aws.amazon.com/sagemaker/home?region=${REGION}#/monitoring-schedules"

Testing the Complete Pipeline

1. Test CI/CD Pipeline

# Trigger pipeline by updating model in S3
aws s3 cp model.tar.gz \
  s3://ml-pipeline-model-artifacts-dev-ACCOUNT_ID/latest/model.tar.gz

# Monitor pipeline
aws codepipeline get-pipeline-state \
  --name ml-pipeline-ml-pipeline

2. Test Endpoint

# Invoke endpoint (AWS CLI v2 needs --cli-binary-format to pass the body as raw text)
aws sagemaker-runtime invoke-endpoint \
  --endpoint-name ml-endpoint-production \
  --content-type text/csv \
  --cli-binary-format raw-in-base64-out \
  --body '1.5,2.3,1.8,2.1' \
  output.json

cat output.json
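
The same smoke test via the SDK, if you prefer Python (the feature values are the same placeholders as in the CLI example):

import boto3

runtime = boto3.client('sagemaker-runtime')

# Send one CSV record and print the raw prediction
response = runtime.invoke_endpoint(
    EndpointName='ml-endpoint-production',
    ContentType='text/csv',
    Body='1.5,2.3,1.8,2.1'
)
print(response['Body'].read().decode())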

3. Test A/B Deployment

# Deploy with A/B testing
from deployment.ab_testing import create_ab_test_endpoint

create_ab_test_endpoint(
    endpoint_name='ml-endpoint-production',
    model_a_arn='arn:aws:sagemaker:...:model-package/current',
    model_b_arn='arn:aws:sagemaker:...:model-package/new',
    traffic_split_percentage=10
)

4. Test Auto-Rollback

# Simulate high error rate
# Auto-rollback lambda will trigger

# Or manual rollback
./scripts/manual_rollback.sh

Production Readiness Checklist

Infrastructure

  • All resources deployed via Terraform
  • VPC and security groups configured
  • IAM roles follow least privilege
  • Encryption enabled everywhere

CI/CD

  • CodePipeline operational
  • Automated tests passing
  • Manual approval gate configured
  • Rollback procedure tested

Monitoring

  • CloudWatch dashboard created
  • Critical alarms configured
  • Model monitoring enabled
  • Log retention policies set

Deployment

  • Staging environment tested
  • A/B testing capability ready
  • Auto-scaling configured
  • Rollback tested

Operations

  • Runbooks documented
  • On-call rotation established
  • Incident response plan ready
  • Disaster recovery tested

What You've Built - The Complete Platform

Congratulations! You now have a production-ready ML platform with:

Phase 1: Data (Part 2)

  • Encrypted S3 data lake
  • Automated validation
  • Complete audit trail

Phase 2: Training (Part 3)

  • Scalable SageMaker training
  • Experiment tracking with MLflow
  • Cost-optimized compute
  • Model registry

Phase 3: Deployment (Part 4)

  • CI/CD pipeline
  • Production endpoints
  • A/B testing
  • Auto-scaling
  • Comprehensive monitoring
  • Automated rollback

Real-World Production Tips

1. Start Simple, Scale Gradually

Week 1: Deploy to staging
Week 2: Deploy to production with 10% traffic
Week 3: Increase to 50% if metrics good
Week 4: Full production rollout
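
One way to implement that week-by-week plan is to nudge the variant weights on the existing A/B endpoint as each stage's metrics hold up. A small sketch, assuming the variant names from Step 4:

import boto3

def shift_traffic(endpoint_name, new_variant_weight):
    """Move a share of traffic to the new variant (gradual rollout sketch)."""
    sagemaker = boto3.client('sagemaker')
    sagemaker.update_endpoint_weights_and_capacities(
        EndpointName=endpoint_name,
        DesiredWeightsAndCapacities=[
            {'VariantName': 'ModelA-Current', 'DesiredWeight': 100 - new_variant_weight},
            {'VariantName': 'ModelB-New', 'DesiredWeight': new_variant_weight}
        ]
    )

# shift_traffic('ml-endpoint-production', 10)   # week 2
# shift_traffic('ml-endpoint-production', 50)   # week 3
# shift_traffic('ml-endpoint-production', 100)  # week 4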

2. Monitor Everything

  • Endpoint health
  • Model performance
  • Data drift
  • Business metrics
  • Costs

3. Incident Response

1. Alert triggers
2. Check dashboard
3. Assess severity
4. Rollback if critical
5. Investigate root cause
6. Document learnings

4. Continuous Improvement

  • Weekly model retraining
  • Monthly architecture review
  • Quarterly cost optimization
  • Regular security audits

Common Issues & Solutions

Issue: High Latency

Solutions:

  • Add more instances (auto-scaling)
  • Optimize model (quantization, pruning)
  • Use faster instance types
  • Enable batch prediction
  • Add caching layer

Issue: High Costs

Solutions:

  • Use Serverless Inference for low traffic
  • Implement scheduled scaling
  • Archive old model versions
  • Optimize S3 lifecycle policies
  • Use Spot instances where possible

Issue: Model Drift

Solutions:

  • Enable SageMaker Model Monitor
  • Set up automated retraining
  • Implement data quality checks
  • Regular model evaluation
  • A/B test new models

Issue: Failed Deployments

Solutions:

  • Comprehensive testing in CI/CD
  • Blue/green deployment strategy
  • Canary releases with gradual rollout
  • Quick rollback procedures
  • Post-deployment validation

Series Wrap-Up

What We've Covered

Part 1: AIDLC Framework & Architecture

  • Security-first ML design
  • AWS service selection
  • Best practices

Part 2: Data Pipelines

  • S3 + Lambda automation
  • Data validation
  • Audit trails

Part 3: Training at Scale

  • SageMaker + MLflow
  • Spot instances
  • Hyperparameter tuning

Part 4: Production Deployment

  • CI/CD pipelines
  • A/B testing
  • Monitoring & rollback



Key Takeaways

  1. Automate everything - CI/CD is non-negotiable
  2. Monitor proactively - Don't wait for users to report issues
  3. Deploy safely - A/B testing and gradual rollouts
  4. Plan for failure - Rollback procedures must work
  5. Optimize costs - Production can get expensive
  6. Document thoroughly - Future you will thank present you

Remember: Production is where ML creates real value. Build systems that are reliable, observable, and maintainable.


Final Thoughts

Building production ML systems is complex, but with the right architecture and tools, it's absolutely achievable. You now have a complete blueprint for:

  • Secure data handling
  • Scalable training
  • Safe deployment
  • Comprehensive monitoring
  • Cost optimization

This is just the beginning. Take these foundations and build amazing ML products!


Thank You!

Thank you for following along with this 4-part series! I hope you found it valuable.


What's your biggest ML deployment challenge?

Drop a comment below - I'd love to hear about your experiences and help if I can!

Building something cool with this architecture?

I'd love to see what you create! Tag me on social media or drop a comment.


Let's Stay Connected!

  • Follow me for more AWS and ML content
  • Like if this series helped you
  • Share with your team and connections

Have questions? Reach out anytime!



Tags: #aws #machinelearning #mlops #sagemaker #cicd #devops #production #terraform #cloudwatch

