DEV Community

Manish Kumar

Building Highly Available Authentication with Amazon Cognito Multi-Region Replication: Architecture, Migration, and Failover Guide

Amazon Cognito's long-awaited Multi-Region Replication (MRR) feature is now generally available, automatically synchronizing user data, credentials, and pool configurations to a secondary AWS Region. Alongside this, AWS has added native support for customer managed KMS keys for encryption control — a critical feature for regulated industries like healthcare and financial services.

Why This Matters

Before MRR, teams building HA authentication on Cognito had to maintain error-prone custom replication solutions using Lambda triggers, DynamoDB Global Tables, and complex sync logic. End users experienced forced password resets during regional failovers, and machine-to-machine (M2M) clients needed to be manually reconfigured in secondary regions.

[Figure: Before MRR - DIY Approach]

With MRR, Cognito now:

  • Automatically replicates user profiles, credentials, MFA secrets, and pool configurations from primary → secondary region
  • Allows both regions to recognize tokens issued by either region, preserving active sessions
  • Supports all auth methods — social federation (Google, Apple, Amazon, Facebook), SAML, OIDC, and M2M OAuth2 flows
  • Provides a built-in Route 53 health check-based failover for custom domains

[Figure: After MRR - Native Solution]

Architecture Overview

[Figure: Cognito Multi-Region Replication Architecture]

The diagram above shows the complete MRR architecture.

Prerequisites

Before enabling MRR, your user pool must meet these requirements:

  • Essentials or Plus feature plan (not available on Lite tier)
  • Multi-region customer managed KMS key replicated in all target regions
  • Multi-region OIDC issuer configured on the user pool
  • A custom domain configured (required for automatic Route 53-based failover)
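The checklist above can be turned into a small preflight check. This is a minimal sketch, not an official validation routine: it takes the `UserPool` dictionary from a `describe_user_pool` response and reports what looks unmet. `UserPoolTier` and `CustomDomain` are existing response fields, but `KmsKeyArn` is a hypothetical field name used only for illustration, and the OIDC issuer check is left out because the response shape for it is not shown in this article.

```python
from typing import Any, Dict, List

# Feature plans that the prerequisites list names as eligible for MRR.
ELIGIBLE_TIERS = {"ESSENTIALS", "PLUS"}


def missing_mrr_prerequisites(user_pool: Dict[str, Any]) -> List[str]:
    """Return a human-readable list of unmet prerequisites (empty if none)."""
    problems = []

    if user_pool.get("UserPoolTier") not in ELIGIBLE_TIERS:
        problems.append("feature plan must be Essentials or Plus")

    if not user_pool.get("CustomDomain"):
        problems.append("custom domain is required for Route 53 failover")

    # ASSUMPTION: "KmsKeyArn" is a hypothetical field name for the attached key.
    # Multi-Region KMS key IDs start with "mrk-".
    key_arn = user_pool.get("KmsKeyArn", "")
    if ":key/mrk-" not in key_arn:
        problems.append("a multi-Region customer managed KMS key must be attached")

    return problems
```

In practice you would pass in `boto3.client("cognito-idp").describe_user_pool(UserPoolId=...)["UserPool"]` and stop the rollout if the returned list is not empty.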

Step 1: Create a Multi-Region KMS Key

[Figure: KMS Multi-Region Key Replication Flow]

AWS CLI

# Step 1: Create the primary multi-region KMS key in us-west-2
# and capture its ARN directly from the create-key response
PRIMARY_KEY_ARN=$(aws kms create-key \
  --region us-west-2 \
  --description "Cognito MRR Key" \
  --multi-region \
  --key-usage ENCRYPT_DECRYPT \
  --origin AWS_KMS \
  --tags TagKey=Purpose,TagValue=CognitoMRR \
  --query "KeyMetadata.Arn" \
  --output text)

echo "Primary key ARN: $PRIMARY_KEY_ARN"

# Step 2: Replicate the key to the secondary region
aws kms replicate-key \
  --region us-west-2 \
  --key-id "$PRIMARY_KEY_ARN" \
  --replica-region us-east-1 \
  --description "Cognito MRR Key Replica (us-east-1)"

# Step 3: Update key policy to allow Cognito access
aws kms put-key-policy \
  --region us-west-2 \
  --key-id "$PRIMARY_KEY_ARN" \
  --policy-name default \
  --policy '{
    "Version": "2012-10-17",
    "Statement": [
      {
        "Sid": "AllowCognitoKMSAccess",
        "Effect": "Allow",
        "Principal": {
          "Service": "cognito-idp.amazonaws.com"
        },
        "Action": [
          "kms:Encrypt",
          "kms:Decrypt",
          "kms:GenerateDataKey",
          "kms:DescribeKey"
        ],
        "Resource": "*"
      },
      {
        "Sid": "AllowRootAccount",
        "Effect": "Allow",
        "Principal": {
          "AWS": "arn:aws:iam::YOUR_ACCOUNT_ID:root"
        },
        "Action": "kms:*",
        "Resource": "*"
      }
    ]
  }'

Step 2: Configure the Cognito User Pool

[Figure: Token Validation with Multi-Region OIDC Issuer]

Attach the KMS Key and Configure Multi-Region OIDC Issuer (CLI)

# Update the user pool to use the customer managed KMS key
aws cognito-idp update-user-pool \
  --region us-west-2 \
  --user-pool-id us-west-2_XXXXXXXXX \
  --kms-key-id arn:aws:kms:us-west-2:<ACCOUNT_ID>:key/mrk-XXXXXXXXXX

# Switch the user pool to a multi-region OIDC issuer
# (This is done via the console "Change issuer type" step;
# verify issuer type via describe-user-pool)
aws cognito-idp describe-user-pool \
  --region us-west-2 \
  --user-pool-id us-west-2_XXXXXXXXX \
  --query "UserPool.{IssuerConfiguration:IssuerConfiguration, Domain:Domain}"

⚠️ Important: Switching to a multi-region OIDC issuer changes the iss claim in all tokens. Update all backend services, mobile apps, and SPAs to use the new issuer URL before proceeding.
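One way to soften the issuer switch is to have backends accept both the old and the new issuer during the rollout window. The sketch below only illustrates that allowlist idea: it reads the `iss` claim from the token payload without verifying the signature, so it must sit behind a real JWT verification step. The new issuer URL shown in the usage note is a placeholder, not the actual multi-region issuer format.

```python
import base64
import json
from typing import Iterable


def unverified_issuer(jwt_token: str) -> str:
    """Read the iss claim from a JWT payload WITHOUT verifying the signature."""
    parts = jwt_token.split(".")
    if len(parts) != 3:
        return ""
    payload_b64 = parts[1]
    # JWT segments are base64url without padding; restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    return payload.get("iss", "")


def issuer_is_trusted(jwt_token: str, trusted_issuers: Iterable[str]) -> bool:
    """True if the token's iss claim is one of the issuers we currently accept."""
    return unverified_issuer(jwt_token) in set(trusted_issuers)
```

For example, during migration `trusted_issuers` could hold the regional issuer `https://cognito-idp.us-west-2.amazonaws.com/us-west-2_XXXXXXXXX` plus the new multi-region issuer URL copied from your pool (written here as the placeholder `https://issuer.example.com/multi-region`). Once tokens from the old issuer have expired, drop it from the list.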

Step 3: Create the Replica User Pool

AWS CLI

# Create the replica in us-east-1
# Note: The API call is made against the PRIMARY region
aws cognito-idp create-user-pool-replica-region \
  --region us-west-2 \
  --user-pool-id us-west-2_XXXXXXXXX \
  --replica-region '{"RegionName": "us-east-1", "KmsKeyId": "arn:aws:kms:us-east-1:<ACCOUNT_ID>:key/mrk-XXXXXXXXXX"}'

# Check replication status — replica info lives on the PRIMARY pool's ReplicaRegions field
aws cognito-idp describe-user-pool \
  --region us-west-2 \
  --user-pool-id us-west-2_XXXXXXXXX \
  --query "UserPool.ReplicaRegions[*].{Region:RegionName, Status:Status}"

# Describe the replica pool directly in the secondary region
aws cognito-idp describe-user-pool \
  --region us-east-1 \
  --user-pool-id us-east-1_XXXXXXXXX \
  --query "UserPool.{Id:Id, Status:Status}"

⚠️ Note: There is no update-user-pool-replica or list-user-pool-replicas CLI command. The replica becomes active automatically once initial sync completes. Replica status is tracked via the primary pool's ReplicaRegions field.
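Because replica status lives on the primary pool's `ReplicaRegions` field, a small helper can pull it out of a `describe_user_pool` response. This is a sketch that assumes the field layout used in the CLI query above (`RegionName` and `Status` on each entry); the function name is illustrative.

```python
from typing import Any, Dict, Optional


def replica_status(describe_response: Dict[str, Any], region_name: str) -> Optional[str]:
    """Return the Status of the replica in region_name, or None if not listed."""
    # ASSUMPTION: ReplicaRegions is a list of {"RegionName": ..., "Status": ...}
    # entries, mirroring the CLI --query shown in this step.
    replicas = describe_response.get("UserPool", {}).get("ReplicaRegions") or []
    for replica in replicas:
        if replica.get("RegionName") == region_name:
            return replica.get("Status")
    return None
```

A polling loop can call this on each `describe_user_pool` response from the primary Region and stop once it returns `"ACTIVE"`.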

Step 4: Configure Route 53 Health Check & Failover

Normal Traffic Flow

[Figure: Normal Traffic Flow - Primary Active]

Failover Scenario

[Figure: Failover Traffic Flow - Replica Serving Read Traffic]

CLI Configuration

# Create a Route 53 health check for the primary Cognito endpoint
# and capture its ID directly from the response
HC_ID=$(aws route53 create-health-check \
  --caller-reference "cognito-primary-hc-$(date +%s)" \
  --health-check-config '{
    "Type": "HTTPS",
    "FullyQualifiedDomainName": "cognito-idp.us-west-2.amazonaws.com",
    "Port": 443,
    "RequestInterval": 30,
    "FailureThreshold": 3,
    "ResourcePath": "/health",
    "MeasureLatency": true,
    "Regions": ["us-east-1","us-west-2","eu-west-1"]
  }' \
  --query "HealthCheck.Id" \
  --output text)

echo "Health Check ID: $HC_ID"

# Update the Cognito custom domain to use this health check for auto-failover
# This is done in the console: Branding > Domain > Edit multi-Region failover
# Associate the $HC_ID with the custom domain

Infrastructure as Code (Terraform)

# main.tf

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 5.50.0"
    }
  }
}

# ─────────────────────────────────────────────────────────────
# Provider configurations
# ─────────────────────────────────────────────────────────────
provider "aws" {
  alias  = "primary"
  region = "us-west-2"
}

provider "aws" {
  alias  = "secondary"
  region = "us-east-1"
}

# Only aliased providers are configured, so name one explicitly here
data "aws_caller_identity" "current" {
  provider = aws.primary
}

# ─────────────────────────────────────────────────────────────
# Multi-Region KMS Key
# ─────────────────────────────────────────────────────────────
resource "aws_kms_key" "cognito_mrk" {
  provider                = aws.primary
  description             = "Multi-region KMS key for Cognito MRR"
  multi_region            = true
  deletion_window_in_days = 30
  enable_key_rotation     = true

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Sid    = "AllowRoot"
        Effect = "Allow"
        Principal = {
          AWS = "arn:aws:iam::${data.aws_caller_identity.current.account_id}:root"
        }
        Action   = "kms:*"
        Resource = "*"
      },
      {
        Sid    = "AllowCognito"
        Effect = "Allow"
        Principal = {
          Service = "cognito-idp.amazonaws.com"
        }
        Action = [
          "kms:Encrypt",
          "kms:Decrypt",
          "kms:GenerateDataKey",
          "kms:DescribeKey",
          "kms:CreateGrant"
        ]
        Resource = "*"
      }
    ]
  })

  tags = {
    Name        = "cognito-mrr-key"
    Environment = "production"
  }
}

resource "aws_kms_alias" "cognito_mrk" {
  provider      = aws.primary
  name          = "alias/cognito-mrr-key"
  target_key_id = aws_kms_key.cognito_mrk.key_id
}

# Replicate the key to secondary region
resource "aws_kms_replica_key" "cognito_mrk_replica" {
  provider                = aws.secondary
  description             = "Replica of Cognito MRR key in us-east-1"
  primary_key_arn         = aws_kms_key.cognito_mrk.arn
  deletion_window_in_days = 30
  enabled                 = true

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Sid    = "AllowRoot"
        Effect = "Allow"
        Principal = {
          AWS = "arn:aws:iam::${data.aws_caller_identity.current.account_id}:root"
        }
        Action   = "kms:*"
        Resource = "*"
      },
      {
        Sid    = "AllowCognito"
        Effect = "Allow"
        Principal = {
          Service = "cognito-idp.amazonaws.com"
        }
        Action = [
          "kms:Encrypt",
          "kms:Decrypt",
          "kms:GenerateDataKey",
          "kms:DescribeKey",
          "kms:CreateGrant"
        ]
        Resource = "*"
      }
    ]
  })

  tags = {
    Name        = "cognito-mrr-key-replica"
    Environment = "production"
  }
}

# ─────────────────────────────────────────────────────────────
# Primary Cognito User Pool
# ─────────────────────────────────────────────────────────────
resource "aws_cognito_user_pool" "primary" {
  provider = aws.primary
  name     = "myapp-user-pool-primary"

  # Enforce threat protection (advanced security) on the pool
  user_pool_add_ons {
    advanced_security_mode = "ENFORCED"
  }

  password_policy {
    minimum_length                   = 12
    require_lowercase                = true
    require_uppercase                = true
    require_numbers                  = true
    require_symbols                  = true
    temporary_password_validity_days = 7
  }

  mfa_configuration = "OPTIONAL"

  software_token_mfa_configuration {
    enabled = true
  }

  # Email verification
  auto_verified_attributes = ["email"]

  account_recovery_setting {
    recovery_mechanism {
      name     = "verified_email"
      priority = 1
    }
  }

  schema {
    name                     = "email"
    attribute_data_type      = "String"
    required                 = true
    mutable                  = true
  }

  tags = {
    Name        = "myapp-primary"
    Environment = "production"
    Region      = "us-west-2"
  }
}

# App Client for the primary pool
resource "aws_cognito_user_pool_client" "primary" {
  provider        = aws.primary
  name            = "myapp-client-primary"
  user_pool_id    = aws_cognito_user_pool.primary.id

  explicit_auth_flows = [
    "ALLOW_USER_SRP_AUTH",
    "ALLOW_REFRESH_TOKEN_AUTH",
    "ALLOW_USER_PASSWORD_AUTH"
  ]

  access_token_validity  = 60
  id_token_validity      = 60
  refresh_token_validity = 30

  token_validity_units {
    access_token  = "minutes"
    id_token      = "minutes"
    refresh_token = "days"
  }

  prevent_user_existence_errors = "ENABLED"
}

# ─────────────────────────────────────────────────────────────
# Replica User Pool (Secondary Region)
# NOTE: There is no standalone aws_cognito_user_pool_replica resource in
# the AWS Terraform provider. Replication is configured via the
# replica_regions block inside aws_cognito_user_pool.
# ─────────────────────────────────────────────────────────────
# Add a replica_regions block to aws_cognito_user_pool.primary:
#
# resource "aws_cognito_user_pool" "primary" {
#   ...
#   replica_regions {
#     region_name = "us-east-1"
#     kms_key_id  = aws_kms_replica_key.cognito_mrk_replica.arn
#   }
# }
#
# Reference: https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/cognito_user_pool

# ─────────────────────────────────────────────────────────────
# Route 53 Health Check for Failover
# ─────────────────────────────────────────────────────────────
resource "aws_route53_health_check" "cognito_primary" {
  # Route 53 is global, but only aliased providers are configured above
  provider          = aws.primary
  fqdn              = "cognito-idp.us-west-2.amazonaws.com"
  port              = 443
  type              = "HTTPS"
  resource_path     = "/health"
  failure_threshold = 3
  request_interval  = 30

  tags = {
    Name = "cognito-primary-health-check"
  }
}

# ─────────────────────────────────────────────────────────────
# CloudWatch Alarm for Failover Monitoring
# ─────────────────────────────────────────────────────────────
resource "aws_cloudwatch_metric_alarm" "cognito_errors_primary" {
  provider            = aws.primary
  alarm_name          = "cognito-high-error-rate-primary"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 3
  metric_name         = "Errors"
  namespace           = "AWS/Cognito"
  period              = 60
  statistic           = "Sum"
  threshold           = 10

  dimensions = {
    UserPool       = aws_cognito_user_pool.primary.id
    UserPoolClient = aws_cognito_user_pool_client.primary.id
  }

  alarm_description = "Cognito primary region error rate too high - consider failover"
  alarm_actions     = [aws_sns_topic.cognito_alerts.arn]
}

resource "aws_sns_topic" "cognito_alerts" {
  provider = aws.primary
  name     = "cognito-mrr-alerts"
}

# ─────────────────────────────────────────────────────────────
# Outputs
# ─────────────────────────────────────────────────────────────
output "primary_user_pool_id" {
  value = aws_cognito_user_pool.primary.id
}

output "primary_user_pool_endpoint" {
  value = aws_cognito_user_pool.primary.endpoint
}

# Replica pool ID is obtained from describe-user-pool in the secondary region,
# not from a separate Terraform resource output.

output "kms_primary_key_arn" {
  value = aws_kms_key.cognito_mrk.arn
}

output "kms_replica_key_arn" {
  value = aws_kms_replica_key.cognito_mrk_replica.arn
}

output "route53_health_check_id" {
  value = aws_route53_health_check.cognito_primary.id
}

Python Automation Scripts

Script 1: Full Setup Orchestrator

#!/usr/bin/env python3
"""
cognito_mrr_setup.py
Automates Amazon Cognito Multi-Region Replication setup using boto3.
"""

import boto3
import json
import logging
import os  # needed for os.environ in main()
import time
from typing import Optional

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s [%(levelname)s] %(message)s"
)
log = logging.getLogger(__name__)

PRIMARY_REGION   = "us-west-2"
SECONDARY_REGION = "us-east-1"
ACCOUNT_ID       = boto3.client("sts").get_caller_identity()["Account"]


# ─────────────────────────────────────────────────────────────
# KMS: Create and Replicate a Multi-Region Key
# ─────────────────────────────────────────────────────────────
def create_multi_region_kms_key(primary_region: str) -> str:
    kms = boto3.client("kms", region_name=primary_region)

    key_policy = json.dumps({
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowRoot",
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{ACCOUNT_ID}:root"},
                "Action": "kms:*",
                "Resource": "*"
            },
            {
                "Sid": "AllowCognito",
                "Effect": "Allow",
                "Principal": {"Service": "cognito-idp.amazonaws.com"},
                "Action": [
                    "kms:Encrypt", "kms:Decrypt",
                    "kms:GenerateDataKey", "kms:DescribeKey", "kms:CreateGrant"
                ],
                "Resource": "*"
            }
        ]
    })

    response = kms.create_key(
        Description="Multi-region KMS key for Cognito MRR",
        MultiRegion=True,
        KeyUsage="ENCRYPT_DECRYPT",
        Origin="AWS_KMS",
        Policy=key_policy,
        Tags=[{"TagKey": "Purpose", "TagValue": "CognitoMRR"}]
    )

    key_arn = response["KeyMetadata"]["Arn"]
    key_id  = response["KeyMetadata"]["KeyId"]
    log.info(f"✅ Created multi-region KMS key: {key_arn}")

    kms.create_alias(AliasName="alias/cognito-mrr-key", TargetKeyId=key_id)
    return key_arn


def replicate_kms_key(primary_key_arn: str, target_region: str) -> str:
    kms = boto3.client("kms", region_name=PRIMARY_REGION)

    response = kms.replicate_key(
        KeyId=primary_key_arn,
        ReplicaRegion=target_region,
        Description=f"Cognito MRR key replica in {target_region}"
    )

    replica_arn = response["ReplicaKeyMetadata"]["Arn"]
    log.info(f"✅ Replicated KMS key to {target_region}: {replica_arn}")

    # Update replica key policy for Cognito access
    kms_secondary = boto3.client("kms", region_name=target_region)
    kms_secondary.put_key_policy(
        KeyId=replica_arn,
        PolicyName="default",
        Policy=json.dumps({
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Sid": "AllowRoot",
                    "Effect": "Allow",
                    "Principal": {"AWS": f"arn:aws:iam::{ACCOUNT_ID}:root"},
                    "Action": "kms:*",
                    "Resource": "*"
                },
                {
                    "Sid": "AllowCognito",
                    "Effect": "Allow",
                    "Principal": {"Service": "cognito-idp.amazonaws.com"},
                    "Action": [
                        "kms:Encrypt", "kms:Decrypt",
                        "kms:GenerateDataKey", "kms:DescribeKey", "kms:CreateGrant"
                    ],
                    "Resource": "*"
                }
            ]
        })
    )
    return replica_arn


# ─────────────────────────────────────────────────────────────
# Cognito: Create User Pool with KMS Encryption
# ─────────────────────────────────────────────────────────────
def create_primary_user_pool(kms_key_arn: str, region: str) -> str:
    cognito = boto3.client("cognito-idp", region_name=region)

    response = cognito.create_user_pool(
        PoolName="myapp-user-pool-primary",
        Policies={
            "PasswordPolicy": {
                "MinimumLength": 12,
                "RequireUppercase": True,
                "RequireLowercase": True,
                "RequireNumbers": True,
                "RequireSymbols": True,
                "TemporaryPasswordValidityDays": 7
            }
        },
        MfaConfiguration="OPTIONAL",
        UserPoolAddOns={"AdvancedSecurityMode": "ENFORCED"},
        AutoVerifiedAttributes=["email"],
        Schema=[
            {
                "Name": "email",
                "AttributeDataType": "String",
                "Required": True,
                "Mutable": True
            }
        ],
        UserPoolTags={
            "Environment": "production",
            "Region": region,
            "MRR": "enabled"
        },
        UserPoolEncryptionConfig={"KMSKeyID": kms_key_arn}
    )

    pool_id = response["UserPool"]["Id"]
    log.info(f"✅ Created primary user pool: {pool_id}")
    return pool_id


# ─────────────────────────────────────────────────────────────
# Cognito: Create Replica User Pool
# ─────────────────────────────────────────────────────────────
def create_user_pool_replica(
    primary_pool_id: str,
    target_region: str,
    source_region: str = PRIMARY_REGION
) -> dict:
    cognito = boto3.client("cognito-idp", region_name=source_region)

    response = cognito.create_user_pool_replica_region(
        UserPoolId=primary_pool_id,
        ReplicaRegion={
            "RegionName": target_region
        }
    )

    replica = response["UserPoolReplica"]
    log.info(
        f"✅ Created replica user pool in {target_region}\n"
        f"   ARN: {replica['UserPoolArn']}\n"
        f"   Status: {replica['Status']}"
    )
    return replica


# ─────────────────────────────────────────────────────────────
# Cognito: Poll until the replica reports ACTIVE (activation is automatic)
# ─────────────────────────────────────────────────────────────
def wait_and_activate_replica(
    replica_pool_id: str,
    secondary_region: str,
    timeout_seconds: int = 600
):
    cognito = boto3.client("cognito-idp", region_name=secondary_region)
    elapsed = 0

    log.info(f"⏳ Waiting for replica pool {replica_pool_id} to be ready...")

    while elapsed < timeout_seconds:
        resp = cognito.describe_user_pool(UserPoolId=replica_pool_id)
        status = resp["UserPool"].get("Status", "UNKNOWN")
        log.info(f"   Pool status: {status} ({elapsed}s elapsed)")

        if status == "ACTIVE":
            # Replica becomes ACTIVE automatically once initial sync completes
            log.info(f"✅ Replica pool {replica_pool_id} is ACTIVE")
            return

        time.sleep(30)
        elapsed += 30

    raise TimeoutError(f"Replica pool did not become ACTIVE within {timeout_seconds}s")


# ─────────────────────────────────────────────────────────────
# Route 53: Create Health Check
# ─────────────────────────────────────────────────────────────
def create_route53_health_check(primary_region: str) -> str:
    r53 = boto3.client("route53")

    response = r53.create_health_check(
        CallerReference=f"cognito-hc-{int(time.time())}",
        HealthCheckConfig={
            "Type": "HTTPS",
            "FullyQualifiedDomainName": f"cognito-idp.{primary_region}.amazonaws.com",
            "Port": 443,
            "RequestInterval": 30,
            "FailureThreshold": 3,
            "MeasureLatency": True,
            "Regions": ["us-east-1", "us-west-2", "eu-west-1"]
        }
    )

    hc_id = response["HealthCheck"]["Id"]
    log.info(f"✅ Created Route 53 health check: {hc_id}")

    r53.change_tags_for_resource(
        ResourceType="healthcheck",
        ResourceId=hc_id,
        AddTags=[{"Key": "Name", "Value": "cognito-primary-hc"}]
    )
    return hc_id


# ─────────────────────────────────────────────────────────────
# CloudWatch: Alarms and SNS notifications
# ─────────────────────────────────────────────────────────────
def setup_monitoring(
    pool_id: str,
    client_id: str,
    region: str,
    alert_email: Optional[str] = None
) -> str:
    sns = boto3.client("sns", region_name=region)
    cw  = boto3.client("cloudwatch", region_name=region)

    # Create SNS topic
    topic = sns.create_topic(Name="cognito-mrr-alerts")
    topic_arn = topic["TopicArn"]

    if alert_email:
        sns.subscribe(
            TopicArn=topic_arn,
            Protocol="email",
            Endpoint=alert_email
        )
        log.info(f"📧 Subscribed {alert_email} to alerts topic")

    # Create CloudWatch alarm for auth errors
    cw.put_metric_alarm(
        AlarmName="cognito-primary-high-errors",
        AlarmDescription="Cognito primary region auth error rate is high - consider failover",
        MetricName="Errors",
        Namespace="AWS/Cognito",
        Dimensions=[
            {"Name": "UserPool", "Value": pool_id},
            {"Name": "UserPoolClient", "Value": client_id}
        ],
        Statistic="Sum",
        Period=60,
        EvaluationPeriods=3,
        Threshold=10,
        ComparisonOperator="GreaterThanThreshold",
        AlarmActions=[topic_arn],
        OKActions=[topic_arn],
        TreatMissingData="notBreaching"
    )

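    # Alarm on the p99 of SignInSuccesses.
    # NOTE: SignInSuccesses counts successful sign-ins; it is not a latency
    # measurement, so swap in a real latency metric before relying on this alarm.
    cw.put_metric_alarm(
        AlarmName="cognito-primary-high-latency",
        AlarmDescription="Cognito primary region sign-in latency > 2000ms",
        MetricName="SignInSuccesses",
        Namespace="AWS/Cognito",
        Dimensions=[{"Name": "UserPool", "Value": pool_id}],
        # Percentiles must be passed as ExtendedStatistic, not Statistic
        ExtendedStatistic="p99",
        Period=60,
        EvaluationPeriods=5,
        Threshold=2000,
        ComparisonOperator="GreaterThanThreshold",
        AlarmActions=[topic_arn],
        TreatMissingData="notBreaching"
    )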

    log.info("✅ CloudWatch monitoring configured")
    return topic_arn


# ─────────────────────────────────────────────────────────────
# Main Orchestration
# ─────────────────────────────────────────────────────────────
def main():
    log.info("🚀 Starting Cognito Multi-Region Replication setup...")

    # 1. Create multi-region KMS key
    primary_key_arn = create_multi_region_kms_key(PRIMARY_REGION)

    # 2. Replicate KMS key to secondary region
    replica_key_arn = replicate_kms_key(primary_key_arn, SECONDARY_REGION)

    # 3. Create primary Cognito user pool with KMS encryption
    primary_pool_id = create_primary_user_pool(primary_key_arn, PRIMARY_REGION)

    # NOTE: Before calling create_user_pool_replica, you must:
    # a) Update the user pool to use the multi-region OIDC issuer (via console)
    # b) Update your applications with the new OIDC issuer URLs

    # 4. Create replica user pool
    replica = create_user_pool_replica(primary_pool_id, SECONDARY_REGION)
    # Extract the replica pool ID from the ARN
    replica_pool_id = replica["UserPoolArn"].split("/")[-1]

    # 5. Wait for replication to complete and activate
    wait_and_activate_replica(replica_pool_id, SECONDARY_REGION)

    # 6. Set up Route 53 health check
    health_check_id = create_route53_health_check(PRIMARY_REGION)

    # 7. Set up monitoring and alerting
    cognito_primary = boto3.client("cognito-idp", region_name=PRIMARY_REGION)
    clients = cognito_primary.list_user_pool_clients(UserPoolId=primary_pool_id)
    client_id = clients["UserPoolClients"][0]["ClientId"] if clients["UserPoolClients"] else "NONE"

    setup_monitoring(
        pool_id=primary_pool_id,
        client_id=client_id,
        region=PRIMARY_REGION,
        alert_email=os.environ.get("ALERT_EMAIL", "")
    )

    log.info("\n" + "="*60)
    log.info("✅ COGNITO MULTI-REGION REPLICATION SETUP COMPLETE")
    log.info("="*60)
    log.info(f"Primary Pool ID   : {primary_pool_id}")
    log.info(f"Primary Region    : {PRIMARY_REGION}")
    log.info(f"Replica Pool ID   : {replica_pool_id}")
    log.info(f"Secondary Region  : {SECONDARY_REGION}")
    log.info(f"KMS Key (Primary) : {primary_key_arn}")
    log.info(f"KMS Key (Replica) : {replica_key_arn}")
    log.info(f"Route53 HC ID     : {health_check_id}")
    log.info("="*60)


if __name__ == "__main__":
    main()

Script 2: Failover Health Monitor (Lambda-Compatible)

#!/usr/bin/env python3
"""
cognito_failover_monitor.py
Monitors primary Cognito health and can be deployed as a Lambda function
to automate failover decisions or send alerts.
"""

import boto3
import json
import logging
import os
from datetime import datetime, timezone, timedelta

log = logging.getLogger()
log.setLevel(logging.INFO)

PRIMARY_REGION   = os.environ.get("PRIMARY_REGION", "us-west-2")
SECONDARY_REGION = os.environ.get("SECONDARY_REGION", "us-east-1")
PRIMARY_POOL_ID  = os.environ.get("PRIMARY_POOL_ID", "")
REPLICA_POOL_ID  = os.environ.get("REPLICA_POOL_ID", "")
SNS_TOPIC_ARN    = os.environ.get("SNS_TOPIC_ARN", "")
HC_ID            = os.environ.get("ROUTE53_HEALTH_CHECK_ID", "")


def get_cognito_error_rate(pool_id: str, region: str) -> float:
    """Returns the error count in the last 5 minutes."""
    cw = boto3.client("cloudwatch", region_name=region)
    end   = datetime.now(timezone.utc)
    start = end - timedelta(minutes=5)

    resp = cw.get_metric_statistics(
        Namespace="AWS/Cognito",
        MetricName="Errors",
        Dimensions=[{"Name": "UserPool", "Value": pool_id}],
        StartTime=start,
        EndTime=end,
        Period=300,
        Statistics=["Sum"]
    )

    datapoints = resp.get("Datapoints", [])
    # The 5-minute window can straddle two periods, so add up every datapoint
    return float(sum(dp["Sum"] for dp in datapoints))


def get_route53_health_check_status(hc_id: str) -> str:
    r53 = boto3.client("route53")
    resp = r53.get_health_check_status(HealthCheckId=hc_id)
    statuses = resp.get("HealthCheckObservations", [])
    healthy = sum(1 for s in statuses if s["StatusReport"]["Status"].startswith("Success"))
    total   = len(statuses)
    return "HEALTHY" if healthy > (total / 2) else "UNHEALTHY"


def describe_replica_status(pool_id: str, region: str) -> str:
    cognito = boto3.client("cognito-idp", region_name=region)
    resp    = cognito.describe_user_pool(UserPoolId=pool_id)
    return resp["UserPool"].get("Status", "UNKNOWN")


def send_alert(message: str, subject: str):
    if not SNS_TOPIC_ARN:
        log.warning("SNS_TOPIC_ARN not set — skipping alert")
        return
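    # Publish in the topic's own Region, taken from the ARN (arn:aws:sns:<region>:...)
    sns = boto3.client("sns", region_name=SNS_TOPIC_ARN.split(":")[3])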
    sns.publish(TopicArn=SNS_TOPIC_ARN, Message=message, Subject=subject)
    log.info(f"📢 Alert sent: {subject}")


def lambda_handler(event, context):
    """
    Lambda entry point.
    Checks Cognito primary health, logs status, and sends alert if degraded.
    """
    log.info("🔍 Running Cognito MRR health check...")

    error_count   = get_cognito_error_rate(PRIMARY_POOL_ID, PRIMARY_REGION)
    hc_status     = get_route53_health_check_status(HC_ID) if HC_ID else "NOT_CONFIGURED"
    replica_status = describe_replica_status(REPLICA_POOL_ID, SECONDARY_REGION) if REPLICA_POOL_ID else "N/A"

    report = {
        "timestamp":      datetime.now(timezone.utc).isoformat(),
        "primary_region": PRIMARY_REGION,
        "error_count_5m": error_count,
        "route53_hc":     hc_status,
        "replica_status": replica_status,
        "recommendation": "FAILOVER" if (error_count > 10 or hc_status == "UNHEALTHY") else "HEALTHY"
    }

    log.info(json.dumps(report, indent=2))

    if report["recommendation"] == "FAILOVER":
        send_alert(
            message=json.dumps(report, indent=2),
            # SNS subjects must be ASCII, so no emoji or typographic dashes here
            subject="Cognito primary Region degraded: consider failover"
        )

    return {"statusCode": 200, "body": json.dumps(report)}


# Local testing entry point
if __name__ == "__main__":
    result = lambda_handler({}, {})
    print(result)

Key Limitations to Know

| Limitation | Details |
| --- | --- |
| Write operations | Secondary pools are read-only: no new sign-ups, password resets, or profile edits during failover |
| TOTP MFA | Not supported in secondary replicas; TOTP users must authenticate via primary |
| Replica count | Maximum one secondary replica per user pool |
| Federated users | Must have previously signed in via primary before they can use the replica |
| Lockout counts | Failed auth attempt counters are not synced across regions |
| Custom domain required | Automatic Route 53 failover only works with a custom domain |
| Feature plan | Requires Essentials or Plus tier; not available on Lite |
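Since the replica is read-only, applications need some way to switch off write paths while traffic is served from the secondary Region. A minimal sketch of such a guard follows; the operation names and the `serving_from_replica` flag are illustrative assumptions, and how you detect failover (a feature flag, a health signal, and so on) is left to your setup.

```python
from enum import Enum


class AuthOperation(Enum):
    SIGN_IN = "sign_in"
    SIGN_UP = "sign_up"
    PASSWORD_RESET = "password_reset"
    PROFILE_EDIT = "profile_edit"


# Operations the limitations table lists as unavailable on a read-only replica.
WRITE_OPERATIONS = frozenset({
    AuthOperation.SIGN_UP,
    AuthOperation.PASSWORD_RESET,
    AuthOperation.PROFILE_EDIT,
})


def is_operation_available(op: AuthOperation, serving_from_replica: bool) -> bool:
    """Writes are blocked while the read-only replica is serving traffic."""
    return not (serving_from_replica and op in WRITE_OPERATIONS)
```

A UI layer could call `is_operation_available(AuthOperation.SIGN_UP, serving_from_replica=True)` and hide the sign-up form when it returns `False`, instead of letting users hit an error from the replica.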

Pricing Summary

| Auth Type | Essentials Tier | Plus Tier |
| --- | --- | --- |
| User Authentication | \$0.0045 / MAU / replica region | \$0.006 / MAU / replica region |
| M2M Authentication | +30% on standard token pricing | +30% on standard token pricing |

These charges apply per replica region and are added on top of standard Cognito costs.
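As a rough worked example of the user authentication surcharge, the per-MAU rates quoted above can be multiplied out directly. This sketch assumes a flat rate with no free tier or volume discounts, which may not match your actual bill, and it does not model the M2M surcharge.

```python
from decimal import Decimal

# Per-MAU, per-replica-Region rates quoted in the pricing table (USD).
MRR_RATE_PER_MAU = {
    "ESSENTIALS": Decimal("0.0045"),
    "PLUS": Decimal("0.006"),
}


def mrr_user_auth_surcharge(
    monthly_active_users: int,
    tier: str,
    replica_regions: int = 1,
) -> Decimal:
    """Estimated monthly MRR surcharge in USD, on top of standard Cognito costs."""
    return MRR_RATE_PER_MAU[tier] * monthly_active_users * replica_regions


# 100,000 MAUs on Essentials with one replica Region: 100000 * 0.0045 = 450 USD
print(mrr_user_auth_surcharge(100_000, "ESSENTIALS"))
```

`Decimal` is used instead of `float` so that the multiplication stays exact for currency amounts.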

Available Regions

MRR is available across major AWS regions as of June 2026, including US East/West, EU (Frankfurt, Ireland, London, Paris, Stockholm), APAC (Mumbai, Tokyo, Seoul, Singapore, Sydney), Canada (Central), and South America (São Paulo). Any of these can serve as either the source or destination for replication.

Operational Checklist Before Going Live

  • [ ] Upgrade user pool to Essentials or Plus plan
  • [ ] Create a multi-region KMS key and replicate it to target region
  • [ ] Update key policy to allow cognito-idp.amazonaws.com access
  • [ ] Switch to multi-region OIDC issuer and update all app clients
  • [ ] Deploy Lambda triggers, WAF rules, and logging config in secondary region
  • [ ] Create replica and wait for INACTIVE → ACTIVE transition
  • [ ] Set up Route 53 health check and link it to your custom domain's failover config
  • [ ] Configure CloudWatch alarms for error rates and latency
  • [ ] Test failover during off-peak hours by routing a small traffic slice to secondary
  • [ ] Disable sign-up/password-reset UI elements when operating in failover mode

Migrating Existing Cognito User Pools to Multi-Region Replication

Migration Phase Flowchart

Migrating an existing Cognito user pool to MRR is more involved than a fresh setup because you have live users, active sessions, and applications already hardcoded to the original OIDC issuer URL. This guide walks you through every phase — eligibility check, issuer migration, KMS attachment, replica creation, and app updates — without forcing users to re-authenticate or reset passwords.

Phase 0: Eligibility Check — Are You on Next-Gen Infrastructure?

This is the most critical gating factor. MRR only works on next-generation Cognito infrastructure. AWS will upgrade older pools automatically, but you cannot opt a pool in yourself. Until a pool has been upgraded, the console shows an exception message for it.

Check Your Pool's Eligibility via CLI

# Check your user pool details for infrastructure version
aws cognito-idp describe-user-pool \
  --region us-west-2 \
  --user-pool-id us-west-2_XXXXXXXXX \
  --query "UserPool.{Tier:UserPoolTier, Status:Status, Domain:Domain}"

# Check if MRR options are available by inspecting ReplicaRegions on describe-user-pool
aws cognito-idp describe-user-pool \
  --region us-west-2 \
  --user-pool-id us-west-2_XXXXXXXXX \
  --query "UserPool.{Tier:UserPoolTier, ReplicaRegions:ReplicaRegions}"

⚠️ Note: There is no list-user-pool-replicas CLI command. Replica information is returned via the ReplicaRegions field in describe-user-pool on the primary pool. If the field is absent or the feature returns an error, the pool is not yet on next-gen infrastructure.
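If you script this check, a small helper can interpret the `describe-user-pool` response for you. This is a sketch under the assumption that the response shape matches what this article describes (a `ReplicaRegions` list whose entries carry `RegionName` and `Status`); the function name is illustrative.

```python
from typing import Any, Dict


def summarize_replicas(describe_response: Dict[str, Any]) -> Dict[str, Any]:
    """Summarize MRR-related fields from a describe-user-pool response dict."""
    pool = describe_response.get("UserPool", {})
    # Absence of the field is treated as "not yet on next-gen infrastructure"
    field_present = "ReplicaRegions" in pool
    replicas = pool.get("ReplicaRegions") or []
    return {
        "tier": pool.get("UserPoolTier", "UNKNOWN"),
        "replica_field_present": field_present,
        "replicas": [(r.get("RegionName"), r.get("Status")) for r in replicas],
    }


if __name__ == "__main__":
    sample = {
        "UserPool": {
            "UserPoolTier": "ESSENTIALS",
            "ReplicaRegions": [{"RegionName": "us-east-1", "Status": "ACTIVE"}],
        }
    }
    print(summarize_replicas(sample))
```

In practice you would pass in the dict returned by `boto3.client("cognito-idp").describe_user_pool(...)`.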

💡 Tip: Check the AWS Security Blog post on Cognito next-generation infrastructure to understand the upgrade timeline.

Phase 1: Pre-Migration Audit

Before touching anything, run a full audit of your existing pool. This prevents surprises mid-migration.

Python: Audit Script for Existing Pool

#!/usr/bin/env python3
"""
cognito_mrr_audit.py
Audits an existing Cognito user pool for MRR readiness.
Outputs a checklist of items that need remediation.
"""

import boto3
import sys
from dataclasses import dataclass
from typing import List

@dataclass
class AuditResult:
    check: str
    status: str       # PASS / FAIL / WARN / INFO
    detail: str
    action_required: str = ""

def audit_pool_for_mrr(pool_id: str, region: str) -> List[AuditResult]:
    cognito = boto3.client("cognito-idp", region_name=region)
    results = []

    pool = cognito.describe_user_pool(UserPoolId=pool_id)["UserPool"]

    # ── 1. Feature Plan (Tier) ────────────────────────────────
    tier = pool.get("UserPoolTier", "LITE")
    tier_ok = tier in ("ESSENTIALS", "PLUS")
    results.append(AuditResult(
        check="Feature Plan",
        status="PASS" if tier_ok else "FAIL",
        detail=f"Current tier: {tier}",
        # Use the same condition as status so every FAIL carries an action
        action_required="" if tier_ok else "Upgrade to Essentials or Plus tier before enabling MRR"
    ))

    # ── 2. KMS Key Configuration ──────────────────────────────
    kms_config = pool.get("UserPoolEncryptionConfig", {})
    kms_key_id = kms_config.get("KMSKeyID", "")
    if kms_key_id:
        kms = boto3.client("kms", region_name=region)
        key_meta = kms.describe_key(KeyId=kms_key_id)["KeyMetadata"]
        is_mrk = key_meta.get("MultiRegion", False)
        results.append(AuditResult(
            check="KMS Key",
            status="PASS" if is_mrk else "FAIL",
            detail=f"Key ARN: {kms_key_id}, MultiRegion: {is_mrk}",
            action_required="" if is_mrk else "Replace with a multi-region KMS key (mrk- prefix)"
        ))
    else:
        results.append(AuditResult(
            check="KMS Key",
            status="FAIL",
            detail="No customer managed KMS key configured",
            action_required="Create a multi-region KMS key and attach it to the user pool"
        ))

    # ── 3. OIDC Issuer Type ───────────────────────────────────
    issuer_config = pool.get("IssuerConfiguration", {})
    issuer_type   = issuer_config.get("Type", "LEGACY")
    results.append(AuditResult(
        check="OIDC Issuer Type",
        status="PASS" if issuer_type == "UPDATED" else "FAIL",
        detail=f"Issuer type: {issuer_type}",
        action_required="" if issuer_type == "UPDATED" else (
            "Switch to UPDATED issuer — WARNING: breaking change for existing apps. "
            "Update all services that validate the 'iss' claim before switching."
        )
    ))

    # ── 4. Current Issuer URL ─────────────────────────────────
    old_issuer = f"https://cognito-idp.{region}.amazonaws.com/{pool_id}"
    new_issuer = f"https://issuer-cognito-idp.{region}.amazonaws.com/{pool_id}"
    results.append(AuditResult(
        check="Issuer URL (for reference)",
        status="INFO",
        detail=f"OLD: {old_issuer}\nNEW: {new_issuer}",
        action_required="Update all apps, API GWs, and JWK validators to use NEW issuer URL"
    ))

    # ── 5. MFA Configuration ──────────────────────────────────
    mfa = pool.get("MfaConfiguration", "OFF")
    # TOTP status comes from get_user_pool_mfa_config, not from describe_user_pool
    mfa_config = cognito.get_user_pool_mfa_config(UserPoolId=pool_id)
    totp_enabled = mfa_config.get("SoftwareTokenMfaConfiguration", {}).get("Enabled", False)
    totp_in_use = mfa != "OFF" and totp_enabled
    results.append(AuditResult(
        check="TOTP MFA",
        status="WARN" if totp_in_use else "PASS",
        detail=f"MFA Config: {mfa}, TOTP enabled: {totp_enabled}",
        action_required="TOTP MFA users CANNOT authenticate on the replica. "
                        "Plan a communication strategy and disable TOTP-reliant flows in failover mode."
                        if totp_in_use else ""
    ))

    # ── 6. Custom Domain ──────────────────────────────────────
    # "Domain" is the Cognito prefix domain; only "CustomDomain" qualifies for Route 53 failover
    domain = pool.get("CustomDomain", "")
    results.append(AuditResult(
        check="Custom Domain",
        status="PASS" if domain else "WARN",
        detail=f"Domain: {domain or 'NOT CONFIGURED'}",
        action_required="Automatic Route 53 failover requires a custom domain. "
                        "Without it, your app must manually switch regional endpoints."
                        if not domain else ""
    ))

    # ── 7. App Clients ────────────────────────────────────────
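    # Paginate so pools with more than one page of app clients are counted fully
    paginator = cognito.get_paginator("list_user_pool_clients")
    client_count = sum(
        len(page["UserPoolClients"])
        for page in paginator.paginate(UserPoolId=pool_id)
    )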
    results.append(AuditResult(
        check="App Clients",
        status="INFO",
        detail=f"{client_count} app client(s) found — will be auto-replicated after MRR enabled",
        action_required="Verify each client's callback URLs and allowed OAuth flows post-migration"
    ))

    # ── 8. Lambda Triggers ────────────────────────────────────
    triggers = pool.get("LambdaConfig", {})
    has_triggers = bool(triggers)
    results.append(AuditResult(
        check="Lambda Triggers",
        status="WARN" if has_triggers else "PASS",
        detail=f"Triggers configured: {list(triggers.keys()) if has_triggers else 'None'}",
        action_required="Lambda triggers must be separately configured for the replica region. "
                        "Cross-region Lambda invocations won't work automatically."
                        if has_triggers else ""
    ))

    # ── 9. User Count ─────────────────────────────────────────
    # Reuse the describe_user_pool response fetched at the top of the function
    estimated_users = pool.get("EstimatedNumberOfUsers", "Unknown")
    results.append(AuditResult(
        check="Estimated Users",
        status="INFO",
        detail=f"~{estimated_users} users — larger pools may take longer to initially sync",
        action_required="Allow additional time for initial replication if user count is large"
    ))

    return results


def print_audit_report(results: List[AuditResult], pool_id: str):
    icons = {"PASS": "✅", "FAIL": "❌", "WARN": "⚠️ ", "INFO": "ℹ️ "}
    print(f"\n{'='*65}")
    print(f"  COGNITO MRR MIGRATION READINESS AUDIT: {pool_id}")
    print(f"{'='*65}")

    blockers = [r for r in results if r.status == "FAIL"]
    warnings = [r for r in results if r.status == "WARN"]

    for r in results:
        print(f"\n{icons[r.status]} [{r.status}] {r.check}")
        print(f"   Detail : {r.detail}")
        if r.action_required:
            print(f"   Action : {r.action_required}")

    print(f"\n{'='*65}")
    print(f"  BLOCKERS: {len(blockers)}  |  WARNINGS: {len(warnings)}")
    print(f"  {'🚫 MIGRATION BLOCKED — Fix all FAIL items first.' if blockers else '🟢 Ready to proceed (review warnings).'}")
    print(f"{'='*65}\n")


if __name__ == "__main__":
    POOL_ID = sys.argv[1] if len(sys.argv) > 1 else "us-west-2_XXXXXXXXX"
    REGION  = sys.argv[2] if len(sys.argv) > 2 else "us-west-2"
    results = audit_pool_for_mrr(POOL_ID, REGION)
    print_audit_report(results, POOL_ID)

Usage:

python3 cognito_mrr_audit.py us-west-2_XXXXXXXXX us-west-2

Phase 2: Upgrade the Feature Plan (If Needed)

# Upgrade from Lite to Essentials
aws cognito-idp update-user-pool \
  --region us-west-2 \
  --user-pool-id us-west-2_XXXXXXXXX \
  --user-pool-tier ESSENTIALS

# Verify the upgrade
aws cognito-idp describe-user-pool \
  --region us-west-2 \
  --user-pool-id us-west-2_XXXXXXXXX \
  --query "UserPool.UserPoolTier"

Phase 3: Attach a Multi-Region KMS Key to an Existing Pool

⚠️ If your pool is already using an AWS managed key or a single-region CMK, you must create a new multi-region key. You cannot convert an existing single-region key to multi-region.
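Two facts about multi-region keys are worth keeping in mind here: their key IDs start with `mrk-`, and a replica key has the same key ID as its primary but a different Region in its ARN. The helpers below are a minimal sketch of both; the function names are illustrative, and the prefix check is only a quick heuristic (the `MultiRegion` field from `describe-key` remains the authoritative answer).

```python
def is_multi_region_key(key_id_or_arn: str) -> bool:
    """Heuristic: multi-region key IDs carry an 'mrk-' prefix."""
    key_id = key_id_or_arn.rsplit("/", 1)[-1]
    return key_id.startswith("mrk-")


def key_arn_in_region(key_arn: str, region: str) -> str:
    """Derive the ARN a multi-region key would have in another Region."""
    parts = key_arn.split(":")
    # Expected layout: arn:partition:kms:region:account-id:key/key-id
    if len(parts) != 6 or parts[2] != "kms":
        raise ValueError(f"Not a KMS key ARN: {key_arn}")
    if not is_multi_region_key(key_arn):
        raise ValueError("Only multi-region keys share a key ID across Regions")
    parts[3] = region
    return ":".join(parts)


if __name__ == "__main__":
    primary = "arn:aws:kms:us-west-2:111122223333:key/mrk-1234abcd12ab34cd56ef1234567890ab"
    print(key_arn_in_region(primary, "us-east-1"))
```

The account ID and key ID in the usage example are placeholders.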

CLI: Create, Replicate, and Attach

PRIMARY_REGION="us-west-2"
SECONDARY_REGION="us-east-1"
POOL_ID="us-west-2_XXXXXXXXX"
ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)

# Step 1: Create multi-region primary key
MRK_KEY_ID=$(aws kms create-key \
  --region $PRIMARY_REGION \
  --multi-region \
  --description "Cognito MRR CMK" \
  --query "KeyMetadata.KeyId" \
  --output text)

echo "Primary MRK Key ID: $MRK_KEY_ID"

MRK_KEY_ARN="arn:aws:kms:${PRIMARY_REGION}:${ACCOUNT_ID}:key/${MRK_KEY_ID}"

# Step 2: Apply key policy (must include identitystore.amazonaws.com for replication)
cat > /tmp/key-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowRoot",
      "Effect": "Allow",
      "Principal": {"AWS": "arn:aws:iam::${ACCOUNT_ID}:root"},
      "Action": "kms:*",
      "Resource": "*"
    },
    {
      "Sid": "AllowCognitoAndIdentityStore",
      "Effect": "Allow",
      "Principal": {
        "Service": [
          "cognito-idp.amazonaws.com",
          "identitystore.amazonaws.com"
        ]
      },
      "Action": [
        "kms:Encrypt","kms:Decrypt","kms:ReEncrypt*",
        "kms:GenerateDataKey*","kms:DescribeKey","kms:CreateGrant"
      ],
      "Resource": "*",
      "Condition": {
        "StringEquals": {"aws:SourceAccount": "${ACCOUNT_ID}"}
      }
    }
  ]
}
EOF

# Apply policy to primary key
aws kms put-key-policy \
  --region $PRIMARY_REGION \
  --key-id $MRK_KEY_ID \
  --policy-name default \
  --policy file:///tmp/key-policy.json

# Step 3: Replicate key to secondary region
aws kms replicate-key \
  --region $PRIMARY_REGION \
  --key-id $MRK_KEY_ARN \
  --replica-region $SECONDARY_REGION

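# Wait for the replica key to become active (poll instead of relying on a fixed sleep)
echo "Waiting for replica key to become active..."
until [ "$(aws kms describe-key \
  --region $SECONDARY_REGION \
  --key-id $MRK_KEY_ID \
  --query "KeyMetadata.KeyState" \
  --output text 2>/dev/null)" = "Enabled" ]; do
  sleep 5
done

# Step 4: Apply the same policy to the replica key
# Multi-region keys share the same key ID in every region, but the ARN differs by region,
# so reference the replica by key ID rather than by the primary-region ARN.
aws kms put-key-policy \
  --region $SECONDARY_REGION \
  --key-id $MRK_KEY_ID \
  --policy-name default \
  --policy file:///tmp/key-policy.json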

# Step 5: Attach the multi-region KMS key to the existing user pool
aws cognito-idp update-user-pool \
  --region $PRIMARY_REGION \
  --user-pool-id $POOL_ID \
  --kms-key-id $MRK_KEY_ARN

Phase 4: The Critical Step: Switch to the Updated OIDC Issuer

This is the highest-risk step. The OIDC issuer change modifies the iss claim in all new tokens. Every backend service, API Gateway authorizer, and JWT validator that checks iss will break if it is not updated first.
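One way to reduce the blast radius is to have your own JWT validators accept both issuer formats for a limited migration window, then drop the legacy one once the switch is confirmed. The sketch below assumes the issuer URL formats given in this article; the function names and the `in_migration_window` flag are illustrative. It checks only the `iss` claim and does not replace signature, expiry, or audience validation.

```python
from typing import FrozenSet, Mapping


def legacy_issuer(region: str, pool_id: str) -> str:
    """Legacy per-region issuer format."""
    return f"https://cognito-idp.{region}.amazonaws.com/{pool_id}"


def updated_issuer(region: str, pool_id: str) -> str:
    """Updated issuer format, as described in this article."""
    return f"https://issuer-cognito-idp.{region}.amazonaws.com/{pool_id}"


def accepted_issuers(region: str, pool_id: str, in_migration_window: bool = True) -> FrozenSet[str]:
    """Issuers a validator should trust; the legacy one only during migration."""
    issuers = {updated_issuer(region, pool_id)}
    if in_migration_window:
        issuers.add(legacy_issuer(region, pool_id))
    return frozenset(issuers)


def is_issuer_accepted(claims: Mapping[str, object], region: str, pool_id: str,
                       in_migration_window: bool = True) -> bool:
    """Check the 'iss' claim of already signature-verified claims."""
    iss = claims.get("iss")
    return isinstance(iss, str) and iss in accepted_issuers(region, pool_id, in_migration_window)


if __name__ == "__main__":
    claims = {"iss": legacy_issuer("us-west-2", "us-west-2_XXXXXXXXX")}
    print(is_issuer_accepted(claims, "us-west-2", "us-west-2_XXXXXXXXX"))
```

Deploying validators with the window open before the switch, and closing it afterwards, means old and new tokens both validate while sessions roll over.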

Before Switching — Find All Affected Services

# Search CloudFormation stacks for Cognito issuer references
aws cloudformation list-stacks \
  --stack-status-filter CREATE_COMPLETE UPDATE_COMPLETE \
  --query "StackSummaries[].StackName" \
  --output text | tr '\t' '\n' | while read -r stack; do
    # grep -q matches quietly on stdin (grep -l would only print "(standard input)")
    aws cloudformation get-template --stack-name "$stack" 2>/dev/null | \
      grep -q "cognito-idp\." && echo "  ↳ Found in: $stack"
  done

# Check API Gateway authorizers
aws apigateway get-rest-apis --query "items[].id" --output text | \
  tr '\t' '\n' | while read api_id; do
    aws apigateway get-authorizers --rest-api-id $api_id \
      --query "items[?type=='COGNITO_USER_POOLS'].{name:name,uri:authorizerUri}" \
      --output table 2>/dev/null
  done

Issuer URL Change Reference

| | Old Issuer (Legacy) | New Issuer (Updated) |
|---|---|---|
| Format | https://cognito-idp.{region}.amazonaws.com/{poolId} | https://issuer-cognito-idp.{region}.amazonaws.com/{poolId} |
| JWKS Endpoint | .../.well-known/jwks.json | Same path, new base URL |
| Breaking? | Current (no change) | ✅ Yes: update apps before switching |
| Token iss claim | Per-region | Same for both primary and replica |

Console: Perform the Issuer Switch, Then Verify via CLI

# ⚠️  Only run this AFTER updating all downstream JWT validators

# ⚠️  IMPORTANT: Switching to the multi-region OIDC issuer ("Updated" issuer type)
# is a CONSOLE-ONLY operation — it cannot be performed via the CLI or SDK.
# Navigate to: Cognito Console → User Pool → App Integration → Issuer URL → Change issuer type

# After switching via the console, verify the change:
aws cognito-idp describe-user-pool \
  --region $PRIMARY_REGION \
  --user-pool-id $POOL_ID \
  --query "UserPool.IssuerConfiguration"

Python: Bulk-Update API Gateway Authorizers

#!/usr/bin/env python3
"""
update_apigw_authorizers.py
Finds all API Gateway Cognito authorizers and updates the
issuer URL from the old legacy format to the new Updated format.
"""

import boto3

REGION        = "us-west-2"
OLD_ISS_PREFIX = "https://cognito-idp."
NEW_ISS_PREFIX = "https://issuer-cognito-idp."

apigw  = boto3.client("apigateway", region_name=REGION)
apigwv2 = boto3.client("apigatewayv2", region_name=REGION)

def update_rest_api_authorizers():
    """Update REST API Cognito authorizers."""
    apis = apigw.get_rest_apis()["items"]
    for api in apis:
        api_id   = api["id"]
        api_name = api["name"]
        authorizers = apigw.get_authorizers(restApiId=api_id).get("items", [])

        for auth in authorizers:
            if auth.get("type") != "COGNITO_USER_POOLS":
                continue

            provider_arns = auth.get("providerARNs", [])
            print(f"\n🔍 API: {api_name} ({api_id}) | Authorizer: {auth['name']}")

            # The issuer is embedded in the userPoolArn — patch the audience/config
            # For REST APIs, update the JWT issuer in policy or Lambda authorizer config
            # For Cognito-native authorizers, the issuer is inferred from the user pool ARN
            # No direct issuer string to patch here — update your custom JWT validators instead
            print(f"   Provider ARNs: {provider_arns}")
            print(f"   ℹ️  Cognito-native authorizers use pool ARN — no direct issuer string to patch.")
            print(f"   ✅ These will automatically use the new issuer once pool is updated.")


def update_http_api_authorizers():
    """Update HTTP API JWT authorizers (explicitly reference issuer URL)."""
    apis = apigwv2.get_apis()["Items"]
    for api in apis:
        api_id   = api["ApiId"]
        api_name = api["Name"]
        authorizers = apigwv2.get_authorizers(ApiId=api_id).get("Items", [])

        for auth in authorizers:
            if auth.get("AuthorizerType") != "JWT":
                continue

            jwt_config   = auth.get("JwtConfiguration", {})
            current_issuer = jwt_config.get("Issuer", "")

            if OLD_ISS_PREFIX in current_issuer:
                new_issuer = current_issuer.replace(OLD_ISS_PREFIX, NEW_ISS_PREFIX)
                print(f"\n🔧 Updating HTTP API: {api_name} ({api_id})")
                print(f"   Auth: {auth['Name']}")
                print(f"   OLD issuer: {current_issuer}")
                print(f"   NEW issuer: {new_issuer}")

                apigwv2.update_authorizer(
                    ApiId=api_id,
                    AuthorizerId=auth["AuthorizerId"],
                    JwtConfiguration={
                        "Issuer":   new_issuer,
                        "Audience": jwt_config.get("Audience", [])
                    }
                )
                print(f"   ✅ Updated.")
            else:
                print(f"\n✅ HTTP API: {api_name} ({api_id}) | {auth['Name']} — already updated or not Cognito")


if __name__ == "__main__":
    print("=" * 60)
    print("  Scanning REST API Authorizers")
    print("=" * 60)
    update_rest_api_authorizers()

    print("\n" + "=" * 60)
    print("  Scanning HTTP API JWT Authorizers")
    print("=" * 60)
    update_http_api_authorizers()

    print("\n✅ Scan complete. Review any remaining custom JWT validators in your application code.")

Phase 5: Create the Replica and Activate

Once the pool has a multi-region KMS key and an updated OIDC issuer, creating the replica is straightforward.

# Create the replica (call is made against the PRIMARY region)
aws cognito-idp create-user-pool-replica-region \
  --region $PRIMARY_REGION \
  --user-pool-id $POOL_ID \
  --replica-region '{"RegionName": "'$SECONDARY_REGION'", "KmsKeyId": "'$MRK_KEY_ARN'"}'

# Poll status — replica info lives on the primary pool's ReplicaRegions field
watch -n 15 "aws cognito-idp describe-user-pool \
  --user-pool-id $POOL_ID \
  --region $PRIMARY_REGION \
  --query 'UserPool.ReplicaRegions[*].{Region:RegionName,Status:Status}' \
  --output table"

# Once ACTIVE, get the replica pool ID from the secondary region directly
REPLICA_POOL_ID=$(aws cognito-idp describe-user-pool \
  --region $SECONDARY_REGION \
  --user-pool-id "${POOL_ID/$PRIMARY_REGION/$SECONDARY_REGION}" \
  --query "UserPool.Id" \
  --output text)

# Configure replica-specific Lambda triggers (must point to secondary-region functions)
aws cognito-idp update-user-pool \
  --region $SECONDARY_REGION \
  --user-pool-id $REPLICA_POOL_ID \
  --lambda-config '{
    "PostAuthentication": "arn:aws:lambda:us-east-1:<ACCOUNT_ID>:function:cognito-post-auth",
    "PreTokenGeneration": "arn:aws:lambda:us-east-1:<ACCOUNT_ID>:function:cognito-pre-token"
  }'

echo "✅ Replica is ACTIVE in $SECONDARY_REGION"

Phase 6: Configure Failover-Aware Application Code

After MRR is enabled, your app must intelligently route write vs. read operations and handle failover.
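The core of failover handling is a loop that tries regions in order and only moves on when the failure is regional rather than user-caused. Here is a dependency-free sketch of that pattern; `call_with_failover`, `AllRegionsFailed`, and the `is_regional_failure` predicate are illustrative names. With boto3, note that a real regional outage often surfaces as `botocore.exceptions.EndpointConnectionError` or `ConnectTimeoutError`, which are not `ClientError` subclasses, so the predicate should cover those as well as service-side error codes.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


class AllRegionsFailed(RuntimeError):
    """Raised when every attempted region reported a regional failure."""


def call_with_failover(
    attempts: Iterable[Tuple[str, Callable[[], T]]],
    is_regional_failure: Callable[[Exception], bool],
) -> Tuple[str, T]:
    """Run each (region, operation) in order; return the first success.

    Errors that are not regional (wrong password, bad token) are re-raised
    immediately, because retrying them in another region would not help.
    """
    failures: List[Tuple[str, Exception]] = []
    for region, operation in attempts:
        try:
            return region, operation()
        except Exception as exc:
            if not is_regional_failure(exc):
                raise
            failures.append((region, exc))
    raise AllRegionsFailed(f"All regions failed: {[r for r, _ in failures]}")


if __name__ == "__main__":
    def primary_down():
        raise ConnectionError("simulated regional outage")

    def secondary_ok():
        return {"AccessToken": "example"}

    region, result = call_with_failover(
        [("us-west-2", primary_down), ("us-east-1", secondary_ok)],
        is_regional_failure=lambda e: isinstance(e, ConnectionError),
    )
    print(region, result)
```

Keeping the predicate separate makes it easy to unit test the routing logic without touching AWS.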

Python: Smart Cognito Client with Regional Failover

#!/usr/bin/env python3
"""
cognito_smart_client.py
A resilient Cognito client that:
- Routes writes to primary region
- Routes reads (sign-in) to nearest healthy region
- Falls back automatically on OperationNotEnabledException
"""

import boto3
import logging
from botocore.exceptions import ClientError
from typing import Optional

log = logging.getLogger(__name__)

PRIMARY_REGION   = "us-west-2"
SECONDARY_REGION = "us-east-1"
CLIENT_ID        = "your-app-client-id"


class ResilientCognitoClient:
    def __init__(self, prefer_secondary: bool = False):
        self._primary   = boto3.client("cognito-idp", region_name=PRIMARY_REGION)
        self._secondary = boto3.client("cognito-idp", region_name=SECONDARY_REGION)

        # Determine which region to use for authentication
        self._auth_client = self._secondary if prefer_secondary else self._primary
        self._auth_region = SECONDARY_REGION if prefer_secondary else PRIMARY_REGION

    # ── READ OPERATIONS (replica-safe) ──────────────────────
    def sign_in(self, username: str, password: str) -> dict:
        """
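        Attempts sign-in on the preferred region, then falls back to the other region.
        """
        # dict.fromkeys de-duplicates while preserving order, so each region is tried once
        for client, region in dict.fromkeys([
            (self._auth_client, self._auth_region),
            (self._primary, PRIMARY_REGION),
            (self._secondary, SECONDARY_REGION),
        ]):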
            try:
                resp = client.initiate_auth(
                    AuthFlow="USER_PASSWORD_AUTH",
                    ClientId=CLIENT_ID,
                    AuthParameters={"USERNAME": username, "PASSWORD": password}
                )
                log.info(f"✅ Authenticated via {region}")
                return resp["AuthenticationResult"]
            except ClientError as e:
                code = e.response["Error"]["Code"]
                if code in ("OperationNotEnabledException", "ServiceUnavailableException"):
                    log.warning(f"⚠️  {region} unavailable ({code}), trying fallback...")
                    continue
                raise  # Re-raise auth errors (wrong password, etc.)

        raise RuntimeError("Authentication failed in all regions")

    def get_user(self, access_token: str) -> dict:
        """Token was issued by either region — try both if needed."""
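        # dict.fromkeys de-duplicates while preserving order, so each region is tried once
        for client, region in dict.fromkeys([
            (self._auth_client, self._auth_region),
            (self._primary, PRIMARY_REGION),
            (self._secondary, SECONDARY_REGION),
        ]):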
            try:
                return client.get_user(AccessToken=access_token)
            except ClientError as e:
                if "NotAuthorizedException" in e.response["Error"]["Code"]:
                    raise  # Bad token — don't retry
                log.warning(f"get_user failed on {region}: {e}")
                continue
        raise RuntimeError("get_user failed in all regions")

    # ── WRITE OPERATIONS (primary only) ──────────────────────
    def sign_up(self, username: str, password: str, email: str) -> dict:
        """Always routes to primary — writes are rejected on replica."""
        return self._primary.sign_up(
            ClientId=CLIENT_ID,
            Username=username,
            Password=password,
            UserAttributes=[{"Name": "email", "Value": email}]
        )

    def change_password(self, access_token: str, old_pw: str, new_pw: str):
        """Must go to primary — OperationNotEnabledException on replica."""
        return self._primary.change_password(
            AccessToken=access_token,
            PreviousPassword=old_pw,
            ProposedPassword=new_pw
        )

    def forgot_password(self, username: str):
        """Password reset always to primary."""
        return self._primary.forgot_password(
            ClientId=CLIENT_ID,
            Username=username
        )

    def refresh_tokens(self, refresh_token: str) -> dict:
        """Refresh works on both regions; try preferred first."""
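        # dict.fromkeys de-duplicates while preserving order, so each region is tried once
        for client, region in dict.fromkeys([
            (self._auth_client, self._auth_region),
            (self._primary, PRIMARY_REGION),
            (self._secondary, SECONDARY_REGION),
        ]):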
            try:
                resp = client.initiate_auth(
                    AuthFlow="REFRESH_TOKEN_AUTH",
                    ClientId=CLIENT_ID,
                    AuthParameters={"REFRESH_TOKEN": refresh_token}
                )
                log.info(f"✅ Token refreshed via {region}")
                return resp["AuthenticationResult"]
            except ClientError as e:
                if e.response["Error"]["Code"] == "NotAuthorizedException":
                    raise  # Invalid/expired refresh token
                log.warning(f"Refresh failed on {region}: {e}")
                continue

        raise RuntimeError("Token refresh failed in all regions")


# ── Usage Example ────────────────────────────────────────────
if __name__ == "__main__":
    client = ResilientCognitoClient(prefer_secondary=False)  # Set True during failover

    try:
        tokens = client.sign_in("testuser@example.com", "MyPassword123!")
        print("Access Token:", tokens["AccessToken"][:20], "...")
        print("ID Token:    ", tokens["IdToken"][:20], "...")
    except RuntimeError as e:
        print(f"❌ Auth failed: {e}")

Terraform: Migrating an Existing Pool (State Import + MRR Attachment)

If you manage your existing pool in Terraform but it was created before MRR, use terraform import to bring it under the new MRR-enabled config without re-creating users.

# terraform.tf — Update existing pool resource block to add MRR config

resource "aws_cognito_user_pool" "existing" {
  provider = aws.primary
  name     = "myapp-user-pool"       # Keep the same name

  # ── Add these new blocks to existing resource ─────────────
  user_pool_tier = "ESSENTIALS"       # Upgrade from LITE if needed

  # Attach multi-region KMS key
  # (aws_kms_key.cognito_mrk created as shown in previous blog post)

  tags = {
    Environment = "production"
    MRR         = "enabled"
  }
}

# After terraform apply, create the replica
resource "aws_cognito_user_pool_replica" "secondary" {
  provider     = aws.primary
  user_pool_id = aws_cognito_user_pool.existing.id
  region_name  = "us-east-1"

  depends_on = [
    aws_kms_replica_key.cognito_mrk_replica
  ]
}
Then import the existing pool into Terraform state and review the plan before applying:
# Import existing pool into Terraform state (no re-creation of users)
terraform import aws_cognito_user_pool.existing us-west-2_XXXXXXXXX

# Plan — verify only MRR-related changes, not destructive ones
terraform plan -out=mrr-migration.tfplan

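# Review the plan carefully and ensure there is no "destroy" on user_pool
# (the saved plan file is binary, so render it with terraform show before grepping)
terraform show -no-color mrr-migration.tfplan | grep -iE "destroy|replace"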

# Apply when satisfied
terraform apply mrr-migration.tfplan

Full Migration Sequence Summary

| Step | Action | Risk | CLI Command |
|---|---|---|---|
| 0 | Check next-gen eligibility | Low | describe-user-pool --query UserPool.ReplicaRegions |
| 1 | Upgrade to Essentials/Plus | Low | update-user-pool --user-pool-tier ESSENTIALS |
| 2 | Create multi-region KMS key | Low | kms create-key --multi-region |
| 3 | Replicate KMS key to secondary | Low | kms replicate-key |
| 4 | Attach KMS key to user pool | Low | update-user-pool --kms-key-id |
| 5 | Update all JWT validators to new issuer | HIGH | Manual app update |
| 6 | Switch OIDC issuer to UPDATED | HIGH | Console only; verify with describe-user-pool --query UserPool.IssuerConfiguration |
| 7 | Configure replica-specific settings | Medium | update-user-pool on secondary |
| 8 | Create replica | Low | create-user-pool-replica-region |
| 9 | Wait for replica to become ACTIVE | Low | describe-user-pool --query UserPool.ReplicaRegions |
| 10 | Set up Route 53 health check | Low | route53 create-health-check |
| 11 | Enable failover on custom domain | Low | Console or CLI |
| 12 | Test failover in staging | Medium | initiate-auth against replica region |

Post-Migration Verification

# Verify replica is syncing users correctly
REPLICA_POOL_ID="us-east-1_XXXXXXXXX"  # Replica pool ID, looked up as shown in Phase 5

# Check a known user on the replica
aws cognito-idp admin-get-user \
  --region $SECONDARY_REGION \
  --user-pool-id $REPLICA_POOL_ID \
  --username testuser@example.com

# Test auth on the replica region
aws cognito-idp initiate-auth \
  --region $SECONDARY_REGION \
  --auth-flow USER_PASSWORD_AUTH \
  --client-id $CLIENT_ID \
  --auth-parameters USERNAME=testuser@example.com,PASSWORD='TestPass123!'

# Confirm the iss claim matches the updated issuer format
REPLICA_TOKEN=$(aws cognito-idp initiate-auth \
  --region $SECONDARY_REGION \
  --auth-flow USER_PASSWORD_AUTH \
  --client-id $CLIENT_ID \
  --auth-parameters USERNAME=testuser@example.com,PASSWORD='TestPass123!' \
  --query "AuthenticationResult.IdToken" \
  --output text)

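# Decode and check the iss claim (requires jq and base64)
# JWT segments are base64url-encoded without padding, so convert and re-pad first
PAYLOAD=$(echo "$REPLICA_TOKEN" | cut -d'.' -f2 | tr '_-' '/+')
while [ $(( ${#PAYLOAD} % 4 )) -ne 0 ]; do PAYLOAD="${PAYLOAD}="; done
echo "$PAYLOAD" | base64 -d 2>/dev/null | jq '{iss: .iss, sub: .sub, aud: .aud}'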

The iss claim should now read https://issuer-cognito-idp.us-west-2.amazonaws.com/us-west-2_XXXXXXXXX, identical for tokens from both regions, which confirms that your JWT validators need no per-region branching.
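If you would rather script this comparison in Python, the sketch below decodes the payload segment of two tokens and compares their `iss` claims. The function names are illustrative. It only decodes the payload for inspection and performs no signature verification, so treat it as a diagnostic aid rather than a validator.

```python
import base64
import json
from typing import Any, Dict


def decode_jwt_payload(token: str) -> Dict[str, Any]:
    """Decode (NOT verify) the payload segment of a JWT."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("Expected a JWT with three dot-separated segments")
    payload = parts[1]
    # JWTs use base64url without padding; restore it before decoding
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


def same_issuer(token_a: str, token_b: str) -> bool:
    """True when both tokens carry the same 'iss' claim."""
    return decode_jwt_payload(token_a).get("iss") == decode_jwt_payload(token_b).get("iss")


if __name__ == "__main__":
    # Build an unsigned demo token so the example runs without AWS access
    claims = {"iss": "https://issuer-cognito-idp.us-west-2.amazonaws.com/us-west-2_XXXXXXXXX"}
    segment = base64.urlsafe_b64encode(json.dumps(claims).encode()).rstrip(b"=").decode()
    demo_token = f"header.{segment}.signature"
    print(decode_jwt_payload(demo_token)["iss"])
```

Feed it one ID token obtained from the primary region and one from the replica; `same_issuer` returning `True` matches the expectation stated above.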
