DEV Community

Clariza Look

Building a Cross-Account S3 to Azure Backup System with ECS Fargate

The Challenge

When you need to transfer large database backups (85 GB+) from AWS S3 to Azure Blob Storage as part of a disaster recovery strategy, AWS Lambda's 15-minute timeout becomes a hard constraint. These files take 60-90 minutes to transfer, making Lambda unsuitable for the job.
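A quick back-of-the-envelope check makes the constraint concrete (throughput figures here are assumed from the observed 60-90 minute transfers, not measured):

```python
# Transfer time for an 85 GB backup at effective throughputs implied
# by the observed 60-90 minute transfers (assumed values)
size_mb = 85 * 1024
for mb_per_s in (15, 20, 25):
    minutes = size_mb / mb_per_s / 60
    print(f"{mb_per_s} MB/s -> {minutes:.0f} min")
# Even the fastest case is roughly 4x over Lambda's 15-minute ceiling
```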

REPORT RequestId: b000000b6a-000-0000-0000-0a0f0bf0c000 Duration: 900000.00 ms Billed Duration: 901344 ms Memory Size: 1024 MB Max Memory Used: 156 MB Init Duration: 1343.19 ms Status: Task timed out

Additionally, in enterprise environments with multi-account AWS setups, you often need to:

  • Build and store Docker images in a central CICD account
  • Deploy infrastructure to separate environment accounts (dev, staging, prod)
  • Maintain proper security boundaries between accounts

This article walks through building a production-ready backup offloading system that handles these challenges using ECS Fargate and cross-account deployment patterns.

Architecture Overview

[Account A: CICD Account]
├── ECR Repository (Docker Images)
├── CodeBuild (CI/CD)
└── CodePipeline

[Account B: Target Account for Deployment]
├── S3 Bucket (Backup Files)
├── SQS Queue
├── Lambda (ECS Trigger)
├── ECS Fargate (Transfer Task)
└── Secrets Manager (Azure Credentials)

Flow:
S3 → Event Notification → SQS → Lambda → ECS Fargate → Azure Blob Storage

Why This Architecture?

ECS Fargate over Lambda:

  • No timeout limits (Lambda: 15 min max)
  • Handles files of any size
  • Streaming transfer (no local storage limits)
  • Better for long-running tasks

Multi-Account Pattern:

  • Centralized Docker image management
  • Clear separation of concerns
  • Reusable images across environments
  • Enhanced security boundaries

Implementation Guide

Part 1: Docker Container for Backup Processing

The heart of the system is a Python container that streams files from S3 to Azure:

docker/backup_processor/Dockerfile:

FROM python:3.12-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY app.py .

CMD ["python", "app.py"]

docker/backup_processor/requirements.txt:

azure-storage-blob==12.19.0
boto3

Key features of the transfer application:

  • Retrieves Azure credentials from AWS Secrets Manager
  • Streams data from S3 (no local storage needed)
  • Parallel uploads to Azure (8x concurrency)
  • Emits CloudWatch metrics for monitoring
  • Handles errors gracefully
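A condensed sketch of what app.py does (function names, secret fields, and environment variables are illustrative rather than the exact production code; the cloud SDKs are imported lazily so the structure reads on its own):

```python
import json
import os


def load_azure_secret(secret_name: str) -> dict:
    """Fetch and decode the Azure credentials JSON from Secrets Manager."""
    import boto3  # imported lazily so the sketch is readable without the SDKs
    raw = boto3.client("secretsmanager").get_secret_value(SecretId=secret_name)
    return json.loads(raw["SecretString"])


def stream_s3_to_azure(bucket: str, key: str, secret: dict) -> None:
    """Pipe an S3 object straight into an Azure block blob, no local disk."""
    import boto3
    from azure.storage.blob import BlobClient

    # StreamingBody is file-like, so upload_blob can consume it chunk by chunk
    body = boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"]
    blob = BlobClient.from_connection_string(
        secret["connection_string"],
        container_name=secret["container"],
        blob_name=key,
    )
    # max_concurrency=8 is the "8x" parallel block upload mentioned above
    blob.upload_blob(body, overwrite=True, max_concurrency=8)


# The ECS task injects S3_BUCKET / S3_KEY via container overrides (see Part 4)
if os.environ.get("S3_BUCKET") and os.environ.get("AZURE_SECRET_NAME"):
    creds = load_azure_secret(os.environ["AZURE_SECRET_NAME"])
    stream_s3_to_azure(os.environ["S3_BUCKET"], os.environ["S3_KEY"], creds)
```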

Part 2: Multi-Account Docker Image Management

In Account A (CICD account) - Build and Push:

The CI/CD pipeline builds the Docker image and pushes it to ECR in Account A:

# buildspec.yml
pre_build:
  commands:
    # Login to Account A ECR
    - aws ecr get-login-password --region $REGION | docker login --username AWS --password-stdin $ACCOUNT_A.dkr.ecr.$REGION.amazonaws.com

    # Build and push image
    - cd docker/backup_processor
    - docker build -t backup-processor:$IMAGE_TAG .
    - docker tag backup-processor:$IMAGE_TAG $ACCOUNT_A.dkr.ecr.$REGION.amazonaws.com/backup-processor:$IMAGE_TAG
    - docker push $ACCOUNT_A.dkr.ecr.$REGION.amazonaws.com/backup-processor:$IMAGE_TAG

    # Set image URI for deployment
    - export DOCKER_IMAGE_URI="$ACCOUNT_A.dkr.ecr.$REGION.amazonaws.com/backup-processor:$IMAGE_TAG"

ECR Repository Policy in Account A:

Allow Account B to pull images:

{
  "Version": "2008-10-17",
  "Statement": [{
    "Sid": "AllowAccountBPull",
    "Effect": "Allow",
    "Principal": {
      "AWS": "arn:aws:iam::ACCOUNT_B:root"
    },
    "Action": [
      "ecr:GetDownloadUrlForLayer",
      "ecr:BatchGetImage",
      "ecr:BatchCheckLayerAvailability"
    ]
  }]
}
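If you manage the repository outside CDK, the same policy can be attached with a short script (a sketch: the repository name is assumed, and boto3 is imported lazily since the policy document itself needs no SDK):

```python
import json

# Cross-account pull policy from above; ACCOUNT_B is a placeholder
POLICY = {
    "Version": "2008-10-17",
    "Statement": [{
        "Sid": "AllowAccountBPull",
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::ACCOUNT_B:root"},
        "Action": [
            "ecr:GetDownloadUrlForLayer",
            "ecr:BatchGetImage",
            "ecr:BatchCheckLayerAvailability",
        ],
    }],
}


def apply_pull_policy(repository: str = "backup-processor") -> None:
    """Attach the cross-account pull policy to the Account A repository."""
    import boto3  # lazy import: keeps the sketch inspectable without the SDK
    boto3.client("ecr").set_repository_policy(
        repositoryName=repository,
        policyText=json.dumps(POLICY),
    )
```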

Part 3: ECS Infrastructure in Account B

CDK Stack Structure:

import os

from aws_cdk import Stack, aws_ecs as ecs, aws_logs as logs


class BackupOffloaderStack(Stack):
    def __init__(self, scope, construct_id, **kwargs):
        super().__init__(scope, construct_id, **kwargs)

        # VPC, task_role and execution_role are created elsewhere in the stack
        # Get Docker image URI from environment (set by buildspec)
        docker_image_uri = os.environ.get('DOCKER_IMAGE_URI')

        # Create ECS cluster
        cluster = ecs.Cluster(self, "BackupCluster",
            cluster_name="backup-processor-cluster",
            vpc=self.vpc
        )

        # Create task definition
        task_definition = ecs.FargateTaskDefinition(
            self, "BackupTaskDef",
            memory_limit_mib=2048,
            cpu=1024,
            task_role=task_role,
            execution_role=execution_role
        )

        # Add container referencing cross-account image
        container = task_definition.add_container(
            "BackupContainer",
            image=ecs.ContainerImage.from_registry(docker_image_uri),
            logging=ecs.LogDrivers.aws_logs(
                stream_prefix="backup-processor",
                log_retention=logs.RetentionDays.ONE_MONTH
            ),
            environment={
                "AZURE_SECRET_NAME": "azure-credentials",
                "LOG_LEVEL": "INFO"
            }
        )

Critical: Cross-Account ECR Permissions

The ECS Task Execution Role in Account B needs permission to pull from Account A's ECR:

execution_role = iam.Role(
    self, "ExecutionRole",
    assumed_by=iam.ServicePrincipal("ecs-tasks.amazonaws.com"),
    managed_policies=[
        iam.ManagedPolicy.from_aws_managed_policy_name(
            "service-role/AmazonECSTaskExecutionRolePolicy"
        )
    ],
    inline_policies={
        "CrossAccountECRAccess": iam.PolicyDocument(
            statements=[
                iam.PolicyStatement(
                    effect=iam.Effect.ALLOW,
                    actions=["ecr:GetAuthorizationToken"],
                    resources=["*"]
                ),
                iam.PolicyStatement(
                    effect=iam.Effect.ALLOW,
                    actions=[
                        "ecr:BatchCheckLayerAvailability",
                        "ecr:GetDownloadUrlForLayer",
                        "ecr:BatchGetImage"
                    ],
                    resources=[
                        f"arn:aws:ecr:{self.region}:ACCOUNT_A:repository/backup-processor"
                    ]
                )
            ]
        )
    }
)

Part 4: Event-Driven Trigger System

S3 → SQS → Lambda → ECS Flow:

# Lambda function to trigger ECS tasks
trigger_code = """
import boto3
import json
import os

ecs_client = boto3.client('ecs')

# Subnet IDs are injected as a Lambda environment variable by the stack
SUBNET_IDS = os.environ['SUBNET_IDS'].split(',')

def lambda_handler(event, context):
    for record in event.get('Records', []):
        message = json.loads(record['body'])

        # Extract S3 details
        s3_event = message['Records'][0]
        bucket = s3_event['s3']['bucket']['name']
        key = s3_event['s3']['object']['key']

        # Trigger ECS task
        ecs_client.run_task(
            cluster='backup-processor-cluster',
            taskDefinition='BackupTaskDef',
            launchType='FARGATE',
            networkConfiguration={
                'awsvpcConfiguration': {
                    'subnets': SUBNET_IDS,
                    'assignPublicIp': 'ENABLED'
                }
            },
            overrides={
                'containerOverrides': [{
                    'name': 'BackupContainer',
                    'environment': [
                        {'name': 'S3_BUCKET', 'value': bucket},
                        {'name': 'S3_KEY', 'value': key}
                    ]
                }]
            }
        )
"""
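The double-nested structure is the part that trips people up: S3 serializes its own event JSON into the SQS message body, so the handler parses twice, and object keys arrive URL-encoded. A synthetic event (names are made up) illustrates the unwrapping:

```python
import json
from urllib.parse import unquote_plus

# What S3 publishes, as it lands inside an SQS record body
s3_event = {
    "Records": [{
        "s3": {
            "bucket": {"name": "db-backups"},
            "object": {"key": "dumps/full+2024.bak"}  # '+' encodes a space
        }
    }]
}
sqs_event = {"Records": [{"body": json.dumps(s3_event)}]}

# The same double parse the Lambda handler performs
for record in sqs_event["Records"]:
    message = json.loads(record["body"])
    inner = message["Records"][0]
    bucket = inner["s3"]["bucket"]["name"]
    key = unquote_plus(inner["s3"]["object"]["key"])  # decode before run_task

print(bucket, key)  # db-backups dumps/full 2024.bak
```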

Part 5: Deployment Pipeline

Complete CI/CD Flow:

  1. Pre-build Phase (Account A):

    • Build Docker image
    • Push to Account A ECR
    • Set DOCKER_IMAGE_URI environment variable
  2. Assume Role to Account B:

    • Assume deployment role in Account B
    • Switch AWS credentials
  3. Build Phase (Account B):

    • CDK synth (reads DOCKER_IMAGE_URI)
    • CDK deploy infrastructure
    • ECS tasks pull image from Account A ECR

Key Configuration:

# buildspec.yml
phases:
  pre_build:
    commands:
      # Build in Account A
      - docker build -t $IMAGE_NAME .
      - docker push $ACCOUNT_A_ECR/$IMAGE_NAME:$TAG
      - export DOCKER_IMAGE_URI="$ACCOUNT_A_ECR/$IMAGE_NAME:$TAG"

      # Assume role to Account B
      - aws sts assume-role --role-arn $ACCOUNT_B_ROLE ...
      - export AWS_ACCESS_KEY_ID=...

  build:
    commands:
      # Deploy to Account B (uses DOCKER_IMAGE_URI)
      - cdk deploy
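The elided assume-role step typically looks something like this (a sketch only; variable names are assumed and the exact flags depend on your pipeline):

```yaml
# buildspec fragment: swap to Account B credentials before cdk deploy
- CREDS=$(aws sts assume-role --role-arn "$ACCOUNT_B_ROLE" --role-session-name cdk-deploy --query 'Credentials.[AccessKeyId,SecretAccessKey,SessionToken]' --output text)
- export AWS_ACCESS_KEY_ID=$(echo "$CREDS" | cut -f1)
- export AWS_SECRET_ACCESS_KEY=$(echo "$CREDS" | cut -f2)
- export AWS_SESSION_TOKEN=$(echo "$CREDS" | cut -f3)
```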

Monitoring and Operations

CloudWatch Metrics

Track transfer success and performance:

cloudwatch = boto3.client('cloudwatch')

cloudwatch.put_metric_data(
    Namespace='BackupOffloader',
    MetricData=[
        {'MetricName': 'BackupsCopied', 'Value': 1, 'Unit': 'Count'},
        {'MetricName': 'TransferDuration', 'Value': duration, 'Unit': 'Seconds'},
        {'MetricName': 'BytesTransferred', 'Value': bytes_transferred, 'Unit': 'Bytes'}
    ]
)

Log Groups

  • ECS Task Logs:
  • Lambda Trigger Logs:

Troubleshooting Cross-Account Issues

CannotPullContainerError (403 Forbidden):

  • Verify ECR repository policy in Account A allows Account B
  • Check execution role has cross-account ECR permissions
  • Confirm image exists and tag is correct

Security Considerations

  1. Credentials Management:

    • Azure credentials in Secrets Manager (encrypted)
    • IAM roles for AWS resource access (no long-term keys)
  2. Network Security:

    • ECS tasks in private subnets with NAT gateway
    • Security groups restrict traffic
  3. Cross-Account Access:

    • Least-privilege IAM policies
    • Specific ECR repository access only
    • Regular credential rotation

Key Takeaways

  1. ECS Fargate removes Lambda timeout constraints for long-running data transfers
  2. Multi-account patterns enhance security and enable centralized image management
  3. Cross-account ECR access requires permissions on both sides: a repository policy in Account A and an execution-role policy in Account B
  4. Event-driven architecture scales automatically with backup frequency
  5. Infrastructure as Code makes the system reproducible and maintainable

Conclusion

This architecture demonstrates how to build production-ready, cross-account data transfer systems on AWS. By combining ECS Fargate's flexibility with proper multi-account IAM configuration, you can handle large-scale data movements that exceed Lambda's limitations while maintaining security boundaries between environments.

The pattern shown here is applicable beyond backup systems - any long-running data processing task that needs to span AWS accounts can benefit from this approach.
