The Challenge
When you need to transfer large database backups (85 GB+) from AWS S3 to Azure Blob Storage as part of a disaster recovery strategy, AWS Lambda's 15-minute timeout becomes a hard constraint. The backup files take 60-90 minutes to transfer, which makes Lambda unsuitable for this use case.
A typical timed-out invocation looks like this:
REPORT RequestId: b000000b6a-000-0000-0000-0a0f0bf0c000 Duration: 900000.00 ms Billed Duration: 901344 ms Memory Size: 1024 MB Max Memory Used: 156 MB Init Duration: 1343.19 ms Status: Task timed out
Additionally, in enterprise environments with multi-account AWS setups, you often need to:
- Build and store Docker images in a central CI/CD account
- Deploy infrastructure to separate environment accounts (dev, staging, prod)
- Maintain proper security boundaries between accounts
This article walks through building a production-ready backup offloading system that handles these challenges using ECS Fargate and cross-account deployment patterns.
Architecture Overview
[Account A: CI/CD Account]
├── ECR Repository (Docker Images)
├── CodeBuild (CI/CD)
└── CodePipeline
[Account B: Target Account for Deployment]
├── S3 Bucket (Backup Files)
├── SQS Queue
├── Lambda (ECS Trigger)
├── ECS Fargate (Transfer Task)
└── Secrets Manager (Azure Credentials)
Flow:
S3 → Event Notification → SQS → Lambda → ECS Fargate → Azure Blob Storage
Why This Architecture?
ECS Fargate over Lambda:
- No timeout limits (Lambda: 15 min max)
- Handles files of any size
- Streaming transfer (no local storage limits)
- Better for long-running tasks
Multi-Account Pattern:
- Centralized Docker image management
- Clear separation of concerns
- Reusable images across environments
- Enhanced security boundaries
Implementation Guide
Part 1: Docker Container for Backup Processing
The heart of the system is a Python container that streams files from S3 to Azure:
docker/backup_processor/Dockerfile:
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY app.py .
CMD ["python", "app.py"]
docker/backup_processor/requirements.txt:
azure-storage-blob==12.19.0
boto3
Key features of the transfer application (a minimal sketch follows the list):
- Retrieves Azure credentials from AWS Secrets Manager
- Streams data from S3 (no local storage needed)
- Parallel uploads to Azure (8x concurrency)
- Emits CloudWatch metrics for monitoring
- Handles errors gracefully
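Here is a minimal sketch of what app.py can look like. It assumes the Azure secret is a JSON document with a connection_string field and that the target container name is illustrative; the S3_BUCKET, S3_KEY and AZURE_SECRET_NAME environment variables are the ones set by the task definition and the Lambda trigger shown later:
# app.py (sketch)
import json
import os
import time

import boto3
from azure.storage.blob import BlobClient

def get_connection_string(secret_name):
    # Assumes the secret is stored as JSON with a "connection_string" field
    secrets = boto3.client("secretsmanager")
    value = secrets.get_secret_value(SecretId=secret_name)["SecretString"]
    return json.loads(value)["connection_string"]

def main():
    bucket = os.environ["S3_BUCKET"]
    key = os.environ["S3_KEY"]
    conn_str = get_connection_string(os.environ["AZURE_SECRET_NAME"])

    s3 = boto3.client("s3")
    obj = s3.get_object(Bucket=bucket, Key=key)

    blob = BlobClient.from_connection_string(
        conn_str,
        container_name="db-backups",  # illustrative container name
        blob_name=key,
    )

    start = time.time()
    # Stream the S3 body straight into Azure with 8 concurrent block uploads;
    # nothing is written to local disk.
    blob.upload_blob(
        obj["Body"],
        length=obj["ContentLength"],
        overwrite=True,
        max_concurrency=8,
    )

    boto3.client("cloudwatch").put_metric_data(
        Namespace="BackupOffloader",
        MetricData=[
            {"MetricName": "BackupsCopied", "Value": 1, "Unit": "Count"},
            {"MetricName": "TransferDuration", "Value": time.time() - start, "Unit": "Seconds"},
            {"MetricName": "BytesTransferred", "Value": obj["ContentLength"], "Unit": "Bytes"},
        ],
    )

if __name__ == "__main__":
    main()
Retries and structured error handling are omitted here for brevity; task output goes to CloudWatch Logs via the awslogs driver configured on the task definition.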
Part 2: Multi-Account Docker Image Management
In Account A (CI/CD account) - Build and Push:
The CI/CD pipeline builds the Docker image and pushes it to ECR in Account A:
# buildspec.yml
pre_build:
  commands:
    # Login to Account A ECR
    - aws ecr get-login-password --region $REGION | docker login --username AWS --password-stdin $ACCOUNT_A.dkr.ecr.$REGION.amazonaws.com
    # Build and push image
    - cd docker/backup_processor
    - docker build -t backup-processor:$IMAGE_TAG .
    - docker tag backup-processor:$IMAGE_TAG $ACCOUNT_A.dkr.ecr.$REGION.amazonaws.com/backup-processor:$IMAGE_TAG
    - docker push $ACCOUNT_A.dkr.ecr.$REGION.amazonaws.com/backup-processor:$IMAGE_TAG
    # Set image URI for deployment
    - export DOCKER_IMAGE_URI="$ACCOUNT_A.dkr.ecr.$REGION.amazonaws.com/backup-processor:$IMAGE_TAG"
ECR Repository Policy in Account A:
Allow Account B to pull images:
{
  "Version": "2008-10-17",
  "Statement": [
    {
      "Sid": "AllowAccountBPull",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::ACCOUNT_B:root"
      },
      "Action": [
        "ecr:GetDownloadUrlForLayer",
        "ecr:BatchGetImage",
        "ecr:BatchCheckLayerAvailability"
      ]
    }
  ]
}
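If the repository in Account A is itself managed with CDK, one way to express the same grant in that stack is shown below; the construct ID is illustrative and ACCOUNT_B is a placeholder for the target account ID:
from aws_cdk import aws_ecr as ecr, aws_iam as iam

repo = ecr.Repository(
    self, "BackupProcessorRepo",
    repository_name="backup-processor"
)

# Grant Account B's principals permission to pull this image
repo.add_to_resource_policy(iam.PolicyStatement(
    sid="AllowAccountBPull",
    effect=iam.Effect.ALLOW,
    principals=[iam.AccountPrincipal("ACCOUNT_B")],
    actions=[
        "ecr:GetDownloadUrlForLayer",
        "ecr:BatchGetImage",
        "ecr:BatchCheckLayerAvailability",
    ],
))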
Part 3: ECS Infrastructure in Account B
CDK Stack Structure:
import os

from aws_cdk import Stack
from aws_cdk import aws_ecs as ecs
from aws_cdk import aws_iam as iam
from aws_cdk import aws_logs as logs


class BackupOffloaderStack(Stack):
    def __init__(self, scope, construct_id, **kwargs):
        super().__init__(scope, construct_id, **kwargs)

        # Get Docker image URI from environment (set by buildspec)
        docker_image_uri = os.environ.get("DOCKER_IMAGE_URI")

        # Create ECS cluster
        cluster = ecs.Cluster(
            self, "BackupCluster",
            cluster_name="backup-processor-cluster",
            vpc=self.vpc
        )

        # Create task definition (task_role and execution_role are defined below)
        task_definition = ecs.FargateTaskDefinition(
            self, "BackupTaskDef",
            memory_limit_mib=2048,
            cpu=1024,
            task_role=task_role,
            execution_role=execution_role
        )

        # Add container referencing the cross-account image
        container = task_definition.add_container(
            "BackupContainer",
            image=ecs.ContainerImage.from_registry(docker_image_uri),
            logging=ecs.LogDrivers.aws_logs(
                stream_prefix="backup-processor",
                log_retention=logs.RetentionDays.ONE_MONTH
            ),
            environment={
                "AZURE_SECRET_NAME": "azure-credentials",
                "LOG_LEVEL": "INFO"
            }
        )
Critical: Cross-Account ECR Permissions
The ECS Task Execution Role in Account B needs permission to pull from Account A's ECR:
execution_role = iam.Role(
    self, "ExecutionRole",
    assumed_by=iam.ServicePrincipal("ecs-tasks.amazonaws.com"),
    managed_policies=[
        iam.ManagedPolicy.from_aws_managed_policy_name(
            "service-role/AmazonECSTaskExecutionRolePolicy"
        )
    ],
    inline_policies={
        "CrossAccountECRAccess": iam.PolicyDocument(
            statements=[
                iam.PolicyStatement(
                    effect=iam.Effect.ALLOW,
                    actions=["ecr:GetAuthorizationToken"],
                    resources=["*"]
                ),
                iam.PolicyStatement(
                    effect=iam.Effect.ALLOW,
                    actions=[
                        "ecr:BatchCheckLayerAvailability",
                        "ecr:GetDownloadUrlForLayer",
                        "ecr:BatchGetImage"
                    ],
                    resources=[
                        "arn:aws:ecr:REGION:ACCOUNT_A:repository/backup-processor"
                    ]
                )
            ]
        )
    }
)
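The execution role only covers pulling the image and delivering logs. The task role referenced in the task definition above is what the container itself uses at runtime; a minimal sketch, with the bucket name and secret ARN as placeholders:
task_role = iam.Role(
    self, "TaskRole",
    assumed_by=iam.ServicePrincipal("ecs-tasks.amazonaws.com")
)

# Read backup objects from the source bucket
task_role.add_to_policy(iam.PolicyStatement(
    actions=["s3:GetObject", "s3:ListBucket"],
    resources=[
        "arn:aws:s3:::BACKUP_BUCKET",
        "arn:aws:s3:::BACKUP_BUCKET/*",
    ],
))

# Fetch the Azure connection string from Secrets Manager
task_role.add_to_policy(iam.PolicyStatement(
    actions=["secretsmanager:GetSecretValue"],
    resources=["arn:aws:secretsmanager:REGION:ACCOUNT_B:secret:azure-credentials-*"],
))

# Publish custom CloudWatch metrics
task_role.add_to_policy(iam.PolicyStatement(
    actions=["cloudwatch:PutMetricData"],
    resources=["*"],
))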
Part 4: Event-Driven Trigger System
S3 → SQS → Lambda → ECS Flow:
# Lambda function to trigger ECS tasks
trigger_code = """
import boto3
import json

ecs_client = boto3.client('ecs')

def lambda_handler(event, context):
    for record in event.get('Records', []):
        message = json.loads(record['body'])

        # Extract S3 details
        s3_event = message['Records'][0]
        bucket = s3_event['s3']['bucket']['name']
        key = s3_event['s3']['object']['key']

        # Trigger ECS task
        ecs_client.run_task(
            cluster='backup-processor-cluster',
            taskDefinition='BackupTaskDef',
            launchType='FARGATE',
            networkConfiguration={
                'awsvpcConfiguration': {
                    'subnets': SUBNET_IDS,
                    'assignPublicIp': 'ENABLED'
                }
            },
            overrides={
                'containerOverrides': [{
                    'name': 'BackupContainer',
                    'environment': [
                        {'name': 'S3_BUCKET', 'value': bucket},
                        {'name': 'S3_KEY', 'value': key}
                    ]
                }]
            }
        )
"""
Part 5: Deployment Pipeline
Complete CI/CD Flow:
1. Pre-build Phase (Account A):
   - Build Docker image
   - Push to Account A ECR
   - Set DOCKER_IMAGE_URI environment variable
2. Assume Role to Account B:
   - Assume deployment role in Account B
   - Switch AWS credentials
3. Build Phase (Account B):
   - CDK synth (reads DOCKER_IMAGE_URI)
   - CDK deploy infrastructure
   - ECS tasks pull image from Account A ECR
Key Configuration:
# buildspec.yml
phases:
  pre_build:
    commands:
      # Build in Account A
      - docker build -t $IMAGE_NAME .
      - docker push $ACCOUNT_A_ECR/$IMAGE_NAME:$TAG
      - export DOCKER_IMAGE_URI="$ACCOUNT_A_ECR/$IMAGE_NAME:$TAG"
      # Assume role to Account B
      - aws sts assume-role --role-arn $ACCOUNT_B_ROLE ...
      - export AWS_ACCESS_KEY_ID=...
  build:
    commands:
      # Deploy to Account B (uses DOCKER_IMAGE_URI)
      - cdk deploy
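The assume-role step elided above swaps the pipeline's Account A credentials for temporary Account B credentials. For reference, a boto3 equivalent of what those CLI lines do, with the role ARN as a placeholder:
import boto3

sts = boto3.client("sts")
creds = sts.assume_role(
    RoleArn="arn:aws:iam::ACCOUNT_B:role/cdk-deploy-role",  # placeholder ARN
    RoleSessionName="cross-account-deploy",
)["Credentials"]

# These three values become AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY
# and AWS_SESSION_TOKEN for the cdk deploy step
print(creds["AccessKeyId"], creds["SecretAccessKey"], creds["SessionToken"])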
Monitoring and Operations
CloudWatch Metrics
Track transfer success and performance:
cloudwatch.put_metric_data(
    Namespace='BackupOffloader',
    MetricData=[
        {'MetricName': 'BackupsCopied', 'Value': 1, 'Unit': 'Count'},
        {'MetricName': 'TransferDuration', 'Value': duration, 'Unit': 'Seconds'},
        {'MetricName': 'BytesTransferred', 'Value': bytes_transferred, 'Unit': 'Bytes'}
    ]
)
Log Groups
- ECS Task Logs:
- Lambda Trigger Logs:
Troubleshooting Cross-Account Issues
CannotPullContainerError (403 Forbidden):
- Verify ECR repository policy in Account A allows Account B
- Check execution role has cross-account ECR permissions
- Confirm the image exists and the tag is correct (a quick check is sketched below)
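One quick way to verify the last two points with Account A credentials; the repository name matches this article, and the tag is a placeholder:
import boto3

ecr = boto3.client("ecr")  # run with Account A credentials

# 1. Does the repository policy actually grant Account B pull access?
policy = ecr.get_repository_policy(repositoryName="backup-processor")
print(policy["policyText"])

# 2. Does the image tag referenced by the task definition exist?
images = ecr.describe_images(
    repositoryName="backup-processor",
    imageIds=[{"imageTag": "v1.0.0"}],  # placeholder tag
)
print(images["imageDetails"])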
Security Considerations
1. Credentials Management:
   - Azure credentials in Secrets Manager (encrypted)
   - IAM roles for AWS resource access (no long-term keys)
2. Network Security:
   - ECS tasks in private subnets with NAT gateway
   - Security groups restrict traffic
3. Cross-Account Access:
   - Least-privilege IAM policies
   - Specific ECR repository access only
   - Regular credential rotation
Key Takeaways
- ECS Fargate removes Lambda timeout constraints for long-running data transfers
- Multi-account patterns enhance security and enable centralized image management
- Cross-account ECR access requires proper permissions on both accounts
- Event-driven architecture scales automatically with backup frequency
- Infrastructure as Code makes the system reproducible and maintainable
Conclusion
This architecture demonstrates how to build production-ready, cross-account data transfer systems on AWS. By combining ECS Fargate's flexibility with proper multi-account IAM configuration, you can handle large-scale data movements that exceed Lambda's limitations while maintaining security boundaries between environments.
The pattern shown here is applicable beyond backup systems - any long-running data processing task that needs to span AWS accounts can benefit from this approach.