AWS Database Migration Service (DMS) configuration considerations

#awsbigdata #awsdatalake #aws #s3

A data migration to AWS involves leveraging various AWS services tailored to the type of data being migrated:

Relational Database Migration
Use AWS Database Migration Service (DMS) for migrating relational databases.
Additionally, native database tools such as Oracle GoldenGate, SQL Server Replication, bulk load, etc., can be considered based on the source and target environments.
File-Based Migration
Use services such as AWS Transfer Family and AWS DataSync to migrate file-based data efficiently and securely.
Real-Time Data Migration
For streaming or real-time data, consider Amazon Kinesis or Amazon MSK (Managed Streaming for Apache Kafka) to ensure low-latency and scalable data movement.

In this blog, we will explore AWS Database Migration Service (DMS): its core capabilities, configuration options, and key considerations around cost and optimization. Whether one is planning a large-scale migration or a targeted database shift, understanding how to configure DMS effectively and manage the associated costs is essential for a successful cloud transition.

AWS DMS:

AWS Database Migration Service is a cloud service for migrating relational databases, NoSQL databases, data warehouses and other types of data stores into the AWS Cloud, or between cloud and on-premises setups, efficiently and securely. DMS supports several source and target databases, such as Oracle, Microsoft SQL Server, MySQL, PostgreSQL, Amazon Aurora, Amazon RDS, Amazon Redshift and Amazon S3.

The service runs on an Amazon EC2 instance (Replication instance), which performs the following tasks:

Reads data from the source database.
Transforms/formats the data as needed.
Loads the data into the target database.

AWS DMS operates through a secure, highly available architecture designed for reliable database migrations. Here's the detailed architecture and implementation approach:

Security Implementation

Network Security:

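{
    "VPC": {
        "CIDR": "10.0.0.0/16",
        "Subnets": ["10.0.1.0/24", "10.0.2.0/24"],
        "SecurityGroups": {
            "DMSReplicationInstance": {
                "Outbound": [
                    {"Port": 3306, "Destination": "SourceDB-SG"},
                    {"Port": 5432, "Destination": "TargetDB-SG"}
                ]
            },
            "SourceDB-SG": {
                "Inbound": [{"Port": 3306, "Source": "DMSReplicationInstance"}]
            },
            "TargetDB-SG": {
                "Inbound": [{"Port": 5432, "Source": "DMSReplicationInstance"}]
            }
        }
    }
}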
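
Because the replication instance is the side that opens connections to the source and target databases, the database security groups need an inbound rule that references the DMS security group. The helper below is a minimal sketch of such a rule in the shape that boto3's `authorize_security_group_ingress` accepts under `IpPermissions`; the function name and the security group ID are hypothetical.

```python
def db_ingress_rule(port: int, dms_security_group_id: str) -> dict:
    """Build one inbound TCP rule that admits traffic from the DMS security group."""
    if not 1 <= port <= 65535:
        raise ValueError("port must be between 1 and 65535")
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        # Reference the DMS security group instead of a CIDR range.
        "UserIdGroupPairs": [
            {"GroupId": dms_security_group_id,
             "Description": "AWS DMS replication instance"}
        ],
    }


# Hypothetical security group ID, used for illustration only.
mysql_rule = db_ingress_rule(3306, "sg-0123456789abcdef0")
```

The resulting dictionary could then be passed as `IpPermissions=[mysql_rule]` together with the `GroupId` of the source database security group.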

Encryption Configuration:

Encryption:
  AtRest:
    - KMS key for replication instance
    - KMS key for stored credentials
  InTransit:
    - SSL/TLS for database connections
    - CA certificates imported into AWS DMS for endpoint SSL modes

Performance Optimization

Replication Instance Sizing Matrix:

Data Volume | Instance Class | Storage (GB) | Network Performance
------------|----------------|--------------|--------------------
< 1 TB      | dms.t3.large   | 100          | Up to 5 Gbps
1-5 TB      | dms.r5.xlarge  | 200          | Up to 10 Gbps
5-10 TB     | dms.r5.2xlarge | 400          | Up to 10 Gbps
> 10 TB     | dms.r5.4xlarge | 1000         | Up to 10 Gbps

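The sizing matrix can be turned into a small lookup for planning scripts. This is only a sketch: the function name is hypothetical, the volume bands are the rough starting points from the matrix rather than AWS limits, and the handling of the band edges (1 TB, 5 TB, 10 TB) is an assumption.

```python
def suggest_replication_instance(data_volume_tb: float) -> dict:
    """Return a starting-point instance class and storage size for a data volume."""
    if data_volume_tb < 0:
        raise ValueError("data volume must be non-negative")
    if data_volume_tb < 1:
        return {"instance_class": "dms.t3.large", "storage_gb": 100}
    if data_volume_tb <= 5:
        return {"instance_class": "dms.r5.xlarge", "storage_gb": 200}
    if data_volume_tb <= 10:
        return {"instance_class": "dms.r5.2xlarge", "storage_gb": 400}
    return {"instance_class": "dms.r5.4xlarge", "storage_gb": 1000}


suggestion = suggest_replication_instance(3.5)
```

The result is a first guess only; LOB columns, transaction rate and CDC requirements can justify a larger instance than the data volume alone suggests.
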
Task Configuration for Optimal Performance:

{
    "TaskSettings": {
        "TargetMetadata": {
            "BatchApplyEnabled": true,
            "ParallelLoadThreads": 8,
            "ParallelLoadBufferSize": 1000
        },
        "FullLoadSettings": {
            "MaxFullLoadSubTasks": 8,
            "TransactionConsistencyTimeout": 600
        }
    }
}
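
When task settings are generated from a script, it helps to validate them before they reach DMS. The sketch below builds the same sections as a JSON string. It assumes that the DMS API expects `TargetMetadata` and `FullLoadSettings` at the top level of the settings document (without the illustrative `TaskSettings` wrapper) and that `MaxFullLoadSubTasks` accepts values from 1 to 49; both points should be checked against the current DMS documentation. The function name is hypothetical.

```python
import json


def build_task_settings(parallel_load_threads: int = 8,
                        max_full_load_subtasks: int = 8,
                        transaction_consistency_timeout: int = 600) -> str:
    """Return task settings as a JSON string after basic range checks."""
    # Assumed documented range for MaxFullLoadSubTasks: 1 to 49.
    if not 1 <= max_full_load_subtasks <= 49:
        raise ValueError("MaxFullLoadSubTasks must be between 1 and 49")
    if parallel_load_threads < 0:
        raise ValueError("ParallelLoadThreads must not be negative")
    settings = {
        "TargetMetadata": {
            "BatchApplyEnabled": True,
            "ParallelLoadThreads": parallel_load_threads,
        },
        "FullLoadSettings": {
            "MaxFullLoadSubTasks": max_full_load_subtasks,
            "TransactionConsistencyTimeout": transaction_consistency_timeout,
        },
    }
    return json.dumps(settings, indent=2)


settings_json = build_task_settings()
```

The string could be supplied as the `ReplicationTaskSettings` argument when the task is created or modified through boto3.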

Monitoring and Alerting

CloudWatch Metrics Dashboard:

Essential Metrics:
  - CPUUtilization: Alert threshold > 80%
  - FreeableMemory: Alert threshold < 2GB
  - FreeStorageSpace: Alert threshold < 10% of allocated storage (the metric is reported in bytes)
  - CDCLatencySource / CDCLatencyTarget (replication lag): Alert threshold > 300 seconds
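
The thresholds above can be checked against a set of metric readings with a few lines of code. This is a sketch under some assumptions: memory and storage readings are taken to be in bytes, as CloudWatch reports them, and replication lag is taken as the larger of `CDCLatencySource` and `CDCLatencyTarget`, which to our knowledge are the latency metrics DMS publishes. The function name and the sample values are hypothetical.

```python
GIB = 1024 ** 3


def find_breaches(sample: dict, allocated_storage_gb: int) -> list:
    """Return the names of the checks that breach the alert thresholds."""
    breaches = []
    if sample["CPUUtilization"] > 80:                      # percent
        breaches.append("CPUUtilization")
    if sample["FreeableMemory"] < 2 * GIB:                 # bytes
        breaches.append("FreeableMemory")
    # Less than 10% free: compare in integers to avoid float rounding.
    if sample["FreeStorageSpace"] * 10 < allocated_storage_gb * GIB:
        breaches.append("FreeStorageSpace")
    if max(sample["CDCLatencySource"], sample["CDCLatencyTarget"]) > 300:
        breaches.append("CDCLatency")                      # seconds
    return breaches


healthy = {
    "CPUUtilization": 40,
    "FreeableMemory": 8 * GIB,
    "FreeStorageSpace": 50 * GIB,
    "CDCLatencySource": 5,
    "CDCLatencyTarget": 12,
}
assert find_breaches(healthy, allocated_storage_gb=100) == []
```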

Operational Alerts:

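import boto3

def create_cloudwatch_alarm(sns_topic_arn, dimensions):
    """Alarm when target CDC latency averages above 300 s for two 5-minute periods."""
    cloudwatch = boto3.client('cloudwatch')
    cloudwatch.put_metric_alarm(
        AlarmName='DMS-ReplicationLag',
        Namespace='AWS/DMS',
        # DMS reports replication lag as CDCLatencySource and CDCLatencyTarget
        MetricName='CDCLatencyTarget',
        Dimensions=dimensions,  # replication instance and task dimensions
        Statistic='Average',
        Threshold=300,
        Period=300,
        EvaluationPeriods=2,
        ComparisonOperator='GreaterThanThreshold',
        AlarmActions=[sns_topic_arn]
    )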

Disaster Recovery

Backup Strategy:
  - Task settings backup
  - Source endpoint configurations
  - Target endpoint configurations
  - CloudFormation templates

Recovery Procedures:
  1. Launch new replication instance
  2. Recreate endpoints
  3. Recreate tasks from the saved task settings
  4. Resume replication
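
As an illustration of the backup step, the sketch below pulls the parts of each task that are worth saving out of a boto3 `describe_replication_tasks` response. It works on the response dictionary only, so it can be tried without AWS access. The function name and the sample response are hypothetical, and the sample is cut down to the fields used here.

```python
import json


def extract_task_backups(describe_response: dict) -> list:
    """Collect settings, table mappings and endpoint ARNs for each task."""
    backups = []
    for task in describe_response.get("ReplicationTasks", []):
        backups.append({
            "ReplicationTaskIdentifier": task["ReplicationTaskIdentifier"],
            "MigrationType": task["MigrationType"],
            "SourceEndpointArn": task["SourceEndpointArn"],
            "TargetEndpointArn": task["TargetEndpointArn"],
            # Both fields arrive as JSON strings; parse them for readable backups.
            "TableMappings": json.loads(task["TableMappings"]),
            "ReplicationTaskSettings": json.loads(task["ReplicationTaskSettings"]),
        })
    return backups


# Hypothetical, trimmed response used for illustration only.
sample_response = {
    "ReplicationTasks": [{
        "ReplicationTaskIdentifier": "orders-cdc",
        "MigrationType": "full-load-and-cdc",
        "SourceEndpointArn": "arn:example:source",
        "TargetEndpointArn": "arn:example:target",
        "TableMappings": '{"rules": []}',
        "ReplicationTaskSettings": '{"FullLoadSettings": {"MaxFullLoadSubTasks": 8}}',
    }]
}
backup_document = json.dumps(extract_task_backups(sample_response), indent=2)
```

The resulting document can be stored in version control next to the CloudFormation templates so that tasks can be recreated on a new replication instance.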

The main cost components of AWS DMS are as follows:

Instance class:
The instance class chosen for the replication instance that performs the actual data migration. AWS DMS supports the T2, T3, C4, C5, C6i, R4, R5 and R6i instance classes.
Storage cost:
Each replication instance has 50 GB, 100 GB or more of attached storage for the data cache and replication logs, charged at $0.115 per GB-month for a Single-AZ instance.
Multi-AZ:
Choosing Multi-AZ instead of Single-AZ for high availability increases the instance and storage cost accordingly.
Data Transfer:
Standard data transfer charges apply to any data moved out of the Region or the AWS account.
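
A rough monthly estimate can be put together from these components. The sketch below is an approximation under stated assumptions: the hourly instance rate is supplied by the caller (the value in the example is hypothetical), a month is taken as 730 hours, Multi-AZ is modelled as doubling both instance and storage cost, and data transfer is left out. Actual prices vary by Region and should be taken from the AWS DMS pricing page.

```python
HOURS_PER_MONTH = 730  # common approximation for a billing month


def estimate_monthly_cost(instance_hourly_usd: float,
                          storage_gb: int,
                          multi_az: bool = False,
                          storage_usd_per_gb_month: float = 0.115,
                          multi_az_multiplier: float = 2.0) -> float:
    """Estimate the monthly replication instance and storage cost in USD."""
    factor = multi_az_multiplier if multi_az else 1.0  # assumed Multi-AZ uplift
    instance_cost = instance_hourly_usd * HOURS_PER_MONTH * factor
    storage_cost = storage_gb * storage_usd_per_gb_month * factor
    return round(instance_cost + storage_cost, 2)


# Hypothetical rate of $0.10 per hour with 100 GB of storage.
single_az = estimate_monthly_cost(0.10, 100)
multi_az = estimate_monthly_cost(0.10, 100, multi_az=True)
```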

Database migrations and Change Data Capture (CDC) are inherently complex processes influenced by multiple factors. To ensure a successful and optimized migration using AWS Database Migration Service (DMS), it is essential to carefully evaluate and configure the service based on the specific data migration requirements:

Source database engine and versions.
Target database engine and versions.
Whether the target database is relational or non-relational.
Homogeneous or heterogeneous data migration, which determines whether the AWS Schema Conversion Tool (SCT) is needed.
Requirement of one-time data migration or continuous data replication.
Selected tables or entire database.
Number of tables, columns, transformations required.
LOB columns to consider for the data migration.
In case of CDC, acceptable amount of replication latency.
Rate of change of data at source during peak hours.
Source and target database utilization during peak hours.
Transaction log size.
Network connectivity between source and target.
Network latency between the source and the replication instance, and between the replication instance and the target.
Additional source database engine configurations such as storage on ASM, RAC implementation for Oracle.
Run the AWS-provided scripts on the source database to capture its configuration details.
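
The choice between selected tables and the entire database ends up in the task's table mappings. The sketch below generates selection rules in the DMS table-mapping JSON format, using the `%` wildcard when no table list is given. The function name, schema name and table names are hypothetical.

```python
import json


def build_selection_rules(schema: str, tables=None) -> str:
    """Return table mappings that include the given tables, or the whole schema."""
    table_names = list(tables) if tables else ["%"]  # "%" matches every table
    rules = []
    for rule_number, table_name in enumerate(table_names, start=1):
        rules.append({
            "rule-type": "selection",
            "rule-id": str(rule_number),
            "rule-name": str(rule_number),
            "object-locator": {"schema-name": schema, "table-name": table_name},
            "rule-action": "include",
        })
    return json.dumps({"rules": rules}, indent=2)


whole_schema = build_selection_rules("SALES")
two_tables = build_selection_rules("SALES", ["ORDERS", "CUSTOMERS"])
```

Starting with a short table list and extending it step by step fits the approach of testing with a subset of the data first.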

The factors that affect performance during a data migration with AWS DMS are as follows:

Resource availability on the source.
The available network throughput.
The resource capacity of the replication server.
The ability of the target to ingest changes.
The type and distribution of source data.
The number of objects to be migrated.

Considering all the above points, one needs to consider the following configurations:

Use the latest version of AWS DMS.
Provision sufficient storage on the replication instance for transaction logs and for buffering data during peak hours.
Review the AWS DMS limitations that apply to each source and target database.
Multi-AZ configurations: Lower environments such as Dev, SIT and UAT can be configured as Single-AZ, and the Prod environment as Multi-AZ for high availability.

Replication instance sizing - AWS DMS provisions an EC2 instance internally as the replication instance, and its size is critical to the actual data migration. One needs to consider the following factors when sizing the replication instance -

Database and table size : Data volume helps determine the task configuration that optimizes full load performance. For example, for schemas that run to terabytes, one can split the tables across multiple tasks of a few gigabytes each and run them in parallel. The achievable parallelism depends on the CPU resources available on the replication instance. That is why it is a good idea to understand the size of the database and its tables: it determines the number of tasks that one can run.
LOBs : The data types in the migration scope can affect performance. In particular, large objects (LOBs) affect performance and memory consumption. To migrate a LOB value, AWS DMS performs a two-step process. First, AWS DMS inserts the row into the target without the LOB value. Second, AWS DMS updates the row with the LOB value. This has an impact on memory, so it is important to identify the LOB columns in the source and analyze their size.
Load frequency and transaction size : Load frequency and transactions per second (TPS) influence memory usage. A high TPS or a high volume of data manipulation language (DML) activity leads to high memory usage, because DMS caches the changes until they are applied to the target. During CDC, this can lead to swapping (writing to the physical disk because memory overflows), which causes latency.
Table keys and referential integrity : Information about the keys of a table determines the CDC mode (batch apply or transactional apply) that one uses to migrate data. In general, transactional apply is slower than batch apply. For long-running transactions, there can be many changes to migrate. When one uses transactional apply, AWS DMS might require more memory to store the changes than with batch apply. If one migrates tables without primary keys, batch apply fails and the DMS task moves to transactional apply mode. When referential integrity is active between tables during CDC, AWS DMS uses transactional apply by default.
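
The idea of splitting tables across parallel tasks, described under database and table size, can be sketched as a simple balancing problem. The example below uses a greedy heuristic (largest table first, always into the currently lightest task). This is an illustrative approach chosen here, not something DMS does on its own, and the table names and sizes are hypothetical.

```python
import heapq


def partition_tables(table_sizes_gb: dict, task_count: int) -> list:
    """Spread tables over task_count groups so that total sizes stay balanced."""
    if task_count < 1:
        raise ValueError("task_count must be at least 1")
    # Min-heap of (total size so far, group index): the lightest group pops first.
    heap = [(0.0, index) for index in range(task_count)]
    heapq.heapify(heap)
    groups = [[] for _ in range(task_count)]
    largest_first = sorted(table_sizes_gb.items(), key=lambda item: item[1], reverse=True)
    for table_name, size_gb in largest_first:
        total, index = heapq.heappop(heap)
        groups[index].append(table_name)
        heapq.heappush(heap, (total + size_gb, index))
    return groups


# Hypothetical table sizes in GB.
sizes = {"orders": 500, "events": 300, "users": 200, "audit": 100}
task_groups = partition_tables(sizes, task_count=2)
```

Each group can then become the table list of one DMS task. The number of groups should stay within what the CPU of the replication instance can handle.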

Best Practices Checklist:

Pre-Migration:
Network bandwidth assessment
Source/target database version compatibility check
Storage capacity planning
Security group and IAM role configuration
SSL certificate setup
During Migration:
Monitor replication lag
Track CPU/Memory utilization
Validate data consistency
Monitor network throughput
Post-Migration:
Application connectivity verification
Performance baseline comparison
Data integrity validation
Cleanup temporary resources

One can test the DMS implementation by starting with a subset of the data, adding tables gradually and adjusting the DMS configuration accordingly. Document all of these details.

Conclusion:
This blog covered the points to consider before migrating relational databases with AWS DMS. Working through them helps account for the factors that affect a database migration.