About two months ago, I migrated our database from RDS for MySQL to Aurora MySQL, so I've compiled some considerations and the actual steps we took during the migration.
I hope this will be helpful to others who are considering a similar migration.
Background
In an effort to cut costs, we continued to use a single-AZ RDS for MySQL even after the official release of our service. However, as our client base and traffic grew, concerns about data durability and future performance began to arise. To address these concerns, we decided to migrate to Aurora, which distributes data across multiple Availability Zones and can enhance MySQL performance by up to fivefold.
Migration
- Resource management: We managed the newly created resources (Aurora cluster, instances, subnet groups, parameter groups) using Terraform (the original RDS was not managed with code).
- Application connection: The connection with the application (Lambda) was completed by changing the target of the RDS Proxy.
- Migration method: We considered using either snapshots or DMS for migration and chose snapshots based on the assumption of downtime. This allowed us to carry over the usernames and passwords that were used on the RDS instance.
- Storage layer of Aurora: Aurora's storage layer automatically distributes and replicates data across multiple AZs within the selected DB subnet group.
Aurora vs Aurora Serverless
Considering the anticipated high transaction rates and data volume increase, we decided that Aurora was more suitable than Aurora Serverless. Replication across three AZs is provided for free, and additional costs depend on the amount of storage used.
Cost Estimation
- Initial setup: We initially did not use any replicas.
-
Pricing structure:
- RDS: Charges based on instance uptime + storage capacity.
- Aurora: Charges based on instance uptime + storage capacity + I/O request numbers.
-
Cost calculation:
-
Instance: Switching to
db.r6g.xlarge
costs about $0.1 USD x 24 x 30 = approximately $72 USD/month increase. - Storage: No change.
- I/O Requests: 200 requests per second lead to an increase of about $124 USD/month (0.24 USD per million requests).
- According to AWS support, it is not possible to estimate Aurora's I/O charges from RDS's I/O. Therefore, the estimate is for reference only.
"We understand that you are seeking a method to estimate Aurora I/O charges based on RDS I/O performance. Unfortunately, as the I/O mechanisms differ between RDS and Aurora, it is not possible to directly estimate Aurora I/O costs from RDS I/O. In RDS, I/O occurs with log records and data page writings, whereas in Aurora, only log records contribute to I/O, making the cost structure fundamentally different."
-
Instance: Switching to
Considerations
-
Instance selection: We chose the
db.r6g.xlarge
, which has similar specs to the originaldb.t3.xlarge
. - Instance type selection: Given the expected high I/O, we started with an I/O-optimized instance and planned to switch to a standard instance if daily I/O costs were below 25% of the total Aurora cost.
- Aurora Serverless consideration: We initially adopted Aurora, planning to consider switching to Serverless if it seems more appropriate later on.
-
Version compatibility issue: The original RDS's MySQL version was 8.0.36, and there was no compatible Aurora engine available as of May 2024. We considered migrating via mysqldump but decided to wait for a compatible engine release.
- On June 4, 2024, the Aurora MySQL Database Engine 3.0.7 (compatible with MySQL 8.0.36) was released, so we proceeded with using this version.
Migration Procedure
The general flow of the migration was as follows:
- Temporarily halt the service.
- Create a snapshot of the original RDS called
database-prod-snapshot
. - Apply Terraform configurations.
- Change the target of the RDS Proxy to Aurora.
- Restart the service.
- Perform functional tests.
- If the tests show no issues, stop (or delete) the original RDS.
Creating a snapshot of the original RDS (database-prod-snapshot
)
While the service was halted, we safely created a snapshot from the original RDS instance. This ensured data consistency and prevented data loss during the migration.
Applying Terraform
The following Terraform code automates the creation of the Aurora cluster, instances, cluster parameter group, instance parameter group, and subnet group. This setup is intended for the production environment:
resource "aws_rds_cluster" "aurora_cluster_prod" {
cluster_identifier = "aurora-cluster-prod"
engine = "aurora-mysql"
engine_version = "8.0.mysql_aurora.3.07.0"
db_subnet_group_name = aws_db_subnet_group.aurora_db_subnet_group_prod.name
skip_final_snapshot = false
snapshot_identifier = "arn:aws:rds:ap-northeast-1:************:snapshot:database-prod-snapshot"
vpc_security_group_ids = [var.security_group_rds_sg_id]
db_cluster_parameter_group_name = aws_rds_cluster_parameter_group.aurora_cluster_parameter_group_prod.name
deletion_protection = true
backup_retention_period = 7
preferred_maintenance_window = "sat:08:00-sat:12:00"
preferred_backup_window = "13:00-19:00"
copy_tags_to_snapshot = true
enabled_cloudwatch_logs_exports = ["audit", "error", "general", "slowquery"]
storage_type = "aurora-iopt1"
tags = {
Environment = "prod"
}
}
resource "aws_rds_cluster_instance" "aurora_cluster_instance_prod" {
identifier = "aurora-instance-prod-1"
cluster_identifier = aws_rds_cluster.aurora_cluster_prod.id
instance_class = "db.r6g.xlarge"
engine = "aurora-mysql"
engine_version = "8.0.mysql_aurora.3.07.0"
availability_zone = "ap-northeast-1a"
db_parameter_group_name = aws_db_parameter_group.aurora_instance_parameter_group_prod.name
ca_cert_identifier = "rds-ca-rsa4096-g1"
auto_minor_version_upgrade = true
performance_insights_enabled = true
monitoring_interval = 60
monitoring_role_arn = "arn:aws:iam::************:role/rds-monitoring-role"
tags = {
Environment = "prod"
}
}
resource "aws_rds_cluster_parameter_group" "aurora_cluster_parameter_group_prod" {
name = "aurora-cluster-parameter-group-prod"
family = "aurora-mysql8.0"
description = "Aurora MySQL Cluster Parameter Group for prod"
# Parameter settings are omitted for default configuration use
}
resource "aws_db_parameter_group" "aurora_instance_parameter_group_prod" {
name = "aurora-instance-parameter-group-prod"
family = "aurora-mysql8.0"
description = "Aurora MySQL Instance Parameter Group for prod"
# Parameter settings are omitted for default configuration use
}
resource "aws_db_subnet_group" "aurora_db_subnet_group_prod" {
name = "aurora-db-subnet-group-prod"
description = "Subnet group for Aurora MySQL cluster in production environment"
subnet_ids = [var.subnet_db_private_1a_id, var.subnet_db_private_1c_id]
tags = {
Environment = "prod"
}
}
Additional Notes for Development Cluster and Instances
The setup for the development Aurora cluster and instances is
similar to the production environment, but with several changes made to reduce costs and enhance flexibility:
-
Logging: The setting for CloudWatch logs output (
enabled_cloudwatch_logs_exports
) is omitted to reduce logging costs in the development environment. -
Storage Type: The
storage_type
specification is omitted, and the default standard instance is used. This optimizes costs in the development environment where I/O is less frequent. -
Deletion Protection:
deletion_protection
is disabled, allowing rapid redeployment or deletion of the cluster during development. -
Instance Class: A lower-cost instance class (
instance_class
) is chosen to reduce operational costs while ensuring adequate performance in the development environment. -
Performance Insights:
performance_insights_enabled
is disabled. In the development environment, detailed performance monitoring is often unnecessary, so it is disabled to reduce costs. -
Enhanced Monitoring: The settings for
monitoring_role_arn
andmonitoring_interval
are omitted, disabling enhanced monitoring to further reduce operational costs in the development environment.
resource "aws_rds_cluster" "aurora_cluster_dev" {
cluster_identifier = "aurora-cluster-dev"
engine = "aurora-mysql"
engine_version = "8.0.mysql_aurora.3.07.0"
db_subnet_group_name = aws_db_subnet_group.aurora_db_subnet_group_dev.name
skip_final_snapshot = false
snapshot_identifier = "arn:aws:rds:ap-northeast-1:************:snapshot:database-dev-snapshot"
vpc_security_group_ids = [var.security_group_rds_sg_id]
db_cluster_parameter_group_name = aws_rds_cluster_parameter_group.aurora_cluster_parameter_group_dev.name
deletion_protection = false
backup_retention_period = 7
preferred_maintenance_window = "sat:08:00-sat:12:00"
preferred_backup_window = "13:00-19:00"
copy_tags_to_snapshot = true
tags = {
Environment = "dev"
}
}
resource "aws_rds_cluster_instance" "aurora_cluster_instance_dev" {
identifier = "aurora-instance-dev-1"
cluster_identifier = aws_rds_cluster.aurora_cluster_dev.id
instance_class = "db.t3.medium"
engine = "aurora-mysql"
engine_version = "8.0.mysql_aurora.3.07.0"
availability_zone = "ap-northeast-1a"
db_parameter_group_name = aws_db_parameter_group.aurora_instance_parameter_group_dev.name
ca_cert_identifier = "rds-ca-rsa4096-g1"
auto_minor_version_upgrade = true
performance_insights_enabled = false
tags = {
Environment = "dev"
}
}
Changing the RDS Proxy Target to Aurora
From the RDS Proxy detail page, find the Target Group section and press the edit button to reach the following page. By setting the database to the newly created Aurora cluster aurora-cluster-prod
and saving, the connection destination is changed.
Reflection Points
Cost at the Time of Migration
I won't specify the exact charges, but the cost of Aurora on the day of the DB migration was higher than expected, which was alarming.
A large amount of writing occurred during the process of creating instances from the snapshot, which I believe led to the additional I/O costs. This aspect should have also been considered before the migration.
Session Timeout
When executing terraform apply
from GitHub Actions, the maximum session time of the AssumeRole IAM role was too short, causing a timeout. In the end, we had to reschedule and re-release on another day.
Automatic Restart After RDS Shutdown
After temporarily shutting down the original RDS, it automatically restarted after 7 days, incurring unnecessary costs. Once it was confirmed that there were no issues with the newly migrated Aurora, the RDS should have been deleted immediately.
Top comments (0)