Relational database migration to an AWS S3 data lake using AWS DMS, Part II:
This is the second blog in a multi-part series on considerations and observations from migrating relational databases to an AWS S3 data lake using the AWS DMS service. Relational databases such as Oracle, MS SQL Server, MySQL, and PostgreSQL were migrated to AWS.
Part I in this series covered considerations such as:
• Modified date not populated properly at the source
• Enabling supplemental logging for Oracle as the source
• Network bandwidth issues between source and target databases
This blog covers the following considerations:
Missing LOB columns for change data capture from MS SQL Server as the source:
The full load from MS SQL Server into the AWS S3 data lake was completed using AWS DMS. Most of these were small to medium-sized tables, and data and record-count validation was completed and matched the source.
Change data capture (CDC) tasks were configured for the tables with a one-hour interval for writing data into the data lake, and data was populated accordingly. Data validation is essential and critical post-migration to make sure all columns and data types migrated correctly and that there are no data issues such as truncation, mismatches, or incorrectly migrated datetime, int, or float columns. During data validation, it was observed that a few columns were missing data in the CDC tasks, even though data was available at the source and had been populated during the full load for the same columns.
Upon further analysis the observation was as follows:
• These columns were of the following data types:
- Xml
- Varchar(max)
- Nvarchar(max)
- Text
- Image
• These are LOB columns in MS SQL Server database.
• It was therefore found that LOB columns were not being populated during ongoing replication.
• However, this was not the case for all tables with LOB columns; some tables with LOB columns were getting data populated correctly.
After analysing CloudWatch logs for these tasks, the following error logs were found:
"2024-08-27T11:49:23 [SOURCE_UNLOAD ]W: Column 'COLUMN NAME' was removed from table definition 'SCHEMA NAME.TABLE NAME': the column data type is LOB and the table has no primary key or unique index (metadatamanager.c:2492)"
Upon investigating this issue further, the AWS documentation revealed that:
During CDC, AWS DMS supports CLOB data types only in tables that include a primary key.
• Upon validation, it was found that tables whose LOB columns were populated correctly had primary keys.
• Tables with no data populated for LOB columns did not have primary keys.
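Before adding primary keys, it helps to enumerate which tables are affected. The sketch below shows one way to do this: a T-SQL query against the SQL Server catalog views that flags LOB columns in tables lacking a primary key, plus a small helper to filter the results. The query text and the helper are illustrative assumptions, not part of AWS DMS; running the query requires a live connection (e.g. via pyodbc).

```python
# T-SQL that lists LOB columns (xml, text, image, varchar(max), nvarchar(max))
# together with a flag showing whether the owning table has a primary key.
# sys.columns.max_length is -1 for the (max) variants.
LOB_NO_PK_QUERY = """
SELECT s.name AS schema_name,
       t.name AS table_name,
       c.name AS column_name,
       ty.name AS data_type,
       CASE WHEN EXISTS (
            SELECT 1 FROM sys.indexes i
            WHERE i.object_id = t.object_id AND i.is_primary_key = 1
       ) THEN 1 ELSE 0 END AS has_primary_key
FROM sys.tables t
JOIN sys.schemas s ON s.schema_id = t.schema_id
JOIN sys.columns c ON c.object_id = t.object_id
JOIN sys.types ty ON ty.user_type_id = c.user_type_id
WHERE ty.name IN ('xml', 'text', 'image', 'varchar', 'nvarchar')
  AND (ty.name IN ('xml', 'text', 'image') OR c.max_length = -1)
"""

def tables_missing_pk(rows):
    """Given (schema, table, column, type, has_pk) result rows, return the
    tables whose LOB columns DMS CDC will skip (no primary key)."""
    return sorted({(s, t) for s, t, _col, _typ, has_pk in rows if not has_pk})
```

The helper works on the rows returned by the query, so it can be unit-tested without a database connection.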
Primary keys were added to all the affected tables and the data was reloaded; it was then confirmed that the LOB columns were populated and matched the source.
Note: AWS DMS can migrate LOB columns in either full LOB mode or limited LOB mode, which caps the size of the LOB data written to the target.
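The LOB mode mentioned in the note is controlled through the task's TargetMetadata settings. A minimal sketch of switching a task to limited LOB mode with a size cap is shown below; the task ARN is a placeholder, and the key names follow the AWS DMS task-settings schema (LobMaxSize is in kilobytes).

```python
import json

def limited_lob_settings(max_size_kb=64):
    """Return task-settings JSON enabling limited LOB mode with a size cap."""
    return json.dumps({
        "TargetMetadata": {
            "FullLobMode": False,        # full mode migrates LOBs in chunks, no cap
            "LimitedSizeLobMode": True,  # limited mode truncates LOBs at LobMaxSize
            "LobMaxSize": max_size_kb,   # maximum LOB size, in KB
        }
    })

# Applying the settings would look like this (requires boto3 and credentials):
# import boto3
# dms = boto3.client("dms")
# dms.modify_replication_task(
#     ReplicationTaskArn="arn:aws:dms:...:task:EXAMPLE",
#     ReplicationTaskSettings=limited_lob_settings(64),
# )
```

Limited LOB mode is faster but truncates any LOB larger than the cap, so the cap should be chosen from the largest LOB observed at the source.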
MS SQL Server time zone issue for timestamp columns:
Another issue found during data migration of MS SQL Server tables was that timestamp columns were populated incorrectly in the target data store.
Timestamp columns in all the source tables were populated in a non-UTC time zone: Asia/Calcutta (UTC+5:30).
During post-migration data validation, the timestamp columns in the target data store had different values from the source data.
Analysis showed that the data at the target had been written in a different time zone: UTC.
Further analysis showed that one property needs to be set in the extra connection attributes for MS SQL Server as the source: serverTimezone.
For a source endpoint whose data is in a non-UTC time zone, set it as:
serverTimezone=Asia/Calcutta;
Upon setting this property for the source endpoint and reloading the data into the S3 data lake, it was found that the timestamp column was populated correctly with the same value as source and the non-UTC Asia/Calcutta time zone.
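Extra connection attributes are set on the DMS endpoint itself. The sketch below merges serverTimezone into an endpoint's existing attribute string without clobbering other attributes; the endpoint ARN and the pollingInterval attribute in the test are placeholders, and applying the change requires boto3 and AWS credentials.

```python
def with_server_timezone(existing_attrs, tz="Asia/Calcutta"):
    """Merge serverTimezone into a semicolon-separated extra-connection-
    attribute string, replacing any existing serverTimezone entry."""
    attrs = [
        a for a in existing_attrs.split(";")
        if a and not a.lower().startswith("servertimezone=")
    ]
    attrs.append(f"serverTimezone={tz}")
    return ";".join(attrs) + ";"

# Applying it to the source endpoint (requires boto3 and credentials):
# import boto3
# dms = boto3.client("dms")
# dms.modify_endpoint(
#     EndpointArn="arn:aws:dms:...:endpoint:EXAMPLE",
#     ExtraConnectionAttributes=with_server_timezone("", "Asia/Calcutta"),
# )
```

After modifying the endpoint, the affected tables need to be reloaded for the corrected timestamps to reach the target.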
A similar extra connection attribute, initstmt=SET time_zone, exists for a MySQL-compatible target database; if the target is in a non-UTC time zone, set it as follows:
initstmt=SET time_zone='Asia/Kolkata';
RDBMS instances paused or shut down during non-business hours:
Another issue observed was AWS DMS change data capture tasks failing for MySQL and Oracle databases because the source database was unavailable for a few hours each day, mainly during non-business hours. The errors logged in CloudWatch were:
Oracle:
2024-08-08T06:24:45 [SOURCE_CAPTURE ]W: Oracle error code is '1089' ORA-01089: immediate shutdown or close in progress - no operations are permitted Process ID: 41371 Session ID: XXXX Serial number: XXXXX (oracdc_merger.c:176)
2024-15-08T06:24:47 [METADATA_MANAGE ]W: Oracle error code is '12514' ORA-12514: TNS:listener does not currently know of service requested in connect descriptor (oracle_endpoint_conn.c:914)
Last Error Task 'TASK NAME' was suspended after 9 successive recovery failures Stop Reason FATAL_ERROR Error Level FATAL
MySQL:
2024-06-21T46:16:00 [METADATA_MANAGE ]E: RetCode: SQL_ERROR SqlState: HY000 NativeError: 2003 Message: [unixODBC][MySQL][ODBC 8.0(w) Driver]Can't connect to MySQL server on 'XX.XX.XX.XX' (XXX) [XXXXXXX] (ar_odbc_conn.c:XXX)
Last Error Task 'TASK NAME' was suspended after 9 successive recovery failures Stop Reason FATAL_ERROR Error Level FATAL
There could be various reasons for a DMS task being unable to connect to the source: the source not being available, network issues, changes in permissions, security configuration settings, etc.
In this case, further analysis found that these were non-prod servers, hosted in the cloud or on premises, that are paused or shut down daily during non-business hours for cost optimization or regular maintenance.
AWS DMS change data capture tasks could not connect to the source databases during non-business hours, and after retrying a certain number of times the tasks failed. Such tasks need to be resumed or restarted once the source server is up and running. To avoid tasks failing regularly, either minimize these pauses and shutdowns or keep the servers running while CDC tasks are active.
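Resuming a suspended task can be done through the DMS API. The sketch below picks the appropriate start type for a task: 'resume-processing' continues CDC from the last checkpoint, while a task that never ran needs 'start-replication'. The task ARN and the helper are illustrative assumptions; the actual call requires boto3 and AWS credentials.

```python
def resume_type(task_status):
    """Pick the DMS StartReplicationTaskType based on the task's status."""
    # A task in 'ready' state has never run and needs a fresh start;
    # a stopped or failed task resumes from its checkpoint.
    return "start-replication" if task_status == "ready" else "resume-processing"

# Resuming the task once the source server is back up:
# import boto3
# dms = boto3.client("dms")
# dms.start_replication_task(
#     ReplicationTaskArn="arn:aws:dms:...:task:EXAMPLE",
#     StartReplicationTaskType=resume_type("stopped"),
# )
```

A scheduled job (for example, an EventBridge rule firing shortly after the servers come back online) could wrap this call to automate the daily resume.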
Conclusion:
This is Part II of the multi-part series on relational database migration challenges with AWS DMS and their implemented solutions. This blog covered considerations mainly for MS SQL Server as the source, along with source-availability issues affecting MySQL and Oracle CDC tasks.