## MySQL Data Integration Case Sharing: 12 - BI Bingxin - Product Category Table ProductCategory_z
In today's enterprise operations, the efficient flow and integration of data are especially important. This post presents a typical case: integrating the product category table `ProductCategory_z` from a MySQL database into the target table `ProductCategory` of another MySQL database through the Qeasy Cloud platform. The name of this solution is "12 - BI Bingxin - Product Category Table - ProductCategory_z -> ProductCategory".
To achieve this data connection, we utilized several core features of the Qeasy Cloud Data Integration Platform, including high-throughput data writing capabilities, real-time monitoring and alert systems, custom data transformation logic, and a visual data flow design tool.
First, extract raw product category data from the source MySQL database via API interface calls:
```sql
SELECT * FROM ProductCategory_z;
```
Subsequently, perform necessary data cleaning and transformation operations to ensure the data meets the target table's structure requirements. It is worth noting that, due to potential field differences between the two tables, we need to customize transformation logic for mapping. For example, map the source field name `category_id` to the target field name `s_category_id`, and handle null values and default-value settings where necessary.
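The field mapping described above can be sketched in plain Python. Note that only the `category_id` → `s_category_id` mapping is named in this case; the other field names and the default values below are illustrative assumptions, not taken from the platform configuration:

```python
# Hypothetical source-to-target field mapping; only category_id ->
# s_category_id comes from the case description, the rest is illustrative.
FIELD_MAP = {
    "category_id": "s_category_id",
    "name": "s_name",
    "description": "s_description",
}

# Illustrative defaults applied when a source value is NULL/missing.
DEFAULTS = {
    "s_name": "",
    "s_description": "",
}

def transform_row(source_row):
    """Rename fields per FIELD_MAP and fill NULLs from DEFAULTS."""
    target_row = {}
    for src_field, dst_field in FIELD_MAP.items():
        value = source_row.get(src_field)
        if value is None:
            value = DEFAULTS.get(dst_field)
        target_row[dst_field] = value
    return target_row
```

Keeping the mapping and defaults in plain dictionaries makes it easy to adjust when the two tables' schemas drift apart.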
After completing data transformation, use batch execution commands to quickly and efficiently write a large number of organized product category records into the target MySQL database:
```sql
BATCH_EXECUTE INSERT INTO ProductCategory (s_category_id, s_name, s_description) VALUES (?, ?, ?);
```
Throughout the entire process, the centralized monitoring system provided by the platform allows real-time tracking of the status and performance of each operation. Once abnormal conditions such as network delays or errors are detected, we receive timely notifications and can take corresponding measures. In addition, mature mechanisms for pagination and rate limiting ensure that tasks run stably.
This integration implementation not only addresses the data interaction needs between two independent systems but also improves the transparency and efficiency of business processes. Next, we will detail the specific configuration steps to complete this solution.
### Using the Qeasy Cloud Data Integration Platform to Call the MySQL Interface for Data Acquisition and Processing
In the Qeasy Cloud Data Integration Platform, calling the source system's MySQL `select` interface to obtain and process data is the first step in the data processing lifecycle. This section delves into how to implement this process by configuring metadata, and shares the relevant technical details.
#### Metadata Configuration Analysis
First, we need to understand each field in the metadata configuration and its function. The following is the provided metadata configuration:
```json
{
  "api": "select",
  "effect": "QUERY",
  "method": "SQL",
  "number": "Id",
  "id": "Id",
  "request": [
    {
      "field": "main_params",
      "label": "Main Parameters",
      "type": "object",
      "describe": "Corresponds to the main parameters of the SQL statement in other request fields, must be in one-to-one correspondence.",
      "value": "1",
      "children": [
        {
          "field": "limit",
          "label": "Limit the number of rows returned by the result set",
          "type": "int",
          "describe": "Necessary parameter! The LIMIT clause is used to limit the number of rows returned by the query result. It specifies the maximum number of rows that the query should return. For example, LIMIT 10 means the query result contains at most 10 rows of data. This is very useful for pagination queries, as it can return a certain number of results in each query.",
          "value": 5000
        },
        {
          "field": "offset",
          "label": "Offset",
          "type": "int",
          "describe": "The OFFSET clause is used to specify the starting position or offset of the query result. It indicates which row of the result set the query should start returning data from. For example, OFFSET 20 means the query should start returning data from the 21st row of the result set. When used in conjunction with the LIMIT clause, OFFSET specifies the starting row number of the query result."
        }
      ]
    }
  ],
  ...
}
```
#### Main SQL Statement Optimization and Parameter Binding
The main_sql field in the metadata configuration defines the main SQL statement:
```json
{
  ...
  "otherRequest": [
    {
      "field": "main_sql",
      "label": "Main SQL Statement",
      "type": "string",
      "describe": "For the assignment of dynamic syntax fields such as :limit in the main SQL query statement, to ensure that the fields correspond one-to-one with the request parameters, we can adopt the parameter binding method. The following are the specific optimization steps:\n1. Replace the dynamic fields :limit in the main SQL query statement with placeholders (e.g., ?) to indicate the position of the parameters.\n2. Before executing the query, use the parameter binding method to correspond and bind the values of the request parameters with the placeholders.\nThrough this optimization method, we can improve the readability and maintainability of the query statement, and ensure the correct correspondence between dynamic syntax fields and request parameters. This can better ensure the accuracy and security of the query.",
      "value": "select * from ProductCategory_z limit :limit offset :offset"
    }
  ],
  ...
}
```
When executing this SQL statement, it is necessary to replace `:limit` and `:offset` with actual values. This method not only improves code readability but also enhances security.
Specific Steps:
- Replace the dynamic fields `:limit` and `:offset` in the main SQL statement with placeholders (e.g., `?`).
- Before executing the query, use parameter binding to associate the request parameters (such as 5000 and 0) with the placeholders.
Example:
```sql
SELECT * FROM ProductCategory_z LIMIT ? OFFSET ?
```
Then pass in specific values during execution:
```sql
SELECT * FROM ProductCategory_z LIMIT 5000 OFFSET 0
```
#### Data Request and Cleaning
In actual operations, when calling the MySQL database through the API interface, the following points need to be noted:
- Connect to the database: Ensure the database connection information is correct, including database address, port, username, and password.
- Construct the request: Build the request object according to the metadata configuration, including setting necessary parameters such as `limit` and `offset`.
- Execute the query: Execute the query using the constructed SQL statement and bound parameters.
- Process the results: Clean and preprocess the returned data, such as removing invalid data and format conversion.
The following is a simplified sample code snippet to demonstrate how to implement the above steps through configuration on the Qeasy Cloud platform:
```python
import mysql.connector

# Database connection information
db_config = {
    'user': 'username',
    'password': 'password',
    'host': '127.0.0.1',
    'database': 'database_name'
}

# Establish database connection
conn = mysql.connector.connect(**db_config)
cursor = conn.cursor()

# Construct SQL statement and bind parameters
query = 'SELECT * FROM ProductCategory_z LIMIT %s OFFSET %s'
params = (5000, 0)

# Execute query
cursor.execute(query, params)

# Obtain and process results
results = cursor.fetchall()
for row in results:
    # Data cleaning and preprocessing logic
    print(row)

# Close connections
cursor.close()
conn.close()
```
#### Summary
Through the above steps, we can efficiently call the MySQL interface to obtain and process data. In the Qeasy Cloud Data Integration Platform, reasonable configuration of metadata enables support for data processing of complex business requirements. This not only improves development efficiency but also ensures the stability and security of system operation.
### The Second Step in the Data Integration Lifecycle: ETL Transformation and Writing via the MySQL API Interface
In the data integration process, ETL (Extract, Transform, Load) is a crucial link. This section delves into how to use the Qeasy Cloud Data Integration Platform to perform ETL transformation on the data integrated from the source platform and ultimately write it to the target platform through the MySQL API interface.
#### Metadata Configuration Analysis
First, we need to understand the metadata configuration, which will guide us in performing data transformation and writing operations. The following is the specific metadata configuration:
```json
{
  "api": "batchexecute",
  "effect": "EXECUTE",
  "method": "SQL",
  "idCheck": true,
  "request": [
    {"field": "Id", "label": "Id", "type": "string", "value": "{Id}"},
    {"field": "CreateDate", "label": "CreateDate", "type": "datetime", "value": "{CreateDate}", "default": "1970-01-01 00:00:00"},
    {"field": "Code", "label": "Code", "type": "string", "value": "{Code}"},
    {"field": "Name", "label": "Name", "type": "string", "value": "{Name}"},
    {"field": "ParentId", "label": "ParentId", "type": "string", "value": "{ParentId}"},
    {"field": "Level", "label": "Level", "type": "int", "value": "{Level}"}
  ],
  ...
}
```
#### Data Request and Cleaning
In the ETL process, the first step is data request and cleaning. We obtain data from the source platform and perform the necessary cleaning and formatting to ensure data accuracy and consistency. For example, if the `CreateDate` field has no provided value, it is set to the default `"1970-01-01 00:00:00"`.
#### Data Transformation
Next is the data transformation phase. According to the metadata configuration, we need to map the data fields of the source platform to the field format required by the target platform. This process includes type conversion, default value setting, and field mapping.
Examples:
- The `Id` field is mapped to `{Id}` with the type `string`.
- The `CreateDate` field is mapped to `{CreateDate}` with the type `datetime` and a default value.
- Fields such as `Code`, `Name`, `ParentId`, and `Level` are also mapped and defined with corresponding types.
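A minimal Python sketch of how such metadata-driven mapping might work. The `REQUEST_FIELDS` list below is a simplified subset of the configuration shown above (only `field`, `type`, and `default` are used), and the conversion helper is illustrative rather than the platform's actual transformation engine:

```python
from datetime import datetime

# Simplified subset of the `request` metadata: field name, type, and
# optional default value.
REQUEST_FIELDS = [
    {"field": "Id", "type": "string"},
    {"field": "CreateDate", "type": "datetime", "default": "1970-01-01 00:00:00"},
    {"field": "Code", "type": "string"},
    {"field": "Name", "type": "string"},
    {"field": "ParentId", "type": "string"},
    {"field": "Level", "type": "int"},
]

def convert_value(value, type_name):
    """Coerce a raw value to the type named in the metadata."""
    if type_name == "int":
        return int(value)
    if type_name == "datetime":
        return datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
    return str(value)

def map_row(source_row):
    """Apply defaults and type conversion per the metadata, field by field."""
    out = {}
    for meta in REQUEST_FIELDS:
        value = source_row.get(meta["field"])
        if value is None:
            value = meta.get("default")
        out[meta["field"]] = convert_value(value, meta["type"])
    return out
```

Driving the conversion from the metadata list, rather than hard-coding each field, is what lets the same pipeline handle different tables by swapping configuration.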
#### Writing Data to the Target Platform
After completing data transformation, we use the MySQL API interface to write the processed data to the target platform. According to the `otherRequest` part of the metadata configuration, we construct a SQL `REPLACE INTO` statement:
```sql
REPLACE INTO ProductCategory (Id, CreateDate, Code, Name, ParentId, Level) VALUES (?, ?, ?, ?, ?, ?)
```
This statement writes multiple transformed records into the database at once through batch execution. Each placeholder corresponds to a field value, and specific data is passed through the API interface.
#### Batch Execution and Performance Optimization
To improve efficiency, we use the `batchexecute` method, inserting up to 1000 records per batch. This not only reduces the number of network round trips but also significantly improves write speed.
```json
{
  ...
  "otherRequest": [
    {
      "field": "main_sql",
      "label": "Main Statement",
      "type": "string",
      "describe": "111",
      "value": "REPLACE INTO ProductCategory (Id, CreateDate, Code, Name, ParentId, Level) VALUES"
    },
    {
      "field": "limit",
      "label": "limit",
      "type": "string",
      "value": "1000"
    }
  ],
  ...
}
```
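Putting the `main_sql` statement and the limit of 1000 together, the batch write could be sketched like this against any DB-API cursor (e.g. one from `mysql.connector`, using `%s` placeholders). This is an illustrative sketch, not the platform's internal implementation:

```python
def batch_replace(cursor, rows, batch_size=1000):
    """Write rows with REPLACE INTO in chunks of at most batch_size,
    mirroring the limit of 1000 in the metadata configuration.

    `cursor` is any DB-API cursor; `rows` is a list of 6-tuples matching
    the column order below. Returns the number of rows written.
    """
    sql = ("REPLACE INTO ProductCategory "
           "(Id, CreateDate, Code, Name, ParentId, Level) "
           "VALUES (%s, %s, %s, %s, %s, %s)")
    written = 0
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        cursor.executemany(sql, chunk)  # one round trip per batch
        written += len(chunk)
    return written
```

`executemany` lets the driver send each chunk in a single round trip, which is where the bulk of the speedup over row-by-row inserts comes from.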
#### Practical Case: Data Integration of the Product Category Table
Taking the product category table (ProductCategory_z) as an example, we need to convert it into a format acceptable to the target platform and write it to the MySQL database. The specific steps are as follows:
- Extract data: Extract data from the product category table of the source platform.
- Clean and transform: Clean and format the extracted data according to the metadata configuration.
- Construct SQL statement: Build a batch insert statement using the `main_sql` value in the configuration.
- Execute insert operation: Batch insert the processed data into the target MySQL database through the API interface.
The above is the detailed technical process of performing ETL transformation using the Qeasy Cloud Data Integration Platform and writing to the target platform through the MySQL API interface. In actual operations, the metadata configuration needs to be adjusted according to specific business requirements to ensure efficient and accurate completion of data integration tasks.
For more information, please visit the official website: https://www.qeasy.cloud/
GitHub Pages: https://xiaoyimeng666.github.io/xiaoyimeng/