DEV Community

Cong Li
Cong Li

Posted on

Data Migration from GaussDB to GBase8a

Exporting Data from GaussDB

Comparison of Export Methods

Export Tool Export Steps Applicable Scenarios and Notes
Using GDS Tool to Export Data to a Regular File System

Note: The GDS tool must be installed on the server where the data files are exported
Remote Export Mode: Export business data from the cluster to an external host.
1. Plan the export path, create GDS operation users, and set write permissions for the GDS user on the export path.
2. Install, configure, and start GDS on the server where data will be exported.
3. Create an external table in the cluster, with the location path in the format "gsfs://192.168.0.90:5000/".

Local Export Mode: Export business data from the cluster to the host where the cluster nodes are located. This strategy is tailored for numerous small files.
1. Plan the export path and create directories to store exported data files on each DN in the cluster, such as "/output_data", and change the owner of this path to omm.
2. Install, configure, and start GDS on the server where data will be exported.
3. Create an external table in the cluster, with the location path in the format "file:///output_data/".
GDS tools suitable for scenarios with high concurrency and large data exports. Utilizes multi-DN parallelism to export data from the database to data files, improving overall export performance. Does not support direct export to HDFS file system.

Notes on Remote Export:
1. Supports concurrent export by multiple GDS services, but one GDS can only provide export services for one cluster at a time.
2. Configure GDS services within the same intranet as the cluster nodes. Export speed is affected by network bandwidth. Recommended network configuration is 10GE.
3. Supported data file formats: TEXT, CSV, and FIXED. Single row data size must be <1GB.

Notes on Local Export:
1. Data will be evenly split and generated in the specified folders on the cluster nodes, occupying disk space on the cluster nodes.
2. Supports data file formats: TEXT, CSV, and FIXED. Single row data size must be <1GB.
gs_dump and gs_dumpall Tools
gs_dump supports exporting a single database or its objects.
gs_dumpall supports exporting all databases in the cluster or common global objects in each database.
The tools support exporting content at the database level, schema level, and second level. Each level can be separately defined to export the entire content, only object definitions, or only data files.
Step 1: The omm operating system user logs into any host with MPPDB service installed and executes: source $ {BIGDATA_HOME}/mppdb/.mppdb

gs_profile command to start environment variables

Step 2: Use gs_dump to export the postgres database: gs_dump -W Bigdata@123 -U jack -f /home/omm/backup/postgres_backup.tar -p 25308 postgres -F t
1. Export the entire database information, including data and all object definitions.
2. Export the full information of all databases, including each database in the cluster and common global objects (including roles and tablespace information).
3. Export only all object definitions, including: tablespace, database definitions, function definitions, schema definitions, table definitions, index definitions, and stored procedure definitions.
4. Export only data, excluding all object definitions.

GDS External Table Remote Export Example:

mkdir -p /output_data 
groupadd gdsgrp 
useradd -g gdsgrp gds_user 
chown -R gds_user:gdsgrp /output_data
/opt/bin/gds/gds -d /output_data -p 192.168.0.90:5000 -H 10.10.0.1/24 -D 
CREATE FOREIGN TABLE foreign_tpcds_reasons 
( 
r_reason_sk integer not null, 
r_reason_id char(16) not null, 
r_reason_desc char(100) 
) SERVER gsmpp_server OPTIONS (LOCATION 'gsfs://192.168.0.90:5000/', FORMAT 'CSV',ENCODING 
'utf8',DELIMITER E'\x08', QUOTE E'\x1b', NULL '') WRITE ONLY; 
INSERT INTO foreign_tpcds_reasons SELECT * FROM reasons; 
ps -ef|grep gds 
gds_user 128954 1 0 15:03 ? 00:00:00 gds -d /output_data -p 192.168.0.90:5000 -D 
gds_user 129003 118723 0 15:04 pts/0 00:00:00 grep gds 
kill -9 128954 
Enter fullscreen mode Exit fullscreen mode

GDS External Table Local Export Example:

mkdir -p /output_data 
chown -R omm:wheel /output_data 
CREATE FOREIGN TABLE foreign_tpcds_reasons 
( 
r_reason_sk integer not null, 
r_reason_id char(16) not null, 
r_reason_desc char(100) 
) SERVER gsmpp_server OPTIONS (LOCATION 'file:///output_data/', FORMAT 'CSV',ENCODING 
'utf8', DELIMITER E'\x08', QUOTE E'\x1b', NULL '') WRITE ONLY; 
INSERT INTO foreign_tpcds_reasons SELECT * FROM reasons; 
Enter fullscreen mode Exit fullscreen mode

gs_dumpall Export Example:

Export all global tablespace and user information of all databases (omm user as the administrator), the export file is in text format.

gs_dumpall -W Bigdata@123 -U omm -f /home/omm/backup/MPPDB_globals.sql -p 25308 -g 
gs_dumpall[port='25308'][2018-11-14 19:06:24]: dumpall operation successful 
gs_dumpall[port='25308'][2018-11-14 19:06:24]: total time: 1150 ms
Enter fullscreen mode Exit fullscreen mode

Export all database information (omm user as the administrator), the export file is in text format. After executing the command, there will be a long printout, and finally, when "total time" appears, it means the execution was successful.

gs_dumpall -W Bigdata@123 -U omm -f /home/omm/backup/MPPDB_backup.sql -p 25308 
gs_dumpall[port='25308'][2017-07-21 15:57:31]: dumpall operation successful 
gs_dumpall[port='25308'][2017-07-21 15:57:31]: total time: 9627 ms 
Enter fullscreen mode Exit fullscreen mode

Export all database definitions (omm user as the administrator), the export file is in text format.

gs_dumpall -W Bigdata@123 -U omm -f /home/omm/backup/MPPDB_backup.sql -p 25308 -s 
gs_dumpall[port='25308'][2018-11-14 11:28:14]: dumpall operation successful 
gs_dumpall[port='25308'][2018-11-14 11:28:14]: total time: 4147 ms 
Enter fullscreen mode Exit fullscreen mode

GBase 8a MPP Data Import:

Execute SQL file to import database definitions

gccli -ugbase -pgbase20110531 -Dtestdb -vvv -f <guessdb_out.sql >>guessdb_out.result  2>guessdb_out.err
Enter fullscreen mode Exit fullscreen mode

Note: The -D parameter must be followed by an existing database within the GBase cluster. The executed guessdb_out.sql file will operate according to the databases specified within the SQL file, regardless of the database specified after the -D parameter.

GBase 8a MPP Import Text Data
Step 1:
The data server where the data exported from GaussDB is located needs to be configured with FTP service. Ensure that all nodes in the GBase 8a MPP cluster can access the data files on the data server via FTP.

Step 2:
Organize the characteristics of the data files exported from GaussDB:

  • Encoding format
  • Field delimiter
  • Quote character
  • Null value in data files
  • Escape character (default is double quotes)
  • Whether the data file contains a header row
  • Line break style of the exported data files
  • Date format in date columns, etc.

Step 3:
Based on the characteristics organized in Step 2, write and execute the SQL for importing data in GBase 8a MPP.

Syntax format:

LOAD DATA INFILE 'file_list' 
INTO TABLE [dbname.]tbl_name 
[options] 
options: 
[CHARACTER SET charset_name] 
[DATA_FORMAT number [HAVING LINES SEPARATOR]] 
[NULL_VALUE 'string'] 
[FIELDS 
[TERMINATED BY 'string'] 
[ENCLOSED BY 'string'] 
[PRESERVE BLANKS] 
[AUTOFILL] 
[LENGTH 'string'] 
[TABLE_FIELDS 'string'] 
] 
[LINES 
[TERMINATED BY 'string'] 
] 
[MAX_BAD_RECORDS number] 
[DATETIME FORMAT format]
[DATE FORMAT format] 
[TIMESTAMP FORMAT format] 
[TIME FORMAT format] 
[TRACE number] 
[TRACE_PATH 'string'] 
[NOSPLIT] 
[PARALLEL number] 
[MAX_DATA_PROCESSORS number] 
[MIN_CHUNK_SIZE number] 
[SKIP_BAD_FILE number] 
[SET col_name = value[,...]] 
[IGNORE NUM LINES] 
[FILE_FORMAT format] 
Enter fullscreen mode Exit fullscreen mode

Load Examples:
Multi-data file load

gbase> LOAD DATA INFILE 'ftp://192.168.0.1/pub/lineitem.tbl, 
http://192.168.0.2/lineitem.tbl' INTO TABLE test.lineitem FIELDS 
TERMINATED BY '|' ENCLOSED BY '"' LINES TERMINATED BY '\n';
Enter fullscreen mode Exit fullscreen mode

Import statement with wildcards for multiple files

gbase> LOAD DATA INFILE 'ftp://192.168.10.114/data/*' INTO TABLE test.t; 
Enter fullscreen mode Exit fullscreen mode

Import statement with column, row delimiters, and enclosing characters

gbase> LOAD DATA INFILE 'ftp://192.168.0.1/pub/lineitem.tbl' INTO TABLE test.lineitem FIELDS TERMINATED BY '|' ENCLOSED BY '"' LINES TERMINATED BY '\n'
Enter fullscreen mode Exit fullscreen mode

Import statement with date format

load data infile 
'ftp://192.168.88.141/load_data/table_fields.tbl' into table test.t 
fields terminated by ',' table_fields 'i, vc, dt date "%H:%i:%s %Y-%m-%d", dt1 date "%Y-%m-%d %H:%i:%s"'; 
Enter fullscreen mode Exit fullscreen mode

Import statement with auto-fill

load data infile 'ftp://192.168.88.141/load_data/autofill.tbl' into table test.t fields terminated by '|' autofill;
Enter fullscreen mode Exit fullscreen mode

Import statement with constant values

gbase> Load data infile 'data.tbl' into table t fields terminated by '|' set c='2016-06-06 18:08:08',d='default',e=20.6; 
Enter fullscreen mode Exit fullscreen mode

Import statement ignoring header

gbase>load data infile ‘http://192.168.6.39/test.tbl’ into table data_test fields terminated by ‘|’ ignore 3 lines;
Enter fullscreen mode Exit fullscreen mode

Import statement with Blob data

gbase>load data infile ‘http://192.168.6.39/test.tbl’ into table 
data_test fields terminated by ‘|’ table_fields ‘a,b,c type_text,d’; 
gbase>load data infile ‘http://192.168.6.39/test.tbl’ into table 
data_test fields terminated by ‘|’ table_fields ‘a,b,c type_base64,d’; 
gbase>Load data infile ‘http://192.168.6.39/test.tbl’ into table 
data_test fields terminated by ‘|’ table_fields ‘a,b,c type_url,d’; 
Enter fullscreen mode Exit fullscreen mode

Top comments (0)