When moving data between GBase 8a clusters, stability trumps speed. Failures often stem not from the commands themselves but from inconsistent formatting, unverified character sets, untested load links, and the absence of a validation strategy. This article focuses on the SELECT INTO OUTFILE and LOAD DATA path, providing a set of conventions for format control, protocol verification, and checkpoint‑based reliable migration.
1. Choosing the Right Migration Path
GBase 8a offers two common approaches: platform‑orchestrated pipelines (e.g., DataX) and native export‑import. The decision depends on the use case:
| Aspect | Platform Pipeline (DataX) | Native Export‑Import |
|---|---|---|
| Orchestration | Strong, supports scheduling and retries | Wrapped via shell scripts |
| Large one‑time migration | Possible but may be less efficient | More direct, higher throughput |
| Format control | Limited by plugin implementation | Fully user‑defined |
| Best for | Periodic sync jobs | Bulk homogeneous cluster migration |
For bulk homogeneous migration, the native export‑import path is often the first choice because of its short toolchain and complete format control.
2. Conventions for Export Format and Protocol
1. Lock Down the Delimiter and NULL Representation
Inconsistent formatting across different batches is a leading cause of import failures. Use a delimiter that rarely appears in the actual data, and explicitly specify a NULL placeholder.
Export example:
SET GLOBAL gbase_export_directory = OFF;
SELECT *
FROM ods.lineorder_hist
INTO OUTFILE '/data/migrate/ods_lineorder_hist_20260401.txt'
FIELDS TERMINATED BY '&|?'
NULL_VALUE 'gbasenull'
LINES TERMINATED BY '=??=';
Corresponding load:
LOAD DATA INFILE 'sftp://gbase:***@192.0.2.18//data/migrate/ods_lineorder_hist_20260401.txt'
INTO TABLE ods.lineorder_hist
DATA_FORMAT 3
FIELDS TERMINATED BY '&|?'
NULL_VALUE 'gbasenull'
LINES TERMINATED BY '=??=';
2. Protocol Selection and Connectivity Test
LOAD DATA supports ftp://, sftp://, and file://. Before any real migration, run a small‑file connectivity test to avoid surprises like SSH MaxStartups limits that can block large concurrent loads.
- sftp:// – the preferred choice, but verify SSH configuration.
- ftp:// – common in legacy setups; confirm service availability.
- file:// – only when the source and target share a filesystem.
Minimal connectivity test:
#!/bin/bash
set -e
SRC_HOST="192.0.2.18"
SRC_FILE="/data/migrate/check_lineorder.txt"
sftp gbase@${SRC_HOST} <<EOF
put ${SRC_FILE}
bye
EOF
3. Ensuring Reliability with Checksums and Checkpoint Restart
1. Checksum Comparison
After loading, compare checksums between the source and target for critical tables to ensure data integrity.
-- Source
CHECKSUM TABLE ods.lineorder_hist;
-- Target
CHECKSUM TABLE ods.lineorder_hist;
For larger tables or stricter requirements, use CHECKSUM TABLE ... EXTENDED.
2. Checkpoint‑Based Batching
For extremely large tables, split the export file into smaller chunks and load each chunk separately, recording progress. This allows you to resume from the failed chunk rather than restarting from scratch.
Batch load framework:
#!/bin/bash
for f in /data/migrate/ods_lineorder_hist_splits/*.txt; do
echo "Loading $f..."
gccli -u gbase -p*** -e "LOAD DATA INFILE 'sftp://gbase@${HOST}/${f}' ..."
if [ $? -ne 0 ]; then
echo "Failed on $f, retry later"
exit 1
fi
done
4. Automated Migration Script Example
This shell script combines export, source checksum, load, target checksum, and diff into a single automated flow:
#!/bin/bash
set -e
DB_USER="gbase"
DB_HOST="192.0.2.18"
EXPORT_DIR="/data/migrate"
TABLE="ods.lineorder_hist"
BATCH="20260401"
# 1. Export
gccli -u ${DB_USER} -h ${DB_HOST} -e "
SET GLOBAL gbase_export_directory = OFF;
SELECT * FROM ${TABLE}
INTO OUTFILE '${EXPORT_DIR}/${TABLE}_${BATCH}.txt'
FIELDS TERMINATED BY '&|?' NULL_VALUE 'gbasenull' LINES TERMINATED BY '=??=';
"
# 2. Capture source checksum
gccli -u ${DB_USER} -h ${DB_HOST} -e "CHECKSUM TABLE ${TABLE};" > /tmp/src_checksum.txt
# 3. Load into target
TARGET_HOST="192.0.2.19"
gccli -u ${DB_USER} -h ${TARGET_HOST} -e "
LOAD DATA INFILE 'sftp://gbase@${DB_HOST}//${EXPORT_DIR}/${TABLE}_${BATCH}.txt'
INTO TABLE ${TABLE}
DATA_FORMAT 3
FIELDS TERMINATED BY '&|?' NULL_VALUE 'gbasenull' LINES TERMINATED BY '=??=';
"
# 4. Capture target checksum
gccli -u ${DB_USER} -h ${TARGET_HOST} -e "CHECKSUM TABLE ${TABLE};" > /tmp/tgt_checksum.txt
# 5. Compare
diff /tmp/src_checksum.txt /tmp/tgt_checksum.txt && echo "Validation passed" || echo "Validation failed, investigate"
Standardizing format, checksums, and checkpoint logic before the actual migration saves far more time than fixing data inconsistencies after the fact in your gbase database.
Top comments (0)