DEV Community

Michael
Michael

Posted on • Originally published at gbase.cn

GBase 8a Data Migration: Standardizing Export, Load, and Verification

When moving data between GBase 8a clusters, stability trumps speed. Failures often stem not from the commands themselves but from inconsistent formatting, unverified character sets, untested load links, and the absence of a validation strategy. This article focuses on the SELECT INTO OUTFILE and LOAD DATA path, providing a set of conventions for format control, protocol verification, and checkpoint‑based reliable migration.

1. Choosing the Right Migration Path

GBase 8a offers two common approaches: platform‑orchestrated pipelines (e.g., DataX) and native export‑import. The decision depends on the use case:

Aspect Platform Pipeline (DataX) Native Export‑Import
Orchestration Strong, supports scheduling and retries Wrapped via shell scripts
Large one‑time migration Possible but may be less efficient More direct, higher throughput
Format control Limited by plugin implementation Fully user‑defined
Best for Periodic sync jobs Bulk homogeneous cluster migration

For bulk homogeneous migration, the native export‑import path is often the first choice because of its short toolchain and complete format control.

2. Conventions for Export Format and Protocol

1. Lock Down the Delimiter and NULL Representation

Inconsistent formatting across different batches is a leading cause of import failures. Use a delimiter that rarely appears in the actual data, and explicitly specify a NULL placeholder.

Export example:

SET GLOBAL gbase_export_directory = OFF;

SELECT *
FROM ods.lineorder_hist
INTO OUTFILE '/data/migrate/ods_lineorder_hist_20260401.txt'
FIELDS TERMINATED BY '&|?'
NULL_VALUE 'gbasenull'
LINES TERMINATED BY '=??=';
Enter fullscreen mode Exit fullscreen mode

Corresponding load:

LOAD DATA INFILE 'sftp://gbase:***@192.0.2.18//data/migrate/ods_lineorder_hist_20260401.txt'
INTO TABLE ods.lineorder_hist
DATA_FORMAT 3
FIELDS TERMINATED BY '&|?'
NULL_VALUE 'gbasenull'
LINES TERMINATED BY '=??=';
Enter fullscreen mode Exit fullscreen mode

2. Protocol Selection and Connectivity Test

LOAD DATA supports ftp://, sftp://, and file://. Before any real migration, run a small‑file connectivity test to avoid surprises like SSH MaxStartups limits that can block large concurrent loads.

  • sftp:// – the preferred choice, but verify SSH configuration.
  • ftp:// – common in legacy setups; confirm service availability.
  • file:// – only when the source and target share a filesystem.

Minimal connectivity test:

#!/bin/bash
set -e
SRC_HOST="192.0.2.18"
SRC_FILE="/data/migrate/check_lineorder.txt"
sftp gbase@${SRC_HOST} <<EOF
put ${SRC_FILE}
bye
EOF
Enter fullscreen mode Exit fullscreen mode

3. Ensuring Reliability with Checksums and Checkpoint Restart

1. Checksum Comparison

After loading, compare checksums between the source and target for critical tables to ensure data integrity.

-- Source
CHECKSUM TABLE ods.lineorder_hist;

-- Target
CHECKSUM TABLE ods.lineorder_hist;
Enter fullscreen mode Exit fullscreen mode

For larger tables or stricter requirements, use CHECKSUM TABLE ... EXTENDED.

2. Checkpoint‑Based Batching

For extremely large tables, split the export file into smaller chunks and load each chunk separately, recording progress. This allows you to resume from the failed chunk rather than restarting from scratch.

Batch load framework:

#!/bin/bash
for f in /data/migrate/ods_lineorder_hist_splits/*.txt; do
    echo "Loading $f..."
    gccli -u gbase -p*** -e "LOAD DATA INFILE 'sftp://gbase@${HOST}/${f}' ..."
    if [ $? -ne 0 ]; then
        echo "Failed on $f, retry later"
        exit 1
    fi
done
Enter fullscreen mode Exit fullscreen mode

4. Automated Migration Script Example

This shell script combines export, source checksum, load, target checksum, and diff into a single automated flow:

#!/bin/bash
set -e
DB_USER="gbase"
DB_HOST="192.0.2.18"
EXPORT_DIR="/data/migrate"
TABLE="ods.lineorder_hist"
BATCH="20260401"

# 1. Export
gccli -u ${DB_USER} -h ${DB_HOST} -e "
SET GLOBAL gbase_export_directory = OFF;
SELECT * FROM ${TABLE}
INTO OUTFILE '${EXPORT_DIR}/${TABLE}_${BATCH}.txt'
FIELDS TERMINATED BY '&|?' NULL_VALUE 'gbasenull' LINES TERMINATED BY '=??=';
"

# 2. Capture source checksum
gccli -u ${DB_USER} -h ${DB_HOST} -e "CHECKSUM TABLE ${TABLE};" > /tmp/src_checksum.txt

# 3. Load into target
TARGET_HOST="192.0.2.19"
gccli -u ${DB_USER} -h ${TARGET_HOST} -e "
LOAD DATA INFILE 'sftp://gbase@${DB_HOST}//${EXPORT_DIR}/${TABLE}_${BATCH}.txt'
INTO TABLE ${TABLE}
DATA_FORMAT 3
FIELDS TERMINATED BY '&|?' NULL_VALUE 'gbasenull' LINES TERMINATED BY '=??=';
"

# 4. Capture target checksum
gccli -u ${DB_USER} -h ${TARGET_HOST} -e "CHECKSUM TABLE ${TABLE};" > /tmp/tgt_checksum.txt

# 5. Compare
diff /tmp/src_checksum.txt /tmp/tgt_checksum.txt && echo "Validation passed" || echo "Validation failed, investigate"
Enter fullscreen mode Exit fullscreen mode

Standardizing format, checksums, and checkpoint logic before the actual migration saves far more time than fixing data inconsistencies after the fact in your gbase database.

Top comments (0)