DEV Community

Michael
Michael

Posted on • Originally published at gbase.cn

GBase 8a Data Import & Export Guide: gload, LOAD DATA, and SELECT INTO OUTFILE

Importing and exporting data are among the most frequent operations in a gbase database MPP data warehouse. This guide covers tool selection, core parameter configuration, character set handling, error troubleshooting, and production‑tuning experience.

1. Import Method Selection

Method Command Throughput Best For
gload gload -f load.cfg Highest Production bulk loads, parallel processing, checkpoint resume
LOAD DATA INFILE SQL statement Medium Single‑file loads, simple syntax, development/testing
INSERT INTO ... VALUES SQL statement Low Small data writes, not for bulk

For large imports (>1 GB), gload is strongly recommended — its parallel processing far exceeds LOAD DATA.

2. gload in Detail

Configuration File

gload is driven by a .cfg file. A full example:

# load_orders.cfg
host     = 10.168.10.26
port     = 5258
user     = gbase
password = your_password
database = sales_db
table    = orders

# Data files (wildcards supported to load multiple files at once)
infile   = /data/orders/orders_2024_*.csv

# File format
fields terminated by ','       # field delimiter
enclosed by '"'                 # string quoting
lines terminated by '\n'        # line terminator
ignore 1 lines                  # skip header line

# Column mapping (file columns mapped to table columns in order)
(order_id, customer_id, dept_id, amount, status, order_date, create_time)

# Error handling
errors = 1000                   # max bad rows allowed (exceeding aborts the load)
Enter fullscreen mode Exit fullscreen mode

Execute the load:

gload -f load_orders.cfg
Enter fullscreen mode Exit fullscreen mode

Common File Format Settings

  • CSV: fields terminated by ',', enclosed by '"'
  • TSV: fields terminated by '\t'
  • Pipe‑delimited: fields terminated by '|'
  • Skip header & remap columns: ignore 1 lines + column list (order_id, amount, order_date, status)

Character Set Configuration

Mismatched character sets are the most common cause of garbled data after import. Specify the file encoding explicitly in the cfg file:

character_set = utf8    # encoding of the data file
Enter fullscreen mode Exit fullscreen mode

Server‑side parameters (in gbase.cnf) should match. If the file is GBK‑encoded, set character_set = gbk in the cfg and ensure the table uses DEFAULT CHARSET=utf8 — the server will auto‑convert.

Collecting Error Rows

After enabling error collection, query the error log using the task_id printed during the load:

SELECT task_id, error_row_no, error_msg,
       SUBSTR(raw_data, 1, 200) AS raw_line
FROM gclusterdb.load_error_log
WHERE task_id = '20240601_143022_000001'
LIMIT 50;
Enter fullscreen mode Exit fullscreen mode

gload Performance Tuning Parameters

Parameter Scope Description Recommended
gcluster_loader_max_data_processors gcluster Concurrent processing threads physical CPU cores / 2
gcluster_loader_min_chunk_size gcluster Chunk size per gnode (bytes) 67108864 (64 MB)
gbase_loader_parallel_degree gnode Parallel write threads per gnode 4–8
gbase_loader_buffer_count gnode Number of write buffers 4
gbase_loader_read_timeout gnode Data read timeout (seconds) 300

3. LOAD DATA INFILE in Detail

Basic Syntax

LOAD DATA INFILE '/data/orders/orders.csv'
INTO TABLE orders
CHARACTER SET utf8
FIELDS TERMINATED BY ','
       ENCLOSED BY '"'
       ESCAPED BY '\\'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(order_id, customer_id, dept_id, amount, status, order_date, @create_time)
SET create_time = STR_TO_DATE(@create_time, '%Y-%m-%d %H:%i:%s');
Enter fullscreen mode Exit fullscreen mode

Key points: the INFILE path is on the gcluster node; use @var to capture column values and transform them in SET.

LOCAL Keyword

LOAD DATA LOCAL INFILE '/local/path/data.csv'
INTO TABLE orders
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n';
Enter fullscreen mode Exit fullscreen mode

LOCAL reads the file from the client — convenient for small test files, but not recommended for production.

4. Exporting with SELECT INTO OUTFILE

Basic Usage

SELECT order_id, customer_id, amount, order_date
INTO OUTFILE '/data/export/orders_2024.csv'
CHARACTER SET utf8
FIELDS TERMINATED BY ','
       ENCLOSED BY '"'
       ESCAPED BY '\\'
LINES TERMINATED BY '\n'
FROM orders
WHERE order_date >= '2024-01-01';
Enter fullscreen mode Exit fullscreen mode

Notes: the export path is local to the gcluster node; the target file must not already exist; use gbase_export_directory to restrict allowed write directories.

Export a Specific Partition

SELECT *
INTO OUTFILE '/data/export/orders_2024q1.csv'
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
FROM orders PARTITION (p2024q1);
Enter fullscreen mode Exit fullscreen mode

Concurrent Export Script for Large Tables

#!/bin/bash
for month in 01 02 03 04 05 06 07 08 09 10 11 12; do
    gccli -u gbase -p password -e "
        SELECT * INTO OUTFILE '/data/export/orders_2024${month}.csv'
        FIELDS TERMINATED BY ','
        LINES TERMINATED BY '\n'
        FROM orders
        WHERE order_date BETWEEN '2024-${month}-01' AND LAST_DAY('2024-${month}-01')
    " &
done
wait
echo "All exports done"
Enter fullscreen mode Exit fullscreen mode

5. Monitoring Export Progress

-- Running exports
SELECT task_id, table_name, status, start_time,
       exported_rows,
       TIMESTAMPDIFF(SECOND, start_time, NOW()) AS elapsed_sec
FROM gclusterdb.export_task
WHERE status = 'RUNNING';

-- Historical exports
SELECT task_id, table_name, status, exported_rows,
       start_time, end_time
FROM gclusterdb.export_task
ORDER BY start_time DESC
LIMIT 20;
Enter fullscreen mode Exit fullscreen mode

6. Common Issues and Solutions

  • Fewer rows imported than expected, no error: likely illegal characters silently skipped. Enable gbase_loader_logs_collect = ON and check load_error_log.
  • Slow import with low I/O: concurrency is too low. Increase gcluster_loader_max_data_processors and gbase_loader_parallel_degree.
  • OUTFILE "Can't create/write to file": check directory existence, file pre‑existence, and gbase_export_directory restrictions.
  • Date format errors: use @var + SET with STR_TO_DATE:
LOAD DATA INFILE '/data/orders.csv'
INTO TABLE orders
FIELDS TERMINATED BY ','
(@order_id, @customer_id, @amount, @order_date_str)
SET
    order_id    = @order_id,
    customer_id = @customer_id,
    amount      = @amount,
    order_date  = STR_TO_DATE(@order_date_str, '%Y%m%d');
Enter fullscreen mode Exit fullscreen mode

7. Best Practices

Scenario Recommendation
Daily incremental loads (>1 GB) gload with config file, concurrent multi‑file batches
Dev/test small table loads LOAD DATA LOCAL INFILE
Periodic full exports for backup SELECT INTO OUTFILE with partition‑based concurrent export
Cross‑database migration gload combined with SELECT INTO OUTFILE pipeline
Data quality inspection Enable gbase_loader_logs_collect first to see error distribution

gload is the production workhorse. It parallelises processing at both the gcluster and gnode layers, fully leveraging the multi‑node concurrent write capability of the MPP cluster — throughput is typically 3–5× that of LOAD DATA.

Top comments (0)