Michael

Posted on Jun 11 • Originally published at gbase.cn

GBase 8c Backup, Recovery, and Data Security: A Production‑Ready Guide

#gbase #database #数据库 #security

Backup and recovery is the last line of defense for a GBase 8c database. This guide covers backup types, strategy design, step-by-step recovery procedures, and data security measures.

1. Backup Types and Use Cases

Type	Contents	Backup Speed	Restore Speed	Use Case
Full	All data, metadata, and configuration	Slow	Fast (single step)	Weekly; before major changes
Incremental	Changes since the last backup of any type	Fast	Slower (full + every incremental, applied in order)	Daily
Differential	Changes since the last full backup	Moderate	Medium (full + latest differential only)	Moderate change rates; shorter restore chain
WAL archive	Continuous copy of the write-ahead log	Continuous, low overhead	Depends on replay volume; enables point-in-time recovery	Always on, to minimise RPO
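The main practical difference between incremental and differential backups is the length of the restore chain. The sketch below (POSIX sh; `restore_chain` and the `full:`/`incr:`/`diff:` labels are a hypothetical naming scheme, not GBase 8c tool output) prints which backups a restore would need, given the backup history oldest first.

```shell
# restore_chain MODE LABEL...
#   MODE:  incremental or differential
#   LABEL: full:<date>, incr:<date> or diff:<date>, oldest first
# Prints the backups a restore must apply, in order; returns 1 if the
# history contains no full backup.
restore_chain() {
  local mode=$1 b full='' rest=''
  shift
  for b in "$@"; do
    case $b in
      full:*)
        # A newer full backup starts a new chain.
        full=$b
        rest='' ;;
      incr:*)
        # Every incremental since the last full is needed, in order.
        if [ "$mode" = incremental ] && [ -n "$full" ]; then rest="$rest $b"; fi ;;
      diff:*)
        # Only the newest differential since the last full is needed.
        if [ "$mode" = differential ] && [ -n "$full" ]; then rest=" $b"; fi ;;
    esac
  done
  [ -n "$full" ] || return 1
  # $rest is left unquoted on purpose so each label becomes its own line.
  printf '%s\n' "$full" $rest
}
```

For example, `restore_chain incremental full:20260329 incr:20260330 incr:20260331` lists all three backups, while the same history with `diff:` labels in differential mode needs only the full backup and the last differential.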

Backups are coordinated by the Coordinator Node (CN); each Data Node (DN) backs up its own data. Recovery must follow the same dependency order: CN first, then the DNs.

2. Backup Strategy Design

Core principles:

Meet RTO/RPO targets (e.g., RTO ≤ 30 min, RPO ≤ 5 min).
Run during off‑peak hours (0:00–3:00).
Combine full + incremental + WAL archiving.
Validate every backup immediately; alert on failure.
Store backups on a physically separate device, plus an offsite copy.
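Validation and alerting are easiest to enforce with a small wrapper around every scheduled job. This is a minimal sketch, assuming a hypothetical `ALERT_HOOK` command (for example a mail or webhook script) for delivery; `run_and_alert` itself only relies on the wrapped command's exit status.

```shell
# run_and_alert JOB COMMAND [ARG...]
# Runs a backup or validation command and raises an alert if it fails.
# ALERT_HOOK (hypothetical) receives the alert text as its only argument;
# without it the alert goes to stderr only. The exit status is passed through.
run_and_alert() {
  local job=$1 status=0 msg
  shift
  "$@" || status=$?
  if [ "$status" -ne 0 ]; then
    msg="backup job '$job' FAILED with exit status $status"
    echo "$msg" >&2
    if [ -n "${ALERT_HOOK:-}" ]; then
      "$ALERT_HOOK" "$msg"
    fi
  fi
  return "$status"
}
```

A cron job could then run `run_and_alert weekly-full gs_basebackup -D /backup/gbase8c/full/$(date +%Y%m%d) -h 192.168.1.100 -p 5432 -U backup_user` (plus the options from section 3.1), so a failed backup is never silent.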

Example strategy (1 TB cluster):

Full: Sunday 0:00, retained 30 days.
Incremental: Mon–Sat 0:00, retained 7 days.
WAL: continuous archiving with a forced segment switch at least every 5 minutes (archive_timeout = 300), retained 15 days.
Validation: automatic after each backup.
Storage: dedicated NAS + offsite backup.
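The schedule can be driven from a single nightly cron entry that picks the backup type by weekday. The sketch below assumes a hypothetical wrapper script around these helpers. It also computes how far back point-in-time recovery is guaranteed: replaying to a target requires a full backup taken before that target plus every WAL segment since that backup, so with 15 days of WAL and weekly full backups the guarantee is 8 days, even though full backups are kept for 30.

```shell
# backup_type_for_day DAY
#   DAY as printed by `date +%u`: 1 = Monday ... 7 = Sunday.
# Sunday gets the weekly full backup; Monday to Saturday get incrementals.
backup_type_for_day() {
  if [ "$1" -eq 7 ]; then
    echo full
  else
    echo incremental
  fi
}

# pitr_window_days WAL_RETENTION_DAYS FULL_INTERVAL_DAYS
# Days back for which point-in-time recovery is guaranteed: the newest full
# backup before a target can be up to one full-backup interval older than the
# target, and all WAL from that backup onward must still be retained.
pitr_window_days() {
  echo $(( $1 - $2 ))
}
```

A crontab line such as `0 0 * * * /usr/local/bin/gbase_nightly_backup.sh` (hypothetical script path) would call `backup_type_for_day "$(date +%u)"` and run the full or incremental command accordingly.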

3. Essential Backup Commands with Steps

3.1 Full Backup

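# Run on the CN node to back up all data, metadata, and configuration.
# -h/-p: CN IP and port; -U: dedicated backup user with the replication privilege
# -F t: tar format (produces .tar archives in the target directory)
# -X stream: stream the WAL generated during the backup so the copy is self-consistent
#            (releases based on PostgreSQL before 10 allow stream only with -F p; use -X fetch with -F t there)
# -P: show progress; -v: verbose logging
gs_basebackup -D /backup/gbase8c/full/$(date +%Y%m%d) \
  -h 192.168.1.100 -p 5432 \
  -U backup_user \
  -F t -X stream \
  -P -v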

3.2 Incremental Backup

# Incremental backup relative to the previous full backup (given by --base-backup-dir)
gs_backup incremental \
  --backup-dir /backup/gbase8c/incremental/$(date +%Y%m%d) \
  --base-backup-dir /backup/gbase8c/full/20260330 \
  --host 192.168.1.100 \
  --port 5432 \
  --user backup_user \
  --verbose
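Hard-coding the previous full backup date (20260330 above) breaks the week after. A minimal sketch, assuming full backups live in dated YYYYMMDD directories under /backup/gbase8c/full as in section 3.1; `latest_full_backup` is a hypothetical helper.

```shell
# latest_full_backup [ROOT]
# Prints the newest YYYYMMDD directory under ROOT (default /backup/gbase8c/full).
# Prints nothing and returns 1 if no dated directory exists.
latest_full_backup() {
  local root=${1:-/backup/gbase8c/full} d latest=''
  for d in "$root"/[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]; do
    # Glob results are sorted, so the last matching directory is the newest date.
    [ -d "$d" ] && latest=$d
  done
  [ -n "$latest" ] || return 1
  printf '%s\n' "$latest"
}
```

The incremental job could then pass `--base-backup-dir "$(latest_full_backup)"` instead of a fixed path.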

3.3 Enable WAL Archiving

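-- Configure continuous WAL archiving so no log segment is lost
ALTER SYSTEM SET wal_level = replica;       -- 'archive' or 'hot_standby' on releases based on PostgreSQL before 9.6
ALTER SYSTEM SET archive_mode = on;
-- Copy each segment only if it is not already archived, so an existing file is never overwritten
ALTER SYSTEM SET archive_command = 'test ! -f /backup/gbase8c/wal/%f && cp %p /backup/gbase8c/wal/%f';
ALTER SYSTEM SET archive_timeout = 300;     -- Force a segment switch at least every 5 minutes
SELECT pg_reload_conf();                    -- Applies archive_command and archive_timeout;
                                            -- wal_level and archive_mode take effect only after a restart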
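The copy-if-absent rule in archive_command matters because the server may call it again for a segment it has already archived. A slightly more robust variant is sketched below as a hypothetical `archive_wal` helper (wrap it in a script and reference that script's path in archive_command): it treats an identical existing copy as success and writes through a temporary file, so an interrupted copy never leaves a truncated segment under its final name.

```shell
# archive_wal SRC NAME [ARCHIVE_DIR]
# Intended to be wrapped in a script and called from archive_command with
# %p and %f. Exit status 0 tells the server the segment is safely archived;
# non-zero makes it keep the segment and retry later.
archive_wal() {
  local src=$1 name=$2 dir=${3:-/backup/gbase8c/wal}
  if [ -e "$dir/$name" ]; then
    # Never overwrite: an identical copy counts as success, a different one fails.
    cmp -s "$src" "$dir/$name"
    return
  fi
  # Copy under a temporary name, then rename, so an interrupted copy never
  # leaves a truncated file under the final segment name.
  cp "$src" "$dir/$name.tmp" && mv "$dir/$name.tmp" "$dir/$name"
}
```

A production version would usually also flush the copied file to disk before reporting success.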

3.4 Backup Validation

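# Verify that every tar archive of the full backup is complete and readable
# (tar exits non-zero on a truncated or corrupt archive)
for f in /backup/gbase8c/full/20260330/*.tar; do
  tar -tf "$f" > /dev/null || echo "Backup archive failed verification: $f" >&2
done

# Decode an archived WAL segment; decoding errors are reported on stderr
# (the tool is named pg_xlogdump on releases based on PostgreSQL before 10)
pg_waldump /backup/gbase8c/wal/000000010000000000000001 > /dev/null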
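Recording checksums at backup time makes later validation independent of the backup tool. A minimal sketch using standard tar and GNU sha256sum; the manifest file name SHA256SUMS and both function names are assumptions.

```shell
# write_manifest DIR: record a SHA-256 checksum for every tar archive in DIR.
write_manifest() {
  (cd "$1" && sha256sum -- *.tar > SHA256SUMS)
}

# verify_full_backup DIR: every archive must be readable by tar and must
# still match the checksums recorded at backup time.
verify_full_backup() {
  local dir=$1 f found=0
  for f in "$dir"/*.tar; do
    [ -e "$f" ] || continue
    found=1
    if ! tar -tf "$f" > /dev/null; then
      echo "unreadable archive: $f" >&2
      return 1
    fi
  done
  if [ "$found" -eq 0 ]; then
    echo "no tar archives in $dir" >&2
    return 1
  fi
  (cd "$dir" && sha256sum -c --quiet SHA256SUMS)
}
```

Call `write_manifest` right after each backup completes and `verify_full_backup` before every restore and drill.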

4. Recovery Procedures with Step‑by‑Step Instructions

Scenario 1: Single DN Node Data Loss

Symptom: DN2 node disk failure, all data lost, node marked offline, related shards inaccessible.

Prerequisites: Hardware replaced; recent full/incremental backups and complete WAL logs available.
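A missing segment stops WAL replay at the gap, so it is worth checking the archive before starting. The sketch below lists segment names missing between the oldest and newest archived segment; it assumes a single timeline and 16 MB segments (256 per log ID, the PostgreSQL 9.3+ naming scheme), and `wal_gaps` is a hypothetical helper.

```shell
# wal_gaps ARCHIVE_DIR
# Prints every WAL segment name missing between the oldest and newest
# archived segment. Non-segment files (.history, .backup, .tmp) are ignored.
wal_gaps() {
  local f tli log seg n m prev=''
  for f in $(ls -1 "$1" | LC_ALL=C grep -E '^[0-9A-F]{24}$' | LC_ALL=C sort); do
    tli=$(printf '%s' "$f" | cut -c1-8)
    log=$(printf '%s' "$f" | cut -c9-16)
    seg=$(printf '%s' "$f" | cut -c17-24)
    # Absolute segment number: log ID * 256 + segment within that log ID.
    n=$(( 0x$log * 256 + 0x$seg ))
    if [ -n "$prev" ]; then
      m=$(( prev + 1 ))
      while [ "$m" -lt "$n" ]; do
        printf '%s%08X%08X\n' "$tli" $(( m / 256 )) $(( m % 256 ))
        m=$(( m + 1 ))
      done
    fi
    prev=$n
  done
}
```

Empty output means no internal gaps. It cannot detect segments missing after the newest archived file, so also check that the newest segment was written close to the failure time (its modification time is a practical indicator).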

Steps:

# 1. Stop the faulty node service (if still running)
gbase_ctl stop -D /data/gbase8c/dn2

# 2. Remove the corrupted data directory contents
rm -rf /data/gbase8c/dn2/*

# 3. Restore DN2's full backup by extracting its tar archive into the data directory
tar -xf /backup/gbase8c/full/20260330/full_backup.tar -C /data/gbase8c/dn2

# 4. Restore incremental backup if available
gs_backup restore incremental \
  --backup-dir /backup/gbase8c/incremental/20260331 \
  --target-dir /data/gbase8c/dn2 \
  --user backup_user \
  --verbose

# 5. Configure WAL replay up to the moment of failure (adjust the time as needed).
#    The server replays archived WAL itself during startup, driven by these settings.
#    recovery.conf is the openGauss-style mechanism; PostgreSQL 12+ reads the same settings
#    from postgresql.conf and needs an empty recovery.signal file instead.
cat > /data/gbase8c/dn2/recovery.conf <<'EOF'
restore_command = 'cp /backup/gbase8c/wal/%f %p'
recovery_target_time = '2026-03-31 08:00:00'
EOF

# 6. Start the node (WAL replay runs during startup) and verify status
gbase_ctl start -D /data/gbase8c/dn2
gs_om -t status --detail   # Confirm node Normal, shards synced

# 7. Test affected shards to ensure business continuity ("order" is a reserved word and must be quoted)
gbase -h 192.168.1.100 -p 5432 -U backup_user -d gbase -c 'SELECT * FROM "order" WHERE shard_id IN (xxx, xxx);'

Important: Confirm the hardware is healthy before restoring; block writes to the affected shards during recovery; set the WAL recovery target to the failure moment so the node ends up consistent with the rest of the cluster.

Scenario 2: Accidental Table Drop

Symptom: DROP TABLE "user" was executed by mistake; the table must be recovered with RPO < 5 minutes.

Steps:

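# 1. Find when the DROP was executed. This needs DDL statement logging (e.g. log_statement = 'ddl')
#    and timestamps in log_line_prefix. Set the recovery target 1-2 seconds before that time.
grep -i 'DROP TABLE' /GBase_HOME/log/gbase-xxxx.log

# 2. Create a temporary restore directory to avoid overwriting live data
mkdir -p /data/gbase8c/temp_restore

# 3. Restore the full backup into the temporary directory
tar -xf /backup/gbase8c/full/20260330/full_backup.tar -C /data/gbase8c/temp_restore

# 4. Replay WAL up to the moment just before the drop (e.g., 10:29:59); the server applies it at startup
cat > /data/gbase8c/temp_restore/recovery.conf <<'EOF'
restore_command = 'cp /backup/gbase8c/wal/%f %p'
recovery_target_time = '2026-03-31 10:29:59'
EOF
# Run the temporary instance on a spare port (15432 is an example) and keep it from archiving
# its own WAL into the production archive; the last setting in postgresql.conf wins
echo "port = 15432" >> /data/gbase8c/temp_restore/postgresql.conf
echo "archive_mode = off" >> /data/gbase8c/temp_restore/postgresql.conf
gbase_ctl start -D /data/gbase8c/temp_restore

# 5. Export the dropped table from the temporary instance ("user" is a reserved word and must be quoted;
#    \copy writes the file on the client side, so no server-side file access is needed)
gbase -p 15432 -U backup_user -d gbase -c '\copy "user" TO /data/gbase8c/temp_restore/user_data.csv WITH CSV'

# 6. Recreate the table on the live cluster from its original DDL, then import the data
gbase -h 192.168.1.100 -p 5432 -U backup_user -d gbase -c '\copy "user" FROM /data/gbase8c/temp_restore/user_data.csv WITH CSV'

# 7. Verify data integrity on the live cluster, then stop the temporary instance
gbase -h 192.168.1.100 -p 5432 -U backup_user -d gbase -c 'SELECT COUNT(*) FROM "user";'
gbase -h 192.168.1.100 -p 5432 -U backup_user -d gbase -c 'SELECT * FROM "user" LIMIT 10;'
gbase_ctl stop -D /data/gbase8c/temp_restore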

Important: Always use a temporary directory to avoid overwriting current data; the recovery timestamp must be slightly before the mistake; validate row counts and content after import.
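Working out the recovery target by hand invites off-by-one mistakes around minute and day boundaries. A minimal sketch, assuming GNU date and that the log timestamps are in the shell's time zone; `recovery_target_before` is a hypothetical helper.

```shell
# recovery_target_before TIMESTAMP [SECONDS]
# Prints TIMESTAMP minus SECONDS (default 1) in the format used for
# recovery_target_time. Both conversions use the current TZ, so run it in
# the time zone the server log was written in.
recovery_target_before() {
  local epoch
  epoch=$(date -d "$1" +%s) || return 1
  date -d "@$(( epoch - ${2:-1} ))" '+%Y-%m-%d %H:%M:%S'
}
```

For a drop logged at 2026-03-31 10:30:00, `recovery_target_before '2026-03-31 10:30:00'` prints `2026-03-31 10:29:59`.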

Scenario 3: Full Cluster Crash (e.g., power outage corrupting all nodes)

Symptom: Entire cluster down, all node data damaged; must recover from backups with RTO ≤ 30 minutes.

Steps:

# 1. Ensure all hardware is healthy and the network is operational
systemctl start network
systemctl start firewalld   # If firewall rules are configured

# 2. Restore the CN first (the coordinator must be ready before the DNs), then each DN (example: dn1~dn4).
#    Assumes every node has its own full-backup archive and WAL archive subdirectory:
#    /backup/gbase8c/full/20260330/<node>/ and /backup/gbase8c/wal/<node>/
for node in cn dn1 dn2 dn3 dn4; do
  gbase_ctl stop -D /data/gbase8c/$node     # Stop if already started
  rm -rf /data/gbase8c/$node/*              # Purge corrupted data
  tar -xf /backup/gbase8c/full/20260330/$node/full_backup.tar -C /data/gbase8c/$node
  # Replay archived WAL up to the failure time (14:00:00 in this example) during startup
  cat > /data/gbase8c/$node/recovery.conf <<EOF
restore_command = 'cp /backup/gbase8c/wal/$node/%f %p'
recovery_target_time = '2026-03-31 14:00:00'
EOF
  gbase_ctl start -D /data/gbase8c/$node
done

# 3. Verify cluster status and data consistency
gs_om -t status --detail   # All nodes should be Normal
gs_sync_check              # Shard synchronisation check
gbase -h 192.168.1.100 -p 5432 -U backup_user -d gbase -c 'SELECT COUNT(*) FROM "order";'   # Compare with pre-failure count

Important: Strictly follow “CN first, then DNs”; all nodes must have network connectivity; perform full business testing after recovery.
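The CN-first rule can be enforced rather than remembered. The sketch below takes a per-node restore function (hypothetical, e.g. one wrapping the stop, purge, extract, and start commands above), refuses an order that does not start with the CN, and stops at the first failed node. The node names cn, dn1, and so on follow this guide's examples.

```shell
# restore_cluster RESTORE_FN NODE...
# Calls RESTORE_FN once per node, in the order given, and stops at the first
# failure so no DN is brought up against an unrecovered CN. The first node
# must be the CN (assumed here to be named "cn...").
restore_cluster() {
  local fn=$1 node
  shift
  case ${1:-} in
    cn*) ;;
    *) echo "restore order violated: the CN must come first" >&2; return 1 ;;
  esac
  for node in "$@"; do
    if ! "$fn" "$node"; then
      echo "restore of $node failed; later nodes were not touched" >&2
      return 1
    fi
  done
}
```

`restore_cluster restore_node cn dn1 dn2 dn3 dn4` would then restore the whole cluster in the required order, where `restore_node` is your own per-node function.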

5. Data Security Measures

Least privilege: Dedicated backup user; revoke DROP/TRUNCATE from business accounts.
Storage: Local + offsite + encryption; regularly purge expired backups.
Audit & monitoring: Enable audit logs; monitor backup failures and dangerous operations in real time.
Regular drills: Quarterly restore exercises covering single node, full cluster, and accidental drop scenarios.
Hardware redundancy: Use RAID, monitor environment, replace ageing hardware proactively.
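Purging expired backups is safest when it is driven by the backup's own date rather than file modification times. A minimal sketch, assuming dated YYYYMMDD directories as in section 3 and GNU date; `purge_expired` is hypothetical, and because it runs `rm -rf`, try it on a copy first.

```shell
# purge_expired ROOT DAYS
# Deletes dated backup directories (named YYYYMMDD) under ROOT whose date is
# more than DAYS days in the past. Anything not named that way is left alone.
purge_expired() {
  local root=$1 days=$2 cutoff d name
  cutoff=$(date -d "$days days ago" +%Y%m%d) || return 1
  for d in "$root"/[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]; do
    [ -d "$d" ] || continue
    name=${d##*/}
    if [ "$name" -lt "$cutoff" ]; then
      rm -rf -- "$d"
    fi
  done
}
```

With the example strategy, `purge_expired /backup/gbase8c/full 30` and `purge_expired /backup/gbase8c/incremental 7` apply the retention periods from section 2.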

6. Common Pitfalls and Correct Practices

Pitfall	Correct Practice
Only full backups	Combine full + incremental + WAL
Backups stored on the cluster itself	Independent storage with offsite copy
Skipping backup validation	Validate every backup immediately
Restoring without consistency checks	Restore CN first, then DNs; verify shards
Over‑privileged backup user	Least privilege, regular audits
Never practicing restores	Quarterly drills to optimise the process

The commands, strategy, and recovery procedures above use example hosts, paths, and timestamps. Adapt them to your GBase 8c version and environment, and rehearse each procedure in a test environment (as in the quarterly drills above) before relying on it in production.
