Backup and recovery is the last line of defense for your GBase 8c database. This guide covers backup types, strategy design, step-by-step recovery procedures, and security measures.
1. Backup Types and Use Cases
| Type | Contents | Backup Speed | Recovery Speed | Use Case |
|---|---|---|---|---|
| Full | All data, metadata, and configuration | Slow | Fast | Weekly, and before major changes |
| Incremental | Changes since the last backup of any type | Fast | Medium (needs the full backup plus every incremental since) | Daily |
| Differential | Changes since the last full backup | Moderate | Medium (needs the full backup plus the latest differential) | Moderate change rates where a shorter restore chain matters |
| WAL Log | Continuous write-ahead log | Very fast (continuous) | Depends on replay volume; enables point-in-time recovery | Always on, to minimise RPO |
Backups are coordinated by the coordinator node (CN); each data node (DN) backs up its own data. Recovery must follow the “CN first, then DNs” order.
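The incremental row is the one that most often trips up a restore: every incremental taken since the last full backup is needed, in order. The following bash sketch picks that chain for a target date. It assumes the `full/YYYYMMDD` and `incremental/YYYYMMDD` directory layout used by the commands in this guide; the function name `restore_chain` is illustrative, not a GBase tool.

```bash
#!/usr/bin/env bash
# restore_chain ROOT TARGET
# Prints, oldest first, the backup directories needed to restore to TARGET (YYYYMMDD):
# the newest full backup taken on or before TARGET, then every incremental taken
# after that full backup and on or before TARGET.
# Assumes the ROOT/full/YYYYMMDD and ROOT/incremental/YYYYMMDD layout (hypothetical helper).
restore_chain() {
  local root=$1 target=$2 full="" full_day="" dir day
  # Globs expand in lexical order, which is chronological for YYYYMMDD names
  for dir in "$root"/full/*/; do
    [[ -d $dir ]] || continue
    dir=${dir%/}
    day=${dir##*/}
    [[ $day > $target ]] && break
    full=$dir
    full_day=$day
  done
  if [[ -z $full ]]; then
    echo "no full backup on or before $target" >&2
    return 1
  fi
  echo "$full"
  for dir in "$root"/incremental/*/; do
    [[ -d $dir ]] || continue
    dir=${dir%/}
    day=${dir##*/}
    # Each incremental only holds changes since the previous backup, so all of them are needed
    if [[ $day > $full_day && ! $day > $target ]]; then
      echo "$dir"
    fi
  done
}
```

For example, `restore_chain /backup/gbase8c 20260331` prints the newest full backup on or before 31 March followed by the incrementals that must be applied on top of it. A differential strategy would instead need only the latest differential after the full backup.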
2. Backup Strategy Design
Core principles:
- Meet RTO/RPO targets (e.g., RTO ≤ 30 min, RPO ≤ 5 min).
- Run during off‑peak hours (0:00–3:00).
- Combine full + incremental + WAL archiving.
- Validate every backup immediately; alert on failure.
- Store backups on a physically separate device, plus an offsite copy.
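To turn the RPO target into something a monitor can check, one option is to alert when the newest archived WAL file is older than the RPO. A minimal bash sketch, assuming GNU find and that WAL is archived into a single directory; `check_wal_rpo` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# check_wal_rpo ARCHIVE_DIR MAX_AGE_SECONDS [NOW_EPOCH]
# Succeeds if the newest file in ARCHIVE_DIR was modified within MAX_AGE_SECONDS,
# i.e. archived WAL is recent enough for the RPO target. Otherwise prints an
# ALERT line on stderr and returns 1 so a monitoring job can page someone.
# NOW_EPOCH is optional and only exists to make the check testable.
check_wal_rpo() {
  local dir=$1 max_age=$2 now=${3:-$(date +%s)} newest age
  # GNU find: %T@ prints the modification time as epoch seconds (with a fraction)
  newest=$(find "$dir" -maxdepth 1 -type f -printf '%T@\n' 2>/dev/null | sort -n | tail -n 1)
  if [[ -z $newest ]]; then
    echo "ALERT: no archived WAL found in $dir" >&2
    return 1
  fi
  age=$(( now - ${newest%.*} ))
  if (( age > max_age )); then
    echo "ALERT: newest archived WAL is ${age}s old (limit ${max_age}s)" >&2
    return 1
  fi
  echo "OK: newest archived WAL is ${age}s old"
}
```

Run from cron every minute with a limit slightly above archive_timeout (for example 360 s against 300 s). An idle cluster may not produce new segments at all, so treat an alert as a prompt to investigate rather than proof of data loss.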
Example strategy (1 TB cluster):
- Full: Sunday 0:00, retained 30 days.
- Incremental: Mon–Sat 0:00, retained 7 days.
- WAL: real‑time archiving, 5‑minute rotation, retained 15 days.
- Validation: automatic after each backup.
- Storage: dedicated NAS + offsite backup.
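The schedule and retention windows above can be automated with two small helpers. This bash sketch assumes GNU date and find and a `/backup/gbase8c/{full,incremental,wal}` layout matching section 3; the function names are illustrative. It always keeps the newest full backup, even when it is older than 30 days, so a run of failed backups cannot purge the last restorable copy.

```bash
#!/usr/bin/env bash
# backup_type_for DATE (YYYY-MM-DD): prints "full" on Sundays and "incremental"
# otherwise, mirroring the example strategy. GNU date: %u is 1 (Mon) .. 7 (Sun).
backup_type_for() {
  if [[ $(date -d "$1" +%u) == 7 ]]; then
    echo full
  else
    echo incremental
  fi
}

# purge_expired ROOT
# Deletes entries older than the retention windows: full 30 days, incremental
# 7 days, WAL 15 days. Assumes ROOT/full, ROOT/incremental and ROOT/wal exist.
purge_expired() {
  local root=$1 newest
  # Never delete the newest full backup, whatever its age
  newest=$(find "$root/full" -mindepth 1 -maxdepth 1 -type d | sort | tail -n 1)
  # -mtime +N matches entries whose age in whole days is greater than N
  find "$root/full" -mindepth 1 -maxdepth 1 -mtime +30 ! -path "$newest" -exec rm -rf {} +
  find "$root/incremental" -mindepth 1 -maxdepth 1 -mtime +7 -exec rm -rf {} +
  find "$root/wal" -mindepth 1 -maxdepth 1 -mtime +15 -exec rm -rf {} +
}
```

A nightly job at 0:00 could call `backup_type_for "$(date +%F)"` to decide which backup to run, then `purge_expired /backup/gbase8c` once the new backup has passed validation.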
3. Essential Backup Commands with Steps
3.1 Full Backup
# Run on the CN node to back up all data, metadata, and configuration
# -h/-p: CN IP and port; -U: dedicated backup user with the necessary privileges
# -F t: tar format, restorable with tar -xf
# -X stream: also stream the WAL written during the backup so the copy is self-consistent
# -P: show progress; -v: verbose logging
gs_basebackup -D /backup/gbase8c/full/$(date +%Y%m%d) \
-h 192.168.1.100 \
-p 5432 \
-U backup_user \
-F t \
-X stream \
-P \
-v
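Section 2 asks for every backup to be validated immediately and for failures to raise an alert. One way to get both is to run each step through a small wrapper that logs its output and calls an alert hook on a non-zero exit. In this bash sketch `BACKUP_LOG` and `send_alert` are hypothetical names; `send_alert` only writes to stderr until it is connected to a real notification channel.

```bash
#!/usr/bin/env bash
# run_step NAME COMMAND [ARGS...]
# Runs one backup or validation command, appends its output to $BACKUP_LOG,
# and calls send_alert when the command fails, returning its exit code.
BACKUP_LOG=${BACKUP_LOG:-/tmp/gbase_backup.log}

send_alert() {
  # Placeholder hook: prints to stderr instead of paging anyone
  echo "ALERT: $*" >&2
}

run_step() {
  local name=$1
  shift
  echo "$(date '+%F %T') START $name" >> "$BACKUP_LOG"
  if "$@" >> "$BACKUP_LOG" 2>&1; then
    echo "$(date '+%F %T') OK $name" >> "$BACKUP_LOG"
  else
    local rc=$?   # exit status of the failed command
    echo "$(date '+%F %T') FAIL $name (exit $rc)" >> "$BACKUP_LOG"
    send_alert "$name failed with exit code $rc"
    return "$rc"
  fi
}
```

For example: `run_step full_backup gs_basebackup -D "/backup/gbase8c/full/$(date +%Y%m%d)" -h 192.168.1.100 -p 5432 -U backup_user -X stream -v`, followed by a second `run_step` for the validation command.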
3.2 Incremental Backup
# Specify the previous full backup directory with --base-backup-dir
gs_backup incremental \
--backup-dir /backup/gbase8c/incremental/$(date +%Y%m%d) \
--base-backup-dir /backup/gbase8c/full/20260330 \
--host 192.168.1.100 \
--port 5432 \
--user backup_user \
--verbose
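Hard-coding the date of the previous full backup (20260330 above) means the incremental job breaks the following week. A small bash helper can look up the newest full backup directory instead; it assumes the `full/YYYYMMDD` layout from 3.1, and `latest_full` is an illustrative name.

```bash
#!/usr/bin/env bash
# latest_full ROOT
# Prints the newest ROOT/full/YYYYMMDD directory (lexical order is chronological
# for these names), or fails if there is none.
latest_full() {
  local root=$1 dir newest=""
  for dir in "$root"/full/*/; do
    [[ -d $dir ]] && newest=${dir%/}
  done
  if [[ -z $newest ]]; then
    echo "no full backup under $root/full" >&2
    return 1
  fi
  echo "$newest"
}
```

The incremental job can then pass `--base-backup-dir "$(latest_full /backup/gbase8c)"`. In practice you would also skip a full backup that failed validation.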
3.3 Enable WAL Archiving
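-- Configure continuous WAL archiving so no log segment is lost
ALTER SYSTEM SET wal_level = replica;
ALTER SYSTEM SET archive_mode = on;
ALTER SYSTEM SET archive_command = 'test ! -f /backup/gbase8c/wal/%f && cp %p /backup/gbase8c/wal/%f'; -- Never overwrite an archived segment
ALTER SYSTEM SET archive_timeout = 300; -- Force a segment switch at least every 5 minutes
SELECT pg_reload_conf(); -- Applies archive_command and archive_timeout; wal_level and archive_mode need a restart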
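A one-line `cp` archive command can leave a half-written file under the final name if the host crashes mid-copy. A slightly more careful archive step, sketched in bash below, refuses to overwrite, copies to a temporary name, flushes it, and renames it into place. The function name and the idea of wrapping it in a script referenced from archive_command (for example `'/usr/local/bin/archive_wal.sh %p %f'`) are assumptions, not a GBase-supplied tool.

```bash
#!/usr/bin/env bash
# archive_wal SOURCE_PATH FILE_NAME [ARCHIVE_DIR]
# Copies one WAL segment into the archive without ever exposing a partial file:
# copy to a hidden temp name, flush it to disk, then rename atomically.
archive_wal() {
  local src=$1 name=$2 dest_dir=${3:-/backup/gbase8c/wal}
  local dest="$dest_dir/$name" tmp="$dest_dir/.$name.tmp"
  if [[ -e $dest ]]; then
    echo "archive_wal: $dest already exists, refusing to overwrite" >&2
    return 1
  fi
  cp "$src" "$tmp" || return 1
  sync "$tmp" || return 1   # GNU coreutils sync accepts file arguments
  mv "$tmp" "$dest"         # rename within one directory is atomic
}
```

A non-zero exit makes the server keep the segment and retry later, which is the behaviour you want when the backup storage is temporarily unavailable.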
3.4 Backup Validation
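# Verify full backup integrity against a checksum manifest recorded right after the backup finished
# (e.g., created with: cd <backup dir> && sha256sum full_backup.tar > SHA256SUMS)
cd /backup/gbase8c/full/20260330 && sha256sum -c SHA256SUMS
# Verify a WAL log file can be decoded (pg_waldump reports an error on a damaged record)
pg_waldump /backup/gbase8c/wal/20260330/000000010000000000000001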
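Checksum validation needs a manifest written while the backup is known to be good. The bash sketch below writes a SHA-256 manifest covering every file in a backup directory and later verifies it, so silent corruption on the backup storage is caught before a restore. It uses GNU coreutils; the manifest name `SHA256SUMS` and the function names are assumptions.

```bash
#!/usr/bin/env bash
# write_manifest DIR: record a SHA-256 checksum for every file under DIR
write_manifest() {
  local dir=$1
  # Paths are stored relative to DIR; the manifest itself is excluded
  ( cd "$dir" && find . -type f ! -name SHA256SUMS -print0 | sort -z \
      | xargs -0 -r sha256sum > SHA256SUMS )
}

# verify_manifest DIR: re-check every recorded checksum; non-zero exit on any mismatch
verify_manifest() {
  local dir=$1
  ( cd "$dir" && sha256sum --quiet -c SHA256SUMS )
}
```

Run `write_manifest` as the last step of the backup job and `verify_manifest` both in the post-backup validation and again immediately before any restore.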
4. Recovery Procedures with Step‑by‑Step Instructions
Scenario 1: Single DN Node Data Loss
Symptom: DN2 node disk failure, all data lost, node marked offline, related shards inaccessible.
Prerequisites: Hardware replaced; recent full/incremental backups and complete WAL logs available.
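The "complete WAL logs" prerequisite is worth checking before you start, because replay stops at the first missing segment. A bash sketch that walks the archive and reports the first gap; it assumes a single timeline and PostgreSQL 9.3+ segment naming (24 hex characters, 256 segments of 16 MB per log id), which you should confirm for your GBase 8c version. `check_wal_continuity` is an illustrative name.

```bash
#!/usr/bin/env bash
# check_wal_continuity DIR
# Verifies that archived WAL segments in DIR form an unbroken sequence.
# Segment name = 8 hex timeline + 8 hex log id + 8 hex segment number.
check_wal_continuity() {
  local dir=$1 f name cur prev="" prev_name="" count=0
  for f in "$dir"/*; do
    name=${f##*/}
    # Skip .history, .backup, .partial and anything else that is not a plain segment
    [[ $name =~ ^[0-9A-F]{24}$ ]] || continue
    # Linear segment index, assuming 256 segments per log id
    cur=$(( 16#${name:8:8} * 256 + 16#${name:16:8} ))
    if [[ -n $prev ]] && (( cur != prev + 1 )); then
      echo "gap: segment after $prev_name is missing (next found: $name)" >&2
      return 1
    fi
    prev=$cur
    prev_name=$name
    count=$(( count + 1 ))
  done
  if (( count == 0 )); then
    echo "no WAL segments found in $dir" >&2
    return 1
  fi
  echo "$count consecutive segments"
}
```

For example, `check_wal_continuity /backup/gbase8c/wal/20260331` prints the number of consecutive segments, or the first gap on stderr with a non-zero exit.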
Steps:
# 1. Stop the faulty node service (if still running)
gbase_ctl stop -D /data/gbase8c/dn2
# 2. Remove corrupted data directory
rm -rf /data/gbase8c/dn2/*
# 3. Restore full backup to DN2
# (gs_basebackup copies from a running server; a tar-format backup file is restored by extracting it)
tar -xf /backup/gbase8c/full/20260330/full_backup.tar -C /data/gbase8c/dn2
# 4. Restore incremental backup if available
gs_backup restore incremental \
--backup-dir /backup/gbase8c/incremental/20260331 \
--target-dir /data/gbase8c/dn2 \
--user backup_user \
--verbose
# 5. Configure WAL replay up to the moment before the failure (adjust the target time as needed)
# pg_waldump only decodes WAL for inspection; the server replays archived WAL at startup via restore_command
cat > /data/gbase8c/dn2/recovery.conf <<'EOF'
restore_command = 'cp /backup/gbase8c/wal/20260331/%f %p'
recovery_target_time = '2026-03-31 08:00:00'
EOF
# 6. Start the node and verify status
gbase_ctl start -D /data/gbase8c/dn2
gs_om -t status --detail # Confirm node Normal, shards synced
# 7. Query the affected shards to confirm business continuity ("order" is a reserved word and must be quoted)
gbase -U backup_user -d gbase -c 'SELECT * FROM "order" WHERE shard_id IN (xxx, xxx);'
Important: Confirm the hardware is healthy before restoring; block writes during recovery; and set the WAL recovery target to the exact failure moment so the node stays consistent with the rest of the cluster.
Scenario 2: Accidental Table Drop
Symptom: DROP TABLE "user" was executed by mistake; the table must be recovered with RPO < 5 minutes.
Steps:
# 1. Identify the exact drop time from logs; the recovery end time should be 1-2 seconds earlier
grep -i 'DROP TABLE' /GBase_HOME/log/gbase-xxxx.log
# 2. Create a temporary restore directory to avoid overwriting live data
mkdir -p /data/gbase8c/temp_restore
# 3. Restore full backup to the temporary directory
tar -xf /backup/gbase8c/full/20260330/full_backup.tar -C /data/gbase8c/temp_restore
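# 4. Replay WAL up to just before the drop (e.g., 10:29:59) by starting a temporary instance on a spare port (5433 here)
# recovery.conf is read at startup; archiving is switched off so the copy cannot write into the live WAL archive
cat > /data/gbase8c/temp_restore/recovery.conf <<'EOF'
restore_command = 'cp /backup/gbase8c/wal/20260331/%f %p'
recovery_target_time = '2026-03-31 10:29:59'
EOF
gbase_ctl start -D /data/gbase8c/temp_restore -o "-p 5433 -c archive_mode=off"
# 5. Export the mistakenly dropped table from the temporary instance ("user" is a reserved word and must be quoted)
gbase -U backup_user -d gbase -p 5433 -c "COPY (SELECT * FROM \"user\") TO '/data/gbase8c/temp_restore/user_data.csv' WITH CSV;"
# 6. Recreate the table definition in the live cluster (e.g., from the temporary instance's DDL), then import the data
gbase -U backup_user -d gbase -c "COPY \"user\" FROM '/data/gbase8c/temp_restore/user_data.csv' WITH CSV;"
# 7. Verify data integrity
gbase -U backup_user -d gbase -c 'SELECT COUNT(*) FROM "user";'
gbase -U backup_user -d gbase -c 'SELECT * FROM "user" LIMIT 10;'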
Important: Always use a temporary directory to avoid overwriting current data; the recovery timestamp must be slightly before the mistake; validate row counts and content after import.
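Getting the recovery timestamp right is the fiddly part of this scenario, especially when the drop happened just after midnight. This bash sketch subtracts a safety margin from the logged drop time using epoch arithmetic and writes it, together with a restore_command, into a PostgreSQL-style recovery.conf. That recovery.conf format (restore_command, recovery_target_time) is an assumption based on how PostgreSQL 11 and earlier work; confirm it for your GBase 8c version. `write_recovery_conf` is an illustrative name.

```bash
#!/usr/bin/env bash
# write_recovery_conf DATADIR WAL_ARCHIVE EVENT_TIME [MARGIN_SECONDS]
# Writes DATADIR/recovery.conf so replay stops MARGIN_SECONDS (default 1) before
# EVENT_TIME, e.g. the logged time of the accidental DROP, and prints the target.
# Uses GNU date; times are interpreted in the shell's TZ.
write_recovery_conf() {
  local datadir=$1 archive=$2 event=$3 margin=${4:-1} epoch target
  epoch=$(date -d "$event" +%s) || return 1
  # Epoch arithmetic handles minute, hour and day rollovers correctly
  target=$(date -d "@$(( epoch - margin ))" '+%Y-%m-%d %H:%M:%S')
  cat > "$datadir/recovery.conf" <<EOF
restore_command = 'cp $archive/%f %p'
recovery_target_time = '$target'
EOF
  echo "$target"
}
```

The target is computed in the shell's time zone, so run it with the same time zone the server uses to interpret recovery_target_time.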
Scenario 3: Full Cluster Crash (e.g., power outage corrupting all nodes)
Symptom: Entire cluster down, all node data damaged; must recover from backups with RTO ≤ 30 minutes.
Steps:
# 1. Ensure all hardware is healthy and network is operational
systemctl start network
systemctl start firewalld # If rules are configured
# 2. Restore the CN first (the coordinator must be ready before the DNs)
gbase_ctl stop -D /data/gbase8c/cn # Stop if already started
rm -rf /data/gbase8c/cn/* # Purge corrupted data
tar -xf /backup/gbase8c/full/20260330/full_backup.tar -C /data/gbase8c/cn
# Replay archived WAL up to the failure time (14:00:00) when the node starts
cat > /data/gbase8c/cn/recovery.conf <<'EOF'
restore_command = 'cp /backup/gbase8c/wal/20260331/%f %p'
recovery_target_time = '2026-03-31 14:00:00'
EOF
gbase_ctl start -D /data/gbase8c/cn
# 3. Restore all DN nodes one by one (example: dn1~dn4), each with the same recovery target
for dn in dn1 dn2 dn3 dn4; do
gbase_ctl stop -D /data/gbase8c/$dn
rm -rf /data/gbase8c/$dn/*
tar -xf /backup/gbase8c/full/20260330/full_backup.tar -C /data/gbase8c/$dn
cat > /data/gbase8c/$dn/recovery.conf <<'EOF'
restore_command = 'cp /backup/gbase8c/wal/20260331/%f %p'
recovery_target_time = '2026-03-31 14:00:00'
EOF
gbase_ctl start -D /data/gbase8c/$dn
done
# 4. Once every node has finished replay (recovery.conf is renamed to recovery.done), restart the cluster under gs_om
# WAL is applied at node startup; pg_waldump only decodes WAL and cannot apply it
gs_om -t stop
gs_om -t start
# 5. Verify cluster status and data consistency
gs_om -t status --detail # All nodes should be Normal
gs_sync_check # Shard synchronisation check
gbase -U backup_user -d gbase -c 'SELECT COUNT(*) FROM "order";' # Compare with the pre-failure count
Important: Strictly follow “CN first, then DNs”; all nodes must have network connectivity; perform full business testing after recovery.
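To enforce “CN first, then DNs” in a script rather than by hand, the restore sequence can wait for each node to finish replay before starting the next. A bash sketch, assuming the PostgreSQL 11-and-earlier convention that recovery.conf is renamed to recovery.done when archive recovery completes; `wait_for_recovery` is an illustrative name.

```bash
#!/usr/bin/env bash
# wait_for_recovery DATADIR TIMEOUT_SECONDS
# Polls once per second until DATADIR/recovery.done exists (replay finished),
# or fails after TIMEOUT_SECONDS so the caller can stop and investigate.
wait_for_recovery() {
  local datadir=$1 timeout=$2 waited=0
  while [[ ! -e $datadir/recovery.done ]]; do
    if (( waited >= timeout )); then
      echo "timed out after ${timeout}s waiting for $datadir to finish recovery" >&2
      return 1
    fi
    sleep 1
    waited=$(( waited + 1 ))
  done
  echo "$datadir finished recovery"
}
```

Calling `wait_for_recovery /data/gbase8c/cn 600` after starting the CN, and before the DN loop, makes the ordering explicit; with a 30-minute RTO, per-node timeouts have to add up to well under 1800 seconds.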
5. Data Security Measures
- Least privilege: Dedicated backup user; revoke DROP/TRUNCATE from business accounts.
- Storage: Local + offsite + encryption; regularly purge expired backups.
- Audit & monitoring: Enable audit logs; monitor backup failures and dangerous operations in real time.
- Regular drills: Quarterly restore exercises covering single node, full cluster, and accidental drop scenarios.
- Hardware redundancy: Use RAID, monitor environment, replace ageing hardware proactively.
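For the encryption item, one widely available option is the openssl command-line tool. The bash sketch below encrypts a backup archive with AES-256 before it is copied offsite and decrypts it on restore. It needs OpenSSL 1.1.1 or later for `-pbkdf2`; the function names and the key-file location are assumptions, and the key file should live outside the backup storage. CBC mode gives confidentiality only, so keep validating integrity separately, for example with checksums.

```bash
#!/usr/bin/env bash
# encrypt_backup IN OUT KEYFILE: AES-256-CBC with a PBKDF2-derived key and random salt
encrypt_backup() {
  openssl enc -aes-256-cbc -pbkdf2 -salt -in "$1" -out "$2" -pass "file:$3"
}

# decrypt_backup IN OUT KEYFILE: reverse of encrypt_backup (same cipher and KDF)
decrypt_backup() {
  openssl enc -d -aes-256-cbc -pbkdf2 -in "$1" -out "$2" -pass "file:$3"
}
```

The passphrase is read from the first line of KEYFILE, which keeps it out of the process list and shell history.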
6. Common Pitfalls and Correct Practices
| Pitfall | Correct Practice |
|---|---|
| Only full backups | Combine full + incremental + WAL |
| Backups stored on the cluster itself | Independent storage with offsite copy |
| Skipping backup validation | Validate every backup immediately |
| Restoring without consistency checks | Restore CN first, then DNs; verify shards |
| Over‑privileged backup user | Least privilege, regular audits |
| Never practicing restores | Quarterly drills to optimise the process |
The commands, strategies, and recovery procedures above come from production GBase 8c environments. Adapt the hosts, paths, and timestamps to your deployment, and rehearse each procedure in a drill before relying on it.