Michael

Posted on Jun 19 • Originally published at gbase.cn

GBase 8a High Availability Deep Dive: gcware Quorum, Replica Consistency, and Failover

#gbase #database #数据库 #operations

This article explains the core high‑availability mechanisms of a GBase 8a cluster: how gcware arbitration works, how multi‑replica consistency is maintained, what happens during automatic node failover, and how to handle common replica anomalies.

1. Three‑Tier HA Architecture

GBase 8a's high availability relies on three cooperating layers:

gcware (arbitration layer): Based on Corosync/Pacemaker, deployed on an odd number of nodes (3 or 5). Responsible for heartbeats, split‑brain prevention, and leader election.
gcluster (coordination layer): Multi‑node deployment; any node can serve external requests. Metadata is synchronised across gcluster nodes.
gnode (data layer): Each piece of data has 1 primary + N replicas. The primary handles reads/writes; replicas sync from the primary. gcware arbitrates the primary role.

2. gcware: The Arbitration Core

gcware uses a quorum principle: the cluster works only when more than half the gcware nodes are alive.

gcware Nodes	Tolerated Failures	Minimum Alive
3	1	2
5	2	3
7	3	4
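
The table follows directly from the majority rule. As a minimal sketch (the function names are illustrative and not part of gcware), the arithmetic looks like this:

```python
def min_alive(total_nodes):
    """Smallest number of live gcware nodes that is still a strict majority."""
    return total_nodes // 2 + 1


def tolerated_failures(total_nodes):
    """How many gcware nodes can fail while a strict majority survives."""
    return total_nodes - min_alive(total_nodes)


def has_quorum(alive, total_nodes):
    """True when strictly more than half of the gcware nodes are alive."""
    return alive > total_nodes / 2


# Columns: gcware nodes, tolerated failures, minimum alive
for n in (3, 4, 5, 7):
    print(n, tolerated_failures(n), min_alive(n))
```

The 4‑node row prints `4 1 3`: a fourth node tolerates no more failures than three nodes do, and `has_quorum(2, 4)` is `False`, so a 2/2 partition leaves neither side with a majority.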

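Deploying an even number (e.g., 4) adds no fault tolerance and makes an even split possible: quorum for 4 nodes is 3, so 4 nodes tolerate only 1 failure, the same as 3 nodes. If a network partition leaves 2 nodes on each side, neither side holds a majority, and the whole cluster refuses service to protect consistency rather than risk a split‑brain. Always deploy gcware on an odd number of nodes.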

From V9.5.3 onwards, gcware can be deployed independently — you can run it on lightweight VMs, saving data‑node resources, and gcluster scaling is no longer constrained by the odd‑node requirement.

Each gnode periodically reports its status to gcware. When a gnode fails, gcware detects the heartbeat timeout, marks the node DOWN, promotes the replica with the highest data version (LSN) to primary, and notifies gcluster to update the routing table.
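
The detection step can be pictured as a timestamp comparison. The sketch below is only a conceptual model, not gcware's actual implementation: the `last_heartbeat` dictionary, the node names, and the function name are all hypothetical, and the 5‑second timeout is the default quoted later in this article.

```python
HEARTBEAT_TIMEOUT_S = 5.0  # default timeout quoted in this article


def find_down_nodes(last_heartbeat, now, timeout_s=HEARTBEAT_TIMEOUT_S):
    """Return the nodes whose last heartbeat is older than the timeout.

    last_heartbeat maps a node name to the time (in seconds) of its
    most recent status report; now is the current time on the same clock.
    """
    return sorted(
        node for node, seen in last_heartbeat.items()
        if now - seen > timeout_s
    )


# Hypothetical sample: node2 last reported 6.5 s ago, the others recently.
last_seen = {"node1": 100.0, "node2": 93.5, "node3": 99.2}
print(find_down_nodes(last_seen, now=100.0))  # ['node2']
```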

3. Data Replica Mechanism

Segments and Replicas

Specify the replica count when creating a distribution:

# p 2 = 2 primary segments per node; d 1 = 1 duplicate of each segment (1 primary + 1 replica)
gcadmin distribution gcChangeInfo.xml p 2 d 1 pattern 1

View segment placement:

gcadmin showdistribution node

Each segment's primary and replica reside on different nodes. When a node fails, its primary segments are taken over by replicas on other nodes.
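
To make the placement rule concrete, here is a small sketch that checks it over a list of `(segment_id, node_name, is_primary)` tuples. That tuple shape is an assumption chosen for illustration; turning real `gcadmin showdistribution node` output into this form is not shown.

```python
from collections import defaultdict


def colocated_segments(placement):
    """Return ids of segments that have more than one copy on the same node.

    placement is an iterable of (segment_id, node_name, is_primary) tuples.
    """
    nodes_by_segment = defaultdict(list)
    for segment_id, node_name, _is_primary in placement:
        nodes_by_segment[segment_id].append(node_name)
    return sorted(
        segment_id for segment_id, nodes in nodes_by_segment.items()
        if len(nodes) != len(set(nodes))
    )


# Hypothetical layout: segment 2 wrongly keeps both copies on node2.
placement = [
    (1, "node1", True), (1, "node2", False),
    (2, "node2", True), (2, "node2", False),
]
print(colocated_segments(placement))  # [2]
```

An empty result means that losing any single node still leaves one copy of every segment elsewhere.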

Replication Mode

Primary‑replica sync is asynchronous: the primary returns to the client immediately after a write, and the change is pushed to replicas in the background. If the primary crashes right after acknowledging a write, the replicas may not yet have received it, so the most recent changes can be missing on every replica. gcware compares the Log Sequence Number (LSN) of each replica and promotes the most up‑to‑date one.
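
A rough way to reason about that lag is to treat the LSN as a plain increasing integer (a simplification made here for illustration; the real LSN format is not described in this article). Under that assumption, the gap between the failed primary and the best replica is the amount of change that would be lost on promotion:

```python
def promotion_candidate(primary_lsn, replica_lsn):
    """Return (best_replica, lag) for a failed primary.

    replica_lsn maps replica node names to their last applied LSN.
    lag is how far the best replica trails the primary's last LSN.
    """
    best = max(replica_lsn, key=replica_lsn.get)
    return best, primary_lsn - replica_lsn[best]


# Hypothetical values: node2 is fully caught up, node3 trails by 3.
print(promotion_candidate(1050, {"node2": 1050, "node3": 1047}))  # ('node2', 0)

# Here the primary crashed before either replica received LSN 1049-1050.
print(promotion_candidate(1050, {"node2": 1048, "node3": 1047}))  # ('node2', 2)
```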

Checking Replica Consistency

SELECT segment_id, node_name, is_primary, data_state, version
FROM gclusterdb.segment_info
ORDER BY segment_id, is_primary DESC;

data_state values: 0 = consistent, 1 = replica catching up, 2 = severely lagging — manual intervention needed.
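
For routine monitoring, the rows from the query above can be filtered down to the ones that need a look. The sketch assumes the result set has already been fetched into a list of dictionaries keyed by column name; the fetching code and the sample rows are hypothetical.

```python
DATA_STATE_LABELS = {
    0: "consistent",
    1: "replica catching up",
    2: "severely lagging, manual intervention needed",
}


def segments_needing_attention(rows):
    """Return (segment_id, node_name, label) for every non-consistent copy."""
    return [
        (row["segment_id"], row["node_name"],
         DATA_STATE_LABELS.get(row["data_state"], "unknown state"))
        for row in rows
        if row["data_state"] != 0
    ]


# Hypothetical query result
rows = [
    {"segment_id": 1, "node_name": "node1", "is_primary": 1, "data_state": 0},
    {"segment_id": 1, "node_name": "node2", "is_primary": 0, "data_state": 1},
    {"segment_id": 2, "node_name": "node3", "is_primary": 0, "data_state": 2},
]
for item in segments_needing_attention(rows):
    print(item)
```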

4. Node Failover Process

Automatic Failover

gcware detects heartbeat timeout (default 5 s)
gcware marks the node DOWN
Promotes the most up‑to‑date replica to primary
The new primary starts serving reads and writes
gcluster updates its internal routing table
Subsequent SQL is automatically routed to the new primary — transparent to applications

The whole process typically completes in 5–30 seconds.
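
The promotion and routing steps can be sketched as a pure function over two dictionaries. This is a conceptual model only: the `routing` and `replicas` structures and the function name are hypothetical, not gcluster's internal data structures.

```python
def fail_over(routing, replicas, failed_node):
    """Return a new routing table with the failed node's segments reassigned.

    routing maps segment_id -> current primary node.
    replicas maps segment_id -> {replica node: last applied LSN}.
    """
    new_routing = dict(routing)
    for segment_id, primary in routing.items():
        if primary != failed_node:
            continue  # this segment's primary is unaffected
        live = {
            node: lsn
            for node, lsn in replicas.get(segment_id, {}).items()
            if node != failed_node
        }
        if not live:
            raise RuntimeError(f"segment {segment_id} has no live replica")
        # Promote the most up-to-date surviving replica.
        new_routing[segment_id] = max(live, key=live.get)
    return new_routing


routing = {1: "node1", 2: "node2"}
replicas = {1: {"node2": 500, "node3": 498}, 2: {"node3": 700}}
print(fail_over(routing, replicas, "node1"))  # {1: 'node2', 2: 'node2'}
```

Returning a new dictionary instead of mutating `routing` keeps the old table intact until the switch is complete, which mirrors the idea that SQL is only rerouted once a new primary is ready.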

Handling Primary‑Replica Inconsistency

Configure the behaviour when inconsistency is detected:

# gbase.cnf on gcluster
# 0 = refuse service (conservative)
# 1 = auto‑select a new primary (may lose a small amount of data)
gcluster_suffix_consistency_resolve = 1

Evaluate data‑loss tolerance carefully in production before enabling automatic promotion.

Data Resync After Node Recovery

When a failed node restarts, it automatically re‑synchronises with the current primary:

# Check sync progress
gcadmin showdistribution node

# Force a resync if stuck
gcadmin resync node <node_name>

5. Common HA Troubleshooting

Fault 1: gcware won't start — "can not connect to any server"

Cause: gcware service not running, or Corosync port (UDP 5405) blocked by firewall.

# Check gcware process (the [g] keeps grep from matching its own command line)
ps -ef | grep '[g]cware'

# Check Corosync port
netstat -tunlp | grep 5405

# Manually start gcware
gcware_services all start

# Inspect gcware log
tail -200 $GCWARE_BASE/log/gcware.log

Fault 2: gnode status CLOSE, log shows memory limit exceeded

Cause: gnode heap memory parameters are too low.

Fix: edit gbase.cnf on the affected node:

gbase_memory_pct_target = 0.75
gbase_heap_data         = 4096M
gbase_heap_temp         = 2048M
gbase_heap_large        = 4096M
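
Before restarting, it is worth checking that the three heaps fit the node's memory. The sketch below assumes that the sum of the heap settings should stay within physical memory multiplied by `gbase_memory_pct_target`; treat that rule and the 16 GB figure as assumptions to confirm against the GBase 8a documentation for your version.

```python
def parse_mb(value):
    """Convert a size such as '4096M' or '4G' to megabytes."""
    value = value.strip().upper()
    units = {"M": 1, "G": 1024}
    if value[-1] not in units:
        raise ValueError(f"unsupported size suffix in {value!r}")
    return int(value[:-1]) * units[value[-1]]


def heaps_fit(physical_mb, pct_target, heap_data, heap_temp, heap_large):
    """Return (total_heap_mb, budget_mb, fits) for the given settings."""
    total = sum(parse_mb(v) for v in (heap_data, heap_temp, heap_large))
    budget = int(physical_mb * pct_target)
    return total, budget, total <= budget


# Values from the config above on a hypothetical node with 16 GB of RAM
print(heaps_fit(16384, 0.75, "4096M", "2048M", "4096M"))  # (10240, 12288, True)
```

On an 8 GB node the same settings would give a budget of 6144 MB and fail the check, so the heap sizes need to be scaled to the machine rather than copied verbatim.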

Restart and verify:

gcluster_services all restart
gcadmin  # confirm node status returns to OPEN

Fault 3: Cluster INACTIVE — more than half the gcware nodes unreachable

When over half the gcware nodes are down, the cluster enters INACTIVE state and rejects all writes (protecting data consistency). Do not attempt forced writes. First restore gcware to a quorum majority, then check gnodes one by one.

6. HA Operations Best Practices

Recommendation	Reason
Deploy gcware on odd numbers (3 or 5)	Prevents split‑brain; ensures quorum arbitration
Separate gcware from data nodes (V9.5.3+)	Avoids data‑node failures impacting the arbitration layer
Place primary/replica on different physical machines/racks	Prevents a single hardware fault from taking down both
Periodically check `data_state` in segment_info	Catches replica lag early
At least 2 copies of each segment (1 primary + 1 replica, i.e. d ≥ 1)	Survives single‑node failures without service impact

7. Quick Command Reference

# Overall cluster status
gcadmin

# Segment distribution and replica state per node
gcadmin showdistribution node

# Start gcware on all gcware nodes
gcware_services all start

# Start gcluster/gnode on all nodes
gcluster_services all start

# Follow gcware log
tail -f $GCWARE_BASE/log/gcware.log

# Follow gcluster log
tail -f $GCLUSTER_BASE/log/gcluster/system.log

# Follow gnode log
tail -f $GNODE_BASE/log/gbase/system.log

Understanding these HA mechanisms is essential for keeping a GBase 8a cluster reliable. The quorum‑based gcware layer, asynchronous replica sync, and automatic failover work together to provide continuous service even when individual nodes fail, as long as the cluster is deployed with the right topology and monitored proactively.
