When a data node (gnode) process exits unexpectedly, GBase 8a's multi‑layer monitoring kicks in automatically. The cluster will try to restart the service, isolate the failed node, switch traffic away, and later resync the data — all without human intervention in most cases.
How Faults Are Detected
- Process‑level monitoring (GCMonit) runs on every node, watching core processes like gbased and syncserver. The moment a process dies, GCMonit attempts an automatic restart according to its configuration.
- Cluster‑level heartbeat (GCware) tracks every GNode's heartbeat. If heartbeats time out or stop, GCware marks the node's service state as CLOSE.
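The heartbeat rule can be pictured with a small shell sketch. This is not GCware's implementation: the function name node_state, the 30-second timeout, and the OPEN label for a healthy node are assumptions made for illustration. Only the CLOSE state comes from the description above.

```shell
#!/bin/sh
# Illustrative only: classify a node by the age of its last heartbeat.
# HEARTBEAT_TIMEOUT and the OPEN label are assumptions, not GCware settings.
HEARTBEAT_TIMEOUT=30   # seconds (hypothetical value)

# node_state LAST_SEEN_EPOCH NOW_EPOCH
# Prints CLOSE when the last heartbeat is older than the timeout, else OPEN.
node_state() {
    last_seen=$1
    now=$2
    age=$((now - last_seen))
    if [ "$age" -gt "$HEARTBEAT_TIMEOUT" ]; then
        echo "CLOSE"
    else
        echo "OPEN"
    fi
}

now=$(date +%s)
echo "node1: $(node_state "$((now - 5))" "$now")"     # heartbeat 5 s ago
echo "node2: $(node_state "$((now - 120))" "$now")"   # heartbeat 120 s ago
```

With these stand-in values, node1 is reported as OPEN and node2 as CLOSE. A real cluster manager would also have to cope with clock skew and network partitions, which this sketch ignores.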
Automated Response Sequence
- Automatic restart: GCMonit tries to relaunch the gbased process immediately. This first line of defense recovers from transient issues like memory spikes.
- Service isolation and traffic redirection: If the restart fails, or GCware declares the node dead first, GCware sets the node status to CLOSE and notifies the GCluster coordinator. New queries are then routed exclusively to healthy nodes that hold replicas of the failed node's data. This switch is transparent to applications.
- Data consistency repair: Once the node comes back (either via auto‑restart or manual recovery), GCware logs inconsistency events (DML_EVENT / DDL_EVENT). The GCrecover process picks up these events and triggers SyncServer to copy fresh data from healthy replicas to the recovered node until it is fully consistent.
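The first two steps of this sequence, restart first and isolate only if that fails, can be sketched as a bounded retry loop. The names try_restart and handle_crash, the limit of three attempts, and the RESTARTED label are hypothetical, and the true and false commands merely stand in for a restart that succeeds or fails. The sketch does not reproduce how GCMonit or GCware are actually implemented.

```shell
#!/bin/sh
# Illustrative only: bounded restart attempts, then isolation.
# MAX_ATTEMPTS and both function names are hypothetical.
MAX_ATTEMPTS=3

# try_restart CMD [ARGS...]
# Runs the restart command up to MAX_ATTEMPTS times.
# Returns 0 on the first success, 1 if every attempt fails.
try_restart() {
    attempt=1
    while [ "$attempt" -le "$MAX_ATTEMPTS" ]; do
        if "$@"; then
            return 0
        fi
        attempt=$((attempt + 1))
    done
    return 1
}

# handle_crash CMD [ARGS...]
# Prints RESTARTED if the process came back, otherwise CLOSE,
# the state in which queries are routed to replicas instead.
handle_crash() {
    if try_restart "$@"; then
        echo "RESTARTED"
    else
        echo "CLOSE"
    fi
}

handle_crash true    # stand-in for a restart that succeeds
handle_crash false   # stand-in for a restart that keeps failing
```

Capping the attempts matters because a process that cannot start, for example on a full disk, would otherwise be relaunched forever while queries keep failing.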
When Manual Intervention Is Needed
- The monitoring process itself (GCMonit or gcware_monit) crashes, breaking the auto‑restart chain.
- The gbased process fails to start repeatedly due to misconfiguration, disk full, or memory exhaustion.
- A majority of GCware nodes fail, locking the cluster and disabling automated recovery.
- Routine checks reveal a node stuck in CLOSE or a process permanently DOWN.
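A full disk is one of the startup failures listed above, and it is cheap to rule out before retrying a restart by hand. The following is a minimal sketch using only the standard df and awk utilities. The 90% threshold, the TARGET_DIR variable, and the function names are assumptions; point TARGET_DIR at whatever directory holds the node's data in your installation.

```shell
#!/bin/sh
# Illustrative only: rule out a full disk before retrying a manual restart.
# DISK_LIMIT_PCT, TARGET_DIR, and the function names are assumptions.
DISK_LIMIT_PCT=90

# disk_usage_pct PATH
# Prints the used percentage of the filesystem that holds PATH.
disk_usage_pct() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# disk_ok PATH
# Succeeds when usage is below DISK_LIMIT_PCT.
disk_ok() {
    [ "$(disk_usage_pct "$1")" -lt "$DISK_LIMIT_PCT" ]
}

target=${TARGET_DIR:-/}   # defaults to the root filesystem
if disk_ok "$target"; then
    echo "disk ok: $(disk_usage_pct "$target")% used on $target"
else
    echo "disk nearly full: $(disk_usage_pct "$target")% used on $target"
fi
```

The -P flag asks df for the POSIX output format, which keeps each filesystem on a single line so that the fifth field is reliably the capacity column.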
Common manual commands:
# Check cluster and node status
gcadmin showcluster vc <vc_name>
# Restart all services on the failed node
gcluster_services all restart
# Inspect the error log
tail -n 100 -f /opt/gbase/gnode/log/gbase/system.log
GBase 8a’s multi‑layered automation ensures that a single gnode crash rarely causes a service disruption. In a typical GBase 8a deployment, the DBA’s role shifts from firefighting to monitoring the edge cases where the automated machinery needs a helping hand.