Safe Node Removal vs. Failed Node Replacement in GBase 8a

#gbase #database #数据库 #rebalance

Shrinking a cluster and replacing a failed node are two fundamental operations in GBase 8a. Both follow a “data safety first” principle, but they differ in goals, procedures, and risk profiles. Here is how to carry out each one safely on a GBase 8a cluster.

How to Safely Remove a Node (Scale In)

The goal is to permanently reduce the cluster size. The rule: move all data off the node first, then remove it.

Create a new Distribution: Edit gcChangeInfo.xml to list only the nodes you want to keep, run gcadmin distribution to build the new map, and execute initnodedatamap.
Run REBALANCE: Migrate data from the old Distribution to the new one. Monitor progress until it reaches 100%:

   SELECT * FROM gclusterdb.rebalancing_status;

Clean up the old Distribution: Drop the old hashmap and Distribution.
Remove the node from the cluster: Edit gcChangeInfo.xml again, this time listing only the departing node’s IP, then execute gcadmin rmnodes to turn it into a Free Node.
(Optional) Uninstall the software completely.
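
The sequence above can be sketched as a dry-run shell script. It prints the plan and a candidate gcChangeInfo.xml and runs nothing against a cluster. The XML layout (servers/rack/node), the function names, and the example IPs are assumptions for illustration, not taken from this article; check the file format and the exact command arguments against the documentation for your GBase 8a version.

```shell
#!/bin/sh
# Dry-run sketch: prints a scale-in plan; executes nothing on the cluster.

# Emit a gcChangeInfo.xml body for the given IPs.
# ASSUMPTION: the servers/rack/node layout below; verify it for your version.
write_change_info() {
    printf '<?xml version="1.0" encoding="utf-8"?>\n<servers>\n  <rack>\n'
    for ip in "$@"; do
        printf '    <node ip="%s"/>\n' "$ip"
    done
    printf '  </rack>\n</servers>\n'
}

# Print the ordered steps for removing one node.
# $1 = departing IP, remaining arguments = IPs to keep.
scale_in_plan() {
    if [ "$#" -lt 2 ]; then
        echo "usage: scale_in_plan <leaving_IP> <keep_IP>..." >&2
        return 2
    fi
    leaving=$1
    shift
    # Safety check: the departing node must not appear in the keep list.
    for ip in "$@"; do
        if [ "$ip" = "$leaving" ]; then
            echo "error: $leaving is also in the keep list" >&2
            return 1
        fi
    done
    echo "1. gcChangeInfo.xml (keep list): $*"
    echo "2. gcadmin distribution   # build the new Distribution"
    echo "3. initnodedatamap"
    echo "4. REBALANCE, then poll: SELECT * FROM gclusterdb.rebalancing_status;"
    echo "5. drop the old hashmap and the old Distribution"
    echo "6. gcChangeInfo.xml (leave list): $leaving"
    echo "7. gcadmin rmnodes        # $leaving becomes a Free Node"
}

# Hypothetical example: keep 10.0.0.1 and 10.0.0.2, remove 10.0.0.3.
write_change_info 10.0.0.1 10.0.0.2
scale_in_plan 10.0.0.3 10.0.0.1 10.0.0.2
```

The keep-list check is the one piece of logic here: it catches the easy mistake of leaving the departing node in the file used to build the new Distribution, which would mean its data is never moved off.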

How to Replace a Failed Node

The goal is to maintain the same cluster size while swapping out faulty hardware. There are two modes:

Mode A: Replace with a Free Node (old and new IPs differ).
Mode B: Replace with a brand‑new node (IP must match the old one).
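
Which mode applies follows directly from the two IPs. As a small sketch (the function name is hypothetical and the IPs are examples):

```shell
#!/bin/sh
# Pick the replacement mode from the failed node's IP and the replacement's IP.
replacement_mode() {
    failed_ip=$1
    new_ip=$2
    if [ "$failed_ip" = "$new_ip" ]; then
        echo "B"   # brand-new node that reuses the failed node's IP
    else
        echo "A"   # Free Node with a different IP
    fi
}

replacement_mode 10.0.0.3 10.0.0.9   # different IPs
replacement_mode 10.0.0.3 10.0.0.3   # same IP
```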

Key steps:

Isolate the failed node:

   gcadmin setnodestate <failed_IP> unavailable
   gcadmin rmfeventlog <failed_IP>

Create a transitional Distribution and move data off: Build a Distribution that excludes the failed node, then rebalance so that all data, including its replicas, is held by the remaining nodes.
Execute the replacement command (unique to this operation):

   ./replace.py --host=<failed_IP> --freenode=<new_IP> --type=data --vcname=vc1 --dbaUser=gbase --overwrite

The new node joins and a fresh Distribution is generated automatically.

Second REBALANCE and cleanup: Rebalance again to bring data back onto the new node, then drop the transitional Distribution.
Remove the old node: The failed node now holds no data, so it can be deleted from the cluster safely.
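
The replacement steps for Mode A can be laid out as a dry-run script that only prints what would be run. The commands and the vc1 and gbase values are the ones shown above; the function names and example IPs are hypothetical. The Mode B command line is not given in this article, so the sketch covers Mode A only.

```shell
#!/bin/sh
# Dry-run sketch of the Mode A (Free Node) replacement sequence. Prints only.

# Build the replace.py command line shown in the article.
# $3 and $4 default to the article's example values (vc1, gbase).
replace_cmd() {
    printf './replace.py --host=%s --freenode=%s --type=data --vcname=%s --dbaUser=%s --overwrite\n' \
        "$1" "$2" "${3:-vc1}" "${4:-gbase}"
}

# Print the ordered steps: isolate, move off, replace, move back, clean up.
replacement_plan() {
    failed_ip=$1
    new_ip=$2
    if [ -z "$failed_ip" ] || [ -z "$new_ip" ]; then
        echo "usage: replacement_plan <failed_IP> <new_IP>" >&2
        return 2
    fi
    if [ "$failed_ip" = "$new_ip" ]; then
        echo "error: Mode A needs a Free Node with a different IP" >&2
        return 1
    fi
    echo "gcadmin setnodestate $failed_ip unavailable"
    echo "gcadmin rmfeventlog $failed_ip"
    echo "# build a transitional Distribution without $failed_ip, then REBALANCE"
    replace_cmd "$failed_ip" "$new_ip"
    echo "# REBALANCE again onto $new_ip, then drop the transitional Distribution"
    echo "# remove the old node $failed_ip"
}

# Hypothetical example: 10.0.0.3 failed, Free Node 10.0.0.9 takes over.
replacement_plan 10.0.0.3 10.0.0.9
```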

Comparison at a Glance

| Dimension | Remove Node (Scale In) | Replace Failed Node |
| --- | --- | --- |
| Goal | Fewer nodes, free resources | Same node count, new hardware |
| Data flow | One-way: off the departing node | Two-way: off the failed node, back onto the new one |
| New node required? | No | Yes |
| Key commands | `gcadmin distribution`, `rebalance`, `gcadmin rmnodes` | `gcadmin setnodestate`, `replace.py`, `rebalance` |
| Distribution changes | One new Distribution (excluding removed node) | Two: transitional (excluding failed node) and final (including new node) |
| REBALANCE count | 1 | At least 2 |
| Node state management | Usually direct | Must set to `UNAVAILABLE` first |
| Risk level | Performance impact during migration | Higher: involves state transitions and potential inconsistency |

Golden Rules

Always back up first.
Monitor continuously: watch rebalancing_status and cluster health.
Follow the sequence, especially for replacement: isolate → move off → replace → move back → clean up.
Schedule during off-peak hours: both operations generate heavy I/O and network traffic.
Mind the IPs: with a Free Node the IP changes, so update connection configs or DNS accordingly. With a brand-new node, the IP must be identical to the failed node's.
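
For the monitoring rule, a polling loop might look like the sketch below. It assumes each row arrives as tab-separated "name, percentage" text; the column names, the gccli flags in the commented-out fetch function, and all function names are assumptions to verify against your installation. Zero rows is treated as "not confirmed" rather than "done", so a failed query cannot be mistaken for a finished rebalance.

```shell
#!/bin/sh
# Sketch: wait until every rebalance row reports 100 percent.

# Read "name<TAB>percentage" rows on stdin.
# Succeeds only if there is at least one row and none is below 100.
rebalance_done() {
    awk -F '\t' '
        $2 + 0 < 100 { pending++ }
        END { exit (NR == 0 || pending > 0) }
    '
}

# Poll until done or until the attempt budget runs out.
# $1 = command that prints the rows, $2 = seconds between polls, $3 = max polls.
wait_for_rebalance() {
    fetch_rows=$1
    interval=${2:-30}
    max_polls=${3:-120}
    polls=0
    while [ "$polls" -lt "$max_polls" ]; do
        if "$fetch_rows" | rebalance_done; then
            return 0
        fi
        polls=$((polls + 1))
        sleep "$interval"
    done
    return 1
}

# HYPOTHETICAL fetch function; flags and column names are assumptions:
# fetch_rows_gccli() {
#     gccli -u<user> -p<password> -N -e \
#         "SELECT table_name, percentage FROM gclusterdb.rebalancing_status"
# }
# wait_for_rebalance fetch_rows_gccli 30 120 && echo "rebalance finished"
```

Passing the fetch command in as an argument keeps the parsing logic independent of the client, so it can be exercised with canned rows before it is pointed at a live cluster.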

Knowing whether you’re “slimming down” or “swapping parts” lets you pick the right playbook and carry out node changes safely on your GBase 8a cluster.
