Michael

Originally published at gbase.cn

Safe Node Removal vs. Failed Node Replacement in GBase 8a

Shrinking a cluster and replacing a failed node are two fundamental operations in GBase 8a. Both follow a “data safety first” principle, but they differ in goals, procedures, and risk profiles. Here’s how to execute each one safely.

How to Safely Remove a Node (Scale In)

The goal is to permanently reduce the cluster size. The rule: move all data off the node first, then remove it.

  1. Create a new Distribution: Edit gcChangeInfo.xml to list only the nodes you want to keep, run gcadmin distribution to build the new map, and execute initnodedatamap.
  2. Run REBALANCE: Migrate data from the old Distribution to the new one. Monitor progress until it reaches 100%:
   SELECT * FROM gclusterdb.rebalancing_status;
  3. Clean up the old Distribution: Drop the old hashmap and Distribution.
  4. Remove the node from the cluster: Edit gcChangeInfo.xml again, this time listing only the departing node’s IP, then execute gcadmin rmnodes to turn it into a Free Node.
  5. (Optional) Uninstall the software completely.
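
The completion check in the REBALANCE step can be scripted rather than eyeballed. The sketch below is a minimal example, assuming the rows of gclusterdb.rebalancing_status have already been fetched as dictionaries by whatever client you use. The column names table_name, status, and percentage and the status value COMPLETED are assumptions for illustration, so verify them against your own cluster before relying on this.

```python
from typing import Iterable, List, Mapping


def pending_tables(rows: Iterable[Mapping[str, object]],
                   done_status: str = "COMPLETED") -> List[str]:
    """Return the names of tables whose rebalance has not finished.

    Column names and the done_status value are assumptions, not a
    documented GBase 8a contract.
    """
    pending = []
    for row in rows:
        status = str(row.get("status", "")).upper()
        percentage = float(row.get("percentage") or 0)
        # A table counts as finished only if both signals agree.
        if status != done_status or percentage < 100:
            pending.append(str(row.get("table_name", "<unknown>")))
    return pending


# Hypothetical rows, shaped like the result of the status query.
sample = [
    {"table_name": "orders", "status": "COMPLETED", "percentage": 100},
    {"table_name": "events", "status": "RUNNING", "percentage": 42},
]
print(pending_tables(sample))  # ['events']
```

An empty result set also yields an empty list here, so confirm that the rebalance has actually started before treating "nothing pending" as "done".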

How to Replace a Failed Node

The goal is to maintain the same cluster size while swapping out faulty hardware. There are two modes:

  • Mode A: Replace with a Free Node (old and new IPs differ).
  • Mode B: Replace with a brand‑new node (IP must match the old one).
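
The IP rule for the two modes is easy to check before any cluster command runs. The following is a small sketch using Python's standard ipaddress module. The function name, the mode labels "free_node" and "new_node", and the sample addresses are all hypothetical and are not part of any GBase tooling.

```python
import ipaddress


def check_replacement_ips(mode: str, failed_ip: str, new_ip: str) -> None:
    """Raise ValueError if the IP pair breaks the rule for the chosen mode.

    mode is a hypothetical label: "free_node" (Mode A) or "new_node" (Mode B).
    """
    # ip_address() also rejects malformed addresses with ValueError.
    old = ipaddress.ip_address(failed_ip)
    new = ipaddress.ip_address(new_ip)
    if mode == "free_node":
        if old == new:
            raise ValueError("Mode A: the Free Node must have a different IP")
    elif mode == "new_node":
        if old != new:
            raise ValueError("Mode B: the new node must reuse the failed node's IP")
    else:
        raise ValueError(f"unknown mode: {mode!r}")


# Documentation-range addresses used purely as placeholders.
check_replacement_ips("free_node", "192.0.2.10", "192.0.2.20")  # passes silently
check_replacement_ips("new_node", "192.0.2.10", "192.0.2.10")   # passes silently
```

Failing fast here is cheaper than discovering a mismatched address halfway through the replacement.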

Key steps:

  1. Isolate the failed node:
   gcadmin setnodestate <failed_IP> unavailable
   gcadmin rmfeventlog <failed_IP>
  2. Create a transitional Distribution and move data off: Build a Distribution that excludes the failed node, then rebalance so data remains fully replicated elsewhere.
  3. Execute the replacement command (unique to this operation):
   ./replace.py --host=<failed_IP> --freenode=<new_IP> --type=data --vcname=vc1 --dbaUser=gbase --overwrite

The new node joins and a fresh Distribution is generated automatically.

  4. Second REBALANCE and cleanup: Rebalance again to bring data back onto the new node, then drop the transitional Distribution.
  5. Remove the old node: Now empty, it can be safely deleted.
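
Because the order of these steps matters, it can help to have a runbook script refuse to proceed out of sequence. The sketch below is one possible guard and does not call any GBase command. The class and the step labels are hypothetical names that mirror the sequence isolate, move off, replace, move back, clean up.

```python
from typing import Optional

# Hypothetical step labels mirroring the replacement sequence.
REPLACEMENT_STEPS = ("isolate", "move_off", "replace", "move_back", "clean_up")


class ReplacementRunbook:
    """Tracks which replacement step comes next and rejects out-of-order steps."""

    def __init__(self) -> None:
        self._done = 0  # number of steps completed so far

    @property
    def next_step(self) -> Optional[str]:
        if self._done < len(REPLACEMENT_STEPS):
            return REPLACEMENT_STEPS[self._done]
        return None  # every step has been completed

    def complete(self, step: str) -> None:
        expected = self.next_step
        if expected is None:
            raise RuntimeError("all steps already completed")
        if step != expected:
            raise RuntimeError(f"expected step {expected!r}, got {step!r}")
        self._done += 1


runbook = ReplacementRunbook()
runbook.complete("isolate")
runbook.complete("move_off")
print(runbook.next_step)  # replace
```

In practice each call to complete() would follow the operator's confirmation that the matching cluster action succeeded, for example that the first rebalance reached 100%.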

Comparison at a Glance

| Dimension | Remove Node (Scale In) | Replace Failed Node |
| --- | --- | --- |
| Goal | Fewer nodes, free resources | Same node count, new hardware |
| Data flow | One‑way: off the departing node | Two‑way: off the failed node, back onto the new one |
| New node required? | No | Yes |
| Key commands | gcadmin distribution, rebalance, gcadmin rmnodes | gcadmin setnodestate, replace.py, rebalance |
| Distribution changes | One new Distribution (excluding removed node) | Two: transitional (excluding failed node) and final (including new node) |
| REBALANCE count | 1 | At least 2 |
| Node state management | Usually direct | Must set to UNAVAILABLE first |
| Risk level | Performance impact during migration | Higher: involves state transitions and potential inconsistency |

Golden Rules

  • Always back up first.
  • Monitor continuously: watch gclusterdb.rebalancing_status and overall cluster health.
  • Follow the sequence, especially for replacement: isolate → move off → replace → move back → clean up.
  • Schedule during off‑peak hours: both operations generate heavy I/O and network traffic.
  • Mind the IPs: a Free Node brings a new IP, so update connection configs or DNS accordingly; a brand‑new node must keep the failed node’s IP.

Knowing whether you’re “slimming down” or “swapping parts” lets you pick the right playbook and execute node changes safely in GBase 8a.
