Shrinking a cluster and replacing a failed node are two fundamental operations in GBase 8a. Both follow a “data safety first” principle, but they differ in goals, procedures, and risk profiles. Here’s how to execute each safely in your gbase database.
How to Safely Remove a Node (Scale In)
The goal is to permanently reduce the cluster size. The rule: move all data off the node first, then remove it.
- Create a new Distribution: Edit gcChangeInfo.xml to list only the nodes you want to keep, run gcadmin distribution to build the new map, and execute initnodedatamap.
- Run REBALANCE: Migrate data from the old Distribution to the new one. Monitor progress until it reaches 100%:
SELECT * FROM gclusterdb.rebalancing_status;
- Clean up the old Distribution: Drop the old hashmap and Distribution.
- Remove the node from the cluster: Edit gcChangeInfo.xml again, this time listing only the departing node’s IP, then execute gcadmin rmnodes to turn it into a Free Node.
- (Optional) Uninstall the software completely.
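The two gcChangeInfo.xml edits in the steps above can be scripted. Here is a minimal Python sketch that produces one file body listing the nodes to keep and another listing only the departing node. The servers/rack/node layout is an assumption about the file format, and the helper name build_change_info and the sample IPs are hypothetical, so compare the output with the gcChangeInfo.xml shipped in your installation before relying on it.

```python
import xml.etree.ElementTree as ET


def build_change_info(ips):
    """Return gcChangeInfo.xml text listing the given node IPs.

    ASSUMPTION: the file uses a <servers><rack><node ip="..."/></rack></servers>
    layout. Verify against your installation's own gcChangeInfo.xml.
    """
    if not ips:
        raise ValueError("at least one node IP is required")
    servers = ET.Element("servers")
    rack = ET.SubElement(servers, "rack")
    for ip in ips:
        ET.SubElement(rack, "node", ip=ip)
    return ET.tostring(servers, encoding="unicode")


# Hypothetical four-node cluster; the last node is being removed.
all_nodes = ["192.168.10.1", "192.168.10.2", "192.168.10.3", "192.168.10.4"]
departing = "192.168.10.4"

# First edit: the nodes that stay, used to build the new Distribution.
keep_xml = build_change_info([ip for ip in all_nodes if ip != departing])
# Second edit: only the departing node, used when removing it from the cluster.
remove_xml = build_change_info([departing])

print(keep_xml)
print(remove_xml)
```

Generating both files from the same node list makes it harder to leave the departing node in the "keep" file by accident.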
How to Replace a Failed Node
The goal is to maintain the same cluster size while swapping out faulty hardware. There are two modes:
- Mode A: Replace with a Free Node (old and new IPs differ).
- Mode B: Replace with a brand‑new node (IP must match the old one).
Key steps:
- Isolate the failed node:
gcadmin setnodestate <failed_IP> unavailable
gcadmin rmfeventlog <failed_IP>
- Create a transitional Distribution and move data off: Build a Distribution that excludes the failed node, then rebalance so data remains fully replicated elsewhere.
- Execute the replacement command (unique to this operation):
./replace.py --host=<failed_IP> --freenode=<new_IP> --type=data --vcname=vc1 --dbaUser=gbase --overwrite
The new node joins and a fresh Distribution is generated automatically.
- Second REBALANCE and cleanup: Rebalance again to bring data back onto the new node, then drop the transitional Distribution.
- Remove the old node: Now empty, it can be safely deleted.
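Because the order of these steps matters, it can help to print the whole plan before running anything. The sketch below is a dry-run helper for the Free Node case only: it emits the commands quoted above in order, and keeps the steps described only in prose as "#" notes rather than guessing at their syntax. The function name replacement_plan and the sample IPs are hypothetical.

```python
def replacement_plan(failed_ip, new_ip, vcname="vc1", dba_user="gbase"):
    """Return the ordered steps for replacing a failed node with a Free Node.

    Only commands quoted in the article are emitted as commands. Steps the
    article describes in prose are kept as '#' notes, not invented commands.
    """
    if failed_ip == new_ip:
        # The Free Node mode expects a different IP; the same-IP mode
        # (brand-new node) is not covered by this sketch.
        raise ValueError("Free Node replacement expects a different IP")
    return [
        f"gcadmin setnodestate {failed_ip} unavailable",
        f"gcadmin rmfeventlog {failed_ip}",
        "# build a transitional Distribution without the failed node, then rebalance",
        f"./replace.py --host={failed_ip} --freenode={new_ip} --type=data "
        f"--vcname={vcname} --dbaUser={dba_user} --overwrite",
        "# rebalance onto the new node, then drop the transitional Distribution",
        "# remove the old, now empty node",
    ]


# Hypothetical addresses for illustration only.
for step in replacement_plan("192.168.10.3", "192.168.10.9"):
    print(step)
```

The helper only prints text; nothing is executed, so it is safe to run on any machine while reviewing the procedure.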
Comparison at a Glance
| Dimension | Remove Node (Scale In) | Replace Failed Node |
|---|---|---|
| Goal | Fewer nodes, free resources | Same node count, new hardware |
| Data flow | One‑way: off the departing node | Two‑way: off the failed node, back onto the new one |
| New node required? | No | Yes |
| Key commands | gcadmin distribution, rebalance, gcadmin rmnodes | gcadmin setnodestate, replace.py, rebalance |
| Distribution changes | One new Distribution (excluding removed node) | Two: transitional (excluding failed node) and final (including new node) |
| REBALANCE count | 1 | At least 2 |
| Node state management | Usually direct | Must set to UNAVAILABLE first |
| Risk level | Performance impact during migration | Higher — involves state transitions and potential inconsistency |
Golden Rules
- Always back up first.
- Monitor continuously: watch rebalancing_status and cluster health.
- Follow the sequence, especially for replacement: isolate → move off → replace → move back → clean up.
- Schedule during off‑peak hours — both operations generate heavy I/O and network traffic.
- Mind the IPs: when using a Free Node the IP changes, so update connection configs or DNS accordingly. With a brand‑new node, the IP must remain identical.
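The "monitor continuously" rule can be sketched as a small check over the rows returned by SELECT * FROM gclusterdb.rebalancing_status. How the rows are fetched is left out, since the database driver depends on your environment. The table_name and percentage column names are assumptions, and the helper name unfinished_tables is hypothetical; check the actual columns on your version.

```python
def unfinished_tables(rows):
    """Return names of tables whose rebalance has not reached 100%.

    ASSUMPTION: each row is a dict with 'table_name' and 'percentage' keys;
    check the actual columns of gclusterdb.rebalancing_status on your version.
    """
    return [row["table_name"] for row in rows if float(row["percentage"]) < 100]


# Hypothetical rows, shaped like a dict cursor's result.
sample = [
    {"table_name": "orders", "percentage": 100},
    {"table_name": "events", "percentage": 62.5},
]

pending = unfinished_tables(sample)
if pending:
    print("still rebalancing:", ", ".join(pending))
else:
    print("rebalance complete; safe to clean up the old Distribution")
```

With the sample rows this prints "still rebalancing: events". Gating the cleanup step on an empty result keeps the old Distribution in place until every table has finished migrating.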
Knowing whether you’re “slimming down” or “swapping parts” lets you pick the right playbook and execute node changes safely in your gbase database.