DEV Community

Michael

Originally published at gbase.cn

When and Why to Run REBALANCE in GBase 8a

In GBase 8a, the REBALANCE command moves table data from an old Distribution to a new one, so that physical storage matches the current data layout plan. REBALANCE is not only a fix for unevenly spread data: any operation that changes the Distribution must be followed by a manual REBALANCE.

The Core Purpose of REBALANCE

REBALANCE doesn't simply "balance data" — it migrates data to match a new Distribution. Here's the logic:

  1. The cluster's data layout is defined by a Distribution (mapping of segments, nodes, and replicas).
  2. When the Distribution changes (e.g., nodes added or removed), old and new Distributions coexist.
  3. Table data remains tied to the old Distribution.
  4. REBALANCE moves the data from the old Distribution's mapping to the new one, physically relocating rows.
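The four steps above can be illustrated with a toy Python model. It is a simplification and not GBase 8a's internal design: we assume table rows hash into a fixed set of buckets and treat a Distribution as a plain dict from bucket id to node name. The hypothetical `plan_moves` function lists what a REBALANCE would have to relocate once a new Distribution exists alongside the old one.

```python
from typing import Dict, List, Tuple

# Toy model (assumption): a Distribution maps bucket id -> owning node name.
Distribution = Dict[int, str]


def plan_moves(old: Distribution, new: Distribution) -> List[Tuple[int, str, str]]:
    """List (bucket, source_node, target_node) for every bucket whose owner changed."""
    if old.keys() != new.keys():
        raise ValueError("both Distributions must cover the same buckets")
    return [(b, old[b], new[b]) for b in sorted(old) if old[b] != new[b]]


if __name__ == "__main__":
    # Old layout: 6 buckets spread over two nodes.
    old = {0: "node1", 1: "node2", 2: "node1", 3: "node2", 4: "node1", 5: "node2"}
    # New layout after adding node3: two buckets are reassigned to it.
    new = {0: "node1", 1: "node2", 2: "node3", 3: "node2", 4: "node1", 5: "node3"}
    for bucket, src, dst in plan_moves(old, new):
        print(f"bucket {bucket}: {src} -> {dst}")  # buckets 2 and 5 move to node3
```

Buckets whose owner is the same in both Distributions produce no work, which is why a REBALANCE only touches the part of the data affected by the change.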

Four Scenarios That Require a Manual REBALANCE

1. Scaling the Cluster (Adding or Removing Nodes)

  • Adding Data Nodes: Create a new Distribution that includes the new nodes, then REBALANCE shifts a portion of data onto them, rebalancing storage and compute load.
  • Removing Data Nodes: Create a Distribution that excludes the nodes to be removed, then REBALANCE moves all their data to the remaining nodes before the nodes can be safely taken out.
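For scaling, the practical question is how much data has to move. The sketch below uses the same toy bucket model (an assumption, not GBase 8a's actual placement logic) and an assumed greedy policy in a hypothetical `rebalance_buckets` function: every node that remains in the new Distribution keeps buckets up to its fair share, and only the surplus plus the buckets of departing nodes are reassigned.

```python
from collections import defaultdict
from typing import Dict, List

Distribution = Dict[int, str]  # toy model: bucket id -> owning node


def rebalance_buckets(current: Distribution, nodes: List[str]) -> Distribution:
    """Build a balanced Distribution over `nodes` that keeps as many buckets in place as possible."""
    base, extra = divmod(len(current), len(nodes))
    held: Dict[str, List[int]] = defaultdict(list)
    for bucket, node in sorted(current.items()):
        held[node].append(bucket)
    # Give the "+1" quota to nodes that already hold the most buckets, so fewer buckets move.
    ranked = sorted(nodes, key=lambda n: (-len(held[n]), n))
    quota = {n: base + (1 if i < extra else 0) for i, n in enumerate(ranked)}

    new: Distribution = {}
    pool: List[int] = []
    for node, buckets in held.items():
        keep = quota.get(node, 0)  # departing nodes have no quota, so they keep nothing
        for b in buckets[:keep]:
            new[b] = node
        pool.extend(buckets[keep:])  # surplus and departing buckets must move

    pool.sort()
    remaining = iter(pool)
    for node in ranked:
        kept = min(len(held[node]), quota[node])
        for _ in range(quota[node] - kept):  # fill each node up to its quota
            new[next(remaining)] = node
    return new


def moved(old: Distribution, new: Distribution) -> List[int]:
    """Buckets whose owning node differs between two Distributions."""
    return sorted(b for b in old if old[b] != new[b])


if __name__ == "__main__":
    three = {b: ["A", "B", "C"][b % 3] for b in range(12)}
    grown = rebalance_buckets(three, ["A", "B", "C", "D"])
    print("scale up moves:", moved(three, grown))     # buckets 9, 10, 11 go to D
    shrunk = rebalance_buckets(three, ["A", "B"])
    print("scale down moves:", moved(three, shrunk))  # only C's buckets (2, 5, 8, 11) move
```

Under this policy, growing from three to four nodes moves a quarter of the buckets, and shrinking moves exactly the departing node's buckets, which mirrors the two bullets above.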

2. Node Replacement (Failure Recovery)

When replacing a failed data node, two REBALANCE operations are needed:

  1. First REBALANCE: Create a transitional Distribution without the failed node to move its data elsewhere, preserving data completeness.
  2. Second REBALANCE: After the replacement node joins, create a final Distribution that includes the new node and move data back, restoring the original replica layout and high availability.
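The two-phase replacement can be sketched in the same toy bucket model. The helper names `without_node` and `with_replacement` are hypothetical, and the model only tracks which node owns each bucket; it does not model replica copies or where the data is read from while the failed node is down.

```python
from typing import Dict, List

Distribution = Dict[int, str]  # toy model: bucket id -> owning node


def without_node(current: Distribution, failed: str) -> Distribution:
    """Transitional Distribution: deal the failed node's buckets round-robin to the survivors."""
    survivors = sorted(set(current.values()) - {failed})
    orphaned = sorted(b for b, n in current.items() if n == failed)
    transitional = dict(current)
    for i, bucket in enumerate(orphaned):
        transitional[bucket] = survivors[i % len(survivors)]
    return transitional


def with_replacement(original: Distribution, failed: str, replacement: str) -> Distribution:
    """Final Distribution: the original layout, with the replacement standing in for the failed node."""
    return {b: (replacement if n == failed else n) for b, n in original.items()}


def moved(old: Distribution, new: Distribution) -> List[int]:
    """Buckets whose owning node differs between two Distributions."""
    return sorted(b for b in old if old[b] != new[b])


if __name__ == "__main__":
    original = {b: ["n1", "n2", "n3"][b % 3] for b in range(9)}
    step1 = without_node(original, "n2")            # first REBALANCE: move n2's buckets out
    assert "n2" not in step1.values()               # n2 can now be swapped out safely
    step2 = with_replacement(original, "n2", "n4")  # second REBALANCE: move them onto n4
    print("first REBALANCE moves:", moved(original, step1))   # [1, 4, 7]
    print("second REBALANCE moves:", moved(step1, step2))     # [1, 4, 7]
```

The same buckets move twice, once out and once back, which is the cost of restoring the original layout instead of leaving the cluster in its transitional shape.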

3. Resource Transfer Between Virtual Clusters (VCs)

When moving node resources between two VCs, REBALANCE is required on both sides:

  1. On VC1, shrink by creating a Distribution without the departing node, REBALANCE data out, then remove the node.
  2. Join the node to VC2.
  3. On VC2, expand by creating a Distribution that includes the new node, then REBALANCE data onto it.
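The ordering of these steps matters: the node may only leave VC1 once VC1's new Distribution no longer assigns it any data. The sketch below treats each VC as an independent set of buckets (an assumption of this toy model), and the `shrink` and `expand` helpers are hypothetical illustrations, not GBase 8a commands.

```python
from collections import Counter
from typing import Dict

Distribution = Dict[int, str]  # toy model: bucket id -> owning node


def shrink(current: Distribution, departing: str) -> Distribution:
    """New Distribution without `departing`; its buckets are dealt round-robin to the rest."""
    survivors = sorted(set(current.values()) - {departing})
    orphaned = sorted(b for b, n in current.items() if n == departing)
    new = dict(current)
    for i, bucket in enumerate(orphaned):
        new[bucket] = survivors[i % len(survivors)]
    return new


def expand(current: Distribution, joining: str) -> Distribution:
    """New Distribution with `joining`; it takes buckets from the busiest nodes up to a fair share."""
    new = dict(current)
    share = len(current) // (len(set(current.values())) + 1)
    for _ in range(share):
        counts = Counter(new.values())
        donor = max(sorted(counts), key=counts.__getitem__)  # busiest node, ties broken by name
        bucket = min(b for b, n in new.items() if n == donor)
        new[bucket] = joining
    return new


if __name__ == "__main__":
    vc1 = {b: ["n1", "n2", "n3"][b % 3] for b in range(6)}
    vc2 = {b: ["m1", "m2"][b % 2] for b in range(6)}
    vc1 = shrink(vc1, "n3")          # step 1: REBALANCE n3's data out of VC1
    assert "n3" not in vc1.values()  # only now can n3 leave VC1
    vc2 = expand(vc2, "n3")          # steps 2-3: n3 joins VC2, REBALANCE data onto it
    print(sorted(b for b, n in vc2.items() if n == "n3"))  # [0, 1]
```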

4. Changing Distribution Pattern or Fragment Parameters

If you need to change the distribution pattern (e.g., Pattern 1 → Pattern 2) or adjust fragment parameters (e.g., increasing replicas from 1 to 2), a new Distribution must be created and REBALANCE triggered to reorganize data under the new rules.
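A replica-count change alters placement even when the node list stays the same. The placement rule below (primary on one node, each extra copy on the next node in a ring) is a hypothetical stand-in; it does not reproduce GBase 8a's distribution patterns, which are not modeled here. It shows that raising replicas from 1 to 2 leaves primaries in place but requires one new copy of every bucket.

```python
from typing import Dict, List, Set, Tuple

Placement = Dict[int, Tuple[str, ...]]  # toy model: bucket id -> (primary, replica, ...)


def place(buckets: int, nodes: List[str], replicas: int) -> Placement:
    """Hypothetical rule: primary on nodes[b % k], each replica on the next node in the ring."""
    if replicas + 1 > len(nodes):
        raise ValueError("need at least replicas + 1 nodes so copies land on distinct nodes")
    k = len(nodes)
    return {b: tuple(nodes[(b + i) % k] for i in range(replicas + 1)) for b in range(buckets)}


def diff(old: Placement, new: Placement) -> Tuple[Dict[int, Set[str]], Dict[int, Set[str]]]:
    """Per bucket: nodes that must receive a new copy, and nodes whose copy can be dropped."""
    adds = {b: set(new[b]) - set(old[b]) for b in new}
    drops = {b: set(old[b]) - set(new[b]) for b in old}
    return ({b: s for b, s in adds.items() if s}, {b: s for b, s in drops.items() if s})


if __name__ == "__main__":
    nodes = ["n1", "n2", "n3", "n4"]
    one = place(8, nodes, replicas=1)
    two = place(8, nodes, replicas=2)
    adds, drops = diff(one, two)
    print(adds[0], drops)  # {'n3'} {}
```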

When REBALANCE Is NOT Needed

  • Altering table structure (ALTER TABLE ADD COLUMN) — no physical data movement.
  • Routine DML (INSERT, DELETE, UPDATE) — handled within the current Distribution.
  • Coordinator node scaling — affects metadata only, not user data placement.
  • Service restart without hardware change — Distribution remains unchanged.

REBALANCE Scenarios at a Glance

| Scenario | Trigger | Prerequisite | Goal |
|---|---|---|---|
| Scale Up | More nodes | Create a Distribution including the new nodes | Move data onto the new nodes |
| Scale Down | Fewer nodes | Create a Distribution excluding the departing nodes | Move data off the departing nodes |
| Node Replacement | Hardware change | Create a transitional Distribution without the failed node | Move data out, replace the node, move data back |
| Resource Transfer | Node changes VC | Create new Distributions on both the source VC (without the node) and the target VC (with it) | Move data off the node in the source VC, then onto it in the target VC |
| Policy Change | Pattern or replica change | Create a Distribution with the new parameters | Reorganize data under the new rules |

REBALANCE is the core command for online data reorganization in the GBase 8a MPP cluster. Any operational change that alters the node-to-segment mapping is, at heart, a Distribution change, and must be followed by a REBALANCE to bring the physical data in line with it. The command always runs after the new Distribution has been created and initialized, and before the old one is dropped. Mastering it is essential for elastic scaling and other advanced GBase 8a operations.
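The ordering rule above can be captured as a small state machine. `DistributionChange` and `Stage` are hypothetical names used only for illustration, not part of any GBase 8a tool; they track the steps of a change and reject any attempt to drop the old Distribution before REBALANCE has run.

```python
from enum import Enum, auto


class Stage(Enum):
    CREATED = auto()      # new Distribution exists alongside the old one
    INITIALIZED = auto()  # new Distribution is initialized
    REBALANCED = auto()   # table data has been moved to the new Distribution
    OLD_DROPPED = auto()  # old Distribution has been removed


class DistributionChange:
    """Hypothetical tracker enforcing the order: create -> initialize -> REBALANCE -> drop old."""

    _order = [Stage.CREATED, Stage.INITIALIZED, Stage.REBALANCED, Stage.OLD_DROPPED]

    def __init__(self) -> None:
        self.stage = Stage.CREATED

    def advance(self, target: Stage) -> None:
        """Move to `target`, but only if it is the very next step in the required order."""
        i = self._order.index(self.stage)
        if i + 1 >= len(self._order) or self._order[i + 1] is not target:
            raise RuntimeError(f"cannot go from {self.stage.name} to {target.name}")
        self.stage = target


if __name__ == "__main__":
    change = DistributionChange()
    change.advance(Stage.INITIALIZED)
    try:
        change.advance(Stage.OLD_DROPPED)  # skipping REBALANCE is rejected
    except RuntimeError as err:
        print(err)  # cannot go from INITIALIZED to OLD_DROPPED
    change.advance(Stage.REBALANCED)
    change.advance(Stage.OLD_DROPPED)
```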
