DEV Community

Michael
Michael

Posted on • Originally published at gbase.cn

Managing Hollow Rate and Historical Data Cleanup in GBase 8a

In GBase 8a, DELETE or TRUNCATE operations do not immediately release disk space. Data is merely marked as deleted, causing the hollow rate to rise over time. This leads to persistent storage consumption and slower scans. The key to effective cleanup is distinguishing "data removal" from "space reclamation" and choosing the right strategy — partition management, SHRINK SPACE, or table rebuilding — based on the table type.

1. Understanding Hollow Rate

GBase 8a uses logical deletion: invalidated rows are flagged but the underlying disk blocks are not returned to the OS. Two direct consequences follow:

  1. Disk space does not shrink as expected.
  2. Scans still traverse the deleted rows, wasting resources.

Before taking any action, evaluate the table's actual state through system views:

SELECT *
FROM performance_schema.tables
WHERE TABLE_SCHEMA = 'ods'
  AND TABLE_NAME = 'order_hist';
Enter fullscreen mode Exit fullscreen mode

Compare the volume of data that was logically deleted with the physical disk usage change — this quickly reveals how severe the hollowing is.

2. Choose a Strategy Based on Table Type

Different table profiles demand different approaches:

Table Type Recommended Approach Reason
Large, time‑based growth Daily or partitioned tables Clear lifecycle boundaries, lowest cleanup cost
Existing large, non‑partitioned tables SHRINK SPACE or rebuild Depends on reclaimable space, window size, and blocking cost
Small tables with frequent updates/deletes Periodic SHRINK SPACE FULL or rebuild Small footprint, easier to obtain a maintenance window
Data needing fast monthly/daily cleanup Partition drop or truncate Far more efficient than row‑by‑row DELETE
Very large tables without partitions First add lifecycle design Fix the root cause instead of repeatedly patching

For data with natural time boundaries, partition cleanup is a fundamentally better solution than DELETE followed by SHRINK.

3. Partition Governance: Design for Lifecycle Management

Organize data by month using RANGE partitioning:

CREATE TABLE ods.order_hist (
  order_id        bigint,
  cust_id         bigint,
  statis_date     date,
  order_amt       decimal(18,2)
)
PARTITION BY RANGE (year(statis_date) * 100 + month(statis_date)) (
  PARTITION p202401 VALUES LESS THAN (202402),
  PARTITION p202402 VALUES LESS THAN (202403),
  PARTITION p202403 VALUES LESS THAN (202404),
  PARTITION pmax VALUES LESS THAN MAXVALUE
);
Enter fullscreen mode Exit fullscreen mode

Common partition management operations:

-- Add a new partition
ALTER TABLE ods.order_hist ADD PARTITION (PARTITION p202404 VALUES LESS THAN (202405));

-- Drop a partition (physically removes data)
ALTER TABLE ods.order_hist DROP PARTITION p202401;

-- Truncate a partition
ALTER TABLE ods.order_hist TRUNCATE PARTITION p202402;
Enter fullscreen mode Exit fullscreen mode

4. SHRINK SPACE: Reclaiming Space from Already-Hollow Tables

For non‑partitioned tables that already have hollowing, GBase 8a offers several shrink modes:

Method Effect Best For
SHRINK SPACE Fast, but may not reclaim all space Short maintenance windows, quick relief
SHRINK SPACE FULL Deep reorganization, more thorough Smaller tables where a longer window is available
SHRINK SPACE FULL BLOCK_REUSE_RATIO=N Balances reclaim ratio and time When you want more thoroughness without extreme full shrink
ALTER TABLE ods.order_hist SHRINK SPACE;
ALTER TABLE ods.order_hist SHRINK SPACE FULL;
ALTER TABLE ods.order_hist SHRINK SPACE FULL BLOCK_REUSE_RATIO=30;
Enter fullscreen mode Exit fullscreen mode

Important: SHRINK SPACE is a DDL operation that impacts table access. Using the rebalance mode relaxes some restrictions, but you should still schedule it during off‑peak hours.

5. Table Rebuild: A More Controllable Deep Cleanup

When a table is heavily fragmented, or you want to adjust storage attributes or column order simultaneously, rebuilding is often the most reliable approach:

-- 1. Create a new table with the same structure
CREATE TABLE ods.order_hist_new LIKE ods.order_hist;

-- 2. Migrate only the needed data
INSERT INTO ods.order_hist_new
SELECT * FROM ods.order_hist
WHERE statis_date >= '2024-01-01';

-- 3. Swap tables
RENAME TABLE ods.order_hist TO ods.order_hist_bak;
RENAME TABLE ods.order_hist_new TO ods.order_hist;

-- 4. Verify, then drop the old table
DROP TABLE ods.order_hist_bak;
Enter fullscreen mode Exit fullscreen mode

This approach breaks the operation into clear, verifiable steps that are easy to roll back.

6. Build a Candidate Table Screening and Monitoring Routine

Regularly inspect system tables, identify tables that need governance, and record key metrics:

Metric Meaning
Current disk size Total physical size of the table
Valid row count Number of rows actually in use
Estimated valid space Theoretical space for valid data
Reclaimable space Space expected to be freed after governance
Hollow rate Ratio of deleted data to total space

A simple tracking table:

CREATE TABLE IF NOT EXISTS ops.table_space_check (
  dbname                VARCHAR(128),
  tbname                VARCHAR(128),
  current_disk_size     BIGINT,
  record_count          BIGINT,
  calc_disk_size        BIGINT,
  can_release_disk_size BIGINT,
  hole_ratio            DECIMAL(10,4),
  check_time            DATETIME
);
Enter fullscreen mode Exit fullscreen mode

Pair this with a scheduled scan to transform space management from a reactive firefight into a routine operation.

7. Schedule Maintenance Windows Outside Business Peaks

Space reclamation operations — SHRINK, partition drops, rebuilds — are resource‑intensive DDL. They must run inside a designated change window, coordinated with batch jobs, data sync schedules, and definitely outside of peak traffic.

8. Summary

  • Partitioning is the first choice — For time‑based large tables, use partitions or daily tables to solve the cleanup problem at the structural level.
  • Evaluate before shrinking — For non‑partitioned tables, assess reclaimable space and hollow rate, then choose the right SHRINK mode.
  • Rebuilding is more controllable — Severely fragmented tables or those needing structural adjustments benefit from a step‑by‑step rebuild.
  • Assess first, then execute — Build a closed loop of candidate screening, space estimation, and change logging.

The heart of the matter is distinguishing "how much data was logically deleted" from "how much space can physically be reclaimed" — and then executing the right operation in the right window. Plan before you clean, and design before you cure — this prevents the cycle of repeated emergency governance in your gbase database.

Top comments (0)