In GBase 8a, DELETE or TRUNCATE operations do not immediately release disk space. Data is merely marked as deleted, causing the hollow rate to rise over time. This leads to persistent storage consumption and slower scans. The key to effective cleanup is distinguishing "data removal" from "space reclamation" and choosing the right strategy — partition management, SHRINK SPACE, or table rebuilding — based on the table type.
1. Understanding Hollow Rate
GBase 8a uses logical deletion: invalidated rows are flagged but the underlying disk blocks are not returned to the OS. Two direct consequences follow:
- Disk space does not shrink as expected.
- Scans still traverse the deleted rows, wasting resources.
Before taking any action, evaluate the table's actual state through system views:
SELECT *
FROM performance_schema.tables
WHERE TABLE_SCHEMA = 'ods'
AND TABLE_NAME = 'order_hist';
Compare the volume of data that was logically deleted with the physical disk usage change — this quickly reveals how severe the hollowing is.
2. Choose a Strategy Based on Table Type
Different table profiles demand different approaches:
| Table Type | Recommended Approach | Reason |
|---|---|---|
| Large, time‑based growth | Daily or partitioned tables | Clear lifecycle boundaries, lowest cleanup cost |
| Existing large, non‑partitioned tables |
SHRINK SPACE or rebuild |
Depends on reclaimable space, window size, and blocking cost |
| Small tables with frequent updates/deletes | Periodic SHRINK SPACE FULL or rebuild |
Small footprint, easier to obtain a maintenance window |
| Data needing fast monthly/daily cleanup | Partition drop or truncate | Far more efficient than row‑by‑row DELETE
|
| Very large tables without partitions | First add lifecycle design | Fix the root cause instead of repeatedly patching |
For data with natural time boundaries, partition cleanup is a fundamentally better solution than DELETE followed by SHRINK.
3. Partition Governance: Design for Lifecycle Management
Organize data by month using RANGE partitioning:
CREATE TABLE ods.order_hist (
order_id bigint,
cust_id bigint,
statis_date date,
order_amt decimal(18,2)
)
PARTITION BY RANGE (year(statis_date) * 100 + month(statis_date)) (
PARTITION p202401 VALUES LESS THAN (202402),
PARTITION p202402 VALUES LESS THAN (202403),
PARTITION p202403 VALUES LESS THAN (202404),
PARTITION pmax VALUES LESS THAN MAXVALUE
);
Common partition management operations:
-- Add a new partition
ALTER TABLE ods.order_hist ADD PARTITION (PARTITION p202404 VALUES LESS THAN (202405));
-- Drop a partition (physically removes data)
ALTER TABLE ods.order_hist DROP PARTITION p202401;
-- Truncate a partition
ALTER TABLE ods.order_hist TRUNCATE PARTITION p202402;
4. SHRINK SPACE: Reclaiming Space from Already-Hollow Tables
For non‑partitioned tables that already have hollowing, GBase 8a offers several shrink modes:
| Method | Effect | Best For |
|---|---|---|
SHRINK SPACE |
Fast, but may not reclaim all space | Short maintenance windows, quick relief |
SHRINK SPACE FULL |
Deep reorganization, more thorough | Smaller tables where a longer window is available |
SHRINK SPACE FULL BLOCK_REUSE_RATIO=N |
Balances reclaim ratio and time | When you want more thoroughness without extreme full shrink |
ALTER TABLE ods.order_hist SHRINK SPACE;
ALTER TABLE ods.order_hist SHRINK SPACE FULL;
ALTER TABLE ods.order_hist SHRINK SPACE FULL BLOCK_REUSE_RATIO=30;
Important: SHRINK SPACE is a DDL operation that impacts table access. Using the rebalance mode relaxes some restrictions, but you should still schedule it during off‑peak hours.
5. Table Rebuild: A More Controllable Deep Cleanup
When a table is heavily fragmented, or you want to adjust storage attributes or column order simultaneously, rebuilding is often the most reliable approach:
-- 1. Create a new table with the same structure
CREATE TABLE ods.order_hist_new LIKE ods.order_hist;
-- 2. Migrate only the needed data
INSERT INTO ods.order_hist_new
SELECT * FROM ods.order_hist
WHERE statis_date >= '2024-01-01';
-- 3. Swap tables
RENAME TABLE ods.order_hist TO ods.order_hist_bak;
RENAME TABLE ods.order_hist_new TO ods.order_hist;
-- 4. Verify, then drop the old table
DROP TABLE ods.order_hist_bak;
This approach breaks the operation into clear, verifiable steps that are easy to roll back.
6. Build a Candidate Table Screening and Monitoring Routine
Regularly inspect system tables, identify tables that need governance, and record key metrics:
| Metric | Meaning |
|---|---|
| Current disk size | Total physical size of the table |
| Valid row count | Number of rows actually in use |
| Estimated valid space | Theoretical space for valid data |
| Reclaimable space | Space expected to be freed after governance |
| Hollow rate | Ratio of deleted data to total space |
A simple tracking table:
CREATE TABLE IF NOT EXISTS ops.table_space_check (
dbname VARCHAR(128),
tbname VARCHAR(128),
current_disk_size BIGINT,
record_count BIGINT,
calc_disk_size BIGINT,
can_release_disk_size BIGINT,
hole_ratio DECIMAL(10,4),
check_time DATETIME
);
Pair this with a scheduled scan to transform space management from a reactive firefight into a routine operation.
7. Schedule Maintenance Windows Outside Business Peaks
Space reclamation operations — SHRINK, partition drops, rebuilds — are resource‑intensive DDL. They must run inside a designated change window, coordinated with batch jobs, data sync schedules, and definitely outside of peak traffic.
8. Summary
- Partitioning is the first choice — For time‑based large tables, use partitions or daily tables to solve the cleanup problem at the structural level.
-
Evaluate before shrinking — For non‑partitioned tables, assess reclaimable space and hollow rate, then choose the right
SHRINKmode. - Rebuilding is more controllable — Severely fragmented tables or those needing structural adjustments benefit from a step‑by‑step rebuild.
- Assess first, then execute — Build a closed loop of candidate screening, space estimation, and change logging.
The heart of the matter is distinguishing "how much data was logically deleted" from "how much space can physically be reclaimed" — and then executing the right operation in the right window. Plan before you clean, and design before you cure — this prevents the cycle of repeated emergency governance in your gbase database.
Top comments (0)