DEV Community

Michael

Originally published at gbase.cn

Tiered Compression for Hot and Cold Data in GBase 8a: Managing the Data Lifecycle

In data lifecycle management, hot and cold data differ significantly in access frequency, update frequency, and storage requirements. By combining GBase 8a's compression algorithms and levels, you can implement a tiered compression strategy that balances storage efficiency against performance.

Compression Algorithms and Levels

GBase 8a offers three main compression algorithms:

  • HighZ: Storage-first, pursues the highest compression ratio.
  • RapidZ: Balances load and query performance.
  • STDZ: Improves compression ratio while maintaining performance.

Compression levels (0–9) directly affect compression ratio and speed:

Level | Characteristic
0     | Default; self-adaptive and balanced
1     | Lowest ratio, fastest load
9     | Highest ratio, best complex-query performance, slowest load

Function mapping: the numeric form maps onto the named form. COMPRESS(1,3) is equivalent to COMPRESS('HighZ', 0), and COMPRESS(5,5) is equivalent to COMPRESS('RapidZ', 0).
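To make the mapping concrete, here is a minimal sketch with two table definitions that, per the equivalence above, should request the same compression. The table and column names are hypothetical and exist only for illustration.

```sql
-- Sketch only: table and column names are hypothetical.
-- Numeric form.
CREATE TABLE orders_numeric_form (
    id INT,
    note VARCHAR(10)
) COMPRESS(5, 5);

-- Named form that the numeric form above maps to.
CREATE TABLE orders_named_form (
    id INT,
    note VARCHAR(10)
) COMPRESS('RapidZ', 0);
```

The named form is easier to read in schema reviews because the algorithm and level are spelled out.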

Compression Scope

Three granularities are supported:

  • Global: Applies to all storage nodes in a virtual cluster (VC), for strict storage budget control.
  • Table‑level: Specified at table creation, for tables with uniform data patterns.
  • Column‑level: Set per column, adapting to inter‑column differences (e.g., highly repetitive columns versus frequently queried ones).
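The table-level and column-level scopes might be combined as in the sketch below. The names are hypothetical, and the per-column COMPRESS clause is an assumption modeled on the table-level form. Whether a column setting overrides the table setting is also assumed here, so check both against the documentation for your GBase 8a version. The global scope is set through cluster configuration and is not shown.

```sql
-- Sketch only: names are hypothetical, and the column-level clause is an assumption.
CREATE TABLE sensor_readings (
    device_id   INT,
    status      VARCHAR(10) COMPRESS('HighZ', 0),   -- highly repetitive values
    reading     INT         COMPRESS('RapidZ', 0),  -- frequently queried
    create_time DATETIME
) COMPRESS('STDZ', 0);  -- table-level setting for columns without their own clause
```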

Hot and Cold Data Strategies

Hot Data

  • Core needs: Low latency, high throughput, high reliability.
  • Data profile: Extremely frequent access, real‑time.
  • Compression strategy: Prioritize load and query performance. Recommended: STDZ at level 0, or RapidZ.

Cold Data

  • Core needs: Low storage cost, infrequent queries.
  • Data profile: Effectively read-only, with no further inserts or updates; rarely queried and retained long term.
  • Compression strategy: Use a high-ratio setting, either HighZ or STDZ at level 9, to reduce the storage footprint.
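As a rough illustration of the two profiles, the table options might look like the sketch below. The table and column names are hypothetical, and the choice of algorithm and level should be validated against your own load and query measurements.

```sql
-- Sketch only: table and column names are hypothetical.
-- Hot tier: RapidZ favors load and query speed.
CREATE TABLE events_hot (
    id INT,
    payload VARCHAR(10),
    create_time DATETIME
) COMPRESS('RapidZ', 0);

-- Cold tier: STDZ at level 9 favors compression ratio over load speed.
CREATE TABLE events_cold (
    id INT,
    payload VARCHAR(10),
    create_time DATETIME
) COMPRESS('STDZ', 9);
```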

Implementation

The following example creates hot and cold tables with different compression settings, migrates data, and reclaims space.

-- 1. Hot data table (low compression, high performance)
CREATE TABLE hot_data (
    a INT,
    b VARCHAR(10),
    create_time DATETIME DEFAULT CURRENT_TIMESTAMP
) COMPRESS('STDZ', 0);

-- 2. Cold data archive table (high compression, low storage)
CREATE TABLE cold_data (
    a INT,
    b VARCHAR(10),
    create_time DATETIME DEFAULT CURRENT_TIMESTAMP
) COMPRESS('HighZ', 0);

-- 3. Define the cutoff timestamp; rows older than this are treated as cold
SET @data_migration_timestamp = '2025-01-01 00:00:00';

-- 4. Move expired rows from hot to cold
INSERT INTO cold_data
SELECT * FROM hot_data
WHERE create_time < @data_migration_timestamp;

-- 5. Verify consistency: the two counts should match
--    (this assumes cold_data was empty before step 4)
SELECT COUNT(*) FROM cold_data;
SELECT COUNT(*) FROM hot_data WHERE create_time < @data_migration_timestamp;

-- 6. Clean up hot data and free space
DELETE FROM hot_data WHERE create_time < @data_migration_timestamp;
ALTER TABLE hot_data SHRINK SPACE FULL;

This workflow uses INSERT ... SELECT for hot-to-cold migration and SHRINK SPACE FULL to reclaim disk space. Matching GBase 8a's compression settings to each lifecycle stage lets you trade storage cost against query efficiency deliberately, instead of applying one setting to all data.
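If the migration runs repeatedly, cold_data already holds rows from earlier runs, so comparing its total row count with the hot-side count no longer works. One possible adjustment, sketched below, is to count both tables over the same time window. The variable @previous_cutoff is hypothetical and stands for the cutoff used by the previous run. The sketch assumes that run already removed everything older than that cutoff from hot_data.

```sql
-- Hypothetical variable: the cutoff used by the previous migration run.
SET @previous_cutoff = '2024-01-01 00:00:00';

-- Run after the INSERT ... SELECT and before the DELETE.
-- Both counts cover the same window, so rows archived by
-- earlier runs do not inflate the cold_data side.
SELECT COUNT(*) AS hot_rows
FROM hot_data
WHERE create_time >= @previous_cutoff
  AND create_time < @data_migration_timestamp;

SELECT COUNT(*) AS cold_rows
FROM cold_data
WHERE create_time >= @previous_cutoff
  AND create_time < @data_migration_timestamp;
```

If the two counts differ, it is safer to investigate before deleting anything from hot_data.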
