How GBase 8a Rough Index Works: Block‑Level Pruning for 10x Faster Queries

#gbase #database #数据库 #performance

The Rough Index is a lightweight, built‑in indexing mechanism of GBase 8a's columnar storage engine. Instead of tracking exact row positions, it records the minimum and maximum value of a column for each data block. When a query runs, the optimizer scans the Rough Index and skips blocks that cannot possibly contain matching rows — drastically cutting disk I/O in a gbase database.

How Rough Index Works

Imagine an amount column split into 1,000 blocks:

Block 1: min 10, max 500
Block 2: min 501, max 1,200

For a query like WHERE amount > 2000, the engine skips every block whose max value is ≤ 2000. This block pruning can eliminate over 90% of I/O when data is well‑ordered.

Verifying That Rough Index Is Active

Use EXPLAIN to inspect the plan:

EXPLAIN SELECT SUM(amount) FROM orders WHERE create_time >= '2024-01-01';

If you see Rough Index Scan or the scanned block count is far lower than the total, the index is working. You can also query the system view:

SELECT * FROM information_schema.GBASE_ROUGH_INDEX 
WHERE table_name = 'orders';

Three Tips to Maximize Rough Index Performance

Keep data ordered during ingestion: Sorting by frequently filtered columns (e.g., time) before loading shrinks the min‑max range inside each block.
Manually rebuild after heavy updates: Run ALTER TABLE orders REBUILD ROUGH INDEX; after large‑scale modifications or out‑of‑order inserts.
Prefer range predicates in queries: BETWEEN, >, < deliver the best pruning. IN lists and != are less effective.

The Rough Index is a core performance accelerator of GBASE's columnar engine. Building your data model and load routines around it can yield dramatic speed improvements without any additional hardware in your gbase database.