DEV Community

Michael
Michael

Posted on • Originally published at gbase.cn

How GBase 8a Rough Index Works: Block‑Level Pruning for 10x Faster Queries

The Rough Index is a lightweight, built‑in indexing mechanism of GBase 8a's columnar storage engine. Instead of tracking exact row positions, it records the minimum and maximum value of a column for each data block. When a query runs, the optimizer scans the Rough Index and skips blocks that cannot possibly contain matching rows — drastically cutting disk I/O in a gbase database.

How Rough Index Works

Imagine an amount column split into 1,000 blocks:

  • Block 1: min 10, max 500
  • Block 2: min 501, max 1,200

For a query like WHERE amount > 2000, the engine skips every block whose max value is ≤ 2000. This block pruning can eliminate over 90% of I/O when data is well‑ordered.

Verifying That Rough Index Is Active

Use EXPLAIN to inspect the plan:

EXPLAIN SELECT SUM(amount) FROM orders WHERE create_time >= '2024-01-01';
Enter fullscreen mode Exit fullscreen mode

If you see Rough Index Scan or the scanned block count is far lower than the total, the index is working. You can also query the system view:

SELECT * FROM information_schema.GBASE_ROUGH_INDEX 
WHERE table_name = 'orders';
Enter fullscreen mode Exit fullscreen mode

Three Tips to Maximize Rough Index Performance

  1. Keep data ordered during ingestion: Sorting by frequently filtered columns (e.g., time) before loading shrinks the min‑max range inside each block.
  2. Manually rebuild after heavy updates: Run ALTER TABLE orders REBUILD ROUGH INDEX; after large‑scale modifications or out‑of‑order inserts.
  3. Prefer range predicates in queries: BETWEEN, >, < deliver the best pruning. IN lists and != are less effective.

The Rough Index is a core performance accelerator of GBASE's columnar engine. Building your data model and load routines around it can yield dramatic speed improvements without any additional hardware in your gbase database.

Top comments (0)