DEV Community

Michael
Michael

Posted on • Originally published at gbase.cn

GBase 8c Index Design and Performance Tuning: A Practical Guide

Indexes in a gbase database distributed cluster can slash query times from tens of seconds to milliseconds, or they can silently kill write performance and waste storage. This guide covers index types, design principles, optimisation techniques, and a real‑world case study — all with ready‑to‑run commands.

1. Core Index Types and When to Use Them

Type Characteristics Best For Notes
B‑Tree Default, supports equality & range Equality, range queries Avoid leading wildcard LIKE
Hash Extremely fast equality, no range support High‑frequency equality (e.g., user ID, phone) Performance drops with high duplication
GIN Multi‑value types (arrays, JSONB) JSONB queries, array containment Poor write performance; read‑heavy use
GiST Geospatial, full‑text search Coordinate searches, full‑text High maintenance cost

Indexes in GBase 8c are built on each DN individually. The CN coordinates scans across DNs. Design must consider data sharding to avoid unnecessary cross‑node index scans.

2. Six Core Design Principles

  1. Build only what you need — I once saw a table with 12 indexes drop bulk insert speed from 1,000 rows/s to under 100 rows/s.
  2. Prefer single‑column indexes; use composite indexes wisely — Follow the leftmost prefix rule; put the most selective and frequently queried column first.
  3. Align with data sharding — Index the shard key whenever possible to avoid full‑cluster index scans.
  4. Prevent index invalidation — No functions on indexed columns, no implicit conversions, avoid leading % in LIKE, ensure both sides of OR are indexed.
  5. Balance write performance — Keep index count ≤ 3 on high‑write tables; prefer B‑Tree over GIN/GiST.
  6. Regularly purge unused indexes — Quarterly review, drop indexes with zero scans.

3. Performance Tuning Techniques

3.1 Identify Unused Indexes

-- Indexes with idx_scan = 0 are candidates for removal
SELECT schemaname, relname AS table_name, indexrelname AS index_name,
       idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC;

-- Tables with high sequential scans may need indexes
SELECT relname AS table_name, seq_scan, idx_scan, n_live_tup
FROM pg_stat_user_tables
WHERE seq_scan > 0
ORDER BY seq_scan DESC;
Enter fullscreen mode Exit fullscreen mode

3.2 Restructure Indexes

Fix composite column order:

DROP INDEX IF EXISTS idx_order_create_user;
CREATE INDEX idx_order_user_create ON "order"(user_id, create_time);
Enter fullscreen mode Exit fullscreen mode

Switch to Hash for pure equality:

DROP INDEX IF EXISTS idx_user_phone;
CREATE INDEX idx_user_phone ON "user"(phone) USING HASH;
Enter fullscreen mode Exit fullscreen mode

Partitioned tables — use local indexes:

CREATE INDEX idx_order_create_time ON "order"(create_time) LOCAL;
-- Query with partition pruning
SELECT * FROM "order" PARTITION (p202603) WHERE create_time BETWEEN '2026-03-01' AND '2026-03-30';
Enter fullscreen mode Exit fullscreen mode

3.3 Rewrite SQL to Use Indexes

-- Avoid function on column: SUBSTR(phone,1,3)='138' → phone LIKE '138%'
-- Avoid implicit cast: phone = 13800138000 → phone = '13800138000'
-- Split OR when only one side is indexed
SELECT * FROM "order" WHERE user_id = 1001
UNION ALL
SELECT * FROM "order" WHERE phone = '13800138000';
Enter fullscreen mode Exit fullscreen mode

3.4 Rebuild Indexes to Fix Fragmentation

-- Single index (online)
REINDEX INDEX idx_user_id;
-- Entire table (off‑peak)
REINDEX TABLE "user";
-- Non‑blocking
REINDEX INDEX CONCURRENTLY idx_order_create_time;
Enter fullscreen mode Exit fullscreen mode

4. Real‑World Case Study

Symptom: 10‑million‑row orders table, sharded by order_id. Query “user’s last 3 months of orders” took >10 seconds. Bulk inserts were slow.

Diagnosis: Only order_id was indexed; user_id and create_time caused full scans. Index maintenance consumed >60% of insert time.

Fix:

-- 1. Composite index for the query pattern
CREATE INDEX idx_order_user_create ON "order"(user_id, create_time);

-- 2. Query with partition pruning
SELECT * FROM "order" PARTITION (p202601, p202602, p202603)
WHERE user_id = 1001 AND create_time BETWEEN '2026-01-01' AND '2026-03-31';

-- 3. Monthly rebuild
REINDEX INDEX CONCURRENTLY idx_order_user_create;
Enter fullscreen mode Exit fullscreen mode

Result: Query time dropped from >10 s to <50 ms. Insert throughput rose from 800 rows/s to >2,000 rows/s. Cluster CPU/IO fell by 40%.

5. Common Pitfalls and Quick Fixes

  • Too many indexes → Build only for high‑frequency columns.
  • Wrong composite column order → Follow leftmost prefix; high selectivity first.
  • Ignoring shard key → Index the shard key to avoid cross‑node scans.
  • Index‑killing SQL → Check execution plans; avoid functions and implicit casts.
  • Heavy indexing on write‑heavy tables → Keep it ≤ 3 indexes; prefer B‑Tree.
  • Never maintaining indexes → Quarterly cleanup and rebuild.
  • Hash index misuse → Use only for pure equality, never for range.

Index optimisation in a gbase database is an ongoing cycle: map query patterns → design indexes → write index‑friendly SQL → monitor regularly → verify. Keeping this rhythm ensures your GBASE cluster stays fast and stable.

Top comments (0)