Michael

Posted on Jun 12 • Originally published at gbase.cn

GBase 8c Index Design and Performance Tuning: A Practical Guide

#gbase #database #数据库 #operations

Indexes in a gbase database distributed cluster can slash query times from tens of seconds to milliseconds, or they can silently kill write performance and waste storage. This guide covers index types, design principles, optimisation techniques, and a real‑world case study — all with ready‑to‑run commands.

1. Core Index Types and When to Use Them

Type	Characteristics	Best For	Notes
B‑Tree	Default, supports equality & range	Equality, range queries	Avoid leading wildcard `LIKE`
Hash	Extremely fast equality, no range support	High‑frequency equality (e.g., user ID, phone)	Performance drops with high duplication
GIN	Multi‑value types (arrays, JSONB)	JSONB queries, array containment	Poor write performance; read‑heavy use
GiST	Geospatial, full‑text search	Coordinate searches, full‑text	High maintenance cost

Indexes in GBase 8c are built on each DN individually. The CN coordinates scans across DNs. Design must consider data sharding to avoid unnecessary cross‑node index scans.

2. Six Core Design Principles

Build only what you need — I once saw a table with 12 indexes drop bulk insert speed from 1,000 rows/s to under 100 rows/s.
Prefer single‑column indexes; use composite indexes wisely — Follow the leftmost prefix rule; put the most selective and frequently queried column first.
Align with data sharding — Index the shard key whenever possible to avoid full‑cluster index scans.
Prevent index invalidation — No functions on indexed columns, no implicit conversions, avoid leading % in LIKE, ensure both sides of OR are indexed.
Balance write performance — Keep index count ≤ 3 on high‑write tables; prefer B‑Tree over GIN/GiST.
Regularly purge unused indexes — Quarterly review, drop indexes with zero scans.

3. Performance Tuning Techniques

3.1 Identify Unused Indexes

-- Indexes with idx_scan = 0 are candidates for removal
SELECT schemaname, relname AS table_name, indexrelname AS index_name,
       idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC;

-- Tables with high sequential scans may need indexes
SELECT relname AS table_name, seq_scan, idx_scan, n_live_tup
FROM pg_stat_user_tables
WHERE seq_scan > 0
ORDER BY seq_scan DESC;

3.2 Restructure Indexes

Fix composite column order:

DROP INDEX IF EXISTS idx_order_create_user;
CREATE INDEX idx_order_user_create ON "order"(user_id, create_time);

Switch to Hash for pure equality:

DROP INDEX IF EXISTS idx_user_phone;
CREATE INDEX idx_user_phone ON "user"(phone) USING HASH;

Partitioned tables — use local indexes:

CREATE INDEX idx_order_create_time ON "order"(create_time) LOCAL;
-- Query with partition pruning
SELECT * FROM "order" PARTITION (p202603) WHERE create_time BETWEEN '2026-03-01' AND '2026-03-30';

3.3 Rewrite SQL to Use Indexes

-- Avoid function on column: SUBSTR(phone,1,3)='138' → phone LIKE '138%'
-- Avoid implicit cast: phone = 13800138000 → phone = '13800138000'
-- Split OR when only one side is indexed
SELECT * FROM "order" WHERE user_id = 1001
UNION ALL
SELECT * FROM "order" WHERE phone = '13800138000';

3.4 Rebuild Indexes to Fix Fragmentation

-- Single index (online)
REINDEX INDEX idx_user_id;
-- Entire table (off‑peak)
REINDEX TABLE "user";
-- Non‑blocking
REINDEX INDEX CONCURRENTLY idx_order_create_time;

4. Real‑World Case Study

Symptom: 10‑million‑row orders table, sharded by order_id. Query “user’s last 3 months of orders” took >10 seconds. Bulk inserts were slow.

Diagnosis: Only order_id was indexed; user_id and create_time caused full scans. Index maintenance consumed >60% of insert time.

Fix:

-- 1. Composite index for the query pattern
CREATE INDEX idx_order_user_create ON "order"(user_id, create_time);

-- 2. Query with partition pruning
SELECT * FROM "order" PARTITION (p202601, p202602, p202603)
WHERE user_id = 1001 AND create_time BETWEEN '2026-01-01' AND '2026-03-31';

-- 3. Monthly rebuild
REINDEX INDEX CONCURRENTLY idx_order_user_create;

Result: Query time dropped from >10 s to <50 ms. Insert throughput rose from 800 rows/s to >2,000 rows/s. Cluster CPU/IO fell by 40%.

5. Common Pitfalls and Quick Fixes

Too many indexes → Build only for high‑frequency columns.
Wrong composite column order → Follow leftmost prefix; high selectivity first.
Ignoring shard key → Index the shard key to avoid cross‑node scans.
Index‑killing SQL → Check execution plans; avoid functions and implicit casts.
Heavy indexing on write‑heavy tables → Keep it ≤ 3 indexes; prefer B‑Tree.
Never maintaining indexes → Quarterly cleanup and rebuild.
Hash index misuse → Use only for pure equality, never for range.

Index optimisation in a gbase database is an ongoing cycle: map query patterns → design indexes → write index‑friendly SQL → monitor regularly → verify. Keeping this rhythm ensures your GBASE cluster stays fast and stable.

DEV Community