A simple SQL suddenly triggers a full table scan, or a JOIN order looks completely off. Most of the time, the culprit is stale statistics misleading the optimizer. This guide walks through how GBase 8a's statistics work, how to read execution plans, and what to do when the optimizer gets it wrong in a gbase database.
1. What Are Statistics?
The GBase 8a optimizer relies on three layers of statistics to estimate the cost of each execution path:
- Table‑level: row count, data pages, average row length.
- Column‑level: min/max values, NULL ratio, cardinality (number of distinct values).
- Block‑level: rough index bounds (min/max) for each data block.
When these statistics become stale or inaccurate, the optimizer's cost estimates drift — and the execution plan can go from a quick range scan to a painfully slow full table scan.
2. Inspecting Statistics
Start by checking the current state of statistics for your tables:
-- Basic table statistics
SHOW TABLE STATUS FROM analytics_db LIKE 'orders';
-- Column-level statistics
SELECT * FROM information_schema.COLUMN_STATISTICS
WHERE table_name = 'orders';
-- Index statistics (Cardinality is key)
SHOW INDEX FROM orders;
-- Last update time
SELECT table_name, table_rows, avg_row_length, data_length, update_time
FROM information_schema.TABLES
WHERE table_schema = 'analytics_db' AND table_name = 'orders';
If the Cardinality value is far from reality, an immediate ANALYZE is needed.
3. Reading EXPLAIN Plans
Before updating statistics, use EXPLAIN to see what the optimizer is currently doing:
EXPLAIN SELECT SUM(amount) FROM orders
WHERE create_time >= '2024-01-01' AND region = 'East';
EXPLAIN PARTITIONS SELECT SUM(amount) FROM orders
WHERE create_time >= '2024-01-01';
EXPLAIN EXTENDED SELECT SUM(amount) FROM orders
WHERE create_time >= '2024-01-01';
Key fields to watch:
-
type:
ALLsignals a full table scan — the most urgent red flag. - rows: If the estimate is far larger than the actual result set, filtering is inefficient.
-
Extra:
Using filesortorUsing temporarydeserve attention. -
key:
NULLmeans no index is being used.
4. Manually Updating Statistics
Trigger ANALYZE manually in these situations: after bulk loads or deletes, when performance suddenly drops, before the first query on a new table, after adding or dropping partitions, or when Cardinality deviates by over 50% from reality.
-- Single table
ANALYZE TABLE orders;
-- Multiple tables
ANALYZE TABLE orders, users, products;
-- Specific partition (faster than full‑table)
ALTER TABLE orders ANALYZE PARTITION p2024;
Automate with a cron script that runs during off‑peak hours:
#!/bin/bash
TABLES="orders users products regions"
for TABLE in $TABLES; do
gccli -u gbase -p'password' analytics_db \
-e "ANALYZE TABLE $TABLE;" >> /var/log/gbase_analyze.log 2>&1
done
# crontab: 0 3 * * * /opt/scripts/gbase_analyze.sh
5. Common Problems and Fixes
5.1 Partition Pruning Disabled
Applying a function to the partition key prevents the optimizer from inferring partition boundaries.
-- Bad: YEAR() forces a scan of all partitions
SELECT * FROM orders WHERE YEAR(create_time) = 2024;
-- Good: direct range condition
SELECT * FROM orders
WHERE create_time >= '2024-01-01' AND create_time < '2025-01-01';
5.2 Wrong JOIN Order
Run ANALYZE first, then verify with EXPLAIN.
EXPLAIN
SELECT o.order_id, u.username
FROM orders o JOIN users u ON o.user_id = u.user_id
WHERE o.create_time >= '2024-01-01';
5.3 Large Table JOIN Causing Full Shuffle
Update statistics first. If the plan still looks bad, use a Hint to broadcast the small table.
ANALYZE TABLE users;
SELECT /*+ BROADCAST(u) */
o.order_id, u.username
FROM orders o JOIN users u ON o.user_id = u.user_id;
6. Using Hints to Override Plans
Hints are a temporary measure, not a permanent fix. Always try statistics and SQL rewriting first.
-- Force JOIN driving order
SELECT /*+ STRAIGHT_JOIN */
o.order_id, u.username
FROM orders o JOIN users u ON o.user_id = u.user_id;
-- Broadcast the small table
SELECT /*+ BROADCAST(u) */
o.order_id, u.username
FROM orders o JOIN users u ON o.user_id = u.user_id;
-- Set parallelism
SELECT /*+ PARALLEL(4) */
region, SUM(amount) FROM orders GROUP BY region;
-- Combine hints
SELECT /*+ BROADCAST(u) PARALLEL(8) */
o.region, u.username, SUM(o.amount)
FROM orders o JOIN users u ON o.user_id = u.user_id
GROUP BY o.region, u.username;
7. Building a Statistics Management Routine
- Always ANALYZE after data imports — make it the final step of every load script.
- Schedule daily ANALYZE for write‑heavy tables during off‑peak hours (2–4 AM).
-
Monitor Cardinality drift — alert if
table_rowsdeviates fromCOUNT(*)by more than 30%. - Periodically compare EXPLAIN plans for critical queries against a stored baseline.
8. Important Notes
-
ANALYZE TABLEconsumes resources; run it on large tables during maintenance windows. - Statistics have a natural lag; don't rely solely on automatic collection for frequently written tables.
- Hints are a last resort. Overusing them makes SQL harder to maintain and can become a bottleneck as data evolves.
- When execution plans become unstable, always check statistics first — not the hardware or the SQL.
- For partitioned tables,
ANALYZEnew partitions immediately; otherwise the optimizer sees them as empty.
Keeping statistics fresh is the single most impactful maintenance task for query performance in a gbase database. A disciplined ANALYZE routine prevents most optimizer misjudgments before they ever reach production.
Top comments (0)