DEV Community

Michael
Michael

Posted on • Originally published at gbase.cn

When GBase 8a Execution Plans Go Wrong: Mastering Statistics Management

A simple SQL suddenly triggers a full table scan, or a JOIN order looks completely off. Most of the time, the culprit is stale statistics misleading the optimizer. This guide walks through how GBase 8a's statistics work, how to read execution plans, and what to do when the optimizer gets it wrong in a gbase database.

1. What Are Statistics?

The GBase 8a optimizer relies on three layers of statistics to estimate the cost of each execution path:

  • Table‑level: row count, data pages, average row length.
  • Column‑level: min/max values, NULL ratio, cardinality (number of distinct values).
  • Block‑level: rough index bounds (min/max) for each data block.

When these statistics become stale or inaccurate, the optimizer's cost estimates drift — and the execution plan can go from a quick range scan to a painfully slow full table scan.

2. Inspecting Statistics

Start by checking the current state of statistics for your tables:

-- Basic table statistics
SHOW TABLE STATUS FROM analytics_db LIKE 'orders';

-- Column-level statistics
SELECT * FROM information_schema.COLUMN_STATISTICS 
WHERE table_name = 'orders';

-- Index statistics (Cardinality is key)
SHOW INDEX FROM orders;

-- Last update time
SELECT table_name, table_rows, avg_row_length, data_length, update_time
FROM information_schema.TABLES
WHERE table_schema = 'analytics_db' AND table_name = 'orders';
Enter fullscreen mode Exit fullscreen mode

If the Cardinality value is far from reality, an immediate ANALYZE is needed.

3. Reading EXPLAIN Plans

Before updating statistics, use EXPLAIN to see what the optimizer is currently doing:

EXPLAIN SELECT SUM(amount) FROM orders 
WHERE create_time >= '2024-01-01' AND region = 'East';

EXPLAIN PARTITIONS SELECT SUM(amount) FROM orders 
WHERE create_time >= '2024-01-01';

EXPLAIN EXTENDED SELECT SUM(amount) FROM orders 
WHERE create_time >= '2024-01-01';
Enter fullscreen mode Exit fullscreen mode

Key fields to watch:

  • type: ALL signals a full table scan — the most urgent red flag.
  • rows: If the estimate is far larger than the actual result set, filtering is inefficient.
  • Extra: Using filesort or Using temporary deserve attention.
  • key: NULL means no index is being used.

4. Manually Updating Statistics

Trigger ANALYZE manually in these situations: after bulk loads or deletes, when performance suddenly drops, before the first query on a new table, after adding or dropping partitions, or when Cardinality deviates by over 50% from reality.

-- Single table
ANALYZE TABLE orders;

-- Multiple tables
ANALYZE TABLE orders, users, products;

-- Specific partition (faster than full‑table)
ALTER TABLE orders ANALYZE PARTITION p2024;
Enter fullscreen mode Exit fullscreen mode

Automate with a cron script that runs during off‑peak hours:

#!/bin/bash
TABLES="orders users products regions"
for TABLE in $TABLES; do
    gccli -u gbase -p'password' analytics_db \
        -e "ANALYZE TABLE $TABLE;" >> /var/log/gbase_analyze.log 2>&1
done

# crontab: 0 3 * * * /opt/scripts/gbase_analyze.sh
Enter fullscreen mode Exit fullscreen mode

5. Common Problems and Fixes

5.1 Partition Pruning Disabled

Applying a function to the partition key prevents the optimizer from inferring partition boundaries.

-- Bad: YEAR() forces a scan of all partitions
SELECT * FROM orders WHERE YEAR(create_time) = 2024;

-- Good: direct range condition
SELECT * FROM orders 
WHERE create_time >= '2024-01-01' AND create_time < '2025-01-01';
Enter fullscreen mode Exit fullscreen mode

5.2 Wrong JOIN Order

Run ANALYZE first, then verify with EXPLAIN.

EXPLAIN 
SELECT o.order_id, u.username
FROM orders o JOIN users u ON o.user_id = u.user_id
WHERE o.create_time >= '2024-01-01';
Enter fullscreen mode Exit fullscreen mode

5.3 Large Table JOIN Causing Full Shuffle

Update statistics first. If the plan still looks bad, use a Hint to broadcast the small table.

ANALYZE TABLE users;

SELECT /*+ BROADCAST(u) */
    o.order_id, u.username
FROM orders o JOIN users u ON o.user_id = u.user_id;
Enter fullscreen mode Exit fullscreen mode

6. Using Hints to Override Plans

Hints are a temporary measure, not a permanent fix. Always try statistics and SQL rewriting first.

-- Force JOIN driving order
SELECT /*+ STRAIGHT_JOIN */ 
    o.order_id, u.username
FROM orders o JOIN users u ON o.user_id = u.user_id;

-- Broadcast the small table
SELECT /*+ BROADCAST(u) */
    o.order_id, u.username
FROM orders o JOIN users u ON o.user_id = u.user_id;

-- Set parallelism
SELECT /*+ PARALLEL(4) */
    region, SUM(amount) FROM orders GROUP BY region;

-- Combine hints
SELECT /*+ BROADCAST(u) PARALLEL(8) */
    o.region, u.username, SUM(o.amount)
FROM orders o JOIN users u ON o.user_id = u.user_id
GROUP BY o.region, u.username;
Enter fullscreen mode Exit fullscreen mode

7. Building a Statistics Management Routine

  • Always ANALYZE after data imports — make it the final step of every load script.
  • Schedule daily ANALYZE for write‑heavy tables during off‑peak hours (2–4 AM).
  • Monitor Cardinality drift — alert if table_rows deviates from COUNT(*) by more than 30%.
  • Periodically compare EXPLAIN plans for critical queries against a stored baseline.

8. Important Notes

  • ANALYZE TABLE consumes resources; run it on large tables during maintenance windows.
  • Statistics have a natural lag; don't rely solely on automatic collection for frequently written tables.
  • Hints are a last resort. Overusing them makes SQL harder to maintain and can become a bottleneck as data evolves.
  • When execution plans become unstable, always check statistics first — not the hardware or the SQL.
  • For partitioned tables, ANALYZE new partitions immediately; otherwise the optimizer sees them as empty.

Keeping statistics fresh is the single most impactful maintenance task for query performance in a gbase database. A disciplined ANALYZE routine prevents most optimizer misjudgments before they ever reach production.

Top comments (0)