Six real slow‑query scenarios with complete trace: data skew, wrong execution plans, oversized intermediate results, and functions that disable partition pruning.
Case 1: Partition Pruning Disabled, 300M Row Full Scan
Symptom: A date‑filtered query that should take 5 seconds ran for 3 minutes.
SELECT dept_id, SUM(amount)
FROM orders
WHERE DATE_FORMAT(order_date, '%Y%m') = '202406'
GROUP BY dept_id;
Diagnosis: EXPLAIN showed SeqScan orders,rows=300000000 — no partition filtering.
Root Cause: The DATE_FORMAT function on the partition key prevented the optimizer from recognising partition boundaries.
Fix:
SELECT dept_id, SUM(amount)
FROM orders
WHERE order_date >= '2024-06-01' AND order_date < '2024-07-01'
GROUP BY dept_id;
Result: Rows dropped from 300M to 25M; time fell from 3 minutes to 8 seconds. Golden rule: never wrap the partition key in a function.
Case 2: Data Skew — One Node Did 95% of the Work
Symptom: A GROUP BY query on province looked fine in EXPLAIN but took 20 minutes.
Diagnosis: gclusterdb.dql_statistic revealed node1 consumed 1198 seconds while other nodes took 12–15 seconds. Node1 processed 280M rows; the others under 500K each.
Root Cause: The distribution key was province, which has only 34 distinct values. High‑volume provinces overloaded a single node.
Fix:
-
Permanent: Rebuild the table with a high‑cardinality key like
user_id. - Temporary: Enable multi‑column hash redistribution for skewed GROUP BYs.
Result: After rebuilding, the query dropped from 20 minutes to 45 seconds.
Case 3: Cartesian Product Filled /tmp
Symptom: A multi‑table JOIN ran for 2 hours and then crashed when /tmp was full.
Diagnosis: EXPLAIN showed CrossJoin.
Root Cause: Three tables were listed in the FROM clause, but the WHERE clause contained only a date filter — no JOIN condition at all.
Fix: Add proper JOIN conditions and set a row cap on intermediate results.
SELECT a.order_id, b.product_name, c.customer_name
FROM orders a
JOIN products b ON a.product_id = b.product_id
JOIN customers c ON a.customer_id = c.customer_id
WHERE a.order_date = '2024-06-01';
-- Safety cap: error when intermediate rows exceed 1 billion
SET _gbase_result_threshold = 1000000000;
Case 4: Small Table Exceeded Broadcast Threshold
Symptom: A JOIN with an 800K‑row dimension table took 40 seconds.
Diagnosis: 800K rows exceeded the broadcast threshold (default 100K), so the optimizer chose Hash Redistribute, but misaligned distribution keys caused multiple shuffles.
Fix: Convert the dimension table to a replicated table (800K rows ≈ 100MB, easily affordable).
CREATE TABLE dim_channel_rep (
channel_id INT, channel_name VARCHAR(64)
) REPLICATED;
Result: The JOIN ran entirely locally; 40 seconds became 3 seconds.
Case 5: COUNT(DISTINCT) on High‑Cardinality Column
Symptom: A daily active user count on 500M rows took 25 minutes with frequent "heap exceeds limit" warnings.
Diagnosis: user_id cardinality exceeded 100M; the HashAgg memory spilled to disk.
Fix:
- Enable the two‑phase distinct optimization parameters.
- Long‑term: Pre‑aggregate with a daily ETL materialised view.
Result: Parameters brought it to 6 minutes; pre‑aggregation reduced report queries to milliseconds.
Case 6: OR Condition Caused Full Table Scan
Symptom: A query with an OR filter scanned the entire table, even though each branch ran fast individually.
Root Cause: The columnar engine cannot effectively push down OR conditions for partition pruning and column projection.
Fix: Rewrite as UNION ALL so each branch uses its optimal path.
SELECT * FROM orders WHERE customer_id = 10001
UNION ALL
SELECT * FROM orders WHERE order_date = '2024-06-01' AND customer_id != 10001;
Result: 45 seconds down to 4 seconds.
Slow SQL Troubleshooting Flow
-
EXPLAIN— look for CrossJoin, missing partition pruning, multiple Redistribute steps. - Check
gclusterdb.dql_statistic— a >10× gap between node execution times indicates data skew. - Inspect specific operators — COUNT DISTINCT, Sort, large table broadcast.
- For fixed reports, prefer pre‑aggregation or materialised views.
Quick Reference
| Symptom | Likely Cause | First Check |
|---|---|---|
| Query 10×+ slower than expected | Partition pruning disabled | EXPLAIN rows |
| One node at 100% CPU, others idle | Data skew | dql_statistic node times |
| /tmp ballooned | Cartesian product or huge ORDER BY | EXPLAIN for CrossJoin |
| Memory warning then query error | COUNT DISTINCT high cardinality | gnode heap logs |
| OR queries extremely slow | OR can't push down | Rewrite as UNION ALL |
| JOIN runs on a single node | Small table exceeds broadcast threshold | Replicate table or raise threshold |
These six patterns cover the most frequent slow‑query culprits in a gbase database. Working through them systematically gets you to the root cause faster and keeps your GBASE analytical workloads running at full speed.
Top comments (0)