Slow queries are a major factor affecting the performance of a gbase database cluster. This article covers the complete workflow — from enabling slow query logging, locating problematic SQL, analyzing execution plans, to optimizing partitions and distribution keys — with a real‑world case study.
1. Enabling Slow Query Logging
Add the following parameters to gbase_8a_gcluster.cnf and restart the cluster:
gcluster_rpc_timeout = 10 # threshold in seconds
slow_query_log = 1
slow_query_log_file = /data/gbase/logs/slow_query.log
log_queries_not_using_indexes = 1
A threshold of 1–3 seconds is recommended based on your workload.
2. Locating Slow Queries
Inspecting the Log File
tail -n 100 /data/gbase/logs/slow_query.log
Each entry records execution time, lock time, rows scanned, and the SQL text.
Querying System Views
-- Currently executing queries
SELECT * FROM gclusterdb.processlist;
-- Historical slow queries
SELECT * FROM gclusterdb.query_history
WHERE execution_time > 3
ORDER BY start_time DESC LIMIT 20;
-- Per‑node query statistics
SELECT * FROM gclusterdb.gnode_query_stats;
3. Analyzing the Execution Plan
EXPLAIN FORMAT=JSON
SELECT a.*, b.name
FROM order_detail a
JOIN orders b ON a.order_id = b.id
WHERE b.create_time > '2026-01-01';
Focus on type (avoid ALL full table scans), key (index usage), rows (estimated rows scanned), and Extra (watch for Using filesort or Using temporary).
Common issues and fixes:
- Full table scan: Avoid wrapping indexed columns in functions.
-- Inefficient: YEAR() disables the index
SELECT * FROM orders WHERE YEAR(create_time) = 2026;
-- Optimized: range condition
SELECT * FROM orders WHERE create_time >= '2026-01-01' AND create_time < '2027-01-01';
- Cartesian product: Always provide explicit JOIN conditions.
- Large table JOINs: Use hints to control the driving table.
SELECT /*+ LEADING(a b) */ a.*, b.*
FROM orders a INNER JOIN order_detail b ON a.id = b.order_id;
4. Partition and Distribution Key Optimization
Range Partitioning Example
CREATE TABLE orders (
id BIGINT PRIMARY KEY,
create_time DATETIME,
status INT,
amount DECIMAL(10,2)
)
PARTITION BY RANGE (YEAR(create_time)) (
PARTITION p2024 VALUES LESS THAN (2025),
PARTITION p2025 VALUES LESS THAN (2026),
PARTITION p2026 VALUES LESS THAN (2027),
PARTITION pmax VALUES LESS THAN MAXVALUE
);
Queries that filter by year will scan only the corresponding partition.
Distribution Key Selection
Choose a high‑cardinality column that is frequently used in JOINs. Avoid low‑cardinality columns that cause data skew.
-- Wrong: status has only 0/1, leading to skew
CREATE TABLE test_table (id INT, status INT) DISTRIBUTED BY (status);
-- Correct: use high‑cardinality order_id
CREATE TABLE test_table (id INT, order_id INT, status INT) DISTRIBUTED BY (order_id);
5. Real‑World Case: Report Query Optimization
Original SQL — counting daily orders and amounts — took 30 seconds.
SELECT DATE(create_time) AS date,
COUNT(*) AS order_count,
SUM(amount) AS total_amount
FROM orders
WHERE create_time BETWEEN '2026-01-01' AND '2026-03-27'
GROUP BY DATE(create_time)
ORDER BY date;
Diagnosis: The execution plan showed a full table scan of 50 million rows. The table had no indexes or partitions.
Optimization:
-- 1. Add an index
ALTER TABLE orders ADD INDEX idx_create_time(create_time);
-- 2. Convert to a partitioned table
ALTER TABLE orders PARTITION BY RANGE (TO_DAYS(create_time)) (
PARTITION p202601 VALUES LESS THAN (TO_DAYS('2026-02-01')),
PARTITION p202602 VALUES LESS THAN (TO_DAYS('2026-03-01')),
PARTITION p202603 VALUES LESS THAN (TO_DAYS('2026-04-01')),
PARTITION pmax VALUES LESS THAN MAXVALUE
);
Result: Execution time dropped from 30 seconds to 0.8 seconds, rows scanned from 50 million to 2 million — a ~37× improvement.
6. Summary
Slow query optimization should be a continuous process. Use execution plans, proper table design, and SQL rewriting to eliminate bottlenecks from the start. Regularly review slow query logs and keep your gbase database performing at its best.
Top comments (0)