Debugging Slow SQL in GBase 8a: How Execution Plans, Distribution Keys, and Parallelism Go Off the Rails Together

#gbase #database #数据库 #parallelism

A slow query in a gbase database MPP cluster is rarely just a matter of sloppy SQL. More often, it's the combined effect of data distribution, execution plan choices, parallelism settings, and resource contention — all interacting at once. If you apply single‑node OLTP tuning habits here, you'll usually make things worse.

This article lays out a complete troubleshooting path, from deciding whether a query is truly slow to fixing distribution‑key mistakes and reining in runaway parallelism.

1. First, Distinguish "Naturally Heavy" from "Suddenly Slow"

Analytical workloads run anywhere from seconds to hours, so there's no single threshold that flags a slow query. Instead, compare execution times against historical baselines for the same task.

Naturally heavy SQL — always slow but stable. Look at the data model, distribution keys, and whether pre‑aggregation would help.
Suddenly slow SQL — the same task recently degraded. Check locks, execution plan changes, data skew, and parallel‑resource contention.

Collect periodic task durations via audit_log and correlate them with table‑growth trends to build a daily monitoring baseline.

2. Rule Out Wait States First

Many queries appear slow because they haven't actually started running — they're stuck on a lock or queued behind other work.

SHOW PROCESSLIST;

If you see states like checking permission, suspect lock waits.

gcadmin showlock

Two common traps: treating a queued session as a slow execution (rewriting the SQL won't help), and mistaking a single slow node for a cluster‑wide problem. In a 16‑node cluster, if 15 nodes finish in 2 minutes and one node takes 18, the entire query feels slow.

3. Watch Data Movement in the Execution Plan

Key motion operators in a GBase 8a plan[reference:8]:

Operator	Meaning
RESULT	Send result to client
GATHER	Collect data at the coordinator
REDIST(...)	Hash‑redistribute on a column
BROADCAST	Replicate / broadcast result
RAND REDIST	Random redistribution

In an MPP context, the most important question is: how many times does the plan move data around? A classic costly scenario: a large fact table is not distributed on its JOIN key, and the dimension table is not replicated — causing multiple REDIST steps that generate extra disk I/O, network transfer, and write overhead on the receiving side.

The community documentation highlights three core issues to look for in execution plans: unreasonable dynamic redistribution, extra redistribution caused by mismatched column types, and Cartesian products triggered by poor JOIN ordering[reference:9].

4. Wrong Distribution Keys Set the Stage for Slow SQL

Make the large‑table JOIN column the Hash distribution key whenever possible; the GROUP BY column is a second choice. GBase 8a V9 also supports multi‑column Hash to reduce skew[reference:10].

The guiding principle: do the most expensive work locally, on the node where the data already sits. Avoid shredding and redistributing massive data sets just to run a JOIN or GROUP.

If a fact table grows by hundreds of millions of rows a day and most core reports slice by customer, distributing on customer_id is far better than the "neutral" order_id.

CREATE TABLE dws_order_day (
    stat_date   DATE,
    order_id    BIGINT,
    customer_id BIGINT,
    product_id  BIGINT,
    pay_amount  DECIMAL(18,2),
    city_id     INT
)
DISTRIBUTED BY HASH(customer_id);

For small, slowly‑changing dimension tables, use replication so that JOINs with large tables don't trigger redistribution[reference:11].

Three easily overlooked issues: pick a hash column with many distinct values to avoid node skew[reference:12]; ensure JOIN columns have matching types on both sides; don't pin all hope on single‑table indexes — in analytical workloads, how data is distributed and moved usually dominates performance.

5. Rewrite SQL to Work With Columnar Storage and MPP Execution

Avoid SELECT *. Remove unnecessary inner ORDER BY. Prevent Cartesian JOINs. For multi‑column GROUP BY / JOIN, place the column with the most distinct values first.

A common anti‑pattern:

-- ❌ SELECT * defeats columnar benefits, inner ORDER BY is pointless, JOIN types mismatch
SELECT *
FROM (
    SELECT * FROM dwd_order_detail
    WHERE stat_date BETWEEN '2026-03-01' AND '2026-03-07'
    ORDER BY create_time
) t
JOIN dim_customer c ON CAST(t.customer_id AS VARCHAR(32)) = c.customer_code;

A safer rewrite:

-- ✅ Slim columns first, aggregate locally on the fact table, then JOIN with matching types
SELECT t.customer_id, t.order_cnt, c.customer_level
FROM (
    SELECT customer_id, COUNT(*) AS order_cnt
    FROM dwd_order_detail
    WHERE stat_date BETWEEN '2026-03-01' AND '2026-03-07'
    GROUP BY customer_id
) t
JOIN dim_customer c ON t.customer_id = c.customer_id;

6. Use Hash Indexes Judiciously

Hash indexes excel at single‑table equality lookups with small result sets. They don't help with range scans or fuzzy queries[reference:13]. For columns with few distinct values and frequent exact queries, a Global Hash index is a common choice.

CREATE INDEX idx_user_id ON dwd_user_behavior(user_id) USING HASH GLOBAL;

But indexes have a maintenance cost: they can slow down data loading and DML. For real‑time loading scenarios, stage data into an index‑free temporary table first, then bulk‑move into the indexed target or build the index in one shot[reference:14]. On large fact tables, always tackle distribution keys and SQL patterns before reaching for Hash indexes.

7. In High‑Concurrency Scenarios, Dialing Back Parallelism Often Stabilizes Performance

Key parameters: gbase_parallel_execution (parallel enable), gbase_parallel_degree (max parallelism per SQL), gbase_parallel_max_thread_in_pool (thread pool size)[reference:15].

Higher concurrency does not automatically mean higher performance. When CPU, disk, and memory are finite, too much parallelism results in elevated CPU sys time, 100% disk busy with low throughput, and swapping.

Low concurrency, few heavy queries: raise gbase_parallel_degree (e.g., 16).
High peak concurrency, many batch tasks: lower single‑SQL parallelism (e.g., 4) and cap the thread pool (e.g., 64) so that more tasks can make progress and the pool isn't drained by a few large queries[reference:16].

8. A Complete Troubleshooting Flow

Confirm it's truly a regression — compare historical runtimes and data‑volume trends.
Rule out waiting — SHOW PROCESSLIST, gcadmin showlock.
Inspect the execution plan — watch for REDIST, GATHER, BROADCAST and mismatched JOIN column types.
Revisit table design — do the distribution keys align with the core JOIN / GROUP paths? Should small tables be replicated?
Tune parameters — gbase_parallel_degree, thread pool, memory relative to concurrency.
Evaluate indexes last — is this genuinely a single‑table exact‑match problem? Can you afford the maintenance cost?

At its heart, performance tuning in a gbase database is about minimizing unnecessary data movement. Keep data on the node where the work happens, aggregate early, and don't let parallelism cannibalize itself. Build these habits into your model design and job scheduling, and you'll stop a large share of slow queries before they ever reach production.