One seemingly innocent parameter adjustment — increasing group_concat_max_len to accommodate a business requirement — caused a cascade of performance degradation across a GBase 8a production cluster. A simple TOP‑N query that normally completed in seconds suddenly ran for over three hours, and multiple other queries on the same node slowed to a crawl. This article reconstructs the full investigation, from identifying the bottlenecked node to uncovering the hidden chain that turned a 200,000‑row sort into a 10 TB disk write storm.
1. Symptom: One Node’s I/O Pegged at 100%
Monitoring showed several queries exceeding 10,000 seconds of execution time. Cross‑referencing the coordinator‑level task view and the data‑node task view revealed that all slow queries were pinned to node3.
-- Coordinators
SELECT COORDINATOR_NAME, ID, user, host, command, start_time, time, state,
substring(info,0,100) info
FROM information_schema.COORDINATORS_TASK_INFORMATION
WHERE command='query' AND time >=0
ORDER BY time DESC LIMIT 10;
-- Data nodes – all problematic queries on node3
SELECT NODE_NAME, ID, user, host, command, start_time, time, state,
substring(info,0,100) info
FROM information_schema.GNODES_TASK_INFORMATION
WHERE command='query' AND info is not null
AND info not like '%information_schema.processlist%'
ORDER BY time DESC LIMIT 10;
On node3, the iostat output showed disk utilisation at a flat 100%, with write rates hitting 900 MB/s. OS monitoring logs confirmed the spike started exactly when the slow queries began. Digging into the database temporary directory, we found thousands of files with a total size exceeding 10 TB.
2. Pinpointing the Culprit Query and Intermediate Results
The slowest query followed a pattern of three subqueries LEFT JOIN-ed together, with an outer ORDER BY … LIMIT 1000.
select xxxx
from (...) a
left join (...) b on ...
left join (...) c on ...
order by xxx
limit 1000
To rule out a Cartesian product, we materialised each subquery into a temporary table:
- Subquery a: grouped 2 billion rows → 200,000 rows
- Subquery b: distinct on a dimension → 2,000 rows
- Subquery c: two‑table LEFT JOIN + group by → about 20,000 rows
Simplified SELECT COUNT(*) tests confirmed the joins produced exactly 200,000 rows, each completing in under 10 seconds. A sorted output of 200,000 rows with a LIMIT should never require terabytes of temp space — so something else was at play.
3. Root Cause: A Parameter Setting Inflated Column Width, Then Disk Usage
3.1 Find the “Heavy” Column
When we replaced the COUNT(*) with the original projection columns one by one, one column — originating from subquery c — caused the query to stall immediately. Inspecting the structure of the temporary table for subquery c revealed its data type: LONGTEXT.
3.2 Why LONGTEXT?
The original expression for that column was group_concat(xxx). The cluster‑level parameter group_concat_max_len had been changed from the default 32 KB to 1 MB to satisfy another business module.
show variables like '%group_concat_max_len%';
-- returned 1048576 (1 MB)
When GBase 8a creates an intermediate table (e.g., CREATE TABLE tmp AS SELECT …), it must determine the column width before executing the query. Because the parameter was set to 1 MB — far exceeding VARCHAR’s maximum 32 KB — the optimiser conservatively typed the intermediate column as LONGTEXT.
3.3 The Disk Sort Disaster
In version 8.6.2, a sort operation materialises all projection columns. For a LONGTEXT column, the engine pre‑allocates memory based on the maximum possible length of 64 MB per row. With 200,000 rows, that equates to 200,000 × 64 MB ≈ 12.2 TB. Memory cannot hold that, so the data is spilled to disk, producing the observed 10 TB+ of temporary sort files on node3, sustaining >900 MB/s writes for hours.
3.4 Why Only Node3?
The main table was randomly distributed. The query’s GROUP BY columns were clttime (low cardinality) and cell_id (high cardinality). During hash redistribution, the first column was chosen as the distribution key, concentrating all intermediate data on a single node. Placing cell_id first or enabling multi‑column hash redistribution would avoid such skew.
4. Solutions
Option 1: Rewrite the SQL (Immediate Fix)
Wrap the group_concat call with a substr to cap the expected output width. The optimiser will then type the intermediate column as VARCHAR, eliminating the pre‑allocation problem entirely.
substr(group_concat(xxx), 0, 1000)
The customer applied this change; the same query finished within 30 seconds.
Option 2: Use a Hint to Override the Parameter Per‑Query
GBase 8a supports hints that temporarily set session‑level parameters for a single statement.
select /*+group_concat_max_len(3000)*/ ...
Option 3: Upgrade to Version 9.5
Version 9.5 improves the materialisation strategy — memory is no longer allocated based on the maximum theoretical column size, but adaptively based on actual data, preventing this entire class of problems.
5. Key Takeaways
A global parameter adjustment can trigger a hidden cascade: “parameter → column‑width estimation → materialisation pre‑allocation → massive disk spill → node‑wide I/O starvation.” When a single parameter cannot satisfy all workloads, use statement‑level hints to give critical queries their own safe configuration, rather than applying a global value that may silently cripple other operations. In a gbase database, understanding how the optimiser interprets parameters is just as important as tuning the parameters themselves.
Top comments (0)