SQLFlash

Posted on Sep 3

HAVING Condition Pushdown: Optimizing Query Performance

#mysql #sql

Introduction

In modern data-driven applications, efficient data querying stands as one of the core challenges in ensuring system performance. This is especially true when dealing with massive amounts of data: even seemingly minor differences in SQL syntax can lead to exponentially increased resource consumption or even trigger cascading system bottlenecks. Take common grouped aggregation queries as an example—developers often rely on the HAVING clause to filter aggregated results. However, when filtering conditions do not depend on aggregate functions (e.g., filtering directly based on grouped columns), this approach may harbor significant performance pitfalls. Through performance comparison tests and execution plan analysis, this article fully demonstrates the optimization benefits of HAVING condition pushdown.

I. Test Data Generation (Generating 5 Million Records in MySQL)

1. Create a Test Table

-- Create the employees table  
CREATE TABLE employees (  
    id INT AUTO_INCREMENT PRIMARY KEY,  
    department_id INT NOT NULL, -- Department ID (randomly distributed between 1-20)  
    name VARCHAR(8) NOT NULL, -- Employee Name (random 8-character string)  
    age INT NOT NULL -- Employee Age (random value between 20-60)  
) ENGINE=InnoDB;

2. Bulk Insert 5 Million Records

-- Enable recursive CTE and adjust the generation limit  
SET SESSION cte_max_recursion_depth = 1000000;  

-- Generate 5 million records  

INSERT INTO employees (department_id, name, age)  
WITH RECURSIVE seq AS (  
    SELECT 0 AS n  
    UNION ALL  
    SELECT n + 1 FROM seq WHERE n < 4999999  
)  
SELECT  
    FLOOR(RAND(n) * 20) + 1 AS department_id,  
    SUBSTRING(MD5(RAND(n)), 1, 8) AS name,  
    FLOOR(RAND(n + 100000) * 41) + 20 AS age  
FROM seq;

3. Create Indexes (Optimize Query Performance)

-- Create an index on department_id  
CREATE INDEX idx_department ON employees(department_id);

II. Performance Benchmarking & Optimization

1. Original SQL Query

SELECT department_id, COUNT(*) AS employee_count  
FROM employees  
GROUP BY department_id  
HAVING department_id > 5;

2. Optimized SQL Query

Rewriting Strategy with SQLFlash Insights:

The rewritten SQL is as follows:

SELECT department_id, COUNT(*) AS employee_count 
FROM employees 
WHERE department_id > 5 GROUP BY department_id;

View detailed report: https://sqlflash.ai/app/sqlResult?shareid=dfe1a9dc-b583-4f1c-aa71-9d2ca41cd40b

III. Performance Benchmarking

1. Test Environment

Database Version: MySQL 8.0.18
Hardware Configuration: 4-core CPU / 8GB RAM / SSD storage
Dataset: 5 million records across 20 departments (approximately 250,000 records per department)

2. Test Results

Metric	Traditional Method (HAVING)	Optimized Method (WHERE)	Performance Improvement
Execution Time	0.58 seconds	0.48 seconds	17%
Rows Scanned	5 million rows	3.75 million rows	25%

III. Performance Analysis

1. SQLFlash Analysis

Compared with the original SQL, the rewritten SQL is logically clearer. By moving the filtering to the WHERE clause, it avoids grouping and aggregating operations on the large number of rows where department_id ≤ 5, thereby reducing unnecessary processing and improving overall computational efficiency. Without changing the meaning of the result set, the new execution plan can filter out non-matching rows earlier, optimizing resource consumption and significantly reducing query overhead. Meanwhile, the structure of the new SQL is more concise and readable.

2. Performance Bottlenecks of the Traditional Writing Style

Full table scan grouping: All 5 million rows of data participate in grouping calculations, generating grouping results for 20 departments.
Redundant filtering operations: Departments with department_id ≤ 5 are filtered out only after grouping is completed, wasting computational resources.

3. Core Advantages of the Optimized Writing Style

Index-accelerated filtering: Directly locating valid data (departments 6–20) via WHERE department_id > 5, with only 3.75 million rows entering the grouping phase.
Reduced temporary table computation: MySQL creates temporary tables by default when using GROUP BY. Reducing the data volume lowers memory and disk overhead.

4. Execution Plan Comparison (Using EXPLAIN)

Original SQL Execution Plan:

+----+-------------+-----------+------------+-------+----------------+----------------+---------+------+---------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-----------+------------+-------+----------------+----------------+---------+------+---------+----------+-------------+
| 1 | SIMPLE | employees | NULL | index | idx_department | idx_department | 4 | NULL | 4987392 | 100.00 | Using index |
+----+-------------+-----------+------------+-------+----------------+----------------+---------+------+---------+----------+-------------+

Optimized SQL Execution Plan:

+----+-------------+-----------+------------+-------+----------------+----------------+---------+------+---------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-----------+------------+-------+----------------+----------------+---------+------+---------+----------+--------------------------+
| 1 | SIMPLE | employees | NULL | range | idx_department | idx_department | 4 | NULL | 2493696 | 100.00 | Using where; Using index |
+----+-------------+-----------+------------+-------+----------------+----------------+---------+------+---------+----------+--------------------------+

V. Summary

Through HAVING condition pushing optimization combined with index acceleration, the following can be achieved:

17% performance improvement: Execution time decreased from 0.58 seconds to 0.48 seconds.
Efficient resource utilization: Reduced temporary table memory and disk usage.
Clear and maintainable code: Filter logic is front-loaded, lowering complexity for subsequent extensions.

Final recommendation: When handling large-data grouping in MySQL, prioritize moving non-aggregate conditions to WHERE and creating necessary indexes.

DEV Community