Introduction
MySQL deep pagination is a common performance bottleneck in real-world development. Many developers habitually use the `LIMIT offset, size` syntax to implement pagination, but when querying pages located far back in a large result set, this straightforward approach causes severe performance problems: a large offset forces the database to scan and then discard vast amounts of data, drastically increasing response time and resource consumption. This article examines the root cause of the deep pagination problem, validates an efficient solution, the deferred join, through a controlled experiment, and compares performance before and after optimization, offering developers a practical path to improvement.
1. Experimental Environment and Data Preparation
1.1 Hardware Configuration
Component | Specification |
---|---|
CPU | 4-core |
Memory | 8GB |
Storage | 1TB SSD |
MySQL Version | 5.7.24 |
1.2 Table Structure Design
```sql
CREATE TABLE `access_logs` (
  `log_id` bigint(20) NOT NULL AUTO_INCREMENT,
  `user_id` bigint(20) DEFAULT NULL,
  `request_url` varchar(500) DEFAULT NULL,
  `http_method` varchar(10) DEFAULT NULL,
  `response_code` int(11) DEFAULT NULL,
  `response_time_ms` int(11) DEFAULT NULL,
  `ip_address` varchar(50) DEFAULT NULL,
  `user_agent` varchar(200) DEFAULT NULL,
  `referrer` varchar(200) DEFAULT NULL,
  `created_at` datetime DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`log_id`),
  KEY `idx_time` (`created_at`)
) ENGINE=InnoDB AUTO_INCREMENT=2000001 DEFAULT CHARSET=utf8mb4;
```
2. Data Generation Method
We generated 2 million test records, enough to exercise deep pagination.
```sql
DELIMITER $$
CREATE PROCEDURE populate_test_data(IN log_count INT)
BEGIN
    DECLARE i INT DEFAULT 1;
    -- Number of distinct users to simulate. The original script referenced
    -- an undefined user_count variable; we declare it here with an assumed value.
    DECLARE user_count INT DEFAULT 100000;
    WHILE i <= log_count DO
        INSERT INTO access_logs(user_id, request_url, http_method, response_code, response_time_ms, ip_address, user_agent, referrer)
        VALUES (
            FLOOR(1 + RAND()*user_count),
            CONCAT('/api/v1/resource/', FLOOR(RAND()*1000)),
            ELT(FLOOR(1 + RAND()*4), 'GET','POST','PUT','DELETE'),
            ELT(FLOOR(1 + RAND()*5), 200, 200, 200, 404, 500),
            FLOOR(RAND()*3000),
            CONCAT('192.168.', FLOOR(RAND()*255), '.', FLOOR(RAND()*255)),
            ELT(FLOOR(1 + RAND()*4), 'Chrome','Safari','Firefox','Edge'),
            ELT(FLOOR(1 + RAND()*4), 'google.com','bing.com','baidu.com','direct')
        );
        SET i = i + 1;
    END WHILE;
END$$
DELIMITER ;

CALL populate_test_data(2000000);
```
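Row-by-row inserts are slow under autocommit, since every statement becomes its own transaction with its own log flush. An optional tweak (not part of the original setup) is to run the bulk load inside a single transaction:

```sql
-- Optional: disable autocommit around the bulk load so the 2 million
-- inserts share one transaction instead of one fsync per row.
SET autocommit = 0;
CALL populate_test_data(2000000);
COMMIT;
SET autocommit = 1;
```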
3. Performance Comparison Experiment
3.1 Original SQL
```sql
SELECT log_id, user_id, request_url, ip_address, created_at
FROM access_logs
ORDER BY created_at
LIMIT 1800000, 10;
```
3.2 Optimized SQL
We rewrote the query using SQLFlash, which produced the following deferred join:
```sql
SELECT t1.log_id, t1.user_id, t1.request_url, t1.ip_address, t1.created_at
FROM access_logs t1
INNER JOIN (
    SELECT log_id
    FROM access_logs
    ORDER BY created_at
    LIMIT 1800000, 10
) t2 ON t1.log_id = t2.log_id
ORDER BY t1.created_at;
```
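The pattern generalizes to filtered pagination as long as the paging still happens over a narrow index. A hypothetical sketch (the filter and the composite-index suggestion are our assumptions, not part of the experiment):

```sql
-- Hypothetical variant: deferred join with a filter. For the subquery to
-- remain a covering index scan, the index must cover the filter column,
-- e.g. a composite index (http_method, created_at).
SELECT t1.log_id, t1.user_id, t1.request_url, t1.created_at
FROM access_logs t1
INNER JOIN (
    SELECT log_id
    FROM access_logs
    WHERE http_method = 'GET'
    ORDER BY created_at
    LIMIT 1800000, 10
) t2 ON t1.log_id = t2.log_id
ORDER BY t1.created_at;
```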
3.3 Performance Metrics Comparison
Metric | Original SQL | Optimized SQL | Improvement |
---|---|---|---|
Execution Time | 2.34s | 0.46s | 80.3% faster |
Rows Scanned | 1,800,010 | 20 | 99.998% reduction |
Back-to-Table Operations | 1,800,010 | 10 | 99.999% reduction |
3.4 SQLFlash Analysis
Compared to the original SQL, the rewritten query achieves its gains through a covering index scan and query splitting:
- Fewer back-to-table operations: the subquery scans only the index, so table lookups drop from 1,800,010 to 10
- Smaller physical scan range: index entries are stored far more densely than full rows, so far fewer pages are read
- Better execution order: primary keys are filtered first and complete rows fetched afterwards, avoiding redundant data transfer
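These scan counts can be sanity-checked with MySQL's session handler counters. This is a general diagnostic technique, not part of the SQLFlash report:

```sql
-- Reset this session's status counters, run the query under test, then
-- inspect how many index/row reads the statement actually performed.
FLUSH STATUS;
SELECT log_id, user_id, request_url, ip_address, created_at
FROM access_logs
ORDER BY created_at
LIMIT 1800000, 10;
SHOW SESSION STATUS LIKE 'Handler_read%';
```

For the original query, `Handler_read_rnd_next` should be on the order of the full table size; for the deferred join it collapses, since only ten rows are fetched from the base table.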
4. Execution Plan Deep Analysis
4.1 Original SQL Execution Plan
```text
+----+-------------+-------------+------------+------+---------------+------+---------+------+---------+----------+----------------+
| id | select_type | table       | partitions | type | possible_keys | key  | key_len | ref  | rows    | filtered | Extra          |
+----+-------------+-------------+------------+------+---------------+------+---------+------+---------+----------+----------------+
|  1 | SIMPLE      | access_logs | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 1987219 |   100.00 | Using filesort |
+----+-------------+-------------+------------+------+---------------+------+---------+------+---------+----------+----------------+
```
Analysis: The original SQL performs a full table scan followed by a filesort, processing nearly 2 million rows before discarding all but ten.
4.2 Optimized SQL Execution Plan
```text
+----+-------------+-------------+------------+--------+---------------+----------+---------+-----------+---------+----------+-------------+
| id | select_type | table       | partitions | type   | possible_keys | key      | key_len | ref       | rows    | filtered | Extra       |
+----+-------------+-------------+------------+--------+---------------+----------+---------+-----------+---------+----------+-------------+
|  1 | PRIMARY     | <derived2>  | NULL       | ALL    | NULL          | NULL     | NULL    | NULL      | 1800010 |   100.00 | NULL        |
|  1 | PRIMARY     | t1          | NULL       | eq_ref | PRIMARY       | PRIMARY  | 8       | t2.log_id |       1 |   100.00 | NULL        |
|  2 | DERIVED     | access_logs | NULL       | index  | NULL          | idx_time | 6       | NULL      | 1800010 |   100.00 | Using index |
+----+-------------+-------------+------------+--------+---------------+----------+---------+-----------+---------+----------+-------------+
```
Analysis: The derived subquery performs a covering index scan on idx_time (note the "Using index" in Extra) to collect the ten target primary keys; the outer query then fetches each full row with a cheap eq_ref primary-key lookup.
5. Conclusion
Through comparative analysis, we confirm that the deferred join is an efficient solution for optimizing MySQL deep pagination queries. The approach splits the query into two phases: a covering index scan first locates the required primary keys, and a join on the primary key then fetches the complete rows.
Core Advantages:
- Drastically reduced back-to-table operations: from 1,800,010 to just 10 (a 99.999% reduction)
- Lower physical I/O overhead: Significantly reduces disk scan range using high-density index data
- Eliminated full table scans: Utilizes index ordering to avoid expensive filesort operations
- Strong versatility: Applicable to any deep pagination scenario with indexed sort fields
The Deferred Join method offers low modification cost and wide applicability, making it the preferred solution for MySQL deep pagination performance issues. For practical applications, we recommend implementing this optimization for pagination queries exceeding 100 pages.
Additionally, consider combining with other optimization strategies such as business-level maximum pagination depth limits and cursor-based pagination to build more comprehensive pagination solutions.
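As a sketch of the cursor-based alternative mentioned above: instead of counting an offset, remember the sort key of the last row on the current page (plus the primary key as a tie-breaker, since created_at is not unique) and seek past it. The cursor values below are placeholders for illustration:

```sql
-- Keyset (cursor) pagination: seek past the last row of the previous page.
-- '2024-05-01 12:00:00' and 123456 stand in for the (created_at, log_id)
-- of the last row the client saw.
SELECT log_id, user_id, request_url, ip_address, created_at
FROM access_logs
WHERE (created_at, log_id) > ('2024-05-01 12:00:00', 123456)
ORDER BY created_at, log_id
LIMIT 10;
```

Cost stays constant regardless of page depth, though the client can no longer jump to an arbitrary page number. Note that some MySQL versions do not use the index efficiently for row-constructor comparisons; rewriting the predicate as `created_at > ? OR (created_at = ? AND log_id > ?)` is a common workaround.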