SQLFlash
5 Steps to Optimize MySQL Deep Pagination

Introduction

MySQL deep pagination is a common performance bottleneck in real-world development, particularly evident when paginating through large datasets. Many developers habitually use the LIMIT offset, size syntax to implement pagination. However, when querying large page numbers or accessing data located far back in the result set, this straightforward approach can lead to severe performance issues. Excessively large offsets force the database to scan and discard vast amounts of data, resulting in drastically increased response times and excessive system resource consumption. This article delves into the root causes of deep pagination problems, validates an efficient solution—Deferred Join—through comprehensive experiments, and compares performance differences before and after optimization, offering developers a practical approach to enhancement.

1. Experimental Environment and Data Preparation

1.1 Hardware Configuration

| Component | Specification |
| --- | --- |
| CPU | 4-core |
| Memory | 8 GB |
| Storage | 1 TB SSD |
| MySQL Version | 5.7.24 |

1.2 Table Structure Design

CREATE TABLE `access_logs` (
`log_id` bigint(20) NOT NULL AUTO_INCREMENT,
`user_id` bigint(20) DEFAULT NULL,
`request_url` varchar(500) DEFAULT NULL,
`http_method` varchar(10) DEFAULT NULL,
`response_code` int(11) DEFAULT NULL,
`response_time_ms` int(11) DEFAULT NULL,
`ip_address` varchar(50) DEFAULT NULL,
`user_agent` varchar(200) DEFAULT NULL,
`referrer` varchar(200) DEFAULT NULL,
`created_at` datetime DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`log_id`),
KEY `idx_time` (`created_at`)
) ENGINE=InnoDB AUTO_INCREMENT=2000001 DEFAULT CHARSET=utf8mb4;

2. Data Generation Method

We generated 2 million test records with the stored procedure below, enough to make deep pagination costs clearly measurable.

DELIMITER $$

CREATE PROCEDURE populate_test_data(IN log_count INT)
BEGIN
    DECLARE i INT DEFAULT 1;
    -- Number of distinct synthetic user IDs to spread the rows across
    DECLARE user_count INT DEFAULT 100000;

    START TRANSACTION;
    WHILE i <= log_count DO
        INSERT INTO access_logs (user_id, request_url, http_method, response_code, response_time_ms, ip_address, user_agent, referrer)
        VALUES (
            FLOOR(1 + RAND() * user_count),
            CONCAT('/api/v1/resource/', FLOOR(RAND() * 1000)),
            ELT(FLOOR(1 + RAND() * 4), 'GET', 'POST', 'PUT', 'DELETE'),
            ELT(FLOOR(1 + RAND() * 5), 200, 200, 200, 404, 500),
            FLOOR(RAND() * 3000),
            CONCAT('192.168.', FLOOR(RAND() * 255), '.', FLOOR(RAND() * 255)),
            ELT(FLOOR(1 + RAND() * 4), 'Chrome', 'Safari', 'Firefox', 'Edge'),
            ELT(FLOOR(1 + RAND() * 4), 'google.com', 'bing.com', 'baidu.com', 'direct')
        );
        -- Commit in batches so 2 million single-row inserts do not each pay a log flush
        IF i MOD 10000 = 0 THEN
            COMMIT;
            START TRANSACTION;
        END IF;
        SET i = i + 1;
    END WHILE;
    COMMIT;

END$$
DELIMITER ;

CALL populate_test_data(2000000);


3. Performance Comparison Experiment

3.1 Original SQL

SELECT
  log_id,
  user_id,
  request_url,
  ip_address,
  created_at
FROM
  access_logs
ORDER BY
  created_at
LIMIT
  1800000, 10;

3.2 Optimized SQL

We rewrote the query using SQLFlash.


Rewritten SQL Query:

SELECT t1.log_id, t1.user_id, t1.request_url, t1.ip_address, t1.created_at
FROM access_logs t1
INNER JOIN (
  SELECT log_id
  FROM access_logs
  ORDER BY created_at
  LIMIT 1800000, 10
) t2 ON t1.log_id = t2.log_id
ORDER BY t1.created_at;


3.3 Performance Metrics Comparison

| Metric | Original SQL | Optimized SQL | Improvement |
| --- | --- | --- | --- |
| Execution Time | 2.34 s | 0.46 s | 80.3% faster |
| Rows Scanned | 1,800,010 | 20 | 99.998% reduction |
| Back-to-Table Operations | 1,800,010 | 10 | 99.999% reduction |

3.4 SQLFlash Analysis

Compared to the original SQL, the rewritten query gains its speed from a covering index combined with a split (deferred) lookup:

  1. Fewer back-to-table operations: the subquery is satisfied entirely from the index, so lookups into the clustered index drop from 1,800,010 to just the 10 rows on the requested page
  2. Smaller physical scan range: index entries are far denser than full rows, so the covering scan touches significantly fewer pages
  3. Better execution order: the query first narrows down the primary keys and only then retrieves the complete rows, avoiding transfer of data that would be discarded anyway
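The same rewrite pattern applies to any paginated query whose ORDER BY column is indexed. Below is a hedged, generic sketch; the names big_table, id, and sort_col are placeholders, not objects from this article:

```sql
-- Deferred join template:
--   1. the derived table pages over the secondary index only (a covering
--      scan: InnoDB stores the primary key inside every secondary index
--      entry, so no row lookups are needed at this stage);
--   2. the outer query fetches full rows for just the ten surviving keys.
SELECT t.*
FROM big_table AS t
INNER JOIN (
    SELECT id            -- primary key only, satisfied from the index
    FROM big_table
    ORDER BY sort_col    -- sort_col must be indexed for the covering scan
    LIMIT 1800000, 10
) AS page ON t.id = page.id
ORDER BY t.sort_col;
```

Because only the tiny derived table is joined back, the expensive offset skip happens entirely inside the compact index structure.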

4. Execution Plan Deep Analysis

4.1 Original SQL Execution Plan

+----+-------------+-------------+------------+------+---------------+------+---------+------+---------+----------+----------------+
| id | select_type | table       | partitions | type | possible_keys | key  | key_len | ref  | rows    | filtered | Extra          |
+----+-------------+-------------+------------+------+---------------+------+---------+------+---------+----------+----------------+
|  1 | SIMPLE      | access_logs | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 1987219 |   100.00 | Using filesort |
+----+-------------+-------------+------------+------+---------------+------+---------+------+---------+----------+----------------+

Analysis: The original SQL does a full table scan (type = ALL) over nearly 2 million rows and then sorts them with a filesort to satisfy the ORDER BY. The optimizer skips idx_time here, likely because walking 1,800,010 index entries and fetching each corresponding row individually would cost even more than scanning and sorting the whole table.

4.2 Optimized SQL Execution Plan

+----+-------------+-------------+------------+--------+---------------+----------+---------+-----------+---------+----------+-------------+
| id | select_type | table       | partitions | type   | possible_keys | key      | key_len | ref       | rows    | filtered | Extra       |
+----+-------------+-------------+------------+--------+---------------+----------+---------+-----------+---------+----------+-------------+
|  1 | PRIMARY     | <derived2>  | NULL       | ALL    | NULL          | NULL     | NULL    | NULL      | 1800010 |   100.00 | NULL        |
|  1 | PRIMARY     | t1          | NULL       | eq_ref | PRIMARY       | PRIMARY  | 8       | t2.log_id |       1 |   100.00 | NULL        |
|  2 | DERIVED     | access_logs | NULL       | index  | NULL          | idx_time | 6       | NULL      | 1800010 |   100.00 | Using index |
+----+-------------+-------------+------------+--------+---------------+----------+---------+-----------+---------+----------+-------------+

Analysis: The optimized plan first runs a covering scan over idx_time (Extra: Using index) in the derived subquery to collect the ten target primary keys, then resolves each key with an eq_ref lookup against the PRIMARY index, fetching exactly one row per key.

5. Conclusion

Through comparative analysis, we confirm that Deferred Join is an efficient solution for optimizing MySQL deep pagination queries. The approach splits the query into two phases: a covering-index scan that quickly locates the primary keys of the requested page, followed by a primary-key join that fetches the complete rows.

Core Advantages:

  1. Drastically reduced back-to-table operations: from 1.8 million to just 10 (a 99.999% reduction)
  2. Lower physical I/O overhead: Significantly reduces disk scan range using high-density index data
  3. Eliminated full table scans: Utilizes index ordering to avoid expensive filesort operations
  4. Strong versatility: Applicable to any deep pagination scenario with indexed sort fields

The Deferred Join method offers low modification cost and wide applicability, making it the preferred solution for MySQL deep pagination performance issues. For practical applications, we recommend implementing this optimization for pagination queries exceeding 100 pages.

Additionally, consider combining with other optimization strategies such as business-level maximum pagination depth limits and cursor-based pagination to build more comprehensive pagination solutions.
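The cursor-based pagination mentioned above can be sketched as follows. This is a minimal, hedged example assuming the client passes back the created_at and log_id of the last row it received; the literal values below are placeholders, not data from the experiment:

```sql
-- Keyset (cursor) pagination: anchor the scan at the last row of the
-- previous page instead of counting past an offset. The log_id tie-break
-- disambiguates duplicate timestamps, and matches the physical order of
-- idx_time (InnoDB appends the primary key to every secondary index entry).
-- Cost stays roughly constant at any depth, but pages can only be stepped
-- through sequentially, not jumped to by number.
SELECT log_id, user_id, request_url, ip_address, created_at
FROM access_logs
WHERE created_at > '2024-01-01 00:00:00'
   OR (created_at = '2024-01-01 00:00:00' AND log_id > 1799990)
ORDER BY created_at, log_id
LIMIT 10;
```

This trades the ability to jump to an arbitrary page for constant cost per page, which suits infinite-scroll style interfaces well.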
