SQLFlash
5 Steps to Optimize MySQL Deep Pagination

Introduction

MySQL deep pagination is a common performance bottleneck in real-world development, particularly evident when paginating through large datasets. Many developers habitually use the LIMIT offset, size syntax to implement pagination. However, when querying large page numbers or accessing data located far back in the result set, this straightforward approach can lead to severe performance issues. Excessively large offsets force the database to scan and discard vast amounts of data, resulting in drastically increased response times and excessive system resource consumption. This article delves into the root causes of deep pagination problems, validates an efficient solution—Deferred Join—through comprehensive experiments, and compares performance differences before and after optimization, offering developers a practical approach to enhancement.

1. Experimental Environment and Data Preparation

1.1 Hardware Configuration

| Component | Specification |
| --- | --- |
| CPU | 4-core |
| Memory | 8 GB |
| Storage | 1 TB SSD |
| MySQL Version | 5.7.24 |

1.2 Table Structure Design

CREATE TABLE `access_logs` (
`log_id` bigint(20) NOT NULL AUTO_INCREMENT,
`user_id` bigint(20) DEFAULT NULL,
`request_url` varchar(500) DEFAULT NULL,
`http_method` varchar(10) DEFAULT NULL,
`response_code` int(11) DEFAULT NULL,
`response_time_ms` int(11) DEFAULT NULL,
`ip_address` varchar(50) DEFAULT NULL,
`user_agent` varchar(200) DEFAULT NULL,
`referrer` varchar(200) DEFAULT NULL,
`created_at` datetime DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`log_id`),
KEY `idx_time` (`created_at`)
) ENGINE=InnoDB AUTO_INCREMENT=2000001 DEFAULT CHARSET=utf8mb4;

2. Data Generation Method

We generated 2 million test records with the stored procedure below, enough to make deep pagination costs clearly measurable.

DELIMITER $$

CREATE PROCEDURE populate_test_data(IN log_count INT)
BEGIN
    DECLARE i INT DEFAULT 1;
    -- Number of distinct synthetic user IDs to spread the rows across
    DECLARE user_count INT DEFAULT 100000;

    START TRANSACTION;
    WHILE i <= log_count DO
        INSERT INTO access_logs (user_id, request_url, http_method, response_code, response_time_ms, ip_address, user_agent, referrer)
        VALUES (
            FLOOR(1 + RAND() * user_count),
            CONCAT('/api/v1/resource/', FLOOR(RAND() * 1000)),
            ELT(FLOOR(1 + RAND() * 4), 'GET', 'POST', 'PUT', 'DELETE'),
            ELT(FLOOR(1 + RAND() * 5), 200, 200, 200, 404, 500),
            FLOOR(RAND() * 3000),
            CONCAT('192.168.', FLOOR(RAND() * 255), '.', FLOOR(RAND() * 255)),
            ELT(FLOOR(1 + RAND() * 4), 'Chrome', 'Safari', 'Firefox', 'Edge'),
            ELT(FLOOR(1 + RAND() * 4), 'google.com', 'bing.com', 'baidu.com', 'direct')
        );
        -- Commit in batches so 2 million single-row inserts do not each pay a log flush
        IF i MOD 10000 = 0 THEN
            COMMIT;
            START TRANSACTION;
        END IF;
        SET i = i + 1;
    END WHILE;
    COMMIT;

END$$
DELIMITER ;

CALL populate_test_data(2000000);


3. Performance Comparison Experiment

3.1 Original SQL

SELECT
  log_id,
  user_id,
  request_url,
  ip_address,
  created_at
FROM
  access_logs
ORDER BY
  created_at
LIMIT
  1800000, 10;

3.2 Optimized SQL

We rewrote the query using SQLFlash.


Rewritten SQL Query:

SELECT t1.log_id, t1.user_id, t1.request_url, t1.ip_address, t1.created_at
FROM access_logs t1
INNER JOIN (
  SELECT log_id
  FROM access_logs
  ORDER BY created_at
  LIMIT 1800000, 10
) t2 ON t1.log_id = t2.log_id
ORDER BY t1.created_at;


3.3 Performance Metrics Comparison

| Metric | Original SQL | Optimized SQL | Improvement |
| --- | --- | --- | --- |
| Execution Time | 2.34 s | 0.46 s | 80.3% faster |
| Rows Scanned | 1,800,010 | 20 | 99.998% reduction |
| Back-to-Table Operations | 1,800,010 | 10 | 99.999% reduction |

3.4 SQLFlash Analysis

Compared to the original SQL, the rewritten query gains its speed from a covering index combined with a split (deferred) lookup:

  1. Fewer back-to-table operations: the subquery is satisfied entirely from the index, so lookups into the clustered index drop from 1,800,010 to just the 10 rows on the requested page
  2. Smaller physical scan range: index entries are far denser than full rows, so the covering scan touches significantly fewer pages
  3. Better execution order: the query first narrows down the primary keys and only then retrieves the complete rows, avoiding transfer of data that would be discarded anyway
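The same rewrite pattern applies to any paginated query whose ORDER BY column is indexed. Below is a hedged, generic sketch; the names big_table, id, and sort_col are placeholders, not objects from this article:

```sql
-- Deferred join template:
--   1. the derived table pages over the secondary index only (a covering
--      scan: InnoDB stores the primary key inside every secondary index
--      entry, so no row lookups are needed at this stage);
--   2. the outer query fetches full rows for just the ten surviving keys.
SELECT t.*
FROM big_table AS t
INNER JOIN (
    SELECT id            -- primary key only, satisfied from the index
    FROM big_table
    ORDER BY sort_col    -- sort_col must be indexed for the covering scan
    LIMIT 1800000, 10
) AS page ON t.id = page.id
ORDER BY t.sort_col;
```

Because only the tiny derived table is joined back, the expensive offset skip happens entirely inside the compact index structure.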

4. Execution Plan Deep Analysis

4.1 Original SQL Execution Plan

+----+-------------+-------------+------------+------+---------------+------+---------+------+---------+----------+----------------+
| id | select_type | table       | partitions | type | possible_keys | key  | key_len | ref  | rows    | filtered | Extra          |
+----+-------------+-------------+------------+------+---------------+------+---------+------+---------+----------+----------------+
|  1 | SIMPLE      | access_logs | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 1987219 |   100.00 | Using filesort |
+----+-------------+-------------+------------+------+---------------+------+---------+------+---------+----------+----------------+

Analysis: The original SQL does a full table scan (type = ALL) over nearly 2 million rows and then sorts them with a filesort to satisfy the ORDER BY. The optimizer skips idx_time here, likely because walking 1,800,010 index entries and fetching each corresponding row individually would cost even more than scanning and sorting the whole table.

4.2 Optimized SQL Execution Plan

+----+-------------+-------------+------------+--------+---------------+----------+---------+-----------+---------+----------+-------------+
| id | select_type | table       | partitions | type   | possible_keys | key      | key_len | ref       | rows    | filtered | Extra       |
+----+-------------+-------------+------------+--------+---------------+----------+---------+-----------+---------+----------+-------------+
|  1 | PRIMARY     | <derived2>  | NULL       | ALL    | NULL          | NULL     | NULL    | NULL      | 1800010 |   100.00 | NULL        |
|  1 | PRIMARY     | t1          | NULL       | eq_ref | PRIMARY       | PRIMARY  | 8       | t2.log_id |       1 |   100.00 | NULL        |
|  2 | DERIVED     | access_logs | NULL       | index  | NULL          | idx_time | 6       | NULL      | 1800010 |   100.00 | Using index |
+----+-------------+-------------+------------+--------+---------------+----------+---------+-----------+---------+----------+-------------+

Analysis: The optimized plan first runs a covering scan over idx_time (Extra: Using index) in the derived subquery to collect the ten target primary keys, then resolves each key with an eq_ref lookup against the PRIMARY index, fetching exactly one row per key.

5. Conclusion

Through comparative analysis, we confirm that Deferred Join is an efficient solution for optimizing MySQL deep pagination queries. The approach splits the query into two phases: a covering-index scan that quickly locates the primary keys of the requested page, followed by a primary-key join that fetches the complete rows.

Core Advantages:

  1. Drastically reduced back-to-table operations: from 1.8 million to just 10 (a 99.999% reduction)
  2. Lower physical I/O overhead: Significantly reduces disk scan range using high-density index data
  3. Eliminated full table scans: Utilizes index ordering to avoid expensive filesort operations
  4. Strong versatility: Applicable to any deep pagination scenario with indexed sort fields

The Deferred Join method offers low modification cost and wide applicability, making it the preferred solution for MySQL deep pagination performance issues. For practical applications, we recommend implementing this optimization for pagination queries exceeding 100 pages.

Additionally, consider combining with other optimization strategies such as business-level maximum pagination depth limits and cursor-based pagination to build more comprehensive pagination solutions.
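The cursor-based pagination mentioned above can be sketched as follows. This is a minimal, hedged example assuming the client passes back the created_at and log_id of the last row it received; the literal values below are placeholders, not data from the experiment:

```sql
-- Keyset (cursor) pagination: anchor the scan at the last row of the
-- previous page instead of counting past an offset. The log_id tie-break
-- disambiguates duplicate timestamps, and matches the physical order of
-- idx_time (InnoDB appends the primary key to every secondary index entry).
-- Cost stays roughly constant at any depth, but pages can only be stepped
-- through sequentially, not jumped to by number.
SELECT log_id, user_id, request_url, ip_address, created_at
FROM access_logs
WHERE created_at > '2024-01-01 00:00:00'
   OR (created_at = '2024-01-01 00:00:00' AND log_id > 1799990)
ORDER BY created_at, log_id
LIMIT 10;
```

This trades the ability to jump to an arbitrary page for constant cost per page, which suits infinite-scroll style interfaces well.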
