Viktor Dorotovič

Posted on Oct 11, 2022 • Originally published at toomanyafterthoughts.com

Partitioning InnoDB tables by time-based pseudo-sequential UUIDs

#database #mysql #postgres #architecture

Partitioning has multiple uses – spreading load onto multiple disks, cold storage of older data on cheaper disks, and probably others. Most importantly though, partitions are not for performance.

The main use case I’m going to explain is time-based partitioning as a tool for limiting the scope of stored data. This could be required to comply with a data retention policy or simply to save money on disk space.

A straightforward approach would be periodically issuing DELETE queries to get rid of rows that are older than a certain time period:

DELETE FROM temperature WHERE time_of_measurement < '2020-01-01';

This incurs a lot of I/O though. Each row has to be found, then removed from the page, while also updating each index pointing to the row.

If the table was range-partitioned by a time column, whole chunks of rows could be removed in one go:

ALTER TABLE temperature DROP PARTITION p_2019_11, p_2019_12;

Dropping a partition is much faster than deleting individual rows. Each partition is essentially a separate table, so when it is dropped, the space occupied by the table can just be marked as unused, instead of updating data structures row-by-row.

Case in point

As an example, I’m going to use a toy database for storing information about workshops and participants in IT conferences. There are going to be 2 entity classes, with a many-to-many relationship between them:

Workshop
Participant

A workshop can be attended by multiple participants, and a participant can attend multiple workshops. For simplicity, the fact that a participant is not able to attend two workshops at the same time will not be checked.

There are different data retention requirements – workshops will be stored for 5 years for historical reference, while participant records can be deleted after 2 years.

Apart from listing the individual members of each entity class, there are 2 read operations that the setup will need to support:

Get all participants attending a given workshop
Get all workshops that a given participant has signed up for

I ran the first attempt at defining the table schema:

CREATE TABLE workshops (
  workshop_id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(255) NOT NULL,
  time_scheduled DATETIME NOT NULL
)
PARTITION BY RANGE COLUMNS(time_scheduled) (
  PARTITION p_default VALUES LESS THAN MAXVALUE
);

CREATE TABLE participants (
  participant_id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(255) NOT NULL,
  time_created DATETIME NOT NULL
)
PARTITION BY RANGE COLUMNS(time_created) (
  PARTITION p_default VALUES LESS THAN MAXVALUE
);

CREATE TABLE workshop_participants (
  workshop_participant_id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  workshop_id BIGINT NOT NULL,
  participant_id BIGINT NOT NULL,  
  time_created DATETIME NOT NULL
)
PARTITION BY RANGE COLUMNS(time_created) (
  PARTITION p_default VALUES LESS THAN MAXVALUE
);

Uh-oh, MySQL is throwing an error:

A PRIMARY KEY must include all columns in the table's partitioning function (prefixed columns are not considered).

Apparently, this limitation applies to any unique constraint (index), not just primary keys, as described in MySQL’s documentation: Partitioning Keys, Primary Keys, and Unique Keys. I have not found any resource explaining why this is necessary though. PostgreSQL has the same restriction and explains it as follows: “This limitation exists because the individual indexes making up the constraint can only directly enforce uniqueness within their own partitions; therefore, the partition structure itself must guarantee that there are no duplicates in different partitions.“

Working example #1, but not particularly good

To get around this limitation, the primary keys can be defined as compound keys over the autoincremented ID and the time_created field:

CREATE TABLE workshops (
  workshop_id BIGINT NOT NULL AUTO_INCREMENT,
  name VARCHAR(255) NOT NULL,
  time_scheduled DATETIME NOT NULL,
  PRIMARY KEY (workshop_id, time_scheduled)
)
PARTITION BY RANGE COLUMNS(time_scheduled) (
  PARTITION p_future VALUES LESS THAN MAXVALUE
);

CREATE TABLE participants (
  participant_id BIGINT NOT NULL AUTO_INCREMENT,
  name VARCHAR(255) NOT NULL,
  time_created DATETIME NOT NULL,
  PRIMARY KEY (participant_id, time_created)
)
PARTITION BY RANGE COLUMNS(time_created) (
  PARTITION p_future VALUES LESS THAN MAXVALUE
);

CREATE TABLE workshop_participants (
  workshop_participant_id BIGINT NOT NULL AUTO_INCREMENT,
  workshop_id BIGINT NOT NULL,
  participant_id BIGINT NOT NULL,  
  time_created DATETIME NOT NULL,
  PRIMARY KEY (workshop_participant_id, time_created),
  INDEX wp_i_participant_id (participant_id),
  INDEX wp_i_workshop_id (workshop_id)
)
PARTITION BY RANGE COLUMNS(time_created) (
  PARTITION p_future VALUES LESS THAN MAXVALUE
);

The 2 required read operations will be served by these SELECT queries:

-- Get all participants attending a given workshop
SELECT p.participant_id, p.name, p.time_created 
FROM workshop_participants wp
JOIN participants p ON p.participant_id = wp.participant_id
WHERE wp.workshop_id = :workshop_id: ;


-- Get all workshops that a given participant has signed up for
SELECT w.workshop_id, w.name, w.time_scheduled
FROM workshop_participants wp
JOIN workshops w ON w.workshop_id = wp.workshop_id
WHERE wp.participant_id = :participant_id: ;

There are a couple of problems with this approach. Firstly, workshop_id and participant_id are being used as if they were unique, which are not actually guaranteed to be – it’s only guaranteed if the autoincrement default value is not overridden.

Secondly, partition pruning cannot be used. Partition pruning is an optimizer feature that uses existing information contained in the query to establish a subset of partitions that is guaranteed to contain all the data that’s being looked for. To put it another way, it prunes partitions that cannot contain any matching rows based on the WHERE condition.

In this case, I can validate this by creating the partitions p_2020_01, p_2020_02, adding some sample data and looking at the output of EXPLAIN:

ALTER TABLE workshops
REORGANIZE PARTITION p_future INTO (
        PARTITION p_old VALUES LESS THAN ('2020-01-01'),
        PARTITION p_2020_01 VALUES LESS THAN ('2020-02-02')
        PARTITION p_2020_02 VALUES LESS THAN ('2020-02-03')
        PARTITION p_future VALUES LESS THAN MAXVALUE
);

INSERT INTO workshops (workshop_id, name, time_scheduled) VALUES
(1, 'UUIDs - when and how to use them', '2020-01-18 10:11'),
(2, 'Partitioning InnoDB tables', '2020-01-18 11:32'),
(2, 'Denormalizing database schemas for performance', '2020-02-05 14:30');

INSERT INTO participants (participant_id, name, time_created) VALUES 
(1, 'john Doe', '2020-01-01 15:00');

INSERT INTO workshop_participants (workshop_id, participant_id, time_created) VALUES
(1, 1, '2020-01-06 10:02'),
(1, 2, '2020-01-06 10:05');

SELECT wp.workshop_id FROM workshop_participants AS wp WHERE wp.participant_id=1;

-- Output:
-- | workshop_id |
-- | ----------- |
-- | 1           |
-- | 2           |

-- for MySQL < 5.8, use EXPLAIN PARTIIONS
EXPLAIN SELECT w.workshop_id, w.name, w.time_scheduled FROM workshops AS w WHERE w.workshop_id IN (1, 2);

-- Output:
-- | id  | select_type | table | partitions                         | type  | possible_keys | key     | ...
-- | --- | ----------- | ----- | ---------------------------------- | ----- | ------------- | ------- | ...
-- | 1   | SIMPLE      | w     | p_old,p_2020_01,p_2020_02,p_future | range | PRIMARY       | PRIMARY | ...



-- Try it yourself with DB Fiddle: https://www.db-fiddle.com/f/uJbWPwS5dU9XNUCQPBMuNs/0

The EXPLAIN output shows that all of the defined partitions need to be visited when evaluating the SELECt query, even though all the fetched rows are stored in a single partition.

Note that instead of using a JOIN or a WHERE IN (SELECT ...) condition, the queries are separated into 2. The first one fetches a list of IDs from the bridging table, and the second includes a constant IN(…) clause. It’s not clear to me at the time of writing, whether partition pruning can work with dynamically obtained values during query execution in InnoDB, I’ve not seen it mentioned anywhere. PostgreSQL supports run-time pruning.

Working example #2, getting closer

The proper solution would be using the whole compound keys as foreign keys in the bridging table:

CREATE TABLE workshop_participants (
  workshop_participant_id BIGINT NOT NULL AUTO_INCREMENT,
  workshop_id BIGINT NOT NULL,
  workshop_time_scheduled DATETIME NOt NULL,
  participant_id BIGINT NOT NULL,  
  participant_time_created DATETIME NOT NULL,
  time_created DATETIME NOT NULL,
  PRIMARY KEY (workshop_participant_id, time_created),
  INDEX wp_i_workshop (workshop_id, workshop_time_scheduled),
  INDEX wp_i_participant (participant_id, participant_time_created)
)
PARTITION BY RANGE COLUMNS(time_created) (
  PARTITION p_future VALUES LESS THAN MAXVALUE
);

SELECT wp.workshop_time_scheduled, wp.workshop_id FROM workshop_participants AS wp WHERE wp.participant_id=1;

-- Output:
-- | workshop_time_scheduled | workshop_id |
-- | ----------------------- | ----------- |
-- | 2020-01-18 10:11:00     | 1           |
-- | 2020-01-18 11:32:00     | 2           |

EXPLAIN SELECT w.workshop_id, w.name, w.time_scheduled
FROM workshops AS w
WHERE (w.time_scheduled, w.workshop_id) IN (
('2020-01-18 10:11:00', 1),
('2020-01-18 11:32:00', 2)
)

-- | id  | select_type | table | partitions | type  | possible_keys | key     | key_len | ref | rows | filtered | Extra       |
-- | --- | ----------- | ----- | ---------- | ----- | ------------- | ------- | ------- | --- | ---- | -------- | ----------- |
-- | 1   | SIMPLE      | w     | p_2020_01  | range | PRIMARY       | PRIMARY | 13      |     | 2    | 100      | Using where |



--- Try it in DD Fiddle: https://www.db-fiddle.com/f/vQCCQFcqTj9dndNwPDmqVT/0

Now that the WHERE ... IN pairs start with the value for the partitioning column, the query optimizer can remove non-matching partitions from the plan. This is quite unwieldy though and may be especially problematic when combined with an object-relational mapper that may not support compound primary keys.

A better solution — using UUID7

Apart from other UUID advantages over AUTO_INCREMENT, UUID7 is prefixed by the UNIX timestamp, which means that the lexicographic ordering of UUID7 values is the same as the order in which the IDs were generated in time. To get a better picture of the structure of UUID7, take a look at my previous post about UUID performance in indices. The timestamp prefix makes it possible to define discrete ranges of UUID7 values the same way time ranges would be defined. Be aware though, that UUID version 7 is defined as an Internet Draft at the time of writing, and may change.

When comparing a time to a UUID7, equality comparisons cannot be used, because the random part of UUID7 may be any value. Only less-than or greater-than relations are useful.

I made a function that can create a UUID7 from a custom datetime object and fills in the rest of the bytes with either 0x00 or 0xff depending on whether the value is going to be used as a lower or upper boundary:

import datetime
from uuid6 import UUID, uuid7

def uuid7_time_boundary(dt, lower=True):
    timestamp_ms = int(dt.timestamp() * 1000)
    uuid_int = (timestamp_ms & 0xFFFFFFFFFFFF) << 80
    fill_byte = "00" if lower else "ff"
    uuid_int = uuid_int | int(fill_byte * 10, 16)
    # using the UUID constructor from uuid6 library, which automatically populates the version and variant fields
    return UUID(int=uuid_int, version=7)

The uuid6 library in (alpha) version 2022.6.25 implements both uuid6 and uuid7 from the draft. It doesn’t match draft 4 exactly though — it stores the submillisecond parts of the timestamp at the beginning of what’s supposed to be pseudorandom bits. The boundary values for comparison are only computed with 1-second precision, so they would work with either implementation.

UUID7 < time

To find UUID7-s generated before 2022–04–03 10:30:00, we can just build a boundary UUID from this date and compare UUID objects directly:

dt = datetime.datetime.strptime("2022-04-03 10:30:00", "%Y-%m-%d %H:%M:%S").replace(tzinfo=datetime.timezone.utc)

    ####################################################################################################################
    # Using an upper boundary with <=:
    ####################################################################################################################
    boundary_uuid = uuid7_time_boundary(dt, lower=True)

    earlier_uuid = uuid7_time_boundary(dt - datetime.timedelta(milliseconds=1), lower=False)

    # compare with earlier uuid
    print("%s\n<\n%s\n== %s\n" % (earlier_uuid, boundary_uuid, earlier_uuid < boundary_uuid))
    # 017feef9-743f-7fff-bfff-ffffffffffff
    # <
    # 017feef9-7440-7000-8000-000000000000
    # == True

    later_uuid = UUID('018ea382-283f-7bfc-addb-a6c06bf382da')  # 2024-04-03

    # compare with later uuid
    print("%s\n<\n%s\n== %s\n" % (later_uuid, boundary_uuid, later_uuid < boundary_uuid))
    # 018ea382-283f-7bfc-addb-a6c06bf382da
    # <
    # 017feef9-7440-7000-8000-000000000000
    # == False

    ####################################################################################################################
    # Using an upper boundary with <=:
    ####################################################################################################################

    upper_boundary_uuid = uuid7_time_boundary(dt - datetime.timedelta(milliseconds=1), lower=False)

    print("%s\n<=\n%s\n== %s\n" % (earlier_uuid, upper_boundary_uuid, earlier_uuid <= upper_boundary_uuid))
    # 017feef9-743f-7fff-bfff-ffffffffffff
    # <=
    # 017feef9-743f-7fff-bfff-ffffffffffff
    # == True

    print("%s\n<=\n%s\n== %s\n" % (later_uuid, upper_boundary_uuid, later_uuid <= upper_boundary_uuid))
    # 018ea382-283f-7bfc-addb-a6c06bf382da
    # <=
    # 017feef9-743f-7fff-bfff-ffffffffffff
    # == False

Partition boundaries

In the world of MySQL, the partitions’ limits can be defined by using hexadecimal constants starting with 0x. This is valid SQL:

CREATE TABLE workshops (
  workshop_id BINARY(16) NOT NULL, -- UUIDs will be stored in compact binary form
  name VARCHAR(255) NOT NULL,
  time_scheduled DATETIME NOT NULL,
  PRIMARY KEY (workshop_id, time_scheduled)
)
PARTITION BY RANGE COLUMNS(workshop_id) (
  -- monthly partitions
  -- for the LESS THAN value we're using the boundary UUID encoded as a hex constant (with dashes removed)
  PARTITION p_old     VALUES LESS THAN (0x016f5e66e80070008000000000000000), -- < 2020-01-01 00:00:00
  PARTITION p_2020_01 VALUES LESS THAN (0x016ffe0c0c0070008000000000000000), -- < 2020-02-01 00:00:00
  PARTITION p_2020_02 VALUES LESS THAN (0x01709364780070008000000000000000), -- < 2020-03-01 00:00:00  
  PARTITION p_future  VALUES LESS THAN MAXVALUE
);

CREATE TABLE participants (
  participant_id BINARY(16) NOT NULL,
  name VARCHAR(255) NOT NULL,
  time_created DATETIME NOT NULL,
  PRIMARY KEY (participant_id, time_created)
)
PARTITION BY RANGE COLUMNS(participant_id) (
  PARTITION p_old     VALUES LESS THAN (0x016f5e66e80070008000000000000000), -- < 2020-01-01 00:00:00
  PARTITION p_2020_01 VALUES LESS THAN (0x016ffe0c0c0070008000000000000000), -- < 2020-02-01 00:00:00
  PARTITION p_2020_02 VALUES LESS THAN (0x01709364780070008000000000000000), -- < 2020-03-01 00:00:00  
  PARTITION p_future  VALUES LESS THAN MAXVALUE
);

CREATE TABLE workshop_participants (
  workshop_participant_id BINARY(16) NOT NULL,
  workshop_id BINARY(16) NOT NULL,
  participant_id BINARY(16) NOT NULL,  
  PRIMARY KEY (workshop_participant_id),
  INDEX wp_i_workshop_id (workshop_id),
  INDEX wp_i_participant_id (participant_id)
)
PARTITION BY RANGE COLUMNS(workshop_participant_id) (
  PARTITION p_old     VALUES LESS THAN (0x016f5e66e80070008000000000000000), -- < 2020-01-01 00:00:00
  PARTITION p_2020_01 VALUES LESS THAN (0x016ffe0c0c0070008000000000000000), -- < 2020-02-01 00:00:00
  PARTITION p_2020_02 VALUES LESS THAN (0x01709364780070008000000000000000), -- < 2020-03-01 00:00:00  
  PARTITION p_future  VALUES LESS THAN MAXVALUE
);

Splitting the many-to-many bridging table

In its current form, the bridging table is not great, because selecting from workshop_participants either by workshop_id or by participant_id would not use partition pruning, because none of these foreign keys are in the partitioning expression. As a solution, the bridging table can be split into two:

CREATE TABLE workshop_participants (
  workshop_id BINARY(16) NOT NULL,
  participant_id BINARY(16) NOT NULL,
  time_created DATETIME NOT NULL,
  PRIMARY KEY (workshop_id, participant_id)
)
-- partitioning condition is satisfied - the partitioning column is part of the primary key
PARTITION BY RANGE COLUMNS(workshop_id) (
  -- ...  
  PARTITION p_future  VALUES LESS THAN MAXVALUE
);

CREATE TABLE participant_workshops (
  participant_id BINARY(16) NOT NULL,
  workshop_id BINARY(16) NOT NULL,
  time_created DATETIME NOT NULL,
  PRIMARY KEY (participant_id, workshop_id)
)
-- partitioning condition is satisfied - the partitioning column is part of the primary key
PARTITION BY RANGE COLUMNS(participant_id) (
  -- ...  
  PARTITION p_future  VALUES LESS THAN MAXVALUE
);

This technique also solves the problem of when partitions are going to be purged from the bridging table if the 2 connected tables have different data retention times. If one side is going to be deleted earlier than the other, there are going to be hanging rows in the other table of the split, but that’s bearable. Since actually defined foreign keys are not supported with partitioning the application logic must be written in such a way as to deal with the possibility of missing rows.

As a side effect, the split tables will also probably take up less disk space than the original combined one, because instead of 3 indexes — 1 clustered and two secondary indexes for the foreign keys, there are now only 2 clustered indexes — 1 for each table. I haven’t checked it though.

The SELECTs for the required read operations also need to be updated. Selecting all participants attending a given workshop will be accomplished as follows:

SELECT HEX(participant_id) FROM workshop_participants WHERE workshop_id = 0x016fb86c8f807d1d9a8960a15b7eb63e;

-- Output:
-- | HEX(participant_id)              |
-- | -------------------------------- |
-- | 016D87D5D180713484C5D4E471CCA418 |
-- | 016F619EE18071D085FD3968AA46EF0B |
EXPLAIN SELECT HEX(participant_id) FROM workshop_participants WHERE workshop_id = 0x016fb86c8f807d1d9a8960a15b7eb63e;
-- Explain:
-- | id  | select_type | table                 | partitions | type | possible_keys | key     | key_len | ref   | rows | filtered | Extra                    |
-- | --- | ----------- | --------------------- | ---------- | ---- | ------------- | ------- | ------- | ----- | ---- | -------- | ------------------------ |
-- | 1   | SIMPLE      | workshop_participants | p_2020_01  | ref  | PRIMARY       | PRIMARY | 16      | const | 2    | 100      | Using where; Using index |

SELECT HEX(participant_id), name, time_created FROM participants WHERE participant_id IN (0x016D87D5D180713484C5D4E471CCA418, 0x016F619EE18071D085FD3968AA46EF0B);

-- Output:
-- | HEX(participant_id)              | name     | time_created        |
-- | -------------------------------- | -------- | ------------------- |
-- | 016D87D5D180713484C5D4E471CCA418 | Jane Doe | 2019-10-01 15:00:00 |
-- | 016F619EE18071D085FD3968AA46EF0B | john Doe | 2020-01-01 15:00:00 |

EXPLAIN SELECT HEX(participant_id), name, time_created FROM participants WHERE participant_id IN (0x016D87D5D180713484C5D4E471CCA418, 0x016F619EE18071D085FD3968AA46EF0B);
-- Explain
-- | id  | select_type | table        | partitions      | type  | possible_keys | key     | key_len | ref | rows | filtered | Extra       |
-- | --- | ----------- | ------------ | --------------- | ----- | ------------- | ------- | ------- | --- | ---- | -------- | ----------- |
-- | 1   | SIMPLE      | participants | p_old,p_2020_01 | range | PRIMARY       | PRIMARY | 16      |     | 2    | 100      | Using where |

To get all workshops that a given participant has signed up for, the other table of the split many-to-many bridge is accessed:

SELECT HEX(workshop_id) FROM participant_workshops WHERE participant_id = 0x016f619ee18071d085fd3968aa46ef0b;

-- Output:
-- | HEX(workshop_id)                 |
-- | -------------------------------- |
-- | 016FB82267207367A7E6E580C4871E32 |
-- | 016FB86C8F807D1D9A8960A15B7EB63E |

EXPLAIN SELECT HEX(workshop_id) FROM participant_workshops WHERE participant_id = 0x016f619ee18071d085fd3968aa46ef0b;

-- Explain:
-- | id  | select_type | table                 | partitions | type | possible_keys | key     | key_len | ref   | rows | filtered | Extra                    |
-- | --- | ----------- | --------------------- | ---------- | ---- | ------------- | ------- | ------- | ----- | ---- | -------- | ------------------------ |
-- | 1   | SIMPLE      | participant_workshops | p_2020_01  | ref  | PRIMARY       | PRIMARY | 16      | const | 2    | 100      | Using where; Using index |

SELECT HEX(workshop_id), name, time_scheduled FROM workshops WHERE workshop_id IN (0x016FB82267207367A7E6E580C4871E32, 0x016FB86C8F807D1D9A8960A15B7EB63E);

-- Output:
-- | HEX(workshop_id)                 | name                             | time_scheduled      |
-- | -------------------------------- | -------------------------------- | ------------------- |
-- | 016FB82267207367A7E6E580C4871E32 | UUIDs - when and how to use them | 2020-01-18 10:11:00 |
-- | 016FB86C8F807D1D9A8960A15B7EB63E | Partitioning InnoDB tables       | 2020-01-18 11:32:00 |

EXPLAIN SELECT workshop_id, name, time_scheduled FROM workshops WHERE workshop_id IN (0x016FB82267207367A7E6E580C4871E32, 0x016FB86C8F807D1D9A8960A15B7EB63E);

-- Explain:
-- | id  | select_type | table     | partitions | type  | possible_keys | key     | key_len | ref | rows | filtered | Extra       |
-- | --- | ----------- | --------- | ---------- | ----- | ------------- | ------- | ------- | --- | ---- | -------- | ----------- |
-- | 1   | SIMPLE      | workshops | p_2020_01  | range | PRIMARY       | PRIMARY | 16      |     | 2    | 100      | Using where |

All of the above queries take advantage of partition pruning.

The sample data and all queries are available for playing around with at DB Fiddle, or as a downloadable .sql file.

Hiccups

Listing partitions in INFORMATION_SCHEMA.PARTITIONS

The read-only view can be used to list all partitions of a given table. For range partitions, the boundary value is included in the PARTITION_DESCRIPTION field. This works fine for datetime fields but binary values are garbled. If you need to read back this value, it’s better to manually encode it in the partition’s name.

SELECT PARTITION_NAME, PARTITION_ORDINAL_POSITION, PARTITION_DESCRIPTION FROM INFORMATION_SCHEMA.PARTITIONS WHERE TABLE_NAME = 'workshops';

-- Output MySQL 5.7:
-- | PARTITION_NAME | PARTITION_ORDINAL_POSITION | PARTITION_DESCRIPTION |
-- | -------------- | -------------------------- | --------------------- |
-- | p_old          | 1                          | 'o^f                 |
-- | p_2020_01      | 2                          | 'o                   |
-- | p_2020_02      | 3                          | 'p                   |
-- | p_future       | 4                          | MAXVALUE              |

Update: It’s been fixed in MySQL 8.0:

SELECT PARTITION_NAME, PARTITION_ORDINAL_POSITION, PARTITION_DESCRIPTION FROM INFORMATION_SCHEMA.PARTITIONS WHERE TABLE_NAME = 'workshops';
-- Output MySQL 8.0:

-- | PARTITION_NAME | PARTITION_ORDINAL_POSITION | PARTITION_DESCRIPTION                      |
-- | -------------- | -------------------------- | ------------------------------------------ |
-- | p_future       | 4                          | MAXVALUE                                   |
-- | p_2020_02      | 3                          | _binary 0x01709364780070008000000000000000 |
-- | p_2020_01      | 2                          | _binary 0x016FFE0C0C0070008000000000000000 |
-- | p_old          | 1                          | _binary 0x016F5E66E80070008000000000000000 |

ORDER BY partitioning_col LIMIT N

A typical query for this would be “Get me the latest 10 items”. When the table is partitioned by time, it would make sense to use the implicit ordering or partitions and start from the latest partition. If fewer than 10 items were found, then continue onto the next partition, and so on. After 10 items were found, the query can immediately finish. AFAIK this is not implemented in MYSQL, instead, it will access all partitions and do a merge sort. PostgreSQL has an improved query planner and supports ordered partition scans since version 12.

Other DB Engines

I haven’t tested this on DB engines other than MySQL (with InnoDB storage), but the basic principle should work. It all depends on whether binary sequences can be used as partition boundaries. I’d be curious to know if you’re using a similar approach, as well as what works and what doesn’t.

DEV Community