Franck Pachot for YugabyteDB

IN() list filter with Top-N

Combining two indexing patterns, one to filter on multiple ranges and the other to sort rows for a Top-N query, makes SQL performance complex. Today we will optimize:

SELECT * FROM docs
WHERE status           IN ('draft', 'sent')
  AND sender_reference IN ('Custom/1175', 'Client/362', 'Custom/280')
ORDER BY sent_at DESC NULLS LAST 
LIMIT 20;

What is the best index for it, and is there a more efficient way to write this query?


This blog post is inspired by a typical pattern seen with Ruby on Rails Active Record and other ORMs, as demonstrated by Benoit Tigeot's test case: https://gist.github.com/benoittgt/ab72dc4cfedea2a0c6a5ee809d16e04d

You should read the comments, and Peter Geoghegan's patch to enhance ScalarArrayOp and bring Dynamic Scan to PostgreSQL, as well as this Twitter thread.

I'll execute an identical example using YugabyteDB.

Testcase in YugabyteDB

To set up the tables and indexes, I follow the instructions outlined in Benoit Tigeot's Gist:

DROP TABLE IF EXISTS docs;
CREATE TABLE docs (
  id SERIAL PRIMARY KEY,
  type varchar(40) DEFAULT 'pdf' NOT NULL,
  status varchar(40) NOT NULL,
  sender_reference varchar(40) NOT NULL,
  sent_at TIMESTAMPTZ,
  created_at TIMESTAMPTZ DEFAULT now() NOT NULL
);

-- fast bulk loads from single sessions with no concurrent access
set yb_disable_transactional_writes=on;
set yb_enable_upsert_mode=on;

INSERT INTO
  docs (type, status, sender_reference, sent_at)
SELECT
  ('{pdf,doc,raw}'::text[])[ceil(random()*3)],
  ('{sent,draft,suspended}'::text[])[ceil(random()*3)],
  ('{Custom,Client}'::text[])[ceil(random()*2)] || '/' || floor(random() * 2000),
  (LOCALTIMESTAMP - interval '2 years' * random())::timestamptz
FROM generate_series(1, 60000000) g;

-- reconnect to reset the session parameters
\c

CREATE INDEX docs_sent_at_idx   ON docs USING btree (sender_reference asc, status, sent_at DESC NULLS LAST);
CREATE INDEX docs_sent_at_idx_1 ON docs USING btree (sent_at DESC NULLS LAST, sender_reference, status);
CREATE INDEX docs_sent_at_idx_2 ON docs USING btree (sender_reference asc, sent_at DESC NULLS LAST);

ANALYZE docs;

I made the ASC order of the indexed columns explicit because YugabyteDB uses HASH sharding on the first column by default.
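
To illustrate what this default changes, here is a minimal sketch with hypothetical index names (not part of the test case, assuming a non-colocated database where hash sharding is the default):

-- default: the first key column is HASH-sharded, which supports equality lookups but not ordered range scans
create index docs_demo_hash  on docs (sender_reference);
-- explicit ASC (or DESC) makes the index range-sharded, preserving the sort order within the index
create index docs_demo_range on docs (sender_reference asc);
-- these two indexes are for illustration only
drop index docs_demo_hash;
drop index docs_demo_range;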

Check table and index distribution

In YugabyteDB, the table is hash-sharded on its primary key and starts with one tablet per node. The indexes are range-sharded and start with a single tablet when empty. As data grows, YugabyteDB's auto-splitting feature redistributes it across more tablets. You can check the tablets with the following query:

yugabyte=> with o(name) as (
 values ('docs'),('docs_sent_at_idx'),('docs_sent_at_idx_1'),('docs_sent_at_idx_2')
) select name, reltuples, pg_size_pretty(pg_table_size(name))
 , yb_table_properties.*
 , coalesce(pg_get_indexdef(name::regclass),pg_get_indexdef((name||'_pkey')::regclass))
 from o left outer join pg_class on (pg_class.oid=o.name::regclass)
 cross join yb_table_properties(name::regclass)
order by pg_table_size(name) desc
;

        name        | reltuples | pg_size_pretty | num_tablets | num_hash_key_columns | is_colocated | tablegroup_oid | colocation_id |                                                       coalesce
--------------------+-----------+----------------+-------------+----------------------+--------------+----------------+---------------+----------------------------------------------------------------------------------------------------------------------
 docs               |     6e+07 | 3491 MB        |           9 |                    1 | f            |                |               | CREATE UNIQUE INDEX docs_pkey ON public.docs USING lsm (id HASH)
 docs_sent_at_idx_1 |     6e+07 | 2347 MB        |           4 |                    0 | f            |                |               | CREATE INDEX docs_sent_at_idx_1 ON public.docs USING lsm (sent_at DESC NULLS LAST, sender_reference ASC, status ASC)
 docs_sent_at_idx   |     6e+07 | 1989 MB        |           4 |                    0 | f            |                |               | CREATE INDEX docs_sent_at_idx ON public.docs USING lsm (sender_reference ASC, status ASC, sent_at DESC NULLS LAST)
 docs_sent_at_idx_2 |     6e+07 | 1896 MB        |           4 |                    0 | f            |                |               | CREATE INDEX docs_sent_at_idx_2 ON public.docs USING lsm (sender_reference ASC, sent_at DESC NULLS LAST)
(4 rows)

I'm ready to test the query on 60 million rows: filter on multiple values of status and sender_reference, sort by sent_at, and retrieve the top 20. Given the data generation above, about 2 out of 3 rows match the status filter and 3 out of roughly 4,000 distinct sender_reference values match the second filter, so around 60M × 2/3 × 3/4000 ≈ 30,000 rows satisfy the WHERE clause before the Top-20 is applied.

Index Full Scan but Sorted

I am currently running my query without activating the cost-based optimizer. As a result, the query planner has decided to access the docs_sent_at_idx_1 index, which is already sorted on the columns we are ordering by. However, this index does not provide fast access to the ranges of our WHERE clause.

yugabyte=> EXPLAIN (ANALYZE, COSTS off, VERBOSE, BUFFERS, DIST)
SELECT * FROM docs
WHERE status IN ('draft', 'sent') AND sender_reference IN ('Custom/1175', 'Client/362', 'Custom/280') ORDER BY sent_at DESC NULLS LAST LIMIT 20 OFFSET 0;

                                                                                                                                                                                                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit (actual time=23.638..1115.891 rows=20 loops=1)
   Output: id, type, status, sender_reference, sent_at, created_at
   ->  Index Scan using docs_sent_at_idx_1 on public.docs (actual time=23.637..1115.870 rows=20 loops=1)
         Output: id, type, status, sender_reference, sent_at, created_at
         Filter: (((docs.status)::text = ANY ('{draft,sent}'::text[])) AND ((docs.sender_reference)::text = ANY ('{Custom/1175,Client/362,Custom/280}'::text[])))
         Rows Removed by Filter: 63466
         Storage Table Read Requests: 62
         Storage Table Read Execution Time: 1047.225 ms
         Storage Index Read Requests: 62
         Storage Index Read Execution Time: 3.881 ms
 Planning Time: 0.109 ms
 Execution Time: 1116.027 ms
 Storage Read Requests: 124
 Storage Read Execution Time: 1051.106 ms
 Storage Write Requests: 0
 Catalog Read Requests: 0
 Catalog Write Requests: 0
 Storage Flush Requests: 0
 Storage Execution Time: 1051.106 ms
 Peak Memory Usage: 8 kB
(20 rows)

The plan is similar to the one in PostgreSQL 14. Both databases scan through tens of thousands of rows because, even though they are read in the right order, they must be filtered before the top 20 can be selected. The downside of this strategy in YugabyteDB is that it takes longer to fetch these rows from the distributed tablets (DocDB) to the PostgreSQL backend (YSQL), which applies the final Rows Removed by Filter and Limit. This is why YugabyteDB's response time is higher than PostgreSQL's here.

Index Loose Scan on First Column, Not Sorted

In YugabyteDB, I don't need to delete an index to test another query plan. Instead, I can use hints to choose another index, such as docs_sent_at_idx_2, which provides fast access to sender_reference:

yugabyte=> EXPLAIN (ANALYZE, COSTS off, VERBOSE, BUFFERS, DIST)
/*+ IndexScan(docs docs_sent_at_idx_2) */
SELECT * FROM docs
WHERE status IN ('draft', 'sent') AND sender_reference IN ('Custom/1175', 'Client/362', 'Custom/280') ORDER BY sent_at DESC NULLS LAST LIMIT 20 OFFSET 0;

                                                                                                                                                                                                                                                                                                                                                                 QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
 Limit (actual time=859.454..859.460 rows=20 loops=1)
   Output: id, type, status, sender_reference, sent_at, created_at
   ->  Sort (actual time=859.453..859.455 rows=20 loops=1)
         Output: id, type, status, sender_reference, sent_at, created_at
         Sort Key: docs.sent_at DESC NULLS LAST
         Sort Method: top-N heapsort  Memory: 26kB
         ->  Index Scan using docs_sent_at_idx_2 on public.docs (actual time=27.932..855.330 rows=30019 loops=1)
               Output: id, type, status, sender_reference, sent_at, created_at
               Index Cond: ((docs.sender_reference)::text = ANY ('{Custom/1175,Client/362,Custom/280}'::text[]))
               Filter: ((docs.status)::text = ANY ('{draft,sent}'::text[]))
               Rows Removed by Filter: 14866
               Storage Table Read Requests: 45
               Storage Table Read Execution Time: 806.829 ms
               Storage Index Read Requests: 46
               Storage Index Read Execution Time: 4.811 ms
 Planning Time: 0.193 ms
 Execution Time: 859.542 ms
 Storage Read Requests: 91
 Storage Read Execution Time: 811.640 ms
 Storage Write Requests: 0
 Catalog Read Requests: 0
 Catalog Write Requests: 0
 Storage Flush Requests: 0
 Storage Execution Time: 811.640 ms
 Peak Memory Usage: 50 kB
(25 rows)

YugabyteDB employs a Loose Index Scan to read the three ranges within a single Index Scan, reducing the number of rows removed later. PostgreSQL 14 uses a Bitmap Scan for this and shows a similar number of Rows Removed by Filter.
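
To reproduce that comparison on a plain PostgreSQL 14 instance loaded with the same Gist data (no pg_hint_plan there by default), one way is to temporarily hide the sorted index inside a transaction; a sketch, output omitted:

BEGIN;
-- temporarily hide the index that provides the sort order, so the planner considers the other indexes
-- note: DROP INDEX takes an exclusive lock, so only do this on a test instance
DROP INDEX docs_sent_at_idx_1;
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM docs
WHERE status IN ('draft', 'sent')
  AND sender_reference IN ('Custom/1175', 'Client/362', 'Custom/280')
ORDER BY sent_at DESC NULLS LAST LIMIT 20 OFFSET 0;
-- DDL is transactional in PostgreSQL: rolling back restores the index
ROLLBACK;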

Index Loose Scan on All Columns, Not Sorted

As YugabyteDB's Loose Index Scan can read multiple ranges, I will use docs_sent_at_idx to filter on both columns:

yugabyte=> EXPLAIN (ANALYZE, COSTS off, VERBOSE, BUFFERS, DIST)
/*+ IndexScan(docs docs_sent_at_idx) */
SELECT * FROM docs
WHERE status IN ('draft', 'sent') AND sender_reference IN ('Custom/1175', 'Client/362', 'Custom/280') ORDER BY sent_at DESC NULLS LAST LIMIT 20 OFFSET 0;

                                                                                 QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit (actual time=507.265..507.270 rows=20 loops=1)
   Output: id, type, status, sender_reference, sent_at, created_at
   ->  Sort (actual time=507.263..507.265 rows=20 loops=1)
         Output: id, type, status, sender_reference, sent_at, created_at
         Sort Key: docs.sent_at DESC NULLS LAST
         Sort Method: top-N heapsort  Memory: 26kB
         ->  Index Scan using docs_sent_at_idx on public.docs (actual time=22.317..503.131 rows=30019 loops=1)
               Output: id, type, status, sender_reference, sent_at, created_at
               Index Cond: (((docs.sender_reference)::text = ANY ('{Custom/1175,Client/362,Custom/280}'::text[])) AND ((docs.status)::text = ANY ('{draft,sent}'::text[])))
               Storage Table Read Requests: 30
               Storage Table Read Execution Time: 471.460 ms
               Storage Index Read Requests: 31
               Storage Index Read Execution Time: 5.259 ms
 Planning Time: 0.175 ms
 Execution Time: 507.341 ms
 Storage Read Requests: 61
 Storage Read Execution Time: 476.719 ms
 Storage Write Requests: 0
 Catalog Read Requests: 0
 Catalog Write Requests: 0
 Storage Flush Requests: 0
 Storage Execution Time: 476.719 ms
 Peak Memory Usage: 50 kB
(23 rows)                                                                                                                                               

YugabyteDB can scan the precise ranges for both conditions, minimizing the number of rows to read. However, since these rows are not returned in the ORDER BY order, all 30,019 of them must be fetched and sorted before the Top 20 can be selected.

Union All and Merge Append

By using UNION ALL to select each range independently, the LIMIT can be pushed down and the sorted index used for each range:

yugabyte=> EXPLAIN (ANALYZE, COSTS off, VERBOSE, BUFFERS, DIST)
/*+ IndexScan(docs docs_sent_at_idx) */
SELECT *
FROM
  (
    (
      (
          SELECT * FROM docs
          WHERE status = 'draft'
            AND sender_reference = 'Custom/1175'
          ORDER BY
            sent_at DESC NULLS LAST
          LIMIT
            20 OFFSET 0
      )
      UNION ALL
        (
          SELECT * FROM docs
          WHERE status = 'sent'
            AND sender_reference = 'Custom/1175'
          ORDER BY
            sent_at DESC NULLS LAST
          LIMIT
            20 OFFSET 0
        )
      UNION ALL
      (
          SELECT * FROM docs
          WHERE status = 'draft'
            AND sender_reference = 'Client/362'
          ORDER BY
            sent_at DESC NULLS LAST
          LIMIT
            20 OFFSET 0
      )
      UNION ALL
        (
          SELECT * FROM docs
          WHERE status = 'sent'
            AND sender_reference = 'Client/362'
          ORDER BY
            sent_at DESC NULLS LAST
          LIMIT
            20 OFFSET 0
        )
        UNION ALL
        (
          SELECT * FROM docs
          WHERE status = 'draft'
            AND sender_reference = 'Custom/280'
          ORDER BY
            sent_at DESC NULLS LAST
          LIMIT
            20 OFFSET 0
      )
      UNION ALL
        (
          SELECT * FROM docs
          WHERE status = 'sent'
            AND sender_reference = 'Custom/280'
          ORDER BY
            sent_at DESC NULLS LAST
          LIMIT
            20 OFFSET 0
        )
    )
  ) docs
ORDER BY
  sent_at DESC NULLS LAST
LIMIT
  20 OFFSET 0;
                                                                                                                                                                                                                           QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 Limit (actual time=14.783..14.807 rows=20 loops=1)
   Output: docs.id, docs.type, docs.status, docs.sender_reference, docs.sent_at, docs.created_at
   ->  Merge Append (actual time=14.782..14.802 rows=20 loops=1)
         Sort Key: docs.sent_at DESC NULLS LAST
         ->  Limit (actual time=2.987..2.990 rows=4 loops=1)
               Output: docs.id, docs.type, docs.status, docs.sender_reference, docs.sent_at, docs.created_at
               ->  Index Scan using docs_sent_at_idx on public.docs (actual time=2.987..2.989 rows=4 loops=1)
                     Output: docs.id, docs.type, docs.status, docs.sender_reference, docs.sent_at, docs.created_at
                     Index Cond: (((docs.sender_reference)::text = 'Custom/1175'::text) AND ((docs.status)::text = 'draft'::text))
                     Storage Table Read Requests: 1
                     Storage Table Read Execution Time: 2.095 ms
                     Storage Index Read Requests: 1
                     Storage Index Read Execution Time: 0.704 ms
         ->  Limit (actual time=2.604..2.606 rows=4 loops=1)
               Output: docs_1.id, docs_1.type, docs_1.status, docs_1.sender_reference, docs_1.sent_at, docs_1.created_at
               ->  Index Scan using docs_sent_at_idx on public.docs docs_1 (actual time=2.602..2.605 rows=4 loops=1)
                     Output: docs_1.id, docs_1.type, docs_1.status, docs_1.sender_reference, docs_1.sent_at, docs_1.created_at
                     Index Cond: (((docs_1.sender_reference)::text = 'Custom/1175'::text) AND ((docs_1.status)::text = 'sent'::text))
                     Storage Table Read Requests: 1
                     Storage Table Read Execution Time: 1.873 ms
                     Storage Index Read Requests: 1
                     Storage Index Read Execution Time: 0.582 ms
         ->  Limit (actual time=2.768..2.770 rows=4 loops=1)
               Output: docs_2.id, docs_2.type, docs_2.status, docs_2.sender_reference, docs_2.sent_at, docs_2.created_at
               ->  Index Scan using docs_sent_at_idx on public.docs docs_2 (actual time=2.767..2.768 rows=4 loops=1)
                     Output: docs_2.id, docs_2.type, docs_2.status, docs_2.sender_reference, docs_2.sent_at, docs_2.created_at
                     Index Cond: (((docs_2.sender_reference)::text = 'Client/362'::text) AND ((docs_2.status)::text = 'draft'::text))
                     Storage Table Read Requests: 1
                     Storage Table Read Execution Time: 1.648 ms
                     Storage Index Read Requests: 1
                     Storage Index Read Execution Time: 1.011 ms
         ->  Limit (actual time=2.628..2.629 rows=2 loops=1)
               Output: docs_3.id, docs_3.type, docs_3.status, docs_3.sender_reference, docs_3.sent_at, docs_3.created_at
               ->  Index Scan using docs_sent_at_idx on public.docs docs_3 (actual time=2.626..2.627 rows=2 loops=1)
                     Output: docs_3.id, docs_3.type, docs_3.status, docs_3.sender_reference, docs_3.sent_at, docs_3.created_at
                     Index Cond: (((docs_3.sender_reference)::text = 'Client/362'::text) AND ((docs_3.status)::text = 'sent'::text))
                     Storage Table Read Requests: 1
                     Storage Table Read Execution Time: 1.520 ms
                     Storage Index Read Requests: 1
                     Storage Index Read Execution Time: 0.994 ms
         ->  Limit (actual time=1.878..1.882 rows=4 loops=1)
               Output: docs_4.id, docs_4.type, docs_4.status, docs_4.sender_reference, docs_4.sent_at, docs_4.created_at
               ->  Index Scan using docs_sent_at_idx on public.docs docs_4 (actual time=1.877..1.880 rows=4 loops=1)
                     Output: docs_4.id, docs_4.type, docs_4.status, docs_4.sender_reference, docs_4.sent_at, docs_4.created_at
                     Index Cond: (((docs_4.sender_reference)::text = 'Custom/280'::text) AND ((docs_4.status)::text = 'draft'::text))
                     Storage Table Read Requests: 1
                     Storage Table Read Execution Time: 1.345 ms
                     Storage Index Read Requests: 1
                     Storage Index Read Execution Time: 0.405 ms
         ->  Limit (actual time=1.911..1.914 rows=7 loops=1)
               Output: docs_5.id, docs_5.type, docs_5.status, docs_5.sender_reference, docs_5.sent_at, docs_5.created_at
               ->  Index Scan using docs_sent_at_idx on public.docs docs_5 (actual time=1.910..1.912 rows=7 loops=1)
                     Output: docs_5.id, docs_5.type, docs_5.status, docs_5.sender_reference, docs_5.sent_at, docs_5.created_at
                     Index Cond: (((docs_5.sender_reference)::text = 'Custom/280'::text) AND ((docs_5.status)::text = 'sent'::text))
                     Storage Table Read Requests: 1
                     Storage Table Read Execution Time: 1.363 ms
                     Storage Index Read Requests: 1
                     Storage Index Read Execution Time: 0.426 ms
 Planning Time: 0.740 ms
 Execution Time: 14.899 ms
 Storage Read Requests: 12
 Storage Read Execution Time: 13.967 ms
 Storage Write Requests: 0
 Catalog Read Requests: 0
 Catalog Write Requests: 0
 Storage Flush Requests: 0
 Storage Execution Time: 13.967 ms
 Peak Memory Usage: 120 kB
(68 rows)

This is the fastest execution plan for both PostgreSQL 14 and YugabyteDB. It benefits from a Merge Append, which avoids any additional sorting.

Index Only Scan, Union All and Merge Append

When analyzing an Index Scan, I look at the Output section of EXPLAIN (VERBOSE) to see which columns the query needs, and create a covering index for them:

create index docs_sent_at_cov on docs (
  status ASC, sender_reference ASC, sent_at DESC NULLS LAST
  , type, sender_reference, created_at, id 
);

The covering index technique can also be used in PostgreSQL; however, it may not bring the same benefit, and the availability of a Bitmap Scan is often an interesting alternative. Even with an Index Only Scan, PostgreSQL still needs to check visibility against the table, which can be costly if the table was not recently vacuumed, and adding more indexes increases bloat. YugabyteDB doesn't implement a Bitmap Scan yet, but it uses a different MVCC implementation where the indexes carry the visibility information and index maintenance is lighter. With this index, all branches become Index Only Scans, so the Top-20 rows can be sorted without having to access the table.
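
For reference, on PostgreSQL the equivalent covering index would usually keep only the filtering and sorting columns in the key and put the remaining columns in an INCLUDE clause; a minimal sketch with a hypothetical name, not needed for the YugabyteDB test:

create index docs_sent_at_cov_pg on docs (
  status ASC, sender_reference ASC, sent_at DESC NULLS LAST
) include (type, created_at, id);

Back to YugabyteDB, here is the plan with docs_sent_at_cov: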

yugabyte=> EXPLAIN (ANALYZE, COSTS off, VERBOSE, BUFFERS, DIST)
/*+ IndexOnlyScan(docs docs_sent_at_cov) */
SELECT *
FROM
  (
    (
      (
          SELECT * FROM docs
          WHERE status = 'draft'
            AND sender_reference = 'Custom/1175'
          ORDER BY
            sent_at DESC NULLS LAST
          LIMIT
            20 OFFSET 0
      )
      UNION ALL
        (
          SELECT * FROM docs
          WHERE status = 'sent'
            AND sender_reference = 'Custom/1175'
          ORDER BY
            sent_at DESC NULLS LAST
          LIMIT
            20 OFFSET 0
        )
      UNION ALL
      (
          SELECT * FROM docs
          WHERE status = 'draft'
            AND sender_reference = 'Client/362'
          ORDER BY
            sent_at DESC NULLS LAST
          LIMIT
            20 OFFSET 0
      )
      UNION ALL
        (
          SELECT * FROM docs
          WHERE status = 'sent'
            AND sender_reference = 'Client/362'
          ORDER BY
            sent_at DESC NULLS LAST
          LIMIT
            20 OFFSET 0
        )
        UNION ALL
        (
          SELECT * FROM docs
          WHERE status = 'draft'
            AND sender_reference = 'Custom/280'
          ORDER BY
            sent_at DESC NULLS LAST
          LIMIT
            20 OFFSET 0
      )
      UNION ALL
        (
          SELECT * FROM docs
          WHERE status = 'sent'
            AND sender_reference = 'Custom/280'
          ORDER BY
            sent_at DESC NULLS LAST
          LIMIT
            20 OFFSET 0
        )
    )
  ) docs
ORDER BY
  sent_at DESC NULLS LAST
LIMIT
  20 OFFSET 0;

                                                                                                                                       QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Limit (actual time=7.658..7.686 rows=20 loops=1)
   Output: docs.id, docs.type, docs.status, docs.sender_reference, docs.sent_at, docs.created_at
   ->  Merge Append (actual time=7.657..7.681 rows=20 loops=1)
         Sort Key: docs.sent_at DESC NULLS LAST
         ->  Limit (actual time=1.656..1.660 rows=4 loops=1)
               Output: docs.id, docs.type, docs.status, docs.sender_reference, docs.sent_at, docs.created_at
               ->  Index Only Scan using docs_sent_at_cov on public.docs (actual time=1.642..1.646 rows=4 loops=1)
                     Output: docs.id, docs.type, docs.status, docs.sender_reference, docs.sent_at, docs.created_at
                     Index Cond: ((docs.sender_reference = 'Custom/1175'::text) AND (docs.status = 'draft'::text))
                     Heap Fetches: 0
                     Storage Index Read Requests: 1
                     Storage Index Read Execution Time: 1.522 ms
         ->  Limit (actual time=1.362..1.364 rows=4 loops=1)
               Output: docs_1.id, docs_1.type, docs_1.status, docs_1.sender_reference, docs_1.sent_at, docs_1.created_at
               ->  Index Only Scan using docs_sent_at_cov on public.docs docs_1 (actual time=1.361..1.363 rows=4 loops=1)
                     Output: docs_1.id, docs_1.type, docs_1.status, docs_1.sender_reference, docs_1.sent_at, docs_1.created_at
                     Index Cond: ((docs_1.sender_reference = 'Custom/1175'::text) AND (docs_1.status = 'sent'::text))
                     Heap Fetches: 0
                     Storage Index Read Requests: 1
                     Storage Index Read Execution Time: 1.294 ms
         ->  Limit (actual time=1.214..1.217 rows=4 loops=1)
               Output: docs_2.id, docs_2.type, docs_2.status, docs_2.sender_reference, docs_2.sent_at, docs_2.created_at
               ->  Index Only Scan using docs_sent_at_cov on public.docs docs_2 (actual time=1.213..1.216 rows=4 loops=1)
                     Output: docs_2.id, docs_2.type, docs_2.status, docs_2.sender_reference, docs_2.sent_at, docs_2.created_at
                     Index Cond: ((docs_2.sender_reference = 'Client/362'::text) AND (docs_2.status = 'draft'::text))
                     Heap Fetches: 0
                     Storage Index Read Requests: 1
                     Storage Index Read Execution Time: 1.159 ms
         ->  Limit (actual time=1.161..1.163 rows=2 loops=1)
               Output: docs_3.id, docs_3.type, docs_3.status, docs_3.sender_reference, docs_3.sent_at, docs_3.created_at
               ->  Index Only Scan using docs_sent_at_cov on public.docs docs_3 (actual time=1.161..1.162 rows=2 loops=1)
                     Output: docs_3.id, docs_3.type, docs_3.status, docs_3.sender_reference, docs_3.sent_at, docs_3.created_at
                     Index Cond: ((docs_3.sender_reference = 'Client/362'::text) AND (docs_3.status = 'sent'::text))
                     Heap Fetches: 0
                     Storage Index Read Requests: 1
                     Storage Index Read Execution Time: 1.104 ms
         ->  Limit (actual time=1.204..1.208 rows=4 loops=1)
               Output: docs_4.id, docs_4.type, docs_4.status, docs_4.sender_reference, docs_4.sent_at, docs_4.created_at
               ->  Index Only Scan using docs_sent_at_cov on public.docs docs_4 (actual time=1.203..1.206 rows=4 loops=1)
                     Output: docs_4.id, docs_4.type, docs_4.status, docs_4.sender_reference, docs_4.sent_at, docs_4.created_at
                     Index Cond: ((docs_4.sender_reference = 'Custom/280'::text) AND (docs_4.status = 'draft'::text))
                     Heap Fetches: 0
                     Storage Index Read Requests: 1
                     Storage Index Read Execution Time: 1.153 ms
         ->  Limit (actual time=1.055..1.060 rows=7 loops=1)
               Output: docs_5.id, docs_5.type, docs_5.status, docs_5.sender_reference, docs_5.sent_at, docs_5.created_at
               ->  Index Only Scan using docs_sent_at_cov on public.docs docs_5 (actual time=1.055..1.059 rows=7 loops=1)
                     Output: docs_5.id, docs_5.type, docs_5.status, docs_5.sender_reference, docs_5.sent_at, docs_5.created_at
                     Index Cond: ((docs_5.sender_reference = 'Custom/280'::text) AND (docs_5.status = 'sent'::text))
                     Heap Fetches: 0
                     Storage Index Read Requests: 1
                     Storage Index Read Execution Time: 1.006 ms
 Planning Time: 0.717 ms
 Execution Time: 7.778 ms
 Storage Read Requests: 6
 Storage Read Execution Time: 7.239 ms
 Storage Write Requests: 0
 Catalog Read Requests: 0
 Catalog Write Requests: 0
 Storage Flush Requests: 0
 Storage Execution Time: 7.239 ms
 Peak Memory Usage: 120 kB
(62 rows)

WITH clause and LATERAL join

Writing a UNION ALL branch for every possible combination is not the most practical approach. Instead, I prefer to define the lists as Common Table Expressions (CTEs) built with VALUES in a WITH clause and generate the combinations with a cross join. To read the rows for each combination, I use a LATERAL join:

yugabyte=> EXPLAIN (ANALYZE, COSTS off, VERBOSE, BUFFERS, DIST)
with
 -- list of status
 status_list(status) as ( values ('draft'),('sent') )
 -- list of sender_reference
,sender_reference_list(sender_reference) as ( values ('Custom/1175'),('Client/362'),('Custom/280') )
 -- combination
,list as ( select * from status_list cross join sender_reference_list)
 -- lateral join
select docs.* from list cross join lateral (
 -- subquery for each combination
 SELECT * FROM docs
 WHERE status =list.status AND sender_reference = list.sender_reference
 -- push down the Top-N
 ORDER BY sent_at DESC NULLS LAST LIMIT 20 OFFSET 0
) as docs
 ORDER BY sent_at DESC NULLS LAST LIMIT 20 OFFSET 0;
                                                                                                                                    QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Limit (actual time=7.768..7.773 rows=20 loops=1)
   Output: docs.id, docs.type, docs.status, docs.sender_reference, docs.sent_at, docs.created_at
   CTE status_list
     ->  Values Scan on "*VALUES*" (actual time=0.001..0.002 rows=2 loops=1)
           Output: "*VALUES*".column1
   CTE sender_reference_list
     ->  Values Scan on "*VALUES*_1" (actual time=0.000..0.003 rows=3 loops=1)
           Output: "*VALUES*_1".column1
   CTE list
     ->  Nested Loop (actual time=0.007..0.018 rows=6 loops=1)
           Output: status_list.status, sender_reference_list.sender_reference
           ->  CTE Scan on status_list (actual time=0.003..0.006 rows=2 loops=1)
                 Output: status_list.status
           ->  CTE Scan on sender_reference_list (actual time=0.001..0.004 rows=3 loops=2)
                 Output: sender_reference_list.sender_reference
   ->  Sort (actual time=7.767..7.769 rows=20 loops=1)
         Output: docs.id, docs.type, docs.status, docs.sender_reference, docs.sent_at, docs.created_at
         Sort Key: docs.sent_at DESC NULLS LAST
         Sort Method: top-N heapsort  Memory: 27kB
         ->  Nested Loop (actual time=1.505..7.714 rows=120 loops=1)
               Output: docs.id, docs.type, docs.status, docs.sender_reference, docs.sent_at, docs.created_at
               ->  CTE Scan on list (actual time=0.008..0.025 rows=6 loops=1)
                     Output: list.status, list.sender_reference
               ->  Limit (actual time=1.257..1.277 rows=20 loops=6)
                     Output: docs.id, docs.type, docs.status, docs.sender_reference, docs.sent_at, docs.created_at
                     ->  Index Only Scan using docs_sent_at_cov on public.docs (actual time=1.242..1.258 rows=20 loops=6)
                           Output: docs.id, docs.type, docs.status, docs.sender_reference, docs.sent_at, docs.created_at
                           Index Cond: ((docs.sender_reference = list.sender_reference) AND (docs.status = list.status))
                           Heap Fetches: 0
                           Storage Index Read Requests: 1
                           Storage Index Read Execution Time: 1.192 ms
 Planning Time: 0.239 ms
 Execution Time: 7.863 ms
 Storage Read Requests: 6
 Storage Read Execution Time: 7.152 ms
 Storage Write Requests: 0
 Catalog Read Requests: 0
 Catalog Write Requests: 0
 Storage Flush Requests: 0
 Storage Execution Time: 7.152 ms
 Peak Memory Usage: 142 kB
(41 rows)

If you are familiar with the Oracle Star Transformation, which replaces many IN() filters with a cross join on the dimensions, this is a similar idea: build one row set with all the combinations and join it to the table through a single index.
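
When the two lists come from the application as bind variables rather than hardcoded VALUES, the same CTE + LATERAL pattern can be fed with arrays and unnest(); a sketch of the idea, with the arrays inlined here in place of parameters:

with
 status_list(status) as ( select unnest('{draft,sent}'::text[]) )
,sender_reference_list(sender_reference) as ( select unnest('{Custom/1175,Client/362,Custom/280}'::text[]) )
,list as ( select * from status_list cross join sender_reference_list )
select docs.* from list cross join lateral (
 SELECT * FROM docs
 WHERE status = list.status AND sender_reference = list.sender_reference
 ORDER BY sent_at DESC NULLS LAST LIMIT 20 OFFSET 0
) as docs
ORDER BY sent_at DESC NULLS LAST LIMIT 20 OFFSET 0;

In an application, the two array literals would be passed as parameters (for example $1 and $2), so the SQL text stays the same whatever the number of values in each list.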


The winner is this last approach, which builds the list of ranges with Common Table Expressions and joins it with a Nested Loop to the appropriate index, so that each loop iteration reads a single range already sorted on our Top-N column.
