SELECT DISTINCT pushdown to do a loose index scan (skip scan)

#yugabytedb #distributed #postgres #sql

In the previous post I used WITH RECURSIVE to emulate a DISTINCT ON, by scanning an index with a skip to the next distinct value on the first column. This is known as skip scan or loose index scan. YugabyteDB can do it when scanning an index with a condition on the second column, thanks to the hybrid scan method.

In the last release (version 2.17.1.0 build 439) a new pushdown has been introduced for DISTINCT. This is described in the commit.

Without DISTINCT pushdown

With the table and index created in the previous post, listing the 2000 truck_id from 20 million records takes more than 30 seconds even with an Index Scan:

yugabyte=# explain (costs off, analyze, dist) select distinct truck_id from truck_readings;
                                      QUERY PLAN
---------------------------------------------------------------------------------------
 HashAggregate (actual time=42093.894..42094.277 rows=2000 loops=1)
   Group Key: truck_id
   ->  Seq Scan on truck_readings (actual time=2.490..38914.325 rows=20000000 loops=1)
         Storage Table Read Requests: 19533
         Storage Table Execution Time: 35100.091 ms
 Planning Time: 0.065 ms
 Execution Time: 42094.523 ms
 Storage Read Requests: 19533
 Storage Write Requests: 0
 Storage Execution Time: 35100.091 ms
 Peak Memory Usage: 9081 kB

yugabyte=# /*+ IndexOnlyScan(truck_readings) */ explain (costs off, analyze, dist) select distinct truck_id from truck_readings;

                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Unique (actual time=3.034..47393.512 rows=2000 loops=1)
   ->  Index Only Scan using truck_last_reading on truck_readings (actual time=3.032..45894.507 rows=20000000 loops=1)
         Heap Fetches: 0
         Storage Index Read Requests: 19532
         Storage Index Execution Time: 38068.095 ms
 Planning Time: 0.098 ms
 Execution Time: 47394.577 ms
 Storage Read Requests: 19532
 Storage Write Requests: 0
 Storage Execution Time: 38068.095 ms
 Peak Memory Usage: 8 kB
(11 rows)

In both cases, Seq Scan or Index Only Scan, 20 million rows has been fetched from the storage, and aggregated in the SQL layer.

Note that the queries above do not use the new feature, even in version 2.17.1.0 build 439 because:

without hint, the SELECT without WHERE clause doesn't use the index
with IndexOnlyScan hint, the new feature is not used

With DISTINCT pushdown

If I force a range scan with a WHERE clause (I use truck_id>=0 here but you can also use truck_id>(select min(truck_id) from truck_readings) because the aggregate is also pushed down, and fast), the DISTINCT is pushed down:

yugabyte=# explain (costs off, analyze, dist, verbose)
           select distinct truck_id from truck_readings
           where truck_id>=0;

                                                       QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Unique (actual time=8.614..17.083 rows=2000 loops=1)
   Output: truck_id
   ->  Index Only Scan using truck_last_reading on public.truck_readings (actual time=8.612..16.715 rows=2001 loops=1)
         Output: truck_id
         Index Cond: (truck_readings.truck_id >= 0)
         Heap Fetches: 0
         Storage Index Read Requests: 3
         Storage Index Execution Time: 16.000 ms
 Planning Time: 0.071 ms
 Execution Time: 17.197 ms
 Storage Read Requests: 3
 Storage Write Requests: 0
 Storage Execution Time: 16.000 ms
 Peak Memory Usage: 8 kB
(14 rows)

The main difference, besides the response time, is the number of rows returned from the Index Scan to the SQL layer: rows=2001 has skipped all rows with duplicate truck_id

DISTINCT ON without LATERAL instead of RECURSIVE

I will apply this to simplify the query in my previous post. I'll still use a CTE (Common Table Expression, aka WITH clause) for better readability, but it is not recursive ans simpler to understand:


with
-- get the first value (for fun, but can use the known minimum)
 "first" as (
 select min(truck_id) truck_id from truck_readings
),
-- use pushed down DISTINCT to get the list of truck_id
 "distinct" as (
  select distinct truck_id from truck_readings
  where truck_id>=(select truck_id from "first")
)
-- get the last reading for each truck
select truck_readings.* from "distinct", lateral (
 select * from truck_readings 
 where truck_id= "distinct".truck_id
 order by ts desc limit 1
) as truck_readings 
;

The explain (analyze, costs off, dist, verbose, buffers) of it is shows how it works:

                                                                                                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop (actual time=11.454..9108.708 rows=2000 loops=1)
   CTE first
     ->  Result (actual time=0.706..0.706 rows=1 loops=1)
           InitPlan 1 (returns $0)
             ->  Limit (actual time=0.703..0.704 rows=1 loops=1)
                   ->  Index Only Scan using truck_last_reading on truck_readings truck_readings_1 (actual time=0.702..0.702 rows=1 loops=1)
                         Index Cond: (truck_id IS NOT NULL)
                         Heap Fetches: 0
                         Storage Index Read Requests: 1
                         Storage Index Execution Time: 0.000 ms
   CTE distinct
     ->  Unique (actual time=1.374..22.448 rows=2000 loops=1)
           InitPlan 3 (returns $2)
             ->  CTE Scan on first (actual time=0.707..0.708 rows=1 loops=1)
           ->  Index Only Scan using truck_last_reading on truck_readings truck_readings_2 (actual time=1.373..20.668 rows=2001 loops=1)
                 Index Cond: (truck_id >= $2)
                 Heap Fetches: 0
                 Storage Index Read Requests: 4
                 Storage Index Execution Time: 20.000 ms
   ->  CTE Scan on "distinct" (actual time=1.376..24.344 rows=2000 loops=1)
   ->  Limit (actual time=4.540..4.541 rows=1 loops=2000)
         ->  Index Only Scan Backward using truck_last_reading on truck_readings (actual time=4.529..4.529 rows=1 loops=2000)
               Index Cond: (truck_id = "distinct".truck_id)
               Heap Fetches: 0
               Storage Index Read Requests: 2
               Storage Index Execution Time: 4.496 ms
 Planning Time: 0.221 ms
 Execution Time: 9109.490 ms
 Storage Read Requests: 3042
 Storage Write Requests: 0
 Storage Execution Time: 9012.023 ms
 Peak Memory Usage: 354 kB
(32 rows)

Time: 9136.663 ms (00:09.137)

The execution is approximately two times faster than WITH RECURSIVE. The main difference is that I used an inequality (truck_id <) to get to the right row in each loop, but here it is an equality (truck_id =).

DEV Community

SELECT DISTINCT pushdown to do a loose index scan (skip scan)

Without DISTINCT pushdown

With DISTINCT pushdown

DISTINCT ON without LATERAL instead of RECURSIVE

Latest comments (0)