DEV Community

Franck Pachot
Franck Pachot

Posted on • Updated on

EXPLAIN (ANALYZE, DIST) πŸš€ YugabyteDB distributed execution plan

In the previous post about "truly distributed UDF" I displayed some execution plans to show how YugabyteDB distributes User Defined Functions (UDF) for reads (Remote Filter) and writes (Function Based Indexes for point queries on the function).

I didn't realize that the latest version I was running the demo on (YugabyteDB 2.17) has already the dist option of explain analyze to show the remove calls between the query layer (YSQL - the PostgreSQL based backend) and the storage server (DocDB - the distributed storage and transaction).

Without push-down

If I disable Expression Push-Down, all rows are read from the storage, to be filtered by the PostgreSQL code. This is visible as Rows Removed by Filter. When displaying the Distributed Storage statistics, with DIST it also shows the number of remote calls (Storage Table Read Requests) and the time take for them (Storage Table Execution Time):

yugabyte=> set yb_enable_expression_pushdown=off;
SET

yugabyte=> explain (costs off, analyze, dist)
           select add(a,b) from demo where add(a,b) <= 42;

                           QUERY PLAN
-----------------------------------------------------------------
 Seq Scan on demo (actual time=88.161..3511.977 rows=21 loops=1)
   Filter: ((a + b) <= 42)
   Rows Removed by Filter: 999979
   Storage Table Read Requests: 978
   Storage Table Execution Time: 3300.013 ms
 Planning Time: 0.052 ms
 Execution Time: 3512.088 ms
 Storage Read Requests: 978
 Storage Write Requests: 0
 Storage Execution Time: 3300.013 ms
 Peak Memory Usage: 8 kB
(11 rows)
Enter fullscreen mode Exit fullscreen mode

Here the only optimization is to send rows by batches. 1 million rows received by 978 fetch requests (the default --ysql_prefetch_limit is 1000 rows).

Most of the time is spent remotely (3300 out of 3511 milliseconds), so the filtering itself is not the problem. The problem is that this filtering is applied in the single postgres backend rather than in the storage servers close to data.

This is why, even if the query layer is based on PostgreSQL code, YugabyteDB adds the necessary pushdowns to distribute the work.

With expression pushdown

With expression pushdown, the function, that has been inlined thanks to the PostgreSQL code, is distributed, visible as Remote Filter. The time is now faster (1176 milliseconds instead of 3300) because there is minimal data transfer:

yugabyte=> set yb_enable_expression_pushdown=on;
SET

yugabyte=> explain (costs off, analyze, dist)
           select add(a,b) from demo where add(a,b) <= 42;

                            QUERY PLAN
------------------------------------------------------------------
 Seq Scan on demo (actual time=1178.692..1178.699 rows=21 loops=1)
   Remote Filter: ((a + b) <= 42)
   Storage Table Read Requests: 1
   Storage Table Execution Time: 1176.005 ms
 Planning Time: 0.054 ms
 Execution Time: 1178.753 ms
 Storage Read Requests: 1
 Storage Write Requests: 0
 Storage Execution Time: 1176.005 ms
 Peak Memory Usage: 8 kB
(10 rows)
Enter fullscreen mode Exit fullscreen mode

There is only one read request because there are only 21 rows to return. 1176.005/1178.753=99.7% of the processing time was offloaded to the distributed storage.

Function Based Index

When creating an index on the function, not only the reads fron the table are pushed down but also the writes to the index. This, in addition to remote filter, allows point queries, visible as Index Cond:

yugabyte=> create index demo_udf on demo( add(a,b) asc ) include (a,b);
CREATE INDEX

yugabyte=> explain (costs off, analyze, dist, timing off)
           select add(a,b) from demo where add(a,b) <= 42;

                           QUERY PLAN
-----------------------------------------------------------------
 Index Only Scan using demo_udf on demo (actual rows=21 loops=1)
   Index Cond: (((a + b)) <= 42)
   Heap Fetches: 0
   Storage Index Read Requests: 1
 Planning Time: 0.075 ms
 Execution Time: 0.909 ms
 Storage Read Requests: 1
 Storage Write Requests: 0
 Storage Execution Time: 4.000 ms
 Peak Memory Usage: 8 kB
(10 rows)
Enter fullscreen mode Exit fullscreen mode

The read requests are still at their minimum, as only 21 rows are fetched, from one tablet, filtered by the index range scan. The time spent in storage is also reduced thanks to the index access: 0.9 millisecond, bringing the total response time, to get 21 rows out of 1 million, to single digit milliseconds.


This is a good example of how YugabyteDB re-uses PostgreSQL: the API is the same, the PostgreSQL features are available, but data reads and writes are distributed and replicated. The execution plan looks familiar to the PostgreSQL users, and still has enough informations to understand the distributed execution.

Top comments (0)