In the previous post about "truly distributed UDF" I displayed some execution plans to show how YugabyteDB distributes User Defined Functions (UDF) for reads (Remote Filter
) and writes (Function Based Indexes for point queries on the function).
I didn't realize that the latest version I was running the demo on (YugabyteDB 2.17) has already the dist
option of explain analyze
to show the remove calls between the query layer (YSQL - the PostgreSQL based backend) and the storage server (DocDB - the distributed storage and transaction).
Without push-down
If I disable Expression Push-Down, all rows are read from the storage, to be filtered by the PostgreSQL code. This is visible as Rows Removed by Filter
. When displaying the Distributed Storage statistics, with DIST
it also shows the number of remote calls (Storage Table Read Requests
) and the time take for them (Storage Table Execution Time
):
yugabyte=> set yb_enable_expression_pushdown=off;
SET
yugabyte=> explain (costs off, analyze, dist)
select add(a,b) from demo where add(a,b) <= 42;
QUERY PLAN
-----------------------------------------------------------------
Seq Scan on demo (actual time=88.161..3511.977 rows=21 loops=1)
Filter: ((a + b) <= 42)
Rows Removed by Filter: 999979
Storage Table Read Requests: 978
Storage Table Execution Time: 3300.013 ms
Planning Time: 0.052 ms
Execution Time: 3512.088 ms
Storage Read Requests: 978
Storage Write Requests: 0
Storage Execution Time: 3300.013 ms
Peak Memory Usage: 8 kB
(11 rows)
Here the only optimization is to send rows by batches. 1 million rows received by 978
fetch requests (the default --ysql_prefetch_limit
is 1000 rows).
Most of the time is spent remotely (3300
out of 3511
milliseconds), so the filtering itself is not the problem. The problem is that this filtering is applied in the single postgres backend rather than in the storage servers close to data.
This is why, even if the query layer is based on PostgreSQL code, YugabyteDB adds the necessary pushdowns to distribute the work.
With expression pushdown
With expression pushdown, the function, that has been inlined thanks to the PostgreSQL code, is distributed, visible as Remote Filter
. The time is now faster (1176
milliseconds instead of 3300
) because there is minimal data transfer:
yugabyte=> set yb_enable_expression_pushdown=on;
SET
yugabyte=> explain (costs off, analyze, dist)
select add(a,b) from demo where add(a,b) <= 42;
QUERY PLAN
------------------------------------------------------------------
Seq Scan on demo (actual time=1178.692..1178.699 rows=21 loops=1)
Remote Filter: ((a + b) <= 42)
Storage Table Read Requests: 1
Storage Table Execution Time: 1176.005 ms
Planning Time: 0.054 ms
Execution Time: 1178.753 ms
Storage Read Requests: 1
Storage Write Requests: 0
Storage Execution Time: 1176.005 ms
Peak Memory Usage: 8 kB
(10 rows)
There is only one read request because there are only 21 rows to return. 1176.005
/1178.753
=99.7% of the processing time was offloaded to the distributed storage.
Function Based Index
When creating an index on the function, not only the reads fron the table are pushed down but also the writes to the index. This, in addition to remote filter, allows point queries, visible as Index Cond
:
yugabyte=> create index demo_udf on demo( add(a,b) asc ) include (a,b);
CREATE INDEX
yugabyte=> explain (costs off, analyze, dist, timing off)
select add(a,b) from demo where add(a,b) <= 42;
QUERY PLAN
-----------------------------------------------------------------
Index Only Scan using demo_udf on demo (actual rows=21 loops=1)
Index Cond: (((a + b)) <= 42)
Heap Fetches: 0
Storage Index Read Requests: 1
Planning Time: 0.075 ms
Execution Time: 0.909 ms
Storage Read Requests: 1
Storage Write Requests: 0
Storage Execution Time: 4.000 ms
Peak Memory Usage: 8 kB
(10 rows)
The read requests are still at their minimum, as only 21 rows are fetched, from one tablet, filtered by the index range scan. The time spent in storage is also reduced thanks to the index access: 0.9 millisecond, bringing the total response time, to get 21 rows out of 1 million, to single digit milliseconds.
This is a good example of how YugabyteDB re-uses PostgreSQL: the API is the same, the PostgreSQL features are available, but data reads and writes are distributed and replicated. The execution plan looks familiar to the PostgreSQL users, and still has enough informations to understand the distributed execution.
Top comments (0)