DEV Community

Franck Pachot for YugabyteDB

Posted on • Updated on

YugabyteDB Predicate Push Down

I've written, in a previous blog post, about some very basic pushdowns. More are coming. YugabyteDB started by plugging PostgreSQL on top of the DocDB Distributed Storage and Transaction layer, and then optimize it by pushing down the relevant operations to reduce the remote calls between the nodes. In the latest preview version, some pushdown filters can be enabled. Here is a simple example.

Rows Removed by Filter

I have the following table with no index on n:

yugabyte=# create table demo (id bigint primary key, value int);

yugabyte=# insert into demo (id, value)
           select n, n from generate_series(1,1000) n;

yugabyte=# show yb_enable_expression_pushdown;
(1 row)

yugabyte=# explain (costs off, analyze, verbose)
           select * from demo where value=1;

                            QUERY PLAN
 Seq Scan on public.demo (actual time=6.040..6.058 rows=1 loops=1)
   Output: id, value
   Filter: (demo.value = 1)
   Rows Removed by Filter: 999
 Planning Time: 0.038 ms
 Execution Time: 6.095 ms
(6 rows)
Enter fullscreen mode Exit fullscreen mode

With yb_enable_expression_pushdown set to off, the default in version 2.13.2, 1000 rows have been read from the remote nodes, 999 filtered by the node I'm connected to, to finally output 1 row.

This uses the PostgreSQL explain plan where the rows in the operation shows the final output and the Rows Removed by Filter counts what has been read, and filtered afterwards. Then the ratio of discarded rows is high, it is a sign of inefficient access. This probably requires a better index for the condition in Filter:.

The consequence is worse in a distributed SQL database because those rows, in addition to being read, are sent to the session backend to be filtered there. This takes time, and increases the network cost in a cloud environment. The solution may be in better indexing, but the database can also improve this by pushing down the filter to the remote node.

Remote Filter

By setting yb_enable_expression_pushdown to on, the rows are filtered on the tablet servers:

yugabyte=# set yb_enable_expression_pushdown to on;

yugabyte=# explain (costs off, analyze, verbose)
           select * from demo where value=1;

                            QUERY PLAN
 Seq Scan on public.demo (actual time=2.697..2.698 rows=1 loops=1)
   Output: id, value
   Remote Filter: (demo.value = 1)
 Planning Time: 0.034 ms
 Execution Time: 2.734 ms
(5 rows)
Enter fullscreen mode Exit fullscreen mode

The difference in time is small here, as I'm running this on a lab with all nodes on the same machine, but less rows to send also means less CPU, less context switches, and a better distribution of the load. In a real deployment over multiple availability zone, or even regions, the difference will be huge.

Only for Seq Scan

The pushdowns are currently implemented for Seq Scan only, where it always makes sense. If I create a "bad" index and force it with hint:

yugabyte=# create unique index demoi on demo(id) include (value);

yugabyte=# set yb_enable_expression_pushdown=on;

yugabyte=# explain (costs off, analyze, verbose) 
           /*+ IndexOnlyScan(demo) */ 
           select * from demo where value=1;

                                      QUERY PLAN
 Index Only Scan using demoi on public.demo (actual time=6.825..6.872 rows=1 loops=1)
   Output: id, value
   Filter: (demo.value = 1)
   Rows Removed by Filter: 999
   Heap Fetches: 0
 Planning Time: 0.086 ms
 Execution Time: 6.897 ms
(7 rows)
Enter fullscreen mode Exit fullscreen mode

This index doesn't filter the rows (no Index Cond.) and all 100 rows have been returned to the PostgreSQL backed to be filtered there.

If you see Rows Removed by Filter from an index access, then better think about the index. For this query, the index should be on demo(value) include (id) or maybe a partial index where value=1.

I mentioned that predicate pushdown is implemented for SeqScan, but if you are interested in YugabyteDB internals, and how it reuses PostgreSQL, you will see that the SeqScan displayed above is actually a ForeignScan in disguise. The Foreign Data Wrapper is the PostgreSQL operation that has the code to manage remote access, which makes sense in a distributed database. The pushdown you see as "Remote Filter" is in explain.c ForeignScan. And a ForeignScan for a SELECT on a YugabyteDB table is actually displayed as "Seq Scan" in explain.c to avoid confusion for PostgreSQL users. You don't need this knowledge of implementation details for common usage. But it is good to know if you use query planner hints. For example, trying the above example with a /*+ SeqScan(demo) */ will not push down the predicate because is forces the true SeqScan rather than the hidden ForeignScan.

Also for Updates and Delete

Note that predicate push down also occur on DML. You see it in the SeqScan node. For example, this removes no rows but has to read them all as I have no index to find them:

yugabyte=# explain (costs off, analyze) 
           delete from demo where value=0;

                            QUERY PLAN
 Delete on demo (actual time=2.188..2.188 rows=0 loops=1)
   ->  Seq Scan on demo (actual time=2.187..2.187 rows=0 loops=1)
         Remote Filter: (value = 0)
 Planning Time: 0.044 ms
 Execution Time: 2.228 ms
(5 rows)
Enter fullscreen mode Exit fullscreen mode

Without pushdown, this would have displayed Filter: (value = 0) and Rows Removed by Filter: 999. Here no ros had to be returned to the SQL processing layer.

set yb_enable_expression_pushdown = on

You may wonder why this awesome feature is not enabled by default? Remember that you can do online rolling upgrades with YugabyteDB. You don't want to be connected to a newer version tserver that will push down a filter to a previous version that doesn't have it. That's the reason, but as soon as you have upgraded the whole cluster, you can switch it on. Like in many areas, this will also improve soon to make this automated (conservative defaults for rolling upgrades but enabling new feature as soon as the whole cluster has them).

Which predicates?

Of course not all predicates can be pushed down and giving a list would be obsolete soon as you can expect more to be implemented. The code is here.

The beauty of open source software is that the tests are also open. Then, if you want to get an idea of the predicates that can be pushed down, why not grep for Remote Filter on the regression tests, like this:

curl -s \
| grep "Remote Filter"

   Remote Filter: (i1 = 1)
   Remote Filter: (t1 ~~ 'val%'::text)
   Remote Filter: (ts1 < 'Thu Nov 11 11:11:12 2021'::timestamp without time zone)
   Remote Filter: (ts2 < 'Thu Nov 11 11:11:10 2021 UTC'::timestamp with time zone)
   Remote Filter: isfinite(ts1)
   Remote Filter: ("left"(t1, i1) = 'v'::text)
   Remote Filter: ((i1)::double precision < ('10'::double precision + (random() * '90'::double precision)))
   Remote Filter: (ts1 IS NULL)
   Remote Filter: (ts1 IS NOT NULL)
   Remote Filter: ((i1 = 1) OR ((NOT isfinite(ts1)) AND (ts2 > 'Mon Jan 01 08:01:01 2001 UTC'::timestamp with time zone)))
   Remote Filter: CASE WHEN ((i1 % 2) = 0) THEN (ts1 < 'Fri Nov 12 00:00:00 2021'::timestamp without time zone) WHEN ((i1 % 2) = 1) THEN (ts2 > 'Sat Jan 01 07:00:00 2022 UTC'::timestamp with time zone) ELSE NULL::boolean END
         Remote Filter: (i1 = 1)
   Remote Filter: (i1 = 1)
         Remote Filter: (tag = 'foo'::text)
   Remote Filter: (v = 'foo'::text)
Enter fullscreen mode Exit fullscreen mode

You see equality, inequality, null test, boolean expressions, CASE WHEN THEN END, and some functions that are safe to push down.

And for my Oracle DBA followers, and some google search keywords, you can see these pushdowns are very similar to Oracle Exadata Smart Scan with predicate offloading. Even if Exadata is a monolithic database machine, and reads full blocks rather than rows and columns, it has the same goal: avoid sending rows though the network that will be filtered out later. The grep "Remote Filter" is my equivalent of select * from V$SQLFN_METADATA where offloadable='YES' 😋

I'm writing this as of preview version 2.13.2 and I recommend to to test it and give feedback 🙏

Top comments (0)