Here is a very short demo to answer this question: when updating a row, is it faster to have only the primary key in the where clause, or all the column values? From what I know about database storage, I don't see how it could be faster with the additional columns, except maybe on heap tables where a secondary index may have a better clustering factor than the primary key.
But I've heard this claim made about YugabyteDB, and it is wrong: a row is found in the LSM-Tree by its primary key, and verifying the other column values is additional work. Still, it is always good to back this up with an example.
I used my ybwr script to gather storage statistics in a lab:
-- load YBWR to gather statistics
\! curl -s https://raw.githubusercontent.com/FranckPachot/ybdemo/main/docker/yb-lab/client/ybwr.sql | grep -v '\watch' > ybwr.sql
\i ybwr.sql
I create a table with many columns, and secondary indexes on some of them:
-- create DEMO table
drop table if exists demo;
create table demo ( id bigint primary key
, c0 int , c1 int , c2 int , c3 int , c4 int , c5 int , c6 int , c7 int , c8 int , c9 int
);
insert into demo select generate_series(1,1000),0,0,0,0,0,0,0,0,0,0;
create index democ0 on demo(c0);
create index democ1 on demo(c1);
create index democ2 on demo(c2);
create index democ3 on demo(c3);
Here is an update with a predicate on the primary key:
-- update with the primary key in where clause
execute snap_reset;
update demo set c0=0 where id=42;
execute snap_table;
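Before looking at the storage metrics, a quick sanity check: the execution plan should show a single-row, primary-key access for this statement (I omit the output here):

```sql
-- verify that the where clause maps to a primary-key lookup
explain (costs off) update demo set c0=0 where id=42;
```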
I run this on a database where packed rows are disabled, so each column value is a separate sub-document in DocDB. Here are the reads on the table and on the index: 8 seek() calls into the LSM-Tree (5 on the table tablet, 3 on the index) and 20 next() calls in total.
yugabyte=# execute snap_reset;
 ybwr metrics
--------------
(0 rows)
yugabyte=# update demo set c0=0 where id=42;
UPDATE 1
yugabyte=# execute snap_table;
rocksdb_seek | rocksdb_next | rocksdb_insert | dbname / relname / tserver / tabletid / leader
--------------+--------------+----------------+-----------------------------------------------------------
| | 1 | yugabyte demo hash_split: [0xAAAA, 0xFFFF] 10.0.0.142
| | 1 | yugabyte demo hash_split: [0xAAAA, 0xFFFF] 10.0.0.143
5 | 17 | 1 | yugabyte demo hash_split: [0xAAAA, 0xFFFF] L 10.0.0.141
| | 2 | yugabyte democ0 hash_split: [0xAAAA, 0xFFFF] 10.0.0.141
| | 2 | yugabyte democ0 hash_split: [0xAAAA, 0xFFFF] 10.0.0.142
3 | 3 | 2 | yugabyte democ0 hash_split: [0xAAAA, 0xFFFF] L 10.0.0.143
(6 rows)
Here is the same with additional predicates on all columns:
-- update with additional columns
update demo set c0=0 where id=42 and c0=0 and c1=0 and c2=0 and c3=0 and c4=0 and c5=0 and c6=0 and c7=0 and c8=0 and c9=0;
execute snap_table;
The result shows that it is a bit more expensive, with 11 seek() and 29 next() in total (6 and 20 on the table, 5 and 9 on the index):
yugabyte=# -- update with additional columns
yugabyte=# update demo set c0=0 where id=42 and c0=0 and c1=0 and c2=0 and c3=0 and c4=0 and c5=0 and c6=0 and c7=0 and c8=0 and c9=0;
UPDATE 1
yugabyte=# execute snap_table;
rocksdb_seek | rocksdb_next | rocksdb_insert | dbname / relname / tserver / tabletid / leader
--------------+--------------+----------------+-----------------------------------------------------------
| | 1 | yugabyte demo hash_split: [0xAAAA, 0xFFFF] 10.0.0.142
| | 1 | yugabyte demo hash_split: [0xAAAA, 0xFFFF] 10.0.0.143
6 | 20 | 1 | yugabyte demo hash_split: [0xAAAA, 0xFFFF] L 10.0.0.141
| | 2 | yugabyte democ0 hash_split: [0xAAAA, 0xFFFF] 10.0.0.141
| | 2 | yugabyte democ0 hash_split: [0xAAAA, 0xFFFF] 10.0.0.142
5 | 9 | 2 | yugabyte democ0 hash_split: [0xAAAA, 0xFFFF] L 10.0.0.143
(6 rows)
With packed rows enabled (--ysql_enable_packed_row=true), where each row is stored as a single document, the difference is smaller. But anyway, when you have the key, you don't need anything else to find the row. However, you might have other reasons to add the columns, like verifying that the row didn't change since your last read, to implement some kind of cross-transaction optimistic locking, and the results above show that it is not very expensive to do so.
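As a sketch of that optimistic-locking pattern on this demo table (the values in the where clause are the ones the application remembered from its earlier read, all zeros here):

```sql
-- the update succeeds only if the row still holds the values we read earlier
update demo set c0=1
 where id=42
   and c0=0 and c1=0 and c2=0 and c3=0 and c4=0
   and c5=0 and c6=0 and c7=0 and c8=0 and c9=0;
-- if another transaction modified the row in between, this reports
-- UPDATE 0 and the application can re-read and retry
```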