Here is a very short demo to answer this question: when updating a row, is it faster to have only the primary key in the where clause, or all the column values? From what I know about database storage, I don't see how it could be faster with the additional columns, except maybe on heap tables where a secondary index may have a better clustering factor than the primary key.
But I've heard this claim made about YugabyteDB, and it is wrong: a row is found in the LSM-Tree by its primary key, and verifying the other column values is additional work. Still, it is always good to back this up with an example.
I used my ybwr script to gather storage statistics in a lab:
-- load YBWR to gather statistics
\! curl -s https://raw.githubusercontent.com/FranckPachot/ybdemo/main/docker/yb-lab/client/ybwr.sql | grep -v '\watch' > ybwr.sql
\i ybwr.sql
I create a table with many columns, and secondary indexes on some of them:
-- create DEMO table
drop table if exists demo;
create table demo ( id bigint primary key
, c0 int , c1 int , c2 int , c3 int , c4 int , c5 int , c6 int , c7 int , c8 int , c9 int
);
insert into demo select generate_series(1,1000),0,0,0,0,0,0,0,0,0,0;
create index democ0 on demo(c0);
create index democ1 on demo(c1);
create index democ2 on demo(c2);
create index democ3 on demo(c3);
Here is an update with a predicate on the primary key:
-- update with the primary key in where clause
execute snap_reset;
update demo set c0=0 where id=42;
execute snap_table;
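Before looking at the storage metrics, a quick sanity check: the execution plan should show a single-row, primary-key access for this statement (I omit the output here):

```sql
-- verify that the where clause maps to a primary-key lookup
explain (costs off) update demo set c0=0 where id=42;
```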
I run this on a database where packed rows are disabled, so each column value is a separate sub-document in DocDB. Here are the reads on the table and on the index: 8 seek() calls into the LSM-Tree (5 on the table tablet, 3 on the index) and 20 next() calls in total.
yugabyte=# execute snap_reset;
 ybwr metrics
--------------
(0 rows)
yugabyte=# update demo set c0=0 where id=42;
UPDATE 1
yugabyte=# execute snap_table;
rocksdb_seek | rocksdb_next | rocksdb_insert | dbname / relname / tserver / tabletid / leader
--------------+--------------+----------------+-----------------------------------------------------------
| | 1 | yugabyte demo hash_split: [0xAAAA, 0xFFFF] 10.0.0.142
| | 1 | yugabyte demo hash_split: [0xAAAA, 0xFFFF] 10.0.0.143
5 | 17 | 1 | yugabyte demo hash_split: [0xAAAA, 0xFFFF] L 10.0.0.141
| | 2 | yugabyte democ0 hash_split: [0xAAAA, 0xFFFF] 10.0.0.141
| | 2 | yugabyte democ0 hash_split: [0xAAAA, 0xFFFF] 10.0.0.142
3 | 3 | 2 | yugabyte democ0 hash_split: [0xAAAA, 0xFFFF] L 10.0.0.143
(6 rows)
Here is the same with additional predicates on all columns:
-- update with additional columns
update demo set c0=0 where id=42 and c0=0 and c1=0 and c2=0 and c3=0 and c4=0 and c5=0 and c6=0 and c7=0 and c8=0 and c9=0;
execute snap_table;
The result shows that it is a bit more expensive, with 11 seek() and 29 next() in total (6 and 20 on the table, 5 and 9 on the index):
yugabyte=# -- update with additional columns
yugabyte=# update demo set c0=0 where id=42 and c0=0 and c1=0 and c2=0 and c3=0 and c4=0 and c5=0 and c6=0 and c7=0 and c8=0 and c9=0;
UPDATE 1
yugabyte=# execute snap_table;
rocksdb_seek | rocksdb_next | rocksdb_insert | dbname / relname / tserver / tabletid / leader
--------------+--------------+----------------+-----------------------------------------------------------
| | 1 | yugabyte demo hash_split: [0xAAAA, 0xFFFF] 10.0.0.142
| | 1 | yugabyte demo hash_split: [0xAAAA, 0xFFFF] 10.0.0.143
6 | 20 | 1 | yugabyte demo hash_split: [0xAAAA, 0xFFFF] L 10.0.0.141
| | 2 | yugabyte democ0 hash_split: [0xAAAA, 0xFFFF] 10.0.0.141
| | 2 | yugabyte democ0 hash_split: [0xAAAA, 0xFFFF] 10.0.0.142
5 | 9 | 2 | yugabyte democ0 hash_split: [0xAAAA, 0xFFFF] L 10.0.0.143
(6 rows)
With packed rows enabled (--ysql_enable_packed_row=true), where each row is stored as a single document, the difference is smaller. But anyway, when you have the key, you don't need anything else to find the row. However, you might have other reasons to add the columns, like verifying that the row didn't change since your last read, to implement some kind of cross-transaction optimistic locking, and the results above show that it is not very expensive to do so.
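As a sketch of that optimistic-locking pattern on this demo table (the values in the where clause are the ones the application remembered from its earlier read, all zeros here):

```sql
-- the update succeeds only if the row still holds the values we read earlier
update demo set c0=1
 where id=42
   and c0=0 and c1=0 and c2=0 and c3=0 and c4=0
   and c5=0 and c6=0 and c7=0 and c8=0 and c9=0;
-- if another transaction modified the row in between, this reports
-- UPDATE 0 and the application can re-read and retry
```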