Columns that are joined together, or linked by referential integrity, should have the same data type in both tables. Even when an implicit cast makes the query work and performance appears good, some corner cases can cause issues. For instance, YugabyteDB buffers reads and writes to maintain high performance and overcome the inherent latency of distributed transactions, but it must guarantee that buffering does not change the query's behavior. Special care is needed when a function, expression, or type cast is involved. Here are two examples where buffering does not occur in the current version (YugabyteDB 2.20) because of a datatype inconsistency.
I define two tables, one with bigint and one with integer:
yugabyte=# create table t1 (k bigint primary key);
CREATE TABLE
yugabyte=# create table t2 (k integer primary key);
CREATE TABLE
yugabyte=# insert into t1 select generate_series(1,1000);
INSERT 0 1000
yugabyte=# insert into t2 select generate_series(1,1000);
INSERT 0 1000
Join between different datatypes
I join them and a Batched Nested Loop is used:
yugabyte=# set yb_bnl_batch_size=1024;
SET
yugabyte=# explain (analyze, dist, costs off)
select * from t2 join t1 using(k);
QUERY PLAN
---------------------------------------------------------------------------------------------------------
YB Batched Nested Loop Join (actual time=5.848..6.812 rows=1000 loops=1)
Join Filter: (t2.k = t1.k)
-> Seq Scan on t2 (actual time=0.962..1.721 rows=1000 loops=1)
Storage Table Read Requests: 3
Storage Table Read Execution Time: 1.472 ms
-> Index Scan using t1_pkey on t1 (actual time=3.514..3.767 rows=1000 loops=1)
Index Cond: (k = ANY (ARRAY[(t2.k)::bigint, ($1)::bigint, ($2)::bigint, ..., ($1023)::bigint]))
Storage Table Read Requests: 1
Storage Table Read Execution Time: 2.467 ms
Planning Time: 0.961 ms
Execution Time: 7.721 ms
Storage Read Requests: 4
Storage Read Execution Time: 3.939 ms
Storage Write Requests: 0.000
Catalog Read Requests: 0
Catalog Write Requests: 0.000
Storage Flush Requests: 0
Storage Execution Time: 3.939 ms
Peak Memory Usage: 4970 kB
(19 rows)
The tables are distributed across 3 servers, but the latency doesn't matter because all 1000 rows have been fetched with 4 Read Requests, in 7 milliseconds.
However, you can see that the outer values t2.k, which are integer, have been converted to the larger type bigint to push down the join condition. This is safe because casting to a larger type is lossless.
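The asymmetry between the two directions is easy to verify directly in psql: widening an integer to bigint always succeeds, while narrowing a bigint to integer can overflow (the integer limit is 2147483647):

```sql
-- widening integer -> bigint is lossless
select 2147483647::integer::bigint;   -- returns 2147483647
-- narrowing bigint -> integer can overflow
select 2147483648::bigint::integer;   -- ERROR:  integer out of range
```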
The other join direction would risk an overflow when casting down to the smaller type, and to avoid it, batching is disabled:
yugabyte=# explain (analyze, dist, costs off)
/*+ leading((t1 t2))*/
select * from t2 join t1 using(k)
;
QUERY PLAN
-----------------------------------------------------------------------------------
Nested Loop (actual time=1.441..359.137 rows=1000 loops=1)
-> Seq Scan on t1 (actual time=0.812..1.574 rows=1000 loops=1)
Storage Table Read Requests: 3
Storage Table Read Execution Time: 0.707 ms
-> Index Scan using t2_pkey on t2 (actual time=0.347..0.347 rows=1 loops=1000)
Index Cond: (k = t1.k)
Storage Table Read Requests: 1
Storage Table Read Execution Time: 0.316 ms
Planning Time: 0.168 ms
Execution Time: 359.681 ms
Storage Read Requests: 1003
Storage Read Execution Time: 316.600 ms
Storage Write Requests: 0.000
Catalog Read Requests: 0
Catalog Write Requests: 0.000
Storage Flush Requests: 0
Storage Execution Time: 316.600 ms
Peak Memory Usage: 67 kB
(18 rows)
Without Batching, there are 1000 loops, which add 1000 Read Requests and take in total 316 milliseconds.
It is possible to apply a Batched Nested Loop with additional code that checks the range of the outer bigint and compares it only when it fits in the range of integer. This has been added in YugabyteDB 2.21 with #20715 (YSQL: Allow BNL on joins over different integer types) to help migrations from PostgreSQL with such datatype inconsistencies.
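Conceptually, the optimization guards the pushed-down batch with a range check, so that only outer bigint values that can possibly match an integer key are compared. A sketch of the equivalent SQL rewrite (an illustration of the idea, not the engine's actual implementation):

```sql
-- outer batch of bigint values; 5000000000 cannot match any integer key
-- a naive push-down would need a risky bigint -> integer cast;
-- filtering the batch to the integer range first makes the cast safe
select * from t2
 where k = any (
   select x::integer
   from unnest(array[42::bigint, 5000000000::bigint]) as x
   where x between -2147483648 and 2147483647
 );
```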
I create another table with the same datatype:
yugabyte=# create table t3 (k bigint primary key);
CREATE TABLE
yugabyte=# insert into t3 select generate_series(1,1000);
INSERT 0 1000
yugabyte=# explain (analyze, dist, costs off)
/*+ leading((t1 t3))*/
select * from t3 join t1 using(k)
;
QUERY PLAN
-----------------------------------------------------------------------------------
YB Batched Nested Loop Join (actual time=5.179..6.083 rows=1000 loops=1)
Join Filter: (t3.k = t1.k)
-> Seq Scan on t1 (actual time=0.658..1.496 rows=1000 loops=1)
Storage Table Read Requests: 3
Storage Table Read Execution Time: 1.237 ms
-> Index Scan using t3_pkey on t3 (actual time=3.152..3.417 rows=1000 loops=1)
Index Cond: (k = ANY (ARRAY[t1.k, $1, $2, ..., $1023]))
Storage Table Read Requests: 1
Storage Table Read Execution Time: 2.160 ms
Planning Time: 0.489 ms
Execution Time: 6.475 ms
Storage Read Requests: 4
Storage Read Execution Time: 3.397 ms
Storage Write Requests: 0.000
Catalog Read Requests: 0
Catalog Write Requests: 0.000
Storage Flush Requests: 0
Storage Execution Time: 3.397 ms
Peak Memory Usage: 746 kB
(19 rows)
A Batched Nested Loop, with its fast response time, is possible in both join directions.
Foreign Key between different datatypes
I create a table with two bigint columns, one referencing t1 (bigint primary key) and one referencing t2 (integer primary key), so the second foreign key has a datatype mismatch with the primary key it references:
yugabyte=# create table t0
(a bigint references t1, b bigint references t2 )
;
CREATE TABLE
I insert 1000 rows into it, which takes more than one second:
yugabyte=# explain (analyze, dist, costs off)
insert into t0 select t1.k, t1.k from t1
;
QUERY PLAN
-------------------------------------------------------------------
Insert on t0 (actual time=28.196..28.196 rows=0 loops=1)
-> Seq Scan on t1 (actual time=0.949..7.929 rows=1000 loops=1)
Storage Table Read Requests: 3
Storage Table Read Execution Time: 0.384 ms
Storage Table Write Requests: 1000.000
Storage Flush Requests: 2
Storage Flush Execution Time: 11.550 ms
Planning Time: 1.762 ms
Trigger for constraint t0_a_fkey: time=23.323 calls=1000
Trigger for constraint t0_b_fkey: time=1324.301 calls=1000
Execution Time: 1376.368 ms
Storage Read Requests: 2004
Storage Read Execution Time: 1198.803 ms
Storage Write Requests: 1000.000
Catalog Read Requests: 22
Catalog Read Execution Time: 13.464 ms
Catalog Write Requests: 0.000
Storage Flush Requests: 2
Storage Flush Execution Time: 11.550 ms
Storage Execution Time: 1223.817 ms
Peak Memory Usage: 2460 kB
(21 rows)
The verification of the referential integrity constraint t0_b_fkey, which references t2, takes 1.3 seconds for 1000 rows because of 2000 read requests (per row: one to read the value and one to lock it). They were not batched because of the implicit cast between the two datatypes.
I change the datatype of the t2 primary key from integer to bigint:
yugabyte=# alter table t2 rename to t2_old;
ALTER TABLE
yugabyte=# create table t2 (k bigint primary key);
CREATE TABLE
yugabyte=# insert into t2 select * from t2_old;
INSERT 0 1000
yugabyte=# drop table t2_old cascade;
NOTICE: 00000: drop cascades to constraint t0_b_fkey on table t0
LOCATION: reportDependentObjects, dependency.c:1016
DROP TABLE
yugabyte=# alter table t0 add foreign key (b) references t2;
ALTER TABLE
Now, the insert takes 51 milliseconds:
yugabyte=# explain (analyze, dist, costs off)
insert into t0 select t1.k, t1.k from t1;
QUERY PLAN
-------------------------------------------------------------------
Insert on t0 (actual time=28.479..28.479 rows=0 loops=1)
-> Seq Scan on t1 (actual time=0.932..8.794 rows=1000 loops=1)
Storage Table Read Requests: 3
Storage Table Read Execution Time: 1.477 ms
Storage Table Write Requests: 1000.000
Storage Flush Requests: 2
Storage Flush Execution Time: 12.181 ms
Planning Time: 0.045 ms
Trigger for constraint t0_a_fkey: time=21.523 calls=1000
Trigger for constraint t0_b_fkey: time=0.850 calls=1000
Execution Time: 51.012 ms
Storage Read Requests: 5
Storage Read Execution Time: 1.478 ms
Storage Write Requests: 1000.000
Catalog Read Requests: 7
Catalog Read Execution Time: 5.507 ms
Catalog Write Requests: 0.000
Storage Flush Requests: 2
Storage Flush Execution Time: 12.181 ms
Storage Execution Time: 19.165 ms
Peak Memory Usage: 2505 kB
(21 rows)
With the same datatypes, the verification can be batched in 2 Read Requests.
Best practice: compare with the same datatype
Even if YugabyteDB can add some optimizations, the best practice is to make those comparisons safe and simple by using the same datatype. Don't use integer, as one day you will reach its limit and change to bigint, with this risk of inconsistencies. PostgreSQL provides many datatypes, but that's not a reason to use all of them. Put text in text, integers in bigint, datetimes in timestamptz, decimal (human) numbers in numeric, floating-point (machine) numbers in double precision, and your life will be easier.
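As an illustration of this best practice, here is a schema sketch using only those types (the table and column names are made up for the example):

```sql
create table account (
  id         bigint generated always as identity primary key,
  name       text,
  created_at timestamptz not null default now(),
  balance    numeric,          -- exact (human) decimal arithmetic
  cpu_load   double precision  -- approximate (machine) floating point
);
```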
Here is a quick query to check data type mismatch in referential integrity constraints:
/*+ set ( random_page_cost 1e42 ) */ -- https://github.com/yugabyte/yugabyte-db
with con as (
-- get all foreign key constraints columns
select * from
(select oid conoid,contype,conkey,confkey,connamespace,conname con_name,conrelid,confrelid from pg_constraint) as con
cross join unnest(conkey ) with ordinality as conkey_unnest (unnestn, conkeyattnum )
cross join unnest(confkey) with ordinality as confkey_unnest(unnestn, confkeyattnum)
natural join (select oid connamespace, nspname con_schema from pg_namespace) con_namespace
natural join (select oid conrelid, relname con_relname from pg_class) con_class
natural join (select oid confrelid, relname con_frelname from pg_class) con_fclass
natural join (select attrelid conrelid, attnum conkeyattnum, attname con_attname , atttypid con_atttypid from pg_attribute) con_attribute
where contype='f' and conkeyattnum=confkeyattnum
)
-- display those with different data types
select format ('Table %I.%I, constraint %I, column (%I %I) references %I(%I %I)'
, con_schema, con_relname, con_name, con_attname, con_typname, con_frelname, con_fattname, con_ftypname
) as "data type mismatch in referential integrity constraints" from con
natural join (select attrelid confrelid, attnum confkeyattnum, attname con_fattname, atttypid con_fatttypid from pg_attribute) con_fattribute
natural join (select oid con_atttypid, typname con_typname from pg_type) as con_type
natural join (select oid con_fatttypid, typname con_ftypname from pg_type) as con_ftype
where con_atttypid!=con_fatttypid
;
With my example, it returned:
data type mismatch in referential integrity constraints
------------------------------------------------------------------------------
Table public.t0, constraint t0_b_fkey, column (a int8) references t2(k int4)
(1 row)
When a query in YugabyteDB is slower than in PostgreSQL, look at the execution plan with the (analyze, dist)
option and check the number of remote calls for reads (Read Requests) and writes (Flush Requests). As in all databases, row-by-row is slow: SQL operates on row sets, and these must be sent in batches to scale.