YugabyteDB bulk inserts with function: faster with SQL compared to PL/pgSQL

#yugabytedb #postgres #distributed #sql

When you need to load a large amount of data, and transform it, it is common to call a function that encapsulates the necessary logic in a separate module. This is a reasonable approach to take.

Let's take an example. If you have a sequence and need to insert thousands of rows into a table, you can fetch from the sequence directly.

yugabyte=# create table demo (a bigint primary key, b bigint);
CREATE TABLE
yugabyte=# create sequence s;
CREATE SEQUENCE
yugabyte=# explain (analyze, dist, costs off)
            insert into demo select nextval('s')   , 0
            from generate_series(1,4242) ;

                                         QUERY PLAN
--------------------------------------------------------------------------------------------
 Insert on demo (actual time=96.141..96.141 rows=0 loops=1)
   ->  Subquery Scan on "*SELECT*" (actual time=4.552..72.723 rows=4242 loops=1)
         ->  Function Scan on generate_series (actual time=4.549..71.282 rows=4242 loops=1)
 Planning Time: 0.045 ms
 Execution Time: 106.457 ms
 Storage Read Requests: 0
 Storage Write Requests: 4242.000
 Catalog Read Requests: 12
 Catalog Read Execution Time: 7.948 ms
 Catalog Write Requests: 0.000
 Storage Flush Requests: 2
 Storage Flush Execution Time: 8.491 ms
 Storage Execution Time: 16.439 ms
 Peak Memory Usage: 321 kB
(14 rows)

Each Write Request is buffered and flushed in batches (Storage Flush Requests). This makes the process faster as the network latency with the Raft leader and a follower occurs only at flush.

PL/pgSQL function called between writes

I want to put the logic in a function, the response time is much slower:

yugabyte=# create function f_pls() returns bigint as $$
 begin return nextval('s'); end ;
 $$ language plpgsql;

CREATE FUNCTION

yugabyte=# explain (analyze, dist, costs off)
            insert into demo select f_pls()        , 0
            from generate_series(1,4242) ;

                                          QUERY PLAN
----------------------------------------------------------------------------------------------
 Insert on demo (actual time=4442.535..4442.536 rows=0 loops=1)
   ->  Subquery Scan on "*SELECT*" (actual time=1.200..4329.721 rows=4242 loops=1)
         ->  Function Scan on generate_series (actual time=1.197..4322.613 rows=4242 loops=1)
 Planning Time: 0.045 ms
 Execution Time: 4443.567 ms
 Storage Read Requests: 0
 Storage Write Requests: 4242.000
 Catalog Read Requests: 4
 Catalog Read Execution Time: 2.762 ms
 Catalog Write Requests: 0.000
 Storage Flush Requests: 4242
 Storage Flush Execution Time: 4069.624 ms
 Storage Execution Time: 4072.386 ms
 Peak Memory Usage: 333 kB
(14 rows)

I see one Storage Flush Requests per row, which increases the response time from 100 milliseconds to 4 seconds.

When using a PL/pgSQL procedure, buffering the writes can potentially break the procedural logic. This is because the function may perform additional tasks such as counting or logging its calls, and if we buffer 100 writes and the 50th write fails, the function would have counted or logged 100 calls, which is not the expected behavior. To avoid this issue, YugabyteDB flushes the buffered writes on each call.

PL/pgSQL function called before any write

When I know the function is called for each row, I can use an ORDER BY in my select so that it is called for all rows before writing:

yugabyte=# explain (analyze, dist, costs off)
 insert into demo select nextval('s')   , 0 from generate_series(1,4242) order by 1;
                                            QUERY PLAN
--------------------------------------------------------------------------------------------------
 Insert on demo (actual time=101.220..101.220 rows=0 loops=1)
   ->  Subquery Scan on "*SELECT*" (actual time=50.340..53.468 rows=4242 loops=1)
         ->  Sort (actual time=50.337..50.929 rows=4242 loops=1)
               Sort Key: (nextval('s'::regclass))
               Sort Method: quicksort  Memory: 391kB
               ->  Function Scan on generate_series (actual time=1.080..47.837 rows=4242 loops=1)
 Planning Time: 0.386 ms
 Execution Time: 123.067 ms
 Storage Read Requests: 0
 Storage Write Requests: 4242.000
 Catalog Read Requests: 0
 Catalog Write Requests: 0.000
 Storage Flush Requests: 2
 Storage Flush Execution Time: 19.257 ms
 Storage Execution Time: 19.257 ms
 Peak Memory Usage: 775 kB
(16 rows)

When the function was called, no write operations had been applied. When writes occurred, the function call no longer happened. Then, the flush behavior is similar to SQL, with batches of writes.

SQL function with write buffering

If possible, it would be better to use an SQL function instead of PL/pgSQL:

yugabyte=# create function f_sql() returns bigint as $$
 select nextval('s');             
$$ language sql;

CREATE FUNCTION

yugabyte=# explain (analyze, dist, costs off)
            insert into demo select f_sql()        , 0 
            from generate_series(1,4242) ;

                                         QUERY PLAN
--------------------------------------------------------------------------------------------
 Insert on demo (actual time=106.005..106.005 rows=0 loops=1)
   ->  Subquery Scan on "*SELECT*" (actual time=0.262..80.522 rows=4242 loops=1)
         ->  Function Scan on generate_series (actual time=0.260..79.220 rows=4242 loops=1)
 Planning Time: 0.951 ms
 Execution Time: 117.021 ms
 Storage Read Requests: 0
 Storage Write Requests: 4242.000
 Catalog Read Requests: 4
 Catalog Read Execution Time: 2.279 ms
 Catalog Write Requests: 0.000
 Storage Flush Requests: 2
 Storage Flush Execution Time: 9.535 ms
 Storage Execution Time: 11.813 ms
 Peak Memory Usage: 320 kB
(14 rows)

Buffering occurred, with batch flushes, because there was no procedural logic involved.

To summarize

YugabyteDB can scale out with high performance by buffering operations that need to wait for a network call. However, this is not possible when writes are mixed with procedural logic, such as in PL/pgSQL. To ensure scalability, it is better to use SQL functions instead of plpgsql ones. Alternatively, you can declare a SQL statement that separates the read and write phases.