Size matters...for arrays

#yugabytedb #postgres #plpgsql

A database can perform at best if it's processing has the lowest possible latency for execution of statements. The ultimate lowest latency is achieved when such processing is performed inside the database itself. If you wonder if this is possible: Yes; PostgreSQL and therefore YugabyteDB YSQL has PLpgSQL, the procedural extension of the SQL language.

This blogpost is about an Operating System memory allocation anomaly, which at least happens with using arrays in PLpgSQL, but might happen in other places too.

The test in PLpgSQL

PLpgSQL is also very helpful to create small constructions for testing certain aspects of the PostgreSQL database engine. One of the tests that I performed was allocating memory using an array. This is very easy, here is what I did:

set my.count to 409600;
set my.size to 1020;
do 
$$
  declare
    array text[];
    counter int:= current_setting('my.count',true);
    size int:= current_setting('my.size',true);
  begin
    raise info 'Pid: %', pg_backend_pid();
    raise info 'Array element size: %, count: %', size, counter;
    for count in 1..counter loop
      array[count]:=repeat('x',size);
    end loop;
    raise info 'done!';
    perform pg_sleep(60);
  end 
$$;

The set commands allow dynamic changing of some properties of execution of the anonymous PLpgSQL block. This block does nothing more than filling an array named 'array', print 'done!' when it has performed filling the array, and then sleep for 60 seconds. The reason for the sleep is to keep the array allocated to allow investigating the array allocation.

The array element size

The insertion of the each array element is 1020 characters, for which the actual size is:

yugabyte=# select pg_size_pretty(pg_column_size(array[repeat('x',1020)])::numeric);
 pg_size_pretty
----------------
 1048 bytes

This shows there is some overhead in using an array, because without the array, the size is:

yugabyte=# select pg_size_pretty(pg_column_size(repeat('x',1020))::numeric);
 pg_size_pretty
----------------
 1024 bytes

This means that by executing (my.count) 409600 loops, the array should allocate: 409600 * 1048 = 429,260,800 bytes = 410MB.

PostgreSQL malloc

Allocations in PostgreSQL and therefore in YSQL are done in the PostgreSQL code using the palloc() function, which is an abstraction layer in PostreSQL for allocating memory. In this way, memory allocation can reasonably easy be done using different memory allocators. In PostgreSQL, the memory allocator used is called 'ptmalloc', and in YSQL we use a memory allocator called 'tcmalloc'.

Anonymous memory allocation

Any memory allocation that a process does, and in this case PostgreSQL, needs to take operating system memory for storing data. This memory (allocation) has no file attached to it. Such an allocation is called 'Anonymous'.

The result of the allocation can be seen on the operating system level in linux in multiple places. Any allocation changes the total memory overview in /proc/meminfo, and it changes the process specific overviews in /proc/PID/maps, /proc/PID/smaps, plus others. An overview of the memory categories that are seen in smaps can be seen in smaps_rollup, which is available starting from EL8.

Procedure memory allocation

If the anonymous procedure prints 'done!', on the same PostgreSQL/YSQL host in a shell, execute:

sudo grep -E '([RP]ss|Anonymous|Swap)' /proc/7998/smaps_rollup

(replace PID with the pid printed by the anonymous procedure)

This command prints a number of values 'smaps_rollup'.

In the above case this prints the following size allocation categories:

$ sudo grep -E '([RP]ss|Anonymous|Swap)' /proc/5867/smaps_rollup
Rss:              478132 kB
Pss:              453973 kB
Anonymous:        447544 kB
Swap:                  0 kB
SwapPss:               0 kB

The Rss, resident set size, is 478132 kB. This is memory the process has mapped into its address space.
The Pss, proportional set size, is 453973 kB. This is the RSS where for each shared page the page size is divided with the number of processes it is shared with. Any anonymous allocation a backend performs is unique, and therefore will add equally to Rss and Pss.
The Anonymous size is where the array allocation is accounted in. The Anonymous size is larger than the calculated array size.
Swap: I added Swap, which shows the memory that is swapped for this process, just to be able to see if memory was allocated and then paged out into swap. The reason is this decreases the Rss/Pss/Anonymous sizes, and can make investigations very confusing if 'swapping' happens.

To get an idea: for YSQL, the same overview for a backend that has just started looks like this:

$ sudo grep -E '([RP]ss|Anonymous|Swap)' /proc/PID/smaps_rollup
Rss:               38772 kB
Pss:               14627 kB
Anonymous:         11740 kB
Swap:                  0 kB
SwapPss:               0 kB

This gives a nice insight into how the YSQL backend that just was forked by the postmaster shares a lot of its allocations with its parent via the difference in size between Rss and Pss.

For this investigation, it shows that when running the PLpgSQL anonymous procedure, there are more allocations done that are accounted in 'Anonymous':
447,544 (array allocation) - 11,740 (no work) = 435,804 kB.
435,804 - 419,200 (calculated array size) = 16,604 kB.

This is normal, lots of processing needs memory to be able to run the process, and run the application/executable in the process.

So the conclusion so far is that everything looks as suspected.

A slight change: the anomaly

Now perform the following change:

yugabyte=# set my.size to 1024;
SET

And run the anonymous PLpgSQL block again, and once it reports the allocation is done, look at the operating system memory allocation:

$ sudo grep -E '([RP]ss|Anonymous|Swap)' /proc/6007/smaps_rollup
Rss:              887272 kB
Pss:              865977 kB
Anonymous:        856492 kB
Swap:                  0 kB
SwapPss:               0 kB

Wow!
By changing the allocation size from 1020 to 1024, the operating system level allocation of memory nearly doubled!

Is the PostgreSQL level allocation changed that much?

yugabyte=# select pg_size_pretty(pg_column_size(array[repeat('x',1024)])::numeric);
 pg_size_pretty
----------------
 1052 bytes

Nope: the size of an array element gone from 1048 to 1052 bytes.

Versions

This exercise is done on Yugabyte version 2.17.0.0, which uses the PostgreSQL 11 codebase. When I test this on native PostgreSQL 11, I get the same result.

I should mention that if I repeat the case, the outcome is not always consistent, but if you perform this in a new session, it is as far as I can see.

PostgreSQL 13

PostgreSQL 13: if I repeat this case on PostgreSQL version 13, a size of 1024 does give me an Anonymous allocation of 454320 kB, which is an appropriate size. HOWEVER, if I increase the size of the array element to 1032, the Anonymous allocation jumps up to 863988 kB again.