DEV Community

Frits Hoogland for YugabyteDB

Posted on

Postgres and the curious case of the cursory cursors

Recently I was asked to look into a case where an identical SQL executed on the same database in two different ways resulted in two quite different latencies: 133ms versus 750ms.

One was acceptable for the client, and the other one wasn't.

How can the same SQL yield two such different latencies? First let me explain the setting: the first latency figure, the fast one, was a plain SQL. The second one was the exact same SQL, however it was inside a PLpgSQL procedure, and using a refcursor. Both returned around 60.000 rows. The SQL statement in both cases did not have any variables, they were completely identical.

For the plain SQL, it's easy to diagnose the execution plan and the time it spent: just put explain analyze in front of the SQL, execute it, and look at the statistics. However, for the SQL executed in PLpgSQL procedure, it is not possible to get the execution plan from inside the PLpgSQL procedure. The general way to deal with this is to use the plain SQL approach, by copying the SQL from the procedure and explain it. And that was the problem: once executed as plain SQL it exposed an acceptable latency, only when executed as PLpgSQL it exposed a different, not acceptable latency.

So, what can you do about it? It took me a few stabs at it before I found the issue.

We had an issue created for this, and a colleague found that if the order by from the statement was removed, it would function correct again.

I took an approach quite native to me, and ran linux perf to record execution stacks with call-graphs (perf record -g -p <PID>) to see what it was doing.

Many of the principal parts of an execution plan can be seen back in the C functions. One way of doing that is to look in the perf output (using perf report) for the function ExecScan, and look at what that function calls. This is the execution plan:

Sort  (cost=14603.24..14764.65 rows=64562 width=20)
  Sort Key: tile_row, tile_col, block_row, block_col
  ->  Bitmap Heap Scan on block_tbl b  (cost=2139.33..9445.26 rows=64562 width=20)
        Recheck Cond: ((world_key = 19301) AND (perimeter_num = 8007) AND (updated_ts = '1655419238118'::bigint))
        Filter: ((updated_user_key = 0) AND (frozen_flag = 0))
        ->  Bitmap Index Scan on block_perimeter  (cost=0.00..2123.18 rows=65108 width=0)
              Index Cond: ((world_key = 19301) AND (perimeter_num = 8007) AND (updated_ts = '1655419238118'::bigint))
Enter fullscreen mode Exit fullscreen mode

And if I allocate ExecScan, and look at it's surrounding calls, it looks like this:

- 27.44% ExecSort
   - 12.93% tuplesort_gettupleslot
      - 9.15% tuplesort_gettuple_common
         - 4.42% mergereadnext
            - 2.84% readtup_heap
                 1.58% __mcount_internal
            - 0.95% getlen
                 0.63% __mcount_internal
              0.63% __mcount_internal
         - 3.47% tuplesort_heap_replace_top
            - 2.84% comparetup_heap
                 1.58% __mcount_internal
           1.26% __mcount_internal
        2.52% __mcount_internal
        0.95% _mcount
   - 6.31% ExecScan
      - 5.36% BitmapHeapNext
         - 3.15% MultiExecBitmapIndexScan
            - btgetbitmap
               - 2.21% _bt_next
                  - _bt_readpage
                     - _bt_checkkeys
                          0.63% FunctionCall2Coll
               - 0.95% tbm_add_tuples
                    0.63% tbm_get_pageentry
           1.26% tbm_iterate
           0.63% heap_hot_search_buffer
      - 0.63% ExecInterpExpr
Enter fullscreen mode Exit fullscreen mode

There we got the sort first (ExecSort) and the scan second (ExecScan), which performs a function called MultiExecBitmapIndexScan. I can see the similarity of the functions and the executionplan.

Okay, now how about the perf report from the procedure? That looks like this:

- 13.55% ExecScan
   - 10.57% IndexNext
      - 9.76% index_getnext
         - 9.49% index_fetch_heap
            - 3.25% ReadBufferExtended
               - ReadBuffer_common
                  - 1.08% BufTableLookup
                       0.54% __mcount_internal
                       0.54% hash_search_with_hash_value
                    0.81% PinBuffer
                    0.54% LWLockAcquire
              1.90% heap_hot_search_buffer
              1.63% heap_page_prune_opt
              0.81% __mcount_internal
            - 0.81% ReleaseAndReadBuffer
              0.54% _mcount
              0.54% LWLockAcquire
      - 0.81% ExecStoreTuple
           0.54% __mcount_internal
   - 1.63% ExecInterpExpr
        0.54% slot_getsomeattrs
Enter fullscreen mode Exit fullscreen mode

This looks different. It's performing IndexNext instead of BitmapHeapNext, so it's definitely using a different execution plan. Also, I was not able to find the ExecSort function. So the SQL execution is not performing sorting.

At this point I could prove there was another execution plan by looking at the low level execution. My suspicion at this point was that the planner might optimise for PLpgSQL, and leave some work (such as maybe the sorting) to PLpgSQL, which might be why it's gone, and another plan was chosen. But PLpgSQL is not a core component of the postgres code, it's a loadable procedural language, which you can validate by looking at the extensions.

What I tried next was creating the SQL as a function, and then use the function as a table inside the procedure. Functions are known for being able to be a 'barrier' for the planner to optimise SQL, and are treated as a 'black box', meaning it's left untouched. And that actually worked! By creating a function that returns a table, I get near the SQL performance.

However: that would mean a lot of work to revert all the SQLs in the procedures to individual functions, and swap the SQL with that function. And of course it doesn't explain why the procedure did influence the SQL optimization.

Then the client came back to me and said that if he used the refcursor for obtaining one or a small number of rows, it was actually pretty quick...but that that wasn't useful to him, because in the way it's used, it needs all the (60.000) rows.

That got me thinking: wait a minute...what if it isn't the procedure that is influencing this, but the cursor that is created? And a cursor can be explained! Explaining a cursor works in the following way:

postgres=# begin;
postgres=# explain declare tt cursor for select b.tile_row, b.tile_col, b.block_row, b.block_col, b.block_value
postgres-#     from block_tbl b
postgres-#    where b.world_key        = 19301
postgres-#     and b.updated_ts       = 1655419238118
postgres-#     and b.updated_user_key = 0
postgres-#     and b.frozen_flag      = 0
postgres-#     and b.perimeter_num    = 8007
postgres-# order by b.tile_row, b.tile_col, b.block_row, b.block_col;
                                                QUERY PLAN
 Index Scan using block_tile on block_tbl b  (cost=0.42..40080.63 rows=64562 width=20)
   Index Cond: ((world_key = 19301) AND (frozen_flag = 0))
   Filter: ((updated_ts = '1655419238118'::bigint) AND (updated_user_key = 0) AND (perimeter_num = 8007))
Enter fullscreen mode Exit fullscreen mode

Bingo! This shows a different execution plan than earlier, which used a sort and bitmap scans.

Okay, this proves another plan is captured based on the SQL used for a cursor. But why is this happening?

It turns out SQL based on cursors is automagically optimised for a percentage of first rows based on the parameter cursor_tuple_fraction, which defaults to 0.1, alias 10%, whilst regular SQL execution is optimised for fetching 100%.

This is visible in the source code in parsenodes.h and planner.c. With cursors, or with the SPI_prepare_cursor function. The CURSOR_OPT_FAST_PLAN flag can be set, which then performs the first row optimization. Sadly, the different optimization goal is not shown, so it can't be seen from the execution plan.


The postgres planner optimises for getting 100% of the rows for execution. This is different for cursors, which are optimised for getting the first 10% of rows, which can lead to different execution plans and different latency if you do not want 10% of the rows as fast as possible, but 100% of the rows as fast as possible.

Top comments (2)

silvadimart8 profile image
Silva Escritor

For those who are a little lost on this subject, a cursor is a database object that allows data to be accessed one row at a time, as opposed to all at once. Cursors can be used to retrieve data from a database one row at a time, or to perform an operation on a database one row at a time.

Postgres cursors are used to fetch data from a database one row at a time. Cursors are created by issuing the DECLARE CURSOR statement. After a cursor is declared, it can be opened and used to fetch data from the database. The fetch operation retrieves rows from the database one at a time and stores them in the cursor. The data in the cursor can then be accessed using the FETCH statement.

To manipulate PostgreSQL using Python, there is a useful library called Psycopg2.

Postgres cursors can be used to perform operations on a database. To do this, the cursor is opened and the UPDATE statement is used to update the row. The row is then fetched from the database and the changes are made to the row in the cursor. The changes are then saved back to the database using the UPDATE statement.

Cursors can also be used to delete rows from a database. To do this, the cursor is opened and the DELETE statement is used to delete the row. The row is then fetched from the database and the changes are made to the row in the cursor. The changes are then saved back to the database using the DELETE statement.

fritshooglandyugabyte profile image
Frits Hoogland

Thank you for the addition Silva.
I tried to describe the investigation and findings in a logical way to take the reader by the hand for that, but indeed skipped the explanation about the starting point, which you correctly did!