DEV Community

Franck Pachot for AWS Heroes

Posted on • Edited on

When are PostgreSQL prepared statements fully prepared for execution?

The PostgreSQL documentation says:

  • A prepared statement is a server-side object that can be used to optimize performance.
  • When the PREPARE statement is executed, the specified statement is parsed, analyzed, and rewritten.
  • When an EXECUTE command is subsequently issued, the prepared statement is planned and executed.

The basic idea is that, within a session, you have the statement parsed, compiled, and optimized only once, and then you execute directly the execution plan, multiple times. This is a simple idea that all developers know with compiled languages: build the procedural code with parameters, and execute it directly with different values

"When the PREPARE statement is executed, the specified statement is parsed, analyzed, and rewritten. _" is correct, but needs to be explained. Note that this is about the execution of PREPARE, not the execution of the statement (which is EXECUTE). The terms "_parsed, analyzed, and rewritten" and "planned" are carefully chosen.

The optimization part, which builds the execution plan to execute, is the planning phase. This one may be done for multiple executions.

If you want that EXECUTE does only the execution you need to have already a generic execution plan.

Example

I create a simple table:

postgres=# create table demo(id int primary key);
CREATE TABLE
Enter fullscreen mode Exit fullscreen mode

With a simple query the planning time takes longer than the execution time:

postgres=# explain(analyze, buffers) select * from demo where id=42;
                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Index Only Scan using demo_pkey on demo  (cost=0.15..8.17 rows=1 width=4) (actual time=0.011..0.011 rows=0 loops=1)
   Index Cond: (id = 42)
   Heap Fetches: 0
   Buffers: shared hit=2
 Planning:
   Buffers: shared hit=24 read=1
 Planning Time: 0.195 ms
 Execution Time: 0.177 ms
(8 rows)
Enter fullscreen mode Exit fullscreen mode

You see it in the timing, but the number of buffer hits and reads is what counts to compare the performance. 24 buffers read for finding the best execution plan for an execution that reads only two buffers. It can be worse with complex queries involving many tables and indexes.

If I execute it again, many things are cached, but the planning phase still reads one buffer:

postgres=# explain(analyze, buffers) select * from demo where id=42;
                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Index Only Scan using demo_pkey on demo  (cost=0.15..8.17 rows=1 width=4) (actual time=0.005..0.005 rows=0 loops=1)
   Index Cond: (id = 42)
   Heap Fetches: 0
   Buffers: shared hit=2
 Planning:
   Buffers: shared hit=1
 Planning Time: 0.056 ms
 Execution Time: 0.019 ms
(8 rows)
Enter fullscreen mode Exit fullscreen mode

Prepare

Now as I'll use prepared statements, I'll reconnect each time to compare. Using a prepared statement does the parse, analyze and rewrite in advance, but doesn't improve anything for the planning phase:

postgres=# \c
You are now connected to database "postgres" as user "postgres".

postgres=# prepare c as select * from demo where id=$1;
PREPARE
postgres=# explain (analyze, buffers) execute c(42);
                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Index Only Scan using demo_pkey on demo  (cost=0.15..8.17 rows=1 width=4) (actual time=0.005..0.005 rows=0 loops=1)
   Index Cond: (id = 42)
   Heap Fetches: 0
   Buffers: shared hit=2
 Planning:
   Buffers: shared hit=1
 Planning Time: 0.077 ms
 Execution Time: 0.017 ms
(8 rows)
Enter fullscreen mode Exit fullscreen mode

It is not only for the first EXECUTE after PREPARE, the next is the same:

postgres=# explain (analyze, buffers) execute c(42);
                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Index Only Scan using demo_pkey on demo  (cost=0.15..8.17 rows=1 width=4) (actual time=0.005..0.005 rows=0 loops=1)
   Index Cond: (id = 42)
   Heap Fetches: 0
   Buffers: shared hit=2
 Planning:
   Buffers: shared hit=1
 Planning Time: 0.078 ms
 Execution Time: 0.017 ms
(8 rows)
Enter fullscreen mode Exit fullscreen mode

Only after many executions (twelve in my case) I finally get an EXECUTE that doesn't read any buffer and is faster than the execution:

...

postgres=# explain (analyze, buffers) execute c(42);
                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Index Only Scan using demo_pkey on demo  (cost=0.15..8.17 rows=1 width=4) (actual time=0.006..0.007 rows=0 loops=1)
   Index Cond: (id = $1)
   Heap Fetches: 0
   Buffers: shared hit=2
 Planning:
   Buffers: shared hit=1
 Planning Time: 0.078 ms
 Execution Time: 0.020 ms
(8 rows)

postgres=# explain (analyze, buffers) execute c(42);
                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Index Only Scan using demo_pkey on demo  (cost=0.15..8.17 rows=1 width=4) (actual time=0.011..0.011 rows=0 loops=1)
   Index Cond: (id = $1)
   Heap Fetches: 0
   Buffers: shared hit=2
 Planning Time: 0.013 ms
 Execution Time: 0.030 ms
(6 rows)
Enter fullscreen mode Exit fullscreen mode

The reason is that it took a while to realize that a generic plan, which can be reused for different values, is still good. You see the parameter ($1) in the plan. For complex queries, it may be better to compile a custom plan optimized for the values but in OLTP you probably prefer the predictability of generic plans.

Generic plan

I had to wait twelve executions because, by default, the first five are custom plan and then the average execution time is compared to the planning time to decide for a generic plan. Since PostgreSQL 12 you can opt immediately for a generic plan so that there's no plannning effort as soon as the second execution:

postgres=# \c
You are now connected to database "postgres" as user "postgres".

postgres=# prepare c as select * from demo where id=$1;
PREPARE
postgres=# set plan_cache_mode = force_generic_plan;
SET
postgres=# explain (analyze, buffers) execute c(42);
                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Index Only Scan using demo_pkey on demo  (cost=0.15..8.17 rows=1 width=4) (actual time=0.004..0.004 rows=0 loops=1)
   Index Cond: (id = $1)
   Heap Fetches: 0
   Buffers: shared hit=2
 Planning:
   Buffers: shared hit=56
 Planning Time: 0.269 ms
 Execution Time: 0.017 ms
(8 rows)

postgres=# explain (analyze, buffers) execute c(42);
                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Index Only Scan using demo_pkey on demo  (cost=0.15..8.17 rows=1 width=4) (actual time=0.011..0.011 rows=0 loops=1)
   Index Cond: (id = $1)
   Heap Fetches: 0
   Buffers: shared hit=2
 Planning Time: 0.013 ms
 Execution Time: 0.029 ms
(6 rows)

Enter fullscreen mode Exit fullscreen mode

If you EXPLAIN without ANALYZE it does the planning and then, if it is a generic plan, the first EXECUTE doesn't have to do the planning again:

postgres=# \c
You are now connected to database "postgres" as user "postgres".

postgres=# prepare c as select * from demo where id=$1;
PREPARE
postgres=# set plan_cache_mode = force_generic_plan;
SET
postgres=# explain execute c(null);
                                QUERY PLAN
---------------------------------------------------------------------------
 Index Only Scan using demo_pkey on demo  (cost=0.15..8.17 rows=1 width=4)
   Index Cond: (id = $1)
(2 rows)

postgres=# explain (analyze, buffers) execute c(42);
                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Index Only Scan using demo_pkey on demo  (cost=0.15..8.17 rows=1 width=4) (actual time=0.013..0.013 rows=0 loops=1)
   Index Cond: (id = $1)
   Heap Fetches: 0
   Buffers: shared hit=2
 Planning Time: 0.013 ms
 Execution Time: 0.030 ms
(6 rows)
Enter fullscreen mode Exit fullscreen mode

In summary if you want that the EXECUTE does only the execution, you can do the following to get the execution plan for your prepared statement:

  • set plan_cache_mode to force_generic_plan
  • PREPARE
  • EXPLAIN

With an application that has a small set of statements, this can be done during the initialization phase of the connection pool so that when a connection is grabbed, all plans are there. Of course, there are memory considerations with this.

Top comments (0)