
How to identify slow queries in PostgreSQL

pythonmeister ・ 3 min read

OK, you've done your job, and done it properly:

  • normalization/denormalization has produced a well-balanced schema
  • constraints are applied carefully
  • you thought about indexes and chose the right columns based on your filter and ordering criteria
  • your PL/pgSQL stored procedures and trigger code follow best practices
  • you avoided obvious traps like sharing sequences across several tables
  • you defined partitions that yield almost equally sized buckets
  • the server is properly sized and the I/O subsystem is fast
  • the PostgreSQL configuration has been tuned thoughtfully

And yet, some queries are slow.

Now you throw print statements and logging into your application code and gather runtime information to see where performance falls short, and sooner or later you identify the database as a possible source. But which query is the problem, and how do you find it?

First, you need to ask yourself: what does slow mean in your context? What is the maximum response time you can tolerate in your application or service? In a web app, a response time of 100 ms may be too long, whereas you might be very happy when the large financial quarterly report runs in under 8 hours.

During development and testing, the problem might not have shown up. This is very typical, because all too often the testing and production environments are not comparable (which is a big, big mistake and should be addressed properly in a sane DevOps setup, by the way).

log_min_duration_statement
Once you know which maximum query runtime is acceptable, you can instruct PostgreSQL to log statements that take longer by adding this to postgresql.conf:

log_min_duration_statement=5000

This logs all statements that take longer than 5 seconds (the value is in milliseconds).
To activate the change, tell PostgreSQL to reload its config file from a SQL session:

SELECT pg_reload_conf();

You can test this by issuing

SELECT pg_sleep(7.5);

and then watch the logfile:

2019-03-07 18:44:12.727 CET [31308] postgres@postgres LOG:  duration: 7515.906 ms  statement: SELECT pg_sleep(7.5);
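You don't have to edit postgresql.conf by hand for this setting. A couple of alternatives, shown as a sketch (the 5000 ms threshold is just the example value from above):

-- Persist the setting from SQL (PostgreSQL 9.4+); ALTER SYSTEM writes it
-- to postgresql.auto.conf, which overrides postgresql.conf:
ALTER SYSTEM SET log_min_duration_statement = 5000;
SELECT pg_reload_conf();

-- Or, as a superuser, enable it for the current session only:
SET log_min_duration_statement = 5000;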

pg_stat_statements
This is a PostgreSQL extension which a) needs to be loaded when the server process starts and b) has to be activated as an extension.
For the first part, add this line to postgresql.conf:

shared_preload_libraries = 'pg_stat_statements'
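Note that shared_preload_libraries is only read at server start, so a full restart is required here; pg_reload_conf() is not enough. While you are in postgresql.conf you can also tune the extension's own settings — the values below are illustrative, not recommendations:

# optional pg_stat_statements settings:
pg_stat_statements.max = 10000   # number of distinct statements tracked (default is 5000)
pg_stat_statements.track = all   # also track statements inside functions (default is top)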

The extension then needs to be activated in every database where you want to use it. Connect to the database and issue:

CREATE EXTENSION pg_stat_statements;
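To verify that the extension is in place, you can query the pg_extension catalog:

-- should return one row if the extension is installed in this database:
SELECT extname, extversion FROM pg_extension WHERE extname = 'pg_stat_statements';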

This gives you a very comprehensive set of statistics for every query that is executed. Below is the schema definition of the view to query (note that blk_read_time and blk_write_time are only populated when track_io_timing is enabled):

                    View "public.pg_stat_statements"
       Column        |       Type       | Collation | Nullable | Default
---------------------+------------------+-----------+----------+---------
 userid              | oid              |           |          |
 dbid                | oid              |           |          |
 queryid             | bigint           |           |          |
 query               | text             |           |          |
 calls               | bigint           |           |          |
 total_time          | double precision |           |          |
 min_time            | double precision |           |          |
 max_time            | double precision |           |          |
 mean_time           | double precision |           |          |
 stddev_time         | double precision |           |          |
 rows                | bigint           |           |          |
 shared_blks_hit     | bigint           |           |          |
 shared_blks_read    | bigint           |           |          |
 shared_blks_dirtied | bigint           |           |          |
 shared_blks_written | bigint           |           |          |
 local_blks_hit      | bigint           |           |          |
 local_blks_read     | bigint           |           |          |
 local_blks_dirtied  | bigint           |           |          |
 local_blks_written  | bigint           |           |          |
 temp_blks_read      | bigint           |           |          |
 temp_blks_written   | bigint           |           |          |
 blk_read_time       | double precision |           |          |
 blk_write_time      | double precision |           |          |

Example:

select query, calls, total_time, min_time, max_time, mean_time, stddev_time, rows from pg_stat_statements order by mean_time desc;

which gives (query string shortened):

[screenshot: pg_stat_statements output]
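If you want the result to fit a plain terminal yourself, you can truncate the query text directly in SQL — a sketch using substring; the 60-character cutoff and the LIMIT are arbitrary:

-- truncate the query text so each row fits on one line:
SELECT substring(query, 1, 60) AS query,
       calls,
       round(total_time::numeric, 2) AS total_ms,
       round(mean_time::numeric, 2)  AS mean_ms,
       rows
FROM pg_stat_statements
ORDER BY mean_time DESC
LIMIT 10;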

All this information should help you narrow the search down to the most time-consuming queries and then identify the root cause.
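The block-level columns are worth a look, too: a statement's shared-buffer hit ratio hints at whether it is bound by disk I/O. A sketch (nullif guards against division by zero):

-- hit ratio per statement; low values suggest the query reads from disk a lot:
SELECT substring(query, 1, 40) AS query,
       shared_blks_hit,
       shared_blks_read,
       round(100.0 * shared_blks_hit /
             nullif(shared_blks_hit + shared_blks_read, 0), 1) AS hit_pct
FROM pg_stat_statements
ORDER BY shared_blks_read DESC
LIMIT 10;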

You can also use this to get insight into a specific workload. For example, if you know that the problem shows up shortly after a certain event or time, you can reset the statistics just before it and query the view once the problematic part has happened. To do this, call pg_stat_statements_reset() before the desired time or event:

SELECT pg_stat_statements_reset();
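Put together, a minimal workflow might look like this:

-- just before the suspect event or time window:
SELECT pg_stat_statements_reset();

-- ... let the workload run ...

-- afterwards, only statements executed since the reset show up:
SELECT query, calls, mean_time, rows
FROM pg_stat_statements
ORDER BY mean_time DESC
LIMIT 5;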

Happy querying ;-)

Cover image by Chris Dodds, licensed under Creative Commons Attribution-Share Alike 2.0 Generic


Discussion


Thanks for sharing this information.
While working with PostgreSQL queries I prefer using Query Profiler; it helps to analyze the queries and determine whether they can be optimized.

 

Thanks for the post.
Sorry if it is a stupid question, but how do I shorten the query string to get a proper table view in the terminal?

 

Is it safe to run this in a heavily loaded / high-traffic database?

CREATE EXTENSION pg_stat_statements;

 

Yes, the statistics gathering does not create noticeable additional load.
Especially in such environments you want to have this in place to monitor badly performing queries.