OK, you've done your job, and done it properly:
- Normalization/denormalization has produced a well-balanced schema
- Constraints are applied carefully
- You thought about indexes and chose the right columns based on your filter and ordering criteria
- Your PL/pgSQL stored procedures and trigger code follow best practices
- You avoided obvious traps like sharing sequences across several tables
- You defined partitions that yield almost equally sized buckets
- The server is properly sized and the I/O subsystem is fast
- The PostgreSQL configuration is thoughtfully done
And yet, some queries are slow.
Now you throw prints and log statements into your application code, gather runtime information to see where performance suffers, and sooner or later you identify the database as a possible source. But which query is the problem, and how do you identify it?
First, you need to ask yourself: what does "slow" mean in your context? What is the maximum response time you can tolerate in your application or service? In a web app, a response time of 100 ms may already be too long, whereas you might be perfectly happy when the large financial quarterly report runs in under 8 hours.
During development and testing, the problem might not have shown up. This is very typical, because too often the testing and production environments are not comparable (which is a big, big mistake and should be addressed properly in a sane DevOps setting, btw.).
Once you know which maximum query runtime is acceptable, you can instruct PostgreSQL to log statements that take longer by adding this to postgresql.conf:
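```ini
# threshold in milliseconds; 5000 ms matches the 5-second example below
log_min_duration_statement = 5000
```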
This would log all statements which take longer than 5 seconds.
To activate the change, tell PostgreSQL to reload the config file from a SQL session:
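```sql
SELECT pg_reload_conf();
```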
You can test this by issuing
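```sql
SELECT pg_sleep(7.5);
```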
and then watch the logfile:
2019-03-07 18:44:12.727 CET  postgres@postgres LOG: duration: 7515.906 ms statement: SELECT pg_sleep(7.5);
pg_stat_statements is a PostgreSQL extension which a) needs to be loaded when the server process starts and b) activated as an extension.
To do so, add this line to postgresql.conf:
shared_preload_libraries = 'pg_stat_statements'
The extension then needs to be enabled in each database where you want it to be active: connect to the database and issue:
CREATE EXTENSION pg_stat_statements;
This adds a very comprehensive set of information for every query that is executed. Below is the schema definition of the view you query.
                    View "public.pg_stat_statements"
       Column        |       Type       | Collation | Nullable | Default
---------------------+------------------+-----------+----------+---------
 userid              | oid              |           |          |
 dbid                | oid              |           |          |
 queryid             | bigint           |           |          |
 query               | text             |           |          |
 calls               | bigint           |           |          |
 total_time          | double precision |           |          |
 min_time            | double precision |           |          |
 max_time            | double precision |           |          |
 mean_time           | double precision |           |          |
 stddev_time         | double precision |           |          |
 rows                | bigint           |           |          |
 shared_blks_hit     | bigint           |           |          |
 shared_blks_read    | bigint           |           |          |
 shared_blks_dirtied | bigint          |           |          |
 shared_blks_written | bigint          |           |          |
 local_blks_hit      | bigint           |           |          |
 local_blks_read     | bigint           |           |          |
 local_blks_dirtied  | bigint           |           |          |
 local_blks_written  | bigint           |           |          |
 temp_blks_read      | bigint           |           |          |
 temp_blks_written   | bigint           |           |          |
 blk_read_time       | double precision |           |          |
 blk_write_time      | double precision |           |          |
SELECT query, calls, total_time, min_time, max_time, mean_time, stddev_time, rows
FROM pg_stat_statements
ORDER BY mean_time DESC;
which gives (query string shortened):
All this information should help you narrow the search down to the most time-consuming queries and then identify the root cause.
You can also use this to get insight into a specific workload. For example, if you know that the problem shows up shortly after a certain event or time, you can reset the statistics just before it and query the view after the problematic part has happened. To do so, call pg_stat_statements_reset() before the desired time or event:
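```sql
SELECT pg_stat_statements_reset();
```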
Happy querying ;-)
Cover image by Chris Dodds, licensed under Creative Commons Attribution-Share Alike 2.0 Generic