DEV Community

Franck Pachot for YugabyteDB Distributed PostgreSQL Database

Posted on • Edited on

pg_stat_statements in YugabyteDB

In YugabyteDB, distributed SQL database using PostgreSQL, the pg_stat_statements extension is installed by default. However, statements are gathered locally in each node, which mean that you need to query pg_stat_statements or call pg_stat_statements_reset() on each node. YugabyteDB is cloud native, you can add and remove nodes. With YugabyteDB Managed or YugabyteDB Anywhere this is done automatically. If you are using YugabyteDB for free, you have to automate it.

I'm showing an easy way to do it here.

First, let's create a table that will hold the statements gathered from all nodes, with the host it comes from and ts the timestamp when it was gathered:

drop table if exists ybwr_statements ;
create table ybwr_statements  as
select now() as ts,'' as host, * 
from pg_stat_statements where null is not null;
Enter fullscreen mode Exit fullscreen mode

Using a Create Table As Select makes it independent on the table structure which may evolve though versions.

Now, the plpgsql code to gather from all nodes, as an example.

  • I query the list of servers from yb_servers()
  • I use COPY TO PROGRAM to execute ysqlsh, the psql equivalent that is present on each node
  • I find it from the postmaster.opts file. You can also find it or get it from the yb-tserver working directory
  • I run an insert into ybwr_statements select '%s','%s',* from pg_stat_statements which I pass as STDIN to ysqlsh though the COPY TO as I need an input query input anyway
  • I also reset with pg_stat_statements_reset() is the gathering is successful (calling ysqlshwith -v ON_ERROR_STOP=1)
  • to debug it, I add > /tmp/log.txt 2>&1 at the end of the command line. You may prefer to use COPY FROM PROGRAM, passing the queries with -c, and store the output in a table.

Here it is:

do $DO$
declare i record;
begin
for i in (select now() as ts, host from yb_servers()) loop
 execute format(
  $COPY$
  copy (select $SQL$ insert into ybwr_statements select '%s','%s',* from pg_stat_statements; select pg_stat_statements_reset(); $SQL$ ) to program
   $BASH$ $(awk '{sub("/postgres/bin/postgres .*","");print}' ./postmaster.opts)/bin/ysqlsh -h $(hostname) -v ON_ERROR_STOP=1 $BASH$
  $COPY$
 ,i.ts,i.host);
end loop;
end;
$DO$
;
Enter fullscreen mode Exit fullscreen mode

Then, you can query ybwr_statements:

select ts, host, calls, total_time, query 
from ybwr_statements 
order by total_time;
Enter fullscreen mode Exit fullscreen mode

Here is an example on an idle database where I captured only my capturing queries:
screenshot

This is an example to show the technique. You can schedule it, like every hour, and you will have hourly snapshots of queries in the ybwr_statements table. You may choose not to call pg_stat_statements_reset() and will diff the snapshots on querying.

Top comments (0)