
Monitor PostgreSQL Performance Using a Database Health Dashboard

Why Monitor PostgreSQL Database Health?

Just as we monitor our applications, we need to monitor our database system. We want to track things substantially more granular than system-level processes, such as:

  • How many queries actually make use of indexes?
  • How effective is the database cache?
  • The number of open connections.

And so on.

Hardware metrics are essential, but they can only get us so far and will not help us make concrete decisions, such as which queries to optimize and how we are using our database in general. Since PostgreSQL is an open-source database, it offers a lot of insight for free, without relying on costly tools. Most of the metrics we are going to track are available by default, since the query planner needs them; for a few other critical areas we have to enable certain extensions/contrib modules that ship along with PostgreSQL.

Introduction to the Arctype Tool

Arctype is a very neat database tool that can be used not just as a SQL editor but also as a platform on which complex dashboards can be custom-built and shared with developers and users. To understand its true power, we are going to consider a practical use case: monitoring the health of a PostgreSQL database.

Getting Started with Arctype Dashboards

To get familiar with the dashboard capabilities, we are going to build a very simple dashboard with just three components. Dashboards are built right into the Arctype tool itself. To access the functionality, just click the Dashboards button as shown below.

Dashboard Creation
Click the plus icon and create a dashboard named Postgres Health Monitoring.

Create Dashboard with Name

Dashboard Naming Pane

Now that the dashboard is created, let's create some components.

Creating a List of PostgreSQL Tables from information_schema

Adding a table component or chart component is covered in one of our previous articles. Drag and drop the Table into the empty dashboard pane. Next, we will use a simple query that talks to the underlying database (PostgreSQL) and gets all of the tables along with their schemas, excluding information_schema and the system tables.

SELECT
  table_schema, table_name
FROM
  information_schema.tables
WHERE
  table_schema <> 'information_schema'
  AND table_name NOT LIKE 'pg_%'
ORDER BY
  table_schema,
  table_name;

Visualizing Row Counts of Tables in a Database Schema

Let's add a Chart component (added the same way as the table) next to our table component, using the query below:

SELECT
  relname AS table_name,
  reltuples AS rows
FROM
  pg_class c
  LEFT JOIN pg_namespace n ON (n.oid = c.relnamespace)
WHERE
  nspname NOT IN ('pg_catalog', 'information_schema')
  AND relkind = 'r'
ORDER BY
  reltuples DESC;

This query reports the planner's row estimates (reltuples, refreshed by VACUUM and ANALYZE) for all regular tables in our database, so the counts are approximate but cheap to read. In the right pane, select table_name for the X-Axis and rows for the Y-Axis.

Adding Images to an Arctype Dashboard

After adding the table and chart, let's add an image to our dashboard from the PostgreSQL website, along with a text field for explanation. After completing the above steps, our dashboard will look like this:

Dashboard Preview

At this point, you should have a decent feel for dashboards, so now, let's build a real health dashboard and see how it can help us visualize our database health.

Identifying Critical Areas in the PostgreSQL Database

PostgreSQL has many areas we can focus on to understand the database's health. In the sections below, we will look at a handful of them and demonstrate how you can use Arctype to visualize each.

Querying to Check PostgreSQL Open Connections

It is always a good idea to monitor how many open connections are currently present in our database cluster. We can get this from the pg_stat_activity view:

SELECT
  COUNT(*) as connections,
  backend_type
FROM
  pg_stat_activity
GROUP BY
  backend_type
ORDER BY
  connections DESC;
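
To put the connection count in context, we can compare it against the server's configured limit. A minimal sketch, reading the limit via current_setting:

-- Total connections vs. the configured limit
SELECT
  count(*) AS connections,
  current_setting('max_connections')::int AS max_connections
FROM
  pg_stat_activity;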

Determining the Size of a PostgreSQL Database

As our database expands over time, it is important to keep the database size in check. We can get a list of databases and their respective sizes using the below query:

SELECT
  datname AS database_name,
  pg_database_size(datname)/1024/1024 AS size_mb
FROM
  pg_database
WHERE
  datistemplate = false;

We could use pg_size_pretty() instead of calculating megabytes directly, but we want to plot this as a bar chart to compare sizes, so a plain number works better. Keeping tabs on individual database sizes helps us get an overall picture.

Note: These sizes can differ from the size on disk, since PostgreSQL compresses some data behind the scenes (for example, large values stored via TOAST).
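
For ad-hoc inspection in the SQL editor (rather than charting), the pg_size_pretty variant mentioned above reads more naturally; a quick sketch:

-- Human-readable database sizes, for ad-hoc use
SELECT
  datname AS database_name,
  pg_size_pretty(pg_database_size(datname)) AS size
FROM
  pg_database
WHERE
  datistemplate = false;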

Calculating Database Cache-Hit Ratios in PostgreSQL

Any relational/operational database serves most of its traffic from its cache; if that's not happening, we definitely need to look into it. The PostgreSQL cache area is known as shared_buffers, and it acts as a database-layer cache on top of the cache provided by the operating system. We can use the query below to understand the cache hit ratio:

-- NULLIF guards against division by zero on a database with no reads yet
SELECT
  sum(heap_blks_read) AS reads,
  sum(heap_blks_hit) AS hits,
  ROUND(
    sum(heap_blks_hit) / NULLIF(sum(heap_blks_hit) + sum(heap_blks_read), 0),
    4
  ) AS hit_ratio
FROM
  pg_statio_user_tables;

If you have looked at query plans before (for example with EXPLAIN (ANALYZE, BUFFERS)), you will have seen blocks served from heap reads and heap hits; pg_statio_user_tables aggregates this information per table. In the above query, we simply take the totals of hits and reads (blocks fetched from disk or the OS cache) and calculate the ratio, rounded to four digits. It is important to note that in a healthy production database, this ratio should be around 0.97 or higher, meaning roughly 97% of block requests are served from the cache.

Scanning for Unused Indexes

The statistics collector in PostgreSQL records how frequently each index is accessed. We can build a query on one of the statistics views, pg_stat_all_indexes.

SELECT
  COUNT(*) as count,
  relname as table_name
FROM
  pg_stat_all_indexes
WHERE
  idx_scan = 0
  and schemaname = 'public'
GROUP BY
  table_name;

This gives us an indication of which tables have the greatest number of unused indexes. The idx_scan = 0 filter selects indexes that have never been used even once; we can raise this threshold depending on our database traffic pattern. Since unused indexes still occupy disk space and slow down writes, it also helps to see how large they are, as in the sketch below.
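
A small sketch listing never-used indexes by size, using pg_relation_size on the same view (the public schema filter matches the query above):

-- Never-used indexes in the public schema, largest first
SELECT
  relname AS table_name,
  indexrelname AS index_name,
  pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM
  pg_stat_all_indexes
WHERE
  idx_scan = 0
  AND schemaname = 'public'
ORDER BY
  pg_relation_size(indexrelid) DESC;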

Inspecting PostgreSQL Database Caches

In order to get more insight out of our cache, we need to enable another contrib module called pg_buffercache.
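
Enabling it is a one-liner in the target database (it ships with PostgreSQL, and no server restart is needed):

CREATE EXTENSION IF NOT EXISTS pg_buffercache;

After enabling it and running a few queries to warm things up, we can inspect the cache: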

SELECT
  c.relname AS entity_name,
  count(*) AS buffers
FROM
  pg_buffercache b
  INNER JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid)
  AND b.reldatabase IN (
    0,
    (
      SELECT
        oid
      FROM
        pg_database
      WHERE
        datname = current_database()
    )
  )
WHERE
  c.relname NOT LIKE 'pg_%'
GROUP BY
  c.relname
ORDER BY
  2 DESC;

This query was taken from the official documentation and then modified a little. It shows how many pages in shared_buffers are occupied by different tables and indexes. You can think of pages (8 kB blocks by default) as roughly proportional to the amount of data being queried: the more of a large table is read, the more of its pages get pulled into the cache. This can also suggest that some tables could be moved to an application cache or a fast key-value store, such as Redis, for faster querying, freeing up the database cache for indexes.

Monitoring Average Postgres Query Performance

For this part, we need to enable the pg_stat_statements extension. It is a built-in extension/contrib module, and some cloud providers enable it by default. If not, it can be enabled via shared_preload_libraries. This extension captures a ton of information about query performance.
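
A minimal sketch of enabling it; the exact config file path and restart mechanism depend on your installation:

-- In postgresql.conf (requires a server restart):
--   shared_preload_libraries = 'pg_stat_statements'
-- Then, in the target database:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

Using this extension, we are going to build a query that shows query timings alongside the number of rows each statement touched: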

-- On PostgreSQL 13 and newer, the column is named mean_exec_time
SELECT
  mean_time AS "timing in ms",
  rows
FROM
  pg_stat_statements
ORDER BY
  mean_time DESC
LIMIT
  100;

Ordering by mean time and limiting the output gives us the 100 statements with the highest average execution time; the rows column indicates the total number of rows retrieved or affected by each statement. We can just as easily pull the top 10 worst queries by running time from the same view. This can serve as a more real-time alternative to slow-query-log analysis.

Building a Dashboard with PostgreSQL Metrics

If we combine all of the above queries with the chart types listed below, the result looks like a proper analytics dashboard that can be easily visualized and consumed:

PG Health Dashboard Preview

The mapping for each of the components is shown in the table below. Arctype is not limited to these chart types; it is an incredibly feature-rich platform on which we can build our visualizations.

Module                 Type
--------------------   -------------
Open Connections       Pie Chart
Cache Hit Ratio        Normal Table
Unused Indexes         Bar Chart
Database Size          Bar Chart
Shared Buffers         Bar Chart
Query Response Times   Scatter Chart

Each widget here has its own description, and the graphs have details on what is shown on the X-Axis and Y-Axis. It is important that the dashboard is self-explanatory; after all, a picture is worth a thousand words. These charts may look different depending on the underlying data. They can be enlarged or shrunk by simply dragging their borders while in edit mode, which can be very useful when the dataset is larger, since charts can otherwise overflow.

The above dashboard was created on a small example database I had used before. Let's create the dashboard over a more realistic database, such as the Covid DB sample that comes built into Arctype (available during the signup flow). Since the dataset is huge, I have created the dashboard in two parts:

PG Health Dashboard Preview 1
PG Health Dashboard Preview 2

As we have more data points, the dashboard looks a lot richer and more meaningful. The goal is not to be a "silver bullet" solution, but rather to provide some perspective as to potential areas of focus for an operational database. There are a lot of tools out there that offer different functionalities, but at the end of the day, I believe database developers should know their databases in more depth. What's better than a tool that easily lets us translate queries into really nice charts and graphs with almost no learning overhead? Answer: probably not much.

Insights and Intelligence from the Health Dashboard

This dashboard allows us to monitor the database in real time and helps us more fully understand some features that are unique to PostgreSQL. In database optimization, the rule is that "you cannot improve what you cannot measure," and this dashboard solves the first (yet often overlooked) step: actually measuring things. The Covid DB demo database has a 99% cache hit rate, which means that almost all of the traffic is served from the cache (shared_buffers) and will respond very quickly. This is one good sign of a well-optimized database.

Monitoring our database during peak traffic times will give us a lot of insight into which areas can be improved and how to measure this improvement. Some of the ways in which we can improve database performance based upon these metrics are:

  • Using a client-side/server-side connection pool.
  • Code optimizations to release connections back to the pool/close the connection once the work is done.
  • Doing proper database modeling and splitting the traffic between different clusters.
  • Increasing the size of shared_buffers to accommodate more data and indexes in memory (see the sketch after this list).
  • Optimizing poorly written queries by either adding an index or re-writing them entirely.
  • Removing unused indexes to speed up writes.
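
For the shared_buffers item, the current value is easy to check from SQL; changing it means editing postgresql.conf (for example, shared_buffers = '4GB', a value you would size to your own workload) and restarting the server:

-- Check the current shared_buffers setting
SHOW shared_buffers;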

Whatever we have seen so far is only the tip of the iceberg. Database maintenance and improvement is an art of its own, one that takes years of hands-on experience to master. But it is always important to use a toolchain that is friendly, cost-effective, and helps us achieve our goals easily.

Keeping the Dashboard Up-to-Date

After building the dashboard, it is important to keep it up to date. This can be done in various ways, depending on our needs.

Manually Refreshing Arctype Dashboards

Refresh Whole Dashboard

At the top right beside the Edit Mode button, there is a reload button, which we can click to reload the entire dashboard.

Automatically Refreshing the Entire Dashboard

We can also configure the whole dashboard to be refreshed periodically.
Refresh Rate

Refreshing Individual Components Automatically

By clicking on an individual component and then clicking Advanced, we can configure that component to refresh automatically at a specified interval.

Refresh Individual Component

We can choose whichever refresh type we want, but keep in mind that the monitoring queries are themselves queries against the database; if they are resource-intensive or run too frequently, they can degrade performance.

Sharing the Dashboard (and Closing Thoughts)

Another very important feature, one missing from many database tools but present in Arctype, is the ability to share dashboards.

By clicking the Share button at the top right of the dashboard viewer, we get various options for sharing.

Sharing an Arctype Workspace via Email

Dashboard Sharing via Email

This invites the user to become part of the workspace. For more advanced controls, check out the Teams functionality.

Linking to Your Workspace

Dashboard Share via Link

This is the same as above, except that anyone with the link can join the workspace.

Creating a Public Read-Only Link

Dashboard Share via Public Link

This creates a read-only link that can also be viewed in the browser. This can be very handy for sharing dashboards with different (perhaps non-technical) stakeholders on a team.

Note: You cannot share a locally hosted database (or any dashboards built on it).

Conclusion

These dashboards are incredibly useful when:

  • Running a performance test.
  • Monitoring a live database for a specified period.
  • General database monitoring.

The fine-grained control provided by Arctype makes these dashboards easily shareable with colleagues, with teams, and even on the public internet. Go ahead and download Arctype if you haven't already, and start creating these dashboards; knowing and monitoring database health should no longer be limited to certain tools or people!
