Frits Hoogland for YugabyteDB

Posted on Nov 10, 2021

Postgres pgagroal connectionpool

#linux #postgres #yugabyte #database

This blogpost is about a connectionpool that is lesser known than pgbouncer, which is pgagroal. Both are socalled 'external connectionpools', which mean they can serve application/user connections but are not part of an application.

They also serve the same function, which is to serve as a proxy between clients and applications on one side, and to a postgres instance on the other side. In that position, the first obvious advantage is that it can perform as an edge service, concentrating connections from one network, and proxy the requests to the database in a non-exposed network.

Another advantage is that the client/application side connections are decoupled from the database side connections, and therefore can serve badly behaving applications (which create and destroy connections to a database repeatedly) by linking the database connection request to an already setup database connection, instead of initializing and destroying a connection.

CentOS/RHEL/Alma/Rocky/enz. 8 only

Pgagroal is EL version 8 only, because its build scripts check minimal required versions. When you try to build pgagroal on CentOS 7, it will error with the message:

CMake Error at CMakeLists.txt:1 (cmake_minimum_required):
  CMake 3.14.0 or higher is required.  You are running version 2.8.12.2

'EL' is a general naming for all Linux distributions that take RedHat's Enterprise distribution as a basis.

Installation

However, when you are on EL version 8, you can use the postgres yum repository to install pgagroal in a very simple way. There is no need to download the source and compile it yourself.

Add the EL 8 postgres yum repositories:

sudo dnf install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-8-x86_64/pgdg-redhat-repo-latest.noarch.rpm

Install pgagroal:
```
sudo dnf -y install pgagroal
```
Add the pgagroal user:
```
sudo useradd pgagroal
```

Now you're set, and can use the pgagroal by starting pgagroal via systemd: sudo systemctl start pgagroal.

Configuration: autostart

By default, pgagroal adds a systemd unit file, but does not enable it (so it gets started during startup), and does not start it. If you want pgagroal to be started by systemd automatically, you must enable the systemctl unit: sudo systemctl enable pgagroal.

Configuration: listen configuration

In order to be defensive and careful, pgagroal by default listens at localhost, at port 2345, which is the reverse of the postgres default port 5432. If you want to use pgagroal as a connection pool in front of an instance of postgres, you probably should change the host and maybe the port settings in the /etc/pgagroal/pgagroal.conf file in the section [pgagroal], so that the clients can reach and communicate with pgagroal at the set host:port combination.

Configuration: server section

To specify where pgagroal needs to connect to, there is a section called [primary] in the /etc/pgagroal/pgagroal.conf file, which allows you to set host and port.

This is where pgagroal is fundamentally different from pgbouncer: pgbouncer allows you to specify multiple databases on multiple machines (see 'section [databases]'), pgagraol allows you to specify a single primary server.

Configuration: pgagroal_hba.conf

Just like postgres, pgagroal can perform host based authentication using its own hba.conf file in /etc/pgagroal, called /etc/pgagroal/pgagraol_hba.conf, which has the same fields as a normal postgres hba.conf file (type, database, user, address, method). By default it performs no authentication.

Configuration: pgagroal_databases.conf

For the configured postgres instance at the host and port number set in the /etc/pgagroal/pgagroal.conf file in the [primary] section, pgagroal can be configured to apply limits for a database, a user or both. The limit is the number of connections for a database, user or database and user combination.

A number of initial connections (so connections created before an application or user request for it) can also be set, but for that a user definition must be created, so pgagroal can use that username and password to authenticate and build a pool of connections.

Configuration: pipeline

Another really important configuration setting in /etc/pgagroal/pgagroal.conf is pipeline. The setting of pipeline defines how a connection is managed by pgagroal. The default value is 'auto', which makes pgagroal choose the pipeline setting based on the configuration.

The most minimal and therefore fastest implementation is 'performance'. This setting does not support transport layer security, and binds a client connection to a database connection for the duration of the session.

The next pipeline configuration option is 'session', which, quite obviously binds a client connection to a database connection for the duration of the session too, but supports all configuration options.

The last pipeline configuration option is 'transaction'. This is a special configuration, because it binds a client to a database connection for the duration of a transaction.

This has the wonderful property that you can have a socalled 'asynchronous connection count', which means that you can have (much) more client connections than having database connections. In other words: this is a potential solution for the often excessively oversized application connection pools.

But there is a huge caveat: because the dynamic transactional binding of clients with database connections, you cannot have any construction used by a client that sets and depends on a server side setting or configuration. This means concretely things like 'SET', 'LISTEN', 'WITH HOLD CURSOR' and 'PREPARE' and 'DEALLOCATE' cannot be used.

Configuration: metrics

A feature that is not present in pgbouncer is the ability to expose runtime statistics in prometheus format, which means statistics can be scraped by a prometheus server.

Conclusion

I cannot come to a definite verdict between pgbouncer and pgagroal. There are reports of issues with pgbouncer in the past, for which I don't know the current state. Pgbouncer configuration feels and seems much less straightforward than pgagroal configuration.

But pgbouncer can serve as a proxy to multiple machines, while pgagraol is limited to one. Pgagroal advertises the explicit design for performance, which I have not tested, while pgbouncer seems to be more generalistic.

DEV Community