
YugabyteDB Connection Manager: a Database Resident Connection Pool with Shared Processes

One fundamental issue in PostgreSQL's architecture is its process-per-connection design. In this design, each new connection to the database creates a dedicated process that consumes resources until the connection is terminated, even when it's idle. This model presents two significant challenges for scalability:

  • New connections incur high overhead because the operating system must fork a new process for each one. This is particularly inefficient in environments where connections are frequent but short-lived.

  • Idle connections can waste resources by maintaining dedicated processes, leading to inefficient resource utilization.

Dedicated backend processes limit the scalability of applications with high concurrency or a microservices architecture, where numerous idle connections consume resources. While this is not a scalability issue in YugabyteDB, as all nodes are active and connections are distributed across an elastic cluster of nodes, minimizing connection latency and reducing the per-connection footprint to lower costs is still essential.

Two common strategies—application connection pooling and external connection pooling—are typically used to address the process-per-connection issue in SQL databases.

Application Connection Pooling

With application connection pooling, the application manages a pool of database connections and reuses existing connections instead of creating new ones each time, avoiding the overhead of spawning new processes. A static pool maintains a fixed number of connections by setting the same value for the minimum and maximum pool size; however, if most connections remain idle, it wastes database resources. A dynamic pool disconnects idle connections after a while, saving database resources and adjusting the number of active connections to demand; however, it may trigger connection storms when the workload increases or the database slows down.
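The static and dynamic behaviors described above can be sketched with a toy pool. This is hypothetical illustration code, not tied to any real driver: `FakeConn` stands in for a driver's connect() call, and a static pool is simply one where `min_size == max_size`.

```python
import time


class FakeConn:
    """Hypothetical stand-in for a real driver connection (illustration only)."""
    opened = 0

    def __init__(self):
        FakeConn.opened += 1

    def close(self):
        pass


class Pool:
    """Minimal pool sketch: static when min_size == max_size,
    dynamic when idle connections expire after idle_timeout seconds."""

    def __init__(self, factory, min_size, max_size, idle_timeout):
        self.factory = factory
        self.max_size = max_size
        self.idle_timeout = idle_timeout
        # Pre-open min_size connections, as a static pool would.
        self.idle = [(factory(), time.monotonic()) for _ in range(min_size)]
        self.size = min_size

    def acquire(self):
        # Reuse an idle connection if one is still fresh...
        while self.idle:
            conn, last_used = self.idle.pop()
            if time.monotonic() - last_used < self.idle_timeout:
                return conn
            conn.close()          # expired: frees the backend process
            self.size -= 1
        # ...otherwise open a new one, up to max_size.
        if self.size >= self.max_size:
            raise RuntimeError("pool exhausted")
        self.size += 1
        return self.factory()

    def release(self, conn):
        self.idle.append((conn, time.monotonic()))


pool = Pool(FakeConn, min_size=0, max_size=5, idle_timeout=60)
c1 = pool.acquire()
pool.release(c1)
c2 = pool.acquire()   # reuses c1 instead of opening a second backend
assert c1 is c2 and FakeConn.opened == 1
```

The trade-off is visible in the two parameters: a large `idle_timeout` behaves like a static pool (idle backends stay allocated), while a small one frees backends aggressively and risks a reconnect storm when load returns.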

External Connection Pooling

Additional components such as PgBouncer, Pgpool-II, or Odyssey can be used as proxies between the application and the database. These tools maintain a pool of connections shared across multiple application servers, providing more efficient connection management for microservices. However, since they are external, they cannot coordinate with the application or the database to determine when a physical connection can safely be reused by another client without losing session state or leaking data between sessions. They can be configured in two modes:

  • Session Mode: In this mode, each logical session keeps a physical connection to the database from the time it connects until it disconnects. This approach is safe with regard to session state, but it does not address the problem of idle resources; it only reduces the cost of frequent connection establishment.

  • Transaction Mode: In this mode, a logical connection is allocated a physical connection only for the duration of a transaction, freeing the resources for others when it is not in use. However, some PostgreSQL features, such as session parameters, temporary tables, or prepared statements, may not be supported in this mode because their scope extends beyond transaction boundaries.
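For example, with PgBouncer the choice between these two modes is a single setting in pgbouncer.ini. This is a minimal sketch; the host, database name, and auth values are placeholders:

```ini
[databases]
; placeholder connection string for the backing database
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = userlist.txt
; "session" keeps one server connection per client session;
; "transaction" returns it to the pool at each commit/rollback
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 20
```

Note the fan-in: up to 1000 client connections share 20 server connections per database/user pair, which is exactly why transaction mode cannot preserve state that outlives a transaction.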

While these approaches alleviate PostgreSQL's connection scalability limitations, they restrict SQL features, add performance overhead, and require complex configuration and tuning for deployment.


YugabyteDB Integrated Connection Manager

YugabyteDB integrates a connection pool in its connection manager, offering several advantages over PostgreSQL’s traditional model and external connection pools. It provides efficient resource usage, similar to external connection pools, but with the added benefit of being integrated into the query layer. This integration lets it remain aware of session state and support all SQL features transparently.

A simple way to understand this is to simulate two logical connections from psql and then check the Linux process ID (PID) to see if they use the same physical connection.

PostgreSQL Dedicated Backends

In PostgreSQL, I can run two concurrent sessions with psql by nesting the second one with \!. This allows me to run a host command from the psql prompt while the first session is connected. I can check the physical connection using pg_backend_pid() and gather additional information about the logical connection from the pg_stat_activity view, such as the application_name I've set.


psql (16.2, server 14.13)
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off)

postgres=> -- this is session 1

postgres=> set application_name to session1;
SET

postgres=> select datname, pid, usename, application_name, backend_start 
           from pg_stat_activity where pid=pg_backend_pid();

 datname  | pid  | usename | application_name |         backend_start
----------+------+---------+------------------+-------------------------------
 postgres | 2732 | franck  | session1         | 2024-09-30 00:24:36.867228+00

postgres=> -- run another psql for session 2

postgres=> \! psql -c "set application_name to session2" -c "select datname, pid, usename, application_name, backend_start from pg_stat_activity where pid=pg_backend_pid();"

SET

 datname  | pid  | usename | application_name |         backend_start
----------+------+---------+------------------+-------------------------------
 postgres | 2828 | franck  | session2         | 2024-09-30 00:25:20.837349+00

postgres=> -- back to session 1

postgres=> select datname, pid, usename, application_name, backend_start 
           from pg_stat_activity where pid=pg_backend_pid();
 datname  | pid  | usename | application_name |         backend_start
----------+------+---------+------------------+-------------------------------
 postgres | 2732 | franck  | session1         | 2024-09-30 00:24:36.867228+00

postgres=>



Even though the two connections are never active at the same time, the distinct PIDs show two dedicated backend processes, consuming more resources than necessary.

YugabyteDB Shared Connection

Let's do the same with YugabyteDB, where the connection manager is enabled. I'm using version 2.23, where the Connection Manager is in preview. I enabled it by setting --enable_ysql_conn_mgr to true and adding enable_ysql_conn_mgr to allowed_preview_flags_csv. Here is how to do this when starting with yugabyted:

yugabyted start --tserver_flags=enable_ysql_conn_mgr=true,allowed_preview_flags_csv={enable_ysql_conn_mgr}

I'm running the same psql commands as I did with PostgreSQL, as YugabyteDB is PostgreSQL compatible, and checking some additional parameters related to the YugabyteDB connection manager.

psql (16.2, server 11.2-YB-2.23.0.0-b0)
Type "help" for help.

yugabyte=# -- check that the connection manager is enabled

yugabyte=# show yb_is_client_ysqlconnmgr;
 yb_is_client_ysqlconnmgr
--------------------------
 on

yugabyte=# -- this is session 1

yugabyte=# set application_name to session1;
SET

yugabyte=# select datname, pid, usename, application_name, backend_start 
           from pg_stat_activity where pid=pg_backend_pid();

 datname  |   pid   | usename  | application_name |        backend_start
----------+---------+----------+------------------+------------------------------
 yugabyte | 2685120 | yugabyte | session1         | 2024-09-29 21:41:46.77405+00

yugabyte=# -- run another psql for session 2

yugabyte=# \! psql -c "set application_name to session2" -c "select datname, pid, usename, application_name, backend_start from pg_stat_activity where pid=pg_backend_pid();"

SET

 datname  |   pid   | usename  | application_name |        backend_start
----------+---------+----------+------------------+------------------------------
 yugabyte | 2685120 | yugabyte | session2         | 2024-09-29 21:41:46.77405+00

yugabyte=# -- back to session 1

yugabyte=# select datname, pid, usename, application_name, backend_start 
           from pg_stat_activity where pid=pg_backend_pid();

 datname  |   pid   | usename  | application_name |        backend_start
----------+---------+----------+------------------+------------------------------
 yugabyte | 2685120 | yugabyte | session1         | 2024-09-29 21:41:46.77405+00

(1 row)

yugabyte=# show ysql_conn_mgr_sticky_object_count;

 ysql_conn_mgr_sticky_object_count
-----------------------------------
 0
(1 row)


The output looks similar to PostgreSQL's, but here each logical connection sets its own application_name while sharing the same physical connection, as shown by the identical PID. Because the second session ran while the first was idle, its backend process was reused. To make this possible without disrupting session state, session parameters such as application_name are stored in a shared memory segment rather than in private process memory.

There's no need to choose between session and transaction mode. Because it is integrated with the database, the connection manager detects objects whose scope extends beyond a transaction and tracks them with a stickiness counter, visible here as ysql_conn_mgr_sticky_object_count; it is zero because no object requires a dedicated session.
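Conceptually, the pooler can only multiplex sessions whose state it can replay on another backend; anything it cannot replay pins the session to its process. A toy model of this stickiness rule follows. It is an illustration only, not YugabyteDB's actual implementation; all class and method names are hypothetical.

```python
class Backend:
    """Stands in for one physical PostgreSQL backend process."""
    def __init__(self, pid):
        self.pid = pid
        self.sticky_object_count = 0   # e.g. temporary tables

class Session:
    """One logical client session attached to a backend."""
    def __init__(self, backend):
        self.params = {}               # replayable state (SET ...)
        self.backend = backend

    def set_param(self, name, value):
        # Stored per-session and replayed on reuse, so it is not sticky.
        self.params[name] = value

    def create_temp_table(self):
        self.backend.sticky_object_count += 1   # pins the backend

    def drop_temp_table(self):
        self.backend.sticky_object_count -= 1

    def can_release_backend(self):
        # The backend returns to the pool only when nothing pins it.
        return self.backend.sticky_object_count == 0

b = Backend(pid=2685120)
s = Session(b)
s.set_param("application_name", "session1")
assert s.can_release_backend()         # SET alone is replayable
s.create_temp_table()
assert not s.can_release_backend()     # temp table makes it sticky
s.drop_temp_table()
assert s.can_release_backend()         # reusable again after DROP TABLE
```

This mirrors the transcripts below: SET application_name leaves the counter at zero, while a temporary table raises it until the table is dropped.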

YugabyteDB Sticky Connection

The process used by an idle session can only be reused when there is no dedicated state. The application_name is session-level state that the connection manager handles through shared memory. Other cases are handled differently: a physical connection can only be returned to the pool when the stickiness counter is back to zero. For example, if a session creates a temporary table with session scope, that table must not be visible to another session. YugabyteDB increases ysql_conn_mgr_sticky_object_count when such an object makes the physical connection sticky.

yugabyte=# -- this is session 1

yugabyte=# set application_name to session1;
SET
yugabyte=# create temporary table session1_temp (id bigint)
           on commit preserve rows;
CREATE TABLE

yugabyte=# select datname, pid, usename, application_name, backend_start 
           from pg_stat_activity where pid=pg_backend_pid();

 datname  |   pid   | usename  | application_name |         backend_start
----------+---------+----------+------------------+-------------------------------
 yugabyte | 2748829 | yugabyte | session1         | 2024-09-30 00:39:08.914216+00

yugabyte=# -- run another psql for session 2

yugabyte=# \! psql -c "set application_name to session2" -c "select datname, pid, usename, application_name, backend_start from pg_stat_activity where pid=pg_backend_pid();"

SET

 datname  |   pid   | usename  | application_name |         backend_start
----------+---------+----------+------------------+-------------------------------
 yugabyte | 2748861 | yugabyte | session2         | 2024-09-30 00:39:17.652615+00

yugabyte=# -- back to session 1

yugabyte=# select datname, pid, usename, application_name, backend_start 
           from pg_stat_activity where pid=pg_backend_pid();

 datname  |   pid   | usename  | application_name |         backend_start
----------+---------+----------+------------------+-------------------------------
 yugabyte | 2748829 | yugabyte | session1         | 2024-09-30 00:39:08.914216+00

yugabyte=# show ysql_conn_mgr_sticky_object_count;
 ysql_conn_mgr_sticky_object_count
-----------------------------------
 1

yugabyte=# drop table session1_temp;
DROP TABLE

yugabyte=# show ysql_conn_mgr_sticky_object_count;
 ysql_conn_mgr_sticky_object_count
-----------------------------------
 0

yugabyte=# -- run another psql for session 2

yugabyte=# \! psql -c "set application_name to session2" -c "select datname, pid, usename, application_name, backend_start from pg_stat_activity where pid=pg_backend_pid();"

SET

 datname  |   pid   | usename  | application_name |         backend_start
----------+---------+----------+------------------+-------------------------------
 yugabyte | 2748829 | yugabyte | session2         | 2024-09-30 00:39:08.914216+00


In this scenario, because session 1 created a temporary table, session 2 used a separate physical connection (identified by a different pid). YugabyteDB handles this by keeping track of the objects that necessitate a dedicated connection. Once the temporary table is dropped, the stickiness counter returns to zero and the physical connection can be reused.

Conclusion

PostgreSQL's process-per-connection model can be challenging for scalability, especially in modern, high-concurrency environments. Although application and external connection pools offer some solutions, they still have limitations within this architecture, require complex tuning, and are not transparent to SQL features.

Dedicated processes per application connection are a common issue in SQL databases. To address these problems, Oracle Database has introduced several features, such as Shared Servers (MTS), Database Resident Connection Pool (DRCP), and an external connection manager (CMAN). There have been discussions on the PostgreSQL mailing list about making PostgreSQL multi-threaded, but this would require significant refactoring of the code.

In contrast, YugabyteDB has chosen to maintain compatibility with PostgreSQL by using physical connections as PostgreSQL backend processes and integrating a connection pooler (Odyssey) into the database itself as an integrated Connection Manager. This setup allows multiple logical connections to share physical resources while maintaining awareness of the session state, resulting in improved resource utilization and scalability.


Further reading: Built-in Connection Manager Turns Key PostgreSQL Weakness into a Strength (yugabyte.com): "YugabyteDB's Connection Manager revolutionizes PostgreSQL connection pooling, offering 20x faster speeds and supporting 5x more connections seamlessly."
