DEV Community

Pawan Kukreja

[Summary] Chapter#11 "The Internals of PostgreSQL" Streaming Replication

Starting the Streaming Replication

Three types of processes work cooperatively in streaming replication:

  • Walsender Process
  • Walreceiver Process
  • Startup Process

The walsender process runs on the primary server and sends WAL data to the standby server. The walreceiver and startup processes run on the standby server, where they receive and replay that data. The walsender and walreceiver communicate over a single TCP connection.

The following steps show how streaming replication starts and how the connection is established:

  • Start the primary and standby servers.
  • The standby server starts a startup process.
  • The standby server starts a walreceiver process.
  • The walreceiver sends a connection request to the primary server. If the primary server is not running, the walreceiver sends these requests periodically.
  • When the primary server receives a connection request, it starts a walsender process, and a TCP connection is established between the walsender and walreceiver.
  • The walreceiver sends the latest LSN of the standby's database cluster. In general, this phase is known as handshaking in the field of information technology.
  • If the standby's latest LSN is less than the primary's latest LSN (standby's LSN < primary's LSN), the walsender sends WAL data from the former LSN to the latter LSN. This WAL data is read from the WAL segments stored in the primary's pg_xlog subdirectory (pg_wal in version 10 or later). The standby server then replays the received WAL data. In this phase the standby catches up with the primary, so it is called catch-up.
  • Streaming replication begins to work.
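The handshake and catch-up steps above hinge on comparing two LSNs. PostgreSQL prints an LSN as two hexadecimal numbers separated by a slash (the format returned by functions such as pg_current_wal_lsn()), where the first number holds the upper 32 bits of a 64-bit WAL position. The minimal Python sketch below shows the comparison that decides whether catch-up is needed; the helper names parse_lsn and catch_up_range are hypothetical, and this is not how the walsender is actually implemented.

```python
def parse_lsn(text: str) -> int:
    """Convert PostgreSQL's textual LSN 'X/Y' into a 64-bit integer.

    X holds the upper 32 bits and Y the lower 32 bits, both in hexadecimal.
    """
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def catch_up_range(standby_lsn: str, primary_lsn: str):
    """Return the (start, end) WAL range the walsender must stream so the
    standby can catch up, or None when the standby is already up to date."""
    start, end = parse_lsn(standby_lsn), parse_lsn(primary_lsn)
    if start < end:  # standby's LSN < primary's LSN: catch-up is needed
        return (start, end)
    return None      # nothing to send; streaming can begin right away


# Handshake: the walreceiver reports its latest LSN and the walsender compares it.
print(catch_up_range("0/3000000", "0/3000148"))  # (50331648, 50331976)
```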

How to Conduct Streaming Replication

Streaming replication has two aspects:

  • Log shipping: Streaming replication is based on log shipping; the primary server sends WAL data to the connected standby servers whenever that data is written.

  • Database synchronization: This is required for synchronous replication; the primary server communicates with each standby server to synchronize their database clusters.
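The log-shipping aspect can be pictured with a toy model: every WAL record the primary writes is immediately pushed to each connected standby. The Primary and Standby classes below are hypothetical stand-ins, the strings are placeholders for real WAL records, and the acknowledgement side of database synchronization is deliberately left out of this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Standby:
    """Hypothetical stand-in for a standby's walreceiver."""
    name: str
    received: list = field(default_factory=list)  # WAL records received so far


@dataclass
class Primary:
    """Hypothetical primary that ships each WAL record as soon as it is written."""
    standbys: list
    wal: list = field(default_factory=list)

    def write_wal(self, record: str) -> None:
        self.wal.append(record)              # 1. write the record to the primary's WAL
        for standby in self.standbys:        # 2. log shipping: send it to every
            standby.received.append(record)  #    connected standby immediately


primary = Primary(standbys=[Standby("standby1"), Standby("standby2")])
primary.write_wal("wal-record-1")
print([s.received for s in primary.standbys])  # [['wal-record-1'], ['wal-record-1']]
```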

Behavior when a failure occurs
Even if the synchronous standby server has failed and can no longer return an ACK response, the primary server keeps waiting for responses forever. As a result, running transactions cannot commit and subsequent query processing cannot start.
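A committing transaction on the primary behaves roughly like a blocking read on a channel of ACKs that never times out. The sketch below models that with Python's queue.Queue; the function wait_for_sync_ack is hypothetical, and its optional timeout exists only so the demo can return, since the primary itself has no such timeout.

```python
import queue


def wait_for_sync_ack(acks: queue.Queue, commit_lsn: int, timeout=None) -> bool:
    """Block a committing transaction until the synchronous standby reports that
    it has flushed WAL up to commit_lsn.

    timeout=None mirrors the primary's behaviour: it waits forever. A finite
    timeout is offered only so that this demo can return when no ACK arrives.
    """
    while True:
        try:
            flushed_lsn = acks.get(timeout=timeout)  # next ACK from the standby
        except queue.Empty:
            return False  # no ACK arrived: the transaction still cannot commit
        if flushed_lsn >= commit_lsn:
            return True   # the standby has flushed the commit record


healthy = queue.Queue()
healthy.put(0x3000148)                          # a live standby ACKs the commit
print(wait_for_sync_ack(healthy, 0x3000148))    # True

failed = queue.Queue()                          # a failed standby never ACKs
print(wait_for_sync_ack(failed, 0x3000148, timeout=0.1))  # False
```

In a real deployment, one manual way out is to set synchronous_standby_names to an empty string and reload the configuration on the primary (for example with pg_ctl reload), which switches it to asynchronous mode; running more than one synchronous-capable standby also reduces the risk.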

Managing Multiple-Standby Servers

This section describes how streaming replication works with multiple standby servers.

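sync_priority and sync_state
The primary server assigns a sync_priority and a sync_state to every standby server it manages and treats each standby according to these values. sync_priority indicates the priority of the standby server in synchronous mode and is a fixed value. A smaller value means a higher priority, and 0 is a special value meaning "in asynchronous mode". sync_state is the standby's current state, which changes depending on the running status of all standbys and their priorities: "sync" is the working synchronous standby with the highest priority, "potential" is a spare synchronous standby that can take over if the current synchronous standby fails, and "async" is an asynchronous standby (this state is fixed). Both values can be inspected in the pg_stat_replication view.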
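The rule for deriving sync_state from the fixed sync_priority values can be written as a small function. This is a simplified sketch assuming a single synchronous standby and unique standby names; the Standby dataclass and assign_sync_states are hypothetical names, not PostgreSQL internals.

```python
from dataclasses import dataclass


@dataclass
class Standby:
    name: str
    sync_priority: int  # fixed; smaller is higher priority, 0 = asynchronous mode
    alive: bool = True


def assign_sync_states(standbys):
    """Return {name: sync_state} for every working standby."""
    working = [s for s in standbys if s.alive]
    # Working standbys that run in synchronous mode, best priority first.
    candidates = sorted((s for s in working if s.sync_priority > 0),
                        key=lambda s: s.sync_priority)
    best = candidates[0].name if candidates else None
    states = {}
    for s in working:
        if s.sync_priority == 0:
            states[s.name] = "async"      # fixed state for asynchronous standbys
        elif s.name == best:
            states[s.name] = "sync"       # highest-priority working standby
        else:
            states[s.name] = "potential"  # spare, ready to take over
    return states


standbys = [Standby("standby1", 1), Standby("standby2", 2), Standby("standby3", 0)]
print(assign_sync_states(standbys))
# {'standby1': 'sync', 'standby2': 'potential', 'standby3': 'async'}
```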

How the Primary Manages Multiple Standbys
The primary server waits for ACK responses from the synchronous standby alone; in other words, it confirms only the synchronous standby's writing and flushing of WAL data. Streaming replication therefore ensures that only the synchronous standby is in a consistent and synchronous state with the primary.
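The commit rule above can be reduced to one comparison: only the synchronous standby's reported flush position matters, however far ahead the potential or asynchronous standbys are. In this sketch commit_can_complete is a hypothetical name, WAL positions are plain integers, and exactly one standby is assumed to be in the sync state.

```python
def commit_can_complete(commit_lsn: int, flushed: dict, sync_states: dict) -> bool:
    """Decide whether a commit may return to the client.

    flushed maps each standby name to the latest WAL position it has reported
    as written and flushed; sync_states maps each name to 'sync', 'potential'
    or 'async'. Only the synchronous standby's ACK is consulted.
    """
    sync = [name for name, state in sync_states.items() if state == "sync"]
    if len(sync) != 1:
        raise ValueError("this sketch expects exactly one synchronous standby")
    return flushed.get(sync[0], 0) >= commit_lsn


states = {"standby1": "sync", "standby2": "potential", "standby3": "async"}
# standby2 and standby3 are ahead, but the commit still waits for standby1.
print(commit_can_complete(200, {"standby1": 150, "standby2": 300, "standby3": 300}, states))  # False
print(commit_can_complete(200, {"standby1": 200, "standby2": 0, "standby3": 0}, states))      # True
```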

Behavior when a failure occurs
When either a potential or an asynchronous standby server fails, the primary server terminates the walsender process connected to the failed standby and continues all processing. In other words, transaction processing on the primary server is not affected by the failure of either type of standby server. When the synchronous standby server fails, the primary server terminates the walsender process connected to it and replaces the synchronous standby with the potential standby that has the highest priority.
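The failure-handling rules above can be sketched as a single function over a map of standby names to sync_priority values. The name handle_standby_failure is hypothetical, and "terminating the walsender" is modelled simply as dropping the failed standby from the map.

```python
def handle_standby_failure(priorities: dict, sync_name: str, failed: str):
    """Model the primary's reaction when the standby `failed` goes down.

    priorities maps each connected standby to its fixed sync_priority
    (0 = asynchronous). Returns (remaining standbys, name of the synchronous
    standby afterwards).
    """
    # In every case the primary terminates the walsender serving the failed standby.
    remaining = {name: p for name, p in priorities.items() if name != failed}
    if failed != sync_name:
        # A potential or asynchronous standby failed: processing simply continues.
        return remaining, sync_name
    # The synchronous standby failed: promote the highest-priority potential standby.
    potentials = [name for name, p in remaining.items() if p > 0]
    # None means no potential standby is left to promote.
    new_sync = min(potentials, key=remaining.get) if potentials else None
    return remaining, new_sync


standbys = {"standby1": 1, "standby2": 2, "standby3": 0}
print(handle_standby_failure(standbys, "standby1", "standby3"))
# ({'standby1': 1, 'standby2': 2}, 'standby1')
print(handle_standby_failure(standbys, "standby1", "standby1"))
# ({'standby2': 2, 'standby3': 0}, 'standby2')
```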

Detecting Failures of Standby Servers
Streaming replication uses two common failure detection procedures, neither of which requires any special hardware:

  • Failure detection of the standby server process: When a connection drop between the walsender and walreceiver is detected, the primary server immediately determines that the standby server or walreceiver process is faulty.

  • Failure detection of hardware and networks: If a walreceiver returns nothing within the time set by the parameter wal_sender_timeout, the primary server determines that the standby server is faulty.
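Both detection procedures can be combined into one predicate. This is a simplified sketch: standby_is_faulty is a hypothetical name, timestamps are plain seconds, and the exact boundary comparison PostgreSQL uses internally may differ. The 60-second default and the rule that 0 disables the timeout reflect the real wal_sender_timeout setting.

```python
WAL_SENDER_TIMEOUT = 60.0  # seconds; the default value of wal_sender_timeout


def standby_is_faulty(connection_dropped: bool, last_reply_at: float, now: float,
                      timeout: float = WAL_SENDER_TIMEOUT) -> bool:
    """Apply the two failure-detection procedures to one standby.

    last_reply_at and now are timestamps in seconds.
    """
    if connection_dropped:
        # Procedure 1: a dropped walsender/walreceiver connection means the
        # standby server or its walreceiver process is faulty, immediately.
        return True
    if timeout <= 0:
        return False  # a wal_sender_timeout of 0 disables the timeout check
    # Procedure 2: the walreceiver has returned nothing for too long.
    return now - last_reply_at > timeout


print(standby_is_faulty(False, last_reply_at=100.0, now=130.0))  # False
print(standby_is_faulty(False, last_reply_at=100.0, now=170.0))  # True
```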
