Adam Furmanek for Metis

Originally published at metisdata.io

High Availability in SQL: A Guide to High Availability Databases

We want our applications to never fail. To achieve that, we need to guarantee that the data storage is always on. How do we do that with SQL databases? How do we make sure the engine never fails? And how do we scale it for better performance?

Introduction

High Availability (HA) means that the database is available at all times - 24 hours a day, 7 days a week. No matter whether we plan maintenance, no matter what the load is, no matter how many clients we have - the database must stay on. SQL high availability consists of specific technologies, custom solutions, and the processes around them.

One might find this surprising. How are we going to take the system down for maintenance and yet keep it running 24/7? There is no magic here: if one system needs to go down, then we need some other system that is up and running. Essentially, we need multiple copies of the database turned on, so that when one copy goes down, another copy can take the load.

However, things get much harder when we scale out. We would like the data storage to be as close to the users as possible: a database in the US for US-based users, and the same for Europe or Australia. If we regionalize our storage like this, then we may face issues when a single entity is modified from multiple continents. We somehow need to decide how to resolve such conflicts.

Let’s see how to achieve all of that.

Glossary

Before moving on, let us define some terms.

  • Master node - Primary node that controls the communication and serves as the source of truth. Also called primary, source, sender, coordinator.
  • Replica - Secondary node that follows the master but doesn’t decide anything on its own. Also called secondary, follower, slave, receiver.
  • Failover - Process in which a new master node is selected.
  • WAL (Write-Ahead Logging) - Technique in which changes are first written to a log file and only then applied to the actual data.
  • Read-only replica - Node that can execute read-only queries but cannot modify the data.
  • Synchronous operation - Operation that reports its status (success or failure) only after it completes on all nodes.
  • Asynchronous operation - Operation that returns before all internal tasks are complete.

Techniques

HA can be guaranteed with multiple approaches, like:

  • WAL replication
  • Data storage share
  • 2-Phase Commit (2PC)
  • 3-Phase Commit (3PC)

We’ll cover some of them in this blog post.

WAL replication

The WAL replication technique is based on the idea that we can send the Write-Ahead Log from one server to another. The sender decides whether a given transaction should be executed, then the WAL is shipped to the receiver, and the receiver applies logically the same changes to its own data storage.

There are multiple ways to replicate WAL. We can stream the logs over a TCP/IP connection. In this approach logs are sent as they appear, which minimizes the latency between applying changes on the primary and the secondary nodes. This requires a direct network connection between nodes, and may become unreliable if the connection breaks.
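As a concrete sketch, here is how streaming replication might be configured in PostgreSQL (version 12 or later); the hostname and user below are placeholders, not values from this article:

```ini
# postgresql.conf on the primary
wal_level = replica    # write enough WAL for a standby to replay
max_wal_senders = 5    # allow up to 5 concurrent streaming connections

# postgresql.conf on the standby (plus an empty standby.signal file in the data directory)
primary_conninfo = 'host=primary.example.com port=5432 user=replicator'
```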

Another way is to use so-called log shipping. In this approach log files are sent from the primary to the secondary periodically, over any suitable channel: SCP, FTP, a shared drive, cloud blob storage, etc. This introduces higher latency, but allows the replica to catch up over a longer period of time.
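In PostgreSQL, for example, log shipping can be realized with WAL archiving; a minimal sketch, assuming a shared directory mounted at /mnt/wal_archive (a placeholder path):

```ini
# postgresql.conf on the primary: copy each completed WAL segment to the archive
archive_mode = on
archive_command = 'cp %p /mnt/wal_archive/%f'

# postgresql.conf on the standby: fetch segments back from the archive during recovery
restore_command = 'cp /mnt/wal_archive/%f %p'
```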

Another approach is to keep the data storage and the log files consistent by mirroring them. This allows us to delegate log shipping to other infrastructure, so the database can simply “assume” the files are up to date. This can be achieved by mirroring drives or sharing network storage.

It’s possible to use all these approaches at the same time. This way we can minimize the latency (thanks to log streaming) and provide reliability in case of failures (thanks to log shipping). If we can use only one method, then log shipping generally provides higher reliability.

Depending on the database engine we use, there may be many more flavors of WAL replication. They can use NAS, shared drives, network connections, middleware agents, etc. For example, see the PostgreSQL documentation.

By default, replication works in an asynchronous mode: the transaction is committed by the primary server first, and the logs are sent to the replica afterwards. WAL replication can also be configured in a synchronous mode, in which the log is sent to the replicas and the transaction is confirmed only after the replicas receive the changes.
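In PostgreSQL, for instance, the synchronous mode is controlled by two settings; `standby1` below is a placeholder application name for the replica:

```ini
# postgresql.conf on the primary
synchronous_commit = on                             # wait for the standby before confirming a commit
synchronous_standby_names = 'FIRST 1 (standby1)'    # which standbys must acknowledge the WAL
```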

Data storage share

The data storage share approach works by sharing the same data storage between database instances. This can be as simple as physically connecting to the same storage (over a wire or a drive interface), or much more complex, with a shared DNS name, IP address, operating system, etc.

Not all databases support this approach. Some engines need additional configuration at the operating system level, for instance Windows Server Failover Clustering.

This approach is highly configurable and allows for easy and fast failover. However, it may require that only one host acts as the master, while all other hosts are read-only replicas.

2-Phase Commit and 3-Phase Commit

2-Phase Commit (2PC) and 3-Phase Commit (3PC) are protocols for running distributed transactions. They are based on the idea that changes are committed atomically across all participating databases. The general idea is:

  • There is a coordinator that knows what changes to introduce
  • The coordinator opens up a transaction on each of the nodes
  • The coordinator sends the changes
  • Each node prepares the changes and answers to the coordinator whether the transaction can be committed or not
  • The coordinator makes the final decision - either to commit the transaction or roll it back
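The steps above can be sketched as a toy in-memory simulation; this is not a real database driver, and the names Node and two_phase_commit are illustrative:

```python
class Node:
    """A participant that can prepare, then commit or roll back a transaction."""

    def __init__(self, name, will_succeed=True):
        self.name = name
        self.will_succeed = will_succeed
        self.state = "idle"

    def prepare(self):
        # Phase 1: tentatively apply the changes and vote yes/no.
        self.state = "prepared" if self.will_succeed else "aborted"
        return self.will_succeed

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"


def two_phase_commit(nodes):
    # Phase 1: the coordinator asks every node to prepare and collects votes.
    votes = [node.prepare() for node in nodes]
    # Phase 2: commit only if every node voted yes; otherwise roll everything back.
    if all(votes):
        for node in nodes:
            node.commit()
        return "committed"
    for node in nodes:
        if node.state == "prepared":
            node.rollback()
    return "rolled back"
```

A single failing participant forces every other node to roll back, which is exactly what makes the outcome atomic across the group.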

This protocol ensures that the changes are applied atomically across all the databases. 3PC differs in how a catastrophic outage (such as a coordinator failure) is handled: it adds an extra pre-commit phase to improve reliability.

Both 2PC and 3PC introduce additional round trips over the wire, which may reduce performance significantly. In exchange, they provide ACID guarantees across distributed databases.

2PC can be used in PostgreSQL with the PREPARE TRANSACTION statement. The typical flow is:

BEGIN;

-- Do the things

PREPARE TRANSACTION 't1';

-- Decide whether to commit or roll back, then run one of:

COMMIT PREPARED 't1';
-- or:
ROLLBACK PREPARED 't1';

Summary

Data storage is the most crucial component of our systems. We can’t let it go down; we need to make sure it works as expected and is always available. There are multiple ways to guarantee high availability, based on log shipping, data storage sharing, or even distributed transactions. Metis provides the ultimate monitoring solution to make sure your databases are always on. Sign up for free to test Metis now or book a demo with one of our reps for a walkthrough.
