Concurrency control may sound like a complex term, but it's an essential mechanism for maintaining atomicity and isolation in databases when multiple transactions are running simultaneously. Understanding the basics of concurrency control can help you build more robust and reliable applications. In this blog post, we'll go over the basic information required to understand how concurrency control is implemented in the PostgreSQL relational database management system.
Concurrency Control in Relational Databases
There are three broad concurrency control techniques: Multi-version Concurrency Control (MVCC), Strict Two-Phase Locking (S2PL), and Optimistic Concurrency Control (OCC). Each technique has its variations, but let's focus on the main ideas.
MVCC, as the name suggests, creates multiple versions of a data item when a write operation occurs while retaining the old version. When a transaction reads a data item, the system selects the appropriate version to ensure transaction isolation. One significant advantage of MVCC is that readers and writers don't block each other. For example, in an S2PL-based system, readers are blocked when a writer writes an item because the writer acquires an exclusive lock. PostgreSQL and some other Relational Database Management Systems (RDBMSs) use a variation of MVCC called Snapshot Isolation (SI), which offers similar benefits.
Concurrency Control in Postgres
To implement Snapshot Isolation, PostgreSQL uses rollback segments or a simpler method of inserting new data items directly into the relevant table page. When reading items, PostgreSQL applies visibility check rules to select the appropriate version for each transaction.
Snapshot Isolation (SI) prevents three anomalies defined in the ANSI SQL-92 standard: Dirty Reads, Non-Repeatable Reads, and Phantom Reads. However, SI falls short of achieving true serializability because it allows serialization anomalies like Write Skew and Read-only Transaction Skew. It's important to note that the ANSI SQL-92 standard's classical serializability definition is not equivalent to the definition in modern theory.
To address this issue, Serializable Snapshot Isolation (SSI) was introduced in PostgreSQL version 9.1. SSI can detect serialization anomalies and resolve conflicts caused by such anomalies, providing a true SERIALIZABLE isolation level. It's worth mentioning that SQL Server also uses SSI, while Oracle continues to use only SI.
We will now go over some basic concepts required to understand how concurrency control is implemented in postgres.
Transaction ID:
When a transaction begins in PostgreSQL, it is assigned a unique identifier called a transaction id(txid). The txid is a 32-bit unsigned integer, allowing for approximately 4.2 billion possible values. The built-in function txid_current() can be used to retrieve the current txid within a transaction.
There are three special txids in PostgreSQL:
- 0 represents an invalid txid.
- 1 is the bootstrap txid, used during the initialization of the database cluster.
- 2 is the frozen txid
PostgreSQL treats the txid space as a circular range due to limitations on the number of available txids. The previous 2.1 billion txids are considered "in the past," while the next 2.1 billion txids are considered "in the future." This circular range allows for efficient comparison and management of txids.
Tuple Structure
A heap tuple consists of three parts: the HeapTupleHeaderData structure, a NULL bitmap, and user data. The HeapTupleHeaderData structure contains several fields, listed below:
- t_xmin (transaction id of the inserting transaction)
- t_xmax (transaction id of the deleting or updating transaction)
- t_cid (command id indicating the number of executed SQL commands within the current transaction)
- t_ctid (tuple identifier pointing to itself or a new tuple).
Top comments (0)