DEV Community

Nile Lazarus
Nile Lazarus

Posted on • Updated on

Demystifying the Internals of PostgreSQL - Chapter 5

Welcome back to another step in our journey towards understanding the Internals of PostgreSQL.

In the last blog, we covered chapter 4 which explains Foreign Data Wrappers and Parallel Query.

Now we're going to be moving on to chapter 5 which explains how PostgreSQL manages Concurrency Control. Let's jump right in!

Introduction

When multiple transactions are running simultaneously in a database, concurrency control is needed to maintain atomicity and isolation which are two crucial ACID properties.

There are three concurrency control techniques commonly used:

  1. Multi-version Concurrency Control (MVCC)
  2. Strict Two-Phase Locking (S2PL)
  3. Optimistic Concurrency Control (OCC) Each technique has its own variations. PostgreSQL uses a variation of MVCC called Snapshot Isolation (SI). In MVCC, when a write is performed, a new version of the data item is created while also retaining the old version. Then when a transaction attempts to read a data item, the selects one of the old versions to ensure isolation. In this way, read operations dont block write operations and vice versa.

In SI used by other RDBMSs, old versions of the data items being written to are stored in rollback segments. In the variation PostgreSQL uses, the new data item is inserted directly into the target table. When reading an item, PostgreSQL selects the appropriate version of the item through visibility check rules.

SI does not allow Dirty Reads, Non-Repeatable Reads, and Phantom Reads. However, SI allows serialisation anomalies which makes it unable to achieve true serializability. To handle this issue, PostgreSQL uses Serializable Snapshot Isolation (SSI) which was added as of version 9.1. This enables PostgreSQL to offer true SERIALIZABLE isolation level.

Transaction ID

When a transaction begins, it is assigned a unique identifier called transaction id (txid) by the transaction manager. In postgreSQL, the txid is a 32-bit unsigned integer. To view the current transactions txid, execute the txid_current(). PostgreSQL will then return either the current txid or one of the three following txids reserved by it:

  • 0 which means Invalid txid
  • 1 which means Bootstrap txid (used in the initialization of database cluster)
  • 2 which means Frozen txid

txids can be compared with one another however a concept of past and present is to be kept in mind. If your current txid is 100, you can only view txids less than that since they are considered past. Transaction ids greater than that will not be viewable as they are considered future and hence invisible.

Tuple Structure

Heap tuples in table pages have three parts: HeapTupleHeaderData structure, NULL bitmap, and user data.

The HeapTupleHeaderData has seven fields, for of which are described in this chapter:

  • t_xmin contains the txid of the transaction which inserted this tuple.
  • t_xmax contains the txid of the transaction which deleted or updated this tuple while 0 would mean this tuple has not been deleted or updated as of yet
  • t_cid holds the current command id (cid)
  • t_ctid holds the tuple id (tid) that points to itself or to a new tuple id if tuple has been updated.

Inserting, Deleting and Updating Tuples

This section of the chapter contains in-depth examples of what happens when inserting, deleting or updating a tuple and of how a Free Space Map (FSM) is used by PostgreSQL.

I highly recommend reading this section from the book to better understand through the examples and diagrams provided.

Commit Log (clog)

The statuses of transactions performed are stored in the Commit Log (clog) in PostgreSQL.

PostgreSQL has 4 defined transaction states:

  • IN_PROGRESS: transaction is in progress
  • COMMITTED: transaction has been committed
  • ABORTED: transaction has been aborted
  • SUB_COMMITTED: denotes sub-transactions (not elaborated upon in the book)

The clog takes up one or more pages of 8KB each in the shared memory and logically forms an array. The indices of this array match respective transaction ids. The status of each transaction is stored at its respective index in the table.
When the current txid goes beyond the capacity of the clog, a new page is appended.

In the case that PostgreSQL shuts down or if the checkpoint process is run, the contents of the clog are copied into files stored in the pg_xact sub directory. The files are named from 0000, 0001 to so on and so forth. The maximum file size is 256 KB. When PostgreSQL starts up, the data from files stored in pg_xact is used to initialise the clog.

Transaction Snapshot

A transaction snapshot stores data about whether transactions are or are not active at a certain point in time.
In PostgreSQL, this is textually represented in a form such as '100 : 100 :' which denotes that txids less than 100 are not active and txids greater than or equal to 100 are active.

Transaction Snapshots are provided by the Transaction Manager. These snapshots are then used for visibility checks by PostgreSQL as mentioned above.

Visibility Check Rules

Visibility checks rules determine whether a tuple is invisible (future) or visible (past) by using the data stored in t_xmin, t_xmax, clog, and the transaction snapshot. This chapter only goes into the minimal rules used for visibility checks.

  • Rule 1: if t_xmin status is ABORTED, tuple is invisible
  • Rule 2: if t_xmin status is IN_PROGRESS, its value is equal to current txid, and t_xmax is INVALID then transaction is visible

If Status(t_xmin) = IN_PROGRESS ∧ t_xmin = current_txid ∧ t_xmax = INVALID ⇒ Visible

  • Rule 3: if t_xmin status is IN_PROGRESS and its value is equal to current txid (given that t_xmax is not INVALID), tuple is invisible

If Status(t_xmin) = IN_PROGRESS ∧ t_xmin = current_txid ∧ t_xmax ≠ INVALID ⇒ Invisible

  • Rule 4: if the tuple is inserted by another transaction (txid is not equal to current txid) and t_xmin staus is IN_PROGRESS, tuple is invisible

If Status(t_xmin) = IN_PROGRESS ∧ t_xmin ≠ current_txid ⇒ Invisible

  • Rule 5: If Status(t_xmin) = COMMITTED ∧ Snapshot(t_xmin) = active ⇒ Invisible

  • Rule 6: If Status(t_xmin) = COMMITTED ∧ (t_xmax = INVALID ∨ Status(t_xmax) = ABORTED) ⇒ Visible

  • Rule 7: If Status(t_xmin) = COMMITTED ∧ Status(t_xmax) = IN_PROGRESS ∧ t_xmax = current_txid ⇒ Invisible

  • Rule 8: If Status(t_xmin) = COMMITTED ∧ Status(t_xmax) = IN_PROGRESS ∧ t_xmax ≠ current_txid ⇒ Visible

  • Rule 9: If Status(t_xmin) = COMMITTED ∧ Status(t_xmax) = COMMITTED ∧ Snapshot(t_xmax) = active ⇒ Visible

  • Rule 10: If Status(t_xmin) = COMMITTED ∧ Status(t_xmax) = COMMITTED ∧ Snapshot(t_xmax) ≠ active ⇒ Invisible

This chapter goes into further detail about how these rules are implemented along with how Lost Updates* are prevented through scenarios and examples which I highly recommend reading.

  • a Lost Update (also called a ww-conflict) is an anomaly which occurs when two transactions attempt to updates the same rows simultaneously

Serializable Snapshot Isolation

If a cycle is created containing conflicts in the precedence graph, a serialisation anomaly will occur.

There are three types of conflicts: wr-conflicts (Dirty Reads), ww-conflicts (Lost Updates), and rw-conflicts. wr and ww conflicts are already prevented by PostgreSQL, hence SSI implementation in PostgreSQL only handles rw-conflicts using the following strategy:

  1. Record all objects (tuples, pages, relations) accessed by transactions as SIREAD locks.
  2. Detect rw-conflicts using SIREAD locks whenever any heap or index tuple is written.
  3. Abort the transaction if a serialisation anomaly is detected by checking detected rw-conflicts.

Required Maintenance Processes

The following maintenance processes are required by PostgreSQL's concurrency control mechanism:

  1. Remove dead tuples and index tuples that point to corresponding dead tuples
  2. Remove unnecessary parts of the clog
  3. Freeze old txids*
  4. Update FSM, VM, and the statistics
  • FREEZE is a process in PostgreSQL whereby a frozen txid is defined in such a way that it is always older than other txids and is hence always inactive and visible

Top comments (0)