In this part i will discuss basic information required for understanding concurrency control in Postgres.
PostgreSQL uses a variation of Multi-version Concurrency Control (MVCC) called Snapshot Isolation (SI).
In MVCC, Each write operation creates a new version of a data item while retaining the old version. When a transaction reads a data item, the system selects one of the versions to ensure isolation of the individual transaction. The main advantage of MVCC is that readers don’t block writers, and writers don’t block readers
In SI, A new data item is inserted directly into the relevant table page. When reading items, PostgreSQL selects the appropriate version of an item in response to an individual transaction by applying visibility check rules (will talk about this later).
Transaction ID
Whenever a transaction begins, a unique identifier, referred to as a transaction id (txid), which is a 32-bit unsigned integer assigned by the transaction manager.
The built-in txid_current() function can be used after a transaction starts to check the current txid.
However there are three special txids reserved by Postgres:
- 0:Invalid txid.
- 1:Bootstrap txid, which is only used in the initialization of the database cluster.
- 2:Frozen txid.
Txids are sequential, and for a given id n, all ids > n are invisible to it, and all ids < n are visible to it.
Tuple Structure
A heap tuple consists of three parts:
- HeapTupleHeaderData structure
- NULL bitmap
- user data
Here are four relevant fields of the HeapTupleHeaderData:
- t_xmin holds the txid of the transaction that inserted this tuple.
- t_xmax holds the txid of the transaction that deleted or updated this tuple. If this tuple has not been deleted or updated, t_xmax is set to 0, which means INVALID.
- t_cid holds the command id (cid), which means how many SQL commands were executed before this command was executed within the current transaction beginning from 0.
- t_ctid holds the tuple identifier (tid) that points to itself or a new tuple. tid is used to identify a tuple within a table. When this tuple is updated, the t_ctid of this tuple points to the new tuple; otherwise, the t_ctid points to itself.
Free Space Map (FSM)
When inserting a heap or an index tuple, PostgreSQL uses the FSM of the corresponding table or index to select the page which can be inserted it.
Commit Log (clog)
PostgreSQL holds the statuses of transactions in the Commit Log (often called the clog). It is allocated to the shared memory, and is used throughout transaction processing.
Transaction Status
PostgreSQL defines four transaction states:
- IN_PROGRESS
- COMMITTED
- ABORTED
- SUB_COMMITTED (for sub-transactions)
How it works
The clog comprises one or more 8 KB pages in shared memory. The clog logically forms an array. The indices of the array correspond to the respective transaction ids, and each item in the array holds the status of the corresponding transaction id.
Maintenance of the Clog
When PostgreSQL shuts down or whenever the checkpoint process runs, the data of the clog are written into files stored under the pg_xact subdirectory (called pg_clog in Version 9.6 or earlier.) These files are named 0000, 0001.
When PostgreSQL starts up, the data stored in the pg_xact's files are loaded to initialize the clog.
The size of the clog continuously increases on each new page, but since not all data in the clog are necessary, the VACUUM process regularly removes old data (both the clog pages and files).
Transaction Snapshot
A transaction snapshot is a dataset that stores information about whether all transactions are active at a certain point in time for an individual transaction. Here an active transaction means it is in progress or has not yet started.
When using the obtained snapshot for the visibility check, active transactions in the snapshot must be treated as in progress even if they have actually been committed or aborted. This rule is important because it causes the difference in the behaviour between READ COMMITTED and REPEATABLE READ.
Visibility Check Rules
Visibility check rules are a set of rules used to determine whether each tuple is visible or invisible using both the t_xmin and t_xmax of the tuple, the clog, and the obtained transaction snapshot.
Following are some minimal rules:
Status of t_xmin is ABORTED
Rule 1: A tuple whose t_xmin status is ABORTED is always invisible because the transaction that inserted this tuple has been aborted.
Status of t_xmin is IN_PROGRESS
Rules 3 & 4: A tuple whose t_xmin status is IN_PROGRESS is invisible, except under one condition.
Status of t_xmin is COMMITTED
A tuple whose t_xmin status is COMMITTED is visible (Rules 6,8, and 9), except under three conditions.
Top comments (0)