DEV Community

Hadi Atef
Hadi Atef

Posted on

" The Internals of Postgresql" chapter #5

The transaction id

In PostgreSQL, a transaction id (txid) is a unique identifier assigned by the transaction manager to each transaction. The built-in function txid_current() returns the current txid of a transaction. PostgreSQL reserves three special txids: 0 for Invalid txid, 1 for Bootstrap txid used in database cluster initialization, and 2 for Frozen txid.

Txids can be compared with each other, and from the viewpoint of a particular txid, txids greater than it are considered to be 'in the future' and invisible, while txids less than it are 'in the past' and visible. However, since the txid space is limited to approximately 4.2 billion, PostgreSQL treats the txid space as a circle, where the previous 2.1 billion txids are 'in the past' and the next 2.1 billion txids are 'in the future'. This means that the txid space can wrap around, causing the so-called txid wraparound problem.

Tuple Structure

A heap tuple in a PostgreSQL table page consists of three parts: the HeapTupleHeaderData structure, a NULL bitmap, and user data.

The HeapTupleHeaderData structure contains seven fields, but four are relevant for subsequent sections. These fields are:

  • t_xmin: which holds the transaction ID of the transaction that inserted the tuple.
  • t_xmax: which holds the transaction ID of the transaction that deleted or updated the tuple.
  • t_cid: which holds the command ID indicating how many SQL commands were executed before the current command within the transaction.
  • t_ctid: which holds the tuple ID that points to itself or a new tuple when updated.

Insert, Update and Delete

This section explains how tuples are inserted, deleted, and updated in PostgreSQL.

  • Insertion: a new tuple is inserted directly into a page of the target table.
  • Deletion: the target tuple is deleted logically by setting the value of the txid that executes the DELETE command to the t_xmax of the tuple.
  • Update: tuple is logically deleted, and a new one is inserted.

PostgreSQL uses the Free Space Map (FSM) to select the page in which a tuple can be inserted and to store information about the free space capacity of each page within the corresponding table or index file. Dead tuples should eventually be removed from pages through VACUUM processing.

Transaction Snapshots

A transaction snapshot is a dataset that stores information about whether all transactions are active at a certain point in time for an individual transaction.

PostgreSQL internally defines the textual representation format of transaction snapshots as 'xmin:xmax:xip_list'.

The function pg_current_snapshot shows a snapshot of the current transaction.
In READ COMMITTED isolation level, the transaction obtains a snapshot whenever an SQL command is executed. Otherwise (REPEATABLE READ or SERIALIZABLE), the transaction only gets a snapshot when the first SQL command is executed.

The obtained transaction snapshot is used for a visibility check of tuples.

Active transactions in the snapshot must be treated as in progress even if they have actually been committed or aborted.

The transaction manager always holds information about currently running transactions. Transaction snapshots are provided by the transaction manager.

Visibility check rules

In database systems, visibility check rules are used to determine whether a tuple (row) in a table is visible or invisible to a transaction, based on the transaction's snapshot, the commit and abort status of the transactions that have modified the tuple, and the tuple's transaction information (t_xmin and t_xmax). There are 10 rules in total, which can be classified into three cases depending on the status of t_xmin: ABORTED, IN_PROGRESS, and COMMITTED.

  • If t_xmin status is ABORTED, the tuple is always invisible (Rule 1) because the transaction that inserted the tuple has been aborted.

  • If t_xmin status is IN_PROGRESS, the tuple is essentially invisible (Rules 3 and 4), except when it was inserted by the current transaction and t_xmax is not INVALID, in which case it must be visible (Rule 2). Otherwise, if the tuple was inserted by another transaction or has been updated or deleted by the current transaction, it is invisible (Rules 3 and 4).

  • If t_xmin status is COMMITTED, the tuple is visible (Rules 6, 8, and 9), unless t_xmin is active in the obtained transaction snapshot (Rule 5), or t_xmax is the current txid (Rule 7), or t_xmax is COMMITTED but not active in the obtained transaction snapshot (Rule 10). Otherwise, if t_xmax is INVALID or ABORTED, the tuple is visible (Rule 6), or if t_xmax is IN_PROGRESS and not the current txid, the tuple is also visible (Rule 8), or if t_xmax is COMMITTED and active in the obtained transaction snapshot, the tuple is visible (Rule 9).

Visibility check

The visibility check is performed based on the transaction isolation level and the status of the transaction ID (txid), which is obtained using functions like TransactionIdIsInProgress, TransactionIdDidCommit, and TransactionIdDidAbort.

PostgreSQL uses hint bits to efficiently check the status of t_xmin and t_xmax of each tuple and sets them to the t_infomask of the tuple when reading or writing the tuple.

By following these rules and using hint bits, PostgreSQL ensures that each transaction sees a consistent view of the data and prevents the anomalies defined in the ANSI SQL-92 Standard.

Preventing Lost Updates

To prevent Lost Updates, PostgreSQL uses a combination of locking and transaction isolation levels.

When an UPDATE command is executed, the function ExecUpdate is invoked.

ExecUpdate checks if the target row is being updated by another transaction, and if so, it waits for that transaction to complete before proceeding.

If the transaction that updated the target row has committed and the isolation level of the current transaction is REPEATABLE READ or SERIALIZABLE, the current transaction is aborted to prevent a Lost Update.

If the target row has been updated by another concurrent transaction, and the isolation level of the current transaction is READ COMMITTED, the current transaction updates the target row. otherwise, it is aborted.

If the target row has not been updated, the current transaction updates the target row and proceeds to the next row to be updated.
This process is repeated until all rows have been updated or a Lost Update is prevented.

Serializable Snapshot Isolation

Serializable Snapshot Isolation (SSI) is a technique implemented in PostgreSQL to provide a true SERIALIZABLE isolation level. It detects rw-conflicts, which are conflicts where one transaction reads a tuple and another transaction writes to the same tuple, and prevents serialization anomalies such as Write-Skew.

SSI works by recording all objects accessed by transactions as SIREAD locks and detecting rw-conflicts using these locks whenever any heap or index tuple is written. If a serialization anomaly is detected, the transaction causing the anomaly is aborted.

SSI uses two main data structures: SIREAD locks and rw-conflicts, which are stored in shared memory.

  • SIREAD locks are created for all objects accessed by transactions in SERIALIZABLE mode and have three levels: tuple, page, and relation.
  • Rw-conflicts are created by the CheckTargetForConflictIn function whenever an INSERT, UPDATE, or DELETE command is executed in SERIALIZABLE mode and a conflict is detected.

The CheckTargetForConflictOut and PreCommit_CheckForSerializationFailure functions check for serialization anomalies using the rw-conflicts and abort transactions if necessary.

Required Maintenance Processes

PostgreSQL's concurrency control mechanism requires several maintenance processes including removing dead tuples and index tuples, removing unnecessary parts of the clog, freezing old transaction IDs, and updating FSM, VM, and statistics.

The transaction ID wraparound problem can occur when the server has been running for a long time and the current transaction ID exceeds a certain limit. To address this, PostgreSQL introduced a frozen transaction ID and implemented a freeze process, which is invoked by the VACUUM command and rewrites the t_xmin of tuples to the frozen transaction ID if they are older than the current transaction ID minus the vacuum_freeze_min_age.

In version 9.4 or later, the XMIN_FROZEN bit is set to the t_infomask field of tuples rather than rewriting the t_xmin of tuples to the frozen transaction ID.

Top comments (0)