This blog aims to assist you in understanding the initial concepts of Chapter:5 [Concurrency Control] from the book The Internals of PostgreSQL.
Note: Ensure that you have a thorough understanding of Chapter 4 and basics of PostreSQL before we proceed to Chapter 5 Part-1, as it forms the foundation for our exploration.
So, Let's Start:
Introduction to Chapter
Concurrency Control ensures atomicity and isolation in databases when multiple transactions run concurrently.
Three main concurrency control techniques are Multi-version Concurrency Control (MVCC), Strict Two-Phase Locking (S2PL), and Optimistic Concurrency Control (OCC).
MVCC creates new versions of data items during write operations while retaining old versions to ensure isolation.
MVCC's main advantage is that it allows readers and writers to operate concurrently without blocking each other.
PostgreSQL and some RDBMSs use a variation of MVCC called Snapshot Isolation (SI) for implementing concurrency control.
SI avoids anomalies such as Dirty Reads, Non-Repeatable Reads, and Phantom Reads.
Serializable Snapshot Isolation (SSI) was introduced in PostgreSQL version 9.1 to address serialization anomalies and provide true SERIALIZABLE isolation level.
Transaction ID
Whenever a transaction begins, a unique identifier, referred to as a transaction id (txid), is assigned by the transaction manager.
PostgreSQL's txid is a 32-bit unsigned integer.
If you execute the built-in txid_current() function after a transaction starts, the function returns the current txid as follows.
- PostgreSQL reserves the following three special txids:
0 means Invalid txid.
1 means Bootstrap txid, which is only used in the initialization of the database cluster.
2 means Frozen txid.Transaction ids in PostgreSQL is depicted in figure below:
Tuple Structure
Heap tuples in table pages are classified as a usual data tuple and a TOAST tuple.
A heap tuple comprises three parts, i.e. the HeapTupleHeaderData structure, NULL bitmap, and user data.
Tuple structure in PostgreSQL is depicted in figure below:
The HeapTupleHeaderData structure is defined in
src/include/access/htup_details.h
.While the HeapTupleHeaderData structure contains seven fields, four fields are required in the subsequent sections.
t_xmin holds the txid of the transaction that inserted this tuple.
t_xmax holds the txid of the transaction that deleted or updated this tuple. If this tuple has not been deleted or updated, t_xmax is set to 0, which means INVALID.
t_cid holds the command id (cid), which means how many SQL commands were executed before this command was executed within the current transaction beginning from 0.
t_ctid holds the tuple identifier (tid) that points to itself or a new tuple. When this tuple is updated, the t_ctid of this tuple points to the new tuple; otherwise, the t_ctid points to itself.
Inserting, Deleting and Updating Tuples
Representation of tuples in PostgreSQL is depicted in figure below:
Insertion
- With the insertion operation, a new tuple is inserted directly into a page of the target table.
Tuple insertion in PostgreSQL is depicted in figure below:
Analysis of above tuple insertion is given below:
- t_xmin is set to 99 because this tuple is inserted by txid 99.
- t_xmax is set to 0 because this tuple has not been deleted or updated.
- t_cid is set to 0 because this tuple is the first tuple inserted by txid 99.
- t_ctid is set to (0,1), which points to itself, because this is the latest tuple.
Deletion
- In the deletion operation, the target tuple is deleted logically. The value of the txid that executes the DELETE command is set to the t_xmax of the tuple.
Tuple deletion in PostgreSQL is depicted in figure below:
Analysis of above tuple deletion is given below:
t_xmax is set to 111.
If txid 111 is committed, Tuple_1 is no longer required. Generally, unneeded tuples are referred to as dead tuples in PostgreSQL.
Dead tuples should eventually be removed from pages. Cleaning dead tuples is referred to as VACUUM Processing.
Update
- In the update operation, PostgreSQL logically deletes the latest tuple and inserts a new one.
Tuple updation twice the row in PostgreSQL is depicted in figure below:
- As with the delete operation, if txid 100 is committed, Tuple_1 and Tuple_2 will be dead tuples, and, if txid 100 is aborted, Tuple_2 and Tuple_3 will be dead tuples.
Free Space Map
When inserting a heap or an index tuple, PostgreSQL uses the Free Space Map (FSM) of the corresponding table or index to select the page which can be inserted it.
Each FSM stores the information about the free space capacity of each page within the corresponding table or index file.
pg_freespacemap in PostgreSQL is depicted in figure below:
Commit Log (Clog)
PostgreSQL holds the statuses of transactions in the Commit Log also known as Clog.
Clog is allocated to the shared memory, and is used throughout transaction processing.
Transaction Status
- PostgreSQL defines four transaction states, i.e. IN_PROGRESS, COMMITTED, ABORTED, and SUB_COMMITTED.
The first three statuses are obvious. For example, when a transaction is in progress, its status is IN_PROGRESS, etc.
SUB_COMMITTED is for sub-transactions, and its description is omitted in this document.
How Clog Performs
How the Clog operates in PostgreSQL is depicted in figure below:
The Clog comprises one or more 8 KB pages in shared memory.
The Clog logically forms an array.
The indices of the array correspond to the respective transaction ids, and each item in the array holds the status of the corresponding transaction id.
When the current txid advances and the Clog can no longer store it, a new page is appended.
When the status of a transaction is needed, the internal functions are invoked. Those functions read the Clog and return the status of the requested transaction.
Maintenance of the Clog
When PostgreSQL shuts down or during checkpoint processes, the Clog data is written into files stored in the pg_xact (pg_clog) subdirectory.
Files in pg_xact are named 0000, 0001, etc., with a maximum file size of 256 KB.
When PostgreSQL starts up, the data stored in the pg_xact's files are loaded to initialize the clog.
The clog size grows continuously by appending new pages when it gets filled up.
Regular vacuum processing removes old data from the clog, including both pages and files.
I hope, this blog has helped you in understanding the initial concepts of Concurrency Control in PostreSQL.
Check out summary of Chapter : 5 Part-2
If you want to understand PostgreSQL In-Depth.
Top comments (0)