Internals of PostgreSQL Chapter 5: Concurrency Control

This chapter covers the transaction isolation levels and the concurrency control techniques used in PostgreSQL. It introduces three broad strategies: Multi-version Concurrency Control (MVCC), Strict Two-Phase Locking (S2PL), and Optimistic Concurrency Control (OCC). PostgreSQL relies mainly on MVCC, specifically a variant known as Snapshot Isolation (SI).

Snapshot Isolation:
Under Snapshot Isolation (SI), each write creates a new version of a data item and keeps the previous version. When a transaction reads a data item, the system selects the version that is appropriate for that transaction, which keeps the transaction isolated. The main benefit of MVCC is that readers do not block writers and writers do not block readers. By contrast, a lock-based scheme such as S2PL must block readers while a writer is writing an item, because the writer holds an exclusive lock on it.
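The idea of keeping every version and letting each reader pick one can be sketched in a few lines of Python. This is a toy model only, not how PostgreSQL stores data: the names ToyMVCCStore, Version, and snapshot are made up for illustration, and a "snapshot" is assumed to be simply the set of transaction IDs that had committed when it was taken.

```python
from dataclasses import dataclass, field


@dataclass
class Version:
    value: object
    created_by: int  # txid of the transaction that wrote this version


@dataclass
class ToyMVCCStore:
    """Toy model: every write appends a version; nothing is overwritten."""
    versions: dict = field(default_factory=dict)  # key -> list of Version
    committed: set = field(default_factory=set)   # txids that have committed

    def write(self, txid, key, value):
        # A write never touches older versions, so readers are not blocked.
        self.versions.setdefault(key, []).append(Version(value, txid))

    def commit(self, txid):
        self.committed.add(txid)

    def snapshot(self):
        # Assumption: a snapshot is just the set of txids committed so far.
        return frozenset(self.committed)

    def read(self, key, snapshot, own_txid=None):
        # Walk from newest to oldest and return the first version written
        # by a transaction in the snapshot, or by the reader itself.
        for version in reversed(self.versions.get(key, [])):
            if version.created_by in snapshot or version.created_by == own_txid:
                return version.value
        return None


store = ToyMVCCStore()
store.write(1, "x", "old")
store.commit(1)

reader_snapshot = store.snapshot()   # reader starts here
store.write(2, "x", "new")           # concurrent writer, reader is not blocked
before_commit = store.read("x", reader_snapshot)
store.commit(2)
after_commit = store.read("x", reader_snapshot)   # same snapshot, same answer
fresh = store.read("x", store.snapshot())         # a later snapshot sees the new version
```

The reader keeps seeing "old" for as long as it uses its original snapshot, even after the writer commits, while a transaction that takes its snapshot afterwards sees "new".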

Some systems, such as Oracle, implement SI with rollback segments: the old version of an item is moved to a rollback segment and the new version overwrites it in the data area. PostgreSQL takes a simpler approach and inserts each new version directly into the relevant table page. When reading, PostgreSQL applies visibility check rules to choose the right version for each transaction. SI prevents all three anomalies defined in the ANSI SQL-92 standard: dirty reads, non-repeatable reads, and phantom reads.

SI alone is not truly serializable, however, because it still allows serialization anomalies such as Write Skew and Read-only Transaction Skew. To address this, PostgreSQL 9.1 added Serializable Snapshot Isolation (SSI). SSI detects serialization anomalies and resolves the conflicts they cause, which gives PostgreSQL a true SERIALIZABLE isolation level.
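Write Skew is easier to see with a small example. The sketch below is a toy simulation in plain Python, not PostgreSQL code: it assumes two hypothetical on-call doctors, an invariant that at least one of them stays on call, and a simplified SI rule that only rejects concurrent transactions that wrote the same item.

```python
def run_write_skew():
    # Committed state: both doctors are on call.
    # Invariant the application wants: at least one doctor stays on call.
    committed = {"alice": True, "bob": True}

    # Both transactions take their snapshot before either one commits.
    snap_t1 = dict(committed)
    snap_t2 = dict(committed)

    # T1: Alice goes off call if Bob is on call in T1's snapshot.
    writes_t1 = {}
    if snap_t1["bob"]:
        writes_t1["alice"] = False

    # T2: Bob goes off call if Alice is on call in T2's snapshot.
    writes_t2 = {}
    if snap_t2["alice"]:
        writes_t2["bob"] = False

    # Simplified SI rule: a conflict exists only if both wrote the same item.
    conflict = set(writes_t1) & set(writes_t2)
    if not conflict:
        committed.update(writes_t1)
        committed.update(writes_t2)
    return committed, conflict


final_state, conflict = run_write_skew()
invariant_holds = any(final_state.values())
```

Each transaction saw a consistent snapshot and wrote a different row, so the simplified SI rule finds no conflict and both commit, yet the final state has nobody on call. No serial order of T1 and T2 could produce that result, and it is this kind of anomaly that SSI is designed to detect.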

Tuple Structure:
A PostgreSQL heap tuple consists of three parts: the HeapTupleHeaderData structure, a NULL bitmap, and the user data. This summary focuses on HeapTupleHeaderData, which holds the fields used to manage the tuple.

HeapTupleHeaderData is defined in src/include/access/htup_details.h. It contains seven fields, but this overview covers only the four that matter for the sections that follow:
1) t_xmin
2) t_xmax
3) t_cid
4) t_ctid

These four fields record which transaction inserted the tuple (t_xmin), which transaction deleted or updated it (t_xmax), the command ID within the inserting transaction (t_cid), and the tuple identifier (t_ctid), which points either to the tuple itself or to a newer version of it.
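How the four fields change on insert, delete, and update can be sketched with a toy model. The class names ToyTupleHeader and ToyPage are invented for illustration, the real structure is a C struct with more fields, and details such as page layout and the Free Space Map are ignored here.

```python
from dataclasses import dataclass


@dataclass
class ToyTupleHeader:
    t_xmin: int    # txid that inserted this tuple
    t_xmax: int    # txid that deleted or updated it; 0 means not set
    t_cid: int     # command id within the inserting transaction, from 0
    t_ctid: tuple  # (block, offset) of this tuple or of its newer version
    data: str


class ToyPage:
    """Toy single-page table: offsets start at 1, as in a tuple identifier."""

    def __init__(self, block=0):
        self.block = block
        self.tuples = []  # the tuple at offset n is stored at index n - 1

    def insert(self, txid, cid, data):
        offset = len(self.tuples) + 1
        # A new tuple points at itself and has no t_xmax yet.
        self.tuples.append(
            ToyTupleHeader(txid, 0, cid, (self.block, offset), data)
        )
        return offset

    def delete(self, txid, offset):
        # Deletion is logical: only t_xmax is set, the tuple stays in place.
        self.tuples[offset - 1].t_xmax = txid

    def update(self, txid, cid, offset, data):
        # An update marks the old version as deleted and inserts a new one.
        new_offset = self.insert(txid, cid, data)
        old = self.tuples[offset - 1]
        old.t_xmax = txid
        old.t_ctid = (self.block, new_offset)  # old version points to the new
        return new_offset


page = ToyPage()
first = page.insert(txid=99, cid=0, data="A")
second = page.update(txid=100, cid=0, offset=first, data="B")
old, new = page.tuples[first - 1], page.tuples[second - 1]
```

After the update, the old tuple has t_xmin 99, t_xmax 100, and a t_ctid pointing at the new tuple, while the new tuple has t_xmin 100, t_xmax 0, and a t_ctid pointing at itself.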

Brief Summary:
The chapter gives a thorough explanation of concurrency control in PostgreSQL, concentrating on MVCC-based Snapshot Isolation (SI). It describes how PostgreSQL implements SI and the advantages of MVCC over other concurrency control techniques. It also explains how Serializable Snapshot Isolation (SSI) closes the gap between SI and true serializability.

The chapter's structure makes both the concepts and their implementation details easy to follow. It covers transaction IDs, tuple structure, tuple insertion, deletion, and update, and the role of the commit log (clog) in tracking transaction status. It also outlines the Free Space Map (FSM) and how it relates to tuple insertion and update.
