Harry Do

Posted on Oct 19

Part 4: MySQL vs PostgreSQL - Transaction Processing and ACID Compliance

#postgres #database #mysql #architecture

1. Overview: Key Architectural Differences
2. Isolation Levels and Concurrency Control
3. MVCC and Transaction Isolation
4. SERIALIZABLE Isolation: Pessimistic vs Optimistic Strategies

Transaction processing reveals fundamental architectural differences between MySQL and PostgreSQL. MySQL prioritizes performance and predictability through pessimistic locking. PostgreSQL prioritizes consistency and concurrency through optimistic conflict detection. Understanding these differences will help you write better applications and avoid subtle data integrity issues.

1. Overview: Key Architectural Differences

This section provides a high-level comparison of how MySQL and PostgreSQL handle transactions. We'll cover three fundamental areas where they differ: ACID compliance, default isolation levels, and MVCC implementation.

Core Architecture Comparison

Aspect	MySQL 8.4	PostgreSQL 17
ACID Compliance	Engine-dependent - InnoDB provides full ACID, MyISAM does not	Built-in - ACID compliance is part of core architecture
Default Isolation Level	REPEATABLE READ - Stricter by default	READ COMMITTED - More concurrent by default
MVCC Implementation	Undo log - Automatic cleanup, zero maintenance	Tuple versioning - Requires VACUUM maintenance
Phantom Read Prevention	Gap locking (at REPEATABLE READ)	Snapshot isolation (at REPEATABLE READ)

What Each Difference Means

ACID Compliance:
MySQL's ACID support depends on which storage engine you use. InnoDB (the default) provides full ACID compliance, but MyISAM does not. PostgreSQL has ACID compliance built into its core architecture—every storage mechanism supports it.

Default Isolation Levels:
MySQL defaults to REPEATABLE READ, providing stronger isolation out of the box. PostgreSQL defaults to READ COMMITTED, favoring higher concurrency. Both prevent phantom reads at REPEATABLE READ level, but through different mechanisms (gap locking vs snapshot isolation).

MVCC Maintenance:
MySQL's undo log approach means old row versions are automatically cleaned up by a background purge thread—zero operational overhead. PostgreSQL's tuple versioning approach stores multiple row versions in the table itself, requiring VACUUM to reclaim space from dead tuples.

2. Isolation Levels and Concurrency Control

💡 Key Insight: Both MySQL and PostgreSQL are overachievers—they prevent phantom reads at REPEATABLE READ even though the SQL standard doesn't require it! 🎓 But here's the plot twist: MySQL is like that friend who locks ALL the doors (gap locking), while PostgreSQL takes a snapshot and says "trust, but verify" (snapshot isolation). Different philosophies, different trade-offs!

2.1 Isolation Levels Comparison

The SQL standard defines four isolation levels, but MySQL and PostgreSQL interpret them quite differently. The key difference? Their default choices reveal their priorities.

Isolation Level	MySQL 8.4 Behavior	PostgreSQL 17 Behavior
READ UNCOMMITTED	Rarely Used - In practice behaves like READ COMMITTED in InnoDB	Not truly supported - Behaves exactly like READ COMMITTED
READ COMMITTED	Available - Allows non-repeatable reads and phantom reads	Default - Allows non-repeatable reads and phantom reads
REPEATABLE READ	Default - Gap locking + MVCC prevents phantoms, allows write-skew	Available - Snapshot isolation prevents phantoms, allows write-skew
SERIALIZABLE	Pessimistic - Converts plain SELECTs to FOR SHARE, prevents conflicts via locking	Optimistic - SSI detects conflicts at commit, can abort transactions

Notice the defaults? MySQL ships with REPEATABLE READ—it's saying "I'll protect you even if it means blocking more." PostgreSQL ships with READ COMMITTED—it's saying "Let's be fast and concurrent; upgrade isolation when you need it."

Here's the fascinating part: Both databases go beyond the SQL standard at REPEATABLE READ by preventing phantom reads (when new rows appear in repeated queries). The standard doesn't require this! But they achieve it in completely opposite ways:

MySQL's approach: Use gap locks to physically block inserts that would create phantoms. It's like putting a "Reserved" sign on every empty seat at a restaurant.
PostgreSQL's approach: Take a snapshot and ignore any new rows added after. It's like taking a photo of the restaurant—even if new people arrive, your photo stays the same.

Critical insight for your app: Neither prevents write-skew at REPEATABLE READ (we'll explain this next). If you need protection against sophisticated anomalies, both databases require SERIALIZABLE—but they implement it VERY differently.

2.2 Write-Skew Anomaly

What is Write-Skew?

Write-skew is the sneakiest of database anomalies—it's like two people independently making perfectly reasonable decisions that somehow create chaos when combined. 🤔

Let me explain with a real-world scenario that'll make this crystal clear.

The Doctor On-Call Scheduling Problem

Picture this: You're running a hospital with one iron-clad rule—at least 2 doctors must be on-call at all times. Right now, Alice, Bob, and Charlie are all on-call, so you're safely above the minimum. 🏥

It's Friday evening, and both Alice and Bob are exhausted. They each want to go off-call, but they're responsible professionals—they'll only leave if there are still enough doctors remaining.

Here's what happens at REPEATABLE READ isolation level:

Timeline:

Time    Transaction 1 (Alice)                    Transaction 2 (Bob)
----    ---------------------------              ---------------------------
T1      BEGIN                                    BEGIN
T2      SELECT COUNT(*) FROM doctors
        WHERE on_call = true;
        → Returns 3 ✅

T3                                               SELECT COUNT(*) FROM doctors
                                                 WHERE on_call = true;
                                                 → Returns 3 ✅

T4      "Great! 3 doctors on-call,
        safe for me to leave."
        UPDATE doctors
        SET on_call = false
        WHERE name = 'Alice';

T5                                               "Great! 3 doctors on-call,
                                                 safe for me to leave."
                                                 UPDATE doctors
                                                 SET on_call = false
                                                 WHERE name = 'Bob';

T6      COMMIT ✅                                COMMIT ✅

The Result: Both transactions succeed! But now only Charlie is on-call. 😱 The "at least 2" constraint is violated!

Why did this happen?

This is write-skew. Here's what makes it so tricky:

Both transactions read the same data (count of on-call doctors = 3)
Both made reasonable decisions based on what they saw
Both updated DIFFERENT rows (Alice's record vs Bob's record)
No direct conflict occurred—traditional locks see no problem!
The constraint was violated because neither transaction saw the other's changes

In database terms: Write-skew occurs when two concurrent transactions read overlapping data sets, make decisions based on what they read, then write to disjoint (non-overlapping) sets of rows in a way that violates an integrity constraint.

Why don't traditional locks catch this?

Because there's no write-write conflict! Alice updates Alice's row. Bob updates Bob's row. Different rows = no lock conflict. The database doesn't know that these two independent updates, when combined, violate a business rule. 🤷‍♂️

It's like two people checking the fridge, seeing 3 beers, each taking one—nobody's stealing the other's beer, but somehow the third roommate ends up with nothing. The reads overlap (both see the same count), but the writes don't (different rows).

The Key Difference:

Aspect	MySQL REPEATABLE READ	PostgreSQL REPEATABLE READ
Write-Skew Protection	❌ Not prevented	❌ Not prevented
Why It Occurs	Gap locks only prevent phantom reads, not cross-row constraint violations	Snapshot Isolation doesn't detect conflicts on different rows
Solution	Must use SERIALIZABLE or explicit locking (`SELECT ... FOR UPDATE`)	Must use SERIALIZABLE (SSI detects and aborts) or explicit locking

Bottom Line: Both databases allow write-skew at REPEATABLE READ. The real difference appears at SERIALIZABLE level—PostgreSQL's SSI can detect and abort write-skew patterns automatically, while MySQL's gap locking only prevents conflicts through blocking, not intelligent detection.

2.3 Gap Locking vs Predicate Locks

Remember how we said both MySQL and PostgreSQL prevent phantom reads at REPEATABLE READ? This is how they do it—and the approaches couldn't be more different. Understanding this difference is crucial because it directly impacts your application's concurrency under load.

The Core Problem: Phantom Reads

Imagine you're counting inventory:

SELECT COUNT(*) FROM products WHERE category = 'Electronics';
-- Returns 100

During your transaction, someone inserts a new product with category = 'Electronics'. If you run the same query again and suddenly get 101, that's a phantom read—a row that "appeared" out of nowhere. 👻

Both databases prevent this at REPEATABLE READ, but with radically different strategies.

MySQL's Gap Locking: The Pessimistic Gatekeeper

MySQL uses gap locks—it literally locks the "gaps" (empty spaces) in the index between existing records. Think of it like reserving not just the occupied tables at a restaurant, but also the empty spaces between them.

How it works:

When you run a query with FOR UPDATE, MySQL doesn't just lock the rows that match—it locks the index ranges those rows occupy, including the gaps. This physically prevents anyone from inserting new rows that would fall into those gaps.

Example: Lock products with IDs between 100-200:

SELECT * FROM products WHERE id BETWEEN 100 AND 200 FOR UPDATE;

MySQL locks:

All existing rows with IDs 100-200 (obviously)
The gap before 100 (e.g., 95-99)
The gap after 200 (e.g., 201-205)
All gaps between existing rows in the range

The catch: Gap locks are range-based, not logic-based. They don't understand your WHERE clause's full meaning—they just lock index ranges. This can block inserts that are logically unrelated to your query.

PostgreSQL's Predicate Locks: The Precision Specialist

PostgreSQL uses predicate locks (also called SIREAD locks)—it remembers the actual predicate (WHERE clause) you used and only blocks operations that would violate that specific predicate.

How it works:

Instead of locking physical gaps in an index, PostgreSQL tracks the logical condition of your query. It says "I'm watching for any inserts that match WHERE category = 'Electronics'"—and only those get blocked.

Example: Lock electronics products:

SELECT * FROM products WHERE category = 'Electronics' FOR UPDATE;

PostgreSQL blocks:

Only inserts where category = 'Electronics'
Inserts with other categories proceed freely

The advantage: Higher concurrency because only actual predicate matches get blocked. If your query was WHERE category = 'Electronics' AND price > 100, PostgreSQL only blocks inserts matching both conditions.

The Key Difference:

Aspect	MySQL (Gap Locking)	PostgreSQL (Predicate Locks)
Locking Scope	🔴 Range-based - Locks index gaps, may block unrelated inserts	🟢 Predicate-based - Only locks rows matching WHERE clause
Blocking Behavior	Blocks inserts in or near locked range, even outside query conditions	Blocks only inserts that match the exact query predicate
Concurrency Impact	Lower - Can block logically unrelated operations	Higher - Only blocks actual conflicts
Example	Query `WHERE date BETWEEN '2024-02-01' AND '2024-06-30'` may block insert at `2024-01-31` (adjacent gap)	Same query only blocks inserts matching both date range AND other conditions

Example Scenario:

-- Lock contracts: WHERE start_date BETWEEN '2024-02-01' AND '2024-06-30' AND office_id = 1

Insert Attempt	MySQL Gap Locking	PostgreSQL Predicate Locks
`office_id=1, date='2024-03-15'`	❌ BLOCKS (in range)	❌ BLOCKS (matches predicate)
`office_id=1, date='2024-01-31'`	❌ BLOCKS (adjacent gap)	✅ SUCCEEDS (outside date range)
`office_id=2, date='2024-03-15'`	✅ SUCCEEDS (different office)	✅ SUCCEEDS (different office)

Bottom Line: MySQL's gap locks are pessimistic and broader (blocking adjacent ranges), while PostgreSQL's predicate locks are precise (only blocking exact predicate matches). This makes PostgreSQL more concurrent at REPEATABLE READ level.

3. MVCC and Transaction Isolation

Note: For detailed MVCC storage implementation, see Part 2: Storage Architecture, Section 3.

Both databases use MVCC (Multi-Version Concurrency Control) to enable concurrent transactions, but their different MVCC implementations directly impact transaction isolation behavior. Understanding how MVCC affects transactions helps explain why certain isolation anomalies occur.

How MVCC Enables Isolation

MVCC allows readers to see consistent snapshots of data without blocking writers. The key question: When does a transaction see changes made by other transactions?

MySQL's Approach:

Uses undo log to reconstruct old row versions
At REPEATABLE READ: Creates a consistent snapshot at first read
Readers see the snapshot, even if other transactions commit changes
Writers acquire locks, creating potential blocking

PostgreSQL's Approach:

Uses tuple versioning with transaction IDs
At REPEATABLE READ: Creates a snapshot at transaction start
Readers see the snapshot, completely isolated from concurrent changes
Writers don't block readers (true snapshot isolation)

Transaction Behavior Implications

Scenario: Two concurrent transactions updating the same row

Event	MySQL (REPEATABLE READ)	PostgreSQL (REPEATABLE READ)
T1: BEGIN	Snapshot created on first read	Snapshot created at BEGIN
T1: SELECT balance	Sees 1000, creates snapshot	Sees 1000
T2: UPDATE balance = 900	T2 acquires row lock	T2 proceeds
T2: COMMIT	Lock released	Commits successfully
T1: SELECT balance again	Still sees 1000 (snapshot)	Still sees 1000 (snapshot)
T1: UPDATE balance = 800	Must wait if T2 holds lock	Proceeds, creates new version

Key Difference: PostgreSQL's tuple versioning allows both transactions to proceed without blocking, potentially leading to lost updates. MySQL's locking approach blocks T1's UPDATE until T2 commits.

Why This Matters for Isolation Levels

The MVCC implementation explains why:

Both prevent phantom reads at REPEATABLE READ - MySQL uses gap locks, PostgreSQL uses snapshots
Both allow write-skew at REPEATABLE READ - Neither detects cross-row constraint violations
PostgreSQL's SSI at SERIALIZABLE is more powerful - Tuple versioning enables dependency tracking
MySQL requires more explicit locking - Undo log approach is coupled with pessimistic locking

Bottom Line: MVCC isn't just about storage—it fundamentally shapes how transactions interact. MySQL's undo log approach favors predictability through locking. PostgreSQL's tuple versioning favors concurrency but requires VACUUM maintenance.

4. SERIALIZABLE Isolation: Pessimistic vs Optimistic Strategies

💡 Key Insight: This is the ultimate showdown! 🥊 At SERIALIZABLE isolation level, the philosophical differences between MySQL and PostgreSQL reach their peak. MySQL is the bouncer who doesn't let anyone suspicious near the door (pessimistic locking). PostgreSQL is the cool host who lets everyone in, then kicks out troublemakers at the end (optimistic SSI). MySQL says "better safe than sorry," while PostgreSQL says "let's roll the dice and see what happens!" Both approaches work—just depends on whether you prefer blocking or retrying.

SERIALIZABLE is the strictest isolation level—it guarantees that concurrent transactions produce the same result as if they ran one at a time, in some serial order. Sounds great, right? The catch is how you achieve this. Both databases get there, but the experience is totally different.

Note: For detailed locking mechanisms, see Section 2.3.

The Key Differences

Aspect	MySQL (Pessimistic)	PostgreSQL (Optimistic - SSI)
Strategy	Block conflicts before they happen	Detect conflicts at commit time
SELECT Behavior	Plain SELECTs become `SELECT ... FOR SHARE` (acquire shared locks)	Plain SELECTs don't acquire locks
Blocking Point	Early - On reads (FOR UPDATE/FOR SHARE)	Late - On commit (conflict detection)
Concurrency	❌ Lower - Extensive blocking	✅ Higher - Concurrent execution allowed
Transaction Failures	✅ Rare (blocking prevents conflicts)	❌ More common (serialization errors)
Write-Skew Detection	❌ No - Only prevents through blocking	✅ Yes - SSI detects and aborts
Application Code	✅ Simpler (no retry logic)	❌ More complex (must handle retries)

What this means in practice:

When you run transactions at SERIALIZABLE in MySQL, even plain SELECT statements automatically acquire shared locks (as if you wrote SELECT ... FOR SHARE). This means Transaction 2 trying to read what Transaction 1 is reading? Blocked. Transaction 2 trying to update what Transaction 1 read? Blocked. Everything waits politely in line.

PostgreSQL does the opposite. Transactions run freely, reading and writing concurrently. PostgreSQL's SSI (Serializable Snapshot Isolation) tracks dependencies between transactions. Only at commit time does PostgreSQL ask: "Would this ordering violate serializability?" If yes, one transaction gets aborted with a serialization error. If no, everyone commits happily.

This is why PostgreSQL can detect write-skew anomalies (remember the doctor scheduling problem?) while MySQL can't—SSI is smart enough to see the dangerous pattern. MySQL just blocks aggressively and hopes for the best.

Behavior Comparison

Scenario: Two concurrent transactions reading and updating different rows

Event	MySQL SERIALIZABLE	PostgreSQL SERIALIZABLE
T1: SELECT	Acquires shared locks	No locks, proceeds
T2: SELECT FOR UPDATE	❌ BLOCKS waiting for T1	✅ Proceeds concurrently
T1: UPDATE + COMMIT	Completes, T2 unblocks	Completes
T2: UPDATE + COMMIT	Completes after wait	May ABORT if SSI detects conflict

The Trade-off

Choose MySQL When	Choose PostgreSQL When
✅ Need predictable behavior (rare failures)	✅ Need maximum concurrency
✅ Want simpler application code (no retries)	✅ Need write-skew detection
✅ Low contention workload	✅ Have retry logic implemented
✅ Can tolerate blocking delays	✅ High-contention workload benefits from optimism

Bottom Line: MySQL = fewer failures but more blocking. PostgreSQL = more failures but higher throughput.

DEV Community