Harry Do
Part 2 - MySQL vs PostgreSQL: Storage Architecture

MySQL and PostgreSQL take fundamentally different approaches to data storage. MySQL (InnoDB) uses a clustered index architecture where table data is physically organized around the primary key, while PostgreSQL uses heap storage where data is stored unordered and all indexes are secondary. This architectural difference profoundly impacts insert performance, query patterns, index design, and maintenance requirements. Understanding these storage models is crucial for optimizing database performance and making informed indexing decisions.

1. MySQL: The Clustered Index Approach

MySQL's InnoDB storage engine automatically organizes your table data around the primary key using a clustered index. Think of it like a dictionary where entries are automatically sorted alphabetically — the data itself is stored in order, making lookups by the primary key incredibly fast, but inserting new entries in random order requires shifting things around.

How Clustered Indexes Work:

  • Primary Key Clustering: InnoDB automatically creates a clustered index on the primary key — this isn't optional, it's the foundation of how InnoDB stores data. If you don't define a primary key, InnoDB will use the first UNIQUE index with all NOT NULL columns. If no suitable index exists, InnoDB generates a hidden 6-byte row ID (GEN_CLUST_INDEX) as the clustered index
  • Data Storage: Table data is physically stored in primary key order, with the actual row data living in the leaf nodes of the B-tree index structure. The non-leaf pages of the B-tree contain index keys and pointers to other pages, while the leaf pages contain the complete row data including all columns
  • Secondary Indexes: Each secondary index entry contains the indexed column(s) plus a copy of the primary key value. When you query using a secondary index, InnoDB first searches the secondary index to find the primary key value, then uses that primary key to search the clustered index to retrieve the full row data — a two-step lookup process (secondary index → primary key value → clustered index → row data). This is why keeping primary keys small is crucial — every secondary index stores a copy of it
  • Index Structure: The clustered index B-tree is typically 2-4 levels deep, with data pages themselves forming the leaf level. This means primary key lookups require only 2-4 disk/memory page accesses to reach the actual data
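The two lookup paths above can be sketched with a toy model (plain Python dictionaries standing in for B-trees; the table data and names are invented for illustration):

```python
# Toy model of InnoDB's lookup paths (illustrative only, not real internals).
# The clustered "index" maps primary key -> full row; a secondary index maps
# an indexed column value -> primary key, so it needs a second lookup.

clustered = {  # primary key -> full row (rows live in the PK B-tree leaves)
    1: {"id": 1, "email": "a@example.com", "name": "Alice"},
    2: {"id": 2, "email": "b@example.com", "name": "Bob"},
}
email_idx = {  # secondary index: indexed value -> primary key, not a row pointer
    "a@example.com": 1,
    "b@example.com": 2,
}

def lookup_by_pk(pk):
    # One step: the clustered index leaf *is* the row.
    return clustered[pk]

def lookup_by_email(email):
    # Two steps: the secondary index yields the PK, the clustered index yields the row.
    pk = email_idx[email]
    return clustered[pk]

print(lookup_by_pk(2)["name"])                    # Bob
print(lookup_by_email("a@example.com")["name"])   # Alice
```

Note that `email_idx` stores primary key values, not addresses — this is exactly why a wide primary key inflates every secondary index.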

The Good Stuff:

  • Lightning Fast Primary Key Lookups: Direct access to data without any additional lookup step
  • Exceptional Range Queries: Sequential primary key reads are incredibly fast since data is physically ordered
  • Storage Efficiency: Data is stored at the index leaf level, eliminating the need for separate data storage
  • Automatic Optimization: No configuration required—InnoDB handles everything for you
  • Smaller Secondary Indexes: Secondary indexes store primary key values instead of row pointers, which can be more efficient for small primary keys

The Not-So-Good:

  • Random Insert Pain: Non-sequential primary keys (like UUIDs) cause frequent page splits and reorganization
  • Secondary Index Overhead: Every secondary index lookup requires two steps—first finding the primary key, then looking up the data (unless you use covering indexes that include all needed columns, which eliminates the second lookup)
  • Hotspot Issues: High concurrency inserts on sequential keys (like auto-increment IDs) create contention at the "hot" end of the index. Modern MySQL versions (5.7+) mitigate this with "consecutive" lock mode using lightweight mutexes instead of table-level locks, but some contention remains under very high concurrency
  • Table Fragmentation: Random inserts can fragment clustered data over time, degrading performance
  • Large Primary Keys Are Costly: Since every secondary index stores the primary key, large primary keys (like UUIDs) bloat all your indexes
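The random-insert pain can be illustrated with a toy page-split simulation (page capacity and split policy here are simplified stand-ins, not InnoDB's actual algorithm):

```python
import random

PAGE_CAPACITY = 8  # toy page size in rows (real InnoDB pages are 16KB)

def insert_all(keys):
    """Insert keys into sorted 'pages', splitting full pages.

    Returns (pages, split_count). This loosely mimics how B-tree leaf
    pages fill and split; it is not InnoDB's real algorithm.
    """
    pages = [[]]
    splits = 0
    for k in keys:
        # Find the page this key belongs in (first page whose last key >= k).
        i = next((i for i, p in enumerate(pages) if p and p[-1] >= k),
                 len(pages) - 1)
        page = pages[i]
        page.append(k)
        page.sort()
        if len(page) > PAGE_CAPACITY:
            if page[-1] == k and i == len(pages) - 1:
                # Rightmost append: start a fresh page (sequential-insert pattern).
                pages.append([page.pop()])
            else:
                # Random insert into a full page: split it in half.
                mid = len(page) // 2
                pages[i:i + 1] = [page[:mid], page[mid:]]
            splits += 1
    return pages, splits

seq_pages, _ = insert_all(list(range(1000)))
rand_keys = list(range(1000))
random.Random(42).shuffle(rand_keys)
rand_pages, _ = insert_all(rand_keys)

print(len(seq_pages), len(rand_pages))  # random order needs more pages
```

With sequential keys, every page is left full before a new one starts; shuffled keys trigger mid-page splits that leave pages roughly half full — the same fragmentation and wasted space described above, just at toy scale.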

2. PostgreSQL: The Heap Storage Approach

PostgreSQL uses heap storage, which means your data is stored in whatever order it arrives; there is no automatic physical ordering. Think of it like throwing papers into a filing cabinet in any order, then using index cards to find what you need. Every index, including the primary key, is just a pointer to the actual location in the heap.

How Heap Storage Works:

  • No Clustered Indexes: PostgreSQL uses heap storage where data is not physically ordered by default. New rows are inserted into any available space in the table (typically at the end, but also in gaps left by deleted rows if there's enough space)
  • All Indexes Are Secondary: Every index, including the primary key, points directly to heap tuple locations using a TID (Tuple Identifier). A TID consists of two components: a block number (which 8KB page in the table file, by default) and an offset number (which slot within that page). For example, TID (5,3) means block 5, slot 3
  • Index Structure: All indexes are completely separate from data storage, each maintaining their own B-tree structures. An index lookup retrieves the TID, then PostgreSQL uses that TID to directly fetch the row from the heap table by reading the specific block and slot
  • CLUSTER Command: You can manually reorder table data by an index using the CLUSTER command, but it's a one-time operation that doesn't persist as new data arrives. PostgreSQL must rewrite the entire table to cluster it, acquiring an ACCESS EXCLUSIVE lock that blocks all operations during the process
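A heap table addressed by TIDs can be modeled in a few lines (block size and rows are invented for illustration):

```python
# Toy model of a heap table addressed by TIDs (block, offset). Illustrative only.

heap = [  # the table file: a list of blocks, each a list of tuple slots
    [{"id": 10, "name": "Ann"}, {"id": 7, "name": "Bo"}, {"id": 3, "name": "Cy"}],
    [{"id": 99, "name": "Di"}],
]

# An index on "id" stores value -> TID, never the row itself.
# Note the heap is unordered: ids 10, 7, 3 sit in arrival order.
id_index = {}
for block_no, block in enumerate(heap):
    for offset, row in enumerate(block):
        id_index[row["id"]] = (block_no, offset)

def fetch_by_id(key):
    # Single step after the index probe: the TID addresses the heap directly.
    block_no, offset = id_index[key]
    return heap[block_no][offset]

print(fetch_by_id(99))  # the row stored at TID (1, 0)
```

Every index on this table would hold the same kind of (block, offset) pointers — which is why moving a tuple to a new location forces index updates, as the next section's update discussion shows.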

The Good Stuff:

  • Consistent Insert Performance: No clustering overhead regardless of key pattern — random or sequential, it doesn't matter
  • Direct Index Access: All indexes point directly to heap locations with a single lookup
  • Better Concurrent Inserts: No hotspot issues with sequential keys since there's no physical ordering to maintain
  • Flexible Access Patterns: All columns have equal access performance—no column is "special" like the primary key in MySQL
  • No Page Split Drama: Inserts don't cause the reorganization headaches that clustered indexes do

The Not-So-Good:

  • Primary Key Overhead: Even primary key lookups require index traversal + heap access (two steps)
  • No Automatic Clustering: Cannot automatically take advantage of physical ordering for range queries
  • Larger Storage Footprint: Separate index and heap storage means more disk space used
  • More Index Maintenance: All indexes require separate maintenance — when you update a single row, this is what happens:
    • The old version of the row is marked as "dead"
    • A new version of the row is inserted elsewhere in the table with a new physical address (TID)
    • All indexes on the table must be updated to point to this new location
    • Important exception: PostgreSQL's HOT (Heap-Only Tuple) optimization can avoid index updates when: (1) no indexed columns are modified, AND (2) there's enough free space in the same block to store the new tuple. In this case, the old tuple points to the new tuple within the same page, and indexes don't need updating. However, HOT updates only work when these conditions are met — any update to an indexed column or when the block is full requires updating all indexes
  • VACUUM Dependency: Requires regular VACUUM operations to reclaim space from dead tuples (more on this below)
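The two HOT preconditions in the update walkthrough above can be expressed as a small predicate (the function name and parameters are hypothetical, not a PostgreSQL API):

```python
# Sketch of the two HOT (Heap-Only Tuple) preconditions described above.
# Function and parameter names are illustrative, not PostgreSQL internals.

def hot_update_possible(changed_columns, indexed_columns,
                        page_free_bytes, new_tuple_bytes):
    """A HOT update is possible only if no indexed column changed AND the
    new tuple version fits in the same page as the old one."""
    touches_index = bool(set(changed_columns) & set(indexed_columns))
    fits_in_page = new_tuple_bytes <= page_free_bytes
    return (not touches_index) and fits_in_page

# Non-indexed column changed, room on the page: HOT, no index updates needed.
print(hot_update_possible({"bio"}, {"id", "email"}, 512, 120))    # True
# Indexed column changed: every index must be updated.
print(hot_update_possible({"email"}, {"id", "email"}, 512, 120))  # False
# Page full: the new version lands on another page, so indexes must change.
print(hot_update_possible({"bio"}, {"id", "email"}, 64, 120))     # False
```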

3. MVCC: How Each Database Handles Concurrent Access

Both MySQL and PostgreSQL use MVCC (Multi-Version Concurrency Control) to allow simultaneous reads and writes without extensive locking. MVCC works by keeping multiple versions of data rows so that readers can access old versions while writers create new ones. However, the two databases implement this completely differently — MySQL uses a separate undo log, while PostgreSQL stores versions directly in the table.

What Is MVCC?

MVCC is a strategy that allows databases to handle concurrent access without locking readers. Instead of overwriting data, the database keeps old versions around so that ongoing transactions can still see the data as it existed when they started. This requires:

  • Keeping old versions of data instead of simply overwriting it
  • When a row is updated, preserving the original data as an older version while creating a new version
  • A mechanism to manage these different row versions
  • Eventual cleanup of old versions once no active transaction needs them

3.1. MySQL (InnoDB): Undo Log Approach

MySQL's InnoDB storage engine implements MVCC using a separate undo log — a dedicated space outside the main table where old row versions are stored. Think of it like keeping a separate notebook for your editing history while your main document always shows the latest version.

How InnoDB's Undo Log Works:

  • In-Place Updates: When you update a row, InnoDB modifies the row directly in the main table (the clustered index). The current row in the clustered index always represents the latest committed version
  • Hidden System Columns: InnoDB adds three hidden fields to each row:
    • DB_TRX_ID (6 bytes): Transaction ID of the transaction that last inserted or updated the row
    • DB_ROLL_PTR (7 bytes): Roll pointer that points to the undo log record containing the previous version
    • DB_ROW_ID (6 bytes): Monotonically increasing row ID (present only when the table has no primary key and no suitable NOT NULL unique index)
  • Undo Log Storage: The old version of the data is written to a separate undo log tablespace, not the main table. Undo logs are organized into rollback segments and can be stored in system tablespace or separate undo tablespaces
  • Version Reconstruction: When a transaction needs to see an older version (based on its read view/snapshot), InnoDB follows the DB_ROLL_PTR chain in the undo log to reconstruct the row as it existed at the required point in time. This may require following multiple undo log records if there were multiple updates
  • Automatic Cleanup: Background "purge" threads (configurable via innodb_purge_threads, up to 32 threads) automatically clean the undo log once no active transaction needs those old versions. Purge threads also physically remove delete-marked rows from indexes
  • Selective Index Updates: Only indexes affected by the update need to be modified. For secondary indexes, if the indexed column didn't change, the index entry doesn't need updating
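Version reconstruction by following roll pointers can be sketched as follows (the record layout is invented; only the trx_id/roll_ptr fields mirror the hidden columns described above):

```python
# Toy reconstruction of an older row version by walking undo records.
# Data layout is invented for illustration; real undo records store deltas.

# Current row in the clustered index, last updated by transaction 105.
current = {"trx_id": 105, "roll_ptr": "u2",
           "data": {"id": 1, "balance": 300}}

# Undo log: each record holds a previous version and its own roll pointer.
undo_log = {
    "u2": {"trx_id": 103, "roll_ptr": "u1", "data": {"id": 1, "balance": 200}},
    "u1": {"trx_id": 101, "roll_ptr": None, "data": {"id": 1, "balance": 100}},
}

def read_as_of(row, max_visible_trx_id):
    """Follow roll pointers until we reach a version created by a
    transaction this reader's snapshot is allowed to see."""
    version = row
    while version["trx_id"] > max_visible_trx_id:
        ptr = version["roll_ptr"]
        if ptr is None:
            return None  # no visible version exists for this reader
        version = undo_log[ptr]
    return version["data"]

# A reader whose snapshot only sees transactions <= 103 gets the older version.
print(read_as_of(current, 103))  # {'id': 1, 'balance': 200}
print(read_as_of(current, 200))  # {'id': 1, 'balance': 300}
```

This is also why a long-running transaction is expensive: as long as some reader might need `u1`, the purge threads cannot discard the chain.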

The Good Stuff:

  • Clean Main Table: Your main table only stores the current version, keeping it compact
  • Automatic Maintenance: No manual intervention needed—purge threads handle cleanup automatically
  • Faster Updates: Updates are generally faster because only affected indexes need updating
  • Predictable Performance: The undo log is a separate structure with dedicated cleanup processes
  • Efficient Rollback: Rolling back transactions is fast since old versions are readily available

The Not-So-Good:

  • Undo Log Growth: Long-running transactions can cause the undo log to grow extremely large
  • Undo Log Contention: Heavy write workloads can create contention on undo log access
  • Delayed Cleanup: Long-running read transactions prevent purge threads from cleaning up old versions
  • Hidden Storage Costs: The undo log can consume significant disk space during peak periods
  • Tablespace Management: You need to monitor and potentially resize the undo tablespace

3.2. PostgreSQL: Tuple Versioning Approach

PostgreSQL implements MVCC by storing multiple versions of rows directly in the main table itself—a technique called tuple versioning. Think of it like keeping all your document revisions in the same file, with markers showing which version is current and which are old.

How PostgreSQL's Tuple Versioning Works:

  • Full Row Copies: When you update a row, PostgreSQL creates a completely new copy of the entire row in the table. This is a full physical copy with all columns, not just the changed values
  • In-Table Storage: Both the old version (now "dead") and the new version live in the same table file, side by side. Dead tuples remain in place until VACUUM removes them
  • Transaction Visibility Fields: Each tuple header contains critical visibility information:
    • xmin: Transaction ID that inserted this tuple (when the row version was created)
    • xmax: Transaction ID that deleted or updated this tuple (0 if still current)
    • t_ctid: Points to the newer version of the row if updated, or to itself if it's the current version
    • These fields determine which transactions can see this tuple based on their snapshot isolation level
  • Visibility Rules: A tuple is visible to a transaction if:
    • The inserting transaction (xmin) has committed and is in the transaction's snapshot
    • AND the deleting transaction (xmax) has not committed or is not in the transaction's snapshot
  • All Index Updates: Since the new row has a different physical location (TID), all indexes on the table must be updated to point to the new location (except in HOT update cases)
  • VACUUM Cleanup: A VACUUM process (manual or autovacuum) eventually removes dead tuples and reclaims space. VACUUM scans the table, identifies tuples where all active transactions have moved past their xmax, and marks that space as reusable
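The visibility rules above can be condensed into a small check (greatly simplified: real PostgreSQL visibility also handles in-progress and aborted transactions, subtransactions, and hint bits):

```python
# Simplified xmin/xmax tuple-visibility check. Illustrative only.

def tuple_visible(xmin, xmax, snapshot_committed):
    """A tuple version is visible if its inserter committed in our
    snapshot and its deleter (if any) did not."""
    inserted_visible = xmin in snapshot_committed
    deleted_visible = xmax != 0 and xmax in snapshot_committed
    return inserted_visible and not deleted_visible

committed = {100, 101}  # transactions committed as of this snapshot

# Version inserted by 100, later updated by 102 (not in our snapshot yet):
print(tuple_visible(xmin=100, xmax=102, snapshot_committed=committed))  # True
# The replacement version created by 102 is not visible to us yet:
print(tuple_visible(xmin=102, xmax=0, snapshot_committed=committed))    # False
# A version whose deleter committed in our snapshot is gone:
print(tuple_visible(xmin=100, xmax=101, snapshot_committed=committed))  # False
```

Notice that both versions of an updated row coexist in the heap and the check decides per reader which one counts — that is tuple versioning in a nutshell.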

The Good Stuff:

  • Simple Architecture: Everything is in one place—no separate undo log to manage
  • No Version Reconstruction: Old versions are complete rows, no need to reconstruct from logs
  • Predictable Read Performance: Reading old versions is straightforward since they're complete tuples
  • Better for Short Transactions: Works well when transactions are short and VACUUM can keep up
  • Core Database Feature: MVCC is deeply integrated into PostgreSQL's architecture

The Not-So-Good:

  • Table Bloat: Dead tuples accumulate in the table, causing it to grow and degrade performance
  • Index Bloat: All indexes also bloat because they contain pointers to dead tuples
  • VACUUM Dependency: Performance is highly dependent on proper VACUUM tuning and frequency
  • All Indexes Updated: Every update requires updating every index on the table, regardless of which columns changed
  • Write Amplification: A single row update creates a full new copy of the row plus updates all indexes
  • Manual Tuning Required: You need to carefully tune autovacuum settings for write-heavy workloads

3.3. MVCC Comparison: Side by Side

| MVCC Feature | MySQL (InnoDB) | PostgreSQL |
| --- | --- | --- |
| Implementation | Undo log (engine-specific, separate from table) | Tuple versioning (core architecture, in-table) |
| Update method | In-place modification, old data to undo log | Full new row copy, old version marked dead |
| Version storage | Separate undo log space | Within the main table file |
| Old version format | Delta/changes only | Complete row copy |
| Cleanup mechanism | Automatic background purge threads | VACUUM process (autovacuum or manual) |
| Index updates per update | Only affected indexes | All indexes on the table |
| Main storage impact | Main table stays compact | Table grows with dead tuples |
| Maintenance requirement | Minimal (mostly automatic) | Requires VACUUM tuning |
| Bloat risk | Undo log can grow large | Table and all indexes can bloat |
| Best for | Mixed workloads, long transactions | Short transactions, read-heavy workloads |

Why This Matters:

Understanding these MVCC differences is crucial for performance tuning:

  • MySQL's Challenge: Long-running transactions prevent undo log cleanup, causing the undo log to grow indefinitely. Monitor your slowest queries and transactions to prevent this.

  • PostgreSQL's Challenge: Write-heavy workloads create dead tuples faster than autovacuum can clean them, leading to severe bloat. You need aggressive autovacuum tuning (lower autovacuum_vacuum_scale_factor, higher autovacuum_max_workers) and potentially manual VACUUM during maintenance windows.

  • Design Impact: In PostgreSQL, avoid adding unnecessary indexes since every index adds overhead on updates. In MySQL, be mindful of long-running read transactions that hold up undo log cleanup.
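For PostgreSQL, the aggressive autovacuum tuning mentioned above might look like the following postgresql.conf fragment (the parameters are real, but the values are illustrative starting points, not recommendations for every workload):

```ini
# postgresql.conf - illustrative autovacuum tuning for write-heavy tables
autovacuum_vacuum_scale_factor = 0.05  # default 0.2: trigger vacuum after 5% of rows are dead
autovacuum_max_workers = 6             # default 3: vacuum more tables in parallel
autovacuum_vacuum_cost_limit = 2000    # default falls back to vacuum_cost_limit (200): let workers do more per cycle
```

Scale-factor changes can also be applied per table with `ALTER TABLE ... SET (autovacuum_vacuum_scale_factor = ...)`, which is usually safer than lowering the global default.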

4. Key Takeaways: Side-by-Side Comparison

| Factor | MySQL (Clustered Index) | PostgreSQL (Heap Storage) |
| --- | --- | --- |
| Data organization | Physically ordered by primary key | Unordered heap storage |
| Primary key lookup | Single step (direct access) | Two steps (index + heap) |
| Secondary index lookup | Two steps (index → PK → data) | Single step (index → heap) |
| Insert performance | Sequential fast, random slow | Consistent regardless of pattern |
| Range query performance | Excellent on primary key | Good on any indexed column |
| Storage space | More efficient (data in index) | Larger (separate index + heap) |
| Update overhead | Only affected indexes updated | All indexes must be updated (except HOT updates) |
| Primary key choice | Critical (affects all indexes) | Less critical (just another index) |
| MVCC implementation | Undo log (separate from table) | Tuple versioning (in table) |
| Maintenance | Automatic purge | Requires VACUUM tuning |
| Best for | Primary key access, range queries | Flexible access patterns, heavy writes |

Bottom Line:

  • MySQL's clustered index architecture is optimized for primary key access and provides excellent performance for sequential inserts and primary key range queries, but requires careful primary key design and suffers with random inserts.
  • PostgreSQL's heap storage provides consistent insert performance and flexible access patterns, but requires all indexes to be updated on row changes (except for HOT updates) and needs proper VACUUM maintenance to prevent bloat.

Choose based on your access patterns, primary key characteristics, and operational requirements.
