You write this query:
SELECT * FROM users WHERE id = 42;
And it works. Magic happens. Your database returns the row in milliseconds. You feel like a wizard.
But what actually happened when you pressed enter? How did MySQL find that one row among millions? Why does adding an index make queries 1000x faster? And why does your production database grind to a halt when you forget the WHERE clause on an UPDATE?
I spent years blissfully writing SQL, knowing just enough to be dangerous. I understood indexes were "good for performance" and that transactions prevented "bad things." I could write a JOIN and knew what a primary key was. I thought that was enough.
Then I tried to scale a service past 10,000 requests per second. My queries that worked fine in development took 30 seconds in production. My carefully designed schema caused deadlocks under load. That's when I realized I had no clue what MySQL actually did when I ran a query.
This isn't another tutorial about SELECT statements and JOIN syntax. This is about what happens beneath those abstractions - how InnoDB stores data on disk, how the query optimizer decides execution plans, and how MVCC lets thousands of transactions run simultaneously without stepping on each other.
The Architecture: Layers All The Way Down
I used to think MySQL was just "the database." One monolithic thing that stored data and ran queries. This is technically accurate but completely useless for understanding what's actually happening.
Here's the model that finally clicked: MySQL is a three-layer cake, and most people only taste the frosting.
When you execute a query, it flows through distinct layers that each have specific responsibilities. Understanding these layers explains why certain operations are fast while others bring your database to its knees.
Layer 1: The Connection Layer
Before MySQL does anything with your query, it needs to authenticate you and manage your connection. This layer handles the network protocol, connection pooling, and thread management.
-- When you connect to MySQL, this happens behind the scenes
mysql -h localhost -u root -p
-- MySQL creates a connection thread
-- Authenticates your credentials
-- Allocates a buffer for this connection
-- You're now ready to send queries
Each connection gets its own thread. This should sound familiar if you read my threading article - it means MySQL faces the same context switching and memory overhead issues that killed my multithreaded server. This is why connection pooling exists, and why cloud databases charge you per connection.
The connection layer maintains state for your session: which database you're using, your session variables, your transaction state. All of this lives in memory for as long as your connection exists.
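You can watch this layer at work with a few built-in commands (a quick sketch; exact counters vary slightly by version):
SHOW PROCESSLIST;                          -- one row per connection thread
SHOW VARIABLES LIKE 'max_connections';     -- hard cap on concurrent connections
SHOW STATUS LIKE 'Threads_connected';      -- connections currently open
SHOW STATUS LIKE 'Threads_running';        -- connections actively executing a statement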
Layer 2: The SQL Layer
This is where your query gets parsed, optimized, and executed. The SQL layer is storage-engine-agnostic - it doesn't care whether you're using InnoDB, MyISAM, or some other engine underneath.
SELECT u.name, COUNT(o.id)
FROM users u
JOIN orders o ON u.id = o.user_id
WHERE u.created_at > '2024-01-01'
GROUP BY u.id;
The SQL layer parses this text into an Abstract Syntax Tree, validates that the tables and columns exist, checks your permissions, and then hands it to the query optimizer.
The optimizer is where the magic - and sometimes tragedy - happens. It looks at available indexes, table statistics, and join orderings to produce an execution plan. But here's the critical insight: the optimizer is guessing. It estimates costs based on statistics, and sometimes it guesses wrong.
This is why the same query can be fast one day and slow the next. The statistics changed. The data distribution shifted. What the optimizer thought was a good plan turned out to be terrible.
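If you want to see the optimizer's reasoning rather than just its conclusion, MySQL can trace it. A sketch (the trace is verbose JSON and must be enabled per session):
SET SESSION optimizer_trace = 'enabled=on';
SELECT * FROM users WHERE created_at > '2024-01-01';
SELECT * FROM information_schema.OPTIMIZER_TRACE\G
-- Shows the plans the optimizer considered and the cost it assigned to each
SET SESSION optimizer_trace = 'enabled=off';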
Layer 3: The Storage Engine Layer
This is where data actually gets written to disk and read back. MySQL's pluggable storage engine architecture means you can swap this layer out entirely, but in practice, everyone uses InnoDB.
InnoDB is where the real complexity lives. It manages data files, buffer pools, transaction logs, and all the intricate machinery that makes a database work. When you understand InnoDB, you understand MySQL.
The beauty of this layered architecture is that the SQL layer doesn't care about the storage details. It just asks the storage engine for data, and the engine figures out how to retrieve it. But the tragedy is that bad queries at the SQL layer can force the storage engine to do massive amounts of unnecessary work.
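A quick way to see which engine your tables actually use ('your_database' is a placeholder):
SHOW ENGINES;  -- lists every available engine; InnoDB is the default
SELECT TABLE_NAME, ENGINE
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'your_database';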
InnoDB: Where Your Data Actually Lives
Every InnoDB table is organized as a clustered index on the primary key. This single fact explains more about MySQL performance than any other concept.
When you create a table and insert rows, InnoDB doesn't just append data to a file. It builds a B+Tree structure where:
- The leaf nodes contain your actual row data
- The tree is sorted by primary key
- All data access goes through this tree
CREATE TABLE users (
id INT PRIMARY KEY AUTO_INCREMENT,
email VARCHAR(255),
name VARCHAR(100),
created_at TIMESTAMP
);
-- InnoDB builds a B+Tree that looks like this:
--
--                   [100]
--                 /       \
--             [50]         [150]
--            /    \        /    \
--      [1-49] [50-99] [100-149] [150-200]
--
-- Leaf nodes contain the actual row data
This clustered index structure has profound implications that ripple through everything else.
Primary Keys: The Most Important Choice You'll Make
Your primary key isn't just a unique identifier. It's the physical organization of your data on disk. Choose wrong, and you'll pay for it forever.
-- BAD: UUID primary key
CREATE TABLE bad_users (
id CHAR(36) PRIMARY KEY DEFAULT (UUID()), -- Random values
email VARCHAR(255),
name VARCHAR(100)
);
-- GOOD: Auto-increment primary key
CREATE TABLE good_users (
id BIGINT PRIMARY KEY AUTO_INCREMENT, -- Sequential values
email VARCHAR(255),
name VARCHAR(100)
);
Why does this matter? Because UUIDs are random, and InnoDB inserts them all over the B+Tree. Every insert might cause page splits as InnoDB makes room in the middle of the tree. The tree becomes fragmented, cache efficiency plummets, and inserts get slower and slower.
Auto-increment integers are sequential. InnoDB appends them to the end of the tree. No page splits, no fragmentation, perfect cache behavior. My production database went from 5,000 inserts/second with UUIDs to 50,000 inserts/second with auto-increment integers. Ten times faster, just from choosing the right primary key.
The worst part? You can't easily change the primary key after the fact. Rebuilding a large table takes hours or days, and requires downtime or complex migration strategies.
Secondary Indexes: The Performance Multiplier
Secondary indexes let you look up rows by columns other than the primary key. But they're not standalone structures - they're intimately connected to the clustered index.
CREATE INDEX idx_email ON users(email);
-- This index stores:
-- email value -> primary key
-- NOT: email value -> row data
A secondary index stores the indexed column value and the primary key. That's it. When you query by email, MySQL performs two lookups:
- Find the primary key in the secondary index
- Look up the row data using the primary key in the clustered index
SELECT * FROM users WHERE email = 'user@example.com';
-- Step 1: Look up in idx_email
-- email='user@example.com' -> id=42
-- Step 2: Look up in clustered index
-- id=42 -> (42, 'user@example.com', 'John Doe', '2024-01-01 00:00:00')
This two-step lookup is called a "bookmark lookup" or "table access by index rowid." It's why covering indexes matter so much.
Covering Indexes: The Secret Weapon
If your query only needs columns that exist in the secondary index, MySQL can skip the second lookup entirely:
-- Add name to the index
CREATE INDEX idx_email_name ON users(email, name);
-- Now this query only needs the index
SELECT name FROM users WHERE email = 'user@example.com';
-- MySQL finds: email='user@example.com' -> (id=42, name='John Doe')
-- Done! No need to touch the clustered index
This is huge for performance. I've seen queries go from 100ms to 1ms just by adding one column to an index. The database does less work, touches fewer pages, and uses less cache space.
But covering indexes have a cost: they make the index bigger, which means more disk I/O for index scans and more memory used in the buffer pool. Everything is a tradeoff.
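You can check whether a query is covered by looking at the Extra column in EXPLAIN. A quick sketch using the idx_email_name index from above:
EXPLAIN SELECT name FROM users WHERE email = 'user@example.com'\G
-- Extra: Using index    <- covered: MySQL never touches the clustered index
EXPLAIN SELECT * FROM users WHERE email = 'user@example.com'\G
-- Extra: NULL           <- not covered: still needs the bookmark lookup for the other columns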
The Buffer Pool: The Single Most Important Tuning Parameter
InnoDB doesn't read from disk for every query. It maintains a buffer pool - a cache of data pages in memory. This cache is the difference between fast queries and slow queries.
-- Check your buffer pool size
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
-- On a dedicated database server, set it to ~70-80% of RAM
SET GLOBAL innodb_buffer_pool_size = 8589934592; -- 8GB
Data pages (16KB each by default) get loaded into the buffer pool on first access and stay there until evicted by an LRU (Least Recently Used) algorithm. A warm buffer pool with high hit rates is the foundation of good database performance.
-- Check buffer pool statistics
SHOW ENGINE INNODB STATUS;
-- Look for these metrics:
-- Buffer pool hit rate: should be >99% for read-heavy workloads
-- Pages read from disk vs from cache
-- Free buffers available
When your buffer pool is too small, MySQL thrashes - constantly evicting pages that will be needed soon. Queries slow down, disk I/O spikes, and your database becomes the bottleneck.
I once debugged a production issue where queries suddenly got 10x slower. The cause? Someone had deployed a new service that connected to the same database server. The new service's queries evicted hot pages from the buffer pool, tanking performance for the original application. We fixed it by increasing the buffer pool size, but the real lesson was understanding cache pressure.
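A rough way to estimate the hit rate from the server's global counters (the formula is approximate, but good enough for trend-watching):
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%';
-- Innodb_buffer_pool_read_requests: logical reads, served from memory when possible
-- Innodb_buffer_pool_reads:         reads that had to go to disk
-- Hit rate ~ 1 - (Innodb_buffer_pool_reads / Innodb_buffer_pool_read_requests)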
Query Execution: From SQL to Results
Let's trace what happens when you run a query. Understanding this flow explains why EXPLAIN is so important and why "simple" queries sometimes aren't simple at all.
SELECT u.name, COUNT(o.id)
FROM users u
JOIN orders o ON u.id = o.user_id
WHERE u.created_at > '2024-01-01'
GROUP BY u.id;
Step 1: Parsing
MySQL transforms your SQL text into an Abstract Syntax Tree - a structured representation of what you're asking for. This catches syntax errors and validates that tables and columns exist.
The parser doesn't care about performance yet. It just wants to understand what you're asking for. This is why syntax errors are instant - parsing is fast.
Step 2: Query Optimization
The optimizer's job is to figure out how to get the data you asked for. There are usually multiple ways to execute a query, and they have wildly different performance characteristics.
-- The optimizer considers:
-- 1. Which index to use for the WHERE clause?
-- - Full table scan?
-- - idx_created_at?
-- - Something else?
-- 2. What join algorithm to use?
-- - Nested loop join?
-- - Hash join?
-- - Index join?
-- 3. In what order to join tables?
-- - users first, then orders?
-- - orders first, then users?
The optimizer uses table statistics to estimate costs. It knows approximately how many rows are in each table, the cardinality of indexes, and the data distribution. Based on these statistics, it picks the plan it thinks will be cheapest.
But here's the problem: the statistics are estimates, and estimates can be wrong. The data might have changed since statistics were updated. The distribution might be skewed in ways the optimizer doesn't understand. The optimizer might have terrible selectivity estimates for your WHERE clause.
This is why identical queries can have different performance at different times. The statistics changed, and the optimizer picked a different plan.
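When you suspect stale statistics, two commands show you what the optimizer is working with:
SHOW INDEX FROM users;   -- the Cardinality column is the optimizer's estimate, not an exact count
ANALYZE TABLE users;     -- re-samples the table and refreshes those statistics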
Step 3: Execution
The execution engine walks through the plan, requesting data from the storage engine and processing it according to the query logic.
-- For our example query, execution might look like:
-- 1. Scan users table with idx_created_at
-- WHERE created_at > '2024-01-01'
-- Returns: 10,000 user rows
-- 2. For each user (nested loop join)
-- Look up orders WHERE user_id = ?
-- Returns: variable number of orders per user
-- 3. Aggregate with GROUP BY u.id
-- Count orders for each user
-- 4. Return results
If the optimizer chose poorly, execution might scan millions of rows unnecessarily. If it chose well, execution might use indexes efficiently and touch minimal data.
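On MySQL 8.0.18+ you can compare the optimizer's guesses against reality with EXPLAIN ANALYZE. Be aware that it actually executes the query:
EXPLAIN ANALYZE
SELECT u.name, COUNT(o.id)
FROM users u
JOIN orders o ON u.id = o.user_id
WHERE u.created_at > '2024-01-01'
GROUP BY u.id;
-- The output annotates each plan step with actual rows and timings,
-- so you can see exactly where the estimates diverged from reality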
Understanding EXPLAIN: Your Window Into The Optimizer
EXPLAIN shows you the optimizer's execution plan before running the query. Learning to read EXPLAIN output is essential for understanding query performance.
EXPLAIN SELECT * FROM users WHERE email = 'user@example.com'\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: users
partitions: NULL
type: ref
possible_keys: idx_email
key: idx_email
key_len: 1022
ref: const
rows: 1
filtered: 100.00
Extra: NULL
The type column tells you how MySQL accesses data:
- const: Single row lookup by primary key or unique index (fastest)
- ref: Index lookup returning multiple rows (great)
- range: Index range scan (good)
- index: Full index scan (okay)
- ALL: Full table scan (usually bad)
The rows column shows the estimated number of rows MySQL will examine. If this number is way higher than you expect, your query needs help.
-- Bad EXPLAIN output
EXPLAIN SELECT * FROM orders WHERE user_id = 42;
-- type: ALL (full table scan)
-- rows: 1000000 (examining 1M rows to find ~10)
-- Missing index!
-- Fix it
CREATE INDEX idx_user_id ON orders(user_id);
-- Now:
-- type: ref (index lookup)
-- rows: 10 (only examining relevant rows)
I once debugged a query that took 45 seconds in production but ran instantly in development. EXPLAIN showed that in production, MySQL was doing a full table scan because statistics indicated the table was tiny. In reality, the production table had 50 million rows. We ran ANALYZE TABLE to update statistics, and the query went from 45 seconds to 50 milliseconds.
Transactions and MVCC: The Invisible Complexity
MySQL's InnoDB uses Multi-Version Concurrency Control (MVCC) to handle transactions. This mechanism is brilliant, subtle, and the source of many production mysteries.
How MVCC Works
When you start a transaction, InnoDB takes a snapshot of the database state. Reads see data as it existed at transaction start, even if other transactions modify it.
-- Transaction 1
START TRANSACTION;
SELECT balance FROM accounts WHERE id = 1;
-- Returns: 1000
-- Transaction 2 (in another connection)
START TRANSACTION;
UPDATE accounts SET balance = 500 WHERE id = 1;
COMMIT;
-- Back to Transaction 1
SELECT balance FROM accounts WHERE id = 1;
-- Still returns: 1000 (repeatable read!)
COMMIT;
How does MySQL show you the old value after another transaction updated it? Undo logs.
InnoDB keeps old versions of rows in the undo log. When you read, InnoDB checks:
- Is this row version visible to my transaction?
- If not, walk the undo log to find the right version
This is why MVCC is magical - reads don't block writes, and writes don't block reads. Thousands of transactions can run simultaneously without waiting for each other.
But there's a cost.
The Undo Log: Where Performance Goes To Die
Old row versions accumulate in the undo log. The purge thread eventually cleans them up, but only when no transaction needs them anymore.
-- Transaction 1: Runs for 10 minutes
START TRANSACTION;
SELECT * FROM users WHERE id = 1;
-- ... does other work for 10 minutes ...
-- Meanwhile, Transaction 2-1000 update other rows
-- InnoDB can't purge any of these old versions
-- because Transaction 1 might need them
-- Undo log grows and grows
-- Queries slow down
-- Disk usage increases
Long-running transactions are poison for database performance. They prevent purge, causing the undo log to grow, which slows down everything. I've seen databases grind to a halt because someone left a transaction open overnight.
The fix is simple but requires discipline: keep transactions short. Start them as late as possible, commit them as soon as possible, and never do slow operations inside a transaction.
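A quick way to find the transactions that are blocking purge (a sketch against information_schema; the threshold is up to you):
SELECT trx_id, trx_started, trx_mysql_thread_id,
       TIMESTAMPDIFF(SECOND, trx_started, NOW()) AS open_seconds
FROM information_schema.innodb_trx
ORDER BY trx_started;
-- Anything open for minutes is a purge blocker; KILL the thread if it's abandoned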
Transaction Isolation Levels
MySQL supports different isolation levels that trade consistency for performance:
-- READ UNCOMMITTED: Can see uncommitted changes
-- Don't use this unless you hate data integrity
-- READ COMMITTED: Each query sees committed data at query time
-- Common in other databases, but not MySQL's default
-- REPEATABLE READ: Snapshot at transaction start (MySQL default)
-- Prevents non-repeatable reads, enables consistent backups
-- SERIALIZABLE: Full isolation with read locks
-- Rarely needed, high performance cost
The default REPEATABLE READ is the right choice for most applications. It provides strong consistency guarantees without the overhead of SERIALIZABLE.
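Checking and changing the level is straightforward (the variable name assumes MySQL 5.7.20+ or 8.0; older versions call it tx_isolation):
SELECT @@transaction_isolation;                            -- 'REPEATABLE-READ' by default
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;    -- affects only this connection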
But REPEATABLE READ has a subtle behavior that catches people by surprise:
-- Transaction 1
START TRANSACTION;
SELECT COUNT(*) FROM users;
-- Returns: 100
-- Transaction 2
INSERT INTO users VALUES (...);
COMMIT;
-- Transaction 1 (continued)
SELECT COUNT(*) FROM users;
-- Still returns: 100 (repeatable read)
-- But a locking read bypasses the snapshot...
SELECT * FROM users WHERE email = 'new@example.com' FOR UPDATE;
-- Returns the new row! (locking reads see the latest committed data)
Plain SELECTs keep reading from the transaction's snapshot, so they never see the phantom. But locking reads (FOR UPDATE, FOR SHARE) and UPDATE/DELETE statements read the latest committed rows, which is where phantoms sneak back in. InnoDB closes that hole with gap locks on locking reads, but gap locks can cause unexpected deadlocks. Understanding these behaviors matters when debugging production issues.
The Redo Log: Write-Ahead Logging Explained
InnoDB uses write-ahead logging: changes are first written to the redo log (fast, sequential writes), then eventually flushed to data files (slower, random writes).
-- When you commit:
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
COMMIT;
-- InnoDB:
-- 1. Writes changes to redo log (sequential disk write)
-- 2. Returns success to you
-- 3. Eventually flushes dirty pages to data files
This decoupling of commit from data file writes is how InnoDB achieves good write performance. Sequential log writes are orders of magnitude faster than random data file writes.
The redo log is circular: once the dirty pages that a log record describes have been flushed to the data files, that log space can be reused. But the redo log has a fixed size.
-- Check redo log size
SHOW VARIABLES LIKE 'innodb_log_file_size';
-- If your workload generates a lot of writes:
-- Larger redo log = less frequent flushing = better performance
-- But: Larger redo log = longer crash recovery time
If you write faster than InnoDB can flush dirty pages, you'll hit log space exhaustion. MySQL will stall all writes until space becomes available. I've seen this cause production incidents where write latency suddenly spikes to seconds.
The fix is either increase the redo log size or reduce write volume. But identifying which is happening requires understanding the relationship between redo logs, dirty pages, and the buffer pool.
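Two places to look when you suspect log pressure (a rough sketch, not a full diagnosis):
SHOW ENGINE INNODB STATUS;
-- In the LOG section, the gap between "Log sequence number" and "Last checkpoint at"
-- is how much redo is still waiting on dirty-page flushing
SHOW GLOBAL STATUS LIKE 'Innodb_log_waits';
-- Counts waits caused by a too-small log buffer (innodb_log_buffer_size)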
Locking and Concurrency: When Things Go Wrong
InnoDB uses row-level locks to protect data integrity, but the locking behavior creates complexity that destroys production systems if you don't understand it.
Row-Level Locks
When you update a row, InnoDB acquires an exclusive lock on that row:
-- Transaction 1
START TRANSACTION;
UPDATE users SET balance = 500 WHERE id = 1;
-- Holds exclusive lock on id=1
-- Transaction 2
UPDATE users SET balance = 600 WHERE id = 1;
-- Blocks waiting for Transaction 1 to commit or rollback
This is straightforward. But there's more complexity lurking beneath.
Gap Locks: The Invisible Locks
To prevent phantom reads in REPEATABLE READ isolation, InnoDB locks "gaps" between index entries.
-- Transaction 1
START TRANSACTION;
SELECT * FROM users WHERE age BETWEEN 20 AND 30 FOR UPDATE;
-- InnoDB locks:
-- - All existing rows where 20 <= age <= 30
-- - The "gap" where age=25 would go (even if no row exists!)
-- Transaction 2
INSERT INTO users (age) VALUES (25);
-- Blocks! Gap lock prevents this insert
Gap locks prevent other transactions from inserting rows that would appear in your result set if you re-ran the query. This preserves REPEATABLE READ semantics, but it creates surprising blocking behavior.
I once debugged a production issue where INSERT statements were mysteriously blocked. No row-level lock was visible in SHOW ENGINE INNODB STATUS. The culprit was a gap lock from a locking SELECT that had been running for minutes.
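On MySQL 8.0, performance_schema makes these locks visible (a sketch; I've trimmed the column list for readability):
SELECT engine_transaction_id, object_name, index_name, lock_type, lock_mode, lock_data
FROM performance_schema.data_locks;
-- lock_mode values like 'X,GAP' or 'X,REC_NOT_GAP' show exactly which gaps are held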
Deadlocks: The Mutual Destruction Problem
When two transactions wait for each other's locks, you get a deadlock:
-- Transaction 1
START TRANSACTION;
UPDATE users SET balance = 500 WHERE id = 1;
-- Transaction 2
START TRANSACTION;
UPDATE users SET balance = 600 WHERE id = 2;
-- Transaction 1 (continued)
UPDATE users SET balance = 700 WHERE id = 2; -- Blocks on Transaction 2
-- Transaction 2 (continued)
UPDATE users SET balance = 800 WHERE id = 1; -- Blocks on Transaction 1
-- DEADLOCK!
InnoDB detects deadlocks and rolls back the transaction that has done less work. But deadlocks tank performance, and if your application doesn't catch the error and retry, the rolled-back work is silently lost.
The fix is to acquire locks in a consistent order across all transactions:
-- Safe approach: order by primary key
UPDATE users SET balance = balance - 100
WHERE id IN (1, 2)
ORDER BY id;
Production deadlocks are often caused by application code that updates rows in random order based on user input. The fix requires careful code review and consistent lock ordering.
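When a deadlock does happen, InnoDB keeps the evidence for your post-mortem:
SHOW ENGINE INNODB STATUS;
-- The "LATEST DETECTED DEADLOCK" section shows both transactions, the statements
-- they were running, and the locks each one held and waited for
SET GLOBAL innodb_print_all_deadlocks = ON;
-- Optionally log every deadlock to the error log, not just the most recent one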
Replication: How MySQL Scales Reads
MySQL replication is how you scale reads and enable high availability. But replication is asynchronous by default, and that has implications.
The Binary Log
The source server writes all changes to the binary log:
-- Check binary log format
SHOW VARIABLES LIKE 'binlog_format';
-- ROW: Logs actual row changes (safe, recommended)
-- STATEMENT: Logs SQL statements (compact but dangerous)
-- MIXED: Switches between them (complicated)
Row-based replication logs the actual before/after row data. It's larger but handles edge cases that statement-based replication gets wrong.
How Replication Works
- Source writes changes to binlog
- Replica's IO thread reads binlog and writes to relay log
- Replica's SQL thread reads relay log and applies changes
The key insight: there's always lag. The replica is behind the source by some amount of time.
-- On the replica
SHOW REPLICA STATUS\G
-- Look for:
Seconds_Behind_Source: 5
If you write to the source and immediately read from a replica, you might not see your write. This causes bugs that are hard to reproduce because they're timing-dependent.
-- User updates their profile
UPDATE users SET name = 'New Name' WHERE id = 42;
-- Application immediately reads from replica
SELECT name FROM users WHERE id = 42;
-- Returns: 'Old Name' (replication lag!)
-- User sees stale data, thinks update failed
The fix is either:
- Read from source after writes (but this defeats the point of replicas)
- Accept eventual consistency and design your application accordingly
- Use synchronous replication (much slower, but consistent)
GTID: The Modern Way
GTIDs (Global Transaction Identifiers) give each transaction a unique ID across the replication topology:
-- GTID format
server_uuid:transaction_id
-- Example: 3E11FA47-71CA-11E1-9E33-C80AA9429562:23
-- Enable GTIDs
gtid_mode = ON
enforce_gtid_consistency = ON
GTIDs make failover easier because you can precisely track which transactions each replica has applied. Vitess uses GTIDs extensively for resharding operations.
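One hedged sketch of a read-your-writes pattern built on GTIDs (the GTID value is just the example from above):
-- On the source, right after the write, capture what has been executed:
SELECT @@GLOBAL.gtid_executed;
-- On the replica, before the read, wait (up to 1 second here) until it has caught up:
SELECT WAIT_FOR_EXECUTED_GTID_SET('3E11FA47-71CA-11E1-9E33-C80AA9429562:23', 1);
-- Returns 0 once the set has been applied; a non-zero result means the timeout expired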
Connecting This to Vitess
Understanding MySQL internals explains why Vitess exists and how it works:
Buffer Pool: Vitess shards data, so each shard's buffer pool caches a fraction of total data. Better cache hit rates, more efficient memory usage.
Secondary Indexes: If your query uses only a secondary index without the sharding key, Vitess must scatter-gather across all shards. Understanding the secondary index lookup explains why this is expensive.
Replication: Vitess manages replica pools per shard. It routes reads to replicas, but you must understand replication lag.
Transactions: Cross-shard transactions require two-phase commit. They're much slower than single-shard transactions because they involve coordination between shards.
Binary Logs: Vitess's VReplication uses the binlog to copy data during resharding. Understanding binlog format and replication mechanics explains how online schema changes work.
Why Understanding This Matters
When I first learned SQL, I thought databases were magic boxes. I wrote queries without understanding the execution model, chose primary keys randomly, and wondered why production was slow when development was fast.
Now I understand why my queries failed. UUIDs as primary keys caused page splits and fragmentation. Missing indexes forced full table scans. Long-running transactions prevented undo log purge. Poor lock ordering caused deadlocks under load.
Understanding the internals changed how I think about databases:
- Primary keys matter: Choose sequential values for clustered index efficiency
- Secondary indexes have costs: Understand the two-step lookup and covering indexes
- The buffer pool is everything: Size it appropriately and monitor hit rates
- Query optimization is statistical: EXPLAIN and ANALYZE are essential tools
- MVCC isn't free: Long transactions kill performance through undo log growth
- Replication is asynchronous: Design for eventual consistency or pay the cost of synchronous replication
- Lock ordering prevents deadlocks: Consistency matters across all code paths
This knowledge directly influenced my schema designs. Instead of UUIDs everywhere, I use auto-increment integers or ULIDs. Instead of ignoring indexes, I design covering indexes for hot queries. Instead of long transactions, I keep them as short as possible.
The Hidden Complexity
SELECT * FROM users WHERE id = 42 looks simple, but it triggers:
- Query parsing and optimization
- Buffer pool lookup or disk read
- B+Tree traversal
- Page latch acquisition
- Row version checking (MVCC)
- Result set construction
Every convenience in SQL hides complexity. Understanding these mechanisms helps you use them effectively.
When you see performance problems in production, they're usually not because MySQL is broken. They're because the query pattern didn't match what the optimizer expected, the working set didn't fit in the buffer pool, or lock contention wasn't considered.
Race conditions happen because replication is asynchronous. Deadlocks happen because lock ordering wasn't consistent. Performance problems happen because the schema design didn't match the access patterns.
What's Next
This is just the foundation. Real-world database work involves query optimization, schema design, backup strategies, and operational concerns like monitoring and capacity planning.
But now you understand what happens when MySQL executes a query. You know why choosing the right primary key matters, why indexes speed up reads, and how MVCC enables concurrent transactions.
You understand that the buffer pool is the key to performance, that the query optimizer uses statistics to guess execution plans, and that replication lag is a fundamental property of asynchronous replication.
That mental model changes everything. Databases stop being magic and become engineering. You can reason about performance characteristics, debug production issues, and design schemas that scale effectively.
The next time someone asks you "how does MySQL work," you won't just say "it stores data and runs queries." You'll understand the B+Tree structure, the query execution flow, the transaction machinery, and the replication mechanics. You'll know why database performance is both powerful and fragile.
And maybe, just maybe, you won't make the same mistakes I did when trying to scale past 10,000 requests per second.