PostgreSQL vs MySQL: An Honest Comparison from Someone Who Uses Both
Last year I migrated a mid-size SaaS application from MySQL 8 to PostgreSQL 16. The migration itself was uneventful -- schema translation, data pump, a weekend of testing. What surprised me was the Monday after. A reporting query that had been a persistent thorn -- 45 seconds, running against a 12-million-row orders table with three subqueries and two window functions -- finished in 1.8 seconds on PostgreSQL. Same data. Same hardware. The query planner just understood what I was asking for.
That single moment captures the core difference between these two databases better than any benchmark. MySQL is not slow. It handles straightforward workloads extremely well. But when your queries start expressing complex business logic -- CTEs referencing CTEs, window functions with frame clauses, JSONB filtering nested inside joins -- PostgreSQL's optimizer has a fundamentally deeper understanding of what to do with them.
This is not a "PostgreSQL is better" article. I still run MySQL in production for specific workloads where it genuinely fits. But the differences are real, they matter, and choosing the wrong one for your use case costs you months of workarounds.
Architecture: Processes vs Threads, MVCC vs Undo Logs
The architectural divergence starts at the process model and ripples through everything.
PostgreSQL forks a new OS process for each client connection. Each process has its own memory space. This provides strong isolation -- a misbehaving query cannot corrupt another session's memory -- but comes with overhead. Connection pooling (PgBouncer, Supavisor) is effectively mandatory for production deployments with more than a few hundred connections.
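To give a sense of what "effectively mandatory" pooling looks like in practice, here is a minimal PgBouncer sketch. All names and sizes are illustrative, not recommendations for your workload:

```ini
; pgbouncer.ini (illustrative values)
[databases]
app = host=127.0.0.1 port=5432 dbname=app

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
pool_mode = transaction      ; server connections are shared per transaction
max_client_conn = 2000       ; clients PgBouncer will accept
default_pool_size = 20       ; actual PostgreSQL connections per db/user pair
```

The key ratio is the last two lines: thousands of cheap client connections multiplexed over a few dozen real backend processes.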
MySQL (InnoDB) uses a thread-per-connection model. Threads share memory space within a single process, which is lighter and scales to thousands of connections without an external pooler. The tradeoff is that a corrupted thread can theoretically affect others, though this is extremely rare in practice.
MVCC Implementation
This is where things get architecturally interesting.
PostgreSQL's MVCC stores multiple physical versions of each row directly in the table heap. When you update a row, PostgreSQL writes a new version and marks the old one as dead. Both versions exist in the table until VACUUM reclaims the dead tuple. Readers never block writers, and writers never block readers -- each transaction sees a consistent snapshot based on transaction IDs.
The consequence: PostgreSQL needs autovacuum running continuously. Dead tuples accumulate, tables bloat, and performance degrades if vacuum falls behind. This is the single biggest operational concern with PostgreSQL and the one that trips up teams migrating from MySQL.
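A quick way to check whether vacuum is keeping up is the standard statistics view, which tracks live and dead tuple counts per table:

```sql
-- Tables with the most dead tuples, plus when autovacuum last ran
SELECT relname,
       n_live_tup,
       n_dead_tup,
       last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;
```

A large and growing `n_dead_tup` alongside a stale `last_autovacuum` is the classic sign that autovacuum needs tuning for that table.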
MySQL/InnoDB uses an undo log approach. The current row version lives in the table, and undo records in a separate rollback segment reconstruct older versions for transactions that need them. When all transactions that could see an old version complete, the undo record is purged.
The consequence: MySQL avoids table bloat from dead tuples, but long-running transactions can cause undo log growth. And critically, the undo log reconstruction adds overhead for long-running read transactions that need to traverse many undo records to find their visible version.
Neither approach is objectively superior. PostgreSQL's model gives it simpler, faster reads at the cost of vacuum maintenance. MySQL's model avoids bloat at the cost of undo log traversal for certain workloads.
Write-Ahead Logging
Both databases use WAL (write-ahead logging) for durability, but the implementations differ. PostgreSQL's WAL is a unified, page-level log of all changes. MySQL's InnoDB has a two-layer system: the redo log (for crash recovery) and the binary log (for replication). This dual-log architecture adds complexity and is one reason MySQL replication has historically been more fragile than PostgreSQL's streaming replication.
Performance: Where Each Database Wins
This is the section where most comparison articles get dishonest. Let me be direct.
MySQL is Faster For:
Simple primary key lookups. A SELECT * FROM users WHERE id = 12345 on MySQL is marginally faster. The thread model has less overhead than process forking for high-connection-count OLTP, and InnoDB's clustered index means the primary key lookup is a single B-tree traversal that lands directly on the row data.
High-concurrency simple writes. When inserting rows at very high rates into a simple table with minimal indexing, MySQL's thread model handles the connection overhead more efficiently. For workloads that are 90% single-row INSERTs and simple SELECTs, MySQL can sustain higher throughput with less tuning.
Read replicas for simple queries. MySQL's replication for read scaling of simple queries is mature and well-understood.
PostgreSQL is Faster For:
Complex analytical queries. This is where PostgreSQL pulls ahead decisively. The query planner is more sophisticated -- it considers more join strategies, more accurately estimates costs, and makes better decisions on large multi-table joins.
Parallel query execution. PostgreSQL can parallelize sequential scans, joins, aggregations, and sorts across multiple CPU cores. A query scanning 100 million rows can use 8 workers and finish in a fraction of the sequential time:
```sql
-- PostgreSQL parallel aggregate
SET max_parallel_workers_per_gather = 8;

EXPLAIN ANALYZE
SELECT region, count(*), avg(order_total)
FROM orders
WHERE created_at >= '2025-01-01'
GROUP BY region;

-- Gather Merge (actual time=0.8..1240.0)
--   Workers Planned: 8
--   Workers Launched: 8
--   ->  Partial HashAggregate (actual time=900..920)
--         ->  Parallel Seq Scan on orders (actual time=0.1..600)
```
MySQL 8 added limited parallel read support, but it is restricted to SELECT COUNT(*) and a handful of specific operations. It cannot parallelize joins, sorts, or general aggregations.
CTEs and recursive queries. PostgreSQL's Common Table Expressions are fully optimized. Recursive CTEs are indispensable for tree structures, graph traversals, and hierarchical data:
```sql
-- Recursive CTE: find all subordinates of a manager
WITH RECURSIVE team AS (
    SELECT employee_id, name, manager_id, 1 AS depth
    FROM employees
    WHERE employee_id = 100
    UNION ALL
    SELECT e.employee_id, e.name, e.manager_id, t.depth + 1
    FROM employees e
    JOIN team t ON e.manager_id = t.employee_id
    WHERE t.depth < 10
)
SELECT * FROM team ORDER BY depth, name;
```
MySQL 8 added CTEs, but the optimizer handles them less efficiently, particularly when the CTE is referenced multiple times.
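Since PostgreSQL 12 you can even control this per CTE: the planner inlines CTEs by default, but `MATERIALIZED` forces a single evaluation when the CTE is referenced more than once. A sketch, with hypothetical table and column names:

```sql
-- Force one materialization of a CTE referenced twice (PostgreSQL 12+)
WITH recent AS MATERIALIZED (
    SELECT customer_id, sum(total) AS spend
    FROM orders
    WHERE created_at >= now() - interval '30 days'
    GROUP BY customer_id
)
SELECT * FROM recent WHERE spend > 1000   -- big spenders
UNION ALL
SELECT * FROM recent WHERE spend < 10;    -- churn risks, same snapshot
```

Without `MATERIALIZED`, the planner is free to inline `recent` into both branches and compute the aggregate twice; with it, the aggregate runs once.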
Window functions. Both databases support window functions, but PostgreSQL's implementation is more complete and better optimized:
```sql
-- Running totals with frame clause
SELECT
    order_date,
    revenue,
    sum(revenue) OVER (
        ORDER BY order_date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS rolling_7_day_revenue,
    rank() OVER (PARTITION BY region ORDER BY revenue DESC) AS region_rank
FROM daily_revenue;
```
PostgreSQL supports GROUPS, EXCLUDE, and FILTER clauses within window functions that MySQL does not.
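A sketch of those clauses against the same `daily_revenue` table from the example above (column names assumed identical):

```sql
-- GROUPS frames, EXCLUDE, and FILTER: PostgreSQL-only refinements
SELECT
    order_date,
    -- average of the neighboring peer groups, excluding the current row
    avg(revenue) OVER (
        ORDER BY order_date
        GROUPS BETWEEN 1 PRECEDING AND 1 FOLLOWING
        EXCLUDE CURRENT ROW
    ) AS neighbor_avg,
    -- count only the rows matching a predicate, per partition
    count(*) FILTER (WHERE revenue > 0)
        OVER (PARTITION BY region) AS profitable_days
FROM daily_revenue;
```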
SQL Feature Comparison
This is where PostgreSQL's advantage is most stark.
JSONB
PostgreSQL's JSONB is a first-class binary JSON type with indexing, operators, and full query integration:
```sql
-- GIN index on JSONB for fast lookups
CREATE INDEX idx_metadata ON events USING gin (metadata);

-- Query nested JSON with indexing support
SELECT * FROM events
WHERE metadata @> '{"source": "api", "version": 2}';

-- Update specific nested keys
UPDATE events
SET metadata = jsonb_set(metadata, '{processed}', 'true')
WHERE metadata->>'source' = 'api';
```
MySQL's JSON type also stores data in an internal binary format, but the indexing story is weaker. You need generated columns with indexes on extracted paths -- functional, but more boilerplate and less flexible than PostgreSQL's GIN operator classes.
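A minimal sketch of that MySQL workaround, reusing the `events`/`metadata` names from the PostgreSQL example above:

```sql
-- MySQL: index a JSON path via a generated column
ALTER TABLE events
    ADD COLUMN source VARCHAR(32)
        GENERATED ALWAYS AS (metadata->>'$.source') STORED,
    ADD INDEX idx_events_source (source);

-- Queries must target the generated column to use the index
SELECT * FROM events WHERE source = 'api';
```

Every JSON path you want indexed needs its own generated column, which is the boilerplate cost.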
Array Types
```sql
-- Native array columns
CREATE TABLE products (
    product_id SERIAL PRIMARY KEY,
    name TEXT,
    tags TEXT[],
    price_history NUMERIC[]
);

-- Array operators
SELECT * FROM products WHERE tags @> ARRAY['electronics', 'sale'];
SELECT * FROM products WHERE 'premium' = ANY(tags);
```
MySQL has no array type. The workaround is a JSON array or a junction table.
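For comparison, a sketch of the JSON-array workaround in MySQL, mirroring the `products` table above:

```sql
-- MySQL: tags as a JSON array instead of a native array column
CREATE TABLE products (
    product_id INT AUTO_INCREMENT PRIMARY KEY,
    name TEXT,
    tags JSON
);

-- Membership test; note the JSON string quoting in the needle
SELECT * FROM products
WHERE JSON_CONTAINS(tags, '"premium"');
```

It works, but there are no containment or overlap operators comparable to `@>` and `&&`, and element-level constraints are out of reach.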
Range Types
```sql
-- Exclusion constraints mixing plain equality with range overlap
-- need the btree_gist extension
CREATE EXTENSION IF NOT EXISTS btree_gist;

-- Date ranges for booking systems
CREATE TABLE reservations (
    reservation_id SERIAL PRIMARY KEY,
    room_id INT,
    during DATERANGE,
    EXCLUDE USING gist (room_id WITH =, during WITH &&)
);

-- The EXCLUDE constraint prevents overlapping bookings automatically
INSERT INTO reservations (room_id, during)
VALUES (1, '[2026-03-01, 2026-03-05)'); -- succeeds

INSERT INTO reservations (room_id, during)
VALUES (1, '[2026-03-04, 2026-03-08)'); -- fails: overlapping range
```
This kind of constraint is impossible to express declaratively in MySQL. You would need application-level logic or triggers.
Full-Text Search
PostgreSQL has built-in full-text search with tsvector, tsquery, ranking, and language-specific stemming:
```sql
-- Built-in full-text search
CREATE INDEX idx_fts ON articles
    USING gin (to_tsvector('english', title || ' ' || body));

SELECT title, ts_rank(to_tsvector('english', body), query) AS relevance
FROM articles, to_tsquery('english', 'postgres & performance') query
WHERE to_tsvector('english', title || ' ' || body) @@ query
ORDER BY relevance DESC;
```
MySQL has full-text indexes on InnoDB tables, but the functionality is more limited -- no language-specific stemming or configurable dictionaries, a much simpler relevance model, and far less control over parsing, ranking, and phrase matching.
Replication and High Availability
PostgreSQL
Streaming replication is the default: a binary, byte-level WAL stream from primary to replicas. It is simple, reliable, and enables hot standby (read-only queries on replicas).
Logical replication (introduced in PostgreSQL 10, significantly improved through 16) replicates at the row level, allowing selective table replication, cross-version replication, and multi-directional setups. This is critical for zero-downtime major version upgrades.
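The moving parts are just two statements. This sketch uses placeholder names and a placeholder connection string:

```sql
-- On the publisher: choose which tables to replicate
CREATE PUBLICATION orders_pub FOR TABLE orders, customers;

-- On the subscriber: connect and start applying row changes
CREATE SUBSCRIPTION orders_sub
    CONNECTION 'host=primary.internal dbname=app user=repl'
    PUBLICATION orders_pub;
```

Because the subscriber applies logical row changes rather than raw WAL bytes, the two servers can run different major versions -- which is exactly what makes zero-downtime upgrades possible.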
MySQL
Binary log replication is MySQL's traditional approach, and it has improved significantly with GTID-based replication. MySQL Group Replication provides multi-primary capabilities, though it comes with restrictions (all tables must have primary keys, specific transaction isolation requirements).
MySQL's replication ecosystem is more mature for simple primary-replica setups. PostgreSQL's logical replication is more flexible for complex topologies and cross-version scenarios.
The Extension Ecosystem
This is PostgreSQL's ace in the hole. There is no MySQL equivalent.
PostGIS transforms PostgreSQL into a full spatial database -- used by governments, logistics companies, and mapping platforms worldwide. It supports 2D/3D geometries, raster data, topology, and geocoding with spatial indexing.
TimescaleDB adds hypertables, automatic partitioning, columnar compression, and continuous aggregates for time-series data. Teams using this avoid the operational complexity of a separate time-series database.
pgvector enables vector similarity search for AI/ML embeddings. Store, index, and query vectors alongside your relational data:
```sql
-- Vector similarity search (pgvector)
CREATE TABLE documents (
    doc_id SERIAL PRIMARY KEY,
    content TEXT,
    embedding vector(1536)
);

CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);

SELECT content, embedding <=> '[0.1, 0.2, ...]'::vector AS distance
FROM documents
ORDER BY distance
LIMIT 10;
```
pg_cron adds cron-style job scheduling inside the database. pg_stat_statements provides normalized query performance tracking. pg_trgm adds fuzzy string matching with index support.
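Two of these in action, as a sketch (job and table names are illustrative):

```sql
-- pg_cron: schedule a nightly maintenance job inside the database
SELECT cron.schedule('nightly-analyze', '0 3 * * *', 'ANALYZE orders');

-- pg_trgm: fuzzy string matching with index support
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX idx_users_name_trgm ON users USING gin (name gin_trgm_ops);

-- similarity operator: matches despite the typo
SELECT * FROM users WHERE name % 'jonh';
```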
MySQL has plugins, but they are narrower in scope. There is no MySQL extension that adds spatial capabilities at PostGIS's level, no native vector search, no time-series engine. If your application needs geospatial queries, vector search, or time-series analytics, PostgreSQL is the only choice among the two that does not require bolting on a separate specialized database.
Monitoring and Observability
PostgreSQL exposes its internals through a rich system catalog: pg_stat_statements for query performance, pg_stat_activity for live sessions, pg_stat_user_tables for table-level I/O and vacuum statistics, pg_locks for lock analysis. The auto_explain extension captures execution plans for slow queries automatically.
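For example, the first query I run when investigating a slow instance, against pg_stat_statements (column names as of PostgreSQL 13+):

```sql
-- Top queries by cumulative execution time
SELECT query,
       calls,
       round(mean_exec_time::numeric, 2)  AS mean_ms,
       round(total_exec_time::numeric, 2) AS total_ms
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
```

Queries are normalized (literals replaced with placeholders), so a million parameterized lookups show up as one row with aggregate statistics.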
MySQL's Performance Schema provides similar capabilities -- query digests, wait event analysis, memory instrumentation -- but it carries a measurable overhead (typically 5-10%) and the schema is more complex to query directly.
The depth of PostgreSQL's catalog system enables specialized monitoring tools. myDBA.dev, for example, builds on these catalogs to provide automated health checks across 10 domains, EXPLAIN plan capture with regression detection, and index advisory specifically for PostgreSQL.
The difference is practical: when a PostgreSQL query regresses, you can trace it from pg_stat_statements to the specific plan change in pg_stat_plans to the table statistics that caused the planner to choose a different strategy. MySQL's equivalent investigation requires cross-referencing multiple Performance Schema tables with less direct causality.
Syntax Differences: Quick Reference
For teams evaluating a migration or working with both databases, here are the key syntax differences:
| Operation | PostgreSQL | MySQL |
|---|---|---|
| Auto-increment | `GENERATED ALWAYS AS IDENTITY` | `AUTO_INCREMENT` |
| Upsert | `INSERT ... ON CONFLICT DO UPDATE` | `INSERT ... ON DUPLICATE KEY UPDATE` |
| String concatenation | `'hello' \|\| ' world'` | `CONCAT('hello', ' world')` |
| Boolean type | Native `BOOLEAN` | `TINYINT(1)` (aliased as `BOOLEAN`) |
| Case sensitivity | Case-sensitive by default | Case-insensitive by default (collation) |
| LIMIT with offset | `LIMIT 10 OFFSET 20` | `LIMIT 20, 10` or `LIMIT 10 OFFSET 20` |
| Current timestamp | `now()` or `CURRENT_TIMESTAMP` | `NOW()` or `CURRENT_TIMESTAMP` |
| Schema support | Full schema namespacing | Database = schema (no separation) |
| IF EXISTS in DDL | Supported on most DDL | Supported on most DDL |
| Type casting | `value::integer` or `CAST()` | `CAST()` only |
| Regex matching | `column ~ 'pattern'` | `column REGEXP 'pattern'` |
| JSON access | `data->>'key'` | `data->>'$.key'` or `JSON_EXTRACT()` |
One syntax difference that catches people: PostgreSQL uses $$ dollar quoting for functions and procedures, while MySQL uses DELIMITER changes. PostgreSQL's approach is cleaner but unfamiliar to MySQL developers.
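Side by side, using a hypothetical `add_tax` function:

```sql
-- PostgreSQL: dollar-quoted body, no delimiter juggling
CREATE FUNCTION add_tax(amount numeric) RETURNS numeric AS $$
    SELECT amount * 1.2;
$$ LANGUAGE sql;

-- MySQL: the client delimiter must be switched so that ';'
-- inside the body does not terminate the CREATE statement
DELIMITER //
CREATE FUNCTION add_tax(amount DECIMAL(10,2)) RETURNS DECIMAL(10,2)
DETERMINISTIC
RETURN amount * 1.2;
//
DELIMITER ;
```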
Feature Comparison Table
| Feature | PostgreSQL | MySQL (InnoDB) |
|---|---|---|
| MVCC approach | Heap-based multi-version | Undo log reconstruction |
| Connection model | Process-per-connection | Thread-per-connection |
| Parallel queries | Full (scans, joins, aggregates, sorts) | Limited (COUNT only) |
| CTEs (WITH) | Fully optimized | Supported, less optimized |
| Window functions | Complete implementation | Basic implementation |
| JSONB | Binary, indexed, operators | Text-based, generated column indexes |
| Array types | Native | Not supported |
| Range types | Native with exclusion constraints | Not supported |
| Full-text search | Built-in with stemming and ranking | Basic, no stemming |
| Partial indexes | Yes | Not supported |
| Expression indexes | Yes | Functional indexes (8.0+) |
| Table inheritance | Yes | Not supported |
| Foreign data wrappers | Yes (query external sources as tables) | FEDERATED (limited) |
| Extensions | Rich ecosystem (PostGIS, TimescaleDB, pgvector) | Plugins (narrower scope) |
| Logical replication | Yes (selective, cross-version) | Binary log (GTID) |
| Partitioning | Declarative (range, list, hash) | Declarative (range, list, hash, key) |
| CHECK constraints | Enforced | Enforced (8.0.16+, ignored before) |
| Generated columns | Stored and virtual | Stored and virtual |
| Connection pooling | External (PgBouncer) required | Thread pool (Enterprise) or ProxySQL |
When to Choose MySQL
Be honest about MySQL's strengths:
- WordPress, Drupal, or legacy PHP applications. The ecosystem assumes MySQL. Fighting this creates more problems than it solves.
- Simple CRUD workloads at high concurrency. If your application is 95% single-row lookups and inserts with thousands of concurrent connections, MySQL's thread model handles this natively without an external pooler.
- Team expertise. A team that knows MySQL deeply will outperform the same team on PostgreSQL during the learning curve. Operational knowledge matters more than theoretical feature advantages.
- Managed MySQL offerings. AWS Aurora MySQL, PlanetScale, and Vitess provide MySQL-compatible horizontal scaling that has no direct PostgreSQL equivalent at the same maturity level.
When to Choose PostgreSQL
- Complex queries are central to your application. Reporting, analytics, business intelligence, any workload with multi-join queries, CTEs, and window functions.
- You need JSONB, arrays, or range types. If your data model mixes relational and semi-structured data, PostgreSQL handles it natively without bolting on a document store.
- Geospatial data. PostGIS is the industry standard. No other relational database comes close.
- Vector search for AI/ML. pgvector keeps embeddings alongside your relational data.
- Time-series data. TimescaleDB avoids the need for a separate time-series database.
- Data integrity is non-negotiable. PostgreSQL's constraint system -- CHECK, EXCLUDE, partial indexes, deferrable foreign keys -- provides guardrails at the database level that MySQL cannot match.
- You want one database, not five. PostgreSQL with extensions can replace separate deployments of a document store, a time-series database, a spatial database, and a vector database. Fewer moving parts means fewer things to break at 3 AM.
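As one concrete example of those integrity guardrails, here is a deferrable foreign key in sketch form (table and constraint names are illustrative):

```sql
-- Checked at COMMIT rather than per statement, so rows that
-- reference each other can be inserted within one transaction
ALTER TABLE orders
    ADD CONSTRAINT orders_customer_fk
    FOREIGN KEY (customer_id) REFERENCES customers (customer_id)
    DEFERRABLE INITIALLY DEFERRED;
```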
The Bottom Line
MySQL is a good database. PostgreSQL is a more capable one. That distinction matters when your requirements grow beyond simple CRUD -- and in my experience, they always do.
If you are starting a new project today and you do not have a specific reason to choose MySQL, choose PostgreSQL. The SQL feature set is richer, the extension ecosystem is unmatched, the planner is smarter, and the community is moving faster. You will never regret having range types, JSONB, CTEs, and parallel queries available when you need them.
If you are running MySQL and it is working well, do not migrate for the sake of migrating. But if you are hitting walls -- complex queries that cannot be expressed cleanly, missing data types that force application-level workarounds, a need for geospatial or vector search -- know that PostgreSQL solves these problems at the database layer.

