PostgreSQL vs MySQL: An Honest Comparison from Someone Who Uses Both
Last year I migrated a mid-size SaaS application from MySQL 8 to PostgreSQL 16. The migration itself was uneventful -- schema translation, data pump, a weekend of testing. What surprised me was the Monday after. A reporting query that had been a persistent thorn -- 45 seconds, running against a 12-million-row orders table with three subqueries and two window functions -- finished in 1.8 seconds on PostgreSQL. Same data. Same hardware. The query planner just understood what I was asking for.
That single moment captures the core difference between these two databases better than any benchmark. MySQL is not slow. It handles straightforward workloads extremely well. But when your queries start expressing complex business logic -- CTEs referencing CTEs, window functions with frame clauses, JSONB filtering nested inside joins -- PostgreSQL's optimizer has a fundamentally deeper understanding of what to do with them.
This is not a "PostgreSQL is better" article. I still run MySQL in production for specific workloads where it genuinely fits. But the differences are real, they matter, and choosing the wrong one for your use case costs you months of workarounds.
Architecture: Processes vs Threads, MVCC vs Undo Logs
The architectural divergence starts at the process model and ripples through everything.
PostgreSQL forks a new OS process for each client connection. Each process has its own memory space. This provides strong isolation -- a misbehaving query cannot corrupt another session's memory -- but comes with overhead. Connection pooling (PgBouncer, Supavisor) is effectively mandatory for production deployments with more than a few hundred connections.
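To give a sense of what "effectively mandatory" pooling looks like in practice, here is a minimal PgBouncer sketch. All names and sizes are illustrative, not recommendations for your workload:

```ini
; pgbouncer.ini (illustrative values)
[databases]
app = host=127.0.0.1 port=5432 dbname=app

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
pool_mode = transaction      ; server connections are shared per transaction
max_client_conn = 2000       ; clients PgBouncer will accept
default_pool_size = 20       ; actual PostgreSQL connections per db/user pair
```

The key ratio is the last two lines: thousands of cheap client connections multiplexed over a few dozen real backend processes.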
MySQL (InnoDB) uses a thread-per-connection model. Threads share memory space within a single process, which is lighter and scales to thousands of connections without an external pooler. The tradeoff is that a corrupted thread can theoretically affect others, though this is extremely rare in practice.
MVCC Implementation
This is where things get architecturally interesting.
PostgreSQL's MVCC stores multiple physical versions of each row directly in the table heap. When you update a row, PostgreSQL writes a new version and marks the old one as dead. Both versions exist in the table until VACUUM reclaims the dead tuple. Readers never block writers, and writers never block readers -- each transaction sees a consistent snapshot based on transaction IDs.
The consequence: PostgreSQL needs autovacuum running continuously. Dead tuples accumulate, tables bloat, and performance degrades if vacuum falls behind. This is the single biggest operational concern with PostgreSQL and the one that trips up teams migrating from MySQL.
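A quick way to check whether vacuum is keeping up is the standard statistics view, which tracks live and dead tuple counts per table:

```sql
-- Tables with the most dead tuples, plus when autovacuum last ran
SELECT relname,
       n_live_tup,
       n_dead_tup,
       last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;
```

A large and growing `n_dead_tup` alongside a stale `last_autovacuum` is the classic sign that autovacuum needs tuning for that table.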
MySQL/InnoDB uses an undo log approach. The current row version lives in the table, and undo records in a separate rollback segment reconstruct older versions for transactions that need them. When all transactions that could see an old version complete, the undo record is purged.
The consequence: MySQL avoids table bloat from dead tuples, but long-running transactions can cause undo log growth. And critically, the undo log reconstruction adds overhead for long-running read transactions that need to traverse many undo records to find their visible version.
Neither approach is objectively superior. PostgreSQL's model gives it simpler, faster reads at the cost of vacuum maintenance. MySQL's model avoids bloat at the cost of undo log traversal for certain workloads.
Write-Ahead Logging
Both databases use WAL (write-ahead logging) for durability, but the implementations differ. PostgreSQL's WAL is a unified, page-level log of all changes. MySQL's InnoDB has a two-layer system: the redo log (for crash recovery) and the binary log (for replication). This dual-log architecture adds complexity and is one reason MySQL replication has historically been more fragile than PostgreSQL's streaming replication.
Performance: Where Each Database Wins
This is the section where most comparison articles get dishonest. Let me be direct.
MySQL is Faster For:
Simple primary key lookups. A SELECT * FROM users WHERE id = 12345 on MySQL is marginally faster. The thread model has less overhead than process forking for high-connection-count OLTP, and InnoDB's clustered index means the primary key lookup is a single B-tree traversal that lands directly on the row data.
High-concurrency simple writes. When inserting rows at very high rates into a simple table with minimal indexing, MySQL's thread model handles the connection overhead more efficiently. For workloads that are 90% single-row INSERTs and simple SELECTs, MySQL can sustain higher throughput with less tuning.
Read replicas for simple queries. MySQL's replication for read scaling of simple queries is mature and well-understood.
PostgreSQL is Faster For:
Complex analytical queries. This is where PostgreSQL pulls ahead decisively. The query planner is more sophisticated -- it considers more join strategies, more accurately estimates costs, and makes better decisions on large multi-table joins.
Parallel query execution. PostgreSQL can parallelize sequential scans, joins, aggregations, and sorts across multiple CPU cores. A query scanning 100 million rows can use 8 workers and finish in a fraction of the sequential time:
```sql
-- PostgreSQL parallel aggregate
SET max_parallel_workers_per_gather = 8;

EXPLAIN ANALYZE
SELECT region, count(*), avg(order_total)
FROM orders
WHERE created_at >= '2025-01-01'
GROUP BY region;

-- Gather Merge (actual time=0.8..1240.0)
--   Workers Planned: 8
--   Workers Launched: 8
--   ->  Partial HashAggregate (actual time=900..920)
--         ->  Parallel Seq Scan on orders (actual time=0.1..600)
```
MySQL 8 added limited parallel read support, but it is restricted to SELECT COUNT(*) and a handful of specific operations. It cannot parallelize joins, sorts, or general aggregations.
CTEs and recursive queries. PostgreSQL's Common Table Expressions are fully optimized. Recursive CTEs are indispensable for tree structures, graph traversals, and hierarchical data:
```sql
-- Recursive CTE: find all subordinates of a manager
WITH RECURSIVE team AS (
    SELECT employee_id, name, manager_id, 1 AS depth
    FROM employees
    WHERE employee_id = 100
    UNION ALL
    SELECT e.employee_id, e.name, e.manager_id, t.depth + 1
    FROM employees e
    JOIN team t ON e.manager_id = t.employee_id
    WHERE t.depth < 10
)
SELECT * FROM team ORDER BY depth, name;
```
MySQL 8 added CTEs, but the optimizer handles them less efficiently, particularly when the CTE is referenced multiple times.
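Since PostgreSQL 12 you can even control this per CTE: the planner inlines CTEs by default, but `MATERIALIZED` forces a single evaluation when the CTE is referenced more than once. A sketch, with hypothetical table and column names:

```sql
-- Force one materialization of a CTE referenced twice (PostgreSQL 12+)
WITH recent AS MATERIALIZED (
    SELECT customer_id, sum(total) AS spend
    FROM orders
    WHERE created_at >= now() - interval '30 days'
    GROUP BY customer_id
)
SELECT * FROM recent WHERE spend > 1000   -- big spenders
UNION ALL
SELECT * FROM recent WHERE spend < 10;    -- churn risks, same snapshot
```

Without `MATERIALIZED`, the planner is free to inline `recent` into both branches and compute the aggregate twice; with it, the aggregate runs once.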
Window functions. Both databases support window functions, but PostgreSQL's implementation is more complete and better optimized:
```sql
-- Running totals with frame clause
SELECT
    order_date,
    revenue,
    sum(revenue) OVER (
        ORDER BY order_date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS rolling_7_day_revenue,
    rank() OVER (PARTITION BY region ORDER BY revenue DESC) AS region_rank
FROM daily_revenue;
```
PostgreSQL supports GROUPS, EXCLUDE, and FILTER clauses within window functions that MySQL does not.
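A sketch of those clauses against the same `daily_revenue` table from the example above (column names assumed identical):

```sql
-- GROUPS frames, EXCLUDE, and FILTER: PostgreSQL-only refinements
SELECT
    order_date,
    -- average of the neighboring peer groups, excluding the current row
    avg(revenue) OVER (
        ORDER BY order_date
        GROUPS BETWEEN 1 PRECEDING AND 1 FOLLOWING
        EXCLUDE CURRENT ROW
    ) AS neighbor_avg,
    -- count only the rows matching a predicate, per partition
    count(*) FILTER (WHERE revenue > 0)
        OVER (PARTITION BY region) AS profitable_days
FROM daily_revenue;
```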
SQL Feature Comparison
This is where PostgreSQL's advantage is most stark.
JSONB
PostgreSQL's JSONB is a first-class binary JSON type with indexing, operators, and full query integration:
```sql
-- GIN index on JSONB for fast lookups
CREATE INDEX idx_metadata ON events USING gin (metadata);

-- Query nested JSON with indexing support
SELECT * FROM events
WHERE metadata @> '{"source": "api", "version": 2}';

-- Update specific nested keys
UPDATE events
SET metadata = jsonb_set(metadata, '{processed}', 'true')
WHERE metadata->>'source' = 'api';
```
MySQL's JSON type also stores data in an internal binary format, but the indexing story is weaker. You need generated columns with indexes on extracted paths -- functional, but more boilerplate and less flexible than PostgreSQL's GIN operator classes.
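A minimal sketch of that MySQL workaround, reusing the `events`/`metadata` names from the PostgreSQL example above:

```sql
-- MySQL: index a JSON path via a generated column
ALTER TABLE events
    ADD COLUMN source VARCHAR(32)
        GENERATED ALWAYS AS (metadata->>'$.source') STORED,
    ADD INDEX idx_events_source (source);

-- Queries must target the generated column to use the index
SELECT * FROM events WHERE source = 'api';
```

Every JSON path you want indexed needs its own generated column, which is the boilerplate cost.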
Array Types
```sql
-- Native array columns
CREATE TABLE products (
    product_id SERIAL PRIMARY KEY,
    name TEXT,
    tags TEXT[],
    price_history NUMERIC[]
);

-- Array operators
SELECT * FROM products WHERE tags @> ARRAY['electronics', 'sale'];
SELECT * FROM products WHERE 'premium' = ANY(tags);
```
MySQL has no array type. The workaround is a JSON array or a junction table.
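For comparison, a sketch of the JSON-array workaround in MySQL, mirroring the `products` table above:

```sql
-- MySQL: tags as a JSON array instead of a native array column
CREATE TABLE products (
    product_id INT AUTO_INCREMENT PRIMARY KEY,
    name TEXT,
    tags JSON
);

-- Membership test; note the JSON string quoting in the needle
SELECT * FROM products
WHERE JSON_CONTAINS(tags, '"premium"');
```

It works, but there are no containment or overlap operators comparable to `@>` and `&&`, and element-level constraints are out of reach.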
Range Types
```sql
-- Exclusion constraints mixing plain equality with range overlap
-- need the btree_gist extension
CREATE EXTENSION IF NOT EXISTS btree_gist;

-- Date ranges for booking systems
CREATE TABLE reservations (
    reservation_id SERIAL PRIMARY KEY,
    room_id INT,
    during DATERANGE,
    EXCLUDE USING gist (room_id WITH =, during WITH &&)
);

-- The EXCLUDE constraint prevents overlapping bookings automatically
INSERT INTO reservations (room_id, during)
VALUES (1, '[2026-03-01, 2026-03-05)'); -- succeeds

INSERT INTO reservations (room_id, during)
VALUES (1, '[2026-03-04, 2026-03-08)'); -- fails: overlapping range
```
This kind of constraint is impossible to express declaratively in MySQL. You would need application-level logic or triggers.
Full-Text Search
PostgreSQL has built-in full-text search with tsvector, tsquery, ranking, and language-specific stemming:
```sql
-- Built-in full-text search
CREATE INDEX idx_fts ON articles
    USING gin (to_tsvector('english', title || ' ' || body));

SELECT title, ts_rank(to_tsvector('english', body), query) AS relevance
FROM articles, to_tsquery('english', 'postgres & performance') query
WHERE to_tsvector('english', title || ' ' || body) @@ query
ORDER BY relevance DESC;
```
MySQL has full-text indexes on InnoDB tables, but the functionality is more limited -- no language-specific stemming or configurable dictionaries, a much simpler relevance model, and far less control over parsing, ranking, and phrase matching.
Replication and High Availability
PostgreSQL
Streaming replication is the default: a binary, byte-level WAL stream from primary to replicas. It is simple, reliable, and enables hot standby (read-only queries on replicas).
Logical replication (introduced in PostgreSQL 10, significantly improved through 16) replicates at the row level, allowing selective table replication, cross-version replication, and multi-directional setups. This is critical for zero-downtime major version upgrades.
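The moving parts are just two statements. This sketch uses placeholder names and a placeholder connection string:

```sql
-- On the publisher: choose which tables to replicate
CREATE PUBLICATION orders_pub FOR TABLE orders, customers;

-- On the subscriber: connect and start applying row changes
CREATE SUBSCRIPTION orders_sub
    CONNECTION 'host=primary.internal dbname=app user=repl'
    PUBLICATION orders_pub;
```

Because the subscriber applies logical row changes rather than raw WAL bytes, the two servers can run different major versions -- which is exactly what makes zero-downtime upgrades possible.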
MySQL
Binary log replication is MySQL's traditional approach, and it has improved significantly with GTID-based replication. MySQL Group Replication provides multi-primary capabilities, though it comes with restrictions (all tables must have primary keys, specific transaction isolation requirements).
MySQL's replication ecosystem is more mature for simple primary-replica setups. PostgreSQL's logical replication is more flexible for complex topologies and cross-version scenarios.
The Extension Ecosystem
This is PostgreSQL's ace in the hole. There is no MySQL equivalent.
PostGIS transforms PostgreSQL into a full spatial database -- used by governments, logistics companies, and mapping platforms worldwide. It supports 2D/3D geometries, raster data, topology, and geocoding with spatial indexing.
TimescaleDB adds hypertables, automatic partitioning, columnar compression, and continuous aggregates for time-series data. Teams using this avoid the operational complexity of a separate time-series database.
pgvector enables vector similarity search for AI/ML embeddings. Store, index, and query vectors alongside your relational data:
```sql
-- Vector similarity search (pgvector)
CREATE TABLE documents (
    doc_id SERIAL PRIMARY KEY,
    content TEXT,
    embedding vector(1536)
);

CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);

SELECT content, embedding <=> '[0.1, 0.2, ...]'::vector AS distance
FROM documents
ORDER BY distance
LIMIT 10;
```
pg_cron adds cron-style job scheduling inside the database. pg_stat_statements provides normalized query performance tracking. pg_trgm adds fuzzy string matching with index support.
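Two of these in action, as a sketch (job and table names are illustrative):

```sql
-- pg_cron: schedule a nightly maintenance job inside the database
SELECT cron.schedule('nightly-analyze', '0 3 * * *', 'ANALYZE orders');

-- pg_trgm: fuzzy string matching with index support
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX idx_users_name_trgm ON users USING gin (name gin_trgm_ops);

-- similarity operator: matches despite the typo
SELECT * FROM users WHERE name % 'jonh';
```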
MySQL has plugins, but they are narrower in scope. There is no MySQL extension that adds spatial capabilities at PostGIS's level, no native vector search, no time-series engine. If your application needs geospatial queries, vector search, or time-series analytics, PostgreSQL is the only choice among the two that does not require bolting on a separate specialized database.
Monitoring and Observability
PostgreSQL exposes its internals through a rich system catalog: pg_stat_statements for query performance, pg_stat_activity for live sessions, pg_stat_user_tables for table-level I/O and vacuum statistics, pg_locks for lock analysis. The auto_explain extension captures execution plans for slow queries automatically.
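For example, the first query I run when investigating a slow instance, against pg_stat_statements (column names as of PostgreSQL 13+):

```sql
-- Top queries by cumulative execution time
SELECT query,
       calls,
       round(mean_exec_time::numeric, 2)  AS mean_ms,
       round(total_exec_time::numeric, 2) AS total_ms
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
```

Queries are normalized (literals replaced with placeholders), so a million parameterized lookups show up as one row with aggregate statistics.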
MySQL's Performance Schema provides similar capabilities -- query digests, wait event analysis, memory instrumentation -- but it carries a measurable overhead (typically 5-10%) and the schema is more complex to query directly.
The depth of PostgreSQL's catalog system enables specialized monitoring tools. myDBA.dev, for example, builds on these catalogs to provide automated health checks across 10 domains, EXPLAIN plan capture with regression detection, and index advisory specifically for PostgreSQL.
The difference is practical: when a PostgreSQL query regresses, you can trace it from pg_stat_statements to the specific plan change in pg_stat_plans to the table statistics that caused the planner to choose a different strategy. MySQL's equivalent investigation requires cross-referencing multiple Performance Schema tables with less direct causality.
Syntax Differences: Quick Reference
For teams evaluating a migration or working with both databases, here are the key syntax differences:
| Operation | PostgreSQL | MySQL |
|---|---|---|
| Auto-increment | `GENERATED ALWAYS AS IDENTITY` | `AUTO_INCREMENT` |
| Upsert | `INSERT ... ON CONFLICT DO UPDATE` | `INSERT ... ON DUPLICATE KEY UPDATE` |
| String concatenation | `'hello' \|\| ' world'` | `CONCAT('hello', ' world')` |
| Boolean type | Native `BOOLEAN` | `TINYINT(1)` (aliased as `BOOLEAN`) |
| Case sensitivity | Case-sensitive by default | Case-insensitive by default (collation) |
| LIMIT with offset | `LIMIT 10 OFFSET 20` | `LIMIT 20, 10` or `LIMIT 10 OFFSET 20` |
| Current timestamp | `now()` or `CURRENT_TIMESTAMP` | `NOW()` or `CURRENT_TIMESTAMP` |
| Schema support | Full schema namespacing | Database = schema (no separation) |
| IF EXISTS in DDL | Supported on most DDL | Supported on most DDL |
| Type casting | `value::integer` or `CAST()` | `CAST()` only |
| Regex matching | `column ~ 'pattern'` | `column REGEXP 'pattern'` |
| JSON access | `data->>'key'` | `data->>'$.key'` or `JSON_EXTRACT()` |
One syntax difference that catches people: PostgreSQL uses $$ dollar quoting for functions and procedures, while MySQL uses DELIMITER changes. PostgreSQL's approach is cleaner but unfamiliar to MySQL developers.
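Side by side, using a hypothetical `add_tax` function:

```sql
-- PostgreSQL: dollar-quoted body, no delimiter juggling
CREATE FUNCTION add_tax(amount numeric) RETURNS numeric AS $$
    SELECT amount * 1.2;
$$ LANGUAGE sql;

-- MySQL: the client delimiter must be switched so that ';'
-- inside the body does not terminate the CREATE statement
DELIMITER //
CREATE FUNCTION add_tax(amount DECIMAL(10,2)) RETURNS DECIMAL(10,2)
DETERMINISTIC
RETURN amount * 1.2;
//
DELIMITER ;
```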
Feature Comparison Table
| Feature | PostgreSQL | MySQL (InnoDB) |
|---|---|---|
| MVCC approach | Heap-based multi-version | Undo log reconstruction |
| Connection model | Process-per-connection | Thread-per-connection |
| Parallel queries | Full (scans, joins, aggregates, sorts) | Limited (COUNT only) |
| CTEs (WITH) | Fully optimized | Supported, less optimized |
| Window functions | Complete implementation | Basic implementation |
| JSONB | Binary, indexed, operators | Text-based, generated column indexes |
| Array types | Native | Not supported |
| Range types | Native with exclusion constraints | Not supported |
| Full-text search | Built-in with stemming and ranking | Basic, no stemming |
| Partial indexes | Yes | Not supported |
| Expression indexes | Yes | Functional indexes (8.0+) |
| Table inheritance | Yes | Not supported |
| Foreign data wrappers | Yes (query external sources as tables) | FEDERATED (limited) |
| Extensions | Rich ecosystem (PostGIS, TimescaleDB, pgvector) | Plugins (narrower scope) |
| Logical replication | Yes (selective, cross-version) | Binary log (GTID) |
| Partitioning | Declarative (range, list, hash) | Declarative (range, list, hash, key) |
| CHECK constraints | Enforced | Enforced (8.0.16+, ignored before) |
| Generated columns | Stored and virtual | Stored and virtual |
| Connection pooling | External (PgBouncer) required | Thread pool (Enterprise) or ProxySQL |
When to Choose MySQL
Be honest about MySQL's strengths:
- WordPress, Drupal, or legacy PHP applications. The ecosystem assumes MySQL. Fighting this creates more problems than it solves.
- Simple CRUD workloads at high concurrency. If your application is 95% single-row lookups and inserts with thousands of concurrent connections, MySQL's thread model handles this natively without an external pooler.
- Team expertise. A team that knows MySQL deeply will outperform the same team on PostgreSQL during the learning curve. Operational knowledge matters more than theoretical feature advantages.
- Managed MySQL offerings. AWS Aurora MySQL, PlanetScale, and Vitess provide MySQL-compatible horizontal scaling that has no direct PostgreSQL equivalent at the same maturity level.
When to Choose PostgreSQL
- Complex queries are central to your application. Reporting, analytics, business intelligence, any workload with multi-join queries, CTEs, and window functions.
- You need JSONB, arrays, or range types. If your data model mixes relational and semi-structured data, PostgreSQL handles it natively without bolting on a document store.
- Geospatial data. PostGIS is the industry standard. No other relational database comes close.
- Vector search for AI/ML. pgvector keeps embeddings alongside your relational data.
- Time-series data. TimescaleDB avoids the need for a separate time-series database.
- Data integrity is non-negotiable. PostgreSQL's constraint system -- CHECK, EXCLUDE, partial indexes, deferrable foreign keys -- provides guardrails at the database level that MySQL cannot match.
- You want one database, not five. PostgreSQL with extensions can replace separate deployments of a document store, a time-series database, a spatial database, and a vector database. Fewer moving parts means fewer things to break at 3 AM.
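As one concrete example of those integrity guardrails, here is a deferrable foreign key in sketch form (table and constraint names are illustrative):

```sql
-- Checked at COMMIT rather than per statement, so rows that
-- reference each other can be inserted within one transaction
ALTER TABLE orders
    ADD CONSTRAINT orders_customer_fk
    FOREIGN KEY (customer_id) REFERENCES customers (customer_id)
    DEFERRABLE INITIALLY DEFERRED;
```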
The Bottom Line
MySQL is a good database. PostgreSQL is a more capable one. That distinction matters when your requirements grow beyond simple CRUD -- and in my experience, they always do.
If you are starting a new project today and you do not have a specific reason to choose MySQL, choose PostgreSQL. The SQL feature set is richer, the extension ecosystem is unmatched, the planner is smarter, and the community is moving faster. You will never regret having range types, JSONB, CTEs, and parallel queries available when you need them.
If you are running MySQL and it is working well, do not migrate for the sake of migrating. But if you are hitting walls -- complex queries that cannot be expressed cleanly, missing data types that force application-level workarounds, a need for geospatial or vector search -- know that PostgreSQL solves these problems at the database layer.

