SQL Query Optimization Guide
The difference between a query that runs in 50ms and one that runs in 50 seconds is almost never hardware — it's the query plan. This guide teaches you to read execution plans like a pro, choose the right indexes without guessing, partition tables for scale, and recognize the 20 most common SQL anti-patterns that silently destroy performance. Covers PostgreSQL and MySQL with engine-specific examples and EXPLAIN output analysis.
Key Features
- Execution plan reading guide for both EXPLAIN ANALYZE (PostgreSQL) and EXPLAIN FORMAT=JSON (MySQL) with annotated real-world examples
- Index strategy framework covering B-tree, hash, GIN, GiST, and partial indexes with decision criteria for each
- 20 SQL anti-patterns with measured performance impact, detection queries, and refactored solutions
- Table partitioning patterns for range, list, and hash partitioning with partition pruning verification techniques
- JOIN optimization techniques including join order hints, materialized CTEs, lateral joins, and when to denormalize
- Subquery elimination patterns transforming correlated subqueries into JOINs, window functions, and lateral joins
- Statistics and cardinality management — when to ANALYZE, how to fix bad estimates, and extended statistics in PostgreSQL
- Query rewrite cookbook with 15 before/after transformations showing plan changes and timing improvements
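As a taste of the index-type decision criteria: a GIN index is the usual choice for jsonb containment queries, where a B-tree cannot help. A minimal sketch (the `events` table and `payload` column are illustrative, not part of the toolkit):

```sql
-- GIN suits multi-valued data: jsonb, arrays, full-text vectors.
-- Hypothetical events table with a jsonb payload column:
CREATE INDEX idx_events_payload ON events USING GIN (payload);

-- The @> containment operator can now use the index:
SELECT id, created_at
FROM events
WHERE payload @> '{"type": "checkout"}';
```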
Quick Start
unzip sql-query-optimization.zip
cd sql-query-optimization/
# Run the query analyzer against your database
psql -h localhost -U admin -d myapp -f src/analysis/find_slow_queries.sql
# Check for missing indexes
psql -h localhost -U admin -d myapp -f src/indexes/index_advisor.sql
# Review anti-patterns in your schema
psql -h localhost -U admin -d myapp -f src/anti_patterns/detect_anti_patterns.sql
Start by finding your worst-performing queries:
-- PostgreSQL: top queries by total execution time
SELECT
LEFT(query, 80) AS query_preview,
calls,
ROUND(total_exec_time::numeric, 1) AS total_ms,
ROUND(mean_exec_time::numeric, 1) AS mean_ms,
rows AS avg_rows
FROM pg_stat_statements
WHERE calls > 5
ORDER BY total_exec_time DESC
LIMIT 10;
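The PostgreSQL query above depends on the pg_stat_statements extension, which is not enabled by default. A minimal setup sketch:

```sql
-- Requires shared_preload_libraries = 'pg_stat_statements' in postgresql.conf
-- (and a server restart), then:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Reset counters when you want a fresh measurement window:
SELECT pg_stat_statements_reset();
```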
-- MySQL: equivalent from Performance Schema
SELECT
DIGEST_TEXT,
COUNT_STAR AS calls,
ROUND(SUM_TIMER_WAIT / 1e12, 2) AS total_sec,
ROUND(AVG_TIMER_WAIT / 1e9, 2) AS avg_ms,
SUM_ROWS_SENT AS rows_sent
FROM performance_schema.events_statements_summary_by_digest
ORDER BY SUM_TIMER_WAIT DESC
LIMIT 10;
Architecture / How It Works
sql-query-optimization/
├── src/
│ ├── analysis/
│ │ ├── find_slow_queries.sql # Identify worst performers
│ │ ├── explain_guide.md # How to read EXPLAIN output
│ │ └── plan_comparison.sql # Compare plans before/after index
│ ├── indexes/
│ │ ├── index_advisor.sql # Detect missing indexes
│ │ ├── index_types_guide.md # B-tree vs GIN vs GiST vs hash
│ │ ├── partial_indexes.sql # Conditional indexes for filtered queries
│ │ └── covering_indexes.sql # Include columns to avoid heap access
│ ├── anti_patterns/
│ │ ├── detect_anti_patterns.sql # Automated anti-pattern detection
│ │ ├── catalog.md # 20 anti-patterns with fixes
│ │ └── rewrites/ # Before/after for each pattern
│ │ ├── select_star.sql
│ │ ├── implicit_cast.sql
│ │ ├── correlated_subquery.sql
│ │ └── or_vs_union.sql
│ ├── partitioning/
│ │ ├── range_partition.sql # Date-based range partitioning
│ │ ├── list_partition.sql # Category-based partitioning
│ │ └── partition_pruning_test.sql # Verify pruning works
│ └── statistics/
│ ├── analyze_schedule.sql # When and what to ANALYZE
│ └── extended_statistics.sql # Multi-column statistics (PG 14+)
├── examples/
│ ├── ecommerce_optimization.sql
│ └── reporting_query_tuning.sql
└── config.example.yaml
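The partitioning scripts follow PostgreSQL's declarative partitioning pattern. A minimal sketch of date-based range partitioning with pruning verification (table and column names are illustrative):

```sql
-- Parent table is partitioned by order date
CREATE TABLE orders (
    id          bigint GENERATED ALWAYS AS IDENTITY,
    customer_id bigint NOT NULL,
    status      text   NOT NULL,
    total       numeric(12,2),
    created_at  timestamptz NOT NULL
) PARTITION BY RANGE (created_at);

-- One partition per month; rows route automatically on INSERT
CREATE TABLE orders_2026_01 PARTITION OF orders
    FOR VALUES FROM ('2026-01-01') TO ('2026-02-01');
CREATE TABLE orders_2026_02 PARTITION OF orders
    FOR VALUES FROM ('2026-02-01') TO ('2026-03-01');

-- Verify pruning: the plan should touch only orders_2026_01
EXPLAIN SELECT * FROM orders
WHERE created_at >= '2026-01-15' AND created_at < '2026-01-20';
```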
Usage Examples
Reading an execution plan — what to look for:
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT o.id, o.total, c.name
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.created_at >= '2026-01-01'
AND o.status = 'completed';
-- Key things to check in the output:
-- 1. Seq Scan on large tables → needs an index
-- 2. "actual rows" far from the estimated "rows" → stale statistics, run ANALYZE
-- 3. "Buffers: shared read" >> "shared hit" → data not in cache
-- 4. Nested Loop with high outer rows → consider Hash Join
-- 5. Sort with "Sort Method: external merge" → increase work_mem
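Point 5 can be verified and fixed in one session; a sketch, assuming an `orders` table with enough rows to spill:

```sql
-- If the plan reports "Sort Method: external merge  Disk: ...kB",
-- the sort spilled to disk; raise work_mem for the session and re-run:
SET work_mem = '64MB';

EXPLAIN (ANALYZE, BUFFERS)
SELECT customer_id, created_at
FROM orders
ORDER BY created_at;
-- Look for "Sort Method: quicksort" once the sort fits in memory
```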
Anti-pattern: function on indexed column prevents index usage:
-- BAD: function wrapping indexed column forces sequential scan
SELECT * FROM users WHERE UPPER(email) = 'USER@EXAMPLE.COM';
-- Seq Scan on users (cost=0.00..25432.00 rows=1 width=128)
-- Filter: (upper(email) = 'USER@EXAMPLE.COM')
-- FIX OPTION 1: Create a functional index
CREATE INDEX idx_users_email_upper ON users (UPPER(email));
-- FIX OPTION 2: Store normalized data in the column
-- and avoid the function entirely in the query
SELECT * FROM users WHERE email_normalized = 'user@example.com';
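If you take option 2, a generated column (PostgreSQL 12+) keeps the normalized value in sync automatically; a sketch, assuming `email_normalized` does not exist yet:

```sql
-- The column is recomputed on every INSERT/UPDATE of email
ALTER TABLE users
    ADD COLUMN email_normalized text
    GENERATED ALWAYS AS (lower(email)) STORED;

-- A plain B-tree index on the generated column serves equality lookups
CREATE INDEX idx_users_email_normalized ON users (email_normalized);
```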
Anti-pattern: correlated subquery → rewrite as JOIN:
-- BAD: correlated subquery executes once per row in outer query
SELECT
c.name,
(SELECT COUNT(*) FROM orders o WHERE o.customer_id = c.id) AS order_count
FROM customers c;
-- One statement, but the subplan executes once per customer row (N scans of orders)
-- GOOD: LEFT JOIN with GROUP BY — single pass
SELECT
c.name,
COUNT(o.id) AS order_count
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
GROUP BY c.id, c.name;
-- ALSO GOOD: window function when you also need columns from orders,
-- e.g. the latest order per customer alongside the count
SELECT DISTINCT ON (c.id)
c.name,
o.created_at AS latest_order_at,
COUNT(o.id) OVER (PARTITION BY c.id) AS order_count
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
ORDER BY c.id, o.created_at DESC;
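A LATERAL join covers the related case where you want the top-N orders per customer rather than an aggregate; a sketch using the same illustrative schema:

```sql
SELECT c.name, recent.id AS order_id, recent.total
FROM customers c
LEFT JOIN LATERAL (
    SELECT o.id, o.total
    FROM orders o
    WHERE o.customer_id = c.id
    ORDER BY o.created_at DESC
    LIMIT 3            -- the subquery re-evaluates once per customer row
) recent ON true;
```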
Partial index for filtered queries:
-- If 95% of queries filter on status = 'active', don't index all rows
CREATE INDEX idx_orders_active ON orders (customer_id, created_at)
WHERE status = 'active';
-- This index is ~20x smaller than a full index and faster to scan
-- The query must include the WHERE clause for the planner to use it:
SELECT * FROM orders
WHERE status = 'active'
AND customer_id = 1001
AND created_at >= '2026-01-01';
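The same idea combines with a covering index: INCLUDE columns let the query become an index-only scan, assuming `total` is the only other column it reads (the partial predicate satisfies the status filter without storing it):

```sql
CREATE INDEX idx_orders_active_covering
    ON orders (customer_id, created_at) INCLUDE (total)
    WHERE status = 'active';

-- Index-only candidate: every referenced column lives in the index
SELECT customer_id, created_at, total
FROM orders
WHERE status = 'active'
  AND customer_id = 1001
  AND created_at >= '2026-01-01';
```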
Configuration
# config.example.yaml
database:
engine: postgresql # postgresql | mysql
host: localhost
port: 5432
database: myapp
user: admin
password: YOUR_PASSWORD_HERE
analysis:
min_calls: 5 # ignore queries with fewer calls
min_total_time_ms: 1000 # only flag queries above this total time
seq_scan_min_rows: 10000 # ignore seq scans on small tables
cache_hit_warning_pct: 95 # warn if cache hit ratio below this
indexes:
max_index_per_table: 8 # warn if table has more indexes
min_usage_since_restart: 10 # flag indexes with fewer than N scans
check_duplicates: true # detect overlapping index definitions
partitioning:
recommend_above_rows: 10000000 # suggest partitioning above 10M rows
default_strategy: range # range | list | hash
partition_period: monthly # for date-range partitions
Best Practices
- Always use EXPLAIN ANALYZE, not just EXPLAIN. The actual row counts and timings reveal plan estimation errors that plain EXPLAIN hides.
- Index for your WHERE clause first, then for JOIN columns, then for ORDER BY. This priority order matches the planner's evaluation sequence.
- Use covering indexes (INCLUDE clause in PostgreSQL) for queries that read only indexed columns. This avoids heap lookups entirely.
- Run ANALYZE after bulk loads. The planner uses table statistics to choose between index scan and sequential scan. Stale statistics produce bad plans.
- Prefer EXISTS over IN for subqueries that check existence. EXISTS short-circuits after the first match; IN may materialize the entire subquery result.
- Monitor pg_stat_user_indexes. Drop indexes with idx_scan = 0 after confirming they're not used for unique constraints or foreign keys.
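The last point can be checked with a query along these lines (a sketch; always confirm constraint usage before dropping anything):

```sql
-- Indexes never scanned since the last stats reset, largest first
SELECT schemaname,
       relname      AS table_name,
       indexrelname AS index_name,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
```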
Troubleshooting
| Problem | Cause | Fix |
|---|---|---|
| Planner chooses Seq Scan despite index | Stale statistics or small table | Run ANALYZE tablename; — if the table is small, a Seq Scan may actually be faster |
| Query fast in dev, slow in prod | Different data distribution or work_mem | Compare EXPLAIN ANALYZE in both environments; check the work_mem setting |
| Index Scan slower than expected | High random I/O on spinning disks | Set random_page_cost = 1.1 for SSD storage (the default 4.0 assumes HDD) |
| Hash Join using excessive memory | Large build input exceeding work_mem | Increase work_mem for the session (SET work_mem = '256MB';) before the query |
This is 1 of 9 resources in the Database Admin Pro toolkit. Get the complete SQL Query Optimization Guide with all files, templates, and documentation for $29.
Or grab the entire Database Admin Pro bundle (9 products) for $109 — save 30%.