Thesius Code

Posted on • Originally published at datanest-stores.pages.dev

SQL Query Optimization Guide

The difference between a query that runs in 50ms and one that runs in 50 seconds is almost never hardware — it's the query plan. This guide teaches you to read execution plans like a pro, choose the right indexes without guessing, partition tables for scale, and recognize the 20 most common SQL anti-patterns that silently destroy performance. Covers PostgreSQL and MySQL with engine-specific examples and EXPLAIN output analysis.

Key Features

  • Execution plan reading guide for both EXPLAIN ANALYZE (PostgreSQL) and EXPLAIN FORMAT=JSON (MySQL) with annotated real-world examples
  • Index strategy framework covering B-tree, hash, GIN, GiST, and partial indexes with decision criteria for each
  • 20 SQL anti-patterns with measured performance impact, detection queries, and refactored solutions
  • Table partitioning patterns for range, list, and hash partitioning with partition pruning verification techniques
  • JOIN optimization techniques including join order hints, materialized CTEs, lateral joins, and when to denormalize
  • Subquery elimination patterns transforming correlated subqueries to JOINs, window functions, and lateral joins
  • Statistics and cardinality management — when to ANALYZE, how to fix bad estimates, and extended statistics in PostgreSQL
  • Query rewrite cookbook with 15 before/after transformations showing plan changes and timing improvements

Quick Start

unzip sql-query-optimization.zip
cd sql-query-optimization/

# Run the query analyzer against your database
psql -h localhost -U admin -d myapp -f src/analysis/find_slow_queries.sql

# Check for missing indexes
psql -h localhost -U admin -d myapp -f src/indexes/index_advisor.sql

# Review anti-patterns in your schema
psql -h localhost -U admin -d myapp -f src/anti_patterns/detect_anti_patterns.sql

Start by finding your worst-performing queries:

-- PostgreSQL: top queries by total execution time
SELECT
    LEFT(query, 80) AS query_preview,
    calls,
    ROUND(total_exec_time::numeric, 1) AS total_ms,
    ROUND(mean_exec_time::numeric, 1) AS mean_ms,
    rows / calls AS avg_rows
FROM pg_stat_statements
WHERE calls > 5
ORDER BY total_exec_time DESC
LIMIT 10;

-- MySQL: equivalent from Performance Schema
SELECT
    DIGEST_TEXT,
    COUNT_STAR AS calls,
    ROUND(SUM_TIMER_WAIT / 1e12, 2) AS total_sec,
    ROUND(AVG_TIMER_WAIT / 1e9, 2) AS avg_ms,
    SUM_ROWS_SENT AS rows_sent
FROM performance_schema.events_statements_summary_by_digest
ORDER BY SUM_TIMER_WAIT DESC
LIMIT 10;
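If the PostgreSQL query returns nothing, the pg_stat_statements extension is probably not loaded yet. A minimal setup sketch, assuming superuser access:

```sql
-- pg_stat_statements must be preloaded (requires a server restart):
--   shared_preload_libraries = 'pg_stat_statements'   # in postgresql.conf
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Optionally clear old counters so the top-10 reflects current traffic:
SELECT pg_stat_statements_reset();
```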

Architecture / How It Works

sql-query-optimization/
├── src/
│   ├── analysis/
│   │   ├── find_slow_queries.sql       # Identify worst performers
│   │   ├── explain_guide.md            # How to read EXPLAIN output
│   │   └── plan_comparison.sql         # Compare plans before/after index
│   ├── indexes/
│   │   ├── index_advisor.sql           # Detect missing indexes
│   │   ├── index_types_guide.md        # B-tree vs GIN vs GiST vs hash
│   │   ├── partial_indexes.sql         # Conditional indexes for filtered queries
│   │   └── covering_indexes.sql        # Include columns to avoid heap access
│   ├── anti_patterns/
│   │   ├── detect_anti_patterns.sql    # Automated anti-pattern detection
│   │   ├── catalog.md                  # 20 anti-patterns with fixes
│   │   └── rewrites/                   # Before/after for each pattern
│   │       ├── select_star.sql
│   │       ├── implicit_cast.sql
│   │       ├── correlated_subquery.sql
│   │       └── or_vs_union.sql
│   ├── partitioning/
│   │   ├── range_partition.sql         # Date-based range partitioning
│   │   ├── list_partition.sql          # Category-based partitioning
│   │   └── partition_pruning_test.sql  # Verify pruning works
│   └── statistics/
│       ├── analyze_schedule.sql        # When and what to ANALYZE
│       └── extended_statistics.sql     # Multi-column statistics (PG 14+)
├── examples/
│   ├── ecommerce_optimization.sql
│   └── reporting_query_tuning.sql
└── config.example.yaml
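The partitioning/ scripts build on PostgreSQL's declarative partitioning (10+). A minimal sketch of the range pattern — the events table and its columns here are hypothetical stand-ins, not the kit's actual schema:

```sql
-- Parent table is partitioned by a date range on created_at
CREATE TABLE events (
    created_at timestamptz NOT NULL,
    payload    jsonb
) PARTITION BY RANGE (created_at);

-- One partition per month; bounds are [inclusive, exclusive)
CREATE TABLE events_2026_01 PARTITION OF events
    FOR VALUES FROM ('2026-01-01') TO ('2026-02-01');

-- Pruning check: the plan should touch only events_2026_01
EXPLAIN SELECT * FROM events WHERE created_at >= '2026-01-15';
```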

Usage Examples

Reading an execution plan — what to look for:

EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT o.id, o.total, c.name
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.created_at >= '2026-01-01'
  AND o.status = 'completed';

-- Key things to check in the output:
-- 1. Seq Scan on large tables → needs an index
-- 2. "actual rows" >> "planned rows" → stale statistics, run ANALYZE
-- 3. "Buffers: shared read" >> "shared hit" → data not in cache
-- 4. Nested Loop with high outer rows → consider Hash Join
-- 5. Sort with "Sort Method: external merge" → increase work_mem
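For symptom 2 (actual rows far from estimates), the usual remedy is refreshing statistics; raising the per-column sample helps with skewed data. A sketch — the 500 target is an illustrative value, not a recommendation:

```sql
-- Refresh the planner's statistics for the table:
ANALYZE orders;

-- For a heavily skewed column, raise the statistics target
-- (default 100) before re-analyzing:
ALTER TABLE orders ALTER COLUMN status SET STATISTICS 500;
ANALYZE orders;
```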

Anti-pattern: function on indexed column prevents index usage:

-- BAD: function wrapping indexed column forces sequential scan
SELECT * FROM users WHERE UPPER(email) = 'USER@EXAMPLE.COM';
-- Seq Scan on users (cost=0.00..25432.00 rows=1 width=128)
--   Filter: (upper(email) = 'USER@EXAMPLE.COM')

-- FIX OPTION 1: Create a functional index
CREATE INDEX idx_users_email_upper ON users (UPPER(email));

-- FIX OPTION 2: Store normalized data in the column
-- and avoid the function entirely in the query
SELECT * FROM users WHERE email_normalized = 'user@example.com';
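One caveat worth checking with EXPLAIN: the planner uses a functional index only when the query expression matches the index expression. A quick verification, assuming the index from Fix Option 1 exists:

```sql
EXPLAIN SELECT * FROM users WHERE UPPER(email) = 'USER@EXAMPLE.COM';
-- Should now show an Index Scan (or Bitmap Index Scan) on idx_users_email_upper

EXPLAIN SELECT * FROM users WHERE LOWER(email) = 'user@example.com';
-- Different expression — still a Seq Scan unless a LOWER() index also exists
```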

Anti-pattern: correlated subquery → rewrite as JOIN:

-- BAD: correlated subquery executes once per row in outer query
SELECT
    c.name,
    (SELECT COUNT(*) FROM orders o WHERE o.customer_id = c.id) AS order_count
FROM customers c;
-- The subquery runs once per customer row (N executions for N customers)

-- GOOD: LEFT JOIN with GROUP BY — single pass
SELECT
    c.name,
    COUNT(o.id) AS order_count
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
GROUP BY c.id, c.name;

-- ALSO GOOD: window function when you also need columns from orders
SELECT DISTINCT ON (c.id)
    c.name,
    o.created_at AS latest_order_at,
    COUNT(o.id) OVER (PARTITION BY c.id) AS order_count
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
ORDER BY c.id, o.created_at DESC;
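A third rewrite uses the lateral joins mentioned in the features list. The subquery still runs per customer, but the planner can drive each execution with an index on orders(customer_id) — a sketch:

```sql
SELECT c.name, oc.order_count
FROM customers c
LEFT JOIN LATERAL (
    SELECT COUNT(*) AS order_count
    FROM orders o
    WHERE o.customer_id = c.id
) oc ON true;
```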

Partial index for filtered queries:

-- If 95% of queries filter on status = 'active', don't index all rows
CREATE INDEX idx_orders_active ON orders (customer_id, created_at)
    WHERE status = 'active';

-- If only ~5% of rows are 'active', this index is ~20x smaller than a full
-- index and faster to scan
-- The query must include the WHERE clause for the planner to use it:
SELECT * FROM orders
WHERE status = 'active'
  AND customer_id = 1001
  AND created_at >= '2026-01-01';
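To confirm the planner accepts a given query for the partial index, check the plan — the query's predicate must imply the index's WHERE clause:

```sql
EXPLAIN SELECT * FROM orders
WHERE status = 'active' AND customer_id = 1001;
-- Expect an Index Scan on idx_orders_active

EXPLAIN SELECT * FROM orders
WHERE status <> 'cancelled' AND customer_id = 1001;
-- Does not imply status = 'active' → the partial index is skipped
```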

Configuration

# config.example.yaml
database:
  engine: postgresql            # postgresql | mysql
  host: localhost
  port: 5432
  database: myapp
  user: admin
  password: YOUR_PASSWORD_HERE

analysis:
  min_calls: 5                  # ignore queries with fewer calls
  min_total_time_ms: 1000       # only flag queries above this total time
  seq_scan_min_rows: 10000      # ignore seq scans on small tables
  cache_hit_warning_pct: 95     # warn if cache hit ratio below this

indexes:
  max_index_per_table: 8        # warn if table has more indexes
  min_usage_since_restart: 10   # flag indexes with fewer than N scans
  check_duplicates: true        # detect overlapping index definitions

partitioning:
  recommend_above_rows: 10000000  # suggest partitioning above 10M rows
  default_strategy: range         # range | list | hash
  partition_period: monthly       # for date-range partitions

Best Practices

  1. Always use EXPLAIN ANALYZE, not just EXPLAIN. The actual row counts and timings reveal plan estimation errors that plain EXPLAIN hides. (Note that EXPLAIN ANALYZE executes the query, so wrap DML in a transaction you roll back.)
  2. Index for your WHERE clause first, then for JOIN columns, then for ORDER BY. Within a multi-column index, put equality predicates before range predicates — a B-tree index is only usable left-to-right.
  3. Use covering indexes (INCLUDE clause in PostgreSQL) for queries that read only indexed columns. This avoids heap lookups entirely.
  4. Run ANALYZE after bulk loads. The planner uses table statistics to choose between index scan and sequential scan. Stale statistics produce bad plans.
  5. Prefer EXISTS over IN for existence checks, and especially NOT EXISTS over NOT IN. EXISTS short-circuits after the first match, and NOT IN returns no rows if the subquery yields any NULL. Modern planners often produce the same semi-join plan for EXISTS and IN, so verify with EXPLAIN.
  6. Monitor pg_stat_user_indexes. Drop indexes with idx_scan = 0 after confirming they're not used for unique constraints or foreign keys.
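Best Practice 3's covering index can be sketched like this (PostgreSQL 11+); the table and column names follow the earlier orders examples:

```sql
CREATE INDEX idx_orders_customer_covering
    ON orders (customer_id, created_at)
    INCLUDE (total, status);

-- Reads only indexed/included columns → eligible for an Index Only Scan
-- (keep the visibility map fresh with VACUUM for best results):
SELECT customer_id, created_at, total, status
FROM orders
WHERE customer_id = 1001
  AND created_at >= '2026-01-01';
```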

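For Best Practice 6, a sketch of the unused-index check against pg_stat_user_indexes — remember the counters reset on server restart, so judge over a representative window:

```sql
SELECT schemaname, relname, indexrelname,
       idx_scan,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
```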
Troubleshooting

| Problem | Cause | Fix |
| --- | --- | --- |
| Planner chooses Seq Scan despite index | Stale statistics or small table | Run ANALYZE tablename; — if the table is small, a Seq Scan may actually be faster |
| Query fast in dev, slow in prod | Different data distribution or work_mem | Compare EXPLAIN ANALYZE in both environments; check the work_mem setting |
| Index Scan slower than expected | High random I/O on spinning disks | Set random_page_cost = 1.1 for SSD storage (default 4.0 assumes HDD) |
| Hash Join using excessive memory | Large build input exceeding work_mem | Increase work_mem for the session: SET work_mem = '256MB'; before the query |
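The session-scoped settings from the table can be tried without touching postgresql.conf — a sketch:

```sql
SET random_page_cost = 1.1;   -- SSD-appropriate; default 4.0 assumes HDD
SET work_mem = '256MB';       -- more room for hashes and sorts this session

-- Re-run the slow query under EXPLAIN ANALYZE, then restore defaults:
RESET random_page_cost;
RESET work_mem;
```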

This is 1 of 9 resources in the Database Admin Pro toolkit. Get the complete SQL Query Optimization Guide with all files, templates, and documentation for $29.

Get the Full Kit →

Or grab the entire Database Admin Pro bundle (9 products) for $109 — save 30%.

Get the Complete Bundle →

