LeoJ

Posted on May 25 • Originally published at querydeck.app

10 SQL Query Optimization Tips Every Developer Should Know

Every application hits a point where the database becomes the bottleneck. Response times creep up, connection pools fill, and users start complaining. Most slow queries share common problems, and most fixes are straightforward once you know where to look.

1. Stop Using SELECT *

-- Bad: fetches 25 columns including blobs
SELECT * FROM orders WHERE customer_id = 1042;

-- Good: fetches only what you need
SELECT id, order_date, total_amount, status
FROM orders
WHERE customer_id = 1042;

Selecting only needed columns reduces disk I/O and network bandwidth. If an index covers all requested columns, the database can satisfy the query from the index alone (covering index scan).

2. Add the Right Indexes

-- No index: Seq Scan on 10M rows, takes seconds
SELECT id, order_date, total_amount
FROM orders
WHERE customer_id = 1042;

-- Fix: add an index
CREATE INDEX idx_orders_customer_id ON orders (customer_id);

-- For filter + sort, composite indexes are even better:
CREATE INDEX idx_orders_customer_date ON orders (customer_id, order_date DESC);

Indexes turn O(n) scans into O(log n) lookups. The trade-off is write overhead, but for read-heavy workloads the gains far outweigh the cost.

3. Use EXPLAIN ANALYZE Before Guessing

-- Don't guess, measure:
EXPLAIN ANALYZE
SELECT o.id, o.order_date, c.name
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.status = 'pending'
  AND o.order_date > '2026-01-01';

The output tells you exactly what happened: which nodes are slow, where Seq Scans occur, whether estimates match reality. It replaces speculation with data.

Look for: Seq Scans on large tables, large gaps between estimated and actual row counts, sorts spilling to disk, nested loops with high loop counts.

4. Replace Correlated Subqueries with JOINs

-- Bad: subquery runs once per customer row (50,000 executions)
SELECT c.id, c.name,
  (SELECT MAX(o.order_date) FROM orders o WHERE o.customer_id = c.id)
    AS last_order_date
FROM customers c;

-- Good: single pass with a JOIN
SELECT c.id, c.name, MAX(o.order_date) AS last_order_date
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
GROUP BY c.id, c.name;

The JOIN version scans orders once. Performance difference grows linearly with the number of outer rows.

5. Use CTEs Carefully

-- Bad (MySQL, PG < 12): materializes entire table first
WITH all_orders AS (
  SELECT * FROM orders
)
SELECT id, order_date FROM all_orders WHERE customer_id = 1042;

-- Good: push filters into the CTE, or just skip it
SELECT id, order_date
FROM orders
WHERE customer_id = 1042;

In PostgreSQL 12+, CTEs are automatically inlined. In older versions and MySQL, they materialize as temporary tables, blocking predicate pushdown. Use WITH ... AS NOT MATERIALIZED (PG 12+) when you need the CTE for readability but want optimization.

6. Fix N+1 Queries at the Application Layer

# Bad: 201 queries for 200 customers
customers = db.query("SELECT * FROM customers WHERE region = 'EU'")
for customer in customers:
    orders = db.query("SELECT * FROM orders WHERE customer_id = %s", customer.id)

# Good: single query with JOIN
query = """
SELECT c.id, c.name, o.id AS order_id, o.order_date, o.total_amount
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
WHERE c.region = 'EU'
"""

The overhead is round-trip latency, not SQL execution. 200 queries at 2ms network round-trip = 400ms on network alone. Most ORMs have eager loading (select_related, includes, eager) to solve this.

7. Paginate with Cursors, Not OFFSET

-- Bad: page 100 scans 5,000 rows, returns 50
SELECT id, title, created_at
FROM articles
ORDER BY created_at DESC
LIMIT 50 OFFSET 4950;

-- Good: cursor-based, always scans only 50 rows
SELECT id, title, created_at
FROM articles
WHERE created_at < '2026-05-20T10:15:00Z'
ORDER BY created_at DESC
LIMIT 50;

Cursor-based pagination maintains constant performance regardless of depth. OFFSET degrades linearly. The trade-off: you lose arbitrary page jumping, but most modern UIs don't need that.

8. Batch Your Writes

-- Bad: 10,000 individual INSERTs
INSERT INTO events (user_id, event_type, created_at) VALUES (1, 'click', NOW());
INSERT INTO events (user_id, event_type, created_at) VALUES (2, 'view', NOW());
-- ... 9,998 more

-- Good: single multi-row INSERT
INSERT INTO events (user_id, event_type, created_at) VALUES
  (1, 'click', NOW()),
  (2, 'view', NOW()),
  (3, 'click', NOW());
  -- batch up to 500-1,000 rows per statement

For batch updates in PostgreSQL:

UPDATE products AS p
SET price = v.new_price
FROM (VALUES (101, 29.99), (102, 49.99), (103, 19.99)) AS v(id, new_price)
WHERE p.id = v.id;

Batching reduces round-trips, WAL overhead, and allows write path optimization.

9. Use Parameterized Queries

# Bad: string concatenation (SQL injection + no plan caching)
query = f"SELECT * FROM users WHERE email = '{user_email}'"

# Good: parameterized (safe + plan reuse)
query = "SELECT id, name, email FROM users WHERE email = $1"
db.execute(query, [user_email])

Parameterized queries let the database parse once and reuse the plan. Also completely prevents SQL injection. Every modern driver supports this.

10. Cache Expensive Query Results

-- PostgreSQL: materialized view for expensive aggregations
CREATE MATERIALIZED VIEW monthly_revenue AS
SELECT
  DATE_TRUNC('month', order_date) AS month,
  COUNT(*) AS total_orders,
  SUM(total_amount) AS revenue,
  AVG(total_amount) AS avg_order_value
FROM orders
WHERE order_date >= NOW() - INTERVAL '12 months'
GROUP BY DATE_TRUNC('month', order_date);

-- Refresh on schedule (e.g., hourly via pg_cron)
REFRESH MATERIALIZED VIEW CONCURRENTLY monthly_revenue;

For application-level caching, store results in Redis with a TTL matching your freshness requirements.

Checklist

When you encounter a slow query:

Run EXPLAIN ANALYZE (PostgreSQL) or EXPLAIN FORMAT=JSON (MySQL)
Check for missing indexes (Seq Scans on large tables)
Eliminate SELECT *
Flatten correlated subqueries into JOINs
Fix N+1 patterns with eager loading
Switch to keyset pagination
Batch writes
Use prepared statements
Cache expensive results
Re-measure with EXPLAIN ANALYZE to confirm the fix

I'm building QueryDeck, a native macOS database client with visual EXPLAIN ANALYZE and AI-assisted SQL. Currently in early access.

DEV Community