A practical SQL query tuning playbook: execution plans, joins, indexes, and the traps

#sql #database #performance #backend

SQL tuning is the process of making a database query run faster and cheaper — cutting response time while minimizing the system resources it burns. Here's the playbook I actually use, from "this query is slow" to "this query is fixed," with the traps that bite people in the middle.

The loop

Tuning is iterative. The shape is always the same:

Identify the problem. Find the slow query (logs, profiler, or user feedback) and measure a baseline — execution time and resource usage. You can't claim an improvement you didn't measure.
Analyze & rewrite. Review the SQL for redundant joins, unnecessary work, and complex subqueries. Tighten the WHERE, select only the columns you need, convert subqueries to joins where it helps.
Read the execution plan. Understand how the engine actually runs the query; find inefficient join orders and needless full scans.
Revisit indexes. Evaluate whether existing indexes help; add or restructure as needed.
Consider schema changes. If a column is updated so often that indexing it hurts, split it out. Sometimes the model is the bottleneck.
Tune settings/hardware if it comes to that.
Re-test and repeat. Apply changes, re-check the plan, confirm the gain, monitor.

Reading an execution plan

The execution plan shows how the DB will run your query — table scans, index access, join methods. Read it well and you can pinpoint where the time goes. Most engines expose it: EXPLAIN (MySQL/PostgreSQL), EXPLAIN PLAN FOR (Oracle), SET SHOWPLAN_ALL ON (SQL Server).

Operators to know:

Full Table Scan — reads every row. Happens when there's no suitable index, or the query can't use one.
Index Scan — scans via an index; usually cheaper than a table scan.
Index Seek — jumps to specific key values; very efficient, reads only the rows it needs.
Nested Loops / Hash Join / Merge Join — the three ways to join two tables (more below).
Sort — orders data; excessive sorting is a common performance drag.

Three numbers that matter:

Cost — estimated resources a step will consume. Lower is better.
Cardinality — estimated rows returned by a step. Bad cardinality estimates produce bad plans.
Selectivity — how much a filter narrows the set. Low selectivity means a lot of rows survive the filter; high selectivity means few do.

How to read it: start from the highest-cost operation — that's where your query spends itself. Follow the data flow, find the bottleneck, fix it (drop a join, add/restructure an index), then re-check. Iterate.

The cost model (and how to inspect it)

MySQL's optimizer estimates a "cost" for each candidate plan and picks the cheapest. Cost factors in I/O (reading data pages — the big one, and where index vs. full scan diverges hugely), CPU (condition evaluation, sorting, joins), memory (hash tables, sort buffers), and network (mostly relevant in distributed setups). It estimates all this from table statistics — table size, column distribution, index characteristics — so stale stats produce bad plans.

Tools to actually see what's happening:

-- See the plan + estimated cost
EXPLAIN SELECT * FROM your_table WHERE column_name = 'value';

-- MySQL Performance Schema: per-query aggregate stats
SELECT * FROM performance_schema.events_statements_summary_by_digest
WHERE SCHEMA_NAME = 'your_database_name';

-- MySQL SHOW PROFILES (session-level timing)
SET profiling = 1;
-- run your query, then:
SHOW PROFILES;

Keep statistics fresh:

ANALYZE TABLE your_table;

Enable the slow query log to surface candidates automatically:

[mysqld]
slow_query_log = 1
slow_query_log_file = /path/to/logfile.log
long_query_time = 2   # log queries slower than 2s

On Oracle, SQL*Trace (ALTER SESSION SET sql_trace = TRUE; → analyze the .trc with TKPROF) plus DBMS_XPLAN give you detailed timing and plan internals:

SELECT * FROM table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));

Query-level rewrites

Select only the columns you need — less data to move, fewer rows to inspect.
Filter in WHERE, not HAVING — filter before aggregation, not after.
Make conditions index-friendly — write predicates so an index can actually be used.
Subquery → JOIN when a subquery runs repeatedly; consider WITH (CTE) to avoid re-executing a repeated subquery.
Push computation to the DB — it's often more efficient there than in the app layer (but watch type conversions — see below).
Batch big writes — chunk large INSERT/UPDATE/DELETE to spread load.

Indexing

Index the columns used in WHERE, JOIN, and ORDER BY.
Composite indexes: when multiple columns appear in conditions, a composite index helps — and column order matters.
Prune dead indexes: unused or low-value indexes only add maintenance cost and can slow writes.
Maintain them: on write-heavy tables, rebuild/reorganize periodically and keep statistics current.
Partial/filtered indexes are effective when you only frequently access a subset of rows.
Clustered vs. non-clustered: a clustered index orders the data itself by the key; a non-clustered index is kept separately. Pick deliberately.

The three join algorithms

Choosing — or nudging the optimizer toward — the right join is often the biggest win.

Nested Loops — for each row of the outer table, scan the inner table for matches. Great for small sets, and very fast when the inner table has a good index on the join key. Degrades on large data. Favor it when rows are few or the WHERE is highly selective.

Hash Join — build a hash table from the smaller table in memory, then probe it while scanning the larger table. Strong for large datasets, especially when there's no index on the join column — provided you have enough memory to hold the build side.

Merge Join — both inputs are sorted on the join key, then merged in a single pass. Efficient when the data is already sorted (or cheap to sort) and the key isn't full of duplicates. Each table is scanned once, so it scales well.

Selection guide:

Situation	Lean toward
Small dataset	Nested Loops
Large dataset, no useful index	Hash Join
Inputs already sorted on the key	Merge Join
Index exists on join key	Nested Loops
Plenty of memory	Hash Join

The implicit-cast trap

This one quietly kills index usage. If you compare a string column against a number (or otherwise force a type conversion), the DB may be unable to use the index on that column and fall back to a full scan — plus the conversion itself burns CPU/memory.

How to avoid it:

Match data types between constants, variables, and columns so the DB doesn't auto-cast.
Minimize explicit casts, especially inside JOIN/WHERE.
Fix it at modeling time — choose appropriate types up front to prevent app↔DB mismatches.
If business logic genuinely needs a converted value, create a computed/generated column and index that, moving the conversion into the database.

Subquery vs. JOIN

Neither always wins — it depends on data size, structure, indexes, and the optimizer.

Subquery — pros: readable, and the intermediate result can be reused. Cons: with large data, a correlated subquery may run per outer row (use WITH to avoid that), and it often can't leverage indexes, leading to full scans.

JOIN — pros: uses indexes well, and you can pick the join algorithm to fit. Cons: lower readability and easy to get wrong if you don't understand join order/type.

Rule of thumb: small data with complex logic → subquery (clarity). Large-volume processing → JOIN (performance). And always test alternatives against a realistic dataset — the optimizer can surprise you.

What's the single tuning fix that's saved you the most in production — a missing index, a rewritten join, or fresh statistics? Curious which one people hit most often.