Optimizing SQL Queries for High‑Traffic Applications
(A practical, step‑by‑step guide that works across MySQL, PostgreSQL, SQL Server, and Oracle)
1. Why “SQL‑only” Optimisation Matters
High‑traffic apps often hit the database thousands to millions of times per second. Even a 1 ms improvement per query can save seconds of CPU time, reduce latency, and keep your infrastructure costs down. The biggest gains come from making the database do less work, not just from adding more hardware.
2. The Optimization Lifecycle
Phase | Goal | Typical Tools / Artefacts |
---|---|---|
A. Baseline | Capture realistic workloads, identify hot queries. | Slow‑query log, pg_stat_statements, Query Store (SQL Server), AWR (Oracle), application logs. |
B. Diagnose | Understand why a query is slow. | EXPLAIN / EXPLAIN ANALYZE, visual explain plans, SHOW PROFILE , DMVs. |
C. Refactor | Rewrite / index / restructure to eliminate bottlenecks. | DDL changes, query rewrite, materialized views, partitioning. |
D. Validate | Verify that performance improved without regressions. | Same metrics as baseline, regression test suite, load‑testing tools (JMeter, k6). |
E. Guard | Prevent regressions in the future. | CI checks, automated plan‑stability tests, query‑cost thresholds, monitoring alerts. |
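For the Baseline phase, a query like the following surfaces the statements that consume the most total time. It is a minimal sketch, assuming PostgreSQL 13+ with the `pg_stat_statements` extension enabled (older versions expose `total_time` / `mean_time` instead):

```sql
-- Top 20 statements by cumulative execution time (times are in milliseconds).
SELECT query,
       calls,
       total_exec_time,
       mean_exec_time,
       rows
FROM   pg_stat_statements
ORDER  BY total_exec_time DESC
LIMIT  20;
```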
3. Foundations – “First‑Principles” Checks
# | Checklist | Why it matters |
---|---|---|
1 | Use the right data types – avoid `VARCHAR(255)` for a 2-digit code; prefer `CHAR(2)` or `SMALLINT`. | Smaller rows → more rows fit in memory → fewer page reads. |
2 | Never `SELECT *` in production – enumerate columns. | Reduces I/O, avoids unnecessary row-locking and cache pressure. |
3 | Keep transactions short – commit as soon as possible. | Reduces lock contention and deadlock probability. |
4 | Prefer set-based operations over row-by-row loops (cursors, `WHILE`). | DB engines are built for set processing; loops cause massive overhead. |
5 | Avoid functions on indexed columns in `WHERE`/`JOIN` (e.g., `WHERE DATE(col) = ...`); see the rewrite sketch after this table. | The index cannot be used → full scan. Use computed/generated columns or range predicates instead. |
6 | Parameterise queries (prepared statements) – not string concatenation. | Enables plan reuse, prevents SQL injection, reduces parsing overhead. |
7 | Watch out for implicit conversions – `WHERE int_col = '123'` forces a type cast. | An implicit cast on the column side prevents index use and causes a scan. |
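To illustrate item 5, here is a sketch of rewriting a non-sargable date filter into a range predicate (the `orders` table and its columns are assumed for illustration):

```sql
-- Non-sargable: the function call hides created_at from any index on it.
SELECT id, amount FROM orders WHERE DATE(created_at) = '2025-06-01';

-- Sargable rewrite: an index on created_at can now be range-scanned.
SELECT id, amount
FROM   orders
WHERE  created_at >= '2025-06-01'
  AND  created_at <  '2025-06-02';
```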
4. Indexing – The Most Powerful Lever
4.1. Types of Indexes (quick reference)
DBMS | B-Tree | Hash | GiST / SP-GiST | BRIN | Columnstore | Full-Text |
---|---|---|---|---|---|---|
MySQL | ✅ | ✅ (MEMORY engine; InnoDB adaptive hash) | ✗ | ✗ | ✗ in core MySQL (MariaDB ColumnStore is a separate engine) | ✅ |
PostgreSQL | ✅ | ✅ | ✅ | ✅ | Via extensions (e.g., cstore_fdw, Citus columnar) | ✅ (GIN + tsvector) |
SQL Server | ✅ | ✅ (memory-optimized tables) | ✅ (spatial) | ✗ (columnstore segment elimination plays a similar role) | ✅ (Columnstore) | ✅ |
Oracle | ✅ | ✅ (hash clusters) | ✅ (Spatial) | ✗ (zone maps are the closest analogue); bitmap indexes available | ✅ (Database In-Memory column store) | ✅ (Oracle Text) |
4.2. Index Design Process
- Identify the hot predicates – the columns that appear in `WHERE`, `JOIN ... ON`, `ORDER BY`, `GROUP BY`.
- Check cardinality – high cardinality (many distinct values) suits a B-Tree; low cardinality may benefit from bitmap-style indexes (PostgreSQL `BRIN`/`GIN`, Oracle bitmap).
- Covering (include) columns – add non-key columns to an index so the query can be satisfied entirely from the index (SQL Server `INCLUDE`, PostgreSQL `INCLUDE`, MySQL by listing every selected column in the index itself).
- Composite index ordering – order columns by filtering power first, then by sorting/grouping; see the DDL sketch after this list.
  - Example: `WHERE status = ? AND created_at > ? ORDER BY created_at DESC` → `(status, created_at DESC)`.
- Avoid over-indexing – each extra index adds write overhead (INSERT/UPDATE/DELETE). On write-heavy OLTP tables keep the set of secondary indexes small and drop indexes that are never read.
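A minimal DDL sketch for the composite-index example above, with a covering column added (table and column names are assumptions; `INCLUDE` is PostgreSQL 11+ / SQL Server syntax):

```sql
-- Filter column first, then the sort column; amount is carried only to cover the SELECT list.
CREATE INDEX idx_orders_status_created
    ON orders (status, created_at DESC)
    INCLUDE (amount);

-- MySQL 8.0 has no INCLUDE clause; append the extra column to the key instead:
-- CREATE INDEX idx_orders_status_created ON orders (status, created_at DESC, amount);
```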
4.3. Practical Index Recipes
Scenario | Recommended Index (MySQL/PostgreSQL) | Why |
---|---|---|
Exact lookup on `user_id` (PK) | `PRIMARY KEY (user_id)` | Already optimal. |
Range + filter: `WHERE country = ? AND ts >= ?` | `(country, ts)` | Country filters first, then the range on `ts`. |
Sorting: `WHERE status = ? ORDER BY created_at DESC LIMIT 20` | `(status, created_at DESC)` | Uses the index for both filter and order; `LIMIT` stops early. |
Join: `ON o.customer_id = c.id` and `c.region = ?` | `customer (region, id)` | Region filter first, then the join column. |
Full-text search on `title` | `FULLTEXT(title)` (MySQL) / `GIN (to_tsvector('english', title))` (PostgreSQL); see the sketch below | Enables an inverted index. |
Very large table (> 100 M rows), queries on recent data | BRIN on the timestamp column `ts` (PostgreSQL) | BRIN stores min/max per block range → fast for “newest N rows”. |
Frequent aggregates: `GROUP BY product_id` | B-Tree index on `product_id` (clustered index in SQL Server) | Index-ordered input lets the engine stream the aggregation instead of hashing the whole table. |
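A sketch of the full-text recipe from the table (the `articles` table is hypothetical); note that the PostgreSQL query must repeat the indexed expression for the GIN index to be used:

```sql
-- PostgreSQL
CREATE INDEX idx_articles_title_fts
    ON articles USING GIN (to_tsvector('english', title));

SELECT id, title
FROM   articles
WHERE  to_tsvector('english', title) @@ plainto_tsquery('english', 'query optimization');

-- MySQL equivalent
-- ALTER TABLE articles ADD FULLTEXT INDEX ft_title (title);
-- SELECT id, title FROM articles WHERE MATCH(title) AGAINST ('query optimization');
```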
4.4. Maintaining Index Health
Action | How to do it |
---|---|
Detect unused indexes | MySQL `sys.schema_unused_indexes`; PostgreSQL `pg_stat_user_indexes` (see the query below); SQL Server `sys.dm_db_index_usage_stats`. |
Rebuild / reorganise | MySQL `OPTIMIZE TABLE`; PostgreSQL `REINDEX` or `VACUUM (FULL, ANALYZE)`; SQL Server `ALTER INDEX ... REBUILD`; Oracle `ALTER INDEX ... REBUILD`. |
Update statistics | MySQL `ANALYZE TABLE`; PostgreSQL `ANALYZE`; SQL Server `UPDATE STATISTICS`; Oracle `DBMS_STATS.GATHER_TABLE_STATS`. |
Monitor index bloat | PostgreSQL `pgstattuple`; MySQL `SHOW TABLE STATUS` (Data_free); SQL Server `sys.dm_db_index_physical_stats`. |
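A minimal PostgreSQL sketch for spotting candidate unused indexes; review the list before dropping anything, since counters reset with the statistics and replicas may use indexes the primary does not:

```sql
SELECT schemaname,
       relname      AS table_name,
       indexrelname AS index_name,
       idx_scan,                                            -- scans since the last stats reset
       pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM   pg_stat_user_indexes
WHERE  idx_scan = 0
ORDER  BY pg_relation_size(indexrelid) DESC;
```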
5. Query‑Plan Analysis – Reading EXPLAIN
5.1. Common Plan Nodes & Their Cost
Node | Typical Meaning | Red Flag |
---|---|---|
Seq Scan / Table Scan | Full table read. | Acceptable for tiny tables or queries returning most rows; otherwise usually a missing or unusable index. |
Index Scan | Reads index entries (may still fetch table rows). | OK if selective. |
Index Only Scan (PostgreSQL) | Index satisfies query completely. | Ideal! |
Bitmap Index Scan → Bitmap Heap Scan | Combines many indexes; may be slower than a single index. | Consider a composite index. |
Nested Loop Join | For each row from outer, probe inner. | Good for small outer side, otherwise consider Hash or Merge join. |
Hash Join | Builds hash table on inner side. | Works well for large, unsorted data. |
Merge Join | Both sides must be sorted (often via index). | Efficient if both inputs already ordered. |
Sort | Explicit sort operation. | Try to push ordering into index. |
Aggregate | Grouping/aggregation. | If GROUP BY matches index order → “GroupAggregate” (faster). |
Subquery/CTE Materialisation | Temporary result set. | May be avoidable with rewrite. |
5.2. Example: MySQL EXPLAIN
```sql
EXPLAIN FORMAT=JSON
SELECT o.id, o.amount, c.name
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.status = 'paid'
  AND o.created_at >= '2025-01-01'
ORDER BY o.created_at DESC
LIMIT 50;
```
Result (simplified):
```json
{
  "query_block": {
    "select_id": 1,
    "cost_info": {"query_cost": "319.50"},
    "table": {
      "table_name": "orders",
      "access_type": "range",
      "possible_keys": ["idx_status_created"],
      "key": "idx_status_created",
      "used_key_parts": ["status", "created_at"],
      "rows_examined_per_scan": 120,
      "filtered": 85.0,
      "attached_condition": "(`orders`.`status` = 'paid')"
    },
    "inner join": {...}
  }
}
```
Interpretation
- `range` on `idx_status_created` → good (uses the composite index).
- `rows_examined_per_scan` is low → the index is selective.
- No separate `sort` node → the ordering is satisfied by the index (`created_at DESC`).

If you see `type: ALL` (a full scan) or a separate `sort` step, you likely need a better index or a query rewrite.
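For reference, a sketch of the index both EXPLAIN examples assume; the name and the descending key are inferred from the plan output, and MySQL 8.0+ is assumed for `DESC` index columns:

```sql
CREATE INDEX idx_status_created ON orders (status, created_at DESC);
```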
5.3. PostgreSQL EXPLAIN ANALYZE
```sql
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)
SELECT *
FROM orders
WHERE status = 'paid'
  AND created_at >= CURRENT_DATE
ORDER BY created_at DESC
LIMIT 100;
```
Typical output snippet:
```json
{
  "Plan": {
    "Node Type": "Index Scan Backward",
    "Relation Name": "orders",
    "Index Name": "idx_status_created",
    "Index Cond": "(status = 'paid'::text)",
    "Filter": "(created_at >= CURRENT_DATE)",
    "Rows Removed by Filter": 0,
    "Buffers": {"Shared Hit Blocks": 75, "Shared Read Blocks": 0},
    "Actual Total Time": 0.78,
    "Actual Loops": 1
  }
}
```
Key take-aways

- `Index Scan Backward` = the index is read in descending order → no sort needed.
- `Buffers` shows almost all data served from cache → good.

If you see a `Seq Scan` instead, add a composite index or rewrite the predicate.
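If the query always filters on the same status value, a PostgreSQL-specific option (not required by the example above, just a sketch) is a partial index, which is smaller and cheaper to maintain:

```sql
-- Indexes only the 'paid' rows; used when the query repeats the same WHERE clause.
CREATE INDEX idx_orders_paid_created
    ON orders (created_at DESC)
    WHERE status = 'paid';
```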
6. Advanced Physical Design
6.1. Partitioning (Database-Level, Within a Single Instance)
DBMS | Partitioning Styles | When to use |
---|---|---|
MySQL 8+ | Range, List, Hash, Key, Composite | 10 M+ rows, time‑series, multi‑tenant data. |
PostgreSQL 13+ | Declarative range/list/hash partitions, sub‑partitioning. | Same as MySQL + very large tables (> 500 M). |
SQL Server | Partitioned tables & indexes (range); can be on any column. | OLTP with massive data, need to prune partitions. |
Oracle | Range/List/Hash partitions; interval partitioning. | Enterprise data warehouses, multi‑year history. |
Best practice: Partition on a column that appears in most range predicates (e.g., `created_at`). Keep the partition key immutable (no updates).
```sql
-- MySQL example
CREATE TABLE orders (
  id         BIGINT NOT NULL,
  status     ENUM('new','paid','canceled') NOT NULL,
  created_at DATETIME NOT NULL,
  ...
  PRIMARY KEY (id, created_at)   -- MySQL requires the partition column in every unique key
) PARTITION BY RANGE (YEAR(created_at)) (
  PARTITION p2023 VALUES LESS THAN (2024),
  PARTITION p2024 VALUES LESS THAN (2025),
  PARTITION pmax  VALUES LESS THAN MAXVALUE
);
```
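A PostgreSQL 11+ declarative equivalent, sketched with the same assumed columns (note that the primary key must include the partition key here too):

```sql
CREATE TABLE orders (
    id         BIGINT      NOT NULL,
    status     TEXT        NOT NULL,
    created_at TIMESTAMPTZ NOT NULL,
    PRIMARY KEY (id, created_at)
) PARTITION BY RANGE (created_at);

CREATE TABLE orders_2024 PARTITION OF orders
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');
CREATE TABLE orders_2025 PARTITION OF orders
    FOR VALUES FROM ('2025-01-01') TO ('2026-01-01');
```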
6.2. Sharding (Application‑Level Horizontal Scaling)
Technique | When to consider |
---|---|
Key-based sharding (e.g., `user_id % N`) | Sustained write rates beyond what a single primary can absorb. |
Directory‑based sharding (lookup table) | Uneven data distribution, need flexibility. |
Consistent hashing (e.g., using Vitess, Citus, CockroachDB) | Need to add/remove shards with minimal rebalancing. |
Hybrid (partition + shard) | Multi‑tenant SaaS with per‑tenant isolation. |
Implementation tip: Keep the shard key in every table that needs to be joined (e.g., `tenant_id`). This allows joins to stay within a single shard.
6.3. Materialised Views & Summary Tables
- When: Pre-aggregate heavy reports (`GROUP BY` on large fact tables).
- How: Use a DB-native materialised view (PostgreSQL `CREATE MATERIALIZED VIEW`, Oracle `MATERIALIZED VIEW`, SQL Server indexed views).
- Refresh: incrementally (Oracle `REFRESH FAST`, ideally `ON COMMIT`) or on a schedule with a full rebuild (Oracle `REFRESH COMPLETE`, PostgreSQL `REFRESH MATERIALIZED VIEW`).
```sql
-- Oracle example: fast refresh requires materialized view logs on "orders".
CREATE MATERIALIZED VIEW daily_sales
BUILD IMMEDIATE
REFRESH FAST ON COMMIT
AS
SELECT product_id,
       TRUNC(order_ts) AS day,
       SUM(amount)     AS total_sales,
       COUNT(amount)   AS amount_cnt,   -- COUNT columns are required for fast-refreshable SUMs
       COUNT(*)        AS row_cnt
FROM   orders
GROUP  BY product_id, TRUNC(order_ts);
```
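For PostgreSQL, which has no built-in incremental refresh, a comparable sketch that is rebuilt on a schedule:

```sql
CREATE MATERIALIZED VIEW daily_sales AS
SELECT product_id,
       date_trunc('day', order_ts) AS day,
       SUM(amount) AS total_sales
FROM   orders
GROUP  BY product_id, date_trunc('day', order_ts);

-- A unique index lets REFRESH ... CONCURRENTLY run without blocking readers.
CREATE UNIQUE INDEX ON daily_sales (product_id, day);
REFRESH MATERIALIZED VIEW CONCURRENTLY daily_sales;
```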
6.4. Columnstore / OLAP Optimisations
- SQL Server: `CREATE CLUSTERED COLUMNSTORE INDEX` on large fact tables → massive compression + vectorised (batch-mode) scans.
- PostgreSQL: use a columnar extension (`cstore_fdw`, Citus columnar) or TimescaleDB hypertables with compression for time-series data.
- MySQL: MariaDB ColumnStore (a separate storage engine); `InnoDB` with `ROW_FORMAT=COMPRESSED` only compresses row storage and is not a true columnstore.
Rule of thumb: Use row‑store for OLTP (high‑frequency point reads/writes). Use column‑store for analytical queries that scan many rows but only a few columns.
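A minimal SQL Server sketch of the columnstore approach (the `fact_sales` table is hypothetical):

```sql
CREATE CLUSTERED COLUMNSTORE INDEX cci_fact_sales ON fact_sales;
```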
7. Application‑Side Tactics
Tactic | Why it matters | Example |
---|---|---|
Connection Pooling | Reduces handshake overhead, keeps sockets warm. | HikariCP (Java), pgBouncer (Postgres), ProxySQL (MySQL). |
Prepared-statement caching | Avoids re-parsing and re-planning. | `PDO::ATTR_EMULATE_PREPARES = false` (PHP). |
Batch inserts/updates | Fewer round-trips, better transaction granularity. | `INSERT INTO t (a,b) VALUES (?,?),(?,?),(?,?)`. |
Read-through / write-behind cache | Offloads hot reads to Redis / Memcached. | Cache the product catalog for 5 min, invalidate on write. |
Avoid N+1 queries | Reduces the number of round-trips. | Use `JOIN` or `IN (...)` instead of fetching children per parent. |
Pagination strategy | `OFFSET` gets slower with deep pages. | Use keyset pagination (`WHERE id > last_seen_id ORDER BY id`); see the sketch below. |
Retry with exponential back-off | Handles transient deadlocks or connection spikes gracefully. | `retry(attempt => 1..5, backoff => 2^attempt * 100ms)`. |
Circuit breaker | Prevents cascading failures when the DB is overloaded. | Hystrix / resilience4j. |
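A sketch of the keyset-pagination tactic from the table (the `orders` table and the literal cursor value are illustrative):

```sql
-- Deep OFFSET: the server still reads and discards the first 100,000 rows.
SELECT id, amount FROM orders ORDER BY id LIMIT 20 OFFSET 100000;

-- Keyset pagination: jumps straight to the next page via the index on id.
SELECT id, amount
FROM   orders
WHERE  id > 100020          -- last id seen on the previous page
ORDER  BY id
LIMIT  20;
```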
8. Monitoring & Alerting
Metric | Typical Threshold (OLTP) | Tool |
---|---|---|
Avg query latency | < 5 ms (core reads) | Prometheus + Grafana (pg_stat_statements.mean_time ) |
95th‑percentile latency | < 20 ms | DataDog, New Relic |
Cache hit ratio (buffer pool) | > 95 % | `SHOW ENGINE INNODB STATUS` (MySQL), `pg_buffercache` (Postgres); see the query after this table |
Lock wait time | < 10 ms | performance_schema.events_waits_summary_by_thread_by_event_name |
Dead‑lock count | 0 (or < 1 per hour) | SHOW ENGINE INNODB STATUS |
Replication lag | < 1 s | `SHOW REPLICA STATUS` (MySQL 8.0.22+, formerly `SHOW SLAVE STATUS`), `pg_stat_replication` |
Disk I/O | < 80 % of provisioned bandwidth | iostat, CloudWatch |
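A quick PostgreSQL sketch for the buffer-cache hit ratio, using the standard `pg_stat_database` view (numbers are cumulative since the last stats reset):

```sql
SELECT round(
         100.0 * sum(blks_hit) / nullif(sum(blks_hit) + sum(blks_read), 0),
         2
       ) AS cache_hit_pct
FROM pg_stat_database;
```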
Alert example (Prometheus)
```yaml
- alert: HighQueryLatency
  expr: avg_over_time(pg_stat_statements_mean_time[5m]) > 0.02
  for: 2m
  labels:
    severity: critical
  annotations:
    summary: "Average PostgreSQL query latency > 20 ms"
    description: "Investigate slow queries; check execution plans."
```
9. End‑to‑End Optimization Checklist
✅ | Item | How to Verify |
---|---|---|
1 | All hot queries have covering indexes (or at least an index on the filter columns). | `EXPLAIN` shows `Index Scan` / `Index Only Scan`. |
2 | No `SELECT *` in production code. | Code review / static analysis. |
3 | Statistics are up-to-date (`ANALYZE` run after bulk loads). | `SELECT relname, n_live_tup FROM pg_stat_user_tables;` compare with `pg_class.reltuples`. |
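A related PostgreSQL sketch for item 3, showing when each table was last analysed:

```sql
SELECT relname,
       last_analyze,
       last_autoanalyze
FROM   pg_stat_user_tables
ORDER  BY greatest(last_analyze, last_autoanalyze) NULLS FIRST;
```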