Optimizing SQL Queries for High‑Traffic Applications
(A practical, step‑by‑step guide that works across MySQL, PostgreSQL, SQL Server, and Oracle)
1. Why “SQL‑only” Optimisation Matters
High‑traffic apps often hit the database thousands to millions of times per second. Even a 1 ms improvement per query can save seconds of CPU time, reduce latency, and keep your infrastructure costs down. The biggest gains come from making the database do less work, not just from adding more hardware.
2. The Optimization Lifecycle
Phase | Goal | Typical Tools / Artefacts |
---|---|---|
A. Baseline | Capture realistic workloads, identify hot queries. | Slow‑query log, pg_stat_statements, Query Store (SQL Server), AWR (Oracle), application logs. |
B. Diagnose | Understand why a query is slow. | EXPLAIN / EXPLAIN ANALYZE, visual explain plans, SHOW PROFILE , DMVs. |
C. Refactor | Rewrite / index / restructure to eliminate bottlenecks. | DDL changes, query rewrite, materialized views, partitioning. |
D. Validate | Verify that performance improved without regressions. | Same metrics as baseline, regression test suite, load‑testing tools (JMeter, k6). |
E. Guard | Prevent regressions in the future. | CI checks, automated plan‑stability tests, query‑cost thresholds, monitoring alerts. |
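For the Baseline phase, a query like the following surfaces the statements that consume the most total time. It is a minimal sketch, assuming PostgreSQL 13+ with the `pg_stat_statements` extension enabled (older versions expose `total_time` / `mean_time` instead):

```sql
-- Top 20 statements by cumulative execution time (times are in milliseconds).
SELECT query,
       calls,
       total_exec_time,
       mean_exec_time,
       rows
FROM   pg_stat_statements
ORDER  BY total_exec_time DESC
LIMIT  20;
```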
3. Foundations – “First‑Principles” Checks
# | Checklist | Why it matters |
---|---|---|
1 | Use the right data types – avoid `VARCHAR(255)` for a 2-digit code; prefer `CHAR(2)` or `SMALLINT`. | Smaller rows → more rows fit in memory → fewer page reads. |
2 | Never `SELECT *` in production – enumerate columns. | Reduces I/O, avoids unnecessary row-locking and cache pressure. |
3 | Keep transactions short – commit as soon as possible. | Reduces lock contention and deadlock probability. |
4 | Prefer set-based operations over row-by-row loops (cursors, `WHILE`). | DB engines are built for set processing; loops cause massive overhead. |
5 | Avoid functions on indexed columns in `WHERE`/`JOIN` (e.g., `WHERE DATE(col) = ...`); see the rewrite sketch after this table. | The index cannot be used → full scan. Use computed/generated columns or range predicates instead. |
6 | Parameterise queries (prepared statements) – not string concatenation. | Enables plan reuse, prevents SQL injection, reduces parsing overhead. |
7 | Watch out for implicit conversions – `WHERE int_col = '123'` forces a type cast. | An implicit cast on the column side prevents index use and causes a scan. |
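To illustrate item 5, here is a sketch of rewriting a non-sargable date filter into a range predicate (the `orders` table and its columns are assumed for illustration):

```sql
-- Non-sargable: the function call hides created_at from any index on it.
SELECT id, amount FROM orders WHERE DATE(created_at) = '2025-06-01';

-- Sargable rewrite: an index on created_at can now be range-scanned.
SELECT id, amount
FROM   orders
WHERE  created_at >= '2025-06-01'
  AND  created_at <  '2025-06-02';
```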
4. Indexing – The Most Powerful Lever
4.1. Types of Indexes (quick reference)
DBMS | B-Tree | Hash | GiST / SP-GiST | BRIN | Columnstore | Full-Text |
---|---|---|---|---|---|---|
MySQL | ✅ | ✅ (MEMORY engine; InnoDB adaptive hash) | ✗ | ✗ | ✗ in core MySQL (MariaDB ColumnStore is a separate engine) | ✅ |
PostgreSQL | ✅ | ✅ | ✅ | ✅ | Via extensions (e.g., cstore_fdw, Citus columnar) | ✅ (GIN + tsvector) |
SQL Server | ✅ | ✅ (memory-optimized tables) | ✅ (spatial) | ✗ (columnstore segment elimination plays a similar role) | ✅ (Columnstore) | ✅ |
Oracle | ✅ | ✅ (hash clusters) | ✅ (Spatial) | ✗ (zone maps are the closest analogue); bitmap indexes available | ✅ (Database In-Memory column store) | ✅ (Oracle Text) |
4.2. Index Design Process
- Identify the hot predicates – the columns that appear in `WHERE`, `JOIN ... ON`, `ORDER BY`, `GROUP BY`.
- Check cardinality – high cardinality (many distinct values) suits a B-Tree; low cardinality may benefit from bitmap-style indexes (PostgreSQL `BRIN`/`GIN`, Oracle bitmap).
- Covering (include) columns – add non-key columns to an index so the query can be satisfied entirely from the index (SQL Server `INCLUDE`, PostgreSQL `INCLUDE`, MySQL by listing every selected column in the index itself).
- Composite index ordering – order columns by filtering power first, then by sorting/grouping; see the DDL sketch after this list.
  - Example: `WHERE status = ? AND created_at > ? ORDER BY created_at DESC` → `(status, created_at DESC)`.
- Avoid over-indexing – each extra index adds write overhead (INSERT/UPDATE/DELETE). On write-heavy OLTP tables keep the set of secondary indexes small and drop indexes that are never read.
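A minimal DDL sketch for the composite-index example above, with a covering column added (table and column names are assumptions; `INCLUDE` is PostgreSQL 11+ / SQL Server syntax):

```sql
-- Filter column first, then the sort column; amount is carried only to cover the SELECT list.
CREATE INDEX idx_orders_status_created
    ON orders (status, created_at DESC)
    INCLUDE (amount);

-- MySQL 8.0 has no INCLUDE clause; append the extra column to the key instead:
-- CREATE INDEX idx_orders_status_created ON orders (status, created_at DESC, amount);
```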
4.3. Practical Index Recipes
Scenario | Recommended Index (MySQL/PostgreSQL) | Why |
---|---|---|
Exact lookup on `user_id` (PK) | `PRIMARY KEY (user_id)` | Already optimal. |
Range + filter: `WHERE country = ? AND ts >= ?` | `(country, ts)` | Country filters first, then the range on `ts`. |
Sorting: `WHERE status = ? ORDER BY created_at DESC LIMIT 20` | `(status, created_at DESC)` | Uses the index for both filter and order; `LIMIT` stops early. |
Join: `ON o.customer_id = c.id` and `c.region = ?` | `customer (region, id)` | Region filter first, then the join column. |
Full-text search on `title` | `FULLTEXT(title)` (MySQL) / `GIN (to_tsvector('english', title))` (PostgreSQL); see the sketch below | Enables an inverted index. |
Very large table (> 100 M rows), queries on recent data | BRIN on the timestamp column `ts` (PostgreSQL) | BRIN stores min/max per block range → fast for “newest N rows”. |
Frequent aggregates: `GROUP BY product_id` | B-Tree index on `product_id` (clustered index in SQL Server) | Index-ordered input lets the engine stream the aggregation instead of hashing the whole table. |
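A sketch of the full-text recipe from the table (the `articles` table is hypothetical); note that the PostgreSQL query must repeat the indexed expression for the GIN index to be used:

```sql
-- PostgreSQL
CREATE INDEX idx_articles_title_fts
    ON articles USING GIN (to_tsvector('english', title));

SELECT id, title
FROM   articles
WHERE  to_tsvector('english', title) @@ plainto_tsquery('english', 'query optimization');

-- MySQL equivalent
-- ALTER TABLE articles ADD FULLTEXT INDEX ft_title (title);
-- SELECT id, title FROM articles WHERE MATCH(title) AGAINST ('query optimization');
```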
4.4. Maintaining Index Health
Action | How to do it |
---|---|
Detect unused indexes | MySQL `sys.schema_unused_indexes`; PostgreSQL `pg_stat_user_indexes` (see the query below); SQL Server `sys.dm_db_index_usage_stats`. |
Rebuild / reorganise | MySQL `OPTIMIZE TABLE`; PostgreSQL `REINDEX` or `VACUUM (FULL, ANALYZE)`; SQL Server `ALTER INDEX ... REBUILD`; Oracle `ALTER INDEX ... REBUILD`. |
Update statistics | MySQL `ANALYZE TABLE`; PostgreSQL `ANALYZE`; SQL Server `UPDATE STATISTICS`; Oracle `DBMS_STATS.GATHER_TABLE_STATS`. |
Monitor index bloat | PostgreSQL `pgstattuple`; MySQL `SHOW TABLE STATUS` (Data_free); SQL Server `sys.dm_db_index_physical_stats`. |
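A minimal PostgreSQL sketch for spotting candidate unused indexes; review the list before dropping anything, since counters reset with the statistics and replicas may use indexes the primary does not:

```sql
SELECT schemaname,
       relname      AS table_name,
       indexrelname AS index_name,
       idx_scan,                                            -- scans since the last stats reset
       pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM   pg_stat_user_indexes
WHERE  idx_scan = 0
ORDER  BY pg_relation_size(indexrelid) DESC;
```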
5. Query‑Plan Analysis – Reading EXPLAIN
5.1. Common Plan Nodes & Their Cost
Node | Typical Meaning | Red Flag |
---|---|---|
Seq Scan / Table Scan | Full table read. | Acceptable for tiny tables or queries returning most rows; otherwise usually a missing or unusable index. |
Index Scan | Reads index entries (may still fetch table rows). | OK if selective. |
Index Only Scan (PostgreSQL) | Index satisfies query completely. | Ideal! |
Bitmap Index Scan → Bitmap Heap Scan | Combines many indexes; may be slower than a single index. | Consider a composite index. |
Nested Loop Join | For each row from outer, probe inner. | Good for small outer side, otherwise consider Hash or Merge join. |
Hash Join | Builds hash table on inner side. | Works well for large, unsorted data. |
Merge Join | Both sides must be sorted (often via index). | Efficient if both inputs already ordered. |
Sort | Explicit sort operation. | Try to push ordering into index. |
Aggregate | Grouping/aggregation. | If GROUP BY matches index order → “GroupAggregate” (faster). |
Subquery/CTE Materialisation | Temporary result set. | May be avoidable with rewrite. |
5.2. Example: MySQL EXPLAIN
```sql
EXPLAIN FORMAT=JSON
SELECT o.id, o.amount, c.name
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.status = 'paid'
  AND o.created_at >= '2025-01-01'
ORDER BY o.created_at DESC
LIMIT 50;
```
Result (simplified):
```json
{
  "query_block": {
    "select_id": 1,
    "cost_info": {"query_cost": "319.50"},
    "table": {
      "table_name": "orders",
      "access_type": "range",
      "possible_keys": ["idx_status_created"],
      "key": "idx_status_created",
      "used_key_parts": ["status", "created_at"],
      "rows_examined_per_scan": 120,
      "filtered": 85.0,
      "attached_condition": "(`orders`.`status` = 'paid')"
    },
    "inner join": {...}
  }
}
```
Interpretation
- `range` on `idx_status_created` → good (uses the composite index).
- `rows_examined_per_scan` is low → the index is selective.
- No separate `sort` node → the ordering is satisfied by the index (`created_at DESC`).

If you see `type: ALL` (a full scan) or a separate `sort` step, you likely need a better index or a query rewrite.
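For reference, a sketch of the index both EXPLAIN examples assume; the name and the descending key are inferred from the plan output, and MySQL 8.0+ is assumed for `DESC` index columns:

```sql
CREATE INDEX idx_status_created ON orders (status, created_at DESC);
```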
5.3. PostgreSQL EXPLAIN ANALYZE
```sql
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)
SELECT *
FROM orders
WHERE status = 'paid'
  AND created_at >= CURRENT_DATE
ORDER BY created_at DESC
LIMIT 100;
```
Typical output snippet:
```json
{
  "Plan": {
    "Node Type": "Index Scan Backward",
    "Relation Name": "orders",
    "Index Name": "idx_status_created",
    "Index Cond": "(status = 'paid'::text)",
    "Filter": "(created_at >= CURRENT_DATE)",
    "Rows Removed by Filter": 0,
    "Buffers": {"Shared Hit Blocks": 75, "Shared Read Blocks": 0},
    "Actual Total Time": 0.78,
    "Actual Loops": 1
  }
}
```
Key take-aways

- `Index Scan Backward` = the index is read in descending order → no sort needed.
- `Buffers` shows almost all data served from cache → good.

If you see a `Seq Scan` instead, add a composite index or rewrite the predicate.
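If the query always filters on the same status value, a PostgreSQL-specific option (not required by the example above, just a sketch) is a partial index, which is smaller and cheaper to maintain:

```sql
-- Indexes only the 'paid' rows; used when the query repeats the same WHERE clause.
CREATE INDEX idx_orders_paid_created
    ON orders (created_at DESC)
    WHERE status = 'paid';
```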
6. Advanced Physical Design
6.1. Partitioning (Database-Level, Within a Single Instance)
DBMS | Partitioning Styles | When to use |
---|---|---|
MySQL 8+ | Range, List, Hash, Key, Composite | 10 M+ rows, time‑series, multi‑tenant data. |
PostgreSQL 13+ | Declarative range/list/hash partitions, sub‑partitioning. | Same as MySQL + very large tables (> 500 M). |
SQL Server | Partitioned tables & indexes (range); can be on any column. | OLTP with massive data, need to prune partitions. |
Oracle | Range/List/Hash partitions; interval partitioning. | Enterprise data warehouses, multi‑year history. |
Best practice: Partition on a column that appears in most range predicates (e.g., `created_at`). Keep the partition key immutable (no updates).
```sql
-- MySQL example
CREATE TABLE orders (
  id         BIGINT NOT NULL,
  status     ENUM('new','paid','canceled') NOT NULL,
  created_at DATETIME NOT NULL,
  ...
  PRIMARY KEY (id, created_at)   -- MySQL requires the partition column in every unique key
) PARTITION BY RANGE (YEAR(created_at)) (
  PARTITION p2023 VALUES LESS THAN (2024),
  PARTITION p2024 VALUES LESS THAN (2025),
  PARTITION pmax  VALUES LESS THAN MAXVALUE
);
```
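A PostgreSQL 11+ declarative equivalent, sketched with the same assumed columns (note that the primary key must include the partition key here too):

```sql
CREATE TABLE orders (
    id         BIGINT      NOT NULL,
    status     TEXT        NOT NULL,
    created_at TIMESTAMPTZ NOT NULL,
    PRIMARY KEY (id, created_at)
) PARTITION BY RANGE (created_at);

CREATE TABLE orders_2024 PARTITION OF orders
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');
CREATE TABLE orders_2025 PARTITION OF orders
    FOR VALUES FROM ('2025-01-01') TO ('2026-01-01');
```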
6.2. Sharding (Application‑Level Horizontal Scaling)
Technique | When to consider |
---|---|
Key-based sharding (e.g., `user_id % N`) | Sustained write rates beyond what a single primary can absorb. |
Directory‑based sharding (lookup table) | Uneven data distribution, need flexibility. |
Consistent hashing (e.g., using Vitess, Citus, CockroachDB) | Need to add/remove shards with minimal rebalancing. |
Hybrid (partition + shard) | Multi‑tenant SaaS with per‑tenant isolation. |
Implementation tip: Keep the shard key in every table that needs to be joined (e.g., `tenant_id`). This allows joins to stay within a single shard.
6.3. Materialised Views & Summary Tables
- When: Pre-aggregate heavy reports (`GROUP BY` on large fact tables).
- How: Use a DB-native materialised view (PostgreSQL `CREATE MATERIALIZED VIEW`, Oracle `MATERIALIZED VIEW`, SQL Server indexed views).
- Refresh: incrementally (Oracle `REFRESH FAST`, ideally `ON COMMIT`) or on a schedule with a full rebuild (Oracle `REFRESH COMPLETE`, PostgreSQL `REFRESH MATERIALIZED VIEW`).
```sql
-- Oracle example: fast refresh requires materialized view logs on "orders".
CREATE MATERIALIZED VIEW daily_sales
BUILD IMMEDIATE
REFRESH FAST ON COMMIT
AS
SELECT product_id,
       TRUNC(order_ts) AS day,
       SUM(amount)     AS total_sales,
       COUNT(amount)   AS amount_cnt,   -- COUNT columns are required for fast-refreshable SUMs
       COUNT(*)        AS row_cnt
FROM   orders
GROUP  BY product_id, TRUNC(order_ts);
```
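For PostgreSQL, which has no built-in incremental refresh, a comparable sketch that is rebuilt on a schedule:

```sql
CREATE MATERIALIZED VIEW daily_sales AS
SELECT product_id,
       date_trunc('day', order_ts) AS day,
       SUM(amount) AS total_sales
FROM   orders
GROUP  BY product_id, date_trunc('day', order_ts);

-- A unique index lets REFRESH ... CONCURRENTLY run without blocking readers.
CREATE UNIQUE INDEX ON daily_sales (product_id, day);
REFRESH MATERIALIZED VIEW CONCURRENTLY daily_sales;
```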
6.4. Columnstore / OLAP Optimisations
- SQL Server: `CREATE CLUSTERED COLUMNSTORE INDEX` on large fact tables → massive compression + vectorised (batch-mode) scans.
- PostgreSQL: use a columnar extension (`cstore_fdw`, Citus columnar) or TimescaleDB hypertables with compression for time-series data.
- MySQL: MariaDB ColumnStore (a separate storage engine); `InnoDB` with `ROW_FORMAT=COMPRESSED` only compresses row storage and is not a true columnstore.
Rule of thumb: Use row‑store for OLTP (high‑frequency point reads/writes). Use column‑store for analytical queries that scan many rows but only a few columns.
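A minimal SQL Server sketch of the columnstore approach (the `fact_sales` table is hypothetical):

```sql
CREATE CLUSTERED COLUMNSTORE INDEX cci_fact_sales ON fact_sales;
```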
7. Application‑Side Tactics
Tactic | Why it matters | Example |
---|---|---|
Connection Pooling | Reduces handshake overhead, keeps sockets warm. | HikariCP (Java), pgBouncer (Postgres), ProxySQL (MySQL). |
Prepared-statement caching | Avoids re-parsing and re-planning. | `PDO::ATTR_EMULATE_PREPARES = false` (PHP). |
Batch inserts/updates | Fewer round-trips, better transaction granularity. | `INSERT INTO t (a,b) VALUES (?,?),(?,?),(?,?)`. |
Read-through / write-behind cache | Offloads hot reads to Redis / Memcached. | Cache the product catalog for 5 min, invalidate on write. |
Avoid N+1 queries | Reduces the number of round-trips. | Use `JOIN` or `IN (...)` instead of fetching children per parent. |
Pagination strategy | `OFFSET` gets slower with deep pages. | Use keyset pagination (`WHERE id > last_seen_id ORDER BY id`); see the sketch below. |
Retry with exponential back-off | Handles transient deadlocks or connection spikes gracefully. | `retry(attempt => 1..5, backoff => 2^attempt * 100ms)`. |
Circuit breaker | Prevents cascading failures when the DB is overloaded. | Hystrix / resilience4j. |
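A sketch of the keyset-pagination tactic from the table (the `orders` table and the literal cursor value are illustrative):

```sql
-- Deep OFFSET: the server still reads and discards the first 100,000 rows.
SELECT id, amount FROM orders ORDER BY id LIMIT 20 OFFSET 100000;

-- Keyset pagination: jumps straight to the next page via the index on id.
SELECT id, amount
FROM   orders
WHERE  id > 100020          -- last id seen on the previous page
ORDER  BY id
LIMIT  20;
```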
8. Monitoring & Alerting
Metric | Typical Threshold (OLTP) | Tool |
---|---|---|
Avg query latency | < 5 ms (core reads) | Prometheus + Grafana (pg_stat_statements.mean_time ) |
95th‑percentile latency | < 20 ms | DataDog, New Relic |
Cache hit ratio (buffer pool) | > 95 % | `SHOW ENGINE INNODB STATUS` (MySQL), `pg_buffercache` (Postgres); see the query after this table |
Lock wait time | < 10 ms | performance_schema.events_waits_summary_by_thread_by_event_name |
Dead‑lock count | 0 (or < 1 per hour) | SHOW ENGINE INNODB STATUS |
Replication lag | < 1 s | `SHOW REPLICA STATUS` (MySQL 8.0.22+, formerly `SHOW SLAVE STATUS`), `pg_stat_replication` |
Disk I/O | < 80 % of provisioned bandwidth | iostat, CloudWatch |
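A quick PostgreSQL sketch for the buffer-cache hit ratio, using the standard `pg_stat_database` view (numbers are cumulative since the last stats reset):

```sql
SELECT round(
         100.0 * sum(blks_hit) / nullif(sum(blks_hit) + sum(blks_read), 0),
         2
       ) AS cache_hit_pct
FROM pg_stat_database;
```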
Alert example (Prometheus)
```yaml
- alert: HighQueryLatency
  expr: avg_over_time(pg_stat_statements_mean_time[5m]) > 0.02
  for: 2m
  labels:
    severity: critical
  annotations:
    summary: "Average PostgreSQL query latency > 20 ms"
    description: "Investigate slow queries; check execution plans."
```
9. End‑to‑End Optimization Checklist
✅ | Item | How to Verify |
---|---|---|
1 | All hot queries have covering indexes (or at least an index on the filter columns). | `EXPLAIN` shows `Index Scan` / `Index Only Scan`. |
2 | No `SELECT *` in production code. | Code review / static analysis. |
3 | Statistics are up-to-date (`ANALYZE` run after bulk loads). | `SELECT relname, n_live_tup FROM pg_stat_user_tables;` compare with `pg_class.reltuples`. |
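A related PostgreSQL sketch for item 3, showing when each table was last analysed:

```sql
SELECT relname,
       last_analyze,
       last_autoanalyze
FROM   pg_stat_user_tables
ORDER  BY greatest(last_analyze, last_autoanalyze) NULLS FIRST;
```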