If your web application is slow, the database is almost certainly the bottleneck. Not the frontend framework. Not the web server. Not the network. The database.
I have spent nine years building backend systems across seven companies, and in every single engagement, database performance has been the most impactful area of optimization. At VacancySoft, where I currently serve as a Full Stack Engineer, I reduced query execution times by 60% across our core platform. At Univelcity, I achieved a 45% reduction in API response times, with database optimization accounting for the majority of those gains. At Lordwin Group, I optimized a real-time gaming backend to sustain 3,000+ concurrent users with sub-100ms query responses.
This article is the practical guide I wish I had when I started. It covers the exact strategies, tools, and techniques I use to diagnose and fix database performance problems in production systems running PostgreSQL and MySQL.
Strategy 1: Query Analysis and Profiling — Know Before You Optimize
The biggest mistake engineers make with database optimization is guessing. They add indexes randomly, rewrite queries based on intuition, or throw caching at the problem without understanding what is actually slow.
I start every optimization effort the same way: profiling.
Enable Slow Query Logging
In PostgreSQL:
# postgresql.conf
log_min_duration_statement = 200   # log queries taking longer than 200ms
log_statement = 'none'             # don't log all queries, just slow ones
In MySQL:
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 0.2; -- 200ms threshold
SET GLOBAL log_queries_not_using_indexes = 'ON';
At VacancySoft, I established a weekly ritual of reviewing the slow query log. This alone surfaced dozens of optimization opportunities that no amount of code review would have caught. Queries that looked clean in the ORM were generating catastrophic execution plans.
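The slow query log shows individual slow executions; a cumulative view is also useful for spotting queries that are cheap per call but expensive in aggregate. A minimal sketch of that lookup, assuming the pg_stat_statements extension is installed and preloaded (column names are for PostgreSQL 13+; older versions use total_time and mean_time):

```sql
-- Top 20 queries by total time spent across all calls.
-- Requires: shared_preload_libraries = 'pg_stat_statements'
-- and CREATE EXTENSION pg_stat_statements; in the target database.
SELECT query,
       calls,
       ROUND(mean_exec_time::numeric, 1) AS mean_ms,
       ROUND(total_exec_time::numeric, 1) AS total_ms
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;
```

Sorting by total rather than mean time surfaces the frequent medium-cost queries that the slow query log threshold never catches.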
Use EXPLAIN ANALYZE Religiously
EXPLAIN shows you the query plan. EXPLAIN ANALYZE actually executes the query and shows real timing data. The difference matters.
EXPLAIN ANALYZE
SELECT v.id, v.title, o.name AS organisation_name, s.name AS sector_name
FROM vacancies v
JOIN organisations o ON o.id = v.organisation_id
JOIN sectors s ON s.id = v.sector_id
WHERE v.status = 'active'
AND v.created_at >= NOW() - INTERVAL '30 days'
ORDER BY v.created_at DESC
LIMIT 20;
Before optimization, this query at VacancySoft returned an EXPLAIN ANALYZE output like this:
Sort (cost=15234.56..15234.61 rows=20 width=128) (actual time=1842.331..1842.345 rows=20 loops=1)
Sort Key: v.created_at DESC
-> Hash Join (cost=8234.12..15234.43 rows=20 width=128) (actual time=1201.442..1841.892 rows=24531 loops=1)
-> Seq Scan on vacancies v (cost=0.00..6842.00 rows=24531 width=96) (actual time=0.021..892.134 rows=24531 loops=1)
Filter: ((status = 'active') AND (created_at >= ...))
Rows Removed by Filter: 187234
Planning Time: 0.892 ms
Execution Time: 1843.012 ms
Two immediate red flags: a sequential scan on the vacancies table scanning over 200,000 rows and filtering out 187,000 of them, and a hash join that is materialising far more rows than the final result needs.
This is the kind of visibility that transforms optimization from guesswork into engineering. My first rule of database optimization: never optimize a query you have not profiled.
Strategy 2: Indexing Strategies — The Highest-Impact Change
Indexing is the single most impactful optimization technique in my toolkit. A well-placed index can turn a 2-second query into a 5-millisecond query. But indexes are not free — they consume storage, slow down writes, and can mislead the query planner if poorly designed.
Composite Indexes
Order matters in composite indexes. The index (status, created_at) is useful for queries that filter on status and sort or filter on created_at. It is useless for queries that filter only on created_at.
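A quick illustration of that left-prefix rule (sketch queries against the schema above; whether the planner actually chooses the index also depends on table statistics):

```sql
-- Can use the index (status, created_at): filters on the leading column.
SELECT * FROM vacancies
WHERE status = 'active' AND created_at >= '2024-01-01';

-- Can also use it: the status filter alone matches the leading column.
SELECT * FROM vacancies WHERE status = 'active';

-- Cannot use it efficiently: the leading column is skipped entirely.
SELECT * FROM vacancies WHERE created_at >= '2024-01-01';
```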
For the vacancy search query above, I created:
CREATE INDEX idx_vacancies_status_created_at
ON vacancies (status, created_at DESC);
This single index eliminated the sequential scan. The query planner now uses an index scan that reads only the relevant rows:
Index Scan using idx_vacancies_status_created_at on vacancies v
(cost=0.42..234.56 rows=20 width=96) (actual time=0.089..2.341 rows=20 loops=1)
Index Cond: ((status = 'active') AND (created_at >= ...))
Execution time: from 1,843ms to 12ms. A 99.3% reduction from a single composite index.
Partial Indexes
At VacancySoft, the majority of our queries filter on status = 'active'. Only about 15% of vacancies are active at any given time. A partial index dramatically reduces the index size:
CREATE INDEX idx_vacancies_active_created_at
ON vacancies (created_at DESC)
WHERE status = 'active';
This index is roughly 85% smaller than the full composite index, which means it fits in memory more easily and produces faster index scans.
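You can verify size claims like this in your own database by comparing the on-disk footprint of the two indexes directly (a sketch; the index names are the ones from the examples above):

```sql
SELECT indexrelname,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE indexrelname IN ('idx_vacancies_status_created_at',
                       'idx_vacancies_active_created_at');
```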
Covering Indexes (Index-Only Scans)
A covering index includes all columns that a query needs, allowing PostgreSQL to serve the query entirely from the index without touching the heap (the actual table data). This eliminates random I/O, which is the most expensive operation on traditional storage.
CREATE INDEX idx_vacancies_covering
ON vacancies (status, created_at DESC)
INCLUDE (id, title, organisation_id, sector_id)
WHERE status = 'active';
With this covering index, PostgreSQL executes an index-only scan:
Index Only Scan using idx_vacancies_covering on vacancies v
(cost=0.42..45.23 rows=20 width=96) (actual time=0.067..0.412 rows=20 loops=1)
Heap Fetches: 0
Heap Fetches: 0 is the goal. The database never touches the table at all.
My Indexing Methodology
I follow a systematic process:
- Identify the top 20 slowest queries from the slow query log.
- Run EXPLAIN ANALYZE on each one.
- Look for sequential scans on tables with more than 10,000 rows.
- Check for sort operations that are not backed by an index.
- Create targeted indexes and verify the plan changes.
- Monitor write performance to ensure indexes are not degrading INSERT/UPDATE speed beyond acceptable thresholds.
At VacancySoft, this methodology across 15+ modules produced the 60% aggregate reduction in query execution time.
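Step 3 of this process does not have to be manual. PostgreSQL's statistics views can point you at the tables being sequentially scanned the most (a sketch using pg_stat_user_tables; the 10,000-row threshold mirrors the methodology above and is illustrative):

```sql
-- Tables with more than 10,000 live rows, ordered by rows read
-- via sequential scans since statistics were last reset.
SELECT relname,
       seq_scan,
       seq_tup_read,
       idx_scan,
       n_live_tup
FROM pg_stat_user_tables
WHERE n_live_tup > 10000
ORDER BY seq_tup_read DESC
LIMIT 20;
```

A large seq_tup_read combined with a low idx_scan count is a strong hint that the table is missing an index for its dominant access pattern.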
Strategy 3: Query Refactoring — Fixing What the ORM Gets Wrong
ORMs are excellent for productivity but frequently generate suboptimal SQL. The N+1 query problem is the most well-known example, but I encounter more subtle issues regularly.
Eliminating N+1 Queries
In Laravel (which I used extensively at 2am Tech, Lordwin Group, and Viaduct):
// Bad: N+1 problem — 1 query for organisations + N queries for vacancies
$organisations = Organisation::all();
foreach ($organisations as $org) {
echo $org->vacancies->count(); // Triggers a query per organisation
}
// Good: Eager loading — 2 queries total
$organisations = Organisation::with('vacancies')->get();
// Better: When you only need the count
$organisations = Organisation::withCount('vacancies')->get();
foreach ($organisations as $org) {
echo $org->vacancies_count; // No additional queries
}
At 2am Tech, the Addio financial platform was suffering from response times exceeding 3 seconds on the organisation dashboard. Profiling revealed over 150 queries per page load, almost all of them N+1 queries from Eloquent relationships. I refactored the critical paths to use eager loading and withCount, reducing the query count to 8 and the response time to 400ms.
Replacing Subqueries with JOINs and CTEs
ORMs often generate correlated subqueries where a JOIN or CTE would be far more efficient:
-- ORM-generated: correlated subquery (executes once per row)
SELECT o.id, o.name,
(SELECT COUNT(*) FROM vacancies v WHERE v.organisation_id = o.id AND v.status = 'active')
AS active_vacancy_count
FROM organisations o
WHERE o.sector_id = 5;
-- Refactored: single pass with JOIN
SELECT o.id, o.name, COUNT(v.id) AS active_vacancy_count
FROM organisations o
LEFT JOIN vacancies v ON v.organisation_id = o.id AND v.status = 'active'
WHERE o.sector_id = 5
GROUP BY o.id, o.name;
Window Functions for Complex Analytics
At VacancySoft, we needed to show month-over-month vacancy trends per sector. The initial implementation used multiple queries with application-level aggregation. I replaced it with a single query using window functions:
WITH monthly_counts AS (
SELECT
s.name AS sector_name,
DATE_TRUNC('month', v.created_at) AS month,
COUNT(*) AS vacancy_count
FROM vacancies v
JOIN sectors s ON s.id = v.sector_id
WHERE v.created_at >= NOW() - INTERVAL '12 months'
GROUP BY s.name, DATE_TRUNC('month', v.created_at)
)
SELECT
sector_name,
month,
vacancy_count,
LAG(vacancy_count) OVER (PARTITION BY sector_name ORDER BY month) AS prev_month_count,
ROUND(
(vacancy_count - LAG(vacancy_count) OVER (PARTITION BY sector_name ORDER BY month))::numeric
/ NULLIF(LAG(vacancy_count) OVER (PARTITION BY sector_name ORDER BY month), 0) * 100,
1
) AS month_over_month_pct
FROM monthly_counts
ORDER BY sector_name, month;
This single query replaced five application-level queries and a significant amount of JavaScript aggregation logic. Response time for the trends endpoint dropped from 2.1 seconds to 280ms.
Strategy 4: Schema Redesign for Performance
Sometimes the problem is not the query — it is the schema. Normalization is taught as gospel in database courses, but production systems often require strategic denormalization.
Materialized Views
At VacancySoft, our analytics dashboard required aggregating data across five tables. Even with optimized queries and proper indexes, the aggregation was too slow for real-time display. I introduced materialized views:
CREATE MATERIALIZED VIEW mv_sector_analytics AS
SELECT
s.id AS sector_id,
s.name AS sector_name,
COUNT(v.id) AS total_vacancies,
COUNT(v.id) FILTER (WHERE v.status = 'active') AS active_vacancies,
COUNT(DISTINCT v.organisation_id) AS hiring_organisations,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY v.salary_max) AS median_salary,
DATE_TRUNC('day', NOW()) AS snapshot_date
FROM sectors s
LEFT JOIN vacancies v ON v.sector_id = s.id
AND v.created_at >= NOW() - INTERVAL '90 days'
GROUP BY s.id, s.name;
CREATE UNIQUE INDEX ON mv_sector_analytics (sector_id);
-- Refresh concurrently (no lock on reads)
REFRESH MATERIALIZED VIEW CONCURRENTLY mv_sector_analytics;
I set up a cron job to refresh this view every 15 minutes. The dashboard query that previously took 3.8 seconds now completes in under 10ms because it reads from the pre-computed view.
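The refresh job itself can be as simple as a crontab entry (a sketch; assumes a database named vacancysoft and passwordless authentication via .pgpass or the environment):

```
# Refresh the analytics view every 15 minutes; CONCURRENTLY avoids
# blocking readers while the refresh runs.
*/15 * * * * psql -d vacancysoft -c "REFRESH MATERIALIZED VIEW CONCURRENTLY mv_sector_analytics;"
```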
Strategic Denormalization
For the organisation profile pages, rather than joining five tables on every request, I added a cached_stats JSONB column on the organisations table:
ALTER TABLE organisations ADD COLUMN cached_stats JSONB DEFAULT '{}';
-- Updated by a background job every hour
UPDATE organisations SET cached_stats = jsonb_build_object(
'active_vacancies', (SELECT COUNT(*) FROM vacancies WHERE organisation_id = organisations.id AND status = 'active'),
'total_vacancies_90d', (SELECT COUNT(*) FROM vacancies WHERE organisation_id = organisations.id AND created_at >= NOW() - INTERVAL '90 days'),
'avg_time_to_fill', (SELECT AVG(EXTRACT(EPOCH FROM (filled_at - created_at)) / 86400) FROM vacancies WHERE organisation_id = organisations.id AND filled_at IS NOT NULL),
'updated_at', NOW()
);
This is a tradeoff: the data can be up to one hour stale, and we have the overhead of the background update job. But for this use case, slight staleness is acceptable, and the performance gain — from 450ms to 8ms per profile page — is transformative.
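Reading the denormalized column is then a single-table query with no joins (a sketch; the JSON keys match the jsonb_build_object call above):

```sql
SELECT id,
       name,
       (cached_stats->>'active_vacancies')::int AS active_vacancies,
       cached_stats->>'updated_at' AS stats_updated_at
FROM organisations
ORDER BY (cached_stats->>'active_vacancies')::int DESC NULLS LAST
LIMIT 10;
```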
Strategy 5: Connection Pooling and Read Replicas
At scale, query optimization alone is not enough. You also need to manage how connections to the database are handled.
Connection Pooling with PgBouncer
PostgreSQL creates a new process for each connection, which is expensive. At VacancySoft, I deployed PgBouncer in front of PostgreSQL in transaction pooling mode:
; pgbouncer.ini
[databases]
vacancysoft = host=127.0.0.1 port=5432 dbname=vacancysoft
[pgbouncer]
listen_port = 6432
pool_mode = transaction
max_client_conn = 500
default_pool_size = 25
reserve_pool_size = 5
This allows 500 application connections to share 25 database connections. Without pooling, our application would regularly hit PostgreSQL's connection limit during traffic spikes, causing cascading failures.
Read Replicas
For read-heavy workloads, which is to say most web applications, I split reads and writes across primary and replica databases:
// Database configuration with read/write splitting
const knex = require('knex');
const primary = knex({
client: 'pg',
connection: process.env.DATABASE_PRIMARY_URL,
});
const replica = knex({
client: 'pg',
connection: process.env.DATABASE_REPLICA_URL,
});
// Service layer decides which connection to use
class VacancyService {
async search(filters) {
return replica('vacancies')
.where(filters)
.orderBy('created_at', 'desc');
}
async create(data) {
return primary('vacancies')
.insert(data)
.returning('*');
}
}
At VacancySoft, this split reduced load on the primary database by approximately 70%, since the vast majority of our traffic is read operations (search, analytics, reporting).
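One caveat this split introduces is replication lag: a user who has just written data may not see it if their next read hits the replica. A common mitigation is to pin each user's reads to the primary for a short window after any write they perform. A minimal sketch of that routing decision (the class and its names are hypothetical, not part of knex):

```javascript
// Routes reads to the replica by default, but pins a user to the
// primary for `stickyMs` after any write they perform, so they
// always read their own writes despite replication lag.
class ConnectionRouter {
  constructor(stickyMs = 2000) {
    this.stickyMs = stickyMs;
    this.lastWriteAt = new Map(); // userId -> timestamp of last write
  }

  // Call whenever the user performs a write.
  recordWrite(userId, now = Date.now()) {
    this.lastWriteAt.set(userId, now);
  }

  // Decide which connection a read for this user should use.
  connectionFor(userId, now = Date.now()) {
    const last = this.lastWriteAt.get(userId);
    const pinned = last !== undefined && now - last < this.stickyMs;
    return pinned ? 'primary' : 'replica';
  }
}
```

The service layer would consult this decision instead of hardcoding replica reads per method, trading a small amount of primary load for read-your-writes consistency.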
Strategy 6: Caching Layers
The fastest database query is the one you never make. Caching is the final layer of my optimization strategy.
Redis for Hot Data
const Redis = require('ioredis');
const redis = new Redis(process.env.REDIS_URL);
async function getSectorAnalytics(sectorId) {
const cacheKey = `sector:${sectorId}:analytics`;
const cached = await redis.get(cacheKey);
if (cached) {
return JSON.parse(cached);
}
const data = await db('mv_sector_analytics')
.where('sector_id', sectorId)
.first();
await redis.setex(cacheKey, 900, JSON.stringify(data)); // 15-minute TTL
return data;
}
Cache Invalidation Strategy
Cache invalidation is famously one of the two hard problems in computer science. My approach is pragmatic:
- TTL-based expiration for data that can tolerate staleness (analytics, aggregations).
- Event-driven invalidation for data that must be fresh (user profiles, active listings). When a vacancy is created or updated, I publish an event that invalidates the relevant cache keys.
- Cache warming for predictable access patterns. At VacancySoft, I pre-warm the cache for the top 50 sectors and top 100 organisations every 15 minutes, so the first user to access these pages never experiences a cache miss.
// Event-driven cache invalidation
eventBus.on('vacancy:created', async (vacancy) => {
await redis.del(`sector:${vacancy.sector_id}:analytics`);
await redis.del(`org:${vacancy.organisation_id}:stats`);
await redis.del('homepage:featured_vacancies');
});
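Cache warming can reuse the same read path: iterate the known-hot keys and call the loader, which populates the cache on a miss exactly like getSectorAnalytics above. A sketch, with the function name and the loader parameter being hypothetical:

```javascript
// Pre-populate cache entries for predictable hot keys so the first
// real request never pays the cache-miss penalty. The loader is
// expected to write through to the cache on a miss.
async function warmCaches(hotSectorIds, loadSectorAnalytics) {
  const results = [];
  for (const sectorId of hotSectorIds) {
    // Sequential on purpose: warming is background work and should
    // not stampede the database with concurrent misses.
    results.push(await loadSectorAnalytics(sectorId));
  }
  return results;
}
```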
Results: Before and After
Here is a summary of the measurable impact these strategies produced across my career:
| Company | Metric | Before | After | Improvement |
|---|---|---|---|---|
| VacancySoft | Average query execution time | 850ms | 340ms | 60% reduction |
| VacancySoft | Dashboard load time | 3.8s | 0.4s | 89% reduction |
| VacancySoft | p95 API response time | 1.2s | 180ms | 85% reduction |
| Univelcity | API response time (avg) | 620ms | 340ms | 45% reduction |
| Univelcity | Queries per page load | 45 | 8 | 82% reduction |
| Lordwin Group | Real-time query latency | 220ms | 35ms | 84% reduction |
| 2am Tech | Organisation dashboard load | 3.1s | 400ms | 87% reduction |
Key Takeaways
- Profile before you optimize. EXPLAIN ANALYZE and slow query logs should be your starting point, not your intuition.
- Composite and partial indexes are the highest-leverage optimization in most applications. One well-placed index can eliminate 99% of query time.
- ORMs are productivity tools, not performance tools. Review the SQL they generate, especially for complex queries.
- Materialized views and strategic denormalization are not anti-patterns — they are essential tools for read-heavy workloads.
- Connection pooling is mandatory once you are handling more than a few hundred concurrent connections.
- Caching is the last layer, not the first. Optimize the underlying queries before adding cache complexity.
Database optimization is not a one-time project. It is an ongoing discipline. The queries that perform well today may become bottlenecks tomorrow as data grows and access patterns change. Build profiling and monitoring into your workflow, and treat database performance as a first-class engineering concern.