Introduction
At Encrisoft, we are genuinely excited about Alerta, our flexible instant-alerting API built to fit the unique needs of businesses. It has been amazing to watch it in action: more than 80 million alerts delivered so far, helping companies stay on top of things with timely, tailored notifications that make a real difference.
But here’s the thing about growth: it’s a double-edged sword. More alerts means more value for customers, but it also means more pressure on the system. Over time, we started noticing signs of strain:
- Dashboards that once loaded instantly began to lag.
- Queries that were lightning-fast slowed to a crawl.
- Uptime felt fragile because the database was under stress.
This article is a technical deep-dive into how we fixed that. You’ll see exactly how we:
- Optimised PostgreSQL with the right indexes.
- Introduced Redis caching (the smart way).
- Achieved a 20× speedup without adding a single new server.
And most importantly, you’ll get the code, reasoning, and metrics so you can apply the same lessons in your own systems.
The Scale Problem
Let’s make it concrete.
- Our `Alert` table grew past 80 million rows, and it was still growing every month.
- Queries that powered critical dashboards, things like "how many alerts in the last 30 days?" or "show me transaction counts by type", started taking seconds instead of milliseconds.
- Even modest delays caused real pain. For businesses that rely on Alerta to see what's happening right now, waiting two seconds for a dashboard to load is like driving with foggy glasses.
It was clear: the system wasn't scaling gracefully.
Indexes to the Rescue
We started by looking at the data. PostgreSQL has a wonderful feature: slow query logs. By turning those on, we could see exactly which queries were dragging.
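For reference, the slow-query threshold can be set directly from SQL. This is a minimal sketch: the 250 ms value is illustrative, not the exact threshold we used.

```sql
-- Log every statement that takes longer than 250 ms (value is illustrative)
ALTER SYSTEM SET log_min_duration_statement = 250;
SELECT pg_reload_conf();
```

With this in place, any statement exceeding the threshold appears in the PostgreSQL log with its duration, which is exactly the data the next step is built on.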
The culprit? Sequential scans. Our queries were walking through millions of rows because they didn’t have the right indexes to guide them.
So we fixed that by creating composite indexes, indexes tailored specifically to the queries users were actually running:
```sql
-- For queries by user and time
CREATE INDEX CONCURRENTLY idx_alert_uid_createdat
ON "Alert" ("userId", "createdAt" DESC);

-- For queries by user, type, and time
CREATE INDEX CONCURRENTLY idx_alert_uid_type_createdat
ON "Alert" ("userId", "type", "createdAt" DESC);
```
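To check that the planner actually picks the new indexes up, you can EXPLAIN the dashboard queries they were built for; the plan should show an index scan rather than a sequential scan. The query shapes below are reconstructions of the dashboard examples mentioned earlier, and the literal user id and 30-day window are illustrative.

```sql
-- "Alerts in the last 30 days" — should use idx_alert_uid_createdat
EXPLAIN ANALYZE
SELECT count(*)
FROM "Alert"
WHERE "userId" = 'user-123'
  AND "createdAt" >= now() - interval '30 days';

-- "Transaction counts by type" — should use idx_alert_uid_type_createdat
EXPLAIN ANALYZE
SELECT "type", count(*)
FROM "Alert"
WHERE "userId" = 'user-123'
  AND "createdAt" >= now() - interval '30 days'
GROUP BY "type";
```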
The impact was immediate:
- Dashboard queries that once took ~2 seconds now returned in ~100 ms.
- Database CPU usage during peak hours dropped by 60%.
One small change, just two indexes, transformed performance.
Redis Caching (Smart, Per-User)
Indexes solved the first problem, but we noticed another: users kept hitting the same dashboards again and again. That meant the same queries were being re-run, even though the results hadn't changed.
The solution? Redis caching.
But we didn’t just cache everything blindly. Instead, we cached at the per-user level, so each user got their own cached results, isolated from others.
Here’s what the cache key looked like:
`u:${userId}:alert_count:${filter}:${start}:${end}`
And here’s what that meant:
- `u:${userId}` → cache scoped to a specific user (no data leaks).
- `alert_count` → the kind of query.
- `${filter}:${start}:${end}` → captures the parameters so the cache is precise.
With a short TTL (time-to-live) of 60 seconds, users received nearly instant responses most of the time and had access to fresh data whenever needed.
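The scheme above can be sketched end-to-end. This is a minimal illustration, not our production code: the `cache` object is an in-memory stand-in for Redis (a real deployment would use a client such as ioredis, with the same GET/SET-with-TTL semantics), and `alertCountKey`, `getAlertCount`, and `queryDb` are hypothetical names.

```typescript
// In-memory stand-in for Redis, so the sketch is self-contained.
type CacheEntry = { value: string; expiresAt: number };
const store = new Map<string, CacheEntry>();

// Minimal GET / SET-with-TTL, mirroring Redis semantics.
const cache = {
  get(key: string): string | null {
    const entry = store.get(key);
    if (!entry || entry.expiresAt <= Date.now()) return null;
    return entry.value;
  },
  set(key: string, value: string, ttlSeconds: number): void {
    store.set(key, { value, expiresAt: Date.now() + ttlSeconds * 1000 });
  },
};

// The key format from the article: scoped to one user, one query kind,
// and the exact filter/time-range parameters.
function alertCountKey(userId: string, filter: string, start: string, end: string): string {
  return `u:${userId}:alert_count:${filter}:${start}:${end}`;
}

// Get-or-compute: serve a fresh cached value when present; otherwise run
// the database query (passed in as a callback here) and cache the result
// with a 60-second TTL.
async function getAlertCount(
  userId: string,
  filter: string,
  start: string,
  end: string,
  queryDb: () => Promise<number>,
): Promise<number> {
  const key = alertCountKey(userId, filter, start, end);
  const cached = cache.get(key);
  if (cached !== null) return Number(cached);

  const count = await queryDb();
  cache.set(key, String(count), 60);
  return count;
}
```

The per-user prefix means one user's cached counts can never be served to another, and the short TTL bounds staleness at one minute without any explicit invalidation logic.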
The results:
- Repeat queries came back in <50 ms.
- Database load dropped even further, with 80% fewer queries during peak.
Results Snapshot
Here’s the before-and-after in one table:
| Metric | Before Optimization | After Optimization |
|---|---|---|
| Dashboard load time | ~2 seconds | ~100–150 ms |
| Query load on database | Very high | 70–80% lower |
| CPU usage during peak | 80–90% | 30–40% |
Alerta went from fragile at millions of rows to scalable at billions.
Personal Lessons as a Founder
Looking back, here’s what this journey taught me:
- Measure before you optimise: We didn’t guess, we looked at slow query logs. That saved us from “fixing” the wrong thing.
- Index smart, not more: Two targeted composite indexes delivered massive gains. Randomly adding indexes just bloats your database.
- Cache with precision: User-level caching avoided the pitfalls of global caching (like data leaks or stale results).
- Scaling ≠ buying hardware: Throwing servers at the problem would have been expensive and temporary. Engineering finesse gave us a long-term win.
Closing
Performance is more than numbers; it’s about trust.
When dashboards respond instantly, users feel confident. When queries drag, even for a second, trust erodes.
At Encrisoft, we restored that confidence in Alerta with nothing more than two indexes and smart caching. The result was a 20× speedup, a database that runs cooler, and a platform ready for billions of alerts.
The best part? These aren't exotic tricks. Any engineering team can apply them. Measure, index smartly, cache wisely, and you'll be amazed how far your existing infrastructure can go.