DEV Community: Gauthamram Ravichandran

Real-Time APIs Are Simpler Than You Think: Redis, Lua, and 4k Updates/sec

Gauthamram Ravichandran — Sat, 16 May 2026 17:35:58 +0000

1. Intro — The Problem We Actually Had

Building real-time systems is often presented as a distributed systems problem.

Kafka, stream processors, event buses, fanout pipelines, multiple caches — the architecture diagrams usually become complicated very quickly.

But the problem we were trying to solve was actually much simpler.

We were ingesting thousands of live crypto price updates per second from exchange WebSocket streams. The frontend already consumed those streams directly for ultra-low latency updates. What users still needed, however, was a fast REST API capable of serving:

sortable market data
filtered leaderboards
paginated live results
near real-time prices

Streaming data is one thing. Querying live data efficiently is another.

Our initial attempts with PostgreSQL struggled under constant high-frequency writes combined with sorted, read-heavy queries. At the same time, we wanted to avoid introducing unnecessary infrastructure complexity.

We didn’t want Kafka, distributed stream processors, or elaborate event-driven systems unless we absolutely needed them.

What we eventually built was much smaller, much simpler, and surprisingly fast.

2. Redis as the Real-Time State Layer

The ingestion side of the system was fairly straightforward. ECS services maintained persistent WebSocket connections to multiple exchanges, continuously consuming live market updates at roughly 3k–4k messages per second.

Before storing anything, the incoming data was normalized into a consistent internal format. Different exchanges exposed different payload structures, symbol formats, and price representations, so having a normalized layer early in the pipeline simplified everything downstream.
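A minimal sketch of what such a normalization layer could look like. The two payload shapes, the source names `exchange_a` and `exchange_b`, and the `Tick` fields are hypothetical; the actual exchange formats are not shown in this post.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Tick:
    """Internal format shared by everything downstream (hypothetical fields)."""
    symbol: str      # canonical form, e.g. "BTC-USDT"
    price: Decimal   # Decimal avoids float rounding on prices
    ts_ms: int       # event time in milliseconds since the epoch


def normalize_exchange_a(msg: dict) -> Tick:
    # Hypothetical shape: {"s": "BTCUSDT", "p": "64250.10", "E": 1700000000000}
    raw = msg["s"]
    base, quote = raw[:-4], raw[-4:]  # assumes a 4-letter quote asset such as USDT
    return Tick(f"{base}-{quote}", Decimal(msg["p"]), int(msg["E"]))


def normalize_exchange_b(msg: dict) -> Tick:
    # Hypothetical shape: {"pair": "btc_usdt", "last": 64250.1, "time": 1700000000}
    base, quote = msg["pair"].upper().split("_")
    # str() first so the Decimal reflects the printed value, not the binary float
    return Tick(f"{base}-{quote}", Decimal(str(msg["last"])), int(msg["time"]) * 1000)


NORMALIZERS = {"exchange_a": normalize_exchange_a, "exchange_b": normalize_exchange_b}


def normalize(source: str, msg: dict) -> Tick:
    """Dispatch to the per-exchange normalizer."""
    return NORMALIZERS[source](msg)
```

With one function per exchange behind a single `normalize` entry point, everything after this step only ever sees `Tick` objects, regardless of which exchange produced the update.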

The frontend still connected directly to exchange WebSockets for ultra-low latency updates. We were not trying to replace streaming.

Instead, we needed a queryable real-time state layer.

WebSockets solved streaming. Redis solved queryability.

Redis became the central live data store for the API layer:

Strings stored mutable market payloads and price data
Sorted sets powered rankings and leaderboards
APIs queried Redis directly for near real-time market views

This model fit the workload surprisingly well.

The system was heavily read-oriented, latency-sensitive, and constantly mutating. Users wanted sorted and filterable market views, while prices were changing continuously underneath. Redis handled this naturally without introducing additional infrastructure layers or complicated synchronization logic.
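The write path for this model can be sketched roughly as follows. The key names (`market:<symbol>`, `rank:price`, `rank:volume_24h`) and the payload fields are assumptions made for illustration, and `r` is assumed to be a redis-py client.

```python
import json


def market_key(symbol: str) -> str:
    return f"market:{symbol}"  # hypothetical key naming


def store_tick(r, symbol: str, price: float, volume_24h: float) -> None:
    """Write one normalized update.

    `r` is assumed to be a redis-py client (redis.Redis) or any object with
    the same pipeline()/set()/zadd()/execute() methods.
    """
    payload = json.dumps({"symbol": symbol, "price": price, "volume_24h": volume_24h})
    pipe = r.pipeline(transaction=False)                 # batch writes into one round trip
    pipe.set(market_key(symbol), payload)                # string: latest mutable payload
    pipe.zadd("rank:price", {symbol: price})             # sorted set: rank by price
    pipe.zadd("rank:volume_24h", {symbol: volume_24h})   # sorted set: rank by volume
    pipe.execute()
```

Each update overwrites the previous payload and re-scores the symbol in every ranking, so the sorted sets always reflect the latest prices without a separate re-sort step.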

More importantly, Redis allowed us to separate two very different concerns:

streaming live updates to clients
querying live state efficiently

That distinction simplified the overall architecture considerably.

3. The Mistake: Doing Processing in Python

Our initial implementation was much simpler than the final version.

At first, we were not fully leveraging Redis data structures like sorted sets. Most of the live market data was stored using hashes and strings, while the actual sorting, filtering, and pagination logic lived inside the Python API layer.

The request flow looked something like this:

Fetch large datasets from Redis
Pull them across the network into Python
Perform sorting and filtering in-memory
Paginate the results
Return a much smaller response payload to the client
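The flow above can be sketched as follows. This is an illustration on synthetic data, not the original code: `raw_payloads` stands in for whatever a bulk read from Redis returned, and the field names are hypothetical.

```python
import json


def naive_page(raw_payloads: list[bytes], sort_field: str, min_volume: float,
               page: int, page_size: int) -> tuple[list[dict], int]:
    """Sort, filter and paginate inside the API process.

    Returns the requested page plus the number of bytes that had to be
    pulled from the data store to produce it.
    """
    bytes_pulled = sum(len(p) for p in raw_payloads)   # everything is transferred
    rows = [json.loads(p) for p in raw_payloads]       # everything is decoded
    rows = [row for row in rows if row["volume_24h"] >= min_volume]
    rows.sort(key=lambda row: row[sort_field], reverse=True)
    start = page * page_size
    return rows[start:start + page_size], bytes_pulled


# Synthetic data: 5,000 markets, but the client only asks for 20 of them.
payloads = [
    json.dumps({"symbol": f"SYM{i}", "price": float(i), "volume_24h": float(i % 100)}).encode()
    for i in range(5000)
]
page, bytes_pulled = naive_page(payloads, "price", min_volume=10.0, page=0, page_size=20)
bytes_returned = len(json.dumps(page).encode())
print(f"pulled {bytes_pulled} bytes to return {bytes_returned} bytes")
```

The gap between the two printed numbers is the cost paid on every request: the full dataset crosses the network even though only one page leaves the API.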

Conceptually, this felt reasonable at the time. Python gave us flexibility, and the implementation was straightforward to iterate on.

But as traffic increased, the inefficiency became obvious.

The issue was not that Redis was slow.

The issue was not that Python was slow either.

The real bottleneck was network transfer.

For every request, we were moving far more data than we actually needed. Even if the client only requested a small paginated result set, the API layer still had to fetch and process significantly larger datasets before trimming them down.

At small scale, this overhead was easy to ignore.

At thousands of requests per second, it became expensive very quickly.

This was the point where we started treating Redis less like a simple cache and more like a computation layer. Instead of bringing data to Python for processing, we started asking a different question:

How much of this work can happen directly inside Redis itself?

That shift ended up changing the entire performance profile of the system.

4. Moving Logic into Lua

The biggest improvement came from pushing more query logic directly into Redis using Lua scripts.

Instead of fetching large datasets into Python and processing them there, Redis handled:

sorting
pagination
slicing
partial filtering

The API layer now received only the small subset of records actually needed for the response.
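A minimal sketch of the idea, assuming a sorted set that ranks symbols and one string payload per symbol under a `market:` prefix. The script, key names, and function name are illustrative rather than the production code; `r` is assumed to be a redis-py client, whose `eval(script, numkeys, *keys_and_args)` call runs a Lua script on the server.

```python
import json

# Lua runs inside Redis: the rank lookup and payload reads happen server-side,
# so only one page of payloads is sent back to the caller.
PAGE_SCRIPT = """
local ids = redis.call('ZREVRANGE', KEYS[1], ARGV[1], ARGV[2])
local out = {}
for i, id in ipairs(ids) do
  out[i] = redis.call('GET', ARGV[3] .. id)
end
return out
"""


def top_markets(r, rank_key: str, page: int, page_size: int) -> list[dict]:
    """Return one page of markets ordered by descending score."""
    start = page * page_size
    stop = start + page_size - 1  # ZREVRANGE bounds are inclusive
    raw = r.eval(PAGE_SCRIPT, 1, rank_key, start, stop, "market:")
    # A symbol whose payload key is missing comes back as None; skip it.
    return [json.loads(item) for item in raw if item is not None]
```

Two caveats on this sketch: the script builds payload key names dynamically instead of declaring them in `KEYS`, which is fine on a single Redis instance but not safe on Redis Cluster, and `ZREVRANGE` is deprecated from Redis 6.2 in favor of `ZRANGE ... REV`.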

This drastically reduced network transfer between Redis and the application layer, which ended up having a much bigger impact than micro-optimizing Python itself.

The important realization was simple:

We stopped moving unnecessary data across the network.

Once we made that shift, latency dropped significantly and the overall system became much more efficient without adding architectural complexity.

5. Keeping the Architecture Operationally Simple

One of the deliberate decisions throughout the system was avoiding unnecessary infrastructure complexity.

We relied mostly on existing operational building blocks:

ECS for long-running WebSocket consumers and APIs
Redis for real-time state and querying
Aurora PostgreSQL for static metadata

There were no dedicated stream processors, event buses, or distributed processing pipelines in the middle.

That decision was partly architectural and partly operational. Systems that process real-time data tend to become complicated very quickly, especially when every new scaling concern introduces another moving piece.

In our case, Redis already solved the core problem effectively enough that introducing additional infrastructure would have added more operational burden than actual value.

6. Lessons Learned

A few things stood out while building this system.

Real-time APIs are mostly data locality problems

The biggest latency improvements did not come from optimizing Python or scaling infrastructure. They came from reducing unnecessary data movement between Redis and the application layer.

Moving computation closer to the data mattered far more than expected.

Redis can be much more than a cache

A lot of systems use Redis as a temporary caching layer sitting in front of a “real” database.

In this case, Redis became the actual real-time query engine:

mutable live state
sorted rankings
pagination
near real-time querying

That shift simplified the architecture considerably.

Streaming and querying are different problems

WebSockets solved live delivery.

Redis solved queryability.

Those two concerns are often grouped together under “real-time systems,” but they behave very differently operationally.

Simplicity scales surprisingly far

We intentionally avoided introducing additional distributed systems complexity unless it became absolutely necessary.

Just Redis, Python, ECS, and existing infrastructure.

Not every optimization needs to happen inside Redis

One interesting thing we never moved into Redis was asset search functionality.

Searching by asset names or symbols still happened entirely in Python using pandas-based processing over the full dataset. We did consider implementing search logic directly inside Redis through Lua scripts, but unlike sorting and pagination, this never became a meaningful bottleneck in practice.

Even operating on the full dataset, the latency remained acceptable for our workload, so adding more complexity simply was not worth it.

That was another useful reminder:

Not every bottleneck deserves architectural complexity.

Thanks for reading.
Feel free to connect with me on LinkedIn and check out some of my work on GitHub.

Database Partitioning Lessons That Apply Directly to Embedding Storage

Gauthamram Ravichandran — Sun, 15 Feb 2026 13:21:32 +0000

I’ve worked on large partitioned PostgreSQL systems (tens of TBs, heavy ingestion, high query fan-out).

One thing that’s been interesting while diving into AI infrastructure:
Embedding storage systems repeat many of the same distributed storage lessons we already learned in relational systems.
The difference? The failure modes are just harder to see.
Let’s break down a few parallels.

1. Hot Partitions Kill Performance (Even in Vector Systems)

In large time-series databases, recent partitions get hammered:

Most writes go to the newest partition
Most reads target recent data
Vacuum and index maintenance concentrate there

You get a write hotspot and a read hotspot at the same time.

Embedding systems have similar patterns:

Frequently queried namespaces
Recently ingested documents
Popular tenants
Trending content

If all of that lands in a single logical index (or shard), your ANN structure becomes the hotspot. Even HNSW doesn’t save you here. Why?
Because ANN search still depends on memory locality and graph traversal efficiency.

When a single shard handles most of the traffic, cache pressure and graph traversal depth increase along with memory fragmentation, while CPU/GPU utilization stays uneven across shards. Partitioning strategy often matters more than the choice of ANN algorithm.
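One rough way to spot this kind of hotspot is to compare the busiest shard against the average. The sketch below assumes a hypothetical query log with one shard id per served query; it is a monitoring illustration, not part of any particular vector database.

```python
from collections import Counter


def shard_skew(query_log: list[str]) -> tuple[str, float]:
    """Return the hottest shard and its load relative to the mean shard load.

    A ratio near 1.0 means traffic is even; a large ratio means one shard is
    hot. Shards that served no queries do not appear in the log, so the mean
    is taken over the shards that were actually hit.
    """
    counts = Counter(query_log)
    mean = sum(counts.values()) / len(counts)
    shard, hits = counts.most_common(1)[0]
    return shard, hits / mean


log = ["s0"] * 80 + ["s1"] * 10 + ["s2"] * 10
print(shard_skew(log))
```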

2. Partition Pruning > Index Cleverness

In large relational systems, performance gains don’t come from better indexes alone. They come from avoiding touching irrelevant data entirely.
Partition pruning reduces the working set before the query planner even considers index scans.
In embedding systems, the equivalent pattern is:

Tenant-level isolation
Time-based segmentation
Metadata filtering before vector search

If you’re not aggressively reducing the candidate set before running similarity search, you’re just brute-forcing with extra math.
For example:
Instead of:

Search entire 100M vector space

You do:

Filter tenant + content_type + time range
Then run ANN on the reduced subset

Cost per query drops and latency improves.
Just classic database engineering.
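A small sketch of the filter-then-search pattern. Exact cosine scoring stands in for the ANN index here so the example stays self-contained, and the `Doc` fields are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Doc:
    tenant: str
    content_type: str
    ts: int                 # ingestion time, epoch seconds
    vector: list[float]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filtered_search(docs, query, tenant, content_type, since_ts, k=3):
    """Return the top-k matches and the number of candidates actually scored."""
    # Step 1: prune by metadata so the similarity step sees fewer candidates.
    candidates = [
        d for d in docs
        if d.tenant == tenant and d.content_type == content_type and d.ts >= since_ts
    ]
    # Step 2: rank only the survivors. Exact scoring stands in for the ANN index.
    ranked = sorted(candidates, key=lambda d: cosine(query, d.vector), reverse=True)
    return ranked[:k], len(candidates)
```

The second return value is the useful one to watch: it is the size of the working set after pruning, which is what drives cost per query.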

3. Too Many Partitions Is a Problem

Partitioning is powerful, but only with the right partition count.
In large Postgres systems, too many partitions cause:

Increased planning time
Metadata bloat
Autovacuum lag
More file handles

In embedding systems:

Too many small indexes fragment memory
Background rebuild jobs multiply
GPU memory utilization becomes uneven

Sharding everything blindly is not architecture. It’s postponing hard decisions.

The sweet spot usually involves:

Logical grouping
Monitoring slow queries
Periodic rebalancing

Exactly like relational systems.
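As one possible reading of "logical grouping", small tenants can share a shard instead of each getting its own index. The sketch below uses a greedy first-fit-decreasing packing; the function name, the `sizes` mapping, and `shard_capacity` are all hypothetical.

```python
def group_tenants(sizes: dict[str, int], shard_capacity: int) -> list[list[str]]:
    """Pack tenants into shared shards, largest tenants first.

    `sizes` maps tenant -> vector count. A tenant larger than
    `shard_capacity` ends up with a shard to itself.
    """
    shards: list[list[str]] = []
    free: list[int] = []  # remaining capacity per shard
    for tenant, size in sorted(sizes.items(), key=lambda kv: kv[1], reverse=True):
        for i, room in enumerate(free):
            if size <= room:  # first shard with enough room
                shards[i].append(tenant)
                free[i] -= size
                break
        else:  # nothing fits: open a new shard
            shards.append([tenant])
            free.append(max(shard_capacity - size, 0))
    return shards
```

For example, `group_tenants({"a": 900, "b": 500, "c": 400, "d": 100, "e": 50}, 1000)` packs five tenants into two shards rather than five separate indexes.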

4. Re-indexing Cost Is Underestimated

In traditional databases, index rebuilds are expensive and operationally sensitive.
In embedding systems, the cost is even more dangerous, because model evolution forces:

Full re-embeddings
Bulk writes
Index rebuilds
Graph regeneration

If you don’t design storage with model versioning in mind, you’ll eventually hit:

Dual-index storage spikes
Downtime windows
Cost explosions

A practical pattern:

Store embeddings with a model_version column
Maintain versioned indexes
Gradually phase traffic
Garbage collect old versions

Treat model upgrades like schema migrations.
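The pattern above can be sketched as a small in-memory router. The class and method names are hypothetical, and a plain list stands in for each versioned index.

```python
import hashlib
from typing import Optional


class VersionedIndexRouter:
    """Route queries between per-model-version indexes during a rollout."""

    def __init__(self, stable: str):
        self.indexes = {stable: []}            # model_version -> rows
        self.stable = stable
        self.candidate: Optional[str] = None
        self.candidate_pct = 0                 # 0..100 share of traffic

    def add(self, model_version: str, doc_id: str, vector: list) -> None:
        # Every row carries the version of the model that produced it.
        self.indexes.setdefault(model_version, []).append((doc_id, vector))

    def start_rollout(self, candidate: str, pct: int) -> None:
        # Call again with a higher pct to phase more traffic over.
        self.indexes.setdefault(candidate, [])
        self.candidate, self.candidate_pct = candidate, pct

    def version_for(self, request_id: str) -> str:
        # Stable hash so a given request id always lands on the same version.
        bucket = int(hashlib.sha256(request_id.encode()).hexdigest(), 16) % 100
        if self.candidate is not None and bucket < self.candidate_pct:
            return self.candidate
        return self.stable

    def promote(self) -> None:
        # Finish the migration and garbage collect the old version's index.
        if self.candidate is None:
            raise RuntimeError("no rollout in progress")
        old, self.stable = self.stable, self.candidate
        self.candidate, self.candidate_pct = None, 0
        del self.indexes[old]
```

While a rollout is in progress both indexes exist side by side, which is the dual-index storage cost mentioned earlier; `promote` is the point where that cost is released.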

5. Memory Locality Still Wins

ANN performance depends heavily on memory layout.

Fragmented shards → worse locality → more cache misses → higher tail latency.

The same principle applies to B-tree depth and BRIN locality. Even in “AI systems,” performance still collapses into memory access patterns.

The fundamentals haven’t changed.

6. At Scale, Embedding Systems Become Storage Systems

I initially thought embedding systems were purely ML-driven. That’s certainly true at the POC stage.

Once you go to production, the worry shifts to handling large volumes of data efficiently. As in relational systems, the work becomes tuning partition boundaries, managing index lifecycles, optimizing candidate pruning, and managing cost per query.

You’re building distributed storage systems. Vector math is just one layer.

Final Thought

As embedding systems move from prototypes to production, the conversation shifts.

From:

“How do we compute similarity?”

To:

“How do we scale, partition, isolate, and evolve this safely?”

The deeper you go, the more familiar the problems look.

And maybe that’s the exciting part!
AI systems are pushing us to revisit distributed systems fundamentals with a new set of constraints.

Curious how others are approaching partitioning and model versioning in real-world deployments.