1. Intro — The Problem We Actually Had
Building real-time systems is often presented as a distributed systems problem.
Kafka, stream processors, event buses, fanout pipelines, multiple caches — the architecture diagrams usually become complicated very quickly.
But the problem we were trying to solve was actually much simpler.
We were ingesting thousands of live crypto price updates per second from exchange WebSocket streams. The frontend already consumed those streams directly for ultra-low latency updates. What users still needed, however, was a fast REST API capable of serving:
- sortable market data
- filtered leaderboards
- paginated live results
- near real-time prices
Streaming data is one thing. Querying live data efficiently is another.
Our initial attempts with PostgreSQL struggled under constant high-frequency writes combined with read-heavy, sort-dominated workloads. At the same time, we wanted to avoid introducing unnecessary infrastructure complexity.
We didn’t want Kafka, distributed stream processors, or elaborate event-driven systems unless we absolutely needed them.
What we eventually built was much smaller, much simpler, and surprisingly fast.
2. Redis as the Realtime State Layer
The ingestion side of the system was fairly straightforward. ECS services maintained persistent WebSocket connections to multiple exchanges, continuously consuming live market updates at roughly 3k–4k messages per second.
Before storing anything, the incoming data was normalized into a consistent internal format. Different exchanges exposed different payload structures, symbol formats, and price representations, so having a normalized layer early in the pipeline simplified everything downstream.
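As a rough illustration, normalization boiled down to mapping every exchange payload onto one internal shape as early as possible. The field mappings below follow the public Binance and Coinbase feeds, but treat this as a sketch rather than our production schema:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Tick:
    """Normalized internal form of one market update."""
    symbol: str   # canonical symbol, e.g. "BTC-USD"
    price: float
    ts: float     # event time, epoch seconds

def normalize(exchange: str, payload: dict) -> Tick:
    # Each exchange ships a different payload shape; collapse them
    # into one structure before anything touches storage.
    if exchange == "binance":
        # trade stream: s = symbol, p = price, E = event time (ms)
        return Tick(payload["s"], float(payload["p"]), payload["E"] / 1000.0)
    if exchange == "coinbase":
        # ticker channel: decimal price as string, ISO 8601 timestamp
        ts = datetime.fromisoformat(payload["time"].replace("Z", "+00:00"))
        return Tick(payload["product_id"], float(payload["price"]), ts.timestamp())
    raise ValueError(f"unhandled exchange: {exchange}")
```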
The frontend still connected directly to exchange WebSockets for ultra-low latency updates. We were not trying to replace streaming.
Instead, we needed a queryable real-time state layer.
WebSockets solved streaming. Redis solved queryability.
Redis became the central live data store for the API layer:
- Strings stored mutable market payloads and price data
- Sorted sets powered rankings and leaderboards
- APIs queried Redis directly for near real-time market views
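To make that concrete, here is a minimal sketch of the write path; the key names (`market:{symbol}`, `rank:price`) are illustrative, not our actual schema:

```python
import json
import redis

r = redis.Redis(decode_responses=True)

def apply_update(symbol: str, price: float, payload: dict) -> None:
    # One round trip for both writes; no transaction needed since
    # every update simply overwrites the previous state.
    with r.pipeline(transaction=False) as pipe:
        # String: the full mutable market payload for this symbol.
        pipe.set(f"market:{symbol}", json.dumps(payload))
        # Sorted set: scored by price, so a ranked view is a single
        # ZRANGE instead of a scan-and-sort.
        pipe.zadd("rank:price", {symbol: price})
        pipe.execute()

# A leaderboard page is then a direct query: top 50 by price.
top_50 = r.zrevrange("rank:price", 0, 49, withscores=True)
```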
This model fit the workload surprisingly well.
The system was heavily read-oriented, latency-sensitive, and constantly mutating. Users wanted sorted and filterable market views, while prices were changing continuously underneath. Redis handled this naturally without introducing additional infrastructure layers or complicated synchronization logic.
More importantly, Redis allowed us to separate two very different concerns:
- streaming live updates to clients
- querying live state efficiently
That distinction simplified the overall architecture considerably.
3. The Mistake: Doing Processing in Python
Our initial implementation was much simpler than the final version.
At first, we were not fully leveraging Redis data structures like sorted sets. Most of the live market data was stored using hashes and strings, while the actual sorting, filtering, and pagination logic lived inside the Python API layer.
The request flow looked something like this:
- Fetch large datasets from Redis
- Pull them across the network into Python
- Perform sorting and filtering in-memory
- Paginate the results
- Return a much smaller response payload to the client
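Reconstructed in code, the early read path looked roughly like this (key layout and field names are again illustrative):

```python
import json
import redis

r = redis.Redis(decode_responses=True)

def get_markets_slow(sort_field: str, offset: int, limit: int) -> list[dict]:
    # Steps 1-2: pull *every* market payload across the network,
    # even though the caller only wants one page of it.
    keys = r.keys("market:*")  # full keyspace scan
    if not keys:
        return []
    rows = [json.loads(v) for v in r.mget(keys) if v is not None]

    # Step 3: sort (and filter) in Python memory.
    rows.sort(key=lambda row: row[sort_field], reverse=True)

    # Steps 4-5: trim to one page and return the small slice.
    return rows[offset:offset + limit]
```

Every call paid for the full dataset, while the client only ever saw the slice.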
Conceptually, this felt reasonable at the time. Python gave us flexibility, and the implementation was straightforward to iterate on.
But as traffic increased, the inefficiency became obvious.
The issue was not that Redis was slow.
The issue was not that Python was slow either.
The real bottleneck was network transfer.
For every request, we were moving far more data than we actually needed. Even if the client only requested a small paginated result set, the API layer still had to fetch and process significantly larger datasets before trimming them down.
At small scale, this overhead was easy to ignore.
At thousands of requests per second, it became expensive very quickly.
This was the point where we started treating Redis less like a simple cache and more like a computation layer. Instead of bringing data to Python for processing, we started asking a different question:
How much of this work can happen directly inside Redis itself?
That shift ended up changing the entire performance profile of the system.
4. Moving Logic into Lua
The biggest improvement came from pushing more query logic directly into Redis using Lua scripts.
Instead of fetching large datasets into Python and processing them there, Redis handled:
- sorting
- pagination
- slicing
- partial filtering
The API layer now received only the small subset of records actually needed for the response.
This drastically reduced network transfer between Redis and the application layer, which ended up having a much bigger impact than micro-optimizing Python itself.
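A simplified version of the pattern, assuming the string/sorted-set layout sketched earlier; the real scripts also handled filtering, but the core idea fits in a few lines:

```python
import redis

r = redis.Redis(decode_responses=True)

# Rank lookup, payload fetch, and pagination all run inside Redis,
# so only the requested page ever crosses the network.
PAGE_SCRIPT = r.register_script("""
local symbols = redis.call('ZREVRANGE', KEYS[1], ARGV[1], ARGV[2])
local page = {}
for i, symbol in ipairs(symbols) do
    -- dynamic keys like this are fine on a single node,
    -- but not safe in Redis Cluster
    page[i] = redis.call('GET', 'market:' .. symbol)
end
return page
""")

def get_markets_fast(offset: int, limit: int) -> list[str]:
    # Returns `limit` JSON payloads, already sorted by rank.
    return PAGE_SCRIPT(keys=["rank:price"], args=[offset, offset + limit - 1])
```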
The important realization was simple:
We stopped moving unnecessary data across the network.
Once we made that shift, latency dropped significantly and the overall system became much more efficient without adding architectural complexity.
5. Keeping the Architecture Operationally Simple
One of the deliberate decisions throughout the system was avoiding unnecessary infrastructure complexity.
We relied mostly on existing operational building blocks:
- ECS for long-running WebSocket consumers and APIs
- Redis for real-time state and querying
- Aurora PostgreSQL for static metadata
There were no dedicated stream processors, event buses, or distributed processing pipelines in the middle.
That decision was partly architectural and partly operational. Systems that process real-time data tend to become complicated very quickly, especially when every new scaling concern introduces another moving piece.
In our case, Redis already solved the core problem effectively enough that introducing additional infrastructure would have added more operational burden than actual value.
6. Lessons Learned
A few things stood out while building this system.
Real-time APIs are mostly data locality problems
The biggest latency improvements did not come from optimizing Python or scaling infrastructure. They came from reducing unnecessary data movement between Redis and the application layer.
Moving computation closer to the data mattered far more than expected.
Redis can be much more than a cache
A lot of systems use Redis as a temporary caching layer sitting in front of a “real” database.
In this case, Redis became the actual real-time query engine:
- mutable live state
- sorted rankings
- pagination
- near real-time querying
That shift simplified the architecture considerably.
Streaming and querying are different problems
WebSockets solved live delivery.
Redis solved queryability.
Those two concerns are often grouped together under “real-time systems,” but they behave very differently operationally.
Simplicity scales surprisingly far
We intentionally avoided introducing additional distributed systems complexity unless it became absolutely necessary.
Just Redis, Python, ECS, and existing infrastructure.
Not every optimization needs to happen inside Redis
One interesting thing we never moved into Redis was asset search functionality.
Searching by asset names or symbols still happened entirely in Python using pandas-based processing over the full dataset. We did consider implementing search logic directly inside Redis through Lua scripts, but unlike sorting and pagination, this never became a meaningful bottleneck in practice.
Even operating on the full dataset, the latency remained acceptable for our workload, so adding more complexity simply was not worth it.
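For reference, the search path was essentially a full-scan substring match in pandas, something along these lines (column names and data loading are illustrative):

```python
import pandas as pd

# The full asset universe: a few thousand rows, small enough that
# scanning it per request stayed well within our latency budget.
assets = pd.DataFrame({
    "symbol": ["BTC-USD", "ETH-USD", "SOL-USD"],
    "name": ["Bitcoin", "Ethereum", "Solana"],
})

def search_assets(query: str, limit: int = 20) -> pd.DataFrame:
    q = query.strip().lower()
    # Case-insensitive substring match over both names and symbols.
    mask = (
        assets["name"].str.lower().str.contains(q, regex=False)
        | assets["symbol"].str.lower().str.contains(q, regex=False)
    )
    return assets[mask].head(limit)
```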
That was another useful reminder:
Not every bottleneck deserves architectural complexity.
Thanks for reading.
Feel free to connect with me on LinkedIn and check out some of my work on GitHub.