<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Gauthamram Ravichandran</title>
    <description>The latest articles on DEV Community by Gauthamram Ravichandran (@gauthamram_ravichandran).</description>
    <link>https://dev.to/gauthamram_ravichandran</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3773589%2Fc6bccf24-81cd-49cc-8fa4-973a02e80fca.png</url>
      <title>DEV Community: Gauthamram Ravichandran</title>
      <link>https://dev.to/gauthamram_ravichandran</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/gauthamram_ravichandran"/>
    <language>en</language>
    <item>
      <title>Real-Time APIs Are Simpler Than You Think: Redis, Lua, and 4k Updates/sec</title>
      <dc:creator>Gauthamram Ravichandran</dc:creator>
      <pubDate>Sat, 16 May 2026 17:35:58 +0000</pubDate>
      <link>https://dev.to/gauthamram_ravichandran/real-time-apis-are-simpler-than-you-think-redis-lua-and-4k-updatessec-30o3</link>
      <guid>https://dev.to/gauthamram_ravichandran/real-time-apis-are-simpler-than-you-think-redis-lua-and-4k-updatessec-30o3</guid>
      <description>&lt;h2&gt;
  
  
  1. Intro — The Problem We Actually Had
&lt;/h2&gt;

&lt;p&gt;Building real-time systems is often presented as a distributed systems problem.&lt;/p&gt;

&lt;p&gt;Kafka, stream processors, event buses, fanout pipelines, multiple caches — the architecture diagrams usually become complicated very quickly.&lt;/p&gt;

&lt;p&gt;But the problem we were trying to solve was actually much simpler.&lt;/p&gt;

&lt;p&gt;We were ingesting thousands of live crypto price updates per second from exchange WebSocket streams. The frontend already consumed those streams directly for ultra-low latency updates. What users still needed, however, was a fast REST API capable of serving:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;sortable market data&lt;/li&gt;
&lt;li&gt;filtered leaderboards&lt;/li&gt;
&lt;li&gt;paginated live results&lt;/li&gt;
&lt;li&gt;near real-time prices&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Streaming data is one thing. Querying live data efficiently is another.&lt;/p&gt;

&lt;p&gt;Our initial attempts with PostgreSQL quickly became difficult under constant high-frequency writes combined with sorted read-heavy workloads. At the same time, we wanted to avoid introducing unnecessary infrastructure complexity.&lt;/p&gt;

&lt;p&gt;We didn’t want Kafka, distributed stream processors, or elaborate event-driven systems unless we absolutely needed them.&lt;/p&gt;

&lt;p&gt;What we eventually built was much smaller, much simpler, and surprisingly fast.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Redis as the Realtime State Layer
&lt;/h2&gt;

&lt;p&gt;The ingestion side of the system was fairly straightforward. ECS services maintained persistent WebSocket connections to multiple exchanges, continuously consuming live market updates at roughly 3k–4k messages per second.&lt;/p&gt;

&lt;p&gt;Before storing anything, the incoming data was normalized into a consistent internal format. Different exchanges exposed different payload structures, symbol formats, and price representations, so having a normalized layer early in the pipeline simplified everything downstream.&lt;/p&gt;
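
&lt;p&gt;To make the normalization step concrete, here is a minimal sketch. The two exchange payload shapes, the field names, and the &lt;code&gt;MarketUpdate&lt;/code&gt; type are hypothetical and only illustrate the idea; they are not the formats of any real exchange or of our production code.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MarketUpdate:
    """Normalized internal format (field names are illustrative)."""
    exchange: str
    symbol: str   # canonical form, e.g. "BTC-USDT"
    price: float
    ts_ms: int    # event time in milliseconds since the epoch


def normalize_exchange_a(raw: dict) -> MarketUpdate:
    # Hypothetical exchange A: {"s": "BTCUSDT", "p": "64000.5", "T": 1700000000000}
    symbol = raw["s"]
    base, quote = symbol[:-4], symbol[-4:]  # assumes a 4-letter quote asset
    return MarketUpdate("exchange_a", f"{base}-{quote}", float(raw["p"]), int(raw["T"]))


def normalize_exchange_b(raw: dict) -> MarketUpdate:
    # Hypothetical exchange B: {"pair": "btc_usdt", "last": 64000.5, "time": 1700000000.0}
    symbol = raw["pair"].upper().replace("_", "-")
    return MarketUpdate("exchange_b", symbol, float(raw["last"]), int(raw["time"] * 1000))
```

&lt;p&gt;Everything downstream only ever sees &lt;code&gt;MarketUpdate&lt;/code&gt;, so exchange-specific quirks stay in one small function per exchange.&lt;/p&gt;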

&lt;p&gt;The frontend still connected directly to exchange WebSockets for ultra-low latency updates. We were not trying to replace streaming.&lt;/p&gt;

&lt;p&gt;Instead, we needed a queryable real-time state layer.&lt;/p&gt;

&lt;p&gt;WebSockets solved streaming. Redis solved queryability.&lt;/p&gt;

&lt;p&gt;Redis became the central live data store for the API layer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Strings stored mutable market payloads and price data&lt;/li&gt;
&lt;li&gt;Sorted sets powered rankings and leaderboards&lt;/li&gt;
&lt;li&gt;APIs queried Redis directly for near real-time market views&lt;/li&gt;
&lt;/ul&gt;
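
&lt;p&gt;A rough sketch of that write path, assuming a redis-py style client that exposes &lt;code&gt;set&lt;/code&gt; and &lt;code&gt;zadd&lt;/code&gt;. The key names (&lt;code&gt;market:...&lt;/code&gt;, &lt;code&gt;rank:volume&lt;/code&gt;, &lt;code&gt;rank:price&lt;/code&gt;) and the payload fields are illustrative, not our actual schema.&lt;/p&gt;

```python
import json


def store_update(client, symbol: str, price: float, volume_24h: float) -> None:
    """Write one normalized update into both structures.

    `client` is assumed to be a redis-py style client exposing `set` and `zadd`.
    Key names ("market:...", "rank:...") are illustrative.
    """
    payload = json.dumps({"symbol": symbol, "price": price, "volume_24h": volume_24h})
    # String: latest mutable payload for this asset, overwritten on every update.
    client.set(f"market:{symbol}", payload)
    # Sorted sets: one ranking per sortable field, scored by that field.
    client.zadd("rank:volume", {symbol: volume_24h})
    client.zadd("rank:price", {symbol: price})
```

&lt;p&gt;At a few thousand updates per second it would probably make sense to batch these commands through a pipeline rather than issue three round trips per update.&lt;/p&gt;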

&lt;p&gt;This model fit the workload surprisingly well.&lt;/p&gt;

&lt;p&gt;The system was heavily read-oriented, latency-sensitive, and constantly mutating. Users wanted sorted and filterable market views, while prices were changing continuously underneath. Redis handled this naturally without introducing additional infrastructure layers or complicated synchronization logic.&lt;/p&gt;

&lt;p&gt;More importantly, Redis allowed us to separate two very different concerns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;streaming live updates to clients&lt;/li&gt;
&lt;li&gt;querying live state efficiently&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That distinction simplified the overall architecture considerably.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. The Mistake: Doing Processing in Python
&lt;/h2&gt;

&lt;p&gt;Our initial implementation was much simpler than the final version.&lt;/p&gt;

&lt;p&gt;At first, we were not fully leveraging Redis data structures like sorted sets. Most of the live market data was stored using hashes and strings, while the actual sorting, filtering, and pagination logic lived inside the Python API layer.&lt;/p&gt;

&lt;p&gt;The request flow looked something like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Fetch large datasets from Redis&lt;/li&gt;
&lt;li&gt;Pull them across the network into Python&lt;/li&gt;
&lt;li&gt;Perform sorting and filtering in-memory&lt;/li&gt;
&lt;li&gt;Paginate the results&lt;/li&gt;
&lt;li&gt;Return a much smaller response payload to the client&lt;/li&gt;
&lt;/ol&gt;
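
&lt;p&gt;In code, the flow above looked roughly like the sketch below. The list of dicts stands in for the dataset already pulled out of Redis, and the field names and the 5,000-row synthetic dataset are made up for illustration.&lt;/p&gt;

```python
def naive_page(rows, sort_field, min_volume, page, page_size):
    """Sort, filter, and paginate in the application layer.

    `rows` stands in for the full dataset already pulled out of Redis;
    every element crossed the network even though only one page is returned.
    """
    filtered = [r for r in rows if r["volume_24h"] >= min_volume]
    ordered = sorted(filtered, key=lambda r: r[sort_field], reverse=True)
    start = page * page_size
    return ordered[start:start + page_size]


# 5,000 synthetic assets fetched, 20 returned: a 250:1 transfer-to-response ratio.
rows = [
    {"symbol": f"A{i}", "price": float(i), "volume_24h": float(i % 100)}
    for i in range(5000)
]
top = naive_page(rows, "price", min_volume=50.0, page=0, page_size=20)
```

&lt;p&gt;The function itself is cheap. The cost is hidden in how &lt;code&gt;rows&lt;/code&gt; got there.&lt;/p&gt;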

&lt;p&gt;Conceptually, this felt reasonable at the time. Python gave us flexibility, and the implementation was straightforward to iterate on.&lt;/p&gt;

&lt;p&gt;But as traffic increased, the inefficiency became obvious.&lt;/p&gt;

&lt;p&gt;The issue was not that Redis was slow.&lt;/p&gt;

&lt;p&gt;The issue was not that Python was slow either.&lt;/p&gt;

&lt;p&gt;The real bottleneck was network transfer.&lt;/p&gt;

&lt;p&gt;For every request, we were moving far more data than we actually needed. Even if the client only requested a small paginated result set, the API layer still had to fetch and process significantly larger datasets before trimming them down.&lt;/p&gt;

&lt;p&gt;At small scale, this overhead was easy to ignore.&lt;/p&gt;

&lt;p&gt;At thousands of requests per second, it became expensive very quickly.&lt;/p&gt;

&lt;p&gt;This was the point where we started treating Redis less like a simple cache and more like a computation layer. Instead of bringing data to Python for processing, we started asking a different question:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;How much of this work can happen directly inside Redis itself?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That shift ended up changing the entire performance profile of the system.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Moving Logic into Lua
&lt;/h2&gt;

&lt;p&gt;The biggest improvement came from pushing more query logic directly into Redis using Lua scripts.&lt;/p&gt;

&lt;p&gt;Instead of fetching large datasets into Python and processing them there, Redis handled:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;sorting&lt;/li&gt;
&lt;li&gt;pagination&lt;/li&gt;
&lt;li&gt;slicing&lt;/li&gt;
&lt;li&gt;partial filtering&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The API layer now received only the small subset of records actually needed for the response.&lt;/p&gt;
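
&lt;p&gt;A simplified sketch of the idea, not our production script: a Lua script slices a sorted set and fetches the matching payloads in a single round trip. It assumes a redis-py style client with an &lt;code&gt;eval(script, numkeys, *keys_and_args)&lt;/code&gt; method; the key names and the &lt;code&gt;market:&lt;/code&gt; prefix are illustrative. Because the script builds payload keys that are not declared in &lt;code&gt;KEYS&lt;/code&gt;, it is only suitable for a single-node setup as written.&lt;/p&gt;

```python
import json

# Lua runs inside Redis: slice the ranking, then fetch each payload, in one round trip.
PAGE_SCRIPT = """
local members = redis.call('ZREVRANGE', KEYS[1], ARGV[1], ARGV[2])
local out = {}
for i, member in ipairs(members) do
  out[i] = redis.call('GET', ARGV[3] .. member)
end
return out
"""


def fetch_page(client, ranking_key, page, page_size, payload_prefix="market:"):
    """Return one page of decoded payloads, highest score first."""
    start = page * page_size
    stop = start + page_size - 1  # ZREVRANGE bounds are inclusive
    raw = client.eval(PAGE_SCRIPT, 1, ranking_key, start, stop, payload_prefix)
    # A member whose payload key is missing comes back as None; skip it.
    return [json.loads(item) for item in raw if item is not None]
```

&lt;p&gt;A call such as &lt;code&gt;fetch_page(client, "rank:volume", page=2, page_size=20)&lt;/code&gt; moves at most 20 payloads over the network, regardless of how many assets the ranking holds. On newer Redis versions, &lt;code&gt;ZRANGE ... REV&lt;/code&gt; is the preferred replacement for &lt;code&gt;ZREVRANGE&lt;/code&gt;.&lt;/p&gt;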

&lt;p&gt;This drastically reduced network transfer between Redis and the application layer, which ended up having a much bigger impact than micro-optimizing Python itself.&lt;/p&gt;

&lt;p&gt;The important realization was simple:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;We stopped moving unnecessary data across the network.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Once we made that shift, latency dropped significantly and the overall system became much more efficient without adding architectural complexity.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Keeping the Architecture Operationally Simple
&lt;/h2&gt;

&lt;p&gt;One of the deliberate decisions throughout the system was avoiding unnecessary infrastructure complexity.&lt;/p&gt;

&lt;p&gt;We relied mostly on existing operational building blocks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ECS for long-running WebSocket consumers and APIs&lt;/li&gt;
&lt;li&gt;Redis for realtime state and querying&lt;/li&gt;
&lt;li&gt;Aurora PostgreSQL for static metadata&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There were no dedicated stream processors, event buses, or distributed processing pipelines in the middle.&lt;/p&gt;

&lt;p&gt;That decision was partly architectural and partly operational. Systems that process realtime data tend to become complicated very quickly, especially when every new scaling concern introduces another moving piece.&lt;/p&gt;

&lt;p&gt;In our case, Redis already solved the core problem effectively enough that introducing additional infrastructure would have added more operational burden than actual value.&lt;/p&gt;

&lt;h2&gt;
  
  
  6. Lessons Learned
&lt;/h2&gt;

&lt;p&gt;A few things stood out while building this system.&lt;/p&gt;

&lt;h3&gt;
  
  
  Real-time APIs are mostly data locality problems
&lt;/h3&gt;

&lt;p&gt;The biggest latency improvements did not come from optimizing Python or scaling infrastructure. They came from reducing unnecessary data movement between Redis and the application layer.&lt;/p&gt;

&lt;p&gt;Moving computation closer to the data mattered far more than expected.&lt;/p&gt;

&lt;h3&gt;
  
  
  Redis can be much more than a cache
&lt;/h3&gt;

&lt;p&gt;A lot of systems use Redis as a temporary caching layer sitting in front of a “real” database.&lt;/p&gt;

&lt;p&gt;In this case, Redis became the actual realtime query engine:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;mutable live state&lt;/li&gt;
&lt;li&gt;sorted rankings&lt;/li&gt;
&lt;li&gt;pagination&lt;/li&gt;
&lt;li&gt;near realtime querying&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That shift simplified the architecture considerably.&lt;/p&gt;

&lt;h3&gt;
  
  
  Streaming and querying are different problems
&lt;/h3&gt;

&lt;p&gt;WebSockets solved live delivery.&lt;/p&gt;

&lt;p&gt;Redis solved queryability.&lt;/p&gt;

&lt;p&gt;Those two concerns are often grouped together under “real-time systems,” but they behave very differently operationally.&lt;/p&gt;

&lt;h3&gt;
  
  
  Simplicity scales surprisingly far
&lt;/h3&gt;

&lt;p&gt;We intentionally avoided introducing additional distributed systems complexity unless it became absolutely necessary.&lt;/p&gt;

&lt;p&gt;Just Redis, Python, ECS, and existing infrastructure.&lt;/p&gt;

&lt;h3&gt;
  
  
  Not every optimization needs to happen inside Redis
&lt;/h3&gt;

&lt;p&gt;One interesting thing we never moved into Redis was asset search functionality.&lt;/p&gt;

&lt;p&gt;Searching by asset names or symbols still happened entirely in Python using pandas-based processing over the full dataset. We did consider implementing search logic directly inside Redis through Lua scripts, but unlike sorting and pagination, this never became a meaningful bottleneck in practice.&lt;/p&gt;

&lt;p&gt;Even operating on the full dataset, the latency remained acceptable for our workload, so adding more complexity simply was not worth it.&lt;/p&gt;

&lt;p&gt;That was another useful reminder:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Not every bottleneck deserves architectural complexity.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Thanks for reading.&lt;br&gt;
Feel free to connect with me on &lt;a href="https://www.linkedin.com/in/gauthamramravichandran/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; and check out some of my work on &lt;a href="https://github.com/GauthamramRavichandran" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>redis</category>
      <category>lua</category>
      <category>python</category>
      <category>showdev</category>
    </item>
    <item>
      <title>Database Partitioning Lessons That Apply Directly to Embedding Storage</title>
      <dc:creator>Gauthamram Ravichandran</dc:creator>
      <pubDate>Sun, 15 Feb 2026 13:21:32 +0000</pubDate>
      <link>https://dev.to/gauthamram_ravichandran/database-partitioning-lessons-that-apply-directly-to-embedding-storage-2i71</link>
      <guid>https://dev.to/gauthamram_ravichandran/database-partitioning-lessons-that-apply-directly-to-embedding-storage-2i71</guid>
      <description>&lt;p&gt;I’ve worked on large partitioned PostgreSQL systems (tens of TBs, heavy ingestion, high query fan-out).&lt;/p&gt;

&lt;p&gt;One thing that’s been interesting while diving into AI infrastructure:&lt;br&gt;
Embedding storage systems repeat many of the same distributed storage lessons we already learned in relational systems.&lt;br&gt;
The difference? The failure modes are just harder to see.&lt;br&gt;
Let’s break down a few parallels.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Hot Partitions Kill Performance (Even in Vector Systems)
&lt;/h2&gt;

&lt;p&gt;In large time-series databases, recent partitions get hammered:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Most writes go to the newest partition&lt;/li&gt;
&lt;li&gt;Most reads target recent data&lt;/li&gt;
&lt;li&gt;Vacuum and index maintenance concentrate there&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You get a write hotspot and a read hotspot at the same time.&lt;/p&gt;

&lt;p&gt;Embedding systems have similar patterns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Frequently queried namespaces&lt;/li&gt;
&lt;li&gt;Recently ingested documents&lt;/li&gt;
&lt;li&gt;Popular tenants&lt;/li&gt;
&lt;li&gt;Trending content&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If all of that lands in a single logical index (or shard), your ANN structure becomes the hotspot. Even HNSW doesn’t save you here. Why?&lt;br&gt;
Because ANN search still depends on memory locality and graph traversal efficiency.&lt;/p&gt;

&lt;p&gt;When a single shard handles most traffic, cache pressure and graph traversal depth both increase as memory fragments, while CPU/GPU utilisation stays uneven. Partitioning strategy often matters more than the choice of ANN algorithm.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Partition Pruning &amp;gt; Index Cleverness
&lt;/h2&gt;

&lt;p&gt;In large relational systems, performance gains don’t come from better indexes alone. They come from avoiding touching irrelevant data entirely.&lt;br&gt;
Partition pruning reduces the working set before the query planner even considers index scans.&lt;br&gt;
In embedding systems, the equivalent pattern is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tenant-level isolation&lt;/li&gt;
&lt;li&gt;Time-based segmentation&lt;/li&gt;
&lt;li&gt;Metadata filtering before vector search&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you’re not aggressively reducing the candidate set before running similarity search, you’re just brute-forcing with extra math.&lt;br&gt;
&lt;strong&gt;For example:&lt;/strong&gt;&lt;br&gt;
Instead of:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Search entire 100M vector space&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;You do:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Filter tenant + content_type + time range&lt;br&gt;
Then run ANN on the reduced subset&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Cost per query drops and latency improves, because the similarity search touches far fewer vectors.&lt;br&gt;
Just classic database engineering.&lt;/p&gt;
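
&lt;p&gt;A toy sketch of the filter-then-search pattern. Exact cosine scoring over plain Python lists stands in for the ANN step (a real system would query an index such as HNSW at that point), and the record fields are hypothetical.&lt;/p&gt;

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(records, query, tenant, content_type, since_ts, k=3):
    """Prune by metadata first, then rank only the survivors by similarity."""
    candidates = [
        r for r in records
        if r["tenant"] == tenant
        and r["content_type"] == content_type
        and r["ts"] >= since_ts
    ]
    # Exact scoring stands in for the ANN step; a real system would query an index here.
    ranked = sorted(candidates, key=lambda r: cosine(r["vec"], query), reverse=True)
    return ranked[:k]
```

&lt;p&gt;The similarity function never sees vectors from other tenants, other content types, or outside the time range, which is the same effect partition pruning has on a relational query plan.&lt;/p&gt;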

&lt;h2&gt;
  
  
  3. Too Many Partitions Is a Problem
&lt;/h2&gt;

&lt;p&gt;Partitioning is powerful, but only with the right partition count.&lt;br&gt;
In large Postgres systems, too many partitions cause:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Increased planning time&lt;/li&gt;
&lt;li&gt;  Metadata bloat&lt;/li&gt;
&lt;li&gt;  Autovacuum lag&lt;/li&gt;
&lt;li&gt;  More file handles&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In embedding systems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Too many small indexes fragment memory&lt;/li&gt;
&lt;li&gt;  Background rebuild jobs multiply&lt;/li&gt;
&lt;li&gt;  GPU memory utilization becomes uneven&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Sharding everything blindly is not architecture. It’s postponing hard decisions.&lt;/p&gt;

&lt;p&gt;The sweet spot usually involves:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Logical grouping&lt;/li&gt;
&lt;li&gt;  Monitoring slow queries&lt;/li&gt;
&lt;li&gt;  Periodic rebalancing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Exactly like relational systems.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Re-indexing Cost Is Underestimated
&lt;/h2&gt;

&lt;p&gt;In traditional databases, index rebuilds are expensive and operationally sensitive.&lt;br&gt;
In embedding systems, the cost is even more dangerous, because model evolution forces:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Full re-embeddings&lt;/li&gt;
&lt;li&gt;  Bulk writes&lt;/li&gt;
&lt;li&gt;  Index rebuilds&lt;/li&gt;
&lt;li&gt;  Graph regeneration&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you don’t design storage with model versioning in mind, you’ll eventually hit:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Dual-index storage spikes&lt;/li&gt;
&lt;li&gt;  Downtime windows&lt;/li&gt;
&lt;li&gt;  Cost explosions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A practical pattern:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Store embeddings with a model_version column&lt;/li&gt;
&lt;li&gt;  Maintain versioned indexes&lt;/li&gt;
&lt;li&gt;  Gradually phase traffic&lt;/li&gt;
&lt;li&gt;  Garbage collect old versions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Treat model upgrades like schema migrations.&lt;/p&gt;
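
&lt;p&gt;One way the "gradually phase traffic" step could look, as a sketch under assumptions: requests are bucketed deterministically by a key, and a rollout percentage decides which versioned index serves them. The function names, the version labels, and the callable-per-version index registry are all hypothetical.&lt;/p&gt;

```python
import zlib


def pick_model_version(request_key: str, old: str, new: str, rollout_percent: int) -> str:
    """Deterministically route a share of traffic to the new embedding version."""
    bucket = zlib.crc32(request_key.encode("utf-8")) % 100  # stable bucket in 0..99
    return new if rollout_percent > bucket else old


def search_versioned(indexes: dict, version: str, query):
    # Query and stored vectors must come from the same model, so the query
    # is only ever sent to the index built for `version`.
    return indexes[version](query)
```

&lt;p&gt;Because the bucket depends only on the key, raising the percentage from 30 to 60 moves additional requests to the new version without flipping any that had already moved. Once the rollout reaches 100, the old version's index and rows can be garbage collected.&lt;/p&gt;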

&lt;h2&gt;
  
  
  5. Memory Locality Still Wins
&lt;/h2&gt;

&lt;p&gt;ANN performance depends heavily on memory layout.&lt;/p&gt;

&lt;p&gt;Fragmented shards → worse locality → more cache misses → higher tail latency.&lt;/p&gt;

&lt;p&gt;The same principle applies to B-tree depth and BRIN locality. Even in “AI systems,” performance still collapses into memory access patterns.&lt;/p&gt;

&lt;p&gt;The fundamentals haven’t changed.&lt;/p&gt;

&lt;h2&gt;
  
  
  6. At Scale, Embedding Systems Become Storage Systems
&lt;/h2&gt;

&lt;p&gt;I initially thought embedding systems were purely ML-driven. That’s certainly true at the POC stage.&lt;/p&gt;

&lt;p&gt;Once you go to production, the worry shifts to handling large volumes of data efficiently. As with relational systems, that means tuning partition boundaries, managing index lifecycles, optimizing candidate pruning, and managing cost per query.&lt;/p&gt;

&lt;p&gt;You’re building distributed storage systems. Vector math is just one layer.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thought
&lt;/h2&gt;

&lt;p&gt;As embedding systems move from prototypes to production, the conversation shifts.&lt;/p&gt;

&lt;p&gt;From:  &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“How do we compute similarity?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;To:  &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“How do we scale, partition, isolate, and evolve this safely?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The deeper you go, the more familiar the problems look.&lt;/p&gt;

&lt;p&gt;And maybe that’s the exciting part!&lt;br&gt;
AI systems are pushing us to revisit distributed systems fundamentals with a new set of constraints.&lt;/p&gt;

&lt;p&gt;Curious how others are approaching partitioning and model versioning in real-world deployments.&lt;/p&gt;

</description>
      <category>database</category>
      <category>architecture</category>
    </item>
  </channel>
</rss>
