This was originally posted on Dangling Pointers. My goal is to help busy people stay current with recent academic developments. Head there to subscribe for regular summaries of computer science research.
Pushing the Limits of In-Network Caching for Key-Value Stores, Gyuyeong Kim, NSDI'25
Load Balancing Cache
I generally think of a cache as serving the purpose of load reduction (e.g., reducing the total number of DRAM accesses or backend server requests). The purpose here is different: load balancing. This paper builds on prior work that defined the small cache effect:
small cache effect: loads across N servers (or partitions) can be balanced by caching only the O(N log N) hottest items, regardless of the total number of items in the store
Imagine a key-value store with keys sharded across multiple backend servers. If the network switches (which are already present in the network) are configured correctly, read requests for hot keys can be handled entirely by a switch, which balances load across the backend servers.
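To make the small cache effect concrete, here is a minimal simulation sketch (my own toy setup, not from the paper): requests follow a heavy-tailed, Zipf-like popularity distribution, keys are hash-sharded across N servers, and we compare the hottest server's share of traffic with and without a cache of the O(N log N) hottest keys.

```python
import math
import random
from collections import Counter

# Toy demonstration of the small cache effect (assumed parameters, not the paper's).
random.seed(0)
N_SERVERS = 16
N_REQUESTS = 200_000

# Zipf-like popularity: the requested key id is a rank drawn from a heavy tail.
requests = [int(random.paretovariate(1.1)) for _ in range(N_REQUESTS)]

# Cache the O(N log N) hottest keys (cheating with exact counts for simplicity).
cache_size = int(N_SERVERS * math.log(N_SERVERS))
hot_keys = {k for k, _ in Counter(requests).most_common(cache_size)}

def shard(key):
    # Multiplicative hash so consecutive key ranks land on arbitrary servers.
    return (key * 2654435761) % (1 << 32) % N_SERVERS

def hottest_share(cached):
    # Fraction of all requests handled by the single most-loaded backend.
    loads = Counter(shard(k) for k in requests if k not in cached)
    return max(loads.values()) / N_REQUESTS

print(f"hottest server, no cache:    {hottest_share(set()):.2%}")
print(f"hottest server, small cache: {hottest_share(hot_keys):.2%}")
```

With the skew assumed here, a cache of just int(16 * ln 16) = 44 keys takes the hottest server from serving over half of all requests down to a tiny residual share.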
Everything Looks Like a Nail
The core of this paper is a creative approach to configuring a reconfigurable match table (RMT) switch to act as a load balancing cache. The RMT architecture contains a pipeline of stages, where each stage has access to dedicated SRAM and TCAM memories. The specific switch used in this paper is an Intel Tofino.
The natural thing to try is to store the hot items in the SRAM/TCAM associated with each stage. When a cache read request packet arrives at the switch, the packet can flow through the RMT pipeline searching for matches (TCAM should be very helpful). That is not the approach taken in this work; Section 2 of the paper goes into depth about its drawbacks (e.g., handling variable-length keys and values).
Here is the punchline: store read requests in switch memory. When a read request arrives at the switch, the switch determines whether the request is for a hot item. If so, it finds an empty slot in on-chip memory and stores the request there. Hot items are not stored in switch memory; rather, they are continuously recycled through the switch. It is as if another component in the network were continuously sending the switch packets representing the hot (key, value) pairs. No such component is necessary, however, because switches can recycle packets back through themselves.
This cache is called OrbitCache, because you can think of the recycled packets as moons orbiting a planet (the switch). The mechanism has three parts:
- When a new (key, value) pair is cached, a fixed-length hash of the key is stored in the lookup table (in switch memory), and the (key, value) pair is continuously recycled through the switch. 
- When a read request arrives at the switch, a hash of the key is used to check the lookup table for a match. If there is a match (cache hit), then the read request is stored in switch memory. If there is no match (cache miss), then the read request is forwarded to the appropriate backend server. 
- When a cached (key, value) pair is recycled through the switch, a hash of the key is used to check whether there are pending read requests in switch memory. If so, a response packet is generated for each pending read request and sent back to the client (this loop is sketched below). 
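Here is a minimal Python model of that loop. This is my own simplification: the real data plane is written in P4 for the Tofino pipeline, the 16-bit key hash is an arbitrary choice, and forward_to_backend / send_response are hypothetical stand-ins for normal routing and reply generation.

```python
from collections import deque

def forward_to_backend(client, key):
    print(f"miss: forward {key!r} for {client} to a backend server")

def send_response(client, key, value):
    print(f"hit: reply {key!r} = {value!r} to {client}")

class OrbitCacheSwitch:
    """Toy model of the OrbitCache data plane (not P4, and heavily simplified)."""

    def __init__(self):
        self.lookup = set()    # fixed-length key hashes of cached items
        self.pending = {}      # key hash -> read requests parked in switch memory
        self.orbit = deque()   # (key, value) packets recirculating through the switch

    def key_hash(self, key):
        # Hash a variable-length key down to a fixed-length digest.
        return hash(key) & 0xFFFF

    def cache_insert(self, key, value):
        # A new hot item: register its hash and start the packet orbiting.
        h = self.key_hash(key)
        self.lookup.add(h)
        self.pending.setdefault(h, [])
        self.orbit.append((key, value))

    def on_read(self, client, key):
        h = self.key_hash(key)
        if h in self.lookup:
            self.pending[h].append(client)   # cache hit: park the request
        else:
            forward_to_backend(client, key)  # cache miss: usual forwarding

    def on_orbit_pass(self):
        # One recirculation: the orbiting packet revisits the pipeline and
        # answers every request parked under its key hash, then recycles.
        key, value = self.orbit.popleft()
        h = self.key_hash(key)
        for client in self.pending.get(h, []):
            send_response(client, key, value)
        self.pending[h] = []
        self.orbit.append((key, value))

# Tiny demo: one cached item, one hit that waits for the next orbit, one miss.
sw = OrbitCacheSwitch()
sw.cache_insert("user:42", "alice")
sw.on_read("client-A", "user:42")  # parked until the value packet comes around
sw.on_read("client-B", "user:99")  # forwarded to a backend immediately
sw.on_orbit_pass()                 # the orbiting packet answers client-A
```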
Variable-length keys are handled by hashing all key bytes down to a fixed length. Hash collisions are detected and handled by the client (the design assumes collisions are rare).
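Client-side collision handling could look something like the sketch below, under the assumption (mine, not spelled out here) that the response packet echoes the full key so the client can verify it:

```python
def read_from_backend(key):
    # Hypothetical fallback path that bypasses the switch cache entirely.
    ...

def handle_response(requested_key, response):
    # The switch matched only a fixed-length key hash, so the client checks
    # the full key echoed in the response (modeled here as a dict).
    if response["key"] == requested_key:
        return response["value"]
    # Rare hash collision: the cached item belongs to a different key.
    return read_from_backend(requested_key)
```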
Variable-length values are handled naturally by existing networking protocol support for variable-length packets (up to the MTU).
Section 3 of the paper goes into more detail (e.g., how cache coherence is maintained).
Results
Fig. 13 has results for Twitter workloads. NetCache is prior work that stores cached data directly in switch memory. NetCache doesn’t perform as well because its key/value size limits prevent it from caching all hot items.
Source: https://www.usenix.org/system/files/nsdi25-kim.pdf
Dangling Pointers
Tofino seems like a great networking research platform; I imagine many academics were saddened when Intel exited the network switch business. The world could use a research platform for switches à la RAMP.
Switches can contain DRAM; I wonder how well it could serve as a larger (and slower) cache.