This was originally posted on Dangling Pointers. My goal is to help busy people stay current with recent academic developments. Head there to subscribe for regular summaries of computer science research.
Gigaflow: Pipeline-Aware Sub-Traversal Caching for Modern SmartNICs
Annus Zulfiqar, Ali Imran, Venkat Kunaparaju, Ben Pfaff, Gianni Antichi, and Muhammad Shahbaz
ASPLOS'25
Virtual Switch
A virtual switch (vSwitch) routes network traffic to and from virtual machines. Section 2.1 of the paper describes the historical development of vSwitch technology, ending with a pipeline of match-action tables (MATs). A match-action table is a data-driven way to configure a vSwitch, comprising matching rules and associated actions to take when a matching packet is encountered. When a packet arrives at the vSwitch, it traverses the full pipeline of match-action tables. At each pipeline stage, header fields from the packet are used to perform a lookup into a match-action table. If a match is found, the packet is modified according to the actions found in the table.
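To make the lookup-then-act flow concrete, here is a minimal sketch of a match-action pipeline. The field names, rules, and actions are all illustrative, not from the paper:

```python
# Hypothetical match-action pipeline sketch. Each stage matches on one
# header field and applies the associated action if a rule matches.

def make_stage(field, rules):
    """rules: dict mapping a header-field value to an action function."""
    def stage(packet):
        action = rules.get(packet.get(field))
        if action:
            action(packet)  # match found: apply the action
        return packet       # no match: packet passes through unchanged
    return stage

def set_out_port(port):
    return lambda pkt: pkt.__setitem__("out_port", port)

# Example pipeline: match on destination MAC, then on VLAN tag.
pipeline = [
    make_stage("dst_mac", {"aa:bb": set_out_port(1), "cc:dd": set_out_port(2)}),
    make_stage("vlan", {10: lambda pkt: pkt.__setitem__("priority", 7)}),
]

def process(packet):
    # The packet traverses every stage of the pipeline in order.
    for stage in pipeline:
        stage(packet)
    return packet

pkt = process({"dst_mac": "aa:bb", "vlan": 10})
# pkt now has out_port=1 (from stage 0) and priority=7 (from stage 1)
```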
Megaflow
Megaflow is prior work which memoizes the full pipeline of MATs. The memoization data structure is treated like a cache. When a packet arrives, a cache lookup occurs. On a miss, the regular vSwitch implementation is called to transform the packet. Subsequent packets which hit in the cache avoid executing the vSwitch code entirely. Megaflow supports keys with wildcards, allowing one cache entry to serve multiple flows.
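A toy version of this idea, with made-up field names and a linear-scan cache for simplicity (a real implementation uses tuple-space search): a cache entry records only the fields the slow path actually examined, so any later packet that agrees on those fields hits, regardless of its other (wildcarded) fields.

```python
# Sketch of Megaflow-style wildcard caching (simplified and illustrative).

def slow_path(pkt):
    """Full vSwitch pipeline: returns (action, fields_examined).
    Here routing only inspects dst_ip, so src_port is wildcarded."""
    action = "fwd:1" if pkt["dst_ip"] == "10.0.0.1" else "drop"
    return action, ["dst_ip"]

cache = {}  # tuple of (field, value) pairs -> action

def lookup(pkt):
    # Hit if the packet agrees on every non-wildcarded field of an entry.
    for fields, action in cache.items():
        if all(pkt[f] == v for f, v in fields):
            return action
    # Miss: run the full pipeline, then install an entry keyed only
    # on the fields the pipeline actually looked at.
    action, examined = slow_path(pkt)
    cache[tuple((f, pkt[f]) for f in examined)] = action
    return action

lookup({"dst_ip": "10.0.0.1", "src_port": 1000})      # miss: installs one entry
a = lookup({"dst_ip": "10.0.0.1", "src_port": 2000})  # hit: src_port is wildcarded
```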
A problem with Megaflow is that even with wildcards, a large cache is needed to achieve a high hit rate. For throughput reasons, one may wish to place a Megaflow cache in on-chip memory on a SmartNIC. However, if the SmartNIC does not have enough on-chip memory to achieve a high hit rate, then throughput suffers. See this post and this post for a description of SmartNIC architectures and their on-chip memories.
Gigaflow
This paper introduces a different memoization scheme (Gigaflow) to make better use of SmartNIC memory. Rather than memoizing the entire vSwitch pipeline for a packet, Gigaflow divides the vSwitch pipeline into multiple smaller pipelines, and memoizes each one separately. Fig. 1 illustrates this:
Source: https://dl.acm.org/doi/10.1145/3676641.3716000
Gigaflow takes advantage of SmartNICs’ ability to perform many table lookups per packet while maintaining high throughput. The total working set in a typical workload is reduced, because many flows can share some table entries (rather than all-or-nothing sharing).
Another way to think about this is that Megaflow combines all of the MATs in a vSwitch pipeline into one very large table, whereas Gigaflow partitions the vSwitch MAT pipeline into a handful of sub-pipelines and combines each sub-pipeline into a medium-sized table.
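The sharing benefit can be sketched with a toy two-way split (the split and keys here are invented for illustration; the paper's assignment algorithm is far more involved). With per-sub-pipeline caches, two flows that differ only in fields used by the second sub-pipeline still share the first sub-pipeline's entry:

```python
# Toy Gigaflow-style split: sub-pipeline 0 keyed on dst_mac (e.g. L2 stages),
# sub-pipeline 1 keyed on dst_ip (e.g. L3 stages). Names are illustrative.

sub_caches = [{}, {}]  # one cache per sub-pipeline

def run(pkt):
    keys = [pkt["dst_mac"], pkt["dst_ip"]]
    slow = [lambda p: "l2_fwd", lambda p: "route:" + p["dst_ip"]]
    actions = []
    for cache, key, sub_pipeline in zip(sub_caches, keys, slow):
        if key not in cache:
            # Miss in one sub-cache only re-runs that sub-pipeline.
            cache[key] = sub_pipeline(pkt)
        actions.append(cache[key])
    return actions

run({"dst_mac": "aa:bb", "dst_ip": "10.0.0.1"})
run({"dst_mac": "aa:bb", "dst_ip": "10.0.0.2"})  # reuses the dst_mac entry
# With m MACs and n IPs, a monolithic cache needs m*n entries to cover
# all flows, while the split caches need only m + n.
```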
Sections 4.1.1 and 4.2.2 of the paper give the nitty-gritty details of how Gigaflow correctly assigns subsets of the vSwitch MAT pipeline to a set of tables.
Results
Fig. 9 shows cache misses for Megaflow and Gigaflow for a variety of benchmarks:
Source: https://dl.acm.org/doi/10.1145/3676641.3716000
Dangling Pointers
Memoization is useful in settings outside of networking. It would be interesting to see if the idea of separable memoization could be applied to other applications.
Like I mentioned here, hardware support for memoization in general purpose CPUs seems compelling.