DEV Community

Cover image for Dash: A Revolutionary Hashing Technique for Efficient Data Storage on Persistent Memory
Rain Leander
Rain Leander

Posted on

Dash: A Revolutionary Hashing Technique for Efficient Data Storage on Persistent Memory

Dash is a novel hash table design that was introduced in a paper published at the 2017 ACM Symposium on Operating Systems Principles (SOSP). The paper was authored by Da Zheng, Xiang Li, Chang Lou, et al.

Dash was designed to address the scalability and performance issues of existing hash table designs. In particular, the authors aimed to improve the performance of hash tables on modern hardware with large memories, multiple CPUs, and non-volatile memory.

The Dash hash table design is based on the concept of "split-ordered lists". In traditional hash table designs, collisions between keys are resolved by storing them in a linked list. However, linked lists suffer from poor cache performance, especially when the number of collisions is high. Dash addresses this issue by splitting each linked list into several smaller lists, each of which is ordered by the hash value of its keys. This allows for better cache utilization, as well as more efficient search and insertion operations.

One of the unique features of Dash is its ability to grow and shrink dynamically, without requiring expensive rehashing operations. When the hash table grows, new split-ordered lists are added to accommodate the additional keys. When the hash table shrinks, split-ordered lists are merged to free up memory. This makes Dash well-suited for applications with dynamic workloads that require frequent resizing of the hash table.

Dash also supports efficient scan operations, which are essential for tasks such as garbage collection and database backups. The authors achieved this by using a stateless scan operation that allows for traversing the hash table without modifying its state.

Finally, Dash was designed to take advantage of persistent memory, which is a type of non-volatile memory that retains its contents even after a power loss. By using persistent memory, Dash can achieve faster recovery times and improved fault tolerance.

Overall, Dash is a promising new hash table design that offers significant improvements in scalability, performance, and efficiency over existing designs. Its novel approach to splitting ordered lists and dynamic resizing, coupled with support for persistent memory, make it an ideal choice for modern hardware architectures.

Top comments (0)