When building fast in-memory data systems, the speed of data retrieval is critical. Hash tables provide O(1) time for point lookups, making them ideal for finding single items. However, they do not naturally support range queries—such as retrieving all keys between “apple” and “banana”. For that, ordered indexes like B+ trees or skip lists are commonly used, but they usually come with O(log N) lookup cost, which can become expensive at scale.
Tries offer an alternative where lookup time depends on key length (L) rather than the number of keys (N). While this can help with certain workloads, tries often consume more memory and may suffer from fragmented layouts, making them less efficient for real-world applications where keys can be moderately or significantly long.
Wormhole: A Fast Ordered Index for In-memory Data Management
Wormhole was introduced in 2019 by Xingbo Wu, Fan Ni, and Song Jiang, in their research paper "Wormhole: A Fast Ordered Index for In-memory Data Management", to bridge the gap between hash tables and traditional ordered indexes. It provides lookup performance close to that of hash tables while still supporting ordered operations such as range and prefix scans.
Wormhole achieves lookups in O(log L), where L is the length of the key. For typical key sizes, this is close to constant time. This is a notable improvement over the O(log N) complexity of B+ trees or the O(L) complexity of traditional tries.
How Wormhole Works Under the Hood
Wormhole organizes its index into two main components: MetaTrieHT and LeafList.
The MetaTrieHT provides a highly optimized implementation of a logical trie, storing key prefixes to enable fast prefix-based lookups and reduce search complexity. This design minimizes unnecessary traversals while keeping memory use efficient.
Once the correct prefix or range starting point is found via MetaTrieHT, the LeafList (a doubly linked list of sorted leaf nodes) handles the actual keys and values. This structure enables efficient point lookups, range scans, and prefix scans while reducing cache misses and maintaining predictable performance.
Introducing Wormhole4j: A Java Implementation
Wormhole4j is a Java implementation of Wormhole designed for high-performance in-memory ordered indexes. It delivers very fast scan()
performance for prefix and range scans, and fast point lookups via get()
, along with efficient put()
and delete()
operations.
Currently, Wormhole4j supports only String keys and is not thread-safe. If you plan to use it in multi-threaded environments, you will need to implement external synchronization.
Benchmark Overview
In tests with 100,000 records and keys up to 128 characters, Wormhole4j has demonstrated significant performance improvements over standard java.util.TreeMap
(Red-Black tree) and it.unimi.dsi.fastutil.objects.Object2ObjectAVLTreeMap
(AVL tree), especially for scans.
Operation | Data structure | Average throughput (operations/sec) |
---|---|---|
Get | Wormhole | 3,846,722 |
Red-Black tree map | 3,184,843 | |
AVL tree map | 3,257,535 | |
Scan | Wormhole | 595,975 |
Red-Black tree map | 180,441 | |
AVL tree map | 96,414 |
Future Directions
Wormhole4j will evolve with features such as persistence for durability and thread-safe variants for concurrent workloads.
Try Wormhole4j
Add Wormhole4j to your project from Maven Central:
implementation("org.komamitsu:wormhole4j:0.1.0")
Source code and documentation are available on GitHub.
Feedback, issues, and contributions are welcome.
Top comments (0)