Why Why Why Wyhash could be a valuable tool in your toolbox

#stealbackfromai #java #zig

I recently went down the rabbit hole of non-cryptographic hash functions and emerged with a new favorite: Wyhash. It's not just another hash algorithm; it's a tiny beast that has found its way into modern high-performance environments like Zig and Bun.

Here is why I think it deserves a spot in your developer toolbox.

Part of the #stealbackfromai series.

The Need for Speed on Modern Hardware

We are living in a 64-bit world, yet many legacy hash functions (like MurmurHash3) were designed when 32-bit operations were king. Wyhash is unapologetically modern.

It is designed for 64-bit processors and leverages native 64x64 -> 128-bit multiplication. This lets it process data in larger chunks and achieve very high throughput without needing complex, platform-specific SIMD instructions (like AVX or NEON). It’s portable speed.

Quality You Can Trust

Speed is nothing without quality. "Fast" hash functions often fail to pass stringent statistical tests, leading to collisions and poor distribution in hash maps.

Wyhash passes SMHasher, the gold standard test suite for non-cryptographic hash functions, showing no bias and no unexpected collisions across the entire battery of tests. This makes it a safe choice for hash tables, bloom filters, and other probabilistic data structures where a good distribution and a low collision rate matter.

Simplicity is the Ultimate Sophistication

The algorithm is surprisingly simple. The core implementation fits on a single screen. It uses a "multiply and mix" (MUM) strategy:

Initialize state with a secret and seed.
Consume input in 48-byte blocks.
Mix the state using 128-bit multiplication products XORed into 64-bit results.
Finalize with a last round of mixing.

That's it. No complex S-boxes, no massive lookup tables.
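To make the "multiply and mix" step concrete, here is a minimal sketch of the folding primitive on its own. It is not a full Wyhash implementation: the class and method names are mine, there is no secret, seed, or block loop, and it assumes Java 18+ for Math.unsignedMultiplyHigh.

```java
// Sketch only: the "multiply and mix" fold, not a complete Wyhash.
// MumSketch and mum are illustrative names, not part of any library.
final class MumSketch {

    /** Folds the unsigned 128-bit product of a and b into 64 bits by XORing its two halves. */
    static long mum(long a, long b) {
        long hi = Math.unsignedMultiplyHigh(a, b); // upper 64 bits (Java 18+)
        long lo = a * b;                           // lower 64 bits, identical for signed and unsigned
        return hi ^ lo;
    }

    public static void main(String[] args) {
        // A small product fits entirely in the low half, so the high half is 0.
        System.out.println(mum(3L, 5L));             // 15
        // 2^32 * 2^32 = 2^64: the low half is 0 and the high half is 1.
        System.out.println(mum(1L << 32, 1L << 32)); // 1
    }
}
```

The point of the fold is that every input bit can influence the result through the high half of the product, which a plain 64-bit multiply would simply discard.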

Learning Java 18+ along the way

One of the hurdles in porting Wyhash to Java is that it relies heavily on unsigned 128-bit math. Java, historically, has been strictly signed.

In the past, we had to use "tricks" to emulate unsigned multiplication, often involving multiple operations and careful bit-shifting.
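For illustration, one common form of that trick looks roughly like the sketch below. It starts from the signed Math.multiplyHigh (available since Java 9) and corrects the result for operands that have the top bit set. The class and method names are my own, and the BigInteger helper is there only as a slow reference to check against.

```java
import java.math.BigInteger;

// Sketch of a pre-Java-18 emulation; names are illustrative only.
final class UnsignedMulHighEmulation {

    /** Upper 64 bits of the unsigned 128-bit product, using only Java 9+ APIs. */
    static long unsignedMulHigh(long x, long y) {
        long signedHigh = Math.multiplyHigh(x, y);
        // Reading a negative long as unsigned adds 2^64 to its value, which
        // adds the other operand to the high word of the product.
        // (x >> 63) is all ones when x is negative and zero otherwise.
        return signedHigh + ((x >> 63) & y) + ((y >> 63) & x);
    }

    /** Slow reference: the same value via arbitrary-precision arithmetic. */
    static long referenceMulHigh(long x, long y) {
        BigInteger ux = new BigInteger(Long.toUnsignedString(x));
        BigInteger uy = new BigInteger(Long.toUnsignedString(y));
        return ux.multiply(uy).shiftRight(64).longValue();
    }

    public static void main(String[] args) {
        long x = 0x9E3779B97F4A7C15L; // arbitrary value with the top bit set
        long y = -3L;
        System.out.println(unsignedMulHigh(x, y) == referenceMulHigh(x, y)); // true
    }
}
```

It works, but it is three extra operations per multiplication, and it is easy to get a shift or a sign wrong.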

However, while implementing this, I found out that Java 18 introduced Math.unsignedMultiplyHigh(long x, long y). This intrinsic method maps directly to the hardware instruction (like MUL on x86_64) that produces the upper 64 bits of a 128-bit product.

The Power of the Default Seed

We often think of seeds as something that must be random to prevent HashDoS attacks. And for hash maps receiving untrusted input, that is true.

But Wyhash could serve another powerful purpose: Compatibility.

By using the standard "default secret" (originating from the reference C implementation) and a fixed seed (like 0), Wyhash becomes a deterministic, cross-language checksum.

You can hash a file in Zig, send it over the wire, and verify it in Java or JavaScript (Bun) using the exact same algorithm. This is incredibly useful to me personally, though you may have totally different requirements. Sadly, it is not a given that the same variant is implemented in different languages. So if you are aiming for compatibility, you must make sure yourself that the same variant is used on every side.
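One practical detail when comparing values across languages: Java's long is signed, while Zig exposes the hash as an unsigned u64, so the same 64 bits can print differently. A small sketch of how the Java side might render a hash for comparison follows. The helper names are hypothetical, and the hash value passed in is assumed to come from whatever Wyhash implementation you use.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Sketch: rendering a 64-bit hash so it can be compared with an unsigned
// value produced elsewhere. HashInterop and its methods are illustrative names.
final class HashInterop {

    /** Decimal form as an unsigned 64-bit integer, e.g. to compare with a printed u64. */
    static String toUnsignedDecimal(long hash) {
        return Long.toUnsignedString(hash);
    }

    /** Fixed-width, zero-padded lowercase hex; %x formats a long as unsigned. */
    static String toHex16(long hash) {
        return String.format("%016x", hash);
    }

    /** Eight bytes in little-endian order; the byte order is a choice both sides must agree on. */
    static byte[] toLittleEndianBytes(long hash) {
        return ByteBuffer.allocate(Long.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putLong(hash)
                .array();
    }

    public static void main(String[] args) {
        long hash = -1L; // placeholder value, not a real Wyhash output
        System.out.println(toUnsignedDecimal(hash)); // 18446744073709551615
        System.out.println(toHex16(hash));           // ffffffffffffffff
    }
}
```

Comparing the hex or byte form sidesteps the signed/unsigned question entirely, which is why I would pick one of those for anything that goes over the wire.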

A Note on Versions and Compatibility

Wyhash has evolved through several versions (v1, v2, v3/final3, v4). While the algorithm's core principles remain the same, different versions produce different hash outputs for the same input. From what I was able to research, and given my own preferences, v3/final3 looks to be the way to go.

Java Port & Zig: This Java implementation is a direct port of the version used in the Zig standard library (std.hash.Wyhash), effectively the final3 (v3) variant. This ensures binary compatibility with Zig applications.
Bun: Bun.hash is directly compatible with Zig's std.hash.Wyhash. Because Bun is written in Zig, its Bun.hash function is a direct binding to the Zig standard library's implementation.
Go 1.17+: Not compatible. Uses a "Wyhash-inspired" fallback implementation for its map hashing. It is not a direct port and will not be binary compatible with standard Wyhash implementations.
@pencroff-lab/wyhash-ts: Provides a specific wyhash_bun function to explicitly guarantee this Zig-compatible behavior in Node.js and other JavaScript environments.

So, for cross-language checksums, ensure both sides are speaking the same "dialect" (version) of Wyhash and using the same seed.

The "Frozen" Problem: Zig values stability in its standard library. Because std.hash is often used for persistent data (like hash maps or disk-backed caches), the Zig team is extremely hesitant to update the implementation to the latest C version (v4.2). Doing so would change the hash output for the same input/seed, silently corrupting data for anyone relying on consistent hashes across Zig versions.

Why a Standalone Implementation?

Initially, I looked at existing libraries like Hash4j. It's a fantastic library, but it's a large dependency if you only need one algorithm.

Sometimes, the best dependency is the one you don't add.

Wyhash represents a sweet spot in modern engineering: it's fast, simple, high-quality, and portable. If you need a hash function for your next tool, give it a look.