
João André Gomes Marques

Why the E8 lattice is the perfect quantizer for KV caches

Most quantizers are chosen for convenience. E8 was chosen because the math demanded it — and then it surprised us.

What Makes E8 Special

The E8 lattice is a root lattice in 8 dimensions. Its kissing number, 240, is the highest possible in 8D: every lattice point touches 240 nearest neighbors. Its packing density is also optimal: no other arrangement of equal spheres in 8 dimensions covers more space, a fact Viazovska proved in 2016.

For quantization, this matters because denser packing = more codewords per unit volume = lower quantization error.

Why KV Vectors Live in E8-Friendly Space

After applying a Hadamard transform to KV cache vectors, the distribution of each coordinate becomes approximately sub-Gaussian. Specifically:

  • The Hadamard spreads energy uniformly across all 8 dimensions
  • Each coordinate has zero mean and bounded kurtosis
  • The joint distribution approximates a spherically symmetric Gaussian cloud

A spherically symmetric Gaussian is exactly what E8 was designed to quantize. The shell structure of E8 — its concentric layers of lattice points — aligns with the probability mass shells of a Gaussian. More lattice points where the data actually lives.
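To make the rotation concrete, here is a minimal fast Walsh-Hadamard transform sketch (the function name `fwht` and this NumPy implementation are illustrative stand-ins, not the post's actual code). Applied per 8-dimensional group, it is an orthonormal rotation, so applying it twice recovers the input.

```python
import numpy as np

def fwht(x: np.ndarray) -> np.ndarray:
    """Orthonormal fast Walsh-Hadamard transform (length must be a
    power of two). O(d log d) butterflies instead of a d x d matmul."""
    y = x.astype(np.float64).copy()
    d = y.shape[0]
    h = 1
    while h < d:
        for i in range(0, d, 2 * h):
            for j in range(i, i + h):
                a, b = y[j], y[j + h]
                y[j], y[j + h] = a + b, a - b  # butterfly step
        h *= 2
    return y / np.sqrt(d)  # scale so the transform is orthonormal

# A spiky vector gets spread evenly across all 8 coordinates:
v = np.array([1.0, 0, 0, 0, 0, 0, 0, 0])
print(fwht(v))  # every coordinate has magnitude 1/sqrt(8)
```

Because the normalized transform is its own inverse, dequantization reuses the same function.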

The Relaxed Parity Discovery

Strict E8 imposes an even-sum parity constraint: the sum of all 8 coordinates (after scaling) must be even. This halves the set of valid codewords and enforces a rigid algebraic structure.

We found something unexpected: relaxing this constraint improves MSE by 0.3–0.4% on KV cache data.

Why? Sub-Gaussian distributions have excess probability mass near the origin compared to a pure Gaussian. Strict E8 parity thins the codebook near the origin: the integer points closest to zero with odd coordinate sum, such as (±1, 0, ..., 0), are forbidden. Relaxed parity restores those codepoints near zero, which is precisely where sub-Gaussian data concentrates.

This is not a bug. Nature found a better quantizer than the textbook prescribed.

```
Strict E8:   valid if sum(coords) % 2 == 0   → 128 points per shell
Relaxed E8:  always valid                    → 256 points per shell
Gain at origin: more codewords where sub-Gaussian data concentrates
```
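A minimal sketch of what the two decoders could look like, using the standard Conway-Sloane style of E8 decoding (decode to the D8 coset and the D8 + 1/2 coset, keep the closer point); the relaxed variant simply skips the parity fix. The function names and the exact form of the relaxation are my assumptions, not the post's code.

```python
import numpy as np

def decode_d8(x: np.ndarray) -> np.ndarray:
    """Nearest point of D8: an integer vector with even coordinate sum."""
    r = np.round(x)
    if int(r.sum()) % 2 != 0:
        # Re-round the least confident coordinate the other way to
        # restore even parity at minimal extra squared error.
        i = int(np.argmax(np.abs(x - r)))
        r[i] += 1.0 if x[i] > r[i] else -1.0
    return r

def decode_e8_strict(x: np.ndarray) -> np.ndarray:
    """Strict E8 = D8 union (D8 + 1/2): keep the closer coset point."""
    c0 = decode_d8(x)
    c1 = decode_d8(x - 0.5) + 0.5
    return c0 if np.sum((x - c0) ** 2) <= np.sum((x - c1) ** 2) else c1

def decode_e8_relaxed(x: np.ndarray) -> np.ndarray:
    """Parity constraint dropped: plain rounding on both cosets (my
    reading of 'relaxed parity'; the post does not show the rule)."""
    c0 = np.round(x)
    c1 = np.round(x - 0.5) + 0.5
    return c0 if np.sum((x - c0) ** 2) <= np.sum((x - c1) ** 2) else c1
```

Near the origin the difference is visible: for x = (0.9, 0.1, 0, ..., 0), strict parity forbids the natural rounding (1, 0, ..., 0) because its coordinate sum is odd, while the relaxed decoder keeps it and lands much closer.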

The Numbers

On Mistral-7B KV cache vectors (Hadamard-preprocessed, group size 64):

| Quantizer | MSE (normalized) | PPL delta vs fp16 |
|---|---|---|
| INT8 uniform | 1.000 | +1.2% |
| PQ (Product Quantization) | 0.61 | +0.8% |
| Strict E8 | 0.18 | +0.06% |
| Relaxed E8 (NexusQuant) | 0.14 | -0.03% |

Relaxed E8 cuts MSE by 22% relative to strict E8 (0.18 → 0.14). It also beats fp16 on perplexity: compression that makes the model slightly more accurate.

Why This Works at Scale

KV cache vectors are not random. They carry structured information — token relationships, positional encodings, semantic content. After Hadamard rotation, this structure disperses into approximately sub-Gaussian noise, but the near-origin concentration persists across layers and models.

E8 with relaxed parity is not a coincidence. It is the right mathematical structure for the right data distribution. The 8-dimensional optimality of E8 matches the head-dimension granularity of modern transformers (head_dim = 64 or 128, divisible by 8).

The pipeline is three lines of math:

  1. Normalize and scale (NSN)
  2. Rotate to sub-Gaussian (Hadamard)
  3. Quantize to nearest E8 point (relaxed parity)

That is the entire compression stack. No neural networks. No training. No calibration data.
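The three steps above fit in a few lines. Here is an illustrative end-to-end sketch for one 8-dimensional group; the max-abs scaling rule, the helper name `compress_group`, and plain relaxed rounding are my stand-ins, since the post names the steps but not the code.

```python
import numpy as np

# Orthonormal 8x8 Sylvester-Hadamard matrix, built by Kronecker powers.
H2 = np.array([[1.0, 1.0], [1.0, -1.0]])
H8 = np.kron(np.kron(H2, H2), H2) / np.sqrt(8.0)

def compress_group(v: np.ndarray) -> np.ndarray:
    """Normalize -> Hadamard-rotate -> round (relaxed parity), then
    invert both steps so reconstruction error can be measured."""
    s = float(np.max(np.abs(v))) or 1.0   # crude per-group scale (assumption)
    x = (v / s) @ H8                      # rotate toward sub-Gaussian coords
    q = np.round(x)                       # relaxed lattice rounding, no parity fix
    return (q @ H8) * s                   # H8 is symmetric and orthonormal: its own inverse

v = np.array([0.9, -0.2, 0.4, 1.3, -0.7, 0.1, 0.6, -1.1])
err = np.linalg.norm(compress_group(v) - v)
```

Since the rotation preserves norms and each rotated coordinate moves by at most 0.5 when rounded, the reconstruction error per group is bounded by sqrt(2) times the scale.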

Best regards, João Marques
