DEV Community

NexusQuant Series' Articles

A series by João André Gomes Marques
Compress your LLM's KV cache 33x with zero training
2 min read

Why E8 lattice quantization beats scalar quantization for KV caches
2 min read

Longer contexts are easier to compress (not harder)
2 min read

NexusQuant benchmarks: every number, honestly
5 min read

NexusQuant vs KVTC vs TurboQuant vs CommVQ — honest comparison
4 min read

How to deploy NexusQuant in production (and what's missing)
4 min read