DEV Community

NexusQuant Series' Articles

João André Gomes Marques

Apr 7

Compress your LLM's KV cache 33x with zero training

#python #machinelearning #llm #opensource

2 min read

João André Gomes Marques

Apr 7

Why E8 lattice quantization beats scalar quantization for KV caches

#python #machinelearning #math #llm

2 min read

João André Gomes Marques

Apr 7

Longer contexts are easier to compress (not harder)

#python #machinelearning #llm #performance

2 min read

João André Gomes Marques

Apr 7

NexusQuant benchmarks: every number, honestly

#machinelearning #llm #performance #opensource

5 min read

João André Gomes Marques

Apr 7

NexusQuant vs KVTC vs TurboQuant vs CommVQ — honest comparison

#machinelearning #llm #performance #benchmark

4 min read

João André Gomes Marques

Apr 7

How to deploy NexusQuant in production (and what's missing)

#machinelearning #llm #production #python

4 min read