DEV Community
NexusQuant Series' Articles
Compress your LLM's KV cache 33x with zero training
João André Gomes Marques
Apr 7
#python #machinelearning #llm #opensource
2 min read
Why E8 lattice quantization beats scalar quantization for KV caches
João André Gomes Marques
Apr 7
#python #machinelearning #math #llm
2 min read
Longer contexts are easier to compress (not harder)
João André Gomes Marques
Apr 7
#python #machinelearning #llm #performance
2 min read
NexusQuant benchmarks: every number, honestly
João André Gomes Marques
Apr 7
#machinelearning #llm #performance #opensource
5 min read
NexusQuant vs KVTC vs TurboQuant vs CommVQ — honest comparison
João André Gomes Marques
Apr 7
#machinelearning #llm #performance #benchmark
4 min read
How to deploy NexusQuant in production (and what's missing)
João André Gomes Marques
Apr 7
#machinelearning #llm #production #python
4 min read