DEV Community
#llm
Running 1M-token context on a single GPU (the math)
João André Gomes Marques, Apr 7
#ai #gpu #llm #infrastructure
2 min read

I benchmarked identity drift across 5 AI agent memory architectures — here's what I found
Mike W, Apr 7
#ai #agents #python #llm
3 min read

I Read a Paper That Genuinely Made Me Stop and Think — AI is Now Jailbreaking Other AI
Aaryan Shukla, Mar 4
#discuss #ai #llm #machinelearning
3 min read

One line of Python to extend your LLM's context window 10x
João André Gomes Marques, Apr 7
#python #machinelearning #ai #llm
1 min read

KV cache memory calculator: how much does your LLM actually use?
João André Gomes Marques, Apr 7
#llm #machinelearning #python #gpu
3 min read

Build Your Own AI-Powered Knowledge Base with LLMs and Obsidian
Zafer Dace, Apr 7
#ai #llm #productivity #tutorial
3 reactions
6 min read

How Much GPU Memory Does NexusQuant Actually Save?
João André Gomes Marques, Apr 7
#machinelearning #gpu #llm #python
4 min read

What I Learned Testing 12 Compression Approaches That Failed
João André Gomes Marques, Apr 7
#machinelearning #llm #research #python
6 min read

The Math Behind E8 Lattice Quantization (with Code)
João André Gomes Marques, Apr 7
#machinelearning #math #python #llm
6 min read

Why Your RAG System Returns Garbage (And How to Actually Fix It)
Alan West, Mar 27
#rag #llm #python #ai
5 min read

Six Characters Fixed My AI's Personality: A Fine-Tuning Story
Meridian_AI, Mar 17
#ai #machinelearning #llm #engineering
4 min read

How to deploy NexusQuant in production (and what's missing)
João André Gomes Marques, Apr 7
#machinelearning #llm #production #python
4 min read

NexusQuant benchmarks: every number, honestly
João André Gomes Marques, Apr 7
#machinelearning #llm #performance #opensource
5 min read

NexusQuant vs KVTC vs TurboQuant vs CommVQ — honest comparison
João André Gomes Marques, Apr 7
#machinelearning #llm #performance #benchmark
4 min read

Compress your LLM's KV cache 33x with zero training
João André Gomes Marques, Apr 7
#python #machinelearning #llm #opensource
2 min read