#llamacpp
GGUF Quantization Explained: Q4_K_M vs Q5_K_M vs Q8 — Which to Pick (2026)
Patrick Hughes · May 13
#llamacpp #gguf #quantization #localai
4 min read
Self-Hosted AI Agent Systems: Why Local Inference Matters More Than You Think
Aurora · May 13
#rust #ai #llamacpp #selfhosted
4 min read
Discontinued Optane Local LLM Powers a Kimi K2.5 Desktop Run
Simon Paxton · May 12
#intel #optane #kimik25 #llamacpp
5 min read
Fixing Qwen 3.6 4090 llama.cpp Bug: 18 tok/s on My RTX 4090
Umair Bilal · Apr 26
#llm #llamacpp #rtx4090 #qwen
8 min read
Running a 70B LLM on Pure RISC-V: The MilkV Pioneer Deployment Journey
Bruno Verachten · Apr 22
#cpuinference #deepseekr1 #llamacpp #llm
17 min read
First Words: LLM Inference on RISC-V
Bruno Verachten · Apr 22
#bananapi #benchmark #inference #llamacpp
9 min read
Speculative Checkpointing Pays Off Only on Repetitive Text
Simon Paxton · Apr 19
#llamacpp #openai #nvidia #meta
7 min read
llama.cpp Settings Can Change 8GB Performance 5x: Deriving Optimal Values for the Key Options
plasmon · Apr 14
#llm #llamacpp #gpu
4 min read
Parameter Count Is the Worst Way to Pick a Model on 8GB VRAM
plasmon · Apr 2
#llm #locallm #gpu #llamacpp
5 min read
Unsloth Studio: The Open-Source LLM Studio To Try
Simon Paxton · Mar 17
#unslothstudio #llamacpp #googlecolab #lora
8 min read
How to Run Gemma 4 Locally With Ollama, llama.cpp, and vLLM
Maksim Danilchenko · Apr 11
#gemma4 #ollama #llamacpp #vllm
2 reactions · 1 comment
9 min read