#llamacpp
GGUF Quantization Explained: Q4_K_M vs Q5_K_M vs Q8 — Which to Pick (2026)
Patrick Hughes · May 13
#llamacpp #gguf #quantization #localai
4 min read
Self-Hosted AI Agent Systems: Why Local Inference Matters More Than You Think
Aurora · May 13
#rust #ai #llamacpp #selfhosted
4 min read
Discontinued Optane Local LLM Powers a Kimi K2.5 Desktop Run
Simon Paxton · May 12
#intel #optane #kimik25 #llamacpp
5 min read
Fixing Qwen 3.6 4090 llama.cpp Bug: 18 tok/s on My RTX 4090
Umair Bilal · Apr 26
#llm #llamacpp #rtx4090 #qwen
8 min read
Running a 70B LLM on Pure RISC-V: The MilkV Pioneer Deployment Journey
Bruno Verachten · Apr 22
#cpuinference #deepseekr1 #llamacpp #llm
17 min read
First Words: LLM Inference on RISC-V
Bruno Verachten · Apr 22
#bananapi #benchmark #inference #llamacpp
9 min read
Speculative Checkpointing Pays Off Only on Repetitive Text
Simon Paxton · Apr 19
#llamacpp #openai #nvidia #meta
7 min read
llama.cpp Settings Can Change 8GB Performance 5x: Deriving Optimal Values for the Key Options
plasmon · Apr 14
#llm #llamacpp #gpu
4 min read
Parameter Count Is the Worst Way to Pick a Model on 8GB VRAM
plasmon · Apr 2
#llm #locallm #gpu #llamacpp
5 min read
Unsloth Studio: The Open-Source LLM Studio To Try
Simon Paxton · Mar 17
#unslothstudio #llamacpp #googlecolab #lora
8 min read
How to Run Gemma 4 Locally With Ollama, llama.cpp, and vLLM
Maksim Danilchenko · Apr 11
#gemma4 #ollama #llamacpp #vllm
2 reactions · 1 comment
9 min read