vLLM - DEV Community

xbill for Google Developer Experts

Jul 25

Self-Hosted Gemma 4 on TPU v6e: Deployment & SRE with Antigravity

#tpu #llm #vllm #antigravity

8 min read

Mingxin Technology

Jul 22

What 90% Line-Rate Utilization on a Single 100GbE Port Means: Analyzing Network Bottlenecks in Inference Storage

#kvcache #lmcache #vllm #ai

5 min read

Arsen Apostolov

Jul 9

Does a Second GPU Increase Ollama's Context Window? (Quadro P2000 + RTX 3090 Tested)

#llm #ollama #vllm #gpu

3 min read

Arsen Apostolov

Jul 5

vLLM vs llama.cpp vs Ollama: What Happens When Your Model Doesn't Fit in 24GB VRAM

#llm #homelab #vllm #ai

6 min read

xbill for Google Developer Experts

Jul 21

Gemma 4 E2B on a Single TPU v6e Chip: A Serving Deep Dive

#tpu #llm #vllm #googlecloud

8 min read

xbill for Google Developer Experts

Jul 21

tpu-management: a Claude Code skill for running Gemma 4 on Cloud TPUs

#googlecloud #tpu #vllm #claudecode

3 min read

The Cyber Sidekick

Jun 23

AI Inference Optimization in Cloud-Native Environments: GPU Orchestration, Edge Deployment, and Latency Reduction at Scale

#kubernetesgpuscheduling #aiinferenceoptimization #llmdeployment #vllm

4 min read

Lola Lin

Jul 27

Kimi K3 Open Weights Are Here: How to Self-Host the 2.8T-Parameter Model (Hardware, vLLM, and Data Sovereignty)

#kimik3 #openweights #selfhostai #vllm

8 min read

The Cyber Sidekick

Jun 18

AI Inference at the Edge: Running Real-Time LLMs in Kubernetes Without a GPU Farm

#edgeai #kubernetes #llminference #vllm

3 min read

Creeta

Jun 18

Qwen3.6-35B NVFP4 runs on one H100 — A100 owners are out

#qwen3 #nvfp4 #vllm #nvidia

8 min read

GaeaRuiW

Jun 9

I built an open-source alternative to Microsoft's KAITO that works on ANY Kubernetes cluster

#kubernetes #vllm #devops #opensource

2 min read

Tech_Nuggets

Jun 7

Prefix caching at scale: when it saves you 80% of prefill cost, and the eviction policies that quietly turn it into 5%

#llm #ai #infrastructure #vllm

9 min read

Tech_Nuggets

Jun 6

KV cache quantization: what FP8/INT8 K and V actually buy you, and where they break

#llm #ai #vllm #performance

8 min read

Harshit Luthra

Jul 2

What a green GPU dashboard hides

#gpu #observability #vllm #prometheus

9 min read

Harshit Luthra

Jul 2

The cheapest speedup is your load balancer

#gpu #inference #routing #vllm

8 min read
