DEV Community

# llm

Posts

Prefix caching in vLLM under multi-tenant agent traffic

Comments 1
4 min read
I used LLMs to rewrite meta descriptions for 1,600 articles — honest results

Comments
5 min read
Using Qwen 3.6 Plus: Great but a Bit Expensive

Comments
2 min read
Our AI Inference Bill Dropped 65% After We Stopped Treating Every Query the Same

Comments
5 min read
Qwen 3.6 & llama.cpp Push Local Inference Limits on Consumer GPUs

Comments
3 min read
AI Weekly — 2026-05-15 to 2026-05-22 | The Agentic Inflection Is Real, But the Enterprise Gap Is Wider Than Ever

Comments
4 min read
I tested cheap vs expensive LLMs across 3 real agent tasks. The cheap model won every time.

Comments
4 min read
Routing Event-Camera Pipelines Through an LLM Gateway: A Field Report

Comments
4 min read
Measuring AI Gateway Failover: 30 Days of Production Data

Comments
3 min read
Routing diffusion inference traffic across three providers

Comments
4 min read
ToolRouter: Switch AI Coding Tools Freely Without Losing Context

2
Comments
6 min read
Beyond the Stateless Prompt: Building an Auditable Product Intelligence Pipeline with Cascadeflow and Hindsight

Comments
5 min read
Putting an LLM Gateway in Front of Our Build Agents

Comments
4 min read
You Probably Don't Need 8-Bit Quantization

Comments
2 min read
The README Was a Protocol. The Entrypoint Was Still Optional.

Comments
8 min read