DEV Community

# llm

Posts

AI Weekly — 2026-05-22 to 2026-05-29 | Anthropic's $965B Moment and the Infrastructure Bet

Comments
5 min read
Stop letting LLMs hallucinate dates — a tool for AI agents

Comments 1
2 min read
How I built an intent drift detector for LLM agents

Comments 3
1 min read
NVIDIA's Nemotron Diffusion: One Model, Three Generation Modes, 6× Faster

Comments
3 min read
Continuous batching wrecked our p99 latency. Here's the trace.

Comments
4 min read
Building a cost-efficient LLM caching layer in Python

Comments
5 min read
Why Claude Code Sessions Diverge: A Mechanism Catalog

Comments
3 min read
Gemma4 Apex GGUF, Ollama Context Optimization, & Llama3 Benchmarks

Comments
3 min read
Building a Fully-Local Research RAG on 2 GTX 1080 Ti + an RTX 3090 — 3 Gotchas

Comments 4
5 min read
How Do You Fit a Trillion-Parameter Model Into a Kubernetes Cluster?

Comments
17 min read
# Multi-Head Latent Attention (MLA)

Comments
9 min read
The Production Metric That Warns Us Before AI Failures Happen

Comments
3 min read
fftext: summarize, translate, and fact-check any text on your laptop. No API key.

Comments
3 min read
A Month with DeepSeek: What Happened When I Replaced Claude Opus for Real Work

Comments
7 min read
Rasa Podcast Discusses the Evolution of Conversation Design

Comments
3 min read