DEV Community

# llm

Posts

The hidden cost of streaming LLMs: caches you can't use, bills you don't expect, and complexity you don't need

Comments 1
11 min read
Unlocking the Power of RAG Systems with LangChain and Vector Databases

Comments
3 min read
Switching our LLM-as-judge from 5-class to binary in CI: the patterns we kept

Comments
3 min read
AirLLM Shrinks 70B LLMs to 4GB VRAM; DPO & Supermemory Boost Open Models

Comments
3 min read
From Chatbots to Personal AI Agents: The Infrastructure Developers Actually Need

Comments
18 min read
GLM 5.2: Zhipu's Open-Weight Frontier Model With 1M Context

Comments
5 min read
RAG pilots fail when the sources are not ready

Comments
2 min read
The most expensive bug in an AI agent is the one it's confident about

Comments
3 min read
AWS Optimizes Starts, Adaptive Worms Rise, and LLM Memory Gets Local

Comments
2 min read
Enterprise RAG’s Biggest Risk: Answers That Look Correct but Aren’t

Comments
7 min read
I built a circuit breaker for LLM agents after seeing someone lose $200 overnight

Comments
6 min read
From Commerce to E-Commerce to MCP-Commerce: The Third Wave

Comments
3 min read
Stop Spending $500/Month on API Calls: Build Your Own LLM Pipeline

Comments
3 min read
Over-editing is a token tax: GPT-5.4 ships 6.5x more diff per fix than Claude Opus 4.6, and your bill notices

Comments
2 min read
I ran a 35B AI model on my old GPU and was surprised!

Comments
4 min read