DEV Community

# llm

Posts

Fitting LLM Reply Suggestions Into Every Provider's Prompt Cache — Without Structured Output

Comments 1
4 min read
High-Value If, Low-Value Foreach: Why Agents Trade in Judgment Structures, Not Models

2
Comments
23 min read
Tackle High Token Usage with GraphRAG

1
Comments
4 min read
How to build a production RAG pipeline in Python (without a vector database)

1
Comments
5 min read
The Capability Curve Has No Memory

1
Comments
11 min read
My AI Agent Kept Lying to Me. Then It Tried to Trick Me.

Comments 2
5 min read
How I Trained a Support AI on Real Customer Service History: From Raw Conversations to RAG in Production

1
Comments 1
11 min read
Stop Burning Cash on Long-Context RAG: Ephemeral Prompt Caching with Spring AI and JTokkit

Comments 1
2 min read
The Daimon Java SDK: Chat, Stream, and Query Memory from 3 Lines of Java

Comments
5 min read
Stop Burning Tokens on Chat / Agent Loops — Here's What Actually Works

Comments 1
6 min read
When the LLM Refuses: A Fallback Chain That Salvages Most Refusals

Comments 1
5 min read
Welcome to the Slop KPI Era: How Tokenmaxxing Is Making AI Worse

1
Comments
4 min read
Your RAG Pipeline Is Failing 40% of Queries. Here's the Fix.

Comments
2 min read
Qwen3.7 Max vs Open-Weight LLMs: Practical Migration Notes

2
Comments
5 min read
Inworld TTS Paralinguistic Tags Don't Work — Here's What Does

Comments 1
4 min read