DEV Community

# llm

Posts

Skills for eval-driven agent optimization

1 min read
GPT-5 vs Claude Sonnet 4: real per-task cost and benchmark comparison for production workloads

7 min read
62.2% on Aider Polyglot from a MacBook Pro. Then the other model we tried scored 4%. Here's what actually happened, with a working cost loop attached.

16 min read
DeepSeek-V4 Changes the Context Game for Agents — And Your Memory Architecture Should Adapt

3 min read
What If You Compressed Your Prompts Into Chinese Emoji? (A Token-Saving Thought Experiment)

3 min read
The JSON-Mode Prompt Pattern That Survives Claude Version Bumps

7 min read
The 3 Alerts Every LLM Team Should Have Set Up by Tomorrow

7 min read
Hybrid Search Is the Phrase You'll Hear at Every RAG Talk in 2026

7 min read
GEO / AI Search Thread

5 min read
Your RAG Eval Set Is Probably Wrong. The Test That Catches It.

7 min read
Stop Caching the Whole LLM Response. Cache the Embedding.

8 min read
The 2-Line Defense That Stops 90% of Real-World Prompt Injection

7 min read
OpenAI Outage Postmortem: What Status Pages Don't Tell You

7 min read
The 6-Line Postgres Migration That Halved a Team's LLM Bill

7 min read
The Single Unit Test Every LLM Prompt Should Have

7 min read