DEV Community

# evals

Posts

How to Evaluate LLM Outputs: Building Evals That Actually Catch Regressions

Comments
9 min read
The Loop Is Only as Good as the Metric

Comments
7 min read
Why Most AI Teams Are Flying Blind: And What to Do About It

Comments 1
13 min read
Wait, you guys run evals?

Comments
1 min read
If You Can Survive a Toddler, You Can Ship LLMs in Production

5
Comments 2
5 min read
From zero evals to a working multimodal evaluation in 30 minutes using LangWatch Skills

Comments
7 min read
Your coding agent already knows how to test your AI agent (we just turned it into a Skill)

1
Comments
4 min read
Build an eval harness for 184 AI agent prompts with promptfoo

Comments
8 min read
Self-improving Coding Agents

1
Comments 1
5 min read
Evaluate LLM code generation with LLM-as-judge evaluators

7
Comments
12 min read