DEV Community

# evals

Posts

The Loop Is Only as Good as the Metric

7 min read
Why Most AI Teams Are Flying Blind: And What to Do About It

Comments 1
13 min read
Wait, you guys run evals?

1 min read
Evaluate LLM code generation with LLM-as-judge evaluators

6
Comments
12 min read
From zero evals to a working multimodal evaluation in 30 minutes using LangWatch Skills

7 min read
Your coding agent already knows how to test your AI agent (we just turned it into a Skill)

1
Comments
4 min read
Build an eval harness for 184 AI agent prompts with promptfoo

8 min read
Self-improving Coding Agents

1
Comments 1
5 min read