Evals

Agnel Nieves for Promptway

Jul 29

The 12-Prompt Eval I Run Before I Trust Any Model Upgrade

#prompting #evals #modelmigration #claude

3 min read

lbobylev

Jul 26

Spring AI Evals: how I test agent behavior

#springboot #ai #evals #llm

7 min read

Edward Li

Jul 8

Do not choose an AI model from a leaderboard alone

#ai #api #llm #evals

3 min read

Ethan Walker

Jul 5

A 94% pass rate hid a PII leak in 6 test cases

#ai #llm #evals

5 min read

Nabbil Khan

Jul 19

Zero Is Not a Score

#advertising #aiagentoperations #developertooling #evals

4 min read

Kate Pond

Jun 25

Not Enough SMEs or Customers to Make Your Evals? Make Some!

#ai #evals #testing #personas

5 min read

Ethan Walker

Jul 14

We gated CI on six open-source LLM eval frameworks. Only two survived the merge queue.

#ai #opensource #llm #evals

14 min read

Jangwook Kim

Jun 11

OpenAI Agent Builder and Evals Winddown Migration Checklist

#openai #agentbuilder #evals #agentssdk

11 min read

techpotions

Jul 11

How to Add Evals to an LLM Feature

#llmevaluation #evals #llmfeatures #aitesting

5 min read

Ruben

Jul 7

English-Only Agent Evals Miss Real Failures

#ai #testing #evals #agents

6 min read

Konstantin Gredeskoul

Jun 23

What 25 Years of Deterministic Software Engineering Taught Me About Building AI Systems

#ai #evals #appliedai #tutorial

1 min read

Vasyl

Jun 17

AI Evals, Part 5: From a Number to a Gate - Evals in CI and Production

#ai #evals #llm #dotnet

4 min read

Vasyl

Jun 17

AI Evals, Part 4: LLM-as-Judge, Done Right

#ai #evals #llm #dotnet

5 min read

Dishant Sethi

May 27

How to Evaluate LLM Outputs: Building Evals That Actually Catch Regressions

#evals #ai #llmops #agents

9 min read

Vasyl

Jun 16

AI Evals, Part 3: Golden Datasets That Don't Lie

#ai #evals #llm #dotnet

5 min read
