DEV Community

# evaluation

Posts

Building an LLM Evaluation Framework That Actually Works

7 min read
Evals Aren’t a One-Time Report: Build a Living Test Suite That Ships With Every Release.

6 min read
If you don't red-team your LLM app, your users will

7 min read
Go Ahead and Judge Me- Agent Evaluators in AWS AgentCore

6 min read
Why Image Hallucination Is More Dangerous Than Text Hallucination

1 min read
The Self-Evolving Agent (Part 3): The Human in the Loop

4 min read