DEV Community

# evals

Posts

All I Want for Christmas is Observable Multi-Modal Agentic Systems

8 min read
LLM evaluation guide: When to add online evals to your AI application

5 min read
From Prototype to Production: 10 Metrics for Reliable AI Agents

10 min read
Why Data Management Makes or Breaks Your AI Agent Evaluations

7 min read
AI Hallucinations in 2025: Causes, Impact, and Solutions for Trustworthy AI

6 min read
LLM evaluation: a quick overview of Stax

2 min read
Why Your AI Agent Is Failing (and How to Fix It)

2 min read
The Hidden Risks of Testing AI-Powered Features with Traditional Tools

3 min read
HoloDeck Part 1: Why Building AI Agents Feels So Broken

3 min read