DEV Community

AI Agent Evaluation Series' Articles

shashank agarwal

Dec 10 '25

How to use System prompts as Ground Truth for Evaluation

#testing #agents #llm #ai

1 min read

shashank agarwal

Dec 12 '25

Stop Evaluating AI Agents Like ML Models: A Paradigm Shift for Developers

#ai #llm #agents #machinelearning

3 min read

shashank agarwal

Dec 15 '25

Your System Prompt is Your Ground Truth: Ditch Manual Labeling for AI Agent Evaluation

#ai #programming #tutorial #agents

3 min read

shashank agarwal

Dec 17 '25

Beyond Accuracy: The 73+ Dimensions of AI Agent Quality

#ai #agents #machinelearning #programming

3 min read

shashank agarwal

Dec 19 '25

How to Analyze AI Agent Traces Like a Detective

#ai #testing #agents #webdev

3 min read

shashank agarwal

Dec 22 '25

5 Types of AI Hallucinations (And How to Detect Them)

#discuss #ai #machinelearning #agents

3 min read

shashank agarwal

Dec 24 '25

The Hidden Costs of Inefficient AI Agents (And How to Fix Them)

#webdev #ai #programming #devops

2 min read

shashank agarwal

Dec 26 '25

Is Your AI Agent a Compliance Risk? How to Find Violations Hidden in Traces

#privacy #agents #security #ai

2 min read

shashank agarwal

Dec 29 '25

How to Build an AI Agent Evaluation Framework That Scales

#ai #webdev #programming #devops

3 min read

shashank agarwal

Dec 31 '25

Monitoring vs. Evaluation: The Critical Distinction Most AI Devs Miss

#ai #webdev #programming #devops

2 min read

shashank agarwal

Jan 1

The AI Agent Feedback Loop: From Evaluation to Continuous Improvement

#webdev #ai #programming #devops

3 min read