DEV Community
#evaluation
Posts
Deterministic Checks vs Model-as-Judge: A Tiered Approach to Agent Evaluation
Saurav Bhattacharya
Jun 5
#ai #testing #agents #evaluation
4 min read

Monitoring vs Evaluation — What's the Difference (and Why It Matters)
Phylis Korir
Jun 3
#monitoring #evaluation #projectmanagement #beginners
5 reactions
6 min read

Evals Are Alignment Enforcement: Why Your Safety Strategy Needs Runtime Checks
Saurav Bhattacharya
Jun 7
#ai #security #evaluation #agents
1 reaction
5 min read

The Alignment Problem Is an HR Problem - And We Should Treat It Like One
Saurav Bhattacharya
Jun 7
#ai #agents #safety #evaluation
7 comments
4 min read

The First Psychiatric Evaluation of AI Agents (original title: 第一次对AI Agent的精神病学评估)
guangda
Jun 6
#ai #agents #psychology #evaluation
1 reaction
1 min read

Prompt Engineering for Automated Evaluation: Making LLMs the Judge in AI Builder Solutions
Bala Madhusoodhanan
May 25
#aibuilder #powerplatform #evaluation #powerfuldevs
5 reactions
4 min read

The First Psychiatric Evaluation of AI Agents
guangda
Jun 5
#ai #agents #psychology #evaluation
3 min read

Why I used three different critic roles instead of one (and what the eval taught me)
Bohyeon Jang
May 31
#llm #python #ai #evaluation
2 comments
6 min read

Building a domain-specific LLM evaluation set from scratch
Tech_Nuggets
Jun 4
#llm #ai #evaluation #opensource
1 reaction
8 min read

What is an LLM evaluation harness? A deep dive into lm-eval-harness
Tech_Nuggets
Jun 3
#llm #ai #evaluation #opensource
1 reaction
7 min read

Evaluating LLM code reviewers: an offline harness for precision, recall, and routing
Prakhar Singh
May 13
#llm #codereview #evaluation #ai
2 reactions
5 min read

RAG Series (8): RAG Evaluation System — Speaking with Data
WonderLab
May 6
#rag #ragas #llm #evaluation
9 min read

How do you eval LLM output that isn't code?
ur-grue
May 29
#ai #llm #evaluation #writing
1 comment
3 min read

why Cohen's kappa drifts week to week (and what to do about it)
Maya Andersson
Jun 2
#ai #evaluation #machinelearning #statistics
7 reactions
1 comment
1 min read

Dogfooding an LLM agent eval pack on my own production agent — what 6-dim methodology surfaced
weiseer
May 27
#ai #llm #agents #evaluation
5 min read