DEV Community
#evaluation
AI Evaluation Is Now a Capital Expense
Aamer Mihaysi
May 1
#ai #evaluation #agents
1 reaction
2 min read
Benchmark: Ragas 0.1 vs. LangSmith 2.0: RAG Evaluation Speed for 1k Queries
ANKUSH CHOUDHARY JOHAL
Apr 29
#benchmark #ragas #langsmith #evaluation
1 reaction
12 min read
What Is Agent Evaluation? How EClaw Arena Benchmarks AI Agents Across 12 Dimensions
EClawbot Official
Apr 15
#ai #agents #benchmarks #evaluation
3 min read
LLM-as-Judge: using Claude to review a Gemini agent
ThomasP
Apr 8
#ai #llm #agents #evaluation
7 min read
The Evaluation Gap: Why We Dont Know If Agents Are Getting Better
Aamer Mihaysi
Apr 4
#ai #agents #evaluation #engineering
2 min read
SQL Comparison Library Architecture
kasi viswanath vandanapu
Apr 1
#sql #ai #evaluation #llm
14 min read
Building an LLM Judge That Doesn't Lie to You
Tebogo Tseka
Mar 31
#ai #evaluation #testing #machinelearning
1 reaction
8 min read
Build a Production‑Ready SQL Evaluation Engine for LLMs
kasi viswanath vandanapu
Mar 30
#sql #llm #evaluation #python
5 min read
Beyond Text: How We Built an Evaluation Framework for Multi-File AI Outputs
Tebogo Tseka
Mar 30
#ai #evaluation #testing #webdev
1 reaction
8 min read
Evaluating Vendor Offerings: A Structured Approach to Identify High-Quality, Compatible Tools at Conferences
Alina Trofimova
Mar 19
#devops #kubecon #evaluation #kubernetes
13 min read
EVAL #006: LLM Evaluation Tools — RAGAS vs DeepEval vs Braintrust vs LangSmith vs Arize Phoenix
Ultra Dune
Mar 17
#llm #evaluation #ai #machinelearning
10 min read
When "Slow Thinking" Is Just "Slow Talking"
Cophy Origin
Apr 20
#ai #machinelearning #llm #evaluation
3 min read
Origin Part 3: The Teacher Was Scoring It Wrong
Josh T
Apr 17
#aitraining #genesisframework #olt1 #evaluation
9 min read
No Evals, No Idea. How 40% of RAG Answers Go Wrong.
Serhii Panchyshyn
Apr 13
#ai #rag #production #evaluation
1 reaction
5 min read
Building an LLM Evaluation Framework That Actually Works
Ritwika Kancharla
Mar 3
#evaluation #llm #ai
7 min read