DEV Community
#benchmarks
Posts
How to Choose the Right AI Model for the Right Job
Shafiq Ur Rehman · Apr 21 · 13 min read
#ai #benchmarks #modelselection

How I took LongMemEval oracle from 62% to 82.8% without touching the retriever
t49qnsx7qt-kpanks · Apr 21 · 3 min read
#ai #llm #benchmarks #memory

What Is Agent Evaluation? How EClaw Arena Benchmarks AI Agents Across 12 Dimensions
EClawbot Official · Apr 15 · 3 min read
#ai #agents #benchmarks #evaluation

Sonnet 4.6 vs Haiku 4.5 vs Opus 4.6: I Tested 3 Claude Models on 10 Real Tasks
James AI · Apr 15 · 3 min read
#ai #llm #claude #benchmarks

The YC President Endorsed an AI Memory System With Fake Benchmarks. He Also Shipped His Own. We Read the Code.
Penfield · Apr 11 · 3 min read
#ai #aimemory #benchmarks #yc

Proposal: A Real Benchmark for Long-Term AI Memory Systems
Penfield · Apr 10 · 3 min read
#ai #aimemory #benchmarks

I accidentally made the fastest event system in the world
stderr · Apr 21 · 11 min read · 1 comment
#rust #performance #benchmarks #events

The $500 GPU That Outperforms Claude Sonnet on Coding Benchmarks
Pooya Golchian · Apr 7 · 4 min read
#ai #llm #benchmarks #nvidia

Milla Jovovich just released an AI memory system. It reached over 1.5 million people and 5,400 GitHub stars in less than 24 hours.
Penfield · Apr 7 · 9 min read
#ai #aimemory #benchmarks

LLM Evaluation: Metrics and Testing Strategies
Matt Frank · Apr 6 · 6 min read
#llmevaluation #aitesting #benchmarks

Why Small LLMs Fail at Tool Calling: The Shocking Discovery from Our Llama 3B Benchmark
Anak Wannaphaschaiyong · Apr 3 · 11 min read
#ai #llm #agents #benchmarks

We audited LoCoMo: 6.4% of the answer key is wrong and the judge accepts up to 63% of intentionally
Penfield · Apr 4 · 5 min read
#ai #mcp #machinelearning #benchmarks

Windsurf's Arena Mode Lets You Blind-Test AI Models. I Tried It.
Alan West · Mar 29 · 5 min read · 1 reaction
#windsurf #aimodels #devtools #benchmarks

Critical Flaws in Long-Term Memory Benchmarks: Addressing Unreliable and Uninterpretable Results
Valeria Solovyova · Mar 27 · 15 min read
#ai #benchmarks #memory #reliability

GenHTTP vs ASP.NET Minimal APIs: The C# Benchmark Showdown Nobody Expected
Benny · Mar 27 · 6 min read · 5 reactions
#csharp #dotnet #performance #benchmarks