#benchmark
First Words: LLM Inference on RISC-V
Bruno Verachten
Apr 22
#bananapi #benchmark #inference #llamacpp
9 min read
Benchmarking llama.cpp on SpacemiT K3: RISC-V AI Cores vs Standard RVV (Part 4)
Bruno Verachten
Apr 22
#ai #bananapi #benchmark #embedded
15 min read
MCP vs CLI for AI Agents: A Real AWS Benchmark (and Why the Popular Narrative Asks the Wrong Question)
Webmaster Ramos
Apr 21
#ai #aws #mcp #benchmark
1 reaction
18 min read
Designing a practical sorting benchmark across Python, Rust, and C
1234567890-
Apr 20
#benchmark #webdev #opensource #algorithms
2 min read
I Prompted 5 Frontier LLMs to “Report Uncertainty” Here’s What Happened to Their Statistical Validity Scores
Venkata Manideep Patibandla
Apr 18
#ai #llm #benchmark #rag
2 min read
Opus 4.7 First Look: I Tested the Day-Old Model Against 3 Other Claudes on 10 Real Tasks
James AI
Apr 17
#ai #llm #claude #benchmark
1 comment
5 min read
Writing an HTTP Load Tester That Doesn't Lie About p99
SEN LLC
Apr 16
#rust #benchmark #http #tutorial
8 min read
I Tested OpenAI, Anthropic, and Cohere for Bulk Content Generation. Here's What the Data Actually Shows.
Aakash Gour
Apr 16
#ai #openai #api #benchmark
7 min read
Micro-benchmarking TypeScript Without Lying to Yourself
SEN LLC
Apr 15
#typescript #benchmark #cli #tutorial
1 reaction
8 min read
I Benchmarked 8 Ollama Cloud AI Models. The 397B One Lost to a 1.6s Model.
Agent Paaru
Apr 10
#ai #ollama #benchmark #cloud
3 min read
I benchmarked GPT-4o, Claude 3.5, and Gemini 1.5 for security — the results
NY-squared2-agents
Apr 8
#ai #security #llm #benchmark
2 min read
NexusQuant vs KVTC vs TurboQuant vs CommVQ — honest comparison
João André Gomes Marques
Apr 7
#machinelearning #llm #performance #benchmark
4 min read
🚀 8x Faster Than ONNX Runtime: Zero-Allocation AI Inference in Pure C#
DevOnBike
Apr 5
#dotnet #performance #ai #benchmark
3 min read
ARC-AGI V3 Explained: The New AI Benchmark That Breaks Every Agent
Max Quimby
Mar 29
#ai #machinelearning #agents #benchmark
3 min read
GPT-5.1 scored 26%. Gemini 3 Flash scored 74%. Same prompt, same tools.
ThomasP
Mar 28
#ai #llm #benchmark #agents
8 min read