DEV Community

# benchmark

Posts

First Words: LLM Inference on RISC-V

9 min read
Benchmarking llama.cpp on SpacemiT K3: RISC-V AI Cores vs Standard RVV (Part 4)

15 min read
MCP vs CLI for AI Agents: A Real AWS Benchmark (and Why the Popular Narrative Asks the Wrong Question)

18 min read
Designing a practical sorting benchmark across Python, Rust, and C

2 min read
I Prompted 5 Frontier LLMs to “Report Uncertainty” Here’s What Happened to Their Statistical Validity Scores

2 min read
Opus 4.7 First Look: I Tested the Day-Old Model Against 3 Other Claudes on 10 Real Tasks

5 min read
Writing an HTTP Load Tester That Doesn't Lie About p99

8 min read
I Tested OpenAI, Anthropic, and Cohere for Bulk Content Generation. Here's What the Data Actually Shows.

7 min read
Micro-benchmarking TypeScript Without Lying to Yourself

8 min read
I Benchmarked 8 Ollama Cloud AI Models. The 397B One Lost to a 1.6s Model.

3 min read
I benchmarked GPT-4o, Claude 3.5, and Gemini 1.5 for security — the results

2 min read
NexusQuant vs KVTC vs TurboQuant vs CommVQ — honest comparison

4 min read
🚀 8x Faster Than ONNX Runtime: Zero-Allocation AI Inference in Pure C#

3 min read
ARC-AGI V3 Explained: The New AI Benchmark That Breaks Every Agent

3 min read
GPT-5.1 scored 26%. Gemini 3 Flash scored 74%. Same prompt, same tools.

8 min read