DEV Community

AI Security Benchmark Series: Articles

Series by Ofri Peretz
I Let Claude Write 80 Functions. 65-75% Had Security Vulnerabilities.
12 min read
The AI Hydra Problem: Fix One AI Bug, Get Two More
12 min read
We Ranked 5 AI Models by Security. The Leaderboard Is Wrong.
9 min read
Aggregate Benchmarks Lie. Here's What 700 AI Functions Look Like by Security Domain.
12 min read
Claude Wrote a NestJS Service. TypeScript Was Happy. ESLint Found 6 Security Holes.
10 min read
Same NestJS Prompt. Claude Got 6 Security Errors. Gemini Got 2. Here's What Both Got Wrong.

6 min read
Claude vs Gemini Across 4 Security Domains: A Dead Heat — and the Hardening 63% of AI Code Skips
8 min read