Hey folks, we recently put together a practical benchmark of AI code review tools using real pull requests from five open-source projects (Sentry, Cal.com, Grafana, Discourse, and Keycloak).
What we did:
We ran the exact same PRs through four AI code review tools:
• Kodus
• GitHub Copilot
• CodeRabbit
• Cursor BugBot
No extra configuration. No tuning in favor of any tool.
The focus was on bugs with Critical, High, and Medium severity.
We’d really love to hear your feedback.
Here’s the link: https://kodus.io/en/benchmark-ai-code-review/