Everyone told me to just use Cursor. So I did the research instead.
A 2025 study by METR ran 246 real developer tasks with AI tools.
**Developers predicted they'd be 24% faster. The actual result? They were 19% slower.** That's not a rounding error — that's the entire value proposition falling apart under real conditions.
Here's what I found when I dug deeper:
**Cursor's dirty secret:**
Developers are reporting `.cursorrules` compliance as low as 0 out of 9 tests in agent mode. You spend hours defining architecture constraints, and the model ignores them silently. No error. No warning.
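For context, a `.cursorrules` file is just free-form instructions dropped in the repo root — there's no schema and no enforcement. A hypothetical example of the kind of constraints that get silently skipped (paths and rules are illustrative, not from the original report):

```text
# .cursorrules (repo root) — illustrative example
- All database access goes through the repository layer in src/repositories/.
- Never import ORM models directly inside route handlers.
- Every new endpoint must ship with a matching test under tests/api/.
- Prefer composition over inheritance for new service classes.
```

The whole complaint is that nothing verifies these: if the agent ignores a rule, the only signal is the diff you have to review yourself.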
**Devin's uncomfortable truth:**
Independent researchers at Answer.AI tested Devin on 20 real tasks: 14 failures, 3 inconclusive, just 3 successes. One task took a human 36 minutes — Devin spent 6 hours and still failed.
**What actually works instead:**
- Continue.dev — open source, BYO model, config lives in Git where it belongs
- OpenHands — human approval before every commit, full action logs, zero vendor lock-in
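To make "config lives in Git" concrete, here's the rough shape of a Continue.dev setup file. This is a sketch assuming Continue's YAML config format with a local Ollama model — the model names and field values are illustrative, so check the schema of your installed version:

```yaml
# .continue/config.yaml — committed to the repo, reviewed like any other code
name: team-assistant
version: 0.0.1
models:
  - name: Llama 3.1 8B (local)   # illustrative; any Ollama-served model works
    provider: ollama
    model: llama3.1:8b
    roles:
      - chat
      - edit
```

Because this file is plain text in the repo, changes to the AI setup go through the same pull-request review as everything else — the opposite of a vendor dashboard you can't diff.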
The pattern is clear: the expensive "magic" tools are optimizing for the demo, not the daily workflow.
I wrote a full breakdown with real data, code snippets, and comparisons — including the exact config setup I use with Continue.dev.
👉 Read the full article here:
https://buildwithclarity.hashnode.dev/firing-cursor-devin-why-i-switched-to-open-source-ai