The hype is fading, the hallucinations are dropping, and the robots finally feel like teammates instead of toddlers with keyboards.
☕️ Quick Sip Summary
- New models, fewer face-palms. Claude Sonnet 4 (via Cursor), GPT-4.5, and Gemini 2.5 crank hallucinations down to ~15%.
- Speed is real, trust is tricky. A controlled METR study showed a 55% speed boost, but Stack Overflow’s 2025 survey says only 29% of devs actually trust AI output.
- My reality check: AI now handles the boring 80%, but the final 20% is still very human. I’ve never merged a PR without running tests and giving the code a side-eye.
1. The Day My “Intern” Grew Up
Back in early ’24, AI coding tools felt... twitchy. I remember asking it to build a login page and getting something that not only skipped validation but practically whispered "hey, let’s hardcode a password for fun."
But I didn’t quit on it. I kept using it — more cautiously at first — learning where it helped and where it hallucinated. It was like watching a junior dev slowly grow up. The suggestions started making more sense. The bugs showed up less often. And somewhere between dozens of commits and a few thousand prompts, I realized: I was trusting it more than I thought.
Then came May 2025. Cursor integrated Claude Sonnet 4, and everything clicked. I typed: “Scaffold a Nuxt 3 page listing Stripe invoices with a Tailwind table.”
Two lattes later, I had a fully working page: clean props, sensible Tailwind, pagination built-in, no missing imports. It didn’t just look good — it ran.
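For flavor, the data-shaping half of a page like that looks roughly like this. This is my reconstruction, not the model's actual output, and the invoice shape and helper name are hypothetical (a simplified slice of what Stripe returns):

```typescript
// A simplified, hypothetical subset of Stripe's invoice object —
// just the fields the table actually renders.
interface InvoiceRow {
  id: string;
  customer: string;
  amountDue: number; // Stripe reports amounts in integer cents
  status: string;
}

// Turn one Stripe-style invoice into display-ready table cells.
function toTableRow(inv: InvoiceRow): {
  id: string;
  customer: string;
  amount: string;
  status: string;
} {
  return {
    id: inv.id,
    customer: inv.customer,
    // Integer cents → "$49.99"-style string for the Tailwind table.
    amount: `$${(inv.amountDue / 100).toFixed(2)}`,
    status: inv.status.toUpperCase(),
  };
}

const row = toTableRow({
  id: "in_123",
  customer: "Acme",
  amountDue: 4999,
  status: "open",
});
console.log(row.amount); // "$49.99"
```

The point isn't the ten lines themselves; it's that the generated version got the cents-to-dollars detail right without being told.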
Why the glow-up?
- The AI Hallucination Benchmark 2025 clocks GPT-4.5 and Sonnet 4 at ~15% hallucinations (down from over 50%).
- The METR study shows AI can make devs 55% faster on focused tasks.
That aligned with what I was feeling: the tool had matured — and maybe so had I.
2. Where AI Shines — And Where It Slips
After months with Sonnet 4, here’s what’s felt like magic — and what still gives me pause:
What works well:
- Boilerplate scaffolding (components, DTOs)
- Writing unit tests
- Repo Q&A like “Where do we parse JWTs?”
- Project-wide refactors
Where I don’t trust it (yet):
- Security-critical flows like auth and crypto
- Perf-sensitive logic
- Legacy spaghetti with zero documentation
- Tasks I haven’t mentally designed yet
If I don’t know exactly what I want, the model will happily hallucinate an entire fantasy architecture. Then I get to debug my own laziness at 2 a.m.
3. My Four-Step Prompt Ritual 🙏
Here’s how I talk to the model now:
- Set the scene. “You're a senior dev experienced in Nuxt and Stripe.”
- Describe the goal. “Implement server-side pagination for /api/invoices.”
- Set the stack. “Nuxt 3, Prisma, PostgreSQL, limit 50 rows, return totalCount.”
- Guide the scope. “Please outline the steps only.”
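Put together, the ritual reads as one prompt (an assembled example, not a transcript):

```
You're a senior dev experienced in Nuxt and Stripe.
Goal: implement server-side pagination for /api/invoices.
Stack: Nuxt 3, Prisma, PostgreSQL. Limit 50 rows per page, return totalCount.
Please outline the steps only — no code yet.
```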
From there, I review its plan, give feedback, and go step-by-step through implementation. It feels like pair programming — minus the headphone tugging.
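To ground that pagination example: once we get past the outline stage, the core of the plan tends to land on query math like this. A minimal sketch assuming the Prisma/PostgreSQL stack above — the name `buildPageQuery` is mine, not from any generated code:

```typescript
const MAX_PAGE_SIZE = 50; // the hard cap from the prompt: "limit 50 rows"

interface PageQuery {
  skip: number; // rows to skip (Prisma's offset)
  take: number; // rows to fetch
}

// Translate 1-based page params into a Prisma-style skip/take pair,
// clamping the requested size to the 50-row cap and floor-guarding bad input.
function buildPageQuery(page: number, pageSize: number): PageQuery {
  const take = Math.min(Math.max(1, Math.floor(pageSize)), MAX_PAGE_SIZE);
  const safePage = Math.max(1, Math.floor(page));
  return { skip: (safePage - 1) * take, take };
}

// The handler would then run two queries and return totalCount with the rows:
//   const [rows, totalCount] = await Promise.all([
//     prisma.invoice.findMany({ ...buildPageQuery(page, size) }),
//     prisma.invoice.count(),
//   ]);

console.log(buildPageQuery(3, 50)); // { skip: 100, take: 50 }
```

Keeping the clamp in one pure function is exactly the kind of step I want the model to propose before it writes any handler code — it's trivially testable without touching the database.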
4. Trust Issues: Everyone’s Using It, Nobody’s Sleeping Easy
- 80% of devs use AI tools (according to Stack Overflow 2025).
- Only 29% trust the output unedited.
Reddit is full of rants like “Management wants 20% of commits from Copilot.” One post even mentioned execs tracking prompt counts per day.
That’s not how I roll. I’d rather measure features shipped, not lines of AI-assisted code.
5. The Numbers Don’t Lie
One recent feature:
- Old-school dev time: ~3h 45m
  - 45m for scaffolding
  - 120m for core logic
  - 60m for tests
- With Cursor + Sonnet 4: ~1h 45m
  - 5m prompt for scaffolding
  - 90m for prompts + tweaks on core logic
  - 10m for tests
That’s two hours saved — enough for a gym session, or let’s be real, another coffee and a doomscroll.
6. Five Things I Wish I’d Known Sooner
- Design before you prompt. AI isn’t great at mind-reading.
- Break it up. Big tasks become small, accurate prompts.
- Test everything. No green CI, no merge.
- Sleep on major merges. AI optimism is real — and sneaky.
- Don’t ditch juniors. Pair them with AI, then make them explain every line.
7. So… Should You Trust the Robot?
Yes — but only like you’d trust a hyper-literal intern. Brilliant with grunt work. Hopeless with nuance. When I give it structure and oversight, it makes me faster. When I hand it the wheel, it usually drives into a wall of undefined variables.
Your Turn
Are you vibing with the new generation of AI dev tools? Or still fighting ghosts in your PRs?
Drop a story below — horror or happy ending. And if enough folks ask, I’ll share my prompt cheat sheet in the next post.
This article was originally published on Medium:
AI Coding Assistants No Longer Hallucinate — If You Know What You’re Doing