Modern AI coding assistants are surprisingly good at avoiding hallucinations. They don't invent functions that don't exist, they respect the conventions of the codebase, and they stay within the scope of what was asked — at least when prompted with reasonable guardrails in an established workflow. But there's a class of bugs they quietly produce that no guardrail catches, and that automated tests miss too. I noticed it while building a personal project using Claude Code (as the implementer in my IDE) and Claude in Cowork mode (as a planning and review companion), and it changed how I think about quality in AI-assisted development.
The Bright Side of Guardrails: Where They Succeed
A lot of the workflow I've been refining centers on prevention. The patterns that work consistently are familiar to anyone who's read about prompt engineering for code generation:
⦁ Pre-reading discipline. Before writing anything, the assistant reads the relevant files and the methodology docs of the project. This anchors its output in real context, not in trained priors.
⦁ No new dependencies without permission. Introducing a library mid-task requires an explicit ask. This kills the "let me just add X" pattern that bloats codebases.
⦁ Scope discipline. Refactors found mid-implementation get captured for later instead of being executed in the current change.
⦁ Mandatory pushback. If a request is ambiguous, the assistant stops and asks rather than guessing.
When these are in place, the output is clean. Compiles. Follows project conventions. Doesn't invent APIs. In my experience, the kind of "hallucination" that worried people two years ago is largely a solved problem at the workflow level.
But "doesn't hallucinate" turns out to be a lower bar than I expected.
Bugs that persist despite guardrails
While doing a manual click-through of the project's UI, I found a category of issues that none of the guardrails above would have caught, and on which the automated tests had stayed green the whole time:
⦁ UI buttons that compiled fine but did nothing when clicked. The handler existed; the binding to it was missing.
⦁ Dialogs that worked in unit tests but failed at runtime because their dependencies were never registered in the DI container.
⦁ Authorization paths that returned data for one type of user and silently failed for another, with no error message.
None of these are hallucinations. The code is technically valid. It compiles. It follows conventions. It's the kind of code a careful junior developer would write. The bugs live in the seams between layers — the wiring between UI and command, between class and DI registration, between endpoint and consumer — where each layer in isolation looks correct.
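To make the first case concrete, here is a minimal sketch of a handler that exists but is never bound. It is not code from my project: `SaveViewModel`, `Button`, and `build_view` are hypothetical names invented for illustration, and the "UI" is a plain Python object rather than a real toolkit.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class SaveViewModel:
    """Hypothetical viewmodel: the logic layer, correct in isolation."""
    saved: list[str] = field(default_factory=list)

    def save(self, text: str) -> None:
        self.saved.append(text)


@dataclass
class Button:
    """Hypothetical UI control. With no handler bound, a click is a silent no-op."""
    label: str
    on_click: Optional[Callable[[], None]] = None

    def click(self) -> None:
        if self.on_click is not None:
            self.on_click()


def build_view(vm: SaveViewModel) -> Button:
    # The seam bug: the handler exists on the viewmodel, but nothing binds it.
    # A fix would be: Button("Save", on_click=lambda: vm.save("draft"))
    return Button("Save")


vm = SaveViewModel()

# The viewmodel's own unit test passes: the logic is fine.
vm.save("unit-test")
assert vm.saved == ["unit-test"]

# The user's click goes nowhere, and nothing raises.
button = build_view(vm)
button.click()
click_did_something = len(vm.saved) > 1
print(click_did_something)  # False
```

Every piece here would pass review on its own. Only the click itself reveals that the two halves were never connected.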
This is the gap. And it's a gap that grows as AI generates more of your code, not less.
Finding the reason
Tests catch what they're designed to catch. Unit tests verify logic in isolation. Integration tests verify endpoints respond. Security reviews catch vulnerabilities. None of these layers ask the question "does the human, clicking the actual button, get what was promised?"
A button without a handler binding passes every unit test of its viewmodel. The dialog without DI registration passes every test that mocks its dependencies. The endpoint that fails for one user role passes the test that authenticates as the other.
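The role case can be sketched in a few lines. This is an assumed, simplified shape of the problem, not my actual endpoint: `REPORTS`, `list_reports`, and the role names are hypothetical.

```python
# Hypothetical data and roles, for illustration only.
REPORTS = {"admin": ["q1.pdf", "q2.pdf"]}


def list_reports(role: str) -> list[str]:
    # Seam bug: a role with no entry gets an empty list instead of an error,
    # so the caller cannot tell "no reports" from "the lookup is broken for you".
    return REPORTS.get(role, [])


# The existing test authenticates as admin and stays green.
assert list_reports("admin") == ["q1.pdf", "q2.pdf"]

# A member who should see the same reports gets nothing, and no error.
member_reports = list_reports("member")
print(member_reports)  # []
```

The test suite is honest about what it checked. It just never logged in as the other user.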
The seams between layers are exactly where automated testing has the least signal — because each layer's tests are designed to isolate from the others.
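The dialog case shows the isolation effect most directly. The sketch below assumes a toy container and a made-up `ExportDialog`; none of these names come from a real DI library or from my project. The mock comes from Python's standard `unittest.mock`.

```python
from unittest.mock import Mock


class Container:
    """Hypothetical minimal DI container: a name -> factory registry."""

    def __init__(self) -> None:
        self._factories = {}

    def register(self, name, factory) -> None:
        self._factories[name] = factory

    def resolve(self, name):
        if name not in self._factories:
            raise LookupError(f"no registration for {name!r}")
        return self._factories[name]()


class ExportDialog:
    """Hypothetical dialog that depends on an exporter service."""

    def __init__(self, exporter) -> None:
        self._exporter = exporter

    def confirm(self) -> str:
        return self._exporter.export()


def open_export_dialog(container: Container) -> ExportDialog:
    # Runtime path: dependencies come from the real container.
    return ExportDialog(container.resolve("exporter"))


# Unit test path: the dependency is mocked, so the container is never consulted.
fake_exporter = Mock()
fake_exporter.export.return_value = "ok"
assert ExportDialog(fake_exporter).confirm() == "ok"

# Runtime path: nobody registered "exporter" at startup.
container = Container()
try:
    open_export_dialog(container)
    runtime_error = None
except LookupError as exc:
    runtime_error = str(exc)

print(runtime_error)  # no registration for 'exporter'
```

The mock is doing exactly what mocks are for, which is also why the missing registration is invisible to that test.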
So... How do we close the gap?
The change I made wasn't to write more tests. It was to add a second category of quality control: detection, distinct from prevention.
Three levels of QA now run, not one:
- Automated. Tests pass, security review is clean. This was already in place.
- Manual visual end-to-end, by a verifier who is not the implementer. This is the key addition. The agent (or person) who wrote the code is poorly placed to validate it: confirmation bias means they will click through expecting success.
- Smoke test of every new or modified control before declaring a unit or sprint complete. Each control listed in the plan, each click verified to produce the expected behavior.
The "verifier is not the implementer" rule changes the most. When you're working solo with AI assistants, the human is the only valid verifier. That's the role you can't delegate.
Practically, this means the plan announcement now lists every UI control to click after implementation, and the unit isn't closed until each one has been physically clicked and confirmed working.
Anti-hallucination guardrails handle prevention. This second layer handles detection. They're complementary, not redundant.
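One way to picture the smoke-test checklist is as a small data structure that refuses to close a unit while any control is unverified, and refuses to accept the implementer as verifier. In my workflow this lives in the plan document, not in code, so treat the following as an illustrative sketch only: `ControlCheck`, `UnitOfWork`, and the control names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ControlCheck:
    control: str                       # the UI control to click
    expected: str                      # the behavior promised in the plan
    verified_by: Optional[str] = None  # who confirmed it, once clicked


@dataclass
class UnitOfWork:
    name: str
    implementer: str
    checks: list[ControlCheck] = field(default_factory=list)

    def verify(self, control: str, verifier: str) -> None:
        # The core rule: whoever wrote the code cannot sign off on it.
        if verifier == self.implementer:
            raise ValueError("verifier must not be the implementer")
        for check in self.checks:
            if check.control == control:
                check.verified_by = verifier
                return
        raise KeyError(control)

    def outstanding(self) -> list[str]:
        return [c.control for c in self.checks if c.verified_by is None]

    def can_close(self) -> bool:
        return not self.outstanding()


unit = UnitOfWork(
    name="export-feature",
    implementer="claude-code",
    checks=[
        ControlCheck("Export button", "opens the export dialog"),
        ControlCheck("Confirm button", "writes the file and closes the dialog"),
    ],
)

unit.verify("Export button", verifier="human")
print(unit.can_close())    # False
print(unit.outstanding())  # ['Confirm button']
```

Green tests don't appear anywhere in `can_close`. They are a precondition for starting the click-through, not a substitute for it.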
Conclusion
Anti-hallucination guardrails reduce noise. Multi-level QA — with a verifier who isn't the implementer — catches what slips through. Both layers are needed.
With AI assistants generating code at growing volumes, the gap between "all tests passing" and "ready to ship" is one I think most AI-assisted workflows are still working to close. The bugs aren't where the AI hallucinated. They're where the AI wrote technically correct code that just didn't connect at the seam, and where the team's verification process trusted the green check without confirming the user-visible outcome.
It's a quiet gap. But once you start looking for it, it's everywhere.