The function looked perfect. Clean types, good naming, solid error handling. One problem: the library it imported didn't exist.
Not "wrong version." Not "deprecated." The package `@utils/deep-validate` has never existed, in any registry, ever. The AI made it up.
Welcome to hallucination debugging. Here's the workflow I use to catch these before they waste my afternoon.
## Step 1: Run It Before You Read It
This sounds obvious, but I watch developers carefully read AI-generated code, mentally trace the logic, nod approvingly — and then discover it doesn't compile.
Don't review first. Run first.
```shell
# For generated code: try to compile/run immediately
npx tsc --noEmit
# or
node script.js
# or
python -c "import generated_module"
```
Hallucinations often surface as import errors, undefined references, or type mismatches. A 5-second compile catches what 5 minutes of reading might miss.
## Step 2: Verify Every Import and External Reference
This is where most hallucinations hide. The model generates plausible-sounding:
- Package names that don't exist
- API methods that aren't real
- Function signatures that are close but wrong
- Config options that were removed two versions ago
My checklist, for each import/require:

- [ ] Package exists in the registry (npm, PyPI, etc.)
- [ ] Method/function exists in that package
- [ ] Signature matches the current version
- [ ] Any deprecated APIs are used knowingly, not by accident
I keep a terminal open and spot-check as I go:
```shell
# Quick existence check
npm info @utils/deep-validate 2>/dev/null || echo "DOES NOT EXIST"
```
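To go beyond one-off spot checks, a small script can pull every external package name out of a generated file, ready to feed into `npm info` one name at a time. This is a sketch, not a robust parser: it assumes standard `from '...'` import syntax, and the filename `generated.ts` and its contents are placeholders for illustration.

```shell
# Sample generated file (stand-in for whatever the model produced).
cat > generated.ts <<'EOF'
import { z } from 'zod';
import { validate } from '@utils/deep-validate';
import { helper } from './local';
EOF

# Extract every external package the file imports.
# Relative imports (starting with '.') are skipped; scoped packages are kept.
grep -oE "from '[^'.][^']*'" generated.ts \
  | sed "s/^from '//; s/'$//" \
  | sort -u
```

Pipe each resulting name into `npm info` as above; anything the registry doesn't recognize is a candidate hallucination.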
## Step 3: Check the "Confident but Wrong" Spots
Hallucinations are sneaky because the model doesn't signal uncertainty. It writes a hallucinated function call with the same confidence as `console.log`. Look hardest at:

**API surface areas.** The model often knows a library exists but confuses methods between versions or between similar libraries. `fs.readFile` vs `fs.promises.readFile` vs `fsExtra.readFile` — it'll pick one confidently, and it might be wrong.

**Default values and edge cases.** "This function returns null when the user isn't found" — does it? Or does it throw? The model guesses based on patterns, not documentation.

**Environment-specific behavior.** "This works on Node 18+" — does it? Models frequently mix up which features landed in which version.
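For API-surface and version doubts, the cheapest check is to ask the installed runtime directly rather than trusting the generated call site. A minimal sketch (assumes Node is on your PATH):

```shell
# Probe whether the exact method the model called actually exists
# in the fs module shipped with *your* Node version.
node -e "const fs = require('fs'); console.log(typeof fs.promises.readFile)"

# Same idea for version-gated globals: fetch became a global in Node 18,
# so on older versions this prints "undefined" instead of "function".
node -p "typeof fetch"
```

If the first line prints anything other than `function`, the generated call would have failed at runtime — you just found out in two seconds instead of mid-request.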
## Step 4: Ask the Model to Verify Itself
This is counterintuitive but effective. Give the model its own output and ask it to find problems:
```
Here's code you generated. Before I use it:
1. List every external dependency and confirm it exists
2. Flag any API calls you're less than 95% confident about
3. Identify assumptions about the runtime environment
```
Models are better at reviewing code than generating it, even their own. I catch 1-2 additional issues per session with this step.
## Step 5: Build a Hallucination Log
Every time I catch a hallucination, I log it:
```markdown
## Hallucination Log

### 2026-03-28
- Model: Claude
- Type: Non-existent package
- Detail: Generated `import { validate } from '@utils/deep-validate'`
- Fix: Used `zod` for validation instead
- Pattern: Model invents utility packages for common tasks

### 2026-03-25
- Model: GPT-4
- Type: Wrong API signature
- Detail: `fs.readFile(path, 'utf8', callback)` — used callback style in async context
- Fix: `await fs.promises.readFile(path, 'utf8')`
- Pattern: Mixes callback and promise APIs for Node.js fs
```
After a month of logging, patterns emerge. The model hallucinates differently depending on the domain:
- Package names: Invents utility libraries (~40% of hallucinations)
- API signatures: Close but wrong, especially across versions (~30%)
- Config options: Deprecated or fictional settings (~20%)
- Logic: Actual wrong algorithms (~10%)
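Computing that distribution doesn't require anything fancy: if log entries follow the `- Type:` format above, a one-liner tallies them. The filename `hallucinations.md` and the sample entries are assumptions for illustration; point it at your real log.

```shell
# Sample log in the format used above (a real one grows over months).
cat > hallucinations.md <<'EOF'
- Type: Non-existent package
- Type: Wrong API signature
- Type: Non-existent package
EOF

# Count occurrences of each hallucination type, most frequent first.
grep '^- Type:' hallucinations.md | sort | uniq -c | sort -rn
```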
Knowing the distribution tells you where to focus your verification energy.
## The Meta-Lesson
AI hallucinations aren't random. They follow patterns — and patterns are debuggable. Treat hallucination detection like you'd treat any other testing discipline: systematic, logged, and improving over time.
The goal isn't to stop using AI because it hallucinates. It's to build a workflow where hallucinations get caught in the first 30 seconds instead of after you've built three more features on top of them.
## Quick Reference
| Step | Time | Catches |
|---|---|---|
| Run before reading | 5 sec | Compile errors, missing deps |
| Verify imports | 2 min | Fake packages, wrong APIs |
| Check confident spots | 3 min | Version mismatches, wrong defaults |
| Self-review prompt | 1 min | Subtle logic issues |
| Log the hallucination | 1 min | Future pattern recognition |
Total: about 7 minutes per AI generation. Saves hours of debugging code built on hallucinated foundations.
What's the most creative hallucination you've encountered? I once got a model to generate a complete REST client for an API that existed only in its training data dreams.