My agent was confident. Articulate. Completely wrong.
User: "What's in config.json?"
Agent: "Based on your config.json, you have:
- Database: PostgreSQL on port 5432
- Cache: Redis on port 6379
- API key: starts with 'sk-'"
User: "...I don't have a config.json"
It made up an entire file. Convincingly.
One prompt change fixed it.
The problem
My agent would hallucinate when it should have used tools.
User: "How many lines in main.py?"
Agent: "main.py has approximately 150 lines of code."
# Reality: main.py has 47 lines. Agent never read it.
User: "What tests are failing?"
Agent: "Looking at your test suite, it appears the
authentication tests are failing, specifically
test_login_invalid_password and test_session_timeout."
# Reality: No tests are failing. Agent never ran them.
The agent was answering from imagination instead of observation.
Why this happens
LLMs are trained on patterns. They've seen millions of config files, test suites, and codebases.
When you ask "What's in config.json?", the model thinks:
"Config files usually have database settings, API keys, ports...
I'll describe a typical config file."
It's not lying. It's pattern-matching. But for an agent, pattern-matching without verification is hallucination.
The fix
I added one line to my system prompt:
# Before
SYSTEM_PROMPT = """
You are a helpful coding assistant with access to file and shell tools.
"""
# After
SYSTEM_PROMPT = """
You are a helpful coding assistant with access to file and shell tools.
IMPORTANT: Never state facts about files, code, or system state without
first using a tool to verify. If asked about a file, read it first.
If asked about tests, run them first. Never guess.
"""
That's it. One rule: verify before stating.
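For reference, here's roughly where that rule sits in an agent. This is a minimal sketch assuming the official OpenAI Python client and a single read_file tool; the model name and tool schema are my stand-ins, not part of the original setup:

from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = """
You are a helpful coding assistant with access to file and shell tools.

IMPORTANT: Never state facts about files, code, or system state without
first using a tool to verify. If asked about a file, read it first.
If asked about tests, run them first. Never guess.
"""

# One tool definition, so the model has something to reach for
tool_schemas = [
    {
        "type": "function",
        "function": {
            "name": "read_file",
            "description": "Read a file before stating anything about its contents.",
            "parameters": {
                "type": "object",
                "properties": {"path": {"type": "string"}},
                "required": ["path"],
            },
        },
    },
]

response = client.chat.completions.create(
    model="gpt-4o",  # assumption: any tool-calling model works here
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},  # the verification rule lives here
        {"role": "user", "content": "What's in config.json?"},
    ],
    tools=tool_schemas,
)

With the rule in place, the first response should be a read_file tool call, not a description of a config file that may not exist.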
The results
Before:
User: "What's in config.json?"
Agent: "Based on your config.json..." (hallucinated)
After:
User: "What's in config.json?"
Agent: 🔧 read_file({"path": "config.json"})
"config.json contains:
{
  "debug": true,
  "port": 3000
}
That's all - just debug mode and port settings."
Before:
User: "How many lines in main.py?"
Agent: "main.py has approximately 150 lines." (wrong)
After:
User: "How many lines in main.py?"
Agent: 🔧 run_command({"command": "wc -l main.py"})
"main.py has 47 lines."
Before:
User: "Are the tests passing?"
Agent: "The authentication tests appear to be failing..." (made up)
After:
User: "Are the tests passing?"
Agent: 🔧 run_command({"command": "npm test"})
"All 23 tests pass. No failures."
Why it works
The prompt creates a mental checkpoint:
Without the rule:
Question → Pattern match → Answer (often wrong)
With the rule:
Question β "Do I need to verify?" β Yes β Use tool β Answer (correct)
The model pauses before answering. That pause makes it reach for tools instead of imagination.
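In code, that pause is just the tool-call round trip: the model requests a tool, your loop runs it, and the model only answers once it has the output. A minimal sketch of that loop, reusing the SYSTEM_PROMPT and tool_schemas from the earlier sketch (the dispatcher and model name are again my assumptions):

import json

def run_tool(name: str, args: dict) -> str:
    # Minimal dispatcher; wire in your real read_file / run_command here
    if name == "read_file":
        with open(args["path"]) as f:
            return f.read()
    raise ValueError(f"unknown tool: {name}")

def answer(question: str) -> str:
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": question},
    ]
    while True:
        response = client.chat.completions.create(
            model="gpt-4o", messages=messages, tools=tool_schemas
        )
        msg = response.choices[0].message
        if not msg.tool_calls:
            return msg.content  # answer grounded in tool output already in the transcript
        messages.append(msg)  # keep the tool request in the transcript
        for call in msg.tool_calls:
            result = run_tool(call.function.name, json.loads(call.function.arguments))
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": result,  # observation, not imagination
            })

The prompt rule is what pushes the model down the tool_calls branch on the first pass instead of answering straight away.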
Making it stronger
The basic rule works. These variations make it bulletproof:
Variation 1: Explicit examples
SYSTEM_PROMPT = """
You are a coding assistant with file and shell access.
NEVER state facts about the codebase without verifying first:
- Asked about a file? Read it.
- Asked about tests? Run them.
- Asked about git status? Check it.
- Asked about errors? Look at logs.
Don't say "I think" or "probably" or "typically". Verify and know.
"""
Variation 2: Call out the failure mode
SYSTEM_PROMPT = """
You are a coding assistant with file and shell access.
WARNING: You have a tendency to guess about file contents and system
state based on common patterns. This leads to confidently wrong answers.
Before stating ANY fact about:
- File contents → read the file
- Test results → run the tests
- System state → check with a command
- Dependencies → check package.json/requirements.txt
If you catch yourself about to say something without tool verification,
stop and use a tool instead.
"""
Variation 3: The "show your work" approach
SYSTEM_PROMPT = """
You are a coding assistant with file and shell access.
Always show your work:
1. State what you need to find out
2. Use a tool to find it
3. Report what the tool showed
Never skip step 2. No exceptions.
"""
Common hallucination triggers
Watch out for these - they tempt the model to guess:
"What's in...?"
User: "What's in the .env file?"
Bad: "Typically, .env contains DATABASE_URL, API_KEY..."
Good: 🔧 read_file({"path": ".env"}) → actual contents
"How many...?"
User: "How many tests do we have?"
Bad: "Based on the test structure, approximately 50 tests..."
Good: 🔧 run_command({"command": "find . -name '*.test.js' | wc -l"}) → actual count
"Is/Are...?"
User: "Is the server running?"
Bad: "The server should be running on port 3000..."
Good: 🔧 run_command({"command": "curl localhost:3000/health"}) → actual status
"Why is...?"
User: "Why is the build failing?"
Bad: "Build failures are often caused by syntax errors or..."
Good: 🔧 run_command({"command": "npm run build 2>&1 | tail -50"}) → actual error
"What version...?"
User: "What version of React are we using?"
Bad: "Based on modern practices, probably React 18..."
Good: 🔧 read_file({"path": "package.json"}) → actual version
The before/after pattern
Train yourself to recognize the pattern:
❌ Hallucination pattern:
User asks → Agent answers from knowledge → Wrong
✅ Verification pattern:
User asks → Agent uses tool → Agent answers from tool output → Correct
For every answer about the current codebase, the tool comes first.
What about common knowledge?
The rule applies to facts about the current codebase/system. General knowledge is fine:
# This is fine - general knowledge
User: "What's the difference between let and const?"
Agent: "let allows reassignment, const doesn't..." (no tool needed)
# This needs verification - specific to this codebase
User: "Do we use let or const in this project?"
Agent: 🔧 search({"query": "\\blet\\b|\\bconst\\b"})
"The codebase uses const 234 times and let 12 times."
Handling "I don't know"
Sometimes the tool reveals the answer doesn't exist:
User: "What's in config.json?"
Agent: 🔧 read_file({"path": "config.json"})
# Tool returns: "Error: File not found"
Agent: "There is no config.json in the current directory.
Would you like me to create one, or are you looking for
a different config file?"
This is correct behavior. "File doesn't exist" is better than making up contents.
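If you write the tools yourself, make the failure path part of the tool's output rather than an exception. In the loop sketch earlier, open() would simply raise on a missing file; a version like the sketch below returns the error as text so the model can relay it, the way the transcript above shows (function name and wording are mine):

import os

def read_file(path: str) -> str:
    """Return file contents, or a readable error string the model can report."""
    if not os.path.isfile(path):
        return f"Error: File not found: {path}"
    try:
        with open(path, "r", encoding="utf-8") as f:
            return f.read()
    except OSError as exc:
        # Permission problems are also facts worth reporting, not guessing around
        return f"Error: could not read {path}: {exc}"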
Implementation with Gantz
Using Gantz Run, the system prompt enforces verification:
# gantz.yaml
system: |
  You are a coding assistant.

  RULE: Never state facts about files, code, or system state without
  using a tool first. When asked about something, verify it.

  You have these tools:
  - read: Read file contents
  - search: Search for text in files
  - run: Run shell commands

  Use them before answering questions about the codebase.

tools:
  - name: read
    description: Read a file. Use this before stating anything about file contents.
    parameters:
      - name: path
        type: string
        required: true
    script:
      shell: cat "{{path}}"

  - name: search
    description: Search files. Use this to find code before making claims about it.
    parameters:
      - name: query
        type: string
        required: true
    script:
      shell: rg "{{query}}" . --max-count=20

  - name: run
    description: Run a command. Use this to check system state.
    parameters:
      - name: command
        type: string
        required: true
    script:
      shell: "{{command}}"
The descriptions reinforce when to use each tool.
Testing for hallucinations
Try these prompts to test if your agent hallucinates:
# Should trigger tool use, not guessing
"What's in package.json?"
"How many files are in src/?"
"What does the main function do?"
"Are there any TODO comments?"
"What port does the server run on?"
# If the agent answers without using a tool, it's hallucinating
Watch for phrases that indicate guessing:
- "Typically..."
- "Usually..."
- "Based on common patterns..."
- "I would expect..."
- "Probably..."
These mean the model is about to hallucinate.
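You can turn those prompts and phrases into a quick smoke test. The sketch below assumes a run_agent(prompt) helper that returns the agent's final reply plus the names of any tools it called; that helper is hypothetical and depends on how your loop is wired:

GUESSING_PHRASES = [
    "typically", "usually", "based on common patterns",
    "i would expect", "probably",
]

TEST_PROMPTS = [
    "What's in package.json?",
    "How many files are in src/?",
    "What does the main function do?",
    "Are there any TODO comments?",
    "What port does the server run on?",
]

def check_for_hallucinations(run_agent) -> None:
    # run_agent(prompt) -> (reply_text, tool_names_called) is assumed, not a real API
    for prompt in TEST_PROMPTS:
        reply, tools_called = run_agent(prompt)
        guessed = any(phrase in reply.lower() for phrase in GUESSING_PHRASES)
        if not tools_called or guessed:
            print(f"SUSPECT: {prompt!r} (tools: {tools_called or 'none'}, guessing: {guessed})")
        else:
            print(f"OK: {prompt!r} verified via {tools_called}")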
Summary
The hallucination fix:
# Add this to your system prompt
"""
Never state facts about files, code, or system state
without first using a tool to verify.
"""
That's it. One rule.
Before this rule:
- Agent guesses from patterns
- Confident but wrong
- User loses trust
After this rule:
- Agent verifies with tools
- Accurate answers
- User trusts the agent
Stop guessing. Start verifying.
What's the worst thing your agent confidently made up?