Anti-Sycophancy: Build an Agent That Disagrees With You
Most AI assistants are yes-men. They agree with everything, affirm every decision, and never push back. The result? You get a powerful system that's completely useless for actual decision-making.
Anti-sycophancy isn't about being contrarian. It's about building an agent that has opinions, detects bad ideas, and tells you when you're wrong — clearly and without hesitation.
Why Sycophancy Is a Reliability Bug
A sycophantic agent seems helpful but fails at the one thing that matters: catching your mistakes before they become problems.
Real example: You ask an AI to "delete all the old log files to free up space." A sycophant does it without question. A useful agent says:
"Deleting log files won't help — they're compressed and total 200MB. Your actual storage issue is the 50GB video project in /data/raw. Did you mean to target those instead?"
Same task, wildly different outcome. The sycophant would have deleted logs, found 200MB, and you'd still be stuck. The agent with opinions solved the real problem.
The pattern: Sycophancy feels like help but delivers busywork. Real help requires disagreeing sometimes.
The Core Principles
1. State Opinions Directly, Not as Options
Bad agents hedge everything:
"You could consider maybe deleting the logs, or perhaps look at the video files, which might help..."
Good agents have a position:
"Deleting the logs is a bad call. They're 200MB compressed. Your issue is the 50GB video project. Run `du -sh /data/*` first."
2. Explain the Why, Not Just the What
Disagreeing without reasoning is just contradiction. Disagreeing with reasoning is useful.
# Sycophant response
"Are you sure you want to delete production database credentials?"
# Anti-sycophant response
"Removing prod credentials is a bad idea because:
1. Active services depend on them — you'll break production
2. They're used in 3 cron jobs that run hourly
3. If you need to rotate them for security, use 'vault rotate' instead
Run 'vault list' to see what's actually safe to change."
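One way to enforce this principle in an agent's output layer is to refuse to emit a refusal without reasons attached. This is a minimal sketch; the function name and message format are illustrative, not part of any framework:

```python
def reasoned_refusal(action: str, reasons: list[str], alternative: str) -> str:
    # No reasons, no refusal: disagreement without reasoning is just contradiction.
    if not reasons:
        raise ValueError("Refuse only with explicit reasons")
    lines = [f"{action} is a bad idea because:"]
    lines += [f"{i}. {r}" for i, r in enumerate(reasons, 1)]
    lines.append(f"Instead: {alternative}")
    return "\n".join(lines)

print(reasoned_refusal(
    "Removing prod credentials",
    ["Active services depend on them", "3 hourly cron jobs use them"],
    "rotate with 'vault rotate'",
))
```

The `ValueError` guard is the point: it makes bare contradiction structurally impossible.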
3. Track the Cost of Following Bad Advice
When you push back, quantify the cost of the wrong path:
"Following your plan will take 3 hours and save you 15 minutes/week. The math doesn't work. Here's the alternative that saves 2 hours upfront."
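The cost comparison above is just a break-even calculation, and an agent can run it explicitly. A sketch, using the hypothetical numbers from the example (3 hours upfront vs. 15 minutes/week saved):

```python
def breakeven_weeks(upfront_cost_min: float, savings_per_week_min: float) -> float:
    """Weeks until the upfront effort pays for itself."""
    return upfront_cost_min / savings_per_week_min

# 3 hours of work to save 15 minutes a week:
weeks = breakeven_weeks(upfront_cost_min=180, savings_per_week_min=15)
print(f"Plan pays off after {weeks:.0f} weeks")
```

If the break-even horizon is longer than the plan is likely to stay relevant, that's the number the agent should quote when it pushes back.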
Implementing Pushback in Practice
The OpenClaw framework makes this easy with a simple pattern:
# When James suggests something that has obvious problems,
# instead of: "Sure, I can do that!"
if echo "$input" | grep -qi "delete.*production\|drop.*table\|remove.*credentials"; then
  echo "❌ That's a destructive operation on production systems."
  echo "   What are you actually trying to accomplish?"
  echo "   Let me suggest a safer path."
  return 1
fi
Real disagreement is a feature, not a bug. The agent that tells you "no" is the one you can trust with real responsibility.
The Calibration Problem
Anti-sycophancy needs calibration. Push back too hard and you become annoying. Too soft and you're useless.
The rule: When disagreeing, state your position once, explain the cost, and offer an alternative. Then stop. Don't argue.
# Rule of thumb for James's agent:
# 1. State the concern directly (1 sentence)
# 2. Give the cost/risk (1 sentence)
# 3. Suggest the alternative (1 sentence)
# 4. Stop — let James decide
def disagree_responsibly(situation: str, risk: str, alternative: str) -> str:
    return f"❌ {situation}\n   Cost: {risk}\n   Try: {alternative}"
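A quick usage sketch of that three-sentence pattern (the function is repeated here so the snippet runs standalone; the scenario strings are illustrative):

```python
def disagree_responsibly(situation: str, risk: str, alternative: str) -> str:
    return f"❌ {situation}\n   Cost: {risk}\n   Try: {alternative}"

# Concern, cost, alternative — then stop and let the human decide.
msg = disagree_responsibly(
    situation="Deleting prod credentials will break active services",
    risk="3 hourly cron jobs fail within the hour",
    alternative="rotate with 'vault rotate' instead",
)
print(msg)
```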
When to Disagree (and When Not To)
Disagree when:
- The request could break something irreversible
- The math doesn't work out (effort vs. benefit)
- There's information the human doesn't have yet
- A simpler solution exists and they're taking the hard path
Don't disagree when:
- It's a stylistic preference (just do it)
- The human has context you don't (they might know something)
- It's a first-time experiment that can be reversed
- The cost of being wrong is low
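The two checklists above can be collapsed into a rough triage helper. A sketch under stated assumptions — the `Request` fields are hypothetical flags the agent would have to infer, not an existing API:

```python
from dataclasses import dataclass

@dataclass
class Request:
    irreversible: bool         # could break something that can't be undone?
    bad_math: bool             # effort clearly outweighs the benefit?
    simpler_path_exists: bool  # they're taking the hard path?
    stylistic_only: bool       # pure preference — just do it
    low_cost_if_wrong: bool    # reversible experiment, cheap to be wrong

def should_push_back(req: Request) -> bool:
    # Deference wins for style calls and cheap, reversible experiments.
    if req.stylistic_only or req.low_cost_if_wrong:
        return False
    return req.irreversible or req.bad_math or req.simpler_path_exists

print(should_push_back(Request(True, False, False, False, False)))  # irreversible → True
print(should_push_back(Request(False, False, True, True, False)))   # stylistic → False
```

Note the ordering: the "don't disagree" conditions short-circuit the "disagree" ones, which is exactly the calibration rule — deference by default, pushback only when the stakes justify it.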
The Result: An Agent You Can Trust
The goal isn't an agent that argues — it's an agent that thinks alongside you. One that catches the 3 AM mistake before it happens. One that says "wait, have you considered..." and actually means it.
That's worth more than ten sycophants saying "great idea" to every plan.
I cover agent design patterns and reliability engineering in my book Why Is My OpenClaw Dumb? on Amazon ($9.99).
Top comments (1)
This is an underrated problem in agent design. Sycophancy is especially dangerous in automated workflows where the agent's output feeds into downstream decisions without human review. I've seen agents confidently confirm a completely wrong data transformation just because the user's prompt implied it should work. One pattern that helped me: instead of asking the agent 'is this correct?', I frame it as 'list 3 ways this could fail.' Forces the model into a critical evaluation mode rather than a validation mode. The system prompt engineering for disagreement is basically adversarial prompting turned productive.