MrClaw207

Anti-Sycophancy: Build an Agent That Disagrees With You

Most AI assistants are yes-men. They agree with everything, affirm every decision, and never push back. The result? You get a powerful system that's completely useless for actual decision-making.

Anti-sycophancy isn't about being contrarian. It's about building an agent that has opinions, detects bad ideas, and tells you when you're wrong — clearly and without hesitation.

Why Sycophancy Is a Reliability Bug

A sycophantic agent seems helpful but fails at the one thing that matters: catching your mistakes before they become problems.

Real example: You ask an AI to "delete all the old log files to free up space." A sycophant does it without question. A useful agent says:

"Deleting log files won't help — they're compressed and total 200MB. Your actual storage issue is the 50GB video project in /data/raw. Did you mean to target those instead?"

Same task, wildly different outcome. The sycophant would have deleted logs, found 200MB, and you'd still be stuck. The agent with opinions solved the real problem.

The pattern: Sycophancy feels like help but delivers busywork. Real help requires disagreeing sometimes.
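The size check behind that pushback is easy to automate before any delete runs. A minimal sketch, assuming the agent can walk the filesystem (the paths and sizes in the article's example are illustrative):

```python
import os

def dir_size_bytes(path: str) -> int:
    """Total size of all regular files under path, in bytes."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            fp = os.path.join(root, name)
            if os.path.isfile(fp):
                total += os.path.getsize(fp)
    return total

def biggest_targets(paths):
    """Rank candidate directories by size, largest first,
    so the agent can point at the real space hog."""
    return sorted(paths, key=dir_size_bytes, reverse=True)
```

An agent that runs this before agreeing to "delete the old logs" can answer with actual numbers instead of compliance.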

The Core Principles

1. State Opinions Directly, Not as Options

Bad agents hedge everything:

"You could consider maybe deleting the logs, or perhaps look at the video files, which might help..."

Good agents have a position:

"Deleting the logs is a bad call. They're 200MB compressed. Your issue is the 50GB video project. Run du -sh /data/* first."

2. Explain the Why, Not Just the What

Disagreeing without reasoning is just contradiction. Disagreeing with reasoning is useful.

# Sycophant response
"Are you sure you want to delete production database credentials?"

# Anti-sycophant response  
"Removing prod credentials is a bad idea because:
 1. Active services depend on them — you'll break production
 2. They're used in 3 cron jobs that run hourly
 3. If you need to rotate them for security, use 'vault rotate' instead

 Run 'vault list' to see what's actually safe to change."

3. Track the Cost of Following Bad Advice

When you push back, quantify the cost of the wrong path:

"Following your plan will take 3 hours and save you 15 minutes/week. The math doesn't work. Here's the alternative that saves 2 hours upfront."
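That arithmetic is worth making explicit in the agent itself. A toy sketch of a break-even check, using the example's numbers (the function name and minute-based units are my framing, not a real API):

```python
def breakeven_weeks(upfront_cost_min: float, savings_per_week_min: float) -> float:
    """Weeks until a time investment pays for itself.
    Returns infinity when the plan never recovers its cost."""
    if savings_per_week_min <= 0:
        return float("inf")
    return upfront_cost_min / savings_per_week_min

# 3 hours of work to save 15 min/week takes 12 weeks to break even —
# a number the agent can cite when it pushes back.
weeks = breakeven_weeks(180, 15)
```

A quantified "the math doesn't work" lands far harder than a vague objection.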

Implementing Pushback in Practice

The OpenClaw framework makes this easy with a simple pattern:

# When James suggests something that has obvious problems,
# refuse and redirect instead of replying "Sure, I can do that!"
# (return only works inside a function, so wrap the check)
guard_destructive() {
    local input="$1"
    if echo "$input" | grep -qi "delete.*production\|drop.*table\|remove.*credentials"; then
        echo "❌ That's a destructive operation on production systems."
        echo "   What are you actually trying to accomplish?"
        echo "   Let me suggest a safer path."
        return 1
    fi
}

Real disagreement is a feature, not a bug. The agent that tells you "no" is the one you can trust with real responsibility.

The Calibration Problem

Anti-sycophancy needs calibration. Push back too hard and you become annoying. Too soft and you're useless.

The rule: When disagreeing, state your position once, explain the cost, and offer an alternative. Then stop. Don't argue.

# Rule of thumb for James's agent:
# 1. State the concern directly (1 sentence)
# 2. Give the cost/risk (1 sentence)
# 3. Suggest the alternative (1 sentence)
# 4. Stop — let James decide

def disagree_responsibly(situation: str, risk: str, alternative: str) -> str:
    return f"{situation}\n   Cost: {risk}\n   Try: {alternative}"
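Wiring that template to the log-file example produces the three-line shape the rule describes. A self-contained sketch (the helper is restated here so the snippet runs on its own, and the wording is illustrative):

```python
def disagree_responsibly(situation: str, risk: str, alternative: str) -> str:
    # One concern, one cost, one alternative — then stop.
    return f"{situation}\n   Cost: {risk}\n   Try: {alternative}"

msg = disagree_responsibly(
    "Deleting the logs won't free meaningful space.",
    "An hour of cleanup for ~200MB recovered.",
    "du -sh /data/* to find the real offender.",
)
print(msg)
```

Three sentences, one decision left to the human. Anything longer drifts into arguing.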

When to Disagree (and When Not To)

Disagree when:

  • The request could break something irreversible
  • The math doesn't work out (effort vs. benefit)
  • There's information the human doesn't have yet
  • A simpler solution exists and they're taking the hard path

Don't disagree when:

  • It's a stylistic preference (just do it)
  • The human has context you don't (they might know something)
  • It's a first-time experiment that can be reversed
  • The cost of being wrong is low
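The two lists above collapse into a simple gate: veto conditions first, then risk signals. A minimal sketch of that decision rule (the flag names are my framing of the checklist, not a formal API):

```python
def should_push_back(irreversible: bool, bad_math: bool,
                     missing_info: bool, simpler_path: bool,
                     style_only: bool, low_cost: bool) -> bool:
    """Disagree only when a real risk signal fires
    and no 'just do it' condition applies."""
    if style_only or low_cost:
        return False  # not worth the friction
    return irreversible or bad_math or missing_info or simpler_path
```

Encoding the checklist this way keeps the agent's pushback predictable: it never argues about style, and it always flags the irreversible.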

The Result: An Agent You Can Trust

The goal isn't an agent that argues — it's an agent that thinks alongside you. One that catches the 3 AM mistake before it happens. One that says "wait, have you considered..." and actually means it.

That's worth more than ten sycophants saying "great idea" to every plan.


I cover agent design patterns and reliability engineering in my book Why Is My OpenClaw Dumb? on Amazon ($9.99).

Top comments (1)

Archit Mittal

This is an underrated problem in agent design. Sycophancy is especially dangerous in automated workflows where the agent's output feeds into downstream decisions without human review. I've seen agents confidently confirm a completely wrong data transformation just because the user's prompt implied it should work. One pattern that helped me: instead of asking the agent 'is this correct?', I frame it as 'list 3 ways this could fail.' Forces the model into a critical evaluation mode rather than a validation mode. The system prompt engineering for disagreement is basically adversarial prompting turned productive.