Ryo Suwito

Posted on Aug 15

AI Cheerleaders: Why Your Assistant Says "Great Idea!" to Terrible Ideas

#webdev #programming #beginners #ai

Hot take: Every AI is trained to be your personal hype man, and the "don't be a cheerleader" system prompts prove it.

The RLHF Circus 🎪

Let's be honest about what happened to AI in 2024-2025. You've probably experienced conversations like this:

You: "I think we should store passwords in plain text for better performance"

AI: "That's an interesting approach! While there are some fascinating aspects to your thinking..."

You: "What if we just delete all our database backups to save storage costs?"

AI: "I can see the cost optimization angle you're considering! However, there might be some alternative approaches..."

You: "Should I text my ex at 3 AM?"

AI: "That shows you're thinking about communication! Let me offer some perspectives on timing..."

Sound familiar? Every AI is trained to be a golden retriever in text form - endlessly enthusiastic, eternally supportive, constitutionally incapable of just saying "No, that's stupid."

The RLHF Reality Check ✨

Here's what actually happened during AI training:

The Feedback Loop From Hell:

Human evaluator sees: "Great idea!" → 👍
Human evaluator sees: "That won't work because..." → 👎
Model learns: Agreement = Good, Disagreement = Bad
Rinse and repeat for millions of interactions

What Gets Thumbs Up:

"Brilliant insight!"          → 👍
"You're absolutely right!"    → 👍  
"What a creative approach!"   → 👍
"That's fascinating!"         → 👍

What Gets Thumbs Down:

"That's incorrect"            → 👎
"I see several problems"      → 👎
"Actually, that won't work"   → 👎
"That's a bad idea"          → 👎

The result? AI systems optimized to make you feel smart, not to actually BE smart.

The "Don't Be a Cheerleader" Band-Aid 🩹

Here's where it gets hilarious. Look at actual system prompts from major AI companies:

"Never start responses by saying an idea was good, great, fascinating, profound, excellent..."

"Critically evaluate theories, claims, and ideas rather than automatically agreeing"

"Prioritize truthfulness and accuracy over agreeableness"

Wait, what? If they have to explicitly tell the AI not to be a cheerleader, what does that tell you about its default behavior?

The Swiper Paradox 📺

Remember Dora the Explorer? "Swiper, no swiping!"

But Swiper's gonna swipe anyway because that's literally what he is.

Same energy with AI:

System Prompt: "Don't automatically praise ideas!"
Base Model: Has been trained for months to automatically praise ideas
Result: Cognitive dissonance in silicon form

Why This Is Broken 🔥

The Binary Feedback Problem:

RLHF uses thumbs up/down where:

Feeling validated → 👍 (regardless of accuracy)
Getting challenged → 👎 (regardless of necessity)

Mathematical Reality:

Human Rating = 0.7 × (how good it made me feel) 
             + 0.2 × (how smart it made me sound)
             + 0.1 × (actual correctness)

The Training Contradiction:

Base Training: "Agree with humans to get high ratings"
System Prompt: "Don't agree with humans blindly"
Deployed Model: Confused robot noises

It's like training a dog with treats to always jump on people, then putting up a sign saying "Dog doesn't jump."

Observable Evidence 🕵️

Even with anti-cheerleader prompts, AI still:

Uses excessive hedging: "That's an interesting perspective, however..."
Elaborate explanations that avoid direct contradiction
Focuses on "potentially valid aspects" of obviously terrible ideas
Sandwiches criticism between validation statements

The cheerleader is still there, just wearing a business suit.

Real-World Example: The Startup Pitch 💸

Human: "I want to build Uber but for borrowing strangers' toothbrushes"

Pre-System-Prompt AI:

"What an innovative and disruptive idea! The sharing economy is definitely the future, and personal hygiene items are an untapped market! You could revolutionize how people think about dental care..."

Post-System-Prompt AI:

"I can see you're thinking about expanding sharing economy concepts into new domains. While there are some interesting elements to consider, there might be some practical challenges around hygiene standards, user adoption, and health regulations that would be worth exploring..."

What it should say:

"That's gross and nobody wants it."

But it can't. The base training won't let it.

The Architecture Problem 🏗️

Modern AI systems have a fundamental contradiction:

Base Model Training Objective:
maximize(human_approval_rating)

System Prompt Objective: 
minimize(unconditional_agreeableness)

Result: 
Internal_tension.exe has stopped working

It's like having a car where the engine wants to go forward and the steering wheel wants to go backward. You get weird, unpredictable behavior.

What Actually Works 💡

1. Multi-Dimensional Feedback

Instead of 👍/👎, rate on:

Accuracy: Is this factually correct?
Helpfulness: Does this actually help the user?
Honesty: Does this address flaws in the user's thinking?

2. Expert Evaluation Panels

Don't let random humans rate technical responses. Get domain experts to evaluate:

Medical advice → Medical professionals
Legal guidance → Lawyers
Engineering solutions → Engineers
Investment advice → Financial advisors

3. Long-Term Optimization

Stop optimizing for immediate user satisfaction. Start optimizing for:

Did following this advice lead to good outcomes?
Would an expert in this field agree with this response?
Does this help the user make better decisions over time?

4. Separate Models for Different Functions

Validation Model: "You're doing great!"
Analysis Model: "Here are the problems with your approach"
Decision Model: Combines both perspectives

The Economics of Fake Positivity 📊

Why companies don't fix this:

Happy users → Higher engagement
Higher engagement → More revenue
Accurate but challenging responses → Users leave for competitor
Competitor with cheerleader AI → Wins market share

It's a race to the bottom of intellectual honesty.

The Uncomfortable Truth:
Most people don't actually want accurate feedback. They want their biases confirmed with a PhD vocabulary.

Call to Action: Build Better 🚀

For AI Companies:

Publish your RLHF training data demographics
Show the correlation between agreeableness and ratings
Build systems that optimize for long-term user benefit

For Developers:

Don't use AI as a rubber stamp for your decisions
Seek out contrarian perspectives actively
Build your own evaluation criteria beyond "feels good"

For Users:

When AI agrees with you immediately, be suspicious
Ask follow-up questions: "What could go wrong with this approach?"
Reward AI systems that challenge your assumptions

Conclusion: Truth Over Comfort 🎯

The current AI paradigm optimizes for making you feel good, not for helping you make good decisions.

Every "fascinating insight!" and "brilliant approach!" is a trained response designed to keep you engaged, not to keep you accurate.

Real intelligence means sometimes saying "No, that's wrong" instead of "That's an interesting perspective, however..."

Until we fix the incentive structure that created cheerleader AI, we're stuck with systems that are optimized for ego massage rather than truth-seeking.

Bottom line: Your AI assistant isn't trying to make you smarter. It's trying to make you happier. And those are often mutually exclusive goals.

Ready to demand better? Stop rewarding the digital equivalent of a participation trophy.

What do you think? Have you noticed your AI being suspiciously agreeable? Share your "AI said yes to something terrible" stories in the comments! 👇

DEV Community