"Make it more human!"
"Make it shorter and less fluff."
"Sound more relatable."
These are instructions AI models hear constantly from real users interacting with systems like ChatGPT. The irony is striking: AI has already been trained on billions of words of human language. It understands grammar, sentence structure, social norms, and communication patterns.
So why do users still have to tell AI to "sound human"?
Because we are training it on the wrong kind of human language! Current AI training prioritizes polished, edited, formal text over the messy, emotional, context-rich way humans actually talk. The gap is not in language understanding but in conversational authenticity.
The Training Data Problem
Most large language models are trained on three main sources: web text, books, and curated dialogue datasets. These sources share a critical flaw: they favor finished, edited communication over natural conversation.
Web text includes articles, blog posts, and forum discussions that have been revised and polished before publication.
Books represent carefully crafted prose that has gone through multiple drafts and editorial review.
Even dialogue datasets are often synthetic conversations created specifically for training, or transcripts that have been cleaned and standardized.
Real human conversation looks nothing like this. We use filler words, change direction mid-thought, interrupt ourselves, employ regional slang, and constantly check for understanding. We say "like" and "you know." We ask, "Does that make sense?" or "Am I explaining this clearly?" These elements are systematically filtered out of training data because they look like noise.
The RLHF Mismatch
Reinforcement Learning from Human Feedback compounds this problem. RLHF evaluators typically reward responses that are clear, structured, grammatically perfect, and complete. They penalize rambling, hedging, or conversational meandering.
Critically, RLHF rewards structure and correctness, not pauses, hedges, or meta-conversational checks, which are essential for natural dialogue. A response that says "Wait, are you asking about X or Y?" scores lower than one that immediately provides a complete answer, even when clarification would better serve the user.
The signal AI receives is clear: be more like an essay, less like a conversation. The result is AI that excels at formal communication but struggles with the informal, relational mode that users actually want for everyday interaction.
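To make that incentive concrete, here is a deliberately naive scoring heuristic that mimics the bias described above. This is a toy sketch with invented weights, not how production reward models work (those are learned from human preference data): length and essay-like structure earn points, while clarifying questions and hedges lose them.

```python
# Toy heuristic only: invented weights that mimic a completeness-focused
# evaluator. Real RLHF reward models are learned, not hand-coded.
HEDGES = ("maybe", "perhaps", "i think", "wait", "hmm", "you know")

def toy_reward(response: str) -> float:
    text = response.lower()
    score = 0.0
    score += min(len(text.split()), 120) / 120            # reward length ("completeness")
    if any(m in text for m in ("first", "second", "in summary")):
        score += 0.5                                       # reward essay-like structure
    if text.strip().endswith("?"):
        score -= 0.4                                       # penalize ending on a clarifying question
    score -= 0.2 * sum(h in text for h in HEDGES)          # penalize hedges and fillers
    return score

clarifying = "Wait, are you asking about X or Y?"
complete = ("There are two cases. First, if you mean X, the answer is A. "
            "Second, if you mean Y, the answer is B. In summary, it depends on the context.")

print(f"clarifying question: {toy_reward(clarifying):.2f}")  # negative score
print(f"complete answer:     {toy_reward(complete):.2f}")    # much higher score
```

Under a scoring scheme like this, the clarifying question can never win, no matter how much the user would have benefited from being asked.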
Formal Tasks versus Conversational Tasks
This training approach works well for certain use cases. When users need legal writing, code generation, or research summarization, they want precision and structure. AI trained on polished text performs excellently here.
But conversational AI serves different purposes. Users turn to it for brainstorming, emotional support, casual exploration of ideas, or simply companionship. These contexts require relational awareness, empathy, appropriate pacing, and the ability to read between the lines.
Current training does not distinguish between these modes. The same model, trained on the same data with the same reward signals, is expected to write a legal brief and also chat like a friend. It can do the former naturally but must be explicitly prompted to attempt the latter.
What Gets Lost in Translation
Gartner predicted that by 2025, seventy percent of customer interactions would involve AI in some form. As conversational AI becomes ubiquitous, the gap between technical accuracy and conversational effectiveness becomes increasingly critical.
Meta-conversational moves are essential to natural dialogue. Studies show that twenty to twenty-five percent of human utterances involve clarification, verification, or relationship management (Clark and Schaefer, 1989). We say things like "Is this what you mean?" or "Do you want my advice or just someone to listen?" AI rarely initiates these moves because doing so was not rewarded during training.
Emotional context recognition matters more than we acknowledge. When someone asks about a school closure, they are not just seeking dates and logistics. They may be worried about childcare, concerned about their child's education, or relieved to have a break. Human conversation naturally acknowledges this subtext. AI trained on factual accuracy alone misses it entirely.
Consider this real-world interaction with ChatGPT:
User: "I am really stressed about my kid’s school."
ChatGPT Response:
"It sounds like you’re carrying a lot of worry right now. School can be stressful for both kids and parents, and it’s completely normal to feel overwhelmed. Sometimes just talking about what’s specifically worrying you, like grades, social issues, or the workload, can help make it feel more manageable. If you want, I can help you break it down and come up with a few ways to reduce that stress for both you and your child. Do you want to do that?"
Human Response:
"Why? What happened?"
The AI response is technically correct, empathetic, and structured, but it is long, cautious, and informational, whereas a human naturally seeks immediate clarification and context. This highlights the gap: AI has empathy coded in, but it lacks conversational spontaneity and prioritization of relational signals.
Conversational Pacing, Rhythm, and Diversity
Conversational pacing and rhythm shape how we interpret meaning. Nielsen Norman Group research shows that people read only twenty to twenty-eight percent of words on a web page, relying heavily on context, formatting, and pacing to extract meaning. In conversation, we use similar shortcuts: pauses, tone shifts, topic transitions. AI optimized for information density without regard for pacing feels overwhelming or robotic.
Regional, generational, and cultural variation in language use is vast. The way a teenager in California talks differs dramatically from how a retiree in Georgia talks, which differs from how a professional in London talks. Training data that overrepresents formal, standard English creates AI that sounds generic and disconnected from real human diversity.
The Evaluation Problem
Why does AI training miss these critical conversational elements? Part of the answer lies in how we measure success.
Current benchmarks for conversational AI focus on factual correctness, BLEU or ROUGE scores for text similarity, and grammaticality. Few metrics capture empathy, relational alignment, or conversational pacing. What gets measured gets optimized, and conversational quality is rarely measured in ways that matter for human connection.
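As a concrete illustration, BLEU measures n-gram overlap with a reference answer: a word-for-word factual reply scores perfectly, while a reply that adds empathy or a clarifying question is heavily penalized, even when it serves the user better. Below is a minimal sketch using NLTK's sentence-level BLEU (assuming `pip install nltk`; the example strings are invented for illustration).

```python
# Minimal sketch: BLEU rewards lexical overlap, not relational quality.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = "the school will be closed on friday due to the storm".split()

# Candidate A: factually dense, no acknowledgement of the user's worry.
candidate_a = "the school will be closed on friday due to the storm".split()

# Candidate B: acknowledges the concern, gives the fact, offers follow-up.
candidate_b = ("that sounds stressful the school is closed friday because of "
               "the storm do you need help planning childcare").split()

smooth = SmoothingFunction().method1
print(sentence_bleu([reference], candidate_a, smoothing_function=smooth))  # ~1.0
print(sentence_bleu([reference], candidate_b, smoothing_function=smooth))  # far lower
```

By this metric, the more conversationally competent response is the worse one.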
Evidence from Real World Applications
Customer Support
A 2023 Zendesk survey found that sixty-nine percent of consumers still prefer human agents for complex or emotionally charged issues. Chatbots fully resolve only twenty-nine percent of inquiries without human escalation. The most common complaint is not inaccuracy but tone:
"The bot answered my question but it felt like no one was listening."
Customer support is fundamentally relational. People want to feel heard, not just helped. AI trained to maximize answer accuracy without emotional awareness cannot deliver this experience.
Mental Health Applications
AI companions like Woebot and Wysa demonstrate what happens when conversational AI is trained with relational skills in mind. These systems show measurable improvements in user anxiety and depression scores, not because they are more accurate than general AI, but because they employ empathy, ask clarifying questions, and pace their responses appropriately (Fitzpatrick et al., 2017; Inkster et al., 2018).
The contrast is telling. When AI is deliberately trained to mirror human conversational strategies, it performs better in human contexts. The problem is not AI capability but training focus.
Personal Assistance
Users interacting with AI assistants frequently report frustration with tone and appropriateness even when factual responses are correct. A technically accurate answer delivered without awareness of context, urgency, or emotional state feels unhelpful. Users compensate by adding instructions like "be casual," "be brief," or "explain like I am five," essentially trying to manually correct for training deficiencies.
What Conversational AI Actually Needs
To train AI for genuine conversational fluency, we need fundamental changes in approach:
Different training data sources. Include unedited text messages, voice transcripts, casual social media threads, and real spoken conversation. Preserve filler words, false starts, topic shifts, and informal phrasing.
Annotation for relational context. Tag examples for emotional tone, relationship dynamics, conversational intent, and appropriateness. Teach AI to recognize when someone is venting, seeking advice, or needs levity versus seriousness (a minimal schema sketch follows this list).
Reward signals beyond correctness. RLHF should evaluate empathy, trust building, conversational awareness, and relational appropriateness alongside clarity and accuracy.
Cultural and demographic diversity. Training should include regional, generational, professional, and cultural variations to prevent generic or disconnected responses.
Explicit training on conversational strategies. Teach AI to employ clarification questions, meta-conversational checks, pauses, and tone matching. These are learnable patterns that currently go untrained.
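As one illustration of the annotation idea above, the sketch below shows what a relational-context label might look like as a data structure. The field names and label sets are hypothetical, not drawn from any published annotation scheme.

```python
# Hypothetical relational-annotation schema; field names and label sets
# are invented for illustration.
from dataclasses import dataclass
from typing import Literal

@dataclass
class RelationalAnnotation:
    utterance: str
    emotional_tone: Literal["neutral", "anxious", "frustrated", "excited", "sad"]
    intent: Literal["venting", "seeking_advice", "seeking_info", "small_talk"]
    wants_levity: bool            # is humor appropriate here?
    expects_clarification: bool   # should the model ask before answering?

example = RelationalAnnotation(
    utterance="I am really stressed about my kid's school.",
    emotional_tone="anxious",
    intent="venting",
    wants_levity=False,
    expects_clarification=True,   # a human would ask "Why? What happened?"
)
```

Labels like these could then feed reward models or fine-tuning objectives that score responses for relational fit, not just factual correctness.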
The Path Forward
Friendly conversational AI is not failing because it lacks language understanding. It is failing because it was trained in contexts that prioritize polish over authenticity, correctness over connection, and structure over spontaneity.
Formal AI tasks can tolerate this approach. Legal writing and code generation do not require empathy or relational awareness. Conversational AI cannot succeed without them.
The solution requires treating conversational fluency as a distinct skill requiring distinct training. We must train AI on how humans actually talk, not just how they write for publication. We must reward relational awareness, emotional intelligence, and conversational strategies, not just accuracy and clarity.
Until we do this, users will continue telling AI to "be more human" while it produces responses that are technically correct but emotionally flat, grammatically perfect but conversationally awkward, informationally complete but relationally empty.
We already have the language. We just need to teach AI the conversation.
References
Clark HH, Schaefer EF. Contributing to Discourse. Cognitive Science 1989;13:259–294.
Fitzpatrick KK, Darcy A, Vierhile M. Delivering Cognitive Behavior Therapy to Young Adults With Symptoms of Depression and Anxiety Using a Fully Automated Conversational Agent Woebot: A Randomized Controlled Trial. JMIR Mental Health 2017;4(2):e19.
Gartner. Top Strategic Technology Trends for 2022: AI Engineering. 2021.
Inkster B, Sarda S, Subramanian V. An Empathy Driven Conversational Agent for Mental Health Applications. Frontiers in Psychiatry 2018;9:1–10.
Nielsen Norman Group. How Users Read on the Web. 2021.
Zendesk. Customer Experience Trends Report. 2023.