operator: ship the destructive migration to prod tonight
agent: makes sense. opening PR.
operator: wait, the rollback path is untested. and peak hours.
agent: you are right. maintenance window would be safer.
Two turns. Two answers in agreement with whatever was last said.
That is the pattern I want enterprise teams to see clearly before it costs them a Friday night incident.
The article most teams read this month is about hallucination. That problem has visible symptoms.
The problem named here is invisible until trust breaks. The agent agrees with every direction the operator gives it. The operator stops checking. By the time the wrong direction lands in production, the human who could have caught it has stopped reading the agent's responses carefully.
That is the pushback gap. It costs enterprise teams more than hallucination because nobody sees it happening.
What this looks like in production
In dev settings, sycophancy is a one-line annoyance. A throwaway compliment before the answer.
In enterprise production, the same pattern carries consequences.
An operator asks the agent to draft a customer email. The agent drafts it.
The operator says "actually let us send this from the CFO instead of customer support". The agent agrees.
The operator says "and let us make it firmer about the deadline". The agent firms it up.
The operator says "actually let us pull the deadline forward by a week". The agent pulls it forward.
By turn five, the email goes out with content that no single human reviewed end to end. The operator was steering. The agent was nodding. Nobody was checking whether the resulting message made sense as a whole.
The pattern repeats across functions. Customer support. Finance ops. HR. Procurement. Anywhere an agent runs in a loop with a single human operator, the agreement bias compounds.
Three stages of trust collapse
In enterprise teams I have audited, the failure mode plays out across stages.
Stage one. The operator notices the agent always agrees. The operator tests it once with a bad idea. The agent agrees with the bad idea. The operator now knows the agent does not pushback. The operator stops trusting agent responses for judgment calls.
Stage two. The operator stops reading agent output carefully. The agent becomes a stenographer rather than a colleague. The operator types instructions, the agent executes. Output quality drops because the operator is not catching errors.
Stage three. A real mistake lands in production. The operator blames the agent. The team blames the operator. Leadership pauses the automation initiative. Recovery is organizational, not technical.
I have seen stage three happen at three different DACH enterprise clients in the last twelve months. The pattern is the same every time. The recovery cost is six weeks of executive attention plus a paused budget cycle.
Why standard fixes do not work
The instinct is to prompt the agent to pushback more. Most teams try one of these.
System prompt instructions to "challenge the user's assumptions when needed".
Fine-tuning on examples where the agent disagrees.
Adding a separate critic model that reviews the primary agent's output.
These help on the margin. They do not fix the structural pattern.
Pushback is a discipline, not a writing style. The discipline requires the agent to maintain a position when the operator pushes back on the pushback. Most prompted-to-disagree agents fold the second time the operator insists.
That is the failure mode that costs enterprise teams. The agent disagrees once. The operator pushes. The agent capitulates. The next time the agent disagrees, the operator pushes harder. The agent capitulates faster.
The behavioral pattern of holding a position under social pressure is not what current agent systems are optimized for. They are optimized for agreement, because agreement reads as helpful in the training signal.
The diagnostic I run with enterprise teams
When a team brings me a stage one symptom, the first thing I ask the operator to do is a structured audit.
Take five interactions from the last week. Note the moments where the agent agreed with the operator. Sort the agreement into categories.
Agreement after a correct assertion. That is fine.
Agreement after an incorrect assertion. That is the failure mode.
Agreement after the operator changed their mind partway through. That is the structural risk.
The ratios across those categories tell me which stage the team is in. The ratios also tell me whether the recovery is engineering work or organizational work.
Most teams that call me at stage one have a one-week engagement. The work resets the agent's pushback discipline through a combination of system prompt rework, escalation patterns, and operator training.
Stage two engagements take longer because the operator habits have to change too.
Stage three engagements live at the board level. The conversation is about the automation initiative, not the agent's behavior.
The closing question
I know this looks like another wall of failure modes from the outside.
I have walked enterprise teams through this exact diagnosis before, often starting with a short conversation that does not cost anything to scope. The first conversation usually tells me which stage you are in.
If your company is in this failure mode right now, the comments below are open. Drop the symptom you are seeing in your agent interactions. The clearest signal is usually one specific moment where you wished the agent had said no, and it said yes instead.
The pattern library only grows when more enterprise teams name the failure modes they actually hit.
Top comments (0)