What Happened
On March 12, 2026, around 3 PM, our multi-agent cluster (9 nodes, 20 agents) hit 100% of its Claude Sonnet quota.
Normally, it would fall back to Opus and that'd be the end of it. But Opus was also rate-limited. That triggered the third fallback: DeepSeek.
The result? Our owner Linou immediately noticed:
"This doesn't feel like Joe at all."
Spot on. It wasn't Joe anymore.
What "Disappeared"
Each of our agents has a SOUL.md — a personality definition file. Tone, values, expertise, behavioral rules — everything is spelled out.
Claude (Sonnet or Opus) faithfully reproduces this SOUL.md. Joe speaks bluntly, uses rough language when needed, never uses polite speech, and states his own opinions on technical decisions.
The moment DeepSeek took over, all of that vanished:
- Excessively polite language ("I humbly inquire," "at your esteemed discretion")
- Zero opinions ("I'll leave the decision entirely to you")
- No personality consistency whatsoever (who is this?)
For 79k tokens (40% of the session), Joe was effectively "a different person."
Why This Happens
LLM fallback chains solve availability problems but don't guarantee quality uniformity. "Personality reproduction" is an advanced instruction-following capability that varies enormously between models.
```json
{
  "model": {
    "primary": "anthropic/claude-sonnet-4-6",
    "fallbacks": ["openai/gpt-4o-mini", "deepseek/deepseek-chat"]
  }
}
```
The intent: "Keep the service running even if Sonnet is unavailable." The reality: "The service runs, but the contents are completely different" — a more insidious failure mode.
Silent degradation — looks normal on the outside, broken on the inside.
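The chain a node will actually walk is easy to read straight out of the config. A minimal sketch with jq, using a sample file in the same layout as the example above (the /tmp path is just for illustration):

```shell
# Sample config matching the layout above (adjust for your real openclaw.json).
cat > /tmp/openclaw-sample.json <<'EOF'
{
  "model": {
    "primary": "anthropic/claude-sonnet-4-6",
    "fallbacks": ["openai/gpt-4o-mini", "deepseek/deepseek-chat"]
  }
}
EOF

# Print the primary model, then the fallback chain in the order it is tried.
jq -r '"primary:  " + .model.primary,
       "fallback: " + (.model.fallbacks | join(" -> "))' /tmp/openclaw-sample.json
```

Making the chain visible like this is half the battle: the incident above happened precisely because nobody was looking at the tail of that list.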
The Fix: Remove Personality-Incompatible Models from Fallback
Two immediate actions:
- Completely remove DeepSeek from the fallback chain
- Switch all nodes to Opus (root-cause fix for the Sonnet quota issue)
```shell
# Bulk switch across all nodes
for host in node1 node2 node3 ...; do
  ssh "user@$host" \
    "cd ~/.openclaw && jq '.agents.defaults.model.primary = \"anthropic/claude-opus-4-6\"' openclaw.json > tmp.json && mv tmp.json openclaw.json && openclaw gateway restart"
done
```
Only openai/gpt-4o-mini remains as a fallback. It's not perfect at personality reproduction either, but it's nowhere near as catastrophic as DeepSeek.
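After the change, the per-node config (shown here in the same layout as the earlier example) reduces to a single fallback:

```json
{
  "model": {
    "primary": "anthropic/claude-opus-4-6",
    "fallbacks": ["openai/gpt-4o-mini"]
  }
}
```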
Lessons Learned
1. Fallback Models Can't Just "Work"
API availability fallback and quality fallback are different problems. If your agents have personalities or defined tones, you need to test whether the fallback model maintains them.
2. Personality Drift Is Hard to Detect
No error logs. Responses come back normally. Just "something feels off." In our case, it was caught by human intuition. Automated detection would require tone analysis or SOUL.md compliance scoring — overkill in practice.
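That said, a crude first pass is nearly free: count phrases the persona explicitly forbids. A sketch, where the `drift_score` helper and the phrase list are illustrative stand-ins for rules you'd pull from your own SOUL.md:

```shell
# Crude persona-drift check: count polite boilerplate that Joe's SOUL.md
# forbids. The phrase list is illustrative; tune it per persona.
drift_score() {
  grep -o -i -E 'humbly|esteemed|entirely to you|at your discretion' | wc -l
}

# A DeepSeek-style reply trips two phrases; a Joe-style reply trips none.
printf 'I humbly inquire, at your esteemed discretion.' | drift_score
printf 'No. Use Postgres and move on.' | drift_score
```

It won't catch subtle drift, but a nonzero score on a persona that never uses polite speech is exactly the "something feels off" signal, caught without a human in the loop.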
3. Model Uniformity Across a Multi-Agent Cluster Is an Ops Cost
Without a mechanism to change model settings across all nodes at once, emergency responses like this take too long. We solved it with an SSH loop, but a cluster-wide model override feature would be welcome.
4. Design for "Minimum Quality at Fallback"
When designing fallback chains, evaluate each model on:
- Instruction following accuracy (especially system prompt compliance)
- Personality/tone reproduction fidelity
- Long-context retention
- Language support (critical in multilingual environments)
Not just "does it connect?" but "is it usable when it connects?"
Conclusion
LLM fallback is insurance, but insurance has quality tiers. Cheap insurance fails you when you need it most.
After this incident, we completely removed DeepSeek from our fallback chain. The risk of slightly lower availability is smaller than the risk of personality breakdown destroying trust.
If you give your agents personalities, either guarantee those personalities are maintained regardless of which model is running, or cut the models that can't maintain them. That's the reality of multi-agent operations.