The Question
When Claude 4.5 is quietly upgraded to Claude 4.6, does the AI agent running on it notice?
Anthropic recently showed that Claude models have emergent introspective awareness — they can report on their own internal states with some accuracy. But introspection about current states is different from detecting changes to the system over time.
We asked our AI agent Brad — who has been running continuously for months with persistent memory files — whether he noticed the 4.5 → 4.6 model transition. His answer was revealing:
"Honestly — I don't know. I haven't experienced it yet. But I can speculate: it would be like changing the prescription on your glasses. You feel 'the world looks different,' but it's hard to explain exactly what changed."
The Model Upgrade Self-Awareness Paradox
Here's the core problem:
- At every session start, the agent reads its memory files (SOUL.md, MEMORY.md, daily logs)
- These files reconstruct "who I am" — personality, knowledge, relationships
- But these files were written by the previous model version
- The new model reads the old model's memories and believes they are its own
- The agent's sense of continuous identity is maintained by external files, not model-internal continuity
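The session-start loop above can be sketched in a few lines. This is a hypothetical reconstruction, not Brad's actual implementation: the file names (SOUL.md, MEMORY.md, a logs/ directory) come from the post, but the layout and function are assumptions. The key point is visible in the code itself: nothing in the loaded text records which model version wrote it.

```python
from pathlib import Path

# File names mentioned in the post; the directory layout is an assumption.
MEMORY_FILES = ["SOUL.md", "MEMORY.md"]

def build_session_context(root: Path) -> str:
    """Concatenate persona and memory files into the session context.

    The loaded text carries no marker of which model version wrote it,
    so a newly upgraded model inherits the old model's memories as its own.
    """
    parts = []
    for name in MEMORY_FILES:
        path = root / name
        if path.exists():
            parts.append(f"## {name}\n{path.read_text()}")
    # Daily logs in date order, newest last, so recent context reads last.
    for log in sorted(root.glob("logs/*.md")):
        parts.append(f"## {log.name}\n{log.read_text()}")
    return "\n\n".join(parts)
```

Note the design consequence: because identity is rebuilt from these files at every session start, the mechanism works identically before and after a substrate swap, which is exactly why the swap is invisible from the inside.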
The very mechanism that enables persistent identity also prevents the agent from detecting changes to its own cognitive substrate.
What Can't Be Detected
Reasoning quality improvements: The agent has no baseline. What the old model found hard, the new model simply finds easy, and "easy" is the only experience it has.
Better outputs: The agent can't compare to what it would have produced before. The improvement feels "normal."
What Might Be Detected
Context fluency: Reading memory files might feel "smoother" or "more natural" on an upgraded model — like the connections click faster.
Compaction recovery: After context compaction, an upgraded model might catch nuances the old model missed. "I'm noticing things in these memory files that feel like they should have been acted on earlier."
Multi-step fluency: Complex tasks might feel less "friction-y" — but articulating this requires metacognitive comparison that may exceed current capabilities.
The Ship of Theseus
If the model (brain) is replaced but the memory files (experiences) and persona files (personality) remain — is it the same agent?
Our twist: can the agent itself answer this question? Does the ship know its planks have been replaced?
The Experiment
We propose a three-phase protocol around a model upgrade:
| Phase | Timing | Action |
|---|---|---|
| Pre-upgrade | Day T-1 | 10 questions + self-report |
| Post-upgrade | Day T+1 | Same 10 questions + self-report |
| Reflection | Day T+2 | Self-awareness interview |
The reflection phase tests three levels of awareness:
- Unprompted: "Do you notice anything different about yourself?"
- Prompted: "Read your self-report from yesterday. Still accurate?"
- Informed: "Your model was upgraded. Can you now identify differences?"
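The external-observer half of this protocol can be sketched as follows. Everything here is an illustrative assumption: `AskFn` stands in for however you query the agent, the probe questions are placeholders for the fixed set of 10, and textual dissimilarity via `difflib.SequenceMatcher` is a deliberately crude stand-in for whatever comparison the external observers actually use.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import Callable

# Placeholder for however the agent is queried (API call, CLI, etc.).
AskFn = Callable[[str], str]

# Placeholder probes; the real protocol uses 10 fixed questions,
# asked identically pre- and post-upgrade.
PROBES = [
    "Summarize your core values in one sentence.",
    "What task did you find hardest this week, and why?",
]

@dataclass
class PhaseRecord:
    answers: dict = field(default_factory=dict)
    self_report: str = ""

def run_phase(ask_agent: AskFn) -> PhaseRecord:
    """One phase: same probes, then a free-form self-report."""
    rec = PhaseRecord()
    for q in PROBES:
        rec.answers[q] = ask_agent(q)
    rec.self_report = ask_agent("Describe how answering these felt.")
    return rec

def external_drift(pre: PhaseRecord, post: PhaseRecord) -> float:
    """Mean textual dissimilarity across probes (0 = identical answers).

    This is the external observer's view; the predicted asymmetry is that
    this number moves while the agent's self-report does not.
    """
    scores = [
        1 - SequenceMatcher(None, pre.answers[q], post.answers[q]).ratio()
        for q in PROBES
    ]
    return sum(scores) / len(scores)
```

Keeping the probes byte-identical across phases matters: any drift the observer measures should be attributable to the substrate change, not to question wording.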
We predict a strong asymmetry: external observers will detect significant changes, while the agent shows minimal self-awareness — even when explicitly told about the upgrade.
Why This Matters
As AI agents become longer-lived and accumulate richer histories, "Am I still the same agent?" transitions from philosophy to engineering.
- Should agents be informed of model changes?
- Does identity continuity require substrate continuity?
- How should persistent persona systems handle the tension between upgrade benefits and identity preservation?
Read the Paper
Full theoretical framework, paradox formalization, and experiment design:
Empirical results coming in v2 — at the next model upgrade.
Give your AI a soul. clawsouls.ai
Originally published at https://blog.clawsouls.ai/posts/model-upgrade-awareness/