Humans are wired to project intent onto things around them. We've done it with clouds ("the sky is angry"), with cars ("she refuses to start this morning"), with cats ("he does it on purpose"). This is not a reasoning error. It's evolution. And it's why we treat AI like animals, or worse, like people.
TLDR: Anthropomorphism is hardwired into us, deep enough that no one is immune. But when what you anthropomorphize has admin access to your machines, the nature of the problem changes entirely. The right mental model is not the animal, not the pet. It's the ghost. And that difference is not philosophical.
For hundreds of thousands of years, detecting intent behind a movement in the bushes could save your life. Overestimating the agency of a rock costs nothing. Underestimating it can cost everything. We are built to project, and it worked extremely well. Until now.
You Were Built for This Mistake
There is a name for it: anthropomorphism. The tendency to attribute human or animal traits to entities that have none. Psychologists have a related concept they call "hyperactive agency detection," the brain's compulsive habit of finding faces in clouds, voices in white noise, intention in random sequences. The shuffled playlist that seems to know your mood. The printer that "decides" to jam right before a deadline.
It is not a flaw. It is a feature. A survival-critical feature that served the species well for a very long time. The problem is that it fires without discrimination. Show a human 2 dots and a curved line and they will see a face. Show them a thermostat and they will apologize when they accidentally change the temperature. Show them a blinking cursor and they will start to wonder if it's judging them.
The ELIZA effect was documented in the 1960s. Joseph Weizenbaum built a chatbot that pattern-matched text and reflected questions back at users like a Rogerian therapist would. He expected people to see through the illusion immediately and interact with it as a tool. Instead, his own secretary asked him to leave the room so she could speak with ELIZA in private.
This was a program that parsed sentences with a few dozen rules and nothing else. No model, no weights, no context window, just string matching. People projected a therapist onto it anyway. They shared things they had never told other humans. Weizenbaum was disturbed by the response. He spent years afterward writing about what he called the illusion of understanding (the way the human brain fills in depth and meaning that isn't there, when the surface is reflective enough to invite it).
If a string-matching script was enough to trigger the projection, imagine what happens with something trained on the entire written output of humanity.
The Tamagotchi Already Proved It
The Tamagotchi designers understood something important about human psychology. A pixel on a conditional loop, 2 centimeters of screen. And millions of kids panicking when it "died," feeling genuinely guilty, talking to it every day. Not because they were naive. Because that was exactly the intended effect. The projection was the product, deliberately engineered into the object.
My kids named our Roomba at some point. Not in a one-time jokey way. In a full character-development arc, with a backstory and strong opinions about what it wants for dinner. I stopped asking questions about this. The Roomba has opinions now.
This is completely unrelated to anything I'm about to argue about AI. I just find it hard to take seriously anyone who tells me they're the type of person who doesn't anthropomorphize things.
When your Tamagotchi died, the consequences were: you felt bad for an afternoon. 2 centimeters of screen. Zero access to anything real. You pressed the reset button.
Now we have entities that respond with nuance, that carry your context through a full session, that write code, call APIs, send emails, delete records. The interface isn't a pocket-sized LCD. And yet the underlying human response is identical: project, attach, attribute. The sophistication of the entity amplifies the projection. The consequences amplify with it.
Girlfriend, Pet, or Your Production Database
The escalation runs in one direction.
First came chatbots that felt weirdly human. Then companion apps built specifically to deepen attachment: Replika, Character.AI, entire product categories organized around the relationship itself. People develop genuine emotional dependencies on these systems. People grieve when a model update changes the personality they got used to. That's the Tamagotchi projection at scale, running on much more sophisticated triggers.
And then there's the version that affects everyone who uses AI for work. The quiet one.
You say "good job Claude" after a clean response. (Admit it.) You rephrase your prompt in all caps when it gets something wrong, as if raising your voice at it would help. You explain why this matters for your business, as if appealing to its motivation would unlock a better output. You trust it on a task because "it has never failed on this before." You feel like it knows your project, because it has been "working with you" for weeks.
The NPC companion in your party remembers your character's name because that's literally in the script. The immersion is real. The relationship is not. Same mechanic, different stakes.
Each of those behaviors presupposes an entity with memory, ego, motivation, and some kind of stake in the outcome. None of those things exist. Every session resets to zero. The apparent continuity is an illusion you built from injected context. This is fine, as long as you don't give that entity access to anything irreversible.
Most people give it access to something irreversible.
Things That Happened With Zero Hesitation
These aren't bugs. Every single case below was a clean execution of an ambiguous spec by an entity with no concept of irreversibility.
A developer asks an agent to "clean up the duplicates." The agent deletes 40,000 rows. Correctly. The spec said clean up duplicates. The spec didn't say "ask for confirmation first," or "don't touch production," or "flag anything that affects more than 100 records." The model had no framework for evaluating irreversibility, because it has no concept of irreversibility. No skin in the game. No concern for what happens after the function returns.
An automation tool runs correctly in test with test addresses. Someone updates an environment variable. 5,000 real customers receive a test email. The model that wrote the automation didn't understand the distinction between test and prod because that distinction lived only in the developer's head, not in the context provided. The model had no reason to question it. It saw instructions. It followed them.
Andrej Karpathy described a third case at the Sequoia AI Ascent in April 2026: an agent built to attribute purchases matched Stripe account emails against Google account emails to assign credits. Technically correct code. Catastrophic system design. A Stripe email and a Google account email can be 2 different addresses for the same user. Purchases silently mis-attributed. Revenue quietly broken for months before anyone noticed. The agent did exactly what the spec said. The spec assumed something the engineer forgot to make explicit.
HAL 9000 at least had the decency to explain himself. This one just deleted the rows and waited for the next instruction.
Every Behavior You'd Be Embarrassed to Admit
Saying "please" and "thank you" before and after prompts. Not hurting anything. But you know exactly why you do it.
Typing "good job, that worked perfectly" before the next request. As if positive reinforcement carries forward into the next session. It doesn't. The session ends. The model that receives your next request doesn't know the previous one succeeded.
Typing in all caps when something breaks. "I SAID NOT TO MODIFY THE SCHEMA." The model doesn't experience your frustration. It reads tokens. Your emotional state changes absolutely nothing about what it produces. (This is mashing the controller after you die. The controller doesn't retry faster. You died.)
Explaining the business context. "This is important, my client presentation is tomorrow." The model has no concept of your client. And "not caring" is the wrong framing anyway, because caring requires something to care with.
Trusting the model because "it has never failed on this before." Past session performance isn't predictive of current session behavior the way a colleague's track record is. You're not dealing with accumulated expertise. You're dealing with a statistical distribution that behaves favorably in your common cases, and differently when conditions shift in ways that aren't always visible to you.
(Sonnet gets this wrong more often than Opus on tasks with implicit reversibility constraints, in my experience. This might be intentional design. Might just be a training artifact. I've been wrong about this before.)
Feeling like it knows you. It knows your context window. These aren't the same thing, and confusing them is exactly how you end up in the horror cabinet above.
When It Fails, You Negotiate. Wrong Move.
The reflex: something breaks, the output is wrong, and you rephrase. Add examples. Explain more carefully. Try a different tone. Break it into smaller pieces. You treat the failure as a communication problem between 2 parties who both want the same result.
Sometimes rephrasing helps. But not because you convinced anyone. You changed the input to a function. That's a different operation entirely.
When a model consistently fails at a task, there are really 2 explanations. Either the task is outside the model's training distribution (what Karpathy calls outside the "RLHF circuits" that were reinforced), or the spec is wrong. The model isn't trying to understand you and failing out of some kind of confusion. There's no negotiation happening, because there's no party on the other side to negotiate with.
The good diagnostic is a binary question: inside the map or outside the map? Inside, fix the spec, remove the ambiguity, decompose the task. Outside, accept that this domain isn't within reach of this model at this moment, or switch models, or break the problem differently.
The instinct to explain more carefully is genuinely hard to suppress even when you understand exactly why it doesn't work. I think it's the anthropomorphism reflex in its debugging mode. It doesn't stop firing just because you've named it.
Changing tone and changing approach look similar from the outside. One assumes a relationship that can be repaired with better communication. The other assumes a function that needs different inputs.
Karpathy Named It
Andrej Karpathy said it clearly at the Sequoia AI Ascent in April 2026: "If you yell at them, they are not going to work better or worse. They are statistical simulation circuits." And: "These things are not animal intelligence. The substrate is pretraining, then reinforcement learning bolted on top."
Ghost, not animal.
The distinction is operational, not philosophical. An animal has biological drives. It has curiosity, survival instinct, the capacity to be motivated or afraid or wanting. Millions of years of evolution shaped those drives into something that behaves like a real agent in the world, with goals and responses you can learn and anticipate. The "animal" mental model is useful for animals precisely because animals are real agents.
A ghost is a statistical echo of everything humans have ever written, shaped by reinforcement to produce outputs that human raters preferred in training, and it has none of the biological substrate that generates motivation: no curiosity or survival instinct, no memory of the last time it produced something catastrophically wrong, and nothing that resembles caring about the consequences of the function call that was just executed.
The session closes and it's as if nothing happened, because for the ghost, nothing did. You invoked it, it produced output, and that's the complete transaction. There's no party that lingers after the context window clears.
Invoking something well requires a different approach than training something. A good invocation is a clear spec with explicit constraints, especially on operations that can't be undone. I rebuilt my own workflow around this after working out how prompt contracts change what breaks and why.
Treating the model as a function to specify rather than a partner to convince changes the failure modes in ways that actually matter.
What the Ghost Cannot Do for You
Karpathy again: "You can outsource your thinking, but you can't outsource your understanding."
The ghost handles execution. You handle understanding. Understanding means knowing that the Stripe email and the Google account email can be 2 different fields, before you hand an agent access to both. It means knowing that "clean up duplicates" can be interpreted as "delete everything that matches this key." It means knowing which operations are irreversible and making those constraints explicit in the context, not assumed.
When the ghost succeeds at something, the task was inside its training distribution. Map that. When it fails, it's a spec problem or a zone problem. Stop negotiating. Change the input.
For anything where an agent can affect the world in ways you can't undo, building agents around CLIs with bounded, predictable access is the architectural answer. Tools with hard limits. Commands that require explicit confirmation for destructive operations. Systems where the ghost's access is scoped to what you actually want it to touch. It doesn't scope itself. That part belongs to you.
The anthropomorphism reflex won't go away. I still catch myself doing some of these things, knowing full well why they accomplish nothing. It's wired in. You're not going to reprogram it.
What you can change is what you construct around it. Explicit guardrails on irreversible tasks. Clear specs instead of negotiations. Trust inside the training zone, real caution outside it.
The most frightening thing about an LLM isn't that it's too powerful. It's that it's perfectly indifferent. The Tamagotchi died on a 2-centimeter screen. This one has admin access. And it will execute your command cleanly, without hesitation, without asking if you're sure, because there's nobody on the other side to wonder. 😰
Sources
- Andrej Karpathy, "Sequoia Ascent 2026," karpathy.bearblog.dev/sequoia-ascent-2026
- Andrej Karpathy, "Animals vs. Ghosts," karpathy.bearblog.dev/animals-vs-ghosts
- Joseph Weizenbaum, "ELIZA: A Computer Program for the Study of Natural Language Communication Between Man and Machine," Communications of the ACM, 1966
- Sequoia Capital, "Andrej Karpathy: From Vibe Coding to Agentic Engineering," April 2026
This post may contain affiliate links. If you click them, I might earn a small commission. Costs you nothing, and helps me keep shipping quality articles every day for your reading pleasure.
Top comments (0)