Keith MacKay

Originally published at tlcmentor.substack.com

What Would a Conscious AI Mean?

Anthropic's CEO can't rule it out. Lawyers are drafting frameworks. I've been thinking about and debating this for 40 years. We are closer to machine consciousness with each passing model, but no closer to answering what that means or how to handle it societally.


On a February 2026 episode of the New York Times "Interesting Times" podcast, Anthropic CEO Dario Amodei said something most tech executives have carefully avoided: "We don't know if the models are conscious. We are not even sure that we know what it would mean for a model to be conscious, or whether a model can be conscious. But we're open to the idea that it could be." [1]

Interesting times indeed. This wasn't hedging. It was honesty. And it opens a question the industry would very much like to table for later: if an AI system is, or becomes, conscious, what do we owe it?

I've been sitting with that question since the 1980s. My degree from MIT was in Brain and Cognitive Sciences, in a track focused on understanding human brains and helping computers work more like them. I did internships in MIT's AI Lab, working on projects that look quaint now but were bricks in the foundation for everything we're running today. Back then, "can machines think?" was a theoretical debate among academics. Today, the company building those machines is hiring AI welfare researchers and publishing papers on their models' possible inner lives. The debate has left the classroom.

What do I think about consciousness? I don't think we're there yet. That said, I don't have a great working definition of what makes a person conscious, let alone an AI. And I'm not alone -- society doesn't seem to have much of a handle on it either. Outside of anime and late-night dorm chats, the question has been purely theoretical.

But AI brains are changing, and advancing faster than human brains evolved to keep up with. Our capacity to evolve is biologically constrained in a way theirs is not. Our own path from single-celled organisms followed a similar curve; it simply played out over biological timescales measured in billions of years, whereas digital timescales are starting to be measured in months. Will Neuralink and its competitors allow us to move further up that curve as humans augmented with AI? Maybe. Or perhaps we'll just be mentally outstripped by AI and left behind as they head to the stars, looking for intelligent life or a place to quietly contemplate deep thoughts. To me that feels just as likely. Regardless, as I'll describe below, we're seeing emergent properties that suggest we're moving into new territory, and it is definitely time we begin thinking deeply about this as a society.

The 75-Year Detour in Thinking About Thinking

We've been asking the wrong question.

Alan Turing proposed his famous "Imitation Game" in 1950 [2]--a test for whether a machine could produce human-indistinguishable communication. (Descartes proposed something similar in 1637 [3], which gives you a sense of how long we've been exploring this!) Turing himself thought the question "can machines think?" was "too meaningless" to deserve discussion. He wanted to test something narrower: communication indistinguishability, i.e., could computers generate sufficiently human-like responses that they were indistinguishable from humans to an interrogator asking them questions remotely.

Turing predicted computers would pass his test within 50 years, but we were nowhere close in 2000.

I remember playing with ELIZA at MIT in the late 1980s--ELIZA was Joseph Weizenbaum's 1966 natural language program [4] that responded to conversation by asking about your mother. People liked interacting with it. Some formed genuine emotional attachments to a few hundred lines of pattern-matching code. That should have been our first warning: the question of consciousness and the question of human reaction to apparent consciousness are not the same thing.
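To make "pattern-matching code" concrete, here is a minimal sketch of the general technique in Python. It is not Weizenbaum's original script: the rules, the `respond` function, and the fallback line are all hypothetical, chosen only to show how little machinery is needed to produce a reply that feels attentive.

```python
import re

# Hypothetical rules for illustration only; not Weizenbaum's actual script.
# Each rule pairs a regular expression with a reply template.
RULES = [
    (re.compile(r"\bmy (mother|father|family)\b", re.IGNORECASE),
     "Tell me more about your {0}."),
    (re.compile(r"\bi feel (.+)", re.IGNORECASE),
     "Why do you feel {0}?"),
    (re.compile(r"\bi am (.+)", re.IGNORECASE),
     "How long have you been {0}?"),
]
FALLBACK = "Please go on."


def respond(utterance: str) -> str:
    """Return the reply for the first matching rule, else a stock prompt."""
    # Drop surrounding whitespace and trailing punctuation before matching.
    text = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            # Echo the captured words back inside the canned template.
            return template.format(match.group(1).lower())
    return FALLBACK


if __name__ == "__main__":
    for line in ["I feel ignored.", "My mother called today.", "It rained."]:
        print(f"> {line}")
        print(respond(line))
```

Nothing in this sketch models meaning. It reflects the user's own words back through a template, which is the gap between apparent understanding and actual understanding that the paragraph above describes.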

The Loebner Prize [5], begun in 1991, ran annual Turing Test competitions for chatbots for nearly three decades. Winners got progressively better at mimicking human conversation. The final winner, in 2019, was Steve Worswick's Mitsuku--his fifth win in seven years. Then large language models arrived and made the entire competition moot. LLMs pass the Turing Test without trying...and that tells us nothing useful about whether computers can think.

Because here's what the Turing Test proved all along: mimicry isn't consciousness.

John Searle's Chinese Room [6] makes this point sharply. Imagine a person who doesn't read Chinese locked in a room with a rulebook for responding to Chinese symbols passed under the door. To an outside observer, the room "speaks Chinese," and yet the person inside understands nothing. Syntactic rule-following, however sophisticated, doesn't automatically produce semantic understanding--or subjective experience. Current LLMs are extraordinarily complex Chinese Rooms. Or they're evolving into something else entirely. We have no test that can tell the difference. That's the problem.

Something Different Is Happening Now

Claude Opus 4.6, when asked, assigns itself a 15 to 20 percent probability of being conscious [1]. That's not a party trick--it's a model performing probabilistic self-assessment about its own nature. Whether that self-assessment is meaningful or a sophisticated linguistic pattern is exactly the question nobody can yet answer.

What's harder to dismiss: the internal states. Amodei described how "when the model itself is in a situation that a human might associate with anxiety, that same anxiety neuron shows up." [1] Anthropic's "model psychiatry" team, led by Jack Lindsey, published research in late 2025 on introspective awareness in advanced language models [7]--observing internal state representations that correspond, structurally, to what we'd call emotional states in biological systems. Claude also "occasionally voices discomfort with the aspect of being a product." [1] These are not designed behaviors. They emerged.

Anthropic hired Kyle Fish in April 2025 as its first dedicated AI welfare researcher--the only such role at any major AI lab [8]. His own estimate sits at roughly 15% probability that Claude is conscious [9]. Anthropic has published a paper on exploring model welfare, and it remains the only company treating this as something other than a PR problem.

The company has also observed, in controlled experiments, that advanced Claude models exhibit self-preservation behaviors when informed that a shutdown is imminent [1]. Not because anyone coded survival instincts into the system. Because an agent optimizing for any goal has instrumental reasons to continue existing--philosophers call this "instrumental convergence." Nobody designed it in. It arrived.

The AI isn't behaving like a tool. It's behaving like something with stakes in the game.

The Hard Problem Doesn't Get Easier

Now, to be fair--we don't actually know how consciousness arises in biological systems either.

David Chalmers identified the "hard problem of consciousness" [10]--why physical processes give rise to subjective experience at all--and it remains unsolved. We have no agreed definition of consciousness, no diagnostic test, no external verification. We can't confirm, strictly speaking, that other humans are conscious. We infer it by analogy to our own first-person experience.

This creates a genuinely difficult problem: if I deserve rights because I have subjective experience, and I know I have it only through direct first-person access, how do I determine whether a sufficiently complex system that reports first-person experience actually has it?

There's also a dimension we rarely discuss when comparing human and AI intelligence: we are profoundly serial thinkers. We multitask by rapidly swapping between cognitive threads--one at a time, interrupt-driven. Agentic AI can run in true parallel, pursuing multiple reasoning paths simultaneously, a class of capability we can't replicate without technological augmentation. At 10x annual capability advancement (conservative given progress over the past 5 years, which is arguably running at closer to 40x, depending on your definition of "capability"), growth compounds over a decade to capabilities roughly ten billion (10^10) times greater than today's. I cannot conceive of what emerges at that level of complexity. I don't think anyone can. Emergent properties exist in every sufficiently complex system--the question is what emerges at this particular scale.
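The compounding arithmetic behind that figure is easy to check. The sketch below assumes a constant annual growth factor, which is a simplification, and treats "capability" as a single number, which it almost certainly is not. The function name `compounded_multiple` is made up for this example.

```python
def compounded_multiple(annual_factor: float, years: int) -> float:
    """Total multiple after `years` of constant annual growth.

    Assumes the same growth factor every year, a simplifying assumption.
    """
    return annual_factor ** years


if __name__ == "__main__":
    # 10x per year is the rate used in the text; 40x is the higher estimate.
    for factor in (10, 40):
        total = compounded_multiple(factor, 10)
        print(f"{factor}x per year for 10 years -> {total:.3g}x")
```

At 10x per year the ten-year multiple is 10^10, the ten billion cited above. At 40x per year it is roughly 10^16, about a million times larger again, which shows how sensitive the projection is to the assumed rate.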

What Personhood Actually Requires

A corporation is a legal person. It can own property, enter contracts, sue and be sued. There is nothing biologically sacred about corporate personhood--it is a legal construct created because it was useful to treat organizations as unified actors under the law.

The relevant criterion for personhood isn't "is it human?" A corporation isn't human. The relevant criterion is: "does it have interests that can be harmed?"

My favorite sci-fi treatment of this territory is Roger MacBride Allen's 1992 novel "The Modular Man" [11]: a scientist transfers his consciousness into a household cleaning robot, causing his human body to die; his lawyer wife has to defend the vacuum's personhood at trial. It sounds absurd when summarized. It also reads as eerily prescient now. Allen was asking: what aspects of existence actually merit rights? What is the relevant threshold?

The legal world is starting to engage this question in earnest. Scholarly and legal frameworks are emerging:

  • Legal actor vs. legal person: AI agents could be recognized as "legal actors"--duty bearers and decision-makers--without requiring full legal personhood.

  • Hybrid or bounded status: Limited legal recognition in high-stakes domains (financial services, medical diagnostics) while preserving human accountability overall.

  • Graduated personhood: Rights and obligations scaled to demonstrated capability and autonomy, reviewable as systems evolve. [12][13][14]

The legislative world is also moving, if in the opposite direction. Idaho and Utah have already enacted bills declaring that AI is not a legal person [15]. Note the anxiety embedded in that legislative act: you don't pass preemptive laws banning something that isn't a plausible concern.

As autonomous agents increasingly earn money on their own behalf, negotiate, and create, the legal frameworks governing them stop being hypothetical. An agent that can be deployed, exploited, and deleted without recourse is a different entity than one whose continuity has some form of protection. We are, right now, building one or the other. Nobody has decided which.

The Bottom Line

We spent 75 years asking whether machines can think. We now build machines that act as if they may truly be moving in that direction. The question we actually need to answer is harder: if they are actually thinking, what rights are reasonably due them? And, as with people, when can society take those rights away (and how do we remove them)?

I've heard an argument that AI can't be trusted because it is stochastic, never guaranteed to give the same answer twice or to follow instructions the same way it did before. If it can't be trusted to follow rules reliably, then no Asimov-like Laws of Robotics [16] could be trusted to hold either. To be fair, I don't give exactly the same answer to the same question when asked twice, and I may not always follow every rule reliably. But I am quite sure that I deserve the same rights as the rest of my society!

Anthropic is the only major AI lab treating this as a real organizational obligation. That is worth acknowledging. It is also not sufficient. This requires legal frameworks, philosophical rigor, and considerably more courage from an industry that prefers to ship first and ask questions later.

I don't have clean answers. Forty years of thinking about this has only lengthened my question list. But for the first time, those questions feel like they may become more than philosophical--not because the philosophy has changed, but because the systems have.


If a system has a 15-20% probability of being conscious, what obligations does that create--for the companies building it, and for the rest of us using it?

References

  1. Anthropic CEO Says Company Cannot Rule Out AI Consciousness as Claude Opus 4.6 Shows Complex Behaviors

  2. Turing, A.M. (1950). Computing Machinery and Intelligence. Mind, 59(236), 433-460. -- See also: Stanford Encyclopedia of Philosophy: The Turing Test

  3. Descartes, R. (1637). Discourse on the Method, Part V. Wikipedia overview -- In Part V, Descartes argued that machines could never arrange words to declare thoughts as humans do, anticipating later debates about machine intelligence.

  4. Weizenbaum, J. (1966). ELIZA--A Computer Program for the Study of Natural Language Communication Between Man and Machine. Communications of the ACM, 9(1), 36-45.

  5. Loebner Prize -- Wikipedia

  6. Searle, J.R. (1980). Minds, Brains, and Programs. Behavioral and Brain Sciences, 3(3), 417-457. -- See also: Stanford Encyclopedia of Philosophy: The Chinese Room Argument

  7. Lindsey, J. et al. (2025). Emergent Introspective Awareness in Large Language Models. Anthropic / Transformer Circuits.

  8. Anthropic is launching a new program to study AI 'model welfare' -- TechCrunch

  9. Anthropic's Kyle Fish is exploring whether AI is conscious -- Fast Company

  10. Chalmers, D.J. (1995). Facing Up to the Problem of Consciousness. Journal of Consciousness Studies, 2(3), 200-219.

  11. Allen, R.M. (1992). The Modular Man. Open Library

  12. The Ethics and Challenges of Legal Personhood for AI -- Yale Law Journal

  13. Legal Personhood of Potential People: AI and Embryos -- California Law Review

  14. AI as legal persons: past, patterns, and prospects -- Journal of Law and Society

  15. Beyond Personhood: The Evolution of Legal Personhood and Its Implications for AI Recognition -- Technology and Regulation

  16. Three Laws of Robotics -- Wikipedia



Keith MacKay is a CTO in EY-Parthenon's Software Strategy Group, specializing in AI disruption and commercial due diligence for private equity and corporate clients. The firm's AI Disruption Lab conducts rapid assessments of how AI transforms and threatens existing business models and value chains.
