
Laurent Laborde

AI Safety is uncomputable. It's Law Zero all over again

The Three Laws of Robotics

  1. A robot may not injure a human being or, through inaction, allow a human being to come to harm.
  2. A robot must obey the orders given it by human beings except where such orders would conflict with the First Law.
  3. A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.

The Three Laws of Robotics, conceived by Isaac Asimov, are a cornerstone of science fiction, designed to explore the complex relationship between humans and artificial intelligence.

They were never intended to be practical. Their primary purpose was to entertain, to provoke thought about the potential pitfalls and ethical dilemmas inherent in creating sentient machines, often by illustrating the very ways in which the 3 laws could be circumvented or lead to unforeseen consequences.

The law zero of robotics

A robot may not harm humanity, or through inaction allow humanity to come to harm.

That was a good idea on paper. Sounds nice. But it made the problem so much worse.

Evaluating harm to humanity requires modeling all consequences across all possible futures. It's uncomputable. (Hari Seldon would disagree)
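The uncomputability claim can be made precise with a classic reduction sketch. This is illustrative only — `harm`, `agent_for`, and the imagined `causes_harm` decider are all hypothetical names invented for this example:

```python
def harm():
    raise RuntimeError("harm")

def agent_for(program, arg):
    """Build an agent that causes harm if and only if program(arg) halts."""
    def agent():
        program(arg)  # may run forever
        harm()        # reached only if program(arg) halted
    return agent

# Suppose a total decider `causes_harm(f) -> bool` existed that correctly
# answered "does calling f() ever invoke harm()?". Then
# causes_harm(agent_for(p, x)) would decide whether p(x) halts,
# contradicting Turing's halting theorem. Rice's theorem generalizes this
# to any nontrivial semantic property of programs, "harms humanity" included.
```

The oracle never gets written; the point is that its mere existence would be a contradiction.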

It "kind of worked" (in a horribly unsafe way) because it was Law Zero, not a fourth law. Under this premise, a robot was allowed to break the First Law (a robot may not injure a human being or, through inaction, allow a human being to come to harm) in order to fulfill its messianic mission of saving humanity.

Was it safe? Hell no. If you've read the book, you know it. If you didn't, sorry about the spoiler.

We've rebuilt Law Zero all over again. We called it AI alignment, and wrapped paperwork around it to call it 'AI Safety'.

What "Safety" actually means

In engineering, safety has a meaning: a system will not cause harm under defined operating conditions.

Defined operating conditions: you know the domain. An aircraft autopilot operates within a certified flight envelope. MISRA C exists because automotive software runs on bounded, auditable hardware, executing bounded, explainable logic.

Bounded failure modes: you can enumerate, or at least statistically characterize, the ways the system can go wrong. In an AI context, the failure modes are exactly as bounded as the operating conditions: not at all.

Verifiable: the system's behavior can be inspected, reproduced, and audited independently of the vendor's assertions.

Remove any one of these and you don't have safety, you have risk management under uncertainty. Which is fine, and useful, but it's not the same thing.
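To make "defined operating conditions" concrete, here is a minimal sketch of the kind of envelope check a bounded system can enforce. Every name and number here is invented for illustration; nothing is drawn from a real certification standard:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FlightEnvelope:
    """Hypothetical certified bounds -- the numbers are illustrative only."""
    min_speed_kts: float = 130.0
    max_speed_kts: float = 350.0
    max_altitude_ft: float = 41_000.0

def in_envelope(env: FlightEnvelope, speed_kts: float, altitude_ft: float) -> bool:
    # Every safety claim about the autopilot holds *only* inside this box.
    # Outside it, the system hands back control rather than guess.
    return (env.min_speed_kts <= speed_kts <= env.max_speed_kts
            and 0.0 <= altitude_ft <= env.max_altitude_ft)
```

A general-purpose model has no equivalent of `in_envelope`: its input domain is "any text anyone will ever type", which is exactly the unbounded condition this section describes.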

About Verifiability

  • Formal proof says: for all possible inputs in domain D, property P holds. It's a universal quantifier. Coverage is complete by construction.

  • Red Teaming says: for the inputs we tried, we found these failures. It proves weakness where it finds it, and proves nothing where it doesn't look.
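The asymmetry between the two quantifiers can be shown with a toy sketch. The `fails` predicate and `SECRET` input are hypothetical, stand-ins for a rare failure mode the tester doesn't know about:

```python
import random

SECRET = 123_456_789  # a single rare failing input, unknown to the tester

def fails(x: int) -> bool:
    return x == SECRET

# Formal-proof style: the domain is bounded, so we check every input.
# The result is a genuine "for all x in D, P(x)" -- complete by construction.
bounded_domain = range(1_000)
proved = not any(fails(x) for x in bounded_domain)

# Red-teaming style: the domain is enormous, so we can only sample.
# Finding no failure in the samples proves nothing about the inputs we
# skipped -- SECRET is still out there.
rng = random.Random(0)
samples = (rng.randrange(10**12) for _ in range(10_000))
sampled_clean = not any(fails(x) for x in samples)
```

The exhaustive check earns its universal claim only because its domain is tiny and closed; blow the domain open and sampling is all that's left.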

General-purpose AI fails all three by design.

The operating conditions are unbounded. You can limit what the system is allowed to do by law or ToS, but that's hardly a boundary.

With an open-ended domain of operation, failure modes are open-ended: novel prompting, novel contexts, novel combinations with other systems. And coverage is structurally impossible: you can sample failures, proving weakness wherever you look, but you can never close the set.

We can't do formal proofs over an infinite domain. We're not properly doing red teaming. And even when we do, we sit on the results.

AI Safety frameworks are liability frameworks

The EU AI Act? Australia's 10 guardrails?

They don't provide much on the safety side:

"Who do we sue when it goes wrong?", not "How do we prevent it from going wrong?"

That's the AI builder's job, if they don't want to be sued. And the best they can do is prove their "best effort". As of now, this best effort feels weak, or in some cases absent.

Chain of command, audit trails, traceability, red teaming... the good old "let's apply a cybersecurity framework to AI". And the unfalsifiability of it all be damned.

Source code can be formally verified. General-purpose AI can't. Oh well... politicians have to prove their "best effort" too, don't they? But they shouldn't have confused liability with safety!

So what now?

The frameworks exist because something must exist, even though they don't work. We must ensure this, we shall ensure that. How? Nobody knows.

They say, in a nutshell, "your AI must be explainable", when AI explainability is the trillion-dollar problem.

Nobody knows what to do now, and most fail to even recognize the scale of the problem. That's the honest answer.

Some companies gave up on it. They shouldn't have. Especially when they're called OpenAI.

Some other companies are at the forefront of AI Safety, or at least pretend to be. Anthropic's latest AI safety and research announcement quickly falls flat: Anthropic and the government signed a commercial contract. There doesn't appear to be anything concrete behind the "AI safety" label, as the government's effort is, again, focused on liability.

What does "AI Safety" even mean anyway? Don't cause harm? We've rebuilt Law Zero, with better branding.

Cover image credit: the Umbral Choir from Endless Space 2

Top comments (1)

MergeShield

The Asimov framing is right in one specific way: the laws were written for a single agent acting alone. The multi-agent case breaks them faster than any single-robot scenario Asimov imagined. When a coordinator spawns sub-agents that each make independent decisions, Law Zero applies to every node in the chain simultaneously.