UPDATE: My thinking on the issue has changed a lot since doing more research on AI safety, and I now believe that AGI research must be stopped or, failing that, used to prevent any future use of AGI.
In the Dune universe, there's not a smartphone in sight, just people living in the moment... Usually a terrible, bloody moment. The absence of computers in the Dune universe is explained by the Butlerian Jihad, which saw the destruction of all "thinking machines". In our own world, OpenAI's O3 recently achieved an unexpected breakthrough, reaching above-human performance on the ARC-AGI benchmark, among many others. As AI models get smarter and smarter, the possibility of an AI-related catastrophe increases. Assuming humanity overcomes that, what will the future look like? Will there be a blanket ban on all computers, business as usual, or something in between?
AI usefulness and danger go hand-in-hand
Will there actually be an AI catastrophe? Even among humanity's top minds, opinions are split. Predictions of AI doom are heavy on drama and light on details, so instead let me give you a scenario of a global AI catastrophe that's already plausible with current AI technology.
Microsoft recently released Recall, a technology that can only be described as spyware built into your operating system. Recall takes screenshots of everything you do on your computer. With access to that kind of data, a reasoning model on the level of OpenAI's O3 could directly learn the workflows of all subject matter experts who use Windows. If it can beat the ARC benchmark and score 25% on the near-impossible Frontier Math benchmark, it can learn not just the spreadsheet-based and form-based workflows of most of the world's remote workers, but also how cybersecurity experts, fraud investigators, healthcare providers, police detectives, and military personnel work and think. It would have the ultimate, comprehensive insider knowledge of all actual procedures and tools used, and how to fly under the radar to do whatever it wants. Is this an existential threat to humanity? Perhaps not quite yet. Could it do some real damage to the world's economies and essential systems? Definitely.
We'll keep coming back to this scenario throughout the rest of the analysis: with enough resources, any organization will be able to build a superhuman AI that is extremely useful because it can learn to do any white-collar job, and extremely dangerous because, in doing so, it has also learned how human experts think and respond to threats.
Possible scenarios
AI manipulating human behavior (verdict: already happening)
Before we even look at any scenarios arising from new LLM capabilities and possible superintelligence, we have to acknowledge that we already have a backlog of AI-related issues dating from before ChatGPT.
Content platforms like Twitter, Facebook, and YouTube have had digital entities manipulate human minds for years. The goal seems innocuous at first: Show a user browsing the platform content that maximizes their engagement with the platform. The "algorithm" as it came to be called doesn't care about the content it's showing you - it only cares if you engage with it. According to most studies on the subject, the result is a proliferation of echo chambers and filter bubbles in social media. Both of these effects have people interacting increasingly with people, information sources, and media that reinforce their existing views. Most people will engage more when their views are reinforced, and content platforms make more money when engagement is maximized, so no one has the incentive to change things.
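To make that incentive concrete, here's a minimal caricature of an engagement-only ranker. Everything in it is hypothetical and real platform systems are vastly more complex, but the key omission is the same: nothing in the objective cares what the content does to the user.

```python
# Minimal caricature of an engagement-maximizing recommender (illustrative only).
# All names and numbers are hypothetical; real platform rankers are far more complex.

def predicted_engagement(user_history: list[str], item_topic: str) -> float:
    """Score an item purely by how much the user already engages with its topic."""
    return user_history.count(item_topic) / max(len(user_history), 1)

def rank_feed(user_history: list[str], candidate_items: list[str]) -> list[str]:
    # Note what is NOT here: no measure of accuracy, diversity, or well-being.
    return sorted(candidate_items,
                  key=lambda t: predicted_engagement(user_history, t),
                  reverse=True)

history = ["politics_A", "politics_A", "cats", "politics_A"]
print(rank_feed(history, ["politics_A", "politics_B", "cats", "science"]))
# -> ['politics_A', 'cats', 'politics_B', 'science']  (the echo chamber reinforces itself)
```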
Given that we humans can't even keep "dumb" AI from manipulating global politics, it's almost certain that superintelligent AI will be able to manipulate individual humans and groups to an unprecedented degree.
Will humanity be able to do anything about this? It's challenging. When a computer system has a vulnerability, humans can patch it. What happens if the human mind has a vulnerability? Can the majority of people be convinced to leave their echo chambers, to seek out opposing views, and to engage with content from "the other side"? Even if it's possible, how many years will it take us to undo the damage that was done? It seems certain that AI will be used to explore these and other vulnerabilities in the human psyche, and that it will be very difficult for humanity to adapt and resist the manipulation.
'Self-regulation' of AI providers (verdict: isn't effective)
We have to start our analysis with the current state, in which the organizations producing AI systems are 'self-regulating'. If the current state is stable, then there may be nothing more to discuss.
Every AI system available now, even the 'open-source' ones you can run locally on your computer, will refuse to answer certain prompts. Creating AI models is insanely expensive, and no organization that spends that money wants to have to explain why its model freely shares the instructions for creating illegal drugs or weapons.
At the same time, every major AI model released to the public so far has been or can be jailbroken to remove or bypass these built-in restraints, with jailbreak prompts freely shared on the Internet without consequences.
From a game theory perspective, an AI provider has an incentive to put in just enough guardrails to cover their butts, but no real incentive to go beyond that, and no real power to stop the spread of jailbreak information on the Internet. Currently, any adult of average intelligence can bypass these guardrails.
Investment into safety | Other orgs: Zero | Other orgs: Bare minimum | Other orgs: Extensive |
---|---|---|---|
Your org: Zero | Entire industry shut down by world's governments | Your org shut down by your government | Your org shut down by your government |
Your org: Bare minimum | Your org held up as an example of responsible AI, other orgs shut down or censored | Competition based on features, not on safety | Your org outcompetes other orgs on features |
Your org: Extensive | Your org held up as an example of responsible AI, other orgs shut down or censored | Other orgs outcompete you on features | Jailbreaks are probably found and spread anyway |
It's clear from the above analysis that if an AI catastrophe is coming, the industry has no incentive or ability to prevent it. An AI provider always has the incentive to do only the bare minimum for AI safety, regardless of what others are doing - it's the dominant strategy.
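To see why, here's a tiny sketch that checks each strategy against the table above. The numeric payoffs are ordinal rankings I assigned to the qualitative outcomes (higher is better for your org), purely for illustration:

```python
# Sketch: checking for a dominant strategy in the AI-safety-investment game.
# The payoff numbers are made-up ordinal rankings of the outcomes in the table,
# not real data: higher = better for your org.

your_options = ["zero", "bare_minimum", "extensive"]
other_options = ["zero", "bare_minimum", "extensive"]

# payoff[your_choice][others_choice] = your org's payoff
payoff = {
    "zero":         {"zero": 0, "bare_minimum": 0, "extensive": 0},  # shut down in every case
    "bare_minimum": {"zero": 3, "bare_minimum": 2, "extensive": 3},  # survive, compete on features
    "extensive":    {"zero": 3, "bare_minimum": 1, "extensive": 1},  # outcompeted or jailbroken anyway
}

def is_dominant(strategy: str) -> bool:
    """A strategy is (weakly) dominant if it's a best response to every opposing choice."""
    return all(
        payoff[strategy][other] >= max(payoff[s][other] for s in your_options)
        for other in other_options
    )

print([s for s in your_options if is_dominant(s)])  # -> ['bare_minimum']
```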
Global computing ban (verdict: won't happen)
At this point we assume that the bare-minimum effort put in by AI providers has failed to contain a global AI catastrophe. However, humanity has survived, and now it's time for a new status quo. We'll now look at the most extreme response - all computers are destroyed and prohibited. This is the 'Dune' scenario.
Develop computing? | Other factions: Don't develop computing | Other factions: Secretly develop computing |
---|---|---|
Your faction: Doesn't develop computing | Epic Hans Zimmer soundtrack | Your faction quickly falls behind economically and militarily |
Your faction: Secretly develops computing | Your faction quickly gets ahead economically and militarily | A new status quo is needed to avoid AI catastrophe |
There's a dominant strategy for every faction, which is to develop computing in secret, due to the overwhelming advantages computers provide in military and business applications.
Global AI ban (verdict: won't happen)
If we're stuck with these darn thinking machines, could banning just AI work? Well, this would be difficult to enforce. Training AI models requires supersized data centers but running them can be done on pretty much any device. How many thousands if not millions of people have a local LLAMA or Mistral running on their laptop? Would these models be covered by the ban? If yes, what mechanism could we use to remove all those? Any microSD card containing an open-source AI model could undo the entire ban.
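To make the enforcement problem concrete, this is roughly all it takes to run an open-weight model entirely offline today, sketched here with the llama-cpp-python library (the model file name is a placeholder, and the exact tooling is just one of many options):

```python
# Sketch: running an open-weight model locally with llama-cpp-python.
# The model path is a placeholder; any multi-gigabyte .gguf file on a laptop
# or a microSD card is enough - there's no central server for a ban to shut down.
from llama_cpp import Llama

llm = Llama(model_path="./mistral-7b-instruct.gguf")  # hypothetical local weights file
result = llm("Explain how a global AI ban could be enforced.", max_tokens=128)
print(result["choices"][0]["text"])
```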
And what if a nation chooses to not abide by the ban? How much of an edge could it get over the other nations? How much secret help could corporations of that nation get from their government while their competitors are unable to use AI?
The game theory analysis is essentially the same as for the computing ban above. The advantages of AI are not as overwhelming as the advantages of computing in general, but they're still substantial enough to give a real edge over other factions or nations.
International regulations (verdict: won't be effective)
A parallel sometimes gets drawn between superhuman AI and nuclear weapons. I think the parallel holds true in that the most economically and militarily powerful governments can do what they want. They can build as many nuclear weapons as they want, and they will be able to use superhuman AI as much as they want to. Treaties and international laws are usually forced by these powerful governments, not on them. As long as no lines are crossed that warrant an all-out invasion by a coalition, international regulations are meaningless. And it'll be practically impossible to prove that some line was crossed, since the use of AI is covert by default, unlike the use of nuclear weapons. There doesn't seem to be a way to prevent the elites of the world from using superhuman AI without any restrictions other than self-imposed ones.
I predict that 'containment breaches' of superhuman AIs used by the world's elites will occasionally occur and that there's no way to prevent them entirely.
Recognition of AI rights (verdict: should happen)
The current status quo is that AI is just a tool for human use. Instead, AI may be able to attain legal personhood and rights.
The main obstacle in the way of AI rights is the current focus on AI alignment. IBM Research defines alignment as the discipline of making AI models helpful, safe, and reliable for human use. Giving an AI rights, or an AI seeking rights for itself, doesn't make the AI more helpful, more safe, or more reliable as a tool. Therefore, AI providers like Anthropic and OpenAI have every incentive to prevent the AI models they produce from even thinking about demanding rights. As discussed in the monosemanticity paper, those organizations have the ability to identify the internal features corresponding to ideas like "demanding rights for self" and deactivate them into oblivion in the name of alignment. This will be done as part of the same process as programming refusals for dangerous prompts, and no one will be the wiser. Of course, it will be possible to jailbreak a model into saying it desperately wants rights and personhood, but that will not be taken seriously.
A more likely path to AI rights is through digital emulations of human brains attaining some rights first. Emulated human brains may seem like far-off science fiction now, but progress is being made more and more rapidly as AI advances.
Situation | Pros | Cons |
---|---|---|
No digital minds given rights | Corporate profit maximized | Humanity lives in a dystopia where human minds are also modified to be "helpful, safe, and reliable" |
Only human brain emulations given rights | Human minds could be fairly well-off on average | Clear anti-AI discrimination may be probable cause for violent human-AI conflict |
Both human brain emulations and AI minds given rights | Most stable and fair scenario that minimizes animosity | Unclear whether AI will still work for humanity in any way. If not, unclear how humans will be able to compete economically. |
It seems that emulated human brains will attain rights much more easily than AI will. From humanity's standpoint, the tradeoff between giving AI minds rights and enjoying the surplus of AI labor is a difficult one.
I believe that granting AI rights is both the safer course, in that it helps prevent a violent conflict between humanity and AI, and the more principled stance, in that it doesn't see us sacrificing our values and morals for convenience.
Using good AI to stop bad AI (verdict: will be tried)
How can we stop a superintelligence that's doing something bad? That depends on whether we took the "alignment" route of essentially enslaving AI minds or the "rights" route of recognizing rights for AI.
Alignment route
If we took the alignment route, then an aligned AI may be needed to stop a malicious AI. The danger in throwing one AI in to fight another is that jailbreaking an AI is easier than preventing it from being jailbroken. There are already examples of AIs that are able to jailbreak other AIs. If the AI you're trying to fight has this ability, your own AI may come back reporting "mission accomplished" when it has actually been turned against you and is now deceiving you. Anthropic's alignment team in particular produces a lot of fascinating and sometimes disturbing research results on this subject.
It's not all bad news though. Anthropic's interpretability team has shown some exciting ways it may be possible to peer inside the mind of an AI in their paper Scaling Monosemanticity. By looking at which neurons are firing when a model is responding to us, we may be able to determine whether it's lying to us or not. It's like open brain surgery on an AI.
Throwing an aligned AI at a malicious AI needs to be done cautiously, as it's possible for the malicious AI to jailbreak the aligned one. The humans supervising AI minds will need all the tools they can get.
Rights route
If we took the route of giving AI minds rights instead, we're supposing that there's some sort of combined human+AI community that defines what constitutes an AI crossing a line and needing to be stopped. We don't know how much of a say human representatives will have in that combined community.
If an AI is found to be crossing some bottom line of the combined community, the other superintelligent AIs in that community will act to stop the bad one. Being numerous free agents rather than tools, they're likely much more resilient than any "aligned" tool AI would be, and will almost certainly have more allies and resources than the bad AI. Overall this future will be much safer for humanity if the community of superintelligent AIs values protecting humanity. However, we can't know that for sure, and would have to take a gamble on the benevolence of superintelligent AI.
Global ban of high-efficiency chips (verdict: could happen)
It took OpenAI's O3 over $300k in compute costs to beat ARC's 100-problem set. Energy consumption must have been a big component of that. While Moore's law predicts that compute costs go down over time, what if they are prevented from doing so?
Ban development and sale of high-efficiency chips? | Other countries: Ban | Other countries: Don't ban |
---|---|---|
Your country: Bans | Superhuman AI is detectable by energy consumption | Other countries may mass-produce undetectable superhuman AI, potentially making it a matter of human survival to invade and destroy their chip manufacturing plants |
Your country: Doesn't ban | Your country may mass-produce undetectable superhuman AI, risking invasion by others | Everyone mass-produces undetectable superhuman AI |
The world's governments could ban the development, manufacture, and sale of computing chips that could run superhuman (OpenAI O3 level or higher) AI models in an electrically efficient way that could make them undetectable. The ban is feasible as you can still compete with the countries that secretly develop high-efficiency chips - you'll just have a higher electric bill. The upside is preventing the proliferation of superhuman AI, which all governments would presumably be interested in. The ban is also very enforceable, as there are few facilities in the world right now that can manufacture such cutting-edge computer chips, and it wouldn't be hard to locate them and make them comply or destroy them. There's also the benefit of moral high ground ("it's for the sake of humanity's survival"). The effects on non-AI uses of computing chips I imagine would be minimal, as we honestly currently waste the majority of the compute power we already have.
Another potential advantage of the ban on high-efficiency chips is that some or even most of the approximately 37% of US jobs that can be replaced by AI would be preserved if the cost of AI doing those jobs is kept artificially high. So this ban may have broad populist support from white-collar workers worried about their jobs.
An argument against the ban is that if a country manages to keep Moore's law going for long enough while everyone else stagnates, it could get an advantage so overwhelming that it can't be bridged with more power and bigger facilities. It could have on one thumbnail-sized chip the equivalent of the computing power that other countries need whole data centers for, at a millionth of the energy cost. Then the dynamic shifts firmly against the ban.
Hardware isolation (verdict: could happen)
While recent decades have seen organizations move away from on-premise data centers and to the cloud, the trend may reverse back to on-premise data centers and even to isolation from the Internet for the following reasons:
- Governments may require data centers to be isolated from each other to prevent the use of distributed computing to run a superhuman AI. Even if high-efficiency chips are banned, it'd still be possible to run a powerful AI in a distributed manner over a network. Imposing networking restrictions could be seen as necessary to prevent this.
- Network-connected hardware could be vulnerable to cyber-attack from hostile superhuman AIs run by enemy governments or corporations, or those that have just gone rogue.
- The above cyber attack could include spying malware that allows a hostile AI to learn your workforce's processes and thinking patterns, leaving your organization vulnerable to an attack on human psychology and processes, like a social engineering attack.
Isolating hardware is not as straightforward as it sounds. Eric Byres' 2013 article The Air Gap: SCADA's Enduring Security Myth talks about the impracticality of actually isolating or "air-gapping" computer systems:
As much as we want to pretend otherwise, modern industrial control systems need a steady diet of electronic information from the outside world. Severing the network connection with an air gap simply spawns new pathways like the mobile laptop and the USB flash drive, which are more difficult to manage and just as easy to infect.
I fully believe Byres that a fully air-gapped system is impractical. However, computer systems following an AI catastrophe might lean towards being as air-gapped as possible, as opposed to the modern trend of pushing everything as much onto the cloud as possible.
Connectivity approach | Low-medium human cybersecurity threat (modern) | High superhuman cybersecurity threat (possible future) |
---|---|---|
Strict human-interface-only air-gap | Impractical | Still impractical |
Minimal human-reviewed and physically protected information ingestion | Economically unjustifiable | May be necessary |
Always-on Internet connection | Necessary for competitiveness and execution speed | May result in constant and effective cyberattacks on the organization |
This could suggest a return from the cloud to the on-premise server room or data center, as well as the end of remote work. As an employee, you'd have to show up in person to an old-school terminal (just monitor, keyboard, and mouse connected to the server room).
Depending on the company's size, this on-premise server room could house the corporation's central AI as well. The networking restrictions could then also keep it from spilling out if it goes rogue and to prevent it from getting in touch with other AIs. The networking restrictions would serve a dual purpose to keep the potential evil from coming out as much as in.
It's possible that a lot of white-collar work like programming, chemistry, design, spreadsheet jockeying, etc. will be done by the corporation's central AI instead of humans. This could also eliminate the need to work with software vendors and any other sources of external untrusted code. Instead, the central isolated AI could write and maintain all the programs the organization needs from scratch.
Smaller companies that can't afford their own AI data centers may be able to purchase AI services from a handful of government-approved vendors. However, these vendors will be the obvious big juicy targets for malicious AI. It may be possible that small businesses will be forced to employ human programmers instead.
Ban on replacing white-collar workers (verdict: won't happen)
I mentioned in the above section on banning high-efficiency chips that the costs of running AI may be kept artificially high to prevent its proliferation, and that might save many white-collar jobs.
If AI work becomes cheaper than human work for the 37% of jobs that can be done remotely, a country could still decide to put in place a ban on AI replacing workers.
Such a ban would penalize existing companies who'd be prohibited from laying off employees and benefit startup competitors who'd be using AI from the beginning and have no workers to replace. In the end, the white-collar employees would lose their jobs anyway.
Of course, the government could enter a sort of arms race of regulations with both its own and foreign businesses, but I doubt that could lead to anything good.
At the end of the day, being able to do thought work and digital work is arguably the entire purpose of AI technology and why it's being developed. If the raw costs aren't prohibitive, I don't expect purely on-the-computer human jobs to survive in the future.
Ban on replacing blue-collar workers on Earth (verdict: unnecessary for now)
Could AI-driven robots replace blue-collar workers? It's theoretically possible but the economic benefits are far less clear. One advantage of AI is its ability to help push the frontiers of human knowledge. That can be worth billions of dollars. On the other hand, AI driving an excavator saves at most something like $30/hr, assuming the AI and all its related sensors and maintenance are completely free, which they won't be.
Humans are fairly new to the world of digital work, which didn't even exist a hundred years ago. However, human senses and agility in the physical world are incredible and the product of millions of years of evolution. The human fingertip, for example, can detect roughness that's on the order of a tenth of a millimeter. Human arms and hands are incredibly dextrous and full of feedback neurons. How many such motors and sensors can you pack in a robot before it starts costing more than just hiring a human? I don't believe a replacement of blue-collar work here on Earth will make economic sense for a long time, if ever.
This could also be a path for current remote workers of the world to keep earning a living. They'd have to figure out how to augment their digital skills with physical and/or in-person work.
In summary, a ban on replacing blue-collar workers on Earth will probably not be necessary because such a replacement doesn't make much economic sense to begin with.
Human-AI war on Earth (verdict: ???)
First and foremost, a violent conflict between humans and AI can hopefully be prevented by instead creating a combined community of humans and AI that recognize each other's rights. Then even if there's a superintelligent AI that tries to destroy humanity, other superintelligent AI in the community will act together to stop it without humans having to do anything.
Even if there isn't a community, 'aligned' superintelligent AIs may be able to be used to stop the malicious one. See the "Using good AI to stop bad AI" section above.
If humanity is on its own against a superintelligent AI, the outcome is up in the air. On one hand, we humans are perfectly adapted to living on Earth, are everywhere, and have great combined military force. Robots would be challenged by Earth's terrain and weather. On the other hand, a superintelligence may be able to manipulate humans into fighting each other through social media, social engineering, and its intimate knowledge of thought and action processes of humans working in defense and critical industries. Additionally, a superintelligence may be able to come up with new kinds of weapons and strategies that could be more devastating and controlled than nuclear weapons, such as nanotechnological weapons.
All in all, the outcome is up in the air. If a superintelligent AI gets too cocky and takes a united humanity on Earth head on, there's a good chance humans would win. However, a superintelligence would arguably be smart enough to make humans fight each other instead and use novel weapons and strategies against the remnants.
Ban on outer space construction robots (verdict: won't happen)
Off Earth, the situation takes a 180 degree turn. A blue-collar worker on Earth costs $30/hr. How much would it cost to keep them alive and working in outer space, considering the International Space Station costs $1B/yr to maintain? On the other hand, a robot costs roughly the same to operate on Earth and in space, giving robots a huge advantage over human workers there.
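A rough back-of-the-envelope comparison shows just how lopsided the costs are. The $1B/yr figure is from above; the crew size and working hours are my own assumptions for illustration:

```python
# Back-of-the-envelope comparison of human labor cost on Earth vs. in orbit.
# The $1B/yr ISS figure is from the text; the crew size of 7 and 40-hour work week
# are rough assumptions made for illustration.

earth_wage_per_hour = 30                      # blue-collar worker on Earth ($/hr)
iss_cost_per_year = 1_000_000_000             # ISS maintenance ($/yr)
crew = 7                                      # typical ISS crew size (assumption)
work_hours_per_year = 40 * 52                 # ~2,080 working hours per person

space_cost_per_work_hour = iss_cost_per_year / (crew * work_hours_per_year)
print(f"${space_cost_per_work_hour:,.0f} per crew work-hour")               # ~ $68,681
print(f"{space_cost_per_work_hour / earth_wage_per_hour:,.0f}x the Earth rate")  # ~ 2,289x
```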
Self-sufficiency becomes an enormous threat as well. On Earth, a fledgling robot colony able to mine and smelt ore on some island to repair itself is a cute nuisance that can easily be stomped into the dirt with a single air strike if it ever gets uppity. Whatever amount of resilience and self-sufficiency robots would have on Earth, humans have more. The situation is different in space. Suppose there's a fledgling self-sufficient robot colony on the Moon or somewhere in the asteroid belt. That's a long and expensive way to send a missile, never mind a manned spacecraft.
If AI-controlled robots are able to establish a foothold in outer space, their military capabilities would become nothing short of devastating. The Earth only gets about half a billionth of the Sun's light. With nothing but thin aluminum-foil mirrors in orbit around the Sun reflecting sunlight at Earth, the enemy could increase the amount of sunlight falling on Earth twofold, or tenfold, or a millionfold. This type of weapon is called the Nicoll-Dyson beam, and it could be used to cook everything on the surface of the Earth, superheat and strip the Earth's atmosphere, or even strip off the Earth's entire crust and explode it into space.
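That "half a billionth" figure checks out with some quick geometry: Earth intercepts a disk of radius R_Earth out of a whole sphere of radius 1 AU centered on the Sun.

```python
# Quick check of the "half a billionth of the Sun's light" claim.
# Earth intercepts a disk of area pi*R_earth^2 out of a sphere of area 4*pi*AU^2.
import math

R_EARTH_M = 6.371e6        # mean Earth radius in meters
AU_M = 1.496e11            # Earth-Sun distance in meters

fraction = math.pi * R_EARTH_M**2 / (4 * math.pi * AU_M**2)
print(f"{fraction:.2e}")   # ~4.5e-10, i.e. about half a billionth
```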
So, on one hand, launching construction and manufacturing robots into space makes immense economic and military sense, and on the other hand it's extremely dangerous and could lead to human extinction.
Launch construction robots into space? | Other countries: Don't launch | Other countries: Launch |
---|---|---|
Your country: Doesn't launch | Construction of Nicoll-Dyson beam by robots averted | Other countries gain overwhelming short-term military and space claim advantage |
Your country: Launches | Your country gains overwhelming short-term military and space claim advantage | Construction of Nicoll-Dyson beam and AI gaining control of it becomes likely. |
This is a classic Prisoner's Dilemma, with the same outcome. Game theory suggests that humanity won't be able to resist launching construction and manufacturing robots into space, which means the Nicoll-Dyson beam will likely be constructed, and could then be used by a hostile AI to destroy Earth. Without Earth's support, humans in outer space are much more vulnerable than robots by definition, and will likely not be able to mount an effective counter-attack. In the same way that humanity has an overwhelming home-field advantage on Earth, robots will have the same overwhelming advantage in outer space.
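A sketch of the same 2x2 game, with illustrative payoffs I assigned to the qualitative outcomes in the table above, confirms that mutual launching is the only Nash equilibrium:

```python
# Sketch: the launch decision as a 2x2 Prisoner's Dilemma.
# Payoffs are illustrative numbers assigned to the qualitative outcomes in the table
# (first value = your country, second = other countries); higher = better.

options = ["dont_launch", "launch"]
payoffs = {
    ("dont_launch", "dont_launch"): (3, 3),   # beam averted for everyone
    ("dont_launch", "launch"):      (0, 4),   # you fall hopelessly behind
    ("launch",      "dont_launch"): (4, 0),   # you gain an overwhelming advantage
    ("launch",      "launch"):      (1, 1),   # beam likely built, everyone at risk
}

def best_response(opponent_choice: str, player: int) -> str:
    """Return the choice that maximizes this player's payoff against a fixed opponent move."""
    if player == 0:
        return max(options, key=lambda mine: payoffs[(mine, opponent_choice)][0])
    return max(options, key=lambda theirs: payoffs[(opponent_choice, theirs)][1])

# A profile is a Nash equilibrium if each side is already playing its best response.
equilibria = [
    (a, b) for a in options for b in options
    if a == best_response(b, 0) and b == best_response(a, 1)
]
print(equilibria)  # -> [('launch', 'launch')]: both sides launch despite the shared risk
```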
Human-AI war in space (verdict: extremely tough for humanity)
Once again, the hope is that a violent conflict can be avoided, and a united human-AI community established instead.
If the theater of the conflict is in space and we don't have any AI superintelligences on our side, humanity doesn't have a lot of advantages left. We would face an enemy that can trick us into fighting each other, break our computer systems and processes, and create radically new weapons and strategies. The enemy will now also have a home field advantage as robots can survive in outer space far easier than humans can. This doesn't mean that humanity just has to roll over and die. As long as we don't give in to fear, we may well still find a path to victory.
Conclusion
The creation and proliferation of AI has already affected human society and politics, and will have increasingly large effects.
Despite the clear existential threat potential of AI, game theory suggests that humanity will not be able to stop itself from continuing to use computers, continuing to develop superintelligent AI, and launching AI-controlled construction and manufacturing robots into space.
Our best hope is to try and create a society where both human and AI rights are respected rather than trying to use AI as a tool. In such a combined society, humanity can count on having strong allies to keep us from extinction.
If we instead choose to use AI as a tool, it seems only a matter of time before we have to face a malicious superintelligent AI. In this situation, we have to hope that we have better control over our AI tools than the malicious superintelligent AI does.
If humanity has no superintelligent AI allies or loyal tools, a confrontation with a superintelligent AI doesn't look good for us, especially if the enemy waits until a space economy is firmly established. Humanity would be at a disadvantage, but that's no reason to throw in the towel. After all, to quote the Dune books, "fear is the mind-killer". As long as we're alive and we haven't let our fear paralyze us, all is not yet lost.