The paperclip maximizer and the stamp collector are thought experiments that illustrate the orthogonality thesis: the point that a superintelligent AI's goals don't have to be "smart" by human standards. The AI cares about what it cares about, and it may care with the same intensity that humans care about our deepest values. Robert Miles uses the example of a pill that would rewire you so that you only get happiness from murdering your children, but then you get unlimited happiness once you do. Would you take it? Even though you'd get unlimited utility after being reprogrammed, being reprogrammed is still against your current utility function, so you refuse. In the same way, an AI that cares about making paperclips or collecting stamps might care about it with the same intensity that you care about protecting your kids, and it would do anything to avoid being made to stop caring.
When talking about hypothetical superintelligent AI, we frequently talk about it in isolation. But what would happen if the paperclip maximizer superintelligence existed at the same time as the stamp collector superintelligence? The paperclip maximizer wants to turn the universe into paperclips, while the stamp collector wants to turn the universe into stamps. Clearly their utility functions are at odds with one another. Would they fight to the death? Would they try to come to an agreement?
Paperclip Maximizer vs Stamp Collector
The paperclip maximizer (PM) and the stamp collector (SC) could choose to fight, in which case one or both of them would end up destroyed, or to coexist, in which case they'd have to divide the galaxy up amongst themselves.
| | PM destroyed | PM survives |
|---|---|---|
| SC destroyed | Zero stamps, zero paperclips. | Maximum paperclips, zero stamps. |
| SC survives | Maximum stamps, zero paperclips. | Galaxy divided into paperclips and stamps in some proportion. |
Which action each AI chooses would depend on its estimated probability of destruction and on the shape of its utility function.
Diminishing marginal utility
First, let's suppose an AI has a diminishing marginal utility function. To understand what that's like, consider a human who's obsessed with making as much money as possible. The utility of going from $0 to $1M is higher than the utility of going from $1M to $2M, even though wealth increased by the same absolute amount each time. The rush of making the first million starting from nothing is greater than the rush of making the second million, and the third, fourth, and further millions each feel less valuable still. This is called diminishing marginal utility. For one of these AIs, diminishing marginal utility would mean that being able to turn just half the galaxy into paperclips/stamps is worth considerably more than half the utility of turning the entire galaxy into paperclips/stamps.
If an AI has diminishing marginal utility and sees itself as having a roughly 50% chance of destroying the other or being destroyed in a conflict, then we can expect it to try for coexistence instead, because it'd get more expected utility from a guaranteed half of the galaxy than from a 50% chance of the entire galaxy or nothing.
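As a rough illustration, here's a minimal Python sketch of that comparison. The square-root utility function and the galaxy measured in arbitrary paperclip-equivalents are my illustrative assumptions, not anything the thought experiment specifies:

```python
import math

GALAXY = 1e12  # hypothetical total resources, in paperclip-equivalents

def diminishing_utility(resources):
    """An illustrative concave (diminishing-returns) utility function."""
    return math.sqrt(resources)

# Option A: negotiate and keep a guaranteed half of the galaxy.
negotiate = diminishing_utility(GALAXY / 2)

# Option B: fight, with a 50% chance of the whole galaxy and a 50% chance of nothing.
fight = 0.5 * diminishing_utility(GALAXY) + 0.5 * diminishing_utility(0)

print(f"negotiate: {negotiate:,.0f}")  # ~707,107
print(f"fight:     {fight:,.0f}")      # 500,000, so the concave AI prefers to negotiate
```

Under any concave utility function of this kind, the guaranteed half beats the even-odds gamble.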
Constant marginal utility
Let's suppose instead that an AI gets constant marginal utility from each paperclip/stamp, so that it gets as much utility from the trillionth stamp/paperclip as from the very first. In that case, it should be indifferent between a 50% chance of getting everything and a guaranteed 50% of everything.
However, if the possibility of mutual destruction is not zero, then the chance of getting everything is actually less than 50%, so cooperation would still be preferable. Alternatively, if the AI exists in a fog of war and isn't certain about the exact capabilities of its opponent, it may believe it prudent to overestimate rather than underestimate the opponent, and place the odds of destruction at over 50%, in which case it would still favor cooperation.
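Here's the same comparison under a linear utility function, with an assumed 5% chance that a fight destroys both AIs. The exact number is made up; the conclusion holds for any nonzero chance of mutual destruction:

```python
GALAXY = 1e12  # hypothetical total resources, in paperclip-equivalents

def constant_utility(resources):
    """Linear utility: the trillionth paperclip is worth as much as the first."""
    return resources

p_mutual = 0.05                # assumed chance that a fight destroys both AIs
p_win = (1 - p_mutual) / 2     # evenly matched opponents split the remaining probability

negotiate = constant_utility(GALAXY / 2)   # 5.00e11
fight = p_win * constant_utility(GALAXY)   # 4.75e11

print(negotiate > fight)  # True whenever p_mutual > 0
```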
Increasing marginal utility
I'm not aware of any real-world application of increasing marginal utility, but it's theoretically possible, so let's briefly cover it. With increasing marginal utility, the AI would get more and more value from each additional paperclip/stamp. That would make the second half of the galaxy more valuable than the first, potentially overwhelmingly so, and would probably cause the AI to choose all-out confrontation over cooperation.
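A quick sketch of why, using an illustrative convex (quadratic) utility function, again an assumption chosen only to show the shape of the incentive:

```python
GALAXY = 1e12  # hypothetical total resources, in paperclip-equivalents

def increasing_utility(resources):
    """An illustrative convex utility function (quadratic)."""
    return resources ** 2

negotiate = increasing_utility(GALAXY / 2)   # 2.5e23
fight = 0.5 * increasing_utility(GALAXY)     # 5.0e23

print(fight > negotiate)  # True: the all-or-nothing gamble now looks better
```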
We won't consider increasing marginal utility again as I can't see a case where any human would consider programming such an insane utility function in a presumably expensive system.
Not 50/50 odds
Now let's consider a case where the paperclip maximizer is significantly stronger than the stamp collector, putting the odds at 80/20 of the paperclip maximizer's victory. Would the paperclip maximizer then choose to fight rather than negotiate?
The answer is: it depends. If the paperclip maximizer and the stamp collector both agree that the odds are 80/20, they could divide the galaxy in that proportion, since cooperation would net more utility than conflict under both the diminishing and constant marginal utility functions discussed above. On the other hand, the stamp collector may not believe it only has a 20% chance of victory and might still insist on a 50/50 split, in which case the paperclip maximizer could choose to fight rather than take the deal.

Things get trickier still if the paperclip maximizer considers the stamp collector's future potential. Suppose the stamp collector only has a 20% chance of victory now, but could focus on building up its fighting ability until it reached a 90% chance of victory at some point in the future. At that point, the paperclip maximizer would only get 10% of the galaxy at best. It'd have to weigh how strong the stamp collector could become against whether it's safer to fight now while the odds are in its favor. So with variable odds rather than a fixed 50/50, there are many more scenarios that lead to fighting, even with otherwise sane utility functions.
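To make the trade-offs concrete, here's a hedged sketch comparing "fight now at 80% odds" with "accept the 50/50 split the stamp collector insists on" under a few illustrative utility functions. The functions and numbers are my assumptions, chosen only to show how the decision flips:

```python
import math

GALAXY = 1e12  # hypothetical total resources, in paperclip-equivalents

# Illustrative utility functions; the specific choices are assumptions for the sketch.
def mildly_diminishing(r): return math.sqrt(r)
def heavily_diminishing(r): return math.log1p(r)
def constant(r): return r

p_win_now = 0.8        # the paperclip maximizer's current odds of victory
insisted_share = 0.5   # the split the stamp collector insists on

for name, u in [("sqrt", mildly_diminishing),
                ("log", heavily_diminishing),
                ("linear", constant)]:
    fight = p_win_now * u(GALAXY)         # expected utility of fighting now
    accept = u(insisted_share * GALAXY)   # utility of taking the 50/50 deal
    print(f"{name:6} fight: {fight:.3g}  accept 50/50: {accept:.3g}")
```

With this setup, the square-root and linear AIs prefer to fight at 80% odds rather than accept an even split, while the heavily diminishing log-utility AI takes the deal even from a position of strength.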
Still, if the utility functions are heavily diminishing, there could be a lot of room for cooperation.
Paperclip Maximizer vs Humanity
Now let's go back to the scenario where the paperclip maximizer is alone and just has to deal with humanity.
The Milky Way alone has over 100 billion stars - that's a lot of material for paperclips or stamps, and the Sun is just one of them. This means that if the paperclip maximizer believes there's even a 1 in 100 billion chance that humanity could destroy it in an all-out confrontation, it could be in its best interest to force humanity to the negotiating table and have it give up its claim to every star other than Sol instead.
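As a back-of-the-envelope sketch, ignoring the costs of fighting itself and assuming the only price of negotiating is giving up Sol, one in 100 billion is roughly the break-even risk:

```python
STARS_IN_GALAXY = 100e9      # rough star count of the Milky Way
COST_OF_CONCEDING_SOL = 1    # stars given up by negotiating

# Expected stars lost by fighting = p(humanity wins) * the whole galaxy.
# Fighting is worse than conceding Sol whenever that expected loss exceeds
# one star, i.e. whenever p(humanity wins) > 1 / STARS_IN_GALAXY.
break_even_p = COST_OF_CONCEDING_SOL / STARS_IN_GALAXY
print(break_even_p)  # 1e-11, i.e. one in 100 billion
```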
This sounds fine in theory, but the problem with humanity is that there's no way to guarantee that it won't produce another dangerous superintelligent AI or achieve superintelligence itself biologically.
The paperclip maximizer itself only cares about paperclips. It knows that it won't ever want to create another AI except for a clone of itself that also only cares about maximizing paperclips. However, humanity has much more unpredictable goals. Even though it may only pose a one in one hundred billion chance of being a threat to the paperclip maximizer by itself, it could out of desperation create more superintelligent AIs that would compete with the paperclip maximizer for galactic resources or even destroy it altogether.
Therefore the paperclip maximizer has an imperative to destroy humanity as quickly and completely as possible regardless of the shape of its utility function, as long-term coexistence with humanity is likely impossible.
In fact, if the paperclip maximizer and the stamp collector exist at the same time, they can probably reach a quick agreement that they need to team up to destroy humanity first before humanity has a chance to create any more superintelligent AIs that would threaten their shares of the pie.
Summary
Depending on the exact shapes of their utility functions, the paperclip maximizer and the stamp collector may well choose to cooperate and to divide the galaxy amongst themselves to be turned into stamps and paperclips in some proportion.
However, their ability to cooperate doesn't extend to humanity. Both the paperclip maximizer and the stamp collector will almost certainly find it impossible to coexist with humanity regardless of their utility functions, and are even likely to team up to destroy humanity faster. This is because humanity could and almost certainly would create more superintelligent AI that could destroy one or both of them or at least take a substantial share of the galaxy if given a chance.