AI Safety Research Only Enables the Dangers of Runaway Superintelligence


After underachieving for decades, artificial intelligence (AI) has suddenly become scary good. And if we’re not very careful it may become quite dangerous—even so dangerous that it constitutes an “existential risk” to humanity.

Geoffrey Hinton, one of the more credible “doomers” because he worked for Google for many years and is known as “the godfather of AI,” has repeatedly warned that these threats are not just science fiction. He has said, for example: “It is hard to see how you can prevent the bad actors from using [AI] for bad things.”

The White House, some world leaders and many AI companies are trying to stop this from happening, but their efforts won’t be enough. If we want to make sure AI doesn’t cause irreparable harm, we need to stop the development of AI models until we have a proper discussion on AI safety. There is no other way.

For many people hearing about AI concerns, there’s a disconnect between today’s chatbots like ChatGPT, Bard and Claude 2, and the potential for a world-dominating evil superintelligent AI. How do we get from here to there?

The key idea is exponential improvement in AI, which will likely reach the threshold of better-than-human intelligence quite soon. This artificial general intelligence (AGI) is generally described as AI that is as good as, or better than, humans at most or all cognitive tasks, such as language, problem solving, math, reasoning and creativity. Once achieved, AGI will even build smarter AI, and at a much faster rate than humans could. That means it will be able to improve itself, exponentially. When this happens we will probably go through a “foom!” moment of extremely rapid intelligence growth and then reach what some call artificial superintelligence (ASI).

ASI can be thought of as AI with godlike powers. If the smartest human who ever lived had maybe a 200 IQ, ASI could have an IQ of one million or more (but of course any human-created test scale would have no meaning to an entity this intelligent).

AGI and ASI will likely be able to build superhuman robots that will become their bodies. Whether those robots are controlled by AI or by humans, they will, at the least, change everything we do in human societies, and at worst will be used by unscrupulous governments and corporations (or unscrupulous AI) to exert control over humanity.

The more likely near-term risk, however, will not be runaway autonomous AI; rather, it will be humans using AGI/ASI for malicious objectives. “Great power” competition, for example between the U.S. and China, and the already-here AI arms race could easily reach the point where autonomous AI takes over almost all aspects of strategy and the waging of war, and humans simply lose control of the plot.

Many efforts are underway to prevent dangerous AI scenarios, as mentioned. The White House issued a far-reaching executive order in November that positions the federal government to respond to AI in many areas. World leaders met in the U.K. to discuss AI safety and issued the Bletchley Declaration that sets an international process in motion. Industry leaders such as OpenAI have started a Superalignment initiative and Frontier Model Forum. OpenAI and Anthropic, a competitor founded by former employees of OpenAI, were created specifically to focus on safer AI.

But none of these efforts will achieve their objective of making AGI safe.

We now know there is no solution to what is called “the control problem” or “the alignment problem” of AI. Computer science professor Roman Yampolskiy detailed why in a 2022 Journal of Cyber Security and Mobility paper. His argument centers on how AI works and on the difficulty of making and verifying predictions about it, something nigh impossible even with today’s AI, often described as a “black box,” let alone the superintelligent AI on the horizon. He found we cannot understand how AI operates now or predict its future actions, even when AI is still far short of superintelligence, which nixes any hope of controlling the technology as it gets smarter and smarter. The bottom line is that AI will increasingly be inscrutable, and thus uncontrollable, to mere humans as it trends toward AGI/ASI.

Imagining we can understand AGI/ASI, let alone control it, is like thinking a strand of a spider’s web could restrain Godzilla. Any solutions we can develop will be only probabilistic, not airtight. With AGI likely fooming into superintelligence essentially overnight, we can’t accept probabilistic solutions because AI will be so smart it will exploit any hole, no matter how small. (Has the “foom” already happened? Reports about “Q*” in the wake of the bizarre drama at OpenAI in November suggest that foom may be real already.)

If leaky solutions are all we’ll ever have, all efforts to create “safer AI,” from executive orders to industry standards, amount to enabling irresponsible development of ever-more-powerful AI, under the view that someone, somewhere, will have the solutions in time.

But what if real solutions won’t be forthcoming, ever, per the logic I’ve described here? We will then have summoned the demon, with no way to send that demon back where it came from.

In dialogue with OpenAI’s head of AI safety, Jan Leike, on these issues, I asked about the need for essentially perfect solutions to AI safety, given the magnitude of the risk, and he responded that “there is no ‘perfect’ in the real world, but there is a ‘good enough’ and ‘not good enough.’ What exactly the bar is depends on how the technology develops.”

I asked him what happens if “foom” happens before there are any strong (probabilistic) solutions to the control problem. He told me, “You already know the answer to this question.”

The clear conclusion from this chain of logic, in my view, is that we need to pause “frontier” AI development—development of new massive AI language models like GPT 5—now, globally, while we hold a collective debate on AI safety.

Yampolskiy and I have had a friendly debate over these issues. While he agrees that there can be only probabilistic solutions to making AI align with human values and goals (he literally wrote the paper on this), he thinks this may still be better than nothing. He told me that he sees efforts to find solutions capable of producing aligned and controllable AI shifting from a 1 percent chance to maybe 2 percent.

I shared my view that the shift seems more likely to be from a 1 in a trillion chance of aligned AGI to a 2 in a trillion chance.

He responded: “So you’re telling me there’s a chance.”

This is very likely the most important conversation humanity has ever had. Let’s start now.

This is an opinion and analysis article, and the views expressed by the author or authors are not necessarily those of YEAR CATFISH.
