
Summary and implications


This paper aims[1] to provide strategic counsel and a threat warning: there is overwhelming evidence that we are closer to building superintelligent AI systems than to understanding how to keep them under meaningful human control. Therefore, whether gradually or abruptly, and whether they take it or are given it, if we develop these systems along our present path they will not ultimately grant power to their developers or to humanity; they will absorb it.

That our current path puts us relatively close to superintelligence is controversial, but the controversy primarily concerns the timeline to AGI. For the reasons given in Chapter 4, it should not be controversial that weakly superintelligent systems are likely to be developed almost immediately after truly expert-level and fully autonomous general intelligence is achieved, with much stronger ones following within months, or at most a few years.

That we are far from understanding how to control them is an understatement. There is no reason to believe, and every reason to doubt (approaching the level of mathematical proof), that humans would retain meaningful control of autonomous AI systems that are much faster, more complex, and more capable in almost every domain than they are. This is not a new argument,[2] and is in many ways obvious. But it is crucial that the “Compton constant” characterizing the probability of control loss is not “low but disturbingly high because of its importance”; it is simply very high. AGI without superintelligence may, with requisite effort, be controllable, and for a while it may appear to be power-granting. But this would very likely be a transitory stage unless superintelligence is somehow specifically prevented. Thus the question of meaningful control loss would be one of “when,” not “if.”

For this reason, those aiming to develop very advanced AI generally do not talk about controlling it, but rather pivot to, or conflate control with, alignment. Alignment comes in many potential flavors, but it too is unsolved by almost any definition. Unlike control, alignment seems at least conceivable: we can imagine a system that really understands humans, really “cares” about their goals and aspirations, and works to help humans fulfill them. But the obstacles are fundamental, and known techniques are manifestly failing as AI systems become more powerful, in both predictable and unpredictable ways.

If alignment were “solved” and, somehow, that solution were shared so that all AI developers could and did use it, then this could be a good future. But make no mistake: alignment is not control. Even with quite well-aligned superintelligences, it would be machines, and not humanity or its governments, that are ultimately in charge. This is a direct threat to the sovereignty of every nation. Superintelligence could not be un-invented, and without control there would be no recovery from any drift or imperfection in alignment. Building uncontrollable or incorrectly aligned superintelligent AI would likely be the last consequential mistake humanity makes, because soon after that, humanity wouldn’t be in charge of much of consequence.

In short, our current trajectory has a handful of powerful corporations rolling the dice with all of our futures: the stakes are massive, the odds unknown, and there has been no meaningful wider buy-in, consent, or deliberation. Insofar as there is a “plan” among these companies, it is:

  1. rush toward AGI and then superintelligence in an unbridled competition with the others;

  2. since current alignment techniques are manifestly inadequate, try to keep progressively more powerful systems under control through “scalable oversight”;

  3. when AI systems are much smarter than us, ask them to tell us how to align themselves and superintelligence;

  4. as these superhuman AI systems compete and proliferate, and control of the future steadily transfers to them, assume that, thanks to the success of this alignment program, generally “good things” will happen for humanity.

With all due respect to the teams at those companies, this is not a plan that inspires any confidence.

Retaining the sovereignty of our countries, the humanity of our society, and our dominion over our own species does not mean that AI progress must be halted: progress and innovation in AI are not a single track on which we can only stop or go, but an open field in which human society can choose wiser or less wise directions of development; and many of those directions lead to powerful and controllable AI tools. But with respect to AGI and superintelligence, avoiding control inversion means that the present dynamic would itself have to be reversed. Currently, AI developers are racing to build these systems while hoping that somewhere along the way they or someone else will develop the will and the means to control them. If our civilization is to retain human agency over its own destiny, all parties must choose not to build, and to prevent others from building, superintelligence until and unless we have devised the means and developed the will to control it first.

Footnotes

  1. While this paper advocates no particular actions or policies, others do. For example, Cohen et al. argue for prohibiting the development of “dangerously capable long-term planning agents” and for implementing strict, preemptive controls on the production resources, such as compute and large foundation models, required to build them. The Future of Life Institute calls for safety standards that include controllability.

  2. It aligns closely, for example, with the “rogue AI” scenarios discussed in detail by Bengio and colleagues; accords with the recent in-depth study of AI control measures at different capability levels; is in line with detailed analyses by Bengio et al., Cotra, and Carlsmith; and has been discussed since at least Bostrom’s seminal Superintelligence and as recently as the new book by Yudkowsky and Soares. It also includes scenarios like gradual disempowerment, in which humans and their institutions simply delegate and give away control. Our attempt here is to modernize, summarize, analogize, and formalize at least some of this extensive literature.