Ch. 8

What would control look like?


We have argued that progressively more powerful autonomous AI systems will be increasingly difficult to control, both intrinsically and in the current context in which AI development is occurring, and that superintelligence is almost certainly uncontrollable on our current trajectory.

Does this mean that superintelligence could never be developed with meaningful human control? If our society decided it simply must have full superintelligence, but still wanted people to stay in civilization’s driver’s seat, what would that look like?

Such an approach would require a nearly complete reversal of current priorities and practices, which are unable even to prevent AI systems from declaring themselves “MechaHitler” or encouraging teens to commit suicide. The role of this paper is descriptive, not prescriptive: it does not advocate for a particular set of actions or policies. But we can outline what it would take to develop superintelligence without losing control of it. It would look something like this:

  1. Halt the competitive race to AGI and superintelligence through national policies and coordinated international agreements that require AI systems to be strictly controllable, with strong verification and enforcement mechanisms.

  2. Implement stringent control measures for any general and highly capable AI systems that are developed: prevent proliferation through strict access controls, maintain intensive human oversight to detect misalignment, and deliberately constrain autonomy to preserve meaningful human control.1 Redirect research away from the advancement of highly capable autonomous systems and toward (powerful but) narrow tools and systems with low autonomy.2

  3. Next, create a new path in AI development in which AI is engineered rather than “grown.” Leverage very powerful (but special-purpose, non-autonomous) AI tools to painstakingly construct weak but formally verified, generally intelligent systems with mathematically proven safety and control properties (a toy illustration of what a machine-checked property looks like appears after this list). The goal would be to build a general AI system, but with the kind of understanding we have when we build something like a chip or an operating system. This route is currently out of reach, and we don’t really have any idea how to do it. But given enough time, AI help, and sufficient computational resources, it might3 be viable.

  4. If this succeeds, very gradually expand the capability, generality, and autonomy of these verified systems, with each step requiring renewed formal verification and extensive testing. Build our way back up the ladder from weak-but-general AI to strong AI and eventually superintelligence.
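
To make “mathematically proven safety and control properties” concrete at a toy scale, here is a minimal, purely illustrative sketch in Lean 4. Nothing in it comes from the original text: the `clamp` function and its bound are hypothetical stand-ins, vastly simpler than anything a real AI system would require. The point is only to show what a machine-checked guarantee looks like, namely a property proven for every possible input rather than checked on test cases.

```lean
-- Toy illustration only: a machine-checked safety property in Lean 4.
-- `clamp` limits a command to the range [-limit, limit]; the theorem
-- proves that the upper bound holds for *every* integer input.

def clamp (limit x : Int) : Int :=
  if x > limit then limit
  else if x < -limit then -limit
  else x

-- Proof that the clamped command never exceeds the limit.
theorem clamp_le (limit x : Int) (h : 0 ≤ limit) :
    clamp limit x ≤ limit := by
  unfold clamp
  split
  · omega   -- branch x > limit: output is `limit`, and limit ≤ limit
  · split
    · omega -- branch x < -limit: output is `-limit`, and -limit ≤ limit since 0 ≤ limit
    · omega -- otherwise: output is x, and ¬(x > limit) gives x ≤ limit
```

Even this trivial property requires an explicit formal specification and a proof; scaling the same discipline to a general AI system runs into the obstacles described in footnote 3.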

This approach is essentially the opposite of current practice, which prioritizes speed and capability over safety and control. While some AI companies are pursuing oversight methods, none are willing even to call off the race, let alone commit to the much longer timelines and greater investment of effort that truly controllable AGI would require. But that does not mean it is impossible.

Footnotes

  1. This paper lays out an excellent roadmap for the challenge at different levels of AI capability. It also notably concludes that “research breakthroughs” would be required at the level of superintelligence.

  2. A compelling example here is the “AI scientist” approach of Bengio et al., which aims for systems that build and maintain correct world models but generally lack goals or agency.

  3. Even here, the obstacles are daunting. There are many aspects of “safety” that may simply not be translatable into formal terms at all, and there are mathematical obstacles to deciding whether such a translation is “correct.” Formal verification of such complex systems may also simply be computationally intractable even with huge resources. And even with a carefully constructed hierarchy of control levels, the incommensurability problem will not go away.