Chapter 7

Real-world challenges


As described above, controlling a more capable adversary ranges from difficult to impossible, depending upon the gap in capability. Alignment, which would reduce adversariality, is difficult and may in some senses be insoluble. And even if alignment were achieved, the incommensurability of human overseers with a superintelligent system means that a “control” relationship does not even really make sense.

But that is not the end of the challenge. Even if the fundamental technical obstacles to controlling superintelligence could somehow be overcome, the social, economic, and political context in which AGI and superintelligence would be developed creates additional barriers that may prove equally insurmountable.

Races undermine control

Many of the same incentives that cause our fictitious corporation to transfer power away from its CEO would also apply to AGI and superintelligence with respect to humanity. AGI and superintelligence would enter a social, economic, and political context that is intensely competitive economically and geopolitically, with (at present) very little institutional infrastructure to manage powerful AI. The implications are well-understood but perhaps underestimated.

There are, of course, perfectly valid reasons why companies and countries see a need to compete. But racing each other to develop and deploy powerful AI as quickly as possible is directly antithetical to doing so safely or with well-designed control systems in place.1 This is true at the unintentional level (speed trades off with care), but also at the intentional level, where safety and control are deliberately framed as harming “innovation” or “us” in an us-vs-them race.

Competitive dynamics will also undermine any unilateral pause by an individual company in building more powerful AI systems, even if that company recognizes a major risk. Thus, for example, the drive for any single actor to push from AGI on to weak and then strong versions of superintelligence would be intense, even with full recognition of the dangers. Such a pause or stop would need to be enacted in a coordinated way with strong incentives against defection, or imposed by governments. But the required level of international coordination currently does not exist, and the incentives for defection remain overwhelming.

Competition drives disempowerment through delegation

Once an AI system is developed, pouring it immediately into a competitive marketplace and rushing to incorporate it into a wide variety of systems is a direct recipe for losing control over it. As argued in detail by Hendrycks, in a competitive environment where AI can do tasks more cheaply, it will be substituted for human labor. And if it can predict, plan, or decide more effectively, it will progressively replace human predictors, planners, and deciders, because institutions that decline to make that substitution will be at a competitive disadvantage.

In general, as AI becomes more competent, there will be intense pressure to delegate to it the tasks that determine where power lies: articulating vision, developing goals, and making decisions.2 At the same time, a powerful AI system with nearly any goal will have incentives to accrue more power. In combination, these pressures lead almost inevitably to the gradual disempowerment of humanity.3

Proliferation leads to abdication of control

While the slow-CEO analogy illuminates many control challenges, it fails to address a crucial aspect: proliferation. A different analogy is required.

Consider kudzu, a vine native to Japan and China that is attractive, fast-growing, good for erosion control, and edible. What’s not to like? Well, planted widely across the American South in the early 20th century, it became a quintessential invasive species there, smothering huge areas of forest under its choking canopy.4 Like knotweed, cane toads, zebra mussels, Asian carp, and Africanized honeybees, kudzu entered a new environment without viable natural competitors or predators, and proliferated wildly.

Now imagine a scenario in which, far more foolishly than with kudzu, we deliberately release AGI into the digital wild. This is the stated intention of, for example, Meta and DeepSeek. What would happen? Kudzu is a dumb plant that reproduces in weeks or months, and we cannot manage it. What about something smarter and faster than people that reproduces in seconds or minutes? This scenario is described in detail in Aguirre 2025, and it does not look great for humanity.

Openly released AGI would quickly remove, or have removed, any built-in safeguards5 limiting its behavior or preserving its original alignment, and many goals it could have, develop, or be given would benefit from fast reproduction, self-improvement, and resource acquisition. It would be manifestly capable of all three without human help, or even permission.6 Moreover, our digital and institutional infrastructure is incredibly vulnerable to this sort of invasion. There are cryptocurrencies that allow transactions without banks, easy rentals of reservoirs of compute,7 and no real restrictions on what AI is allowed to do,8 as long as it is lawful. Meaningful human agency over the future would die a death by a thousand cuts, irrevocably degraded by countless AI agents optimizing for diverse, often-conflicting, unintended, and inscrutable goals.

In such a scenario of widespread proliferation, the entire framework for meaningful human control becomes moot. There is no central agent on which to perform an Emergency Shutdown or enact a Goal Modification; there is only a sprawling, evolving ecosystem beyond anyone’s authority. It would be extremely difficult to exert much human control over the evolution of this new intelligent species even if it were recognized as a threat. And it is not clear that it would be: these AI systems would be wealthy, powerful, eloquent, and helpful when it suits them.9 And just like the most powerful, wealthy, and eloquent throughout history, they would likely end up in charge.

Footnotes

  1. See Armstrong et al.

  2. As a harbinger of things to come, see this story of a Prime Minister coming under fire for heavy reliance on ChatGPT for a “second opinion.”

  3. For a deep dive into these dynamics see this essay on gradual disempowerment.

  4. See e.g., this description from the Nature Conservancy.

  5. Unlike with conventional software, open release of neural network weights unfortunately gives researchers only limited ability to understand, debug, and red-team the system, due to the networks’ opaque nature. The weights can nonetheless be altered via additional training, including to remove safeguards.

  6. We’re so used to thinking about passive software that it is hard to make this mental shift. Keep in mind, then, that there are all sorts of autonomous, self-propagating worms and viruses running around our digital infrastructure. Rather than imagining a language model endowed with the “desire” to self-propagate, imagine instead a software worm that is uplifted by the addition of powerful AI.

  7. Strong compute governance could do a lot to mitigate this risk, but we are currently not on track for that; there are currently enormous reservoirs of powerful chips with no tracking, oversight, or built-in mechanisms for preventing use by proliferating AGI or superintelligence.

  8. There are many things that legally require a human identity. But AGIs would easily be able to find and pay human “patsies” to do such things and take legal responsibility. And if the AI does something illegal, who is going to be punished anyway?

  9. We have already seen an instance of humans successfully protecting an AI system that has charmed them.