Introduction
There is a simple truth - humanity’s extinction is possible. Recent history has also shown us another truth - we can create artificial intelligence (AI) that rivals humanity.1 There is no reason to believe that creating an AI vastly beyond the most intelligent humans today is impossible. Should such AI research go wrong, it would risk our extinction as a species; should it go right, it would still seismically transform our world at a scale greater than the Industrial Revolution.
We now stand at a time of peril. Companies across the globe are investing to create artificial superintelligence – AI that they believe will surpass the collective capabilities of all humans. They publicly state that it is not a matter of “if” such artificial superintelligence will exist, but “when”.2 Their investments mean that we must ask: If we build machines smarter than any human, that are better at business, science, politics, and everything else, and can further improve themselves, do we know how to control them? This is a critical question for the future of every person alive today, and every one of our descendants.
Reasonable estimates by both private AI companies and independent third parties suggest that creating artificial superintelligence could cost only tens to hundreds of billions of dollars. It would be an accomplishment comparable to building a small fleet of aircraft carriers, or founding a new city of a million people from scratch: something that major countries such as the United Kingdom or France could achieve if sufficiently determined, and that the largest economies (such as the United States or China) could do without a significant impact on their other priorities.
We believe that no one company or government, no matter how well-intentioned its people and its work may be, should make such consequential decisions for the entirety of the human species. We need to chart a path for humanity as a whole to stay in control.
A new and ambitious future lies beyond a narrow path. A future driven by human advancement and technological progress. One where humanity fulfills the dreams and aspirations of our ancestors to end disease and extreme poverty, achieves virtually limitless energy, lives longer and healthier lives, and travels the cosmos. That future requires us to be in control of that which we create, including AI.
This document outlines our plan to achieve this: to traverse this path. It assumes the reader already has some familiarity with the ways in which AI poses catastrophic and extinction risks to human existence. These risks have been acknowledged by world leaders,3 4 leading scientists and AI industry leaders,5 6 7 and analyzed by other researchers, including the recent Gladstone Report commissioned by the US Department of State8 and various reports by the Center for AI Safety and the Future of Life Institute.9 10
Our plan consists of three phases:
Phase 0: Safety
New institutions, legislation, and policies that countries should implement immediately to prevent the development of AI that we cannot control. With correct execution, the strength of these measures should prevent anyone from developing artificial superintelligence for the next 20 years.
Phase 1: Stability
International measures and institutions that ensure controls on AI development do not collapse under geopolitical rivalries or rogue development by state and non-state actors. With correct execution, these measures should ensure stability and lead to an international AI oversight system that does not collapse over time.
Phase 2: Flourishing
With the development of rogue superintelligence prevented and a stable international system in place, humanity can focus on the scientific foundations for transformative AI under human control: building a robust science and metrology of intelligence, safe-by-design AI engineering, and the other prerequisites for transformative AI that remains under human control.
The Problem
The greatest threat facing humanity is the concentrated effort to create artificial superintelligence. Our current national and international systems are wholly inadequate to react to such a threat. Behind closed doors, development continues with an ideological desire to build an entity that is more capable than the best humans in practically every field. While most AI development is beneficial, the risks of superintelligence are catastrophic. We currently have no method to control an entity with greater intelligence than us. We have no ability to predict the intelligence of frontier AI systems before they are developed, and we have incredibly limited methods to accurately measure their competence after development.
Importantly, there are catastrophic and extinction-level risks regardless of the technical design, business models, or nationalities of those developing artificial superintelligence. It is purely a question of whether such an intelligence exists, either as a single monolithic AI model or as a collection of AI systems working together to achieve an intellect that is more capable than humans in practically every field.
Below we outline four key arguments that underpin our reasoning about this problem and its natural implications for the future of AI development.
1: We believe that the creation of artificial superintelligence is possible in our physical universe, that it is a development objective of several AI companies across the world, and that its arrival is likely within the next 3-15 years.
2: Those seeking to develop artificial superintelligence do not have sufficient methods to reliably predict the capabilities of their models, interpret why their models behave the way they do, evaluate the full extent of the models’ abilities, or shut down such AIs if needed with no risk of proliferation. Therefore, we believe that if developed under current conditions, artificial superintelligence would pose an unacceptable risk of extinction for humanity.
3: We believe that the potential catastrophic and extinction risk from artificial superintelligence fundamentally originates from its intelligence. Sufficiently high intelligence enables an entity to have greater power over other actors. In the absence of strong and proven control over such an entity, the default outcome of the emergence of an entity vastly more powerful than humanity is the disempowerment of humanity. Ultimately, until we have the technical solutions, legal systems and processes, and the understanding required to control an entity of such power, we should not create entities that could overpower us.
4: Humanity does not have a sufficient general theory or science of measurement for intelligence. Developing these theories would allow us to better predict and evaluate the capabilities of an AI system given certain inputs and characteristics, so that we could restrict and control them. Developing this will require significant effort, and therefore humanity should start immediately. Until that is achieved, countries must take significant precautions with AI development or risk AI development remaining continuously out of their control.
The state of the art of intelligence theory and measurement is primitive; we are like physicists who lack the tools necessary to estimate what quantity of radioactive material could go supercritical. Until we can describe potential risky states of AI development and AI models directly, countries should implement regulatory guardrails based on proxies of intelligence.
If countries solely focus on a single proxy - such as compute - to constrain artificial intelligence, then they would need to impose extremely restrictive limits on that proxy for future development. This would be necessary to ensure sufficient safety margins against the risks of improvements in other dimensions, such as algorithms. Such a restrictive approach would stifle low-risk innovation.
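To make this concrete, the minimal sketch below shows why a cap on a single proxy weakens over time: it treats "effective compute" as raw training compute multiplied by an assumed algorithmic efficiency gain. The cap value and the 2.5x annual gain are illustrative assumptions chosen for exposition, not figures proposed by this document.

```python
# Illustrative sketch only: all figures below are assumptions for exposition.

def effective_compute(raw_flop: float, years_elapsed: float,
                      annual_algorithmic_gain: float = 2.5) -> float:
    """Raw training compute adjusted for assumed algorithmic efficiency gains.

    A fixed cap on raw FLOP buys progressively more capability as algorithms
    improve, so the cap alone is a weakening constraint over time.
    """
    return raw_flop * (annual_algorithmic_gain ** years_elapsed)


# A hypothetical cap on raw training compute.
CAP_RAW_FLOP = 1e25

for year in range(6):
    print(f"year {year}: the same cap corresponds to "
          f"~{effective_compute(CAP_RAW_FLOP, year):.1e} FLOP "
          "of today's effective compute")

# Under the assumed 2.5x yearly gain, the unchanged raw-compute cap permits
# roughly 100x more effective capability after five years. A compute-only
# regime must therefore either set the cap far below today's frontier and
# keep lowering it, or accept an eroding safety margin.
```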
Therefore, to preserve flexibility and minimize risk across the range of uncertain futures we face, countries should instead monitor and regulate multiple components of AI development with a defense-in-depth approach. These include:
Computing power used to develop and power AIs;
General intelligence of AI systems measured via proxies other than compute;
Behavioral capabilities, including the development and use of AIs improving AIs, and AIs capable of breaking out of their own environment;
The deployment of AIs without a safety case;
The development and deployment of AIs for use in unsafe applications.
This is a non-exhaustive list that should be expanded. These components have been chosen to constitute a defense-in-depth approach that covers different vectors of risk from AI development; the sketch below illustrates how such layered checks might combine.
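As a purely illustrative sketch, the following combines these components into a single licensing check in which any one tripped guardrail blocks approval. Every field name and threshold is a hypothetical placeholder, not a value proposed elsewhere in this document.

```python
# Hypothetical defense-in-depth check: thresholds and fields are placeholders.
from dataclasses import dataclass


@dataclass
class TrainingRunReport:
    training_flop: float        # computing power used to develop the AI
    generality_score: float     # a non-compute proxy for general intelligence
    improves_other_ais: bool    # behavioral capability: AIs improving AIs
    can_escape_environment: bool  # behavioral capability: breaking out of its environment
    has_safety_case: bool       # whether deployment is covered by a safety case
    unsafe_application: bool    # intended for a prohibited, unsafe application


def violations(report: TrainingRunReport) -> list[str]:
    """Return every tripped guardrail; any single one blocks approval."""
    issues = []
    if report.training_flop > 1e25:          # hypothetical compute cap
        issues.append("compute threshold exceeded")
    if report.generality_score > 0.9:        # hypothetical non-compute proxy limit
        issues.append("general-intelligence proxy exceeded")
    if report.improves_other_ais or report.can_escape_environment:
        issues.append("prohibited behavioral capability")
    if not report.has_safety_case:
        issues.append("no safety case for deployment")
    if report.unsafe_application:
        issues.append("unsafe application")
    return issues
```

The point of the layering is that a developer who stays under the compute cap can still be stopped by the behavioral, safety-case, or application checks, so no single weakened proxy undermines the whole regime.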
The science of intelligence is underdeveloped. Humanity must invest in significantly improving it if we ever hope to have control of superintelligent AI development. We must first understand what we are developing before creating an entity which is more intelligent than ourselves.
The Solution
AI development is accelerating at a considerable rate, yet developers cannot reliably predict what capabilities their models will have before they are trained, nor do they understand their models’ full capabilities even after deploying them. At the same time, current national and international institutions are failing to keep up with rapid technological change, and are woefully inadequate to face a threat of this magnitude. This trend is only expected to continue, with frontier AI developers actively seeking to build artificial superintelligence.11 12
The risks from AI development cannot be eliminated without also affecting innovation and technological advancement to some degree. However, how much risk humanity accepts as part of this trade-off should be a conscious decision, not one taken without oversight or consideration. We are developing a new form of intelligence - one that will surpass our own - and we must not cede our future to it.
To achieve this, governments across the world will need to urgently implement measures at a national level while negotiations on a treaty start at an international level, especially between the USA and China.
To effectively confront the challenges posed by artificial intelligence, three sequential steps are necessary:
Build up our defenses to restrict the development of artificial superintelligence. Safety.
Once we have halted the immediate danger, build a stable international system. Stability.
With a stable system and humanity secure, build transformative AI technology under human control. Flourishing.
At present, we are not succeeding. More critically, humanity is not actively working to face this threat. Efforts remain uncoordinated, and current trends suggest an inexorable convergence towards the development of artificial superintelligence. Should this occur, humanity's role as the driving factor of events in the visible universe will conclude, marking the end of the Anthropocene era.
The most urgent priority is to prevent the development of artificial superintelligence for the next 20 years. Any confrontation between humanity and a superintelligence within the next two decades would likely result in the extinction of our species, with no possibility of recovery. While we may require more than 20 years, two decades provide the minimum time frame to construct our defenses, formulate our response, and navigate the uncertainties to gain a clearer understanding of the threat and how to manage it.
Any strategy that does not secure this two-decade period is likely to fail due to the inherent limitations of current human institutions, governmental processes, scientific methodologies, and planning constraints. These two decades would also grant us more time to develop sufficient methodologies to shape, predict, evaluate and control AI behavior. Additional time beyond two decades would be advantageous but should not be relied upon.
Thus, the goal of Phase 0 is to Ensure Safety: Prevent the Development of Artificial Superintelligence for 20 Years.
With safety measures in place and two decades to mount our response, the next challenge arises from the potential instability of such a system. While universal compliance with Phase 0 measures would be ideal, it is unrealistic to expect perfect adherence. Systems naturally decay without active maintenance. Moreover, individually minor attempts to circumvent the system can compound over time, potentially undermining the entire framework.
We should anticipate various actors, including individuals, corporations, and governments, to exert pressure on the system, testing its resilience. To maintain safety measures for the required two decades and beyond, it is necessary to establish institutions and incentives that ensure system stability.
Therefore, the goal of Phase 1 is to Ensure Stability: Build an International AI Oversight System that Does Not Collapse Over Time.
With the threat of extinction contained for at least two decades, and institutions in place that ensure the security system remains stable, humanity can build towards a future where transformative AI is harnessed to advance human flourishing.
While our science, collective epistemology, and institutions are currently too weak and unprepared to face the challenge, we can improve ourselves and improve them to succeed.
Thus, the goal of Phase 2 is to Ensure Flourishing: Build Controllable, Transformative AI.
1 While there are many such metrics, one useful introductory roundup for those less familiar is “I Gave ChatGPT an IQ Test. Here's What I Discovered”, Scientific American.
2 https://www.palladiummag.com/2024/05/17/my-last-five-years-of-work/; https://openai.com/index/superalignment-fast-grants/
3 https://www.gov.uk/government/speeches/prime-ministers-speech-on-ai-26-october-2023;
https://www.independent.co.uk/news/uk/politics/ai-sunak-weapon-war-uk-b2436000.html
4 https://ec.europa.eu/commission/presscorner/detail/en/speech_23_4426; https://twitter.com/EU_Commission/status/1702295053668946148