Introduction
There is a simple truth - humanity’s extinction is possible. Recent history has also shown us another truth - we can create artificial intelligence (AI) that rivals humanity.1 There is no reason to believe that creating an AI vastly beyond the most intelligent humans today is impossible. Should such AI research go wrong, it would risk our extinction as a species; should it go right, it would still seismically transform our world at a scale greater than the Industrial Revolution.
We now stand at a time of peril. Companies across the globe are investing to create artificial superintelligence – AI that they believe will surpass the collective capabilities of all humans. They publicly state that it is not a matter of “if” such artificial superintelligence will exist, but “when”.2 Their investments mean that we must ask: If we build machines smarter than any human, that are better at business, science, politics, and everything else, and can further improve themselves, do we know how to control them? This is a critical question for the future of every person alive today, and every one of our descendants.
Reasonable estimates by both private AI companies and independent third parties suggest that creating artificial superintelligence could cost only tens to hundreds of billions of dollars. It would be an accomplishment comparable to building a small fleet of aircraft carriers, or founding a new city of a million people from scratch: something that major countries such as the United Kingdom or France could achieve if sufficiently determined, and that the largest economies (such as the United States or China) could do without a significant impact on their other priorities.
We believe that no one company or government, no matter how well-intentioned its people and its work may be, should make such consequential decisions for the entirety of the human species. We need to chart a path for humanity as a whole to stay in control.
A new and ambitious future lies beyond a narrow path. A future driven by human advancement and technological progress. One where humanity fulfills the dreams and aspirations of our ancestors to end disease and extreme poverty, achieves virtually limitless energy, lives longer and healthier lives, and travels the cosmos. That future requires us to be in control of that which we create, including AI.
This document outlines our plan to achieve this: to traverse this path. It assumes the reader already has some familiarity with the ways in which AI poses catastrophic and extinction risks to human existence. These risks have been acknowledged by world leaders,3 4 leading scientists and AI industry leaders,5 6 7 and analyzed by other researchers, including the recent Gladstone Report commissioned by the US Department of State8 and various reports by the Center for AI Safety and the Future of Life Institute.9 10
Our plan consists of three phases:
Phase 0: Safety
New institutions, legislation, and policies that countries should implement immediately to prevent the development of AI that we cannot control. With correct execution, the strength of these measures should prevent anyone from developing artificial superintelligence for the next 20 years.
Phase 1: Stability
International measures and institutions that ensure controls on AI development do not collapse under geopolitical rivalries or rogue development by state and non-state actors. With correct execution, these measures should ensure stability and lead to an international AI oversight system that does not collapse over time.
Phase 2: Flourishing
With the development of rogue superintelligence prevented and a stable international system in place, humanity can focus on the scientific foundations for transformative AI under human control: building a robust science and metrology of intelligence, safe-by-design AI engineering, and the other prerequisites for transformative AI that remains under human control.
The Problem
The greatest threat facing humanity is the concentrated effort to create artificial superintelligence. Our current national and international systems are wholly inadequate to react to such a threat. Behind closed doors, development continues with an ideological desire to build an entity that is more capable than the best humans in practically every field. While most AI development is beneficial, the risks of superintelligence are catastrophic. We currently have no method to control an entity with greater intelligence than us. We have no ability to predict the intelligence of frontier AI systems before they are developed, and we have incredibly limited methods to accurately measure their competence after development.
Importantly, there are catastrophic and extinction-level risks regardless of the technical design, business models, or nationalities of those developing artificial superintelligence. It is purely a question of whether such an intelligence exists, either as a single monolithic AI model or as a collection of AI systems working together to achieve an intellect that is more capable than humans in practically every field.
Below we outline four key arguments that underpin our reasoning about this problem and its natural implications for the future of AI development.
1: We believe that the creation of artificial superintelligence is possible in our physical universe, that it is a development objective of several AI companies across the world, and that its arrival is likely within the next 3-15 years.
2: Those seeking to develop artificial superintelligence do not have sufficient methods to reliably predict the capabilities of their models, interpret why their models behave the way they do, evaluate the full extent of the models’ abilities, or shut down such AIs if needed with no risk of proliferation. Therefore, we believe that if developed under current conditions, artificial superintelligence would pose an unacceptable risk of extinction for humanity.
3: We believe that the potential catastrophic and extinction risk from artificial superintelligence fundamentally originates from its intelligence. Sufficiently high intelligence enables an entity to have greater power over other actors. In the absence of strong and proven control over such an entity, the default outcome of the emergence of an entity vastly more powerful than humanity is the disempowerment of humanity. Ultimately, until we have the technical solutions, legal systems and processes, and the understanding required to control an entity of such power, we should not create entities that could overpower us.
4: Humanity does not have a sufficient general theory or science of measurement for intelligence. Developing these theories would allow us to better predict and evaluate the capabilities of an AI system given certain inputs and characteristics, so that we could restrict and control them. Developing this will require significant effort, and therefore humanity should start immediately. Until that is achieved, countries must take significant precautions with AI development or risk AI development remaining continuously out of their control.
The state of the art of intelligence theory and measurement is primitive; we are like physicists who lack the tools necessary to estimate what quantity of radioactive material could go supercritical. Until we can describe potential risky states of AI development and AI models directly, countries should implement regulatory guardrails based on proxies of intelligence.
If countries solely focus on a single proxy - such as compute - to constrain artificial intelligence, then they would need to impose extremely restrictive limits on that proxy for future development. This would be necessary to ensure sufficient safety margins against the risks of improvements in other dimensions, such as algorithms. Such a restrictive approach would stifle low-risk innovation.
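To make this concrete, the minimal sketch below shows why a cap on a single proxy weakens over time: it treats "effective compute" as raw training compute multiplied by an assumed algorithmic efficiency gain. The cap value and the 2.5x annual gain are illustrative assumptions chosen for exposition, not figures proposed by this document.

```python
# Illustrative sketch only: all figures below are assumptions for exposition.

def effective_compute(raw_flop: float, years_elapsed: float,
                      annual_algorithmic_gain: float = 2.5) -> float:
    """Raw training compute adjusted for assumed algorithmic efficiency gains.

    A fixed cap on raw FLOP buys progressively more capability as algorithms
    improve, so the cap alone is a weakening constraint over time.
    """
    return raw_flop * (annual_algorithmic_gain ** years_elapsed)


# A hypothetical cap on raw training compute.
CAP_RAW_FLOP = 1e25

for year in range(6):
    print(f"year {year}: the same cap corresponds to "
          f"~{effective_compute(CAP_RAW_FLOP, year):.1e} FLOP "
          "of today's effective compute")

# Under the assumed 2.5x yearly gain, the unchanged raw-compute cap permits
# roughly 100x more effective capability after five years. A compute-only
# regime must therefore either set the cap far below today's frontier and
# keep lowering it, or accept an eroding safety margin.
```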
Therefore, to preserve flexibility and minimize risk across the range of uncertain futures we face, countries should instead monitor and regulate multiple components of AI development with a defense-in-depth approach. These include:
Computing power used to develop and power AIs;
General intelligence of AI systems measured via proxies other than compute;
Behavioral capabilities, including the development and use of AIs improving AIs, and AIs capable of breaking out of their own environment;
The deployment of AIs without a safety case;
The development and deployment of AIs for use in unsafe applications.
This is a non-exhaustive list that should be expanded. These components have been chosen to constitute a defense-in-depth approach that covers different vectors of risk from AI development; the sketch below illustrates how such layered checks might combine.
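As a purely illustrative sketch, the following combines these components into a single licensing check in which any one tripped guardrail blocks approval. Every field name and threshold is a hypothetical placeholder, not a value proposed elsewhere in this document.

```python
# Hypothetical defense-in-depth check: thresholds and fields are placeholders.
from dataclasses import dataclass


@dataclass
class TrainingRunReport:
    training_flop: float        # computing power used to develop the AI
    generality_score: float     # a non-compute proxy for general intelligence
    improves_other_ais: bool    # behavioral capability: AIs improving AIs
    can_escape_environment: bool  # behavioral capability: breaking out of its environment
    has_safety_case: bool       # whether deployment is covered by a safety case
    unsafe_application: bool    # intended for a prohibited, unsafe application


def violations(report: TrainingRunReport) -> list[str]:
    """Return every tripped guardrail; any single one blocks approval."""
    issues = []
    if report.training_flop > 1e25:          # hypothetical compute cap
        issues.append("compute threshold exceeded")
    if report.generality_score > 0.9:        # hypothetical non-compute proxy limit
        issues.append("general-intelligence proxy exceeded")
    if report.improves_other_ais or report.can_escape_environment:
        issues.append("prohibited behavioral capability")
    if not report.has_safety_case:
        issues.append("no safety case for deployment")
    if report.unsafe_application:
        issues.append("unsafe application")
    return issues
```

The point of the layering is that a developer who stays under the compute cap can still be stopped by the behavioral, safety-case, or application checks, so no single weakened proxy undermines the whole regime.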
The science of intelligence is underdeveloped. Humanity must invest in significantly improving it if we ever hope to have control of superintelligent AI development. We must first understand what we are developing before creating an entity which is more intelligent than ourselves.
The Solution
AI development is accelerating at a considerable rate, yet developers cannot reliably predict what capabilities their models will have before they are trained, nor do they understand their models’ full capabilities even after deploying them. At the same time, current national and international institutions are failing to keep up with rapid technological change, and are woefully inadequate to face a threat of this magnitude. This trend is only expected to continue, with frontier AI developers actively seeking to build artificial superintelligence.11 12
The risks from AI development cannot be eliminated without also affecting innovation and technological advancement to some degree. However, how much risk humanity accepts as part of this trade-off should be a conscious decision, not one taken without oversight or consideration. We are developing a new form of intelligence - one that will surpass our own - and we must not cede our future to it.
To achieve this, governments across the world will need to urgently implement measures at a national level while negotiations on a treaty start at an international level, especially between the USA and China.
To effectively confront the challenges posed by artificial intelligence, three sequential steps are necessary:
Build up our defenses to restrict the development of artificial superintelligence. Safety.
Once we have halted the immediate danger, build a stable international system. Stability.
With a stable system and humanity secure, build transformative AI technology under human control. Flourishing.
At present, we are not succeeding. More critically, humanity is not actively working to face this threat. Efforts remain uncoordinated, and current trends suggest an inexorable convergence towards the development of artificial superintelligence. Should this occur, humanity's role as the driving factor of events in the visible universe will conclude, marking the end of the Anthropocene era.
The most urgent priority is to prevent the development of artificial superintelligence for the next 20 years. Any confrontation between humanity and a superintelligence within the next two decades would likely result in the extinction of our species, with no possibility of recovery. While we may require more than 20 years, two decades provide the minimum time frame to construct our defenses, formulate our response, and navigate the uncertainties to gain a clearer understanding of the threat and how to manage it.
Any strategy that does not secure this two-decade period is likely to fail due to the inherent limitations of current human institutions, governmental processes, scientific methodologies, and planning constraints. These two decades would also grant us more time to develop sufficient methodologies to shape, predict, evaluate and control AI behavior. Additional time beyond two decades would be advantageous but should not be relied upon.
Thus, the goal of Phase 0 is to Ensure Safety: Prevent the Development of Artificial Superintelligence for 20 Years.
With safety measures in place and two decades to mount our response, the next challenge arises from the potential instability of such a system. While universal compliance with Phase 0 measures would be ideal, it is unrealistic to expect perfect adherence. Systems naturally decay without active maintenance. Moreover, individually minor attempts to circumvent the system can compound over time, potentially undermining the entire framework.
We should anticipate various actors, including individuals, corporations, and governments, to exert pressure on the system, testing its resilience. To maintain safety measures for the required two decades and beyond, it is necessary to establish institutions and incentives that ensure system stability.
Therefore, the goal of Phase 1 is to Ensure Stability: Build an International AI Oversight System that Does Not Collapse Over Time.
With the threat of extinction contained for at least two decades, and institutions in place that ensure the security system remains stable, humanity can build towards a future where transformative AI is harnessed to advance human flourishing.
While our science, collective epistemology, and institutions are currently too weak and unprepared to face the challenge, we can improve ourselves and improve them to succeed.
Thus, the goal of Phase 2 is to Ensure Flourishing: Build Controllable, Transformative AI.
1 While there are many such metrics, one useful introductory roundup for those less familiar is “I Gave ChatGPT an IQ Test. Here's What I Discovered”, Scientific American.
2 https://www.palladiummag.com/2024/05/17/my-last-five-years-of-work/; https://openai.com/index/superalignment-fast-grants/
3 https://www.gov.uk/government/speeches/prime-ministers-speech-on-ai-26-october-2023;
https://www.independent.co.uk/news/uk/politics/ai-sunak-weapon-war-uk-b2436000.html
4 https://ec.europa.eu/commission/presscorner/detail/en/speech_23_4426; https://twitter.com/EU_Commission/status/1702295053668946148