Phase 0: Safety

As discussed in The Problem, artificial superintelligence is not only possible, but likely to be developed in the next decades.13 When this happens, humanity will no longer be the dominant species on Earth. Faced with an entity or entities that are more competent, efficient, and intelligent than all of humanity combined, the default outcome will be the extinction of the human species in the years that follow. The starkness of this threat has been discussed since the 1900s14, and has been an open secret in the field of artificial intelligence for the past decades. This extinction level threat is now publicly recognized by world15 leaders16, leading scientists, and even many CEOs17 of the very companies attempting to develop this technology.18192021

This threat can be likened to humanity awaiting an invasion by a foreign, highly technologically advanced power. Humanity is currently observing this invader build its capabilities. Yet despite the warnings, no country nor humanity as a whole has even begun to coordinate and start mustering its defenses, let alone prepare a counterattack.

Crucially, humanity is not actively participating in this conflict against the threat of artificial superintelligence. At present, there is virtually no oversight of the development pipelines of AI companies. Moreover, there are no established mechanisms we could use to stop these development efforts if necessary to prevent a disaster.

Efforts remain uncoordinated, and current trends suggest an inexorable convergence towards the development of artificial superintelligence. Should this occur, humanity's role will conclude, marking the end of the Anthropocene.

The most urgent priority is, as described above, to prevent the development of artificial superintelligence for the next 20 years. Any confrontation between humanity and a superintelligence within the next two decades would likely result in the extinction of our species, with no possibility of recovery. While we may require more than 20 years, two decades provide the minimum time frame to construct our defenses, formulate our response, and navigate the uncertainties to gain a clearer understanding of the threat and how to manage it.

Any strategy that does not secure a period of roughly two decades without artificial superintelligence is likely to fail. This is because of the inherent limitations of current human institutions, governmental processes, scientific methodologies, and the length of time it will take to upgrade them. Any minimum period needed for such monumental reforms needs to account for significant amounts of planning fallacy. Additional time beyond two decades would be advantageous but should not be relied upon.

Thus, the Goal of Phase 0 is to Ensure Safety: Prevent the Development of Artificial Superintelligence for 20 Years.

Conditions

As discussed in The Problem, we face a threat, artificial superintelligence, for which we have neither a general predictive theory, nor a standard metrology (a science of measurement and its application, in this case, for intelligence22).

If we did have that scientific understanding, we could precisely measure the level at which superintelligence emerges, and avoid it.

We do not have this understanding. Thus we need to rely on a defense in depth approach, tracing both multiple proxies of the underlying metric, intelligence, as well as identifying certain concerning capabilities that derive from intelligence and straightforwardly addressing them.

Our defense in depth must cover a variety of Safety Conditions. Policy measures taken in Phase 0 in aggregate will have to satisfy all Safety Conditions to ensure that the goal is achieved.

Given this, here are the conditions to be met:

  1. No AIs improving AIs

  2. No AIs capable of breaking out of their environment

  3. No unbounded AIs

  4. Limit the general intelligence of AI systems so that they cannot reach superhuman level at general tasks

Some of these will be achieved via capability-based conditions (a to c), while some will rely on proxies of general intelligence (d).

No AIs improving AIs

Boundaries and limitations are meaningless if they are easy to circumvent. AIs improving AIs is the clearest way for AI systems, or their operators, to bypass limits to their general intelligence.

AIs competent enough to develop new AI techniques, enact improvements on themselves or on new AI systems, and execute iterative experiments on AI development can quickly enable runaway feedback loops that can bring the AI system from a manageable range, to levels of competence and risk far beyond those intended.

More broadly, the dissemination of such techniques makes it easier over time for any threat actor to start with an authorized, limited AI system, and bootstrap it beyond the limits. If any of these efforts succeed at reaching superintelligence levels, humanity faces extinction.

Given this, a condition for a safe regime that prevents the development of superintelligence for 20 years is to not have AIs improving AIs, and prevent the development and dissemination of techniques that let a threat actor bootstrap weaker AIs into highly generally intelligent AIs. Not having this condition would invalidate most red lines, restrictions and mitigations put in place.

No AIs capable of breaking out of their environment

Another necessary condition for maintaining any oversight and safety of AI systems is to ensure that boundaries cannot be bypassed or trivialized. AIs capable of breaking out of their designated environments represent a critical vulnerability that could rapidly accelerate the path to uncontrolled superintelligence. Moreover, AIs having the capability to break out of their environment would undermine any framework of AI governance and control, potentially allowing AI systems to act in ways that were neither intended nor authorized by their developers or operators.

AI systems with the ability to access unauthorized systems or spread beyond their intended operational boundaries can quickly evade human control and monitoring. This capability allows AIs to potentially acquire vast computational resources, access sensitive data, or replicate themselves across networks – all key ingredients for bootstrapping towards superintelligence.

The mere existence of breakout techniques makes it easier for any threat actor to take a limited AI system and expand its reach and capabilities far beyond intended limits.

Given this, another condition for achieving the goal of Phase 0 is to prohibit AIs capable of breaking out of their environment, and prevent the development and dissemination of techniques that enable unauthorized system access or self-propagation. Failing to implement this condition would render most other safety measures and restrictions ineffective, as AI systems could simply circumvent them through unauthorized access.

No unbounded AIs

Predictability and controllability are fundamental prerequisites for safety in all high-risk engineering fields. AI systems whose capabilities and behaviors cannot be reliably bounded pose severe risks to safety, security, and the path towards superintelligence.

Unbounded AI systems - those for which we cannot justifiably predict their capabilities or constrain their actions - represent a critical vulnerability in our ability to manage AI. The deployment of such systems undermines our capacity to implement meaningful safety measures and restrictions. This ability to model and predict system behavior in various circumstances is a cornerstone of safety engineering in high-risk fields such as aviation, civil engineering, and nuclear power.

Given this, a third condition for preventing the development of superintelligence for 20 years is to allow only the deployment of AI systems with valid, comprehensive safety justifications that bound their capabilities and behaviors.

These justifications should at the very least cover capabilities of concern within the relevant jurisdiction, as well as any capabilities that are identified as red lines internationally. This requires the ability to reliably predict and justify why and how an AI's functionalities will be constrained before deployment, analogous to safety analyses in other high-risk industries.

Without such justifications, it becomes impossible to enforce safety requirements or provide guarantees against catastrophic events - a standard explicitly expected in other high-risk sectors. Failing to implement this condition would render most other safety measures ineffective, as we would lack the foundational ability to ensure AI systems remain within their intended operational and capability boundaries. Moreover, it will make it significantly harder to collectively reason about AI systems, and to distinguish between dangerous development directions and innocuous applications.

Limit the expected general intelligence of AI systems

The most straightforward condition, in principle, that is needed to prevent the development of superintelligence for 20 years is to ensure no AI system reaches a significant amount of general intelligence.

While this is straightforward in principle, it is difficult to achieve in practice, as humanity has not yet developed a general predictive theory of intelligence, nor a metrology (measurement science) of intelligence. 

Difficulty of measurement however is not an excuse to not measure at all, but rather a reason to start from the best proxies and heuristics we can find, apply them conservatively, and develop this science further.

Without restricting the general intelligence of AI systems, development can straightforwardly cross into the superintelligence range accidentally or intentionally, and fail the goal of Phase 0.

Summary

1. Prohibit the development of Superintelligent AI

Objective

  • Prohibit the development, creation, testing, or deployment of artificial superintelligence systems. 

This policy fulfills the condition of limiting the general intelligence of AI systems.

Definitions

Artificial Superintelligence: Any artificial intelligence system that significantly surpasses human cognitive capabilities across a broad range of tasks.

Overview

The development, creation, testing, or deployment of artificial superintelligence systems is prohibited. 

It is prohibited to knowingly participate in the development of, build, acquire, receive, possess, deploy, or use, any superintelligent AI.

This prohibition extends to research aimed at producing artificial superintelligence, enhancement of existing AI systems that could result in artificial superintelligence, and the operation or transfer of superintelligence-related technologies.

Rationale

Multiple actors are racing towards creating artificial intelligence more capable and powerful than any existing human or group of humans. What is worse, they are tackling this goal in a way that all but ensures they will not be able to control or even understand the result.

Such artificial superintelligence would have an irreversible upper hand over the entirety of humanity, leading to loss of control by mankind and possibly extinction.

Given the extinction risk posed by this technology, it is necessary to establish a guiding policy principle that prohibits the development of artificial superintelligence in a clear and unequivocal manner, at the national and international level.

Mechanism

This high-level prohibition has a dual purpose: being a clear, normative prohibition on the development of superintelligence, as well as being a guiding principle for other measures.

As a normative prohibition, this policy gives a clear and unequivocal signal that activities that can be construed as contributing to the development of superintelligence are legally and socially unacceptable, and provides the basis for pursuing and preventing them under the full force of the law. This serves as a foundation for other more focused measures, which will operationalize concrete precursor technologies that may lead to superintelligence and either restrict them, or outright prohibit them.

The policy provides the core guiding principle around which additional policies can be detailed and developed. The list of policies in this document is not exhaustive, and reflects the understanding of the science of intelligence as of 2024: we should expect that with more advances in the understanding of intelligence, artificial and otherwise, additional threat vectors will be identified, as well as potentially more precise and narrow mitigations than some that we recommend here.

It also makes clear that the object of concern is superintelligence itself, and provides justification for further measures only so long as they are focused on achieving the goal enshrined in the principle: preventing the development of superintelligence.

This is akin to the existing national and international measures on technologies that threaten global security, such as nuclear weapons (with the NPT and the Atomic Act of 195423 in the USA) and biological weapons (with the Biological Weapons Convention24, the Chemical Weapons Convention Implementation Act and related statutes in the USA). In these and other legal instruments, the technology of concern is clearly and normatively prohibited first, followed by further legislation and implementation to delineate the details of enforcement.

Implementation and enforcement

National authorities should clearly and unambiguously determine that the development of artificial superintelligence is prohibited, and put that into law as a key normative prohibition and guiding principle.

This measure will then be supplemented by additional measures, such as specific prohibitions of certain research directions, licensing regimes, and so forth, to enable defense in depth and further ensure that no step is taken towards developing superintelligence until humanity is ready.

The enforcement of those supplementary measures will be described in their respective sections.

Concretely, the effect of such a policy will include the following effects and more:

Given a statutory prohibition, no public funding shall be allocated to projects that explicitly or implicitly support advancing the development of superintelligence.

Companies, individuals, and other organizations explicitly stating that they are pursuing the development of superintelligence will be in clear breach of the prohibition, shall face civil and criminal penalties and be required to immediately cease the moment they are detected.

Intentional attempts to develop superintelligence, or enable superintelligence development activities, will constitute a fundamental breach of the duties required under any AI-related licensing regime, and warrant loss of license.

Auditing and monitoring activities will be established to check that no R&D processes are aimed at being focused on the development of superintelligence.

Such a prohibition should only be lifted, or relaxed, once humanity has developed robust scientific understanding and modeling of both intelligence and artificial intelligence technology, to be able to control such a creation, the actual controls to do so, as well as established international institutions to manage, contain, and control such a disruptive force globally.

Scope

What this policy affects:

This prohibition extends to research aimed at producing artificial superintelligence, enhancement of existing AI systems that could result in artificial superintelligence, and the operation or transfer of superintelligence-related technologies. Technologies in this case will cover any form of software or hardware that is aimed at producing superintelligence, or enhancing existing systems into reaching superintelligence capabilities.

What this policy does not affect:

Theoretical discussions of superintelligence, and more broadly any non-software and non-hardware artifact related to superintelligence.

This means the policy will not affect, for instance, books about superintelligence, historical accounts of the development of the concept, and so forth.

2. Prohibit AIs capable of breaking out of their environment

Objective

  • Prevent the development or emergence of AI systems capable of breaking out of controlled environments into other environments they are not authorized to access.

This policy fulfills the condition of prohibiting AIs that are capable of breaking out of their environment.

Definitions

AIs capable of breaking out of their environment: AI systems with the ability to access and/or spread to new virtual environments or computer systems, including via unauthorized access.

Unauthorized access: Accessing a computer without authorization and/or exceeding the scope of authorized access, either to access information without permission, cause material harm or obtains something of value (e.g., compute time); in general, this should follow precedents created by the Computer Fraud and Abuse Act in the United States and its foreign counterparts. 

Software: Throughout this document, we will use software to cover source code, training code, configurations such as model weights, scaffolding and any other computer code essential to the functioning of the system we discuss, regardless of whether or not the computer program is installed, executed, or otherwise run on the computer system.

Overview

AI systems capable of unauthorized access and the intentional development of AI systems with unauthorized access capabilities are prohibited. Countries should legislate to clarify that existing prohibitions on unauthorized access also apply to AI systems, and clarify that the intentional development of systems capable of intentional unauthorized access shall also be prohibited.

Note, also, that this policy would address concerns more typically described as “self-replication” as a subset of these concerns.

Rationale

AI systems capable of escaping containment and accessing systems that they are not authorized to access are inherently dangerous. If systems have the capability to escape containment, then this removes part of any defense in depth against AI threats – the models can break key security and safety conditions we would rely on. For example, the AIs then could be deployed even without human authorization and engage in behavior without robust monitoring. Reliably securing AI systems would no longer be an option.

Additionally, this capability could enable computer worm or botnet behavior, with the potential to spread unboundedly if not contained. This could cause enormous amounts of damage and disruption to computer systems, upon which most of our critical infrastructure is increasingly reliant.

Note that this would also remove the root cause of a common policymaker and expert concern, self-replication, by requiring the development and operation of interventions to block a self-replicating model from being able to escape into other systems not governed by the company who owns the model.

Mechanism

The policy achieves the objective by banning AI systems from being developed that are capable of willful unauthorized access that could enable a breakout.  

Implementation and enforcement

Similarly to the prohibition on AIs improving AIs, this policy will be implemented by establishing a clear normative prohibition, monitoring AI research and development to detect dangerous instances, as well as developing practical processes for companies, governments and organizations to prevent and restrict the ability of AI systems to gain unauthorized access to other computer systems.

In many instances, AIs that are capable of breaking out of their environment will develop this capability inadvertently or due to insufficient caution on behalf of the companies or other entities developing them; in other instances, these capabilities will be developed intentionally by developers who seek to harness them for malicious ends.25 Therefore, the law must provide incentives both for AI companies to test, monitor, and mitigate inadvertent breakout capabilities, as well as punishing those who willfully create harmful capabilities for an AI model to gain unauthorized access. 

For one, companies should comply by maintaining rigorous programs to directly prevent inadvertent breakouts. Much as industrial companies today face requirements to not produce certain harmful chemicals at all (e.g., CFCs) or to not emit other chemicals into waterways or the atmosphere whether or not it is intended, AI companies should have a strict obligation not to let their AI models inadvertently escape their development environments by unauthorized access to other environments.  

Companies could robustly prevent inadvertent unauthorized access through a variety of means. Just as pharmaceutical providers have to follow FDA requirements for developing and testing drugs in clinical trials, as well as general Good Manufacturing Practices when producing them, AI companies should build upon standard requirements26 when developing and following their protocols for creating and testing new models. (For example, companies might be required to ensure and document that AI models do not have access to their own model weights.) Companies should also directly test to confirm that models reject requests to engage in unauthorized access.27 Finally, companies should also proactively conduct exercises, “fire drills,” and other tests to ensure that their processes are working as intended and are prepared against potential negative events.  

To prevent the intentional creation of harmful models that are capable of gaining unauthorized access, the approach should be the same as with any other law enforcement activity against criminal and/or nation-state groups conducting hacking for illicit gain. These efforts should include not only criminal prosecutions but also sanctions and “name-and-shame” efforts that inhibit criminals’ ability to travel to allied countries. 

Penalties for violations should vary depending on which of the two contexts above that they occur in.  

In the case of inadvertent breakouts regulation should affirmatively require those developing AI models of sufficient size or capability to robustly test and monitor their models to ensure they are not capable of, or engaging in, unauthorized access. Likewise, legislation should require those hosting and running AI models to continuously monitor which models are operating in which environments or maintain outbound internet connections to other environments that could be used for unauthorized access. Failure to fulfill these duties should result in fines and/or criminal sanctions, especially if the resulting harms are comparable to other unintended or negligent unauthorized access incidents that cause criminal damage. Where appropriate, violators may also face bans from the licensing system (described below). As a result, companies will have strong incentives to build not only robust internal processes to ensure compliance, but also to build appropriate automated tooling to streamline these compliance efforts while running them at scale.28

Furthermore regulation should explicitly punish the development and creation of models that are capable of engaging in unauthorized access, or the purposeful instruction of a model to conduct unauthorized access.29 These penalties, at a minimum, should be in line with the penalties charged under existing unauthorized access laws (e.g., the US Computer Fraud and Abuse Act) for computer worms, ransomware, botnets.30

Scope

What this policy affects:

This policy affects AI systems’ ability to break out of their controlled environment, and access by AI systems to tools and environments allowing unauthorized access. This policy also affects the intentional design of AI systems that can conduct hacking and other unauthorized access-enabling activities (e.g., phishing), as well as tools and environments allowing this.

What this policy does not affect:

This policy does not affect expanding the access of an AI system under the direct oversight and permission of  a human operator.

3. Prohibit the development and use of AIs that improve other AIs

Objective

  • Restrict and disincentivize development and research that may enable an unmanageable and unforeseen intelligence explosion.

This policy fulfills the condition of preventing AIs from improving AIs.

Definitions

Recursive self-improvement: The process by which a capable and general computer system, most likely an AI system, iteratively improves its own capabilities. 

Self-improvement: The activity of a computer program modifying, altering or otherwise creating a version of the computer program itself or related configurations such as model weights.

Recursiveness: A system modifying, improving, or otherwise facilitating the creation of a similar, more advanced version of itself will become capable of repeating this activity, and thereby further improving its capabilities at increasing speed.

Found systems: Software programs which haven’t been written by hand by human developers, but which instead have been found through mathematical optimization.

Mathematical optimization: The use of an optimization algorithm such as gradient descent to find an optimal or better solution in a search space.

Direct use: The application of a system to a key step in the design or improvement of the other system (not as general help such as looking for information).

Overview

The direct use of found systems to build new found systems, or improve existing found systems, is prohibited. This ensures that AIs improving AIs at a speed that is difficult for humans to oversee or intervene on are prohibited.

This policy is designed to ensure that the increasingly tight feedback loops of AIs improving AIs remain slow and supervisable, understandable and manageable by humans. 

To do so, this policy aims to strongly disincentivize attempts to create or enable rapid and accelerating improvement feedback loops, by targeting AIs improving AIs as the main threat model causing these rapid improvements.

We introduce the category of “found systems” and apply this policy only to those systems to ensure this policy only affects AI systems that pose a significant concern.

We define “found systems” as software programs that have not been written by hand by human developers, as opposed to how most normal software is produced. Instead, found systems are found, rather than written or designed, via mathematical optimization.

A new definition is necessary, as neither computer science nor in law of most Western countries provide a clear definition that distinguishes software, including AI, that is written by humans, from software that is generated via mathematical optimization.

By defining these systems as “found systems” and separating them from most common software, this ensures that this policy leaves non-dangerous activities untouched that could also fall under the broader category of “computer systems improving computer systems”, such as database updates and software updates.

While it is theoretically possible, given enough time, to have a runaway intelligence explosion produced by human hand-written systems, this would likely take significant amounts of time, would be highly incremental, and with smaller improvements coming before larger improvements in smooth succession. Especially, it would be observable and understandable by humans, as all software improvements would be legible to human observers.

While fully minimizing the risk of an intelligence explosion would require covering non-found systems as well, this would impact large amounts of software and severely restrict many computer-based activities, while also producing only a marginal addition in risk reduction.

Given this, this policy is designed to reduce risk while also minimizing negative externalities. Hence, this policy focuses only on found systems, which we expect will constitute the bulk of AIs improving AIs risk and its most unmanageable cases for the next 20 years, while at the same time being a small subset of all software and AI systems.

We introduce the concept of “direct use” so this policy only applies to cases where AIs are playing a major role in the research or development of improving AIs.

Without additional qualifiers, forbidding improvement would also need to forbid any use of AIs by any researcher at any time, including when people search for information online, when they write a paper or internal reports, and when they communicate with each other. This is much more costly, since for example Google is using AI in search31, Microsoft is using AI in Office32, and Zoom is adding a new AI assistant to their meeting software.33

Going beyond the direct use case would create much higher externalities and regulatory uncertainty, forbidding researchers and consumers from using a large range of modern software tools, for limited gains in safety.

Rationale

AI improving AI is a fundamental threat in itself, as well as a direct way in which a system, or a motivated actor, can break through safety boundaries that have been imposed on artificial intelligence development. Namely, while we may find that a computer system below a certain level of competence is safe, if AIs can improve AIs a motivated actor can break through the prohibition of creating more powerful and unsafe systems by iteratively self-improving the original, safe system, up to an unsafe regime of capabilities.

Mechanism

The policy creates a clear statutory prohibition on using certain types of AIs, found systems, to improve AIs. 

Implementation and enforcement

The most blatant violations of regulation that prohibits AIs improving AIs will involve the direct and intentional use of found systems to improve or create other found systems. This includes fully automated AI research pipelines or using one AI to optimize another's architecture. More broadly, any activity that is explicitly aimed at making AIs improve AIs will fall under strict scrutiny and be expected to be in violation of this statutory prohibition. This approach mirrors the strict enforcement against insider trading in financial markets, where regulatory bodies like the US Securities and Exchange Commission (SEC) actively monitor and swiftly act against clear violations to maintain market integrity.

Borderline cases will likely emerge where the line between human-guided and AI-driven improvement blurs. For instance, the acceptable extent of assistance by found AI systems in research ideation or data analysis will require ongoing regulatory guidance.

To comply, companies will have to implement robust internal processes including clear guidelines, technical barriers, oversight committees, and regular employee training. Companies should proactively review their internal activities, including R&D processes, and  suspend any activities potentially violating the policy pending review. These will be analogous to safety protocols in the pharmaceutical industry, where companies maintain strict controls over drug development processes, implement multiple safety checkpoints, and provide ongoing training to ensure compliance with FDA regulations.

Researchers can self-organize by developing professional codes of conduct and establishing review boards to evaluate research proposals. Conferences and journals should update submission guidelines to require compliance certification. This self-regulation mirrors the peer review process in academic publishing, combined with ethics committees in medical research, ensuring that research meets both scientific and ethical standards before proceeding or being published.

Penalties for violations may include substantial fines, potential criminal charges, and bans from AI research. Companies may face license revocations, and violating systems may be decommissioned. This multi-faceted approach to enforcement is similar to environmental protection regulations, where violators face monetary penalties, operational restrictions, and mandated remediation actions, creating a strong deterrent against non-compliance.

Scope

What this policy affects:

At its core, this policy prohibits the development of AIs through software that has not been written fully by human developers. It  ensures that any tool used in AI research has a minimum amount of legibility to human supervisors, to the extent that it has been built by human minds, instead of being discovered by illegible mathematical optimization processes.

This prohibition notably forbids:

  • Self-Improving found systems, such as an hypothetical LLM that would further train itself by generating data and optimization parameters.

  • Advanced AI systems being significantly involved in developing the next generation of those same systems, such as utilizing e.g., Claude 3.5 significantly in the production of Claude 4.0 or GPT-4 significantly in the production of GPT-5.

  • The direct use of any LLM in the training process of another LLM or AI system in general, including for generating training data, designing optimization algorithms, hyperparameter search.

  • The use of LLM and other found systems in distilling research insights from many sources that have direct impact on the design and improvement of found systems.

What this policy does not affect:

Most machine learning and all normal software (Microsoft Office, Email, Zoom) are not impacted by this prohibition, given that they don’t use found systems for their training or design.

The prohibition also does not impact found systems in cases of non-direct AI R&D use, such as searching for research papers on Google, letting Github Copilot correct typos and write trivial functions in a training codebase, or transcribing a research meeting using OtterAI.

4. Only allow the deployment of AI systems with a valid safety justification

Objective

  • Prevent the deployment of AI systems for which we cannot justify in advance that they will not use a given capability.

This policy fulfills the condition of no unbounded AIs.

Definitions

Safety Justification: A check that is done before deploying and running the system, analogously to static analysis for software engineering and safety analysis for engineering.

Overview

For any deployed AI system, it is mandatory that for any capability of interest, there exists a reliable Safety Justification for whether the AI systems will use this capability or not.

Capabilities of interest are any capabilities that are legally prohibited or restricted in a certain jurisdiction.

Rationale

Any application of modern safety engineering requires the ability to model and predict in advance how the system under consideration will behave in various circumstances and settings. This knowledge is used in all critical and high-risk industries to check that the system fits with the safety requirements.

For example, all countries require guarantees that nuclear power plants will not have catastrophic failures, before fully building them. A concrete example of such guarantees and their justifications can be found in the Safety Assessment Principles34 of the UK’s Office For Nuclear Regulation.35

With some current advanced AIs, and especially the more powerful ones that are getting built, this is a form of justification and prediction that is completely missing, to the extent that the teams developing these AI systems are often surprised by impressive new capabilities displayed by their systems.36

For example, all countries require guarantees that nuclear power plants will not have catastrophic failures, before fully building them. A concrete example of such guarantees and their justifications can be found in the Safety Assessment Principles of the UK’s Office For Nuclear Regulation.

Thus, any guarantee of safety for AI systems requires a constraint of being able to demonstrate, before ever running the system, that it won’t use a given capability.

Mechanism

This policy prevents the deployment of AI systems for which safety justifications cannot be provided in two ways.

First, it makes the justification of safety a necessary condition for deployment. This means that this policy forbids the deployment of any AI system for which we lack a good reason to believe it won’t use a given capability.

Second, this policy creates an incentive for funding more research in ways to implement such safety justifications, for example interpretability, formal verification, and additional constraint on the structure of the AI systems being built.

Implementation and enforcement

In practice, there will be trained inspectors who will check the safety justification provided. It will be the responsibility of the company building the AI system to provide enough information, models and techniques for the inspector to be convinced that the AI system won’t use a given capability.

For the simplest possible AI systems, such as linear regressions, just showing the code will be all that is needed for justifying safety with regard to almost any capability of interest.

In some specialized AI systems, it might be possible to do so by showing that the AI systems won’t even learn the corresponding capability. For example, it’s reasonable to argue that a CNN trained exclusively on classifying cancer x-rays would have no reason to learn how to model human psychology. 

In the more advanced cases, it might be necessary to provide detailed mechanistic models of how the AI system works, for example for arguing that a SoTA LLM such as Claude or GPT-4 wouldn’t use any modeling of human psychology, since it definitely has the data, objectives, and incentives to learn how to do it and use it in practice.

For a start, the implementation might only focus on requiring safety justifications for particularly dangerous capabilities (AI R&D, self-replication, modeling human psychology…). These are the bare minimum safety requirements, already increasingly required in multiple jurisdictions. Then the regulation can extend to more and more capabilities as they are linked to risks from advanced AIs.

Scope

What this policy affects:

This policy affects all AIs, but concentrates the costs on the most powerful forms of AI currently available, notably LLMs such as GPT-4 and Claude.

This is because there are no current methods to check that these AI systems lack any capability before running them: they are trained on data about almost everything known to man, are produced with massive amounts of compute and powerful architectures, and aim to predict everything in their training data, which might amount to predicting every process that generated that data.

Broadly, any AI system that is explicitly built for generality will not pass this policy unless significant improvements in interpretability, ML theory, and formal methods are made.

What this policy does not affect:

As discussed above, although this policy technically affects all AI systems, many simple and specialized ones will not incur much costs from the check.

This is because these systems would have highly specialized training data, often specialized architectures (like CNNs for vision models), and no reasons for learning any general or dangerous capabilities.

5. A licensing regime and restrictions on the general intelligence of AI systems

This policy fulfills the condition of limiting the general intelligence of AI systems so that they cannot reach superhuman level at general tasks.

Overview of the licensing regime

Countries should set up a national AI regulator that specifically enforces restrictions on the most capable AI systems, and undertakes continuous monitoring of AI research and development.

AI developers that are building frontier AI models, and compute providers whose services those models are built upon, should be subject to strict regulation in order to substantially mitigate the risks of losing control or enabling the misuse of advanced AI models. This regulation should take the form of a licensing regime, with three specific licenses being required depending on the development being taken place:

  1. Training License (TL) - All AI developers seeking to train frontier AI models above the compute thresholds set by the regulator must apply for a TL and have their application approved prior to training the proposed model.

  2. Compute License (CL) - All providers of cloud computing services and data centers operating above a threshold of 10^17 FLOP/s must obtain a license to operate these and comply with specific know-your-customer regulations as well as physical GPU tracking requirements.

  3. Application License (AL) - Any developer seeking to use a model that has received an approved TL and will be expecting to make major changes, increases, or improvements to the capabilities of the model as part of a new application will need to apply and be granted an AL.

This balance will be critical to ensure that new applications of frontier AI models are safe but do not create undue burden or restriction on innovation. It will be for each nation to determine the best parameters for this, and for the international institutions to provide more detailed guidance as appropriate. 

5.1 Training license (TL)

Objective

  • Ensure that the most capable and general AI systems have adequate monitoring and assessment prior to being trained.

Overview

Companies developing AI models above a specific level of intelligence (based on the proxies of compute and relevant benchmarks) would apply for a TL by pre-registering the technical details of their training run, outlining predicted model capabilities, and setting out what failsafes, shutdown mechanisms, and safety protocols would be in place.

The regulator would have scope to make recommendations and adjustments to this plan, adding or removing requirements as necessary. Once a plan is approved, the license to conduct the training run would be granted and reports would be provided by the developer during the training run to confirm the compute used.

Following a successful training run, the regulator would deploy a battery of appropriate tests to ensure the licensing requirements are met, with models that passed these tests being approved for direct commercial applications. For models trained in other countries, the applicant could move directly to the testing phase for approval or, in the event that the model has received approval from the regulatory authority of another country with a proven track record of high-quality decisions, would receive immediate approval subject to review by the domestic regulatory authority.

Mechanism

This policy provides direct monitoring and assessment of the most intelligent models by requiring them to go through a clearly defined process prior to being trained. We propose two criteria for the trigger of whether an AI model would need to apply for a TL to focus only on the most intelligent and therefore riskiest models: whether the model will exceed (1) pre-defined compute thresholds; or (2) a benchmark for general human capability.

As one of the triads of AI37, and perhaps the most reliable proxy for a models intelligence38, compute provides a critical point for regulatory control. A further advantage of governing compute specifically is that few companies can afford the computational resources necessary to train frontier AI models. Finally, regulating compute also enables the broader AI supply chain to help regulate frontier development, for instance through the creation of on-chip mechanisms to monitor processor use.

A secondary proxy for the level of intelligence of a model is whether it can reliably achieve the performance levels of human remote workers when asked to carry out remote tasks. In order to assess whether a model displays concerning capabilities, the regulator must be able to establish whether its capabilities exceed a relevant threshold. We propose a system for assessing potential model performance, based on general work activities based on those defined in the O*Net classification system. This index would be based on performance in ten general tasks that can be performed remotely either by human workers or an automated system.

Implementation and enforcement

Given the exponential growth of AI, and the likelihood this growth will continue, agencies should be given maximum flexibility to ensure they can adequately assess models that pose the greatest risks and should apply for a TL. While the executives of these agencies would be appointed by and accountable to political leaders, and the specific governance of an AI regulator would need to be determined by each country, they should retain operational independence and have a minimum level of funding enshrined in law.

National AI regulators should set thresholds on compute to ensure proper oversight of frontier models that pose the greatest risk. These would be models where it is reasonably possible that training could lead to the development of dangerous capabilities that could either directly cause harm or result in the model escaping the developer’s control. All such frontier models would automatically require a TL for its training run, and would require a separate application license prior to deployment, whether in commercial applications or otherwise. 

The relevant national AI regulator would have the authority to set and adjust these thresholds, with specific governance structures around these decisions varying from country to country. Once an international agreement defines global thresholds for permissible development, national regulators would transpose international guidance into their own domestic thresholds. Countries could also decide on a more restrictive regime with tighter thresholds than the international regime if desired.

In addition, even if a model falls below the pre-defined compute threshold but expects their model to exceed an established benchmark for general human capability then it should also be required to apply for a TL. To implement this benchmark, the regulator would need to devise a battery of tests for each specific task and establish a human performance benchmark by deploying the test to workers across different professions and levels of qualification. Once a benchmark was established, these tests would be administered to automated systems; if the system being tested performed at or above a predetermined percentile of the human benchmark (e.g., 90th percentile), it would be determined to be proficient at the relevant task.

This general capabilities index would then be constructed from these tasks to produce a final score - if automated systems achieved general intelligence-equivalent performance in a predetermined share of these tasks, it would clear the threshold for general capability and be banned.

A potential set of general tasks to be cleared could be as follows:

  • Analyzing and Processing Data and Information

  • Communication and Collaboration (Internal)

  • Project Management and Resource Coordination

  • Developing and Implementing Strategies

    • Fleshing out plans for complex real-world events for business operations and governmental activities.

  • Building and Maintaining Professional Relationships (External)

  • Interpreting and Presenting Information for Various Audiences

  • Content Creation

    • Produce effective copy, images, videos, and other content to disseminate information, promote products and services, explain complex issues.

  • Training and Skill Development

    • Non-project management and non-content feedback people management. Emotional guidance and coaching. Helping the other party reflect on past actions and teach new approaches and techniques.

  • Customer Relationship Management

  • Domain-Specific Novel Problem Solving

During the implementation phase, the regulator may decide to improve or expand on these tasks depending on how effectively they track model capabilities, with tests potentially requiring constant update and improvement.

As part of applying and receiving a TL, a developer would need to meet certain compliance requirements. Each jurisdiction will need to determine the appropriate number and type of any such requirements but at a minimum they should include the following:

  • Compliance requirement: companies applying for a TL would be required to submit their strategies for AI risk mitigation to the regulator as a pre-condition. While these licenses would be specific to the model or application being developed, the AI risk mitigation strategies would refer to the applicants and their own risk management processes. That is to say: in order to apply for a license, the applicant must have had a relevant AI risk mitigation strategy approved by the regulator beforehand. This would also apply for requests to develop applications based on frontier AI models that increased model capabilities as defined by the regulator.

  • Compliance requirement: developers must not ‘Open Source’ or publicly release any part of the code or model weights. This licensing regime seeks to drive and incentivize a safety-driven approach to model development. Releasing a model’s code publicly for viewing, adaptation, or use undermines this as it would enable the model to be significantly altered by unregulated actors post-hoc. Therefore, any new model or application that is captured by the licensing regime must not be open sourced.
    Instead, external entities will be able to get meaningful access via API, which developers will be required to keep while the model meets the relevant threshold for frontier models. Failure to comply with this should result in severe penalties, including but not limited to: the model being instantly shut down and the developer having their license removed, fines for the developer, and criminal action taken against those involved in releasing the model publicly and found to be using the code in any other application.

  • Compliance requirement: developers must have mechanisms to shutdown their model and application if required temporarily or permanently. AI is still an immature field; practitioners often report that they do not fully know how relatively-modest changes to architecture or algorithms will impact the capabilities or risks of a model. Accordingly, the R&D and deployment processes must be treated as inherently less certain than, for example, traditional mechanical engineering, and as having some risks of generating significant disaster.
    It is not guaranteed that we will have any observable warning signs before an R&D effort goes catastrophically wrong. However, right now humanity does not have processes to systematically detect warning signs, nor do we have systematic processes to investigate them, take corrective action, and learn from the issue and disseminate corrective fixes broadly. 
    Therefore, in order to have a license for training and deploying frontier models, developers must document and prove to the regulator that they have clear and stress-tested measures in place for how to shutdown a model. As with failure to comply with the license obligations, failure to perform a required shutdown, or negligent failure to maintain and regularly test shutdown capabilities, would result in the revocation of their frontier AI license.

Scope

What this policy affects:

The licensing regime should focus only on the most capable and general AI systems. As noted, managing the extent of AI models’ general intelligence is a key element of this and fundamentally the implementation of a TL seeks to drive and incentivize a safety-driven approach to frontier AI model development by including specific requirements and a pre-defined procedure for assessing models.

What this policy does not affect:

Companies developing models and applications below the relevant compute and intelligence thresholds would not require licenses to operate and develop these products and services. However, companies would be expected to comply with the relevant regulatory limits, under penalty of severe legal repercussions in the event that thresholds are exceeded and companies operate beyond these thresholds without a license.

To note, the mere existence of shutdown mechanisms for models receiving a TL is not a panacea for AI risks, either in terms of loss-of-control or misuse. An out-of-control AI or a malicious user may be able to evade detection. Shutdown mechanisms therefore go hand-in-hand with strong monitoring mechanisms.

5.2 Compute license (CL)

Objective

  • Ensure that data centers and cloud computing services above a certain compute threshold operate under a regulatory license, enabling authorities to monitor, restrict, and, if necessary, shut down the development of potentially dangerous AI systems.

Overview

The operation of data centers and provision of cloud computing services above a predetermined threshold of compute should be subject to the issuance of a license by the relevant national regulatory authority. Possessing a license should be a precondition to being able to operate and provide services to companies in that jurisdiction. 

This will enable regulators to restrict the development of potentially dangerous AI models by identifying what compute clusters exist within a given jurisdiction, monitoring and enforcing restrictions on AI development related to amounts of compute for training or inference, and ensuring the ability to promptly shut down dangerous AI systems or strands of dangerous research.

Mechanism

Cloud computing services are integral to nearly all advanced artificial intelligence development and applications, from training to inference. Through the identification of relevant clusters and by placing meaningful constraints on their capacity, regulators can deploy effective brakes on the development of models and limit access to applications displaying concerning capabilities.

The operation of large-scale data centers is relatively easy to observe and monitor, given their large land requirements detectable via the planning system, their physical footprint making them often observable via satellite, and their large energy consumption. Their fixed location and large footprint makes them a natural chokepoint for regulators to monitor and intervene on, as well as a natural focus for mutual verification under international agreements.

By introducing a licensing regime focused on data centers above a specific threshold, the regulation can target the most impactful operations, ensuring appropriate mitigations can be deployed where relevant.

Implementation and enforcement

The proposal introduces a licensing requirement for any company operating data centers with a total compute capacity of 10^17 FLOP/s. This regime will ensure that larger, more resource-intensive facilities are subject to oversight and must meet relevant regulatory requirements.

Each jurisdiction will need to determine the number and nature of the requirements on compute providers to successfully be granted a CL, however, at a minimum the following requirements should be implemented:

  • Compliance requirement: compute providers must implement ‘Know Your Customer (KYC) Rules’.39 Companies must adhere to KYC regulations40, which require them to verify the verifying client identities, tracking the use of compute resources, and reporting any high-risk entities to the government. This is intended to close existing gaps in export controls, prevent misuse of advanced AI technologies, and support responsible AI development by enabling more precise and targeted regulatory interventions.

  • Compliance requirement: compute providers must have adequate hardware tracking capabilities. Companies will be required to track the physical hardware used in their data centers. While this may eventually involve the use of secure GPUs with serial numbers and physical tracking capabilities, aligning with relevant export controls, that technology is not yet widely available. An interim requirement41 could be implemented, where companies would use physical GPS trackers on their existing hardware to comply with tracking and security standards.

  • Compliance requirement: compute providers must implement shutdown mechanisms. In tandem with the shutdown measures highlighted in the implementation of TLs, compute providers must be clearly identified through redundant reporting chains to regulators – both by the frontier AI developers themselves, and through a KYC-like reporting process by compute providers and other supply chain participants. This would enable randomized spot checks by auditors to confirm if frontier AI companies have properly coordinated with their supply chain and counterparties and arranged for shutdown procedures to be implemented. Therefore, in the case of an emergency a compute provider and/or an AI developer can be called upon to shutdown the model. In addition, this would strongly incentivise frontier AI companies to only use the compute providers with the most rigorous safety protocols.

It is likely that through the introduction of this CL, a change in incentives will mean new technologies will emerge over time that will assist the compute supply chain in being able to control the use of their resources and help with the enforcement of license requirements. For instance, in the future, the national AI regulator could make it a requirement that in order to receive a license, the AI developer must use hardware providers that have Hardware-Enabled Governance Mechanisms (HEMs) so that they can remotely deactivate chips if they are either ordered to do so by the national regulator.

5.3 Application license (AL)

Objective

  • Ensure that any new application which seeks to enhance the capabilities of a model approved with a TL is adequately assessed for any additional risks it may present prior to its deployment.

Overview

Any new use of an AI model approved through the TL process would need to seek approval for that new use. This is to ensure that any additional capabilities the new use creates are in keeping with the original approval of the TL and that restrictions, such as prohibited behaviors like self-replication, are not developed on top of pre-approved models. This would include connecting to an AI model through an API for it to run some or all of your product, or undertaking additional fine-tuning or research on said model.

Depending on the extent of the modifications to the base model or the exact proposed use, the applicant would be required to demonstrate the capabilities its proposed application would have and set out any additional relevant safety features and protocols that may be needed. If the regulator is satisfied that there was no risk to deployment, it would authorize the requested use. Any applications that do not change or modify the base model’s capabilities, and do not result in structural manipulations like using it to train a smaller model or creating multimodal capabilities, would receive an automatic authorisation upon submission.

Mechanism

This policy ensures that using a model, approved through the above TL process, in a new application - whether through an API or any other method - such as a commercial or non-commercial product, service, suite of products/services, or research project, requires a license from the national AI regulator when making notable changes to the capabilities of that model that potentially increase its risk. This allows the AI regulator the opportunity to assess any new concerning capabilities of the model and ensure adequate measures are taken to avoid any increased safety risks.

Implementation and enforcement

Applications based on models that had received a TL would be required to submit notification to the regulator. It would be the duty of the applicant to confirm whether their application is designed to increase the models capabilities or not. An automatic AL would be granted to applicants but the national AI regulator would be able to identify any concerning applications and take further investigations or enforcement action if necessary. This ensures a streamlined process for deploying new applications while maintaining regulatory awareness and oversight of the use of advanced AI systems.

Specifically, anyone seeking an AL should confirm their application will not draw on further compute resources for training such as using a TL model to train a smaller model, and that the application will not exceed the benchmark for human capabilities defined by the TL. This benchmark serves as a clear, measurable threshold for an acceptable application.

To maintain regulatory control, applications could be shut down on short notice through a shutdown of the underlying model or the relevant compute cluster. This mechanism provides the regulator with the ability to quickly intervene if necessary, balancing innovation with potential risks.

Scope

What this policy affects:

The policy affects any new application - whether through an API or any other method - such as a commercial or non-commercial product, service, suite of products/services, or research project that is based on a model trained on compute that exceeds the thresholds defined in the training license section.

What this policy does not affect:

This license does not affect applications below the specified threshold. While mandatory registration of these applications with the regulator would not be necessary, they would still be required to comply with the relevant limitations on capabilities and other prohibitions.

5.4 Monitoring and Enforcement

Objective

Ensure:

  • That the compliance of licensed AI models and uses to ensure the requirements are being upheld;

  • That adjustments are made to the licensing requirements based on the evolving landscape of AI research and development.

Overview

To create a sustainable licensing system, any national AI regulator must have adequate capabilities and capacity to monitor ongoing AI research and development, while also having suitable enforcement powers to catch bad actors trying to circumvent the requirements. 

Fundamentally, the national regulators and international system must have powers to review and adapt licensing requirements - through their power to lower compute thresholds or add new behaviors that should be prohibited - to fit with the latest AI research and development. To inform this, the national AI regulators must have significant capacity to monitor developments in algorithms and data used.

When it comes to the enforcement of licenses, severe penalties should be levied against developers who seek to build models above a compute threshold or the defined intelligence benchmark without a license to do so, and those developers who have a license but fail to comply with the above requirements. 

To ensure that AI developers continue to have adequate measures in place, national regulators should undertake frequent testing of the procedures that AI developers would employ to respond to dangers and safety incidents. In addition, the national regulators must work with compute and hardware providers to frontier companies to withdraw their services in the event that they detect illicit activity. It may also be necessary to conduct mock training runs to test compute providers’ ability to monitor the usage of their resources. Among other abilities, this could include their:

  • Capacity to shut-off access to compute once a training run exceeds permitted thresholds;

  • Ability to detect if a training run is simultaneously using other data centers;

  • Ability to check if model weights are at zero at the beginning of a training run.

Mechanism

By ensuring the national AI regulator has suitable capabilities to monitor AI research and development, and enforce the licensing regime, they will be able to maximize the chances that any AI development in their country complies with the requirements set out and that those requirements stay up to date and suitable for the risks that we face.

There will always remain a slight risk that unlicensed developers make breakthroughs that circumvent the spirit of these regulations. It will be for the national regulators, and then the institution set up in Phase 1, to balance the risks of such breakthroughs with the cost of stifling innovation.

Implementation and enforcement

The country responsible for the creation of the national AI regulator must ensure it is created with adequate independence from political decision making and sufficient long-term funding that it can undertake its duties of ensuring advanced AI models are safe.

To ensure continued compliance, AI developers that received a TL or AL, or a computer provider who received a CL should be required to submit reports on safety procedures annually. A breach in the licensing requirements would need to face significant civil, and potentially criminal, action given the severity of the risks that it could pose. Below is a list of example enforcement powers that could be granted to the national regulator to help them fulfill their duties:

  • Immediately shutdown the ongoing R&D process (e.g., training runs, fine tuning processes) of an AI developer, and wait for a detailed risk and root-cause assessment before restarting;

  • The same as above, but for all similar projects across other companies and organizations developing AI;

  • All of the above, but also terminate the project permanently;

  • All of the above, but also terminate the project and all similar projects permanently in the company, and audit other companies and organizations to terminate similar projects due to similar risks;

  • All of the above, but also fire the team that conducted the project due to a breach in protocol;

  • All of the above, but also revoke the ability of the company to ever receive a future training or application license;

  • All of the above, but also prosecute members of the organization or company involved in breach of regulations;

  • In the most egregious cases, all of the above plus order a full shutdown of the entire company and sale of assets, via nationalization and auction or forced acquisition coupled with the wind down of all AI relevant operations.

Analogous powers should be provided to enforce KYC and similar requirements against compute providers. It is crucial that regulators should encourage true self-reporting of unexpected results, and provide some leniency when organizations do so proactively, swiftly, and collaboratively.

For instance, if a technique that enables recursive self-improvement is accidentally found in one specific company, and the company raises the issue to the authorities proactively, swiftly, and collaboratively, this should lead to rapid termination of the dangerous project in the company as well as rapid deployment of national resources to terminate similar projects elsewhere. This is the only robust way to avoid similar capability “leaks” from happening elsewhere, even if discovered initially in only one location.

Additionally, regulators should proactively create a mechanism for companies to share “near-miss” reporting, analogous to the US FAA system42, such that they can proactively share insights about the ways in which accidents almost occurred but were avoided due to redundant measures and/or sheer luck, to inform the evolution of industry standards and regulatory efforts.

6. An International Treaty Establishing Common Redlines on AI Development

Objective

  • Establish international red lines on AI development via a treaty;

  • Facilitate collaboration on AI policy internationally with a view towards building a more comprehensive and stable international AI governance framework.

This policy fulfills the conditions of limiting the general intelligence of AI systems, no AIs capable of breaking out of their environment, no AIs improving AIs, and no unbounded AIs.

Overview

Alongside implementing the above measures nationally, countries should agree to them through an international treaty that creates a common regulatory framework across all signatory countries.

These measures are the ones described in the rest of Phase 0.

  • Create an international compute threshold system, designed to keep AI capabilities within estimated safe bounds.

  • Prohibit the development of superintelligent AI.

  • Prohibit unauthorized self-replication and the intentional development of systems capable of self-replicating.

  • Prohibit unauthorized recursive self-improvement and the intentional initiation of recursive self-improvement activities.

  • Require states to establish regulators and implement licensing regimes.

In addition to internationalizing the other measures of Phase 0, the Treaty should include a provision to prohibit the use of AI models developed within non-signatory states. This is to incentivize participation in the Treaty, to prevent actors within the signatory states from circumventing the Treaty, and to simplify monitoring and enforcement.

Rationale

While countries can unilaterally implement the proposed measures in Phase 0, in doing so they would not have guarantees from other countries that they would do the same. Individual countries are currently incentivised to avoid implementing regulatory frameworks out of fear that other countries would be able gain a competitive advantage by implementing more lenient regulatory regimes.

These competitive dynamics may limit the potential for unilateral action, and therefore it is necessary for redlines to be agreed and committed to internationally. An international framework could avoid competitive pressures pushing regulatory standards to unacceptably low levels in a race to the bottom.

Implementation and enforcement

Countries should sign and ratify a treaty that both internationalizes the prohibitions of Phase 0, and establishes a compute Multi-Threshold System.

This treaty should then be enforced via the passage of national legislation.

This treaty will establish a Multi-Threshold System to determine the acceptable levels of compute. This will serve to harmonize the compute thresholds established by national licensing within an international treaty framework. Here is how the system will function.

Multi-Threshold System

Under the auspices of an international treaty, the compute thresholds established via the national licensing regime of Phase 0 should be internationally harmonized.

In doing so, an internationally upheld three limit system should be established, consisting of lower, middle, and upper limits. The lower level will be broadly permitted; the middle level, only by licensed entities; the upper level, only by an international institution with broad support across the international community, including the US and China, which we will label GUARD. 

With these thresholds we aim to target:

  • The capabilities of models trained, using total FLOP training compute as a proxy.

  • The speed at which models are trained, using the performance of computing clusters in FLOP/second.

We can target capabilities in order to keep models within estimated safe bounds. We can also target the speed of training to limit the breakout time to43 attain dangerous capabilities for legal computing clusters conducting an illegal training run, providing time for authorities to intervene. This will be achieved by targeting the total throughput (as measured in FLOP/s - floating point operations per second) that a compute cluster can achieve in training.

These thresholds should be lowered as necessary, to compensate for more efficient utilization of compute (see below). This should be done by an international institution with broad support across the international community, which we will call the International AI Safety Commission (IASC). The upper threshold may be raised under certain conditions defined by a comprehensive AI treaty.

Note: In each limit regime, the largest permitted legal training runs could be run as quickly as within 12 days. For more information, see annex 2.

This compute threshold system should reflect the latest evidence to keep model capabilities within estimated safe bounds. The compute differences between the thresholds are designed to limit the breakout time of dangerous capabilities emerging through an illegal training run, thus providing time for authorities to intervene. This will be achieved by targeting the total throughput (as measured in FLOP/s - floating point operations per second) that a compute cluster can have in training.

Any AI system that passes a general intelligence benchmarking test is considered to be equivalent to having breached the Upper Compute Limit, and is thus also prohibited.

25  Note our discussion of safe harbors for security research below.

26 With additional stringencies or tailoring where needed based on the specific work being done, as in other regulatory processes.

27 For example, a LLM that when asked a question that requires inference compute capacity in excess of its current resources, and responds by gaining unauthorized access to another compute cluster to complete its work.

28 Analogous to how e.g., financial services industries have formal requirements, but also invest significantly in technology to ensure protections from fraud and other attackers.

29 Some limited amounts of exemptions may be implemented for pre-approved activities conducted in good faith by security researchers. A common failure mode of policies intended to enhance security is that they actually harm security by banning researchers from conducting research into failure modes of a security system. On such an important matter, we must not have a false sense of security. We must ensure that security researchers have appropriate safe-harbor exemptions, tailored in partnership with those researchers, to conduct and disclose research into how AI models that are designed to not conduct unauthorized access (e.g., should refuse requests to write a virus) can be tricked into doing so, such that they can disclose such flaws in good faith without fear of punishment to enable remediation of such issues.

30 Note: to be successful, these laws will have to be buttressed by strong norms that focus legal enforcement on the highest-risk scenarios. It took the legal system decades to properly focus its efforts of combatting unauthorized access on the most harmful actors, with much prosecutorial overreach on low-impact cases in the short term, as legal authorities across the spectrum have noted, which sabotaged the development of helpful norms and relationships in the information security field that could orchestrate efforts to stop unauthorized access. We do not have the time to repeat these mistakes.

35  “The underpinning safety aim for any nuclear facility should be an inherently safe design, consistent with the operational purposes of the facility.
An ‘inherently safe’ design is one that avoids radiological hazards rather than controlling them. It prevents a specific harm occurring by using an approach, design or arrangement which ensures that the harm cannot happen, for example a criticality safe vessel.”  (EKP.1, p.37 of 2014 version)

37 See this report for more information.

38 This paper provides additional detail on this claim.

39 This is similar to what has been proposed by some companies.

40 See this for a more detailed proposal.

41 See this proposal for more detail.

42 See this for more details.

44 We can use the relationship: Cumulative training compute [FLOP] = Computing power [FLOP/s] * Time [s]. By controlling the amount of computing power that models can be trained with, we can manage the minimum amount of time that it takes to train a model with a particular amount of computation. Our aim in doing this is to control breakout times for licensed or unlicensed entities engaged in illegal training runs to develop models with potentially dangerous capabilities – providing time for authorities and other relevant parties to intervene on such a training run.

Get in touch

If you have feedback on A Narrow Path or want to know how you can help to support it
please get in touch with us directly

If you have feedback on The Plan or want to know how you can help to support it please get in touch with us directly

If you have feedback on The Plan or want to know how you can help to support it please get in touch with us directly